Fixing the jamo decomposition bug for selected Hangul syllables in GNOME Characters (with GNU libunistring)

First: GNOME Korea’s GNOME 20th anniversary party and Korean Translation Hackathon. Last: GNOME Characters on screen.
Selected Hangul syllable “가” in GNOME Characters
Selected Hangul syllable “가”. Left (bug): only “ㄱ” is shown. Right (expected): both “ㄱ” and “ㅏ” are shown.
  1. GNU libunistring
The Unicode® Standard Version 10.0 — Core Specification, Chapter 3. Conformance http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
libunistring/lib/uninorm/canonical-decomposition.c

  if (uc >= 0xAC00 && uc < 0xD7A4)
    {
      /* Hangul syllable.  See Unicode standard, chapter 3, section
         "Hangul Syllable Decomposition",  See also the clarification at
         <http://www.unicode.org/versions/Unicode5.1.0/>, section
         "Clarification of Hangul Jamo Handling".  */
      unsigned int t;

      uc -= 0xAC00;
      t = uc % 28;

      if (t == 0)
        {
          unsigned int v, l;

          uc = uc / 28;
          v = uc % 21;
          l = uc / 21;

          decomposition[0] = 0x1100 + l;
          decomposition[1] = 0x1161 + v;
          return 2;
        }
      else
        {
#if 1 /* Return the pairwise decomposition, not the full decomposition.  */
          decomposition[0] = 0xAC00 + uc - t; /* = 0xAC00 + (l * 21 + v) * 28; */
          decomposition[1] = 0x11A7 + t;
          return 2;
#else
          unsigned int v, l;

          uc = uc / 28;
          v = uc % 21;
          l = uc / 21;

          decomposition[0] = 0x1100 + l;
          decomposition[1] = 0x1161 + v;
          decomposition[2] = 0x11A7 + t;
          return 3;
#endif
        }
    }
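
The disabled #else branch in the excerpt above is the full (three-jamo) decomposition. As a minimal, self-contained sketch of that arithmetic, using only the constants from the Unicode Standard's "Hangul Syllable Decomposition" section, it could be written as below. The name hangul_full_decomposition is hypothetical and is not part of libunistring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard, chapter 3,
   "Hangul Syllable Decomposition".  */
enum
{
  S_BASE = 0xAC00, /* first precomposed syllable, 가 */
  L_BASE = 0x1100, /* first choseong (leading consonant) */
  V_BASE = 0x1161, /* first jungseong (vowel) */
  T_BASE = 0x11A7, /* one before the first jongseong (trailing consonant) */
  V_COUNT = 21,
  T_COUNT = 28,
  S_COUNT = 11172  /* 19 * 21 * 28 syllables, U+AC00..U+D7A3 */
};

/* Hypothetical helper (an assumption for illustration, not a libunistring
   function): writes the full jamo decomposition of a precomposed Hangul
   syllable into out[] and returns the number of jamo written (2 or 3),
   or 0 if uc is not a precomposed Hangul syllable.  */
static int
hangul_full_decomposition (uint32_t uc, uint32_t out[3])
{
  assert (out != NULL);

  if (uc < S_BASE || uc >= S_BASE + S_COUNT)
    return 0;

  uint32_t s = uc - S_BASE;
  uint32_t t = s % T_COUNT;             /* jongseong index, 0 = none */
  uint32_t v = (s / T_COUNT) % V_COUNT; /* jungseong index */
  uint32_t l = (s / T_COUNT) / V_COUNT; /* choseong index */

  out[0] = L_BASE + l;
  out[1] = V_BASE + v;
  if (t == 0)
    return 2;

  out[2] = T_BASE + t;
  return 3;
}
```

Calling hangul_full_decomposition (0xC00D, out) for ‘쀍’ returns 3 and fills out with U+1108, U+1170, U+11B0, while the pairwise code in the excerpt stops at U+C004 plus U+11B0.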
  1. Syllables composed of two elements: choseong (초성, initial consonant) + jungseong (중성, vowel)

Example: 가 (ga)

Korean syllable ‘가’

Example: 쀍 (bbwelg)

Selected Hangul syllable ‘쀍’: canonical decomposition bug. The expected result is ‘ㅃ’, ‘ㅞ’, ‘ㄺ’.
I am working on fixing the canonical decomposition bug for Korean syllables.
The Korean canonical decomposition bug in GNOME Characters is now fixed.
  1. Hangul syllables composed of two elements: choseong (초성, 初聲, initial consonant) + jungseong (중성, 中聲, vowel)

Example: 가 (ga)

Decomposition of Hangul syllable
Unicode code point: U+AC00
Hangul (한글): ‘가’
jamo (자모/字母): ㄱ plus ㅏ
choseong (초성/初聲): ᄀ (code point: U+1100)
jungseong (중성/中聲): ᅡ (code point: U+1161)

Selected Hangul syllable ‘가’ (U+AC00)
Current result
Canonical decomposition:
ᄀ U+1100 HANGUL CHOSEONG KIYEOK
ᅡ U+1161 HANGUL JUNGSEONG A

Expected result
Canonical decomposition:
ᄀ U+1100 HANGUL CHOSEONG KIYEOK
ᅡ U+1161 HANGUL JUNGSEONG A

Hangul Choseong: ᄀ
Hangul Jungseong: ᅡ

Example: 쀍 (bbwelg)

Expected canonical decomposition of the selected Hangul syllable ‘쀍’.
Decomposition of Hangul syllable
Unicode code point: U+C00D
Hangul (한글): ‘쀍’
jamo (자모/字母): ‘ᄈ’ plus ‘ᅰ’ plus ‘ᆰ’
choseong (초성/初聲): ᄈ (code point: U+1108)
jungseong (중성/中聲): ᅰ (code point: U+1170)
jongseong (종성/終聲): ᆰ (code point: U+11B0)


Selected Hangul syllable ‘쀍’ (U+C00D)
Current result
Canonical decomposition:
쀄 U+C004 HANGUL SYLLABLE BBWE (an intermediate step)
ᆰ U+11B0 HANGUL JONGSEONG RIEUL-KIYEOK

Expected result
Canonical decomposition (full):
ᄈ U+1108 HANGUL CHOSEONG SSANGPIEUP
ᅰ U+1170 HANGUL JUNGSEONG WE
ᆰ U+11B0 HANGUL JONGSEONG RIEUL-KIYEOK

Hangul Choseong: ᄈ
Hangul Jungseong: ᅰ
Hangul Jongseong: ᆰ

Example: 각

Decomposition of Hangul syllable
Unicode code point: U+AC01
Hangul (한글): ‘각’
jamo (자모/字母): ‘ᄀ’ plus ‘ᅡ’ plus ‘ᆨ’
choseong (초성/初聲): ᄀ (code point: U+1100)
jungseong (중성/中聲): ᅡ (code point: U+1161)
jongseong (종성/終聲): ᆨ (code point: U+11A8)


Selected Hangul syllable ‘각’ (U+AC01)
Current result
Canonical decomposition:
가 U+AC00 HANGUL SYLLABLE GA (an intermediate step)
ᆨ U+11A8 HANGUL JONGSEONG KIYEOK

Expected result
Canonical decomposition (full):
ᄀ U+1100 HANGUL CHOSEONG KIYEOK
ᅡ U+1161 HANGUL JUNGSEONG A
ᆨ U+11A8 HANGUL JONGSEONG KIYEOK

Hangul Choseong: ᄀ
Hangul Jungseong: ᅡ
Hangul Jongseong: ᆨ
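
The code points in the examples above can be cross-checked by running the arithmetic in the other direction: composing conjoining jamo back into a syllable with 0xAC00 + (l * 21 + v) * 28 + t, the same formula that appears in the comment of the libunistring excerpt. The sketch below is only an illustration; hangul_compose_jamo is a hypothetical name, and passing 0 for "no jongseong" is my own convention.

```c
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration): compose conjoining
   jamo code points into a precomposed Hangul syllable.
   Pass jong = 0 when the syllable has no trailing consonant.
   Returns 0 if any argument is outside the modern conjoining jamo ranges,
   which also rejects compatibility jamo such as U+3131 (ㄱ).  */
static uint32_t
hangul_compose_jamo (uint32_t cho, uint32_t jung, uint32_t jong)
{
  if (cho < 0x1100 || cho > 0x1112)    /* 19 leading consonants */
    return 0;
  if (jung < 0x1161 || jung > 0x1175)  /* 21 vowels */
    return 0;
  if (jong != 0 && (jong < 0x11A8 || jong > 0x11C2)) /* 27 trailing consonants */
    return 0;

  uint32_t l = cho - 0x1100;
  uint32_t v = jung - 0x1161;
  uint32_t t = (jong != 0) ? jong - 0x11A7 : 0;

  return 0xAC00 + (l * 21 + v) * 28 + t;
}
```

With the values listed above, hangul_compose_jamo (0x1108, 0x1170, 0x11B0) gives 0xC00D (쀍), and dropping the jongseong gives 0xC004 (쀄), the intermediate syllable shown in the buggy output.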
libgc: Perform full canonical decomposition for Hangul syllables
Previously, the code finding related characters only took into account
of composed characters built from a base character and combining
characters (such as Latin, Hiragana, and Katakana). However, Hangul
syllables are composed of two or three Hangul jamo characters, all of
which should be considered as a base character. This patch handles
that case properly.
For the implementation, uc_canonical_decomposition() is not capable of
decomposing Hangul syllables. Instead of the function, this patch
uses u32_normalize() with UNINORM_NFD, as suggested by Bruno Haible in:
https://lists.gnu.org/archive/html/bug-libunistring/2017-11/msg00002.html
https://bugzilla.gnome.org/show_bug.cgi?id=790391
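
To show why normalizing to NFD yields all three jamo while a single pairwise step does not: full canonical decomposition applies the canonical mapping repeatedly until nothing decomposes further. The sketch below imitates that for Hangul only, without calling libunistring. It is not the code from the patch, and pairwise_step and decompose_until_stable are hypothetical names; pairwise_step mirrors the pairwise behaviour of the excerpt quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pairwise decomposition step for a precomposed Hangul syllable.
   Returns 2 and fills pair[], or returns 0 if uc is not such a syllable.  */
static int
pairwise_step (uint32_t uc, uint32_t pair[2])
{
  if (uc < 0xAC00 || uc >= 0xD7A4)
    return 0;

  uint32_t s = uc - 0xAC00;
  uint32_t t = s % 28;

  if (t == 0)
    {
      /* LV syllable: choseong + jungseong.  */
      pair[0] = 0x1100 + (s / 28) / 21;
      pair[1] = 0x1161 + (s / 28) % 21;
    }
  else
    {
      /* LVT syllable: LV syllable + jongseong.  */
      pair[0] = uc - t;
      pair[1] = 0x11A7 + t;
    }
  return 2;
}

/* Repeat the pairwise step on the leading element until it no longer
   decomposes.  Writes 1 to 3 code points to out[] and returns the count.  */
static size_t
decompose_until_stable (uint32_t uc, uint32_t out[3])
{
  uint32_t pair[2];
  uint32_t tail[2];
  size_t ntail = 0;

  assert (out != NULL);

  while (pairwise_step (uc, pair) == 2)
    {
      assert (ntail < 2);      /* a syllable needs at most two steps */
      tail[ntail++] = pair[1]; /* the trailing part is already final */
      uc = pair[0];            /* the leading part may decompose again */
    }

  size_t n = 0;
  out[n++] = uc;
  while (ntail > 0)
    out[n++] = tail[--ntail];  /* restore left-to-right order */
  return n;
}
```

For ‘쀍’ (U+C00D) the first step gives U+C004 and U+11B0, and the second step splits U+C004 into U+1108 and U+1170, so the result is the three jamo listed under "Expected result". In real code, u32_normalize() with UNINORM_NFD handles this for all scripts, so a helper like this should not be needed.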

DaeHyun Sung(성대현,成大鉉,ソン・デヒョン)

LibreOffice Korean Team, GNU, KDE contributor, GNOME Foundation member. My native language is Korean (한국어). My hobby is learning languages (English, 中國語 (繁體中文, 简体中文), 日本語).