Posts Tagged ‘Kanji’

Configuring the correct Japanese fonts for Windows GTK applications

Sunday, April 13th, 2008 by Agro Rachmatullah

In a previous post, I discussed how win32 GTK/GTK+ programs are smart enough to choose a Japanese translation by default if your system’s language is set to Japanese. However, there’s one embarrassing flaw that I glossed over: they do not choose the fonts correctly.

Related to this problem is how the Unicode standard handles Japanese and Chinese characters. You see, the characters known as kanji, used in Japan, historically come from China. In fact, kanji literally means Han characters. But that happened more than a thousand years ago. Time always brings change, and now many characters are drawn differently in each country.

In the image below, you can see how some Japanese characters (black) differ from their Chinese counterparts (blue):

Difference between Japanese and Chinese kanji glyphs

You can see that even the stroke count can differ!

Unicode, in its effort called Han Unification, insisted that Japanese, traditional Chinese, and Korean characters that were historically the same must share a single code point. So there can’t be one Unicode character for the Japanese version of ‘close’ and another for the Chinese version. Any remaining differences must then be handled by fonts. So yes, in the screenshot above, the Japanese and Chinese characters are actually the same Unicode characters, but rendered in OpenOffice.org with different fonts. And yes, that means you can’t display both Chinese and Japanese text correctly in a plain text document (which can only use one font for the whole file), unless you happen to use only the characters that are country invariant.
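To make that concrete, here is a minimal Python sketch. The character 近 is just my pick for illustration; any unified ideograph would do. It shows that a plain string carries only the code point, with nothing to say which regional glyph to draw.

```python
import unicodedata

# 近 is used here only as an illustrative unified ideograph.
ch = "近"

# A plain-text string stores only the code point; nothing in it says
# whether the glyph should be drawn the Japanese or the Chinese way.
codepoint = f"U+{ord(ch):04X}"
name = unicodedata.name(ch)

print(codepoint, name)
```

Whichever font ends up drawing that code point decides which variant you see.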

Now, back to GTK. GTK programs use a configuration file called pango.aliases to select their fonts. Here’s a sample line:

sans = "arial,browallia new,mingliu,simhei,gulimche,ms gothic"

That line means that if a character must be drawn on screen as a sans-serif character (“sans”), GTK first tries to display it using “arial”, the first font in the list. If the character isn’t in the system’s Arial font, it tries “browallia new”. If that fails, it tries the next one, “mingliu”. And so on.
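The fallback order just described can be sketched in a few lines of Python. This is a simplified model, not Pango’s actual implementation: the coverage table is made up for illustration, and the function names are my own.

```python
# Hypothetical coverage data: which characters each font can draw.
# Real fonts cover thousands of characters; these sets are made up.
FONT_COVERAGE = {
    "arial": set("abcABC"),
    "mingliu": set("近門"),
    "ms gothic": set("近門あ"),
}


def parse_alias_line(line):
    """Split a line like: sans = "arial,mingliu" into (alias, [fonts])."""
    alias, _, fonts = line.partition("=")
    names = fonts.strip().strip('"').split(",")
    return alias.strip(), [n.strip() for n in names]


def pick_font(char, fonts, coverage=FONT_COVERAGE):
    """Return the first font in the list that covers char, else None."""
    for font in fonts:
        if char in coverage.get(font, set()):
            return font
    return None


alias, fonts = parse_alias_line(
    'sans = "arial,browallia new,mingliu,simhei,gulimche,ms gothic"'
)
print(pick_font("a", fonts))   # arial
print(pick_font("近", fonts))  # mingliu, because it is listed before ms gothic
print(pick_font("あ", fonts))  # ms gothic
```

Note how a character covered by both a Chinese and a Japanese font always goes to whichever font is listed first.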

The problem arises when a static list like that meets the intricacies of Unicode’s Han unification. Probably for no particular reason, the configuration file of Windows GTK programs puts Chinese fonts (mingliu etc.) before Japanese fonts (ms gothic etc.). So there you have it, a user interface with a Japanese translation displayed using “Chinese” characters:

Inkscape using Japanese translation but Chinese characters!

If you’re like me, then that extra dot stroke on “chikai” will really get on your nerves.

The solution is a simple exercise in find and replace. Find all files named pango.aliases on your hard drive; they will most probably be inside your Program Files folder. Each installed GTK program can have its own, but a program can also use the one from a “shared” GTK installation. If you already know where your GTK programs are, the file is located in the etc\pango subfolder. Once found, replace the content with my hand-crafted version:

courier = "courier new,MS Mincho"

tahoma = "tahoma,MS PGothic,browallia new,mingliu,simhei,gulimche,ms gothic,kartika,latha,mangal"
sans = "arial,MS PGothic,browallia new,mingliu,simhei,gulimche,ms gothic,kartika,latha,mangal"
serif = "times new roman,MS PMincho,angsana new,mingliu,simsun,gulimche,ms gothic,kartika,latha,mangal"
mono = "courier new,MS Mincho,courier monothai,mingliu,simsun,gulimche,ms gothic,kartika,latha,mangal"
monospace = "courier new,MS Mincho,courier monothai,mingliu,simsun,gulimche,ms gothic,kartika,latha,mangal"

Now your configuration will prefer Japanese fonts rather than Chinese ones. Talk about font discrimination! Here’s the result:

Inkscape using Japanese translation and the correct fonts

Ah, Japanese translation in Japanese fonts. No more wrong fonts. That feels better.
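If you would rather not hunt for the pango.aliases files by hand, here is a small Python sketch of the search step. The function name and the starting folder are assumptions for illustration; point it at wherever your GTK programs actually live.

```python
from pathlib import Path


def find_pango_aliases(root):
    """Return every file named pango.aliases below root, sorted."""
    return sorted(p for p in Path(root).rglob("pango.aliases") if p.is_file())


if __name__ == "__main__":
    # Assumed location; adjust to wherever your GTK programs are installed.
    for path in find_pango_aliases(r"C:\Program Files"):
        print(path)
```

It only lists the files. Back each one up before replacing its content.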

2,500 kanji and counting :)

Wednesday, April 9th, 2008 by Agro Rachmatullah

kanji scroll

Wow… I haven’t done my dump for ages. The last one was around 3 months ago!

But rest assured, since graduation I have been doing lots of real reading. I still haven’t finished the monstrous Wikipedia WW2 article. I also managed to read some stories from bookstudio, mostly ren’ai stuff and some adventures. Plus some other random stuff like newspapers and some physical books I have.

I also still haven’t finished these two textbooks: Routledge and Colloquial Japanese. I’m also still listening to japanesepod101 daily to improve my listening. I’ve finished all survival and newbie series, and now halfway through beginner season 1 :).

But now that I’m officially working as a programmer whose job doesn’t have anything to do with Japanese, my time to study Japanese has decreased significantly. Let’s just hope it doesn’t die out. Please everyone, pray for me.

Anyway, for this dump I’ve gathered 300 new kanji and 491 new words. Now the total is 2,609 kanji and 10,888 words. Yes, I’ve passed the 2,500 milestone!

Impressive? Well, I think I still need around 500 more to achieve the level of literacy I want. I still occasionally find new ones, see? The key here is consistency. If I keep reading and learning new kanji, no matter how slowly, it will pile up. In my case, the average learning rate between my last dump and this one is only 3.9 kanji per day. That’s not too demanding, right? Couple it with Mnemosyne so you won’t forget kanji you’ve learned.
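For the curious, the rate is easy to re-derive. The sketch below assumes the two dumps happened on the dates of the blog posts themselves (January 21st and April 9th, 2008). That is only an approximation, so it lands near the figure quoted above rather than exactly on it.

```python
from datetime import date

# Post dates of the previous dump and this one
# (assumed to match the actual dump dates, which may differ by a few days).
previous_dump = date(2008, 1, 21)
this_dump = date(2008, 4, 9)
new_kanji = 300

days = (this_dump - previous_dump).days
rate = new_kanji / days
print(f"{days} days, {rate:.1f} kanji per day")
```

With these dates it comes to 79 days and roughly 3.8 kanji per day.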

One tip today for those also learning kanji. Don’t get too caught up in artificial lists (Jouyou, JLPT)! Go find some real reading material and learn the kanji you find there! By experiencing how the kanji are actually used, they will stick in your mind more easily. You might be interested to know that of the 2,600+ kanji I’ve learned, I still haven’t encountered three grade 5 and one grade 6 Jouyou kanji. And of course lots of grade 8 kanji. That’s despite the fact that I found lots of “gradeless” kanji! Shows you how useful those lists are…

The only time you should exhaust an artificial list is when you’re about to take a test with predetermined kanji (e.g., JLPT). Other than that, you can forget about them.

Anyway, here are all the new kanji:

耕俵賃卵憾恨剤剖泥憤窮喪藻准酬酵紡祉劣鯨鋳濫庶忌衰痢弦栓酢沼繕懲閥礁堕諭粘唆朴膜畔虜禍疾較塊稚炊隅彫憂耗爵勲賊糧悦吟穫漸慨隻累霧鎮囚棺愁賄硫鉢窯剛墜狩洪脂昆喬那捷爾鷹彬冶雛匡宏瑚胡鮎蒔晃瑛綺艶杏玲鯛碧倭蓮鶴倖毅茜祐眉袈裟鳳瞭杜耽盧醒嬌倅甥賁躬壺澡嵒癌國對汪峯鍼雚灌旡漑賑榮梁刅桼膝鵜裾氾諜隂榴憐楚諺孚孵芻殲註楯迂坦匛柩匙厷鮪躓珊瞑冥埶藝鼎蹲夋戚尗脆冨撒鞍晦顛顚喩囁妓叱驢哨壽躊躇溌剌悖遜狐與嶼蜂卜頷叕啜雫掬鬯鬱蛤烏曖昧雜囃菱霰俯淵蚤掻怯焚眈窺隙涛偕炸絨毯戮翏耒廻蓋盍牡鴨禽浙咳厠珀奄串魏酎隺堺楔犀蛾皺栖醤籤韱怨杓這睫顰姑舅艮饂飩噤厓弄鯖煎凭吃麺拉軋蕩謨匆枷戎賂櫛凰羔匋掏仇妬耄彔碌抓膠撰洛苔廿燕臙汝鎧

And the words:

(more…)

ni-sen ijou

Monday, January 21st, 2008 by Agro Rachmatullah

Excuse my laziness in blogging… You see, I’m now in this remote place called Sokaraja and circumstances force me to go to the town of Purwokerto to surf the net. That’s quite far by my standards and so… Well, enough excuses.

This will be just another monotonous dump, but believe me the study isn’t as boring as this post looks. I’ve dumped 92 new kanji and 131 new words, for a total of 2,309 kanji and 10,354 words. Believe me, even with this number of kanji I’m still humbled by the number of new characters I find every day. Just keep moving on and know no surrender.

To spice things up a bit, I’ll tell you my current Japanese diet. I’m still trying to finish that WW2 article on Wikipedia. It goes at roughly two paragraphs a day, so hell will probably freeze over first. I’m also playing freeciv, an open source game which has a Japanese translation! Not so much playing as exploring all the text inside and trying to read it. If you’re interested in trying it but have problems, just mail me (in my case I couldn’t just run it and get a usable learning environment, but I’m not writing about that now). As explained in another post, I’m also still going through “Japanese: A Comprehensive Grammar”. All that, plus randomly leafing through Japanese books I have or have borrowed.

Ah, I almost forgot… I also now regularly listen to podcasts downloaded from japanesepod101.com. Be sure to visit that site!

So here are the kanji:

肥醸陶婆浸艇殻疫謀喝騰迅肢燥紳捜侯赴薫該貞偵晶拷謹刃彰銃痴斎附帥稼簿弊絞宥邑昌旭禎嘉慧栗堆晒曾傀儡爺塹壕揆簒恫剃蟹宋楷艸已筈馳飴瘡汲釧喉瞿矍攫侭謂唖尖曰籠夭訃凛繚峙骸崖袖嘗袴溺牽溥奢綻

And the words:

(more…)

Dump: 2200 kanji and counting

Saturday, January 5th, 2008 by Agro Rachmatullah

A regular run-of-the-mill dump post. So yeah, I still read Japanese materials routinely to find new words and especially kanji, and right now my main sources are the WW2 article on Wikipedia, which is still a long way from finished and is starting to get extremely boring and tiresome (勃発、勃発、侵攻、侵攻), an encyclopedic Japanese grammar book, “Japanese: A Comprehensive Grammar” from “Routledge Grammars”, which I like very much because it contains translations for every example, each of which is written in genuine Japanese characters, and some other reading sources like the various Japanese magazines and books I have at my disposal, which I open randomly and on a whim (see screenshot above for an example). Oh, and if you think the previous sentence is too long, blame me for reading too much written Japanese, in which sentences are unreasonably long, apparently just for the author’s pleasure in tormenting foreign readers who are not accustomed to such lengthy parsing with their untrained brain, which is actually a very capable biological computer.

For this dump, there are 100 new kanji and 178 new words. Now my kanji count is 2,217 and my word count is 10,223. It might be interesting to know that among those 2,200 or so kanji, I still haven’t encountered six grade 5 kanji and three grade 6 kanji! So there you have it for the commonness of Jouyou kanji.

A note about my method of memorizing these words and kanji. When I encounter a new word, I look it up in an electronic dictionary and then put it in my spreadsheet file of Japanese words (and kanji). I just collect as many there as I find. Then, separately, I put the words from there into Mnemosyne, first come, first served. These two are not synchronized, so I don’t have to put every word I find into Mnemosyne right away. In fact, I have almost 3,000 words on my word list waiting to be put into Mnemosyne.
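The workflow above is basically a queue: everything goes onto the word list, and cards are made from the oldest entries whenever there is time. Here is a minimal Python sketch of that idea. The class and method names are made up for illustration, and the tab-separated output is only my assumption about a convenient import format, not a description of my actual spreadsheet.

```python
from collections import deque


class WordBacklog:
    """Word list that feeds flashcards in first-come, first-served order."""

    def __init__(self):
        self.word_list = []     # everything collected (the "spreadsheet")
        self.pending = deque()  # collected but not yet turned into cards

    def collect(self, word, meaning):
        """Record a newly found word; it also joins the back of the queue."""
        entry = (word, meaning)
        self.word_list.append(entry)
        self.pending.append(entry)

    def export_next(self, count):
        """Pop up to count of the oldest pending entries as tab-separated lines."""
        size = min(count, len(self.pending))
        batch = [self.pending.popleft() for _ in range(size)]
        return [f"{word}\t{meaning}" for word, meaning in batch]


backlog = WordBacklog()
backlog.collect("勃発", "outbreak")
backlog.collect("侵攻", "invasion")
backlog.collect("穀物", "grain")

cards = backlog.export_next(2)
print(cards)
print(len(backlog.pending), "word(s) still waiting")
```

Because the two containers are separate, collecting never has to wait for card-making to catch up.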

Anyway, here are the kanji:

穀律尺酔怠賓克債墨戒併隷循誇呈排斥薪漂錯枠弧賠窒掌覇津襟某斉撲罰封搭溝啓妄祥洲伽麿蘭玖伍綜渚晋叡哉眸鯉緋鳩冴卆蒋并餅孛勃葛盡儘夸牒狼厭猒區謳賭阡萬肆捌陌戊庚癸苺膣腟氐咸股踪幟摺柿匪榧刳肛菐斬荅亢杭釘惧

And the words:

(more…)