Archive for the ‘Thoughts’ Category

Multilanguage support in Windows programs

Sunday, April 6th, 2008 by Agro Rachmatullah

The more our information society progresses, the more we realize the value of having a program available in different languages. The effort towards internationalization and localization is an amusing trend for me, because as a child I thought that the existence of multiple languages was a bother, and envisioned that at some point human civilization would settle on one ultimate lingua-gaea.

I couldn’t have been more wrong. Now I realize that each language is beautiful and unique in its own right, and that every effort must be spent to conserve them, or at least to document each one sufficiently before its last speaker dies.

Anyway, multilingual programs… A user might need a certain language simply because she can’t understand any other. In this case, the availability of a program in that language is crucial to narrowing the technological gap. A user might also want a certain language simply because she enjoys that particular language the most. Another use is studying foreign languages: I try to set every possible program to Japanese in order to immerse myself in the language.

With that in mind, we will investigate how Windows programs currently handle the user’s desire to choose her preferred language. (Some of the discussion might apply to other OSes such as GNU/Linux and OS X.)

One installer to rule them all

The most convenient case is when a program installer includes all available translations. Examples are Inkscape, Pidgin, Battle for Wesnoth, Paint.NET, and iTunes. For example, when I tried to download the newest Inkscape, there’s only one installer for Windows: Inkscape-0.46.win32.exe. It has English, Japanese, and a myriad of other languages included. Even Indonesian!

The next question is, what language will such programs use by default? Some programs, most notably GTK programs, are smart enough to detect the operating system’s language settings. Where is it set?

In Windows XP, the user can set what language she prefers from “Control Panel” → “Regional and Language Options” → “Advanced” → “Language for non-Unicode programs”, like so:

Note that this setting actually exists to let non-Unicode (i.e., ancient) programs display their text correctly instead of as mojibake. However, modern Unicode-aware programs use the value set here to decide what language to present to the user.

Mine is set to Japanese, so Inkscape appears like this:

Japanese Inkscape

Other programs ask for what language the user would like to use, perhaps at install time or when the program is run for the first time. An example is Paint.NET:

Paint.NET Setup

The rest just set the default language to English or whatever else the developer prefers. If the user desires, she can change the language through some means because the language data are already installed anyway. An example is an old version of OpenTTD which defaults to English:

Open TTD defaults to English

If the default language doesn’t suit you, how do you change it? Some programs offer the convenience of setting it within the program itself. An example is iTunes:

Language selection in iTunes

And The Battle for Wesnoth:

Language selection in The Battle for Wesnoth

GTK programs do not visibly offer any such option, because they assume that in most cases you will want the language you set in the operating system. However, they actually check for the LANG environment variable (which probably takes ISO 639-1 codes). You can use it to quickly try out a language. For example, go to the command prompt and type:

set LANG=th

And from the same command prompt, run the program, say inkscape.exe:

Inkscape in Thai

Ah, I feel nostalgic :).
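The same experiment can be scripted. Below is a minimal Python sketch that overrides LANG for a single child process and leaves the rest of the system untouched. The helper names `env_with_language` and `launch_in_language` are made up for illustration, and it assumes the program (say inkscape.exe) can be found on the PATH.

```python
import os
import subprocess


def env_with_language(lang_code, base_env=None):
    """Return a copy of the environment with LANG set to lang_code."""
    env = dict(os.environ if base_env is None else base_env)
    env["LANG"] = lang_code
    return env


def launch_in_language(program, lang_code):
    """Start a program with LANG overridden for that process only."""
    return subprocess.Popen([program], env=env_with_language(lang_code))
```

Calling `launch_in_language("inkscape.exe", "th")` would then be the scripted equivalent of the two commands above.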

Exceptionally Easy to Extend (E3)

Some programs come with only one language, and to choose another we must download the required language files. Though quite inconvenient, it is not that bad because language files shouldn’t be that large. An example is µTorrent:

Language selection in uTorrent. Oops, you must download the language pack first...

Predestination, believe it or not

Other programs offer a separate installer for each language. If you go to their web sites, you will find one installer for the English version, another for the French version, and so on. The concept is very simple: what you download is what you get. Needless to say, it’s a pain in the arse for the curious or the language learners out there. Two glaring examples are OpenOffice.org and Mozilla Firefox. Here’s a Japanese version of Firefox that I recently installed:

Firefox in Japanese

It’s of course better than no multilingual support at all, but still it’s a waste of bandwidth to download another version and troublesome to actually install it (uninstall the other-language version first).

My take

A modern program should at all costs include all available translations in its installer. It should then detect and display the user’s preferred language by default.

As for the language-changing facility, I can understand GTK’s decision to hide it from the program’s preferences. They are probably following GNOME’s guideline that every GTK app should look the same (think about themes), and if the user wants to change anything, she can apply a system-wide change. To put things into perspective, it would be bizarre for every GTK app to have its own theme settings, right? At least you can set the language per program using an environment variable.

For µTorrent that separates its language pack, I think it’s an acceptable special case because µTorrent aims to be a small no-frills downloadable program. Heck, even the language pack is larger than the core English program itself! However, I think for most other programs the size of the language files shouldn’t matter that much.

So big cheers for GTK apps and other programs that include all translations by default. A big, big boo for OpenOffice.org, Firefox, and the gangs that require a different download for each language.

Laborious questions in a test

Tuesday, March 13th, 2007 by Agro Rachmatullah

Why must instructors give a very “long” problem which doesn’t test understanding any better than a “shorter” problem?

Here’s an example problem to test the understanding of the shift cipher:

Encrypt the plaintext “example” using the shift cipher with key B.

That problem should suffice. However here’s what some instructors like to give:

Encrypt the plaintext “iliketoseemystudentssufferhahahaiamevil” using the shift cipher with key P.

The second problem isn’t intellectually harder, it’s just more laborious!
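To underline the point, here is a minimal Python sketch of the shift cipher. It assumes the common convention that key A means a shift of 0 and key B a shift of 1; the function name `shift_encrypt` is made up for this example. The very same code handles both problems, which is exactly why the longer one adds labour but no insight.

```python
import string


def shift_encrypt(plaintext, key):
    """Encrypt lowercase letters by shifting them; key 'A' = 0, 'B' = 1, ..."""
    shift = ord(key.upper()) - ord("A")
    result = []
    for ch in plaintext.lower():
        if ch in string.ascii_lowercase:
            # Rotate within the 26-letter alphabet, wrapping around after 'z'.
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            result.append(ch)  # leave anything that is not a letter alone
    return "".join(result)


print(shift_encrypt("example", "B"))  # fybnqmf
```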

I can foresee a similar agony in a microbiology test:

The nucleotide sequence of one DNA strand of a DNA double helix is:
-GGAGATCGCATGCATGCACAGCTGACGATGCA-
(dunno whether it is realistic, I just typed the ATGCs randomly)
What is the sequence of the complementary strand?

Isn’t a strand of -ATGC- enough?
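The rule being tested is a fixed base-by-base substitution (A pairs with T, G pairs with C), so its length has nothing to do with its difficulty. A small Python sketch, with the name `complementary_strand` made up for illustration; it pairs the bases in the order given and does not reverse the strand:

```python
# Watson-Crick pairing: A<->T and G<->C.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def complementary_strand(strand):
    """Return the base-by-base complement, in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)


print(complementary_strand("-ATGC-"))  # -TACG-
```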

PS: Oh, and about that second example: it’s actually quite mild, considering that my instructor gave an even LONGER text to encrypt… Unbelievable…

Kanji as a form of data compression

Sunday, September 24th, 2006 by Agro Rachmatullah

Using kanji, many ideas can be expressed using just a few characters. For example, here’s how we write the 12 months in various ways:

Kanji Hiragana Roomaji English Indonesian
一月 いちがつ ichigatsu January Januari
二月 にがつ nigatsu February Februari
三月 さんがつ sangatsu March Maret
四月 しがつ shigatsu April April
五月 ごがつ gogatsu May Mei
六月 ろくがつ rokugatsu June Juni
七月 しちがつ shichigatsu July Juli
八月 はちがつ hachigatsu August Agustus
九月 くがつ kugatsu September September
十月 じゅうがつ juugatsu October Oktober
十一月 じゅういちがつ juuichigatsu November November
十二月 じゅうにがつ juunigatsu December Desember
Average character count 2.17 4.17 8.83 6.17 6.25

Note that the average character count drops from roomaji to hiragana. That is expected, since each hiragana symbol represents a mora, which for this discussion can be regarded as a syllable. In roomaji, most syllables must be written using two or more characters. Therefore hiragana can be thought of as compressing roomaji. As a script, hiragana is higher level than roomaji.

The average character count drops again when we go from hiragana to kanji. Kanji is even higher level than hiragana: each kanji expresses a certain idea. Because most kanji expand to more than one character when written in hiragana, kanji can be thought of as compressing hiragana.
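The averages in the table can be reproduced with a few lines of Python. This is only a sketch: "character" here means one Unicode code point, so a small kana such as ゅ counts as a character of its own, and the function name `average_length` is made up for the example.

```python
MONTHS = {
    "Kanji": ["一月", "二月", "三月", "四月", "五月", "六月",
              "七月", "八月", "九月", "十月", "十一月", "十二月"],
    "Hiragana": ["いちがつ", "にがつ", "さんがつ", "しがつ", "ごがつ", "ろくがつ",
                 "しちがつ", "はちがつ", "くがつ", "じゅうがつ",
                 "じゅういちがつ", "じゅうにがつ"],
    "Roomaji": ["ichigatsu", "nigatsu", "sangatsu", "shigatsu", "gogatsu",
                "rokugatsu", "shichigatsu", "hachigatsu", "kugatsu",
                "juugatsu", "juuichigatsu", "juunigatsu"],
    "English": ["January", "February", "March", "April", "May", "June", "July",
                "August", "September", "October", "November", "December"],
    "Indonesian": ["Januari", "Februari", "Maret", "April", "Mei", "Juni",
                   "Juli", "Agustus", "September", "Oktober", "November",
                   "Desember"],
}


def average_length(words):
    """Average number of characters (code points) per word."""
    return sum(len(word) for word in words) / len(words)


for script, words in MONTHS.items():
    print(f"{script}: {average_length(words):.2f}")
```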

I’ve heard people say, “kanji is sooo ancient. They should abolish it and replace it with something simpler and modern like the Latin alphabet.” It eventually boils down to an unwillingness to memorize lots of high-level symbols.

However, kanji is a form of pictogram. What they don’t realize is that they also use some pictograms themselves. Ever seen 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0? Great, let’s abolish them. Then we can all have fun writing “sixty five thousand five hundred thirty six” or “enam puluh lima ribu lima ratus tiga puluh enam”.

Anyway, it is natural to ask, “can we define even higher-level elements?” I don’t see that happening in natural language, but there is one language in which simpler concepts (encoded in symbols) are used to successively build more complex ones: mathematics.

In modern mathematics, everything starts with set theory. There we see symbols like “{”, “}”, “,”, and “⊆”. From sets, we can define things such as the natural numbers, and naturally (no pun intended) new symbols like “1” and “0” appear.

Going even higher level, there is calculus, in which symbols like “∫” appear. Calculus is so high level that, using vector calculus, all electromagnetic phenomena can be written in only four equations (the so-called “Maxwell’s Equations”).

I think it is astonishing that, using the even higher-level symbols of Clifford algebra, Maxwell’s Equations can be written as a single equation.
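For reference, here is what that compression looks like on paper. The first block is the standard differential form of the four equations in SI units. The single equation below it is the form commonly given in the geometric (Clifford) algebra of spacetime, written in natural units; the exact constants and the definition of F vary between textbooks, so treat it as a sketch of the idea rather than a definitive statement.

```latex
% Vector calculus: four equations
\begin{align}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\varepsilon_0} \\
\nabla \cdot \mathbf{B} &= 0 \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} \\
\nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
\end{align}

% Geometric algebra: one equation, where F combines the electric and
% magnetic fields and J combines the charge and current densities
\begin{equation}
\nabla F = J
\end{equation}
```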

Character variants in Unicode

Tuesday, September 19th, 2006 by Agro Rachmatullah

In Unicode, there are several code points for fullwidth characters. Here’s a comparison between the normal ASCII characters and their fullwidth counterparts (the normal one is written first):

AＡBＢCＣDＤEＥFＦ

Superscript characters like ² are also display variants of normal characters like 2.

Another amusing thing is the existence of language-specific characters. Examples are the Greek capital letter eta (Η, U+0397) and the Cyrillic capital letter en (Н, U+041D). On my machine, they look exactly like the Latin capital letter H (which is ASCII 72 or U+0048).
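Python's standard `unicodedata` module makes the distinction easy to see. In this small sketch (the helper names `describe` and `compatibility_form` are made up), NFKC compatibility normalization folds display variants such as fullwidth letters and superscript digits back to their plain forms, while the look-alike Greek and Cyrillic letters stay separate characters:

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def compatibility_form(text):
    """Fold display variants (fullwidth, superscript, ...) to plain characters."""
    return unicodedata.normalize("NFKC", text)


for ch in "H\u0397\u041D\uFF21\u00B2":
    print(describe(ch), "->", compatibility_form(ch))
```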

I actually have mixed feelings about including display variants in a character set. In light of HTML and various text-formatting tools (TeX, office suites), display variants can be regarded as a waste of code points. For example, in HTML superscripts can be achieved using the <sup> tag, and specific fonts (for example fullwidth) can be chosen using CSS (or the old-style <font> tag). As for language variants, again HTML renders them unnecessary because there is the “lang” (or “xml:lang”) attribute.

However, variants have some merits. One use is of course in plain text files. For example, with the character “²” I can write “a² + b² = c²” nicely in a plain text file. The other benefit is space efficiency. For example, “²” is one character, while “<sup>2</sup>” takes twelve.

What I hate about language variants is that they conflict with one major theme in the Unicode work: CJK (Chinese Japanese Korean) character unification. In Unicode, there is no such thing as the Japanese 人, Chinese 人, and Korean 人. There is only one character for all three languages: 人. This is in spite of differences in how some of the characters are drawn! Thus, it is not possible to convey the difference in a plain text file.

For example, here is the CJK character for “now”, displayed differently (if your computer is set up correctly) because of the “lang” attribute: 今 (Japanese) vs. 今 (Chinese). Both are U+4ECA. On my computer it looks like this:

Japanese vs. Chinese 今

See the HTML source code for more info.

Recalling items in an ordered list etc

Saturday, August 19th, 2006 by Agro Rachmatullah

Recalling items in an ordered list - 5:22 PM 8/18/2006

Consider an ordered list like days (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday) and the first 10 natural numbers (1, 2, 3, 4, 5, 6, 7, 8, 9, 10). It is interesting that the ability to quickly enumerate the items forward doesn’t translate to the ability to do it backwards.

For example, I can say ‘a’ to ‘z’ very quickly, but I can’t say ‘z’ to ‘a’ quickly. Here are some lists and my ability to enumerate them:

  • Natural numbers from 1 to 10: forward and backwards
  • ‘a’ to ‘z’: forward
  • Musical notes (‘do’ to a higher ‘do’): forward and backwards
  • Days: forward
  • Months: forward

Speaking about numbers and letters, it’s weird that I can compare 2 numbers blazingly fast but my letter comparing speed is very low. For example, given two numbers like 3 and 8 I can quickly grasp that 3 is smaller. However, given 2 letters like ‘o’ and ‘g’ I need to ponder a while before I can decide that ‘g’ comes before ‘o’. Is this because the set of numeric symbols is smaller (10 compared to 26)? Will backward orientation (practicing enumerating ‘z’ to ‘a’) help?

Word dump: random - 4:57 PM 8/18/2006

29 random words, which bring my word count to 1372:

Kanji Kana English
しんさ judging
ちょくせつ direct
きんちょう tension
ゆうじょう friendship
ぜったい absolute
入力 にゅうりょく input
英数 えいすう English (ASCII) coding
きねんび holiday
すっぴん face with no make-up
めちゃくちゃ mess
あたり前 あたりまえ usual
当たり前 あたりまえ usual
当り前 あたりまえ usual
かっこいい “cool”
入会 にゅうかい admission
びっくり surprise
おばけ ghost
おくさま madam
たすける to rescue
ぼっちゃん son (of others)
へたくそ extreme clumsiness
しかし however, but
マジ serious (not capricious or flirtatious)
バランス balance
ぶらんこ swing
からす to exhaust
プチ small (fr: petit)
行って来ます いってきます I’m off
からす crow, raven

I’ve exhausted my random word stock, so it’s time to do some topical word hunting (colors, body parts, etc).

Without a digital English-English dictionary etc

Wednesday, August 2nd, 2006 by Agro Rachmatullah

Word Dump: 好きすぎて バカみたい - 8:34 AM 8/2/2006

The current word dump is 好きすぎて バカみたい (Suki Sugite Baka Mitai) by DEF.DIVA (a H!P unit consisting of Abe Natsumi, Goto Maki, Ishikawa Rika, and Matsuura Aya).

Kanji Kana English
みたい -like
ララバイ lullaby
あこがれ yearning
そうしそうあい mutual love
ころ time
まける to lose
ずいぶん extremely
まえ before
こおる to freeze
レンジ stove
かいとう thaw
もどる to turn back
やり直す やりなおす to start over
ねむる to sleep
いじょう more than
めいわく trouble
くちょう tone
さいしょ first
おくり seeing off
下らない くだらない worthless
かえる to go home
うなずく to nod
おわかれ farewell

Stats:

  • Previous “average new words/song”: 19.5
  • New words in this song: 23 (same as previous one)
  • New “average new words/song”: 20.67
  • Total words in word list: 1240
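The new average follows from the old one without re-adding every song. The sketch below assumes the previous average covered two songs; that count is not stated in the post, but it is the one consistent with the figures above (19.5 × 2 + 23 = 62, and 62 / 3 ≈ 20.67). The function name `updated_average` is made up for the example.

```python
def updated_average(previous_average, previous_count, new_value):
    """Fold one new value into a running average."""
    total = previous_average * previous_count + new_value
    return total / (previous_count + 1)


# Assumption: two songs were counted before this one.
print(round(updated_average(19.5, 2, 23), 2))  # 20.67
```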

Without a digital English-English dictionary - 8:37 PM 8/1/2006

When I reinstalled my Windows, I backed up Oxford Dictionary’s folder. However running the executable on the new Windows didn’t work because the program seemed to require a library which was unregistered. Therefore I’m now left without a digital dictionary.

PS: Previously I installed the Oxford dictionary (called “Oxford Advanced Genie”) from a CD that wasn’t mine. The content is of course superb (you can even hear the pronunciation of each word); however, it isn’t free and the user interface is terrible.

OpenOffice.org Writer’s thesaurus isn’t always helpful. For example, giving the thesaurus “gourmet” as an input brings up “epicure”, “gastronome”, “bon vivant”, “epicurean”, “foodie”, and “sensualist (general term)” as synonyms. “Sensualist” didn’t seem to fit the context (it was about food). “Foodie” is a term that is clearly related to food, but what does it mean? A person that likes to eat a lot? No idea. The other words are completely alien to me.

How about Wiktionary? The entry for “gourmet” is probably there but I can’t access it from home, obviously.

Somehow, I have the installer for linguist (probably copied off a friend). Since it can function as an English to Indonesian dictionary, I installed it. Searching for “gourmet” gives “ahli pencicip makanan”. Nice, but it seems to be a commercial program (which I somehow have illegally obtained) so I’m quite reluctant to use it.

So, currently my legal alternative is a good old printed dictionary (God bless the trees). I currently have the “OXFORD Advanced Learner’s DICTIONARY” (4th edition, 1989), which I borrowed from my uncle. “gourmet” gives “person who enjoys and is expert in the choice of fine food, wines, etc”. Nice description.

Another alternative is a “SAT I” book I own. It has a 3500-word list. There, “gourmet” means “connoisseur of food and drink” and “connoisseur” (an alien word) means “person competent to act as a judge of art”. You can’t count on every word being in this book, but when it’s there, you get a nice definition and an example sentence.

Because a paper dictionary is quite a bother to use, I’m going to search for a freely available English to English dictionary database. Something like EDICT, where you can download the database and use any client to view it (or make your own).

Btw, I encountered the word “gourmet” while learning Japanese. In “Hello! Project DVD MAGAZINE volume 7”, the sentence “ゴルメレポーター。。。” popped up on the screen:

gorume repootaa

Searching for “ゴルメ” (gorume) in EDICT gives “gourmet”, an English word unknown to me (“レポーター” (repootaa) obviously means “reporter”). This is not the first time something like this has happened. “curfew”, “stingy”, “sulk”, and “fickle” are some other English words I stumbled upon while learning Japanese. It’s interesting that learning Japanese reveals so many gaps in my English vocabulary.

Update: The solution is to use StarDict.

Bad handphone design - 10:41 PM 7/29/2006

On my phone (Samsung something) the disconnect button (the one with the red phone icon) is placed above the “3/def” button and below the “options” button. When writing an SMS, pressing that button will discard the message and bring you straight to the main screen. The message will be lost!

With that design, a user that accidentally presses the button will lose the message. Murphy’s law states that “when something bad can happen, it WILL happen”. It happened to me already around 2 or 3 times, and I was really pissed off when it happened.

I can envision 2 improvements:

  • When the “discard” button is pressed, prompt the user. This behavior can be found anywhere from Notepad (“The text in the ‘xyz’ file has changed. Do you want to save the changes?”) to Firefox (“You are about to close ‘x’ open tabs. Are you sure you want to continue?”).
  • Go to the main menu directly, but save the message in the drafts folder. This “no need to save” behavior can be seen in some programs like Tomboy (a great GTK# note taking program).
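The second improvement takes very little logic. Here is a toy Python sketch; the `MessageEditor` class and its method names are invented for illustration and have nothing to do with any real phone firmware. Pressing the red button leaves the editor, but unsent text goes to the drafts folder instead of being thrown away.

```python
class MessageEditor:
    """Toy model of an SMS editor that never silently loses text."""

    def __init__(self):
        self.text = ""
        self.drafts = []

    def type_text(self, text):
        self.text += text

    def hang_up(self):
        """Leave the editor, keeping any unsent text as a draft."""
        if self.text.strip():
            self.drafts.append(self.text)
        self.text = ""


editor = MessageEditor()
editor.type_text("On my way, see you at 7")
editor.hang_up()      # accidental press of the red button
print(editor.drafts)  # the message is still there
```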

It’s pretty simple, really. The principle is “don’t let users do disastrous things easily”. Things like this are now taught on a standard Computer Science course, “Human Computer Interaction” (but not if you get my lecturer since he didn’t have a clue what the course is all about).

Moyo Go Studio translation done etc

Tuesday, July 4th, 2006 by Agro Rachmatullah

BOAB naming convention - 4:13 PM 7/3/2006

Since the “BOAB”, “Another BOAB”, “Yet another BOAB” naming scheme sucks, I decided to change it. The new scheme will be “[The most interesting BOAB] etc”, for example “I won a 10 million lottery etc”.

Moyo Go Studio translation done - 3:30 PM 7/3/2006

I think the translation (784 strings) has passed sufficient quality control to be released to the wild. Some translation and naming issues that were brought to light:

  • How should menu items be capitalized? “Add Games”, “Add games”, “add games”, or, God forbid, “ADD GAMES”? I chose to capitalize the start of every word, except for some particles like “and”. The English strings aren’t consistent in this matter (for example “Add games” vs. “Save Board Capture As”).
  • Which brings us to the next question: What words are immune to capitalization? “and”, “or”, “in”, “on”… What else? I chose to not capitalize “not” (“tidak”). Is it appropriate? There must be a guideline somewhere…
  • If a menu item contains an explanation in parentheses, how should it be capitalized? An example is “Result (fast, inaccurate)”. I chose to capitalize the explanation also. Is it overcapitalizing?
  • What are the criteria for giving a menu item an ellipsis (…)? I read in a usability article somewhere that an ellipsis should be used on a menu item that prompts the user for more information before it can execute the intended action. For example, “Save As…” should use an ellipsis because the program will prompt the user for the file name before being able to save. “About” shouldn’t use an ellipsis because it directly does the intended action, which is displaying the about dialog. I follow this guideline. Some English strings violate it, for example “Search” and “About…”. (Almost every Windows program uses an ellipsis for the “About” menu item.)
  • A convention I don’t like is that the English strings append the ellipsis in the msgid. I prefer that the ellipsis be added at run time when building the UI. What problem does the current practise pose? Here’s one example: There is the msgid “Search”, which is used in the “Database” - “Search” menu. It really should be “Search…”, so I translated it as “Cari…”. It turns out that the msgid “Search” is used on places other than the menu (for example on the button on the dialog of “Database” - “Search”) and in those other places the ellipsis isn’t appropriate.
  • How should one capitalize tooltips? “Back up and take other variation” or “Back Up and Take Other Variation”? I prefer to capitalize only the first letter of the first word.
  • Should tooltips end with a dot? “Back up and take other variation” or “Back up and take other variation.”? Things get messy when a tooltip contains 2 sentences. I prefer to omit the dot for single-sentence tooltips and use dots for multiple-sentence tooltips. There are inconsistencies in the English Moyo Go Studio (for example “Move trough current variation.” vs. “Board perspective”).
  • The translation of a single English word can be multiple words, in which the problem of capitalization arises again. For example, should “Handicap” be translated as “Batu bantu” or “Batu Bantu”? The translator must check where the string will be used to determine the correct capitalization (by running the program and hunting for the string). A serious problem occurs when the string is used in 2 places with different capitalization requirements.

With those problems, it is impossible to create an Indonesian translation that consistently adheres to a good UI naming guideline. Therefore, some compromises must be made. I chose to make the menu perfect even if that means inconsistencies on other places.

I’ll give an example of the inconsistency that arises. Menu capitalization dictates that it is “Langkah Baik”, not “Langkah baik”. Sadly, a tooltip uses the exact same string. Because tooltips are supposed to use a different capitalization rule, the tooltip violates the rule.

My idea is that words in the msgstr should be lowercase except for proper names (which start with an uppercase letter), and the program will then call an appropriate method to build the final string. For example:

string menuString = MakeMenuString(GetString(msgid));
string tooltipString = MakeTooltipString(GetString(msgid));
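To make the idea concrete, here is a minimal Python sketch of what those two helpers could do. It is not how Moyo Go Studio works. The function names mirror the ones above, the `prompts_user` flag (which adds the ellipsis at run time) and the `LOWERCASE_WORDS` set are assumptions made for the example, and a real implementation would need a word list per language (for Indonesian it would include “tidak”).

```python
# Small words that stay lowercase inside a menu label (assumed, English only).
LOWERCASE_WORDS = {"and", "or", "in", "on"}


def make_menu_string(text, prompts_user=False):
    """Capitalize every word except small particles; optionally add an ellipsis."""
    words = text.split()
    titled = [
        w if (i > 0 and w.lower() in LOWERCASE_WORDS) else w[:1].upper() + w[1:]
        for i, w in enumerate(words)
    ]
    label = " ".join(titled)
    return label + "…" if prompts_user else label


def make_tooltip_string(text):
    """Capitalize only the first letter of the first word."""
    return text[:1].upper() + text[1:]


print(make_menu_string("back up and take other variation"))
print(make_tooltip_string("back up and take other variation"))
print(make_menu_string("search", prompts_user=True))
```

With this split, a single lowercase msgstr such as “langkah baik” can serve both the menu (“Langkah Baik”) and the tooltip (“Langkah baik”).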

An alternative is to have multiple msgids for the same word which is used in different places. For example:

msgid "menu-handicap"
msgstr "Handicap"

msgid "tooltip-handicap"
msgstr "Handicap"
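The lookup side of this alternative can be sketched in Python with a catalog keyed by a (context, msgid) pair. The `CATALOG` contents and the `translate` function are invented for illustration. GNU gettext has a feature along these lines, the `msgctxt` keyword in PO files, and Python's `gettext.pgettext` (3.8 and later) looks up such entries; the plain dictionary below only mimics the idea so that it runs without any catalog files.

```python
# Hypothetical catalog: the same English word translated per UI context.
CATALOG = {
    ("menu", "Handicap"): "Batu Bantu",
    ("tooltip", "Handicap"): "Batu bantu",
}


def translate(context, msgid, catalog=CATALOG):
    """Look up a translation for msgid in the given context.

    Falls back to the untranslated msgid when no entry exists.
    """
    return catalog.get((context, msgid), msgid)


print(translate("menu", "Handicap"))     # Batu Bantu
print(translate("tooltip", "Handicap"))  # Batu bantu
```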

I envy languages without uppercase/lowercase mess, like Arabic, Japanese (カタカナはuppercaseじゃない), Korean, and Chinese.

I mentioned many inconsistencies in Moyo Go Studio’s English strings. However, I believe that string consistency is generally a neglected area in many other software projects. For example, Notepad++, the software I use to write this blog entry, also has many inconsistencies in its strings.

Other than string consistency (or the lack thereof), another area where Moyo Go Studio sucks is its window layout system. Widgets in its dialogs have fixed coordinates and sizes, so problems like this arise:

With the availability of toolkits like GTK, hardcoding coordinates and sizes is oh so outdated. It is as obsolete as tying the timing of old games to the clock frequency (try playing Sonic 3 on a modern computer and you’ll see what I mean).

Moyo Go Studio sucks on at least one other thing: it can’t change language at run time (you need to restart the program). Even my (+Wijaya and Karnan) LifeSimulator can do it :)! Hmm, as far as I remember, even SmartGo can do it :) (tried the time-limited trial version).

Wait, I found another area where MGS sucks: It doesn’t run on Linux (Frank stated clearly that he’s running a Windows software shop so no chance of this happening soon). It would be cool if MGS has a GUI-less server version mode or a library of its core with usage documentation (easier to port), so other people could write a Linux client.

How good is Moyo Go Studio as a Go suite? I’ll probably write a full review some time in the future, after I’ve become more familiar with the beast and if I have free time. My first impression is that it is a superb Go study tool lacking some UI polish. However, don’t let this blog entry discourage you from buying MGS, because the problems mentioned probably won’t bother your daily use and because they will certainly be fixed in the future (after the kickass tactical module project?). Moyo Go Studio is definitely worth your $$ (or time, if you plan to translate it)!

UPDATE: I told Frank about this blog entry. Some days later, on July 7, Frank released an update that adds an extra space to the problematic widgets. That workaround made my Indonesian translation look fine.

TK II done! - 12:27 PM 7/3/2006

At last I finished the TK II! I’ve given the report and CD to an Ilkom office staff member, who will hopefully pass them on to Pak Medi.

My TK II was about making SharpJiten, a kanji dictionary written in C# which uses GTK# for the GUI and KANJIDIC for the dictionary. Here’s a screenshot:

Some last remarks:

  • I can’t get GTK# for .NET working. Therefore I opted to bundle Mono in the CD.
  • IconView is a terribly slow widget. Give it 200 items to draw and it will choke. A workaround is to divide the items into separate pages but I didn’t implement it.

A page, which contains SharpJiten’s description, installation method, source code, and TK II report can be found at http://agro.web.ugm.ac.id/sharpjiten.

Since the TK II is done, now I need to finish my other duty: translating Moyo Go Studio.

PS: One flaw in the hardcopy of the report is that pages 12 and 13 are swapped!!!

Early month shopping - 9:12 PM 7/2/2006

Phenomenon: Many people shop early in the month (on the 1st or 2nd day). Queues are extremely long.

Hypothesis: People just got their salary.

Explorer/Windows file name limitations - 10:17 PM 7/1/2006

In Windows Explorer, we can’t rename a file to “prn”, “con”, “com0” - “com9”, “lpt0” - “lpt9”, and only God knows what else. That’s because in the Windows world (which originated from the DOS world), those names are special names used to address devices. For example, “prn” refers to the printer (Which printer? Probably the primary one.) and “con” refers to the console. Turn on your printer (if you have one) and try this command:

copy con prn

You will effectively have a typewriter.

PS: How did I know it? I remembered it from my old DOS days… Back then, tricks like “copy con config.sys” were commonly cited.

That system itself is inferior to Linux. In Linux, devices are files in the /dev directory so you won’t have artificial file name limitations.

But if that’s a limitation of the underlying system, I can accept that Explorer doesn’t allow it. However, what’s confusing is that Explorer won’t give any explanation for this behavior. Instead, when you rename a file to one of those names, it silently reverts the file to its previous name.
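A program that wanted to explain the refusal would only need a small check like the Python sketch below. The name list is the one from this entry plus “aux” and “nul”, which are reserved in the same way; it is not claimed to be complete. Treating a reserved name followed by an extension (such as “con.txt”) as reserved too is an assumption of the sketch, and the function name `is_reserved_name` is made up.

```python
# Device names from this entry, plus "aux" and "nul".
RESERVED_DEVICE_NAMES = (
    {"con", "prn", "aux", "nul"}
    | {f"com{n}" for n in range(10)}
    | {f"lpt{n}" for n in range(10)}
)


def is_reserved_name(file_name):
    """Return True if the part before the first dot is a DOS device name."""
    stem = file_name.split(".", 1)[0]
    return stem.strip().lower() in RESERVED_DEVICE_NAMES


for name in ["prn", "CON.txt", "console", ".bashrc"]:
    verdict = "reserved device name" if is_reserved_name(name) else "fine"
    print(f"{name!r}: {verdict}")
```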

Next case. NTFS and FAT32 support file names that start with a dot (for example “.bashrc”). As a proof, try:

echo > .test

And a file with the name “.test” will appear.

However, try making a file in Explorer that starts with a dot and Explorer will say “You must type a file name”. Silly.

Another one. NTFS and FAT32 also support file names that start with a space (for example “ foo”), as long as it is followed by a non-space character. As a proof, try:

echo > " foo"

However, when renaming files using Explorer, it will remove the leading space.

I don’t know whether this is a limitation of NTFS and FAT32, but I can’t find a way to make file names that end with a dot (like “crhsab.”) or with a space (like “foo ”).

I’ll test all those in ext3. As an evil hack, I’ll try creating the illegal files on a FAT32 partition under Linux >:).

Another BOAB

Saturday, July 1st, 2006 by Agro Rachmatullah

Dying Windows - 10:12 PM 6/30/2006

My Windows installation is starting to choke badly. Some of the annoying things:

  • Disk management using GUI is awfully slow. This is using any file manager.
  • Closed programs remain on the process list (in other words, they’re still running).
  • Programs randomly eat 99% of the CPU cycles. When such a program is forcefully terminated, another program foolishly takes over the CPU-killing job.

I’ll probably reinstall after finishing the TK2 report.

Morning breeze - 6:51 AM 6/30/2006

The cold morning chills me to the bones. Is that why I fear it so much? I have not felt it for ages so that the pain is almost like an exciting thrill.

The SRT subtitle format - 9:38 PM 6/27/2006

Making a subtitle using the SRT format is very easy. The example below is self-explanatory:

1
00:00:02.849 --> 00:00:05.348
JUMP JUMP take offしようぜ!
JUMP JUMP take off shiyouze!

2
00:00:05.400 --> 00:00:08.321
天使の羽を持っている
tenshi no hane wo motteiru

3
00:00:08.309 --> 00:00:11.600
見上げれば 未来
miagereba mirai

4
00:00:11.750 --> 00:00:14.783
Boys & Girls! Be Ambitious!

Some things to note:

  • The file should be saved using the srt extension. Saving in UTF-8 works.
  • The time format MUST be HH:MM:SS.SSS. For example, 00:00:1.2 won’t work (there must be 5 digits for the “seconds” part).
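Since the timestamps are so picky, it is safer to generate them than to type them. Here is a minimal Python sketch with made-up function names. It pads every field to the required width and uses a dot before the milliseconds, matching the example above; SRT files are more commonly written with a comma there (00:00:02,849), so the separator is left as a parameter.

```python
def format_timestamp(milliseconds, separator="."):
    """Format a time in milliseconds as HH:MM:SS.SSS with fixed-width fields."""
    hours, rest = divmod(milliseconds, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{millis:03d}"


def format_cue(index, start_ms, end_ms, lines):
    """Build one numbered subtitle entry: index, time range, then text lines."""
    header = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
    return "\n".join([str(index), header, *lines]) + "\n"


print(format_cue(4, 11750, 14783, ["Boys & Girls! Be Ambitious!"]))
```

Entries are separated by a blank line, so a whole file can be produced by joining the cues with "\n".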

A fine media player that supports the SRT subtitle format is Media Player Classic. Just activate the “Load Subtitle…” menu item from the “File” menu.

IME not restricted to natural languages - 9:07 PM 6/27/2006

When I activate the Japanese IME, I can pop 私 by typing “watashi”. This works anywhere, from typing in a text editor to renaming files in Explorer. I often imagine how easy computing life would be if the use of an IME were not restricted to natural languages.

Here are some examples:

  • Typing “date” would yield the current date, for example “2006-06-27”
  • Typing “time” would yield the current time, for example “21:12”
  • Typing an equation would yield the result, for example “=2*3” would yield “6”

If using an IME layer is overkill, a system-wide shortcut key should be available to pop up stuff that doesn’t require user input (like the date and time).
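The conversion step of such an imaginary IME could be sketched in Python as below. This covers only the text-expansion logic, not the hooking into the operating system's input layer, and the function name `expand` and the set of supported operators are assumptions of the sketch. Arithmetic is evaluated by walking the parsed expression rather than with `eval`, so arbitrary code typed after “=” is rejected.

```python
import ast
import operator
from datetime import datetime

# Arithmetic operators the toy IME understands.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def expand(typed, now=None):
    """Turn what the user typed into the text the IME would insert."""
    now = now or datetime.now()
    if typed == "date":
        return now.strftime("%Y-%m-%d")
    if typed == "time":
        return now.strftime("%H:%M")
    if typed.startswith("="):
        tree = ast.parse(typed[1:], mode="eval")
        return str(_evaluate(tree.body))
    return typed  # anything else passes through unchanged


print(expand("=2*3"))  # 6
```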

Another BOAB

Tuesday, June 13th, 2006 by Agro Rachmatullah

Lazy programmer - 6/12/2006 10:44:41 PM

Once I saw an upperclassman editing a very large text file. It turned out that he was removing unwanted lines. The unwanted lines had a clearly defined pattern, but instead of writing a program to remove them, he scanned the lines one by one.

He could program, but didn't use his programming skill. Sure, we don't need to write a program for every imaginable trivial task. However, the text file he was editing was so large that the benefit of writing the program would clearly outweigh the cost. A lazy programmer is not a programmer.
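For scale, the whole program he avoided writing is about this long. This is a Python sketch in which the pattern is a placeholder, since I don't know what his unwanted lines actually looked like, and the function name is made up:

```python
import re


def remove_matching_lines(lines, pattern):
    """Keep only the lines that do not match the regular expression."""
    unwanted = re.compile(pattern)
    return [line for line in lines if not unwanted.search(line)]


# Hypothetical example: drop every line that starts with "#".
sample = ["keep 1", "# drop me", "keep 2"]
print(remove_matching_lines(sample, r"^#"))
```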

There are lots of small disposable programs on my disk. Some examples are:

  • A program to append a string to all file names in a folder (I once needed to rename hundreds of files)
  • A program to modify each line of a text file in various ways, designed so that new modifiers can be added easily (the need to manipulate a text file line by line comes up really often)
  • A program to correct broken links created by a stupid program called WebCopier

That is because when I am faced with a repetitive task which can easily be programmed, I don't hesitate to write the program. I'm a strong believer that uncreative jobs are better left to machines, while humans are better off spending their time on creative endeavours.
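
For the record, the upperclassman's task fits in a few lines. This is only a sketch in Python: the function name drop_matching_lines, the file names, and the sample pattern are all made up, since I don't remember what the real pattern was:

```python
import re


def drop_matching_lines(lines, pattern):
    """Return the lines that do NOT match the given regular expression."""
    unwanted = re.compile(pattern)
    return [line for line in lines if not unwanted.search(line)]


def clean_file(source_path, target_path, pattern):
    """Copy source_path to target_path, leaving out the unwanted lines."""
    with open(source_path, encoding="utf-8") as source:
        kept = drop_matching_lines(source, pattern)
    with open(target_path, "w", encoding="utf-8") as target:
        target.writelines(kept)


# Hypothetical usage: drop every line that starts with "DEBUG".
# clean_file("big.txt", "big.clean.txt", r"^DEBUG")
```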

SharpJiten 1.0 RC - 6/12/2006 10:12:01 PM


The program is essentially complete. On the screenshot you can see SharpJiten set to synchronize itself with the clipboard and, additionally, to display only grade 1-3 kanji. When a bunch of kanji is copied from a Wikipedia article, the program updates itself and displays the relevant kanji.

What remains is tidying the code (throwing away unused commented-out code, for example) and modifying the program to use an internal (embedded) EDICT instead of a separate EDICT file. Oh, and some days of bug testing to make sure that nothing silly happens.

SharpJiten is a memory hog - 6/12/2006 8:42:02 PM

SharpJiten is the name of the kanji dictionary program I'm building.

In the previous build, SharpJiten only loads some essential kanji info from EDICT. Those fields are the kanji itself in Unicode, (Japanese) readings, English meaning, stroke count, and grade info. This results in a memory footprint of around 26 MB.

However, EDICT has additional fields, more than you can imagine. Some examples are the SKIP code, indexes for various printed dictionaries, and Korean readings. When all fields are loaded, the memory footprint balloons to 40 MB. As a comparison, Wakan requires around 37 MB and JquickTrans 14 MB. The memory usage of JquickTrans is amazing, considering that it is both a kanji and word dictionary.

The EDICT specification specifically states that new fields may be added in later versions of the specification. SharpJiten will have a custom filter in which the user can specify arbitrary fields to filter on. This makes SharpJiten forward compatible with future versions of EDICT.

About the memory usage, I won't fuss over it. Deadline is approaching (when is it anyway?) and my priority is to have the program working. The only feature not implemented is the arbitrary field filter thing.

Yomiuri Meijin - 6/12/2006 5:38:01 PM

The Meijin is historically a title for the strongest Go player in Japan. Later the title was transformed into a prestigious tournament sponsored by Yomiuri Shimbun. The sponsorship was later taken over by Asahi Shimbun.

I have quite a lot of games from the old (Yomiuri) Meijin tournament. Here's the search result on the event tag:

Meijin (Yomiuri): 412
Old Meijin: 101

But of course there are some inappropriately/ambiguously labeled SGFs:
9th meijin 1970: 2
meijin (yomirui): 1
meijin (yomuri): 1
1st meijin 1962 league: 3
4th meijin title match: 2
9th meijin 1970: 2
5th meijin title match: 1

Those labels were found by searching for "meijin" with the date 1962-1975.

So the total is at least 525.

Update: The total is not 525 but 523. Try to spot the mistake in the above BOAB.

sgf2tex - 6/12/2006 1:40:17 PM

I'm currently adapting Minue622's tutorial on Haeng-ma into Indonesian. While preparing to add a pro game example, which the discussion of the iron pillar lacks, I realized it would be really painful to enter the move coordinates one by one into LaTeX. Therefore I created sgf2tex:


Here's a sample output:

\cleargoban
\black{q16,q4,q10,p15,q6,r3,r2,m4,l4,n4,o5,n16,r16,k4,j5,h5,g6,r11,r12,p6,p10,p9,g7,g8,g9,g10}
\white{d4,d16,o17,o4,q3,p3,k3,m3,l3,n3,g16,r17,q17,j4,h4,g5,r10,q11,q7,q9,s9,f6,f7,f8,f9}
\begin{center}
\showfullgoban
{comment}
\end{center}

The annoying thing is that the SGF format uses a different coordinate system from the one most Go players (and Go clients) are used to. Conventionally, the rows are numbered from 1 to 19, starting from the bottom, while the columns are labeled from a to t, from left to right. The letter i is omitted to avoid confusion between capital i (I) and lower case l (as in "lamp"). Compare I to l and you will see that they look really similar. The SGF format labels the rows from the top using letters from a to s. The columns are labeled from left to right, also using letters from a to s.
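
The conversion itself is short. The following Python sketch is not the actual sgf2tex source; the names are made up for illustration, and it assumes a 19x19 board and SGF points written as column letter followed by row letter:

```python
BOARD_SIZE = 19
SGF_LETTERS = "abcdefghijklmnopqrs"    # SGF uses a to s for both axes
GOBAN_COLUMNS = "abcdefghjklmnopqrst"  # conventional columns skip the letter i


def sgf_to_goban(point):
    """Convert an SGF point such as "pd" to a conventional one such as "q16"."""
    column = SGF_LETTERS.index(point[0])        # counted from the left
    row_from_top = SGF_LETTERS.index(point[1])  # SGF counts rows from the top
    return f"{GOBAN_COLUMNS[column]}{BOARD_SIZE - row_from_top}"


print(",".join(sgf_to_goban(p) for p in ["pd", "dp", "pp"]))
```

The only two tricks are skipping i in the column letters and flipping the row so that counting starts from the bottom.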

pdflatex bug - 6/12/2006 1:28:19 PM

Creating Go diagrams using the igo.sty LaTeX package and then compiling the pdf using pdflatex produces wriggly lines:


This is probably a bug in the very old pdflatex that I use, so I tried the more orthodox and roundabout method.

The first step is to create the dvi using latex, then create the ps using dvips, and finally create the pdf using ps2pdf. To my dismay, wriggly lines still exist!


You can see that the lines near the left edge are wriggly.

What works well is converting the ps to pdf using GSView:


Indexing is slow - 6/11/2006 1:29:01 AM

Extracted the 40 thousand or so games from MGS's web site, then added those games to Kombilo. The whole indexing process took almost 2 hours on my supposedly-powerful 1.9 GHz processor!

Anyway, I did a search on games involving a historical Meijin. Here are the Meijins (from first to last) and the number of games in my new database:

Honinbo Sansa - 0 (the number of games in my old database is 0)
Inoue Nakamura Doseki - 9 (0)
Yasui Sanchi - 126 (8)
Honinbo Dosaku - 61 (37)
Inoue Dosetsu Inseki - 15 (0)
Honinbo Dochi - 25 (0)
Honinbo Satsugen - 13 (0)
Honinbo Jowa - 64 (0)
Honinbo Shuei - 227 (0)
Honinbo Shusai - 499 (3)

The new database is clearly superb!!!

The number of games in my old database is only around 7 thousand. An interesting observation is that searching for sanrensei on both databases takes the same amount of time (1.8 seconds).

Mnemosyne 0.9.4 - 6/10/2006 9:31:34 PM

Mnemosyne is a flash card program. You give it a set of questions and answers, and the program will schedule those questions for you. In other words, it manages the process of memorizing lots of items. It is a must for anyone learning a natural language (English, Japanese, Arabic, etc) where thousands of items (vocabulary) must be memorized.

I replaced my aging 0.9.2 with 0.9.4. Nothing spectacular, just bug fixes for things that don't affect me. For the upgrade process, I backed up the .mnemosyne folder (Windows Explorer won't let you create files/folders starting with a dot btw), exported my 0.9.2 Mnemosyne data to an XML file, uninstalled 0.9.2, installed 0.9.4, and finally imported the XML file. Everything went smoothly.

Deletion of the contents of boab.txt - 6/10/2006 9:28:13 PM

On my hard disk, a BOAB is stored in the file boab.txt. I bring the text file along when I surf the net and post the contents to my blog. I've decided that after I post it, I'll empty the contents of boab.txt on my hard disk.

It would be great if my offline BOAB were stored on a wiki. The full history of the file would then be available. I'll try to install MoinMoin Wiki later on Dapper (MoinMoin Wiki is the wiki engine used for Ubuntu's wiki; Wikipedia uses MediaWiki).

BOAB

Saturday, June 10th, 2006 by Agro Rachmatullah

Don't know what a BOAB is? Read the last entry in this post (hint, hint).

The numbering of free software projects - 6/9/2006 10:56:57 PM

I always thought of version numbers as decimal numbers. For example, 5.1 is less than 5.11 which is less than 5.2. So, if those 3 numbers are software versions, the idea that sprang to my mind was "5.1 is released first, then 5.11, then 5.2".

Of course I was shocked to learn that free software projects (GNOME, Linux kernel, etc) don't think of it that way. They reason that 1 is less than 2 which is less than 11, so 5.1 comes before 5.2 which comes before 5.11.

Now, after getting used to the free software universe, I have come to like it more than my previous preference. You just need to change your point of view. Think of a book, and think of the number before the dot as the chapter number and the number after the dot as the section number. For example:

"5.1" is analogous to "chapter 5 section 1"
"5.2" is analogous to "chapter 5 section 2"
"5.11" is analogous to "chapter 5 section 11"

Thinking that way, the sorting will make sense. When multiple dots are present (as in x.y.z.w), think about subsections and subsubsections.

You won't feel nervous again thinking that GNOME 2.12 is newer than GNOME 2.2.
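
The chapter-and-section view translates directly into code. A minimal Python sketch follows; version_key is a name made up for illustration, and it assumes every part between the dots is a plain number, so something like "1.0 RC" would need extra handling:

```python
def version_key(version):
    """Split "5.11" into (5, 11) so that parts compare as whole numbers."""
    return tuple(int(part) for part in version.split("."))


versions = ["5.11", "5.2", "5.1"]
print(sorted(versions))                   # plain string sorting
print(sorted(versions, key=version_key))  # chapter-and-section sorting
```

Tuples are compared element by element, which is exactly the "chapter first, then section, then subsection" rule.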

My next computer stats - 6/9/2006 10:32:03 PM

RAM: At least 1 GB (currently 512 MB). My current RAM is very insufficient, since in a normal session I open some IDEs (Visual Studio 2005 Express, SharpDevelop 1.1), some SDK documentation (.NET SDK Documentation, Monodoc), a media player, Firefox (browsers can be crazy memory hogs), dictionaries (English, Japanese), and still other programs (Go client, Go database, file explorer, image editor, etc). At this very moment, task manager indicates that 693 MB of memory is used. 512 MB isn't enough, Q.E.D.

Processor: AMD dual-core processor (currently AMD single-core). Programs are starting to get multithreaded, and it's thrilling to experience the speedup on hardware that can actually run 2 threads at the same time. I'm especially keeping an eye on the development of the multithreaded Go program Moyo Go Studio. Why AMD? Well, why Intel?

Hard disk: RAID 0 configuration (currently no RAID). Who likes to be bottlenecked by that sluggish piece of hardware? Oh, and the capacity should be at least 200 GB (2 x 100 GB).

Video Card: NVIDIA (currently NVIDIA). ATI is notorious for its bad driver support on Linux.

Monitor: LCD which supports 1280×1024 and is at least 17 inches (currently CRT, 1024×768, probably 14 inches). My eyes are tortured by looking at the monitor n hours a day. Switching from CRT to LCD should ease the pain. A larger resolution is needed because complex programs like IDEs and Go database clients have panels everywhere. A small resolution leaves an uncomfortably small space for the main panel. 17 inches is needed so that things at a large resolution won't look tiny (should be irrelevant in a vector graphics era).

Fan: A silent but powerful fan (currently noisy). The sound pollution emitted from my fan is unbearable. A computer is not a Harley. Noisy is not cool (pun intended).

Writer: DVD writer (currently CD writer). DVD-Rs (the media) are insanely cheap right now. IIRC, with Rp. 4k you can get 4 GB of storage. The perfect solution for backing up junk.

A powerful editor lacking 1 feature - 6/9/2006 10:23:24 PM

ConTEXT (why the crazy capitalization?) is my text editor of choice on Windows. It supports tabs, syntax highlighting, and most importantly user-defined actions. User-defined actions mean that we can bind some keys (for example F9) to a shell command (for example to compile the file). This makes ConTEXT effectively a bare IDE for any task you can imagine.

However, ConTEXT doesn't support Unicode. Yes, typing "watashi" on the IME yields ‚킽‚µ (hiragana) or Ž„ (kanji) on ConTEXT. That is pretty dumb, considering that it's now 2006 and the idea of i18n (internationalization: i-(18 middle characters)-n) isn't anything new. I'll file a bug report on it.
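
My guess at what happens (an assumption, I haven't looked at how ConTEXT handles input) is that the IME hands over Shift-JIS bytes and the editor displays them as Windows-1252 text. The Python sketch below simulates that; the function name misread_as_cp1252 is made up, and for the kanji it reproduces the Ž„ quoted above:

```python
def misread_as_cp1252(text):
    """Encode text as Shift-JIS, then decode the bytes as Windows-1252."""
    return text.encode("shift_jis").decode("cp1252")


for word in ["私", "わたし"]:
    print(word, "->", misread_as_cp1252(word))
```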

BOAB: a neology - 6/9/2006 10:18:57 PM

BOAB stands for "blog on a blog", and this entry is an example of a BOAB. The idea is to blog anything interesting directly at home, and then upload all the offline blog entries as 1 online blog entry. Since the offline blog entries become part of 1 online blog entry, it is a blog on a blog, or BOAB.

The acronym BOAB doesn't come out of thin air. See FOAF.