Sorting strings: OpenOffice.org Calc, Explorer vs. Nautilus, dir vs. ls

(Written at 9:26 PM on 9/11/2006, GMT+7)

Sorting Japanese words in OpenOffice.org Calc

While sorting a list of Japanese words in OOo Calc, I noticed that the katakana ア ended up between two hiragana あ entries. After some investigation, I concluded that OOo Calc doesn’t distinguish between a hiragana character and its corresponding katakana for sorting purposes. Uppercase and lowercase Latin letters are also treated as equal.

Therefore, the starting order determines the "sorted" order: items that compare as equal simply stay in the order they started in. For example, the following column won’t change if sorted:

A
a

But the same is true for this column:

a
A
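
This is what a stable sort produces when its comparison key ignores case and kana type. Here is a minimal Python sketch of that idea. It assumes Calc compares on such a folded key, which I have not confirmed from OOo's source, and fold_key is a hypothetical helper written only for this illustration:

```python
def fold_key(text):
    """Hypothetical sort key: ignore case and the hiragana/katakana distinction."""
    folded = []
    for ch in text:
        code = ord(ch)
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana counterparts.
        if 0x30A1 <= code <= 0x30F6:
            ch = chr(code - 0x60)
        folded.append(ch)
    return "".join(folded).casefold()


# sorted() is stable: items whose keys are equal keep their input order.
upper_first = sorted(["A", "a"], key=fold_key)
lower_first = sorted(["a", "A"], key=fold_key)
kana_mixed = sorted(["あ", "ア", "あ"], key=fold_key)

print(upper_first)
print(lower_first)
print(kana_mixed)
```

All three lists come back unchanged, because every item in each list has the same key as its neighbors.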

Explorer works the same way as OOo Calc, treating capital letters the same as their lowercase counterparts and hiragana the same as katakana:

(Screenshot: Sorting in Explorer)

However, the dir command places katakana after hiragana, which is inconsistent with Explorer:

E:\Temp\sorting test>dir
 Volume in drive E is Archive
 Volume Serial Number is A809-0E48

 Directory of E:\Temp\sorting test

09/08/2006  08:13 PM    <DIR>          .
09/08/2006  08:13 PM    <DIR>          ..
09/08/2006  07:47 PM                 0 Aa
09/08/2006  07:47 PM                 0 ab
09/08/2006  07:47 PM                 0 ba
09/08/2006  07:47 PM                 0 Bb
09/08/2006  07:47 PM                 0 あa
09/08/2006  07:47 PM                 0 いb
09/08/2006  07:47 PM                 0 アb
09/08/2006  07:47 PM                 0 イa

But the behavior changes if we use /o:n (sort by name):

E:\Temp\sorting test>dir /o:n
 Volume in drive E is Archive
 Volume Serial Number is A809-0E48

 Directory of E:\Temp\sorting test

09/08/2006  08:13 PM    <DIR>          .
09/08/2006  08:13 PM    <DIR>          ..
09/08/2006  07:47 PM                 0 Aa
09/08/2006  07:47 PM                 0 ab
09/08/2006  07:47 PM                 0 ba
09/08/2006  07:47 PM                 0 Bb
09/08/2006  07:47 PM                 0 あa
09/08/2006  07:47 PM                 0 アb
09/08/2006  07:47 PM                 0 イa
09/08/2006  07:47 PM                 0 いb

This is weird because by default dir already appears to sort Latin letters by name (in other words, the default behavior should match /o:n).
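
Each of the two listings is consistent with a simple rule, although this is only an inference from the eight file names above and not documented behavior of dir. The Python sketch below assumes that the default listing compares uppercased names by code point (hiragana are encoded before katakana in Unicode), while /o:n additionally treats katakana as hiragana. The names upcase_key and folded_key are hypothetical:

```python
names = ["ab", "Bb", "イa", "あa", "Aa", "いb", "ba", "アb"]


def upcase_key(name):
    # Assumed rule for plain dir: uppercase, then compare code points.
    # Kana have no case, so they keep their code-point order.
    return name.upper()


def folded_key(name):
    # Assumed rule for dir /o:n: also map katakana (U+30A1..U+30F6)
    # onto hiragana, which lie 0x60 lower in Unicode.
    chars = []
    for ch in name:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            ch = chr(code - 0x60)
        chars.append(ch)
    return "".join(chars).upper()


default_order = sorted(names, key=upcase_key)
by_name_order = sorted(names, key=folded_key)

print(default_order)
print(by_name_order)
```

Under these assumptions the first list reproduces the plain dir order and the second reproduces the dir /o:n order. Both rules ignore case for Latin letters, which would explain why only the kana entries move between the two listings.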

So how does Ubuntu 6.06 fare? I booted the Live CD and here’s Nautilus in action:

(Screenshot: Broken sorting in Nautilus)

Total mess! Why are kana interspersed among the Latin letters? I couldn’t figure out how that program sorts…

ls (the console command "el-es") is no better:

ubuntu@ubuntu:/media/ntfs/Temp/sorting test$ ls -l
total 0
-r-xr-xr-x 1 root root 0 2006-09-08 12:47 あa
-r-xr-xr-x 1 root root 0 2006-09-08 12:47 イa
-r-xr-xr-x 1 root root 0 2006-09-08 12:47 Aa
-r-xr-xr-x 1 root root 0 2006-09-08 12:47 ab
-r-xr-xr-x 1 root root 0 2006-09-08 12:47 いb
-r-xr-xr-x 1 root root 0 2006-09-08 12:47 アb
-r-xr-xr-x 1 root root 0 2006-09-08 12:47 ba
-r-xr-xr-x 1 root root 0 2006-09-08 12:47 Bb
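
One reading that happens to reproduce this listing: the kana carry no weight in the comparison at all, so only the Latin letters are compared (ignoring case), and names that tie are ordered by plain code point. This is a guess that fits these eight names, not something confirmed from ls or the locale definition, and latin_only_key is a hypothetical helper:

```python
names = ["Aa", "ab", "ba", "Bb", "あa", "いb", "アb", "イa"]


def latin_only_key(name):
    # Assumption: kana are ignored, so only ASCII characters take part
    # in the primary comparison, and case does not matter.
    latin = "".join(ch for ch in name if ch.isascii())
    # Names with the same Latin part fall back to code-point order
    # of the full name.
    return (latin.casefold(), name)


ls_like_order = sorted(names, key=latin_only_key)

for name in ls_like_order:
    print(name)
```

With this key, あa and イa both reduce to "a" and so sort ahead of Aa, while いb and アb both reduce to "b" and land between ab and ba, which is the order shown in the ls listing.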

I’ve reported those bugs to Ubuntu’s Launchpad.
