
Friday, October 10, 2014

Gridwrite for Chinese, part 2

In this post, two reference charts are given, one for traditional Chinese and one for simplified Chinese. The charts list a collection of radicals and their corresponding Gridwrite code numbers. Radicals are parts built from the 27 basic strokes mentioned in part one of this blog, and they act as components of Chinese characters.

As an example, in the Chinese character '鉛' (meaning the metal 'lead'), the part on the left, '釒', is a radical.

Many radicals are formed by modifying a character. The radical '釒' is formed from the Chinese character '金' (meaning the metal 'gold'). Two general rules relate a Chinese character to its modified radical:
  1. Change every stroke of type 7 in the character to stroke type 3.
  2. When the last stroke of the character is of stroke type 2 and is not a bounded stroke, change it to stroke type 6.
The Gridwrite code for the character '金' is 67221'62. By rule 1, every 7 is converted to "'", which represents stroke type 3. By rule 2, the last stroke of '金' is of stroke type 2 and is not bounded, so it is changed to stroke type 6. The resulting code for the radical '釒' is 6'221'66.
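The two rules can be sketched as a small Python function. This is only an illustration, not Gridwrite's own implementation: the function name `radical_code` and the `last_stroke_bounded` flag are hypothetical. Whether the last stroke is bounded cannot be read from the code number alone, so the caller has to supply it.

```python
def radical_code(char_code: str, last_stroke_bounded: bool) -> str:
    """Apply the two general rules to a character's Gridwrite code.

    Assumption: each symbol in the code string stands for one stroke,
    and "'" is the symbol for stroke type 3, as in the example for '金'.
    """
    # Rule 1: every stroke of type 7 becomes stroke type 3, written "'".
    code = char_code.replace("7", "'")
    # Rule 2: a final stroke of type 2 becomes type 6 unless it is bounded.
    if code.endswith("2") and not last_stroke_bounded:
        code = code[:-1] + "6"
    return code


# '金' (67221'62) with an unbounded last stroke gives the radical '釒'.
print(radical_code("67221'62", last_stroke_bounded=False))  # 6'221'66
```

When the last stroke is bounded, rule 2 does not fire and a code with no 7 in it comes back unchanged.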


For an example of a bounded stroke, see the character '日', where the last stroke, of stroke type 2, is bounded.

Since radicals are standard parts of Chinese characters, knowing the code numbers of those parts is essential for entering the correct code number of a Chinese character.

The reference chart for traditional Chinese is shown below:

Radical    Gridwrite code    Radical    Gridwrite code    Radical    Gridwrite code
'4'601
1'2860
0606
14221664'
866456''4
25'181216
10188'447
28'420''6
451256''1
6561'6
'2606''6'66'
812661242856
6216''''6426
22161212'41'
'447289'216'
6022
60'6
21122'26'64'667
4'456'416'''468
'2'6614112
62'62''6216'88''''
62218121122628'216
612422660'2'14216'
21411262216'
'2221422142'666''6566
142226'21215''2121267
142121661022262142221
21468221422126'2'6226
6'225''6'4228'
6'221'66212228'14221022
2'41'66'2212152262261222
61'22212
21121422126142226'69614216'
96'6696'1
1421'4102212212221256'''''8168
614212688'621422126212210''''
6414212''''6142220''''12146'''''2
鹿'2641122868
21122142126'14'62122''''
14221821224226124228228228
12126'6'26'6'81'2'6210222828222

The reference chart for simplified Chinese is shown below:

Radical    Gridwrite code    Radical    Gridwrite code    Radical    Gridwrite code
'4'601
1'2860
0606
14221664'
866456''4
25'181216
10188'447
28'420''6
451256''1
6561'6
'2066''6'66'
'6161242856
6216''''6426
2216211'41'
'47289'216'
6022
21122'26'64'667
4'456'416'''46'
'2'6614112
62'62''6216'886
62218121122628'216
612422660'2'14216'
21411222216'
'92142'666''6566
146'212656'2121267
142121661022262816
21468221422126'2'6226
6'2256'648
62228212228''10
2'422222212152212221222
61'22212
21121422126146'696'
966'
144'4102212212221256'''''8168
614212688'621422126406
6414212660'0212146'''''2
鹿'2641122868
21122142126'14'62122''''
142142286124228228228
齿12126'81'2686'

Once you are familiar with these charts, entering the code number for any Chinese character becomes easy.

Tuesday, August 12, 2014

Gridwrite for Chinese, part 1

This post is about Gridwrite for Chinese. The idea of using Gridwrite to input Chinese characters is based on the fact that every character is composed of a limited number of basic strokes. In calligraphy (the art of writing Chinese characters), the number of basic strokes is 27. Gridwrite groups 24 of these basic strokes into 10 stroke types and represents the remaining 3 as combinations of these 10 types. The glyphs (i.e. the shapes) of the 10 stroke types are shown below:


The original 27 basic strokes and their Gridwrite stroke type representations are shown below:


Here is an example of how to use the 10 stroke types to code a Chinese character. The character "日" (meaning 'day' in English) can be decomposed into the sequence of basic strokes | 乛 一 一, which is then coded as 1422.
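The coding of "日" can be sketched in Python as a lookup from stroke shape to stroke type. The table below is deliberately partial: it lists only the three shapes whose stroke types can be read off the "日" example, and the names `STROKE_TYPE` and `gridwrite_code` are made up for this sketch.

```python
# Partial table: only the stroke shapes that appear in the "日" example.
# The other stroke types are not reproduced here.
STROKE_TYPE = {
    "|": "1",   # vertical stroke
    "一": "2",  # horizontal stroke
    "乛": "4",  # horizontal stroke that turns downward without a pen lift
}


def gridwrite_code(strokes):
    """Join the stroke types of a stroke sequence into a code number."""
    return "".join(STROKE_TYPE[s] for s in strokes)


print(gridwrite_code(["|", "乛", "一", "一"]))  # 1422
```

A stroke shape missing from the table raises a `KeyError`, which is a reasonable failure for a sketch that covers only part of the stroke set.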
Coding a character by its stroke sequence is the basic idea of Gridwrite, and it is why Gridwrite is a stroke-based (or glyph-based) method. To implement this idea, a different approach is needed for Chinese characters:
  1. What is a stroke? When we use a pen to write a stroke on a flat surface such as paper, three steps are involved:
    1. The pen goes down.
    2. The pen moves and draws a path on the flat surface.
    3. The pen goes up.
    Together these make one stroke. That explains why the character "日" is coded as 1422 and not 12122: when we write the top right corner, the pen does not lift. It just changes drawing direction by 90 degrees, so the second stroke is not 2 but 4. In principle, we can lift and lower the pen at any time while writing, but doing so slows down writing and takes more effort.
  2. The code number is not unique. That is, two different Chinese characters can have the same code number. The code number 1422 from the example is also the code number of another Chinese character, "曰", which means 'say'. The shapes of the two characters are nearly the same except for their width-to-height ratios. The code number is not unique because it records only the stroke sequence, and so provides only 'partial' information about the shape of the character.
  3. What is stroke order? One explanation is the following: it is the order of writing a character that minimizes our 'mental loading'.
    Let's use writing an English word as an example. When you write the English word 'is', you normally write 'i' first and 's' second. This is because an English sentence is read from left to right, so we write the sentence from left to right. It follows that every word of the sentence, including 'is', is also written from left to right. If you wrote 's' first and then 'i', you would need to reserve a space for 'i', which means writing 'i' in your mind first. Writing 'i' first needs no such reservation, so it is quite 'natural' to write 'i' first and then 's'.
    Therefore, we can say that English word writing has a stroke order: from left to right.
    We think a similar conclusion also applies to the writing of Chinese characters. Based on historical experience, there are at least three common rules for writing a Chinese character:
    1. From left to right
    2. From top to bottom
    3. From outer to inner
    But these are just general guidelines. In practice there is more than one standard of stroke order, and Gridwrite follows only one of the common ones. So if you find that Gridwrite's stroke order for some characters is not the same as yours, don't worry. The possible variations are limited (always just two or at most three different ways to write a character). It is worth spending some time to adapt to the difference.
  4. What is a character? Today a character is defined by a unique number. Standards such as Unicode, Big-5 or GB assign such a number to represent a character. These numbers serve a categorical purpose and are sorted in phonetic order or radical-based order; the number itself does not contain direct information about the glyph of a character. This means that, in theory, you can assign any bitmap image to a character. For example, under the same numbers you could draw a cat to represent 'c', a fish to represent 'f', and so on. (To see Unicode and its symbols, see our 'Unicode' page.)
    In contrast, Gridwrite uses a different concept to identify a character. Gridwrite's code number reflects 'partial' information about the glyph of a character by forming a sequence of strokes and linking it to a number. In the extreme case, a character can be thought of as a mathematical object that contains the 'information' of drawing paths on a normalized 2D flat surface.
  5. The glyphs and the stroke order of Chinese characters for Gridwrite are taken from the following references:
    Ref 1: Dictionary "初階中文字典" version 2, published by Pearson. It is our reference for traditional characters. The dictionary has stroke order information and is mainly for readers of traditional characters. It also gives the corresponding simplified character for each traditional character.
    Ref 2: Dictionary "通用汉字正刑字典" version 1, published by "语文出版社". It is our reference for simplified characters. The dictionary has stroke order information and is mainly for readers of simplified characters. It also gives the corresponding traditional character for each simplified character.
  6. What is a font? A font is 'character + display style'. When we see a character in a dictionary, it must be printed in some font. So to get the stroke types that compose a Chinese character, we need to extract them from the printed font of the dictionary.
  7. Using two reference dictionaries causes a problem: Ref 1 and Ref 2 are printed in different fonts. Ref 1 uses "楷體", the most commonly used font for traditional Chinese, while Ref 2 uses "宋体", the most commonly used font for simplified Chinese. These two fonts do not use the same stroke types for some Chinese characters. For example, the character "小", whether as a traditional or a simplified character, is coded as 533 according to Ref 1 but as 563 according to Ref 2. Since we use Ref 1 for traditional characters and Ref 2 for simplified characters, the traditional character is coded as 533 while the simplified character is coded as 563. Therefore, in Gridwrite, a change of code number between traditional and simplified involves two factors: the glyph change from traditional to simplified, and the font change.
  8. The font "楷體" used in Ref 1 actually follows another reference dictionary, "常用字字形表" (Year 2000 version), published by The Hong Kong Institute of Education (香港教育學院). Interestingly, the font used in this glyph dictionary is what could be called a 'ball-pen' style font: the characters are printed as handwriting done with a ball pen. We guess that the reason for using a ball pen is to minimize the effect of display style and to show clearly the drawing paths of a Chinese character. We think the idea behind this dictionary comes close to our concept of a character as a mathematical object.
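Points 2 and 7 above can be sketched together as a small lookup in Python: one code number may return several candidate characters, and the same character may have different codes in the traditional and simplified tables. The entries are only the examples given in this post. Placing "日" and "曰" in the traditional table is an assumption (the post does not say which table their code 1422 belongs to), and the names `ENTRIES` and `build_index` are hypothetical.

```python
from collections import defaultdict

# (character, script, code) triples taken from the examples in this post.
# Assumption: "日" and "曰" are listed under "traditional" for illustration.
ENTRIES = [
    ("日", "traditional", "1422"),
    ("曰", "traditional", "1422"),
    ("小", "traditional", "533"),   # coded from Ref 1
    ("小", "simplified", "563"),    # coded from Ref 2
]


def build_index(entries, script):
    """Map each code number to the list of characters that share it."""
    index = defaultdict(list)
    for char, entry_script, code in entries:
        if entry_script == script:
            index[code].append(char)
    return index


traditional = build_index(ENTRIES, "traditional")
simplified = build_index(ENTRIES, "simplified")

print(traditional["1422"])  # ['日', '曰']
print(traditional["533"], simplified["563"])  # ['小'] ['小']
```

Because a code can map to more than one character, a lookup returns a list of candidates and the user still has to pick the intended one.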