Patent US6374210 - Automatic segmentation of a text (http://www.google.de/patents/US6374210)

A system 100 is capable of segmenting a connected text, such as a Japanese or Chinese sentence, into words. The system includes means 110 for reading an input string representing the connected text. Segmentation means 120 identifies at least one word sequence in the connected text by building a tree structure...
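
The abstract is cut off before it says how the tree is built, so the following is only a minimal sketch of one plausible reading, not the patented method. It assumes a fixed lexicon of known words and lets each tree node hold one candidate word, so that every root-to-leaf path covering the whole input is one word sequence. The names `Node`, `build_tree`, `word_sequences`, and the sample lexicon are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate word in the segmentation tree (hypothetical structure)."""
    word: str   # empty string for the root
    end: int    # index in the input string just past this word
    children: list[Node] = field(default_factory=list)


def build_tree(text: str, lexicon: set[str], start: int = 0, word: str = "") -> Node:
    """Build a tree whose children are lexicon words starting at `start`."""
    node = Node(word=word, end=start)
    max_len = max((len(w) for w in lexicon), default=0)
    # Try every prefix of the remaining text that is a known word.
    for length in range(1, min(max_len, len(text) - start) + 1):
        candidate = text[start:start + length]
        if candidate in lexicon:
            node.children.append(
                build_tree(text, lexicon, start + length, candidate)
            )
    return node


def word_sequences(node: Node, text_len: int) -> list[list[str]]:
    """Collect every root-to-leaf path that consumes the whole input."""
    prefix = [node.word] if node.word else []
    if node.end == text_len:
        return [prefix]
    sequences = []
    for child in node.children:
        for tail in word_sequences(child, text_len):
            sequences.append(prefix + tail)
    return sequences


# Assumed toy lexicon; a real system would use a full dictionary.
lexicon = {"研究", "研究生", "生命", "命"}
text = "研究生命"
tree = build_tree(text, lexicon)
for sequence in word_sequences(tree, len(text)):
    print(" / ".join(sequence))
```

For this input the sketch finds two segmentations, 研究 / 生命 and 研究生 / 命, which illustrates why a segmenter may need to represent more than one word sequence. The recursion is exponential in the worst case; it is kept unmemoized here only to make the tree shape explicit.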