WO2008141583A1 - Character input method, input system and method for updating word lexicon - Google Patents

Character input method, input system and method for updating word lexicon

Info

Publication number
WO2008141583A1
Authority
WO
WIPO (PCT)
Prior art keywords
cell
vocabulary
lexicon
information
word set
Prior art date
Application number
PCT/CN2008/071027
Other languages
French (fr)
Chinese (zh)
Inventor
Zhankai Ma
Lei Yang
Original Assignee
Beijing Sogou Technology Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=38782735&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=WO2008141583(A1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Beijing Sogou Technology Development Co., Ltd.
Publication of WO2008141583A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods

Definitions

  • The present invention relates to the field of character input, and more particularly to a character input method, an input method system, a method for updating a lexicon, and a lexicon publishing system.
  • For an input method, the accuracy of the preferred (first-ranked) word is a very important evaluation criterion.
  • The ordering of the candidates is also very important.
  • The input method lexicon, with its term information and word frequency information, is one of the main factors affecting both. If the word the user wants exists in the lexicon and its word frequency information matches the user's usage habits, the preferred-word accuracy and the candidate ordering will better meet that user's needs.
  • The lexicon of a current input method can cover only part of the vocabulary people use: it usually includes common words, while a considerable part of the vocabulary cannot be included. If all the vocabulary used by all users were added, the lexicon would contain on the order of millions of entries. With so large a lexicon there are too many homophones (identical codes), the number of candidates increases, and users who do not need those words are seriously disturbed; such a lexicon also inevitably consumes CPU, memory, and other computing resources, which is unacceptable on a personal computer.
  • The technical problem to be solved by the present invention is to provide a new input method lexicon model and a complete input solution that fit the resources of existing computing devices, do not occupy more computing resources, and significantly improve the user's input efficiency.
  • an input method system including an input interface unit, an information conversion unit, and a display output unit, and further includes:
  • the cell word set is obtained from at least one cell lexicon selected from a plurality of cell lexicons stored on the server side; the words in each cell lexicon have at least one common attribute;
  • the information conversion unit performs a search query in the system vocabulary and the cell word set to obtain corresponding candidates.
  • The input method system may further include an automatic update module, configured to compare the local list of existing cell lexicons against the server and obtain the required update data from the server.
  • The types of related information stored in the cell word set are fewer than or equal to the types stored in the system vocabulary; at least one of the cell lexicons stored on the server side is generated manually.
  • the input method system may further include: a user vocabulary.
  • The input method system may further include: an adding module, configured to add cell lexicon information acquired from the server side to the cell word set; the cell word set is either a single independent lexicon or a collection of multiple lexicons that exist in parallel.
  • the adding process is performed in a separate cache vocabulary.
  • the input method system may further include: a cell lexicon deactivation module, configured to receive a user instruction, and remove a term record belonging to the cell vocabulary selected by the user from the cell word set.
  • a method for character input including:
  • the cell word set is used to record extended words and related information;
  • the cell word set is obtained from at least one cell vocabulary obtained from a plurality of cell lexicons stored on the server side; the words in each cell lexicon have at least one common attribute;
  • the user's selection information is received, and the specified candidate is output on the screen.
  • the loading is: combining the cell word set and the system vocabulary into a vocabulary and placing it in a cache; or, the loading is: using the cell word set and the system vocabulary as two or more independent lexicons Placed in the cache and set the thesaurus priority according to the preset rules; the priorities are used for the display ordering of the candidates.
  • the cell word set records the cell vocabulary to which each term belongs and the corresponding cell lexicon priority; the priority is used for display ordering of the candidates.
  • the method may further include: dynamically adjusting the cell lexicon priority according to the usage environment of the input method during the loading process.
  • A method for updating a lexicon, where the lexicon to be updated is a cell word set that records extended words and related information. The cell word set is obtained from at least one cell lexicon selected from a plurality of cell lexicons stored on the server side, and the words in each cell lexicon have at least one common attribute. The method comprises:
  • receiving a trigger, and comparing the list of existing local cell lexicons with the server-side cell lexicon list to obtain a list of lexicons that need to be updated;
  • the method may further include: manually or automatically upgrading the cell vocabulary stored on the server side, and changing the corresponding version information.
  • the adding process is performed in a separate cache vocabulary.
  • a thesaurus publishing system including:
  • the cell lexicon generating unit comprises: an interface module, configured to receive input information; a generating module, configured to generate a cell lexicon according to the received information; and an identification module, configured to specify identifier and version information for each cell lexicon; the words in each cell lexicon have at least one common attribute;
  • the communication unit is configured to receive the trigger and transmit the corresponding cell lexicon entry information to the client.
  • The cell lexicon generating unit may further include: a modification and update module, configured to modify the stored information of a cell lexicon and notify the identification module to generate new version information for that cell lexicon.
  • The lexicon publishing system may further include: an identification module, configured to compare the server-side cell lexicon list with the client's cell lexicon list and, according to the comparison result, transmit the required update data to the client through the communication unit.
  • the cell vocabulary obtained according to the received information stores a plurality of vocabulary information; or the cell vocabulary obtained according to the received information stores index information, and the index information corresponds to other cell lexicons.
  • the thesaurus publishing system may further include: a merging module, configured to merge the plurality of cell lexicon entry information into one download vocabulary, and notify the communication unit to transmit the downloaded vocabulary to the client.
  • The present invention has the following advantages. The input method lexicon is composed of two parts, the system vocabulary and the cell word set. The system vocabulary is still oriented to all users and consists mainly of general vocabulary, while for the cell word set the server side provides multiple cell lexicons, from which the user selects the one or more that suit them best and combines them. The lexicon used by each user therefore stays at the capacity level of existing lexicons, yet through each person's individual selection it can cover almost all of that user's vocabulary and carries relatively more accurate word frequency information. This can greatly improve the preferred-word accuracy for the current user and yield a candidate ordering that better matches the user's habits.
  • The invention realizes a dynamic, cell-type lexicon at the capacity level of existing input method lexicons. Small lexicons can be added manually by the user or automatically by the computer, and through each person's individual selection or customization the combination of the cell word set and the system vocabulary can cover almost all the words of the current user. The user can thus enter almost any word or sentence they need, which greatly increases the preferred-word accuracy of the input method.
  • the coverage of the individual's thesaurus is expanded to the maximum, so that the accuracy of typing has a greater improvement.
  • The invention uses a plurality of cell lexicons and can update them by automatic upgrade, so that an individual's lexicon keeps pace with the times. Users can keep their lexicon fresh without manual effort, so the gain in preferred-word accuracy is all the more noticeable given the rapid development of the Internet. This improves typing speed, reduces the cases in which a new word is missing, and reduces the number of page flips.
  • The present invention also provides a lexicon publishing system, which helps each user manually generate a cell lexicon for the group to which they belong, and update and modify it; an automatic update function is added on the client side. Users thus get accurately classified cell lexicons that are updated automatically, so their vocabulary stays current.
  • FIG. 1 is a structural block diagram of an embodiment of an input method system
  • FIG. 2 is a flow chart showing the steps of an embodiment of a method for performing character input
  • FIG. 3 is a structural block diagram of an embodiment of a thesaurus publishing system
  • FIG. 4 is a flow chart showing the steps of an embodiment of a method for automatically updating a thesaurus.
  • the invention can be applied to input method platforms of various input modes, including keyboard symbols, handwritten information, voice input and the like. That is, the input information may include an encoded character string, and may also include information such as handwritten input information and voice input, because these input methods also require the use of the thesaurus for candidate sorting. Since the information conversion process in these input methods is a well-known technique, it will not be described in detail herein. The following only takes the code string input as an example for detailed description.
  • an input method system of the present invention may specifically include: an input interface unit 101, configured to receive input information input by a user;
  • the information conversion unit 102 is configured to perform a query in the system vocabulary 104 and the cell word set 105 according to the input information input by the user (for example, receiving keyboard characters), complete the code conversion, and obtain corresponding candidates;
  • the display output unit 103 is configured to display the candidates, receive the user's selection, and output the selected candidate to the screen.
  • System vocabulary 104 used to record basic words and related information
  • a cell word set 105 representing a set of cell lexicons for recording extended words and related information; the cell word set is obtained by at least one cell vocabulary selected from a plurality of cell lexicons stored at the server side; Words in each cell's thesaurus have at least one common attribute.
  • During character input, the preset system vocabulary and cell word set are searched to perform the code conversion, which completes an input process that meets the user's individual needs.
  • A cell lexicon is, specifically, the vocabulary shared by a certain group, person, or set of people (that is, the words in each cell lexicon have at least one common attribute). Examples: a movie lexicon, a noun lexicon of the latest songs, a World of Warcraft lexicon, a biology lexicon, a Tsinghua University proper-noun lexicon, a noun lexicon for a certain company, a Haidian District noun lexicon.
  • A cell lexicon can be obtained by automatic classification and parsing, performed by a management organization or a server group; or a server platform can be provided on which users manually generate the cell lexicon of the group to which they belong, to better meet the needs of individual groups. Preferably, therefore, at least one of the plurality of cell lexicons in this embodiment is manually generated by a user.
  • The input method platform can run on a variety of computing devices, such as personal computers, personal digital assistants, and mobile terminal devices. The present invention can likewise be applied to all of these devices, and its operating environment need not be limited.
  • The user inputs the code string for Chinese characters into the computer through the keyboard (in some cases the mouse may also be used, for example with a soft keyboard). The keyboard input is passed through the operating system to the input method, which decodes it. Because different Chinese character sequences (words, sentences) may have the same code, the input method usually provides a candidate list for the user to choose from.
  • the Pinyin input method might include the following steps:
  • Pinyin analysis: split the input string into pinyin syllables, for example zhuanli → [zhuan] [li]. Sometimes the split is not unique: fangan → [fang] [an] or [fan] [gan] (corresponding to "program" and "disgust" respectively).
  • The input method can support abbreviated pinyin, allowing the user to input forms such as zl, zhl, zhuanl, zhli, and so on. Because some users' pronunciation is non-standard, fuzzy pinyin such as zuanli can also be supported. Forms such as double pinyin may also be adopted.
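  • As an illustration of the pinyin analysis step, the following minimal sketch enumerates every way to split a code string into syllables. The function name `segment` and the tiny syllable table are assumptions made for this example only; a real input method would use the full pinyin inventory and would also handle abbreviated and fuzzy forms.

```python
# Hypothetical, deliberately tiny syllable table used only for this sketch.
SYLLABLES = {"zhuan", "li", "fang", "an", "fan", "gan"}


def segment(code: str) -> list[list[str]]:
    """Return every way to split `code` into syllables found in SYLLABLES."""
    if not code:
        return [[]]  # one valid split of the empty string: no syllables
    results = []
    for end in range(1, len(code) + 1):
        head = code[:end]
        if head in SYLLABLES:
            # Keep this syllable and recursively split the remainder.
            for tail in segment(code[end:]):
                results.append([head] + tail)
    return results


unique = segment("zhuanli")    # a single split
ambiguous = segment("fangan")  # two competing splits
```

An ambiguous input such as "fangan" yields more than one split, which is one reason the input method has to offer a candidate list rather than a single result.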
  • The lexicon can contain various kinds of language information, such as:
  • The input method can also be constructed on the basis of words. Because the word is the smallest nominal unit in Chinese, modern input methods make extensive use of word information. For example, when the user inputs the syllable "zhuan" alone, it is difficult to determine which of the several characters with that pronunciation is wanted. Similarly, when the user enters "li", it is difficult to determine which character is intended. However, if the user inputs the two syllables "zhuanli" consecutively, it is almost certain that the user wants the word "patent". This greatly improves the accuracy of the input method.
  • the above two language information is indispensable in the input method vocabulary, and the input method vocabulary of the present invention may also include other information that is advantageous for improving the accuracy of the input method, for example:
  • the word frequency information is usually represented by a number, which is used to indicate the frequency of use of the word; the more frequently the word is used, the higher the frequency.
  • the word order information is usually also a number, but it indicates only the relative importance of the term.
  • this data can also be omitted, and the relative position of the terms in the thesaurus is used to express the importance of the terms. For example, you can think of words that are in front of the thesaurus more important than words that are listed later, so that the former is placed before the candidate list.
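  • The two ordering signals described above (an explicit frequency number, or simply the position of a term in the lexicon) can be sketched as follows. The `Entry` type, the `candidates` function, and the sample words are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    word: str
    code: str
    freq: Optional[int] = None  # None when the lexicon stores no frequency


def candidates(lexicon: list[Entry], code: str) -> list[str]:
    """Order matches by frequency (high first), then by lexicon position."""
    matches = [(pos, e) for pos, e in enumerate(lexicon) if e.code == code]
    # A missing frequency counts as 0, so ordering falls back to position.
    matches.sort(key=lambda p: (-(p[1].freq or 0), p[0]))
    return [e.word for _, e in matches]


with_freq = [Entry("A", "li", 5), Entry("B", "li", 9), Entry("C", "zhuan", 1)]
without_freq = [Entry("A", "li"), Entry("B", "li")]
by_frequency = candidates(with_freq, "li")
by_position = candidates(without_freq, "li")
```

When no frequency is stored, the words keep their lexicon order, which matches the idea that earlier terms are treated as more important.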
  • the present invention proposes to divide the input method lexicon into two parts: a system vocabulary and a cell word set.
  • the system vocabulary is used to record common vocabulary to meet the input needs of most people in most cases, while the individualized needs of a user are recorded by the cell word set.
  • the present invention can generate a large number of cell lexicons manually or automatically, and then each user can select his own cell vocabulary to obtain a cell word set.
  • the cell word set fits each user well, because the users choose the personalized part themselves.
  • a single cell lexicon can also directly constitute a cell word set.
  • The cell word set can have multiple representations. For example: (1) on the client side, the plurality of cell lexicons are combined into one lexicon, that is, the cell word set exists as a single independent lexicon; this lexicon may or may not store the source (that is, the originating cell lexicon) of each term. (2) on the client side, the plurality of cell lexicons are stored in parallel, that is, the cell word set exists as multiple coexisting independent lexicons, which may be scanned in sequence during code conversion.
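  • The two representations can be contrasted with a small sketch. The dictionary layout (lexicon name mapped to a code-to-words table) and the function names are assumptions for illustration; the source does not prescribe a storage format.

```python
# Hypothetical layout: lexicon name -> {code: [words]}.
CellLexicons = dict[str, dict[str, list[str]]]


def merge_with_source(cell_lexicons: CellLexicons) -> dict[str, list[tuple[str, str]]]:
    """Representation (1): one merged lexicon that remembers each term's source."""
    merged: dict[str, list[tuple[str, str]]] = {}
    for name, lexicon in cell_lexicons.items():
        for code, words in lexicon.items():
            for word in words:
                merged.setdefault(code, []).append((word, name))
    return merged


def scan_parallel(cell_lexicons: CellLexicons, code: str) -> list[str]:
    """Representation (2): keep lexicons separate and scan them in sequence."""
    found: list[str] = []
    for lexicon in cell_lexicons.values():
        found.extend(lexicon.get(code, []))
    return found


sample = {
    "movies": {"abc": ["w1"]},
    "biology": {"abc": ["w2"], "xyz": ["w3"]},
}
merged = merge_with_source(sample)
scanned = scan_parallel(sample, "abc")
```

The merged form answers a query with one lookup, while the parallel form makes it cheap to drop or replace a single cell lexicon.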
  • For cell word sets, some language information is relatively complex (for example, language connection information): it is difficult both to obtain and to store. Preferably, therefore, the types of language information stored in a cell word set (and in each individual cell lexicon) may be fewer than the types stored in the system vocabulary.
  • The types of language information stored in the cell word set may also be more numerous than those stored in the system vocabulary. For example, word order or position information is generally stored in a cell lexicon but generally not in the system vocabulary.
  • the input method system of this embodiment may further include a user vocabulary 106 for recording the user's input habits to better meet the personalized needs of the user.
  • the embodiment may further include: an automatic update module 107, configured to receive the trigger, and download the required update data from the server according to the existing cell vocabulary list.
  • the user's input method system stores a list of information of the cell vocabulary being applied, and then compares it with the information on the server side. If an update is required, the download update is completed according to the preset update policy.
  • The update data may be an entire cell lexicon: for example, once it is known that a cell lexicon needs updating, all of its term information may be downloaded directly. The update data may also be only part of the term information in a cell lexicon: for example, once it is known that the cell lexicon needs updating, only the changed term information, found by comparing the lexicons, is downloaded.
  • the server side can also merge the changed term information in multiple cell thesaurus into a new thesaurus as the update data for download.
  • the server side can merge the multiple cell lexicons into one vocabulary and then send them to the client as a cell word set, that is, the data addition task of the cell vocabulary is completed by the server.
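  • A minimal sketch of the comparison step, assuming that each cell lexicon is identified by an ID with an integer version number and that terms map to frequencies. Both functions and the data layout are hypothetical; the source only states that the local list is compared with the server-side list.

```python
def lexicons_to_update(local: dict[str, int], server: dict[str, int]) -> list[str]:
    """IDs of locally installed cell lexicons whose server version is newer."""
    return sorted(
        lex_id for lex_id, version in local.items()
        if server.get(lex_id, version) > version
    )


def changed_terms(old: dict[str, int], new: dict[str, int]) -> tuple[dict[str, int], list[str]]:
    """Partial update: terms added or changed in `new`, and terms removed."""
    added_or_changed = {w: f for w, f in new.items() if old.get(w) != f}
    removed = sorted(w for w in old if w not in new)
    return added_or_changed, removed


local_list = {"movies": 1, "biology": 3}
server_list = {"movies": 2, "biology": 3, "songs": 1}
stale = lexicons_to_update(local_list, server_list)
delta = changed_terms({"a": 1, "b": 2}, {"a": 1, "b": 5, "c": 1})
```

Lexicons that exist only on the server ("songs" here) are ignored, since the user has not selected them.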
  • the embodiment may further include: an adding module 108, configured to add the downloaded cell lexicon information to the cell word set.
  • The adding module 108 can adopt various feasible adding policies. For example, each time one cell lexicon finishes downloading, it is added to the cell word set; or, after all the cell lexicons to be updated have been downloaded, they are merged and then added to the cell word set.
  • the add module 108 can be used when the cell word set is first formed, or when the vocabulary is updated.
  • the adding module 108 can be used to download the entire cell lexicon or to download a partial vocabulary information in a cell vocabulary.
  • If the lexicon adding process can be completed in a short time (for example, no more than 1 second), its impact is small and the adding process can be inserted directly into the user's input process.
  • Otherwise, the lexicon adding process should be performed in a separate cached lexicon. During this process, the original lexicon of the input method is not affected and the user can keep using it normally.
  • Once the cached lexicon has been created, it can directly replace the original lexicon of the input method. Because this replacement can be fast, interference with the user's normal use is minimized.
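  • The build-then-replace idea can be sketched as follows. The class `InputMethod` and its methods are hypothetical names; the sketch only shows that the new lexicon is assembled off to the side and then swapped in with a single assignment, leaving the lexicon in use untouched until that moment.

```python
Lexicon = dict[str, list[str]]


class InputMethod:
    def __init__(self, system_vocab: Lexicon, cell_word_set: Lexicon) -> None:
        self.active = self._merge(system_vocab, cell_word_set)

    @staticmethod
    def _merge(*lexicons: Lexicon) -> Lexicon:
        """Build a brand-new lexicon; the inputs are never modified."""
        merged: Lexicon = {}
        for lexicon in lexicons:
            for code, words in lexicon.items():
                bucket = merged.setdefault(code, [])
                for word in words:
                    if word not in bucket:
                        bucket.append(word)
        return merged

    def lookup(self, code: str) -> list[str]:
        return list(self.active.get(code, []))

    def add_cell_lexicon(self, downloaded: Lexicon) -> None:
        cache = self._merge(self.active, downloaded)  # built in a separate cache
        self.active = cache                           # fast replacement


im = InputMethod({"li": ["w1"]}, {"li": ["w2"]})
before = im.active
im.add_cell_lexicon({"li": ["w3"], "zl": ["w4"]})
```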
  • The embodiment may further include: a cell lexicon deactivation module 109, configured to receive a user instruction (for example, a click on a menu item) and remove from the cell word set the term records belonging to the cell lexicon selected by the user, so as to deactivate one or more cell lexicons.
  • The removing process may be: receiving a user instruction, deleting the cell lexicon selected by the user from the list, and re-adding the cell lexicons remaining in the list to obtain a new cell word set. Because the deleted cell lexicon is no longer in the list, the new cell word set does not contain its words, and the effect is equivalent to deleting that lexicon. A cell lexicon that exists independently within the cell word set can be deactivated by deleting it directly or by adding a delete tag.
  • the removing process may be: receiving a user instruction to delete a term record belonging to the cell vocabulary selected by the user from the cell word set, wherein the cell word set records the cell lexicon information to which each term belongs.
  • the removing process may be: receiving a user instruction, in the cell word set, adding a deletion mark to the entry record belonging to the cell vocabulary selected by the user, where the cell word set records the cell to which the term belongs Thesaurus information.
  • If the source of each entry is recorded in the large lexicon of the cell word set, then when the user chooses to delete a certain cell lexicon, the input method system is notified (or detects this itself) and removes the terms that came from that cell lexicon. The removal can be done by deleting each term from the data structure and releasing its space, or by adding a delete tag. Terms with delete tags are ignored in subsequent use (no space is freed, but this is easier to implement).
  • The advantage of this approach is that, when the cell word set is large, the system overhead of deleting a small number of entries is relatively small.
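  • The two removal strategies (physical deletion and delete tags) might look like this in outline. `Record`, `deactivate_by_tag`, `deactivate_by_removal`, and `lookup` are hypothetical names, and each record is assumed to store the cell lexicon it came from.

```python
from dataclasses import dataclass


@dataclass
class Record:
    word: str
    code: str
    source: str            # cell lexicon the term came from
    deleted: bool = False  # delete tag


def deactivate_by_tag(cell_word_set: list[Record], source: str) -> None:
    """Mark records in place; no space is freed."""
    for record in cell_word_set:
        if record.source == source:
            record.deleted = True


def deactivate_by_removal(cell_word_set: list[Record], source: str) -> list[Record]:
    """Return a new cell word set without the records from `source`."""
    return [r for r in cell_word_set if r.source != source]


def lookup(cell_word_set: list[Record], code: str) -> list[str]:
    """Tagged records are skipped during retrieval."""
    return [r.word for r in cell_word_set if r.code == code and not r.deleted]


records = [Record("w1", "li", "movies"), Record("w2", "li", "biology")]
pruned = deactivate_by_removal(records, "biology")
deactivate_by_tag(records, "movies")
```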
  • The method may include: Step 201: load the system vocabulary and the cell word set; the cell word set is obtained from at least one cell lexicon selected from a plurality of cell lexicons stored on the server side; the words in each cell lexicon have at least one common attribute;
  • Step 202 Receive input information of a user.
  • Step 203 Perform a search in the system vocabulary and the cell word set according to the received input information, and obtain corresponding candidates;
  • Step 204 Receive selection information of the user, and output the specified candidate on the screen.
  • the loading process described in step 201 may be: merging the cell word set and the system vocabulary into a vocabulary, and placing it in the cache.
  • The system vocabulary and the cell word set in the input method system are scanned, combined into one lexicon, and loaded into the cache, so that in subsequent operations the result can be used directly, just like the system vocabulary.
  • The loading of the system vocabulary and the loading of the cell word set can also be performed separately. For example, in a simple case only the system vocabulary is loaded; then, when the user so chooses or the input method system decides automatically (for example, under a preset policy), loading of the cell word set is triggered, and the cell word set is merged into the system vocabulary and placed in the cache for retrieval during user input.
  • the loading process described in step 201 may also be: placing the cell word set and the system vocabulary as two or more independent vocabularies in a cache, and setting the lexicon priority according to the preset rule; The priority is used for display ordering of candidates.
  • the cell word set is placed in a space other than the system vocabulary, and the cell word set is also retrieved while the system vocabulary is retrieved.
  • The priority of the system vocabulary and the cell word set needs to be specified in this case. For example, if by default the cell word set has a higher priority than the system vocabulary, then when the candidates are output, words that belong to the cell word set are forced in front of words that belong to the system vocabulary.
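  • A sketch of retrieval over independent lexicons with priorities, assuming each lexicon is paired with a numeric priority where a larger number ranks first. The function name and data layout are illustrative assumptions.

```python
def ranked_candidates(code: str, lexicons: list[tuple[int, dict[str, list[str]]]]) -> list[str]:
    """Search every lexicon; words from higher-priority lexicons come first."""
    ordered: list[str] = []
    # sorted() is stable, so lexicons with equal priority keep their given order.
    for _, lexicon in sorted(lexicons, key=lambda pair: -pair[0]):
        for word in lexicon.get(code, []):
            if word not in ordered:  # a word found twice keeps its best position
                ordered.append(word)
    return ordered


system_vocab = {"li": ["s1", "s2"]}
cell_word_set = {"li": ["c1", "s2"]}
ranking = ranked_candidates("li", [(1, system_vocab), (2, cell_word_set)])
```

Here the cell word set has the higher priority, so its words precede those that appear only in the system vocabulary.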
  • If the cell word set is a single large lexicon, there are two independent lexicons in the cache. If the cell word set itself consists of multiple cell lexicons, there may be multiple independent lexicons in the cache; in that case the priority of each lexicon must be set, and the priority is used for the display order of the candidates.
  • When the cell word set is a single large lexicon, the cell lexicon to which each term belongs and the corresponding cell lexicon priority may also be recorded in the cell word set.
  • For example, a cell word set includes two cell lexicons, "office language" and "network terminology", which normally have the same priority.
  • When the input method system recognizes that the current application is the Word word processing program, the "office language" cell lexicon can be given a higher weight; when the user switches to the QQ chat program, the "network terminology" cell lexicon can be given a higher weight.
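  • The dynamic adjustment can be sketched as a lookup from the current application to the cell lexicon whose priority is raised. The application keys, the boost factor, and the function name are assumptions for this example; how the foreground application is detected is outside the sketch.

```python
BASE_PRIORITY = {"office language": 1.0, "network terminology": 1.0}

# Hypothetical mapping from application type to the lexicon that gets boosted.
BOOST_FOR_APP = {
    "word_processor": "office language",
    "chat": "network terminology",
}


def adjusted_priorities(app: str, factor: float = 2.0) -> dict[str, float]:
    """Return a copy of the base priorities with one lexicon weighted up."""
    priorities = dict(BASE_PRIORITY)
    target = BOOST_FOR_APP.get(app)
    if target in priorities:
        priorities[target] *= factor
    return priorities


in_chat = adjusted_priorities("chat")
in_unknown_app = adjusted_priorities("browser")
```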
  • The lexicon publishing system can be used in the process by which the input method client first downloads cell lexicons from the server side to obtain the cell word set, and can also be used to update existing cell lexicons.
  • the thesaurus publishing system shown in FIG. 3 may specifically include:
  • the cell lexicon generating unit 301 includes an interface module 3011 for receiving input information, a generating module 3012 for obtaining a cell lexicon based on the received information, and an identification module 3013 for specifying identification and version information for each cell lexicon; the words in each cell lexicon have at least one common attribute;
  • the communication unit 302 is configured to receive trigger information and transmit corresponding cell lexicon information to the client.
  • The cell lexicon generating unit 301 is generally located on the server side, for unified management and maintenance of the cell lexicons. Some or all of its modules may also be located in a client (which may be a client other than the input method client). For example, if the interface module 3011 and the generating module 3012 are located in the client, the user can send the generated cell lexicon file directly to the server, and the server then specifies the identification and version information.
  • the triggering information may be a user's selection operation, or may be a trigger information automatically sent by the input method system client, or may be an automatic detection trigger of the server.
  • For example, the server or the client retrieves the user's IP address or the current input environment and automatically recommends the corresponding cell lexicon to the user; an update message sent by the client is also a type of trigger information.
  • the generation of cell lexicon can be done manually, automatically, etc.
  • The lexicon creator needs to provide the following information through the interface module 3011 (which includes, for example, a lexicon editing page): name, category, number of entries, version, description, lexicon author, example entries, terms (including pronunciation information), and so on.
  • After the submit button is clicked, the information is saved to the database, and the lexicon generation program is then started.
  • The lexicon generation program saves this information directly to a file for users to download.
  • the cell vocabulary is a file containing data that may have:
  • the format of the cell lexicon can also be processed as necessary. For example, the internal terms are sorted. Of course, these tasks can be completed in the generating module 3012, and then the data files sorted by the terms are provided to the user as a cell lexicon file for downloading.
  • the cell lexicon can also be encrypted for purposes such as copyright protection.
  • the server side may further include an encryption module
  • the input method client may further include a decryption module.
  • the identification module 3013 also assigns a unique ID and a version number to each cell vocabulary.
  • the cell vocabulary in the embodiment shown in FIG. 3 can have various representations, for example, generally, multiple vocabulary information is directly stored in the cell vocabulary; or, the cell vocabulary can only store index information.
  • the index information corresponds to other cell lexicons.
  • the cell lexicon storing index information can be generally applied to:
  • the server side stores a plurality of cell lexicons obtained from the received information, and then generates a new cell lexicon according to some commonality among those cell lexicons (i.e., using the received information indirectly). For simplicity, only index information may be stored in the new cell lexicon.
  • the server then merges the corresponding lexicons and transmits them.
  • the cell lexicon generating unit 301 of the lexicon publishing system in this embodiment may further include: a modification and update module 3014, configured to modify and update the information already stored in a cell lexicon, and to notify the identification module 3013 to generate new version information for that cell lexicon.
  • the modification may be done manually, or may be performed by adjusting the cell lexicon according to a preset strategy, for example: other users add a new entry to a cell lexicon; or, according to a preset policy, the words in two cell lexicons are combined into one cell lexicon; or, according to Internet word frequency statistics, the words in the cell lexicon whose Internet word frequency does not meet a preset condition are deleted or reordered.
  • a cell lexicon is a file with a specific suffix, such as the .scd suffix.
  • the .scd suffix is associated with a specific application through the registry.
  • the operating system starts the corresponding application module according to the association rule (for example, the adding module in the embodiment shown in FIG. 1), and completes the addition of the cell lexicon data.
  • the second is to directly add the cell lexicon data online by clicking the link on the page.
  • there are two ways: save and execute. If the user chooses to save the cell lexicon file, the procedure is the same as in the previous method. If the user chooses to execute, the system saves the cell lexicon file in the system's temporary folder and then runs it.
  • the internal implementation mechanism is the same as the first one, except that the file is downloaded to the system temporary folder, so the user is not required to specify the download location. At the same time, the system will clean up the temporary directory when necessary, so although the cell vocabulary has been downloaded to the temporary directory, it is actually invisible to the user.
  • the process of adding the downloaded cell vocabulary to the cell word set may also include a conversion step, such as sorting the original unordered terms in the lexicon to improve the efficiency of the addition. If there is a conversion step, the converted thesaurus file will be used; otherwise the original thesaurus file will be used directly.
  • if the server has already completed the conversion and sorting during lexicon generation, the client does not need to repeat that work when the data is added.
  • the input method system ie, the input method client
  • the list of cell lexicons can take various feasible forms, for example, copying all active cell lexicons into a specified directory, or saving a list of file names (this list can be placed in a local disk file, stored in the registry, or stored remotely, such as on the network).
  • the process of adding the data of a cell lexicon to the cell word set can run immediately after the download is completed (for example, by notifying the input method client to start the add operation); or it can wait for the input method to discover the update on its own (for example, the next time the user starts the input method)
  • and then perform the add operation: scan the cell lexicon list, read in each cell lexicon, and add it to the cell word set.
  • the batch mode combines the words in all cell lexicons into one large temporary lexicon and then adds it to the cell word set in a single step. This approach is simpler to implement, but the user must wait until all the lexicons have been merged before using the newly added cell lexicons.
  • the incremental mode is: whenever several entries have been read, they are added to the cell word set. If the merge takes a long time, the user can use the new entries while the merge is still running, but this places higher demands on the system design.
  • in the incremental merge mode, the data can be used during the merge process, so there is no need to notify the input method system when the merge is completed.
  • in the batch merge mode, it is necessary to notify the input method system that the new lexicon is ready for use after the merge is completed.
  • An alternative approach is to access the storage space of the input method directly and update the data there; although the input method is not notified, the data has been updated, so the new data is in fact ready to use.
  • an optimization step may be further included for optimizing the repeated words in the thesaurus, for example, combining the repeated terms.
  • for a merged term, information such as the identifiers of the cell lexicons to which it belongs may be recorded in its source attribute.
  • different priorities of the cell lexicons to which the word belongs can also be recorded for different input environments, so that candidates can be ranked by priority in each input environment.
  • the thesaurus publishing system in this embodiment can set the update identification work to be completed on the server side.
  • the lexicon publishing system in this embodiment may further include: an identifying module 303, configured to compare the cell lexicon list saved by the server with the cell lexicon list sent by the client, and to transmit the required update data to the client according to the comparison result.
  • for example, a list of the changed cell lexicons may be sent to the client, and the client may decide on and initiate a download request; or, the server may push the changed cell lexicons directly to the client to complete the update.
  • the update data may be an entire cell lexicon: for example, once it is identified that a cell lexicon needs to be updated, all the term information of that cell lexicon is transmitted. The update data may also be part of the term information in a cell lexicon: for example, once it is identified that a cell lexicon needs to be updated, a further term-by-term comparison is made and only the changed term information is transmitted.
  • the embodiment may further include: a merging module 304, configured to merge the term information of a plurality of cell lexicons into one download lexicon, and to notify the communication unit 302 to transmit the download lexicon to the client.
  • the merging module can be used in various scenarios, for example: combining a plurality of cell lexicons selected by a user into one lexicon for transmission;
  • or merging the changed term information of a plurality of cell lexicons that need to be updated into a new lexicon and then transmitting it; or merging the cell lexicons that correspond to the index information in a cell lexicon into a new lexicon and then transmitting it.
  • the lexicon to be updated involves a cell word set for recording extended words and related information in an input method system; the cell word set is obtained from at least one cell lexicon selected from a plurality of cell lexicons stored on the server side; the words in each cell lexicon have at least one common attribute;
  • Step 401: Receive a trigger, and compare the existing cell lexicon list with the server-side cell lexicon list to obtain the list of lexicons that need to be updated; the trigger may be manual or automatic. Step 402: Download the entry information of the cell lexicons that need to be updated and add it to the cell word set.
  • the method embodiment may further include the step 403: manually or automatically upgrading the cell vocabulary stored on the server side, and changing the corresponding version information.
  • the upgrade may be done manually, or may be performed by adjusting the cell lexicon according to a preset strategy, for example: other users add a new entry to a cell lexicon; or, according to a preset policy, the words in two cell lexicons are merged into one cell lexicon; or, according to Internet word frequency statistics, the words in the cell lexicon whose Internet word frequency does not meet a preset condition are deleted or reordered.
  • each cell lexicon can have a unique ID, which can be a naturally increasing integer, a network address, or other information (as long as two different cell lexicons have different IDs).
  • Each cell vocabulary can also have a version information, which can be a serial number or the time of the last modification.
  • a change in the version information indicates that the lexicon file needs to be updated. For example, if the last update time is used as the version information and the update time recorded by the client differs from the file update time saved on the server, the lexicon file needs to be updated.
  • the input method client sends the existing cell lexicon list to the server, either over TCP/IP or over HTTP; the server then checks whether the cell lexicon corresponding to each ID in the list needs to be updated.
  • the input method client initiates an update request, and the server sends back all the list information of the cell lexicon (or the changed cell vocabulary list), and the input method client determines which existing vocabulary needs to be updated.
  • the input method client sends the existing cell vocabulary list to the server, and the server sends back the version information of the cell vocabulary corresponding to the ID in the list, and the input method client determines which existing vocabulary needs to be updated.
  • the server may send a list of the changed cell lexicons to the client, and the client decides on and initiates the download request (for example, selecting only some of the lexicons on the server to update).
  • the cell vocabulary that has changed can be directly pushed to the client by the server to complete the update.
  • the data downloaded in step 402 may be the entire thesaurus or a part of the term information in a cell vocabulary, for example, changed term information.
  • for the data addition process in step 402, an incremental mode, a batch mode, or a combination of both may be employed.
  • the adding manner may be: each time one cell lexicon to be updated finishes downloading, its entry information is added to the cell word set; or: after all the cell lexicons to be updated have been downloaded, they are added to the cell word set together.
  • if the lexicon addition process can be completed in a short period of time (for example, no more than 1 second), it can be inserted directly into the user's input process, since its impact is small. However, if it cannot be completed in a short period of time and may affect the user's experience, the lexicon addition process should be performed in a separate cached lexicon. During that process the original lexicon of the input method is not affected, and the user can keep using it normally. Once the cached lexicon has been built, it can directly replace the original lexicon of the input method. Since this replacement can be fast, interference with the user's normal use is avoided.
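
The cached-lexicon approach in the last item can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `InputMethodLexicon`, its methods, and the pinyin-style input codes are hypothetical and do not come from the patent. The point it shows is that the slow merge works on a separate copy and the live lexicon is replaced in one fast step.

```python
class InputMethodLexicon:
    """Hypothetical holder for the live cell word set used for lookups."""

    def __init__(self, entries=None):
        # Maps an input code (assumed here to be pinyin) to a list of words.
        self.live = dict(entries or {})

    def lookup(self, code):
        return list(self.live.get(code, []))

    def add_via_cache(self, new_entries):
        # Build the updated lexicon in a separate cache; self.live is
        # never modified while the (possibly slow) merge is running.
        cache = {code: list(words) for code, words in self.live.items()}
        for code, word in new_entries:
            bucket = cache.setdefault(code, [])
            if word not in bucket:
                bucket.append(word)
        # Replacing the reference is fast, so lookups are barely disturbed.
        self.live = cache


lexicon = InputMethodLexicon({"shijian": ["时间"]})
lexicon.add_via_cache([("aizelasi", "艾泽拉斯"), ("shijian", "时间")])
```

A real client would also have to decide how the swap is synchronized with concurrent lookups; that detail is outside what the text specifies.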

Abstract

A character input system comprises an input interface unit, an information conversion unit and a display output unit, a system lexicon for recording basic words and related information, and a cell word set for recording extended words and related information. The cell word set is obtained from at least one of a plurality of cell lexicons stored on a server. The words in each cell lexicon share at least one common attribute. The application provides a dynamic, cell-based lexicon within the lexicon capacity of existing input method systems: small lexicons can be added manually by the user or automatically, so that, through personalized selection or customization, automatic updating, and the combined use of the cell word set and the system lexicon, nearly all of the user's vocabulary is covered. Because the coverage of each user's lexicon is in theory expanded to the maximum, the first-choice word accuracy of the input method, and with it typing accuracy, is greatly improved.

Description

Method for inputting characters, input method system, and method for updating a lexicon
This application claims priority to Chinese Patent Application No. 200710099474.6, filed with the Chinese Patent Office on May 22, 2007 and entitled "Method for inputting characters, input method system, and method for updating a lexicon", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of character information input, and in particular to a character input method, an input method system, a method for updating a lexicon, and a lexicon publishing system.
Background Art
With the popularization and development of computer and Internet technology, users from different professional fields and with different interests and usage habits place increasingly high demands on the intelligence of input method systems.
When evaluating the intelligence of an input method, the accuracy of the first-choice word is a very important criterion, and the ordering of candidates is also very important. In an input method system, the input method lexicon, which records entry information and word frequency information, is one of the major factors affecting both. If the target word the user needs exists in the lexicon, and its word frequency information closely matches the user's usage habits, then the first-choice word accuracy and candidate ordering for that user will better meet the user's needs.
However, current input method lexicons can generally cover only part of the vocabulary people use, mainly words in common use; a considerable part of the vocabulary cannot all be included. If the vocabulary of all users were added, the lexicon would contain on the order of millions of entries. With an oversized lexicon there are too many homophones (or too many duplicate codes) and more candidates, so users who do not need these words are seriously disturbed; moreover, such a huge lexicon would inevitably consume a large share of computing resources such as CPU and memory, which is unacceptable on a personal computer.
Specifically, besides many common words (for example "现在" (now), "时间" (time), "多少" (how many)), each person also enters some niche vocabulary, for example game terms such as "艾泽拉斯" (Azeroth) and "德鲁伊" (Druid), or the latest film "云水谣", and so on. These words need to be entered frequently by certain small groups, for example World of Warcraft players, chemical engineers, biology teachers, and so on. However, the proportion of all users who use these words is very low, so existing input method lexicons exclude them to save space and improve efficiency. As a result, in the prior art the first-choice word accuracy is very low when a user enters such niche vocabulary, which seriously affects the user experience and the expression of the user's ideas.
In summary, a technical problem that those skilled in the art urgently need to solve is how to improve the input method lexicon so that it both fits the resource budget of existing computing devices and greatly improves each user's input efficiency.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a new input method lexicon model and a complete input solution that fit the resource budget of existing computing devices, do not occupy additional computing resources, and significantly improve each user's input efficiency.
To solve the above problem, the present invention discloses an input method system that includes an input interface unit, an information conversion unit, and a display output unit, and further includes:
a system lexicon, configured to record basic words and their related information;
a cell word set, configured to record extended words and their related information; the cell word set is obtained from at least one cell lexicon acquired from a plurality of cell lexicons stored on the server side; the words in each cell lexicon share at least one common attribute;
the information conversion unit searches the system lexicon and the cell word set to obtain the corresponding candidates.
Preferably, the input method system may further include: an automatic update module, configured to obtain the required update data from the server side through comparison, based on the list of cell lexicons already present locally.
Preferably, the types of related information stored in the cell word set are fewer than or equal to the types of related information stored in the system lexicon; at least one of the cell lexicons stored on the server side is created manually.
Further, the input method system may also include: a user lexicon.
Further, the input method system may also include: an adding module, configured to add the cell lexicon entry information acquired from the server side to the cell word set; the cell word set is either a single independent lexicon or a collection of several lexicons existing side by side. Preferably, the adding process is performed in a separate cache lexicon.
Further, the input method system may also include: a cell lexicon deactivation module, configured to receive a user instruction and remove from the cell word set the entry records belonging to the cell lexicon selected by the user.
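
As a rough illustration of the adding module and the deactivation module, the Python sketch below keeps, for every word, the set of cell lexicons it came from, so that deactivating one lexicon removes only the entries that no other active lexicon still provides. The class `CellWordSet`, the lexicon IDs, and the pinyin-style codes are assumptions made for the example, not the patent's data format.

```python
from collections import defaultdict


class CellWordSet:
    """Hypothetical cell word set: input code -> word -> source lexicon IDs."""

    def __init__(self):
        self.entries = defaultdict(dict)

    def add_lexicon(self, lexicon_id, terms):
        # Adding module: record each (code, word) pair and remember its source.
        for code, word in terms:
            self.entries[code].setdefault(word, set()).add(lexicon_id)

    def deactivate(self, lexicon_id):
        # Deactivation module: drop the lexicon from every source set and
        # delete words (and codes) that no remaining lexicon provides.
        for code in list(self.entries):
            words = self.entries[code]
            for word in list(words):
                words[word].discard(lexicon_id)
                if not words[word]:
                    del words[word]
            if not words:
                del self.entries[code]

    def lookup(self, code):
        return sorted(self.entries.get(code, {}))


word_set = CellWordSet()
word_set.add_lexicon("wow", [("deluyi", "德鲁伊"), ("aizelasi", "艾泽拉斯")])
word_set.add_lexicon("fantasy", [("deluyi", "德鲁伊")])
word_set.deactivate("wow")
```

After the call to `deactivate("wow")`, "德鲁伊" is still found because the hypothetical "fantasy" lexicon also contains it, while "艾泽拉斯" is gone.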
According to an embodiment of the present invention, a character input method is also disclosed, including:
loading the system lexicon and the cell word set; the cell word set is used to record extended words and their related information; the cell word set is obtained from at least one cell lexicon acquired from a plurality of cell lexicons stored on the server side; the words in each cell lexicon share at least one common attribute;
receiving the user's input information;
searching the system lexicon and the cell word set according to the received input information to obtain the corresponding candidates;
receiving the user's selection information and committing the specified candidate to the screen.
The loading is: merging the cell word set and the system lexicon into one lexicon and placing it in a cache; or, the loading is: placing the cell word set and the system lexicon in the cache as two or more independent lexicons and setting lexicon priorities according to preset rules; the priorities are used to order the displayed candidates.
Preferably, the cell word set records, for each entry, the cell lexicon to which it belongs and the corresponding cell lexicon priority; the priority is used to order the displayed candidates.
Further, the method may also include: during loading, dynamically adjusting the cell lexicon priorities according to the environment in which the input method is being used.
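
The second loading option, in which independent lexicons carry priorities that drive candidate ordering, might look like the following Python sketch. The function name `rank_candidates`, the use of integer priorities where larger means earlier, the "system" key, and the example lexicons are all assumptions for illustration; the patent does not prescribe a scoring rule.

```python
def rank_candidates(code, system_lexicon, cell_lexicons, priorities):
    """Return the candidates for `code`, highest-priority lexicon first.

    system_lexicon: dict mapping input code -> list of words
    cell_lexicons:  dict mapping lexicon ID -> (dict code -> list of words)
    priorities:     dict mapping lexicon ID (or "system") -> int
    """
    scored = {}
    sources = {"system": system_lexicon, **cell_lexicons}
    for lexicon_id, lexicon in sources.items():
        score = priorities.get(lexicon_id, 0)
        for word in lexicon.get(code, []):
            # A word found in several lexicons keeps its best priority.
            scored[word] = max(score, scored.get(word, score))
    # sorted() is stable, so ties keep the order in which words were found.
    return sorted(scored, key=lambda word: -scored[word])


system = {"gongshi": ["公示", "公式"]}
cells = {"math": {"gongshi": ["公式"]}, "military": {"gongshi": ["攻势"]}}

general = rank_candidates("gongshi", system, cells, {"system": 1})
in_math_context = rank_candidates(
    "gongshi", system, cells, {"system": 1, "math": 2}
)
```

Passing a different `priorities` mapping for each input environment is one simple way to model the dynamic adjustment described above: in the assumed math context, "公式" moves to the first position.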
According to another embodiment of the present invention, a method for updating a lexicon is also disclosed. The lexicon to be updated involves a cell word set for recording extended words and their related information; the cell word set is obtained from at least one cell lexicon selected from a plurality of cell lexicons stored on the server side; the words in each cell lexicon share at least one common attribute. The method includes:
receiving a trigger, and comparing the existing cell lexicon list with the server-side cell lexicon list to obtain the list of lexicons that need to be updated;
downloading the entry information of the cell lexicons that need to be updated and adding it to the cell word set.
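
The two steps above can be sketched in Python as follows, assuming each lexicon list is a mapping from lexicon ID to version information and that any difference in version means the lexicon must be refreshed. The function names and the `download` callable are hypothetical stand-ins; the actual transport is not modeled.

```python
def lexicons_needing_update(local_list, server_list):
    """Compare two {lexicon_id: version} mappings and return stale IDs."""
    return sorted(
        lexicon_id
        for lexicon_id, local_version in local_list.items()
        if lexicon_id in server_list
        and server_list[lexicon_id] != local_version
    )


def update_cell_word_set(cell_word_set, local_list, server_list, download):
    """Download every stale lexicon and add its entries to the cell word set.

    `download` is a hypothetical callable returning (code, word) pairs.
    """
    for lexicon_id in lexicons_needing_update(local_list, server_list):
        for code, word in download(lexicon_id):
            bucket = cell_word_set.setdefault(code, [])
            if word not in bucket:
                bucket.append(word)
        # Remember the new version so the lexicon is not fetched again.
        local_list[lexicon_id] = server_list[lexicon_id]


local = {1: 3, 2: 7, 5: 1}
server = {1: 4, 2: 7, 3: 1}
stale = lexicons_needing_update(local, server)

word_set = {"deluyi": ["德鲁伊"]}
fake_download = lambda lexicon_id: [("aizelasi", "艾泽拉斯")]
update_cell_word_set(word_set, local, server, fake_download)
```

Here only lexicon 1 is refreshed: lexicon 2 is unchanged, lexicon 5 is not known to the server, and lexicon 3 was never installed on the client.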
Further, the method may also include: manually or automatically upgrading the cell lexicons stored on the server side and changing the corresponding version information. Preferably, the adding process is performed in a separate cache lexicon.
According to another embodiment of the present invention, a lexicon publishing system is also disclosed, including:
a cell lexicon generating unit, including: an interface module, configured to receive input information; a generating module, configured to generate a cell lexicon according to the received information; and an identification module, configured to assign identification and version information to each cell lexicon; the words in each cell lexicon share at least one common attribute;
a communication unit, configured to receive a trigger and transmit the corresponding cell lexicon entry information to the client.
Further, the cell lexicon generating unit may also include: a modification and update module, configured to modify and update the information already stored in a cell lexicon and to notify the identification module to generate new version information for that cell lexicon.
Further, the lexicon publishing system may also include: an identifying module, configured to compare the server-side cell lexicon list with the client's cell lexicon list and, according to the comparison result, transmit the required update data to the client through the communication unit.
Preferably, the cell lexicon obtained from the received information stores a plurality of entries; or, the cell lexicon obtained from the received information stores index information, and the index information corresponds to other cell lexicons.
Further, the lexicon publishing system may also include: a merging module, configured to merge the entry information of a plurality of cell lexicons into one download lexicon and to notify the communication unit to transmit the download lexicon to the client.
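
To make the relationship between index-only cell lexicons and the merging module concrete, here is a hedged Python sketch. It assumes a server-side store in which each lexicon is a dictionary with either a "terms" list or an "index" list of other lexicon IDs; this layout and the function names are invented for the example.

```python
def resolve_terms(lexicon_id, store, seen=None):
    """Collect the terms of a lexicon, following index references."""
    seen = set() if seen is None else seen
    if lexicon_id in seen:
        return []  # guard against lexicons that reference each other
    seen.add(lexicon_id)
    lexicon = store[lexicon_id]
    terms = list(lexicon.get("terms", []))
    for referenced_id in lexicon.get("index", []):
        terms.extend(resolve_terms(referenced_id, store, seen))
    return terms


def build_download_lexicon(lexicon_ids, store):
    """Merge several cell lexicons into one sorted, duplicate-free list."""
    merged = []
    seen_terms = set()
    for lexicon_id in lexicon_ids:
        for term in resolve_terms(lexicon_id, store):
            if term not in seen_terms:
                seen_terms.add(term)
                merged.append(term)
    return sorted(merged)


store = {
    "films": {"terms": [("yunshuiyao", "云水谣")]},
    "games": {"terms": [("deluyi", "德鲁伊"), ("aizelasi", "艾泽拉斯")]},
    "entertainment": {"index": ["films", "games"]},
}
download = build_download_lexicon(["entertainment", "games"], store)
```

Sorting on the server in this way is one possible form of the conversion step, so that the client can add the entries without sorting them again.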
Compared with the prior art, the present invention has the following advantages:
The input method lexicon of the present invention consists of two parts, a system lexicon and a cell word set. The system lexicon still serves all users and consists mainly of general vocabulary, while the cell word set is obtained by having the server provide a plurality of cell lexicons, from which the user selects the one or more that suit them best, and then merging these. This ensures that the input method lexicon finally used by the user stays at the capacity level of existing lexicons, yet through each person's personalized selection and use it covers nearly all of the current user's vocabulary and carries comparatively more accurate word frequency information. The first-choice word accuracy for the current user is therefore greatly improved, and candidates can be ordered in a way that better matches the current user's habits.
The present invention implements a dynamic, cell-based lexicon at the lexicon capacity level of existing input methods. Small lexicons can be added manually by the user or automatically by the computer; through each person's personalized selection or customization, and through the combined use of the cell word set and the system lexicon, nearly all of the current user's vocabulary can be covered. The user can thus enter almost any word or sentence needed, and the first-choice word accuracy of the input method rises substantially. In theory the coverage of an individual's lexicon is expanded to the maximum, so typing accuracy improves considerably.
By using multiple cell lexicons, which can be updated through automatic upgrades, the present invention keeps an individual's lexicon in step with the times. The vocabulary stays fresh without any effort by the user, so that, as the Internet changes rapidly, the first-choice word accuracy improves, typing speed increases noticeably, unknown words appear less often, and fewer page turns are needed.
In addition, the present invention provides a lexicon publishing system that helps users manually create cell lexicons for the groups they belong to and update and modify those cell lexicons; an automatic update function is also added on the client. Users can thereby obtain accurately classified cell lexicons and have them updated automatically, so that the user stays in step with the world and never falls behind.
Brief Description of the Drawings
FIG. 1 is a structural block diagram of an embodiment of an input method system;
FIG. 2 is a flow chart of the steps of an embodiment of a method for character input;
FIG. 3 is a structural block diagram of an embodiment of a lexicon publishing system;
FIG. 4 is a flow chart of the steps of an embodiment of a method for automatically updating a lexicon.
Detailed Description
To make the above objects, features, and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The present invention can be applied to input method platforms with various input modes, including keyboard symbols, handwriting, and voice input. That is, the input information may include a coded character string, and may also include handwriting input or voice input, because these input modes also need a lexicon to order candidates. Since the information conversion process in these input modes is well known, it is not described in detail here. The following description takes coded character string input as the only example.
参照图 1, 示出了本发明一种输入法系统的实施例, 具体可以包括: 输入接口单元 101, 用于接收用户输入的输入信息;  Referring to FIG. 1, an embodiment of an input method system of the present invention is shown, which may specifically include: an input interface unit 101, configured to receive input information input by a user;
信息转换单元 102,用于根据用户输入的输入信息(例如,接收键盘字符), 在系统词库 104和细胞词集 105中进行查询, 完成编码转换,得到相应的候选 项;  The information conversion unit 102 is configured to perform a query in the system vocabulary 104 and the cell word set 105 according to the input information input by the user (for example, receiving keyboard characters), complete the code conversion, and obtain corresponding candidates;
显示输出单元 103, 用于显示候选项, 并接收用户选择, 上屏输出。  The display output unit 103 is configured to display candidates, and receive user selection, upper screen output.
系统词库 104, 用于记录基础字词及其相关信息;  System vocabulary 104, used to record basic words and related information;
细胞词集 105 , 表示细胞词库的集合, 用于记录扩展字词及其相关信息; 所述细胞词集由从服务器端所存储的多个细胞词库中选取的至少一个细胞词 库得到; 每个细胞词库中的字词至少具有一个共同属性。  a cell word set 105, representing a set of cell lexicons for recording extended words and related information; the cell word set is obtained by at least one cell vocabulary selected from a plurality of cell lexicons stored at the server side; Words in each cell's thesaurus have at least one common attribute.
During character input, a preset strategy is used to search the system lexicon and the cell word set and perform the code conversion, thereby completing an input process that meets the user's personalized needs.
A cell lexicon is a lexicon whose words share some common property and which is used by a particular group, a particular individual, or a subset of people (that is, the words in each cell lexicon share at least one common attribute). Examples include a lexicon of the latest films, a lexicon of the latest song titles, a World of Warcraft lexicon, a biology lexicon, a lexicon of the names of everyone at Tsinghua University, a lexicon of the names of all staff of a given company, and a lexicon of place names in Haidian District. A cell lexicon may be obtained by automatic classification and parsing performed by a managing body or a server cluster. Alternatively, a server platform may be provided on which users manually create cell lexicons for the groups they belong to, so as to better meet the needs of specialized groups. That is, preferably, at least one of the plurality of cell lexicons in this embodiment is manually generated by a user.
In the prior art, input method platforms can run on many kinds of computing devices, such as personal computers, personal digital assistants, and mobile terminals. The present invention is likewise applicable to all of these computing devices, and its operating environment need not be restricted.
The following briefly describes the input process for characters that require code conversion, such as Chinese, Korean, and Japanese, taking Chinese input as an example.
In Chinese, the characters that serve as the basic language units do not correspond directly to the keys on a keyboard, so input conversion by an input method is required. Chinese characters are first mapped by an encoding to letters, digits, or other symbols that can be typed directly; the most commonly used encoding is pinyin (in forms such as abbreviated pinyin, double pinyin, and fuzzy pinyin). The user types the encoded string for the Chinese characters on the keyboard (in some cases a mouse may be used, for example with a soft keyboard). The operating system passes the keyboard input to the input method, which decodes it. Because different Chinese character sequences (words or sentences) may have the same encoding, the input method usually presents a candidate list from which the user chooses.
For example, a pinyin input method may include the following steps:
a. Pinyin parsing: the input string is segmented into pinyin syllables, for example zhuanli → [zhuan][li]. Such a segmentation is sometimes not unique: for example, fangan → [fang][an] or [fan][gan] (corresponding to "方案" (plan) and "反感" (dislike), respectively). Preferably, the input method may support abbreviated pinyin, allowing the user to enter forms such as zl, zhl, zhuanl, zhli, and so on. To accommodate users whose pronunciation is non-standard, fuzzy pinyin may also be supported, for example zuanli. Double pinyin and similar schemes may also be used.
b. Chinese character decoding: according to the pinyin sequence obtained by segmentation, the corresponding words are looked up in the lexicon, or a corresponding sentence is generated by an algorithm.
c. The user selects the desired content, which is committed to the screen (possibly together with a word-creation or sentence-creation step).
Because different Chinese character sequences may correspond to the same encoding, the input method must infer the user's actual intent for a given encoded string. Doing so requires the support of a lexicon.
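By way of illustration only, the segmentation in step a, including the non-unique case, can be sketched as follows. The syllable table `SYLLABLES` is a deliberately tiny, hypothetical stand-in for a full pinyin syllable inventory, and the function name `segment` is an assumption made for this sketch, not something defined by the embodiment.

```python
# Hypothetical, deliberately tiny syllable table; a real input method
# would use the complete pinyin syllable inventory.
SYLLABLES = {"zhuan", "li", "fang", "an", "fan", "gan"}


def segment(code: str) -> list[list[str]]:
    """Return every way of splitting `code` into syllables from SYLLABLES."""
    if not code:
        return [[]]  # one way to segment the empty string: no syllables
    results = []
    for end in range(1, len(code) + 1):
        head = code[:end]
        if head in SYLLABLES:
            # Keep this syllable and segment the remainder recursively.
            for tail in segment(code[end:]):
                results.append([head] + tail)
    return results
```

With this table, "zhuanli" has a single segmentation, while "fangan" yields both [fan][gan] and [fang][an], which is why the decoding step must rank more than one reading.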
For the purposes of the present invention, the lexicon may contain various kinds of linguistic information, for example:
(1) Entries
Although an input method can also be built on individual characters, words are the smallest meaning-bearing units in Chinese, so modern input methods make extensive use of word entry information. For example, when the user enters the pinyin "zhuan" on its own, it is hard to determine which of the characters "转", "专", "赚", "砖", ... is intended. Likewise, when the user enters "li", it is hard to determine which of "里", "李", "力", "利", ... is intended. However, if the user enters the two syllables "zhuanli" consecutively, it can essentially be concluded that the user wants the word "专利" (patent). This greatly improves the accuracy of the input method's first candidate.
(2) Word frequency
Homophonous characters are numerous, and homophonous words also exist. In such cases, all options must be listed for the user to choose from. The position of a candidate, however, strongly affects the usability of the input method. In general, placing more commonly used words nearer the front benefits the user; that is, word frequency is an important basis for ranking candidates.
In addition, many existing input methods integrate automatic sentence construction. Word frequency information is also an important basis for constructing sentences.
The two kinds of linguistic information above are indispensable in an input method lexicon. The input method lexicon of the present invention may also include other information that helps improve input accuracy, for example:
Linguistic connection relationships. When constructing sentences, the input method must consider not only word frequency but also the connection relationships between words. For example, "的" often follows adjectives, nouns, and pronouns, whereas "地" often follows adverbs. In this case, if the user enters "de", it is not enough to look only at which of "的" and "地" has the higher word frequency.
Once the linguistic information needed by the input method is stored in the lexicon, the user can carry out character input. However, different users need different linguistic information. For example:
(1) Different entries. Almost every field has its own specialized vocabulary. Such words are rarely used in other fields and need not be considered when building a general input method lexicon. An example is the computing term "缓存" (cache).
(2) Different importance of entries. Different users may need the same words, but the importance of those words varies from user to user. Take the homophones "研究" (research) and "烟酒" (tobacco and alcohol): the former is used more in academic settings, while the latter is used more in everyday life. Both may be needed, so when the user enters the pinyin "yanjiu", both appear in the candidate list. Because their importance differs, their relative positions in the candidate list affect the user's experience.
The importance of an entry to a user can be reflected in the lexicon in various ways, used alone or in combination, for example:
Word frequency information. Word frequency is usually represented by a number indicating how often the word is used; generally, the more frequently a word is used, the higher its word frequency.
Word order information. Word order is usually also a number, but it expresses only the relative importance of the entry.
Alternatively, position information. For convenience, this data may be omitted, and the relative position of an entry within the lexicon may be used to express its importance. For example, a word placed earlier in the lexicon may be regarded as more important than a word placed later, so that the former appears earlier in the candidate list.
Since it is not feasible to generate a dedicated input method lexicon for every individual user, the present invention proposes dividing the input method lexicon into two parts: a system lexicon and a cell word set. The system lexicon records commonly used vocabulary to meet the input needs of most people in most situations, while the personalized needs of a particular user are recorded in the cell word set. To make the cell word set fit each user more closely, the present invention can generate a large number of cell lexicons manually or automatically, and each user then selects the cell lexicons he or she needs to obtain a cell word set. A cell word set obtained in this way fits each user well, because the personalized part is chosen by the user.
When the user selects a single cell lexicon, that cell lexicon can directly constitute the cell word set.
When the user selects multiple cell lexicons, the cell word set can take several forms. For example: (1) On the client, the multiple cell lexicons are merged into one lexicon, so that the cell word set exists as a single independent lexicon; this lexicon may or may not store the source of each entry (that is, the cell lexicon it belongs to). (2) On the client, the multiple cell lexicons are stored side by side, so that the cell word set exists as multiple coexisting independent lexicons, which are scanned in turn during code conversion. (3) On the client, some of the multiple cell lexicons are merged (for example, lexicons with similar attributes), so that the cell word set exists as multiple coexisting independent lexicons, some of which result from merging several cell lexicons.
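Form (1), in which several cell lexicons are merged into a single lexicon that records the source of each entry, might be sketched as follows. The data layout (a mapping from encoded string to a list of words) and the name `merge_cell_lexicons` are assumptions made for illustration; the embodiment does not prescribe a storage format.

```python
from collections import defaultdict


def merge_cell_lexicons(
    lexicons: dict[str, dict[str, list[str]]],
) -> dict[str, list[tuple[str, str]]]:
    """Merge {lexicon_name: {code: [words]}} into {code: [(word, lexicon_name)]}.

    Keeping the source name with every entry is the optional variant of
    form (1); it lets a lexicon be deactivated later without a full rebuild.
    """
    merged: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name, entries in lexicons.items():
        for code, words in entries.items():
            for word in words:
                merged[code].append((word, name))
    return dict(merged)
```

Dropping the second tuple element gives the variant that does not store source information, at the cost of requiring a rebuild whenever one cell lexicon is removed.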
Some linguistic information, such as linguistic connection relationships, is relatively complex: it is hard to obtain and hard to store. Preferably, therefore, the cell word set (and in practice each cell lexicon) may store fewer types of linguistic information than the system lexicon. Conversely, the cell word set may also store more types of linguistic information than the system lexicon; for example, word order information or position information is generally stored in cell lexicons and generally absent from the system lexicon.
Further, the input method system of this embodiment may also include a user lexicon 106, configured to record the user's input habits so as to better meet that user's personalized needs.
The platform provided by the server holds a large number of cell lexicons, and many users will modify and update them in order to improve them. How to deliver the latest and best version of a cell lexicon to the input method users who have selected it is therefore another technical problem that the present invention needs to solve.
Preferably, this embodiment may further include an automatic update module 107, configured to accept a trigger and, according to the list of existing cell lexicons, download the required update data from the server side. For example, the user's input method system stores an information list of the cell lexicons currently in use and compares it with the information on the server side; if an update is needed, the download and update are completed according to a preset update strategy. The update data may be an entire cell lexicon: for example, once it is known that a cell lexicon needs updating, all entry information of that lexicon is downloaded directly. The update data may also be part of the entry information of a cell lexicon: for example, once it is known that a cell lexicon needs updating, only the changed entry information, identified by entry comparison, is downloaded. The server side may also merge the changed entry information from multiple cell lexicons into one new lexicon to be downloaded as the update data.
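The comparison performed by the automatic update module could look roughly like the sketch below. It assumes, purely for illustration, that each cell lexicon is identified by an ID with an integer version that increases on every change, and that entries can be represented as a set of strings; the function names are hypothetical.

```python
def lexicons_to_update(installed: dict[str, int], published: dict[str, int]) -> list[str]:
    """Return the IDs of installed cell lexicons whose published version is newer.

    `installed` is the client's list of lexicons in use (ID -> version);
    `published` is the server's view. Lexicons the server no longer lists
    are left alone in this sketch.
    """
    return sorted(
        lex_id
        for lex_id, version in installed.items()
        if published.get(lex_id, version) > version
    )


def changed_entries(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Entry comparison for a partial update: returns (added, removed)."""
    return new - old, old - new
```

The first function corresponds to deciding whether an update is needed at all; the second corresponds to the partial-update variant, in which only the differences are transferred instead of the whole lexicon.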
If the user selects multiple cell lexicons, the server side may merge them into one lexicon and send it to the client as the cell word set; that is, the task of adding cell lexicon data is performed on the server side.
If the user selects multiple cell lexicons, the addition of cell lexicon data may also be performed by the input method system itself. In this case, this embodiment may further include an adding module 108, configured to add the entry information of downloaded cell lexicons to the cell word set. The adding module 108 may use any feasible adding strategy. For example, each cell lexicon may be added to the cell word set as soon as its update has been downloaded; or all cell lexicons to be updated may be downloaded first and then merged into the cell word set together.
The adding module 108 may be used when the cell word set is first formed or when its lexicons are updated. It may be used both when an entire cell lexicon is downloaded and when only part of the entry information of a cell lexicon is downloaded.
Preferably, if the lexicon adding process can be completed within a short time (for example, no more than 1 second), its impact is small and it can be inserted directly into the user's input process. If it cannot be completed within a short time, and so might affect the user's experience, the adding process should be carried out in a separate cached lexicon. During this process the input method's original lexicon is unaffected and the user can continue to use it normally. Once the cached lexicon has been built, it simply replaces the input method's original lexicon. Because this replacement can be very fast, interference with the user's normal use is kept to a minimum.
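One way to realize the cached-lexicon approach is to build the merged copy off to the side and then swap a single reference, as in the sketch below. The class `LexiconHolder` and its methods are hypothetical names introduced for this illustration, and the in-memory dictionary stands in for whatever storage the input method actually uses.

```python
import threading


class LexiconHolder:
    """Holds the live lexicon; lookups never wait for a rebuild."""

    def __init__(self, entries: dict[str, list[str]]):
        self._entries = {code: list(words) for code, words in entries.items()}
        self._write_lock = threading.Lock()  # serializes writers only

    def lookup(self, code: str) -> list[str]:
        # Reads the current reference once; a concurrent swap cannot
        # change the dictionary this lookup is already using.
        return self._entries.get(code, [])

    def add_lexicon(self, new_entries: dict[str, list[str]]) -> None:
        with self._write_lock:
            # Build the cached (staged) lexicon without touching the live one.
            staged = {code: list(words) for code, words in self._entries.items()}
            for code, words in new_entries.items():
                bucket = staged.setdefault(code, [])
                for word in words:
                    if word not in bucket:
                        bucket.append(word)
            # The replacement itself is a single reference assignment.
            self._entries = staged
```

The slow part (copying and merging) happens on the staged copy, so the only step visible to the input path is the final assignment.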
Preferably, to give the user further control over lexicon management, this embodiment may further include a cell lexicon deactivation module 109, configured to receive a user instruction (for example, through a menu selection) and remove from the cell word set the entry records belonging to the cell lexicon selected by the user, thereby deactivating one or more cell lexicons.
The removal process may be as follows: a user instruction is received, the cell lexicon selected by the user is deleted from the list, and the cell lexicons remaining in the list are added again to obtain a new cell word set. Because the deleted cell lexicon is no longer in the list, the new cell word set does not contain its words, which is equivalent in effect to that lexicon having been deleted. For a cell lexicon that exists independently within the cell word set, deleting it directly or marking it as deleted is sufficient to deactivate it.
Alternatively, the removal process may be as follows: a user instruction is received, and the entry records belonging to the cell lexicon selected by the user are deleted from the cell word set, where the cell word set records the cell lexicon to which each entry belongs. Alternatively, the removal process may be as follows: a user instruction is received, and a delete mark is added in the cell word set to the entry records belonging to the cell lexicon selected by the user, where the cell word set records the cell lexicon to which each entry belongs.
That is, the large lexicon serving as the cell word set records the source of each entry. When the user chooses to delete a cell lexicon, the input method system is notified to remove (or removes on its own initiative) the entries that came from that cell lexicon. The removal may delete the entry from the data structure and release its space, or it may be implemented with a delete mark. Entries carrying a delete mark are ignored in subsequent use (the space is not released, but this is easier to implement). The advantage of this approach is that, when there are many cell lexicons, removing the entries of a few of them incurs relatively little overhead.
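The two removal variants that rely on per-entry source information (physical deletion and delete marks) can be contrasted in a short sketch. The `Entry` record and the function names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    source: str            # ID of the cell lexicon the entry came from
    deleted: bool = False  # delete mark


def deactivate_by_mark(word_set: dict[str, list[Entry]], source: str) -> None:
    """Mark entries from `source` as deleted; no space is released."""
    for entries in word_set.values():
        for entry in entries:
            if entry.source == source:
                entry.deleted = True


def deactivate_by_removal(word_set: dict[str, list[Entry]], source: str) -> None:
    """Physically drop entries from `source` and release their space."""
    for code in list(word_set):
        kept = [e for e in word_set[code] if e.source != source]
        if kept:
            word_set[code] = kept
        else:
            del word_set[code]


def candidates(word_set: dict[str, list[Entry]], code: str) -> list[str]:
    """Look up a code, ignoring entries that carry a delete mark."""
    return [e.word for e in word_set.get(code, []) if not e.deleted]
```

Both variants yield the same candidates afterwards; marking keeps the data structure intact, while removal shrinks it.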
Referring to FIG. 2, an embodiment of a method for performing character input is shown, which may specifically include:
Step 201: load the system lexicon and the cell word set; the cell word set is obtained from at least one cell lexicon selected from a plurality of cell lexicons stored on the server side; the words in each cell lexicon share at least one common attribute;
Step 202: receive the user's input information;
Step 203: according to the received input information, search the system lexicon and the cell word set to obtain the corresponding candidates;
Step 204: receive the user's selection information and commit the specified candidate to the screen.
An important question in this embodiment is how candidates are retrieved when multiple lexicons coexist. The loading process in step 201 may be: merging the cell word set and the system lexicon into one lexicon and placing it in the cache.
When the input method starts, it scans the system lexicon and the cell word set available in the input method system, merges them into one lexicon, and loads it into the cache, so that in subsequent operation it can be used in the same way as the system lexicon. The loading of the system lexicon and the loading of the cell word set may be performed separately. For example, in the simple case only the system lexicon needs to be loaded; in some cases, the user's choice or the input method system itself (for example, when a preset strategy is satisfied) triggers the loading of the cell word set, which is then merged into the system lexicon and placed in the cache for retrieval during user input.
Further, the loading process in step 201 may also be: placing the cell word set and the system lexicon in the cache as two or more independent lexicons, and setting lexicon priorities according to preset rules; the priorities are used for ordering the displayed candidates.
That is, during loading, the cell word set is placed in a designated space separate from the system lexicon, and the cell word set is searched whenever the system lexicon is searched. Preferably, the priorities of the system lexicon and the cell word set need to be specified. For example, if by default the cell word set has a higher priority than the system lexicon, then when candidates are output, all words belonging to the cell word set are forced ahead of the words belonging to the system lexicon.
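Retrieval across independent lexicons with priorities might be sketched as follows. The representation of a lexicon as a `(priority, entries)` pair with per-word frequencies, the duplicate handling, and the name `lookup_candidates` are assumptions for this sketch.

```python
def lookup_candidates(
    code: str,
    lexicons: list[tuple[int, dict[str, list[tuple[str, int]]]]],
) -> list[str]:
    """Collect candidates for `code` from several independent lexicons.

    Each lexicon is (priority, {code: [(word, frequency), ...]}).
    Words from a higher-priority lexicon are placed ahead of all words
    from lower-priority lexicons; within one lexicon, higher frequency
    comes first. A word found in several lexicons is listed once, at
    its highest-priority position.
    """
    seen: set[str] = set()
    ordered: list[str] = []
    for _priority, entries in sorted(lexicons, key=lambda lex: lex[0], reverse=True):
        by_freq = sorted(entries.get(code, []), key=lambda wf: wf[1], reverse=True)
        for word, _freq in by_freq:
            if word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered
```

With the cell word set given the higher priority, a word that the system lexicon ranks low can still appear first, which is the forced ordering described above.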
When the cell word set exists as one large lexicon, two independent lexicons exist in the cache. When the cell word set itself consists of multiple independent cell lexicons, multiple independent lexicons may exist in the cache. In that case a priority needs to be set for each lexicon; the priorities are used for ordering the displayed candidates.
Preferably, when the cell word set exists as one large lexicon, the cell word set may also record the cell lexicon to which each entry belongs and the priority of that cell lexicon, so as to reflect the differences between cell lexicons.
Where priorities are set for individual cell lexicons (whether the cell lexicons exist independently or are merged into one large lexicon), the cell lexicon priorities may preferably be adjusted dynamically during loading according to the environment in which the input method is being used. For example, suppose the cell word set includes two cell lexicons, "office terms" and "internet terms", which normally have the same priority. When the input method system recognizes that the current application is the Word word processor, it may give extra weight to the "office terms" cell lexicon; when the user switches to the QQ chat program, it may give extra weight to the "internet terms" cell lexicon.
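A possible form of such context-dependent weighting is sketched below. The multiplicative boost, the application labels, the lexicon IDs, and the placeholder words are all hypothetical; the embodiment states only that the relevant cell lexicon is weighted, not how.

```python
# Hypothetical boost table: application label -> {cell lexicon ID: weight}.
CONTEXT_BOOSTS = {
    "word_processor": {"office_terms": 2.0},
    "chat": {"internet_terms": 2.0},
}


def rank(candidates: list[tuple[str, float, str]], app: str) -> list[str]:
    """Order (word, base_score, lexicon_id) candidates for the current app.

    A candidate's base score is multiplied by the boost of its lexicon in
    the current application context; lexicons without a boost keep 1.0.
    """
    boosts = CONTEXT_BOOSTS.get(app, {})
    scored = sorted(
        candidates,
        key=lambda c: c[1] * boosts.get(c[2], 1.0),
        reverse=True,
    )
    return [word for word, _score, _lex in scored]
```

The same two candidates can therefore swap positions when the user moves from a word processor to a chat program, without any change to the stored lexicons.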
Referring to FIG. 3, an embodiment of a lexicon publishing system suitable for the foregoing input method system is shown (for clarity, the term "input method client" is used below). The lexicon publishing system can be used when the input method client first downloads cell lexicons from the server side to obtain a cell word set, and can also be used to update existing cell lexicons.
The lexicon publishing system shown in FIG. 3 may specifically include:
a cell lexicon generating unit 301, including an interface module 3011 for receiving input information, a generation module 3012 for obtaining a cell lexicon from the received information, and an identification module 3013 for assigning an identifier and version information to each cell lexicon; the words in each cell lexicon share at least one common attribute;
a communication unit 302, generally located on the server side, configured to accept trigger information and transmit the corresponding cell lexicon entry information to the client.
The cell lexicon generating unit 301 is generally located on the server side, where it manages and maintains the cell lexicons centrally. Some or all of the modules of the cell lexicon generating unit 301 may, however, be located on a client (which may be a client other than the input method client). For example, the interface module 3011 and the generation module 3012 may be located on the client, in which case the user sends the generated cell lexicon file directly to the server side, and the server side assigns the identifier and version information.
The trigger information may be a selection operation by the user, trigger information sent automatically by the input method client, or a trigger resulting from automatic detection on the server side. For example, the server or the client may check the user's IP address or current input environment and automatically recommend the corresponding cell lexicon to the user. An update message sent by the client is also a kind of trigger information.
Cell lexicons can be generated manually, automatically, or by other means. The manual generation process is briefly described below.
The person creating the lexicon provides the following information through the interface module 3011 (for example, a lexicon editing page): name, category, number of entries, version, description, lexicon author, sample entries, the entries themselves (including pronunciation information), and so on. When the submit button is clicked, this information is saved to a database and the lexicon generation program is started. In the simplest case, the lexicon generation program saves this information as text in a file for users to download.
A cell lexicon is a file, and the data it contains may include:
[Figure imgf000015_0001: data contained in a cell lexicon file; image not reproduced.]
To make adding a cell lexicon more efficient, the format of the cell lexicon may also be processed as needed, for example by sorting its entries. This work can be done in the generation module 3012, after which the data file with sorted entries is provided to users for download as the cell lexicon file.
For purposes such as copyright protection, the cell lexicon may also be encrypted. Correspondingly, it must be decrypted when the cell lexicon is installed. That is, preferably, the server side may further include an encryption module, and the input method client may further include a decryption module.
To facilitate updating, the identification module 3013 also assigns a unique ID and a version number to each cell lexicon.
The cell lexicons in the embodiment shown in FIG. 3 may take several forms. In general, a cell lexicon directly stores the information of multiple entries. Alternatively, a cell lexicon may store only index information that refers to other cell lexicons. A cell lexicon that stores index information is generally applicable in the following situation: the server side stores multiple cell lexicons obtained from received information, and a new cell lexicon is then generated from some property that these cell lexicons have in common (that is, the received information is used indirectly). For simplicity of implementation, only index information need be stored in the new cell lexicon; when a user requests it, the server side merges the corresponding lexicons and transmits the result.
Further, to allow cell lexicons to be updated quickly, the cell lexicon generating unit 301 of the lexicon publishing system in this embodiment may also include a modification and update module 3014, configured to modify and update the information already stored in a cell lexicon and to notify the identification module 3013 to generate new version information for that cell lexicon. The modification may be done manually, or by adjusting the cell lexicon according to a preset strategy. For example: other users add new entries to a cell lexicon; or the entries of two cell lexicons are merged into one cell lexicon according to a preset strategy; or, according to Internet word frequency statistics, entries in a cell lexicon whose Internet word frequency does not meet a preset condition are deleted or reordered.
The first way is to download the cell lexicon to the local machine and then open the file by double-clicking it, which completes the data addition. A cell lexicon is a file with a specific extension, for example .scd. When the input method system is installed, it associates the .scd extension with a specific application through the registry. When the user double-clicks a file with the .scd extension, the operating system launches the corresponding application module according to this association (for example, the adding module in the embodiment shown in FIG. 1), which adds the cell lexicon data.
The second way is to add the cell lexicon data directly online by clicking a link on a web page. After the user clicks the cell lexicon link on the page, there are two options: save and run. If the user saves the cell lexicon file, the procedure is the same as in the first way. If the user chooses to run it, the system saves the cell lexicon file in the system's temporary folder and then runs it. The internal mechanism is the same as in the first way; the difference is that the file is downloaded to the system temporary folder, so the user does not need to specify a download location. The system also cleans up the temporary directory when necessary, so although the cell lexicon has been downloaded to the temporary directory, it is effectively invisible to the user.
Preferably, the process of adding a downloaded cell lexicon to the cell word set may also include a conversion step, for example sorting the originally unordered entries in the lexicon to make the addition more efficient. If the conversion step exists, the converted lexicon file is used; otherwise the original lexicon file is used directly. Of course, if the server has already done the conversion and sorting when generating the lexicon, the client does not need to repeat that work when adding the data.
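A minimal Python sketch of why sorting can make the addition more efficient is given below. It assumes, purely for illustration, that entries are (input code, word) pairs and that the cell word set is itself kept sorted; `convert` and `add_sorted` are hypothetical names.

```python
import heapq


def convert(entries):
    """Conversion step: sort (code, word) pairs and drop exact duplicates."""
    return sorted(set(entries))


def add_sorted(word_set, converted):
    """Merge two sorted lists in a single linear pass, skipping duplicates."""
    merged = []
    for item in heapq.merge(word_set, converted):
        if not merged or merged[-1] != item:
            merged.append(item)
    return merged
```

Once both sides are sorted, the merge visits each entry once instead of searching the word set for every new entry.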
During data addition, the input method system (that is, the input method client) needs to maintain a list of the cell lexicons currently in use. The list can take any workable form: for example, all active cell lexicons can be copied into a designated directory, or a list of file names can be saved (this list can be kept in a local disk file, in the registry, or at a remote location such as on the network).
The data of a cell lexicon can be added to the cell word set immediately after the download completes (for example, by notifying the input method client to start the add operation). Alternatively, the add operation can start when the input method itself discovers the update (for example, the next time the user starts the input method): it scans the cell lexicon list, reads in each cell lexicon in turn, and adds it to the cell word set.
The following description takes as an example a cell word set in the form of a single, independent large lexicon. The addition can be done in two ways: incremental and batch.
In batch mode, the words of all cell lexicons are first merged into one large temporary lexicon, which is then added to the cell word set in a single operation. This is simpler to implement, but the user must wait until all lexicons have been merged before the newly added cell lexicons can be used. In incremental mode, entries are added to the cell word set as soon as a few of them have been read in. If the merge takes a long time, the user can keep using the input method while the merge proceeds, but this places higher demands on the system design.
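The difference between the two modes can be sketched in Python as follows. The cell word set is modelled as a plain `set`, and `add_batch`, `add_incremental`, and `chunk_size` are assumptions introduced for this illustration.

```python
def add_batch(word_set, lexicons):
    """Batch mode: build one temporary lexicon, then add it in one step."""
    temporary = set()
    for entries in lexicons:
        temporary.update(entries)
    word_set |= temporary  # new words become visible only at this point


def add_incremental(word_set, lexicons, chunk_size=2):
    """Incremental mode: add a few entries at a time.

    Yields the word set size after each chunk, marking the points at which
    the input method could already use the words added so far.
    """
    for entries in lexicons:
        for start in range(0, len(entries), chunk_size):
            word_set.update(entries[start:start + chunk_size])
            yield len(word_set)
```

The generator form makes the trade-off visible: every `yield` is a moment when the word set is in a usable intermediate state, which is what makes the incremental design more demanding.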
With incremental merging, the new data can be used while the merge is in progress, so the input method system does not need to be notified when the merge completes. With batch merging, the input method system must be notified after the merge that the new lexicon is ready for use. An alternative is to access the input method's storage directly and update the data there; although the input method receives no notification, the data has already been updated, so the new data is in fact available.
Preferably, the data addition may also include an optimization step that handles words repeated across lexicons, for example by merging duplicate entries. To record such a word accurately, its source attribute can hold information such as the identifiers of the several cell lexicons to which it belongs. Further, the different priorities of those cell lexicons can also be recorded, so that candidates are ranked using different cell lexicon priorities in different input environments.
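As a hedged illustration, duplicate merging with a source attribute might be sketched in Python as below. The mapping from word to a set of lexicon IDs, the `env_priority` table, and the rule that a smaller number means a higher priority are all assumptions of this sketch.

```python
def add_lexicon(word_set, lexicon_id, words):
    """Merge duplicates: each word is stored once, with all of its sources."""
    for word in words:
        word_set.setdefault(word, set()).add(lexicon_id)


def rank(word_set, candidates, env_priority):
    """Order candidates by the best priority among their source lexicons.

    env_priority maps a lexicon ID to its priority in the current input
    environment (assumption: a smaller number means a higher priority).
    """
    lowest = len(env_priority)  # used for lexicons with no stated priority

    def best(word):
        sources = word_set.get(word, set())
        return min((env_priority.get(s, lowest) for s in sources), default=lowest)

    return sorted(candidates, key=best)
```

Swapping in a different `env_priority` table for a different input environment reorders the same candidates without touching the word set itself.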
To help the input method client complete updates, the lexicon publishing system in this embodiment can perform the update identification on the server side. Preferably, the lexicon publishing system in this embodiment may further include an identification module 303, which compares the cell lexicon list stored on the server with the cell lexicon list sent by the client and, based on the comparison result, transmits the required update data to the client. For example, the server can send the client a list of the cell lexicons that have changed and let the client decide on and initiate the download requests; alternatively, the server can push the changed cell lexicons to the client directly to complete the update. The update data can be an entire cell lexicon: for example, once a cell lexicon is identified as needing an update, all of its entries are transmitted. The update data can also be part of the entries of a cell lexicon: for example, once a cell lexicon is identified as needing an update, the entries are compared and only those that have changed are transmitted.
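The choice between transmitting a whole cell lexicon and transmitting only changed entries could be sketched in Python as follows. The payload layout and the function names are hypothetical, and the sketch assumes the server can still see the entries of the client's current version.

```python
def build_update(old_entries, new_entries, full=False):
    """Server side: build either a full payload or an entry-level delta."""
    if full:
        return {"mode": "full", "entries": sorted(new_entries)}
    old, new = set(old_entries), set(new_entries)
    return {
        "mode": "delta",
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }


def apply_update(entries, update):
    """Client side: apply the payload and return the updated entry set."""
    if update["mode"] == "full":
        return set(update["entries"])
    return (set(entries) - set(update["removed"])) | set(update["added"])
```

A delta is smaller on the wire when only a few entries changed, at the cost of an entry-by-entry comparison on the server.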
To further improve the efficiency of lexicon publishing, this embodiment may also include a merging module 304, which merges the entry information of multiple cell lexicons into one download lexicon and notifies the communication unit 302 to transmit the download lexicon to the client. The merging module can be used in various scenarios: for example, merging several cell lexicons selected by the user into one lexicon before transmission; or merging the changed entries of several cell lexicons that need updating into one new lexicon before transmission; or merging the cell lexicons referenced by the index information in a cell lexicon into one new lexicon before transmission.
Referring to FIG. 4, an embodiment of a lexicon update method is shown. The lexicon to be updated involves the cell word set that records extended words and their related information in the input method system; the cell word set is obtained from at least one cell lexicon selected from a plurality of cell lexicons stored on the server side; and the words in each cell lexicon share at least one common attribute.
The method embodiment may specifically include:
Step 401: Receive a trigger, and compare the list of existing cell lexicons with the server-side cell lexicon list to obtain the list of lexicons that need updating. The trigger may be manual or automatic.
Step 402: Download the entry information of the cell lexicons that need updating and add it to the cell word set.
Preferably, the method embodiment may further include step 403: manually or automatically upgrade the cell lexicons stored on the server side and change the corresponding version information. The upgrade may be made manually or by adjusting the cell lexicon according to a preset policy. For example: other users add new entries to a cell lexicon; or the entries of two cell lexicons are merged into one cell lexicon according to a preset policy; or, based on Internet word-frequency statistics, entries in a cell lexicon whose Internet word frequency does not meet a preset condition are deleted or reordered.
To facilitate updating, each cell lexicon can have a unique ID. The ID can be an automatically incrementing integer, a network address, or other information, as long as two different cell lexicons are guaranteed to have different IDs. Each cell lexicon can also carry version information, which can be a serial number or the time of the last modification. A change in the version information indicates that the lexicon file needs updating. For example, if the time of the client's last update is used as the version information and it differs from the file update time stored on the server, the lexicon file needs updating.
The comparison in step 401 can be implemented in several ways, for example:
(1) The input method client sends its list of existing cell lexicons to the server, for example over TCP/IP or over HTTP; the server compares and determines whether the cell lexicon corresponding to each ID in the list needs updating.
(2) The input method client initiates an update request; the server returns the list information of all cell lexicons (or only the list of cell lexicons that have changed), and the input method client determines which existing lexicons need updating.
(3) The input method client sends its list of existing cell lexicons to the server; the server returns the version information of the cell lexicons corresponding to the IDs in the list, and the input method client determines which existing lexicons need updating.
These approaches differ in the network bandwidth they consume and in the computing load they place on each device; those skilled in the art can choose among them according to actual needs.
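The three comparison approaches listed above can be contrasted in a small Python sketch. The in-memory `SERVER` table, the integer versions, and the function names are assumptions for illustration; a real client and server would exchange these lists over the network.

```python
# Hypothetical server-side list: cell lexicon ID -> version.
SERVER = {"sports": 3, "medicine": 5, "law": 1}


def server_side_compare(client):
    """Approach (1): the client uploads its list; the server decides."""
    return sorted(i for i, v in client.items() if SERVER.get(i, v) != v)


def client_compare_full_list(client):
    """Approach (2): the server returns its whole list; the client decides."""
    received = dict(SERVER)  # every lexicon, even ones the client lacks
    return sorted(i for i, v in client.items() if received.get(i, v) != v)


def client_compare_requested(client):
    """Approach (3): the server returns versions only for the uploaded IDs."""
    received = {i: SERVER[i] for i in client if i in SERVER}
    return sorted(i for i, v in client.items() if received.get(i, v) != v)
```

All three return the same list of lexicons to update; they differ in which side does the comparison and in how much list data travels in each direction.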
When the identification is performed by the server, the server can send the client a list of the cell lexicons that have changed and let the client decide on and initiate the download requests (for example, by selecting only some of the lexicons to update); alternatively, the server can push the changed cell lexicons to the client directly to complete the update.
The data downloaded in step 402 can be an entire lexicon or part of the entries of a cell lexicon, for example the entries that have changed.
The data addition in step 402 can use incremental mode, batch mode, or a combination of the two. For example, each time the download of one updated cell lexicon completes, its entry information is added to the cell word set; or, the lexicons are added to the cell word set only after all cell lexicons to be updated have been downloaded.
In incremental mode, each lexicon is installed as soon as it has been updated. The advantage is that downloaded lexicons are unaffected by those not yet downloaded and take effect immediately. However, when many lexicons are downloaded, this can cause frequent add operations and increase the system load. Batch mode requires all lexicons to be downloaded locally before any is added. Because there are fewer add operations, the system load is lower. However, when the download takes a long time, and especially if a download fails partway through, lexicons that have already been downloaded may remain unusable for a long time. In practice the two modes can be combined: for example, each time a lexicon is downloaded successfully, check whether a predefined time interval has elapsed since the last add operation, and if so, perform the add operation.
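The combined mode described above could be sketched in Python roughly as follows. The class `LexiconInstaller`, its fields, and the choice to add the very first download immediately are assumptions of this sketch; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class LexiconInstaller:
    """Combine incremental and batch addition using a minimum interval."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # predefined interval, in seconds
        self.pending = []                 # downloaded but not yet added
        self.word_set = set()
        self.last_add = None              # time of the last add operation
        self.add_operations = 0

    def on_downloaded(self, entries, now):
        """Call after each successful download of one lexicon."""
        self.pending.append(entries)
        # Assumption: the first download is added at once; later ones wait
        # until the interval since the last add operation has elapsed.
        if self.last_add is None or now - self.last_add >= self.min_interval:
            self.flush(now)

    def flush(self, now):
        """Add every pending lexicon to the word set in one operation."""
        for entries in self.pending:
            self.word_set.update(entries)
        self.pending.clear()
        self.last_add = now
        self.add_operations += 1
```

A caller would also invoke `flush` once after the last download so that no lexicon stays pending indefinitely.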
If the lexicon addition can be completed in a short time (for example, no more than 1 second), its impact is small and it can be inserted directly into the user's input process. If it cannot be completed in a short time and might affect the user experience, the addition should be performed in a separate cache lexicon. During this process the input method's original lexicon is unaffected and the user can continue to use it normally. Once the cache lexicon has been built, it directly replaces the input method's original lexicon. Because the replacement can be very fast, interference with the user's normal use can be avoided.
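A minimal Python sketch of the cache-lexicon approach is shown below. The class name, the use of a `frozenset` for the active lexicon, and the prefix lookup are assumptions made for illustration; the point is only that the slow work happens on a copy and the final replacement is a single reference swap.

```python
import threading


class InputMethodLexicon:
    """Build additions in a separate cache, then swap it in."""

    def __init__(self, words):
        self._active = frozenset(words)      # the lexicon used for lookups
        self._write_lock = threading.Lock()  # serializes add operations only

    def lookup(self, prefix):
        """Lookups read the active lexicon and never wait for an addition."""
        active = self._active
        return sorted(w for w in active if w.startswith(prefix))

    def add_via_cache(self, new_words):
        """Copy the lexicon, add to the copy, then replace the original."""
        with self._write_lock:
            cache = set(self._active)  # slow part: works on a private copy
            cache.update(new_words)
            self._active = frozenset(cache)  # fast part: one reference swap
```

While `add_via_cache` runs, `lookup` keeps answering from the original lexicon; after the swap it sees the new words.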
The input method system, the character input method, the lexicon update method, and the lexicon publishing system provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the invention; the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. An input method system, comprising an input interface unit, an information conversion unit, and a display output unit, characterized in that it further comprises:
a system lexicon, for recording basic words and their related information;
a cell word set, for recording extended words and their related information; the cell word set is obtained from at least one cell lexicon acquired from a plurality of cell lexicons stored on the server side; the words in each cell lexicon share at least one common attribute;
wherein the information conversion unit searches the system lexicon and the cell word set to obtain the corresponding candidates.
2. The input method system according to claim 1, characterized in that it further comprises:
an automatic update module, for obtaining the required update data from the server side through comparison and judgment based on the local list of existing cell lexicons.
3. The input method system according to claim 1, characterized in that the types of related information stored in the cell word set are fewer than or equal to the types of related information stored in the system lexicon; and at least one of the plurality of cell lexicons stored on the server side is generated manually.
4. The input method system according to claim 1, characterized in that it further comprises a user lexicon.
5. The input method system according to claim 1, characterized in that it further comprises:
an adding module, for adding the cell lexicon entry information acquired from the server side to the cell word set; the cell word set is a single independent lexicon or a collection of multiple lexicons existing side by side.
6. The input method system according to claim 5, characterized in that
the adding is performed as follows: once the download of one cell lexicon to be updated is completed, the entry information of that cell lexicon is added to the cell word set;
or, the adding is performed as follows: only after the downloads of all cell lexicons to be updated are completed are they added to the cell word set together.
7. The input method system according to claim 6, characterized in that
the adding process is performed in a separate cache lexicon.
8. The input method system according to claim 1, characterized in that it further comprises:
a cell lexicon deactivation module, for receiving a user instruction and removing from the cell word set the entry records that belong to the cell lexicon selected by the user.
9. The input method system according to claim 8, characterized in that
the removal is performed as follows: a user instruction is received, the cell lexicon selected by the user is deleted from the list, and the cell lexicons remaining in the list are added again to obtain a new cell word set;
or, the removal is performed as follows: a user instruction is received, and the entry records belonging to the cell lexicon selected by the user are deleted from the cell word set, the cell word set recording the cell lexicon to which each entry belongs;
or, the removal is performed as follows: a user instruction is received, and within the cell word set a deletion mark is added to the entry records belonging to the cell lexicon selected by the user, the cell word set recording the cell lexicon to which each entry belongs.
10. A character input method, characterized by comprising:
loading a system lexicon and a cell word set; the cell word set is used to record extended words and their related information; the cell word set is obtained from at least one cell lexicon acquired from a plurality of cell lexicons stored on the server side; the words in each cell lexicon share at least one common attribute;
receiving the user's input information;
searching the system lexicon and the cell word set according to the received input information to obtain the corresponding candidates;
receiving the user's selection information and outputting the specified candidate to the screen.
11. The method according to claim 10, characterized in that
the loading is: merging the cell word set and the system lexicon into one lexicon and placing it in a cache;
or, the loading is: placing the cell word set and the system lexicon in a cache as two or more independent lexicons, and setting lexicon priorities according to preset rules; the priorities are used to order the displayed candidates.
12. The method according to claim 10, characterized in that the cell word set records the cell lexicon to which each entry belongs and the corresponding cell lexicon priority; the priority is used to order the displayed candidates.
13. The method according to claim 12, characterized in that it further comprises:
during loading, dynamically adjusting the cell lexicon priorities according to the environment in which the input method is used.
14. A lexicon update method, characterized in that the lexicon to be updated involves a cell word set for recording extended words and their related information; the cell word set is obtained from at least one cell lexicon selected from a plurality of cell lexicons stored on the server side; the words in each cell lexicon share at least one common attribute;
the method comprises:
receiving a trigger, and comparing the list of existing cell lexicons with the server-side cell lexicon list to obtain the list of lexicons that need updating;
downloading the entry information of the cell lexicons that need updating and adding it to the cell word set.
15. The method according to claim 14, characterized in that, before the trigger, it further comprises:
manually or automatically upgrading the cell lexicons stored on the server side and changing the corresponding version information.
16. The method according to claim 14, characterized in that
the adding is performed as follows: once the download of one cell lexicon to be updated is completed, the entry information of that cell lexicon is added to the cell word set;
or, the adding is performed as follows: only after the downloads of all cell lexicons to be updated are completed are they added to the cell word set together.
17. The method according to claim 16, characterized in that the adding process is performed in a separate cache lexicon.
18. A lexicon publishing system, characterized by comprising:
a cell lexicon generation unit, comprising: an interface module, for receiving input information; a generation module, for generating cell lexicons according to the received information; and an identification module, for assigning an identifier and version information to each cell lexicon; wherein the words in each cell lexicon share at least one common attribute;
a communication unit, for receiving a trigger and transmitting the corresponding cell lexicon entry information to the client.
19. The lexicon publishing system according to claim 18, characterized in that the cell lexicon generation unit further comprises:
a modification and update module, for modifying the stored information of a cell lexicon and notifying the identification module to generate new version information for that cell lexicon.
20. The lexicon publishing system according to claim 18, characterized in that it further comprises:
an identification module, for comparing the server-side cell lexicon list with the client's cell lexicon list and, based on the comparison result, transmitting the required update data to the client through the communication unit.
21. The lexicon publishing system according to claim 18, characterized in that
a cell lexicon obtained according to the received information stores multiple entries;
or, a cell lexicon obtained according to the received information stores index information, the index information corresponding to other cell lexicons.
22. The lexicon publishing system according to claim 18, characterized in that it further comprises:
a merging module, for merging the entry information of multiple cell lexicons into one download lexicon and notifying the communication unit to transmit the download lexicon to the client.
PCT/CN2008/071027 2007-05-22 2008-05-21 Character input method, input system and method for updating word lexicon WO2008141583A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNB2007100994746A CN100483416C (en) 2007-05-22 2007-05-22 Character input method, input method system and method for updating word stock
CN200710099474.6 2007-05-22

Publications (1)

Publication Number Publication Date
WO2008141583A1 true WO2008141583A1 (en) 2008-11-27

Family

ID=38782735

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/071027 WO2008141583A1 (en) 2007-05-22 2008-05-21 Character input method, input system and method for updating word lexicon

Country Status (2)

Country Link
CN (1) CN100483416C (en)
WO (1) WO2008141583A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9594831B2 (en) 2012-06-22 2017-03-14 Microsoft Technology Licensing, Llc Targeted disambiguation of named entities
US9600566B2 (en) 2010-05-14 2017-03-21 Microsoft Technology Licensing, Llc Identifying entity synonyms
US10032131B2 (en) 2012-06-20 2018-07-24 Microsoft Technology Licensing, Llc Data services for enterprises leveraging search system data assets
CN108399013A (en) * 2018-03-16 2018-08-14 北京搜狗科技发展有限公司 A kind of user's word adding method and device
CN109725736A (en) * 2017-10-27 2019-05-07 北京搜狗科技发展有限公司 A kind of candidate's sort method, device and electronic equipment
CN111581971A (en) * 2020-06-04 2020-08-25 腾讯科技(深圳)有限公司 Word stock updating method and device, terminal and storage medium
CN112987941A (en) * 2019-12-17 2021-06-18 北京搜狗科技发展有限公司 Method and device for generating candidate words

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100483416C (en) * 2007-05-22 2009-04-29 北京搜狗科技发展有限公司 Character input method, input method system and method for updating word stock
CN101178741B (en) * 2007-12-24 2010-06-23 腾讯科技(深圳)有限公司 Method and device for updating user's word stock
CN101256557B (en) * 2008-04-16 2010-06-23 腾讯科技(深圳)有限公司 Self-defining word management apparatus and method
CN101645088B (en) * 2008-08-05 2016-06-01 北京搜狗科技发展有限公司 Determine the method for auxiliary lexicon, device and the input method system that need to load
CN101710326B (en) * 2009-12-03 2012-10-03 腾讯科技(深圳)有限公司 Word stock substitution method, device and input method system
CN101840418A (en) * 2010-03-31 2010-09-22 北京搜狗科技发展有限公司 User word library synchronous update method, update server and input method system
CN102346557B (en) * 2010-07-28 2016-08-03 深圳市世纪光速信息技术有限公司 A kind of input method and input method system
CN102346731B (en) 2010-08-02 2014-09-03 联想(北京)有限公司 File processing method and file processing device
CN101995963B (en) * 2010-11-19 2012-07-04 哈尔滨工业大学 Vocabulary self-adaption Chinese input method
CN102566774A (en) * 2010-12-26 2012-07-11 上海量明科技发展有限公司 Method and system for measuring user input characters to adjust levels
CN102682031A (en) * 2011-03-17 2012-09-19 新奥特(北京)视频技术有限公司 Method and system of Chinese Pin Yin search suggest based on relational database
CN102789317B (en) * 2011-05-20 2016-05-25 腾讯科技(深圳)有限公司 A kind of method and apparatus of accelerating text input
CN103108012B (en) * 2011-11-15 2019-11-19 深圳市世纪光速信息技术有限公司 A kind of user thesaurus synchronous method and user thesaurus sync server
CN103248551A (en) * 2012-02-03 2013-08-14 腾讯科技(深圳)有限公司 Information presentation method and system
CN103246355B (en) * 2012-02-06 2017-04-05 百度在线网络技术(北京)有限公司 On-line input method evaluating method, system and device
CN103389979B (en) * 2012-05-08 2018-10-12 深圳市世纪光速信息技术有限公司 Recommend system, the device and method of classified lexicon in input method
CN104423621A (en) * 2013-08-22 2015-03-18 北京搜狗科技发展有限公司 Pinyin string processing method and device
CN103473313B (en) * 2013-09-11 2017-01-18 百度在线网络技术(北京)有限公司 Establishment method and device for name dictionary of input method
CN103825952B (en) * 2014-03-04 2017-07-04 百度在线网络技术(北京)有限公司 Cell dictionary method for pushing and server
CN105824436A (en) * 2015-01-06 2016-08-03 阿里巴巴集团控股有限公司 Character input method and input method system
CN105718071A (en) * 2016-01-19 2016-06-29 努比亚技术有限公司 Terminal and method for recommending associational words in input method
CN105955495A (en) * 2016-04-29 2016-09-21 百度在线网络技术(北京)有限公司 Information input method and device
CN108228620A (en) * 2016-12-14 2018-06-29 北京搜狗科技发展有限公司 A kind of Word library updating method and device
CN106873795A (en) * 2016-12-29 2017-06-20 北京五八信息技术有限公司 A kind of character input method, device and terminal
CN106933801B (en) * 2017-02-13 2021-02-05 北京安云世纪科技有限公司 Word stock updating method and device
CN106896937A (en) * 2017-02-28 2017-06-27 百度在线网络技术(北京)有限公司 Method and apparatus for being input into information
CN108628461B (en) * 2017-03-16 2022-07-08 北京搜狗科技发展有限公司 Input method and device and method and device for updating word stock
CN109240511A (en) * 2017-07-04 2019-01-18 北京搜狗科技发展有限公司 Method and system for updating a dictionary, and device for updating a dictionary
CN107832035B (en) * 2017-11-13 2021-03-12 深圳市矽昊智能科技有限公司 Voice input method of intelligent terminal
CN108256051A (en) * 2018-01-15 2018-07-06 中企动力科技股份有限公司 Website product generation method and device
CN108376129B (en) * 2018-01-24 2022-04-22 北京奇艺世纪科技有限公司 Error correction method and device
CN109284228A (en) * 2018-09-25 2019-01-29 北京金山安全软件有限公司 Input method evaluation method and device, electronic equipment and storage medium
CN109408815A (en) * 2018-10-09 2019-03-01 苏州思必驰信息科技有限公司 Dictionary management method and system for voice dialogue platform
CN109542248A (en) * 2018-11-16 2019-03-29 上海二三四五网络科技有限公司 Control method and control device for incrementally updating dictionary data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1114066A (en) * 1994-05-08 1995-12-27 刘树根 Sense group input, editing and word code
CN1494025A (en) * 2002-10-31 2004-05-05 英业达股份有限公司 Input method of Chinese character having classification thesaurus and its system
CN1560767A (en) * 2004-02-24 2005-01-05 珠海市汉易通信息科技有限公司 Automatic fully adding method for word input
US20050091031A1 (en) * 2003-10-23 2005-04-28 Microsoft Corporation Full-form lexicon with tagged data and methods of constructing and using the same
CN1920827A (en) * 2006-08-23 2007-02-28 北京搜狗科技发展有限公司 Method for obtaining newly encoded character string, input method system and word stock generation device
CN101051323A (en) * 2007-05-22 2007-10-10 北京搜狗科技发展有限公司 Character input method, input method system and method for updating word stock

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9600566B2 (en) 2010-05-14 2017-03-21 Microsoft Technology Licensing, Llc Identifying entity synonyms
US10032131B2 (en) 2012-06-20 2018-07-24 Microsoft Technology Licensing, Llc Data services for enterprises leveraging search system data assets
US9594831B2 (en) 2012-06-22 2017-03-14 Microsoft Technology Licensing, Llc Targeted disambiguation of named entities
CN109725736A (en) * 2017-10-27 2019-05-07 北京搜狗科技发展有限公司 Candidate sorting method, device and electronic equipment
CN108399013A (en) * 2018-03-16 2018-08-14 北京搜狗科技发展有限公司 User word adding method and device
CN108399013B (en) * 2018-03-16 2022-08-09 北京搜狗科技发展有限公司 User word adding method and device
CN112987941A (en) * 2019-12-17 2021-06-18 北京搜狗科技发展有限公司 Method and device for generating candidate words
CN112987941B (en) * 2019-12-17 2024-02-13 北京搜狗科技发展有限公司 Method and device for generating candidate words
CN111581971A (en) * 2020-06-04 2020-08-25 腾讯科技(深圳)有限公司 Word stock updating method and device, terminal and storage medium
CN111581971B (en) * 2020-06-04 2024-01-23 腾讯科技(深圳)有限公司 Word stock updating method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN101051323A (en) 2007-10-10
CN100483416C (en) 2009-04-29

Similar Documents

Publication Publication Date Title
WO2008141583A1 (en) Character input method, input system and method for updating word lexicon
KR100999267B1 (en) On-device application catalog updated by management servers
US9336200B2 (en) Assisting document creation
US20180198742A1 (en) Inserting content into an application from an online synchronized content management system
CA2458138C (en) Methods and systems for language translation
US6697838B1 (en) Method and system for annotating information resources in connection with browsing, in both connected and disconnected states
KR101599826B1 (en) Text entry with word prediction, completion, or correction supplemented by search of shared corpus
US7272792B2 (en) Kana-to-kanji conversion method, apparatus and storage medium
JP2006276867A (en) Method and system for applying input mode bias
US7752216B2 (en) Retrieval apparatus, retrieval method and retrieval program
JPWO2004111876A1 (en) Search system and method for reusing search conditions
EP3997589A1 (en) Delta graph traversing system
JP2008305385A (en) Character input device, server device, dictionary download system, method for presenting conversion candidate phrase, information processing method, and program
JP3767763B2 (en) Information retrieval device and computer-readable recording medium recording a program for causing a computer to function as the device
JP2002259387A (en) Document retrieving system
JP2011090463A (en) Document retrieval system, information processing apparatus, and program
JP2006185059A (en) Contents management apparatus
JP4000332B2 (en) Information retrieval apparatus and computer-readable recording medium recording a program for causing a computer to function as the apparatus
JP3310961B2 (en) System and method for specifying a location on a network
JP7272540B2 (en) Information provision system, information provision method, and data structure
JP2000259655A (en) Information retrieving system, data base management device, data base managing method, and computer readable recording medium storing program for executing the method by computer
WO2024012009A1 (en) Keyboard input method and system, and computer-readable storage medium, electronic device and computer program product
JP2008021031A (en) Search server apparatus and its control method, information processing apparatus and its control method, information processing system, information search apparatus and its control method, program, and storage medium
JP2022114721A (en) Information providing system and information providing method
CN116414998A (en) Resource feedback method, related device, equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 08748633; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 08748633; Country of ref document: EP; Kind code of ref document: A1)