US20100100555A1 - Systems and methods for changing symbol sequences in documents

Info

Publication number
US20100100555A1
Authority
US
United States
Prior art keywords
symbol sequence
matching
probability
symbol
received
Legal status
Abandoned
Application number
US12/644,585
Inventor
John Eric Harrity
Current Assignee
Individual
Original Assignee
Individual
Application filed by Individual
Priority to US12/644,585
Publication of US20100100555A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation

Definitions

  • Implementations consistent with the principles of the invention relate generally to computer systems and, more particularly, to systems and methods for changing symbol sequences in documents.
  • A computer-implemented editing system is a system capable of creating and altering electronic documents, such as word processing documents, spreadsheets, databases, e-mail messages, and the like. There are a wide variety of editing programs available that allow conventional personal computers to function as sophisticated computer-implemented editing systems.
  • a computer-readable medium includes instructions that cause at least one processor to perform a method.
  • the method may include receiving a symbol sequence into a document; comparing the received symbol sequence to a list of previously-stored words; comparing the received symbol sequence to other symbol sequences in the document when the received symbol sequence does not match any of the words in the list; determining a probability of the received symbol sequence matching one or more other symbol sequences in the document when the received symbol sequence does not match any of the other symbol sequences in the document; determining a probability of the received symbol sequence matching one or more variations of another symbol sequence in the document when the received symbol sequence does not match any of the symbol sequences in the document; replacing the received symbol sequence with another symbol sequence in the document or a variation of another symbol sequence in the document when the probability of the received symbol sequence matching the other symbol sequence is above a threshold; obtaining a number of symbol sequences from the document and a number of words from the list that most closely match the received symbol sequence to form a second list when the probability of the received symbol sequence matching another symbol sequence in the document or a variation of another symbol sequence in the document does not exceed the threshold; ranking the second list based on the symbol sequences in the document to form a ranked list of items; providing the ranked list of items; detecting selection of an item in the ranked list of items; and replacing the received symbol sequence with the selected item.
  • a computer-readable medium includes instructions for causing at least one processor to perform a method.
  • the method may include receiving a symbol sequence into a document, identifying another symbol sequence in the document whose probability of matching the received symbol sequence is above a threshold, and replacing the received symbol sequence with the other symbol sequence.
  • a computer-readable medium may include instructions that cause at least one processor to perform a method.
  • the method may include identifying a symbol sequence in a document that closely matches a first symbol sequence in the document; identifying a word in a list of previously-stored words that closely matches the first symbol sequence; ranking the identified symbol sequence and word to form a ranked list of objects; and providing the ranked list of objects.
  • FIG. 1 illustrates an exemplary system in which systems and methods, consistent with the principles of the invention, may be implemented;
  • FIG. 2 illustrates an exemplary configuration of the system of FIG. 1 ;
  • FIG. 3 illustrates an exemplary dictionary that may be associated with the system of FIG. 1 ;
  • FIGS. 4 and 5 illustrate an exemplary process for changing symbol sequences in documents in an implementation consistent with the principles of the invention;
  • FIGS. 6 and 7 illustrate an example, consistent with the principles of the invention, of the processing described in FIGS. 4 and 5 ;
  • FIGS. 8 and 9 illustrate a second example, consistent with the principles of the invention, of the processing described in FIGS. 4 and 5 ;
  • FIGS. 10 and 11 illustrate a third example, consistent with the principles of the invention, of the processing described in FIGS. 4 and 5 ;
  • FIGS. 12 and 13 illustrate a fourth example, consistent with the principles of the invention, of the processing described in FIGS. 4 and 5 .
  • Implementations consistent with the principles of the invention improve the changing of symbol sequences in documents.
  • An unknown symbol sequence in a document, such as a word processing document, is compared to other symbol sequences in the document. If the unknown symbol sequence closely matches another symbol sequence (or variation of the other symbol sequence) in the document, the unknown symbol sequence may be automatically replaced with the closely matching symbol sequence (or variation).
  • a list of closely matching symbol sequences from the document may be generated, along with a list of closely matching words from a dictionary. The two lists may be combined and the combined list may be ranked based on the symbol sequences in the document. The ranked list (or a predetermined number of highest ranking items in the ranked list) may be provided to a user. Upon selection of one of the items in the provided list, the unknown symbol sequence may be automatically replaced with the selected item. In this way, auto-correction of symbol sequences in documents can be improved.
  • FIG. 1 illustrates an exemplary system 100 in which systems and methods, consistent with the principles of the invention, may be implemented.
  • System 100 may include a computer 110 , a keyboard 120 , a pointing device 130 , and a monitor 140 .
  • the components illustrated in FIG. 1 have been selected for simplicity. It will be appreciated that a typical system may include more or fewer components than illustrated in FIG. 1 . Moreover, it will be appreciated that a typical system could include other components than those illustrated in FIG. 1 .
  • Computer 110 may include any type of computer system, such as a mainframe, minicomputer, personal computer, or the like. In alternative implementations consistent with principles of the invention, computer 110 may alternatively include a laptop, personal digital assistant, cellular telephone, or the like. In fact, computer 110 can include any device capable of running word processing programs. In some implementations consistent with the principles of the invention, keyboard 120 , pointing device 130 , and/or monitor 140 may be integrated with computer 110 .
  • Keyboard 120 may include any conventional keyboard or keypad that allows a user to input information into computer 110 .
  • Pointing device 130 may include one or more conventional pointing devices, such as a mouse, a pen, a trackball, a glide pad, biometric pointing devices, or the like.
  • Monitor 140 may include any conventional device capable of displaying images to a user.
  • Monitor 140 may, for example, include a cathode ray tube display, a liquid crystal display, a gas panel display, or the like.
  • pointing device 130 may be integrated within monitor 140 through the use of touch-screen technology.
  • FIG. 2 illustrates an exemplary configuration of system 100 of FIG. 1 .
  • system 100 may include a bus 210 , a processor 220 , a memory 230 , a read only memory (ROM) 240 , a storage device 250 , an input device 260 , an output device 270 , and a communication interface 280 .
  • Bus 210 permits communication among the components of system 100 .
  • Processor 220 may include any type of conventional processor or microprocessor that interprets and executes instructions. In alternative implementations, processor 220 may be implemented as a field programmable gate array (FPGA), application specific integrated circuit (ASIC), or the like.
  • Memory 230 may include a random access memory (RAM) or another dynamic storage device that stores information and instructions for execution by processor 220 . Memory 230 may also be used to store temporary variables or other intermediate information during execution of instructions by processor 220 .
  • ROM 240 may include a conventional ROM device and/or another static storage device that stores static information and instructions for processor 220 .
  • Storage device 250 may include a magnetic disk or optical disk and its corresponding drive and/or some other type of magnetic or optical recording medium and its corresponding drive for storing information and instructions.
  • Input device 260 may include one or more conventional mechanisms that permit an operator to input information to system 100 , such as keyboard 120 , pointing device 130 (e.g., a mouse, a pen, and the like), one or more biometric mechanisms, such as a voice recognition device, etc.
  • Output device 270 may include one or more conventional mechanisms that output information to the operator, such as a display (e.g., monitor 140 ), a printer, a speaker, etc.
  • Communication interface 280 may include any transceiver-like mechanism that enables system 100 to communicate with other devices and/or systems.
  • communication interface 280 may include a modem or an Ethernet interface to a network.
  • communication interface 280 may include other mechanisms for communicating via a data network.
  • System 100 may implement the functions described below in response to processor 220 executing software instructions contained in a computer-readable medium, such as memory 230 .
  • a computer-readable medium may be defined as one or more memory devices and/or carrier waves.
  • hardwired circuitry may be used in place of or in combination with software instructions to implement the invention.
  • implementations consistent with the principles of the invention are not limited to any specific combination of hardware circuitry and software.
  • System 100 may include a data processing program that may be associated with a dictionary.
  • the dictionary may be stored, for example, in memory 230 , or externally to system 100 .
  • FIG. 3 illustrates an exemplary dictionary 300 consistent with the principles of the invention. While only one dictionary is described below, it will be appreciated that dictionary 300 may consist of multiple dictionaries stored locally at system 100 or external to system 100 . As illustrated, dictionary 300 may include a list of dictionary words. These words may, in one implementation consistent with the principles of the invention, be arranged alphabetically, as illustrated, in FIG. 3 . A user of system 100 may modify the entries in dictionary 300 by adding or deleting entries.
  • FIGS. 4 and 5 illustrate an exemplary process for changing symbol sequences in documents in an implementation consistent with the principles of the invention. Processing may begin by receiving a symbol sequence into the current file (e.g., a current word processing document or any other type of file in which symbol sequence correction or changing would be desired) [act 410 , FIG. 4 ].
  • the symbol sequence may, for example, be received in response to a user depressing keys on keyboard 120 .
  • System 100 may determine that a symbol sequence has been received when a delimiter character has been detected.
  • System 100 may recognize the following characters as delimiter characters: hard space ( ), period (.), comma (,), semicolon (;), colon (:), quotation mark (“), single quotation mark (‘), exclamation point (!), question mark (?), and the like. Therefore, for example, when a delimiter character is received, the symbols preceding the delimiter character may be considered a symbol sequence.
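The delimiter rule above can be illustrated with a minimal Python sketch. The function name `completed_sequences` and the exact delimiter set are assumptions made for illustration; the text leaves the set open ("and the like") and does not prescribe an implementation.

```python
# Delimiter set based on the characters listed in the text. The text ends the
# list with "and the like", so this exact set is an assumption.
DELIMITERS = set(" \u00a0.,;:!?\"'\u201c\u201d\u2018\u2019")


def completed_sequences(keystrokes):
    """Return the symbol sequences that have been closed by a delimiter."""
    sequences = []
    current = []
    for symbol in keystrokes:
        if symbol in DELIMITERS:
            if current:  # skip runs of consecutive delimiters
                sequences.append("".join(current))
                current = []
        else:
            current.append(symbol)
    # Symbols typed after the last delimiter are still pending and are
    # therefore not reported as a received symbol sequence.
    return sequences


print(completed_sequences("Mr. Harrtiy wrote"))  # ['Mr', 'Harrtiy']
```

Because the single quotation mark is treated as a delimiter here, a contraction such as "don't" would be split into two sequences; a practical implementation might handle that case differently.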
  • The received symbol sequence may be compared to the words in a dictionary, such as dictionary 300 [act 420 ]. If the symbol sequence matches a word in the dictionary, then processing may return to act 410 with system 100 receiving the next symbol sequence. If, on the other hand, the symbol sequence does not match a word in the dictionary, the symbol sequence may be compared to the symbol sequences that are already contained in the current file [act 430 ]. If the symbol sequence matches one or more other symbol sequences in the current file [act 440 ], then processing may return to act 410 with the receipt of the next symbol sequence.
  • If the received symbol sequence does not match another symbol sequence in the current file [act 440 ], then it may be determined whether the received symbol sequence closely matches a symbol sequence already contained in the current file [act 510 , FIG. 5 ].
  • the probability of whether the received symbol sequence matches each symbol sequence already contained in the file may be determined. The probability may take into consideration, for example, the number of occurrences of each particular symbol sequence in the file. So, for example, if a particular symbol sequence occurs more than a predetermined number of times in the document and the received symbol sequence closely matches that particular symbol sequence, then the probability of the received symbol sequence matching the particular symbol sequence may be very high.
  • variations of the particular symbol sequence may be considered and their probabilities determined. For example, “jumps,” “jumped,” and “jumping” are variations of the word “jump.”
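  • If the probability of the received symbol sequence matching another symbol sequence in the file (or a variation of it) is above a threshold, the received symbol sequence may be automatically replaced with the other, closely matching symbol sequence [act 520 ].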
  • the threshold for automatically replacing symbol sequences in the file may be configurable by the user to allow the user to determine how close a particular symbol sequence has to be to the received symbol sequence in order for the received symbol sequence to be replaced with the particular symbol sequence. Processing may then return to act 410 ( FIG. 4 ) with the receipt of the next symbol sequence.
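The text does not say how the matching probability is computed, only that it may take occurrence counts into account and that replacement happens above a configurable threshold. The following Python sketch is one possible reading under stated assumptions: the similarity ratio from the standard library's `difflib.SequenceMatcher` stands in for the probability, and the names `match_probability`, `auto_replacement`, `frequent_after`, and `boost`, along with their default values, are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def match_probability(received, candidate, occurrences,
                      frequent_after=2, boost=0.05):
    """Illustrative score in [0, 1]; not a formula taken from the source."""
    score = SequenceMatcher(None, received, candidate).ratio()
    # Assumption: a candidate that occurs more than a predetermined number
    # of times in the file gets a small boost, capped at 1.0.
    if occurrences > frequent_after:
        score = min(1.0, score + boost)
    return score


def auto_replacement(received, document_sequences, threshold):
    """Return the replacement to apply automatically, or None."""
    counts = Counter(document_sequences)
    best, best_score = None, 0.0
    for candidate, occurrences in counts.items():
        score = match_probability(received, candidate, occurrences)
        if score > best_score:
            best, best_score = candidate, score
    # Replace only when the best score is above the configurable threshold.
    return best if best_score > threshold else None
```

A similarity ratio is not a calibrated probability, so a threshold such as 98% would need to be tuned for whatever scoring function is actually used.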
  • If the probability does not exceed the threshold, the closest matching symbol sequences from the file may be obtained, along with the closest matching words from the dictionary [act 530 ].
  • the number of closest matching symbol sequences from the file may be limited.
  • the number of closest matching words from the dictionary may also be limited.
  • the number of symbol sequences obtained from the file may approximately equal the number of words from the dictionary (e.g., the four most closely matching symbol sequences and the four most closely matching words may be obtained).
  • the number of symbol sequences and words that are obtained may be based on the probabilities of these symbol sequences and words matching the received symbol sequence.
  • all symbol sequences and words may be obtained whose probability of matching the received symbol sequence is above a threshold.
  • the threshold may be configurable by the user to allow the user to determine whether a greater number of items or lesser number of items will be provided to the user.
  • the words that are obtained from the dictionary may be based on the symbol sequences obtained from the file. For example, if a particular symbol sequence closely matches the received symbol sequence, words that closely relate to the closely matching symbol sequence may be obtained from the dictionary. In one implementation, if it is determined that the probability of one of the words from the dictionary matching the received symbol sequence is above the first threshold above (i.e., the threshold for determining whether the received symbol sequence is automatically replaced), then the received symbol sequence may be automatically replaced with the word.
  • the closest matching symbol sequences and words may be ranked to create a ranked list [act 540 ]. Any conventional technique for ranking the closest matching symbol sequences and words may be used. For example, the ranking may be based on how closely the symbol sequences and words match the received symbol sequence. Other techniques for ranking the closest matching symbol sequences and words may alternatively be used.
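Acts 530 and 540 can be sketched as follows: take a limited number of close matches from the file and from the dictionary, merge them, and rank the result. This is a minimal illustration, not the source's method. It assumes `difflib.SequenceMatcher.ratio()` as the closeness measure, and the names `closest`, `ranked_list`, `per_source`, and `cutoff` are hypothetical.

```python
from difflib import SequenceMatcher


def closest(received, candidates, limit, cutoff):
    """Return up to `limit` (score, candidate) pairs scoring at least `cutoff`."""
    scored = [(SequenceMatcher(None, received, c).ratio(), c)
              for c in set(candidates) if c != received]
    kept = [pair for pair in scored if pair[0] >= cutoff]
    # Highest score first; ties are broken alphabetically for determinism.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return kept[:limit]


def ranked_list(received, document_sequences, dictionary_words,
                per_source=4, cutoff=0.75):
    """Combine close matches from the file and the dictionary, then rank them."""
    combined = {}
    pairs = (closest(received, document_sequences, per_source, cutoff)
             + closest(received, dictionary_words, per_source, cutoff))
    for score, candidate in pairs:
        combined[candidate] = score  # an item found in both sources is kept once
    return sorted(combined, key=lambda c: (-combined[c], c))
```

Taking the same number of items from each source mirrors the "four and four" example in the text; ranking purely by closeness is only one of the conventional techniques the text allows.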
  • the ranked list of closest matching symbol sequences and words may be provided to the user [act 550 ].
  • a user-configurable number of the highest ranking symbol sequences and words may be provided to the user.
  • the ranked list may be provided automatically (e.g., as an automatic popup window) or may be provided in response to an action by the user.
  • the ranked list may be provided in response to the user selecting the symbol sequence. In this situation, the ranked list, for example, may be provided after other symbol sequences are received in the current file.
  • It may then be determined whether the user selected one of the items in the ranked list [act 560 ].
  • the user may select an item from the ranked list in any conventional manner. For example, the user may select an item from the ranked list using a mouse pointer.
  • If the user selects an item from the ranked list, the received symbol sequence is automatically replaced with the selected item [act 570 ]. If, on the other hand, the user does not select an item from the list [act 560 ] or after the received symbol sequence is automatically replaced with a selected item [act 570 ], processing may return to act 410 above with system 100 receiving another symbol sequence.
  • system 100 may periodically re-perform any or all of acts 420 - 570 on any symbol sequences that do not match a word in the dictionary. In one implementation, system 100 may re-perform any or all of acts 420 - 570 on those symbol sequences not matching a word in the dictionary at predetermined intervals. In other implementations, system 100 may re-perform any or all of acts 420 - 570 during those instances where system 100 is in an idle state (e.g., when the user stops typing into the current file).
  • FIGS. 6 and 7 illustrate a first example of the above processing consistent with the principles of the invention.
  • a group of symbol sequences 600 has been received into a current file.
  • the probability threshold for automatically replacing a symbol sequence in the current file is 98% and the probability threshold for determining whether another symbol sequence or word closely matches the symbol sequence is 75%. It will be appreciated that these threshold values are provided for explanatory purposes only. Other values may alternatively be used.
  • the symbol sequence “Harrtiy” has been received into the current file.
  • the symbol sequence, “Harrtiy,” may be compared to words in the dictionary (e.g., dictionary 300 ). Since the symbol sequence, “Harrtiy,” does not match any of the words in the dictionary, the symbol sequence is compared to all other symbol sequences in the file to determine whether it matches any of the other symbol sequences in the file.
  • the determination of whether the symbol sequence closely matches another symbol sequence may be based on the probability of the symbol sequence matching another symbol sequence. Assume, for example, that it is determined that the probability of the symbol sequence, “Harrtiy,” matching the symbol sequence, “Harrity,” is 99%. In this instance, the symbol sequence, “Harrtiy,” would be automatically replaced with the symbol sequence, “Harrity.”
  • the closest matching symbol sequences from the file and the closest matching symbol sequences from the dictionary may be obtained.
  • the closest matching symbol sequences and words may be obtained based on the probability of the symbol sequence matching another symbol sequence or word. Assume, for example, that the probability of the symbol sequence, “Harrtiy,” matching the symbol sequence, “Harrity,” is instead determined to be 95%, which is below the 98% threshold for automatic replacement, and that the probability of the symbol sequence, “Harrtiy,” matching any other symbol sequence in the file is below 75% (the threshold value).
  • the closest matching symbol sequences and words include: “Harrity,” “Hearty,” “Harry,” “Hardy,” and “Harpy.”
  • These closest matching symbol sequences and words may then be ranked based, for example, on their probability of matching the symbol sequence, “Harrtiy.” Table 1 below illustrates the ranked list of closely matching symbol sequences and words.
  • the ranked list of closely matching symbol sequences and words may then be provided.
  • the ranked list may be provided via a popup window, such as popup window 710 , illustrated in FIG. 7 .
  • Popup window 710 may automatically appear when the symbol sequence, “Harrtiy,” is received or may appear in response to a user action. In the latter instance, popup window 710 may appear after additional symbol sequences have been received into the current file.
  • popup window 710 may also include an auto-correct feature that allows the user to add one of the items from the ranked list to an auto-correct list.
  • the symbol sequence, “Harrtiy,” may be automatically replaced.
  • FIGS. 8 and 9 illustrate a second example of the above processing consistent with the principles of the invention.
  • a group of symbol sequences 800 has been received into a current file.
  • the probability threshold for automatically replacing a symbol sequence in the current file is 98% and the probability threshold for determining whether another symbol sequence or word closely matches the symbol sequence is 75%. It will be appreciated that these threshold values are provided for explanatory purposes only. Other values may alternatively be used.
  • the symbol sequence “elastamer” has been received into the current file.
  • the symbol sequence, “elastamer,” may be compared to words in the dictionary (e.g., dictionary 300 ). Since the symbol sequence, “elastamer,” does not match any of the words in the dictionary, the symbol sequence is compared to all other symbol sequences in the file to determine whether it matches any of the other symbol sequences in the file.
  • the determination of whether the symbol sequence closely matches another symbol sequence may be based on the probability of the symbol sequence matching another symbol sequence. Assume, for example, that it is determined that the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastomer,” is 99%. In this instance, the symbol sequence, “elastamer,” would be automatically replaced with the symbol sequence, “elastomer.” If probabilities of two or more symbol sequences from the file are above the threshold, the symbol sequence with the highest probability may be selected to replace the received symbol sequence.
  • the closest matching symbol sequences from the file and the closest matching symbol sequences from the dictionary may be obtained.
  • the closest matching symbol sequences and words may be obtained based on the probability of the symbol sequence matching another symbol sequence or word.
  • Assume, for example, that the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastomer,” is instead determined to be 97%, which is below the 98% threshold for automatic replacement; that the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastomeric,” is 90%; that the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastic,” is 79%; that the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elasticity,” is 78%; and that the probability of the symbol sequence, “elastamer,” matching any other symbol sequence in the file is below 75% (the threshold value).
  • the closest matching symbol sequences and words include: “elastomer,” “elastomeric,” “elastic,” “elasticity,” and “easterner.”
  • These closest matching symbol sequences and words may then be ranked based, for example, on their probability of matching the symbol sequence, “elastamer.” Table 2 below illustrates the ranked list of closely matching symbol sequences and words.
  • the ranked list of closely matching symbol sequences and words may then be provided.
  • the ranked list may be provided via a popup window, such as popup window 910 , illustrated in FIG. 9 .
  • Popup window 910 may automatically appear when the symbol sequence, “elastamer,” is received or may appear in response to a user action. In the latter instance, popup window 910 may appear after additional symbol sequences have been received into the current file.
  • popup window 910 may also include an auto-correct feature that allows the user to add one of the items from the ranked list to an auto-correct list.
  • the symbol sequence, “elastamer,” may be automatically replaced.
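The two-threshold decision in this second example can be worked through with the probabilities stated in the text (97%, 90%, 79%, 78%) and the example thresholds (98% for automatic replacement, 75% for close matches). The function name `decide` and the tuple it returns are assumptions made for illustration. "easterner" is omitted because the text gives no probability for it.

```python
from typing import Dict, List, Tuple


def decide(probabilities: Dict[str, float],
           auto_threshold: float = 0.98,
           suggest_threshold: float = 0.75) -> Tuple[str, List[str]]:
    """Return ("replace", [best]) or ("suggest", ranked close matches)."""
    if not probabilities:
        return "suggest", []
    best = max(probabilities, key=probabilities.get)
    if probabilities[best] > auto_threshold:
        # With several candidates above the threshold, the highest one wins.
        return "replace", [best]
    ranked = sorted(
        (c for c, p in probabilities.items() if p > suggest_threshold),
        key=lambda c: -probabilities[c],
    )
    return "suggest", ranked


# Probabilities as stated in the example for "elastamer".
example = {"elastomer": 0.97, "elastomeric": 0.90,
           "elastic": 0.79, "elasticity": 0.78}
print(decide(example))
# ('suggest', ['elastomer', 'elastomeric', 'elastic', 'elasticity'])

# The 99% case described earlier in the example triggers automatic replacement.
print(decide({**example, "elastomer": 0.99}))
# ('replace', ['elastomer'])
```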
  • FIGS. 10 and 11 illustrate a third example of the above processing consistent with the principles of the invention.
  • a group of symbol sequences 1000 has been received into a current file.
  • the probability threshold for automatically replacing a symbol sequence in the current file is 98% and the probability threshold for determining whether another symbol sequence or word closely matches the symbol sequence is 75%. It will be appreciated that these threshold values are provided for explanatory purposes only. Other values may alternatively be used.
  • the symbol sequence “synchronise” has been received into the current file.
  • the symbol sequence, “synchronise,” may be compared to words in the dictionary (e.g., dictionary 300 ). Since the symbol sequence, “synchronise,” does not match any of the words in the dictionary, the symbol sequence is compared to all other symbol sequences in the file to determine whether it matches any of the other symbol sequences in the file.
  • the determination of whether the symbol sequence closely matches another symbol sequence may be based on the probability of the symbol sequence matching another symbol sequence. Assume, for this example, that the probability of the most closely matching symbol sequence from the file is determined to be 95% and, therefore, the symbol sequence is not automatically replaced. In this situation, the closest matching symbol sequences from the file and the closest matching symbol sequences from the dictionary may be obtained. In one implementation, the closest matching symbol sequences and words may be obtained based on the probability of the symbol sequence matching another symbol sequence or word.
  • the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronized,” is 95%
  • the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronization,” is 85%
  • the probability of the symbol sequence, “synchronise,” matching any other symbol sequence in the file is below 75% (the threshold value).
  • the words selected from the dictionary may be based on the most closely matching symbol sequences in the file. Since the symbol sequence, “synchronized,” has a high probability of matching, the following words may be obtained from the dictionary: “synchronize,” “synchronizes,” and “synchronizing.” Assume, for example, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronize,” is 99%. In this example, the symbol sequence, “synchronise,” would be automatically replaced with the symbol sequence, “synchronize.”
  • Assume, alternatively, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronize,” is 97.9%, which is just below the 98% threshold for automatic replacement, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronizes,” is 95%, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronizing,” is 87%, and that the probability of the symbol sequence, “synchronise,” matching any other word from the dictionary is below 75% (the threshold value).
  • These closest matching symbol sequences and words may then be ranked based, for example, on their probability of matching the symbol sequence, “synchronise.”
  • Table 3 illustrates the ranked list of closely matching symbol sequences and words.
  • the ranked list of closely matching symbol sequences and words may then be provided.
  • the ranked list may be provided via a popup window, such as popup window 1110 , illustrated in FIG. 11 .
  • Popup window 1110 may automatically appear when the symbol sequence, “synchronise,” is received or may appear in response to a user action. In the latter instance, popup window 1110 may appear after additional symbol sequences have been received into the current file.
  • popup window 1110 may also include an auto-correct feature that allows the user to add one of the items from the ranked list to an auto-correct list.
  • the symbol sequence, “synchronise,” may be automatically replaced.
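In this third example, the dictionary words are chosen based on the closest match found in the file ("synchronized" leads to "synchronize," "synchronizes," and "synchronizing"). The text does not define what makes a dictionary word "closely relate" to a file match. The Python sketch below assumes, purely for illustration, that sharing a long common prefix is the criterion; the name `related_dictionary_words` and the `min_prefix` parameter are hypothetical.

```python
import os


def related_dictionary_words(file_match, dictionary_words, min_prefix=8):
    """Return dictionary words sharing at least `min_prefix` leading symbols."""
    related = []
    for word in dictionary_words:
        if word == file_match:
            continue
        # os.path.commonprefix compares strings character by character.
        shared = len(os.path.commonprefix([file_match, word]))
        if shared >= min_prefix:
            related.append(word)
    return sorted(related)


words = ["sync", "symphony", "synchronize", "synchronizes", "synchronizing"]
print(related_dictionary_words("synchronized", words))
# ['synchronize', 'synchronizes', 'synchronizing']
```

A stemmer or morphological analyzer would be a more robust way to find related forms; the prefix test is only the simplest stand-in.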
  • FIGS. 12 and 13 illustrate a fourth example of the above processing consistent with the principles of the invention.
  • a group of symbol sequences 1200 has been received into a current file.
  • the probability threshold for automatically replacing a symbol sequence in the current file is 98% and the probability threshold for determining whether another symbol sequence or word closely matches the symbol sequence is 75%. It will be appreciated that these threshold values are provided for explanatory purposes only. Other values may alternatively be used.
  • the processing described in FIGS. 4 and 5 may be performed on symbol sequences 1200 in the current file.
  • Since symbol sequence 1210 , “sychnchronization,” does not match any word in the dictionary, symbol sequence 1210 is compared to all other symbol sequences in the file to determine whether it matches any of the other symbol sequences in the file. Since symbol sequence 1210 does not match any of the other symbol sequences in the file, it is determined whether the symbol sequence, “sychnchronization,” closely matches any other symbol sequence in the file. In one implementation, the determination of whether the symbol sequence closely matches another symbol sequence may be based on the probability of the symbol sequence matching another symbol sequence.
  • symbol sequence 1210 is automatically replaced with symbol sequence 1310 , “synchronization,” as illustrated in FIG. 13 .
  • system 100 may periodically go through a current file and automatically correct misspelled words.
  • Implementations consistent with the principles of the invention improve the changing of symbol sequences in documents.
  • If an unknown symbol sequence in a document, such as a word processing document, closely matches another symbol sequence in the document, the unknown symbol sequence may be automatically replaced with the closely matching symbol sequence.
  • a list of closely matching symbol sequences from the word processing document may be generated, along with a list of closely matching words from a dictionary. The two lists may be combined and ranked based on the symbol sequences in the word processing document. The ranked list (or a predetermined number of highest ranking items in the ranked list) may be provided to a user. Upon selection of one of the items in the provided list, the unknown symbol sequence may be automatically replaced with the selected item. In this way, auto-correction of symbol sequences in documents can be improved.
  • logic may include hardware, such as an application specific integrated circuit or a field programmable gate array, software, or a combination of hardware and software.

Abstract

A computer-readable medium includes instructions for causing at least one processor to perform a method. The method may include receiving a symbol sequence into a document, identifying another symbol sequence in the document whose probability of matching the received symbol sequence is above a threshold, and replacing the received symbol sequence with the other symbol sequence.

Description

    RELATED APPLICATION
  • This application claims priority under 35 U.S.C. §119 based on U.S. Provisional Application No. ______ [Docket No. 0001-0003P], filed Jul. 12, 2004, the disclosure of which is hereby incorporated in its entirety by reference herein.
  • FIELD OF THE INVENTION
  • Implementations consistent with the principles of the invention relate generally to computer systems and, more particularly, to systems and methods for changing symbol sequences in documents.
  • BACKGROUND OF THE INVENTION
  • A computer-implemented editing system is a system capable of creating and altering electronic documents, such as word processing documents, spreadsheets, databases, e-mail messages, and the like. There are a wide variety of editing programs available that allow conventional personal computers to function as sophisticated computer-implemented editing systems.
  • SUMMARY OF THE INVENTION
  • In accordance with the purpose of this invention as embodied and broadly described herein, a computer-readable medium includes instructions that cause at least one processor to perform a method. The method may include receiving a symbol sequence into a document; comparing the received symbol sequence to a list of previously-stored words; comparing the received symbol sequence to other symbol sequences in the document when the received symbol sequence does not match any of the words in the list; determining a probability of the received symbol sequence matching one or more other symbol sequences in the document when the received symbol sequence does not match any of the other symbol sequences in the document; determining a probability of the received symbol sequence matching one or more variations of another symbol sequence in the document when the received symbol sequence does not match any of the symbol sequences in the document; replacing the received symbol sequence with another symbol sequence in the document or a variation of another symbol sequence in the document when the probability of the received symbol sequence matching the other symbol sequence is above a threshold; obtaining a number of symbol sequences from the document and a number of words from the list that most closely match the received symbol sequence to form a second list when the probability of the received symbol sequence matching another symbol sequence in the document or a variation of another symbol sequence in the document does not exceed the threshold; ranking the second list based on the symbol sequences in the document to form a ranked list of items; providing the ranked list of items; detecting selection of an item in the ranked list of items; and replacing the received symbol sequence with the selected item.
  • In another implementation consistent with the principles of the invention, a computer-readable medium includes instructions for causing at least one processor to perform a method. The method may include receiving a symbol sequence into a document, identifying another symbol sequence in the document whose probability of matching the received symbol sequence is above a threshold, and replacing the received symbol sequence with the other symbol sequence.
  • In still another implementation consistent with the principles of the invention, a computer-readable medium may include instructions that cause at least one processor to perform a method. The method may include identifying a symbol sequence in a document that closely matches a first symbol sequence in the document; identifying a word in a list of previously-stored words that closely matches the first symbol sequence; ranking the identified symbol sequence and word to form a ranked list of objects; and providing the ranked list of objects.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, explain the invention. In the drawings,
  • FIG. 1 illustrates an exemplary system in which systems and methods, consistent with the principles of the invention, may be implemented;
  • FIG. 2 illustrates an exemplary configuration of the system of FIG. 1;
  • FIG. 3 illustrates an exemplary dictionary that may be associated with the system of FIG. 1;
  • FIGS. 4 and 5 illustrate an exemplary process for changing symbol sequences in documents in an implementation consistent with the principles of the invention;
  • FIGS. 6 and 7 illustrate an example, consistent with the principles of the invention, of the processing described in FIGS. 4 and 5;
  • FIGS. 8 and 9 illustrate a second example, consistent with the principles of the invention, of the processing described in FIGS. 4 and 5;
  • FIGS. 10 and 11 illustrate a third example, consistent with the principles of the invention, of the processing described in FIGS. 4 and 5; and
  • FIGS. 12 and 13 illustrate a fourth example, consistent with the principles of the invention, of the processing described in FIGS. 4 and 5.
  • DETAILED DESCRIPTION
  • The following detailed description of implementations consistent with the invention refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and their equivalents.
  • Implementations consistent with the principles of the invention improve the changing of symbol sequences in documents. In one implementation, an unknown symbol sequence in a document, such as a word processing document, is compared to other symbol sequences in the document. If the unknown symbol sequence closely matches another symbol sequence (or variation of the other symbol sequence) in the document, the unknown symbol sequence may be automatically replaced with the closely matching symbol sequence (or variation). Alternatively, a list of closely matching symbol sequences from the document may be generated, along with a list of closely matching words from a dictionary. The two lists may be combined and the combined list may be ranked based on the symbol sequences in the document. The ranked list (or a predetermined number of highest ranking items in the ranked list) may be provided to a user. Upon selection of one of the items in the provided list, the unknown symbol sequence may be automatically replaced with the selected item. In this way, auto-correction of symbol sequences in documents can be improved.
  • Exemplary System
  • FIG. 1 illustrates an exemplary system 100 in which systems and methods, consistent with the principles of the invention, may be implemented. System 100 may include a computer 110, a keyboard 120, a pointing device 130, and a monitor 140. The components illustrated in FIG. 1 have been selected for simplicity. It will be appreciated that a typical system may include more or fewer components than illustrated in FIG. 1. Moreover, it will be appreciated that a typical system could include other components than those illustrated in FIG. 1.
  • Computer 110 may include any type of computer system, such as a mainframe, minicomputer, personal computer, or the like. In alternative implementations consistent with the principles of the invention, computer 110 may include a laptop, personal digital assistant, cellular telephone, or the like. In fact, computer 110 can include any device capable of running word processing programs. In some implementations consistent with the principles of the invention, keyboard 120, pointing device 130, and/or monitor 140 may be integrated with computer 110.
  • Keyboard 120 may include any conventional keyboard or keypad that allows a user to input information into computer 110. Pointing device 130 may include one or more conventional pointing devices, such as a mouse, a pen, a trackball, a glide pad, biometric pointing devices, or the like.
  • Monitor 140 may include any conventional device capable of displaying images to a user. Monitor 140 may, for example, include a cathode ray tube display, a liquid crystal display, a gas panel display, or the like. In alternative implementations, pointing device 130 may be integrated within monitor 140 through the use of touch-screen technology.
  • FIG. 2 illustrates an exemplary configuration of system 100 of FIG. 1. As illustrated, system 100 may include a bus 210, a processor 220, a memory 230, a read only memory (ROM) 240, a storage device 250, an input device 260, an output device 270, and a communication interface 280. Bus 210 permits communication among the components of system 100.
  • Processor 220 may include any type of conventional processor or microprocessor that interprets and executes instructions. In alternative implementations, processor 220 may be implemented as a field programmable gate array (FPGA), application specific integrated circuit (ASIC), or the like. Memory 230 may include a random access memory (RAM) or another dynamic storage device that stores information and instructions for execution by processor 220. Memory 230 may also be used to store temporary variables or other intermediate information during execution of instructions by processor 220.
  • ROM 240 may include a conventional ROM device and/or another static storage device that stores static information and instructions for processor 220. Storage device 250 may include a magnetic disk or optical disk and its corresponding drive and/or some other type of magnetic or optical recording medium and its corresponding drive for storing information and instructions.
  • Input device 260 may include one or more conventional mechanisms that permit an operator to input information to system 100, such as keyboard 120, pointing device 130 (e.g., a mouse, a pen, and the like), one or more biometric mechanisms, such as a voice recognition device, etc. Output device 270 may include one or more conventional mechanisms that output information to the operator, such as a display (e.g., monitor 140), a printer, a speaker, etc. Communication interface 280 may include any transceiver-like mechanism that enables system 100 to communicate with other devices and/or systems. For example, communication interface 280 may include a modem or an Ethernet interface to a network. Alternatively, communication interface 280 may include other mechanisms for communicating via a data network.
  • System 100 may implement the functions described below in response to processor 220 executing software instructions contained in a computer-readable medium, such as memory 230. A computer-readable medium may be defined as one or more memory devices and/or carrier waves. In alternative embodiments, hardwired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, implementations consistent with the principles of the invention are not limited to any specific combination of hardware circuitry and software.
  • System 100, consistent with the principles of the invention, may include a data processing program that may be associated with a dictionary. The dictionary may be stored, for example, in memory 230, or externally to system 100.
  • FIG. 3 illustrates an exemplary dictionary 300 consistent with the principles of the invention. While only one dictionary is described below, it will be appreciated that dictionary 300 may consist of multiple dictionaries stored locally at system 100 or external to system 100. As illustrated, dictionary 300 may include a list of dictionary words. These words may, in one implementation consistent with the principles of the invention, be arranged alphabetically, as illustrated in FIG. 3. A user of system 100 may modify the entries in dictionary 300 by adding or deleting entries.
  • Exemplary Processing
  • FIGS. 4 and 5 illustrate an exemplary process for changing symbol sequences in documents in an implementation consistent with the principles of the invention. Processing may begin by receiving a symbol sequence into the current file (e.g., a current word processing document or any other type of file in which symbol sequence correction or changing would be desired) [act 410, FIG. 4]. The symbol sequence may, for example, be received in response to a user depressing keys on keyboard 120. System 100 may determine that a symbol sequence has been received when a delimiter character has been detected. In one implementation, system 100 may recognize the following characters as delimiter characters: hard space ( ), period (.), comma (,), semicolon (;), colon (:), quotation mark (“), single quotation mark (‘), exclamation point (!), question mark (?), and the like. Therefore, for example, when a delimiter character is received, the symbols preceding the delimiter character may be considered a symbol sequence.
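  • As an illustration only, the delimiter-based detection of act 410 might be sketched in Python as follows. The function name `split_symbol_sequences` and the exact contents of `DELIMITERS` are assumptions made for this sketch (the description leaves the delimiter list open-ended), and both an ordinary space and a non-breaking space are treated as the "hard space."

```python
# Hypothetical delimiter set based on the characters listed above; the
# description ends its list with "and the like", so this set is an assumption.
DELIMITERS = frozenset(" \u00a0.,;:\"'!?\u201c\u201d\u2018\u2019")


def split_symbol_sequences(text):
    """Split typed text into completed symbol sequences.

    Returns (sequences, pending), where `sequences` holds every run of
    symbols that was terminated by a delimiter and `pending` holds any
    trailing symbols that have not yet been followed by a delimiter.
    """
    sequences = []
    current = []
    for ch in text:
        if ch in DELIMITERS:
            # A delimiter closes the sequence that precedes it, if any.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(ch)
    return sequences, "".join(current)
```

  • One consequence of treating the single quotation mark as a delimiter, as this sketch does, is that a contraction such as "don't" would be split into two sequences; a practical implementation would likely need a rule for that case.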
  • It may then be determined if the received symbol sequence matches a word in a dictionary, such as dictionary 300 [act 420]. If the symbol sequence matches a word in the dictionary, then processing may return to act 410 with system 100 receiving the next symbol sequence. If, on the other hand, the symbol sequence does not match a word in the dictionary, the symbol sequence may be compared to the symbol sequences that are already contained in the current file [act 430]. If the symbol sequence matches one or more other symbol sequences in the current file [act 440], then processing may return to act 410 with the receipt of the next symbol sequence.
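  • The two exact-match checks of acts 420 through 440 might be sketched as below. The function name `classify_sequence` and the returned labels are hypothetical, and the sketch assumes exact, case-sensitive comparison, which the description does not specify.

```python
def classify_sequence(sequence, dictionary, file_sequences):
    """Apply the exact-match checks in the order described above.

    `dictionary` and `file_sequences` are assumed to be sets of strings.
    """
    if sequence in dictionary:
        # Act 420: a dictionary hit needs no further processing.
        return "in_dictionary"
    if sequence in file_sequences:
        # Acts 430/440: the sequence already appears elsewhere in the file.
        return "in_file"
    # Neither check matched, so the close-match processing of FIG. 5 applies.
    return "unknown"
```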
  • If, on the other hand, the received symbol sequence does not match another symbol sequence in the current file [act 440], then it may be determined whether the received symbol sequence closely matches a symbol sequence already contained in the current file [act 510, FIG. 5]. In one implementation, the probability of whether the received symbol sequence matches each symbol sequence already contained in the file may be determined. The probability may take into consideration, for example, the number of occurrences of each particular symbol sequence in the file. So, for example, if a particular symbol sequence occurs more than a predetermined number of times in the document and the received symbol sequence closely matches that particular symbol sequence, then the probability of the received symbol sequence matching the particular symbol sequence may be very high. Also, if the received symbol sequence closely matches a particular symbol sequence from the file, variations of the particular symbol sequence may be considered and their probabilities determined. For example, “jumps,” “jumped,” and “jumping” are variations of the word “jump.” A number of conventional techniques exist for determining the probability of whether one item matches another item.
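  • The description does not name a particular technique for computing the match probability. Purely as a sketch, the following Python uses the similarity ratio from the standard library's `difflib.SequenceMatcher` as a stand-in score and raises it for sequences that occur frequently in the file. The names `match_probability`, `score_file_sequences`, and `naive_variations`, along with the `min_occurrences` and `boost` parameters and their default values, are assumptions and are not taken from the source.

```python
from collections import Counter
from difflib import SequenceMatcher


def match_probability(received, candidate, occurrences,
                      min_occurrences=3, boost=0.05):
    """Stand-in score in [0, 1]; not the method of the description."""
    score = SequenceMatcher(None, received, candidate).ratio()
    if occurrences >= min_occurrences:
        # Frequently used sequences are treated as more likely targets.
        score = min(1.0, score + boost)
    return score


def score_file_sequences(received, file_sequences):
    """Score each distinct sequence in the file against the received one."""
    counts = Counter(file_sequences)
    return {seq: match_probability(received, seq, n)
            for seq, n in counts.items()}


def naive_variations(stem, suffixes=("s", "ed", "ing")):
    """Very rough variation generator, e.g. jump -> jumps, jumped, jumping."""
    return [stem + suffix for suffix in suffixes]
```

  • Appending suffixes works for "jump" but not for words whose spelling changes when inflected, so a real implementation would need a proper morphological step.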
  • If the received symbol sequence closely matches another symbol sequence in the file (e.g., the probability of the symbol sequence matching the other symbol sequence is above a predetermined threshold), the received symbol sequence may be automatically replaced with the other, closely matching symbol sequence [act 520]. In one implementation, the threshold for automatically replacing symbol sequences in the file may be configurable by the user to allow the user to determine how close a particular symbol sequence has to be to the received symbol sequence in order for the received symbol sequence to be replaced with the particular symbol sequence. Processing may then return to act 410 (FIG. 4) with the receipt of the next symbol sequence.
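  • A minimal sketch of the automatic replacement decision of act 520 is given below. The function name `pick_auto_replacement` is hypothetical, and the default threshold of 0.98 simply mirrors the 98% value used in the examples of this description. Choosing the highest-probability candidate when several exceed the threshold follows the rule stated in the second example.

```python
def pick_auto_replacement(probabilities, threshold=0.98):
    """Return the candidate to substitute automatically, or None.

    `probabilities` maps each candidate symbol sequence to its match
    probability, expressed as a float between 0 and 1.
    """
    above = {cand: p for cand, p in probabilities.items() if p > threshold}
    if not above:
        # Nothing is close enough; fall through to the ranked-list path.
        return None
    # If several candidates qualify, the most probable one wins.
    return max(above, key=above.get)
```

  • Exposing `threshold` as a parameter reflects the statement that the user may configure how close a match must be before replacement occurs.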
  • If, on the other hand, no closely matching symbol sequences are present in the file [act 510], the closest matching symbol sequences from the file may be obtained, along with the closest matching words from the dictionary [act 530]. In one implementation, the number of closest matching symbol sequences from the file may be limited. Similarly, the number of closest matching words from the dictionary may also be limited. In one implementation, the number of symbol sequences obtained from the file may approximately equal the number of words from the dictionary (e.g., the four most closely matching symbol sequences and the four most closely matching words may be obtained). In other implementations, the number of symbol sequences and words that are obtained may be based on the probabilities of these symbol sequences and words matching the received symbol sequence. For example, all symbol sequences and words may be obtained whose probability of matching the received symbol sequence is above a threshold. In one implementation, the threshold may be configurable by the user to allow the user to determine whether a greater number of items or lesser number of items will be provided to the user.
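  • The two ways of limiting the candidates in act 530 (a fixed count, or a probability threshold) might be sketched as a single helper. The name `closest_matches` and its parameters are assumptions; the same helper would be called once for the file's symbol sequences and once for the dictionary's words.

```python
def closest_matches(scored, limit=None, threshold=None):
    """Return (candidate, probability) pairs, most probable first.

    `scored` maps candidates to probabilities. If `threshold` is given,
    only candidates above it are kept; if `limit` is given, at most that
    many candidates are returned.
    """
    items = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        items = [(cand, p) for cand, p in items if p > threshold]
    if limit is not None:
        items = items[:limit]
    return items
```

  • Calling the helper with `limit=4` for both sources corresponds to the "four most closely matching symbol sequences and the four most closely matching words" case, while passing only `threshold` corresponds to the probability-based case.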
  • Moreover, the words that are obtained from the dictionary may be based on the symbol sequences obtained from the file. For example, if a particular symbol sequence closely matches the received symbol sequence, words that closely relate to the closely matching symbol sequence may be obtained from the dictionary. In one implementation, if it is determined that the probability of one of the words from the dictionary matching the received symbol sequence is above the first threshold above (i.e., the threshold for determining whether the received symbol sequence is automatically replaced), then the received symbol sequence may be automatically replaced with the word.
  • The closest matching symbol sequences and words may be ranked to create a ranked list [act 540]. Any conventional technique for ranking the closest matching symbol sequences and words may be used. For example, the ranking may be based on how closely the symbol sequences and words match the received symbol sequence. Other techniques for ranking the closest matching symbol sequences and words may alternatively be used.
  • The ranked list of closest matching symbol sequences and words may be provided to the user [act 550]. In one implementation, a user-configurable number of the highest ranking symbol sequences and words may be provided to the user. The ranked list may be provided automatically (e.g., as an automatic popup window) or may be provided in response to an action by the user. For example, in one implementation, the ranked list may be provided in response to the user selecting the symbol sequence. In this situation, the ranked list, for example, may be provided after other symbol sequences are received in the current file.
  • Once the ranked list is provided to the user, it may then be determined whether the user selected one of the items in the ranked list [act 560]. The user may select an item from the ranked list in any conventional manner. For example, the user may select an item from the ranked list using a mouse pointer.
  • If the user selects an item from the ranked list [act 560], the received symbol sequence is automatically replaced with the selected item [act 570]. If, on the other hand, the user does not select an item from the list [act 560] or after the received symbol sequence is automatically replaced with the selected item [act 570], processing may return to act 410 above with system 100 receiving another symbol sequence.
  • It will be appreciated that system 100 may periodically re-perform any or all of acts 420-570 on any symbol sequences that do not match a word in the dictionary. In one implementation, system 100 may re-perform any or all of acts 420-570 on those symbol sequences not matching a word in the dictionary at predetermined intervals. In other implementations, system 100 may re-perform any or all of acts 420-570 during those instances where system 100 is in an idle state (e.g., when the user stops typing into the current file).
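  • One way the idle-state trigger could be approximated is sketched below. The class name `IdleRechecker`, the helper `sequences_to_recheck`, and the two-second default are hypothetical; the description says only that re-processing may occur at predetermined intervals or when the system is idle.

```python
import time


class IdleRechecker:
    """Tracks keystroke times so a caller can tell when typing has paused."""

    def __init__(self, idle_seconds=2.0, clock=time.monotonic):
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._last_keystroke = clock()

    def note_keystroke(self):
        # Called whenever the user types a symbol into the current file.
        self._last_keystroke = self._clock()

    def is_idle(self):
        return self._clock() - self._last_keystroke >= self._idle_seconds


def sequences_to_recheck(file_sequences, dictionary):
    """Select the sequences that still do not match a dictionary word."""
    return [seq for seq in file_sequences if seq not in dictionary]
```

  • A caller would poll `is_idle()` and, when it returns true, run acts 420-570 again on the result of `sequences_to_recheck`.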
  • It will be further appreciated that while the above process includes both automatic replacement of symbol sequences and provisioning of a ranked list, these acts may be separately provided in other implementations. For example, a process consistent with the principles of the invention may provide one or both of these separate acts.
  • FIGS. 6 and 7 illustrate a first example of the above processing consistent with the principles of the invention. With reference to FIG. 6, assume that a group of symbol sequences 600 has been received into a current file. Moreover, assume that the probability threshold for automatically replacing a symbol sequence in the current file is 98% and the probability threshold for determining whether another symbol sequence or word closely matches the symbol sequence is 75%. It will be appreciated that these threshold values are provided for explanatory purposes only. Other values may alternatively be used.
  • As illustrated in FIG. 6, the symbol sequence “Harrtiy” has been received into the current file. Following the processing set forth above with respect to FIGS. 4 and 5, the symbol sequence, “Harrtiy,” may be compared to words in the dictionary (e.g., dictionary 300). Since the symbol sequence, “Harrtiy,” does not match any of the words in the dictionary, the symbol sequence is compared to all other symbol sequences in the file to determine whether it matches any of the other symbol sequences in the file.
  • Since the symbol sequence does not match any of the other symbol sequences in the file, it will be determined whether the symbol sequence, “Harrtiy,” closely matches any other symbol sequence in the file. In one implementation, the determination of whether the symbol sequence closely matches another symbol sequence may be based on the probability of the symbol sequence matching another symbol sequence. Assume, for example, that it is determined that the probability of the symbol sequence, “Harrtiy,” matching the symbol sequence, “Harrity,” is 99%. In this instance, the symbol sequence, “Harrtiy,” would be automatically replaced with the symbol sequence, “Harrity.”
  • Assume, as an alternative, that it is determined that the probability of the symbol sequence, “Harrtiy,” matching the symbol sequence, “Harrity,” is 95% and, therefore, the symbol sequence is not automatically replaced. In this situation, the closest matching symbol sequences from the file and the closest matching words from the dictionary may be obtained. In one implementation, the closest matching symbol sequences and words may be obtained based on the probability of the symbol sequence matching another symbol sequence or word. Assume, for example, that it is determined that the probability of the symbol sequence, “Harrtiy,” matching the symbol sequence, “Harrity,” is, as set forth above, 95%, and that the probability of the symbol sequence, “Harrtiy,” matching any other symbol sequence in the file is below 75% (the threshold value). Moreover, assume, for example, that it is determined that the probability of the symbol sequence, “Harrtiy,” matching the word, “Hearty,” is 80%, the probability of the symbol sequence, “Harrtiy,” matching the word, “Harry,” is 79%, the probability of the symbol sequence, “Harrtiy,” matching the word, “Hardy,” is 77%, and the probability of the symbol sequence, “Harrtiy,” matching the word, “Harpy,” is 76%. Therefore, the closest matching symbol sequences and words include: “Harrity,” “Hearty,” “Harry,” “Hardy,” and “Harpy.”
  • These closest matching symbol sequences and words may then be ranked based, for example, on their probability of matching the symbol sequence, “Harrtiy.” Table 1 below illustrates the ranked list of closely matching symbol sequences and words.
  • TABLE 1
    Harrity 95%
    Hearty 80%
    Harry 79%
    Hardy 77%
    Harpy 76%
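  • The ranking that produces Table 1 can be reproduced with a short sketch. The probabilities are the ones assumed in this first example; the function name `rank_candidates` and the choice to keep the higher probability when an item appears in both sources are assumptions of the sketch.

```python
def rank_candidates(file_matches, dictionary_matches, top_n=None):
    """Merge file and dictionary candidates and sort by probability."""
    combined = dict(file_matches)
    for word, probability in dictionary_matches.items():
        # If an item appears in both sources, keep its higher probability.
        combined[word] = max(probability, combined.get(word, 0.0))
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Probabilities assumed in the first example for the sequence "Harrtiy".
file_matches = {"Harrity": 0.95}
dictionary_matches = {"Hearty": 0.80, "Harry": 0.79,
                      "Hardy": 0.77, "Harpy": 0.76}

for candidate, probability in rank_candidates(file_matches, dictionary_matches):
    print(f"{candidate} {probability:.0%}")
```

  • Passing `top_n` corresponds to providing only a predetermined number of the highest ranking items to the user.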
  • The ranked list of closely matching symbol sequences and words may then be provided. In one implementation, the ranked list may be provided via a popup window, such as popup window 710, illustrated in FIG. 7. Popup window 710 may automatically appear when the symbol sequence, “Harrtiy,” is received or may appear in response to a user action. In the latter instance, popup window 710 may appear after additional symbol sequences have been received into the current file. As illustrated, popup window 710 may also include an auto-correct feature that allows the user to add one of the items from the ranked list to an auto-correct list.
  • Upon selection of one of the items in the ranked list, the symbol sequence, “Harrtiy,” may be automatically replaced.
  • FIGS. 8 and 9 illustrate a second example of the above processing consistent with the principles of the invention. With reference to FIG. 8, assume that a group of symbol sequences 800 has been received into a current file. Moreover, assume that the probability threshold for automatically replacing a symbol sequence in the current file is 98% and the probability threshold for determining whether another symbol sequence or word closely matches the symbol sequence is 75%. It will be appreciated that these threshold values are provided for explanatory purposes only. Other values may alternatively be used.
  • As illustrated in FIG. 8, the symbol sequence “elastamer” has been received into the current file. Following the processing set forth above with respect to FIGS. 4 and 5, the symbol sequence, “elastamer,” may be compared to words in the dictionary (e.g., dictionary 300). Since the symbol sequence, “elastamer,” does not match any of the words in the dictionary, the symbol sequence is compared to all other symbol sequences in the file to determine whether it matches any of the other symbol sequences in the file.
  • Since the symbol sequence does not match any of the other symbol sequences in the file, it will be determined whether the symbol sequence, “elastamer,” closely matches any other symbol sequence in the file. In one implementation, the determination of whether the symbol sequence closely matches another symbol sequence may be based on the probability of the symbol sequence matching another symbol sequence. Assume, for example, that it is determined that the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastomer,” is 99%. In this instance, the symbol sequence, “elastamer,” would be automatically replaced with the symbol sequence, “elastomer.” If probabilities of two or more symbol sequences from the file are above the threshold, the symbol sequence with the highest probability may be selected to replace the received symbol sequence.
  • Assume, as an alternative, that it is determined that the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastomer,” is 97% and, therefore, the symbol sequence is not automatically replaced. In this situation, the closest matching symbol sequences from the file and the closest matching words from the dictionary may be obtained. In one implementation, the closest matching symbol sequences and words may be obtained based on the probability of the symbol sequence matching another symbol sequence or word.
  • Assume, for example, that it is determined that the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastomer,” is, as set forth above, 97%, the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastomeric,” is 90%, the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elastic,” is 79%, the probability of the symbol sequence, “elastamer,” matching the symbol sequence, “elasticity,” is 78%, and that the probability of the symbol sequence, “elastamer,” matching any other symbol sequence in the file is below 75% (the threshold value). Moreover, assume, for example, that it is determined that the only word from the dictionary that closely matches the symbol sequence is “easterner” with a probability of 75.5% (it is assumed for this example, that the word “elastomer” is not contained in the dictionary). Therefore, the closest matching symbol sequences and words include: “elastomer,” “elastomeric,” “elastic,” “elasticity,” and “easterner.”
  • These closest matching symbol sequences and words may then be ranked based, for example, on their probability of matching the symbol sequence, “elastamer.” Table 2 below illustrates the ranked list of closely matching symbol sequences and words.
  • TABLE 2
    elastomer 97%
    elastomeric 90%
    elastic 79%
    elasticity 78%
    easterner 75.5%  
  • The ranked list of closely matching symbol sequences and words may then be provided. In one implementation, the ranked list may be provided via a popup window, such as popup window 910, illustrated in FIG. 9. Popup window 910 may automatically appear when the symbol sequence, “elastamer,” is received or may appear in response to a user action. In the latter instance, popup window 910 may appear after additional symbol sequences have been received into the current file. As illustrated, popup window 910 may also include an auto-correct feature that allows the user to add one of the items from the ranked list to an auto-correct list.
  • Upon selection of one of the items in the ranked list, the symbol sequence, “elastamer,” may be automatically replaced.
  • FIGS. 10 and 11 illustrate a third example of the above processing consistent with the principles of the invention. With reference to FIG. 10, assume that a group of symbol sequences 1000 has been received into a current file. Moreover, assume that the probability threshold for automatically replacing a symbol sequence in the current file is 98% and the probability threshold for determining whether another symbol sequence or word closely matches the symbol sequence is 75%. It will be appreciated that these threshold values are provided for explanatory purposes only. Other values may alternatively be used.
  • As illustrated in FIG. 10, the symbol sequence “synchronise” has been received into the current file. Following the processing set forth above with respect to FIGS. 4 and 5, the symbol sequence, “synchronise,” may be compared to words in the dictionary (e.g., dictionary 300). Since the symbol sequence, “synchronise,” does not match any of the words in the dictionary, the symbol sequence is compared to all other symbol sequences in the file to determine whether it matches any of the other symbol sequences in the file.
  • Since the symbol sequence does not match any of the other symbol sequences in the file, it will be determined whether the symbol sequence, “synchronise,” closely matches any other symbol sequence in the file. In one implementation, the determination of whether the symbol sequence closely matches another symbol sequence may be based on the probability of the symbol sequence matching another symbol sequence. Assume, for this example, that the probability of the most closely matching symbol sequence from the file is determined to be 95% and, therefore, the symbol sequence is not automatically replaced. In this situation, the closest matching symbol sequences from the file and the closest matching words from the dictionary may be obtained. In one implementation, the closest matching symbol sequences and words may be obtained based on the probability of the symbol sequence matching another symbol sequence or word.
  • Assume, for example, that it is determined that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronized,” is 95%, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronization,” is 85%, and that the probability of the symbol sequence, “synchronise,” matching any other symbol sequence in the file is below 75% (the threshold value).
  • As set forth above, the words selected from the dictionary may be based on the most closely matching symbol sequences in the file. Since the symbol sequence, “synchronized,” has a high probability of matching, the following words may be obtained from the dictionary: “synchronize,” “synchronizes,” and “synchronizing.” Assume, for example, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronize,” is 99%. In this example, the symbol sequence, “synchronise,” would be automatically replaced with the symbol sequence, “synchronize.”
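  • The automatic-replacement decision in this branch of the example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the figures: the function name `pick_auto_replacement` and the constant `AUTO_REPLACE_THRESHOLD` are hypothetical, and the probabilities are the values assumed in the example rather than the output of any particular matching algorithm.

```python
AUTO_REPLACE_THRESHOLD = 0.98  # example threshold from the text; other values may be used


def pick_auto_replacement(candidates, threshold=AUTO_REPLACE_THRESHOLD):
    """Return the most probable candidate if it exceeds the threshold, else None.

    `candidates` maps each candidate symbol sequence or word to its
    (assumed, precomputed) probability of matching the received sequence.
    """
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    # Assumption: "above the threshold" is read as strictly greater than.
    return best if candidates[best] > threshold else None


# Probabilities assumed in the example for the received sequence "synchronise".
candidates = {
    "synchronized": 0.95,     # closest match found in the file
    "synchronization": 0.85,  # second match found in the file
    "synchronize": 0.99,      # dictionary word related to "synchronized"
}
print(pick_auto_replacement(candidates))
```

    With the 99% figure assumed above, the sketch selects “synchronize.” If the highest probability were 97.9%, as in the alternative that follows, the function would return `None` and no automatic replacement would occur.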
  • As an alternative, assume, for example, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronize,” is 97.9%, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronizes,” is 95%, that the probability of the symbol sequence, “synchronise,” matching the symbol sequence, “synchronizing,” is 87%, and that the probability of the symbol sequence, “synchronise,” matching any other word from the dictionary is below 75% (the threshold value).
  • These closest matching symbol sequences and words may then be ranked based, for example, on their probability of matching the symbol sequence, “synchronise.” Table 3 below illustrates the ranked list of closely matching symbol sequences and words.
  • TABLE 3
    Symbol sequence or word    Probability of matching “synchronise”
    synchronize                97.9%
    synchronized               95%
    synchronizes               95%
    synchronizing              87%
    synchronization            85%
  • The ranked list of closely matching symbol sequences and words may then be provided. In one implementation, the ranked list may be provided via a popup window, such as popup window 1110, illustrated in FIG. 11. Popup window 1110 may automatically appear when the symbol sequence, “synchronise,” is received or may appear in response to a user action. In the latter instance, popup window 1110 may appear after additional symbol sequences have been received into the current file. As illustrated, popup window 1110 may also include an auto-correct feature that allows the user to add one of the items from the ranked list to an auto-correct list.
  • Upon selection of one of the items in the ranked list, the symbol sequence, “synchronise,” may be automatically replaced.
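  • The combining, filtering, and ranking that produce Table 3 can be sketched in Python as follows. This is a hedged illustration only: the function name `rank_close_matches` is hypothetical, the probabilities are the values assumed in the example, the entry “synchrony” is an invented below-threshold candidate included to show the filtering, and the alphabetical tie-break between equally probable items is an assumption that the text does not specify.

```python
CLOSE_MATCH_THRESHOLD = 0.75  # example threshold from the text; other values may be used


def rank_close_matches(file_matches, dictionary_matches,
                       threshold=CLOSE_MATCH_THRESHOLD):
    """Merge candidates from the file and the dictionary into one ranked list.

    Each argument maps a candidate to its assumed probability of matching the
    received symbol sequence. Candidates below the threshold are dropped.
    """
    combined = {}
    for source in (file_matches, dictionary_matches):
        for candidate, probability in source.items():
            if probability >= threshold:
                # Keep the higher probability if a candidate appears in both sources.
                combined[candidate] = max(probability, combined.get(candidate, 0.0))
    # Highest probability first; ties broken alphabetically (an assumption).
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


file_matches = {"synchronized": 0.95, "synchronization": 0.85}
dictionary_matches = {
    "synchronize": 0.979,
    "synchronizes": 0.95,
    "synchronizing": 0.87,
    "synchrony": 0.60,  # hypothetical candidate below the 75% threshold
}
for candidate, probability in rank_close_matches(file_matches, dictionary_matches):
    print(f"{candidate} {probability:.1%}")
```

    The resulting order is the order of Table 3, and the ranked pairs could then be handed to whatever user interface presents the list, such as popup window 1110.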
  • FIGS. 12 and 13 illustrate a fourth example of the above processing consistent with the principles of the invention. With reference to FIG. 12, assume that a group of symbol sequences 1200 has been received into a current file. Moreover, assume that the probability threshold for automatically replacing a symbol sequence in the current file is 98% and the probability threshold for determining whether another symbol sequence or word closely matches the symbol sequence is 75%. It will be appreciated that these threshold values are provided for explanatory purposes only. Other values may alternatively be used.
  • As set forth above, at periodic intervals (e.g., during idle periods), the processing described in FIGS. 4 and 5 may be performed on symbol sequences 1200 in the current file. With reference to FIG. 12, since the symbol sequence 1210, “sychnchronization,” does not match any word in the dictionary, symbol sequence 1210 is compared to all other symbol sequences in the file to determine whether it matches any of the other symbol sequences in the file. Since symbol sequence 1210 does not match any of the other symbol sequences in the file, it is determined whether the symbol sequence, “sychnchronization,” closely matches any other symbol sequence in the file. In one implementation, the determination of whether the symbol sequence closely matches another symbol sequence may be based on the probability of the symbol sequence matching another symbol sequence. Assume, for this example, that the probability of symbol sequence 1210 matching the symbol sequence, “synchronization,” is determined to be 99%. Therefore, symbol sequence 1210 is automatically replaced with symbol sequence 1310, “synchronization,” as illustrated in FIG. 13. In this situation, system 100 may periodically go through a current file and automatically correct misspelled words.
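  • The periodic pass described in this fourth example can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `autocorrect_pass` and `lookup_probability` are hypothetical names, the matching probability is supplied by the caller because the text does not fix a particular algorithm, and a symbol sequence that exactly matches another symbol sequence in the file is assumed to be left unchanged.

```python
def autocorrect_pass(sequences, dictionary, match_probability, threshold=0.98):
    """Return a copy of `sequences` with closely matching unknown sequences replaced.

    `match_probability(a, b)` is a caller-supplied function returning the
    probability (0.0 to 1.0) that sequence `a` was meant to be sequence `b`.
    """
    corrected = list(sequences)
    for i, sequence in enumerate(corrected):
        if sequence in dictionary:
            continue  # known word: nothing to do
        others = [s for j, s in enumerate(corrected) if j != i]
        if sequence in others:
            continue  # exact match elsewhere in the file: assumed intentional
        best, best_probability = None, 0.0
        for other in dict.fromkeys(others):  # unique values, original order
            probability = match_probability(sequence, other)
            if probability > best_probability:
                best, best_probability = other, probability
        if best is not None and best_probability > threshold:
            corrected[i] = best  # automatic replacement
    return corrected


# The 99% figure is the value assumed in the example, stored in a lookup table.
assumed = {("sychnchronization", "synchronization"): 0.99}


def lookup_probability(a, b):
    return assumed.get((a, b), 0.0)


words = ["synchronization", "of", "clocks", "sychnchronization"]
known = {"synchronization", "of", "clocks"}
print(autocorrect_pass(words, known, lookup_probability))
```

    Passing the probability function as a parameter keeps the sketch independent of how the probability is actually determined; a caller could run such a pass during idle periods, as the example suggests.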
  • CONCLUSION
  • Implementations consistent with the principles of the invention improve the changing of symbol sequences in documents. In one implementation, an unknown symbol sequence in a document, such as a word processing document, is compared to other symbol sequences in the word processing document. If the unknown symbol sequence closely matches another symbol sequence in the word processing document, the unknown symbol sequence may be automatically replaced with the closely matching symbol sequence. Alternatively, a list of closely matching symbol sequences from the word processing document may be generated, along with a list of closely matching words from a dictionary. The two lists may be combined and ranked based on the symbol sequences in the word processing document. The ranked list (or a predetermined number of highest ranking items in the ranked list) may be provided to a user. Upon selection of one of the items in the provided list, the unknown symbol sequence may be automatically replaced with the selected item. In this way, auto-correction of symbol sequences in documents can be improved.
  • The foregoing description of exemplary embodiments of the invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. For example, while not explicitly described above, it will be appreciated that the probabilities assigned to potentially matching symbol sequences and words may be based, in some implementations consistent with the principles of the invention, on the context of the received symbol sequence in the file.
  • While a series of acts has been described with regard to FIGS. 4 and 5, the order of the acts may be varied in other implementations consistent with the invention. Moreover, non-dependent acts may be implemented in parallel.
  • It will also be apparent to one of ordinary skill in the art that aspects of the invention, as described above, may be implemented in many different forms of software, firmware, and hardware in the implementations illustrated in the figures. The actual software code or specialized control hardware used to implement aspects consistent with the principles of the invention is not limiting of the invention. Thus, the operation and behavior of the aspects of the invention were described without reference to the specific software code, it being understood that one of ordinary skill in the art would be able to design software and control hardware to implement the aspects based on the description herein.
  • Further, certain portions of the invention may be implemented as “logic” that performs one or more functions. This logic may include hardware, such as an application specific integrated circuit or a field programmable gate array, software, or a combination of hardware and software.
  • No element, act, or instruction used in the description of the invention should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.

Claims (8)

1-13. (canceled)
14. A computer-readable medium comprising instructions for causing at least one processor to perform a method comprising:
receiving a symbol sequence into a document;
identifying another symbol sequence in the document whose probability of matching the received symbol sequence is above a threshold; and
replacing the received symbol sequence with the other symbol sequence.
15. The computer-readable medium of claim 14 wherein the threshold is configurable.
16. The computer-readable medium of claim 14 wherein the method further comprises:
identifying one or more variations of the other symbol sequence; and
replacing the received symbol sequence with a variation of the other symbol sequence when a probability of the variation matching the received symbol sequence is higher than the probability of the other symbol sequence matching the received symbol sequence.
17. The computer-readable medium of claim 14 wherein the replacing includes:
replacing the received symbol sequence with a word from a dictionary when a probability of the word matching the received symbol sequence is higher than the probability of the other symbol sequence matching the received symbol sequence.
18. A computer-readable medium comprising instructions that cause at least one processor to perform a method comprising:
identifying a symbol sequence in a document that closely matches a first symbol sequence in the document;
identifying a word in a list of previously-stored words that closely matches the first symbol sequence;
ranking the identified symbol sequence and word to form a ranked list of objects; and
providing the ranked list of objects.
19. The computer-readable medium of claim 18 wherein the identifying a symbol sequence in a document that closely matches a first symbol sequence in the document includes:
identifying the symbol sequence based on a probability of the symbol sequence matching the first symbol sequence, and
wherein the identifying a word in a list of previously-stored words that closely matches the first symbol sequence includes:
identifying the word based on a probability of the word matching the first symbol sequence.
20. The computer-readable medium of claim 19 wherein the method further comprises:
removing, prior to providing the ranked list of objects, at least one of the identified symbol sequence or word whose probability of matching the first symbol sequence is below a threshold.
US12/644,585 2004-07-12 2009-12-22 Systems and methods for changing symbol sequences in documents Abandoned US20100100555A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/644,585 US20100100555A1 (en) 2004-07-12 2009-12-22 Systems and methods for changing symbol sequences in documents

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US58671904P 2004-07-12 2004-07-12
US10/895,330 US7664748B2 (en) 2004-07-12 2004-07-21 Systems and methods for changing symbol sequences in documents
US12/644,585 US20100100555A1 (en) 2004-07-12 2009-12-22 Systems and methods for changing symbol sequences in documents

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/895,330 Continuation US7664748B2 (en) 2004-07-12 2004-07-21 Systems and methods for changing symbol sequences in documents

Publications (1)

Publication Number Publication Date
US20100100555A1 true US20100100555A1 (en) 2010-04-22

Family

ID=35542566

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/895,330 Expired - Fee Related US7664748B2 (en) 2004-07-12 2004-07-21 Systems and methods for changing symbol sequences in documents
US12/644,585 Abandoned US20100100555A1 (en) 2004-07-12 2009-12-22 Systems and methods for changing symbol sequences in documents

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/895,330 Expired - Fee Related US7664748B2 (en) 2004-07-12 2004-07-21 Systems and methods for changing symbol sequences in documents

Country Status (1)

Country Link
US (2) US7664748B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130290896A1 (en) * 2012-04-30 2013-10-31 Apple Inc. Symbol Disambiguation

Also Published As

Publication number Publication date
US7664748B2 (en) 2010-02-16
US20060010109A1 (en) 2006-01-12

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION