WO2001015140A1 - Speech recognition system for data entry

Info

Publication number: WO2001015140A1
Application number: PCT/CA2000/000776
Authority: WO
Inventors: Alexei B. Machovikov; Kirill V. Stolyarov; Maxim A. Chernoff
Original assignee: Telum Canada, Inc.
Priority date: 1999-07-01
Filing date: 2000-07-04
Publication date: 2001-03-01
Also published as: AU5668300A; CA2342787A1

Abstract

A speech recognition method and system for data entry includes a speech recognition engine (36) to review speech of a user and to recognize a search phrase therein. The recognized search phrase is applied to an index in a database engine (40) to locate an appropriate index entry and corresponding textual output messages. A corresponding textual output message is displayed to the user at a user terminal (24) for approval and/or completion and then the completed and approved output message is provided as an input to a data processing system (28), such as a wireless paging system. Multiple corresponding textual messages can be provided for an index entry, for example in different languages and/or character sets, and in such a case the user will select the desired available message.

Description

SPEECH RECOGNITION SYSTEM FOR DATA ENTRY

FIELD OF THE INVENTION

The present invention relates to data entry systems and methods. More specifically, the present invention relates to a method and system for substantially real time entry of predefined information into a data processing system, such as a paging network.

BACKGROUND OF THE INVENTION

Many data entry systems are known and such systems include keyboards, pointing devices such as mice or graphics tablets and, more recently, speech recognition systems. While known data entry systems are quite suitable in many applications, they do still suffer from some disadvantages. For example, keyboard-based data entry in languages which employ ideographic or pictographic character sets with large numbers of characters, such as Japanese or Chinese, can be difficult and/or inefficient to perform. This is especially true in real time applications, such as paging or call center applications, where a customer is communicating with an operator and the customer must wait at least for the time required by the operator to input data. Even for less time-sensitive matters, the efficiency of the operator impacts the costs associated with providing the services, with faster data entry allowing fewer operators to handle customers.

Speech recognition systems also suffer from disadvantages in that they must be trained by each user for the vocabulary to be recognized and this can require a significant amount of time and effort. Further, less than desired results can be obtained due to a variety of factors including background noise, poor enunciation by the user, etc.

It is therefore desired to have a data entry system and method which permits the convenient and efficient input of data, especially data in ideographic or pictographic character sets and in real time.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a novel data entry system and method of inputting data which obviates or mitigates at least one disadvantage of the prior art.

According to a first aspect of the present invention, there is provided a data entry system comprising: a speech recognition engine operable to receive speech and to recognize a search phrase therein; a database engine in communication with the speech recognition engine, the database engine including an index against which said recognized search phrase is applied to identify a corresponding index entry, each index entry having at least one textual output message defined therefor; and a user terminal in communication with the database engine, the user terminal including a display device for displaying said at least one textual output message corresponding to said identified index entry, and a user input device for receiving a user input representing an approval and/or a completion of said displayed textual output message, the database engine being configured for outputting said approved and/or completed textual output message upon receipt of said user input.

According to another aspect of the present invention, there is provided a method of performing data entry comprising the steps of:

(i) defining a database having an index having at least one index entry and at least one textual output message corresponding to each said index entry;

(ii) performing speech recognition on at least a portion of the speech of a user to recognize a search phrase corresponding to said at least one index entry;

(iii) applying said recognized search phrase to said database to identify a corresponding index entry;

(iv) presenting to said user said at least one textual output message corresponding to said index entry for completion and/or approval; and

(v) receiving input from said user representing the approval and/or completion of said at least one textual output message.

The present invention provides a speech recognition method and system for data entry which includes a speech recognition engine to review speech of a user and to recognize a search phrase therein. The recognized search phrase is applied to an index in a database engine to locate an appropriate index entry and corresponding textual output messages. A corresponding textual output message is displayed to the user for approval and/or completion and then the completed and approved output message is provided as an input to a data processing system, such as a wireless paging system. Multiple corresponding textual messages can be provided for an index entry, for example in different languages and/or character sets, and in such a case the user will select the desired available message.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred embodiment of the present invention will now be described, by way of example only, with reference to the drawings in which:

Figure 1 shows a schematic representation of the data entry system in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

A data entry system in accordance with an embodiment of the present invention is indicated generally at 20 in Figure 1. System 20 includes a data entry terminal 24 which can be any suitable data entry terminal such as a VT-100 or other "dumb terminal" or a personal computer. As shown, terminal 24 includes a keyboard and a display. Data input by a user of system 20 is passed to a data processing system 28, as discussed in more detail below. Data processing system 28 can be any computer-implemented system requiring data input such as an order entry system, an inventory control system and, in a preferred embodiment of the present invention, is a wireless paging network. System 20 also includes a microphone 32 which, in a preferred embodiment of the invention, is the mouthpiece of a telephone headset or handset but which can be any suitable microphone or other mechanism for capturing the voice of a user. Microphone 32 is connected to a speech recognition engine 36 which can be any appropriate speech recognition system. As described in more detail below, speech recognition engine 36 can employ Hidden Markov Models (HMM) or other known algorithms to recognize speech and can be implemented in dedicated hardware or as an application running on a general purpose personal computer with adequate memory and processing capacity.

The output of speech recognition engine 36 is applied to a database engine 40 which can be any suitable database such as those sold by Oracle, or a Microsoft Access database, etc. As described below in more detail, database engine 40 maintains at least one table relating predefined recognized phrases with corresponding textual message outputs. Selected corresponding textual message outputs from database engine 40 can be reviewed, approved, amended or modified from user terminal 24, or alternative textual message outputs can be selected from user terminal 24, before they are output to data processing system 28.

In use, a user defines a set of textual output messages of interest. These messages are selected as being text strings which will be commonly used by the user and can be represented in any language or character set desired, including multi-byte Unicode character sets and/or ideographic character sets. Once a set of textual output messages of interest is defined, the user defines an index entry for each textual output message of interest. The index entry will correspond, in form, to the output of speech recognition engine 36. For example, if speech recognition engine 36 outputs text phrases, the index entries in database engine 40 will be in textual format. If speech recognition engine 36 outputs phoneme and prosody information, index entries in database engine 40 will be in a corresponding phoneme and prosody form.

In an embodiment of the present invention wherein data processing system 28 is a paging system, and assuming that speech recognition engine 36 outputs recognized text phrases, textual output messages of interest and their corresponding index entries can, for example, include:

Index Entry: Pick you up at airport
Textual Output Message: For flight arrival information, call 555-1212. Please pick me up at the airport at

Index Entry: Please call you at
Textual Output Message: Please return my call at your first convenience at

Index Entry: Cell number is
Textual Output Message: I can be reached at my cellular and the number is

As will be apparent to those of skill in the art, textual output messages, and their corresponding index entries, can be added, amended or deleted from database engine 40 by users as desired. As will also be apparent to those of skill in the art, by selecting a limited set of textual output messages, compared to general purpose speech recognition systems such as dictation systems, etc., speech recognition engine 36 need not be extremely sophisticated. In fact it is contemplated that in some circumstances speech recognition engine 36 may not require training for each individual user and yet can provide acceptably accurate recognition of index entries.
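The table above can be pictured as a simple mapping from index entries to textual output messages. The following is a minimal sketch only, assuming that the speech recognition engine outputs plain text; the names INDEX, normalize and find_message are hypothetical, and the substring matching rule is an assumption, since the specification does not prescribe how the search phrase is compared against the index.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the table kept by database engine 40.
# Keys are index entries; values are the predefined textual output messages.
INDEX = {
    "pick you up at airport": (
        "For flight arrival information, call 555-1212. "
        "Please pick me up at the airport at"
    ),
    "please call you at": "Please return my call at your first convenience at",
    "cell number is": "I can be reached at my cellular and the number is",
}


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore casing and spacing."""
    return " ".join(phrase.lower().split())


def find_message(recognized: str, index: dict = INDEX) -> Optional[str]:
    """Return the message whose index entry occurs in the recognized text.

    Assumed rule: the longest index entry contained in the text wins.
    Returns None when no index entry is found.
    """
    text = normalize(recognized)
    matches = [entry for entry in index if entry in text]
    if not matches:
        return None
    return index[max(matches, key=len)]
```

Under this sketch, adding, amending or deleting an entry is an ordinary dictionary update, for example `INDEX["running late"] = "I am running late and will arrive at"` or `del INDEX["cell number is"]`.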

In the above-mentioned example where data processing system 28 is a paging system, a paging operator (i.e., a user) can answer incoming calls in the conventional manner. Microphone 32 can either be an additional microphone into which the operator can speak when desired, or can be the mouthpiece of an otherwise conventional telephone headset or handset. In the latter case, a switch (not shown) is provided which allows the operator to speak such that the person on the other end of the telephone (the caller) can hear the operator or to speak such that the caller and speech recognition engine 36 can each "hear" the operator. In either case, the operator can listen to the caller and repeat back the message to the caller, ensuring that speech recognition engine 36 is able to hear any appropriate index entries by either activating the above-mentioned switch or by speaking into separate microphone 32. For example, the caller can say, "Please tell Mr. Jones to pick me up at the airport at 5:00PM" and the operator will repeat back, "Mr. Jones is to pick you up at airport at 5:00PM" and will ensure that speech recognition engine 36 hears at least, "pick you up at airport at...".

In this example, speech recognition engine 36 will analyze the speech it has heard and will provide the output of its analysis, as a search input, to database engine 40. Database engine 40 compares the received search input to the index entries in its table or tables and selects the appropriate table entry. The corresponding textual output string, in this example, "For flight arrival information, call 555-1212. Please pick me up at the airport at" is selected by database engine 40 and is displayed on user terminal 24 for approval and/or completion by the operator. In this specific example, the operator would verify that the correct textual output message has been identified and will complete the output message by entering the text "5:00PM", representing variable information. It is contemplated that such variable information will preferably be input in a conventional manner, such as by a keyboard, although it is also contemplated that speech recognition engine 36 can be used to input such information based upon recognized speech of the operator.

It is further contemplated that the defined output textual messages are not limited to messages which require completion with variable information, or to messages in which the variable information is located at the end of the message. Specifically, the textual output messages can include one or more embedded codes that identify areas of the message to be completed with variable information and user terminal 24 will automatically place the input cursor at the first field to be completed when displaying an output textual message to the operator for approval. The operator can then complete each field as necessary. Once the output textual message is approved and/or completed by the operator, the approved message is forwarded to data processing system 28.
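The embedded codes described above can be illustrated with a short sketch. The specification does not define how fields are encoded, so the `{name}` marker syntax used here is purely an assumption, as are the names TEMPLATE, field_names, first_field_offset and complete.

```python
from string import Formatter

# Assumed encoding: each field to be completed is marked as {name}.
TEMPLATE = (
    "For flight arrival information, call 555-1212. "
    "Please pick me up at the airport at {time}"
)


def field_names(template: str) -> list:
    """List the embedded fields in the order they appear in the message."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def first_field_offset(template: str) -> int:
    """Offset of the first field, where the input cursor would be placed.

    Returns -1 when the message has no fields and only needs approval.
    """
    return template.find("{")


def complete(template: str, values: dict) -> str:
    """Fill every field; refuse to output a message with fields left empty."""
    missing = [name for name in field_names(template) if name not in values]
    if missing:
        raise ValueError(f"fields not completed: {missing}")
    return template.format(**values)
```

In this sketch the operator's keyboard input for the example call would be passed as `complete(TEMPLATE, {"time": "5:00PM"})`, and only the completed string would be forwarded to the data processing system.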

As indicated in this example, the textual output message can include additional desired information. Specifically, a telephone number for obtaining related information, such as flight arrival information, can be provided as a static part of the textual output message. Further, as also indicated in the example, the textual output message can be significantly longer than the index entry and can thus improve the efficiency of the operator by reducing the number of keystrokes which are required to complete the final input to data processing system 28. This can be a significant advantage when output textual messages are represented in Unicode and/or ideographic character sets which can require multiple keypresses to be performed for each desired character.

It is also contemplated that index entries and output textual messages in database engine 40 can be in different languages. For example, the index entries in database engine 40 can be in English (in any suitable form such as textual or phonetic) and the corresponding textual output messages can be in Unicode Mandarin Chinese. In this manner an operator speaking with an English language caller will be able to create output messages in Mandarin Chinese. In this case, if variable completion information is required, it can be selected from a list of appropriate choices displayed to the operator in English and, once a selection is made, database engine 40 will complete the textual output message with predefined corresponding Mandarin Chinese text.
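The cross-language completion just described can be sketched as follows. This is an illustration under assumptions: the entry layout, the `{when}` field marker and the function names are hypothetical, and the Chinese strings are illustrative translations that do not appear in the specification.

```python
# Hypothetical entry: English index phrase, Mandarin Chinese output message.
# The Chinese strings are illustrative translations, not taken from the source.
ENTRY = {
    "index": "pick you up at airport",
    "message": "请到机场接我，时间：{when}",
    "choices": {
        # English label shown to the operator -> predefined Chinese text
        "morning": "上午",
        "afternoon": "下午",
        "evening": "晚上",
    },
}


def choice_labels(entry: dict) -> list:
    """English labels to display to the operator for selection."""
    return list(entry["choices"])


def complete_with_choice(entry: dict, label: str) -> str:
    """Complete the Chinese message with the text predefined for the label."""
    if label not in entry["choices"]:
        raise KeyError(f"no predefined text for choice: {label!r}")
    return entry["message"].format(when=entry["choices"][label])
```

The operator in this sketch only ever reads and selects English labels; the Chinese text is never typed, which is where the keystroke saving for ideographic character sets would come from.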

It is further contemplated that database engine 40 can include multiple textual output messages, arranged by languages of interest, for each index entry. In this case, the textual output messages displayed to the operator on user terminal 24 for approval and/or completion will be in a language selected by the operator. Once the message is completed and/or approved, the operator can indicate in which of the available languages it is to be input to data processing system 28.
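One way to picture several messages per index entry is a nested mapping keyed first by index entry and then by language. The sketch below is an assumption throughout: the language codes, the names MESSAGES, available_languages and message_for, and the Chinese translation are all hypothetical.

```python
# Hypothetical layout: index entry -> language code -> textual output message.
# The Chinese string is an illustrative translation, not taken from the source.
MESSAGES = {
    "please call you at": {
        "en": "Please return my call at your first convenience at",
        "zh": "请尽快给我回电话，号码是",
    },
}


def available_languages(index_entry: str) -> list:
    """Language codes for which a message is defined, in sorted order."""
    return sorted(MESSAGES.get(index_entry, {}))


def message_for(index_entry: str, language: str) -> str:
    """Return the message for one index entry in the requested language."""
    by_language = MESSAGES[index_entry]
    if language not in by_language:
        raise KeyError(f"no {language!r} message for {index_entry!r}")
    return by_language[language]
```

In this sketch the operator could review `message_for("please call you at", "en")` on the terminal and then request the `"zh"` variant for output, so the display language and the output language are selected independently.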

The present invention provides an efficient real-time data entry system in which user speech is analyzed to extract a search phrase. This search phrase is used to search an index to locate an index entry for which one or more textual output phrases have been defined. When an index entry is found, a corresponding textual output message is presented to the user for approval and/or completion by the user and is then provided as input to a data processing system, such as a paging system. If more than one corresponding textual output message is defined for an index entry, such as messages in different languages or character sets, the user can select the desired textual output message. The corresponding textual output messages can include additional information, defined fields to be completed by the user and/or can be in a different language from the index entry.

The above-described embodiments of the invention are intended to be examples of the present invention and alterations and modifications may be effected thereto, by those of skill in the art, without departing from the scope of the invention which is defined solely by the claims appended hereto.

Claims

WE CLAIM:

1. A data entry system comprising: a speech recognition engine (36) operable to receive speech and to recognize a search phrase therein; a database engine (40) in communication with the speech recognition engine (36), the database engine (40) including an index against which said recognized search phrase is applied to identify a corresponding index entry, each said index entry having at least one textual output message defined therefor; and a user interface (24) in communication with the database engine (40), the user interface (24) including a display device for displaying said at least one textual output message corresponding to said identified index entry, and a user input device for receiving a user input representing an approval and/or a completion of said displayed textual output message, the database engine (40) being configured for outputting said approved and/or completed textual output message upon receipt of said user input.

2. The data entry system according to claim 1, wherein the database engine (40) is configured for outputting said approved and/or completed textual output message as a message input to a data processing system (28).

3. The data entry system according to any of the preceding claims, wherein the database engine (40) is configured for receiving said at least one defined textual output message from the user interface (24).

4. The data entry system according to any of the preceding claims, wherein at least one of the index entries includes information in a first language, and said associated at least one textual output message includes information in a second language different from the first language.

5. The data entry system according to any of the preceding claims, wherein at least one of the index entries includes phoneme and prosody information.

6. The data entry system according to any of the preceding claims, wherein at least one of the textual output messages includes Unicode characters.

7. A method of performing data entry comprising the steps of:

(i) providing a database including an index having at least one index entry and at least one textual output message corresponding to each said index entry;

(ii) performing speech recognition on at least a portion of the speech of a user to identify a recognized search phrase corresponding to said at least one index entry;

(iii) applying said recognized search phrase to said database to identify said at least one index entry;

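(iv) presenting to said user for completion and/or approval said at least one textual output message corresponding to said at least one index entry; and

(v) receiving input from said user representing the approval and/or completion of said at least one textual output message.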

8. The method according to claim 7, wherein the step of receiving input comprises receiving information for inclusion with said at least one textual output message.

9. The method according to any of claims 7 to 8, wherein said at least one index entry includes information in a first language, and the presenting step comprises displaying said at least one textual output message in a second language different from the first language.

10. The method according to any of claims 7 to 9, further comprising the step of applying said approved and completed textual output message as an input to a data processing system.

11. The method according to claim 10, wherein the data processing system comprises a wireless paging system.