(12) United States Patent (10) Patent No.: US 6,567,778 B1
Chao Chang et al. (45) Date of Patent: May 20, 2003
(21) Appl. No.: 09/339,470
(22) Filed: Jun. 23, 1999
Related U.S. Application Data
(63) Continuation-in-part of application No. 08/576,874, filed on Dec. 21, 1995, now Pat. No. 6,292,764.
(60) Provisional application No. 60/091,047, filed on Jun. 29, 1998.
(51) Int. Cl.7 G10L 15/18
(52) U.S. Cl. 704/257; 704/275
(58) Field of Search 704/257, 275
(56) References Cited
U.S. PATENT DOCUMENTS
4,590,604 A 5/1986 Feilchenfeld 381/42
5,127,055 A 6/1992 Larkey 381/43
5,418,717 A * 5/1995 Su et al 704/9
5,528,731 A 6/1996 Sachs et al 395/2.55
5,568,540 A 10/1996 Greco et al 379/88.25
5,617,486 A 4/1997 Chow et al 382/181
5,651,054 A 7/1997 Dunn et al 379/67
5,717,743 A 2/1998 McMachon et al 379/188
5,742,905 A 4/1998 Pepe et al 455/461
5,794,192 A 8/1998 Zhao 704/244
5,822,405 A 10/1998 Astarabadi 379/88
5,842,161 A 11/1998 Cohrs et al 704/251
5,848,130 A 12/1998 Rochkind 379/67
5,937,384 A * 8/1999 Huang et al 704/256
6,044,347 A * 3/2000 Abella et al 704/272
6,058,363 A * 5/2000 Ramalingam 704/251
6,181,780 B1 1/2001 Finnigan 379/67.1
6,208,713 B1 3/2001 Rahrer et al 379/67.1
6,219,407 B1 4/2001 Kanevsky et al 379/88.02
6,272,455 B1 * 8/2001 Hoshen et al 704/1
6,327,343 B1 12/2001 Epstein et al 379/88.01
6,275,801 B1 * 8/2002 Novak et al 704/252
FOREIGN PATENT DOCUMENTS
JP 10079785 3/1998 H04M/1/57
* cited by examiner
Primary Examiner—Talivaldis Ivars Smits
(74) Attorney, Agent, or Firm—Haverstock & Owens LLP
(57) ABSTRACT
A stream of input speech is coupled as an input to a speech recognizer. The speech can be provided to the speech recognizer directly from a user or first stored and provided from a memory circuit. Each input word is recognized by the speech recognizer and a word confidence score is associated with each corresponding recognized word. The recognized words and their associated word confidence scores are provided to a natural language interpreter which parses the stream of recognized words into predetermined edges. From the edges, the natural language interpreter forms semantic slots which represent a semantic meaning. A slot confidence score related to the word or phone confidence scores for each of the words in the slot is determined for each slot. Based upon the slot confidence score, an ancillary application program determines whether to accept the words used to fill each slot. If the slot is rejected, the application program can request the user to repeat the information necessary to fill that slot only, rather than requiring the user to repeat the entire stream of input speech.
53 Claims, 1 Drawing Sheet
NATURAL LANGUAGE SPEECH RECOGNITION USING SLOT SEMANTIC CONFIDENCE SCORES RELATED TO THEIR WORD RECOGNITION CONFIDENCE SCORES
This application is a continuation-in-part application of U.S. patent application Ser. No. 08/576,874 filed on Dec. 21, 1995 and entitled METHOD AND SYSTEM FOR BUILDING AND RUNNING NATURAL LANGUAGE UNDERSTANDING SYSTEMS, now U.S. Pat. No. 6,292,764, issued Sep. 18, 2001. This application also claims priority of provisional application serial No. 60/091,047 filed Jun. 29, 1998 entitled METHOD AND APPARATUS FOR PROCESSING AND INTERPRETING NATURAL LANGUAGE IN A VOICE ACTIVATED APPLICATION, invented by Eric I Chao Chang and Eric G. Jackson.
FIELD OF THE INVENTION
This invention relates to the field of interpreting natural language. More particularly, this invention relates to a method and apparatus for processing and interpreting natural language which enhances the operation through the use of semantic confidence values to enhance efficiency.
BACKGROUND OF THE INVENTION
Definitions
The following definitions may be helpful in understanding the background of the invention as it relates to the invention and the discussion outlined below.
Confidence: a measure of a degree of certainty that a system has accurately identified input language. In the preferred embodiment, it is a measure of the degree of perceived acoustic similarity between input speech and an acoustic model of the speech.
Phrase: a sequence of words.
Example: "from Boston"
Grammar rule: a specification of a set of phrases, plus meaning of those phrases.
Example: (from [(boston ? massachusetts)(dallas ? texas)])
Generates: "from boston", "from boston massachusetts", "from dallas", "from dallas texas"
Grammar: a set of grammar rules.
Edge: a match located by a parser of a grammar rule against a phrase contained in an input sentence.
Example: From the sentence "I want to fly from Boston to Dallas," a parser could create an edge for the phrase "from Boston" using the grammar rule shown above.
Slot: a predetermined unit of information identified by a natural language interpreter from a portion of the natural language input. For example, from the phrase "from Boston" the natural language interpreter might determine that the origin slot is to be filled with the value "BOS" (the international airport code for Boston).
Parse tree: a set of edges used in constructing a meaning for an entire sentence.
Example: [parse tree diagram; legible node labels: Sentence, Subject ("I"), VerbPhrase, Verb ("want"), complement, InfVerbPhrase ("to", "fly"), Np, Noun, Preposition]
THE BACKGROUND DISCUSSION
Natural language interpreters are well known and used for a variety of applications. One common use is for an automated telephone system. It will be apparent to those of ordinary skill in the art that these techniques can and have been applied to a variety of other uses. For example, one could use such a system to purchase travel tickets, to arrange hotel reservations, to trade stock, to find a telephone number or extension, among many other useful applications.
As an example, consider a system for use in providing information about commercial passenger air flights. A caller to the system might say "I want to fly from Boston to San Francisco, tomorrow." This exemplary system requires three pieces of information to provide information about relevant air flights including the origin city, the destination city and the time of travel. Other systems could require more or less information to complete these tasks depending upon the goals of the system. While the exemplary system also uses a speech recognizer to understand the supplied spoken natural language, it could also receive the natural language via other means such as from typed input, or using handwriting recognition.
Using a predetermined grammar with a set of grammar rules, such a system parses the sentence into edges. Each edge represents a particular needed piece or set of information. The sentence can be represented by a parse tree as shown in the definitions above.
In a parsing operation, the system performs the parsing operation by matching grammar rules to the natural language input. For example, one grammar rule can specify that an origin expression is the word "from" or the phrase "out of" followed by a city name. If the natural language input is "I want to fly from Boston to Dallas," the system will locate the phrase "from Boston" and create a record in its internal data structures that these words match the origin expression grammar rules. This record is sometimes referred to as an edge. Systems look for predetermined grammars within collections of natural language. The system performs the parsing operation in accordance with the grammar as a way of forming/filling the desired edges with information from a natural language input. For example, the natural language interpreter identifies the initial city by seeking any of several origin city words such as <'from', 'starting',
'leaving', 'beginning', ...> related to a city name from a list of cities. If the natural language interpreter finds an origin city word and a city from the list, it will then fill the origin city edge. Similarly, the natural language interpreter identifies the destination city by seeking any of several destination city words such as <'to', 'ending', 'arriving', 'finishing', ...> related to a city name from the list of cities. If the natural language interpreter finds a destination city word and a predefined city, it will then fill the destination city edge. The grammar for the natural language interpreter similarly identifies the desired time of the flight by seeking any of several time words such as <'o'clock', 'morning', 'afternoon', 'a.m.', 'p.m.', 'January', 'February', ..., 'Monday', 'Tuesday', ...> related to a number. Using this technique, the natural language interpreter can interpret spoken utterances if they contain the requisite information, regardless of the ordering of the sentence. Thus, the sentence listed above as "I want to fly from Boston to San Francisco, tomorrow," will provide the same result as the sentence, "Please book a flight to San Francisco, for flying tomorrow, from Boston."
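The order-independent matching described above can be illustrated with a minimal Python sketch. It is not the patent's parser: the regular-expression approach, the function name `fill_city_slots`, the city list, and the airport codes other than "BOS" are assumptions made for illustration, and the time slot is omitted for brevity.

```python
import re

# Hypothetical city list; "BOS" appears in the text, the other codes are
# illustrative assumptions.
CITY_CODES = {"boston": "BOS", "san francisco": "SFO", "dallas": "DFW"}

# Trigger words taken from the examples in the text.
ORIGIN_WORDS = ("from", "starting", "leaving", "beginning")
DEST_WORDS = ("to", "ending", "arriving", "finishing")


def _rule(trigger_words):
    """Build a pattern: one trigger word followed by a known city name."""
    triggers = "|".join(re.escape(w) for w in trigger_words)
    cities = "|".join(re.escape(c) for c in CITY_CODES)
    return re.compile(rf"\b(?:{triggers})\s+({cities})\b")


ORIGIN_RULE = _rule(ORIGIN_WORDS)
DEST_RULE = _rule(DEST_WORDS)


def fill_city_slots(sentence):
    """Return origin/destination codes, or None where no rule matches."""
    text = sentence.lower()
    slots = {}
    for name, rule in (("origin", ORIGIN_RULE), ("destination", DEST_RULE)):
        match = rule.search(text)
        slots[name] = CITY_CODES[match.group(1)] if match else None
    return slots


a = fill_city_slots("I want to fly from Boston to San Francisco, tomorrow.")
b = fill_city_slots(
    "Please book a flight to San Francisco, for flying tomorrow, from Boston."
)
c = fill_city_slots("I want to fly to visit my mother")
```

Under these assumptions, `a` and `b` hold the same origin and destination despite the different word order, while `c` leaves both slots unfilled because "to" is not followed by a city from the list.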
If the natural language interpreter is unable to identify the appropriate words, or a related city name, then the parsing will be terminated as unsuccessful. For example, if the caller says, "I want to fly to visit my mother," the parsing will be unsuccessful. There is no source city word nor source city in the sentence. Further, even though the natural language interpreter finds a destination city word, it cannot find a city name that it recognizes.
For a natural language interpreter used in conjunction with a speech recognition system, the natural language interpreter is provided the speech recognizer's best determination of each word resulting from the recognition operation. A speech recognizer 'listens' to a user's spoken words, determines what those words are and presents those words in a machine format to the natural language interpreter. As part of the recognition operation, each word is provided a word confidence score which represents the confidence associated with each such word that the speech recognizer has for the accuracy of its recognition. Thus, it is generally considered useful to take into account the accent or speech patterns of a wide variety of users. A score is generated and associated with each word in the recognition step. Using the scores for each individual word is not entirely satisfactory because that collection of scores does not relate to the meaning the speaker intends to convey. If a single word has a very low word confidence score, the user may be required to re-enter the request.
In one prior approach, the scores for each of the words are combined into a single composite confidence score for the entire sentence. While this approach solves certain problems associated with using the scores for each word and provides a workable solution, it suffers from several drawbacks.
The composite confidence score described above is weighted by all the words in the entire sentence. In a long sentence, a speaker might use many words that are in essence unrelated to providing the information that the natural language interpreter needs. For example, suppose the speaker says, "Please help me to arrange a flight tomorrow to visit my friend for their birthday celebration leaving from Minneapolis and arriving in Cleveland." In this example, assume that the speaker talks clearly, so that almost every word has a very high confidence score. A loud background noise occurs during the speaking of the words "Minneapolis" and "Cleveland" so that the confidence score for those two words is low. In fact the speech recognizer incorrectly recognizes one of the words. Nevertheless, because the composite confidence score is high for the entire sentence, the recognition for the sentence is accepted. Thus, the natural language interpreter instructs the system to find the wrong flight information.
On the other hand, even if the critical information is all properly recognized, if the composite confidence score is low, the entire sentence is rejected. For example, the speaker says, "I want to fly tomorrow from Chicago to Phoenix." In this example also assume that the speaker talks clearly. The speech recognizer properly identifies the words, "tomorrow from Chicago to Phoenix." A loud background noise occurs during the speaking of the words, "I want to fly." Those words have a low confidence score. Thus, even though the critical information is accurately recognized this sentence is rejected because the composite confidence score for the sentence is low. Because of the operation of prior systems, the information is rejected and the user is required to again provide all the information. This is inconvenient for the user.
As can be seen, use of a composite confidence score for the entire sentence results in an operation of the system which is not optimal. Under one set of conditions described above, the system will attempt to utilize incorrect information in carrying out its task. Under the other set of conditions described, the system will require the user to re-provide all of the information, i.e., repeat the entire sentence. One scenario provides an incorrect result, the other is inconvenient to the user.
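The two failure modes of a sentence-level composite score can be shown with a small numeric sketch. The text does not state how the prior approach combines word scores, so the arithmetic mean, the 0.7 acceptance threshold, and every individual word score below are assumptions chosen only to reproduce the two scenarios.

```python
from statistics import mean

ACCEPT_THRESHOLD = 0.7  # hypothetical acceptance threshold


def sentence_accepted(word_scores):
    """Accept the whole sentence if the mean word confidence is high enough.

    The mean is an assumed combination rule, not one given in the text.
    """
    composite = mean(score for _, score in word_scores)
    return composite >= ACCEPT_THRESHOLD


# Scenario 1: filler words are clear, but noise covers both city names.
noisy_cities = [
    ("please", 0.95), ("arrange", 0.95), ("a", 0.95), ("flight", 0.95),
    ("tomorrow", 0.95), ("leaving", 0.95), ("from", 0.95),
    ("minneapolis", 0.30), ("and", 0.95), ("arriving", 0.95),
    ("in", 0.95), ("cleveland", 0.30),
]

# Scenario 2: noise covers "I want to fly", but the critical words are clear.
noisy_filler = [
    ("i", 0.20), ("want", 0.20), ("to", 0.20), ("fly", 0.20),
    ("tomorrow", 0.95), ("from", 0.95), ("chicago", 0.95),
    ("to", 0.95), ("phoenix", 0.95),
]

wrong_accept = sentence_accepted(noisy_cities)   # unreliable cities pass
wrong_reject = sentence_accepted(noisy_filler)   # reliable cities fail
```

With these assumed numbers the first sentence is accepted even though both city names are unreliable, and the second is rejected even though every word that carries needed information was recognized with high confidence.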
What is needed is a natural language interpreter for use in conjunction with a speech recognizer which provides more accurate results. What is further needed is a natural language interpreter for use in conjunction with a speech recognizer which does not require a user to re-enter information that was correctly received.
SUMMARY OF THE INVENTION
According to the present invention, a stream of input speech is coupled as an input to a speech recognizer. The speech can be provided to the speech recognizer directly from a user or first stored and provided from a memory circuit. Each input word is recognized by the speech recognizer and a word confidence score is associated with each corresponding recognized word. The recognized words and their associated word confidence scores are provided to a natural language interpreter which parses the stream of recognized words into predetermined edges. From the edges, the natural language interpreter forms semantic slots which represent a semantic meaning. A slot confidence score is determined for each slot. The slot confidence score is used as a basis for determining the confidence to be placed into a particular predetermined semantic meaning. Based upon the slot confidence score, an ancillary application program determines whether to accept the words used to fill each slot. If the slot is rejected, the application program can request the user to repeat the information necessary to fill that slot only, rather than requiring the user to repeat the entire stream of input speech.
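A minimal sketch of per-slot acceptance follows. This summary says only that a slot confidence score is determined for each slot; the mean of the word scores, the 0.7 threshold, the function names, and the sample values are assumptions made for illustration.

```python
from statistics import mean

SLOT_THRESHOLD = 0.7  # hypothetical per-slot acceptance threshold


def slot_confidence(word_scores):
    """Assumed rule: average the confidences of the words filling the slot."""
    return mean(word_scores)


def review_slots(slots):
    """Split slots into accepted values and names that need a re-prompt.

    slots maps slot name -> (value, list of word confidence scores).
    """
    accepted, reprompt = {}, []
    for name, (value, scores) in slots.items():
        if slot_confidence(scores) >= SLOT_THRESHOLD:
            accepted[name] = value
        else:
            reprompt.append(name)
    return accepted, reprompt


# "from Chicago", "to Phoenix" (noise over the city name), "tomorrow"
slots = {
    "origin": ("Chicago", [0.95, 0.95]),
    "destination": ("Phoenix", [0.90, 0.25]),
    "time": ("tomorrow", [0.95]),
}
accepted, reprompt = review_slots(slots)
```

Here only the destination slot falls below the threshold, so an application built this way would keep the origin and time and ask the caller to repeat just the destination.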
While the invention is described in terms of edges and slot confidence scores, it will be apparent to one of ordinary skill in the art that the essence of using slots is to provide a confidence score for a particular meaning. There are a variety of ways that this can be achieved. Nevertheless, the remainder of this patent document will discuss the invention in terms of slot confidence scores. It will be understood that this could be interpreted to mean any semantic meaning.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a flow chart of the method of the preferred embodiment.