WO2017107518A1 - Method and apparatus for parsing voice content - Google Patents


Info

Publication number
WO2017107518A1
WO2017107518A1 · PCT/CN2016/096186
Authority
WO
WIPO (PCT)
Prior art keywords
phrase
word
corpus
probability
voice content
Prior art date
Application number
PCT/CN2016/096186
Other languages
French (fr)
Chinese (zh)
Inventor
周蕾蕾
Original Assignee
乐视控股(北京)有限公司
乐视致新电子科技(天津)有限公司
Priority date
Filing date
Publication date
Application filed by 乐视控股(北京)有限公司 and 乐视致新电子科技(天津)有限公司
Publication of WO2017107518A1 publication Critical patent/WO2017107518A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/216 Parsing using statistical methods
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G06F 40/253 Grammatical analysis; Style critique

Definitions

  • The present application relates to the field of information processing, and in particular to a method and apparatus for parsing voice content.
  • Natural language processing technology helps people communicate better with machines.
  • The voice recognition module in a computer recognizes the voice content sent by the user and parses it to obtain the semantics corresponding to the voice content.
  • The computer then performs related operations based on the parsed semantics.
  • In general, the machine parses the voice content sent by the user in three steps. Step 1: establish a language model. Before the model is built, some commonly used corpora must be annotated manually; for example, for the user input "I want to see Andy Lau's concert", "I" is marked as a personal pronoun, "Andy Lau" is marked as a star name, and so on. The words in the corpus are then classified according to the annotations (personal pronouns form one class, star names another), and completing this classification of phrases completes the establishment of the language model.
  • Step 2: cut the voice content input by the user into words according to the phrases in the established language model, usually using a CRF (Conditional Random Field) approach. For example, a corpus such as "what time is Andy Lau's concert" can be cut into "what / time / have / Andy Lau / concert" or into "what / time / have / Andy Lau / singing / meeting", because the language model contains both the phrase "singing" and the phrase "concert". In this case the probabilities with which the two phrases appear in the corpus must be compared; if, for example, "singing" appears in the corpus with a higher probability than "concert", the corpus above is preferentially cut into "what / time / have / Andy Lau / singing / meeting". Step 3: match the cut phrases against the grammar files in the machine to resolve the semantics of the user's voice content; BNF (Backus-Naur Form) is a frequently used grammar.
  • This exposes the problem: if "singing" has a greater probability of appearing in the corpus than "concert", the corpus above is preferentially cut into "what / time / have / Andy Lau / singing / meeting", which obviously does not match the semantics of the voice content sent by the user.
  • The embodiment of the present application provides a method and apparatus for parsing voice content, which are used to solve the problem that the machine incorrectly parses user-input voice content because the number of corpora in a specific domain is small when the language model is established.
  • An embodiment of the present application provides a method for parsing voice content, the method comprising: combining phrases in a specific domain with phrases in a non-specific domain to generate a first word-cut dictionary, and performing word-cutting on the corpus stored in the machine according to the first word-cut dictionary to obtain the phrases in the corpus; counting the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and adjusting the probability or frequency according to a predetermined rule so that the probability or frequency of occurrence of specific-domain phrases among the phrases in the corpus increases; combining the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-cut dictionary, performing word-cutting on the voice content sent by the user according to the second word-cut dictionary to obtain the phrases in the voice content, and parsing the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
  • The combining of phrases in the specific domain with phrases in the non-specific domain to generate the first word-cut dictionary comprises:
  • selecting a preset number of phrases from the specific-domain phrases in the corpus, and combining the selected phrases with the phrases in the non-specific domain to generate the first word-cut dictionary.
  • The word-cutting of the voice content sent by the user according to the second word-cut dictionary includes:
  • cutting the voice content sent by the user using both the backward-maximum and the forward-minimum word-cutting methods; if the phrases obtained by the two methods differ, looking up the probabilities or frequencies of the differing phrases in the second word-cut dictionary, and selecting the phrases with the larger probability or frequency as the final word cut.
  • The second word-cut dictionary includes:
  • an address area, which guides the machine to the position, in the second word-cut dictionary, of a phrase in the word-cut voice content sent by the user; and
  • a phrase area, which stores the phrases corresponding to the address area.
  • The parsing of the phrases in the voice content according to the grammar file specifically includes:
  • The keyword matching specifically includes:
  • The phrases of the specific domain comprise at least one of the following:
  • An apparatus for parsing voice content comprising: a combining unit, a statistic unit, a word cutting unit, and a parsing unit;
  • The combining unit is configured to combine phrases in a specific domain with phrases in a non-specific domain to generate a first word-cut dictionary, and to perform word-cutting on the corpus stored in the machine according to the first word-cut dictionary to obtain the phrases in the corpus;
  • the statistical unit is configured to count the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and to adjust the probability or frequency according to a predetermined rule so that the probability or frequency of occurrence of specific-domain phrases among the phrases in the corpus increases;
  • the word-cutting unit is configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-cut dictionary, and to perform word-cutting on the voice content sent by the user according to the second word-cut dictionary to obtain the phrases in the voice content;
  • the parsing unit is configured to parse the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
  • The combining unit includes a word-cutting subunit, a statistical subunit, and a combining subunit, wherein:
  • the word-cutting subunit is configured to perform word-cutting on the corpus stored by the machine according to the phrases of a specific domain, to obtain the specific-domain phrases in the corpus;
  • the statistical subunit is configured to count the probability or frequency with which each specific-domain phrase occurs among the specific-domain phrases in the corpus;
  • the combining subunit is configured to select a preset number of phrases from the specific-domain phrases in the corpus according to the ranking of the probabilities or frequencies, and to combine the selected phrases with phrases in a non-specific domain to generate the first word-cut dictionary.
  • The word-cutting unit comprises:
  • a combining subunit configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate the second word-cut dictionary;
  • a word-cutting subunit configured to perform word-cutting on the voice content sent by the user according to the second word-cut dictionary, using the backward-maximum and forward-minimum word-cutting methods; and
  • a finding subunit configured, when the phrases obtained by the two word-cutting methods differ, to look up the probabilities or frequencies of the differing phrases in the second word-cut dictionary, and to select the phrases with the larger probability or frequency as the final word cut.
  • the embodiment of the present application provides an electronic device, including the device for parsing voice content according to any of the foregoing embodiments.
  • The embodiment of the present application provides a non-transitory computer readable storage medium, which can store computer instructions that implement some or all of the steps in the various implementations of the method for parsing voice content provided by the embodiments of the present application.
  • An embodiment of the present application provides an electronic device, including: one or more processors; and a memory, wherein the memory stores instructions executable by the one or more processors, the instructions being configured to perform the method for parsing voice content of any of the above-described embodiments of the present application.
  • An embodiment of the present application provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method for parsing voice content according to any of the above embodiments of the present application.
  • In the embodiments of the present application, the probability or frequency with which each specific-domain phrase occurs among all phrases in the corpus stored in the machine is increased, thereby improving the accuracy with which the machine parses the semantics of the user's voice content.
  • FIG. 1 is a schematic flowchart of a method for parsing voice content according to Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of a language model adaptation according to Embodiment 1 of the present application;
  • FIG. 3 is a schematic diagram of an address area portion in a second word dictionary provided by Embodiment 1 of the present application;
  • FIG. 4 is a schematic diagram of a portion of a phrase region in a second word dictionary provided by Embodiment 1 of the present application;
  • FIG. 5 is a schematic flowchart of a method for cutting a user voice content by using a joint manner of a backward maximum cut word and a forward minimum cut word according to Embodiment 1 of the present application;
  • FIG. 6 is a schematic diagram of a syntax prepared by using a syntax tree according to Embodiment 1 of the present application.
  • FIG. 7 is a schematic flowchart of a method for matching voice content sent by a user according to a grammar file according to Embodiment 1 of the present application;
  • FIG. 8 is a schematic flowchart of a complete method for parsing voice content according to Embodiment 1 of the present application.
  • FIG. 9 is a schematic structural diagram of an apparatus for analyzing voice content according to Embodiment 2 of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The embodiment of the present application provides a method and apparatus for parsing voice content, which are used to solve the problem that the machine incorrectly parses user-input voice content because there are few domain-specific corpora when the language model is built.
  • FIG. 1 is a schematic flowchart of a method for parsing voice content according to an embodiment of the present application. The method is as follows:
  • Step 11: Combine the phrases in the specific domain with the phrases in the non-specific domain to generate a first word-cut dictionary, and perform word-cutting on the corpus stored in the machine according to the first word-cut dictionary to obtain the phrases in the corpus.
  • First, dictionaries of specific domains are screened, and these specific-domain dictionaries are combined to generate a full dictionary; for example, phrases in fields such as computing, machinery, and entertainment are combined into specific-domain dictionaries, and the selected specific-domain dictionaries together serve as the full dictionary. Then CRF word-cutting is performed on the corpus stored in the machine according to the phrases in the full dictionary (step 21 of FIG. 2), obtaining the specific-domain phrases in the corpus. Next, the probability or frequency with which each specific-domain phrase occurs among all the specific-domain phrases in the corpus is counted, and, according to the ranking of probability or frequency, a preset number of phrases is selected as a dynamic dictionary (step 22 of FIG. 2).
  • Phrases in a non-specific domain can include personal pronouns, such as you, me, and him; they can also include common verbs, such as playing, thinking, wanting, and taking.
  • The phrases here include both phrases in a specific domain and phrases in a non-specific domain.
  • For example, the corpus stored in the machine is "I want to see Andy Lau's concert", and CRF word-cutting is performed on the corpus according to the first word-cut dictionary.
  • Suppose the phrases in the first word-cut dictionary are: I, think, see, want to see, Andy Lau, concert. According to these phrases, the corpus can be cut into "I / think / see / Andy Lau / concert" or into "I / want to see / Andy Lau / concert", so the probabilities or frequencies of the phrases "think" and "want to see" in the corpus must be compared. If the probability or frequency of the latter is greater, the phrases "I, want to see, Andy Lau, concert" serve as the training corpus of the language model.
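  • The comparison between candidate cuts can be sketched as follows; this is an illustrative Python sketch, not the patent's implementation, and the phrase probabilities are invented for the example.

```python
def pick_segmentation(candidates, phrase_prob):
    """Return the candidate segmentation whose phrases are jointly most probable."""
    def score(seg):
        p = 1.0
        for phrase in seg:
            p *= phrase_prob.get(phrase, 1e-9)  # tiny floor for unseen phrases
        return p
    return max(candidates, key=score)

# Illustrative probabilities (not from the patent):
phrase_prob = {"I": 0.05, "think": 0.02, "see": 0.03,
               "want to see": 0.04, "Andy Lau": 0.01, "concert": 0.01}
candidates = [
    ["I", "think", "see", "Andy Lau", "concert"],
    ["I", "want to see", "Andy Lau", "concert"],
]
best = pick_segmentation(candidates, phrase_prob)
print(best)  # ['I', 'want to see', 'Andy Lau', 'concert']
```

  • Because "want to see" is more probable than "think" followed by "see", the second cut wins, matching the example above.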
  • Step 12: Count the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and adjust the probability or frequency according to a predetermined rule so that the probability or frequency of occurrence of specific-domain phrases among the phrases in the corpus increases.
  • In step 11, the training corpus of the language model is obtained, that is, the phrases in all the corpora in the machine are obtained.
  • Next, the language model needs to be trained (step 25 in FIG. 2), which can be done using the SRILM tool.
  • The training of the language model may include, but is not limited to, counting the probability or frequency with which each phrase occurs among all phrases in all corpora of the machine.
  • The SRILM language-model training tool is only an exemplary description; other training methods may also be used, and the method is not specifically limited here.
  • After training the language model, the training results need to be tested, for example by checking the probability of occurrence of each phrase.
  • When checking these probabilities, it may be found that phrases in some specific domains appear often in their own corpora but, relative to similar phrases in non-specific domains, have a low probability of occurrence; as a result, when the relevant corpus is cut, those specific-domain phrases may be drowned out by other similar phrases, causing word-cutting errors so that the machine cannot correctly parse the user's voice content.
  • Therefore the probabilities are redistributed: the probability of occurrence of each non-specific-domain phrase is divided by their sum Psum1 to obtain P1, and the probability of occurrence of each specific-domain phrase is divided by their sum Psum2 to obtain P2; finally, P1 is multiplied by a weight coefficient k1 and P2 by a weight coefficient k2 to obtain the final probability with which each non-specific-domain phrase and each specific-domain phrase, respectively, appears in all corpora. The user can set the values of k1 and k2 according to individual needs, provided that the sum of k1 and k2 is 1; and for the final probability of each phrase in the dynamic dictionary to be greater than the final probability of each phrase in a non-specific domain, the weight coefficient k1 should be smaller than k2.
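  • The redistribution can be sketched in Python as follows; the group probabilities and the choice k2 = 0.7 are illustrative assumptions, not values from the patent.

```python
def reweight(nonspecific, specific, k2=0.7):
    """Renormalize each group by its own sum (Psum1, Psum2), then apply
    weights k1 and k2 with k1 + k2 = 1; choosing k1 < k2 boosts the
    specific-domain (dynamic dictionary) phrases."""
    k1 = 1.0 - k2
    psum1 = sum(nonspecific.values())
    psum2 = sum(specific.values())
    final = {w: k1 * p / psum1 for w, p in nonspecific.items()}   # k1 * P1
    final.update({w: k2 * p / psum2 for w, p in specific.items()})  # k2 * P2
    return final

probs = reweight({"want to see": 0.6, "think": 0.4},
                 {"Andy Lau": 0.5, "concert": 0.5})
print(probs)  # the specific-domain phrases end up with the larger final probability
```

  • Note that the final probabilities still sum to 1, since each group is normalized before the weights k1 + k2 = 1 are applied.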
  • Step 13: Combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-cut dictionary, and perform word-cutting on the voice content sent by the user according to the second word-cut dictionary to obtain the phrases in the voice content.
  • At this point the adaptive process of the language model is complete (step 26 of FIG. 2), and the machine can output the adaptive language model (step 27 of FIG. 2).
  • The adaptive language model contains both the phrases obtained after training and the redistributed probability or frequency corresponding to each phrase. The adaptive language model then needs to be converted into the second word-cut dictionary.
  • The second word-cut dictionary can be structured in many ways; its main purpose is to help the machine cut the voice content sent by the user faster and more accurately.
  • One structure of the second word-cut dictionary is described by way of example: it includes two parts, an address area and a phrase area.
  • The address information in the address area helps the machine find the position of a phrase in the second word-cut dictionary according to the phrases cut from the user's input; the phrases stored in the phrase area are the phrases corresponding to the address area.
  • The address area may include address information corresponding to the 10 Arabic numerals (0 to 9), the 26 uppercase or lowercase letters (A to Z or a to z), and the commonly used Chinese characters.
  • The numbers and letters are in full-width format, and each number or letter itself occupies two bytes.
  • The address information corresponding to each number, letter, or Chinese character occupies four bytes. Assuming the second word-cut dictionary covers 6768 commonly used Chinese characters, the address information for numbers, letters, and Chinese characters occupies (10 + 26 + 6768) * 4 = 27216 bytes in total. If the first address of the dictionary is uniDict, the first address of the phrase area is uniDict + 27216, as shown in FIG. 3.
  • The first address of the phrase area, uniDict + 27216, holds the phrases whose first character is the number "0"; in the address area, the entry corresponding to the letter "A" is at uniDict + 40 and is associated with the phrases beginning with "A".
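  • The offset arithmetic above can be sketched as follows; the layout (digits first, then letters, then Chinese characters, four bytes per entry) follows the description, while the function name is an assumption for illustration.

```python
ENTRY_SIZE = 4                      # four bytes of address info per character
N_DIGITS, N_LETTERS, N_HANZI = 10, 26, 6768

def address_offset(ch):
    """Byte offset of a first-character entry inside the address area."""
    if ch.isdigit():
        return int(ch) * ENTRY_SIZE                          # digits come first
    if "A" <= ch <= "Z":
        return (N_DIGITS + ord(ch) - ord("A")) * ENTRY_SIZE  # then the letters
    raise NotImplementedError("the 6768 Chinese characters follow; "
                              "their ordering is not specified here")

phrase_area = (N_DIGITS + N_LETTERS + N_HANZI) * ENTRY_SIZE
print(phrase_area)          # 27216, so the phrase area starts at uniDict + 27216
print(address_offset("A"))  # 40, matching the uniDict + 40 entry for "A"
```

  • The computed values match the figures in the description: the phrase area begins 27216 bytes past uniDict, and the entry for "A" sits 40 bytes past it.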
  • FIG. 4 is a schematic diagram of the phrase area: the first address corresponding to "0" is uniDict + 27216, and it can be seen that a phrase whose first character is "0" can be "05 mm". If the user wants to find a phrase beginning with "0", the machine looks down from the first address uniDict + 27216 until a guard mark is encountered.
  • The guard mark here indicates that the last phrase with "0" as its first character in the second word-cut dictionary has been reached.
  • The first character of a phrase may be omitted in the phrase area; for example, "05 mm" shown in FIG. 4 is stored in the dictionary as "5 mm".
  • Wordlen indicates the length of the phrase.
  • The second word-cut dictionary can include numbers, letters, and Chinese characters, which improves the accuracy with which the machine parses the semantics of the user's voice content.
  • For example, if the voice content input by the user is "when to play Journey 2", and the word-cut dictionary contains only "Journey to the West" without the number "2", the voice content above may be cut into "what / time / play / Journey to the West / ah", which may lead to machine parsing errors.
  • There are many ways to cut the voice content sent by the user according to the second word-cut dictionary.
  • Words can be cut in the backward-maximum manner, or the forward-minimum manner can be used.
  • With forward matching, for example, the machine first searches for the phrase at the front of the voice content: it first looks up the word "Juvenile" in the word-cut dictionary and finds a corresponding phrase; it then searches the text after "Juvenile", that is, looks up "Baoqing", finds no corresponding phrase in the dictionary, and extends the search by one more character, that is, looks up "Baoqingtian"; the same method is used until the word-cutting of the voice content is complete.
  • After cutting the user's voice content with the combination of backward-maximum and forward-minimum word-cutting described above, if the word-cut results differ, that is, the obtained phrases differ, the final result is determined by comparing the probabilities or frequencies of the differing phrases in the dictionary. As shown in FIG. 5, if the voice content "Juvenile Bao Qingtian broadcasts on the TV" is cut with both backward-maximum and forward-minimum word-cutting, the result of the backward-maximum word-cut is "Juvenile / Bao…".
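  • The combined cutting strategy can be sketched in Python as follows; the dictionary, probabilities, and the averaged-probability tie-break are illustrative assumptions, not the patent's exact procedure.

```python
def forward_min(text, dic, max_len=5):
    """From the left, take the shortest dictionary match, widening on misses."""
    out, i = [], 0
    while i < len(text):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            if text[i:j] in dic:
                out.append(text[i:j]); i = j
                break
        else:
            out.append(text[i]); i += 1   # no match at all: emit one character
    return out

def backward_max(text, dic, max_len=5):
    """From the right, take the longest dictionary match ending at the cursor."""
    out, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - max_len), j):   # longest candidates first
            if text[i:j] in dic:
                out.insert(0, text[i:j]); j = i
                break
        else:
            out.insert(0, text[j - 1]); j -= 1
    return out

def segment(text, dic, prob):
    """If the two cuts disagree, keep the cut whose phrases are more probable."""
    a, b = backward_max(text, dic), forward_min(text, dic)
    if a == b:
        return a
    score = lambda seg: sum(prob.get(w, 0.0) for w in seg) / len(seg)
    return a if score(a) >= score(b) else b

dic = {"少年", "包青天", "青天", "少", "年", "包青"}
prob = {"少年": 0.3, "包青天": 0.4, "少": 0.1, "年": 0.1, "包青": 0.05}
print(segment("少年包青天", dic, prob))  # ['少年', '包青天']
```

  • Here the two methods disagree on "少年包青天" (Juvenile Bao Qingtian), and the backward-maximum cut wins because its phrases carry the larger dictionary probabilities.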
  • Step 14: Parse the phrases in the voice content according to the grammar file to obtain the corresponding semantics.
  • Take the BNF grammar as an example.
  • The basic rules of the BNF grammar include but are not limited to the following:
  • content enclosed in square brackets [ ] is optional, indicating that it can be skipped;
  • &keyword(textFrag,key,defaultValue,showValue): this function is used to extract the keywords of the input text.
  • The function is illustrated with an example.
  • Suppose the function defined in the machine is: &keyword(Beijing
  • If the user's input contains no corresponding value, the value used is the defaultValue of the function.
  • The defaultValue here is "local". The "tomorrow" entered by the user is matched against the keywords in the "&keyword" function and matches "tomorrow" in the function; because showValue is not defined in that function, the time entered by the user is kept directly as "tomorrow". Finally, the "rain" input by the user is matched against the keywords in "&keyword" (rain, snow, weather, …); "rain" matches successfully, and because this function defines showValue, with the value "weather", the "rain" input by the user is replaced with "weather".
  • Thus the machine resolves the "when it rains tomorrow" input by the user into "local tomorrow weather" and performs the related operations.
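  • The &keyword behaviour walked through above can be sketched as follows; this is a hypothetical Python analogue (the function name and keyword sets are illustrative), not the grammar engine itself.

```python
def keyword(keys, token, default_value=None, show_value=None):
    """Mimic &keyword: return show_value on a match when one is defined,
    the token itself on a plain match, and default_value when the slot
    is absent from the user's input."""
    if token is None:
        return default_value          # slot missing: fall back to defaultValue
    if token in keys:
        return show_value if show_value is not None else token
    return None                       # no keyword matched

place = keyword({"Beijing", "Shanghai"}, None, default_value="local")
time_ = keyword({"today", "tomorrow"}, "tomorrow")
cond  = keyword({"rain", "snow", "weather"}, "rain", show_value="weather")
print(place, time_, cond)  # local tomorrow weather
```

  • The three calls reproduce the example: a missing place becomes "local", "tomorrow" passes through unchanged, and "rain" is rewritten to the showValue "weather".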
  • The order in which the content input by the user is matched in the above example is only an exemplary description; the matching order is not specifically limited here.
  • For the words "tomorrow" and "rain" input by the user, "tomorrow" can be matched first, or "rain" can be matched first, or both words can be matched at the same time.
  • &duplicate(TextFrag,least,most): this function indicates that TextFrag is repeated m times, where least ≤ m ≤ most. For example, for the definition &duplicate(TextFrag,1,3), the output content is: TextFrag[TextFrag][TextFrag];
  • &comb(textFrag1, textFrag2, …, textFragN): this function indicates that the grammar fragments textFrag1, textFrag2, …, textFragN are permuted and combined. For example, for the definition &comb(TextFrag1, TextFrag2), the output content is: (TextFrag1TextFrag2)
  • Next, parsing with a grammar file is illustrated: the grammar file is named "video on demand", and the grammar file has three keywords: type, movie, and year. Specifically, for the text content "play the 2002 film Infernal Affairs", the defined grammar file can be:
  • <category list> &keyword(movie
  • Each grammar is written in the form of a grammar tree, and finally the grammar file is written in the form of a "grammar forest".
  • The syntax tree so written is shown in FIG. 6: the first level of the syntax tree displays the file name "video on demand"; the second level has four parts: the first part is "play", the second part is "year", the third part is "of", and the fourth part is "film list" and "category list", where the "film list" can be a movie or a TV show.
  • The machine can match the voice content sent by the user according to the grammar file; there are two matching modes: full matching and keyword matching.
  • The specific matching process is shown in FIG. 7: first, the voice content input by the user is fully matched against the grammar file (step 71 in FIG. 7), where the voice content is the content after word-cutting; the matching result is judged (step 72 in FIG. 7); if the full match succeeds, the matching result is printed (step 73 in FIG. 7); if the full match fails, keyword matching is performed (step 74 in FIG. 7), which specifically means searching the keyword list in the grammar file for the corresponding keywords and, if the match succeeds, printing the matching result.
  • For example, the voice content input by the user is "I want to play the 2002 Infernal Affairs movie".
  • The machine converts the voice content into the corresponding text content and cuts it into words; the word-cut result is "I want / play / 2002 / movie / Infernal Affairs".
  • "I want" is not covered by any word in the grammar file, so the full match fails; keyword matching then proceeds as follows:
  • in keyword matching, it suffices that the keywords in the input text match the keywords in the keyword list of the grammar file. Compared with full matching, keyword matching is more flexible and places fewer constraints on the input text, which improves the probability of a successful match.
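  • The two-stage matching flow can be sketched as follows; the grammar phrases and keyword list are invented for the "Infernal Affairs" example above, and real grammar matching would also consult the tree structure.

```python
def full_match(phrases, grammar_phrases):
    """Full matching: every word-cut phrase must be covered by the grammar."""
    return all(p in grammar_phrases for p in phrases)

def parse(phrases, grammar_phrases, keyword_list):
    """Try a full match first; fall back to collecting keyword hits."""
    if full_match(phrases, grammar_phrases):
        return "full", list(phrases)
    hits = [p for p in phrases if p in keyword_list]
    return ("keyword", hits) if hits else ("fail", [])

grammar = {"play", "2002", "of", "movie", "Infernal Affairs"}
keywords = {"2002", "movie", "Infernal Affairs"}
cut = ["I want", "play", "2002", "movie", "Infernal Affairs"]
print(parse(cut, grammar, keywords))
# "I want" is not covered, so the full match fails and keyword hits are returned
```

  • The fallback is what makes keyword matching more tolerant: the uncovered "I want" sinks the full match but does not prevent the keywords from being extracted.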
  • When writing a grammar file, the grammar should be as comprehensive as possible. Example sentences can be written into the grammar rules; the specific process is: first design the user scenario; then write the example sentences; finally, cover the example sentences with the written grammar.
  • The keywords should be clear, which makes it convenient for the machine to perform keyword matching.
  • For example, suppose a grammar fragment in the grammar file is "[today][of][Guangzhou][weather]", with every phrase optional.
  • This grammar fragment can also cover text content such as "of weather"; obviously, that text does not conform to human language habits, and such serious overgeneration reduces the advantages of the grammar-file structure.
  • To reduce overgeneration, the grammar file can be split into several sub-entries.
  • For the grammar fragment described above, it can be written as: the first-level sub-entry "[today][of]<Guangzhou>[weather]", the second-level sub-entry "[today]<Guangzhou>[of][weather]", and the third-level sub-entry "[today][of][Guangzhou]<weather>", so that overgeneration in the grammar file can be reduced.
  • The phrases in the grammar file should be as close as possible to the phrases in the word-cut dictionary, which lets the machine parse the user's voice content more accurately. For example, "I want to know" can be cut into "I want / know" according to the word-cut dictionary, and the phrases in the grammar file should be consistent with that, i.e. "[I want]<know>" rather than "[I][want]<know>", and so on.
  • For example, the voice content sent by the user is "I want to make a call".
  • The machine may cut the voice content into "I want / play / telephone"; although the machine's word cut is wrong here, the grammar file should still parse it according to the "make a call" reading, which can reduce parsing errors caused by word-cutting errors.
  • When the grammar file is written as a syntax tree, at least one mandatory option should be included under the root node; otherwise the input text can be overcovered by the grammar, causing the machine to parse incorrectly.
  • For example, if a grammar fragment in the grammar file is "[today][of][Guangzhou][weather]", since the phrases in the fragment are all optional, user input such as "Today's Shanghai weather" can also match the phrases in the grammar file; obviously, this leads to machine parsing errors.
  • Below, the complete flow of the method for parsing voice content is described.
  • The complete flow is shown in FIG. 8. Step 1: the adaptive process of the language model (step 81 in FIG. 8); specifically, the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus is adjusted so that the probability or frequency of occurrence of specific-domain phrases among the phrases in the machine's corpus increases. Step 2: the voice content sent by the user is cut into words according to the word-cut dictionary (step 82 in FIG. 8). Step 3: the word-cut voice content is fully matched according to the grammar file (step 83 in FIG. 8), at which time the machine judges whether the full match succeeds (step 84 in FIG. 8);
  • if the match succeeds, the matching result is printed (step 85 in FIG. 8), where the grammar file can be in the form of a syntax tree. Step 4: if the full match fails, keyword matching is performed (step 86 in FIG. 8), and the matching result is printed after the keyword matching succeeds.
  • The process of completing the matching is the process by which the machine parses the user's voice content.
  • The word-cut dictionary in the embodiment of the present application includes an address area and a phrase area, and the phrase area is partitioned by first character, so the machine can quickly find the position of the corresponding phrase in the word-cut dictionary.
  • The phrases in the phrase area contain numbers, letters, and Chinese characters, increasing the accuracy with which the machine resolves the semantics of the user's voice content.
  • The embodiment of the present application extends the existing BNF grammar rules and provides writing techniques for grammar rules, improving the readability of the grammar file and the accuracy with which the machine parses the semantics of the user's voice content.
  • the non-transitory computer readable storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
• A method for parsing voice content is provided in Embodiment 1.
• The embodiments of the present application further provide an apparatus for parsing voice content, used to improve the accuracy with which the machine parses the semantics of a user's voice content.
  • An apparatus for parsing voice content comprising: a combining unit 91, a statistic unit 92, a word cutting unit 93, and a parsing unit 94;
• The combining unit 91 may be configured to combine phrases in a specific domain with phrases in non-specific domains to generate a first word-segmentation dictionary, and to segment the corpus stored in the machine according to the first word-segmentation dictionary to obtain the phrases in the corpus;
• The statistics unit 92 may be configured to count the probability or frequency with which each phrase of the corpus occurs among the phrases of the corpus, and to adjust the probability or frequency according to a predetermined rule so that the probability or frequency with which domain-specific phrases occur among the phrases of the corpus increases;
• The word-segmentation unit 93 may be configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and to segment the voice content sent by the user according to the second word-segmentation dictionary to obtain the phrases in the voice content;
  • the parsing unit 94 is configured to parse the phrases in the voice content according to the grammar file to obtain corresponding semantics.
• The working process of the above apparatus embodiment is as follows. Step 1: the combining unit 91 combines phrases in a specific domain with phrases in non-specific domains to generate a first word-segmentation dictionary, and segments the corpus stored in the machine according to the first word-segmentation dictionary to obtain the phrases in the corpus. Step 2: the statistics unit 92 counts the probability or frequency with which each phrase of the corpus occurs among the phrases of the corpus, and adjusts the probability or frequency according to a predetermined rule so that the probability or frequency with which domain-specific phrases occur increases. Step 3: the word-segmentation unit 93 combines the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and segments the voice content sent by the user according to the second word-segmentation dictionary to obtain the phrases in the voice content. Step 4: the parsing unit 94 parses the phrases in the voice content according to the grammar file to obtain the corresponding semantics.
• The combining unit 91 includes a word-segmentation subunit, a statistics subunit, and a combining subunit, where:
• the word-segmentation subunit may be configured to segment the corpus stored by the machine according to domain-specific phrases to obtain the domain-specific phrases in the corpus, where the domain-specific phrases may be obtained by manual annotation according to the prior art;
• the statistics subunit may be configured to count the probability or frequency with which each domain-specific phrase of the corpus occurs among the domain-specific phrases of the corpus;
• the combining subunit may be configured to select a preset number of phrases from the domain-specific phrases of the corpus according to the ranking of the probability or frequency, and to combine the selected phrases with phrases in non-specific domains to generate the first word-segmentation dictionary. Selecting the domain-specific phrases with the highest probability or frequency, so that the first dictionary is built from phrases that frequently occur in the corpus, can improve the efficiency of machine word segmentation.
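The selection step performed by the combining subunit — count domain-specific phrases in the segmented corpus, keep the most frequent ones, and merge them with non-specific phrases — can be sketched as follows. The sample data and top-N value are illustrative (the patent's own example uses the top 50,000 phrases):

```python
# Sketch of the combining subunit: count domain-specific phrases found by
# segmenting the corpus, keep the top-N by frequency, and merge them with
# non-specific phrases into the first word-segmentation dictionary.
# Sample phrases and N are illustrative assumptions.
from collections import Counter

def build_first_dictionary(domain_phrases_in_corpus, general_phrases, top_n):
    counts = Counter(domain_phrases_in_corpus)
    selected = {p for p, _ in counts.most_common(top_n)}  # most frequent N
    return selected | set(general_phrases)

# Domain-specific phrases as they occurred in the segmented corpus:
domain_hits = ["演唱会", "演唱会", "刘德华", "演唱会", "刘德华", "电影票"]
first_dict = build_first_dictionary(domain_hits, ["我", "想", "看", "的"], top_n=2)
print(sorted(first_dict))
```

With `top_n=2`, only the two most frequent domain phrases (演唱会, 刘德华) survive; the rarer 电影票 is dropped, and the non-specific phrases are merged in unchanged.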
• The word-segmentation unit 93 includes a combining subunit, a word-segmentation subunit, and a search subunit, where:
• the combining subunit combines the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary; adjusting the probability or frequency with which each phrase of the corpus occurs increases the probability or frequency of domain-specific phrases among the phrases in the machine's corpus, thereby increasing the accuracy with which the machine parses the semantics of the user's voice content;
• the word-segmentation subunit may be configured to segment the voice content sent by the user according to the second word-segmentation dictionary using backward maximum matching and forward minimum matching;
• the search subunit may be configured to look up, when the phrases obtained by the two segmentation methods differ, the probabilities or frequencies of the differing phrases in the second word-segmentation dictionary, and to select the phrase with the larger probability or frequency as the final segmentation result.
• The above word-segmentation subunit and search subunit segment the user's voice content with a method that combines backward maximum matching and forward minimum matching, making the segmentation result more accurate.
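The combined segmentation strategy described above can be sketched as follows. This is a toy illustration: the dictionary and its frequencies are invented, and summing the frequencies of the differing phrases is a generalization of the patent's rule of picking the phrase with the larger probability or frequency when the two methods disagree:

```python
# Toy dictionary with assumed frequencies (not from the patent).
DICT = {"什么": 5, "时候": 5, "有": 9, "刘德华": 8, "的": 9,
        "演唱": 2, "会": 4, "演唱会": 7}

def backward_max(text):
    """Backward maximum matching: longest dictionary suffix at each step."""
    out, j = [], len(text)
    while j > 0:
        for i in range(0, j):                  # smallest i => longest piece
            if text[i:j] in DICT:
                out.append(text[i:j])
                j = i
                break
        else:                                  # unknown: emit single char
            out.append(text[j - 1:j])
            j -= 1
    return out[::-1]

def forward_min(text):
    """Forward minimum matching: shortest dictionary prefix at each step."""
    out, i = [], 0
    while i < len(text):
        for j in range(i + 1, len(text) + 1):  # smallest j => shortest piece
            if text[i:j] in DICT:
                out.append(text[i:j])
                i = j
                break
        else:
            out.append(text[i:i + 1])
            i += 1
    return out

def segment(text):
    b, f = backward_max(text), forward_min(text)
    if b == f:
        return b
    # Where the two results disagree, keep the result whose differing
    # phrases have the larger total dictionary frequency.
    common = set(b) & set(f)
    score = lambda seg: sum(DICT.get(p, 0) for p in seg if p not in common)
    return b if score(b) >= score(f) else f

print(segment("什么时候有刘德华的演唱会"))
# ['什么', '时候', '有', '刘德华', '的', '演唱会']
```

Here backward maximum matching yields …的/演唱会 while forward minimum matching yields …的/演唱/会; because 演唱会 carries a larger frequency in the toy dictionary, the backward result is kept, matching the patent's intended outcome.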
  • an electronic device including the device for parsing voice content according to any of the foregoing embodiments.
• A non-transitory computer-readable storage medium is also provided, storing computer-executable instructions that can perform the method of parsing voice content in any of the above method embodiments.
• FIG. 10 is a schematic diagram of the hardware structure of an electronic device for performing the method of parsing voice content according to an embodiment of the present application. As shown in FIG. 10, the device includes one or more processors 1010 and a memory 1020; one processor 1010 is taken as an example in FIG. 10.
  • the apparatus that performs the method of parsing the voice content may further include: an input device 1030 and an output device 1040.
  • the processor 1010, the memory 1020, the input device 1030, and the output device 1040 may be connected by a bus or other means, as exemplified by a bus connection in FIG.
• The memory 1020, as a non-transitory computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method of parsing voice content in the embodiments of the present application (for example, the combining unit 91, the statistics unit 92, the word-segmentation unit 93, and the parsing unit 94 shown in FIG. 9).
• The processor 1010 runs the non-volatile software programs, instructions, and modules stored in the memory 1020 to execute the various functional applications and data processing of the server, thereby implementing the method of parsing voice content of the above method embodiments.
• The memory 1020 may include a storage program area and a storage data area, where the storage program area may store an operating system and an application required for at least one function, and the storage data area may store data created according to the use of the device that parses the voice content, and the like.
  • memory 1020 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • memory 1020 can optionally include memory remotely disposed relative to processor 1010, which can be connected to a device that parses the voice content over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Input device 1030 can receive input numeric or character information, as well as generate key signal inputs related to user settings and function control of the device that parses the voice content.
  • the output device 1040 can include a display device such as a display screen.
  • the one or more modules are stored in the memory 1020, and when executed by the one or more processors 1010, perform the method of parsing voice content in any of the above method embodiments.
  • the electronic device of the embodiment of the present application exists in various forms, including but not limited to:
• Mobile communication devices: these devices are characterized by mobile communication functions and are mainly aimed at providing voice and data communication.
  • Such terminals include: smart phones (such as iPhone), multimedia phones, functional phones, and low-end phones.
• Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access.
  • Such terminals include: PDAs, MIDs, and UMPC devices, such as the iPad.
• Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
• Servers: a server consists of a processor, a hard disk, memory, a system bus, and so on. A server is similar in architecture to a general-purpose computer, but because it must provide highly reliable services, it has higher requirements in terms of processing power, stability, reliability, security, scalability, and manageability.
• The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement them without creative effort.

Abstract

A method and apparatus for parsing a voice content. The method comprises: generating a first word segmentation dictionary by combining a word group in a specified field with a word group in a non-specified field, and performing word segmentation on a corpus stored in a machine according to the first word segmentation dictionary to obtain a word group in the corpus (11); making statistics, in the corpus, on the probability or frequency of occurrence of each word group in the word group in the corpus, and adjusting the probability or frequency according to a pre-determined rule so that the probability or frequency of occurrence of the word group in the specified field in the word group in the corpus increases (12); generating a second word segmentation dictionary by combining the word group in the corpus with the adjusted probability or frequency, and performing word segmentation on a voice content sent by a user according to the second word segmentation dictionary to obtain a word group in the voice content (13); and parsing the word group in the voice content according to a grammar file to obtain a corresponding semanteme (14). By means of the method, the probability of occurrence of a word group in a specified field in all word groups in a machine increases, thereby improving the accuracy rate of the machine parsing a semanteme of a voice content.

Description

Method and Apparatus for Parsing Voice Content
This application claims priority to Chinese Patent Application No. 201510995231.5, filed with the Chinese Patent Office on December 25, 2015 and entitled "Method and Apparatus for Parsing Voice Content", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of information processing, and in particular, to a method and apparatus for parsing voice content.
Background
Natural language processing technology can help people and machines communicate better. For example, a speech recognition module in a computer recognizes the voice content uttered by a user, parses the voice content to obtain the corresponding semantics, and the computer finally performs related operations according to the parsed semantics.
At present, the general method by which a machine parses voice content sent by a user is as follows. Step 1: build a language model. Before the model is built, some commonly used corpora usually need to be annotated manually; for example, in the sentence "我想看刘德华的演唱会" ("I want to see Andy Lau's concert"), "我" ("I") may be tagged as a personal pronoun and "刘德华" ("Andy Lau") as a star name. The phrases in the corpus are then classified according to the tags, e.g., personal pronouns as one class and star names as another; completing this classification completes the building of the language model. Step 2: segment the voice content input by the user according to the phrases in the language model, usually with the CRF (Conditional Random Field) segmentation method. For example, if the user's input is "什么时候有刘德华的演唱会" ("When is there a concert of Andy Lau"), the computer segments this sentence according to the phrases in the model. If the star-name category contains "刘德华", the verb category contains "演唱" ("sing"), and the noun category contains "时候" ("time") and "演唱会" ("concert"), the sentence may be segmented as 什么/时候/有/刘德华/的/演唱会 or as 什么/时候/有/刘德华/的/演唱/会. Because the model contains both the phrases "演唱" and "演唱会", the probabilities with which the two occur in the corpus must be compared; if, for example, "演唱" occurs with a higher probability than "演唱会", the sentence is preferentially segmented as 什么/时候/有/刘德华/的/演唱/会. Step 3: match the segmented phrases against the grammar file in the machine to parse the semantics of the user's voice content, where BNF (Backus-Naur Form) is a commonly used grammar.
As information continuously develops and is updated, the number of phrases in certain specific domains gradually increases, but the machine's corpus containing these domain-specific phrases is limited. Therefore, when the language model is built, the probability with which certain domain-specific phrases occur among all phrases of the model may be relatively small. When the machine segments the user's voice content according to the language model, it may segment it incorrectly because of these small probabilities, causing the machine to misparse the user's input. For example, in the case above, if "演唱" occurs in the corpus with a higher probability than "演唱会", the sentence is preferentially segmented as 什么/时候/有/刘德华/的/演唱/会, which obviously does not match the semantics of the voice content sent by the user.
Summary
In view of the above problems, the embodiments of the present application provide a method and apparatus for parsing voice content, used to solve the problem that the machine misparses the voice content input by the user because the corpus of a specific domain is small when the language model is built.
An embodiment of the present application provides a method for parsing voice content, the method including: combining phrases in a specific domain with phrases in non-specific domains to generate a first word-segmentation dictionary, and segmenting the corpus stored in the machine according to the first word-segmentation dictionary to obtain the phrases in the corpus; counting the probability or frequency with which each phrase of the corpus occurs among the phrases of the corpus, and adjusting the probability or frequency according to a predetermined rule so that the probability or frequency with which domain-specific phrases occur increases; combining the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and segmenting the voice content sent by the user according to the second word-segmentation dictionary to obtain the phrases in the voice content; and parsing the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
Preferably, combining phrases in a specific domain with phrases in non-specific domains to generate the first word-segmentation dictionary specifically includes:
segmenting the corpus stored by the machine according to domain-specific phrases to obtain the domain-specific phrases in the corpus;
counting the probability or frequency with which each domain-specific phrase of the corpus occurs among the domain-specific phrases of the corpus;
according to the ranking of the probability or frequency, selecting a preset number of phrases from the domain-specific phrases of the corpus, and combining the selected phrases with phrases in non-specific domains to generate the first word-segmentation dictionary.
Preferably, segmenting the voice content sent by the user according to the second word-segmentation dictionary specifically includes:
segmenting the voice content sent by the user using backward maximum matching and forward minimum matching according to the second word-segmentation dictionary; if the phrases obtained by the two segmentation methods differ, looking up the probabilities or frequencies of the differing phrases in the second word-segmentation dictionary, and selecting the phrase with the larger probability or frequency as the final segmentation result.
Preferably, the second word-segmentation dictionary includes:
an address area and a phrase area, where:
the address area guides the machine in finding the position, within the second word-segmentation dictionary, of a phrase in the user's segmented voice content;
the phrase area stores the phrases corresponding to the address area.
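The address-area/phrase-area layout described above can be read as an index that maps a phrase's leading character to the region of the phrase table holding all phrases starting with that character. The following is one hypothetical realization of that structure; the patent does not give a concrete on-disk format, so the tuple-based slice encoding here is an assumption:

```python
# Hypothetical layout for the second word-segmentation dictionary's two
# regions: an "address area" mapping each phrase's first character to the
# slice of the "phrase area" holding phrases that start with it. This is
# an illustrative reading of the structure, not the patent's exact format.

def build_dictionary(phrases):
    phrase_area = sorted(phrases)                 # sorting groups by first char
    address_area = {}
    for idx, p in enumerate(phrase_area):
        start, _ = address_area.get(p[0], (idx, idx))
        address_area[p[0]] = (start, idx + 1)     # (start, one-past-end) slice
    return address_area, phrase_area

def lookup(word, address_area, phrase_area):
    start, end = address_area.get(word[0], (0, 0))
    return word in phrase_area[start:end]         # search only one bucket

addr, area = build_dictionary(["演唱", "演唱会", "演员", "刘德华", "时候"])
print(lookup("演唱会", addr, area))   # True
print(lookup("演出", addr, area))     # False
```

The lookup only scans the bucket of phrases sharing the query's first character, which mirrors how the address area lets the machine quickly locate a phrase's position.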
Preferably, parsing the phrases in the voice content according to the grammar file specifically includes:
matching the phrases in the voice content against the phrases in the grammar file; if the phrases in the voice content completely match the phrases in the grammar file, the parsing succeeds; if the full match fails, keyword matching is performed.
Preferably, the keyword matching specifically includes:
matching the phrases in the voice content against the keywords in the grammar file; if the match succeeds, the parsing succeeds; if the match fails, the parsing fails.
Preferably, the domain-specific phrases include at least one of:
Chinese characters; English letters; numbers.
An apparatus for parsing voice content, the apparatus including a combining unit, a statistics unit, a word-segmentation unit, and a parsing unit, where:
the combining unit is configured to combine phrases in a specific domain with phrases in non-specific domains to generate a first word-segmentation dictionary, and to segment the corpus stored in the machine according to the first word-segmentation dictionary to obtain the phrases in the corpus;
the statistics unit is configured to count the probability or frequency with which each phrase of the corpus occurs among the phrases of the corpus, and to adjust the probability or frequency according to a predetermined rule so that the probability or frequency with which domain-specific phrases occur increases;
the word-segmentation unit is configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and to segment the voice content sent by the user according to the second word-segmentation dictionary to obtain the phrases in the voice content;
the parsing unit is configured to parse the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
Preferably, the combining unit includes a word-segmentation subunit, a statistics subunit, and a combining subunit, where:
the word-segmentation subunit is configured to segment the corpus stored by the machine according to domain-specific phrases to obtain the domain-specific phrases in the corpus;
the statistics subunit is configured to count the probability or frequency with which each domain-specific phrase of the corpus occurs among the domain-specific phrases of the corpus;
the combining subunit is configured to select a preset number of phrases from the domain-specific phrases of the corpus according to the ranking of the probability or frequency, and to combine the selected phrases with phrases in non-specific domains to generate the first word-segmentation dictionary.
Preferably, the word-segmentation unit includes a combining subunit, a word-segmentation subunit, and a search subunit, where:
the combining subunit is configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary;
the word-segmentation subunit is configured to segment the voice content sent by the user according to the second word-segmentation dictionary using backward maximum matching and forward minimum matching;
the search subunit is configured to look up, when the phrases obtained by the two segmentation methods differ, the probabilities or frequencies of the differing phrases in the second word-segmentation dictionary, and to select the phrase with the larger probability or frequency as the final segmentation result.
An embodiment of the present application provides an electronic device including the apparatus for parsing voice content according to any of the foregoing embodiments.
An embodiment of the present application provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium can store computer instructions which, when executed, can implement some or all of the steps of the implementations of the method for parsing voice content provided by the embodiments of the present application.
An embodiment of the present application provides an electronic device, including one or more processors and a memory, where the memory stores instructions executable by the one or more processors, the instructions being configured to perform any of the above methods for parsing voice content of the present application.
An embodiment of the present application provides a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions that, when executed by a computer, cause the computer to perform any of the above methods for parsing voice content of the embodiments of the present application.
When the language model is trained according to the embodiments of the present application, the probability or frequency with which each phrase of the corpus stored in the machine occurs among all phrases is adjusted so that the probability or frequency of domain-specific phrases among all phrases increases, thereby improving the accuracy with which the machine parses the semantics of the user's voice content.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for parsing voice content according to Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of language model adaptation according to Embodiment 1 of the present application;
FIG. 3 is a schematic diagram of the address area portion of the second word-segmentation dictionary according to Embodiment 1 of the present application;
FIG. 4 is a schematic diagram of the phrase area portion of the second word-segmentation dictionary according to Embodiment 1 of the present application;
FIG. 5 is a schematic flowchart of segmenting the user's voice content by combining backward maximum matching and forward minimum matching according to Embodiment 1 of the present application;
FIG. 6 is a schematic diagram of a grammar written as a syntax tree according to Embodiment 1 of the present application;
FIG. 7 is a schematic flowchart of matching the voice content sent by the user according to a grammar file according to Embodiment 1 of the present application;
FIG. 8 is a schematic flowchart of the complete method for parsing voice content according to Embodiment 1 of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for parsing voice content according to Embodiment 2 of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Apparently, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
In view of the problems, noted in the Background, that machines currently encounter when parsing the voice content sent by a user, the embodiments of the present application provide a method and apparatus for parsing voice content, used to solve the problem that the machine misparses the voice content input by the user because the corpus of a specific domain is small when the language model is built.
Embodiment 1
This embodiment of the present application provides a method for parsing voice content, used to improve the accuracy with which a machine parses the semantics of a user's voice content. FIG. 1 is a schematic flowchart of the method. The method is as follows:
Step 11: combine phrases in a specific domain with phrases in non-specific domains to generate a first word-segmentation dictionary, and segment the corpus stored in the machine according to the first word-segmentation dictionary to obtain the phrases in the corpus.
In this step, dictionaries of specific domains are first selected and combined into a full dictionary; for example, phrases from fields such as computing, machinery, and entertainment are combined into a domain-specific dictionary, which serves as the full dictionary. The corpus stored in the machine is then CRF-segmented according to the phrases in the full dictionary (step 21 in FIG. 2) to obtain the domain-specific phrases in the corpus. Next, the probability or frequency with which each of these phrases occurs among all domain-specific phrases of the corpus is counted, and a preset number of phrases are selected by ranking as a dynamic dictionary (step 22 in FIG. 2); for example, the 50,000 domain-specific phrases ranked highest by probability may be selected from the segmented corpus to form a dynamic dictionary, which can contain many domain-specific phrases that users frequently use. Finally, the dynamic dictionary is combined with the phrases of non-specific domains (step 23 in FIG. 2) to generate an offline word-segmentation dictionary, i.e., the first word-segmentation dictionary. Here, phrases of non-specific domains are phrases that users frequently use, excluding domain-specific phrases; for example, they may include personal pronouns such as "you", "I", and "he", as well as common verbs such as "hit", "think", "want", and "take".
After the first word-segmentation dictionary is generated, the corpus stored in the machine is segmented according to it (step 24 of FIG. 2), and all phrases in the corpus are obtained and used as the training corpus of the language model. These phrases include both domain-specific and non-specific phrases. There are many ways to segment a corpus; one of them is illustrated here. Suppose a sentence stored in the machine is "我想看刘德华的演唱会" ("I want to watch Andy Lau's concert"), and CRF segmentation is applied to it according to the first word-segmentation dictionary, whose phrases include 我, 想, 看, 想看, 刘德华, 的, and 演唱会. The sentence can then be segmented either as 我/想/看/刘德华/的/演唱会 or as 我/想看/刘德华/的/演唱会. The probabilities or frequencies with which 想 ("want") and 想看 ("want to watch") occur in the corpus are compared; if the latter is greater, the phrases 我, 想看, 刘德华, 的, and 演唱会 are used as the training corpus of the language model.
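As a rough illustration of how the comparison above can decide between candidate segmentations, the sketch below scores each candidate by the total corpus frequency of its phrases. The patent itself uses CRF segmentation; the frequencies here are invented for the example, and total frequency is used as a simple tiebreak.

```python
# Hypothetical phrase frequencies observed in the corpus (invented numbers;
# in the patent these come from counting phrases after CRF segmentation).
FREQ = {"我": 900, "想": 150, "看": 200, "想看": 650,
        "刘德华": 120, "的": 1000, "演唱会": 80}

def score(segmentation):
    """Total frequency of the phrases in one candidate segmentation."""
    return sum(FREQ.get(w, 0) for w in segmentation)

def pick(candidates):
    """Choose the candidate segmentation with the highest total frequency."""
    return max(candidates, key=score)

# The two candidate segmentations discussed in the text.
cand_a = ["我", "想", "看", "刘德华", "的", "演唱会"]
cand_b = ["我", "想看", "刘德华", "的", "演唱会"]
best = pick([cand_a, cand_b])
```

Because 想看 occurs more often than 想 and 看 combined in this toy corpus, the second segmentation wins, matching the choice described above.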
Step 12: Count the probability or frequency with which each phrase in the corpus occurs among the phrases of the corpus, and adjust the probability or frequency according to a predetermined rule so that the probability or frequency of domain-specific phrases among the phrases of the corpus increases.
Step 11 produced the training corpus of the language model, i.e., the phrases of all corpora in the machine. In this step, the language model is trained (step 25 of FIG. 2), for example with the SRILM toolkit. Training may include, but is not limited to, counting the probability or frequency with which each phrase occurs among all phrases of all corpora in the machine. The SRILM language-model training tool is only an example; other training methods may be used, and no specific limitation is made here.
After the language model is trained, the training results need to be checked, for example, the probability of each phrase. When checking these probabilities, it may be found that some domain-specific phrases, although they occur often in the corpus, have a lower probability than certain similar phrases from non-specific domains. When the relevant corpus is segmented, such domain-specific phrases may then be drowned out by similar phrases, causing segmentation errors so that the machine cannot correctly parse the user's voice content. For example, suppose the user speaks "打狗棒" ("dog-beating stick") to the computer, where 打 ("hit") is a non-specific phrase, 打狗棒 is a domain-specific phrase, and the probability of 打 among all phrases of the corpus is greater than that of 打狗棒. The computer will then segment 打狗棒 into 打/狗棒 and fail to parse the user's semantics correctly.
There are many ways to solve this problem; for example, the probability or frequency with which each domain-specific phrase occurs in the corpus can be redistributed so that the machine segments the user's voice content more accurately. One redistribution scheme is described here. First, the sums of the probabilities with which non-specific phrases and domain-specific phrases occur among all phrases of the corpus are counted separately, denoted Psum1 and Psum2. Then the probability of each non-specific phrase is divided by Psum1 to obtain P1, and likewise the probability of each domain-specific phrase is divided by Psum2 to obtain P2. Finally, P1 is multiplied by a weight coefficient k1 and P2 by a weight coefficient k2, yielding the final probability of each non-specific and domain-specific phrase among all phrases of the corpus. The user may set the values of k1 and k2 as needed, provided that k1 + k2 = 1; to make the final probability of each phrase in the dynamic dictionary greater than that of each non-specific phrase, the weight coefficient k1 must be smaller than Psum1. By redistributing the probabilities of non-specific and domain-specific phrases, the probability of domain-specific phrases increases, so the machine segments the user's voice content more accurately and parses the semantics of the user's voice content more accurately.
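The reallocation scheme above can be sketched as follows. This is a minimal illustration: the phrase probabilities and the weights k1 = 0.3, k2 = 0.7 are hypothetical (note that k1 = 0.3 is smaller than Psum1 = 0.7, as the text requires).

```python
def reallocate(p_general, p_domain, k1, k2):
    """Redistribute phrase probabilities so domain-specific phrases gain
    weight. p_general / p_domain map phrase -> probability of the phrase
    among all phrases of the corpus; k1 + k2 must equal 1."""
    assert abs(k1 + k2 - 1.0) < 1e-9, "weights must sum to 1"
    psum1 = sum(p_general.values())          # Psum1
    psum2 = sum(p_domain.values())           # Psum2
    final = {}
    for w, p in p_general.items():           # final = (p / Psum1) * k1
        final[w] = (p / psum1) * k1
    for w, p in p_domain.items():            # final = (p / Psum2) * k2
        final[w] = (p / psum2) * k2
    return final

# Hypothetical numbers: 打 is a frequent non-specific phrase, 打狗棒 a rarer
# domain-specific one; after reallocation the domain phrase overtakes it.
probs = reallocate({"打": 0.30, "我": 0.40},
                   {"打狗棒": 0.02, "刘德华": 0.01},
                   k1=0.3, k2=0.7)
```

After the reallocation the final probabilities still sum to 1, and 打狗棒 now outweighs 打, so "打狗棒" is no longer segmented as 打/狗棒.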
Step 13: Combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and segment the voice content sent by the user according to the second word-segmentation dictionary to obtain the phrases in the voice content.
After the probability or frequency of each phrase is redistributed, the adaptation of the language model is complete (step 26 of FIG. 2), and the machine can output the adapted language model (step 27 of FIG. 2). The adapted model contains both the phrases obtained by training and the redistributed probability and frequency of each phrase. The adapted model then needs to be converted into the second word-segmentation dictionary. This dictionary can be structured in many ways; its main purpose is to help the machine segment the user's voice content faster and more accurately.
The structure of one such second word-segmentation dictionary is described here as an example. It consists of two parts: an address area and a phrase area. The address information in the address area helps the machine locate a phrase's position in the dictionary from the phrase obtained after segmentation; the phrase area stores the phrases corresponding to the address area.
Specifically, the address area may contain the address information for phrases headed by the 10 Arabic numerals (0-9), the 26 uppercase or lowercase letters (A-Z or a-z), and the common Chinese characters. Digits and letters are in full-width format, each occupying two bytes, and the address entry for each digit, letter, or character occupies four bytes. Assuming the second word-segmentation dictionary covers 6,768 common Chinese characters, the address entries for digits, letters, and characters occupy (10 + 26 + 6768) * 4 = 27216 bytes in total. If the base address is uniDict, the phrase area starts at uniDict + 27216. FIG. 3 is a schematic diagram of the address information in the address area: the phrase area starts at uniDict + 27216, which holds the phrases headed by the digit "0"; the entry for the letter "A" is at uniDict + 40 and holds the address of the phrases headed by "A"; the entry for the common character "啊" is at uniDict + 144 and holds the address of the phrases headed by "啊".
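The byte offsets quoted above (uniDict + 40 for "A", uniDict + 144 for "啊", phrase area at uniDict + 27216) follow directly from the 4-byte slot layout and can be reproduced with a short sketch. The index assigned to "啊" within the 6,768 common characters is an assumption here (it is taken to be 0, consistent with the offset in the text).

```python
# Index layout taken from the description: 10 digits, then 26 letters,
# then the 6,768 common Chinese characters; each address slot is 4 bytes.
DIGITS = [chr(c) for c in range(ord('0'), ord('9') + 1)]
LETTERS = [chr(c) for c in range(ord('A'), ord('Z') + 1)]
SLOT = 4  # bytes per address entry

def address_offset(ch, hanzi_index):
    """Byte offset of ch's address slot relative to uniDict.

    hanzi_index maps each common character to 0..6767; only a stand-in
    entry for "啊" (assumed index 0) is needed for this example."""
    if ch in DIGITS:
        return DIGITS.index(ch) * SLOT
    if ch in LETTERS:
        return (len(DIGITS) + LETTERS.index(ch)) * SLOT
    return (len(DIGITS) + len(LETTERS) + hanzi_index[ch]) * SLOT

# First byte after the address area, i.e., the start of the phrase area.
PHRASE_AREA = (10 + 26 + 6768) * SLOT
```

With this layout, "0" sits at offset 0, "A" at 10 * 4 = 40, "啊" at (10 + 26) * 4 = 144, and the phrase area begins at 27216, matching the figures in the text.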
Specifically, in the phrase area, take the full-width digit "0" as an example. FIG. 4 is a schematic diagram of the phrases in the phrase area: the start address for "0" is uniDict + 27216, and a phrase headed by "0" may be "05毫米" ("0.5 mm"). To look up phrases headed by "0", the machine scans downward from uniDict + 27216 until it meets a guard marker, which indicates that the last phrase headed by "0" in the second word-segmentation dictionary has been reached. Partitioning the phrase area by the first character of each phrase improves the efficiency with which the machine finds phrases in the dictionary. To save space in the dictionary, the first character need not be stored in the phrase record; for example, "05毫米" in FIG. 4 is stored as "5毫米".
Besides the phrases themselves, the phrase area may hold other fields; several examples are listed here:
wordlen: the length of the phrase;
buf: the phrase content with the first character removed, so sizeof(buf) = wordlen - 2, the length of the phrase after the first character is removed;
frequency: the frequency of the phrase after redistribution in the language model, with sizeof(frequency) = 2 bytes;
reclen: the space occupied by one phrase record, with sizeof(reclen) = 1 byte, where reclen = sizeof(reclen) + sizeof(frequency) + sizeof(buf) + sizeof(wordlen);
guard: marks the end of each partition, with sizeof(guard) = 1 byte.
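Under the field sizes listed above, the record length of a phrase can be computed as in the following sketch. Two assumptions are made explicitly: each full-width character occupies 2 bytes (as stated for digits and letters), and sizeof(wordlen), which the text does not state, is taken to be 1 byte.

```python
def record_size(phrase):
    """reclen for one phrase record, per the field sizes listed above.

    Assumptions: each full-width character occupies 2 bytes, and
    sizeof(wordlen), not stated in the text, is 1 byte."""
    wordlen = 2 * len(phrase)        # whole phrase, 2 bytes per character
    sizeof_buf = wordlen - 2         # buf omits the 2-byte first character
    sizeof_frequency = 2             # redistributed frequency field
    sizeof_reclen = 1                # the reclen field itself
    sizeof_wordlen = 1               # assumed
    return sizeof_reclen + sizeof_frequency + sizeof_buf + sizeof_wordlen

# "05毫米" (4 characters): wordlen = 8, sizeof(buf) = 6,
# so reclen = 1 + 2 + 6 + 1 = 10 bytes.
```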
The second word-segmentation dictionary may contain digits, letters, and Chinese characters, which improves the accuracy with which the machine parses the semantics of the user's voice content. For example, suppose the user's voice content is "什么时候演西游记2啊" ("When is Journey to the West 2 showing?"). If the dictionary contains only 西游记 and not the digit "2", the content may be segmented as 什么/时候/演/西游记/2/啊, which may cause the machine to parse it incorrectly.
In this step, there are many ways to segment the voice content sent by the user according to the second word-segmentation dictionary; for example, backward maximum matching or forward minimum matching can be used. A segmentation scheme combining the two is described here. Suppose the user sends the voice content "少年包青天在卫视播出的时间" ("the time when Young Justice Bao airs on satellite TV"), as shown in FIG. 5, and the maximum search length is maxLen = 5 and the minimum length is minLen = 2. In backward maximum matching, "播出的时间" is searched first; since no corresponding phrase is found in the dictionary, one character is dropped and "出的时间" is searched, then "的时间", and so on, dropping characters until the minimum length is reached with "时间", which is found in the dictionary. The same method is then applied to the phrases before "时间" until the segmentation of the whole voice content is complete. In forward minimum matching, the phrases at the front of the voice content are searched first: "少年" is searched in the dictionary and found; then the phrase after it, "包青", is searched and not found, so one character is added and "包青天" is searched instead; the same method eventually completes the segmentation of the voice content.
After the user's voice content is segmented with both backward maximum matching and forward minimum matching, if the two segmentation results differ, i.e., the obtained phrases are not the same, the final segmentation is decided by comparing the probabilities or frequencies of the differing phrases in the dictionary. As shown in FIG. 5, for "少年包青天在卫视播出的时间", backward maximum matching yields 少年/包青/天在/卫视/播出/的/时间, while forward minimum matching yields 少年/包青天/在/卫视/播出/的/时间. Comparing the probabilities or frequencies of 天在 and 包青天 in the dictionary, 包青天 has the greater probability or frequency, so the final segmentation is 少年/包青天/在/卫视/播出/的/时间. Combining backward maximum matching with forward minimum matching makes the segmentation result more accurate.
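A simplified sketch of the combined strategy follows. The patent does not fully specify the forced-cut behaviour when no phrase is found down to minLen, so this version falls back to a single character in backward matching (yielding 少年/包/青/天在/... rather than the text's 少年/包青/天在/...); when the two matchers disagree, the segmentation whose phrases have the higher total dictionary frequency wins. The frequencies are invented for the example.

```python
def backward_max(text, dic, max_len=5):
    """Backward maximum matching: from the end of the text, try the longest
    window first, shrinking until a dictionary phrase is found; a single
    character is emitted as a last resort (an assumed forced-cut rule)."""
    out, i = [], len(text)
    while i > 0:
        n = min(max_len, i)
        while n > 1 and text[i - n:i] not in dic:
            n -= 1
        out.insert(0, text[i - n:i])
        i -= n
    return out

def forward_min(text, dic, max_len=5):
    """Forward minimum matching: from the start, try the shortest window
    first, growing until a dictionary phrase is found (or maxLen is hit)."""
    out, i = [], 0
    while i < len(text):
        n = 1
        while text[i:i + n] not in dic and n < max_len and i + n < len(text):
            n += 1
        out.append(text[i:i + n])
        i += n
    return out

def segment(text, freq, max_len=5):
    """Run both matchers; if they disagree, keep the segmentation whose
    phrases have the higher total frequency in the dictionary."""
    dic = set(freq)
    b = backward_max(text, dic, max_len)
    f = forward_min(text, dic, max_len)
    if b == f:
        return b
    return max((b, f), key=lambda seg: sum(freq.get(w, 0) for w in seg))

# Hypothetical dictionary frequencies; 包青天 outweighs the pieces that the
# backward pass produces, so the forward result is chosen.
FREQ = {"少年": 8, "包青天": 9, "天在": 2, "在": 9, "卫视": 7,
        "播出": 6, "的": 9, "时间": 8}
result = segment("少年包青天在卫视播出的时间", FREQ)
```

The final segmentation keeps 包青天 intact, matching the outcome described in the text.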
Step 14: Parse the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
Many grammars can be used in the grammar file; BNF grammar is taken as an example here. The basic rules of BNF grammar include, but are not limited to, the following:
<>: the enclosed content is mandatory, a non-terminal node that the grammar must further expand;
[]: the enclosed content is optional and may be skipped;
|: select one of the alternatives on either side, i.e., "or";
(): denotes a group;
In practice, however, these grammar rules sometimes cannot meet users' needs. The embodiment of the present application extends the BNF grammar rules by adding the following:
#: denotes a comment;
:: the separator between a non-terminal node and its expansion;
;: marks the end of a statement in the grammar;
"": references an external dictionary file;
&root(<name>): written at the beginning of the grammar, indicating that the grammar's name is name;
&keyword(textFrag, key, defaultValue, showValue): this function extracts keywords from the input text. Specifically, given an input inputTextFrag, if inputTextFrag successfully matches textFrag, then key = showValue; otherwise key = defaultValue. The function may also leave showValue undefined, i.e., be written as &keyword(textFrag, key, defaultValue); in that case, if inputTextFrag successfully matches textFrag, key is assigned textFrag directly, i.e., key = textFrag.
The effect of this function is illustrated with an example. Suppose the machine defines: &keyword(北京|天津|上海, place, 本地); &keyword(下雨|下雪, weather, 未定义, 天气); &keyword(明天|今天|后天, date, 今天). If the user's input text is "明天下雨吗" ("Will it rain tomorrow?"), the machine looks for these keywords in the defined functions. First, the input contains no specific place, so matching against the keywords of &keyword(北京|天津|上海, place, 本地) fails, and the input is automatically assigned the function's defaultValue, here 本地 ("local"). Next, the user's 明天 ("tomorrow") is matched against the keywords of &keyword(明天|今天|后天, date, 今天) and matches 明天; since that function defines no showValue, the time is assigned 明天 directly. Finally, the user's 下雨 ("rain") is matched against the keywords of &keyword(下雨|下雪, weather, 未定义, 天气) and matches 下雨; since that function defines showValue as 天气 ("weather"), 下雨 is replaced with 天气. The machine thus matches the user's input "明天下雨吗" to "本地明天天气" ("local tomorrow weather") and performs the related operations. The matching order in this example is only illustrative and is not specifically limited: 明天 may be matched first, or 下雨 first, or both words may be matched at the same time.
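The &keyword semantics described above can be sketched as a small slot-filling routine. This is a hypothetical re-implementation, not the patent's engine; simple substring containment stands in for grammar matching.

```python
def keyword(alternatives, key, default, show=None):
    """Build a matcher for one &keyword(...) entry. `alternatives` is the
    textFrag alternation, e.g. "北京|天津|上海"; if an alternative occurs
    in the input, key is bound to showValue (or to the matched text when
    no showValue is defined), otherwise to defaultValue."""
    def match(text, slots):
        for alt in alternatives.split("|"):
            if alt in text:
                slots[key] = show if show is not None else alt
                return
        slots[key] = default
    return match

# The three &keyword definitions from the example above.
RULES = [
    keyword("北京|天津|上海", "place", "本地"),
    keyword("下雨|下雪", "weather", "未定义", "天气"),
    keyword("明天|今天|后天", "date", "今天"),
]

def extract(text):
    slots = {}
    for rule in RULES:
        rule(text, slots)
    return slots

slots = extract("明天下雨吗")
```

For "明天下雨吗" this yields place = 本地 (default, no city matched), weather = 天气 (showValue substituted), and date = 明天 (no showValue, so the matched text itself), i.e., "local tomorrow weather" as described.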
&duplicate(TextFrag, least, most): this function repeats TextFrag m times, where least ≤ m ≤ most. For example, for the definition &duplicate(TextFrag, 1, 3), the output is: TextFrag[TextFrag][TextFrag];
&comb(textFrag1, textFrag2, ..., textFragN): this function produces the permutations of the grammar fragments textFrag1, textFrag2, ..., textFragN. For example, for the definition &comb(TextFrag1, TextFrag2), the output is: (TextFrag1 TextFrag2) | (TextFrag2 TextFrag1).
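Under the stated definitions, &duplicate and &comb can be expanded as in the following sketch, which enumerates the alternatives each construct denotes (a hypothetical rendering of the expansion only, not of the matching engine):

```python
from itertools import permutations

def duplicate(frag, least, most):
    """&duplicate(TextFrag, least, most): TextFrag repeated m times,
    with least <= m <= most; returns the list of alternatives."""
    return [frag * m for m in range(least, most + 1)]

def comb(*frags):
    """&comb(f1, ..., fN): every ordering of the given fragments."""
    return ["".join(p) for p in permutations(frags)]
```

For instance, duplicate("<数字>", 2, 4) enumerates the 2- to 4-digit strings that the <时间> rule below accepts, and comb("A", "B") yields the two orderings AB and BA.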
The BNF grammar rules can be extended in many ways; the above is only an illustrative description. For example, the symbols defined above could be replaced with other symbols, or the same symbol could carry other meanings; no specific limitation is made here. In addition, to explain the above grammar more clearly, a grammar file written under the above grammar rules is given as an example, with the following content:
&root(<影视点播>);
# key words:
# type: 影视类别 (video category)
# movieName: 影视名 (title)
# year: 年份 (year)
Parsed under the grammar rules defined above, this grammar file is named 影视点播 ("video on demand") and contains three keywords: type, movieName, and year. Specifically, for the text "播放2002年的电影无间道" ("play the 2002 movie Infernal Affairs"), the grammar file may be defined as:
# Example: 播放2002年的电影无间道 (play the 2002 movie Infernal Affairs)
<影视点播>: [播放][<年份>][的]&comb([<类别列表>], <影视列表>)
<类别列表>: &keyword(电影|电视剧, type, unspecified);
<影视列表>: &keyword("movieList.dic", movieName, unspecified);
<年份>: &keyword((<时间>年), year, unspecified);
<时间>: &duplicate(<数字>, 2, 4);
<数字>: 0|1|2|3|4|5|6|7|8|9;
In addition, to make it easier for the machine to parse the user's input text according to the defined grammar rules, each grammar can be written in the form of a grammar tree, so that a grammar file ultimately becomes a "grammar forest". Taking the grammar file written above for "播放2002年的电影无间道" as an example, the grammar tree is shown in FIG. 6: the first level of the tree shows the file name, 影视点播; the second level contains four parts: the first is 播放 ("play"), the second is 年份 ("year"), the third is 的, and the fourth comprises the 影视列表 ("video list") and 类别列表 ("category list"), where the video list may be a movie or a TV series. Organizing the content of the grammar file hierarchically as a grammar tree makes it easier for the machine to parse the voice content input by the user.
After the relevant grammar is defined, the machine can match the voice content sent by the user against the grammar file in two ways: full matching and keyword matching. The matching flow is shown in FIG. 7: first, the user's input voice content, which at this point has already been segmented, is fully matched against the grammar file (step 71 of FIG. 7); the matching result is then judged (step 72 of FIG. 7). If the full match succeeds, the matching result is printed (step 73 of FIG. 7); if the full match fails, keyword matching is performed (step 74 of FIG. 7), i.e., the corresponding keywords are searched for in the keyword list of the grammar file, and if the match succeeds, the matching result is printed.
The matching process is explained in detail with an example. Suppose the user's voice content is "我想播放2002年的电影无间道" ("I want to play the 2002 movie Infernal Affairs"). The machine converts the voice content into the corresponding text and segments it, yielding 我想/播放/2002年/的/电影/无间道. The text is then matched against the grammar file. First, 我想 ("I want") is not covered by any word in the grammar file, so the full match fails; keyword matching is then performed, with the result: type = 电影; movieName = 无间道; year = 2002年. In keyword matching, it suffices for the keywords in the input text to match the keywords in the grammar file's keyword list; compared with full matching, keyword matching is therefore more flexible, imposes fewer constraints on the input text, and increases the probability of a successful match.
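The two-stage matching above can be sketched as follows. The full-match stage is rendered here as a single regular expression standing in for the <影视点播> grammar, and the keyword list is reduced to a few hard-coded entries (无间道 standing in for the external movieList.dic); both are deliberate simplifications of the grammar-tree engine.

```python
import re

# Hypothetical rendering of the <影视点播> grammar as one anchored regex
# for the full-match stage; the real engine walks the grammar tree.
FULL = re.compile(
    r"^(播放)?(?P<year>\d{2,4}年)?(的)?"
    r"(?P<type>电影|电视剧)?(?P<movieName>无间道)$")

KEYWORDS = {"type": ["电影", "电视剧"],
            "movieName": ["无间道"],        # stands in for movieList.dic
            "year": [r"\d{2,4}年"]}

def parse(text):
    """Two-stage matching: try to cover the whole input first; if that
    fails, fall back to extracting keywords anywhere in the text."""
    m = FULL.match(text)
    if m:                                   # full match succeeded
        return {k: v for k, v in m.groupdict().items() if v}
    slots = {}                              # keyword matching
    for key, pats in KEYWORDS.items():
        for pat in pats:
            hit = re.search(pat, text)
            if hit:
                slots[key] = hit.group()
                break
    return slots

# "我想" is not covered by the grammar, so the full match fails and the
# keyword stage still recovers the three slots.
slots = parse("我想播放2002年的电影无间道")
```

Without the leading 我想, the same input is covered end to end and the full-match stage alone returns the three slots.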
From the above process of parsing the user's input voice content according to the grammar file, it can be seen that for the machine to parse the semantics of the user's voice content quickly and accurately, the grammar file in the machine should be written as consistently as possible. Some writing conventions and techniques for grammar files are illustrated here:
1. The grammar should cover the use cases as comprehensively as possible. Examples can be written into the grammar rules, with the following procedure: first design the user scenario; then write example sentences; finally make the written grammar cover the example sentences.
2. Keywords should be clear for the grammar scenario, which helps the machine achieve a high accuracy in keyword matching.
3. When writing a grammar file, over-generation should be avoided as much as possible. For example, the grammar fragment [今天][的][广州][的][天气] ("[today][of][Guangzhou][of][weather]") can cover the text 的/的/天气, which obviously does not conform to human language habits; such severe over-generation weakens the advantages of the grammar file's structure. To reduce over-generation, the grammar file can be split into several sub-entries; for example, the above grammar fragment can be written as a first-level sub-entry [今天][的]<广州>[的][天气], a second-level sub-entry [今天][的广州][的][天气], and a third-level sub-entry [今天][的][广州的][天气], thereby reducing over-generation in the grammar file.
4. When writing grammar files, a hierarchical writing style should be used as far as possible so that the grammar file is readable, for example the grammar tree rules mentioned above.
5. The phrases in the grammar file should be consistent with the phrases in the word-segmentation dictionary as far as possible, so that the machine parses the user's voice content more accurately. For example, if 我想知道 ("I want to know") is segmented as 我想/知道 by the dictionary, the phrases in the grammar file should also be consistent with this, i.e., [我想]<知道> rather than [我][想]<知道>.
When parsing voice content, the influence of segmentation must be considered. For example, suppose the user's voice content is 我想打电话 ("I want to make a call") and the machine may segment it as 我想/打/电话. Even though the machine's segmentation is wrong here, the grammar file should still parse the user's voice content in terms of 打电话 ("make a call"); this reduces parsing errors caused by the machine's segmentation errors.
6. When writing a grammar file as a grammar tree, the root node must contain at least one mandatory item; otherwise any input text is covered by the grammar, causing the machine to parse it incorrectly. For example, in the grammar fragment [今天][的][广州][的][天气], every phrase is optional, so if the user's input voice content is 今天的上海的天气 ("today's Shanghai weather"), it can still match the phrases in the grammar file, which obviously leads to a parsing error.
7. When writing a grammar file as a grammar tree, if a mandatory phrase in the root node is also a keyword, one can set defaultValue = error. When the voice content sent by the user cannot match the mandatory items in the root node, error is output directly, sparing the machine a further keyword-matching operation and saving its resources.
For a clearer understanding of the embodiment of the present application, the method of parsing voice content provided above is described as a whole, as shown in FIG. 8. Step 1: the adaptation of the language model (step 81 of FIG. 8), i.e., adjusting the probability or frequency with which each phrase in the corpus occurs among the phrases of the corpus, so that the probability and frequency of domain-specific phrases among the phrases of the machine's corpus increase. Step 2: segmenting the voice content sent by the user according to the word-segmentation dictionary (step 82 of FIG. 8). Step 3: fully matching the segmented voice content against the grammar file (step 83 of FIG. 8); the machine then judges whether the full match succeeds (step 84 of FIG. 8), and if the match succeeds, prints the matching result (step 85 of FIG. 8); the grammar file here may take the form of a grammar tree. Step 4: if the full match fails, keyword matching is performed (step 86 of FIG. 8), and the matching result is printed after the keyword match succeeds. Completing this matching process is how the machine parses the user's voice content.
The beneficial effects obtained by applying the method for parsing voice content provided by the embodiments of the present application are as follows:
1. When the language model is trained, the probability or frequency with which each phrase in the corpus stored in the machine occurs among all phrases is adjusted so that the probability or frequency of domain-specific phrases among all phrases increases, thereby improving the accuracy with which the machine parses the semantics of the user's voice content.
2. The word-segmentation dictionary in the embodiments of the present application includes an address area and a phrase area, and the phrase area is partitioned by first character, which enables the machine to quickly locate a phrase in the word-segmentation dictionary. In addition, the phrases in the phrase area may contain digits, letters, and Chinese characters, improving the accuracy with which the machine parses the semantics of the user's voice content.
3. The embodiments of the present application extend the existing BNF grammar rules and provide techniques for writing grammar rules, which improves the readability of grammar files and improves the accuracy with which the machine parses the semantics of the user's voice content.
4. When the voice content sent by the user is matched against the grammar file, both full matching and keyword matching are used, making the matching more comprehensive and thereby improving the accuracy with which the machine parses the semantics of the user's voice content.
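The address-area/phrase-area layout described in point 2 can be approximated in memory as an index keyed by first character. The entries below are invented; the patent does not specify a concrete storage format:

```python
from collections import defaultdict

def build_dictionary(phrases):
    """Partition phrases by first character: the index plays the role of the
    address area, and the per-character lists play the role of the phrase area."""
    index = defaultdict(list)
    for p in phrases:
        index[p[0]].append(p)
    return index

def lookup(index, phrase):
    """Jump straight to the first-character partition instead of scanning everything."""
    return phrase in index.get(phrase[0], [])
```

Note that entries beginning with a digit or letter (e.g. "4K", "mp3") partition just as naturally as Chinese-character entries.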
Finally, it should be noted that those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-transitory computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the methods described above. The non-transitory computer-readable storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Embodiment 2
Embodiment 1 provides a method for parsing voice content. Correspondingly, an embodiment of the present application provides an apparatus for parsing voice content, which is used to improve the accuracy with which a machine parses the semantics in a user's voice content.
An apparatus for parsing voice content includes a combining unit 91, a statistics unit 92, a word-segmentation unit 93, and a parsing unit 94, wherein:
the combining unit 91 may be configured to combine phrases from a specific domain with phrases from non-specific domains to generate a first word-segmentation dictionary, and to segment the corpus stored in the machine according to the first word-segmentation dictionary to obtain the phrases in the corpus;
the statistics unit 92 may be configured to count the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and to adjust the probability or frequency according to a predetermined rule so that the probability or frequency of domain-specific phrases among the phrases in the corpus increases;
the word-segmentation unit 93 may be configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and to segment the voice content sent by the user according to the second word-segmentation dictionary to obtain the phrases in the voice content;
the parsing unit 94 may be configured to parse the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
The working process of the above apparatus embodiment is as follows. Step 1: the combining unit 91 combines phrases from a specific domain with phrases from non-specific domains to generate a first word-segmentation dictionary, and segments the corpus stored in the machine according to the first word-segmentation dictionary to obtain the phrases in the corpus. Step 2: the statistics unit 92 counts the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and adjusts the probability or frequency according to a predetermined rule so that the probability or frequency of domain-specific phrases among the phrases in the corpus increases. Step 3: the word-segmentation unit 93 combines the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and segments the voice content sent by the user according to the second word-segmentation dictionary to obtain the phrases in the voice content. Step 4: the parsing unit 94 parses the phrases in the voice content according to the grammar file to obtain the corresponding semantics.
There are many implementations by which the above apparatus embodiment improves the accuracy with which the machine parses the semantics in the user's voice content. For example, in one implementation the combining unit 91 includes a word-segmentation subunit, a statistics subunit, and a combining subunit, wherein:
the word-segmentation subunit may be configured to segment the corpus stored in the machine according to the phrases of the specific domain to obtain the domain-specific phrases in the corpus; compared with the prior-art approach of obtaining domain-specific phrases by manual annotation, obtaining them by machine word segmentation is more convenient;
the statistics subunit may be configured to count the probability or frequency with which each domain-specific phrase in the corpus occurs among the domain-specific phrases in the corpus;
the combining subunit may be configured to select a preset number of phrases from the domain-specific phrases in the corpus according to their ranking by probability or frequency, and to combine the selected phrases with phrases from non-specific domains to generate the first word-segmentation dictionary. Selecting the domain-specific phrases ranked highest by probability or frequency, that is, building the first word-segmentation dictionary from the phrases that occur most often in the corpus, can improve the efficiency of machine word segmentation.
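A minimal sketch of this selection step, with invented phrase counts and an assumed preset count (the patent leaves the predetermined number unspecified):

```python
from collections import Counter

def build_first_dictionary(domain_phrase_counts, general_phrases, top_n):
    """Keep only the top_n most frequent domain-specific phrases, then merge
    them with the general (non-specific-domain) phrases."""
    top_domain = [p for p, _ in Counter(domain_phrase_counts).most_common(top_n)]
    return set(top_domain) | set(general_phrases)
```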
In another implementation, the word-segmentation unit 93 includes:
a combining subunit, a word-segmentation subunit, and a lookup subunit, wherein:
the combining subunit may be configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate the second word-segmentation dictionary; adjusting the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus increases the probability and frequency of domain-specific phrases among the phrases in the machine's corpus, thereby improving the accuracy with which the machine parses the semantics of the user's voice content;
the word-segmentation subunit may be configured to segment the voice content sent by the user according to the second word-segmentation dictionary, using backward maximum matching and forward minimum matching respectively;
the lookup subunit may be configured to, when the phrases obtained by the two segmentation methods differ, look up the probabilities or frequencies corresponding to the differing phrases in the second word-segmentation dictionary and select the phrase with the higher probability or frequency as the final segmentation result.
With the word-segmentation subunit and the lookup subunit above, the user's voice content is segmented with a combination of backward maximum matching and forward minimum matching, which makes the segmentation result more accurate.
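The combined backward-maximum/forward-minimum segmentation can be sketched roughly as below. The frequency table and the tie-breaking by summed frequency are assumptions made for illustration; the text only states that when the two segmentations disagree, the phrase with the higher probability or frequency in the second dictionary wins:

```python
def forward_min(text, freq):
    """Forward minimum matching: at each position take the shortest phrase in freq."""
    out, i = [], 0
    while i < len(text):
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in freq:
                break
        else:
            j = i + 1  # no dictionary phrase starts here: emit a single character
        out.append(text[i:j])
        i = j
    return out

def backward_max(text, freq):
    """Backward maximum matching: take the longest phrase in freq ending at each position."""
    out, j = [], len(text)
    while j > 0:
        for i in range(0, j):
            if text[i:j] in freq:
                break
        else:
            i = j - 1  # no dictionary phrase ends here: emit a single character
        out.append(text[i:j])
        j = i
    return list(reversed(out))

def segment_bidirectional(text, freq):
    """Run both segmentations; on disagreement, prefer the result whose phrases
    have the higher total frequency in the (second) dictionary."""
    fwd, bwd = forward_min(text, freq), backward_max(text, freq)
    if fwd == bwd:
        return fwd
    return max((fwd, bwd), key=lambda words: sum(freq.get(w, 0) for w in words))
```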
The beneficial effects obtained by the above apparatus embodiments are the same as or similar to those obtained by the foregoing method embodiments; to avoid repetition, they are not described here again.
In another embodiment of the present application, an electronic device is further provided, which includes the apparatus for parsing voice content according to any one of the foregoing embodiments.
In another embodiment of the present application, a non-transitory computer-readable storage medium is further provided. The non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions can execute the method for parsing voice content in any of the above method embodiments.
FIG. 10 is a schematic diagram of the hardware structure of an electronic device that performs the method for parsing voice content according to an embodiment of the present application. As shown in FIG. 10, the device includes:
one or more processors 1010 and a memory 1020; one processor 1010 is taken as an example in FIG. 10.
The device that performs the method for parsing voice content may further include an input apparatus 1030 and an output apparatus 1040.
The processor 1010, the memory 1020, the input apparatus 1030, and the output apparatus 1040 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 10.
As a non-transitory computer-readable storage medium, the memory 1020 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for parsing voice content in the embodiments of the present application (for example, the combining unit 91, the statistics unit 92, the word-segmentation unit 93, and the parsing unit 94 shown in FIG. 9). By running the non-volatile software programs, instructions, and modules stored in the memory 1020, the processor 1010 executes the various functional applications and data processing of the server, that is, implements the method for parsing voice content of the above method embodiments.
The memory 1020 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created according to the use of the apparatus for parsing voice content, and the like. In addition, the memory 1020 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 1020 may optionally include memories remotely located relative to the processor 1010, and these remote memories may be connected over a network to the apparatus for parsing voice content. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input apparatus 1030 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the apparatus for parsing voice content. The output apparatus 1040 may include a display device such as a display screen.
The one or more modules are stored in the memory 1020 and, when executed by the one or more processors 1010, perform the method for parsing voice content in any of the above method embodiments.
The above product can execute the methods provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to executing the methods. For technical details not described in detail in this embodiment, reference may be made to the methods provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions, with providing voice and data communication as their main goal. Such terminals include smartphones (for example, the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. Such devices include audio and video players (for example, the iPod), handheld game consoles, e-book readers, smart toys, and portable in-car navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, a hard disk, memory, a system bus, and so on. A server is similar in architecture to a general-purpose computer, but because it must provide highly reliable services, it has higher requirements in terms of processing capability, stability, reliability, security, scalability, manageability, and the like.
(5) Other electronic apparatuses with data interaction functions.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
Through the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by means of software plus a necessary general-purpose hardware platform, and certainly also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (14)

1. A method for parsing voice content, applied to an electronic device, the method comprising:
    combining phrases from a specific domain with phrases from non-specific domains to generate a first word-segmentation dictionary, and segmenting a corpus stored in a machine according to the first word-segmentation dictionary to obtain phrases in the corpus;
    counting a probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and adjusting the probability or frequency according to a predetermined rule so that the probability or frequency of phrases from the specific domain among the phrases in the corpus increases;
    combining the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and segmenting voice content sent by a user according to the second word-segmentation dictionary to obtain phrases in the voice content; and
    parsing the phrases in the voice content according to a grammar file to obtain corresponding semantics.
2. The method according to claim 1, wherein combining phrases from a specific domain with phrases from non-specific domains to generate the first word-segmentation dictionary specifically comprises:
    segmenting the corpus stored in the machine according to the phrases of the specific domain to obtain domain-specific phrases in the corpus;
    counting a probability or frequency with which each domain-specific phrase in the corpus occurs among the domain-specific phrases in the corpus; and
    selecting a preset number of phrases from the domain-specific phrases in the corpus according to their ranking by probability or frequency, and combining the selected phrases with phrases from non-specific domains to generate the first word-segmentation dictionary.
3. The method according to claim 1, wherein segmenting the voice content sent by the user according to the second word-segmentation dictionary specifically comprises:
    segmenting the voice content sent by the user according to the second word-segmentation dictionary using backward maximum matching and forward minimum matching respectively, and, if the phrases obtained by the two segmentation methods differ, looking up the probabilities or frequencies corresponding to the differing phrases in the second word-segmentation dictionary and selecting the phrase with the higher probability or frequency as the final segmentation result.
4. The method according to claim 1, wherein the second word-segmentation dictionary comprises:
    an address area and a phrase area, wherein:
    the address area guides the machine to locate, in the second word-segmentation dictionary, the phrases in the segmented voice content sent by the user; and
    the phrase area stores the phrases corresponding to the address area.
5. The method according to claim 1, wherein parsing the phrases in the voice content according to the grammar file specifically comprises:
    matching the phrases in the voice content against the phrases in the grammar file, wherein if the phrases in the voice content fully match the phrases in the grammar file, the parsing succeeds, and if the full match fails, keyword matching is performed.
6. The method according to claim 5, wherein the keyword matching specifically comprises:
    matching the phrases in the voice content against the keywords in the grammar file, wherein if the matching succeeds, the parsing succeeds, and if the matching does not succeed, the parsing fails.
7. The method according to claim 1, wherein the domain-specific phrases comprise at least one of the following:
    Chinese characters; English letters; digits.
8. An apparatus for parsing voice content, the apparatus comprising a combining unit, a statistics unit, a word-segmentation unit, and a parsing unit, wherein:
    the combining unit is configured to combine phrases from a specific domain with phrases from non-specific domains to generate a first word-segmentation dictionary, and to segment a corpus stored in a machine according to the first word-segmentation dictionary to obtain phrases in the corpus;
    the statistics unit is configured to count a probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and to adjust the probability or frequency according to a predetermined rule so that the probability or frequency of phrases from the specific domain among the phrases in the corpus increases;
    the word-segmentation unit is configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and to segment voice content sent by a user according to the second word-segmentation dictionary to obtain phrases in the voice content; and
    the parsing unit is configured to parse the phrases in the voice content according to a grammar file to obtain corresponding semantics.
9. The apparatus according to claim 8, wherein the combining unit comprises a word-segmentation subunit, a statistics subunit, and a combining subunit, wherein:
    the word-segmentation subunit is configured to segment the corpus stored in the machine according to the phrases of the specific domain to obtain domain-specific phrases in the corpus;
    the statistics subunit is configured to count a probability or frequency with which each domain-specific phrase in the corpus occurs among the domain-specific phrases in the corpus; and
    the combining subunit is configured to select a preset number of phrases from the domain-specific phrases in the corpus according to their ranking by probability or frequency, and to combine the selected phrases with phrases from non-specific domains to generate the first word-segmentation dictionary.
10. The apparatus according to claim 8, wherein the word-segmentation unit comprises:
    a combining subunit, a word-segmentation subunit, and a lookup subunit, wherein:
    the combining subunit is configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate the second word-segmentation dictionary;
    the word-segmentation subunit is configured to segment the voice content sent by the user according to the second word-segmentation dictionary, using backward maximum matching and forward minimum matching respectively; and
    the lookup subunit is configured to, when the phrases obtained by the two segmentation methods differ, look up the probabilities or frequencies corresponding to the differing phrases in the second word-segmentation dictionary and select the phrase with the higher probability or frequency as the final segmentation result.
11. An electronic device, comprising the apparatus for parsing voice content according to any one of claims 8-10.
12. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the method according to any one of claims 1-7.
13. An electronic device, comprising:
    one or more processors; and
    a memory communicatively connected to the one or more processors, wherein:
    the memory stores instructions executable by the one or more processors, and the instructions are executed by the one or more processors to enable the one or more processors to:
    combine phrases from a specific domain with phrases from non-specific domains to generate a first word-segmentation dictionary, and segment a corpus stored in a machine according to the first word-segmentation dictionary to obtain phrases in the corpus;
    count a probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and adjust the probability or frequency according to a predetermined rule so that the probability or frequency of phrases from the specific domain among the phrases in the corpus increases;
    combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-segmentation dictionary, and segment voice content sent by a user according to the second word-segmentation dictionary to obtain phrases in the voice content; and
    parse the phrases in the voice content according to a grammar file to obtain corresponding semantics.
14. A computer program product, comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to execute the method according to any one of claims 1-7.
PCT/CN2016/096186 2015-12-25 2016-08-22 Method and apparatus for parsing voice content WO2017107518A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510995231.5A CN105912521A (en) 2015-12-25 2015-12-25 Method and device for parsing voice content
CN201510995231.5 2015-12-25

Publications (1)

Publication Number Publication Date
WO2017107518A1 true WO2017107518A1 (en) 2017-06-29

Family

ID=56744050

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/096186 WO2017107518A1 (en) 2015-12-25 2016-08-22 Method and apparatus for parsing voice content

Country Status (2)

Country Link
CN (1) CN105912521A (en)
WO (1) WO2017107518A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019034957A1 (en) * 2017-08-17 2019-02-21 International Business Machines Corporation Domain-specific lexically-driven pre-parser
CN110390002A (en) * 2019-06-18 2019-10-29 深圳壹账通智能科技有限公司 Call resource allocation method, device, computer readable storage medium and server
US10769376B2 (en) 2017-08-17 2020-09-08 International Business Machines Corporation Domain-specific lexical analysis
CN112016297A (en) * 2020-08-27 2020-12-01 深圳壹账通智能科技有限公司 Intention recognition model testing method and device, computer equipment and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399919A (en) * 2017-02-06 2018-08-14 中兴通讯股份有限公司 A kind of method for recognizing semantics and device
CN107193973B (en) * 2017-05-25 2021-07-20 百度在线网络技术(北京)有限公司 Method, device and equipment for identifying field of semantic analysis information and readable medium
US10599645B2 (en) * 2017-10-06 2020-03-24 Soundhound, Inc. Bidirectional probabilistic natural language rewriting and selection
CN109447863A (en) * 2018-10-23 2019-03-08 广州努比互联网科技有限公司 A kind of 4MAT real-time analysis method and system
CN109446376B (en) * 2018-10-31 2021-06-25 广东小天才科技有限公司 Method and system for classifying voice through word segmentation
CN111831832B (en) * 2020-07-27 2022-07-01 北京世纪好未来教育科技有限公司 Word list construction method, electronic device and computer readable medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050289141A1 (en) * 2004-06-25 2005-12-29 Shumeet Baluja Nonstandard text entry
CN1949211A (en) * 2005-10-13 2007-04-18 中国科学院自动化研究所 New Chinese characters spoken language analytic method and device
US20070233458A1 (en) * 2004-03-18 2007-10-04 Yousuke Sakao Text Mining Device, Method Thereof, and Program
CN101788989A (en) * 2009-01-22 2010-07-28 蔡亮华 Vocabulary information processing method and system
CN104077275A (en) * 2014-06-27 2014-10-01 北京奇虎科技有限公司 Method and device for performing word segmentation based on context
CN105096933A (en) * 2015-05-29 2015-11-25 百度在线网络技术(北京)有限公司 Method and apparatus for generating word segmentation dictionary and method and apparatus for text to speech

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404035A (en) * 2008-11-21 2009-04-08 北京得意音通技术有限责任公司 Information search method based on text or voice
US9569425B2 (en) * 2013-03-01 2017-02-14 The Software Shop, Inc. Systems and methods for improving the efficiency of syntactic and semantic analysis in automated processes for natural language understanding using traveling features
CN103294666B (en) * 2013-05-28 2017-03-01 百度在线网络技术(北京)有限公司 Grammar compilation method, semantic analysis method and corresponding apparatus

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019034957A1 (en) * 2017-08-17 2019-02-21 International Business Machines Corporation Domain-specific lexically-driven pre-parser
US10445423B2 (en) 2017-08-17 2019-10-15 International Business Machines Corporation Domain-specific lexically-driven pre-parser
US10496744B2 (en) 2017-08-17 2019-12-03 International Business Machines Corporation Domain-specific lexically-driven pre-parser
GB2579957A (en) * 2017-08-17 2020-07-08 Ibm Domain-specific lexically-driven pre-parser
US10769376B2 (en) 2017-08-17 2020-09-08 International Business Machines Corporation Domain-specific lexical analysis
US10769375B2 (en) 2017-08-17 2020-09-08 International Business Machines Corporation Domain-specific lexical analysis
CN110390002A (en) * 2019-06-18 2019-10-29 深圳壹账通智能科技有限公司 Call resource allocation method, device, computer readable storage medium and server
CN112016297A (en) * 2020-08-27 2020-12-01 深圳壹账通智能科技有限公司 Intention recognition model testing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN105912521A (en) 2016-08-31

Similar Documents

Publication Publication Date Title
WO2017107518A1 (en) Method and apparatus for parsing voice content
JP6675463B2 (en) Bidirectional stochastic rewriting and selection of natural language
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
US10810272B2 (en) Method and apparatus for broadcasting search result based on artificial intelligence
CN108304375B (en) Information identification method and equipment, storage medium and terminal thereof
Zajic et al. Multi-candidate reduction: Sentence compression as a tool for document summarization tasks
JP5819860B2 (en) Compound word division
US7158930B2 (en) Method and apparatus for expanding dictionaries during parsing
US20050154580A1 (en) Automated grammar generator (AGG)
WO2014187096A1 (en) Method and system for adding punctuation to voice files
WO2012083892A1 (en) Method and device for filtering harmful information
WO2015127747A1 (en) Method and device for adding multimedia file
CN106649253B (en) Auxiliary control method and system based on post-verification
CN106294460B (en) Chinese speech keyword retrieval method based on a hybrid word and character language model
JP2001101185A (en) Machine translation method and device capable of automatically switching dictionaries and program storage medium with program for executing such machine translation method stored therein
US20190138270A1 (en) Training Data Optimization in a Service Computing System for Voice Enablement of Applications
US20190138269A1 (en) Training Data Optimization for Voice Enablement of Applications
WO2012079257A1 (en) Method and device for machine translation
US10037321B1 (en) Calculating a maturity level of a text string
CN109190116B (en) Semantic analysis method, system, electronic device and storage medium
US20210312901A1 (en) Automatic learning of entities, words, pronunciations, and parts of speech
CN112149403A (en) Method and device for determining confidential text
Mrva et al. A PLSA-based language model for conversational telephone speech.
JP2011065380A (en) Opinion classification device and program
US11361761B2 (en) Pattern-based statement attribution

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16877330

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16877330

Country of ref document: EP

Kind code of ref document: A1