WO2017041510A1 - Voice output method and device

Info

Publication number: WO2017041510A1
Application number: PCT/CN2016/082427
Authority: WO
Inventors: 王天一; 刘升平
Original assignee: 北京云知声信息技术有限公司
Priority date: 2015-09-08
Filing date: 2016-05-18
Publication date: 2017-03-16
Also published as: CN105304082B; CN107077845B; CN105304082A; CN107077845A

Abstract

A voice output method and device. The method comprises: receiving a voice input content input by a user (S11); determining a cognition degree of the user on a category, to which the voice input content belongs, according to the voice input content, wherein the cognition degree is a professional knowledge cognition degree of the user on the category (S12); and acquiring and outputting a voice output content matching the cognition degree from at least one voice output content corresponding to the voice input content (S13). By means of the technical solution, a voice output content matching a cognition degree of a user can be selected for the user according to the cognition degree of the user on a category, to which an input voice input content belongs, and can be output, so that the voice output content better conforms to the requirements of the user; a personalized voice output function is provided for the user; the voice output accuracy is improved, so that the user can acquire the maximum information quantity from the voice output content; and the user experience is improved.

Description

Voice output method and device

The present application is based on, and claims priority to, Chinese invention patent application No. CN201510568430.8, filed on September 8, 2015 and entitled "A Voice Output Method and Apparatus", the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of information processing technologies, and in particular, to a voice output method and apparatus.

Background art

At present, with the development of electronic technology, voice input is increasingly popular. Voice input is an input method that converts a person's spoken content into text through voice recognition. As smart terminals become widespread in daily life, more and more of them offer voice services. For example, a user can ask a question by voice; the voice software on the smart terminal analyzes the user's speech and answers the question by voice as well, thereby providing a help service to the user. This approach is very convenient for the user, who no longer needs to obtain an answer through a cumbersome online query. However, current voice service software has only one answer mode: when different users ask the same question (that is, the main content of the question is the same), the same help information is output. Different users have different levels of technical or professional ability and may therefore need different help information or different ways of answering. The above approach cannot provide voice help according to users' different technical needs, and so is not targeted.

Summary of the invention

Embodiments of the present invention provide a voice output method and apparatus. The technical solution is as follows:

In a first aspect, a voice output method is provided, including the following steps:

Receiving voice input content input by a user;

Determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs, the awareness (also called the cognition degree) being the degree of the user's professional knowledge of the category;

Acquiring and outputting, from at least one voice output content corresponding to the voice input content, the voice output content that matches the awareness.

Some beneficial effects of embodiments of the present invention may include:

According to the above technical solution, voice output content that matches the user's awareness of the category to which the input voice input content belongs can be selected and output for the user, so that the voice output content better meets the user's needs. This provides the user with a more personalized voice output function, improves the accuracy of voice output, enables the user to obtain the maximum amount of information from the voice output content, and improves the user experience.

In an embodiment, the determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs includes:

Identifying voiceprint information of the user;

Determining, according to the voiceprint information, whether the voice input content of the user is received for the first time;

When the voice input content of the user is received for the first time, it is determined that the user's awareness of the category to which the voice input content belongs is a preset minimum awareness.

In this embodiment, matched voice output content is selected and output for the user according to whether the voice input content of the user is received for the first time, so that the voice output content better meets the user's needs. This provides a more personalized voice output function, improves the accuracy of the voice output, and enables the user to obtain the maximum amount of information from the voice output content, thereby improving the user experience.

In an embodiment, the method further includes:

Recording an input time and a duration of use of the voice input content, the duration of use being a duration between receipt of the voice input content and output of the voice output content.

In this embodiment, recording the input time and the duration of use of the voice input content provides a richer basis for determining the user's awareness when voice output content is output for the user, so that the awareness is determined more accurately and more accurate, personalized voice output content is output for the user.

In an embodiment, the determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs includes:

Identifying voiceprint information of the user;

Determining, according to the voiceprint information of the user, whether the voice input contents received two adjacent times are input by the same user;

When the voice input contents received two adjacent times are input by the same user, calculating the time interval between the two voice input contents according to the input time and the duration of use of each of them;

Determining, according to the time interval, the user's awareness of the category to which the voice input content belongs, wherein the longer the time interval, the lower the awareness.

In this embodiment, calculating the time interval between the voice input contents received from the same user two adjacent times provides a richer basis for determining the user's awareness, so that the awareness is determined more accurately and more accurate, personalized voice output content is output for the user.

In an embodiment, the determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs includes:

Identifying voiceprint information of the user;

Acquiring, according to the voiceprint information of the user, history input record information corresponding to the user, where the history input record information includes at least one of historical accumulated use time, historical cumulative input count, and historical input frequency;

Determining, according to the history input record information, the user's awareness of the category to which the voice input content belongs, wherein the longer the historical accumulated use time, the higher the awareness; the larger the historical cumulative input count, the higher the awareness; and the higher the historical input frequency, the higher the awareness.

In this embodiment, the user's awareness is determined according to the history input record information corresponding to the user, so that the terminal can determine the awareness more accurately and thereby output more accurate, personalized voice output content for the user.
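The monotonic relationships described for the history input record can be sketched as follows. This is only an illustrative sketch: the `HistoryRecord` type, the saturating score function, and the half-point constants are assumptions made for the example and are not specified in the source, which states only the direction of each relationship.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryRecord:
    """History input record for one user; any field may be missing."""
    total_use_seconds: Optional[float] = None  # historical accumulated use time
    total_inputs: Optional[int] = None         # historical cumulative input count
    inputs_per_day: Optional[float] = None     # historical input frequency


def _saturate(value: float, half_point: float) -> float:
    """Map a non-negative value into [0, 1); the result is 0.5 at half_point."""
    return value / (value + half_point)


def awareness_from_history(record: HistoryRecord) -> float:
    """Average the monotonic scores of whichever fields are present."""
    # The half-points below are illustrative assumptions, not source values.
    scores = []
    if record.total_use_seconds is not None:
        scores.append(_saturate(record.total_use_seconds, 3600.0))
    if record.total_inputs is not None:
        scores.append(_saturate(record.total_inputs, 20.0))
    if record.inputs_per_day is not None:
        scores.append(_saturate(record.inputs_per_day, 5.0))
    # With no usable field, fall back to the lowest score.
    return sum(scores) / len(scores) if scores else 0.0
```

Each score rises as its input rises, so a longer accumulated use time, a larger input count, or a higher input frequency can only raise the result, which is the only property the text requires.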

In an embodiment, the determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs includes:

Extracting keywords in the voice input content;

Determining a degree of matching between the keywords in the voice input content and preset keywords;

Determining, according to the degree of matching between the keywords in the voice input content and the preset keywords, the user's awareness of the category to which the voice input content belongs, wherein the higher the degree of matching between the keywords in the voice input content and the professional keywords among the preset keywords, the higher the awareness; and the higher the degree of matching between the keywords in the voice input content and the non-professional keywords among the preset keywords, the lower the awareness.

In this embodiment, the user's awareness is determined according to the degree of matching between the keywords in the voice input content and the preset keywords, so that the awareness is determined more accurately and in a more personalized way, and more accurate, personalized voice output content is output for the user.
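A minimal sketch of the keyword-matching embodiment is given below. The two preset keyword sets, the definition of the matching degree as the fraction of input keywords found in a preset set, and the formula that combines the two matching degrees are all assumptions for illustration; the source only fixes the direction of each effect.

```python
# Hypothetical preset keywords for an "air conditioner" category.
PROFESSIONAL_KEYWORDS = {"compressor", "refrigerant", "thermostat"}
NON_PROFESSIONAL_KEYWORDS = {"colder", "button", "remote"}


def match_degree(keywords, preset):
    """Fraction of the distinct input keywords that appear in the preset set."""
    unique = set(keywords)
    if not unique:
        return 0.0
    return len(unique & preset) / len(unique)


def awareness_from_keywords(keywords):
    """Score in [0, 1]: raised by professional matches, lowered by lay matches."""
    professional = match_degree(keywords, PROFESSIONAL_KEYWORDS)
    non_professional = match_degree(keywords, NON_PROFESSIONAL_KEYWORDS)
    # 0.5 is the neutral value when neither set is matched (an assumption).
    return 0.5 + 0.5 * (professional - non_professional)
```

For instance, an input whose keywords are all professional terms scores 1.0 under this sketch, while an input made up only of lay terms scores 0.0.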

In an embodiment, the determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs includes:

Determining a sentence structure type of the voice input content, the sentence structure type being either a professional sentence structure type or a non-professional sentence structure type;

Determining, according to the sentence structure type of the voice input content, the user's awareness of the category to which the voice input content belongs, wherein the user's awareness of the category of voice input content with a professional sentence structure type is higher than the user's awareness of the category of voice input content with a non-professional sentence structure type.

In this embodiment, the user's awareness is determined according to the sentence structure type of the voice input content, so that the awareness is determined more accurately and in a more personalized way, and more accurate, personalized voice output content is output for the user.

In an embodiment, the determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs includes:

When it is determined that the voice input contents received two adjacent times are input by the same user, determining the degree of association between the two voice input contents according to the keywords in each of them;

Determining, according to the degree of association between the two voice input contents, the user's awareness of the category to which the voice input content belongs, wherein the higher the degree of association, the lower the awareness.

In this embodiment, the user's awareness is determined according to the degree of association between the voice input contents received from the same user two adjacent times, so that the awareness is determined more accurately and in a more personalized way, and more accurate, personalized voice output content is output for the user.
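One way to picture the degree of association is as keyword overlap between two adjacent inputs. The sketch below uses the Jaccard index for that overlap and takes the awareness as one minus the association; both choices are assumptions for illustration, since the source states only that a higher association means a lower awareness.

```python
def association_degree(previous_keywords, current_keywords):
    """Jaccard overlap between the keyword sets of two adjacent inputs."""
    previous, current = set(previous_keywords), set(current_keywords)
    union = previous | current
    if not union:
        return 0.0  # no keywords at all: treat the inputs as unrelated
    return len(previous & current) / len(union)


def awareness_from_association(previous_keywords, current_keywords):
    """Higher association (a follow-up on the same topic) gives lower awareness."""
    return 1.0 - association_degree(previous_keywords, current_keywords)
```

The intuition is that a user who immediately asks a closely related follow-up question probably did not fully understand the previous answer.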

In an embodiment, the determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs includes:

Determining, according to the voice input content, at least two voice input parameters of the voice input content, where the voice input parameters comprise: voiceprint information of the user, the time interval between voice input contents input by the same user two adjacent times, the history input record information corresponding to the user, the degree of matching between keywords in the voice input content and preset keywords, the sentence structure type of the voice input content, and the degree of association between voice input contents input by the same user two adjacent times;

Calculating, according to a preset weight of each of the voice input parameters, the user's awareness of the category to which the voice input content belongs.

In this embodiment, the user's awareness of the category to which the voice input content belongs is calculated according to the different weights of a plurality of voice input parameters of the voice input content, so that the awareness is determined more accurately and in a more personalized way, and more accurate, personalized voice output content is output for the user.
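The weighted calculation can be sketched as a normalized weighted average of per-parameter scores. The parameter names, the weight values, and the assumption that each parameter has already been converted to a score in [0, 1] are hypothetical; the source says only that each voice input parameter has a preset weight. Voiceprint information is left out of the score table here because, in the other embodiments, it serves to identify the user rather than to score awareness.

```python
MIN_AWARENESS = 0.0  # hypothetical preset minimum awareness

# Hypothetical preset weights, one per scored voice input parameter.
PRESET_WEIGHTS = {
    "time_interval": 2.0,
    "history": 3.0,
    "keyword_match": 3.0,
    "sentence_structure": 1.0,
    "association": 1.0,
}


def combine_awareness(scores, weights=PRESET_WEIGHTS):
    """Weighted average of the per-parameter scores that could be determined."""
    usable = {name: score for name, score in scores.items() if name in weights}
    if not usable:
        # No parameter could be determined: use the preset minimum awareness.
        return MIN_AWARENESS
    total_weight = sum(weights[name] for name in usable)
    weighted_sum = sum(weights[name] * score for name, score in usable.items())
    return weighted_sum / total_weight
```

Normalizing by the weights of the parameters actually present keeps the result in [0, 1] even when only some of the parameters are available for a given input.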

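In an embodiment, the determining, according to the voice input content, the user's awareness of a category to which the voice input content belongs includes:

When the voice input parameter of the voice input content cannot be determined, determining that the user's awareness of the category to which the voice input content belongs is a preset minimum awareness.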

In this embodiment, for voice input content whose voice input parameter cannot be determined, voice output content matching the preset minimum awareness is output, thereby providing the user with a more accurate, personalized voice output function, so that the user can obtain more useful information from the voice output content, which improves the user experience.

In an embodiment, the acquiring and outputting, from the at least one voice output content corresponding to the voice input content, the voice output content that matches the awareness includes:

Determining a cognitive level corresponding to the awareness according to a correspondence between awareness and cognitive level;

Acquiring the voice output content corresponding to the cognitive level according to the correspondence between the cognitive level and the voice output content;

The voice output content is output.

In this embodiment, voice output content that matches the user's awareness is selected and output for the user according to the correspondence between the cognitive level and the voice output content, so that the voice output content better meets the user's needs and the accuracy of the voice output is improved. The user can thus obtain the maximum amount of information from the voice output content, which improves the user experience.
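The two correspondences in this embodiment (awareness to cognitive level, and cognitive level to voice output content) can be sketched as two lookups. The thresholds, the level names, and the answer texts below are hypothetical examples and do not come from the source.

```python
from bisect import bisect_right

# Hypothetical correspondence between awareness and cognitive level.
LEVEL_THRESHOLDS = [0.33, 0.66]
LEVELS = ["beginner", "intermediate", "expert"]

# Hypothetical correspondence between cognitive level and voice output content.
OUTPUTS = {
    "how to set the air conditioning temperature": {
        "beginner": "Press the mode button on the remote until the temperature "
                    "flashes, then press up or down to change it.",
        "intermediate": "First enter the temperature adjustment mode, "
                        "then change the temperature.",
        "expert": "Adjust the set point in temperature adjustment mode.",
    },
}


def cognitive_level(awareness):
    """Return the cognitive level whose threshold band contains the awareness."""
    return LEVELS[bisect_right(LEVEL_THRESHOLDS, awareness)]


def select_output(question, awareness):
    """Pick the stored voice output content for the matching cognitive level."""
    return OUTPUTS[question][cognitive_level(awareness)]
```

Keeping the thresholds separate from the answer table lets the same level boundaries be reused for every question category.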

In an embodiment, the method further includes:

The history input record information is updated according to an input time and a usage duration of the voice input content.

In this embodiment, by updating the history input record information, the user's awareness can be determined from an accurate history input record the next time voice output content is output for the user, so that more accurate voice output content is output for the user.

In an embodiment, the method further includes:

Storing the user's awareness of the category to which the voice input content belongs;

The determining, according to the voice input content, the user's awareness of the category to which the voice input content belongs includes:

Identifying voiceprint information of the user;

Querying, according to the voiceprint information of the user, the user's awareness of the category to which the voice input content belongs.

In this embodiment, querying the stored awareness makes it more convenient and quicker to determine the user's awareness of the category to which the voice input content belongs, so that matching voice output content can be selected and output for the user more accurately and quickly.

In a second aspect, a voice output device is provided, including:

a receiving module, configured to receive a voice input content input by a user;

a determining module, configured to determine, according to the voice input content, the user's awareness of a category to which the voice input content belongs, the awareness being a degree of knowledge of the user's professional knowledge of the category;

an output module, configured to acquire and output, from the at least one voice output content corresponding to the voice input content, the voice output content that matches the awareness.

In an embodiment, the determining module comprises:

a first identification submodule, configured to identify voiceprint information of the user;

a first determining sub-module, configured to determine, according to the voiceprint information, whether to receive the voice input content of the user for the first time;

a second determining submodule, configured to determine, when the voice input content of the user is received for the first time, that the user's awareness of the category to which the voice input content belongs is a preset minimum awareness.

In one embodiment, the apparatus further includes:

a recording module, configured to record an input time and a duration of use of the voice input content, where the duration of use is the duration between receiving the voice input content and outputting the voice output content.

In an embodiment, the determining module comprises:

a second identification submodule, configured to identify voiceprint information of the user;

a second determining sub-module, configured to determine, according to the voiceprint information of the user, whether the voice input content received twice adjacently is input by the same user;

a first calculation submodule, configured to calculate, when the voice input contents received two adjacent times are input by the same user, the time interval between the two voice input contents according to the input time and the duration of use of each of them;

a third determining submodule, configured to determine, according to the time interval, the user's awareness of the category to which the voice input content belongs; wherein the longer the time interval, the lower the awareness.

In an embodiment, the determining module comprises:

a third identification submodule, configured to identify voiceprint information of the user;

a first obtaining submodule, configured to acquire history input record information corresponding to the user according to the voiceprint information of the user, where the history input record information includes at least one of a historical accumulated use time, a historical cumulative input count, and a historical input frequency;

a fourth determining submodule, configured to determine, according to the history input record information, the user's awareness of the category to which the voice input content belongs; wherein the longer the historical accumulated use time, the higher the awareness; the larger the historical cumulative input count, the higher the awareness; and the higher the historical input frequency, the higher the awareness.

In an embodiment, the determining module comprises:

an extraction submodule, configured to extract keywords in the voice input content;

a fifth determining submodule, configured to determine a matching degree between the keyword in the voice input content and the preset keyword;

a sixth determining submodule, configured to determine, according to the degree of matching between the keywords in the voice input content and the preset keywords, the user's awareness of the category to which the voice input content belongs; wherein the higher the degree of matching between the keywords in the voice input content and the professional keywords among the preset keywords, the higher the awareness; and the higher the degree of matching between the keywords in the voice input content and the non-professional keywords among the preset keywords, the lower the awareness.

In an embodiment, the determining module comprises:

a seventh determining submodule, configured to determine a statement structure type of the voice input content, where the statement structure type includes a professional statement structure type or a non-professional statement structure type;

an eighth determining submodule, configured to determine, according to the sentence structure type of the voice input content, the user's awareness of the category to which the voice input content belongs; wherein the user's awareness of the category of voice input content with a professional sentence structure type is higher than the user's awareness of the category of voice input content with a non-professional sentence structure type.

In an embodiment, the determining module comprises:

a ninth determining submodule, configured to determine, when the voice input contents received two adjacent times are input by the same user, the degree of association between the two voice input contents according to the keywords in each of them;

a tenth determining submodule, configured to determine, according to the degree of association between the two voice input contents, the user's awareness of the category to which the voice input content belongs; wherein the higher the degree of association, the lower the awareness.

In an embodiment, the determining module comprises:

an eleventh determining submodule, configured to determine, according to the voice input content, at least two voice input parameters of the voice input content, where the voice input parameters comprise: voiceprint information of the user, the time interval between voice input contents input by the same user two adjacent times, the history input record information corresponding to the user, the degree of matching between keywords in the voice input content and preset keywords, the sentence structure type of the voice input content, and the degree of association between voice input contents input by the same user two adjacent times;

a calculation submodule, configured to calculate, according to a preset weight of each of the voice input parameters, the user's awareness of the category to which the voice input content belongs.

In an embodiment, the determining module comprises:

a twelfth determining submodule, configured to determine, when the voice input parameter of the voice input content cannot be determined, that the user's awareness of the category to which the voice input content belongs is a preset minimum awareness.

In one embodiment, the output module comprises:

a thirteenth determining submodule, configured to determine a cognitive level corresponding to the awareness according to a correspondence between awareness and cognitive level;

a second obtaining submodule, configured to acquire, according to a correspondence between the cognitive level and the voice output content, the voice output content corresponding to the cognitive level;

An output submodule for outputting the voice output content.

In one embodiment, the apparatus further includes:

an update module, configured to update the history input record information according to an input time and a duration of use of the voice input content.

In one embodiment, the apparatus further includes:

a storage module, configured to store the user's awareness of the category to which the voice input content belongs;

The determining module includes:

a fourth identification submodule, configured to identify voiceprint information of the user;

The query sub-module is configured to query, according to the voiceprint information of the user, the user's awareness of the category to which the voice input content belongs.

Some beneficial effects of embodiments of the present invention may include:

The device can select and output, for the user, voice output content that matches the user's awareness of the category to which the input voice input content belongs, so that the voice output content better meets the user's needs. This provides the user with a more personalized voice output function, improves the accuracy of the voice output, and enables the user to obtain the maximum amount of information from the voice output content, thereby improving the user experience.

In a third aspect, a voice output device is provided, the device comprising:

a processor;

a memory for storing the processor executable instructions;

Wherein the processor is configured to:

Receiving voice input content input by a user;

The above processor is also configured to:

Identifying voiceprint information of the user;

The above processor is also configured to:

Identifying voiceprint information of the user;

When the voice input contents received two adjacent times are input by the same user, calculating the time interval between the two voice input contents according to the input time and the duration of use of each of them;

The above processor is also configured to:

Identifying voiceprint information of the user;

The above processor is also configured to:

Extracting keywords in the voice input content;

The above processor is also configured to:

Determining, according to the voice input content, at least two voice input parameters of the voice input content, where the voice input parameter comprises: voiceprint information of the user, and voice input content of two adjacent inputs of the same user The time interval, the historical input record information corresponding to the user, the matching degree of the keyword in the voice input content with the preset keyword, the statement structure type of the voice input content, and the second input of the same user The degree of association between voice input content;

Calculating, according to a preset weight of each of the voice input parameters, the user's awareness of the category to which the voice input content belongs.

The above processor is also configured to:

The voice output content is output.

The above processor is also configured to:

Identifying voiceprint information of the user;

According to a fourth aspect, there is provided a non-transitory computer readable recording medium having recorded thereon a computer program, the program comprising instructions for performing the method of the first aspect of the embodiment of the present invention.

In a fifth aspect, a computer program is provided, the program comprising: instructions for performing the method of the first aspect of the embodiment of the invention when the program is executed by a computer.

Other features and advantages of the invention will be set forth in the description which follows. The objectives and other advantages of the invention may be realized and obtained by means of the structure particularly pointed out in the appended claims.

The technical solution of the present invention will be further described in detail below through the accompanying drawings and embodiments.

Brief description of the drawings

The drawings are intended to provide a further understanding of the invention. In the drawings:

FIG. 1 is a flowchart of a voice output method according to an embodiment of the present invention;

FIG. 2 is a flowchart of step S12 in a voice output method according to an embodiment of the present invention;

FIG. 3 is a flowchart of step S12 in a voice output method according to an embodiment of the present invention;

FIG. 4 is a flowchart of step S12 in a voice output method according to an embodiment of the present invention;

FIG. 5 is a flowchart of step S12 in a voice output method according to an embodiment of the present invention;

FIG. 6 is a flowchart of step S13 in a voice output method according to an embodiment of the present invention;

FIG. 7 is a block diagram of a voice output apparatus according to an embodiment of the present invention;

FIG. 8 is a block diagram of a determining module in a voice output device according to an embodiment of the present invention;

FIG. 9 is a block diagram of a determining module in a voice output device according to an embodiment of the present invention;

FIG. 10 is a block diagram of a determining module in a voice output device according to an embodiment of the present invention;

FIG. 11 is a block diagram of a determining module in a voice output device according to an embodiment of the present invention;

FIG. 12 is a block diagram of an output module in a voice output device according to an embodiment of the present invention;

FIG. 13 is a block diagram of a voice output apparatus according to an embodiment of the present invention;

FIG. 14 is a block diagram of an apparatus for performing a voice output method according to an embodiment of the present invention.

Detailed description

The preferred embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments described here are intended to illustrate and explain the invention.

FIG. 1 is a flowchart of a voice output method according to an embodiment of the present invention. As shown in FIG. 1, the method is used in a terminal, which may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like. The method includes the following steps S11-S13:

In step S11, the voice input content input by the user is received.

In this step, the user can input the voice input content by speaking.

In step S12, the user's awareness of the category to which the voice input content belongs is determined according to the voice input content; the awareness is the degree of the user's professional knowledge of that category.

For example, if the user inputs the voice input content "how to set the air conditioning temperature", the user's awareness of the category to which the voice input content belongs is the user's degree of professional knowledge of air conditioners; if the user inputs the voice input content "what kind of medicine is aspirin", it is the user's degree of professional knowledge of medicine. The terminal can determine the category to which the voice input content belongs by extracting keywords in the voice input content.

In step S13, the voice output content that matches the awareness is acquired from the at least one voice output content corresponding to the voice input content and is output.

According to the technical solution provided by this embodiment of the present invention, voice output content matching the user's awareness of the category to which the input voice input content belongs is selected and output for the user, so that the voice output content better meets the user's needs. This provides the user with a more personalized voice output function, improves the accuracy of voice output, and enables the user to obtain the maximum amount of information from the voice output content, thereby improving the user experience.

In step S12, the user's awareness of the category to which the voice input content belongs may be determined in various ways. First, the voice input parameter of the voice input content is determined according to the voice input content; then the user's awareness of the category to which the voice input content belongs is determined according to the voice input parameter. The method for determining the awareness differs with the voice input parameter used. The voice input parameter may include the voiceprint information of the user, the time interval between voice input contents input by the same user two adjacent times, the history input record information corresponding to the user, the degree of matching between keywords in the voice input content and preset keywords, the sentence structure type of the voice input content, the degree of association between voice input contents input by the same user two adjacent times, and so on. The implementation of step S12 is described below through different embodiments.

In an embodiment, as shown in FIG. 2, step S12 may be implemented as the following steps S21-S23:

In step S21, the voiceprint information of the user is identified.

In step S22, based on the voiceprint information, it is determined whether the voice input content of the user is received for the first time.

In step S23, when the voice input content of the user is received for the first time, it is determined that the user's awareness of the category to which the voice input content belongs is the preset minimum awareness.

In this embodiment, voiceprint information corresponding to different users is stored in the terminal. When a user inputs voice input content, if the terminal finds the user's voiceprint information among the pre-stored voiceprint information, this is not the first time the voice input content of the user is received; if the terminal fails to find it, the voice input content of the user is being received for the first time. When it is not the first time, the terminal continues to determine other voice input parameters according to the voice input content and performs step S12 according to those parameters. The terminal stores in advance a correspondence between awareness and voice output content, including the voice output content corresponding to the preset minimum awareness.
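Steps S21-S23 can be sketched as a lookup against stored voiceprints. Voiceprint recognition itself is outside the scope of this sketch: it assumes a hypothetical upstream step has already reduced the speaker's voice to an identifier string, and the `VoiceprintRegistry` class, `MIN_AWARENESS` value, and `other_parameters` callback are illustrative names that do not appear in the source.

```python
MIN_AWARENESS = 0.0  # hypothetical preset minimum awareness


class VoiceprintRegistry:
    """Stores the voiceprint identifiers of users the terminal has heard."""

    def __init__(self):
        self._known = set()

    def is_first_time(self, voiceprint_id):
        """Return True if this voiceprint is new, and remember it either way."""
        first = voiceprint_id not in self._known
        self._known.add(voiceprint_id)
        return first


def awareness_for_input(registry, voiceprint_id, other_parameters):
    """S22/S23: a first-time user gets the minimum; otherwise use other parameters."""
    if registry.is_first_time(voiceprint_id):
        return MIN_AWARENESS
    # Not the first time: defer to the other voice input parameters.
    return other_parameters()
```

A first call with an unseen identifier returns the preset minimum, and later calls with the same identifier fall through to whatever the other parameters yield.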

In one embodiment, the above method further comprises the step of recording an input time and a duration of use of the voice input content, the duration of use being the length of time between receipt of the voice input content and output of the voice output content. Therefore, as shown in FIG. 3, step S12 can be implemented as the following steps S31-S34:

In step S31, the voiceprint information of the user is identified.

Step S32: Determine, according to the voiceprint information of the user, whether the voice input content received twice adjacently is input by the same user.

Step S33: when the voice input contents received two adjacent times are input by the same user, calculate the time interval between the two adjacently received voice input contents according to the input times and the usage durations of the two voice input contents.

Step S34: Determine, according to the time interval, the user's cognition degree on the category to which the voice input content belongs; wherein the longer the time interval, the lower the cognition degree.

In this embodiment, when the voice input contents received two adjacent times are input by the same user, the time interval between them reflects the user's response time to the voice output content last output by the terminal. This response time can also be characterized by the time interval between the last output of the voice output content and the receipt of the current voice input content. For example, the voice input content last received by the terminal is “how to set the air conditioner temperature”, for which the terminal outputs the voice output content “first enter the temperature adjustment mode, then change the temperature”; the voice input content received this time is “how to enter the temperature adjustment mode”. When the terminal determines that the two adjacently received voice input contents are input by the same user, the time interval between receiving “how to set the air conditioner temperature” and receiving “how to enter the temperature adjustment mode” may be used to characterize the user's response time to the previous voice output content “first enter the temperature adjustment mode, then change the temperature”, thereby determining the user's cognition degree on the category to which the voice input content belongs. Alternatively, the time interval between outputting the voice output content “first enter the temperature adjustment mode, then change the temperature” and receiving the current voice input content “how to enter the temperature adjustment mode” may be used to characterize that response time, thereby determining the user's cognition degree on the category to which the voice input content belongs. The longer the time interval, the longer the user's response time to the previous voice output content, and the lower the cognition degree.

In addition, a time interval may be preset. When two adjacently received voice input contents are input by the same user and the time interval between them exceeds the preset time interval, the terminal may directly determine that the user's cognition degree on the category to which the voice input content belongs is the preset minimum cognition degree, and acquire and output the voice output content matching the preset minimum cognition degree.
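The interval calculation of steps S33-S34 can be sketched as follows. The description only requires that a longer interval yield a lower cognition degree and that an interval beyond the preset value yield the minimum; the linear decay, the 60-second value of `MAX_INTERVAL`, and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    input_time: float      # seconds; when the voice input content was received
    usage_duration: float  # seconds between receiving the input and outputting the reply


MAX_INTERVAL = 60.0   # assumed preset time interval, in seconds
MIN_COGNITION = 0.0   # assumed preset minimum cognition degree


def response_interval(prev: Utterance, curr: Utterance,
                      from_output: bool = True) -> float:
    """Interval between two adjacent inputs from the same user.

    With from_output=True the interval is measured from the moment the
    previous reply was output (input time plus usage duration); otherwise
    it is measured from the moment the previous input was received.
    """
    start = prev.input_time + prev.usage_duration if from_output else prev.input_time
    return curr.input_time - start


def cognition_from_interval(interval: float) -> float:
    """Map an interval to a cognition degree in [0, 1]; longer means lower."""
    if interval >= MAX_INTERVAL:
        return MIN_COGNITION
    # Assumed linear decay between 0 and MAX_INTERVAL.
    return 1.0 - max(interval, 0.0) / MAX_INTERVAL
```

Both ways of measuring the interval mentioned in this embodiment are covered by the `from_output` flag; any monotonically decreasing mapping could replace the linear one.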

In one embodiment, as shown in FIG. 4, step S12 can be implemented as the following steps S41-S43:

In step S41, the voiceprint information of the user is identified.

Step S42: Acquire historical input record information corresponding to the user according to the voiceprint information of the user; the historical input record information includes at least one of historical accumulated use time, historical cumulative input times, and historical input frequency.

Step S43: Determine the user's cognition degree on the category to which the voice input content belongs according to the historical input record information; wherein the longer the historical accumulated use time, the higher the cognition degree; the greater the historical cumulative number of inputs, the higher the cognition degree; and the higher the historical input frequency, the higher the cognition degree.

In this embodiment, each time the terminal receives voice input content input by the user, it records the input time and the usage duration of the voice input content, the usage duration being the length of time between receiving the voice input content and outputting the voice output content. The terminal can compile the historical input record information corresponding to the user from the recorded input times and usage durations, the historical accumulated use time being the sum of the usage durations recorded each time. In addition, the above method further includes the step of updating the historical input record information according to the input time and the usage duration of the voice input content. In this way, when the terminal determines the user's cognition degree on the category to which the voice input content belongs according to the historical input record information corresponding to the user, the historical input record information is richer and more accurate, so that more accurate and personalized voice output content can be selected and output for the user.
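The bookkeeping and scoring of steps S41-S43 can be sketched as below. The description states only the monotonic relationships; the saturation caps `usage_cap` and `count_cap`, the equal weighting of the two quantities, and the class and function names are assumptions. Historical input frequency is omitted for brevity, which is consistent with the record containing "at least one of" the three quantities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class History:
    """Per-user historical input record, keyed elsewhere by voiceprint."""
    input_times: List[float] = field(default_factory=list)
    total_usage: float = 0.0  # historical accumulated use time, in seconds

    def update(self, input_time: float, usage_duration: float) -> None:
        # Called once per received voice input content.
        self.input_times.append(input_time)
        self.total_usage += usage_duration

    @property
    def input_count(self) -> int:
        # Historical cumulative number of inputs.
        return len(self.input_times)


def cognition_from_history(history: History,
                           usage_cap: float = 600.0,
                           count_cap: int = 50) -> float:
    """Return a cognition degree in [0, 1] that grows with use time and count."""
    usage_score = min(history.total_usage / usage_cap, 1.0)
    count_score = min(history.input_count / count_cap, 1.0)
    return 0.5 * usage_score + 0.5 * count_score
```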

In an embodiment, as shown in FIG. 5, step S12 may be implemented as the following steps S51-S53:

In step S51, keywords in the voice input content are extracted.

Step S52, determining a degree of matching between the keyword in the voice input content and the preset keyword.

Step S53: determining, according to the matching degree between the keywords in the voice input content and the preset keywords, the user's cognition degree on the category to which the voice input content belongs; wherein the higher the matching degree between the keywords in the voice input content and the professional keywords among the preset keywords, the higher the cognition degree; and the higher the matching degree between the keywords in the voice input content and the non-professional keywords among the preset keywords, the lower the cognition degree.

In this embodiment, the preset keywords pre-stored in the terminal include two types: professional keywords and non-professional keywords. When performing step S52, the matching degree between the keywords in the voice input content and the professional keywords, and the matching degree with the non-professional keywords, need to be determined separately. For example, the professional keywords include “set path” and the non-professional keywords include “how to use”. If the voice input content received by the terminal is “set path of ...”, it can be determined that the keywords in the voice input content have a higher matching degree with the professional keywords, so the user's cognition degree on the category to which the voice input content belongs is higher; if the voice input content received by the terminal is “how to use”, it can be determined that the keywords in the voice input content have a higher matching degree with the non-professional keywords, so the user's cognition degree on the category to which the voice input content belongs is lower.
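A minimal sketch of steps S51-S53 follows. Substring matching as the "matching degree", the neutral value 0.5 when nothing matches, and the function names are assumptions; only the two example keywords come from this embodiment.

```python
# Example keyword sets taken from the embodiment above.
PROFESSIONAL = {"set path"}
NON_PROFESSIONAL = {"how to use"}


def match_count(text: str, keywords: set) -> int:
    """Count how many preset keywords occur in the recognized text."""
    lowered = text.lower()
    return sum(1 for keyword in keywords if keyword in lowered)


def cognition_from_keywords(text: str) -> float:
    """Return a cognition degree in [0, 1] from the two matching degrees."""
    professional = match_count(text, PROFESSIONAL)
    non_professional = match_count(text, NON_PROFESSIONAL)
    if professional + non_professional == 0:
        return 0.5  # assumed neutral value when no preset keyword matches
    # More professional matches raise the score; more non-professional
    # matches lower it.
    return professional / (professional + non_professional)
```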

In an embodiment, step S12 can be implemented as the following steps A1-A2:

In step A1, the sentence structure type of the voice input content is determined, the sentence structure type being either a professional sentence structure type or a non-professional sentence structure type.

Step A2: determining, according to the sentence structure type of the voice input content, the user's cognition degree on the category to which the voice input content belongs; wherein the user's cognition degree on the category to which voice input content of the professional sentence structure type belongs is higher than the user's cognition degree on the category to which voice input content of the non-professional sentence structure type belongs.

In this embodiment, the sentence structure types are pre-stored in the terminal and may be expressed as regular expressions. For example, a regular expression for the professional sentence structure type is: adjective + noun + verb; a regular expression for the non-professional sentence structure type is: pronoun + verb. It should be noted that the representation of sentence structure types is not limited to regular expressions; any other form capable of reflecting sentence structure may be used. For example, the voice input content received by the terminal is “what is the step of booting up”; by analyzing the voice input content, the terminal determines that its sentence structure is “adjective + noun + verb + pronoun”, so the sentence structure type of the voice input content is determined to be the professional sentence structure type, and the user's cognition degree on the category to which the voice input content belongs is relatively high. As another example, the voice input content received by the terminal is “how to use this thing”; by analyzing the voice input content, the terminal determines that its sentence structure is “pronoun + verb”, so the sentence structure type of the voice input content is determined to be the non-professional sentence structure type, and the user's cognition degree on the category to which the voice input content belongs is relatively low.
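Steps A1-A2 can be sketched with regular expressions applied to a sequence of part-of-speech tags. The sketch assumes that a separate tagger (not shown and not specified in this description) has already produced the tags; the tag names `ADJ`, `NOUN`, `VERB`, `PRON` and the `"unknown"` fallback are likewise assumptions.

```python
import re
from typing import List

# Patterns follow the two example structures in the embodiment above.
PROFESSIONAL_PATTERN = re.compile(r"\bADJ NOUN VERB\b")
NON_PROFESSIONAL_PATTERN = re.compile(r"\bPRON VERB\b")


def structure_type(tags: List[str]) -> str:
    """Classify a tagged sentence as professional or non-professional."""
    sequence = " ".join(tags)
    if PROFESSIONAL_PATTERN.search(sequence):
        return "professional"
    if NON_PROFESSIONAL_PATTERN.search(sequence):
        return "non-professional"
    return "unknown"  # assumed fallback when neither pattern matches
```

For the two examples above, the tag sequence "adjective + noun + verb + pronoun" is classified as professional and "pronoun + verb" as non-professional.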

In an embodiment, step S12 can be implemented as the following steps B1-B2:

Step B1: when it is determined that the voice input contents received two adjacent times are input by the same user, determine the degree of association between the two adjacently received voice input contents according to the keywords in them.

Step B2: determine the user's cognition degree on the category to which the voice input content belongs according to the degree of association between the two adjacently received voice input contents; wherein the higher the degree of association, the lower the cognition degree.

In this embodiment, when the voice input contents received two adjacent times are input by the same user, the degree of association between them reflects how well the user understood the previous voice output content. The higher the degree of association between the two adjacently received voice input contents, the less the user understood the previous voice output content, and the lower the user's cognition degree on the category to which the voice input content belongs; the lower the degree of association, the better the user understood the previous voice output content, and the higher the user's cognition degree on the category to which the voice input content belongs. For example, the voice input content last received by the terminal is “How to set the air conditioner temperature”, and the voice input content received this time is “How to enter the temperature adjustment mode”. When the terminal determines that the two adjacently received voice input contents are input by the same user, it can extract the keywords in each of them, such as “air conditioning temperature” and “temperature adjustment mode”, and determine the degree of association between the two voice input contents from the degree of association between these keywords. Since “air conditioning temperature” and “temperature adjustment mode” are both temperature-related keywords, the degree of association between them is relatively high. As another example, the voice input content last received by the terminal is “how to set the air conditioner temperature”, and the voice input content received this time is “what is the step of powering on”. When the terminal determines that the two adjacently received voice input contents are input by the same user, the keywords extracted from them are “air conditioning temperature” and “power on” respectively. Since these are two unrelated types of keywords, the degree of association between them is almost zero, which means that the degree of association between the two adjacently received voice input contents is very low and the user understood the previous voice output content well, indicating that the user's cognition degree on the category to which the voice input content belongs is high.
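Steps B1-B2 can be sketched with a simple word-overlap measure. The description does not specify how the degree of association is computed; the Jaccard overlap of keyword words and the mapping `1 - association` used below are assumptions chosen only because they reproduce the two examples in this embodiment.

```python
def keyword_tokens(keyword: str) -> set:
    """Split a keyword phrase into lowercase words."""
    return set(keyword.lower().split())


def association_degree(prev_keyword: str, curr_keyword: str) -> float:
    """Jaccard overlap between the words of two keywords, in [0, 1]."""
    a = keyword_tokens(prev_keyword)
    b = keyword_tokens(curr_keyword)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cognition_from_association(association: float) -> float:
    """Higher association with the previous input means lower cognition."""
    return 1.0 - association
```

With this measure, “air conditioning temperature” and “temperature adjustment mode” share the word “temperature” and therefore have a non-zero association, while “air conditioning temperature” and “power on” share no word and have an association of zero.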

In an embodiment, the multiple manners of performing step S12 described in the foregoing embodiments may also be combined: multiple voice input parameters are determined, and the user's cognition degree on the category to which the voice input content belongs is calculated according to preset weights. Accordingly, step S12 may be further implemented as: determining, according to the voice input content, at least two voice input parameters of the voice input content, wherein the voice input parameters include: the voiceprint information of the user, the time interval between two adjacent voice input contents input by the same user, the historical input record information corresponding to the user, the matching degree between keywords in the voice input content and preset keywords, the sentence structure type of the voice input content, and the degree of association between two adjacent voice input contents input by the same user; and calculating the user's cognition degree on the category to which the voice input content belongs according to the preset weight of each voice input parameter.
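The weighted combination can be sketched as a normalized weighted average. The parameter names used as dictionary keys, the normalization over the parameters that could actually be determined, and the fallback to an assumed minimum of 0.0 are illustrative assumptions; the description only states that preset weights are used.

```python
from typing import Dict

MIN_COGNITION = 0.0  # assumed preset minimum cognition degree


def combine_cognition(scores: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Weighted average of per-parameter cognition degrees.

    `scores` holds a cognition degree in [0, 1] for each voice input
    parameter that could be determined; `weights` holds the preset weight
    of every parameter. Parameters without a score are ignored.
    """
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total == 0:
        # No parameter could be determined.
        return MIN_COGNITION
    return sum(scores[name] * w for name, w in used.items()) / total
```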

In an embodiment, the method further includes the step of determining that the user's cognition degree on the category to which the voice input content belongs is the preset minimum cognition degree when the voice input parameter of the voice input content cannot be determined. In this embodiment, when voice input content input by the user is received and its voice input parameter cannot be determined, the terminal may directly determine that the user's cognition degree on the category to which the voice input content belongs is the preset minimum cognition degree. Therefore, even for voice input content whose voice input parameter cannot be determined, the user can still obtain matching voice output content, thereby improving the user experience.

In one embodiment, the above method further comprises the step of storing the user's cognition degree on the category to which the voice input content belongs. In this case, step S12 may be implemented as the following steps: identifying the voiceprint information of the user; and querying, according to the voiceprint information of the user, the stored cognition degree of the user on the category to which the voice input content belongs. In this embodiment, querying the stored cognition degree makes it more convenient and quicker to determine the user's cognition degree on the category to which the voice input content belongs, so that the matching voice output content can be output for the user more accurately and quickly.

In one embodiment, as shown in FIG. 6, step S13 can be implemented as the following steps S61-S63:

In step S61, the cognitive level corresponding to the cognition degree is determined according to the correspondence between cognition degrees and cognitive levels.

Step S62: Acquire a voice output content corresponding to the cognitive level according to the correspondence between the cognitive level and the voice output content.

In step S63, the voice output content is output.

In this embodiment, the terminal pre-stores a correspondence between cognition degrees and cognitive levels, and a correspondence between cognitive levels and voice output contents. For example, the cognitive levels may be divided as needed into three levels: a low cognitive level, a medium cognitive level, and a high cognitive level. A cognition degree of “0% to 30%” corresponds to the low cognitive level, a cognition degree of “31% to 70%” corresponds to the medium cognitive level, and a cognition degree of “71% to 100%” corresponds to the high cognitive level. The voice output content corresponding to the low cognitive level is the detailed version of the voice output content, the voice output content corresponding to the medium cognitive level is the standard version, and the voice output content corresponding to the high cognitive level is the compact version. For each voice input content, the terminal stores three corresponding voice output contents: the detailed version, the standard version, and the compact version. For example, for the voice input content “How to set the air conditioning temperature”, the corresponding voice output contents include: the detailed version “Click the mode button in the middle of the first row; click it twice to enter the temperature adjustment mode. Click the ‘+/-’ button on the left of the second row to change the temperature; each click changes the temperature by ‘+/-’ 1 degree”; the standard version “Click the mode button to enter the temperature adjustment mode, then click the ‘+/-’ button to change the temperature”; and the compact version “First enter the temperature adjustment mode, then change the temperature”. In addition, the cognitive level corresponding to the preset minimum cognition degree may be the low cognitive level. Therefore, for voice input content whose voice input parameter cannot be determined, or when the voice input content of the user is received for the first time, the terminal may directly output the detailed version of the voice output content. It can be seen that, with the technical solution of this embodiment, when outputting voice output content for the user, the terminal can analyze the user's current needs by determining the user's cognition degree on the category to which the voice input content belongs, and output the voice output content matching those needs, enabling the user to obtain more, and more accurate, information from the voice output content.
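Steps S61-S63 amount to two table lookups, sketched below with the thresholds and example replies from this embodiment. The dictionary layout, the function names, the lowercase normalization of the input, and the handling of non-integer values between 30 and 31 (or 70 and 71) are assumptions.

```python
def cognitive_level(cognition_percent: float) -> str:
    """Map a cognition degree (0-100) to a cognitive level (S61).

    Assumption: values strictly between 30 and 31, or between 70 and 71,
    fall into the higher of the two neighbouring levels.
    """
    if cognition_percent <= 30:
        return "low"
    if cognition_percent <= 70:
        return "medium"
    return "high"


# Correspondence between cognitive level and stored version (S62).
VERSION_BY_LEVEL = {"low": "detailed", "medium": "standard", "high": "compact"}

# Stored voice output contents for one example voice input content.
OUTPUTS = {
    "how to set the air conditioning temperature": {
        "detailed": ("Click the mode button in the middle of the first row; "
                     "click it twice to enter the temperature adjustment mode. "
                     "Click the '+/-' button on the left of the second row to "
                     "change the temperature; each click changes the "
                     "temperature by '+/-' 1 degree"),
        "standard": ("Click the mode button to enter the temperature adjustment "
                     "mode, then click the '+/-' button to change the temperature"),
        "compact": "First enter the temperature adjustment mode, then change the temperature",
    }
}


def select_output(voice_input: str, cognition_percent: float) -> str:
    """Return the stored reply matching the user's cognitive level (S62-S63)."""
    versions = OUTPUTS[voice_input.strip().lower()]
    return versions[VERSION_BY_LEVEL[cognitive_level(cognition_percent)]]
```

A first-time user, or an input whose parameters cannot be determined, would be passed in with the minimum cognition degree and therefore receive the detailed version.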

Corresponding to the above-mentioned voice output method, the embodiment of the present invention further provides a voice output device, which is used to perform the above method.

FIG. 7 is a block diagram of a voice output device according to an embodiment of the present invention. As shown in Figure 7, the device includes:

The receiving module 71 is configured to receive voice input content input by the user;

The determining module 72 is configured to determine, according to the voice input content, the user's cognition degree on the category to which the voice input content belongs, the cognition degree being the user's degree of professional knowledge of the category;

The output module 73 is configured to acquire and output the voice output content that matches the recognition degree from the at least one voice output content corresponding to the voice input content.

In one embodiment, as shown in FIG. 8, the determining module 72 includes:

a first identification sub-module 721, configured to identify voiceprint information of the user;

The first determining sub-module 722 is configured to determine, according to the voiceprint information, whether the voice input content of the user is received for the first time;

The second determining sub-module 723 is configured to determine, when the voice input content of the user is received for the first time, that the user's cognition degree on the category to which the voice input content belongs is the preset minimum cognition degree.

In an embodiment, the above apparatus further includes:

The recording module is configured to record an input time and a usage duration of the voice input content, and the usage duration is a duration between receiving the voice input content and outputting the voice output content.

In one embodiment, as shown in FIG. 9, the determining module 72 includes:

a second identification sub-module 724, configured to identify voiceprint information of the user;

The second determining sub-module 725 is configured to determine, according to the voiceprint information of the user, whether the voice input content received twice adjacently is input by the same user;

The first calculation sub-module 726 is configured to calculate, when the voice input contents received two adjacent times are input by the same user, the time interval between the two adjacently received voice input contents according to the input times and the usage durations of the two voice input contents;

The third determining sub-module 727 is configured to determine, according to the time interval, the user's awareness of the category to which the voice input content belongs; wherein the longer the time interval, the lower the awareness.

In one embodiment, as shown in FIG. 10, the determining module 72 includes:

a third identification sub-module 728, configured to identify voiceprint information of the user;

The first obtaining sub-module 729 is configured to acquire historical input record information corresponding to the user according to the voiceprint information of the user, where the historical input record information includes at least one of historical accumulated use time, historical cumulative input times, and historical input frequency;

The fourth determining sub-module 7210 is configured to determine, according to the historical input record information, the user's cognition degree on the category to which the voice input content belongs; wherein the longer the historical accumulated use time, the higher the cognition degree; the greater the historical cumulative number of inputs, the higher the cognition degree; and the higher the historical input frequency, the higher the cognition degree.

In one embodiment, as shown in FIG. 11, the determining module 72 includes:

Extracting a sub-module 7211 for extracting keywords in the voice input content;

a fifth determining sub-module 7212, configured to determine a matching degree between the keyword in the voice input content and the preset keyword;

The sixth determining sub-module 7213 is configured to determine, according to the matching degree between the keywords in the voice input content and the preset keywords, the user's cognition degree on the category to which the voice input content belongs; wherein the higher the matching degree between the keywords in the voice input content and the professional keywords among the preset keywords, the higher the cognition degree; and the higher the matching degree between the keywords in the voice input content and the non-professional keywords among the preset keywords, the lower the cognition degree.

In one embodiment, the determining module 72 includes:

a seventh determining sub-module, configured to determine the sentence structure type of the voice input content, the sentence structure type being either a professional sentence structure type or a non-professional sentence structure type;

The eighth determining sub-module is configured to determine, according to the sentence structure type of the voice input content, the user's cognition degree on the category to which the voice input content belongs; wherein the user's cognition degree on the category to which voice input content of the professional sentence structure type belongs is higher than the user's cognition degree on the category to which voice input content of the non-professional sentence structure type belongs.

In one embodiment, the determining module 72 includes:

a ninth determining sub-module, configured to determine, when the voice input contents received two adjacent times are input by the same user, the degree of association between the two adjacently received voice input contents according to the keywords in them;

a tenth determining sub-module, configured to determine, according to the degree of association between the two adjacently received voice input contents, the user's cognition degree on the category to which the voice input content belongs; wherein the higher the degree of association, the lower the cognition degree.

In one embodiment, the determining module 72 includes:

The eleventh determining sub-module is configured to determine at least two voice input parameters of the voice input content according to the voice input content, wherein the voice input parameters include: the voiceprint information of the user, the time interval between two adjacent voice input contents input by the same user, the historical input record information corresponding to the user, the matching degree between keywords in the voice input content and preset keywords, the sentence structure type of the voice input content, and the degree of association between two adjacent voice input contents input by the same user;

The calculation sub-module is configured to calculate the user's recognition of the category of the voice input content according to the weight of each preset voice input parameter.

In one embodiment, the determining module 72 includes:

The twelfth determining sub-module is configured to determine, when the voice input parameter of the voice input content cannot be determined, that the user's cognition degree on the category to which the voice input content belongs is the preset minimum cognition degree.

In one embodiment, as shown in FIG. 12, the output module 73 includes:

a thirteenth determining sub-module 731, configured to determine a cognitive level corresponding to the cognition according to a correspondence between the cognition and the cognition level;

The second obtaining sub-module 732 is configured to acquire, according to a correspondence between the cognitive level and the voice output content, the voice output content corresponding to the cognitive level;

The output sub-module 733 is configured to output the voice output content.

In an embodiment, as shown in FIG. 13, the foregoing apparatus further includes:

The updating module 74 is configured to update the historical input record information according to the input time and the usage duration of the voice input content.

The storage module 75 is configured to store the user's awareness of the category to which the voice input content belongs.

In one embodiment, the determining module 72 includes:

a fourth identification sub-module, configured to identify voiceprint information of the user;

The query sub-module is configured to query the user's recognition of the category of the voice input content according to the user's voiceprint information.

With the device provided by the embodiment of the present invention, the voice output content matching the user's cognition degree on the category to which the input voice input content belongs is selected and output for the user, so that the voice output content better meets the user's needs. The user is thereby provided with a more personalized voice output function, and the accuracy of the voice output is improved, so that the user can obtain the maximum amount of information from the voice output content, improving the user experience.

FIG. 14 is a block diagram of an apparatus for performing a voice output method, according to an exemplary embodiment. For example, device 1600 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.

Referring to Figure 14, device 1600 can include one or more of the following components: processor 1601, memory 1602, and communication component 1603.

The processor 1601 typically controls the overall operation of the device 1600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processor 1601 can execute instructions to perform all or part of the steps of the above method.

Memory 1602 is configured to store various types of data to support operation at device 1600. Examples of such data include instructions for any application or method operating on device 1600, contact data, phone book data, messages, pictures, videos, and the like. The memory 1602 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

Communication component 1603 is configured to facilitate wired or wireless communication between device 1600 and other devices. The device 1600 can access a wireless network based on a communication standard, such as Wi-Fi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1603 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, communication component 1603 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, device 1600 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the voice output method described above.

In an exemplary embodiment, there is also provided a non-transitory computer readable storage medium comprising instructions, such as a memory 1602 comprising instructions executable by processor 1601 of apparatus 1600 to perform the voice output method described above. For example, the non-transitory computer readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device.

The present invention also provides a non-transitory computer readable recording medium having recorded thereon a computer program including instructions for executing the voice output method according to the above-described embodiment of the present invention.

The present invention also provides a computer program comprising: instructions for executing a voice output method according to the above-described embodiment of the present invention when the program is executed by a computer.

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (system), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or FIG. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine for the execution of instructions for execution by a processor of a computer or other programmable data processing device. Means for implementing the functions specified in one or more of the flow or in a block or blocks of the flow chart.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

It is apparent that those skilled in the art can make various modifications and variations to the invention without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover such modifications and variations of the invention.

Claims

A voice output method, comprising:

Receiving voice input content input by a user;

Determining, according to the voice input content, the user's cognition degree of a category to which the voice input content belongs, the cognition degree being the user's degree of professional knowledge of the category; and

Acquiring, from at least one voice output content corresponding to the voice input content, the voice output content matching the cognition degree, and outputting the acquired voice output content.
The method according to claim 1, wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

Identifying voiceprint information of the user;

Determining, according to the voiceprint information, whether the voice input content of the user is received for the first time; and

When the voice input content of the user is received for the first time, determining that the user's cognition degree of the category to which the voice input content belongs is a preset minimum cognition degree.
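By way of non-limiting illustration, the first-use rule above might be sketched in Python as follows. The identifier `voiceprint_id`, the set `seen_voiceprints`, and the numeric value of `MIN_COGNITION_DEGREE` are assumptions introduced for this example; the claims do not fix a voiceprint representation or a numeric scale.

```python
from typing import Optional, Set

# Assumed scale: 0.0 stands for the preset minimum cognition degree.
MIN_COGNITION_DEGREE = 0.0


def first_time_degree(voiceprint_id: str, seen_voiceprints: Set[str]) -> Optional[float]:
    """Return the preset minimum for a voiceprint not seen before, else None.

    None means the speaker is already known, so the cognition degree has to
    be determined by some other rule.
    """
    if voiceprint_id not in seen_voiceprints:
        seen_voiceprints.add(voiceprint_id)  # remember the speaker for later inputs
        return MIN_COGNITION_DEGREE
    return None
```

In this sketch, a second input from the same voiceprint returns `None`, leaving the determination to whichever other rule an embodiment applies.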
The method according to claim 1, further comprising:

Recording an input time and a duration of use of the voice input content, the duration of use being a duration between receipt of the voice input content and output of the voice output content.
The method according to claim 3, wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

Identifying voiceprint information of the user;

Determining, according to the voiceprint information of the user, whether two adjacently received voice input contents are input by the same user;

When the two adjacently received voice input contents are input by the same user, calculating the time interval between the two adjacently received voice input contents according to the input time and the duration of use of each of them; and

Determining, according to the time interval, the user's cognition degree of the category to which the voice input content belongs; wherein the longer the time interval, the lower the cognition degree.
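As a non-limiting sketch, the interval rule above could be implemented as shown below. Two points are assumptions rather than statements of the claims: the interval is taken as the time from the end of the previous exchange (previous input time plus its duration of use) to the next input time, and the mapping `1 / (1 + interval / scale)` is only one of many monotonically decreasing functions that satisfy "the longer the time interval, the lower the cognition degree". The names `VoiceInput`, `interval_between`, `degree_from_interval`, and `scale_s` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceInput:
    voiceprint_id: str   # hypothetical identifier derived from the voiceprint
    input_time: float    # seconds, when the voice input content was received
    use_duration: float  # seconds from receipt of the input to output of the reply


def interval_between(prev: VoiceInput, curr: VoiceInput) -> Optional[float]:
    """Idle time between the end of the previous exchange and the next input.

    Returns None when the two inputs come from different speakers.
    """
    if prev.voiceprint_id != curr.voiceprint_id:
        return None
    return max(0.0, curr.input_time - (prev.input_time + prev.use_duration))


def degree_from_interval(interval_s: float, scale_s: float = 60.0) -> float:
    """Map an interval to (0, 1]; a longer interval gives a lower degree."""
    return 1.0 / (1.0 + interval_s / scale_s)
```

The `scale_s` parameter sets the interval at which the degree falls to one half; an embodiment would tune it, or replace the function entirely, for its own category.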
The method according to claim 3, wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

Identifying voiceprint information of the user;

Acquiring, according to the voiceprint information of the user, historical input record information corresponding to the user, wherein the historical input record information includes at least one of a historical cumulative duration of use, a historical cumulative number of inputs, and a historical input frequency; and

Determining, according to the historical input record information, the user's cognition degree of the category to which the voice input content belongs; wherein the longer the historical cumulative duration of use, the higher the cognition degree; the greater the historical cumulative number of inputs, the higher the cognition degree; and the higher the historical input frequency, the higher the cognition degree.
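The history-based rule above only requires that the cognition degree rise with each historical quantity. One possible sketch, under stated assumptions, uses a saturating function `x / (x + h)` per quantity and averages whichever quantities are available. The half-points (3600 seconds, 50 inputs, 5 inputs per day) and all function and parameter names are hypothetical values chosen for illustration.

```python
from typing import Optional


def _saturate(value: float, half_point: float) -> float:
    """Increasing map from [0, inf) to [0, 1); equals 0.5 at half_point."""
    return value / (value + half_point)


def degree_from_history(
    total_use_s: Optional[float] = None,
    input_count: Optional[int] = None,
    inputs_per_day: Optional[float] = None,
) -> Optional[float]:
    """Average the saturated scores of whichever history fields are present."""
    scores = []
    if total_use_s is not None:
        scores.append(_saturate(total_use_s, 3600.0))   # assumed half-point: 1 hour
    if input_count is not None:
        scores.append(_saturate(input_count, 50.0))     # assumed half-point: 50 inputs
    if inputs_per_day is not None:
        scores.append(_saturate(inputs_per_day, 5.0))   # assumed half-point: 5 per day
    if not scores:
        return None  # no history available for this user
    return sum(scores) / len(scores)
```

Because the record may contain only one of the three quantities, the sketch accepts any subset and returns `None` when none is supplied.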
The method according to claim 1, wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

Extracting keywords in the voice input content;

Determining a degree of matching between a keyword in the voice input content and a preset keyword; and

Determining the user's cognition degree of the category to which the voice input content belongs according to the degree of matching between the keyword in the voice input content and the preset keyword; wherein the higher the degree of matching between the keyword in the voice input content and a professional keyword among the preset keywords, the higher the cognition degree; and the higher the degree of matching between the keyword in the voice input content and a non-professional keyword among the preset keywords, the lower the cognition degree.
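A minimal sketch of the keyword rule above follows, assuming the voice input content has already been transcribed to text. The keyword sets, the word-level tokenizer, the definition of matching degree as the fraction of preset keywords found, and the formula `0.5 + 0.5 * (professional - non_professional)` are all assumptions made for the example; the claims only require that professional matches raise the cognition degree and non-professional matches lower it.

```python
import re
from typing import Set

# Hypothetical preset keywords for a medical category.
PROFESSIONAL_KEYWORDS = {"hypertension", "systolic", "diastolic"}
NON_PROFESSIONAL_KEYWORDS = {"dizzy", "headache", "tired"}


def extract_keywords(text: str) -> Set[str]:
    """Very rough keyword extraction: lower-cased alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matching_degree(keywords: Set[str], preset: Set[str]) -> float:
    """Fraction of the preset keywords that occur in the input."""
    if not preset:
        return 0.0
    return len(keywords & preset) / len(preset)


def degree_from_keywords(text: str) -> float:
    """Cognition degree in [0, 1]; 0.5 when neither keyword set matches."""
    keywords = extract_keywords(text)
    professional = matching_degree(keywords, PROFESSIONAL_KEYWORDS)
    non_professional = matching_degree(keywords, NON_PROFESSIONAL_KEYWORDS)
    return 0.5 + 0.5 * (professional - non_professional)
```

With these assumed sets, an input that uses all three professional terms scores 1.0, while an input that uses only lay terms scores below 0.5.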
The method according to claim 1, wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

Determining a sentence structure type of the voice input content, the sentence structure type being a professional sentence structure type or a non-professional sentence structure type; and

Determining the user's cognition degree of the category to which the voice input content belongs according to the sentence structure type of the voice input content; wherein the user's cognition degree of the category to which voice input content of the professional sentence structure type belongs is higher than the user's cognition degree of the category to which voice input content of the non-professional sentence structure type belongs.
The method according to claim 1, wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

When it is determined that two adjacently received voice input contents are input by the same user, determining the degree of association between the two adjacently received voice input contents according to the keywords in the two adjacently received voice input contents; and

Determining the user's cognition degree of the category to which the voice input content belongs according to the degree of association between the two adjacently received voice input contents; wherein the higher the degree of association, the lower the cognition degree.
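The association rule above can be illustrated, without limitation, by measuring keyword overlap between two adjacent inputs. Using Jaccard similarity as the degree of association, and `1 - association` as the cognition degree, are assumptions of this sketch; the claims do not name a similarity measure. The function names are hypothetical.

```python
from typing import Set


def association_degree(prev_keywords: Set[str], curr_keywords: Set[str]) -> float:
    """Jaccard similarity of the two keyword sets, in [0, 1]."""
    union = prev_keywords | curr_keywords
    if not union:
        return 0.0
    return len(prev_keywords & curr_keywords) / len(union)


def degree_from_association(prev_keywords: Set[str], curr_keywords: Set[str]) -> float:
    """Higher association between adjacent inputs gives a lower degree."""
    return 1.0 - association_degree(prev_keywords, curr_keywords)
```

The intuition behind the assumed mapping is that a user who immediately asks again about the same keywords probably did not obtain what was needed from the previous voice output content.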
The method according to claim 1, wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

Determining, according to the voice input content, at least two voice input parameters of the voice input content, wherein the voice input parameters comprise: voiceprint information of the user, the time interval between two adjacently input voice input contents of the same user, historical input record information corresponding to the user, the degree of matching between a keyword in the voice input content and a preset keyword, the sentence structure type of the voice input content, and the degree of association between two adjacently input voice input contents of the same user; and

Calculating the user's cognition degree of the category to which the voice input content belongs according to a preset weight of each of the voice input parameters.
The method according to claim 9, wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

When the voice input parameters of the voice input content cannot be determined, determining that the user's cognition degree of the category to which the voice input content belongs is a preset minimum cognition degree.
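The weighted calculation and the fallback to the preset minimum described in the two preceding claims might be sketched as follows. This is an illustration under assumptions: each parameter is presumed to have been converted to a score in [0, 1] beforehand, the weights in `WEIGHTS` are hypothetical, the weights of missing parameters are renormalized away, and the voiceprint is treated as an identifier rather than a scored parameter.

```python
from typing import Dict, Optional

MIN_COGNITION_DEGREE = 0.0  # assumed value of the preset minimum

# Hypothetical preset weights for the scored voice input parameters.
WEIGHTS: Dict[str, float] = {
    "interval": 0.2,
    "history": 0.3,
    "keywords": 0.3,
    "structure": 0.1,
    "association": 0.1,
}


def combine_scores(scores: Dict[str, Optional[float]]) -> float:
    """Weighted average of the parameter scores that could be determined.

    Falls back to the preset minimum when no parameter is available.
    """
    available = {
        name: score
        for name, score in scores.items()
        if score is not None and name in WEIGHTS
    }
    if not available:
        return MIN_COGNITION_DEGREE
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted_sum = sum(WEIGHTS[name] * score for name, score in available.items())
    return weighted_sum / total_weight
```

Renormalizing by the weights actually used keeps the result in [0, 1] even when only a subset of parameters can be determined for a given input.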
The method according to claim 1, wherein the acquiring and outputting, from the at least one voice output content corresponding to the voice input content, of the voice output content matching the cognition degree comprises:

Determining a cognitive level corresponding to the cognition degree according to a correspondence between cognition degrees and cognitive levels;

Acquiring the voice output content corresponding to the cognitive level according to a correspondence between cognitive levels and voice output contents; and

Outputting the voice output content.
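The two correspondences recited above (cognition degree to cognitive level, and cognitive level to voice output content) can be sketched as two lookups. The thresholds, level names, and sample replies below are hypothetical and serve only to illustrate the mechanism; the claims do not specify how many levels exist or where their boundaries lie.

```python
from bisect import bisect_right
from typing import Dict

# Assumed correspondence between cognition degree and cognitive level:
# [0, 0.33) -> novice, [0.33, 0.66) -> intermediate, [0.66, 1] -> expert.
LEVEL_BOUNDARIES = [0.33, 0.66]
LEVELS = ["novice", "intermediate", "expert"]

# Hypothetical voice output contents prepared for one voice input content.
OUTPUTS: Dict[str, str] = {
    "novice": "High blood pressure means your blood pushes too hard on your vessels.",
    "intermediate": "Hypertension is a sustained reading above the normal range.",
    "expert": "Hypertension is defined by elevated systolic or diastolic pressure.",
}


def cognitive_level(degree: float) -> str:
    """Look up the level whose range contains the given cognition degree."""
    return LEVELS[bisect_right(LEVEL_BOUNDARIES, degree)]


def select_output(degree: float, outputs: Dict[str, str]) -> str:
    """Pick the voice output content that corresponds to the level."""
    return outputs[cognitive_level(degree)]
```

Keeping the two correspondences as separate tables lets the level boundaries be retuned without touching the prepared output contents.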
The method according to claim 5, further comprising:

Updating the historical input record information according to the input time and the duration of use of the voice input content.
The method according to claim 1, further comprising:

Storing the user's cognition degree of the category to which the voice input content belongs;

Wherein the determining, according to the voice input content, the user's cognition degree of the category to which the voice input content belongs comprises:

Identifying voiceprint information of the user; and

Querying, according to the voiceprint information of the user, the stored cognition degree of the user for the category to which the voice input content belongs.
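As a non-limiting sketch, storing and querying the cognition degree by voiceprint could use an in-memory mapping keyed by speaker and category. The class name `CognitionStore`, the use of a string `voiceprint_id`, and the choice to return an assumed preset minimum for an unknown speaker are all assumptions of this example; a real embodiment would likely use persistent storage.

```python
from typing import Dict, Tuple


class CognitionStore:
    """In-memory store of cognition degrees keyed by (voiceprint, category)."""

    def __init__(self, minimum: float = 0.0) -> None:
        self._degrees: Dict[Tuple[str, str], float] = {}
        self._minimum = minimum  # assumed preset minimum cognition degree

    def save(self, voiceprint_id: str, category: str, degree: float) -> None:
        """Record the latest degree determined for this speaker and category."""
        self._degrees[(voiceprint_id, category)] = degree

    def query(self, voiceprint_id: str, category: str) -> float:
        """Return the stored degree, or the minimum for an unknown pair."""
        return self._degrees.get((voiceprint_id, category), self._minimum)
```

Keying on the category as well as the voiceprint reflects that the same user may be an expert in one category and a novice in another.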
A voice output device, characterized in that the device comprises:

a processor; and

a memory for storing instructions executable by the processor;

Wherein the processor is configured to perform a voice output method, the method comprising:

Receiving voice input content input by a user;

Determining, according to the voice input content, the user's cognition degree of a category to which the voice input content belongs, the cognition degree being the user's degree of professional knowledge of the category; and

Acquiring, from at least one voice output content corresponding to the voice input content, the voice output content matching the cognition degree, and outputting the acquired voice output content.
The apparatus of claim 14, wherein the processor is further configured to perform:

Identifying voiceprint information of the user;

Determining, according to the voiceprint information, whether the voice input content of the user is received for the first time; and

When the voice input content of the user is received for the first time, determining that the user's cognition degree of the category to which the voice input content belongs is a preset minimum cognition degree.
The apparatus of claim 14, wherein the processor is further configured to perform:

Recording an input time and a duration of use of the voice input content, the duration of use being a duration between receipt of the voice input content and output of the voice output content.
The apparatus of claim 16, wherein the processor is further configured to perform:

Identifying voiceprint information of the user;

Determining, according to the voiceprint information of the user, whether two adjacently received voice input contents are input by the same user;

When the two adjacently received voice input contents are input by the same user, calculating the time interval between the two adjacently received voice input contents according to the input time and the duration of use of each of them; and

Determining, according to the time interval, the user's cognition degree of the category to which the voice input content belongs; wherein the longer the time interval, the lower the cognition degree.
The apparatus of claim 16, wherein the processor is further configured to perform:

Identifying voiceprint information of the user;

Acquiring, according to the voiceprint information of the user, historical input record information corresponding to the user, wherein the historical input record information includes at least one of a historical cumulative duration of use, a historical cumulative number of inputs, and a historical input frequency; and

Determining, according to the historical input record information, the user's cognition degree of the category to which the voice input content belongs; wherein the longer the historical cumulative duration of use, the higher the cognition degree; the greater the historical cumulative number of inputs, the higher the cognition degree; and the higher the historical input frequency, the higher the cognition degree.
The apparatus of claim 14, wherein the processor is further configured to perform:

Extracting keywords in the voice input content;

Determining a degree of matching between a keyword in the voice input content and a preset keyword; and

Determining the user's cognition degree of the category to which the voice input content belongs according to the degree of matching between the keyword in the voice input content and the preset keyword; wherein the higher the degree of matching between the keyword in the voice input content and a professional keyword among the preset keywords, the higher the cognition degree; and the higher the degree of matching between the keyword in the voice input content and a non-professional keyword among the preset keywords, the lower the cognition degree.
A non-transitory computer readable recording medium having recorded thereon a computer program, the program comprising instructions for executing a voice output method, the method comprising:

Receiving voice input content input by a user;

Determining, according to the voice input content, the user's cognition degree of a category to which the voice input content belongs, the cognition degree being the user's degree of professional knowledge of the category; and

Acquiring, from at least one voice output content corresponding to the voice input content, the voice output content matching the cognition degree, and outputting the acquired voice output content.