US20040024598A1 - Thematic segmentation of speech - Google Patents
- Publication number
- US20040024598A1 (application US10/610,679)
- Authority
- US
- United States
- Prior art keywords
- document
- thematic
- linguistic
- information
- segments
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/42—Graphical user interfaces
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/60—Medium conversion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/30—Aspects of automatic or semi-automatic exchanges related to audio recordings in general
- H04M2203/305—Recording playback features, e.g. increased speed
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99943—Generating database or data structure, e.g. via user interface
Definitions
- The present invention relates generally to speech processing and, more particularly, to the segmentation of speech based on thematic classification.
- Speech has not traditionally been valued as an archival information source. As effective as the spoken word is for communicating, archiving spoken segments in a useful and easily retrievable manner has long been a difficult proposition. Although the act of recording audio is not difficult, automatically transcribing and indexing speech in an intelligent and useful manner can be difficult.
- FIG. 1 is a block diagram illustrating this technique in additional detail.
- Initial input audio information is transcribed by transcription component 101.
- The transcription may be performed manually or automatically.
- Transcription component 101 outputs a continuous stream of text.
- Windowing component 102 segments the text into chunks of text of a predetermined length (e.g., 200 words) and generates a vector of the words that occur within the window. Words that occur more frequently within the window are weighted more heavily in the vector.
- Boundary decision component 103 detects changes in thematic segments based on the word count weighted vectors.
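The conventional windowing technique described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of cosine similarity as the comparison, and the `threshold` value are all assumptions; only the fixed window length and the frequency-weighted word vectors come from the text.

```python
from collections import Counter
from math import sqrt


def window_vectors(words, window_size=200):
    """Split a word stream into fixed-length windows and count words in each."""
    return [
        Counter(words[i:i + window_size])
        for i in range(0, len(words), window_size)
    ]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def boundaries(words, window_size=200, threshold=0.1):
    """Return word offsets where adjacent windows look dissimilar.

    The threshold is an illustrative assumption, not a value from the source.
    """
    vectors = window_vectors(words, window_size)
    return [
        (i + 1) * window_size
        for i in range(len(vectors) - 1)
        if cosine(vectors[i], vectors[i + 1]) < threshold
    ]
```

Because the windows have a fixed length, a boundary can only ever be reported at a multiple of `window_size`, regardless of where sentences or speaker turns actually end.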
- A problem with this technique is that it can produce erroneous or non-optimal thematic segments. Accordingly, there is a need in the art to improve thematic segmentation of speech.
- Systems and methods consistent with the principles of this invention provide a thematic segmentation tool that acts on text augmented with additional information extracted from the spoken version of the text.
- The thematic segmentation tool may generate overlapping thematic segments for a single portion of text.
- Another aspect of the invention is directed to a method for determining thematically coherent segments within a document.
- The method comprises receiving a document having associated linguistic information that describes linguistic features of the document and generating indications of thematically coherent segments within the document that occur at the linguistic features in the document.
- Yet another aspect of the invention is directed to a computing device comprising a processor and a computer memory coupled to the processor.
- The computer memory contains program instructions that when executed by the processor associate linguistic information with a document.
- The linguistic information demarcates linguistic breaks within the document.
- The program instructions additionally generate, based on the linguistic breaks within the document, indications of thematically coherent segments, and output the thematically coherent segments associated with labels describing thematic content of the thematically coherent segments.
- FIG. 1 is a block diagram illustrating thematic segmentation using a conventional technique based on a word count within a moving window of text.
- FIG. 2 is a diagram illustrating an exemplary system in which concepts consistent with the invention may be implemented.
- FIG. 3 is a block diagram illustrating software elements in a thematic segmentation tool consistent with the invention.
- FIG. 4 is a diagram illustrating exemplary thematic segments for a document.
- Thematic segmentation of spoken audio is performed by a thematic segmentation tool on a transcribed version of the audio supplemented with additional information that further describes the audio.
- The transcription is supplemented with visible linguistic structural information, such as sentence demarcations, and non-visible linguistic structural information, such as phrasal boundaries, topic lists, and speaker boundaries.
- FIG. 2 is a diagram illustrating an exemplary system 200 in which concepts consistent with the invention may be implemented.
- System 200 includes a computing device 201 that has a computer-readable medium 209, such as random access memory, coupled to a processor 208.
- Computing device 201 may also include a number of additional external or internal devices, such as, without limitation, a mouse, a CD-ROM, a keyboard, and a display.
- Computing device 201 may be any type of computing platform, and may be connected to a network 202.
- Computing device 201 is exemplary only. Concepts consistent with the present invention can be implemented on any computing device, whether or not connected to a network.
- Processor 208 can be any of a number of well-known computer processors, such as processors from Intel Corporation, of Santa Clara, Calif. Processor 208 executes program instructions stored in memory 209.
- Memory 209 contains an application program 215.
- Application program 215 may implement the thematic segmentation tool described below.
- Thematic segmentation tool 215 may receive input data, such as linguistically segmented text, from other application programs executing in computing device 201 or other computing devices, such as those connected to computing device 201 through network 202.
- Thematic segmentation tool 215 processes the input data to generate indications of thematic segments.
- FIG. 3 is a block diagram conceptually illustrating software elements of thematic segmentation tool 215. Decisions relating to thematic segmentation are made by thematic decision component 310.
- Thematic decision component 310 implements a statistical framework that generates thematic segments for a “document.”
- The term “document” refers to textual information and associated descriptive information relating to the document (e.g., speaker boundaries, phrasal boundaries, etc.). Although such a document may be generated from data from audio sources, it could be generated in other manners, such as from data from video or textual sources.
- Thematic decision component 310 receives a number of inputs that describe the document. Specifically, as shown in FIG. 3, thematic decision component 310 receives a text transcript of the document from transcription component 320, speaker boundary information from speaker boundary detection component 321, linguistic information from linguistic detection component 322, and topic classifications from topic classification component 323. Although transcription component 320, speaker boundary detection component 321, linguistic detection component 322, and topic classification component 323 are illustrated as part of thematic segmentation tool 215, in other implementations, these components may be considered as providing input information to a thematic segmentation tool implemented by thematic decision component 310.
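One way to picture the inputs that reach thematic decision component 310 is as a single record assembled from the four upstream components. The sketch below is only an illustration under stated assumptions: the `Document` fields, the `build_document` function, and the representation of boundaries as word offsets are hypothetical names and choices, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Document:
    """Transcript text plus the descriptive information attached to it."""
    words: List[str]
    speaker_boundaries: List[int] = field(default_factory=list)    # word offsets
    sentence_boundaries: List[int] = field(default_factory=list)   # word offsets
    topics: List[Tuple[str, float]] = field(default_factory=list)  # (topic, score)


def build_document(
    audio: bytes,
    transcribe: Callable[[bytes], List[str]],
    detect_speakers: Callable[[bytes], List[int]],
    detect_linguistic: Callable[[List[str], bytes], List[int]],
    classify_topics: Callable[[List[str]], List[Tuple[str, float]]],
) -> Document:
    """Run the four upstream components and bundle their outputs."""
    words = transcribe(audio)
    return Document(
        words=words,
        speaker_boundaries=detect_speakers(audio),
        sentence_boundaries=detect_linguistic(words, audio),
        topics=classify_topics(words),
    )
```

Passing the components in as callables mirrors the note above that they may sit either inside the tool or outside it as providers of input.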
- Transcription component 320 may be an automated or manual transcription tool that converts the audio input stream it receives into text. Transcription component 320 may use conventional techniques to perform the conversion.
- Speaker boundary detection component 321 locates boundaries between speakers in the audio input stream. Knowledge of speaker changes in an audio stream may be a useful indicator of potential changes in thematic content. Automated speaker boundary detection techniques are known in the art. For example, speaker boundary detection is described in Liu et al., “Fast Speaker Change Detection for Broadcast News Transcription and Indexing,” Eurospeech '99, Budapest, Hungary, September 1999, pp. 1031-1034; and Chen et al., “Speaker, Environment, and Channel Change Detection and Clustering via the Bayesian Information Criterion,” Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, Va., 1998. Alternatively, instead of automatically detecting speaker boundaries, the speaker boundaries may be manually inserted into the document.
- Linguistic detection component 322 receives the text generated by transcription component 320 and the audio input stream. Automated transcription techniques generally produce a simple stream of words without linguistic information (e.g., periods, exclamation marks, quotation marks) that would ideally be associated with the text. Linguistic detection component 322 annotates the text from transcription component 320 to include this linguistic information. In addition to visible linguistic information, such as periods, linguistic detection component 322 may associate non-visible linguistic information, such as phrasal boundaries, with the received text.
- Topic classification component 323 generates topics selected from a predefined topic vocabulary that are relevant to the document.
- A document may include any combination of words from a 60,000-word corpus.
- Topic classification component 323 examines the document and outputs one or more predefined topics, where the number of possible topics is smaller than the 60,000-word corpus (e.g., a 5,000-word topic vocabulary).
- Topic classification component 323 uses a Bayesian framework to generate topics for a document. More particularly, topic classification component 323 may be implemented as a probabilistic Hidden Markov Model (HMM) whose parameters are estimated from training samples of documents with given topic labels. This model allows each word in a document to contribute different amounts to each of the topics assigned to the document. The output of topic classification component 323 may be a rank-ordered list of all possible topics and corresponding scores that indicate the estimated relevance of each topic.
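The shape of the output described here, a rank-ordered list of every topic with a relevance score, can be illustrated with a much simpler model than the HMM the text refers to. The sketch below swaps in a per-topic unigram scorer with add-one smoothing purely for illustration; `train`, `rank_topics`, and the scoring formula are assumptions and do not reproduce the patent's classifier.

```python
from collections import Counter
from math import log


def train(labeled_docs):
    """Estimate per-topic word counts from (words, topics) training pairs."""
    counts = {}
    for words, topics in labeled_docs:
        for topic in topics:
            counts.setdefault(topic, Counter()).update(words)
    return counts


def rank_topics(words, counts):
    """Return every known topic with a score, best first.

    The score is an average per-word log-likelihood with add-one smoothing.
    """
    vocab = {w for c in counts.values() for w in c}
    scores = []
    for topic, c in counts.items():
        total = sum(c.values()) + len(vocab) + 1  # +1 reserves mass for unseen words
        ll = sum(log((c[w] + 1) / total) for w in words)
        scores.append((topic, ll / max(len(words), 1)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Unlike the model described above, this stand-in scores each topic independently and does not let a word divide its contribution among several topics.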
- Automated topic classification systems are known in the art. See, for example, Makhoul et al., “Speech and Language Technologies for Audio Indexing and Retrieval,” Proceedings of the IEEE, vol. 88, no. 8, August 2000.
- Topic classification component 323 can be constructed to generate topics based on unsupervised topic discovery.
- Thematic decision component 310 uses the outputs of transcription component 320, speaker boundary detection component 321, linguistic detection component 322, and topic classification component 323 to generate indications of thematic segments in the input document.
- The thematic segments generated by thematic decision component 310 may include multiple overlapping thematic segments for a particular portion of a document. Additionally, thematic decision component 310 may label the thematic segments using a hierarchical labeling scheme such that a specific thematic segment (e.g., a thematic segment labeled “hurricane”) is organized as a subset of a more general thematic segment (e.g., the thematic segment labeled “weather”).
- FIG. 4 is a diagram conceptually illustrating exemplary thematic segments for a document.
- Document 401 is conceptually illustrated as a series of lines that are assumed to correspond to text. The text includes linguistic cues such as periods 402 and commas 403.
- Speaker boundaries, topics, and non-visible linguistic information may also be associated with document 401.
- Thematic segments in FIG. 4 are illustrated by the bracketed segments 410-412. As shown, thematic segments 410 and 412 overlap one another. In general, thematic segments do not necessarily have to sequentially follow one another.
- Thematic segment 412 may be hierarchically related to thematic segment 410 as a subset of thematic segment 410, or thematic segment 412 may be an independent and concurrent thematic segment.
- Thematic decision component 310 honors the linguistic boundary information, treating the units it delimits as the basic constituents of the document.
- Thematic segments are formed as one or more of the sequential constituents (e.g., one or more sentences) determined by linguistic detection component 322 .
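A thematic segment built from whole sentences, possibly overlapping another segment or nested under a more general one, might be represented as in the sketch below. The `ThematicSegment` class and its field names are hypothetical; the only ideas taken from the text are that segments span one or more sequential constituents, may overlap, and may be hierarchically related.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThematicSegment:
    label: str
    first: int                      # index of the first sentence in the segment
    last: int                       # index of the last sentence (inclusive)
    parent: Optional["ThematicSegment"] = None  # more general segment, if any

    def overlaps(self, other):
        """True when the two segments share at least one sentence."""
        return self.first <= other.last and other.first <= self.last

    def text(self, sentences):
        """Join the sentences covered by this segment."""
        return " ".join(sentences[self.first:self.last + 1])
```

Indexing by sentence rather than by word offset means a segment boundary can never fall inside a sentence, which is the constraint the passage above places on the decision component.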
- Speech has a range of properties that make it very different from plain text.
- Thematic segmentation on speech-transcribed text gains from the fact that the problem now has access to the original signal from the speaker in addition to the textual content of what was spoken.
- Nuances in speech from the speaker are frequently very relevant indicators of changes in content as well as intent by the speaker, both of which can be used to effectively model the shift in themes in an episode.
- Prosodic features such as pause, pitch, energy, and speaking rate can be used in statistical models for detecting changes in the speech that correspond to a change in the theme of the content.
- Latent Semantic Analysis (LSA) uses singular value decomposition to map the high-dimensional word-document count matrix to a lower-dimensional latent ‘semantic’ space wherein terms and documents that are closely associated are placed near one another.
- LSA has the additional property that it can reduce the dimensionality of the linguistic feature space (typically on the order of 60,000 terms for conventional large-vocabulary speech recognition systems) to a much more manageable size, and can do so intelligently, such that the inherent similarities between the terms in the space are not only preserved but also collated for better modeling.
- Additional linguistic features, such as Minimum Description Length (MDL) phrases and named-entity phrases, can be added to the linguistic sub-space, relying on the LSA technique to connect the terms and the phrases effectively.
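The LSA mapping can be written compactly in its standard textbook form. The symbols below are assumptions introduced for illustration and do not appear in the source: X is the t-by-n term-by-document count matrix, k is the chosen number of latent dimensions, and x is the term-count vector of a new block of text.

```latex
% Rank-k truncated singular value decomposition of the count matrix
X \;\approx\; X_k \;=\; U_k \,\Sigma_k\, V_k^{\top},
\qquad U_k \in \mathbb{R}^{t \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}

% Folding a new block's count vector x into the k-dimensional latent space
\hat{x} \;=\; \Sigma_k^{-1}\, U_k^{\top}\, x
```

With t on the order of 60,000 terms and k typically far smaller, the folded-in vector is the compact representation that later stages can compare or feed to a model.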
- Segmentation techniques can compare the distance between two blocks of text and select segmentation points based on the similarity values between pairs of adjacent blocks.
- M. A. Hearst, “Multi-paragraph Segmentation of Expository Text,” Proceedings of the Association for Computational Linguistics, 1994, uses a sliding window and computes similarities between adjacent blocks based on their term frequency vectors.
- Sliding windows of text can be used with a similarity measure based on the persistence of statistical-model-based hypothesized topics between pairs of adjacent blocks of windowed text.
- The smallest unit for the segmentation process is an elementary block. Sentences can be used as the elementary blocks for defining the segmentation candidates.
- The text can be broken into blocks, i.e., sequences of consecutive elementary blocks, where each block includes some number of elementary blocks.
- These blocks are variable-sized, non-overlapping, and generally do not cross segment boundaries. However, in the documents to be segmented, these blocks may be overlapping, as in the use of a sliding window.
- The positions between every pair of adjacent blocks make up the segmentation candidates.
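The block-based scheme described above, with sentences as elementary blocks and a candidate at each gap between them, can be sketched as follows. This is a simplified illustration using term-frequency cosine similarity over a sliding window; `block_size`, `threshold`, the whitespace tokenization, and the local-minimum rule are assumptions rather than details from the source.

```python
from collections import Counter
from math import sqrt


def _cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(n * b[t] for t, n in a.items())
    norm = sqrt(sum(n * n for n in a.values())) * sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


def candidate_scores(sentences, block_size=2):
    """Score each gap between sentences by the similarity of the blocks around it.

    Returns a list of (gap_index, similarity); gap i lies between
    sentences[i - 1] and sentences[i].
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block_size):gap], Counter())
        right = sum(bags[gap:gap + block_size], Counter())
        scores.append((gap, _cosine(left, right)))
    return scores


def pick_boundaries(scores, threshold=0.1):
    """Keep gaps whose similarity is a local minimum below the threshold."""
    picked = []
    for i, (gap, sim) in enumerate(scores):
        prev_sim = scores[i - 1][1] if i > 0 else 1.0
        next_sim = scores[i + 1][1] if i + 1 < len(scores) else 1.0
        if sim < threshold and sim <= prev_sim and sim <= next_sim:
            picked.append(gap)
    return picked
```

Because the blocks on either side of each gap are rebuilt from the surrounding sentences, neighbouring candidates share sentences, which corresponds to the overlapping, sliding-window case mentioned above.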
- The models can operate on the pure acoustic features or the pure linguistic features, as well as the combined acoustic/linguistic features. For example, statistical learn-by-example techniques that can be trained on roughly annotated training data for every domain and language may be used.
- Neural Networks can additionally be effective in approximating the complex, non-linear relationships that exist between features of various types, such as continuous, discrete, and in some cases even Boolean, as well as the change in structure in the underlying speech.
- Neural Networks can be used to model the acoustic features and produce an estimate of the similarity or dissimilarity between adjacent blocks on either side of a segmentation candidate.
- With LSA, high-dimensional linguistic features can be mapped onto a compact, low-dimensional sub-space. The mapped features can be used with the prosodic information in a combined neural network to detect changes in themes.
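As a rough picture of how prosodic and latent-space features might be combined at one segmentation candidate, the sketch below concatenates before/after differences and passes them through a single-hidden-layer network. Everything here is an assumption for illustration: the function names, the use of feature differences, the network shape, and the weights, which would normally be learned from annotated data rather than supplied by hand.

```python
from math import exp, tanh


def candidate_features(prosody_before, prosody_after, lsa_before, lsa_after):
    """Concatenate prosodic deltas with latent-space deltas for one candidate."""
    prosodic = [after - before for before, after in zip(prosody_before, prosody_after)]
    linguistic = [after - before for before, after in zip(lsa_before, lsa_after)]
    return prosodic + linguistic


def change_score(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer network; returns a value in (0, 1)."""
    hidden = [
        tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    z = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return 1.0 / (1.0 + exp(-z))
```

A score near 1 would be read as evidence of a thematic change at the candidate, and a score near 0 as evidence of continuity.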
- Probabilistic Latent Semantic Analysis (PLSA) can be used to model high-dimensional linguistic features.
- The PLSA model can be used quite effectively with the combined feature space since it is highly adept at finding the subtle cross-correlations between features that expose the inter-relations between the terms and underlying themes.
- PLSA is a statistical latent class model that may provide better results than LSA for term matching in retrieval applications.
- The conditional probability between documents d and feature terms f is modeled through a latent variable z, which can be loosely thought of as a class or topic.
- A PLSA model is parameterized by P(f|z) and P(z|d).
- The latent variable z can be thought of as an unknown variable in the context of Expectation-Maximization (EM) algorithms, and thus the parameters of the PLSA model can be trained from a corpus of documents using the EM algorithm.
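For reference, the usual textbook form of the PLSA mixture and its EM updates is given below, written with the symbols d, f, and z used above. The count n(d, f), the number of times feature term f occurs in document d, is a symbol assumed here for illustration; the source does not spell out these equations.

```latex
% Mixture decomposition through the latent variable z
P(f \mid d) \;=\; \sum_{z} P(f \mid z)\, P(z \mid d)

% E-step: posterior of the latent class for each (d, f) pair
P(z \mid d, f) \;=\;
  \frac{P(f \mid z)\, P(z \mid d)}
       {\sum_{z'} P(f \mid z')\, P(z' \mid d)}

% M-step: re-estimate both parameter sets from expected counts
P(f \mid z) \;\propto\; \sum_{d} n(d, f)\, P(z \mid d, f),
\qquad
P(z \mid d) \;\propto\; \sum_{f} n(d, f)\, P(z \mid d, f)
```

The two steps are alternated until the likelihood of the training corpus stops improving.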
- The use of PLSA allows for a better representation of sparse information in a text block, such as a sentence or a sequence of sentences.
- A wide variety of similarity measures, such as the cosine distance, the Bhattacharyya distance, and the Kullback-Leibler divergence, can be used with the scores generated from the PLSA model to determine the segmentation boundaries.
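The three measures named above can be computed on a pair of topic distributions, such as the P(z|d) vectors of two adjacent blocks, roughly as follows. The function names and the `eps` guard against zero probabilities are illustrative assumptions; the formulas are the standard definitions.

```python
from math import log, sqrt


def cosine_similarity(p, q):
    """Cosine of the angle between two score vectors (1.0 when identical)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient; 0 for identical distributions."""
    coefficient = sum(sqrt(a * b) for a, b in zip(p, q))
    return -log(coefficient) if coefficient > 0 else float("inf")


def kl_divergence(p, q, eps=1e-12):
    """D(p || q); eps guards against zero entries in q (an illustrative choice)."""
    return sum(a * log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)
```

Note that the Kullback-Leibler divergence is not symmetric, so the order in which the two blocks are passed matters.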
- A thematic segmentation tool demarcates segments of a document that have similar thematic content.
- The thematic segmentation tool bases the thematic segments on a transcription of audio data augmented with additional information relating to linguistic and speaker descriptive properties of the audio.
- The thematic segments generated by the tool may be hierarchical and may include multiple different thematic segments for a portion of text.
- The software may more generally be implemented as any type of logic.
- This logic may include hardware, such as an application-specific integrated circuit or a field-programmable gate array, software, or a combination of hardware and software.
Abstract
Description
- This application claims priority under 35 U.S.C. § 119 based on U.S. Provisional Application Nos. 60/394,064 and 60/394,082 filed Jul. 3, 2002 and Provisional Application No. 60/419,214 filed Oct. 17, 2002, the disclosures of which are incorporated herein by reference.
- The U.S. Government may have a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contract No. N66001-00-C-8008 (Defense Advanced Research Projects Agency (DARPA)).
- A. Field of the Invention
- The present invention relates generally to speech processing and, more particularly, to the segmentation of speech based on thematic classification.
- B. Description of Related Art
- Speech has not traditionally been valued as an archival information source. As effective as the spoken word is for communicating, archiving spoken segments in a useful and easily retrievable manner has long been a difficult proposition. Although the act of recording audio is not difficult, automatically transcribing and indexing speech in an intelligent and useful manner can be difficult.
- Speech is typically received into a speech recognition system as a continuous stream of words without breaks. In order to effectively use the speech in information management systems (e.g., information retrieval, natural language processing, real-time alerting), the speech recognition system initially processes the speech to generate a formatted version of the speech. For example, the speech may be transcribed and linguistic information, such as sentence structures, may be associated with the transcription.
- In addition to segmenting speech segments based on linguistic information, it may be desirable to also segment the speech based on thematic structure. For example, when archiving a continuous broadcast of a radio news program, it may be desirable to know the portions of the news program that discussed the weather and the portions that were about foreign affairs. The portion of the broadcast that was directed to foreign affairs may be further classified into European and Middle East news segments. Users can, thus, later browse or listen to an archive copy of the news broadcast based on topics of interest.
- One technique for segmenting a continuous stream of speech based on thematic elements involves making thematic boundary decisions based on a word count within a moving window of text. FIG. 1 is a block diagram illustrating this technique in additional detail. Initial input audio information is transcribed by transcription component 101. The transcription may be performed manually or automatically. Transcription component 101 outputs a continuous stream of text. Windowing component 102 segments the text into chunks of text of a predetermined length (e.g., 200 words) and generates a vector of the words that occur within the window. Words that occur more frequently within the window are weighted more heavily in the vector. Boundary decision component 103 detects changes in thematic segments based on the word count weighted vectors.
- A problem with this technique is that it can produce erroneous or non-optimal thematic segments. Accordingly, there is a need in the art to improve thematic segmentation of speech.
- Systems and methods consistent with the principles of this invention provide a thematic segmentation tool that acts on text augmented with additional information extracted from the spoken version of the text. The thematic segmentation tool may generate overlapping thematic segments for a single portion of text.
- One aspect of the invention is directed to a thematic segmentation tool that includes a transcription component configured to receive spoken audio information and to convert the spoken audio information into a document of text corresponding to the audio information. A linguistic detection component generates linguistic information corresponding to the text produced by the transcription component. A topic classification component generates topics relevant to the document. A thematic decision component generates indications of thematic segments based on the linguistic information, the document, and the topics.
- Another aspect of the invention is directed to a method for determining thematically coherent segments within a document. The method comprises receiving a document having associated linguistic information that describes linguistic features of the document and generating indications of thematically coherent segments within the document that occur at the linguistic features in the document.
- Yet another aspect of the invention is directed to a computing device comprising a processor and a computer memory coupled to the processor. The computer memory contains program instructions that when executed by the processor associate linguistic information with a document. The linguistic information demarcates linguistic breaks within the document. The program instructions additionally generate, based on the linguistic breaks within the document, indications of thematically coherent segments, and output the thematically coherent segments associated with labels describing thematic content of the thematically coherent segments.
- The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and, together with the description, explain the invention. In the drawings,
- FIG. 1 is a block diagram illustrating thematic segmentation using a conventional technique based on a word count within a moving window of text;
- FIG. 2 is a diagram illustrating an exemplary system in which concepts consistent with the invention may be implemented;
- FIG. 3 is a block diagram illustrating software elements in a thematic segmentation tool consistent with the invention; and
- FIG. 4 is a diagram illustrating exemplary thematic segments for a document.
- The following detailed description of the invention refers to the accompanying drawings. The same reference numbers may be used in different drawings to identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents of the claim limitations.
- Thematic segmentation of spoken audio is performed by a thematic segmentation tool on a transcribed version of the audio supplemented with additional information that further describes the audio. In one implementation, the transcription is supplemented with visible linguistic structural information, such as sentence demarcations, and non-visible linguistic structural information, such as phrasal boundaries, topic lists, and speaker boundaries. The result of the thematic segmentation includes hierarchical and potentially overlapping thematic segments.
- Thematic segmentation, as described herein, may be performed on one or more processing devices or networks of processing devices. FIG. 2 is a diagram illustrating an exemplary system 200 in which concepts consistent with the invention may be implemented. System 200 includes a computing device 201 that has a computer-readable medium 209, such as random access memory, coupled to a processor 208. Computing device 201 may also include a number of additional external or internal devices, such as, without limitation, a mouse, a CD-ROM, a keyboard, and a display.
- In general, computing device 201 may be any type of computing platform, and may be connected to a network 202. Computing device 201 is exemplary only. Concepts consistent with the present invention can be implemented on any computing device, whether or not connected to a network.
- Processor 208 can be any of a number of well-known computer processors, such as processors from Intel Corporation, of Santa Clara, Calif. Processor 208 executes program instructions stored in memory 209.
- Memory 209 contains an application program 215. In particular, application program 215 may implement the thematic segmentation tool described below. Thematic segmentation tool 215 may receive input data, such as linguistically segmented text, from other application programs executing in computing device 201 or other computing devices, such as those connected to computing device 201 through network 202. Thematic segmentation tool 215 processes the input data to generate indications of thematic segments.
- FIG. 3 is a block diagram conceptually illustrating software elements of thematic segmentation tool 215. Decisions relating to thematic segmentation are made by thematic decision component 310. Thematic decision component 310 implements a statistical framework that generates thematic segments for a “document.” The term document, as used herein, refers to textual information and associated descriptive information relating to the document (e.g., speaker boundaries, phrasal boundaries, etc.). Although such a document may be generated from data from audio sources, it could be generated in other manners, such as from data from video or textual sources.
- Thematic decision component 310 receives a number of inputs that describe the document. Specifically, as shown in FIG. 3, thematic decision component 310 receives a text transcript of the document from transcription component 320, speaker boundary information from speaker boundary detection component 321, linguistic information from linguistic detection component 322, and topic classifications from topic classification component 323. Although transcription component 320, speaker boundary detection component 321, linguistic detection component 322, and topic classification component 323 are illustrated as part of thematic segmentation tool 215, in other implementations, these components may be considered as providing input information to a thematic segmentation tool implemented by thematic decision component 310.
- Transcription component 320 may be an automated or manual transcription tool that converts the audio input stream it receives into text. Transcription component 320 may use conventional techniques to perform the conversion.
- Speaker boundary detection component 321 locates boundaries between speakers in the audio input stream. Knowledge of speaker changes in an audio stream may be a useful indicator of potential changes in thematic content. Automated speaker boundary detection techniques are known in the art. For example, speaker boundary detection is described in Liu et al., “Fast Speaker Change Detection for Broadcast News Transcription and Indexing,” Eurospeech '99, Budapest, Hungary, September 1999, pp. 1031-1034; and Chen et al., “Speaker, Environment, and Channel Change Detection and Clustering via the Bayesian Information Criterion,” Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, Va., 1998. Alternatively, instead of automatically detecting speaker boundaries, the speaker boundaries may be manually inserted into the document.
- Linguistic detection component 322 receives the text generated by transcription component 320 and the audio input stream. Automated transcription techniques generally produce a simple stream of words without linguistic information (e.g., periods, exclamation marks, quotation marks) that would ideally be associated with the text. Linguistic detection component 322 annotates the text from transcription component 320 to include this linguistic information. In addition to visible linguistic information, such as periods, linguistic detection component 322 may associate non-visible linguistic information, such as phrasal boundaries, with the received text.
- Techniques for generating both visible and non-visible linguistic information are described in detail in U.S. patent application Ser. No. ______ (Attorney Docket Number 02-4024), titled “Linguistic Segmentation of Speech,” filed ______, the contents of which are hereby incorporated by reference.
-
Topic classification component 323 generates topics that are relevant to the document, selected from a predefined topic vocabulary. For example, a document may include any combination of words from a 60,000-word corpus. Topic classification component 323 examines the document and outputs one or more predefined topics, where the number of possible topics is smaller than the 60,000-word corpus (e.g., a 5,000-word topic vocabulary).
Topic classification component 323, in one implementation, uses a Bayesian framework to generate topics for a document. More particularly, topic classification component 323 may be implemented as a probabilistic Hidden Markov Model (HMM) whose parameters are estimated from training samples of documents with given topic labels. This model allows each word in a document to contribute different amounts to each of the topics assigned to the document. The output of topic classification component 323 may be a rank-ordered list of all possible topics and corresponding scores that indicate the estimated relevance of each topic. In general, automated topic classification systems are known in the art. See, for example, Makhoul et al., “Speech and Language Technologies for Audio Indexing and Retrieval,” Proceedings of the IEEE, vol. 88, no. 8, August 2000.
- In another possible implementation, instead of estimating parameters based on training samples that have manually generated topics, topic classification component 323 can be constructed to generate topics based on unsupervised topic discovery.
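To make the idea of a rank-ordered topic list with relevance scores concrete, the following sketch ranks topics by a simple Bayesian log-score. It is a deliberately simplified stand-in, not the HMM described above: each word contributes a log-likelihood ratio against a background model, and all names (`rank_topics`, `topic_word_probs`, `background_probs`) and the toy probability tables are assumptions made for illustration.

```python
import math

def rank_topics(words, topic_word_probs, topic_priors, background_probs,
                floor=1e-6):
    """Return [(topic, score)] sorted by descending score.

    Score = log prior + sum of per-word log-likelihood ratios against a
    background (general-language) model, so words that are equally likely
    under every topic contribute nothing to the ranking.
    """
    ranked = []
    for topic, prior in topic_priors.items():
        word_probs = topic_word_probs[topic]
        score = math.log(prior)
        for w in words:
            p_topic = word_probs.get(w, floor)    # unseen words get a floor
            p_back = background_probs.get(w, floor)
            score += math.log(p_topic / p_back)
        ranked.append((topic, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Toy, hand-made probability tables (hypothetical values).
topic_word_probs = {
    "weather": {"hurricane": 0.05, "storm": 0.05, "the": 0.1},
    "sports": {"hurricane": 0.001, "storm": 0.001, "the": 0.1},
}
topic_priors = {"weather": 0.5, "sports": 0.5}
background_probs = {"hurricane": 0.01, "storm": 0.01, "the": 0.1}

ranking = rank_topics(["hurricane", "storm", "the"],
                      topic_word_probs, topic_priors, background_probs)
```

In this toy setup the function word "the" has the same probability under both topics and the background, so only the content words separate the topics.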
Thematic decision component 310 uses the outputs of transcription component 320, speaker boundary detection component 321, linguistic detection component 322, and topic classification component 323 to generate indications of thematic segments in the input document.
- Consistent with an aspect of the present invention, the thematic segments generated by thematic decision component 310 may include multiple overlapping thematic segments for a particular portion of a document. Additionally, thematic decision component 310 may label the thematic segments using a hierarchical labeling scheme such that a specific thematic segment (e.g., a thematic segment labeled “hurricane”) is organized as a subset of a more general thematic segment (e.g., the thematic segment labeled “weather”).
- FIG. 4 is a diagram conceptually illustrating exemplary thematic segments for a document.
Document 401 is conceptually illustrated as a series of lines that are assumed to correspond to text. Associated with the text in document 401 are linguistic cues such as periods 402 and commas 403. Although not shown, speaker boundaries, topics, and non-visible linguistic information may also be associated with document 401.
- Thematic segments in FIG. 4 are illustrated by the bracketed segments 410-412. As shown, thematic segments 410 and 412 overlap one another. In general, thematic segments do not necessarily have to sequentially follow one another. Thematic segment 412 may be hierarchically related to thematic segment 410 as a subset of thematic segment 410, or thematic segment 412 may be an independent and concurrent thematic segment.
- In general, when generating thematic segments, such as thematic segments 410-412, thematic decision component 310 honors the linguistic boundary information as basic constituents of the document. Thematic segments are formed as one or more of the sequential constituents (e.g., one or more sentences) determined by linguistic detection component 322.
- A number of different techniques can be used to implement the statistical framework of thematic decision component 310. Some of these techniques, and the speech features on which they are based, will now be described in more detail.
- Acoustic Features
- Speech has a range of properties that make it very different from plain text. Thematic segmentation of speech-transcribed text benefits from access to the original signal from the speaker in addition to the textual content of what was spoken. Nuances in the speaker's delivery are frequently relevant indicators of changes in content as well as in the speaker's intent, both of which can be used to model shifts in theme within an episode. Prosodic features, such as pause, pitch, energy, and speaking rate, can be used in statistical models for detecting changes in the speech that correspond to a change in the theme of the content.
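As a small illustration of how two of the prosodic features named above might be derived, the sketch below computes the pause length and the change in speaking rate at each sentence boundary. The input format (sentences as lists of `(word, start_sec, end_sec)` tuples) and the function name `boundary_features` are assumptions; pitch and energy are omitted because they require access to the audio signal itself.

```python
def boundary_features(sentences):
    """Compute simple prosodic features at each sentence boundary.

    sentences: list of sentences, each a list of (word, start_sec, end_sec)
    tuples in time order (a hypothetical word-timing format).
    Returns one feature dict per boundary between consecutive sentences.
    """
    def rate(sent):
        # Speaking rate in words per second over the sentence's duration.
        duration = sent[-1][2] - sent[0][1]
        return len(sent) / duration if duration > 0 else 0.0

    feats = []
    for prev, nxt in zip(sentences, sentences[1:]):
        feats.append({
            # Silence between the last word of one sentence and the
            # first word of the next.
            "pause": max(0.0, nxt[0][1] - prev[-1][2]),
            "rate_change": rate(nxt) - rate(prev),
        })
    return feats

timed = [
    [("good", 0.0, 0.5), ("evening", 0.5, 1.0)],
    [("turning", 2.5, 3.0), ("to", 3.0, 3.5), ("weather", 3.5, 4.0)],
]
feats = boundary_features(timed)
```

Feature dictionaries of this kind could then be fed, alongside linguistic features, into whichever statistical model scores the segmentation candidates.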
- Linguistic Features
- Word repetition can be used alone or in conjunction with other features such as word frequency and synonyms. In most cases, synonyms are identified using predefined word tables or thesauri, both of which are hard to generate and to generalize. Latent Semantic Analysis (LSA) is a known, robust technique for matching words that are synonyms and for better handling the multiple meanings of a term. An example of the use of LSA is given in T. Brants, “Topic-Based Document Segmentation with Probabilistic Latent Semantic Analysis,” Proceedings of the Conference on Information and Knowledge Management, Nov. 4-9, 2002, McLean, Va. LSA uses singular value decomposition to map the high-dimensional word-document count matrix to a lower-dimensional latent “semantic” space in which terms and documents that are closely associated are placed near one another. LSA also reduces the dimensionality of the linguistic feature space (typically on the order of 60,000 terms for conventional large-vocabulary speech recognition systems) to a much more manageable size, and does so in a way that not only preserves the inherent similarities between terms in the space but also collates them for better modeling. Additional linguistic features, such as Minimum Description Length (MDL) phrases and named-entity phrases, can be added to the linguistic sub-space, relying on LSA to connect the terms and the phrases effectively.
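The dimensionality reduction described above can be written compactly in the standard formulation of LSA. The symbols are assumptions introduced here for illustration: X is the m-by-n word-document count matrix (m terms, n documents), k is the chosen number of latent dimensions, and q is the term-count vector of a new text block to be projected into the latent space.

```latex
% Rank-k truncated singular value decomposition of the count matrix
\[
X \;\approx\; X_k \;=\; U_k \,\Sigma_k\, V_k^{\top},
\qquad
U_k \in \mathbb{R}^{m \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k},\quad
V_k \in \mathbb{R}^{n \times k},\quad
k \ll m .
\]

% "Folding in" a new block with term-count vector q
\[
\hat{q} \;=\; \Sigma_k^{-1}\, U_k^{\top}\, q \;\in\; \mathbb{R}^{k}
\]

% Similarity between two folded-in blocks
\[
\operatorname{sim}(\hat{q}_1, \hat{q}_2)
\;=\;
\frac{\hat{q}_1^{\top}\hat{q}_2}{\lVert \hat{q}_1 \rVert \,\lVert \hat{q}_2 \rVert}
\]
```

With m on the order of 60,000 terms and k chosen as a few hundred, each block is represented by a short dense vector rather than a sparse vocabulary-sized count vector.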
- Segmentation Approaches
- Segmentation techniques can measure the similarity between pairs of adjacent blocks of text and select segmentation points based on those similarity values. For example, M. A. Hearst, “Multi-paragraph Segmentation of Expository Text,” Proceedings of the Association for Computational Linguistics, 1994, uses a sliding window and computes similarities between adjacent blocks based on their term frequency vectors. For thematic segmentation in speech, sliding windows of text can be used with a similarity measure based on the persistence of statistical-model-based hypothesized topics between pairs of adjacent blocks of windowed text. The smallest unit for the segmentation process is an elementary block. Sentences can be used as the elementary blocks for defining the segmentation candidates. The text can be broken into blocks, i.e., sequences of consecutive elementary blocks. In the training documents, these blocks are variable-sized and non-overlapping, and generally do not cross segment boundaries. However, in the documents to be segmented, these blocks may overlap, as when a sliding window is used. The set of positions between every pair of adjacent blocks makes up the segmentation candidates.
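The sliding-window comparison of adjacent blocks can be sketched as follows. This is a minimal illustration in the spirit of the term-frequency approach attributed to Hearst above, not a reproduction of that algorithm or of the topic-persistence measure: sentences serve as elementary blocks, and the function names, the `window` size, and the `threshold` value are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def gap_similarities(sentences, window=2):
    """Similarity across each gap between sentence i-1 and sentence i.

    sentences: list of token lists (the elementary blocks).
    Returns a list of length len(sentences) - 1.
    """
    sims = []
    for gap in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, gap - window):gap] for w in s)
        right = Counter(w for s in sentences[gap:gap + window] for w in s)
        sims.append(cosine(left, right))
    return sims

def pick_boundaries(sims, threshold=0.2):
    """Gaps that are local minima of similarity and fall below `threshold`."""
    bounds = []
    for i, s in enumerate(sims):
        left = sims[i - 1] if i > 0 else 1.0
        right = sims[i + 1] if i + 1 < len(sims) else 1.0
        if s < threshold and s <= left and s <= right:
            bounds.append(i + 1)  # boundary falls just before sentence i + 1
    return bounds

sentences = [
    ["storm", "rain"], ["rain", "wind"], ["storm", "wind"],
    ["goal", "team"], ["team", "score"], ["goal", "score"],
]
boundaries = pick_boundaries(gap_similarities(sentences))
```

Here the windows on either side of the gap before the fourth sentence share no terms, so that gap is the only candidate selected. For speech, the term-frequency vectors could be replaced by per-block topic scores without changing the overall structure.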
- Mathematical Models
- There are a number of mathematical models that can be used to determine the relationship between varied features and shifts in thematic content. The models can operate on the pure acoustic features or the pure linguistic features, as well as the combined acoustic/linguistic features. For example, statistical learn-by-example techniques that can be trained on roughly annotated training data for each domain and language may be used.
- Neural Networks
- Neural networks can be effective in approximating the complex, non-linear relationships between features of various types (continuous, discrete, and in some cases even Boolean) and changes in the structure of the underlying speech. Neural networks can be used to model the acoustic features and produce an estimate of the similarity or dissimilarity between adjacent blocks on either side of a segmentation candidate. With the help of LSA, high-dimensional linguistic features can be mapped onto a low-dimensional, compact sub-space. The mapped features can be used with the prosodic information in a combined neural network to detect changes in themes.
- Probabilistic Latent Semantic Analysis (PLSA)
- Probabilistic Latent Semantic Analysis can be used to model high-dimensional linguistic features. The PLSA model can be used effectively with the combined feature space because it is adept at finding the subtle cross-correlations between features that expose the inter-relations between the terms and underlying themes. PLSA is a statistical latent class model that may provide better results than LSA for term matching in retrieval applications. In PLSA, the conditional probability between documents d and feature terms f is modeled through a latent variable z, which can be loosely thought of as a class or topic. A PLSA model is parameterized by P(f|z) and P(z|d); a word may belong to more than one class, and a document may discuss more than one “topic.” The latent variable z can be treated as an unobserved variable in the Expectation-Maximization (EM) framework, and thus the parameters of the PLSA model can be trained from a corpus of documents using the EM algorithm. The use of PLSA allows for a better representation of sparse information in a text block, such as a sentence or a sequence of sentences. A wide variety of similarity measures, such as the cosine distance, the Bhattacharyya distance, and the Kullback-Leibler divergence, can be used with the scores generated from the PLSA model to determine the segmentation boundaries.
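The three measures named above can be sketched for a pair of adjacent blocks, each represented by a discrete distribution over latent classes such as P(z|block). Training the PLSA model itself is not shown; the function names and the `eps` smoothing constant are assumptions, and the inputs are assumed to be equal-length lists of non-negative values that sum to one.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two score vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def bhattacharyya_distance(p, q):
    """-log of the Bhattacharyya coefficient; 0 for identical distributions."""
    coeff = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(coeff) if coeff > 0 else math.inf

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q); q is floored at eps to stay finite."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)

same_a, same_b = [0.5, 0.5], [0.5, 0.5]
disjoint_a, disjoint_b = [1.0, 0.0], [0.0, 1.0]
```

A low cosine similarity, or a high Bhattacharyya distance or KL divergence, between the blocks on either side of a segmentation candidate would count as evidence for a thematic boundary. Note that the KL divergence is asymmetric, so a symmetrized variant may be preferable in practice.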
- As described above, a thematic segmentation tool demarcates segments for a document that have similar thematic content. The thematic segmentation tool bases the thematic segments on a transcription of audio data augmented with additional information relating to linguistic and speaker descriptive properties of the audio. The thematic segments generated by the tool may be hierarchical and may include multiple different thematic segments for a portion of text.
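One way to represent thematic segments that may overlap and may be organized hierarchically is sketched below. The class name `ThematicSegment`, its fields, and the use of sentence indices as segment extents are assumptions for illustration; the description does not prescribe a particular data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ThematicSegment:
    label: str
    start: int  # index of the first sentence in the segment (inclusive)
    end: int    # index of the last sentence in the segment (inclusive)
    # The parent link is excluded from repr/compare to avoid infinite recursion.
    parent: Optional["ThematicSegment"] = field(
        default=None, repr=False, compare=False)
    children: List["ThematicSegment"] = field(default_factory=list)

    def add_child(self, child):
        """Attach a more specific segment that lies within this one."""
        if not (self.start <= child.start and child.end <= self.end):
            raise ValueError("child must lie within the parent's span")
        child.parent = self
        self.children.append(child)
        return child

    def overlaps(self, other):
        """True if the two segments share at least one sentence."""
        return self.start <= other.end and other.start <= self.end

    def path(self):
        """Hierarchical label, from most general to most specific."""
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return "/".join(reversed(labels))

weather = ThematicSegment("weather", 0, 9)
hurricane = weather.add_child(ThematicSegment("hurricane", 3, 6))
politics = ThematicSegment("politics", 5, 12)  # independent, concurrent segment
```

Because segments are built from whole sentences, the start and end indices always fall on the linguistic boundaries, and independent segments such as `politics` are free to overlap the `weather` hierarchy.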
- The foregoing description of preferred embodiments of the invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention.
- Certain portions of the invention have been described as software that performs one or more functions. The software may more generally be implemented as any type of logic. This logic may include hardware, such as an application-specific integrated circuit or a field-programmable gate array, software, or a combination of hardware and software.
- No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used.
- The scope of the invention is defined by the claims and their equivalents.
Claims (35)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/610,679 US20040024598A1 (en) | 2002-07-03 | 2003-07-02 | Thematic segmentation of speech |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US39406402P | 2002-07-03 | 2002-07-03 | |
US39408202P | 2002-07-03 | 2002-07-03 | |
US41921402P | 2002-10-17 | 2002-10-17 | |
US10/610,679 US20040024598A1 (en) | 2002-07-03 | 2003-07-02 | Thematic segmentation of speech |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040024598A1 (en) | 2004-02-05 |
Family ID=30003990
Family Applications (11)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/610,699 Abandoned US20040117188A1 (en) | 2002-07-03 | 2003-07-02 | Speech based personal information manager |
US10/610,697 Expired - Fee Related US7290207B2 (en) | 2002-07-03 | 2003-07-02 | Systems and methods for providing multimedia information management |
US10/610,684 Abandoned US20040024582A1 (en) | 2002-07-03 | 2003-07-02 | Systems and methods for aiding human translation |
US10/610,679 Abandoned US20040024598A1 (en) | 2002-07-03 | 2003-07-02 | Thematic segmentation of speech |
US10/610,696 Abandoned US20040024585A1 (en) | 2002-07-03 | 2003-07-02 | Linguistic segmentation of speech |
US10/610,532 Abandoned US20040006481A1 (en) | 2002-07-03 | 2003-07-02 | Fast transcription of speech |
US10/611,106 Active 2026-04-11 US7337115B2 (en) | 2002-07-03 | 2003-07-02 | Systems and methods for providing acoustic classification |
US10/610,533 Expired - Fee Related US7801838B2 (en) | 2002-07-03 | 2003-07-02 | Multimedia recognition system comprising a plurality of indexers configured to receive and analyze multimedia data based on training data and user augmentation relating to one or more of a plurality of generated documents |
US10/610,574 Abandoned US20040006748A1 (en) | 2002-07-03 | 2003-07-02 | Systems and methods for providing online event tracking |
US10/610,799 Abandoned US20040199495A1 (en) | 2002-07-03 | 2003-07-02 | Name browsing systems and methods |
US12/806,465 Expired - Fee Related US8001066B2 (en) | 2002-07-03 | 2010-08-13 | Systems and methods for improving recognition results via user-augmentation of a database |
Country Status (1)
Country | Link |
---|---|
US (11) | US20040117188A1 (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050246296A1 (en) * | 2004-04-29 | 2005-11-03 | Microsoft Corporation | Method and system for calculating importance of a block within a display page |
US20060112128A1 (en) * | 2004-11-23 | 2006-05-25 | Palo Alto Research Center Incorporated | Methods, apparatus, and program products for performing incremental probabilitstic latent semantic analysis |
US20080013830A1 (en) * | 2006-07-11 | 2008-01-17 | Data Domain, Inc. | Locality-based stream segmentation for data deduplication |
US7487094B1 (en) * | 2003-06-20 | 2009-02-03 | Utopy, Inc. | System and method of call classification with context modeling based on composite words |
US20090306797A1 (en) * | 2005-09-08 | 2009-12-10 | Stephen Cox | Music analysis |
US20100198598A1 (en) * | 2009-02-05 | 2010-08-05 | Nuance Communications, Inc. | Speaker Recognition in a Speech Recognition System |
US20110173210A1 (en) * | 2010-01-08 | 2011-07-14 | Microsoft Corporation | Identifying a topic-relevant subject |
US20110246183A1 (en) * | 2008-12-15 | 2011-10-06 | Kentaro Nagatomo | Topic transition analysis system, method, and program |
US20120323575A1 (en) * | 2011-06-17 | 2012-12-20 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content |
US8819023B1 (en) * | 2011-12-22 | 2014-08-26 | Reputation.Com, Inc. | Thematic clustering |
US9538010B2 (en) | 2008-12-19 | 2017-01-03 | Genesys Telecommunications Laboratories, Inc. | Method and system for integrating an interaction management system with a business rules management system |
US9542936B2 (en) | 2012-12-29 | 2017-01-10 | Genesys Telecommunications Laboratories, Inc. | Fast out-of-vocabulary search in automatic speech recognition systems |
US9912816B2 (en) | 2012-11-29 | 2018-03-06 | Genesys Telecommunications Laboratories, Inc. | Workload distribution with resource awareness |
US9992336B2 (en) | 2009-07-13 | 2018-06-05 | Genesys Telecommunications Laboratories, Inc. | System for analyzing interactions and reporting analytic results to human operated and system interfaces in real time |
US10025773B2 (en) * | 2015-07-24 | 2018-07-17 | International Business Machines Corporation | System and method for natural language processing using synthetic text |
EP3583511A4 (en) * | 2017-02-20 | 2020-11-25 | Gong I.O Ltd. | Unsupervised automated topic detection, segmentation and labeling of conversations |
US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
US11276407B2 (en) | 2018-04-17 | 2022-03-15 | Gong.Io Ltd. | Metadata-based diarization of teleconferences |
US11568231B2 (en) * | 2017-12-08 | 2023-01-31 | Raytheon Bbn Technologies Corp. | Waypoint detection for a contact center analysis system |
Families Citing this family (212)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7349477B2 (en) * | 2002-07-10 | 2008-03-25 | Mitsubishi Electric Research Laboratories, Inc. | Audio-assisted video segmentation and summarization |
US20070225614A1 (en) * | 2004-05-26 | 2007-09-27 | Endothelix, Inc. | Method and apparatus for determining vascular health conditions |
US7574447B2 (en) * | 2003-04-08 | 2009-08-11 | United Parcel Service Of America, Inc. | Inbound package tracking systems and methods |
US20050010231A1 (en) * | 2003-06-20 | 2005-01-13 | Myers Thomas H. | Method and apparatus for strengthening the biomechanical properties of implants |
US7231396B2 (en) * | 2003-07-24 | 2007-06-12 | International Business Machines Corporation | Data abstraction layer for a database |
US8229744B2 (en) * | 2003-08-26 | 2012-07-24 | Nuance Communications, Inc. | Class detection scheme and time mediated averaging of class dependent models |
US20060212830A1 (en) * | 2003-09-09 | 2006-09-21 | Fogg Brian J | Graphical messaging system |
US8046814B1 (en) * | 2003-10-22 | 2011-10-25 | The Weather Channel, Inc. | Systems and methods for formulating and delivering video having perishable information |
GB2409087A (en) * | 2003-12-12 | 2005-06-15 | Ibm | Computer generated prompting |
US7496500B2 (en) * | 2004-03-01 | 2009-02-24 | Microsoft Corporation | Systems and methods that determine intent of data and respond to the data based on the intent |
US8014765B2 (en) * | 2004-03-19 | 2011-09-06 | Media Captioning Services | Real-time captioning framework for mobile devices |
US8266313B2 (en) * | 2004-03-19 | 2012-09-11 | Media Captioning Services, Inc. | Live media subscription framework for mobile devices |
US7421477B2 (en) * | 2004-03-19 | 2008-09-02 | Media Captioning Services | Real-time media captioning subscription framework for mobile devices |
US7844684B2 (en) * | 2004-03-19 | 2010-11-30 | Media Captioning Services, Inc. | Live media captioning subscription framework for mobile devices |
US20050209849A1 (en) * | 2004-03-22 | 2005-09-22 | Sony Corporation And Sony Electronics Inc. | System and method for automatically cataloguing data by utilizing speech recognition procedures |
WO2005122141A1 (en) * | 2004-06-09 | 2005-12-22 | Canon Kabushiki Kaisha | Effective audio segmentation and classification |
US8036893B2 (en) | 2004-07-22 | 2011-10-11 | Nuance Communications, Inc. | Method and system for identifying and correcting accent-induced speech recognition difficulties |
US9635429B2 (en) | 2004-07-30 | 2017-04-25 | Broadband Itv, Inc. | Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection |
US11259059B2 (en) | 2004-07-30 | 2022-02-22 | Broadband Itv, Inc. | System for addressing on-demand TV program content on TV services platform of a digital TV services provider |
US9584868B2 (en) | 2004-07-30 | 2017-02-28 | Broadband Itv, Inc. | Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection |
US9344765B2 (en) | 2004-07-30 | 2016-05-17 | Broadband Itv, Inc. | Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection |
US7590997B2 (en) | 2004-07-30 | 2009-09-15 | Broadband Itv, Inc. | System and method for managing, converting and displaying video content on a video-on-demand platform, including ads used for drill-down navigation and consumer-generated classified ads |
US7631336B2 (en) | 2004-07-30 | 2009-12-08 | Broadband Itv, Inc. | Method for converting, navigating and displaying video content uploaded from the internet to a digital TV video-on-demand platform |
US7769579B2 (en) | 2005-05-31 | 2010-08-03 | Google Inc. | Learning facts from semi-structured text |
EP1889255A1 (en) * | 2005-05-24 | 2008-02-20 | Loquendo S.p.A. | Automatic text-independent, language-independent speaker voice-print creation and speaker recognition |
TWI270052B (en) * | 2005-08-09 | 2007-01-01 | Delta Electronics Inc | System for selecting audio content by using speech recognition and method therefor |
JP4972645B2 (en) | 2005-08-26 | 2012-07-11 | ニュアンス コミュニケーションズ オーストリア ゲーエムベーハー | System and method for synchronizing sound and manually transcribed text |
US20070061703A1 (en) * | 2005-09-12 | 2007-03-15 | International Business Machines Corporation | Method and apparatus for annotating a document |
US20070078644A1 (en) * | 2005-09-30 | 2007-04-05 | Microsoft Corporation | Detecting segmentation errors in an annotated corpus |
KR101329167B1 (en) * | 2005-10-12 | 2013-11-14 | 톰슨 라이센싱 | Region of interest h.264 scalable video coding |
US20070094023A1 (en) * | 2005-10-21 | 2007-04-26 | Callminer, Inc. | Method and apparatus for processing heterogeneous units of work |
JP4432877B2 (en) * | 2005-11-08 | 2010-03-17 | ソニー株式会社 | Information processing system, information processing method, information processing apparatus, program, and recording medium |
US8019752B2 (en) * | 2005-11-10 | 2011-09-13 | Endeca Technologies, Inc. | System and method for information retrieval from object collections with complex interrelationships |
US20070150540A1 (en) * | 2005-12-27 | 2007-06-28 | Microsoft Corporation | Presence and peer launch pad |
TW200731113A (en) * | 2006-02-09 | 2007-08-16 | Benq Corp | Method for utilizing a media adapter for controlling a display device to display information of multimedia data corresponding to an authority datum |
US20070225606A1 (en) * | 2006-03-22 | 2007-09-27 | Endothelix, Inc. | Method and apparatus for comprehensive assessment of vascular health |
US7752031B2 (en) * | 2006-03-23 | 2010-07-06 | International Business Machines Corporation | Cadence management of translated multi-speaker conversations using pause marker relationship models |
US20070225973A1 (en) * | 2006-03-23 | 2007-09-27 | Childress Rhonda L | Collective Audio Chunk Processing for Streaming Translated Multi-Speaker Conversations |
US8301448B2 (en) * | 2006-03-29 | 2012-10-30 | Nuance Communications, Inc. | System and method for applying dynamic contextual grammars and language models to improve automatic speech recognition accuracy |
WO2007130864A2 (en) * | 2006-05-02 | 2007-11-15 | Lit Group, Inc. | Method and system for retrieving network documents |
US20080027330A1 (en) * | 2006-05-15 | 2008-01-31 | Endothelix, Inc. | Risk assessment method for acute cardiovascular events |
US7587407B2 (en) * | 2006-05-26 | 2009-09-08 | International Business Machines Corporation | System and method for creation, representation, and delivery of document corpus entity co-occurrence information |
US7593940B2 (en) * | 2006-05-26 | 2009-09-22 | International Business Machines Corporation | System and method for creation, representation, and delivery of document corpus entity co-occurrence information |
US10339208B2 (en) | 2006-06-12 | 2019-07-02 | Brief-Lynx, Inc. | Electronic documentation |
US8219543B2 (en) * | 2006-06-12 | 2012-07-10 | Etrial Communications, Inc. | Electronic documentation |
US7620551B2 (en) * | 2006-07-20 | 2009-11-17 | Mspot, Inc. | Method and apparatus for providing search capability and targeted advertising for audio, image, and video content over the internet |
US20080081963A1 (en) * | 2006-09-29 | 2008-04-03 | Endothelix, Inc. | Methods and Apparatus for Profiling Cardiovascular Vulnerability to Mental Stress |
US8122026B1 (en) * | 2006-10-20 | 2012-02-21 | Google Inc. | Finding and disambiguating references to entities on web pages |
DE102006057159A1 (en) * | 2006-12-01 | 2008-06-05 | Deutsche Telekom Ag | Method for classifying spoken language in speech dialogue systems |
JP4827721B2 (en) * | 2006-12-26 | 2011-11-30 | ニュアンス コミュニケーションズ,インコーポレイテッド | Utterance division method, apparatus and program |
TW200841189A (en) * | 2006-12-27 | 2008-10-16 | Ibm | Technique for accurately detecting system failure |
US20080172219A1 (en) * | 2007-01-17 | 2008-07-17 | Novell, Inc. | Foreign language translator in a document editor |
US8285697B1 (en) * | 2007-01-23 | 2012-10-09 | Google Inc. | Feedback enhanced attribute extraction |
US20080177536A1 (en) * | 2007-01-24 | 2008-07-24 | Microsoft Corporation | A/v content editing |
US20080215318A1 (en) * | 2007-03-01 | 2008-09-04 | Microsoft Corporation | Event recognition |
US8886540B2 (en) | 2007-03-07 | 2014-11-11 | Vlingo Corporation | Using speech recognition results based on an unstructured language model in a mobile communication facility application |
US8838457B2 (en) | 2007-03-07 | 2014-09-16 | Vlingo Corporation | Using results of unstructured language model based speech recognition to control a system-level function of a mobile communications facility |
US20110054897A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Transmitting signal quality information in mobile dictation application |
US20080221880A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile music environment speech processing facility |
US20090030687A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Adapting an unstructured language model speech recognition system based on usage |
US20110054896A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Sending a communications header with voice recording to send metadata for use in speech recognition and formatting in mobile dictation application |
US8886545B2 (en) * | 2007-03-07 | 2014-11-11 | Vlingo Corporation | Dealing with switch latency in speech recognition |
US10056077B2 (en) * | 2007-03-07 | 2018-08-21 | Nuance Communications, Inc. | Using speech recognition results based on an unstructured language model with a music system |
US20090030688A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Tagging speech recognition results based on an unstructured language model for use in a mobile communication facility application |
US20090030697A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using contextual information for delivering results generated from a speech recognition facility using an unstructured language model |
US8880405B2 (en) * | 2007-03-07 | 2014-11-04 | Vlingo Corporation | Application text entry in a mobile environment using a speech processing facility |
US8635243B2 (en) | 2007-03-07 | 2014-01-21 | Research In Motion Limited | Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search mobile search application |
US8949130B2 (en) | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Internal and external speech recognition use with a mobile communication facility |
US20110054899A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Command and control utilizing content information in a mobile voice-to-speech application |
US20090030691A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using an unstructured language model associated with an application of a mobile communication facility |
US20110054898A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Multiple web-based content search user interface in mobile search application |
US8949266B2 (en) | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Multiple web-based content category searching in mobile search application |
US20110054895A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Utilizing user transmitted text to improve language model in mobile dictation application |
US20090030685A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using speech recognition results based on an unstructured language model with a navigation system |
JP4466665B2 (en) * | 2007-03-13 | 2010-05-26 | 日本電気株式会社 | Minutes creation method, apparatus and program thereof |
US8347202B1 (en) | 2007-03-14 | 2013-01-01 | Google Inc. | Determining geographic locations for place names in a fact repository |
US20080229914A1 (en) * | 2007-03-19 | 2008-09-25 | Trevor Nathanial | Foot operated transport controller for digital audio workstations |
US8078464B2 (en) * | 2007-03-30 | 2011-12-13 | Mattersight Corporation | Method and system for analyzing separated voice data of a telephonic communication to determine the gender of the communicant |
US8856002B2 (en) * | 2007-04-12 | 2014-10-07 | International Business Machines Corporation | Distance metrics for universal pattern processing tasks |
US20080288239A1 (en) * | 2007-05-15 | 2008-11-20 | Microsoft Corporation | Localization and internationalization of document resources |
US9374242B2 (en) * | 2007-11-08 | 2016-06-21 | Invention Science Fund I, Llc | Using evaluations of tentative message content |
US20080320088A1 (en) * | 2007-06-19 | 2008-12-25 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Helping valuable message content pass apparent message filtering |
US8984133B2 (en) * | 2007-06-19 | 2015-03-17 | The Invention Science Fund I, Llc | Providing treatment-indicative feedback dependent on putative content treatment |
US8682982B2 (en) * | 2007-06-19 | 2014-03-25 | The Invention Science Fund I, Llc | Preliminary destination-dependent evaluation of message content |
US11570521B2 (en) | 2007-06-26 | 2023-01-31 | Broadband Itv, Inc. | Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection |
US20090027485A1 (en) * | 2007-07-26 | 2009-01-29 | Avaya Technology Llc | Automatic Monitoring of a Call Participant's Attentiveness |
JP5088050B2 (en) * | 2007-08-29 | 2012-12-05 | ヤマハ株式会社 | Voice processing apparatus and program |
US8065404B2 (en) * | 2007-08-31 | 2011-11-22 | The Invention Science Fund I, Llc | Layering destination-dependent content handling guidance |
US8082225B2 (en) * | 2007-08-31 | 2011-12-20 | The Invention Science Fund I, Llc | Using destination-dependent criteria to guide data transmission decisions |
US8326833B2 (en) * | 2007-10-04 | 2012-12-04 | International Business Machines Corporation | Implementing metadata extraction of artifacts from associated collaborative discussions |
US20090122157A1 (en) * | 2007-11-14 | 2009-05-14 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and computer-readable storage medium |
US7930389B2 (en) | 2007-11-20 | 2011-04-19 | The Invention Science Fund I, Llc | Adaptive filtering of annotated messages or the like |
EP2081405B1 (en) * | 2008-01-21 | 2012-05-16 | Bernafon AG | A hearing aid adapted to a specific type of voice in an acoustical environment, a method and use |
WO2009146042A2 (en) | 2008-03-31 | 2009-12-03 | Terra Soft Solutions Of Colorado, Inc. | Tablet computer |
US20090259469A1 (en) * | 2008-04-14 | 2009-10-15 | Motorola, Inc. | Method and apparatus for speech recognition |
US8326788B2 (en) * | 2008-04-29 | 2012-12-04 | International Business Machines Corporation | Determining the degree of relevance of alerts in an entity resolution system |
US8015137B2 (en) * | 2008-04-29 | 2011-09-06 | International Business Machines Corporation | Determining the degree of relevance of alerts in an entity resolution system over alert disposition lifecycle |
US20090271394A1 (en) * | 2008-04-29 | 2009-10-29 | Allen Thomas B | Determining the degree of relevance of entities and identities in an entity resolution system that maintains alert relevance |
US8250637B2 (en) * | 2008-04-29 | 2012-08-21 | International Business Machines Corporation | Determining the degree of relevance of duplicate alerts in an entity resolution system |
US7475344B1 (en) * | 2008-05-04 | 2009-01-06 | International Business Machines Corporation | Genders-usage assistant for composition of electronic documents, emails, or letters |
WO2010013371A1 (en) * | 2008-07-28 | 2010-02-04 | 日本電気株式会社 | Dialogue speech recognition system, dialogue speech recognition method, and recording medium for storing dialogue speech recognition program |
US8655950B2 (en) * | 2008-08-06 | 2014-02-18 | International Business Machines Corporation | Contextual awareness in real time collaborative activity alerts |
US8744532B2 (en) * | 2008-11-10 | 2014-06-03 | Disney Enterprises, Inc. | System and method for customizable playback of communication device alert media |
US8249870B2 (en) * | 2008-11-12 | 2012-08-21 | Massachusetts Institute Of Technology | Semi-automatic speech transcription |
US8135333B2 (en) * | 2008-12-23 | 2012-03-13 | Motorola Solutions, Inc. | Distributing a broadband resource locator over a narrowband audio stream |
US8301444B2 (en) | 2008-12-29 | 2012-10-30 | At&T Intellectual Property I, L.P. | Automated demographic analysis by analyzing voice activity |
US8954328B2 (en) * | 2009-01-15 | 2015-02-10 | K-Nfb Reading Technology, Inc. | Systems and methods for document narration with multiple characters having multiple moods |
US8458105B2 (en) * | 2009-02-12 | 2013-06-04 | Decisive Analytics Corporation | Method and apparatus for analyzing and interrelating data |
US20100235314A1 (en) * | 2009-02-12 | 2010-09-16 | Decisive Analytics Corporation | Method and apparatus for analyzing and interrelating video data |
US9646603B2 (en) * | 2009-02-27 | 2017-05-09 | Longsand Limited | Various apparatus and methods for a speech recognition system |
CN101847412B (en) * | 2009-03-27 | 2012-02-15 | 华为技术有限公司 | Method and device for classifying audio signals |
CN101901235B (en) * | 2009-05-27 | 2013-03-27 | 国际商业机器公司 | Method and system for document processing |
ES2959694T3 (en) * | 2009-07-17 | 2024-02-27 | Implantica Patent Ltd | Voice control system for a medical implant |
US8190420B2 (en) | 2009-08-04 | 2012-05-29 | Autonomy Corporation Ltd. | Automatic spoken language identification based on phoneme sequence patterns |
US8160877B1 (en) * | 2009-08-06 | 2012-04-17 | Narus, Inc. | Hierarchical real-time speaker recognition for biometric VoIP verification and targeting |
US8799408B2 (en) * | 2009-08-10 | 2014-08-05 | Sling Media Pvt Ltd | Localization systems and methods |
US9727842B2 (en) | 2009-08-21 | 2017-08-08 | International Business Machines Corporation | Determining entity relevance by relationships to other relevant entities |
EP2478451A2 (en) * | 2009-09-18 | 2012-07-25 | Lexxe PTY Ltd | Method and system for scoring texts |
US9552845B2 (en) | 2009-10-09 | 2017-01-24 | Dolby Laboratories Licensing Corporation | Automatic generation of metadata for audio dominance effects |
US8903847B2 (en) * | 2010-03-05 | 2014-12-02 | International Business Machines Corporation | Digital media voice tags in social networks |
US8831942B1 (en) * | 2010-03-19 | 2014-09-09 | Narus, Inc. | System and method for pitch based gender identification with suspicious speaker detection |
US20150279354A1 (en) * | 2010-05-19 | 2015-10-01 | Google Inc. | Personalization and Latency Reduction for Voice-Activated Commands |
US8600750B2 (en) * | 2010-06-08 | 2013-12-03 | Cisco Technology, Inc. | Speaker-cluster dependent speaker recognition (speaker-type automated speech recognition) |
US9465935B2 (en) * | 2010-06-11 | 2016-10-11 | D2L Corporation | Systems, methods, and apparatus for securing user documents |
US20130115606A1 (en) * | 2010-07-07 | 2013-05-09 | The University Of British Columbia | System and method for microfluidic cell culture |
TWI403304B (en) * | 2010-08-27 | 2013-08-01 | Ind Tech Res Inst | Method and mobile device for awareness of linguistic ability |
US9678572B2 (en) | 2010-10-01 | 2017-06-13 | Samsung Electronics Co., Ltd. | Apparatus and method for turning e-book pages in portable terminal |
KR101743632B1 (en) | 2010-10-01 | 2017-06-07 | 삼성전자주식회사 | Apparatus and method for turning e-book pages in portable terminal |
EP2437151B1 (en) | 2010-10-01 | 2020-07-08 | Samsung Electronics Co., Ltd. | Apparatus and method for turning e-book pages in portable terminal |
EP2437153A3 (en) * | 2010-10-01 | 2016-10-05 | Samsung Electronics Co., Ltd. | Apparatus and method for turning e-book pages in portable terminal |
US8498998B2 (en) * | 2010-10-11 | 2013-07-30 | International Business Machines Corporation | Grouping identity records to generate candidate lists to use in an entity and relationship resolution process |
US20120197643A1 (en) * | 2011-01-27 | 2012-08-02 | General Motors Llc | Mapping obstruent speech energy to lower frequencies |
US20120246238A1 (en) | 2011-03-21 | 2012-09-27 | International Business Machines Corporation | Asynchronous messaging tags |
US8688090B2 (en) | 2011-03-21 | 2014-04-01 | International Business Machines Corporation | Data session preferences |
US20120244842A1 (en) | 2011-03-21 | 2012-09-27 | International Business Machines Corporation | Data Session Synchronization With Phone Numbers |
US9160837B2 (en) | 2011-06-29 | 2015-10-13 | Gracenote, Inc. | Interactive streaming content apparatus, systems and methods |
US20130144414A1 (en) * | 2011-12-06 | 2013-06-06 | Cisco Technology, Inc. | Method and apparatus for discovering and labeling speakers in a large and growing collection of videos with minimal user effort |
US9396277B2 (en) * | 2011-12-09 | 2016-07-19 | Microsoft Technology Licensing, Llc | Access to supplemental data based on identifier derived from corresponding primary application data |
US9330188B1 (en) | 2011-12-22 | 2016-05-03 | Amazon Technologies, Inc. | Shared browsing sessions |
US9324323B1 (en) | 2012-01-13 | 2016-04-26 | Google Inc. | Speech recognition using topic-specific language models |
US9087024B1 (en) * | 2012-01-26 | 2015-07-21 | Amazon Technologies, Inc. | Narration of network content |
US8839087B1 (en) | 2012-01-26 | 2014-09-16 | Amazon Technologies, Inc. | Remote browsing and searching |
US9336321B1 (en) | 2012-01-26 | 2016-05-10 | Amazon Technologies, Inc. | Remote browsing and searching |
JP2013161205A (en) * | 2012-02-03 | 2013-08-19 | Sony Corp | Information processing device, information processing method and program |
US8543398B1 (en) | 2012-02-29 | 2013-09-24 | Google Inc. | Training an automatic speech recognition system using compressed word frequencies |
US8775177B1 (en) | 2012-03-08 | 2014-07-08 | Google Inc. | Speech recognition process |
US8965766B1 (en) * | 2012-03-15 | 2015-02-24 | Google Inc. | Systems and methods for identifying music in a noisy environment |
CN104380222B (en) * | 2012-03-28 | 2018-03-27 | 泰瑞·克劳福德 | Sector type is provided and browses the method and system for having recorded dialogue |
GB2502944A (en) * | 2012-03-30 | 2013-12-18 | Jpal Ltd | Segmentation and transcription of speech |
US9129605B2 (en) * | 2012-03-30 | 2015-09-08 | Src, Inc. | Automated voice and speech labeling |
US8374865B1 (en) | 2012-04-26 | 2013-02-12 | Google Inc. | Sampling training data for an automatic speech recognition system based on a benchmark classification distribution |
US8805684B1 (en) | 2012-05-31 | 2014-08-12 | Google Inc. | Distributed speaker adaptation |
US8571859B1 (en) | 2012-05-31 | 2013-10-29 | Google Inc. | Multi-stage speaker adaptation |
US8775175B1 (en) * | 2012-06-01 | 2014-07-08 | Google Inc. | Performing dictation correction |
WO2013179275A2 (en) * | 2012-06-01 | 2013-12-05 | Donald, Heather June | Method and system for generating an interactive display |
US9881616B2 (en) * | 2012-06-06 | 2018-01-30 | Qualcomm Incorporated | Method and systems having improved speech recognition |
GB2505072A (en) | 2012-07-06 | 2014-02-19 | Box Inc | Identifying users and collaborators as search results in a cloud-based system |
US8554559B1 (en) | 2012-07-13 | 2013-10-08 | Google Inc. | Localized speech recognition with offload |
US9123333B2 (en) | 2012-09-12 | 2015-09-01 | Google Inc. | Minimum bayesian risk methods for automatic speech recognition |
US10915492B2 (en) * | 2012-09-19 | 2021-02-09 | Box, Inc. | Cloud-based platform enabled with media content indexed for text-based searches and/or metadata extraction |
US8676590B1 (en) * | 2012-09-26 | 2014-03-18 | Google Inc. | Web-based audio transcription tool |
TW201417093A (en) * | 2012-10-19 | 2014-05-01 | Hon Hai Prec Ind Co Ltd | Electronic device with video/audio files processing function and video/audio files processing method |
EP2736042A1 (en) * | 2012-11-23 | 2014-05-28 | Samsung Electronics Co., Ltd | Apparatus and method for constructing multilingual acoustic model and computer readable recording medium for storing program for performing the method |
KR102112742B1 (en) * | 2013-01-22 | 2020-05-19 | 삼성전자주식회사 | Electronic apparatus and voice processing method thereof |
US9208777B2 (en) | 2013-01-25 | 2015-12-08 | Microsoft Technology Licensing, Llc | Feature space transformation for personalization using generalized i-vector clustering |
WO2014132402A1 (en) * | 2013-02-28 | 2014-09-04 | 株式会社東芝 | Data processing device and method for constructing story model |
US9190055B1 (en) * | 2013-03-14 | 2015-11-17 | Amazon Technologies, Inc. | Named entity recognition with personalized models |
US9195656B2 (en) * | 2013-12-30 | 2015-11-24 | Google Inc. | Multilingual prosody generation |
WO2015105994A1 (en) | 2014-01-08 | 2015-07-16 | Callminer, Inc. | Real-time conversational analytics facility |
US9430186B2 (en) * | 2014-03-17 | 2016-08-30 | Google Inc | Visual indication of a recognized voice-initiated action |
US9497868B2 (en) | 2014-04-17 | 2016-11-15 | Continental Automotive Systems, Inc. | Electronics enclosure |
US9773499B2 (en) | 2014-06-18 | 2017-09-26 | Google Inc. | Entity name recognition based on entity type |
EP3184623A4 (en) * | 2014-08-22 | 2018-04-25 | Olympus Corporation | Cell culture bag, cell culture device, and cell culture container |
US9772816B1 (en) * | 2014-12-22 | 2017-09-26 | Google Inc. | Transcription and tagging system |
EP3089159B1 (en) | 2015-04-28 | 2019-08-28 | Google LLC | Correcting voice recognition using selective re-speak |
US10381022B1 (en) | 2015-12-23 | 2019-08-13 | Google Llc | Audio classifier |
US10282411B2 (en) * | 2016-03-31 | 2019-05-07 | International Business Machines Corporation | System, method, and recording medium for natural language learning |
CN107305541B (en) * | 2016-04-20 | 2021-05-04 | 科大讯飞股份有限公司 | Method and device for segmenting speech recognition text |
US20180018973A1 (en) | 2016-07-15 | 2018-01-18 | Google Inc. | Speaker verification |
US9978392B2 (en) * | 2016-09-09 | 2018-05-22 | Tata Consultancy Services Limited | Noisy signal identification from non-stationary audio signals |
CN109102810B (en) * | 2017-06-21 | 2021-10-15 | 北京搜狗科技发展有限公司 | Voiceprint recognition method and device |
GB2578386B (en) | 2017-06-27 | 2021-12-01 | Cirrus Logic Int Semiconductor Ltd | Detection of replay attack |
GB2563953A (en) | 2017-06-28 | 2019-01-02 | Cirrus Logic Int Semiconductor Ltd | Detection of replay attack |
GB201713697D0 (en) | 2017-06-28 | 2017-10-11 | Cirrus Logic Int Semiconductor Ltd | Magnetic detection of replay attack |
GB201801532D0 (en) | 2017-07-07 | 2018-03-14 | Cirrus Logic Int Semiconductor Ltd | Methods, apparatus and systems for audio playback |
GB201801526D0 (en) | 2017-07-07 | 2018-03-14 | Cirrus Logic Int Semiconductor Ltd | Methods, apparatus and systems for authentication |
GB201801530D0 (en) | 2017-07-07 | 2018-03-14 | Cirrus Logic Int Semiconductor Ltd | Methods, apparatus and systems for authentication |
GB201801528D0 (en) | 2017-07-07 | 2018-03-14 | Cirrus Logic Int Semiconductor Ltd | Method, apparatus and systems for biometric processes |
GB201801527D0 (en) | 2017-07-07 | 2018-03-14 | Cirrus Logic Int Semiconductor Ltd | Method, apparatus and systems for biometric processes |
EP3432560A1 (en) * | 2017-07-20 | 2019-01-23 | Dialogtech Inc. | System, method, and computer program product for automatically analyzing and categorizing phone calls |
GB201801661D0 (en) | 2017-10-13 | 2018-03-21 | Cirrus Logic International Uk Ltd | Detection of liveness |
GB201804843D0 (en) | 2017-11-14 | 2018-05-09 | Cirrus Logic Int Semiconductor Ltd | Detection of replay attack |
GB201801874D0 (en) | 2017-10-13 | 2018-03-21 | Cirrus Logic Int Semiconductor Ltd | Improving robustness of speech processing system against ultrasound and dolphin attacks |
GB201803570D0 (en) | 2017-10-13 | 2018-04-18 | Cirrus Logic Int Semiconductor Ltd | Detection of replay attack |
GB2567503A (en) | 2017-10-13 | 2019-04-17 | Cirrus Logic Int Semiconductor Ltd | Analysing speech signals |
GB201801663D0 (en) | 2017-10-13 | 2018-03-21 | Cirrus Logic Int Semiconductor Ltd | Detection of liveness |
GB201801664D0 (en) | 2017-10-13 | 2018-03-21 | Cirrus Logic Int Semiconductor Ltd | Detection of liveness |
GB201801659D0 (en) | 2017-11-14 | 2018-03-21 | Cirrus Logic Int Semiconductor Ltd | Detection of loudspeaker playback |
US11443646B2 (en) | 2017-12-22 | 2022-09-13 | Fathom Technologies, LLC | E-Reader interface system with audio and highlighting synchronization for digital books |
US10671251B2 (en) | 2017-12-22 | 2020-06-02 | Arbordale Publishing, LLC | Interactive eReader interface generation based on synchronization of textual and audial descriptors |
US11270071B2 (en) * | 2017-12-28 | 2022-03-08 | Comcast Cable Communications, Llc | Language-based content recommendations using closed captions |
US11735189B2 (en) | 2018-01-23 | 2023-08-22 | Cirrus Logic, Inc. | Speaker identification |
US11264037B2 (en) | 2018-01-23 | 2022-03-01 | Cirrus Logic, Inc. | Speaker identification |
US11475899B2 (en) | 2018-01-23 | 2022-10-18 | Cirrus Logic, Inc. | Speaker identification |
US10692490B2 (en) | 2018-07-31 | 2020-06-23 | Cirrus Logic, Inc. | Detection of replay attack |
US10915614B2 (en) | 2018-08-31 | 2021-02-09 | Cirrus Logic, Inc. | Biometric authentication |
US11037574B2 (en) * | 2018-09-05 | 2021-06-15 | Cirrus Logic, Inc. | Speaker recognition and speaker change detection |
US11183195B2 (en) * | 2018-09-27 | 2021-11-23 | Snackable Inc. | Audio content processing systems and methods |
US11410658B1 (en) * | 2019-10-29 | 2022-08-09 | Dialpad, Inc. | Maintainable and scalable pipeline for automatic speech recognition language modeling |
US11373657B2 (en) * | 2020-05-01 | 2022-06-28 | Raytheon Applied Signal Technology, Inc. | System and method for speaker identification in audio data |
US11315545B2 (en) | 2020-07-09 | 2022-04-26 | Raytheon Applied Signal Technology, Inc. | System and method for language identification in audio data |
CN112289323B (en) * | 2020-12-29 | 2021-05-28 | 深圳追一科技有限公司 | Voice data processing method and device, computer equipment and storage medium |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5806032A (en) * | 1996-06-14 | 1998-09-08 | Lucent Technologies Inc. | Compilation of weighted finite-state transducers from decision trees |
US6006184A (en) * | 1997-01-28 | 1999-12-21 | Nec Corporation | Tree structured cohort selection for speaker recognition system |
US6052657A (en) * | 1997-09-09 | 2000-04-18 | Dragon Systems, Inc. | Text segmentation and identification of topic using language models |
US6073096A (en) * | 1998-02-04 | 2000-06-06 | International Business Machines Corporation | Speaker adaptation system and method based on class-specific pre-clustering training speakers |
US6076053A (en) * | 1998-05-21 | 2000-06-13 | Lucent Technologies Inc. | Methods and apparatus for discriminative training and adaptation of pronunciation networks |
US6185531B1 (en) * | 1997-01-09 | 2001-02-06 | Gte Internetworking Incorporated | Topic indexing method |
US6308222B1 (en) * | 1996-06-03 | 2001-10-23 | Microsoft Corporation | Transcoding of audio data |
US20020001261A1 (en) * | 2000-04-21 | 2002-01-03 | Yoshinori Matsui | Data playback apparatus |
US20020010575A1 (en) * | 2000-04-08 | 2002-01-24 | International Business Machines Corporation | Method and system for the automatic segmentation of an audio stream into semantic or syntactic units |
US6347295B1 (en) * | 1998-10-26 | 2002-02-12 | Compaq Computer Corporation | Computer method and apparatus for grapheme-to-phoneme rule-set-generation |
US6434520B1 (en) * | 1999-04-16 | 2002-08-13 | International Business Machines Corporation | System and method for indexing and querying audio archives |
US6571208B1 (en) * | 1999-11-29 | 2003-05-27 | Matsushita Electric Industrial Co., Ltd. | Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training |
US20040024739A1 (en) * | 1999-06-15 | 2004-02-05 | Kanisa Inc. | System and method for implementing a knowledge management system |
US6711541B1 (en) * | 1999-09-07 | 2004-03-23 | Matsushita Electric Industrial Co., Ltd. | Technique for developing discriminative sound units for speech recognition and allophone modeling |
US6718305B1 (en) * | 1999-03-19 | 2004-04-06 | Koninklijke Philips Electronics N.V. | Specifying a tree structure for speech recognizers using correlation between regression classes |
US6732183B1 (en) * | 1996-12-31 | 2004-05-04 | Broadware Technologies, Inc. | Video and audio streaming for multiple users |
US20060129541A1 (en) * | 2002-06-11 | 2006-06-15 | Microsoft Corporation | Dynamically updated quick searches and strategies |
US7257528B1 (en) * | 1998-02-13 | 2007-08-14 | Zi Corporation Of Canada, Inc. | Method and apparatus for Chinese character text input |
Family Cites Families (163)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AUPQ131399A0 (en) * | 1999-06-30 | 1999-07-22 | Silverbrook Research Pty Ltd | A method and apparatus (NPAGE02) |
US4193119A (en) * | 1977-03-25 | 1980-03-11 | Xerox Corporation | Apparatus for assisting in the transposition of foreign language text |
US4317611A (en) * | 1980-05-19 | 1982-03-02 | International Business Machines Corporation | Optical ray deflection apparatus |
US4615595A (en) * | 1984-10-10 | 1986-10-07 | Texas Instruments Incorporated | Frame addressed spatial light modulator |
US4908866A (en) * | 1985-02-04 | 1990-03-13 | Eric Goldwasser | Speech transcribing system |
JPH0693221B2 (en) | 1985-06-12 | 1994-11-16 | 株式会社日立製作所 | Voice input device |
JPH0743719B2 (en) * | 1986-05-20 | 1995-05-15 | シャープ株式会社 | Machine translation device |
US4879648A (en) * | 1986-09-19 | 1989-11-07 | Nancy P. Cochran | Search system which continuously displays search terms during scrolling and selections of individually displayed data sets |
JPH0833799B2 (en) * | 1988-10-31 | 1996-03-29 | 富士通株式会社 | Data input / output control method |
US5146439A (en) * | 1989-01-04 | 1992-09-08 | Pitney Bowes Inc. | Records management system having dictation/transcription capability |
US6978277B2 (en) * | 1989-10-26 | 2005-12-20 | Encyclopaedia Britannica, Inc. | Multimedia search system |
US5418716A (en) * | 1990-07-26 | 1995-05-23 | Nec Corporation | System for recognizing sentence patterns and a system for recognizing sentence patterns and grammatical cases |
US5404295A (en) * | 1990-08-16 | 1995-04-04 | Katz; Boris | Method and apparatus for utilizing annotations to facilitate computer retrieval of database material |
US5408686A (en) * | 1991-02-19 | 1995-04-18 | Mankovitz; Roy J. | Apparatus and methods for music and lyrics broadcasting |
US5317732A (en) * | 1991-04-26 | 1994-05-31 | Commodore Electronics Limited | System for relocating a multimedia presentation on a different platform by extracting a resource map in order to remap and relocate resources |
US5477451A (en) * | 1991-07-25 | 1995-12-19 | International Business Machines Corp. | Method and system for natural language translation |
US5875108A (en) | 1991-12-23 | 1999-02-23 | Hoffberg; Steven M. | Ergonomic man-machine interface incorporating adaptive pattern recognition based control system |
US5544257A (en) | 1992-01-08 | 1996-08-06 | International Business Machines Corporation | Continuous parameter hidden Markov model approach to automatic handwriting recognition |
US5311360A (en) * | 1992-04-28 | 1994-05-10 | The Board Of Trustees Of The Leland Stanford Junior University | Method and apparatus for modulating a light beam |
JP2524472B2 (en) * | 1992-09-21 | 1996-08-14 | インターナショナル・ビジネス・マシーンズ・コーポレイション | How to train a telephone line based speech recognition system |
CA2108536C (en) | 1992-11-24 | 2000-04-04 | Oscar Ernesto Agazzi | Text recognition using two-dimensional stochastic models |
US5369704A (en) * | 1993-03-24 | 1994-11-29 | Engate Incorporated | Down-line transcription system for manipulating real-time testimony |
US5525047A (en) * | 1993-06-30 | 1996-06-11 | Cooper Cameron Corporation | Sealing system for an unloader |
US5689641A (en) * | 1993-10-01 | 1997-11-18 | Vicor, Inc. | Multimedia collaboration system arrangement for routing compressed AV signal through a participant site without decompressing the AV signal |
JP2986345B2 (en) * | 1993-10-18 | 1999-12-06 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Voice recording indexing apparatus and method |
US5452024A (en) * | 1993-11-01 | 1995-09-19 | Texas Instruments Incorporated | DMD display system |
JP3185505B2 (en) * | 1993-12-24 | 2001-07-11 | 株式会社日立製作所 | Meeting record creation support device |
GB2285895A (en) | 1994-01-19 | 1995-07-26 | Ibm | Audio conferencing system which generates a set of minutes |
US5810599A (en) * | 1994-01-26 | 1998-09-22 | E-Systems, Inc. | Interactive audio-visual foreign language skills maintenance system and method |
FR2718539B1 (en) * | 1994-04-08 | 1996-04-26 | Thomson Csf | Device for amplifying the amplitude modulation rate of an optical beam. |
JPH07319917A (en) * | 1994-05-24 | 1995-12-08 | Fuji Xerox Co Ltd | Document data base managing device and document data base system |
US5613032A (en) * | 1994-09-02 | 1997-03-18 | Bell Communications Research, Inc. | System and method for recording, playing back and searching multimedia events wherein video, audio and text can be searched and retrieved |
US5715445A (en) * | 1994-09-02 | 1998-02-03 | Wolfe; Mark A. | Document retrieval system employing a preloading procedure |
WO1996010799A1 (en) * | 1994-09-30 | 1996-04-11 | Motorola Inc. | Method and system for extracting features from handwritten text |
US5768607A (en) * | 1994-09-30 | 1998-06-16 | Intel Corporation | Method and apparatus for freehand annotation and drawings incorporating sound and for compressing and synchronizing sound |
US5835667A (en) * | 1994-10-14 | 1998-11-10 | Carnegie Mellon University | Method and apparatus for creating a searchable digital video library and a system and method of using such a library |
US5777614A (en) | 1994-10-14 | 1998-07-07 | Hitachi, Ltd. | Editing support system including an interactive interface |
US5614940A (en) * | 1994-10-21 | 1997-03-25 | Intel Corporation | Method and apparatus for providing broadcast information with indexing |
US6029195A (en) | 1994-11-29 | 2000-02-22 | Herz; Frederick S. M. | System for customized electronic identification of desirable objects |
US5729656A (en) | 1994-11-30 | 1998-03-17 | International Business Machines Corporation | Reduction of search space in speech recognition using phone boundaries and phone ranking |
US5638487A (en) * | 1994-12-30 | 1997-06-10 | Purespeech, Inc. | Automatic speech recognition |
US5715367A (en) | 1995-01-23 | 1998-02-03 | Dragon Systems, Inc. | Apparatuses and methods for developing and using models for speech recognition |
US5684924A (en) * | 1995-05-19 | 1997-11-04 | Kurzweil Applied Intelligence, Inc. | User adaptable speech recognition system |
EP0834139A4 (en) * | 1995-06-07 | 1998-08-05 | Int Language Engineering Corp | Machine assisted translation tools |
US6046840A (en) * | 1995-06-19 | 2000-04-04 | Reflectivity, Inc. | Double substrate reflective spatial light modulator with self-limiting micro-mechanical elements |
US5559875A (en) * | 1995-07-31 | 1996-09-24 | Latitude Communications | Method and apparatus for recording and retrieval of audio conferences |
US6151598A (en) * | 1995-08-14 | 2000-11-21 | Shaw; Venson M. | Digital dictionary with a communication system for the creating, updating, editing, storing, maintaining, referencing, and managing the digital dictionary |
US6006221A (en) * | 1995-08-16 | 1999-12-21 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
US6026388A (en) * | 1995-08-16 | 2000-02-15 | Textwise, Llc | User interface and other enhancements for natural language information retrieval system and method |
US5963940A (en) * | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
US5757536A (en) * | 1995-08-30 | 1998-05-26 | Sandia Corporation | Electrically-programmable diffraction grating |
US6332147B1 (en) | 1995-11-03 | 2001-12-18 | Xerox Corporation | Computer controlled display system using a graphical replay device to control playback of temporal data representing collaborative activities |
US5742419A (en) * | 1995-11-07 | 1998-04-21 | The Board Of Trustees Of The Leland Stanford Junior University | Miniature scanning confocal microscope |
US5960447A (en) * | 1995-11-13 | 1999-09-28 | Holt; Douglas | Word tagging and editing system for speech recognition |
US5999306A (en) * | 1995-12-01 | 1999-12-07 | Seiko Epson Corporation | Method of manufacturing spatial light modulator and electronic device employing it |
JPH09269931A (en) | 1996-01-30 | 1997-10-14 | Canon Inc | Cooperative work environment constructing system, its method and medium |
US6067517A (en) * | 1996-02-02 | 2000-05-23 | International Business Machines Corporation | Transcription of speech data with segments from acoustically dissimilar environments |
EP0823112B1 (en) * | 1996-02-27 | 2002-05-02 | Koninklijke Philips Electronics N.V. | Method and apparatus for automatic speech segmentation into phoneme-like units |
US5862259A (en) * | 1996-03-27 | 1999-01-19 | Caere Corporation | Pattern recognition employing arbitrary segmentation and compound probabilistic evaluation |
US6024571A (en) * | 1996-04-25 | 2000-02-15 | Renegar; Janet Elaine | Foreign language communication system/device and learning aid |
US5778187A (en) * | 1996-05-09 | 1998-07-07 | Netcast Communications Corp. | Multicasting method and apparatus |
US5835908A (en) * | 1996-11-19 | 1998-11-10 | Microsoft Corporation | Processing multiple database transactions in the same process to reduce process overhead and redundant retrieval from database servers |
US6169789B1 (en) | 1996-12-16 | 2001-01-02 | Sanjay K. Rao | Intelligent keyboard system |
US5897614A (en) * | 1996-12-20 | 1999-04-27 | International Business Machines Corporation | Method and apparatus for sibilant classification in a speech recognition system |
US6807570B1 (en) * | 1997-01-21 | 2004-10-19 | International Business Machines Corporation | Pre-loading of web pages corresponding to designated links in HTML |
US6088669A (en) * | 1997-01-28 | 2000-07-11 | International Business Machines, Corporation | Speech recognition with attempted speaker recognition for speaker model prefetching or alternative speech modeling |
US6029124A (en) * | 1997-02-21 | 2000-02-22 | Dragon Systems, Inc. | Sequential, nonparametric speech recognition and speaker identification |
US6024751A (en) * | 1997-04-11 | 2000-02-15 | Coherent Inc. | Method and apparatus for transurethral resection of the prostate |
WO1998054655A1 (en) * | 1997-05-28 | 1998-12-03 | Shinar Linguistic Technologies Inc. | Translation system |
US6360234B2 (en) * | 1997-08-14 | 2002-03-19 | Virage, Inc. | Video cataloger system with synchronized encoders |
US6567980B1 (en) * | 1997-08-14 | 2003-05-20 | Virage, Inc. | Video cataloger system with hyperlinked output |
US6463444B1 (en) * | 1997-08-14 | 2002-10-08 | Virage, Inc. | Video cataloger system with extensibility |
US6317716B1 (en) * | 1997-09-19 | 2001-11-13 | Massachusetts Institute Of Technology | Automatic cueing of speech |
JP2001511991A (en) | 1997-10-01 | 2001-08-14 | エイ・ティ・アンド・ティ・コーポレーション | Method and apparatus for storing and retrieving label interval data for multimedia records |
US6961954B1 (en) | 1997-10-27 | 2005-11-01 | The Mitre Corporation | Automated segmentation, information extraction, summarization, and presentation of broadcast news |
US6064963A (en) | 1997-12-17 | 2000-05-16 | Opus Telecom, L.L.C. | Automatic key word or phrase speech recognition for the corrections industry |
JP4183311B2 (en) * | 1997-12-22 | 2008-11-19 | 株式会社リコー | Document annotation method, annotation device, and recording medium |
US5970473A (en) | 1997-12-31 | 1999-10-19 | At&T Corp. | Video communication device providing in-home catalog services |
SE511584C2 (en) * | 1998-01-15 | 1999-10-25 | Ericsson Telefon Ab L M | Information routing |
US6327343B1 (en) | 1998-01-16 | 2001-12-04 | International Business Machines Corporation | System and methods for automatic call and data transfer processing |
JP3181548B2 (en) * | 1998-02-03 | 2001-07-03 | 富士通株式会社 | Information retrieval apparatus and information retrieval method |
US6361326B1 (en) * | 1998-02-20 | 2002-03-26 | George Mason University | System for instruction thinking skills |
US6381640B1 (en) * | 1998-09-11 | 2002-04-30 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus for automated personalization and presentation of workload assignments to agents within a multimedia communication center |
US6112172A (en) | 1998-03-31 | 2000-08-29 | Dragon Systems, Inc. | Interactive searching |
CN1159662C (en) * | 1998-05-13 | 2004-07-28 | 国际商业机器公司 | Automatic punctuating for continuous speech recognition |
US6243680B1 (en) * | 1998-06-15 | 2001-06-05 | Nortel Networks Limited | Method and apparatus for obtaining a transcription of phrases through text and spoken utterances |
US6067514A (en) * | 1998-06-23 | 2000-05-23 | International Business Machines Corporation | Method for automatically punctuating a speech utterance in a continuous speech recognition system |
US6341330B1 (en) * | 1998-07-27 | 2002-01-22 | Oak Technology, Inc. | Method and system for caching a selected viewing angle in a DVD environment |
US6233389B1 (en) * | 1998-07-30 | 2001-05-15 | Tivo, Inc. | Multimedia time warping system |
US6246983B1 (en) * | 1998-08-05 | 2001-06-12 | Matsushita Electric Corporation Of America | Text-to-speech e-mail reader with multi-modal reply processor |
US6373985B1 (en) | 1998-08-12 | 2002-04-16 | Lucent Technologies, Inc. | E-mail signature block analysis |
US6161087A (en) | 1998-10-05 | 2000-12-12 | Lernout & Hauspie Speech Products N.V. | Speech-recognition-assisted selective suppression of silent and filled speech pauses during playback of an audio recording |
US6360237B1 (en) * | 1998-10-05 | 2002-03-19 | Lernout & Hauspie Speech Products N.V. | Method and system for performing text edits during audio recording playback |
US6038058A (en) * | 1998-10-15 | 2000-03-14 | Memsolutions, Inc. | Grid-actuated charge controlled mirror and method of addressing the same |
US6332139B1 (en) | 1998-11-09 | 2001-12-18 | Mega Chips Corporation | Information communication system |
US6292772B1 (en) * | 1998-12-01 | 2001-09-18 | Justsystem Corporation | Method for identifying the language of individual words |
JP3252282B2 (en) * | 1998-12-17 | 2002-02-04 | 松下電器産業株式会社 | Method and apparatus for searching scene |
US6654735B1 (en) | 1999-01-08 | 2003-11-25 | International Business Machines Corporation | Outbound information analysis for generating user interest profiles and improving user productivity |
US6253179B1 (en) * | 1999-01-29 | 2001-06-26 | International Business Machines Corporation | Method and apparatus for multi-environment speaker verification |
CN1592403A (en) | 1999-03-30 | 2005-03-09 | 提维股份有限公司 | Data storage management and program system and method |
US6345252B1 (en) * | 1999-04-09 | 2002-02-05 | International Business Machines Corporation | Methods and apparatus for retrieving audio information using content and speaker information |
US6338033B1 (en) * | 1999-04-20 | 2002-01-08 | Alis Technologies, Inc. | System and method for network-based teletranslation from one natural language to another |
US6219640B1 (en) * | 1999-08-06 | 2001-04-17 | International Business Machines Corporation | Methods and apparatus for audio-visual speaker recognition and utterance verification |
IES990800A2 (en) | 1999-08-20 | 2000-09-06 | Digitake Software Systems Ltd | An audio processing system |
JP3232289B2 (en) | 1999-08-30 | 2001-11-26 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Symbol insertion device and method |
US6480826B2 (en) | 1999-08-31 | 2002-11-12 | Accenture Llp | System and method for a telephonic emotion detection that provides operator feedback |
US6624826B1 (en) * | 1999-09-28 | 2003-09-23 | Ricoh Co., Ltd. | Method and apparatus for generating visual representations for audio documents |
US6396619B1 (en) * | 2000-01-28 | 2002-05-28 | Reflectivity, Inc. | Deflectable spatial light modulator having stopping mechanisms |
US7412643B1 (en) * | 1999-11-23 | 2008-08-12 | International Business Machines Corporation | Method and apparatus for linking representation and realization data |
JP2003518266A (en) | 1999-12-20 | 2003-06-03 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Speech reproduction for text editing of speech recognition system |
US20020071169A1 (en) * | 2000-02-01 | 2002-06-13 | Bowers John Edward | Micro-electro-mechanical-system (MEMS) mirror device |
ATE336776T1 (en) * | 2000-02-25 | 2006-09-15 | Koninkl Philips Electronics Nv | DEVICE FOR SPEECH RECOGNITION WITH REFERENCE TRANSFORMATION MEANS |
US7197694B2 (en) * | 2000-03-21 | 2007-03-27 | Oki Electric Industry Co., Ltd. | Image display system, image registration terminal device and image reading terminal device used in the image display system |
AU2001255599A1 (en) * | 2000-04-24 | 2001-11-07 | Microsoft Corporation | Computer-aided reading system and method with cross-language reading wizard |
US7107204B1 (en) * | 2000-04-24 | 2006-09-12 | Microsoft Corporation | Computer-aided writing system and method with cross-language writing wizard |
US6388661B1 (en) * | 2000-05-03 | 2002-05-14 | Reflectivity, Inc. | Monochrome and color digital display systems and methods |
US6505153B1 (en) * | 2000-05-22 | 2003-01-07 | Compaq Information Technologies Group, L.P. | Efficient method for producing off-line closed captions |
US6748356B1 (en) * | 2000-06-07 | 2004-06-08 | International Business Machines Corporation | Methods and apparatus for identifying unknown speakers using a hierarchical tree structure |
US7047192B2 (en) * | 2000-06-28 | 2006-05-16 | Poirier Darrell A | Simultaneous multi-user real-time speech recognition system |
US6337760B1 (en) * | 2000-07-17 | 2002-01-08 | Reflectivity, Inc. | Encapsulated multi-directional light beam steering device |
US6931376B2 (en) * | 2000-07-20 | 2005-08-16 | Microsoft Corporation | Speech-related event notification system |
US20020059204A1 (en) | 2000-07-28 | 2002-05-16 | Harris Larry R. | Distributed search system and method |
AU2001279589A1 (en) | 2000-07-28 | 2002-02-13 | Jan Pathuel | Method and system of securing data and systems |
US7155061B2 (en) * | 2000-08-22 | 2006-12-26 | Microsoft Corporation | Method and system for searching for words and phrases in active and stored ink word documents |
AU2001288469A1 (en) * | 2000-08-28 | 2002-03-13 | Emotion, Inc. | Method and apparatus for digital media management, retrieval, and collaboration |
US6604110B1 (en) * | 2000-08-31 | 2003-08-05 | Ascential Software, Inc. | Automated software code generation from a metadata-based repository |
US6647383B1 (en) * | 2000-09-01 | 2003-11-11 | Lucent Technologies Inc. | System and method for providing interactive dialogue and iterative search functions to find information |
US7075671B1 (en) * | 2000-09-14 | 2006-07-11 | International Business Machines Corp. | System and method for providing a printing capability for a transcription service or multimedia presentation |
WO2002029612A1 (en) | 2000-09-30 | 2002-04-11 | Intel Corporation | Method and system for generating and searching an optimal maximum likelihood decision tree for hidden markov model (hmm) based speech recognition |
US7472064B1 (en) | 2000-09-30 | 2008-12-30 | Intel Corporation | Method and system to scale down a decision tree-based hidden markov model (HMM) for speech recognition |
US6431714B1 (en) * | 2000-10-10 | 2002-08-13 | Nippon Telegraph And Telephone Corporation | Micro-mirror apparatus and production method therefor |
US6934756B2 (en) * | 2000-11-01 | 2005-08-23 | International Business Machines Corporation | Conversational networking via transport, coding and control conversational protocols |
US20050060162A1 (en) | 2000-11-10 | 2005-03-17 | Farhad Mohit | Systems and methods for automatic identification and hyperlinking of words or other data items and for information retrieval using hyperlinked words or data items |
US6574026B2 (en) * | 2000-12-07 | 2003-06-03 | Agere Systems Inc. | Magnetically-packaged optical MEMs device |
US6944272B1 (en) * | 2001-01-16 | 2005-09-13 | Interactive Intelligence, Inc. | Method and system for administering multiple messages over a public switched telephone network |
SG98440A1 (en) * | 2001-01-16 | 2003-09-19 | Reuters Ltd | Method and apparatus for a financial database structure |
US6714911B2 (en) | 2001-01-25 | 2004-03-30 | Harcourt Assessment, Inc. | Speech transcription and analysis system and method |
US6429033B1 (en) * | 2001-02-20 | 2002-08-06 | Nayna Networks, Inc. | Process for manufacturing mirror devices using semiconductor technology |
US20020133477A1 (en) * | 2001-03-05 | 2002-09-19 | Glenn Abel | Method for profile-based notice and broadcast of multimedia content |
US6732095B1 (en) * | 2001-04-13 | 2004-05-04 | Siebel Systems, Inc. | Method and apparatus for mapping between XML and relational representations |
WO2002086737A1 (en) * | 2001-04-20 | 2002-10-31 | Wordsniffer, Inc. | Method and apparatus for integrated, user-directed web site text translation |
US7035804B2 (en) * | 2001-04-26 | 2006-04-25 | Stenograph, L.L.C. | Systems and methods for automated audio transcription, translation, and transfer |
US6820055B2 (en) * | 2001-04-26 | 2004-11-16 | Speche Communications | Systems and methods for automated audio transcription, translation, and transfer with text display software for manipulating the text |
US6895376B2 (en) * | 2001-05-04 | 2005-05-17 | Matsushita Electric Industrial Co., Ltd. | Eigenvoice re-estimation technique of acoustic models for speech recognition, speaker identification and speaker verification |
CN1236423C (en) * | 2001-05-10 | 2006-01-11 | 皇家菲利浦电子有限公司 | Background learning of speaker voices |
US6973428B2 (en) * | 2001-05-24 | 2005-12-06 | International Business Machines Corporation | System and method for searching, analyzing and displaying text transcripts of speech after imperfect speech recognition |
US20030018663A1 (en) * | 2001-05-30 | 2003-01-23 | Cornette Ranjita K. | Method and system for creating a multimedia electronic book |
US7027973B2 (en) * | 2001-07-13 | 2006-04-11 | Hewlett-Packard Development Company, L.P. | System and method for converting a standard generalized markup language in multiple languages |
US6778979B2 (en) * | 2001-08-13 | 2004-08-17 | Xerox Corporation | System for automatically generating queries |
US6993473B2 (en) * | 2001-08-31 | 2006-01-31 | Equality Translation Services | Productivity tool for language translators |
US20030078973A1 (en) * | 2001-09-25 | 2003-04-24 | Przekop Michael V. | Web-enabled system and method for on-demand distribution of transcript-synchronized video/audio records of legal proceedings to collaborative workgroups |
US6748350B2 (en) * | 2001-09-27 | 2004-06-08 | Intel Corporation | Method to compensate for stress between heat spreader and thermal interface material |
US6708148B2 (en) | 2001-10-12 | 2004-03-16 | Koninklijke Philips Electronics N.V. | Correction device to mark parts of a recognized text |
US20030093580A1 (en) | 2001-11-09 | 2003-05-15 | Koninklijke Philips Electronics N.V. | Method and system for information alerts |
WO2003065245A1 (en) * | 2002-01-29 | 2003-08-07 | International Business Machines Corporation | Translating method, translated sentence outputting method, recording medium, program, and computer device |
US7165024B2 (en) | 2002-02-22 | 2007-01-16 | Nec Laboratories America, Inc. | Inferring hierarchical descriptions of a set of documents |
US7522910B2 (en) | 2002-05-31 | 2009-04-21 | Oracle International Corporation | Method and apparatus for controlling data provided to a mobile device |
US6618702B1 (en) * | 2002-06-14 | 2003-09-09 | Mary Antoinette Kohler | Method of and device for phone-based speaker recognition |
US7131117B2 (en) * | 2002-09-04 | 2006-10-31 | Sbc Properties, L.P. | Method and system for automating the analysis of word frequencies |
US6999918B2 (en) * | 2002-09-20 | 2006-02-14 | Motorola, Inc. | Method and apparatus to facilitate correlating symbols to sounds |
EP1422692A3 (en) | 2002-11-22 | 2004-07-14 | ScanSoft, Inc. | Automatic insertion of non-verbalized punctuation in speech recognition |
US7627479B2 (en) * | 2003-02-21 | 2009-12-01 | Motionpoint Corporation | Automation tool for web site content language translation |
US8464150B2 (en) * | 2008-06-07 | 2013-06-11 | Apple Inc. | Automatic language identification for dynamic text processing |
-
2003
- 2003-07-02 US US10/610,699 patent/US20040117188A1/en not_active Abandoned
- 2003-07-02 US US10/610,697 patent/US7290207B2/en not_active Expired - Fee Related
- 2003-07-02 US US10/610,684 patent/US20040024582A1/en not_active Abandoned
- 2003-07-02 US US10/610,679 patent/US20040024598A1/en not_active Abandoned
- 2003-07-02 US US10/610,696 patent/US20040024585A1/en not_active Abandoned
- 2003-07-02 US US10/610,532 patent/US20040006481A1/en not_active Abandoned
- 2003-07-02 US US10/611,106 patent/US7337115B2/en active Active
- 2003-07-02 US US10/610,533 patent/US7801838B2/en not_active Expired - Fee Related
- 2003-07-02 US US10/610,574 patent/US20040006748A1/en not_active Abandoned
- 2003-07-02 US US10/610,799 patent/US20040199495A1/en not_active Abandoned
-
2010
- 2010-08-13 US US12/806,465 patent/US8001066B2/en not_active Expired - Fee Related
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6308222B1 (en) * | 1996-06-03 | 2001-10-23 | Microsoft Corporation | Transcoding of audio data |
US5806032A (en) * | 1996-06-14 | 1998-09-08 | Lucent Technologies Inc. | Compilation of weighted finite-state transducers from decision trees |
US6732183B1 (en) * | 1996-12-31 | 2004-05-04 | Broadware Technologies, Inc. | Video and audio streaming for multiple users |
US6185531B1 (en) * | 1997-01-09 | 2001-02-06 | Gte Internetworking Incorporated | Topic indexing method |
US6006184A (en) * | 1997-01-28 | 1999-12-21 | Nec Corporation | Tree structured cohort selection for speaker recognition system |
US6052657A (en) * | 1997-09-09 | 2000-04-18 | Dragon Systems, Inc. | Text segmentation and identification of topic using language models |
US6073096A (en) * | 1998-02-04 | 2000-06-06 | International Business Machines Corporation | Speaker adaptation system and method based on class-specific pre-clustering training speakers |
US7257528B1 (en) * | 1998-02-13 | 2007-08-14 | Zi Corporation Of Canada, Inc. | Method and apparatus for Chinese character text input |
US6076053A (en) * | 1998-05-21 | 2000-06-13 | Lucent Technologies Inc. | Methods and apparatus for discriminative training and adaptation of pronunciation networks |
US6347295B1 (en) * | 1998-10-26 | 2002-02-12 | Compaq Computer Corporation | Computer method and apparatus for grapheme-to-phoneme rule-set-generation |
US6718305B1 (en) * | 1999-03-19 | 2004-04-06 | Koninklijke Philips Electronics N.V. | Specifying a tree structure for speech recognizers using correlation between regression classes |
US6434520B1 (en) * | 1999-04-16 | 2002-08-13 | International Business Machines Corporation | System and method for indexing and querying audio archives |
US20040024739A1 (en) * | 1999-06-15 | 2004-02-05 | Kanisa Inc. | System and method for implementing a knowledge management system |
US6711541B1 (en) * | 1999-09-07 | 2004-03-23 | Matsushita Electric Industrial Co., Ltd. | Technique for developing discriminative sound units for speech recognition and allophone modeling |
US6571208B1 (en) * | 1999-11-29 | 2003-05-27 | Matsushita Electric Industrial Co., Ltd. | Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training |
US20020010575A1 (en) * | 2000-04-08 | 2002-01-24 | International Business Machines Corporation | Method and system for the automatic segmentation of an audio stream into semantic or syntactic units |
US20020001261A1 (en) * | 2000-04-21 | 2002-01-03 | Yoshinori Matsui | Data playback apparatus |
US20060129541A1 (en) * | 2002-06-11 | 2006-06-15 | Microsoft Corporation | Dynamically updated quick searches and strategies |
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7487094B1 (en) * | 2003-06-20 | 2009-02-03 | Utopy, Inc. | System and method of call classification with context modeling based on composite words |
US8095478B2 (en) | 2004-04-29 | 2012-01-10 | Microsoft Corporation | Method and system for calculating importance of a block within a display page |
US7363279B2 (en) * | 2004-04-29 | 2008-04-22 | Microsoft Corporation | Method and system for calculating importance of a block within a display page |
US8401977B2 (en) | 2004-04-29 | 2013-03-19 | Microsoft Corporation | Method and system for calculating importance of a block within a display page |
US20080256068A1 (en) * | 2004-04-29 | 2008-10-16 | Microsoft Corporation | Method and system for calculating importance of a block within a display page |
US20050246296A1 (en) * | 2004-04-29 | 2005-11-03 | Microsoft Corporation | Method and system for calculating importance of a block within a display page |
US20060112128A1 (en) * | 2004-11-23 | 2006-05-25 | Palo Alto Research Center Incorporated | Methods, apparatus, and program products for performing incremental probabilistic latent semantic analysis |
US7529765B2 (en) | 2004-11-23 | 2009-05-05 | Palo Alto Research Center Incorporated | Methods, apparatus, and program products for performing incremental probabilistic latent semantic analysis |
US20090306797A1 (en) * | 2005-09-08 | 2009-12-10 | Stephen Cox | Music analysis |
US7504969B2 (en) * | 2006-07-11 | 2009-03-17 | Data Domain, Inc. | Locality-based stream segmentation for data deduplication |
US20080013830A1 (en) * | 2006-07-11 | 2008-01-17 | Data Domain, Inc. | Locality-based stream segmentation for data deduplication |
US20110246183A1 (en) * | 2008-12-15 | 2011-10-06 | Kentaro Nagatomo | Topic transition analysis system, method, and program |
US8670978B2 (en) * | 2008-12-15 | 2014-03-11 | Nec Corporation | Topic transition analysis system, method, and program |
US10250750B2 (en) | 2008-12-19 | 2019-04-02 | Genesys Telecommunications Laboratories, Inc. | Method and system for integrating an interaction management system with a business rules management system |
US9924038B2 (en) | 2008-12-19 | 2018-03-20 | Genesys Telecommunications Laboratories, Inc. | Method and system for integrating an interaction management system with a business rules management system |
US9538010B2 (en) | 2008-12-19 | 2017-01-03 | Genesys Telecommunications Laboratories, Inc. | Method and system for integrating an interaction management system with a business rules management system |
US20100198598A1 (en) * | 2009-02-05 | 2010-08-05 | Nuance Communications, Inc. | Speaker Recognition in a Speech Recognition System |
US9992336B2 (en) | 2009-07-13 | 2018-06-05 | Genesys Telecommunications Laboratories, Inc. | System for analyzing interactions and reporting analytic results to human operated and system interfaces in real time |
US8954434B2 (en) * | 2010-01-08 | 2015-02-10 | Microsoft Corporation | Enhancing a document with supplemental information from another document |
US20110173210A1 (en) * | 2010-01-08 | 2011-07-14 | Microsoft Corporation | Identifying a topic-relevant subject |
US10311893B2 (en) | 2011-06-17 | 2019-06-04 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content |
US20120323575A1 (en) * | 2011-06-17 | 2012-12-20 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content |
US11069367B2 (en) | 2011-06-17 | 2021-07-20 | Shopify Inc. | Speaker association with a visual representation of spoken content |
US9747925B2 (en) | 2011-06-17 | 2017-08-29 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content |
US9053750B2 (en) * | 2011-06-17 | 2015-06-09 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content |
US9613636B2 (en) | 2011-06-17 | 2017-04-04 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content |
US8819023B1 (en) * | 2011-12-22 | 2014-08-26 | Reputation.Com, Inc. | Thematic clustering |
US8886651B1 (en) * | 2011-12-22 | 2014-11-11 | Reputation.Com, Inc. | Thematic clustering |
US10298766B2 (en) | 2012-11-29 | 2019-05-21 | Genesys Telecommunications Laboratories, Inc. | Workload distribution with resource awareness |
US9912816B2 (en) | 2012-11-29 | 2018-03-06 | Genesys Telecommunications Laboratories, Inc. | Workload distribution with resource awareness |
US9542936B2 (en) | 2012-12-29 | 2017-01-10 | Genesys Telecommunications Laboratories, Inc. | Fast out-of-vocabulary search in automatic speech recognition systems |
US10290301B2 (en) | 2012-12-29 | 2019-05-14 | Genesys Telecommunications Laboratories, Inc. | Fast out-of-vocabulary search in automatic speech recognition systems |
US10025773B2 (en) * | 2015-07-24 | 2018-07-17 | International Business Machines Corporation | System and method for natural language processing using synthetic text |
US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
EP3583511A4 (en) * | 2017-02-20 | 2020-11-25 | Gong I.O Ltd. | Unsupervised automated topic detection, segmentation and labeling of conversations |
US11568231B2 (en) * | 2017-12-08 | 2023-01-31 | Raytheon Bbn Technologies Corp. | Waypoint detection for a contact center analysis system |
US11276407B2 (en) | 2018-04-17 | 2022-03-15 | Gong.Io Ltd. | Metadata-based diarization of teleconferences |
Also Published As
Publication number | Publication date |
---|---|
US20040024582A1 (en) | 2004-02-05 |
US20040199495A1 (en) | 2004-10-07 |
US20040024585A1 (en) | 2004-02-05 |
US20040006748A1 (en) | 2004-01-08 |
US7290207B2 (en) | 2007-10-30 |
US20110004576A1 (en) | 2011-01-06 |
US20040117188A1 (en) | 2004-06-17 |
US7801838B2 (en) | 2010-09-21 |
US20040030550A1 (en) | 2004-02-12 |
US20040006481A1 (en) | 2004-01-08 |
US8001066B2 (en) | 2011-08-16 |
US20040006737A1 (en) | 2004-01-08 |
US7337115B2 (en) | 2008-02-26 |
US20040006576A1 (en) | 2004-01-08 |
Similar Documents
Publication | Title |
---|---|
US20040024598A1 (en) | Thematic segmentation of speech |
Kim et al. | Two-stage multi-intent detection for spoken language understanding | |
Chelba et al. | Retrieval and browsing of spoken content | |
Gudivada et al. | Big data driven natural language processing research and applications | |
Xu et al. | A bidirectional lstm approach with word embeddings for sentence boundary detection | |
Kumar et al. | A comprehensive review of recent automatic speech summarization and keyword identification techniques | |
Ghannay et al. | Combining continuous word representation and prosodic features for asr error prediction | |
Moyal et al. | Phonetic search methods for large speech databases | |
Sharma et al. | A comprehensive empirical review of modern voice activity detection approaches for movies and TV shows | |
El Hannani et al. | Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection | |
Jeong et al. | Multi-domain spoken language understanding with transfer learning | |
Errattahi et al. | System-independent asr error detection and classification using recurrent neural network | |
Zhang et al. | Automatic parliamentary meeting minute generation using rhetorical structure modeling | |
Celikyilmaz et al. | An empirical investigation of word class-based features for natural language understanding | |
Ghannay et al. | A study of continuous space word and sentence representations applied to ASR error detection | |
Suresh et al. | Approximating probabilistic models as weighted finite automata | |
Errattahi et al. | Incorporating label dependency for ASR error detection via RNN | |
Anandika et al. | Review on usage of Hidden Markov Model in natural language processing | |
Anidjar et al. | Speech and multilingual natural language framework for speaker change detection and diarization | |
Velikovich | Semantic model for fast tagging of word lattices | |
McTear et al. | Spoken language understanding | |
CN114595324A (en) | Method, device, terminal and non-transitory storage medium for power grid service data domain division | |
JP2006107353A (en) | Information processor, information processing method, recording medium and program | |
Nazarov et al. | Algorithms to increase data reliability in video transcription | |
Anidjar et al. | A thousand words are worth more than one recording: Nlp based speaker change point detection |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: BBNT SOLUTIONS LLC, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SRIVASTAVA, AMIT; KUBALA, FRANCIS; REEL/FRAME: 014336/0622 Effective date: 20030701 |
AS | Assignment | Owner name: FLEET NATIONAL BANK, AS AGENT, MASSACHUSETTS Free format text: PATENT & TRADEMARK SECURITY AGREEMENT; ASSIGNOR: BBNT SOLUTIONS LLC; REEL/FRAME: 014624/0196 Effective date: 20040326 |
AS | Assignment | Owner name: BBN TECHNOLOGIES CORP., MASSACHUSETTS Free format text: MERGER; ASSIGNOR: BBNT SOLUTIONS LLC; REEL/FRAME: 017274/0318 Effective date: 20060103 |
AS | Assignment | Owner name: BBNT SOLUTIONS LLC, MASSACHUSETTS Free format text: CORRECTION OF ASSIGNEE ADDRESS RECORDED AT REEL/FRAME 014336/0622; ASSIGNORS: SRIVASTAVA, AMIT; KUBALA, FRANCIS; REEL/FRAME: 019682/0623 Effective date: 20030701 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: BBN TECHNOLOGIES CORP. (AS SUCCESSOR BY MERGER TO Free format text: RELEASE OF SECURITY INTEREST; ASSIGNOR: BANK OF AMERICA, N.A. (SUCCESSOR BY MERGER TO FLEET NATIONAL BANK); REEL/FRAME: 023427/0436 Effective date: 20091026 |