US20090222442A1 - User-directed navigation of multimedia search results - Google Patents

User-directed navigation of multimedia search results

Info

Publication number
US20090222442A1
US 2009/0222442 A1 (application US 12/391,770)
Authority
US
United States
Prior art keywords
content
segment
audio
segments
media content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/391,770
Inventor
Henry Houh
Jeffrey Nathan Stern
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cxense Asa
Original Assignee
Henry Houh
Jeffrey Nathan Stern
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 11/395,732 (published as US 2007/0106646 A1)
Application filed by Henry Houh and Jeffrey Nathan Stern
Priority to US 12/391,770
Publication of US 2009/0222442 A1
Assigned to RAMP HOLDINGS, INC. Assignment of assignors interest (see document for details). Assignors: HOUH, HENRY; STERN, JEFFREY N.
Assigned to CXENSE ASA. Assignment of assignors interest (see document for details). Assignor: RAMP HOLDINGS INC.
Priority to US 15/047,372 (published as US 2016/0188577 A1)
Current legal status: Abandoned


Classifications

    • G - PHYSICS › G06 - COMPUTING; CALCULATING OR COUNTING › G06F - ELECTRIC DIGITAL DATA PROCESSING › G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/41 - Indexing; Data structures therefor; Storage structures (retrieval of multimedia data, e.g. slideshows comprising image and additional audio data)
    • G06F 16/489 - Retrieval of multimedia data characterised by using metadata, using time information
    • G06F 16/583 - Retrieval of still image data characterised by using metadata automatically derived from the content
    • G06F 16/685 - Retrieval of audio data characterised by using metadata automatically derived from the content, using automatically derived transcript of audio data, e.g. lyrics
    • G06F 16/745 - Browsing or visualisation of the internal structure of a single video sequence
    • G06F 16/78 - Retrieval of video data characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/7834 - Retrieval of video data characterised by using metadata automatically derived from the content, using audio features
    • G06F 16/7844 - Retrieval of video data characterised by using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data

Definitions

  • aspects of the invention relate to methods and apparatus for generating and using enhanced metadata in search-driven applications.
  • Metadata, which can be broadly defined as “data about data,” refers to the searchable definitions used to locate information. This issue is particularly relevant to searches on the Web, where metatags may determine the ease with which a particular Web site is located by searchers. Metadata that is embedded with content is called embedded metadata.
  • a data repository typically stores the metadata detached from the data.
  • Results obtained from search engine queries are limited to metadata information stored in a data repository, referred to as an index.
  • the metadata information that describes the audio content or the video content is typically limited to information provided by the content publisher.
  • the metadata information associated with audio/video podcasts generally consists of a URL link to the podcast, title, and a brief summary of its content. If this limited information fails to satisfy a search query, the search engine is not likely to provide the corresponding audio/video podcast as a search result even if the actual content of the audio/video podcast satisfies the query.
  • the invention features an automated method and apparatus for generating metadata enhanced for audio, video or both (“audio/video”) search-driven applications.
  • the apparatus includes a media indexer that obtains a media file or stream (“media file/stream”), applies one or more automated media processing techniques to the media file/stream, combines the results of the media processing into metadata enhanced for audio/video search, and stores the enhanced metadata in a searchable index or other data repository.
  • the media file/stream can be an audio/video podcast, for example.
  • the invention features a computerized method and apparatus for timed tagging of media content.
  • the method and apparatus can include the steps of, or structure for, obtaining at least one keyword tag associated with discrete media content; generating a timed segment index of discrete media content, the timed segment index identifying content segments of the discrete media content and corresponding timing boundaries of the content segments; searching the timed segment index for a match to the at least one keyword tag, the match corresponding to at least one of the content segments identified in the segment index; and generating a timed tag index that includes the at least one keyword tag and the timing boundaries corresponding to the at least one content segment of the discrete media content containing the match.
  • One or more of the content segments identified in the timed segment index can include word segments, audio speech segments, video segments, or marker segments.
  • one or more of the content segments identified in the timed segment index can include audio corresponding to an individual word, audio corresponding to a phrase, audio corresponding to a sentence, audio corresponding to a paragraph, audio corresponding to a story, audio corresponding to a topic, audio within a range of volume levels, audio of an identified speaker, audio during a speaker turn, audio associated with a speaker emotion, audio separated by sound gaps, audio separated by markers embedded within the media content or audio corresponding to a named entity.
  • One or more of the content segments identified in the timed segment index can also include video of individual scenes, watermarks, recognized objects, recognized faces, overlay text or video separated by markers embedded within the media content.
  • the computerized method and apparatus can further include the steps of, or structure for, generating a timed segment index of discrete media content, the timed segment index identifying text of audible words from content segments of the discrete media content and corresponding timing boundaries of the content segments; searching the timed segment index for text matching the at least one keyword tag, the matching text corresponding to at least one of the content segments identified in the segment index; and generating a timed tag index that includes the at least one keyword tag and the timing boundaries corresponding to the at least one content segment of the discrete media content containing the matching text.
  • the text of audible words from content segments of the discrete media content can be derived from the discrete media content using one or more media processing techniques or obtained from closed caption data associated with the discrete media content.
  • the computerized method and apparatus can further include the steps of, or structure for, aligning the text from the closed caption data to timing boundaries corresponding to the content segments of the discrete media content; and generating the timed segment index of discrete media content, the timed segment index identifying the text from the closed caption data aligned to the corresponding timing boundaries of the content segments.
  • the computerized method and apparatus can further include the step of, or structure for, receiving the keyword tag from a content provider, the keyword tag being associated with the discrete media content by the content provider.
  • the computerized method and apparatus can further include the step of, or structure for, receiving the keyword tag from a content provider, the keyword tag being uploaded along with the discrete media content by the content provider.
  • the computerized method and apparatus can further include the step of, or structure for, receiving the keyword tag from a content provider, the keyword tag being embedded in a content descriptor corresponding to the discrete media content provided by the content provider.
  • the computerized method and apparatus can further include the step of, or structure for, generating the keyword tag from the timed segment index.
  • the content segments identified in the timed segment index can include word segments, such that each word segment identifies the text of an audible word and the corresponding timing boundaries of the audible word within the discrete media content.
  • the computerized method and apparatus can further include the steps of, or structure for, comparing the at least one keyword tag to the text of the audible word identified in each of the word segments; obtaining the corresponding timing boundaries for at least one of the word segments including the text of an audible word matching to the at least one keyword tag; identifying a broader content segment from the timed segment index having timing boundaries that include the corresponding timing boundaries of the word segment matching to the at least one keyword tag; and mapping the timing boundaries of the broader content segment to the at least one keyword tag in the timed tag index.
  • the computerized method and apparatus can further include the step of, or structure for, presenting a search result that enables a user to arbitrarily select and commence playback of the discrete media content at any of the content segments associated with the at least one keyword tag using the timing boundaries identified within the timed tag index.
  • FIG. 1A is a diagram illustrating an apparatus and method for generating metadata enhanced for audio/video search-driven applications.
  • FIG. 1B is a diagram illustrating an example of a media indexer.
  • FIG. 2 is a diagram illustrating an example of metadata enhanced for audio/video search-driven applications.
  • FIG. 3 is a diagram illustrating an example of a search snippet that enables user-directed navigation of underlying media content.
  • FIGS. 4 and 5 are diagrams illustrating a computerized method and apparatus for generating search snippets that enable user navigation of the underlying media content.
  • FIG. 6A is a diagram illustrating another example of a search snippet that enables user navigation of the underlying media content.
  • FIGS. 6B and 6C are diagrams illustrating a method for navigating media content using the search snippet of FIG. 6A .
  • FIG. 7 is a diagram that illustrates the concept of a tagged media file.
  • FIG. 8A is a diagram that illustrates a system including an apparatus for timed tagging of media content.
  • FIG. 8B is a flow diagram that illustrates a method for timed tagging of media content according to the apparatus of FIG. 8A .
  • FIG. 9 is a diagram that illustrates an exemplary timed segment index for the media clip of FIG. 7 .
  • FIGS. 10A and 10B are diagrams that conceptually illustrate a timed tag index.
  • FIG. 11 is a diagram illustrating a system for accessing timed tagged media content from a search engine.
  • the invention features an automated method and apparatus for generating metadata enhanced for audio/video search-driven applications.
  • the apparatus includes a media indexer that obtains a media file/stream (e.g., audio/video podcasts), applies one or more automated media processing techniques to the media file/stream, combines the results of the media processing into metadata enhanced for audio/video search, and stores the enhanced metadata in a searchable index or other data repository.
  • FIG. 1A is a diagram illustrating an apparatus and method for generating metadata enhanced for audio/video search-driven applications.
  • the media indexer 10 cooperates with a descriptor indexer 50 to generate the enhanced metadata 30 .
  • a content descriptor 25 is received and processed by both the media indexer 10 and the descriptor indexer 50 .
  • the metadata 27 corresponding to one or more audio/video podcasts includes a title, summary, and location (e.g., URL link) for each podcast.
  • the descriptor indexer 50 extracts the descriptor metadata 27 from the text and embedded metatags of the content descriptor 25 and outputs it to a combiner 60 .
  • the content descriptor 25 can also be a simple web page link to a media file.
  • the link can contain information in the text of the link that describes the file and can also include attributes in the HTML that describe the target media file.
  • the media indexer 10 reads the metadata 27 from the content descriptor 25 and downloads the audio/video podcast 20 from the identified location.
  • the media indexer 10 applies one or more automated media processing techniques to the downloaded podcast and outputs the combined results to the combiner 60 .
  • the metadata information from the media indexer 10 and the descriptor indexer 50 are combined in a predetermined format to form the enhanced metadata 30 .
  • the enhanced metadata 30 is then stored in the index 40 accessible to search-driven applications such as those disclosed herein.
  • the descriptor indexer 50 is optional and the enhanced metadata is generated by the media indexer 10 .
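  • As an illustrative sketch of the FIG. 1A pipeline, the descriptor metadata and the media-derived metadata could be combined along the following lines; the TypeScript names and shapes below are assumptions for illustration, not terms from the disclosure:

```typescript
// Hedged sketch of the FIG. 1A flow: descriptor metadata and media-derived
// metadata are merged by the combiner into enhanced metadata. The type and
// function names are assumptions, not names from the patent.

interface DescriptorMetadata {
  url: string;      // location of the media file/stream (metadata 27)
  title?: string;
  summary?: string;
}

interface MediaDerivedMetadata {
  wordSegments: unknown[];
  audioSpeechSegments: unknown[];
  videoSegments: unknown[];
}

async function buildEnhancedMetadata(
  descriptor: DescriptorMetadata,                             // output of the descriptor indexer 50
  indexMedia: (url: string) => Promise<MediaDerivedMetadata>  // the media indexer 10
): Promise<DescriptorMetadata & MediaDerivedMetadata> {
  // Media indexer: download the podcast at descriptor.url and derive timed segments.
  const derived = await indexMedia(descriptor.url);
  // Combiner 60: merge both sources into the enhanced metadata 30 stored in the index 40.
  return { ...descriptor, ...derived };
}
```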
  • FIG. 1B is a diagram illustrating an example of a media indexer.
  • the media indexer 10 includes a bank of media processors 100 that are managed by a media indexing controller 110 .
  • the media indexing controller 110 and each of the media processors 100 can be implemented, for example, using a suitably programmed or dedicated processor (e.g., a microprocessor or microcontroller), hardwired logic, an Application Specific Integrated Circuit (ASIC), or a Programmable Logic Device (PLD) (e.g., a Field Programmable Gate Array (FPGA)).
  • a content descriptor 25 is fed into the media indexing controller 110 , which allocates one or more appropriate media processors 100 a . . . 100 n to process the media files/streams 20 identified in the metadata 27 .
  • Each of the assigned media processors 100 obtains the media file/stream (e.g., audio/video podcast) and applies a predefined set of audio or video processing routines to derive a portion of the enhanced metadata from the media content.
  • Examples of known media processors 100 include speech recognition processors 100 a , natural language processors 100 b , video frame analyzers 100 c , non-speech audio analyzers 100 d , marker extractors 100 e and embedded metadata processors 100 f .
  • Other media processors known to those skilled in the art of audio and video analysis can also be implemented within the media indexer.
  • the results of such media processing define timing boundaries of a number of content segments within a media file/stream, including timed word segments 105 a , timed audio speech segments 105 b , timed video segments 105 c , timed non-speech audio segments 105 d , timed marker segments 105 e , as well as miscellaneous content attributes 105 f , for example.
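  • A minimal sketch of how the media indexing controller 110 might fan a media file out to a bank of media processors and collect their timed segments follows; the interface and result shapes are assumptions:

```typescript
// Hedged sketch of the FIG. 1B processor bank: an indexing controller fans the
// media file out to several media processors and collects their timed segments.

interface TimedSegmentResult {
  kind: string;          // e.g. "word", "audioSpeech", "video", "nonSpeechAudio", "marker"
  startOffset: number;   // seconds (or frames) into the media
  endOffset: number;
  payload?: unknown;     // e.g. recognized text, speaker label, scene type
}

interface MediaProcessor {
  name: string;                                             // e.g. "speechRecognition", "videoFrameAnalyzer"
  process(mediaUrl: string): Promise<TimedSegmentResult[]>;
}

// The controller (110) allocates the processors (100a..100n) and gathers their output.
async function runMediaIndexer(
  mediaUrl: string,
  processors: MediaProcessor[]
): Promise<Record<string, TimedSegmentResult[]>> {
  const results: Record<string, TimedSegmentResult[]> = {};
  await Promise.all(
    processors.map(async (p) => {
      results[p.name] = await p.process(mediaUrl);
    })
  );
  return results;
}
```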
  • FIG. 2 is a diagram illustrating an example of metadata enhanced for audio/video search-driven applications.
  • the enhanced metadata 200 includes metadata 210 corresponding to the underlying media content generally.
  • metadata 210 can include a URL 215 a , title 215 b , summary 215 c , and miscellaneous content attributes 215 d .
  • Such information can be obtained from a content descriptor by the descriptor indexer 50 .
  • An example of a content descriptor is a Really Simple Syndication (RSS) document that is descriptive of one or more audio/video podcasts.
  • such information can be extracted by an embedded metadata processor 100 f from header fields embedded within the media file/stream according to a predetermined format.
  • the enhanced metadata 200 further identifies individual segments of audio/video content and timing information that defines the boundaries of each segment within the media file/stream. For example, in FIG. 2 , the enhanced metadata 200 includes metadata that identifies a number of possible content segments within a typical media file/stream, namely word segments, audio speech segments, video segments, non-speech audio segments, and/or marker segments, for example.
  • the metadata 220 includes descriptive parameters for each of the timed word segments 225 , including a segment identifier 225 a , the text of an individual word 225 b , timing information defining the boundaries of that content segment (i.e., start offset 225 c , end offset 225 d , and/or duration 225 e ), and optionally a confidence score 225 f .
  • the segment identifier 225 a uniquely identifies each word segment amongst the content segments identified within the metadata 200 .
  • the text of the word segment 225 b can be determined using a speech recognition processor 100 a or parsed from closed caption data included with the media file/stream.
  • the start offset 225 c is an offset for indexing into the audio/video content to the beginning of the content segment.
  • the end offset 225 d is an offset for indexing into the audio/video content to the end of the content segment.
  • the duration 225 e indicates the duration of the content segment.
  • the start offset, end offset and duration can each be represented as a timestamp, frame number or value corresponding to any other indexing scheme known to those skilled in the art.
  • the confidence score 225 f is a relative ranking (typically between 0 and 1) provided by the speech recognition processor 100 a as to the accuracy of the recognized word.
  • the metadata 230 includes descriptive parameters for each of the timed audio speech segments 235 , including a segment identifier 235 a , an audio speech segment type 235 b , timing information defining the boundaries of the content segment (e.g., start offset 235 c , end offset 235 d , and/or duration 235 e ), and optionally a confidence score 235 f .
  • the segment identifier 235 a uniquely identifies each audio speech segment amongst the content segments identified within the metadata 200 .
  • the audio speech segment type 235 b can be a numeric value or string that indicates whether the content segment includes audio corresponding to a phrase, a sentence, a paragraph, story or topic, particular gender, and/or an identified speaker.
  • the audio speech segment type 235 b and the corresponding timing information can be obtained using a natural language processor 100 b capable of processing the timed word segments from the speech recognition processors 100 a and/or the media file/stream 20 itself.
  • the start offset 235 c is an offset for indexing into the audio/video content to the beginning of the content segment.
  • the end offset 235 d is an offset for indexing into the audio/video content to the end of the content segment.
  • the duration 235 e indicates the duration of the content segment.
  • the start offset, end offset and duration can each be represented as a timestamp, frame number or value corresponding to any other indexing scheme known to those skilled in the art.
  • the confidence score 235 f can be in the form of a statistical value (e.g., average, mean, variance, etc.) calculated from the individual confidence scores 225 f of the individual word segments.
  • the metadata 240 includes descriptive parameters for each of the timed video segments 245 , including a segment identifier 245 a , a video segment type 245 b , and timing information defining the boundaries of the content segment (e.g., start offset 245 c , end offset 245 d , and/or duration 245 e ).
  • the segment identifier 245 a uniquely identifies each video segment amongst the content segments identified within the metadata 200 .
  • the video segment type 245 b can be a numeric value or string that indicates whether the content segment corresponds to video of an individual scene, watermark, recognized object, recognized face, or overlay text.
  • the video segment type 245 b and the corresponding timing information can be obtained using a video frame analyzer 100 c capable of applying one or more image processing techniques.
  • the start offset 245 c is an offset for indexing into the audio/video content to the beginning of the content segment.
  • the end offset 245 d is an offset for indexing into the audio/video content to the end of the content segment.
  • the duration 245 e indicates the duration of the content segment.
  • the start offset, end offset and duration can each be represented as a timestamp, frame number or value corresponding to any other indexing scheme known to those skilled in the art.
  • the metadata 250 includes descriptive parameters for each of the timed non-speech audio segments 255 , including a segment identifier 255 a , a non-speech audio segment type 255 b , and timing information defining the boundaries of the content segment (e.g., start offset 255 c , end offset 255 d , and/or duration 255 e ).
  • the segment identifier 255 a uniquely identifies each non-speech audio segment amongst the content segments identified within the metadata 200 .
  • the non-speech audio segment type 255 b can be a numeric value or string that indicates whether the content segment corresponds to audio of non-speech sounds, audio associated with a speaker emotion, audio within a range of volume levels, or sound gaps, for example.
  • the non-speech audio segment type 255 b and the corresponding timing information can be obtained using a non-speech audio analyzer 100 d .
  • the start offset 255 c is an offset for indexing into the audio/video content to the beginning of the content segment.
  • the end offset 255 d is an offset for indexing into the audio/video content to the end of the content segment.
  • the duration 255 e indicates the duration of the content segment.
  • the start offset, end offset and duration can each be represented as a timestamp, frame number or value corresponding to any other indexing scheme known to those skilled in the art.
  • the metadata 260 includes descriptive parameters for each of the timed marker segments 265 , including a segment identifier 265 a , a marker segment type 265 b , and timing information defining the boundaries of the content segment (e.g., start offset 265 c , end offset 265 d , and/or duration 265 e ).
  • the segment identifier 265 a uniquely identifies each marker segment amongst the content segments identified within the metadata 200 .
  • the marker segment type 265 b can be a numeric value or string indicating that the content segment corresponds to a predefined chapter or other marker within the media content (e.g., audio/video podcast).
  • the marker segment type 265 b and the corresponding timing information can be obtained using a marker extractor 100 e to obtain metadata in the form of markers (e.g., chapters) that are embedded within the media content in a manner known to those skilled in the art.
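  • The enhanced metadata of FIG. 2 could be modeled with shapes along the following lines; the field names (startOffset, endOffset, confidence, and so on) are illustrative rather than the literal schema of the disclosure:

```typescript
// Hedged sketch of the enhanced metadata of FIG. 2 as TypeScript shapes.

interface TimedSegment {
  segmentId: string;   // unique among all content segments in the document
  startOffset: number; // timestamp, frame number, or other indexing value
  endOffset?: number;  // either an end offset or a duration may be given
  duration?: number;
}

interface WordSegment extends TimedSegment {
  text: string;        // recognized word or word parsed from closed captions
  confidence?: number; // 0..1 score from the speech recognition processor
}

interface AudioSpeechSegment extends TimedSegment {
  type: "phrase" | "sentence" | "paragraph" | "story" | "topic" | "gender" | "speaker";
  confidence?: number; // statistic over the contained word-segment scores
}

interface VideoSegment extends TimedSegment {
  type: "scene" | "watermark" | "object" | "face" | "overlayText";
}

interface EnhancedMetadata {
  url: string;         // location of the media file/stream
  title?: string;
  summary?: string;
  wordSegments: WordSegment[];
  audioSpeechSegments: AudioSpeechSegment[];
  videoSegments: VideoSegment[];
  nonSpeechAudioSegments: TimedSegment[];
  markerSegments: TimedSegment[];
}
```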
  • the invention features a computerized method and apparatus for generating and presenting search snippets that enable user-directed navigation of the underlying audio/video content.
  • the method involves obtaining metadata associated with discrete media content that satisfies a search query.
  • the metadata identifies a number of content segments and corresponding timing information derived from the underlying media content using one or more automated media processing techniques.
  • a search result or “snippet” can be generated that enables a user to arbitrarily select and commence playback of the underlying media content at any of the individual content segments.
  • FIG. 3 is a diagram illustrating an example of a search snippet that enables user-directed navigation of underlying media content.
  • the search snippet 310 includes a text area 320 displaying the text 325 of the words spoken during one or more content segments of the underlying media content.
  • a media player 330 capable of audio/video playback is embedded within the search snippet or alternatively executed in a separate window.
  • the text 325 for each word in the text area 320 is preferably mapped to a start offset of a corresponding word segment identified in the enhanced metadata.
  • this mapping can be implemented using an object (e.g., a SPAN object) associated with the text of each word; the object defines the start offset of the word segment and an event handler.
  • Each start offset can be a timestamp or other indexing value that identifies the start of the corresponding word segment within the media content.
  • the text 325 for a group of words can be mapped to the start offset of a common content segment that contains all of those words.
  • Such content segments can include an audio speech segment, a video segment, or a marker segment, for example, as identified in the enhanced metadata of FIG. 2 .
  • Playback of the underlying media content occurs in response to the user selection of a word and begins at the start offset corresponding to the content segment mapped to the selected word or group of words.
  • User selection can be facilitated, for example, by directing a graphical pointer over the text area 320 using a pointing device and actuating the pointing device once the pointer is positioned over the text 325 of a desired word.
  • the object event handler provides the media player 330 with a set of input parameters, including a link to the media file/stream and the corresponding start offset, and directs the player 330 to commence or otherwise continue playback of the underlying media content at the input start offset.
  • the media player 330 begins playback of the media content at the audio/video segment starting with “state of the union address . . . ”
  • the media player 330 commences playback of the audio/video segment starting with “bush outlined . . . ”
  • An advantage of this aspect of the invention is that a user can read the text of the underlying audio/video content displayed by the search snippet and then actively “jump to” a desired segment of the media content for audio/video playback without having to listen to or view the entire media stream.
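  • One possible browser-side realization of such a snippet, assuming an HTML5 media element and the hypothetical SnippetWord shape below, wraps each displayed word in a SPAN whose click handler seeks the player to the word's start offset:

```typescript
// Hedged browser-side sketch of the FIG. 3 snippet. The patent does not require
// HTML5; this is only one possible realization, and the shapes are assumptions.

interface SnippetWord {
  text: string;
  startOffset: number; // seconds into the underlying media content
}

function renderSnippet(
  words: SnippetWord[],
  player: HTMLMediaElement, // embedded media player 330
  textArea: HTMLElement     // text area 320
): void {
  for (const word of words) {
    const span = document.createElement("span");
    span.textContent = word.text + " ";
    span.style.cursor = "pointer";
    // Event handler: jump playback to the content segment mapped to this word.
    span.addEventListener("click", () => {
      player.currentTime = word.startOffset;
      void player.play();
    });
    textArea.appendChild(span);
  }
}
```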
  • FIGS. 4 and 5 are diagrams illustrating a computerized method and apparatus for generating search snippets that enable user navigation of the underlying media content.
  • a client 410 interfaces with a search engine module 420 for searching an index 430 for desired audio/video content.
  • the index includes a plurality of metadata associated with a number of discrete media content and enhanced for audio/video search as shown and described with reference to FIG. 2 .
  • the search engine module 420 also interfaces with a snippet generator module 440 that processes metadata satisfying a search query to generate the navigable search snippet for audio/video content for the client 410 .
  • Each of these modules can be implemented, for example, using a suitably programmed or dedicated processor (e.g., a microprocessor or microcontroller), hardwired logic, an Application Specific Integrated Circuit (ASIC), or a Programmable Logic Device (PLD) (e.g., a Field Programmable Gate Array (FPGA)).
  • FIG. 5 is a flow diagram illustrating a computerized method for generating search snippets that enable user-directed navigation of the underlying audio/video content.
  • the search engine 420 conducts a keyword search of the index 430 for a set of enhanced metadata documents satisfying the search query.
  • the search engine 420 obtains the enhanced metadata documents descriptive of one or more discrete media files/streams (e.g., audio/video podcasts).
  • the snippet generator 440 obtains an enhanced metadata document corresponding to the first media file/stream in the set.
  • the enhanced metadata identifies content segments and corresponding timing information defining the boundaries of each segment within the media file/stream.
  • the snippet generator 440 reads or parses the enhanced metadata document to obtain information on each of the content segments identified within the media file/stream.
  • the information obtained preferably includes the location of the underlying media content (e.g. URL), a segment identifier, a segment type, a start offset, an end offset (or duration), the word or the group of words spoken during that segment, if any, and an optional confidence score.
  • Step 530 is an optional step in which the snippet generator 440 makes a determination as to whether the information obtained from the enhanced metadata is sufficiently accurate to warrant further search and/or presentation as a valid search snippet.
  • each of the word segments 225 includes a confidence score 225 f assigned by the speech recognition processor 100 a .
  • Each confidence score is a relative ranking (typically between 0 and 1) as to the accuracy of the recognized text of the word segment.
  • for example, a statistical value (e.g., average, mean, variance, etc.) can be calculated from the individual confidence scores of the word segments and compared against a predetermined threshold.
  • if the statistical value falls below the threshold, the process continues at steps 535 and 525 to obtain and read/parse the enhanced metadata document corresponding to the next media file/stream identified in the search at step 510 .
  • otherwise, the process continues at step 540 .
  • the snippet generator 440 determines a segment type preference.
  • the segment type preference indicates which types of content segments to search and present as snippets.
  • the segment type preference can include a numeric value or string corresponding to one or more of the segment types. For example, if the segment type preference is defined to be one of the audio speech segment types, e.g., “story,” the enhanced metadata is searched on a story-by-story basis for a match to the search query and the resulting snippets are also presented on a story-by-story basis. In other words, each of the content segments identified in the metadata as type “story” is individually searched for a match to the search query and also presented in a separate search snippet if a match is found.
  • the segment type preference can alternatively be defined to be one of the video segment types, e.g., individual scene.
  • the segment type preference can be fixed programmatically or user configurable.
  • the snippet generator 440 obtains the metadata information corresponding to a first content segment of the preferred segment type (e.g., the first story segment).
  • the metadata information for the content segment preferably includes the location of the underlying media file/stream, a segment identifier, the preferred segment type, a start offset, an end offset (or duration) and an optional confidence score.
  • the start offset and the end offset/duration define the timing boundaries of the content segment.
  • the text of words spoken during that segment, if any, can be determined by identifying each of the word segments falling within the start and end offsets. For example, if the underlying media content is an audio/video podcast of a news program and the segment preference is “story,” the metadata information for the first content segment includes the text of the word segments spoken during the first news story.
  • Step 550 is an optional step in which the snippet generator 440 makes a determination as to whether the metadata information for the content segment is sufficiently accurate to warrant further search and/or presentation as a valid search snippet.
  • This step is similar to step 530 except that the confidence score is a statistical value (e.g., average, mean, variance, etc.) calculated from the individual confidence scores of the word segments 225 falling within the timing boundaries of the content segment.
  • if the confidence score of the metadata information for the content segment falls below a predetermined threshold, the process continues at step 555 to obtain the metadata information corresponding to a next content segment of the preferred segment type. If there are no more content segments of the preferred segment type, the process continues at step 535 to obtain the enhanced metadata document corresponding to the next media file/stream identified in the search at step 510 . Conversely, if the confidence score of the metadata information for the content segment equals or exceeds the predetermined threshold, the process continues at step 560 .
  • the snippet generator 440 compares the text of the words spoken during the selected content segment, if any, to the keyword(s) of the search query. If the text derived from the content segment does not contain a match to the keyword search query, the metadata information for that segment is discarded. Otherwise, the process continues at optional step 565 .
  • the snippet generator 440 trims the text of the content segment (as determined at step 545 ) to fit within the boundaries of the display area (e.g., text area 320 of FIG. 3 ).
  • the text can be trimmed by locating the word(s) matching the search query and limiting the number of additional words before and after.
  • the text can be trimmed by locating the word(s) matching the search query, identifying another content segment that has a duration shorter than the segment type preference and contains the matching word(s), and limiting the displayed text of the search snippet to that of the content segment of shorter duration. For example, assuming that the segment type preference is of type “story,” the displayed text of the search snippet can be limited to that of segment type “sentence” or “paragraph”.
  • the snippet generator 440 filters the text of individual words from the search snippet according to their confidence scores. For example, in FIG. 2 , a confidence score 225 f is assigned to each of the word segments to represent a relative ranking that corresponds to the accuracy of the text of the recognized word. For each word in the text of the content segment, the confidence score from the corresponding word segment 225 is compared against a predetermined threshold value. If the confidence score for a word segment falls below the threshold, the text for that word segment is replaced with a predefined symbol (e.g., —). Otherwise no change is made to the text for that word segment.
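  • A minimal sketch of this word-level confidence filter, with an assumed threshold and the placeholder symbol described above, might look like this:

```typescript
// Hedged sketch of the word-level confidence filter: words whose recognizer
// confidence falls below a threshold are replaced with a placeholder symbol.
// The function name and threshold are assumptions.

interface ScoredWord {
  text: string;
  confidence?: number; // 0..1 score from the speech recognition processor
}

function maskLowConfidenceWords(words: ScoredWord[], threshold = 0.5): string {
  return words
    .map((w) => (w.confidence !== undefined && w.confidence < threshold ? "\u2014" : w.text))
    .join(" ");
}
```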
  • the snippet generator 440 adds the resulting metadata information for the content segment to a search result for the underlying media stream/file.
  • Each enhanced metadata document that is returned from the search engine can have zero, one or more content segments containing a match to the search query.
  • the corresponding search result associated with the media file/stream can also have zero, one or more search snippets associated with it.
  • An example of a search result that includes no search snippets occurs when the metadata of the original content descriptor contains the search term, but the timed word segments 105 a of FIG. 2 do not.
  • The process then returns to step 555 to obtain the metadata information corresponding to the next content segment of the preferred segment type. If there are no more content segments of the preferred segment type, the process continues at step 535 to obtain the enhanced metadata document corresponding to the next media file/stream identified in the search at step 510 . If there are no further metadata results to process, the process continues at optional step 582 to rank the search results before sending them to the client 410 .
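  • The per-segment processing of FIG. 5 (roughly steps 540 through 580 ) can be condensed into a sketch such as the following; all shapes, helper names, and thresholds are assumptions:

```typescript
// Hedged, condensed sketch: filter segments by preferred type, check an
// aggregate confidence, match the query against the words spoken in the
// segment, and emit a snippet.

interface Word { text: string; startOffset: number; endOffset: number; confidence?: number }
interface Segment { type: string; startOffset: number; endOffset: number }
interface MetadataDoc { url: string; wordSegments: Word[]; contentSegments: Segment[] }
interface Snippet { url: string; startOffset: number; text: string; confidence: number }

function generateSnippets(
  doc: MetadataDoc,
  query: string,
  preferredType: string,   // segment type preference, e.g. "story"
  minConfidence = 0.5
): Snippet[] {
  const snippets: Snippet[] = [];
  for (const seg of doc.contentSegments.filter((s) => s.type === preferredType)) {
    // Words spoken during this segment are the word segments inside its timing boundaries.
    const words = doc.wordSegments.filter(
      (w) => w.startOffset >= seg.startOffset && w.endOffset <= seg.endOffset
    );
    const confidence = words.length
      ? words.reduce((sum, w) => sum + (w.confidence ?? 1), 0) / words.length
      : 0;
    if (confidence < minConfidence) continue;                        // optional accuracy check
    const text = words.map((w) => w.text).join(" ");
    if (!text.toLowerCase().includes(query.toLowerCase())) continue; // keyword match
    snippets.push({ url: doc.url, startOffset: seg.startOffset, text, confidence });
  }
  return snippets;
}
```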
  • the snippet generator 440 ranks and sorts the list of search results.
  • One factor for determining the rank of the search results can include confidence scores.
  • the search results can be ranked by calculating the sum, average or other statistical value from the confidence scores of the constituent search snippets for each search result and then ranking and sorting accordingly. Search results associated with higher confidence scores can thus be ranked and sorted higher than search results associated with lower confidence scores.
  • Other factors for ranking search results can include the publication date associated with the underlying media content and the number of snippets in each of the search results that contain the search term or terms. Any number of other criteria for ranking search results known to those skilled in the art can also be utilized in ranking the search results for audio/video content.
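  • A simple sketch of confidence-based ranking at step 582 , assuming the average of the constituent snippet scores as the ranking statistic, is shown below:

```typescript
// Hedged sketch: rank search results by a statistic over the confidence
// scores of their constituent snippets (here, the average). The result shape
// and scoring choice are assumptions.

interface RankableResult {
  snippets: { confidence: number }[];
}

function rankResults<T extends RankableResult>(results: T[]): T[] {
  const score = (r: RankableResult): number =>
    r.snippets.length
      ? r.snippets.reduce((sum, s) => sum + s.confidence, 0) / r.snippets.length
      : 0;
  // Higher aggregate confidence sorts earlier; other factors (publication date,
  // snippet count) could be folded into the score.
  return [...results].sort((a, b) => score(b) - score(a));
}
```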
  • the search results can be returned in a number of different ways.
  • the snippet generator 440 can generate a set of instructions for rendering each of the constituent search snippets of the search result as shown in FIG. 3 , for example, from the raw metadata information for each of the identified content segments. Once the instructions are generated, they can be provided to the search engine 420 for forwarding to the client. If a search result includes a long list of snippets, the client can display the search result such that a few of the snippets are displayed along with an indicator that can be selected to show the entire set of snippets for that search result.
  • such a client includes (i) a browser application that is capable of presenting graphical search query forms and resulting pages of search snippets; (ii) a desktop or portable application capable of, or otherwise modified for, subscribing to a service and receiving alerts containing embedded search snippets (e.g., RSS reader applications); or (iii) a search applet embedded within a DVD (Digital Video Disc) that allows users to search a remote or local index to locate and navigate segments of the DVD audio/video content.
  • alternatively, the metadata information contained within the list of search results is forwarded in a raw data format directly to the client 410 , or indirectly to the client 410 via the search engine 420 .
  • the raw metadata information can include any combination of the parameters including a segment identifier, the location of the underlying content (e.g., URL or filename), segment type, the text of the word or group of words spoken during that segment (if any), timing information (e.g., start offset, end offset, and/or duration) and a confidence score (if any).
  • Such information can then be stored or further processed by the client 410 according to application specific requirements.
  • one example of such a client is a desktop application such as the iTunes Music Store available from Apple Computer, Inc.
  • FIG. 6A is a diagram illustrating another example of a search snippet that enables user navigation of the underlying media content.
  • the search snippet 610 is similar to the snippet described with respect to FIG. 3 , and additionally includes a user actuated display element 640 that serves as a navigational control.
  • the navigational control 640 enables a user to control playback of the underlying media content.
  • the text area 620 is optional for displaying the text 625 of the words spoken during one or more segments of the underlying media content as previously discussed with respect to FIG. 3 .
  • Typical fast forward and fast reverse functions cause media players to jump ahead or jump back during media playback in fixed time increments.
  • the navigational control 640 enables a user to jump from one content segment to another segment using the timing information of individual content segments identified in the enhanced metadata.
  • the user-actuated display element 640 can include a number of navigational controls (e.g., Back 642 , Forward 648 , Play 644 , and Pause 646 ).
  • the Back 642 and Forward 648 controls can be configured to enable a user to jump between word segments, audio speech segments, video segments, non-speech audio segments, and marker segments. For example, if an audio/video podcast includes several content segments corresponding to different stories or topics, the user can easily skip such segments until the desired story or topic segment is reached.
  • FIGS. 6B and 6C are diagrams illustrating a method for navigating media content using the search snippet of FIG. 6A .
  • the client presents the search snippet of FIG. 6A , for example, that includes the user actuated display element 640 .
  • the user-actuated display element 640 includes a number of individual navigational controls (i.e., Back 642 , Forward 648 , Play 644 , and Pause 646 ).
  • Each of the navigational controls 642 , 644 , 646 , 648 is associated with an object defining at least one event handler that is responsive to user actuations.
  • the object event handler provides the media player 630 with a link to the media file/stream and directs the player 630 to initiate playback of the media content from the beginning of the file/stream or from the most recent playback offset.
  • a playback offset associated with the underlying media content in playback is determined.
  • the playback offset can be a timestamp or other indexing value that varies according to the content segment presently in playback. This playback offset can be determined by polling the media player or by autonomously tracking the playback time.
  • the playback state of media player module 830 is determined from the identity of the media file/stream presently in playback (e.g., URL or filename), if any, and the playback timing offset. Determination of the playback state can be accomplished by a sequence of status request/response 855 signaling to and from the media player module 830 .
  • a background media playback state tracker module 860 can be executed that keeps track of the identity of the media file in playback and maintains a playback clock (not shown) that tracks the relative playback timing offsets.
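  • Such a background tracker could be sketched as follows, assuming a simple clock-based implementation; the class name and API are illustrative:

```typescript
// Hedged sketch of a background playback state tracker (module 860): it records
// which media file is in playback and advances a clock so the playback offset
// can be read without polling the player.

class PlaybackStateTracker {
  private mediaUrl: string | null = null;
  private offsetAtStart = 0;                  // playback offset (seconds) when play/resume began
  private startedAtMs: number | null = null;  // wall-clock time when play/resume began

  start(mediaUrl: string, offsetSeconds = 0): void {
    this.mediaUrl = mediaUrl;
    this.offsetAtStart = offsetSeconds;
    this.startedAtMs = Date.now();
  }

  pause(): void {
    this.offsetAtStart = this.currentOffset();
    this.startedAtMs = null;
  }

  currentOffset(): number {
    return this.startedAtMs === null
      ? this.offsetAtStart
      : this.offsetAtStart + (Date.now() - this.startedAtMs) / 1000;
  }

  currentMedia(): string | null {
    return this.mediaUrl;
  }
}
```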
  • the playback offset is compared with the timing information corresponding to each of the content segments of the underlying media content to determine which of the content segments is presently in playback.
  • the navigational event handler 850 references a segment list 870 that identifies each of the content segments in the media file/stream and the corresponding timing offset of that segment.
  • the segment list 870 includes a segment list 872 corresponding to a set of timed audio speech segments (e.g., topics).
  • the segment list 872 can include a number of entries corresponding to the various topics discussed during that episode (e.g., news, weather, sports, entertainment, etc.) and the time offsets corresponding to the start of each topic.
  • the segment list 870 can also include a video segment list 874 or other lists (not shown) corresponding to timed word segments, timed non-speech audio segments, and timed marker segments, for example.
  • the segment lists 870 can be derived from the enhanced metadata or can be the enhanced metadata itself.
  • the underlying media content is played back at an offset that is prior to or subsequent to the offset of the content segment presently in playback.
  • the event handler 850 compares the playback timing offset to the set of predetermined timing offsets in one or more of the segment lists 870 to determine which of the content segments to play back next. For example, if the user clicked on the “forward” control 648 , the event handler 850 obtains the timing offset for the content segment that is greater in time than the present playback offset. Conversely, if the user clicks on the “backward” control 642 , the event handler 850 obtains the timing offset for the content segment that is earlier in time than the present playback offset. After determining the timing offset of the next segment to play, the event handler 850 provides the media player module 830 with instructions 880 directing playback of the media content at the next playback state (e.g., segment offset and/or URL).
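  • The Back/Forward decision itself reduces to picking the nearest segment offset before or after the current playback offset, as in the following sketch (names assumed):

```typescript
// Hedged sketch of the Back/Forward decision in the navigational event handler
// 850: given the current playback offset and a sorted segment list 870, pick
// the start offset of the next or previous content segment.

function nextSegmentOffset(
  segmentOffsets: number[],  // start offsets from a segment list, sorted ascending
  playbackOffset: number,    // current playback offset in seconds
  direction: "forward" | "back"
): number | undefined {
  if (direction === "forward") {
    return segmentOffsets.find((t) => t > playbackOffset);              // first segment after the current offset
  }
  return [...segmentOffsets].reverse().find((t) => t < playbackOffset); // last segment before the current offset
}

// Example: topic segments starting at 0 s, 120 s, and 300 s.
// nextSegmentOffset([0, 120, 300], 95, "forward") -> 120
// nextSegmentOffset([0, 120, 300], 95, "back")    -> 0
```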
  • an advantage of this aspect of the invention is that a user can control media using a client that is capable of jumping from one content segment to another segment using the timing information of individual content segments identified in the enhanced metadata.
  • for example, such a client can be implemented on portable player devices such as the iPod audio/video player available from Apple Computer, Inc.
  • the control buttons on the front panel of the iPod can be used to jump from one segment to the next segment of the podcast in a manner similar to that previously described.
  • Keyword tags have been used to associate audio and video files with keywords that are descriptive of the content of such media files.
  • An audio/video file or stream can be tagged in a number of different ways.
  • a content provider can publish a content descriptor document, such as a web page or RSS document, that includes a link and one or more keyword tags corresponding to an audio/video file or stream.
  • Keyword tags can also be embedded within the audio/video file itself.
  • the specification for MPEG-1 Audio Layer 3, more commonly referred to as MP3, defines a field for reading and writing keyword tags (e.g., the ID3V1 tag).
  • online systems such as search engines, can store indexes of tagged media files and allow end users to search for desired audio/video content through keyword searches of matching tags.
  • Particular online systems, such as YouTube at www.youtube.com, also enable end users to tag and upload audio/video files themselves to a database, allowing others to search and access the tagged media files.
  • FIG. 7 is a diagram that illustrates the concept of a tagged media file.
  • the media file 900 is a video clip from a sports news program in which the topics of discussion include the World Baseball Classic 905 and the effect of steroids in sports 910 .
  • Media clip 900 is organized such that the World Baseball Classic segment starts at time T 1 , which precedes the steroid segment starting at time T 2 .
  • the associated keyword tag 912 is “steroids.” Assuming that an end user establishes a connection to a search engine and conducts a search for audio/video associated with the tag “steroids,” the user might be presented with a search result including a link to the media clip of FIG. 7 . However, the end user must listen to or watch the World Baseball Classic segment 905 before reaching the steroids segment 910 . The user can try to fast forward past the World Baseball Classic segment 905 , but the user is unlikely to know where the steroids segment 910 starts.
  • the invention features a computerized method and apparatus for timed tagging of media content.
  • the method and apparatus can include the steps of, or structure for, obtaining at least one keyword tag associated with discrete media content; generating a timed segment index of discrete media content, the timed segment index identifying content segments of the discrete media content and corresponding timing boundaries of the content segments; searching the timed segment index for a match to the at least one keyword tag, the match corresponding to at least one of the content segments identified in the segment index; and generating a timed tag index that includes the at least one keyword tag and the timing boundaries corresponding to the at least one content segment of the discrete media content containing the match.
  • FIG. 8A is a diagram that illustrates a system including an apparatus for timed tagging of media content.
  • the apparatus 920 includes a number of modules. As shown, the apparatus 920 includes an input module 925 , a media indexer module 930 , a timed tag generator module 935 and a database 940 .
  • the database 940 can be accessible to a search engine, for example (not shown).
  • FIG. 8B is a flow diagram that illustrates a method for timed tagging of media content according to the apparatus of FIG. 8A .
  • the input module 925 provides an interface for receiving information regarding an audio/video file or stream and optionally a corresponding set of keyword tags from a content provider 950 .
  • the input module 925 can provide a graphical or text-based user interface that is capable of being presented to a content provider 950 a (e.g., user) through a browser. Through such an interface, the content provider 950 a can upload an audio/video file and an optional set of provider-defined keyword tags to be associated with the media file.
  • the content provider 950 b can push to the input module 925 , or alternatively, the input module 925 can pull from the content provider 950 b , a content descriptor that includes a link to a corresponding audio/video file or stream (e.g., RSS document, web page, URL link) and an optional set of keyword tags embedded within the content descriptor.
  • the input module 925 transmits the information regarding the audio/video file or stream to the media indexer 930 , and transmits the optional set of provider-defined tags to the timed tag generator 935 .
  • the input module can simply pass the data directly to the media indexer and timed tag generator respectively.
  • the input module 925 can process the content descriptor to extract the link to the media file or stream and the optional set of tags.
  • the input module 925 can forward them to the media indexer 930 and timed tag generator 935 , respectively. If a link to the media file is provided to the media indexer 930 , the media indexer uses the link to retrieve the media file or stream for further processing.
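  • For a content descriptor in an RSS-like format, the extraction of the media link and provider-defined tags might look like the following browser-side sketch; the element names are common RSS usage rather than a format required by the disclosure:

```typescript
// Hedged sketch of how the input module 925 might pull a media link and keyword
// tags out of an RSS-style content descriptor (browser DOMParser assumed).

function parseContentDescriptor(rssXml: string): { mediaUrl: string | null; tags: string[] } {
  const doc = new DOMParser().parseFromString(rssXml, "application/xml");
  const enclosure = doc.querySelector("item > enclosure");
  const mediaUrl = enclosure?.getAttribute("url") ?? null;     // link to the audio/video file
  const tags = Array.from(doc.querySelectorAll("item > category"))
    .map((el) => el.textContent?.trim() ?? "")
    .filter((t) => t.length > 0);                              // provider-defined keyword tags
  return { mediaUrl, tags };
}
```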
  • the media indexer 930 creates a timed segment index from the audio/video content of the media file.
  • the timed segment index 200 (or enhanced metadata) can identify a number of timed word segments 220 corresponding to the audio portion of the media file.
  • Each of the timed word segments 220 can include a segment identifier 225 a , the text of an individual word 225 b , timing information defining the boundaries of that content segment (i.e., start offset 225 c , end offset 225 d , and/or duration 225 e ), and optionally a confidence score 225 f .
  • the segment index can also include one or more of the other types of content segments (e.g., audio speech segment 230 , video segment 240 , marker segment 260 ).
  • the media indexer 930 then transmits the segment index to the timed tag generator 935 .
  • the timed tag generator 935 can automatically generate tags from the timed segment index 200 .
  • the timed tag generator 935 can generate additional tags according to a number of different ways.
  • the series of timed word segments 220 include the text of the words spoken during the audio portion of the media file.
  • the timed tag generator 935 can read these words and employ an algorithm that maintains a word count for each word and generates a new tag for the top “n” words that exceed a threshold count.
  • the timed tag generator 935 can employ an algorithm that compares the text of the words to a predetermined list of tags. If a match is found, the matching tag is added to the list of provider-defined tags.
  • the timed tag generator 935 can employ a named entity extractor module, such as those known in the art, to read the text of the words, obtain a list of people, places or things, for example, and then use one or more of the named entities as keyword tags.
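  • As a rough illustration of the tag-generation approaches above, the sketch below implements the word-count method and the predetermined-list method, assuming the TimedSegmentIndex shape from the earlier sketch; the threshold, lowercasing, and single-word handling are illustrative choices only.

```typescript
// Word-count approach: generate a tag for each of the top "n" words whose
// count exceeds a threshold. Assumes the TimedSegmentIndex sketched earlier.
function autoGenerateTags(index: TimedSegmentIndex, n: number, threshold: number): string[] {
  const counts = new Map<string, number>();
  for (const ws of index.wordSegments) {
    const w = ws.word.toLowerCase();
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count > threshold)  // keep words above the count threshold
    .sort((a, b) => b[1] - a[1])               // most frequent first
    .slice(0, n)                               // top "n"
    .map(([word]) => word);
}

// Predetermined-list approach: add any listed tag whose text appears in the transcript.
function tagsFromList(index: TimedSegmentIndex, tagList: string[]): string[] {
  const transcript = index.wordSegments.map(ws => ws.word.toLowerCase()).join(' ');
  return tagList.filter(tag => transcript.includes(tag.toLowerCase()));
}
```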
  • FIG. 9 is a diagram that illustrates an exemplary timed segment index for the media clip of FIG. 7.
  • the timed segment index 1200 includes a set of word segments 1210 and a set of marker segments 1220 .
  • Marker segments 1220 can be defined by markers embedded in the audio/video content by the content provider that indicate the beginning and/or end of a content segment. Markers can also be embedded in a content descriptor corresponding to an audio/video file or stream.
  • a content provider can publish a web page that includes a link to an audio/video file and specifies in the text of the descriptor the beginning and end of content segments (e.g., “The discussion on the World Baseball Classic starts at time T 1 and ends at time T 2 . . .”).
  • the corresponding media clip is associated with provider-defined tag “steroids.”
  • the timed tag generator 935 can also identify the words “world baseball classic” spoken during segment 905 of the media clip 900 as an additional tag.
  • the timed tag generator 935 obtains the first tag from the list of provider-defined tags and/or automatically generated tags associated with the media file.
  • the timed tag generator 935 searches for the tag within the timed segment index. For example, with respect to the timed segment index of FIG. 9 , the timed tag generator 935 can search for the tag “steroids” within the set of timed word segments 1210 that provide the text of the words spoken during the audio portion of the media file.
  • the timed tag generator 935 can compare the text of one or more word segments to the tag. If there is a match, the process continues at step 1060 .
  • the timing boundaries are obtained for the matching word segment, or segments in the case of a multi-word tag.
  • the timing boundaries of a word segment can include a start offset and an end offset, or duration, as previously described with respect to FIG. 2 . These timing boundaries define the segment of the media content when the particular tag is spoken.
  • the first word segment containing the tag “steroids” is word segment WS 050 having timing boundaries of T 30 and T 31 .
  • the timing boundaries of the matching word segment(s) containing the tag are extended by comparing the timing boundaries of the matching word segment to the timing boundaries of the other types of content segments (e.g., audio speech segment, video segment, marker segment as previously described in FIG. 2 ). If the timing boundaries of the matching word segment fall within the timing boundaries of a broader content segment, the timing boundaries for the tag can be extended to coincide with the timing boundaries of that broader content segment.
  • marker segments MS 001 and MS 002 define timing boundaries that contain a plurality of the word segments 1210.
  • marker segment MS 001 defines the timing boundaries for the World Baseball Classic segment
  • marker segment MS 002 defines the timing boundaries for the steroids segment.
  • the timed tag generator 935 searches for the first word segment containing the keyword tag “steroids” in the text of the timed word segments 1210 , and obtains the timing boundaries for the matching word segment WS 050 , namely start offset T 30 and end offset T 31 .
  • the timed tag generator 935 then expands the timing boundaries for the tag by comparing the timing boundaries T 30 and T 31 against the timing boundaries for marker segments MS 001 and MS 002 .
  • the keyword tag “steroids” is mapped to the timing boundaries T 25 and T 99 .
  • the second and third instances of the keyword tag “steroids” in word segments WS 060 and WS 070 fall within the timing boundaries of marker segment MS 002 , and thus the timing boundaries associated with tag “steroids” do not change.
  • the tag can be associated with multiple timing boundaries corresponding to each of the broader segments.
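  • A minimal sketch of the boundary-extension step described above is given below: each word segment matching a (single-word) tag is widened to the enclosing marker segment when one exists, and duplicate boundaries are collapsed so that repeated mentions within one marker map to it only once. Multi-word tags and other broader segment types are omitted for brevity, and the function name is an assumption.

```typescript
// Extend the timing boundaries of each word segment matching the tag to the
// broader marker segment that contains it, if any (per FIG. 9's MS 001 / MS 002).
function extendedBoundariesForTag(
  index: TimedSegmentIndex,
  tag: string
): Array<{ startOffset: number; endOffset: number }> {
  const boundaries: Array<{ startOffset: number; endOffset: number }> = [];
  for (const ws of index.wordSegments) {
    if (ws.word.toLowerCase() !== tag.toLowerCase()) continue;
    const marker = index.markerSegments.find(
      ms => ms.startOffset <= ws.startOffset && ws.endOffset <= ms.endOffset
    );
    const b = marker
      ? { startOffset: marker.startOffset, endOffset: marker.endOffset }
      : { startOffset: ws.startOffset, endOffset: ws.endOffset };
    const duplicate = boundaries.some(
      x => x.startOffset === b.startOffset && x.endOffset === b.endOffset
    );
    if (!duplicate) boundaries.push(b);  // e.g., matches inside the same marker add nothing new
  }
  return boundaries;
}
```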
  • the timed tag generator creates or obtains a timed tag index for the audio/video file and maps the tag to the extended timing boundaries.
  • FIGS. 10A and 10B are diagrams that conceptually illustrate a timed tag index.
  • the timed tag index 1250 can be implemented as a table corresponding to a specific tag (e.g., “steroids”).
  • the timed tag index 1255 can also be implemented as a table corresponding to a specific media file.
  • the entries of the table include one or more specific tags associated with the media file, the timing boundaries of the audio/video content associated with each tag, and a link or pointer to the audio/video file in the database or other remote location.
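  • The two table layouts of FIGS. 10A and 10B might be modeled roughly as follows; these are hedged sketches with assumed names, not a prescribed schema.

```typescript
// FIG. 10A style: one table per tag, listing media files and boundaries where it occurs.
interface TagToMediaEntry {
  mediaUrl: string;     // link or pointer to the audio/video file
  startOffset: number;
  endOffset: number;
}
type TimedTagIndexByTag = Map<string, TagToMediaEntry[]>;  // keyed by tag, e.g., "steroids"

// FIG. 10B style: one table per media file, listing its tags and their boundaries.
interface MediaTagEntry {
  tag: string;
  startOffset: number;
  endOffset: number;
}
interface TimedTagIndexByMedia {
  mediaUrl: string;
  entries: MediaTagEntry[];
}
```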
  • the timed tag generator 935 obtains the next tag from the list of provider-defined tags and/or automatically generated tags associated with the media file. If another tag is available, the process returns to step 1050. Conversely, if all of the tags from the list have been processed, the process continues at step 1100, in which the timed tag generator 935 stores the timed tag index, and optionally the audio/video file itself, in the searchable database 940.
  • FIG. 11 is a diagram illustrating a system for accessing timed tagged media content from a search engine.
  • the system 1300 includes a search engine 1320 or other server capable of accessing database 1335 .
  • the database 1335 includes one or more timed tag indexes 1335 that map a tag to timed segments of one or more media files.
  • each of the timed tag indexes 1335 can map timed segments of a particular media file to one or more provider-defined or automatically generated tags.
  • the search engine 1320 accesses the timed tag indexes 1335 to identify each of the timed segments that correspond to the requested tag.
  • the search engine can then generate instructions to present one or more of the timed tagged segments of media content to the requester via a browser interface 1340, for example.
  • FIG. 11 illustrates a browser interface 1340 that presents a media player 1345 and a toolbar 1350 for jumping between the tagged timed segments.
  • the toolbar 1350 includes a button 1352 for jumping to the timed segment associated with the tag “world baseball classic” and another button 1354 for jumping to the timed segment associated with the tag “steroids.” Any number of different ways can be implemented for presenting timed tagged segments to a user.
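  • One possible way a search engine could resolve a requested tag against a timed tag index and produce the offsets behind such toolbar buttons is sketched below; it reuses the assumed index shapes from the earlier sketches, and the returned structure is illustrative only.

```typescript
// Look up a requested tag and return one jump target per tagged segment,
// which a browser interface could render as toolbar buttons.
interface JumpTarget {
  label: string;        // e.g., "steroids"
  mediaUrl: string;
  startOffset: number;  // where playback should begin
}

function findJumpTargets(index: TimedTagIndexByTag, tag: string): JumpTarget[] {
  const entries = index.get(tag.toLowerCase()) ?? [];
  return entries.map(e => ({
    label: tag,
    mediaUrl: e.mediaUrl,
    startOffset: e.startOffset,
  }));
}
```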
  • the above-described techniques can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them.
  • the implementation can be as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers.
  • a computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
  • Method steps can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and apparatus can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). Modules can refer to portions of the computer program and/or the processor/special circuitry that implements that functionality.
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Data transmission and instructions can also occur over a communications network.
  • Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • “module” and “function,” as used herein, mean, but are not limited to, a software or hardware component that performs certain tasks.
  • a module may advantageously be configured to reside on an addressable storage medium and configured to execute on one or more processors.
  • a module may be fully or partially implemented with a general purpose integrated circuit (IC), FPGA, or ASIC.
  • a module may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • the functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules.
  • the components and modules may advantageously be implemented on many different platforms, including computers, computer servers, data communications infrastructure equipment such as application-enabled switches or routers, or telecommunications infrastructure equipment, such as public or private telephone switches or private branch exchanges (PBX).
  • the above described techniques can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer (e.g., interact with a user interface element).
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the above described techniques can be implemented in a distributed computing system that includes a back-end component, e.g., as a data server, and/or a middleware component, e.g., an application server, and/or a front-end component, e.g., a client computer having a graphical user interface and/or a Web browser through which a user can interact with an example implementation, or any combination of such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet, and include both wired and wireless networks. Communication networks can also include all or a portion of the PSTN, for example, a portion owned by a specific carrier.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Abstract

A method and apparatus for timed tagging of content is featured. The method and apparatus can include the steps of, or structure for, obtaining at least one keyword tag associated with discrete media content; generating a timed segment index of discrete media content, the timed segment index identifying content segments of the discrete media content and corresponding timing boundaries of the content segments; searching the timed segment index for a match to the at least one keyword tag, the match corresponding to at least one of the content segments identified in the segment index; and generating a timed tag index that includes the at least one keyword tag and the timing boundaries corresponding to the at least one content segment of the discrete media content containing the match.

Description

    RELATED APPLICATIONS
  • This application is a continuation-in-part of U.S. patent application Ser. No. 11/395,732, filed on Mar. 31, 2006, which claims the benefit of U.S. Provisional Application No. 60/736,124, filed on Nov. 9, 2005. The entire teachings of the above applications are incorporated herein by reference.
  • FIELD OF THE INVENTION
  • Aspects of the invention relate to methods and apparatus for generating and using enhanced metadata in search-driven applications.
  • BACKGROUND OF THE INVENTION
  • As the World Wide Web has emerged as a major research tool across all fields of study, the concept of metadata has become a crucial topic. Metadata, which can be broadly defined as “data about data,” refers to the searchable definitions used to locate information. This issue is particularly relevant to searches on the Web, where metatags may determine the ease with which a particular Web site is located by searchers. Metadata that is embedded with content is called embedded metadata. A data repository typically stores the metadata detached from the data.
  • Results obtained from search engine queries are limited to metadata information stored in a data repository, referred to as an index. With respect to media files or streams, the metadata information that describes the audio content or the video content is typically limited to information provided by the content publisher. For example, the metadata information associated with audio/video podcasts generally consists of a URL link to the podcast, title, and a brief summary of its content. If this limited information fails to satisfy a search query, the search engine is not likely to provide the corresponding audio/video podcast as a search result even if the actual content of the audio/video podcast satisfies the query.
  • SUMMARY OF THE INVENTION
  • According to one aspect, the invention features an automated method and apparatus for generating metadata enhanced for audio, video or both (“audio/video”) search-driven applications. The apparatus includes a media indexer that obtains a media file or stream (“media file/stream”), applies one or more automated media processing techniques to the media file/stream, combines the results of the media processing into metadata enhanced for audio/video search, and stores the enhanced metadata in a searchable index or other data repository. The media file/stream can be an audio/video podcast, for example. By generating or otherwise obtaining such enhanced metadata that identifies content segments and corresponding timing information from the underlying media content, a number of audio/video search-driven applications can be implemented as described herein. The term “media” as referred to herein includes audio, video or both.
  • According to another aspect, the invention features a computerized method and apparatus for timed tagging of media content. According to an embodiment, the method and apparatus can include the steps of, or structure for, obtaining at least one keyword tag associated with discrete media content; generating a timed segment index of discrete media content, the timed segment index identifying content segments of the discrete media content and corresponding timing boundaries of the content segments; searching the timed segment index for a match to the at least one keyword tag, the match corresponding to at least one of the content segments identified in the segment index; and generating a timed tag index that includes the at least one keyword tag and the timing boundaries corresponding to the at least one content segment of the discrete media content containing the match.
  • One or more of the content segments identified in the timed segment index can include word segments, audio speech segments, video segments, or marker segments. For example, one or more of the content segments identified in the timed segment index can include audio corresponding to an individual word, audio corresponding to a phrase, audio corresponding to a sentence, audio corresponding to a paragraph, audio corresponding to a story, audio corresponding to a topic, audio within a range of volume levels, audio of an identified speaker, audio during a speaker turn, audio associated with a speaker emotion, audio separated by sound gaps, audio separated by markers embedded within the media content or audio corresponding to a named entity. One or more of the content segments identified in the timed segment index can also include video of individual scenes, watermarks, recognized objects, recognized faces, overlay text or video separated by markers embedded within the media content.
  • The computerized method and apparatus can further include the steps of, or structure for, generating a timed segment index of discrete media content, the timed segment index identifying text of audible words from content segments of the discrete media content and corresponding timing boundaries of the content segments; searching the timed segment index for text matching the at least one keyword tag, the matching text corresponding to at least one of the content segments identified in the segment index; and generating a timed tag index that includes the at least one keyword tag and the timing boundaries corresponding to the at least one content segment of the discrete media content containing the matching text. The text of audible words from content segments of the discrete media content can be derived from the discrete media content using one or more media processing techniques or obtained from closed caption data associated with the discrete media content. Where the text of the audible words is obtained from closed caption data, the computerized method and apparatus can further include the steps of, or structure for, aligning the text from the closed caption data to timing boundaries corresponding to the content segments of the discrete media content; and generating the timed segment index of discrete media content, the timed segment index identifying the text from the closed caption data aligned to the corresponding timing boundaries of the content segments.
  • The computerized method and apparatus can further include the step of, or structure for, receiving the keyword tag from a content provider, the keyword tag being associated with the discrete media content by the content provider. The computerized method and apparatus can further include the step of, or structure for, receiving the keyword tag from a content provider, the keyword tag being uploaded along with the discrete media content by the content provider. The computerized method and apparatus can further include the step of, or structure for, receiving the keyword tag from a content provider, the keyword tag being embedded in a content descriptor corresponding to the discrete media content provided by the content provider. The computerized method and apparatus can further include the step of, or structure for, generating the keyword tag from the timed segment index.
  • The content segments identified in the timed segment index can include word segments, such that each word segment identifies the text of an audible word and the corresponding timing boundaries of the audible word within the discrete media content. Using such an index, the computerized method and apparatus can further include the steps of, or structure for, comparing the at least one keyword tag to the text of the audible word identified in each of the word segments; obtaining the corresponding timing boundaries for at least one of the word segments including the text of an audible word matching to the at least one keyword tag; identifying a broader content segment from the timed segment index having timing boundaries that include the corresponding timing boundaries of the word segment matching to the at least one keyword tag; and mapping the timing boundaries of the broader content segment to the at least one keyword tag in the timed tag index.
  • The computerized method and apparatus can further include the step of, or structure for, presenting a search result that enables a user to arbitrarily select and commence playback of the discrete media content at any of the content segments associated with the at least one keyword tag using the timing boundaries identified within the timed tag index.
  • BRIEF DESCRIPTIONS OF THE DRAWINGS
  • The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
  • FIG. 1A is a diagram illustrating an apparatus and method for generating metadata enhanced for audio/video search-driven applications.
  • FIG. 1B is a diagram illustrating an example of a media indexer.
  • FIG. 2 is a diagram illustrating an example of metadata enhanced for audio/video search-driven applications.
  • FIG. 3 is a diagram illustrating an example of a search snippet that enables user-directed navigation of underlying media content.
  • FIGS. 4 and 5 are diagrams illustrating a computerized method and apparatus for generating search snippets that enable user navigation of the underlying media content.
  • FIG. 6A is a diagram illustrating another example of a search snippet that enables user navigation of the underlying media content.
  • FIGS. 6B and 6C are diagrams illustrating a method for navigating media content using the search snippet of FIG. 6A.
  • FIG. 7 is a diagram that illustrates the concept of a tagged media file.
  • FIG. 8A is a diagram that illustrates a system including an apparatus for timed tagging of media content.
  • FIG. 8B is a flow diagram that illustrates a method for timed tagging of media content according to the apparatus of FIG. 8A.
  • FIG. 9 is a diagram that illustrates an exemplary timed segment index for the media clip of FIG. 7.
  • FIGS. 10A and 10B are diagrams that conceptually illustrate a timed tag index.
  • FIG. 11 is a diagram illustrating a system for accessing timed tagged media content from a search engine.
  • DETAILED DESCRIPTION Generation of Enhanced Metadata for Audio/Video
  • The invention features an automated method and apparatus for generating metadata enhanced for audio/video search-driven applications. The apparatus includes a media indexer that obtains a media file/stream (e.g., audio/video podcasts), applies one or more automated media processing techniques to the media file/stream, combines the results of the media processing into metadata enhanced for audio/video search, and stores the enhanced metadata in a searchable index or other data repository.
  • FIG. 1A is a diagram illustrating an apparatus and method for generating metadata enhanced for audio/video search-driven applications. As shown, the media indexer 10 cooperates with a descriptor indexer 50 to generate the enhanced metadata 30. A content descriptor 25 is received and processed by both the media indexer 10 and the descriptor indexer 50. For example, if the content descriptor 25 is a Really Simple Syndication (RSS) document, the metadata 27 corresponding to one or more audio/video podcasts includes a title, summary, and location (e.g., URL link) for each podcast. The descriptor indexer 50 extracts the descriptor metadata 27 from the text and embedded metatags of the content descriptor 25 and outputs it to a combiner 60. The content descriptor 25 can also be a simple web page link to a media file. The link can contain information in the text of the link that describes the file and can also include attributes in the HTML that describe the target media file.
  • In parallel, the media indexer 10 reads the metadata 27 from the content descriptor 25 and downloads the audio/video podcast 20 from the identified location. The media indexer 10 applies one or more automated media processing techniques to the downloaded podcast and outputs the combined results to the combiner 60. At the combiner 60, the metadata information from the media indexer 10 and the descriptor indexer 50 are combined in a predetermined format to form the enhanced metadata 30. The enhanced metadata 30 is then stored in the index 40 accessible to search-driven applications such as those disclosed herein.
  • In other embodiments, the descriptor indexer 50 is optional and the enhanced metadata is generated by the media indexer 10.
  • FIG. 1B is a diagram illustrating an example of a media indexer. As shown, the media indexer 10 includes a bank of media processors 100 that are managed by a media indexing controller 110. The media indexing controller 110 and each of the media processors 100 can be implemented, for example, using a suitably programmed or dedicated processor (e.g., a microprocessor or microcontroller), hardwired logic, Application Specific Integrated Circuit (ASIC), and a Programmable Logic Device (PLD) (e.g., Field Programmable Gate Array (FPGA)).
  • A content descriptor 25 is fed into the media indexing controller 110, which allocates one or more appropriate media processors 100 a . . . 100 n to process the media files/streams 20 identified in the metadata 27. Each of the assigned media processors 100 obtains the media file/stream (e.g., audio/video podcast) and applies a predefined set of audio or video processing routines to derive a portion of the enhanced metadata from the media content.
  • Examples of known media processors 100 include speech recognition processors 100 a, natural language processors 100 b, video frame analyzers 100 c, non-speech audio analyzers 100 d, marker extractors 100 e and embedded metadata processors 100 f. Other media processors known to those skilled in the art of audio and video analysis can also be implemented within the media indexer. The results of such media processing define the timing boundaries of a number of content segments within a media file/stream, including timed word segments 105 a, timed audio speech segments 105 b, timed video segments 105 c, timed non-speech audio segments 105 d, timed marker segments 105 e, as well as miscellaneous content attributes 105 f, for example.
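  • The cooperation between the media indexing controller and the bank of media processors might be organized along the lines of the following sketch; the processor interface, names, and the merged-result shape are assumptions for illustration, not a description of any particular implementation.

```typescript
// Each media processor derives one kind of timed segment from the media content
// (e.g., speech recognition yields timed word segments, frame analysis yields video segments).
interface MediaProcessor {
  name: string;                                   // e.g., "speech-recognition"
  process(mediaUrl: string): Promise<unknown[]>;  // timed segments of this processor's type
}

// The controller allocates the appropriate processors to a media file/stream and
// combines their results into one enhanced-metadata record keyed by processor name.
async function indexMedia(
  mediaUrl: string,
  processors: MediaProcessor[]
): Promise<Record<string, unknown[]>> {
  const enhanced: Record<string, unknown[]> = {};
  for (const p of processors) {
    enhanced[p.name] = await p.process(mediaUrl);
  }
  return enhanced;
}
```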
  • FIG. 2 is a diagram illustrating an example of metadata enhanced for audio/video search-driven applications. As shown, the enhanced metadata 200 include metadata 210 corresponding to the underlying media content generally. For example, where the underlying media content is an audio/video podcast, metadata 210 can include a URL 215 a, title 215 b, summary 215 c, and miscellaneous content attributes 215 d. Such information can be obtained from a content descriptor by the descriptor indexer 50. An example of a content descriptor is a Really Simple Syndication (RSS) document that is descriptive of one or more audio/video podcasts. Alternatively, such information can be extracted by an embedded metadata processor 100 f from header fields embedded within the media file/stream according to a predetermined format.
  • The enhanced metadata 200 further identifies individual segments of audio/video content and timing information that defines the boundaries of each segment within the media file/stream. For example, in FIG. 2, the enhanced metadata 200 includes metadata that identifies a number of possible content segments within a typical media file/stream, namely word segments, audio speech segments, video segments, non-speech audio segments, and/or marker segments, for example.
  • The metadata 220 includes descriptive parameters for each of the timed word segments 225, including a segment identifier 225 a, the text of an individual word 225 b, timing information defining the boundaries of that content segment (i.e., start offset 225 c, end offset 225 d, and/or duration 225 e), and optionally a confidence score 225 f. The segment identifier 225 a uniquely identifies each word segment amongst the content segments identified within the metadata 200. The text of the word segment 225 b can be determined using a speech recognition processor 100 a or parsed from closed caption data included with the media file/stream. The start offset 225 c is an offset for indexing into the audio/video content to the beginning of the content segment. The end offset 225 d is an offset for indexing into the audio/video content to the end of the content segment. The duration 225 e indicates the duration of the content segment. The start offset, end offset and duration can each be represented as a timestamp, frame number or value corresponding to any other indexing scheme known to those skilled in the art. The confidence score 225 f is a relative ranking (typically between 0 and 1) provided by the speech recognition processor 100 a as to the accuracy of the recognized word.
  • The metadata 230 includes descriptive parameters for each of the timed audio speech segments 235, including a segment identifier 235 a, an audio speech segment type 235 b, timing information defining the boundaries of the content segment (e.g., start offset 235 c, end offset 235 d, and/or duration 235 e), and optionally a confidence score 235 f. The segment identifier 235 a uniquely identifies each audio speech segment amongst the content segments identified within the metadata 200. The audio speech segment type 235 b can be a numeric value or string that indicates whether the content segment includes audio corresponding to a phrase, a sentence, a paragraph, story or topic, particular gender, and/or an identified speaker. The audio speech segment type 235 b and the corresponding timing information can be obtained using a natural language processor 100 b capable of processing the timed word segments from the speech recognition processors 100 a and/or the media file/stream 20 itself. The start offset 235 c is an offset for indexing into the audio/video content to the beginning of the content segment. The end offset 235 d is an offset for indexing into the audio/video content to the end of the content segment. The duration 235 e indicates the duration of the content segment. The start offset, end offset and duration can each be represented as a timestamp, frame number or value corresponding to any other indexing scheme known to those skilled in the art. The confidence score 235 f can be in the form of a statistical value (e.g., average, mean, variance, etc.) calculated from the individual confidence scores 225 f of the individual word segments.
  • The metadata 240 includes descriptive parameters for each of the timed video segments 245, including a segment identifier 245 a, a video segment type 245 b, and timing information defining the boundaries of the content segment (e.g., start offset 245 c, end offset 245 d, and/or duration 245 e). The segment identifier 245 a uniquely identifies each video segment amongst the content segments identified within the metadata 200. The video segment type 245 b can be a numeric value or string that indicates whether the content segment corresponds to video of an individual scene, watermark, recognized object, recognized face, or overlay text. The video segment type 245 b and the corresponding timing information can be obtained using a video frame analyzer 100 c capable of applying one or more image processing techniques. The start offset 245 c is an offset for indexing into the audio/video content to the beginning of the content segment. The end offset 245 d is an offset for indexing into the audio/video content to the end of the content segment. The duration 245 e indicates the duration of the content segment. The start offset, end offset and duration can each be represented as a timestamp, frame number or value corresponding to any other indexing scheme known to those skilled in the art.
  • The metadata 250 includes descriptive parameters for each of the timed non-speech audio segments 255, including a segment identifier 255 a, a non-speech audio segment type 255 b, and timing information defining the boundaries of the content segment (e.g., start offset 255 c, end offset 255 d, and/or duration 255 e). The segment identifier 255 a uniquely identifies each non-speech audio segment amongst the content segments identified within the metadata 200. The non-speech audio segment type 255 b can be a numeric value or string that indicates whether the content segment corresponds to audio of non-speech sounds, audio associated with a speaker emotion, audio within a range of volume levels, or sound gaps, for example. The non-speech audio segment type 255 b and the corresponding timing information can be obtained using a non-speech audio analyzer 100 d. The start offset 255 c is an offset for indexing into the audio/video content to the beginning of the content segment. The end offset 255 d is an offset for indexing into the audio/video content to the end of the content segment. The duration 255 e indicates the duration of the content segment. The start offset, end offset and duration can each be represented as a timestamp, frame number or value corresponding to any other indexing scheme known to those skilled in the art.
  • The metadata 260 includes descriptive parameters for each of the timed marker segments 265, including a segment identifier 265 a, a marker segment type 265 b, and timing information defining the boundaries of the content segment (e.g., start offset 265 c, end offset 265 d, and/or duration 265 e). The segment identifier 265 a uniquely identifies each marker segment amongst the content segments identified within the metadata 200. The marker segment type 265 b can be a numeric value or string that indicates that the content segment corresponds to a predefined chapter or other marker within the media content (e.g., audio/video podcast). The marker segment type 265 b and the corresponding timing information can be obtained using a marker extractor 100 e to obtain metadata in the form of markers (e.g., chapters) that are embedded within the media content in a manner known to those skilled in the art.
  • By generating or otherwise obtaining such enhanced metadata that identifies content segments and corresponding timing information from the underlying media content, a number of audio/video search-driven applications can be implemented as described herein.
  • Audio/Video Search Snippets
  • According to another aspect, the invention features a computerized method and apparatus for generating and presenting search snippets that enable user-directed navigation of the underlying audio/video content. The method involves obtaining metadata associated with discrete media content that satisfies a search query. The metadata identifies a number of content segments and corresponding timing information derived from the underlying media content using one or more automated media processing techniques. Using the timing information identified in the metadata, a search result or “snippet” can be generated that enables a user to arbitrarily select and commence playback of the underlying media content at any of the individual content segments.
  • FIG. 3 is a diagram illustrating an example of a search snippet that enables user-directed navigation of underlying media content. The search snippet 310 includes a text area 320 displaying the text 325 of the words spoken during one or more content segments of the underlying media content. A media player 330 capable of audio/video playback is embedded within the search snippet or alternatively executed in a separate window.
  • The text 325 for each word in the text area 320 is preferably mapped to a start offset of a corresponding word segment identified in the enhanced metadata. For example, an object (e.g., SPAN object) can be defined for each of the displayed words in the text area 320. The object defines a start offset of the word segment and an event handler. Each start offset can be a timestamp or other indexing value that identifies the start of the corresponding word segment within the media content. Alternatively, the text 325 for a group of words can be mapped to the start offset of a common content segment that contains all of those words. Such content segments can include an audio speech segment, a video segment, or a marker segment, for example, as identified in the enhanced metadata of FIG. 2.
  • Playback of the underlying media content occurs in response to the user selection of a word and begins at the start offset corresponding to the content segment mapped to the selected word or group of words. User selection can be facilitated, for example, by directing a graphical pointer over the text area 320 using a pointing device and actuating the pointing device once the pointer is positioned over the text 325 of a desired word. In response, the object event handler provides the media player 330 with a set of input parameters, including a link to the media file/stream and the corresponding start offset, and directs the player 330 to commence or otherwise continue playback of the underlying media content at the input start offset.
  • For example, referring to FIG. 3, if a user clicks on the word 325 a, the media player 330 begins to play back the media content at the audio/video segment starting with “state of the union address . . . ” Likewise, if the user clicks on the word 325 b, the media player 330 commences playback of the audio/video segment starting with “bush outlined . . . ”
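  • On the client side, the word-to-offset mapping just described could be realized roughly as in the sketch below, which attaches a click handler to a SPAN element for each word and seeks an HTML5 video element; the data attribute and the seek mechanism are assumptions, since no particular player API is prescribed here.

```typescript
// Render each word as a <span> carrying the start offset of its mapped content
// segment, and seek the embedded player when the user clicks on a word.
interface SnippetWord {
  text: string;
  startOffset: number;  // assumed to be seconds into the media content
}

function renderSnippet(
  words: SnippetWord[],
  player: HTMLVideoElement,
  textArea: HTMLElement
): void {
  for (const w of words) {
    const span = document.createElement('span');
    span.textContent = w.text + ' ';
    span.dataset.startOffset = String(w.startOffset);
    span.addEventListener('click', () => {
      player.currentTime = w.startOffset;  // jump to the selected content segment
      void player.play();                  // commence or continue playback there
    });
    textArea.appendChild(span);
  }
}
```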
  • An advantage of this aspect of the invention is that a user can read the text of the underlying audio/video content displayed by the search snippet and then actively “jump to” a desired segment of the media content for audio/video playback without having to listen to or view the entire media stream.
  • FIGS. 4 and 5 are diagrams illustrating a computerized method and apparatus for generating search snippets that enable user navigation of the underlying media content. Referring to FIG. 4, a client 410 interfaces with a search engine module 420 for searching an index 430 for desired audio/video content. The index includes a plurality of metadata associated with a number of discrete media content and enhanced for audio/video search as shown and described with reference to FIG. 2. The search engine module 420 also interfaces with a snippet generator module 440 that processes metadata satisfying a search query to generate the navigable search snippet for audio/video content for the client 410. Each of these modules can be implemented, for example, using a suitably programmed or dedicated processor (e.g., a microprocessor or microcontroller), hardwired logic, Application Specific Integrated Circuit (ASIC), and a Programmable Logic Device (PLD) (e.g., Field Programmable Gate Array (FPGA)).
  • FIG. 5 is a flow diagram illustrating a computerized method for generating search snippets that enable user-directed navigation of the underlying audio/video content. At step 510, the search engine 420 conducts a keyword search of the index 430 for a set of enhanced metadata documents satisfying the search query. At step 515, the search engine 420 obtains the enhanced metadata documents descriptive of one or more discrete media files/streams (e.g., audio/video podcasts).
  • At step 520, the snippet generator 440 obtains an enhanced metadata document corresponding to the first media file/stream in the set. As previously discussed with respect to FIG. 2, the enhanced metadata identifies content segments and corresponding timing information defining the boundaries of each segment within the media file/stream.
  • At step 525, the snippet generator 440 reads or parses the enhanced metadata document to obtain information on each of the content segments identified within the media file/stream. For each content segment, the information obtained preferably includes the location of the underlying media content (e.g. URL), a segment identifier, a segment type, a start offset, an end offset (or duration), the word or the group of words spoken during that segment, if any, and an optional confidence score.
  • Step 530 is an optional step in which the snippet generator 440 makes a determination as to whether the information obtained from the enhanced metadata is sufficiently accurate to warrant further search and/or presentation as a valid search snippet. For example, as shown in FIG. 2, each of the word segments 225 includes a confidence score 225 f assigned by the speech recognition processor 100 a. Each confidence score is a relative ranking (typically between 0 and 1) as to the accuracy of the recognized text of the word segment. To determine an overall confidence score for the enhanced metadata document in its entirety, a statistical value (e.g., average, mean, variance, etc.) can be calculated from the individual confidence scores of all the word segments 225.
  • Thus, if, at step 530, the overall confidence score falls below a predetermined threshold, the enhanced metadata document can be deemed an unacceptable basis from which to present any search snippet of the underlying media content. In that case, the process continues at steps 535 and 525 to obtain and read/parse the enhanced metadata document corresponding to the next media file/stream identified in the search at step 510. Conversely, if the confidence score for the enhanced metadata in its entirety equals or exceeds the predetermined threshold, the process continues at step 540.
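  • The document-level confidence test of steps 530 and 535 could be as simple as the following sketch, which averages the word-segment scores; averaging is just one of the statistical values mentioned above, and the function name is an assumption.

```typescript
// Average the per-word confidence scores and accept the metadata document only
// if the overall score meets the threshold.
function meetsOverallConfidence(wordSegments: WordSegment[], threshold: number): boolean {
  const scores = wordSegments
    .map(ws => ws.confidence)
    .filter((c): c is number => c !== undefined);
  if (scores.length === 0) return true;  // no scores available: treated as acceptable here
  const average = scores.reduce((sum, c) => sum + c, 0) / scores.length;
  return average >= threshold;
}
```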
  • At step 540, the snippet generator 440 determines a segment type preference. The segment type preference indicates which types of content segments to search and present as snippets. The segment type preference can include a numeric value or string corresponding to one or more of the segment types. For example, the segment type preference can be defined to be one of the audio speech segment types, e.g., “story,” in which case the enhanced metadata is searched on a story-by-story basis for a match to the search query and the resulting snippets are also presented on a story-by-story basis. In other words, each of the content segments identified in the metadata as type “story” is individually searched for a match to the search query and also presented in a separate search snippet if a match is found. Likewise, the segment type preference can alternatively be defined to be one of the video segment types, e.g., individual scene. The segment type preference can be fixed programmatically or user configurable.
  • At step 545, the snippet generator 440 obtains the metadata information corresponding to a first content segment of the preferred segment type (e.g., the first story segment). The metadata information for the content segment preferably includes the location of the underlying media file/stream, a segment identifier, the preferred segment type, a start offset, an end offset (or duration) and an optional confidence score. The start offset and the end offset/duration define the timing boundaries of the content segment. By referencing the enhanced metadata, the text of words spoken during that segment, if any, can be determined by identifying each of the word segments falling within the start and end offsets. For example, if the underlying media content is an audio/video podcast of a news program and the segment preference is “story,” the metadata information for the first content segment includes the text of the word segments spoken during the first news story.
  • Step 550 is an optional step in which the snippet generator 440 makes a determination as to whether the metadata information for the content segment is sufficiently accurate to warrant further search and/or presentation as a valid search snippet. This step is similar to step 530 except that the confidence score is a statistical value (e.g., average, mean, variance, etc.) calculated from the individual confidence scores of the word segments 225 falling within the timing boundaries of the content segment.
  • If the confidence score falls below a predetermined threshold, the process continues at step 555 to obtain the metadata information corresponding to a next content segment of the preferred segment type. If there are no more content segments of the preferred segment type, the process continues at step 535 to obtain the enhanced metadata document corresponding to the next media file/stream identified in the search at step 510. Conversely, if the confidence score of the metadata information for the content segment equals or exceeds the predetermined threshold, the process continues at step 560.
  • At step 560, the snippet generator 440 compares the text of the words spoken during the selected content segment, if any, to the keyword(s) of the search query. If the text derived from the content segment does not contain a match to the keyword search query, the metadata information for that segment is discarded. Otherwise, the process continues at optional step 565.
  • At optional step 565, the snippet generator 440 trims the text of the content segment (as determined at step 545) to fit within the boundaries of the display area (e.g., text area 320 of FIG. 3). According to one embodiment, the text can be trimmed by locating the word(s) matching the search query and limiting the number of additional words before and after. According to another embodiment, the text can be trimmed by locating the word(s) matching the search query, identifying another content segment that has a duration shorter than the segment type preference and contains the matching word(s), and limiting the displayed text of the search snippet to that of the content segment of shorter duration. For example, assuming that the segment type preference is of type “story,” the displayed text of the search snippet can be limited to that of segment type “sentence” or “paragraph”.
  • At optional step 575, the snippet generator 440 filters the text of individual words from the search snippet according to their confidence scores. For example, in FIG. 2, a confidence score 225 f is assigned to each of the word segments to represent a relative ranking that corresponds to the accuracy of the text of the recognized word. For each word in the text of the content segment, the confidence score from the corresponding word segment 225 is compared against a predetermined threshold value. If the confidence score for a word segment falls below the threshold, the text for that word segment is replaced with a predefined symbol (e.g., —). Otherwise no change is made to the text for that word segment.
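  • The per-word filtering of step 575 might look like the sketch below, which substitutes a placeholder symbol for low-confidence words before the snippet text is rendered; the placeholder character and the handling of missing scores are assumptions.

```typescript
// Replace the text of any word whose confidence falls below the threshold with a
// placeholder symbol; words without a score are left unchanged in this sketch.
function filterSnippetText(words: WordSegment[], threshold: number, placeholder = '--'): string {
  return words
    .map(ws => (ws.confidence !== undefined && ws.confidence < threshold ? placeholder : ws.word))
    .join(' ');
}
```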
  • At step 580, the snippet generator 440 adds the resulting metadata information for the content segment to a search result for the underlying media stream/file. Each enhanced metadata document that is returned from the search engine can have zero, one or more content segments containing a match to the search query. Thus, the corresponding search result associated with the media file/stream can also have zero, one or more search snippets associated with it. An example of a search result that includes no search snippets occurs when the metadata of the original content descriptor contains the search term, but the timed word segments 105 a of FIG. 2 do not.
  • The process returns to step 555 to obtain the metadata information corresponding to the next content segment of the preferred segment type. If there are no more content segments of the preferred segment type, the process continues at step 535 to obtain the enhanced metadata document corresponding to the next media file/stream identified in the search at step 510. If there are no further metadata results to process, the process continues at optional step 582 to rank the search results before sending them to the client 410.
  • At optional step 582, the snippet generator 440 ranks and sorts the list of search results. One factor for determining the rank of the search results can include confidence scores. For example, the search results can be ranked by calculating the sum, average or other statistical value from the confidence scores of the constituent search snippets for each search result and then ranking and sorting accordingly. Search results being associated with higher confidence scores can be ranked and thus sorted higher than search results associated with lower confidence scores. Other factors for ranking search results can include the publication date associated with the underlying media content and the number of snippets in each of the search results that contain the search term or terms. Any number of other criteria for ranking search results known to those skilled in the art can also be utilized in ranking the search results for audio/video content.
  • At step 585, the search results can be returned in a number of different ways. According to one embodiment, the snippet generator 440 can generate a set of instructions for rendering each of the constituent search snippets of the search result as shown in FIG. 3, for example, from the raw metadata information for each of the identified content segments. Once the instructions are generated, they can be provided to the search engine 420 for forwarding to the client. If a search result includes a long list of snippets, the client can display the search result such that a few of the snippets are displayed along with an indicator that can be selected to show the entire set of snippets for that search result.
  • Although not so limited, such a client includes (i) a browser application that is capable of presenting graphical search query forms and resulting pages of search snippets; (ii) a desktop or portable application capable of, or otherwise modified for, subscribing to a service and receiving alerts containing embedded search snippets (e.g., RSS reader applications); or (iii) a search applet embedded within a DVD (Digital Video Disc) that allows users to search a remote or local index to locate and navigate segments of the DVD audio/video content.
  • According to another embodiment, the metadata information contained within the list of search results in a raw data format is forwarded directly to the client 410 or indirectly to the client 410 via the search engine 420. The raw metadata information can include any combination of the parameters including a segment identifier, the location of the underlying content (e.g., URL or filename), segment type, the text of the word or group of words spoken during that segment (if any), timing information (e.g., start offset, end offset, and/or duration) and a confidence score (if any). Such information can then be stored or further processed by the client 410 according to application specific requirements. For example, a client desktop application, such as iTunes Music Store available from Apple Computer, Inc., can be modified to process the raw metadata information to generate its own proprietary user interface for enabling user-directed navigation of media content, including audio/video podcasts, resulting from a search of its Music Store repository.
  • FIG. 6A is a diagram illustrating another example of a search snippet that enables user navigation of the underlying media content. The search snippet 610 is similar to the snippet described with respect to FIG. 3, and additionally includes a user actuated display element 640 that serves as a navigational control. The navigational control 640 enables a user to control playback of the underlying media content. The text area 620 is optional for displaying the text 625 of the words spoken during one or more segments of the underlying media content as previously discussed with respect to FIG. 3.
  • Typical fast forward and fast reverse functions cause media players to jump ahead or jump back during media playback in fixed time increments. In contrast, the navigational control 640 enables a user to jump from one content segment to another segment using the timing information of individual content segments identified in the enhanced metadata.
  • As shown in FIG. 6A, the user-actuated display element 640 can include a number of navigational controls (e.g., Back 642, Forward 648, Play 644, and Pause 646). The Back 642 and Forward 648 controls can be configured to enable a user to jump between word segments, audio speech segments, video segments, non-speech audio segments, and marker segments. For example, if an audio/video podcast includes several content segments corresponding to different stories or topics, the user can easily skip such segments until the desired story or topic segment is reached.
  • FIGS. 6B and 6C are diagrams illustrating a method for navigating media content using the search snippet of FIG. 6A. At step 710, the client presents the search snippet of FIG. 6A, for example, that includes the user actuated display element 640. The user-actuated display element 640 includes a number of individual navigational controls (i.e., Back 642, Forward 648, Play 644, and Pause 646). Each of the navigational controls 642, 644, 646, 648 is associated with an object defining at least one event handler that is responsive to user actuations. For example, when a user clicks on the Play control 644, the object event handler provides the media player 630 with a link to the media file/stream and directs the player 630 to initiate playback of the media content from the beginning of the file/stream or from the most recent playback offset.
  • At step 720, in response to an indication of user actuation of Forward 648 and Back 642 display elements, a playback offset associated with the underlying media content in playback is determined. The playback offset can be a timestamp or other indexing value that varies according to the content segment presently in playback. This playback offset can be determined by polling the media player or by autonomously tracking the playback time.
  • For example, as shown in FIG. 6C, when the navigational event handler 850 is triggered by user actuation of the Forward 648 or Back 642 control elements, the playback state of media player module 830 is determined from the identity of the media file/stream presently in playback (e.g., URL or filename), if any, and the playback timing offset. Determination of the playback state can be accomplished by a sequence of status request/response 855 signaling to and from the media player module 830. Alternatively, a background media playback state tracker module 860 can be executed that keeps track of the identity of the media file in playback and maintains a playback clock (not shown) that tracks the relative playback timing offsets.
  • At step 730 of FIG. 6B, the playback offset is compared with the timing information corresponding to each of the content segments of the underlying media content to determine which of the content segments is presently in playback. As shown in FIG. 6C, once the media file/stream and playback timing offset are determined, the navigational event handler 850 references a segment list 870 that identifies each of the content segments in the media file/stream and the corresponding timing offset of that segment. As shown, the segment list 870 includes a segment list 872 corresponding to a set of timed audio speech segments (e.g., topics). For example, if the media file/stream is an audio/video podcast of an episode of a daily news program, the segment list 872 can include a number of entries corresponding to the various topics discussed during that episode (e.g., news, weather, sports, entertainment, etc.) and the time offsets corresponding to the start of each topic. The segment list 870 can also include a video segment list 874 or other lists (not shown) corresponding to timed word segments, timed non-speech audio segments, and timed marker segments, for example. The segment lists 870 can be derived from the enhanced metadata or can be the enhanced metadata itself.
  • At step 740 of FIG. 6B, the underlying media content is played back at an offset that is prior to or subsequent to the offset of the content segment presently in playback. For example, referring to FIG. 6C, the event handler 850 compares the playback timing offset to the set of predetermined timing offsets in one or more of the segment lists 870 to determine which of the content segments to play back next. For example, if the user clicks on the "forward" control 848, the event handler 850 obtains the timing offset of the next content segment, i.e., the segment whose timing offset is later than the present playback offset. Conversely, if the user clicks on the "backward" control 842, the event handler 850 obtains the timing offset of the preceding content segment, i.e., the segment whose timing offset is earlier than the present playback offset. After determining the timing offset of the next segment to play, the event handler 850 provides the media player module 830 with instructions 880 directing playback of the media content at the next playback state (e.g., segment offset and/or URL).
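A minimal sketch of the Forward/Back jump logic of steps 730-740 follows, assuming a segment list represented as an array of start offsets in seconds; the data shape and function name are illustrative, not the patented implementation.

```typescript
// Sketch of steps 730-740: map the current playback offset to a segment,
// then seek to the previous or next segment boundary. Names are illustrative.
interface TimedSegment {
  id: string;
  startOffset: number; // seconds from the beginning of the media file/stream
}

function jumpToSegment(
  player: HTMLVideoElement,
  segments: TimedSegment[], // e.g., the audio speech segment list 872
  direction: -1 | 1         // -1 = Back, +1 = Forward
): void {
  const playbackOffset = player.currentTime;

  // Step 730: find the segment presently in playback (the last segment whose
  // start offset is not after the playback offset).
  const sorted = [...segments].sort((a, b) => a.startOffset - b.startOffset);
  let currentIndex = 0;
  for (let i = 0; i < sorted.length; i++) {
    if (sorted[i].startOffset <= playbackOffset) currentIndex = i;
  }

  // Step 740: pick the prior or subsequent segment and direct playback there.
  const targetIndex = Math.min(Math.max(currentIndex + direction, 0), sorted.length - 1);
  player.currentTime = sorted[targetIndex].startOffset;
  void player.play();
}
```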
  • Thus, an advantage of this aspect of the invention is that a user can control media playback using a client that is capable of jumping from one content segment to another segment using the timing information of individual content segments identified in the enhanced metadata. One particular application of this technology is to portable player devices, such as the iPod audio/video player available from Apple Computer, Inc. For example, after downloading a podcast to the iPod, it is unacceptable for a user to have to listen to or view an entire podcast if he/she is only interested in a few segments of the content. Rather, by modifying the internal operating system software of the iPod, the control buttons on the front panel of the iPod can be used to jump from one segment to the next segment of the podcast in a manner similar to that previously described.
  • Timed Tagging of Media Content
  • Keyword tags have been used to associate audio and video files with keywords that are descriptive of the content of such media files. An audio/video file or stream can be tagged in a number of different ways. For example, a content provider can publish a content descriptor document, such as a web page or RSS document, that includes a link and one or more keyword tags corresponding to an audio/video file or stream. Keyword tags can also be embedded within the audio/video file itself. For example, the specifications for MPEG-1 Audio Layer 3, more commonly referred to as MP3, define a field for reading and writing keyword tags (e.g., ID3V1 tag). Using such tags, online systems, such as search engines, can store indexes of tagged media files and allow end users to search for desired audio/video content through keyword searches of matching tags. Particular online systems, such as YouTube at www.youtube.com, also enable end users to tag and upload audio/video files themselves to a database to allow others to search and access the tagged media files.
  • A disadvantage of such methods for tagging audio/video content is that a keyword tag is associated with the media file generally. In other words, a tag for a tagged media file is not associated with a particular point or segment of the audio/video content. FIG. 7 is a diagram that illustrates the concept of a tagged media file. In this example, the media file 900 is a video clip from a sports news program in which the topics of discussion include the World Baseball Classic 905 and the effect of steroids in sports 910. Media clip 900 is organized such that the World Baseball Classic segment starts at time T1, which precedes the steroid segment starting at time T2. The associated keyword tag 912 is "steroids." Assuming that an end user establishes a connection to a search engine and conducts a search for audio/video associated with the tag "steroids," the user might be presented with a search result including a link to the media clip of FIG. 7. However, the end user must listen to or watch the World Baseball Classic segment 905 before reaching the steroids segment 910. The user can try to fast forward past the World Baseball Classic segment 905, but the user is unlikely to know where the steroids segment 910 starts.
  • Thus, according to another aspect, the invention features a computerized method and apparatus for timed tagging of media content. The method and apparatus can include the steps of, or structure for, obtaining at least one keyword tag associated with discrete media content; generating a timed segment index of the discrete media content, the timed segment index identifying content segments of the discrete media content and corresponding timing boundaries of the content segments; searching the timed segment index for a match to the at least one keyword tag, the match corresponding to at least one of the content segments identified in the segment index; and generating a timed tag index that includes the at least one keyword tag and the timing boundaries corresponding to the at least one content segment of the discrete media content containing the match.
  • FIG. 8A is a diagram that illustrates a system including an apparatus for timed tagging of media content. The apparatus 920 includes a number of modules. As shown, the apparatus 920 includes an input module 925, a media indexer module 930, a timed tag generator module 935 and a database 940. The database 940 can be accessible to a search engine, for example (not shown).
  • FIG. 8B is a flow diagram that illustrates a method for timed tagging of media content according to the apparatus of FIG. 8A. At step 1010, the input module 925 provides an interface for receiving information regarding an audio/video file or stream and optionally a corresponding set of keyword tags from a content provider 950. For example, according to one embodiment, the input module 925 can provide a graphical or text-based user interface that is capable of being presented to a content provider 950 a (e.g., user) through a browser. Through such an interface, the content provider 950 a can upload an audio/video file and an optional set of provider-defined keyword tags to be associated with the media file. According to another embodiment, the content provider 950 b can push to the input module 925, or alternatively, the input module 925 can pull from the content provider 950 b, a content descriptor that includes a link to a corresponding audio/video file or stream (e.g., RSS document, web page, URL link) and an optional set of keyword tags embedded within the content descriptor.
  • The input module 925 transmits the information regarding the audio/video file or stream to the media indexer 930, and transmits the optional set of provider-defined tags to the timed tag generator 935. For example, where the content provider 950 a uploads the audio/video file and the optional set of provider-defined keyword tags to the input module 925, the input module can simply pass the data directly to the media indexer and timed tag generator respectively. Where the information regarding the audio/video file or stream and the optional set of keyword tags are embedded within a content descriptor, the input module 925 can process the content descriptor to extract the link to the media file or stream and the optional set of tags. Once the link and tags have been extracted from the descriptor document, the input module 925 can forward them to the media indexer 930 and timed tag generator 935, respectively. If a link to the media file is provided to the media indexer 930, the media indexer uses the link to retrieve the media file or stream for further processing.
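As one illustration of the descriptor-processing path, the sketch below pulls a media link and keyword tags out of an RSS-style item with naive regular expressions. The element names (enclosure, category) and the regex-based approach are assumptions for illustration; they are not a format prescribed by the disclosure.

```typescript
// Naive sketch of extracting the media link and keyword tags from an
// RSS-style content descriptor (illustrative element names and regexes).
interface ExtractedDescriptor {
  mediaLink: string | null; // forwarded to the media indexer 930
  keywordTags: string[];    // forwarded to the timed tag generator 935
}

function extractFromDescriptor(descriptorXml: string): ExtractedDescriptor {
  // Link to the audio/video file, e.g. <enclosure url="https://..." type="video/mp4"/>
  const linkMatch = descriptorXml.match(/<enclosure[^>]*\burl="([^"]+)"/i);

  // Provider-defined keyword tags, e.g. <category>steroids</category>
  const tagMatches = [...descriptorXml.matchAll(/<category>([^<]+)<\/category>/gi)];

  return {
    mediaLink: linkMatch ? linkMatch[1] : null,
    keywordTags: tagMatches.map((m) => m[1].trim()),
  };
}
```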
  • At step 1020, the media indexer 930 creates a timed segment index from the audio/video content of the media file. For example, as previously described with respect to FIGS. 1B and 2, the timed segment index 200 (or enhanced metadata) can identify a number of timed word segments 220 corresponding to the audio portion of the media file. Each of the timed word segments 220 can include a segment identifier 225 a, the text of an individual word 225 b, timing information defining the boundaries of that content segment (i.e., start offset 225 c, end offset 225 d, and/or duration 225 e), and optionally a confidence score 225 f. In addition to the timed word segments, the segment index can also include one or more of the other types of content segments (e.g., audio speech segment 230, video segment 240, marker segment 260). The media indexer 930 then transmits the segment index to the timed tag generator 935.
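The following type declarations sketch one plausible in-memory shape for the timed segment index described above; the field names mirror the reference numerals in the text but are otherwise assumptions.

```typescript
// Illustrative shape for the timed segment index (enhanced metadata).
interface TimedWordSegment {
  segmentId: string;   // 225a
  word: string;        // 225b, text of an individual word
  startOffset: number; // 225c, seconds
  endOffset: number;   // 225d, seconds
  duration?: number;   // 225e, optional
  confidence?: number; // 225f, optional confidence score
}

interface TimedSpanSegment {
  segmentId: string;
  startOffset: number;
  endOffset: number;
}

interface TimedSegmentIndex {
  wordSegments: TimedWordSegment[];         // 220
  audioSpeechSegments?: TimedSpanSegment[]; // 230, e.g. topics or speaker turns
  videoSegments?: TimedSpanSegment[];       // 240, e.g. scenes
  markerSegments?: TimedSpanSegment[];      // 260, provider-embedded markers
}
```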
  • At optional step 1030, the timed tag generator 935 can automatically generate tags from the timed segment index 200. Upon receiving the segment index 200, the timed tag generator 935 can generate additional tags in a number of different ways. For example, the series of timed word segments 220 includes the text of the words spoken during the audio portion of the media file. The timed tag generator 935 can read these words and employ an algorithm that maintains a word count for each word and generates a new tag for the top "n" words that exceed a threshold count. The timed tag generator 935 can also employ an algorithm that compares the text of the words to a predetermined list of tags. If a match is found, the matching tag is added to the list of provider-defined tags. The timed tag generator 935 can further employ a named entity extractor module, such as those known in the art, to read the text of the words, obtain a list of people, places or things, for example, and then use one or more of the named entities as keyword tags.
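A small sketch of the word-count and predetermined-list techniques follows, operating on the word segments of the timed segment index; the thresholds and the absence of stop-word filtering are simplifying assumptions.

```typescript
// Sketch of automatic tag generation from the word segments (optional step 1030).
function generateTags(
  wordSegments: { word: string }[],
  predefinedTags: Set<string>,
  topN = 5,
  minCount = 3
): string[] {
  const counts = new Map<string, number>();
  for (const seg of wordSegments) {
    const w = seg.word.toLowerCase();
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }

  // Technique 1: top "n" words whose count exceeds a threshold.
  const frequent = [...counts.entries()]
    .filter(([, count]) => count >= minCount)
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([word]) => word);

  // Technique 2: words that match a predetermined list of tags.
  const matchingPredefined = [...counts.keys()].filter((w) => predefinedTags.has(w));

  return [...new Set([...frequent, ...matchingPredefined])];
}
```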
  • For example, FIG. 9 is a diagram that illustrates an exemplary timed segment index for the media clip of FIG. 7. In this example, the timed segment index 1200 includes a set of word segments 1210 and a set of marker segments 1220. Marker segments 1220 can be defined by markers embedded in the audio/video content by the content provider that indicate the beginning and/or end of a content segment. Markers can also be embedded in a content descriptor corresponding to an audio/video file or stream. For example, a content provider can publish a web page that includes a link to an audio/video file and specifies in the text of the descriptor the beginning and end of content segments (e.g., "The discussion on the World Baseball Classic starts at time T1 and ends at time T2 . . . "). The corresponding media clip is associated with the provider-defined tag "steroids." However, by applying one or more of the techniques to the segment index 1200, such as those previously described in optional step 1030, the timed tag generator 935 can also identify the words "world baseball classic" spoken during segment 905 of the media clip 900 as an additional tag.
  • Referring back to FIG. 8B at step 1040, the timed tag generator 935 obtains the first tag from the list of provider-defined tags and/or automatically generated tags associated with the media file. At step 1050, the timed tag generator 935 searches for the tag within the timed segment index. For example, with respect to the timed segment index of FIG. 9, the timed tag generator 935 can search for the tag “steroids” within the set of timed word segments 1210 that provide the text of the words spoken during the audio portion of the media file. The timed tag generator 935 can compare the text of one or more word segments to the tag. If there is a match, the process continues at step 1060.
  • At step 1060, the timing boundaries are obtained for the matching word segment, or segments in the case of a multi-word tag. The timing boundaries of a word segment can include a start offset and an end offset, or duration, as previously described with respect to FIG. 2. These timing boundaries define the segment of the media content when the particular tag is spoken. For example, in FIG. 9, the first word segment containing the tag “steroids” is word segment WS050 having timing boundaries of T30 and T31. At step 1070, the timing boundaries of the matching word segment(s) containing the tag are extended by comparing the timing boundaries of the matching word segment to the timing boundaries of the other types of content segments (e.g., audio speech segment, video segment, marker segment as previously described in FIG. 2). If the timing boundaries of the matching word segment fall within the timing boundaries of a broader content segment, the timing boundaries for the tag can be extended to coincide with the timing boundaries of that broader content segment.
  • For example, in FIG. 9, marker segments MS001 and MS002 define timing boundaries that contain a plurality of the word segments 1210. In this example, marker segment MS001 defines the timing boundaries for the World Baseball Classic segment, and marker segment MS002 defines the timing boundaries for the steroids segment. The timed tag generator 935 searches for the first word segment containing the keyword tag "steroids" in the text of the timed word segments 1210, and obtains the timing boundaries for the matching word segment WS050, namely start offset T30 and end offset T31. The timed tag generator 935 then expands the timing boundaries for the tag by comparing the timing boundaries T30 and T31 against the timing boundaries for marker segments MS001 and MS002. Since the timing boundaries of the matching word segment fall within the timing boundaries of marker segment MS002, namely start offset T25 and end offset T99, the keyword tag "steroids" is mapped to the timing boundaries T25 and T99. Similarly, the second and third instances of the keyword tag "steroids" in word segments WS060 and WS070 fall within the timing boundaries of marker segment MS002, and thus the timing boundaries associated with the tag "steroids" do not change. Where multiple instances of the tag can be found in multiple non-contiguous content segments, the tag can be associated with multiple timing boundaries corresponding to each of the broader segments.
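The matching and boundary extension of steps 1050-1070 could look roughly like the sketch below: locate word segments containing the tag, then widen each match to the enclosing broader segment (e.g., a marker segment). The multi-word matching window and the containment test are simplified assumptions.

```typescript
// Sketch of steps 1050-1070: match a tag against the word segments and extend
// each match's boundaries to the enclosing broader (e.g., marker) segment.
interface Boundaries { startOffset: number; endOffset: number; }
interface WordSeg extends Boundaries { word: string; }

function timedBoundariesForTag(
  tag: string,
  wordSegments: WordSeg[],
  broaderSegments: Boundaries[] // e.g., marker or audio speech segments
): Boundaries[] {
  const tagWords = tag.toLowerCase().split(/\s+/);
  const results: Boundaries[] = [];

  for (let i = 0; i + tagWords.length <= wordSegments.length; i++) {
    const window = wordSegments.slice(i, i + tagWords.length);
    const matches = window.every((seg, j) => seg.word.toLowerCase() === tagWords[j]);
    if (!matches) continue;

    // Step 1060: boundaries of the matching word segment(s).
    let bounds: Boundaries = {
      startOffset: window[0].startOffset,
      endOffset: window[window.length - 1].endOffset,
    };

    // Step 1070: extend to a broader segment that fully contains the match.
    const container = broaderSegments.find(
      (b) => b.startOffset <= bounds.startOffset && bounds.endOffset <= b.endOffset
    );
    if (container) bounds = { ...container };

    // Repeated instances of the tag inside the same broader segment do not add
    // new entries; non-contiguous segments each contribute their own boundaries.
    if (!results.some((r) => r.startOffset === bounds.startOffset && r.endOffset === bounds.endOffset)) {
      results.push(bounds);
    }
  }
  return results;
}
```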
  • Referring back to FIG. 8B at step 1080, the timed tag generator creates or obtains a timed tag index for the audio/video file and maps the tag to the extended timing boundaries. For example, FIGS. 10A and 10B are diagrams that conceptually illustrate a timed tag index. As shown in FIG. 10A, the timed tag index 1250 can be implemented as a table corresponding to a specific tag (e.g., "steroids"). The entries of the table can include identifiers (e.g., AV1 . . . AV5) for each of the audio/video files associated with the specific tag, the timing boundaries of the audio/video content associated with the tag (e.g., "start= . . . ", "end= . . . "), and links or pointers to the audio/video files in the database or other remote locations (e.g., "location= . . . "). As shown in FIG. 10B, the timed tag index 1255 can also be implemented as a table corresponding to a specific media file. The entries of the table include one or more specific tags associated with the media file, the timing boundaries of the audio/video content associated with each tag, and a link or pointer to the audio/video file in the database or other remote location.
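The two table layouts of FIGS. 10A and 10B could be represented, for illustration, by the records below; field names such as `location` echo the figure annotations but are otherwise assumptions.

```typescript
// Illustrative shapes for the timed tag indexes of FIGS. 10A and 10B.

// FIG. 10A: one table per tag, listing files and the timed boundaries for that tag.
interface TagCentricEntry {
  fileId: string;      // e.g., "AV1" ... "AV5"
  startOffset: number; // "start=..."
  endOffset: number;   // "end=..."
  location: string;    // "location=...", link or pointer to the audio/video file
}
type TagCentricIndex = Record<string, TagCentricEntry[]>; // keyed by tag, e.g. "steroids"

// FIG. 10B: one table per media file, listing its tags and their timed boundaries.
interface FileCentricEntry {
  tag: string;
  startOffset: number;
  endOffset: number;
}
interface FileCentricIndex {
  location: string; // link or pointer to the audio/video file
  entries: FileCentricEntry[];
}
```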
  • Referring back to FIG. 8B at step 1090, the timed tag generator 935 obtains the next tag from the list of provider-defined tags and/or automatically generated tags associated with the media file. If another tag is available, the process returns to step 1050. Conversely, if all of the tags from the list have been processed, the process continues at step 1100, in which the timed tag generator 935 stores the timed tag index, and optionally the audio/video file itself, in the searchable database 940.
  • With the timed tag indexes 1250, 1255, a search engine, or other online system, can enable a user to request audio/video content based on a specific tag and, in return, provide such content in a manner such that the user can readily access the desired segment of content associated with the desired tag. For example, FIG. 11 is a diagram illustrating a system for accessing timed tagged media content from a search engine. As shown, the system 1300 includes a search engine 1320 or other server capable of accessing database 1335. The database 1335 includes one or more timed tag indexes 1335 that map a tag to timed segments of one or more media files. Alternatively, each of the timed tag indexes 1335 can map timed segments of a particular media file to one or more provider-defined or automatically generated tags.
  • In operation, a client requestor 1310 establishes a session with the search engine 1320 and transmits a request for audio/video content associated with one or more tags (e.g., tag="steroids"). In response, the search engine 1320 accesses the timed tag indexes 1335 to identify each of the timed segments that correspond to the requested tag. The search engine can then generate instructions to present one or more of the timed tagged segments of media content to the requestor via a browser interface 1340, for example. For purposes of example only, FIG. 11 illustrates a browser interface 1340 that presents a media player 1345 and a toolbar 1350 for jumping between the tagged timed segments. In this example, the toolbar 1350 includes a button 1352 for jumping to the timed segment associated with the tag "world baseball classic" and another button 1354 for jumping to the timed segment associated with the tag "steroids." Any number of different ways can be implemented for presenting timed tagged segments to a user.
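To illustrate the lookup path on the search-engine side, the sketch below resolves requested tags against a tag-centric index (the shape sketched after FIG. 10A) and builds jump-button entries for a toolbar like 1350; everything beyond that shape is assumed.

```typescript
// Sketch of resolving requested tags to timed segments and building toolbar buttons.
interface JumpButton {
  label: string;         // e.g., "steroids"
  location: string;      // media file/stream to load in the player 1345
  seekToSeconds: number; // timing offset at which the tagged segment begins
}

function buildJumpButtons(
  requestedTags: string[],
  index: Record<string, { fileId: string; startOffset: number; endOffset: number; location: string }[]>
): JumpButton[] {
  const buttons: JumpButton[] = [];
  for (const tag of requestedTags) {
    for (const entry of index[tag] ?? []) {
      buttons.push({ label: tag, location: entry.location, seekToSeconds: entry.startOffset });
    }
  }
  return buttons;
}

// Example: a request for the tag "steroids" yields buttons that seek directly to
// the steroids segment of each matching audio/video file.
```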
  • The above-described techniques can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The implementation can be as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers.
  • A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
  • Method steps can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and apparatus can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). Modules can refer to portions of the computer program and/or the processor/special circuitry that implements that functionality.
  • Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Data transmission and instructions can also occur over a communications network.
  • Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in special purpose logic circuitry.
  • The terms “module” and “function,” as used herein, mean, but are not limited to, a software or hardware component which performs certain tasks. A module may advantageously be configured to reside on an addressable storage medium and configured to execute on one or more processors. A module may be fully or partially implemented with a general purpose integrated circuit (IC), FPGA, or ASIC. Thus, a module may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules.
  • Additionally, the components and modules may advantageously be implemented on many different platforms, including computers, computer servers, data communications infrastructure equipment such as application-enabled switches or routers, or telecommunications infrastructure equipment, such as public or private telephone switches or private branch exchanges (PBX). In any of these cases, implementation may be achieved either by writing applications that are native to the chosen platform, or by interfacing the platform to one or more external application engines.
  • To provide for interaction with a user, the above described techniques can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer (e.g., interact with a user interface element). Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • The above described techniques can be implemented in a distributed computing system that includes a back-end component, e.g., as a data server, and/or a middleware component, e.g., an application server, and/or a front-end component, e.g., a client computer having a graphical user interface and/or a Web browser through which a user can interact with an example implementation, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet, and include both wired and wireless networks. Communication networks can also include all or a portion of the PSTN, for example, a portion owned by a specific carrier.
  • The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims (12)

1-15. (canceled)
16. A system for presenting search results for media content, the system comprising a media processor configured to:
receive discrete media content;
derive timing offsets from the discrete media content using one or more automated media processing techniques; and
present a search result that enables a user to arbitrarily select and commence playback of the discrete media content at any of the content segments of the discrete media content.
17. The system of claim 16, wherein the media processor is further configured to:
present the search result including transcriptions of one or more of the content segments of the discrete media content, each of the transcriptions being mapped to a timing offset of a corresponding content segment;
receive a user selection of one of the transcriptions presented in the search result; and
cause playback of the discrete media content at a timing offset of the corresponding content segment mapped to the selected one of the transcriptions.
18. The system of claim 17, wherein each of the transcriptions is derived from the discrete media content using one or more automated media processing techniques or obtained from closed caption data associated with the discrete media content.
19. The system of claim 17, wherein the search result further comprises a user actuated display element that enables the user to navigate from an offset of one content segment to another content segment within the discrete media content in response to user actuation of the element.
20. The system of claim 16, wherein the search result further comprises a user actuated display element that enables the user to navigate from an offset of one content segment to another content segment within the discrete media content in response to user actuation of the element.
21. The system of claim 20, wherein the media processor is further configured to:
obtain timing offsets corresponding to each of the content segments within the discrete media content;
in response to an indication of user actuation of the display element, determine a playback offset associated with the discrete media content in playback;
compare the playback offset with the timing offsets corresponding to each of the content segments to determine which of the content segments is presently in playback; and
cause playback of the discrete media content to continue at an offset that is prior to or subsequent to the offset of the content segment presently in playback.
22. The system of claim 16, wherein one or more of the content segments comprise word segments, audio speech segments, video segments, non-speech audio segments, or marker segments.
23. The system of claim 16, wherein one or more of the content segments comprise audio corresponding to an individual word, audio corresponding to a phrase, audio corresponding to a sentence, audio corresponding to a paragraph, audio corresponding to a story, audio corresponding to a topic, audio within a range of volume levels, audio of an identified speaker, audio during a speaker turn, audio associated with a speaker emotion, audio of non-speech sounds, audio separated by sound gaps, audio separated by markers embedded within the media content or audio corresponding to a named entity.
24. The system of claim 16, wherein one or more of the content segments comprise video of individual scenes, watermarks, recognized objects, recognized faces, overlay text or video separated by markers embedded within the media content.
25. The system of claim 17, wherein each of the transcriptions is associated with a confidence level, and the media processor is further configured to:
present the search result including the transcriptions of the one or more of the content segments of the discrete media content, such that any transcription that is associated with a confidence level that fails to satisfy a predefined threshold is displayed with one or more predefined symbols.
26. An apparatus for presenting search results for media content, comprising:
means for presenting a search result that enables a user to arbitrarily select and commence playback of the discrete media content at any of the content segments of the discrete media content using timing offsets derived from the discrete media content using one or more automated media processing techniques.
US12/391,770 2005-11-09 2009-02-24 User-directed navigation of multimedia search results Abandoned US20090222442A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/391,770 US20090222442A1 (en) 2005-11-09 2009-02-24 User-directed navigation of multimedia search results
US15/047,372 US20160188577A1 (en) 2005-11-09 2016-02-18 User-directed navigation of multimedia search results

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US73612405P 2005-11-09 2005-11-09
US11/395,732 US20070106646A1 (en) 2005-11-09 2006-03-31 User-directed navigation of multimedia search results
US11/444,989 US7801910B2 (en) 2005-11-09 2006-06-01 Method and apparatus for timed tagging of media content
US12/391,770 US20090222442A1 (en) 2005-11-09 2009-02-24 User-directed navigation of multimedia search results

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/444,989 Continuation US7801910B2 (en) 2005-11-09 2006-06-01 Method and apparatus for timed tagging of media content

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/047,372 Continuation US20160188577A1 (en) 2005-11-09 2016-02-18 User-directed navigation of multimedia search results

Publications (1)

Publication Number Publication Date
US20090222442A1 true US20090222442A1 (en) 2009-09-03

Family

ID=37865784

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/444,989 Expired - Fee Related US7801910B2 (en) 2005-11-09 2006-06-01 Method and apparatus for timed tagging of media content
US12/391,770 Abandoned US20090222442A1 (en) 2005-11-09 2009-02-24 User-directed navigation of multimedia search results
US15/047,372 Abandoned US20160188577A1 (en) 2005-11-09 2016-02-18 User-directed navigation of multimedia search results

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/444,989 Expired - Fee Related US7801910B2 (en) 2005-11-09 2006-06-01 Method and apparatus for timed tagging of media content

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/047,372 Abandoned US20160188577A1 (en) 2005-11-09 2016-02-18 User-directed navigation of multimedia search results

Country Status (2)

Country Link
US (3) US7801910B2 (en)
WO (1) WO2007056535A2 (en)

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070106760A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Methods and apparatus for dynamic presentation of advertising, factual, and informational content using enhanced metadata in search-driven media applications
US20070106693A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Methods and apparatus for providing virtual media channels based on media search
US20070106685A1 (en) * 2005-11-09 2007-05-10 Podzinger Corp. Method and apparatus for updating speech recognition databases and reindexing audio and video content using the same
US20070118873A1 (en) * 2005-11-09 2007-05-24 Bbnt Solutions Llc Methods and apparatus for merging media content
US20080077583A1 (en) * 2006-09-22 2008-03-27 Pluggd Inc. Visual interface for identifying positions of interest within a sequentially ordered information encoding
US20080154889A1 (en) * 2006-12-22 2008-06-26 Pfeiffer Silvia Video searching engine and methods
US20080229205A1 (en) * 2007-03-13 2008-09-18 Samsung Electronics Co., Ltd. Method of providing metadata on part of video image, method of managing the provided metadata and apparatus using the methods
US20100169095A1 (en) * 2008-12-26 2010-07-01 Yasuharu Asano Data processing apparatus, data processing method, and program
US20110106531A1 (en) * 2009-10-30 2011-05-05 Sony Corporation Program endpoint time detection apparatus and method, and program information retrieval system
US20110119291A1 (en) * 2006-06-14 2011-05-19 Qsent, Inc. Entity Identification and/or Association Using Multiple Data Elements
US20110196981A1 (en) * 2010-02-03 2011-08-11 Futurewei Technologies, Inc. Combined Binary String for Signaling Byte Range of Media Fragments in Adaptive Streaming
US20110307623A1 (en) * 2010-06-09 2011-12-15 Microsoft Corporation Smooth streaming client component
US20120180137A1 (en) * 2008-07-10 2012-07-12 Mcafee, Inc. System and method for data mining and security policy management
CN102592628A (en) * 2012-02-15 2012-07-18 张群 Play control method of audio and video play file
US8396878B2 (en) 2006-09-22 2013-03-12 Limelight Networks, Inc. Methods and systems for generating automated tags for video files
US20130215013A1 (en) * 2012-02-22 2013-08-22 Samsung Electronics Co., Ltd. Mobile communication terminal and method of generating content thereof
US8521719B1 (en) 2012-10-10 2013-08-27 Limelight Networks, Inc. Searchable and size-constrained local log repositories for tracking visitors' access to web content
CN103268345A (en) * 2013-05-27 2013-08-28 慈文传媒集团股份有限公司 Method and device for retrieving film and television data
US8548170B2 (en) 2003-12-10 2013-10-01 Mcafee, Inc. Document de-registration
US8554774B2 (en) 2005-08-31 2013-10-08 Mcafee, Inc. System and method for word indexing in a capture system and querying thereof
US8560534B2 (en) 2004-08-23 2013-10-15 Mcafee, Inc. Database for a capture system
US8656039B2 (en) 2003-12-10 2014-02-18 Mcafee, Inc. Rule parser
US8667121B2 (en) 2009-03-25 2014-03-04 Mcafee, Inc. System and method for managing data and policies
US8683035B2 (en) 2006-05-22 2014-03-25 Mcafee, Inc. Attributes of captured objects in a capture system
US8700561B2 (en) 2011-12-27 2014-04-15 Mcafee, Inc. System and method for providing data protection workflows in a network environment
US8706709B2 (en) 2009-01-15 2014-04-22 Mcafee, Inc. System and method for intelligent term grouping
US8707008B2 (en) 2004-08-24 2014-04-22 Mcafee, Inc. File system for a capture system
US8730955B2 (en) 2005-08-12 2014-05-20 Mcafee, Inc. High speed packet capture
US8762386B2 (en) 2003-12-10 2014-06-24 Mcafee, Inc. Method and apparatus for data capture and analysis system
US8806615B2 (en) 2010-11-04 2014-08-12 Mcafee, Inc. System and method for protecting specified data combinations
US8850591B2 (en) 2009-01-13 2014-09-30 Mcafee, Inc. System and method for concept building
US8918359B2 (en) 2009-03-25 2014-12-23 Mcafee, Inc. System and method for data mining and security policy management
US9015172B2 (en) 2006-09-22 2015-04-21 Limelight Networks, Inc. Method and subsystem for searching media content within a content-search service system
US9195937B2 (en) 2009-02-25 2015-11-24 Mcafee, Inc. System and method for intelligent state management
US20150358261A1 (en) * 2014-06-04 2015-12-10 Wistron Corporation Playback method and associated transmitting device, playback device, and communication system
US9253154B2 (en) 2008-08-12 2016-02-02 Mcafee, Inc. Configuration management for a capture/registration system
WO2017023763A1 (en) * 2015-07-31 2017-02-09 Promptu Systems Corporation Natural language navigation and assisted viewing of indexed audio video streams, notably sports contests
US10068573B1 (en) * 2016-12-21 2018-09-04 Amazon Technologies, Inc. Approaches for voice-activated audio commands
WO2019041193A1 (en) * 2017-08-30 2019-03-07 深圳市云中飞网络科技有限公司 Application resource processing method and related product
US20190303402A1 (en) * 2011-12-22 2019-10-03 Tivo Solutions Inc. User interface for viewing targeted segments of multimedia content based on time-based metadata search criteria
WO2020091431A1 (en) * 2018-11-02 2020-05-07 주식회사 모두앤모두 Subtitle generation system using graphic object

Families Citing this family (164)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006659A1 (en) * 2001-10-19 2009-01-01 Collins Jack M Advanced mezzanine card for digital network data inspection
US7716330B2 (en) * 2001-10-19 2010-05-11 Global Velocity, Inc. System and method for controlling transmission of data packets over an information network
US7827312B2 (en) 2002-12-27 2010-11-02 The Nielsen Company (Us), Llc Methods and apparatus for transcoding metadata
US7602785B2 (en) * 2004-02-09 2009-10-13 Washington University Method and system for performing longest prefix matching for network address lookup using bloom filters
US8412763B2 (en) * 2006-06-21 2013-04-02 Apple Inc. Podcast organization and usage at a computing device
US8516035B2 (en) * 2006-06-21 2013-08-20 Apple Inc. Browsing and searching of podcasts
US7962591B2 (en) * 2004-06-23 2011-06-14 Mcafee, Inc. Object classification in a capture system
US9584868B2 (en) 2004-07-30 2017-02-28 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US7631336B2 (en) 2004-07-30 2009-12-08 Broadband Itv, Inc. Method for converting, navigating and displaying video content uploaded from the internet to a digital TV video-on-demand platform
US7590997B2 (en) 2004-07-30 2009-09-15 Broadband Itv, Inc. System and method for managing, converting and displaying video content on a video-on-demand platform, including ads used for drill-down navigation and consumer-generated classified ads
US11259059B2 (en) 2004-07-30 2022-02-22 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9344765B2 (en) 2004-07-30 2016-05-17 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
EP1859378A2 (en) 2005-03-03 2007-11-28 Washington University Method and apparatus for performing biosequence similarity searching
US7801910B2 (en) * 2005-11-09 2010-09-21 Ramp Holdings, Inc. Method and apparatus for timed tagging of media content
US7702629B2 (en) 2005-12-02 2010-04-20 Exegy Incorporated Method and device for high performance regular expression pattern matching
US11128489B2 (en) 2017-07-18 2021-09-21 Nicira, Inc. Maintaining data-plane connectivity between hosts
WO2007068119A1 (en) 2005-12-13 2007-06-21 Audio Pod Inc. Segmentation and transmission of audio streams
US9319720B2 (en) 2005-12-13 2016-04-19 Audio Pod Inc. System and method for rendering digital content using time offsets
US7954114B2 (en) 2006-01-26 2011-05-31 Exegy Incorporated Firmware socket module for FPGA-based pipeline processing
US7636703B2 (en) * 2006-05-02 2009-12-22 Exegy Incorporated Method and apparatus for approximate pattern matching
US20070276852A1 (en) * 2006-05-25 2007-11-29 Microsoft Corporation Downloading portions of media files
US7840482B2 (en) 2006-06-19 2010-11-23 Exegy Incorporated Method and system for high speed options pricing
US7921046B2 (en) 2006-06-19 2011-04-05 Exegy Incorporated High speed processing of financial information using FPGA devices
US7966362B2 (en) * 2006-06-21 2011-06-21 Apple Inc. Management of podcasts
US20090049122A1 (en) * 2006-08-14 2009-02-19 Benjamin Wayne System and method for providing a video media toolbar
US20080201412A1 (en) * 2006-08-14 2008-08-21 Benjamin Wayne System and method for providing video media on a website
JP4835321B2 (en) * 2006-08-21 2011-12-14 ソニー株式会社 Program providing method, program providing method program, recording medium recording program providing method program, and program providing apparatus
US9311394B2 (en) * 2006-10-31 2016-04-12 Sony Corporation Speech recognition for internet video search and navigation
US8296315B2 (en) * 2006-11-03 2012-10-23 Microsoft Corporation Earmarking media documents
US7660793B2 (en) * 2006-11-13 2010-02-09 Exegy Incorporated Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors
US8326819B2 (en) * 2006-11-13 2012-12-04 Exegy Incorporated Method and system for high performance data metatagging and data indexing using coprocessors
US9417758B2 (en) * 2006-11-21 2016-08-16 Daniel E. Tsai AD-HOC web content player
US20080133475A1 (en) * 2006-11-30 2008-06-05 Donald Fischer Identification of interesting content based on observation of passive user interaction
US8176191B2 (en) * 2006-11-30 2012-05-08 Red Hat, Inc. Automated identification of high/low value content based on social feedback
US8751475B2 (en) * 2007-02-14 2014-06-10 Microsoft Corporation Providing additional information related to earmarks
US10382514B2 (en) * 2007-03-20 2019-08-13 Apple Inc. Presentation of media in an application
US20080240227A1 (en) * 2007-03-30 2008-10-02 Wan Wade K Bitstream processing using marker codes with offset values
US9071796B2 (en) * 2007-03-30 2015-06-30 Verizon Patent And Licensing Inc. Managing multiple media content sources
KR20100017194A (en) * 2007-05-08 2010-02-16 톰슨 라이센싱 Movie based forensic data for digital cinema
US20080301282A1 (en) * 2007-05-30 2008-12-04 Vernit Americas, Inc. Systems and Methods for Storing Interaction Data
US9576302B2 (en) * 2007-05-31 2017-02-21 Aditall Llc. System and method for dynamic generation of video content
US9032298B2 (en) * 2007-05-31 2015-05-12 Aditall Llc. Website application system for online video producers and advertisers
US20080313146A1 (en) * 2007-06-15 2008-12-18 Microsoft Corporation Content search service, finding content, and prefetching for thin client
US11570521B2 (en) 2007-06-26 2023-01-31 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9654833B2 (en) 2007-06-26 2017-05-16 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US8165450B2 (en) 2007-11-19 2012-04-24 Echostar Technologies L.L.C. Methods and apparatus for filtering content in a video stream using text data
US8165451B2 (en) 2007-11-20 2012-04-24 Echostar Technologies L.L.C. Methods and apparatus for displaying information regarding interstitials of a video stream
US8359303B2 (en) * 2007-12-06 2013-01-22 Xiaosong Du Method and apparatus to provide multimedia service using time-based markup language
US8606085B2 (en) 2008-03-20 2013-12-10 Dish Network L.L.C. Method and apparatus for replacement of audio data in recorded audio/video stream
US8374986B2 (en) 2008-05-15 2013-02-12 Exegy Incorporated Method and system for accelerated stream processing
CN103402070B (en) 2008-05-19 2017-07-07 日立麦克赛尔株式会社 Record reproducing device and method
US9183885B2 (en) 2008-05-30 2015-11-10 Echostar Technologies L.L.C. User-initiated control of an audio/video stream to skip interstitial content between program segments
US8156520B2 (en) * 2008-05-30 2012-04-10 EchoStar Technologies, L.L.C. Methods and apparatus for presenting substitute content in an audio/video stream using text data
JP2010055259A (en) * 2008-08-27 2010-03-11 Konica Minolta Business Technologies Inc Image processing apparatus, image processing program, and image processing method
CN101668004B (en) * 2008-09-04 2016-02-10 阿里巴巴集团控股有限公司 A kind of webpage acquisition methods, Apparatus and system
US20100107090A1 (en) * 2008-10-27 2010-04-29 Camille Hearst Remote linking to media asset groups
CA2744746C (en) 2008-12-15 2019-12-24 Exegy Incorporated Method and apparatus for high-speed processing of financial market depth data
US8588579B2 (en) 2008-12-24 2013-11-19 Echostar Technologies L.L.C. Methods and apparatus for filtering and inserting content into a presentation stream using signature data
US8438485B2 (en) * 2009-03-17 2013-05-07 Unews, Llc System, method, and apparatus for generating, customizing, distributing, and presenting an interactive audio publication
US20100299131A1 (en) * 2009-05-21 2010-11-25 Nexidia Inc. Transcript alignment
KR101816113B1 (en) * 2009-07-16 2018-01-08 블루핀 랩스, 인코포레이티드 Estimating and displaying social interest in time-based media
US9489577B2 (en) * 2009-07-27 2016-11-08 Cxense Asa Visual similarity for video content
US8707381B2 (en) * 2009-09-22 2014-04-22 Caption Colorado L.L.C. Caption and/or metadata synchronization for replay of previously or simultaneously recorded live programs
US8934758B2 (en) * 2010-02-09 2015-01-13 Echostar Global B.V. Methods and apparatus for presenting supplemental content in association with recorded content
FR2956515A1 (en) * 2010-02-15 2011-08-19 France Telecom NAVIGATION METHOD IN SOUND CONTENT
US9305603B2 (en) * 2010-07-07 2016-04-05 Adobe Systems Incorporated Method and apparatus for indexing a video stream
US8839318B2 (en) * 2010-07-08 2014-09-16 Echostar Broadcasting Corporation Apparatus, systems and methods for quick speed presentation of media content
US20120094768A1 (en) * 2010-10-14 2012-04-19 FlixMaster Web-based interactive game utilizing video components
US9484065B2 (en) * 2010-10-15 2016-11-01 Microsoft Technology Licensing, Llc Intelligent determination of replays based on event identification
US10037568B2 (en) 2010-12-09 2018-07-31 Ip Reservoir, Llc Method and apparatus for managing orders in financial markets
EP2466492A1 (en) * 2010-12-20 2012-06-20 Paul Peter Vaclik A method of making text data associated with video data searchable
US20130334300A1 (en) 2011-01-03 2013-12-19 Curt Evans Text-synchronized media utilization and manipulation based on an embedded barcode
US9380356B2 (en) 2011-04-12 2016-06-28 The Nielsen Company (Us), Llc Methods and apparatus to generate a tag for media content
US9967600B2 (en) * 2011-05-26 2018-05-08 Nbcuniversal Media, Llc Multi-channel digital content watermark system and method
US9209978B2 (en) 2012-05-15 2015-12-08 The Nielsen Company (Us), Llc Methods and apparatus to measure exposure to streaming media
US9515904B2 (en) 2011-06-21 2016-12-06 The Nielsen Company (Us), Llc Monitoring streaming media content
US9208158B2 (en) * 2011-08-26 2015-12-08 Cfe Media Llc System and method for content syndication service
US11055334B2 (en) * 2011-09-23 2021-07-06 Avaya Inc. System and method for aligning messages to an event based on semantic similarity
US8433577B2 (en) 2011-09-27 2013-04-30 Google Inc. Detection of creative works on broadcast media
US20130089301A1 (en) * 2011-10-06 2013-04-11 Chi-cheng Ju Method and apparatus for processing video frames image with image registration information involved therein
US8531602B1 (en) * 2011-10-19 2013-09-10 Google Inc. Audio enhancements for media
US8438595B1 (en) 2011-11-04 2013-05-07 General Instrument Corporation Method and apparatus for temporal correlation of content-specific metadata with content obtained from disparate sources
US20130226983A1 (en) * 2012-02-29 2013-08-29 Jeffrey Martin Beining Collaborative Video Highlights
US9990393B2 (en) 2012-03-27 2018-06-05 Ip Reservoir, Llc Intelligent feed switch
US10650452B2 (en) 2012-03-27 2020-05-12 Ip Reservoir, Llc Offload processing of data packets
US10121196B2 (en) 2012-03-27 2018-11-06 Ip Reservoir, Llc Offload processing of data packets containing financial market data
US11436672B2 (en) 2012-03-27 2022-09-06 Exegy Incorporated Intelligent switch for processing financial market data
US9804754B2 (en) * 2012-03-28 2017-10-31 Terry Crawford Method and system for providing segment-based viewing of recorded sessions
US10389779B2 (en) 2012-04-27 2019-08-20 Arris Enterprises Llc Information processing
US10277933B2 (en) * 2012-04-27 2019-04-30 Arris Enterprises Llc Method and device for augmenting user-input information related to media content
US20130308922A1 (en) * 2012-05-15 2013-11-21 Microsoft Corporation Enhanced video discovery and productivity through accessibility
US8819759B2 (en) 2012-06-27 2014-08-26 Google Technology Holdings LLC Determining the location of a point of interest in a media stream that includes caption data
US10165245B2 (en) 2012-07-06 2018-12-25 Kaltura, Inc. Pre-fetching video content
US9282366B2 (en) 2012-08-13 2016-03-08 The Nielsen Company (Us), Llc Methods and apparatus to communicate audience measurement information
US8798242B1 (en) * 2012-09-25 2014-08-05 Emc Corporation Voice-related information in data management systems
US9633093B2 (en) 2012-10-23 2017-04-25 Ip Reservoir, Llc Method and apparatus for accelerated format translation of data in a delimited data format
CA2887022C (en) 2012-10-23 2021-05-04 Ip Reservoir, Llc Method and apparatus for accelerated format translation of data in a delimited data format
US9633097B2 (en) 2012-10-23 2017-04-25 Ip Reservoir, Llc Method and apparatus for record pivoting to accelerate processing of data fields
US10971191B2 (en) * 2012-12-12 2021-04-06 Smule, Inc. Coordinated audiovisual montage from selected crowd-sourced content with alignment to audio baseline
GB2510424A (en) * 2013-02-05 2014-08-06 British Broadcasting Corp Processing audio-video (AV) metadata relating to general and individual user parameters
US9313544B2 (en) 2013-02-14 2016-04-12 The Nielsen Company (Us), Llc Methods and apparatus to measure exposure to streaming media
US9916295B1 (en) * 2013-03-15 2018-03-13 Richard Henry Dana Crawford Synchronous context alignments
US9236088B2 (en) 2013-04-18 2016-01-12 Rapt Media, Inc. Application communication
US9123330B1 (en) * 2013-05-01 2015-09-01 Google Inc. Large-scale speaker identification
EP2819418A1 (en) * 2013-06-27 2014-12-31 British Telecommunications public limited company Provision of video data
US20150074129A1 (en) * 2013-09-12 2015-03-12 Cisco Technology, Inc. Augmenting media presentation description and index for metadata in a network environment
WO2015038749A1 (en) * 2013-09-13 2015-03-19 Arris Enterprises, Inc. Content based video content segmentation
TW201513095A (en) * 2013-09-23 2015-04-01 Hon Hai Prec Ind Co Ltd Audio or video files processing system, device and method
US9332035B2 (en) 2013-10-10 2016-05-03 The Nielsen Company (Us), Llc Methods and apparatus to measure exposure to streaming media
US9792386B2 (en) 2013-10-25 2017-10-17 Turner Broadcasting System, Inc. Concepts for providing an enhanced media presentation
US11910066B2 (en) 2013-10-25 2024-02-20 Turner Broadcasting System, Inc. Providing interactive advertisements
US10037380B2 (en) 2014-02-14 2018-07-31 Microsoft Technology Licensing, Llc Browsing videos via a segment list
GB2541577A (en) 2014-04-23 2017-02-22 Ip Reservoir Llc Method and apparatus for accelerated data translation
WO2015167187A1 (en) * 2014-04-27 2015-11-05 엘지전자 주식회사 Broadcast signal transmitting apparatus, broadcast signal receiving apparatus, method for transmitting broadcast signal, and method for receiving broadcast signal
US9699499B2 (en) 2014-04-30 2017-07-04 The Nielsen Company (Us), Llc Methods and apparatus to measure exposure to streaming media
US20150347390A1 (en) * 2014-05-30 2015-12-03 Vavni, Inc. Compliance Standards Metadata Generation
US20180047387A1 (en) * 2015-03-05 2018-02-15 Igal NIR System and method for generating accurate speech transcription from natural speech audio signals
US9762965B2 (en) 2015-05-29 2017-09-12 The Nielsen Company (Us), Llc Methods and apparatus to measure exposure to streaming media
US10645457B2 (en) 2015-06-04 2020-05-05 Comcast Cable Communications, Llc Using text data in content presentation and content search
US10380166B2 (en) 2015-06-29 2019-08-13 The Nielson Company (Us), Llc Methods and apparatus to determine tags for media using multiple media features
US10621231B2 (en) 2015-08-24 2020-04-14 Google Llc Generation of a topic index with natural language processing
US10942943B2 (en) 2015-10-29 2021-03-09 Ip Reservoir, Llc Dynamic field data translation to support high performance stream data processing
US9812028B1 (en) * 2016-05-04 2017-11-07 Wespeke, Inc. Automated generation and presentation of lessons via digital media content extraction
KR20180105693A (en) * 2016-01-25 2018-09-28 웨스페케 아이앤시. Digital media content extraction and natural language processing system
US20180061256A1 (en) * 2016-01-25 2018-03-01 Wespeke, Inc. Automated digital media content extraction for digital lesson generation
KR102468763B1 (en) * 2016-02-05 2022-11-18 삼성전자 주식회사 Image processing apparatus and control method thereof
CN107193841B (en) * 2016-03-15 2022-07-26 北京三星通信技术研究有限公司 Method and device for accelerating playing, transmitting and storing of media file
US20170329762A1 (en) * 2016-05-13 2017-11-16 Comcast Cable Communications, Llc Methods and systems for matching interests with content
US11373219B2 (en) * 2016-08-12 2022-06-28 Eric Koenig System and method for providing a profiled video preview and recommendation portal
US10475349B2 (en) 2017-03-10 2019-11-12 SmartNoter Inc. System and method of producing and providing user specific educational digital media modules
CA3012688A1 (en) * 2017-07-28 2019-01-28 Comcast Cable Communications, Llc Dynamic detection of custom linear video clip boundaries
CN107480228A (en) * 2017-08-03 2017-12-15 深圳市置辰海信科技有限公司 A kind of monitor video searching method towards hard objectives
CN107704609B (en) * 2017-10-18 2021-01-08 浪潮金融信息技术有限公司 Video content retrieval method and device, computer-readable storage medium and terminal
KR102452644B1 (en) * 2017-10-31 2022-10-11 삼성전자주식회사 Electronic apparatus, voice recognition method and storage medium
US10714144B2 (en) 2017-11-06 2020-07-14 International Business Machines Corporation Corroborating video data with audio data from video content to create section tagging
US11140450B2 (en) * 2017-11-28 2021-10-05 Rovi Guides, Inc. Methods and systems for recommending content in context of a conversation
US11403337B2 (en) * 2017-12-05 2022-08-02 Google Llc Identifying videos with inappropriate content by processing search logs
US10726732B2 (en) 2018-01-16 2020-07-28 SmartNoter Inc. System and method of producing and providing user specific educational digital media modules augmented with electronic educational testing content
CN110309353A (en) * 2018-02-06 2019-10-08 上海全土豆文化传播有限公司 Video index method and device
KR102468214B1 (en) * 2018-02-19 2022-11-17 삼성전자주식회사 The system and an appratus for providig contents based on a user utterance
US11100164B2 (en) * 2018-06-12 2021-08-24 Verizon Media Inc. Displaying videos based upon selectable inputs associated with tags
US11748418B2 (en) 2018-07-31 2023-09-05 Marvell Asia Pte, Ltd. Storage aggregator controller with metadata computation control
WO2020053862A1 (en) * 2018-09-13 2020-03-19 Ichannel.Io Ltd. A system and computerized method for subtitles synchronization of audiovisual content using the human voice detection for synchronization
KR20200086569A (en) * 2019-01-09 2020-07-17 삼성전자주식회사 Apparatus and method for controlling sound quaulity of terminal using network
US11403300B2 (en) * 2019-02-15 2022-08-02 Wipro Limited Method and system for improving relevancy and ranking of search result
US11347471B2 (en) * 2019-03-04 2022-05-31 Giide Audio, Inc. Interactive podcast platform with integrated additional audio/visual content
US10856041B2 (en) * 2019-03-18 2020-12-01 Disney Enterprises, Inc. Content promotion using a conversational agent
CN111859028A (en) * 2019-04-30 2020-10-30 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for creating an index for streaming storage
CN110297943B (en) * 2019-07-05 2022-07-26 联想(北京)有限公司 Label adding method and device, electronic equipment and storage medium
US11270123B2 (en) * 2019-10-22 2022-03-08 Palo Alto Research Center Incorporated System and method for generating localized contextual video annotation
CN111031354B (en) * 2019-12-09 2020-12-01 Tencent Technology (Shenzhen) Co., Ltd. Multimedia playing method, device and storage medium
US11250872B2 (en) 2019-12-14 2022-02-15 International Business Machines Corporation Using closed captions as parallel training data for customization of closed captioning systems
KR20210100368A (en) * 2020-02-06 2021-08-17 Samsung Electronics Co., Ltd. Electronic device and control method thereof
US11032620B1 (en) * 2020-02-14 2021-06-08 Sling Media Pvt Ltd Methods, systems, and apparatuses to respond to voice requests to play desired video clips in streamed media based on matched close caption and sub-title text
US11172269B2 (en) 2020-03-04 2021-11-09 Dish Network L.L.C. Automated commercial content shifting in a video streaming system
US11373657B2 (en) * 2020-05-01 2022-06-28 Raytheon Applied Signal Technology, Inc. System and method for speaker identification in audio data
US11335326B2 (en) * 2020-05-14 2022-05-17 Spotify Ab Systems and methods for generating audible versions of text sentences from audio snippets
US11315545B2 (en) 2020-07-09 2022-04-26 Raytheon Applied Signal Technology, Inc. System and method for language identification in audio data
US11475210B2 (en) 2020-08-31 2022-10-18 Twilio Inc. Language model for abstractive summarization
US11765267B2 (en) * 2020-12-31 2023-09-19 Twilio Inc. Tool for annotating and reviewing audio conversations
US11809804B2 (en) 2021-05-26 2023-11-07 Twilio Inc. Text formatter
CN113242470B (en) * 2021-06-15 2023-03-31 Guangzhou Focus Network Technology Co., Ltd. Video publishing method and device applied to foreign trade marketing
US11683558B2 (en) * 2021-06-29 2023-06-20 The Nielsen Company (Us), Llc Methods and apparatus to determine the speed-up of media programs using speech recognition
US11785278B1 (en) * 2022-03-18 2023-10-10 Comcast Cable Communications, Llc Methods and systems for synchronization of closed captions with content output

Citations (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5613034A (en) * 1991-09-14 1997-03-18 U.S. Philips Corporation Method and apparatus for recognizing spoken words in a speech signal
US5613036A (en) * 1992-12-31 1997-03-18 Apple Computer, Inc. Dynamic categories for a speech recognition system
US6006265A (en) * 1998-04-02 1999-12-21 Hotv, Inc. Hyperlinks resolution at and by a special network server in order to enable diverse sophisticated hyperlinking upon a digital network
US6064959A (en) * 1997-03-28 2000-05-16 Dragon Systems, Inc. Error correction in speech recognition
US6081779A (en) * 1997-02-28 2000-06-27 U.S. Philips Corporation Language model adaptation for automatic speech recognition
US6112172A (en) * 1998-03-31 2000-08-29 Dragon Systems, Inc. Interactive searching
US6157912A (en) * 1997-02-28 2000-12-05 U.S. Philips Corporation Speech recognition method with language model adaptation
US20010045962A1 (en) * 2000-05-27 2001-11-29 Lg Electronics Inc. Apparatus and method for mapping object data for efficient matching between user preference information and content description information
US20010049826A1 (en) * 2000-01-19 2001-12-06 Itzhak Wilf Method of searching video channels by content
US6345253B1 (en) * 1999-04-09 2002-02-05 International Business Machines Corporation Method and apparatus for retrieving audio information using primary and supplemental indexes
US20020052925A1 (en) * 2000-08-29 2002-05-02 Yoohwan Kim Method and apparatus for information delivery on the internet
US20020069218A1 (en) * 2000-07-24 2002-06-06 Sanghoon Sull System and method for indexing, searching, identifying, and editing portions of electronic multimedia files
US6418431B1 (en) * 1998-03-30 2002-07-09 Microsoft Corporation Information retrieval and speech recognition based on language models
US20020099695A1 (en) * 2000-11-21 2002-07-25 Abajian Aram Christian Internet streaming media workflow architecture
US20020108112A1 (en) * 2001-02-02 2002-08-08 Ensequence, Inc. System and method for thematically analyzing and annotating an audio-visual sequence
US20020133398A1 (en) * 2001-01-31 2002-09-19 Microsoft Corporation System and method for delivering media
US20020143852A1 (en) * 1999-01-19 2002-10-03 Guo Katherine Hua High quality streaming multimedia
US6484136B1 (en) * 1999-10-21 2002-11-19 International Business Machines Corporation Language model adaptation via network of similar users
US6501833B2 (en) * 1995-05-26 2002-12-31 Speechworks International, Inc. Method and apparatus for dynamic adaptation of a large vocabulary speech recognition system and for use of constraints from a database in a large vocabulary speech recognition system
US6546427B1 (en) * 1999-06-18 2003-04-08 International Business Machines Corp. Streaming multimedia network with automatically switchable content sources
US20030123841A1 (en) * 2001-12-27 2003-07-03 Sylvie Jeannin Commercial detection in audio-visual content based on scene change distances on separator boundaries
US6611803B1 (en) * 1998-12-17 2003-08-26 Matsushita Electric Industrial Co., Ltd. Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition
US20030171926A1 (en) * 2002-03-07 2003-09-11 Narasimha Suresh System for information storage, retrieval and voice based content search and methods thereof
US6671692B1 (en) * 1999-11-23 2003-12-30 Accenture Llp System for facilitating the navigation of data
US6687697B2 (en) * 2001-07-30 2004-02-03 Microsoft Corporation System and method for improved string matching under noisy channel conditions
US6691123B1 (en) * 2000-11-10 2004-02-10 Imp Technology As Method for structuring and searching information
US6697796B2 (en) * 2000-01-13 2004-02-24 Agere Systems Inc. Voice clip search
US6728763B1 (en) * 2000-03-09 2004-04-27 Ben W. Chen Adaptive media streaming server for playing live and streaming media content on demand through web client's browser with no additional software or plug-ins
US6738745B1 (en) * 2000-04-07 2004-05-18 International Business Machines Corporation Methods and apparatus for identifying a non-target language in a speech recognition system
US20040103433A1 (en) * 2000-09-07 2004-05-27 Yvan Regeard Search method for audio-visual programmes or contents on an audio-visual flux containing tables of events distributed by a database
US6748375B1 (en) * 2000-09-07 2004-06-08 Microsoft Corporation System and method for content retrieval
US6768999B2 (en) * 1996-06-28 2004-07-27 Mirror Worlds Technologies, Inc. Enterprise, stream-based, information management system
US20040199507A1 (en) * 2003-04-04 2004-10-07 Roger Tawa Indexing media files in a distributed, multi-user system for managing and editing digital media
US20040205535A1 (en) * 2001-09-10 2004-10-14 Xerox Corporation Method and apparatus for the construction and use of table-like visualizations of hierarchic material
US6816858B1 (en) * 2000-03-31 2004-11-09 International Business Machines Corporation System, method and apparatus providing collateral information for a video/audio stream
US6848080B1 (en) * 1999-11-05 2005-01-25 Microsoft Corporation Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors
US20050033758A1 (en) * 2003-08-08 2005-02-10 Baxter Brent A. Media indexer
US6856997B2 (en) * 2000-10-27 2005-02-15 Lg Electronics Inc. Apparatus and method for providing file structure for multimedia streaming service
US6859799B1 (en) * 1998-11-30 2005-02-22 Gemstar Development Corporation Search engine for video and graphics
US20050055346A1 (en) * 2002-06-14 2005-03-10 Entopia, Inc. System and method for personalized information retrieval based on user expertise
US6873993B2 (en) * 2000-06-21 2005-03-29 Canon Kabushiki Kaisha Indexing method and apparatus
US6877134B1 (en) * 1997-08-14 2005-04-05 Virage, Inc. Integrated data and real-time metadata capture system and method
US20050086692A1 (en) * 2003-10-17 2005-04-21 Mydtv, Inc. Searching for programs and updating viewer preferences with reference to program segment characteristics
US20050096910A1 (en) * 2002-12-06 2005-05-05 Watson Kirk L. Formed document templates and related methods and systems for automated sequential insertion of speech recognition results
US20050165771A1 (en) * 2000-03-14 2005-07-28 Sony Corporation Information providing apparatus and method, information processing apparatus and method, and program storage medium
US20050192987A1 (en) * 2002-04-16 2005-09-01 Microsoft Corporation Media content descriptions
US20050197724A1 (en) * 2004-03-08 2005-09-08 Raja Neogi System and method to generate audio fingerprints for classification and storage of audio clips
US20050216443A1 (en) * 2000-07-06 2005-09-29 Streamsage, Inc. Method and system for indexing and searching timed media information based upon relevance intervals
US20050229118A1 (en) * 2004-03-31 2005-10-13 Fuji Xerox Co., Ltd. Systems and methods for browsing multimedia content on small mobile devices
US20050234875A1 (en) * 2004-03-31 2005-10-20 Auerbach David B Methods and systems for processing media files
US20050256867A1 (en) * 2004-03-15 2005-11-17 Yahoo! Inc. Search systems and methods with integration of aggregate user annotations
US6973428B2 (en) * 2001-05-24 2005-12-06 International Business Machines Corporation System and method for searching, analyzing and displaying text transcripts of speech after imperfect speech recognition
US6985861B2 (en) * 2001-12-12 2006-01-10 Hewlett-Packard Development Company, L.P. Systems and methods for combining subword recognition and whole word recognition of a spoken input
US20060015904A1 (en) * 2000-09-08 2006-01-19 Dwight Marcus Method and apparatus for creation, distribution, assembly and verification of media
US20060020662A1 (en) * 2004-01-27 2006-01-26 Emergent Music Llc Enabling recommendations and community by massively-distributed nearest-neighbor searching
US20060020971A1 (en) * 2004-07-22 2006-01-26 Thomas Poslinski Multi channel program guide with integrated progress bars
US20060047580A1 (en) * 2004-08-30 2006-03-02 Diganta Saha Method of searching, reviewing and purchasing music track or song by lyrical content
US20060053156A1 (en) * 2004-09-03 2006-03-09 Howard Kaushansky Systems and methods for developing intelligence from information existing on a network
US7111009B1 (en) * 1997-03-14 2006-09-19 Microsoft Corporation Interactive playlist generation using annotations
US7120582B1 (en) * 1999-09-07 2006-10-10 Dragon Systems, Inc. Expanding an effective vocabulary of a speech recognition system
US20060265421A1 (en) * 2005-02-28 2006-11-23 Shamal Ranasinghe System and method for creating a playlist
US20070005569A1 (en) * 2005-06-30 2007-01-04 Microsoft Corporation Searching an index of media content
US7177881B2 (en) * 2003-06-23 2007-02-13 Sony Corporation Network media channels
US20070041522A1 (en) * 2005-08-19 2007-02-22 At&T Corp. System and method for integrating and managing E-mail, voicemail, and telephone conversations using speech processing techniques
US20070078708A1 (en) * 2005-09-30 2007-04-05 Hua Yu Using speech recognition to determine advertisements relevant to audio content and/or audio content relevant to advertisements
US20070100787A1 (en) * 2005-11-02 2007-05-03 Creative Technology Ltd. System for downloading digital content published in a media channel
US20070106660A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Method and apparatus for using confidence scores of enhanced metadata in search-driven media applications
US20070106760A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Methods and apparatus for dynamic presentation of advertising, factual, and informational content using enhanced metadata in search-driven media applications
US20070106693A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Methods and apparatus for providing virtual media channels based on media search
US20070106685A1 (en) * 2005-11-09 2007-05-10 Podzinger Corp. Method and apparatus for updating speech recognition databases and reindexing audio and video content using the same
US7222155B1 (en) * 1999-06-15 2007-05-22 Wink Communications, Inc. Synchronous updating of dynamic interactive applications
US20070118873A1 (en) * 2005-11-09 2007-05-24 Bbnt Solutions Llc Methods and apparatus for merging media content
US7260564B1 (en) * 2000-04-07 2007-08-21 Virage, Inc. Network video guide and spidering
US7308487B1 (en) * 2000-12-12 2007-12-11 Igate Corp. System and method for providing fault-tolerant remote controlled computing devices
US7801910B2 (en) * 2005-11-09 2010-09-21 Ramp Holdings, Inc. Method and apparatus for timed tagging of media content

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1008931A3 (en) 1998-11-30 2003-08-27 Sun Microsystems, Inc. TV PIP applet implementation using a PIP framework
US6990453B2 (en) 2000-07-31 2006-01-24 Landmark Digital Services Llc System and methods for recognizing sound and music signals in high noise and distortion
KR20020024865A (en) 2000-09-27 2002-04-03 Kang Myung-ho A method for providing radio using internet
US7143353B2 (en) * 2001-03-30 2006-11-28 Koninklijke Philips Electronics, N.V. Streaming video bookmarks
US7149230B2 (en) * 2002-03-08 2006-12-12 Microsoft Corporation Transport processor for processing multiple transport streams
US20040006628A1 (en) 2002-07-03 2004-01-08 Scott Shepard Systems and methods for providing real-time alerting
JP2004350253A (en) 2003-05-23 2004-12-09 Adachi Taro Hosting service or network service that enables a user to view a program from outside the broadcast service area via a network such as the internet, a leased-line network, or a VPN
CN101189658A (en) * 2005-02-08 2008-05-28 Landmark Digital Services LLC Automatic identification of repeated material in audio signals

Patent Citations (80)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5613034A (en) * 1991-09-14 1997-03-18 U.S. Philips Corporation Method and apparatus for recognizing spoken words in a speech signal
US5613036A (en) * 1992-12-31 1997-03-18 Apple Computer, Inc. Dynamic categories for a speech recognition system
US6501833B2 (en) * 1995-05-26 2002-12-31 Speechworks International, Inc. Method and apparatus for dynamic adaptation of a large vocabulary speech recognition system and for use of constraints from a database in a large vocabulary speech recognition system
US6768999B2 (en) * 1996-06-28 2004-07-27 Mirror Worlds Technologies, Inc. Enterprise, stream-based, information management system
US6081779A (en) * 1997-02-28 2000-06-27 U.S. Philips Corporation Language model adaptation for automatic speech recognition
US6157912A (en) * 1997-02-28 2000-12-05 U.S. Philips Corporation Speech recognition method with language model adaptation
US7111009B1 (en) * 1997-03-14 2006-09-19 Microsoft Corporation Interactive playlist generation using annotations
US6064959A (en) * 1997-03-28 2000-05-16 Dragon Systems, Inc. Error correction in speech recognition
US6877134B1 (en) * 1997-08-14 2005-04-05 Virage, Inc. Integrated data and real-time metadata capture system and method
US6418431B1 (en) * 1998-03-30 2002-07-09 Microsoft Corporation Information retrieval and speech recognition based on language models
US6112172A (en) * 1998-03-31 2000-08-29 Dragon Systems, Inc. Interactive searching
US6006265A (en) * 1998-04-02 1999-12-21 Hotv, Inc. Hyperlinks resolution at and by a special network server in order to enable diverse sophisticated hyperlinking upon a digital network
US6859799B1 (en) * 1998-11-30 2005-02-22 Gemstar Development Corporation Search engine for video and graphics
US6728673B2 (en) * 1998-12-17 2004-04-27 Matsushita Electric Industrial Co., Ltd Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition
US6611803B1 (en) * 1998-12-17 2003-08-26 Matsushita Electric Industrial Co., Ltd. Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition
US20020143852A1 (en) * 1999-01-19 2002-10-03 Guo Katherine Hua High quality streaming multimedia
US6345253B1 (en) * 1999-04-09 2002-02-05 International Business Machines Corporation Method and apparatus for retrieving audio information using primary and supplemental indexes
US7222155B1 (en) * 1999-06-15 2007-05-22 Wink Communications, Inc. Synchronous updating of dynamic interactive applications
US6546427B1 (en) * 1999-06-18 2003-04-08 International Business Machines Corp. Streaming multimedia network with automatically switchable content sources
US7120582B1 (en) * 1999-09-07 2006-10-10 Dragon Systems, Inc. Expanding an effective vocabulary of a speech recognition system
US6484136B1 (en) * 1999-10-21 2002-11-19 International Business Machines Corporation Language model adaptation via network of similar users
US6848080B1 (en) * 1999-11-05 2005-01-25 Microsoft Corporation Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors
US6671692B1 (en) * 1999-11-23 2003-12-30 Accenture Llp System for facilitating the navigation of data
US6697796B2 (en) * 2000-01-13 2004-02-24 Agere Systems Inc. Voice clip search
US20010049826A1 (en) * 2000-01-19 2001-12-06 Itzhak Wilf Method of searching video channels by content
US6728763B1 (en) * 2000-03-09 2004-04-27 Ben W. Chen Adaptive media streaming server for playing live and streaming media content on demand through web client's browser with no additional software or plug-ins
US20050165771A1 (en) * 2000-03-14 2005-07-28 Sony Corporation Information providing apparatus and method, information processing apparatus and method, and program storage medium
US6816858B1 (en) * 2000-03-31 2004-11-09 International Business Machines Corporation System, method and apparatus providing collateral information for a video/audio stream
US7260564B1 (en) * 2000-04-07 2007-08-21 Virage, Inc. Network video guide and spidering
US6738745B1 (en) * 2000-04-07 2004-05-18 International Business Machines Corporation Methods and apparatus for identifying a non-target language in a speech recognition system
US20010045962A1 (en) * 2000-05-27 2001-11-29 Lg Electronics Inc. Apparatus and method for mapping object data for efficient matching between user preference information and content description information
US6873993B2 (en) * 2000-06-21 2005-03-29 Canon Kabushiki Kaisha Indexing method and apparatus
US20050216443A1 (en) * 2000-07-06 2005-09-29 Streamsage, Inc. Method and system for indexing and searching timed media information based upon relevance intervals
US20020069218A1 (en) * 2000-07-24 2002-06-06 Sanghoon Sull System and method for indexing, searching, identifying, and editing portions of electronic multimedia files
US20020052925A1 (en) * 2000-08-29 2002-05-02 Yoohwan Kim Method and apparatus for information delivery on the internet
US20040199502A1 (en) * 2000-09-07 2004-10-07 Microsoft Corporation System and method for content retrieval
US6748375B1 (en) * 2000-09-07 2004-06-08 Microsoft Corporation System and method for content retrieval
US20040103433A1 (en) * 2000-09-07 2004-05-27 Yvan Regeard Search method for audio-visual programmes or contents on an audio-visual flux containing tables of events distributed by a database
US20060015904A1 (en) * 2000-09-08 2006-01-19 Dwight Marcus Method and apparatus for creation, distribution, assembly and verification of media
US6856997B2 (en) * 2000-10-27 2005-02-15 Lg Electronics Inc. Apparatus and method for providing file structure for multimedia streaming service
US6691123B1 (en) * 2000-11-10 2004-02-10 Imp Technology As Method for structuring and searching information
US20020099695A1 (en) * 2000-11-21 2002-07-25 Abajian Aram Christian Internet streaming media workflow architecture
US6785688B2 (en) * 2000-11-21 2004-08-31 America Online, Inc. Internet streaming media workflow architecture
US20050187965A1 (en) * 2000-11-21 2005-08-25 Abajian Aram C. Grouping multimedia and streaming media search results
US7308487B1 (en) * 2000-12-12 2007-12-11 Igate Corp. System and method for providing fault-tolerant remote controlled computing devices
US20020133398A1 (en) * 2001-01-31 2002-09-19 Microsoft Corporation System and method for delivering media
US20020108112A1 (en) * 2001-02-02 2002-08-08 Ensequence, Inc. System and method for thematically analyzing and annotating an audio-visual sequence
US6973428B2 (en) * 2001-05-24 2005-12-06 International Business Machines Corporation System and method for searching, analyzing and displaying text transcripts of speech after imperfect speech recognition
US6687697B2 (en) * 2001-07-30 2004-02-03 Microsoft Corporation System and method for improved string matching under noisy channel conditions
US20040205535A1 (en) * 2001-09-10 2004-10-14 Xerox Corporation Method and apparatus for the construction and use of table-like visualizations of hierarchic material
US6985861B2 (en) * 2001-12-12 2006-01-10 Hewlett-Packard Development Company, L.P. Systems and methods for combining subword recognition and whole word recognition of a spoken input
US20030123841A1 (en) * 2001-12-27 2003-07-03 Sylvie Jeannin Commercial detection in audio-visual content based on scene change distances on separator boundaries
US20030171926A1 (en) * 2002-03-07 2003-09-11 Narasimha Suresh System for information storage, retrieval and voice based content search and methods thereof
US20050192987A1 (en) * 2002-04-16 2005-09-01 Microsoft Corporation Media content descriptions
US20050055346A1 (en) * 2002-06-14 2005-03-10 Entopia, Inc. System and method for personalized information retrieval based on user expertise
US20050096910A1 (en) * 2002-12-06 2005-05-05 Watson Kirk L. Formed document templates and related methods and systems for automated sequential insertion of speech recognition results
US20040199507A1 (en) * 2003-04-04 2004-10-07 Roger Tawa Indexing media files in a distributed, multi-user system for managing and editing digital media
US7177881B2 (en) * 2003-06-23 2007-02-13 Sony Corporation Network media channels
US20050033758A1 (en) * 2003-08-08 2005-02-10 Baxter Brent A. Media indexer
US20050086692A1 (en) * 2003-10-17 2005-04-21 Mydtv, Inc. Searching for programs and updating viewer preferences with reference to program segment characteristics
US20060020662A1 (en) * 2004-01-27 2006-01-26 Emergent Music Llc Enabling recommendations and community by massively-distributed nearest-neighbor searching
US20050197724A1 (en) * 2004-03-08 2005-09-08 Raja Neogi System and method to generate audio fingerprints for classification and storage of audio clips
US20050256867A1 (en) * 2004-03-15 2005-11-17 Yahoo! Inc. Search systems and methods with integration of aggregate user annotations
US20050234875A1 (en) * 2004-03-31 2005-10-20 Auerbach David B Methods and systems for processing media files
US20050229118A1 (en) * 2004-03-31 2005-10-13 Fuji Xerox Co., Ltd. Systems and methods for browsing multimedia content on small mobile devices
US20060020971A1 (en) * 2004-07-22 2006-01-26 Thomas Poslinski Multi channel program guide with integrated progress bars
US20060047580A1 (en) * 2004-08-30 2006-03-02 Diganta Saha Method of searching, reviewing and purchasing music track or song by lyrical content
US20060053156A1 (en) * 2004-09-03 2006-03-09 Howard Kaushansky Systems and methods for developing intelligence from information existing on a network
US20060265421A1 (en) * 2005-02-28 2006-11-23 Shamal Ranasinghe System and method for creating a playlist
US20070005569A1 (en) * 2005-06-30 2007-01-04 Microsoft Corporation Searching an index of media content
US20070041522A1 (en) * 2005-08-19 2007-02-22 At&T Corp. System and method for integrating and managing E-mail, voicemail, and telephone conversations using speech processing techniques
US20070078708A1 (en) * 2005-09-30 2007-04-05 Hua Yu Using speech recognition to determine advertisements relevant to audio content and/or audio content relevant to advertisements
US20070100787A1 (en) * 2005-11-02 2007-05-03 Creative Technology Ltd. System for downloading digital content published in a media channel
US20070106693A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Methods and apparatus for providing virtual media channels based on media search
US20070106685A1 (en) * 2005-11-09 2007-05-10 Podzinger Corp. Method and apparatus for updating speech recognition databases and reindexing audio and video content using the same
US20070106646A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc User-directed navigation of multimedia search results
US20070118873A1 (en) * 2005-11-09 2007-05-24 Bbnt Solutions Llc Methods and apparatus for merging media content
US20070106760A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Methods and apparatus for dynamic presentation of advertising, factual, and informational content using enhanced metadata in search-driven media applications
US20070106660A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Method and apparatus for using confidence scores of enhanced metadata in search-driven media applications
US7801910B2 (en) * 2005-11-09 2010-09-21 Ramp Holdings, Inc. Method and apparatus for timed tagging of media content

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8656039B2 (en) 2003-12-10 2014-02-18 Mcafee, Inc. Rule parser
US8548170B2 (en) 2003-12-10 2013-10-01 Mcafee, Inc. Document de-registration
US9092471B2 (en) 2003-12-10 2015-07-28 Mcafee, Inc. Rule parser
US9374225B2 (en) 2003-12-10 2016-06-21 Mcafee, Inc. Document de-registration
US8762386B2 (en) 2003-12-10 2014-06-24 Mcafee, Inc. Method and apparatus for data capture and analysis system
US8560534B2 (en) 2004-08-23 2013-10-15 Mcafee, Inc. Database for a capture system
US8707008B2 (en) 2004-08-24 2014-04-22 Mcafee, Inc. File system for a capture system
US8730955B2 (en) 2005-08-12 2014-05-20 Mcafee, Inc. High speed packet capture
US8554774B2 (en) 2005-08-31 2013-10-08 Mcafee, Inc. System and method for word indexing in a capture system and querying thereof
US9697231B2 (en) 2005-11-09 2017-07-04 Cxense Asa Methods and apparatus for providing virtual media channels based on media search
US20070118873A1 (en) * 2005-11-09 2007-05-24 Bbnt Solutions Llc Methods and apparatus for merging media content
US9697230B2 (en) 2005-11-09 2017-07-04 Cxense Asa Methods and apparatus for dynamic presentation of advertising, factual, and informational content using enhanced metadata in search-driven media applications
US20070106693A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Methods and apparatus for providing virtual media channels based on media search
US20070106760A1 (en) * 2005-11-09 2007-05-10 Bbnt Solutions Llc Methods and apparatus for dynamic presentation of advertising, factual, and informational content using enhanced metadata in search-driven media applications
US20070106685A1 (en) * 2005-11-09 2007-05-10 Podzinger Corp. Method and apparatus for updating speech recognition databases and reindexing audio and video content using the same
US9094338B2 (en) 2006-05-22 2015-07-28 Mcafee, Inc. Attributes of captured objects in a capture system
US8683035B2 (en) 2006-05-22 2014-03-25 Mcafee, Inc. Attributes of captured objects in a capture system
US20110119291A1 (en) * 2006-06-14 2011-05-19 Qsent, Inc. Entity Identification and/or Association Using Multiple Data Elements
US20080077583A1 (en) * 2006-09-22 2008-03-27 Pluggd Inc. Visual interface for identifying positions of interest within a sequentially ordered information encoding
US8966389B2 (en) 2006-09-22 2015-02-24 Limelight Networks, Inc. Visual interface for identifying positions of interest within a sequentially ordered information encoding
US8396878B2 (en) 2006-09-22 2013-03-12 Limelight Networks, Inc. Methods and systems for generating automated tags for video files
US9015172B2 (en) 2006-09-22 2015-04-21 Limelight Networks, Inc. Method and subsystem for searching media content within a content-search service system
US20080154889A1 (en) * 2006-12-22 2008-06-26 Pfeiffer Silvia Video searching engine and methods
US20080229205A1 (en) * 2007-03-13 2008-09-18 Samsung Electronics Co., Ltd. Method of providing metadata on part of video image, method of managing the provided metadata and apparatus using the methods
US8635706B2 (en) * 2008-07-10 2014-01-21 Mcafee, Inc. System and method for data mining and security policy management
US8601537B2 (en) * 2008-07-10 2013-12-03 Mcafee, Inc. System and method for data mining and security policy management
US20120180137A1 (en) * 2008-07-10 2012-07-12 Mcafee, Inc. System and method for data mining and security policy management
US10367786B2 (en) 2008-08-12 2019-07-30 Mcafee, Llc Configuration management for a capture/registration system
US9253154B2 (en) 2008-08-12 2016-02-02 Mcafee, Inc. Configuration management for a capture/registration system
US20100169095A1 (en) * 2008-12-26 2010-07-01 Yasuharu Asano Data processing apparatus, data processing method, and program
US8850591B2 (en) 2009-01-13 2014-09-30 Mcafee, Inc. System and method for concept building
US8706709B2 (en) 2009-01-15 2014-04-22 Mcafee, Inc. System and method for intelligent term grouping
US9602548B2 (en) 2009-02-25 2017-03-21 Mcafee, Inc. System and method for intelligent state management
US9195937B2 (en) 2009-02-25 2015-11-24 Mcafee, Inc. System and method for intelligent state management
US9313232B2 (en) 2009-03-25 2016-04-12 Mcafee, Inc. System and method for data mining and security policy management
US8918359B2 (en) 2009-03-25 2014-12-23 Mcafee, Inc. System and method for data mining and security policy management
US8667121B2 (en) 2009-03-25 2014-03-04 Mcafee, Inc. System and method for managing data and policies
US20110106531A1 (en) * 2009-10-30 2011-05-05 Sony Corporation Program endpoint time detection apparatus and method, and program information retrieval system
US9009054B2 (en) * 2009-10-30 2015-04-14 Sony Corporation Program endpoint time detection apparatus and method, and program information retrieval system
US20110196981A1 (en) * 2010-02-03 2011-08-11 Futurewei Technologies, Inc. Combined Binary String for Signaling Byte Range of Media Fragments in Adaptive Streaming
US9237178B2 (en) * 2010-02-03 2016-01-12 Futurewei Technologies, Inc. Combined binary string for signaling byte range of media fragments in adaptive streaming
US8555163B2 (en) * 2010-06-09 2013-10-08 Microsoft Corporation Smooth streaming client component
US20110307623A1 (en) * 2010-06-09 2011-12-15 Microsoft Corporation Smooth streaming client component
US10313337B2 (en) 2010-11-04 2019-06-04 Mcafee, Llc System and method for protecting specified data combinations
US9794254B2 (en) 2010-11-04 2017-10-17 Mcafee, Inc. System and method for protecting specified data combinations
US11316848B2 (en) 2010-11-04 2022-04-26 Mcafee, Llc System and method for protecting specified data combinations
US10666646B2 (en) 2010-11-04 2020-05-26 Mcafee, Llc System and method for protecting specified data combinations
US8806615B2 (en) 2010-11-04 2014-08-12 Mcafee, Inc. System and method for protecting specified data combinations
US20190303402A1 (en) * 2011-12-22 2019-10-03 Tivo Solutions Inc. User interface for viewing targeted segments of multimedia content based on time-based metadata search criteria
US11709888B2 (en) * 2011-12-22 2023-07-25 Tivo Solutions Inc. User interface for viewing targeted segments of multimedia content based on time-based metadata search criteria
US9430564B2 (en) 2011-12-27 2016-08-30 Mcafee, Inc. System and method for providing data protection workflows in a network environment
US8700561B2 (en) 2011-12-27 2014-04-15 Mcafee, Inc. System and method for providing data protection workflows in a network environment
CN102592628A (en) * 2012-02-15 2012-07-18 Zhang Qun Play control method of audio and video play file
US20130215013A1 (en) * 2012-02-22 2013-08-22 Samsung Electronics Co., Ltd. Mobile communication terminal and method of generating content thereof
US8521719B1 (en) 2012-10-10 2013-08-27 Limelight Networks, Inc. Searchable and size-constrained local log repositories for tracking visitors' access to web content
CN103268345A (en) * 2013-05-27 2013-08-28 Ciwen Media Group Co., Ltd. Method and device for retrieving film and television data
US20150358261A1 (en) * 2014-06-04 2015-12-10 Wistron Corporation Playback method and associated transmitting device, playback device, and communication system
US10609454B2 (en) 2015-07-31 2020-03-31 Promptu Systems Corporation Natural language navigation and assisted viewing of indexed audio video streams, notably sports contests
US11363338B2 (en) 2015-07-31 2022-06-14 Promptu Systems Corporation Natural language navigation relative to events in content of an audio video stream
WO2017023763A1 (en) * 2015-07-31 2017-02-09 Promptu Systems Corporation Natural language navigation and assisted viewing of indexed audio video streams, notably sports contests
US10068573B1 (en) * 2016-12-21 2018-09-04 Amazon Technologies, Inc. Approaches for voice-activated audio commands
CN110786028A (en) * 2017-08-30 2020-02-11 Shenzhen Huantai Technology Co., Ltd. Application resource processing method and related product
WO2019041193A1 (en) * 2017-08-30 2019-03-07 Shenzhen Yunzhongfei Network Technology Co., Ltd. Application resource processing method and related product
WO2020091431A1 (en) * 2018-11-02 2020-05-07 Modoo & Modoo Co., Ltd. Subtitle generation system using graphic object

Also Published As

Publication number Publication date
US20070112837A1 (en) 2007-05-17
WO2007056535A2 (en) 2007-05-18
US20160188577A1 (en) 2016-06-30
US7801910B2 (en) 2010-09-21
WO2007056535A3 (en) 2007-10-11

Similar Documents

Publication Publication Date Title
US7801910B2 (en) Method and apparatus for timed tagging of media content
US9934223B2 (en) Methods and apparatus for merging media content
US9697231B2 (en) Methods and apparatus for providing virtual media channels based on media search
US9697230B2 (en) Methods and apparatus for dynamic presentation of advertising, factual, and informational content using enhanced metadata in search-driven media applications
US20070106646A1 (en) User-directed navigation of multimedia search results
US20160012047A1 (en) Method and Apparatus for Updating Speech Recognition Databases and Reindexing Audio and Video Content Using the Same
US7640272B2 (en) Using automated content analysis for audio/video content consumption
JP6838098B2 (en) Knowledge panel contextualizing
KR101994987B1 (en) Related entities
US8762370B1 (en) Document-based synonym generation
US8312022B2 (en) Search engine optimization
US8332391B1 (en) Method and apparatus for automatically identifying compounds
KR101579551B1 (en) Automatic expanded language search
US8204891B2 (en) Method and subsystem for searching media content within a content-search-service system
US20150161251A1 (en) Triggering music answer boxes relevant to user search queries
US20120323897A1 (en) Query-dependent audio/video clip search result previews
US20080235209A1 (en) Method and apparatus for search result snippet analysis for query expansion and result filtering
US20050055372A1 (en) Matching media file metadata to standardized metadata
US9015172B2 (en) Method and subsystem for searching media content within a content-search service system
US20090287677A1 (en) Streaming media instant answer on internet search result page
Zizka et al. Web-based lecture browser with speech search

Legal Events

Date Code Title Description
AS Assignment

Owner name: RAMP HOLDINGS, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HOUH, HENRY;STERN, JEFFREY N.;REEL/FRAME:025311/0657

Effective date: 20101102

AS Assignment

Owner name: CXENSE ASA, NORWAY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RAMP HOLDINGS INC.;REEL/FRAME:037018/0816

Effective date: 20151021

STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION