WO1999041684A1 - Processing and delivery of audio-video information - Google Patents


Info

Publication number
WO1999041684A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
audio
video
scenes
user
Prior art date
Application number
PCT/US1999/003028
Other languages
French (fr)
Inventor
Ali S. Kazeroonian
David Kamins
James Pratt Twitchell
John M. Gauch
Susan E. Gauch
David James Pankratz
Robert James O'connell
Original Assignee
Fast Tv
Priority date
Filing date
Publication date
Application filed by Fast Tv filed Critical Fast Tv
Priority to AU26744/99A priority Critical patent/AU2674499A/en
Publication of WO1999041684A1 publication Critical patent/WO1999041684A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/785Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using colour or luminescence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Definitions

  • This invention relates to processing and delivery of audio-video information.
  • Audio-video information is broadcast to viewers, using a variety of communication media and techniques including, for example, conventional television transmissions, digital or analog satellite transmissions, or data transfer to networked personal computers (for example, through the Internet's World Wide Web).
  • Limited experimental and non-commercial deployments of "interactive television" (ITV) and "video-on-demand" (VOD), for example by cable television providers, attempt to provide more targeted programming to viewers. These systems have not yet led to significant commercial viewership.
  • Some of these interactive television systems offer the capability for a viewer to search a large pool of programs and to retrieve and view only a subset of programs that match his interest.
  • Video-on-demand systems allow a user to select, retrieve and view one or more video programs in their entirety by selecting from an existing, pre-processed content menu. For example, the viewer may rely on the program title, a predetermined subject category, names of on-screen talent, and perhaps other tangential information to decide whether to select a particular program. A significant amount of time and resources are devoted to preparing the programs for availability on the video-on-demand selection menus.
  • Research systems that provide content-based access to audio-video libraries have also been developed.
  • One such system, called "Informedia," has been developed by researchers at Carnegie-Mellon University. This system incorporates mechanisms for detecting, indexing, retrieving, and viewing video scenes in a stored audio-video library. The system requires the viewer to specify the keywords manually and then it retrieves the relevant videos. This and other systems use automatic methods for detecting scene changes.
  • A variety of systems have also been developed for accessing audio-video information that has already been indexed using, for example, manual indexing techniques.
  • Audio-video processing techniques have been developed for detecting changes of scene based on video information. Such video scene detection has been used to allow a user to browse a video archive by viewing representative frames of each scene. Another use of automatic video scene detection has been to tag and index video for future retrieval, for example, to manage film and video assets. Systems with similar capabilities have also been based on image recognition techniques rather than video scene detection.
  • The invention relates to an automated real-time system for the processing and distribution of audio-video data.
  • One aspect of this system performs automated, real-time analysis and scene detection, segmentation, indexing, and encoding of video information for real-time presentation, browsing, or searching.
  • By using automated real-time processing of audio-video data sources, the data is available to a user (a viewer) without a substantial delay that would be introduced by manual or off-line (batch oriented) automatic processing of the data.
  • The processing is arranged in a pipelined process, reducing processing delays and requiring less intermediate storage than a batch-oriented processing approach.
  • The audio-video sources are segmented into individual scenes, such as one story in a news broadcast, thereby allowing a user to access portions of source programming without having to view or scan through entire programs or to specify particular time intervals.
  • The system also makes combined use of video, audio, and closed-caption information in an audio-video signal to identify individual scenes. This combined use of multiple sources of information provides an improved scene detection capability. Characterization of individual scenes is based on a variety of sources of information including, for example, the closed-caption text and the output of a speech recognizer analyzing the audio information in the signal.
  • In one aspect, in general, the invention provides a method for fully automated real-time processing and delivery of an audio-video program.
  • The method features accepting the audio-video program, for example from a satellite receiver, and detecting discrete scenes within the program based on the content of the program. For each of the discrete scenes, the method includes determining textual information related to the scene, indexing the scene using the textual information, optionally compressing or encoding the video data, storing audio-video data from the scene, and storing index data for the scene.
  • The method can also include accepting the description of interests of a user, such as keywords or category names associated with topics of interest to that user. The method then includes matching the description of interests of a user to stored index data for the scenes, and providing audio-video data from the matching scenes to the user, for example over a data communication network.
  • The invention can include one or more of the following features.
  • Detecting scenes, determining textual information related to the scene, indexing the scenes, and storing data from the scenes, can all occur in a pipelined manner while accepting the program. Providing audio-video data to the user can therefore begin prior to completion of accepting of the program. In this way, the user can view scenes of the program with low delay relative to accepting those scenes in the program.
  • The method can also include accepting a text document, for example from a text source such as a news source. For each of the discrete scenes, the method then includes further matching of the scene to the text document and, if a match is found, storing the matching information which associates the scene to the document. Providing audio-video data from the matching scenes then also includes providing any stored matching information which associates the scenes to the text document.
  • Scenes can be detected within the program based on the content of the program using both the audio and video portions of the program. This can include comparing a color distribution at one time in the program to the color distribution at another time, or computing a statistic of the rate of change of the video signal.
  • Scenes can also be detected by processing the closed captions of the program. This may involve, for example, comparing the frequency of previously selected words in one portion of the program to their frequency in another portion of the program. It may also involve detecting predetermined character sequences, such as punctuation or particular words, in the closed captions.
  • The textual information used for indexing can be determined by processing the closed captions of the scene. In addition, the information can be determined by processing the audio portion of the scene using an automatic speech recognizer.
  • If the audio-video data is sent to the user over a data network, it can be sent using a data streaming protocol and the Internet Protocol (IP) network protocol. Also, it can be multicast for reception by multiple users. In addition, the data can be compressed in accordance with the communication capacity of a communication path over the data communication network to the user.
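The patent describes delivery only at the protocol level (streaming layered on IP, optionally multicast). The sketch below is one minimal way a media server could multicast an already-encoded scene over UDP/IP; the group address, port, and chunk size are hypothetical, and a real deployment would use a proper streaming protocol with sequencing, timing, and rate adaptation rather than raw datagrams.

```python
import socket

# Hypothetical multicast group, port, and payload size.
MCAST_GROUP = "239.1.2.3"
MCAST_PORT = 5004
CHUNK = 1316  # bytes per datagram, small enough to avoid IP fragmentation

def multicast_scene(encoded_scene: bytes) -> None:
    """Send one encoded audio-video scene to all subscribed clients."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    try:
        for offset in range(0, len(encoded_scene), CHUNK):
            sock.sendto(encoded_scene[offset:offset + CHUNK],
                        (MCAST_GROUP, MCAST_PORT))
    finally:
        sock.close()

# Usage (hypothetical file name):
# multicast_scene(open("scene0001.bin", "rb").read())
```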
  • The description of interests of a user can be received prior to accepting an audio-video program, thereby forming a profile for that user.
  • The description of interests can also be received after accepting the program, thereby forming a search request for that user.
  • In another aspect, in general, the invention features a system for fully automated real-time processing and delivery of an audio-video program.
  • The system includes a segmenter/indexer for accepting the program and providing data for discrete scenes in the program, a media database for accepting and storing audio-video data for the discrete scenes from the segmenter/indexer, an information database for accepting and storing index data based on textual data related to the discrete scenes from the segmenter/indexer, and a communication network coupling the media database and a client computer for providing audio-video data stored in the media database which match a description of interests provided by a user of the client computer.
  • In another aspect, in general, the invention provides software stored on computer readable media for causing a computer implementation of such an audio-video processing and delivery system to function. Other features and advantages of the invention will be apparent from the following description and from the claims.
  • Fig. 1 is a block diagram of physical components of a video processing system and its interconnections with data sources and users;
  • Fig. 2 is a block diagram of the software components of the audio-video processor, and the media and information databases;
  • Fig. 3 is a flowchart of operation of the audio-video processor;
  • Fig. 4 is a schematic of the pipeline stages of processing; and
  • Fig. 5 is a schematic of the time evolution of pipeline stages on a dual processor computer.
  • Referring to Fig. 1, a user at client computer 109 requests data from a video processing system (VPS) 100, and accepts the requested data, which originated at audio-video sources 110 and text content sources 104.
  • VPS 100 includes several server computers which process the audio-video and text data, respond to requests from users, and provide data to those users.
  • An audio-video processing server (AVPS) 101 accepts data from audio-video sources 110, processes the audio-video data, and provides the processed data to a media server 102 where it is stored or buffered prior to being distributed to client computer 109.
  • A text processing server (TPS) 105 accepts textual data, including text accompanying the audio-video data such as closed captioning (CC), as well as text data from other text content sources 104, such as a news wire service, processes the text data, and provides processed data to media server computer 102.
  • AVPS 101 and TPS 105 provide information related to the context and structure of the processed data to an information server 103.
  • Client computer 109 communicates with a Web server 106 in order to request audio-video data.
  • Data stored in media server 102 is identified based on data stored in information server 103, and the identified audio-video and text data is sent from media server 102 to client computer 109 as a compressed data stream.
  • the server computers, AVPS 101, media server 102, information server 103, TPS 105, and Web server 106 are coupled to a local area network (LAN) 107.
  • LAN 107 is coupled to a wide area network (WAN) 108, such as the Internet.
  • Client computers 109 are also coupled to WAN 108.
  • Each of the computers 109 includes a processor, working memory, a program storage, such as a fixed or removable magnetic disk drive, and a network communication interface.
  • In addition, media server 102 and information server 103 include data storage devices, such as magnetic disk drives, and audio-video processing server 101 includes an additional processor enabling parallel processing on the audio-video server.
  • LAN 107 provides reliable high-data-rate communication between the server computers coupled directly to the LAN, for example using Ethernet supporting communication rates in the range of 10-100 Mb/s.
  • WAN 108 provides lower-rate and less reliable communication between LAN 107 and client computer 109. If a client computer is coupled to WAN 108 over a standard telephone modem, this rate may be only 28 kb/s, while the rate may be as high as 1 Mb/s or higher if the client is coupled to the WAN over a T1 telephone connection and the WAN provides a path as fast as or faster than LAN 107.
  • Audio-video sources 110 include one or more real-time sources of audio-video data.
  • The sources can include receivers of terrestrial or satellite broadcasts of analog or digitally-encoded television signals, or receivers of television signals provided over a wired distribution system (for example, a cable television system) or over a data network (for example, the Internet).
  • In addition, audio-video sources can include sources of non-real-time data, for example, recorded signals, such as edited television programming or feature films.
  • Audio-video data provided by audio-video sources 110 includes analog or digitally-encoded television signals.
  • A television signal includes video and audio signals, as well as a closed-caption (CC) text signal, encoded during the vertical blanking interval of analog signals.
  • Turning now to the user's interface to the system, client computer 109 includes software that communicates with software on Web server 106 to search for data or to set a personal profile, and includes software that accepts an audio-video data stream originating at media server 102 and transported over WAN 108.
  • In a search mode, the user searches the available archives of video clips and text.
  • Once a list of desired video clips and news articles is compiled by the user, a corresponding data stream is transported from media server 102 to client computer 109.
  • The data stream can be transported using a variety of protocols, such as ones using streaming techniques layered on the Internet Protocol (IP) network layer protocol.
  • The data stream can be compressed to satisfy communication-rate constraints on the data path between media server 102 and client computer 109.
  • Data can be transferred in a variety of ways from media server 102 to client computer 109, including using an on-demand (that is, direct) transmission, or as part of a broadcast or multicast transmission in which multiple client computers receive the same data streams.
  • The client computer can also receive text data directly from text content sources 104 for concurrent presentation with audio-video data sent from media server 102.
  • Turning now to the server computers that make up VPS 100, AVPS 101 accepts a television signal from audio-video sources 110.
  • Information related to changes in video (that is, from frame to frame), audio intensity, and punctuation and words in the CC text are used to split the video into distinct scenes (video clips).
  • The video and audio from each scene are then compressed (encoded) and stored on media server 102.
  • The CC text, as well as information about the particular video source, broadcast time, etc., for each scene is stored in information server 103 to enable future searches that might return those scenes.
  • TPS 105 receives CC text for each video scene directly from AVPS 101.
  • TPS 105 assigns a category (a subject code or topic) to the CC text for a clip. This category information is provided to information server 103.
  • TPS 105 also receives text documents from text content sources 104, for example from a text server computer on site at a text news provider.
  • The CC for an audio-video clip is used to find related documents from the text sources.
  • The relationship information is provided to information server 103.
  • An example of this type of correlation of sources is an audio-video clip of a weather forecast being matched to a textual weather forecast. This match could later be used to present to a user at client computer 109 the textual weather forecast side-by-side with the audio-video weather forecast broadcast. Based on this type of correlation, news stories that fall in the same category as a video clip can also be shown side by side. The user is therefore able to see retrieved video together with related news stories.
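The patent does not spell out how text processor 240 decides that a clip and a news document are related. The following is a minimal keyword-overlap sketch under that assumption; the stopword list and score threshold are hypothetical parameters.

```python
import re
from collections import Counter

# Hypothetical stopword list; a fuller list would be used in practice.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "with", "that", "this", "it", "at", "by", "be"}

def term_vector(text: str) -> Counter:
    """Bag of content words for a CC transcript or a news article."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def overlap_score(cc_text: str, document: str) -> float:
    """Crude similarity: shared content-word mass, normalised by CC length."""
    cc, doc = term_vector(cc_text), term_vector(document)
    shared = sum(min(cc[w], doc[w]) for w in cc if w in doc)
    return shared / max(1, sum(cc.values()))

def related_documents(cc_text: str, documents: dict, threshold: float = 0.2):
    """Return (doc_id, score) pairs judged related to the clip, best first."""
    scored = ((doc_id, overlap_score(cc_text, body))
              for doc_id, body in documents.items())
    return sorted((s for s in scored if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)
```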
  • Server computer 106 includes server software that communicates with client software executing at client computer 109.
  • For instance, a Web browser executes at client computer 109 and communicates with a Web server executing at server computer 106.
  • The user, using the client software, provides the server software with a request for data, for example by filling out an electronic form for a keyword-based search or for selecting a category or topic. If Web client and server software is used, the form is transported between server computer 106 and client computer 109 using the hyper-text transport protocol (http).
  • The server software accepts the request, accesses information server 103, and compiles a list of video clips and news articles.
  • This list is provided to the client software, in the form of a multimedia document or in some other form that can be interpreted by the client software.
  • In response to input from the user, the client software then submits requests for the video clips to media server 102, and for text data to text content sources 104, and accepts the data provided in response to the requests.
  • A hyper-text list of video clips and news articles can include "thumbnail" views of video clips presented along with other identifying information for the clips.
  • The thumbnail can be a static image, or can change while it is presented to the user.
  • The frame chosen to be displayed can be chosen at the time the video clip is initially processed. For instance, the most stationary portion of the clip can be used as the representative frame to display in the thumbnail view.
  • A time-varying thumbnail can sample frames from the video clip, for example sampling uniformly in time, or using a previously chosen sequence of relatively stationary frames.
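As a rough illustration of the thumbnail selection just described, this sketch picks the "most stationary" frame as the one with the smallest pixel difference from its predecessor and also samples frame indices uniformly in time. Frames are assumed to be NumPy arrays; the patent does not prescribe this exact measure.

```python
import numpy as np

def most_stationary_frame(frames) -> int:
    """Index of the frame with the smallest pixel difference from its
    predecessor -- a simple proxy for the most stationary moment of the clip."""
    best_idx, best_diff = 0, float("inf")
    for i in range(1, len(frames)):
        diff = int(np.abs(frames[i].astype(np.int32) -
                          frames[i - 1].astype(np.int32)).sum())
        if diff < best_diff:
            best_idx, best_diff = i, diff
    return best_idx

def thumbnail_sample_indices(frames, count: int = 5):
    """Frame indices sampled uniformly in time for a time-varying thumbnail."""
    step = max(1, len(frames) // count)
    return list(range(0, len(frames), step))[:count]
```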
  • In an alternative mode of operation, rather than search for audio-video data that has already been stored in the media database, the user can provide profile information in advance of the data being available. Then, as relevant audio-video data is processed, it can be provided to the user, for example, as a personalized broadcast.
  • Referring to Fig. 2, the details of a software audio-video processor 201 which executes on audio-video processing server (AVPS) 101 (shown in Fig. 1) are depicted.
  • Each of the illustrated software modules executes on one or more of the processors in AVPS 101 and communicates with other modules in audio-video processor 201 through shared memory buffers in the working storage. Execution of the modules can be allocated to the multiple processors in a variety of ways, for example, with one processor digitizing while another is performing segmentation. No intermediate disk files of the digitized video are created prior to the data passing through a segmentation coordinator 217.
  • A digitizer/decoder 225 provides an interface for the audio-video data accepted from audio-video sources 110.
  • In the case of analog television data, digitizer/decoder 225 controls the acquisition of the analog signal and extraction and digitization of video, audio, and closed-caption data in the signal.
  • In the case of digitized data, the digitizer/decoder decodes the digital data stream and creates digital streams for the video, audio, and closed-caption data.
  • The digital video stream in both cases includes individual digitized frames. Each frame, as it is extracted, is stored in an indexed location in a video buffer 214. The corresponding digitized audio frame is stored in an audio buffer 223 at the same index location, thus ensuring audio/video synchronization.
  • The digitized closed-caption text is stored in a CC buffer 228.
  • The correspondence between audio data and closed caption data is preserved by keeping track of the number of bytes of CC data in a given video frame.
  • A video segmenter 213 performs two analyses on each frame. First, it computes a color histogram of the frame, and compares it with the histogram of the previous and the next frames to produce a video color difference quantity. This quantity is related to the rate of change of the image. This comparison is performed by summing the absolute differences of corresponding histogram values associated with different colors. In the second analysis, a pixel-to-pixel change comparison is made between the current frame and the previous frame by summing the absolute differences of all changes in the pixel intensity and color values to produce a pixel difference quantity. Both quantities computed by video segmenter 213 are passed to segmentation coordinator 217. An audio segmenter 216 measures the average audio level for each frame to produce an audio level quantity. The audio level quantity is also passed to segmentation coordinator 217.
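A minimal sketch of the three per-frame quantities described above, assuming frames arrive as NumPy RGB arrays and audio as sample arrays. The 3-3-2 bit palette reduction is one plausible way to realize the 8-bit (256-class) palette mentioned later in the text, not a detail the patent specifies.

```python
import numpy as np

def palette_index(frame_rgb: np.ndarray) -> np.ndarray:
    """Map each RGB pixel to one of 256 color classes (3 bits R, 3 G, 2 B)."""
    r = (frame_rgb[..., 0] >> 5).astype(np.int64)
    g = (frame_rgb[..., 1] >> 5).astype(np.int64)
    b = (frame_rgb[..., 2] >> 6).astype(np.int64)
    return (r << 5) | (g << 2) | b

def color_histogram(frame_rgb: np.ndarray) -> np.ndarray:
    """Histogram of the frame over the reduced 256-color palette."""
    return np.bincount(palette_index(frame_rgb).ravel(), minlength=256)

def color_difference(prev_rgb: np.ndarray, cur_rgb: np.ndarray) -> int:
    """Video color difference: sum of absolute differences of histogram bins."""
    return int(np.abs(color_histogram(cur_rgb) - color_histogram(prev_rgb)).sum())

def pixel_difference(prev_rgb: np.ndarray, cur_rgb: np.ndarray) -> int:
    """Pixel difference: sum of absolute per-pixel intensity and color changes."""
    return int(np.abs(cur_rgb.astype(np.int32) - prev_rgb.astype(np.int32)).sum())

def audio_level(samples: np.ndarray) -> float:
    """Average audio level (mean absolute amplitude) over one frame's samples."""
    return float(np.abs(samples.astype(np.float64)).mean())
```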
  • A CC segmenter 227 analyzes the CC data in CC buffer 228 and checks for the presence of punctuation marks such as periods and for CC special characters such as carets (">>"). The CC segmenter also checks for the occurrence of common words across sentence boundaries. CC segmenter 227 passes a signal to segmentation coordinator 217 indicating whether a scene change is likely to have taken place based on the CC text. For example, the same important words used in two frames contribute to an indication that both frames are part of the same scene.
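A small sketch of the closed-caption cues described above; the exact marker set and regular expression are assumptions, since the patent only names periods and caret sequences as examples.

```python
import re

# CC conventions referenced in the text: ">>" typically marks a new speaker
# and ">>>" a new story; the precise set of markers is an assumption here.
BOUNDARY_MARKERS = (">>>", ">>", ">")
SENTENCE_END = re.compile(r"[.?!]")

def cc_suggests_boundary(cc_window: str) -> bool:
    """True if the CC text around the current frame contains a cue that a new
    sentence, subject, or speaker may be starting."""
    return bool(SENTENCE_END.search(cc_window)) or \
        any(marker in cc_window for marker in BOUNDARY_MARKERS)
```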
  • In segmentation coordinator 217, the video color difference, pixel difference, and audio level quantities are compared to respective thresholds stored in the segmentation coordinator. If all of the quantities exceed their respective thresholds, the segmentation coordinator determines whether a scene boundary should be declared at that point. This declaration is based in part on the signal received from CC segmenter 227. Once segmentation coordinator 217 determines that a scene boundary has occurred, the contents of video buffer 214 and audio buffer 223, including the audio-video data from the new scene, are passed to an encoder 231. Alternatively, the audio-video data can have been buffered in a temporary hard disk storage 234 and read by encoder 231 when the scene boundary is determined.
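Putting the pieces together, a hedged sketch of the coordinator's decision, following the flowchart description later in the text (a quiet frame plus large color and pixel changes, confirmed by the closed-caption hint). The numeric thresholds are placeholders; the patent stores them in the coordinator but does not publish values.

```python
# Placeholder thresholds; real values would be tuned per source material.
CS_TH = 50_000       # color (histogram) difference threshold
PS_TH = 2_000_000    # pixel difference threshold
AL_TH = 500.0        # audio level threshold: a "quiet" frame falls below this

def declare_scene_boundary(color_diff: int, pixel_diff: int,
                           level: float, cc_hint: bool) -> bool:
    """Per the flowchart: the frame must be quiet, both video measures must
    show a large change, and the closed-caption analysis must agree."""
    quiet = level < AL_TH
    big_video_change = color_diff > CS_TH and pixel_diff > PS_TH
    return quiet and big_video_change and cc_hint
```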
  • Encoder 231 compresses the video and audio data for a scene, and passes the compressed data to a media database 232, stored on media server 102.
  • Media database 232 includes individual files for each of the scenes, as well as an index or directory to access those files.
  • Segmentation coordinator 217 also sends data for new entries to be stored in a content table 219 and a text table 220 in a database 218 stored on information server 103.
  • The entry in content table 219 includes the location or index of the corresponding data stored in media database 232, plus other information such as the duration of the scene and the name of the audio-video source.
  • The entry in text table 220 contains all of the CC text for the given scene. The two entries are related by a common index so that once the database is searched for a specific word in the closed caption entries, all of the related video scene locations can be found.
  • Segmentation coordinator 217 also passes the CC text for a scene to a text processor 240 executing on TPS 105.
  • Text processor 240 matches the CC text to available text sources from text content sources 104.
  • The system can also include a speech recognizer.
  • This speech recognizer takes, as an input, the audio portion of an audio-video program and produces a word sequence corresponding to that audio data.
  • This word sequence can be used to supplement, or can be used instead of, the CC text in the approach described above.
  • The audio-video processor 201 determines whether each new frame of data begins a new scene. Audio-video processor 201 begins by capturing, in a buffer, multiple frames, for example a fixed duration interval, of the audio-video source data (step 336). This buffer is part of digitizer/decoder 225. One frame of data is then stored in each of video buffer 214 and audio buffer 223 (step 337). In parallel, the CC text for the entire buffer is stored in the CC buffer 228 (step 338). In many video sequences, speakers, anchor persons, or other talking persons stop their speech as the video subject matter changes from one topic to another. Video editors almost always paste together video segments with quiet audio tracks at both ends of the segment.
  • An Audio Level (AL) quantity is measured (step 339) by taking the average of the audio level of the data in audio buffer 223. AL is then compared to a pre-set Audio Level Threshold (AL_Th) value (step 341). If the Audio Level is less than the Audio Level Threshold, then a quiet frame has been reached that satisfies the audio level condition for a scene boundary, and further consideration of this frame as a scene boundary continues. If the level condition is not satisfied, a new frame, if available, is processed (step 349). If the last frame of the buffer has been used, then a new buffer is captured (step 336); otherwise the next frame from the buffer is read and processing continues (step 337).
  • Consideration of the frame continues by computing a histogram of the colors in the frame.
  • Using an 8-bit color palette (that is, 256 colors or color classes), a video color difference quantity (color spectrum, CS) is determined as the sum of the absolute differences of the corresponding histogram values (step 343).
  • The value of CS is compared to a threshold CS_Th (step 345), and if CS is greater than CS_Th, the processing determines that a significant change in the color of the frames has occurred (perhaps a new object is being depicted in the frame with new colors), suggesting the possible start of a new scene. If CS is not greater than CS_Th, then processing continues with the next frame (step 349).
  • A pixel difference quantity (PS) is determined as the sum of the absolute differences between each of the intensity (luminance) and color (chrominance) values of corresponding pixels in the current frame and the previous frame.
  • The value of PS is compared to a threshold PS_Th (step 348), and if PS is greater than PS_Th, the processing concludes that a significant change in the whole frame has occurred, for example, caused by a movement of an object in the frame, suggesting the possible start of a new scene. If PS is not greater than PS_Th, then processing continues with the next frame (step 349).
  • The text stored in the CC buffer is examined (step 340) to locate punctuation characters, including the period ("."), single caret (">"), double carets (">>"), and triple carets (">>>"), marking a new sentence, subject, or speaker. If a new sentence is found (step 342), then processing proceeds to identify the most important words in the CC text (step 344). Important words are identified by determining the frequency of their appearance in a representative textual corpus. The least frequently occurring words are ranked as the most important words.
  • The processing compares the top words in the present CC buffer to those of the previous buffer, and if the two buffers have several words in common, then the processing decides that the subject matter of the two buffers is probably the same and the two buffers belong to the same scene (step 346), and processing proceeds with the next buffer.
  • Otherwise, this frame is determined to be the first frame of a new scene (step 350).
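A sketch of the important-word comparison just described, assuming corpus-derived word counts are available; the top-N cutoff and the minimum number of shared words are hypothetical parameters the patent leaves open.

```python
import re
from collections import Counter

def corpus_counts(corpus_text: str) -> Counter:
    """Word frequencies over a representative textual corpus, computed once."""
    return Counter(re.findall(r"[a-z']+", corpus_text.lower()))

def important_words(cc_window: str, counts: Counter, top_n: int = 10) -> set:
    """The window's words ranked rarest-in-corpus first; keep the top_n."""
    words = set(re.findall(r"[a-z']+", cc_window.lower()))
    ranked = sorted(words, key=lambda w: counts.get(w, 0))
    return set(ranked[:top_n])

def same_scene(prev_window: str, cur_window: str, counts: Counter,
               min_common: int = 2) -> bool:
    """Adjacent CC windows sharing several important words are judged to cover
    the same subject matter, and hence the same scene."""
    shared = important_words(prev_window, counts) & \
        important_words(cur_window, counts)
    return len(shared) >= min_common
```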
  • The video and audio data buffered for the entire scene is indexed and compressed as follows.
  • Processing determines whether to store the buffers belonging to the present scene as raw video files on disk buffers rather than passing them directly to the encoder (step 351). If the disk storage mode is being used, then raw audio and video buffers for the new scene are written to the disk (step 353). Otherwise, the audio and video buffers are passed directly to the encoder (step 352). Processing for the scene finishes with passing information about the scene to information database 218, including the CC text, the length of the scene, the publisher of the video, the broadcast time, and the encoding algorithm with which the audio and video data is encoded or compressed (step 355). An additional criterion can be used to locate scene boundaries based on the video signal. The average intensity (luminosity) of each frame is compared to a threshold. When the intensity is below a threshold, a scene change is declared. This intensity threshold can be used to detect black frames that would usually indicate a scene change during a broadcast. Most broadcasts fade their programming to black before a commercial or between unrelated shows.
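A minimal black-frame check along the lines of the intensity criterion above; the Rec. 601 luminance weights and the threshold value are assumptions, since the patent only says the average intensity is compared to a threshold.

```python
import numpy as np

def is_black_frame(frame_rgb: np.ndarray, threshold: float = 16.0) -> bool:
    """Declare a likely fade-to-black frame when the average luminance of the
    frame (Rec. 601 weighting, 0-255 scale) falls below the threshold."""
    rgb = frame_rgb.astype(np.float64)
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return float(luma.mean()) < threshold
```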
  • An additional criterion can be used to locate scene boundaries based on the audio signal.
  • A fall in the audio level quantity below a threshold for more than a minimum duration is used to detect a pause or lull in speech in a scene, and a scene boundary can be declared at that point.
  • A short pause (lull) by a speaker can be construed as the end of a sentence and can be used for detecting a scene boundary where accurate punctuation in the closed captions may not be available.
  • For example, a lull duration of one second would indicate a possible scene boundary.
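A sketch of lull detection over a sequence of per-frame audio levels; the frame rate is an assumption, and the one-second minimum duration comes from the example in the text.

```python
def lull_boundaries(frame_levels, level_threshold: float,
                    frame_rate: float = 30.0, min_seconds: float = 1.0):
    """Yield the frame index at which each sufficiently long quiet stretch
    reaches the minimum duration; each such point is a candidate boundary."""
    min_frames = max(1, int(min_seconds * frame_rate))
    run = 0
    for i, level in enumerate(frame_levels):
        run = run + 1 if level < level_threshold else 0
        if run == min_frames:
            yield i
```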
  • An additional criterion can be used to detect scene changes based on the CC text. Presence of a specified string or pattern can determine an automatic (overriding) boundary for a scene.
  • Audio-video processing server (AVPS) 101 can be a multi-processor allowing parallel processing of the modules and tasks shown in Figs. 2 and 3. The tasks and processing steps are divided between separate processing threads or operating system processes. AVPS 101 uses threads to achieve real-time video encoding throughput, high segmentation speed, and other characteristics that affect user interface responsiveness. The pipelined architecture supported by these multiple threads provides the end-users with access to the indexed videos only moments after the videos are received.
  • A video capture thread 458 is separate from a segmentation thread 459, allowing continuous (real-time) capture of video while the segmentation of the earlier video data is taking place.
  • Using threads is especially advantageous when a process consists of multiple tasks of varying urgency and processing needs.
  • The capture task (performed by the capture thread 458) is an urgent task: if the capture falls behind, the system loses video information that cannot be recovered.
  • The encoding process (performed by an encoding and database update thread 461) requires significant processing power, especially if the video clips are being compressed to a high compression factor.
  • Control of the processing threads is initiated by a (callback) message from the video capture driver 457, awakening capture thread 458, which begins processing.
  • When segmentation thread 459 completes the segmentation of a scene, a new encoding and database update thread 461 is spawned to encode the corresponding video buffer data and update the database. Since segmentation can take less time than encoding, a single segmentation thread can spawn multiple encoding and database update threads. Note that at any given time, there can be multiple video buffers being processed in thread pipeline 456.
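A hedged sketch of the thread pipeline described above, with the three stages passed in as callables because the patent describes the thread roles rather than a concrete API; the queue, sentinel, and buffer count are implementation assumptions.

```python
import queue
import threading

def run_pipeline(capture_buffer, segment_scenes, encode_and_store,
                 num_buffers: int = 6) -> None:
    """Wire the three stages together: an urgent capture thread that never
    waits on downstream work, a segmentation thread, and one encoding and
    database-update thread spawned per detected scene."""
    buffers = queue.Queue()

    def capture():
        for _ in range(num_buffers):
            buffers.put(capture_buffer())   # keep capturing in real time
        buffers.put(None)                   # sentinel: capture finished

    def segment():
        encoders = []
        while True:
            buf = buffers.get()
            if buf is None:
                break
            for scene in segment_scenes(buf):
                t = threading.Thread(target=encode_and_store, args=(scene,))
                t.start()                   # encodings overlap in time
                encoders.append(t)
        for t in encoders:
            t.join()

    capture_thread = threading.Thread(target=capture)
    segmentation_thread = threading.Thread(target=segment)
    capture_thread.start()
    segmentation_thread.start()
    capture_thread.join()
    segmentation_thread.join()

# Trivial stand-ins for the three stages, just to exercise the wiring:
# run_pipeline(capture_buffer=lambda: list(range(30)),
#              segment_scenes=lambda buf: [buf[:15], buf[15:]],
#              encode_and_store=lambda scene: None)
```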
  • Fig. 5 is a schematic of the time evolution of pipeline stages (tasks) in audio-video processor 201 on a dual-processor computer, according to one aspect of the invention. This schematic can be generalized to any number of processors.
  • The vertical axis 564 spans the three major stages of processing: video capture 565, video segmentation 566, and video encoding/database updates 567.
  • The two processors are denoted P1 and P2.
  • The capture thread runs on P1 and handles the capture stage 565.
  • The segmentation stage 566 is handled on threads running on either processor.
  • The encoding and database stage 567 is handled by multiple threads running concurrently on the two processors.
  • The horizontal axis represents time 570.
  • The six time intervals marked on the horizontal axis represent the equal-length captured buffers, each buffer having the same fixed duration.
  • The capture of the second buffer B2 572 begins immediately after the completion of the capture of B1.
  • Each buffer typically contains a plurality of frames with video and related information.
  • The duration of each segmentation process S1, S2, etc. is not the same as that of the buffers B1, B2, etc., since the duration of each segmentation process is the same as the length of a scene, while the buffers are all of equal size.
  • For example, the buffer boundary line 575 between B1 and B2 crosses S1.
  • The various segments S1-S6 are typically of different lengths.
  • A single segment can contain video frames from two (or more) buffers.
  • S1 can begin before the end of the capture of B2, since B1 is by then a full buffer and the segmentation process takes place on each frame within a full buffer.
  • The time offset δG 576 is necessary because in many situations (especially when closed captioning is done in real time by the video broadcasters) the closed caption text lags behind the video. Typically δG is greater than this time lag.
  • Segmentation algorithms are not as latency-sensitive as the capture process, and they are not as computationally intensive as the encoding process.
  • Either of the two processors can be assigned to S1, S2, etc.
  • The encoding process E1 558 begins a time δE 587 after the start time 583 of S1.
  • E1, which we assume is running on the first processor P1, can take longer than the actual duration of the scene. This is because encoding is a time-consuming process and P1 can be performing multiple tasks.
  • The gray projection 584 demonstrates the mapping between S1 and the corresponding encoding process E1.
  • The second segment (S2 578) is ready for encoding E2 before E1 is completed.
  • The pipeline architecture described here enables the audio-video processor to begin encoding the second segment before encoding of the first segment is completed, and thereby provide real-time scene detection, capture, and encoding.
  • E1 and E2 run concurrently for part of their lifetime (and in this case on different processors).
  • E3 589 begins before the completion of E2, and at certain times E4 592, E5 590, and E6 593 all run concurrently.
  • The gray projection 585 shows the mapping between S4 and E4. Note that E4 takes significantly longer than S4, partly because E4 is running on P1, which is also occupied with the capture process.
  • The gray projection 586 shows the mapping between S6 582 and E6 593. Note that E6 ends shortly after B6, thus maintaining the throughput of the pipeline without any congestion.

Abstract

An automated real-time system for processing and distribution of audio-video data. The system performs automated, real-time analysis and scene detection, segmentation, indexing, and encoding of video for real-time presentation, browsing, or searching. By using automated real-time processing of audio-video data sources, the data is available to a user (a viewer) without a substantial delay that would be introduced by manual or off-line (batch oriented) automatic processing of the data. The processing is arranged in a pipelined process, reducing processing delays and requiring less intermediate storage than a batch-oriented processing approach. The audio-video sources are segmented into individual scenes, such as one story in a news broadcast, thereby allowing a user to access portions of source programming without having to view or scan through the entire programs or to specify a particular time interval. The system also makes combined use of video, audio, and closed-caption information in an audio-video signal to identify individual scenes. This combined use of multiple sources of information provides an improved scene detection capability. Characterization of individual scenes is based on a variety of sources of information including, for example, the closed-caption text and the output of a speech recognizer analyzing the audio information in the signal.

Description

- 1 -
PROCESSING AND DELIVERY OF AUDIO-VIDEO INFORMATION
Background This invention relates to processing and delivery of audio-video information.
Audio-video information is broadcast to viewers, using a variety of communication media and techniques including, for example, conventional television transmissions, digital or analog satellite transmissions, or data transfer to networked personal computers (for example, through the Internet's World Wide Web) . Limited experimental and non-commercial deployments of "interactive television" (ITV) and "video-on-demand" (VOD) , for example by cable television providers, attempt to provide more targeted programming to viewers. These systems have not yet led to significant commercial viewership. Some of these interactive television systems offer the capability for a viewer to search a large pool of programs and to retrieve and view only a subset of programs that match his interest. Video-on-demand systems allow a user to select, retrieve and view one or more video programs in their entirety by selecting from an existing, pre-processed content menu. For example, the viewer may rely on the program title, a predetermined subject category, names of on-screen talent, and perhaps other tangential information to decide whether to select a particular program. A significant amount of time and resources are devoted to preparing the programs for availability on the video-on-demand selection menus. Research systems that provide content-based access to audio-video libraries have also been developed. One such system, called "Informedia, " has been developed by researchers at the Carnegie-Mellon University. This system incorporates mechanisms for detecting, indexing, retrieving, and viewing video scenes in a stored audio- video library. The system requires the viewer to specify the keywords manually and then it retrieves the relevant videos. This and other systems use automatic methods for detecting scene changes. A variety of systems have also been developed for accessing audio-video information that has already been indexed using, for example, manual indexing techniques.
Audio-video processing techniques have been developed for detecting changes of scene based on video information. Such video scene detection has been used to allow a user to browse a video archive by viewing representative frames of each scene. Another use of automatic video scene detection has been to tag and index video for future retrieval, for example, to manage film and video assets. Systems with similar capabilities have also been based on image recognition techniques rather than video scene detection.
Summary of the Invention The invention relates to an automated real-time system for the processing and distribution of audio-video data. One aspect of this system performs automated, real-time analysis and scene detection, segmentation, indexing, and encoding of video information for real-time presentation, browsing, or searching. By using automated real-time processing of audio-video data sources, the data is available to a user (a viewer) without a substantial delay that would be introduced by manual or off-line (batch oriented) automatic processing of the data. The processing is arranged in a pipelined process, reducing processing delays and requiring less intermediate storage than a batch-oriented processing approach. The audio-video sources are segmented into individual scenes, such as one story in a news broadcast, - 3 - thereby allowing a user to access portions of source programming without having to view or scan through entire programs or to specify particular time intervals. The system also makes combined use of video, audio, and closed-caption information in an audio-video signal to identify individual scenes. This combined use of multiple sources of information provides an improved scene detection capability. Characterization of individual scenes is based on a variety of sources of information including, for example, the closed-caption text and the output of a speech recognizer analyzing the audio information in the signal.
In one aspect, in general, the invention provides a method for fully automated real-time processing and delivery of an audio-video program. The method features accepting the audio-video program, for example from a satellite receiver, and detecting discrete scenes within the program based on the content of the program. For each of the discrete scenes, the method includes determining textual information related to the scene, indexing the scene using the textual information, optionally compressing or encoding the video data, storing audio-video data from the scene, and storing index data for the scene. The method can also include accepting the description of interests of a user, such as keywords or category names associated with topics of interest to that user. The method then includes matching the description of interests of a user to stored index data for the scenes, and providing audio-video data from the matching scenes to the user, for example over a data communication network.
The invention can include one or more of the following features.
Detecting scenes, determining textual information related to the scene, indexing the scenes, and storing data from the scenes, can all occur in a pipelined manner while accepting the program. Providing audio-video data to the user can therefore begin prior to completion of accepting of the program. In this way, the user can view scenes of the program with low delay relative to accepting those scenes in the program.
The method can also include accepting a text document, for example from a text source such as a news source. For each of the discrete scenes, the method then includes further matching of the scene to the text document and, if a match is found, storing the matching information which associates the scene to the document. Providing audio-video data from the matching scenes then also includes providing any stored matching information which associates the scenes to the text document.
Scenes can be detected within the program based on the content of the program using both the audio and video portions of the program. This can include comparing a color distribution at one time in the program to the color distribution at another time, or computing a statistic of the rate of change of the video signal.
Scenes can also be detected by processing the closed captions of the program. This may involve, for example, comparing the frequency of previously selected words in one portion of the program to their frequency in another portion of the program. It may also involve detecting predetermined character sequences, such as punctuation or particular words, in the closed captions. The textual information used for indexing can be determined by processing the closed captions of the scene. In addition, the information can be determined by processing the audio portion of the scene using an automatic speech recognizer.
If the audio-video data is sent to the user over a data network, it can be sent using a data streaming - 5 - protocol and the Internet Protocol (IP) network protocol. Also, it can be multicast for reception by multiple users. In addition, the data can be compressed in accordance with the communication capacity of a communication path over the data communication network to the user.
The description of interests of a user can be received prior to accepting an audio-video program, thereby forming a profile for that user. The description of interests can also be received after accepting the program, thereby forming a search request for that user.
In another aspect, in general, the invention features a system for fully automated real-time processing and delivery of an audio-video program. The system includes a segmenter/indexer for accepting the program and providing data for discrete scenes in the program, a media database for accepting and storing audio-video data for the discrete scenes from the segmenter/indexer, an information database for accepting and storing index data based on textual data related to the discrete scenes from the segmenter/indexer, and a communication network coupling the media database and a client computer for providing audio-video data stored in the media database which match a description of interests provided by a user of the client computer.
In another aspect, in general, the invention provides software stored on computer readable media for causing a computer implementation of such an audio-video processing and delivery system to function. Other features and advantages of the invention will be apparent from the following description, and from the claims. - 6
Description of the Drawings Fig. 1 is a block diagram of physical components of a video processing system and its interconnections with data sources and users; Fig. 2 is a block diagram of the software components of the audio-video processor, and the media and information databases;
Fig. 3 is a flowchart of operation of the audio- video processor; Fig. 4 is a schematic of the pipeline stages of processing; and
Fig. 5 is a schematic of the time evolution of pipeline stages on a dual processor computer.
Description Referring to Fig. 1, a user at client computer t09 requests data from a video processing system (VPS) 100, and accepts the requested data, which originated at audio-video sources 110 and text content sources 104. VPS 100 includes several server computers which process the audio-video and text data, respond to requests from users, and provide data to those users. An audio-video processing server (AVPS) 101 accepts data from audio- video sources 110, processes the audio-video data, and provides the processed data to a media server 102 where it is stored or buffered prior to being distributed to client computer 109. A text processing server (TPS) 105 accepts textual data, including text accompanying the audio-video data such as closed captioning (CC) , as well as text data from other text content sources 104, such as news wire service, processes the text data, and provides processed data to media server computer 102. AVPS 101 and TPS 105 provide information related to the context and structure of the processed data to an information - 7 - server 103. Client computer 109 communicates with a Web server 106 in order to request audio-video data. Data stored in media server 102 is identified based on data stored in information server 103, and the identified audio-video and text data is sent from media server 102 to client computer 109 as a compressed data stream.
The server computers, AVPS 101, media server 102, information server 103, TPS 105, and Web server 106 are coupled to a local area network (LAN) 107. LAN 107 is coupled to a wide area network (WAN) 108, such as the Internet. Client computers 109 are also coupled to WAN 108. Each of the computers 109 includes a processor, working memory, a program storage, such as a fixed or removable magnetic disk drive, and a network communication interface. In addition, media server 102 and information server 103 include data storage devices, such as magnetic disk drives, and audio-video processing server 101 includes an additional processor enabling parallel processing on the audio-video server. LAN 107 provides reliable high-data-rate communication between the server computers coupled directly to the LAN, for example using Ethernet supporting communication rates in the range 10-lOOMb/s. WAN 108 provides lower-rate and less reliable communication between LAN 107 and client computer 109. If a client computer is coupled to WAN 108 over a standard telephone modem, this rate may be only 28kb/s, while the rate may be as high as IMb/s or higher if the client is coupled to the WAN over a Tl telephone connection and the WAN provides as fast or faster path than LAN 107.
Audio-video sources 110 includes one or more realtime sources of audio-video data. The sources can include receivers of terrestrial or satellite broadcasts of analog or digitally-encoded television signals, or receivers of television signals provided over a wired - 8 - distribution system (for example, a cable television system) or over a data network (for example, the Internet) . In addition, audio-video sources can include sources of non-real-time data, for example, recorded signals, such as edited television programming or feature films .
Audio-video data provided by audio-video sources 110 includes analog or digitally-encoded television signals. A television signal includes video and audio signals, as well as a closed-caption (CC) text signal, encoded during the vertical blanking interval of analog signals .
Turning now to the user's interface to the system, client computer 109 includes software that communicates with software on Web server 106 to search for data or to set a personal profile, and includes software that accepts an audio-video data stream originating at media server 102 and transported over WAN 108. In a search mode, the user searches the available archives of video clips and text. Once a list of desired video clips and news articles is compiled by the user, a corresponding data stream is transported from media server 102 to client computer 109. The data stream can be transported using a variety of protocols, such as ones using streaming techniques layered on the Internet Protocol (IP) network layer protocol. The data stream can be compressed to satisfy communication-rate constraints on the data path between media server 102 and client computer 109. Data can be transferred in a variety of ways from media server 102 to client computer 109, including using an on-demand (that is, direct) transmission, or as part of a broadcast or multicast transmission in which multiple client computers receive the same data streams. The client computer can also receive text data directly from text content sources 104 for concurrent presentation with audio-video data sent from media server 102.
Turning now to the server computers that make up VPS 100, AVPS 101 accepts a television signal from audio- video sources 110. Information related to changes in video (that is, from frame to frame) , audio intensity, and punctuation and words in the CC text are used to split the video into distinct scenes (video clips) . The video and audio from each scene is then compressed (encoded) and stored on media server 102. The CC text, as well as information about the particular video source, broadcast time, etc., for each scene is stored in information server 103 to enable future searches that might return those scenes . TPS 105 receives CC text for each video scene directly from AVPS 101. TPS 105 assigns a category (a subject code or topic) to the CC text for a clip. This category information is provided to information server 103. TPS 105 also receives text documents from text content sources 104, for example from a text server computer on site at a text news provider. The CC for an audio-video clip is used to find related documents from the text sources. The relationship information is provided to information server 103. An example of this type of correlation of sources is an audio-video clip of a weather forecast being matched to a textual weather forecast. This match could later be used to present to a user at client computer 109 the textual weather forecast side-by-side with the audio-video weather forecast broadcast. Based on this type of correlation, news stories that fall in the same category as a video clip can also be shown side by side. The user is therefore able to see retrieved video together with related news stories. - 10 -
Server computer 106 includes server software that communicates with client software executing at client computer 109. For instance, a Web browser executes at client computer 109 and communicates with a Web server executing at server computer 106. The user, using the client software, provides the server software with a request for data, for example by filling out an electronic form for a keyword-based search or for selecting a category or topic. If Web client and server software is used, the form is transported to between server computer 106 and client computer 109 using the hyper-text transport protocol (http) . The server software accepts the request, accesses information server 103, and compiles a list of video clips and news articles. This list is provided to the client software, in the form of a multimedia document or in some other form that can be interpreted by the client software. In response to input from the user, the client software then submits requests for the video clips to media server 102, and for text data to text content sources 104 and accepts the data provided in response to the requests.
A hyper-text list of video clips and news articles can include "thumbnail" views of video clips presented along with other identifying information for the clips. The thumbnail can be a static image, or can change while it is presented to the user. The frame chosen to be displayed can be chosen at the time the video clip is initially processed. For instance, the most stationary portion of the clip can be used to as the representative frame to display in the thumbnail view. A time-varying thumbnail can sample frames from the video clip, for example sampling uniformly in time, or using a previously chosen sequence of relatively stationary frames.
In an alternative mode of operation, rather than search for audio-video data that has already been stored - 11 - in the media database, the user can provide profile information in advance of the data being available. Then, as relevant audio-video data is processed, it can be provided to the user, for example, as a personalized broadcast .
Referring to Fig. 2, the details of a software audio-video processor 201 which executes on audio-video processing server (AVPS) 101 (shown in Fig. 1) is depicted. Each of the illustrated software modules executes on one or more of the processors in AVPS 101 and communicates with other modules in audio-video processor 201 through shared memory buffers in the working storage. Execution of the modules can be allocated to the multiple processors in a variety of ways, for example, with one processor digitizing while another is performing segmentation. No intermediate disk files of the digitized video are created prior to the data passing through a segmentation coordinator 217.
A digitizer/decoder 225 provides an interface for the audio-video data accepted from audio-video sources 110. In the case of analog television data, digitizer/decoder 225 controls the acquisition of the analog signal and the extraction and digitization of the video, audio, and closed-caption data in the signal. In the case of digitized data, digitizer/decoder 225 decodes the digital data stream and creates digital streams for the video, audio, and closed-caption data. The digital video stream in both cases includes individual digitized frames. Each frame, as it is extracted, is stored in an indexed location in a video buffer 214. The corresponding digitized audio frame is stored in an audio buffer 223 at the same index location, thus ensuring audio/video synchronization. The digitized closed-caption text is stored in a CC buffer 228. The correspondence between audio data and closed-caption data is preserved by keeping track of the number of bytes of CC data in a given video frame.
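A minimal sketch of how the per-frame buffers and the CC byte counts could be kept in step is shown below (Python; the class and field names are assumptions introduced for the example):

```python
class SynchronizedBuffers:
    """Illustrative holder for per-frame video, audio, and closed-caption data.

    The same index is used for the video and audio buffers, and the number of
    CC bytes received during each frame is recorded so the caption stream can
    later be re-aligned with the frames.  All names here are assumptions.
    """

    def __init__(self):
        self.video = []         # one decoded video frame per entry
        self.audio = []         # one audio frame per entry, same index as video
        self.cc_bytes = b""     # running closed-caption byte stream
        self.cc_per_frame = []  # number of CC bytes that arrived with each frame

    def add_frame(self, video_frame, audio_frame, cc_chunk=b""):
        self.video.append(video_frame)
        self.audio.append(audio_frame)
        self.cc_bytes += cc_chunk
        self.cc_per_frame.append(len(cc_chunk))

    def cc_text_for_frames(self, start, end):
        """Recover the caption bytes that correspond to frames [start, end)."""
        offset = sum(self.cc_per_frame[:start])
        length = sum(self.cc_per_frame[start:end])
        return self.cc_bytes[offset:offset + length]
```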
A video segmenter 213 performs two analyses on each frame. First, it computes a color histogram of the frame and compares it with the histograms of the previous and next frames to produce a video color difference quantity. This quantity is related to the rate of change of the image. The comparison is performed by summing the absolute differences of corresponding histogram values associated with different colors. In the second analysis, a pixel-to-pixel change comparison is made between the current frame and the previous frame by summing the absolute differences of all changes in the pixel intensity and color values to produce a pixel difference quantity. Both quantities computed by video segmenter 213 are passed to segmentation coordinator 217. An audio segmenter 216 measures the average audio level for each frame to produce an audio level quantity. The audio level quantity is also passed to segmentation coordinator 217.
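By way of illustration, the three quantities might be computed as in the following sketch (Python with NumPy; the particular palette reduction and the function names are assumptions made for the example, not taken from the disclosure):

```python
import numpy as np

def color_histogram(frame, bins=256):
    """Histogram over a reduced 8-bit color palette; `frame` is an HxWx3 uint8 array."""
    # Collapse RGB to a 256-entry palette index (3 bits red, 3 bits green, 2 bits blue;
    # an assumption made for this sketch).
    palette = ((frame[..., 0].astype(int) // 32) * 8
               + frame[..., 1].astype(int) // 32) * 4 + frame[..., 2].astype(int) // 64
    return np.bincount(palette.ravel(), minlength=bins)

def color_difference(frame, other):
    """Video color difference: sum of absolute differences of histogram values."""
    return int(np.abs(color_histogram(frame) - color_histogram(other)).sum())

def pixel_difference(frame, other):
    """Pixel difference: sum of absolute per-pixel intensity/color changes."""
    return int(np.abs(frame.astype(int) - other.astype(int)).sum())

def audio_level(samples):
    """Average audio level for one frame's worth of samples."""
    return float(np.abs(samples.astype(float)).mean())
```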
A CC segmenter 227 analyzes the CC data in CC buffer 228 and checks for the presence of punctuation marks such as periods and for CC special characters such as carets (">>"). The CC segmenter also checks for the occurrence of common words across sentence boundaries. CC segmenter 227 passes a signal to segmentation coordinator 217 indicating whether a scene change is likely to have taken place based on the CC text. For example, the occurrence of the same important words in two frames contributes to an indication that both frames are part of the same scene.
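A minimal sketch of the caption-based signal is shown below (Python; the marker conventions and names are assumptions for the example, and the word-overlap test is sketched separately further on):

```python
import re

# In many caption streams ">>" marks a new speaker and ">>>" a new subject;
# exact conventions vary, so this pattern is an assumption made for the sketch.
_BOUNDARY_MARKERS = re.compile(r"(\.)|(>{2,3})")

def cc_boundary_signal(cc_text):
    """Return True if the caption text for the current buffer suggests a new
    sentence, subject, or speaker (a weak vote for a scene boundary)."""
    return bool(_BOUNDARY_MARKERS.search(cc_text))
```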
In segmentation coordinator 217, the video color difference, pixel difference, and audio level quantities are compared to respective thresholds stored in the segmentation coordinator. If all of the quantities satisfy their respective threshold conditions, the segmentation coordinator determines whether a scene boundary should be declared at that point. This declaration is based in part on the signal received from CC segmenter 227. Once segmentation coordinator 217 determines that a scene boundary has occurred, the contents of video buffer 214 and audio buffer 223, including the audio-video data from the new scene, are passed to an encoder 231. Alternatively, the audio-video data can have been buffered in a temporary hard disk storage 234 and read by encoder 231 when the scene boundary is determined.
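The coordinator's decision might be combined as in the following sketch, which follows the step-by-step test described below with reference to Fig. 3 (a quiet audio frame together with large color and pixel differences, confirmed by the caption signal). The threshold values are placeholders, not values from the disclosure:

```python
# Placeholder thresholds; the disclosure does not give numeric values.
AL_TH = 500        # audio level threshold (a quiet frame falls below it)
CS_TH = 40_000     # color-histogram difference threshold
PS_TH = 2_000_000  # pixel difference threshold

def is_scene_boundary(audio_level_q, color_diff_q, pixel_diff_q, cc_says_boundary):
    """Coordinator decision: the audio/video tests must all pass, and the
    closed-caption signal is used to confirm the boundary."""
    av_candidate = (audio_level_q < AL_TH and
                    color_diff_q > CS_TH and
                    pixel_diff_q > PS_TH)
    return av_candidate and cc_says_boundary
```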
Encoder 231 compresses the video and audio data for a scene, and passes the compressed data to a media database 232, stored on media server 102. Media database 232 includes individual files for each of the scenes, as well as an index or directory to access those files.
Segmentation coordinator 217 also sends data for new entries to be stored in a content table 219 and a text table 220 in a database 218 stored on information server 103. The entry in content table 219 includes the location or index of the corresponding data stored in media database 232, plus other information such as the duration of the scene and the name of the audio-video source. The entry in text table 220 contains all of the CC text for the given scene. The two entries are related by a common index so that once the database is searched for a specific word in the closed-caption entries, all of the related video scene locations can be found.
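A possible shape for the two related tables is sketched below, using SQLite purely for illustration; the disclosure does not prescribe a particular database engine, and the table and column names are assumptions:

```python
import sqlite3

# Illustrative schema only; table and column names are assumptions, not the
# actual database layout of information server 103.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_table (
    scene_id     INTEGER PRIMARY KEY,
    media_index  TEXT,      -- location of the encoded clip in the media database
    duration_s   REAL,
    source_name  TEXT,
    broadcast_ts TEXT,
    codec        TEXT
);
CREATE TABLE text_table (
    scene_id INTEGER REFERENCES content_table(scene_id),
    cc_text  TEXT
);
""")

def scenes_matching(word):
    """Find clip locations whose closed-caption text contains a word."""
    rows = conn.execute(
        "SELECT c.media_index FROM content_table c "
        "JOIN text_table t ON t.scene_id = c.scene_id "
        "WHERE t.cc_text LIKE ?", (f"%{word}%",))
    return [r[0] for r in rows]
```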
Segmentation coordinator 217 also passes the CC text for a scene to a text processor 240 executing on TPS 105. Text processor 240 matches the CC text to available text sources from text content sources 104.
The system can also include a speech recognizer. This speech recognizer takes, as an input, the audio portion of an audio-video program and produces a word sequence corresponding to that audio data. This word sequence can be used to supplement, or can be used instead of, the CC text in the approach described above.
Referring to Fig. 3, in operation, the audio-video processor 201 determines whether each new frame of data begins a new scene. Audio-video processor 201 begins by capturing, in a buffer, multiple frames, for example a fixed duration interval, of the audio-video source data (step 336). This buffer is part of digitizer/decoder 225. One frame of data is then stored in each of video buffer 214 and audio buffer 223 (step 337). In parallel, the CC text for the entire buffer is stored in the CC buffer 228 (step 338). In many video sequences, speakers, anchor persons, or other talking persons stop their speech as the video subject matter changes from one topic to another. Video editors almost always paste together video segments with quiet audio tracks at both ends of the segment. Therefore, a quiet period in the audio track of a video is usually an indication of a possible video scene change. The Audio Level quantity (AL) is measured (step 339) by taking the average of the audio level of the data in audio buffer 223. AL is then compared to a pre-set Audio Level Threshold (ALTh) value (step 341). If the Audio Level is less than the Audio Level Threshold, then a quiet frame has been reached that satisfies the audio level condition for a scene boundary, and further consideration of this frame as a scene boundary continues. If the level condition is not satisfied, a new frame, if available, is processed (step 349). If the last frame of the buffer has been used, then a new buffer is captured (step 336); otherwise the next frame from the buffer is read and processing continues (step 337).
Consideration of the frame continues by computing a histogram of the colors in the frame. In this embodiment, an 8-bit color palette (that is, 256 colors or color classes) is used to compute the histogram. A video color difference quantity (color spectrum, CS) is determined as the sum of the absolute differences of the corresponding histogram values (step 343). The value of CS is compared to a threshold CSTh (step 345), and if CS is greater than CSTh, the processing determines that a significant change in the color of the frames has occurred (perhaps a new object is being depicted in the frame with new colors), suggesting the possible start of a new scene. If CS is not greater than CSTh, then processing continues with the next frame (step 349).
Next the processing determines a pixel difference quantity (pixel spectrum, PS) (step 347). PS is determined as the sum of the absolute differences between each of the intensity (luminance) and color (chrominance) values of corresponding pixels in the current frame and the previous frame. The value of PS is compared to a threshold PSTh (step 348), and if PS is greater than PSTh, the processing concludes that a significant change in the whole frame has occurred, for example, caused by a movement of an object in the frame, suggesting the possible start of a new scene. If PS is not greater than PSTh, then processing continues with the next frame (step 349).
In parallel with the steps involving the audio-video frame, the text stored in the CC buffer (at step 338) is examined (step 340) to locate punctuation characters, including the period ("."), single caret (">"), double caret (">>"), and triple caret (">>>"), marking a new sentence, subject, or speaker. If a new sentence is found (step 342), then processing proceeds to identify the most important words in the CC text (step 344). Important words are identified by determining the frequency of their appearance in a representative textual corpus. The least frequently occurring words are ranked as the most important words. The processing compares the top words in the present CC buffer to those of the previous buffer, and if the two buffers have several words in common, then the processing decides that the subject matter of the two buffers is probably the same, the two buffers belong to the same scene (step 346), and processing proceeds with the next buffer.
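The word-importance test might be realized as in the following sketch (Python; the tiny corpus table and the thresholds are placeholders standing in for real corpus statistics):

```python
from collections import Counter

# Word frequencies from a representative corpus; this small table is only a
# placeholder for real corpus statistics.
CORPUS_FREQ = Counter({"the": 1000, "a": 800, "to": 700, "weather": 12,
                       "forecast": 9, "budget": 7, "senate": 5})

def important_words(cc_text, top_n=5):
    """Rank the buffer's words: the rarer a word is in the corpus, the more important."""
    words = [w for w in cc_text.lower().split() if w.isalpha()]
    return set(sorted(set(words), key=lambda w: CORPUS_FREQ.get(w, 0))[:top_n])

def same_scene(prev_cc_text, curr_cc_text, min_common=2):
    """Buffers sharing several important words are judged to cover the same subject."""
    common = important_words(prev_cc_text) & important_words(curr_cc_text)
    return len(common) >= min_common
```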
If the current frame is a candidate scene change based on the audio and video signals (step 348) and the current buffer is a candidate scene change based on the CC text (step 346), then this frame is determined to be the first frame of a new scene (step 350).
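Tying the per-frame tests together, a simplified walk through steps 337-350 could look like the sketch below. It reuses the helper functions and placeholder thresholds from the earlier sketches and is not a complete implementation:

```python
def process_buffer(video_frames, audio_frames, cc_text, prev_frame, prev_cc_text):
    """Walk one captured buffer frame by frame (a simplified view of steps 337-350).

    Assumes the helpers audio_level, color_difference, pixel_difference,
    cc_boundary_signal, same_scene and the AL_TH/CS_TH/PS_TH placeholders
    sketched earlier are in scope.
    """
    cc_candidate = cc_boundary_signal(cc_text) and not same_scene(prev_cc_text, cc_text)
    boundaries = []
    for i, (vframe, aframe) in enumerate(zip(video_frames, audio_frames)):
        if prev_frame is not None:
            if (audio_level(aframe) < AL_TH                             # step 341
                    and color_difference(vframe, prev_frame) > CS_TH    # step 345
                    and pixel_difference(vframe, prev_frame) > PS_TH    # step 348
                    and cc_candidate):                                   # step 346
                boundaries.append(i)                                     # step 350
        prev_frame = vframe
    return boundaries
```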
If a new scene is found, the video and audio data buffered for the entire scene is indexed and compressed as follows.
Processing determines whether to store the buffers belonging to the present scene as raw video files on disk buffers rather than passing them directly to the encoder (step 351). If the disk storage mode is being used, then the raw audio and video buffers for the new scene are written to the disk (step 353). Otherwise, the audio and video buffers are passed directly to the encoder (step 352). Processing for the scene finishes with passing information about the scene to information database 218, including the CC text, the length of the scene, the publisher of the video, the broadcast time, and the encoding algorithm with which the audio and video data is encoded or compressed (step 355).

An additional criterion can be used to locate scene boundaries based on the video signal. The average intensity (luminosity) of each frame is compared to a threshold. When the intensity is below the threshold, a scene change is declared. This intensity threshold can be used to detect black frames that would usually indicate a scene change during a broadcast. Most broadcasts fade their programming to black before a commercial or between unrelated shows.
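By way of illustration, the black-frame test might be implemented as follows (Python with NumPy; the luma weighting and the threshold value are assumptions for the sketch):

```python
import numpy as np

def is_black_frame(frame, intensity_threshold=16):
    """Declare a likely fade-to-black when the average luminance is very low.

    `frame` is an HxWx3 uint8 array; the threshold value is a placeholder.
    """
    # ITU-R BT.601 luma weights (a standard choice, assumed here).
    luma = 0.299 * frame[..., 0] + 0.587 * frame[..., 1] + 0.114 * frame[..., 2]
    return float(luma.mean()) < intensity_threshold
```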
An additional criterion can be used to locate scene boundaries based on the audio signal. A fall in the audio level quantity below a threshold for more than a minimum duration is used to detect a pause or lull in speech in a scene, and a scene boundary can be declared at that point. For example, a short pause (lull) by a speaker can be construed as the end of a sentence and can be used for detecting a scene boundary where accurate punctuation in the closed captions may not be available. A lull duration of one second would indicate a possible scene boundary.

An additional criterion can be used to detect scene changes based on the CC text. The presence of a specified string or pattern can determine an automatic (overriding) boundary for a scene. For example, when segmenting a video program covering the U.S. House of Representatives chamber discussions, automatic determination of a scene boundary at every instance of the phrase "THE SPEAKER PRO TEMPORE:" is useful. This type of override is most useful when a video signal is covering a live event or where very little action or movement is present (or likely to be) within the video stream itself.
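Both of these additional criteria are easy to express in code; the sketch below uses placeholder thresholds and assumes 30 frames per second for the one-second lull (none of these values come from the disclosure):

```python
import re

def lull_boundary(frame_audio_levels, level_threshold=500, min_frames=30):
    """Return the index of the first frame that ends a sustained quiet stretch.

    With 30 frames per second, min_frames=30 approximates a one-second lull;
    both numbers are placeholders.
    """
    quiet_run = 0
    for i, level in enumerate(frame_audio_levels):
        quiet_run = quiet_run + 1 if level < level_threshold else 0
        if quiet_run >= min_frames:
            return i
    return None

# Overriding caption pattern, e.g. for U.S. House floor coverage.
OVERRIDE_PATTERN = re.compile(r"THE SPEAKER PRO TEMPORE\s*:")

def cc_override_boundary(cc_text):
    """Force a scene boundary whenever the specified string appears in the captions."""
    return bool(OVERRIDE_PATTERN.search(cc_text))
```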
Audio-video processing server (AVPS) 101 can be a multi-processor allowing parallel processing of the modules and tasks shown in Figs. 2 and 3. The tasks and processing steps are divided between separate processing threads or operating system processes. AVPS 101 uses threads to achieve real-time video encoding throughput, high segmentation speed, and other characteristics that affect user interface responsiveness. The pipelined architecture supported by these multiple threads provides the end-users with access to the indexed videos only moments after the videos are received.
Even on a single processor system, multi-threading can improve performance by judicious assignment of various tasks (stages) to separate threads. For example, as shown in Fig. 4, a video capture thread 458 is separate from a segmentation thread 459, allowing continuous (real-time) capture of video while the segmentation of the earlier video data is taking place. Using threads is especially advantageous when a process consists of multiple tasks of varying urgency and processing needs. For example, the capture task (performed by the capture thread 458) is an urgent task: if the capture falls behind, the system loses video information that cannot be recovered. The encoding process (performed by an encoding and database update thread 461) requires significant processing power, especially if the video clips are being compressed to a high compression factor. Moreover, in some applications, it can be required that the same video content be encoded to several different compression factors, or in different video file formats, simultaneously. In these cases and others, performance of the overall system is enhanced if the encoding tasks are distinct from the capture and segmentation tasks. Note that more than one encoding and database update thread can be running at the same time.
Control of the processing threads is initiated by a (callback) message from the video capture driver 457 awakening capture thread 458, which begins processing. As soon as a new scene is found by segmentation thread 459, a new encoding and database update thread 461 is spawned to encode the corresponding video buffer data and update the database. Since segmentation can take less time than encoding, a single segmentation thread can spawn multiple encoding and database update threads. Note that at any given time, there can be multiple video buffers being processed in thread pipeline 456.
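A skeletal version of this thread structure is sketched below (Python threading; the function names, the queue-based hand-off, and the placeholder bodies are assumptions introduced for the example):

```python
import threading
import queue

def capture_thread_fn(frame_source, buffer_queue):
    """Urgent task: pull captured buffers from the driver and hand them onward."""
    for buf in frame_source:          # frame_source is assumed to yield full buffers
        buffer_queue.put(buf)
    buffer_queue.put(None)            # signal end of capture

def segmentation_thread_fn(buffer_queue):
    """Find scene boundaries and spawn one encoding/database thread per scene."""
    while True:
        buf = buffer_queue.get()
        if buf is None:
            break
        for scene in segment(buf):    # `segment` stands in for the Fig. 3 logic
            worker = threading.Thread(target=encode_and_update, args=(scene,))
            worker.start()            # encoding runs concurrently with capture

def segment(buf):
    """Placeholder segmentation that treats each buffer as a single scene."""
    return [buf]

def encode_and_update(scene):
    """Placeholder for compressing the scene and writing its database entries."""
    pass

if __name__ == "__main__":
    q = queue.Queue()
    threading.Thread(target=segmentation_thread_fn, args=(q,)).start()
    capture_thread_fn(iter([b"buffer-1", b"buffer-2"]), q)
```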
Fig. 5 is a schematic of the time evolution of pipeline stages (tasks) in audio-video processor 201 on a dual-processor computer, according to one aspect of the invention. This schematic can be generalized to any number of processors.
As was shown previously in Fig. 4, different processor threads are assigned to different stages of processing. In Fig. 5, the vertical axis labeled stage 564 spans the three major stages of processing: video capture 565, video segmentation 566, and video encoding/database updates 567. For ease of discussion we assume that one of the processors (P1) is assigned to handle the capture task. In other words, the capture thread runs on P1 and handles the capture stage 565. The segmentation stage 566 is handled by threads running on either processor. The encoding and database stage 567 is handled by multiple threads running concurrently on the two processors. The horizontal axis represents time 570. The six time intervals marked on the horizontal axis represent the equal-length captured buffers, and Δ represents the duration of each buffer. For the purposes of this discussion, the capture of the first buffer B1 571 begins at time=0 (568) and ends at time=Δ (569). Similarly, the capture of the second buffer B2 572 begins immediately after the completion of the capture of B1. Each buffer typically contains a plurality of frames with video and related information.
Within a time interval ΔG 576 after the start 573 of capture of B1, the segmentation process S1 577 (which results in a segment or a scene) begins for buffer B1. Note that in general the duration of each segmentation process S1, S2, etc. is not the same as that of the buffers B1, B2, . . . since the duration of each segmentation process is the same as the length of a scene, while the buffers are all of equal size. To make this difference clear, we extended the buffer boundary line 575 between B1 and B2 to cross S1. Also note that the various segments S1-S6 (corresponding to 577-582) are typically of different lengths. In fact, as in S1, a single segment can contain video frames from two (or more) buffers. Note also that S1 can begin before the end of B1, since B1 is a full buffer while the segmentation process takes place on each frame within the full buffer. The time offset ΔG 576 is necessary because in many situations (especially when closed captioning is done in real time by the video broadcasters) the closed caption text lags behind the video. Typically ΔG is greater than this time lag.
At any given time there is only one segmentation thread: segmentation algorithms are not as latency sensitive as the capture process and they are not as computationally intensive as the encoding process. Depending on the details of the operating system scheduling, either of the two processors can be assigned to S1, S2, etc.
The encoding process E1 558 begins a time ΔE 587 after the start time 583 of S1. E1, which we assume is running on the first processor P1, can take longer than the actual duration of the scene. This is because encoding is a time consuming process and P1 can be performing multiple tasks. The gray projection 584 demonstrates the mapping between S1 and the corresponding encoding process E1.
Note that the second segment (S2 578) is ready for encoding E2 before E1 is completed. The pipeline architecture described here enables the audio-video processor to begin encoding the second segment before encoding of the first segment is completed and thereby provide real-time scene detection, capture, and encoding. Note that E1 and E2 run concurrently for part of their lifetime (and in this case on different processors). Similarly, E3 589 begins before the completion of E2, and at certain times E4 592, E5 590, and E6 593 all run concurrently.
The gray projection 585 shows the mapping between S4 and E4. Note that E4 takes significantly longer than S4, partly because E4 is running on P1, which is also occupied with the capture process. The gray projection 586 shows the mapping between S6 582 and E6 593. Note that E6 ends shortly after B6, thus maintaining the throughput of the pipeline without any congestion.
It is understood by those skilled in the art of design of electronic and computer systems that different arrangements of the components are possible. For example, functions carried out by separate computers can be carried out on a single computer, and a single function can be distributed over multiple computers. For instance, the functions of media server 102 and information server 103 can be provided by a single computer. Also, information database 218 and media database 232 can be combined into a single database, and a variety of data storage approaches can be used, for example, a file system or an object-oriented or relational database. Other forms of data communication networks can be used, for example, using only a local area network. Also, client computers 109 can be diskless computers, receiving client software over a data network.
It is also understood that functions implemented by software executing on a general purpose computer can also be implemented using a special purpose computer or other specialized hardware.
Finally, it is also understood that the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
What is claimed is:

Claims

1. A method for fully automated real-time processing and delivery of an audio-video program, the method comprising the steps of: accepting the audio-video program; detecting a plurality of discrete scenes within the program based on the content of the program; for each of the discrete scenes, determining textual information related to the scene, indexing the scene using the textual information, storing audio-video data from the scene, and storing index data for the scene; matching a description of interests of a user to stored index data for the scenes; and providing audio-video data from the matching scenes to the user.
2. The method of claim 1 wherein the steps of detecting scenes, determining textual information related to the scene, indexing the scenes, and storing data from the scenes, all occur in a pipelined manner while accepting the program, and further comprising the step of beginning the step of providing audio-video data to the user prior to completion of accepting of the program, whereby the user can view scenes of the program with low delay relative to the accepting of those scenes in the program.
3. The method of claim 1 further comprising the steps of: accepting a text document; and for each of the discrete scenes, further matching the scene to the text document, and storing the matching information associating the scene to the document if a match is found; wherein providing audio-video data from the matching scenes includes providing any stored matching information associating the scenes to the text document.
4. The method of claim 1 wherein the step of accepting the audio-video program includes the step of accepting the program from a satellite receiver.
5. The method of claim 1 wherein the step of detecting a plurality of discrete scenes within the program based on the content of the program includes the step of processing both the audio and video portions of the program.
6. The method of claim 5 wherein the step of processing both the audio and video portions of the program includes the step of comparing a color distribution at one time in the program to the color distribution at another time.
7. The method of claim 5 wherein the step of processing both the audio and video portions of the program includes the step of computing a statistic of the rate of change of the video signal .
8. The method of claim 5 wherein the step of detecting a plurality of discrete scenes further includes the step of processing the closed captions of the program.
9. The method of claim 8 wherein the step of processing the closed captions of the program includes the step of comparing the frequency of previously selected words in one portion of the program to their frequency in another portion of the program.
10. The method of claim 8 wherein the step of processing the closed captions of the program includes the step of detecting predetermined character sequences in the closed captions.
11. The method of claim 8 wherein the step of determining textual information related to the scene includes the step of processing the closed captions of the scene .
12. The method of claim 11 wherein the step of determining textual information related to the scene further includes the step of processing the audio portion of the scene using an automatic speech recognizer.
13. The method of claim 1 wherein the step of providing audio-video data to the user includes the step of passing the data over a data communication network to the user.
14. The method of claim 13 wherein the data is passed using a data streaming protocol and the Internet Protocol (IP) network protocol.
15. The method of claim 13 wherein the step of passing the data over the data communication network includes the step of multicasting the data for reception by the user and a plurality of other users.
16. The method of claim 13 wherein the data is compressed in accordance with the communication capacity of a communication path over the data communication network to the user.
17. The method of claim 1 further comprising the step of accepting the description of interests of a user.
18. The method of claim 17 wherein the description of interests of a user includes keywords associated with topics of interest to that user.
19. The method of claim 17 wherein the description of interests of a user includes category names associated with topics of interest to that user.
20. The method of claim 17 wherein the step of accepting the description of interests of a user occurs prior to accepting the program, whereby the description of interests forms a profile for that user.
21. The method of claim 17 wherein the step of accepting the description of interests of a user occurs after accepting the program, whereby the description of interests forms a search request from that user.
22. The method of claim 21 further comprising the step of providing identifiers for the matching scenes to the user; and wherein providing audio-video data from the matching scenes to the user is in response to accepting one or more requests from the user that include the identifiers for the matching scenes.
23. The method of claim 22 wherein the step of providing identifiers for the matching scenes includes providing an image from each of the matching scenes.
24. The method of claim 1 further comprising the step of compressing or encoding the video data for a scene prior to storing that video data.
25. A system for fully automated real-time processing and delivery of an audio-video program, the system comprising: a segmenter/indexer for accepting the program and for providing data for a plurality of discrete scenes in the program, wherein the data includes index data based on textual data related to the plurality of discrete scenes; a media database for storing audio-video data for the plurality of discrete scenes; an information database for storing the index data; and a communication network coupling the media database and a client computer for providing audio-video data stored in the media database and which match a description of interests for a user of the client computer.
26. The system of claim 25 wherein the information database further stores information relating the plurality of discrete scenes to a plurality of text documents, and wherein the communication network is further used for providing references to matching ones of the text documents related to the provided audio-video data.
27. The system of claim 25 further comprising a speech recognizer for processing audio data in the program and providing the output words to the segmenter/indexer.
28. Software stored on computer readable media for causing a computer system to perform the functions of: accepting an audio-video program; detecting a plurality of discrete scenes within the program based on the content of the program; for each of the discrete scenes, determining textual information related to the scene, indexing the scene using the textual information, storing audio-video data from the scene, and storing index data for the scene; matching a description of interests of a user to stored index data for the scenes; and providing audio-video data from the matching scenes to the user.
29. The software of claim 28 further causing the computer system to concurrently perform the function of accepting the audio-video program, and at least one of the functions of detecting scenes, determining textual information related to the scene, indexing the scenes, or storing data from the scenes, and to begin the function of providing audio-video data to the user prior to completion of accepting of the program, whereby the user can view scenes of the program with low delay relative to accepting those scenes in the program.
30. The software of claim 28 further causing the computer system to perform the functions of: accepting a text document; and for each of the discrete scenes, further matching the scene to the text document, and storing the matching information associating the scene to the document if a match is found; and wherein providing audio-video data from the matching scenes includes providing any stored matching information associating the scenes to the text document.
31. The software of claim 28 wherein providing audio-video data to the user includes sending the data over a data communication network to the user.
32. The software of claim 31 wherein the data is sent using a data streaming protocol and the Internet Protocol (IP) network protocol.
33. The software of claim 32 wherein sending the data over the data communication network includes multicasting the data for reception by the user and a plurality of other users.
34. The software of claim 28 further causing the computer system to perform the functions of accepting the description of interests of a user.
35. The software of claim 34 wherein the description of interests of a user includes keywords associated with topics of interest to that user.
36. The software of claim 34 wherein accepting the description of interests of a user includes receiving a communication using the hyper-text transport protocol (http).
37. The software of claim 28 further causing the computer system to perform the function of compressing or encoding the video data for a scene prior to storing that video data.
PCT/US1999/003028 1998-02-13 1999-02-11 Processing and delivery of audio-video information WO1999041684A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU26744/99A AU2674499A (en) 1998-02-13 1999-02-11 Processing and delivery of audio-video information

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10459798P 1998-02-13 1998-02-13
US2357698A 1998-02-13 1998-02-13
US09/023,576 1998-02-13
US60/104,597 1998-02-13

Publications (1)

Publication Number Publication Date
WO1999041684A1 true WO1999041684A1 (en) 1999-08-19

Family

ID=26697345

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/003028 WO1999041684A1 (en) 1998-02-13 1999-02-11 Processing and delivery of audio-video information

Country Status (2)

Country Link
AU (1) AU2674499A (en)
WO (1) WO1999041684A1 (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19963045A1 (en) * 1999-12-24 2001-06-28 Prodac Media Ag System and method for providing data requested by users, in particular audio and / or video data
GB2358938A (en) * 1999-09-08 2001-08-08 Tveyes Com Inc Keyword searching of closed caption television programming on multiple channels
FR2807902A1 (en) * 2000-04-17 2001-10-19 Thomson Multimedia Sa Video image sequence cut position determination having luminance value determined and histogram frames compared then minimum differences summed and set level compared.
WO2001093091A2 (en) * 2000-06-01 2001-12-06 Koninklijke Philips Electronics N.V. Content with bookmarks obtained from an audience's appreciation
EP1220117A2 (en) * 2000-12-28 2002-07-03 Pioneer Corporation An information delivery device and method, an information retrieving device and method, an information delivery retrieving system, and information recording medium
EP1221659A1 (en) * 2000-12-28 2002-07-10 K.K. Asobou's Automatic image retrieval system
EP1222634A1 (en) * 1999-10-11 2002-07-17 Electronics and Telecommunications Research Institute Video summary description scheme and method and system of video summary description data generation for efficient overview and browsing
AU759681B2 (en) * 1999-09-27 2003-04-17 Canon Kabushiki Kaisha Method and system for addressing audio-visual content fragments
WO2003049430A2 (en) * 2001-12-06 2003-06-12 Koninklijke Philips Electronics N.V. Adaptive environment system and method of providing an adaptive environment
EP1381225A2 (en) * 2002-07-09 2004-01-14 Samsung Electronics Co., Ltd. Scene change detector and method thereof
US6751398B2 (en) 2000-12-22 2004-06-15 Koninklijke Philips Electronics N.V. System and method for determining whether a video program has been previously recorded
US6798912B2 (en) 2000-12-18 2004-09-28 Koninklijke Philips Electronics N.V. Apparatus and method of program classification based on syntax of transcript information
EP1484763A1 (en) * 2003-06-04 2004-12-08 Pioneer Corporation Music program contents menu creation apparatus and method
US6998527B2 (en) 2002-06-20 2006-02-14 Koninklijke Philips Electronics N.V. System and method for indexing and summarizing music videos
AU2003252853B2 (en) * 2000-09-19 2006-03-09 Canon Kabushiki Kaisha Method and System for Addressing Audio-visual Content Fragments
US7047191B2 (en) * 2000-03-06 2006-05-16 Rochester Institute Of Technology Method and system for providing automated captioning for AV signals
US7181757B1 (en) 1999-10-11 2007-02-20 Electronics And Telecommunications Research Institute Video summary description scheme and method and system of video summary description data generation for efficient overview and browsing
US7210157B2 (en) 2000-12-18 2007-04-24 Koninklijke Philips Electronics N.V. Apparatus and method of program classification using observed cues in the transcript information
US7383504B1 (en) * 1999-08-30 2008-06-03 Mitsubishi Electric Research Laboratories Method for representing and comparing multimedia content according to rank
US7461168B1 (en) 1999-09-27 2008-12-02 Canon Kabushiki Kaisha Method and system for addressing audio-visual content fragments
WO2008156894A2 (en) * 2007-04-05 2008-12-24 Raytheon Company System and related techniques for detecting and classifying features within data
US8037502B1 (en) 2000-01-12 2011-10-11 Digital Connection, LLC Method and apparatus for archiving media content
US20130036124A1 (en) * 2011-08-02 2013-02-07 Comcast Cable Communications, Llc Segmentation of Video According to Narrative Theme
US9420318B2 (en) 2004-07-30 2016-08-16 Broadband Itv, Inc. Method for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9491511B2 (en) 2004-07-30 2016-11-08 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US9529870B1 (en) 2000-09-14 2016-12-27 Network-1 Technologies, Inc. Methods for linking an electronic media work to perform an action
US9584868B2 (en) 2004-07-30 2017-02-28 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9635429B2 (en) 2004-07-30 2017-04-25 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10225584B2 (en) 1999-08-03 2019-03-05 Videoshare Llc Systems and methods for sharing video with advertisements over a network
US10277654B2 (en) 2000-03-09 2019-04-30 Videoshare, Llc Sharing a streaming video
US11252459B2 (en) 2004-07-30 2022-02-15 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US11570521B2 (en) 2007-06-26 2023-01-31 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0805405A2 (en) * 1996-02-05 1997-11-05 Texas Instruments Incorporated Motion event detection for video indexing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0805405A2 (en) * 1996-02-05 1997-11-05 Texas Instruments Incorporated Motion event detection for video indexing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GAUCH S ET AL: "The vision digital video library", INFORMATION PROCESSING & MANAGEMENT (INCORPORATING INFORMATION TECHNOLOGY), vol. 33, no. 4, 1 July 1997 (1997-07-01), pages 413-426, XP004087986 *
SATOU T ET AL: "VIDEO ACQUISITION ON LIVE HYPERMEDIA", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, WASHINGTON, MAY 15 - 18, 1995, 15 May 1995 (1995-05-15), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 175 - 181, XP000632099 *
TANIGUCHI Y ET AL: "AN INTUITIVE AND EFFICIENT ACCESS INTERFACE TO REAL-TIME INCOMING VIDEO BASED ON AUTOMATIC INDEXING", PROCEEDINGS OF ACM MULTIMEDIA '95, SAN FRANCISCO, NOV. 5 - 9, 1995, 5 November 1995 (1995-11-05), ASSOCIATION FOR COMPUTING MACHINERY, pages 25 - 33, XP000599026 *

Cited By (134)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10225584B2 (en) 1999-08-03 2019-03-05 Videoshare Llc Systems and methods for sharing video with advertisements over a network
US10362341B2 (en) 1999-08-03 2019-07-23 Videoshare, Llc Systems and methods for sharing video with advertisements over a network
US7383504B1 (en) * 1999-08-30 2008-06-03 Mitsubishi Electric Research Laboratories Method for representing and comparing multimedia content according to rank
GB2358938A (en) * 1999-09-08 2001-08-08 Tveyes Com Inc Keyword searching of closed caption television programming on multiple channels
AU759681B2 (en) * 1999-09-27 2003-04-17 Canon Kabushiki Kaisha Method and system for addressing audio-visual content fragments
US7461168B1 (en) 1999-09-27 2008-12-02 Canon Kabushiki Kaisha Method and system for addressing audio-visual content fragments
EP1222634A1 (en) * 1999-10-11 2002-07-17 Electronics and Telecommunications Research Institute Video summary description scheme and method and system of video summary description data generation for efficient overview and browsing
EP1222634A4 (en) * 1999-10-11 2006-07-05 Korea Electronics Telecomm Video summary description scheme and method and system of video summary description data generation for efficient overview and browsing
US7181757B1 (en) 1999-10-11 2007-02-20 Electronics And Telecommunications Research Institute Video summary description scheme and method and system of video summary description data generation for efficient overview and browsing
DE19963045A1 (en) * 1999-12-24 2001-06-28 Prodac Media Ag System and method for providing data requested by users, in particular audio and / or video data
US8037502B1 (en) 2000-01-12 2011-10-11 Digital Connection, LLC Method and apparatus for archiving media content
US11388448B2 (en) 2000-01-12 2022-07-12 Joachim Kim Method and apparatus for archiving media content
US7047191B2 (en) * 2000-03-06 2006-05-16 Rochester Institute Of Technology Method and system for providing automated captioning for AV signals
US10523729B2 (en) 2000-03-09 2019-12-31 Videoshare, Llc Sharing a streaming video
US10277654B2 (en) 2000-03-09 2019-04-30 Videoshare, Llc Sharing a streaming video
FR2807902A1 (en) * 2000-04-17 2001-10-19 Thomson Multimedia Sa Video image sequence cut position determination having luminance value determined and histogram frames compared then minimum differences summed and set level compared.
US6810145B2 (en) 2000-04-17 2004-10-26 Thomson Licensing, S.A. Process for detecting a change of shot in a succession of video images
WO2001093091A2 (en) * 2000-06-01 2001-12-06 Koninklijke Philips Electronics N.V. Content with bookmarks obtained from an audience's appreciation
WO2001093091A3 (en) * 2000-06-01 2003-02-27 Koninkl Philips Electronics Nv Content with bookmarks obtained from an audience's appreciation
US10367885B1 (en) 2000-09-14 2019-07-30 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with selected identified image
US9544663B1 (en) 2000-09-14 2017-01-10 Network-1 Technologies, Inc. System for taking action with respect to a media work
US10063940B1 (en) 2000-09-14 2018-08-28 Network-1 Technologies, Inc. System for using extracted feature vectors to perform an action associated with a work identifier
US9883253B1 (en) 2000-09-14 2018-01-30 Network-1 Technologies, Inc. Methods for using extracted feature vectors to perform an action associated with a product
US10063936B1 (en) 2000-09-14 2018-08-28 Network-1 Technologies, Inc. Methods for using extracted feature vectors to perform an action associated with a work identifier
US10073862B1 (en) 2000-09-14 2018-09-11 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with selected identified image
US9832266B1 (en) 2000-09-14 2017-11-28 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with identified action information
US9824098B1 (en) 2000-09-14 2017-11-21 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with identified action information
US10621227B1 (en) 2000-09-14 2020-04-14 Network-1 Technologies, Inc. Methods for using extracted features to perform an action
US9807472B1 (en) 2000-09-14 2017-10-31 Network-1 Technologies, Inc. Methods for using extracted feature vectors to perform an action associated with a product
US10621226B1 (en) 2000-09-14 2020-04-14 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with selected identified image
US10552475B1 (en) 2000-09-14 2020-02-04 Network-1 Technologies, Inc. Methods for using extracted features to perform an action
US10540391B1 (en) 2000-09-14 2020-01-21 Network-1 Technologies, Inc. Methods for using extracted features to perform an action
US10521471B1 (en) 2000-09-14 2019-12-31 Network-1 Technologies, Inc. Method for using extracted features to perform an action associated with selected identified image
US9805066B1 (en) 2000-09-14 2017-10-31 Network-1 Technologies, Inc. Methods for using extracted features and annotations associated with an electronic media work to perform an action
US10521470B1 (en) 2000-09-14 2019-12-31 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with selected identified image
US9529870B1 (en) 2000-09-14 2016-12-27 Network-1 Technologies, Inc. Methods for linking an electronic media work to perform an action
US9538216B1 (en) 2000-09-14 2017-01-03 Network-1 Technologies, Inc. System for taking action with respect to a media work
US9536253B1 (en) 2000-09-14 2017-01-03 Network-1 Technologies, Inc. Methods for linking an electronic media work to perform an action
US9781251B1 (en) 2000-09-14 2017-10-03 Network-1 Technologies, Inc. Methods for using extracted features and annotations associated with an electronic media work to perform an action
US9558190B1 (en) 2000-09-14 2017-01-31 Network-1 Technologies, Inc. System and method for taking action with respect to an electronic media work
US10108642B1 (en) 2000-09-14 2018-10-23 Network-1 Technologies, Inc. System for using extracted feature vectors to perform an action associated with a work identifier
US10057408B1 (en) 2000-09-14 2018-08-21 Network-1 Technologies, Inc. Methods for using extracted feature vectors to perform an action associated with a work identifier
US10205781B1 (en) 2000-09-14 2019-02-12 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with selected identified image
US10303713B1 (en) 2000-09-14 2019-05-28 Network-1 Technologies, Inc. Methods for using extracted features to perform an action
US10303714B1 (en) 2000-09-14 2019-05-28 Network-1 Technologies, Inc. Methods for using extracted features to perform an action
US10305984B1 (en) 2000-09-14 2019-05-28 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with selected identified image
AU2003252853B2 (en) * 2000-09-19 2006-03-09 Canon Kabushiki Kaisha Method and System for Addressing Audio-visual Content Fragments
US7210157B2 (en) 2000-12-18 2007-04-24 Koninklijke Philips Electronics N.V. Apparatus and method of program classification using observed cues in the transcript information
US6798912B2 (en) 2000-12-18 2004-09-28 Koninklijke Philips Electronics N.V. Apparatus and method of program classification based on syntax of transcript information
US6751398B2 (en) 2000-12-22 2004-06-15 Koninklijke Philips Electronics N.V. System and method for determining whether a video program has been previously recorded
EP1220117A2 (en) * 2000-12-28 2002-07-03 Pioneer Corporation An information delivery device and method, an information retrieving device and method, an information delivery retrieving system, and information recording medium
EP1221659A1 (en) * 2000-12-28 2002-07-10 K.K. Asobou's Automatic image retrieval system
EP1220117A3 (en) * 2000-12-28 2004-02-11 Pioneer Corporation An information delivery device and method, an information retrieving device and method, an information delivery retrieving system, and information recording medium
WO2003049430A2 (en) * 2001-12-06 2003-06-12 Koninklijke Philips Electronics N.V. Adaptive environment system and method of providing an adaptive environment
WO2003049430A3 (en) * 2001-12-06 2004-10-07 Koninkl Philips Electronics Nv Adaptive environment system and method of providing an adaptive environment
US6998527B2 (en) 2002-06-20 2006-02-14 Koninklijke Philips Electronics N.V. System and method for indexing and summarizing music videos
EP1381225A2 (en) * 2002-07-09 2004-01-14 Samsung Electronics Co., Ltd. Scene change detector and method thereof
EP1381225A3 (en) * 2002-07-09 2004-05-06 Samsung Electronics Co., Ltd. Scene change detector and method thereof
EP1484763A1 (en) * 2003-06-04 2004-12-08 Pioneer Corporation Music program contents menu creation apparatus and method
US9491512B2 (en) 2004-07-30 2016-11-08 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US10893334B2 (en) 2004-07-30 2021-01-12 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US11601697B2 (en) 2004-07-30 2023-03-07 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9936240B2 (en) 2004-07-30 2018-04-03 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US11516525B2 (en) 2004-07-30 2022-11-29 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9998791B2 (en) 2004-07-30 2018-06-12 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US10028026B2 (en) 2004-07-30 2018-07-17 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US10028027B2 (en) 2004-07-30 2018-07-17 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US10045084B2 (en) 2004-07-30 2018-08-07 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US10057649B2 (en) 2004-07-30 2018-08-21 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US11272233B2 (en) 2004-07-30 2022-03-08 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9888287B2 (en) 2004-07-30 2018-02-06 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV services subscribers
US9866910B2 (en) 2004-07-30 2018-01-09 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US9866909B2 (en) 2004-07-30 2018-01-09 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US11259089B2 (en) 2004-07-30 2022-02-22 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US10129597B2 (en) 2004-07-30 2018-11-13 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US10129598B2 (en) 2004-07-30 2018-11-13 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV services subscribers
US11259059B2 (en) 2004-07-30 2022-02-22 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US11259060B2 (en) 2004-07-30 2022-02-22 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US11252476B2 (en) 2004-07-30 2022-02-15 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US9648388B2 (en) 2004-07-30 2017-05-09 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV services subscribers
US11252459B2 (en) 2004-07-30 2022-02-15 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US10791351B2 (en) 2004-07-30 2020-09-29 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9641896B2 (en) 2004-07-30 2017-05-02 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US10306321B2 (en) 2004-07-30 2019-05-28 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US10785517B2 (en) 2004-07-30 2020-09-22 Broadband Itv, Inc. Method for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9635429B2 (en) 2004-07-30 2017-04-25 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9635423B2 (en) 2004-07-30 2017-04-25 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV services subscribers
US10341730B2 (en) 2004-07-30 2019-07-02 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US10341699B2 (en) 2004-07-30 2019-07-02 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US10349100B2 (en) 2004-07-30 2019-07-09 Broadband Itv, Inc. Method for addressing on-demand TV program content on TV services platform of a digital TV services provider
US10349101B2 (en) 2004-07-30 2019-07-09 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9635395B2 (en) 2004-07-30 2017-04-25 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9584868B2 (en) 2004-07-30 2017-02-28 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10375428B2 (en) 2004-07-30 2019-08-06 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US10555014B2 (en) 2004-07-30 2020-02-04 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US10491955B2 (en) 2004-07-30 2019-11-26 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV services subscribers
US10491954B2 (en) 2004-07-30 2019-11-26 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US10506269B2 (en) 2004-07-30 2019-12-10 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9578376B2 (en) 2004-07-30 2017-02-21 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US9491497B2 (en) 2004-07-30 2016-11-08 Broadband Itv, Inc. Method for addressing on-demand TV program content on TV services platform of a digital TV services provider
US9491511B2 (en) 2004-07-30 2016-11-08 Broadband Itv, Inc. Video-on-demand content delivery method for providing video-on-demand services to TV service subscribers
US10536750B2 (en) 2004-07-30 2020-01-14 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US10536751B2 (en) 2004-07-30 2020-01-14 Broadband Itv, Inc. Video-on-demand content delivery system for providing video-on-demand services to TV service subscribers
US9420318B2 (en) 2004-07-30 2016-08-16 Broadband Itv, Inc. Method for addressing on-demand TV program content on TV services platform of a digital TV services provider
US11589093B2 (en) 2007-03-12 2023-02-21 Broadband Itv, Inc. System for addressing on-demand TV program content on TV services platform of a digital TV services provider
US11245942B2 (en) 2007-03-12 2022-02-08 Broadband Itv, Inc. Method for addressing on-demand TV program content on TV services platform of a digital TV services provider
WO2008156894A3 (en) * 2007-04-05 2009-07-09 Raytheon Co System and related techniques for detecting and classifying features within data
US8566314B2 (en) 2007-04-05 2013-10-22 Raytheon Company System and related techniques for detecting and classifying features within data
WO2008156894A2 (en) * 2007-04-05 2008-12-24 Raytheon Company System and related techniques for detecting and classifying features within data
US10154296B2 (en) 2007-06-26 2018-12-11 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9888288B2 (en) 2007-06-26 2018-02-06 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10623793B2 (en) 2007-06-26 2020-04-14 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9641902B2 (en) 2007-06-26 2017-05-02 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10277937B2 (en) 2007-06-26 2019-04-30 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9894417B2 (en) 2007-06-26 2018-02-13 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10582243B2 (en) 2007-06-26 2020-03-03 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10264303B2 (en) 2007-06-26 2019-04-16 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9648390B2 (en) 2007-06-26 2017-05-09 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on view preferences for minimizing navigation in VOD program selection
US11695976B2 (en) 2007-06-26 2023-07-04 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10149015B2 (en) 2007-06-26 2018-12-04 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9654833B2 (en) 2007-06-26 2017-05-16 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US11265589B2 (en) 2007-06-26 2022-03-01 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9894419B2 (en) 2007-06-26 2018-02-13 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US11272235B2 (en) 2007-06-26 2022-03-08 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US11277669B2 (en) 2007-06-26 2022-03-15 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US11290763B2 (en) 2007-06-26 2022-03-29 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10567846B2 (en) 2007-06-26 2020-02-18 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US9973825B2 (en) 2007-06-26 2018-05-15 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US11570500B2 (en) 2007-06-26 2023-01-31 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US11570521B2 (en) 2007-06-26 2023-01-31 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US11582498B2 (en) 2007-06-26 2023-02-14 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US10560733B2 (en) 2007-06-26 2020-02-11 Broadband Itv, Inc. Dynamic adjustment of electronic program guide displays based on viewer preferences for minimizing navigation in VOD program selection
US20130036124A1 (en) * 2011-08-02 2013-02-07 Comcast Cable Communications, Llc Segmentation of Video According to Narrative Theme
US10467289B2 (en) * 2011-08-02 2019-11-05 Comcast Cable Communications, Llc Segmentation of video according to narrative theme

Also Published As

Publication number Publication date
AU2674499A (en) 1999-08-30

Similar Documents

Publication Title
WO1999041684A1 (en) Processing and delivery of audio-video information
CA2924065C (en) Content based video content segmentation
CN106686404B (en) Video analysis platform, matching method, and method and system for accurately delivering advertisements
KR100915847B1 (en) Streaming video bookmarks
US9202523B2 (en) Method and apparatus for providing information related to broadcast programs
US6877134B1 (en) Integrated data and real-time metadata capture system and method
EP2541963A2 (en) Method for identifying video segments and displaying contextually targeted content on a connected television
US20030131362A1 (en) Method and apparatus for multimodal story segmentation for linking multimedia content
US20110225136A1 (en) Video search method, video search system, and method thereof for establishing video database
US20050028194A1 (en) Personalized news retrieval system
US20090177758A1 (en) Systems and methods for determining attributes of media items accessed via a personal media broadcaster
US20020175932A1 (en) Method for summarizing news video stream using synthetic key frame based upon video text
JPH07114567A (en) Method and device for retrieving video
KR20030026529A (en) Keyframe Based Video Summary System
WO2003042866A2 (en) Method and system for personal information retrieval, update and presentation
JP2003511934A (en) Automatically locate, learn and extract commercial and other video content based on signs
JP2005513663A (en) Family histogram based techniques for detection of commercial and other video content
US20030208761A1 (en) Client-based searching of broadcast carousel data
KR100374040B1 (en) Method for detecting caption synthetic key frame in video stream
CN102486800A (en) Video searching method, system and method for establishing video database
Gauch et al. Real time video scene detection and classification
RU2413990C2 (en) Method and apparatus for detecting content item boundaries
Lienhart Indexing and retrieval of digital video sequences based on automatic text recognition
Mohan Text-based search of TV news stories
JP2002354391A (en) Method for recording program signal, and method for transmitting record program control signal

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase