US20020186768A1 - Video content detection method and system leveraging data-compression constructs - Google Patents

Info

Publication number
US20020186768A1
US20020186768A1 (application US09/854,511)
Authority
US
United States
Prior art keywords
data
video
content
frame
frames
Prior art date
Legal status
Granted
Application number
US09/854,511
Other versions
US6714594B2
Inventor
Nevenka Dimitrova
Thomas McGee
Gerhardus Mekenkamp
Edwin Salomons
Current Assignee
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Priority to US09/854,511 (US6714594B2)
Assigned to KONINKLIJKE PHILIPS ELECTRONICS N.V. Assignors: MEKENKAMP, GERHARD; SALOMONS, EDWIN; DIMITROVA, NEVENKA; MCGEE, THOMAS; NESVADBA, JAN ALEXIS DANIEL
Priority to KR1020037000892A (KR100869038B1)
Priority to PCT/IB2002/001633 (WO2002093929A1)
Priority to EP02727885A (EP1393569A1)
Priority to CNB028098315A (CN100493186C)
Priority to JP2002590671A (JP2004522354A)
Publication of US20020186768A1
Publication of US6714594B2
Application granted
Assigned to KONINKLIJKE PHILIPS ELECTRONICS N.V. (corrected cover sheet to specify each inventor's name, previously recorded at reel/frame 012461/0818). Assignors: MEKENKAMP, GERHARD; SALOMONS, EDWIN; DIMITROVA, NEVENKA; MCGEE, THOMAS; NESVADBA, JAN ALEXIS DANIEL
Priority to JP2009021705A (JP2009135957A)
Legal status: Expired - Fee Related

Classifications

    • H04N21/44008: processing of video elementary streams, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N19/137: adaptive coding characterised by motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/142: detection of scene cut or scene change
    • H04N19/61: transform coding in combination with predictive coding
    • H04N21/235: processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/2353: processing of content descriptors, e.g. coding, compressing or processing of metadata
    • H04N21/435: processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/812: monomedia components involving advertisement data
    • H04N5/91: television signal processing for recording

Definitions

  • the invention relates to the detection of content, for example commercials, in video data streams, and more particularly to the accurate identification of transitions from one type of content to another, such as the temporal boundaries of a commercial.
  • personal video receivers/recorders are devices that modify and/or record the content of broadcast video.
  • a personal video recorder, for example, automatically records programs on a hard disk in response to stored user preferences.
  • one of the features under investigation for such systems is content detection.
  • a system that can detect commercials may allow substitute advertisements to be inserted in a video stream (“commercial swapping”) or temporary halting of the video at the end of a commercial to prevent a user, momentarily distracted during a commercial, from missing any of the main program content.
  • Another technique is to measure the temporal distance between black frame sequences to determine a presence of a commercial.
  • Another technique identified commercials based on matching images; in other words, differences in the qualities of the image content were used as an indicator. Also known is the use of a predetermined indicator within the video stream that demarcates commercial boundaries, but this is simply a method of marking a previously known commercial, not a method of detecting commercials.
  • Commercial detection based on trained neural networks configured to distinguish content through analysis of the video stream has been proposed, but has not met with much success so far. Also, neural networks are complex and expensive to implement for this purpose.
  • the invention employs low- and mid-level features that are automatically generated in the process of compressing video as inputs to various classifier tools.
  • the classifier tools are trained to identify commercial features and to generate metrics responsively to them.
  • the metrics are employed in combination (a super-classifier) to detect the boundaries of commercials.
  • ASIC: application-specific integrated circuit
  • ASIP: application-specific instruction-set processor
  • a dedicated chip normally performs image compression on consumer appliances, since the processes involved require high speed.
  • One aspect of the invention is to provide a way to leverage the results of the compression process, not only for compression, but also for the analysis of the video required to detect certain types of content.
  • One example of a device that can compress video implements the Moving Picture Experts Group (MPEG) compression scheme known as MPEG-2.
  • MPEG-2: the second-generation Moving Picture Experts Group compression standard.
  • video data are represented by video sequences, each consisting of groups of pictures (GOPs), each GOP including pieces of data that describe the pictures or "frames" that make up the video.
  • the frame is the primary coding unit of the video sequence.
  • a picture consists of three rectangular matrices, one representing luminance (the intensity of the various portions of a frame) and two representing chrominance (Cb and Cr; the color of the various portions of a frame).
  • the luminance matrix has an even number of rows and columns.
  • the chrominance matrices are one-half the size of the luminance (Y) matrix in each direction (horizontal and vertical) because human perception is less detail-sensitive to color than to luminosity.
  • Each frame is further divided into one or more contiguous macroblocks, grouped into “slices.”
  • the order of the macroblocks within a slice is from left-to-right and top-to-bottom.
  • the macroblock is the basic coding unit in the MPEG-2 scheme. It represents a 16×16-pixel part of a frame. Since each chrominance component has one-half the vertical and horizontal resolution of the luminance component, a macroblock consists of four luminance blocks, one Cb block, and one Cr block, the luminance component being divided into four blocks of 8×8 pixels.
  • I-frames: intra-frames, coded entirely from their own data without reference to other frames.
  • P-frames: predicted frames, defined partly by data representing the corresponding frame itself and partly by data representing one or more previous frames.
  • Bidirectional frames or “B-frames” are represented by data from both prior and future frames as well as the data corresponding to the B-frame itself.
  • DCT: discrete cosine transform
  • the DCT data can be represented as many different wavy patterns, or only a few with big steps between them. Initially, the DCT data are very fine-grained, but as part of the compression process they are subjected to a process called quantization, in which the relative contributions of the different wave patterns are represented on coarse or fine-grained scales, depending on how much the data must be compressed.
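As a rough illustration of the DCT-plus-quantization step just described, the sketch below transforms an 8×8 pixel block and quantizes the coefficients. This is a minimal pure-Python example, not the patent's implementation; the quantizer scale of 16 is an arbitrary illustrative value.

```python
# Illustrative sketch: naive 8x8 DCT-II followed by uniform quantization,
# the per-block step MPEG-2 applies during compression.
import math

N = 8

def dct_2d(block):
    """Naive 8x8 DCT-II; block is a list of 8 rows of 8 samples."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coeffs, scale):
    """Coarser scale -> fewer distinct levels -> more compression."""
    return [[round(f / scale) for f in row] for row in coeffs]

# A flat (unicolor) block: all energy collapses into the DC coefficient,
# which is why DC values are such cheap indicators of frame homogeneity.
flat = [[100] * N for _ in range(N)]
q = quantize(dct_2d(flat), scale=16)
```

The DC coefficient (top-left) is the only nonzero value for a flat block, which is the property the black-frame and unicolor-frame detectors below exploit.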
  • Compressing video images to generate P-frames and B-frames involves more complex processes.
  • a computer takes a first image and its predecessor image and looks for where each block (or macroblock, depending on the selection of the user) moved from one image to the next.
  • the MPEG-2 data simply indicate where the block in the earlier frame moved to in the new frame. This is described as a vector: a line or arrow whose length indicates the distance of the movement and whose orientation indicates its direction. This kind of description is faulty, however, because not all motion in video can be described in terms of blobs moving around.
  • the defect is fixed by transmitting a correction that defines the difference between the image as predicted by a motion description and the image as it actually looked. This correction is called the residual.
  • the motion data and residual data are subjected to the DCT and quantization, just as the I-frame image data.
  • B-frames are similar to P-frames, except that they can refer to both previous and future frames in encoding their data.
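The block-matching search behind motion estimation can be sketched as follows. This is a hedged illustration, not the encoder's actual search: the exhaustive search window, block size, and frame layout are all simplifying assumptions, but the MAD criterion is the one named in the text.

```python
# Sketch of MAD-based block matching: find where a block moved between
# frames by minimizing the mean absolute difference (MAD).

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized pixel blocks."""
    n = len(block_a) * len(block_a[0])
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb)) / n

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def best_motion_vector(prev, cur, top, left, size=4, search=2):
    """Exhaustive search: returns (dy, dx) minimizing MAD, plus that MAD."""
    target = get_block(cur, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ty, tx = top + dy, left + dx
            if 0 <= ty <= len(prev) - size and 0 <= tx <= len(prev[0]) - size:
                d = mad(get_block(prev, ty, tx, size), target)
                if best is None or d < best[0]:
                    best = (d, (dy, dx))
    return best[1], best[0]

# Previous frame: a bright square at rows 2-5, cols 2-5; current frame has
# the same square shifted one pixel right, so the vector into the previous
# frame is (0, -1) and the residual MAD is zero.
prev = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        prev[y][x] = 200
cur = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(3, 7):
        cur[y][x] = 200
vec, residual_mad = best_motion_vector(prev, cur, top=2, left=3)
```

A nonzero residual MAD after the best match is exactly the "residual" the text describes being DCT-coded and transmitted as a correction.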
  • the example video compression device generates the following data for each frame, as a byproduct of the compression process.
  • the following are examples of what may be economically derived from an encoder and are by no means comprehensive. In addition, they would vary depending on the type of encoder.
  • frame indicator: a frame identifier that can be used to indicate the type of frame (I, P, or B).
  • luminance DC total value: an indication of the luminance of an I-frame.
  • quantizer scale: the quantization scale used for the DCT data.
  • MAD: mean absolute difference, the block-matching criterion used in motion estimation.
  • progressive/interlaced value: an indicator of whether the image is interlaced, as usually found in conventional television video, or progressive, as usually found in video from movies and computer animation.
  • luminance DC differential value: represents the variation in luminance among the macroblocks of a frame. Low variation means a homogeneous image, which could be a blank screen.
  • chrominance DC total value: analogous to the luminance total value, but based on the chrominance component rather than the luminance component.
  • chrominance DC differential value: analogous to the luminance differential value, but based on the chrominance component rather than the luminance component.
  • letterbox value: indicates the shape of the video images by looking for homogeneous bands at the top and bottom of the frames, as when a wide-screen format is painted on a television screen.
  • time stamps: not indicia of commercials, but indicate a location in a video stream; used to mark the beginnings and ends of video sequences distinguishable by content.
  • keyframe distance: the number of frames between scene cuts.
  • FIG. 1 is block diagram of a hardware system for implementing a process of video-content identification based on compression features according to an embodiment of the invention.
  • FIG. 2 is a flow chart illustrating a process that makes use of compression features for identification of content sequences according to an embodiment of the invention.
  • FIG. 3 is a flow chart illustrating a process that makes use of compression features for identification of content sequences according to another embodiment of the invention.
  • an MPEG encoder 100 encodes video data 90 from a live data feed such as the Internet, a data store, a broadcast, or any other source.
  • the MPEG encoder generates compressed data that may be stored in a data store 110 such as a hard disk, a DVD, CDROM, or other data storage medium. Alternatively, the data may be buffered for distribution by any suitable means.
  • the MPEG encoder 100 may generate a variety of different values, some of them listed below.
  • MAD: mean absolute difference
  • the above may be output to a content analyzer 120 in raw form by the MPEG encoder 100 or the data may be refined first, depending on the allocation (between the encoder 100 and the analyzer 120 ) of the functions described herein.
  • These data are standard in the MPEG field, but are described for convenience below, along with some comment regarding how they may be utilized or filtered.
  • a playback selector 130 may use the results from the content analyzer to edit the compressed video. For example, where commercials or high action sequences are desired to be deleted from the video material, the playback selector can skip over material bracketed by markers resulting from the content analyzer 120 analysis and stored with the MPEG file in the data store 110 .
  • the MPEG data are described below as an example of the kinds of data that may be available from a compression process.
  • the frame indicator is just an ordinal identifier of the frame.
  • the frame indicator distinguishes between I-frames and P-frames (and B-frames).
  • I-frames have a value of 0 and P-frames (or B-frames) a value of 1, 2, 3, 4, or 5.
  • the I and P or B frame indication may be used for content detection as discussed below.
  • the luminance total value is the sum of the first (out of 4) luminance DC values of each macro block over the entire frame. Any selection of the DC (chrominance or luminance) values may also be used. The former value is useful for I-frames only. For P-frames and B-frames, the luminance total value is calculated based on the previous frames. The luminance total value may be used for black frame detection. Alternatively, as discussed below, an aggregate value, the luminance differential value, may provide unicolor (non-black, but homogeneous color frames) and other advantages for this task. The luminance total value is advantageous for certain kinds of detection, such as for detection of flashes.
  • the quantizer scale indicates the quantization value used by the MPEG encoder 100 for quantization of the video data.
  • This quantization value may be adaptive to ensure that the bit rate stays in a predefined band. This feature is useful for detecting very complex or fast moving scenes. The value is relevant for I-frames as well as P-frames and B-frames.
  • part of the MPEG encoding process is the estimation of the motion of fields of color and luminance from one frame to another.
  • the results of this process are displacement vectors whose values are determined by the MAD matching criterion.
  • the MAD total value of the upper part can indicate sharp scene changes.
  • the frame is split into an upper (slices 0-25) and a lower part (slices 26-35).
  • the MAD total value-upper part is the sum of all MAD values of the macro blocks of the slices 0-25.
  • in slowly changing scenes, the macroblocks will be only slightly (if at all) displaced and will match quite well with the reference macroblocks. Therefore the MAD value will be very low (approaching zero).
  • at a sharp scene change, nearly no matching macroblocks will be found, or only ones with a high content difference. Therefore the MAD value at a sharp scene change is much higher than the average MAD value.
  • the calculation of the value is the same as the one for the upper part of the frame.
  • the MAD total value of the lower part is the sum of all MAD values of the macroblocks of slices 26-35. The frame is split because each change in subtitles (very often used in some European countries) would otherwise lead to a false scene-change detection.
  • the MAD value of the lower frame part can be useful as a subtitle change detector and as a support feature for the sharp scene change detector.
  • the current bit rate indicates the bit rate for the transmission of the MPEG data and has a fixed value per GOP. To hold the current bit rate in a certain band the quantizer value is increased or decreased depending on the actual current bit rate. This value is used in combination with the quantizer value to indicate fast varying or very complex scenes.
  • a field move average value in the X-direction indicates the displacement value of each macro block in the x-direction. This may be used, for example, as a check for sufficient movement in the scene, which in turn may be used to indicate whether there has been a shift from progressive to interlace video or the reverse. If the absolute value of the horizontal displacement of the actual macroblock is larger than 8 half pixels (control for sufficient movement either to the left or to the right), the progressive/interlaced value for the actual frame may be increased by one if the macroblock is frame DCT encoded (i.e., DCT type mode of the macroblock is 0) or decreased by one if the macroblock is field DCT coded (i.e., DCT type mode is 1). The progressive/interlaced value relative to a threshold may then be used as an indicator of whether the current video is progressive or interlaced.
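The per-macroblock voting rule just described can be sketched directly. This is a minimal illustration; the input format (a list of horizontal displacements in half-pixels paired with DCT type flags) is an assumption about how an encoder might expose these values.

```python
# Sketch of the progressive/interlaced accumulator: a macroblock votes only
# if it moved more than 8 half-pixels horizontally; frame-DCT coding
# (dct_type 0) adds one, field-DCT coding (dct_type 1) subtracts one.

def progressive_interlaced_value(macroblocks):
    """macroblocks: iterable of (dx_half_pixels, dct_type) pairs."""
    value = 0
    for dx, dct_type in macroblocks:
        if abs(dx) > 8:  # require sufficient horizontal movement
            value += 1 if dct_type == 0 else -1
    return value

# Three moving field-DCT macroblocks, one still macroblock (ignored),
# and one moving frame-DCT macroblock pull the value to -2.
mbs = [(12, 1), (-10, 1), (9, 1), (3, 0), (15, 0)]
score = progressive_interlaced_value(mbs)
```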
  • This value may be used to indicate black frames, unicolor frames, and frames with low information content.
  • to calculate the luminance DC differential value, the absolute differences of the DC values (only the first DC value of each macroblock) of consecutive macroblocks in a slice are first summed together.
  • the summed values of all the slices in the frame are then summed together to provide a total value.
  • This value may be used to help indicate black frames, unicolor frames, and frames with low information content or the opposite.
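The two summation steps above can be sketched in a few lines. This is a minimal sketch assuming the per-macroblock DC values arrive as one list per slice; the sample values are illustrative.

```python
# Sketch of the luminance DC differential value: within each slice, sum the
# absolute differences of consecutive macroblocks' first DC values, then
# sum those per-slice totals over the whole frame.

def luminance_dc_differential(slices):
    """slices: list of slices, each a list of per-macroblock DC values."""
    total = 0
    for dc_values in slices:
        total += sum(abs(b - a) for a, b in zip(dc_values, dc_values[1:]))
    return total

# A perfectly homogeneous (black or unicolor) frame has zero differential,
# which is the property the black/unicolor-frame detector relies on.
unicolor = [[128] * 6 for _ in range(4)]
textured = [[10, 200, 30, 180, 50, 90] for _ in range(4)]
```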
  • To calculate the chrominance DC differential value, the absolute differences of the DC values (or a subset) of consecutive macroblocks in a slice are first summed together, as above. Separate values could be calculated for the separate chrominance signals, e.g., Cr and Cb.
  • a color histogram could also be an output of the compression encoder or made to be one.
  • the histogram could be used to indicate unicolor frames.
  • the histogram could also serve as an independent signature device. That is, along with other parameters, or even by itself, it may serve to distinguish some types of content from others.
  • the histogram can be efficiently generated because the blocks are at a lower resolution than the original image.
  • the letterbox value is based on the luminance DC value.
  • the luminance DC total values of the macroblocks of the first two slices (slices 0 & 1) and the last two slices (slices 34 & 35 for PAL) are summed together and the maximum value of both values gives the Letterbox value.
  • the letterbox value may be computed based on luminance differential value or total value.
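The letterbox computation above reduces to a max over two sums. A minimal sketch, assuming the per-slice luminance DC totals are already available as a 36-entry list (the PAL slice count used in the text):

```python
# Sketch of the letterbox value: sum the luminance DC totals of the top two
# slices and of the bottom two slices, then take the maximum. Dark outer
# bands (letterboxed material) yield a low value.

def letterbox_value(slice_dc_totals):
    """slice_dc_totals: per-slice summed luminance DC values (36 for PAL)."""
    top = slice_dc_totals[0] + slice_dc_totals[1]
    bottom = slice_dc_totals[-2] + slice_dc_totals[-1]
    return max(top, bottom)

# Letterboxed material: near-zero luminance in the outer slices.
letterboxed = [5, 5] + [900] * 32 + [5, 5]
fullscreen = [850] * 36
```

Taking the maximum of the two bands keeps a single bright banner at one edge from masquerading as full-screen material only if both bands are bright, which matches the false-detection caveat discussed below.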
  • audio compression produces a variety of useful values that may be used for classification of content.
  • a function that operates on the quantized subband data could be used to generate these additional features.
  • the time stamps are used to retrieve frames, and to mark the content breaks detected.
  • the set of features discussed above may be reduced to generate a set of mid-level features derived therefrom. For example, the following were tested for their ability to aid in the detection of commercial breaks.
  • An indicator of scene change may be derived from the MAD total value of the upper part of the frame. In the event of a sharp scene change, this value jumps, for one or two frames, to a very high value and then returns again to a low value.
  • a sliding time window may be used to calculate the average MAD value around the actual frame and its successor. If the MAD value for the actual frame (or the sum of the actual value and its successor) exceeds a certain threshold in relation to average MAD value, a sharp scene change may be indicated by changing the value of a scene change detector.
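The sliding-window test just described can be sketched as follows. The window size and the ratio threshold here are illustrative assumptions; the text only specifies that the frame's MAD is compared against the average MAD around it.

```python
# Sketch of the sharp-scene-change detector: flag frames whose upper-part
# MAD value greatly exceeds the average MAD in a sliding window around them.

def detect_scene_changes(mad_values, window=5, ratio=4.0):
    cuts = []
    for i, mad in enumerate(mad_values):
        lo = max(0, i - window)
        hi = min(len(mad_values), i + window + 1)
        neighbors = [m for j, m in enumerate(mad_values[lo:hi], lo) if j != i]
        avg = sum(neighbors) / len(neighbors)
        if avg > 0 and mad > ratio * avg:
            cuts.append(i)  # MAD spike relative to its neighborhood
    return cuts

# Steady low MAD with one spike at frame 6 yields a single detected cut.
mads = [2.0, 2.1, 1.9, 2.0, 2.2, 2.0, 40.0, 2.1, 2.0, 1.9, 2.0]
cuts = detect_scene_changes(mads)
```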
  • if the luminance DC differential value remains under a certain threshold (multiple thresholds may be used), a black frame or a unicolor frame is detected.
  • Some broadcasters use unicolor frames (e.g. blue frames) instead of black frames between commercials. In this case, a unicolor frame indicator is useful for the commercial detector.
  • the interlaced/progressive value may be used to differentiate between interlaced and progressive video material.
  • a running sum may be generated by adding the interlaced/progressive value of each frame to the running sum. If this sum exceeds a threshold, for example, 20,000, the video material may be indicated as interlaced material or if below that threshold, it may be indicated as progressive material.
  • a deadband may be defined between the two thresholds where the video material is not defined. This indicator may be useful for detecting commercials since commercials are produced with different equipment due to different budgets. Therefore the video material in the commercial block can change quite often between interlaced and progressive video material.
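The running-sum classifier with a deadband can be sketched as below. The 20,000 threshold comes from the text, which maps a sum above it to interlaced material; the symmetric lower bound for the deadband is an illustrative assumption.

```python
# Sketch of the running-sum progressive/interlaced classifier with a
# deadband between the two thresholds where the material is left undefined.

def classify_material(frame_values, upper=20000, lower=-20000):
    """frame_values: per-frame progressive/interlaced values to accumulate."""
    total = sum(frame_values)
    if total > upper:
        return "interlaced"
    if total < lower:
        return "progressive"
    return "undefined"  # inside the deadband

label = classify_material([15000, 10000])
```

Frequent flips of this label within a short stretch of video are what the text suggests as a commercial-block cue, since commercials are produced on varied equipment.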
  • the letterbox detector can be used to distinguish between material with a distinct aspect ratio (e.g., of 4:3 and 16:9).
  • Some video, for example commercials, is sent out in formats that are different from the main program material.
  • the main material could be in a letterbox (like a movie) or the commercial could be in a letterbox, the important data being the change itself.
  • the letterbox value indicates whether the two upper and two lower slices are black. Advertisement banners, or small objects on a black background, can result in a false detection, but such sequences are most probably not encapsulated by black (unicolor) frames and therefore have only a minor influence on the commercial detector.
  • a short letterbox sequence encapsulated by black (unicolor) frames is a good indication for a commercial block.
  • the keyframe distance detector is a measure of the time (or number of frames or GOPs) between scene breaks.
  • the average keyframe frame distance detector can be used to indicate slowly changing video material vs. rapidly changing video material.
  • during commercials, the keyframe distance is low, typically varying around 10-15 GOPs.
  • in other material, the keyframe distance can be around 40 GOPs, sometimes reaching values over 100.
  • the average keyframe distance is computed as the running average of the keyframe distances within a window of 5 keyframes. For example, a threshold of 5 keyframes may be used to distinguish commercial or action content from other content.
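The running average over a window of 5 keyframes can be sketched as follows. The GOP-count threshold of 20 used in the example is an illustrative assumption sitting between the 10-15 GOP commercial range and the 40+ GOP range cited above.

```python
# Sketch of the average keyframe distance detector: average the distances
# (in GOPs) between the last few scene cuts and compare to a threshold.

def average_keyframe_distance(distances, window=5):
    """distances: keyframe-to-keyframe distances in GOPs, newest last."""
    recent = distances[-window:]
    return sum(recent) / len(recent)

def looks_like_commercial(distances, threshold=20):
    """Low average distance (frequent cuts) suggests commercial/action."""
    return average_keyframe_distance(distances) < threshold

commercial_like = [12, 10, 15, 11, 13]  # roughly 10-15 GOPs between cuts
program_like = [42, 55, 38, 101, 47]    # roughly 40+ GOPs between cuts
```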
  • the tables indicate the program genre, with columns for black frames, letterbox, progressive/interlaced change, and average keyframe distance.
  • in Table I, for each feature, it was determined whether that feature alone could be used as an indicator of the location of the commercial. The conclusion is indicated as either yes or no.
  • in Table II, for each feature, it was determined whether that feature alone could be used to determine the correct boundaries of the commercial. Table I shows that black-frame presence and progressive/interlaced material changes are strong indicators of the location of the commercial break within the program. The keyframe distance is a much weaker indicator than the black-frame and progressive/interlaced changes. Reliance on progressive/interlaced change detection produces many false positives but rarely misses a commercial boundary. This may be true of other features as well. A technique was therefore developed in which one feature is used as a trigger and one or more other features are used to verify, so as to eliminate false positives.
  • Table II shows that individual features cannot be used alone to reliably detect the true boundaries of the commercial breaks.
  • the tolerance used for generating the table required that strict boundaries (within 2 seconds) be found. That is, if the commercial boundary was detected early or late by more than 2 seconds, it was regarded as a clean miss. If this criterion were relaxed, some of the features, particularly unicolor frames, could be used alone to reasonably good effect.
  • the columns indicate whether the feature can be used by itself to identify correctly both the beginning and the ending of a commercial break. Black frames can be misleading because the broadcasters do not always insert them properly and because the intensity level may vary such that the method will not detect them.
  • This tolerance may be adjusted by providing a threshold that permits greater variability in luminance among adjacent frames in the test for black (monocolor) frames.
  • the letterbox and keyframe distance appear to be unreliable for detection of the boundaries of commercial breaks.
  • black frames can be used to detect commercial boundaries with substantial accuracy overall, on average, if the criterion for missing is softened.
  • the above table counted a two-second miss as a complete failure, so a detector based on black-frame detection still provides rather accurate commercial detection.
  • as video is compressed, the raw data and the above values are computed for each I-frame in step S90.
  • in step S100, boundary sequences are identified and recorded, with a frame identification, if present.
  • in step S110, verification data are identified and, if present, recorded with the appropriate frame identifiers. If the process is incomplete at step S120, the next increment of video is compressed in step S90.
  • when the process is completed, a set of data describing the video sequence in terms of the above features is stored in association with the compressed video, and when the video is displayed, appropriate editing may be performed as required in step S130.
  • FIG. 3 it may be desired to allow identification and editing of video material in a process that is closer to a real time process. For example, if a personal digital recorder is buffering broadcast video material by compressing the broadcast and the user is viewing this material with a certain delay, it would be beneficial to be able to identify content sequences as the broadcast is being compressed. This is instead of completing compression and only then identifying the content sequences and applying the appropriate editing; for example turning the volume down during commercials. In an alternative process for identifying particular forms of content, video data is compressed S 10 .
  • step S 20 the system checks for the presence of a boundary trigger event, for example, a sequence of black or unicolor frames as indicated by differential luminance detection or a change from progressive to interlace. If a trigger event is detected, a flag indicating the detection of a start of a type of content has begun is set in step S 30 .
  • the record includes an identification of the frame where it was found so that a time sequence of events can be generated. There may be many flags for each of a variety of different types of video sequences (e.g., one for commercials, one for violent content, one for action, one for talking heads, etc.)
  • In step S 40 , the presence of a type of data that may be used to verify a commercial or other type of video content sequence is identified, if present. If such data is found, it is stored in step S 50 .
  • In step S 55 , it is determined whether there are bounded sequences of subject matter that may be verified as being of a particular type. If found, they are recorded in step S 65 along with an indication of the frame where they were identified. If editing is applicable at step S 65 , instructions for editing can be recorded and later (or presently) implemented at this step. If the compression process is completed in step S 70 , then the process terminates. If not, it resumes at step S 10 .
  • the events that indicate the start and/or end of particular types of video, such as commercials, may be any suitable feature.
  • One feature that has been discovered by experiment to be particularly useful for commercial detection is the frame distance between detected unicolor or black frames (or consecutive sequences of black or unicolor frames). These may be used as triggers because, in certain cases, broadcasters in certain countries have started using other monochrome frames instead of black frames. If the black frame distance conforms to a certain pattern (the distance falls between certain thresholds, for example 20 to 40 seconds), then the algorithm starts counting the number of black frames. After three black frames, the probability of commercial detection increases and a potential commercial end is set. Any of the different features could be used as commercial triggers; however, a much more complex algorithm may be desirable for verification.
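The black-frame distance trigger described above can be sketched as follows. This is a minimal illustrative sketch, not the claimed method; the function name, the data layout (a list of black-frame positions in frame numbers), and the 25 frames-per-second rate are assumptions.

```python
FPS = 25  # assumed European material, 25 frames per second

def black_frame_trigger(black_frame_positions, min_gap_s=20, max_gap_s=40):
    """Return True once three black frames have been seen whose successive
    distances all fall inside the [min_gap_s, max_gap_s] window, mirroring
    the '20 to 40 seconds' pattern described in the text."""
    count = 1  # the first black frame opens a candidate pattern
    for prev, cur in zip(black_frame_positions, black_frame_positions[1:]):
        gap_s = (cur - prev) / FPS
        if min_gap_s <= gap_s <= max_gap_s:
            count += 1
            if count >= 3:  # probability of a commercial break increases
                return True
        else:
            count = 1  # pattern broken, start over
    return False
```

In practice, such a trigger would only raise a candidate flag; the more complex verification mentioned above would still be applied afterward.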
  • the black frame sequence appearance was used as a trigger for commercial detection.
  • Normally, black frames (or unicolor frames) are used by the content creators to delineate commercials within a commercial break, as well as the beginning and ending of a whole commercial break. It may be assumed that a commercial break starts with a series of black (unicolor) frames and that during the commercial break another black frame will follow within 1200 frames. Constraints may be placed on the duration of the commercials. For example, to be verified as a commercial, a sequence may be required to be no shorter than 1,500 frames and no longer than 10,000 frames (for European content at 25 frames per second; US content is 30 frames per second).
  • An additional constraint may be applied to the minimum time between candidate sequences before they will be labeled separate commercials. For example, commercials may be required to be at least two minutes apart (3000 frames). The last constraint may be important for the linking of the segments that potentially represent commercials. If the linking is allowed over too long a period of time, overly long “commercial” breaks might result which include non-commercial subject matter.
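The duration and separation constraints above can be sketched as a linking-and-filtering pass over candidate segments. This is an illustrative sketch; the function name and the (start, end) frame-pair representation are assumptions, while the 1,500/10,000-frame duration bounds and the 3000-frame separation come from the text.

```python
def link_and_filter(segments, min_len=1500, max_len=10000, min_apart=3000):
    """Link candidate (start, end) frame segments that lie closer together
    than min_apart, then keep only those whose total length is plausible
    for a commercial break."""
    linked = []
    for start, end in sorted(segments):
        if linked and start - linked[-1][1] < min_apart:
            # too close to the previous candidate: merge them
            linked[-1] = (linked[-1][0], end)
        else:
            linked.append((start, end))
    # discard linked segments that are too short or too long
    return [(s, e) for s, e in linked if min_len <= e - s <= max_len]
```

Limiting `min_apart` keeps the linking from producing the overly long "commercial" breaks the text warns about.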
  • When a potential commercial is detected, for example by detection of a black frame, other features are tested to increase or decrease the probability that the black frame, or other trigger event, actually indicates the start of a commercial break.
  • a high cut rate, high MAD density, or low keyframe distance may serve as verifiers.
  • a threshold level may be used such that the probability of a commercial is increased if the threshold is exceeded and reduced if not.
  • the probability may be proportional to the inverse of the keyframe distance and proportional to the MAD density.
  • the average keyframe distance between scene cuts can be as low as 5 GOPs during commercials.
  • the threshold used for the keyframe distance can be varied in the range of 10 to 15 for good results. Again, segments that are close to each other can be linked to infer the whole commercial break.
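A verifier combining these signals can be sketched as a simple score that is proportional to MAD density and cut rate and proportional to the inverse of the keyframe distance, as described above. All names, weights, and normalizers here are illustrative assumptions, not values from the text (apart from the 10-15 keyframe-distance threshold range).

```python
def commercial_probability(keyframe_distance, mad_density, cut_rate,
                           kf_threshold=12.0, mad_norm=1.0, cut_norm=1.0):
    """Toy verifier score for a candidate commercial segment."""
    score = 0.0
    score += 1.0 / max(keyframe_distance, 1e-6)  # low keyframe distance raises it
    score += mad_density / mad_norm              # high MAD density raises it
    score += cut_rate / cut_norm                 # high cut rate raises it
    # exceeding the keyframe-distance threshold reduces the probability
    if keyframe_distance > kf_threshold:
        score *= 0.5
    return score
```

A segment with a keyframe distance around 5 GOPs and dense motion activity thus scores far higher than typical program material with a distance of 40 GOPs or more.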
  • the above feature set provided by a compression encoder may also be applied in sophisticated ways to recognize different kinds of content.
  • these features and further features derived therefrom may also serve as inputs to a neural network, hidden Markov model, Bayesian network, or other classification engine to permit recognition of various types of video content.
  • the entire feature set could be used to train a network to identify commercials, leaving it to the training process to determine the particular import of the various features in determining the start and end events that bound the commercials.
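As a minimal stand-in for such a trainable classifier, a perceptron over a feature vector (e.g., keyframe distance, MAD density, bit rate) can be sketched as follows. This is not the patent's classifier; it merely illustrates letting a training process determine the import of the features.

```python
def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """Train a linear classifier on feature vectors; labels are +1
    (commercial) or -1 (other content). Purely illustrative."""
    n = len(samples[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified: nudge the weights
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Classify a feature vector with the trained weights."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

A neural network, hidden Markov model, or Bayesian network would replace this linear model in a full implementation, but the training-driven weighting of features is the same idea.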
  • Audio compression encoders produce representations of audio data that will be recognized as providing unique signatures that can be recognized in an automated system to help distinguish certain kinds of content from others.
  • the current bit rate or quantizer may indicate the quantity of silent time intervals present.
  • the DCT coefficients corresponding to high-action, attention-grabbing material may be very different from those corresponding to the main program material and these signature features may be defined in a classifier, such as a Bayesian classifier, neural network, or hidden Markov model.
  • features derived from a compression process are used to classify content in a video stream, it is clear that these same features may be used in conjunction with other features (e.g., real-time features) for the same purposes.
  • real-time audio volume may be used in conjunction with black-frame (or unicolor frame) detection to identify transition to/from commercials.
  • the compression features may be employed as a secondary feature set to augment a primary feature set used for detailed content analysis, such as text recognition, face recognition, etc.

Abstract

The process of compressing video requires the calculation of a variety of data that are used in the process of compression. The invention exploits some or all of these data for purposes of content detection. For example, these data may be leveraged for purposes of commercial detection. The luminance, motion vector field, residual values, quantizer, bit rate, etc. may all be used, either directly or in combination, as signatures of content. A process for content detection may employ one or more features as indicators of the start and/or end of a sequence containing a particular type of content, and other features as verifiers of the type of content bounded by these start/end indicators. The features may be combined and/or refined to produce higher-level feature data with good computational economy and content-classification utility.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • The present application is related to the following patents/applications, which are hereby incorporated by reference as if fully set forth in their entireties herein. [0001]
  • 1. “Apparatus and Method for Locating a Commercial Disposed Within a Video Data Stream,” invented by: Nevenka Dimitrova, Thomas McGee, Herman Elenbaas, Eugene Leyvi, Carolyn Ramsey and David Berkowitz, Filed Jul. 28, 1998, U.S. Pat. No. 6,100,941. [0002]
  • 2. “Automatic Signature-Based Spotting, Learning and Extracting of Commercials and Other Video Content,” invented by Dimitrova, McGee, Agnihotri, filed Oct. 13, 1999, U.S. patent application Ser. No. 09/417,288.[0003]
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0004]
  • The invention relates to the detection of content in video data streams, for example commercials, and more particularly to the accurate identification of transitions from one type of content to another, such as the temporal boundaries of a commercial. [0005]
  • 2. Background of the Invention [0006]
  • Personal video receivers/recorders, devices that modify and/or record the content of broadcast video, are becoming increasingly popular. One example is a personal video recorder, which automatically records programs on a hard disk responsively to stored user preferences. One of the features under investigation for such systems is content detection. For example, a system that can detect commercials may allow substitute advertisements to be inserted in a video stream (“commercial swapping”) or temporary halting of the video at the end of a commercial to prevent a user, momentarily distracted during a commercial, from missing any of the main program content. [0007]
  • There are known methods for detecting commercials. One method is the detection of high cut rate due to a sudden change in the scene with no fade or movement transition between temporally-adjacent frames. Cuts can include fades so the cuts do not have to be hard cuts. A more robust criterion may be high transition rates. Another indicator is the presence of a black frame (or monochrome frame) coupled with silence, which may indicate the beginning of a commercial break. Another known indicator of commercials is high activity, an indicator derived from the observation/assumption that objects move faster and change more frequently during commercials than during the feature (non-commercial) material. These methods show somewhat promising results, but reliability is still wanting. There have been many issued patents devoted to commercial isolation that employ detection of monochrome frames and high activity. The use of monochrome frames, scene breaks, and action, as measured by a technique called “edge change ratio and motion vector length,” has been reported. [0008]
  • The combination of black frame detection and “activity,” as represented by a rate of change of luminance level, has been discussed. Unfortunately, it is difficult to determine what constitutes “activity” and to identify the precise points of onset and termination. Black frames produce false positives because, among other things, they are also found in dissolves. Thus, any sequence of black frames followed by a high-action sequence can be misjudged and skipped as a commercial. [0009]
  • Another technique is to measure the temporal distance between black frame sequences to determine the presence of a commercial. Another technique identified commercials based on matching images. In other words, differences in the qualities of the image content were used as an indicator. Also known is the use of a predetermined indicator within the video stream which demarcates commercial boundaries, but this is simply a method of indicating a previously known commercial, not a method of detecting them. Commercial detection based on trained neural networks configured to distinguish content based on analysis of the video stream has been proposed, but has not met with much success so far. Also, neural networks are complex and expensive to implement for this purpose. [0010]
  • SUMMARY OF THE INVENTION
  • Briefly, the invention employs low and mid-level features that are automatically generated in the process of compressing video as inputs to various classifier tools. The classifier tools are trained to identify commercial features and generate metrics responsively to them. The metrics are employed in combination (a super-classifier) to detect the boundaries of the commercials. The benefit of using these low- and mid-level features is that they can be generated and processed very quickly using relatively inexpensive electronics, such as using an application-specific integrated circuit (ASIC) or application-specific instruction-set processor (ASIP). [0011]
  • Generally speaking, a dedicated chip normally performs image compression on consumer appliances, since the processes involved require high speed. One aspect of the invention is to provide a way to leverage the results of the compression process, not only for compression, but also for the analysis of the video required to detect certain types of content. One example of a device that can compress video implements the Motion Pictures Expert Group (MPEG) compression scheme known as MPEG-2. [0012]
  • In MPEG-2, video data are represented by video sequences, each including a group of pictures (GOP), each GOP including pieces of data that describe the pictures or “frames” that make up the video. The frame is the primary coding unit of the video sequence. A picture consists of three rectangular matrices, one representing luminance (the intensity of the various portions of a frame) and two representing chrominance (Cb and Cr; the color of the various portions of a frame). The luminance matrix has an even number of rows and columns. The chrominance matrices are one-half the size of the Y matrix in each direction (horizontal and vertical) because human perception is less detail-sensitive for color than it is for luminosity. Each frame is further divided into one or more contiguous macroblocks, grouped into “slices.” The order of the macroblocks within a slice is from left-to-right and top-to-bottom. The macroblock is the basic coding unit in the MPEG-2 scheme. It represents a 16×16 pixel part of a frame. Since each chrominance component has one-half the vertical and horizontal resolution of the luminance component, a macroblock consists of four luminance blocks, one Cb block, and one Cr block. Each luminance macroblock is further divided into four blocks of 8×8 pixels. [0013]
  • In MPEG-2, some frames, called Intra-frames or “I-frames,” are represented by data that is independent of the content of any other frame. This allows a playback device to enter the video file at any point where such a frame is located. In MPEG-2, frames are grouped into a group of pictures (GOP), with an I-frame always leading any group of pictures. I-frames are distinct from Predicted frames or “P-frames” which are defined partly by data representing the frame corresponding to the P-frame and partly on data representing one or more previous frames. Bidirectional frames or “B-frames” are represented by data from both prior and future frames as well as the data corresponding to the B-frame itself. [0014]
  • The way in which data is compressed in MPEG-2 depends on the type of frame. The blocks of an I-frame are each translated into a different format called discrete cosine transform (DCT). This process can be roughly described as defining the appearance of each block as a sum of different predefined wave patterns so a highly detailed pattern would include a lot of short wave patterns and a smooth pattern would include long (or no) waves. The reason for doing this is that in video, many of the blocks are smooth. This allows the data that describes the contributions of short waves in such blocks to be greatly compressed by a process called run-length encoding. Also, when the video must be forced into a bottleneck and certain data have to be sacrificed, throwing out certain data from the DCT representation yields a better looking picture than throwing out data in the original image, which could, for example, leave the pictures full of holes. [0015]
  • The DCT data can be represented as many different wavy patterns, or only a few, with big steps between them. Initially, the DCT data are very fine-grained. But as part of the compression process, the DCT data are subjected to a process called quantization where the relative contributions of the different wave patterns are represented by coarse or fine-grained scales, depending on how much the data has to be compressed. [0016]
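The DCT and quantization steps described above can be sketched for a single 8×8 block. This is a naive textbook 2-D DCT-II, not an optimized MPEG-2 implementation, and the uniform quantizer is a simplification of the MPEG-2 quantization matrices; all names are illustrative.

```python
import math

def dct_2d(block):
    """Naive 8x8 2-D DCT-II: express the block as contributions of
    predefined cosine wave patterns, as described in the text."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            out[u][v] = cu * cv * s
    return out

def quantize(coeffs, scale):
    """Uniform quantization: a larger scale gives coarser steps, hence
    more compression at lower fidelity."""
    return [[round(c / scale) for c in row] for row in coeffs]
```

For a smooth (here, constant) block only the DC coefficient survives quantization, which is exactly why run-length encoding of the zeroed high-frequency terms compresses so well.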
  • Compressing video images to generate P-frames and B-frames involves more complex processes. A computer takes a first image and its predecessor image and looks for where each block (or macroblock, depending on the selection of the user) moved from one image to the next. Instead of describing the whole block in the P-frame, the MPEG-2 data simply indicates where the block in the earlier frame moved to in the new frame. This is described by a vector (a line, or arrow) whose length indicates the distance of the movement and whose orientation indicates the direction of the movement. This kind of description is faulty, however, because not all motion in video can be described in terms of blobs moving around. The defect, however, is fixed by transmitting a correction that defines the difference between the image as predicted by a motion description and the image as it actually looked. This correction is called the residual. The motion data and residual data are subjected to the DCT and quantization, just as the I-frame image data are. B-frames are similar to P-frames, except that they can refer to both previous and future frames in encoding their data. [0017]
  • The example video compression device generates the following data for each frame, as a byproduct of the compression process. The following are examples of what may be economically derived from an encoder and are by no means comprehensive. In addition, they would vary depending on the type of encoder. [0018]
  • frame indicator: a frame identifier that can be used to indicate the type of frame (I, P, or B). [0019]
  • luminance DC total value: an indication of the luminance of an I-frame. [0020]
  • quantizer scale: the quantization scale used for the DCT data. [0021]
  • MAD (Mean Absolute Difference): the average of the magnitudes of the vectors used to describe a P- or B-image in terms of movement of blocks. There are several that may be generated: for example one representing only an upper or lower portion of a whole frame or one that includes all blocks of the frame. [0022]
  • Current bit rate: The amount of data representing a GOP [0023]
  • Progressive/Interlaced value: An indicator of whether the image is an interlaced type, usually found in conventional television video, or progressive type, usually found in video from movies and computer animation. [0024]
  • Luminance DC differential value: This value represents the variation in luminance among the macroblocks of a frame. Low variation means a homogeneous image, which could be a blank screen. [0025]
  • Chrominance DC total value. Analogous to luminance value but based on chrominance component rather than the luminance component. [0026]
  • Chrominance DC differential value. Analogous to luminance differential value but based on chrominance component rather than luminance component. [0027]
  • Letterbox value: indicates the shape of the video images by looking for homogeneous bands at the top and bottom of the frames, as when a wide-screen format is painted on a television screen. [0028]
  • Time stamps: These are not indicia of commercials, but indicate a location in a video stream and are used to mark the beginnings and ends of video sequences distinguishable by content. [0029]
  • Scene change detection: This indicates a sudden change in scene content due to abrupt change in average MAD value. [0030]
  • Keyframe distance: This is the number of frames between scene cuts. [0031]
  • As an example of a type of content that may be identified and temporally bracketed, over 15 hours of video with commercials were tested. The effectiveness of the different features, and combinations of features, as indicators of the beginnings and ends of commercial sequences were determined. It was determined that the individual indicators discussed above are less reliable on their own than when combined. These tests confirmed that various ways of combining these data may be used to produce reliable content detection, particularly commercial detection. [0032]
  • The invention will be described in connection with certain preferred embodiments, with reference to the following illustrative figures so that it may be more fully understood. With reference to the figures, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.[0033]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is block diagram of a hardware system for implementing a process of video-content identification based on compression features according to an embodiment of the invention. [0034]
  • FIG. 2 is a flow chart illustrating a process that makes use of compression features for identification of content sequences according to an embodiment of the invention. [0035]
  • FIG. 3 is a flow chart illustrating a process that makes use of compression features for identification of content sequences according to another embodiment of the invention.[0036]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Referring to FIG. 1, a system is shown that may be used for detecting content by leveraging data produced during video compression. In the illustrated embodiment, an MPEG encoder 100 encodes video data 90 from a live data feed such as the Internet, a data store, a broadcast, or any other source. The MPEG encoder generates compressed data that may be stored in a data store 110 such as a hard disk, a DVD, CDROM, or other data storage medium. Alternatively, the data may be buffered for distribution by any suitable means. The MPEG encoder 100 may generate a variety of different values, some of them listed below. [0037]
  • frame indicator [0038]
  • luminance DC total value [0039]
  • quantizer scale [0040]
  • MAD (Mean Absolute Difference) total value, lower part [0041]
  • Current bit rate [0042]
  • Field move average in X-direction [0043]
  • Luminance differential value [0044]
  • MAD total value, upper part: sum of all MAD values [0045]
  • MAD total value, lower part: sum of all MAD values [0046]
  • Letterbox value [0047]
  • Time stamp [0048]
  • Chrominance DC total value [0049]
  • Chrominance differential value [0050]
  • Generally, chip-based compression encoders do not generate all of these values and do not expose the registers that hold those values that they do generate, because normally they have no other use. In the present invention, these values are applied to additional processes for content recognition. The compression chips may need to be modified to generate some of these values and to expose the registers that hold them to outside processes. The computational burden on such a chip would not be increased significantly by doing this, and the required design modifications of the compression chip are well within the competence of a person of ordinary skill to perform. Pure software systems can provide these data to any other process by simple software modifications, such as variable definition changes. The above may be output to a content analyzer 120 in raw form by the MPEG encoder 100, or the data may be refined first, depending on the allocation (between the encoder 100 and the analyzer 120) of the functions described herein. These data are standard in the MPEG field, but are described for convenience below, along with some comments regarding how they may be utilized or filtered. [0051]
  • A playback selector 130 may use the results from the content analyzer to edit the compressed video. For example, where commercials or high-action sequences are to be deleted from the video material, the playback selector can skip over material bracketed by markers resulting from the content analyzer 120 analysis and stored with the MPEG file in the data store 110. The MPEG data are described below as an example of the kinds of data that may be available from a compression process. [0052]
  • Frame Indicator [0053]
  • The frame indicator is just an ordinal identifier of the frame. The frame indicator distinguishes between I-frames and P-frames (and B-frames). For a GOP size of 6, I-frames have a value of 0 and P-frames (or B-frames) a value of 1,2,3,4 or 5. The I and P or B frame indication may be used for content detection as discussed below. [0054]
  • Luminance Total Value [0055]
  • The luminance total value is the sum of the first (out of 4) luminance DC values of each macroblock over the entire frame. Any selection of the DC (chrominance or luminance) values may also be used. The former value is useful for I-frames only. For P-frames and B-frames, the luminance total value is calculated based on the previous frames. The luminance total value may be used for black frame detection. Alternatively, as discussed below, an aggregate value, the luminance differential value, may provide detection of unicolor frames (non-black but homogeneous-color frames) and other advantages for this task. The luminance total value is advantageous for certain kinds of detection, such as for detection of flashes. [0056]
  • Quantizer Scale [0057]
  • The quantizer scale indicates the quantization value used by the [0058] MPEG encoder 100 for quantization of the video data. This quantization value may be adaptive to ensure that the bit rate stays in a predefined band. This feature is useful for detecting very complex or fast moving scenes. The value is relevant for I-frames as well as P-frames and B-frames.
  • MAD Total Value-Upper Part [0059]
  • As discussed above, part of the MPEG encoding process is the estimation of the motion of fields of color and luminance from one frame to another. The results of this process are displacement vectors whose values are determined by the MAD matching criterion. The MAD total value of the upper part can indicate sharp scene changes. The frame is split into an upper (slices 0-25) and a lower part (slices 26-35). In the upper part of the frame no subtitles appear, and therefore no false detection due to text changes can occur. The MAD total value-upper part is the sum of all MAD values of the macroblocks of slices 0-25. In the case of static scenes, the macroblocks will be only slightly (if at all) displaced and will match quite well with the reference macroblocks. Therefore the MAD value will be very low (approaching zero). At a sharp scene change, nearly no matching macroblocks will be found, or only ones with a high content difference. Therefore the MAD value at a sharp scene change is much higher than the average MAD value. [0060]
  • MAD Total Value, Lower Part [0061]
  • The calculation of the value is the same as the one for the upper part of the frame. The MAD total value lower part is the sum of all MAD values of the macro blocks of the slices 26-35. Again, the frames are split because each change in subtitles (very often used in some countries in Europe) leads to a false scene change detection. The MAD value of the lower frame part can be useful as a subtitle change detector and as a support feature for the sharp scene change detector. [0062]
  • Current Bit Rate [0063]
  • The current bit rate indicates the bit rate for the transmission of the MPEG data and has a fixed value per GOP. To hold the current bit rate in a certain band the quantizer value is increased or decreased depending on the actual current bit rate. This value is used in combination with the quantizer value to indicate fast varying or very complex scenes. [0064]
  • Progressive/Interlaced Value [0065]
  • A field move average value in the X-direction indicates the displacement value of each macro block in the x-direction. This may be used, for example, as a check for sufficient movement in the scene, which in turn may be used to indicate whether there has been a shift from progressive to interlace video or the reverse. If the absolute value of the horizontal displacement of the actual macroblock is larger than 8 half pixels (control for sufficient movement either to the left or to the right), the progressive/interlaced value for the actual frame may be increased by one if the macroblock is frame DCT encoded (i.e., DCT type mode of the macroblock is 0) or decreased by one if the macroblock is field DCT coded (i.e., DCT type mode is 1). The progressive/interlaced value relative to a threshold may then be used as an indicator of whether the current video is progressive or interlaced. [0066]
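The per-frame accounting described above can be sketched as follows. The data layout (a list of per-macroblock pairs of horizontal displacement in half-pels and DCT type, where 0 means frame-DCT and 1 means field-DCT coded) is an assumption for illustration.

```python
def progressive_interlaced_value(macroblocks):
    """Accumulate the progressive/interlaced value for one frame:
    for each macroblock with sufficient horizontal movement (more than
    8 half-pels), add one if it is frame-DCT coded, subtract one if it
    is field-DCT coded."""
    value = 0
    for dx_half_pels, dct_type in macroblocks:
        if abs(dx_half_pels) > 8:  # enough movement left or right
            value += 1 if dct_type == 0 else -1
    return value
```

A positive value thus leans toward progressive material and a negative value toward interlaced, to be judged against a threshold as the text describes.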
  • Luminance DC Differential Value [0067]
  • This value may be used to indicate black frames, unicolor frames, and frames with low information content. To calculate the luminance DC differential value, the absolute difference of the DC values (only first DC value of each macroblock) of consecutive macroblocks in a slice are first summed together. The summed values of all the slices in the frame are then summed together to provide a total value. [0068]
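The calculation above can be sketched directly. The input layout (a list of slices, each a list of the first DC value of each macroblock) is an assumption for illustration.

```python
def luminance_dc_differential(slices):
    """Sum, over all slices, the absolute differences between the first
    DC values of consecutive macroblocks. A low total suggests a black,
    unicolor, or low-information frame."""
    total = 0
    for dc_values in slices:
        total += sum(abs(b - a) for a, b in zip(dc_values, dc_values[1:]))
    return total
```

A perfectly uniform frame yields zero, while any spatial detail raises the total, which is why a small threshold on this value detects black and unicolor frames.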
  • Chrominance DC Differential Value [0069]
  • This value may be used to help indicate black frames, unicolor frames, and frames with low information content or the opposite. To calculate the chrominance DC differential value, the absolute difference of the DC values (or a subset) of consecutive macroblocks in a slice are first summed together as above. Separate values could be calculated for the separate chrominance signals, e.g., Cr and Cb. [0070]
  • A color histogram could also be an output of the compression encoder or made to be one. The histogram could be used to indicate unicolor frames. The histogram could also serve as an independent signature device. That is, along with other parameters, or even by itself, it may serve to distinguish some types of content from others. The histogram can be efficiently generated because the blocks are at a lower resolution than the original image. [0071]
  • Letterbox Value [0072]
  • The letterbox value is based on the luminance DC value. The luminance DC total values of the macroblocks of the first two slices (slices 0 & 1) and the last two slices (slices 34 & 35 for PAL) are summed together and the maximum value of both values gives the Letterbox value. The letterbox value may be computed based on luminance differential value or total value. [0073]
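The letterbox value computation above can be sketched as follows. The input layout (a list of per-slice luminance DC sums, with 36 slices for PAL) is an assumption for illustration.

```python
def letterbox_value(slice_dc_sums):
    """Letterbox value: the maximum of the summed luminance DC totals of
    the two top slices (0 and 1) and the two bottom slices (the last
    two, e.g., 34 and 35 for PAL)."""
    top = slice_dc_sums[0] + slice_dc_sums[1]
    bottom = slice_dc_sums[-2] + slice_dc_sums[-1]
    return max(top, bottom)
```

A small letterbox value indicates dark bands at both the top and bottom of the frame, suggesting wide-screen material painted onto a television-shaped frame.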
  • Audio Features [0074]
  • As discussed below, audio compression produces a variety of useful values that may be used for classification of content. For example, a function that operates on the quantized subband data could be used to generate these additional features. [0075]
  • Time Stamps [0076]
  • The time stamps are used to retrieve frames, and to mark the content breaks detected. [0077]
  • The set of features discussed above may be reduced to generate a set of mid-level features derived therefrom. For example, the following were tested for their ability to aid in the detection of commercial breaks. [0078]
  • Scene change detection [0079]
  • Black frame/Unicolor frame detection [0080]
  • Letterbox detection [0081]
  • Interlaced/progressive Indicator [0082]
  • Keyframe distance [0083]
  • These features are described below. [0084]
  • Scene Change Detection [0085]
  • An indicator of scene change may be derived from the MAD total value of the upper part of the frame. In the event of a sharp scene change, this value jumps, for one or two frames, to a very high value and then returns again to a low value. A sliding time window may be used to calculate the average MAD value around the actual frame and its successor. If the MAD value for the actual frame (or the sum of the actual value and its successor) exceeds a certain threshold in relation to average MAD value, a sharp scene change may be indicated by changing the value of a scene change detector. [0086]
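The sliding-window test above can be sketched over a sequence of per-frame MAD values for the upper part of the frame. The window size and the threshold ratio here are illustrative assumptions, as is the function name.

```python
def detect_scene_changes(mad_upper, window=10, ratio=3.0):
    """Flag frames whose upper-part MAD value exceeds `ratio` times the
    average MAD of the surrounding frames, indicating a sharp scene
    change as described in the text."""
    changes = []
    for i, mad in enumerate(mad_upper):
        lo = max(0, i - window)
        hi = min(len(mad_upper), i + window + 1)
        neighborhood = mad_upper[lo:i] + mad_upper[i + 1:hi]
        if not neighborhood:
            continue
        avg = sum(neighborhood) / len(neighborhood)
        if avg > 0 and mad > ratio * avg:
            changes.append(i)
    return changes
```

Because the MAD value jumps for only one or two frames at a cut and then falls back, comparing each frame against its local average isolates the spike cleanly.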
  • Black Frame/Unicolor Frame Detector [0087]
  • If the luminance DC differential value remains under a certain threshold (multiple thresholds may be used), a black frame or a unicolor frame is detected. Some broadcasters use unicolor frames (e.g., blue frames) instead of black frames between commercials. In this case, a unicolor frame indicator is useful for the commercial detector. [0088]
  • Interlaced/Progressive Indicator [0089]
  • The interlaced/progressive value may be used to differentiate between interlaced and progressive video material. A running sum may be generated by adding the interlaced/progressive value of each frame to the running sum. If this sum exceeds an upper threshold, for example, 20,000, the video material may be indicated as interlaced material; if it falls below a lower threshold, it may be indicated as progressive material. A deadband may be defined between the two thresholds where the video material is not defined. This indicator may be useful for detecting commercials since commercials are produced with different equipment due to different budgets. Therefore the video material in the commercial block can change quite often between interlaced and progressive video material. [0090]
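The running-sum classification above can be sketched as follows. Only the 20,000 upper figure appears in the text; the lower threshold and all names are illustrative assumptions.

```python
def classify_material(frame_values, upper=20000, lower=10000):
    """Sum the per-frame interlaced/progressive values and classify the
    material, with a deadband between the two thresholds where no
    determination is made."""
    total = sum(frame_values)
    if total > upper:
        return "interlaced"
    if total < lower:
        return "progressive"
    return "undefined"  # deadband: material not classified
```

Frequent flips of this classification within a segment would then hint at the equipment changes typical of a commercial block.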
  • Letterbox Detector [0091]
  • The letterbox detector can be used to distinguish between material with distinct aspect ratios (e.g., 4:3 and 16:9). Some video, for example a commercial, is sent out in a format different from that of the main program material. The main material could be in letterbox format (like a movie) or the commercial could be, the important datum being the change itself. The letterbox detector indicates whether the two upper and two lower slices of the frame are black. Advertisement banners, or small objects on a black background, can result in a false detection, but such sequences are most probably not encapsulated by black (unicolor) frames and therefore have only a minor influence on the commercial detector. A short letterbox sequence encapsulated by black (unicolor) frames is a good indication of a commercial block. [0092]
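A sketch of the slice test, assuming each frame is summarized as a list of per-slice average luminance values ordered top to bottom; the black threshold is an illustrative assumption:

```python
def is_letterbox(slice_luma, black_threshold=20):
    """Return True if the two upper and two lower horizontal slices
    of the frame are (near-)black, indicating letterbox format."""
    if len(slice_luma) < 5:
        return False  # too few slices to have black borders plus content
    border = slice_luma[:2] + slice_luma[-2:]
    return all(l < black_threshold for l in border)
```

As noted above, the useful signal for commercial detection is a transition in this indicator, not its absolute value.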
  • Keyframe Distance Detector [0093]
  • The keyframe distance detector measures the time (or number of frames or GOPs) between scene breaks. The average keyframe distance can be used to distinguish slowly changing video material from rapidly changing material. During commercial breaks the keyframe distance is low, typically varying around 10-15 GOPs; during normal programming it can be around 40 GOPs, sometimes reaching values over 100. The average keyframe distance is computed as the running average of the keyframe distances within a window of 5 keyframes. For example, a threshold of 5 keyframes may be used to distinguish commercial or action content from other content. [0094]
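The running average and threshold test might be sketched as follows; the cut-off of 15 GOPs is drawn from the 10-15 GOP range mentioned above but is still an assumption:

```python
def is_rapid_content(keyframe_distances, window=5, threshold=15):
    """Average the last `window` keyframe distances (in GOPs); a low
    average suggests commercial or action content, versus ~40+ GOPs
    for normal programming."""
    recent = keyframe_distances[-window:]
    avg = sum(recent) / len(recent)
    return avg < threshold
```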
  • The various MPEG data, alone and in combination, were derived from, and compared with, sample video material from television broadcasts for purposes of commercial detection. Graphs were plotted showing all the combined features against time, with actual commercial breaks indicated on the time line for European content. Using this graphical analysis, each feature can be assessed for its ability to indicate a commercial break, alone and in concert with the other features. The results of this analysis are summarized in Tables I and II. [0095]
    TABLE I
    Individual feature contribution to detection of commercial location

    Genre      Black frame  Letterbox  Progressive/Interlaced  Keyframe distance
    Sports     no           no         no                      no
               yes          no         yes                     yes
    Talk show  yes          yes        yes                     yes
    Movie      no           no         yes                     no
               yes          no         yes                     no
    Talk show  no           no         yes                     no
               yes          no         yes                     no
               yes          no         yes                     no
               yes          no         yes                     yes
    News       yes          no         yes                     no
               yes          no         yes                     no
    Talk show  yes          yes        yes                     yes
               yes          no         yes                     yes
    Talk show  yes          no         yes                     yes
               yes          no         yes                     yes
    Sports     yes          yes        yes                     yes
               yes          yes        yes                     no
               yes          no         yes                     yes
               yes          yes        yes                     yes
    Sports     yes          no         yes                     yes
               yes          yes        yes                     yes
               yes          no         yes                     yes
               yes          no         yes                     yes
               yes          yes        yes                     yes
               yes          no         yes                     yes
               yes          no         yes                     no
               yes          no         yes                     no
               yes          no         yes                     no
               yes          no         yes                     no
  • [0096]
    TABLE II
    Individual feature contribution to detection of commercial boundary

    Genre      Black frame  Letterbox  Progressive/Interlaced  Keyframe distance
    Sports     no           no         no                      no
               no           no         no                      no
    Talk show  yes          no         yes                     no
    Movie      no           no         no                      no
               yes          no         yes                     no
    Talk show  no           no         yes                     no
               no           no         yes                     no
               no           no         yes                     no
               no           no         yes                     no
    News       no           no         yes                     no
               no           no         no                      no
    Talk show  no           yes        no                      no
               no           no         yes                     no
    Talk show  no           no         yes                     no
               no           no         no                      no
    Sports     no           no         no                      no
               no           no         no                      no
               no           no         no                      no
    Sports     no           no         no                      yes
               no           no         yes                     no
    Sports     no           no         yes                     no
               no           no         yes                     no
    Drama      no           no         no                      no
               no           no         no                      yes
    Drama      no           no         yes                     no
               no           no         no                      no
               no           no         no                      no
               no           no         yes                     no
  • The tables list the program genre and have columns for black frames, letterbox, progressive/interlaced change, and average keyframe distance. In Table I, each feature was assessed to determine whether that feature alone could indicate the location of the commercial; the conclusion is given as yes or no. In Table II, each feature was assessed to determine whether it alone could determine the correct boundaries of the commercial. Table I shows that black frame presence and progressive/interlaced changes are strong indicators of the location of a commercial break within the program. The keyframe distance is a much weaker indicator than black frames and progressive/interlaced changes. Reliance on progressive/interlaced change detection produces many false positives but rarely misses a commercial boundary; this may be true of other features as well. A technique was therefore developed in which one feature is used as a trigger and one or more other features are used to verify, so as to eliminate false positives. [0097]
  • Table II shows that no individual feature can be used alone to reliably detect the true boundaries of the commercial breaks. However, the tolerance used for generating the table required that strict boundaries (within 2 seconds) be found: if the commercial boundary was detected early or late by more than 2 seconds, it was regarded as a clean miss. If this criterion were relaxed, some of the features, particularly unicolor frames, could be used alone to reasonably good effect. In Table II, the columns indicate whether the feature can be used by itself to identify correctly both the beginning and the ending of a commercial break. Black frames can be misleading because broadcasters do not always insert them properly and because the intensity level may vary such that the method will not detect them. This tolerance may be adjusted by providing a threshold that permits greater variability in luminance among adjacent frames in the test for black (monocolor) frames. The letterbox and keyframe distance features appear unreliable for detecting the boundaries of commercial breaks. Note that, since the above table treated a two-second miss as a complete failure, black frames can still be used to detect commercial boundaries with substantial accuracy on average if the miss criterion is softened; a detector based on black frame detection therefore still provides rather accurate commercial detection. [0098]
  • Referring to FIG. 2, the following is a method for content detection, e.g., commercial detection, based on the features: [0099]
  • black frame detection; [0100]
  • unicolor frame detection; [0101]
  • progressive vs. interlaced mode detection; [0102]
  • keyframe distance; [0103]
  • letterbox; and [0104]
  • density of MAD values. [0105]
  • As video is compressed, the raw data and the above values are computed for each I frame in step S90. In step S100, boundary sequences are identified and, if present, recorded with a frame identification. In step S110, verification data is identified and, if present, recorded with the appropriate frame identifiers. If the process is incomplete in step S120, the next increment of video is compressed in step S90. When the process is complete, a set of data describing the video sequence in terms of the above features is stored in association with the compressed video; when the video is displayed, appropriate editing may be performed as required in step S130. [0106]
  • Since it may not be known, when a particular video sequence is compressed, precisely what edits will be applied, a full record of the compression features may be recorded as the video is compressed. In that way, the editing may be applied at the time of viewing. Alternatively, if the edits to be applied to the video are known, the stored compressed video may be edited in advance, or a set of instructions for editing may be stored and the record of compression features discarded. [0107]
  • Referring to FIG. 3, it may be desired to allow identification and editing of video material in a process that is closer to a real-time process. For example, if a personal digital recorder is buffering broadcast video material by compressing the broadcast and the user is viewing this material with a certain delay, it would be beneficial to identify content sequences as the broadcast is being compressed, rather than completing compression and only then identifying the content sequences and applying the appropriate editing (for example, turning the volume down during commercials). In an alternative process for identifying particular forms of content, video data is compressed in step S10. Then, in step S20, the system checks for the presence of a boundary trigger event, for example, a sequence of black or unicolor frames as indicated by differential luminance detection, or a change from progressive to interlaced. If a trigger event is detected, a flag indicating that the start of a type of content has been detected is set in step S30. The record includes an identification of the frame where the event was found, so that a time sequence of events can be generated. There may be many flags, one for each of a variety of different types of video sequences (e.g., one for commercials, one for violent content, one for action, one for talking heads, etc.). [0108]
  • If there is no trigger event in step S20, control passes to step S40. In step S40, the presence of a type of data that may be used to verify a commercial or other type of video content sequence is identified, if present. If such data is found, it is stored in step S50. In step S55, it is determined whether there are bounded sequences of subject matter that may be verified as being of a particular type. If found, they are recorded in step S65 along with an indication of the frame where each was identified. If editing is applicable at step S65, instructions for editing can be recorded and later (or immediately) implemented at this step. If the compression process is completed in step S70, the process terminates; if not, it resumes at step S10. [0109]
  • The events that indicate the start and/or end of particular types of video, such as commercials, may be any suitable feature. One that has been discovered by experiment to be particularly useful for commercial detection is the frame distance between detected unicolor or black frames (or consecutive sequences of black or unicolor frames). Unicolor frames are included as triggers because broadcasters in certain countries have started using other monochrome frames instead of black frames. If the black frame distance conforms to a certain pattern (a distance between thresholds of, for example, 20 and 40 seconds), the algorithm starts counting the number of black frames. After three black frames, the probability of commercial detection increases and a potential commercial end point is set. Any of the different features could be used as commercial triggers; however, a much more complex algorithm may be desirable for verification. [0110]
  • In an experimental evaluation, the appearance of a black frame sequence was used as a trigger for commercial detection. Normally black frames (or unicolor frames) are used by content creators to delineate commercials within a commercial break, as well as the beginning and ending of a whole commercial break. It may be assumed that a commercial break starts with a series of black (unicolor) frames and that during the commercial break another black frame will follow within 1200 frames. Constraints may be placed on the duration of the commercials. For example, to be verified as a commercial, a sequence may be required to be no shorter than 1,500 frames and no longer than 10,000 frames (for European content at 25 frames per second; US content is 30 frames per second). An additional constraint may be applied to the minimum time between candidate sequences before one will be labeled a commercial. For example, commercials may be required to be at least two minutes (3,000 frames) apart. The last constraint may be important for the linking of the segments that potentially represent commercials: if linking is allowed over too long a period of time, overly long "commercial" breaks might result which include non-commercial subject matter. [0111]
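The duration and spacing constraints above can be sketched as a simple predicate over candidate segments expressed in frame numbers (25 fps European content assumed; the function name and argument layout are illustrative):

```python
def plausible_commercial_break(start, end, prev_end=None,
                               min_len=1500, max_len=10000, min_gap=3000):
    """Apply the duration and spacing constraints from the text to a
    candidate black-frame-bounded segment: 1,500-10,000 frames long
    and at least 3,000 frames (two minutes) after the previous one."""
    length = end - start
    if not (min_len <= length <= max_len):
        return False
    if prev_end is not None and start - prev_end < min_gap:
        return False
    return True
```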
  • Once a potential commercial is detected, for example by detection of a black frame, other features are tested to increase or decrease the probability that the black frame, or other trigger event, actually indicated the start of a commercial break. For example, the presence of a letterbox change immediately after the black frame, a shift from progressive to interlace video material (or the reverse), a high cut rate, high MAD density, or low keyframe distance may serve as verifiers. In the case of low keyframe distance (or high cut rate), a threshold level may be used such that the probability of a commercial is increased if the threshold is exceeded and reduced if not. Alternatively, the probability may be proportional to the inverse of the keyframe distance and proportional to the MAD density. [0112]
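One illustrative way to combine a trigger with verifiers, as described above, is a simple additive score; the individual weights and base probability below are assumptions, not values given in the text:

```python
def commercial_probability(base=0.5, letterbox_change=False,
                           ip_change=False, mad_density=0.0,
                           keyframe_distance=40, kf_threshold=15):
    """After a black-frame trigger, each verifier nudges the probability
    of a commercial upward; a keyframe distance below the threshold
    counts as a verifier, and high MAD density adds a bounded bonus."""
    p = base
    if letterbox_change:
        p += 0.15
    if ip_change:
        p += 0.15
    if keyframe_distance < kf_threshold:
        p += 0.1
    p += min(0.1, mad_density * 0.1)  # proportional, capped contribution
    return min(p, 1.0)
```

The inverse-keyframe-distance and MAD-density proportionality mentioned in the text could replace the fixed bumps; this sketch keeps the combination rule as simple as possible.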
  • It has been determined empirically that the average keyframe distance between scene cuts can be as low as 5 GOPs during commercials. The threshold used for the keyframe distance can be varied in the range of 10 to 15 with good results. Again, segments that are close to each other can be linked to infer the whole commercial break. Some commercials are characterized by long keyframe distances; to allow for this, a tolerance can be built in permitting the keyframe distance to be higher for some maximum interval, say 750 frames, i.e., half a minute. [0113]
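The linking of nearby candidate segments with a tolerance gap might look like this sketch; the 750-frame maximum gap is taken from the text, while the segment data layout is an assumption:

```python
def link_segments(segments, max_gap=750):
    """Merge candidate commercial segments (start, end) in frame numbers
    whose gaps do not exceed max_gap frames (about half a minute at
    25 fps), tolerating short stretches of long keyframe distance."""
    if not segments:
        return []
    linked = [list(segments[0])]
    for start, end in segments[1:]:
        if start - linked[-1][1] <= max_gap:
            linked[-1][1] = end  # close enough: extend the current block
        else:
            linked.append([start, end])
    return [tuple(s) for s in linked]
```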
  • The above feature set provided by a compression encoder may also be applied in more sophisticated ways to recognize different kinds of content. For example, these features, and further features derived from them, may serve as inputs to a neural network, hidden Markov model, Bayesian network, or other classification engine to permit recognition of various types of video content. Thus, for example, rather than separating out one feature as a trigger indicating a potential start of a commercial, the entire feature set could be used to train a network to identify commercials, leaving it to the training process to determine the particular import of the various features in determining the start and end events that bound the commercials. [0114]
  • Although the examples discussed above focused mainly on video features, audio features generated during compression of audio data, or of the audio portions of video data, may be exploited in the same ways. For example, the sound volume of a commercial or action sequence may differ from that of other portions. Audio compression encoders produce representations of audio data that provide unique signatures, which an automated system can recognize to help distinguish certain kinds of content from others. For example, the current bit rate or quantizer may indicate the quantity of silent time intervals present. As another example, the DCT coefficients corresponding to high-action, attention-grabbing material, such as commercials, may be very different from those corresponding to the main program material, and these signature features may be defined in a classifier, such as a Bayesian classifier, neural network, or hidden Markov model. [0115]
  • Although in the embodiments discussed above, features derived from a compression process are used to classify content in a video stream, it is clear that these same features may be used in conjunction with other features (e.g., real-time features) for the same purposes. For example, real-time audio volume may be used in conjunction with black-frame (or unicolor frame) detection to identify transition to/from commercials. There are many ways of generating additional data from a video source that may be combined with those available from current compression encoders and which may be used in conjunction with the encoder-generated data for video/audio classification. In fact, the compression features may be employed as a secondary feature set to augment a primary feature set used for detailed content analysis, such as text recognition, face recognition, etc. [0116]
  • It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. [0117]

Claims (22)

What is claimed is:
1. A content editor, comprising:
a video compression encoder that generates first and second feature data from a video sequence as part of a compression process resulting in a compressed version of video data;
said first and second feature data being separate from said compressed version of video data;
an analysis engine programmed to receive said first and second feature data and calculate at least a third feature datum from at least one of said first and second feature data;
a playback selector programmed to edit said compressed version of video data responsively to said at least a third feature datum.
2. A content editor as in claim 1, wherein said playback selector is programmed to edit said compressed version of video data responsively to at least one of said first and second data.
3. A content editor as in claim 1, wherein said third data includes an identifier of a presence of a sequence of unicolor frames.
4. A content editor as in claim 1, wherein said third data includes an identifier of a transition between letterbox format and non-letterbox format.
5. A content editor as in claim 1, wherein said third data includes an identifier of a transition between interlaced and progressive video.
6. A content editor as in claim 1, wherein said third data includes an identifier of a frequency of scene cuts.
7. A content editor as in claim 1, wherein said third data includes a color histogram representing a frame.
8. A content editor as in claim 1, wherein said first and second data includes audio features of said video sequence.
9. A content editor as in claim 1, wherein said playback selector is programmed to edit said compressed version of video data responsively to at least one of said first, second, and third data, wherein said at least one of said first, second, and third data includes at least one of an average of motion vectors, a current bit rate, a variation of luminance within a frame, a variation of color within a frame, a total luminance of a frame, a total color of a frame, a change in luminance between frames, a mean absolute difference, and a quantizer scale.
10. A video content detector, comprising:
a video compression encoder capable of receiving uncompressed video data and generating compressed video data;
an analysis engine connected to receive first data from the video compression encoder, said first data being separate from said compressed video data;
said first data being generated as a result of a compression process;
said analysis engine being programmed to generate an identifier of a beginning of a type of content in said compressed video responsively to said first data.
11. A content detector as in claim 10, wherein said first data includes at least one of a quantizer scale, motion vector data, bit rate data, a variation of luminance within a frame, a variation of color within a frame, a total luminance of a frame, a total color of a frame, a change in luminance between frames, and a mean absolute difference.
12. A content detector as in claim 10, wherein said analysis engine is programmed to calculate a derivative feature from at least one of said first data and to generate said identifier responsively also to said derivative feature.
13. A content detector as in claim 10, wherein said analysis engine is programmed to identify, responsively to said first data, the presence or absence of a letterbox in said uncompressed video data and to generate an identifier of a location in a sequence of said compressed video data coinciding with said presence or absence.
14. A content detector as in claim 10, wherein said analysis engine is programmed to identify, responsively to said first data, the presence of interlaced or progressive video format in said uncompressed video data and to generate an identifier of a location in a sequence of said compressed video data coinciding with said interlaced or progressive video format.
15. A content detector as in claim 10, wherein said analysis engine is programmed to identify, responsively to said first data, the presence of unicolor frames in said uncompressed video data and to generate an identifier of a location in a sequence of said compressed video data coinciding with said unicolor frames.
16. A content detector as in claim 10, wherein said analysis engine is programmed to identify, responsively to said first data, an indicator of a frequency of scene cuts in said uncompressed video data and to generate an identifier of a location in a sequence of said compressed video data coinciding with said frequency of scene cuts.
17. A method for detecting commercials in a compressed video stream, comprising the steps of:
compressing video data and generating compressed video data and first data as a byproduct of said step of compressing;
identifying first events in said first data indicating a potential start of a commercial sequence;
verifying that a content of video following said potential start is characteristic of a commercial sequence responsively to said first data;
indicating a presence of a commercial responsively to results of said steps of identifying and verifying.
18. A method as in claim 17, wherein said step of verifying includes calculating at least one of a scene cut rate, a unicolor frame sequence, a letterbox border of a video frame, and whether the video format is progressive or interlaced.
19. A method for detecting content in video data, comprising the steps of:
compressing video data and generating compressed video data and compression feature data as a byproduct of said step of compressing;
classifying content portions of said video data based on said compression feature data in combination with non-compression feature data;
indicating content identified in said step of classifying.
20. A method as in claim 19, wherein said step of classifying includes programming a classification engine based on examples of said predefined content.
21. A method as in claim 19, wherein said step of classifying includes training a classifier and using said classifier to classify said predefined content.
22. A method as in claim 21, wherein said classifier includes at least one of a Bayesian classifier, a neural network, and a hidden Markov model classifier.
US09/854,511 2001-05-14 2001-05-14 Video content detection method and system leveraging data-compression constructs Expired - Fee Related US6714594B2 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US09/854,511 US6714594B2 (en) 2001-05-14 2001-05-14 Video content detection method and system leveraging data-compression constructs
CNB028098315A CN100493186C (en) 2001-05-14 2002-05-08 Video content detection method and system leveraging data-compression parameter
PCT/IB2002/001633 WO2002093929A1 (en) 2001-05-14 2002-05-08 Video content analysis method and system leveraging data-compression parameters
EP02727885A EP1393569A1 (en) 2001-05-14 2002-05-08 Video content analysis method and system leveraging data-compression parameters
KR1020037000892A KR100869038B1 (en) 2001-05-14 2002-05-08 A content editor, a video content detector, a method for detecting commercials and content
JP2002590671A JP2004522354A (en) 2001-05-14 2002-05-08 Video content analysis method and system using data compression parameters
JP2009021705A JP2009135957A (en) 2001-05-14 2009-02-02 Video content analysis method and system using data-compression parameter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/854,511 US6714594B2 (en) 2001-05-14 2001-05-14 Video content detection method and system leveraging data-compression constructs

Publications (2)

Publication Number Publication Date
US20020186768A1 true US20020186768A1 (en) 2002-12-12
US6714594B2 US6714594B2 (en) 2004-03-30

Family

ID=25318886

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/854,511 Expired - Fee Related US6714594B2 (en) 2001-05-14 2001-05-14 Video content detection method and system leveraging data-compression constructs

Country Status (6)

Country Link
US (1) US6714594B2 (en)
EP (1) EP1393569A1 (en)
JP (2) JP2004522354A (en)
KR (1) KR100869038B1 (en)
CN (1) CN100493186C (en)
WO (1) WO2002093929A1 (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030165322A1 (en) * 2001-08-20 2003-09-04 Jason Demas System and method for providing personal video recording trick modes
US20040034870A1 (en) * 2002-08-12 2004-02-19 O'brien Royal J Data streaming system and method
WO2005006768A1 (en) * 2003-06-20 2005-01-20 Nielsen Media Research, Inc Signature-based program identification apparatus and methods for use with digital broadcast systems
WO2005009043A1 (en) * 2003-07-18 2005-01-27 Koninklijke Philips Electronics N.V. Detecting a content item in a digital video stream
US20060093231A1 (en) * 2004-10-28 2006-05-04 Fujitsu Limited Method and apparatus for encoding image data, and method an apparatus for recording video using the same
US20060110057A1 (en) * 2004-11-23 2006-05-25 Microsoft Corporation Method and system for detecting black frames in a sequence of frames
US20070274537A1 (en) * 2004-08-18 2007-11-29 Venugopal Srinivasan Methods and Apparatus for Generating Signatures
US20080284853A1 (en) * 2005-12-23 2008-11-20 Pero Juric Non-Intrusive Determination of an Objective Mean Opinion Score of a Video Sequence
US20090237560A1 (en) * 2008-03-18 2009-09-24 Cisco Technology, Inc. Networked ip video wall
US20090292822A1 (en) * 2008-05-22 2009-11-26 Cisco Technology, Inc. Software client control of digital picture frames
US20100074339A1 (en) * 2008-09-19 2010-03-25 Akihiro Yonemoto Fast Macroblock Structure Decision Using SAD Discrepancy and its Prediction Mode
US20100138890A1 (en) * 2007-05-07 2010-06-03 Nxp B.V. Device to allow content analysis in real time
US7742737B2 (en) 2002-01-08 2010-06-22 The Nielsen Company (Us), Llc. Methods and apparatus for identifying a digital audio signal
US20100238355A1 (en) * 2007-09-10 2010-09-23 Volker Blume Method And Apparatus For Line Based Vertical Motion Estimation And Compensation
CN102045520A (en) * 2009-10-15 2011-05-04 康佳集团股份有限公司 Method and system for television program switching and television set
US20110157475A1 (en) * 2009-12-31 2011-06-30 Wright David H Methods and apparatus to detect commercial advertisements associated with media presentations
AU2013203872B2 (en) * 2009-12-31 2016-03-03 The Nielsen Company (Us), Llc Methods and apparatus to detect commercial advertisements in television transmissions
US20170085926A1 (en) * 2004-07-23 2017-03-23 The Nielsen Company (Us), Llc Methods and apparatus for monitoring the insertion of local media into a program stream
US9756283B1 (en) * 2011-09-30 2017-09-05 Tribune Broadcasting Company, Llc Systems and methods for identifying a black/non-black frame attribute
US9848222B2 (en) 2015-07-15 2017-12-19 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US10091265B2 (en) 2016-06-01 2018-10-02 Amazon Technologies, Inc. Catching up to the live playhead in live streaming
US10530825B2 (en) * 2016-06-01 2020-01-07 Amazon Technologies, Inc. Catching up to the live playhead in live streaming
US11277461B2 (en) * 2019-12-18 2022-03-15 The Nielsen Company (Us), Llc Methods and apparatus to monitor streaming media
US20220303618A1 (en) * 2021-03-17 2022-09-22 Comcast Cable Communications, Llc Systems, methods, and apparatuses for processing viewership information

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69920055T2 (en) * 1999-02-26 2005-09-15 Stmicroelectronics Asia Pacific Pte Ltd. METHOD AND DEVICE FOR DETERMINING LAPSE / NON-LENGHTER IMAGES AND RECOGNITION OF REPEATED PICTURES AND SCENE CHANGES
US9038108B2 (en) * 2000-06-28 2015-05-19 Verizon Patent And Licensing Inc. Method and system for providing end user community functionality for publication and delivery of digital media content
GB2365245B (en) * 2000-07-28 2004-06-30 Snell & Wilcox Ltd Video Compression
AUPR133700A0 (en) * 2000-11-09 2000-11-30 Mediaware Solutions Pty Ltd Transition templates for compressed digital video and method of generating same
US20070089151A1 (en) * 2001-06-27 2007-04-19 Mci, Llc. Method and system for delivery of digital media experience via common instant communication clients
US8972862B2 (en) * 2001-06-27 2015-03-03 Verizon Patent And Licensing Inc. Method and system for providing remote digital media ingest with centralized editorial control
US7970260B2 (en) * 2001-06-27 2011-06-28 Verizon Business Global Llc Digital media asset management system and method for supporting multiple users
US8990214B2 (en) * 2001-06-27 2015-03-24 Verizon Patent And Licensing Inc. Method and system for providing distributed editing and storage of digital media over a network
US20060236221A1 (en) * 2001-06-27 2006-10-19 Mci, Llc. Method and system for providing digital media management using templates and profiles
US7170566B2 (en) * 2001-12-21 2007-01-30 Koninklijke Philips Electronics N.V. Family histogram based techniques for detection of commercials and other video content
US20030123841A1 (en) * 2001-12-27 2003-07-03 Sylvie Jeannin Commercial detection in audio-visual content based on scene change distances on separator boundaries
US7974495B2 (en) * 2002-06-10 2011-07-05 Digimarc Corporation Identification and protection of video
US7136417B2 (en) * 2002-07-15 2006-11-14 Scientific-Atlanta, Inc. Chroma conversion optimization
US20040015988A1 (en) * 2002-07-22 2004-01-22 Buvana Venkataraman Visual medium storage apparatus and method for using the same
US7512180B2 (en) * 2003-06-25 2009-03-31 Microsoft Corporation Hierarchical data compression system and method for coding video data
KR100505699B1 (en) * 2003-08-12 2005-08-03 삼성전자주식회사 Encoding rate controller of video encoder providing for qualitative display using real time variable bit-rate control, video data transmission system having it and method thereof
WO2005104676A2 (en) * 2004-03-29 2005-11-10 Nielsen Media Research, Inc. Methods and apparatus to detect a blank frame in a digital video broadcast signal
WO2005099273A1 (en) * 2004-04-08 2005-10-20 Koninklijke Philips Electronics N.V. Monochrome frame detection method and corresponding device
EP1751986A1 (en) * 2004-04-08 2007-02-14 Koninklijke Philips Electronics N.V. Coding method applied to multimedia data
CN101107851B (en) * 2005-01-19 2010-12-15 皇家飞利浦电子股份有限公司 Apparatus and method for analyzing a content stream comprising a content item
KR100707189B1 (en) * 2005-04-29 2007-04-13 삼성전자주식회사 Apparatus and method for detecting advertisment of moving-picture, and compter-readable storage storing compter program controlling the apparatus
CN101180633A (en) 2005-05-19 2008-05-14 皇家飞利浦电子股份有限公司 Method and apparatus for detecting content item boundaries
US7561206B2 (en) * 2005-06-29 2009-07-14 Microsoft Corporation Detecting progressive video
US9401080B2 (en) 2005-09-07 2016-07-26 Verizon Patent And Licensing Inc. Method and apparatus for synchronizing video frames
US9076311B2 (en) * 2005-09-07 2015-07-07 Verizon Patent And Licensing Inc. Method and apparatus for providing remote workflow management
US20070107012A1 (en) * 2005-09-07 2007-05-10 Verizon Business Network Services Inc. Method and apparatus for providing on-demand resource allocation
US8631226B2 (en) * 2005-09-07 2014-01-14 Verizon Patent And Licensing Inc. Method and system for video monitoring
US20090222671A1 (en) 2005-10-25 2009-09-03 Burbank Jeffrey H Safety features for medical devices requiring assistance and supervision
WO2007122541A2 (en) 2006-04-20 2007-11-01 Nxp B.V. Data summarization system and method for summarizing a data stream
US7982797B2 (en) * 2006-06-30 2011-07-19 Hewlett-Packard Development Company, L.P. Detecting blocks of commercial content in video data
US7881657B2 (en) 2006-10-03 2011-02-01 Shazam Entertainment, Ltd. Method for high-throughput identification of distributed broadcast content
US8659654B2 (en) * 2006-10-11 2014-02-25 Microsoft Corporation Image verification with tiered tolerance
ES2463716T3 (en) 2007-05-22 2014-05-29 Koninklijke Philips N.V. Remote lighting control
JP2009122829A (en) * 2007-11-13 2009-06-04 Sony Corp Information processing apparatus, information processing method, and program
CN101175214B (en) * 2007-11-15 2010-09-08 北京大学 Method and apparatus for real-time detecting advertisement from broadcast data stream
US20090320060A1 (en) * 2008-06-23 2009-12-24 Microsoft Corporation Advertisement signature tracking
US8774959B2 (en) * 2009-12-15 2014-07-08 Japan Super Quartz Corporation Method of calculating temperature distribution of crucible
US9998750B2 (en) 2013-03-15 2018-06-12 Cisco Technology, Inc. Systems and methods for guided conversion of video from a first to a second compression format
US10306333B2 (en) * 2017-09-13 2019-05-28 The Nielsen Company (Us), Llc Flagging advertisement frames for automatic content recognition

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4750052A (en) * 1981-02-13 1988-06-07 Zenith Electronics Corporation Apparatus and method for automatically deleting selected program intervals from recorded television broadcasts
US4752834A (en) * 1981-08-31 1988-06-21 Shelton Video Editors Inc. Reciprocating recording method and apparatus for controlling a video recorder so as to edit commercial messages from a recorded television signal
US4750213A (en) * 1986-06-09 1988-06-07 Novak Albert P Method and system for editing unwanted program material from broadcast signals
JPH01284092A (en) * 1988-01-26 1989-11-15 Integrated Circuit Technol Ltd Method and apparatus for discriminating and eliminating specific data from video signal
US5696866A (en) * 1993-01-08 1997-12-09 Srt, Inc. Method and apparatus for eliminating television commercial messages
US5333091B2 (en) * 1993-01-08 1996-12-17 Arthur D Little Enterprises Method and apparatus for controlling a videotape player to automatically scan past recorded commercial messages
JP2846840B2 (en) * 1994-07-14 1999-01-13 三洋電機株式会社 Method for generating 3D image from 2D image
JP3823333B2 (en) * 1995-02-21 2006-09-20 株式会社日立製作所 Moving image change point detection method, moving image change point detection apparatus, moving image change point detection system
US6002831A (en) * 1995-05-16 1999-12-14 Hitachi, Ltd. Image recording/reproducing apparatus
JPH0993588A (en) * 1995-09-28 1997-04-04 Toshiba Corp Moving image processing method
JP3332756B2 (en) * 1996-03-14 2002-10-07 三洋電機株式会社 Television broadcast signal recording and playback device
US5767922A (en) * 1996-04-05 1998-06-16 Cornell Research Foundation, Inc. Apparatus and process for detecting scene breaks in a sequence of video frames
US5999689A (en) * 1996-11-01 1999-12-07 Iggulden; Jerry Method and apparatus for controlling a videotape recorder in real-time to automatically identify and selectively skip segments of a television broadcast signal during recording of the television signal
JPH10215436A (en) * 1997-01-30 1998-08-11 Sony Corp Recording and reproducing device, its method and recording medium
US6021220A (en) * 1997-02-11 2000-02-01 Silicon Biology, Inc. System and method for pattern recognition
JP3514063B2 (en) * 1997-02-20 2004-03-31 松下電器産業株式会社 Receiver
US6014183A (en) * 1997-08-06 2000-01-11 Imagine Products, Inc. Method and apparatus for detecting scene changes in a digital video stream
JP2000069414A (en) * 1998-08-17 2000-03-03 Sony Corp Recorder, recording method, reproduction device, reproduction method and cm detection method
JP2000209553A (en) * 1998-11-13 2000-07-28 Victor Co Of Japan Ltd Information signal recorder and reproducing device
JP4178629B2 (en) * 1998-11-30 2008-11-12 ソニー株式会社 Information processing apparatus and method, and recording medium
US6469749B1 (en) * 1999-10-13 2002-10-22 Koninklijke Philips Electronics N.V. Automatic signature-based spotting, learning and extracting of commercials and other video content
CN1240218C (en) * 1999-11-01 2006-02-01 皇家菲利浦电子有限公司 Method and apparatus for swapping the video contents of undesired commercial breaks or other video sequences
US6766098B1 (en) 1999-12-30 2004-07-20 Koninklijke Philips Electronics N.V. Method and apparatus for detecting fast motion scenes

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030165322A1 (en) * 2001-08-20 2003-09-04 Jason Demas System and method for providing personal video recording trick modes
US8238725B2 (en) * 2001-08-20 2012-08-07 Broadcom Corporation System and method for providing personal video recording trick modes
US7742737B2 (en) 2002-01-08 2010-06-22 The Nielsen Company (Us), Llc. Methods and apparatus for identifying a digital audio signal
US8548373B2 (en) 2002-01-08 2013-10-01 The Nielsen Company (Us), Llc Methods and apparatus for identifying a digital audio signal
US20040034870A1 (en) * 2002-08-12 2004-02-19 O'brien Royal J Data streaming system and method
WO2005006768A1 (en) * 2003-06-20 2005-01-20 Nielsen Media Research, Inc Signature-based program identification apparatus and methods for use with digital broadcast systems
US9054820B2 (en) 2003-06-20 2015-06-09 The Nielsen Company (Us), Llc Signature-based program identification apparatus and methods for use with digital broadcast systems
US8255938B2 (en) 2003-06-20 2012-08-28 The Nielsen Company (Us), Llc Signature-based program identification apparatus and methods for use with digital broadcast systems
US20060184961A1 (en) * 2003-06-20 2006-08-17 Nielsen Media Research, Inc. Signature-based program identification apparatus and methods for use with digital broadcast systems
WO2005009043A1 (en) * 2003-07-18 2005-01-27 Koninklijke Philips Electronics N.V. Detecting a content item in a digital video stream
US7936973B2 (en) 2003-07-18 2011-05-03 Koninklijke Philips Electronics N.V. Detecting a content item in a digital video stream
US10356446B2 (en) * 2004-07-23 2019-07-16 The Nielsen Company (Us), Llc Methods and apparatus for monitoring the insertion of local media into a program stream
US20170085926A1 (en) * 2004-07-23 2017-03-23 The Nielsen Company (Us), Llc Methods and apparatus for monitoring the insertion of local media into a program stream
US11310541B2 (en) 2004-07-23 2022-04-19 The Nielsen Company (Us), Llc Methods and apparatus for monitoring the insertion of local media into a program stream
US11477496B2 (en) 2004-07-23 2022-10-18 The Nielsen Company (Us), Llc Methods and apparatus for monitoring the insertion of local media into a program stream
US20070274537A1 (en) * 2004-08-18 2007-11-29 Venugopal Srinivasan Methods and Apparatus for Generating Signatures
US7783889B2 (en) 2004-08-18 2010-08-24 The Nielsen Company (Us), Llc Methods and apparatus for generating signatures
US8489884B2 (en) 2004-08-18 2013-07-16 The Nielsen Company (Us), Llc Methods and apparatus for generating signatures
US7457360B2 (en) * 2004-10-28 2008-11-25 Fujitsu Limited Method and apparatus for encoding image data with pre-encoding scheme, and method and apparatus for recording video using the same
US20060093231A1 (en) * 2004-10-28 2006-05-04 Fujitsu Limited Method and apparatus for encoding image data, and method and apparatus for recording video using the same
US7650031B2 (en) * 2004-11-23 2010-01-19 Microsoft Corporation Method and system for detecting black frames in a sequence of frames
US20060110057A1 (en) * 2004-11-23 2006-05-25 Microsoft Corporation Method and system for detecting black frames in a sequence of frames
US8212939B2 (en) * 2005-12-23 2012-07-03 Swissqual License Ag Non-intrusive determination of an objective mean opinion score of a video sequence
US20080284853A1 (en) * 2005-12-23 2008-11-20 Pero Juric Non-Intrusive Determination of an Objective Mean Opinion Score of a Video Sequence
US20100138890A1 (en) * 2007-05-07 2010-06-03 Nxp B.V. Device to allow content analysis in real time
US20100238355A1 (en) * 2007-09-10 2010-09-23 Volker Blume Method And Apparatus For Line Based Vertical Motion Estimation And Compensation
US8526502B2 (en) * 2007-09-10 2013-09-03 Entropic Communications, Inc. Method and apparatus for line based vertical motion estimation and compensation
US20090237560A1 (en) * 2008-03-18 2009-09-24 Cisco Technology, Inc. Networked ip video wall
US20090292822A1 (en) * 2008-05-22 2009-11-26 Cisco Technology, Inc. Software client control of digital picture frames
US8156244B2 (en) * 2008-05-22 2012-04-10 Cisco Technology, Inc. Software client control of digital picture frames
US8275046B2 (en) * 2008-09-19 2012-09-25 Texas Instruments Incorporated Fast macroblock structure decision using SAD discrepancy and its prediction mode
US20100074339A1 (en) * 2008-09-19 2010-03-25 Akihiro Yonemoto Fast Macroblock Structure Decision Using SAD Discrepancy and its Prediction Mode
CN102045520A (en) * 2009-10-15 2011-05-04 康佳集团股份有限公司 Method and system for television program switching and television set
US10631044B2 (en) 2009-12-31 2020-04-21 The Nielsen Company (Us), Llc Methods and apparatus to detect commercial advertisements associated with media presentations
AU2013203872B2 (en) * 2009-12-31 2016-03-03 The Nielsen Company (Us), Llc Methods and apparatus to detect commercial advertisements in television transmissions
US9591353B2 (en) 2009-12-31 2017-03-07 The Nielsen Company (Us), Llc Methods and apparatus to detect commercial advertisements associated with media presentations
US8925024B2 (en) * 2009-12-31 2014-12-30 The Nielsen Company (Us), Llc Methods and apparatus to detect commercial advertisements associated with media presentations
US11558659B2 (en) * 2009-12-31 2023-01-17 The Nielsen Company (Us), Llc Methods and apparatus to detect commercial advertisements associated with media presentations
US10028014B2 (en) 2009-12-31 2018-07-17 The Nielsen Company (Us), Llc Methods and apparatus to detect commercial advertisements associated with media presentations
US20110157475A1 (en) * 2009-12-31 2011-06-30 Wright David H Methods and apparatus to detect commercial advertisements associated with media presentations
US11070871B2 (en) 2009-12-31 2021-07-20 The Nielsen Company (Us), Llc Methods and apparatus to detect commercial advertisements associated with media presentations
US9756283B1 (en) * 2011-09-30 2017-09-05 Tribune Broadcasting Company, Llc Systems and methods for identifying a black/non-black frame attribute
US10694234B2 (en) 2015-07-15 2020-06-23 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US11184656B2 (en) 2015-07-15 2021-11-23 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US10264301B2 (en) 2015-07-15 2019-04-16 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US9848222B2 (en) 2015-07-15 2017-12-19 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US11716495B2 (en) 2015-07-15 2023-08-01 The Nielsen Company (Us), Llc Methods and apparatus to detect spillover
US10530825B2 (en) * 2016-06-01 2020-01-07 Amazon Technologies, Inc. Catching up to the live playhead in live streaming
US10091265B2 (en) 2016-06-01 2018-10-02 Amazon Technologies, Inc. Catching up to the live playhead in live streaming
US11277461B2 (en) * 2019-12-18 2022-03-15 The Nielsen Company (Us), Llc Methods and apparatus to monitor streaming media
US20220303618A1 (en) * 2021-03-17 2022-09-22 Comcast Cable Communications, Llc Systems, methods, and apparatuses for processing viewership information

Also Published As

Publication number Publication date
EP1393569A1 (en) 2004-03-03
US6714594B2 (en) 2004-03-30
JP2004522354A (en) 2004-07-22
JP2009135957A (en) 2009-06-18
KR20030031961A (en) 2003-04-23
WO2002093929A1 (en) 2002-11-21
CN100493186C (en) 2009-05-27
KR100869038B1 (en) 2008-11-17
CN1757236A (en) 2006-04-05

Similar Documents

Publication Publication Date Title
US6714594B2 (en) Video content detection method and system leveraging data-compression constructs
US8761452B2 (en) System, method and computer program product for video fingerprinting
US6810144B2 (en) Methods of and system for detecting a cartoon in a video data stream
JP4942883B2 (en) Method for summarizing video using motion and color descriptors
US7170566B2 (en) Family histogram based techniques for detection of commercials and other video content
US7912303B2 (en) Apparatus and method for generating thumbnail images
US9147112B2 (en) Advertisement detection
US20030123841A1 (en) Commercial detection in audio-visual content based on scene change distances on separator boundaries
EP1319230B1 (en) An apparatus for reproducing an information signal stored on a storage medium
Joyce et al. Temporal segmentation of video using frame and histogram space
US20030061612A1 (en) Key frame-based video summary system
US20060271947A1 (en) Creating fingerprints
US6823011B2 (en) Unusual event detection using motion activity descriptors
EP1817908B1 (en) System, method for video fingerprinting
JP4667697B2 (en) Method and apparatus for detecting fast moving scenes
Dimitrova et al. Real time commercial detection using MPEG features
US7302160B1 (en) Audio/video recorder with automatic commercial advancement prevention
GB2419489A (en) Method of identifying video by creating and comparing motion fingerprints
JP3714871B2 (en) Method for detecting transitions in a sampled digital video sequence
Smeaton et al. An evaluation of alternative techniques for automatic detection of shot boundaries in digital video
Joyce et al. Temporal segmentation of video using frame and histogram-space
Aoki High‐speed topic organizer of TV shows using video dialog detection
O'Toole Analysis of shot boundary detection techniques on a large video test suite
Bateman Video Motion as a Video Fingerprint.

Legal Events

Date Code Title Description
AS Assignment

Owner name: KONINKLIJKE PHILIPS ELECTRONICS N.V., NETHERLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DIMITROVA, NEVENKA;MCGEE, THOMAS;MEKENKAMP, GERHARD;AND OTHERS;REEL/FRAME:012461/0818;SIGNING DATES FROM 20010508 TO 20011005

AS Assignment

Owner name: KONINKLIJKE PHILIPS ELECTRONICS N.V., NETHERLANDS

Free format text: CORRECTED COVER SHEET TO SPECIFY EACH INVENTOR'S NAME, PREVIOUSLY RECORDED AT REEL/FRAME 012461/0818 (ASSIGNMENT OF ASSIGNOR'S INTEREST);ASSIGNORS:DIMITROVA, NEVENKA;MCGEE, THOMAS;MEKENKAMP, GERHARD;AND OTHERS;REEL/FRAME:015299/0271;SIGNING DATES FROM 20010508 TO 20011005

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20160330