US6650705B1 - Method for encoding and transcoding multiple video objects with variable temporal resolution - Google Patents

Method for encoding and transcoding multiple video objects with variable temporal resolution

Info

Publication number
US6650705B1
Authority
US
United States
Prior art keywords
video
objects
shape
coding
temporal resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US09/579,889
Inventor
Anthony Vetro
Huifang Sun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Research Laboratories Inc
Original Assignee
Mitsubishi Electric Research Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Research Laboratories Inc filed Critical Mitsubishi Electric Research Laboratories Inc
Priority to US09/579,889 priority Critical patent/US6650705B1/en
Assigned to MITSUBISHI ELECTRIC INFORMATION TECHNOLOGY CENTER AMERICA, INC. reassignment MITSUBISHI ELECTRIC INFORMATION TECHNOLOGY CENTER AMERICA, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SUN, HUIFANG, VETRO, ANTHONY
Assigned to MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC. reassignment MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: MITSUBISHI ELECTRIC INFORMATION TECHNOLOGY CENTER AMERICA, INC.
Priority to CN01802111.5A priority patent/CN1199467C/en
Priority to PCT/JP2001/001828 priority patent/WO2001091467A1/en
Priority to EP01912202A priority patent/EP1289301B1/en
Priority to JP2001586925A priority patent/JP4786114B2/en
Application granted granted Critical
Publication of US6650705B1 publication Critical patent/US6650705B1/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/20Contour coding, e.g. using detection of edges
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/21Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with binary alpha-plane coding for video objects, e.g. context-based arithmetic encoding [CAE]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/29Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving scalability at the object level, e.g. video object layer [VOL]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/31Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream

Definitions

  • This invention relates generally to encoding and transcoding multiple video objects, and more particularly to a system that controls the encoding and transcoding of multiple video objects with variable temporal resolutions.
  • MPEG-1 for storage and retrieval of moving pictures
  • MPEG-2 for digital television
  • H.263 (for low bitrate communication), see ISO/IEC JTC1 CD 11172, MPEG, “Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media up to about 1.5 Mbit/s—Part 2: Coding of Moving Pictures Information,” 1991, and LeGall, “MPEG: A Video Compression Standard for Multimedia Applications,” Communications of the ACM, Vol. 34, No. 4, pp. 46-58, 1991.
  • Newer video coding standards such as MPEG-4 (for multimedia applications), see “Information Technology—Generic coding of audio/visual objects,” ISO/IEC FDIS 14496-2 (MPEG4 Visual), Nov. 1998, allow arbitrary-shaped objects to be encoded and decoded as separate video object planes (VOP).
  • the objects can be visual, audio, natural, synthetic, primitive, compound, or combinations thereof.
  • Video objects are composed to form compound objects or “scenes.”
  • the emerging MPEG-4 standard is intended to enable multimedia applications, such as interactive video, where natural and synthetic materials are integrated, and where access is universal.
  • MPEG-4 allows for content based interactivity. For example, one might want to “cut-and-paste” a moving figure or object from one video to another.
  • In this type of application, it is assumed that the objects in the multimedia content have been identified through some type of segmentation process, see for example, U.S. patent application Ser. No. 09/326,750 “Method for Ordering Image Spaces to Search for Object Surfaces” filed on Jun. 4, 1999 by Lin et al.
  • the network can represent a wireless channel or the Internet. In any case, the network has limited capacity and a contention for its resources must be resolved when the content needs to be transmitted.
  • Rate control is used to allocate the number of bits per coding time instant. Rate control ensures that the bitstream produced by an encoder satisfies buffer constraints.
  • Rate control processes attempt to maximize the quality of the encoded signal, while providing a constant bit rate.
  • frame-based encoding such as MPEG-2
  • U.S. Pat. No. 5,847,761 “Method for performing rate control in a video encoder which provides a bit budget for each frame while employing virtual buffers and virtual buffer verifiers,” issued to Uz, et al. on Dec. 8, 1998.
  • object-based encoding such as MPEG-4, see U.S. Pat. No. 5,969,764, “Adaptive video coding method,” issued to Sun and Vetro on Oct. 19, 1999.
  • Bit stream conversion or “transcoding” can be classified as bit rate conversion, resolution conversion, and syntax conversion.
  • Bit rate conversion includes bit rate scaling and conversion between a constant bit rate (CBR) and a variable bit rate (VBR).
  • CBR constant bit rate
  • VBR variable bit rate
  • the basic function of bit rate scaling is to accept an input bitstream and produce a scaled output bitstream that meets new load constraints of a receiver.
  • a bit stream scaler is a transcoder, or filter, that provides a match between a source bitstream and the receiving load.
  • Typically, scaling can be accomplished by a transcoder 100 .
  • the transcoder includes a decoder 110 and encoder 120 .
  • a compressed input bitstream 101 is fully decoded at an input rate Rin, then encoded at a new output rate Rout 102 to produce the output bitstream 103 .
  • the output rate is lower than the input rate.
  • Full decoding and full encoding in a transcoder are often not done due to the high complexity of re-encoding the decoded bitstream; instead, the transcoding is done on a compressed or partially decoded bitstream.
  • FIG. 2 shows an example method.
  • the video bitstream is only partially decoded.
  • macroblocks of the input bitstream 201 are variable-length decoded (VLD) 210 .
  • the input bitstream is also delayed 220 and inverse quantized (IQ) 230 to yield discrete cosine transform (DCT) coefficients.
  • IQ inverse quantized
  • DCT discrete cosine transform
  • the partially decoded data are analyzed 240 and a new set of quantizers is applied at 250 to the DCT macroblocks.
  • These re-quantized macroblocks are then variable-length coded (VLC) 260 and a new output bitstream 203 at a lower rate can be formed.
  • VLC variable-length coded
  • the number of bits allocated for encoding texture information is controlled by a quantization parameter (QP).
  • QP quantization parameter
  • Changing the QP on the basis of information contained in the original bitstream reduces the rate of texture bits.
  • the information is usually extracted directly in the compressed domain and can include measures that relate to the motion of macroblocks or residual energy of DCT macroblocks. This type of analysis can be found in the bit allocation analyzer 240 of FIG. 2 .
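The requantization step described above can be sketched as follows. This is a minimal illustration assuming plain uniform quantizers; a real MPEG transcoder applies the standard's inverse-quantization rules, and the new QP would be chosen by the bit allocation analyzer 240.

```python
def requantize(dct_levels, qp_in, qp_out):
    """Re-quantize DCT coefficient levels in the compressed domain.

    Inverse-quantize with the original step size, then quantize with
    a coarser one -- a simplified model of blocks 230/250 in FIG. 2,
    assuming plain uniform quantizers.
    """
    requantized = []
    for level in dct_levels:
        coeff = level * qp_in               # inverse quantization (IQ)
        new_level = round(coeff / qp_out)   # coarser forward quantizer
        requantized.append(new_level)
    return requantized

# A coarser QP maps many small levels to zero, reducing texture bits.
levels = [12, -3, 1, 0, 7, -1]
print(requantize(levels, qp_in=4, qp_out=8))  # → [6, -2, 0, 0, 4, 0]
```

Note how the nonzero levels 1 and −1 vanish at the coarser step size; this is exactly where the texture bit savings come from.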
  • Vetro summarizes video content in unique ways.
  • individual video objects are transcoded with different qualities.
  • the difference in quality can be related to either the spatial quality or the temporal resolution (quality).
  • the receiver can compose the objects so that all pixels within a reconstructed scene are defined.
  • Undefined pixels in the scene can result from background and foreground objects, or overlapping objects, being sampled at different temporal resolutions, so that “holes” appear in the re-composed scene. Therefore, when varying the temporal resolution of multiple objects during encoding or transcoding, it is critical that synchronization be maintained.
  • the background can be encoded at a relatively low temporal resolution; say ten frames per second.
  • the foreground object is encoded at a higher temporal resolution of thirty frames per second. This is fine as long as the foreground object does not move much. However, should the foreground object move with respect to the background, a “hole” will appear in that portion of the background which is no longer occluded by the foreground object.
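The hole effect can be illustrated with a toy one-dimensional composition. The scene layout and pixel labels here are hypothetical; the point is that a stale background plane leaves pixels undefined where the faster-updating foreground has moved away.

```python
# Background frame (stale, low frame rate): pixels behind the OLD
# foreground position were never encoded, so they are None.
stale_bg = ['w', 'w', None, None, 'w', 'w']

# Foreground at the CURRENT (high frame rate) instant: it moved right.
fg_mask = [0, 0, 0, 1, 1, 0]          # 1 = foreground pixel defined here
fg_vals = [None, None, None, 'p', 'p', None]

# Compose: take the foreground where defined, the background elsewhere.
scene = [fg if on else bg for bg, on, fg in zip(stale_bg, fg_mask, fg_vals)]

# A "hole" is a scene pixel defined by neither object.
holes = [i for i, px in enumerate(scene) if px is None]
print(holes)  # → [2] : the position the foreground uncovered
```

Resynchronizing the background (re-encoding it at this instant) would fill position 2 and remove the hole.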
  • MPEG-7 Multimedia Content Description Interface
  • MPEG-7 Context, Objectives and Technical Roadmap ISO/IEC N2861, July 1999.
  • this standard plans to incorporate a set of descriptors and description schemes that can be used to describe various types of multimedia content.
  • the descriptor and description schemes are associated with the content itself and allow for fast and efficient searching of material that is of interest to a particular user. It is important to note that this standard is not meant to replace previous coding standards, rather, it builds on other standard representations, especially MPEG-4, because the multimedia content can be decomposed into different objects and each object can be assigned a unique set of descriptors. Also, the standard is independent of the format in which the content is stored.
  • The primary application of MPEG-7 is expected to be search and retrieval applications, see “MPEG-7 Applications,” ISO/IEC N2861, July 1999.
  • MPEG-7 Applications ISO/IEC N2861
  • a user specifies some attributes of a particular object. At this low-level of representation, these attributes can include descriptors that describe the texture, motion and shape of the particular object.
  • descriptors that describe the texture, motion and shape of the particular object.
  • a method of representing and comparing shapes has been described in U.S. patent application Ser. No. 09/326,759 “Method for Ordering Image Spaces to Represent Object Shapes” filed on Jun. 4, 1999 by Lin et al., and a method for describing the motion activity has been described in U.S. patent application Ser. No. 09/406,444 “Activity Descriptor for Video Sequences” filed on Sep.
  • These descriptors and description schemes allow a user to access properties of the video content that are not traditionally derived by an encoder or transcoder. For example, these properties can represent look-ahead information that was assumed to be inaccessible to the transcoder. The only reason that the encoder or transcoder has access to these properties is because the properties were extracted from the content at an earlier time, i.e., the content was pre-processed and stored in a database with its associated meta-data.
  • the information itself can be either syntactic or semantic, where syntactic information refers to the physical and logical signal aspects of the content, while the semantic information refers to the conceptual meaning of the content.
  • syntactic elements can be related to the color, shape and motion of a particular object.
  • semantic elements can refer to information that cannot be extracted from low-level descriptors, such as the time and place of an event or the name of a person in a video sequence.
  • the present invention provides an apparatus and method for coding a video.
  • the coding according to the invention can be performed by an encoder or a transcoder.
  • the video is first partitioned into video objects.
  • the partitioning is done with segmentation planes, and in the case of the transcoder, a demultiplexer is used.
  • shape features are extracted from each object.
  • the shape features can be obtained by measuring how the shape of each object evolves over time. A Hamming or Hausdorff distance measure can be used.
  • the extracted shape features are combined in a rate or transcoder control unit to determine a temporal resolution for each object over time.
  • the temporal resolutions are used to encode the various video objects.
  • motion features and coding complexity can also be considered while making trade-offs in temporal resolution determinations.
  • the partitioning, combining, and coding is performed in an encoder.
  • the demultiplexing, combining, and coding are performed in a transcoder.
  • boundary blocks of the objects in the compressed-video are used for extracting the shape features.
  • different objects can have different temporal resolutions or frame rates.
  • FIG. 1 is a block diagram of a prior art transcoder
  • FIG. 2 is a block diagram of a prior art partial decoder/encoder
  • FIG. 3 is a block diagram of a scene reconstructed from two video objects
  • FIG. 4 is a block diagram of a scene reconstructed from two video objects having different temporal resolutions
  • FIG. 5 is a block diagram of an encoder according to the invention.
  • FIG. 6 is a block diagram of a transcoder according to the invention.
  • FIG. 7 is a flow diagram of a method for encoding according to the invention.
  • FIG. 8 is a flow diagram of an example encoding strategy used by the method of FIG. 7 .
  • Our invention provides a method and apparatus for controlling temporal resolutions while encoding and transcoding multiple video objects in a scene.
  • the temporal resolution controller enables the encoding, transcoding, and reconstruction of objects having variable and different temporal resolutions.
  • One of the main advantages of an object-based coding scheme is that both the spatial and temporal resolution of the objects can vary independently.
  • the method and apparatus we describe are applicable to both object-based encoding and transcoding systems, and real-time as well as non real-time applications.
  • the input video is uncompressed, and during the transcoding, the input video is compressed. In both cases, the output video is compressed.
  • the mechanism and procedures that we describe can be seamlessly integrated into the architecture of prior art devices.
  • FIG. 3 shows a scene 303 that has been partitioned into two video objects; a foreground object 301 and a background object 302 .
  • the scene can be reconstructed by combining the two objects.
  • the foreground object is a moving person and the background object is a stationary wall.
  • the pixels of the foreground and background objects define all of the pixels in the scene.
  • the background is encoded at a frame rate of 15 Hz
  • the foreground is encoded at a frame rate of 30 Hz, which is twice the first rate.
  • the two objects have independent motion, and the pixels that are associated with each will change in every frame.
  • the foreground object could also be relatively stationary, but that it has higher internal motion than the background object.
  • the foreground is rich in texture, and it includes moving eyes, lips, and other moving facial features, while the background is a blank wall. Therefore, it is desired to encode the foreground at a higher spatial and temporal resolution than the background.
  • the foreground object is in motion with respect to the background as shown in the sequences of FIG. 4 .
  • sequences 401 - 403 time runs from left to right.
  • the sequence 401 is the background object encoded at a relatively low temporal resolution
  • the sequence 402 is the foreground object encoded at a relatively high temporal resolution
  • sequence 403 is the reconstructed scene.
  • This causes holes 404 in every other frame. These holes are due to the movement of one object, without the updating of adjacent objects or overlapping objects.
  • the holes are uncovered areas of the scene that cannot be associated with either object and for which no pixels are defined. The holes disappear when the objects are resynchronized, e.g. every other frame.
  • the method and apparatus for controlling and making decisions on the temporal resolution of objects relies on a measure that indicates the amount of shape change (distortion) in a scene.
  • There are a number of shape features that can be extracted for this purpose; for example, one shape feature measures the shape difference of an object over time.
  • the encoder can decide the amount of temporal resolution to use for each object while encoding or transcoding.
  • Shape differences for each object are measured over time.
  • the shape difference is inversely proportional to the amount of variability in the temporal resolution between the objects. For a fixed amount of time, a small difference indicates that greater variability is possible, whereas a large difference indicates lower variability. If the duration of time between when objects are resynchronized is made larger, the saved bits can be allocated to objects that need better quality.
  • a method that optimally synchronizes the objects operates as follows. Periodically sample the video to find a difference between the shapes of each object over time. If the shape difference of an object is small over time, then increase the sampling period for measuring the difference. Continue to increase the sampling period until the difference is greater than some predetermined threshold D. At this point, either output the frames to resynchronize the video objects with that difference, or determine a new frequency at which the objects should be synchronized.
  • the frequency can be based on an average, a minimum, or a median time interval between synchronization frames. This frequency can then be used to determine an optimal temporal rate for each of the various video objects.
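The period-growing procedure above can be sketched as follows. The linearly growing shape difference used in the example is purely illustrative; in practice `shape_diff` would be one of the distance measures described below.

```python
def find_sync_period(shape_diff, threshold, start_period=1):
    """Grow the sampling period until the shape difference between
    sync points exceeds the threshold D, then back off one step.

    shape_diff(t0, t1) -> distance between an object's shape at time
    t0 and at time t1 (e.g. a Hamming or Hausdorff measure).
    """
    period = start_period
    while shape_diff(0, period) <= threshold:
        period += 1
    # Largest period whose difference still stayed within D.
    return max(start_period, period - 1)

# Illustrative shape difference that grows linearly with the time gap.
print(find_sync_period(lambda t0, t1: 2 * (t1 - t0), threshold=7))  # → 3
```

With the difference growing as 2·(t1 − t0) and D = 7, periods 1-3 stay within the threshold and period 4 exceeds it, so objects would be resynchronized every 3 sampling intervals.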
  • a temporal controller can provide various ways to effect the temporal resolution of objects in the scene, which are applicable to both encoders and transcoders.
  • the first difference measure that we consider is the well-known Hamming distance.
  • the Hamming distance measures the number of pixels that are different between two shapes.
  • segmentation (alpha, α) values may only be zero or one, where zero refers to a transparent pixel in a segmentation plane and one refers to an opaque pixel in the segmentation plane.
  • α1(m,n) and α2(m,n) are corresponding segmentation planes at different time instants; the Hamming distance between them is d(α1, α2) = Σm Σn |α1(m,n) − α2(m,n)|.
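As a sketch, the Hamming distance between two binary segmentation planes simply counts the pixels whose alpha values differ:

```python
def hamming_distance(alpha1, alpha2):
    """Number of pixels whose segmentation (alpha) values differ
    between two binary shape planes of equal size."""
    return sum(
        1
        for row1, row2 in zip(alpha1, alpha2)
        for a, b in zip(row1, row2)
        if a != b
    )

# The same object's shape at two time instants (1 = opaque, 0 = transparent).
shape_t1 = [[0, 1, 1],
            [0, 1, 1]]
shape_t2 = [[1, 1, 0],
            [0, 1, 1]]
print(hamming_distance(shape_t1, shape_t2))  # → 2
```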
  • the Hausdorff distance is defined as the maxmin function between two sets of pixels, h(A,B) = max{min{d(a,b) : b ∈ B} : a ∈ A}, where
  • a and b are pixels of the sets A and B of two video objects respectively, and d(a,b) is the Euclidean distance between these pixels.
  • the above metric indicates the maximum distance of the pixels in set A to the nearest pixel in set B. Because this metric is not symmetric, i.e., h(A,B) may not be equal to h(B,A), a more general definition is given by:
  • H(A,B) = max{h(A,B), h(B,A)}.
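Both the directed and the symmetric Hausdorff distances follow directly from these definitions; a sketch using Euclidean pixel distances:

```python
import math

def directed_hausdorff(A, B):
    """h(A,B): max over a in A of the distance to the nearest b in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric form H(A,B) = max{h(A,B), h(B,A)}."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Two small pixel sets from two video object shapes.
A = [(0, 0), (1, 0)]
B = [(0, 0), (4, 0)]
print(directed_hausdorff(A, B))  # → 1.0 : every pixel of A is near B
print(directed_hausdorff(B, A))  # → 3.0 : (4,0) is far from A
print(hausdorff(A, B))           # → 3.0 : asymmetry resolved by the max
```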
  • shape is coded in a variety of different modes at the macroblock level.
  • a shape macro-block is coded as an opaque macroblock, a transparent macroblock or a boundary macroblock.
  • the boundary blocks of course define the shape of an object.
  • These coding modes can be used to reconstruct the macroblock level silhouette of the binary shape.
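A macroblock-level silhouette can be recovered from the shape coding modes alone, without decoding any boundary-block pixel data. The mode labels below are illustrative placeholders, not syntax elements from the standard:

```python
# Hypothetical mode labels; MPEG-4 shape coding distinguishes
# opaque, transparent, and boundary macroblocks.
OPAQUE, TRANSPARENT, BOUNDARY = 'O', 'T', 'B'

def silhouette(mb_modes):
    """Macroblock-level silhouette of a binary shape: 1 where the
    object may have pixels (opaque or boundary macroblocks),
    0 where it cannot (transparent macroblocks)."""
    return [[0 if mode == TRANSPARENT else 1 for mode in row]
            for row in mb_modes]

modes = [[TRANSPARENT, BOUNDARY, TRANSPARENT],
         [BOUNDARY, OPAQUE, BOUNDARY]]
print(silhouette(modes))  # → [[0, 1, 0], [1, 1, 1]]
```

Comparing such coarse silhouettes across time instants gives a cheap compressed-domain approximation of the shape distortion measures above.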
  • FIG. 5 shows an object-based encoder 500 according to our invention.
  • the encoder includes a switch 510 , a shape coder 520 , a motion estimator 530 , a motion compensator 540 , a motion coder 550 , a texture coder 560 , a VOP memory 570 , a multiplexer (MUX) 580 , an output buffer 590 , and a meta-data storage unit 591 .
  • the encoder also includes a rate control unit (RCU) 592 for performing texture, temporal, shape, and meta-data analysis 593 - 596 .
  • Input to the encoder 500 is an object-based video (In) 501 .
  • the video is composed of image sequence data and segmentation (alpha) planes defining the boundary (shape) of each video object.
  • the shape coder 520 processes the shape of each object and writes the shape coding results into an output bitstream (Out) 509 via the MUX 580 and buffer 590 .
  • the shape data are also used for motion estimation 530 , motion compensation 540 , and texture coding 560 . In particular, the shape data are used to extract shape features for each object.
  • the objects, and associated shape and motion features are stored in the VOP memory 570 .
  • motion vectors are determined for each macroblock.
  • the motion vectors are also coded and written into the output bitstream via the MUX and buffer.
  • a motion compensated prediction is formed from video object data stored in the VOP memory 570 . This prediction is subtracted 541 from the input object yielding a set of residual macroblocks.
  • These residual macroblocks are subject to the texture coding 560 and the corresponding data are written to the output bitstream.
  • the texture coding is according to a QP control signal provided by the RCU.
  • the RCU 592 is responsible for selecting the appropriate quantization parameter (QP) for each video object. It does so by using models to estimate the corresponding QP according to the assigned rate budget.
  • the temporal analysis is described in detail below. Briefly, the temporal analysis is responsible for controlling the temporal resolution of each object during coding and transcoding.
  • In the prior art, the temporal resolution of all video objects is identical to avoid composition problems as described above with reference to FIG. 4 . Therefore, the prior art did not independently consider temporal resolution for the various objects; there, the temporal analysis provided a signal to skip all video objects when the output buffer was in danger of overflowing. Our invention provides a better solution; for example, objects that are relatively stationary can be encoded at a lower frame rate than faster moving objects to reduce the overall bit rate.
  • By allowing variable temporal qualities, we enable the encoding and transcoding of video objects with variable temporal resolutions.
  • the shape analysis 595 is responsible for extracting the shape features that are used by the temporal analysis to decide whether variable temporal resolution can be achieved without composition problems, i.e., holes are avoided even if the temporal encoding rates of the various objects are different.
  • the shape analysis can work in the real-time encoding mode, where data are retrieved from the VOP memory 570 .
  • if the encoder also receives meta-data related to the shape features, i.e., descriptions of the content already exist, then such meta-data can be used in place of, or in conjunction with, the shape data from the VOP memory 570 .
  • the meta-data are handled by the meta-data analysis, and like the shape analysis, the meta-data assists the temporal analysis in determining an optimal temporal resolution for each video object.
  • FIG. 6 shows a high-level block diagram of an object-based transcoder 600 according to an alternative embodiment of the invention.
  • the transcoder 600 includes a demultiplexer 601 , a multiplexer 602 , and an output buffer 603 .
  • the transcoder 600 also includes one or more object-based transcoders 630 operated by a transcoding control unit (TCU) 610 according to control information 604 .
  • TCU transcoding control unit
  • the TCU includes texture, temporal, shape and meta-data analyzers 611 - 614 .
  • An input compressed bitstream 605 is partitioned into one or more object-based elementary bitstreams by the demultiplexer.
  • the object-based bitstreams can be serial or parallel.
  • the total bit rate of the bitstream 605 is R in .
  • the output compressed bitstream 606 from the transcoder 600 has a total bit rate R out such that R out ⁇ R in .
  • the demultiplexer 601 provides one or more elementary bitstreams to each of the object-based transcoders 630 , and the object-based transcoders provide object data 607 to the TCU 610 .
  • the transcoders scale the elementary bitstreams.
  • the scaled bitstreams are composed by the multiplexer 602 before being passed on to the output buffer 603 , and from there to a receiver.
  • the output buffer 603 also provides rate-feedback information 608 to the TCU.
  • the control information 604 that is passed to each of the transcoders is provided by the TCU.
  • the TCU is responsible for the analysis 611 - 612 of texture and shape data.
  • the TCU can also use network data 609 .
  • the TCU also performs meta-data analysis 614 .
  • the analysis of the temporal quality enables transcoding with variable temporal resolution.
  • FIG. 7 shows the steps of a method 700 for encoding and transcoding a video 701 according to our invention.
  • the input 701 to the method is either an uncompressed video in the case of the encoder 500 or a compressed video in the case of the transcoder 600 .
  • Step 710 partitions the video 701 into objects 711 .
  • Step 720 extracts, over time, shape features 721 from each object. The shape features can be distance or macroblock based as described above.
  • Step 730 optionally extracts motion features from each object over time.
  • Other features that can be extracted and considered to determine an optimal temporal resolution can include coding complexity, e.g. spatial complexity, DCT complexity, texture complexity, etc.
  • Step 740 combines the extracted features to determine temporal resolutions 741 to use while encoding or transcoding the various objects 711 in step 750 .
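Step 740 might be sketched as follows. The thresholds, the maxmin-style feature combination, and the rate-halving rule are illustrative assumptions for this sketch, not values specified by the invention:

```python
def choose_temporal_resolutions(objects, full_rate=30):
    """Assign each video object a frame rate from its extracted
    features (normalized shape-difference and motion measures in
    [0, 1]). All thresholds here are illustrative assumptions.
    """
    rates = {}
    for name, feats in objects.items():
        # Maxmin-style combination: the most active feature dominates.
        score = max(feats['shape_diff'], feats['motion'])
        if score < 0.1:
            rates[name] = full_rate // 4   # nearly static object
        elif score < 0.5:
            rates[name] = full_rate // 2   # moderately active
        else:
            rates[name] = full_rate        # keep full temporal resolution
    return rates

objs = {
    'background': {'shape_diff': 0.02, 'motion': 0.05},
    'foreground': {'shape_diff': 0.30, 'motion': 0.80},
}
print(choose_temporal_resolutions(objs))  # → {'background': 7, 'foreground': 30}
```

Bits saved by coding the near-static background at a quarter of the full rate can then be reallocated to the foreground object.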
  • FIG. 8 shows some example encoding scenarios that are based on analyzing the evolving shape of video objects over time.
  • the input is first and second extracted object sequences 801 - 802 .
  • Graphs 810 and 820 plot shape features, e.g. shape differences (Δ) over time (t). Note, between times t1 and t2 the objects' shapes remain relatively constant.
  • Graphs 811 and 821 plot optionally each object's internal motion features over time. Note, the first object has very little internal motion, while the second object's internal motion is quite high.
  • the combiner 850 (RCU 592 or TCU 610 ) considers the extracted features using perhaps a maxmin, summation, comparison, or other combinatorial function to make decisions on how best to distribute the available bits over the various objects during the actual coding.
  • In scenario 831, the first object is not coded at all during the interval [t1, t2], and all available bits are allocated to the second object. This might have the effect of an observable and sudden drastic change in the quality of the video at times t1 and t2.
  • a better scenario 832 might use a lower temporal resolution during the interval [t1, t2], or better yet a gradual reduction in resolution followed by a gradual increase.
  • In scenario 833, more bits are allocated to the second object during the time intervals [t0, t1] and [t2, tend] than during the interval [t1, t2], to reflect the higher internal motion of the second object.
  • our object-based input and output bitstreams 605 - 606 are entirely different from traditional frame-based video programs.
  • MPEG-2 does not permit dynamic frame skipping.
  • the GOP structure and reference frames are usually fixed.
  • content 651 and corresponding content descriptions 652 are stored in a database 650 .
  • the content descriptions are generated from a feature extractor 640 , which accepts the input object-based bitstreams 605 .
  • the input bitstream is fed into the demux 601 and transcoder as described above.
  • the meta-data are sent to the meta-data analysis 614 within the TCU.
  • the main objective of the temporal controller in an object-based encoder or transcoder is to maximize the quality of the composed scene on the receiver side, while avoiding composition problems as described above with reference to FIG. 4 . To maximize quality under these constraints, it is necessary to exploit the temporal redundancy in the signal as much as possible.
  • the motion compensation process achieves the removal of temporal redundancy.
  • specifying the motion vector for every coding unit or macroblock may be more than is actually required.
  • the residual of the motion compensated difference must also be coded. The point is that, to maximize quality, not every object needs to be coded at every time instant; the saved bits can be used for other, more important objects at different time instants.
  • the temporal controller makes use of the shape distortion metrics to indicate the amount of movement among shapes in the scene.
  • This measure can relate to the scene at various cue levels as defined in U.S. patent application Ser. No. 09/546,717.
  • the temporal controller can provide various ways to impact the temporal resolution of objects in the scene, which are applicable to both encoders and transcoders.
  • the temporal controller acts in the same manner. However, because observations are limited by latency constraints, only causal data are considered. Therefore, the temporal coding decisions are made instantaneously.
  • extraction of the shape distortion metric can be done in either the pixel or compressed domain. Regardless of where distortion information is extracted, it should be noted that some tolerance can be incorporated into the decision-making process of the temporal control. In other words, some applications may tolerate a small amount of undefined area, provided that the gain in the defined area is substantial.
  • a weight ranging between [0,1] is defined, where 0 means that there is no movement among the shape boundaries and 1 means that the shape boundary is completely different.
  • the weight is a function of the shape distortion metrics defined earlier and can correspond to a percentage or normalized value.
  • this weighting will not exist. Rather, only the extreme weights are valid, i.e., 0 or 1.
  • the temporal controller according to our invention provides the following effects and advantages.

Abstract

A video is first partitioned into video objects. If the video is uncompressed, then the partitioning is done with segmentation planes. In the case where the video is compressed, a demultiplexer is used for the partitioning. Over time, shape features are extracted from each partitioned object. The extracted shape features are combined to determine a temporal resolution for each object over time. The temporal resolutions are subsequently used to encode or transcode the video objects as an output compressed video.

Description

FIELD OF THE INVENTION
This invention relates generally to encoding and transcoding multiple video objects, and more particularly to a system that controls the encoding and transcoding of multiple video objects with variable temporal resolutions.
BACKGROUND OF THE INVENTION
Recently, a number of standards have been developed for communicating encoded information. For video sequences, the most widely used standards include MPEG-1 (for storage and retrieval of moving pictures), MPEG-2 (for digital television) and H.263, see ISO/IEC JTC1 CD 11172, MPEG, “Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media up to about 1.5 Mbit/s—Part 2: Coding of Moving Pictures Information,” 1991, LeGall, “MPEG: A Video Compression Standard for Multimedia Applications,” Communications of the ACM, Vol. 34, No. 4, pp. 46-58, 1991, ISO/IEC DIS 13818-2, MPEG-2, “Information Technology—Generic Coding of Moving Pictures and Associated Audio Information—Part 2: Video,” 1994, ITU-T SG XV, DRAFT H.263, “Video Coding for Low Bitrate Communication,” 1996, ITU-T SG XVI, DRAFT H.263+, Q15-A-60 rev. 0, “Video Coding for Low Bitrate Communication,” 1997.
These standards are relatively low-level specifications that primarily deal with the spatial and temporal compression of video sequences. As a common feature, these standards perform compression on a per frame basis. With these standards, one can achieve high compression ratios for a wide range of applications.
Newer video coding standards, such as MPEG-4 (for multimedia applications), see “Information Technology—Generic coding of audio/visual objects,” ISO/IEC FDIS 14496-2 (MPEG-4 Visual), Nov. 1998, allow arbitrary-shaped objects to be encoded and decoded as separate video object planes (VOP). The objects can be visual, audio, natural, synthetic, primitive, compound, or combinations thereof. Video objects are composed to form compound objects or “scenes.”
The emerging MPEG-4 standard is intended to enable multimedia applications, such as interactive video, where natural and synthetic materials are integrated, and where access is universal. MPEG-4 allows for content based interactivity. For example, one might want to “cut-and-paste” a moving figure or object from one video to another. In this type of application, it is assumed that the objects in the multimedia content have been identified through some type of segmentation process, see for example, U.S. patent application Ser. No. 09/326,750 “Method for Ordering Image Spaces to Search for Object Surfaces” filed on Jun. 4, 1999 by Lin et al.
In the context of video transmission, these compression standards are needed to reduce the amount of bandwidth (available bit rate) that is required by the network. The network can represent a wireless channel or the Internet. In any case, the network has limited capacity and a contention for its resources must be resolved when the content needs to be transmitted.
Over the years, a great deal of effort has been placed on architectures and processes that enable devices to transmit the video content robustly and to adapt the quality of the content to the available network resources. Rate control is used to allocate the number of bits per coding time instant. Rate control ensures that the bitstream produced by an encoder satisfies buffer constraints.
Rate control processes attempt to maximize the quality of the encoded signal, while providing a constant bit rate. For frame-based encoding, such as MPEG-2, see U.S. Pat. No. 5,847,761, “Method for performing rate control in a video encoder which provides a bit budget for each frame while employing virtual buffers and virtual buffer verifiers,” issued to Uz, et al. on Dec. 8, 1998. For object-based encoding, such as MPEG-4, see U.S. Pat. No. 5,969,764, “Adaptive video coding method,” issued to Sun and Vetro on Oct. 19, 1999.
When the content has already been encoded, it is sometimes necessary to further convert the already compressed bitstream before the stream is transmitted through the network to accommodate, for example, a reduction in the available bit rate. Bit stream conversion or “transcoding” can be classified as bit rate conversion, resolution conversion, and syntax conversion. Bit rate conversion includes bit rate scaling and conversion between a constant bit rate (CBR) and a variable bit rate (VBR). The basic function of bit rate scaling is to accept an input bitstream and produce a scaled output bitstream that meets new load constraints of a receiver. A bit stream scaler is a transcoder, or filter, that provides a match between a source bitstream and the receiving load.
As shown in FIG. 1, scaling is typically accomplished by a transcoder 100. In a brute-force case, the transcoder includes a decoder 110 and an encoder 120. A compressed input bitstream 101 is fully decoded at an input rate Rin, then encoded at a new output rate Rout 102 to produce the output bitstream 103. Usually, the output rate is lower than the input rate. In practice, however, full decoding and full encoding are not done in a transcoder due to the high complexity of encoding the decoded bitstream; instead, the transcoding is done on a compressed or partially decoded bitstream.
Earlier work on MPEG-2 transcoding has been published by Sun et al., in “Architectures for MPEG compressed bitstream scaling,” IEEE Transactions on Circuits and Systems for Video Technology, April 1996. There, four methods of rate reduction, with varying complexity and architecture, were presented.
FIG. 2 shows an example method. In this architecture, the video bitstream is only partially decoded. More specifically, macroblocks of the input bitstream 201 are variable-length decoded (VLD) 210. The input bitstream is also delayed 220 and inverse quantized (IQ) 230 to yield discrete cosine transform (DCT) coefficients. Given the desired output bit rate, the partially decoded data are analyzed 240 and a new set of quantizers is applied at 250 to the DCT macroblocks. These re-quantized macroblocks are then variable-length coded (VLC) 260 and a new output bitstream 203 at a lower rate can be formed. This scheme is much simpler than the scheme shown in FIG. 1 because the motion vectors are re-used and an inverse DCT operation is not needed.
More recent work by Assuncao et al., in “A frequency domain video transcoder for dynamic bit-rate reduction of MPEG-2 bitstreams,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 953-957, December 1998, describes a simplified architecture for the same task. They use a motion compensation (MC) loop, operating in the frequency domain, for drift compensation. Approximate matrices are derived for fast computation of the MC macroblocks in the frequency domain. A Lagrangian optimization is used to calculate the best quantizer scales for transcoding.
Other work by Sorial et al., “Joint transcoding of multiple MPEG video bitstreams,” Proceedings of the International Symposium on Circuits and Systems, May 1999, presents a method of jointly transcoding multiple MPEG-2 bitstreams, see also U.S. patent application Ser. No. 09/410,552 “Estimating Rate-Distortion Characteristics of Binary Shape Data,” filed Oct. 1, 1999 by Vetro et al.
According to prior art compression standards, the number of bits allocated for encoding texture information is controlled by a quantization parameter (QP). The above papers are similar in this respect: changing the QP on the basis of information contained in the original bitstream reduces the rate of texture bits. For an efficient implementation, the information is usually extracted directly in the compressed domain and can include measures that relate to the motion of macroblocks or the residual energy of DCT macroblocks. This type of analysis can be found in the bit allocation analyzer 240 of FIG. 2.
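The QP-based rate reduction described above can be sketched in a few lines. The following is a minimal illustration only, not the actual MPEG-2 algorithm: it assumes a uniform quantizer and ignores the weighting matrices and intra/inter quantization rules of the real standards.

```python
def requantize(levels, qp_in, qp_out):
    """Coarsely re-quantize DCT coefficient levels in the compressed
    domain (illustrative uniform quantizer; qp_out >= qp_in).

    levels: quantized coefficient levels parsed from the input bitstream.
    """
    assert qp_out >= qp_in
    # Reconstruct approximate coefficient values, then apply the
    # coarser output quantizer; smaller levels cost fewer texture bits.
    return [round(level * qp_in / qp_out) for level in levels]
```

For example, with qp_in=2 and qp_out=4, the levels [4, 8, -6] become [2, 4, -3], roughly halving the texture rate for that block.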
In addition to the above classical methods of transcoding, some new methods of transcoding have been described; see, for example, U.S. patent application Ser. No. 09/504,323 “Object-Based Bitstream Transcoder,” filed by Vetro et al. on Feb. 14, 2000. There, information delivery systems that overcome limitations of conventional transcoding systems were described. The conventional systems were somewhat bounded in the amount of rate that could be reduced, and they did not consider the overall perceptual quality; rather, objective measures, such as PSNR, have dominated.
In the systems described by Vetro et al., conversion is more flexible, and the measure of quality can deviate from classical bit-by-bit differences.
Vetro et al. summarize video content in unique ways. Within the object-based framework, individual video objects are transcoded with different qualities. The difference in quality can relate either to the spatial quality or to the temporal resolution (quality).
If the temporal resolution is varied among objects in a scene, it is important that all objects maintain some type of temporal synchronization with each other. When temporal synchronization is maintained, the receiver can compose the objects so that all pixels within a reconstructed scene are defined.
Undefined pixels in the scene can result from background and foreground objects, or overlapping objects, being sampled at different temporal resolutions, so that “holes” appear in the re-composed scene. Therefore, when varying the temporal resolution of multiple objects during encoding or transcoding, it is critical that synchronization be maintained.
To illustrate this point further, consider a scene with a relatively stationary background object, e.g., a blank wall, and a more active foreground object, such as a moving person. The background can be encoded at a relatively low temporal resolution, say ten frames per second, while the foreground object is encoded at a higher temporal resolution of thirty frames per second. This is fine as long as the foreground object does not move much. However, should the foreground object move with respect to the background, a “hole” will appear in the portion of the background that is no longer occluded by the foreground object.
It is an object of the invention to correct this problem and to enable encoding and transcoding of multiple video objects with variable temporal resolutions.
The most recent standardization effort taken on by the MPEG standard committee is that of MPEG-7, formally called “Multimedia Content Description Interface,” see “MPEG-7 Context, Objectives and Technical Roadmap,” ISO/IEC N2861, July 1999. Essentially, this standard plans to incorporate a set of descriptors and description schemes that can be used to describe various types of multimedia content. The descriptor and description schemes are associated with the content itself and allow for fast and efficient searching of material that is of interest to a particular user. It is important to note that this standard is not meant to replace previous coding standards, rather, it builds on other standard representations, especially MPEG-4, because the multimedia content can be decomposed into different objects and each object can be assigned a unique set of descriptors. Also, the standard is independent of the format in which the content is stored.
The primary application of MPEG-7 is expected to be search and retrieval applications, see “MPEG-7 Applications,” ISO/IEC N2861, July 1999. In a simple application, a user specifies some attributes of a particular object. At this low-level of representation, these attributes can include descriptors that describe the texture, motion and shape of the particular object. A method of representing and comparing shapes has been described in U.S. patent application Ser. No. 09/326,759 “Method for Ordering Image Spaces to Represent Object Shapes” filed on Jun. 4, 1999 by Lin et al., and a method for describing the motion activity has been described in U.S. patent application Ser. No. 09/406,444 “Activity Descriptor for Video Sequences” filed on Sep. 27, 1999 by Divakaran et al. To obtain a higher-level of representation, one can consider more elaborate description schemes that combine several low-level descriptors. In fact, these description schemes can even contain other description schemes, see “MPEG-7 Multimedia Description Schemes WD (V1.0),” ISO/IEC N3113, December 1999 and U.S. patent application Ser. No. 09/385,169 “Method for representing and comparing multimedia content,” filed Aug. 30, 1999 by Lin et al.
These descriptors and description schemes allow a user to access properties of the video content that are not traditionally derived by an encoder or transcoder. For example, these properties can represent look-ahead information that was assumed to be inaccessible to the transcoder. The only reason that the encoder or transcoder has access to these properties is because the properties were extracted from the content at an earlier time, i.e., the content was pre-processed and stored in a database with its associated meta-data.
The information itself can be either syntactic or semantic, where syntactic information refers to the physical and logical signal aspects of the content, while the semantic information refers to the conceptual meaning of the content. For a video sequence, the syntactic elements can be related to the color, shape and motion of a particular object. On the other hand, the semantic elements can refer to information that cannot be extracted from low-level descriptors, such as the time and place of an event or the name of a person in a video sequence.
It is desired to maintain synchronization in an object-based encoder or transcoder for video objects in a scene having variable temporal resolutions. Moreover, it is desired that such variation is identified with video content meta-data.
SUMMARY OF THE INVENTION
The present invention provides an apparatus and method for coding a video. The coding according to the invention can be performed by an encoder or a transcoder. The video is first partitioned into video objects. In the case of the encoder, the partitioning is done with segmentation planes, and in the case of the transcoder, a demultiplexer is used. Over time, shape features are extracted from each object. The shape features can be obtained by measuring how the shape of each object evolves over time. A Hamming or Hausdorff distance measure can be used. The extracted shape features are combined in a rate or transcoder control unit to determine a temporal resolution for each object over time. The temporal resolutions are used to encode the various video objects. Optionally, motion features and coding complexity can also be considered while making trade-offs in temporal resolution determinations.
In the case where the video is uncompressed data, the partitioning, combining, and coding are performed in an encoder. For a compressed video, the demultiplexing, combining, and coding are performed in a transcoder. In the latter case, boundary blocks of the objects in the compressed video are used for extracting the shape features. In one aspect of the invention, different objects can have different temporal resolutions or frame rates.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a prior art transcoder;
FIG. 2 is a block diagram of a prior art partial decoder/encoder;
FIG. 3 is a block diagram of a scene reconstructed from two video objects;
FIG. 4 is a block diagram of a scene reconstructed from two video objects having different temporal resolutions;
FIG. 5 is a block diagram of an encoder according to the invention;
FIG. 6 is a block diagram of a transcoder according to the invention;
FIG. 7 is a flow diagram of a method for encoding according to the invention; and
FIG. 8 is a flow diagram of an example encoding strategy used by the method of FIG. 7.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS Introduction to Variable Temporal Resolution Encoding and Transcoding
Our invention provides a method and apparatus for controlling temporal resolutions while encoding and transcoding multiple video objects in a scene. The temporal resolution controller enables the encoding, transcoding, and reconstruction of objects having variable and different temporal resolutions. One of the main advantages of an object-based coding scheme is that both the spatial and temporal resolution of the objects can vary independently.
Providing higher spatial quality to more interesting objects, such as human faces, is desired; the same is true for temporal resolution. However, in the context of temporal resolution, a major subtlety exists. That is, synchronization among the objects in the scene must be maintained so that all of the pixels in the reconstructed scene are defined. It should be noted that the reconstruction of a compressed video is defined by the normative part of most video standards (MPEG-1/2/4) and handled by conventional decoders. Therefore, decoders are not described herein.
The method and apparatus we describe are applicable to both object-based encoding and transcoding systems, and real-time as well as non real-time applications. During encoding, the input video is uncompressed, and during the transcoding, the input video is compressed. In both cases, the output video is compressed. The mechanism and procedures that we describe can be seamlessly integrated into the architecture of prior art devices.
Composition Problem
FIG. 3 shows a scene 303 that has been partitioned into two video objects: a foreground object 301 and a background object 302. The scene can be reconstructed by combining the two objects. In this simple example, the foreground object is a moving person and the background object is a stationary wall. Note that in the initial frame the pixels of the foreground and background objects define all of the pixels in the scene. When these two objects are encoded at the same temporal resolution, there is no problem with object composition during image reconstruction in the receiver. All pixels in the reconstructed scene 303 are defined.
However, a problem occurs when the objects are encoded at different temporal resolutions. As an example, the background is encoded at a frame rate of 15 Hz, while the foreground is encoded at a frame rate of 30 Hz, which is twice the first rate. In general, the two objects have independent motion, and the pixels that are associated with each will change in every frame. In addition, it should be noted that the foreground object could also be relatively stationary, but that it has higher internal motion than the background object. For example, the foreground is rich in texture, and it includes moving eyes, lips, and other moving facial features, while the background is a blank wall. Therefore, it is desired to encode the foreground at a higher spatial and temporal resolution than the background.
With our example, the foreground object is in motion with respect to the background, as shown in the sequences of FIG. 4. In sequences 401-403, time runs from left to right. Here, sequence 401 is the background object encoded at a relatively low temporal resolution, sequence 402 is the foreground object encoded at a relatively high temporal resolution, and sequence 403 is the reconstructed scene. This causes holes 404 in every other frame. These holes are due to the movement of one object without the updating of adjacent or overlapping objects. The holes are uncovered areas of the scene that cannot be associated with either object and for which no pixels are defined. The holes disappear when the objects are resynchronized, e.g., every other frame.
Shape Distortion Metrics
The method and apparatus for controlling and making decisions on the temporal resolution of objects, according to our invention, rely on the amount of shape change (distortion) in a scene. We describe a number of shape features that can be extracted for this purpose; for example, one shape feature measures the shape difference of an object over time. After the shape features of the various objects have been extracted and compared, the encoder can decide the amount of temporal resolution to use for each object while encoding or transcoding.
Shape differences for each object are measured over time. The shape difference is inversely related to the amount of variability allowed in the temporal resolution between the objects. For a fixed amount of time, a small difference indicates that greater variability is possible, whereas a large difference indicates that less variability is possible. If the duration of time between when objects are resynchronized is made larger, the saved bits can be allocated to objects that need better quality.
Temporal Metrics
A method that optimally synchronizes the objects operates as follows. Periodically sample the video to find a difference between the shapes of each object over time. If the shape difference of an object is small over time, then increase the sampling period for measuring the difference. Continue to increase the sampling period until the difference is greater than some predetermined threshold D. At this point, either output the frames to resynchronize the video objects with that difference, or determine a new frequency at which the objects should be synchronized. The frequency can be based on an average, a minimum, or a median time interval between synchronization frames. This frequency can then be used to determine an optimal temporal rate for each of the various video objects.
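The procedure above can be sketched as follows. Here `shape_diff` is a hypothetical per-object distortion function (e.g., a Hamming distance between the object's shapes at two time instants), and the thresholding, interval recording, and frequency computation follow the steps just described; an average-based frequency is shown.

```python
def sync_intervals(shape_diff, num_frames, threshold):
    """Scan the sequence, growing the gap since the last synchronization
    frame until the accumulated shape difference exceeds the threshold D;
    record the interval lengths between synchronization frames.

    shape_diff(t0, t1) -> shape distortion of the object between frames
    t0 and t1 (hypothetical helper, e.g., a Hamming distance).
    """
    intervals, last_sync = [], 0
    for t in range(1, num_frames):
        if shape_diff(last_sync, t) > threshold:
            intervals.append(t - last_sync)  # resynchronize here
            last_sync = t
    return intervals

def sync_frequency(intervals):
    # The synchronization frequency can equally be based on the minimum
    # or median interval; the average-based form is shown here.
    return len(intervals) / sum(intervals) if intervals else 0.0
```

For instance, if an object's shape difference grows by one unit per frame and D=2, synchronization frames fall every three frames, giving a synchronization frequency of one-third of the full frame rate.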
Difference Based Shape Features
For simplicity, we consider the difference in shape features between two scenes only, i.e., from one frame to the next. However, such shape features can also relate to the scene at various cue levels. Cue levels are defined in U.S. patent application Ser. No. 09/546,717, “Adaptable Bitstream Video Delivery System” filed by Vetro et al. on Apr. 11, 2000, incorporated herein by reference.
Depending on the cue level from which the shape feature is extracted, a temporal controller can provide various ways to affect the temporal resolution of objects in the scene, which are applicable to both encoders and transcoders.
Hamming Distance
The first difference measure that we consider is the well-known Hamming distance. The Hamming distance measures the number of pixels that are different between two shapes. Here, we consider only binary shapes, i.e., segmentation (alpha, α) values may only be zero or one, where zero refers to a transparent pixel in a segmentation plane and one refers to an opaque pixel in the segmentation plane. Within this context, the Hamming distance, d, is defined as:

d = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} |α_1(m,n) − α_2(m,n)|

where α_1(m,n) and α_2(m,n) are corresponding segmentation planes at different time instants.
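As a concrete sketch, the Hamming distance can be computed directly from two binary alpha planes. Plain Python lists of lists are assumed here for illustration; an encoder would operate on its own plane representation.

```python
def hamming_distance(alpha1, alpha2):
    """Count pixels whose alpha values differ between two M x N binary
    segmentation planes taken at different time instants.

    alpha1, alpha2: equal-sized lists of lists with values 0
    (transparent) or 1 (opaque).
    """
    return sum(
        abs(a1 - a2)
        for row1, row2 in zip(alpha1, alpha2)
        for a1, a2 in zip(row1, row2)
    )
```

A distance of zero means the object's shape has not moved between the two time instants, so skipping it costs nothing in composition terms.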
Hausdorff Distance
Another widely used shape difference measure is the Hausdorff distance, which is defined as the maxmin function between two sets of pixels:
h(A,B) = max_{a∈A}{min_{b∈B}{d(a,b)}}
where a and b are pixels of the sets A and B of two video objects, respectively, and d(a,b) is the Euclidean distance between these pixels. The above metric indicates the maximum distance of the pixels in set A to the nearest pixel in set B. Because this metric is not symmetric, i.e., h(A,B) may not be equal to h(B,A), a more general definition is given by:
H(A,B)=max {h(A,B),h(B,A)}.
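A direct (brute-force, O(|A|·|B|)) sketch of both the directed and symmetric Hausdorff distances, with pixels given as (x, y) coordinate pairs:

```python
import math

def directed_hausdorff(A, B):
    """h(A,B): the maximum over a in A of the Euclidean distance
    d(a,b) to a's nearest pixel b in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance H(A,B) = max{h(A,B), h(B,A)}."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

The asymmetry is easy to see: for A = {(0,0), (3,4)} and B = {(0,0)}, h(A,B) is 5 while h(B,A) is 0, so only the symmetric H(A,B) is a reliable difference measure.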
We should note that these difference measures are most accurate when computed in the pixel domain; however, approximated data from the compressed domain can also be used in the above computations. The pixel-domain data are readily available in the encoder, but for the transcoder, it may not be computationally feasible to decode the shape data. Instead, the data can be approximated in some computationally efficient way.
Macroblock Based Shape Features
For instance, in MPEG-4, shape is coded in a variety of different modes at the macroblock level. For example, in intra-mode, a shape macroblock is coded as an opaque macroblock, a transparent macroblock, or a boundary macroblock. The boundary blocks, of course, define the shape of an object. These coding modes can be used to reconstruct a macroblock-level silhouette of the binary shape. This silhouette is not as accurate as the pixel-level metric, but it is quite feasible in terms of complexity.
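A sketch of this idea: treating opaque and boundary macroblocks as object area yields a coarse silhouette whose frame-to-frame difference approximates the pixel-level Hamming distance at macroblock resolution. The mode labels below are illustrative strings, not actual MPEG-4 syntax values.

```python
# Illustrative mode labels for intra-coded shape macroblocks.
OPAQUE, TRANSPARENT, BOUNDARY = "opaque", "transparent", "boundary"

def silhouette(modes):
    """Map per-macroblock shape coding modes to a coarse binary mask:
    opaque and boundary macroblocks count as part of the object."""
    return [[1 if m in (OPAQUE, BOUNDARY) else 0 for m in row]
            for row in modes]

def mb_shape_difference(modes1, modes2):
    """Macroblock-level analogue of the pixel-domain Hamming distance,
    computed without decoding the shape data."""
    s1, s2 = silhouette(modes1), silhouette(modes2)
    return sum(abs(a - b)
               for r1, r2 in zip(s1, s2)
               for a, b in zip(r1, r2))
```

Because the modes are parsed cheaply from the bitstream, this measure suits the transcoder, where fully decoding the shape data may not be feasible.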
Encoder Structure
FIG. 5 shows an object-based encoder 500 according to our invention. The encoder includes a switch 510, a shape coder 520, a motion estimator 530, a motion compensator 540, a motion coder 550, a texture coder 560, a VOP memory 570, a multiplexer (MUX) 580, an output buffer 590, and a meta-data storage unit 591. The encoder also includes a rate control unit (RCU) 592 for performing texture, temporal, shape, and meta-data analysis 593-596. The input to the encoder 500 is an object-based video (In) 501. The video is composed of an image sequence and segmentation (alpha) planes defining the boundary (shape) of each video object.
Encoder Operation
The shape coder 520 processes the shape of each object and writes the shape coding results into an output bitstream (Out) 509 via the MUX 580 and buffer 590. The shape data are also used for motion estimation 530, motion compensation 540, and texture coding 560. Particularly, the shape data is used to extract shape features for each object. The objects, and associated shape and motion features are stored in the VOP memory 570.
In the motion estimator 530, motion vectors are determined for each macroblock. The motion vectors are also coded and written into the output bitstream via the MUX and buffer. Using the motion vectors derived from the motion estimation, a motion compensated prediction is formed from video object data stored in the VOP memory 570. This prediction is subtracted 541 from the input object yielding a set of residual macroblocks. These residual macroblocks are subject to the texture coding 560 and the corresponding data are written to the output bitstream. The texture coding is according to a QP control signal provided by the RCU.
The RCU 592 is responsible for selecting the appropriate quantization parameter (QP) for each video object. It does so by using models to estimate the corresponding QP according to the assigned rate budget. The temporal analysis is described in detail below. Briefly, the temporal analysis is responsible for controlling the temporal resolution of each object during coding and transcoding.
In the prior art, the temporal resolution of all video objects is identical to avoid composition problems as described above with reference to FIG. 4. Therefore, the prior art did not independently consider the temporal resolution of the various objects. There, the temporal analysis provided a signal to skip all video objects when the output buffer was in danger of overflowing. Our invention provides a better solution; for example, objects that are relatively stationary can be encoded at a lower frame rate than faster-moving objects to reduce the overall bit rate.
In the present invention, we consider variable temporal qualities. We enable the encoding and transcoding of video objects with variable temporal resolutions.
The shape analysis 595 is responsible for extracting the shape features that are used by the temporal analysis to decide whether variable temporal resolution can be achieved without composition problems, i.e., holes are avoided even if the temporal encoding rates of the various objects differ. The shape analysis can work in the real-time encoding mode, where data are retrieved from the VOP memory 570. However, if the encoder also receives meta-data 594 related to the shape features, i.e., descriptions of the content already exist, then such meta-data can be used in place of, or in conjunction with, the shape data from the VOP memory 570. The meta-data are handled by the meta-data analysis, and, like the shape analysis, the meta-data assists the temporal analysis in determining an optimal temporal resolution for each video object.
Transcoder Structure
FIG. 6 shows a high-level block diagram of an object-based transcoder 600 according to an alternative embodiment of the invention. Here, the input video is already compressed. The transcoder 600 includes a demultiplexer 601, a multiplexer 602, and an output buffer 603. The transcoder 600 also includes one or more object-based transcoders 630 operated by a transcoding control unit (TCU) 610 according to control information 604. The TCU includes texture, temporal, shape, and meta-data analyzers 611-614.
An input compressed bitstream 605 is partitioned into one or more object-based elementary bitstreams by the demultiplexer. The object-based bitstreams can be serial or parallel. The total bit rate of the bitstream 605 is Rin. The output compressed bitstream 606 from the transcoder 600 has a total bit rate Rout such that Rout&lt;Rin. The demultiplexer 601 provides one or more elementary bitstreams to each of the object-based transcoders 630, and the object-based transcoders provide object data 607 to the TCU 610.
The transcoders scale the elementary bitstreams. The scaled bitstreams are composed by the multiplexer 602 before being passed on to the output buffer 603, and from there to a receiver. The output buffer 603 also provides rate-feedback information 608 to the TCU.
As stated above, the control information 604 that is passed to each of the transcoders is provided by the TCU. As indicated in FIG. 6, the TCU is responsible for the analysis of texture and shape data. During the analysis, the TCU can also use network data 609. The TCU also performs meta-data analysis 614. The analysis of the temporal quality enables transcoding with variable temporal resolution.
Encoding/Transcoding Method
FIG. 7 shows the steps of a method 700 for encoding and transcoding a video 701 according to our invention. The input 701 to the method is either an uncompressed video, in the case of the encoder 500, or a compressed video, in the case of the transcoder 600. Step 710 partitions the video 701 into objects 711. Step 720 extracts, over time, shape features 721 from each object. The shape features can be distance or macroblock based, as described above. Step 730 optionally extracts motion features from each object over time. Other features that can be extracted and considered to determine an optimal temporal resolution include coding complexity, e.g., spatial complexity, DCT complexity, and texture complexity. Step 740 combines the extracted features to determine temporal resolutions 741 to use while encoding or transcoding the various objects 711 in step 750.
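The combining step 740 can be sketched as a simple control loop. The extractor callables, thresholds, and frame rates below are hypothetical placeholders for whatever features and policies an implementation uses.

```python
def determine_temporal_resolutions(objects, extract_shape, extract_motion,
                                   base_rate=30, low_rate=10):
    """Combine extracted features into a per-object temporal resolution.

    objects: partitioned video objects (step 710); extract_shape and
    extract_motion: hypothetical per-object feature extractors (steps
    720/730), each returning a normalized score in [0,1]. Objects with
    small shape change and little internal motion get the lower frame
    rate; the others keep the base rate.
    """
    resolutions = {}
    for obj in objects:
        shape_change = extract_shape(obj)  # e.g., Hamming distance over time
        motion = extract_motion(obj)       # optional internal-motion feature
        stationary = shape_change < 0.1 and motion < 0.1
        resolutions[obj] = low_rate if stationary else base_rate
    return resolutions
```

With a near-static background wall and a moving foreground person, this policy assigns the background the low frame rate and keeps the foreground at the base rate, mirroring the example of FIG. 4.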
Example Encoding Scenarios
FIG. 8 shows some example encoding scenarios that are based on analyzing the evolving shape of video objects over time. Here, the input is first and second extracted object sequences 801-802. Graphs 810 and 820 plot shape features, e.g., shape differences (Δ), over time (t). Note that between times t1 and t2 the objects' shapes remain relatively constant. Graphs 811 and 821 optionally plot each object's internal motion features over time. Note that the first object has very little internal motion, while the second object's internal motion is quite high. The combiner 850 (RCU 592 or TCU 610) considers the extracted features, using perhaps a maxmin, summation, comparison, or other combinatorial function, to make decisions on how to best distribute the available bits over the various objects during the actual coding.
In scenario 831, the first object is not coded at all during the interval [t1,t2], and all available bits are allocated to the second object. This can have the effect of an observable and sudden change in the quality of the video at times t1 and t2. A better scenario 832 might use a lower temporal resolution during the interval [t1,t2], or better yet a gradual reduction in resolution followed by a gradual increase. In scenario 833, more bits are allocated to the second object during the time intervals [t0,t1] and [t2,tend] than during the interval [t1,t2], to reflect the higher internal motion of the second object.
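The gradual reduce-then-restore behavior described above can be sketched as a frame-rate profile around the low-activity interval; the linear ramp and the parameter names are assumptions for illustration.

```python
def ramped_schedule(full_rate, low_rate, n_ramp, n_low):
    """Frame-rate profile around an interval of low shape activity:
    ramp the temporal resolution down linearly, hold the low rate,
    then ramp back up, avoiding a sudden observable quality change
    at the interval boundaries."""
    step = (full_rate - low_rate) / n_ramp
    down = [full_rate - step * (i + 1) for i in range(n_ramp)]
    return down + [low_rate] * n_low + list(reversed(down))
```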
All of the new degrees of freedom described above make the object-based transcoding framework unique and desirable for network applications. As with the MPEG-2 and H.263 coding standards, MPEG-4 exploits the spatio-temporal redundancy of video using motion compensation and the DCT. As a result, the core of our object-based transcoders is an adaptation of the MPEG-2 transcoders described above. The major difference is that shape information is now contained within the bitstream, and, with regard to texture coding, tools are provided to predict the DC and AC coefficients of intra blocks.
It is also important to note that the transcoding of texture is indeed dependent on the shape data. In other words, the shape data cannot simply be parsed out and ignored; the syntax of a compliant bitstream depends on the decoded shape data.
Obviously, our object-based input and output bitstreams 601-602 are entirely different from traditional frame-based video programs. Also, MPEG-2 does not permit dynamic frame skipping; there, the GOP structure and reference frames are usually fixed.
In the non-real-time scenario case, content 651 and corresponding content descriptions 652 are stored in a database 650. The content descriptions are generated from a feature extractor 640, which accepts the input object-based bitstreams 605. When it is time to transmit the contents, the input bitstream is fed into the demux 601 and transcoder as described above. The meta-data are sent to the meta-data analysis 614 within the TCU.
Functionality of Temporal Analysis
The main objective of the temporal controller in an object-based encoder or transcoder is to maximize the quality of the composed scene on the receiver side, while avoiding composition problems as described above with reference to FIG. 4. To maximize quality under these constraints, it is necessary to exploit the temporal redundancy in the signal as much as possible.
In most video coding schemes, the motion compensation process removes temporal redundancy. However, specifying the motion vector for every coding unit or macroblock may be more than is actually required. In addition to bits for the motion vector, the residual of the motion compensated difference must also be coded. The point is that, to maximize quality, not every object needs to be coded at every time instant. The bits saved this way can be used for other, more important objects at different time instants.
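The idea that not every object needs to be coded at every time instant can be sketched as a per-instant skip decision; the activity threshold is a hypothetical parameter, not a value specified by the patent.

```python
def coding_schedule(activity, skip_threshold=0.05):
    """Per time instant: True means code the object, False means skip
    it (the decoder reuses the previously coded instance), so the
    saved bits can go to more important objects."""
    return [a >= skip_threshold for a in activity]
```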
For the non-real-time scenario, the temporal controller makes use of the shape distortion metrics to indicate the amount of movement among shapes in the scene. This measure can relate to the scene at various cue levels as defined in U.S. patent application Ser. No. 09/546,717. Depending on the cue level that this feature (or measure) is extracted from, the temporal controller can provide various ways to impact the temporal resolution of objects in the scene, which are applicable to both encoders and transcoders.
For real-time scenarios, the temporal controller acts in the same manner. However, because observations are limited by latency constraints, only causal data are considered. Therefore, the temporal coding decisions are made instantaneously.
As stated earlier, extraction of the shape distortion metric can be done in either the pixel or the compressed domain. Regardless of where the distortion information is extracted, some tolerance can be incorporated into the decision-making process of the temporal control. In other words, some applications may tolerate a small amount of undefined area, provided that the gain in the defined area is substantial.
In this case, a weight in the range [0,1] is defined, where 0 means that there is no movement among the shape boundaries and 1 means that the shape boundary is completely different. The weight is a function of the shape distortion metrics defined earlier and can correspond to a percentage or normalized value. On the other hand, for applications that do not allow room for composition problems, this weighting does not exist; only the extreme weights are valid, i.e., 0 or 1.
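The weighting just described can be sketched as follows; normalizing by a maximum distortion value is one plausible way to obtain a value in [0,1], and is an assumption of this sketch.

```python
def boundary_weight(distortion, max_distortion, strict=False):
    """Map a shape distortion metric to [0, 1]: 0 means no movement
    among the shape boundaries, 1 means a completely different
    boundary. In strict mode (no composition errors tolerated) only
    the extreme weights 0 and 1 are valid."""
    w = min(1.0, distortion / max_distortion) if max_distortion else 0.0
    if strict:
        return 0.0 if w == 0.0 else 1.0
    return w
```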
In situations where some tolerable amount of undefined pixels is received, it is possible to recover these pixels using simple post-processing interpolation techniques or other techniques based on error concealment.
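A simple post-processing interpolation of the kind mentioned, shown for a single scan line; the fill rule (average of the nearest defined neighbors) is an assumption, and real concealment methods can be considerably more elaborate.

```python
def conceal_row(values, defined):
    """Fill each undefined pixel with the average of the nearest
    defined pixels to its left and right (or copy the only available
    neighbor when the pixel sits at a boundary)."""
    out = list(values)
    n = len(values)
    for i in range(n):
        if defined[i]:
            continue
        left = next((out[j] for j in range(i - 1, -1, -1) if defined[j]), None)
        right = next((values[j] for j in range(i + 1, n) if defined[j]), None)
        if left is not None and right is not None:
            out[i] = (left + right) / 2
        else:
            out[i] = left if left is not None else right
    return out
```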
Effects and Advantages of Variable Temporal Resolution Encoding
The temporal controller according to our invention provides the following effects and advantages.
Determine instances in time when objects can be encoded or transcoded with variable temporal resolution. Assign fixed non-uniform frame-rates to the objects of a video segment. Extract or locate key frames to enable the summarization of content.
Improve bit allocation, or reserve bits for portions (frames) of a video where changes in shape of objects are large. Such frames are more demanding on the bits required for the shape information. In order to maintain the quality of the texture information, additional bits may be required.
Although the invention has been described by way of examples of above embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

Claims (19)

We claim:
1. A method for coding a video, comprising:
partitioning the video into a plurality of objects;
measuring, over time, differences in shape of each object to determine a plurality of shape features of each object;
combining the plurality of shape features to determine a temporal resolution for each object; and
coding each object according to the corresponding temporal resolution of the object.
2. The method of claim 1 wherein the video is uncompressed data, and the partitioning, combining, and coding is performed in an encoder.
3. The method of claim 1 wherein the video is compressed data, and the partitioning, combining, and coding is performed in a transcoder.
4. The method of claim 1 wherein at least two objects are coded according to different corresponding temporal resolutions.
5. The method of claim 1 wherein the temporal resolution of a coded object is proportional to the shape difference associated with the coded object.
6. The method of claim 1 wherein the shape difference is a Hamming distance which measures the number of pixels that are different between the objects.
7. The method of claim 3 wherein the partitioned objects have binary shapes, and the Hamming distance, d, is defined as:

d = Σ_{n=0}^{N-1} Σ_{m=0}^{M-1} |α1(m,n) − α2(m,n)|
where α1(m,n) and α2(m,n) are corresponding segmentation planes at different time instants.
8. The method of claim 3 wherein the shape difference is a Hausdorff distance, which is defined as a maxmin function between sets of pixels associated with the objects.
9. The method of claim 8 wherein the maxmin function is
h(A,B)=max{min{d(a,b)}}
where a and b are pixels of sets A and B of a first and second object respectively, and d(a,b) is a Euclidean distance between the pixels.
10. The method of claim 1 wherein the video includes a plurality of frames, and each frame includes a plurality of macroblocks, and the macroblocks are coded as opaque blocks, transparent blocks, and boundary blocks.
11. The method of claim 1 further comprising:
coding the shape features of the objects as meta-data.
12. The method of claim 1 further comprising:
extracting, over time, a motion feature from each object;
combining, over time, the motion features with the shape features to determine the temporal resolution for each object over time.
13. The method of claim 1 further comprising:
extracting, over time, a coding complexity from each object;
combining, over time, the coding complexity with the shape features to determine the temporal resolution for each object over time.
14. The method of claim 1 wherein the shape features of the objects are extracted from a plurality of cue levels of the video.
15. An apparatus for coding a video, comprising:
means for partitioning a video into a plurality of objects;
means for measuring, over time, differences in shape of each object to determine a plurality of shape features of each object;
means for combining the plurality of shape features to determine a temporal resolution for each object; and
means for coding each object according to the corresponding temporal resolution of the object.
16. The apparatus of claim 15 wherein the means for partitioning and measuring includes a shape coder, a motion estimator, a motion compensator and a texture coder.
17. The apparatus of claim 15 wherein the objects and shape features are stored in a memory.
18. The apparatus of claim 15 wherein the video is uncompressed, and the means for combining is a rate control unit.
19. The apparatus of claim 15 wherein the video is compressed, and the means for combining is a transcoding control unit.
US09/579,889 2000-05-26 2000-05-26 Method for encoding and transcoding multiple video objects with variable temporal resolution Expired - Fee Related US6650705B1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US09/579,889 US6650705B1 (en) 2000-05-26 2000-05-26 Method for encoding and transcoding multiple video objects with variable temporal resolution
CN01802111.5A CN1199467C (en) 2000-05-26 2001-03-08 Method and device for encoding image
PCT/JP2001/001828 WO2001091467A1 (en) 2000-05-26 2001-03-08 Method and device for encoding image
EP01912202A EP1289301B1 (en) 2000-05-26 2001-03-08 Method and device for encoding image
JP2001586925A JP4786114B2 (en) 2000-05-26 2001-03-08 Method and apparatus for encoding video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/579,889 US6650705B1 (en) 2000-05-26 2000-05-26 Method for encoding and transcoding multiple video objects with variable temporal resolution

Publications (1)

Publication Number Publication Date
US6650705B1 true US6650705B1 (en) 2003-11-18

Family

ID=24318760

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/579,889 Expired - Fee Related US6650705B1 (en) 2000-05-26 2000-05-26 Method for encoding and transcoding multiple video objects with variable temporal resolution

Country Status (5)

Country Link
US (1) US6650705B1 (en)
EP (1) EP1289301B1 (en)
JP (1) JP4786114B2 (en)
CN (1) CN1199467C (en)
WO (1) WO2001091467A1 (en)

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020010938A1 (en) * 2000-05-31 2002-01-24 Qian Zhang Resource allocation in multi-stream IP network for optimized quality of service
US20020028024A1 (en) * 2000-07-11 2002-03-07 Mediaflow Llc System and method for calculating an optimum display size for a visual object
US20020118752A1 (en) * 2000-12-26 2002-08-29 Nec Corporation Moving picture encoding system
US20020152317A1 (en) * 2001-04-17 2002-10-17 General Instrument Corporation Multi-rate transcoder for digital streams
US20020198905A1 (en) * 2001-05-29 2002-12-26 Ali Tabatabai Transport hint table for synchronizing delivery time between multimedia content and multimedia content descriptions
US20030122733A1 (en) * 2000-06-15 2003-07-03 Blackham Geoffrey Howard Apparatus using framestore demultiplexing
US20030169933A1 (en) * 2002-03-09 2003-09-11 Samsung Electronics Co., Ltd. Method for adaptively encoding motion image based on temporal and spatial complexity and apparatus therefor
US20040001544A1 (en) * 2002-06-28 2004-01-01 Microsoft Corporation Motion estimation/compensation for screen capture video
US20040001634A1 (en) * 2002-06-28 2004-01-01 Microsoft Corporation Text detection in continuous tone image segments
US20040013198A1 (en) * 2001-08-31 2004-01-22 Haruo Togashi Encoding apparatus and method for encoding
US20040017939A1 (en) * 2002-07-23 2004-01-29 Microsoft Corporation Segmentation of digital video and images into continuous tone and palettized regions
US20040062268A1 (en) * 2002-09-30 2004-04-01 Intel Corporation Automated method for mapping constant bit-rate network traffic onto a non-constant bit-rate network
US20040131261A1 (en) * 2002-09-04 2004-07-08 Microsoft Corporation Image compression and synthesis for video effects
US20040189863A1 (en) * 1998-09-10 2004-09-30 Microsoft Corporation Tracking semantic objects in vector image sequences
US20040225506A1 (en) * 2001-06-28 2004-11-11 Microsoft Corporation Techniques for quantization of spectral data in transcoding
US20040252230A1 (en) * 2003-06-13 2004-12-16 Microsoft Corporation Increasing motion smoothness using frame interpolation with motion analysis
US20040252759A1 (en) * 2003-06-13 2004-12-16 Microsoft Corporation Quality control in frame interpolation with motion analysis
US20050058195A1 (en) * 2003-07-18 2005-03-17 Samsung Electronics Co., Ltd. GoF/GoP texture description method, and texture-based GoF/GoP retrieval method and apparatus using the same
US20050175099A1 (en) * 2004-02-06 2005-08-11 Nokia Corporation Transcoder and associated system, method and computer program product for low-complexity reduced resolution transcoding
US6950464B1 (en) * 2001-12-26 2005-09-27 Cisco Technology, Inc. Sub-picture level pass through
US20050232497A1 (en) * 2004-04-15 2005-10-20 Microsoft Corporation High-fidelity transcoding
US20050254584A1 (en) * 2001-03-05 2005-11-17 Chang-Su Kim Systems and methods for enhanced error concealment in a video decoder
US7020335B1 (en) * 2000-11-21 2006-03-28 General Dynamics Decision Systems, Inc. Methods and apparatus for object recognition and compression
US20060109915A1 (en) * 2003-11-12 2006-05-25 Sony Corporation Apparatus and method for use in providing dynamic bit rate encoding
US20060125956A1 (en) * 2004-11-17 2006-06-15 Samsung Electronics Co., Ltd. Deinterlacing method and device in use of field variable partition type
US20060233258A1 (en) * 2005-04-15 2006-10-19 Microsoft Corporation Scalable motion estimation
US20060291412A1 (en) * 2005-06-24 2006-12-28 Naqvi Shamim A Associated device discovery in IMS networks
US20070011718A1 (en) * 2005-07-08 2007-01-11 Nee Patrick W Jr Efficient customized media creation through pre-encoding of common elements
US20070171979A1 (en) * 2004-02-20 2007-07-26 Onno Eerenberg Method of video decoding
US20070182728A1 (en) * 2006-02-06 2007-08-09 Seiko Epson Corporation Image display system, image display method, image display program, recording medium, data processing apparatus, and image display apparatus
US20070197227A1 (en) * 2006-02-23 2007-08-23 Aylus Networks, Inc. System and method for enabling combinational services in wireless networks by using a service delivery platform
US20070237226A1 (en) * 2006-04-07 2007-10-11 Microsoft Corporation Switching distortion metrics during motion estimation
US20070237232A1 (en) * 2006-04-07 2007-10-11 Microsoft Corporation Dynamic selection of motion estimation search ranges and extended motion vector ranges
US20070268964A1 (en) * 2006-05-22 2007-11-22 Microsoft Corporation Unit co-location-based motion estimation
US7321624B1 (en) * 2001-03-16 2008-01-22 Objectvideo, Inc. Bit-rate allocation system for object-based video encoding
US20080162713A1 (en) * 2006-12-27 2008-07-03 Microsoft Corporation Media stream slicing and processing load allocation for multi-user media systems
US20080170753A1 (en) * 2007-01-11 2008-07-17 Korea Electronics Technology Institute Method for Image Prediction of Multi-View Video Codec and Computer Readable Recording Medium Therefor
US20080291905A1 (en) * 2006-05-16 2008-11-27 Kiran Chakravadhanula Systems and Methods for Real-Time Cellular-to-Internet Video Transfer
US20090251829A1 (en) * 2008-04-02 2009-10-08 Headway Technologies, Inc. Seed layer for TMR or CPP-GMR sensor
US20100296572A1 (en) * 2007-12-11 2010-11-25 Kumar Ramaswamy Methods and systems for transcoding within the distributiion chain
US20100309975A1 (en) * 2009-06-05 2010-12-09 Apple Inc. Image acquisition and transcoding system
US7853865B2 (en) * 2002-03-19 2010-12-14 Sharp Laboratories Of America, Inc. Synchronization of video and data
US20110128448A1 (en) * 2008-03-06 2011-06-02 Erwin Bellers Temporal Fallback For High Frame Rate Picture Rate Conversion
WO2012039933A1 (en) * 2010-09-21 2012-03-29 Dialogic Corporation Efficient coding complexity for video transcoding systems
US8170534B2 (en) 2007-04-17 2012-05-01 Aylus Networks, Inc. Systems and methods for user sessions with dynamic service selection
US8270473B2 (en) 2009-06-12 2012-09-18 Microsoft Corporation Motion based dynamic resolution multiple bit rate video encoding
US20120281748A1 (en) * 2011-05-02 2012-11-08 Futurewei Technologies, Inc. Rate Control for Cloud Transcoding
US8311115B2 (en) 2009-01-29 2012-11-13 Microsoft Corporation Video encoding using previously calculated motion information
US8396114B2 (en) 2009-01-29 2013-03-12 Microsoft Corporation Multiple bit rate video encoding using variable bit rate and dynamic resolution for adaptive video streaming
US20130070859A1 (en) * 2011-09-16 2013-03-21 Microsoft Corporation Multi-layer encoding and decoding
US8432899B2 (en) 2007-02-22 2013-04-30 Aylus Networks, Inc. Systems and methods for enabling IP signaling in wireless networks
US8457958B2 (en) 2007-11-09 2013-06-04 Microsoft Corporation Audio transcoder using encoder-generated side information to transcode to target bit-rate
US8483373B2 (en) 2005-06-24 2013-07-09 Aylus Networks, Inc. Method of avoiding or minimizing cost of stateful connections between application servers and S-CSCF nodes in an IMS network with multiple domains
USRE44412E1 (en) 2005-06-24 2013-08-06 Aylus Networks, Inc. Digital home networks having a control point located on a wide area network
US8611334B2 (en) 2006-05-16 2013-12-17 Aylus Networks, Inc. Systems and methods for presenting multimedia objects in conjunction with voice calls from a circuit-switched network
CN103597839A (en) * 2011-05-31 2014-02-19 杜比实验室特许公司 Video compression implementing resolution tradeoffs and optimization
US8705616B2 (en) 2010-06-11 2014-04-22 Microsoft Corporation Parallel multiple bitrate video encoding to reduce latency and dependences between groups of pictures
US20140307793A1 (en) * 2006-09-06 2014-10-16 Alexander MacInnis Systems and Methods for Faster Throughput for Compressed Video Data Decoding
US20140348231A1 (en) * 2009-09-04 2014-11-27 STMicoelectronics International N.V. System and method for object based parametric video coding
US9371099B2 (en) 2004-11-03 2016-06-21 The Wilfred J. and Louisette G. Lagassey Irrevocable Trust Modular intelligent transportation system
US10178396B2 (en) 2009-09-04 2019-01-08 Stmicroelectronics International N.V. Object tracking
US10847048B2 (en) * 2018-02-23 2020-11-24 Frontis Corp. Server, method and wearable device for supporting maintenance of military apparatus based on augmented reality using correlation rule mining
US11089343B2 (en) 2012-01-11 2021-08-10 Microsoft Technology Licensing, Llc Capability advertisement, configuration and control for video coding and decoding
US11671632B2 (en) * 2018-08-14 2023-06-06 Huawei Technologies Co., Ltd. Machine-learning-based adaptation of coding parameters for video encoding using motion and object detection

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2842983B1 (en) * 2002-07-24 2004-10-15 Canon Kk TRANSCODING OF DATA
DE10335009A1 (en) * 2003-07-23 2005-02-10 Atmel Germany Gmbh Method for wireless data transmission between a base station and a transponder
KR100619822B1 (en) 2003-12-24 2006-09-13 엘지전자 주식회사 Image processing apparatus and method
WO2007007257A1 (en) * 2005-07-13 2007-01-18 Koninklijke Philips Electronics N.V. Processing method and device with video temporal up-conversion
WO2008123568A1 (en) * 2007-04-04 2008-10-16 Nec Corporation Content distribution system, content distribution method, and translator for use in them
JP5337969B2 (en) * 2008-04-08 2013-11-06 富士フイルム株式会社 Image processing system, image processing method, and program
US8447128B2 (en) 2008-04-07 2013-05-21 Fujifilm Corporation Image processing system
FR2932055B1 (en) * 2008-06-03 2010-08-13 Thales Sa METHOD FOR ADAPTATION OF THE FLOW OF TRANSMISSION OF VIDEO FLOWS BY PRETREATMENT IN THE COMPRESSED DOMAIN AND SYSTEM IMPLEMENTING THE METHOD
KR102499355B1 (en) 2016-02-26 2023-02-13 벌시테크 리미티드 A shape-adaptive model-based codec for lossy and lossless image compression

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5847761A (en) 1995-12-26 1998-12-08 C-Cube Microsystems Inc. Method for performing rate control in a video encoder which provides a bit budget for each frame while employing virtual buffers and virtual buffer verifiers
US5969764A (en) 1997-02-14 1999-10-19 Mitsubishi Electric Information Technology Center America, Inc. Adaptive video coding method
US6026195A (en) * 1997-03-07 2000-02-15 General Instrument Corporation Motion estimation and compensation of video object planes for interlaced digital video
US6167084A (en) * 1998-08-27 2000-12-26 Motorola, Inc. Dynamic bit allocation for statistical multiplexing of compressed and uncompressed digital video signals
US6192080B1 (en) * 1998-12-04 2001-02-20 Mitsubishi Electric Research Laboratories, Inc. Motion compensated digital video signal processing
US6295371B1 (en) * 1998-10-22 2001-09-25 Xerox Corporation Method and apparatus for image processing employing image segmentation using tokenization
US6385242B1 (en) * 1998-05-04 2002-05-07 General Instrument Corporation Method and apparatus for inverse quantization of MPEG-4 video
US6411724B1 (en) * 1999-07-02 2002-06-25 Koninklijke Philips Electronics N.V. Using meta-descriptors to represent multimedia information
US6459812B2 (en) * 1996-09-09 2002-10-01 Sony Corporation Picture encoding and/or decoding apparatus and method for providing scalability of a video object whose position changes with time and a recording medium having the same recorded thereon

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63157579A (en) * 1986-12-22 1988-06-30 Nippon Telegr & Teleph Corp <Ntt> Pseudo three-dimensional image pickup device
JPH0813145B2 (en) * 1988-03-09 1996-02-07 国際電信電話株式会社 Video coding method using region segmentation
JP2828997B2 (en) * 1988-07-22 1998-11-25 株式会社日立製作所 Adaptive transform coding device
JPH0787586B2 (en) * 1989-08-02 1995-09-20 富士通株式会社 Image signal coding control method
JP2536684B2 (en) * 1990-09-29 1996-09-18 日本ビクター株式会社 Image coding device
JPH04354489A (en) * 1991-05-31 1992-12-08 Fujitsu Ltd Picture coder
JPH05111015A (en) * 1991-10-17 1993-04-30 Sony Corp Movement adaptive image encoder
JP3245977B2 (en) * 1992-06-30 2002-01-15 ソニー株式会社 Digital image signal transmission equipment
JP3197420B2 (en) * 1994-01-31 2001-08-13 三菱電機株式会社 Image coding device
JPH07288806A (en) * 1994-04-20 1995-10-31 Hitachi Ltd Moving image communication system
JP4499204B2 (en) * 1997-07-18 2010-07-07 ソニー株式会社 Image signal multiplexing apparatus and method, and transmission medium
JP3860323B2 (en) * 1997-10-27 2006-12-20 三菱電機株式会社 Image decoding apparatus and image decoding method
JP3211778B2 (en) * 1998-07-17 2001-09-25 ミツビシ・エレクトリック・リサーチ・ラボラトリーズ・インコーポレイテッド Improved adaptive video coding method
JP2000078572A (en) * 1998-08-31 2000-03-14 Toshiba Corp Object encoding device, frame omission control method for object encoding device and storage medium recording program
JP2000092489A (en) * 1998-09-09 2000-03-31 Toshiba Corp Device and method for image encoding and medium recorded with program


Cited By (116)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7088845B2 (en) 1998-09-10 2006-08-08 Microsoft Corporation Region extraction in vector images
US20040189863A1 (en) * 1998-09-10 2004-09-30 Microsoft Corporation Tracking semantic objects in vector image sequences
US7559078B2 (en) * 2000-05-31 2009-07-07 Microsoft Corporation Resource allocation in multi-stream IP network for optimized quality of service
US20050155080A1 (en) * 2000-05-31 2005-07-14 Microsoft Corporation Resource allocation in multi-stream IP network for optimized quality of service
US7260826B2 (en) * 2000-05-31 2007-08-21 Microsoft Corporation Resource allocation in multi-stream IP network for optimized quality of service
US7574726B2 (en) * 2000-05-31 2009-08-11 Microsoft Corporation Resource allocation in multi-stream IP network for optimized quality of service
US7450571B2 (en) 2000-05-31 2008-11-11 Microsoft Corporation Resource allocation in multi-stream IP network for optimized quality of service
US20020010938A1 (en) * 2000-05-31 2002-01-24 Qian Zhang Resource allocation in multi-stream IP network for optimized quality of service
US20060156200A1 (en) * 2000-05-31 2006-07-13 Microsoft Corporation Resource Allocation in Multi-Stream IP Network for Optimized Quality of Service
US20060156201A1 (en) * 2000-05-31 2006-07-13 Microsoft Corporation Resource Allocation in Multi-Stream IP Network for Optimized Quality of Service
US20030122733A1 (en) * 2000-06-15 2003-07-03 Blackham Geoffrey Howard Apparatus using framestore demultiplexing
US20020028024A1 (en) * 2000-07-11 2002-03-07 Mediaflow Llc System and method for calculating an optimum display size for a visual object
US7020335B1 (en) * 2000-11-21 2006-03-28 General Dynamics Decision Systems, Inc. Methods and apparatus for object recognition and compression
US20020118752A1 (en) * 2000-12-26 2002-08-29 Nec Corporation Moving picture encoding system
US7170940B2 (en) * 2000-12-26 2007-01-30 Nec Corporation Moving picture encoding system
US20070104271A1 (en) * 2000-12-26 2007-05-10 Nec Corporation Moving picture encoding system
US20050254584A1 (en) * 2001-03-05 2005-11-17 Chang-Su Kim Systems and methods for enhanced error concealment in a video decoder
US7321624B1 (en) * 2001-03-16 2008-01-22 Objectvideo, Inc. Bit-rate allocation system for object-based video encoding
US20020152317A1 (en) * 2001-04-17 2002-10-17 General Instrument Corporation Multi-rate transcoder for digital streams
US6925501B2 (en) * 2001-04-17 2005-08-02 General Instrument Corporation Multi-rate transcoder for digital streams
US7734997B2 (en) * 2001-05-29 2010-06-08 Sony Corporation Transport hint table for synchronizing delivery time between multimedia content and multimedia content descriptions
US20020198905A1 (en) * 2001-05-29 2002-12-26 Ali Tabatabai Transport hint table for synchronizing delivery time between multimedia content and multimedia content descriptions
US20040225506A1 (en) * 2001-06-28 2004-11-11 Microsoft Corporation Techniques for quantization of spectral data in transcoding
US20050240398A1 (en) * 2001-06-28 2005-10-27 Microsoft Corporation Techniques for quantization of spectral data in transcoding
US7092879B2 (en) 2001-06-28 2006-08-15 Microsoft Corporation Techniques for quantization of spectral data in transcoding
US7069209B2 (en) 2001-06-28 2006-06-27 Microsoft Corporation Techniques for quantization of spectral data in transcoding
US20040013198A1 (en) * 2001-08-31 2004-01-22 Haruo Togashi Encoding apparatus and method for encoding
US6950464B1 (en) * 2001-12-26 2005-09-27 Cisco Technology, Inc. Sub-picture level pass through
US7280708B2 (en) * 2002-03-09 2007-10-09 Samsung Electronics Co., Ltd. Method for adaptively encoding motion image based on temporal and spatial complexity and apparatus therefor
US20030169933A1 (en) * 2002-03-09 2003-09-11 Samsung Electronics Co., Ltd. Method for adaptively encoding motion image based on temporal and spatial complexity and apparatus therefor
US8214741B2 (en) 2002-03-19 2012-07-03 Sharp Laboratories Of America, Inc. Synchronization of video and data
US7853865B2 (en) * 2002-03-19 2010-12-14 Sharp Laboratories Of America, Inc. Synchronization of video and data
US7085420B2 (en) 2002-06-28 2006-08-01 Microsoft Corporation Text detection in continuous tone image segments
US7224731B2 (en) * 2002-06-28 2007-05-29 Microsoft Corporation Motion estimation/compensation for screen capture video
US20040001634A1 (en) * 2002-06-28 2004-01-01 Microsoft Corporation Text detection in continuous tone image segments
US20040001544A1 (en) * 2002-06-28 2004-01-01 Microsoft Corporation Motion estimation/compensation for screen capture video
US7072512B2 (en) 2002-07-23 2006-07-04 Microsoft Corporation Segmentation of digital video and images into continuous tone and palettized regions
US20040017939A1 (en) * 2002-07-23 2004-01-29 Microsoft Corporation Segmentation of digital video and images into continuous tone and palettized regions
US20040131261A1 (en) * 2002-09-04 2004-07-08 Microsoft Corporation Image compression and synthesis for video effects
US7421129B2 (en) 2002-09-04 2008-09-02 Microsoft Corporation Image compression and synthesis for video effects
US20040062268A1 (en) * 2002-09-30 2004-04-01 Intel Corporation Automated method for mapping constant bit-rate network traffic onto a non-constant bit-rate network
US7292574B2 (en) * 2002-09-30 2007-11-06 Intel Corporation Automated method for mapping constant bit-rate network traffic onto a non-constant bit-rate network
US7558320B2 (en) 2003-06-13 2009-07-07 Microsoft Corporation Quality control in frame interpolation with motion analysis
US7408986B2 (en) 2003-06-13 2008-08-05 Microsoft Corporation Increasing motion smoothness using frame interpolation with motion analysis
US20040252230A1 (en) * 2003-06-13 2004-12-16 Microsoft Corporation Increasing motion smoothness using frame interpolation with motion analysis
US20040252759A1 (en) * 2003-06-13 2004-12-16 Microsoft Corporation Quality control in frame interpolation with motion analysis
US7519114B2 (en) * 2003-07-18 2009-04-14 Samsung Electronics Co., Ltd. GoF/GoP texture description method, and texture-based GoF/GoP retrieval method and apparatus using the same
US20050058195A1 (en) * 2003-07-18 2005-03-17 Samsung Electronics Co., Ltd. GoF/GoP texture description method, and texture-based GoF/GoP retrieval method and apparatus using the same
US9497513B2 (en) * 2003-11-12 2016-11-15 Sony Corporation Apparatus and method for use in providing dynamic bit rate encoding
US20060109915A1 (en) * 2003-11-12 2006-05-25 Sony Corporation Apparatus and method for use in providing dynamic bit rate encoding
US20050175099A1 (en) * 2004-02-06 2005-08-11 Nokia Corporation Transcoder and associated system, method and computer program product for low-complexity reduced resolution transcoding
US20070171979A1 (en) * 2004-02-20 2007-07-26 Onno Eerenberg Method of video decoding
US20050232497A1 (en) * 2004-04-15 2005-10-20 Microsoft Corporation High-fidelity transcoding
US10979959B2 (en) 2004-11-03 2021-04-13 The Wilfred J. and Louisette G. Lagassey Irrevocable Trust Modular intelligent transportation system
US9371099B2 (en) 2004-11-03 2016-06-21 The Wilfred J. and Louisette G. Lagassey Irrevocable Trust Modular intelligent transportation system
US7535513B2 (en) * 2004-11-17 2009-05-19 Samsung Electronics Co., Ltd. Deinterlacing method and device in use of field variable partition type
US20060125956A1 (en) * 2004-11-17 2006-06-15 Samsung Electronics Co., Ltd. Deinterlacing method and device in use of field variable partition type
US20060233258A1 (en) * 2005-04-15 2006-10-19 Microsoft Corporation Scalable motion estimation
US10085291B2 (en) 2005-06-24 2018-09-25 Aylus Networks, Inc. Associated device discovery in IMS networks
US20060291412A1 (en) * 2005-06-24 2006-12-28 Naqvi Shamim A Associated device discovery in IMS networks
US9999084B2 (en) 2005-06-24 2018-06-12 Aylus Networks, Inc. Associated device discovery in IMS networks
US10194479B2 (en) 2005-06-24 2019-01-29 Aylus Networks, Inc. Associated device discovery in IMS networks
US9468033B2 (en) 2005-06-24 2016-10-11 Aylus Networks, Inc. Associated device discovery in IMS networks
US10477605B2 (en) 2005-06-24 2019-11-12 Aylus Networks, Inc. Associated device discovery in IMS networks
US8553866B2 (en) 2005-06-24 2013-10-08 Aylus Networks, Inc. System and method to provide dynamic call models for users in a network
USRE44412E1 (en) 2005-06-24 2013-08-06 Aylus Networks, Inc. Digital home networks having a control point located on a wide area network
US8483373B2 (en) 2005-06-24 2013-07-09 Aylus Networks, Inc. Method of avoiding or minimizing cost of stateful connections between application servers and S-CSCF nodes in an IMS network with multiple domains
US20070011718A1 (en) * 2005-07-08 2007-01-11 Nee Patrick W Jr Efficient customized media creation through pre-encoding of common elements
US20070182728A1 (en) * 2006-02-06 2007-08-09 Seiko Epson Corporation Image display system, image display method, image display program, recording medium, data processing apparatus, and image display apparatus
US20070197227A1 (en) * 2006-02-23 2007-08-23 Aylus Networks, Inc. System and method for enabling combinational services in wireless networks by using a service delivery platform
US8494052B2 (en) 2006-04-07 2013-07-23 Microsoft Corporation Dynamic selection of motion estimation search ranges and extended motion vector ranges
US8155195B2 (en) 2006-04-07 2012-04-10 Microsoft Corporation Switching distortion metrics during motion estimation
US20070237226A1 (en) * 2006-04-07 2007-10-11 Microsoft Corporation Switching distortion metrics during motion estimation
US20070237232A1 (en) * 2006-04-07 2007-10-11 Microsoft Corporation Dynamic selection of motion estimation search ranges and extended motion vector ranges
US20080291905A1 (en) * 2006-05-16 2008-11-27 Kiran Chakravadhanula Systems and Methods for Real-Time Cellular-to-Internet Video Transfer
US9148766B2 (en) 2006-05-16 2015-09-29 Aylus Networks, Inc. Systems and methods for real-time cellular-to-internet video transfer
US9026117B2 (en) 2006-05-16 2015-05-05 Aylus Networks, Inc. Systems and methods for real-time cellular-to-internet video transfer
US8611334B2 (en) 2006-05-16 2013-12-17 Aylus Networks, Inc. Systems and methods for presenting multimedia objects in conjunction with voice calls from a circuit-switched network
US20070268964A1 (en) * 2006-05-22 2007-11-22 Microsoft Corporation Unit co-location-based motion estimation
US9094686B2 (en) * 2006-09-06 2015-07-28 Broadcom Corporation Systems and methods for faster throughput for compressed video data decoding
US20140307793A1 (en) * 2006-09-06 2014-10-16 Alexander MacInnis Systems and Methods for Faster Throughput for Compressed Video Data Decoding
US20080162713A1 (en) * 2006-12-27 2008-07-03 Microsoft Corporation Media stream slicing and processing load allocation for multi-user media systems
US8380864B2 (en) 2006-12-27 2013-02-19 Microsoft Corporation Media stream slicing and processing load allocation for multi-user media systems
US9438882B2 (en) * 2007-01-11 2016-09-06 Korea Electronics Technology Institute Method for image prediction of multi-view video codec and computer readable recording medium therefor
USRE47897E1 (en) * 2007-01-11 2020-03-03 Korea Electronics Technology Institute Method for image prediction of multi-view video codec and computer readable recording medium therefor
US20140355680A1 (en) * 2007-01-11 2014-12-04 Korea Electronics Technology Institute Method for image prediction of multi-view video codec and computer readable recording medium therefor
US20080170753A1 (en) * 2007-01-11 2008-07-17 Korea Electronics Technology Institute Method for Image Prediction of Multi-View Video Codec and Computer Readable Recording Medium Therefor
US9160570B2 (en) 2007-02-22 2015-10-13 Aylus Networks, Inc. Systems and method for enabling IP signaling in wireless networks
US8432899B2 (en) 2007-02-22 2013-04-30 Aylus Networks, Inc. Systems and methods for enabling IP signaling in wireless networks
US8170534B2 (en) 2007-04-17 2012-05-01 Aylus Networks, Inc. Systems and methods for user sessions with dynamic service selection
US8433303B2 (en) 2007-04-17 2013-04-30 Aylus Networks, Inc. Systems and methods for user sessions with dynamic service selection
US8457958B2 (en) 2007-11-09 2013-06-04 Microsoft Corporation Audio transcoder using encoder-generated side information to transcode to target bit-rate
US20100296572A1 (en) * 2007-12-11 2010-11-25 Kumar Ramaswamy Methods and systems for transcoding within the distribution chain
US8804044B2 (en) * 2008-03-06 2014-08-12 Entropic Communications, Inc. Temporal fallback for high frame rate picture rate conversion
US20110128448A1 (en) * 2008-03-06 2011-06-02 Erwin Bellers Temporal Fallback For High Frame Rate Picture Rate Conversion
US20090251829A1 (en) * 2008-04-02 2009-10-08 Headway Technologies, Inc. Seed layer for TMR or CPP-GMR sensor
US8396114B2 (en) 2009-01-29 2013-03-12 Microsoft Corporation Multiple bit rate video encoding using variable bit rate and dynamic resolution for adaptive video streaming
US8311115B2 (en) 2009-01-29 2012-11-13 Microsoft Corporation Video encoding using previously calculated motion information
US20100309975A1 (en) * 2009-06-05 2010-12-09 Apple Inc. Image acquisition and transcoding system
US8270473B2 (en) 2009-06-12 2012-09-18 Microsoft Corporation Motion based dynamic resolution multiple bit rate video encoding
US20140348231A1 (en) * 2009-09-04 2014-11-27 STMicroelectronics International N.V. System and method for object based parametric video coding
US10178396B2 (en) 2009-09-04 2019-01-08 Stmicroelectronics International N.V. Object tracking
US9813731B2 (en) * 2009-09-04 2017-11-07 STMicroelectronics International N.V. System and method for object based parametric video coding
US8705616B2 (en) 2010-06-11 2014-04-22 Microsoft Corporation Parallel multiple bitrate video encoding to reduce latency and dependences between groups of pictures
US9094685B2 (en) 2010-09-21 2015-07-28 Dialogic Corporation Efficient coding complexity estimation for video transcoding systems
WO2012039933A1 (en) * 2010-09-21 2012-03-29 Dialogic Corporation Efficient coding complexity for video transcoding systems
US20120281748A1 (en) * 2011-05-02 2012-11-08 Futurewei Technologies, Inc. Rate Control for Cloud Transcoding
CN103597839B (en) * 2011-05-31 2017-10-20 杜比实验室特许公司 Video-frequency compression method, video reconstruction method and system and encoder
CN103597839A (en) * 2011-05-31 2014-02-19 杜比实验室特许公司 Video compression implementing resolution tradeoffs and optimization
US9591318B2 (en) * 2011-09-16 2017-03-07 Microsoft Technology Licensing, Llc Multi-layer encoding and decoding
US9769485B2 (en) * 2011-09-16 2017-09-19 Microsoft Technology Licensing, Llc Multi-layer encoding and decoding
US20170134737A1 (en) * 2011-09-16 2017-05-11 Microsoft Technology Licensing, Llc Multi-layer encoding and decoding
US20130070859A1 (en) * 2011-09-16 2013-03-21 Microsoft Corporation Multi-layer encoding and decoding
US11089343B2 (en) 2012-01-11 2021-08-10 Microsoft Technology Licensing, Llc Capability advertisement, configuration and control for video coding and decoding
US10847048B2 (en) * 2018-02-23 2020-11-24 Frontis Corp. Server, method and wearable device for supporting maintenance of military apparatus based on augmented reality using correlation rule mining
US11671632B2 (en) * 2018-08-14 2023-06-06 Huawei Technologies Co., Ltd. Machine-learning-based adaptation of coding parameters for video encoding using motion and object detection

Also Published As

Publication number Publication date
JP4786114B2 (en) 2011-10-05
CN1386376A (en) 2002-12-18
EP1289301A1 (en) 2003-03-05
WO2001091467A1 (en) 2001-11-29
CN1199467C (en) 2005-04-27
EP1289301A4 (en) 2009-06-17
EP1289301B1 (en) 2011-08-24

Similar Documents

Publication Publication Date Title
US6650705B1 (en) Method for encoding and transcoding multiple video objects with variable temporal resolution
JP4601889B2 (en) Apparatus and method for converting a compressed bitstream
US6925120B2 (en) Transcoder for scalable multi-layer constant quality video bitstreams
US6480628B2 (en) Method for computational graceful degradation in an audiovisual compression system
US6574279B1 (en) Video transcoding using syntactic and semantic clues
US6490320B1 (en) Adaptable bitstream video delivery system
US6404814B1 (en) Transcoding method and transcoder for transcoding a predictively-coded object-based picture signal to a predictively-coded block-based picture signal
US9774848B2 (en) Efficient compression and transport of video over a network
US6542546B1 (en) Adaptable compressed bitstream transcoder
US7058127B2 (en) Method and system for video transcoding
CA2185704C (en) Method, rate controller, and system for preventing overflow and underflow of a decoder buffer
US8311095B2 (en) Method and apparatus for transcoding between hybrid video codec bitstreams
EP1587327A2 (en) Video transcoding
EP1445958A1 (en) Quantization method and system, for instance for video MPEG applications, and computer program product therefor
US6961377B2 (en) Transcoder system for compressed digital video bitstreams
JP2001112006A (en) Rate-distortion characteristic estimation method
JPH08111870A (en) Method and device for re-coding image information
US20050265446A1 (en) Mosquito noise detection and reduction
KR20040106403A (en) Encoding device and method, decoding device and method, edition device and method, recording medium, and program

Legal Events

Date Code Title Description
AS Assignment

Owner name: MITSUBISHI ELECTRIC INFORMATION TECHNOLOGY CENTER AMERICA, INC.

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VETRO, ANTHONY;SUN, HUIFANG;REEL/FRAME:010854/0122

Effective date: 20000523

AS Assignment

Owner name: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., M

Free format text: CHANGE OF NAME;ASSIGNOR:MITSUBISHI ELECTRIC INFORMATION TECHNOLOGY CENTER AMERICA, INC.;REEL/FRAME:011564/0329

Effective date: 20000828

FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20111118