Publication number | US7180944 B2 |
Publication type | Grant |
Application number | US 10/888,479 |
Publication date | Feb. 20, 2007 |
Filing date | Jul. 8, 2004 |
Priority date | Feb. 4, 2004 |
Fee status | Paid |
Also published as | US20050169377 |
Publication number | 10888479, 888479, US 7180944 B2, US 7180944B2, US-B2-7180944, US7180944 B2, US7180944B2 |
Inventors | Chia-Wen Lin, Yuh-Ruey Lee, Yeh-Kai Chou |
Original assignee | Industrial Technology Research Institute |
The present invention generally relates to video transcoders in communication, and more specifically to a low-complexity spatial downscaling video transcoder and method thereof.
The Moving Picture Experts Group (MPEG) developed a generic video compression standard that defines three types of frames: intra-frames (I-frames), predictive frames (P-frames), and bi-directionally predictive frames (B-frames). A group of pictures (GOP) comprises an I-frame and a plurality of P-frames and B-frames.
In recent years, due to advances in network technologies and the wide adoption of video coding standards, digital video applications have become increasingly popular in daily life. Networked multimedia services, such as video on demand, video streaming, and distance learning, have been emerging in various network environments. These multimedia services usually use pre-encoded videos for transmission. The heterogeneity of present communication networks and user devices poses difficulties in delivering these bit-streams to the receivers. The sender may need to convert a pre-encoded bit-stream into a lower bit-rate or lower resolution version to fit the available channel bandwidths, the screen display resolutions, or even the processing power of diverse clients. Many practical applications, such as video conversion from DVD to VCD (i.e., MPEG-2 to MPEG-1) and from MPEG-1/2 to MPEG-4, involve such spatial-resolution, format, and bit-rate conversions. Dynamic bit-rate or resolution conversion may be achieved using the scalable coding schemes in current coding standards to support heterogeneous video communications. These schemes, however, usually either provide very limited support for heterogeneous bit-rates and resolutions (e.g., MPEG-2 and H.263+) or introduce significantly higher complexity at the client decoder (e.g., MPEG-4 FGS).
Video transcoding is a process of converting a previously compressed video bit-stream into another bit-stream with a lower bit-rate, a different display format (e.g., downscaling), or a different coding method (e.g., the conversion between H.26x and MPEG-x, or adding error resilience), etc. It is considered an efficient means of achieving fine and dynamic adaptation of bit-rates, resolutions, and formats. In realizing transcoders, the computational complexity and picture quality are usually the two most important concerns.
A straightforward realization of video transcoders is the Cascaded Pixel-domain Downscaling Transcoder (CPDT) that cascades a decoder followed by an encoder as shown in
Recently, DCT-domain transcoding schemes have become very attractive because they can avoid the discrete cosine transform (DCT) and the inverse discrete cosine transform (IDCT) computations. Also, several efficient schemes were developed for implementing the DCT-domain motion compensation (DCT-MC). However, the conventional simplified DCT-domain transcoder cannot be used for spatial/temporal downscaling because it has to use the same motion vectors that are decoded from the incoming video at the encoding stage. The outgoing motion vectors usually are different from the incoming motion vectors in spatial/temporal downscaling applications.
The first proposed Cascaded DCT-domain Downscaling Transcoder (CDDT) architecture is depicted in
where $w_i, h_i \in \{0, 1, \ldots, 7\}$. $H_{h_i}$ and $H_{w_i}$ are constant geometric transform matrices defined by the height and width of each sub-block generated by the intersection of $b_i$ with $b$.
It takes 8 matrix multiplications and 3 matrix additions to compute Eq. (1) directly. However, the following relationships between the geometric transform matrices hold: $H_{h_1} = H_{h_2}$, $H_{h_3} = H_{h_4}$, $H_{w_1} = H_{w_3}$, and $H_{w_2} = H_{w_4}$. Using these relationships, the computation of Eq. (1) can be reduced to 6 matrix multiplications and 3 matrix additions, as shown in Eq. (2) below.
$$B = H_{h_1}(N_1 H_{w_1} + N_2 H_{w_2}) + H_{h_3}(N_3 H_{w_3} + N_4 H_{w_4}) \tag{2}$$
where $H_{h_i}$ and $H_{w_i}$ can be pre-computed and stored in memory. Therefore, no additional DCT computation is required to evaluate Eq. (1) or Eq. (2).
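The saving from 8 to 6 matrix products can be checked numerically. The following is a minimal sketch, assuming that Eq. (1) is the four-term sum of $H_{h_i} N_i H_{w_i}$ obtained by expanding Eq. (2); the matrices are random integer stand-ins (hypothetical data, not real geometric transform matrices), since only the equalities between them matter for the factoring.

```python
import random

def matmul(a, b):
    """Plain product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matadd(a, b):
    """Element-wise sum of two matrices of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rng, n=8):
    # Integer entries keep the comparison exact (no floating-point rounding).
    return [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
# Hypothetical stand-ins for the geometric transform matrices.
Hh1, Hh3, Hw1, Hw2 = (rand_matrix(rng) for _ in range(4))
Hh = [Hh1, Hh1, Hh3, Hh3]                 # H_h1 = H_h2, H_h3 = H_h4
Hw = [Hw1, Hw2, Hw1, Hw2]                 # H_w1 = H_w3, H_w2 = H_w4
N = [rand_matrix(rng) for _ in range(4)]  # reference blocks N_1 .. N_4

def mc_direct():
    """Assumed form of Eq. (1): 8 matrix products, 3 matrix additions."""
    total = None
    for hh, n_i, hw in zip(Hh, N, Hw):
        term = matmul(matmul(hh, n_i), hw)
        total = term if total is None else matadd(total, term)
    return total

def mc_factored():
    """Eq. (2): 6 matrix products, 3 matrix additions."""
    upper = matadd(matmul(N[0], Hw1), matmul(N[1], Hw2))
    lower = matadd(matmul(N[2], Hw1), matmul(N[3], Hw2))
    return matadd(matmul(Hh1, upper), matmul(Hh3, lower))

print(mc_direct() == mc_factored())  # prints True
```

The two results agree because the shared left factors $H_{h_1}$ and $H_{h_3}$ are pulled out of the sum, which removes one left multiplication per pair of blocks.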
To reduce the computation of the DCT-MC, the number of matrix multiplications can be reduced from 24 to 18 by the conventional shared information method, while the number of matrix additions/subtractions increases slightly. This leads to a computational reduction of about 25% in the DCT-MC operation.
A more efficient DCT-domain downscaling scheme, named DCT decimation, was then proposed for image downscaling and later adopted in video transcoding. This DCT decimation scheme extracts the 4×4 low-frequency DCT coefficients from the four original blocks $b_1$–$b_4$, then combines the four 4×4 sub-blocks into an 8×8 block. Let $B_1$, $B_2$, $B_3$, and $B_4$ represent the four original 8×8 DCT blocks; $\hat{B}_1$, $\hat{B}_2$, $\hat{B}_3$, and $\hat{B}_4$ the four 4×4 low-frequency sub-blocks of $B_1$, $B_2$, $B_3$, and $B_4$, respectively; and $\hat{b}_i = \mathrm{IDCT}(\hat{B}_i)$, $i = 1, \ldots, 4$. Then
$$\hat{b} = \begin{bmatrix} \hat{b}_1 & \hat{b}_2 \\ \hat{b}_3 & \hat{b}_4 \end{bmatrix}_{8 \times 8}$$
is the downscaled version of
$$b = \begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \end{bmatrix}_{16 \times 16}$$
To compute $\hat{B} = \mathrm{DCT}(\hat{b})$ directly from $\hat{B}_1$, $\hat{B}_2$, $\hat{B}_3$, and $\hat{B}_4$, the following expression can be used:
In addition, an architecture similar to the CDDT was proposed in which a reduced-size frame memory is used in the DCT-domain decoder loop to reduce computation and memory, which may lead to some drift errors.
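The DCT decimation idea described above can be illustrated with a small reference computation. This is only a sketch under stated assumptions: it uses an orthonormal DCT-II, it halves the retained coefficients so that block means are preserved (a normalisation the text does not spell out), and it goes through the pixel domain rather than using a direct DCT-domain expression. All function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix T, so that X = T x T^t."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(pixels):
    t = dct_matrix(len(pixels))
    return matmul(matmul(t, pixels), transpose(t))

def idct2(coeffs):
    t = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(t), coeffs), t)

def decimate(coeffs8):
    """4x4 pixel block from the 4x4 low-frequency corner of an 8x8 DCT block."""
    # Dividing by 2 is an assumed normalisation: it maps the orthonormal
    # 8-point DC term to the same mean value under the 4-point transform.
    low = [[c / 2 for c in row[:4]] for row in coeffs8[:4]]
    return idct2(low)

def downscale(B1, B2, B3, B4):
    """DCT of the 8x8 block assembled from the four decimated 4x4 blocks."""
    b1, b2, b3, b4 = (decimate(B) for B in (B1, B2, B3, B4))
    top = [r1 + r2 for r1, r2 in zip(b1, b2)]
    bottom = [r3 + r4 for r3, r4 in zip(b3, b4)]
    return dct2(top + bottom)

def flat_block(value):
    """DCT of a constant 8x8 pixel block (hypothetical test input)."""
    return dct2([[value] * 8 for _ in range(8)])

B_hat = downscale(flat_block(10), flat_block(20), flat_block(30), flat_block(40))
print(round(B_hat[0][0], 6))  # DC term of the downscaled 8x8 block
```

For four flat blocks, each 4×4 quadrant of the downscaled block keeps the value of its source block, which is a quick sanity check on the assumed scaling.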
The present invention has been made to overcome the above-mentioned drawbacks of the conventional DCT-domain downscaling transcoder. The primary object of the present invention is to provide a low-complexity spatial downscaling video transcoder. The spatial downscaling video transcoder of the invention integrates the DCT-domain decoding and downscaling operations in the downscaling CDDT into a reduced-resolution DCT-MC so as to achieve a significant reduction of computations without any quality degradation.
The spatial downscaling video transcoder of the invention comprises a decoder having a reduced DCT-MC unit, a DCT-domain downscaling unit, and an encoder. The decoder receives incoming bit-streams, uses the reduced DCT-MC unit to integrate the DCT-domain motion compensation and downscaling operations in the downscaling CDDT into a reduced-resolution DCT-MC for B or P frames in MPEG standard, performs the reduced-resolution decoding, generates an estimated motion vector, and performs the full-resolution decoding for I-frames in MPEG standard. After downscaling the decoded I-frames, the DCT-domain downscaling unit outputs the results for encoding. The encoder receives the estimated vector and the downscaling results from the DCT-domain downscaling unit, determines encoding modes and outputs encoded bit-streams.
The spatial downscaling video transcoder of the invention has two preferred embodiments. In the first preferred embodiment, the low-complexity operation performs the full-resolution decoding for I and P frames and the reduced-resolution decoding for B frames. When performing the DCT-MC downscaling for B-frames in the decoder-loop, only the low-frequency portions are extracted while I and P frames are decoded at the full picture resolution. In this way, for B-frames, only the reduced-resolution DCT-MC is required in the decoder-loop. Since B-frames usually occupy a large portion of an I-B-P structured MPEG video, the computation saving can be very significant.
In the second preferred embodiment of the invention, the low computational complexity is achieved by performing the reduced-resolution decoding for all B and P frames. Therefore, every block of P-frames has only nonzero low-frequency DCT coefficients and all high-frequency coefficients are discarded.
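The difference between the two embodiments can be made concrete by counting, for one GOP, how many frames are decoded at reduced resolution. The sketch below assumes that a "(15,3)" GOP means 15 frames with an I- or P-frame every third frame, which is the usual (N, M) convention but is not defined in this text; the function names are hypothetical.

```python
from collections import Counter

def gop_frame_types(n=15, m=3):
    """Display-order frame types of one GOP: I first, P every m-th frame, B otherwise."""
    return ["I" if i == 0 else "P" if i % m == 0 else "B" for i in range(n)]

def decoding_resolution(frame_type, embodiment):
    """Decoder-loop resolution ('full' or 'reduced') for a frame type.

    Embodiment 1 reduces only B-frames; embodiment 2 reduces P- and B-frames.
    I-frames are decoded at full resolution in both.
    """
    reduced_types = {1: {"B"}, 2: {"P", "B"}}[embodiment]
    return "reduced" if frame_type in reduced_types else "full"

def reduced_share(embodiment, n=15, m=3):
    """Fraction of frames in a GOP that use the reduced-resolution DCT-MC."""
    frames = gop_frame_types(n, m)
    counts = Counter(decoding_resolution(f, embodiment) for f in frames)
    return counts["reduced"] / len(frames)

print("".join(gop_frame_types()))
print(reduced_share(1), reduced_share(2))
```

Under that assumption, 10 of 15 frames take the reduced path in the first embodiment and 14 of 15 in the second, which is why the achievable speed-up depends on the GOP structure.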
According to the architecture of the video transcoder, the spatial downscaling video transcoding method of the invention also provides an activity-weighted median filtering scheme for re-sampling motion vectors, and a scheme for determining the coding modes. The spatial downscaling video transcoding method of the invention mainly comprises the following steps: (a) receiving incoming bit-streams, using a reduced DCT-MC unit to integrate the DCT-domain motion compensation and downscaling operations in the downscaling CDDT into a reduced-resolution DCT-MC for B or P frames in MPEG standard, performing the reduced-resolution decoding, generating an estimated motion vector, and performing the full-resolution decoding for I-frames in MPEG standard; (b) after downscaling the decoded I-frames, outputting the results for encoding; (c) receiving the estimated vectors and the DCT-domain downscaling results as well as determining the encoding modes and outputting the encoded bit-stream.
The average peak signal-to-noise ratio (PSNR) performance and processing speed of various transcoders were compared, using the luminance PSNR values of each frame. The experimental results show that, compared to the original CDDT, the first preferred embodiment of the invention can increase the processing speed by more than 60% without any quality degradation for videos with the (15,3) GOP structure. The second preferred embodiment can further increase the speed while introducing less than 0.3 dB of quality degradation in the luminance component. By using the shared information approach, the processing speed of both preferred embodiments can be further improved without sacrificing video quality.
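For reference, the luminance PSNR figure used in such comparisons is commonly computed as follows. This is a minimal sketch assuming 8-bit samples (peak value 255) and frames stored as lists of rows; the text does not state how its PSNR values were obtained, and the function names are hypothetical.

```python
import math

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized luminance frames (lists of rows)."""
    squared_errors = [(r - t) ** 2
                      for row_r, row_t in zip(reference, test)
                      for r, t in zip(row_r, row_t)]
    mse = sum(squared_errors) / len(squared_errors)
    # Identical frames have zero error, so the ratio is unbounded.
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def average_psnr(reference_frames, test_frames):
    """Mean of the per-frame PSNR values over a sequence."""
    values = [psnr(r, t) for r, t in zip(reference_frames, test_frames)]
    return sum(values) / len(values)

original = [[100] * 4 for _ in range(4)]
degraded = [[110] * 4 for _ in range(4)]
print(psnr(original, degraded))
```

A drop of less than 0.3 dB, as reported for the second embodiment, corresponds to a mean squared error increase of under about 7%.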
The foregoing and other objects, features, aspects and advantages of the present invention will become better understood from a careful reading of a detailed description provided herein below with appropriate reference to the accompanying drawings.
The spatial downscaling video transcoder of the invention is used to reduce the computation required by the conventional CDDT. The decoder loop of the conventional CDDT operates at the full picture resolution, while the encoding is performed at quarter resolution. Instead of using all the DCT coefficients decoded in the decoder loop, the DCT-domain downscaling scheme of the invention exploits only the low-frequency DCT coefficients of each decoded block for downscaling.
Accordingly, the spatial downscaling video transcoding method of the invention mainly comprises the following steps: (a) receiving incoming bit-streams, using a reduced DCT-MC unit to integrate the DCT-domain motion compensation and downscaling operations in the downscaling CDDT into a reduced-resolution DCT-MC for B or P frames in MPEG standard, performing the reduced-resolution decoding, generating an estimated motion vector, and performing the full-resolution decoding for I-frames in MPEG standard; (b) after downscaling the decoded I-frames, outputting the results for encoding; (c) receiving the estimated vector and the DCT-domain downscaling results as well as determining encoding modes and outputting encoded bit-streams.
In the first preferred embodiment of the invention, the low-complexity operation performs the full-resolution decoding for I and P frames and the reduced-resolution decoding for B frames.
Therefore, in step (a) of the transcoding scheme depicted in the first preferred embodiment, when performing the DCT-MC downscaling for B-frames in the decoder-loop, only the low-frequency coefficients are extracted while I and P frames are decoded at the full picture resolution.
Accordingly, referring to
For simplicity, only one reference frame is used in the following to show the simplified DCT-MC for decoding B-frames. It can be easily extended to the case with bidirectional prediction. By incorporating the DCT decimation into the DCT-MC of the decoder loop for B-frames, the following equation holds:
where $P_4 = [\, I_4 \;\; 0 \,]$, $I_4$ is a 4×4 unit matrix, and 0 is a 4×4 zero matrix, then Eq. (4) becomes
$$P_4 H_{h_i} N_i H_{w_i} P_4^{t} = (H_{h_i}^{11} N_i^{11} + H_{h_i}^{12} N_i^{21}) H_{w_i}^{11} + (H_{h_i}^{11} N_i^{12} + H_{h_i}^{12} N_i^{22}) H_{w_i}^{21} \tag{5}$$
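Eq. (5) is a block-matrix identity and can be checked numerically. The sketch below assumes that the superscripts index the 4×4 sub-blocks of each 8×8 matrix by block row and block column (so "12" is the top-right block) and that the left-hand side is the top-left 4×4 block of the full product; the matrices are random integer stand-ins and the helper names are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def block(m, r, c):
    """4x4 sub-block of an 8x8 matrix; r and c are 1 or 2, as in the superscripts."""
    return [row[4 * (c - 1):4 * c] for row in m[4 * (r - 1):4 * r]]

rng = random.Random(1)
# Integer stand-ins keep the comparison exact.
Hh, N, Hw = ([[rng.randint(-9, 9) for _ in range(8)] for _ in range(8)]
             for _ in range(3))

# Left-hand side: top-left 4x4 block of the full 8x8 product H_h N H_w.
lhs = block(matmul(matmul(Hh, N), Hw), 1, 1)

# Right-hand side of Eq. (5): six 4x4 products and three 4x4 additions.
H11, H12 = block(Hh, 1, 1), block(Hh, 1, 2)
N11, N12, N21, N22 = (block(N, r, c) for r in (1, 2) for c in (1, 2))
W11, W21 = block(Hw, 1, 1), block(Hw, 2, 1)
rhs = matadd(
    matmul(matadd(matmul(H11, N11), matmul(H12, N21)), W11),
    matmul(matadd(matmul(H11, N12), matmul(H12, N22)), W21),
)

print(lhs == rhs)  # prints True
```

Only the top block row of $H_{h_i}$ and the left block column of $H_{w_i}$ appear on the right-hand side, which is what makes the reduced-resolution computation cheaper than forming the full 8×8 product.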
It is worth mentioning that all matrices in Eq. (5) are 4×4 matrices. Therefore, if B-frames are decoded at quarter resolution, Eq. (5) takes $6 \times 4^3$ multiplications and $21 \times 4^2$ additions, whereas each term of Eq. (1) takes $2 \times 8^3$ multiplications and $14 \times 8^2$ additions. The number of multiplications in DCT-MC$_{dec}$ is therefore reduced by about 60%. Furthermore, a coded block $N_i$ usually has many zero high-frequency coefficients, so in practice the scheme requires fewer computations than these counts indicate. Also, by the symmetry of the geometric transform matrices, that is, $H_{h_1} = H_{h_2}$, $H_{h_3} = H_{h_4}$, $H_{w_1} = H_{w_3}$, and $H_{w_2} = H_{w_4}$, Eq. (4) can be rewritten as
$$\begin{aligned}
\hat{B}_1 ={}& H_{h_1}^{11}(N_1^{11} H_{w_1}^{11} + N_1^{12} H_{w_1}^{21} + N_2^{11} H_{w_2}^{11} + N_2^{12} H_{w_2}^{21}) \\
&+ H_{h_1}^{12}(N_1^{21} H_{w_1}^{11} + N_1^{22} H_{w_1}^{21} + N_2^{21} H_{w_2}^{11} + N_2^{22} H_{w_2}^{21}) \\
&+ H_{h_3}^{11}(N_3^{11} H_{w_1}^{11} + N_3^{12} H_{w_1}^{21} + N_4^{11} H_{w_2}^{11} + N_4^{12} H_{w_2}^{21}) \\
&+ H_{h_3}^{12}(N_3^{21} H_{w_1}^{11} + N_3^{22} H_{w_1}^{21} + N_4^{21} H_{w_2}^{11} + N_4^{22} H_{w_2}^{21})
\end{aligned} \tag{6}$$
Eq. (6) needs 20 4×4-matrix multiplications and 15 4×4-matrix additions, whereas Eq. (2) takes 6 8×8-matrix multiplications and 3 8×8-matrix additions. It is worth mentioning that, although B-frames are decoded at quarter resolution, the performance is the same as that of the original CDDT. Therefore, the architecture of the first preferred embodiment of this invention does not induce any quality degradation.
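The operation counts quoted for Eqs. (1), (2), (5), and (6) can be reproduced with simple arithmetic. The sketch assumes schoolbook matrix multiplication, in which one n×n product costs n³ scalar multiplications and n²(n−1) scalar additions; the function names are hypothetical.

```python
def matmul_cost(n):
    """Scalar (multiplications, additions) for one n x n matrix product."""
    return n ** 3, n * n * (n - 1)

def expression_cost(n, products, sums):
    """Scalar cost of an expression with the given n x n products and sums."""
    mul, add = matmul_cost(n)
    return products * mul, products * add + sums * n * n

eq5 = expression_cost(4, products=6, sums=3)       # one term, reduced resolution
eq1_term = expression_cost(8, products=2, sums=0)  # one term H_hi N_i H_wi
eq6 = expression_cost(4, products=20, sums=15)     # all four blocks, reduced
eq2 = expression_cost(8, products=6, sums=3)       # all four blocks, full size

print(eq5, eq1_term)          # (384, 336) (1024, 896)
print(1 - eq5[0] / eq1_term[0])  # 0.625
print(eq6, eq2)               # (1280, 1200) (3072, 2880)
```

The per-term figures match $6 \times 4^3$, $21 \times 4^2$, $2 \times 8^3$, and $14 \times 8^2$, and the per-term multiplication saving of 62.5% is the "about 60%" stated in the text. Comparing the factored forms, Eq. (6) uses roughly 42% of the multiplications of Eq. (2).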
The computation can be further reduced by applying quarter-resolution decoding to all P- and B-frames. In this way, each block of the reference P-frame has only 4×4 nonzero low-frequency DCT coefficients (i.e., the 12, 21, and 22 sub-blocks of each reference block in Eq. (4) are all zero matrices), so Eq. (4) can be reduced as
Therefore, in step (a) of the transcoding method of the second preferred embodiment, the decoder-loop extracts only the low-frequency portions when performing the DCT-MC downscaling for B-frames and P-frames while I-frames are decoded at the full picture resolution.
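As a worked step for the second embodiment, the simplification can be derived from Eq. (5). This is a sketch obtained by substitution and is not a reproduction of the patent's own reduced expression. If only the top-left sub-block $N_i^{11}$ of each reference block is nonzero, the right-hand side of Eq. (5) collapses to a single product chain:

$$(H_{h_i}^{11} N_i^{11} + H_{h_i}^{12} \cdot 0)\, H_{w_i}^{11} + (H_{h_i}^{11} \cdot 0 + H_{h_i}^{12} \cdot 0)\, H_{w_i}^{21} = H_{h_i}^{11} N_i^{11} H_{w_i}^{11}$$

Summing over the four reference blocks and applying the relationships $H_{h_1} = H_{h_2}$, $H_{h_3} = H_{h_4}$, $H_{w_1} = H_{w_3}$, and $H_{w_2} = H_{w_4}$ gives

$$\sum_{i=1}^{4} H_{h_i}^{11} N_i^{11} H_{w_i}^{11} = H_{h_1}^{11}(N_1^{11} H_{w_1}^{11} + N_2^{11} H_{w_2}^{11}) + H_{h_3}^{11}(N_3^{11} H_{w_1}^{11} + N_4^{11} H_{w_2}^{11})$$

In this form the computation needs 6 4×4-matrix multiplications and 3 4×4-matrix additions, compared with 20 and 15 for Eq. (6).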
After the downscaling, the motion vectors need to be re-sampled to obtain correct values. Full-range motion re-estimation is computationally too expensive and thus not suited to practical applications. Several conventional methods have been proposed for quickly re-sampling the motion vectors based on the motion information of the incoming frame. Three conventional motion vector re-sampling methods were compared: median filtering, averaging, and majority voting; the median filtering scheme was shown to outperform the other two. For an $N_x \times N_y$ downscaling system, $N_x \times N_y$ original macroblocks are reduced to one macroblock. As a generalization of the median filtering scheme, this invention uses the activity-weighted median of the $N_x \times N_y$ incoming vector set $V = \{v_1, v_2, \ldots, v_{N_x \times N_y}\}$ as follows:
where $v$ is the new motion vector of the reduced macroblock (MB), $N_x$ is the horizontal downscaling factor of the motion vector, and $N_y$ is the vertical downscaling factor of the motion vector. The macroblock activity $ACT_i$ can be the squared or absolute sum of the DCT coefficients, the number of nonzero DCT coefficients, or simply the DC value. This invention adopts the squared sum of the DCT coefficients of the MB as the activity measure.
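The re-sampling step can be sketched in code. Because the defining formula is not reproduced in this text, the sketch below assumes the standard weighted vector median: it selects the incoming vector whose activity-weighted sum of Euclidean distances to all incoming vectors is smallest, then divides it by the downscaling factors. The weighting form and the function names are assumptions, not the patent's stated definition; only the activity measure (squared sum of DCT coefficients) comes from the text.

```python
import math

def activity(dct_coeffs):
    """Squared sum of a macroblock's DCT coefficients (flat iterable)."""
    return sum(c * c for c in dct_coeffs)

def activity_weighted_median(vectors, activities, nx=2, ny=2):
    """Pick the incoming vector with the smallest activity-weighted distance
    to all incoming vectors, then scale it to the downscaled resolution.

    Assumed form: argmin over v_i of sum_j ACT_j * ||v_i - v_j||.
    """
    def weighted_distance(candidate):
        return sum(act * math.hypot(candidate[0] - v[0], candidate[1] - v[1])
                   for v, act in zip(vectors, activities))
    vx, vy = min(vectors, key=weighted_distance)
    return vx / nx, vy / ny

# Four incoming macroblock vectors for 2x2 downscaling (hypothetical data).
mvs = [(0, 0), (0, 0), (0, 0), (8, 8)]
print(activity_weighted_median(mvs, [1, 1, 1, 1]))    # equal weights
print(activity_weighted_median(mvs, [1, 1, 1, 100]))  # one high-activity block
```

With equal activities the result is the ordinary vector median, while a macroblock with much higher activity pulls the selection toward its own vector, which is the intended effect of the weighting.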
The MB coding modes also need to be re-determined after the downscaling. The rules used by the invention for determining the coding modes are as follows:
Note that the motion vectors of skipped MBs are set to zero.
The speed-up gain is dependent on the GOP structure and size used. The larger the number of B-frames in a GOP, the higher the performance gain of the first preferred embodiment of the invention, while the speed-up gain of the second preferred embodiment of the invention depends on the number of P- and B-frames in a GOP. By using the shared information approach, the processing speed (the parenthesized values in
In summary, this invention provides efficient architectures for DCT-domain spatial-downscaling video transcoders. Methods for realizing the invention include an activity-weighted median filtering scheme for re-sampling motion vectors and a method for determining the coding modes. Two embodiments of the invention integrate the DCT-domain decoding and downscaling operations in the downscaling CDDT into a reduced-resolution DCT-MC so as to achieve a significant reduction of computations. The first embodiment can speed up the decoding and downscaling of B-frames without sacrificing visual quality, while the second embodiment can speed up the decoding and downscaling of P- and B-frames with acceptable quality degradation. By using the shared information approach, the processing speed of the DCT-domain transcoder of the invention can be further improved.
Although the present invention has been described with reference to the preferred embodiments, it will be understood that the invention is not limited to the details thereof. Various substitutions and modifications have been suggested in the foregoing description, and others will occur to those of ordinary skill in the art. Therefore, all such substitutions and modifications are intended to be embraced within the scope of the invention as defined in the appended claims.
US Classification | 375/240.16, 375/E07.176, 375/E07.198, 375/E07.187, 375/E07.252, 375/240.26, 375/E07.181, 375/E07.211, 375/E07.17, 375/240.25 |
International Classification | H04N1/64, H04N7/18, H04N7/46, H04N7/26, H04N7/12, H04N7/50 |
Cooperative Classification | H04N19/40, H04N19/176, H04N19/48, H04N19/159, H04N19/59, H04N19/61, H04N19/172 |
European Classification | H04N7/26A8B, H04N7/26T, H04N7/26A6S2, H04N7/26A8P, H04N7/50, H04N7/46S, H04N7/26C |
Date | Code | Event | Description |
---|---|---|---|
Jul. 8, 2004 | AS | Assignment | Owner name: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE, TAIWAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LIN, CHIA-WEN;LEE, YUH-RUEY;CHOU, YEH-KAI;REEL/FRAME:015566/0733 Effective date: 20040615 |
Aug. 20, 2010 | FPAY | Fee payment | Year of fee payment: 4 |
Aug. 20, 2014 | FPAY | Fee payment | Year of fee payment: 8 |