WO2005009046A1 - Interframe wavelet video coding method - Google Patents

Interframe wavelet video coding method

Info

Publication number
WO2005009046A1
Authority
WO
WIPO (PCT)
Prior art keywords
frames
average
video coding
frame
group
Prior art date
Application number
PCT/KR2004/001666
Other languages
French (fr)
Inventor
Chang-Hoon Yim
Ho-Jin Ha
Bae-Keun Lee
Woo-Jin Han
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Publication of WO2005009046A1 publication Critical patent/WO2005009046A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/53Multi-resolution motion estimation; Hierarchical motion estimation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Abstract

An interframe wavelet video coding (IWVC) method by which an average temporal distance (ATD) is minimized is provided. The IWVC method comprises receiving a group-of-frames and decomposing the group-of-frames into first difference frames and first average frames between the frames in a first forward temporal direction and a first backward temporal direction, wavelet-decomposing the first difference frames and the first average frames, and quantizing coefficients resulting from the wavelet-decomposition to generate a bitstream. The IWVC method provides improved video coding performance.

Description

INTERFRAME WAVELET VIDEO CODING METHOD

Technical Field
[1] The present invention relates to a wavelet video coding method, and more particularly, to an interframe wavelet video coding (IWVC) method in which an average temporal distance is reduced by changing a temporal filtering direction. Background Art
[2] With the development of information communication technology including the Internet, video communication as well as text and voice communication has increased. Conventional text communication cannot satisfy the various demands of users, and thus multimedia services that can provide various types of information such as text, pictures, and music have increased. Multimedia data requires a large capacity storage medium and a wide bandwidth for transmission since the amount of multimedia data is usually large. For example, a 24-bit true color image having a resolution of 640 * 480 needs a capacity of 640 * 480 * 24 bits, i.e., data of about 7.37 Mbits, per frame. When this image is transmitted at a speed of 30 frames per second, a bandwidth of 221 Mbits/sec is required. When a 90-minute movie based on such an image is stored, a storage space of about 1200 Gbits is required. Accordingly, a compression coding method is a requisite for transmitting multimedia data including text, video, and audio.
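These figures follow directly from the stated parameters. The short Python check below reproduces them; the 90-minute duration is an assumption, chosen because it is consistent with the stated total of about 1200 Gbits.

    # Storage and bandwidth arithmetic for 640x480, 24-bit color, 30 fps video.
    frame_bits = 640 * 480 * 24            # 7,372,800 bits, about 7.37 Mbits per frame
    bandwidth_bps = frame_bits * 30        # bits per second at 30 frames/sec
    movie_bits = bandwidth_bps * 90 * 60   # a 90-minute movie (assumed duration)

    print(frame_bits / 1e6)      # 7.37   (Mbits per frame)
    print(bandwidth_bps / 1e6)   # 221.18 (Mbits/sec)
    print(movie_bits / 1e9)      # 1194.4 (Gbits), i.e., about 1200 Gbits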
[3] A basic principle of data compression is removing data redundancy. Data can be compressed by removing spatial redundancy in which the same color or object is repeated in an image, temporal redundancy in which there is little change between adjacent frames in a moving image or the same sound is repeated in audio, or mental visual redundancy taking into account human eyesight and limited perception of high frequency. Data compression can be classified into lossy/lossless compression according to whether source data is lost, intraframe/interframe compression according to whether individual frames are compressed independently, and symmetric/asymmetric compression according to whether the time required for compression is the same as the time required for recovery. Data compression is defined as real-time compression when a compression/recovery time delay does not exceed 50 ms and as scalable compression when frames have different resolutions. For text or medical data, lossless compression is usually used. For multimedia data, lossy compression is usually used. Meanwhile, intraframe compression is usually used to remove spatial redundancy, and interframe compression is usually used to remove temporal redundancy.
[4] Different types of transmission media for multimedia have different performance. Currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit data of several tens of megabits per second while a mobile communication network has a transmission rate of 384 kilobits per second. In conventional video coding methods such as Motion Picture Experts Group (MPEG)-1, MPEG-2, H.263, and H.264, temporal redundancy is removed by temporal prediction based on motion estimation and compensation, and spatial redundancy is removed by transform coding. These methods have satisfactory compression rates, but they do not have the flexibility of a truly scalable bitstream. Accordingly, to support transmission media having various speeds or to transmit multimedia at a data rate suitable to a transmission environment, data coding methods having scalability, such as wavelet video coding and subband video coding, may be suitable to a multimedia environment. For example, Interframe Wavelet Video Coding (IWVC) can provide a very flexible, scalable bitstream.

Disclosure of Invention

Technical Problem
[5] However, conventional IWVC has lower performance than a coding method such as H.264. Due to this low performance, IWVC is used only for very limited applications although it has excellent scalability. Accordingly, it has been an issue to improve the performance of data coding methods having scalability.

Technical Solution
[6] The present invention provides a scalable data coding method which provides improved performance by reducing a total temporal distance for motion estimation.
[7] According to an aspect of the present invention, there is provided an interframe wavelet video coding method comprising receiving a group-of-frames and decomposing the group-of-frames into first difference frames and first average frames between the frames in a first forward temporal direction and a first backward temporal direction, wavelet-decomposing the first difference frames and the first average frames, and quantizing coefficients resulting from the wavelet-decomposition to generate a bitstream. Preferably, the interframe wavelet video coding method may further comprise obtaining a motion vector between frames and compensating for a temporal motion using the motion vector before decomposing the group-of-frames into the first difference and average frames. Also, the first forward temporal direction and the first backward temporal direction are preferably combined such that an average of temporal distances between frames in the group-of-frames is minimized.
[8] The decomposing the group-of-frames into the first difference and average frames may comprise (a) decomposing the group-of-frames into a first difference frame and a first average frame between two frames in the first forward temporal direction; and (b) decomposing the group-of-frames into another first difference frame and another first average frame between other two frames in the first backward temporal direction. The steps (a) and (b) may be alternately performed with respect to the frames in the group-of-frames. Meanwhile, the decomposing the group-of-frames into the first difference and average frames may further comprise decomposing the first average frames into a second difference frame and a second average frame between two first average frames in either a second forward temporal direction or a second backward temporal direction. Here, the decomposing the first average frames into the second difference and average frames may be repeated a plurality of times. The second forward temporal direction and the second backward temporal direction may be combined such that an average of temporal distances between frames in the group-of-frames is minimized.
[9] The decomposing the first average frames into the second difference and average frames may comprise (c) decomposing the first average frames into a second difference frame and a second average frame between two first average frames in the second forward temporal direction, and (d) decomposing the group-of-frames into another second difference frame and another second average frame between other two first average frames in the second backward temporal direction. The steps (c) and (d) may be alternately performed with respect to the first average frames.

Description of Drawings
[10] The above and other features and advantages of the present invention will become more apparent by describing in detail preferred embodiments thereof with reference to the attached drawings in which:
[11] FIG. 1 is a block diagram of an encoder performing an interframe wavelet video coding (IWVC) method;
[12] FIG. 2 illustrates directions of motion estimation in a conventional IWVC;
[13] FIGS. 3 and 4 illustrate directions of motion estimation in IWVC according to a first embodiment of the present invention;
[14] FIGS. 5 and 6 illustrate directions of motion estimation in IWVC according to a second embodiment of the present invention;
[15] FIG. 7 illustrates directions of motion estimation in IWVC according to a third embodiment of the present invention;
[16] FIG. 8 illustrates directions of motion estimation in IWVC according to a fourth embodiment of the present invention;
[17] FIG. 9 is a graph comparing Peak Signal to Noise Ratios (PSNRs) with respect to a 'Canoe' sequence between a conventional IWVC method and embodiments of the present invention;
[18] FIG. 10 is a graph comparing PSNRs with respect to a 'Bus' sequence between a conventional IWVC method and embodiments of the present invention; and
[19] FIG. 11 is a graph comparing changes in PSNRs with respect to a 'Canoe' sequence between a conventional IWVC method and embodiments of the present invention.

Mode for Invention
[20] A preferred embodiment of the present invention will now be described in detail with reference to the accompanying drawings.
[21] FIG. 1 is a block diagram of an encoder performing an interframe wavelet video coding (IWVC) method.
[22] The encoder performing an IWVC method includes a motion estimation block 10 which obtains a motion vector, a motion compensation temporal filtering block 40 which removes temporal redundancy using the motion vector, a spatial wavelet decomposition block 50 which removes spatial redundancy, a motion vector encoding block 20 which encodes the motion vector using a predetermined algorithm, a quantization block 60 which quantizes wavelet coefficients of respective components generated by the spatial wavelet decomposition block 50, and a buffer 30 which temporarily stores an encoded bitstream received from the quantization block 60.
[23] The motion estimation block 10 obtains a motion vector used by the motion compensation temporal filtering block 40 using a hierarchical method such as Hierarchical Variable Size Block Matching (HVSBM).
[24] The motion compensation temporal filtering block 40 decomposes frames into low- and high-frequency frames in a temporal direction using the motion vector obtained by the motion estimation block 10. In more detail, an average of two frames is defined as a low-frequency component, and half of a difference between the two frames is defined as a high-frequency component. Frames are decomposed in Group-of-Frames (GOF) units. Through such decomposition, temporal redundancy is removed. Decomposition into high- and low-frequency frames may be performed using only a pair of frames without using a motion vector. However, decomposition using a motion vector shows better performance than decomposition using only a pair of frames. For example, where a portion of a first frame has moved in a second frame, the amount of the motion can be represented by a motion vector. The portion of the first frame is compared with the portion of the second frame to which it is mapped by the motion vector, and the temporal motion is thereby compensated. Thereafter, the first and second frames are decomposed into low- and high-frequency frames.
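The averaging and differencing described above amounts to a Haar-style temporal transform on each frame pair. The NumPy sketch below illustrates only that step; the motion-compensated alignment that would precede it in the actual encoder is omitted, and the function names are illustrative rather than taken from the patent.

    import numpy as np

    def temporal_filter_pair(frame_a, frame_b):
        # Average of the pair -> temporal low-frequency frame;
        # half of the difference -> temporal high-frequency frame.
        # In the encoder of FIG. 1, the pair would first be aligned
        # using the motion vectors from block 10 (motion compensation).
        a = frame_a.astype(np.float64)
        b = frame_b.astype(np.float64)
        low = (a + b) / 2.0
        high = (a - b) / 2.0
        return low, high

    def inverse_temporal_filter_pair(low, high):
        # Perfect reconstruction of the original pair.
        return low + high, low - high

Because the transform is invertible, a decoder recovers the pair exactly from the two sub-band frames; the compression gain comes from the high-frequency frame being close to zero when adjacent frames are well aligned.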
[25] The spatial wavelet decomposition block 50 wavelet-decomposes frames that have been decomposed in the temporal direction by the motion compensation temporal filtering block 40 into spatial low- and high-frequency components, thereby removing spatial redundancy.
[26] The motion vector encoding block 20 encodes a motion vector hierarchically obtained by the motion estimation block 10 such that the motion vector has an optimal number of bits using a rate-distortion algorithm, and then transmits the encoded motion vector to the buffer 30. The quantization block 60 quantizes and encodes the wavelet coefficients of the components generated by the spatial wavelet decomposition block 50. The resulting encoded bitstream is scalable. The buffer 30 stores the encoded bitstream before transmission and is controlled by a rate control algorithm.
[27] FIG. 2 illustrates directions of motion estimation in conventional IWVC.
[28] In FIG. 2, a single GOF includes 16 frames. Two adjacent frames in a pair are replaced by a high-frequency frame and a low-frequency frame. In the conventional IWVC, motion estimation is performed only in a single direction, i.e., in a forward direction.
[29] For example, at level 0, motion estimation between frames 1 and 2 is performed in a direction from the frame 1 to the frame 2. Thereafter, a temporal high-frequency sub-band frame H1 is positioned at the frame 1, and a temporal low-frequency sub-band frame L2 is positioned at the frame 2. In this case, the temporal low-frequency sub-band L2 at level 1 is similar to the frame 2 at the level 0, and the temporal high-frequency sub-band H1 is similar to an edge image of the frame 1 at the level 0. As such, pairs of frames 1 and 2, 3 and 4, 5 and 6, 7 and 8, 9 and 10, 11 and 12, 13 and 14, and 15 and 16 at the level 0 are replaced by pairs of sub-band frames H1 and L2, H3 and L4, H5 and L6, H7 and L8, H9 and L10, H11 and L12, H13 and L14, and H15 and L16 which form frames at the level 1.
[30] Temporal low-frequency sub-band frames at the level 1 are decomposed into temporal low-frequency sub-band frames and temporal high-frequency sub-band frames at level 2. For example, for temporal decomposition, motion estimation is performed in a direction from the frame L2 to the frame L4. As a result, at the level 2, a temporal high-frequency sub-band frame LH2 is positioned at a position of the frame L2, and a temporal low-frequency sub-band frame LL4 is positioned at a position of the frame L4. Similarly, the frame LH2 is similar to an edge image of the frame L2, and the frame LL4 is similar to the frame L4. As such, frames L2, L4, L6, L8, L10, L12, L14, and L16 at the level 1 are replaced by frames LH2, LL4, LH6, LL8, LH10, LL12, LH14, and LL16 at the level 2.
[31] In the same manner as described above, the temporal low-frequency sub-band frames LL4, LL8, LL12 and LL16 at the level 2 are replaced by temporal high- and low-frequency sub-band frames LLH4, LLL8, LLH12, and LLL16 at level 3. The temporal low-frequency sub-band frames LLL8 and LLL16 at the level 3 are finally replaced by temporal high- and low-frequency sub-band frames LLLH8 and LLLL16 at level 4.
[32] In FIG. 2, shaded squares represent temporal high-frequency sub-band frames, and non-shaded squares represent temporal low-frequency sub-band frames. Consequently, the frames 1 through 16 at the level 0 are decomposed into five types of temporal sub-bands through temporal filtering from the level 0 to the level 4. This decomposition results in:
[33] one LLLL frame: LLLL16;
[34] one LLLH frame: LLLH8;
[35] two LLH frames: LLH4 and LLH12;
[36] four LH frames: LH2, LH6, LH10, and LH14; and
[37] eight H frames: H1, H3, H5, H7, H9, H11, H13, and H15.
[38] Where a single GOF includes eight frames, the eight frames are finally decomposed into four types of temporal sub-bands through temporal filtering from level 0 to level 3. Where a single GOF includes 32 frames, the 32 frames are finally decomposed into six types of temporal sub-bands through temporal filtering from level 0 to level 5.
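The number of temporal sub-band types grows logarithmically with the GOF size: a dyadic GOF of 2**k frames yields k high-frequency types plus one final low-frequency type. A small Python sketch of this relation (the function name is illustrative):

    import math

    def temporal_subband_types(gof_size):
        # A GOF of 2**k frames is temporally filtered through k stages,
        # producing k high-frequency sub-band types plus one final
        # low-frequency type: k + 1 in total.
        k = int(math.log2(gof_size))
        return k + 1

    assert temporal_subband_types(8) == 4    # H, LH, LLH, LLL
    assert temporal_subband_types(16) == 5   # H, LH, LLH, LLLH, LLLL
    assert temporal_subband_types(32) == 6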
[39] The present invention provides a scalable data coding method in which performance is improved by reducing a total temporal distance for motion estimation. To quantitatively calculate the total temporal distance, an average temporal distance (ATD) is defined. To calculate the ATD, a temporal distance is calculated first. The temporal distance is defined as a positional difference between two frames. For example, a temporal distance between the frame 1 and the frame 2 is defined as 1, and a temporal distance between the frame L2 and the frame L4 is defined as 2. The ATD is obtained by dividing the sum of temporal distances between frames in pairs, which are subjected to an operation for motion estimation, by the number of the pairs of the frames.
[40] Referring to FIG. 2, a temporal distance for motion estimation increases as the level increases. Where motion estimation is performed between the frames 1 and 2 at the level 0, a temporal distance is calculated as 2-1=1. Similarly, a temporal distance for motion estimation at the level 1 is 2, a temporal distance for motion estimation at the level 2 is 4, and a temporal distance for motion estimation at the level 3 is 8. In FIG. 2, 8, 4, 2, and 1 pairs of frames for motion estimation exist at the levels 0, 1, 2, and 3, respectively. Accordingly, the total number of pairs of frames used for motion estimation is 15. This is arranged in Table 1.
[41] Table 1: Number of pairs of frames and temporal distance for motion estimation at each level in conventional IWVC

              Number of pairs of frames     Temporal distance for
              for motion estimation         motion estimation
    Level 0   8                             1
    Level 1   4                             2
    Level 2   2                             4
    Level 3   1                             8
[42] As the temporal distance increases, the size of a motion vector also increases. In particular, this phenomenon appears strongly in a video sequence having fast motions. In the conventional IWVC shown in FIG. 2, as the level increases, the temporal distance also increases. A large temporal distance at a high level may cause the coding efficiency of the conventional IWVC to decrease. The ATD is calculated in the conventional IWVC as follows:

[43] ATD = (8 x 1 + 4 x 2 + 2 x 4 + 1 x 8) / 15 = 32 / 15 = 2.13
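A small helper makes the definition concrete. The per-level pair counts and temporal distances below are those of Table 1; the function name is illustrative.

    def average_temporal_distance(levels):
        # levels: list of (number_of_pairs, temporal_distance), one per level.
        # ATD = sum of all pairwise temporal distances / total number of pairs.
        total = sum(pairs * dist for pairs, dist in levels)
        count = sum(pairs for pairs, _ in levels)
        return total / count

    conventional_iwvc = [(8, 1), (4, 2), (2, 4), (1, 8)]   # Table 1
    print(round(average_temporal_distance(conventional_iwvc), 2))   # 2.13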
[44] FIGS. 3 through 8 illustrate different directions of motion estimation in IWVC according to different embodiments of the present invention. Hereinafter, an IWVC method having directions of motion estimation shown in FIGS. 3 and 4 is referred to as Method1. An IWVC method having directions of motion estimation shown in FIGS. 5 and 6 is referred to as Method2. An IWVC method having directions of motion estimation shown in FIG. 7 is referred to as Method3, and an IWVC method having directions of motion estimation shown in FIG. 8 is referred to as Method4. Since Method1 and Method2 provide the minimum ATD, they will be described in more detail by dividing each method into two modes according to whether the direction of motion estimation at the level 3 is a forward direction or a backward direction. In other words, Method1 is divided into Method1-a and Method1-b, and Method2 is divided into Method2-a and Method2-b. In FIGS. 3 through 8, solid lines denote forward motion estimation, and dotted lines denote backward motion estimation.
[45] Referring to FIGS. 3 and 4, in Method1, both forward motion estimation and backward motion estimation are present at level 0. Motion estimation between frames 1 and 2 is performed in a forward direction from the frame 1 to the frame 2. A temporal high-frequency sub-band frame H1 is positioned at the frame 1, and a temporal low-frequency sub-band frame L2 is positioned at the frame 2. However, motion estimation on the subsequent two frames is different. Motion estimation between frames 3 and 4 is performed in a backward direction from the frame 4 to the frame 3. A temporal high-frequency sub-band frame H4 is positioned at the frame 4, and a temporal low-frequency sub-band frame L3 is positioned at the frame 3.
[46] At the level 1, motion estimation is performed between the frames L2 and L3. As such, while a temporal distance for motion estimation at the level 1 is 2 in the conventional IWVC method, a temporal distance for motion estimation at the level 1 is 1 in Method1 shown in FIGS. 3 and 4. In other words, when motion estimation is performed in both the forward and backward directions at the level 0, the temporal distance for motion estimation can be reduced to 1 at the level 1. All of the directions of motion estimation except for the directions at level 3 are the same between Method1-a and Method1-b. As shown in FIGS. 3 and 4, LLLL frames are positioned at the positions of frames 10 and 7 in Method1-a and Method1-b, respectively.
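To see why the level-1 distance drops to 1, consider where the low-frequency frames land when the level-0 directions alternate between forward and backward, as described above for FIGS. 3 and 4. This is a sketch of the bookkeeping only:

    # Level 0: pairs (1,2), (3,4), (5,6), ... filtered with directions
    # F, B, F, B, ...  A forward pair leaves its L frame on the later
    # frame; a backward pair leaves its L frame on the earlier frame.
    pairs = [(2 * i + 1, 2 * i + 2) for i in range(8)]   # frames 1..16
    l_positions = [b if i % 2 == 0 else a for i, (a, b) in enumerate(pairs)]
    print(l_positions)   # [2, 3, 6, 7, 10, 11, 14, 15]
    # The L frames now sit in adjacent pairs, so the level-1 pairs
    # (L2, L3), (L6, L7), (L10, L11), (L14, L15) each have distance 1.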
[47] Directions of motion estimation at the level 0 are the same between Method1 and Method2, but directions of motion estimation at the level 1 are different between Method1 and Method2. In Method1, forward motion estimation is performed between frames L6 and L7, and backward motion estimation is performed between frames L10 and L11. Conversely, in Method2, backward motion estimation is performed between the frames L6 and L7, and forward motion estimation is performed between the frames L10 and L11. All of the directions of motion estimation except for the directions at level 3 are the same between Method2-a and Method2-b. As shown in FIGS. 5 and 6, LLLL frames are positioned at the positions of frames 11 and 6 in Method2-a and Method2-b, respectively.
[48] The numbers of pairs of frames used for motion estimation and the temporal distances in Method1 and Method2 are shown in Tables 2 and 3.

[49] Table 2: Number of pairs of frames and temporal distance for motion estimation at each level in Method1

              Number of pairs of frames     Temporal distance for
              for motion estimation         motion estimation
    Level 0   8                             1
    Level 1   4                             1
    Level 2   2                             4
    Level 3   1                             3

[50] Table 3: Number of pairs of frames and temporal distance for motion estimation at each level in Method2

              Number of pairs of frames     Temporal distance for
              for motion estimation         motion estimation
    Level 0   8                             1
    Level 1   4                             1
    Level 2   2                             3
    Level 3   1                             5
[51] The ATD is calculated in Method1 as follows:

[52] ATD = (8 x 1 + 4 x 1 + 2 x 4 + 1 x 3) / 15 = 23 / 15 = 1.53
[53] The ATD is calculated in Method2 as follows:

[54] ATD = (8 x 1 + 4 x 1 + 2 x 3 + 1 x 5) / 15 = 23 / 15 = 1.53
[55] In Method3 and Method4 shown in FIGS. 7 and 8, the LLLL frame is positioned at the position of the central frame, i.e., frame 8. As compared to Method1 and Method2, Method3 and Method4 provide a larger ATD; the corresponding figures are arranged in Tables 4 and 5.
[56] Table 4: Number of pairs of frames and temporal distance for motion estimation at each level in Method3

              Number of pairs of frames     Temporal distance for
              for motion estimation         motion estimation
    Level 0   8                             1
    Level 1   4                             2
    Level 2   2                             4
    Level 3   1                             2
[57] Table 5: Number of pairs of frames and temporal distance for motion estimation at each level in Method4

              Number of pairs of frames     Temporal distance for
              for motion estimation         motion estimation
    Level 0   8                             1
    Level 1   4                             2
    Level 2   2                             4
    Level 3   1                             1
[58] The ATD is calculated in Method3 as follows:

[59] ATD = (8 x 1 + 4 x 2 + 2 x 4 + 1 x 2) / 15 = 26 / 15 = 1.73
[60] The ATD is calculated in Method4 as follows:

[61] ATD = (8 x 1 + 4 x 2 + 2 x 4 + 1 x 1) / 15 = 25 / 15 = 1.67
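Running the same average_temporal_distance helper over the entries of Tables 2 through 5 reproduces these values. The Method4 level-1 distance of 2 is an inference here; it is the value consistent with the stated ATD of 1.67.

    methods = {
        "Method1": [(8, 1), (4, 1), (2, 4), (1, 3)],   # Table 2
        "Method2": [(8, 1), (4, 1), (2, 3), (1, 5)],   # Table 3
        "Method3": [(8, 1), (4, 2), (2, 4), (1, 2)],   # Table 4
        "Method4": [(8, 1), (4, 2), (2, 4), (1, 1)],   # Table 5
    }
    for name, levels in methods.items():
        print(name, round(average_temporal_distance(levels), 2))
    # Method1 1.53, Method2 1.53, Method3 1.73, Method4 1.67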
[62] The ATDs obtained in Method1 through Method4 are 1.53, 1.53, 1.73, and 1.67, respectively, while the ATD obtained in the conventional IWVC is 2.13. Among Method1 through Method4 shown in FIGS. 3 through 8, Method1 and Method2 provide the least ATD.
[63] The ATD corresponds to a total temporal distance for motion estimation. When the total temporal distance for motion estimation decreases, the total size of the motion vectors also decreases. This characteristic gives higher coding efficiency than the conventional IWVC.
[64] FIG. 9 is a graph comparing peak signal to noise ratios (PSNRs) with respect to a 'Canoe' sequence between the conventional IWVC and the embodiments of the present invention. Method1-a and Method2-a provide almost the same performance and give a higher PSNR than the conventional IWVC by 1.0 through 1.5 dB.
[65] FIG. 10 is a graph comparing PSNRs with respect to a 'Bus' sequence between the conventional IWVC and the embodiments of the present invention. Method1-a and Method2-a give higher PSNRs than the conventional IWVC by 1.0 dB and 1.5 dB, respectively. Method3 and Method4 provide lower performance than Method1-a and Method2-a but provide higher performance than the conventional IWVC.
[66] FIG. 11 is a graph comparing changes in PSNRs with respect to a 'Canoe' sequence between the conventional IWVC and the embodiments of the present invention.
[67] It can be inferred from FIG. 11 that the PSNR is highest at the position of the LLLL frame in a GOF in all of the methods.

Industrial Applicability
[68] According to the present invention, a total interframe temporal distance for motion estimation is reduced in a scalable video coding method using wavelets so that the performance of video coding can be improved.
[69] Although only a few embodiments of the present invention have been shown and described with reference to the attached drawings, it will be understood by those skilled in the art that changes may be made to these embodiments without departing from the features and spirit of the invention. For example, in the above-described embodiments of the present invention, a single GOF includes 16 frames. However, the present invention is not restricted thereto. In addition, the embodiments of the present invention have been described and tested based on IWVC. However, the present invention can be applied to other coding techniques. Therefore, it is to be understood that the above-described embodiments have been provided only in a descriptive sense and are not to be construed as placing any limitation on the scope of the invention.

Claims

[1] An interframe wavelet video coding method comprising: receiving a group-of-frames and decomposing the group-of-frames into first difference frames and first average frames between the frames in a first forward temporal direction and a first backward temporal direction; wavelet-decomposing the first difference frames and the first average frames; and quantizing coefficients resulting from the wavelet-decomposition to generate a bitstream.
[2] The interframe wavelet video coding method of claim 1, further comprising obtaining a motion vector between frames and compensating for a temporal motion using the motion vector before decomposing the group-of-frames into the first difference and average frames.
[3] The interframe wavelet video coding method of claim 1, wherein the first forward temporal direction and the first backward temporal direction are combined such that an average of temporal distances between frames in the group-of-frames is minimized.
[4] The interframe wavelet video coding method of claim 1, wherein decomposing the group-of-frames into the first difference and average frames comprises: (a) decomposing the group-of-frames into a first difference frame and a first average frame between two frames in the first forward temporal direction; and (b) decomposing the group-of-frames into another first difference frame and another first average frame between other two frames in the first backward temporal direction.
[5] The interframe wavelet video coding method of claim 4, wherein steps (a) and (b) are alternately performed with respect to the frames in the group-of-frames.
[6] The interframe wavelet video coding method of claim 5, wherein the decomposing the group-of-frames into the first difference and average frames further comprises decomposing the first average frames into a second difference frame and a second average frame between two first average frames in either a second forward temporal direction or a second backward temporal direction.
[7] The interframe wavelet video coding method of claim 6, wherein decomposing the first average frames into the second difference and average frames is repeated a plurality of times.
[8] The interframe wavelet video coding method of claim 7, wherein the second forward temporal direction and the second backward temporal direction are combined such that an average of temporal distances between frames in the group-of-frames is minimized.
[9] The interframe wavelet video coding method of claim 6, wherein decomposing the first average frames into the second difference and average frames comprises: (c) decomposing the first average frames into a second difference frame and a second average frame between two first average frames in the second forward temporal direction; and (d) decomposing the group-of-frames into another second difference frame and another second average frame between other two first average frames in the second backward temporal direction.
[10] The interframe wavelet video coding method of claim 9, wherein steps (c) and (d) are alternately performed with respect to the first average frames.
[11] The interframe wavelet video coding method of claim 4, wherein steps (a) and (b) are performed alternately and sequentially with respect to the frames in the group-of-frames.
[12] The interframe wavelet video coding method of claim 4, wherein step (a) is performed with respect to a temporally first half of all of the frames in the group-of-frames and step (b) is performed with respect to a temporally second half of all of the frames in the group-of-frames.
[13] The interframe wavelet video coding method of claim 9, wherein steps (a) and (b) are performed alternately and sequentially with respect to the frames in the group-of-frames.
[14] The interframe wavelet video coding method of claim 9, wherein step (a) is performed with respect to a temporally first half of all of the frames in the group-of-frames and step (b) is performed with respect to a temporally second half of all of the frames in the group-of-frames.
[15] The interframe wavelet video coding method of claim 6, wherein decomposing the first average frames into the second difference and average frames is repeated at least one time.
PCT/KR2004/001666 2003-07-18 2004-07-07 Interframe wavelet video coding method WO2005009046A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020030049449A KR20050009639A (en) 2003-07-18 2003-07-18 Interframe Wavelet Video Coding Method
KR10-2003-0049449 2003-07-18

Publications (1)

Publication Number Publication Date
WO2005009046A1 true WO2005009046A1 (en) 2005-01-27

Family

ID=36841006

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2004/001666 WO2005009046A1 (en) 2003-07-18 2004-07-07 Interframe wavelet video coding method

Country Status (3)

Country Link
KR (1) KR20050009639A (en)
CN (1) CN1810040A (en)
WO (1) WO2005009046A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101014129B (en) * 2007-03-06 2010-12-15 孟智平 Video data compression method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5495292A (en) * 1993-09-03 1996-02-27 Gte Laboratories Incorporated Inter-frame wavelet transform coder for color video compression

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
OHM J.R.: "Temporal domain sub-band video coding with motion compensation", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. 3, 23 March 1992 (1992-03-23) - 26 March 1992 (1992-03-26), pages 229 - 232, XP000378915 *
VAN DER SCHAAR M., TURAGA D.S.: "Unconstrained motion compensated temporal filtering (UMCTF) framework for wavelet video coding", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. 3, 6 April 2003 (2003-04-06) - 10 April 2003 (2003-04-10), pages III-81 - 84 *

Also Published As

Publication number Publication date
KR20050009639A (en) 2005-01-25
CN1810040A (en) 2006-07-26

Similar Documents

Publication Publication Date Title
JP4891234B2 (en) Scalable video coding using grid motion estimation / compensation
US20050226334A1 (en) Method and apparatus for implementing motion scalability
US20060088222A1 (en) Video coding method and apparatus
US20050169379A1 (en) Apparatus and method for scalable video coding providing scalability in encoder part
US20050157793A1 (en) Video coding/decoding method and apparatus
US6931068B2 (en) Three-dimensional wavelet-based scalable video compression
US7042946B2 (en) Wavelet based coding using motion compensated filtering based on both single and multiple reference frames
US20050163217A1 (en) Method and apparatus for coding and decoding video bitstream
EP1766998A1 (en) Scalable video coding method and apparatus using base-layer
US20030202599A1 (en) Scalable wavelet based coding using motion compensated temporal filtering based on multiple reference frames
US20050047509A1 (en) Scalable video coding and decoding methods, and scalable video encoder and decoder
US20060013312A1 (en) Method and apparatus for scalable video coding and decoding
US20050158026A1 (en) Method and apparatus for reproducing scalable video streams
EP1741297A1 (en) Method and apparatus for implementing motion scalability
Conklin et al. A comparison of temporal scalability techniques
US7292635B2 (en) Interframe wavelet video coding method
EP1709811A1 (en) Device and method for playing back scalable video streams
WO2005020587A1 (en) Adaptive interframe wavelet video coding method, computer readable recording medium and system therefor
WO2005009046A1 (en) Interframe wavelet video coding method
Tillier et al. Multiple descriptions scalable video coding
KR100577364B1 (en) Adaptive Interframe Video Coding Method, Computer Readable Medium and Device for the Same
WO2006006793A1 (en) Video encoding and decoding methods and video encoder and decoder
EP1766986A1 (en) Temporal decomposition and inverse temporal decomposition methods for video encoding and decoding and video encoder and decoder
Al-Asmari et al. Low bit rate video compression algorithm using 3-D Decomposition
Kim et al. Scalable interframe wavelet coding with low complex spatial wavelet transform

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 20048170007

Country of ref document: CN

122 Ep: pct application non-entry in european phase