WO2006024988A2 - A method and apparatus for motion estimation - Google Patents

A method and apparatus for motion estimation

Info

Publication number
WO2006024988A2
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
video stream
frame
base layer
image
Prior art date
Application number
PCT/IB2005/052756
Other languages
French (fr)
Other versions
WO2006024988A3 (en)
Inventor
Jin Wang
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Priority to EP05780826A priority Critical patent/EP1790166A2/en
Priority to JP2007529081A priority patent/JP2008512023A/en
Publication of WO2006024988A2 publication Critical patent/WO2006024988A2/en
Publication of WO2006024988A3 publication Critical patent/WO2006024988A3/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/53 Multi-resolution motion estimation; Hierarchical motion estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • This invention relates to a method and apparatus for compressing a video stream, and particularly to a method and apparatus for compressing a video stream using a spatial layered compression scheme.
  • Each frame of a digital video is a still picture (also called an image) made up of a group of pixel points (also called pixels).
  • The number of pixels depends upon the display resolution of a particular system.
  • Compression standards such as MPEG-2, MPEG-4 and H.263 have been developed to reduce the quantity of data that must be transmitted.
  • In layered compression, the bit stream is divided into two or more bit streams, or layers, for encoding. During decoding, the layers may then be recombined as desired to form a high resolution signal.
  • the base layers may provide a low resolution video stream
  • the enhancement layers may provide additional information to enhance the base layer image.
  • Motion prediction is used to obtain a predicted image by exploiting the correlation between successive frames.
  • The input video stream is processed to form I, P and B frames.
  • An I frame is encoded using only its own information.
  • A P frame is predictively encoded from the nearest preceding I or P frame.
  • A B frame is predictively encoded from itself or from the frames before and after it.
  • Fig. 1 is a block diagram of a video encoder 100 supporting the spatial layered compression of MPEG-2/MPEG-4.
  • The video encoder 100 comprises a base encoder 112 and an enhancement encoder 114.
  • The base encoder comprises a down-sampler 120, a motion estimation (ME) means 122, a motion compensator (MC) 124, an orthogonal transform (for example, discrete cosine transform (DCT)) circuit 130, a quantizer (Q) 132, a variable length coder (VLC) 134, a bitrate control circuit 135, an inverse quantizer (IQ) 138, an inverse transform (IDCT) circuit 140, switches 128 and 144, as well as an up-sampler 150.
  • The enhancement encoder 114 comprises a motion estimation means 154, a motion compensator 155, an orthogonal transform (for example, DCT) circuit 158, a quantizer 160, a variable length encoder 162, a bitrate control circuit 164, an inverse quantizer 166, an inverse transform (IDCT) circuit 168 and switches 170 and 172. All of these means are well known in the art, so they will not be described in detail herein.
  • Motion estimation is one of the most time-consuming portions of a video compression system: the larger the amount of motion estimation calculation, the lower the encoding efficiency of the video compression system.
  • In the prior art, motion estimation is performed for the base layer and for the enhancement layer separately, with no association between them.
  • Since the motion estimations for the base layer and the enhancement layer predict the same frame of image, a relatively large portion of the searching process is repeated, which increases the amount of motion estimation calculation and lowers the encoding efficiency of the compression scheme. There is therefore a need for a spatial layered video compression scheme with better encoding efficiency.
  • The present invention is directed to a more efficient spatial layered compression method that overcomes the disadvantages of the spatial layered compression scheme described above. By introducing a reference motion vector, the present invention associates the motion estimation of the base layer with that of the enhancement layer, so that the originally repeated searching is performed only once and only a small amount of additional searching remains; on this basis, the computing complexity of motion estimation is reduced and the efficiency of compressed encoding is improved.
  • An embodiment in accordance with the invention discloses a method for spatial layered compression of video stream and an apparatus thereof.
  • First, the original video stream is processed to obtain a reference motion vector for each frame of image of the video stream; the reference motion vector is then down-sampled, and the video stream is down-sampled. Next, a motion vector of the corresponding frame of image of the down-sampled video stream is acquired according to the down-sampled reference motion vector, and the down-sampled video stream is processed using that motion vector to generate a base layer. Finally, when generating the enhancement layer, a motion vector of the corresponding frame of image of the video stream is acquired according to the reference motion vector, and the video stream is processed using that motion vector and the base layer to generate an enhancement layer.
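The sequence of operations just described can be sketched in Python; the function names, the 2× decimation factor, and the pluggable estimator/refiner callables are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def downsample_frame(frame):
    # 2x decimation as a crude stand-in for the low-pass down-sampler
    return frame[::2, ::2]

def downsample_vector(mv):
    # Halve the reference motion vector to match the half-resolution base layer
    return (mv[0] // 2, mv[1] // 2)

def encode_stream(frames, estimate_reference_mv, refine_mv):
    """Order of operations for this embodiment: reference vector first,
    then small-window refinement at each layer."""
    base_mvs, enhancement_mvs = [], []
    for ref, cur in zip(frames, frames[1:]):
        ref_mv = estimate_reference_mv(ref, cur)           # full-resolution search
        small_mv = downsample_vector(ref_mv)               # down-sample the vector
        ref_s, cur_s = downsample_frame(ref), downsample_frame(cur)
        base_mvs.append(refine_mv(ref_s, cur_s, small_mv))     # base layer
        enhancement_mvs.append(refine_mv(ref, cur, ref_mv))    # enhancement layer
    return base_mvs, enhancement_mvs
```

Here `estimate_reference_mv` would be a large-window block-matching search and `refine_mv` a small-window search around the supplied vector.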
  • An alternative embodiment in accordance with the invention discloses another method for spatial layered compression of a video stream and an apparatus therefor. First, the video stream is down-sampled and a reference motion vector is acquired for each frame of image of the down-sampled video stream. Second, a motion vector of the corresponding frame of image of the down-sampled video stream is acquired according to said reference motion vector, and the down-sampled video stream is processed using that motion vector to generate a base layer. Finally, when generating the enhancement layer, said reference motion vector is up-sampled, a motion vector of the corresponding frame of image of the video stream is acquired according to the up-sampled reference motion vector, and the video stream is processed using that motion vector and the base layer to generate an enhancement layer.
  • Another embodiment in accordance with the invention discloses a further method for spatial layered compression of video stream and an apparatus thereof.
  • First, the video stream is processed to generate a base layer; then the motion vector of each frame of image of the base layer is up-sampled to acquire a reference motion vector of the corresponding frame of image. Finally, a motion vector of the corresponding frame of image of the video stream is acquired according to the reference motion vector, and the video stream is processed using that motion vector and the base layer to generate an enhancement layer.
  • Fig. 1 is a block diagram of a spatial layered compression video encoder in accordance with the prior art;
  • Fig. 2 is a schematic diagram of an encoding system using the reference motion vector in accordance with an embodiment of the invention;
  • Fig. 3 is a flowchart of encoding using the reference motion vector in accordance with one embodiment of the invention;
  • Fig. 4 is a schematic diagram of an encoding system using the reference motion vector in accordance with another embodiment of the invention;
  • Fig. 5 is a schematic diagram of an encoding system using the reference motion vector in accordance with a further embodiment of the invention.
  • Fig. 2 is a schematic diagram of an encoding system using the reference motion vector in accordance with one embodiment of the invention.
  • The encoding system 200 is used for layered compression, wherein the base layer portion provides low-resolution base information of the video stream and the enhancement layer transfers edge enhancement information; the two kinds of information may be recombined at the receiving terminal to form high-resolution picture information.
  • the encoding system 200 comprises an acquiring means 216, a base layer acquiring means 212 and an enhancement layer acquiring means 214.
  • the acquiring means 216 is used for processing the original video stream, thereby to obtain the reference motion vector for each frame of image of the video stream.
  • Acquiring means 216 comprises a motion estimation means 276 and a frame memory 282.
  • the frame memory 282 is used to store the original video sequence.
  • The motion estimation means 276 is used to acquire the reference frames (for example, I or P frames) from frame memory 282 and to perform motion estimation on the current frame (for example, a P frame) according to the reference frames, thereby computing the reference motion vector of the current frame.
  • the base layer acquiring means 212 processes the video stream using the reference motion vector, thereby to generate a base layer.
  • Means 212 comprises down-samplers 120, 286.
  • the down-sampler 120 is used to down-sample the original video stream.
  • Down-sampler 286 is used to down-sample the reference motion vector.
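Down-sampling (or up-sampling) a motion vector amounts to scaling its components by the resolution ratio between the layers. The helper below is a hypothetical sketch; the patent does not specify a rounding policy:

```python
def scale_motion_vector(mv, src_res, dst_res):
    # Scale (dx, dy) between resolutions given as (width, height) pairs,
    # e.g. HD -> SD for a down-sampler, SD -> HD for an up-sampler.
    sx = dst_res[0] / src_res[0]
    sy = dst_res[1] / src_res[1]
    return (round(mv[0] * sx), round(mv[1] * sy))
```

For example, an HD vector (16, -8) at 1920×1088 maps to (6, -4) at 720×480.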
  • the base layer acquiring means 212 further comprises a motion vector acquiring means 222.
  • the motion vector acquiring means 222 is used to acquire the motion vector of the corresponding frame of image of the down-sampled video stream based on the down-sampled reference motion vector. The process by which the motion vector acquiring means 222 acquires the motion vector will be described as follows.
  • The base layer acquiring means 212 further comprises a base layer generation means 213, which uses the motion vector to process the down-sampled video stream, thereby generating the base layer. Except for the down-samplers 120, 286 and the motion vector acquiring means 222, all other means within the base layer acquiring means 212 are basically the same as in the base layer encoder of Fig. 1 and belong to the base layer generation means 213, including motion compensator 124, DCT transform circuit 130, quantizer 132, variable length encoder 134, bitrate control circuit 135, inverse quantizer 138, inverse transform circuit 140, arithmetic units 125, 148, switches 128, 144 and up-sampler 150.
  • The process by which the base layer generation means 213 generates the base layer based on the motion vector output from the motion vector acquiring means 222 is substantially the same as that of the prior art and will be discussed in detail below.
  • the same reference number designates the components having identical or similar features and functions.
  • The only difference between the motion estimation means 122 and the motion vector acquiring means 222 is the way in which they acquire their motion vectors.
  • The motion estimation means 122 of Fig. 1 directly uses the reference frames of the frame memory (not shown) and searches within a larger searching window to acquire the motion vector of the corresponding frame of image of the video stream, while the motion vector acquiring means 222 of Fig. 2 further searches within a smaller searching window, based on said reference motion vector, to acquire the motion vector of the corresponding frame of image of the video stream.
  • the enhancement layer acquiring means 214 processes the video stream by using the reference motion vector and the base layer, thereby to generate an enhancement layer.
  • the means 214 comprises a motion vector acquiring means 254 and an enhancement layer generation means 215.
  • the motion vector acquiring means 254 is used to acquire the motion vector of the corresponding frame of image of the video stream based on the reference motion vector.
  • the enhancement layer generation means 215 processes the video stream by using the motion vector and the base layer, thereby to generate the enhancement layer.
  • The components are substantially the same as those in the enhancement layer encoder 114 of Fig. 1 except for the motion vector acquiring means 254, and all of them belong to the enhancement layer generation means 215, which includes motion compensator 155, DCT circuit 158, quantizer 160, variable length encoder 162, bitrate control circuit 164, inverse quantizer 166, inverse DCT circuit 168, and switches 170, 172.
  • These components are similar to the corresponding components of the base layer acquiring means 212 in function.
  • The process by which the enhancement layer generation means 215 generates the enhancement layer using the motion vector output from the motion vector acquiring means 254 is essentially the same as that of the prior art; the detailed description is given below.
  • the same reference number designates the components having identical or similar features and functions.
  • The only difference between the motion estimation means 154 and the motion vector acquiring means 254 is the way in which they acquire the motion vector.
  • The motion estimation means 154 of Fig. 1 directly uses the reference frames of the frame memory (not shown) and searches within a larger searching window to acquire the motion vector of the corresponding frame of image of the video stream, while the motion vector acquiring means 254 of Fig. 2 further searches within a smaller searching window, based on said reference motion vector, to acquire the motion vector of the corresponding frame of image of the video stream.
  • The process by which the base layer acquiring means 212 and the enhancement layer acquiring means 214 acquire their respective motion vectors using the reference motion vector output by the acquiring means 216, and thereby generate the base layer and the enhancement layer, is described in detail in the following.
  • An original video stream is inputted to the acquiring means 216 and then fed to motion estimation means 276 and frame memory 282, respectively.
  • The video stream has been processed to form I, P and B frames, arranged in a sequence such as I, B, P, B, P, ..., B, P, in accordance with the parameter settings.
  • the input video sequence is stored in the frame memory 282.
  • the motion estimation means 276 is used to acquire the reference frames (for example: I frames ) from frame memory 282, and to make motion estimation on the current frame ( for example P frames ) according to the reference frames, thereby to compute the reference motion vector of the macro block of the current frame.
  • The macro block is a sub-block of 16×16 pixels within the currently encoded frame; block matching between the current macro block and the reference frame is used to calculate the reference motion vector of the current macro block, and thereby to obtain the reference motion vector of the current frame.
  • An I frame is an intra-frame encoded image.
  • A P frame is an intra-frame encoded or forward predictively encoded image.
  • A B frame is an intra-frame encoded, forward predictively encoded, backward predictively encoded, or bi-directionally predictively encoded image.
  • Motion estimation means 276 makes forward prediction for the P frame and calculates its reference motion vector. The motion estimation means also makes forward or bi-directional prediction for the B frame and calculates its reference motion vector. No motion prediction is needed for intra-frame encoding.
  • The motion estimation means 276 reads out the previous reference frame from the frame memory 282 and searches, in the searching window of the previous reference frame, for a macro block that best matches the pixel block of the current frame.
  • The quality of a match is judged by the mean absolute difference (MAD) or the mean squared error (MSE) between the pixels of the currently input block and the pixels of the corresponding block of the reference frame.
  • The corresponding block of the reference frame having the minimum MAD or MSE is the optimum matching block, and the position of said optimum matching block relative to the position of the current block is the reference motion vector.
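The exhaustive matching step just described can be sketched as follows; `mad` and `full_search` are illustrative names, and the MAD criterion is used here (MSE would work the same way):

```python
import numpy as np

def mad(block_a, block_b):
    # Mean absolute difference between two pixel blocks
    return np.mean(np.abs(block_a.astype(int) - block_b.astype(int)))

def full_search(ref, cur, top, left, block=16, window=15):
    """Search a +/-window area of the reference frame for the block that
    best matches cur[top:top+block, left:left+block]; return (dx, dy)."""
    target = cur[top:top + block, left:left + block]
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + block <= ref.shape[0] and x + block <= ref.shape[1]:
                cost = mad(ref[y:y + block, x:x + block], target)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv
```

A ±15 window means up to 31 × 31 = 961 candidate positions are evaluated per macro block.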
  • The motion estimation means 276 in the acquiring means 216 thus acquires the reference motion vector of a frame of image of the video stream. After being down-sampled by the down-sampler 286, the reference motion vector is fed to the motion vector acquiring means 222 of the base layer acquiring means 212, so that the means 222 can make further motion estimation on the same frame of image at the base layer. The reference motion vector may also be fed to the motion vector acquiring means 254 of the enhancement layer acquiring means 214, so that the means 254 can make further motion estimation on the same frame of image at the enhancement layer.
  • The base layer acquiring means 212 and the enhancement layer acquiring means 214 also predictively encode the input video stream; however, said predictive encoding is slightly delayed in time, because the base layer and the enhancement layer must make further motion estimation based on the reference motion vector.
  • The original input video stream is divided by the separator and supplied to the base layer acquiring means 212 and the enhancement layer acquiring means 214, respectively.
  • The input video stream is fed into the down-sampler 120.
  • The down-sampler may be a low-pass filter used to reduce the resolution of the input video stream.
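A minimal stand-in for such a down-sampler is a 2×2 box-filter average followed by decimation. The actual filter, and the non-integer HD-to-SD ratio (1920×1080 to 720×480), are not specified in the text, so this is only an illustration:

```python
import numpy as np

def downsample(frame):
    # Average each 2x2 neighbourhood (crude low-pass filtering), then keep
    # one sample per neighbourhood (decimation), halving both dimensions.
    h, w = frame.shape
    f = frame[:h - h % 2, :w - w % 2].astype(float)
    return (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2]) / 4.0
```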
  • the down-sampled video stream is fed into motion vector acquiring means 222.
  • The motion vector acquiring means 222 acquires the image of the previous reference frame of the video sequence stored in the frame memory and, based on the down-sampled reference motion vector of the current frame output from the above down-sampler 286, searches within a smaller searching window of the previous reference frame for a macro block that best matches the current frame, thereby acquiring the motion vector of the corresponding frame of image of the down-sampled video stream.
  • The motion compensator 124 may read out the image data of the previous reference frame stored in the frame memory (not shown), which has been encoded and partly decoded, on the basis of the prediction mode, the reference motion vector and the motion vector; it shifts the previous frame of image in accordance with the reference motion vector, then shifts it once more in accordance with the motion vector, thereby predicting the current frame of image.
  • Alternatively, the previous frame of image can be shifted only once, by an amount equal to the sum of the reference motion vector and the motion vector; in this case, that sum can be used as the motion vector of said frame of image.
  • the motion compensator 124 provides the predicted image to arithmetic unit 125 and switch 144.
  • Arithmetic unit 125 also receives the input video stream and calculates the difference between the image of the input video stream and the predicted image coming from motion compensator 124. The difference is supplied to the DCT circuit 130. If the prediction mode received from the motion estimation means is intra-frame prediction, the motion compensator 124 does not output any predicted image; in that case, arithmetic unit 125 does not perform the above processing but directly inputs the video stream to DCT circuit 130.
  • The DCT circuit 130 performs DCT processing on the signal output from the arithmetic unit to acquire DCT coefficients, which are supplied to quantizer 132.
  • The quantizer 132 sets the quantizing magnitude (quantizing level) based on the amount of data stored in the buffer and quantizes the DCT coefficients supplied from DCT circuit 130 using that quantizing level.
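Uniform scalar quantization is a simple sketch of the role of quantizer 132 and inverse quantizer 138; real MPEG quantizers apply per-coefficient weighting matrices, which are omitted here:

```python
import numpy as np

def quantize(coeffs, qstep):
    # Larger qstep -> coarser levels -> fewer bits after the VLC
    return np.round(coeffs / qstep).astype(int)

def dequantize(levels, qstep):
    # Inverse quantization; the rounding error introduced here is part of
    # the base-layer reconstruction error the enhancement layer encodes
    return levels * qstep
```

For example, `quantize(np.array([101.0, -47.0, 3.0]), 8)` gives levels `[13, -6, 0]`, which dequantize back to `[104, -48, 0]`.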
  • The quantized DCT coefficients and the set quantizing magnitude are supplied together to the VLC unit 134.
  • According to the quantizing magnitude supplied from the quantizer 132, the VLC unit 134 converts the quantized coefficients into a variable length code, e.g., a Huffman code, thereby generating a base layer.
  • The converted quantized coefficients are output to a buffer (not shown).
  • The quantized coefficients and the quantizing magnitude are also supplied to the inverse quantizer 138, which inversely quantizes the quantized coefficients according to the quantizing magnitude so as to convert them back into DCT coefficients.
  • the DCT coefficients are supplied to the inverse DCT unit 140 which performs inverse DCT conversion to the DCT coefficients.
  • the acquired inverse DCT coefficients are supplied to arithmetic unit 148.
  • the arithmetic unit 148 receives the inverse DCT coefficients from the inverse DCT unit 140, and receives data from motion compensator 124 according to the position of the switch 144.
  • the arithmetic unit 148 calculates the sum of the signal supplied by inverse DCT unit 140 and the predictive image supplied by motion compensator 124 to partly decode the original image.
  • the output of inverse DCT unit 140 may be directly sent to the frame memory.
  • The decoded image acquired by the arithmetic unit 148 is fed to and stored in the frame memory, to be used later as a reference frame for intra-frame encoding, forward encoding, backward encoding, or bi-directional encoding.
  • The output of the arithmetic unit 148 is also supplied to the up-sampler 150 to generate a reconstructed stream whose resolution is substantially the same as that of the high resolution input video stream.
  • The reconstructed stream nevertheless contains errors to some degree. The difference is determined by subtracting the reconstructed high resolution video stream from the original, unchanged high resolution video stream and is input to the enhancement layer to be encoded. The enhancement layer therefore encodes and compresses the frames carrying said difference information.
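The residual fed to the enhancement encoder can be sketched as below; `upsample2x` is a hypothetical nearest-neighbour stand-in for the up-sampler 150:

```python
import numpy as np

def upsample2x(frame):
    # Nearest-neighbour up-sampling: repeat each pixel into a 2x2 patch
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

def enhancement_residual(original_hd, reconstructed_sd, upsample=upsample2x):
    # Original high-resolution frame minus the up-sampled, partly decoded
    # base-layer frame: the difference the enhancement layer encodes
    return original_hd.astype(int) - upsample(reconstructed_sd).astype(int)
```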
  • the process of predictive encoding for the enhancement layer is very similar to that for the base layer.
  • the reference motion vector is fed to the motion estimation means 254 of the enhancement layer acquiring means 214.
  • the motion estimation means 254 makes further motion estimation on the same frame image at the enhancement layer based on the reference motion vector, thereby to acquire the motion vector of the corresponding frame of image of the video stream.
  • the motion compensator 155 shifts the reference frames correspondingly, thereby to predict the current frame. Because this process of motion prediction is similar to that for the base layer, it will not be discussed in detail herein.
  • Fig. 3 is a flowchart of encoding using the reference motion vector in accordance with one embodiment of the invention. This flow is the operational flow of the encoding system 200.
  • Step S305: receiving a specific high resolution video stream, e.g., a video stream having a resolution of 1920×1080i.
  • Step S310: acquiring the reference motion vector for each frame of image of the video stream.
  • The macro block best matching the current frame is searched for within the searching window of the reference frame I; for example, the search is conducted in a searching window of ±15 pixels, the value recommended for the motion estimation.
  • The shift between the current block and the matching block is the reference motion vector. Because this reference motion vector is acquired by prediction against the reference frame within the original video stream, which contains no error, it better reflects the actual video movement.
  • The acquisition of the reference motion vector can be expressed by the following formulae, in which (Bx, By) is the motion vector:

    (Bx, By) = arg min(m, n) SAD(m, n)    (1)

    SAD(m, n) = Σi Σj | Pc(i, j) − Rp(i + m, j + n) |    (2)

  • arg is the motion vector corresponding to the current macro block when the SAD is minimal.
  • SAD, indicating the resemblance of two macro blocks, is the sum of the absolute values of the differences between respective pixels; m and n are the moving components of the matching block in the horizontal and vertical directions, respectively; Pc(i, j) and Rp(i, j) are the pixels of the current frame and the previous reference frame, respectively. The subscripts "c" and "p" indicate "current frame" and "previous frame", respectively.
  • The reference motion vector may then be used for re-estimating the motion in the base layer and the enhancement layer of the video stream, respectively, such that each layer needs only a motion estimation within a small range based on this reference motion vector, thereby reducing the computing complexity and increasing the compressed encoding efficiency of the encoding system.
  • Down-sampling the video stream (step S316) to reduce its resolution, for example to 720×480i.
  • the motion vector of the corresponding frame of image of the down-sampled video stream is acquired (step S322).
  • The corresponding frame of image mentioned herein is the same frame as the current frame for which the reference motion vector was acquired. Because the prediction is made on the same frame, the motion vector (Dx1, Dy1) can be obtained, based on the down-sampled reference motion vector (Bx', By'), by further searching for the macro block that best matches the current block within a smaller searching window of the reference frame. It has been proved by experiment that the searching window may be a new searching window of ±2 pixels. The searching process may be understood more clearly by referring to formulae (3) and (4).
  • The motion estimation searches on the basis of the reference motion vector (Bx', By'). Because most of the searching has already been finished when calculating the reference motion vector, only a very limited search is needed to find the optimum matching block in this step. The amount of searching in a searching window of ±2 pixels is obviously much less than that in a searching window of ±15 pixels.
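The refinement step can be sketched by restricting the candidate offsets to ±2 pixels around the reference motion vector; the names and the MAD criterion are illustrative:

```python
import numpy as np

def mad(block_a, block_b):
    # Mean absolute difference between two pixel blocks
    return np.mean(np.abs(block_a.astype(int) - block_b.astype(int)))

def refine_search(ref, cur, top, left, ref_mv, block=16, window=2):
    """Re-estimate only within +/-window pixels of ref_mv: 25 candidate
    positions instead of the 31*31 = 961 of a full +/-15 search."""
    target = cur[top:top + block, left:left + block]
    best_cost, best_mv = float("inf"), ref_mv
    for dy in range(ref_mv[1] - window, ref_mv[1] + window + 1):
        for dx in range(ref_mv[0] - window, ref_mv[0] + window + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + block <= ref.shape[0] and x + block <= ref.shape[1]:
                cost = mad(ref[y:y + block, x:x + block], target)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv
```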
  • the down-sampled video stream is processed by using the motion vector, to generate a base layer (step S326 ) .
  • The predicted frame of the current frame can be obtained by simply shifting the reference frame in accordance with the reference motion vector and motion vector described above; well-known processing is then enough to generate the base layer.
  • The corresponding frame of image herein is the same frame as the current frame for which the reference motion vector was acquired. Because the prediction is made on the same frame, the motion vector (Dx2, Dy2) can be obtained, based on the reference motion vector (Bx, By), by further searching for the macro block that best matches the current block within a relatively small searching window of the reference frame.
  • the method of obtaining the motion vector is similar to that of obtaining the motion vector by the base layer, so the detailed description is omitted.
  • Step S336: processing the video stream using the motion vector and the base layer, thereby generating an enhancement layer.
  • The reference motion vector can be used by the base layer and the enhancement layer at the same time to predict motion, thus reducing the searching complexity in both layers and increasing the efficiency of the compressed encoding.
  • The resolutions of the high definition (HD) frame and the standard definition (SD) frame are 1920×1088i and 720×480i, respectively, and the searching window is ±15 pixels.
  • The computing complexity of the error measure SAD between two macro blocks for the Y component is T_SAD.
  • The total numbers of macro blocks for an HD frame and an SD frame are 8160 and 1350, respectively. If motion estimation is performed for each macro block within a searching window of ±15 pixels, the largest amount of calculation for obtaining the preferred motion vector of one macro block is (961·T_SAD), since a ±15 window contains 31×31 = 961 candidate positions.
  • The amount of calculation for the reference motion vector is (7,841,760·T_SAD).
  • The total largest amount of calculation for the motion vectors of each frame is the sum of the amount of calculation for the reference motion vector, the searching amount for the SD frame within the relatively smaller searching window, and the searching amount for the HD frame within the relatively smaller searching window, i.e. (7,875,510·T_SAD).
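The arithmetic behind these figures can be checked directly. Note that, arithmetically, the reference-vector search plus the SD-frame refinement alone already reproduce the quoted per-frame total; a ±2 refinement over all HD macro blocks would add a further 8160 × 25 = 204,000 evaluations, which the stated figure does not appear to include:

```python
# Counts of SAD evaluations, each costing T_SAD
hd_mbs = (1920 // 16) * (1088 // 16)   # 8160 macro blocks per HD frame
sd_mbs = (720 // 16) * (480 // 16)     # 1350 macro blocks per SD frame

full_positions = (2 * 15 + 1) ** 2     # 961 candidates in a +/-15 window
small_positions = (2 * 2 + 1) ** 2     # 25 candidates in a +/-2 window

reference_cost = hd_mbs * full_positions   # 8160 * 961 = 7,841,760
sd_refine_cost = sd_mbs * small_positions  # 1350 * 25 = 33,750
total = reference_cost + sd_refine_cost    # 7,875,510
```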
  • Fig.4 is a schematic diagram of an encoding system using reference motion vector in accordance with another embodiment of the invention.
  • the encoding system
  • the acquiring means 410 comprises a down-sampler
  • The original video stream is first down-sampled by the down-sampler 120. The down-sampled video stream is then fed to the reference motion vector acquiring means 416, i.e. fed respectively to the motion estimation means 476 and the frame memory 282, thereby acquiring the reference motion vector of each frame of image of the video stream.
  • The reference motion vector is fed directly to the motion estimation means 422 of the base layer acquiring means 412; based on the reference motion vector, the means 422 re-estimates the motion within a relatively small search window to acquire the motion vector of the corresponding frame of image of the down-sampled video stream. Afterwards, the base layer generation means 413 processes the down-sampled video stream by using the motion vector, thereby generating the base layer.
  • The reference motion vector described above is first up-sampled by the up-sampler 486; then a motion vector acquiring means, i.e. the motion vector estimation means 454, re-estimates the motion based on the up-sampled reference motion vector to acquire the motion vector of the corresponding frame of image of the video stream. The video stream is then processed by the enhancement layer generation means 415 with the reference motion vector and the base layer, thereby generating an enhancement layer.
  • The motion estimations in the base layer and the enhancement layer are associated together, such that the repetitive searching they would otherwise perform when predicting the same frame of image can be completed in a single pass; the base layer and the enhancement layer then re-estimate within a relatively small search window based on the same reference motion vector. Because so much of the search processing is saved, the amount of calculation of the whole encoding system is reduced.
  • Fig.5 is a schematic diagram of an encoding system using a reference motion vector in accordance with a further embodiment of the invention.
  • The encoding system 500 of this embodiment is similar to that shown in Fig.2, and the description here will concentrate only on the differences between them and omit the like parts.
  • The difference is that the motion estimation means 522 of the base layer acquiring means 512 outputs the motion vector of each frame of image of the base layer; said motion vector is up-sampled by a reference motion vector acquiring means, i.e. the up-sampler 586, to be used as the reference motion vector of the corresponding frame of image, and this reference motion vector is fed to the motion estimation means 554 of the enhancement layer acquiring means 514.
  • a reference motion vector acquiring means i.e., up-sampler 586
  • Motion estimation is performed once more within a relatively small search window, thereby acquiring the motion vector of the corresponding frame of image of the video stream. Then, according to the reference motion vector, the motion vector, as well as the output of the base layer, the enhancement layer generation means 515 generates an enhancement layer in a way that is similar to that of the embodiment shown in
  • The enhancement layer performs its search once more within a relatively small range, such that it omits the part of the search that is identical to that of the base layer, therefore reducing the total amount of calculation in the encoding system.
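The complexity figures quoted in the points above (8160 and 1350 macro blocks; a ±15 pixel full search costing 7,841,760 × T_SAD) can be reproduced with a short sketch, in units of T_SAD. This is purely illustrative and not part of the patent text; the function names are ours.

```python
def macroblocks(width: int, height: int, mb: int = 16) -> int:
    """Number of 16x16 macro blocks needed to cover a frame."""
    return (width // mb) * ((height + mb - 1) // mb)

def search_positions(radius: int) -> int:
    """Candidate displacements in a square search window of +/-radius pixels."""
    return (2 * radius + 1) ** 2

hd_mbs = macroblocks(1920, 1088)            # 8160 macro blocks per HD frame
sd_mbs = macroblocks(720, 480)              # 1350 macro blocks per SD frame

# Full search of +/-15 pixels for every HD macro block, in units of T_SAD:
reference_cost = hd_mbs * search_positions(15)   # 8160 * 961 = 7,841,760

# Re-estimation within the relatively small +/-2 pixel window:
sd_refine_cost = sd_mbs * search_positions(2)    # 1350 * 25 = 33,750
hd_refine_cost = hd_mbs * search_positions(2)    # 8160 * 25 = 204,000

print(hd_mbs, sd_mbs, reference_cost, sd_refine_cost)
```

The dominant term is the single full search used for the reference motion vector; the per-layer refinements over ±2 pixels are comparatively negligible.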

Abstract

A method and apparatus for spatial layered compression of a video stream are disclosed. A reference motion vector is introduced into the compression scheme of the present invention; according to this reference motion vector, the base layer and the enhancement layer may each acquire the motion vector of the corresponding frame of image of the video stream, and thereby respectively generate a base layer and an enhancement layer. The introduced reference motion vector associates the motion estimation of the base layer with the motion estimation of the enhancement layer, thereby reducing the total amount of calculation of the motion estimation for the base layer and the enhancement layer. Further, because the reference frame for obtaining the reference motion vector can be obtained from the original video sequence and no additional lossy operation is performed on the original video sequence, the reference motion vector may better reflect the actual motion within the video sequence.

Description

A METHOD AND APPARATUS FOR MOTION ESTIMATION
BACKGROUND OF THE INVENTION
This invention relates to a method and apparatus for compressing a video stream, and particularly to a method and apparatus for compressing a video stream by using a spatial layered compression scheme.
Because digital video contains a large quantity of data, transmitting high resolution video signals is a formidable problem when making high definition television programs. Specifically, each frame of digital image is a still picture (also called an image) made up of a group of pixel points (also called pixels).
The quantity of the pixels depends upon the display definition of a particular system.
Thus, the quantity of original digital information of high resolution video is very large. Many video compression standards, such as MPEG-2, MPEG-4 and H.263, have been developed to reduce the quantity of data that must be transmitted.
All of the above described standards support layering techniques, including spatial layering, temporal layering, SNR layering, etc. In layered encoding, the bit stream is divided into two or more bit streams, or layers, for encoding. During decoding, the respective layers may be combined as desired to form a high resolution signal. For example, the base layer may provide a low resolution video stream, and the enhancement layer may provide additional information to enhance the base layer image.
Among current spatial layered compression schemes, in addition to the layered compression technique described above, motion prediction is used to obtain a predictive image in accordance with the correlation between former and latter frames. Before being compressed, the input video stream is processed to form I, P and B frames, arranged in a sequence according to the parameter setting. An I frame is encoded according to its own information only, a P frame is predictively encoded according to the nearest preceding I or P frame, and a B frame is predictively encoded according to itself or the frames before and after it.
Fig.1 is a block diagram of a video coder 100 supporting the spatial layered compression of MPEG-2 / MPEG-4. The video encoder 100 comprises a base encoder 112 and an enhancement encoder 114. The base encoder comprises a down-sampler 120, a motion estimation (ME) means 122, a motion compensator (MC) 124, an orthogonal transform (for example, discrete cosine transform (DCT)) circuit 130, a quantizer (Q) 132, a variable length encoder (VLC) 134, a bitrate control circuit 135, an inverse quantizer (IQ) 138, an inverse transform circuit (IDCT) 140, switches 128 and 144, as well as an up-sampler 150. The enhancement encoder 114 comprises a motion estimation means 154, a motion compensator 155, an orthogonal transform (for example, DCT) circuit 158, a quantizer 160, a variable length encoder 162, a bitrate control circuit 164, an inverse quantizer 166, an inverse transform circuit (IDCT) 168 and switches 170 and 172. All functions of the means mentioned above are well known in the art, so they will not be described in detail herein.
It is well known that motion estimation is one of the most time-consuming portions of a video compression system: the larger the amount of calculation of motion estimation, the lower the encoding efficiency of the video compression system. In the layered encoding compression scheme described above, when predicting the video images of the same frame, motion estimation is performed for the base layer and the enhancement layer separately, with no association between them. However, since the prediction is made for the same frame of image, a relatively large portion of the searching process is repeated, which results in a larger amount of calculation for motion estimation and lower encoding efficiency of the compression scheme. Therefore, there is a need for a spatial layered video compression scheme with better encoding efficiency.
OBJECT AND SUMMARY OF THE INVENTION
The present invention is directed to a much more efficient spatial layered compression method that overcomes the disadvantages of the spatial layered compression scheme described above. By introducing a reference motion vector, the present invention allows the motion estimation of the base layer to be associated with that of the enhancement layer, such that the originally repetitive searching processes can be completed in a single pass, followed by only a small amount of additional searching; on this basis, the computational complexity of the motion estimation is reduced and the efficiency of compressed encoding is improved. An embodiment in accordance with the invention discloses a method for spatial layered compression of a video stream and an apparatus thereof. Firstly, the original video stream is processed to obtain a reference motion vector for each frame of image of the video stream; then the reference motion vector is down-sampled and the video stream is down-sampled; secondly, according to the down-sampled reference motion vector, a motion vector of the corresponding frame of image of the down-sampled video stream is acquired; next, the corresponding frame of image of the down-sampled video stream is processed by using the motion vector, whereby a base layer is generated; finally, according to the reference motion vector, a motion vector of the corresponding frame of image of the video stream is acquired during generation of the enhancement layer, and the video stream is processed by using the motion vector and the base layer, whereby an enhancement layer is generated.
An alternative embodiment in accordance with the invention discloses another method for spatial layered compression of a video stream and an apparatus thereof. Firstly, the video stream is down-sampled and a reference motion vector is acquired for each frame of image of the down-sampled video stream; secondly, according to said reference motion vector, a motion vector of the corresponding frame of image of the down-sampled video stream is acquired; then, the down-sampled video stream is processed by using the motion vector, whereby a base layer is generated; finally, said reference motion vector is up-sampled during generation of the enhancement layer, and according to the up-sampled reference motion vector, a motion vector of the corresponding frame of image of the video stream is acquired, and the video stream is processed by using the motion vector and the base layer, thereby generating an enhancement layer.
Another embodiment in accordance with the invention discloses a further method for spatial layered compression of a video stream and an apparatus thereof.
Firstly, the video stream is processed, thereby generating a base layer; then, the motion vector of each frame of image of the base layer is up-sampled, thereby acquiring a reference motion vector of the corresponding frame of image; finally, according to the reference motion vector, a motion vector of the corresponding frame of image of the video stream is acquired, and the video stream is processed by using the motion vector and the base layer to generate an enhancement layer.
Other objects and attainments, together with a fuller understanding of the invention, will become apparent and appreciated by referring to the following description in conjunction with the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is explained in detail by way of embodiments and with reference to the accompanying drawings, in which:
Fig.1 is a block diagram of a spatial layered compression video encoder in accordance with the prior art;
Fig.2 is a schematic diagram of an encoding system using a reference motion vector in accordance with an embodiment of the invention;
Fig.3 is a flowchart of encoding by using the reference motion vector in accordance with one embodiment of the invention;
Fig.4 is a schematic diagram of an encoding system using a reference motion vector in accordance with another embodiment of the invention; and
Fig.5 is a schematic diagram of an encoding system using reference motion vector in accordance with a further embodiment of the invention.
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.
DETAILED DESCRIPTION OF THE INVENTION
Fig.2 is a schematic diagram of an encoding system using a reference motion vector in accordance with one embodiment of the invention. The encoding system 200 is used for layered compression, wherein the base layer portion provides low resolution base information of the video stream and the enhancement layer transfers edge enhancement information; both kinds of information may be recombined at the receiving terminal to form the high resolution picture information.
The encoding system 200 comprises an acquiring means 216, a base layer acquiring means 212 and an enhancement layer acquiring means 214. The acquiring means 216 is used for processing the original video stream, thereby obtaining the reference motion vector for each frame of image of the video stream. The acquiring means 216 comprises a motion estimation means 276 and a frame memory 282. The frame memory 282 is used to store the original video sequence. The motion estimation means 276 is used to acquire the reference frames (for example, I or P frames) from the frame memory 282, and to perform motion estimation on the current frame (for example, a P frame) according to the reference frames, thereby deriving the reference motion vector of the current frame by computation.
The base layer acquiring means 212 processes the video stream using the reference motion vector, thereby generating a base layer. The means 212 comprises down-samplers 120 and 286. The down-sampler 120 is used to down-sample the original video stream; the down-sampler 286 is used to down-sample the reference motion vector. Of course, those skilled in the art will appreciate that the down-sampling of the original video stream and of the reference motion vector may also be performed with a single down-sampler.
The base layer acquiring means 212 further comprises a motion vector acquiring means 222, which is used to acquire the motion vector of the corresponding frame of image of the down-sampled video stream based on the down-sampled reference motion vector. The process by which the motion vector acquiring means 222 acquires the motion vector is described below.
The base layer acquiring means 212 further comprises a base layer generation means 213, which uses the motion vector to process the down-sampled video stream, thereby generating the base layer. Except for the down-samplers 120, 286 and the motion vector acquiring means 222, all the other means within the base layer acquiring means 212 are basically the same as those of the base layer encoder of Fig.1 and belong to the base layer generation means 213, including the motion compensator 124, DCT transform circuit 130, quantizer 132, variable length encoder 134, bitrate control circuit 135, inverse quantizer 138, inverse transform circuit 140, arithmetic units 125, 148, switches 128, 144 and up-sampler 150. The process by which the base layer generation means 213 generates the base layer based on the motion vector output from the motion vector acquiring means 222 is substantially the same as that of the prior art and will be discussed in detail below.
In comparison with Fig.1, within the above described base layer acquiring means 212, the same reference numbers designate components having identical or similar features and functions. The only difference between the motion estimation means 122 and the motion vector acquiring means 222 is the way in which they acquire the motion vectors. The motion estimation means 122 of Fig.1 directly uses the reference frames of the frame memory (not shown) to search within a larger search window for acquiring the motion vector of the corresponding frame of image of the video stream, while the motion vector acquiring means 222 of Fig.2 searches further within a smaller search window based on said reference motion vector.
The enhancement layer acquiring means 214 processes the video stream by using the reference motion vector and the base layer, thereby to generate an enhancement layer. The means 214 comprises a motion vector acquiring means 254 and an enhancement layer generation means 215.
The motion vector acquiring means 254 is used to acquire the motion vector of the corresponding frame of image of the video stream based on the reference motion vector.
The enhancement layer generation means 215 processes the video stream by using the motion vector and the base layer, thereby generating the enhancement layer. In the enhancement layer acquiring means 214, the components are substantially the same as those in the enhancement layer encoder 114 of Fig.1, except for the motion vector acquiring means 254, and all of them belong to the enhancement layer generation means 215, which includes the motion compensator 155, DCT circuit 158, quantizer 160, variable length encoder 162, bitrate control circuit 164, inverse quantizer 166, inverse DCT circuit 168, and switches 170, 172. These components are similar in function to the corresponding components of the base layer acquiring means 212. The process by which the enhancement layer generation means 215 generates the enhancement layer by using the motion vector output from the motion vector acquiring means 254 is essentially the same as that of the prior art, and a detailed description is given below.
In comparison with Fig.1, within the above described enhancement layer acquiring means 214, the same reference numbers designate components having identical or similar features and functions. The only difference between the motion estimation means 154 and the motion vector acquiring means 254 is the way in which they acquire the motion vector. The motion estimation means 154 of Fig.1 directly uses the reference frames of the frame memory (not shown) to search within a larger search window for acquiring the motion vector of the corresponding frame of image of the video stream, while the motion vector acquiring means 254 of Fig.2 searches further within a smaller search window based on said reference motion vector.
In conjunction with Fig.2, the process by which the base layer acquiring means 212 and the enhancement layer acquiring means 214 acquire their respective motion vectors by using the reference motion vector output by the acquiring means 216, and thereby generate the base layer and the enhancement layer, will be described in detail in the following.
An original video stream is input to the acquiring means 216 and then fed to the motion estimation means 276 and the frame memory 282, respectively. It shall be noted that, before being supplied to the acquiring means 216, the video stream has been processed to form I, P and B frames, arranged in a sequence such as I, B, P, B, P, ..., B, P, in accordance with the parameter setting. The input video sequence is stored in the frame memory 282. The motion estimation means 276 is used to acquire the reference frames (for example, I frames) from the frame memory 282, and to perform motion estimation on the current frame (for example, a P frame) according to the reference frames, thereby computing the reference motion vector of each macro block of the current frame. A macro block is a 16×16 pixel sub-block of the currently encoded frame; block matching between the current macro block and the reference frame is used to calculate the reference motion vector of the current macro block, and thereby obtain the reference motion vector of the current frame.
There are four ways of image prediction in MPEG: intra-frame encoding, forward predictive encoding, backward predictive encoding and bi-directional predictive encoding. An I frame is an intra-frame encoded image, a P frame is an intra-frame encoded or forward predictively encoded image, and a B frame is an intra-frame encoded, forward, backward, or bi-directionally predictively encoded image.
The motion estimation means 276 makes a forward prediction for the P frame and calculates its reference motion vector. In addition, the motion estimation means also makes forward or bi-directional predictions for the B frame and calculates its reference motion vector. No motion prediction is needed for intra-frame encoding.
Taking the forward prediction of the P frame as an example, the process of calculating the reference motion vector is described as follows. The motion estimation means 276 reads out the previous reference frame from the frame memory 282, and searches in the search window of the previous reference frame for a macro block that best matches the pixel block of the current frame. There are several algorithms for match searching in the prior art; generally, the degree of matching is judged by the mean absolute difference (MAD) or the mean squared error (MSE) between the pixels of the currently input block and the pixels of the corresponding block of the reference frame. The corresponding block of the reference frame having the minimum MAD or MSE is the optimum matching block, and the position of said optimum matching block relative to the position of the current block is the reference motion vector.
By the processing described above, the motion estimation means 276 in the acquiring means 216 may acquire the reference motion vector of a frame of image of the video stream. After being down-sampled by the down-sampler 286, the reference motion vector is fed to the motion vector acquiring means 222 of the base layer acquiring means 212, so that the means 222 can make further motion estimation on the same frame of image at the base layer. Besides, the reference motion vector may also be fed to the motion vector acquiring means 254 of the enhancement layer acquiring means 214, so that the means 254 can make further motion estimation on the same frame of image at the enhancement layer.
While the acquiring means 216 is motion-estimating the input video stream, the base layer acquiring means 212 and the enhancement layer acquiring means 214 are also predictively encoding the input video stream; however, said predictive encoding is slightly delayed in time, because the base layer and the enhancement layer must make further motion estimation based on the reference motion vector.
The process in which the base layer makes further motion estimation based on the above reference motion vector is discussed below.
The original input video stream is divided by the separator and supplied to the base layer acquiring means 212 and the enhancement layer acquiring means 214, respectively. In the base layer acquiring means, the input video stream is fed into the down-sampler 120. The down-sampler may be a low-pass filter used to reduce the resolution of the input video stream. The down-sampled video stream is then fed into the motion vector acquiring means 222. The motion vector acquiring means 222 acquires the image of the previous reference frame of the video sequence stored in the frame memory, and searches for a macro block that best matches the current frame within a smaller search window of the previous reference frame, based on the down-sampled reference motion vector of the current frame output from the down-sampler 286, thereby acquiring the motion vector of the corresponding frame of image of the down-sampled video stream. After receiving the prediction mode, the reference motion vector and the motion vector from the motion vector acquiring means 222, the motion compensator 124 may read out the image data of the previous reference frame stored in the frame memory (not shown), which had been encoded and partly decoded, on the basis of the prediction mode, reference motion vector and motion vector; it shifts the previous frame of image in accordance with the reference motion vector, and then shifts it once more in accordance with the motion vector, thereby predicting the current frame of image. Of course, the previous frame of image can be shifted only once, by the amount that is the sum of the reference motion vector and the motion vector; in this case, the sum of the reference motion vector and the motion vector can be used as the motion vector of said frame of image. Then, the motion compensator 124 provides the predicted image to the arithmetic unit 125 and the switch 144. The arithmetic unit 125 also receives the input video stream, and calculates the difference between the image of the input video stream and the predicted image coming from the motion compensator 124. The difference is supplied to the DCT circuit 130.
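As a hedged illustration of the compensation step just described (shifting the reference frame by the reference motion vector and then by the motion vector, versus a single shift by their sum), the following sketch extracts a predicted block both ways and confirms they agree. The names and frame data are invented for illustration, not taken from the patent.

```python
import numpy as np

def predict_block(ref_frame, x, y, mv, size=16):
    """Copy the block anchored at (x, y), displaced by motion vector mv."""
    dx, dy = mv
    return ref_frame[y + dy : y + dy + size, x + dx : x + dx + size]

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.int16)

bx, by = 3, -2          # reference motion vector
dx, dy = 1, 1           # motion vector from the small-window re-estimation

# Shift by the reference motion vector first, then by the motion vector...
two_step = predict_block(ref, 20 + bx, 20 + by, (dx, dy))
# ...or shift once by their sum, used as the motion vector of the frame:
one_step = predict_block(ref, 20, 20, (bx + dx, by + dy))

assert np.array_equal(two_step, one_step)
```

Because block displacement is additive, either order of shifting yields the same predicted block, which is why the sum can stand in as the frame's motion vector.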
If the prediction mode received from the motion vector acquiring means 222 is intra-frame prediction, the motion compensator 124 does not output any predicted image. In such a case, the arithmetic unit 125 does not perform the above described processing, but directly inputs the video stream to the DCT circuit 130.
The DCT circuit 130 performs DCT processing on the signal output from the arithmetic unit to acquire DCT coefficients, which are supplied to the quantizer 132. The quantizer 132 sets the quantizing magnitude (quantizing level) based on the amount of data stored in the buffer, and quantizes the DCT coefficients supplied from the DCT circuit 130 using that quantizing level. The quantized DCT coefficients and the set quantizing magnitude are supplied together to the VLC unit 134. According to the quantizing magnitude supplied from the quantizer 132, the VLC unit 134 converts the quantized coefficients supplied from the quantizer into a variable length code, e.g., a Huffman code, thereby generating the base layer.
Further, the converted quantized coefficients are output to a buffer (not shown). The quantized coefficients and the quantizing magnitude are also supplied to the inverse quantizer 138, which inversely quantizes the coefficients according to the quantizing magnitude so as to convert them back into DCT coefficients. The DCT coefficients are supplied to the inverse DCT unit 140, which performs the inverse DCT conversion. The acquired inverse DCT coefficients are supplied to the arithmetic unit 148.
The arithmetic unit 148 receives the inverse DCT coefficients from the inverse DCT unit 140, and receives data from the motion compensator 124 according to the position of the switch 144. The arithmetic unit 148 calculates the sum of the signal supplied by the inverse DCT unit 140 and the predictive image supplied by the motion compensator 124 to partly decode the original image. However, if the prediction mode is intra-frame encoding, the output of the inverse DCT unit 140 may be sent directly to the frame memory. The decoded image acquired by the arithmetic unit 148 is fed to and stored in the frame memory to be used as a reference frame for intra-frame encoding, forward encoding, backward encoding, or bi-directional encoding in the future. The output of the arithmetic unit 148 is also supplied to the up-sampler 150 to generate a reconstructed stream whose resolution is substantially the same as that of the high resolution input video stream. However, due to the filtering and the losses introduced by compression and decompression, the reconstructed stream contains errors to some degree. The difference is determined by subtracting the reconstructed high resolution video stream from the original unchanged high resolution video stream, and is input to the enhancement layer to be encoded. Therefore, the enhancement layer encodes and compresses the frames carrying said difference information.
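The residual path just described — partly decode the base layer, up-sample it, and subtract it from the original to obtain the enhancement-layer input — can be sketched as follows. The down-sampler, codec and up-sampler here are crude stand-ins (decimation, coarse rounding, pixel repetition) for the actual DCT/quantizer chain of Fig.1, so every name and choice below is an assumption for illustration only.

```python
import numpy as np

def downsample(frame):
    """Stand-in for down-sampler 120: simple 2x decimation."""
    return frame[::2, ::2]

def upsample(frame):
    """Stand-in for up-sampler 150: nearest-neighbour 2x expansion."""
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

def lossy_codec(frame, step=8):
    """Stand-in for the DCT/quantize/inverse chain: coarse rounding."""
    return (frame // step) * step

original = (np.arange(64 * 64, dtype=np.int32).reshape(64, 64)) % 256

# Base layer: encode and partly decode the down-sampled stream.
base_reconstruction = lossy_codec(downsample(original))

# Enhancement-layer input: original minus the up-sampled reconstruction.
residual = original - upsample(base_reconstruction)
print(residual.shape)        # same resolution as the input stream
```

The residual carries exactly the detail the base layer could not represent, which is why encoding it in the enhancement layer restores the high resolution picture at the decoder.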
The process of predictive encoding for the enhancement layer is very similar to that for the base layer. After the acquiring means 216 obtains the reference motion vector, the reference motion vector is fed to the motion vector acquiring means 254 of the enhancement layer acquiring means 214. In this way, the means 254 makes further motion estimation on the same frame of image at the enhancement layer based on the reference motion vector, thereby acquiring the motion vector of the corresponding frame of image of the video stream. Then, according to the prediction mode, the reference motion vector and said motion vector, the motion compensator 155 shifts the reference frames correspondingly, thereby predicting the current frame. Because this process of motion prediction is similar to that for the base layer, it will not be discussed in detail herein.
Fig.3 is a flowchart of encoding by using the reference motion vector in accordance with one embodiment of the invention. This flow is the operational flow of the encoding system 200.
Firstly, a specific high resolution video stream is received (step S305), e.g., a video stream having a resolution of 1920×1080i.
Next, the reference motion vector for each frame of image of the video stream is acquired (step S310). Supposing the current frame is a P frame, the macro block that best matches the current frame is searched for within the search window of the reference frame I; for example, the search is conducted in a search window having a size of ±15 pixels, a value commonly recommended for motion estimation. After the optimum matching block is found, the shift between the current block and the matching block is the reference motion vector. Because this reference motion vector is acquired by prediction from the reference frame within the original video stream, which contains no errors, it better reflects the actual video motion.
The acquisition of the reference motion vector is expressed by the following formulae, in which (Bx, By) is the reference motion vector:

(Bx, By) = arg min_{(m,n)∈S} SAD(m, n)    (1)

In formula (1), arg min denotes taking, as the motion vector of the current macro block, the displacement (m, n) in the search window S for which the SAD is minimal.

SAD(m, n) = Σ_i Σ_j | P_c(i, j) − R_p(i + m, j + n) |    (2)
In formula (2), SAD, indicating the resemblance of two macro blocks, is the sum of the absolute differences between corresponding pixels; m and n are the displacements of the matching block in the horizontal and vertical directions, respectively; P_c(i, j) and R_p(i, j) are the pixels of the current frame and the previous reference frame, respectively; the subscripts "c" and "p" indicate "current frame" and "previous frame", respectively. The reference motion vector may be used for re-estimating the motion in both the base layer and the enhancement layer of the video stream, such that the base layer and the enhancement layer need only estimate motion within a small range based on this reference motion vector, thereby reducing the computational complexity and increasing the compressed encoding efficiency of the encoding system.
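Formulae (1) and (2) amount to a brute-force block-matching search. The sketch below is illustrative only (the function names, frame sizes and test data are ours, not the patent's); it returns the displacement (Bx, By) in the window S that minimizes the SAD:

```python
import numpy as np

def sad(cur_block, ref_frame, x, y, m, n):
    """Formula (2): sum of absolute differences for displacement (m, n)."""
    size = cur_block.shape[0]
    ref_block = ref_frame[y + n : y + n + size, x + m : x + m + size]
    return int(np.abs(cur_block.astype(np.int32)
                      - ref_block.astype(np.int32)).sum())

def full_search(cur_frame, ref_frame, x, y, size=16, radius=15):
    """Formula (1): arg min of the SAD over the +/-radius search window S."""
    cur_block = cur_frame[y : y + size, x : x + size]
    best = None
    for n in range(-radius, radius + 1):
        for m in range(-radius, radius + 1):
            # Skip displacements whose block falls outside the reference frame.
            if not (0 <= x + m <= ref_frame.shape[1] - size
                    and 0 <= y + n <= ref_frame.shape[0] - size):
                continue
            cost = sad(cur_block, ref_frame, x, y, m, n)
            if best is None or cost < best[0]:
                best = (cost, m, n)
    return best[1], best[2]          # the reference motion vector (Bx, By)

# The current frame is the reference frame shifted by (3, 2), so the search
# should recover that displacement for a block well inside the frame.
rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
cur = ref[2:50, 3:51]
print(full_search(cur, ref, 20, 20))   # (3, 2)
```

With random data the true displacement gives a SAD of exactly zero, so the exhaustive search recovers it; the cost of visiting all (2·15+1)² = 961 candidates per macro block is what the reference-vector scheme amortizes across both layers.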
Next, the reference motion vector (Bx, By) is down-sampled to obtain (Bx', By') (step S312).
The video stream is down-sampled (step S316) to reduce its resolution, for example, to 720×480i.
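A hedged sketch of step S312: the text does not spell out how (Bx, By) is scaled, so the sketch below simply scales each component by the ratio of the SD to the HD frame dimensions and rounds to the nearest integer. Both the function name and the rounding policy are our assumptions.

```python
def downsample_mv(bx, by, src=(1920, 1088), dst=(720, 480)):
    """Scale an HD-resolution motion vector to SD resolution (assumed policy)."""
    sx, sy = dst[0] / src[0], dst[1] / src[1]
    return round(bx * sx), round(by * sy)

print(downsample_mv(8, -8))    # (3, -4): 8 * 720/1920 = 3.0, -8 * 480/1088 ≈ -3.53
```

Scaling keeps the vector pointing at the same physical location in the lower-resolution frame, which is what allows the base layer's ±2 pixel refinement to start from the right place.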
According to the down-sampled reference motion vector ( Bx', By' ) , the motion vector of the corresponding frame of image of the down-sampled video stream is acquired ( step S322 ) . It should be noted that the corresponding frame of image mentioned herein is the same frame as the current frame from which the reference motion vector was acquired. Because the prediction is made on the same frame, the motion vector ( Dx1, Dy1 ) can be obtained, based on the reference motion vector ( Bx', By' ) , by further searching for the macro block that optimally matches the current block within a smaller searching window of the reference frame. It has been proved by experiment that a new searching window of ±2 pixels is sufficient. By referring to formulae ( 3 ) and ( 4 ) , the searching process may be more clearly understood.
( Dx1, Dy1 ) = arg min_{( m, n ) ∈ S_R} SAD_R( m, n )    ( 3 )
SAD_R( m, n ) = Σ_i Σ_j | Pc( i, j ) − Rp( i + Bx' + m, j + By' + n ) |    ( 4 )
It is shown by formula ( 4 ) that the motion estimation searches on the basis of the reference motion vector ( Bx', By' ) . Because most of the searching has already been finished when calculating the reference motion vector, only a very limited search is needed to find the optimum matching block in this step. The amount of searching in a searching window of ±2 pixels is obviously much smaller than that in a searching window of ±15 pixels.
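The refinement of formulas ( 3 ) and ( 4 ) can be sketched in the same illustrative style (again assuming 16×16 macro blocks and NumPy luma arrays; names are hypothetical):

```python
import numpy as np

def refine_search(cur, ref, bx, by, ref_mv, bs=16, w=2):
    """Per formulas (3)-(4): search only the (2*w+1)**2 = 25 candidates
    around the reference motion vector (Bx', By')."""
    rx, ry = ref_mv
    cur_blk = cur[by:by + bs, bx:bx + bs].astype(np.int64)
    best_mv, best_sad = None, None
    for n in range(-w, w + 1):          # offset n around By'
        for m in range(-w, w + 1):      # offset m around Bx'
            x, y = bx + rx + m, by + ry + n
            if x < 0 or y < 0 or x + bs > ref.shape[1] or y + bs > ref.shape[0]:
                continue  # candidate block falls outside the reference frame
            sad = int(np.abs(cur_blk - ref[y:y + bs, x:x + bs].astype(np.int64)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (rx + m, ry + n)
    return best_mv, best_sad
```

With w=2 only 25 SADs are evaluated per macro block instead of 961, which is the source of the savings quantified below.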
The down-sampled video stream is processed by using the motion vector, to generate a base layer ( step S326 ) . The predictive frame of the current frame can be obtained by merely shifting the reference frame in accordance with the above-described reference motion vector and motion vector; well-known processing then suffices to generate the base layer.
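The "shifting" described above is ordinary motion compensation. A minimal illustrative sketch (same NumPy-array assumptions as above; the helper name is hypothetical):

```python
import numpy as np

def predict_block(ref, bx, by, mv, bs=16):
    # Motion compensation: the predicted block for the current frame is the
    # reference-frame block displaced by the motion vector (m, n).
    m, n = mv
    return ref[by + n:by + n + bs, bx + m:bx + m + bs]
```

The residual between this prediction and the actual current block is what the subsequent transform/quantization stages of the base-layer encoder compress.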
The motion vector of the corresponding frame of image of the video stream is acquired according to the reference motion vector ( Bx, By ) ( step S332 ) . It should be noted that the corresponding frame of image herein is the same frame as the current frame from which the reference motion vector was acquired. Because the prediction is made on the same frame, the motion vector ( Dx2, Dy2 ) can be obtained, based on the reference motion vector ( Bx, By ) , by further searching for the macro block that optimally matches the current block within a relatively small searching window of the reference frame. The method of obtaining this motion vector is similar to that used for the base layer, so the detailed description is omitted.
Then, the video stream is processed by using the motion vector and the base layer, thereby to generate an enhancement layer ( step S336 ) .
Therefore, in the present embodiment the reference motion vector can be used by the base layer and the enhancement layer at the same time to predict motion, thus reducing the computational complexity of searching in both layers and increasing the efficiency of the compressed encoding.
The computing complexities of the compression scheme of the present invention and of the prior-art scheme of Fig. 1 will be analyzed and compared below.
It is supposed that the resolutions of the high definition ( HD ) frame and the standard definition ( SD ) frame are 1920×1088i and 720×480i, respectively, and that the searching window is ±15 pixels. The computing complexity of the error measure SAD between two macro blocks for the Y component is TSAD.
The total numbers of macro blocks for an HD frame and an SD frame ( only the Y component considered ) are 8160 and 1350, respectively. If the motion estimation for each macro block is performed within a searching window of ±15 pixels, the largest amount of calculation for obtaining the preferred motion vector of a macro block is ( 31*31*TSAD = 961*TSAD ). The amount of calculation for an HD frame is ( 8160*961*TSAD = 7,841,760*TSAD ); the amount of calculation for an SD frame ( base layer ) is ( 1350*961*TSAD = 1,297,350*TSAD ). For the encoding system shown in Fig. 1, the total largest amount of calculation for the motion vectors of each frame is the sum of the amount of calculation for the HD frame and that for the SD frame, i.e. ( 9,139,110*TSAD ).
For the encoding system shown in Fig. 2, the amount of calculation for the reference motion vector is ( 7,841,760*TSAD ). When the motion estimation for each macro block is performed within the relatively smaller searching window ( ±2 pixels ), the largest amount of calculation for getting a preferred motion vector is ( 5*5*TSAD = 25*TSAD ). The amount of calculation for an SD frame ( base layer ) is ( 1350*25*TSAD = 33,750*TSAD ); the amount of calculation for an HD frame ( enhancement layer ) is ( 8160*25*TSAD = 204,000*TSAD ).
For the encoding system shown in Fig. 2, the total largest amount of calculation for the motion vectors of each frame is the sum of the amount of calculation for the reference motion vector, the searching amount for the SD frame within the relatively smaller searching window, and the searching amount for the HD frame within the relatively smaller searching window, i.e. ( 8,079,510*TSAD ).
In comparison with the encoding system shown in Fig. 1, the encoding system shown in Fig. 2 reduces the amount of calculation by the percentage:
R = | 8,079,510 − 9,139,110 | / 9,139,110 ≈ 12%
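The operation counts above can be checked with a few lines of arithmetic (illustrative only; all totals are in units of TSAD, and the Fig. 2 total below includes the enhancement-layer refinement term):

```python
# Operation counts from the text, in units of T_SAD (one macro-block SAD).
candidates_full = 31 * 31            # +/-15-pixel window -> 961 candidates
candidates_small = 5 * 5             # +/-2-pixel window  -> 25 candidates
hd_blocks, sd_blocks = 8160, 1350    # macro blocks per HD / SD frame (Y only)

fig1_total = (hd_blocks + sd_blocks) * candidates_full      # prior art, Fig. 1
fig2_total = (hd_blocks * candidates_full                   # reference motion vector
              + sd_blocks * candidates_small                # base-layer refinement
              + hd_blocks * candidates_small)               # enhancement-layer refinement
reduction = (fig1_total - fig2_total) / fig1_total

print(fig1_total, fig2_total, round(reduction * 100))  # 9139110 8079510 12
```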
Fig. 4 is a schematic diagram of an encoding system using a reference motion vector in accordance with another embodiment of the invention. The encoding system 400 of this embodiment is similar to that shown in Fig. 2; the description here will concentrate only on the differences between them and omit the like parts. The difference is that the acquiring means 410 comprises a down-sampler 120 and a reference motion vector acquiring means 416. The original video stream is first down-sampled by the down-sampler 120. The down-sampled video stream is then fed to the reference motion vector acquiring means 416, i.e. respectively fed to the motion estimation means 476 and the frame memory 282, thereby acquiring the reference motion vector of each frame of image of the video stream. The reference motion vector is then fed directly to the motion estimation means 422 of the base layer acquiring means 412; based on the reference motion vector, the means 422 re-estimates the motion within a relatively small searching window to acquire the motion vector of the corresponding frame of image of the down-sampled video stream. Afterwards, the base layer generation means 413 processes the down-sampled video stream by using the motion vector, thereby to generate the base layer.
Further, within the enhancement layer acquiring means 414, the reference motion vector described above is first up-sampled by the up-sampler 486; then a motion vector acquiring means, i.e. the motion estimation means 454, re-estimates the motion based on the up-sampled reference motion vector to acquire the motion vector of the corresponding frame of image of the video stream. The video stream is then processed by the enhancement layer generation means 415 with the motion vector and the base layer, thereby to generate an enhancement layer.
As can be seen from the description above, the motion estimations in the base layer and the enhancement layer are associated together, such that the repetitive searching that each of them would otherwise perform when predicting the same frame of image is finished in one pass; the base layer and the enhancement layer then re-estimate within a relatively small searching window based on the same reference motion vector. Because a great deal of searching is thereby saved, the amount of calculation of the whole encoding system is reduced.
Fig. 5 is a schematic diagram of an encoding system using a reference motion vector in accordance with a further embodiment of the invention. The encoding system 500 of this embodiment is similar to that shown in Fig. 2; the description here will concentrate only on the differences between them and omit the like parts. The difference is that the motion estimation means 522 of the base layer acquiring means 512 outputs the motion vector of each frame of image of the base layer, and said motion vector is up-sampled by a reference motion vector acquiring means, i.e. the up-sampler 586, to be used as the reference motion vector of the corresponding frame of image. The reference motion vector is fed to the motion estimation means 554 of the enhancement layer acquiring means 514. Based on the reference motion vector, the motion estimation is performed once more within a relatively small searching window, thereby to acquire the motion vector of the corresponding frame of image of the video stream. Then, according to the reference motion vector, the motion vector, as well as the output of the base layer, the enhancement layer generation means 515 generates an enhancement layer in a way similar to that of the embodiment shown in Fig. 2. It can be seen from the above that within this embodiment, based on the motion vector acquired at the base layer, the enhancement layer performs its searching once more within a relatively small range, such that the enhancement layer omits the part of the searching that is identical with that of the base layer, therefore reducing the total amount of calculation in the encoding system.
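In this embodiment the reference vector travels in the opposite direction: base-layer vectors are scaled up to the enhancement-layer resolution before seeding the refinement search. A minimal sketch under an assumed scaling rule (the document does not fix the exact rule; the function name is hypothetical):

```python
def upsample_mv(dx, dy, src=(720, 480), dst=(1920, 1088)):
    # Assumed rule: scale each base-layer motion-vector component up by the
    # resolution ratio so it can seed the enhancement-layer refinement search.
    sx, sy = dst[0] / src[0], dst[1] / src[1]
    return round(dx * sx), round(dy * sy)
```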
While the invention has been described in conjunction with specific embodiments, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art in light of the foregoing description.
Accordingly, it is intended to embrace all such alternatives, modifications and variations as fall within the spirit and scope of the appended claims.

Claims

CLAIMS:
1. A spatial layered compression method for a video stream, comprising the steps of: a. processing said video stream, thereby to obtain the reference motion vector of each frame of image of said video stream; b. processing said video stream by using the reference motion vector, thereby to generate a base layer; c. processing said video stream by using the reference motion vector and the base layer, thereby to generate an enhancement layer.

2. A method according to claim 1, wherein the step a comprises: down-sampling said video stream; acquiring said reference motion vector for each frame of image of said down-sampled video stream.
3. A method according to claim 2, wherein the step b comprises: according to said reference motion vector, acquiring the motion vector of the corresponding frame of image of the down-sampled video stream; processing said down-sampled video stream by using the motion vector, thereby to generate said base layer.
4. A method according to claim 2, wherein the step c comprises: up-sampling said reference motion vector; according to said up-sampled reference motion vector, acquiring the motion vector of the corresponding frame of image of the video stream; processing the video stream by using the motion vector and said base layer, thereby to generate an enhancement layer.

5. A method according to claim 1, wherein the step b comprises: down-sampling said reference motion vector; down-sampling said video stream; according to said down-sampled reference motion vector, acquiring the motion vector of the corresponding frame of image of said down-sampled video stream; processing said down-sampled video stream by using the motion vector, thereby to generate said base layer.

6. A method according to claim 5, wherein the step c comprises: according to said reference motion vector, acquiring the motion vector of the corresponding frame of image of the video stream; processing the video stream by using the motion vector and said base layer, thereby to generate said enhancement layer.
7. A spatial layered compression method for a video stream, comprising the steps of: a. processing said video stream, thereby to generate a base layer; b. up-sampling the motion vector of each frame of image of said base layer, thereby acquiring the reference motion vector of the corresponding frame of image; and c. processing the video stream by using the reference motion vector and the base layer, thereby to generate an enhancement layer.
8. A method according to claim 7, wherein the step c comprises: according to said reference motion vector, acquiring said motion vector of the corresponding frame of image of said video stream; processing the video stream by using the motion vector and said base layer, thereby to generate said enhancement layer.
9. A spatial layered compression apparatus for a video stream, comprising: an acquiring means for processing said video stream, thereby to obtain the reference motion vector of each frame of image of said video stream; a base layer acquiring means for processing said video stream by using the reference motion vector, thereby to generate a base layer; and an enhancement layer acquiring means for processing the video stream by using the reference motion vector and the base layer, thereby to generate an enhancement layer.
10. An apparatus according to claim 9, wherein said acquiring means comprises: a down-sampler for down-sampling said video stream; and a reference motion vector acquiring means for acquiring the reference motion vector of each frame of image of the down-sampled video stream.

11. An apparatus according to claim 10, wherein said base layer acquiring means comprises: a motion vector acquiring means for acquiring the motion vector of the corresponding frame of image of the down-sampled video stream based on said reference motion vector; and a base layer generation means for processing said down-sampled video stream by using the motion vector, thereby to generate said base layer.
12. An apparatus according to claim 10, wherein said enhancement layer acquiring means comprises: an up-sampler for up-sampling said reference motion vector; a motion vector acquiring means for acquiring, according to said up-sampled reference motion vector, the motion vector of the corresponding frame of image of the video stream; and an enhancement layer generation means for processing said video stream by using the motion vector and the base layer, thereby to generate said enhancement layer.

13. An apparatus according to claim 9, wherein said base layer acquiring means comprises: a down-sampler for down-sampling said reference motion vector and said video stream; a motion vector acquiring means for acquiring the motion vector of the corresponding frame of image of the down-sampled video stream based on the down-sampled reference motion vector; and a base layer generation means for processing said down-sampled video stream by using the motion vector, thereby to generate said base layer.
14. An apparatus according to claim 13, wherein said enhancement layer acquiring means comprises: a motion vector acquiring means for acquiring, according to said reference motion vector, the motion vector of the corresponding frame of image of the video stream; and an enhancement layer generation means for processing said video stream by using said motion vector and said base layer, thereby to generate said enhancement layer.

15. A spatial layered compression apparatus for a video stream, comprising: a base layer acquiring means for processing the video stream, thereby to generate a base layer; a reference motion vector acquiring means for up-sampling the motion vector of each frame of image of the base layer, thereby to acquire the reference motion vector of the corresponding frame of image; and an enhancement layer acquiring means for processing the video stream by using the reference motion vector and the base layer, thereby to generate an enhancement layer.

16. An apparatus according to claim 15, wherein said enhancement layer acquiring means comprises: a motion vector acquiring means for acquiring, according to said reference motion vector, the motion vector of the corresponding frame of image of said video stream; and an enhancement layer generation means for processing said video stream by using said motion vector and said base layer, thereby to generate said enhancement layer.
PCT/IB2005/052756 2004-08-31 2005-08-23 A method and apparatus for motion estimation WO2006024988A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP05780826A EP1790166A2 (en) 2004-08-31 2005-08-23 A method and apparatus for motion estimation
JP2007529081A JP2008512023A (en) 2004-08-31 2005-08-23 Method and apparatus for motion prediction

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200410076990.3 2004-08-31
CN200410076990 2004-08-31

Publications (2)

Publication Number Publication Date
WO2006024988A2 true WO2006024988A2 (en) 2006-03-09
WO2006024988A3 WO2006024988A3 (en) 2006-05-11

Family

ID=35586994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2005/052756 WO2006024988A2 (en) 2004-08-31 2005-08-23 A method and apparatus for motion estimation

Country Status (4)

Country Link
EP (1) EP1790166A2 (en)
JP (1) JP2008512023A (en)
KR (1) KR20070051294A (en)
WO (1) WO2006024988A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2481214A1 (en) * 2009-09-22 2012-08-01 Panasonic Corporation Image coding apparatus, image decoding apparatus, image coding method, and image decoding method
WO2020140834A1 (en) * 2018-12-31 2020-07-09 Alibaba Group Holding Limited Resolution-adaptive video coding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6510177B1 (en) * 2000-03-24 2003-01-21 Microsoft Corporation System and method for layered video coding enhancement
US20030086622A1 (en) * 2001-10-26 2003-05-08 Klein Gunnewiek Reinier Bernar Efficient spatial scalable compression schemes


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHALIDABHONGSE JUNAVIT ; JAY KUO: "Fast Motion Vector Estimation Using Multiresolution-Spatio-Temporal Correlation" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 7, 1 June 1997 (1997-06-01), XP002364242 *
CHI-HUI HUANG ET AL: "A novel scalable video codec based-on MPEG-4 visual texture coding" SIGNAL PROCESSING, 2002 6TH INTERNATIONAL CONFERENCE ON AUG. 26-30, 2002, PISCATAWAY, NJ, USA,IEEE, vol. 1, 26 August 2002 (2002-08-26), pages 900-903, XP010628134 ISBN: 0-7803-7488-6 *
MOSCHETTI F ET AL: "A nested-multilevel redundancy exploitation for fast block matching" IMAGE PROCESSING, 2000. PROCEEDINGS. 2000 INTERNATIONAL CONFERENCE ON SEPTEMBER 10-13, 2000, PISCATAWAY, NJ, USA,IEEE, vol. 1, 10 September 2000 (2000-09-10), pages 856-859, XP010530750 ISBN: 0-7803-6297-7 *
XUDONG SONG ET AL: "A scalable hierarchical motion estimation algorithm for MPEG-2" CIRCUITS AND SYSTEMS, 1998. ISCAS '98. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL SYMPOSIUM ON MONTEREY, CA, USA 31 MAY-3 JUNE 1998, NEW YORK, NY, USA,IEEE, US, vol. 4, 31 May 1998 (1998-05-31), pages 126-129, XP010289485 ISBN: 0-7803-4455-3 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2481214A1 (en) * 2009-09-22 2012-08-01 Panasonic Corporation Image coding apparatus, image decoding apparatus, image coding method, and image decoding method
EP2481214A4 (en) * 2009-09-22 2014-08-27 Panasonic Corp Image coding apparatus, image decoding apparatus, image coding method, and image decoding method
WO2020140834A1 (en) * 2018-12-31 2020-07-09 Alibaba Group Holding Limited Resolution-adaptive video coding

Also Published As

Publication number Publication date
WO2006024988A3 (en) 2006-05-11
KR20070051294A (en) 2007-05-17
EP1790166A2 (en) 2007-05-30
JP2008512023A (en) 2008-04-17

Similar Documents

Publication Publication Date Title
US7146056B2 (en) Efficient spatial scalable compression schemes
US9420279B2 (en) Rate control method for multi-layered video coding, and video encoding apparatus and video signal processing apparatus using the rate control method
JP2897763B2 (en) Motion compensation coding device, decoding device, coding method and decoding method
US6393059B1 (en) Conversion of video data bit stream
US20060133475A1 (en) Video coding
JP2005506815A5 (en)
JP2005507589A5 (en)
JP2006279573A (en) Encoder and encoding method, and decoder and decoding method
US7372906B2 (en) Compression circuitry for generating an encoded bitstream from a plurality of video frames
WO2004057866A2 (en) Elastic storage
EP1790166A2 (en) A method and apparatus for motion estimation
US20020054338A1 (en) Image processing apparatus employing hierarchical encoding
US7085321B2 (en) Compression
JP2010288096A (en) Moving image encoding method, moving image encoding apparatus, and moving image encoding program
JP2001268578A (en) Encoding system converter, image communication system and encoding system converting method
JP2008252931A (en) Decoding apparatus and method, encoding apparatus and method, image processing system, and image processing method

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2005780826

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 200580029013.0

Country of ref document: CN

Ref document number: 2007529081

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 1020077004940

Country of ref document: KR

Ref document number: 852/CHENP/2007

Country of ref document: IN

NENP Non-entry into the national phase

Ref country code: DE

WWP Wipo information: published in national office

Ref document number: 2005780826

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2005780826

Country of ref document: EP