US20140072027A1 - System for video compression - Google Patents

System for video compression Download PDF

Info

Publication number
US20140072027A1
Authority
US
United States
Prior art keywords: encoding, value, parallel, yuv, color
Prior art date: 2012-09-12
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/611,959
Inventor
Haibin Li
Roy Chen
Lei Zhang
Ji Zhou
Cai Zhong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ATI Technologies ULC
Advanced Micro Devices Inc
Original Assignee
ATI Technologies ULC
Advanced Micro Devices Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2012-09-12
Filing date: 2012-09-12
Publication date: 2014-03-13
Application filed by ATI Technologies ULC, Advanced Micro Devices Inc filed Critical ATI Technologies ULC
Priority to US13/611,959 (US20140072027A1)
Assigned to ADVANCED MICRO DEVICES, INC. reassignment ADVANCED MICRO DEVICES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHEN, ROY, LI, HAIBIN, ZHONG, CAI, ZHOU, JI
Assigned to ATI TECHNOLOGIES ULC reassignment ATI TECHNOLOGIES ULC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ZHANG, LEI
Publication of US20140072027A1
Priority to US15/491,887 (US10542268B2)


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/436 — characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation, using parallelised computational arrangements
    • H04N 19/15 — using adaptive coding; data rate or code amount at the encoder output, by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
    • H04N 19/423 — characterised by implementation details or hardware specially adapted for video compression or decompression, characterised by memory arrangements
    • H04N 19/176 — using adaptive coding, characterised by the coding unit, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N 19/182 — using adaptive coding, characterised by the coding unit, the unit being a pixel
    • H04N 19/184 — using adaptive coding, characterised by the coding unit, the unit being bits, e.g. of the compressed video stream


Abstract

A system and method for providing video compression that includes encoding, using an encoding engine, a YUV stream wherein Y, U and V color values are encoded in parallel, and patching together the Y, U and V color streams to form a compressed YUV output stream. The encoding engine further includes encoding each color value of the YUV stream in parallel using parallel encoding engines and a control engine for controlling operation of all of the encoding engines in parallel. The YUV stream has an average bits per pixel value that varies from a first value to a second value that is double the first value. The encoding engine includes encoding the YUV stream in generally the same amount of time regardless of the average bits per pixel value.

Description

    FIELD OF INVENTION
  • The present invention relates to scalable video applications and more specifically to improving compression in scalable video applications.
  • BACKGROUND
  • Currently, the remote transfer and display of video data using consumer electronics devices has become a field of significant development. Generally, it is desirable to permit such streaming between devices with different display capabilities. With the advent of video devices having different video resolutions, it is desirable to compress the video stream, thereby increasing the amount of source video that can be communicated at the highest transferable resolution; yet it is also desirable to permit viewing of such video streams on devices that may only support lower resolution streams or may have limited throughput or processing capabilities that render higher resolution video signals impracticable. These issues have become particularly pronounced with the advent of high definition (HD) video, although the problem should not be construed as being limited to HD video. Thus, scalable video streams are increasing in popularity. In general, a video bit stream is called scalable when parts of the stream can be removed in a way that the resulting substream forms another valid bit stream for some target decoder, and the substream represents the source content with a reconstruction quality that is less than that of the complete original bit stream but is high when considering the lower quantity of remaining data.
  • The usual modes of compression can result in differences in the amount of time required to encode/decode higher resolution video (which may or may not conform to known “high definition” formats) in comparison to a lower resolution. In systems that support scalable video, delays in processing the video stream for higher resolution video can become a limiting factor in the overall system performance. Thus, the need exists for a way to reduce or eliminate the effects of delays due to compression of video.
  • SUMMARY OF EMBODIMENTS
  • A system and method for providing video compression that includes encoding, using an encoding engine, a YUV stream wherein Y, U and V color values are encoded in parallel, and patching together the Y, U and V color streams to form a compressed YUV output stream.
  • In some embodiments, the encoding engine further includes encoding each color value of the YUV stream in parallel using parallel encoding engines and a control engine for controlling operation of all of the encoding engines in parallel.
  • The YUV stream has an average bits per pixel value that varies from a first value to a second value that is larger than (e.g., double) the first value. The encoding engine includes encoding the YUV stream in generally the same amount of time regardless of the average bits per pixel value.
  • In some embodiments the encoding engine includes determining color values while avoiding null value registers and storing the determined color values in at least one buffer.
  • In some embodiments the encoding engines further includes compressing level and register location of the stored determined color values from the at least one buffer in parallel.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Other aspects, advantages and novel features of embodiments of the invention will become more apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings wherein:
  • FIG. 1 is a block diagram of a computing system according to an embodiment of the present invention;
  • FIG. 2 is a block diagram of an entropy encoding engine according to an embodiment of the present invention;
  • FIG. 3 is a block diagram of an encoding engine according to an embodiment of the present invention;
  • FIG. 4 is a diagram of collecting and buffering YUV color values according to an embodiment of the present invention; and
  • FIG. 5 is a diagrammatic view of a MB residual compress engine according to an embodiment of the present invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT(S)
  • Embodiments of the invention as described herein provide a solution to the problems of conventional methods. In the following description, various examples are given for illustration, but none are intended to be limiting. Embodiments include implementing a remote display system (either wired or wireless) using a standard, non-custom codec.
  • For purposes of this description, “H.264” refers to the standard for video compression that is also known as MPEG-4 Part 10, or MPEG-4 AVC (Advanced Video Coding). H.264 is one of the block-oriented motion-estimation-based codecs developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG). However, other video formats could also be employed in alternative embodiments.
  • Included within the features of H.264 is Scalable Video Coding (SVC), which is gaining popularity for video conferencing type applications. A number of industry-leading companies have standardized on (or support the standard for) SVC through the UCIF (Universal Communications Interop Forum) for video conferencing.
  • The H.264 standard supports the transmission of color video in the ‘YUV’ color format. In ‘YUV,’ ‘Y’ represents the ‘luma’ value, or brightness, and ‘UV’ represents the color, or ‘chroma’ values.
  • Each unique Y, U and V value comprises 8 bits, or one byte, of data. YUV standards support a 24 bit per pixel (bpp) format for the YUV444 standard, a 16 bit per pixel (bpp) format for the YUV422 standard, and a 12 bit per pixel (bpp) format for the YUV411 standard and the YUV420 standard. In the YUV422 standard, the U and V color values are shared between every other pixel, which results in an average of 16 bits per pixel. In the YUV411 standard, the U and V color values are shared between every four pixels, which results in an average of 12 bits per pixel. In the YUV420 standard, the U and V color values are likewise shared between every four pixels, resulting in an average of 12 bits per pixel, but the YUV values are distributed in a reordered format. These bandwidth saving techniques take into account the human eye's lesser sensitivity to variations in color than in brightness.
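  • By way of illustration only, the arithmetic behind these averages can be expressed as a short C sketch (the function name is illustrative, not part of any standard): the luma plane always contributes 8 bits per pixel, while the two 8-bit chroma samples are divided by the number of pixels that share them.

    #include <stdio.h>

    /* Average bits per pixel for 8-bit YUV where the two chroma
     * samples are shared among `share` pixels. */
    static double yuv_avg_bpp(int share)
    {
        return 8.0 + (2.0 * 8.0) / share;
    }

    int main(void)
    {
        printf("YUV444: %.0f bpp\n", yuv_avg_bpp(1)); /* 24 bpp */
        printf("YUV422: %.0f bpp\n", yuv_avg_bpp(2)); /* 16 bpp */
        printf("YUV420: %.0f bpp\n", yuv_avg_bpp(4)); /* 12 bpp */
        return 0;
    }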
  • It will be appreciated by those skilled in the art that the size of YUV444 format video is up to 2 times the size of the space-saving YUV420 format. Even so, it is desirable to achieve compression speeds close to the YUV420 standard. Advantageously, some embodiments of the invention address this by compressing Y/U/V color values at a MacroBlock (MB) level in parallel, and reordering by concatenating the MB of the Y color value with its UV color values. It will be appreciated by those skilled in the art that this embodiment is especially useful in large bit rate applications such as giga-bit wireless displays and reduces memory bandwidth consumption.
  • In the following description, numerous specific details are introduced to provide a thorough understanding of, and enabling description for, embodiments for implementing low latency applications. One skilled in the relevant art, however, will recognize that these embodiments can be practiced without one or more of the specific details, or with other components, systems, etc. In other instances, well-known structures or operations are not shown, or are not described in detail, to avoid obscuring aspects of the disclosed embodiments.
  • Computers and other such data processing devices have at least one control processor that is generally known as a central processing unit (CPU). Such computers and processing devices operate in environments which typically have memory, storage, input devices and output devices. Such computers and processing devices can also have other processors, such as graphics processing units (GPUs), that are used for specialized processing of various types and may be located with the processing devices or externally, such as included in the output device. For example, GPUs are designed to be particularly suited for graphics processing operations. GPUs generally comprise multiple processing elements that are ideally suited for executing the same instruction on parallel data streams, such as in data-parallel processing. In general, a CPU functions as the host or controlling processor and hands off specialized functions such as graphics processing to other processors such as GPUs.
  • With the availability of multi-core CPUs, where each CPU has multiple processing cores, CPUs now offer substantial processing capabilities that can also be used for specialized functions. One or more of the computation cores of multi-core CPUs or GPUs can be part of the same die (e.g., AMD Fusion™) or in different dies (e.g., Intel Xeon™ with NVIDIA GPU). Recently, hybrid cores having characteristics of both CPU and GPU (e.g., CellSPE™, Intel Larrabee™) have been proposed for General Purpose GPU (GPGPU) style computing. The GPGPU style of computing advocates using the CPU to primarily execute control code and to offload performance-critical data-parallel code to the GPU, so that the GPU is primarily used as an accelerator. The combined multi-core CPU and GPGPU computing model encompasses both CPU cores and GPU cores as accelerator targets. Many multi-core CPU cores have performance that is comparable to GPUs in many areas; for example, the floating point operations per second (FLOPS) of many CPU cores are now comparable to that of some GPU cores.
  • Embodiments of the present invention may yield substantial advantages by enabling the use of the same or similar code base on CPU and GPU processors and also by facilitating the debugging of such code bases. While the present invention is described herein with illustrative embodiments for particular applications, it should be understood that the invention is not limited thereto. Those skilled in the art with access to the teachings provided herein will recognize additional modifications, applications, and embodiments within the scope thereof and additional fields in which the invention would be of significant utility.
  • Embodiments of the present invention may be used in any computer system, computing device, entertainment system, media system, game systems, communication device, personal digital assistant, or any system using one or more processors. Such embodiments may be particularly useful where the system comprises a heterogeneous computing system. A “heterogeneous computing system,” as the term is used herein, is a computing system in which multiple kinds of processors are available.
  • Embodiments of the present invention enable the same code base to be executed on different processors, such as GPUs and CPUs. Embodiments of the present invention, for example, can be particularly advantageous in processing systems having multi-core CPUs, and/or GPUs, because code developed for one type of processor can be deployed on another type of processor with little or no additional effort. For example, code developed for execution on a GPU, also known as GPU-kernels, can be deployed to be executed on a CPU, using embodiments of the present invention.
  • An example heterogeneous computing system 100, according to an embodiment of the present invention, is shown in FIG. 1. Heterogeneous computing system 100 can include one or more processing units, such as processor 102. Heterogeneous computing system 100 can also include at least one system memory 104, at least one persistent storage device 106, at least one system bus 108, at least one input device 110 and output device 112.
  • Processing units of the type suitable for heterogeneous computing are the accelerated processing units (APUs) sold under various brand names by Advanced Micro Devices of Sunnyvale, Calif., according to an embodiment of the present invention as illustrated by FIG. 2. A heterogeneous processing unit includes one or more CPUs and one or more GPUs, such as a wide single instruction, multiple data (SIMD) processor and a unified video decoder that perform functions previously handled by a discrete GPU. It will be understood that when referring to the GPU structure and function, such functions are carried out by the SIMD. Heterogeneous processing units can also include at least one memory controller for accessing system memory, which also provides memory shared between the GPU and CPU, and a platform interface for handling communication with input and output devices through, for example, a controller hub.
  • A wide single instruction, multiple data (SIMD) processor for carrying out graphics processing instructions may be included to provide a heterogeneous GPU capability in accordance with an embodiment of the present invention, or a discrete GPU may be included separate from the CPU to implement the embodiment; however, as will be understood by those skilled in the art, additional latency may be experienced in an implementation of the present invention using a discrete GPU.
  • Advantageously, architectures of the types described above are well suited to provide a solution for implementing hardware encoding and/or decoding in higher resolution YUV standards, such as YUV444.
  • In the H.264 spec, there are two types of YUV444 video streams supported, namely separate-color-plane YUV444 and non-separate-color-plane YUV444, where “color” is used in this context to also refer to chroma and “color plane” is used in this context to also refer to the Y/U/V color values. In a separate-color-plane stream, the 3 color values of YUV have no dependency and are compressed independently, and the 3 color values are joined together into one whole video stream at the end of each slice of video data, where typically a slice is a frame. In a non-separate-color-plane stream, the 3 color values of Y/U/V are integrated together at each MB level, where a MB level represents a compression unit in the H.264 specification and typically refers to a 16×16 pixel block in one frame, and the color values share the same prediction-mode.
  • As described above, the average pixel size of YUV444 format video at 24 bits per pixel is 2 times the average pixel size of the YUV420 format at 12 bits per pixel. Conventionally, the Y/U/V color values are encoded and decoded in a sequential process. To achieve compression speeds close to YUV420, an embodiment of the present invention includes a hardware configuration to compress the Y/U/V color values in parallel using 3 encode engines. Each encoder is dedicated to encoding one of the Y, U or V color values. For a separate-color-plane stream, this embodiment concatenates the Y/U/V color values at the end of each slice. For a non-separate-color-plane stream, the embodiment concatenates the Y/U/V color values at the end of each MB, where for each MB the Y color value is concatenated with the corresponding UV color values.
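  • As a rough software model of this arrangement (a sketch only, not the claimed hardware), the following C code spawns one worker thread per color value and then concatenates the three outputs with Y first, as in the non-separate-color-plane case. Here encode_plane() is a hypothetical stand-in for one dedicated encoding engine, and the concatenation is byte-aligned for simplicity, whereas the hardware patches the bitstreams at the bit level.

    #include <pthread.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical per-plane job; encode_plane() stands in for one
     * dedicated encoding engine (Y, U or V). */
    typedef struct {
        const unsigned char *plane;   /* raw samples for one color value */
        size_t               len;     /* input length in bytes */
        unsigned char       *out;     /* local buffer (212, 214 or 216) */
        size_t               out_len;
    } plane_job;

    extern size_t encode_plane(const unsigned char *in, size_t len,
                               unsigned char *out);   /* assumed encoder */

    static void *encode_worker(void *arg)
    {
        plane_job *job = arg;
        job->out_len = encode_plane(job->plane, job->len, job->out);
        return NULL;
    }

    /* Encode one MB's Y, U and V color values in parallel, then patch
     * the U and V outputs after the Y output. */
    size_t encode_mb_parallel(plane_job jobs[3], unsigned char *stream)
    {
        pthread_t tid[3];
        size_t pos = 0;

        for (int i = 0; i < 3; i++)
            pthread_create(&tid[i], NULL, encode_worker, &jobs[i]);
        for (int i = 0; i < 3; i++)
            pthread_join(tid[i], NULL);   /* wait until all engines idle */

        for (int i = 0; i < 3; i++) {     /* order: Y, then U, then V */
            memcpy(stream + pos, jobs[i].out, jobs[i].out_len);
            pos += jobs[i].out_len;
        }
        return pos;
    }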
  • It will be appreciated that to achieve the parallel compression of each color value in YUV, a re-design of the data-path and pipeline, as well as parallelizing the entropy encoding process as much as possible, is required to improve the performance.
  • Furthermore, it has been found that parallel encoding is especially useful in large bit rate applications such as, but not limited to, giga-bit wireless displays. Additionally, it has been found that this solution adapts well with context-adaptive variable-length coding (CAVLC), which is a form of entropy coding used in the H.264 video encoding standard.
  • In this embodiment of the invention, each Y/U/V color value may be compressed using a base encoding unit, such as a 4×4 pixel block. The entropy encoder includes two data-paths to compress each 4×4 block in parallel.
  • FIG. 2 shows the block diagram of Y/U/V color value concatenation at the top level, in which an exemplary YUV stream is described in connection with the entropy encoding engine 200. The entropy encoding engine includes a top control (topctrl) engine 202 and three encoding engines 204, 206 and 208 connected via a bus 209 to the topctrl engine 202. Each of the encoding engines 204, 206 and 208 receives respective Y, U and V data from a local memory 210 and outputs encoded respective Y, U and V values to respective local buffers 212, 214 and 216. The buffer 212 associated with the Y color value encoder 204 connects directly to the system memory 218 for outputting the final YUV compressed stream. The buffers 214 and 216 for the U and V color values output to the encoder 204 for the Y color value. As the entropy encoding engine 200 is further described, the exemplary YUV stream is a non-separate-color-plane stream; however, it will be appreciated by those skilled in the art that the same features of the entropy encoding engine 200 may be implemented to process a separate-color-plane stream.
  • In operation, as each MB in the non-separate-color-plane stream becomes available in local memory 210 for processing, the entropy encoder's firmware first checks the status of the topctrl engine 202 and the 3 encoding engines 204, 206 and 208 to confirm that they are ready to accept new YUV data, and then the topctrl engine 202 signals the encoding engines 204, 206 and 208 to begin processing new YUV data. When the three encoding engines 204, 206 and 208 get the command to receive the YUV data, they begin to encode simultaneously. Each Y/U/V color value goes into its respective encoding engine 204, 206 or 208, and each color's output is written into temporary local memory 212, 214 or 216. The U and V color values have the same type of local memory 214 and 216, but for the Y color value the local memory 212 is connected to system memory 218, and the local memory 212 content can be written into system memory 218 automatically.
  • Monitoring and control of the three encoding engines 204, 206 and 208 at the same time is accomplished by the topctrl engine 202 using the following engines:
      • a. An Idle Ready engine 220 determines when the entropy encoder 200 is ready to accept new data.
      • b. A busy encoding engine 222 then checks that all three encoding engines are busy.
      • c. An encoding complete engine 224 then waits and identifies when all three cores are idle.
      • d. A U color value patching engine 226 then triggers the Y encoding engine 204 to fetch the U-color output from U's local memory 214 and write the encoded U color value into Y's local memory 212, waiting for the Y encoding engine 204 to finish.
      • e. A V color value patching engine 228 then triggers the Y encoding engine 204 to fetch the V-color output from V's local memory 216 and write the encoded V color value into Y's local memory 212, waiting for the Y encoding engine 204 to finish.
      • f. Upon completion of the V color value patching engine 228, the encoded YUV data is written out to the system memory 218 and the topctrl engine 202 returns to the Idle Ready engine 220 to await the availability of additional YUV color values and begin another MB encoding loop.
  • It will be appreciated by those skilled in the art that, if the delays of the patch engines 226 and 228 are ignored, one would measure up to triple the compression speed. Even when accounting for the patch engines 226 and 228, one can measure upwards of 2× the speed of a conventional sequential method.
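  • The control sequence above amounts to a small state machine. The following C sketch is illustrative only: the state names mirror the sub-engines 220 through 228, and the helper functions (engines_ready(), patch_plane(), and so on) are hypothetical hooks standing in for the hardware status checks and patch operations.

    /* States mirroring the topctrl sub-engines 220-228. */
    typedef enum {
        IDLE_READY,         /* 220: wait until ready for new data    */
        BUSY_ENCODING,      /* 222: all three engines busy encoding  */
        ENCODING_COMPLETE,  /* 224: wait until all three cores idle  */
        PATCH_U,            /* 226: append encoded U after Y         */
        PATCH_V,            /* 228: append encoded V after U         */
        WRITE_OUT           /* flush Y local memory to system memory */
    } topctrl_state;

    /* Hypothetical hardware hooks; the names are illustrative. */
    extern int  engines_ready(void);
    extern int  engines_idle(void);
    extern void start_engines(void);
    extern void patch_plane(int plane);    /* 1 = U, 2 = V */
    extern void flush_to_system_mem(void);

    void topctrl_loop(void)    /* one MB encoding loop per circuit */
    {
        topctrl_state s = IDLE_READY;
        for (;;) {
            switch (s) {
            case IDLE_READY:        if (engines_ready()) s = BUSY_ENCODING;     break;
            case BUSY_ENCODING:     start_engines();     s = ENCODING_COMPLETE; break;
            case ENCODING_COMPLETE: if (engines_idle())  s = PATCH_U;           break;
            case PATCH_U:           patch_plane(1);      s = PATCH_V;           break;
            case PATCH_V:           patch_plane(2);      s = WRITE_OUT;         break;
            case WRITE_OUT:         flush_to_system_mem(); s = IDLE_READY;      break;
            }
        }
    }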
  • Finally, for the best performance, an internal buffer may be used for local memory to eliminate data exchanges with external memory. This is also feasible when the hardware is configured with a fast processor or as a heterogeneous computing platform as described above.
  • With reference to FIG. 3, the data-flow for the Y color value encoding engine 300 is shown. Once again the non-separate color stream is used to exemplify the data flow in which a compress unit is one MB in the form of a 16×16 block.
  • It will be appreciated that in order to speed up the compression of each color plane as much as possible, this solution also pipelines the data-path and balances the delay of each pipe-stage.
  • After the MB header is read from local memory 302, the header information is stored into a local flops/buffer 304, which then triggers the MB header compress engine 306, as part of the compressing engine 308, to begin compression of the header. At the same time, the beginning of header compression serves as a trigger signal that also causes a residual buffer 310 to read residual 4×4 blocks from local memory and store them into the residual buffer.
  • A residual-pre-process engine 312 monitors the status of the residual buffer 310; once one 4×4 block of coefficients is available, the residual-pre-process engine 312 reads out the 4×4 block, pre-processes the data, and stores the result into a First-In, First-Out (FIFO) buffer 314.
  • A MB-residual-compress engine 316 within the compressing engine 308 monitors both the MB-header-compress 306 and the FIFO buffer 314 status. When the MB-header-compress 306 is done and there are valid data in the FIFO buffer 314, the residual-compress engine 316 will begin to compress the residual.
  • The Probability Interval Partitioning Entropy (PIPE) coding engine 318 is an inserted pipe-stage that breaks up the large pipeline delay found in conventional data-flow scenarios.
  • It will be appreciated by those skilled in the art that the functionality of the U and V encoding engines 206 and 208 (FIG. 2) has also now been described, where the data from the PIPE is written to the local memory 320. The remaining features described in FIG. 3 are unique to the Y color value encoding engine.
  • A stream packer engine 322 has two tasks: one is to perform the regular processing that conforms the encoded YUV stream to the H.264 standard, and the other is to sequentially read back the U and then the V color values and patch them into the output after the Y plane at the MB level, with the result written to the local memory 320.
  • With reference to FIG. 4, an improved process provided by the residual-pre-process engine 312 of FIG. 3 is shown operating on a unit having a 4×4 block of residual data. The residual-pre-process engine 400 first scans the 4×4 2D array into a 1D array 402 as described in the H.264 standard, and then begins to parse the 16 residuals. In a conventional parsing process, the 16 residuals in the 1D array 402 are parsed one by one, which needs at least 16 cycles to complete one 4×4 block. In an embodiment, a fast parse process is used, which only parses the non-zero residuals. By way of example, but not by limitation, a 1D array 404 having four coefficients with 11 zeros and one trailing zero requires 5 cycles to complete parsing of the 1D array. The FIFO buffer 406 stores only the data relevant to the residual information, including the coefficient value 408 and the location 410 based upon intervening zeros.
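  • A software model of this fast parse, assuming zigzag scanning has already produced the 1D array, might look as follows: only non-zero residuals produce FIFO entries, each carrying the coefficient value together with the count of zeros since the previous non-zero coefficient. In hardware only the non-zero entries consume parse cycles; the loop below merely models the resulting FIFO contents.

    #include <stddef.h>

    typedef struct { int level; int run; } coeff; /* value + intervening zeros */

    /* Parse one zigzag-scanned 4x4 block (16 residuals) into a compact
     * FIFO of (level, run) records; zeros produce no entries. */
    size_t parse_residuals(const int blk[16], coeff fifo[16])
    {
        size_t n = 0;
        int run = 0;
        for (int i = 0; i < 16; i++) {
            if (blk[i] == 0) {
                run++;                 /* zeros only extend the run */
            } else {
                fifo[n].level = blk[i];
                fifo[n].run   = run;   /* zeros before this coefficient */
                n++;
                run = 0;
            }
        }
        return n;  /* e.g. 4 entries for the 4-coefficient example above */
    }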
  • With reference to FIG. 5, a MB residual compress engine 500 is shown. In a conventional embodiment, the level steps 502 to 506 and run_before steps 508 to 510 are compressed sequentially. An embodiment using the improved FIFO buffer 406 (FIG. 4), which provides two FIFO buffers for the coefficient value 408 and the location 410 based upon intervening zeros, performs the level steps 512 to 516 (FIG. 5) and the run_before steps 518 to 520 so as to compress the level and run_before in a parallel process. The run_before compress result is stored into a local memory; once all the elements before run_before are compressed, the data in local memory is read out and patched into the stream. It will be appreciated that with this implementation the residual-pre-process engine 400 (FIG. 4) and the MB residual compress engine 500 (FIG. 5) will have similar processing times, making the pipeline delay more balanced.
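  • Reusing the coeff record (and the <stddef.h> include) from the previous sketch, the split into two FIFOs can be modeled as below. This is a sketch under simplifying assumptions: put_bits(), append(), encode_level() and encode_run() are hypothetical helpers, and fixed code widths stand in for the variable-length CAVLC tables. Once levels and runs sit in separate queues, the two compress paths are independent, so hardware can process them in the same cycles; the run_before output is staged in a local buffer and patched into the stream after the last level.

    typedef struct { unsigned char *buf; size_t bitpos; } bitwriter;
    extern void     put_bits(bitwriter *bw, unsigned val, int nbits);
    extern void     append(bitwriter *dst, const bitwriter *src);
    extern unsigned encode_level(int level);  /* placeholder VLC */
    extern unsigned encode_run(int run);      /* placeholder VLC */

    void compress_residual(const coeff *fifo, size_t n, bitwriter *stream)
    {
        unsigned char staging[64];
        bitwriter runs = { staging, 0 };

        for (size_t i = 0; i < n; i++) {
            put_bits(stream, encode_level(fifo[i].level), 8); /* level path */
            put_bits(&runs,  encode_run(fifo[i].run), 4);     /* run path   */
        }
        append(stream, &runs); /* patch buffered run_before after levels */
    }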
  • Result for Speed
  • With the improvements described above, and excluding local memory bandwidth, the entropy encoding speed will be generally determined by the kernel engine speed. Analysis yields the following estimate: cycles/MB = (nzc + 6) × (num4×4 + 1) × 1.15 + 100 cycles/header + UVbits/10, where “nzc” is the number of non-zero transform coefficients and “num4×4” is the number of 4×4 blocks in the macroblock.
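  • Read as plain arithmetic, the estimate can be written as a small helper function; uv_bits here denotes the encoded U and V size in bits for the macroblock (an assumption of this sketch, matching the patch step described above). For example, nzc = 16, num4×4 = 16 and 200 U/V bits give (16+6)·(16+1)·1.15 + 100 + 20 ≈ 550 cycles per MB.

    /* Cycles-per-macroblock estimate from the formula above. */
    double cycles_per_mb(int nzc, int num4x4, int uv_bits)
    {
        return (nzc + 6) * (num4x4 + 1) * 1.15  /* residual compress */
             + 100.0                            /* header compress   */
             + uv_bits / 10.0;                  /* U/V patch cost    */
    }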
  • Furthermore, it will be appreciated that by implementing this configuration, encoding times for YUV streams, regardless of whether they are YUV444 or YUV420, will be approximately the same due to the parallel entropy encoding of the Y, U and V color values.
  • In another exemplary embodiment, the hardware described above can be implemented using a processor executing instructions from a non-transitory storage medium. Those skilled in the art can appreciate that the instructions are created using a hardware description language (HDL), which is code for describing a circuit. An exemplary use of HDLs is the simulation of designs before the designer must commit to fabrication. The two most popular HDLs are the VHSIC Hardware Description Language (VHDL) and VERILOG. VERILOG, standardized by Open VERILOG International (OVI), was developed by a private entity and is now an open standard referred to as IEEE Standard 1364. A file written in VERILOG code that describes a Joint Test Access Group (JTAG) compliant device is called a VERILOG netlist. VHDL was developed by the U.S. Department of Defense, is an open standard, and is defined by IEEE standard 1076. Boundary Scan Description Language (BSDL) is a subset of VHDL, and provides a standard machine- and human-readable data format for describing how an IEEE Std 1149.1 boundary-scan architecture is implemented and operates in a device. Any HDL of the types described can be used to create instructions representative of the hardware description.
  • Although the invention has been described in terms of exemplary embodiments, it is not limited thereto. Rather, the appended claims should be construed broadly, to include other variants and embodiments of the invention, which may be made by those skilled in the art without departing from the scope and range of equivalents of the invention.

Claims (20)

What is claimed is:
1. A system for video compression comprising:
an encoding engine for encoding a YUV stream wherein Y, U and V color values are encoded in parallel and for patching together said Y, U and V color streams to form a compressed YUV output stream.
2. The system of claim 1 wherein:
said YUV stream has an average bits per pixel value that varies from a first value to a second value that is double the first value; and
said encoding engine encodes said YUV stream in generally the same amount of time regardless of the average bits per pixel value.
3. The system of claim 2 wherein:
said encoding engine encodes said YUV stream at a rate generally determined by kernel engine speed.
4. The system of claim 1 wherein said encoding engine includes:
one encoding engine for each color value of said YUV stream; and
a control engine for operating all of said encoding engines in parallel.
5. The system of claim 4 wherein:
said control engine controls the patching of said encoded U and V color values with said encoded Y color value.
6. The system of claim 5 wherein:
said patching of said encoded U and V color values with said encoded Y color value are completed sequentially after parallel encoding of said Y, U and V color values.
7. The system of claim 1 wherein said encoding engine includes:
a residual pre-processing engine for determining color values while avoiding null value registers; and
at least one buffer for storing said determined color values.
8. The system of claim 7 wherein said encoding engine includes:
compressing level and register location of said stored determined color values from said at least one buffer in parallel.
9. A method for video compression comprising:
encoding using an encoding engine a YUV stream wherein Y, U and V color values are encoded in parallel; and
patching together said Y, U and V color streams to form a compressed YUV output stream.
10. The method of claim 9 wherein:
said YUV stream has an average bits per pixel value that varies from a first value to a second value that is double the first value; and
encoding said YUV stream in generally the same amount of time regardless of the average bits per pixel value.
11. The method of claim 10 wherein:
encoding said YUV stream at a rate generally determined by kernel engine speed.
12. The method of claim 9 wherein said encoding includes:
encoding each color value of said YUV stream in parallel using parallel encoding engines; and
controlling operation of all of said encoding engines in parallel.
13. The method of claim 12 wherein:
controlling operation includes controlling the patching of said encoded U and V color values with said encoded Y color value.
14. The method of claim 13 wherein:
said patching of said encoded U and V color values with said encoded Y color value are completed sequentially after parallel encoding of said Y, U and V color values.
15. The method of claim 9 wherein said encoding includes:
determining color values while avoiding null value registers; and
storing said determined color values in at least one buffer.
16. The method of claim 15 wherein said encoding includes:
compressing level and register location of said stored determined color values from said at least one buffer in parallel.
17. A computer readable non-transitory medium including instructions which, when executed in a processing system, cause the system to perform video compression comprising:
encoding, using an encoding engine, a YUV stream wherein Y, U and V color values are encoded in parallel;
patching together said Y, U and V color streams to form a compressed YUV output stream; and
said encoding includes:
encoding each color value of said YUV stream in parallel using parallel encoding engines; and
controlling operation of all of said encoding engines in parallel.
18. The computer readable non-transitory medium of claim 17 wherein:
said YUV stream has an average bits per pixel value that varies from a first value to a second value that is double the first value; and
said encoding is completed in generally the same amount of time regardless of the average bits per pixel value.
19. The computer readable non-transitory medium of claim 17, wherein said encoding includes:
determining color values while avoiding null value registers; and
storing said determined color values in at least one buffer.
20. The computer readable non-transitory medium of claim 19, wherein said encoding includes:
compressing, in parallel, the level and register location of said stored determined color values from said at least one buffer.
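
Illustrative sketch: claims 1, 4-6, 9 and 12-14 recite encoding the Y, U and V planes in parallel and then patching the encoded streams together sequentially. The C++ sketch below shows one plausible software reading of that structure; encode_plane(), compress_yuv() and the pass-through "compression" inside encode_plane() are hypothetical stand-ins, not the patent's kernel engines.

    // Minimal sketch, assuming a self-contained per-plane encoder so the
    // three planes can be processed independently (not the patent's design).
    #include <cstdint>
    #include <thread>
    #include <vector>

    using Bitstream = std::vector<uint8_t>;

    // Hypothetical per-plane encoder; pass-through stands in for real coding.
    Bitstream encode_plane(const std::vector<uint8_t>& plane) {
        Bitstream out;
        out.reserve(plane.size());
        for (uint8_t sample : plane)
            out.push_back(sample);  // placeholder "compression"
        return out;
    }

    // Encode the three planes in parallel, then patch the results together
    // sequentially (Y first, then U, then V), mirroring claims 5-6 and 13-14.
    Bitstream compress_yuv(const std::vector<uint8_t>& y,
                           const std::vector<uint8_t>& u,
                           const std::vector<uint8_t>& v) {
        Bitstream ys, us, vs;
        std::thread ty([&] { ys = encode_plane(y); });
        std::thread tu([&] { us = encode_plane(u); });
        std::thread tv([&] { vs = encode_plane(v); });
        ty.join(); tu.join(); tv.join();   // all three encoders run in parallel

        Bitstream out = std::move(ys);     // patching is sequential after encode
        out.insert(out.end(), us.begin(), us.end());
        out.insert(out.end(), vs.begin(), vs.end());
        return out;
    }

A real encoder would also emit sub-stream lengths or offsets into the patched output so a decoder could locate each plane's data; that bookkeeping is omitted here for brevity.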
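Illustrative sketch: claims 7-8, 15-16 and 19-20 recite residual pre-processing that avoids null-value registers, buffering of the surviving values, and parallel compression of the level and register-location buffers. The sketch below is one plausible reading under those assumptions; preprocess_residuals(), compress_buffers() and the trivial run-length coder are hypothetical placeholders for the patent's engines.

    // Minimal sketch, assuming residuals arrive as a flat int16_t block and
    // "register location" means the index of a non-zero residual.
    #include <cstddef>
    #include <cstdint>
    #include <thread>
    #include <vector>

    struct ResidualBuffers {
        std::vector<int16_t>  levels;     // non-zero residual values ("level")
        std::vector<uint32_t> locations;  // register locations of those values
    };

    // Scan the residual block once, keeping only non-zero entries so that
    // null-value registers never enter the buffers (claims 7 and 15).
    ResidualBuffers preprocess_residuals(const std::vector<int16_t>& residuals) {
        ResidualBuffers out;
        for (std::size_t i = 0; i < residuals.size(); ++i) {
            if (residuals[i] != 0) {
                out.levels.push_back(residuals[i]);
                out.locations.push_back(static_cast<uint32_t>(i));
            }
        }
        return out;
    }

    // Trivial byte-wise run-length coder, included only to keep the sketch
    // self-contained; a real codec would use a VLC or arithmetic coder.
    std::vector<uint8_t> rle_compress(const uint8_t* data, std::size_t size) {
        std::vector<uint8_t> out;
        for (std::size_t i = 0; i < size;) {
            std::size_t run = 1;
            while (i + run < size && data[i + run] == data[i] && run < 255) ++run;
            out.push_back(static_cast<uint8_t>(run));  // (count, value) pairs
            out.push_back(data[i]);
            i += run;
        }
        return out;
    }

    // Compress the level buffer and the register-location buffer in parallel,
    // mirroring the "in parallel" limitation of claims 8, 16 and 20.
    void compress_buffers(const ResidualBuffers& b,
                          std::vector<uint8_t>& levels_out,
                          std::vector<uint8_t>& locations_out) {
        std::thread tl([&] {
            levels_out = rle_compress(
                reinterpret_cast<const uint8_t*>(b.levels.data()),
                b.levels.size() * sizeof(int16_t));
        });
        std::thread tp([&] {
            locations_out = rle_compress(
                reinterpret_cast<const uint8_t*>(b.locations.data()),
                b.locations.size() * sizeof(uint32_t));
        });
        tl.join();
        tp.join();
    }

Because the two buffers are independent after pre-processing, the level and location streams can be compressed concurrently with no shared state, which is what makes the parallel limitation straightforward to satisfy.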
US13/611,959 2012-09-12 2012-09-12 System for video compression Abandoned US20140072027A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/611,959 US20140072027A1 (en) 2012-09-12 2012-09-12 System for video compression
US15/491,887 US10542268B2 (en) 2012-09-12 2017-04-19 System for video compression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/611,959 US20140072027A1 (en) 2012-09-12 2012-09-12 System for video compression

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/491,887 Continuation US10542268B2 (en) 2012-09-12 2017-04-19 System for video compression

Publications (1)

Publication Number Publication Date
US20140072027A1 2014-03-13

Family

ID=50233258

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/611,959 Abandoned US20140072027A1 (en) 2012-09-12 2012-09-12 System for video compression
US15/491,887 Active 2033-04-19 US10542268B2 (en) 2012-09-12 2017-04-19 System for video compression

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/491,887 Active 2033-04-19 US10542268B2 (en) 2012-09-12 2017-04-19 System for video compression

Country Status (1)

Country Link
US (2) US20140072027A1 (en)

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS59126368A (en) * 1983-01-10 1984-07-20 Hitachi Ltd Coder and encoder
JPH1013858A (en) * 1996-06-27 1998-01-16 Sony Corp Picture encoding method, picture decoding method and picture signal recording medium
JP4427827B2 (en) * 1998-07-15 2010-03-10 ソニー株式会社 Data processing method, data processing apparatus, and recording medium
US7233622B2 (en) * 2003-08-12 2007-06-19 Lsi Corporation Reduced complexity efficient binarization method and/or circuit for motion vector residuals
BRPI0609281A2 (en) * 2005-04-13 2010-03-09 Thomson Licensing method and apparatus for video decoding
US20080137744A1 (en) * 2005-07-22 2008-06-12 Mitsubishi Electric Corporation Image encoder and image decoder, image encoding method and image decoding method, image encoding program and image decoding program, and computer readable recording medium recorded with image encoding program and computer readable recording medium recorded with image decoding program
CN101783943B (en) * 2005-09-20 2013-03-06 三菱电机株式会社 Image encoder and image encoding method
US8300694B2 (en) * 2005-09-20 2012-10-30 Mitsubishi Electric Corporation Image encoding method and image decoding method, image encoder and image decoder, and image encoded bit stream and recording medium
US8306112B2 (en) * 2005-09-20 2012-11-06 Mitsubishi Electric Corporation Image encoding method and image decoding method, image encoder and image decoder, and image encoded bit stream and recording medium
US8036517B2 (en) * 2006-01-25 2011-10-11 Qualcomm Incorporated Parallel decoding of intra-encoded video
US20080170793A1 (en) * 2007-01-12 2008-07-17 Mitsubishi Electric Corporation Image encoding device and image encoding method
JP2008193627A (en) * 2007-01-12 2008-08-21 Mitsubishi Electric Corp Image encoding device, image decoding device, image encoding method, and image decoding method
US8542748B2 (en) * 2008-03-28 2013-09-24 Sharp Laboratories Of America, Inc. Methods and systems for parallel video encoding and decoding
US8194736B2 (en) * 2008-04-15 2012-06-05 Sony Corporation Video data compression with integrated lossy and lossless compression
BR112012008770A2 (en) * 2009-10-14 2018-11-06 Sharp Kk methods for parallel video encoding and decoding.
WO2011088594A1 (en) * 2010-01-25 2011-07-28 Thomson Licensing Video encoder, video decoder, method for video encoding and method for video decoding, separately for each colour plane
US20120014431A1 (en) * 2010-07-14 2012-01-19 Jie Zhao Methods and Systems for Parallel Video Encoding and Parallel Video Decoding
US20120014429A1 (en) * 2010-07-15 2012-01-19 Jie Zhao Methods and Systems for Parallel Video Encoding and Parallel Video Decoding
US8344917B2 (en) * 2010-09-30 2013-01-01 Sharp Laboratories Of America, Inc. Methods and systems for context initialization in video coding and decoding
US9060173B2 (en) * 2011-06-30 2015-06-16 Sharp Kabushiki Kaisha Context initialization based on decoder picture buffer
US20130003823A1 (en) * 2011-07-01 2013-01-03 Kiran Misra System for initializing an arithmetic coder

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020094031A1 (en) * 1998-05-29 2002-07-18 International Business Machines Corporation Distributed control strategy for dynamically encoding multiple streams of video data in parallel for multiplexing onto a constant bit rate channel
US20060256854A1 (en) * 2005-05-16 2006-11-16 Hong Jiang Parallel execution of media encoding using multi-threaded single instruction multiple data processing
US20080137731A1 (en) * 2005-09-20 2008-06-12 Mitsubishi Electric Corporation Image encoding method and image decoding method, image encoder and image decoder, and image encoded bit stream and recording medium
US7460725B2 (en) * 2006-11-09 2008-12-02 Calista Technologies, Inc. System and method for effectively encoding and decoding electronic information
US20080253461A1 (en) * 2007-04-13 2008-10-16 Apple Inc. Method and system for video encoding and decoding
US20090175548A1 (en) * 2007-05-17 2009-07-09 Sony Corporation Information processing device and method
US20120093234A1 (en) * 2007-11-13 2012-04-19 Elemental Technologies, Inc. Video encoding and decoding using parallel processors
US20100119167A1 (en) * 2008-11-11 2010-05-13 Sony Corporation Image decoding apparatus, image decoding method and computer program
US20100225655A1 (en) * 2009-03-06 2010-09-09 Microsoft Corporation Concurrent Encoding/Decoding of Tiled Data
US20120230598A1 (en) * 2009-09-24 2012-09-13 Sony Corporation Image processing apparatus and image processing method
US20110235699A1 (en) * 2010-03-24 2011-09-29 Sony Computer Entertainment Inc. Parallel entropy coding
US20140023286A1 (en) * 2012-07-19 2014-01-23 Xuanming Du Decoder performance through quantization control

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9979960B2 (en) 2012-10-01 2018-05-22 Microsoft Technology Licensing, Llc Frame packing and unpacking between frames of chroma sampling formats with different chroma resolutions
US9661340B2 (en) 2012-10-22 2017-05-23 Microsoft Technology Licensing, Llc Band separation filtering / inverse filtering for frame packing / unpacking higher resolution chroma sampling formats
US10044974B2 (en) 2015-01-16 2018-08-07 Microsoft Technology Licensing, Llc Dynamically updating quality to higher chroma sampling rate
US9749646B2 (en) 2015-01-16 2017-08-29 Microsoft Technology Licensing, Llc Encoding/decoding of high chroma resolution details
US9854201B2 (en) 2015-01-16 2017-12-26 Microsoft Technology Licensing, Llc Dynamically updating quality to higher chroma sampling rate
US10368080B2 (en) 2016-10-21 2019-07-30 Microsoft Technology Licensing, Llc Selective upsampling or refresh of chroma sample values
US20190222623A1 (en) * 2017-04-08 2019-07-18 Tencent Technology (Shenzhen) Company Limited Picture file processing method, picture file processing device, and storage medium
US11012489B2 (en) * 2017-04-08 2021-05-18 Tencent Technology (Shenzhen) Company Limited Picture file processing method, picture file processing device, and storage medium
CN109120938A * 2017-06-26 2019-01-01 深圳市中兴微电子技术有限公司 Camera middle-layer image processing method and system-on-chip
WO2019041222A1 (en) * 2017-08-31 2019-03-07 深圳市大疆创新科技有限公司 Encoding method, decoding method, encoding apparatus and decoding apparatus
CN107948652A * 2017-11-21 2018-04-20 青岛海信电器股份有限公司 Method and apparatus for image conversion
CN108053452A * 2017-12-08 2018-05-18 浙江理工大学 Digital image color extraction method based on a mixed model
US11394396B2 (en) * 2020-09-25 2022-07-19 Advanced Micro Devices, Inc. Lossless machine learning activation value compression

Also Published As

Publication number Publication date
US10542268B2 (en) 2020-01-21
US20170223370A1 (en) 2017-08-03

Similar Documents

Publication Publication Date Title
US10542268B2 (en) System for video compression
US11057585B2 (en) Image processing method and device using line input and output
US10659796B2 (en) Bandwidth saving architecture for scalable video coding spatial mode
TW583883B (en) System and method for multiple channel video transcoding
US5812791A (en) Multiple sequence MPEG decoder
US7085320B2 (en) Multiple format video compression
US9392292B2 (en) Parallel encoding of bypass binary symbols in CABAC encoder
US20140153635A1 (en) Method, computer program product, and system for multi-threaded video encoding
US20080170611A1 (en) Configurable functional multi-processing architecture for video processing
US20060176955A1 (en) Method and system for video compression and decompression (codec) in a microprocessor
US20230276023A1 (en) Image processing method and device using a line-wise operation
US20060176960A1 (en) Method and system for decoding variable length code (VLC) in a microprocessor
CN103986934A (en) Video processor with random access to compressed frame buffer and methods for use therewith
CN103246499A (en) Device and method for parallelly processing images
AU2019101272A4 (en) Method and apparatus for super-resolution using line unit operation
US8427494B2 (en) Variable-length coding data transfer interface
Kim et al. A real-time MPEG encoder using a programmable processor
US7675972B1 (en) System and method for multiple channel video transcoding
Shichao et al. A scalable multi-pipeline JPEG encoding architecture
US9330060B1 (en) Method and device for encoding and decoding video image data
US20070192393A1 (en) Method and system for hardware and software shareable DCT/IDCT control interface
Zhu et al. Hardware JPEG decoder and efficient post-processing functions for embedded application
WO1996036178A1 (en) Multiple sequence mpeg decoder and process for controlling same
Rani et al. Early Performance Analysis of Fully Pipelined JPEG Engine in the Simulation Environment

Legal Events

Date Code Title Description
AS Assignment

Owner name: ADVANCED MICRO DEVICES, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, HAIBIN;CHEN, ROY;ZHOU, JI;AND OTHERS;REEL/FRAME:028970/0359

Effective date: 20120910

Owner name: ATI TECHNOLOGIES ULC, CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ZHANG, LEI;REEL/FRAME:028970/0370

Effective date: 20120911

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION