[FIGS. 8, 9, and 10: schedules plotted against time. Legible labels: initial encoder start-up delay, buffer emptiness.]
MULTIMEDIA PRESENTATION LATENCY MINIMIZATION
This application is a continuation under 37 CFR 1.53(b) of U.S. patent application Ser. No. 09/205,875, titled “Multimedia Presentation Latency Minimization”, filed on Dec. 4, 1998, now U.S. Pat. No. 6,637,031 commonly assigned hereto, and hereby incorporated by reference.
The present invention relates generally to multimedia communications and more specifically to latency minimization for on-demand interactive multimedia applications.
COPYRIGHT NOTICE/PERMISSION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1998, Microsoft Corporation, All Rights Reserved.
Information presentation over the Internet is changing dramatically. New time-varying multimedia content is now being brought to the Internet, and in particular to the World Wide Web (the web), in addition to textual HTML pages and still graphics. Here, time-varying multimedia content refers to sound, video, animated graphics, or any other medium that evolves as a function of elapsed time, alone or in combination. In many situations, instant delivery and presentation of such multimedia content, on demand, is desired.
“On-demand” is a term for a wide set of technologies that enable individuals to select multimedia content from a central server for instant delivery and presentation on a client (computer or television). For example, video-on-demand can be used for entertainment (ordering movies transmitted digitally), education (viewing training videos), and browsing (viewing informative audiovisual material on a web page).
Users are generally connected to the Internet by a communications link of limited bandwidth, such as a 56 kilobits per second (Kbps) modem or an integrated services digital network (ISDN) connection. Even corporate users are usually limited to a fraction of the 1.544 megabits per second (Mbps) T-1 carrier rates. This bandwidth limitation provides a challenge to on-demand systems: it may be impossible to transmit a large amount of image or video data over a limited bandwidth in the short amount of time required for “instant delivery and presentation.” Downloading a large image or video may take hours before presentation can begin. As a consequence, special techniques have been developed for on-demand processing of large images and video.
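As a rough illustration of the bandwidth limitation described above, the following Python sketch estimates how long a complete download would take before presentation could begin. The 100-megabyte file size and the function name `transfer_seconds` are hypothetical values chosen for illustration, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


# Hypothetical 100-megabyte video file (size chosen for illustration only).
video_bytes = 100 * 1_000_000

# 56 Kbps modem and full 1.544 Mbps T-1 rate, as named in the text.
modem_hours = transfer_seconds(video_bytes, 56_000) / 3600
t1_minutes = transfer_seconds(video_bytes, 1_544_000) / 60

print(f"56 Kbps modem: {modem_hours:.2f} hours")
print(f"Full T-1 rate: {t1_minutes:.2f} minutes")
```

Under these assumptions the modem download takes roughly four hours, and even the full T-1 rate takes several minutes, which is far from instant delivery.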
A technique for providing large images on demand over a communications link with limited bandwidth is progressive image transmission. In progressive image transmission, each image is encoded, or compressed, in layers, like an onion. The first (core) layer, or base layer, represents a low-resolution version of the image. Successive layers represent successively higher resolution versions of the image. The server transmits the layers in order, starting from the base layer. The client receives the base layer, and instantly presents to the user a low-resolution version of the image. The client presents higher resolution versions of the image as the successive layers are received. Progressive image transmission enables the user to interact with the server instantly, with low delay, or low latency. For example, progressive image transmission enables a user to browse through a large database of images, quickly aborting the transmission of the unwanted images before they are completely downloaded to the client.
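The latency benefit of sending the base layer first can be sketched as follows. This is a minimal Python illustration, not an image codec: the layer sizes and the helper name `layer_arrival_times` are assumptions made for the example, and the link is taken to be a steady 56 Kbps.

```python
from typing import List


def layer_arrival_times(layer_bits: List[int], link_bps: float) -> List[float]:
    """Seconds at which each layer has fully arrived, sent base layer first."""
    times: List[float] = []
    elapsed = 0.0
    for bits in layer_bits:
        elapsed += bits / link_bps
        times.append(elapsed)
    return times


# Hypothetical layer sizes in bits: a small base layer, then larger refinements.
layers = [28_000, 112_000, 448_000]
times = layer_arrival_times(layers, 56_000)

for index, seconds in enumerate(times):
    print(f"layer {index} complete after {seconds:.1f} s")
```

With these sizes the client can present a low-resolution image after 0.5 seconds, while the full-resolution image is complete only after 10.5 seconds. A user who aborts after the base layer avoids the remaining 10 seconds of transmission.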
Similarly, streaming is a technique that provides time-varying content, such as video and audio, on demand over a communications link with limited bandwidth. In streaming, audiovisual data is packetized, delivered over a network, and played as the packets are being received at the receiving end, as opposed to being played only after all packets have been downloaded. Streaming technologies are becoming increasingly important with the growth of the Internet because most users do not have fast enough access to download large multimedia files quickly. With streaming, the client browser or application can start displaying the data before the entire file has been transmitted.
In a video on-demand delivery system that uses streaming, the audiovisual data is often compressed and stored on a disk on a media server for later transmission to a client system. For streaming to work, the client side receiving the data must be able to collect the data and send it as a steady stream to a decoder or an application that is processing the data and converting it to sound or pictures. If the client receives the data more quickly than required, it needs to save the excess data in a buffer. Conversely, if the client receives the data more slowly than required, it needs to play out some of the data from the buffer. Storing part of a multimedia file in this manner before playing the file is referred to as buffering. Buffering can provide smooth playback even if the client temporarily receives the data more quickly or more slowly than required for real-time playback.
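The buffering behavior described above can be sketched with a small simulation. This is an illustrative Python model under stated assumptions: time is divided into equal ticks, the decoder consumes a fixed amount per tick once playback starts, and the names `simulate_buffer` and `preroll` are hypothetical.

```python
from typing import List


def simulate_buffer(arrivals: List[int], playback_per_tick: int, preroll: int) -> List[int]:
    """Return buffer occupancy after each tick; raise on underflow.

    arrivals: units of data received in each tick (uneven, e.g. bursty).
    playback_per_tick: units the decoder consumes per tick once playing.
    preroll: units that must be buffered before playback starts.
    """
    occupancy = 0
    playing = False
    history: List[int] = []
    for received in arrivals:
        occupancy += received
        if playing:
            if occupancy < playback_per_tick:
                raise RuntimeError("buffer underflow: playback would stall")
            occupancy -= playback_per_tick
        elif occupancy >= preroll:
            playing = True  # consumption begins on the next tick
        history.append(occupancy)
    return history


# Arrivals average 10 units per tick but dip to 4 in the middle.
arrivals = [10, 10, 4, 4, 22, 10]
history = simulate_buffer(arrivals, playback_per_tick=10, preroll=20)
print(history)
```

With a preroll of 20 units the buffer absorbs the dip in the arrival rate. With a preroll of only 10 units the same arrivals cause an underflow on the fourth tick, which is why a larger initial buffer gives smoother playback at the cost of a longer wait.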
There are two reasons that a client can temporarily receive data more quickly or more slowly than required for real-time playback. First, in a variable-rate transmission system such as a packet network, the data arrives at uneven rates. Not only does packetized data inherently arrive in bursts, but even packets of data that are transmitted from the sender at an even rate may not arrive at the receiver at an even rate. This is due to the fact that individual packets may follow different routes, and the delay through any individual router may vary depending on the amount of traffic waiting to go through the router. The variability in the rate at which data is transmitted through a network is called network jitter.
A second reason that a client can temporarily receive data more quickly or more slowly than required for real-time playback is that the media content is encoded at a variable bit rate. For example, high-motion scenes in a video may be encoded with more bits than low-motion scenes. When the encoded video is transmitted with a relatively constant bit rate, then the high-motion frames arrive at a slower rate than the low-motion frames. For both these reasons (variable-rate source encoding and variable-rate transmission channels), buffering is required at the client to allow a smooth presentation.
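The effect of variable bit rate encoding on a constant bit rate channel can be illustrated with a short Python sketch. The frame sizes, the 40 Kbps channel rate, and the helper name `frame_arrival_times` are hypothetical; the channel is assumed to be free of jitter so that only the source variability is visible.

```python
from itertools import accumulate
from typing import List


def frame_arrival_times(frame_bits: List[int], channel_bps: float) -> List[float]:
    """Time at which each frame has fully arrived over a constant-rate channel."""
    return [total / channel_bps for total in accumulate(frame_bits)]


# Hypothetical frame sizes in bits: low-motion, low-motion, high-motion,
# high-motion, low-motion.
frame_bits = [2_000, 2_000, 8_000, 8_000, 2_000]
arrivals = frame_arrival_times(frame_bits, 40_000)

# Spacing between consecutive frame arrivals.
gaps = [later - earlier for earlier, later in zip([0.0] + arrivals, arrivals)]
print([round(gap, 3) for gap in gaps])
```

The high-motion frames arrive 0.2 seconds apart while the low-motion frames arrive 0.05 seconds apart, even though all frames must be displayed at the same fixed interval.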
Unfortunately, buffering implies delay, or latency. Start-up delay refers to the latency the user experiences after he signals the server to start transmitting data from the beginning of the content (such as when a pointer to the content is selected by the user) before the data can be decoded by the client system and presented to the user. Seek delay refers to the latency the user experiences after he signals to the server to start transmitting data from an arbitrary place in the middle of the content (such as when a seek bar is dragged to a particular point in time) before the data can be decoded and presented. Both start-up and seek delays occur because even after the client begins to receive new data, it must wait until its buffer is sufficiently full to begin playing out of the buffer. It does this in order to guard against future buffer underflow due to network jitter and variable-bit rate compression. For typical audiovisual coding on the Internet, start-up and seek delays between two and ten seconds are common.
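How the required delay follows from the encoding can be sketched in Python. This is a simplified model, assuming a jitter-free constant-rate channel and a fixed frame rate, in which frame i must have fully arrived by time D + i / fps; the frame sizes and the name `min_startup_delay` are hypothetical.

```python
from itertools import accumulate
from typing import List


def min_startup_delay(frame_bits: List[int], channel_bps: float, fps: float) -> float:
    """Smallest delay D such that frame i has fully arrived by time D + i / fps."""
    arrivals = [total / channel_bps for total in accumulate(frame_bits)]
    return max(arrival - i / fps for i, arrival in enumerate(arrivals))


# Hypothetical clip at 10 frames per second averaging 4,000 bits per frame,
# which matches a 40 Kbps channel, but with two large frames at the start.
frame_bits = [8_000, 8_000, 2_000, 2_000, 2_000, 2_000]
delay = min_startup_delay(frame_bits, channel_bps=40_000, fps=10)
print(f"minimum start-up delay: {delay:.2f} s")
```

Here the client must wait about 0.3 seconds before decoding the first frame, or the second frame would miss its deadline. Network jitter, which this sketch ignores, would increase the required delay further.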
Large start-up and seek delays are particularly annoying when the user is trying to browse through a large amount of audiovisual content trying to find a particular video or a particular location in a video. As in the image browsing scenario using progressive transmission, most of the time the user will want to abort the transmission long before all the data are downloaded and presented. In such a scenario, delays of two to ten seconds between aborts seem intolerable. What is needed is a method for reducing the start-up and seek delays for such “on demand” interactive multimedia applications.
Systems and methods for presenting time-varying multimedia content are described. In one aspect, a lower quality data stream for an initial portion of the multimedia content is received. The lower quality data stream is received at a rate faster than a real-time playback rate for the multimedia content. The lower quality data stream was encoded at a bit rate below a transmission rate. A higher quality data stream of a subsequent portion of the multimedia content is received. The higher quality data stream was encoded at a bit rate that equals the transmission rate. The initial portion and the subsequent portion of the multimedia content are presented at the real-time playback rate. Receiving the initial portion faster than the real-time playback rate provides for a reduction of latency due to buffering by a desired amount.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of an exemplary computer system in which the invention may be implemented.
FIG. 2 is a diagram of an example network architecture in which embodiments of the present invention are incorporated.
FIG. 3 is a block diagram representing the data flow for a streaming media system for use with the computer network of FIG. 2.
FIGS. 4A, 4B, 4C, 4D, and 4E are schedules illustrating data flow for example embodiments of the streaming media system of FIG. 3.
FIG. 5 is a decoding schedule for multimedia content pre-encoded at a full bit rate.
FIG. 6 is a schedule showing the full bit rate encoding of FIG. 5 advanced by T seconds.
FIG. 7 is a schedule showing a low bit rate encoding of the content shown in FIG. 5.
FIG. 8 is a schedule showing the low bit rate encoding schedule of FIG. 7 advanced by T seconds and superimposed on the advanced schedule of FIG. 6.
FIG. 9 is a schedule showing the transition from the delivery of the low bit rate encoded stream of FIG. 7 to the data stream of FIG. 6, with a gap to indicate optional bit stuffing.
FIG. 10 is a schedule showing the advanced schedule of FIG. 6 with a total of RT bits removed from the initial frames.
In the following detailed description of the embodiments, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the present inventions. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present inventions is defined only by the appended claims.
The present invention is a system for achieving low latency responses from interactive multimedia servers, when the transmission bit rate is constrained. A server provides at least two different data streams. A first data stream is a low resolution stream encoded at a bit rate below the transmission bit rate. A second data stream is a normal resolution stream encoded at a bit rate equal to the transmission bit rate. The server initially transmits the low resolution stream faster than real time, at a bit rate equal to the transmission bit rate. The client receives the low resolution stream faster than real time, but decodes and presents the low resolution stream in real time. When the client buffer has grown sufficiently large to guard against future underflow by the normal resolution stream, the server stops transmission of the low resolution stream and begins transmission of the normal resolution stream. The system of the present invention reduces the startup or seek delay for interactive multimedia applications such as video on-demand, at the expense of initially lower quality.
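The timing of the switch from the low resolution stream to the normal resolution stream can be approximated with a simple model. The following Python sketch is an illustration under assumptions that are not stated in the text: playback begins as soon as data arrives, the channel is free of jitter, both streams are treated as constant bit rate, and the buffer target is measured in seconds of playback lead. The function name `low_res_phase` and the example rates are hypothetical.

```python
from typing import Tuple


def low_res_phase(transmission_bps: float, low_bps: float,
                  target_buffer_s: float) -> Tuple[float, float]:
    """Return (wall_clock_seconds, content_seconds) of the low-resolution phase.

    While the low-resolution stream is sent at the transmission rate, the
    client receives transmission_bps / low_bps seconds of content per second
    and plays back one second per second, so its lead grows at the difference.
    """
    if not 0 < low_bps < transmission_bps:
        raise ValueError("low_bps must be positive and below transmission_bps")
    speedup = transmission_bps / low_bps  # content seconds received per wall second
    wall = target_buffer_s / (speedup - 1.0)
    return wall, wall * speedup


# Assumed example: 56 Kbps channel, 28 Kbps low-resolution encoding,
# and a 5-second buffer target.
wall_s, content_s = low_res_phase(56_000, 28_000, 5.0)
print(f"switch after {wall_s:.1f} s; first {content_s:.1f} s shown at low resolution")
```

In this example the server would switch streams after 5 seconds, by which time the client holds 10 seconds of low resolution content and has played 5 of them, leaving a 5-second lead. A lower low-resolution bit rate shortens this phase but lowers the initial quality further.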
The detailed description of this invention is divided into four sections. The first section provides a general description of a suitable computing environment in which the invention may be implemented including an overview of a network architecture for generating, storing and transmitting audio/visual data using the present invention. The second section illustrates the data flow for a streaming media system for use with the network architecture described in the first section. The third section describes the methods of exemplary embodiments of the invention. The fourth section is a conclusion which includes a summary of the advantages of the present invention.
An Exemplary Computing Environment.
FIG. 1 provides a brief, general description of a suitable computing environment in which the invention may be implemented. The invention will hereinafter be described in the general context of computer-executable program modules containing instructions executed by a personal computer (PC). Program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the invention may be practiced with other computer-system configurations, including hand-held devices, multiprocessor systems, microprocessor-based programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
FIG. 1 employs a general-purpose computing device in the form of a conventional personal computer 20, which includes processing unit 21, system memory 22, and system bus 23