APPARATUSES AND METHOD FOR AUDIO/VIDEO STREAMING OVER IP
Field of the Invention
The present invention relates generally to audio/video transmission techniques and applications, and more particularly to "Video-over-IP" transmission methods and devices.
Background of the Invention
Existing IP (Internet Protocol) network audio and video streaming methods are generally poor in quality, and lack stability and flexibility. While methods are known that deliver video streaming over IP networks, they lack automated protocol switching, require large amounts of bandwidth, and tend to be easily interrupted due to inherent inadequacies in network architecture.
Existing methods typically require from about 3 to 45 Mbps of channel transmission bandwidth for an appropriate level of signal quality. In addition, they also require separate stand-alone devices to perform standards/protocol conversion for both the input and the output, usually resulting in a degradation of quality and transmission delays. In existing devices, protocol switching is not embedded, and must therefore be manually selected by an operator. This often leads to incorrect selections for a given scenario or network infrastructure, causing further interruptions in video/audio transmission.
What is needed is a way to provide audio/video capture, compression, transmission, and decompression of a video and audio signal, all in the same apparatus and in an automated manner.
For the foregoing reasons, there is a need for an improved method and apparatus for audio and video over IP networks.
Summary of the Invention
The present invention is directed to an audio/video-over-IP (Internet Protocol) streaming system, method and apparatus for use with an IP network. The system includes an encoder engine for automatically switching communication protocols and encoding said audio and/or video stream for transmission via said IP network, and a decoder engine in communication with said encoder over said IP network for decoding said encoded stream.
The method includes the steps of receiving and/or transmitting an audio and/or video signal, and automatically switching communication protocols to ensure signal viability.
The system requires a mere 1 Mbps of available transmission bandwidth, while prior art devices and methods require at least 3 to 45 Mbps for the same signal quality. The invention provides standards conversion on both input and output, together in the same device. Automated standards conversion, automated protocol switching, and mirror checking result in low bandwidth requirements for high quality video. Both the encoder and decoder can be assembled using relatively inexpensive "off the shelf" components to provide a high quality video transmission device compatible with all IP networks and virtually all audio/video standards.
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
Brief Description of the Drawings
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings where:
Figure 1 is an overview of an audio/video-over-IP system in accordance with an embodiment of the present invention;
Figure 2 is an overview of an audio/video-over-IP method in accordance with an embodiment of the present invention;
Figure 3 illustrates audio/video capture and compression schematics (units 100-170);
Figure 4 illustrates a schematic drawing of an embodiment of the present invention;
Figure 5 illustrates a signal transmission routine (units 300-570); and
Figure 6 illustrates signal receiving and output schematics (units 600-770).
Detailed Description of the Presently Preferred Embodiment
The present invention is directed to an audio/video-over-IP (Internet
Protocol) streaming system, method and apparatus for use with an IP network. As illustrated in Figure 1, the system 10 includes an encoder engine 110 for automatically switching communication protocols and encoding said audio and/or video stream for transmission via said IP network 130, and a decoder engine 150 in communication with said encoder over said IP network 130 for decoding said encoded stream.
As illustrated in Figure 2, the method 101 includes the steps of receiving and/or transmitting an audio and/or video signal 102, and automatically switching communication protocols to ensure signal viability 104.
The system 10 captures audio/video signal(s) from an analog source, digitizes, compresses and transmits digitized video and audio signals over IP (Internet Protocol) networks. It is designed to provide video and audio link(s) for Point-To-Point or Point-To-Multipoint transmissions, and ensures accurate transmission of TV quality video and CD quality audio signals over IP networks.
The system 10 optimizes connection between audio/video source and receiver(s) by means of converting original audio/video signal into a digital compressed audio/video data stream and transmitting this data stream from source location to receiver(s) location(s) using an IP network, such as the Internet, to achieve a quality of audio/video reception that approximates that of a satellite link.
The system 10 can capture an analog audio/video signal from the source, convert the signal into a digital format and transmit that signal over an IP network to a receiver. Audio/video compression techniques are used to reduce the storage space and bandwidth required to operate with digital audio/video data. Automated optimization and switching of transmission protocols ensures that the connection between a client and an audio/video source will stay on as long as required, and provide the necessary connection quality sufficient for an acceptable level of audio/video reception.
An encoding apparatus captures an analog audio/video signal and encrypts it into a coded bit stream. Upon software execution, the apparatus authenticates a client by means of a username and password, and checks for a valid IP address and the unique MAC (Media Access Control) address of the client. A load balancing procedure then provides continuous uninterrupted audio/video streaming at or above industry standard average signal quality, thereby withstanding network lags and an increased number of hops (connecting IP network routers between server and client). In instances where a default server detects lag between itself and a client, the database is tasked to perform a mirror search for the closest mirror server, performed by instructing server computers that are listed as mirrors to send an echo signal (PING) to the client. The system 10 then calculates the lowest echo response time and, based on that lowest calculated response time, redirects the client to request an audio/video data stream from the closest and/or fastest mirror server.
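The mirror-search step above can be sketched as follows. This is a minimal sketch only: the function names `measure_echo_ms` and `closest_mirror` are hypothetical, and the `probe` callback stands in for a real PING so the example stays self-contained.

```python
def measure_echo_ms(host, probe):
    """Round-trip echo time to `host` in milliseconds. `probe` is a
    stand-in for a real PING command, so the sketch needs no network."""
    return probe(host)

def closest_mirror(mirrors, probe):
    """Mimic the mirror search: every listed mirror server echoes the
    client, and the server with the lowest response time becomes the
    redirect target for the audio/video data stream."""
    timings = {host: measure_echo_ms(host, probe) for host in mirrors}
    return min(timings, key=timings.get)
```

For example, with simulated latencies `{"mirror-a": 120.0, "mirror-b": 35.0, "mirror-c": 80.0}`, `closest_mirror` selects `mirror-b` as the fastest source.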
A protocol switching procedure is executed on the server and client to establish a successful connection. During the protocol switching procedure,
the server is instructed to send a specific data packet that will determine whether a multicast UDP (User Datagram Protocol) packet reached the client's machine, thereby inferring that the multicast UDP protocol can be used. In the event that no UDP packet reaches the client computer, an appropriate response protocol is directed to initialize a unicast (UDP) method of packet delivery. In the unlikely event that a UDP "lock" is forced by a firewall, proxy server or other network device, and the client cannot receive UDP packets, the server will automatically switch to the less efficient but more stable HTTP (Hypertext Transfer Protocol), which will almost always be passed through the aforementioned network devices.
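The fallback chain just described can be sketched as a simple negotiation loop; the function name and the `reaches_client` probe are illustrative assumptions, standing in for the real test-packet exchange.

```python
def negotiate_protocol(reaches_client):
    """Walk the fallback chain: try multicast UDP first, then unicast
    UDP, and finally HTTP, which firewalls almost always pass through.
    `reaches_client(proto)` reports whether a test packet sent with
    that protocol arrived at the client."""
    for proto in ("multicast-udp", "unicast-udp", "http"):
        if reaches_client(proto):
            return proto
    return None  # no transport worked; the connection cannot be made
```

The order of the tuple encodes the preference described above: most efficient first, most compatible last.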
If network slowdown or quality degradation is observed, the server can be automatically instructed to perform various efficiency techniques. These techniques can include resizing the video to a smaller format, reducing the frame rate to sustain a minimum video quality, decreasing the audio sampling rate, and increasing buffering times. These techniques therefore manage the client's connection bandwidth, all geared towards optimizing and increasing the chances of robust non-stop audio/video reception by a client device.
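The efficiency techniques above can be sketched as a staged degradation routine. The field names, the ordering of the steps, and the crude bit-rate cost model are all illustrative assumptions, not taken from the original disclosure.

```python
def degrade_settings(settings, measured_kbps):
    """Apply the efficiency techniques in order until the estimated
    stream rate fits the measured bandwidth: shrink the frame, halve
    the frame rate, drop the audio sampling rate, then lengthen
    buffering. Returns a new settings dict; the input is not mutated."""
    s = dict(settings)

    def est_kbps(s):
        # crude illustrative cost model: pixel throughput plus audio rate
        return s["width"] * s["height"] * s["fps"] // 4000 + s["audio_hz"] // 1000

    steps = [
        lambda s: s.update(width=s["width"] // 2, height=s["height"] // 2),
        lambda s: s.update(fps=max(10, s["fps"] // 2)),
        lambda s: s.update(audio_hz=22050),
        lambda s: s.update(buffer_s=s["buffer_s"] * 2),
    ]
    for step in steps:
        if est_kbps(s) <= measured_kbps:
            break  # stream now fits the client's connection bandwidth
        step(s)
    return s
```

For example, a 640x480 / 30 fps stream squeezed into a 500 kbps budget under this model ends up at 320x240 and 15 fps, with the audio rate untouched.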
Upon the completion of the aforementioned steps, a video format detection process is performed. The video data stream is then passed to an encoding engine that analyzes and encodes the video in its originating format, such as PAL, SECAM, or NTSC.
In an embodiment of the present invention, a decoding apparatus is provided that is configured to listen exclusively for incoming data streams by means of a network interface card (NIC). The decoding device is designed to receive scrambled audio/video bit streams and convert them into viewable audio/video signals. In addition, the device can optionally store one or more streams in digital format on a storage device, and/or in analog format on a device such as a VCR (Video Cassette Recorder).
The input device 100 provides the encoding engine 110 with a stream of digital audio and/or video. The audio source can be provided in any format
including digital WAV (CD) format, 5.1 Dolby™ (6-channel audio), a regular tape recorder, or a microphone. As illustrated in Figure 3, an analog or digital audio/video input or "source video" device 100 provides a device incorporating an encoding engine 110 with raw digital audio/video data. Data can be taken from a local storage device, or streamed live into input device 100 from a camera, microphone, satellite live feed, and the like. The video source can be provided in any format, such as NTSC, PAL, or SECAM, and at any resolution. Supported inputs include Composite Video, S-Video, and RGB/SCART.
One known method is to fully decode the incoming video signal into separate components (RGB or YUV), mix these with the scan converter components, and re-encode back to video. In the present approach, the video signal is never decoded, so it remains at a very high bandwidth, such as when using a composite video input. Signal delays from the video input to the encoding engine 110 are reduced to approximately 20 ns. The synchronization pulse width and sub-carrier frequencies of the video input remain unchanged, and video input synchronization and sub-carrier SC/H timings are unaffected.
A software-based audio/video compression encoding engine 110 processes the digital audio/video stream with an audio/video compression codec. This preserves the quality of the audio/video stream without significant degradation while reducing the size of the binary audio/video stream, which can then be uncompressed and converted to audio/video output by a decoder device employing the same compression codec as the encoder engine 110. The compression ratio can vary greatly, from virtually lossless (less than 1% video quality loss) to a low bandwidth digital audio/video stream with a much higher loss of audio/video quality.
Compressed and uncompressed digital audio/video signals can optionally be automatically stored on a transmitted data storage device 115 for later transmission, compression, playback or editing by pre-selecting a storage option on the encoding engine 110 device. A digitized and/or compressed audio/video stream is later sent to the storage device for
archiving. Archived files can be additionally edited, played back, compressed, and/or transmitted unchanged at a later time.
A transmission and broadcast engine 120 is embedded/incorporated in a device having a network interface card (NIC), and is in charge of distribution of the digital audio/video stream over IP networks such as the Internet. The transmission and broadcast engine 120 distributes an audio/video signal as a point-to-point connection and broadcast via point-to-multipoint distribution using UDP and/or HTTP protocols. The transmission and broadcast engine 120 can multicast a single audio/video data stream that multiple users can receive simultaneously. This procedure functions only where permitted by network operations, since many networks lock out Multicast UDP protocols in order to avoid what they perceive as unnecessary network traffic.
The system 10 operates over IP network 130. This is a network "cloud" that has two or more computers. This can be the Internet with millions of client computing devices, or merely a local area network (LAN) with just a few computers. This network 130 operates using the IP format, and preferably supports UDP multicast for a more efficient broadcast distribution.
A network receiver 140 includes a network interface card and software drivers that are able to communicate using the IP protocol with an encoding device and its associated transmission and broadcast engine 120. The network receiver 140 is connected to the same IP network as the encoder device, and is capable of receiving a digital audio/video stream in the same format as it was sent without dropping packets, and maintains the same rate of reception set by the encoding engine 110. The received compressed digital audio/video stream is then sent to a decoder engine 150, and/or optionally to a received data storage device 160 for archiving and the like.
The received data storage device 160 provides space for the archiving of compressed digital audio/video data streams. Stored data can be retrieved and sent to the decoder engine for a decompression procedure to
convert the audio/video stream into an uncompressed playable format, or be used by editing software to perform desired editing procedures.
The decoder engine 150 is implemented as a software algorithm that is able to decode a compressed digital audio/video stream received by the network receiver 140 from the encoder engine 110 by means of the IP network 130, or optionally from the transmitted data storage device 115. The compressed audio/video stream is then converted by the decoder engine 150 into an uncompressed audio/video signal that can be used by a playback device 170, such as a video display. The playback device 170 converts the uncompressed digital audio/video stream from the decoder engine 150 into a playable analog audio/video format that can be sent to a monitor or speakers to further monitor the output.
Figure 5 illustrates the initialization of algorithm functions and routines 300 that can be performed at one or more locations in the system 10. Program interface loading 310 includes login and password input fields on an included GUI (Graphic User Interface). An example of such a GUI can be provided in the form of a web page that can be viewed by any computer with Internet browser capabilities and an IP network connection. A request for login and password occurs at 320; probing of an IP address and sending this information to an encoding unit that checks the validity of a connected user occurs at 340.
The IP Check procedure at 340 verifies an authorized decoder device by comparing the client's MAC (Media Access Control) address and subnet mask with an existing record in an access database. Since a MAC address is a unique identifier assigned to every network device and no two MAC addresses are the same, this function ensures a secure connection is in place in point-to-point sessions where audio/video information is intended for only one specific subscriber.
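The credential and MAC/subnet comparison can be sketched as below. The record layout, function name, and field names are assumptions for illustration; the access database is modeled as a plain dict.

```python
def authorize_client(client, access_db):
    """Verify a decoder the way the IP Check does: the username and
    password must match a stored record, and the client's MAC address
    and subnet mask must agree with that record. MAC comparison is
    case-insensitive, since MAC addresses are conventionally written
    in either case."""
    rec = access_db.get(client["username"])
    if rec is None or rec["password"] != client["password"]:
        return False  # unknown user or bad password
    return (rec["mac"].lower() == client["mac"].lower()
            and rec["subnet"] == client["subnet"])
```

A failed check here would feed the error message and logging path described at 355/360.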
The authentication procedure at 330 determines whether the username and password are valid. If authentication fails, an error message is displayed and a log entry is added with a timestamp and the IP information of a possible unauthorized user 360. By compiling results from the authentication and IP check procedures at 350, the program has two options: grant access to the user, or redirect to the error message and log entry creation procedure 330 if authentication fails. The error message function at 355 displays an event to the user depending on the result received from the authentication procedure or IP check, and logs and timestamps the client's IP address.
If authorization is granted, the decoder is checked for a '1 on 1' priority tag at 360, meaning it will receive the audio/video data stream alone and no other device will be authorized to view the same data channel unless it has the same priority tag. The tag can be configured to be granted only to mirror servers and/or a connection that requires higher security and session stability. This can be used to increase security for copyrighted material or sensitive audio/video transmissions, such as audio/video conferences, as well as to increase the quality of the connection.
If the connection is tagged as 1-1, then the source is set to "direct" 365, meaning that the client will get the audio/video stream directly from the encoder. If no 1-1 tag is present, then the session proceeds to a mirror checking procedure, which appoints an appropriate source for the audio/video data stream. A mirror lookup routine is initiated at 370 after a user has been checked for a 1-1 priority tag and, barring any such detection, can then access mirror servers if any are present.
At 375, if no mirror server computers are present, the client will attempt to connect to the encoder server directly. However, the encoder is only available if no 1-1 connections are established between the encoder and other clients. At 380-385, a check is made for an established 1-1 connection with the server; if a 1-1 session exists and the client is not part of it, the connection is refused and the session proceeds to an error message and termination signal, since all the resources and bandwidth available to the encoding server should be allocated to 1-1 sessions. This in turn indicates that it is a closed session, or that mirror servers have established connections with the encoder and will provide other users with retransmissions of the audio/video stream.
At 390, if mirror server(s) are present, the session is redirected to the "Find Fastest Mirror" routine 390; if not, it is redirected to the "Direct" source 400. The user's IP address is determined, after which a host lookup is performed. From the acquired client host information, a WHOIS query is made to an Internic database to determine the registration country of the primary host name, and an entry is added to a log file. A database is then contacted to look up a mirror site in the given country or region to ensure a faster connection and better network quality. If no results are found using the WHOIS function, the user is redirected to the fastest mirror server as determined by a PING command, or to a default streaming server.
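The country-based lookup with its PING and default-server fallbacks can be sketched as below. The mirror database is modeled as a plain dict, and all names (`pick_mirror`, the country keys) are illustrative assumptions.

```python
def pick_mirror(country, mirrors_by_country, fastest_by_ping, default_server):
    """Choose a stream source per the lookup above: prefer a mirror
    registered in the client's WHOIS country; if none is listed, fall
    back to the fastest mirror found by PING, or failing that, the
    default streaming server."""
    candidates = mirrors_by_country.get(country, [])
    if candidates:
        return candidates[0]
    return fastest_by_ping or default_server
```

With a database of `{"CA": ["mirror-toronto"]}`, a Canadian client is routed to `mirror-toronto`, while a client from an unlisted country falls back to the PING winner or the default server.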
At 400, the server source is set to "Direct" if bandwidth and CPU load permit and no next fastest mirror can be set as the source. At 420, a Protocol Check is initiated after a successful username/password and IP authentication. Data packets are sent in Multicast and Unicast (UDP) protocols; if a response is received within a permitted time frame, Multicast or Unicast is adopted as the default streaming protocol for the given user/group. If the UDP protocol is locked for the user, then the HTTP protocol will be used as the transport protocol for the audio/video stream.
At 430, a bandwidth check is initiated once the Protocol Check has completed, by sending data packets at different buffer size values and determining the mean value of the response time. If the result confirms an appropriate level of network performance, the user is passed directly to a Mirror lookup function with default buffer time settings. If degraded network performance is detected, the Bandwidth check function is reapplied with lower buffer values. After the results have been analyzed, techniques can be applied such as setting the buffering time to a value higher than the default value, decreasing the audio sampling rate, decreasing the bandwidth settings, resizing the video, and/or reducing the frame rate, all to achieve sustainable audio/video quality with lower bandwidth settings.
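The probe-and-retry bandwidth check can be sketched as follows. The `probe(size)` callback stands in for real packet I/O and returns a list of response times in milliseconds; function names and the acceptance rule are illustrative assumptions.

```python
def mean_response_ms(samples):
    """Mean round-trip time over a series of probe packets."""
    return sum(samples) / len(samples)

def bandwidth_check(probe, buffer_sizes, threshold_ms):
    """Send probe packets at each buffer size, largest first, and keep
    the largest buffer whose mean response time stays under the
    threshold; if none qualify, fall back to the smallest buffer
    (mirroring the 'reapply with lower buffer values' step)."""
    for size in sorted(buffer_sizes, reverse=True):
        if mean_response_ms(probe(size)) <= threshold_ms:
            return size
    return min(buffer_sizes)
```

A probe whose mean response time scales with buffer size illustrates the behavior: larger buffers are rejected until one fits under the threshold.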
At 440, adjust buffering, resize video and adjust audio bit rate to match the available bandwidth. At 450, display a warning if the bandwidth is lower than is necessary for high quality audio/video reception, such as "The bandwidth speed is Low". At 460, increase the buffering value, resize the video, reduce the frame rate, and change the sampling rate of the audio stream. At 470, initialize video streaming procedure. At 480, receive a termination signal after the streaming session is complete or interrupted.
At 490, when the fastest mirror server is found, the decoder is redirected to the fastest "Mirror" source to receive an audio/video stream from that server. At 500, a protocol check is initiated after successful username/password and IP authentications. Data packets are sent in Multicast and Unicast protocols; if a response is received within a permitted time frame, multicast or unicast is adopted as the default streaming protocol for the given user/group.
A Bandwidth Check is initiated at 510 once the Protocol Check has completed, by sending data packets at different buffer size values and determining the mean value of the response time. If the result confirms an appropriate level of network performance, the user is passed directly to a Mirror lookup function with default buffer time settings. If degraded network performance is detected, the Bandwidth check function is reapplied with lower buffer values. After the results have been analyzed, the buffering time is set to a value higher than the default value, the bandwidth settings are decreased, the video is resized, and the frame rate is reduced, all geared towards achieving better video quality with lower bandwidth settings.
At 520, display a warning if the bandwidth is lower than is necessary for high quality audio/video reception, such as "The bandwidth speed is Low". At 530, increase the buffering value, resize the video and reduce the frame rate. At 540, set the buffering and resize video to match the available bandwidth. At 550, initialize video streaming procedure. At 560, receive
termination signal after the streaming is complete or interrupted. At 570, end program.
Figure 6 illustrates initialization of algorithm functions and routines 600 that can be performed at one or more locations in the system 10. Program interface loading at 610 includes login and password input fields on an included GUI (Graphic User Interface).
At 620, ask the user for the Server IP address, username and password. At 630, establish a connection to the server using a TCP/IP connection. At 640, check authorization of the username and password, and the IP and MAC addresses. At 650, an error function gives an error message if the client was unable to establish a successful connection to the server, or if the username and password did not match. At 660, establish a session with the Server using the given credentials.
At 670, send an Echo signal, perform a handshake, and exchange headers to validate all connection properties and settings. At 680, open a listening port on the Client side for incoming video bit streams. At 690, start receiving streaming packets. At 700, prompt the user to select an output type, such as video out and/or storage. At 710, if storage is selected instead of "direct-to-video", the output is monitored. At 720, analyze video.
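Steps 680 and 690 can be sketched with standard UDP sockets; this is a minimal loopback illustration, not the full handshake, and the function names are assumptions.

```python
import socket

def open_stream_listener(port=0):
    """Open the client-side listening port for incoming video bit
    streams (step 680); port 0 asks the OS for any free local port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    return sock

def receive_packets(sock, count, bufsize=2048):
    """Receive `count` streaming packets (step 690), returning the raw
    payloads for the decoder or storage stage to consume."""
    return [sock.recvfrom(bufsize)[0] for _ in range(count)]
```

In a real deployment the listener would bind a well-known port on the NIC and feed the payloads to the decoder engine; here the sender and receiver share the loopback interface for demonstration.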
A check for the existence of a video output port, such as Composite or S-Video, is made at 730. The lower the graphics resolution and refresh rate, the better the image quality. All scan converters store the computer image to be converted to video in their own internal memory, and in order to do so the computer image has to be "sampled" multiple times during each scan line. Each sample stores one pixel of information in memory. The number of samples taken is proportional to the image quality: the more samples, the better. Higher graphics resolutions take less time to display each scan line than lower ones do; therefore there will be more samples per line in lower resolution modes, since there is more time for samples to be taken, and hence a better image quality.
At 740, an error message "No video output port detected" is displayed, and the video signal is directed to the default VGA port. At 750, the video output port (S-VHS, analog composite, or SCART) is initialized and selected. The absolute maximum resolution is 1600x1200; the maximum with no line dropping is 1024x768 in NTSC and 1280x1024 in PAL. The output is 24-bit compatible (23 bits stored), with a 24 kHz to 100 kHz horizontal scan rate. Virtually any vertical scan rate is accepted; therefore the horizontal scan rate is more important. Separate TTL-level HSync and VSync, positive or negative, are supported.
At 760, the output format is selected. The lower the graphics resolution, the better the 'vertical' image quality. Video monitors have a fixed number of lines available for displaying pictures: for PAL it is 576 and for NTSC 480, although some of these lines typically fall off the top and bottom edges of the screen. Therefore, the more scan lines a graphics resolution has (for example, an 800x600 resolution has 600 scan lines), the more difficult it is to squeeze all these lines into the limited number available on a TV monitor. Thus, lowering the graphics resolution helps to improve image quality. Software resizes the video signal accordingly to fit all lines, with the aspect ratio of the original image preserved. At 770, end.
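The aspect-preserving resize can be sketched as a small calculation; the function name is an assumption, and the PAL/NTSC line counts are the 576/480 figures stated above.

```python
def fit_to_tv(width, height, standard="PAL"):
    """Scale a source frame to fit the scan lines a TV standard offers
    (576 visible lines for PAL, 480 for NTSC) while preserving the
    aspect ratio of the original image, as the resizing software does."""
    max_lines = {"PAL": 576, "NTSC": 480}[standard]
    if height <= max_lines:
        return width, height  # already fits; no resize needed
    scale = max_lines / height
    return round(width * scale), max_lines
```

For example, an 800x600 computer image becomes 768x576 for PAL and 640x480 for NTSC, with the 4:3 aspect ratio preserved in both cases.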
The system 10 creates an audio/video link that enables broadcast quality NTSC (525 lines/60 Hz) or PAL (625 lines/50 Hz) video signals to be sent and received over any IP network with a bandwidth of less than 1 Mbps for the above resolutions. The system 10 performs as a standards converter that can encode a video signal in any format, be it NTSC, SECAM, or PAL, and decode it again into any desired format, such as NTSC or SECAM, in real time. The system 10 provides for uninterrupted video broadcasting by using mirror technology that finds the optimal connections between clients and servers, through techniques like maintaining fewer hops and avoiding congested zones.
By employing a protocol switching technology, the system 10 enables an audio/video signal to penetrate virtually any IP network to deliver a stable audio/video stream between client and server. If a multicast protocol is
blocked by a router or firewall settings, the software will switch to a slightly less efficient, yet more compatible/usable unicast protocol. In this way, the system 10 is flexible in the way it delivers content.
As well, if the connection between "encoder" and "decoder" is found to be slower than between "mirror" and "decoder", the system 10 will switch to an optimal streaming mirror server for more reliable stream acquisition. The system 10 ensures accurate transmission of TV quality video and CD quality audio signals over IP networks. The system 10 can be used in content delivery networks, telecommunications, live event streaming, corporate meetings, distance education, and telemedicine applications.
The system 10 requires a mere 1 Mbps of transmission bandwidth to transmit a quality signal. The system 10 provides automated standards conversion on both the input and output, together in the same device. The incorporation of automated standards conversion, automated protocol switching, and mirror checking all result in low bandwidth requirements for transmitting high quality audio and video. Both the encoder and decoder can be assembled using relatively inexpensive "off the shelf" components to provide a high quality video transmission device compatible with any IP network and virtually all audio/video standards. No dedicated ISDN line is required, and the system provides faster and clearer signal conversion from all codecs.
Although the present invention has been described in considerable detail with reference to certain preferred embodiments thereof, other versions are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the preferred embodiments contained herein.