RTP is widely used for streaming audio and video; it is designed for applications that send
data in one direction with no acknowledgment. The header of each RTP datagram contains
a timestamp, so the application receiving the datagram can reconstruct the timing of the
original data. It also contains a sequence number, so the receiving side can deal with
missing, duplicate, or out-of-order datagrams.
The two RTP streams, one in each direction, make up the bi-directional conversation
itself, and they are the main factors in determining the call quality of a voice
conversation. Let’s look at the composition of the RTP datagrams that carry the voice data.
All the fields related to RTP sit inside the UDP payload, so, like UDP, RTP is a connectionless
protocol. The software that implements RTP is not commonly part of the TCP/IP protocol
stack, so applications are coded to add and recognize an additional 12-byte header in
each UDP datagram. The sender fills in each header, which contains four important fields:
RTP Payload Type
Indicates which codec to use. The codec conveys the type of data (such as voice,
audio, or video) and how it is encoded.
Sequence Number
Helps the receiving side reassemble the data and detect lost, out-of-order, and
duplicate datagrams.
Timestamp
Used to reconstruct the timing of the original audio or video. It also helps the receiving
side determine how consistent the arrival times are; the variation in arrival times is known as jitter.
It’s the timestamp that brings real value to RTP. An RTP sender puts a timestamp
in each datagram it sends. The receiving side of an RTP application sees when each
datagram actually arrives and compares this to the timestamp. If the time between
datagram arrivals is the same as the time between their timestamps, there’s no variation. However,
there could be lots of variation in the arrival times of datagrams depending on
network conditions, and the receiving side can easily calculate this jitter.
Source ID
Lets the software at the receiving side distinguish among multiple, simultaneous incoming
streams.
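As a rough illustration of the four fields above, here is a minimal Python sketch that pulls them out of the 12-byte header and keeps a running jitter estimate. The names `RtpHeader`, `parse_rtp_header`, `transit_ticks`, and `update_jitter` are made up for this example. The 8000 Hz timestamp clock is an assumption (it is typical for narrowband voice codecs), the smoothing by 1/16 follows the running average described in RFC 3550, and timestamp wraparound is ignored to keep the sketch short.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int  # the "Source ID" field


def parse_rtp_header(datagram: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header at the start of a UDP payload."""
    if len(datagram) < 12:
        raise ValueError("an RTP header needs at least 12 bytes")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit source identifier.
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    return RtpHeader(
        version=first >> 6,          # top two bits of the first byte
        payload_type=second & 0x7F,  # low seven bits; the top bit is the marker
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def transit_ticks(arrival_seconds: float, rtp_timestamp: int,
                  clock_rate: int = 8000) -> float:
    """Arrival time minus RTP timestamp, both in timestamp clock ticks.

    clock_rate=8000 is an assumption; wraparound is not handled.
    """
    return arrival_seconds * clock_rate - rtp_timestamp


def update_jitter(jitter: float, prev_transit: float, curr_transit: float) -> float:
    """One step of the smoothed jitter estimate: J += (|D| - J) / 16."""
    d = abs(curr_transit - prev_transit)
    return jitter + (d - jitter) / 16.0


# A hand-built datagram: version 2, payload type 18, sequence 1, timestamp 160.
sample = struct.pack("!BBHII", 0x80, 18, 1, 160, 0x1234ABCD) + b"\x00" * 20
header = parse_rtp_header(sample)
print(header)

# Three datagrams sent 20 ms apart (160 ticks); the third arrives 5 ms late.
arrivals = [(0.000, 0), (0.020, 160), (0.045, 320)]
jitter = 0.0
prev = None
for arrival, ts in arrivals:
    curr = transit_ticks(arrival, ts)
    if prev is not None:
        jitter = update_jitter(jitter, prev, curr)
    prev = curr
print("jitter in ticks:", jitter)
```

Only the differences between transit values matter here, so the sender and receiver clocks do not need to be synchronized for this estimate.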
The accumulation of headers can add a lot of overhead, depending on the size of the data
payload. For example, a typical payload size when using the G.729 codec is 20 bytes, which
means that the codec outputs 20-byte chunks of the VoIP call at a predetermined rate
specific to that codec. With RTP, two-thirds of the datagram is the header because the total
header overhead consists of:
RTP (12 bytes) + UDP (8 bytes) + IP (20 bytes) = 40 bytes
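To see how the header share changes with payload size, the arithmetic above can be sketched in a few lines of Python. The function name `header_fraction` is invented for this example, and the 20-byte IP figure assumes an IPv4 header with no options.

```python
RTP_HEADER = 12  # bytes
UDP_HEADER = 8   # bytes
IP_HEADER = 20   # bytes, assuming IPv4 with no options


def header_fraction(payload_bytes: int) -> float:
    """Share of each datagram taken up by the RTP + UDP + IP headers."""
    overhead = RTP_HEADER + UDP_HEADER + IP_HEADER
    return overhead / (overhead + payload_bytes)


print(f"{header_fraction(20):.0%}")   # 20-byte G.729 payload  -> 67%
print(f"{header_fraction(160):.0%}")  # 160-byte payload       -> 20%
```

The same 40 bytes weigh far less against a larger payload, which is why low-bit-rate codecs suffer the most from header overhead.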
Real bandwidth consumption by VoIP calls is higher than it first appears. The G.729 codec,
for example, has a data payload rate of 8 kbps. Its actual bandwidth usage is higher than
this, however. When sent at 20 ms intervals, its payload size is 20 bytes per datagram. To
this, add the 40 bytes of IP, UDP, and RTP headers (yes, the headers are bigger than the payload) and any
additional layer 2 headers. For example, Ethernet drivers generally add 18 more bytes. Also,
there are two concurrent RTP flows (one in each direction), so double the bandwidth consumption
you’ve calculated so far. The “Combined Bandwidth” column in the table below
shows a truer picture of actual bandwidth usage for some common codecs.
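The steps in the preceding paragraph can be sketched as a small Python calculation. The function name `combined_bandwidth_kbps` and its parameters are invented for this example; the 40-byte and 18-byte figures are the ones quoted in the text, and other layer 2 technologies would need a different value.

```python
IP_UDP_RTP_BYTES = 40  # IP (20) + UDP (8) + RTP (12)
ETHERNET_BYTES = 18    # layer 2 overhead quoted in the text


def combined_bandwidth_kbps(payload_bytes: int, interval_ms: float,
                            l2_bytes: int = ETHERNET_BYTES,
                            flows: int = 2) -> float:
    """Bandwidth used by all RTP flows of a call, headers included."""
    frame_bytes = payload_bytes + IP_UDP_RTP_BYTES + l2_bytes
    datagrams_per_second = 1000 / interval_ms
    return frame_bytes * 8 * datagrams_per_second * flows / 1000


# G.729: 20-byte payload every 20 ms
print(round(combined_bandwidth_kbps(20, 20), 2))   # 62.4
# G.711: 160-byte payload every 20 ms
print(round(combined_bandwidth_kbps(160, 20), 2))  # 174.4
```

Under these assumptions, an 8 kbps codec ends up using 62.4 kbps for the two directions of one call.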
Some IP phones let you set the “delay between packets” or “speech packet length,” that is,
the rate at which the sender delivers datagrams into the network. For example, at 64 kbps, a
“20 millisecond speech datagram” implies that the sending side creates a 160-byte datagram
payload every 20 ms. There is a simple equation that relates the codec speed, the delay between
voice datagrams, and the datagram payload size:
Payload size (in bytes) = [Codec speed (in bits/sec) x datagram delay (ms)] / [8 (bits/byte) x 1000 (ms/sec)]
In this example:
160 bytes = (64000 x 20)/8000
For a given data rate, increasing the delay causes the datagrams to get larger, since the datagrams
are sent less frequently to transport the same quantity of data. A delay of 30 ms at a
data rate of 64 kbps would mean sending 240-byte datagrams.
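The equation above translates directly into code. This is a minimal Python sketch; the function name `payload_size_bytes` is chosen only for this example.

```python
def payload_size_bytes(codec_bps: float, delay_ms: float) -> float:
    """Payload size = codec speed (bits/sec) x delay (ms) / (8 x 1000)."""
    return codec_bps * delay_ms / (8 * 1000)


print(payload_size_bytes(64000, 20))  # 160.0
print(payload_size_bytes(64000, 30))  # 240.0
print(payload_size_bytes(8000, 20))   # 20.0
```

The last line matches the 20-byte G.729 payload mentioned earlier: 8 kbps sent at 20 ms intervals.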
Codec            Nominal      Packetization   Typical         Combined Bandwidth
                 Data Rate    Delay           Datagram Size   for 2 Flows
G.711u           64.0 kbps    1.0 ms          20 ms           174.40 kbps
G.711a           64.0 kbps    1.0 ms          20 ms           174.40 kbps
G.726-32         32.0 kbps    1.0 ms          20 ms           110.40 kbps
G.729            8.0 kbps     25.0 ms         20 ms           62.40 kbps
G.723.1 MPMLQ    6.3 kbps     67.5 ms         30 ms           43.73 kbps
G.723.1 ACELP    5.3 kbps     67.5 ms         30 ms           41.60 kbps
Sunday, February 10, 2008
Voice Streaming Protocols