Delay
A key design consideration in implementing voice communications networks is minimizing
one-way, end-to-end delay. Voice traffic is real-time traffic, and if voice packets are delayed
too long in delivery, speech becomes unrecognizable. An acceptable one-way delay is less
than 200 milliseconds. Some delay is inherent in voice networking and is caused by a number
of different factors.
There are two basic kinds of delay inherent in today's telephony networks:
• Propagation delay: caused by the finite speed at which signals travel through the
fiber-optic or copper medium of the underlying network.
• Handling delay (also called processing delay): caused by the devices that handle voice
information. It has a significant impact on voice quality in a packet network. This
delay includes the time it takes to generate a voice packet. DSPs may take 5 ms to 20 ms
to generate a frame, and usually one or more frames are placed in a voice packet.
Another component of this delay is the time taken to move the packet to the output
queue. Some devices expedite this process by determining the packet destination and
getting the packet to the output queue quickly. The time a packet spends in the output
queue before being serviced is yet another component of handling delay and is normally
around 10 ms. CODEC-induced delay is also considered a handling delay.
The table below shows the delay introduced by different CODECs.
CODEC             Bit Rate (Kbps)   Compression Delay (ms)
G.711 PCM         64                5
G.729 CS-ACELP    8                 15
G.729a CS-ACELP   8                 15
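As a rough illustration of how these components add up, the sketch below sums propagation, CODEC, and queuing delay and compares the total with the 200 ms target. The CODEC and queue figures come from the text above; the propagation speed in fiber (about 200 km per millisecond), the function name, the hop count, and the distance are assumptions made for the example. It ignores serialization and play-out buffering, so it is a lower bound rather than a full delay budget.

```python
# Codec delays (ms) taken from the table above.
CODEC_DELAY_MS = {"G.711": 5, "G.729": 15, "G.729a": 15}

FIBER_SPEED_KM_PER_MS = 200.0  # assumption: roughly two-thirds of c in fiber
QUEUE_DELAY_MS = 10.0          # typical output-queue delay cited in the text
BUDGET_MS = 200.0              # acceptable one-way delay cited in the text


def one_way_delay_ms(codec, distance_km, hops=1):
    """Estimate one-way delay as propagation + codec + per-hop queuing."""
    propagation = distance_km / FIBER_SPEED_KM_PER_MS
    return propagation + CODEC_DELAY_MS[codec] + hops * QUEUE_DELAY_MS


# Hypothetical path: 4000 km of fiber crossing 5 queuing devices.
delay = one_way_delay_ms("G.729", distance_km=4000, hops=5)
print(f"{delay:.1f} ms, within budget: {delay < BUDGET_MS}")
```

With these assumed values, queuing across several hops contributes more than either propagation or the CODEC itself, which is why per-device handling delay matters so much.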
Serialization delay
Serialization delay is the amount of time a router takes to place a packet on a wire for
transmission. Fragmentation reduces the serialization delay that a large packet imposes on
the packets behind it, but fragmentation, such as FRF.12, doesn't help without a queuing
mechanism in place. For example, if a 1000-byte packet enters a router's queue and is
fragmented into ten 100-byte packets, without a queuing mechanism in place the router will
still send all 1000 bytes before it starts to send another packet. Conversely, if there is a
queuing mechanism in place but no fragmentation, voice traffic can still suffer. If a router
begins sending a 1000-byte packet an instant before it receives a voice packet, the voice
packet has to wait until all 1000 bytes have been sent across the wire, because once a router
starts sending a packet, it continues until the full packet has been transmitted.
Therefore, a router needs both a method for breaking large data packets into smaller ones
and a queuing strategy that lets voice packets jump to the front of the queue, ahead of data
packets, for transmission.
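The effect of fragmentation on a waiting voice packet can be estimated with simple arithmetic: serialization delay is the packet size in bits divided by the link rate. The sketch below applies this to the 1000-byte and 100-byte cases above. The 64 kbps link rate and the function name are assumptions chosen for illustration; the text does not specify a link speed.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time in milliseconds to clock a packet of the given size onto the wire."""
    return packet_bytes * 8 * 1000 / link_bps


LINK_BPS = 64_000  # assumption: a 64 kbps WAN link

# Worst-case wait for a voice packet that arrives just after transmission starts.
unfragmented = serialization_delay_ms(1000, LINK_BPS)  # whole 1000-byte packet
fragmented = serialization_delay_ms(100, LINK_BPS)     # one 100-byte fragment

print(f"wait behind 1000-byte packet: {unfragmented} ms")
print(f"wait behind 100-byte fragment: {fragmented} ms")
```

On the assumed link, the unfragmented packet alone would consume more than half of a 200 ms one-way budget, while a single fragment costs a tenth of that, provided the queuing mechanism lets the voice packet go next.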
End-to-End delay
End-to-end delay depends on the end-to-end signal paths/data paths, the CODEC, and the
payload size of the packets.
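The link between CODEC, payload size, and delay can be made concrete: the payload carried in each packet is the CODEC bit rate multiplied by the amount of speech collected per packet, and the sender must wait that long before the packet can leave. The sketch below is a minimal calculation; the 20 ms packet interval and the function name are assumptions for the example, not values from the text.

```python
def payload_bytes(bit_rate_kbps, packet_ms):
    """Voice payload per packet: kbps * ms gives bits, divided by 8 gives bytes."""
    return bit_rate_kbps * packet_ms / 8


PACKET_MS = 20  # assumption: 20 ms of speech per packet

for codec, rate_kbps in (("G.711", 64), ("G.729", 8)):
    size = payload_bytes(rate_kbps, PACKET_MS)
    print(f"{codec}: {size:.0f} bytes of payload every {PACKET_MS} ms")
```

Larger payloads reduce header overhead per packet but add packetization delay, since more speech has to be collected before each packet is sent.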
Jitter
Jitter is the variation in the arrival delay of voice packets at the receiver. It causes
discontinuities in the voice stream and is usually compensated for with a play-out buffer
that plays out the voice smoothly. Play-out control can be either adaptive or non-adaptive.
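A fixed (non-adaptive) play-out buffer can be sketched in a few lines: each packet is scheduled for play-out a constant delay after it was sent, and any packet that arrives after its slot is discarded. The function name, the packet timings, and the 60 ms play-out delay below are hypothetical, and the sketch assumes that send and arrival times are expressed on a common clock.

```python
def playout_schedule(packets, playout_delay_ms):
    """Return the play-out time for each packet, or None if it arrived too late.

    packets is a list of (send_ms, arrival_ms) pairs on a common clock.
    """
    schedule = []
    for send_ms, arrival_ms in packets:
        play_ms = send_ms + playout_delay_ms  # fixed offset from send time
        schedule.append(play_ms if arrival_ms <= play_ms else None)
    return schedule


# Packets sent every 20 ms with varying network delay (hypothetical values).
packets = [(0, 30), (20, 65), (40, 75), (60, 150)]
print(playout_schedule(packets, playout_delay_ms=60))  # [60, 80, 100, None]
```

The first three packets are played at an even 20 ms spacing despite their uneven arrival, while the last one misses its slot. An adaptive scheme would adjust the play-out delay over time instead of fixing it in advance, trading added delay against late-packet loss.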
Echo Cancellation
Echo is hearing your own voice in the telephone receiver while you are talking. When timed
properly, echo is reassuring to the speaker. If the echo is delayed by more than
approximately 25 ms, it can be distracting and cause breaks in the conversation. In a
traditional telephony network, echo is normally caused by an impedance mismatch at the
conversion from the four-wire network switch to the two-wire local loop and is controlled
by echo cancellers.
In voice over packet-based networks, or VoIP, echo cancellers are built into the low
bit-rate CODECs and run on each DSP. Echo cancellers are limited, by design, in the total
amount of time they will wait for the reflected speech to be received, which is known as
the echo trail. The echo trail is normally 32 ms.
Reliability
Traditional data communication strives to provide reliable end-to-end communication
between two peers. Data protocols use checksums and sequence numbering for error control
and some form of negative acknowledgement with a packet retransmission handshake for
error recovery. The negative acknowledgement and the subsequent retransmission add more
than a round-trip delay to transmission. For time-critical data, the retransmitted packet
may therefore arrive too late to be of any use. Thus, VoIP networks should leave error
control and error recovery to higher communication layers, which can provide the level of
reliability required while taking the impact on delay into account. Therefore, UDP is the
transport-level protocol of choice for voice and similar communications, and reliability
is built into higher layers.
Audio data is delay-sensitive: voice packets must reach the destination with minimum
delay and minimum jitter. Although TCP provides a reliable connection, it does so at the
cost of packet delay, that is, higher network latency. UDP, on the other hand, is faster
than TCP. However, because packet sequencing and some degree of reliability are still
required over UDP/IP, RTP over UDP/IP is usually used for voice and video
communication.
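The sequencing information that RTP adds on top of UDP lives in a fixed 12-byte header carrying a sequence number, a timestamp, and a source identifier. The sketch below packs and unpacks that header with Python's struct module. It is a minimal illustration and not a full RTP implementation: it assumes no padding, no header extension, and no contributing sources, and the helper names and field values are hypothetical.

```python
import struct

RTP_VERSION = 2
RTP_HEADER_FORMAT = "!BBHII"  # network byte order, 12 bytes in total


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the fixed RTP header (no padding, extension, or CSRC entries)."""
    byte0 = RTP_VERSION << 6  # version 2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(RTP_HEADER_FORMAT, byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the fixed RTP header from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(RTP_HEADER_FORMAT, data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Payload type 18 is the static RTP assignment for G.729; other values are made up.
header = pack_rtp_header(payload_type=18, seq=1, timestamp=160, ssrc=0x1234)
print(len(header), unpack_rtp_header(header))
```

A receiver can use the sequence number to detect loss and reordering and the timestamp to drive its play-out buffer, without the retransmission delay that TCP would introduce.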
Interoperability
In a public network environment, products from different vendors must conform to
standards in order to interoperate with each other. These standards are developed by the
ITU-T and the IETF. H.323 from the ITU-T is by far the more popular standard. However, the
SIP and MGCP standards from the IETF are rapidly gaining acceptance as relatively
lightweight and easily scalable protocols.
Security
On the Internet, since anybody can capture packets meant for someone else, security of
voice communication becomes an important issue. Some measure of security can be
provided by using encryption and tunneling. A commonly used tunneling protocol is the
Layer 2 Tunneling Protocol (L2TP), and a commonly used encryption mechanism is Secure
Sockets Layer (SSL).
Integration with PSTN and ISDN
IP telephony needs to coexist with the traditional PSTN for some time to come. This means
that the PSTN and IP telephony networks should appear as a single network to users. This is
achieved through the use of gateways between the Internet on the one hand and the PSTN or
ISDN on the other.
Scalability
As successive VoIP products strive to provide telco-grade voice quality over IP, comparable
to that of the PSTN but at progressively lower cost, there is potential for high growth rates
in VoIP systems. In such a scenario, it is essential that these systems be flexible enough to
scale to large user markets.