Introduction:
Voice communication has traditionally been carried over dedicated telephone networks
operated by telecommunication service providers such as BSNL and MTNL in India or
AT&T in the USA. These telephone networks have progressively evolved from the initial analog
circuits to the current digital networks with bandwidth in excess of 1 Gbps. Because
bandwidth and networking requirements varied, different services were provided on
separate networks. For example, telegraph, telex, telephone, facsimile, cable and data
networks each supported the service its name suggests.
These networks possessed characteristics that satisfied the particular requirements of the
service they provided. For example, the voice network would support a bandwidth of 64
Kbps per voice channel and would ensure telco-grade voice communication with little
jitter and with echo cancellation. Likewise, the cable networks would provide even higher
bandwidth and improved quality of service (QoS) for video transmission. The bandwidth
and QoS requirements of data communication networks, on the other hand, are highly
flexible. The application could be Telnet, which requires only a thin data pipe but
reasonably fast network response times, or a batch transmission application, which
requires higher throughput but can tolerate larger inter-packet delivery delays. For most
types of data communication applications reliability is critical, so the delivery protocols
implement mechanisms for error checking, acknowledgment, retransmission and sequencing.
For real-time applications such as voice communication, however, it makes little sense to
retransmit a lost packet for playback at the receiving end if it arrives out of sequence and
considerably delayed. The main point is that these networks were designed differently in
terms of their underlying architecture and communication protocols.
In the late eighties and early nineties it was realized that merging these networks into a
single integrated network, in which all services use common facilities, would improve
efficiency and save costs. This was the new mantra that made possible the creation and
deployment of ISDN and similar networks, bringing data and voice communication together.
However, nearly all these networks were built and operated by major telecommunication
equipment manufacturers and service providers. Although the major international standards
bodies such as ITU-T (formerly CCITT) and ETSI defined a relevant set of standards for
implementation and to assure interoperability between products from different telecom
equipment manufacturers, these standards were still inadequate to reduce the proprietary
nature of most implementations. Even where the standards assured interoperability
among equipment and networks for existing communication services (which number only in
the dozens), proprietary implementations meant that they fell woefully short of enabling
a much wider variety of new communication services. Consider, by contrast, how
the Internet has spawned innumerable types of web-based
applications at progressively lower costs.
Subsequently, from the mid-nineties onwards, the Internet proved to be the major
all-encompassing network, demonstrating its ability to deliver all types of
media (data, voice and video) at the lowest cost. Data communication equipment manufacturing
Network Convergence and VoIP Debashish Mitra 2 of 36
companies, such as Cisco, have also been instrumental in extending the reach of the
Internet and Internet protocols. Internet protocols became the preferred means of
delivering communication payload over all types of networks, mainly because of their open
and widely accepted interface implementations. Contrast this with ATM, which has been left
behind.
However, a major shortcoming of the Internet protocols (TCP and UDP over IP) has been their
inability to carry real-time application data such as voice and video adequately. The major
issues were jitter, network latency, echo, quality of service and security. To overcome this
shortcoming, newer implementations of IP (e.g. IP version 6) and a flurry of associated
protocol specifications (e.g. H.323 or SIP) were defined to plug the gap between the Internet
protocols and other telecom application-oriented protocols. The development and
implementation of new IP-based protocols for multimedia communication, their
underlying network architecture and their integration with existing networks are collectively
termed Voice over IP, or VoIP for short.
The effort to integrate all communication services over IP is a transition on two major
fronts. First, the telecommunication equipment manufacturers were interested in migrating
the currently deployed services and network protocols to IP. Second, the data
communication equipment manufacturers, who were already using IP for data
communication services, were moving upward to provide voice and multimedia services over
data networks.
Together, the above efforts and the work of the various standards bodies are intended to
achieve service portability, network convergence and secure network
access. It is hoped that moving voice (and multimedia) over to Internet protocols
will open the door to conceiving and implementing thousands of services,
up from the current dozens.
Issues in voice communication over networks
As the IP network was primarily designed to carry data, it does not provide real-time
guarantees but only a best-effort service, which is inadequate for voice
communication. Upper-layer protocols were designed to provide such guarantees. Further, as
several vendors in the market implement these protocols, conformance to
standards and interoperability have become important. The major issues governing
the transfer of a voice stream over the Internet, or using Internet protocols, are listed below.
Bandwidth requirement
In the analog world, voice transmission requires a frequency spectrum of 0-3.4 KHz in
the base band, nominally called a 4 KHz voice channel for convenience. For digital
telecommunication, the signal must be sampled at no less than twice its highest frequency
(the Nyquist rate). The minimum sampling rate required is thus 8 KHz. If each sample
contains 8 bits, the digital bandwidth required works out to 8,000 samples/s x 8 bits =
64 Kbps.
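The arithmetic above can be checked with a short sketch. The function name and its parameters are illustrative assumptions, not part of any standard; the only inputs are the nominal 4 KHz channel width and the sample size given in the text.

```python
def pcm_bit_rate_bps(max_frequency_hz: int, bits_per_sample: int) -> int:
    """Return the uncompressed PCM bit rate in bits per second.

    The sampling rate is taken as twice the highest frequency in the
    channel (the Nyquist rate), as described in the text.
    """
    sampling_rate_hz = 2 * max_frequency_hz
    return sampling_rate_hz * bits_per_sample


# Nominal 4 KHz voice channel, 8 bits per sample.
rate_bps = pcm_bit_rate_bps(4_000, 8)
print(f"{rate_bps / 1000:.0f} Kbps")
```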
Telco-quality voice requires sampling at 8 KHz. The bandwidth then depends on the level of
quantization. With linear quantization at 8 bits/sample or at 16 bits/sample, the bandwidth is
64 Kbps or 128 Kbps respectively. Further, the quantization (e.g. in PCM) is modified by
using an A-law or μ-law companding curve.
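As an illustration of companding, the sketch below applies the continuous μ-law curve with μ = 255 to samples normalized to the range -1 to 1. This is a simplified model: deployed G.711 encoders use a segmented (piecewise-linear) approximation of this curve, and the function names and the 8-bit scaling step here are assumptions made for the example.

```python
import math

MU = 255  # mu value used for mu-law companding in telephony


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] using the continuous mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Invert mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


# Quiet samples are spread over many more of the 8-bit codes than
# linear quantization would give them.
for x in (0.01, 0.1, 0.5, 1.0):
    companded = mu_law_compress(x)
    print(f"x={x:<5} linear code={round(x * 127):>3} "
          f"companded code={round(companded * 127):>3}")
```

The design point is that low-amplitude speech, which dominates conversation, is quantized more finely than loud peaks, so 8 bits per sample give acceptable quality.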
To communicate telco-grade voice (or, similarly, other real-time media such as
moving video), two different approaches can be attempted: transmit information of the
highest quality over unrestricted bandwidth, or reduce the bandwidth required for
transmitting information (voice) of a given quality. Stated differently, decisions are required
regarding what information should be transmitted and how it should be transmitted.
Compression and decompression (CODEC) of digital signals is a means of reducing the
required bandwidth or transmission bit rate. Certain source data are highly redundant,
particularly digitized images such as video and facsimile. If, for example, a digital signal
contains a string of zeroes, it is more economical to transmit a code indicating that a string of
zeroes follows, along with the length of the string. Many different algorithms for compression
and decompression of digital codes have been devised.
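The zero-string example above is a form of run-length encoding. A minimal sketch follows, assuming the signal is given as a string of '0' and '1' characters; the token format ("Z", length) and the function names are hypothetical choices for illustration and do not correspond to any particular standard.

```python
def encode_zero_runs(bits: str) -> list:
    """Replace each run of '0' characters with a ("Z", run_length) token.

    All other symbols are passed through unchanged.
    """
    tokens = []
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            j = i
            while j < len(bits) and bits[j] == "0":
                j += 1
            tokens.append(("Z", j - i))  # code for "a run of zeroes follows" plus its length
            i = j
        else:
            tokens.append(bits[i])
            i += 1
    return tokens


def decode_zero_runs(tokens: list) -> str:
    """Rebuild the original bit string from the token list."""
    return "".join("0" * t[1] if isinstance(t, tuple) else t for t in tokens)


signal = "1000000001"
encoded = encode_zero_runs(signal)
print(encoded)
print(decode_zero_runs(encoded) == signal)
```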
Pulse code modulation (PCM) and adaptive differential PCM (ADPCM) are examples of
"waveform" CODEC techniques. Waveform CODECs are compression techniques that
exploit the redundant characteristics of the waveform itself. In addition to waveform
CODECs, there are source CODECs that compress speech by sending only simplified
parametric information about voice transmission; these CODECs require less bandwidth.
Source CODECs include linear predictive coding (LPC), code-excited linear prediction
(CELP) and multipulse-multilevel quantization (MP-MLQ).
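To make the idea of a waveform CODEC concrete, the sketch below shows plain differential PCM with a fixed step size: each sample is sent as a small quantized difference from the previous reconstructed value. This is a deliberately simplified stand-in, not ADPCM as standardized; a real ADPCM coder also adapts the step size and uses a better predictor. The function names and the step value are assumptions for the example.

```python
def dpcm_encode(samples, step=4):
    """Encode each sample as the quantized difference from the running prediction."""
    codes = []
    predicted = 0
    for sample in samples:
        code = round((sample - predicted) / step)
        codes.append(code)
        predicted += code * step  # mirror what the decoder will reconstruct
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild the (approximate) waveform by accumulating the differences."""
    samples = []
    predicted = 0
    for code in codes:
        predicted += code * step
        samples.append(predicted)
    return samples


original = [100, 103, 109, 111, 108]
codes = dpcm_encode(original)
print(codes)
print(dpcm_decode(codes))
```

Because neighbouring speech samples are strongly correlated, the differences after the first sample are small and can be carried in fewer bits than the samples themselves, at the cost of a reconstruction error of at most half a step.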
Coding techniques for telephony and packet voice are standardized by the ITU-T in its
G-series recommendations.
Some algorithms for voice compression and decompression are given in the table below.
Each CODEC provides a certain quality of speech. The quality of transmitted speech is a
subjective response of the listener. A common benchmark used to determine the quality of
sound produced by specific CODECs is the mean opinion score (MOS). With MOS, a wide
range of listeners judge the quality of a voice sample (corresponding to a particular CODEC)
on a scale of 1 (bad) to 5 (excellent). The scores are averaged to provide the MOS for that
sample. The table below shows the relationship between CODECs and MOS scores.
Compression Methods and MOS Scores
Compression Method     Bit Rate (Kbps)   Framing Size (ms)   MOS Score
G.711 PCM              64                1.25                4.1
G.729 CS-ACELP         8                 10                  3.92
G.729 x 2 Encodings    8                 10                  3.27
G.729 x 3 Encodings    8                 10                  2.68
G.729a CS-ACELP        8                 10                  3.7
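The MOS procedure described above amounts to averaging listener ratings on the 1 to 5 scale. A minimal sketch, in which the function name and the listener ratings are made up purely for illustration:

```python
def mean_opinion_score(ratings):
    """Average a list of listener ratings, each on the 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from five listeners for one voice sample.
listener_ratings = [4, 4, 5, 3, 4]
print(mean_opinion_score(listener_ratings))
```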
Although it might seem logical from a resource usage standpoint to convert all calls to low
bit-rate CODECs to save on infrastructure costs, there are drawbacks to compressing voice.
One of the main drawbacks is signal distortion due to multiple encodings (called tandem
encodings).
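The cost of tandem encodings can be read off the G.729 rows of the table. The sketch below only does that subtraction; the dictionary and function names are illustrative, and the MOS values are the ones listed in the table.

```python
# MOS for G.729 after 1, 2 and 3 successive encodings (values from the table).
G729_MOS_BY_ENCODINGS = {1: 3.92, 2: 3.27, 3: 2.68}


def mos_drop_per_stage(mos_by_stage):
    """Return the MOS lost at each additional encoding stage."""
    stages = sorted(mos_by_stage)
    return {
        later: round(mos_by_stage[earlier] - mos_by_stage[later], 2)
        for earlier, later in zip(stages, stages[1:])
    }


for stage, drop in mos_drop_per_stage(G729_MOS_BY_ENCODINGS).items():
    print(f"encoding {stage}: MOS falls by {drop}")
```

Each extra G.729 encoding in the call path costs roughly 0.6 MOS points in this table, which is why a path that transcodes repeatedly can sound markedly worse than the bit rate alone would suggest.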
Detailed topics:
Audio Codec Configuration
Factors to be considered for VoIP
Difference between voice and VoIP call handling
Numbering Scheme
Session Initiation Protocol (SIP)
Related VoIP Protocols
H.323 Standard
Quality of Service