Digital technologies now offer a variety of new communication channels. Over the last few years, Voice over IP (VoIP) has become increasingly significant. On the one hand, VoIP reduces costs and offers a high degree of flexibility. On the other hand, it makes quality monitoring more complex, and identifying the root cause of performance issues has become even more difficult.
VoIP calls frequently suffer from poor quality and even interruptions, especially in locations with a weak internet connection.
So what can we do when users keep calling to complain that the connection quality is bad, but we have no idea why?
Targeted monitoring of all involved systems (switchboard, internet line, local network infrastructure, geographic links between branch offices) is obviously a good first step, but it is generally not enough to find the root cause of the annoying interruptions during VoIP calls.
For simplicity, we consider only the aspects that affect the situation described above (so we leave out the SIP protocol, which manages phone sessions). Let’s concentrate on what happens when two people are connected and “try” to communicate.
To optimize line usage, conversation data is usually encoded and compressed by a codec, normally G.729, which offers rather good quality at a modest bandwidth usage (24.8 Kbps).
To keep latency low, the audio is sent via UDP. This means we have no control over packet delivery: under overload, lost packets are not resent and therefore never arrive at their destination. This is what causes the unwelcome interruptions.
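Because nothing is retransmitted, the receiver only ever sees the gaps. As a minimal sketch, assuming each packet carries an increasing sequence number (as RTP packets do), lost packets can be counted from the numbers that actually arrived. The function name `count_lost` and the sample data are hypothetical, and sequence-number wraparound is deliberately ignored here.

```python
def count_lost(received_seq):
    """Count packets missing from a stream, given the sequence numbers that arrived.

    Assumption: sequence numbers increase by one per packet and do not wrap.
    Duplicates and reordering are tolerated; only true gaps are counted.
    """
    seen = sorted(set(received_seq))
    if not seen:
        return 0
    expected = seen[-1] - seen[0] + 1  # packets the sender must have emitted
    return expected - len(seen)


# Hypothetical arrival log: packets 4, 7 and 8 never showed up.
arrived = [1, 2, 3, 5, 6, 9]
lost = count_lost(arrived)
print(f"{lost} of {arrived[-1] - arrived[0] + 1} packets lost")
```

Every missing number is a slice of audio that is simply gone, which the listener hears as a dropout.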
The communication between two phones (let’s assume an internal call between two different branch offices) can take place in two ways:
Peer-to-Peer: The phones communicate directly.
Non Peer-to-Peer: The phones communicate through a switchboard.
These two modes can also coexist, depending on the configuration and on the connectivity between the two branch offices. Non Peer-to-Peer communication needs twice as much bandwidth as Peer-to-Peer communication, since the data has to pass over the line twice.
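To make the factor of two concrete, here is a rough sketch that takes the 24.8 Kbps per-call figure quoted above at face value and compares the two modes. The function name `link_load_bps` and the example of ten calls are assumptions for illustration only; real values depend on the codec settings and link type in use.

```python
PER_CALL_BPS = 24_800  # the G.729 figure quoted in the text, in bits per second


def link_load_bps(calls, via_switchboard):
    """Estimate the load that a number of parallel calls puts on the line.

    Assumption: a call relayed by the switchboard crosses the line twice,
    a Peer-to-Peer call crosses it once.
    """
    passes = 2 if via_switchboard else 1
    return calls * PER_CALL_BPS * passes


for mode, relayed in (("Peer-to-Peer", False), ("Non Peer-to-Peer", True)):
    kbps = link_load_bps(10, relayed) / 1000
    print(f"10 calls, {mode}: {kbps:.1f} Kbps")
```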
It is now clear that a traditional monitoring approach does not collect all the information needed to troubleshoot these performance issues.
This is where IP SLA, developed by Cisco, comes to our aid. IP SLA measures the performance of IP services on a network of Cisco devices. It continuously and reliably generates traffic between two devices (switches, routers) to obtain quality indicators for that path. In this way, we can determine how VoIP behaves between branch offices A and B, between A and C, and between B and C. Once we understand this, we can take suitable measures.
Two indicators provide information about the quality of a conversation:
MOS (Mean Opinion Score): A value between 1 and 5, where 5 means excellent audio quality. [Depending on the codec, a value near 5 can be reached.]
ICPIF (Calculated Planning Impairment Factor): A value derived from five different values, ranging from 5 (excellent) to 55 (extremely bad). Normally, a “good” conversation has a value of 10.
The indicators for the affected periods show that MOS and ICPIF deteriorated during the conversations. During the same periods, the network link between Milan and Catanzaro was overloaded (see images below).
Now that we have identified the problem, we can configure QoS (Quality of Service) for communications between these two branch offices. To do so, we have to estimate the number of conversations that must be possible simultaneously. Once we have set this number, we have to monitor it to understand whether the available bandwidth is sufficient.
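The sizing question can be sketched as simple arithmetic. The example below assumes the 24.8 Kbps per-call figure mentioned earlier and a hypothetical 512 Kbps reserved for voice; both the function names and the reserved bandwidth are illustrative, not values from the actual Milan-Catanzaro setup.

```python
PER_CALL_BPS = 24_800  # per-call figure quoted in the text, in bits per second


def max_calls(reserved_bps, per_call_bps=PER_CALL_BPS):
    """Return how many simultaneous calls fit into the reserved bandwidth."""
    return reserved_bps // per_call_bps


def fits(target_calls, reserved_bps, per_call_bps=PER_CALL_BPS):
    """Check whether the planned number of simultaneous calls fits."""
    return target_calls * per_call_bps <= reserved_bps


reserved = 512_000  # hypothetical voice reservation on the link
print("calls that fit:", max_calls(reserved))
print("20 calls fit:", fits(20, reserved))
print("21 calls fit:", fits(21, reserved))
```

If some calls are relayed through the switchboard, each of them counts twice against the same reservation, so the estimate should be adjusted accordingly.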
The monitoring system will proactively inform us when the indicators deviate from the predefined values. In this way, we can react in time to avoid service restrictions for the users.
Below is an example showing MOS, ICPIF and jitter.
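What such a check might look like can be sketched in a few lines. This is not the monitoring system's actual logic: the `Sample` type, the `deviations` function and the MOS threshold of 4.0 are assumptions made for illustration, while the ICPIF limit of 10 follows the "good" value given above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    link: str      # e.g. "Milan-Catanzaro"
    mos: float     # 1 (bad) .. 5 (excellent)
    icpif: float   # 5 (excellent) .. 55 (extremely bad)


MOS_MIN = 4.0    # hypothetical threshold, tune per codec
ICPIF_MAX = 10   # "good" conversation value from the text


def deviations(sample, mos_min=MOS_MIN, icpif_max=ICPIF_MAX):
    """Return a list of human-readable threshold violations for one sample."""
    problems = []
    if sample.mos < mos_min:
        problems.append(f"{sample.link}: MOS {sample.mos} below {mos_min}")
    if sample.icpif > icpif_max:
        problems.append(f"{sample.link}: ICPIF {sample.icpif} above {icpif_max}")
    return problems


# Hypothetical measurements for one link.
for s in (Sample("Milan-Catanzaro", 4.2, 8), Sample("Milan-Catanzaro", 3.1, 25)):
    for line in deviations(s) or [f"{s.link}: within thresholds"]:
        print(line)
```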