Network Infrastructure Design for IP Phone
1 Introduction
Let’s start by describing the new type of phone system that businesses are now using to replace traditional phone systems in their offices. An “IP Phone System” (sometimes called an IP PBX) uses the technology of “IP (Internet Protocol)” to carry the voice conversations in your office. This does not necessarily mean it uses the public Internet. An IP Phone System uses IP technology within the private data network of a business in a single location or across a private network.
The same cabling that a business uses for its data network is used to carry the voice traffic of the phone system. In some ways they are totally independent and just sharing the same cabling. In one way they affect each other.
They are independent in that if the data server goes down, the voice will still go through. Your phone system will still work. Likewise, if the phone system goes down, the data will still go through.
The way the IP Phone System and data network could affect each other is in the capacity or “bandwidth” of the network, both in the office and going to the outside world. Data is “forgiving,” meaning it is not time sensitive. If moving your data back and forth is delayed by several tenths of a second, or even seconds, the quality of the data doesn’t suffer. However, voice is time sensitive. It must occur in “real time,” which typically means there can’t be more than 150 milliseconds (0.15 seconds) of delay in moving the voice traffic between its destinations. If the combined voice and data traffic is more than the network infrastructure has the capacity to handle, the voice quality can suffer.
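To make the 150 millisecond figure concrete, here is a minimal Python sketch that adds up the one-way delay contributions of a call and compares the total with that budget. The component names and millisecond values are illustrative assumptions, not measurements from any real network.

```python
# One-way delay budget check for a voice call.
# All component values below are hypothetical examples.
MAX_ONE_WAY_DELAY_MS = 150  # limit quoted in the text above


def total_delay_ms(components):
    """Sum the per-component one-way delays (in milliseconds)."""
    return sum(components.values())


def within_budget(components, budget_ms=MAX_ONE_WAY_DELAY_MS):
    """Return True if the summed delay does not exceed the budget."""
    return total_delay_ms(components) <= budget_ms


# Hypothetical path through an office LAN and a WAN link.
example_path = {
    "codec_and_packetization": 30,
    "lan_switching": 2,
    "wan_transit": 60,
    "jitter_buffer": 40,
}

if __name__ == "__main__":
    print(total_delay_ms(example_path), within_budget(example_path))
```

With these example numbers the path totals 132 ms, which stays inside the budget. Adding another 20 ms anywhere along the path would push it over.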
The network infrastructure consists of the cabling and the equipment throughout the network. IP Telephony on a properly designed, private network has the same voice quality as traditional phone systems. To be “properly designed” the network must include a proper “Quality of Service” plan and execution with the proper equipment. (That discussion is too much to include in this article. Give us a call.)
You can use IP Telephony over your private data network to connect remote sites with multiple workers or remote workers in home offices. If you don’t have a private network between sites you can use the public Internet to access remote sites.
2 Helpful
· Seamless extension dialing between all your locations.
· Lower cost and greater functionality from carrier services.
· Easily and economically connecting home based workers.
· Enhanced contact center (call center) responsiveness to customer needs.
· “Contact center” is the replacement term for what used to be “call center.”
· Disaster recovery and power outage backup for business continuity.
· Simplified system administration: Through a GUI (graphical user interface), you can make changes to your system that previously had to be made by your telephone equipment vendor. You can therefore significantly reduce your maintenance costs.
· Easier moves of telephone sets: When moving from one location in your building to another, it previously required re-programming the telephone switch and physically changing some wires in the “telephone closet.” With IP Telephony as you pack up your desk supplies and plants, you also grab your telephone. In your new location, you simply plug the telephone into the Ethernet connection in the wall and then connect your computer to a jack in the phone that acts as a bypass for your data. All your personal settings move with you. Costs for moves are dramatically reduced.
· Software upgrades are much easier: They can be performed by you instead of paying the telephone equipment vendor to do them.
There are many more benefits to IP Telephony but this brief overview should be enough to pique your interest to continue your investigation. You don’t need to make a total swap out of your current phone system. It is possible to gradually introduce an IP Telephone System into your organization and interface it to legacy systems.
3 Voice over IP Overview
There are four new sections about SIP: Introduction to SIP, SIP Messages, SIP Call Flow, and SIP — Session Description Protocol.
The other pieces cover digitization of voice, audio codecs, codec latency vs bandwidth optimization, audio jitter, the Real Time Protocol, introduction to H.323, description of H.323 call flow, and H.323 call signalling optimizations.
4 Echo Canceling
5 Gatekeeper Basic Operations
6 Goal of the Project
IP phone (telephony) systems are of two types:
* LAN, MAN, WAN networks
&
* Wireless networks.
(Discussed in wired technology)
7 OSI Model
The Open Systems Interconnection model (OSI model) is a product of the Open Systems Interconnection effort at the International Organization for Standardization. It is a prescription for characterizing and standardizing the functions of a communications system in terms of abstraction layers. Similar communication functions are grouped into logical layers. An instance of a layer provides services to the layer instances above it while receiving services from the layer below.
For example, a layer that provides error-free communications across a network provides the path needed by applications above it, while it calls the next lower layer to send and receive packets that make up the contents of that path. Two instances at one layer are connected by a horizontal connection on that layer.
OSI model |
7. Application Layer |
NNTP · SIP · SSI · DNS · FTP · Gopher · HTTP · NFS · NTP · SMPP · SMTP · SNMP · Telnet · DHCP · Netconf · RTP · SPDY · (more) |
6. Presentation Layer |
MIME · XDR · TLS · SSL |
5. Session Layer |
Named Pipes · NetBIOS · SAP · L2TP · PPTP · SOCKS |
4. Transport Layer |
TCP · UDP · SCTP · DCCP · SPX |
3. Network Layer |
IP (IPv4, IPv6) · ICMP · IPsec · IGMP · IPX · AppleTalk |
2. Data Link Layer |
ATM · SDLC · HDLC · ARP · CSLIP · SLIP · GFP · PLIP · IEEE 802.3 · Frame Relay · ITU-T G.hn DLL · PPP · X.25 · Network Switch |
1. Physical Layer |
EIA/TIA-232 · EIA/TIA-449 · ITU-T V-Series · I.430 · I.431 · POTS · PDH · SONET/SDH · PON · OTN · DSL · IEEE 802.3 · IEEE 802.11 · IEEE 802.15 · IEEE 802.16 · IEEE 1394 · ITU-T G.hn PHY · USB · Bluetooth · Hubs |
OSI Model |
Group | Data unit | Layer | Function |
Host layers | Data | 7. Application | Network process to application |
Host layers | Data | 6. Presentation | Data representation, encryption and decryption, convert machine dependent data to machine independent data |
Host layers | Data | 5. Session | Interhost communication |
Host layers | Segments | 4. Transport | End-to-end connections, reliability and flow control |
Media layers | Packet/Datagram | 3. Network | Path determination and logical addressing |
Media layers | Frame | 2. Data Link | Physical addressing |
Media layers | Bit | 1. Physical | Media, signal and binary transmission |
Description of OSI layers
According to recommendation X.200, there are seven layers, each generically known as an N layer. An N+1 entity requests services from the layer N entity.
At each level, two entities (N-entity peers) interact by means of the N protocol by transmitting protocol data units (PDU).
A Service Data Unit (SDU) is a specific unit of data that has been passed down from an OSI layer to a lower layer, and which the lower layer has not yet encapsulated into a protocol data unit (PDU). An SDU is a set of data that is sent by a user of the services of a given layer, and is transmitted semantically unchanged to a peer service user.
The PDU at any given layer, layer N, is the SDU of the layer below, layer N-1. In effect the SDU is the ‘payload’ of a given PDU. That is, the process of changing a SDU to a PDU, consists of an encapsulation process, performed by the lower layer. All the data contained in the SDU becomes encapsulated within the PDU. The layer N-1 adds headers or footers, or both, to the SDU, transforming it into a PDU of layer N-1. The added headers or footers are part of the process used to make it possible to get data from a source to a destination.
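The encapsulation described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the bracketed header and trailer strings are placeholders invented for the example, not real protocol headers, and the function names are hypothetical.

```python
# Sketch of SDU -> PDU encapsulation down a protocol stack.
# Header and trailer contents are placeholders, not real protocol fields.

def encapsulate(sdu: bytes, header: bytes, trailer: bytes = b"") -> bytes:
    """Wrap an SDU with a layer's header (and optional trailer) to form its PDU."""
    return header + sdu + trailer


def decapsulate(pdu: bytes, header: bytes, trailer: bytes = b"") -> bytes:
    """Strip a layer's header and trailer from a PDU, recovering the SDU unchanged."""
    if not pdu.startswith(header) or not pdu.endswith(trailer):
        raise ValueError("PDU does not carry the expected header/trailer")
    end = len(pdu) - len(trailer)
    return pdu[len(header):end]


# Sending side: each layer treats the PDU from the layer above as its SDU.
app_data = b"hello"
segment = encapsulate(app_data, b"[L4]")          # Transport Layer PDU
packet = encapsulate(segment, b"[L3]")            # Network Layer PDU
frame = encapsulate(packet, b"[L2]", b"[FCS]")    # Data Link Layer PDU

# Receiving side: the layers unwrap in the opposite order.
received = decapsulate(decapsulate(decapsulate(frame, b"[L2]", b"[FCS]"), b"[L3]"), b"[L4]")

if __name__ == "__main__":
    print(frame)
    print(received)
```

The payload comes back byte for byte, which is the "semantically unchanged" property of the SDU described above.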
Some orthogonal aspects, such as management and security, involve every layer.
Security services are not related to a specific layer: they can be provided by a number of layers, as defined by ITU-T X.800 Recommendation.[3]
These services are aimed to improve the CIA triad (i.e. confidentiality, integrity, availability) of transmitted data. Actually the availability of communication service is determined by network design and/or network management protocols. Appropriate choices for these are needed to protect against denial of service.
Layer 1: Physical Layer
The Physical Layer defines electrical and physical specifications for devices. In particular, it defines the relationship between a device and a transmission medium, such as a copper or optical cable. This includes the layout of pins, voltages, cable specifications, hubs, repeaters, network adapters, host bus adapters (HBA used in storage area networks) and more.
The major functions and services performed by the Physical Layer are:
§ Establishment and termination of a connection to a communications medium.
§ Participation in the process whereby the communication resources are effectively shared among multiple users. For example, contention resolution and flow control.
§ Modulation, or conversion between the representation of digital data in user equipment and the corresponding signals transmitted over a communications channel. These are signals operating over the physical cabling (such as copper and optical fiber) or over a radio link.
Parallel SCSI buses operate in this layer, although it must be remembered that the logical SCSI protocol is a Transport Layer protocol that runs over this bus. Various Physical Layer Ethernet standards are also in this layer; Ethernet incorporates both this layer and the Data Link Layer. The same applies to other local-area networks, such as token ring, FDDI, ITU-T G.hn and IEEE 802.11, as well as personal area networks such as Bluetooth and IEEE 802.15.4.
Layer 2: Data Link Layer
The Data Link Layer provides the functional and procedural means to transfer data between network entities and to detect and possibly correct errors that may occur in the Physical Layer. Originally, this layer was intended for point-to-point and point-to-multipoint media, characteristic of wide area media in the telephone system. Local area network architecture, which included broadcast-capable multiaccess media, was developed independently of the ISO work in IEEE Project 802. IEEE work assumed sublayering and management functions not required for WAN use. In modern practice, only error detection, not flow control using sliding window, is present in data link protocols such as Point-to-Point Protocol (PPP). On local area networks, the IEEE 802.2 LLC layer is not used for most protocols on Ethernet, and on other local area networks its flow control and acknowledgment mechanisms are rarely used. Sliding window flow control and acknowledgment are used at the Transport Layer by protocols such as TCP, but are still used at the Data Link Layer in niches where X.25 offers performance advantages.
The ITU-T G.hn standard, which provides high-speed local area networking over existing wires (power lines, phone lines and coaxial cables), includes a complete Data Link Layer which provides both error correction and flow control by means of a selective repeat Sliding Window Protocol.
Both WAN and LAN services arrange bits, from the Physical Layer, into logical sequences called frames. Not all Physical Layer bits necessarily go into frames, as some of these bits are purely intended for Physical Layer functions. For example, every fifth bit of the FDDI bit stream is not used by the Data Link Layer.
WAN protocol architecture
Connection-oriented WAN data link protocols, in addition to framing, detect and may correct errors. They are also capable of controlling the rate of transmission. A WAN Data Link Layer might implement a sliding window flow control and acknowledgment mechanism to provide reliable delivery of frames; that is the case for Synchronous Data Link Control (SDLC) and HDLC, and derivatives of HDLC such as LAPB and LAPD.
IEEE 802 LAN architecture
Practical, connectionless LANs began with the pre-IEEE Ethernet specification, which is the ancestor of IEEE 802.3. This layer manages the interaction of devices with a shared medium, which is the function of a Media Access Control (MAC) sublayer. Above this MAC sublayer is the media-independent IEEE 802.2 Logical Link Control (LLC) sublayer, which deals with addressing and multiplexing on multiaccess media.
While IEEE 802.3 is the dominant wired LAN protocol and IEEE 802.11 the wireless LAN protocol, obsolescent MAC layers include Token Ring and FDDI. The MAC sublayer detects but does not correct errors.
Layer 3: Network Layer
The Network Layer provides the functional and procedural means of transferring variable length data sequences from a source host on one network to a destination host on a different network, while maintaining the quality of service requested by the Transport Layer (in contrast to the data link layer which connects hosts within the same network). The Network Layer performs network routing functions, and might also perform fragmentation and reassembly, and report delivery errors. Routers operate at this layer—sending data throughout the extended network and making the Internet possible. This is a logical addressing scheme – values are chosen by the network engineer. The addressing scheme is not hierarchical.
The Network Layer may be divided into three sublayers:
1. Subnetwork Access – that considers protocols that deal with the interface to networks, such as X.25;
2. Subnetwork Dependent Convergence – when it is necessary to bring the level of a transit network up to the level of networks on either side
3. Subnetwork Independent Convergence – which handles transfer across multiple networks.
An example of this latter case is CLNP (ISO 8473). It manages the connectionless transfer of data one hop at a time, from end system to ingress router, router to router, and from egress router to destination end system. It is not responsible for reliable delivery to a next hop, but only for the detection of erroneous packets so they may be discarded. In this scheme, IPv4 and IPv6 would have to be classed with X.25 as subnet access protocols because they carry interface addresses rather than node addresses.
A number of layer management protocols, a function defined in the Management Annex, ISO 7498/4, belong to the Network Layer. These include routing protocols, multicast group management, Network Layer information and error, and Network Layer address assignment. It is the function of the payload that makes these belong to the Network Layer, not the protocol that carries them.
Layer 4: Transport Layer
The Transport Layer provides transparent transfer of data between end users, providing reliable data transfer services to the upper layers. The Transport Layer controls the reliability of a given link through flow control, segmentation/desegmentation, and error control. Some protocols are state- and connection-oriented. This means that the Transport Layer can keep track of the segments and retransmit those that fail. The Transport Layer also provides the acknowledgement of the successful data transmission and sends the next data if no errors occurred.
OSI defines five classes of connection-mode transport protocols ranging from class 0 (which is also known as TP0 and provides the least features) to class 4 (TP4, designed for less reliable networks, similar to the Internet). Class 0 contains no error recovery, and was designed for use on network layers that provide error-free connections. Class 4 is closest to TCP, although TCP contains functions, such as the graceful close, which OSI assigns to the Session Layer. Also, all OSI TP connection-mode protocol classes provide expedited data and preservation of record boundaries. Detailed characteristics of TP0-4 classes are shown in the following table:[4]
Feature Name | TP0 | TP1 | TP2 | TP3 | TP4 |
Connection oriented network | Yes | Yes | Yes | Yes | Yes |
Connectionless network | No | No | No | No | Yes |
Concatenation and separation | No | Yes | Yes | Yes | Yes |
Segmentation and reassembly | Yes | Yes | Yes | Yes | Yes |
Error Recovery | No | Yes | Yes | Yes | Yes |
Reinitiate connection (if an excessive number of PDUs are unacknowledged) | No | Yes | No | Yes | No |
Multiplexing and demultiplexing over a single virtual circuit | No | No | Yes | Yes | Yes |
Explicit flow control | No | No | Yes | Yes | Yes |
Retransmission on timeout | No | No | No | No | Yes |
Reliable Transport Service | No | Yes | No | Yes | Yes |
Perhaps an easy way to visualize the Transport Layer is to compare it with a Post Office, which deals with the dispatch and classification of mail and parcels sent. Do remember, however, that a post office manages the outer envelope of mail. Higher layers may have the equivalent of double envelopes, such as cryptographic presentation services that can be read by the addressee only. Roughly speaking, tunneling protocols operate at the Transport Layer, such as carrying non-IP protocols such as IBM’s SNA or Novell’s IPX over an IP network, or end-to-end encryption with IPsec. While Generic Routing Encapsulation (GRE) might seem to be a Network Layer protocol, if the encapsulation of the payload takes place only at the endpoint, GRE becomes closer to a transport protocol that uses IP headers but contains complete frames or packets to deliver to an endpoint. L2TP carries PPP frames inside a transport packet.
Although not developed under the OSI Reference Model and not strictly conforming to the OSI definition of the Transport Layer, the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) of the Internet Protocol Suite are commonly categorized as Layer 4 protocols within OSI.
Layer 5: Session Layer
The Session Layer controls the dialogues (connections) between computers. It establishes, manages and terminates the connections between the local and remote application. It provides for full-duplex, half-duplex, or simplex operation, and establishes check pointing, adjournment, termination, and restart procedures. The OSI model made this layer responsible for graceful close of sessions, which is a property of the Transmission Control Protocol, and also for session check pointing and recovery, which is not usually used in the Internet Protocol Suite. The Session Layer is commonly implemented explicitly in application environments that use remote procedure calls.
Layer 6: Presentation Layer
The Presentation Layer establishes context between Application Layer entities, in which the higher-layer entities may use different syntax and semantics if the presentation service provides a mapping between them. If a mapping is available, presentation service data units are encapsulated into session protocol data units, and passed down the stack.
This layer provides independence from data representation (e.g., encryption) by translating between application and network formats. The presentation layer transforms data into the form that the application accepts. This layer formats and encrypts data to be sent across a network. It is sometimes called the syntax layer.[5]
The original presentation structure used the basic encoding rules of Abstract Syntax Notation One (ASN.1), with capabilities such as converting an EBCDIC-coded text file to an ASCII-coded file, or serialization of objects and other data structures from and to XML.
Layer 7: Application Layer
The Application Layer is the OSI layer closest to the end user, which means that both the OSI application layer and the user interact directly with the software application. This layer interacts with software applications that implement a communicating component. Such application programs fall outside the scope of the OSI model. Application layer functions typically include identifying communication partners, determining resource availability, and synchronizing communication. When identifying communication partners, the application layer determines the identity and availability of communication partners for an application with data to transmit. When determining resource availability, the application layer must decide whether sufficient network resources for the requested communication exist. In synchronizing communication, all communication between applications requires cooperation that is managed by the application layer. Some examples of application layer implementations include:
§ On OSI stack:
§ FTAM File Transfer and Access Management Protocol
§ X.400 Mail
§ Common management information protocol (CMIP)
§ On TCP/IP stack:
§ Hypertext Transfer Protocol (HTTP),
§ File Transfer Protocol (FTP),
§ Simple Mail Transfer Protocol (SMTP)
§ Simple Network Management Protocol (SNMP).
8 Call Features
ADSI On-Screen Menu System
Alarm Receiver
Append Message
Authentication
Automated Attendant
Blacklists
Blind Transfer
Call Detail Records
Call Forward on Busy
Call Forward on No Answer
Call Forward Variable
Call Monitoring
Call Parking
Call Queuing
Call Recording
Call Retrieval
Call Routing (DID & ANI)
Call Snooping
Call Transfer
Call Waiting
Caller ID
Caller ID Blocking
Caller ID on Call Waiting
Calling Cards
Conference Bridging
Database Store / Retrieve
Database Integration
Dial by Name
Direct Inward System Access
Distinctive Ring
Distributed Universal Number Discovery (DUNDi™)
Do Not Disturb
E911
ENUM
Flexible Extension Logic
Interactive Directory Listing
Interactive Voice Response (IVR)
Local and Remote Call Agents
Macros
Music On Hold
Music On Transfer:
– Flexible Mp3-based System
– Random or Linear Play
– Volume Control
Predictive Dialer
Privacy
Open Settlement Protocol (OSP)
Overhead Paging
Protocol Conversion
Remote Call Pickup
Remote Office Support
Roaming Extensions
Route by Caller ID
SMS Messaging
Spell / Say
Streaming Media Access
Supervised Transfer
Talk Detection
Text-to-Speech (via Festival)
Three-way Calling
Time and Date
Transcoding
Trunking
VoIP Gateways
Voicemail:
– Visual Indicator for Message Waiting
– Stutter Dialtone for Message Waiting
– Voicemail to email
– Voicemail Groups
– Web Voicemail Interface
Zapateller
Computer-Telephony Integration
AGI (Asterisk Gateway Interface)
Graphical Call Manager
Outbound Call Spooling
Predictive Dialer
TCP/IP Management Interface
Scalability
TDMoE (Time Division Multiplex over Ethernet)
Allows direct connection of Asterisk PBX
Zero latency
Uses commodity Ethernet hardware
Voice-over IP
Allows for integration of physically separate installations
Uses commonly deployed data connections
Allows a unified dialplan across multiple offices
Speech
Codecs
ADPCM
G.711 (A-Law & μ-Law)
G.719 (pass through)
G.722
G.722.1 licensed from Polycom®
G.722.1 Annex C licensed from Polycom®
G.723.1 (pass through)
G.726
GSM
iLBC
Linear
LPC-10
Speex
VoIP Protocols
Google Talk
H.323
IAX™ (Inter-Asterisk exchange)
Jingle/XMPP
MGCP (Media Gateway Control Protocol)
SCCP (Cisco® Skinny®)
SIP (Session Initiation Protocol)
UNIStim
Traditional Telephony Protocols
E&M
E&M Wink
Feature Group D
FXS
FXO
GR-303
Loopstart
Groundstart
Kewlstart
MF and DTMF support
Robbed-bit Signaling (RBS) Types
MFC-R2 (Not supported. However, a patch is available)
ISDN Protocols
AT&T 4ESS
EuroISDN PRI and BRI
Lucent 5ESS
National ISDN 1
National ISDN 2
NFAS
Nortel DMS100
Q.SIG
9 VoIP Basics: Converting Voice to Digital Form
Are you interested in Voice over IP? Would you like to know more about its background? This text begins a series that should shed some light on it.
Let’s start with the beginning. VoIP sends digitized voice across computer networks. So how do we convert voice to the digital form?
When converting an analog signal (be it speech or another noise), you need to consider two important factors: sampling and quantization. Together, they determine the quality of the digitized sound.
· Sampling is about the sampling rate — i.e. how many samples per second you use to encode the sound.
· Quantization is about how many bits you use to represent each sample. The number of bits determines the number of different values you can represent with each sample.
Figures 1 and 2 show the idea of sampling: Figure 1 is the original analog signal, while Figure 2 shows the digitized form as a sequence of discrete samples.
Figure 1: Analog signal |
Figure 2: Digitized signal |
10 Quantization
As mentioned above, quantization is about how many bits you use to represent individual sound samples. In practice, we want to work with whole bytes, so let’s consider 8 or 16 bits.
With 8-bit samples, each sample can represent 256 different values, so we can work with whole numbers between -128 and +127. Because of the whole numbers, it is inevitable that we introduce some noise into the signal as we convert it to digital samples. For example, if the exact analog value is “7.44125”, we will represent it as “7”. As we do this with each sample in the sequence, we slightly distort the signal — inject noise, in other words.
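The rounding described above can be sketched in Python. This is a simplified illustration, assuming plain rounding to the nearest whole number and clamping to the signed range. The function names are made up for the example, and real converters and codecs are more involved.

```python
def levels(bits: int) -> int:
    """Number of distinct values a sample of the given width can represent."""
    return 2 ** bits


def quantize(value: float, bits: int) -> int:
    """Round to the nearest whole number and clamp to the signed range for `bits`."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return max(lowest, min(highest, round(value)))


def quantization_error(value: float, bits: int) -> float:
    """Difference between the exact value and its quantized representation."""
    return value - quantize(value, bits)


if __name__ == "__main__":
    print(levels(8))                       # 256
    print(quantize(7.44125, 8))            # 7
    print(quantization_error(7.44125, 8))  # roughly 0.44125
```

The error returned by `quantization_error` is the noise injected into that one sample. Values outside the representable range are clipped to -128 or +127 for 8-bit samples.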
It turns out 8-bit samples do not result in a good quality. With only 256 sample values, the analog-to-digital conversion adds too much noise. The situation improves a lot if we switch to 16-bit samples as 16 bits give us 65536 different representations (from -32768 to +32767). 16-bit samples are what you will find on a CD and what VoIP codecs use as their input.
11 Sampling
Now that we have decided what sample size to use (16 bits), let’s look at sampling rates. The table below shows three frequently used sampling rates:
Type | Transmitted Bandwidth | Sampling Frequency |
Telephone Speech | 300-3400 Hz | 8 kHz |
Wide Band Speech | 50-7000 Hz | 16 kHz |
CD quality audio | 20-20000 Hz | 44.1 kHz |
With VoIP, you will most frequently encounter the sampling rate of 8 kilohertz. The frequency of 16 kHz can be used now and then in situations when a higher quality audio is required (with proportionally higher Internet bandwidth consumption).
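As a rough guide to what these sampling rates mean for bandwidth, the following Python sketch computes the raw, uncompressed bit rate of a single audio channel using 16-bit samples. It is only an estimate of the digitized signal itself: it ignores codec compression and packet overhead, and the function name is chosen for the example.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> int:
    """Raw bit rate of uncompressed audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Sampling rates from the table above, in Hz.
SAMPLING_RATES = {
    "Telephone Speech": 8000,
    "Wide Band Speech": 16000,
    "CD quality audio": 44100,
}

if __name__ == "__main__":
    for name, rate in SAMPLING_RATES.items():
        print(f"{name}: {pcm_bitrate_bps(rate) / 1000:.1f} kbit/s")
```

Doubling the sampling rate from 8 kHz to 16 kHz doubles the raw bit rate from 128 kbit/s to 256 kbit/s, which is the proportional increase mentioned above.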
The choice of sampling frequencies for the individual types of audio is not random. There is a rule (based on the work of Nyquist and Shannon) that the sampling frequency needs to be equal to or greater than two times the transmitted bandwidth. Figures 3 and 4 show why this is required.
Figure 3 |
In Figure 3, the sinusoid represents the original analog sound. The large black dots are where we read our samples. Note that we take two samples in each period, i.e. the sampling rate is two times the frequency of the sound. This is the absolute minimum that will allow us to reconstruct a signal that is still comprehensible. It certainly won’t be a hi-fi sound but it will have the correct frequency – see the thin black lines in the picture.
Figure 4 shows a situation where we take fewer than two samples per period. The thin black lines show what would happen after we feed the samples into a digital-to-analog converter: we would hear something different from the original, a sound with lower frequency. This problem is known as “aliasing” since the lower frequency appears to be an “alias” of the original correct one.
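Aliasing can be reproduced numerically. The Python sketch below is an illustration with values chosen for the example (a 5 kHz tone sampled at 8 kHz), not figures from the text. It computes the apparent frequency of an undersampled pure tone and shows that its samples are indistinguishable from those of the lower "alias" tone.

```python
import math


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at sample_rate_hz."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def sample_tone(freq_hz: float, sample_rate_hz: float, count: int) -> list:
    """Return `count` samples of a cosine tone taken at the given sampling rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


if __name__ == "__main__":
    # A 5 kHz tone needs at least 10 kHz sampling; at 8 kHz it is undersampled.
    print(alias_frequency(5000, 8000))
    original = sample_tone(5000, 8000, 8)
    alias = sample_tone(3000, 8000, 8)
    print(all(abs(a - b) < 1e-9 for a, b in zip(original, alias)))
```

A tone that satisfies the rule, such as 3 kHz sampled at 8 kHz, keeps its own frequency.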
12 VoIP Protocols: Introducing SIP
The Session Initiation Protocol (SIP for short) is a Voice over IP protocol designed by the Internet Engineering Task Force. SIP was created by the MMUSIC group of the IETF (MMUSIC stands for Multi-party Multimedia Session Control). Formally, the protocol is intended for creating, modifying and terminating sessions with one or more participants. The sessions are mainly VoIP telephone calls or conferences.
The first version of SIP was published in 1999 in RFC2543 with the two main authors being Mark Handley and Henning Schulzrinne. The standard was updated to version 2.0 in 2002 with RFC3261 and naturally there were many subsequent updates and extensions (RFC3265, RFC3853, RFC4320, RFC4916, RFC5393, RFC5621, RFC5626, RFC5630).
13 SIP Characteristics
Unlike H.323, SIP is a text-based protocol. The formatting of SIP requests and responses is based on HTTP version 1.1. Endpoints that communicate using SIP use the following three protocols:
· SIP itself, used to establish and terminate the session;
· Session Description Protocol (SDP for short, RFC2327, obsoleted by RFC4566), used to exchange information about audio/video channels. Like SIP, SDP is also a product of the IETF’s MMUSIC group;
· RTP, used to send the real-time streams of audio or video across the network.
SIP messages are exchanged between endpoints in transactions. A transaction consists of a request and the related response or responses. The messages that belong to the same transaction share the same transaction ID. This ID is called CSeq in SIP. Each transaction should have a unique CSeq number, with only a single exception: the ACK message (ACK for “acknowledge”) uses the same CSeq number as the transaction which it applies to.
SIP can use either UDP or TCP as the underlying transport protocol. Originally (in RFC2543), UDP was the only mandatory option. According to RFC3261 from 2002, all endpoints must be able to send SIP messages over both UDP and TCP. Still, UDP is the more frequently used option. When communicating over TCP, two modes are possible: either the same TCP channel is used for all transactions of a session or a new TCP connection is established for each individual transaction.
13 The SIP Protocol
The Session Initiation Protocol (SIP) is a protocol for establishing real time communication sessions with one or more participants. It’s most frequently used for voice communications, but it can handle video and future applications as well. SIP was designed to be independent of the transport layer, i.e. it can work over UDP, TCP or SCTP. All voice/video communications take place via another protocol, usually RTP.
There are many RFCs surrounding SIP, but the most important one is RFC 3261.
SIP is a text based protocol that looks and acts very much like the HTTP protocol. The original designers (Henning Schulzrinne & Mark Handley) wanted to make a protocol that had its roots in the IP world, rather than in the telecoms world. SIP has been an amazing success, being the major driver in the adoption of VoIP and Computer Telephony in recent years. All major manufacturers have adopted the standard, and availability of SIP software, SIP hardware and SIP service providers is widespread.
SIP servers are responsible for setting up the calls between SIP devices. SIP servers usually combine several of the SIP server functions, such as SIP proxy and SIP registrar, into one piece of software. 3CX Phone System is a SIP proxy, a SIP registrar and a media server, so it can handle real time voice communications as well.
14 Registration
Before we describe the flow of a typical SIP call, let’s have a look at how SIP user agents register with a SIP registrar. The example below shows a situation where an SIP softphone (namely, the Ekiga client) registers with an Asterisk PBX. The Asterisk’s IP address is 10.10.1.99, while the client is at 10.10.1.13 and wants to register the telephone number 13.
In order to register, the SIP telephone needs to send the REGISTER request:
SIP registration, phase 1
The registrar server will immediately reply with the provisional response “100 Trying”. This indicates that the request has been received (and thus the client does not need to retransmit it) and that it is being processed. While processing the request, the registrar discovers that the user agent needs to authenticate. It therefore responds with “401 Unauthorized”. For the user agent, this means that it has to send the REGISTER request once more, this time providing authentication.
Let’s have a look at the detail of the messages. This is the text of the register message:
REGISTER sip:10.10.1.99 SIP/2.0
CSeq: 1 REGISTER
Via: SIP/2.0/UDP 10.10.1.13:5060;
branch=z9hG4bK78946131-99e1-de11-8845-080027608325;rport
User-Agent: Ekiga/3.2.5
From: <sip:13@10.10.1.99>
;tag=d60e6131-99e1-de11-8845-080027608325
Call-ID: e4ec6031-99e1-de11-8845-080027608325@vvt-laptop
To: <sip:13@10.10.1.99>
Contact: <sip:13@10.10.1.13>;q=1
Allow: INVITE,ACK,OPTIONS,BYE,CANCEL,SUBSCRIBE,NOTIFY,REFER,MESSAGE,
INFO,PING
Expires: 3600
Content-Length: 0
Max-Forwards: 70
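Because SIP is text based, a message like the one above can be taken apart with ordinary string handling. The following is a minimal Python sketch, not a complete parser: the function name parse_sip_message is made up for illustration, the wrapped header lines of the listing are assumed to be joined back into single lines, and repeated headers (such as several Via: lines) are not handled.

```python
def parse_sip_message(text):
    """Split a SIP message into start line, headers and body.

    Simplified: a repeated header name overwrites the earlier value.
    """
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; values such as URIs contain more.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# A shortened version of the REGISTER request shown above.
raw = "\r\n".join([
    "REGISTER sip:10.10.1.99 SIP/2.0",
    "CSeq: 1 REGISTER",
    "Via: SIP/2.0/UDP 10.10.1.13:5060;"
    "branch=z9hG4bK78946131-99e1-de11-8845-080027608325;rport",
    "To: <sip:13@10.10.1.99>",
    "Contact: <sip:13@10.10.1.13>;q=1",
    "Expires: 3600",
    "Content-Length: 0",
]) + "\r\n\r\n"

start_line, headers, body = parse_sip_message(raw)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, headers["expires"])
```

The first line gives the method and the address of the registrar; everything else the registrar needs (sequence number, contact address, requested lifetime) is found in the headers.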
We probably do not need to show the “100 Trying” response. The text of the “401 Unauthorized” message is as follows:
SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 10.10.1.13:5060;
branch=z9hG4bK78946131-99e1-de11-8845-080027608325;
received=10.10.1.13;rport=5060
From: <sip:13@10.10.1.99>;
tag=d60e6131-99e1-de11-8845-080027608325
To: <sip:13@10.10.1.99>;tag=as5489aead
Call-ID: e4ec6031-99e1-de11-8845-080027608325@vvt-laptop
CSeq: 1 REGISTER
User-Agent: Asterisk PBX
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER,
SUBSCRIBE, NOTIFY
Supported: replaces
WWW-Authenticate: Digest algorithm=MD5, realm="asterisk",
nonce="343eb793"
Content-Length: 0
In the “401 Unauthorized” response, the important header is WWW-Authenticate:. It instructs the client to authenticate using digest authentication (RFC 2617). The nonce (short for “number used once”) parameter is a “challenge string”. The client combines the challenge string with the user’s credentials (user name, realm and password) and the request method and URI, and computes an MD5 hash of the result. The server computes its own hash using the same method and compares it with the MD5 hash provided by the client. Digest authentication is the most frequently used method because the password is never sent over the network in plain text. “Basic” authentication has been deprecated in SIP 2.0 as it is insecure (sending a password in plain text is generally a bad idea).
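To make the hash computation concrete, here is a sketch in Python of the digest calculation as RFC 2617 defines it when no qop parameter is present, which is the case in the challenge above. Besides the nonce and the password, the user name, realm, request method and URI also go into the hash. The user name, realm, nonce and URI are taken from the trace in this section; the password "secret" is a placeholder, since the real one is not part of the trace, so the result will not match the response value the client actually sends.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 hash of a string as 32 lowercase hex digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the digest response (RFC 2617, challenge without qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # tied to this challenge


response = digest_response(
    username="test13",
    realm="asterisk",
    password="secret",  # placeholder, not the real password
    method="REGISTER",
    uri="sip:10.10.1.99",
    nonce="343eb793",
)
print(response)
```

Since the nonce is part of the final hash, a response captured from the network is of no use once the server issues a different nonce.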
Once the client computes the MD5 digest, it will re-send the REGISTER request. The message will look like this:
REGISTER sip:10.10.1.99 SIP/2.0
CSeq: 2 REGISTER
Via: SIP/2.0/UDP 10.10.1.13:5060;
branch=z9hG4bK32366531-99e1-de11-8845-080027608325;rport
User-Agent: Ekiga/3.2.5
Authorization: Digest username="test13", realm="asterisk",
nonce="343eb793", uri="sip:10.10.1.99", algorithm=MD5,
response="6c13de87f9cde9c44e95edbb68cbdea9"
From: <sip:13@10.10.1.99>;
tag=d60e6131-99e1-de11-8845-080027608325
Call-ID: e4ec6031-99e1-de11-8845-080027608325@vvt-laptop
To: <sip:13@10.10.1.99>
Contact: <sip:13@10.10.1.13>;q=1
Allow: INVITE,ACK,OPTIONS,BYE,CANCEL,SUBSCRIBE,NOTIFY,REFER,
MESSAGE,INFO,PING
Expires: 3600
Content-Length: 0
Max-Forwards: 70
The registrar server will again first respond with “100 Trying” and then compare the two MD5 hashes (the one provided by the client with the one computed by the registrar itself). If they match, the registrar will respond with “200 OK” and insert the endpoint into the location database. The database is usually shared between the registrar and the proxy server so that the proxy can use it to connect calls.
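The registrar's side of this step can be sketched as follows. This is an illustration only, not how Asterisk implements it: the class LocationDatabase and the function handle_register are hypothetical names, the location database is a plain in-memory dictionary, and the hash the registrar computed itself is passed in ready-made.

```python
import hmac
import time


class LocationDatabase:
    """Maps a registered number to its contact address until it expires."""

    def __init__(self):
        self._bindings = {}

    def register(self, number, contact, expires, now=None):
        now = time.time() if now is None else now
        self._bindings[number] = (contact, now + expires)

    def lookup(self, number, now=None):
        now = time.time() if now is None else now
        entry = self._bindings.get(number)
        if entry is None or now >= entry[1]:
            self._bindings.pop(number, None)  # drop expired entries
            return None
        return entry[0]


def handle_register(db, number, contact, expires,
                    client_hash, server_hash, now=None):
    """Return the SIP status code for an authenticated REGISTER attempt."""
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(client_hash, server_hash):
        return 401
    db.register(number, contact, expires, now)
    return 200


db = LocationDatabase()
status = handle_register(
    db, "13", "sip:13@10.10.1.13", 3600,
    client_hash="6c13de87f9cde9c44e95edbb68cbdea9",
    server_hash="6c13de87f9cde9c44e95edbb68cbdea9",
    now=0,
)
print(status, db.lookup("13", now=10))
```

Keeping the database behind a small lookup function is what lets a proxy share it: the proxy only needs to ask where a number can currently be reached.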
The figure below shows the message exchange:
SIP registration, phase 2
The response “200 OK” contains one important parameter, Expires. It tells the client that the registration will expire after the given number of seconds and the client will be required to register again.
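A client therefore has to register again before the time runs out. The sketch below shows one way to derive the waiting time from the Expires value; the function name and the choice to refresh after half of the interval are assumptions made for illustration, not something SIP prescribes.

```python
def refresh_delay(expires_seconds, fraction=0.5):
    """Seconds to wait before sending the next REGISTER.

    Refreshing after a fraction of the interval leaves room for
    retransmissions before the registration actually lapses.
    """
    if expires_seconds <= 0:
        raise ValueError("Expires must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return expires_seconds * fraction


# Expires: 3600, as requested in the REGISTER messages above.
print(refresh_delay(3600))
```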
Call Flow
Let us now have a look at a typical SIP call. We will consider a scenario with a SIP proxy server involved. Suppose a user at the SIP telephone with number 121 dials the number 122. The following will happen:
1. The user agent in telephone 121 does not know the IP address of 122, but it knows the IP address of the SIP proxy (suppose this address is 10.10.1.99). The user agent composes an INVITE request and sends it to the proxy. The To: header of the request contains the SIP URI <sip:122@10.10.1.99>. The body of the INVITE request carries an SDP (Session Description Protocol) message providing the parameters (codec, IP address, port) the called party will need to send its RTP stream to the caller. See the previous section for an example of the INVITE request.
2. The SIP proxy immediately responds with “100 Trying” and then forwards the INVITE request to the target telephone. The proxy server adds one Via: header to the message. As mentioned before, the SIP proxy has access to the location database and thus knows the IP addresses of all registered telephones (in the simplest implementation, the registrar server and the proxy are the same application).
Steps 1 and 2 are shown in Figure A below.
Figure A
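Step 2 can be sketched in Python as below. This is a simplification under stated assumptions: forward_invite is a hypothetical helper, the location database is a dictionary from number to contact address, the addresses 10.10.1.21 (caller) and 10.10.1.22 (telephone 122) and both branch values are invented for the example, and the “100 Trying” response and all network I/O are left out.

```python
def forward_invite(request_lines, location_db, proxy_addr, branch):
    """Return the INVITE as the proxy would forward it, or None if the
    dialled number is not registered."""
    method, uri, version = request_lines[0].split(" ")
    number = uri[len("sip:"):].split("@")[0]
    contact = location_db.get(number)
    if contact is None:
        return None
    # Address the request to the telephone's registered contact.
    request_line = f"{method} sip:{number}@{contact} {version}"
    # The proxy's own Via goes on top so responses travel back through it.
    via = f"Via: SIP/2.0/UDP {proxy_addr};branch={branch}"
    return [request_line, via] + request_lines[1:]


location_db = {"122": "10.10.1.22:5060"}  # invented contact for 122
invite = [
    "INVITE sip:122@10.10.1.99 SIP/2.0",
    "Via: SIP/2.0/UDP 10.10.1.21:5060;branch=z9hG4bK-caller",  # invented
    "To: <sip:122@10.10.1.99>",
]
forwarded = forward_invite(
    invite, location_db, "10.10.1.99:5060", "z9hG4bK-proxy"
)
print("\n".join(forwarded))
```

The To: header is left unchanged; only the first line is rewritten to point at the telephone, and the caller's original Via: header stays in place underneath the one added by the proxy.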