Network monitoring means measuring network parameters in order to produce statistics and functional analyses.
More specifically, network monitoring deals with three of the five functional areas defined by the ISO/OSI network management model:
• Performance monitoring
• Fault monitoring
• Accounting monitoring
Performance monitoring is used to predict how the network will evolve and to determine its actual behaviour. This functional area collects data through specific instruments using protocols such as SNMP, RMON and NetFlow. Monitoring performance data is a fundamental step in finding problems before they cause a network failure. Fault monitoring is used to detect network faults as they occur and is in charge of alerting the network administrator; the SNMP protocol is usually used to notify the network manager of device faults. Accounting monitoring collects network usage statistics; this functional area is very important for organizations that bill network traffic.
This document covers only the first two functional areas.
Network performance monitoring
To perform good network monitoring you must know what you want to measure. The IETF (Internet Engineering Task Force) has defined, in RFC 1242 (Benchmarking Terminology for Network Interconnection Devices), metrics for network traffic monitoring such as:
• Data link frame size
• Latency
• Overhead behaviour
• Overload behaviour
• Throughput
• Frame loss
You can visit the web site www.ietf.org/rfc.html for further information. The main metrics in the Performance Monitoring area are:
• Response time
• Availability
• Bandwidth
• Throughput
• Packet loss
• Usage
• Latency (one-way and RTT)
• Jitter
Response Time
Response time is measured in milliseconds and is the time a system takes to react to an input. For example, a web server's response time is the time it takes to answer an HTTP request. Its importance depends on the kind of service: for batch applications it is not critical, while for interactive applications it must be as low as possible.
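As an illustration, the following Python sketch samples a web server's response time by timing a single HTTP request with the standard library; the URL is only a placeholder.

```python
import time
import urllib.request

def http_response_time(url: str) -> float:
    """Return the time (in milliseconds) taken to answer a single HTTP request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()  # read the full body so the whole answer is accounted for
    return (time.perf_counter() - start) * 1000.0

if __name__ == "__main__":
    # Placeholder target: replace with the web server you actually monitor.
    print(f"Response time: {http_response_time('http://www.example.com/'):.1f} ms")
```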
Availability
Availability is the percentage of time that an object is available to its users. It is obtained from the following formula:
Availability = MTBF / (MTBF + MTTR)
Where:
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
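For illustration, a short Python sketch applying the formula above; the MTBF and MTTR figures are made up.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), returned as a percentage."""
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100.0

# Hypothetical figures: a device that fails on average every 720 hours
# and takes 2 hours to repair.
print(f"Availability: {availability(720, 2):.3f} %")   # ~99.723 %
```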
Bandwidth
Bandwidth describes the maximum data transfer rate of a network or Internet connection. It measures how much data can be sent over a specific connection in a given amount of time. For example, a gigabit Ethernet connection has a bandwidth of 1,000 Mbps (125 megabytes per second), while an Internet connection via cable modem may provide 25 Mbps of bandwidth. Although bandwidth is used to describe network speeds, it does not measure how fast bits of data move from one location to another. Since data packets travel over electronic or fiber-optic cables, the speed of each bit transferred is negligible. Instead, bandwidth measures how much data can flow through a specific connection at one time.
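The figures above follow from the 8-bits-per-byte relationship; a trivial sketch of the conversion:

```python
def mbps_to_megabytes_per_second(mbps: float) -> float:
    """Convert a bandwidth figure from megabits to megabytes per second (8 bits per byte)."""
    return mbps / 8.0

print(mbps_to_megabytes_per_second(1000))  # gigabit Ethernet -> 125.0 MB/s
print(mbps_to_megabytes_per_second(25))    # cable modem      -> 3.125 MB/s
```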
Throughput
Throughput refers to how much data can be transferred from one location to another in a given amount of time. It is used to measure the performance of hard drives and RAM, as well as Internet and network connections. For example, a hard drive that has a maximum transfer rate of 100 Mbps has twice the throughput of a drive that can only transfer data at 50 Mbps. Similarly, a 54 Mbps wireless connection has roughly 5 times as much throughput as an 11 Mbps connection. However, the actual data transfer speed may be limited by other factors such as the Internet connection speed and other network traffic. Therefore, it is good to remember that the maximum throughput of a device or network may be significantly higher than the actual throughput achieved in everyday use.
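One rough way to estimate the throughput actually achieved, as opposed to the nominal bandwidth, is to time a known transfer. The Python sketch below downloads a resource over HTTP and divides the bytes received by the elapsed time; the URL is a placeholder.

```python
import time
import urllib.request

def measure_throughput(url: str) -> float:
    """Download a resource and return the achieved throughput in Mbps."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as response:
        total_bytes = len(response.read())
    elapsed = time.perf_counter() - start
    return (total_bytes * 8) / elapsed / 1_000_000  # bits per second -> Mbps

if __name__ == "__main__":
    # Placeholder URL: point this at a reasonably large test file on your own network.
    print(f"Throughput: {measure_throughput('http://example.com/testfile.bin'):.2f} Mbps")
```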
Packet loss
Packet loss is the discarding of packets in a network when a router or other network device is overloaded and cannot accept additional packets at a given moment. Packets are the fundamental unit of information transport in all modern computer networks, and increasingly in other communications networks as well. Losses are usually due to congestion on the network and buffer overflows on the end systems. Packet loss may or may not be disruptive to the recipient of the data, depending on the type of network service and the severity of the loss. With best-effort services, some packet loss is acceptable because recovery of the lost packets is left to higher-layer protocols or to the application itself.
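One simple way to sample packet loss towards a device is to run the system ping command and read the reported loss percentage; a sketch, assuming a Unix-like ping syntax and using an example target address.

```python
import re
import subprocess

def ping_packet_loss(host: str, count: int = 10) -> float:
    """Send ICMP echo requests with the system ping command (Unix-like syntax assumed)
    and return the reported packet loss percentage."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True,
    )
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", result.stdout)
    if match is None:
        raise RuntimeError(f"Could not parse ping output for {host}")
    return float(match.group(1))

if __name__ == "__main__":
    # Example address (TEST-NET-1): replace with a device on your own network.
    print(f"Packet loss towards 192.0.2.1: {ping_packet_loss('192.0.2.1')} %")
```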
Latency
In a network, latency, a synonym for delay, is an expression of how much time it takes for a packet of data to get from one designated point to another. In some usages (for example, AT&T), latency is measured by sending a packet that is returned to the sender; the round-trip time is then considered the latency. Ideally, data would be transmitted instantly from one point to another, that is, with no delay at all. The contributors to network latency include:
• Propagation: the time it takes for a packet to travel between one place and another at the speed of light.
• Transmission: the medium itself (whether optical fiber, wireless, or some other) introduces some delay. The size of the packet also adds delay to a round trip, since a larger packet takes longer to receive and return than a short one.
• Router and other processing: each gateway node takes time to examine and possibly change the header of a packet (for example, decrementing the hop count in the time-to-live field).
• Other computer and storage delays: within the networks at each end of the journey, a packet may be subject to storage and hard disk access delays at intermediate devices such as switches and bridges. (In backbone statistics, however, this kind of latency is probably not considered.)
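Where ICMP is not available, round-trip latency can be approximated by timing a TCP handshake; a minimal Python sketch, with host and port chosen purely as examples.

```python
import socket
import time

def tcp_rtt_ms(host: str, port: int = 80, timeout: float = 5.0) -> float:
    """Approximate round-trip time (in ms) by timing a TCP three-way handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0

if __name__ == "__main__":
    print(f"RTT to www.example.com:80 ~ {tcp_rtt_ms('www.example.com'):.1f} ms")
```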
Jitter
In voice over IP (VoIP), jitter is the variation in the time between packets arriving, caused by network congestion, timing drift, or route changes. A jitter buffer can be used to handle jitter.
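As a simplified illustration of the concept, the sketch below estimates jitter as the average variation between consecutive packet inter-arrival intervals (RFC 3550 defines a smoothed estimator; the timestamps here are made up).

```python
from statistics import mean

def interarrival_jitter_ms(arrival_times_ms):
    """Estimate jitter as the mean absolute variation between consecutive
    inter-arrival intervals (simplified compared with the RFC 3550 estimator)."""
    intervals = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    variations = [abs(b - a) for a, b in zip(intervals, intervals[1:])]
    return mean(variations)

# Hypothetical packet arrival timestamps in milliseconds:
arrivals = [0.0, 20.1, 40.0, 61.5, 80.2, 100.0]
print(f"Jitter: {interarrival_jitter_ms(arrivals):.2f} ms")
```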
Fault monitoring
This functional area is in charge of controlling network devices. Network devices can be grouped into two categories:
a) Infrastructure devices
b) Basic service devices
Infrastructure devices are fundamental for network communication: routers and switches. Basic service devices deliver essential services such as DNS, DHCP, proxy, firewall, etc. Both categories can be monitored using the SNMP protocol or by polling specific TCP or UDP ports. SNMP can be used to monitor internal temperature, fan state, power supplies, etc., while TCP polling tells you whether services are working. In fault monitoring it is important to define the correct thresholds beyond which a device shows abnormal behaviour or fails. When a threshold is reached, a group of administrators is alerted to solve the problem.
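As an illustration of TCP port polling, the Python sketch below checks whether each basic service answers on its port and raises an alert when it does not; the host names, ports and alert action are placeholders.

```python
import socket

# Hypothetical list of basic-service devices and the TCP ports to poll.
SERVICES = [
    ("dns1.example.local", 53),     # DNS
    ("proxy.example.local", 3128),  # Proxy
    ("fw.example.local", 443),      # Firewall management interface
]

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in SERVICES:
    if not tcp_port_open(host, port):
        # Placeholder for the real alerting action (email, SMS, ticket...).
        print(f"ALERT: {host}:{port} is not answering")
```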
From theory to practice
Once the metrics to measure and the devices to monitor have been determined, the theory above must be implemented in the real world. Sentinet3 can perform specific checks on the elements to be monitored using the SNMP protocol, simply by knowing the OID (Object IDentifier). Much information can be gathered with SNMP through Sentinet3, such as:
a) CPU usage
b) RAM usage
c) Fan state
d) Bits in
e) Bits out
f) Packet loss
g) Packet drop
h) Bandwidth
Further network information, such as available bandwidth, can be calculated from the parameters above. Other useful information on fundamental services can be gathered using check_tcp or check_udp. To enable proactive monitoring, correct thresholds must be set so that the alerting module can send an SMS or email when they are reached. A proactive action can then be taken, such as stopping and restarting a service, disabling a switch port or enabling an alternative communication line.
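Sentinet3 handles this kind of polling internally, but the same idea can be prototyped with the Net-SNMP command-line tools: poll an OID, derive a value and compare it with a threshold. The sketch below reads the IF-MIB ifInOctets counter twice to estimate inbound traffic; the device address, community string, interface index, polling interval and threshold are all placeholders, and the Net-SNMP snmpget tool is assumed to be installed.

```python
import subprocess
import time

def snmp_get_counter(host: str, oid: str, community: str = "public") -> int:
    """Read a counter value with the Net-SNMP snmpget tool (SNMP v2c assumed)."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

if __name__ == "__main__":
    HOST = "192.0.2.10"                      # placeholder device address
    IF_IN_OCTETS = "1.3.6.1.2.1.2.2.1.10.1"  # IF-MIB ifInOctets, interface index 1
    THRESHOLD_MBPS = 80.0                    # example alerting threshold

    first = snmp_get_counter(HOST, IF_IN_OCTETS)
    time.sleep(60)                           # polling interval
    second = snmp_get_counter(HOST, IF_IN_OCTETS)

    # Counter wrap-around is ignored in this sketch.
    mbps = (second - first) * 8 / 60 / 1_000_000
    if mbps > THRESHOLD_MBPS:
        # Placeholder for the real alerting action (email, SMS...).
        print(f"ALERT: inbound traffic {mbps:.1f} Mbps exceeds {THRESHOLD_MBPS} Mbps")
```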