Commodity Internet Utilization
Oversubscription of the commodity Internet connection continues to be a problem faced by the University. Currently, fixed rate limits control the amount of traffic exported to the PSC/NCNE GigaPOP. Prudence dictates a periodic review of the status of this link and the identification of potentially better solutions. In this case, better solutions would enable bandwidth control, including aggressive control of low priority traffic, while providing high quality connectivity for academic uses. As demand rises, the existing bandwidth limits will be increasingly unable to provide reasonable connectivity. Deployment of advanced quality of service (QoS) controls is recommended to solve this problem.
Currently, Carnegie Mellon has two Internet connections through the GigaPOP. The first, a low-cost connection to the Internet2 Abilene network, provides high speed connectivity to other universities and research companies. Through this connection we also have high bandwidth to other GigaPOP members (UPitt, PSU) and to the Pittsburgh Exchange, which connects other local companies (Stargate, Telerama, Nauticom). We colloquially refer to all this traffic as “I2”. The bandwidth on this connection is virtually unlimited.
The other connection is the costly commodity Internet (I1) connection. Any traffic that cannot reach its destination using I2 bandwidth must take this path. Currently we purchase 50Mbps of I1 connectivity through the GigaPOP, though we are allowed to transmit approximately 75Mbps. Historically, the GigaPOP takes the proceeds from the various member institutions, purchases the greatest amount of I1 bandwidth possible, and then divides that bandwidth among the GigaPOP members.
To fully understand the rationale behind quality of service controls, let us first review the problems they are designed to solve. Specifically, traditional Ethernet-and-IP network elements (routers and switches) perform well in lightly-loaded situations, but problems arise as traffic increases. For example, shared-access Ethernet rapidly degrades in throughput after about 30% utilization. In this situation, the number of collisions and retransmissions grows quickly, reducing the network efficiency.
While the issues of half-duplex shared-Ethernet are not a problem on full duplex core network links, oversubscription of links, especially outbound links, is a problem. When machines try pushing more data than administratively allowed, routers are forced to begin queuing packets or eating into burst-buffer space. As the link utilization increases dramatically beyond the limit, constant over-subscription and queuing become the norm. Queuing has limits, however, as does user patience. While a large router with nearly infinite queues might guarantee packet delivery, most users are not willing to wait several days. Indeed, infinite queuing is not a solution. Most routers implement a ‘tail-drop’ system, where packets are simply dropped as the queue limit is reached. In this scenario, high-rate connections can overpower low-rate connections (such as simple web browsing) by simply sending or receiving more data.
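The effect of tail-drop on mixed traffic can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions, not a description of any particular router: two hypothetical senders ("bulk" and "web") offer a fixed number of packets per tick in random order to a shared FIFO queue of limited depth, the link drains a fixed number of packets per tick, and any arrival that finds the queue full is discarded. All names and parameters are illustrative.

```python
import random
from collections import deque


def simulate_tail_drop(offered, queue_limit, drain_per_tick, ticks, seed=0):
    """Toy tail-drop FIFO.

    offered maps a sender name to the number of packets it offers per tick.
    Returns (delivered, dropped, packets_still_queued).
    """
    rng = random.Random(seed)
    queue = deque()
    delivered = {name: 0 for name in offered}
    dropped = {name: 0 for name in offered}

    for _tick in range(ticks):
        # Build this tick's arrivals and shuffle them so that arrival order
        # does not systematically favor either sender.
        arrivals = [name for name, count in offered.items() for _i in range(count)]
        rng.shuffle(arrivals)

        for name in arrivals:
            if len(queue) < queue_limit:
                queue.append(name)
            else:
                dropped[name] += 1  # tail drop: queue is full, discard arrival

        # The outbound link forwards a fixed number of packets per tick.
        for _i in range(min(drain_per_tick, len(queue))):
            delivered[queue.popleft()] += 1

    return delivered, dropped, len(queue)


if __name__ == "__main__":
    delivered, dropped, queued = simulate_tail_drop(
        {"bulk": 9, "web": 1}, queue_limit=10, drain_per_tick=5, ticks=1000
    )
    print("delivered:", delivered)
    print("dropped:  ", dropped)
```

In this model the queue drops arrivals without regard to who sent them, so the low-rate sender loses packets even though it asks for only a small share of the link, while the high-rate sender still occupies most of the delivered capacity.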
The solution, then, is to configure quality of service controls. These types of mechanisms are available in dedicated packet shaping units, as well as modern routers (though generally to a lesser extent). The controls provide administrators with far greater control of traffic than basic queuing routers. Different types of traffic can be categorized and grouped, and specific policies applied to the groups. Categorization of traffic happens by inspecting the packets and matching on certain packet elements. This can include the source or destination IP address, source or destination TCP/UDP port numbers, or pre-defined quality of service tags. Today, categorization engines are also looking ‘deeper’ into packets, inspecting application layer content such as HTTP URLs and peer-to-peer filenames.
After categorization, administrators can then apply specific policies to the different groups of traffic. Standard policies that can be implemented include giving certain traffic a higher priority (and thus access to the bandwidth) than other traffic. Alternatively, a policy might grant a specific amount of bandwidth to traffic in a certain group, or a certain amount of bandwidth to each unique connection (for example, giving each voice-over-IP connection a specific bandwidth allocation).
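The two steps described above, categorization followed by policy lookup, can be sketched as follows. This is a minimal illustration, not the configuration language of any product: the `Packet` and `Rule` types, the class names, the priority values, and the address block (taken from the reserved documentation range 192.0.2.0/24) are all hypothetical. Only the 25Mbps dormitory figure echoes a rate mentioned elsewhere in this report.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str      # "tcp" or "udp"
    dst_port: int


@dataclass(frozen=True)
class Rule:
    traffic_class: str
    src_net: Optional[str] = None    # None means "match any"
    proto: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        if self.src_net is not None and ip_address(pkt.src) not in ip_network(self.src_net):
            return False
        if self.proto is not None and pkt.proto != self.proto:
            return False
        if self.dst_port is not None and pkt.dst_port != self.dst_port:
            return False
        return True


# Hypothetical rule set: the first matching rule wins.
RULES = [
    Rule("web", proto="tcp", dst_port=80),
    Rule("dormitory", src_net="192.0.2.0/24"),
]
DEFAULT_CLASS = "default"

# Hypothetical policies: lower number = higher priority; rate_kbps=None = no cap.
POLICIES = {
    "web": {"priority": 1, "rate_kbps": None},
    "default": {"priority": 2, "rate_kbps": None},
    "dormitory": {"priority": 3, "rate_kbps": 25_000},
}


def classify(pkt: Packet, rules=RULES) -> str:
    """Return the traffic class of the first rule that matches the packet."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.traffic_class
    return DEFAULT_CLASS


def policy_for(pkt: Packet) -> dict:
    """Look up the policy attached to the packet's traffic class."""
    return POLICIES[classify(pkt)]
```

Rule order matters in this sketch: a web request from the dormitory block is classified as "web" because that rule is listed first. A real deployment would have to decide such precedence questions explicitly.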
The expectation, in categorizing and policing the traffic, is a reduction in tail-drop queue enforcement on the router. Advanced products, like the packet shapers, use different methods to manipulate TCP’s built-in flow control mechanisms, slowing down lower priority traffic. This causes some connections to flow at slower speeds than they otherwise would, which reduces congestion at the router.
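To see why TCP’s flow control gives a shaper a lever over sending rate, recall that a TCP sender can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below works through that bound. It assumes, purely for illustration, that a shaper slows a connection by reducing the window the sender sees; the evaluated products use proprietary mechanisms that this report does not detail, and the function names and example figures are hypothetical.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on throughput: one window of data per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


def window_for_target_rate(target_bps: int, rtt_ms: int) -> int:
    """Largest window (in bytes) that keeps a connection at or below target_bps."""
    return target_bps * rtt_ms // 8000


if __name__ == "__main__":
    # A 65,535-byte window over a 100 ms round trip.
    print(max_tcp_throughput_bps(65_535, 100), "bits per second at most")
    # Window needed to hold the same connection to 256 kbps.
    print(window_for_target_rate(256_000, 100), "bytes")
```

Because the sender slows itself down in response to the smaller window, no packets need to be dropped to enforce the lower rate, which is the contrast with tail-drop that the paragraph above draws.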
The subtext, however, is that the quality of service mechanisms are most effective when specific policy is encoded. Packet shapers can use their rate control mechanisms to reduce congestion at the router, but in general this will continue to cause high-rate connections to dominate low-rate connections. At minimum, users will continue to receive sub-par bandwidth for what they consider “important” uses (typically, interactive applications such as web browsing). Consequently, creating policies that prioritize important connections, while slowing those that do not require immediacy, can reduce the consumed bandwidth while making the network appear more responsive.
The Catalyst 6500 platform, which is currently deployed as the campus backbone, has some basic QoS controls. The 6500s can classify traffic in much the same way as the packet shaping devices, with some exceptions. For example, they are unable to look “deep” into packets to determine the application-layer protocol in use. Once traffic is classified, specific rate limits can be applied as ‘microflow’ rates, ‘aggregate’ rates, or both. Microflow rates limit the rate a single flow can consume, while aggregate rates limit the rate that all traffic of the particular type can consume.
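The distinction between microflow and aggregate limits can be sketched with token buckets. The code below is an illustrative model only, assuming a simple token-bucket policer with one bucket per flow and one shared bucket for the class; it is not Cisco’s implementation, and the class names, parameters, and rates are hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Token bucket measured in bytes; refilled at rate_bps up to burst_bytes."""

    def __init__(self, rate_bps: float, burst_bytes: float, now: float = 0.0):
        self.rate = rate_bps / 8.0      # bytes per second
        self.burst = burst_bytes
        self.tokens = burst_bytes       # start with a full bucket
        self.last = now

    def refill(self, now: float) -> None:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now


class Policer:
    """Applies a per-flow (microflow) limit and a shared (aggregate) limit."""

    def __init__(self, microflow_bps: float, aggregate_bps: float, burst_bytes: float):
        self.microflow_bps = microflow_bps
        self.burst = burst_bytes
        self.aggregate = TokenBucket(aggregate_bps, burst_bytes)
        self.flows = {}                 # flow_id -> TokenBucket

    def allow(self, flow_id, size: int, now: float) -> bool:
        flow = self.flows.get(flow_id)
        if flow is None:
            flow = self.flows[flow_id] = TokenBucket(self.microflow_bps, self.burst, now)
        flow.refill(now)
        self.aggregate.refill(now)
        # The packet passes only if BOTH limits have enough tokens.
        if flow.tokens >= size and self.aggregate.tokens >= size:
            flow.tokens -= size
            self.aggregate.tokens -= size
            return True
        return False                    # over a limit: dropped, not queued
```

For example, with a 1000-byte burst, a microflow rate of 8 kbps, and an aggregate rate of 16 kbps, a second flow can be refused because the aggregate bucket is empty even though its own microflow bucket is full. A packet that exceeds either limit is simply discarded, whatever its importance.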
Unfortunately, the 6500 platform cannot perform complicated traffic shaping, prioritization, or queuing. Instead, it uses a simple tail-drop system. Traffic exceeding the rate limit is simply dropped, regardless of the administrative priority of the data. When packets are dropped, TCP responds by scaling down the transfer rate and retransmitting the packet. While some amount of dropping is expected in any network, excessive dropping causes a dramatic increase in the retransmission rate, further reducing the network efficiency.
Currently, the QoS policy on our commodity Internet connection specifies explicit rates for dormitory and non-dormitory traffic (25Mbps and 55Mbps, respectively). Today, we are hitting these limits twenty-four hours a day. Interestingly, only a week ago we saw a daily drop in the overall utilization from
There are many possible solutions to the ongoing increase in bandwidth demand. These strategies, as well as a summary of the approximate costs, are outlined below. In short, we could continue to run in the existing configuration; however, there is a concern that performance will begin to decrease rapidly as the academic year progresses. Implementing different quality of service mechanisms with existing hardware is possible, but the hardware support is very rudimentary. More complicated QoS policies would today require coarse trade-offs in bandwidth allocation.
Recently, we’ve had the opportunity to evaluate packet shaping devices from Packeteer and Allot Communications. These products are single network elements dedicated to providing quality of service controls for network bandwidth. They implement proprietary rate-control mechanisms to encourage automatic scaling of bandwidth utilization by the end machines. Advanced QoS controls are provided, such as class-based queuing, traffic prioritization, and fair bandwidth allocation. Classification of network traffic is also implemented with advanced application layer parsing. For example, the current packet shaping devices can extract information such as the URL being accessed from web traffic.
Both products were evaluated in similar testing environments. Complete results of the testing are documented in Quality of Service Policy Enforcement Review[1]. Testing showed that the Allot NetEnforcer unit was best suited to our environment, providing an excellent implementation of quality of service controls. Additionally, it has an intuitive configuration and monitoring interface. The purchase of a suitable NetEnforcer is expected to cost approximately $40,000, including software licenses and incidental implementation costs. A yearly maintenance contract would also be required.
There has been ongoing discussion about the suitability of the Catalyst 6500 platform for use as our border/external gateway router. The reason for the uncertainty is the fundamental design of the C6500 as a network switch with add-on routing capabilities, rather than as a true backbone router. Because of this, it lacks many of the advanced queuing and congestion avoidance mechanisms of full routing platforms. Today, both Cisco and Juniper (leading router manufacturers) offer specialized quality of service mechanisms. Support for application layer classification is in preliminary stages (and is often bounded at low bandwidth limits).
Migrating to a new routing platform would be a substantial change in the network, requiring extended investigation, planning, testing, and deployment. It would also require a substantial capital investment; we estimate a new egress router may cost as much as $70,000, depending upon the path taken (upgrade to existing hardware versus replacement) and possible trade-in. Converting to a new platform would give us a head start on the eventual router replacement. However, given the relatively high cost, long lead time, and potential for wanting different hardware in the future, moving to a new platform at this time would likely be more complex and costly than other options.
At present, the yearly cost to Carnegie Mellon for NCNE GigaPOP bandwidth is $250,000 for 75Mbps. This translates to about $275/mo/Mbps. In approximate terms, if we were to contract for bandwidth from tier-1 access providers, the cost would likely drop to between $175 and $225 per month per Mbps. Not surprisingly, there are several disadvantages to following this route. First, we would continue to need a high speed connection to the GigaPOP for our I2 bandwidth. Additionally, we would have to consider the need for redundancy of the I1 connection, which could raise the monthly cost to a level similar to our current costs. Upfront port and connection charges would apply, as would monthly line charges.
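The per-megabit figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the preceding paragraph; the function names are illustrative, and the direct-purchase totals deliberately exclude the redundancy, port, connection, and line charges mentioned above, since no figures are given for them here.

```python
def monthly_cost_per_mbps(yearly_cost: float, mbps: float) -> float:
    """Convert a yearly contract price into dollars per month per Mbps."""
    return yearly_cost / 12 / mbps


def yearly_cost(monthly_per_mbps: float, mbps: float) -> float:
    """Convert a per-month, per-Mbps price into a yearly total."""
    return monthly_per_mbps * mbps * 12


if __name__ == "__main__":
    current = monthly_cost_per_mbps(250_000, 75)
    print(f"GigaPOP today: ${current:.2f}/mo/Mbps")

    # Bandwidth-only yearly cost of 75Mbps at the quoted tier-1 price range.
    for price in (175, 225):
        print(f"Direct at ${price}/mo/Mbps: ${yearly_cost(price, 75):,.0f}/yr")
```

The current rate works out to a little under $278 per month per Mbps, in line with the "about $275" quoted above.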
| Strategy | Initial Cost | Yearly Cost |
|---|---|---|
| No change | $0 | $0 |
| Packet shaper | $40,000 | $5,000 |
| Routing platform | $70,000 | $10,000 |
| Additional bandwidth (GigaPOP) | $0 | $3,000/Mbps |
| Additional bandwidth (Direct) | $30,000 | $2,700/Mbps |
There are a number of solutions to the growing demand for commodity Internet bandwidth. The time has come to purchase a dedicated packet shaping device to enforce certain quality of service policies on our outbound Internet connection. While we might expect routers to support similar levels of control within two to four years (with partial support today), having a dedicated shaping unit in the network provides several advantages. It can be configured and maintained separately from the routing platform, and the interfaces are highly streamlined for policy configuration and monitoring. The current class of products includes modules for detailed accounting of network traffic, which could be very useful in billing situations.
We believe that the current QoS rate-limits will rapidly degrade in effectiveness as the academic year progresses, assuming no change in the basic parameters (bandwidth available). Deploying a packet shaper would enable us to set bandwidth priorities and improve user experiences while maintaining or reducing the bandwidth utilization. It would additionally provide fair allocation of the bandwidth resources amongst the users. In short, installing a packet shaper is the best solution for maintaining our outbound I1 connection.
[1] P. Hill, K. Miller, K. Trivedi. “Quality of Service Policy Enforcement Review.” http://www.net.cmu.edu/groups/netdev/packets. September 2002.