Volume accounting for fun and profit

1. Introduction

Well, the profit is debatable, and it isn't always fun, but it is interesting and useful.

As networks rather than individual computers get internet connections, it becomes important to know how your bandwidth is being used. In fact, even if the network is entirely separate from the outside world, it can be useful to know what's going on.

I'm not going to discuss using ipchains and logging to achieve this; I'd rather concentrate on a much more scalable solution - Netflow. Until the RFCs detailing Real Time Flow Metering are finalised and implemented by hardware manufacturers, you must have Cisco switches and routers available for this to work.

Netflow is a protocol which allows suitable hardware to summarise details of the traffic that it processes, and to send these summaries to a defined place. By place, I mean a combination of IP address and port. So I can configure my 7200 to send packets holding flow summary records to port 9999 on IP address 192.168.255.254 where they will arrive as a UDP stream.

Netflow uses UDP for reporting to minimise the load on the router. After all, the summaries are classed as management information, and are rightly considered to be less important than the work the router was designed for - moving data between interfaces as quickly as possible. Of course it does mean that if either the router itself or the network between the router and the collection point is congested you're going to lose data, and if that happens it's lost forever.

2. Set up

Any Cisco router running an IOS version later than 11.3 will be able to generate Netflow data, as will those switches which understand IP. The commands to activate collection and forwarding of Netflow data are simple:


   login
   enable
   configure terminal
   interface fastethernet 0/0
   ip route-cache flow
   ip flow-export destination <dest IP> <dest port>
   ip flow-export source <src interface>
   ip flow-export version <version>

where:

   dest IP
     IP address that Netflow summaries are sent to
   dest port
     Port number that Netflow summaries are sent to
   src interface
     Interface whose IP address is used as the source of Netflow summaries
   version
     Which Netflow version to use (normally version 5)

A router with this configuration will send flow summary data when each flow is complete, which might not always be what you want. With normal web traffic, collecting the page itself is usually completed fairly quickly, so the issue doesn't arise. Downloading your Debian ISOs, however, will take some time, even with ADSL, and you might not want a single summary record showing a 700 MB transfer. To force export of a summary every fifteen minutes:


   ip flow-cache timeout active 15

For short connections, this will make no difference, but for long (potentially multi-hour) connections, a summary will be generated every fifteen minutes.

There are a couple of important points about Netflow :

3. Netflow Packets

The formal description of the various Netflow packet formats can be found here.

In general, a Netflow packet has a header followed by a number of flow records. Routers will try to fill a packet with as many flow records as possible, up to a maximum of thirty. Not all of the fields in the header are immediately useful, but some worth extracting are :

We will revisit a couple of these later.
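
As a rough sketch, the 24 byte version 5 header can be taken apart with a single unpack. The hash key names are my own labels rather than anything the router sends, and the sample packet is built by hand purely for illustration.

```perl
use strict;
use warnings;

# Unpack the 24-byte Netflow version 5 header into a hash.
# The key names are illustrative labels chosen for this sketch.
sub ParseHeader {
   my ($Data) = @_;
   return undef if length ($Data) < 24;
   my %Header;
   @Header{qw(Version Count SysUptime UnixSecs UnixNsecs
              Sequence EngineType EngineId Sampling)}
      = unpack ('n n N N N N C C n', $Data);
   return \%Header;
}

# A hand-built header: version 5, 2 flows, 60000 ms uptime, sequence 1000
my $Sample = pack ('n n N N N N C C n',
                   5, 2, 60000, 1700000000, 0, 1000, 0, 0, 0);
my $Header = ParseHeader ($Sample);
print "version $Header->{Version}, $Header->{Count} flows, ",
      "sequence $Header->{Sequence}\n";
```

Checking the length first means a truncated packet yields undef rather than a hash of half-filled fields.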

4. Reading Netflow Data

Now let's look at some code to receive the summaries.

First, open the socket and then make the UDP buffer as big as possible. Resizing the buffer won't always be necessary, but when the collection process is part of a complex network, or one which has a lot of traffic, it helps:


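   use IO::Socket::INET;
   use Socket;

   my $Port = 9999;
   my $Socket = IO::Socket::INET -> new (
      Proto => 'udp',
      LocalPort => $Port);
   if (! defined $Socket) {
      # LogMessage is the collector's own logging routine (not shown)
      &LogMessage ("Open_Socket on UDP $Port failed ($!)");
      exit 1;
   }
   #
   # Maximise the buffer size. SO_RCVBUF comes from the Socket module and
   # replaces the hard-coded 8, which is only correct on some systems.
   # Requests above the system limit may be refused or capped, so read
   # the value back afterwards to see what was actually granted.
   #
   my $BufferSize = $Socket -> sockopt (SO_RCVBUF);
   foreach my $i (2, 4, 8, 16, 32) {
      $Socket -> sockopt (SO_RCVBUF, $BufferSize * $i);
   }
   $BufferSize = $Socket -> sockopt (SO_RCVBUF);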

Now go into a loop reading packets and extracting flow records:

   #
   # Some constants: a version 5 packet has a 24 byte header
   # followed by 48 byte flow records.
   #
   my $HeaderLength = 24;
   my $FlowLength   = 48;

   my $Summary;
   #
   # Read data and see where it came from
   #
   while ( 1 ) {
      my $Data;
      # recv returns the packed address of the sender, or undef on error
      my $Sender = $Socket -> recv ($Data, 8192, 0);
      next unless (defined $Sender);
      my $RemoteAddress = (unpack_sockaddr_in ($Sender))[1];
      my $IPs = inet_ntoa ($RemoteAddress);
      next if (length ($Data) < $HeaderLength);
      my ($NetflowVersion, $NumberOfFlows) = unpack ('nn', $Data);
      next unless ($NetflowVersion == 5);
      #
      # Start again from the end of the header for every packet
      #
      my $Offset = $HeaderLength;
      my $N = 0;
      while ($N++ < $NumberOfFlows) {
         # Stop if the packet is shorter than its flow count claims
         last if (length ($Data) < $Offset + $FlowLength);

         my $srcIP    = join ('.', unpack ('C4', substr ($Data, $Offset, 4)));
         my $destIP   = join ('.', unpack ('C4', substr ($Data, $Offset + 4, 4)));
         my $srcif    = unpack ('n', substr ($Data, $Offset + 12, 2));
         my $destif   = unpack ('n', substr ($Data, $Offset + 14, 2));
         my $Bytes    = unpack ('N', substr ($Data, $Offset + 20, 4));
         my $srcport  = unpack ('n', substr ($Data, $Offset + 32, 2));
         my $destport = unpack ('n', substr ($Data, $Offset + 34, 2));
         my $protocol = unpack ('C', substr ($Data, $Offset + 38, 1));
         #
         # Do something with the data !
         #
         $Summary -> {$destIP} -> {$destport} -> {$srcIP} += $Bytes;
         $Offset += $FlowLength;
      }
   }

Of course you need to actually do something with the contents of the hash that's being built up, but that can be the core of a simple Netflow collector.
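
As one possible way of doing something with that hash, here is a minimal sketch that flattens it and lists the largest transfers. The TopTalkers name and the sample figures are made up for illustration; the hash layout is the one built by the loop above.

```perl
use strict;
use warnings;

# Flatten $Summary->{destIP}->{destport}->{srcIP} = bytes into a list of
# [destIP, destport, srcIP, bytes] rows, largest transfers first.
sub TopTalkers {
   my ($Summary, $Limit) = @_;
   my @Rows;
   foreach my $destIP (keys %$Summary) {
      foreach my $destport (keys %{ $Summary -> {$destIP} }) {
         my $Sources = $Summary -> {$destIP} -> {$destport};
         foreach my $srcIP (keys %$Sources) {
            push @Rows, [$destIP, $destport, $srcIP, $Sources -> {$srcIP}];
         }
      }
   }
   @Rows = sort { $b -> [3] <=> $a -> [3] } @Rows;
   splice (@Rows, $Limit) if @Rows > $Limit;
   return @Rows;
}

# Made-up figures for illustration only
my $Summary = {
   '10.0.0.5' => { 80 => { '192.168.1.10' => 52000,
                           '192.168.1.11' => 1200 } },
   '10.0.0.9' => { 25 => { '192.168.1.10' => 8000 } },
};
foreach my $Row (TopTalkers ($Summary, 2)) {
   printf "%-15s port %-5d from %-15s %10d bytes\n", @$Row;
}
```

In a long-running collector you would probably call something like this on a timer and then empty the hash, so that memory use stays bounded.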

For an environment with hundreds of users and hundreds of megabytes of traffic to be measured, performance starts to be an issue. Then you need to look at dedicated servers to collect the data, and a database to store it. You may also need to identify users rather than machines, and for that you may need to timestamp the incoming data and match against RADIUS or DHCP lease data.
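
One way to approach the matching, as a minimal sketch: stamp each flow with the time it was received and look for a lease covering that address at that moment. The lease list, its field names and the user names here are invented for illustration; real data would come from your DHCP server or RADIUS accounting logs, in whatever format they use.

```perl
use strict;
use warnings;

# Hypothetical lease records: address, start and end (epoch seconds), user
my @Leases = (
   { IP => '192.168.1.10', Start => 1000, End => 2000, User => 'alice' },
   { IP => '192.168.1.10', Start => 2000, End => 3000, User => 'bob'   },
   { IP => '192.168.1.11', Start => 1500, End => 2500, User => 'carol' },
);

# Return the user holding $IP at time $When, or undef if no lease matches
sub UserFor {
   my ($IP, $When) = @_;
   foreach my $Lease (@Leases) {
      next unless $Lease -> {IP} eq $IP;
      return $Lease -> {User}
         if $When >= $Lease -> {Start} && $When < $Lease -> {End};
   }
   return undef;
}

my $Who = UserFor ('192.168.1.10', 2500);
print "192.168.1.10 at 2500 was ", (defined $Who ? $Who : 'unknown'), "\n";
```

A linear scan is fine for a handful of leases; with real volumes the lookup would more likely be a database query indexed on address and time.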

As you might expect, the core of a Netflow collection system is straightforward - it has to be because Netflow data itself is simple. The support environment in which the collection process runs is often considerably more complex as the number of records increases.

5. Dropping Data

To help keep track of the system's accuracy, it is useful to extract the sequence number from the incoming packet. This can be done with:


   my $Sequence = unpack ('N', substr ($Data, 16, 4));

The sequence number given is the sequence number in the previous packet from this router plus the number of flows in that packet. Careful management of this will tell you if packets are dropped by the collection process. You can't do anything about it of course (remember these are UDP), but it can be useful as a warning that the server is getting overloaded, or that the collector scripts need tweaking.
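
The bookkeeping could look something like the following sketch, which remembers the next expected sequence number for each router and reports how many flow records went missing. The MissedFlows name and the %Expected hash are my own, and the arithmetic assumes a Perl built with 64 bit integers.

```perl
use strict;
use warnings;

my %Expected;   # next sequence number expected, keyed by router address

# Return the number of flow records missed since the previous packet
# from this router (0 if nothing was lost or this is the first packet).
sub MissedFlows {
   my ($Router, $Sequence, $NumberOfFlows) = @_;
   my $Missed = 0;
   if (defined $Expected{$Router}) {
      # The counter is 32 bits wide, so allow for it wrapping round
      $Missed = ($Sequence - $Expected{$Router}) % 2**32;
   }
   $Expected{$Router} = ($Sequence + $NumberOfFlows) % 2**32;
   return $Missed;
}

# Three packets from one router; the third arrives after a gap of 30 flows
foreach my $Packet ([1000, 30], [1030, 30], [1090, 30]) {
   my $Lost = MissedFlows ('192.168.0.1', @$Packet);
   print "sequence $Packet->[0]: $Lost flows missed\n";
}
```

Call it once per packet with the sender's address, the sequence number and the flow count. A late or duplicated packet, or a router restart, shows up as an implausibly large gap, so a real collector would probably treat anything over some threshold as a resynchronisation rather than a loss.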

6. The Router Reboots

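The header of the Netflow packet also carries the router's uptime, which can be used to detect that the sending router has rebooted.


   my $Uptime = unpack ('N', substr ($Data, 4, 4));

The value is the number of milliseconds since the router booted, so a value lower than the one in the previous packet from the same router means that it has restarted. This may be useful, depending on the complexity of your network.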
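
A minimal sketch of the check, assuming the value at offset 4 is the uptime in milliseconds as it is in the version 5 header. The RouterRebooted name and the %LastUptime hash are illustrative.

```perl
use strict;
use warnings;

my %LastUptime;   # last uptime seen (milliseconds), keyed by router address

# Return 1 if this router's uptime has gone backwards since its
# previous packet, which suggests a reboot; otherwise return 0.
sub RouterRebooted {
   my ($Router, $Uptime) = @_;
   my $Previous = $LastUptime{$Router};
   $LastUptime{$Router} = $Uptime;
   return (defined $Previous && $Uptime < $Previous) ? 1 : 0;
}

# Uptime climbs, then drops back to a small value after a restart
foreach my $Uptime (500000, 560000, 12000) {
   my $State = RouterRebooted ('192.168.0.1', $Uptime) ? 'rebooted' : 'ok';
   print "uptime $Uptime ms: $State\n";
}
```

The uptime is a 32 bit count of milliseconds, so it wraps after roughly 49.7 days, and this simple comparison will report a wrap as a reboot. Since the response is only to reread the SNMP data, an occasional false alarm is probably harmless.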

The interface numbers extracted from the individual flow records shown above are SNMP indexes. Normally, unless the hardware in the router is changed, the SNMP index for each interface isn't going to change. However, for completeness, on detecting a reboot it is safer to reread the SNMP data for the router to confirm that the interface numbers are the same. Even if they have changed, it may not be relevant of course, although it may be useful to know that data passing through interface number two is external and all the others are internal.
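
To illustrate why the interface numbers matter, here is a small sketch that labels a flow by direction. It assumes, as in the example above, that SNMP index 2 is the only external interface; the %External hash and the Direction name are my own, and in practice the table would be filled from the router's SNMP data.

```perl
use strict;
use warnings;

# Assumed for illustration: SNMP index 2 is the only external interface
my %External = (2 => 1);

# Classify a flow from its input and output interface indexes
sub Direction {
   my ($srcif, $destif) = @_;
   my $In  = $External{$srcif};
   my $Out = $External{$destif};
   return 'transit'  if $In && $Out;
   return 'inbound'  if $In;
   return 'outbound' if $Out;
   return 'internal';
}

foreach my $Pair ([2, 1], [1, 2], [1, 3]) {
   print "if $Pair->[0] -> if $Pair->[1]: ", Direction (@$Pair), "\n";
}
```

If the indexes do change after a reboot, only the %External table needs rebuilding; the classification itself stays the same.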

7. Contact me

Email me