Pathfinder 0.1
--------------

(If you want to skip all the conceptual material, go to the end, where there
is concrete output from a PathFinder instance currently monitoring 14000
hosts and 30000 routers.)

I was fascinated by the idea of monitoring connectivity on the Internet.
This involves keeping track of routes and of the responsiveness of routers
on the Internet. The Internet has a large number of hosts, so one of the
questions we considered was whether it would be possible to monitor large
quantities of routers (>10k), and what kind of hardware would enable us to
do that. The result of the following tests is that an old 1U Telemetrybox
at the colo can be used to monitor more than 100,000 routes (300,000
routers) without a major impact on the system performance of the box.

Router / Host Ratio:

It was said that the average route to a host on the Internet passes through
10 routers. It turns out that many routes share the same routers: the
actual ratio I observed was that a new host on average added 3 new routers
that needed to be monitored. Our initial projection that we would need to
monitor 100k routers to track paths to 10k hosts was therefore wrong; we
only need to monitor about 30k routers for that.

Design:

I took the sources of the MTR tool, which is frequently used to monitor
paths on the Internet, and used some of its pieces to build a new tool that
I called "PathFinder", designed to scale to huge quantities of monitored
hosts.

Characteristics:

- Small footprint (13K executable).
- Simplification by omitting DNS lookups (they can be added later by the
  programs displaying the data). A script can be used to translate
  hostnames to IP addresses before feeding them to PathFinder.
- Simplification by using ASCII files for configuration and logging. A
  database can load the ASCII data, or database support can be integrated
  later. The ASCII format is suitable for direct import into MySQL.
- One process does it all, scheduling pings as well as managing the data.
  This simplifies the implementation.
  It also means that adding additional processors will do no good; the
  process is essentially I/O bound anyway. Using netsaint to schedule
  operations at this scale is suicide, so the tool needs to do its own
  scheduling.
- A prime hash for all search operations matching IP addresses minimizes
  CPU resource use.
- Statistics are kept in RAM, derived from the raw data, and can be dumped
  at configurable intervals to keep the load on a database or on storage as
  low as needed. The more storage is available, the higher the volume and
  detail of the information obtained by PathFinder. Calculations show that
  100k routes / 10k hosts would consume less than 6 Mbyte of RAM, which is
  significantly less than a web browser (24-40 Mbyte).
- Network bandwidth use is settable by configuring the number of pings per
  second to perform.
- The maximum hit rate per second on monitored hosts can be capped so that
  routers are not flooded with traffic. This also redirects pings away from
  the routers on the common path, which are otherwise hit many times over.
- Traces path changes. (This could be used for alarms when connectivity
  problems develop, but that is not yet done.)

Network Bandwidth Use:
----------------------

PathFinder uses 60-byte ping packets. The number of pings can be configured
with the --rate option. The default is 100 pings per second, which results
in a bandwidth use of around 10 kbyte/sec.

Possible configurations:

Rate     Bandwidth
1        Not measurable
100      10 kbyte/s (default)
1000     100 kbyte/s - this setting would use more than half of a T1 line
10000    1 Mbyte/s - uses all the bandwidth I am allowed to use at the
         colo (10 Mbit/s) and is the highest setting I have tried

Tests were done on a 433 MHz Celeron with 128 Mbyte of RAM at Level 3 and
showed no noticeable impact on system performance at any level. At 10000
pings per second the CPU use was at 70%, but the system was still
responsive. At 1000 pings/second the CPU use was 35%, but idle time was
still > 80%.
Most of the CPU time was spent in system mode, which indicates that the
limiting factor is the network interface card and the speed at which data
gets back and forth to it. A better NIC might improve things.

Memory use:
-----------

With around 9000 hosts and 23000 routers to monitor, the process used 4.6
Mbyte of RAM.

Resource use:

Per monitored path    140 bytes
Per router             50 bytes

100k routers would take 5 Mbyte and 10k paths 1.4 Mbyte, which together is
less than 6 Mbyte.

Disk Space:
-----------

For each sample taken:

Per monitored path    180 bytes
Per router             40 bytes

A data dump for 100k routers and 10k hosts would take 2 Mbyte for the path
data and 4 Mbyte for the router performance data. The dump to ASCII files
is configurable via the --dump-interval option. The following capacities
are needed to store historical data; a shorter interval allows more detail
to be analyzed for reports.

Interval    Per 5min    Per Hour    Per Day
5min        6M          100M        2.5G
1hr         -           6M          200M
24hr        -           -           6M

Adding compression cuts these figures to about a third. Data is stored in
a format that allows easy compression without conversion to another
format, and summary sets can be stored in the same structure.

Data is stored in ASCII records separated by newlines. Fields are
separated by tabs.

paths-* structure:

1. Target IP
2. Number of times this route was taken
3. Number of hops
4. List of routers traveled through (up to 30)

routers-* structure:

1. Router IP
2. Number of pings sent to the IP
3. Number of pings returned
4. Best response time in usec
5. Average response time in usec
6. Worst response time in usec

Router impact:
--------------

Monitoring 100000 routers at 1000 pings per second means that a sample of
each router is taken on average every 2 minutes. This is a minimal impact
on the systems surveyed. The 5-minute statistics would not be very useful
since they would only include 2-3 samples; one-hour data dumps would
result in about 30 samples, which is better.
If a 5-minute resolution is needed, then the rate could be increased to
10000 pings per second, but then the 1U system needs to be dedicated to
that task. Every router would be visited on average every 10-20 seconds,
giving us 20-30 samples within 5 minutes. The network bandwidth use at
10000 pings per second is so high, though, that I would suggest talking
with our connectivity providers first.

Sample Sessions on slave3.openrock.net
--------------------------------------

I had some problems getting enough IP addresses for this test. I started
with all the public Debian mirrors around the world (187), but that did
not get me very far. I tried Network Solutions' FTP site for the .com
zone, but the .com, .net and .org zones had been removed because spammers
were abusing them. I did find the zone file for the .edu zone, though. I
ran a variety of patterns over the zone file to extract all sorts of IPs
from it. The DNS resolution took about 8 hours, but at the end I had
14052 IP addresses.

slave3:/home/christoph# pfd --rate=1000 --write-interval=5 --daemon --floodrate=1
--- Reading Hostlist from /etc/pathfinder.list
net_add_netpath: duplicate IP 157.201.100.203
net_add_netpath: duplicate IP 157.201.100.202
net_add_netpath: duplicate IP 216.125.103.10
--- Starting to monitor 14052 paths. Ping Interval=1.00ms Flood Delay=1000.00ms Data Dump=5.00min
--- Dumping Routing Path Data to 14052 hosts into /var/log/pathfinder/paths-976410738
--- Dumping Router Data for 29838 routers into /var/log/pathfinder/routers-976410738
--- Dumping Routing Path Data to 14052 hosts into /var/log/pathfinder/paths-976411038
--- Dumping Router Data for 29969 routers into /var/log/pathfinder/routers-976411038

For the 14052 routes we are monitoring, we need to monitor around 30000
routers.
christoph@slave3:~$ ls -l /var/log/pathfinder
-rw-r--r--   1 root   root   2814949 Dec  9 17:12 paths-976410738
-rw-r--r--   1 root   root   2965116 Dec  9 17:17 paths-976411038
-rw-r--r--   1 root   root    982383 Dec  9 17:12 routers-976410738
-rw-r--r--   1 root   root    615031 Dec  9 17:17 routers-976411038

Around 3 Mbyte of path data and around 1 Mbyte of router performance data
every 5 minutes.

TOP output:

 5:09pm up 29 days, 1:50, 2 users, load average: 0.00, 0.00, 0.00
34 processes: 32 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 1.7% user, 4.3% system, 0.0% nice, 93.8% idle
Mem:  127992K av, 120864K used,   7128K free, 25332K shrd, 61232K buff
Swap: 128516K av,    368K used, 128148K free              37400K cached

  PID USER     PRI NI SIZE  RSS SHARE STAT LIB %CPU %MEM  TIME COMMAND
27261 root      19  0 5100 5100   528 R      0  5.1  3.9  0:03 pfd
27269 christop   3  0 1160 1160   688 R      0  0.9  0.9  0:00 top
    1 root       0  0  380  376   320 S      0  0.0  0.2  0:05 init
    2 root       0  0    0    0     0 SW     0  0.0  0.0  0:00 kflushd

This shows that the process uses just 5% CPU and that the system is quite
idle at 1000 pings/second. The process uses 5.1 Mbyte for the data. No
swapping or anything of that ilk happens.

VMSTAT output:

christoph@slave3:~$ vmstat 5
 procs                  memory    swap          io     system         cpu
 r b w  swpd  free  buff cache  si so    bi bo    in    cs  us sy id
 0 0 0   368  7040 61232 37400   0  0     0  0     2     6   0  0 14
 1 0 0   368  7016 61232 37400   0  0     0  0  1825  1470   2  5 93
 1 0 0   368  7000 61232 37400   0  0     0  1  1760  1332   2  4 94
 1 0 0   368  6976 61232 37400   0  0     0  0  1834  1475   1  7 92
 1 0 0   368  6968 61232 37400   0  0     0  0  1793  1382   1  4 95
 1 0 0   368  6952 61232 37400   0  0     0  0  1767  1338   1  5 94

VMSTAT shows the CPU >90% idle. No swapping occurs. It is heavy on
interrupts.
Next attempt using 10000 pings per second:
------------------------------------------

slave3:/home/christoph# pfd --rate=10000 --write-interval=5 --daemon --floodrate=1
--- Reading Hostlist from /etc/pathfinder.list
net_add_netpath: duplicate IP 157.201.100.203
net_add_netpath: duplicate IP 157.201.100.202
net_add_netpath: duplicate IP 216.125.103.10
--- Starting to monitor 14052 paths. Ping Interval=0.10ms Flood Delay=1000.00ms Data Dump=5.00min
--- Dumping Routing Path Data to 14052 hosts into /var/log/pathfinder/paths-976411606
--- Dumping Router Data for 30023 routers into /var/log/pathfinder/routers-976411606

TOP output:

 5:22pm up 29 days, 2:04, 2 users, load average: 0.45, 0.16, 0.04
34 processes: 31 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 12.6% user, 65.1% system, 0.0% nice, 22.1% idle
Mem:  127992K av, 122004K used,   5988K free, 25348K shrd, 61232K buff
Swap: 128516K av,    368K used, 128148K free              37676K cached

  PID USER     PRI NI SIZE  RSS SHARE STAT LIB %CPU %MEM  TIME COMMAND
27287 root      15  0 5828 5828   528 R      0 76.7  4.5  0:37 pfd
27288 christop   2  0 1164 1164   692 R      0  0.9  0.9  0:00 top
    1 root       0  0  380  376   320 S      0  0.0  0.2  0:05 init
    2 root       0  0    0    0     0 SW     0  0.0  0.0  0:00 kflushd

We are pushing it here: 76% of the CPU is used by PathFinder and the
system is only 22% idle.

VMSTAT output:

christoph@slave3:/var/log/pathfinder$ vmstat 5
 procs                  memory    swap          io     system         cpu
 r b w  swpd  free  buff cache  si so    bi bo    in    cs  us sy id
 1 0 0   368  6132 61232 37676   0  0     0  0     3     6   0  0 15
 1 0 0   368  6108 61232 37676   0  0     0  0 15247  2251  14 64 22
 1 0 0   368  6084 61232 37676   0  0     0  1 15087  2404  10 67 23
 1 0 0   368  6064 61232 37676   0  0     0  0 14975  2480  11 66 22
 1 0 0   368  6036 61232 37676   0  0     0  0 14921  2413  12 64 24
 1 0 0   368  6012 61232 37676   0  0     0  0 14479  2487  13 62 25

This confirms the resource situation. Interrupt use is way up. Note that
there is still no swapping, since the whole data set fits neatly into
memory (<6 Mbyte!). The limitation is the speed of the NIC.
We can saturate a 10 Mbit link with pings without a problem. We would need
a NIC with higher intelligence (and less thirst for interrupts) to get
more speed.

I think there would be no issue with monitoring even 10 times as much as
done here. With 10000 pings per second we have a sample rate of 25-30 per
data dump for 30k routers. For 100k hosts we would have 2-4 samples per 5
minutes of around 300k routers at 10000 pings per second. The data
structures would grow tenfold, up to 60 Mbyte, as would the storage
requirements. That could be done with a dedicated 1U box, or even better
with a modern PC.

One million routers would be possible with a 1U box by:

1. Going to an hourly sample cycle (30 times as much data!)
2. Running at 10k pings/second (1k/sec could even work if samples need
   not be too frequent!)
3. Having a machine with 500 Mbyte of RAM. PathFinder itself would need
   180 Mbyte of RAM.

I hope this proves that monitoring lots of hosts would work....

Christoph Lameter, December 9th 2000