I wrote a small program called "pathfinder" to analyze traceroute paths and router responsiveness on the Internet. It was configured to send 1,000 pings per second to more than 14,000 hosts extracted from the .edu zone file, and it ran for one week from our colo at Level 3. The test explored:

1. The scalability and robustness of the MySQL database used by TBox/openROCK.
2. The limits on the kind of mapping and measurement that can be performed from a single 1U box.
3. What kind of information can be gathered about the Internet itself.

Some numbers:

Start of test: 2000-12-11 09:01:51
End of test: 2000-12-17 12:47:43
ICMP packets sent: 481.2 million (481,152,915)
ICMP packets received: 474.8 million (474,751,156)
Ping success rate: 98.67%
Router datasets: 2.65 million (2,654,629)
Path elements: 37.4 million (37,356,030)
Average response time: 147 ms (147,424.5766 us)
Worst response time: 65.5 s (65,469,816 us)

Inserting 480,000 rows into a database of 37 million rows took 44 minutes, about 10,900 rows per minute. Querying a database with 37 million records demands patience: any query of reasonable complexity (which usually means scanning all 37 million records) takes around 15 minutes to complete. The queries keep the hard disk transferring at full speed, so it is mostly the disk hardware that limits query speed.
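The headline figures can be cross-checked with a few lines of arithmetic. This is a minimal Python sketch; the constants are copied from the summary above and the helper names are hypothetical. Averaged over the full wall-clock window of 531,952 seconds, the send rate works out to roughly 905 pings per second, somewhat below the configured 1,000.

```python
from datetime import datetime

# Figures copied from the test summary (helper names below are hypothetical).
SENT = 481_152_915
RECEIVED = 474_751_156
START = datetime(2000, 12, 11, 9, 1, 51)
END = datetime(2000, 12, 17, 12, 47, 43)


def success_rate(sent: int, received: int) -> float:
    """Percentage of ICMP echo requests that received a reply."""
    return 100.0 * received / sent


def average_rate(count: int, start: datetime, end: datetime) -> float:
    """Average events per second over the whole wall-clock window."""
    return count / (end - start).total_seconds()


def rows_per_minute(rows: int, minutes: float) -> float:
    """Bulk-insert throughput in rows per minute."""
    return rows / minutes


if __name__ == "__main__":
    print(f"success rate:        {success_rate(SENT, RECEIVED):.2f}%")
    print(f"test duration:       {(END - START).total_seconds():.0f} s")
    print(f"effective ping rate: {average_rate(SENT, START, END):.1f} pings/s")
    print(f"insert rate:         {rows_per_minute(480_000, 44):.0f} rows/min")
```

Running it prints a success rate of 98.67%, a duration of 531952 s, an effective rate of 904.5 pings/s and an insert rate of 10909 rows/min.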
ICMP statistics by day (Monday and Sunday are partial days: the test started Monday at 09:01 and ended Sunday at 12:47):

+----------------+-----------+----------+----------------+-------------+
| Day            | Worst (s) | Avg (ms) | Sent (million) | Success (%) |
+----------------+-----------+----------+----------------+-------------+
| Mon 2000-12-11 |      63.6 |    191.2 |          49.83 |       98.95 |
| Tue 2000-12-12 |      65.4 |    151.5 |          79.45 |       98.85 |
| Wed 2000-12-13 |      65.5 |    154.5 |          78.43 |       98.19 |
| Thu 2000-12-14 |      65.4 |    142.1 |          78.22 |       98.64 |
| Fri 2000-12-15 |      61.7 |    137.5 |          78.03 |       98.71 |
| Sat 2000-12-16 |        65 |    128.3 |          77.78 |       98.78 |
| Sun 2000-12-17 |        46 |    125.7 |          39.41 |       98.68 |
+----------------+-----------+----------+----------------+-------------+

ICMP statistics by hour of day (all days combined):

    select hour(from_unixtime(time)) as hour, max(worst)/1000, avg(avg)/1000,
           sum(sent)/100000, 100*sum(received)/sum(sent)
      from router group by hour;

Response times are stored in microseconds, so dividing by 1000 gives milliseconds. The Sent column is in units of 100,000 packets (the 24 values add up to the 481 million total).

+------+------------+--------------+-------------+-------------+
| Hour | Worst (ms) | Average (ms) | Sent (100k) | Success (%) |
+------+------------+--------------+-------------+-------------+
|    0 |   57278.37 |   138.059590 |      196.18 |       98.67 |
|    1 |   43301.89 |   136.007962 |      195.81 |       98.71 |
|    2 |   51901.02 |   131.662991 |      196.98 |       98.78 |
|    3 |   63325.29 |   131.654931 |      196.59 |       98.69 |
|    4 |   65422.86 |   164.178118 |      194.76 |       97.72 |
|    5 |   63522.00 |   139.733280 |      195.40 |       98.32 |
|    6 |   49304.99 |   142.706104 |      196.46 |       98.77 |
|    7 |   55722.09 |   146.657704 |      196.71 |       98.68 |
|    8 |   65000.23 |   148.779013 |      195.82 |       98.74 |
|    9 |   59250.37 |   165.504852 |      196.40 |       98.71 |
|   10 |   65274.52 |   161.591378 |      229.10 |       98.68 |
|   11 |   63620.93 |   172.375448 |      229.41 |       98.68 |
|   12 |   39172.46 |   163.020699 |      233.48 |       98.57 |
|   13 |   44416.79 |   170.170915 |      196.93 |       98.67 |
|   14 |   54636.03 |   162.751704 |      196.07 |       98.69 |
|   15 |   48788.04 |   148.042846 |      220.50 |       98.78 |
|   16 |   65469.82 |   137.381458 |      171.37 |       98.75 |
|   17 |   65377.39 |   135.851451 |      197.15 |       98.78 |
|   18 |   56507.93 |   137.741980 |      195.72 |       98.77 |
|   19 |   60658.95 |   138.538186 |      196.17 |       98.69 |
|   20 |   59730.43 |   136.892590 |      195.92 |       98.82 |
|   21 |   59888.10 |   138.834535 |      195.69 |       98.83 |
|   22 |   65442.16 |   138.655882 |      196.02 |       98.78 |
|   23 |   64260.51 |   139.809274 |      196.88 |       98.78 |
+------+------------+--------------+-------------+-------------+

Distinct network traceroute paths encountered: 26.8 million
Maximum path length encountered: 28 hops
Average path length: 14.1 hops
(Path length is capped by a limit I built into the tool.
I should probably raise it.)
Maximum number of distinct routes to one host within an hour: 9

Table of router distances, performance and reliability:
-------------------------------------------------------

(Here one of MySQL's limitations surfaced. MySQL does not support nested SELECTs, so I had to create a temporary table first:

    create table tempx (host int unsigned, hop int, primary key (hop, host));
    insert into tempx select distinct router, hop from path;
    select tempx.hop, count(*), avg(avg), 100*sum(received)/sum(sent)
      from tempx, router where router.host = tempx.host group by tempx.hop;
    select hop, count(*) from tempx group by hop;

Of course this query took even longer to run than the earlier ones.)

Hop = distance from our colo; # IPs = distinct router addresses seen at that hop; RTA = average round-trip time.

+-----+-------+----------+-------------+
| Hop | # IPs | RTA (ms) | Success (%) |
+-----+-------+----------+-------------+
|   1 |     2 |      3.1 |         100 |
|   2 |     2 |      1.7 |       99.84 |
|   3 |     2 |      2.5 |       99.85 |
|   4 |    30 |       19 |       99.28 |
|   5 |   222 |       41 |       99.01 |
|   6 |   615 |     56.6 |       98.91 |
|   7 |  1359 |     70.8 |       98.85 |
|   8 |  2694 |     86.7 |       98.67 |
|   9 |  4120 |     97.1 |       98.59 |
|  10 |  5255 |    108.2 |       98.66 |
|  11 |  6726 |    117.5 |       98.12 |
|  12 |  8044 |    134.7 |       98.01 |
|  13 |  8666 |    153.5 |       97.49 |
|  14 |  8936 |      165 |       97.61 |
|  15 |  8241 |    179.4 |       97.44 |
|  16 |  7248 |    190.9 |       97.70 |
|  17 |  5697 |    204.4 |       97.36 |
|  18 |  4326 |    218.8 |       97.77 |
|  19 |  3118 |    232.2 |       97.31 |
|  20 |  2359 |    256.2 |       97.78 |
|  21 |  1872 |      267 |       97.01 |
|  22 |  1354 |    280.8 |       97.73 |
|  23 |  1092 |    295.7 |       97.31 |
|  24 |   874 |    313.8 |       96.58 |
|  25 |   690 |    316.5 |       97.03 |
|  26 |   586 |      314 |       96.58 |
|  27 |   516 |      305 |       97.43 |
|  28 |   481 |    303.8 |       96.82 |
+-----+-------+----------+-------------+

Interpretation:
1. Our colo needs 3 hops to get traffic out to the Internet.
2. Level 3 has 30 major connection points locally (the 30 addresses at hop 4).
3. Routing across the Internet backbone results in a loss of more than 1% (success drops below 99% from hop 6 on).
4. 41 ms is roughly the round-trip time for a signal to cross the US and come back. I guess the 222 addresses at hop 5 could be seen as L3's major exchange points.

Total number of IP addresses encountered (and monitored for loss and performance!): 35,453
Total routers discovered: 21,398
Total hosts: 14,055
Total number of complaints about pinging from us: 2

Out of disk space:
At the end of the week the / partition filled up. MySQL stopped and no longer accepted new data, and processes hung as a result. Data was spooled to disk files.
The database was then moved to another partition with more space. A recovery run of about an hour made the data accessible again, and the spooled data was then inserted into the database.

1U box: Handled the load just fine. Equipment: IDE hard disk, 433 MHz Celeron CPU, 128 MB RAM. There was barely any load on the system except during the hourly database dumps.

MySQL: Basically fine, except that it does not support full SQL. Its advantage is that it is relatively light on system resources, so it can be deployed on the same machine as the monitoring tool.
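The temp-table workaround used for the hop-distance table, and the nested SELECT it stands in for, can be compared side by side. This is a minimal sketch using Python's built-in sqlite3 module rather than MySQL (later MySQL releases, from 4.1 on, do support subqueries). The table layouts `path(router, hop)` and `router(host, avg, sent, received)` are assumptions inferred from the queries in this report, and the sample rows are made up. One porting detail: SQLite uses integer division on integers, so the success percentage is computed with `100.0` instead of `100`.

```python
import sqlite3

# Hypothetical minimal schemas inferred from the report's queries;
# the real pathfinder tables certainly have more columns (time, worst, ...).
SCHEMA = """
CREATE TABLE path   (router INTEGER, hop INTEGER);
CREATE TABLE router (host INTEGER, avg REAL, sent INTEGER, received INTEGER);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample: router 1 appears twice at hop 1 (two traceroutes),
    # router 3 shows up at hops 2 and 3 on different paths.
    conn.executemany("INSERT INTO path VALUES (?, ?)",
                     [(1, 1), (1, 1), (2, 2), (3, 2), (3, 3)])
    conn.executemany("INSERT INTO router VALUES (?, ?, ?, ?)",
                     [(1, 3100.0, 100, 100),
                      (2, 1700.0, 100, 99),
                      (3, 2500.0, 200, 198)])
    return conn


def hop_stats_temp_table(conn: sqlite3.Connection) -> list:
    """Workaround without subqueries: materialize distinct (router, hop) pairs."""
    conn.execute("DROP TABLE IF EXISTS tempx")
    conn.execute("CREATE TEMP TABLE tempx "
                 "(host INTEGER, hop INTEGER, PRIMARY KEY (hop, host))")
    conn.execute("INSERT INTO tempx SELECT DISTINCT router, hop FROM path")
    return conn.execute(
        "SELECT tempx.hop, COUNT(*), AVG(router.avg), "
        "100.0 * SUM(router.received) / SUM(router.sent) "
        "FROM tempx JOIN router ON router.host = tempx.host "
        "GROUP BY tempx.hop ORDER BY tempx.hop").fetchall()


def hop_stats_subquery(conn: sqlite3.Connection) -> list:
    """Same result in one statement, using a derived table (nested SELECT)."""
    return conn.execute(
        "SELECT t.hop, COUNT(*), AVG(r.avg), "
        "100.0 * SUM(r.received) / SUM(r.sent) "
        "FROM (SELECT DISTINCT router AS host, hop FROM path) AS t "
        "JOIN router AS r ON r.host = t.host "
        "GROUP BY t.hop ORDER BY t.hop").fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(hop_stats_temp_table(db))
    print(hop_stats_subquery(db))
```

With the sample rows, both functions return `[(1, 1, 3100.0, 100.0), (2, 2, 2100.0, 99.0), (3, 1, 2500.0, 99.0)]`: one row per hop with the joined row count, the average response time in microseconds, and the success percentage.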