We have recently added another transit feed to our New York PoP, with the declared aim of bringing latency between London and New York below 70 ms. We are happy to report that current latencies between London Telehouse and New York are now around 67 ms. An update to our latency overview has been posted here as well: https://worralorrasurfa.castlegem.co.uk/whmcs/knowledgebase.php?action=displayarticle&id=43.
We would like to take this opportunity to explain the essentials of latency and throughput.
Network latency describes how long a packet takes to travel from its source to its destination. Network throughput, by contrast, describes how much data you can send per unit of time. Latency and throughput are usually not directly related, unless a link becomes saturated (in which case throughput will drop and latencies will most likely rise). Different applications and purposes require different levels of quality in terms of latency and throughput.
For example, if you want to manage a Linux server via ssh from home, you want low latency: you want to see what you type right away and not wait for ages for the characters to appear in your shell. Latency here is key, but throughput is not that important: ssh does not need enormous amounts of bandwidth. Video streaming is a different matter. If you want to watch YouTube videos, you want the video to come down your internet connection as smoothly as if you were watching TV at home. In this case you need decent throughput, i.e. a lot of data per unit of time, but latency is not much of an issue: it won't matter much if your video starts after 1 or 2 seconds, as long as playback is smooth.
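To put rough numbers on the two examples above, here is a minimal sketch that estimates how long a piece of data takes to arrive as latency plus the time needed to push the bytes through the link. The function name, the payload sizes and the link speeds are our own illustrative assumptions, not measurements, and the model ignores protocol overhead entirely.

```python
def transfer_time_s(size_bytes, latency_ms, throughput_mbps):
    """Rough arrival time: one-way latency plus time to push the bytes out.

    Simplified model for illustration only; it ignores TCP handshakes,
    protocol overhead and congestion.
    """
    seconds_on_the_wire = (size_bytes * 8) / (throughput_mbps * 1_000_000)
    return latency_ms / 1000.0 + seconds_on_the_wire


# Hypothetical payloads: an ssh keystroke (~100 bytes) and a 5 MB video chunk.
for label, size in (("ssh keystroke", 100), ("video chunk", 5_000_000)):
    for mbps in (10, 100):
        t = transfer_time_s(size, latency_ms=67, throughput_mbps=mbps)
        print(f"{label:14s} at {mbps:3d} Mbit/s: {t:.4f} s")
```

For the keystroke, the result is dominated by the 67 ms of latency no matter how fast the link is; for the video chunk, the link speed is what makes the difference.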
Currently, we see the emphasis on low latency increasing. Latency has always been a big concern for us due to the nature of our clients (many of them are traders who require excellent latencies to the exchanges), but in general throughput used to be the decisive parameter for internet connections. Part of this shift in emphasis, we believe, is due to the fact that most typical internet applications nowadays run perfectly well on the bandwidth that is commonly available.
How can we measure latency and throughput? For latencies, ping, traceroute, and mtr are excellent friends. We wrote about these in a previous post, but let’s go into some examples:
ping, put simply, checks the connectivity between source and destination:
# ping HOSTNAME
PING HOSTNAME (IP) 56(84) bytes of data.
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=1 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=2 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=3 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=4 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=5 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=6 ttl=60 time=66.8 ms
We can see that the latency between our host (a London Telehouse server) and the destination (one of our routers in New York) is a steady 66.8 ms. ping takes various arguments, such as the packet size (-s) or the number of packets to send (-c). The manpage (man ping) will give you the details.
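If you collect a longer ping run and want a quick summary, a small script can pull the round-trip times out of the output. This is only a sketch: it assumes reply lines in the Linux format shown above (with a "time=... ms" field), and the function names are our own.

```python
import re

# Matches the "time=66.8 ms" field of a ping reply line.
TIME_RE = re.compile(r"time=([\d.]+)\s*ms")


def ping_times_ms(output):
    """Return all round-trip times (in ms) found in ping output."""
    return [float(m.group(1)) for m in TIME_RE.finditer(output)]


def summarise(times):
    """Return (min, average, max) of a non-empty list of times."""
    return min(times), sum(times) / len(times), max(times)


# Three reply lines copied from the ping example above.
sample = """\
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=1 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=2 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=3 ttl=60 time=66.8 ms
"""

best, avg, worst = summarise(ping_times_ms(sample))
print(f"min/avg/max = {best:.1f}/{avg:.1f}/{worst:.1f} ms")
```

ping prints a similar min/avg/max summary itself when it finishes; the script is mainly useful when you only have saved output to work with.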
traceroute will not only check the latency between the source and destination, but will also show latencies (and thus possible issues) on the way there:
# traceroute HOSTNAME
traceroute to HOSTNAME(IP), 30 hops max, 60 byte packets
1 ... (...) 0.419 ms 0.463 ms 0.539 ms
2 40ge1-3.core1.lon2.he.net (22.214.171.124) 10.705 ms 10.706 ms 10.422 ms
3 100ge1-1.core1.nyc4.he.net (126.96.36.199) 67.176 ms 67.189 ms 67.174 ms
4 10ge9-7.core1.sjc2.he.net (188.8.131.52) 141.010 ms 140.897 ms 140.928 ms
5 10ge1-2.core1.fmt2.he.net (184.108.40.206) 136.597 ms 136.746 ms 136.885 ms
6 ....castlegem.co.uk (IP) 136.855 ms 136.437 ms 136.635 ms
As we can see, we get rather stable latencies all the way from London to California. Large variations in the latencies along the way are not necessarily an indication of problems, though, as long as the latencies to the destination itself are smooth and regular. Possible reasons for deviations on the way to your destination are routers rate limiting their replies or, in the worse case, routers or networks that are indeed congested (we will get to measuring throughput shortly).
mtr can in a way be considered the combination of ping and traceroute. It displays the network path packets travel, and it keeps doing that by sending packet after packet.
HOSTNAME (0.0.0.0) Fri Mar 7 09:51:28 2014
Keys: Help Display mode Restart statistics Order of fields quit
Host Loss% Snt Last Avg Best Wrst StDev
1. IP 0.0% 28 0.3 1.9 0.3 41.4 7.7
2. vl365-globalcrossing-peer.jump.net.uk 0.0% 28 0.3 5.8 0.3 64.3 16.6
3. po7-20G.ar4.CHI2.gblx.net 0.0% 28 259.2 114.3 89.3 259.2 57.0
4. DESTINATION 0.0% 28 91.8 91.9 91.6 94.5 0.6
We can see that hop #3 has a large standard deviation, but latency to the destination is very consistent. In our case, this is from London to Chicago. Hop #3 simply seems to rate limit these probing packets, and/or is busy doing other things than talking to us, hence the larger latency. It is not uncommon to see packet loss on intermediate routers either. This is fine and also due to rate limiting mechanisms, as long as the destination itself remains consistent, i.e. shows no packet loss and no extreme deviations.
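The rule of thumb above (judge the destination, not the hops in between) can be sketched in a few lines. The figures are taken from the mtr output above; the thresholds and all names are hypothetical choices of ours, so adjust them to whatever you consider acceptable for your own path.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    host: str
    loss_pct: float
    avg_ms: float
    stdev_ms: float


def assess(hops, max_loss_pct=0.0, max_stdev_ms=5.0):
    """Return (destination_ok, noisy_hop_numbers).

    Only the last hop (the destination) decides whether the path looks
    healthy; noisy intermediate hops are reported for information only.
    The default thresholds are arbitrary example values.
    """
    *intermediate, destination = hops
    destination_ok = (destination.loss_pct <= max_loss_pct
                      and destination.stdev_ms <= max_stdev_ms)
    noisy = [number for number, hop in enumerate(intermediate, start=1)
             if hop.loss_pct > max_loss_pct or hop.stdev_ms > max_stdev_ms]
    return destination_ok, noisy


# Loss%, Avg and StDev columns from the mtr run above.
hops = [
    Hop("IP", 0.0, 1.9, 7.7),
    Hop("vl365-globalcrossing-peer.jump.net.uk", 0.0, 5.8, 16.6),
    Hop("po7-20G.ar4.CHI2.gblx.net", 0.0, 114.3, 57.0),
    Hop("DESTINATION", 0.0, 91.9, 0.6),
]

ok, noisy = assess(hops)
print("destination consistent:", ok)
print("noisy intermediate hops:", noisy)
```

With these example thresholds all three intermediate hops count as noisy, yet the destination still passes, which matches the reading given above.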
That is all good, but how do we check throughput? There are several makeshift ways to measure throughput. They range from timing browser requests on the command line (such as time lynx -source http://www.google.com/ > /dev/null) to using ftp with hashmarks on and the more common wget http://HOST/testfile. These will all give you a cursory glimpse of how fast you can download data from a destination to your computer. There is, however, a very nice tool called iperf that does this job in a very professional manner.
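In the same makeshift spirit, a timed download can also be scripted. The following is a rough sketch using only the Python standard library; the function names are our own, and http://HOST/testfile is the same placeholder as above, so substitute a test file you are allowed to download.

```python
import time
import urllib.request


def timed_read(stream, chunk_size=64 * 1024):
    """Read a file-like object to the end; return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def mbits_per_sec(num_bytes, seconds):
    """Convert a byte count and a duration into Mbit/s (10^6 bits per second)."""
    return num_bytes * 8 / seconds / 1_000_000


def measure_download(url):
    """Download url once and return the observed throughput in Mbit/s."""
    with urllib.request.urlopen(url) as response:
        num_bytes, seconds = timed_read(response)
    return mbits_per_sec(num_bytes, seconds)


# Usage (placeholder URL, not run here):
#   print(measure_download("http://HOST/testfile"))
```

Like the other makeshift methods, a single download only gives a rough figure: the result also depends on the size of the test file and on how busy the server is.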
iperf can measure throughput between two network locations, and it can give you a good idea of where bottlenecks are when used in combination with traceroute or mtr. The drawback of iperf is that you need not only a client, but also a server to connect to. iperf is thus primarily a professional tool, i.e. something set up between providers, or between commercial clients and their providers, to sort out potential issues, define SLAs, etc.
There is an excellent introductory article on iperf from 2007, which we are happy to link to here: http://www.enterprisenetworkingplanet.com/netos/article.php/3657236/Measure-Network-Performance-with-iperf.htm.
Example output, both from the server and client side, can be seen below:
# ./iperf -s
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
[ 4] local IPx port 5001 connected with IPy port 59508
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.1 sec 566 MBytes 472 Mbits/sec
# ./iperf -c HOSTNAME -t 10
Client connecting to HOSTNAME, TCP port 5001
TCP window size: 23.2 KByte (default)
[ 3] local IPy port 59508 connected with IPx port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 566 MBytes 474 Mbits/sec
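To see how the Transfer and Bandwidth columns relate, the figures can be recomputed by hand. This sketch assumes that iperf counts a MByte as 1024 * 1024 bytes and an Mbit as 10^6 bits; the function name is our own.

```python
def iperf_mbits_per_sec(transfer_mbytes, interval_s):
    """Recompute bandwidth from iperf's Transfer and Interval columns.

    Assumes 1 MByte = 1024 * 1024 bytes and 1 Mbit = 10**6 bits.
    """
    bits = transfer_mbytes * 1024 * 1024 * 8
    return bits / interval_s / 1_000_000


# Figures from the client and server output above.
client = iperf_mbits_per_sec(566, 10.0)
server = iperf_mbits_per_sec(566, 10.1)
print(f"client: ~{client:.0f} Mbits/sec, server: ~{server:.0f} Mbits/sec")
```

The results land within about one percent of the reported 474 and 472 Mbits/sec; the small difference is plausibly down to the interval being displayed with only one decimal place.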
Here we conclude our brief overview and hope that some of you will find it useful indeed!