Many website and system administrators are the sort of people who only believe things when they see them for themselves. This is especially true of performance testing: one of the first things folks want to do with anything that claims to improve website performance is to put those claims to the test. This appendix points out some basic facts about such stress and performance testing.
You won't see much improvement unless you let aiScaler do what it is designed to do: cache content. Make sure you define some patterns with non-zero TTLs. A very simple setup is a catch-all pattern that caches every URL for 60 seconds.
Be aware that the first request for any given URL to a cold aiScaler instance results in a cache miss, so that request has to go all the way to the origin server to fill the cache. Subsequent requests, for the next 60 seconds in our example, are served from the cache.
Obviously, you wouldn’t want to go to production with a cache-all pattern, but it is useful for testing purposes.
You can use Apache Benchmark, or "ab". It is normally installed along with the Apache web server and can therefore be found, free of charge, on almost any Linux system. To test a URL with 10 concurrent load threads, for a total of 2000 requests, you'd run something like this:
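A minimal invocation might look like the following; the hostname and URL path are placeholders, so substitute your own aiScaler instance and a cached URL:

```shell
# -n: total number of requests; -c: number of concurrent threads
ab -n 2000 -c 10 http://your-aiscaler-host/some-cached-url
```

Adding `-k` enables HTTP keep-alive, which usually raises the numbers noticeably by reusing connections.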
The report that ab produces is largely self-explanatory. You want to compare the before-and-after numbers for requests per second (RPS), time per request, and time to first byte. ab is also the easiest tool to reach for when doing some quick testing.
A more capable benchmarking tool is JMeter. It can run in a master-slave configuration, with the main instance controlling a number of slave load generators. It has a flexible reporting API, with a number of plugins available that can graph all kinds of statistics in real time, and you can create fairly sophisticated test scripts, for example crafting custom URLs from values read from files. Refer to the JMeter manual for more information.
Let's say you have two servers: one runs aiScaler and the other generates the load. Make sure you understand the network connectivity between the two. Are they on the same network/switch, or a thousand miles apart? What is the maximum throughput you can achieve between them?
A best-case scenario places both systems on the same network, in the same datacenter and ideally on the same switch, at 1Gbps. Let's say the switch is not oversubscribed and can deliver the full 1Gbps in both directions (using full-duplex links). This means you can expect to pass about 100MB per second in each direction.
Say the URL you're testing returns a response 20KB in size. You fire up ab, point it at the aiScaler instance on the other server, tweak the number of load-generating threads for a while, but quickly grow frustrated: aiScaler doesn't seem to be able to handle more than 5000 RPS, no matter what you do. Both servers barely register any load during the test, so you suspect something must be seriously wrong somewhere. A smirk might cross your face: another overhyped product, clearly incapable of reaching its stated performance numbers; you've seen plenty of those before. Guess what: nothing is wrong. At 5000 RPS, each with a 20KB response payload, you have fully saturated your network! Re-read the previous sentence until you fully understand what is happening.
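The arithmetic behind that ceiling is worth making explicit; using the roughly 100MB/s of usable bandwidth from the scenario above:

```shell
usable_bps=100000000          # ~100 MB usable per second on a full-duplex 1Gbps link
resp_bytes=20000              # 20KB response payload
# bandwidth ceiling in requests per second -- no software can exceed this
echo $((usable_bps / resp_bytes))   # prints 5000
```

No amount of tuning on either server will push past this number; only a bigger pipe or a smaller response will.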
Now, if the two servers are in different datacenters, you can end up with much smaller RPS numbers. In more severe cases, you become limited not only by available bandwidth, but also by latency.
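To see how latency caps throughput, consider a rough sketch; the 50ms round-trip time and 10 load threads are assumed figures, not measurements. Each thread can complete at most about one request per round trip, so the ceiling is roughly concurrency divided by RTT, regardless of bandwidth:

```shell
concurrency=10     # number of ab load threads (assumed)
rtt_ms=50          # round-trip time between datacenters (assumed)
# each thread completes ~1 request per round trip
echo $((concurrency * 1000 / rtt_ms))   # prints 200 -- the latency-bound RPS ceiling
```

At 200 RPS, a test like this says nothing about aiScaler's capacity; it only measures the distance between your datacenters.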
Let's say you again have two servers: one runs aiScaler and the other generates the load. Both are on the same network/switch and you have the full 1Gbps of bandwidth available. The response size is a mere 40 bytes, headers and all, so by simple math you expect to see around 250,000 RPS when performing the stress test.
In reality, however, you will likely discover that a single load client is incapable of generating enough load to saturate the aiScaler server. You'll likely need four or more load clients to reach the 250K RPS mark.
General performance improvement suggestions:
Another limit on the maximum number of open connections may come from a system-wide limit. To see what it is set to, and to change it, read from or write to /proc/sys/fs/file-max:
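For example (the 2000000 below is just an illustrative value, not a recommendation for your system):

```shell
# current system-wide ceiling on open file handles
cat /proc/sys/fs/file-max
# to raise it (as root):
#   echo 2000000 > /proc/sys/fs/file-max
#   sysctl -w fs.file-max=2000000    # equivalent; persistable via /etc/sysctl.conf
```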
Increase the number of per-process open file descriptors via the ulimit command.
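For example:

```shell
ulimit -n          # show the current per-process open-file limit (often 1024 by default)
# to raise it for this shell and its children (up to the hard limit;
# root can raise the hard limit as well):
#   ulimit -n 65535
# to make the change permanent, set "nofile" entries in /etc/security/limits.conf
```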