Troubleshooting 101 with aiScaler
Posted by Max Robbins on October 25th, 2010
You’re in charge of a complex web site with a multitude of sub-domains, hosting all kinds of information: editorial news, search, viewer comments, videos, news feeds, financial stock quotes. It is a thing of beauty, with 20+ APIs of all sorts and a few dozen web, application and database servers, all working in concert to drive millions of page views per day.
Your site is humming along quite nicely when all of a sudden your monitoring screens light up red and the site slows to a crawl. A minute later it is down and no responses are getting back to the clients. Where do you start looking to understand what’s going on? What do you do to restore the service ASAP? How do you make sure it doesn’t happen again?
With aiCache front-ending the traffic, problem resolution becomes a straightforward exercise. A divide-and-conquer approach is in order: we need to understand which component is ailing and which traffic patterns are prevalent.
This is where you start: pull up the aiScaler Web monitor, which shows all of the domains aiScaler is accelerating. Look for the ones showing the highest client/origin server session counts, the slowest response times and the largest increases in traffic.
aiScaler displays all of that information in real time, refreshing it every few seconds. You instantly see which site is getting hammered with traffic. aiScaler displays average requests/second over the last 5 seconds, the last minute and the last hour, so you quickly see which site saw the biggest traffic jump.
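To make the three averages concrete, here is a minimal sketch of how requests/second over several look-back windows can be derived from 5-second counters. It only illustrates the arithmetic and is not aiScaler's implementation; the `RateWindow` class and its method names are made up for this example.

```python
from collections import deque


class RateWindow:
    """Hypothetical helper: keeps per-5-second request counts and reports
    average requests/second over a chosen look-back period."""

    BUCKET_SECONDS = 5

    def __init__(self, max_seconds=3600):
        # One bucket per 5-second interval; the oldest buckets fall off
        # automatically once an hour's worth has been collected.
        self.buckets = deque(maxlen=max_seconds // self.BUCKET_SECONDS)

    def record(self, request_count):
        """Store the number of requests seen in the latest 5-second interval."""
        self.buckets.append(request_count)

    def rate(self, seconds):
        """Average requests/second over the most recent `seconds` of data."""
        n = max(1, seconds // self.BUCKET_SECONDS)
        recent = list(self.buckets)[-n:]
        if not recent:
            return 0.0
        return sum(recent) / (len(recent) * self.BUCKET_SECONDS)


window = RateWindow()
for count in [500] * 11 + [5000]:  # one steady minute, ending with a spike
    window.record(count)

print(window.rate(5))     # 1000.0 requests/second in the last 5 seconds
print(window.rate(60))    # 175.0 requests/second over the last minute
```

A spike stands out immediately in the 5-second figure while the one-minute figure still looks calm, which is why comparing the windows side by side tells you which site just saw a jump.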
The all-websites overview screen lets you zoom into a particular site or see a list of pending origin server requests. The latter is likely to show, per website and again in real time, the URLs that are unable to obtain quick responses from origin servers.
Many setups share components, so that within 10-20 seconds of a spike against one of the services/sub-domains, the whole site comes to a screeching halt.
How do you reconstruct the sequence of events? Again, aiCache comes to the rescue: it collects 5-second snapshots of traffic and stores them both in aiScaler memory, where they can be instantly pulled up for each of the accelerated websites, and in the statistics log files, one per accelerated domain.
You can use the “runstat” CLI command or look at the statistics log files to see which domain was first to see elevated traffic levels and/or slower response times from origin servers.
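If you export those snapshots for offline analysis, the "who went bad first" question can be answered mechanically. The sketch below assumes you have already parsed the statistics into `(timestamp, domain, requests_per_sec, origin_ms)` tuples; that tuple layout, the function names and the thresholds are assumptions made for this example, not the actual aiScaler log format.

```python
from statistics import mean


def first_anomalies(snapshots, baseline_points=3, factor=2.0):
    """Return {domain: timestamp} of the first snapshot in which a domain's
    request rate or origin response time exceeds `factor` times its baseline.
    The baseline is the mean of the domain's first `baseline_points` snapshots."""
    history = {}
    first_seen = {}
    for ts, domain, rps, origin_ms in sorted(snapshots):
        past = history.setdefault(domain, [])
        if domain not in first_seen and len(past) >= baseline_points:
            base_rps = mean(p[0] for p in past[:baseline_points])
            base_ms = mean(p[1] for p in past[:baseline_points])
            if rps > factor * base_rps or origin_ms > factor * base_ms:
                first_seen[domain] = ts
        past.append((rps, origin_ms))
    return first_seen


def earliest_domain(snapshots):
    """Domain whose first anomaly has the lowest timestamp, or None."""
    hits = first_anomalies(snapshots)
    return min(hits, key=hits.get) if hits else None


# Hypothetical 5-second snapshots for two sub-domains.
snapshots = [
    (0, "search.example.com", 100, 50), (0, "news.example.com", 200, 40),
    (5, "search.example.com", 100, 50), (5, "news.example.com", 200, 40),
    (10, "search.example.com", 100, 50), (10, "news.example.com", 200, 40),
    (15, "search.example.com", 400, 60), (15, "news.example.com", 200, 40),
    (20, "search.example.com", 500, 300), (20, "news.example.com", 200, 45),
    (25, "search.example.com", 500, 900), (25, "news.example.com", 210, 500),
]

print(first_anomalies(snapshots))
print(earliest_domain(snapshots))  # search.example.com
```

In this made-up data the search sub-domain spikes at the 15-second mark and the news sub-domain only slows down 10 seconds later, the kind of knock-on effect you would expect when the two share a backend component.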
You can also use the CLI’s “inventory” commands, in their “sorted by fill time, number of requests, number of fills” varieties, to see the most requested URLs and the slowest URLs to obtain, again for each sub-domain.
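The idea behind those sorted views is simple enough to sketch. The example below ranks a handful of per-URL records by whichever metric you care about; the `UrlStats` record, its field names and the sample numbers are hypothetical and do not reflect the output format of the inventory commands.

```python
from dataclasses import dataclass


@dataclass
class UrlStats:
    """Hypothetical per-URL record for one sub-domain."""
    url: str
    requests: int       # how many times clients asked for the URL
    fills: int          # how many times it had to be fetched from origin
    avg_fill_ms: float  # average time the origin took to answer


def top(entries, key, n=3):
    """Return the `n` entries with the largest value of the field `key`."""
    return sorted(entries, key=lambda e: getattr(e, key), reverse=True)[:n]


entries = [
    UrlStats("/search?q=stocks", 12000, 900, 850.0),
    UrlStats("/news/front", 45000, 40, 35.0),
    UrlStats("/quotes/feed", 8000, 2500, 1200.0),
    UrlStats("/video/list", 3000, 15, 60.0),
]

for entry in top(entries, "avg_fill_ms", n=2):
    print(entry.url, entry.avg_fill_ms)
```

Sorting by `requests` answers "what is everyone asking for?", while sorting by `avg_fill_ms` or `fills` points at the URLs that lean hardest on the origin servers.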
You narrow your search down to a failing domain. It may still be unclear how to go about fixing it, but in the meantime you want to restore the service so that all the other domains/services can start working again.
The easiest way to accomplish that is to put the ailing sub-domain into “fallback” mode. In this mode aiScaler continues to serve cached, possibly stale, content while completely disengaging the origin servers, letting your team concentrate on restoring that service.
For a busy website, use of the fallback command has another important benefit. Because aiScaler now either instantly delivers a cached (possibly stale) response or, just as instantly, sends an error message to the client, you stop the uncontrolled growth of session counts on your network/security devices and the existing sessions begin to dissipate.
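The behavior described above boils down to one rule: in fallback mode, never wait on the origin. Here is a minimal sketch of that decision logic, under stated assumptions: the `handle_request` function, the `(body, expires_at)` cache layout, the 10-second TTL and the 503 status code are all choices made for this example and are not taken from aiScaler itself.

```python
import time


def handle_request(url, cache, fallback, fetch_from_origin, ttl=10, now=None):
    """Return (status, body) for `url`.

    `cache` maps url -> (body, expires_at). In fallback mode the origin is
    never contacted: the client gets cached content (even if stale) or an
    immediate error, so no request sits waiting on an ailing backend."""
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry and entry[1] > now:
        return 200, entry[0]            # fresh cache hit
    if fallback:
        if entry:
            return 200, entry[0]        # stale, but served instantly
        return 503, "Service temporarily unavailable"  # assumed error response
    body = fetch_from_origin(url)       # normal mode: refill from origin
    cache[url] = (body, now + ttl)
    return 200, body


origin_calls = []


def origin(url):
    """Stand-in for a real origin server; records every call it receives."""
    origin_calls.append(url)
    return "fresh:" + url


cache = {"/a": ("old-a", 0)}            # "/a" is cached but long expired

print(handle_request("/a", cache, True, origin, now=100))   # (200, 'old-a')
print(handle_request("/b", cache, True, origin, now=100))   # 503, nothing cached
print(origin_calls)                                         # [] in fallback mode
```

Both fallback answers come back without a single origin call, which is what lets the piled-up sessions drain while the team works on the real fix.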
Bottom line: with aiScaler in the path of traffic, understanding, troubleshooting and recovering websites becomes a fairly easy and straightforward exercise. Real-time insight into traffic flows is reason enough to insert aiScaler into the path of your web traffic, and it provides a host of additional benefits as well.
aiCache: Get your life back …