
Dynamic Site Acceleration explained

Dynamic Site Acceleration (aka “whole site acceleration”) is a group of techniques that make dynamic websites faster, while offloading the application servers. Here are some common techniques that are part of DSA and featured in aiScaler:

  1. Connection management: reducing client connection time & TCP multiplexing
  2. Dynamic cache control
  3. Preloading and pre-fetching of uncacheable third-party API responses
  4. On-the-fly response compression
  5. Offloading SSL termination
  6. Full page caching and whole site delivery
  7. TCP/IP optimizations
  8. Route optimization

1. Connection management: reducing client connection time (by buffering non-cacheable content)

Non-cacheable requests are always forwarded verbatim [1] to the origin servers, and the responses obtained are fed back to the requesting clients without being cached or shared. Even in this situation, with none of the benefits of caching, aiScaler offers a very important advantage: it takes the task of dealing with client network connections away from the web servers. Without aiScaler, each client connection requires a dedicated process that lives on the web server for the duration of the connection. When a client has a slow connection, this occupies the origin server, because the process has to stay alive while waiting for a complete request. With aiScaler front-ending the traffic, the situation is different: aiScaler waits until it has obtained a complete and valid request from the client before sending it, virtually instantaneously, to the origin, using a faster connection and our efficient, zero-overhead processing of requests and responses. Do not underestimate this benefit: it can offload web servers significantly.

To illustrate, imagine a client consuming large responses from a web site over a slow or congested connection. Without aiScaler front-ending such requests and responses, most existing web servers have to dedicate a whole separate process or thread to sending and/or receiving data for that slow client, even when the actual generation or post-processing of the response is very fast. That dedicated process must be maintained for the duration of the connection, which could be 10 seconds or more.

Imagine more than a hundred responses like that being fed to clients at the same time and you can probably see how most web farms would have a problem. What if you have a few tens of thousands of connected users? The situation only gets worse! And chances are that the code generating the responses also holds an application server and a DB connection for the duration of the response, further compounding the problem and propagating even more load to your application and database servers.

Now, with aiScaler front-ending the traffic, the situation is very different. aiScaler obtains a complete request from a client, makes sure it is a valid request, and only then, virtually instantaneously, feeds it to an origin server.

Similarly, when an origin server is ready with a response, aiScaler consumes it virtually instantaneously, rather than tying up the origin server for much longer the way a regular client would. After obtaining a complete response from the origin server, aiScaler feeds the response to the requesting clients using its extremely efficient, zero-overhead network logic.

aiScaler has connection management turned on by default; it does not require additional configuration.

[1] aiScaler might perform some request/response header modifications if so configured.
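To make the buffering idea concrete, here is a minimal Python sketch of a buffering proxy. This is an illustration of the technique only, not aiScaler's implementation; the listening port (8000) and the origin address (127.0.0.1:8080) are made-up values.

import asyncio

async def handle_client(reader, writer):
    # Buffer the complete request from the (possibly slow) client first,
    # so the origin is never tied up waiting on a slow connection.
    # (A real proxy would also read any request body per Content-Length.)
    request = await reader.readuntil(b"\r\n\r\n")

    # Only now open a fast, local connection to the origin server.
    origin_reader, origin_writer = await asyncio.open_connection("127.0.0.1", 8080)
    origin_writer.write(request)
    await origin_writer.drain()

    # Consume the origin's response as fast as it can produce it
    # (this sketch assumes the origin closes the connection when done) ...
    response = await origin_reader.read(-1)
    origin_writer.close()

    # ... then feed it to the client at whatever pace the client can handle.
    writer.write(response)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", 8000)
    await server.serve_forever()

asyncio.run(main())

The origin only ever sees fully formed requests arriving over a fast local connection; the slow client-facing work stays entirely on the proxy side.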

Connection management: TCP Multiplexing a.k.a. HTTP keep-alive

Normally aiScaler does not force a connection to close after serving a response to a client, allowing clients to send more requests over the already-established connection. This saves the overhead of establishing TCP/IP connections, since it is more efficient than opening a new connection for every single request. This is known as connection persistence, or the Keep-Alive feature.

Instead of closing a client connection after serving a response, aiScaler indicates to clients, via the Keep-Alive HTTP header, how long they may keep the connection open.

The longer the connection stays open, the better the experience for your visitors. However, a large keep-alive setting means more open connections on your aiScaler server, so you might need to adjust your client Keep-Alive settings to find a middle ground. We have set a recommended default in the template of the configuration file, which can be found in our deployment tool.
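As an illustration of what persistence looks like from the client side, the following Python snippet sends several requests over a single TCP connection (the host name and paths are just examples; whether a Keep-Alive header comes back depends on the server's settings):

import http.client

# One TCP connection, reused for several requests (HTTP keep-alive).
conn = http.client.HTTPConnection("www.example.com")
for path in ("/", "/style.css", "/logo.png"):
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()  # drain the body before reusing the connection
    print(path, resp.status, resp.getheader("Keep-Alive"))
conn.close()

Three requests, one TCP handshake: that is the saving persistence buys you.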


2. Dynamic cache control: introduction

Dynamic cache control aims at increasing the cache-hit ratio of your website: the number of responses served from cache in relation to the responses served from origin. Serving content from a RAM-based cache is fast and cheap. Serving content from origin servers (database and web servers) consumes a lot of CPU and I/O, which results in slow websites and large, expensive server farms.

When a response comes directly from aiScaler's response cache, it completely eliminates the load that your web site components, such as web, application and database servers, would otherwise be subjected to. Since aiScaler's response cache is RAM-based, serving a cached response generates no disk I/O and very little CPU load.

Caching can truly be a life saver for many websites: the difference between staying up and being down, quick response times and unacceptably long waits, 10 servers vs. 100 servers, $50,000 vs. a million dollars spent on your infrastructure. Dynamic caching all comes down to increasing your cache-hit ratio. This can be difficult for dynamic websites, which is why aiScaler has extensive dynamic cache control.

Microcaching

Microcaching (caching content for very short periods) can make a tremendous difference to performance. Microcaching gives the impression that the site is still dynamic, although technically it is static content that is simply refreshed at short intervals. It can be especially useful in combination with full page caching. Let's illustrate this point:

Imagine a financial news website with stock quotes that update every 5 seconds. Let's say the front page receives 100 requests a second. Without microcaching, all requests head straight to the origin servers and most likely past them as well, straight to the app and database servers. If you enable caching for just 5 seconds, the origin server will only see 1 request every 5 seconds. That is a 500-fold reduction in traffic to your origin servers! This could be the difference between a site that cannot stay up even with dozens of origin servers (and a matching number of app and DB servers) and a site whose footprint can be reduced to, say, just 2-3 origin servers. As you can see, the cost savings from implementing microcaching can be astonishing. For busy URLs, you might even see meaningful results with TTLs of 1-2 seconds.
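The mechanics of microcaching fit in a few lines. Below is a minimal, generic sketch in Python (not aiScaler code; generate_from_origin is a hypothetical stand-in for the expensive app/DB work):

import time

CACHE_TTL = 5.0     # microcache TTL in seconds, per the example above
_cache = {}         # url -> (expires_at, response)

def generate_from_origin(url):
    # Hypothetical stand-in for the expensive origin/app/DB work.
    return "<html>...front page with fresh stock quotes...</html>"

def fetch(url):
    """Serve from the microcache while fresh; hit the origin otherwise."""
    now = time.monotonic()
    entry = _cache.get(url)
    if entry and entry[0] > now:
        return entry[1]                    # cache hit: zero origin load
    response = generate_from_origin(url)   # at most once per 5 seconds
    _cache[url] = (now + CACHE_TTL, response)
    return response

At 100 requests per second, only the first request in each 5-second window reaches generate_from_origin; the other ~499 are answered from RAM.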

Dynamic cache control: aiScaler vs traditional HTTP cache control

Traditional HTTP cache control has a number of optional request and response headers known as "conditional" headers. For example, when an HTTP response is delivered from a web server to a requesting client, some or all of the following headers could be sent back by the responding web server:

  • Etag (for example "Etag: 0924385FDCACCC"). This can be thought of as a digital fingerprint of the response.
  • Last-Modified (for example "Last-Modified: Jan 1 2001 01:01:01 EST").
  • Expires (for example "Expires: Jan 1 2001 01:01:01 EST").

At aiScaler we found these headers unnecessarily complicated and a source of mediocre performance. Unless otherwise configured, aiScaler eliminates both the Etag and Last-Modified conditional response headers for all cacheable content: it makes sure these are never propagated from origin servers to the requesting clients, and it never generates these headers on its own.

Additionally, aiScaler replaces the value of the Expires response header. For example, assuming a certain URL is configured to be cacheable for 10 minutes, aiScaler replaces the Expires value with "time now + 10 minutes", where "time now" is the moment the cache entry is first filled.

The end result is an encouragement of caching and reuse of content by the clients (browsers, intermediary proxies and caches), without burdening either the clients or the aiScaler server with the generation and processing of conditional requests.
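The header rewrite described above amounts to something like the following sketch (illustrative Python, not aiScaler's code; headers is assumed to be a plain dict of response headers):

import time
from email.utils import formatdate

def rewrite_cache_headers(headers, ttl_seconds):
    """Strip conditional headers and pin Expires to 'time now + TTL'."""
    headers.pop("ETag", None)            # never propagated to clients
    headers.pop("Last-Modified", None)   # never propagated to clients
    # formatdate(usegmt=True) yields an RFC 1123 date such as
    # 'Sat, 01 Jan 2022 01:01:01 GMT'
    headers["Expires"] = formatdate(time.time() + ttl_seconds, usegmt=True)
    return headers

Clients then simply reuse the response until the Expires moment passes; no conditional round trips are needed.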

Dynamic cache control: easy configuration

aiScaler uses "exact", "simple" or "regular-expression" pattern types to set up all caching rules. For example, to cache all jpg files for 7 days, the only line you need to add to the configuration file is:

pattern .jpg simple 7d


Or, to cache the front page for 10 minutes, you can use an exact pattern:

pattern /index.html exact 10m (or pattern / exact 10m)


aiScaler does not use XML or a special configuration language that you have to learn. However, if you need help with the configuration of your caching patterns, our engineers can do this for you at a cost-covering hourly rate. Simply contact our support.


Dynamic cache control: forcefully expire (purge) cache

aiScaler has several ways to invalidate cache when a dynamic page has changed. All of these prevent you from serving stale content to visitors, while still allowing you to serve content from cache whenever possible. Cache can be purged on a case-by-case basis using regular expressions, which means you can purge the cache for a specific group of URLs (a minimal sketch follows the list below). The options for triggering cache expiration are:

  • File-driven cache expiration. aiScaler checks a folder every 5 seconds for files that contain instructions to expire cache. This can be especially useful for communication with external APIs, like an inventory system for a web shop.
  • Response-driven cache invalidation. This feature is best explained by example. Let's say you have a message board web site where you cache both discussion threads and forum front pages. When a new message is added to a thread, you want aiScaler to expire the cached content of the respective discussion thread right away, not when its TTL runs out. This way the newly added message is seen by visitors as soon as possible, and posters are not confused when the message they've just posted doesn't show up in the thread right away, as they certainly expect it to.
  • CLI-based cache purging, for manual invalidation of cache.
  • Purging cache by visiting a secret URL, for sysadmins who want to purge cache but do not want to log in to the CLI.
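The regex-driven purge boils down to the following idea (a generic Python sketch over an in-memory cache, not aiScaler's file format or CLI syntax):

import re

cache = {
    "/forum/thread/42": "<html>...</html>",
    "/forum/thread/43": "<html>...</html>",
    "/static/logo.png": "...",
}

def purge(pattern):
    """Drop every cached entry whose URL matches the regular expression."""
    rx = re.compile(pattern)
    for url in [u for u in cache if rx.search(u)]:
        del cache[url]

purge(r"^/forum/thread/")   # e.g. a new post invalidates all thread pages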

Dynamic cache control: prevent caching of responses

There are times when you want to serve different content (responses) for the same URL (request). For example, registered users vs anonymous users. Some of these responses might be served from cache, while others are so dynamic that they have to be served from origin servers. aiScaler has several options to filter these requests, so that as many responses as possible can be served from cache.

    • Cookie-driven cache control. A given page might look the same for all anonymous users, while it looks different for those who log in. Sometimes you can cache a page for most visitors (such as anonymous users) but serve it from origin servers to others (such as registered, logged-in ones). aiScaler detects the presence of cookies and can serve cached content based on the presence or absence of a certain cookie. For anonymous users the page is cached, with all the regular benefits of caching, while logged-in users see their personalized content (see the sketch after this list).
    • Content-driven cache control. You might need to control whether or not a web page is cacheable based on the page's content. For example, let's say www.example.com/breakingnews.aspx is normally cached for 10 seconds, using full page caching. However, if the editorial team publishes a dynamic object on that page, like a survey (poll), you do not want to serve the page from cache, since that may lead to stale content. aiScaler can search pages for strings and alter caching patterns based on the presence of certain content. To go back to the example: as soon as the poll is removed from the page, aiScaler automatically switches back to full page caching.
    • Pattern-based prevention of caching. This makes the configuration of caching patterns easier. In the same way that you use patterns to create caching rules, you can also use patterns to exempt certain URLs from those rules, i.e. you can explicitly tell aiScaler never to cache certain URLs. Typical examples of such exceptions include dynamic personalized content or URLs that might "clash" with general caching patterns.
    • URL-triggered cache control: prevent caching based on a visitor's browsing history. Sometimes you won't be able to use cookie-driven cache freshness control. This can happen when a session cookie is established even before a user logs in; using such session cookies as cache-busting indicators is then impossible. Another scenario could involve an e-commerce site where a user can place an item into a shopping basket without logging in first. Again, you might not be able to use the presence of a cookie in the request as a cache-busting indicator. aiScaler accounts for these scenarios with URL-triggered cache control. You effectively tell aiScaler: after a user visits a certain URL, disable caching for otherwise cacheable URLs. For example, after basketAdd.jsp is visited, you want to disable showing cached versions of other pages, as the cached pages would show a stale shopping basket.
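The first and last options in this list reduce to simple predicates, sketched here in generic Python (the cookie name session_id and the URL set are made-up examples, not aiScaler configuration):

def should_serve_from_cache(request_headers):
    """Cookie-driven: anonymous users get the cached page; logged-in
    users (marked by a hypothetical 'session_id' cookie) go to origin."""
    return "session_id=" not in request_headers.get("Cookie", "")

# URL-triggered variant: once a visitor hits a cache-busting URL
# (basketAdd.jsp in the example above), stop serving them cached pages.
busting_urls = {"/basketAdd.jsp"}
no_cache_visitors = set()

def record_visit(visitor_id, path):
    if path in busting_urls:
        no_cache_visitors.add(visitor_id)

def cacheable_for(visitor_id, request_headers):
    return (visitor_id not in no_cache_visitors
            and should_serve_from_cache(request_headers))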

Dynamic cache control: caching POST and GET responses

aiScaler allows you to cache responses to both POST and GET requests. For more information, see page 143 of the admin guide.

Dynamic cache control: response based TTL-bending

Normally, aiScaler obeys caching TTL rules as set by the caching patterns. Sometimes you might find yourself configuring aiScaler for a website that requires flexibility in assigning TTLs based not only on the request URL (which you match via patterns), but also on what the response looks like. In this case, the TTL might depend on the values of certain response headers.

For example, let's assume there is a URL pattern of /content_ID/XXXX, where XXXX is some random number. This group of URLs might return different responses, depending on XXXX. For some values of XXXX, it might return HTML, which you would like to cache for 10 seconds. For other values, it might return CSS, which you could cache for 1 day. And for yet other URLs (i.e. values of XXXX) it might return images, which you want to cache for 1 week. But you cannot determine the TTL for the URL until you receive the actual response and can analyze its Content-Type header.

aiScaler can set these TTLs for different types of content, increasing the cache-hit ratio. Serving more content from cache ultimately leads to lower page loading times and a decreased load on origin servers.
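In other words, the TTL decision is deferred until the response headers are in hand. A minimal Python sketch of the idea (the TTL table mirrors the example above; this is not aiScaler syntax):

# TTLs chosen per the example above, keyed on the response's Content-Type.
TTL_BY_CONTENT_TYPE = {
    "text/html": 10,        # 10 seconds
    "text/css": 86400,      # 1 day
    "image/jpeg": 604800,   # 1 week
}

def ttl_for_response(response_headers, default_ttl=0):
    """Pick the caching TTL only once the response is available;
    the URL alone is not enough for /content_ID/XXXX."""
    content_type = response_headers.get("Content-Type", "").split(";")[0].strip()
    return TTL_BY_CONTENT_TYPE.get(content_type, default_ttl)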

Dynamic cache control: TTL-bending when under heavy load

Normally, aiScaler obeys the caching TTL rule set by the matching pattern. Sometimes you would like to temporarily increase the TTLs to reduce load and survive an onslaught of traffic, for instance during a DDoS attack or an unexpected traffic spike.

You can certainly edit the configuration file by hand, increase the defined TTLs and reload the configuration for the changes to take effect. But then you need to remember to restore the old settings when the load subsides, and such manual changes clearly require time and the presence of an operator.

aiScaler comes to the rescue with a feature called "TTL-bending when under heavy load". Should the average website response time exceed a certain threshold (say 2000 ms), all configured TTLs are temporarily increased by a preset factor (say 5). For example, if you normally cache a certain response with a TTL of 10 seconds, this automatically increases to 50 seconds for the duration of the "crisis situation".

When the response time drops below the threshold again (under 2000 ms), the TTLs are restored to their original values (10 seconds in our example). All of this happens automatically, without any operator involvement.
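The logic is easy to picture (a Python sketch using the threshold and factor from the example; the real knobs live in the aiScaler configuration file):

RESPONSE_TIME_THRESHOLD = 2.0   # seconds (2000 ms, per the example)
TTL_BEND_FACTOR = 5             # multiply TTLs by this while overloaded

def effective_ttl(configured_ttl, avg_response_time):
    """Use the configured TTL normally; a bent (inflated) TTL under load."""
    if avg_response_time > RESPONSE_TIME_THRESHOLD:
        return configured_ttl * TTL_BEND_FACTOR   # e.g. 10 s -> 50 s
    return configured_ttl                         # restored automatically

print(effective_ttl(10, 2.5))   # 50 during the "crisis"
print(effective_ttl(10, 0.3))   # 10 once things calm down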


3. Prefetching responses

A website might rely on a number of API calls to other websites or service providers for functionality such as advertisement calls, analytics, etc. Such calls might be expensive in terms of the time they take to execute and, as a result, they might slow down the pages on your website that rely on them.

aiScaler allows you to configure a set of such slower URLs for preload: aiScaler pre-fetches and actively maintains a queue of fresh responses to these slower calls, in anticipation that they might soon be requested. When they are requested, instead of going to the remote site to obtain the response, aiScaler serves a pre-fetched response virtually instantaneously.

By tailoring the pre-fetch parameters, you can fine-tune preload so that most of these responses are in fact pre-fetched. To assist with this, aiScaler collects and reports a comprehensive set of pre-fetch statistics.

One must not confuse the preload functionality with caching of responses. While caching is only applicable to shared, cacheable responses, preload logic only acts on non-cacheable responses (we also refer to these as 0TTL).

The Web statistics screens (both global and per-website) contain information on the efficacy of preload: the ratio of responses served from preload queues to all 0TTL responses. You want that ratio to be as close to 100% as possible, to maximize the benefits of response preload. To get there, adjust the number of preloaded responses.

If you set that number too high, some preloaded responses might go stale before a client has a chance to request them. To deal with this scenario, you can specify the maximum age of preloaded responses that may be served to clients; it defaults to 10 minutes.

aiScaler reports the queue length statistic for each URL configured for preload via the Web and CLI interfaces. Preload counters are also available via SNMP.
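Conceptually, preload is a producer/consumer queue per configured URL. The sketch below (generic Python; the URL, queue depth and max age are illustrative values) shows a background thread keeping the queue topped up while requests are served from it:

import queue, threading, time, urllib.request

PREFETCH_URL = "https://ads.example.com/slot"  # hypothetical slow third-party API
QUEUE_DEPTH = 5          # tune this using the reported pre-fetch statistics
MAX_AGE = 600            # do not serve responses older than 10 minutes

prefetched = queue.Queue(maxsize=QUEUE_DEPTH)

def prefetcher():
    """Keep the queue topped up with fresh responses, in anticipation."""
    while True:
        body = urllib.request.urlopen(PREFETCH_URL).read()
        prefetched.put((time.monotonic(), body))  # blocks while queue is full

def serve():
    """Serve a prefetched response if a fresh one is available."""
    while True:
        try:
            fetched_at, body = prefetched.get_nowait()
        except queue.Empty:
            # Queue miss: fall back to a live (slow) call to the remote site.
            return urllib.request.urlopen(PREFETCH_URL).read()
        if time.monotonic() - fetched_at <= MAX_AGE:
            return body   # stale entries are silently skipped

threading.Thread(target=prefetcher, daemon=True).start()

The ratio of serve() calls answered from the queue versus fallback calls is exactly the preload efficacy statistic described above.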


4. On-the-fly compression

aiScaler's caching algorithm is extremely efficient in comparison with other solutions (see this report). You will discover that in most cases you are limited by the bandwidth available between aiScaler servers and your clients. To solve that you need to upgrade your infrastructure, but first make sure you compress everything that is compressible: you can expect a 4-fold or greater increase in RPS with the same bandwidth [2].

aiScaler has the ability to compress web content on the fly. While it was rare for a web server to have this capability 8-9 years ago, nowadays most web servers support it out of the box. So it is up to you to decide where you want to compress the responses: right on the origin servers or in aiScaler. If you feel that the origin servers are already taxed out and could use a break, let aiScaler handle it. Otherwise we recommend compressing at the origin server, because it can be more efficient to compress content upstream. Should you decide to let aiScaler do the compression, you do not have to modify your origin web servers in any way or install any software.

[2] Large text-based responses, such as HTML, CSS, JS, JSON and XML, might compress 5:1 or better, a very significant benefit indeed. Your bandwidth utilization drops, you save money, and end users on slower connections see responses arrive much faster. And the whole Internet, too, is a better place for it. So please, do compress your content.
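A quick way to see the effect for yourself (plain Python, using the standard gzip module on a deliberately repetitive page):

import gzip

html = b"<html>" + b"<p>Lorem ipsum dolor sit amet.</p>" * 500 + b"</html>"
compressed = gzip.compress(html)

ratio = len(html) / len(compressed)
print(f"{len(html)} bytes -> {len(compressed)} bytes ({ratio:.1f}:1)")

Repetitive markup like this compresses far better than 5:1; real-world text responses typically compress several-fold, in line with footnote [2].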


5. Offloading SSL termination

aiScaler saves resources on your origin servers by offloading HTTPS traffic encryption. When acting in this fashion, aiScaler maintains HTTPS communications between itself and the clients, while optionally forwarding requests in the clear (HTTP) to origin servers. This way you get to have your cake and eat it too: you protect the information while in transit, yet you relieve your origin servers from having to deal with HTTPS overhead. You can also have a more traditional configuration where aiScaler accesses origin servers over HTTPS, in case you do not have full control over the path between aiScaler and your web servers.

If you know that certain auxiliary content (such as images, JS and CSS) in HTTPS pages can be obtained from your origin servers via HTTP, aiScaler can skip encryption for those files. This reduces the HTTPS load on the origin servers and will likely improve your site's performance.

With reasonably fast hardware dedicated to aiScaler servers, you can expect very high HTTPS session establishment and bulk encryption rates. For example, an 8-threaded aiScaler running on an 8-core Intel Nehalem server can accomplish over 15,000 RSA-1024 key signs/sec and 25,000 key verifies/sec. The same configuration is capable of driving around 1.5 Gbps of traffic in 3DES or AES-256 encryption mode, more than adequate for even high volumes of HTTPS traffic.
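In miniature, SSL termination looks like this (an illustrative Python sketch, not aiScaler; the certificate paths and the origin address 10.0.0.5:80 are placeholders, and binding port 443 requires appropriate privileges):

import asyncio, ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("/etc/ssl/example.crt", "/etc/ssl/example.key")

async def terminate(client_reader, client_writer):
    # TLS is already decrypted by the time we read here; forward the
    # request in the clear (plain HTTP) to the origin on the trusted network.
    request = await client_reader.readuntil(b"\r\n\r\n")
    origin_reader, origin_writer = await asyncio.open_connection("10.0.0.5", 80)
    origin_writer.write(request)
    await origin_writer.drain()
    # Relay the origin's plain-HTTP response back over the encrypted link
    # (this sketch assumes the origin closes the connection when done).
    client_writer.write(await origin_reader.read(-1))
    await client_writer.drain()
    client_writer.close()
    origin_writer.close()

async def main():
    server = await asyncio.start_server(terminate, "0.0.0.0", 443, ssl=ctx)
    await server.serve_forever()

asyncio.run(main())

The origin never performs a TLS handshake or bulk encryption; all of that work stays on the terminating proxy.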


6. Full page caching

Full page caching is simply caching the entire HTML page, so that it doesn’t have to be generated again for the next user. For more info, go to this page.



7. TCP/IP optimizations

aiScaler optimizes HTTP keep-alive connections to reduce the overhead of establishing new TCP/IP connections. See pages 311-313 of the admin guide.



8. Route optimization (Latency based routing)

aiScaler works with sophisticated third-party DNS providers (DynDNS, DNS Made Easy or AWS Route 53) to provide route optimization. These providers take care of latency-based routing, geographic redundancy and geographic load balancing. Route optimization is useful both to lower page loading times and to distribute traffic in case of DDoS attacks. Read more on aiCDN»