aiScaler Content-driven Caching Control
Posted by Max Robbins on October 25th, 2010
aiScaler makes it possible to control whether or not a web page is cacheable, based on the content of the page’s response. For example, the www.acmenews.com/breakingnews.aspx page is normally cached for 10 seconds, unless the editorial team decides to publish a survey (poll) on that page, in which case the page cannot be cached for as long as the poll is active. As soon as the poll is removed from the page, caching can be restored to 10 seconds.
aiScaler supports this kind of scenario via so-called “Content Driven Caching”, or CDC for short. Here’s how it works.
First, you define a cacheable pattern, just as usual. Then you add one or more cdc_pattern pattern-level settings under the matching pattern, each specifying a regular expression. With these in place, aiScaler analyzes response bodies, trying to match each body against the defined cdc_pattern expressions.
Should a match be found, aiScaler temporarily overrides the TTL for the matching response, setting it to 0. Effectively, the matching web page is declared non-cacheable. Periodically, aiScaler attempts to match the page’s content again, and should no match be found, the page has its TTL restored to the one specified by the pattern. You can control how frequently this rechecking is done with the cdc_interval pattern-level setting, which defaults to 5 seconds.
website www.acmenews.com
…
pattern breakingnews simple 10
cdc_pattern acme\spoll
cdc_pattern acme\ssurvey
In the example above, we match the response’s body (content) to see if it contains “acme poll” or “acme survey”. Should a match be found, aiScaler will declare the page non-cacheable and no longer serve it out of cache. Sometimes, you might find it easier to match for auxiliary content URLs instead. For example, you might know that every time a poll is published on a page, the poll’s JavaScript is included in the page, so you can look for that JavaScript URL instead. For example:
cdc_pattern userpoll.js
cdc_pattern usercomment.js
Matching for such JS “includes” might be less resource intensive, as they are often located at the very beginning of the page’s HTML, so aiScaler can find the match faster.
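The matching rule described so far can be sketched in a few lines of Python. This is only an illustration, not aiScaler’s implementation: the function name effective_ttl is hypothetical, and Python’s re module stands in for whatever regular expression engine aiScaler actually uses, which may differ in dialect.

```python
import re
from typing import Iterable


def effective_ttl(body: str, pattern_ttl: int, cdc_patterns: Iterable[str]) -> int:
    """Return 0 if any CDC pattern matches the response body, else the pattern's TTL.

    Hypothetical helper: mirrors the behavior described in the text, not
    aiScaler's actual code.
    """
    for expr in cdc_patterns:
        # re.search scans the body and stops at the first match it finds.
        if re.search(expr, body):
            return 0  # CDC override: the page is treated as non-cacheable
    return pattern_ttl  # no match: the pattern's normal TTL applies


CDC_PATTERNS = [r"acme\spoll", r"acme\ssurvey"]

print(effective_ttl("<p>Vote in the acme poll today</p>", 10, CDC_PATTERNS))
print(effective_ttl("<p>Top stories of the hour</p>", 10, CDC_PATTERNS))
```

The first call reports a TTL of 0 because the body contains “acme poll”; the second keeps the pattern’s TTL of 10.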
Let’s consider another situation. Acmenews’s editorial team might decide to publish polls or enable comments on any of the following pages: usnews.jsp, worldnews.jsp, marketnews.jsp and cenews.jsp. As you can see, all of these URLs have a common “news.jsp” component, so you can create a single pattern to cover all of them:
website www.acmenews.com
…
pattern news.jsp simple 10
cdc_pattern userpoll.js
cdc_pattern usercomment.js
Now all of the mentioned pages will have their content analyzed for CDC-overrides, independently of each other. In other words, usnews.jsp might end up in CDC-override state, while worldnews.jsp is still served from cache.
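To make the per-page independence concrete, here is a minimal Python sketch. It assumes that a “simple” pattern behaves like a substring match on the URL, and the function name cdc_states and the sample response bodies are made up for illustration; none of this is taken from aiScaler’s code.

```python
import re

PATTERN = "news.jsp"  # assumed: a "simple" pattern acts as a URL substring match
PATTERN_TTL = 10
CDC_PATTERNS = [r"userpoll.js", r"usercomment.js"]


def cdc_states(responses: dict) -> dict:
    """Map each URL covered by PATTERN to its own effective TTL."""
    states = {}
    for url, body in responses.items():
        if PATTERN not in url:
            continue  # this URL is not covered by the pattern
        # Each page is evaluated on its own body, independently of the others.
        overridden = any(re.search(expr, body) for expr in CDC_PATTERNS)
        states[url] = 0 if overridden else PATTERN_TTL
    return states


# Hypothetical response bodies: only usnews.jsp currently carries a poll.
responses = {
    "/usnews.jsp": '<script src="/js/userpoll.js"></script><p>US news</p>',
    "/worldnews.jsp": "<p>World news, no poll or comments</p>",
}

print(cdc_states(responses))
```

Here usnews.jsp ends up with a TTL of 0 while worldnews.jsp keeps its TTL of 10, even though both are covered by the same pattern.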
The handling of a page with a CDC-overridden TTL is no different from the way 0 TTL requests are normally handled by aiScaler. Specifically, all of the cookies are sent both to the origin servers (OS) and back to the requesting client.
When a page is in the “regular”, cacheable state, aiScaler only attempts to match the response body against the CDC patterns when the page is refreshed, so we recommend you keep the TTL for such pages low, so that aiScaler can detect a change in the page’s content quickly. As mentioned earlier, when a page is in the “CDC-override” state, the matches are performed every cdc_interval seconds, which is every 5 seconds by default.
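The two checking rhythms can be summarized in a small Python sketch. The function next_check_delay is hypothetical, and it simplifies things by assuming steady traffic, so that a cacheable page is refreshed roughly once per TTL.

```python
def next_check_delay(in_cdc_override: bool, pattern_ttl: int, cdc_interval: int = 5) -> int:
    """Approximate seconds until the body is next matched against CDC patterns.

    Hypothetical helper. In the cacheable state the check rides along with
    the refresh (about once per TTL, assuming steady traffic); in the
    CDC-override state it runs every cdc_interval seconds.
    """
    if in_cdc_override:
        return cdc_interval
    return pattern_ttl


# Cacheable page with a 10 second TTL: a newly published poll may go
# unnoticed for up to about 10 seconds.
print(next_check_delay(False, pattern_ttl=10))

# Page in CDC-override state: rechecked at the default 5 second interval.
print(next_check_delay(True, pattern_ttl=10))
```

This is why a low TTL matters: in the cacheable state it bounds how long a freshly added poll can be served from cache.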
When aiScaler needs to perform content matching for CDC patterns, it requests the response in plain, non-compressed form. This way, no CPU cycles need to be spent by the origin server to compress the response, or by aiScaler to uncompress it, before matching can be performed. The response is then compressed by aiScaler, in accordance with the on-the-fly compression settings and the client browser’s indicated support for compression.
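For readers who want to reproduce this kind of check by hand, one common way for an HTTP client to ask for an uncompressed response is to send Accept-Encoding: identity. Whether aiScaler uses this exact header is an assumption, and the helper name build_plain_request is hypothetical. The sketch below only builds the request with Python’s standard library and does not send it.

```python
import urllib.request


def build_plain_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks the server not to compress the body.

    Assumption: "identity" is used here as a generic HTTP mechanism; the
    text does not say how aiScaler itself requests plain responses.
    """
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


req = build_plain_request("http://www.acmenews.com/breakingnews.aspx")

# urllib stores header names in capitalized form, hence "Accept-encoding".
print(req.get_header("Accept-encoding"))
```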