aiScaler Cache size management via Cache-by-Path Feature.

Posted by Max Robbins on October 25th, 2010

If your web site is fairly large and has a significant number of distinct Web documents (URLs), then after you deploy aiScaler most of the content from your web servers will end up in aiScaler’s response cache as users request it. aiCache is designed to keep such cacheable responses in memory (RAM) and never tries to save cached documents to secondary storage, such as hard drives.

aiScaler uses the request’s URL as a pointer (signature) to the cached copy of the Web content. Typically we want to use as much information from the URL as possible in this signature. For example, let’s assume that our web site has just published a breaking news article and that it is available under the following URL: www.acmenews.com/stories.dll?articleid=12344. The “stories.dll” in our case is most likely a program that renders a news article with a given article ID.

In order to cache this document we need to refer to it by the whole URL string, as shown above. Assuming that we have about a thousand active stories on our web site, we would end up with about a thousand cached Web documents representing these news articles, which is absolutely fine and is just the way aiScaler was designed to operate.

Now, let’s consider a different example. On today’s Internet it is quite common for one web site to link to content on a different site. Let’s assume that a number of external web sites point to our web site’s homepage. So that we know which of those sites referred a user to us, the referring sites would normally provide a “referrer ID” of some form.

For example, the web site
“www.acmebusinesspartner.com”
might point (via an HTML href) to
“www.acmenews.com/breakingnews.html”
via the following link:
“www.acmenews.com/breakingnews.html?partnerid=partner1”

The “partnerid” parameter does not affect the appearance (content) of the resulting Web page in any way, nor is it processed by the web server. It is provided simply to end up in the log file or to be analyzed by client-side JavaScript code, so that at the end of the day we know how many users were referred to our web site by our partner sites.

Normally, we would use the whole string above,
www.acmenews.com/breakingnews.html?partnerid=partner1
as the signature for the cached copy of breakingnews.html. Let’s assume that there are hundreds of sites out there that point to our web site. This would lead to hundreds of different URLs pointing to the same document and would force us to populate the cache of our aiCache server with hundreds of copies of the same Web response. The cache would be polluted with exact bit-for-bit copies of the same content, which is clearly a wasteful use of RAM.
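
To make the duplication concrete, here is a minimal Python sketch of a cache keyed by the full URL. The `FullUrlCache` class, the `fetch_origin` stand-in and the figure of 300 partners are hypothetical illustrations, not aiCache internals.

```python
class FullUrlCache:
    """Toy in-memory cache keyed by the complete URL (hypothetical, not aiCache code)."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin  # callable that produces the response body
        self.entries = {}                  # signature (full URL) -> cached body

    def get(self, url):
        # The whole URL, query string included, acts as the signature.
        if url not in self.entries:
            self.entries[url] = self._fetch_origin(url)
        return self.entries[url]


def fetch_origin(url):
    # Stand-in for the origin server: partnerid never changes the page.
    return b"<html>breaking news</html>"


cache = FullUrlCache(fetch_origin)
for n in range(300):  # 300 referring partners, an assumed number
    cache.get(f"www.acmenews.com/breakingnews.html?partnerid=partner{n}")

entry_count = len(cache.entries)                     # one entry per partner link
distinct_bodies = len(set(cache.entries.values()))   # number of different bodies stored
print(entry_count, distinct_bodies)
```

Every partner link produces its own cache entry even though all the entries hold the same bytes.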

The situation can deteriorate even further in certain other cases. For example, some web sites append random strings of characters as a parameter to static web pages or JavaScript files as a way to obtain certain functionality. This parameter has no impact on the web document itself and is essentially ignored by the web server(s). Once again, if we use the whole URL string as the signature for the cached copy of the web document, we pollute the response cache with possibly thousands of copies of the same response and adversely affect performance. Imagine a web site with a hundred thousand subscribers that appends the subscriber ID as a parameter to a static HTML or JavaScript file. If we follow our regular routine and store each resulting Web document as a separate entry in aiCache’s cache (even though all of them have absolutely identical content), we might simply run out of RAM on our servers. Worse, a signature that contains a random string is unlikely ever to be requested twice, so such responses are effectively non-cacheable.

Fortunately, aiCache addresses this problem with a feature that allows designating certain URLs as cacheable “by-path-only”. In other words, these URLs have their entire query part stripped to obtain a signature that points to the resulting (cached) Web document.

For example
www.acmenews.com/breakingnews.html?partnerid=partner1
becomes www.acmenews.com/breakingnews.html
which is then used as the signature for the cached copy of this page. No matter how many sites refer to us in this fashion, we store only a single copy of this Web document in our cache. We save a significant amount of RAM, yet we retain the required functionality: the referrer information can still be processed client-side, and it still ends up in the aiCache log files should server-side log crunching be required.
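
The by-path-only idea can be sketched in a few lines of Python. This is only an illustration of the stripping step under the assumption that everything after the first “?” is the query part; the `signature` function and its `by_path_only` flag are hypothetical names, not part of aiCache.

```python
def signature(url, by_path_only=False):
    """Return the cache signature for a URL.

    With by_path_only=True, everything from the first '?' onward is dropped,
    so URLs that differ only in their query share one signature.
    """
    if by_path_only:
        path, _sep, _query = url.partition("?")
        return path
    return url


# Hundreds of partner links collapse to a single signature...
partner_urls = [
    f"www.acmenews.com/breakingnews.html?partnerid=partner{n}" for n in range(300)
]
unique_signatures = {signature(u, by_path_only=True) for u in partner_urls}

# ...while URLs whose query really selects the content keep the full string.
story_signature = signature("www.acmenews.com/stories.dll?articleid=12344")
```

Note that the article URL is deliberately left on the default full-URL signature: stripping its query would make every story share one cache entry.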

The resulting resource savings can be quite significant. As explained above, we might be able to cut the number of objects we have to store (and manage) in aiCache’s cache by orders of magnitude.

We configure this functionality by specifying ignore_query for a particular pattern:
pattern html simple 10m ignore_query
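
As a rough mental model of how such a directive might be applied, the Python sketch below parses the line above and uses it to pick a signature. It rests on several assumptions that the post does not confirm: that “simple” means a plain substring match, that the match is made against the URL without its query, and that “10m” is a time-to-live of ten minutes. The `Pattern`, `parse_pattern` and `signature_for` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    text: str           # substring to look for (assumed meaning of "simple")
    ttl_seconds: int    # assumed meaning of "10m": 600 seconds
    ignore_query: bool  # strip the query part when building the signature


def parse_pattern(line):
    """Parse only the directive form shown in this post."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "pattern" or fields[2] != "simple":
        raise ValueError(f"unsupported directive: {line!r}")
    ttl = fields[3]
    # Assumption: a trailing "m" means minutes, a bare number means seconds.
    seconds = int(ttl[:-1]) * 60 if ttl.endswith("m") else int(ttl)
    return Pattern(fields[1], seconds, "ignore_query" in fields[4:])


def signature_for(url, patterns):
    """Return the cache signature chosen by the first matching pattern."""
    path, _sep, _query = url.partition("?")
    for pattern in patterns:
        if pattern.text in path:  # assumed substring match on the path
            return path if pattern.ignore_query else url
    return url  # no pattern matched: keep the full URL


rule = parse_pattern("pattern html simple 10m ignore_query")
html_signature = signature_for(
    "www.acmenews.com/breakingnews.html?partnerid=partner1", [rule]
)
story_signature = signature_for("www.acmenews.com/stories.dll?articleid=12344", [rule])
```

Under these assumptions the breaking news page is cached by path only, while the stories.dll URL does not match the pattern and keeps its full query string.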
