aiScaler Cache size management via Cache-by-Path Feature
Posted by Max Robbins on October 25th, 2010
If your web site is fairly large and serves a significant number of distinct Web documents (URLs), then after you deploy aiScaler most of the content from your web servers will end up in aiScaler’s response cache as users request it. aiCache is designed to keep such cacheable responses in memory (RAM) and never tries to save cached documents to secondary storage, such as hard drives.
aiScaler uses the request’s URL as a pointer (signature) to the cached copy of the Web content. Typically we want to use as much information from the URL as possible to act as the “signature”, or pointer, for the cached object. For example, let’s assume that our web site has just published a breaking news article, available under the following URL: www.acmenews.com/stories.dll?articleid=12344. The “stories.dll” in our case is most likely a program that renders the news article with a given article ID.
In order to cache this document we need to refer to it by the whole URL string, as shown above. Assuming that we have about a thousand active stories on our web site, we would end up with about a thousand cached Web documents representing these news articles, which is absolutely fine and is just the way aiScaler was designed to operate.
Now, let’s consider a different example. On today’s Internet it is quite common for one web site to link to content on a different site. Let’s assume that a number of external web sites point to our web site’s homepage. For us to know which of those sites referred a user to ours, the referring sites would normally include a “referrer ID” of some form in the link.
For example, the web site
might point (via HTTP href) to
via the following link:
Normally, we would use the whole string above,
as a signature for the cached copy of breakingnews.html. Let’s assume that there are hundreds of sites out there that point to our web site. This would lead to hundreds of different URLs pointing to the same document and would force us to populate the cache of our aiCache server with hundreds of copies of the same Web response, polluting the cache with exact bit-for-bit copies of the same content: clearly a wasteful use of RAM.
Fortunately, aiCache addresses this problem with a feature that allows designating certain URLs as cacheable “by-path-only”. In other words, these URLs have their entire query part stripped to obtain the signature that points to the resulting (cached) Web document.
The resulting path-only URL is then used as the signature for the cached copy of this page. No matter how many sites refer to us in this fashion, we store only a single copy of this Web document in our cache. We save a significant amount of RAM, yet we retain the required functionality: the referrer information can still be processed client-side, and it ends up in the aiCache log files should server-side log crunching be required.
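To make the difference between the two kinds of signature concrete, here is a minimal Python sketch. It is not aiCache's actual implementation; the function name `cache_signature`, the `ref` query parameter, and the referral URLs are all hypothetical and chosen only for illustration. It derives a cache key from the full URL and from the path only, then counts how many distinct cache entries 300 referral links would create in each case.

```python
from urllib.parse import urlsplit


def cache_signature(url: str, ignore_query: bool = False) -> str:
    """Build a cache key from host + path, plus the query string unless ignored.

    Illustrative only: this mimics the idea of "by-path-only" caching,
    not aiCache's internal key format.
    """
    parts = urlsplit(url)
    key = parts.netloc + parts.path
    if parts.query and not ignore_query:
        key += "?" + parts.query
    return key


# Hypothetical referral links; the "ref" parameter name is an assumption.
urls = [
    f"http://www.acmenews.com/breakingnews.html?ref=site{i}"
    for i in range(300)
]

# Full-URL signatures: one cache entry per referring site.
full_keys = {cache_signature(u) for u in urls}

# Path-only signatures: every referral link maps to the same entry.
path_keys = {cache_signature(u, ignore_query=True) for u in urls}

print(len(full_keys))  # 300
print(len(path_keys))  # 1
```

Note that the news-article URL from the earlier example should keep its query string in the signature, because there the `articleid` value selects different content; stripping the query is appropriate only where the query does not change the response.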
The resulting resource savings can be quite significant. As explained above, we might be able to cut the number of objects we have to store (and manage) in aiCache’s cache by orders of magnitude.
We configure this functionality by specifying ignore_query for a particular pattern:
pattern html simple 10m ignore_query