Configuring the cache agent for automatic refreshing and preloading
Most caching proxy servers cache a file only after a user requests it. Caching Proxy has a cache agent that provides automatic cache preloading. You can specify that the cache agent automatically retrieves specified URLs, the most popular URLs, or both, and places them in the cache before they are requested.
In some cases, you must set the host name of the proxy server and identify the cache access log before the cache can be preloaded. To configure the cache agent, in the Configuration and Administration forms, select Cache Configuration and use the Cache Preload and Cache Refresh forms. Files that represent query results (that is, files whose URLs include the question mark character, ?) are cached only if query caching is enabled.
Automatic cache preloading has the following advantages:
- Caching is applied to specified URLs before a user requests the pages.
- The cache is populated before the server becomes busy with user activity.
- Current files are supplied to users more quickly from the cache than if they were fetched on the first request.
Automatic cache preloading has the following disadvantages:
- The proxy server is busy caching pages even during hours of low user activity.
- You must exercise some control over what is automatically loaded. Loading linked files from high-level pages, such as web indexes and search sites, can generate requests for many pages.
For optimal efficiency, set the cache agent to run when server activity is low and before the server becomes busy with client requests. Then, the files are ready in the cache to provide fast service the first time that a user requests them. By default, the cache agent is started every night at 3 a.m. local time.
Special considerations for reverse proxy configurations:
For security reasons, the Proxy http:* rule should be disabled when you use a reverse proxy configuration; by default, this rule is commented out in the ibmproxy.conf file. However, when the rule is disabled, the cache agent cannot send requests successfully, so it cannot refresh the cache content of Caching Proxy. A 403 Forbidden By Rule error is written to the error log, and the cache refresh does not complete.
To allow the cache agent to refresh the cache in this configuration, add the following service directive to the ibmproxy.conf file:
Service /any-valid-string* INTERNAL:cacheAgentService
The variable any-valid-string can be any valid string that does not conflict with other mapping rules in the ibmproxy.conf file.
Both Caching Proxy and the cache agent parse the URI based on this service directive. Instead of sending the URI directly to Caching Proxy, the cache agent prefixes it with the /any-valid-string pattern from the service directive. For example, the cache agent changes http://www.ibm.com/ to /any-valid-string/http://www.ibm.com/.
The cache agent sends the prefixed URI to Caching Proxy. When Caching Proxy receives the request, it removes the /any-valid-string/ prefix. If the remaining URI is fully qualified, Caching Proxy serves the request directly, without mapping the URI against other rules.
The cache agent can also send a relative URI to Caching Proxy. For example, if the ibmproxy.conf file contains both the service directive shown previously and the directive LoadURL /abc/, the cache agent transforms the URL into /any-valid-string/abc/ and sends it to Caching Proxy. Caching Proxy removes the prefix, maps /abc/ against the other mapping rules, and handles the request if a rule matches.
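The round trip described above (the cache agent adds the prefix, and Caching Proxy strips it and decides whether to serve directly or map further) can be modelled in a few lines. The following Python sketch is an illustration only, not Caching Proxy code: the prefix value and the function names add_prefix and strip_prefix are hypothetical, and treating a remainder with an http or https scheme as "fully qualified" is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for "any-valid-string" in
#   Service /any-valid-string* INTERNAL:cacheAgentService
SERVICE_PREFIX = "/any-valid-string"


def add_prefix(uri: str) -> str:
    """Cache agent side (modelled): put the service pattern in front of the URI."""
    if uri.startswith("/"):
        # Relative URI such as /abc/ -> /any-valid-string/abc/
        return SERVICE_PREFIX + uri
    # Fully qualified URI such as http://www.ibm.com/ needs a separating slash.
    return f"{SERVICE_PREFIX}/{uri}"


def strip_prefix(request_path: str) -> tuple[str, bool]:
    """Proxy side (modelled): remove the prefix and report whether the remainder
    is fully qualified (served directly) or relative (mapped against other rules)."""
    marker = SERVICE_PREFIX + "/"
    if not request_path.startswith(marker):
        raise ValueError("request does not match the cache agent service pattern")
    remainder = request_path[len(marker):]
    # Assumption: "fully qualified" means the remainder carries an http/https scheme.
    if urlsplit(remainder).scheme in ("http", "https"):
        return remainder, True
    return "/" + remainder, False


print(add_prefix("http://www.ibm.com/"))   # /any-valid-string/http://www.ibm.com/
print(strip_prefix(add_prefix("/abc/")))   # ('/abc/', False)
```

The boolean returned by strip_prefix stands in for the decision the text describes: True means "serve directly", False means "continue matching the remaining URI against the other mapping rules".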
Setting the server host name
On Linux and UNIX operating systems, specify the host name of the proxy server whose cache is being preloaded or refreshed. On Windows operating systems, specify the host name only if the proxy server being refreshed is not on the local machine. (Refreshing a remote server's cache based on its most frequently accessed files is not possible, because the local cache agent does not have access to the remote server's cache access log.)
To set the host name of the proxy server, in the Configuration and Administration forms, select Cache Configuration –> Cache Refresh: Identify cache destination server.
Preloading the cache with specific files
To preload the cache with the content stored at specific URLs, in the Configuration and Administration forms, use Cache Configuration –> Cache Preload. In this form, you can specify URLs for the cache agent to load. The proxy retrieves those pages when the cache agent starts, regardless of whether they were previously in the cache. (These URLs are specified in the proxy configuration file by the LoadURL directive.) You can also use this form to specify URLs that the cache agent must not load. Access to a cache access log is not required for this type of cache preloading.
The Cache Preload form contains the following fields:
- Refresh the cache daily—Check this box if you want the cache agent to refresh the cache every night. If you do not want to start the cache agent, make sure that this box is not checked.
- Cache refresh time—If you want the cache agent to run at a time other than 3:00 a.m. local time, specify when you want it to start.
- Cache Contents—In the URL or IP Address field, specify the URLs to load. To exclude URLs from being preloaded, specify the URLs and click Ignore in the Cache status box.
Preloading the cache with frequently cached files
To preload the most frequently accessed pages automatically, use the Cache Configuration –> Cache Refresh form. This function requires a cache access log for the proxy server, from which the most popular URLs are determined automatically. The administrator can also specify how many of the most frequently accessed pages to preload. (This number is specified in the proxy configuration file by the LoadTopCached directive.)
The Cache Refresh form contains the following fields:
- Refresh the cache daily—Check this box if you want the cache agent to refresh the cache every night. If you do not want to start the cache agent, make sure that this box is cleared.
- Cache refresh time—If you want the cache agent to run at a time other than 3:00 a.m., specify the hour and minute when you want it to start.
- Identify cache destination server—Use this option if you want to refresh the cache of a server other than the local machine. (You cannot refresh a remote server's cache based on how frequently specific files are accessed.)
- Cache the most popular URLs—Specify the number of URLs to cache from the previous night's cache access log.
- Load linked pages—Use this setting to configure delving (see the following section for details on delving). Set the number of levels to delve, and whether to delve for all pages (always), no pages (never), administrator-specified pages only (admin), or popular pages only (topn). Also, specify whether to delve across hosts, whether to delay between requests, and whether to cache inline images.
- Number of threads—Set the maximum number of threads to use for cache refreshing.
- Maximum work queue depth—Set the maximum number of URLs that can wait in the queue to be requested.
- Maximum URLs to request—Set the maximum number of pages to load. This limit is checked only when the cache agent begins retrieving linked pages by delving.
- Maximum time—Set the maximum time to run the cache agent. If this time is set to 0 hours 0 minutes, the cache agent runs to completion.
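The selection behind the Cache the most popular URLs setting (the LoadTopCached directive) can be pictured as counting requests per URL and keeping the top N. The following Python sketch is illustrative only: the function name load_top_cached is hypothetical, and it works on a plain list of requested URLs rather than on the real cache access log format, which is not described here.

```python
from collections import Counter


def load_top_cached(requested_urls, top_n):
    """Return up to top_n distinct URLs, most frequently requested first.

    Assumption: the cache access log has already been reduced to one URL per
    request; parsing the real log format is out of scope for this sketch.
    Ties keep the order in which the URLs were first seen (Counter behaviour).
    """
    counts = Counter(requested_urls)
    return [url for url, _count in counts.most_common(top_n)]


log = ["/news.html", "/index.html", "/news.html", "/sports.html",
       "/index.html", "/news.html"]
print(load_top_cached(log, 2))  # ['/news.html', '/index.html']
```

If the log contains fewer unique URLs than top_n, the sketch simply returns all of them, which mirrors the "more than N unique URLs" condition in the examples later in this section.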
Delving
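Delving is the process of following the links in pages that the cache agent has already retrieved and loading the linked pages into the cache.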
To control the delving process, the administrator specifies the maximum number of URLs that the cache agent can load (the default is 2000), the maximum length of time that it can run (the default is 2 hours), and the maximum number of threads that it can use (the default is 4). The administrator can also configure additional controls. By default, delving is enabled for two levels of hierarchy and is not allowed across hosts, and a delay is inserted between requests.
The cache agent loads pages in the following order:
1. It loads the specific pages that the administrator specifies.
2. It loads popular (frequently accessed) pages from the cache access log.
3. If the maximum number of pages is not reached, it loads more pages by delving.
The cache agent does not check whether the maximum number of pages has been reached until it starts delving across links. If the maximum number of pages (called MaxURLs in the proxy configuration file) is less than or equal to the number of pages retrieved in steps 1 and 2, no linked pages are retrieved.
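The loading order and the late MaxURLs check can be modelled as a simple budget. The following Python sketch is not Caching Proxy code: the function name plan_refresh is hypothetical, and the MaxURLs values 50, 30, and 25 in the usage lines are hypothetical values picked only so that the results reproduce the arithmetic in the table below.

```python
def plan_refresh(load_urls, top_cached, linked_urls, max_urls):
    """Model of the loading order described above (not Caching Proxy code).

    Steps 1 and 2 (LoadURL pages, then popular pages) are loaded without
    checking max_urls; step 3 (delving) only fills whatever budget is left.
    Assumes the three input lists contain no duplicates of one another.
    """
    loaded = list(load_urls) + list(top_cached)  # steps 1 and 2: no MaxURLs check
    remaining = max_urls - len(loaded)
    if remaining > 0:                             # step 3: delve only if budget remains
        loaded.extend(list(linked_urls)[:remaining])
    return loaded


top30 = [f"/top{i}.html" for i in range(30)]
links = [f"/link{i}.html" for i in range(100)]

# Hypothetical MaxURLs values chosen to reproduce the table's arithmetic.
print(len(plan_refresh(["main.html", "welcome.htm"], top30, links, 50)))         # 50 (2 + 30 + 18)
print(len(plan_refresh(["favorites.html", "dislikes.html"], top30, links, 30)))  # 32 (no links)
print(len(plan_refresh(["hi.htm", "index.html"], top30[:20], links, 25)))        # 25 (2 + 20 + 3)
```

The second case shows why steps 1 and 2 can overshoot the limit: the budget is consulted only after they finish.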
| Configuration file setting | Result |
|---|---|
| | If the cache access log has more than 30 unique URLs, the cache agent retrieves main.html, welcome.htm, and the top 30 requested URLs from the cache access log. Because it has not reached the MaxURLs value, it retrieves and loads up to 18 linked URLs from pages already cached. |
| | If the cache access log has more than 30 unique URLs, the cache agent retrieves favorites.html, dislikes.html, and the top 30 requested URLs from the cache access log. No other files are retrieved because the MaxURLs value has been exceeded. |
| | If the cache access log has more than 20 unique URLs, the cache agent retrieves hi.htm, index.html, the top 20 requested URLs from the cache access log, and up to 3 linked URLs from the earlier pages. No other files are retrieved because the MaxURLs value has been reached. |
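The depth and host controls described at the start of this section can be pictured as a breadth-first walk over links. The following Python sketch is an illustration only: the function name delve and the get_links callback (which returns the links found on a page) are hypothetical, the defaults mirror the documented defaults of two levels and no delving across hosts, and it does not model DelayPeriod, LoadInlineImages, threading, or the DelveInto page selection.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit


def delve(start_urls, get_links, depth=2, across_hosts=False, limit=None):
    """Breadth-first sketch of delving from already-loaded pages.

    get_links(url) -> list of href strings found on that page (supplied by caller).
    depth: number of link levels to follow (compare DelveDepth).
    across_hosts: follow links to other hosts (compare DelveAcrossHosts).
    limit: optional cap on the number of linked pages returned.
    """
    seen = set(start_urls)
    found = []
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue  # pages at the last level are loaded but not expanded
        for href in get_links(url):
            link = urljoin(url, href)
            if not across_hosts and urlsplit(link).hostname != urlsplit(url).hostname:
                continue  # skip links that leave the current host
            if link in seen:
                continue
            if limit is not None and len(found) >= limit:
                return found
            seen.add(link)
            found.append(link)
            queue.append((link, level + 1))
    return found


pages = {
    "http://a.example/": ["/one.html", "http://b.example/x.html"],
    "http://a.example/one.html": ["/two.html"],
    "http://a.example/two.html": ["/three.html"],
}
print(delve(["http://a.example/"], lambda url: pages.get(url, [])))
# ['http://a.example/one.html', 'http://a.example/two.html']
```

With the default settings, the link to b.example is skipped and /three.html is never reached, because it would be a third level of links.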
Related proxy configuration file directives
- AutoCacheRefresh — Specify whether cache refreshing is to be used
- CacheAccessLog — Specify the path for the cache access log files
- CacheRefreshTime — Specify when to start the cache agent
- DelayPeriod — Specify whether to pause between requests
- DelveAcrossHosts — Specify whether to delve across hosts
- DelveDepth — Specify how far to follow links while caching
- DelveInto — Specify whether the cache agent follows links
- IgnoreURL — Specify URLs that are not refreshed
- LoadInlineImages — Control the refreshing of embedded images
- LoadTopCached — Specify the number of popular pages to refresh
- LoadURL — Specify the URLs to refresh
- MaxUrls — Specify the maximum number of URLs to refresh
Starting the cache agent manually
If automatic cache refreshing is enabled, the cache agent runs a refresh operation automatically at the specified time. You can also run the cache agent at any time from a command line:
- On Linux and UNIX operating systems: /usr/sbin/cacheagt
- On Windows operating systems: server_root\bin\cacheagt.exe
Where server_root is the drive and directory where you installed Caching Proxy (for example, C:\Program Files\IBM\edge\cachingproxy\cp).
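On Linux and UNIX operating systems, you can also use the cron daemon to start the cache agent at specific times, for example, with the following crontab entry:
45 16 * * * /usr/sbin/cacheagt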
This example entry starts the cache agent every day at 4:45 p.m. local time. You can use multiple entries to run the cache agent more than once a day, if needed. For more information, see your operating system's documentation about the cron daemon.
When using the cron daemon to run the cache agent, remember to turn off the automatic refresh option, either by using the Cache Configuration –> Cache Refresh configuration form or by editing the proxy configuration file. Otherwise, the cache agent runs more than once each day.
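In a crontab entry, the first field is the minute and the second is the hour, which is why 45 16 means 4:45 p.m. The following Python sketch computes when such a daily entry fires next. The function name next_daily_run is hypothetical, and the sketch handles only entries whose day, month, and weekday fields are all *.

```python
from datetime import datetime, timedelta


def next_daily_run(cron_entry: str, now: datetime) -> datetime:
    """Next firing time of a daily crontab entry "MIN HOUR * * * command".

    Only the first two fields (minute, then hour) are interpreted; the
    day-of-month, month, and day-of-week fields are assumed to be "*".
    Times are naive local times, so daylight-saving shifts are ignored.
    """
    minute, hour = (int(field) for field in cron_entry.split()[:2])
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # already past today: run tomorrow
    return candidate


entry = "45 16 * * * /usr/sbin/cacheagt"
print(next_daily_run(entry, datetime(2024, 5, 1, 9, 30)))  # 2024-05-01 16:45:00
print(next_daily_run(entry, datetime(2024, 5, 1, 17, 0)))  # 2024-05-02 16:45:00
```

By the same reading, an entry beginning 0 3 would correspond to the cache agent's default start time of 3 a.m.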