[Nemeth10] 23.5. Caching and proxy servers

The Internet and the information on it are still growing rapidly. Ergo, the bandwidth and computing resources required to support it are growing rapidly as well. How can this state of affairs continue?

The only way to deal with this growth is to use replication. Whether it’s on a national, regional, or site level, Internet content needs to be more readily available from a closer source as the Internet grows. It just doesn’t make sense to transmit the same popular web page from Australia across a very expensive link to North America millions of times each day. There should be a way to store this information after it has been sent across the link once.

Fortunately, there is, at least at the site level. A web proxy lets you cache and manage your site’s outbound requests for web content.

Here’s how it works. Client web browsers contact the proxy server to request an object from the Internet. The proxy server then makes a request on the client’s behalf (or provides the object from its cache) and returns the result to the client. Proxy servers of this type are often used to enhance security or to filter content.
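The request flow just described can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real proxy is implemented: the class name CachingProxy and the stand-in fake_origin function are invented for illustration, and the sketch ignores expiry, cache-control headers, and everything else a real proxy must honor.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of a caching proxy: answer from the cache, else fetch upstream."""

    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        self._fetch_origin = fetch_origin   # how to reach the real server
        self._cache: Dict[str, bytes] = {}  # URL -> stored object
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        """Handle one client request for the given URL."""
        if url in self._cache:
            self.hits += 1
            return self._cache[url]         # served without leaving the site
        self.misses += 1
        body = self._fetch_origin(url)      # request made on the client's behalf
        self._cache[url] = body             # keep a copy for later requests
        return body


def fake_origin(url: str) -> bytes:
    """Hypothetical stand-in for a real Internet server."""
    return f"content of {url}".encode()


proxy = CachingProxy(fake_origin)
proxy.get("http://example.com/")  # first request goes to the origin
proxy.get("http://example.com/")  # second request is answered from the cache
print(proxy.hits, proxy.misses)
```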

In a proxy-based system, only one machine needs direct access to the Internet through the organization’s firewall. At organizations such as K–12 schools, a proxy server can also filter content so that inappropriate material doesn’t fall into the wrong hands. Many proxy servers, both commercial and free, are available today. Some of these systems are purely software based, and others are embodied in a hardware appliance. An extensive list of proxy server technologies can be found at web-caching.com/proxy-caches.html.

The next couple of sections describe the Squid Internet Object Cache,[5] a popular stand-alone cache. We also delve briefly into the proxy features of the mod_proxy module for the Apache web server.

[5] Why “Squid”? According to the FAQ, “all the good names were taken.”

Using the Squid cache and proxy server

Squid is a caching and proxy server that supports several protocols, including HTTP, FTP, and SSL.

Proxy service is nice, but it’s Squid’s caching features that are really worth getting excited about. Squid not only caches information from local user requests but also allows construction of a hierarchy of Squid servers.[6] Groups of Squid servers use the Internet Cache Protocol (ICP) to communicate information about what’s in their caches.

[6] Unfortunately, some sites mark all their pages as being uncacheable, which prevents Squid from working its magic. In a similar vein, Squid isn’t able to cache dynamically generated pages.

With this feature, administrators can build a system in which local users contact an on-site caching server to obtain content from the Internet. If another user at that site has already requested the same content, a copy can be returned at LAN speed (usually 100 Mb/s or greater). If the local Squid server doesn’t have the object, perhaps the server contacts the regional caching server. As in the local case, if anyone in the region has requested the object, it is served immediately. If not, perhaps the caching server for the country or continent can be contacted, and so on. Users perceive a performance improvement, so they are happy.
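The lookup path through such a hierarchy can be illustrated with a small Python model. It is only a sketch: the CacheLevel class, the level names, and the origin_server function are made up for the example, and each level simply asks its parent on a miss. Real Squid peers exchange ICP queries rather than following this simplified chain.

```python
from typing import Callable, Dict, Optional, Tuple


class CacheLevel:
    """One cache in a hierarchy (site, regional, national, ...)."""

    def __init__(self, name: str,
                 parent: Optional["CacheLevel"] = None,
                 origin: Optional[Callable[[str], bytes]] = None) -> None:
        if parent is None and origin is None:
            raise ValueError("the top-level cache needs an origin fetcher")
        self.name = name
        self.parent = parent
        self.origin = origin
        self.store: Dict[str, bytes] = {}

    def get(self, url: str) -> Tuple[bytes, str]:
        """Return (body, source); source names whoever supplied the object."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            body, source = self.parent.get(url)      # ask the next level up
        else:
            body, source = self.origin(url), "origin"
        self.store[url] = body                       # keep a copy for the next requester
        return body, source


def origin_server(url: str) -> bytes:
    """Hypothetical stand-in for the distant web server."""
    return f"<html>{url}</html>".encode()


national = CacheLevel("national", origin=origin_server)
regional = CacheLevel("regional", parent=national)
site_a = CacheLevel("site-a", parent=regional)
site_b = CacheLevel("site-b", parent=regional)

url = "http://example.com/popular.html"
print(site_a.get(url)[1])  # first request in the region travels to the origin
print(site_b.get(url)[1])  # a different site is served by the regional cache
print(site_b.get(url)[1])  # a repeat request at site B is served locally
```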

For many, Squid offers economic benefits. Because users tend to share web discoveries, significant duplication of external web requests can occur at a reasonably sized site. One study has shown that running a caching server can reduce external bandwidth requirements by up to 40%.
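To see where a figure of that order can come from, the following Python sketch estimates the savings for a request log. The log entries and sizes are invented purely for illustration, and the calculation assumes an idealized cache in which every object is cacheable and never expires, so it gives an upper bound rather than a measurement.

```python
from typing import Iterable, Tuple


def external_bytes(requests: Iterable[Tuple[str, int]]) -> Tuple[int, int]:
    """Return (bytes fetched without a cache, bytes fetched with an ideal cache)."""
    seen = set()
    total = 0    # every request crosses the external link
    fetched = 0  # only the first request for each URL crosses the link
    for url, size in requests:
        total += size
        if url not in seen:
            seen.add(url)
            fetched += size
    return total, fetched


# Hypothetical log of (URL, size in KB); not real data.
log = [
    ("/news", 500), ("/logo.png", 100), ("/news", 500),
    ("/video", 900), ("/logo.png", 100), ("/news", 500),
]

total, fetched = external_bytes(log)
savings = 1 - fetched / total
print(f"without cache: {total} KB, with cache: {fetched} KB, saved: {savings:.0%}")
```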

To make effective use of Squid, you’ll likely want to force your users to use the cache. Either configure a default proxy through Active Directory (in a Windows-based environment) or configure your router to redirect all web-based traffic to the Squid cache by using the Web Cache Communication Protocol, WCCP.

Setting up Squid

Squid is easy to install and configure. Since Squid needs space to store its cache, you should run it on a dedicated machine that has plenty of free memory and disk space. A configuration for a large cache would be a machine with 32GiB of RAM and 8TB of disk.

You may be able to find precompiled Squid binaries for your system, or you can download a fresh copy of Squid from squid-cache.org. If you choose to compile it yourself, run the configure script at the top of the source tree after you unpack the distribution. This script assumes that you want to install the package in /usr/local/squid. If you prefer some other location, use the --prefix=dir option to configure. After configure has completed, run make all and then make install.

Once you’ve installed Squid, you must localize the squid.conf configuration file. See the QUICKSTART file in the distribution directory for a list of the changes you need to make to the sample squid.conf file.

You must also run squid -z by hand to build and zero out the directory structure in which cached web pages will be stored. Finally, you can start the server by hand with the RunCache script; it will normally be started by a script when the machine boots.

To test Squid, configure your desktop web browser to use the Squid server as a proxy. This option is usually found in the browser’s preferences panel.
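A scripted check can complement the browser test. The Python sketch below uses only the standard library; the host name squid.example.com is a placeholder for your own server, and port 3128 is assumed because it is Squid’s usual default listening port. Which response header reports a hit or miss depends on the Squid version (X-Cache on many releases, Cache-Status on newer ones), so the function returns both and either may be absent.

```python
import urllib.request
from typing import Optional, Tuple

# Placeholder address: substitute your own Squid host and port.
PROXY_URL = "http://squid.example.com:3128"


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url: str, proxy_url: str = PROXY_URL,
                    timeout: float = 10.0) -> Tuple[int, Optional[str], Optional[str]]:
    """Fetch url through the proxy; return (status, X-Cache, Cache-Status)."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return (response.status,
                response.headers.get("X-Cache"),
                response.headers.get("Cache-Status"))
```

Calling fetch_via_proxy("http://example.com/") twice in a row is a reasonable smoke test: if the page is cacheable, the second call would be expected to report a hit in whichever header your Squid version emits.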

Reverse-proxying with Apache

For security or load balancing reasons, it’s sometimes useful for web hosting sites to proxy inbound requests (that is, requests to your web servers that are coming in from browsers on the Internet). Since this is backward from the typical use of a web proxy (handling outbound requests from browsers at your site), such an installation is called a reverse proxy.

One popular configuration puts a reverse proxy on your site’s DMZ network to accept Internet users’ requests for services such as web-based email. The proxy then passes these requests along to the appropriate internal servers. This approach has several advantages:

  • It eliminates the temptation to allow direct inbound connections to servers that are not in the DMZ.

  • You need to configure only a single DMZ server, rather than one server for each externally accessible service.

  • You can control the accessible URLs at a central choke point, providing some security benefit.

  • You can log inbound requests for monitoring and analysis.

See Chapter 22 for more information about DMZ networks.


Configuring Apache to provide reverse proxy service is relatively straightforward. Inside a VirtualHost clause in Apache’s httpd.conf file, you use the ProxyPass and ProxyPassReverse directives.

  • ProxyPass maps a remote URL into the URL space of the local server, making that part of the local address space appear to be a mirror of the remote server. (In this scenario, the “local” server is the DMZ machine and the “remote” server is the server on your interior network.)

  • ProxyPassReverse hides the real server by “touching up” outbound HTTP headers that transit the proxy.
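The effect of the two directives can be approximated in Python. This is a conceptual sketch only, not Apache’s implementation: the host names and the /mail prefix are hypothetical, and the header list reflects the redirect headers that Apache’s documentation says ProxyPassReverse adjusts (Location, Content-Location, and URI).

```python
from typing import Dict, Optional

# Hypothetical names for illustration only.
LOCAL_PREFIX = "/mail"                                   # path on the DMZ proxy
BACKEND_BASE = "https://backend.example.internal/mail"   # interior server
PUBLIC_BASE = "https://www.example.com/mail"             # what Internet clients see

REWRITTEN_HEADERS = ("Location", "Content-Location", "URI")


def proxy_pass(path: str) -> Optional[str]:
    """Map a local request path to the backend URL, as ProxyPass does."""
    if path == LOCAL_PREFIX or path.startswith(LOCAL_PREFIX + "/"):
        return BACKEND_BASE + path[len(LOCAL_PREFIX):]
    return None  # not proxied


def proxy_pass_reverse(headers: Dict[str, str]) -> Dict[str, str]:
    """Rewrite backend URLs in response headers so the real server stays hidden."""
    rewritten = dict(headers)
    for name in REWRITTEN_HEADERS:
        value = rewritten.get(name)
        if value is not None and value.startswith(BACKEND_BASE):
            rewritten[name] = PUBLIC_BASE + value[len(BACKEND_BASE):]
    return rewritten


print(proxy_pass("/mail/inbox"))
print(proxy_pass_reverse({"Location": "https://backend.example.internal/mail/login"}))
```

Without the second step, a redirect issued by the interior server would send the browser to an internal address it cannot reach.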

Below is a snippet of the reverse proxy configuration needed to insert a UNIX DMZ system in front of a Microsoft Outlook Web Access (OWA) server that provides web-based email.

<Location /rpc>
    ProxyPass https://wm.monkeypaw.com/rpc
    ProxyPassReverse https://wm.monkeypaw.com/rpc
    SSLRequireSSL
</Location>

<Location /exchange>
    ProxyPass https://wm.monkeypaw.com/exchange
    ProxyPassReverse https://wm.monkeypaw.com/exchange
    SSLRequireSSL
</Location>

<Location /exchweb>
    ProxyPass https://wm.monkeypaw.com/exchweb
    ProxyPassReverse https://wm.monkeypaw.com/exchweb
    SSLRequireSSL
</Location>

<Location /public>
    ProxyPass https://wm.monkeypaw.com/public
    ProxyPassReverse https://wm.monkeypaw.com/public
    SSLRequireSSL
</Location>

<Location /oma>
    ProxyPass https://wm.monkeypaw.com/oma
    ProxyPassReverse https://wm.monkeypaw.com/oma
    SSLRequireSSL
</Location>

<Location /Microsoft-Server-ActiveSync>
    ProxyPass https://wm.monkeypaw.com/Microsoft-Server-ActiveSync
    ProxyPassReverse https://wm.monkeypaw.com/Microsoft-Server-ActiveSync
    SSLRequireSSL
</Location>

In this example, proxy services are provided for only a few top-level URLs: /rpc, /exchange, /exchweb, /public, /oma, and /Microsoft-Server-ActiveSync. For security reasons, it’s a good idea to limit the requests allowed through the proxy.
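The allowlist idea can be expressed as a short Python check, which may be handy when auditing logs or writing tests for a proxy configuration. The function name is_proxied is invented for this sketch, and the matching rule (exact prefix or prefix followed by a slash) is an approximation of how Apache matches a path-based Location section, not a copy of its logic.

```python
# Top-level URLs taken from the example configuration.
ALLOWED_PREFIXES = (
    "/rpc",
    "/exchange",
    "/exchweb",
    "/public",
    "/oma",
    "/Microsoft-Server-ActiveSync",
)


def is_proxied(path: str) -> bool:
    """Return True if the request path falls under an allowed top-level URL."""
    return any(path == prefix or path.startswith(prefix + "/")
               for prefix in ALLOWED_PREFIXES)


for candidate in ("/exchange/inbox", "/exchangeadmin", "/admin"):
    print(candidate, is_proxied(candidate))
```

Anything outside the list, such as an administrative URL on the interior server, never reaches the backend through the proxy.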