Tuning LAMP systems, Part 2: Optimizing Apache and PHP
What slows Apache down, and how to get the most out of PHP
Sean A. Walberg (sean@ertw.com), Senior Network Engineer

Sean Walberg has been working with Linux and UNIX since 1994 in academic, corporate, and Internet service provider environments. He has written extensively about systems administration over the past several years.
 
Summary:  Applications using the LAMP (Linux®, Apache, MySQL, PHP/Perl) architecture are constantly being developed and deployed. But often the server administrator has little control over the application itself because it's written by someone else. This series of three articles discusses many of the server configuration items that can make or break an application's performance. This second article focuses on steps you can take to optimize Apache and PHP.
Date:  30 Apr 2007
Level:  Intermediate
Also available in:   Chinese
Linux, Apache, MySQL, and PHP (or Perl) form the basis of the LAMP architecture for Web applications. Many open source packages based on LAMP components are available to solve a variety of problems. As the load on an application increases, the bottlenecks in the underlying infrastructure become more apparent in the form of slow response to user requests. The previous article showed you how to tune the Linux system and covered the basics of LAMP and performance measurement. This article focuses on the Web server components, Apache and PHP.
Apache is a highly configurable piece of software. It has a lot of features, but each one comes at a price. Tuning Apache is partially an exercise in proper allocation of resources, and involves stripping down the configuration to only what's needed.
Apache is modular in that you can add and remove features easily. Multi-Processing Modules (MPMs) provide this modular functionality at the core of Apache -- managing the network connections and dispatching the requests. MPMs let you use threads or even move Apache to a different operating system.
Only one MPM can be active at one time, and it must be compiled in statically with --with-mpm=(worker|prefork|event).
The traditional model of one process per request is called prefork. A newer, threaded, model is called worker, which uses multiple processes, each with multiple threads to get better performance with lower overhead. The final, event MPM is an experimental module that keeps separate pools of threads for different tasks. To determine which MPM you're currently using, execute httpd -l.
Choosing the MPM to use depends on many factors. Setting aside the event MPM until it leaves experimental status, it's a choice between threads or no threads. On the surface, threading sounds better than forking, if all the underlying modules are thread safe, including all the libraries used by PHP. Prefork is the safer choice; you should do careful testing if you choose worker. The performance gains also depend on the libraries that come with your distribution and your hardware.
Regardless of which MPM you choose, you must configure it appropriately. In general, configuring an MPM involves telling Apache how to control how many workers are running, whether they're threads or processes. The important configuration options for the prefork MPM are shown in Listing 1.
Listing 1. Configuration options for the prefork MPM

StartServers        50
MinSpareServers     15
MaxSpareServers     30
MaxClients          225
MaxRequestsPerChild 4000
Compiling your own software
When I got started with UNIX®, I insisted on compiling my own software for everything I put on my systems. Maintaining updates eventually caught up with me, so I learned how to build packages to ease this task. Eventually, I realized that most of the time I was duplicating the effort the distribution was doing; now, for the most part, I stick with whatever is provided by my distribution of choice when I can, and roll my own packages when I must.
Similarly, you may find that maintainability of vendor packages outweighs the benefits of going with the latest and greatest code. Sometimes performance tuning and systems administration have conflicting goals. You may have to consider vendor support if you're using a commercial Linux or relying on third-party support.
If you strike out on your own, learn how to build packages that work with your distribution and how to integrate them into your patching system. This will ensure that the software, along with any tweaks you make, are built consistently and can be used across multiple systems. Also keep on top of software updates by subscribing to the appropriate mailing lists and Rich Site Summary (RSS) feeds.
In the prefork model, a new process is created per request. Spare processes are kept idle to handle incoming requests, which reduces the start-up latency. The previous configuration starts 50 processes as soon as the Web server comes up and tries to keep between 15 and 30 idle servers running. The hard limit on processes is dictated by MaxClients. Even though a process can handle many consecutive requests, Apache kills off processes after 4,000 connections, which mitigates the risk of memory leaks.
Configuring the threaded MPMs is similar, except that you must determine how many threads and processes are to be used. The Apache documentation explains all the parameters and calculations necessary.
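As a rough illustration, a worker MPM configuration might look like the following. These values are starting points only, not recommendations; the relationships between ThreadsPerChild, MaxClients, and the process count are spelled out in the Apache MPM documentation.

```apache
# Illustrative worker MPM settings -- tune for your workload.
# MaxClients must be a multiple of ThreadsPerChild.
StartServers          2
MaxClients          150
MinSpareThreads      25
MaxSpareThreads      75
ThreadsPerChild      25
MaxRequestsPerChild   0
```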
Choosing the values to use involves some trial and error. The most important value is MaxClients. The goal is to allow enough worker processes or threads to run without causing your server to swap excessively. If more requests come in than can be handled, then at least those that made it through get service; the others are blocked.
If MaxClients is too high, then all clients experience poor service because the Web server tries to swap out one process to allow another one to run. Too low a setting means you may deny services unnecessarily. Checking the number of processes running at high loads and the resulting memory footprint of all the Apache processes gives you a good idea of how to set this value. If you go over 256 MaxClients, you must also set ServerLimit to the same number; read the MPM's documentation carefully for the associated caveats.
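One way to arrive at a starting value for MaxClients is to divide the memory you can spare for Apache by the typical resident size of one worker process, which you can measure under load with tools like ps or top. The sketch below (the sizes are illustrative assumptions, not measurements) shows the arithmetic:

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Rough starting point for MaxClients: memory left over for Apache
    divided by the typical resident size of one worker process."""
    available_mb = total_ram_mb - reserved_mb
    return max(1, available_mb // per_process_mb)

# Example: 2 GB of RAM, 512 MB reserved for the OS and other daemons,
# and Apache workers averaging about 6 MB resident each.
print(estimate_max_clients(2048, 512, 6))
```

Remember that this is only a ceiling against swapping; measure the real per-process footprint on your own server before settling on a value.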
Tuning the number of servers to start and keep spare depends on the role of the server. If the server runs only Apache, you can use modest values as shown inListing 1, because you're able to make full use of the machine. If the system is shared with a database or other server, then you should limit the number of spare servers being run.
Each request that Apache processes goes through a complicated set of rules that dictates any restrictions or special instructions the Web server must follow. Access to a folder can be restricted to certain IP addresses, or a username and password can be required. These options also include the handling of certain files, such as whether a directory listing is provided, how certain filetypes are to be handled, or whether the output should be compressed.
These configurations take the form of containers in httpd.conf, such as <Directory> to specify that the configuration to follow refers to a location on disk, or <Location> to indicate that the reference is to a path in the URL. Listing 2 shows a Directory container in action.
Listing 2. A Directory container applied to the root directory

<Directory />
    AllowOverride None
    Options FollowSymLinks
</Directory>
In Listing 2, the configuration enclosed in the <Directory> and </Directory> tags is applied to the given directory and everything under it — in this case, the root directory. Here, the AllowOverride directive dictates that users aren't allowed to override any options (more on this later). The FollowSymLinks option is enabled, which lets Apache look past symlinks to serve the request, even if the file is outside the directory containing Web files. This means that if a file in your Web directory is a symlink to /etc/passwd, the Web server happily serves the file if asked. With -FollowSymLinks used instead, this feature is disabled, and the same request causes an error to be returned to the client.
This last scenario is a cause for concern on two fronts. The first is a performance matter. If FollowSymLinks is disabled, then Apache must check each component of the filename (directories and the file itself) to make sure they're not symbolic links. This incurs extra overhead in the form of disk activity. A companion option called FollowSymLinksIfOwnerMatch follows the symbolic link if the owner of the file is the same as that of the link. This has the same performance hit as disabling the following of symlinks. For best performance, use the options in Listing 2.
Security-conscious readers should be alert by now. Security is always a trade-off between functionality and risk. In this case, the functionality is speed, and the risk is allowing unauthorized access to files on the system. One of the mitigations is that LAMP application servers are generally dedicated to a particular function, and users can't create the potentially dangerous symbolic links. If it's vital to have symbolic link-checking enabled, you can restrict it to a particular area of the file system, as in Listing 3.
Listing 3. Scoping the symlink check to users' public_html directories

<Directory />
    Options FollowSymLinks
</Directory>
<Directory /home/*/public_html>
    Options -FollowSymLinks
</Directory>
In Listing 3, any public_html directory in a user's home directory has the FollowSymLinks option removed for it and any child directories.
As you've seen, options can be configured on a per-directory basis through the main server configuration. Users can override this server configuration themselves (if permitted by the administrator through the AllowOverride directive) by dropping a file called .htaccess into a directory. This file contains additional server directives that are loaded and followed on each request to the directory where the .htaccess file resides. Despite the earlier discussion about not having users on the system, many LAMP applications use this functionality to control access and for URL rewriting, so it's wise to understand how it works.
Even though the AllowOverride directive prevents users from doing anything you don't want them to, Apache must still look for the .htaccess file to see if there is any work to be done. A parent directory can specify directives that are to be processed by requests from child directories, which means Apache must also search each component of the directory tree leading to the requested file. Understandably, this causes a great deal of disk activity on each request.
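To make the per-request cost concrete, here is a small sketch (purely illustrative, not Apache's actual code) of the lookups the server must perform along the path to a requested file when overrides are enabled for the whole tree:

```python
import posixpath

def htaccess_probes(docpath):
    """Return every .htaccess path that must be probed when serving
    `docpath` with overrides enabled along the whole directory tree."""
    probes = []
    current = ""
    for part in docpath.strip("/").split("/")[:-1]:  # drop the filename
        current += "/" + part
        probes.append(posixpath.join(current, ".htaccess"))
    return probes

# Four stat() calls for a single page -- before the file itself is read.
print(htaccess_probes("/home/user/public_html/project/notes.html"))
```

Every entry in that list is a disk access that happens on every request, which is why disabling overrides pays off on a busy site.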
The easiest solution is to not allow any overrides, which eliminates the need for Apache to check for .htaccess. Any special configurations are then placed directly in httpd.conf. Listing 4 shows the additions to httpd.conf to enable password checking for a user's project directory, rather than putting the directives in a .htaccess file and relying on AllowOverride.
Listing 4. Password protection moved into httpd.conf

<Directory /home/user/public_html/project>
    AuthUserFile /home/user/.htpasswd
    AuthName "uber secret project"
    AuthType basic
    Require valid-user
</Directory>
If the configuration is moved into httpd.conf and AllowOverride is disabled, disk usage can be reduced. A user's project may not attract many hits, but consider how powerful this technique is when applied to a busy site.
Sometimes it's not possible to eliminate the use of .htaccess files entirely. In that case, just as an option was restricted to a certain part of the file system in Listing 3, overrides can also be scoped, as Listing 5 shows.
Listing 5. Scoping overrides to users' public_html directories

<Directory />
    AllowOverride None
</Directory>
<Directory /home/*/public_html>
    AllowOverride AuthConfig
</Directory>
After you implement Listing 5, Apache still looks for .htaccess files, but only from the public_html directory down, because the rest of the file system has the functionality disabled. For example, if a file that maps to /home/user/public_html/project/notes.html is requested, only the public_html and project directories are searched.
One final note about per-directory configurations is in order. Any document about tuning Apache will tell you to disable DNS lookups through the HostnameLookups off directive because trying to reverse-resolve every IP address connecting to your server is a waste of resources. However, any limitations based on hostname force the Web server to perform a reverse lookup on the client's IP address and a forward lookup on the result of that to verify the authenticity of the name. Therefore, it's wise to avoid using access controls based on the client's hostname and to scope them as described when they're necessary.
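A sketch of both points follows; the /var/www/admin path and .example.com domain are placeholders for illustration, and the Order/Deny/Allow syntax shown is the Apache 2.0/2.2 style used at the time this article was written.

```apache
# Never reverse-resolve client addresses just for logging
HostnameLookups Off

# If hostname-based access control is unavoidable, scope it narrowly
# so the DNS lookups happen only for this one area:
<Directory /var/www/admin>
    Order Deny,Allow
    Deny from all
    Allow from .example.com
</Directory>
```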
When a client connects to a Web server, it's allowed to issue multiple requests over the same TCP connection, which reduces the latency associated with multiple connections. This is useful when a Web page refers to several images: The client can request the page and then all the images over one connection. The downside is that the worker process on the server has to wait for the session to be closed by the client before it can move on to the next request.
Apache lets you configure how persistent connections, called keepalives, are handled. KeepAlive On at the global level of httpd.conf enables persistent connections, and MaxKeepAliveRequests limits how many requests the server handles on a connection before forcing it closed (setting it to 0 allows unlimited requests). KeepAlive Off disables the use of persistent connections entirely. KeepAliveTimeout, also at the global level, determines how long Apache will wait for another request before closing the session.
Handling persistent connections isn't a one-size-fits-all configuration. Some Web sites fare better with keepalives disabled (KeepAlive Off), and some experience a tremendous benefit by having them on. The only solution is to try both and see for yourself. It's advisable, though, to use a low timeout such as 2 seconds with KeepAliveTimeout 2 if you enable keepalives. This ensures that any client wishing to make another request has ample time, and that worker processes aren't idling while waiting for another request that may never come.
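Put together, a keepalive configuration with a short timeout might look like this (the request limit of 100 is a common default, not a measured recommendation):

```apache
# Enable persistent connections, but don't let idle clients
# tie up a worker for more than 2 seconds between requests.
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 2
```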
The Web server can compress the output before it's sent back to the client. This results in a smaller page being sent over the Internet at the expense of CPU cycles on the Web server. For those servers that can afford the CPU overhead, this is an excellent way of making pages download faster — it isn't unheard of for pages to be a third of their size after compression.
Images are generally already compressed, so compression should be limited to text output. Apache provides compression through mod_deflate. Although mod_deflate can be simple to turn on, it includes many complexities that the manual is eager to explain. This article doesn't cover the configuration of compression except to provide a link to the appropriate documentation (see the Resources section).
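For orientation only, a minimal mod_deflate setup that compresses text output looks something like the following; the module path varies by distribution, and the mod_deflate documentation covers the proxy and browser caveats this sketch ignores:

```apache
# Load mod_deflate (path is distribution-dependent) and compress
# only text content types -- images are already compressed.
LoadModule deflate_module modules/mod_deflate.so
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css
```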
PHP is the engine that runs the application code. You should install only the modules you plan to use and have your Web server configured to use PHP only for script files (usually those ending in .php) and not all static files.
When a PHP script is requested, PHP reads the script and compiles it into what's called Zend opcode, a binary representation of the code to be executed. This opcode is then executed by the PHP engine and thrown away. An opcode cache saves this compiled opcode and reuses it the next time the page is called. This saves a considerable amount of time. Several opcode caches are available; I've had a great deal of success with eAccelerator.
Installing eAccelerator requires the PHP development libraries on your computer. Because different Linux distributions place files in different places, it's best to get the installation instructions directly from the eAccelerator Web site (see the Resources section for a link). It's also possible that your distribution has already packaged an opcode cache, and you just have to install it.
Regardless of how you get eAccelerator on your system, there are a few configuration options to look at. The configuration file is usually /etc/php.d/eaccelerator.ini. eaccelerator.shm_size defines the size of the shared memory cache, which is where the compiled scripts are stored. The value is in megabytes. Determining the proper size depends on your application. eAccelerator provides a script to show the status of the cache, which includes the memory usage; 64 megabytes is a good start (eaccelerator.shm_size="64"). You may also have to tweak your kernel's maximum shared memory size if the value you choose isn't accepted. Add kernel.shmmax=67108864 to /etc/sysctl.conf, and run sysctl -p to make the setting take effect. The value for kernel.shmmax is in bytes.
If the shared memory allocation is exceeded, eAccelerator must purge old scripts from memory. By default, this is disabled; eaccelerator.shm_ttl = "60" specifies that when eAccelerator runs out of shared memory, any script that hasn't been accessed in 60 seconds should be purged.
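The settings discussed above might be collected in the configuration file like this (the extension line is the usual way the module is loaded, and the values are the starting points suggested in the text, not measured recommendations):

```ini
; Sketch of /etc/php.d/eaccelerator.ini
extension = "eaccelerator.so"

; 64 MB shared memory cache -- check the status script and adjust
eaccelerator.shm_size = "64"

; When the cache fills, purge scripts not accessed in 60 seconds
eaccelerator.shm_ttl = "60"
```

If PHP refuses the shm_size value, remember the kernel.shmmax step described above: the kernel's shared memory ceiling must be at least as large as the cache, in bytes.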
Another popular alternative to eAccelerator is the Alternative PHP Cache (APC). The makers of Zend also have a commercial opcode cache that includes an optimizer to further increase efficiency.
You configure PHP in php.ini. Four important settings control how much system resources PHP can consume, as listed in Table 1.
Table 1. Resource-related PHP settings

Setting             Description                                                  Recommended value
max_execution_time  How many CPU-seconds a script can consume                   30
max_input_time      How long (seconds) a script can wait for input data         60
memory_limit        How much memory a script can consume before being killed    32M
output_buffering    How much data (bytes) to buffer before sending to client    4096
These numbers depend mostly on your application. If you accept large files from users, then max_input_time may have to be increased, either in php.ini or by overriding it in code. Similarly, a CPU- or memory-heavy program may need larger settings. The purpose is to mitigate the effect of a runaway program, so disabling these settings globally isn't recommended. Another note on max_execution_time: it refers to the CPU time of the process, not the absolute time. Thus a program that does lots of I/O and few calculations may run for much longer than max_execution_time. This is also why max_input_time can be greater than max_execution_time.
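In php.ini, the four settings from Table 1 look like this:

```ini
; Resource limits from Table 1 -- adjust per application
max_execution_time = 30
max_input_time = 60
memory_limit = 32M
output_buffering = 4096
```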
The amount of logging that PHP can do is configurable. In a production environment, disabling all but the most critical logs saves disk writes. If logs are needed to troubleshoot a problem, you can turn up logging as needed. error_reporting = E_COMPILE_ERROR|E_ERROR|E_CORE_ERROR turns on enough logging to spot problems but eliminates a lot of chatter from scripts.
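A production-oriented logging stanza might combine that error_reporting value with directing errors to a log file rather than the browser; the display_errors and log_errors lines are standard php.ini directives added here as an assumption beyond what the article specifies:

```ini
; Report only serious errors, and log them instead of displaying them
error_reporting = E_COMPILE_ERROR|E_ERROR|E_CORE_ERROR
display_errors = Off
log_errors = On
```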
This article focused on tuning the Web server, both Apache and PHP. With Apache, the general idea is to eliminate extra checks the Web server must do, such as processing the .htaccess file. You must also tune the Multi-Processing Module you're using to balance the system resources used with the availability of idle workers for incoming requests. The best thing you can do for PHP is to install an opcode cache. Keeping your eye on a few resource settings also ensures that scripts don't hog resources and make the system slow for everyone else.
The next and final article in this series will look at tuning the MySQL database. Stay tuned!
Learn
"Quantify performance changes using application tracing" (developerWorks, July 2006) shows how to use application tracing to show the effect of configuration changes on Apache.
"Using the new memory manager" (developerWorks, March 2007) covers the latest changes to PHP 5.2's handling of memory. PHP is constantly refining its use of system resources.
mod_deflate is an Apache module that compresses output on the fly. This can also be done in PHP through output compression.
Pre-caching compressed static files such as JavaScript code and CSS is another way to improve performance. Compressing and concatenating all your JavaScript code and CSS is even better.
The Apache documentation on Multi-Processing Modules is worth reading to learn about the functionality of each; follow the links to the specific documentation for the MPM you choose.
In the developerWorks Linux zone, find more resources for Linux developers.
Stay current with developerWorks technical events and Webcasts.
Get products and technologies
If your distribution doesn't include eAccelerator, the Install From Source instructions will be helpful.
The Alternative PHP Cache and Zend Platform are alternatives to eAccelerator.
Siege lets you simulate users, so you can find out how much traffic your site can handle.
Sooner or later you're going to want to cache certain elements of your site and distribute load across multiple Web servers. Squid in accelerator mode (also known as a reverse proxy) or the Linux Virtual Server Project are excellent tools.
With IBM trial software, available for download directly from developerWorks, build your next development project on Linux.
Discuss
Check out developerWorks blogs and get involved in the developerWorks community.
