
7.2. A Model for System Recovery

In this section, we will discuss how to cope with the failure and recovery of processes that are running application code. We will look at the failure and recovery of resource managers, notably database systems, starting in Section 7.3.

Detecting Process Failures

Operating system processes are a firewall between the operating system and the application. An application failure may cause the application’s process to fail. However, the operating system can continue, so only the process needs to be restarted. This reduces MTTR compared to a system where the application failure causes an operating system reboot. Therefore, most TP systems are built from multiple processes.

We would like each process to be as reliable as possible. But of course, no matter how reliable it is, there are times when it will fail. When it does fail, some agent outside of the process has to observe that fact and ask to recreate the process. Usually that’s done by the operating system, database system, or transactional middleware.

The transactional middleware or database system usually has one or more monitoring processes that track when application or database processes fail. There are several ways that are commonly used to detect failures (the first of which is sketched in code after the list):

  • Each process could periodically send an “I’m alive” message to the monitoring process (see Figure 7.1); the absence of such a message warns the monitoring process of a possible failure.

    Figure 7.1. A Fault Detection Monitor. The monitor detects process failures, in this case by listening for “I’m alive” messages. When it doesn’t hear one within its timeout period, it assumes the process has failed.

  • The monitoring process could poll the other processes with “Are you alive?” messages.

  • Each process could own an operating system lock that the monitoring process is waiting to acquire; if the process fails, the operating system releases the process’s lock, which causes the monitoring process to be granted the lock and hence to be alerted of the failure.
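To make the first approach concrete, here is a minimal sketch of an “I’m alive” heartbeat monitor in Python. It is illustrative only: the in-memory registry, the interval and timeout values, and the report_alive and monitor_loop names are assumptions, and a real monitor would receive heartbeats over interprocess communication or the network rather than through a shared dictionary.

    import threading
    import time

    HEARTBEAT_INTERVAL = 1.0   # how often each process sends "I'm alive"
    TIMEOUT = 3.0              # silence longer than this raises a suspicion

    last_heartbeat = {}        # process name -> time of last "I'm alive" message
    registry_lock = threading.Lock()

    def report_alive(process_name):
        # Called on behalf of a monitored process every HEARTBEAT_INTERVAL seconds.
        with registry_lock:
            last_heartbeat[process_name] = time.time()

    def monitor_loop(on_suspected_failure):
        # The monitoring process: flag any process whose heartbeat is overdue.
        while True:
            now = time.time()
            with registry_lock:
                for name, last_seen in list(last_heartbeat.items()):
                    if now - last_seen > TIMEOUT:
                        # Only a suspicion; the process may merely be slow.
                        on_suspected_failure(name)
                        del last_heartbeat[name]
            time.sleep(HEARTBEAT_INTERVAL)

The same structure works for the polling variant, with the monitor sending “Are you alive?” messages instead of waiting for heartbeats.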

Whichever approach is taken, it is important to minimize the time it takes for a monitoring process to detect the failure, since that time contributes to the MTTR and therefore to unavailability.

In all these cases, the symptom provides a good reason to suspect that the process failed, but it is not an ironclad guarantee that the process actually did fail. In the first two cases, the process might just be slow to respond. In the third case, it might have released the lock yet still be operational. The suspicion of a failure is more likely to be true if the detector is executing under the same operating system instance as the process being monitored; that is, on the same machine or virtual machine, though even here it is not a guarantee. This is the scenario we focus on in this chapter, and we will assume that failure detection is accurate.

In a distributed system where the monitor is running on a different machine than the process being monitored, there is a greater chance that the failure symptom is due to a communication failure rather than a process failure. We will explore this issue in Chapters 8 and 9.

A process could fail by returning incorrect values. That is, it could fail to satisfy its specification. For example, the data it returns could have been corrupted by faulty memory, a faulty communication line, or an application bug. We do not consider such errors here. We assume the first two are prevented by suitable error-detecting codes. We do not consider application bugs because we cannot eliminate them by using generic system mechanisms. They are addressed by software engineering technology and methodology, which are outside the scope of this book.

When a process failure is detected, some agent needs to recreate the failed process. The operating system generally is designed only to recreate processes that are needed to keep the system running at all, such as the file system (if it runs as a process) and system monitor processes. The operating system generally does not automatically recreate application processes, except those managed by the operating system’s process control system. Therefore, transactional middleware and database systems must step in to detect the failure of application and database system processes, and when they do fail, to recreate them.
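As an illustration of that last step, here is a sketch of how a monitoring process might recreate a failed application process, assuming the process can be started from a command line. The command shown is hypothetical, and real transactional middleware would also rerun the process’s recovery procedure and re-register it before letting it accept new work.

    import subprocess
    import time

    SERVER_CMD = ["python", "app_server.py"]   # hypothetical server command line

    def supervise():
        # Recreate the application process each time it exits unexpectedly.
        while True:
            proc = subprocess.Popen(SERVER_CMD)   # (re)create the process
            proc.wait()                           # returns only when the process exits
            print("server exited with code", proc.returncode, "; restarting")
            time.sleep(1.0)                       # brief backoff before restarting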

Client Recovery

In this discussion of recovery, we assume a basic client-server model: a client process communicates with a server process and the server process uses underlying resources, such as a disk or communications line (see Figure 7.2). A common configuration is to have the client running on a desktop machine and the server on a larger shared machine. Whatever the configuration, the possible technical approaches for system recovery remain the same. We are therefore deliberately vague about the type of machine on which the client and server run.

Figure 7.2. Basic Client-Server Model. A client process communicates with a server process and the server process uses underlying resources, such as a disk or communication line.


There are several points of failure in this system: the client, the client-server connection, the server, the server-resource connection, and the resources. If the client fails and later recovers, it needs to reconnect to the server and can start calling it again. Or, if the client loses communication with the server, either because the communication line or the server failed, the failure will eventually be repaired and the client will later re-establish that communication and resume calling the server. In either case, at recovery time, the main issue for the client is to re-establish its state relative to the server.

The state of the client relative to the server consists of the set of its outstanding calls to the server. Therefore, to recover its state, it needs to determine the following:

  • What calls were outstanding at the time it failed or lost connectivity with the server?

  • What happened to those calls while it was down or not communicating with the server?

  • What does it have to do to finish those calls properly before proceeding with new calls?

These are exactly the issues we discussed in Chapter 4, “Queued Transaction Processing.” If there is a persistent queue between the client and server, then the client can find out the state of all outstanding calls (called “requests” in Chapter 4) by examining the queue. If not, then it has to use an application-specific technique, such as looking at the database state on the server to determine if the client’s previous calls completed, or reissuing in-doubt calls with the same serial number and relying on the server to discard duplicate calls. These techniques too were discussed in Chapter 4.
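A minimal sketch of the second technique (reissuing in-doubt calls with the same serial number) might look as follows. The table of completed calls is shown in memory for brevity, though in practice it must survive server failures, for example by living in the server’s database.

    completed_calls = {}   # serial number -> saved reply (persistent in practice)

    def handle_call(serial_no, do_work):
        # Server side: discard duplicates by serial number.
        if serial_no in completed_calls:
            return completed_calls[serial_no]   # duplicate: return the old reply
        reply = do_work()                       # execute the call exactly once
        completed_calls[serial_no] = reply      # remember the reply for reissues
        return reply

    # Client side: after recovery, reissue the in-doubt call with the same
    # serial number; the server guarantees the work is not done twice.
    print(handle_call(17, lambda: "debit applied"))
    print(handle_call(17, lambda: "debit applied"))   # reissued call, same reply, no redo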

The remaining issues all focus on server availability, which is the subject of the rest of this section.

Server Recovery

After a server has been recreated, it runs its recovery procedure to reconstruct its state before starting to process new calls. If this is the first time the server has ever run, then the recovery procedure is trivial—the server just initializes its state. If not, then it has some work to do.

To explore how a server reconstructs its state, let’s begin from first principles. Suppose the server is a sequential processor of calls and there are no transactions in the picture. The server just receives a call from a client, does what is requested, and returns a result. At the time it failed, the server might have been in the middle of processing such a call.

As we discussed in the previous section on Client Recovery, it is up to the client to determine the state of its outstanding calls. It’s always possible that a server (or communications) failure causes a call to get lost, so the client must be able to cope with that fact. Since the client has to be able to deal with lost calls, it would seem that a recovering server could just ignore whatever call it was working on at the time it failed and start afresh. It’s up to the client to figure out what to do.

Unfortunately, this doesn’t always work, because the server may have performed a non-idempotent operation while it was processing its last call before the failure. For example, it may have printed a check, transferred money, credited a bank account, dispensed cash, or shipped a product. If the client concludes that the server did not execute the call, it will reissue it, thereby causing the server to redo the work. Therefore, if the server performed a non-idempotent operation on behalf of the call it was processing at the time of failure, it must not re-execute the call. Rather, it must complete the call and return a result.

The details of recovering a partially executed call are complex, and such recovery is not commonly used in systems that support transactions. However, to appreciate how much easier things are when transactions are available, let us briefly examine what the server would have to do if it could not rely on transactions.

Checkpoint-Based Recovery

Suppose the server partially executed a client’s call and then failed. Suppose all the operations that the server executed for that partially-executed call were idempotent. In that case, at recovery time the server can simply reprocess the call from the beginning. Re-executing operations that it executed before the failure does no harm, because all those operations are idempotent.

Suppose the server did perform non-idempotent operations for the last call. Then it must recover itself to a state that came after the last non-idempotent operation it executed before the failure. So, for example, if the server printed a check before the failure, then it must be recovered to a state after the time that it printed the check. If the server continued processing from a state before it printed the check, then it would repeat that operation (i.e., print the check again), which is exactly what should not happen. Recreating this state requires some careful bookkeeping before the failure, so the recovering server can look up what was going on at the time of failure, to figure out what it should do.

One general way to prepare for this type of recovery is to have a server save its memory state on nonvolatile storage (e.g., a disk) before it executes a non-idempotent operation. That way, when it recovers, it can recreate that state (see Figure 7.3). Saving memory state is an example of checkpointing. In general, checkpointing is any activity that is done during normal processing to reduce the amount of work to redo after a recovery. Saving memory state is a kind of checkpointing, because it ensures that when the server recovers, it won’t have to redo any work that it did before saving its state.

Figure 7.3. Server Checkpointing. During normal operation, the server periodically writes a checkpoint. After a failure, it uses its last checkpointed state to recover.


Saving the state of the server’s memory is not cheap, especially if it has to be done every time a non-idempotent operation is performed. As we’ll see in a moment, transactions help reduce this cost.

To recover from a failure, the server restores the last checkpoint state it successfully saved (see Figure 7.4). It must then check whether the non-idempotent operation that followed its last checkpoint actually ran. For example, if the server checkpoints its state right before printing a check, then at recovery time reconstituting the server state requires determining whether or not the check was printed. This is the same question we asked in the earlier section on Client Recovery and in Section 4.4, Handling Non-Undoable Operations. That is, in this situation, the server is in the role of a client in Section 4.4 that may have called a non-idempotent operation before it failed. Therefore, when the server recovers, it must determine whether that non-idempotent operation ran, and if so it can skip over it.

Figure 7.4. Checkpoint-Based Recovery Procedure. The server program checkpoints before its non-idempotent “print check” operation. The server recovery procedure recovers the last checkpoint state and branches to the line after the statement that created the checkpoint. The server program then executes the non-idempotent operation “print check” only if it wasn’t done before the failure.

To summarize: If a server performs non-idempotent operations, then it reconstitutes its state at recovery time to one that comes after the last non-idempotent operation that it performed before the failure. The idea is to start running the process from that state, so that non-idempotent operations it does from that point on don’t cause a problem.
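The following sketch pulls the pieces of Figures 7.3 and 7.4 together. It is a simplification under stated assumptions: the checkpoint file name, the pickled state, and the check_was_printed test (for example, asking the printer or an operator) are all hypothetical.

    import os
    import pickle

    CHECKPOINT_FILE = "server.ckpt"   # hypothetical nonvolatile checkpoint location

    def checkpoint(state):
        # Save the server's memory state to nonvolatile storage.
        with open(CHECKPOINT_FILE, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())

    def check_was_printed(state):
        # Hypothetical test of whether the non-idempotent operation already ran.
        return False

    def print_check(state):
        print("printing check for", state["payee"])   # non-idempotent operation

    def process_call(state):
        checkpoint(state)      # checkpoint right before the non-idempotent step
        print_check(state)
        # ... continue processing the call ...

    def recover():
        with open(CHECKPOINT_FILE, "rb") as f:
            state = pickle.load(f)        # restore the last checkpointed state
        if not check_was_printed(state):
            print_check(state)            # it never ran before the failure, so run it now
        # ... resume from the statement after "print check" ...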

Transaction-Based Server Recovery

Transactions simplify server recovery by focusing clients’ and servers’ attention on the transactions executed by each server, rather than on individual calls within a transaction. That is, the server does all its work within transactions. The client tells the server to start a transaction, the client makes some calls to the server within that transaction, and then the client tells the server to commit the transaction.
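That call pattern, in sketch form and with a hypothetical server interface (start_transaction, commit, abort, and the application calls in between), looks like this:

    def transfer(server, from_acct, to_acct, amount):
        txn = server.start_transaction()        # client asks the server to start a txn
        try:
            server.debit(txn, from_acct, amount)    # calls made within the transaction
            server.credit(txn, to_acct, amount)
            server.commit(txn)                  # client tells the server to commit
        except Exception:
            server.abort(txn)                   # on error, the server undoes the work
            raise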

If a server that supports transactions fails and subsequently recovers, its state includes the effects of all transactions that committed before the failure and no effects of transactions that aborted before the failure or were active at the time of the failure. Comparing this behavior to a nontransactional server, it is as if the transactional server performs a checkpoint every time it commits a transaction, and its recovery procedure discards all effects of aborted or incomplete transactions. Thus, when a transactional server recovers, it ignores which calls were executing when it failed and focuses instead on which transactions were executing when it failed. So instead of recovering to a state as of the last partially-executed call (as in checkpoint-based recovery), it recovers to a state containing all the results of all committed transactions and no others.

For this to work, the server must be able to undo all of a transaction’s operations when it aborts. This effectively makes the operations redoable when the transaction is re-executed. That is, if an operation was undone, then there’s no harm in redoing it later, even if it is non-idempotent. This avoids a problem that was faced in checkpoint-based recovery—the problem of returning to a state after the last non-idempotent operation. This isn’t necessary because every non-idempotent operation was either part of a committed transaction (and hence won’t be redone) or was undone (and hence can be redone).

If all operations in a transaction must be redoable, then the transaction must not include the non-idempotent operations we encountered in the earlier section, Server Recovery, such as printing a check or transferring money. To cope with such a non-idempotent operation, the transaction should enqueue a message that contains the operation. It’s safe for the transaction to contain the enqueue operation, because it is undoable. The program that processes the message and performs the non-idempotent operation should use the reply handling techniques in Section 4.4 to get exactly-once execution of the actual operation (printing the check or sending a money-transfer message).
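A sketch of this pattern, with hypothetical database and recoverable-queue interfaces, is shown below. The key point is that the transaction only enqueues a “print check” message, which is undone if the transaction aborts; a separate program performs the actual printing.

    def pay_invoice(txn, db, check_queue, invoice):
        db.mark_paid(txn, invoice["id"])                       # ordinary undoable update
        check_queue.enqueue(txn, {"payee": invoice["payee"],   # undoable enqueue:
                                  "amount": invoice["amount"]})  # removed if the txn aborts
        # No check is printed here; committing the transaction makes the message durable.

    def check_printer(check_queue):
        # Separate program: dequeue each message and print the check, using the
        # Section 4.4 reply-handling techniques to get exactly-once printing.
        while True:
            msg = check_queue.dequeue()     # runs within its own transaction
            print("printing check for", msg["payee"], msg["amount"])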

Transactions not only simplify server recovery, they also speed it up. A memory checkpoint is expensive, but transaction commitment is relatively cheap. The trick is that the transactional server carefully maintains all its state on disk, incrementally, by writing small amounts to a log file, thereby avoiding a bulk copy of its memory state. It is designed to suffer failures at arbitrary points in time, and to reconstruct its memory state from disk using the log, with relatively modest effort. The algorithms to reconstruct its state in this way are what give transactions their all-or-nothing and durability properties. Either all of a transaction executes or none of it does. And all of its results are durably saved in stable storage, even if the system fails a moment after the transaction commits. These algorithms are the main subject of the rest of this chapter.
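The contrast between a bulk memory checkpoint and incremental logging can be seen in the following toy sketch, which treats the server state as a key-value map and ignores aborts and all the subtleties the rest of the chapter addresses; the file name and record format are invented for illustration.

    import os

    LOG_FILE = "server.log"   # hypothetical log on nonvolatile storage

    def apply_update(state, key, value, log):
        # Append a small log record instead of dumping the whole memory state.
        log.write(f"{key}={value}\n")
        log.flush()
        os.fsync(log.fileno())    # force the record to disk before updating memory
        state[key] = value

    def recover_state():
        # Rebuild the in-memory state by replaying the log from the beginning.
        state = {}
        if os.path.exists(LOG_FILE):
            with open(LOG_FILE) as log:
                for record in log:
                    key, value = record.rstrip("\n").split("=", 1)
                    state[key] = value
        return state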

Stateless Servers

When transactions are used, servers usually are split into two types: application processes and resource managers (see Figure 7.5). An application process receives a client request, starts a transaction, performs application logic, and sends messages to transactional resource managers. It does not directly access transactional resources, such as a database. Resource managers handle the state being shared by transactions—databases, recoverable queues, and so on.

Figure 7.5. Stateless Servers. An application process stores all its state in resource managers, and is therefore stateless.


A resource manager behaves just like a transactional server described in the previous section, Transaction-Based Server Recovery. That is, it executes all calls within a transaction. And its recovery procedure returns its state to one that includes the effects of all committed transactions and no others.

An application process can use a simpler recovery procedure than resource managers, because it is stateless. That is, it doesn’t have any state that might be needed after recovery. It receives a request to run a transaction (from its client), starts a transaction, executes operations that manipulate local memory or call a database system or another application process, commits the transaction, and sends a reply back to the client. At this point, it has no state worth remembering. It simply processes the next request that it receives as if it had been initialized from scratch.

A stateless server doesn’t have to do very much to recover from a failure. It just reinitializes its state and starts running transactions again, completely oblivious to whatever it was doing before the failure. Since it maintains all its state in transactional resource managers, it is really up to the resource managers to reconstitute their states after a failure. The resource managers recover to a state that includes all the committed transactions and none of the aborted ones, up to the time of the failure. Now the application process can start processing requests again.
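In sketch form, a stateless application process is just a loop like the following; the request queue, transaction manager, and resource manager objects are hypothetical stand-ins for whatever the transactional middleware provides.

    def application_process(request_queue, txn_mgr, db):
        # All durable state lives in the resource managers, so "recovery" for this
        # process is simply restarting the loop from scratch.
        while True:
            request = request_queue.get()          # receive a client request
            txn = txn_mgr.begin()                   # start a transaction
            try:
                result = db.execute(txn, request)   # the resource manager does the work
                txn_mgr.commit(txn)
                request.reply(result)               # reply after the commit succeeds
            except Exception:
                txn_mgr.abort(txn)                  # the RM discards the txn's effects
                request.reply_error("please retry")
            # Nothing worth remembering: the next request starts from a clean state.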

The application processes controlled by transactional middleware usually are designed to be stateless servers so they do not need any recovery code. The only ambiguity is about the state of the last request that a client issued to the application process before the failure (e.g., that a front-end program issued to a request controller). That is, the client is not stateless, since it needs to know the state of that last request. This is where queued request processing comes in—to figure out the state of that last request and thereby determine whether it has to be rerun. For the application process that was actually executing the request, there’s no ambiguity at all. It restarts in a clean state, as if it were initialized for the first time.