5/31/2008

Programmers Only: Optimistic Locking

Reposting an article about locking. Still learning.

LOCKING BASICS

For the purposes of this article, let us assume that we have chosen LOCKSIZE PAGE (or ANY) for the table spaces used in the examples. In this article, whenever I use the word PAGE you may substitute ROW if you like.

The DB2 for z/OS lock manager (also known as the IRLM) is designed to monitor access to data stored in DB2 tables. The IRLM is consulted every time an application requests a lock. The IRLM then decides whether or not the requested lock can be obtained concurrently with existing locks held by other users or if the requested lock must wait in a queue and be handled serially. If the latter situation occurs, then the request will either be granted after the other incompatible locks are released or it won't be granted at all and the requester will timeout while waiting. In other words, the IRLM decides and enforces the logic that is needed — concurrent versus serialized access logic (see my prior columns on locking, listed in Resources).

The IRLM needs no special directions for the length of time to hold exclusive (X) page locks. They're held from the time they're acquired until they're released at COMMIT. Period. No more discussion. The COMMIT may be explicitly coded or may be implicit (such as COMMITs after ROLLBACKs or at program termination). Once you've acquired a page X-lock you no longer need to worry about someone else updating or deleting your data during your unit of recovery. What the application programmer must worry about and understand is the concept of duration for readers. But not just any readers. In this article, we're talking about a reader who intends to update the data he has read. Our reader is worried about other users updating his data between his read and his update. Our reader recognizes that there's a vulnerable duration between the initial read and the eventual update. If someone else is able to update the data during this unprotected duration, then what will happen when our reader does his update? Our reader's update will overlay the update that occurred during the duration. He will never ever be rewarded for overlaying someone else's update. This duration between the read and the following update, this potential data integrity exposure, is entirely under the control of the application programmer.

PESSIMISTIC LOCKING TECHNIQUES

There are many ways of protecting the data between the read and the subsequent update. If we are pessimists, we will assume that there are users out there lying in wait for our data, exactly our data, our page, our row. Therefore, we must protect the data we're reading at any cost. But not all pessimistic locking has the same impact on concurrency and performance. The techniques vary from extreme to potentially tolerable.

Probably the most pessimistic technique would be to BIND the program with ISOLATION RR. With ISOLATION RR, the application attempts to acquire an S-lock on every page read (for every single SQL statement on every table the application uses) whether or not a row qualified on that page. And it holds and accumulates the S-locks until COMMIT. This isn't a very realistic solution for multiple reasons. First, concurrency and performance would suffer dreadfully and, second, online applications would still be in trouble unless they used CICS conversational techniques or the equivalent. Likewise, binding with ISOLATION RS, which "only" acquires locks on pages in which rows qualify (accumulating those locks until COMMIT), is seldom a realistic solution.
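For reference, a pessimistic BIND of this sort might look something like the following DSN subcommand sketch; the plan and DBRM names (MYPLAN, MYPROG) are placeholders, not from any real system:

BIND PLAN(MYPLAN)    -
     MEMBER(MYPROG)  -
     ISOLATION(RR)   -
     CURRENTDATA(YES)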

Another pessimistic technique would be to have a LOCK TABLE IN EXCLUSIVE MODE statement embedded in the program for those tables being read and subsequently updated. Just kick everyone else out of the tables. While this technique may work for a few applications, it isn't a very realistic solution for day-to-day batch activity and certainly not for online applications.
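The statement itself is a one-liner; the table name here (ACCOUNT_TBL) is just a placeholder:

LOCK TABLE ACCOUNT_TBL IN EXCLUSIVE MODE;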

A third pessimistic technique would be to use the FOR UPDATE OF clause in a DECLARE CURSOR statement. Surprised? The FOR UPDATE OF clause is pessimistic because it assumes that someone will update the data if it isn't "tied up." The application programmer is so sure that someone will update our reader's data that he's willing to acquire not just one lock but two: a U-lock that will be held while our reader is evaluating the data and then an X-lock when he actually does the update. The conversion of the U- to an X-lock is done by means of an upgrade process. Our reader will upgrade the lock on every page that he reads for update and pay for two lock acquisitions just because the application programmer is so pessimistic. And, by the way, this technique works in batch but not online.
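Here's a sketch of the two-lock pattern; TNAME matches Listing 1, and the key column and host variables are invented:

DECLARE UPDCUR CURSOR FOR
  SELECT UPDCOL1, UPDCOL2
    FROM TNAME
   WHERE KEYCOL = :HVKEY
  FOR UPDATE OF UPDCOL1, UPDCOL2;

OPEN UPDCUR;
FETCH UPDCUR INTO :HVUPDCOL1, :HVUPDCOL2;  -- U-lock acquired here

UPDATE TNAME                               -- U-lock upgraded to X-lock here
   SET UPDCOL1 = :HVNEWCOL1
 WHERE CURRENT OF UPDCUR;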

A fourth pessimistic technique would be to BIND the application with ISOLATION CS and CURRENTDATA YES. This technique works to protect our reader some — but not all — of the time. Hmmmm. Now there's a scary technique upon which to base the integrity of corporate data: pessimism accompanied by high CPU overhead, concurrency issues, and no dependability. ISOLATION CS with CURRENTDATA YES ensures that our reader will only be allowed to read clean, committed data. Whenever a qualified row is found on a page, an S-page lock will be acquired on the page. The S-locks will not be accumulated; they will be released when the CURSOR moves off the page.

This technique seems much more benign than the others. However, it doesn't work at all for online pseudo-conversational applications. And for batch applications? It depends on the access path that was chosen. If the access path is one where the qualified row is found at FETCH, then the S-lock will be held while the row is returned to the application. If the access path is one where the qualified row is found at OPEN, then the S-lock will be acquired and released at OPEN. It will not be held at FETCH when the row is returned to the application. And since it won't be held at FETCH, it won't be held during the vulnerable duration. One thing we don't need is access-path-dependent data integrity.

One horrid side effect of this technique is that the BIND parameter applies to all read-only SQL, not just the reads that you intend to update. Our reader may actually timeout trying to acquire an S-lock on a page even though his desired row is clean and all he wants to do is read it.

Before we move on to the optimistic locking techniques, I want to point out that none of these pessimistic locking techniques (except for locking everyone out of the data completely) will protect our reader/updater from timeouts. Just because you have an S- or a U-lock on the page while you're in your application deciding to update or not doesn't mean that someone else won't have another S-lock at the same time. So, when you decide to update and request your X-lock, there's absolutely nothing to make that other user with his S-lock move out of the way. And you wait ever so patiently for your X-lock. And timeout.

OPTIMISTIC LOCKING TECHNIQUES

The concept behind optimistic locking assumes that most, if not all of the time, no one out there is interested in your page, in your about-to-be-updated data. It does not assume that there will never be a stranger lurking out there waiting to take advantage of your optimism. It doesn't mean that you're uninterested in the integrity of your data. It doesn't assume you're willing to overlay someone else's update. It just means that you are realistic about the fact that it's highly unlikely that another user will want to update exactly your row on your page at the same time you do. If you're optimistic, there's no need to kick everyone else out of the data. And there's no need to acquire and accumulate locks on every page whether a row qualifies on that page or not. There's no need to acquire U-locks on every page and then pay for the upgrade to an X-lock (two locks on every page). And there's no need to make rules that apply to all of your read-only SQL whether you intend to update that data or not.

MINIMAL OPTIMISM

The first optimistic locking technique has been employed in DB2 applications since DB2 began. I first saw the technique in an online application. The programmer realized that locks, no matter what kind or how acquired, weren't held across screen displays. While the user was staring at the data on a screen, any other user could be updating that data. After a user changed the data and hit the "OK — do the update" key, a pessimistic programmer would open a CURSOR declared with the FOR UPDATE OF clause, reread the data, and then update the data WHERE CURRENT OF CURSOR. Our optimistic programmer wouldn't pay for the overhead of rereading the data and the overhead of the U-lock followed by the X-lock. Our optimistic programmer would just code the update — but not foolishly so. The UPDATE would contain a substantial WHERE clause to ensure that the data hadn't changed between the initial read and the update (Listing 1 shows an example).

This rich WHERE clause syntax ensures that our user's data hasn't changed between the time he read the data and then updated the data, all without the overhead of the reread and without the overhead of upgrading a lock. In the second optimistic locking technique, the programmer maintains and checks a counter or an update timestamp in the updating SQL to confirm that the data hasn't changed during the vulnerable duration. In order to use this technique in V8, the table design must include either a counter or a timestamp that's updated by every update statement. Because optimistic locking techniques are so important for performance and concurrency, tables created under DB2 9 for z/OS can contain an automatic update timestamp.
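To make the timestamp flavor concrete, here's a minimal sketch; the table, column, and host variable names (TNAME, UPD_TS, KEYCOL, and so on) are invented for illustration:

-- Initial read: remember the row's last-update timestamp
SELECT UPDCOL1, UPD_TS
  INTO :HVUPDCOL1, :HVOLDTS
  FROM TNAME
 WHERE KEYCOL = :HVKEY;

-- Later update: succeeds only if the row is unchanged
UPDATE TNAME
   SET UPDCOL1 = :HVNEWCOL1,
       UPD_TS  = CURRENT TIMESTAMP
 WHERE KEYCOL = :HVKEY
   AND UPD_TS = :HVOLDTS;

-- SQLCODE +100 here means someone else updated the row
-- during the vulnerable duration; reread and retry.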

Although these two optimistic techniques were initially used for online applications, they can be used in batch applications in which pessimistic techniques cause too darned much overhead. The combination of using either of these techniques along with binding programs with ISOLATION CS and CURRENTDATA NO ensures that our reader will only be allowed to view clean, committed data, while avoiding locks in all but very rare situations. Reduced CPU overhead and rare contention. Nirvana. With extreme optimism, tempered by "I'm optimistic but I'm not stupid" WHERE clause techniques, the programmer can now avoid almost all read locks.

MORE TO COME

Now that you understand the concept of optimistic locking, whether implemented by a rich WHERE clause or by using the update timestamp or counter technique, we can move on to more complex scenarios. In my next column, I'll discuss three related issues: built-in optimistic locking (part of V7's static scroll cursors), multirow reads followed by set updates WHERE CURRENT OF CURSOR (V8), and the new built-in timestamp (V9).

Bonnie Baker is a consultant and corporate trainer specializing in application performance issues on the DB2 for z/OS platform. An IBM DB2 Gold Consultant, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame, she is best known for her series of seminars entitled "Things I Wish They'd Told Me 8 Years Ago" and for writing this column. You can reach her through Bonnie Baker Corp. at 1-813-837-3393 or bkbaker@bonniebaker.com.


Listing 1.
A WHERE clause that ensures data hasn't changed between an initial read and an update.
UPDATE TNAME
   SET UPDCOL1 = :HVUPDCOL1,
       UPDCOL2 = :HVUPDCOL2
 WHERE UPDCOL1 = :ORIGINALVALUECOL1
   AND UPDCOL2 = :ORIGINALVALUECOL2
   AND NO_UPDCOL3 = :ORIGINALVALUECOL3
   -- ... and so on, listing every column in the table


Programmers Only: Optimistic Locking, Part 2
When it comes to DB2 for z/OS locks, a pessimistic approach rarely pays.

By Bonnie Baker

In my last column I explained the concept of optimistic locking in DB2 for z/OS. I explained how optimistic locking techniques can be used in application programs to reduce locking overhead (including potential timeouts) and improve performance. I explained that SQL constructs such as FOR UPDATE OF and UPDATE WHERE CURRENT OF CURSOR, and BIND parameters such as ISOLATION RR/RS or CURRENTDATA(YES), are very pessimistic and will most likely consume many more DB2 resources than necessary.

You might recall that I told you that optimistic locking techniques have been around since DB2 was a V1R1 baby; however, until now, the techniques have always been enforced by developers using application program logic and special WHERE clause predicates. Finally, I explained that a program that uses optimistic techniques is far less likely to be a victim of data integrity exposures than one that foolishly relies on pessimistic techniques such as CURRENTDATA(YES) and assumes that DB2 is always holding necessary locks.

Now that we understand the optimistic locking concept, whether implemented by a rich WHERE clause that reapplies qualifying predicates or by using the UPDATE TIMESTAMP or update counter technique, we can move on to more complex scenarios. In this column, I'll discuss four related issues: V4's row-level locking, V7's static scroll cursor with its "built-in" optimistic locking, V8's dynamic scroll cursor and row-positioned cursor, and DB2 V9's new data type that folks are calling a "built-in timestamp."

ROW-LEVEL LOCKING

Now why would a discussion about row-level locking be in this particular column? Well, many people who are using pessimistic locking techniques encounter more than their endurable quota of timeouts and deadlocks. To solve the problem, they could try to do less locking by using optimistic locking techniques, or they could blame the problem on page-level locking and switch to row-level locking in an attempt to solve the timeout problems. But by doing so, their pessimism just becomes more granular. Locks are acquired on rows instead of pages. In a worst case scenario, if someone reads six rows FOR UPDATE OF on a single page, that person will acquire six U-locks instead of one. And, as the rows are updated WHERE CURRENT OF CURSOR, that person will make six trips (instead of one) to the lock manager to upgrade the locks from U to X.

Locking problems can often be solved by other less onerous means. Sometimes creating indexes to allow better access paths that touch fewer pages will solve a locking problem. If you filter out the rows you want from the rows you don't want by using a great index with matching and screening predicates, then you won't read as many pages. And if you don't read a page, you won't ask for a lock on that page. Nor will you be told that you can't read it because someone else has an incompatible lock on the page. But you don't necessarily have to create more indexes. Often, switching from pessimistic locking techniques to optimistic locking techniques will solve any locking problems you have.

V7'S STATIC SCROLL CURSOR

There are many ways of protecting data (optimistically, of course) between a read and a subsequent update. One is by declaring the cursor as a static scroll cursor. But not just any static scroll cursor. The cursor must be a SENSITIVE static scroll cursor — sensitive to change by others as well as by our own program. With these cursors, the DB2 developers have given us built-in optimistic locking. At OPEN CURSOR, a result set of all the qualified rows is materialized. Remember, we have told DB2 that the cursor is sensitive to each update or delete to the base table, both by ourselves (DECLARE SENSITIVE) and by others (FETCH SENSITIVE), so DB2 will make sure that our result set stays in sync with the real table. In order to stay in sync, each FETCH from the static CURSOR causes a simultaneous read to the real table to see if anyone else has changed or deleted the underlying row. We get the most current COMMITted image of our row regardless of what's in the result set. If we then look at the data and decide to either UPDATE WHERE CURRENT OF CURSOR or DELETE WHERE CURRENT OF CURSOR, DB2 has to read the base table to make the change. Because the base table has to be read anyway, DB2 can simultaneously and automatically check to see if any of the values on the row have changed between the time we fetched our in-sync image and the time we decide to update the row. If the values have changed, then DB2 gracefully lets us know. DB2 is essentially using the rich WHERE clause technique that I mentioned in Part 1 of this article (available online at DB2mag.com). And we don't have to code the WHERE clause ourselves. DB2 is doing it for us. If the row hasn't changed, the update is performed and our change is perpetuated to the result set. If the row has been changed by someone else, the UPDATE isn't successful and a message is returned to the program.
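A sketch of such a cursor; the names are invented, and the ORDER BY is there to show that it's allowed:

DECLARE SCRLCUR SENSITIVE STATIC SCROLL CURSOR FOR
  SELECT KEYCOL, UPDCOL1
    FROM TNAME
   WHERE DEPT = :HVDEPT
   ORDER BY KEYCOL;

OPEN SCRLCUR;                    -- result set materialized here

FETCH SENSITIVE FROM SCRLCUR     -- re-reads the base row to stay in sync
  INTO :HVKEY, :HVUPDCOL1;

UPDATE TNAME                     -- DB2 rejects the update if the row changed
   SET UPDCOL1 = :HVNEWCOL1
 WHERE CURRENT OF SCRLCUR;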

One side benefit of this technique is that for the first time we can now use the UPDATE/DELETE WHERE CURRENT OF CURSOR technique on a cursor that has an ORDER BY clause. Before the advent of static scroll cursors, we couldn't use FOR UPDATE OF and UPDATE WHERE CURRENT OF CURSOR if we had sort syntax (for example, ORDER BY) in our CURSOR. Coding both an ORDER BY clause and a FOR UPDATE OF clause in the CURSOR declaration netted a slap on the coding wrist that this cursor was an "unambiguously read-only cursor."

Well, we still can't use ORDER BY and FOR UPDATE OF in the same cursor declaration, but we can use ORDER BY in the CURSOR declaration and follow the FETCH with an UPDATE WHERE CURRENT OF CURSOR. We can do so without fear — as long as the CURSOR is a SENSITIVE STATIC SCROLL CURSOR with built-in optimistic locking.

V8'S DYNAMIC SCROLL CURSORS, ROW-POSITIONED CURSORS, AND LOCKING

In V8, we're allowed to DECLARE DYNAMIC SCROLL CURSORs. By definition, these cursors are always reading rows from the base table — never a sortout file, never a result set. If the cursor has an ORDER BY, there must be an index available for DB2 to use in order to avoid the data sort. Otherwise, the BIND will fail. In other words, we know that, sort syntax or not, DB2 will always be reading the base table.
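For reference, such a declaration might look like this (names invented); remember, the ORDER BY is legal only if an index lets DB2 avoid the sort:

DECLARE DYNCUR SENSITIVE DYNAMIC SCROLL CURSOR FOR
  SELECT KEYCOL, UPDCOL1
    FROM TNAME
   WHERE DEPT = :HVDEPT
   ORDER BY KEYCOL;   -- an index on KEYCOL must exist, or the BIND fails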

So, with these cursors, even with sort syntax we can UPDATE WHERE CURRENT OF CURSOR. We can't use FOR UPDATE OF if we have an ORDER BY, but we can update anyway. The problem is that, unlike with static scroll cursors, there is no built-in optimistic locking. So, how do we ensure that there's no data integrity exposure? The optimistic locking techniques that we've talked about aren't available.

The problem is exacerbated if we use the new "row-positioned" option that allows us to FETCH more than one row, let's say 100 rows, with one connect to DB2. If we then UPDATE WHERE CURRENT OF CURSOR, 100 rows will be updated and we won't be able to check 100 timestamps. Nor will we have the ability to apply 100 rich WHERE clause comparisons.
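A sketch of the multirow pattern; DB2 V8 documentation calls this rowset positioning, and the host variable arrays (:HVKEYARR, :HVCOL1ARR) are assumed to be declared elsewhere with 100 elements:

DECLARE RSCUR CURSOR WITH ROWSET POSITIONING FOR
  SELECT KEYCOL, UPDCOL1
    FROM TNAME
   WHERE DEPT = :HVDEPT;

OPEN RSCUR;

FETCH NEXT ROWSET FROM RSCUR
  FOR 100 ROWS
  INTO :HVKEYARR, :HVCOL1ARR;   -- 100 rows in one trip to DB2

UPDATE TNAME                    -- updates every row in the current rowset
   SET UPDCOL1 = :HVNEWCOL1
 WHERE CURRENT OF RSCUR;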

I've come up with some approaches that work, but none that I really like. First, remember that we know we're accessing the base table; this means that BINDing with CURRENTDATA(YES) will cause DB2 to hold locks on the row or rows between the FETCH and the UPDATE WHERE CURRENT OF CURSOR.

What I especially don't like about this approach is that CURRENTDATA is a BIND parameter and applies to all the read-only SQL in the program, not just the CURSOR in question. To make it apply to just the CURSOR in question, we'd have to isolate the cursor in its own program. And besides, CURRENTDATA(YES) was in the list of pessimistic techniques I mentioned in the last column.

A second, SQL-specific technique is to use the WITH clause on our cursor declaration and use WITH RS for our isolation. WITH RS will cause DB2 to acquire an S-lock on each page that has a qualified row on it and accumulate those locks (very important for our row-positioned cursor) until the locks are upgraded to X-locks by the UPDATE WHERE CURRENT OF CURSOR. At COMMIT all the X-locks and any outstanding S-locks will be released. This approach is also a pessimistic technique that causes upgrade overhead (see Part 1), but at least it's SQL specific. And because of that I like it better than CURRENTDATA(YES). If anyone has a better, more optimistic technique to handle data integrity exposures from concurrent updaters, please share the technique with me so that I can share it with others.
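In SQL terms, the isolation clause rides along on the cursor's SELECT statement; a sketch with invented names:

DECLARE RSCUR2 CURSOR WITH ROWSET POSITIONING FOR
  SELECT KEYCOL, UPDCOL1
    FROM TNAME
   WHERE DEPT = :HVDEPT
    WITH RS;   -- S-locks accumulate until upgraded to X at UPDATE, released at COMMIT
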
V9'S BUILT-IN TIMESTAMP

In Part 1, we discussed adding an update timestamp column to our table so that we could check it for change each time a row was updated. However, the burden of maintaining the timestamp fell to the programmer. If someone slipped into SPUFI and updated the row, the timestamp might not get changed. The rules may not be obeyed. In V9, we have new SQL syntax for an automatically updated timestamp. You can now create/alter tables and include a ROW CHANGE TIMESTAMP. The name of the column can be anything you like (MY_UPD_TIMESTAMP) but the definition of the column includes new parameters: GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP.

Now the program code can use the optimistic UPDATE TIMESTAMP technique (see Part 1) without worrying that other applications (such as SPUFI or QMF) might not be obeying the rules.
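A sketch of both halves, adding the column and then exploiting it; the table and column names (ORDER_TBL, MY_UPD_TIMESTAMP, and so on) are invented:

-- Add an automatically maintained change timestamp to an existing table
ALTER TABLE ORDER_TBL
  ADD COLUMN MY_UPD_TIMESTAMP TIMESTAMP NOT NULL
      GENERATED ALWAYS
      FOR EACH ROW ON UPDATE
      AS ROW CHANGE TIMESTAMP;

-- Optimistic update: SQLCODE +100 means the row changed since we read it
UPDATE ORDER_TBL
   SET ORDER_STATUS = :HVSTATUS
 WHERE ORDER_NBR = :HVORDNBR
   AND MY_UPD_TIMESTAMP = :HVOLDTS;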

STICK WITH IT, OPTIMISTICALLY

If I could make one thing stick in every programmer's brain, I would say that figuring out when a lock is held and when one isn't, when there is data integrity exposure and when there isn't, is a complex, confusing, and painful process. Using a single approach whenever possible has many advantages. Whenever that single technique is unusable, choose the most optimistic of the available options.

Bonnie Baker is a consultant and corporate trainer specializing in applications performance issues on the DB2 for z/OS platform. She is an IBM DB2 Gold Consultant, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her series of seminars entitled "Things I Wish They'd Told Me 8 Years Ago" and writing the "Programmers Only" column. She can be reached through Bonnie Baker Corporation at 1-813-837-3393 or bkbaker@bonniebaker.com.

IDUG User View: Locking Up DB2 Performance
A new locking feature in DB2 9 for z/OS improves the performance outlook.
By David Beulke


Supporting multiple systems and dealing with application developers and managers who often don't have a clue makes the DBA's job a challenge. Fortunately, DB2 9 for z/OS offers a new optimistic locking feature that can improve system performance, and perhaps the DBA's (and application developer's) mood.

Database locking is a necessary overhead and a core component of all DBMSs. Locking maintains integrity by preventing multiple transactions from changing the same data at the same time. But taking and maintaining database locks can be expensive, especially for complex systems, applications, or transactions.

Optimistic locking now uses new features defined within DB2 tables to reduce deadlocks and overall locking overhead, improving system and application performance. (Read Programmers Only for more on optimistic locking.)

To use the new feature for optimistic locking, you need to define a new ROW CHANGE TIMESTAMP column in a DB2 table with new parameters (GENERATED ALWAYS, FOR EACH ROW ON UPDATE, AS ROW CHANGE TIMESTAMP) as follows:

CREATE TABLE BEULKE.PRODUCT_TBL (
  PROD_NBR       INTEGER NOT NULL,
  PROD_INVENTORY INTEGER NOT NULL,
  PROD_LAST_UPD  TIMESTAMP NOT NULL
                 GENERATED ALWAYS
                 FOR EACH ROW ON UPDATE
                 AS ROW CHANGE TIMESTAMP);

These parameters tell DB2 to always populate, and pay special attention to, the timestamp column. The last-update timestamp has been embedded in some applications for years; now IBM has endorsed and improved on this technique.

These new features enable DB2 to retrieve rows from a specific time period and understand when they were last modified. DB2 notes not only the row timestamp information but also the record ID (RID) and change token information. Noting these row attributes allows applications and users to query the database via timestamp to get a specific row or group of rows based on WHERE-clause timestamp criteria.
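For example, a query of this sort (a sketch; the one-day window is arbitrary) uses the ROW CHANGE TIMESTAMP expression to find recently modified rows:

SELECT PROD_NBR, PROD_INVENTORY,
       ROW CHANGE TIMESTAMP FOR P AS LAST_UPD
  FROM BEULKE.PRODUCT_TBL P
 WHERE ROW CHANGE TIMESTAMP FOR P > CURRENT TIMESTAMP - 1 DAY;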

The new column feature reduces locking overhead by allowing the majority of applications to be rebound, lowering their isolation from Repeatable Read (RR), Read Stability (RS), or Cursor Stability (CS) to Uncommitted Read (UR). Uncommitted Read avoids database locks; the application can maintain database transaction integrity by using the new timestamp column within the application's UPDATE SQL statements. The new timestamp column provides both the timestamp and the record ID (RID) of the row, which DB2 can use to verify that no other application has changed the data in question.

Another DB2 9 SQL phrase, SKIP LOCKED DATA, also helps avoid locks by not retrieving or affecting data rows with incompatible locks. You can use the phrase within SELECT, UPDATE, and DELETE SQL statements to avoid deadlocks.
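For example (a sketch; note that SKIP LOCKED DATA applies only with isolation CS or RS):

SELECT PROD_NBR, PROD_INVENTORY
  FROM BEULKE.PRODUCT_TBL
  WITH CS
  SKIP LOCKED DATA;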

Use both the isolation level UR and the SKIP LOCKED DATA phrase cautiously. Although the techniques can dramatically reduce locking and improve performance, they require that you thoroughly know your applications. Research each application in detail before using this performance boost, and read the DB2 manuals for full details. Because these techniques can significantly reduce deadlocks and locking overhead, especially in a data sharing environment, they're worth the research and implementation time.

These techniques and other performance topics are covered at International DB2 Users Group conferences around the world (www.idug.org).
David Beulke [davebeulke@cs.com], a past president of IDUG, is a DB2 consultant, author, and lecturer who specializes in database performance, data warehouses, and Internet applications.