
From: "Stephen C. Tweedie" <sct@redhat.com>

There is a race condition in jbd between journal_unmap_buffer() and
journal_commit_transaction().  It corrupts buffers on the transaction's
t_locked_list, producing a variety of symptoms that usually involve an oops
in kjournald.

The problem is that various special-case exit paths in journal_unmap_buffer()
call journal_put_journal_head() without any locking.  This races against a
refiling of the same journal_head in journal_commit_transaction():

			__journal_unfile_buffer(jh);
			__journal_file_buffer(jh, commit_transaction,
						BJ_Locked);

The way these functions work, the jh is temporarily left with
b_transaction==NULL.  If journal_unmap_buffer()'s call to
journal_put_journal_head() hits this window, it sees the NULL transaction and
frees the journal_head that is just about to be refiled on the locked list.

The main exit path of journal_unmap_buffer() performs its
journal_put_journal_head() before dropping the j_list_lock, so it is not
vulnerable to this race.  The fix is to move the similar calls on the
special-case exit branches in that function so that they also release the
journal_head before dropping that lock.
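
For illustration only, the window can be modelled in userspace as below.  This
is a single-threaded sketch that replays the two interleavings by hand; it is
not the jbd code.  The names model_jh, model_put(), model_unfile(),
model_file() and the "freed" flag are hypothetical, and the assumption that
the put frees the journal_head when the count reaches zero with no owning
transaction is a simplification of journal_put_journal_head().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct journal_head. */
struct model_jh {
	int   b_jcount;       /* reference count */
	void *b_transaction;  /* owning transaction, NULL while unfiled */
	bool  freed;          /* set instead of really freeing memory */
};

/* Models the put: drop a reference, free if unreferenced and unowned. */
static void model_put(struct model_jh *jh)
{
	assert(jh->b_jcount > 0);
	if (--jh->b_jcount == 0 && jh->b_transaction == NULL)
		jh->freed = true;
}

/* The commit-side refile, split in two so the window is visible. */
static void model_unfile(struct model_jh *jh)
{
	jh->b_transaction = NULL;
}

static void model_file(struct model_jh *jh, void *transaction)
{
	jh->b_transaction = transaction;
}

/*
 * Old ordering: the put runs after j_list_lock is dropped, so it can land
 * between the unfile and the file.  Returns true if the jh was freed.
 */
static bool model_put_after_unlock(void)
{
	int txn;
	struct model_jh jh = { 1, &txn, false };

	model_unfile(&jh);      /* commit side, holding j_list_lock */
	model_put(&jh);         /* unmap side, no lock: hits the window */
	model_file(&jh, &txn);  /* commit side refiles a freed jh */
	return jh.freed;
}

/*
 * New ordering: the put runs under j_list_lock, so the refile can only
 * happen entirely before or entirely after it.
 */
static bool model_put_before_unlock(void)
{
	int txn;
	struct model_jh jh = { 1, &txn, false };

	model_put(&jh);         /* unmap side, still under j_list_lock */
	model_unfile(&jh);      /* commit side, after taking the lock */
	model_file(&jh, &txn);
	return jh.freed;
}
```

In this model the old ordering frees the jh while it is still about to be
refiled, and the new ordering does not, because the put always sees a
non-NULL b_transaction.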

This is low-risk since the new order has already been tested as the normal
exit path from this function.  The change has had extensive testing and has
been shown to fix the problem with no regressions found.

Signed-off-by: Peter Keilty <Peter.Keilty@hp.com>
Signed-off-by: Nicholas Dokos <nicholas.dokos@hp.com>
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/fs/jbd/transaction.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff -puN fs/jbd/transaction.c~ext3-fix-journal_unmap_buffer-race fs/jbd/transaction.c
--- 25/fs/jbd/transaction.c~ext3-fix-journal_unmap_buffer-race	2005-03-21 21:24:42.000000000 -0800
+++ 25-akpm/fs/jbd/transaction.c	2005-03-21 21:24:42.000000000 -0800
@@ -1785,10 +1785,10 @@ static int journal_unmap_buffer(journal_
 			JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
 			ret = __dispose_buffer(jh,
 					journal->j_running_transaction);
+			journal_put_journal_head(jh);
 			spin_unlock(&journal->j_list_lock);
 			jbd_unlock_bh_state(bh);
 			spin_unlock(&journal->j_state_lock);
-			journal_put_journal_head(jh);
 			return ret;
 		} else {
 			/* There is no currently-running transaction. So the
@@ -1799,10 +1799,10 @@ static int journal_unmap_buffer(journal_
 				JBUFFER_TRACE(jh, "give to committing trans");
 				ret = __dispose_buffer(jh,
 					journal->j_committing_transaction);
+				journal_put_journal_head(jh);
 				spin_unlock(&journal->j_list_lock);
 				jbd_unlock_bh_state(bh);
 				spin_unlock(&journal->j_state_lock);
-				journal_put_journal_head(jh);
 				return ret;
 			} else {
 				/* The orphan record's transaction has
@@ -1823,10 +1823,10 @@ static int journal_unmap_buffer(journal_
 					journal->j_running_transaction);
 			jh->b_next_transaction = NULL;
 		}
+		journal_put_journal_head(jh);
 		spin_unlock(&journal->j_list_lock);
 		jbd_unlock_bh_state(bh);
 		spin_unlock(&journal->j_state_lock);
-		journal_put_journal_head(jh);
 		return 0;
 	} else {
 		/* Good, the buffer belongs to the running transaction.
_
