Received: from mnm [127.0.0.1]
	by localhost with POP3 (fetchmail-5.9.0)
	for akpm@localhost (single-drop); Tue, 10 Jun 2003 23:40:00 -0700 (PDT)
Received: from digeo-e2k04.digeo.com ([192.168.2.24]) by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329);
	 Tue, 10 Jun 2003 23:35:00 -0700
Received: from digeo-nav01.digeo.com ([192.168.1.233]) by digeo-e2k04.digeo.com with Microsoft SMTPSVC(5.0.2195.5329);
	 Tue, 10 Jun 2003 23:34:59 -0700
Received: from packet.digeo.com ([192.168.17.15])
 by digeo-nav01.digeo.com (SAVSMTP 3.1.1.32) with SMTP id M2003061023371314547
 for <akpm@digeo.com>; Tue, 10 Jun 2003 23:37:13 -0700
Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.105])
	by packet.digeo.com (8.12.8/8.12.8) with ESMTP id h5B6YvX8013081
	for <akpm@digeo.com>; Tue, 10 Jun 2003 23:34:57 -0700 (PDT)
Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150])
	by e5.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5B6Yqtd174654;
	Wed, 11 Jun 2003 02:34:53 -0400
Received: from sparklet.in.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5B6YkAh246374;
	Wed, 11 Jun 2003 02:34:49 -0400
Received: (from suparna@localhost)
	by sparklet.in.ibm.com (8.11.6/8.11.0) id h5B6dtx02392;
	Wed, 11 Jun 2003 12:09:55 +0530
Date: Wed, 11 Jun 2003 12:09:55 +0530
From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Andrew Morton <akpm@digeo.com>
Cc: philip.copeland@oracle.com
Subject: Re: -mm7 go boom
Message-ID: <20030611120955.A2385@in.ibm.com>
Reply-To: suparna@in.ibm.com
References: <1055298788.3224.57.camel@emerald> <20030610204950.2d783a89.akpm@digeo.com> <20030611112601.A2267@in.ibm.com> <20030610225358.3682764b.akpm@digeo.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="J/dobhs11T7y2rNN"
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <20030610225358.3682764b.akpm@digeo.com>; from akpm@digeo.com on Tue, Jun 10, 2003 at 10:53:58PM -0700
X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang)
X-OriginalArrivalTime: 11 Jun 2003 06:34:59.0460 (UTC) FILETIME=[94A79C40:01C32FE3]
X-Spam-Status: No, hits=-39.0 required=6.0
	tests=BAYES_01,EMAIL_ATTRIBUTION,IN_REP_TO,PATCH_UNIFIED_DIFF,
	      QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES,
	      USER_AGENT_MUTT
	autolearn=ham version=2.53
X-Spam-Level: 
X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp)


--J/dobhs11T7y2rNN
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jun 10, 2003 at 10:53:58PM -0700, Andrew Morton wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> >
> > > 
> >  > I'm suspecting that there's garbage on the workqueue pointed to
> >  > by local var `cwq'.  ie: a kioctx got freed up while it was still
> >  > queued up via schedule_work().
> >  > 
> > 
> >  Hmm, looking at the code, I don't see a protection against this
> >  case, so that's something to fix anyway I guess. However, we'd
> >  see this only if the program is done with the ioctx (e.g exit or
> >  Ctrl C, or an explicit call to destroy the ioctx).
> > 
> >  Phil, Do you see this when the program is about to exit (normally
> >  or due to some other signal) ? 
> 
> Apparently tasks were exiting at the time.  The test runs over
> a thousand processes.

OK. Could you try the attached patch and see if it helps?
It's a little heavy-handed, since it flushes the whole aio workqueue,
but I didn't find a direct way to flush or delete just one particular
workqueue entry.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India


--J/dobhs11T7y2rNN
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="aio-flush-workqueue.patch"

diff -ur -X dontdiff linux-2.5.70-mm5/fs/aio.c linux-2.5.70-mm5-dbg/fs/aio.c
--- linux-2.5.70-mm5/fs/aio.c	Fri Jun  6 17:45:26 2003
+++ linux-2.5.70-mm5-dbg/fs/aio.c	Wed Jun 11 13:35:55 2003
@@ -346,6 +347,11 @@
 		aio_cancel_all(ctx);
 
 		wait_for_all_aios(ctx);
+		/* 
+		 * this is an overkill, but ensures we don't leave 
+		 * the ctx on the aio_wq
+		 */
+		flush_workqueue(aio_wq);
 
 		if (1 != atomic_read(&ctx->users))
 			printk(KERN_DEBUG
@@ -1147,6 +1171,11 @@
 
 	aio_cancel_all(ioctx);
 	wait_for_all_aios(ioctx);
+	/* 
+	 * this is an overkill, but ensures we don't leave 
+	 * the ctx on the aio_wq
+	 */
+	flush_workqueue(aio_wq);
 	put_ioctx(ioctx);	/* once for the lookup */
 }
 

--J/dobhs11T7y2rNN--
