Closed Bug 97866 Opened 23 years ago Closed 23 years ago

Async file I/O problems in Mac NSPR

Categories

(NSPR :: NSPR, defect, P1)

PowerPC
macOS
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: sfraser_bugs, Assigned: sfraser_bugs)

References

Details

(Keywords: crash, platform-parity, Whiteboard: PDT+, [OSX+], [ETA 09.26])

Attachments

(1 file, 1 obsolete file)

In working on bug 71718, I discovered some issues with the async file I/O code on 
Mac.

It currently looks like this:

if (op == READ_ASYNC)
{
  /*
  **  Skanky optimization so that reads < 20K are actually done synchronously
  **  to optimize performance on small reads (e.g. registry reads on startup)
  */
  if ( bytes > 20480L )
  {
    err = PBReadAsync(&pbAsync.pb);
    if (err != noErr && err != eofErr)
      goto ErrorExit;
    
    me->io_pending = PR_TRUE; /* Only mark thread io pending if async call worked 
*/
  }
  else
  {
    (void) PBReadSync(&pbAsync.pb);
    /*
    ** This is probbaly redundant but want to make sure we indicate the read
    ** is complete so we don't wander off into the Sargasso Sea of Mac
    ** threading
    */
    pbAsync.pb.ioParam.ioResult = 0;
  }
}
else
{
  /* writes are currently always async */
  err = PBWriteAsync(&pbAsync.pb);
    if (err != noErr)
      goto ErrorExit;
  
  /* Didn't get an error on the asyn call so mark thread io pending */
  me->io_pending = PR_TRUE;
}

/* See if the i/o call is still pending before we actually yield */
if (pbAsync.pb.ioParam.ioResult == 1)
  WaitOnThisThread(me, PR_INTERVAL_NO_TIMEOUT);
else
  me->io_pending = PR_FALSE; /* io completed so don't mark thread io pending */

There are two problems here.

First, here:

    err = PBReadAsync(&pbAsync.pb);
...
    me->io_pending = PR_TRUE;

You risk a race between the notifier and this code in setting me->io_pending. 
It's quite possible for the notifer to hit before you come out of PBReadAsync(), 
set io_pending to PR_FALSE, causing you to wait forever lower down (on Mac OS X, 
the 'ioResult == 1' trick does not seem to work, and anyway, io_pending and 
ioResult are not strictly synchronized).

Second, if you make an async call, then detect pbAsync.pb.ioParam.ioResult == 1 
so skip the WaitOnThisThread(), it's possible to leave this routine before the 
notifier fires. This was happening on Mac OS X. The notifier then gets some 
random stack garbage in its param, which is bad. *If* you make an async call, 
then you *must* call WaitOnThisThread(), even if you think the notifer has fired 
already.
Marking this to block Bug 98405; spreading keyword love.
Blocks: 98405
Summary: Async file I/O problems in Mac NSPR → Async file I/O problems in Mac NSPR
Severity: normal → critical
OS: Mac System 8.5 → MacOS X
Priority: -- → P1
Setting platform OS X, P1, critical.

Needs an appropriate NSPR milestone to ship with 0.9.5.
Note that the patch in bug 71718 will fix this.
Assignee: sdagley → sfraser
Depends on: 71718
Attached patch patch to macio.c (obsolete) — Splinter Review
The patch ensures that we always call WaitOnThisThread() if we are doing an asyc
read or write, and that thread->io_pending flag is properly maintained.

Reviews, please (wtc, sdagley/gordon).
Status: NEW → ASSIGNED
r=sdagley

One comment/suggestion though - the 20K threshold for sync vs. async reads was 
POOMA based on the size of file reads during startup 2 years ago.  We might want 
to change the that to 32K so we're aligned on an 8K boundary (and maybe re-check 
the size of file reads during startup)
Target Milestone: --- → 4.2
Attachment #49778 - Attachment is obsolete: true
New patch based on discussion with Gordon: Changes:
* It turns out that PB{Read,Write}Async *never* return errors directly (see
  <http://developer.apple.com/technotes/fl/fl_515.html>), so I removed the
  error handling from those calls.
* I added error handling for the PBReadSync call, which does need it.
* I removed an unneccessary setting of pbAsync.pb.ioParam.ioResult in
  the sync read case.
r=gordon
This is not on Tian Tian Must Fix list for this release. Should we onsider
minusing it, and changing the priority for this release?
A P1 critical w/nsbranch+ and nsenterrise+ and you don't think it's a must fix?  
It is.  I don't think tiantian is querying for NSPR bugs and/or the 4.2 milestone 
is throwing the query off
Get with Tian Tian, and make sure here query captures ALL the STOP SHIPS!!!
Check it into the trunk. Get people to test it, and let us know when you fill
comfonfident, and we'll give you a PDT+.
Whiteboard: PDT
Whiteboard: PDT → PDT [OSX+]
a=wtc.
Fix checked in to NSPR tip and NSPR_PRE_4_2_CLIENT_BRANCH. Keeping bug open for 
checkin to 0.9.4 branch.
Simon - Let's talk about this one @ tomorrow's PDT meeting.
Lisa - We need to know how well this fix works on all Mac OS. Let's let it bake
fora while and get some more testing.
Please advise on what items to test on the trunk.  I may need to get Mozilla QA
to help out on this testing on the trunk.
This is low-level code that handles just about all disk file I/O that mozilla 
does: disk cache, file reading on startup, downloading, mail summary files etc 
etc.
I will coordinate with Asa to try some tests.  Thanks.   I assume this is on all
the Mac OS's.
check it in - PDT+.
Whiteboard: PDT [OSX+] → PDT+, [OSX+]
Keywords: nsenterprise+
marking nsenterprise- to get off the enterprise stop ship list.
Keywords: nsenterprise-
Pls check this one in today.
Whiteboard: PDT+, [OSX+] → PDT+, [OSX+], [ETA 09.26]
No longer blocks: 98405
*** Bug 98405 has been marked as a duplicate of this bug. ***
This is already on the NSPR tip.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I am marking verified.  This affects many basic operations on the Mac.  Since
the landing of this fix, there haven't been major problems with the disk cache,
file reading on startup, downloading, mail summary files etc, as Simon suggested
we try.

Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: