Closed Bug 599127 Opened 14 years ago Closed 14 years ago

Wyciwyg channel writes end up doing synchronous disk IO

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b8
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: jst, Assigned: michal)

References

Details

Attachments

(1 file, 6 obsolete files)

When a site builds up a complete document with document.open()/write()/close(), we end up remembering what that document was constructed from, and we end up writing that data to disk (through the disk cache) synchronously from the JS calls to document.write(), and this inevitably slows us down, significantly in some cases. From a quick look into about:cache on a few computers it seems gmail ends up using wyciwyg channels, as do a bunch of other random sites. And also, the sunspider performance testing harness hits this too.

We have a couple of options here at least, one is to make these entries only ever end up in the memory cache, but bz argues against that in bug 561085, comment 2. Another option is to let this data go to disk, but do that asynchronously like we do for normal disk cache writes now.

Michal says he will investigate.
Marking blocking betaN+, we should eliminate the synchronous disk IO for Firefox 4.
blocking2.0: --- → betaN+
Shaver, did you have stacks showing the IO being done that you could share here?
   0 libSystem.B.dylib write
   1 libnspr4.dylib PR_Write /Users/sayrer/dev/tm/nsprpub/pr/src/io/priometh.c:147
   2 XUL nsDiskCacheStreamIO::FlushBufferToFile() /Users/sayrer/dev/tm/netwerk/cache/nsDiskCacheStreams.cpp:734
   3 XUL nsDiskCacheStreamIO::Write(char const*, unsigned int, unsigned int*) /Users/sayrer/dev/tm/netwerk/cache/nsDiskCacheStreams.cpp:590
   4 XUL nsDiskCacheOutputStream::Write(char const*, unsigned int, unsigned int*) /Users/sayrer/dev/tm/netwerk/cache/nsDiskCacheStreams.cpp:257
   5 XUL nsCacheEntryDescriptor::nsOutputStreamWrapper::Write(char const*, unsigned int, unsigned int*) /Users/sayrer/dev/tm/netwerk/cache/nsCacheEntryDescriptor.cpp:665
   6 XUL nsWyciwygChannel::WriteToCacheEntry(nsAString_internal const&) /Users/sayrer/dev/tm/content/html/document/src/nsWyciwygChannel.cpp:379
   7 XUL nsHTMLDocument::WriteCommon(nsAString_internal const&, int) /Users/sayrer/dev/tm/content/html/document/src/nsHTMLDocument.cpp:2180
   8 XUL nsHTMLDocument::Write(nsAString_internal const&) /Users/sayrer/dev/tm/content/html/document/src/nsHTMLDocument.cpp:2212
   9 XUL nsIDOMHTMLDocument_Write(JSContext*, unsigned int, jsval_layout*) /Users/sayrer/dev/tm/ff-dbg/js/src/xpconnect/src/dom_quickstubs.cpp:17153
  10 XUL CallCompiler::generateNativeStub()
  11 XUL js::mjit::ic::NativeCall(js::VMFrame&, unsigned int)
  12  0x117fbc5cf
  13 XUL EnterMethodJIT(JSContext*, JSStackFrame*, void*, void*)
  14 XUL js::mjit::JaegerShot(JSContext*)
  15 XUL js::RunScript(JSContext*, JSScript*, JSFunction*, JSObject&)
  16 XUL js::Invoke(JSContext*, js::CallArgs const&, unsigned int)
  17 XUL js::ExternalInvoke(JSContext*, js::Value const&, js::Value const&, unsigned int, js::Value*, js::Value*)
  18 XUL js::ExternalInvoke(JSContext*, JSObject*, js::Value const&, unsigned int, js::Value*, js::Value*)
  19 XUL JS_CallFunctionValue
  20 XUL nsJSContext::CallEventHandler(nsISupports*, void*, void*, nsIArray*, nsIVariant**) /Users/sayrer/dev/tm/dom/base/nsJSEnvironment.cpp:2140
I don't think we should cache document.write content differently than other DOM manipulation content, as per our earlier discussion.  I don't fully understand bz's comment in the other bug, though, or why it's meaningfully datalossy.
> I don't think we should cache document.write content differently than other DOM
> manipulation content

Well... historically we (and at least Netscape; not sure about IE) have.  There are cases of user interaction (e.g. the user hitting reload) that are broken with normal DOM manipulation but not open/write/close.

Note that there's a major difference here: open/write/close always blows away the old document and creates a new one.  So it's much easier to cache than generic DOM manipulation.

> I don't fully understand bz's comment in the other bug, though, or why it's
> meaningfully datalossy.

Right now session restore of a frameset that has document.written content in a subframe should, I think, restore the contents of that subframe.
dwitte and I talked about this stuff at the work week, btw, and there's no particular reason to use the necko cache for this stuff.  We can use any other back end we want, ranging from a hashtable to some sort of sqlite thing.  The only issue is eviction policy, at that point....
I don't understand why this needs to go to disk synchronously given that we already have async write support in the cache.

I think for Firefox 4 we probably want to stick with the necko cache, since reworking it won't be trivial. I'd hope we can make it asynchronous, though.

Michal, you got this?
For the record, session restore does right now successfully restore a page from a cache entry with a wyciwyg uri, if you want to test you can use do so with something this:

javascript:w=window.open(); w.document.open(); w.document.write("foo"); w.document.close();

which will open a new tab or window, and if you then restart firefox the opened tab/window will restore with the correct content in it.
The asynchronous write in case of HTTP is done by asynchronous stream listener tee. I can do the same in nsWyciwygChannel, i.e. post an event to the background thread instead of writing to the cache directly in nsWyciwygChannel::WriteToCacheEntry().


> For the record, session restore does right now successfully restore a page from
> a cache entry with a wyciwyg uri, if you want to test you can use do so with
> something this:

Yes it does, but only in case of graceful shutdown. We throw away the whole cache after a crash and empty page is displayed in this case (which seems to me to be a bug). But I'm not sure what should be displayed, error page or the page that generated the wyciwyg page?
Attachment #482831 - Flags: review?(jduell.mcbugs)
Change the PRLock to a mozilla::Mutex and make it a non-pointer member of the class.  Direct PRLock usage is deprecated.
Attachment #482831 - Attachment is obsolete: true
Attachment #484035 - Flags: review?(jduell.mcbugs)
Attachment #482831 - Flags: review?(jduell.mcbugs)
Comment on attachment 484035 [details] [diff] [review]
patch v2 - using mozilla:Mutex instead of PRLock


>diff --git a/content/html/document/src/nsWyciwygChannel.cpp b/content/html/document/src/nsWyciwygChannel.cpp

All the files in this patch have moved to netwerk/protocol/wyciwyg--need to update.

>+
>+class nsWyciwygAsyncEvent : public nsRunnable {
>+public:
>+  nsWyciwygAsyncEvent()
>+    : mEventType(eNone)
>+    , mSource(-1)
>+    , mReason(NS_OK)
>+  {}

Change this to instead use three separate classes: nsWyciwygWriteEvent, nsWyciwygSetCharsetEvent, and nsWyciwygCloseEvent.  Then you'd have only the constructors (with needed args for Run) a simple Run() with no switch statement, and with infallible new you'd be able to just call

     mCacheIOTarget->Dispatch(new nsWyciwygCloseEvent(...));


> 
>+  if (!mCacheIOTarget) {
>+    rv = WriteToCacheEntryInternal(aData);

I think we should scrap all this !mCacheIOTarget logic in the whole wyciwyg protocol--if we can't spin a new thread during Init() that ought to be a serious failure.  Having both codepaths just makes things more complex.

>+  } else {
>+    nsRefPtr<nsWyciwygAsyncEvent> ev = new nsWyciwygAsyncEvent();
>+    if (!ev) return NS_ERROR_OUT_OF_MEMORY;

new is now infallible--get rid of error check.  Ditto for other instance of same logic.

>+    mozilla::Mutex                      mMutex;

Ugh.  The synchronization needed for this change is really subtle.  I've been staring at the code for a while, and I don't think we've got it right.

Right now you're using this mutex to ensure that a CloseCacheEntryInternal() call doesn't set mCacheEntry to null in the middle of ReadFromCache(). So far, ok.

[Note that this is unlikely--the code internally would never cause them to be entered at the same time (the only place ReadFromCache gets called now is from nsWyciwygChannel::OnCacheEntryAvailable: the only place CloseCacheEntry gets called is either after that ReadFromCache call fails, or during OnStopRequest, which by definition can only be called after ReadFromCache is complete.)  But it could conceivably happen if a client called AsyncOpen and then CloseCacheEntry in succession.]

But we still have a problem with GetCharsetAndSource().  You've tried to handle this by using a temp variable:

>+  nsCOMPtr<nsICacheEntryDescriptor> cacheEntry;
>+  cacheEntry = mCacheEntry;
    
But this doesn't work, for two reasons:

1) It's a somewhat risky thing to rely on a pointer write being atomic (i.e that if CloseCacheEntry in the IO sets ptr = null, we'll see either the previous value of ptr, or null).  It works on many platform, and probably all of the ones we care about (for now at least).  But as bz just pointed out to me, it's not ok to do with an nsCOMPtr, as that involves both a ptr assignment and an AddRef, and there's a race (we could read the ptr value, then the IO thread could set mCacheEntry to null, and if it's the last ref, delete it, and then the nsCOMPtr here could AddRef the now-dead object).

2) A lock would fix #1.  But the following call sequence would still fail:

   channel->WriteCharsetAndSourceToCache
   channel->WriteToCacheEntry
   channel->GetCharsetAndSource

The first two functions would be sent to the IO thread, and may not have completed (and thus mCacheEntry may not yet be set) by the time GetCharsetAndSource is called.  Logically, if the user has called WriteCharsetAndSourceToCache, we need to have GetCharsetAndSource return the value.  

I think the answer here is that we need have GetCharsetAndSource wait if we know that an open of mCacheEntry is pending.  That's going to requre a "bool OpenIsPending", and also a mozilla::Monitor instead of a lock.  Use MonitorAutoEnter in *every* place in the entire class where you need to check/access mCacheEntry.  In AsyncOpen and WriteCharsetAndSourceToCache (on the main thread: before you send an event to the IO thread) set OpenIsPending = true (while you have the monitor lock).  Have GetCharsetAndSource grab the monitor, and then do

    MonitorAutoEnter mon(mMonitor);
    while (OpenIsPending)
       mon.Wait();

before checking/accessing mCacheEntry.

In OnCacheEntryAvailable, hold the monitor, and after you set mCacheEntry, set OpenIsPending = false and call mon.Notify().   Also do this in OpenCacheEntry, after mCacheEntry has been set.

*Ideally*, we'd let go of the monitor during any blocking disk calls (like the call to cacheSession->OpenCacheEntry, or during mCacheOutputStream->Write).  You may be able to release the monitor during those calls.  But this isn't a big deal--it seems like WYSIWYG channels are really used for either reading or writing, not both simultaneously, so I don't think anything important is likely to block even if you hold the monitor during blocking I/O:  just maybe some calls to GetCharsetAndSource (i.e. ones that wouldn't otherwise need to block).  But I haven't really thought through that with 100% certainty.


Maybe add NS_ASSERTIONS that we're not on the main thread for those functions that can now only be called from the IO thread?  It's always nice to help understand the code, if nothing else.


>diff --git a/content/html/document/src/nsWyciwygProtocolHandler.cpp b/content/html/document/src/nsWyciwygProtocolHandler.cpp
> 
> NS_IMETHODIMP
> nsWyciwygProtocolHandler::NewChannel(nsIURI* url, nsIChannel* *result)
> {
>   NS_ENSURE_ARG_POINTER(url);
>   nsresult rv;
>-    
>+
>+  nsCAutoString path;
>+  rv = url->GetPath(path);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  PRInt32 slashIndex = path.FindChar('/', 2);
>+  if (slashIndex == kNotFound)
>+    return NS_ERROR_FAILURE;
>+
>+  if (path.Length() < (PRUint32)slashIndex + 1 + 5)
>+    return NS_ERROR_FAILURE;
>+
>+  if (!PL_strncasecmp(path.get() + slashIndex + 1, "https", 5)) {

Is there some reason you're doing all this string logic on the URL, rather than just calling uri->SchemeIs("https", ...)?  That's what nsHttpHandler does.

>+    // HACK: make sure PSM gets initialized on the main thread.
>+    nsCOMPtr<nsISocketProviderService> spserv =
>+      do_GetService(NS_SOCKETPROVIDERSERVICE_CONTRACTID);
>+    if (spserv) {
>+      nsCOMPtr<nsISocketProvider> provider;
>+      spserv->GetSocketProvider("ssl", getter_AddRefs(provider));
>+    }
>+  }

We need to make sure we only init the PSM if we're in chrome, not the child.

(Perhaps we should move the common logic for intializing it in nsHttpHandler.cpp and here into a function somewhere--nsNetUtil.h?)

Oh, and just to be clear--the PSM init has nothing to do with the async write fix, right?  You just noticed we may need it if a wyciwyg channel is the first https channel opened?  (Is that even possible? During session restore?))

>   nsWyciwygChannel* channel = new nsWyciwygChannel();
>   if (!channel)
>     return NS_ERROR_OUT_OF_MEMORY;
>   NS_ADDREF(channel);
>   rv = channel->Init(url);
>   if (NS_FAILED(rv)) {
>     NS_RELEASE(channel);
>     return rv;


> netwerk/cache/nsICacheSession.idl

Is is just me, or should the docs for nsICacheSession state that asyncOpenCacheEntry is only for read access?  I suggest

- * Asynchronous cache access. Does not block the calling thread. 
+ * Asynchronous cache open: ACCESS_READ only. Does not block calling thread.
Attachment #484035 - Flags: review?(jduell.mcbugs) → review-
(In reply to comment #13)
> 2) A lock would fix #1.  But the following call sequence would still fail:
> 
>    channel->WriteCharsetAndSourceToCache

I think you mean channel->SetCharsetAndSource, right?

> check/access mCacheEntry.  In AsyncOpen and WriteCharsetAndSourceToCache (on
> the main thread: before you send an event to the IO thread) set OpenIsPending =
> true (while you have the monitor lock).

And here you probably mean WriteToCacheEntry instead of WriteCharsetAndSourceToCache, right? The suggested solution would fix your example but it wouldn't fix following scenario:

 channel->WriteToCacheEntry
 (WriteToCacheEntryInternal is called, so mCacheEntry is valid)
 channel->SetCharsetAndSource
 channel->GetCharsetAndSource

Old values would be returned if GetCharsetAndSource is called before SetCharsetAndSourceInternal.

IMO a simpler working solution is to store charset and source into the member variables directly in SetCharsetAndSource and remember that this method was called. GetCharsetAndSource would return these values if SetCharsetAndSource was called, otherwise it would get them from the cache entry. I didn't implement this because we don't use wyciwyg channel this way and it would need an additional locking which I tried to minimize. But you're right that we should probably handle this correctly because of addons etc.


> >+  if (path.Length() < (PRUint32)slashIndex + 1 + 5)
> >+    return NS_ERROR_FAILURE;
> >+
> >+  if (!PL_strncasecmp(path.get() + slashIndex + 1, "https", 5)) {
> 
> Is there some reason you're doing all this string logic on the URL, rather than
> just calling uri->SchemeIs("https", ...)?  That's what nsHttpHandler does.

I'm not checking the scheme but beginning of the path. The scheme is "wyciwyg".


> Oh, and just to be clear--the PSM init has nothing to do with the async write
> fix, right?  You just noticed we may need it if a wyciwyg channel is the first
> https channel opened?  (Is that even possible? During session restore?))

Right, I've found that problem while restoring a session with https wyciwyg page.


> Is is just me, or should the docs for nsICacheSession state that
> asyncOpenCacheEntry is only for read access?  I suggest
> 
> - * Asynchronous cache access. Does not block the calling thread. 
> + * Asynchronous cache open: ACCESS_READ only. Does not block calling thread.

Why do you think so? There is no limitation for calling asyncOpenCacheEntry().
> IMO a simpler working solution is to store charset and source into the member
variables directly in SetCharsetAndSource and ... GetCharsetAndSource would return these values if SetCharsetAndSource was called, otherwise it would get them from the cache entry.

Ah--that sounds like a good solution.  We might be able to avoid the need to have mMutex (or monitor) with that, at least for our purposes.

> I didn't implement this because we don't use wyciwyg channel this way

There's probably more stuff we could theoretically worry about with synchronization here, but you're right that practically speaking it looks like wyciwyg channels are mainly used for either reading or writing, not both simultaneously, so we can probably get away without worrying about edge cases.
Attached patch patch v3 (obsolete) — Splinter Review
(In reply to comment #15)
> There's probably more stuff we could theoretically worry about with
> synchronization here, but you're right that practically speaking it looks like
> wyciwyg channels are mainly used for either reading or writing, not both
> simultaneously, so we can probably get away without worrying about edge cases.

We don't use it for both reading and writing and I think that it wouldn't even work. And especially WyciwygChannelChild is written so that it can either only read or only write. So I've removed the mutex completely, which should be safe for our purpose.
Attachment #484035 - Attachment is obsolete: true
Attachment #491964 - Flags: review?(jduell.mcbugs)
Attachment #491964 - Flags: review?(jduell.mcbugs) → review+
pushed to try.
Attachment #491964 - Attachment is obsolete: true
Attachment #492109 - Flags: review+
Tryserver run came up with asserts about using URIs on more than one thread.  I fixed the one use of mURI by passing the spec instead, but still had asserts, because if one of our Aync events did the final release of the channel on the CacheIOThread, we'd call the mURI destructor (and get an assert about thread safety again).

Fixed by making all the Async IO events' destructors create an event to the main thread just to make sure the channel is never destroyed on the cacheIO thread.

Michal:  take a look and see what you think.   I've sent it to tryserver at 1:49 AM PST
Attachment #492109 - Attachment is obsolete: true
Attachment #492274 - Flags: review?(michal.novotny)
Try is green, so unless michal finds something objectionable in how I solved this, we're good to land.
> +  ~nsWyciwygAsyncEvent().
> +  {
> +    // NS_NewRunnableMethod addrefs channel and will release it on main
> +    // thread--which is the goal here--so target method is a no-op
> +    NS_DispatchToMainThread(
> +        NS_NewRunnableMethod(mChannel, &nsWyciwygChannel::MainReleaseNoOp),
> +        NS_DISPATCH_NORMAL);
> +  }

Although it is very unlikely, it is still possible that MainReleaseNoOp will be called on the main thread before mChannel in nsWyciwygAsyncEvent is nulled out on the IO thread. Isn't it?
Attached patch v4: fix unlikely race (obsolete) — Splinter Review
Nice catch.  I've changed it to free mChannel before we dispatch:

    nsCOMPtr<nsIRunnable> event =
      NS_NewRunnableMethod(mChannel, &nsWyciwygChannel::MainReleaseNoOp);

    // ensure we free our ref before event gets released on main thread
    mChannel = 0;
    NS_DispatchToMainThread(event, NS_DISPATCH_NORMAL);
Attachment #492274 - Attachment is obsolete: true
Attachment #492574 - Flags: review?(michal.novotny)
Attachment #492274 - Flags: review?(michal.novotny)
Comment on attachment 492574 [details] [diff] [review]
v4: fix unlikely race

Looks good.
Attachment #492574 - Flags: review?(michal.novotny) → review+
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/fc75e982b6dd
Status: NEW → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b8
This broke my build locally.  I didn't wait long enough to see if it also breaks libxul builds (mine is non-libxul), and I landed the fix anyway:

http://hg.mozilla.org/mozilla-central/rev/07c209a8b9bc
Summary: Wyciwyg channel write's end up doing synchronous disk IO → Wyciwyg channel writes end up doing synchronous disk IO
Depends on: 620194
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: