Closed Bug 953435 Opened 6 years ago Closed 6 years ago
Browser hang on Mac if an After
Process Next Event callback tries to spin the event loop
The attached testcase seems to completely freeze my browser on Mac: we never respond to clicks on the "OK" button of the alert. The only weird thing I see in here is that we're spinning the event loop from inside this stack: #61 0x000000010565565d in nsDOMMutationObserver::HandleMutations () at nsDOMMutationObserver.h:390 #62 0x00000001056457b4 in nsXPConnect::AfterProcessNextEvent (this=0x10060bbe0, aThread=0x100615520, aRecursionDepth=0, aEventWasProcessed=true) at nsXPConnect.cpp:1130 #63 0x0000000105645b84 in non-virtual thunk to nsXPConnect::AfterProcessNextEvent(nsIThreadInternal*, unsigned int, bool) () at nsXPConnect.cpp:1246 #64 0x0000000103b79379 in nsThread::ProcessNextEvent (this=0x100615520, mayWait=false, result=0x7fff5fbfcb13) at nsThread.cpp:651 #65 0x0000000103a8255b in NS_ProcessPendingEvents (thread=0x100615520, timeout=20) at nsThreadUtils.cpp:210 via the DOM mutation observer calling alert(). I'm not sure whether this is cross-platform or Mac-only... Olli, do you see the same issue on Linux?
Works fine on linux.
And works fine on Windows too.
Somewhat similar to Bug 397439?
A possible patch coming, but can't test it since I don't have a (working) mac.
May or may not compile on OSX, and may or may not fix the issue :) GetIsProcessingEvents could be made thread-safe, if needed (by using atomic counter), but other similar things aren't thread-safe either there. https://tbpl.mozilla.org/?tree=Try&rev=5bb11e763dc0
Summary: Browser hang (at least on Mac) if an AfterProcessNextEvent callback tries to spin the event loop → Browser hang on Mac if an AfterProcessNextEvent callback tries to spin the event loop
And it doesn't compile.
Attachment #8351740 - Attachment is obsolete: true
v2 fixes the bug for me once I fix the presumably-backwards test in the assert in GetIsProcessingEvents.
Oops, the assert in the patch is indeed wrong
The patch makes Cocoa appshell less fragile, by making it use generic thread API. isProcessingEvents is similar, but not quite the same to recursionDepth. (I think we should merge nsIThreadInternal to nsIThread)
(And yes, isProcessingEvents is a bit odd, given that it isn't threadsafe, but for now it should be enough.)
Comment on attachment 8354929 [details] [diff] [review] +asserts fixed Review of attachment 8354929 [details] [diff] [review]: ----------------------------------------------------------------- r=me with the changes below. ::: xpcom/threads/nsIThread.idl @@ +83,5 @@ > */ > boolean processNextEvent(in boolean mayWait); > + > + /** > + * true if we're processing runnables or thread observers. Please add a blurb about only accessing this on the current thread. I see that nsThread::GetRecursionDepth, which is similar to this, MOZ_ASSERTs for mismatched threads, which seems a little more likely to catch errors. Please use MOZ_ASSERT instead of warning. (Bonus points for making this attribute [infallible], then.) ::: xpcom/threads/nsThread.h @@ +149,5 @@ > PRThread *mThread; > uint32_t mRunningEvent; // counter > uint32_t mStackSize; > > + uint32_t mIsProcessingEvents; Naming this mProcessingEvent for parallelism with mRunningEvent seems like a better choice.
Attachment #8354929 - Flags: review?(nfroyd) → review+
nsThread::GetRecursionDepth does *not* assert. That is the reason why I removed assert from my patch and made it consistent with recursionDepth
(In reply to Olli Pettay [:smaug] from comment #13) > nsThread::GetRecursionDepth does *not* assert. That is the reason why I > removed > assert from my patch and made it consistent with recursionDepth Oh, whoops, you're right. I was either looking at an old version or was just confused. Carry on!
Comment on attachment 8354929 [details] [diff] [review] +asserts fixed This looks fine to me, but I'd like to test it. I should be able to do that later today or tomorrow.
Comment on attachment 8354929 [details] [diff] [review] +asserts fixed I just tested this in current trunk code, and everything went as expected. This bug's testcase "froze" Firefox without the patch, but didn't with the patch. Conceptually, using nsIThread::isProcessingEvents is exactly what I wanted to do from the beginning. But I convinced myself that the "recursion depth" is a good proxy. And indeed it seems to be, since this code is old, and we don't seem to have had much trouble with it since it landed. What I don't understand, though, is exactly how/why using nsIThread::isProcessingEvents can sometimes give a different result from using the "recursion depth". Can anyone here explain?
Attachment #8354929 - Flags: review?(smichaud) → review+
recursionDepth value is decreased before calling the thread observers.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
this bug has originally showed about a 5.47% increase in tscroll on osx 10.6 and 10.8: http://graphs.mozilla.org/graph.html#tests=[[287,63,21]]&sel=none&displayrange=30&datatype=running The backout put us back to the original levels. This is seen in better detail on datazilla and two tests specifically benefit: tiled-fixed.html: https://datazilla.mozilla.org/?start=1390855933&stop=1391460733&product=Firefox&repository=Mozilla-Inbound&test=tscrollx&page=tiled-fixed.html&x86=false&x86_64=true&project=talos and tiled.html: https://datazilla.mozilla.org/?start=1390855933&stop=1391460733&product=Firefox&repository=Mozilla-Inbound&test=tscrollx&page=tiled.html&x86=false&x86_64=true&project=talos
Do we want to track this as a talos regression? This was backed out and the original patch is a performance imporovement. I am just following up on all performance bugs that have now migrated to Aurora.
We don't want to track the talos regression.
thanks, I have removed the whiteboard flag.
Smaug, could you reland this bug's patch on trunk (or mozilla-inbound), now that I've landed a bandaid patch for bug 959281? It will ultimately need to be backed out again when a patch for bug 996848 lands (since that patch will make it redundant). We can look out for another talos regression -- but I suspect the bandaid patch from bug 959281 will prevent it from happening.
Feel free to land, or I'll land tomorrow.
Go ahead and land it tomorrow. I'm at the end of my day and tired :-)
Do we still need some variant of this? I don't have access to any OSX devices? nsAppShell.mm seems to have changed quite a bit, and the patch certainly doesn't apply anymore.
> Do we still need some variant of this? As long as my patch for bug 996848 remains in the tree, we don't. So far that patch has done pretty well. But it's conceivable that some issue will arise that we can't fix (or can't fix quickly enough). In that case we'd probably want to back it out and fall back to a combination of my bandaid patch for bug 959281 and your patch for this bug.
Let's close this WORKSFORME for the time being.
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.