Closed Bug 537572 Opened 15 years ago Closed 14 years ago

Workers: nsLayoutStatics refcounting isn't threadsafe, triggered via nsContentUtils::WrapNative

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED

People

(Reporter: timeless, Assigned: bent.mozilla)

Details

Attachments

(2 files, 4 obsolete files)

Bug 537527 comment 2

crash with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a1pre)
Gecko/20100102 Minefield/3.7a1pre ID:20100102044111

Signature    nsCOMPtr_base::~nsCOMPtr_base() |
nsTArray<txExpandedNameMap_base::MapItem>::DestructRange(unsigned int, unsigned
int)
bp-0f41d41d-6494-46b2-836d-413e22100102
Time     2010-01-02 07:39:28.787807
Uptime    5
Last Crash    483823 seconds before submission
Product    Firefox
Version    3.7a1pre
Build ID    20100102044111
Branch    1.9.3
OS    Windows NT
OS Version    5.1.2600 Service Pack 3
CPU    x86
CPU Info    GenuineIntel family 15 model 2 stepping 9
Crash Reason    EXCEPTION_ACCESS_VIOLATION
Crash Address    0x1
User Comments    Bug 537527
Processor Notes     
Crashing Thread
Frame     Module     Signature     Source
0     xul.dll     nsCOMPtr_base::~nsCOMPtr_base()    
obj-firefox/xpcom/build/nsCOMPtr.cpp:81
1     xul.dll    
nsTArray<txExpandedNameMap_base::MapItem>::DestructRange(unsigned int,unsigned
int)     obj-firefox/dist/include/nsTArray.h:878
2     xul.dll     txHandlerTable::`scalar deleting destructor'(unsigned int)    
3     xul.dll     txHandlerTable::shutdown()    
content/xslt/src/xslt/txStylesheetCompileHandlers.cpp:3078
4     xul.dll     txMozillaXSLTProcessor::Shutdown()    
content/xslt/src/xslt/txMozillaXSLTProcessor.cpp:1306
5     xul.dll     nsLayoutStatics::Shutdown()    
layout/build/nsLayoutStatics.cpp:304
6     xul.dll    
nsContentUtils::WrapNative(JSContext*,JSObject*,nsISupports*,nsID
const*,int*,nsIXPConnectJSObjectHolder**,int)     
7         @0x7fd203f     
8     xul.dll     nsDOMWorkerFunctions::NewWorker(JSContext*,JSObject*,unsigned
int,int*,int*)     dom/src/threads/nsDOMWorker.cpp:367
9     mozjs.dll     js_Invoke     js/src/jsinterp.cpp:1376
10     mozjs.dll     RefillFinalizableFreeList     js/src/jsgc.cpp:1527
11         @0x4cbff4f

This isn't the main thread. But nsLayoutStatics isn't threadsafe.
Attached patch rough outline (obsolete) — Splinter Review
Attachment #419801 - Flags: review?(dbaron)
Comment on attachment 419801 [details] [diff] [review]
rough outline

This looks important to fix, but whatever the fix is, it shouldn't be littered with #ifdefs (and the ifdefs also shouldn't misspell STATICS as STATISTICS).
Attachment #419801 - Flags: review?(dbaron) → review-
Summary: dom workers violated an unasserted invariant that nsLayoutStatics isn't threadsafe → dom workers violated an unasserted invariant that nsLayoutStatics refcounting isn't threadsafe
I don't understand nsDOMFileReader and nsXMLHttpRequest parts in the the patch.
Those objects are cycle collectable and should be ever used only in the main thread.
Attached patch DOM worker bandaid (obsolete) — Splinter Review
I think this should be simple enough for fixing the dom worker usage.
Attachment #419835 - Flags: superreview?(jst)
Attachment #419835 - Flags: review?(jst)
Don't you also need to remove some other caller?

(Also, could you add NS_ASSERTIONS that NS_IsMainThread() in nsLayoutStatics::AddRef and Release?)
Attached patch Patch (obsolete) — Splinter Review
Fix workers to hold layout statics alive until shutdown, and add assertions to layout statics.
Assignee: timeless → bent.mozilla
Attachment #419835 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #419957 - Flags: superreview?(jst)
Attachment #419957 - Flags: review?(jst)
Attachment #419835 - Flags: superreview?(jst)
Attachment #419835 - Flags: review?(jst)
Dbarons question remains unanswered. How can this fix the problem if no addrefs/releases are removed from anywhere?
We're now addrefing for the lifetime of the dom thread service which must be longer than the lifetime of an individual worker so there is no way for a worker to release the last reference.
Comment on attachment 419957 [details] [diff] [review]
Patch

bent: that doesn't work.

the addref function is currently not threadsafe. what this means is that if two threads call addref at the same time, bad things can happen.
Attachment #419957 - Flags: superreview?(jst)
Attachment #419957 - Flags: review?(jst)
Attachment #419957 - Flags: review-
What timeless said. One solution is to simply make the addref/release threadsafe, but require that the last release happens on the main thread.

We'd have to audit the places that call addref/release to make sure that it wouldn't be too big of a perf-hit. I think it'd be ok though.
(In reply to comment #10)
> What timeless said. One solution is to simply make the addref/release
> threadsafe, but require that the last release happens on the main thread.

How do you suggest doing the latter?  If we simply "require" it, we may as well make the callers in question not refcount at all.
Hmm.. good point.

I think in this case it would be possible to implement the latter requirement since we can ensure that when nsContentUtils::WrapNative is called by worker code (off the main thread), the same worker code keeps a reference from the main thread.

However when nsContentUtils::WrapNative is called from other code, that always happens on the main thread.

Maybe.

But I agree it's cleaner if we can just always require that the refcounting happens from the main thread. This would probably mean removing the refcounting from nsContentUtils::WrapNative.
It's not clear to me that this reference counting is all *that* critical; its only purpose is to prevent shutdown crashes.  If we have to skip it for a few cases... maybe we'll crash on shutdown in some very rare cases?  But if the thing that's responsible for shutting down the worker threads does an AddRef/Release, then even that shouldn't be possible.
Peterv, can we remove that LayoutStaticRef in WrapNative? Sicking and I think it shouldn't really be necessary.
Then how do you ensure XPConnect stays alive during the wrapping?
Attached patch Patch, v2 (obsolete) — Splinter Review
Chatted with peterv, think this is about the best we can do for now.
Attachment #419801 - Attachment is obsolete: true
Attachment #419957 - Attachment is obsolete: true
Attachment #428491 - Flags: superreview?(peterv)
Attachment #428491 - Flags: review?(peterv)
Summary: dom workers violated an unasserted invariant that nsLayoutStatics refcounting isn't threadsafe → Workers: nsLayoutStatics refcounting isn't threadsafe, triggered via nsContentUtils::WrapNative
Comment on attachment 428491 [details] [diff] [review]
Patch, v2

>-  PRBool push = topJSContext != cx;
>-  if (push) {
>-    rv = sThreadJSContextStack->Push(cx);
>-    NS_ENSURE_SUCCESS(rv, rv);
>-  }
>-
>-  rv = sXPConnect->WrapNativeToJSVal(cx, scope, native, aIID, aAllowWrapping,
>-                                     vp, aHolder);
>-
>-  if (push) {
>-    sThreadJSContextStack->Pop(nsnull);
>+  if (NS_SUCCEEDED(rv)) {
>+    PRBool push = topJSContext != cx;
>+    if (push) {
>+      rv = sThreadJSContextStack->Push(cx);
>+      if (NS_SUCCEEDED(rv)) {
>+        rv = sXPConnect->WrapNativeToJSVal(cx, scope, native, aIID,
>+                                           aAllowWrapping, vp, aHolder);

Now you're only calling sXPConnect->WrapNativeToJSVal if |push| is true, that wasn't the case in the old code.
Attachment #428491 - Flags: superreview?(peterv)
Attachment #428491 - Flags: review?(peterv)
Attachment #428491 - Flags: review-
Attached patch Patch, v2.1Splinter Review
Bah! Ok, this should be right. Nice catch.
Attachment #428491 - Attachment is obsolete: true
Attachment #428555 - Flags: superreview?(jonas)
Attachment #428555 - Flags: review?(jonas)
Attachment #428555 - Flags: superreview?(jonas)
Attachment #428555 - Flags: superreview+
Attachment #428555 - Flags: review?(jonas)
Attachment #428555 - Flags: review+
http://hg.mozilla.org/mozilla-central/rev/61de3d10aece
Status: ASSIGNED → RESOLVED
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
Closed: 14 years ago
Resolution: --- → FIXED
Attachment #428751 - Flags: review?(bent.mozilla) → review+
How safe / tested is this?
blocking1.9.1: ? → ---
blocking1.9.2: ? → needed
Actually, looking at the branches, this isn't needed. We don't have nsContentUtils::WrapNative there, and nsDOMClassInfo has its own static helper.
blocking1.9.2: needed → ---
Landed followup assertions, http://hg.mozilla.org/mozilla-central/rev/2dc00d4b379a
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
This is probably not the right place to put this comment, so please tell me where I can put it :)

For the past couple of days, my own builds of comm-central-trunk have been hitting this during startup:

ASSERTION: Could not initialize nsContentUtils: 'Error', file /u/wa1ter/src/thunderbird/mozilla/layout/build/nsLayoutStatics.cpp, line 168

Note: I'm *not* having this problem with firefox, it's thunderbird that's hitting the assertion.

I have a backtrace if anyone's interested.

Thanks for any clues.
Ben, you switched these back to branch nominations, which doesn't jive with comment 22 - was that a mistake?
(In reply to comment #25)
> Ben, you switched these back to branch nominations, which doesn't jive with
> comment 22 - was that a mistake?

Oops!
blocking1.9.1: ? → ---
blocking1.9.2: ? → ---
(In reply to comment #24)

walter, I'd file a bug in Thunderbird:General and someone will triage it appropriately.
I was about to file a new bug but found this very old one that seemed right on the money.

nsLayoutStatics::Shutdown() is never called when I shut down latest tip of firefox.

I believe it is due to the ref counter not being thread safe so gets out of sync, never reaches zero, and as a result, Shutdown() is never called...

This is bad because essential cleanup such as cubeb_destroy() is not being called and causing audio glitches in JACK when firefox is closed.
(In reply to Damien Zammit from comment #28)
> I was about to file a new bug but found this very old one that seemed right
> on the money.

Please file a new bug. This bug is supporsed to be fixed and even added an assertion to prevent it from happening again. Your issue is very likely to be a different problem.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: