Closed Bug 463289 Opened 13 years ago Closed 13 years ago
_LEVEL assertions in pldhash .c from ns Native Module Loader
Several times a week, I see a bundle of RECURSION_LEVEL assertion failures. When I look at the stacks, I don't see any actual recursion. Furthermore, the first assertion shows the recursion level going below 0. Brendan suspects that the hash table used by nsNativeModuleLoader::LoadModule might not be protected by a lock. Benjamin, can you look into this?
Indeed it's not protected by a lock and never has been. For some reason we always assumed that component loads would always occur on the main thread. I wonder if we shouldn't enforce that some how, perhaps even through proxying, rather than trying to make this code threadsafe.
Assignee: nobody → benjamin
OS: Mac OS X → All
Hardware: PC → All
Status: NEW → ASSIGNED
Summary: RECURSION_LEVEL assertions in pldhash.c → RECURSION_LEVEL assertions in pldhash.c from nsNativeModuleLoader
Comment on attachment 347293 [details] [diff] [review] Proxy requests to the main thread, rev. 1 Looks good. /be
Attachment #347293 - Flags: review?(brendan) → review+
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment on attachment 347293 [details] [diff] [review] Proxy requests to the main thread, rev. 1 This seems to be a fairly safe patch: it should probably bake for a few days before getting approvals for branches.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Should this be relanded? I still see these asserts...
Yeah... it's not clear whether it was causing the orange the first time around, so I've been hesitant to land it. Maybe I'll try at a quiet time in the morning again. I'd love it if somebody ran a set of unit tests over this patch to check for deadlocks.
Pushed to m-c: http://hg.mozilla.org/mozilla-central/rev/54b5c634212e Unit-testers appear to have cycled green. Yay!
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Comment on attachment 347293 [details] [diff] [review] Proxy requests to the main thread, rev. 1 This patch is not risk-free: it could potentially cause deadlocks under the following conditions: * application code calling into the service manager * holding a lock (wrong code, but hard to detect or prevent) * from off the main thread The benefit of this patch is that if code from off the main thread initializes a new module (e.g. NSS), it could currently cause memory corruption and crashes. I'm ambivalent, but it should probably bake a while on trunk before landing on branch, so after b3.
As I've hit this assertion a lot with workers (which we're pushing as a major feature of 1.9.1) I would feel a lot safer with this fix in 1.9.1 as well...
I would rather have this in b3, so that we have a better chance of finding bad interactions with binary components out there in extensions, rather than having to find out with RCs.
Attachment #347293 - Flags: approval1.9.1? → approval1.9.1+
If we're hitting this more with workers then it's definitely a blocker. Bombs away, I'll push it to the branch tomorrow.
Flags: blocking1.9.1? → blocking1.9.1+
You need to log in before you can comment on or make changes to this bug.