<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Brian Hackett [Laid off!]

Reporter

Comment 1

•

10 years ago

Um, I have no idea how that got set.

Group: mozilla-employee-confidential

Comment 2

•

10 years ago

All objects have a non-NULL group_ pointer, so if group_ is NULL then the object is invalid. I don't know anything about mozJSComponentLoader or why the object might be invalid.

Flags: needinfo?(bhackett1024)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 4

•

10 years ago

As written in the duped bug, I can reproduce this crash all the time. When I'm running our firefox-ui-tests with the --jsdebugger argument and are waiting too long at a formerly set breakpoint, Firefox crashes. Steps: 1. mach firefox-ui-test --jsdebugger 2. Set breakpoint in marionette/driver.js in switchToWindow (line 1470) at this.getWinEnumerator() 3. Click the ok button of the modal dialog to start the tests 4. When the breakpoint is hit just wait about 10s - then Firefox crashes

Keywords: reproducible

OS: Mac OS X → All

Hardware: Unspecified → All

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Reporter

Comment 5

•

10 years ago

Eddy, any chance you can help here, maybe with the steps from comment #4 or by pointing to someone who can? Thanks!

Flags: needinfo?(ejpbruel)

Comment 6

•

10 years ago

I was also able to hit this numerous times with the following steps: 1. Open the page http://cpettitt.github.io/project/dagre-d3/latest/demo/user-defined.html 2. Open the page toolbox 3. Open the browser toolbox 4. In the browser toolbox, open inspector.js, and set a breakpoint inside modifyAttributes 5. Back over in the page toolbox, inspect the house like SVG shape, and edit an attribute ~10s later the browser crashes. Here's my crash report from these steps: https://crash-stats.mozilla.com/report/index/28539982-f6f5-4b10-96e6-e43862160309

•

10 years ago

This looks like the same bug as bug 1232709 - |obj| is presumably null. (In reply to :Gijs Kruitbosch from comment #10) > Bobby, can you play final buck-passer where all of Tom, Eddy and bhackett > say they don't know who knows this code? Who does? The component loader isn't really code that anybody knows by heart. If this is reproducible (which it sounds like it is) _or_ you can get an rr trace, this should be straightforward for any Gecko hacker (including yourself) to debug. So I think you should just find somebody with the cycles to actually look at it. In terms of XPConnect peers who might be convinced to look into it, Gabor or Blake, maybe?

Flags: needinfo?(bobbyholley)

Reporter

Comment 12

•

10 years ago

Gabor, would you have time to look at this? I'm swamped with other things and not really at home in JS or xpconnect land...

Flags: needinfo?(gkrizsanits)

Comment 13

•

10 years ago

Yes, I can take a look at this.

Assignee: nobody → gkrizsanits

Flags: needinfo?(gkrizsanits)

Comment 14

•

10 years ago

I could not reproduce the crash with any of the steps above on the current nightly / fresh build. (In reply to Brian Hackett (:bhackett) from comment #2) > All objects have a non-NULL group_ pointer, so if group_ is NULL then the > object is invalid. I don't know anything about mozJSComponentLoader or why > the object might be invalid. This bug has nothing to do with mozJSComponentLoader, we have an object with an invalid state, and import is just happen to be the function we pass this object to. Someone from the JS team should take a look at this, I have no idea how an object can get into that state, nor can I reproduce this to try and trace the moment when we set that |group_| property to null.

Assignee: gkrizsanits → nobody

Bobby Holley (:bholley)

Comment 15

•

10 years ago

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #14) > I could not reproduce the crash with any of the steps above on the current > nightly / fresh build. > > (In reply to Brian Hackett (:bhackett) from comment #2) > > All objects have a non-NULL group_ pointer, so if group_ is NULL then the > > object is invalid. I don't know anything about mozJSComponentLoader or why > > the object might be invalid. > > This bug has nothing to do with mozJSComponentLoader, we have an object with > an invalid state, and import is just happen to be the function we pass this > object to. Someone from the JS team should take a look at this, I have no > idea how an object can get into that state, nor can I reproduce this to try > and trace the moment when we set that |group_| property to null. My impression is that |obj| is null, and that this is the non-debug version of bug 1232709. Are you saying that isn't the case?

Flags: needinfo?(gkrizsanits)

Comment 16

•

10 years ago

(In reply to Bobby Holley (busy) from comment #15) > My impression is that |obj| is null, and that this is the non-debug version > of bug 1232709. Are you saying that isn't the case? Actually, I think you're right sorry. Anyway, I keep trying to reproduce it when I have some time for it but cannot do much until then.

Flags: needinfo?(gkrizsanits)

Bobby Holley (:bholley)

Comment 17

•

10 years ago

Sounds like Gijs can reproduce. Maybe he can come up with reliable STR or an rr trace?

Reporter

Comment 18

•

10 years ago

(In reply to Bobby Holley (busy) from comment #17) > Sounds like Gijs can reproduce. Maybe he can come up with reliable STR or an > rr trace? (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #16) > (In reply to Bobby Holley (busy) from comment #15) > > My impression is that |obj| is null, and that this is the non-debug version > > of bug 1232709. Are you saying that isn't the case? > > Actually, I think you're right sorry. Anyway, I keep trying to reproduce it > when I have some time for it but cannot do much until then. Do the steps in comment #6 not reproduce this for you: (In reply to J. Ryan Stinnett [:jryans] (use ni?) (at work week until Mar. 18) from comment #6) > I was also able to hit this numerous times with the following steps: > > 1. Open the page > http://cpettitt.github.io/project/dagre-d3/latest/demo/user-defined.html > 2. Open the page toolbox > 3. Open the browser toolbox > 4. In the browser toolbox, open inspector.js, and set a breakpoint inside > modifyAttributes > 5. Back over in the page toolbox, inspect the house like SVG shape, and edit > an attribute > > ~10s later the browser crashes. > > Here's my crash report from these steps: > > https://crash-stats.mozilla.com/report/index/28539982-f6f5-4b10-96e6- > e43862160309 ? I don't know how to make an rr trace and would need to get everything set-up on my linux vm to do that, AIUI (and manipulate some vmware config things, or something?). I haven't figured out how to do this yet, and I"m fairly nose-to-grindstone with e10s test efforts, unfortunately. I am also... hesitant... about me being able to figure out what's wrong with the component loader - my knowledge of the JS engine as well as xpconnect and their concepts is approximately nil. :-)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 19

•

10 years ago

(In reply to :Gijs Kruitbosch from comment #18) > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #16) > Do the steps in comment #6 not reproduce this for you: > ? No they don't. Actually the breakpoint never gets hit no matter what attributes I change. Did not have a lot more luck with the steps in Comment 4 either, on a fresh build from mc mach firefox-ui-test --jsdebugger brings up the update popup and then the test fails, the debugger does never show up. > > I don't know how to make an rr trace and would need to get everything set-up > on my linux vm to do that, AIUI (and manipulate some vmware config things, > or something?). I haven't figured out how to do this yet, and I"m fairly > nose-to-grindstone with e10s test efforts, unfortunately. I am also... > hesitant... about me being able to figure out what's wrong with the > component loader - my knowledge of the JS engine as well as xpconnect and > their concepts is approximately nil. :-) Once you've got the rr trace at least the figuring out part would not rest on your shoulder ;) Anyway, a simpler more reliable steps to reproduce would be plenty. Maybe there is just some flags I need to set or something for the debugger I don't know, or maybe you are on another branch. Can you reproduce this on a fresh debug build from the current mc with a fresh profile?

Comment 20

•

10 years ago

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #19) > Did not have a lot more luck with the steps in Comment 4 either, on a fresh > build from mc > mach firefox-ui-test --jsdebugger brings up the update popup and then the > test fails, the The mach command has been changed in the meantime, but looks like we have a problem bringing up the debugger. If you checkout a revision from March 6th the above command should work for you and replicate this problem. I will try to get this fixed via bug 1256996 tomorrow.

Comment 21

•

10 years ago

(In reply to Henrik Skupin (:whimboo) from comment #20) > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #19) > > Did not have a lot more luck with the steps in Comment 4 either, on a fresh > > build from mc > > mach firefox-ui-test --jsdebugger brings up the update popup and then the > > test fails, the > > The mach command has been changed in the meantime, but looks like we have a > problem bringing up the debugger. If you checkout a revision from March 6th > the above command should work for you and replicate this problem. I will try > to get this fixed via bug 1256996 tomorrow. It might have to do something with the fact that in the other STR the debugger (from browser toolbox) never breaks at the breakpoint I set, so essentially does not work at all. I'd prefer to wait until the debugger is functional again.

Reporter

Comment 22

•

10 years ago

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #21) > (In reply to Henrik Skupin (:whimboo) from comment #20) > > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #19) > > > Did not have a lot more luck with the steps in Comment 4 either, on a fresh > > > build from mc > > > mach firefox-ui-test --jsdebugger brings up the update popup and then the > > > test fails, the > > > > The mach command has been changed in the meantime, but looks like we have a > > problem bringing up the debugger. If you checkout a revision from March 6th > > the above command should work for you and replicate this problem. I will try > > to get this fixed via bug 1256996 tomorrow. > > It might have to do something with the fact that in the other STR the > debugger (from browser toolbox) never breaks at the breakpoint I set, so > essentially does not work at all. I'd prefer to wait until the debugger is > functional again. The debugger is perfectly functional, I'm guessing you have e10s turned on, and so the page toolbox runs in the child process, and because of the same reason that MSVS/GDB/XCode breakpoints don't work when you're attached to the other process, they don't work here. Turn e10s off and the debugger will break as described. As it is, with Ryan's steps I can reproduce this fairly easily on my Windows machine, but after having just set up everything on my Linux VM so I can run Firefox under rr, of course I can't reproduce there. :-\ I'll try copying over prefs from the Windows profile to see if it's something else that's different about the profile. Otherwise, are there things I can find out from just setting breakpoints in MSVS or something? Any obvious suspects for what would be clearing this thing that shouldn't be cleared?

Flags: needinfo?(gkrizsanits)

•

10 years ago

So... I'm struggling a bit, but AIUI things actually go wrong when this thing is constructed, specifically: (rr) watch * (void**) 0x110e55fcb5d0 Hardware watchpoint 1: * (void**) 0x110e55fcb5d0 (rr) reverse-continue Continuing. Hardware watchpoint 1: * (void**) 0x110e55fcb5d0 Old value = (void *) 0x7fce74dbe0c0 New value = (void *) 0xe5e5e5e5e5e5e5e5 JS::PersistentRooted<JSObject*>::PersistentRooted<JSRuntime*> (cx=<optimised out>, this=0x110e55fcb5d0) at /home/gijs/mc-realbuild/dist/include/js/RootingAPI.h:979 979 registerWithRootLists(rootLists(cx)); (rr) bt #0 JS::PersistentRooted<JSObject*>::PersistentRooted<JSRuntime*> (cx=<optimised out>, this=0x110e55fcb5d0) at /home/gijs/mc-realbuild/dist/include/js/RootingAPI.h:979 #1 mozJSComponentLoader::ModuleEntry::ModuleEntry (this=0x110e55fcb590, aCx=0x7fce74dd3400) at /home/gijs/mozilla-central/js/xpconnect/loader/mozJSComponentLoader.h:91 #2 0x000058bb74e2854e in mozJSComponentLoader::ImportInto (this=this@entry=0x7fce74da17c0, aLocation=..., targetObj=..., targetObj@entry=..., callercx=callercx@entry=0x7fce74dd3400, vp=..., vp@entry=...) at /home/gijs/mozilla-central/js/xpconnect/loader/mozJSComponentLoader.cpp:1167 #3 0x000058bb74e28fcf in mozJSComponentLoader::Import (this=0x7fce74da17c0, registryLocation=..., targetValArg=..., cx=0x7fce74dd3400, optionalArgc=<optimised out>, retval=...) at /home/gijs/mozilla-central/js/xpconnect/loader/mozJSComponentLoader.cpp:1051 #4 0x000058bb74e2ef11 in nsXPCComponents_Utils::Import (this=<optimised out>, registryLocation=..., targetObj=..., cx=0x7fce74dd3400, optionalArgc=<optimised out>, retval=...) at /home/gijs/mozilla-central/js/xpconnect/src/XPCComponents.cpp:2502 #5 0x000058bb748f745c in NS_InvokeByIndex (that=<optimised out>, methodIndex=<optimised out>, paramCount=<optimised out>, params=<optimised out>) at /home/gijs/mozilla-central/xpcom/reflect/xptcall/md/unix/xptcinvoke_x86_64_unix.cpp:182 #6 0x000058bb74e5f991 in CallMethodHelper::Invoke (this=0x7fff2dba8668) at /home/gijs/mozilla-central/js/xpconnect/src/XPCWrappedNative.cpp:2083 #7 CallMethodHelper::Call (this=0x7fff2dba8668) at /home/gijs/mozilla-central/js/xpconnect/src/XPCWrappedNative.cpp:1400 #8 XPCWrappedNative::CallMethod (ccx=..., mode=mode@entry=XPCWrappedNative::CALL_METHOD) at /home/gijs/mozilla-central/js/xpconnect/src/XPCWrappedNative.cpp:1367 #9 0x000058bb74e67b0b in XPC_WN_CallMethod (cx=0x7fce74dd3400, argc=2, vp=0x2626181aa830) at /home/gijs/mozilla-central/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1115 #10 0x000058bb76d78b32 in js::CallJSNative (args=..., native=<optimised out>, cx=0x7fce74dd3400) at /home/gijs/mozilla-central/js/src/jscntxtinlines.h:235 #11 js::Invoke (cx=0x7fce74dd3400, args=..., construct=<optimised out>) at /home/gijs/mozilla-central/js/src/vm/Interpreter.cpp:478 #12 0x000058bb76d72f8f in Interpret (cx=0x7fce74dd3400, state=...) at /home/gijs/mozilla-central/js/src/vm/Interpreter.cpp:2809 #13 0x000058bb76d7862d in js::RunScript (cx=cx@entry=0x7fce74dd3400, state=...) at /home/gijs/mozilla-central/js/src/vm/Interpreter.cpp:428 #14 0x000058bb76d78939 in js::Invoke (cx=cx@entry=0x7fce74dd3400, args=..., construct=construct@entry=js::NO_CONSTRUCT) at /home/gijs/mozilla-central/js/src/vm/Interpreter.cpp:496 #15 0x000058bb76c0c375 in js::fun_call (cx=cx@entry=0x7fce74dd3400, argc=argc@entry=1, vp=vp@entry=0x2626181aa788) at /home/gijs/mozilla-central/js/src/jsfun.cpp:1161 #16 0x000058bb76c0faca in js::fun_apply (cx=0x7fce74dd3400, argc=1, vp=0x2626181aa788) at /home/gijs/mozilla-central/js/src/jsfun.cpp:1179 #17 0x000058bb76d78b32 in js::CallJSNative (args=..., native=<optimised out>, cx=0x7fce74dd3400) at /home/gijs/mozilla-central/js/src/jscntxtinlines.h:235 #18 js::Invoke (cx=0x7fce74dd3400, args=..., construct=<optimised out>) at /home/gijs/mozilla-central/js/src/vm/Interpreter.cpp:478 #19 0x000058bb76d72f8f in Interpret (cx=0x7fce74dd3400, state=...) at /home/gijs/mozilla-central/js/src/vm/Interpreter.cpp:2809 #20 0x000058bb76d7862d in js::RunScript (cx=cx@entry=0x7fce74dd3400, state=...) at /home/gijs/mozilla-central/js/src/vm/Interpreter.cpp:428 #21 0x000058bb76d78939 in js::Invoke (cx=cx@entry=0x7fce74dd3400, args=..., construct=construct@entry=js::NO_CONSTRUCT) ... mozJSComponentLoader.h looking like this: 87 class ModuleEntry : public mozilla::Module 88 { 89 public: 90 explicit ModuleEntry(JSContext* aCx) 91 : mozilla::Module(), obj(JS_GetRuntime(aCx)), thisObjectKey(JS_GetRuntime(aCx)) 92 { 93 mVersion = mozilla::Module::kVersion; 94 mCIDs = nullptr; 95 mContractIDs = nullptr; 96 mCategoryEntries = nullptr; 97 getFactoryProc = GetFactory; 98 loadProc = nullptr; 99 unloadProc = nullptr; 100 101 location = nullptr; 102 } if I step through the constructor, once in the constructor body, obj ends up as: (rr) p obj $4 = {<js::PersistentRootedBase<JSObject*>> = {<No data fields>}, <mozilla::LinkedListElement<JS::PersistentRooted<JSObject*> >> = { mNext = 0x7fce74dbe0c0, mPrev = 0x6f116b873a10, mIsSentinel = false}, ptr = 0x0} And, just to make sure the optimized build isn't messing with me, it still looks like that after the constructor returns. It doesn't change until the crash. At the point where we're crashing we're looking for "resource://gre/modules/TelemetryStorage.jsm". The JS stack is in comment #23 . However, at the point where we're creating this ModuleEntry and set its obj to 0, we are *also* looking for this file, but through a different stack: (rr) call DumpJSStack() 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":273] this = [object BackstagePass] 1 XPCU_defineLazyGetter/<.get() ["resource://gre/modules/XPCOMUtils.jsm":198] this = [object BackstagePass] 2 TelemetrySendImpl._checkPendingPings<() ["resource://gre/modules/TelemetrySend.jsm":607] 3 next(val = undefined) ["self-hosted":947] this = [object Generator] 4 TaskImpl_run(aSendResolved = true) ["resource://gre/modules/Task.jsm":319] this = [object Object] 5 TaskImpl(iterator = [object Generator]) ["resource://gre/modules/Task.jsm":280] this = [object Object] 6 createAsyncFunction/asyncFunction() ["resource://gre/modules/Task.jsm":254] this = [object Object] 7 TelemetrySendImpl.setup<() ["resource://gre/modules/TelemetrySend.jsm":588] 8 next(val = undefined) ["self-hosted":947] this = [object Generator] 9 TaskImpl_run(aSendResolved = true) ["resource://gre/modules/Task.jsm":319] this = [object Object] 10 TaskImpl(iterator = [object Generator]) ["resource://gre/modules/Task.jsm":280] this = [object Object] 11 createAsyncFunction/asyncFunction(false) ["resource://gre/modules/Task.jsm":254] this = [object Object] 12 this.TelemetrySend.setup(testing = false) ["resource://gre/modules/TelemetrySend.jsm":183] this = [object Object] 13 setupTelemetry/this._delayedInitTask<() ["resource://gre/modules/TelemetryController.jsm":673] 14 next(val = undefined) ["self-hosted":947] this = [object Generator] 15 TaskImpl_run(aSendResolved = true) ["resource://gre/modules/Task.jsm":319] this = [object Object] 16 TaskImpl(iterator = [object Generator]) ["resource://gre/modules/Task.jsm":280] this = [object Object] 17 createAsyncFunction/asyncFunction() ["resource://gre/modules/Task.jsm":254] 18 Task_spawn(aTask = [function]) ["resource://gre/modules/Task.jsm":168] this = [object Object] 19 this.DeferredTask.prototype._timerCallback/<() ["resource://gre/modules/DeferredTask.jsm":282] 20 next(val = undefined) ["self-hosted":947] this = [object Generator] ... it's not clear to me why obj gets set to 0 when we're constructing the ModuleEntry for the initial call, or why/how that's even possible. It also seems like if I use "next" in gdb to step through the rest of ImportInto, when I end up at: 1170 mInProgressImports.Put(info.Key(), newEntry); 1171 1172 rv = info.EnsureURI(); 1173 NS_ENSURE_SUCCESS(rv, rv); 1174 RootedValue exception(callercx); 1175 rv = ObjectForLocation(info, sourceLocalFile, &newEntry->obj, 1176 &newEntry->thisObjectKey, 1177 &newEntry->location, true, &exception); 1178 and try to continue stepping over the ObjectForLocation call, it goes straight into the segfault. This is surprising because I didn't think, from looking at the stack from comment #0, that there were multiple Cu.import calls on the stack. And in fact, the backtrace from the crash doesn't have two calls in it. So at this point I'm mostly just very confused. Stepping through this a bit more, it seems that the call to ReadCachedScript from within ObjectForLocation causes the crash - but it's not clear to me how that would work and/or if it's worth figuring that out, rather than why the ModuleEntry's obj thing is null on creation. Does this help at all? Bobby/Gabor?

Bobby Holley (:bholley)

Comment 27

•

10 years ago

I'm not sure what the specific question about rr for me is. rr traces aren't portable, so the debugging itself will need to happen on Gijs' machine, presumably with some help from gabor. Y'all should hop on IRC together. :-) Any questions about rr itself should probably go to roc, or maybe BenWa.

Flags: needinfo?(bobbyholley)

•

Comment 39

•

9 years ago

Attached file Full stacks with Task.jsm gunk — Details

Reporter

Comment 40

•

9 years ago

I attached full JS stacks as produced by DumpJSStack(), but I don't think those are very readable, so here's shorter ones: 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":273] this = [object BackstagePass] 1 XPCU_defineLazyGetter/<.get() ["resource://gre/modules/XPCOMUtils.jsm":198] this = [object BackstagePass] 2 TelemetrySendImpl._checkPendingPings<() ["resource://gre/modules/TelemetrySend.jsm":607] 7 TelemetrySendImpl.setup<() ["resource://gre/modules/TelemetrySend.jsm":588] 12 this.TelemetrySend.setup(testing = false) ["resource://gre/modules/TelemetrySend.jsm":183] 13 setupTelemetry/this._delayedInitTask<() ["resource://gre/modules/TelemetryController.jsm":673] The other one looks like this: 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":273] this = [object BackstagePass] 1 XPCU_defineLazyGetter/<.get() ["resource://gre/modules/XPCOMUtils.jsm":198] this = [object BackstagePass] 2 Impl._loadSessionData<() ["resource://gre/modules/TelemetrySession.jsm":1972] 7 setupChromeProcess/this._delayedInitTask<() ["resource://gre/modules/TelemetrySession.jsm":1433] 13 this.DeferredTask.prototype._timerCallback/<() ["resource://gre/modules/DeferredTask.jsm":282] 19 this.DeferredTask.prototype._timerCallback([xpconnect wrapped nsITimer]) ["resource://gre/modules/DeferredTask.jsm":280] 20 bound ([xpconnect wrapped nsITimer]) ["self-hosted":752] So number two here is basically going through: https://dxr.mozilla.org/mozilla-central/rev/bccb11375f2af838cda714d42fd8cef78f5c7bf1/toolkit/components/telemetry/TelemetrySession.jsm#1429-1433 which is a delayed task that is started after a particular delay which ends up calling https://dxr.mozilla.org/mozilla-central/rev/bccb11375f2af838cda714d42fd8cef78f5c7bf1/toolkit/components/telemetry/TelemetrySession.jsm#1972 which pokes at TelemetryStorage, which is a lazy getter and therefore imports the module. Number 1 is *also* a delayed task: https://dxr.mozilla.org/mozilla-central/rev/bccb11375f2af838cda714d42fd8cef78f5c7bf1/toolkit/components/telemetry/TelemetryController.jsm#667-673 which calls into TelemetrySend, which then here: https://dxr.mozilla.org/mozilla-central/rev/bccb11375f2af838cda714d42fd8cef78f5c7bf1/toolkit/components/telemetry/TelemetrySend.jsm#607 also pokes at a lazy getter for TelemetryStorage and therefore imports the module. Now... the confusing thing is these are not recursive stacks. They are separate stacks. Both of these timers fired and we're managing to try to run code off the second one while code off the first one has not run to completion, which is breaking us. I don't know enough about the JS engine / the debugger to figure out how that could ever happen, I just know it's messed up.

Flags: needinfo?(gkrizsanits)

Reporter

•

9 years ago

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #44) > I thought you might > now if there are some JS API that can spin the event loop, but never mind > then. XPCOM exposes such APIs. We use them in other places such as the tab-modal implementations of window.alert() and friends. They are problematic there, too, but doing it in a callback like this seems... wrong? :-) > > I don't understand the question. You and Gijs can reproduce this issue, > > right? What expertise is missing? > > I haven't reproduced it yet, but will when I get there, it's just I have > some e10s bugs that have higher prio so I have not got the time yet. If the JS debugger code is spinning the event loop from new global/script notifications, re-paging :ejpbruel / :fitzgen / :jimb about that. Folks, is that fixable? Why are we doing it at all?

Flags: needinfo?(nfitzgerald)

Flags: needinfo?(jimb)

Flags: needinfo?(ejpbruel)

Comment 46

•

9 years ago

(In reply to :Gijs Kruitbosch from comment #35) > In case this is easier, I *think* these are fairly conclusive STR (works at > least 50% of the time for me) that you should be able to use if you want to > look at this yourself: I tried these steps and could not reproduce it even after changing the timeout to 5 minutes :( I wanted to trace where do we spin the event loop. Is it really the Invoke call? Or something else? If it's the invoke call what is on the JS stack? To me the JS callback for the debugger does not seem to be doing much so I would first try to make sure that it's really the invoke call or some other JS/Debugger API the debugger uses from onNewGlobalObject. I think the debugger spins the event loop in a restrictive/selective way, maybe it fires nsITimers accidentally that belongs to the Telemtry initialization, even though it should not. Eddy should know more about this...

Reporter

Comment 47

•

9 years ago

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #46) > (In reply to :Gijs Kruitbosch from comment #35) > > In case this is easier, I *think* these are fairly conclusive STR (works at > > least 50% of the time for me) that you should be able to use if you want to > > look at this yourself: > > I tried these steps and could not reproduce it even after changing the > timeout to 5 minutes :( Debug or opt build? > I wanted to trace where do we spin the event loop. Is it really the Invoke > call? Or something else? If it's the invoke call what is on the JS stack? To > me the JS callback for the debugger does not seem to be doing much so I > would first try to make sure that it's really the invoke call or some other > JS/Debugger API the debugger uses from onNewGlobalObject. I think the > debugger spins the event loop in a restrictive/selective way, maybe it fires > nsITimers accidentally that belongs to the Telemtry initialization, even > though it should not. Eddy should know more about this... Mm, I hope they can enlighten us as to what's going on here.

Comment 48

•

9 years ago

(In reply to :Gijs Kruitbosch from comment #47) > Debug or opt build? Debug. Debugging opt builds is quite tricky.

Reporter

Comment 49

•

9 years ago

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #48) > (In reply to :Gijs Kruitbosch from comment #47) > > Debug or opt build? > > Debug. Debugging opt builds is quite tricky. Whereas I use opt because doing frontend work on debug builds is almost impossible. :-) Can you check if you can reproduce the problem on an opt build? I wouldn't be terribly surprised if debug builds are more careful in places and somehow end up avoiding this issue. In fact, there's an assert specifically for the case we're looking at in debug builds: https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/loader/mozJSComponentLoader.cpp#1204 ...

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Comment 50

•

9 years ago

(In reply to :Gijs Kruitbosch from comment #49) > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #48) > > Can you check if you can reproduce the problem on an opt build? I wouldn't > be terribly surprised if debug builds are more careful in places and somehow > end up avoiding this issue. In fact, there's an assert specifically for the > case we're looking at in debug builds: > https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/loader/ > mozJSComponentLoader.cpp#1204 ... I'd be fine if instead of a crash I'd hit an assertion... also for most of the asserts there is a crash that follows it shortly like in this case the next line after the assertion... Anyway, I will try it whenever I get the time again but not sure if it's really useful to try and debug an opt build. Maybe a non optimized release build would get us somewhere.

Comment 51

•

9 years ago

We have to spin the event loop on new scripts because there are certain things asynchronous things we must do before allowing those new scripts to run (the main thing being set pending breakpoints). See calls to `unsafeSynchronize` in `_addSource`, which is called from the new script handler: https://dxr.mozilla.org/mozilla-central/rev/55d557f4d73ee58664bdf2fa85aaab555224722e/devtools/server/actors/script.js#1948-2007 In general, you have to be prepared for arbitrary JS to run when creating scripts or globals, since debugger hooks get fired on those events, and that arbitrary JS includes things like spinning the event loop.

Flags: needinfo?(nfitzgerald)

Reporter

Comment 52

•

9 years ago

(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #51) > We have to spin the event loop on new scripts because there are certain > things asynchronous things we must do before allowing those new scripts to > run (the main thing being set pending breakpoints). > > See calls to `unsafeSynchronize` in `_addSource`, which is called from the > new script handler: > https://dxr.mozilla.org/mozilla-central/rev/ > 55d557f4d73ee58664bdf2fa85aaab555224722e/devtools/server/actors/script. > js#1948-2007 > > In general, you have to be prepared for arbitrary JS to run when creating > scripts or globals, since debugger hooks get fired on those events, and that > arbitrary JS includes things like spinning the event loop. OK, so how do you propose we solve this case? We're not done importing this module, and it's being imported again, and that's causing crashes. From what you're saying it sounds like we can't *not* spin the event loop, so we can't avoid the other code running and importing the same module a second time over. We could throw on the second import, but that will cause behaviour changes for code that runs while the debugger is active... Importing it a second time despite the first import already being in progress sounds... scary.

Flags: needinfo?(nfitzgerald)

Flags: needinfo?(jimb)

Flags: needinfo?(ejpbruel)

Jim Blandy :jimb

Comment 53

•

9 years ago

(In reply to :Gijs Kruitbosch from comment #45) > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #44) > > I thought you might > > now if there are some JS API that can spin the event loop, but never mind > > then. > > XPCOM exposes such APIs. We use them in other places such as the tab-modal > implementations of window.alert() and friends. They are problematic there, > too, but doing it in a callback like this seems... wrong? :-) > > > > I don't understand the question. You and Gijs can reproduce this issue, > > > right? What expertise is missing? > > > > I haven't reproduced it yet, but will when I get there, it's just I have > > some e10s bugs that have higher prio so I have not got the time yet. > > If the JS debugger code is spinning the event loop from new global/script > notifications, re-paging :ejpbruel / :fitzgen / :jimb about that. Folks, is > that fixable? Why are we doing it at all? Yes, the devtools server does run JS from the onNewGlobalObject callback; that's how the chrome debugger ensures that new globals become debuggees, to make activity within their scope visible to the debugger. And yes, the devtools server does spin event loops while stopped in the midst of debuggee code. This is how the debugger's UI remains active even as the debuggee is stopped at a breakpoint. The devtools server tries to stop all other event handling in the debuggee while it's doing so, because letting handlers run obviously breaks the run-to-completion rule. But Gecko's architecture makes this hard. We recently landed a feature that will help us detect these problems. However, we have *no* legitimate reason to combine those cases, and spin an event loop from onNewGlobalObject. The onNewGlobalObject handler should do nothing but call addGlobal on some Debugger. If this is occurring, I'm very interested in seeing a backtrace. It might be the case that the onNewGlobalObject handler is trying to load some other module. That would also be a bug; it's not reasonable to expect the platform to handle that kind of reentrancy. We do spin an event loop from onNewScript notifications, if breakpoints are set. This is an ongoing source of problems, and probably should never have been landed in the first place, but it's part of our current strategy for handling source maps. We'll be looking for alternatives to that strategy in O2. To see if this is part of the problem, you might try logging calls to unsafeSynchronize here: https://hg.mozilla.org/mozilla-central/file/bccb11375f2a/devtools/server/actors/script.js#l1060

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Assignee

Comment 54

•

9 years ago

(In reply to :Gijs Kruitbosch from comment #45) > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #44) > > I thought you might > > now if there are some JS API that can spin the event loop, but never mind > > then. > > XPCOM exposes such APIs. We use them in other places such as the tab-modal > implementations of window.alert() and friends. They are problematic there, > too, but doing it in a callback like this seems... wrong? :-) > > > > I don't understand the question. You and Gijs can reproduce this issue, > > > right? What expertise is missing? > > > > I haven't reproduced it yet, but will when I get there, it's just I have > > some e10s bugs that have higher prio so I have not got the time yet. > > If the JS debugger code is spinning the event loop from new global/script > notifications, re-paging :ejpbruel / :fitzgen / :jimb about that. Folks, is > that fixable? Why are we doing it at all? This is an extremely complicated and convoluted part of the debugger, but I’ll try to explain it as best as I can. When a new script notification is fired, it introduces a new Debugger.Script instance to the debugger. Each Debugger.Script instance has a reference to a Debugger.Source instance, which is a reflection object for the source from which that script was compiled. Ignoring source maps for the time being, each of those Debugger.Source instances has a corresponding source actor, which represents the source to the client. When you reload a page, it causes all sources for that page to be reloaded. This will cause all the scripts on the page to be recompiled. As a result, we get a new set of Debugger.Script instances. This set of Debugger.Script instances is distinct from the set of Debugger.Script instances we had before the reload, even if none of the sources changed after the reload. Consequently, the Debugger.Source instances we obtain from those Debugger.Script instances are also distinct from the ones we got from the previous set. This poses a problem, because we now have two distinct sets of Debugger.Source instances. From the perspective of the client, however, nothing changed. All of the sources are still the same as they were before the reload. It is therefore reasonable for the client to expect to be able to talk to those sources using the same source actors as before. However, those source actors still refer to the old Debugger.Source instances. This is a problem, because those old Debugger.Source instances still refer to the old Debugger.Script instances. As a result, sending a request to send a breakpoint to a source actor would result in us setting the breakpoint on a script that is no longer being used, because it is from a previous page load. To work around this, we create a new set of source actors for the new set of Debugger.Source instances. However, if we detect that we already had a source actor for an older Debugger.Source instance for the same source, instead of allocating an actor id for the new source actor, we reuse the actor id of the old source actor. From the perspective of the client, this effectively and transparently replaces the old source actor with the new source actor, since the client only ever refers to those by their actor id. So, what does all of this have to do with event loops? Because reloading a page creates a new set of scripts, it wipes out all the breakpoints we had on the old set of scripts. We thus need to reset all our breakpoints on the new set of Debugger.Script instances. Breakpoints are stored by location in the debugger. We first obtain the source actor for that location. We then obtain the Debugger.Source instance for that source actor. And finally, we obtain the Debugger.Script instance from the Debugger.Source instance. We then set breakpoints on the appropriate byte code offsets on the Debugger.Script instance. We need to make sure all source actors point to the latest Debugger.Source instance, otherwise we will end up with an old Debugger.Script instance, and breakpoints will not work as expected. For this reason, we need to make sure all source actors are updated *before* we reset the breakpoints. Unfortunately, when you add source maps to the mix, updating source actors becomes an asynchronous process. We create a source actor for each original source. However, from the perspective of the engine, those original sources don’t actually exist, so the Debugger.Source instance for these actors instead refers to the generated source that these original sources mapped to. When we get a new version of this Debugger.Source instance, we need to update the existing original source actors, but to do that, we need to fetch the source map. Since fetching a source map is an asynchronous process, the entire process of updating the source actors becomes asynchronous as well. Without unsafeSynchronise, we would end up with only partially updated source actors before the breakpoints are reset, which can cause us to set breakpoints on old Debugger.Script instances. The end result is that your breakpoints sometimes don’t work. The second reason that we use unsafeSynchronise is that resetting breakpoints on a source mapped source is similarly an asynchronous process. Breakpoints are stored by original location, but in reality are set on a generated location. To obtain that generated location, we need to fetch the source map. In the case of a top-level breakpoint on a source mapped source, not using unsafeSynchronise here could cause the script to start running (and pass through that breakpoint) before we get a chance to set it. The end result is, again, that your breakpoints sometimes don’t work. Needless to say, this is all a horrible, horrible hack, and I am more than open to discuss ideas to redesign this. Jim Blandy has some ideas to replace our uses of unsafeSynchronise with breakpoints on the entry point of each frame. This would still cause us to enter a nested event loop, but now this would only happen from within a breakpoint, not a new script notification. I don’t know if that would fix your immediate problem, but I thought I’d point it out.

Comment 55

•

9 years ago

I have nothing to add that hasn't already been added.

Flags: needinfo?(nfitzgerald)

•

9 years ago

(In reply to Eddy Bruel [:ejpbruel] from comment #61) > With that, we now have two ways to fix this: either don't use > unsafeSynchronise in onNewScript, or make sure that debuggee timers don't > fire while we are inside a nested event loop. Both are significant projects. > > The best short term solution I can come up with is the one you suggested: > source maps can already be configured on/off for the debugger, so let's > rewrite this code so that we don't use any asynchronous calls if source maps > are off. Doing so would allow us not to use unsafeSynchronise when source > maps are off. OK. Can either you or :jlongster pick this up? :-)

Flags: needinfo?(ejpbruel)

Jim Blandy :jimb

Comment 64

•

9 years ago

The chrome module loading case is interesting. Unlike content scripts, module loads have a synchronous API: you call the import function, which loads the module and then returns. Certainly special-casing chrome URLs to avoid spinning the event loop is the best fix for this. But what if we did want to support source maps for chrome URLs? Is that just impossible? That seems ridiculous. The synchronous import behavior for chrome can't be reconciled with any asynchronous behavior during either loading or execution. Suppose Telemetry.jsm did have source maps. My suggested hack of setting a breakpoint at the script's entry point doesn't help: if we load Telemetry.jsm, set a breakpoint at its entry point, hit the breakpoint and spin while loading the source map, then the second timer will find itself interrupting the first import, just as before. But the solution is actually the same: chrome URLs are safe to fetch synchronously. Either at onNewScript time or at entry breakpoint time, we could see that the source map URL was a chrome URL, and synchronously fetch and apply the source map.

Jim Blandy :jimb

Comment 65

•

9 years ago

Gijs, thanks for the explanation in comment 60. Now I really feel like I know what's going on, and even how to write tests for it that reproduce the problem reliably.

Assignee

Comment 66

•

9 years ago

(In reply to :Gijs Kruitbosch from comment #62) > (In reply to Eddy Bruel [:ejpbruel] from comment #61) > > With that, we now have two ways to fix this: either don't use > > unsafeSynchronise in onNewScript, or make sure that debuggee timers don't > > fire while we are inside a nested event loop. Both are significant projects. > > > > The best short term solution I can come up with is the one you suggested: > > source maps can already be configured on/off for the debugger, so let's > > rewrite this code so that we don't use any asynchronous calls if source maps > > are off. Doing so would allow us not to use unsafeSynchronise when source > > maps are off. > > OK. Can either you or :jlongster pick this up? :-) I will put this on my todo for next week.

Assignee: nobody → ejpbruel

Flags: needinfo?(ejpbruel)

Priority: -- → P1

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 67

•

9 years ago

Attached patch Carefully avoid unsafeSynchronize when source maps are disabled. — Details — Splinter Review

This patch refactors addSource so that it still does the same thing as before if source maps are enabled. However, if source maps are disabled, it now carefully avoids the use of unsafeSynchronise. This keeps the browser from crashing when using the browser debugger, provided you disable source mapping for the latter. This solution isn't perfect, but it's the best we can do given the constraints we have (we can't stop unsafeSynchronize from running that timer without refactoring Gecko's event model). Not being able to use source maps in the browser debugger doesn't seem like too big of a deal, since we hardly use any source maps for chrome code anyway (if any!). With the above patch in place, I'd like to suggest that we disable source maps for the browser debugger by default (but keep it enabled for the normal debugger, of course). James, what could would I have to change to accomplish that?

Flags: needinfo?(jlong)

Attachment #8740877 - Flags: review?(jlong)

James Long (:jlongster)

Comment 68

•

9 years ago

(In reply to Eddy Bruel [:ejpbruel] from comment #67) > > With the above patch in place, I'd like to suggest that we disable source > maps for the browser debugger by default (but keep it enabled for the normal > debugger, of course). James, what could would I have to change to accomplish > that? It's just a pref that is pulled here and paused in when attaching the thread: https://github.com/mozilla/gecko-dev/blob/master/devtools/client/framework/attach-thread.js#L50 I suppose that is a good place to detect if it's the browser debugger and force it to false.

Flags: needinfo?(jlong)

James Long (:jlongster)

Comment 69

•

9 years ago

Comment on attachment 8740877 [details] [diff] [review] Carefully avoid unsafeSynchronize when source maps are disabled. Review of attachment 8740877 [details] [diff] [review]: ----------------------------------------------------------------- I don't want to r+ this patch, but I don't know what I'd recommend. `_addSource` is getting horrific. I suppose this is fine a short-term solution. Luckily, moving the sourcemap to the client completely solves all of this. I know we still want to somehow pause the engine and sourcemap breakpoints before we set them, but since we plan to do that as an enhancement on top of the client-sourcemap architecture, it doesn't block that work, and can easily be turned off if it causes horrific problems like this. I'm still a little skeptical that we'll fine a proper hack that doesn't introduce these kinds of problem. Pausing the engine on a breakpoint would not have solved this bug. Are there bugs filed for moving the sourcemap to the client? Can you link to that bug in `_addSource` with a note that we're going to remove all of these hacks?

Attachment #8740877 - Flags: review?(jlong) → review+

Comment 70

•

9 years ago

(In reply to James Long (:jlongster) from comment #68) > (In reply to Eddy Bruel [:ejpbruel] from comment #67) > > > > With the above patch in place, I'd like to suggest that we disable source > > maps for the browser debugger by default (but keep it enabled for the normal > > debugger, of course). James, what could would I have to change to accomplish > > that? > > It's just a pref that is pulled here and paused in when attaching the > thread: > https://github.com/mozilla/gecko-dev/blob/master/devtools/client/framework/ > attach-thread.js#L50 > > I suppose that is a good place to detect if it's the browser debugger and > force it to false. Here's where we set other Browser Toolbox specific prefs: https://dxr.mozilla.org/mozilla-central/source/devtools/client/framework/toolbox-process-window.js#57-65 It could be easiest to set the pref value you need there.

Assignee

Comment 71

•

9 years ago

Try run for this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fa105db6630d I will write a separate patch for the pref value.