Closed Bug 756267 Opened 8 years ago Closed 8 years ago

Crash after executing profiling test in jsd_GetScriptFunctionId @ JS_GetScriptPrincipals or JS_GetFunctionId

Categories

(Core :: JavaScript Engine, defect, critical)

15 Branch
x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla17
Tracking Status
firefox-esr10 --- unaffected

People

(Reporter: sebastianzartner, Assigned: sfink)

References

Details

(4 keywords, Whiteboard: [js:p1:fx17][js:bumped:1][firebug-p1])

Crash Data

Attachments

(4 files)

Attached file First test to execute
While executing the second of the attached two test cases for Firebug Firefox crashes:

https://crash-stats.mozilla.com/report/index/bp-2b657647-9be1-4d79-8b9d-8e5a72120517

To reproduce this you first need to install Firebug 1.10a8:
http://getfirebug.com/releases/firebug/1.10/firebug-1.10.0a8.xpi
Then execute the test in "issue5033.html" and after that the one in "profile.html". First an unresponsive script message will be shown. Clicking on Continue and waiting a moment more will result in a browser crash.

Sebastian
Attached file Second test to execute
Crash Signature: [@ JS_GetScriptPrincipals ]
Keywords: crash, crashreportid
Assignee: nobody → general
Severity: normal → critical
Status: UNCONFIRMED → NEW
Component: General → JavaScript Engine
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → general
Version: 15 Branch → Trunk
This is blocking our automated test-bot, I am marking as Firebug p1
Honza
Whiteboard: [firebug-p1]
Do you a regression range, by chance?
Whiteboard: [firebug-p1] → [firebug-p1][js:p1:fx15]
If you mean to test again, here are the results (testing with another machine):
Firefox doesn't crash right after clicking "Continue" in the unresponseive script dialog but after executing "profileEnd()" in the second test case. And I get another crash signature:

https://crash-stats.mozilla.com/report/index/bp-1f0ef69d-3e9c-4563-84fa-56f6d2120522

I will retest it later on the original machine I saw this to confirm the results.

Sebastian
(In reply to Sebastian Zartner from comment #4)
> If you mean to test again, here are the results (testing with another
> machine):
> Firefox doesn't crash right after clicking "Continue" in the unresponseive
> script dialog but after executing "profileEnd()" in the second test case.
> And I get another crash signature:
> 
> https://crash-stats.mozilla.com/report/index/bp-1f0ef69d-3e9c-4563-84fa-
> 56f6d2120522
> 
> I will retest it later on the original machine I saw this to confirm the
> results.
> 
> Sebastian

Oops, I left out a word. I meant to ask: do you know in which Firefox Nightly the problem first started?
The first time I checked this was right before I posted this report (having the lately Nightly at that time).
So I can't tell you if it didn't already occur earlier. Sorry.

Sebastian
(In reply to David Mandelin from comment #5)
> Oops, I left out a word. I meant to ask: do you know in which Firefox
> Nightly the problem first started?
Based on crash stats, it first appeared in 15.0a1/20120507 for the JS_GetScriptPrincipals signature and 15.0a1/20120504 for the JS_GetFunctionId signature.
Crash Signature: [@ JS_GetScriptPrincipals ] → [@ JS_GetScriptPrincipals ] [@ JS_GetFunctionId ]
Summary: Crash after executing profiling test [@ JS_GetScriptPrincipals ] → Crash after executing profiling test in jsd_GetScriptFunctionId @ JS_GetScriptPrincipals or JS_GetFunctionId
Version: Trunk → 15 Branch
Whiteboard: [firebug-p1][js:p1:fx15] → [firebug-p1][js:p1:fx16]
Whiteboard: [firebug-p1][js:p1:fx16] → [js:p1:fx16][firebug-p1]
Assignee: general → sphink
Window for Inbound:
Last good nightly: 2012-05-02
First bad nightly: 2012-05-03

Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c3813fbb1c9a&tochange=de5745bce8bc
I had some trouble reproducing because I kept hitting a seemingly unrelated assertion in debug builds, which seems to have been fixed now. I still don't get the slow script dialog; I just get a hang running through JS code. But I'm not sure that it's in an infinite loop; I think it may be making very slow progress. What seems odd is that Firebug seems to be debugging itself; I'm seeing lots of hooks called for code in firebug-service.js. (Unless I'm confusing myself, but I don't think I am for once.)

Relevant changes in that range are hoisting various things from the compartment to the runtime, and compartment-per-global being implemented for xpconnect (bed8c4e3dfdf). So it seems CPG-related.

I don't know why I'm not getting the slow script dialog, unless it's automatically disabled when running under gdb?

cx->compartment->debugModeBits == 1

I'll look at where that is getting set tomorrow.
Unfortunately, the debugModeBits == 1 is correct. JSD debugMode is activated globally, so all compartments get that set from the runtime while you're debugging. So JSD gets a callback from any script running anywhere, and does a slow traversal of a filter to list to discard the ones it doesn't care about. (That traversal has gotten even slower with recent xpc_UnmarkGray* additions.)

Still, none of that is a smoking gun. Looks like I'll need to dig up an old build.
I have a crash from this test case in the debugger right now. The problem appears to be that JSD is holding onto JSScripts that are deleted. (In fact, they are deleted, their arena gets reused for something else, and then when shutting down JSD the thing-that-used-to-be-a-JSScript is referenced and crashes in some random way.)

JSD does not root (or mark) the scripts it hangs onto. Instead, it keeps a table of all the scripts and watches destruction events, removing scripts from the JSD table when they die. In other words, JSD keeps weak references to the scripts.

So it appears that JSD is not getting the destruction hook call. Dumping out a log of JSD script creation and destruction shows lots of scripts not getting unregistered.
One thing that seems unfortunate -- during shutdown, shutdownFirebug() in firebug.js ends up turning debugging back on. This final activation never seems to get turned off cleanly, which is not Firebug's fault but it would be nice if it didn't do this. It's probably not the real problem here, but it's another way that the script tables fail to get cleaned up and it's concealing the one causing this bug. (I'm assuming that it's different, because this bug does not require a shutdown.)

This is the call stack in Firebug 1.10.0a8 that restarts debugging during shutdown:

0 anonymous() ["resource://firebug/firebug-service.js":1638]
    this = [object Object]
1 anonymous() ["resource://firebug/firebug-service.js":921]
    this = [object Object]
2 <TOP LEVEL> ["<unknown>":0]
    <failed to get 'this' value>
3 anonymous() ["chrome://firebug/content/js/debugger.js":2630]
    this = chrome://firebug/content/firefox/firebugFrame.xul-@-43811
4 anonymous() ["chrome://firebug/content/js/debugger.js":2687]
    this = chrome://firebug/content/firefox/firebugFrame.xul-@-43811
5 anonymous() ["chrome://firebug/content/js/debugger.js":2667]
    this = chrome://firebug/content/firefox/firebugFrame.xul-@-43811
6 anonymous() ["chrome://firebug/content/firebug.js":2252]
    this = chrome://firebug/content/firefox/firebugFrame.xul-@-43811
7 anonymous() ["chrome://firebug/content/console/console.js":307]
    this = [object Object]
8 anonymous() ["chrome://firebug/content/firebug.js":2252]
    this = [object Object]
9 anonymous() ["chrome://firebug/content/console/consolePanel.js":234]
    this = [object Object]
10 anonymous() ["chrome://firebug/content/chrome/panelActivation.js":94]
    this = [object Object]
11 anonymous() ["chrome://firebug/content/firebug.js":308]
    this = [object Object]
12 shutdownFirebug() ["chrome://firebug/content/firebug.js":2642]
    this = [object ChromeWindow @ 0x7fffd7a53da0 (native @ 0x7fffd7e9c880)]
Ok. The problem is that JSD weakly holds JSScripts, and depends on getting a finalization callback to know when to release a script. If it doesn't receive a callback when a script dies, it will hold onto and later use the script during a JSDContext's shutdown. (That happens whenever jsdService::Off() is called, which is not just at process shutdown.)

The problem is that the JS engine will only call destroy hooks on scripts that it invoked the creation hook on -- it sets a callDestroyHook flag when it calls the creation hook. But JSD can discover scripts in ways other than the creation hook -- in particular, it looks at the top of the stack and grabs the script there. If such a script predated turning on debugging, then JSD will have a script that it will not receive a finalization callback for.

I don't know why the call/destroy hooks need to be balanced. It was done as part of bug 674251, which turned JSScript into a gcthing. I'll need to either (1) undo that pairing; (2) somehow release these scripts earlier; or (3) make JSD use a GC-participating WeakMap or similar.

CC'ing jorendorff to ask whether Debugger depends on script create/destroy hooks being paired. I see a call to Debugger::onNewScript, so I'm guessing it has its own mechanism and doesn't care.
I experimented by removing the pairing, and that seems to have eliminated the crash. It does *not* fix the problem where the second test case seems to go on forever, popping up the slow script dialog. (If I execute the test case directly, it succeeds immediately and shows me the profile data.) So we'll need to split this bug into two.
Blocks: 770786
Thanks for the detailed investigation, Steve. Just wanted to let you know that I was following them and already mentioned the shutdown problem mentioned in comment 12 to Honza since he's more familiar to the code related to JSD than I am.

Sebastian
(In reply to Steve Fink [:sfink] from comment #12)
> One thing that seems unfortunate -- during shutdown, shutdownFirebug() in
> firebug.js ends up turning debugging back on.
Thanks Steve for the analyses!

I committed a patch that avoids JSD activation on Firebug shutdown at:
https://github.com/firebug/firebug/commit/c45a98470eed63da0cad44536ac4966673af1adf

(Will be in Firebug 1.10 beta 4 or 1.10 final)

Honza
Group: core-security
Marked s-s because this produces JSScript* pointers to freed memory, and via JSD you can access lots of stuff through them, and probably modify a fair bit too.
Status: NEW → ASSIGNED
Nobody I've talked to has come up with a reason this would break anything.
Attachment #640695 - Flags: review?(jorendorff)
On IRC, Honza said that he preferred to not receive JSD onScriptDestroyed callbacks for scripts that did not come in through onScriptCreated.

This patch goes a little further and exposes control through a modifiable flag, which turned out to be the most straightforward implementation but also allows the user to control the behavior if needed.
Attachment #640698 - Flags: review?(jorendorff)
Duplicate of this bug: 774579
Sorry for the duplication. I thought it might be another bug incidentally having the same signature.

Sebastian
Whiteboard: [js:p1:fx16][firebug-p1] → [js:p1:fx17][js:bumped:1][firebug-p1]
Attachment #640695 - Flags: review?(jorendorff) → review+
Attachment #640698 - Flags: review?(jorendorff) → review+
Two notes though:

- These are not really separate patches; they have to land together to avoid regressing other stuff.

- If it's possible to test this, that'd be good.
I attempted to write a test for this, since most JSD tests are now relatively easy to write due to xpcshell's -d flag.

But this one wasn't, because it requires creating a script before JSD is turned on (not possible in a test script because xpcshell -d starts out with debugging on), and destroying it during the test execution. And xpcshell tests never have an empty JS stack, so I can't toggle JSD on and off during a test.

I attempted to delete a builtin function (eg Math.atan) and force a GC, but it seems the script was still not deleted, probably because they're special in some way and are held onto in some tricky way. (dumpObject says the slot is null. DumpHeapComplete shows the original function still being encountered when tracing Math.)

If I can't come up with something soon, I'm just going to land.
Math.atan wasn't working because there were a bunch of different Math objects pointing to the same object, and I was only overriding one of them. I have no idea what the other ones are; I could see other globals having a Math, but this was the same pointer.

But this approach is a non-starter anyway, because all of the builtins that are created before debugging is turned on are natives anyway, so there's no JSScript to kill. Perhaps with self-hosting there might be something, but that just seems too fragile.

So the proper way to write the test would be to (1) write it as a mochitest or (2) extend the xpcshell test framework to allow changing the debugger state by registering a "continuation" and exiting out of all JS code. I am not willing to do #1 just for JSD. I would be willing to do #2 if it would be useful to Debugger, though it probably doesn't need xpcshell help and can use the JS shell instead, and that may not even need modification.
(In reply to Steve Fink [:sfink] from comment #24)
> http://hg.mozilla.org/integration/mozilla-inbound/rev/85698a1c20cb

https://hg.mozilla.org/mozilla-central/rev/85698a1c20cb
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Marking the ESR as unaffected based upon comment 8.
Group: core-security
You need to log in before you can comment on or make changes to this bug.