Bug 700915 - Very frequent crash in dromaeo_basics on 64-bit
Status: RESOLVED FIXED
Whiteboard: [qa-]
Keywords: intermittent-failure
Product: Core
Classification: Components
Component: JavaScript Engine
Version: Trunk
Hardware: x86_64 All
Importance: -- critical
Target Milestone: mozilla11
Assigned To: Chris Leary [:cdleary] (not checking bugmail)
Mentors:
Depends on:
Blocks: 438871 634654 691797
Reported: 2011-11-08 19:50 PST by Phil Ringnalda (:philor, back in August)
Modified: 2012-11-25 19:31 PST
fixed


Attachments
Diag: check that we're not decrefing on the helper thread. (1.39 KB, patch)
2011-11-09 12:02 PST, Chris Leary [:cdleary] (not checking bugmail)
wmccloskey: review+
JM regexp diag (3.34 KB, patch)
2011-11-09 17:09 PST, Chris Leary [:cdleary] (not checking bugmail)
wmccloskey: review+

Description Phil Ringnalda (:philor, back in August) 2011-11-08 19:50:49 PST
Remember the Monday rush to stuff as much as we could into mozilla-central before it went to mozilla-aurora? Fun, wasn't it? Now it's time to pay for our fun.

Something that landed on mozilla-inbound before, but probably not too much before, https://hg.mozilla.org/integration/mozilla-inbound/rev/69f7d8cc0c00 gave us a 64-bit-only crash in dromaeo_basics. Mac seems to crash a bit more than Linux, but the Mac stacks are mostly worthless looking. First Linux failure was:

https://tbpl.mozilla.org/php/getParsedLog.php?id=7280113&tree=Mozilla-Inbound

...
Crash reason:  SIGSEGV
Crash address: 0x7fca32a17f68

Thread 0 (crashed)
 0  0x7fca32a17f68
    rbx = 0x27441c40   r12 = 0x00000000   r13 = 0x1bf2c020   r14 = 0x1bf2c028
    r15 = 0x21302000   rip = 0x32a17f68   rsp = 0x80f3c8a8   rbp = 0x1bf2c028

Thread 1
 0  libc-2.11.so + 0xda7f9
    rbx = 0x39224800   r12 = 0x38118c10   r13 = 0x38118c90   r14 = 0x38118c00
    r15 = 0x00000000   rip = 0xd2eda7f9   rsp = 0x38118b88   rbp = 0x39247000

Thread 2
 0  libpthread-2.11.so + 0xb04c
    rbx = 0x43454590   r12 = 0xffffffff   r13 = 0x00000000   r14 = 0x370fbe5f
    r15 = 0x00000000   rip = 0xd360b04c   rsp = 0x370fbd58   rbp = 0x3922e980

Thread 3
 0  libpthread-2.11.so + 0xb3b9
    rbx = 0x3922e888   r12 = 0x00000565   r13 = 0x366facd0   r14 = 0x00000001
    r15 = 0x00000000   rip = 0xd360b3b9   rsp = 0x366fac78   rbp = 0x434e6ab0

Thread 4
 0  libc-2.11.so + 0xd4aa3
    rbx = 0x03e7fc18   r12 = 0x00000002   r13 = 0x03e7fc18   r14 = 0x35b22c00
    r15 = 0x00000001   rip = 0xd2ed4aa3   rsp = 0x358fea40   rbp = 0x00000000

Thread 5
 0  libxul.so!js::gc::Arena::finalize<JSObject> [jsgc.h : 162 + 0x8]
    rbx = 0x22606000   r12 = 0x00000080   r13 = 0x22605fff   r14 = 0x22606000
    r15 = 0x346bbd40   rip = 0x40ef0e5a   rsp = 0x346bbd20   rbp = 0x22606000
 1  libxul.so!js::gc::FinalizeTypedArenas<JSObject> [jsgc.cpp : 349 + 0x10]
    rbx = 0x22605000   r12 = 0x0000000c   r13 = 0x00000080   r14 = 0x2ec7c0b0
    r15 = 0x346bbde0   rip = 0x40ef1a12   rsp = 0x346bbd90   rbp = 0x22622008
 2  libxul.so!js::gc::ArenaLists::backgroundFinalize [jsgc.cpp : 1509 + 0x6]
    rbx = 0x346bbde0   r12 = 0x2ec7c0b0   r13 = 0x2d47b000   r14 = 0x1f900000
    r15 = 0x00000003   rip = 0x40eece38   rsp = 0x346bbde0   rbp = 0x0000000c
 3  libxul.so!js::GCHelperThread::doSweep [jsgc.cpp : 2342 + 0xc]
    rbx = 0x35a687e8   r12 = 0x1f9fffd8   r13 = 0x35a22000   r14 = 0x1f900000
    r15 = 0x00000003   rip = 0x40eecf75   rsp = 0x346bbe20   rbp = 0x2745c430
 4  libxul.so!js::GCHelperThread::threadLoop [jsgc.cpp : 2224 + 0x7]
    rbx = 0x35a687e8   r12 = 0x35a22000   r13 = 0x412869b0   r14 = 0x1f900000


Since I have to start my witch hunt somewhere, and the first reasonable thing below that push (which was a backout-reland to fix a commit message) is https://hg.mozilla.org/integration/mozilla-inbound/rev/366d80e91816, I'm blaming (with absolutely no evidence other than a hunch, mind you) bug 634654.
Comment 1 Phil Ringnalda (:philor, back in August) 2011-11-08 20:03:05 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7275216&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278453&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7280749&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278595&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278927&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283168&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283745&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7285595&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7286638&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283371&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7284880&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7288903&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7295085&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7295477&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7296150&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7296513&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7297559&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7289441&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=7292838&tree=Firefox
Comment 2 Phil Ringnalda (:philor, back in August) 2011-11-08 21:49:18 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7299409&tree=Mozilla-Inbound
Comment 3 Chris Leary [:cdleary] (not checking bugmail) 2011-11-08 23:44:05 PST
(In reply to Phil Ringnalda (:philor) from comment #0)
> Remember the Monday rush to stuff as much as we could into mozilla-central
> before it went to mozilla-aurora? Fun, wasn't it? Now it's time to pay for
> our fun.

I'm ready to atone! Diagnostic patch coming up.
Comment 4 Chris Leary [:cdleary] (not checking bugmail) 2011-11-08 23:58:17 PST
Yeah, this is kind of weird. We're not supposed to be freeing any JSObjects that have finalizers on the background thread, and this stack has the background thread doing the freeing.

I guess I can make a patch to assert on that fact in the RegExpPrivate's decref. Will check out the background finalization code more tomorrow -- maybe billm has an idea of something I've done wrong?
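
For illustration only, the kind of check described above could look roughly like the sketch below. This is not the attached diagnostic patch and it does not use SpiderMonkey's actual types: `MainThreadRefCounted` and `ownerCheckFromHelperThread` are hypothetical names, and the sketch assumes the object records the thread that created it and asserts in `decref` that the caller is that same thread.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for a refcounted resource that must only be
// released on the thread that created it (not a real SpiderMonkey class).
class MainThreadRefCounted {
  public:
    // Remember the creating thread and start with one reference.
    MainThreadRefCounted() : owner_(std::this_thread::get_id()), refCount_(1) {}

    // True when the calling thread is the one that created the object.
    bool onOwnerThread() const { return std::this_thread::get_id() == owner_; }

    void incref() {
        assert(onOwnerThread() && "incref off the owner thread");
        ++refCount_;
    }

    // Drops one reference; returns true when the last reference is gone.
    // The first assert is the diagnostic: it fires if a helper thread
    // (for example a background finalizer) ever reaches this code.
    bool decref() {
        assert(onOwnerThread() && "decref on a helper thread");
        assert(refCount_ > 0);
        return --refCount_ == 0;
    }

    std::size_t refCount() const { return refCount_; }

  private:
    const std::thread::id owner_;
    std::size_t refCount_;
};

// Evaluates the ownership check from a freshly spawned thread, to show
// what the assertion would observe if a helper thread called decref.
inline bool ownerCheckFromHelperThread(const MainThreadRefCounted& obj) {
    bool result = true;
    std::thread helper([&] { result = obj.onOwnerThread(); });
    helper.join();
    return result;
}
```

Note that a plain `assert` compiles away when `NDEBUG` is defined, so a check like this only catches anything in builds that keep assertions enabled.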
Comment 7 Phil Ringnalda (:philor, back in August) 2011-11-09 09:09:52 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7309295&tree=Mozilla-Inbound
Comment 8 Phil Ringnalda (:philor, back in August) 2011-11-09 09:25:09 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7309753&tree=Mozilla-Inbound
Comment 9 Phil Ringnalda (:philor, back in August) 2011-11-09 10:05:48 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7310643&tree=Mozilla-Inbound
Comment 10 Chris Leary [:cdleary] (not checking bugmail) 2011-11-09 12:02:29 PST
Created attachment 573271 [details] [diff] [review]
Diag: check that we're not decrefing on the helper thread.

If this doesn't trigger on inbound, it would be a strong indicator that my patch is not to blame.
Comment 11 Bill McCloskey (:billm) 2011-11-09 12:11:50 PST
Comment on attachment 573271 [details] [diff] [review]
Diag: check that we're not decrefing on the helper thread.

The patch looks fine. I worry you may have to eat your words in comment 10, though :-).
Comment 12 Chris Leary [:cdleary] (not checking bugmail) 2011-11-09 12:25:59 PST
(In reply to Bill McCloskey (:billm) from comment #11)
> I worry you may have to eat your words in comment 10,
> though :-).

Me too. Random memory corruption is always a possibility. We shall see!

https://hg.mozilla.org/integration/mozilla-inbound/rev/22c3e0ef9971
Comment 13 Phil Ringnalda (:philor, back in August) 2011-11-09 12:30:42 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7313653&tree=Mozilla-Inbound (before)
Comment 14 Phil Ringnalda (:philor, back in August) 2011-11-09 15:46:56 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7317326&full=1&branch=mozilla-inbound#error0 (after, but 10.7, with... not the most useful-looking stacks)
Comment 16 Chris Leary [:cdleary] (not checking bugmail) 2011-11-09 17:09:11 PST
Created attachment 573382 [details] [diff] [review]
JM regexp diag

OS X is not so cooperative with the stack info. We can see if the JM interaction is causing the trouble... worst case scenario we may get more stack info when we call out to the stub.
Comment 17 Chris Leary [:cdleary] (not checking bugmail) 2011-11-09 17:56:43 PST
I'm going to submit this tomorrow morning (or really late this evening) so as not to affect perf for nightly users.
Comment 18 Marco Bonardo [::mak] (Away 6-20 Aug) 2011-11-10 03:22:04 PST
https://hg.mozilla.org/mozilla-central/rev/22c3e0ef9971
Comment 20 Phil Ringnalda (:philor, back in August) 2011-11-10 12:29:14 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7334982&tree=Mozilla-Inbound (finally, a non-Mac)
Comment 21 Chris Leary [:cdleary] (not checking bugmail) 2011-11-10 14:18:34 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe41fa77e51a (diagnostic patch)
Comment 23 Chris Leary [:cdleary] (not checking bugmail) 2011-11-10 15:22:32 PST
(In reply to Matt Brubeck (:mbrubeck) from comment #22)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=7338024&tree=Mozilla-Inbound

Note: that's after the diagnostic patch. Makes the regexp stuff an unlikely culprit on the Linux side. Let's see if we get more OS X ones as well.
Comment 24 Phil Ringnalda (:philor, back in August) 2011-11-10 16:58:41 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7340970&tree=Mozilla-Inbound
Comment 25 Chris Leary [:cdleary] (not checking bugmail) 2011-11-10 17:21:22 PST
Backed out for not stopping the stackless crashes:

https://hg.mozilla.org/integration/mozilla-inbound/rev/7fceaa47fb90 (backout)
Comment 26 Phil Ringnalda (:philor, back in August) 2011-11-10 18:17:48 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7342615&tree=Mozilla-Inbound
Comment 29 Phil Ringnalda (:philor, back in August) 2011-11-12 23:56:58 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7368945&tree=Mozilla-Aurora
Comment 30 Phil Ringnalda (:philor, back in August) 2011-11-13 23:59:08 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7372902&tree=Mozilla-Aurora
Comment 33 Phil Ringnalda (:philor, back in August) 2011-11-15 19:37:06 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7419002&tree=Mozilla-Aurora
Comment 34 Phil Ringnalda (:philor, back in August) 2011-11-15 19:41:59 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7418543&tree=Mozilla-Aurora
Comment 36 Phil Ringnalda (:philor, back in August) 2011-11-16 09:52:38 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7430657&tree=Mozilla-Aurora
Comment 37 Phil Ringnalda (:philor, back in August) 2011-11-16 10:38:11 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=7431145&tree=Mozilla-Aurora
Comment 38 Chris Leary [:cdleary] (not checking bugmail) 2011-11-16 16:44:32 PST
I think this should be fixed by the landing of bug 702426. Seems like we get a few per day, so I suppose we give it two days to bake and see?
Comment 39 Phil Ringnalda (:philor, back in August) 2011-11-16 23:13:05 PST
Heh. Wait? Waiting is for people who lack either a retrigger link, or the willingness to abuse it. Me, I'm not proud. Or tired.

Fixed on the trunk by "one of those two fuzz bugs from last night" (I forget which they were now, but they landed on inbound on the 10th, and at the time I was certain they were what fixed it (knowing me, because I retriggered the piss out of them)), fixed on Aurora by the backouts in https://hg.mozilla.org/releases/mozilla-aurora/rev/3f725329f26d and friends.
Comment 40 Chris Leary [:cdleary] (not checking bugmail) 2011-11-17 00:04:13 PST
(In reply to Phil Ringnalda (:philor) from comment #39)
> Heh. Wait? Waiting is for people who lack either a retrigger link, or the
> willingness to abuse it. Me, I'm not proud. Or tired.

You are indeed a righteous man, and I truly admire that about you.
Comment 41 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-28 14:18:15 PST
Is this something QA can verify?
Comment 42 Chris Leary [:cdleary] (not checking bugmail) 2012-01-04 10:31:44 PST
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #41)
> Is this something QA can verify?

We could run the 64-bit dromaeo tests a few times with a configuration similar (ideally equivalent) to the farm's. Unfortunately, I'm not sure of the specifics there.
Comment 43 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-01-04 11:10:42 PST
Based on comment 42, this bug is qa-. If someone can verify this fix on their own or can provide a simpler testcase, feel free to do so.
Comment 44 Chris Leary [:cdleary] (not checking bugmail) 2012-01-04 11:19:58 PST
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #43)
> Based on comment 42, this bug is qa-. If someone can verify this fix on
> their own or can provide a simpler testcase, feel free to do so.

For future reference, what are you looking for when you ask whether QA can verify? (i.e. what things are capable of being verified?) Maybe there are some docs out there that I haven't read.
Comment 45 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-01-04 12:20:45 PST
(In reply to Chris Leary [:cdleary] from comment #44)

Will respond off-line to reduce noise on this bug.
