Closed Bug 700915 Opened 13 years ago Closed 13 years ago

Very frequent crash in dromaeo_basics on 64-bit

Categories

(Core :: JavaScript Engine, defect)

x86_64
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla11
Tracking Status
firefox10 --- fixed

People

(Reporter: philor, Assigned: cdleary)

References

Details

(Keywords: intermittent-failure, Whiteboard: [qa-])

Attachments

(2 files)

Remember the Monday rush to stuff as much as we could into mozilla-central before it went to mozilla-aurora? Fun, wasn't it? Now it's time to pay for our fun.

Something that landed on mozilla-inbound before, but probably not too much before, https://hg.mozilla.org/integration/mozilla-inbound/rev/69f7d8cc0c00 gave us a 64-bit-only crash in dromaeo_basics. Mac seems to crash a bit more than Linux, but the Mac stacks are mostly worthless looking. First Linux failure was:

https://tbpl.mozilla.org/php/getParsedLog.php?id=7280113&tree=Mozilla-Inbound

...
Crash reason:  SIGSEGV
Crash address: 0x7fca32a17f68

Thread 0 (crashed)
 0  0x7fca32a17f68
    rbx = 0x27441c40   r12 = 0x00000000   r13 = 0x1bf2c020   r14 = 0x1bf2c028
    r15 = 0x21302000   rip = 0x32a17f68   rsp = 0x80f3c8a8   rbp = 0x1bf2c028

Thread 1
 0  libc-2.11.so + 0xda7f9
    rbx = 0x39224800   r12 = 0x38118c10   r13 = 0x38118c90   r14 = 0x38118c00
    r15 = 0x00000000   rip = 0xd2eda7f9   rsp = 0x38118b88   rbp = 0x39247000

Thread 2
 0  libpthread-2.11.so + 0xb04c
    rbx = 0x43454590   r12 = 0xffffffff   r13 = 0x00000000   r14 = 0x370fbe5f
    r15 = 0x00000000   rip = 0xd360b04c   rsp = 0x370fbd58   rbp = 0x3922e980

Thread 3
 0  libpthread-2.11.so + 0xb3b9
    rbx = 0x3922e888   r12 = 0x00000565   r13 = 0x366facd0   r14 = 0x00000001
    r15 = 0x00000000   rip = 0xd360b3b9   rsp = 0x366fac78   rbp = 0x434e6ab0

Thread 4
 0  libc-2.11.so + 0xd4aa3
    rbx = 0x03e7fc18   r12 = 0x00000002   r13 = 0x03e7fc18   r14 = 0x35b22c00
    r15 = 0x00000001   rip = 0xd2ed4aa3   rsp = 0x358fea40   rbp = 0x00000000

Thread 5
 0  libxul.so!js::gc::Arena::finalize<JSObject> [jsgc.h : 162 + 0x8]
    rbx = 0x22606000   r12 = 0x00000080   r13 = 0x22605fff   r14 = 0x22606000
    r15 = 0x346bbd40   rip = 0x40ef0e5a   rsp = 0x346bbd20   rbp = 0x22606000
 1  libxul.so!js::gc::FinalizeTypedArenas<JSObject> [jsgc.cpp : 349 + 0x10]
    rbx = 0x22605000   r12 = 0x0000000c   r13 = 0x00000080   r14 = 0x2ec7c0b0
    r15 = 0x346bbde0   rip = 0x40ef1a12   rsp = 0x346bbd90   rbp = 0x22622008
 2  libxul.so!js::gc::ArenaLists::backgroundFinalize [jsgc.cpp : 1509 + 0x6]
    rbx = 0x346bbde0   r12 = 0x2ec7c0b0   r13 = 0x2d47b000   r14 = 0x1f900000
    r15 = 0x00000003   rip = 0x40eece38   rsp = 0x346bbde0   rbp = 0x0000000c
 3  libxul.so!js::GCHelperThread::doSweep [jsgc.cpp : 2342 + 0xc]
    rbx = 0x35a687e8   r12 = 0x1f9fffd8   r13 = 0x35a22000   r14 = 0x1f900000
    r15 = 0x00000003   rip = 0x40eecf75   rsp = 0x346bbe20   rbp = 0x2745c430
 4  libxul.so!js::GCHelperThread::threadLoop [jsgc.cpp : 2224 + 0x7]
    rbx = 0x35a687e8   r12 = 0x35a22000   r13 = 0x412869b0   r14 = 0x1f900000


Since I have to start my witchhunt somewhere, and the first reasonable thing below that push (which was a backout-reland to fix a commit message) is https://hg.mozilla.org/integration/mozilla-inbound/rev/366d80e91816, I'm blaming (with absolutely no evidence other than a hunch, mind you) bug 634654.
https://tbpl.mozilla.org/php/getParsedLog.php?id=7275216&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278453&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7280749&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278595&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278927&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283168&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283745&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7285595&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7286638&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283371&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7284880&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7288903&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7295085&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7295477&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7296150&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7296513&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7297559&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7289441&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=7292838&tree=Firefox
Severity: normal → critical
(In reply to Phil Ringnalda (:philor) from comment #0)
> Remember the Monday rush to stuff as much as we could into mozilla-central
> before it went to mozilla-aurora? Fun, wasn't it? Now it's time to pay for
> our fun.

I'm ready to atone! Diagnostic patch coming up.
Assignee: general → cdleary
Status: NEW → ASSIGNED
Yeah, this is kind of weird. We're not supposed to be freeing any JSObjects that have finalizers on the background thread, and this stack has the background thread doing the freeing.

I guess I can make a patch to assert on that fact in the RegExpPrivate's decref. Will check out the background finalization code more tomorrow -- maybe billm has an idea of something I've done wrong?
If this doesn't trigger on inbound, it would be a strong indicator that my patch is not to blame.
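To illustrate the kind of diagnostic described above, here is a minimal sketch of a refcounted object whose decref asserts that it is not running on the background (helper) thread. This is not the attached patch: the class name `RefCountedPrivate`, its members, and the way the helper thread id is passed in are all hypothetical, and the real RegExpPrivate and GCHelperThread code may look quite different.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a refcounted private such as RegExpPrivate.
// Nothing here is taken from the actual SpiderMonkey sources.
class RefCountedPrivate {
    std::atomic<int> refCount_{1};
    // Assumption: the id of the background sweeping thread is known up front.
    std::thread::id helperThread_;

  public:
    explicit RefCountedPrivate(std::thread::id helperThread)
      : helperThread_(helperThread) {}

    void incref() { ++refCount_; }

    // Returns true when this call dropped the last reference.
    bool decref() {
        // Diagnostic: objects with finalizers should never be released
        // from the background thread, so crash loudly if that happens.
        assert(std::this_thread::get_id() != helperThread_ &&
               "decref called on the helper thread");
        return --refCount_ == 0;
    }
};
```

If an assertion like this never fires on the tree while the crash keeps happening, the decref path is probably not where the bad free comes from, which is the reasoning in the comment above.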
Attachment #573271 - Flags: review?(wmccloskey)
Comment on attachment 573271 [details] [diff] [review]
Diag: check that we're not decrefing on the helper thread.

The patch looks fine. I worry you may have to eat your words in comment 10, though :-).
Attachment #573271 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #11)
> I worry you may have to eat your words in comment 10,
> though :-).

Me too. Random memory corruption is always a possibility. We shall see!

https://hg.mozilla.org/integration/mozilla-inbound/rev/22c3e0ef9971
https://tbpl.mozilla.org/php/getParsedLog.php?id=7317326&full=1&branch=mozilla-inbound#error0 (after, but 10.7, with... not the most useful-looking stacks)
Attached patch JM regexp diag
OS X is not so cooperative with the stack info. We can see if the JM interaction is causing the trouble... worst case scenario we may get more stack info when we call out to the stub.
Attachment #573382 - Flags: review?(wmccloskey)
Attachment #573382 - Flags: review?(wmccloskey) → review+
I'm going to submit this tomorrow morning (or really late this evening) so as not to affect perf for nightly users.
(In reply to Matt Brubeck (:mbrubeck) from comment #22)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=7338024&tree=Mozilla-Inbound

Note: that's after the diagnostic patch. Makes the regexp stuff an unlikely culprit on the Linux side. Let's see if we get more OS X ones as well.
Backed out for not stopping the stackless crashes:

https://hg.mozilla.org/integration/mozilla-inbound/rev/7fceaa47fb90 (backout)
I think this should be fixed by the landing of bug 702426. Seems like we get a few per day, so I suppose we give it two days to bake and see?
Heh. Wait? Waiting is for people who lack either a retrigger link, or the willingness to abuse it. Me, I'm not proud. Or tired.

Fixed on the trunk by "one of those two fuzz bugs from last night" (I forget which they were now, but they landed on inbound on the 10th, and at the time I was certain they were what fixed it (knowing me, because I retriggered the piss out of them)), fixed on Aurora by the backouts in https://hg.mozilla.org/releases/mozilla-aurora/rev/3f725329f26d and friends.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to Phil Ringnalda (:philor) from comment #39)
> Heh. Wait? Waiting is for people who lack either a retrigger link, or the
> willingness to abuse it. Me, I'm not proud. Or tired.

You are indeed a righteous man, and I truly admire that about you.
Is this something QA can verify?
Whiteboard: [orange] → [orange][qa?]
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #41)
> Is this something QA can verify?

We could run the 64-bit dromaeo test a few times with a configuration similar (ideally equivalent) to the farm. Unfortunately, I'm not sure of the specifics there.
Based on comment 42, this bug is qa-. If someone can verify this fix on their own or can provide a simpler testcase, feel free to do so.
Whiteboard: [orange][qa?] → [orange][qa-]
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #43)
> Based on comment 42, this bug is qa-. If someone can verify this fix on
> their own or can provide a simpler testcase, feel free to do so.

For future reference, what are you looking for when you ask whether QA can verify? (i.e. what things are capable of being verified?) Maybe there are some docs out there that I haven't read.
(In reply to Chris Leary [:cdleary] from comment #44)

Will respond off-line to reduce noise on this bug.
Whiteboard: [orange][qa-] → [qa-]