Very frequent crash in dromaeo_basics on 64-bit

RESOLVED FIXED in Firefox 10

Status

()

Core
JavaScript Engine
--
critical
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: philor, Assigned: cdleary)

Tracking

({intermittent-failure})

Trunk
mozilla11
x86_64
All
intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox10 fixed)

Details

(Whiteboard: [qa-])

Attachments

(2 attachments)

(Reporter)

Description

6 years ago
Remember the Monday rush to stuff as much as we could into mozilla-central before it went to mozilla-aurora? Fun, wasn't it? Now it's time to pay for our fun.

Something that landed on mozilla-inbound before https://hg.mozilla.org/integration/mozilla-inbound/rev/69f7d8cc0c00, but probably not too much before it, gave us a 64-bit-only crash in dromaeo_basics. Mac seems to crash a bit more often than Linux, but the Mac stacks mostly look worthless. The first Linux failure was:

https://tbpl.mozilla.org/php/getParsedLog.php?id=7280113&tree=Mozilla-Inbound

...
Crash reason:  SIGSEGV
Crash address: 0x7fca32a17f68

Thread 0 (crashed)
 0  0x7fca32a17f68
    rbx = 0x27441c40   r12 = 0x00000000   r13 = 0x1bf2c020   r14 = 0x1bf2c028
    r15 = 0x21302000   rip = 0x32a17f68   rsp = 0x80f3c8a8   rbp = 0x1bf2c028

Thread 1
 0  libc-2.11.so + 0xda7f9
    rbx = 0x39224800   r12 = 0x38118c10   r13 = 0x38118c90   r14 = 0x38118c00
    r15 = 0x00000000   rip = 0xd2eda7f9   rsp = 0x38118b88   rbp = 0x39247000

Thread 2
 0  libpthread-2.11.so + 0xb04c
    rbx = 0x43454590   r12 = 0xffffffff   r13 = 0x00000000   r14 = 0x370fbe5f
    r15 = 0x00000000   rip = 0xd360b04c   rsp = 0x370fbd58   rbp = 0x3922e980

Thread 3
 0  libpthread-2.11.so + 0xb3b9
    rbx = 0x3922e888   r12 = 0x00000565   r13 = 0x366facd0   r14 = 0x00000001
    r15 = 0x00000000   rip = 0xd360b3b9   rsp = 0x366fac78   rbp = 0x434e6ab0

Thread 4
 0  libc-2.11.so + 0xd4aa3
    rbx = 0x03e7fc18   r12 = 0x00000002   r13 = 0x03e7fc18   r14 = 0x35b22c00
    r15 = 0x00000001   rip = 0xd2ed4aa3   rsp = 0x358fea40   rbp = 0x00000000

Thread 5
 0  libxul.so!js::gc::Arena::finalize<JSObject> [jsgc.h : 162 + 0x8]
    rbx = 0x22606000   r12 = 0x00000080   r13 = 0x22605fff   r14 = 0x22606000
    r15 = 0x346bbd40   rip = 0x40ef0e5a   rsp = 0x346bbd20   rbp = 0x22606000
 1  libxul.so!js::gc::FinalizeTypedArenas<JSObject> [jsgc.cpp : 349 + 0x10]
    rbx = 0x22605000   r12 = 0x0000000c   r13 = 0x00000080   r14 = 0x2ec7c0b0
    r15 = 0x346bbde0   rip = 0x40ef1a12   rsp = 0x346bbd90   rbp = 0x22622008
 2  libxul.so!js::gc::ArenaLists::backgroundFinalize [jsgc.cpp : 1509 + 0x6]
    rbx = 0x346bbde0   r12 = 0x2ec7c0b0   r13 = 0x2d47b000   r14 = 0x1f900000
    r15 = 0x00000003   rip = 0x40eece38   rsp = 0x346bbde0   rbp = 0x0000000c
 3  libxul.so!js::GCHelperThread::doSweep [jsgc.cpp : 2342 + 0xc]
    rbx = 0x35a687e8   r12 = 0x1f9fffd8   r13 = 0x35a22000   r14 = 0x1f900000
    r15 = 0x00000003   rip = 0x40eecf75   rsp = 0x346bbe20   rbp = 0x2745c430
 4  libxul.so!js::GCHelperThread::threadLoop [jsgc.cpp : 2224 + 0x7]
    rbx = 0x35a687e8   r12 = 0x35a22000   r13 = 0x412869b0   r14 = 0x1f900000


Since I have to start my witch hunt somewhere, and the first reasonable thing below that push (which was a backout-reland to fix a commit message) is https://hg.mozilla.org/integration/mozilla-inbound/rev/366d80e91816, I'm blaming (with absolutely no evidence other than a hunch, mind you) bug 634654.
(Reporter)

Comment 1

6 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7275216&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278453&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7280749&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278595&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7278927&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283168&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283745&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7285595&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7286638&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7283371&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7284880&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7288903&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7295085&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7295477&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7296150&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7296513&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7297559&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7289441&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=7292838&tree=Firefox
Severity: normal → critical
(Reporter)

Comment 2

6 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7299409&tree=Mozilla-Inbound
(In reply to Phil Ringnalda (:philor) from comment #0)
> Remember the Monday rush to stuff as much as we could into mozilla-central
> before it went to mozilla-aurora? Fun, wasn't it? Now it's time to pay for
> our fun.

I'm ready to atone! Diagnostic patch coming up.
Assignee: general → cdleary
Status: NEW → ASSIGNED
Yeah, this is kind of weird. We're not supposed to be freeing any JSObjects that have finalizers on the background thread, and this stack has the background thread doing the freeing.

I guess I can make a patch to assert on that fact in the RegExpPrivate's decref. Will check out the background finalization code more tomorrow -- maybe billm has an idea of something I've done wrong?
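
For illustration only, the kind of assertion described above might look roughly like the sketch below. This is not the attached patch and not SpiderMonkey code: RefCountedPrivate, ownedBy, incref, decref, and refCount are hypothetical names, and the sketch assumes the object simply remembers the thread that created it and asserts that every decref comes from that same thread.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a ref-counted private such as RegExpPrivate.
// It is NOT the real SpiderMonkey class; names are illustrative only.
class RefCountedPrivate {
  public:
    // Remember the creating thread so later calls can be checked against it.
    RefCountedPrivate() : refCount_(1), owner_(std::this_thread::get_id()) {}

    // True when the given thread id is the thread that created the object.
    bool ownedBy(std::thread::id id) const { return id == owner_; }

    void incref() {
        assert(ownedBy(std::this_thread::get_id()));
        ++refCount_;
    }

    // Returns true when the last reference was dropped.
    // The first assert is the diagnostic: it fires if a helper
    // (background) thread ever ends up doing the decref.
    bool decref() {
        assert(ownedBy(std::this_thread::get_id()));
        assert(refCount_ > 0);
        return --refCount_ == 0;
    }

    int refCount() const { return refCount_; }

  private:
    int refCount_;
    std::thread::id owner_;
};
```

If an assertion of this shape never fires on the helper thread in a debug build, that suggests the decref path is not where the background finalizer goes wrong, which is the kind of signal a diagnostic patch is after.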
Blocks: 691797
(Reporter)

Comment 5

6 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7306697&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7306531&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7303054&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7302289&tree=Mozilla-Inbound
(Reporter)

Comment 6

6 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7298806&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=7298858&tree=Firefox
(Reporter)

Comment 7

6 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7309295&tree=Mozilla-Inbound
(Reporter)

Comment 8

6 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7309753&tree=Mozilla-Inbound
(Reporter)

Comment 9

6 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7310643&tree=Mozilla-Inbound
Created attachment 573271 [details] [diff] [review]
Diag: check that we're not decrefing on the helper thread.

If this doesn't trigger on inbound, it would be a strong indicator that my patch is not to blame.
Attachment #573271 - Flags: review?(wmccloskey)
Comment on attachment 573271 [details] [diff] [review]
Diag: check that we're not decrefing on the helper thread.

The patch looks fine. I worry you may have to eat your words in comment 10, though :-).
Attachment #573271 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #11)
> I worry you may have to eat your words in comment 10,
> though :-).

Me too. Random memory corruption is always a possibility. We shall see!

https://hg.mozilla.org/integration/mozilla-inbound/rev/22c3e0ef9971
(Reporter)

Comment 13

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7313653&tree=Mozilla-Inbound (before)
(Reporter)

Comment 14

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7317326&full=1&branch=mozilla-inbound#error0 (after, but 10.7, with... not the most useful-looking stacks)
(Reporter)

Comment 15

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7318136&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7318056&tree=Mozilla-Inbound
Created attachment 573382 [details] [diff] [review]
JM regexp diag

OS X is not so cooperative with the stack info. We can see if the JM interaction is causing the trouble... worst case scenario we may get more stack info when we call out to the stub.
Attachment #573382 - Flags: review?(wmccloskey)
Attachment #573382 - Flags: review?(wmccloskey) → review+
I'm going to submit this tomorrow morning (or really late this evening) so as not to affect perf for nightly users.
https://hg.mozilla.org/mozilla-central/rev/22c3e0ef9971
Target Milestone: --- → mozilla11
https://tbpl.mozilla.org/php/getParsedLog.php?id=7333705&tree=Mozilla-Aurora
(Reporter)

Comment 20

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7334982&tree=Mozilla-Inbound (finally, a non-Mac)
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe41fa77e51a (diagnostic patch)
https://tbpl.mozilla.org/php/getParsedLog.php?id=7338024&tree=Mozilla-Inbound
(In reply to Matt Brubeck (:mbrubeck) from comment #22)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=7338024&tree=Mozilla-Inbound

Note: that's after the diagnostic patch. Makes the regexp stuff an unlikely culprit on the Linux side. Let's see if we get more OS X ones as well.
(Reporter)

Comment 24

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7340970&tree=Mozilla-Inbound
Backed out for not stopping the stackless crashes:

https://hg.mozilla.org/integration/mozilla-inbound/rev/7fceaa47fb90 (backout)
(Reporter)

Comment 26

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7342615&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7353037&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=7359562&tree=Mozilla-Aurora
(Reporter)

Comment 29

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7368945&tree=Mozilla-Aurora
(Reporter)

Comment 30

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7372902&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=7386331&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=7348666&tree=Build-System
(Reporter)

Comment 33

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7419002&tree=Mozilla-Aurora
(Reporter)

Comment 34

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7418543&tree=Mozilla-Aurora
(Reporter)

Comment 35

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7428962&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=7429006&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=7427098&tree=Mozilla-Aurora
(Reporter)

Comment 36

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7430657&tree=Mozilla-Aurora
(Reporter)

Comment 37

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7431145&tree=Mozilla-Aurora
I think this should be fixed by the landing of bug 702426. Seems like we get a few per day, so I suppose we give it two days to bake and see?
(Reporter)

Comment 39

5 years ago
Heh. Wait? Waiting is for people who lack either a retrigger link, or the willingness to abuse it. Me, I'm not proud. Or tired.

Fixed on the trunk by "one of those two fuzz bugs from last night" (I forget which they were now, but they landed on inbound on the 10th, and at the time I was certain they were what fixed it (knowing me, because I retriggered the piss out of them)), fixed on Aurora by the backouts in https://hg.mozilla.org/releases/mozilla-aurora/rev/3f725329f26d and friends.
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
status-firefox10: --- → fixed
Resolution: --- → FIXED
(In reply to Phil Ringnalda (:philor) from comment #39)
> Heh. Wait? Waiting is for people who lack either a retrigger link, or the
> willingness to abuse it. Me, I'm not proud. Or tired.

You are indeed a righteous man, and I truly admire that about you.
Is this something QA can verify?
Whiteboard: [orange] → [orange][qa?]
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #41)
> Is this something QA can verify?

We could run the 64-bit dromaeo tests a few times with a configuration similar (ideally equivalent) to the farm's. Unfortunately, I'm not sure of the specifics there.
Based on comment 42, this bug is qa-. If someone can verify this fix on their own or can provide a simpler testcase, feel free to do so.
Whiteboard: [orange][qa?] → [orange][qa-]
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #43)
> Based on comment 42, this bug is qa-. If someone can verify this fix on
> their own or can provide a simpler testcase, feel free to do so.

For future reference, what are you looking for when you ask whether QA can verify? (i.e. what things are capable of being verified?) Maybe there are some docs out there that I haven't read.
(In reply to Chris Leary [:cdleary] from comment #44)

Will respond off-line to reduce noise on this bug.
Keywords: intermittent-failure
Whiteboard: [orange][qa-] → [qa-]