Closed Bug 1346660 Opened 3 years ago Closed 2 years ago

Crash in unsigned char* js::ProtectedReallocPolicy::maybe_pod_realloc<T>

Categories

(Core :: JavaScript Engine, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 1124397
Tracking Status
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: marcia, Assigned: ehoogeveen)

References

(Depends on 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-10f2e2d5-11e3-4629-83cb-1dbda2170306.
=============================================================

Seen while looking at nightly Mac stats - crashes started using Build 20170303030202: http://bit.ly/2lRnr2n. Total of 5 crashes with several different installations.

Appears as if there is one Windows crash as well, with a slightly different signature: [@ js::ProtectedReallocPolicy::maybe_pod_realloc<T> ]

Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e91de6fb2b3dce9c932428265b0fdb630ea470d7&tochange=9732cd019a8b94c49a275661320c1b742635a3d6
Flags: needinfo?(emanuel.hoogeveen)
Yeah, I've been keeping an eye on these. Unfortunately none of the other assertions are hitting, so I have no idea what could be causing realloc to fail here.

The crash rate seems pretty low compared to what it was before, so it might be worth checking that we're seeing the poison pattern here, and not random bitflips due to faulty RAM.
Flags: needinfo?(emanuel.hoogeveen)
This does the quick check from an earlier version of the patch in bug 1332594, then does a full check to give us the exact range.

I'm not sure how much value this really adds - it seems pretty unlikely to me that these crashes *aren't* due to the poison pattern we saw before. Unfortunately I really don't know what else to do at this point.
Assignee: nobody → emanuel.hoogeveen
Attachment #8848490 - Flags: review?(jdemooij)
I wonder if we should start thinking about landing a mitigation. Something about jemalloc's realloc seems to be subtly busted, whether it's a bug in the code that we haven't spotted or a miscompilation, and I suspect it won't get fixed until we switch to jemalloc4 (I can keep digging, but at this point I'm not hopeful). When we switched from realloc to malloc/memcpy/free, the crashes went away.

Thankfully all our malloc policies use js_pod_realloc(), which takes the old size as well as the new one [1], so we could circumvent jemalloc's realloc in almost all cases by adjusting that function. We'd lose the benefit of in-place reallocation, but that's not something you can generally rely on anyway. Do you think this might be worth doing? If it works, and I think it will, it might be worth uplifting all the way to ESR.

[1] https://dxr.mozilla.org/mozilla-central/source/js/public/Utility.h#435
Flags: needinfo?(jdemooij)
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #3)
> Thankfully all our malloc policies use js_pod_realloc(), which takes the old
> size as well as the new one [1], so we could circumvent jemalloc's realloc
> in almost all cases by adjusting that function. We'd lose the benefit of
> in-place reallocation, but that's not something you can generally rely on
> anyway. Do you think this might be worth doing?

Hm I think it would be unfortunate to replace realloc with malloc/free to work around this bug. Not sure what else we can do though :(
Flags: needinfo?(jdemooij)
Attachment #8848490 - Flags: review?(jdemooij) → review+
Alright, let's see if this tells us anything surprising: https://treeherder.mozilla.org/#/jobs?repo=try&revision=38260bb5572890f78476a85046d6e7e28bf7d887
Status: NEW → ASSIGNED
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/706d07027f02
Confirm that the poison pattern is present in the buffer after reallocation. r=jandem
Keywords: checkin-needed
Comment on attachment 8848490 [details] [diff] [review]
Confirm that the poison pattern is present in the buffer after reallocation.

I'd like to uplift this to get some more information as the crash rate on Aurora is something like 8x as high (89 crashes in 13 days compared to 22 crashes in 27 days for Nightly).

Approval Request Comment
[Feature/Bug causing the regression]: This is a crash we've been tracking for a long time.
[User impact if declined]: The crash rate on Aurora is about 8x as high as on Nightly, so we'd have to wait a lot longer to get any kind of statistical sample.
[Is this code covered by automated tests?]: Not directly, but the JITs rely heavily on these buffers.
[Has the fix been verified in Nightly?]: Yes, this has been on Nightly for about a day and we've seen a single crash.
[Needs manual test from QE? If yes, steps to reproduce]: n/a
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This code is not used on Beta or Release. It is also purely a diagnostic check.
[String changes made/needed]: n/a
Attachment #8848490 - Flags: approval-mozilla-aurora?
Can I get an update here? Is this approval request not showing up in queues somehow?
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #9)
> Can I get an update here? Is this approval request not showing up in queues
> somehow?

Maybe because the bug is still open?
Hmm, well let's cheat a little and find out ;) This crash is also tracked by bug 1124397 so we're not really losing anything by closing this.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Depends on: 1124397
Keywords: leave-open
Resolution: --- → DUPLICATE
Duplicate of bug: 1124397
Comment on attachment 8848490 [details] [diff] [review]
Confirm that the poison pattern is present in the buffer after reallocation.

We can take this for help get more data. Aurora54+.
Attachment #8848490 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.