Closed Bug 1596943 Opened 5 years ago Closed 3 years ago

Intermittent dom/base/test/test_bug416383.html | application crashed [@ js::WeakMap<js::HeapPtr<JSObject*>, js::HeapPtr<JS::Value> >::markKey(js::GCMarker*, js::gc::Cell*, js::gc::Cell*)]

Categories

(Core :: JavaScript: GC, defect, P1)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: intermittent-bug-filer, Assigned: sfink)

References

(Regression)

Details

(Keywords: crash, intermittent-failure, regression)

Crash Data

Attachments

(1 file)

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=276477166&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ULVF06rLSXGZ3ya9bgECpw/runs/0/artifacts/public/logs/live_backing.log


[task 2019-11-15T18:44:33.486Z] 18:44:33 WARNING - PROCESS-CRASH | dom/base/test/test_bug416383.html | application crashed [@ js::WeakMap<js::HeapPtr<JSObject*>, js::HeapPtr<JS::Value> >::markKey(js::GCMarker*, js::gc::Cell*, js::gc::Cell*)]
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - Crash dump filename: /tmp/tmp1GMDO9/4521f725-f951-4dc3-24c5-8d7150847e15.dmp
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - Operating system: Android
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - 0.0.0 Linux 3.10.0+ #260 SMP PREEMPT Fri May 19 12:48:14 PDT 2017 x86_64
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - CPU: amd64
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - family 6 model 6 stepping 3
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - 4 CPUs
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - GPU: UNKNOWN
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - Crash address: 0x0
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - Process uptime: not available
[task 2019-11-15T18:44:33.487Z] 18:44:33 INFO - Thread 11 (crashed)
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - 0 libxul.so!js::WeakMap<js::HeapPtr<JSObject*>, js::HeapPtr<JS::Value> >::markKey(js::GCMarker*, js::gc::Cell*, js::gc::Cell*) [WeakMap-inl.h:9e78205ab7d0eac06ad757db1fc605b3e2c7a75c : 114 + 0x29]
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - rax = 0x00007de341831391 rdx = 0x0000000000000004
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - rcx = 0x00007de35bd15a80 rbx = 0x00007de33292e2e0
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - rsi = 0x00007de34452ca10 rdi = 0x000000000000001b
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - rbp = 0x00007de34452d110 rsp = 0x00007de34452d0c0
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - r8 = 0x0000000000000000 r9 = 0x00007de361b88090
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - r10 = 0x0000000000000012 r11 = 0x0000000000000246
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - r12 = 0x00007de34452d0d8 r13 = 0x0000000000000003
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - r14 = 0x00007de33292e318 r15 = 0x00ffffffffffffff
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - rip = 0x00007de33ea20e45
[task 2019-11-15T18:44:33.488Z] 18:44:33 INFO - Found by: given as instruction pointer in context
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - 1 libxul.so!void js::GCMarker::markImplicitEdgesHelper<JSObject*>(JSObject*) [Marking.cpp:9e78205ab7d0eac06ad757db1fc605b3e2c7a75c : 659 + 0x13]
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - rbp = 0x00007de34452d160 rsp = 0x00007de34452d120
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - rip = 0x00007de33ed528d3
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - Found by: previous frame's frame pointer
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - 2 libxul.so!js::GCMarker::processMarkStackTop(js::SliceBudget&) [Marking.cpp:9e78205ab7d0eac06ad757db1fc605b3e2c7a75c : 1861 + 0xf]
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - rbp = 0x00007de34452d210 rsp = 0x00007de34452d170
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - rip = 0x00007de33ed57659
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - Found by: previous frame's frame pointer
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - 3 libxul.so!js::GCMarker::markUntilBudgetExhausted(js::SliceBudget&) [Marking.cpp:9e78205ab7d0eac06ad757db1fc605b3e2c7a75c : 1662 + 0xb]
[task 2019-11-15T18:44:33.489Z] 18:44:33 INFO - rbp = 0x00007de34452d250 rsp = 0x00007de34452d220
[task 2019-11-15T18:44:33.490Z] 18:44:33 INFO - rip = 0x00007de33ed57f11
[task 2019-11-15T18:44:33.490Z] 18:44:33 INFO - Found by: previous frame's frame pointer
[task 2019-11-15T18:44:33.490Z] 18:44:33 INFO - 4 libxul.so!void js::gc::GCRuntime::markWeakReferences<js::gc::SweepGroupZonesIter>(js::gcstats::PhaseKind) [GC.cpp:9e78205ab7d0eac06ad757db1fc605b3e2c7a75c : 4170 + 0x4c]
[task 2019-11-15T18:44:33.490Z] 18:44:33 INFO - rbp = 0x00007de34452d2c0 rsp = 0x00007de34452d260
[task 2019-11-15T18:44:33.490Z] 18:44:33 INFO - rip = 0x00007de33ed01736
[task 2019-11-15T18:44:33.490Z] 18:44:33 INFO - Found by: previous frame's frame pointer
[task 2019-11-15T18:44:33.490Z] 18:44:33 INFO - 5 libxul.so!js::gc::GCRuntime::endMarkingSweepGroup(JSFreeOp*, js::SliceBudget&) [GC.cpp:9e78205ab7d0eac06ad757db1fc605b3e2c7a75c : 5073 + 0xd]

Perhaps related to recent weakmap changes.

Flags: needinfo?(sphink)
Priority: -- → P1
Assignee: nobody → sphink
Status: NEW → ASSIGNED

The original crash here is a null dereference in the line MOZ_ASSERT(p.found()) (where p is the result of looking up a remembered key in a WeakMap). I have no idea how that could get a null deref (rather than hitting the assertion, for example), but it's probably some optimizer-rearranged code or something.

Many of the intermittents being associated with this bug are actually for a different line of code, MOZ_ASSERT(mapColor) in markKey. I'm kind of baffled as to how that could happen. Entries are added to the weak keys tables in one function, addWeakEntry(), which is only called by markEntries(), which has its own MOZ_ASSERT(mapColor). (And the maps inserted into the weak keys tables are this, so it's being called on the same thing.) But at some point, the maps are getting unmarked (or invalidated by being swept, maybe?) and then one of their keys or keys' delegates is marked, triggering the lookup and subsequent assertion. I checked to be sure the weak key tables are being cleared at the end of a GC, and it does appear that they are (in SweepWeakMaps).

There is a chance my fix in bug 1597206 will fix this, though I honestly don't see why it would. (And I hope to revert the most relevant part when I turn on incremental marking.)

I'm going to land a diagnostics patch to try to get more information.

Flags: needinfo?(sphink)
Keywords: leave-open
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dead6c366e3e DEBUG-only diagnostics for weakmap marking problems r=jonco
Regressed by: 1597206

The volume has dropped very significantly. Actually, the mapColor assertion seems to be gone.

There is a chance my fix in bug 1597206 will fix this, though I honestly don't see why it would. (And I hope to revert the most relevant part when I turn on incremental marking.)

In retrospect, I think that's exactly what happened. Bug 1593399 accidentally started populating the weakKeys table during incremental GCs, even though it still discarded the table when weak marking mode was entered. Bug 1597206 reverted this, since it was an unintended and useless waste of time until we have incremental weakmap marking -- which strongly implies that the assertion is happening during this "extra" marking.

I still haven't tracked down the exact culprit here, but this gives me a lot more to go on. It also gives me much more confidence in the current state of the code (and the need to backport bug 1597206). I still need to figure this out before landing incremental weakmap marking, but it appears we're now in a good state.

There was a single crash of the other sort, the non-assertion version that this bug was originally filed for. It's on Android, and I don't have an explanation for it. But at least the volume is also much less, and it seems plausible that it could be unrelated to any of the changes discussed in this comment.

Hey Steve, was looking at this bug and I don't see any crashes of this type anymore. It looked like you were still going to look into something related? Or should I just close it out?

Flags: needinfo?(sphink)

I would say that especially given the additional landing of bug 1694538, this bug is probably either gone or will appear differently enough that this bug report won't be useful. Thanks for looking at it.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(sphink)
Resolution: --- → WORKSFORME
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: