Closed Bug 1514421 Opened 6 years ago Closed 5 years ago

Intermittent GECKO(1075) | Assertion failure: WeakMapBase::checkMarkingForZone(zone), at /builds/worker/workspace/build/src/js/src/gc/GC.cpp:5250

Categories

(Core :: JavaScript: GC, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox-esr60 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: sfink)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell fixed:backout])

Filed by: aciure [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=217064744&repo=mozilla-inbound https://queue.taskcluster.net/v1/task/CC8jcqqlTcGGuUhOXxl10A/runs/0/artifacts/public/logs/live_backing.log [task 2018-12-14T18:16:55.028Z] 18:16:55 INFO - GECKO(1075) | ++DOMWINDOW == 25 (0x7f43520ebc00) [pid = 1075] [serial = 239] [outer = 0x7f433cef9000] [task 2018-12-14T18:16:55.038Z] 18:16:55 INFO - GECKO(1075) | [Parent 1075, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/base/nsContentUtils.cpp, line 8213 [task 2018-12-14T18:16:55.333Z] 18:16:55 INFO - GECKO(1075) | WeakMap value is less marked than map and key [task 2018-12-14T18:16:55.334Z] 18:16:55 INFO - GECKO(1075) | (map 0x7f7d1abae2c8 is black, key 0x7f7d1a277040 is black, value 0x7f7d1c8570d0 is gray) [task 2018-12-14T18:16:55.336Z] 18:16:55 INFO - GECKO(1075) | WeakMap value is less marked than map and key [task 2018-12-14T18:16:55.337Z] 18:16:55 INFO - GECKO(1075) | (map 0x7f7d1abae2c8 is black, key 0x7f7d1a277088 is black, value 0x7f7d1c857130 is gray) [task 2018-12-14T18:16:55.338Z] 18:16:55 INFO - GECKO(1075) | WeakMap value is less marked than map and key [task 2018-12-14T18:16:55.340Z] 18:16:55 INFO - GECKO(1075) | (map 0x7f7d1abae2c8 is black, key 0x7f7d1a2770d0 is black, value 0x7f7d1add16d0 is gray) [task 2018-12-14T18:16:55.341Z] 18:16:55 INFO - GECKO(1075) | Assertion failure: WeakMapBase::checkMarkingForZone(zone), at /builds/worker/workspace/build/src/js/src/gc/GC.cpp:5250 [task 2018-12-14T18:16:55.552Z] 18:16:55 INFO - GECKO(1075) | ++DOCSHELL 0x7f433ce37800 == 7 [pid = 1075] [id = {60a165bb-9b1d-4089-88a7-7ecef110d632}] [task 2018-12-14T18:16:55.554Z] 18:16:55 INFO - GECKO(1075) | ++DOMWINDOW == 26 (0x7f4352325000) [pid = 1075] [serial = 240] [outer = (nil)] [task 2018-12-14T18:16:55.555Z] 18:16:55 INFO - GECKO(1075) | ++DOMWINDOW == 27 (0x7f4352325c00) [pid = 1075] [serial = 241] [outer = 0x7f4352325000] [task 2018-12-14T18:16:55.573Z] 18:16:55 INFO - GECKO(1075) | [Parent 1075, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/base/nsContentUtils.cpp, line 8213 [task 2018-12-14T18:16:55.576Z] 18:16:55 INFO - GECKO(1075) | ###!!! [Parent][MessageChannel] Error: (msgtype=0x1E0001,name=PBrowser::Msg_AsyncMessage) Channel error: cannot send/recv [task 2018-12-14T18:16:55.580Z] 18:16:55 INFO - GECKO(1075) | Handler function LocalDebuggerTransport instance's this.other.hooks.onPacket threw an exception: [Exception... "Unexpected error" nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)" location: "JS frame :: resource://devtools/shared/transport/child-transport.js :: send :: line 106" data: no] [task 2018-12-14T18:16:55.584Z] 18:16:55 INFO - GECKO(1075) | Stack: send@resource://devtools/shared/transport/child-transport.js:106:7 [task 2018-12-14T18:16:55.598Z] 18:16:55 INFO - GECKO(1075) | onPacket@resource://devtools/server/main.js:1303:11
Likely caused by bug 1463462.
Blocks: 1463462
Flags: needinfo?(jcoppeard)
Depends on: 1551275

There are 17 total failures in the last 2 days on linux64, macosx1010-64, windows10-64, windows7-32 all debug,

Recent occurences here seem to be from bug 1167452, Steve can you take a look?

Flags: needinfo?(sphink)
Regressed by: 1167452
Keywords: regression

(In reply to Andreea Pavel [:apavel] from comment #24)

There are 17 total failures in the last 2 days on linux64, macosx1010-64, windows10-64, windows7-32 all debug,

Recent occurences here seem to be from bug 1167452, Steve can you take a look?

Yes, that is definitely where this would be coming from.

Assignee: nobody → sphink
Flags: needinfo?(jcoppeard)

Steve, any updates or an eta for this? It is now on the disable recommended bugs list with 186 total failures in the last 30 days failing on linux32, linux64, macosx1010-64, windows7-32 & windows10-64 all debug builds.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=250788897&repo=autoland

I looked in a few logs and the assertion fails in different tests so we cannot disable this.

Joel would you mind taking a look?

Flags: needinfo?(jmaher)

this is a hard one, overall, I see many failures in:
toolkit/components/extensions/test/mochitest/

this started may 31st, can we retrigger and find a range- maybe win10/debug mochitest-e10s-5 ?

Flags: needinfo?(jmaher) → needinfo?(apavel)
Whiteboard: [stockwell disable-recommended] → [stockwell disable-recommended][retriggered]

given that this is a week with no response, I think it is reasonable to backout the offending patch.

Flags: needinfo?(jmaher)

Backing out the changes in bug 1167452 would mean backing out changes in 1556321 and 1556430 of which the last one is a security bug. Getting hunk failed on js/src/gc/GCMarker.h js/src/gc/Marking.cpp and js/src/gc/WeakMap-inl.h. Ideal would be that Steve takes a look at this and sorts it out.

(In reply to Cosmin Sabou [:CosminS] from comment #39)

Backing out the changes in bug 1167452 would mean backing out changes in 1556321 and 1556430 of which the last one is a security bug. Getting hunk failed on js/src/gc/GCMarker.h js/src/gc/Marking.cpp and js/src/gc/WeakMap-inl.h. Ideal would be that Steve takes a look at this and sorts it out.

Both of those are fixes for regressions that bug 1167452 introduced. That bug appears to have 8 regressions filed against it, half of which have been resolved and half of which have not.

I'm sympathetic to the difficulties of landing large changes, but we're also trying to set a stricter policy around backing out patches that trigger intermittent failures. The longer we wait, the harder it will be to back out the change. We should back it out now.

Sorry Steve. :-(

Flags: needinfo?(csabou)
Flags: needinfo?(csabou) → needinfo?(aryx.bugmail)

Sorry for the delay. I am currently aware of only two distinct regressions remaining. I do not think backing out just the last patch is going to resolve either of them. I think the 2nd half of the stack needs to come back out, sadly. And it looks like that's what ended up happening.

I'll probably close out everything and file a new bug, because half the patches in the original bug 1167452 are still landed. It's still a little messy, though, because one of those regressions (memory usage; bug 1556706) is from the 1st half of the stack, and is still in the tree. (Which is probably ok for now.) I'll comment further in bug 1167452.

Flags: needinfo?(sphink)
Whiteboard: [stockwell disable-recommended][retriggered] → [stockwell fixed:backout]

I've received complains about auto-magically disappearing WeakMap references in a way that's basically impossible to reliably reproduce.
The issue is here https://github.com/WebReflection/lighterhtml/issues/46 and there is a code pen https://codepen.io/WebReflection/pen/LvqPOR where sometimes, after N seconds, or minutes, or even hours, the template literal reference gets replaced and the button doesn't update 'cause it's node associated with it ends up in a limbo.

This never happens in Chrome, or even Safari, despite being affected by a similar bug (but it was easier to reproduce, and it involved WeakSet too).
https://bugs.webkit.org/show_bug.cgi?id=193764

If this bug has nothing to do with what I've described, apologies in advance, but there's still something wrong with current Firefox and in every OS, including mobile, when it comes to WeakMap, so I hope this is the bug I'm looking for, and there is an ETA for this, 'cause template literals based libraries are breaking badly.

Thanks in advance for any sort of help/hint/outcome.

Update: OK, I can confirm WeakMap cleans up when it shouldn't.

You can see the screenshot in here: https://github.com/WebReflection/lighterhtml/issues/46#issuecomment-501623645

The used JS bin is the following one: https://jsbin.com/peniburaho/1/edit?html,console,output

It was used to demonstrate a similar issue in WebKit: https://bugs.webkit.org/show_bug.cgi?id=190756

This is some sort of disaster for libraries that maps via WeakMap unique Template Literals to reflect some DOM node they represent.

As soon as the template literal is GC'd, the whole concept breaks. This is the third time the idea to collect unique template literals break my libraries (see https://medium.com/@WebReflection/a-tiny-disastrous-ecmascript-change-fadc05c83e69) , and the only option I have is to userAgent sniff Firefox and penalize it like I had to do fro WebKit/Safari and TypeScript badly transpiled template literals ( see https://github.com/ungap/template-literal/blob/master/index.js#L6-L23 )

If the eta is short, I might not penalize Firefox (it'd be a pity, it worked so well so far), but otherwise I'd like to have an estimated release number so I can maybe confine the patch in between ranges.

Thank You.

P.S. not sure this is a red herring or anything, but it looks like having the console open doesn't suffer the same issue, at least in the code pen they filed as bug.

Could you please take a look over what Andrea is saying in the above comments? Thank you.

Flags: needinfo?(sphink)

Update: I've filed a proper bug https://bugzilla.mozilla.org/show_bug.cgi?id=1559123

The way I am solving this, if anyone passing by is interested, is by using a WeakMap to store template literals as key once, performing the TL permutation as uunique string once, to return an equivalent Array that represent the template literal so that whenever it's collected, even if its surrounding closures are still used in the code, the operation will be heavy once per collection, but the returning array will always be the same, resurrecting the collected key from its own hashes.

You can see the implementation here: https://github.com/ungap/template-literal/blob/master/esm/index.js#L15-L39

This will penalize Firefox only on cold template literals usage, but it keeps natural WeakMap performance after.

However, I wish template literals were not collected when surrounding closure/scope is still actively invoked/used in the code.

Best Regards

(In reply to Cosmin Sabou [:CosminS] from comment #47)

Could you please take a look over what Andrea is saying in the above comments? Thank you.

It appears to be unrelated. The proper bug for that behavior is bug 1559123 and as mccr8 says there, it seems to be a problem with template literal retention, unrelated to weakmaps.

Flags: needinfo?(sphink)

Failures have dropped off since incremental weakmap marking was backed out (bug 1167452). Closing this bug, please file a new one if this failure recurs.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
See Also: → 1556933
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.