Closed Bug 1418797 Opened 7 years ago Closed 3 years ago

Crash in nsDisplayListBuilder::FreeClipChains

Categories

(Core :: Web Painting, defect, P3)

58 Branch
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox57 --- unaffected
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix

People

(Reporter: philipp, Unassigned)

References

Details

(4 keywords)

Crash Data

This bug was filed from the Socorro interface and is report bp-16e7b416-f852-4719-a35e-852d30171119. ============================================================= Top 7 frames of crashing thread: 0 xul.dll nsDisplayListBuilder::FreeClipChains layout/painting/nsDisplayList.cpp:1411 1 xul.dll nsDisplayListBuilder::EndFrame layout/painting/nsDisplayList.cpp:999 2 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:4034 3 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:4038 4 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:4038 5 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:4038 6 xul.dll style::bloom::BLOOM_KEY::__getit z:/build/build/src/obj-firefox/toolkit/library/rust/<__thread_local_inner macros>:6 ============================================================= this crash signature is newly showing up cross-platform since firefox 58
retained display lists are riding the trains in 58, but are preffed off for non-nightly. updating flags to reflect that.
This is super weird. The crash is within std::list code, and the crashing line (asm only) is basically: iter->_Next->_Prev = iter->_Prev; That crashes because iter->_Next is garbage (0xE5E5E5E5 in the report I looked at). The other values on iter (_Prev and _Myval) both look to be fine. These aren't values we control, so I don't see how they got corrupted without some sort of buffer overflow bug.
e5e5 is a UAF address, note. if you can repro with ASAN it would be easy to fix (or rr). Note that e5e5 could be uninitialized memory too, though I imagine that's unlikely here. If someone freed the list (or rather I guess the iterator), then iter->_Next (and iter->_Prev) would be e5e5 Started around Oct 24 or so in Nightly Also, even if retained-DL is disabled in 58, this spiked when it went to Beta, which implies this is a crash that happens even if retained-DL is turned off. I imagine not every line of the retained-DL change is under a pref, so it may have affected preffed-off usage in some unexpected way. Moving 58 back to ? since we're seeing crashes in 58 and I doubt they're all due to people who manually flipped the pref.
Group: core-security
Flags: needinfo?(matt.woodrow)
Group: core-security → layout-core-security
(In reply to Randell Jesup [:jesup] from comment #3) > Moving 58 back to ? since we're seeing crashes in 58 and I doubt they're all > due to people who manually flipped the pref. We have RDL enabled for 50% of Beta to shake out bugs like this one.
Priority: -- → P1
(note: we need to watch this one if crashes started before the RDL pref experiment, since it may affect release)
(In reply to Mike Taylor [:miketaylr] (58 Regression Engineering Owner) from comment #5) > (note: we need to watch this one if crashes started before the RDL pref > experiment, since it may affect release) When did that start? Can we filter crashes on "has the experiment" (likely no)?
according to bug 1416975 the experiment was live between 2017-12-04 and 2017-12-18.
There are a number of crash reports(42 so far) from 58.0b13(BID:20171226085105), the crash is not relevant to the enabling of RDL.
Matt, Jet -- this apparently is not a RDL issue; is there any chance of getting something for 58? This is a sec regression in 58 Perhaps interesting is that the first version I see it in is 58b12 (with limited visiblity, and of course most people take updates). That might not be super-relevant.
Flags: needinfo?(bugs)
If I had to guess I'd suspect something that landed to support RDL, and gets used (or is different than before) even with RDL preffed off.
(In reply to Randell Jesup [:jesup] from comment #9) > Matt, Jet -- this apparently is not a RDL issue; is there any chance of > getting something for 58? This is a sec regression in 58 > > Perhaps interesting is that the first version I see it in is 58b12 (with > limited visiblity, and of course most people take updates). That might not > be super-relevant. Is this because the reports from earlier got purged? The bug got filed 2 months ago, I'm pretty sure it's been present for the entirety of 58 beta. I had another look through the URLs list, but haven't managed to find any sort of reproducible test case yet.
Flags: needinfo?(matt.woodrow)
As Matt says, this came off the rails somewhere here, or earlier... > 6 xul.dll style::bloom::BLOOM_KEY::__getit ...as it's impossible for the CSS bloom filter to invoke painting directly. That is, the contents of the bloom filter have been trashed and we happen to get lucky enough to execute 7 more stack frames before crashing. It sucks that we've lost all the crash reports, though if this is indeed in the DL code, we should get new crash reports soon.
Flags: needinfo?(bugs)
Assignee: nobody → matt.woodrow
Crash Signature: [@ nsDisplayListBuilder::FreeClipChains] → [@ nsDisplayListBuilder::FreeClipChains] [@ std::_Hash<T>::equal_range | nsDisplayListBuilder::FreeClipChains]
(In reply to Matt Woodrow (:mattwoodrow) from comment #11) > (In reply to Randell Jesup [:jesup] from comment #9) > > Matt, Jet -- this apparently is not a RDL issue; is there any chance of > > getting something for 58? This is a sec regression in 58 > > > > Perhaps interesting is that the first version I see it in is 58b12 (with > > limited visiblity, and of course most people take updates). That might not > > be super-relevant. > > Is this because the reports from earlier got purged? The bug got filed 2 > months ago, I'm pretty sure it's been present for the entirety of 58 beta. > > I had another look through the URLs list, but haven't managed to find any > sort of reproducible test case yet. Crashes came back, as expected. Can you revisit?
Flags: needinfo?(matt.woodrow)
Matt, looking at comment 5 and comment 6, it seems you caught this in a debugger? It should be easier to spot and analyze if you can run this through an address sanitizer build (instructions on MDN) or with rr.
(In reply to Frederik Braun [:freddyb] from comment #14) > Matt, looking at comment 5 and comment 6, it seems you caught this in a > debugger? > It should be easier to spot and analyze if you can run this through an > address sanitizer build (instructions on MDN) or with rr. Do you mean comment 2? That was from analyzing a minidump from the crash report, I haven't been able to reproduce this at all. ASAN/rr indeed would be super useful if we manage to find a testcase for this.
Flags: needinfo?(matt.woodrow)
Blocks: 1467514
No longer blocks: 1467514
Still crashing with this signature in 62 beta 14 Some of the crashes are wildptrs, some are clear UAFs like https://crash-stats.mozilla.com/report/index/7f1df30b-6a0e-4a8e-8269-845f50180731 Any more ideas? Any assertions we can add?
Flags: needinfo?(matt.woodrow)
Looking at that dump, it's failing in it = mClipChainsToDestroy.erase(it) --- in _Pnode->_Prev->_Next = _Pnode->_Next (_Pnode->_Prev is probably 0xe5e5...). This is on MainThread... perhaps something is modifying the list off-main-thread? random-memory-trashing is possible, but seems unlikely.
(In reply to Randell Jesup [:jesup] from comment #17) > Looking at that dump, it's failing in it = mClipChainsToDestroy.erase(it) > --- in _Pnode->_Prev->_Next = _Pnode->_Next (_Pnode->_Prev is probably > 0xe5e5...). This is on MainThread... perhaps something is modifying the > list off-main-thread? random-memory-trashing is possible, but seems > unlikely. Great, that matches my conclusion from comment 2. I'm not aware of any other threads touching this, this code isn't threadsafe at all, so it would be really bad. Looks like this is low frequency, and declining, though I'm not sure why.
Flags: needinfo?(matt.woodrow)
Keywords: stalled
This content crash is still present at a low volume on release 62. Since we are about to release 63, and this is now marked as stalled, wontfix for 62/63.
Group: layout-core-security → gfx-core-security
Assignee: matt.woodrow → nobody

Seems very low frequency now.

Severity: critical → S3
Priority: P1 → P3

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:tnikkel, could you consider increasing the severity of this security bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(tnikkel)

Actually, not reports newer than 91. Let's just close this.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(tnikkel)
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Group: gfx-core-security
You need to log in before you can comment on or make changes to this bug.