Closed Bug 1442018 Opened 7 years ago Closed 2 years ago

Crashes in BCPaintBorderIterator::SetNewData dereferencing null mCellMap

Categories

(Core :: Layout: Tables, defect, P2)

defect

Tracking

()

RESOLVED FIXED
121 Branch
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox-esr68 --- wontfix
firefox-esr115 --- fixed
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox85 --- wontfix
firefox86 --- wontfix
firefox119 --- wontfix
firefox120 --- wontfix
firefox121 --- fixed

People

(Reporter: philipp, Assigned: jfkthame)

References

Details

(Keywords: crash)

Crash Data

Attachments

(8 files)

This bug was filed from the Socorro interface and is report bp-36b61c65-5c49-4dab-93e6-ae0df0180228. ============================================================= Top 10 frames of crashing thread: 0 xul.dll BCPaintBorderIterator::SetNewData layout/tables/nsTableFrame.cpp:6974 1 xul.dll BCPaintBorderIterator::Next layout/tables/nsTableFrame.cpp:7128 2 xul.dll nsTableFrame::IterateBCBorders layout/tables/nsTableFrame.cpp:7950 3 xul.dll nsDisplayTableBorderCollapse::Paint layout/tables/nsTableFrame.cpp:1299 4 xul.dll mozilla::FrameLayerBuilder::PaintItems layout/painting/FrameLayerBuilder.cpp:6029 5 xul.dll mozilla::FrameLayerBuilder::DrawPaintedLayer layout/painting/FrameLayerBuilder.cpp:6190 6 xul.dll mozilla::layers::BasicPaintedLayer::PaintThebes gfx/layers/basic/BasicPaintedLayer.cpp:94 7 xul.dll mozilla::layers::BasicLayerManager::PaintSelfOrChildren gfx/layers/basic/BasicLayerManager.cpp:708 8 xul.dll mozilla::layers::BasicLayerManager::PaintLayer gfx/layers/basic/BasicLayerManager.cpp:891 9 xul.dll mozilla::layers::BasicLayerManager::PaintSelfOrChildren gfx/layers/basic/BasicLayerManager.cpp:731 ============================================================= this is a cross platform crash that's hanging around for a long time already. the signature has many user comments, that refer to printing flight tickets, receipts, financial reports, checkout confirmations and documents of that sort - so a crash in these circumstances might be especially aggravating (& unfortunately many of those sites not publicly accessible in order to attempt to reproduce this): https://crash-stats.mozilla.com/signature/?product=Firefox&signature=BCPaintBorderIterator%3A%3ASetNewData&date=%3E%3D2017-12-25#comments
This look like a null-dereference of mCellMap. Presumably mTableCellMap is non-null but mCellMap is null, which is entirely possible.
Summary: Crash in BCPaintBorderIterator::SetNewData → Crashes in BCPaintBorderIterator::SetNewData dereferencing null mCellMap
Component: Printing: Output → Layout: Tables
Priority: -- → P2

this issue is reproducible for me in a new profile (win10 64bit):

is there anything to go on from these steps that would help in fixing this crash?

Has STR: --- → yes
Flags: needinfo?(svoisen)
Flags: needinfo?(svoisen)
Whiteboard: [layout:triage-discuss]
Flags: needinfo?(dholbert)
Whiteboard: [layout:triage-discuss]

Thanks for those STR, philipp! I can also reproduce by print-previewing and simply scrolling down (to view one of the further-down pages).

I think the problem involves painting some content on one of those distant pages (and we only have to paint it when you scroll it into view [or actually-print rather than just preview], which is why the initial print preview rendering doesn't crash).

I'll take a look in the next day or so to see what I can find out...

In a debug build, my reduced testcase triggers this assertion right before the crash:

###!!! ASSERTION: CellIterator program error: 'false', file layout/tables/nsTableFrame.cpp, line 6625

Searching bugzilla for that string, I see it's mentioned in bug 1118168 comment 7 -- adding that bug to "see also" in case the issues are related.

See Also: → 1118168

(In reply to David Baron :dbaron: 🏴󠁵󠁳󠁮󠁹󠁿 ⌚UTC-4 (if account gets disabled due to email bounces, ask a bugzilla admin to reenable it) from comment #1)

This look like a null-dereference of mCellMap.
Presumably mTableCellMap is
non-null but mCellMap is null, which is entirely possible.

This is correct. At least, when the assertion in comment 6 fires, mCellMap is null though mTableCellMap is non-null.

(You said that's entirely possible, though note that the ###!!! ASSERTION in comment 6 is effectively asserting that mCellMap must be non-null, right now. Source reference:

    if (SetNewRow(mRg->GetFirstRow())) {
      mCellMap = mTableCellMap->GetMapFor(fifRg, nullptr);
      if (!mCellMap) ABORT1(false);
    }

https://searchfox.org/mozilla-central/rev/8e0ea968308dd692ba5394b14e584234acf7d4ca/layout/tables/nsTableFrame.cpp#6623-6626

Here, ABORT1 really just expands to NS_ASSERTION -- so this code is effectively asserting (incorrectly) that mCellMap is non-null -- and then later depending on that assumption, and null-deref'ing as a result.

Flags: needinfo?(dholbert)

(In reply to Daniel Holbert [:dholbert] from comment #6)

Searching bugzilla for that string, I see it's mentioned in bug 1118168 comment 7 -- adding that bug to "see also" in case the issues are related.

Also FWIW, I tested that bug's patch to see if it helps with the testcase here - but sadly, it does not. (I still crash.)

See Also: → 1607045

These crash signature reports show there is still a pretty high volume of crashes in Fx86 (beta), Fx85 (beta and release) and older versions, but apparently never been reproducible in the Nightly channel.

The STR in comment 2 (plus a VPN extension set to US) is not the right STR for the signature of this bug because:

  1. With these steps, the tab crashes, but no crash report is observed in about:crashes afterwards so there's no signature to compare to.
  2. To exclude these steps for good, I can say that this tab crash only occurs on Windows and Mac OSes, not also Linux OSes, as opposed to the reports.
  3. I can confirm that the latest Nightly does NOT reproduce the tab crash, but it does reproduce in Nightly v68.
    I have used mozregression to track the fix and these are my results:
  • Tested mozilla-central build: 2020-01-24 (verdict: g)
  • Tested mozilla-central build: 2020-01-23 (verdict: b)
    The bisection could not be completed due to builds not being found.

The reduced test case dows not crash the affected builds so it is unlikely it will uncover information about our crash signature.

This being said, this crash does not have any steps to reproduce.
Please NI me if a proper STR is provided and testing is needed.

Severity: critical → S2

My attached testcase (from URL in comment 2) was fixed via another bug a few years ago. My STR for that are to print-preview the testcase and hit "end" key to scroll to the end. In bad builds, this crashes (after I scroll).

Fix range:
First good revision: 6f6e201853c8f5053d6f8837acf89e76f5a334c5 (2020-01-24)
Last bad revision: fb6b61e49217d835b2d6e435560424aab10d5475 (2020-01-23)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=fb6b61e49217d835b2d6e435560424aab10d5475&tochange=6f6e201853c8f5053d6f8837acf89e76f5a334c5

In that range, the likely-fix would be bug 1607045 (which has the same crash signature associated with it).

So: the attached testcase is really a testcase for bug 1607045; and there's apparently other ways to trigger this crash, given the continuing nonzero crash volume.

Also notable: for whatever reason, the crash volume for supported builds is quite low at this point. In the past 14 days, we have 21 crashes from Firefox 102 ESR versions, and 27 crashes from older versions (all newer than bug 1607045's fix, though). (EDIT: It looks like this lack-of-volume-in-new-versions may've been due to the crash signature changing, per my next comment here.)

I did find a URL on one of the crash reports that seems to trigger this crash for me in up-to-date Nightly, though!
https://wetten.overheid.nl/BWBR0036758/2022-10-22/0/afdrukken ("Regulation European EZK and LNV subsidies")

If I print-preview this^ doc and then scroll down through the preview, I eventually crash (around page 40 or so, I think). Crash report: bp-7ac79c5e-0abd-410f-9769-ab82c0221216

--> adding the crash signature from my report (which includes [@ nsTArray_base<T>::Length | as a prefix)

Crash Signature: [@ BCPaintBorderIterator::SetNewData] → [@ BCPaintBorderIterator::SetNewData] [@ nsTArray_base<T>::Length | BCPaintBorderIterator::SetNewData]

(ticking needinfo as a reminder to generate a reduced testcase from a repro'ing URL.)

I have one probably-not-to-share-publicly repro-URL that I ran across, which reproduces the crash quickly & reliably, but unfortunately doesn't repro the issue in a saved copy, which makes it not ideal for testcase-reduction. But I did capture the crash in rr and uploaded the trace to pernosco, and I'll post the link when I've got it.

Flags: needinfo?(dholbert)

Pernosco trace for aforementioned not-publicly-sharable URL: https://pernos.co/debug/kZCavGtp46C5gYLl5P8kSA/index.html

Attached file reduced testcase 2

Here's a testcase reduced from the URL in one of the crash reports (with real information removed/replaced with bogus text).

This reproduces the bug for me on my Ubuntu 22.04 ThinkPad.

The testcase is very fiddly, which is part of why I haven't (yet) reduced it further. It depends heavily on font metrics (and on how that impacts table cell sizing).

One key requirement to trigger the crash is that CCCCCCCCCCCCCCCCCCC DDDDDDDDDDDDDDD must linewrap and must be the first row on page 2, I think. (If I add or remove rows before it, such that it's not at the top of page 2, then I don't crash.)

Flags: needinfo?(dholbert)

This testcase is reduced quite a bit and reproduces the crash when I print-preview it, in builds several years old, with default configuration on my machine (e.g. when launched via mozregression, print previewed on my default page size which is US Letter).

In current Nightly, it doesn't repro the crash with deafult settings, but it does reproduce the crash when print-previewed with scale:110%. (I think that current vs. old-build difference is just due to minor changes in default page-margin settings or something like that.)

With this testcase, I get regression range:
Last good revision: 5f0961efaa1c01e5b4bc449ee5d58260393291ef (2019-07-03)
First bad revision: 32d7797bd8bd91e7b62ef2a5e19b8888881766f1 (2019-07-04)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5f0961efaa1c01e5b4bc449ee5d58260393291ef&tochange=32d7797bd8bd91e7b62ef2a5e19b8888881766f1

In that range, I think this would've been introduced by:
https://hg.mozilla.org/mozilla-central/rev/ae0a517daf9700bcb8c7ac0b035870fb508f2929
Bug 1474771 - Propagate NS_FRAME_IS_DIRTY to descendants when marking as dirty rather than during reflow.

We probably end up having trouble when we do an additional reflow on/inside some continuation of a table-part, in response to having that dirty bit set.

and here's a version of the same testcase that reproduces in current Nightly & will hopefully make for a useful reftest-paged crashtest.

Pernosco trace with a crash while print-previewing reduced testcase 5:
https://pernos.co/debug/kbs87wqWf0RtlDV1F2u54w/index.html

Hoping to take a look here, since this is still S2.

Flags: needinfo?(dholbert)
Assignee: nobody → dholbert
Flags: needinfo?(dholbert)

I poked a bit at the pernosco trace here, and I have a couple of ideas for things we can do to mitigate this.

First, let's add a simple null-check in BCPaintBorderIterator::SetNewData, and bail out if mCellMap has not been set, rather than crashing. This is not a fix for the real bug here; it's just wallpaper to prevent the null-deref crash. When we hit a "bad" case (failed to find the cell-map), it'll potentially leave a couple of borders unpainted, but that's a less-bad user experience than crashing the tab.

For something more like a real fix, I think the issue arises from the table-header group that has been both repeated and fragmented. Here's an excerpt of the frame tree on the first page of the preview:

    FloatList@7fb516936048 <
      Block(div)(0)@7fb516936058 parent=7fb516935f90 next-in-flow=7fb51693d650 (x=0, y=0, w=203.333, h=192) [content=7fb516b04dc0] [cs=7fb516e714d8] <
        line@7fb516936db0 count=1 state=block,clean,prevmarginclean,not-impacted,not-wrapped,no-break,clear-before:none,clear-after:none(x=0, y=0, w=203.333, h=162) <
          TableWrapper(table)(1)@7fb5169361a0 parent=7fb516936058 next=7fb51693d0e0 next-in-flow=7fb51693d0e0 (x=0, y=0, w=203.333, h=162) [content=7fb516b090c0] [cs=7fb51cfd3f28:-moz-table-wrapper] <
[1]         Table(table)(1)@7fb516936250 parent=7fb5169361a0 next-in-flow=7fb51693d190 (x=0, y=0, w=203.333, h=162) [content=7fb516b090c0] [cs=7fb516e71c58] <
[3]           TableRowGroup(thead)(1)@7fb516936398 parent=7fb516936250 next=7fb5169366a8 (x=2.66667, y=2.66667, w=198.667, h=54.6667) ink-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=59.3333) scr-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=59.3333) [content=7fb516b04e50] [cs=7fb516e71d48] <
                TableRow(tr)(1)@7fb516936448 parent=7fb516936398 (x=0, y=0, w=198.667, h=54.6667) ink-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=59.3333) scr-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=59.3333) [content=7fb516b04ee0] [cs=7fb516e9c3e8] <
                  BCTableCell(th)(0)@7fb516936510 parent=7fb516936448 (x=0, y=0, w=198.667, h=54.6667) ink-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=59.3333) scr-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=59.3333) [content=7fb516b04f70] [cs=7fb516e9c5c8] <
                    Block(th)(0)@7fb5169365e0 parent=7fb516936510 (x=3, y=27, w=192, h=0) [content=7fb516b04f70] [cs=7fb51cfd42e8:-moz-cell-content] <
                    >
                  >
                >
              >
              TableRowGroup(tbody)(3)@7fb5169366a8 parent=7fb516936250 next-in-flow=7fb51693d020 (x=2.66667, y=57.3333, w=198.667, h=102.667) [content=7fb516b0b040] [cs=7fb516e9c2f8] <
                TableRow(tr)(1)@7fb516936758 parent=7fb5169366a8 (x=0, y=0, w=198.667, h=102.667) ink-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=107.333) scr-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=107.333) [content=7fb516b0b0d0] [cs=7fb516e9c4d8] <
                  BCTableCell(td)(0)@7fb516936820 parent=7fb516936758 (x=0, y=0, w=198.667, h=102.667) ink-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=107.333) scr-overflow=(x=-2.66667, y=-2.66667, w=203.333, h=107.333) [content=7fb516b0b160] [cs=7fb516e9c6b8] <
                    Block(td)(0)@7fb5169368f0 parent=7fb516936820 (x=3, y=51, w=192, h=0) [content=7fb516b0b160] [cs=7fb51cfd44c8:-moz-cell-content] <
                    >
                  >
                >
              >
              ColGroupList@7fb516936388 <
                TableColGroup(table)(1)@7fb516936c28 parent=7fb516936250 (x=2.66667, y=2.66667, w=198.667, h=157.333) [content=7fb516b090c0] [cs=7fb51cfd45b8:-moz-table-column-group] <
                  TableCol(table)(1)@7fb516936cd8 parent=7fb516936c28 (x=0, y=0, w=198.667, h=157.333) [content=7fb516b090c0] [cs=7fb51cfd46a8:-moz-table-column]
                >
              >
            >
          >
        >
        line@7fb51693d5e8 count=1 state=block,clean,prevmarginclean,not-impacted,not-wrapped,no-break,clear-before:none,clear-after:none(x=0, y=162, w=198.667, h=25.3333) ink-overflow=(x=0, y=162, w=198.667, h=28) scr-overflow=(x=0, y=162, w=198.667, h=28) <
          TableWrapper(table)(1)@7fb51693d0e0 parent=7fb516936058 prev-in-flow=7fb5169361a0 next-in-flow=7fb51693da48 (x=0, y=162, w=198.667, h=25.3333) ink-overflow=(x=0, y=0, w=198.667, h=28) scr-overflow=(x=0, y=0, w=198.667, h=28) [content=7fb516b090c0] [cs=7fb51cfd3f28:-moz-table-wrapper] <
[2]         Table(table)(1)@7fb51693d190 parent=7fb51693d0e0 prev-in-flow=7fb516936250 next-in-flow=7fb51693daf8 (x=0, y=0, w=198.667, h=25.3333) ink-overflow=(x=0, y=0, w=198.667, h=28) scr-overflow=(x=0, y=0, w=198.667, h=28) [content=7fb516b090c0] [cs=7fb516e71c58] <
[4]           TableRowGroup(thead)(1)@7fb51693d2d8 parent=7fb51693d190 next-in-flow=7fb51693d988 (x=2.66667, y=2.66667, w=194, h=25.3333) [content=7fb516b04e50] [cs=7fb516e71d48] <
                TableRow(tr)(1)@7fb51693d388 parent=7fb51693d2d8 next-in-flow=7fb51693d728 (x=0, y=0, w=194, h=25.3333) ink-overflow=(x=0, y=0, w=198.667, h=25.3333) scr-overflow=(x=0, y=0, w=198.667, h=25.3333) [content=7fb516b04ee0] [cs=7fb516e9c3e8] <
                  BCTableCell(th)(0)@7fb51693d450 parent=7fb51693d388 next-in-flow=7fb51693d7f0 (x=0, y=0, w=198.667, h=25.3333) [content=7fb516b04f70] [cs=7fb516e9c5c8] <
                    Block(th)(0)@7fb51693d520 parent=7fb51693d450 next-in-flow=7fb51693d8c0 (x=1, y=12.6667, w=196.667, h=0) [content=7fb516b04f70] [cs=7fb51cfd42e8:-moz-cell-content] <
                    >
                  >
                >
              >
            >
          >
        >
      >
    >

Here, we can see that the table [1] has been fragmented, with a next-in-flow at [2]. Its table-header [3] gets repeated in the second fragment of the table, at [4]; this is done by nsCSSFrameConstructor::CreateContinuingTableFrame.

Such repeated frames don't get their own entries in the table's cellmap (which is maintained only on the first-in-flow table frame). There's code in nsCellMap::GetMapFor to handle this by looking for the "original" frame from which they were repeated, so they share the same cellmap.

However, this doesn't happen in this case, because the repeated instance of the header frame [4] has been fragmented (note that it has a next-in-flow), and this means that the frame constructor marked it as non-repeatable. That means nsCellMap doesn't do its search-for-the-original, and ends up returning null.

So I think we can fix this by ensuring that nsCellMap::GetMapFor does perform that search in such cases. Even though the current frame isn't marked as repeatable because of having a next-in-flow, it may nevertheless have been repeated from an earlier copy (that wasn't fragmented).

I've pushed a try run with a patch to do this, to see if it appears to break anything else...

This does not fix the underlying issue, which is that when the table
got fragmented, we failed to maintain the cellmap (attached to the
first-in-flow) to keep track of the additional rowgroup frame created
to go in the overflowing part of the table.

So this is a wallpaper patch just to prevent the crash here (confirmed
with testcase 5); it'll still throw NS_ASSERTIONs in a debug build,
and in the "bad" case (which depends on details of scaling) some of
the borders will be missing from the print preview/output, but that
seems preferable to crashing.

If a repeated header/footer row group gets fragmented, so that it has a
next-in-flow, its REPEATABLE flag will be cleared. But in such a case,
nsCellMap still needs to find the "original" in order to get the map
for the repeated frame.

Depends on D193660

Depends on D193661

Thanks, jfkthame!

Assignee: dholbert → jfkthame
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fa9a9180e244 Avoid crashing in BCPaintBorderIterator if mCellMap was not found. r=layout-reviewers,emilio https://hg.mozilla.org/integration/autoland/rev/be3ac4c9a585 nsCellMap needs to try harder to find the appropriate cellmap for a table header/footer that may be a repeat. r=layout-reviewers,emilio https://hg.mozilla.org/integration/autoland/rev/432e879ab207 Add testcase-5 as a crashtest. r=layout-reviewers,emilio

Please nominate this for ESR115 approval when you get a chance.

Flags: needinfo?(jfkthame)

Comment on attachment 9363695 [details]
Bug 1442018 - Avoid crashing in BCPaintBorderIterator if mCellMap was not found. r=#layout

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fix for a longstanding crash while painting table-cell borders
  • User impact if declined: Continued crashiness
  • Fix Landed on Version: 121
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The fixes here are well-understood and highly localized: a null-check to prevent the crash, and an added check for table header/footer rows that were being overlooked.
Flags: needinfo?(jfkthame)
Attachment #9363695 - Flags: approval-mozilla-esr115?
Attachment #9363696 - Flags: approval-mozilla-esr115?
Attachment #9363697 - Flags: approval-mozilla-esr115?

Comment on attachment 9363695 [details]
Bug 1442018 - Avoid crashing in BCPaintBorderIterator if mCellMap was not found. r=#layout

Approved for 115.6esr.

Attachment #9363695 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
Attachment #9363696 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
Attachment #9363697 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: