Closed Bug 862624 Opened 11 years ago Closed 7 years ago

"ASSERTION: Must have usable originating data here" and crash with large colspan

Categories

(Core :: Layout: Tables, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla47

People

(Reporter: jruderman, Assigned: MatsPalmgren_bugz)

References

(Blocks 1 open bug, )

Details

(Keywords: assertion, crash)

Crash Data

Attachments

(7 files, 3 obsolete files)

429 bytes, text/html
Details
27.42 KB, text/plain
Details
6.97 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
1.80 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
2.54 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
69.69 KB, patch
bzbarsky
: review-
Details | Diff | Splinter Review
67.48 KB, patch
Details | Diff | Splinter Review
###!!! ASSERTION: Must have usable originating data here: 'cellFrame', file layout/tables/nsCellMap.cpp

Crash [@ nsCellMapColumnIterator::GetNextFrame]
Attached file stacks (gdb)
On Windows: bp-eee052ef-4a3d-4627-8f28-def1b2130417.
Crash Signature: [@ nsCellMapColumnIterator::GetNextFrame(int*, int*)]
OS: Mac OS X → All
Hardware: x86_64 → All
Attached patch fix+testSplinter Review
Currently we use an uintptr_t for storing both the col/row span,
though we actually use only 32 bits even in 64-bit builds,
10 bits for colspan and 16 for rowspan offsets + 6 bits for flags.

This patch changes it to an uint64_t -- with 29 bits each for the
col/row span offsets + 6 flags.

We could wallpaper this with in early return in GetNextFrame() if
'mCol' is too large, but that seems risky -- it would fix the
null-pointer crash here at the risk of a nastier crash somewhere
else. So I think we should fix this properly by using larger fields.
It costs nothing in 64-bit builds, but it costs an additional 32-bit
word per CellData in 32-bit builds.  I don't think that's a problem
in practice on real web pages though.  And I think 29 bits is
sufficiently large that we'll hit an OOM before it overflows.

https://tbpl.mozilla.org/?tree=Try&rev=af1d22cc25e8
Assignee: nobody → matspal
Attachment #750195 - Flags: review?(bzbarsky)
Comment on attachment 750195 [details] [diff] [review]
fix+test

This is going to double the size of the cellmap for 32-bits, no?  For large tables that can easily mean hundreds of megabytes...  and yes, there are plenty of tables that size out there, especially when people are using rowspan/colspan values in the thousand+ range, as some tools like to.

r- based on that, unless there is really no other option...

On the other hand, I could see us doing this if we also fixed the bug where we heap-allocate the celldata and saved a corresponding amount of space that way.
Attachment #750195 - Flags: review?(bzbarsky) → review-
Comment on attachment 750195 [details] [diff] [review]
fix+test

(In reply to Boris Zbarsky (:bz) from comment #4)
> This is going to double the size of the cellmap for 32-bits, no?

Not really, it doubles the *size of the CellData struct* on 32-bit.
So I think you're slightly exaggerating the problem -- the memory used
for the CellData itself is only a fraction of memory used for a table.

> On the other hand, I could see us doing this if we also fixed the bug where
> we heap-allocate the celldata and saved a corresponding amount of space that
> way.

Fair enough - part 3 does that.  So with that, we use a CellData struct
per cell instead of a pointer + CellData, so the memory usage should stay
the same on 32-bit, and on 64-bit it'll save a pointer per cell.
Attachment #750195 - Flags: review- → review?(bzbarsky)
Attachment #751195 - Attachment description: wdiff of the last part for reviewing puposes → wdiff of the last part for reviewing purposes
> the memory used for the CellData itself is only a fraction of memory used for a table.

The memory used for CellData is, for large tables with large col/rowspans the main memory usage.  The last time I measured this I was getting things like 300MB of just CellData allocations on such tables....  The key is large spans: you can have a pretty small number of actual cells and a huge cellmap.

I really appreciate you doing the conversion to non-pointers, thank you!
Blocks: 357729
This breaks some a11y tests - need to look into those...
https://tbpl.mozilla.org/?usebuildbot=1&tree=Try&rev=755131fb66af
Comment on attachment 751187 [details] [diff] [review]
Make the CellData array be an array of CellData (or BCCellData) instead of an array of pointers to those.

The test failures was caused by this part.
We have some code that depends on having "holes" in the cellmap
(i.e. null-pointer CellData in the array) as a distinct state
separate from the "IsDead" state.

I'll add an additional patch that creates an "IsNull" state
and use that for cases where the old code extended the
CellData array introducing null-pointer entries.  The intention
is that this part should be idempotent. (We can investigate
the code that depends on this state later and decide if we
want to try to remove this state).

I think it's probably easier to review the current patches
as is, with that addition in mind.  Or you can wait with
this part if you wish.
Let me know if your prefer this folded with the "array" patch.

I'm not sure what to think of the M2 failures on Win32 in the
earlier try run -- it says "9223372036854775808 bytes leaked (CellData)"
which seems unlikely...  Is this just an integer overflow in the leak
checker?  Perhaps we should just remove the MOZ_COUNT_CTOR for
[BC]CellData now?

https://tbpl.mozilla.org/?tree=Try&rev=204ba3f24f99
Attachment #752174 - Flags: review?(bzbarsky)
> as a distinct state separate from the "IsDead" state.

Yes, but see the discussion in bug 357729?  At one point I thought we could get rid of that...

I guess working around it is ok for now.

> Is this just an integer overflow in the leak checker?

The leak checker logs addresses.  If you have arrays of celldata objects, not pointers, they get memmoved when the array reallocs, so the leak checker will get very confused.  So yes, we should remove the COUNT_* from celldata.

I think I would prefer the IsNull patch folded with the arrays patch, since they're a single conceptual unit.
(In reply to Boris Zbarsky (:bz) from comment #14)
> Yes, but see the discussion in bug 357729?  At one point I thought we could
> get rid of that...

Looking at your patch it seems you have the same representation as I tried,
so I suspect it would have failed these a11y mochitests too.

> I guess working around it is ok for now.

Yeah, it's a separate problem, and rather minor IMO.  I'll file a bug for it.

> The leak checker logs addresses.  If you have arrays of celldata objects,
> not pointers, they get memmoved when the array reallocs, so the leak checker
> will get very confused.  So yes, we should remove the COUNT_* from celldata.

Ah, I see, thanks.
Folded the two earlier patches.  Removed the MOZ_COUNT_CTOR stuff.
Attachment #751187 - Attachment is obsolete: true
Attachment #752174 - Attachment is obsolete: true
Attachment #751187 - Flags: review?(bzbarsky)
Attachment #752174 - Flags: review?(bzbarsky)
Attachment #752328 - Flags: review?(bzbarsky)
Attachment #751195 - Attachment is obsolete: true
I am really sorry for the lag here.  I will be working on this review tomorrow.
Comment on attachment 751184 [details] [diff] [review]
Make nsTableCellMap/nsCellMap MOZ_FINAL classes.

r=me
Attachment #751184 - Flags: review?(bzbarsky) → review+
Comment on attachment 750195 [details] [diff] [review]
fix+test

So this is fine as far as it goes, but I'd like to understand why it's needed.  Were we trying to store out-or-range numbers in the offset fields in the celldata?  If so, why?
Attachment #750195 - Flags: review?(bzbarsky) → review+
Ah, I guess he colspan="0" cell was ending up 1396 cols wide?

But can't that still be triggered by just using slightly bigger numbers?  Or is the point that if individual cells are limited to 10 or 16 bits but their sum can go to 29 bits you need a whole lot of cells?

For colspan, could we just remove support for colspan="0"?  Other UAs don't seem to support it and the HTML5 spec says colspan "must be a valid non-negative integer greater than zero".  And the UA requirements are "If parsing that value failed, or returned zero, or if the attribute is absent, then let colspan be 1, instead."

I guess that would still leave the issue for rowspan, right?
Comment on attachment 751278 [details] [diff] [review]
Remove now unused nsCellMap::mPresContext field.  Reorder some of the remaining fields to minimize alignment spill.

So this applies after the "stop heap-allocating celldata" patch, right?

r=me
Attachment #751278 - Flags: review?(bzbarsky) → review+
Comment on attachment 750195 [details] [diff] [review]
fix+test

Oh, one other note.  The 0xFFFFFFFF plus shifting and masking bits are pretty non-obvious.  I'd prefer we just did uint64_t(PR_BITMASK(32 - SPAN_FLAG_NUM_BITS)) << COL_SPAN_SHIFT and similar for the rowspans.
Comment on attachment 752328 [details] [diff] [review]
Make the CellData array be an array of CellData (or BCCellData) instead of an array of pointers to those. Introduce a IsNull state for CellData.

So this is heap-allocating the row arrays....  That seems a bit suboptimal.  Especially since sizeof(nsTArray<T>) does not depend on <T>.

If we could put nsTArray<T> inside unions I'd suggest having an nsTArray of such unions or something.  Since we can't, how about an nsTArray<nsTArray<CellData>>, with casts of the elements to nsTArray<BCCellData> as needed (which we have already anyway) and some copious comments explaining the setup?
Attachment #752328 - Flags: review?(bzbarsky) → review-
And again, _really_ sorry for the lag.  :(
Blocks: 897883
Hey Guys was running into https://crash-stats.mozilla.com/report/index/bp-cbf61ea9-bab1-47e9-aad9-77c142131210 while using bughunter and checking http://baike.baidu.com/view/1041988.htm

That site crashes nearly on load on debug/opt and since its real world site with this crash we maybe should fix it ? :)
Crash Signature: [@ nsCellMapColumnIterator::GetNextFrame(int*, int*)] → [@ nsCellMapColumnIterator::GetNextFrame(int*, int*)] [@ nsCellMapColumnIterator::GetNextFrame]
mats: is there a way we can fix this crash here, seems we already had a patch
Flags: needinfo?(mats)
(In reply to Boris Zbarsky [:bz] from comment #22)
> For colspan, could we just remove support for colspan="0"?

I filed bug 1241840 about that.

> I guess that would still leave the issue for rowspan, right?

I suspect so, yes.  But I seem to recall the colspan=0 code being more
complicated that rowspan=0 though so removing that should help.

(In reply to Carsten Book [:Tomcat] from comment #33)
> mats: is there a way we can fix this crash here, seems we already had a patch

Yeah, I'll try to resurrect these patches.  Thanks for the reminder.
Flags: needinfo?(mats)
Bug 1242164 creates an unused bit in celldata.h that's supposed to be dealt with in this bug.
I was asked today when we would fix this bug and it also seems to occur in the wild. As part of our recent quality push, we should probably look into this. Is this a regression?

Mats, is this still on your plan for fixing with the attached patches, or is something else required here? Thanks!
Flags: needinfo?(mats)
I probably wont have time to do that before June or so.  So feel free to take it!
Flags: needinfo?(mats)
i was not able to crash anymore either with the testcase or my url - maybe this was fixed by something else somehow ?
Crash volume for signature 'nsCellMapColumnIterator::GetNextFrame':
 - nightly (version 50): 3 crashes from 2016-06-06.
 - aurora  (version 49): 1 crash from 2016-06-07.
 - beta    (version 48): 5 crashes from 2016-06-06.
 - release (version 47): 21 crashes from 2016-05-31.
 - esr     (version 45): 4 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          0          0          0          3          0          0
 - aurora           0          0          0          0          1          0          0
 - beta             0          2          2          1          0          0          0
 - release          5          4          1          5          3          3          0
 - esr              0          3          0          1          0          0          0

Affected platforms: Windows, Mac OS X
mozregression says it was fixed by something in the range below:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0ecd7d72f232304da046b352c457e039e35ceed7&tochange=8a3c5c9b1486c3328bc4684d5ac5b3b849b09474
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
The testcase has <td colspan="0"></td> so I'm guessing bug 1241840.
I wouldn't be surprised if fuzzing finds another way to trigger it though...
Guess we'll find out when the time comes :)
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: