Closed Bug 810726 Opened 7 years ago Closed 7 years ago

Infinite loop in nsBlockInFlowLineIterator::FindValidLine() with specific layout

Categories

(Core :: Layout: Block and Inline, defect, P1, critical)

defect

Tracking

()

RESOLVED FIXED
mozilla22
blocking-basecamp -
Tracking Status
firefox16 --- unaffected
firefox17 + fixed
firefox18 + fixed
firefox19 + fixed
firefox20 + fixed
firefox21 + fixed
firefox22 + fixed

People

(Reporter: ttaubert, Assigned: jwir3)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [qa-])

Attachments

(12 files, 4 obsolete files)

1.10 KB, text/html
Details
59.55 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.85 KB, text/plain
Details
11.09 KB, text/plain
Details
59.32 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
57.12 KB, patch
mats
: review+
Details | Diff | Splinter Review
6.02 KB, patch
jwir3
: review+
Details | Diff | Splinter Review
57.29 KB, patch
Details | Diff | Splinter Review
2.91 KB, patch
Details | Diff | Splinter Review
55.84 KB, patch
mats
: review+
Details | Diff | Splinter Review
9.44 KB, patch
Details | Diff | Splinter Review
55.85 KB, patch
jwir3
: review+
Details | Diff | Splinter Review
I tried to reduce the test case as much as possible. You just need to click the provided file and it will freeze Firefox (from Beta to Nightly). Reproducible on Windows, Linux, Mac.

Regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9d2a7a8ca1c7&tochange=582d4c67b3a7

The stack trace is almost endless and looks like the following:

#0  nsBlockFrame::GetOverflowLines (this=<optimized out>)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:4449
#1  0x00007f3f4e325a80 in FindValidLine (this=0x7fffb0242c70)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:5231
#2  nsBlockInFlowLineIterator::FindValidLine (this=0x7fffb0242c70)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:5215
#3  0x00007f3f4e325ace in nsBlockInFlowLineIterator::nsBlockInFlowLineIterator (this=<optimized out>, 
    aFrame=<optimized out>, aFoundValidLine=0x7fffb0242c8f)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:5064
#4  0x00007f3f4e327a2c in nsBlockFrame::RenumberListsInBlock (aPresContext=0x7f3f2e743c00, 
    aBlockFrame=<optimized out>, aOrdinal=0x7fffb0242dac, aDepth=1, aIncrement=1)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:6604
#5  0x00007f3f4e3279aa in nsBlockFrame::RenumberListsFor (aPresContext=0x7f3f2e743c00, aKid=<optimized out>, 
    aOrdinal=0x7fffb0242dac, aDepth=0, aIncrement=1)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:6679
#6  0x00007f3f4e327a60 in nsBlockFrame::RenumberListsInBlock (aPresContext=0x7f3f2e743c00, 
    aBlockFrame=<optimized out>, aOrdinal=0x7fffb0242dac, aDepth=0, aIncrement=1)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:6617
#7  0x00007f3f4e327b97 in RenumberLists (aPresContext=0x7f3f2e743c00, this=0x7f3f29de3e68)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:6592
#8  nsBlockFrame::RenumberLists (this=0x7f3f29de3e68, aPresContext=0x7f3f2e743c00)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:6549
#9  0x00007f3f4e32ad32 in nsBlockFrame::Reflow (this=0x7f3f29de3e68, aPresContext=0x7f3f2e743c00, aMetrics=..., 
    aReflowState=..., aStatus=@0x7fffb02437c8: 3)
    at /home/tim/workspace/fx-team/layout/generic/nsBlockFrame.cpp:1003
#10 0x00007f3f4e331787 in nsContainerFrame::ReflowChild (this=<optimized out>, aKidFrame=0x7f3f29de3e68, 
    aPresContext=0x7f3f2e743c00, aDesiredSize=..., aReflowState=..., aX=<optimized out>, aY=0, aFlags=0, 
    aStatus=@0x7fffb02437c8: 3, aTracker=0x0)
    at /home/tim/workspace/fx-team/layout/generic/nsContainerFrame.cpp:953
[...]
FTR, this breaks the B2G browser as well. The page stays just white because the tab is OOP but every further navigation fails silently.
blocking-basecamp: --- → ?
mozregression tells me it's caused by bug 764567.
Blocks: 764567
Reproduced this on Mac 17.0 Beta - how easy is it to back out the patch in bug 764567 before tomorrow's go to build on final beta?
Flags: needinfo?(sjohnson)
not tracking for esr17, since if we fix this in 17 we'll have it fixed there too.
(In reply to Lukas Blakk [:lsblakk] from comment #3)
> Reproduced this on Mac 17.0 Beta - how easy is it to back out the patch in
> bug 764567 before tomorrow's go to build on final beta?

I don't think it should be too difficult to back out. I'm on travel at the moment, though, so my response times are going to be somewhat delayed...
Flags: needinfo?(sjohnson)
Tim, any chance you could help here? we'd need a tested & verified backout landed to beta branch by midday Pacific tomorrow. If not you, can you recommend someone?
Flags: needinfo?(ttaubert)
(In reply to Lukas Blakk [:lsblakk] from comment #6)
> Tim, any chance you could help here? we'd need a tested & verified backout
> landed to beta branch by midday Pacific tomorrow. If not you, can you
> recommend someone?

Sorry about this - normally, I would be on top of it asap, but being at a workweek makes it a bit difficult. If noone else can do it, I can do my best to get it backed out tomorrow morning, but I'm not sure how long it will take me to verify that it's working as expected...
One thing I forgot to mention: if we do choose to backout the previously mentioned bug, we need to make sure that the column-fill feature is completely removed. Otherwise, we'll be in a state where we don't want to be. The column-fill feature originally had to be backed out because of a problem with constrained-height elements, which was fixed by the aforementioned bug. I think everything relating to column-fill was backed out and relanded in bug 764567, but we need to be careful and verify that it's actually the case that this is removed.

Thus, if bug 764567 is backed out, it should be the case that column-fill has no effect on multicol elements.
Also, this looks to only be a problem with balancing columns. If you add -moz-column-fill: auto to the CSS, it doesn't appear to be a problem.

My recommendation would be to back this out of beta only (thus FF 17). I think we can fix this for all other releases reasonably easily.
(In reply to Scott Johnson (:jwir3) from comment #7)
> Sorry about this - normally, I would be on top of it asap, but being at a
> workweek makes it a bit difficult. If noone else can do it, I can do my best
> to get it backed out tomorrow morning, but I'm not sure how long it will
> take me to verify that it's working as expected...

No problem, I'll do what I can to help here but I have no clue at all about layout code.

Unfortunately, backing out https://hg.mozilla.org/mozilla-central/rev/34d30fa24371 (bug 764567) doesn't easily work as there have been too many other changes to affected files since then.
Flags: needinfo?(ttaubert)
I'll look at it this morning and see if I can get a decent backout patch, even if I have to manually back these changes out. ;)

I'm not sure if we have enough time, though... it still has to go through the review process. :(
Attached patch b810726-backoutSplinter Review
This is the backout patch for the column-fill stuff for the beta tree. 

Note, I intend to add a crashtest as well so we don't regress this in future versions, but I'm going to put that in a separate patch in a bit (once it's written).
Assignee: nobody → sjohnson
Attachment #680697 - Flags: review?(roc)
Comment on attachment 680697 [details] [diff] [review]
b810726-backout

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 764567
User impact if declined: Infinite loop in some rare cases of column-fill usage
Testing completed (on m-c, etc.): none :(
Risk to taking this patch (and alternatives if risky): Regresses column-fill behavior, so it no longer works. 
String or UUID changes made by this patch: none
Attachment #680697 - Flags: approval-mozilla-beta?
Attachment #680697 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
landed on beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/74d121044a74

We should keep this bug open, though, because we'll want an actual fix on all other trees (along with a crashtest).
Added a frame dump from nsColumnSetFrame::Reflow from the current beta (so the version with backout patch applied).
This is a framedump after a few iterations of nsContainerFrame::ReflowChild(), which appears to be where the loop is iterating indefinitely. It just keeps adding next-in-flow frames.
(In reply to Scott Johnson (:jwir3) from comment #14)
> landed on beta:
> https://hg.mozilla.org/releases/mozilla-beta/rev/74d121044a74
> 
> We should keep this bug open, though, because we'll want an actual fix on
> all other trees (along with a crashtest).

Should we do the same for FF18?
(In reply to Alex Keybl [:akeybl] from comment #17)
> (In reply to Scott Johnson (:jwir3) from comment #14)
> > landed on beta:
> > https://hg.mozilla.org/releases/mozilla-beta/rev/74d121044a74
> > 
> > We should keep this bug open, though, because we'll want an actual fix on
> > all other trees (along with a crashtest).
> 
> Should we do the same for FF18?

Yeah, I think that's probably a good choice. I don't know how much time I'll have to look into this right now, since I'm working on reflow-on-zoom, which we'd (ideally) like to ship in FF19.
(In reply to Scott Johnson (:jwir3) from comment #18)
> (In reply to Alex Keybl [:akeybl] from comment #17)
> > (In reply to Scott Johnson (:jwir3) from comment #14)
> Yeah, I think that's probably a good choice. I don't know how much time I'll
> have to look into this right now, since I'm working on reflow-on-zoom, which
> we'd (ideally) like to ship in FF19.

Also, so we don't repeat the same problem we're having right now where an IID changed in between beta cycles, we should probably plan to land on aurora next time...
(In reply to Scott Johnson (:jwir3) from comment #13)
> String or UUID changes made by this patch: none

Turns out (as comments above suggest) that this wasn't actually true, hence bug 813264.
(In reply to Scott Johnson (:jwir3) from comment #18)
> (In reply to Alex Keybl [:akeybl] from comment #17)
> > (In reply to Scott Johnson (:jwir3) from comment #14)
> > > landed on beta:
> > > https://hg.mozilla.org/releases/mozilla-beta/rev/74d121044a74
> > > 
> > > We should keep this bug open, though, because we'll want an actual fix on
> > > all other trees (along with a crashtest).
> > 
> > Should we do the same for FF18?
> 
> Yeah, I think that's probably a good choice. I don't know how much time I'll
> have to look into this right now, since I'm working on reflow-on-zoom, which
> we'd (ideally) like to ship in FF19.

Sounds like we'll want to take this change in our first FF18 beta. Please prepare a patch and let us know when it's ready for approval.
dbaron, just to be clear so I don't mess this up again:

The IID of nsIDOMCSS2Properties should be the same as the IID of nsIDOMCSS2Properties prior to the addition of MozColumnFill, correct (i.e. the one in the currently posted patch)? Or should it be the same as the one that was modified in bug 813264?
This is the version to be backed out of beta. It required manual merging, so I'm requesting a review. Note that I left the CheckInvalidateSizeChange() call out, even though it would normally have been added back in, because DLBI removed that call.
Attachment #683714 - Flags: review?(dbaron)
Comment on attachment 683714 [details] [diff] [review]
b810726-backout-beta

r=dbaron (I reviewed the delta from the previous backout)
Attachment #683714 - Flags: review?(dbaron) → review+
This bug has been called out as likely having risk to non-B2G platforms. Given that, marking as P1, and moving into the C2 milestone. We should prioritize this landing to mozilla-beta as soon as possible, to prevent late-breaking regressions to other platforms.
Priority: -- → P1
Target Milestone: --- → B2G C2 (20nov-10dec)
blocking-basecamp: + → -
Interesting... when doing debugging for this & trying to write a crashtest, it seems it only fails (i.e. goes into this infinite loop) if the window size is a certain width or greater. The crashtest width is too narrow, so it doesn't cause the crashtest to fail. I've confirmed this with a normal desktop window, as well. Once I resize it to a narrow size, the infinite loop doesn't happen. If I resize it slowly, though, it happens after a certain width. 

I'm in the process of determining what this "certain width" is...
Attached patch b810726-backout-aurora (obsolete) — Splinter Review
Backout patch for aurora (fx19).
Attachment #697566 - Flags: review?(dbaron)
Comment on attachment 697566 [details] [diff] [review]
b810726-backout-aurora

r=dbaron (reviewed delta from comment 24)
Attachment #697566 - Flags: review?(dbaron) → review+
Can we already back out on Firefox 20 as well and only reland once fully resolved? This has been on our tracking lists for a few releases now.
(In reply to Alex Keybl [:akeybl] from comment #30)
> Can we already back out on Firefox 20 as well and only reland once fully
> resolved? This has been on our tracking lists for a few releases now.

Well, I'd rather not do this. But, how about this...

Once the uplift happens, I'll post another backout patch for fx20 (which will become aurora). Then, give me one more cycle to get this fixed on trunk... I really think I can get to it in the next size weeks.

Does that seem like a fair compromise? :)
Comment on attachment 697566 [details] [diff] [review]
b810726-backout-aurora

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 764567
User impact if declined: Possible hang due to infinite loop.
Testing completed (on m-c, etc.): not on trunk - backout patch, but a similar patch was uplifted to beta and release.
Risk to taking this patch (and alternatives if risky): Fairly minimal, as it is already on beta.
String or UUID changes made by this patch: The original patch made changes to IIDs, but this one does not. Apparently, during the cycle before last, we uplifted a change that makes the interface changes for this bug unnecessary. Thus, there aren't any string or uuid changes in this patch, but there were changes in the patch for bug 764567, so they must have been modified and made unnecessary at some previous time.
Attachment #697566 - Flags: approval-mozilla-aurora?
Comment on attachment 697566 [details] [diff] [review]
b810726-backout-aurora

Approving for Aurora. Comment 31 sounds good for one more cycle.
Attachment #697566 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Mats: 

I saw that after applying this patch, one of the tests added as part of bug 822053 is consistently failing. I _think_ this might be because it relies on the column-fill: auto feature, which isn't going to be part of FF19 (hence I've disabled the test). BUT - I'd just like you to take a quick look at this to make sure I'm doing the right thing.

FWIW, I verified that Nightly and Aurora look the same for the testcase in bug 822053.

Note - Looks like this is probably going to have to be uplifted to beta instead of aurora, since we're in the process of merging right now.
Attachment #697566 - Attachment is obsolete: true
Attachment #698817 - Flags: review?(matspal)
Comment on attachment 698817 [details] [diff] [review]
b810726-backout-ff19

Yes, disabling that test is fine.

The changes to nsColumnSetFrame.cpp looks correct.
(I didn't look at the other files.)
Attachment #698817 - Flags: review?(matspal) → review+
Had to backout on beta, due to perma-orange crashtest:
https://hg.mozilla.org/releases/mozilla-beta/rev/f1684d3209e8

It looks like the problem seems to be fixed by bug 734777, but that was pushed to mozilla 20, not mozilla 19. The crashtest is failing because on OSX, there is one more assertion than it wants (3-7 are expected, but 8 were encountered):

https://tbpl.mozilla.org/php/getParsedLog.php?id=18574220&tree=Mozilla-Beta&full=1

So, we might be able to request uplift of the patch for bug 734777 to beta, or I could perhaps just make the assertion count for that test 3-8 (assuming other platforms don't encounter additional assertions). This would probably be fine to do, since mozilla20 has the patch, so it'd only be there for a single release.
adjusting the assertion count sounds like the right thing, at least assuming it's a different count of an existing assertion rather than a new assertion
Waiting to push this, since the tree is closed for beta.
Attached patch b810726 (obsolete) — Splinter Review
Attachment #712995 - Flags: review?(roc)
Inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a0db951f9f08
Target Milestone: B2G C2 (20nov-10dec) → mozilla21
(In reply to Scott Johnson (:jwir3) from comment #43)
> Inbound:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/a0db951f9f08

Backed out for reftest asserts.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f765c27d9a4

https://tbpl.mozilla.org/php/getParsedLog.php?id=19713337&tree=Mozilla-Inbound
REFTEST TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/reftest/tests/layout/reftests/columns/column-balancing-overflow-004.html | assertion count 1 is more than expected 0 assertions
REFTEST TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/reftest/tests/layout/reftests/pagination/column-balancing-break-inside-avoid-2.html | assertion count 2 is more than expected 0 assertions


https://tbpl.mozilla.org/php/getParsedLog.php?id=19713562&tree=Mozilla-Inbound

REFTEST TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/reftest/tests/layout/reftests/columns/column-balancing-overflow-003.html | assertion count 1 is more than expected 0 assertions
REFTEST TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/reftest/tests/layout/reftests/columns/column-balancing-overflow-004.html | assertion count 1 is more than expected 0 assertions
REFTEST TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/reftest/tests/layout/reftests/pagination/column-balancing-break-inside-avoid-2.html | assertion count 2 is more than expected 0 assertions
Bah. None of these assertions were apparent on try, but clearly that's because I didn't do a debug build on try. ;P
Attached patch b810726 (v2) (obsolete) — Splinter Review
Re-requesting for review, since, after consultation with dbaron, I simplified the logic a bit. Basically, we now have four cases at the end of reflow children, if we exceed the allowed number of columns-

Content (columnBottom) is:
     
                           >= available               < available
                              height                    height
                         -------------------------------------------
> computed height       |  WAS: STOP          | WAS: REVERT         |
                        |  NOW: REVERT        | NOW: REVERT         |
                        |---------------------|---------------------|
<= computed height      |  WAS: STOP          | WAS: INFEASIBLE     |
                        |  NOW: STOP          | NOW: INFEASIBLE*    |
                         -------------------------------------------

I've numbered the cases, starting in the upper-right hand corner, and progressing counter-clockwise as I, II, III, IV, respectively.

The three actions we can take, depending on these cases are:

1) STOP - This means that we have too many columns, and we actually need to break the column set into two column sets (create a continuation for the column set itself).
2) REVERT - This means that we've detected that we need to re-start the reflow procedure, with column-fill: auto, rather than column-fill: balance.
3) INFEASIBLE - This means that we've determined that the current column height is not feasible, but it's possible that a different column height might be feasible.


In the first patch I posted, it modified case IV such that it performed the action associated with STOP instead of INFEASIBLE. I don't think this is the right course of action. Instead, I think that we still want to make it infeasible, but we want to break out of the child loop immediately. Further, the cases that deal with the column bottom being greater than the available height are nonsensical, because we never test a column height that is > the available height.

The logic is thus simplified so that if we've exceeded the number of columns allowed, then we need to stop and push our children to a continuation (of the column set). If, prior to this, we detect that our content bottom has exceeded our computed height, and we're balancing, then we switch to column-fill: auto, and immediately break out of the loop (since we need to restart reflow anyway).
Attachment #712995 - Attachment is obsolete: true
Attachment #717296 - Flags: review?(roc)
Depends on: 840818
Are we doing another backout here for FF20?
Try run for 06f31471f875 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=06f31471f875
Results (out of 196 total builds):
    exception: 6
    success: 62
    warnings: 10
    failure: 118
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/sjohnson@mozilla.com-06f31471f875
(In reply to Lukas Blakk [:lsblakk] from comment #47)
> Are we doing another backout here for FF20?

Hopefully not - the fix is finished, and has r+. I hope to land it today, and then request approval for aurora. It's fairly safe, I think.

Thoughts?
Attached patch b810726 (obsolete) — Splinter Review
Mats:

The assertion in bug 840818 was also being triggered as part of the fix for this patch. Unfortunately, the patch for bug 840818 doesn't fix the assertion being thrown in this case. What seems to be happening is that the assertion is being thrown as part of a temporary situation where are re-joining columns that were previously split into separate continuations.

dbaron, fantasai, and I spoke on Tuesday about this, and we agreed that we should restructure the assertion a bit so that it's testing something that is slightly weaker than what you had originally created. Basically, instead of verifying that if we had a non-empty FrameLines object, then the first child of the first line in mLines is the same as the first frame in mFrames, we check to make sure that the first frame in mFrames is the same as the first child of the first non-empty line. The only difference is that the previous assertion was stronger, because it was also asserting that the first line was non-empty. 

roc reviewed the changes in the patch, but I just wanted to run the changes for the assertion by you really quickly to make sure you agree with it.

Thanks!
Attachment #717296 - Attachment is obsolete: true
Attachment #720147 - Flags: review?(matspal)
Attachment #720147 - Attachment is patch: true
Blocks: 698783
Ah, so it wasn't PullFrameFrom (bug 840818) that was the problem, it's StealFrame
or DoRemoveFrame.  Sorry, I should have checked for similar patterns when I fixed
bug 840818.  It should be an easy fix, I'll use bug 847130 for that.
No longer blocks: 847130
Depends on: 847130
Comment on attachment 720147 [details] [diff] [review]
b810726

Please revert the nsBlockFrame changes; I'm fixing the underlying cause for
the assertion in bug 847130 instead.
Attachment #720147 - Flags: review?(matspal)
Sounds like the previously reviewed fix (attachment 717296 [details] [diff] [review]) is fine, then, though the indentation fix in nsColumnSetFrame.cpp in the newer patch is good.
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #53)
> Sounds like the previously reviewed fix (attachment 717296 [details] [diff] [review]
> [review]) is fine, then, though the indentation fix in nsColumnSetFrame.cpp
> in the newer patch is good.

Ok, I reverted the changes. This will land then as soon as bug 847130 lands.
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #55)
> bug 847130 landed yesterday (on inbound).

Ah, right. Ok, I will land it on inbound today then. ;)
Inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ddb4df63c258
Target Milestone: mozilla21 → mozilla22
Target Milestone: mozilla22 → ---
Attached patch b810726Splinter Review
Try server now shows green (minus intermittent failures).

https://tbpl.mozilla.org/?tree=Try&rev=2a74a0c73da6

It's a slightly modified version of the patch that is actually a version that's already been reviewed, I'm just reposting it here for clarity in the event that we need to come back to this bug at some later date.
Attachment #720147 - Attachment is obsolete: true
Attachment #721699 - Flags: review+
Inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f5b6b9b0157d
Target Milestone: --- → mozilla22
This will have to be nominated for approval and then landed prior to Tues Mar 12th when we go to build with beta 4 otherwise it looks like too much risk to take on later.  We'll need the time to evaluate and potentially backout if there are any regressions on beta channel after uplift.
I think it would make more sense to repeat the backout for 21, but get this patch onto aurora.

Scott, if that makes sense to you, could you make approval requests for both?
Flags: needinfo?(sjohnson)
https://hg.mozilla.org/mozilla-central/rev/f5b6b9b0157d
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
The approach dbaron outlines seems fine to me. Requesting approval for mozilla-beta uplift.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 764567
User impact if declined: Possible hang due to infinite loop in column set code.
Testing completed (on m-c, etc.): This is the same patch as the one that was uplifted to Firefox 19, so it's undergone testing on the release channel.
Risk to taking this patch (and alternatives if risky): Fairly minimal, as the backout has already been performed on release, as well as in several release cycles prior to now.
String or UUID changes made by this patch: None; It's important to consider, though, that the only reason this isn't the case is that prior backouts did make IID changes, but these were subsequently reverted (see bug 813264).
Attachment #722465 - Flags: approval-mozilla-beta?
Flags: needinfo?(sjohnson)
Adding backout patch interdiff, if you want to look at it, dbaron. I don't know whether the backout patch itself needs review or not, since (other than line numberings) it should be identical to the previous (FX19) backout patch.
Flags: needinfo?(dbaron)
Comment on attachment 721699 [details] [diff] [review]
b810726

[Approval Request Comment]
Bug caused by (feature/regressing bug #): column-fill feature implementation (bug 764567 and bug 695222)
User impact if declined: Hang caused by an infinite loop in the column reflow under certain conditions.
Testing completed (on m-c, etc.): Testing has been completed on mozilla-central and mozilla-inbound, as well as on numerous try runs.
Risk to taking this patch (and alternatives if risky): I think the risk is fairly small, but it does affect the way column reflow works, so there is a chance of regression. On the other hand, the patch fixes a known regression in column reflow that causes a hang. 
String or UUID changes made by this patch: None
Attachment #721699 - Flags: approval-mozilla-aurora?
(In reply to Scott Johnson (:jwir3) from comment #64)
> String or UUID changes made by this patch: None; It's important to consider,
> though, that the only reason this isn't the case is that prior backouts did
> make IID changes, but these were subsequently reverted (see bug 813264).

Apologies, we're having a hard time understanding this part. Can you explain it with a table?

Firefox 18 -> UUID A -> Behavior X
Firefox 19 -> UUID ? -> Behavior ?
Firefox 20 -> UUID ? -> Behavior ?
Attachment #721699 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Alex Keybl [:akeybl] from comment #67)
> Apologies, we're having a hard time understanding this part. Can you explain
> it with a table?
> 
> Firefox 18 -> UUID A -> Behavior X
> Firefox 19 -> UUID ? -> Behavior ?
> Firefox 20 -> UUID ? -> Behavior ?

Well, I think maybe I was being too verbose. Basically, what I meant was that 'No, there are no IID changes'. The table looks like this, I believe:

Firefox 18 -> UUID A -> Behavior X
Firefox 19 -> UUID A -> Behavior X
Firefox 20 -> UUID A -> Behavior X
Firefox 21 -> UUID A -> Behavior Y
Firefox 22 -> UUID A -> Behavior Y

The reason for this is that we changed the IID when we landed the column-fill feature. When that got backed out, the IID should have changed back, but because it was late in the beta cycle, we instead fixed it so that the IID didn't go back to what it was, but rather the column-fill changes were just un-implemented. Now, with this patch, those changes are again implemented, and the behavior is different. The IID change was already made, though, so it's not necessary again.
Flags: needinfo?(dbaron)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #68)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/b13c9d8f165b

Follow-up to annotate assertions that were fixed on m-c by bug 840818 and bug 847130. Uplift approval has been requested and this will be backed out if and when they land.
https://hg.mozilla.org/releases/mozilla-aurora/rev/133095cc43fa
Another follow-up (sorry, missed this test on the first push)
https://hg.mozilla.org/releases/mozilla-aurora/rev/352b0d752f5d
Comment on attachment 722465 [details] [diff] [review]
b810726-backout-fx20

Thanks for the clarification, please go ahead with landing this today so we have it out in time for tomorrow's Beta.
Attachment #722465 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Pushed a followup to beta that annotates the assertions:
https://hg.mozilla.org/releases/mozilla-beta/rev/871583c7b872
Does this need QA verification given in-testsuite coverage?
Backout patch requested for beta (fx21) due to regression bug 857324.
Attachment #743041 - Flags: review?(matspal)
Interdiff between backout patches for fx 20 and fx 21.
Comment on attachment 743041 [details] [diff] [review]
b810726-backout-fx21

r=mats, whitespace got messed up in two places:
indent "}" at layout/generic/nsColumnSetFrame.cpp:951
and remove empty line 947.
Attachment #743041 - Flags: review?(matspal) → review+
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 764567
User impact if declined: Infinite loop in some rare cases of column-fill usage
Testing completed (on m-c, etc.): This backout patch was landed on previous versions of firefox. 
Risk to taking this patch (and alternatives if risky): Regresses column-fill behavior, so it no longer works. 
String or UUID changes made by this patch: none
Attachment #743226 - Flags: review+
Attachment #743226 - Flags: approval-mozilla-beta?
Comment on attachment 743226 [details] [diff] [review]
b810726-backout-fx21

Approving the backout on Fx21 by which we will maintain the status quo similar to previous release and this option is deemed less riskier than the suggested fwd fix to fix the regression(bug 857324) for Fx21 timeframe, given we are left with our final two beta's.

Please make sure to land by tomorrow so this can get into our next beta.
Attachment #743226 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to bhavana bajaj [:bajaj] from comment #81)
> Comment on attachment 743226 [details] [diff] [review]
> b810726-backout-fx21
> 
> Approving the backout on Fx21 by which we will maintain the status quo
> similar to previous release and this option is deemed less riskier than the
> suggested fwd fix to fix the regression(bug 857324) for Fx21 timeframe,
> given we are left with our final two beta's.
> 
> Please make sure to land by tomorrow so this can get into our next beta.

Landed:
https://hg.mozilla.org/releases/mozilla-beta/rev/23c8ed5c349c
Can someone please answer to comment 76?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #76)
> Does this need QA verification given in-testsuite coverage?

No, I don't think so. The crash was exhibited by the testcase above, which I translated into a crashtest. So, unless there are other testcases that QA saw, I don't think there's much to verify that isn't already being run.
Whiteboard: qa-
Whiteboard: qa- → [qa-]
You need to log in before you can comment on or make changes to this bug.