Closed Bug 861919 Opened 12 years ago Closed 4 years ago

Lots of "ASSERTION: Should be in construction phase: 'InConstruction()'" at BasicLayerManager.cpp:533

Categories

(Core :: Graphics: Layers, defect)

23 Branch
All
Android
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: kats, Unassigned)

Details

Attachments

(6 files, 8 obsolete files)

9.71 KB, text/plain
Details
30.46 KB, application/x-gzip
Details
11.39 KB, application/x-gzip
Details
6.66 KB, patch
Details | Diff | Splinter Review
10.21 KB, application/x-gzip
Details
10.17 KB, application/x-gzip
Details
I'm running a local debug build of Fennec I'm getting a lot of assertion failure warnings in logcat. Pretty sure this wasn't happening last week, so I suspect it is a result of the layers refactoring. That's just a guess though. I/Gecko ( 4415): ###!!! ASSERTION: Should be in construction phase: 'InConstruction()', file /Users/kats/zspace/mozilla-git/gfx/layers/basic/BasicLayerManager.cpp, line 533 Line 533 in my version of BasicaLayerManager.cpp is the assertion in BasicLayerManager::EndTransactionInternal.
OS: Mac OS X → Android
Hardware: x86 → All
I had a go at reproducing this, but couldn't (although I do remember seeing these assertions at some point in the refactoring so I would blame that). It is possible it has been fixed by one of the patches which landed recently. Kats: were there any particular URLs that triggered this? If you could get a stack trace that would be awesome.
Flags: needinfo?(bugmail.mozilla)
I'm synced to m-c cset 7712e67a7c1e from last Friday and can reproduce this by loading mozilla.org and scrolling down. I grabbed a backtrace, it looks like mPhase is PHASE_NONE rather than PHASE_CONSTRUCTION.
Flags: needinfo?(bugmail.mozilla)
(In reply to Milan Sreckovic [:milan] from comment #3) > I wonder if bug 862236 is related. I don't think so, but I'm going to look at that bug next...
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2) > Created attachment 737989 [details] > Backtrace at assertion failure > > I'm synced to m-c cset 7712e67a7c1e from last Friday and can reproduce this > by loading mozilla.org and scrolling down. I grabbed a backtrace, it looks > like mPhase is PHASE_NONE rather than PHASE_CONSTRUCTION. Thanks kats! I had another go at reproducing and couldn't do so. I also went through the code and I don't see any obvious way we could end with PHASE_NONE here. I'll have a go with an older version, to check if this has been fixed.
Nope, couldn't repro with that build either. I'm officially giving up for now. If anyone comes up with different STR then I'll take another look.
I can still repro this after syncing up to m-c yesterday (I built with a few local changes but they were unrelated). I don't know if the device matters but I'm running on a Nexus 4. If you have a patch with debugging output or something you'd like me to try let me know.
Attached patch logging (obsolete) — Splinter Review
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7) > I can still repro this after syncing up to m-c yesterday (I built with a few > local changes but they were unrelated). I don't know if the device matters > but I'm running on a Nexus 4. If you have a patch with debugging output or > something you'd like me to try let me know. Trying on a Nexus 7, I hope this is not hw dependent, it really shouldn't be. And we must be running very similar versions of Android. Anyway, if you get a moment to try running with the logging patch I just posted, that should give me some clue as to what is going on. It will probably take a few iterations to find out properly though.
Attached file logging output (obsolete) —
This is a full logcat dump that includes: 1) starting fennec and waiting for gecko init 2) tapping the mozilla.org topsite link on about:home (circa the 08:30:26 timestamp) 3) waiting for the page to load, and then doing a couple of pan/zoom operations on it until I saw the assertion go by in the log 4) Killing fennec The first assertion shows up in the log at 08:31:08.281 and there's a few more after that. Also I see that your log output is basically flooding the log - beware that the android logging goes to a circular buffer and so log output can get overwritten if it's not read out fast enough. I've run into this problem with missing logs before, and had to log to a file (in the code) to get around it. I don't know if your log volume is high enough to hit this problem but if you see things that don't make sense then this might be why.
Attached patch loggin round 2 (obsolete) — Splinter Review
Attachment #738787 - Attachment is obsolete: true
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #10) > Created attachment 738910 [details] > logging output > > This is a full logcat dump that includes: > 1) starting fennec and waiting for gecko init > 2) tapping the mozilla.org topsite link on about:home (circa the 08:30:26 > timestamp) > 3) waiting for the page to load, and then doing a couple of pan/zoom > operations on it until I saw the assertion go by in the log > 4) Killing fennec > > The first assertion shows up in the log at 08:31:08.281 and there's a few > more after that. > > Also I see that your log output is basically flooding the log - beware that > the android logging goes to a circular buffer and so log output can get > overwritten if it's not read out fast enough. I've run into this problem > with missing logs before, and had to log to a file (in the code) to get > around it. I don't know if your log volume is high enough to hit this > problem but if you see things that don't make sense then this might be why. Thanks for the logs! They are interesting, something weird is going on. So I put some more detail into the logging code, if you could give it a run when you get a chance that would be great, thanks!
Attached file logging output from round 2 (obsolete) —
Attached. It took much longer this time to hit the assertion; not sure why. Also I started hitting another assertion in GLContext.cpp.
Attachment #738910 - Attachment is obsolete: true
Not waiting on me here, right?
Flags: needinfo?(ncameron)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #14) > Not waiting on me here, right? no, still trying to make sense of the logs - there are more layer managers than I expected and one of them seems to go rogue, I can't figure out why. Will probably request some more logging at some point. Thanks!
Flags: needinfo?(ncameron)
Attached patch logging round 3 (obsolete) — Splinter Review
So it looks like we are calling EndTransaction without BeginTransaction, which would cause the assertion, but it's not obvious why. I narrowed it down to four candidates, but need more info to narrow it down any further, so another round of logging. Stat holiday tomorrow, so no rush. Thanks!
Attachment #739271 - Attachment is obsolete: true
Flags: needinfo?(bugmail.mozilla)
Logging from round 3. As with round 2 it took me quite a while to generate the assertion, not sure why that is.
Attachment #739622 - Attachment is obsolete: true
Flags: needinfo?(bugmail.mozilla)
Here's another run using round 3 with a shorter log and more assertions. Turns out the trick to get the assertions is scrolling down closer to the bottom of mozilla.org (I did one fling down to halfway down the page, wait half a second, and then did another fling down)
Great, thanks Kats! So, the logs show that we are calling EndTransaction from here: http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#1959 Without calling BeginTransaction. I am 99% sure that this is not due to the layers refactoring as we didn't really touch FrameLayerBuilder and certainly not any calls to *Transaction. I'll see if I can repro myself with the more detailed STR.
My guess here is that this is happening during a repeat transaction. For inactive layers, we call BeginTransaction during layer constructor for the parent layer manager, and then we should hit the above EndTransaction call when painting the ThebesLayer. If we have a repeat transaction (where we do only some of the ThebesLayer painting, send it to the compositor, then do the remainder), then we will paint the same ThebesLayer multiple times without extra layer building steps. If the inactive layer content lies across the boundary, such that it gets drawn both times, then this assertion seems expected. Shouldn't be hard to confirm, add prints around layer building, and when we hit a repeat transaction in BasicShadowableLayerManager::EndTransaction. Assuming that is the case, then the assertion shouldn't cause any problems and we can just work around it.
Attached patch logging round 4Splinter Review
OK, this tests Matt's repeat transaction hypothesis (which seems totally valid to me). If we are right, then this will be the last round of logging.
Assignee: nobody → ncameron
Attachment #741218 - Attachment is obsolete: true
Flags: needinfo?(bugmail.mozilla)
Attached patch sad fix (obsolete) — Splinter Review
If kats's logging confirms this is due to repeat transactions, then I don't think there is a nice fix, because when we trigger the repeat transaction, we have no way to get to the temporary layer manager which paints the inactive layer. The best we can do is not throw the toys out of the pram if we are in PHASE_NONE, which is a bit sad, but not too much.
Attachment #742893 - Flags: review?(matt.woodrow)
Comment on attachment 742893 [details] [diff] [review] sad fix Review of attachment 742893 [details] [diff] [review]: ----------------------------------------------------------------- How about checking InTransaction before we call EndTransaction, and call BeginTransaction() (or a new more specific function) if we're not in one. That way we only stop the assertion for this specific case, and it can continue to be useful in others.
(In reply to Matt Woodrow (:mattwoodrow) from comment #23) > Comment on attachment 742893 [details] [diff] [review] > sad fix > > Review of attachment 742893 [details] [diff] [review]: > ----------------------------------------------------------------- > > How about checking InTransaction before we call EndTransaction, and call > BeginTransaction() (or a new more specific function) if we're not in one. > > That way we only stop the assertion for this specific case, and it can > continue to be useful in others. Can we guarantee that we won't have a non-default render target? Because BeginTransaction will reset that. And I guess we can actully check when we do this if we are in fact in a repeat transaction in the non-temp layer manager
> Can we guarantee that we won't have a non-default render target? Because > BeginTransaction will reset that. Yes, the transaction is always started here (http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#2383) for the temporary managers.
Attached patch happy fix (obsolete) — Splinter Review
kats: does this make the assertions go away?
Attachment #742893 - Attachment is obsolete: true
Attachment #742893 - Flags: review?(matt.woodrow)
Attachment #742923 - Flags: review?(matt.woodrow)
Attachment #742923 - Flags: feedback?(bugmail.mozilla)
Attachment #742923 - Flags: review?(matt.woodrow) → review+
Output from logging round 4. The "happy fix" patch doesn't build on my current m-c checkout so I wasn't able to test that. I'll take a look at the build failure and see if I can rebase it appropriately. /Users/kats/zspace/mozilla-git/layout/base/FrameLayerBuilder.cpp:3286:29: error: 'mozilla::FrameLayerBuilder::LayerManager' has no member named 'InTransaction' /Users/kats/zspace/mozilla-git/layout/base/FrameLayerBuilder.cpp:3287:30: error: 'class mozilla::layers::LayerManager' has no member named 'GetRepeatTransaction'
Flags: needinfo?(bugmail.mozilla)
Comment on attachment 742923 [details] [diff] [review] happy fix Review of attachment 742923 [details] [diff] [review]: ----------------------------------------------------------------- I stuck in a few static_casts to make it compile (not sure if that's right or not), but it didn't make the assertions go away. I will attach the updated patch that I used and the output.
Attachment #742923 - Flags: feedback?(bugmail.mozilla) → feedback-
Attached patch happy fix with static_casts (obsolete) — Splinter Review
This is the modified version of "happy fix" that I used.
Attachment #742923 - Attachment is obsolete: true
Attachment #743069 - Attachment is obsolete: true
Hmm, so the last round of logging shows (I think) that a repeat transaction sometimes causes this, but there is some other reason which more often causes it and often a repeat transaction does not cause this issue. The latter is odd, I would have thought a repeat transaction would always cause these problems. The only way this could be a repeat transaction is if the layer manager we set the repeat transaction on is not the layer manager owned by the Thebes layer which is rendered (inactive) the second time around. I'm not sure if that is possible.
Assignee: ncameron → nobody
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: