Closed Bug 552809 Opened 14 years ago Closed 14 years ago

Intermittent black boxes from asyncDrawXULElement

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0a1+ ---

People

(Reporter: stechz, Assigned: joe)

References

Details

Attachments

(4 files, 3 obsolete files)

[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMCanvasRenderingContext2D.asyncDrawXULElement]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://browser/content/TileManager.js :: render :: line 682"  data: no]

This only happens occasionally and I've not linked it to anything specifically going wrong.
Assignee: nobody → joe
ok, seems we're getting this error systematically on win32 builds. I'm going to investigate today
Here's my build log for this fennectrolysis build under win32
(I'm not using any of Dougt's patches atm)
Assignee: joe → bugzillaFred
Status: NEW → ASSIGNED
Attached image Screen capture
Screen capture showing current fennectrolysis bogus behavior (google.com) that I suspect is related to the present bug
Blocks: 516521
I've found that this is happening due to double call of asyncDrawXulElement to the same canvas element without waiting for async response about drawFinished..

And in this case asyncDrawRequest failed in 
http://mxr.mozilla.org/projects-central/source/electrolysis/content/canvas/src/nsCanvasRenderingContext2D.cpp#3612
easily reproducible on about:license page.
joe, any idea what is going on?
Attached patch Fennec workaround (obsolete) — Splinter Review
Attachment #453589 - Flags: review?(mark.finkle)
Attachment #453589 - Flags: review?(mark.finkle) → review+
Comment on attachment 453589 [details] [diff] [review]
Fennec workaround

  wrapper.ctx = aCanvas.MozGetIPCContext("2d");
>   wrapper.drawContent = function(aLeft, aTop, aWidth, aHeight, aColor, aFlags) {
>-    this.ctx.asyncDrawXULElement(aBrowser, aLeft, aTop, aWidth, aHeight, aColor, aFlags);
>+    try {
>+      this.ctx.asyncDrawXULElement(aBrowser, aLeft, aTop, aWidth, aHeight, aColor, aFlags);
>+    }
>+    catch (e) {
>+      Components.utils.reportError(e);
>+      let self = this;
>+      setTimeout(function() {
>+        self.drawContent(aLeft, aTop, aWidth, aHeight, aColor, aFlags);
>+      }, 0);
>+    }

Actually, I'd like to just call the raw asyncDrawXULElement in the catch. We could possibly start a death loop if the function keeps throwing. We should only give it one extra chance.
I still see artifacting with this patch.
Attached patch Fennec workaround v2 (obsolete) — Splinter Review
Mark, I doubt you will like this.  This:
1) handles async render events to ensure that the render was successful
2) if there is an error and the event never comes, it will do a retry every 500 msecs up to 4 times.

See if this helps with artifacts.
Attachment #453589 - Attachment is obsolete: true
Attachment #453842 - Flags: review?(mark.finkle)
Comment on attachment 453842 [details] [diff] [review]
Fennec workaround v2

Overall, I can live with changes like this. I do find we still have rendering problems though. On IRC, you said it was when we start on a local page (about:home) and open a remote page (news.google.com)
Attachment #453842 - Flags: review?(mark.finkle) → review-
I based this patch off the assumption that canvas tiles should never render async content followed by sync content.

It could probably use cleanup, but this is dead code walking.

Let me know if you can reproduce render glitches.
Attachment #453842 - Attachment is obsolete: true
Attachment #453932 - Flags: review?(mark.finkle)
Comment on attachment 453932 [details] [diff] [review]
Fennec workaround v3

This patch definitely makes rendering better. I think we have an underlying platform problem though. Needed to destroy and recreate canvases when swithcing from drawWindow to asyncDrawXULElement seems wrong and busted.

Lastly, would it simplify anything if we just _always_ destroyed canvases when loading pages?
Attachment #453932 - Flags: review?(mark.finkle) → review+
... and yes, I really don't like the hoops we need to jumpt through to make this work. TileManager has reached a new "high" in complexity.
pushed to m-b:
http://hg.mozilla.org/mobile-browser/rev/1d0d9c3c481f

But not all rendering issues are fixed.
Mark and I have been noticing black boxes and weird artifacts occasionally.  I believe this is a problem with the async call.

The best way I've managed to reproduce the bug is by opening the start page and google.com in separate tabs and clicking between them.

It does not look like this is due to any unreported errors in the child.  PresShell is rendering to the document correctly when this problem occurs.  Maybe there is some sort of race condition.
tracking-fennec: --- → 2.0+
joe, still need your help here -- wan to get rid of the workaround asap.
tracking-fennec: 2.0+ → 2.0a1+
Bug 580197 shows some stacks where this bug (and the bandaid fix) is actually causing huge memory spikes and in some cases crashes after "too much recursion" problems.
There's a really easy way to trigger a failure - kill the content process in some manner, then switch to another tab.  AsyncDrawXULElement will try to get the child process, fail, check for an existing docShell and bail with NS_ERROR_FAILURE.  This almost certainly isn't the intermittent problem seen here, but it does make the browser explode due to the patch checked in here.
Assignee: bugzillaFred → joe
I'm seeing this on Android and Doug says he's seen it on maemo as well.
OS: Mac OS X → All
Hardware: x86 → All
Blocks: 581168
Summary: Intermittent NS_ERROR_FAILURE when calling asyncDrawXULElement → Intermittent black boxes from asyncDrawXULElement
(there are no NS_ERROR_FAILURE problems anymore, because the Fennec frontend code doesn't try to call asyncDrawXULElement on canvas elements that are pending another draw.)
Sometimes we get the wrong results in the parent process because the child process's X drawing hadn't yet been synced to the server. This patch fixes that.
Attachment #463223 - Flags: feedback?(webapps)
Joe:

I'm finding that I still need to recreate canvases everytime I switch from the start page to another web site.  If you comment out the "recreateCanvas" in TileManager.js.in here:

http://mxr.mozilla.org/mobile-browser/source/chrome/content/TileManager.js.in#835

You will see what I mean.
Attachment #463223 - Flags: feedback?(webapps) → review?(romaxa)
Comment on attachment 463223 [details] [diff] [review]
XSync after drawing in the child process

r=me with comment per IRC.
Attachment #463223 - Flags: review?(romaxa) → review+
Attachment #463223 - Flags: review+
Carrying over r+ - this version just contains the comment changes requested.
Attachment #463223 - Attachment is obsolete: true
Attachment #463281 - Flags: review+
http://hg.mozilla.org/mozilla-central/rev/bcd22ffaf253
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: