Closed Bug 410969 Opened 17 years ago Closed 17 years ago

Fix for bug 409109 causes talos to fail

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
major

Tracking

()

VERIFIED FIXED

People

(Reporter: samuel.sidler+old, Unassigned)

References

Details

(Keywords: perf, regression)

(Filing for brendan.)

The fix that landed for bug 409109 caused the talos machines to go red, failing in the tgfx suite, specifically: talos/page_load_test/gfx/text2.html. (This happened before, with a previous version of the patch in bug 409109. See comment 10 there.) This test is run in through the pageloader and is failing due to an unresponsive script warning.

Instead of backing out, the tgfx suite was disabled and Alice restarted the builds. I'm adding the perf keyword because of this.


Tinderbox log: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1199537520.1199560480.5845.gz&fulltext=1
MXR link to Talos: http://mxr.mozilla.org/mozilla/source/testing/performance/talos/
Flags: blocking1.9?
It seems that the test is extremely sensitive to the frequency of callback calls since the patch could change that by few percent. 

Without the patch the callback is called on each backward script jump or when the total weight of other operations would exceed 4096 (the weight of the backward jump). Due to a small weight of those other operations with typical script code it means that they would not contribute to callback invocation count. Jump or returns are frequent enough in scripts so the weight would not grow past 4096 before the jump triggers the callback and resets the count. Then the callback would do something useful of each 256th invocation.

With the patch the callback is called when the count reaches 4096*256 and would do something useful on each invocation. But that bigger count means that the weights of non-jump operation contribute to the frequency of callback invocation as when the count reaches 4096*256 a sizable chunk of that can come from non-jumps. Effectively it leads to more branch callback calls. 

Now, this even slightly altered frequency of callback calls may lead to the following. On the first call the operation callback tries to detect if the executed script comes from chrome, and if so, set more permissive limits on the execution time, see ctx->mIsTrackingChromeCodeTime initializer at http://lxr.mozilla.org/seamonkey/source/dom/src/base/nsJSEnvironment.cpp#810 . Since now it is possible that the callback even on the first time is invoked in response to other operations rather than a script jump, the chrome detection code may get it wrong. 
The test page contains no script, just Jane Austen prose in a tiny font.

The only JS is from the XUL pageloader. I don't know that code or where to browse it -- cc'ing vlad. Since this pageloader can't be running JS the whole time the Jane Austen page is loading, I suspect the problem lies in a failure to reset the deadline for the slow script watchdog timer. See bug 408257 for another such bug, to do with the plugin API (probably; could be related).

/be
(In reply to comment #1)
> Now, this even slightly altered frequency of callback calls may lead to the
> following. On the first call the operation callback tries to detect if the
> executed script comes from chrome,

Aha! The pageloader must be chrome XUL. This is more like it.

/be
What's the status on this?  Tgfx continues to be off for all platforms and I'd like to get it back up and running.
Igor backed out bug 409109. I think it's safe to turn tgfx back on.
Tgfx is now active again, it'll show up in the next cycle.
Since the checking of an updated patch for bug 409109 has not affected Tallos tinderboxes, it confirms the theory from comment 1. But the full proof with landing of a patch for bug 411267.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.