Last Comment Bug 693219 - setting innerHTML makes page jump
: setting innerHTML makes page jump
Status: VERIFIED FIXED
[qa!]
: regression, verified-aurora, verified-beta
Product: Core
Classification: Components
Component: Layout (show other bugs)
: 8 Branch
: x86 Mac OS X
: P2 major with 1 vote (vote)
: mozilla10
Assigned To: Boris Zbarsky [:bz] (TPAC)
:
Mentors:
Depends on:
Blocks: 539093 693821
  Show dependency treegraph
 
Reported: 2011-10-09 17:18 PDT by Mime Čuvalo
Modified: 2011-12-07 13:45 PST (History)
10 users (show)
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
affected
+
fixed
+
fixed


Attachments
testcase exhibiting issue (1.23 KB, text/html)
2011-10-09 17:18 PDT, Mime Čuvalo
no flags Details
testcase exhibiting issue (1.23 KB, text/html)
2011-10-09 17:20 PDT, Mime Čuvalo
no flags Details
Don't use TimeDuration methods from static constructors. (2.00 KB, patch)
2011-10-11 12:52 PDT, Boris Zbarsky [:bz] (TPAC)
roc: review+
christian: approval‑mozilla‑beta+
Details | Diff | Splinter Review
screenshot (8.40 KB, image/png)
2011-12-07 02:03 PST, Vlad [QA]
no flags Details

Description Mime Čuvalo 2011-10-09 17:18:12 PDT
Created attachment 565843 [details]
testcase exhibiting issue

My site has an infinite-scroll feature that adds more content to the bottom of the page when you get there.

However, in Firefox 8 there seems to be a regression that causes the page to jump/scroll to a point higher up in the document when this happens.  I cannot reproduce it in Firefox 7.0

I've created a test case that exhibits this behavior.  Open the file, go to the bottom of the page, click button to add a <div> and it'll make the page jump upwards.
Comment 1 Mime Čuvalo 2011-10-09 17:20:59 PDT
Created attachment 565844 [details]
testcase exhibiting issue

blah, for some reason bugzilla detected .html as text/plain...
Comment 2 Ryan VanderMeulen [:RyanVM] 2011-10-09 18:39:52 PDT
The testcase WFM on Windows 7.
Comment 3 Boris Zbarsky [:bz] (TPAC) 2011-10-09 18:44:20 PDT
Looks like this regressed in http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6df31af4cca6&tochange=19348341366b

Nothing in there jumps out at me, though...
Comment 4 Boris Zbarsky [:bz] (TPAC) 2011-10-09 18:44:32 PDT
Oh, and I'm also on Mac.
Comment 5 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-10-09 19:18:33 PDT
Maybe http://hg.mozilla.org/mozilla-central/rev/4053fa47daac or http://hg.mozilla.org/mozilla-central/rev/8d56702130fb ?
Comment 6 Boris Zbarsky [:bz] (TPAC) 2011-10-09 19:25:28 PDT
The former seems pretty unlikely.  The latter, maybe.  I'm not sure when I'll have a chance to bisect on builds here; if someone else can do that it would be lovely.
Comment 7 Mime Čuvalo 2011-10-10 04:17:13 PDT
Playing around with the testcase some more, fwiw, it seems that clicking the button to add innerHTML doesn't make the page jump some of the time.
Comment 8 Boris Zbarsky [:bz] (TPAC) 2011-10-10 18:36:41 PDT
Well, bisect says:

changeset:   73112:4053fa47daac
user:        Jeff Muizelaar <jmuizelaar@mozilla.com>
date:        Wed Jul 20 14:27:10 2011 -0400
summary:     Bug 539093. TimeStamp implementation for OS X using mach_absolute_time. r=cjones


Shows what I know!

Jeff, Chris, any idea what's up here?
Comment 9 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-10-10 20:12:49 PDT
That is ... bizarre.  No ideas off the top of my head.  Maybe ...
 - if something is checking the difference between TimeStamp t2 and t1 for the purposes of animation, then with Jeff's patch timestamps taken close to each other are more likely to be different since they have much higher resolution.  If that were the case, I would expect the bug to be in mac-specific code, because on desktop-linux we've had high-res timers for a long time.
 - maybe something doesn't expect decimal points in a millisecond value?  (/me waves hands)
 - were you testing on a multi-core machine?  Is mach_absolute_time() guaranteed to be monotonic across multiple cores?
Comment 10 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-10-10 21:45:59 PDT
Seems more likely to be related to the TimeStamp usage in layout/base/nsRefreshDriver.{h,cpp} than anything else I can think of, though I can't immediately see how.
Comment 11 Jeff Muizelaar [:jrmuizel] 2011-10-11 05:07:09 PDT
This only happens on OS X?
Comment 12 Mime Čuvalo 2011-10-11 07:25:25 PDT
Couldn't reproduce on Windows 7 (on a dual-boot Mac).
Comment 13 Mime Čuvalo 2011-10-11 07:29:57 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
>  - were you testing on a multi-core machine? 

In my case, yes, Intel Core 2 Duo.
Comment 14 :Ehsan Akhgari 2011-10-11 07:46:43 PDT
It seems to me like the regression on Aurora and Beta here should we dealt with using a backout.  Marking as tracking10? too in case this is not fixed before the next migration date.
Comment 15 Boris Zbarsky [:bz] (TPAC) 2011-10-11 08:35:51 PDT
I've only tested on OSX, and yes on a multi-core machine.
Comment 16 :Ehsan Akhgari 2011-10-11 08:37:58 PDT
(In reply to Boris Zbarsky (:bz) from comment #15)
> I've only tested on OSX, and yes on a multi-core machine.

I tested on the same configuration, and could reproduce very easily.
Comment 17 Boris Zbarsky [:bz] (TPAC) 2011-10-11 08:49:34 PDT
I tried changing ClockTime like so:

  static PRUint64 lastClock = 0;
  PRUint64 newClock = mach_absolute_time();
  NS_ASSERTION(newClock >= lastClock, "How did that happen?");
  lastClock = newClock;
  return (lastClock / 1000000) * 1000000;

and the bug remains.  The assertion does not trigger.  I believe that eliminates all the guesses from comment 9.  :(
Comment 18 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-10-11 09:00:08 PDT
(In reply to Boris Zbarsky (:bz) from comment #17)
> I tried changing ClockTime like so:
> 
>   static PRUint64 lastClock = 0;
>   PRUint64 newClock = mach_absolute_time();
>   NS_ASSERTION(newClock >= lastClock, "How did that happen?");
>   lastClock = newClock;
>   return (lastClock / 1000000) * 1000000;
> 
> and the bug remains.  The assertion does not trigger.  I believe that
> eliminates all the guesses from comment 9.  :(

Not completely, since you didn't use a lock to protect lastClock.  That said, that theory seems relatively unlikely.
Comment 19 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-10-11 09:07:07 PDT
Did you check that the various values set up in TimeStamp::Startup initialize to things that yield sensible results?
Comment 20 Jeff Muizelaar [:jrmuizel] 2011-10-11 09:08:14 PDT
Has anyone tested on linux? That's the other place we use higher precision TimeStamping.
Comment 21 :Ehsan Akhgari 2011-10-11 09:11:12 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #20)
> Has anyone tested on linux? That's the other place we use higher precision
> TimeStamping.

I can't reproduce the bug on Linux.
Comment 22 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-10-11 09:58:18 PDT
Neither can I.
Comment 23 Boris Zbarsky [:bz] (TPAC) 2011-10-11 10:13:06 PDT
> Did you check that the various values set up in TimeStamp::Startup initialize to things
> that yield sensible results?

With my code from comment 18, I get:

  NS PER TICK: 1.000000
  RESOLUTION: 1000000
  RESOLUTION SIG DIGS: 1000000

which I think is expected.
Comment 24 Boris Zbarsky [:bz] (TPAC) 2011-10-11 10:15:43 PDT
OK, so if I disable interruptible reflow (GECKO_REFLOW_INTERRUPT_MODE=counter GECKO_REFLOW_INTERRUPT_FREQUENCY=1000000 GECKO_REFLOW_INTERRUPT_CHECKS_TO_SKIP=1000000) then the issue goes away.

So presumably we're interrupting the reflow in this case for some reason with the new timestamp setup?
Comment 25 Boris Zbarsky [:bz] (TPAC) 2011-10-11 11:26:00 PDT
Aha.  nsPresContext.cpp has:

  static TimeDuration sInterruptTimeout = TimeDuration::FromMilliseconds(100);

This is, of course, happening before TimeStamp::Startup.  So sInterruptTimeout gets set to some random value.  Looking at mine in gdb right now, I see:

(gdb) p sInterruptTimeout 
$1 = {
  mValue = -9223372036854775808
}

Which means that this code in nsPresContext::CheckForInterrupt:

  mHasPendingInterrupt =
    TimeStamp::Now() - mReflowStartTime > sInterruptTimeout &&
    HavePendingInputEvent() &&
    !IsChrome();

sets mHasPendingInterrupt to true no matter what mReflowStartTime is.
Comment 26 Boris Zbarsky [:bz] (TPAC) 2011-10-11 12:52:59 PDT
Created attachment 566305 [details] [diff] [review]
Don't use TimeDuration methods from static constructors.
Comment 27 :Ehsan Akhgari 2011-10-11 13:08:09 PDT
Hmm, so there's no way we can make sure that no other code is ever going to use this pattern.  Should we just switch the backend implementation to initialize lazily?  Or at least make it abort if something is called on it before the initialization happens?  If we don't do this, we're going to have to fight this issue again in the future...
Comment 28 Boris Zbarsky [:bz] (TPAC) 2011-10-11 14:27:57 PDT
Yeah, it would be good to file a followup on making it impossible to shoot one's foot like this.
Comment 29 Boris Zbarsky [:bz] (TPAC) 2011-10-11 14:31:32 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff1d493bf113
Comment 30 Boris Zbarsky [:bz] (TPAC) 2011-10-11 14:32:48 PDT
I filed bug 693821.
Comment 31 Marco Bonardo [::mak] 2011-10-12 03:16:02 PDT
https://hg.mozilla.org/mozilla-central/rev/ff1d493bf113
Comment 32 Mime Čuvalo 2011-11-11 16:25:00 PST
Ah, I didn't realize this was landed for FF10.  Can someone point out to me how to workaround this until then?  Any kind of CSS trick I can employ to make the page flow smoothly?
Comment 33 Boris Zbarsky [:bz] (TPAC) 2011-11-12 01:34:52 PST
Not a CSS one, but you could get a layout property (e.g. |document.body.offsetWidth|) right after your innerHTML set; that should work around the problem, at the cost of forcing some work to be synchronous that would normally get batched.
Comment 34 Mime Čuvalo 2011-11-12 07:11:55 PST
Excellent!  Thanks, Boris - you're always so helpful :)
Comment 35 Boris Zbarsky [:bz] (TPAC) 2011-11-12 10:15:24 PST
Comment on attachment 566305 [details] [diff] [review]
Don't use TimeDuration methods from static constructors.

We should strongly consider getting this in for Firefox 9.  I'm sorry it missed 8; I had been waiting for the tracking-* triage, but that never happened.  :(

The impact of this bug is that on Mac we always try to interrupt reflow as early as we can.  The result is almost certainly too many reflows, wasting CPU and taking longer to display pages.

The patch is very very safe: it just moves initialization of a variable from application startup to the first time the variable is used.
Comment 36 christian 2011-11-15 13:33:32 PST
Comment on attachment 566305 [details] [diff] [review]
Don't use TimeDuration methods from static constructors.

[triage comment]
Approved for beta. Please land as soon as possible.
Comment 37 christian 2011-11-15 17:13:17 PST
http://hg.mozilla.org/releases/mozilla-beta/rev/e7cd4b3f47d6

This looks to already be on aurora.
Comment 38 Boris Zbarsky [:bz] (TPAC) 2011-11-15 19:30:28 PST
Yes, this was already on Aurora.  Thank you for pushing this to beta!
Comment 39 Vlad [QA] 2011-12-07 02:03:50 PST
Created attachment 579636 [details]
screenshot

Hi guys.

I have verified this bug on:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0) Gecko/20100101 Firefox/9.0 beta 4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0a2) Gecko/20111206 Firefox/10.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:11.0a1) Gecko/20111206 Firefox/11.0a1

The page doesn't jump anymore when I insert the text from the button, but I have to manually scroll the page to see the whole text.
When I clicked the button, the text appears as in the screenshot.

Is this intended? 
Thanks
Comment 40 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-07 08:47:30 PST
I believe this is FIXED as described.

From comment 0:
"My site has an infinite-scroll feature that adds more content to the bottom of the page when you get there...there seems to be a regression that causes the page to jump/scroll to a point higher up in the document when this happens."

If it does not JUMP then this can be marked VERIFIED.
Comment 41 Boris Zbarsky [:bz] (TPAC) 2011-12-07 13:45:05 PST
> Is this intended? 

Yes/

Note You need to log in before you can comment on or make changes to this bug.