Status

()

Core
DOM
P3
normal
9 years ago
4 years ago

People

(Reporter: Hixie (not reading bugmail), Assigned: bz)

Tracking

(Depends on: 3 bugs, {perf})

Firefox Tracking Flags

(Not tracked)

Details

(URL)

(Reporter)

Description

9 years ago
HTML5's various scripts are slow to run.

Specifically, once HTML5 is loaded, the time it takes for the toc, dfn, and status scripts to complete is about an order of magnitude more than Webkit (Opera doesn't do too well and IE can't run the tests).

TESTCASE
   http://damowmow.com/playground/bugs/slow/html5/001/?profile

There's a tarball:
   http://damowmow.com/playground/bugs/slow/html5/001/html5-perf-001.tar.gz
   http://damowmow.com/playground/bugs/slow/html5/001/README

Running with "?profile" at the end of the URL will cause the scripts to report how long they took in the <h2> at the top of the page. To test only one script at a time, the init() function in the index.html file can be modified to only load one test.
Ian, this would be a lot easier to profile with an actual tarball that had the relevant bits in it (and in particular had the XMLHttpRequested stuff, or if that stuff were white-listed for cross-site loads so saving locally and using <base> could work...)
Specifically for status.js.  I think the other ones I can do locally.
(Reporter)

Comment 3

9 years ago
The tarball is fixed now.

The status.js on the damowmow.com site uses static surrogates for the server-side script component that are included in the tarball.
Hmm.  I still get a "Fatal Network Error" alert with that tarball when running status.js...
(Reporter)

Comment 5

9 years ago
You might have to run it over a local HTTP server, I don't know if XHR over file:/// emulates the HTTP headers and so forth accurately enough.
OK, so as far as dfn.js goes, it should take at least 17829/3000 * 100 == 600ms or so (because it sets a 100ms timeout every 3000 links, and there are 17829 links it processes total).  It takes 800ms in Safari, and about 6000ms in Gecko.

Of those 6000ms in Gecko, about 2400ms are just painting triggering reflow during the timeout; they have nothing to do with the script per se; it's just that the reflow takes a lot longer than 100ms, so affects the script run time.

Of the remaining 3600ms, I see a bunch of parsing, execution of other scripts, etc.

In the interests of comparing apples to apples, I commented out the setTimeout on % 3000 (so that the initDfnInernal() runs straight through instead of possibly including other parts of pageload in the timing).  If I do that, Gecko runs in about 1000ms; Safari in about 300ms.

Going to go profile that now.
Ah, ok.  Yeah, I'll try status.js over http.
OK, dfn.js:

7% is GetParentNode.  Almost all of this time is XPCWrappedNative::GetNewOrUsed.  I bet peterv's work will help here.

A few percent is XPConnect overhead for calling GetHash, which isn't quickstubbed.

45.6% is under GetHash.  A large part of this is it being stupid and creating a URI, serializing it, then creating a URI from the resulting string, etc.  I thought I'd told someone to file a bug on this at some point... In any case, bug 481318 filed; thanks for reminding me about this issue.  ;)

About 8% is spent getting a node from document.links.  Again, mostly in GetNewOrUsed.

2% or so is newResolve on the document.links when getting array props.

A few percent under GetClassName, newResolve on the HTMLDocument, etc.  About 1.5% atomizing strings to ids in the JS engine.

I'll reprofile in a bit after I fix bug 481318.
Depends on: 481318
Depends on: 481335
OK, so that helps some (shaves about 170ms off total time on dfn.js).  GetHash is still 30-some percent.  It doesn't help that the script gets it twice for each link.  I filed bug 481335 on something that might help that situation.

Quickstubbing .hash shaves off another 130ms or so(!).  Filed bug 481336 on that.

At this point, GetNewOrUsed is also about 15% of the total time, so eliminating it will help a good bit.

We should reprofile once those issues are resoved.
Depends on: 481336
(In reply to comment #9)
> At this point, GetNewOrUsed is also about 15% of the total time, so eliminating
> it will help a good bit.

How much of this is dealing with the maplock? I wonder if it might just be worth the effort to maintain a main-thread-only hash of wrappers separate from the "must lock" set.
OK, next up is toc.js.  We spend about 1000ms on this, while webkit spends 26ms.  90% of the time is spent in GetElementById.  If I change the id table to always be live, the time drops to 63ms.  In other words, the time for this script is dominated by the 63 id lookup misses needed before we send the id table live and the resulting DOM walks...  I suppose we could make the table always live, but that has other performance penalties.  The cost of the current setup is bounded above, and the test is somewhat of a pathological case.  jst, sicking, any thoughts?

With an always-live id table, the time is spent in: GetHash (20%; see above for what the deal with it is), GetNewOrUsed (15% or so, called from various DOM traversal methods), nsHTMLDocument::ResolveName (13%, all under FindNamedItems walking about the DOM), populating the various document.getElementsByTagName("ol") content list (10%; the page defeats our lazy optimization by asking for the length when it doesn't really care about it), and some other minor stuff.

Fixing GetHash and GetNewOrUsed would be a good start; also sorting out the ResolveName issue.

To answer the question about GetNewOrUsed, since I've got another profile of it right here, it breaks down like so in this case:

6% self
14% XPCWrappedNative::FindTearOff
8% QI
7% nsDOMClassInfo::PostCreate
7% XPCWrappedNative:Init
7% JS_DHashTableOperate
6% ~nsCOMPtr_base (all cycle collector all the way!)
6% PR_EnterMonitor
6% nsElementSH::PostCreate
6% XPCWrappedNativeProto::GetNewOrUsed
5% operator new(unsigned long)
5% XPCWrappedNative::GatherScriptableCreateInfo
3% PR_ExitMonitor
3% nsContentListSH::PreCreate
1% PR_AtomicIncrement

plus some various minor stuff.
Now status.js.  We take about 9000ms in the test I did, though I've seen this number as high as 22000ms.  Webkit takes about 4000ms.  I don't see an easy way to measure just this script's time, since it's fundamentally async (uses XHR), in addition to doing setTimeout.

Ian, I'm going to need a hand here to see which parts of the profile really pertain to what status.js is doing...  Or at least it'd be nice.  I can probably sort it out on my own if I really have to.  ;)

19% of the time is spent handling reflow events.  About 2/3 of this is processing pending restyles; the rest is layout.  

54% of the time is spent handling restyle events.  42% (so about 80% of the restyle event time) is spent under RecreateFramesFor.  12% (so about 20% of the restyle event time) is spent computing the new style data.

The time under RecreateFramesFor is almost entirely comparing of tree positions for nsLayoutUtils::FindSiblingViewFor, called from CreateViewForFrame, called from BeginBuildingScrollFrame.  That accounts for 39% of the total test time.  The remaining 3% that are under RecreateFramesFor are under ~FrameConstructorState calling ProcessFrameInsertions and hence doing more tree position comparisons.

I'm not sure where these style changes are coming from; Ian, do you know?

12% of total time on the testcase is running timers.  A bit more than half of that is the GC timer, and that time is spent doing GC.  The remainder is the setTimeout timer, which seems to be running some JS, doing some GetHash, doing some DOM modification, etc.  All this is about 5% of total time reported, so not that much worth worrying about.

6% of total time is what looks like the initial script evaluation.  At least half of this is GetElementById (I went back to not hacking the id table liveness before profiling this).  The rest seems to be getting stuff from nodelists and whatnot; there isn't much time spent there to worry about.

7.5% of total time is spent processing paint events.  This breaks down as: 5.3% processing restyles, 2.1% reflowing, 0.1% mac widget layer overhead.  Actually painting is coming in at 0.0% here.  ;)  The restyles here are pure style data computation cost; no frametree munging here.

Conclusion: If this script is actually changing style data, then approximately 70% of the time above is relevant: it's spent processing the style changes.  Over half of it is the view insertion stuff, which we should fix by getting rid of views.  ;)  Ian, this script expected to either modify style data or change nodes in a way that would cause different rules to match?  roc, I assume there's nothing simple we can do to improve the view insertion stuff here?
(In reply to comment #12)
> Now status.js.  We take about 9000ms in the test I did, though I've seen this
> number as high as 22000ms.  Webkit takes about 4000ms.  I don't see an easy way
> to measure just this script's time, since it's fundamentally async (uses XHR),
> in addition to doing setTimeout.

Boris, if you can get Venkman working with 1.9.1 code, it does have a section in its profiling spreadsheets for "time spent in function", so you can exclude time spent waiting for a XHR response.
I'm not profiling with 1.9.1; doing that would be sort of a waste of time...  :)

And in any case, given the fact that some of the async things the script does really might be relevant (e.g. the restyles), that wouldn't tell me the information I need.
Getting rid of views is the thing to do there.
Depends on: 337801
(Reporter)

Comment 16

9 years ago
Re status.js, it inserts a lot (about 300-400) nodes into various places in between the ~10000 nodes that are children of the <body>, with a selector that should make them display:none until the very end, where it flips a switch and they all become display:block / position: absolute. After that point, it then tries to read the offsetTop of every single one of them. So it's quite possible that that is all doing style computation, yeah, but layouts should be deferred til the end (and should happen async between timeouts, in theory).
Aha.  So the restyles I'm seeing are in fact part and parcel of that script.  OK.  The insertion per se is not a problem, but finding siblings with views in that list is (becase there are so few, comparatively). That makes perfect sense.
With bug 481318, bug 481335, bug 481336 all fixed, dfn.js non-timeout time drops from about 1000ms to about 450ms.

Remaining places time is spent: GetNewOrUsed (30% of total now; a third of this is getting primary frames in PostCreate!), GetHash (13% of total now), NewResolve stuff (12% of total), js_Interpret (10% of total).  Once we jit across the calls and nix GetNewOrUsed, should re-profile.

I filed bug 481440 on the id table liveness issue.
Depends on: 481440
Depends on: 481677
Depends on: 513169
I did some more digging in here.  One reason for the vast number of restyles I see in status.js is that these three scripts are in fact running in parallel.  And toc.js and dfn.js some DOM manipulation that triggers async restyles on the body due to bug 501848.  That time ends up getting measured as part of the status.js time.

Another source of restyles here are the className changes on the <body>.  I bet bug 494117 is biting us there.

We seem to do a fair amount of layout too, but I haven't had a chance to look into that yet.
Depends on: 501848, 494117
Depends on: 526394
Duplicate of this bug: 524436

Comment 21

8 years ago
Martijn suggests this bug also affects
http://whatwg.org/specs/web-apps/current-work/#getelementsbyclassname
when running FF 3.5.5 or FF 3.6
Right.  That's exactly the page this bug is about; the urls in this bug are just instrumented versions of that page that measures its own performance.

Updated

8 years ago
Duplicate of this bug: 535932

Comment 24

8 years ago
I wonder if Bug 535994 should be duped to this bug or refocused, because there is an actual regression identified.  See bug 535994 comment 6.
For performance bugs, best not to dup if there are any doubts.  If in doubt, mark dependent and comment, as you just did.
Blocks: 535994

Comment 26

8 years ago
(In reply to comment #25)
> For performance bugs, best not to dup if there are any doubts.  If in doubt,
> mark dependent and comment, as you just did.

Then I don't understand why bug 524436 was marked as a dup of this bug 481131, while bug 535994 (which is the same bug, the summary being almost the same) was just marked as dependent on this bug 481131.
Duplicate of this bug: 535994
Vincent, because one bug just said "there's a hang on trunk" (and we know why there's a hang on trunk, and we have a patch for it) whereas the other bug said "there's a hang on trunk, and that hang is a regression, and oh, the duration of the hang depends on whether the load is coming from network or not".  That last bit is NOT true for the hang I know about, in my testing, so it wasn't obvious to me that the bug was the same, pending more investigation.

At this point, my best guess is that the "local" version in bug 535994 just didn't load the scripts (presumably because the whole site wasn't local), so wasn't hitting the hang case.

Make sense?

Comment 29

8 years ago
(In reply to comment #28)
> At this point, my best guess is that the "local" version in bug 535994 just
> didn't load the scripts (presumably because the whole site wasn't local), so
> wasn't hitting the hang case.
> 

This assertion is incorrect.  The "local" version *DOES* load all 12 scripts and 3 CSS files.  You can verify this yourself if you have the JSView extension installed.  It is even evident from comparing the page when loaded using either method.

If you're drawing the conclusion based on results I posted in response to your wanting me to test with scripts disabled, that is improper.  If you look at the timing I posted in bug 535994 comment 15, compared to all the other timings, it should be clear that there is no correlation to the "local" case where script are not disabled.
I commented in bug 535994, but unless your local version is actually running against a local HTTP server it almost certainly isn't doing the same thing as the "remote" version; last I checked the HTML5 scripts didn't work correctly when the XHRs they do are not running against an actual HTTP server.  Maybe hixie changed that, though.  In any case, let's take that up in bug 535994.

Comment 31

8 years ago
(In reply to comment #29)
> This assertion is incorrect.  The "local" version *DOES* load all 12 scripts
> and 3 CSS files.  You can verify this yourself if you have the JSView extension
> installed.  It is even evident from comparing the page when loaded using either
> method.

But does it *execute* the scripts exactly like the remote version? This is not obvious and could be a security hole (e.g. giving local information to remote servers), depending on the scripts.

BTW, "local" is inaccurate: does this mean "file:" or with a local web server?

Comment 32

8 years ago
(In reply to comment #31)
> BTW, "local" is inaccurate: does this mean "file:" or with a local web server?

Correct: file://

As for the rest, I plead ignorance.
Duplicate of this bug: 540138
Duplicate of this bug: 543244
OS: Mac OS X → All
Hardware: x86 → All

Updated

8 years ago
Duplicate of this bug: 550703

Comment 36

8 years ago
I wonder what this bug is about exactly: the summary says "HTML5 scripts are slow" and the bug is marked as normal, suggesting that the problem is local to a tab (e.g. the web page in some tab is slow to render due to slow script and things like that).

But the real problem observed by users (see the duplicates) is different: the whole browser hangs (if the effect is to wait for an hour, I wouldn't use the term "slow"); it is no longer possible to use the other tabs. IMHO either a separate bug should be opened or the summary and importance should be corrected: the summary should contain "hang" and/or "freeze" and the importance should be set to critical (just like a crash).

Another reason why the importance should be raised: while one can still "quit" Firefox, it is not possible to restart it. This is very bad for the average user. Looking at this more closely, Firefox is still running (without an open window) and one can kill it from a terminal, with the same effects as a crash.

Comment 37

8 years ago
Vincent: priorities are for engineers to use to express their personal priorities relative to other bugs they may own which can include crashers and security bugs which are more important. The assignee is currently Out (as indicated). If you disagree with the priority indicated, you can provide your own patch for the bug, and when you take ownership of the bug, you can set the priority to whatever you like.

Until then, please don't complain about priorities. Please do read:

https://bugzilla.mozilla.org/page.cgi?id=fields.html#priority
https://bugzilla.mozilla.org/page.cgi?id=etiquette.html
The hang is fixed on trunk. This bug is not about the hang.

Comment 39

8 years ago
timeless: you mixed up priority and importance.

Robert: if the bug is not about the hang, then some bugs have been incorrectly marked as duplicates.
Vincent: Yes, that happens all the time. Please reopen any bug you feel has been erroneously duped (or ask in *that* bug to have it reopened).

There's not much we can do here to fix errors in other bugs.

Comment 41

8 years ago
Well, as a user I don't know the reason why the FF application hangs. But my Bug 550703 is about the whole application hanging. One could refine the findings therein to: "hanging on a particular page", which shouldn't occur anyway. But the focus on my bug is actually that under no circumstances the whole application should hang if there is a "problem" with any document/tab.

So I have now updated my bug to not be a duplicate. Thanks for the hint, Jonas.
> I wonder what this bug is about exactly

It's a tracking bug for performance problems on the HTML5 scripts.

> and the bug is marked as normal, suggesting that the problem is local to
> a tab

Nothing of the sort is suggested.

> if the effect is to wait for an hour, I wouldn't use the term "slow"

The hang is about 1 minute long on year-old laptop hardware on the current spec.  It was much shorter when the bug was filed, of course, since the spec was much smaller then.

Note that the performance issue that was causing the longest observable hang was fixed in bug 526394.  There was no dedicated bug about the performance problem itself, because it was known what technical change was needed to fix it (see comment 15); therefore the bug was about the technical change.  That's pretty much what Robert said in comment 38.  I've been marking the "hang" bugs as duplicates of this one, as have some others, because we didn't want to add noise to bug 526394.  I suppose now that that has landed it's a moot point.

Vincent, I suggest disregarding comment 37.  ;)

I'll mark bug 550703 a duplicate of something else, I guess, if that will make people happier....
Bug 494117 didn't really help much because of bug 501848.  Which I think we should try to fix.
Priority: -- → P2
Priority: P2 → P3

Comment 44

5 years ago
Firefox 19 (20121022)
2009-03-02; load: 10319ms; toc.js: 36ms; dfn.js: 8036ms to do 17829 links; status.js: 15897ms

Firefox 17
2009-03-02; load: 8011ms; toc.js: 32ms; dfn.js: 7298ms to do 17829 links; status.js: 17372ms

Chrome 22
2009-03-02; load: 6270ms; toc.js: 31ms; dfn.js: 874ms to do 17829 links; status.js: 3963ms

If the bug is about a hang. I don't a saw something other than the hang caused by js (I think).

Comment 45

4 years ago
Nightly 30.0a1 (2014-03-08) + ABP
2009-03-02; load: 5323ms; toc.js: 13ms; dfn.js: 4594ms to do 17844 links; status.js: 6620ms

Nightly 30.0a1 (2014-03-08)
2009-03-02; load: 4786ms; toc.js: 12ms; dfn.js: 3135ms to do 17844 links; status.js: 5160ms

Chrome 33
2009-03-02; load: 3180ms; toc.js: 12ms; dfn.js: 750ms to do 17829 links; status.js: 3590ms

IE 11
2009-03-02; load: 2886ms; toc.js: 48ms; dfn.js: 1881ms to do 17829 links; status.js: 3947ms

Firefox is a lot slower than Chrome and Internet Explorer on dfn.js and 40% slower on status.js. And AdBlock Plus makes Firefox slower.
You need to log in before you can comment on or make changes to this bug.