Closed Bug 464838 Opened 11 years ago Closed 11 years ago

Tp3 regression on 2008-11-11

Categories

(Core :: General, defect, P1)

x86
All
defect

Tracking


RESOLVED FIXED
mozilla1.9.1

People

(Reporter: bzbarsky, Assigned: mcmanus)

References

Details

(Keywords: fixed1.9.1)

Attachments

(3 files, 7 obsolete files)

We seem to have a Tp3 regression on at least some of the Mac and Linux machines (e.g. qm-plinux-fast03).  The checkin range is:

http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2008-11-11+19%3A00%3A00&enddate=2008-11-11+20%3A00%3A00

Out of those, I'd expect the DNS prefetch to be the most likely culprit.

Not sure we need to worry about this for 1.9.1b2, but we should figure out what's up here.  One test I'd suggest is commenting out the prefetch call in nsHTMLAnchorElement::BindToTree to see what impact, if any, it has.  It's my best guess for the regression source.
Flags: blocking1.9.1?
Though I have tried to understand just what tp3 is composed of, I didn't quite conquer that mountain.

I put together some of my own measurements, using -tp and -tpcycles, and depending on the page can see a 1-2% average slowdown with the patch removed, using a well-connected 100% hit proxy. Though the stddev on my tests tends to be well over 5%, so I'm not sure I put a heck of a lot of faith in the average. As I mentioned earlier, that's basically a test of overhead for this patch - there is no upside to prefetching DNS when using a proxy.

Anyhow, I will attach a patch that seems to speed things up considerably. On my tests it gets us within 1 or so ms on a page load that has an average of around 800ms. The measurements are a bit volatile, but it is a clear improvement on the trunk.

It does this by:

  * removing the allocations and hash lookups from the BindToTree() path. These are turned into an insertion into a static FIFO, and only when the document finishes loading are all the entries in the FIFO pushed through the DNS cache in a batch. Modeled after the existing link prefetch code.

  * the async-open early lookup code short-circuits itself when a proxy is being used. This makes sense to do, but didn't notably help the benchmark in isolation.

If talos doesn't agree, it would be helpful to identify which particular pages show the slowdown, so they can be analyzed more closely.
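The deferral scheme described above can be sketched in isolation. This is a simplified model with invented names (`DeferredDNSQueue` and its members are illustrative), not the actual patch, which stores nsIURI pointers and resolves via nsIDNSService: additions are cheap ring-buffer stores, and the whole batch is pushed through the resolver on one flush.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <string>

// Simplified model of the deferred-prefetch FIFO (illustrative only).
// Because one slot is sacrificed to distinguish full from empty, the
// queue holds kMax - 1 entries.
class DeferredDNSQueue {
 public:
  // Adding is cheap: no allocation, no hash lookup, just a ring-buffer store.
  bool Add(const std::string& host) {
    size_t next = (mHead + 1) % kMax;
    if (next == mTail) return false;  // queue full, drop the prefetch
    mEntries[mHead] = host;
    mHead = next;
    return true;
  }

  // Called once the document finishes loading: push the whole batch
  // through the resolver callback in one pass.
  void Flush(const std::function<void(const std::string&)>& resolve) {
    while (mTail != mHead) {
      resolve(mEntries[mTail]);
      mTail = (mTail + 1) % kMax;
    }
  }

 private:
  static constexpr size_t kMax = 512;  // matches the 512-entry cap discussed
  std::array<std::string, kMax> mEntries;
  size_t mHead = 0, mTail = 0;
};
```

The point of the structure is that BindToTree pays only for the store; all resolver work is deferred off the hot path until load completion.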
Attached patch patch r 1 (obsolete) — Splinter Review
Assignee: nobody → mcmanus
Attachment #350108 - Flags: review?(bzbarsky)
> Though I have tried to understand just what tp3 is composed of, I didn't quite
> conquer that mountain.

What were the obstacles?  Getting your hands on the pageset?

> it would be helpful to identify what particular pages might be showing the
> slowdown

This data was available from Tinderbox when this bug was filed (and might still be available).  It's in the Tinderbox log.

One question here.  Is the main effect from doing the lookups in a batch, or from limiting the number of lookups to 512?
> One question here.  Is the main effect from doing the lookups in a batch, or
> from limiting the number of lookups to 512?

batch plus deferral.. 256 gave almost the same results
Priority: -- → P2
Target Milestone: --- → mozilla1.9.1
Attachment #350108 - Flags: review?(bzbarsky) → review-
Comment on attachment 350108 [details] [diff] [review]
patch r 1

>+++ b/content/html/content/src/nsHTMLAnchorElement.cpp
>@@ -224,17 +208,24 @@ nsHTMLAnchorElement::BindToTree(nsIDocum
>+  // Prefetch links
>+  if (nsHTMLDNSPrefetch::IsAllowed(GetOwnerDoc())) {

It's probably worth checking aDocument here too, so we don't do prefetch stuff on various mutations of disconnected subtrees.

It would also be nice to not prefetch more than once for a given node, so we don't keep doing prefetch during DOM mutations of existing stuff in general.  Maybe file a followup bug on this?

>+++ b/content/html/content/src/nsHTMLDNSPrefetch.cpp
>+nsHTMLDNSPrefetch::Prefetch(nsAString &hostname, PRUint16 flags)

So unlike the URI version this doesn't do the queuing business, right?  That might be worth documenting in the header.

>+  nsCString  hostname8;
>...
>+  return sDNSService->AsyncResolve(hostname8, flags,

Just pass NS_ConvertUTF16toUTF8(hostname) directly to asyncResolve and skip the temporary?

>+nsDNSPrefetchDeferrals::nsDNSPrefetchDeferrals()
>+  : mActiveLoaderCount(0)
>+{
>+  mLock = PR_NewLock();
>+  ClearQueue();

This seems to be the only caller of ClearQueue.  Since ClearQueue does nothing of the sort (e.g. leaves refs to all the URIs) and instead just sets some members, maybe it should go away and we should just set the members here in the initializer list?

Why do we need a lock here at all?  Our OnLookupComplete can happen on random threads, but all other calls into this method should be on the main thread.  We can add asserts to that effect if we want (and that might be a good idea).

>+nsDNSPrefetchDeferrals::~nsDNSPrefetchDeferrals()
>+  RemoveProgressListener();

You actually don't want to do that, since it'll try to call virtual methods on this object, etc.  Given that the web progress holds a reference to us via an nsIWeakReference, there's no need to remove on destruction.

>+nsDNSPrefetchDeferrals::Add(PRUint16 flags, nsIURI *aURI)
>+  if (((mFireAndForgetHead + 1) & sMaxDeferredMask) == mFireAndForgetTail)
>+    return NS_ERROR_DNS_LOOKUP_QUEUE_FULL;

I think |(mFireAndForgetHead + 1) % sMaxDeferred| is much clearer.

Even clearer would be using an existing sane queue class, which we sadly don't have.  nsDeque would work if you could stuff your flags in the low bits of a URI pointer, but you need 3 flags.

Please file a bug to update this stuff to a templated queue class once we have one?  I'm pretty sure we already have a bug to create such a class, but if not let me know and I'll file it.
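For what it's worth, the low-bit tagging trick mentioned above can be sketched like this (an illustrative sketch, not actual nsDeque usage; whether the allocator guarantees enough alignment for three flag bits is exactly the open question):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of packing flags into the low bits of an aligned pointer
// (illustrative only). An object aligned to at least 8 bytes leaves
// the low 3 bits of its address free, enough for three one-bit flags.
struct alignas(8) Entry { int payload; };

uintptr_t Tag(Entry* p, unsigned flags) {
  assert((reinterpret_cast<uintptr_t>(p) & 0x7u) == 0);  // alignment holds
  assert(flags < 8u);                                    // only 3 bits fit
  return reinterpret_cast<uintptr_t>(p) | flags;
}

Entry* Ptr(uintptr_t tagged) {
  return reinterpret_cast<Entry*>(tagged & ~uintptr_t(0x7));  // strip flags
}

unsigned Flags(uintptr_t tagged) {
  return static_cast<unsigned>(tagged & 0x7u);
}
```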

>+  mFireAndForgetHead = (mFireAndForgetHead + 1) & sMaxDeferredMask;

Again, I think explicitly using % would be clearer.

>+nsDNSPrefetchDeferrals::SubmitQueue()
>+    mFireAndForgetTail = (mFireAndForgetTail + 1) & sMaxDeferredMask;

And here.

>+nsDNSPrefetchDeferrals::RemoveProgressListener()

This method should just go away.

>+nsDNSPrefetchDeferrals::OnStateChange(nsIWebProgress* aWebProgress, 
>+      if (mActiveLoaderCount)
>+        mActiveLoaderCount--;

I think you should be able to assert mActiveLoaderCount > 0 here (before the decrement) instead of checking it.


>+nsDNSPrefetchDeferrals::OnLookupComplete(nsICancelable *request,

Maybe put this one before all the progress listener methods, instead of in between some of them?

>+++ b/content/html/content/src/nsHTMLDNSPrefetch.h
>+class nsDNSPrefetchDeferrals : public nsIDNSListener

This could be an inner class of nsHTMLDNSPrefetch, no?

>+  ~nsDNSPrefetchDeferrals();

virtual, please

>+++ b/netwerk/protocol/http/src/nsHttpChannel.cpp
>+    if (!(mConnectionInfo &&  mConnectionInfo->UsingHttpProxy())) {

Take out one of the two spaces after the &&?

Also, probably worth commenting that there is no point doing DNS prefetch when using a proxy, since we'd just be resolving the proxy's hostname?

r- because of the progress listener removal issue, which needs to be fixed no matter what.
(In reply to comment #5)
> Even clearer would be using an existing sane queue class, which we sadly don't
> have.  nsDeque would work if you could stuff your flags in the low bits of a
> URI pointer, but you need 3 flags.
> 
> Please file a bug to update this stuff to a templated queue class once we have
> one?  I'm pretty sure we already have a bug to create such a class, but if not
> let me know and I'll file it.

We could at some point try using the standard library.  I'm not sure exactly what conclusion we reached the last time we discussed it, but I vaguely remember it being "sure, if somebody wants to try being the first...".  We should of course keep an eye on codesize and performance numbers, etc., and make sure the mobile guys are ok with it.

> +  ~nsDNSPrefetchDeferrals();
> 
> virtual, please

Destructors of XPCOM object implementations should generally be *either* virtual or private.  Is there any reason to pick virtual here over private?  It seems like it would be better of as private and non-virtual.
Yeah, private destructor would be fine too.
Boris, David,

Thanks for the reviews and making this stronger. Most of the feedback is a slam dunk, but I want to ask a couple questions before finalizing the next spin of the patch.

> Why do we need a lock here at all?  Our OnLookupComplete can happen on random
> threads, but all other calls into this method should be on the main thread.  

good to know. The lock is probably the most expensive thing in the add path, so it would be good to get rid of.

I was under the impression wholly different DOMs might be processed on different threads - and there is only one global DNS deferral queue, thus the need for the lock. I guess I am mistaken? And the statechange notification also comes in on that thread? How would I have worked that out for myself? (a doc I missed or forgot, perhaps?)

> >+  if (((mFireAndForgetHead + 1) & sMaxDeferredMask) == mFireAndForgetTail)

> I think |(mFireAndForgetHead + 1) % sMaxDeferred| is much clearer.

How strongly do you feel about this? My version reads very naturally to me, probably because I have used that idiom a million times. I like it because I know every compiler will generate the & while I have seen the latter turn into divides even in the presence of maskable constants.

If you feel strongly it is a fish out of water wrt the other code, I'll change it. Just let me know.
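The two spellings are interchangeable whenever the size is a power of two, since sMaxDeferredMask == sMaxDeferred - 1. A quick check (the constants here are illustrative stand-ins, not lifted from the patch):

```cpp
#include <cassert>

// For a power-of-two queue size, masking and modulo compute the same
// wrap-around index. 512 mirrors the cap discussed in this bug.
constexpr unsigned sMaxDeferred = 512;
constexpr unsigned sMaxDeferredMask = sMaxDeferred - 1;  // 0x1FF

unsigned NextIndexMask(unsigned i)   { return (i + 1) & sMaxDeferredMask; }
unsigned NextIndexModulo(unsigned i) { return (i + 1) % sMaxDeferred; }
```

With the divisor a compile-time power-of-two constant, an optimizing compiler should emit the same AND instruction for both forms; the `&` spelling just makes that explicit.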

 >+nsDNSPrefetchDeferrals::OnStateChange(nsIWebProgress* aWebProgress, 
> >+      if (mActiveLoaderCount)
> >+        mActiveLoaderCount--;
> 
> I think you should be able to assert mActiveLoaderCount > 0 here (before the
> decrement) instead of checking it.

It was modeled on nsPrefetchService which has this comment
   //
    // at initialization time we might miss the first DOCUMENT START
    // notification, so we have to be careful to avoid letting our
    // stop count go negative.

That made a lot of sense to me. Do you think it isn't a valid concern?
All DOMs are always on the main thread at the moment, as is OnStateChange.

I'm not sure where, if anywhere, this is clearly documented.  :(

> How strongly do you feel about this?

Not that strongly.  I'm fine with keeping it, esp. since I hope we'll end up with a standard queue class being used here at some point.

Hmm.  Interesting point about init time...  Given that, probably better to leave that code as-is, yes.
Attached patch patch r2 (obsolete) — Splinter Review
New patch incorporates review comments. Follow-on bug regarding templated queue class to follow.
Attachment #350108 - Attachment is obsolete: true
Attachment #351004 - Flags: review?(bzbarsky)
> Please file a bug to update this stuff to a templated queue class once we have
> one?  I'm pretty sure we already have a bug to create such a class, but if not
> let me know and I'll file it.
> 

I wasn't able to find such a bug. I searched for queue, fifo, and even "linked list" and they didn't turn up a candidate.

Though https://bugzilla.mozilla.org/show_bug.cgi?id=442584 seems to agree with you :)
Comment on attachment 351004 [details] [diff] [review]
patch r2

r+sr=bzbarsky.
Attachment #351004 - Flags: superreview+
Attachment #351004 - Flags: review?(bzbarsky)
Attachment #351004 - Flags: review+
Patch r2 pushed:
http://hg.mozilla.org/mozilla-central/rev/a0c0ed9f461f
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Can someone verify that this actually made Tp3 go down on the relevant machines?  I don't seem to see anything other than a big jump on some machines yesterday there....
(In reply to comment #14)
> Can someone verify that this actually made Tp3 go down on the relevant
> machines?  I don't seem to see anything other than a big jump on some machines
> yesterday there....
I don't see any changes to the graphs.
Links to the specific pages and logs you are speaking of would be very helpful
Here are the tp3 graphs for all the machines running that test:

http://graphs.mozilla.org/#show=394872,394887,394900,395008,395020,395048,395125,395135,395166,787095,787118,787129,794379,794396,794407,911655,911694,912148,1431032,1431170,1431214,1431848,1431867&sel=1225860045,1228654546

Here's the tp3 graph for the machine I mentioned in comment 0, showing the regression and the fact that it has not gone away:

http://graphs.mozilla.org/#show=911694&sel=1226041126,1228665420

To get logs, you'd have to load the Firefox tinderbox, go back in time (using the link at the bottom) to the right day, then click the "L" link on the correct builds.  Hovering the graph should tell you the relevant build start times.  It's unfortunate that this part is so annoying.  :(

In any case, reopening, since qm-plinux-fast03 still shows a regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Interested in seeing if this fixes the problem I'm seeing with https://bugzilla.mozilla.org/show_bug.cgi?id=423377#c34 through #c37.
(In reply to comment #18)
> interested in seeing if this fixes the problem I'm seeing with
> https://bugzilla.mozilla.org/show_bug.cgi?id=423377#c34 through #c37

Take a look at 467562 - I put a patch in there yesterday hoping someone who can repro it would try it out.
Depends on: 467648
OS: Mac OS X → All
The innerHTML part of the Dromaeo modify test (http://dromaeo.com/?dom-modify$) has regressed, and the creation of URIs for DNS prefetching in BindToTree is showing up in profiles as the culprit. This patch moves that code from BindToTree to run from an event. Will probably help with the Tp3 regression too.
Attached patch patch r3 (obsolete) — Splinter Review
Peter - thank you for identifying the URI creation as the primary culprit here. That's a huge help.

On my desktop that innerhtml test measures about 100ms with the 1.9.1 branch, and about 79ms with the combination of r2 + your patch (c#20) applied. (r2 is currently applied on moz-central, but not 1.9.1.. your patch requires r2).

With the code in bindtotree() just commented out (i.e. disable prefetch totally) innerhtml runs in about 79ms.

I'm going to suggest patch r3 (attached, applies on top of r2) instead of the c#20 patch. Your patch builds a deferred queue, eventually run by the main event loop. R2 already introduced a deferred queue, eventually run on pageload completion.. so we end up with a deferred queue on top of a deferred queue :(

The difference is that the R2 queue deferred the DNS lookups, but it didn't defer the nsIURI creation. The R3 patch makes that change (essentially deferring nsGenericHTMLElements instead of nsIURIs).

R3 (on top of R2) performs as well as the event deferral patch on dromaeo, so hopefully we've cracked this nut.
Attachment #355437 - Flags: review?(bzbarsky)
So with that last patch, if the node is being added after the page has loaded (as in Dromaeo), we end up holding a reference to the node in the prefetch queue until some other page loads, no?  I was OK with that for URIs, because the queue size is limited, so the memory consumption is bounded.  But if we're going to keep nodes alive, that entrains all sorts of other objects, possibly for arbitrarily long periods of time.  I'd really rather we didn't do that...
hm. Well, I kinda feel like the event loop could have the opposite problem (not wait long enough), right?

Is there anything you can think of to trigger this off of?
(In reply to comment #23)
> hm. Well, I kinda feel like the event loop could have the opposite problem (not
> wait long enough), right?
> 
> Is there anything you can think of to trigger this off of?

if there isn't a good answer to that I guess we could just go a different direction and walk the tree at pageload complete, building the uris then and being done with it. It wouldn't capture changes made through js, but that's probably ok for this..
pageload complete, breaking out when the queue is full, might work, yes.  You could use an expiration tracker (set for 1s or so, say) to decide when to trigger otherwise, maybe.
Attached patch patch r3-v2 (obsolete) — Splinter Review
Updated r3 patch

Based on the first revision: normally the queue is submitted (and URIs built) on any pageload completion. This version adds a 2-second one-shot timer if an entry is added to the queue while no pageload is in progress (and no timer is currently outstanding) - the queue is processed if the timer expires before another page begins loading.
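That trigger logic can be modeled as a small state machine. This is a simplified sketch with invented names (`FlushPolicy` etc.), not the nsITimer/web-progress code in the patch: flush when the last pageload completes, or on a one-shot timer that is armed only when an entry arrives while nothing is loading.

```cpp
#include <string>
#include <vector>

// Simplified model of the r3-v2 flush policy (illustrative only; the real
// code uses nsITimer callbacks and nsIWebProgress state notifications).
class FlushPolicy {
 public:
  void OnLoadStart() { ++mActiveLoaders; }

  void OnLoadStop() {
    if (mActiveLoaders) --mActiveLoaders;
    if (mActiveLoaders == 0) Flush();  // batch runs at pageload completion
  }

  void Add(const std::string& host) {
    mQueue.push_back(host);
    // Arm a one-shot timer only if nothing is loading and no timer is
    // already outstanding, so idle-time additions still get flushed.
    if (mActiveLoaders == 0 && !mTimerArmed) mTimerArmed = true;
  }

  void OnTimerFired() {
    mTimerArmed = false;
    if (mActiveLoaders == 0) Flush();  // else the pageload stop will flush
  }

  bool TimerArmed() const { return mTimerArmed; }

  std::vector<std::string> mFlushed;  // exposed for inspection in this sketch

 private:
  void Flush() {
    for (const auto& h : mQueue) mFlushed.push_back(h);
    mQueue.clear();
  }

  std::vector<std::string> mQueue;
  unsigned mActiveLoaders = 0;
  bool mTimerArmed = false;
};
```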
Attachment #355437 - Attachment is obsolete: true
Attachment #355510 - Flags: review?(bzbarsky)
Attachment #355437 - Flags: review?(bzbarsky)
Comment on attachment 355510 [details] [diff] [review]
patch r3-v2

I should note that I suspect we'll want to back this out and figure out a different approach for this later, one that would extend to the async history stuff we want to do...

>+++ b/content/html/content/src/nsHTMLDNSPrefetch.cpp
>+nsHTMLDNSPrefetch::nsDeferrals::Tick (nsITimer *aTimer, void *aClosure)

Take out the space before '(', please.

>+  nsHTMLDNSPrefetch::nsDeferrals *self = (nsHTMLDNSPrefetch::nsDeferrals *) aClosure;

What guarantees that |self| hasn't been destroyed yet?  Should we be canceling the timer on destruction?

>+  NS_ASSERTION (self->mTimerArmed, "Timer is not armed");

Nix the space before '('.

>+  // If the queue is not submitted here beacuse there are outstanding pages being loaded,

"because"

>+++ b/content/html/content/src/nsHTMLDNSPrefetch.h
>+    static void Tick (nsITimer *aTimer, void *aClosure);

Nix space before '('.

>+      nsCOMPtr<nsGenericHTMLElement>   mElement;

nsRefPtr, please.

I'd like to see the lifetime issue addressed before marking review.
Attachment #355510 - Flags: review?(bzbarsky) → review-
Attached patch r3.v3 (obsolete) — Splinter Review
Updated with review comments from comment 27. The cancel() is indeed important; it was at one time included in the patch and I somehow lost it - thanks for the catch.

I dropped in nsRefPtr and assume it works just as nsCOMPtr does for non-COM objects? I could not find a strong definition for it. MDC calls the two "similar".

I updated the formatting bits as well. Is there any counterpart to Linux's checkpatch.pl for the Mozilla style? http://www.mjmwired.net/kernel/Documentation/SubmittingPatches
Attachment #355510 - Attachment is obsolete: true
Attachment #355765 - Flags: review?(bzbarsky)
blocking for b3 due to risk
Flags: blocking1.9.1? → blocking1.9.1+
Priority: P2 → P1
Comment on attachment 355765 [details] [diff] [review]
r3.v3

r+sr=bzbarsky.  nsRefPtr works just like nsCOMPtr for reference counting, but without the special magic for calling QI, and without the need for the object it's used on to be implementing nsISupports: it just needs to have AddRef and Release methods.  And yeah, it should be used for any refcounted non-interface pointer.

http://beaufour.dk/jst-review/ is something that someone wrote up at some point, though it doesn't seem to catch that whitespace thing.
Attachment #355765 - Flags: superreview+
Attachment #355765 - Flags: review?(bzbarsky)
Attachment #355765 - Flags: review+
So... that last patch doesn't apply on m-c.  Patrick, want to post a merged patch for checkin?  And a 1.9.1 patch too?
(In reply to comment #31)
> So... that last patch doesn't apply on m-c.  Patrick, want to post a merged
> patch for checkin?  And a 1.9.1 patch too?

Yes, I will prepare both later today
Attached patch roll up patch for 1.9.1 (obsolete) — Splinter Review
This is a roll up patch that should apply cleanly to 1.9.1. It contains:

* r2 patch - which adds a deferral queue of uri (already on moz central)

* patch from bug 467648 which fixed a leak introduced by r2 (already on moz central).. this must be included here because r3 touches the same dtor

* r3 patch - which changes the deferrals from uris to elements and adds a one-shot timer to make sure they are flushed in a reasonable amount of time
Attached patch roll up patch for moz central (obsolete) — Splinter Review
This is a roll up patch that should apply cleanly to moz-central. It contains:

* r3 patch - which changes the deferrals from uris to elements and adds a
one-shot timer to make sure they are flushed in a reasonable amount of time (this is a slightly different version than r3.3 in order to merge cleanly with moz central)

m-c already contains the r2 and 467648 patches
Pushed http://hg.mozilla.org/mozilla-central/rev/b73e063a3f99

Let's see what the numbers look like.
And backed out because of leaks.  On the bright side, Tp did come down!
when I just run firefox and exit it, I don't see any leak.

when I run runtests.py --chrome --autorun --close-when-done I do see a leak (just as reported by tinderbox)

the key difference in execution seems to be that under the chrome test nsLayoutStatics::Shutdown() (and therefore nsHTMLDNSPrefetch::Shutdown() which is called from it) is not run. nsLayoutStatics::Initialize() is run for both invocations. That results in the leak.

Any thoughts on why that is?
Hmm.  nsLayoutStatics::Shutdown won't happen until all documents are destroyed, so if you're holding element pointers it won't get called.

You probably need to register an xpcom-shutdown observer of your own to clean up the elements....
(In reply to comment #38)
> Hmm.  nsLayoutStatics::Shutdown won't happen until all documents are destroyed,
> so if you're holding element pointers it won't get called.
> 
> You probably need to register an xpcom-shutdown observer of your own to clean
> up the elements....

That was a huge time saver. Thank you.

I now get the same leak results on moz-central for --chrome --browser-chrome and --a11y. --browser-chrome does report some leaks, but they are the same with and without the patch, the others run clean.

A bit later I will post a new rollup patch for moz-central.. if that gets a green light I'll do one for 191. I'm optimistic as the tp number worked out yesterday.

fyi this is the diff against the previous patch - it just flushes the references on xpcom-shutdown as well as in the dtor. Leaving them in the dtor is redundant but feels clean and harmless. This will be included in the rollup, but it is easier to see the differences in isolation:

http://www.ducksong.com/mozilla/misc/dns-shutdown-changes.010909.txt
You don't want the RemoveObserver call in your destructor.  It'll get removed anyway, since it's a weak ref, and if you do it this way you'll potentially crash because RemoveObserver will make virtual function calls on you.
v2 of roll up patch for moz central that drops element refs on xpcom-shutdown. includes RemoveObserver adjustment from comment 40.
Attachment #355843 - Attachment is obsolete: true
(In reply to comment #40)
> You don't want the RemoveObserver call in your destructor.  It'll get removed
> anyway, since it's a weak ref, and if you do it this way you'll potentially
> crash because RemoveObserver will make virtual function calls on you.

so a weak ref is just a ptr that magically goes null when the object it points to destructs - but it doesn't add a count to that object?
That's correct.  The observer service will still hold a pointer to the nsIWeakReference even after your object dies, but do_QueryReferent will return null.  Once that happens, the observer service will drop its pointer to the nsIWeakReference.
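The semantics described here behave much like std::weak_ptr in standard C++ (an analogy only; XPCOM uses nsIWeakReference and do_QueryReferent, not anything from the standard library):

```cpp
#include <memory>

// Analogy for nsIWeakReference semantics: a weak reference does not keep
// the object alive, and promoting it (lock(), playing the role of
// do_QueryReferent) yields null once the object is gone.

bool PromotableWhileAlive() {
  auto obj = std::make_shared<int>(42);
  std::weak_ptr<int> weak = obj;   // takes no strong reference
  return weak.lock() != nullptr;   // object alive: promotion succeeds
}

bool PromotableAfterReset() {
  auto obj = std::make_shared<int>(42);
  std::weak_ptr<int> weak = obj;
  obj.reset();                     // last strong ref dropped, object dies
  return weak.lock() != nullptr;   // promotion now fails: "goes null"
}
```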
Depends on: 473089
Attachment #356256 - Flags: superreview+
Attachment #356256 - Flags: review+
Pushed the roll-up patch v2 as http://hg.mozilla.org/mozilla-central/rev/5a819cbcef1e
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
This is the 1.9.1 version of the patch that was applied to moz central in comment 44. If the tests for that push go ok, this can be applied to 1.9.1.

the patch in 473089 is still needed in 1.9.1 on top of this.
Attachment #351004 - Attachment is obsolete: true
Attachment #355765 - Attachment is obsolete: true
Attachment #355842 - Attachment is obsolete: true
Looks like this helped Tp3 somewhat.  Not all the way back down, but close; and there have been other changes since then in other parts of the code, of course.
Pushed http://hg.mozilla.org/releases/mozilla-1.9.1/rev/049bd5ae4aad on 1.9.1 (the 1.9.1 roll-up patch).
Keywords: fixed1.9.1