Closed
Bug 1127401
Opened 11 years ago
Closed 11 years ago
Autophone - s1s2 - ~100 ms regression in cached Throbber stop - 2015-01-27
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(fennec38+)
RESOLVED
FIXED
Firefox 38
| Tracking | Status | |
|---|---|---|
| fennec | 38+ | --- |
People
(Reporter: bc, Assigned: n.nethercote)
References
(Blocks 1 open bug)
Details
(Keywords: regression)
Attachments
(1 file)
|
6.19 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
Regression in local blank and local twitter, throbber stop, cached, nexus 4 (Android 4.2), nexus 7 (Android 4.3). Nexus 5 is lagging and hasn't tested these changesets yet. Doesn't appear to happen for for first run uncached results or for Nexus S (Android 2.3). The change is most apparent for local blank.
http://phonedash.mozilla.org/#/org.mozilla.fennec/throbberstop/local-blank/norejected/2015-01-27/2015-01-29/cached/errorbars/standarderror/notry
before http://hg.mozilla.org/integration/mozilla-inbound/rev/a70ae54bd318
after http://hg.mozilla.org/integration/mozilla-inbound/rev/1a087d928c9d
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a70ae54bd318&tochange=1a087d928c9d -> bug 1124973
njn: I'm working on docs for Autophone and try but they aren't ready yet. Ping my on irc and I can walk you through it.
Updated•11 years ago
|
tracking-fennec: ? → 38+
Updated•11 years ago
|
Assignee: nobody → nalexander
| Assignee | ||
Comment 1•11 years ago
|
||
Bob explained to me how to read that graph. The regression is most obvious in the dark green line; it's near the end of Jan 27. It jumps from 2421 to 2544.
My only theory at the moment is that there might be an additional, unpredictable conditional branch after the change. I'll do some local testing to try to confirm this hypothesis.
| Assignee | ||
Comment 2•11 years ago
|
||
I've been looking at assembly code generated by clang for PLDHashTable::Lookup() vs. PLDHashTable::Search(). Both of them call SearchTable().
- For Lookup() the call to SearchTable() is the last thing in the function (in opt builds) so it does a tail call, surprisingly enough.
- For Search() is has to futz a bit with the return value of SearchTable(), which means it has to be a regular call.
I have a patch that adds an alternative version of SearchTable() that has the same return value convention (nullptr on failure) as Search(), and with that I get a noticeable speed-up even against Lookup() in a microbenchmark. Now I just need to work out how to structure that in a way that avoids tons of code duplication.
I admit I'm surprised that PLDHashTable lookups are hot enough that this fairly small difference is enough to be noticeable in the Fennec benchmarks.
| Assignee | ||
Comment 3•11 years ago
|
||
I've done two try runs, one being the m-i tip and the other having a patch that I hope will fix the problem:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c782e0adf55
https://treeherder.mozilla.org/#/jobs?repo=try&revision=070648c4ee73
I pushed them over 24 hours ago. I don't see them with the other try builds on phonedash.m.o. Do I need to do something to trigger a benchmarking run with a try build?
Flags: needinfo?(bob)
| Reporter | ||
Comment 4•11 years ago
|
||
njn: Something happened Thursday night where the pulse queue grew to exceed the limits and was deleted which prevented autophone from processing the messages and testing the associated builds. The first warning was at 01/29/2015 08:18 PM PT and the queue was deleted at 01/29/2015 09:59 PM PT. I don't know why it happened but suspect the glibc maintenance about that time. See the message "[Planned Maintenance Notification] Service Interruption Multiple Sites 17:00-19:00 Extended to 21:00" from MOC@Mozilla.com on 01/29/2015 07:40 PM PT. It looks like you were unfortunate enough to submit your try builds during the exact glibc maintenance window.
Sorry for the inconvenience. If you resubmit the builds it should all be good now. Ping me on irc when you do and I will follow along.
Flags: needinfo?(bob)
| Assignee | ||
Comment 5•11 years ago
|
||
Thanks, Bob. New pushes are here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=18116f1a98ef
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b8fec5e6996
| Reporter | ||
Comment 6•11 years ago
|
||
Cool. You can see the results at
https://treeherder.mozilla.org/#/jobs?repo=try&author=nnethercote@mozilla.com&exclusion_state=all
and http://phonedash.mozilla.org/#/org.mozilla.fennec/throbberstart/local-blank/rejected/2015-01-31/2015-01-31/notcached/noerrorbars/standarderror/try
Note the second build failed with PROCESS-CRASH | autophone-s1s2 | application crashed [@ PL_DHashTableRemove(PLDHashTable*, void const*)] for s1s2 and TEST_UNEXPECTED_FAIL | autophone-webapp | Failed to get uncached measurement.
| Assignee | ||
Comment 7•11 years ago
|
||
I fixed the crash and tried again. The new pair:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=18116f1a98ef
https://treeherder.mozilla.org/#/jobs?repo=try&revision=94ea353c2a56
The graph is here:
http://phonedash.mozilla.org/#/org.mozilla.fennec/throbberstop/local-blank/norejected/2015-01-31/2015-02-01/cached/errorbars/standarderror/try
Results looks fairly good:
- The nexus-4 improved by about 90, which looks like roughly what we lost.
- The nexus-5 improved by about 80, which also looks like roughly what we lost.
- The nexus-7 improved by about 80, which is a bit less than what we lost, which was about 120.
- The nexus-s-4 worsened by about 50, but I suspect that's noise. There's no equivalent device tested on m-i as far as I can tell, just on fx-team, so I don't know if the original patch caused a regression there.
The "local twitter" results also improved, though those measurements are noisy enough that I'm not sure if I trust them.
Bob, what do you think? Is this good enough to declare victory? Do we need more measurements? The combination of the noisy measurements and the subtlety of the code generation changes make me worry that it'll be hard to do better than this...
Flags: needinfo?(bob)
| Assignee | ||
Comment 8•11 years ago
|
||
Bob hasn't responded, but I will be optimistic and assume that this fixes the
problem.
Attachment #8558265 -
Flags: review?(nfroyd)
| Assignee | ||
Updated•11 years ago
|
Assignee: nalexander → n.nethercote
Status: NEW → ASSIGNED
| Reporter | ||
Comment 9•11 years ago
|
||
Sorry, I was under the weather and on PTO today. Looks like a definite improvement. Lets see if we can land it and compare it over time.
Flags: needinfo?(bob)
Comment 10•11 years ago
|
||
Comment on attachment 8558265 [details] [diff] [review]
Tweak PL_DHashTableSearch() to counter a Fennec regression
Review of attachment 8558265 [details] [diff] [review]:
-----------------------------------------------------------------
Initially I wanted to say "just use MOZ_ALWAYS_INLINE", but it's not clear to me that the code bloat from that is worthwhile.
::: xpcom/glue/pldhash.cpp
@@ +396,5 @@
> PLDHashEntryHdr* firstRemoved = nullptr;
>
> for (;;) {
> + if (IsAdd) {
> + if (MOZ_UNLIKELY(ENTRY_IS_REMOVED(entry))) {
Just to check my understanding, because it took me a while to catch on to this: is this just because |firstRemoved| is only used when we're doing an Add?
@@ +412,5 @@
>
> entry = ADDRESS_ENTRY(this, hash1);
> if (EntryIsFree(entry)) {
> METER(mStats.mMisses++);
> + return IsAdd ? (firstRemoved ? firstRemoved : entry) : nullptr;
...and then likewise for this transformation? (I think the compiler can forward-propagate the definition of |firstRemoved| in the !IsAdd case, but maybe it's better to have the more complicated expression to make things clearer?)
::: xpcom/glue/pldhash.h
@@ +298,5 @@
> static bool EntryIsFree(PLDHashEntryHdr* aEntry);
>
> PLDHashNumber ComputeKeyHash(const void* aKey);
>
> + template <bool IsAdd>
Instead of using |bool| here, let's use an enum, with bikesheddable names:
enum SearchReason {
ForLookup,
ForAdd,
};
template<enum SearchReason> ...
I think that will make the callers a little more readable, since we've lost the /*isAdd=*/ comments. The commit message will also need to be tweaked.
Attachment #8558265 -
Flags: review?(nfroyd) → review+
| Assignee | ||
Comment 11•11 years ago
|
||
> > for (;;) {
> > + if (IsAdd) {
> > + if (MOZ_UNLIKELY(ENTRY_IS_REMOVED(entry))) {
>
> Just to check my understanding, because it took me a while to catch on to
> this: is this just because |firstRemoved| is only used when we're doing an
> Add?
Yes. Sorry it wasn't clear.
> @@ +412,5 @@
> >
> > entry = ADDRESS_ENTRY(this, hash1);
> > if (EntryIsFree(entry)) {
> > METER(mStats.mMisses++);
> > + return IsAdd ? (firstRemoved ? firstRemoved : entry) : nullptr;
>
> ...and then likewise for this transformation? (I think the compiler can
> forward-propagate the definition of |firstRemoved| in the !IsAdd case, but
> maybe it's better to have the more complicated expression to make things
> clearer?)
Sorry, I'm not clear if you're asking for a change here, and if so, what it is.
> > + template <bool IsAdd>
>
> Instead of using |bool| here, let's use an enum, with bikesheddable names:
Sure.
Comment 12•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #11)
> > @@ +412,5 @@
> > >
> > > entry = ADDRESS_ENTRY(this, hash1);
> > > if (EntryIsFree(entry)) {
> > > METER(mStats.mMisses++);
> > > + return IsAdd ? (firstRemoved ? firstRemoved : entry) : nullptr;
> >
> > ...and then likewise for this transformation? (I think the compiler can
> > forward-propagate the definition of |firstRemoved| in the !IsAdd case, but
> > maybe it's better to have the more complicated expression to make things
> > clearer?)
>
> Sorry, I'm not clear if you're asking for a change here, and if so, what it
> is.
My fault! The "...and then likewise for this transformation" was continuing the previous question about whether this transformation was done because we only use |firstRemoved| in the add case. The parenthetical expression was just musing about what would be better code-wise.
| Assignee | ||
Comment 13•11 years ago
|
||
Comment 14•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 38
Comment 15•11 years ago
|
||
yay for this fix, talos didn't alert, but it did show a regression and a fix!
| Assignee | ||
Comment 16•11 years ago
|
||
Yeah, it looks good on phonedash:
http://phonedash.mozilla.org/#/org.mozilla.fennec/throbberstop/local-blank/norejected/2015-01-27/2015-02-03/cached/errorbars/standarderror/notry
The original regression was near the end of Jan 27. The fix is near the end of Feb 03. It's clearest on the dark green "nexus-7-jss15q-1 mozilla-inbound" line. Yay.
Updated•5 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•