Closed Bug 1348460 Opened 3 years ago Closed 3 years ago

~9% overhead of imgLoader::LoadImage() from PredictorLearn()

Categories

(Core :: Networking, defect)


Tracking


RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: ehsan, Assigned: u408661)

References

Details

(Whiteboard: [necko-active])

Attachments

(1 file)

See this profile: <https://perfht.ml/2nBng8v>

As far as I can tell, the network predictor is incurring this cost in two ways: sending the async IPC message (which seems to be around 3ms?), and calling into SpiderMonkey!!!  Thanks to nsILoadContext::GetOriginAttributes.  On top of that, there's a whole bunch of virtual function call overhead that is killing us here.

Are we gaining any benefits from this network prediction?  Can we please minimize the cost?  (needinfoing Patrick as a general Necko go-to person, since I'm not sure who the right person to ask is.)
Flags: needinfo?(mcmanus)
Depends on: 1348463
so it's kind of weird to me that nsILoadContext is a js impl - is that unique to the test? OA is used throughout necko generally as part of hash keys, not just here - so that would be worth figuring out.

we've definitely seen benefits from this in the past - nick is the expert.
Flags: needinfo?(mcmanus) → needinfo?(hurley)
For the GetOriginAttributes there is bug 1348463.
Depends on: 1316683
No longer depends on: 1348463
Absolutely we see benefits from this - the 10% (or so) improvement we see on SpeedIndex is (to understate wildly) nothing to sneeze at.

I do terribly worry about the whole "load context in js" thing - the predictor needs load contexts to Do The Right Thing, so there's no way around that. It looks like you've got a handle on that over in the other bug, though.

Once we get that fixed and have numbers post-change, we can re-evaluate if there's more to be done here.
Flags: needinfo?(hurley)
ehsan, do you think we should do anything here beyond the OA fix?
Whiteboard: [necko-active]
Let me re-profile.  I need to get an optimized build with bug 1316683 applied (and hopefully some day learn enough coding skillz to land that bug too! ;-))
Flags: needinfo?(ehsan)
With that patch applied, PredictorLearn() almost doesn't show up in the profile any more! \o/
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(ehsan)
Resolution: --- → DUPLICATE
Duplicate of bug: 1316683
After further optimizations, PredictorLearn() is now 8-9% of the profile; its absolute cost is now super small, but everything else around it has gotten much faster...

But there is barely anything else to shave off.  Bug 1356838 is the last bit of waste that I can see.  The rest is basically the cost of serialization and sending the IPC message to the parent process.

Nicholas, is it possible to dispatch a runnable to the main thread to run this whole thing asynchronously?  That would at least enable us to not incur this cost synchronously from the HTML5 parser flush loop.
Status: RESOLVED → REOPENED
Flags: needinfo?(hurley)
Resolution: DUPLICATE → ---
Summary: ~15% overhead of imgLoader::LoadImage() from PredictorLearn() → ~9% overhead of imgLoader::LoadImage() from PredictorLearn()
Yeah, this is easily doable - especially in the Learn() case (I'm more hesitant to make Predict() asynchronous, as that has the potential for delaying the predictions too long to be of any use).
Assignee: nobody → hurley
Flags: needinfo?(hurley)
Comment on attachment 8858974 [details]
Bug 1348460 - make predictor::learn async.

https://reviewboard.mozilla.org/r/130974/#review135392

Thanks a lot!

::: netwerk/base/Predictor.cpp:693
(Diff revision 1)
>  
>    nsCOMPtr<nsIThread> mIOThread;
>    nsCOMPtr<nsIFile> mDBFile;
>  };
>  
> +class PredictorLearnRunnable : public Runnable {

Nit: please make this final.

::: netwerk/test/unit/test_predictor.js:218
(Diff revision 1)
>      "http://localhost:4444/image.png"
>    ];
>  
>    // This is necessary to learn the origin stuff
>    predictor.learn(pageload_toplevel, null, predictor.LEARN_LOAD_TOPLEVEL, origin_attributes);
> +  do_timeout(0, () => { // allow the learn() to run on the main thread

Did you mean to adjust some whitespace here and below?  ;-)  (Perhaps you did and MozReview is hiding it?  What it is showing me is puzzling...)  It seems that without that, these tests become fairly hard to read.
Attachment #8858974 - Flags: review?(ehsan) → review+
Pushed by hurley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0107cfe404f8
make predictor::learn async. r=Ehsan
https://hg.mozilla.org/mozilla-central/rev/0107cfe404f8
Status: REOPENED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Hello.

This bug may have contributed to a sudden change in the Telemetry probe PREDICTOR_TOTAL_PRECONNECTS[1] which seems to have occurred in Nightly 20170425[2][3].

There was an overall increase in the number of preconnects reported which, since it's tracked, I assume has some sort of impact?

Is this an improvement? A regression?

Is this intentional? Expected?

Is this probe measuring something useful?

[1]: http://alerts.telemetry.mozilla.org/index.html#/detectors/1/metrics/433/alerts/?from=2017-04-25&to=2017-04-25
[2]: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=73752931e273091185e1e4b5231c28beed657cc8&tochange=a30dc237c3a600a5231f2974fc2b85dfb5513414
[3]: https://mzl.la/2pxKLSX
Flags: needinfo?(hurley)
(In reply to Chris H-C :chutten from comment #14)
> Hello.
> 
> This bug may have contributed to a sudden change in the Telemetry probe
> PREDICTOR_TOTAL_PRECONNECTS[1] which seems to have occurred in Nightly
> 20170425[2][3].
> 
> There was an overall increase in the number of preconnects reported which,
> since it's tracked, I assume has some sort of impact in some way?

Huh!

> Is this an improvement? A regression?

Given there wasn't an associated reduction of other actions by the predictor, I would call this an improvement. Not necessarily huge, but one all the same.

> Is this intentional? Expected?

Quite unexpected, TBH. If anything, I would've expected this to result in a (slight) decrease in the preconnects reported, not the other way 'round. Given that, I should at least try to reason through why this would happen.

> Is this probe measuring something useful?

Yes! This is geared to understanding how changes in the internet (specifically how webpages are written) affect our ability to appropriately make predictions. It now seems that a change that is only tangentially related to how we decide what to do (the actual decision algorithm was unaffected by this patch) could change our behaviour, as well.

Like I said above, I need to reason through this to make sure I understand what happened here, but I'm not going to worry about it (I have higher priority stuff on my plate right now). Leaving the ni? for myself to make sure I circle back around.
Great! When you circle back, could you add a relevant alert_emails field to this probe (and its friends?) so that the people who know it best get notified when it changes?
Flags: needinfo?(u408661)