Closed Bug 1247798 Opened 8 years ago Closed 8 years ago

nsUrlClassifierDBService registers thousands of spurious pref observers when ::Init fails

Categories

(Toolkit :: Safe Browsing, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- fixed
firefox48 --- fixed
firefox-esr45 47+ fixed

People

(Reporter: keeler, Assigned: Cykesiopka)

References

Details

Attachments

(2 files)

When running without a profile (e.g. xpcshell/the HSTS preload list updater), nsUrlClassifierDBService::Init() fails around line 1194 because it can't get a profile directory. This doesn't appear to be fatal in general (which is fine). However, it has already registered a number of strong observers at that point. Because Init failed, every time some other code tries to get it as a service, a whole new instance is created that also registers those same observers. Then, at shutdown, the Preferences implementation has to unregister around 250,000 preference observers. Turns out, this can take some time. nsUrlClassifierDBService should probably just register those observers last in Init, when everything else has succeeded.
Blocks: 1244803
For more concrete numbers into the effect this has, see Bug 1244803 comment 31.

For reference, 6000+ entries is now what our periodic HSTS update script has to handle due to a recent influx of domains that want to be preloaded. This bug is causing the script to timeout on shutdown on m-a.
Assignee: nobody → cykesiopka.bmo
Status: NEW → ASSIGNED
These observers should only be added when everything else has succeeded.
Failing to do so can cause long shutdown hangs in certain situations such as during periodic HSTS update runs.

Review commit: https://reviewboard.mozilla.org/r/44023/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/44023/
Attachment #8737583 - Flags: review?(gpascutto)
https://reviewboard.mozilla.org/r/44023/#review40577

gcp: Feel free to redirect the review as necessary.

I ran all the tests under toolkit/components/url-classifier/tests/ locally and played around with itisatrap.org pages and some prefs, and everything still seems to work.
Hopefully this is sufficient. If not, I can test further - I just need to know what else to test.

Thanks!
Comment on attachment 8737583 [details]
MozReview Request: Bug 1247798 - Make nsUrlClassifierDBService::Init() register pref observers only in the success case.

https://reviewboard.mozilla.org/r/44023/#review41319

LGTM.
Attachment #8737583 - Flags: review?(gpascutto) → review+
https://hg.mozilla.org/mozilla-central/rev/eb010c3c8e20
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8737583 [details]
MozReview Request: Bug 1247798 - Make nsUrlClassifierDBService::Init() register pref observers only in the success case.

Approval Request Comment
[Feature/regressing bug #]:
None. A recent spike in sites that registered to be HSTS preloaded in Chromium (and therefore indirectly for other browsers such as Firefox) has exacerbated an existing efficiency problem.

[User impact if declined]:
The weekly HSTS periodic update script continues to take a huge amount of time shutting down, getting killed, hence resulting in stale HSTS entries.
This means some domains that should be upgraded to HTTPS automatically aren't, and some domains that should no longer be upgraded still are.

[Describe test coverage new/current, TreeHerder]:
Covered by existing automated tests.

[Risks and why]:
Low. The bug this patch fixes should only manifest in special environments such as when running the HSTS script. Normal browsing should be unaffected before and after this patch.

[String/UUID change made/needed]:
None.
Attachment #8737583 - Flags: approval-mozilla-aurora?
Hi GCP, should we uplift this to Aurora 47 or let it ride the trains? I'd like to get a second opinion on the uplift request.

Hi cykesiopka, do the automated tests also verify the slow shutdown and stale entry issue? Did you get to manually verify that the fix is effective at eliminating those problems? Just want to confirm. Thanks!
Flags: needinfo?(gpascutto)
Flags: needinfo?(cykesiopka.bmo)
(In reply to Ritu Kothari (:ritu) from comment #9)
> Hi cykesiopka, do the automated tests also verify the slow shutdown and
> stale entry issue?

Hi Ritu,

The automated tests don't verify the slow shutdown entry or stale HSTS entry issue.
I'm not entirely sure if it's possible to add a test to check for slow shutdown.

To explain the stale HSTS entry thing more:
1) A script is run every week that pulls a list from Chromium for domains that have expressed a desire to be HSTS preloaded in browsers.
   1a) This means that when the browser contacts the domain, it always uses HTTPS, even if the domain has never been visited before.
2) These entries expire after a while.
   2a) This is so that domains that e.g. don't want the browser to always use HTTPS anymore can actually do so.
   2b) To the end user, this means the browser *might not* use HTTPS automatically anymore.
   2c) I probably don't have to explain why not using HTTPS is bad.
3) The expiry period is just long enough so that the m-a update during the last week of a release cycle is valid for both m-b and m-r for a while.

If the entries are not updated in time, automated tests for the HSTS feature will start failing on m-b or m-r after the entries expire, which will mostly "just" annoy the sheriffs.

> Did you get to manually verify that the fix is effective
> at eliminating those problems? Just want to confirm. Thanks!
Yes. See Bug 1244803 comment 31. Note that I only tested at max n = 6000.
The current Chromium list has upwards of 11000 entries in it (https://twitter.com/lgarron/status/718242465782853633).

The HSTS periodic update process will simply not work on m-a without this change.
We could ask releng to bump the script timeout to some insane value, but I doubt they would be happy to do so. 

Hopefully this makes the situation clearer.
Flags: needinfo?(cykesiopka.bmo)
(In reply to :Cykesiopka from comment #10)
> (In reply to Ritu Kothari (:ritu) from comment #9)
> > Hi cykesiopka, do the automated tests also verify the slow shutdown and
> > stale entry issue?
> 
> Hi Ritu,
> 
> The automated tests don't verify the slow shutdown entry or stale HSTS entry
> issue.
> I'm not entirely sure if it's possible to add a test to check for slow
> shutdown.
> 
> To explain the stale HSTS entry thing more:
> 1) A script is run every week that pulls a list from Chromium for domains
> that have expressed a desire to be HSTS preloaded in browsers.
>    1a) This means that when the browser contacts the domain, it always uses
> HTTPS, even if the domain has never been visited before.
> 2) These entries expire after a while.
>    2a) This is so that domains that e.g. don't want the browser to always
> use HTTPS anymore can actually do so.
>    2b) To the end user, this means the browser *might not* use HTTPS
> automatically anymore.
>    2c) I probably don't have to explain why not using HTTPS is bad.
> 3) The expiry period is just long enough so that the m-a update during the
> last week of a release cycle is valid for both m-b and m-r for a while.
> 
> If the entries are not updated in time, automated tests for the HSTS feature
> will start failing on m-b or m-r after the entries expire, which will mostly
> "just" annoy the sheriffs.
> 
> > Did you get to manually verify that the fix is effective
> > at eliminating those problems? Just want to confirm. Thanks!
> Yes. See Bug 1244803 comment 31. Note that I only tested at max n = 6000.
> The current Chromium list has upwards of 11000 entries in it
> (https://twitter.com/lgarron/status/718242465782853633).
> 
> The HSTS periodic update process will simply not work on m-a without this
> change.
> We could ask releng to bump the script timeout to some insane value, but I
> doubt they would be happy to do so. 
> 
> Hopefully this makes the situation clearer.

Thank you for this detailed explanation. It helps a lot. The fact that you have verified some aspects of it is good. I wonder if some kind of logging could be done to determine if the script shutdown improves/worsens. In any case I'll approve the aurora uplift.
Comment on attachment 8737583 [details]
MozReview Request: Bug 1247798 - Make nsUrlClassifierDBService::Init() register pref observers only in the success case.

Needed for HSTS support, Aurora47+
Attachment #8737583 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Ritu Kothari (:ritu) from comment #9)
> Hi GCP, should we uplift this to Aurora 47 or let it ride the trains? I'd
> like to get a second opinion on the uplift request.

I generally don't want to uplift unless it's necessary. In this case I'd have hoped that DB could be updated via Nightly, but it's clear from the comments above that this isn't how the process works.

The change looks quite harmless to me so no objections.
Flags: needinfo?(gpascutto)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #14)
> (In reply to Ritu Kothari (:ritu) from comment #9)
> > Hi GCP, should we uplift this to Aurora 47 or let it ride the trains? I'd
> > like to get a second opinion on the uplift request.
> 
> I generally don't want to uplift unless it's necessary. In this case I'd
> have hoped that DB could be updated via Nightly, but it's clear from the
> comments above that this isn't how the process works.
> 
> The change looks quite harmless to me so no objections.

Thanks GCP! Phew, I am glad we made the right call here. :)
Unfortunately, more entries have been added to the Chromium list and now esr45 is failing due to this bug as well.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined:
See comment 8 and comment 10. I'm asking for ESR uplift even though this isn't a sec bug because failing to do so will result in "broken" functionality that is expected to work in ESR45.

Fix Landed on Version:
48, already uplifted to 47.

Risk to taking this patch (and alternatives if risky):
Low. I ran the same sort of tests I did on m-c/m-a on esr45 as well, and things still work.

String or UUID changes made by this patch:
None.
Attachment #8742094 - Flags: approval-mozilla-esr45?
(In reply to :Cykesiopka from comment #16)
> Created attachment 8742094 [details] [diff] [review]
> bug1247798_only-register-pref-obs-on-success-nsUrlClassifierDBService_esr45.
> patch

Same sort of test as Bug 1244803 comment 31:
  n  | w/o fix | w/ fix
---------------|--------
6000 | 571s    | 18s
(In reply to Gian-Carlo Pascutto [:gcp] from comment #14)
> I generally don't want to uplift unless it's necessary. In this case I'd
> have hoped that DB could be updated via Nightly, but it's clear from the
> comments above that this isn't how the process works.

FWIW, I will look into the practicality of making this change.
Why are you requesting the uplift to esr when it wasn't considered for 46?
(In reply to Sylvestre Ledru [:sylvestre] from comment #19)
> Why are you requesting the uplift to esr when it wasn't considered for 46?

This is needed on any branch that runs the automatic HSTS update script. See comment 10 for some more background.
(In reply to David Keeler [:keeler] (use needinfo?) from comment #20)
> (In reply to Sylvestre Ledru [:sylvestre] from comment #19)
> > Why are you requesting the uplift to esr when it wasn't considered for 46?
> 
> This is needed on any branch that runs the automatic HSTS update script. See
> comment 10 for some more background.

Looks like I should expand on comment 10 as well:
 1) HSTS periodic updates occur on m-c, m-a, and active ESR branches.
    1a) This is because each branch has its own copy of the script to run (to handle e.g. m-c using a completely different data structure vs m-a to store preloaded data).
    1b) In general, the exact version number of m-c, m-a, ESR etc is irrelevant. We only care that all the branches get an update once a week.
    1c) This is why it doesn't matter whether the fix here is in 46. m-b and m-r are (in this specific context) irrelevant.

 2) The fix here is not really to deal with an issue with safe browsing.
    2a) The changes here should not have any impact on actual uses of the browser. It should only affect the special case of xpcshell + the HSTS periodic update process.
    2b) It just happens that safe browsing adds pref observers in a way that triggers efficiency problems in the pref code.
    2c) This fix is the easiest fix to get the periodic update process working again. The "proper" fix would be to try and get the pref code to be more efficient, but even if I had a patch for that, it wouldn't meet ESR uplift criteria anyways.
Sylvestre: Is there any other information I should provide?
Flags: needinfo?(sledru)
Sounds reasonable. I think we should have this on the next ESR. It should end up in ESR 45.2.0 which is planned to release around June 7th.
Comment on attachment 8742094 [details] [diff] [review]
bug1247798_only-register-pref-obs-on-success-nsUrlClassifierDBService_esr45.patch

ok, let's take it for esr then!
Thanks
Flags: needinfo?(sledru)
Attachment #8742094 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: