Make Safe Browsing code more shutdown-aware

RESOLVED FIXED in Firefox 53

Status

()

defect
P1
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: francois, Assigned: tnguyen)

Tracking

unspecified
mozilla53
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox53 fixed)

Details

Attachments

(1 attachment, 2 obsolete attachments)

We listen for xpcom-shutdown-threads in:

http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1302

and xpcom-shutdown in:

http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js#262
http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js#562

As of bug 343033, we also listen to quit-application in the hash completer:

http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp#772-783

We should figure out which of xpcom-shutdown, xpcom-shutdown-threads or quit-application we should be listening for.

Also, we should probably add shutdown listeners in:

- list downloads
- ApplyUpdate()
- application reputation remote lookups

to stop waiting for the network (or to stop consuming the update data in the case of ApplyUpdate).
Assignee: nobody → tnguyen
I think we should focus on ApplyUpdate first, from telemetry[1], 1% updates take more than 15sec, this probably somehow indicate that shutdown during an update might be a problem.

BTW, I check the update time after 8/31, it seems there isn't dramatic change, so still don't know why shutdownhang in safebrowsing happen after that day.

[1] https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-10-19&keys=__none__!__none__!__none__&max_channel_version=release%252F49&measure=URLCLASSIFIER_CL_UPDATE_TIME&min_channel_version=null&os=Windows_NT&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-09-16&table=0&trim=1&use_submission_date=0
MozReview-Commit-ID: 819JgJxZaVl
Attachment #8808556 - Attachment description: Add shutdown tracker to make Safe Browsing code more shutdown-aware → WIP - Add shutdown tracker to make Safe Browsing code more shutdown-aware
Comment on attachment 8808556 [details] [diff] [review]
WIP - Add shutdown tracker to make Safe Browsing code more shutdown-aware

Review of attachment 8808556 [details] [diff] [review]:
-----------------------------------------------------------------

You might be able to just add the Observe method directly in one of the dbservice classes and not need a separate class.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +240,5 @@
>  nsUrlClassifierDBServiceWorker::DoLookup(const nsACString& spec,
>                                           const nsACString& tables,
>                                           nsIUrlClassifierLookupCallback* c)
>  {
> +  ENSURE_NOT_CALL_IN_SHUTDOWN(NS_ERROR_NOT_INITIALIZED);

You could simply call this `ENSURE_NOT_SHUTTING_DOWN`.

::: toolkit/components/url-classifier/nsUrlClassifierShutdownTracker.cpp
@@ +28,5 @@
> +                                        const char16_t *aData)
> +{
> +  NS_ENSURE_TRUE(mDBService, NS_ERROR_FAILURE);
> +
> +  if (!strcmp(aTopic, "quit-application") ||

It would be good to figure out the differences between these events and listen on the right one.

Right now, the Safe Browsing code is split between quit-application, xpcom-shutdown and xpcom-shutdown-threads and I doubt that's intentional.
Hmm, afaik, quit-application should be the earliest standard topic we could receive (perhaps quit-application-granted could be better but not too much different). Putting more another topic could be safer, probably the order of events are not guaranteed.
The key difference between those two notifications, as I understand it, is that at quit-application you still have profile access, while at xpcom-shutdown you no longer do. That's what I am concerned in our code, because we might have writing task running (writing to profile folder), not sure if we are violating expectation rule.
https://dxr.mozilla.org/mozilla-central/source/toolkit/modules/docs/AsyncShutdown.rst#227

For now, I just put the "ENSURE_NOT_SHUTTING_DOWN" to loops and functions in Classifier.cpp.
MozReview-Commit-ID: DVemih4NdZ5
Attachment #8808556 - Attachment is obsolete: true
Attachment #8808966 - Attachment description: Make Safe Browsing code more shutdown-aware → WIP Patch : Make Safe Browsing code more shutdown-aware
(In reply to Thomas Nguyen[:tnguyen] (use ni? plz) from comment #4)
> Hmm, afaik, quit-application should be the earliest standard topic we could
> receive (perhaps quit-application-granted could be better but not too much
> different). Putting more another topic could be safer, probably the order of
> events are not guaranteed.

I think the order for XPCOM shutdown event is guaranteed and how we should 
handle those events are well defined in [1], too.

[1] https://dxr.mozilla.org/mozilla-central/source/xpcom/build/XPCOMInit.cpp#820
Attachment #8808966 - Attachment is obsolete: true
Thanks :)

I added some shutdown listeners to reduce the time consuming of joining background thread (particularly in ApplyUpdate). The idea is to set the shutdown signature in "quit-application" event. The worker will be gracefully cancelled during shutdown phase (quit-application -> xpcom-shutdown) (it's still running in worker thread and it should not block shutting down) then the background thread will be joined at "xpcom-shutdown-threads" (at that time the thread is mostly complete).
https://wiki.mozilla.org/XPCOM_Shutdown
Although I think we should join background thread in "profile-before-change". (this will make shutting down time a bit longer, but we could safely write data to disk). xpcom-shutdown* observer topics will usually not happen in release builds, and "profile-before-change", release builds will _exit()!
We should make sure that all user data is written to disk by the end of the profile-before-change notification.
Status: NEW → ASSIGNED
Comment on attachment 8809307 [details]
Bug 1315386 - Make Safe Browsing code more shutdown-aware. .

https://reviewboard.mozilla.org/r/91902/#review93382

That's starting to look great Thomas. I've got a few more comments/questions.

Instead of `NS_ERROR_FAILURE`, maybe we should use `NS_ERROR_ABORT` to show that the operation was aborted (in this case due to a shutdown). In the cases where we are already returning `NS_ERROR_NOT_INITIALIZED`, we may as well keep using that in case anybody relies on that specific error.

Also, if we don't need to check `aData == 'shutdown'` and we can get away with just watching for `aTopic = 'quit-application'` like most of the other gecko users, then let's do that. It will avoid us breaking if someone splits "shutdown" into more things later.

::: toolkit/components/url-classifier/Classifier.cpp:424
(Diff revision 3)
>  }
>  
>  void
>  Classifier::AbortUpdateAndReset(const nsCString& aTable)
>  {
> +  // We dont need to reset after shutting down, it will make shutdown

"We don't need to reset while shutting down. It will only slow us down."

::: toolkit/components/url-classifier/ProtocolParser.cpp:148
(Diff revision 3)
>    nsresult rv;
>    mPending.Append(aData);
>  
>    bool done = false;
>    while (!done) {
> +    // Stop consuming data if we are going to shutdown

Here we could eliminate the comment if we replaced the macro with an `if` statement.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.h:37
(Diff revision 3)
>  #define PARTIAL_LENGTH 4
>  
>  // The hash length of a complete hash entry.
>  #define COMPLETE_LENGTH 32
>  
> +#define ENSURE_NOT_SHUTTING_DOWN(retval) \

I wonder whether the macros will make debugging more annoying.

It would probably be OK to have the `if` statement without the `NS_WARNING` at the top of the functions.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
(Diff revision 3)
>  nsUrlClassifierDBServiceWorker::DoLookup(const nsACString& spec,
>                                           const nsACString& tables,
>                                           nsIUrlClassifierLookupCallback* c)
>  {
> -  if (gShuttingDownThread) {
> +  ENSURE_NOT_SHUTTING_DOWN(NS_ERROR_FAILURE);
> -    c->LookupComplete(nullptr);

Looks like we lost a call to `LookupComplete(nullptr)` here.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1736
(Diff revision 3)
>      } else if (NS_LITERAL_STRING(CONFIRM_AGE_PREF).Equals(aData)) {
>        gFreshnessGuarantee = Preferences::GetInt(CONFIRM_AGE_PREF,
>          CONFIRM_AGE_DEFAULT_SEC);
>      }
> -  } else if (!strcmp(aTopic, "profile-before-change") ||
> -             !strcmp(aTopic, "xpcom-shutdown-threads")) {
> +  } else if (!strcmp(aTopic, "quit-application") &&
> +             !NS_strcmp(aData, u"shutdown")) {

Do we need to check the `aData` parameter?

It looks like most users of it in our codebase only look at `aTopic`.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1738
(Diff revision 3)
>          CONFIRM_AGE_DEFAULT_SEC);
>      }
> -  } else if (!strcmp(aTopic, "profile-before-change") ||
> -             !strcmp(aTopic, "xpcom-shutdown-threads")) {
> +  } else if (!strcmp(aTopic, "quit-application") &&
> +             !NS_strcmp(aData, u"shutdown")) {
>      Shutdown();
> +  } else if (!strcmp(aTopic, "profile-before-change")) {

Is this change necessary?

I don't fully understand what it does but I can see that it's setting `mWorkerProxy` to null and that's something that `Shutdown()` may call. If `quit-application` is called second, we may crash.

If it's not necessary to change this, I'd prefer we leave this alone to avoid introducing any new bugs.
Attachment #8809307 - Flags: review?(francois) → review-
(In reply to François Marier [:francois] from comment #13)
> ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1738
> (Diff revision 3)
> >          CONFIRM_AGE_DEFAULT_SEC);
> >      }
> > -  } else if (!strcmp(aTopic, "profile-before-change") ||
> > -             !strcmp(aTopic, "xpcom-shutdown-threads")) {
> > +  } else if (!strcmp(aTopic, "quit-application") &&
> > +             !NS_strcmp(aData, u"shutdown")) {
> >      Shutdown();
> > +  } else if (!strcmp(aTopic, "profile-before-change")) {
> 
> Is this change necessary?
> 
> I don't fully understand what it does but I can see that it's setting
> `mWorkerProxy` to null and that's something that `Shutdown()` may call. If
> `quit-application` is called second, we may crash.
> 
> If it's not necessary to change this, I'd prefer we leave this alone to
> avoid introducing any new bugs.

I see that you explained that change in comment 11. Sorry I missed that comment.

Since you've changed the even listeners from xpcom-shutdown* to quit-application, doesn't that take care of the problem for release builds already?
(In reply to François Marier [:francois] from comment #13)
> Also, if we don't need to check `aData == 'shutdown'` and we can get away
> with just watching for `aTopic = 'quit-application'` like most of the other
> gecko users, then let's do that. It will avoid us breaking if someone splits
> "shutdown" into more things later.

According to MDN (https://developer.mozilla.org/en/docs/Observer_Notifications#Application_shutdown) it's either "shutdown" or "restart". We should abort Safe Browsing work in both cases I think. We don't want to hang during restarts either.
Comment on attachment 8809307 [details]
Bug 1315386 - Make Safe Browsing code more shutdown-aware. .

https://reviewboard.mozilla.org/r/91902/#review93382

Thanks for your review :)

> Looks like we lost a call to `LookupComplete(nullptr)` here.

I am not quite sure if it's necessary to LookupComplete(nullptr) while shutting down. The main purpose of that is to trigger callback to resume/cancel channel in nsChannelClassifier.
This channel will be cancelled while shutting down.
Reverted to original version, anw
(In reply to François Marier [:francois] from comment #14)

> > >      Shutdown();
> > > +  } else if (!strcmp(aTopic, "profile-before-change")) {
> > 
> > Is this change necessary?
> > 
> > I don't fully understand what it does but I can see that it's setting
> > `mWorkerProxy` to null and that's something that `Shutdown()` may call. If
> > `quit-application` is called second, we may crash.
> > 
> > If it's not necessary to change this, I'd prefer we leave this alone to
> > avoid introducing any new bugs.
> 
> I see that you explained that change in comment 11. Sorry I missed that
> comment.
> 
> Since you've changed the even listeners from xpcom-shutdown* to
> quit-application, doesn't that take care of the problem for release builds
> already?
We had a condition check in the beginning   if (!gDbBackgroundThread || gShuttingDownThread)
    return NS_OK;
and also, quit-application is guaranteed to be called in correct order, before profile-before-change, then it will be ok.
FWIW:
Basically, shutting down contains several phases (said in (https://developer.mozilla.org/en/docs/Observer_Notifications#Application_shutdown) 
There's a watchdog
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/terminator/nsTerminator.cpp
starts in "quit-application" phase and kill firefox process (crash) without any cleanup if the shutdown phases take too long (> 63000 by default).
Joining background thread in dbservice will block a certain phase to complete(profile-before-change phase in our code). 
https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-10-19&keys=__none__!__none__!__none__&max_channel_version=release%252F49&measure=SHUTDOWN_PHASE_DURATION_TICKS_PROFILE_BEFORE_CHANGE&min_channel_version=null&os=Windows_NT&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-09-16&table=0&trim=1&use_submission_date=0
I see that the profile-before-change phase takes the longest time during shutdown.
Not sure that is caused by which module (), but if in the case updating takes too long (> 15 secs, even more), it might be a problem.
Comment on attachment 8809307 [details]
Bug 1315386 - Make Safe Browsing code more shutdown-aware. .

https://reviewboard.mozilla.org/r/91902/#review93536

::: toolkit/components/downloads/ApplicationReputation.cpp:1341
(Diff revision 5)
> +      mTimeoutTimer->Cancel();
> +      mTimeoutTimer = nullptr;
> +    }
> +    nsresult rv;
> +    rv = mChannel->Cancel(NS_ERROR_ABORT);
> +    NS_ENSURE_SUCCESS(rv, rv);

Returning the error here can't do anything useful I think, so you can simplify this, or do Unused << NS_WARN_IF(NS_FAILED(call)) if you want the logging.

::: toolkit/components/url-classifier/Classifier.cpp:425
(Diff revision 5)
>  
>  void
>  Classifier::AbortUpdateAndReset(const nsCString& aTable)
>  {
> +  // We don't need to reset while shutting down. It will only slow us down.
> +  ENSURE_NOT_SHUTTING_DOWN_VOID();

The rationale for deprecating NS_ENSURE_SUCCESS was that it hid control flow. I would say that this has exactly the same problem. So I suggest a simple if (blah) { return NS_APPROPRIATE_ERROR; }.
Attachment #8809307 - Flags: review?(gpascutto) → review-
Comment on attachment 8809307 [details]
Bug 1315386 - Make Safe Browsing code more shutdown-aware. .

https://reviewboard.mozilla.org/r/91902/#review93536

> The rationale for deprecating NS_ENSURE_SUCCESS was that it hid control flow. I would say that this has exactly the same problem. So I suggest a simple if (blah) { return NS_APPROPRIATE_ERROR; }.

https://groups.google.com/forum/#!topic/mozilla.dev.platform/1clMLuuhtWQ
Oops, I didn't know about that. Thanks you for your review.
Moved back to simple if() :)
Comment on attachment 8809307 [details]
Bug 1315386 - Make Safe Browsing code more shutdown-aware. .

https://reviewboard.mozilla.org/r/91902/#review93874

::: toolkit/components/url-classifier/Classifier.cpp
(Diff revisions 5 - 6)
>    LOG(("  %d add completions", store.AddCompletes().Length()));
>    LOG(("  %d sub chunks", store.SubChunks().Length()));
>    LOG(("  %d sub prefixes", store.SubPrefixes().Length()));
>    LOG(("  %d sub completions", store.SubCompletes().Length()));
>  
> -  ENSURE_NOT_SHUTTING_DOWN(NS_ERROR_ABORT);

Did you intend to lose the checks here? Maybe one just before we start doing all the disk IO is sensible.
Attachment #8809307 - Flags: review?(gpascutto) → review+
Comment on attachment 8809307 [details]
Bug 1315386 - Make Safe Browsing code more shutdown-aware. .

https://reviewboard.mozilla.org/r/91902/#review93874

> Did you intend to lose the checks here? Maybe one just before we start doing all the disk IO is sensible.

too many 3 same lines if(){} in one function looks a little bit ugly, I just moved to beginning of HashStore.WriteFile()
Blocks: 1305486
Comment on attachment 8809307 [details]
Bug 1315386 - Make Safe Browsing code more shutdown-aware. .

https://reviewboard.mozilla.org/r/91902/#review94394
Attachment #8809307 - Flags: review?(francois) → review+
Thanks you all.
Rebased and I just revise a little to make sure it does not break xpcshell test (xpcshell does not usually fire "quit-application" as I know).
Keywords: checkin-needed
seems this need rebasing

 warning: conflicts while merging toolkit/components/url-classifier/nsUrlClassifierDBService.cpp! (edit, then use 'hg resolve --mark') unresolved conflicts (see hg resolve, then hg rebase --continue)
Flags: needinfo?(tnguyen)
Keywords: checkin-needed
Flags: needinfo?(tnguyen)
Keywords: checkin-needed
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/a468efda20df
Make Safe Browsing code more shutdown-aware. r=francois,gcp.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/a468efda20df
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Depends on: 1321449
This had rather a profound change in reported values for URLCLASSIFIER_SHUTDOWN_TIME which is now reporting mostly 0ms with a handful of 1-2ms down from its previous-to-Nov28 median of about 8ms.

Is this expected? Enjoyed?

Is this measure still useful, since it's only ever reporting 0?
Flags: needinfo?(tnguyen)
Flags: needinfo?(francois)
(In reply to Chris H-C :chutten from comment #37)
> This had rather a profound change in reported values for
> URLCLASSIFIER_SHUTDOWN_TIME which is now reporting mostly 0ms with a handful
> of 1-2ms down from its previous-to-Nov28 median of about 8ms.
> 
> Is this expected? Enjoyed?

Yes. We made the URL Classifier bail really early when it receives the browser shutdown signal. So it should exit much more quickly.

> Is this measure still useful, since it's only ever reporting 0?

We'd like to keep it there to make sure we don't regress. It was put in place in order to prevent crashes like bug 1310060. Having it in the code should alert us of any new code that's not shutdown-aware.
Flags: needinfo?(tnguyen)
Flags: needinfo?(francois)
Excellent!

Are the alert emails getting through to people on safebrowsing-telemetry@mozilla.org? There should be one from telemetry-alerts@mozilla.com from December 2, 2016 about this.
Flags: needinfo?(francois)
(In reply to Chris H-C :chutten from comment #39)
> Are the alert emails getting through to people on
> safebrowsing-telemetry@mozilla.org? There should be one from
> telemetry-alerts@mozilla.com from December 2, 2016 about this.

I think so. I vaguely remember seeing it before the holidays, and ignoring it because it was expected :)
Flags: needinfo?(francois)
You need to log in before you can comment on or make changes to this bug.