Closed Bug 1385609 Opened 2 years ago Closed 2 years ago

Backoff seems to be interfering with updates

Categories

(Toolkit :: Safe Browsing, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- fixed

People

(Reporter: francois, Assigned: tnguyen)

Details

(Whiteboard: #sbv4-m10)

Attachments

(6 files)

From what I can see, the google4 provider is no longer updating even though both the backoff time and the nextupdatetime have passed.

Maybe there's a logic error somewhere in our code?
I did not take a deep look at this, but I guess there is probably something wrong when we show backoff time and last update status. If the last update succeeded, we should show 0 back off time. We should fix it first to get a correct debug information.
And, there's one case I am thinking. That is, we click update > 4 times in a sequence (in a short period of time), then this will happen. We could not make any other request in a period of 60 mins. (restart would help) 
http://searchfox.org/mozilla-central/rev/09c065976fd4f18d4ad764d7cb4bbc684bf56714/toolkit/components/url-classifier/nsUrlClassifierLib.js#119

hmm, we should clear requestTimes_ when getting 200 from server, shouldn't we?
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #1)
> I did not take a deep look at this, but I guess there is probably something
> wrong when we show backoff time and last update status. If the last update
> succeeded, we should show 0 back off time.

Yes. As soon as we have a successful update we should be exiting the backoff mode: https://developers.google.com/safe-browsing/v4/request-frequency#back-off-mode

That should be reflected on about:url-classifier since it's meant to help us detect when the internal state is wrong.

> We should fix it first to get a correct debug information.

Indeed, if there's a bug in about:url-classifier, we should fix it quickly so that we can detect any problems with the backoff algorithm as we roll this code out to more users.
Whiteboard: #sbv4-m8
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #2)
> hmm, we should clear requestTimes_ when getting 200 from server, shouldn't
> we?

I don't think we need to.

Looking at the code (http://searchfox.org/mozilla-central/rev/bbc1c59e460a27b20929b56489e2e55438de81fa/toolkit/components/url-classifier/nsUrlClassifierLib.js#119) if `requestTimes_` only contains old entries then they are ignored and the update is allowed to proceed.

Also, it was added in bug 428548 to prevent a high ping frequency if anything goes wrong and so I would be wary of removing it. Especially since it would mean that a user could go into about:url-classifier and click the button repeatedly.
I could not reproduce the issue, because if the last update succeeds, the about page will be refreshed with 0 backoff time.
It looks like there're some differences between your profile and my profile. Could you please send me your value of "urlclassifier" and "safebrowsing" preferences?
Flags: needinfo?(francois)
lower priority until I have more clue.
Priority: P1 → P2
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5)
> I could not reproduce the issue, because if the last update succeeds, the
> about page will be refreshed with 0 backoff time.
> It looks like there're some differences between your profile and my profile.
> Could you please send me your value of "urlclassifier" and "safebrowsing"
> preferences?

Yeah, let's give it a few days to see if you can reproduce it too.

(emailed you the details)
Flags: needinfo?(francois)
Whiteboard: #sbv4-m8 → #sbv4-m9
Priority: P2 → P3
It happened again on Nightly 57.0a1 (2017-08-16) (64-bit).
Assignee: nobody → tnguyen
UI looks correct this time but seems update was no longer triggered after the last failure. I will try to see what happened
Status: NEW → ASSIGNED
Priority: P3 → P2
Some cases I would think about:
- We did > 4 requests in 1 hour (server might return too small wait duration). Then the next scheduled checkupdate would be false, unfortunately, we did not reset the timer for the next update. Update would be stopped at that time.
http://searchfox.org/mozilla-central/rev/e8c36327cd8c9432c69e5e1383156a74330f11f2/toolkit/components/url-classifier/nsUrlClassifierListManager.js#324
- Or, the update request was put in queue
http://searchfox.org/mozilla-central/rev/e8c36327cd8c9432c69e5e1383156a74330f11f2/toolkit/components/url-classifier/nsUrlClassifierListManager.js#475
but the updater failed to notify callback (which will reset the next update timer).
I think a full log would be useful to look at this issue. I am trying to enable log in about:url-classifier and leaving my computer running for a long time.
If you could reproduce this issue, please set log enabled and log file so we could see what happened.
This one is strange: Baidu updates are stuck on an expired backoff, but the last update shows "N/A".
(In reply to François Marier [:francois] from comment #11)
> Created attachment 8898835 [details]
> Screenshot of stuck baidu update in about:url-classifier
> 
> This one is strange: Baidu updates are stuck on an expired backoff, but the
> last update shows "N/A".

The only place that we show last update "N/A" is when "onload" event is fired
Provider.init() -> Provider.render()
http://searchfox.org/mozilla-central/rev/b258e6864ee3e809d40982bc5d0d5aff66a20780/toolkit/content/aboutUrlClassifier.js#137
So I think that the page was refreshed or window onload was fired.
hmm, finally I saw it has just occurred today but unfortunately no useful JS log in Nightly 57.0a1 . I got modules log of nsUrlClassifierStreamUpdater but it looks good

[Main Thread]: D/UrlClassifierStreamUpdater nsUrlClassifierStreamUpdater::OnStartRequest (status=NS_OK, uri=https://cache-ams04.cdn.yandex.net/sba.cdn.yandex.net/chunks/ydx-fraud-phishing-shavar/.................., this=113bbc3a0)
[Main Thread]: D/UrlClassifierStreamUpdater nsUrlClassifierStreamUpdater::OnStartRequest failed (404)
[Main Thread]: D/UrlClassifierStreamUpdater nsUrlClassifierStreamUpdater::Download error [this=113bbc3a0]
..............
[Main Thread]: D/UrlClassifierStreamUpdater nsUrlClassifierStreamUpdater::UpdateSuccess [this=113bbc3a0]
[Main Thread]: D/UrlClassifierStreamUpdater nsUrlClassifierStreamUpdater::DownloadDone [this=113bbc3a0]
[Main Thread]: D/UrlClassifierStreamUpdater Notify download error callback in UpdateSuccess [this=113bbc3a0]
[Main Thread]: D/UrlClassifierStreamUpdater stream updater: calling into fetch next request


Not sure what happened in listmanager.js. I will try to run in debug build to see what happened
Attached file logLongRun.log.zip
I got one log
listmanager: 14:42:11 GMT+0800 (CST): checkForUpdates with https://download.api.bsb.baidu.com/downloads?ver=2.2&key=ffD7Y9anV5dZVp8
listmanager: 14:42:11 GMT+0800 (CST): Can't make update request
listmanager: 14:42:43 GMT+0800 (CST): checkForUpdates with https://safebrowsing.google.com/safebrowsing/downloads?client=navclient-auto-ffox&appver=57.0a&pver=2.2&key=[trimmed-google-api-key]


For any reason, we still have to make sure there's a timer for the next update running
Attached file logLongRun.log 2.zip
[Main Thread]: D/UrlClassifierStreamUpdater stream updater: calling into fetch next request
[Main Thread]: D/UrlClassifierStreamUpdater No more requests, returning
listmanager: 15:31:25 GMT+0800 (CST): checkForUpdates with https://download.api.bsb.baidu.com/downloads?ver=2.2&key=ffD7Y9anV5dZVp8
RequestBackoff: 15:31:25 GMT+0800 (CST): canMakeRequest : now : 1504078285494 requestTimes 1504063657792 length 4 nextRequestTime_ 1504078285611 errorTimeout_ 4241554 numErrors_ 3

The backoff time is shown in about:url-classifier is                                       Wed Aug 30 2017 15:31:25 GMT+0800 (CST)

I see that the canMakeRequest is bailed out at
http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/toolkit/components/url-classifier/nsUrlClassifierLib.js#115
Obviously, nsITimer is not abosultely accurate. The delay is approximate so we may have to give it a tolerance, 1 sec for example
Attachment #8902617 - Flags: review?(francois)
Comment on attachment 8902617 [details]
Bug 1385609 - Fix backoff issue that makes SB lists no longer update

https://reviewboard.mozilla.org/r/174232/#review179792

Great investigation Thomas!

::: commit-message-d10c9:4
(Diff revision 1)
> +Bug 1385609 - Fix backoff issue that makes SB lists no longer update
> +
> +The issue may occur when nsITimer is fired earlier than the request time
> +elapses. We fix by:

I would suggest the following to clarify the problem:

```
The issue occurs when nsITimer is fired earlier than the backoff time. In that case, the update doesn't proceed and we never make another attempt because the backoff update timer was oneshot.

We fix the issue in two ways:

- Add a tolerance of 1 second in case the timer fires too early.
- Set another oneshot timer whenever we are prevented from updating due to backoff.
```

::: toolkit/components/url-classifier/nsUrlClassifierLib.js:115
(Diff revision 1)
>   * Check to see if we can make a request.
>   */
>  RequestBackoff.prototype.canMakeRequest = function() {
>    var now = Date.now();
> -  if (now < this.nextRequestTime_) {
> +  // Note that nsITimer delay is approximate: the timer can be fired before the
> +  // requested time has elapsed. So, give it a tolerance 1 sec

"tolerance of 1 second."

::: toolkit/components/url-classifier/nsUrlClassifierLib.js:116
(Diff revision 1)
>   */
>  RequestBackoff.prototype.canMakeRequest = function() {
>    var now = Date.now();
> -  if (now < this.nextRequestTime_) {
> +  // Note that nsITimer delay is approximate: the timer can be fired before the
> +  // requested time has elapsed. So, give it a tolerance 1 sec
> +  if (now + 1000 < this.nextRequestTime_) {

If the timer fires before the update can proceed, then it's fine as long as it fires again.

Is this timer never going to fire again?

::: toolkit/components/url-classifier/nsUrlClassifierListManager.js:222
(Diff revision 1)
>    this.updateCheckers_[updateUrl] = Cc["@mozilla.org/timer;1"]
>                                      .createInstance(Ci.nsITimer);
>    this.updateCheckers_[updateUrl].initWithCallback(() => {
>      this.updateCheckers_[updateUrl] = null;
> -    this.checkForUpdates(updateUrl);
> +    if (updateUrl && !this.checkForUpdates(updateUrl)) {
> +      this.setUpdateCheckTimer(updateUrl, this.updateInterval);

Let's add a comment here:

    // Make another attempt later.
Attachment #8902617 - Flags: review?(francois) → review+
Comment on attachment 8902617 [details]
Bug 1385609 - Fix backoff issue that makes SB lists no longer update

https://reviewboard.mozilla.org/r/174232/#review179792

> If the timer fires before the update can proceed, then it's fine as long as it fires again.
> 
> Is this timer never going to fire again?

Yes, if the timer fires before the request time, it will be never going to fire again (that's how the issue occurs).
It only occurs for backoff case.
Another case, for test purpose, people click update >= 4 times update button. Then update timer will never fire again.
Since this one is a little riskier than other changes, and we've had this bug forever, let's not uplift to Beta.

We should get more testing on Nightly.
Whiteboard: #sbv4-m9 → #sbv4-m10
I add a slight change which added a parameter RequestBackoff.tolerance. So we can skip the tolerance in our test.
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #23)
> I add a slight change which added a parameter RequestBackoff.tolerance. So
> we can skip the tolerance in our test.

Looks good to me.
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/14b4af261515
Fix backoff issue that makes SB lists no longer update r=francois
Keywords: checkin-needed
Backed out for eslint failures in nsUrlClassifierLib.js:

https://hg.mozilla.org/integration/autoland/rev/f08a3cf8fcb64c1e710e41e09733ef9094b1a080

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=14b4af2615158fe83b174c6f1236668e13cd20ab&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=128584746&repo=autoland

Two more spaces and you are done:
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/toolkit/components/url-classifier/nsUrlClassifierLib.js:117:3 | Expected space or tab after '//' in comment. (spaced-comment)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/toolkit/components/url-classifier/nsUrlClassifierLib.js:118:3 | Expected space or tab after '//' in comment. (spaced-comment)
Flags: needinfo?(tnguyen)
Flags: needinfo?(tnguyen)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/fdbb8e7b65f2
Fix backoff issue that makes SB lists no longer update r=francois
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/fdbb8e7b65f2
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
As per comment 21, we won't fix this in earlier versions and instead let it ride the train to get extra testing.
You need to log in before you can comment on or make changes to this bug.