Frequent checksum mismatches while applying pver4 updates

RESOLVED FIXED in Firefox 64

Status

()

defect
P1
normal
RESOLVED FIXED
2 years ago
10 months ago

People

(Reporter: francois, Assigned: dimi)

Tracking

unspecified
mozilla64
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox64 fixed)

Details

Attachments

(1 attachment)

While testing bug 1414051, I ran into an update problem:

[22894:URL Classifier]: D/UrlClassifierDbService nsUrlClassifierDBServiceWorker::BeginUpdate [googpub-phish-proto,goog-malware-proto,goog-unwanted-proto,goog-badbinurl-proto,goog-downloadwhite-proto,goog-passwordwhite-proto]
...
[Parent 22894, URL Classifier] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /home/francois/devel/mozilla-unified/toolkit/components/url-classifier/ProtocolParser.cpp, line 869
[Parent 22894, URL Classifier] WARNING: Failed to process one response.: file /home/francois/devel/mozilla-unified/toolkit/components/url-classifier/ProtocolParser.cpp, line 786
[22894:URL Classifier]: D/UrlClassifierDbService nsUrlClassifierDBServiceWorker::FinishUpdate
[22894:URL Classifier]: D/UrlClassifierDbService Classifier::AsyncApplyUpdates
[22894:Classifier Update]: D/UrlClassifierDbService Step 1. ApplyUpdatesBackground on update thread.
[22894:Classifier Update]: D/UrlClassifierDbService Copy in-use directory content for update.
[22894:Classifier Update]: D/UrlClassifierDbService Applying 5 table updates.
[22894:Classifier Update]: D/UrlClassifierDbService Classifier::UpdateTableV4(googpub-phish-proto)
...
[22894:Classifier Update]: D/UrlClassifierDbService Successfully updated goog-unwanted-proto
[22894:Classifier Update]: D/UrlClassifierDbService Classifier::UpdateTableV4(goog-badbinurl-proto)
[22894:Classifier Update]: D/UrlClassifierDbService Private store directory for goog-badbinurl-proto is /home/francois/devel/mozilla-unified/obj-x86_64-pc-linux-gnu/tmp/scratch_user/safebrowsing-updating/google4
[22894:Classifier Update]: D/UrlClassifierDbService Loading PrefixSet for goog-badbinurl-proto
[22894:Classifier Update]: D/UrlClassifierDbService stored PrefixSet exists, loading from disk
[22894:Classifier Update]: D/UrlClassifierDbService SB tree done, size = 109168 bytes
[22894:Classifier Update]: D/UrlClassifierDbService Checksum mismatch after applying partial update
[22894:URL Classifier]: D/UrlClassifierDbService Step 2. ApplyUpdatesForeground on caller thread
[22894:URL Classifier]: D/UrlClassifierDbService Resetting table: goog-badbinurl-proto
...
[22894:URL Classifier]: D/UrlClassifierDbService Dumping raw table updates...
[22894:URL Classifier]: D/UrlClassifierDbService Dumping failed update...
[22894:URL Classifier]: D/UrlClassifierDbService nsUrlClassifierDBServiceWorker::NotifyUpdateObserver
[22894:URL Classifier]: D/UrlClassifierDbService Notifying error: NS_ERROR_UC_UPDATE_CHECKSUM_MISMATCH (2154758149)
[22894:URL Classifier]: D/UrlClassifierDbService Resetting table: googpub-phish-proto
[22894:URL Classifier]: D/UrlClassifierDbService Resetting table: goog-malware-proto
[22894:URL Classifier]: D/UrlClassifierDbService Resetting table: goog-unwanted-proto
[22894:URL Classifier]: D/UrlClassifierDbService Resetting table: goog-badbinurl-proto
[22894:URL Classifier]: D/UrlClassifierDbService Resetting table: goog-downloadwhite-proto
[22894:URL Classifier]: D/UrlClassifierDbService Resetting table: goog-passwordwhite-proto
listmanager: 14:13:51 GMT-0800 (PST): update error for googpub-phish-proto,goog-malware-proto,goog-unwanted-proto,goog-badbinurl-proto,goog-downloadwhite-proto,goog-passwordwhite-proto from https://safebrowsing.googleapis.com/v4/threatListUpdates:fetch?$ct=application/x-protobuf&key=[trimmed-google-api-key]&$httpMethod=POST: 2154758149

which returned error NS_ERROR_UC_UPDATE_CHECKSUM_MISMATCH = 2154758149
while trying to apply a partial update to goog-badbinurl-proto. 

Looking inside tmp/scratch_user/safebrowsing/google4/, I couldn't find any
goog-badbinurl-proto.* files.

The original state according to the console log was:

goog-badbinurl-proto;Cg0IBxAGGAEiAzAwMTABENfXAhoCGAu+qmT6:Z59L3z5zS249nCUNSn3hbRXW2pRXxotEgpaEBQtI5Ic=
goog-malware-proto;Cg0IARAGGAEiAzAwMTABEOLnAxoCGAse6dMA:x9mkH+mhokP5gHP0T03srjv3wKV1pKEAkHfmL2sl+e8=
goog-unwanted-proto;Cg0IAxAGGAEiAzAwMTABEJXUAxoCGAtvexIZ:tGU0KySN51kIt+Jk3kX/JbK9Jcp6UCrQ7NZY0OBFHJ0=
goog-downloadwhite-proto;Cg0ICRAGGAEiAzAwMTABEAoaAhgLCRqTMA==:mfgIouBiFOrJCxqRaZh1yk2w9KWOiDRj6eDMfeooVvU=
googpub-phish-proto;Cg0IAhAGGAEiAzAwMTABEOXxAxoCGAuksioV:bMw5vBNL3RKHJ9Gno/10jlVdr8yZDXx8PIImUCGCCFQ=

I forced an update from about:url-classifier and then it worked:

[22894:URL Classifier]: D/UrlClassifierDbService Appending state 'Cg0IARAGGAEiAzAwMTABEOLnAxoCGAse6dMA' and checksum 'x9mkH+mhokP5gHP0T03srjv3wKV1pKEAkHfmL2sl+e8=' for table goog-malware-proto
[22894:URL Classifier]: D/UrlClassifierDbService Appending state 'Cg0IAxAGGAEiAzAwMTABEJXUAxoCGAtvexIZ' and checksum 'tGU0KySN51kIt+Jk3kX/JbK9Jcp6UCrQ7NZY0OBFHJ0=' for table goog-unwanted-proto
[22894:URL Classifier]: D/UrlClassifierDbService Appending state 'Cg0ICRAGGAEiAzAwMTABEAoaAhgLCRqTMA==' and checksum 'mfgIouBiFOrJCxqRaZh1yk2w9KWOiDRj6eDMfeooVvU=' for table goog-downloadwhite-proto
[22894:URL Classifier]: D/UrlClassifierDbService Appending state 'Cg0IAhAGGAEiAzAwMTABEOXxAxoCGAuksioV' and checksum 'bMw5vBNL3RKHJ9Gno/10jlVdr8yZDXx8PIImUCGCCFQ=' for table googpub-phish-proto
...
[22894:Classifier Update]: D/UrlClassifierDbService Classifier::UpdateTableV4(goog-badbinurl-proto)
[22894:Classifier Update]: D/UrlClassifierDbService Private store directory for goog-badbinurl-proto is /home/francois/devel/mozilla-unified/obj-x86_64-pc-linux-gnu/tmp/scratch_user/safebrowsing-updating/google4
[22894:Classifier Update]: D/UrlClassifierDbService Loading PrefixSet for goog-badbinurl-proto
[22894:Classifier Update]: D/UrlClassifierDbService no (usable) stored PrefixSet found
[22894:Classifier Update]: D/UrlClassifierDbService Write meta data of the last applied update.
[22894:Classifier Update]: D/UrlClassifierDbService Successfully updated goog-badbinurl-proto
...
[22894:URL Classifier]: D/UrlClassifierDbService Step 2. ApplyUpdatesForeground on caller thread
[22894:URL Classifier]: D/UrlClassifierDbService Swapping directories safebrowsing-updating and safebrowsing...
[22894:URL Classifier]: D/UrlClassifierDbService Private store directory for googpub-phish-proto is /home/francois/devel/mozilla-unified/obj-x86_64-pc-linux-gnu/tmp/scratch_user/safebrowsing/google4
[22894:URL Classifier]: D/UrlClassifierDbService Private store directory for goog-malware-proto is /home/francois/devel/mozilla-unified/obj-x86_64-pc-linux-gnu/tmp/scratch_user/safebrowsing/google4
[22894:URL Classifier]: D/UrlClassifierDbService Private store directory for goog-unwanted-proto is /home/francois/devel/mozilla-unified/obj-x86_64-pc-linux-gnu/tmp/scratch_user/safebrowsing/google4
[22894:URL Classifier]: D/UrlClassifierDbService Private store directory for goog-badbinurl-proto is /home/francois/devel/mozilla-unified/obj-x86_64-pc-linux-gnu/tmp/scratch_user/safebrowsing/google4
...
[22894:URL Classifier]: D/UrlClassifierDbService nsUrlClassifierDBServiceWorker::NotifyUpdateObserver
[22894:URL Classifier]: D/UrlClassifierDbService Notifying success: 1793
listmanager: 14:20:51 GMT-0800 (PST): update success for googpub-phish-proto,goog-malware-proto,goog-unwanted-proto,goog-badbinurl-proto,goog-downloadwhite-proto,goog-passwordwhite-proto from https://safebrowsing.googleapis.com/v4/threatListUpdates:fetch?$ct=application/x-protobuf&key=[trimmed-google-api-key]&$httpMethod=POST: 1793

Looking at the telemetry probe tracking this, more than 5% of updates fail
with a checksum mismatch:

https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-02-15&keys=google4!mozilla!google!baidu&max_channel_version=beta%252F59&measure=URLCLASSIFIER_UPDATE_ERROR&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-01-22&table=1&trim=1&use_submission_date=0
https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-02-18&keys=google4!mozilla!google!baidu&max_channel_version=nightly%252F60&measure=URLCLASSIFIER_UPDATE_ERROR&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-01-22&table=1&trim=1&use_submission_date=0
See Also: → 1439455
Depends on: 1439468
i will take a look at this bug
If I remember correctly, if we get checksum mismatch error, the table will be deleted so we could not find that table in profile directory anymore. Then the next forced update should be ok ( we will store new metadata of the table).

However, I still don't know why we get "Checksum mismatch after applying partial update", I need more tests and look closer on that.
Assignee: francois → tnguyen
thomas is OOO, I'll take this one
Assignee: tnguyen → dlee
just a quick update that I am working on this bug.
I try to reproduce this bug on my Ubuntu & MAC, but this bug doesn't show up after running two consecutive overnight.
Fail to pass has_first_value()[1] check causes this issue.
I am still working on investigating if this is a server issue or a parser issue.

But no matter it is a parser or a server issue, right now when the protocol parser encounters an error,
we still try to apply an update to that table. This is the reason we saw lots of "checksum mismatch" in the Telemetry.

I think we should fix above issue and also add another Telemetry for protocol parser error(or combine them).

[1] https://searchfox.org/mozilla-central/source/toolkit/components/url-classifier/ProtocolParser.cpp#1007
Reproduced locally today, the data in RiceEncodingData has following properties:
1. has_first_value = 0
2. first_value = 0

The local modified build I tested ignored the |has_first_value| check[1] and everything works fine(the partial update is applied and passed the checksum mismatch check).

I think we might need to ask google if it is a server bug that when |first_value| is 0, |has_first_value| is also 0.
From the protocol buffer source code, I think it makes more sense that |has_first_value| is 1 in this case.

For us, we can simply fix this bug first by removing the |has_first_value| check. If there is a case that server sent bad data, it doesn't make much difference that we return an error in parser or in the partial update.

Also, from the evolution dashboard[2], we can see that the error happened on specific days, which prove the point that this is related to the server's data.

[1] https://searchfox.org/mozilla-central/rev/94e37e71ffbfd39e6ad73ebcda5d77cce8d341ae/toolkit/components/url-classifier/ProtocolParser.cpp#1007
[2] https://mzl.la/1PGPwMm
Sometimes the protocol buffer data(RiceEncodingData) sent by google's SafeBrowsing server has following properties:
1. |has_first_value| is false
2. |first_value| is 0

In this case, we can still parse the data and apply partial update correctly by just ignoring the |has_first_value| flag.
So comment out the check in ProtocolParser to avoid returning an error in an update appliable scenario.

Need to confirm with the server side to see if this is a temporarily workaround or a permanent solution.
See Also: → 1492858
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/37d138b1e58b
Ignore has_first_value() check in ProtocolParser. r=francois
https://hg.mozilla.org/mozilla-central/rev/37d138b1e58b
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
You need to log in before you can comment on or make changes to this bug.