Closed Bug 1362761 Opened 7 years ago Closed 5 years ago

Crash in nsUrlClassifierPrefixSet::WritePrefixes

Categories

(Toolkit :: Safe Browsing, defect, P2)

52 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox53 --- wontfix
firefox54 - wontfix
firefox55 - wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- fixed
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: philipp, Assigned: dimi)

References

Details

(Keywords: crash)

Crash Data

Attachments

(7 files, 4 obsolete files)

59 bytes, text/x-review-board-request
gcp
: review+
Details
59 bytes, text/x-review-board-request
gcp
: review+
Details
46 bytes, text/x-phabricator-request
dimi
: review+
Details | Review
46 bytes, text/x-phabricator-request
dimi
: review+
Details | Review
46 bytes, text/x-phabricator-request
dimi
: review+
Details | Review
46 bytes, text/x-phabricator-request
dimi
: review+
Details | Review
46 bytes, text/x-phabricator-request
dimi
: review+
Details | Review
This bug was filed from the Socorro interface and is 
report bp-f718be0e-d495-495a-b8cf-73c490170506.
=============================================================
Crashing Thread (27)
Frame 	Module 	Signature 	Source
0 	XUL 	nsUrlClassifierPrefixSet::WritePrefixes(nsIOutputStream*) 	xpcom/glue/nsTArray.h:395
1 	XUL 	nsUrlClassifierPrefixSet::StoreToFile(nsIFile*) 	toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:375
2 	XUL 	mozilla::safebrowsing::LookupCache::WriteFile() 	toolkit/components/url-classifier/LookupCache.cpp:154
3 	XUL 	mozilla::safebrowsing::Classifier::UpdateHashStore(nsTArray<mozilla::safebrowsing::TableUpdate*>*, nsACString_internal const&) 	toolkit/components/url-classifier/Classifier.cpp:998
4 	XUL 	mozilla::safebrowsing::Classifier::ApplyUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*) 	toolkit/components/url-classifier/Classifier.cpp:532
5 	XUL 	nsUrlClassifierDBServiceWorker::FinishUpdate() 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:630
6 	XUL 	mozilla::detail::RunnableMethodImpl<nsresult (nsUrlClassifierDBServiceWorker::*)(), true, false>::Run 	xpcom/glue/nsThreadUtils.h:775
7 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1216
8 	XUL 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:361
9 	XUL 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:368
10 	XUL 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:232
11 	XUL 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:467
12 	libnss3.dylib 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:216
13 	libsystem_pthread.dylib 	_pthread_body 	
14 	libsystem_pthread.dylib 	_pthread_start 	
15 	libsystem_pthread.dylib 	thread_start 	
16 	libnss3.dylib 	libnss3.dylib@0x1b852f

this signature is regressing in firefox 52 and subsequent builds and primarily affects installations on android and macos. it looks like that may be related to bug 1305801.
[Tracking Requested - why for this release]: Lots of crashes on Android.  Might be worth including in a 53 dot release, not sure.
100 or so crashes per week on release isn't so bad as to make me want to risk trying to fix this in 53. 

What would be good here is someone to investigate the crash and come up with a patch. Then we can think about uplift to 54. It is hard to tell if 55 is affected but I would assume it's likely.
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Priority: -- → P1
his bug happens since 47,at least, I didn't dig further.

It looks like a regression of bug 1305801 because the function name changed from nsUrlClassifierPrefixSet::StoreToFile to nsUrlClassifierPrefixSet::WritePrefixes.

By using "nsUrlClassifierPrefixSet::StoreToFile" as the signature you will find the result[1] and look at the crash call stack it happened in the same place as this one[2].

[1] https://crash-stats.mozilla.com/signature/?signature=nsUrlClassifierPrefixSet%3A%3AStoreToFile&date=%3E%3D2016-01-01T01%3A11%3A00.000Z&date=%3C2016-11-30T01%3A11%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=version&_sort=-date&page=1
[2] https://crash-stats.mozilla.com/report/index/66f8a246-5a24-4aef-90bf-062362161128
As per comment 3, this is not a regression.
Assignee: dlee → nobody
Status: ASSIGNED → NEW
Keywords: regression
Priority: P1 → P3
Crash Signature: [@ nsUrlClassifierPrefixSet::WritePrefixes] → [@ nsUrlClassifierPrefixSet::WritePrefixes] [@ nsUrlClassifierPrefixSet::StoreToFile]
Old crash, not a huge volume, doesn't seem worth tracking for 55.
Priority: P3 → P5
I still could see it occurs on 56 and 57, give it a higher priority
Priority: P5 → P2
Depends on: 1401066
In the latest crash report, mCanary.check added by thomas in Bug 1401066 doesn't intercept the error[1].
So this is not caused by memory pollution in "nsUrlClassifierPrefixSet" class.
Also from the crash dump, I don't think the call stack is polluted.

The problem seems related to |mIndexPrefixes| in nsUrlClassifierPrefixSet[1].
I found all those crash dumps will get the same |indexSize| and |indexDeltaSize|[2] and by analyzing crash dump we
can know that this is not a bug of update algorithm.

Also the |fileSize| returned by |CalculatePreallocateSize|[3] is wrong for all crash dumps.
|CalculatePreallocateSize| function is quite simple it only invovles |mTotalPrefixes| & |mIndexPrefixes|.

Since mTotalPrefixes is just an integer & mIndexPrefixes is a nsTArray, so I'll guess mIndexPrefixes is corrupted.
(Maybe also mIndexDeltas, if everything is corrupted then mCanary.check added by thomas should find this issue).

The roughly update flow invovles mIndexPrefixes is:
1. nsUrlClassifierPrefixSet::MakePrefixSet (set mIndexPrefixes)
2. nsUrlClassifierPrefixSet::CalculatePreallocateSize (use mIndexPrefixes)
3. nsUrlClassifierPrefixSet::WritePrefixes (use mIndexPrefixes).

In step2 the value is already wrong, so the incorrect result may come from |MakePrefixSet|.
By checking nsUrlClassifierPrefixSet::MakePrefixSet[4], the only thing I can think of to cause this problem is |Compact|. At the end of MakePrefixSet we call Compact of mIndexDeltas and mIndexPrefixes to minimize memory usage,
maybe the memory rearrangement causes some kind of problem.

Hi francois,
I don't have any evidence to prove |Compact| causes this issue, it's just a guess after analyzing the crash dump.
I am not sure if its a good idea but since we already fully enable V4 in nightly, so V2 will only be used in tracking protection.
Maybe we can remove "Compact" in Nightly build, it shouldn't affect performance too much because most cases are V4.

Then we can keep monitoring nightly build to see if this may be the root cause.

[1] https://crash-stats.mozilla.com/report/index/f157ed44-cc87-41b1-8636-ec5bf0171001
[2] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.h#78
[3] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#503
[4] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#380
[5] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#90
Flags: needinfo?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #7)
> I don't have any evidence to prove |Compact| causes this issue, it's just a
> guess after analyzing the crash dump.
> I am not sure if its a good idea but since we already fully enable V4 in
> nightly, so V2 will only be used in tracking protection.

Well, we do have a few crash reports in 57 and 58, so we may still be able to see a difference.

> Maybe we can remove "Compact" in Nightly build, it shouldn't affect
> performance too much because most cases are V4.
> 
> Then we can keep monitoring nightly build to see if this may be the root
> cause.

That sounds like a good idea.

Thanks for the detailed investigation Dimi!
Flags: needinfo?(francois)
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Attachment #8917283 - Flags: review?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #7)
> I am not sure if its a good idea but since we already fully enable V4 in
> nightly, so V2 will only be used in tracking protection.
> Maybe we can remove "Compact" in Nightly build, it shouldn't affect
> performance too much because most cases are V4.

Hi Francois,
I was wrong about above comment, those crashes happened in fixed length prefix set which will be used in both V2 and V4.
So if we remove Compact from NIGHTLY build it will affect both V2 and V4.

I am not sure if it is still fine to do it, I will spend some more time to dig deeper.
Comment on attachment 8917283 [details]
Bug 1362761 : Remove Compact from NIGHTLY build for debug purpose.

https://reviewboard.mozilla.org/r/188296/#review193736
Attachment #8917283 - Flags: review+
I have spent some time to keep finding other possible root causes, just cannot find one.

Hi francois,
Based on Comment 10, this patch will affect both V2 and V4.
If you think it is still worth doing it, I'll land this patch.
Flags: needinfo?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #13)
> Based on Comment 10, this patch will affect both V2 and V4.
> If you think it is still worth doing it, I'll land this patch.

I think it would be worth checking whether or not compact() is the cause of the crashes, at least on Nightly.
Flags: needinfo?(francois)
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c70360f53ed9
Remove Compact from NIGHTLY build for debug purpose. r=francois
https://hg.mozilla.org/mozilla-central/rev/c70360f53ed9
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Reopen this bug since patch in Comment 15 is only for debug purpose.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Sadly this still happens after removing |Compact|[1]
I'll revert the patch landed and see what I can do...

[1] https://crash-stats.mozilla.com/report/index/e5d9df20-0d34-459f-9af5-19d340171028
Attached patch WIP Patch (obsolete) — Splinter Review
This WIP patch implements the checksum mechanism discussed in work week
Attachment #8917283 - Attachment is obsolete: true
Attachment #8940075 - Attachment is obsolete: true
Comment on attachment 8940563 [details]
Bug 1362761 - Add checksum to nsUrlClassifierPrefixSet::mIndexDeltas array

https://reviewboard.mozilla.org/r/210754/#review221106

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:515
(Diff revision 1)
> +  // in crash report to find out if this cause the crash.
> +  nsAutoCString checksum;
> +  CalculateTArrayChecksum(mIndexDeltas, checksum);
> +  bool isChecksumMatch = checksum == mIndexDeltasChecksum;
> +  if (!isChecksumMatch) {
> +    LOG(("The content of mIndexDeltas is changed!"));

Perhaps we should make it a full `NS_WARNING()` since it's quite bad if it happens.

Also, I'd suggest "The content of mIndexDeltas is corrupt!" to emphasize the badness of this state :)

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:588
(Diff revision 1)
> +  crypto = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv);
> +  if (NS_FAILED(rv)) {
> +    return rv;
> +  }
> +
> +  rv = crypto->Init(nsICryptoHash::MD5);

Instead of using MD5, we should probably use CRC32 which will be faster.

There's an implementation in mozilla-central:

https://searchfox.org/mozilla-central/rev/4611b9541894e90a421debb57ddbbcff55c2f369/other-licenses/bsdiff/bsdiff.c#46

that the layout code is already using:

https://searchfox.org/mozilla-central/rev/4611b9541894e90a421debb57ddbbcff55c2f369/layout/style/nsLayoutStylesheetCache.cpp#494
Attachment #8940563 - Flags: review-
Can be tested by looking for the isChecksumMatched variable in the stack trace of the minidump.

Dimi points out that his patch doesn't fix the problem, rather his intent was to first check whether or not we guessed right about the root cause.
Assignee: dimidiana → francois
Priority: P2 → P1
Keywords: leave-open
Whiteboard: #sbv4-m10
Attachment #8940563 - Attachment is obsolete: true
(In reply to François Marier [:francois] from comment #21)
> ::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:515
> (Diff revision 1)
> > +  // in crash report to find out if this cause the crash.
> > +  nsAutoCString checksum;
> > +  CalculateTArrayChecksum(mIndexDeltas, checksum);
> > +  bool isChecksumMatch = checksum == mIndexDeltasChecksum;
> > +  if (!isChecksumMatch) {
> > +    LOG(("The content of mIndexDeltas is changed!"));
> 
> Perhaps we should make it a full `NS_WARNING()` since it's quite bad if it
> happens.
> 
> Also, I'd suggest "The content of mIndexDeltas is corrupt!" to emphasize the
> badness of this state :)

I will make that change once I've confirmed that this theory was right, at the same time as returning an error instead of just logging the error.
Comment on attachment 8951480 [details]
Bug 1362761 - Add checksum to nsUrlClassifierPrefixSet::mIndexDeltas array.

https://reviewboard.mozilla.org/r/220784/#review227852

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:516
(Diff revision 1)
> +  // in crash report to find out if this cause the crash.
> +  uint32_t checksum;
> +  CalculateTArrayChecksum(mIndexDeltas, &checksum);
> +  bool isChecksumMatch = checksum == mIndexDeltasChecksum;
> +  if (!isChecksumMatch) {
> +    LOG(("The content of mIndexDeltas is changed!"));

If we expect to crash when this happens, what about a MOZ_RELEASE_ASSERT?

Seems handier than having to deal with minidumps?

Or if that's too severe, telemetry? (Maybe that won't catch the affected systems though)

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:588
(Diff revision 1)
> +
> +  for (uint32_t i = 0; i < aArray.Length(); i++) {
> +    void* hdr = reinterpret_cast<void*&>(aArray[i]);
> +    *outChecksum = ComputeCrc32c(*outChecksum,
> +                                 reinterpret_cast<uint8_t*>(&hdr),
> +                                 sizeof(void*));

Should the sizeof here be sizeof(T)?

The amount of data you are checksumming differs between 32-bit and 64-bit builds, it seems, whereas I doubt T is any different?
Attachment #8951480 - Flags: review?(gpascutto) → review-
Comment on attachment 8951481 [details]
Bug 1362761 - Improve logging in PrefixSet.

https://reviewboard.mozilla.org/r/220786/#review227856
Attachment #8951481 - Flags: review?(gpascutto) → review+
Comment on attachment 8951480 [details]
Bug 1362761 - Add checksum to nsUrlClassifierPrefixSet::mIndexDeltas array.

https://reviewboard.mozilla.org/r/220784/#review227852

> If we expect to crash when this happens, what about a MOZ_RELEASE_ASSERT?
> 
> Seems handier than having to deal with minidumps?
> 
> Or if that's too severe, telemetry? (Maybe that won't catch the affected systems though)

I put in a `MOZ_CRASH()`. You're right, since we suspect memory corruption, it's likely to be a bad day for the browser already. May as well make it easier to detect.

> Should the sizeof here be sizeof(T)?
> 
> The amount of data you are checksumming differs between 32-bit and 64-bit builds, it seems, whereas I doubt T is any different?

I changed the code to make it more obvious what we are doing. We're checksumming the memory address of each array element. Therefore yes, the pointer size will be different on 32-bit v. 64-bit. Does that make more sense or do you see something wrong with this approach?
Comment on attachment 8951480 [details]
Bug 1362761 - Add checksum to nsUrlClassifierPrefixSet::mIndexDeltas array.

https://reviewboard.mozilla.org/r/220784/#review229118

Okay, makes sense now realizing that mIndexDeltas is nested array.
Attachment #8951480 - Flags: review?(gpascutto) → review+
Pushed by fmarier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3a00711bb0e6
Add checksum to nsUrlClassifierPrefixSet::mIndexDeltas array. r=gcp
https://hg.mozilla.org/integration/autoland/rev/923a5ace946a
Improve logging in PrefixSet. r=gcp
See Also: → 1199617
MozReview-Commit-ID: DhhNOdrlkPF
Attachment #8991112 - Attachment is obsolete: true
This simplifies the logic around clearing the prefix set and also adds
the clearing of the mIndexDeltasChecksum which should have been done
as part of 3a00711bb0e6.

Additionally, the checks for whether or not the prefix set is empty
include some sanity-checking asserts.

Finally, mTotalPrefixes could be out of sync with mIndexPrefixes
and mIndexDeltas if LoadPrefixes() or MakePrefixSet() fail so we
now only update it once all elements have been added successfully.

There is now a release assert to catch grossly out-of-sync (or
corrupt) values of mTotalPrefixes.

MozReview-Commit-ID: BSbyD2dGsUY
Add assertions to highlight what the various data structures
should look like.

Also assert to ensure that mIndexPrefixes is always the same
length as mIndexDeltas and avoid writing the prefixes to disk
if that's not the case.

Do a single fallible allocation before we create the indexStarts
array instead of checking on each AppendElement() to simplify
the loop and emphasize the number of elements that the array
will receive (indexSize + 1). Remove the last element since we
don't actually end up writing it to disk.

MozReview-Commit-ID: HIg7ZmgaL7x
(In reply to Dimi Lee[:dimi][:dlee] from comment #7)
> Also the |fileSize| returned by |CalculatePreallocateSize|[3] is wrong for
> all crash dumps.
> |CalculatePreallocateSize| function is quite simple it only invovles
> |mTotalPrefixes| & |mIndexPrefixes|.
> 
> Since mTotalPrefixes is just an integer & mIndexPrefixes is a nsTArray, so
> I'll guess mIndexPrefixes is corrupted.
> (Maybe also mIndexDeltas, if everything is corrupted then mCanary.check
> added by thomas should find this issue).

I noticed that we set mTotalPrefixes early:

https://searchfox.org/mozilla-central/rev/93d2b9860b3d341258c7c5dcd4e278dea544432b/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#115

when we know we could exit this function with an NS_ERROR_OUT_OF_MEMORY.

nsUrlClassifierPrefixSet::SetPrefixes() is the only caller:
https://searchfox.org/mozilla-central/rev/93d2b9860b3d341258c7c5dcd4e278dea544432b/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#93

and so I added a Clear() when MakePrefixSet() fails.

Also, I think we're supposed to wipe the mIndexDeltasChecksum in Clear() too so I've added that.

I suspect the underlying problem has to do with data structures getting out-of-sync with one another and not getting cleared properly in some out-of-memory conditions.

If this patch series fixes the crashes, we should backout the checksumming patch.
Comment on attachment 8991116 [details]
Bug 1362761 - Add more specific warnings in case of file corruption. r?dimi

Dimi Lee[:dimi][:dlee] has approved the revision.

https://phabricator.services.mozilla.com/D2061
Attachment #8991116 - Flags: review+
Comment on attachment 8991119 [details]
Bug 1362761 - Make WritePrefixes() more readable. r?dimi

Dimi Lee[:dimi][:dlee] has approved the revision.

https://phabricator.services.mozilla.com/D2063
Attachment #8991119 - Flags: review+
Comment on attachment 8991117 [details]
Bug 1362761 - Safer Clean() and IsEmpty() handling in PrefixSet. r?dimi

Dimi Lee[:dimi][:dlee] has approved the revision.

https://phabricator.services.mozilla.com/D2062
Attachment #8991117 - Flags: review+
Comment on attachment 8991109 [details]
Bug 1362761 - Make prefix and chunk sets as const as possible. r?dimi

Dimi Lee[:dimi][:dlee] has approved the revision.

https://phabricator.services.mozilla.com/D2058
Attachment #8991109 - Flags: review+
Comment on attachment 8991115 [details]
Bug 1362761 - Force file and streams to use smart pointers. r?dimi

Dimi Lee[:dimi][:dlee] has approved the revision.

https://phabricator.services.mozilla.com/D2060
Attachment #8991115 - Flags: review+
Pushed by fmarier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/13866fb94708
Make prefix and chunk sets as const as possible. r=dimi
Pushed by fmarier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e1cb7dfc813e
Force file and streams to use smart pointers. r=dimi
Pushed by fmarier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cc89ebc5b5d8
Add more specific warnings in case of file corruption. r=dimi
Pushed by fmarier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6153cafd5002
Safer Clean() and IsEmpty() handling in PrefixSet. r=dimi
Pushed by fmarier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ec28801c69d7
Make WritePrefixes() more readable. r=dimi
This will hopefully be fixed by the changes that landed in comment 52 (Firefox 63).

If that's the case, we should backout the checksums added in comment 33 and then close this bug.
Assignee: francois → nobody
Status: ASSIGNED → NEW
Priority: P1 → P2
See Also: → 1510345

Adding 64 and 65 as affected per crash stats. For Mac crashes only, this is the #2 browser crash.

Base on the latest crash reports, I believe this bug is resolved after the fix in Bug 1542744!

Status: NEW → RESOLVED
Closed: 7 years ago5 years ago
Resolution: --- → FIXED
See Also: → 1542744
Assignee: nobody → francois
Assignee: francois → dlee
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: