Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | mozilla::safebrowsing::LookupCacheV2::ReadCompletions

RESOLVED FIXED in Firefox 56

Status

()

Toolkit
Safe Browsing
P1
critical
RESOLVED FIXED
4 months ago
3 months ago

People

(Reporter: philipp, Assigned: tnguyen)

Tracking

({crash})

52 Branch
mozilla56
crash
Points:
---

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox54 wontfix, firefox55 wontfix, firefox56 fixed)

Details

(Whiteboard: #sbv4-m8, crash signature)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(2 attachments)

(Reporter)

Description

4 months ago
This bug was filed from the Socorro interface and is 
report bp-68018331-99e4-4edd-8a22-0d0ba0170706.
=============================================================
Crashing Thread (28)
Frame 	Module 	Signature 	Source
0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp:33
1 	mozglue.dll 	mozalloc_handle_oom(unsigned int) 	memory/mozalloc/mozalloc_oom.cpp:46
2 	mozglue.dll 	moz_xrealloc 	memory/mozalloc/mozalloc.cpp:107
3 	xul.dll 	nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned int, unsigned int) 	obj-firefox/dist/include/nsTArray-inl.h:183
4 	xul.dll 	nsTArray_Impl<ColorStop, nsTArrayInfallibleAllocator>::AppendElement<ColorStop, nsTArrayInfallibleAllocator>(ColorStop&&) 	obj-firefox/dist/include/nsTArray.h:2128
5 	xul.dll 	mozilla::safebrowsing::LookupCacheV2::ReadCompletions() 	toolkit/components/url-classifier/LookupCache.cpp:529
6 	xul.dll 	mozilla::safebrowsing::LookupCacheV2::Open() 	toolkit/components/url-classifier/LookupCache.cpp:409
7 	xul.dll 	mozilla::safebrowsing::Classifier::GetLookupCache(nsACString_internal const&, bool) 	toolkit/components/url-classifier/Classifier.cpp:1370
8 	xul.dll 	mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate() 	toolkit/components/url-classifier/Classifier.cpp:1035
9 	xul.dll 	mozilla::safebrowsing::Classifier::ApplyUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*) 	toolkit/components/url-classifier/Classifier.cpp:735
10 	xul.dll 	nsUrlClassifierDBServiceWorker::ApplyUpdate() 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:677
11 	xul.dll 	nsUrlClassifierDBServiceWorker::FinishUpdate() 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:618
12 	xul.dll 	mozilla::detail::RunnableMethodImpl<nsIThread*, nsresult ( nsIThread::*)(void), 1, 0>::Run() 	xpcom/threads/nsThreadUtils.h:890
13 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1264
14 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/threads/nsThreadUtils.cpp:389
15 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:368
16 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
17 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
18 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:495
19 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
20 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
21 	ucrtbase.dll 	_o__CIpow 	
22 	kernel32.dll 	BaseThreadInitThunk 	
23 	ntdll.dll 	__RtlUserThreadStart 	
24 	ntdll.dll 	_RtlUserThreadStart

crash reports with this signature started showing up since firefox 52, perhaps related to bug 1305801?
(Assignee)

Comment 1

4 months ago
That is oom error which will crash FireFox as default
http://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/LookupCache.cpp#691
unless we use AppendElemnt(falliable) then handle oom ourselves.
I doubt that the memory was exhausted for some reasons and FF would not work anymore. I think a crash is accepted in this case
See 
https://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/memory/mozalloc/mozalloc_oom.cpp#24,54
https://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/memory/mozalloc/mozalloc_abort.cpp#22,33
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #1)
> That is oom error which will crash FireFox as default
> http://searchfox.org/mozilla-central/rev/
> 5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/
> LookupCache.cpp#691
> unless we use AppendElemnt(falliable) then handle oom ourselves.

That seems like a reasonable thing to do.

In general, we try to "fail open" in these extreme cases (i.e. we give up and leave Safe Browsing in a non-working state but we don't crash).

In this specific case, it looks like it will cause LookupCacheV2::Open() to return an error:

http://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/LookupCache.cpp#550-560

which we already handle in Classifier::GetLookupCache():

http://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/Classifier.cpp#1466-1472
Priority: -- → P1
Whiteboard: #sbv4-m8
(Assignee)

Updated

4 months ago
Assignee: nobody → tnguyen
(Assignee)

Comment 3

4 months ago
Created attachment 8884222 [details] [diff] [review]
WIP - should use fallible array in url-classifier
(Assignee)

Comment 4

4 months ago
I still have to scan our code to see where we should use infallible aloc. Also, many memory alocs are explicit infallible and we don't need to check null after that.
For example 
http://searchfox.org/mozilla-central/rev/f1472e5b57fea8d4a7bbdd81c44e46958fe3c1ce/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#944
Comment hidden (mozreview-request)
(Assignee)

Comment 6

4 months ago
I scanned url-classifier code and decided to change in several places, to fallible allocator (some places we did check null pointer but it seems not right, the allocator is set as infallible by default. We may crash before going to the null check).
This will help us fix couple of OOM crashes for example 
https://crash-stats.mozilla.com/report/index/557c749f-9f30-4e0d-ab78-9dbff0170710
or 
https://crash-stats.mozilla.com/report/index/676b06d8-1c49-460e-9071-24ba80170709

Could you please take a look?

Comment 7

4 months ago
mozreview-review
Comment on attachment 8884733 [details]
Bug 1378680 - Refactor usage fallible alloc in url-classifier

https://reviewboard.mozilla.org/r/155576/#review160816

This is very thorough. Thanks Thomas!

We should make sure to run all of the tests on this. It's touching a lot of code.

::: toolkit/components/url-classifier/Classifier.cpp:493
(Diff revision 1)
>        rv = cache->Has(lookupHash, &has, &matchLength, &confirmed);
>        NS_ENSURE_SUCCESS(rv, rv);
>  
>        if (has) {
> -        LookupResult *result = aResults.AppendElement();
> +        LookupResult *result = aResults.AppendElement(fallible);
>          if (!result)

nit: let's add braces around that if statement while we're at it.

::: toolkit/components/url-classifier/HashStore.cpp:204
(Diff revision 1)
>  }
>  
> -void
> +nsresult
>  TableUpdateV4::NewRemovalIndices(const uint32_t* aIndices, size_t aNumOfIndices)
>  {
> +  if (!mRemovalIndiceArray.SetCapacity(aNumOfIndices, fallible)) {

Let's add a `MOZ_ASSERT` just before that to ensure that `mRemovalIndiceArray` is empty.

The `SetCapacity()` assumes that this is true.

::: toolkit/components/url-classifier/HashStore.cpp:211
(Diff revision 1)
> +  }
> +
>    for (size_t i = 0; i < aNumOfIndices; i++) {
>      mRemovalIndiceArray.AppendElement(aIndices[i]);
>    }
>  }

This is missing a `return NS_OK` statement.

::: toolkit/components/url-classifier/HashStore.cpp:599
(Diff revision 1)
>        storeIter++;
>      }
>      // no match, add
>      if (storeIter == storeEnd
>          || storeIter->Compare(updatePrefix) != 0) {
> -      if (!adds.AppendElement(updatePrefix))
> +      if (!adds.AppendElement(updatePrefix, fallible))

nit: braces around the `if` block.

::: toolkit/components/url-classifier/LookupCache.cpp:426
(Diff revision 1)
>  
>    const nsACString& host = Substring(begin, iter);
>  
>    if (IsCanonicalizedIP(host)) {
> -    nsCString *key = aHostKeys->AppendElement();
> +    nsCString *key = aHostKeys->AppendElement(fallible);
>      if (!key)

nit: braces

::: toolkit/components/url-classifier/LookupCache.cpp:445
(Diff revision 1)
>    }
>  
>    // First check with two domain components
>    int32_t last = int32_t(hostComponents.Length()) - 1;
> -  nsCString *lookupHost = aHostKeys->AppendElement();
> +  nsCString *lookupHost = aHostKeys->AppendElement(fallible);
>    if (!lookupHost)

nit: braces

::: toolkit/components/url-classifier/LookupCache.cpp:456
(Diff revision 1)
>    lookupHost->Append("/");
>  
>    // Now check with three domain components
>    if (hostComponents.Length() > 2) {
> -    nsCString *lookupHost2 = aHostKeys->AppendElement();
> +    nsCString *lookupHost2 = aHostKeys->AppendElement(fallible);
>      if (!lookupHost2)

nit: braces

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:324
(Diff revision 1)
>                                                aCount, &noiseEntries);
>    NS_ENSURE_SUCCESS(rv, rv);
>  
>    for (uint32_t i = 0; i < noiseEntries.Length(); i++) {
> -    LookupResult *result = results.AppendElement();
> +    LookupResult *result = results.AppendElement(fallible);
>      if (!result)

nit: braces

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:463
(Diff revision 1)
>  
> -  mProtocolParser = (useProtobuf ? static_cast<ProtocolParser*>(new ProtocolParserProtobuf())
> -                                 : static_cast<ProtocolParser*>(new ProtocolParserV2()));
> +  mProtocolParser = (useProtobuf ? static_cast<ProtocolParser*>(new (fallible)
> +                                     ProtocolParserProtobuf())
> +                                 : static_cast<ProtocolParser*>(new (fallible)
> +                                     ProtocolParserV2()));
>    if (!mProtocolParser)

nit: braces

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1345
(Diff revision 1)
>      auto cacheResult = new CacheResultV2;
>  
>      cacheResult->table = result.mTableName;
>      cacheResult->prefix = result.hash.fixedLengthPrefix;
>      cacheResult->miss = true;
> -    mCacheResults->AppendElement(cacheResult);
> +    // Ok if this fails, just don't cache

Because this is the negative cache time, we probably want to return an error.

Otherwise, we could end up putting too much load on the server by checking the same prefix over and over again.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1645
(Diff revision 1)
>    rv = NS_NewNamedThread("URL Classifier", &gDbBackgroundThread);
>    if (NS_FAILED(rv))
>      return rv;
>  
> -  mWorker = new nsUrlClassifierDBServiceWorker();
> +  mWorker = new (fallible) nsUrlClassifierDBServiceWorker();
>    if (!mWorker)

nit: braces

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:2001
(Diff revision 1)
>    // Create an nsUrlClassifierLookupCallback object.  This object will
>    // take care of confirming partial hash matches if necessary before
>    // calling the client's callback.
>    nsCOMPtr<nsIUrlClassifierLookupCallback> callback =
> -    new nsUrlClassifierLookupCallback(this, c);
> +    new (fallible) nsUrlClassifierLookupCallback(this, c);
>    if (!callback)

nit: braces

::: toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp:388
(Diff revision 1)
>                                                   const nsACString &aTable)
>  {
>    LOG(("Queuing requested update from %s\n", PromiseFlatCString(aUrl).get()));
>  
> -  PendingUpdate *update = mPendingUpdates.AppendElement();
> +  PendingUpdate *update = mPendingUpdates.AppendElement(fallible);
>    if (!update)

nit: braces
Attachment #8884733 - Flags: review?(francois) → review-
(Assignee)

Updated

4 months ago
Status: NEW → ASSIGNED
(Assignee)

Comment 8

4 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4f6168229dab332a85d0e959f8bd2f679
Comment hidden (mozreview-request)
(Assignee)

Comment 10

3 months ago
Try result
https://treeherder.mozilla.org/#/jobs?repo=try&revision=82a7e705c51a5beb574299eb0eb7515ad3b24dfc&selectedJob=113533847

Comment 11

3 months ago
mozreview-review
Comment on attachment 8884733 [details]
Bug 1378680 - Refactor usage fallible alloc in url-classifier

https://reviewboard.mozilla.org/r/155576/#review163084

The patch looks fine, but I want to ask GCP before we land this.

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:434
(Diff revisions 1 - 2)
>      }
>  
>      nsTArray<uint32_t> indexStarts;
>      if (!indexStarts.SetLength(indexSize, fallible) ||
>          !mIndexPrefixes.SetLength(indexSize, fallible) ||
> -        mIndexDeltas.SetLength(indexSize, fallible)) {
> +        !mIndexDeltas.SetLength(indexSize, fallible)) {

Good thing you caught that!
Attachment #8884733 - Flags: review?(francois) → review+
GCP, I remember you saying that making things fallible has the downside of potentially hiding problematic allocations that we should chunk and having them silently fail instead of generating a crash report we can investigate.

Do you think we're shooting ourselves in the foot with this patch? Should we only fix the one problematic allocation that's the cause of this bug instead?
Flags: needinfo?(gpascutto)
Looking at some crash reports, this is indeed what is happening here: many crashes show systems with plenty of free memory, but with a long uptime, and we're trying to allocate a 512KB sized chunk into a very fragmented heap.

You're prescribing painkillers where surgery is more appropriate.
Flags: needinfo?(gpascutto)
That's not to say not crashing the user isn't a good thing, by the way. The issue of keeping track of these kind of large allocations is something to bring up with platform dev - crashing the user is far from an ideal way to find bugs like this.
So for this particular bug, how about this?

1. Look at the actual crash and determine whether or not that allocation can be broken into smaller pieces.
2. Make all very small allocations (or those which cannot realistically be broken up) fallible since we wouldn't be able to do anything about them.
Flags: needinfo?(gpascutto)
Yes, to (1). I am not sure that we need to bother about (2). If very small allocations start failing, a crash is likely imminent and Firefox is unusable by that time anyway.

If you make the small allocations fallible, you need to make sure that the failure is correctly handled in all places. That's one reason why we have infallible allocations - crashing may be more secure than an incorrectly handled error condition. (A SafeBrowsing bypass could be a JS program that manages to fragment the Firefox heap appropriately, for example.)

In the context of SafeBrowsing, I think failing to update due to fallible malloc failing is fine, but making SafeBrowsing itself not work, likely isn't.
Flags: needinfo?(gpascutto)

Comment 17

3 months ago
mozreview-review
Comment on attachment 8884733 [details]
Bug 1378680 - Refactor usage fallible alloc in url-classifier

https://reviewboard.mozilla.org/r/155576/#review163988

Apologies for not thinking of this earlier Thomas.

I think we should take the approach suggested by GCP in comment 16 instead.
Attachment #8884733 - Flags: review+ → review-
Comment hidden (typo)
(Assignee)

Comment 19

3 months ago
Thanks gcp, francois,
Rescanned and removed unnecessary fallible things. Mostly I did change in update and make the possibly large allocations to be fallible.
And refactor some places that we did handle oom, but used explicit infallible allocation.
Comment hidden (mozreview-request)

Comment 21

3 months ago
mozreview-review
Comment on attachment 8884733 [details]
Bug 1378680 - Refactor usage fallible alloc in url-classifier

https://reviewboard.mozilla.org/r/155576/#review166530
Attachment #8884733 - Flags: review?(francois) → review+
(Assignee)

Comment 22

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f5774a3729bce349ee9348e52f11d06e68852eb8&selectedJob=117815807
Keywords: checkin-needed

Comment 23

3 months ago
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/84e29e660f0c
Refactor usage fallible alloc in url-classifier r=francois
Keywords: checkin-needed

Comment 24

3 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/84e29e660f0c
Status: ASSIGNED → RESOLVED
Last Resolved: 3 months ago
status-firefox56: ? → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Low enough volume that it's probably not worth backporting. Feel free to set the status back to affected and request approval if you feel otherwise.
status-firefox54: affected → wontfix
status-firefox55: affected → wontfix
status-firefox-esr52: affected → wontfix
You need to log in before you can comment on or make changes to this bug.