Closed Bug 1378680 Opened 8 years ago Closed 8 years ago

Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | mozilla::safebrowsing::LookupCacheV2::ReadCompletions

Categories

(Toolkit :: Safe Browsing, defect, P1)

52 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- fixed

People

(Reporter: philipp, Assigned: tnguyen)

References

Details

(Keywords: crash, Whiteboard: #sbv4-m8)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is report bp-68018331-99e4-4edd-8a22-0d0ba0170706. ============================================================= Crashing Thread (28) Frame Module Signature Source 0 mozglue.dll mozalloc_abort(char const* const) memory/mozalloc/mozalloc_abort.cpp:33 1 mozglue.dll mozalloc_handle_oom(unsigned int) memory/mozalloc/mozalloc_oom.cpp:46 2 mozglue.dll moz_xrealloc memory/mozalloc/mozalloc.cpp:107 3 xul.dll nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned int, unsigned int) obj-firefox/dist/include/nsTArray-inl.h:183 4 xul.dll nsTArray_Impl<ColorStop, nsTArrayInfallibleAllocator>::AppendElement<ColorStop, nsTArrayInfallibleAllocator>(ColorStop&&) obj-firefox/dist/include/nsTArray.h:2128 5 xul.dll mozilla::safebrowsing::LookupCacheV2::ReadCompletions() toolkit/components/url-classifier/LookupCache.cpp:529 6 xul.dll mozilla::safebrowsing::LookupCacheV2::Open() toolkit/components/url-classifier/LookupCache.cpp:409 7 xul.dll mozilla::safebrowsing::Classifier::GetLookupCache(nsACString_internal const&, bool) toolkit/components/url-classifier/Classifier.cpp:1370 8 xul.dll mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate() toolkit/components/url-classifier/Classifier.cpp:1035 9 xul.dll mozilla::safebrowsing::Classifier::ApplyUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*) toolkit/components/url-classifier/Classifier.cpp:735 10 xul.dll nsUrlClassifierDBServiceWorker::ApplyUpdate() toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:677 11 xul.dll nsUrlClassifierDBServiceWorker::FinishUpdate() toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:618 12 xul.dll mozilla::detail::RunnableMethodImpl<nsIThread*, nsresult ( nsIThread::*)(void), 1, 0>::Run() xpcom/threads/nsThreadUtils.h:890 13 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1264 14 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:389 15 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:368 16 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 17 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 18 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:495 19 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 20 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 21 ucrtbase.dll _o__CIpow 22 kernel32.dll BaseThreadInitThunk 23 ntdll.dll __RtlUserThreadStart 24 ntdll.dll _RtlUserThreadStart crash reports with this signature started showing up since firefox 52, perhaps related to bug 1305801?
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #1) > That is oom error which will crash FireFox as default > http://searchfox.org/mozilla-central/rev/ > 5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/ > LookupCache.cpp#691 > unless we use AppendElemnt(falliable) then handle oom ourselves. That seems like a reasonable thing to do. In general, we try to "fail open" in these extreme cases (i.e. we give up and leave Safe Browsing in a non-working state but we don't crash). In this specific case, it looks like it will cause LookupCacheV2::Open() to return an error: http://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/LookupCache.cpp#550-560 which we already handle in Classifier::GetLookupCache(): http://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/Classifier.cpp#1466-1472
Priority: -- → P1
Whiteboard: #sbv4-m8
Assignee: nobody → tnguyen
I still have to scan our code to see where we should use infallible aloc. Also, many memory alocs are explicit infallible and we don't need to check null after that. For example http://searchfox.org/mozilla-central/rev/f1472e5b57fea8d4a7bbdd81c44e46958fe3c1ce/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#944
I scanned url-classifier code and decided to change in several places, to fallible allocator (some places we did check null pointer but it seems not right, the allocator is set as infallible by default. We may crash before going to the null check). This will help us fix couple of OOM crashes for example https://crash-stats.mozilla.com/report/index/557c749f-9f30-4e0d-ab78-9dbff0170710 or https://crash-stats.mozilla.com/report/index/676b06d8-1c49-460e-9071-24ba80170709 Could you please take a look?
Comment on attachment 8884733 [details] Bug 1378680 - Refactor usage fallible alloc in url-classifier https://reviewboard.mozilla.org/r/155576/#review160816 This is very thorough. Thanks Thomas! We should make sure to run all of the tests on this. It's touching a lot of code. ::: toolkit/components/url-classifier/Classifier.cpp:493 (Diff revision 1) > rv = cache->Has(lookupHash, &has, &matchLength, &confirmed); > NS_ENSURE_SUCCESS(rv, rv); > > if (has) { > - LookupResult *result = aResults.AppendElement(); > + LookupResult *result = aResults.AppendElement(fallible); > if (!result) nit: let's add braces around that if statement while we're at it. ::: toolkit/components/url-classifier/HashStore.cpp:204 (Diff revision 1) > } > > -void > +nsresult > TableUpdateV4::NewRemovalIndices(const uint32_t* aIndices, size_t aNumOfIndices) > { > + if (!mRemovalIndiceArray.SetCapacity(aNumOfIndices, fallible)) { Let's add a `MOZ_ASSERT` just before that to ensure that `mRemovalIndiceArray` is empty. The `SetCapacity()` assumes that this is true. ::: toolkit/components/url-classifier/HashStore.cpp:211 (Diff revision 1) > + } > + > for (size_t i = 0; i < aNumOfIndices; i++) { > mRemovalIndiceArray.AppendElement(aIndices[i]); > } > } This is missing a `return NS_OK` statement. ::: toolkit/components/url-classifier/HashStore.cpp:599 (Diff revision 1) > storeIter++; > } > // no match, add > if (storeIter == storeEnd > || storeIter->Compare(updatePrefix) != 0) { > - if (!adds.AppendElement(updatePrefix)) > + if (!adds.AppendElement(updatePrefix, fallible)) nit: braces around the `if` block. ::: toolkit/components/url-classifier/LookupCache.cpp:426 (Diff revision 1) > > const nsACString& host = Substring(begin, iter); > > if (IsCanonicalizedIP(host)) { > - nsCString *key = aHostKeys->AppendElement(); > + nsCString *key = aHostKeys->AppendElement(fallible); > if (!key) nit: braces ::: toolkit/components/url-classifier/LookupCache.cpp:445 (Diff revision 1) > } > > // First check with two domain components > int32_t last = int32_t(hostComponents.Length()) - 1; > - nsCString *lookupHost = aHostKeys->AppendElement(); > + nsCString *lookupHost = aHostKeys->AppendElement(fallible); > if (!lookupHost) nit: braces ::: toolkit/components/url-classifier/LookupCache.cpp:456 (Diff revision 1) > lookupHost->Append("/"); > > // Now check with three domain components > if (hostComponents.Length() > 2) { > - nsCString *lookupHost2 = aHostKeys->AppendElement(); > + nsCString *lookupHost2 = aHostKeys->AppendElement(fallible); > if (!lookupHost2) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:324 (Diff revision 1) > aCount, &noiseEntries); > NS_ENSURE_SUCCESS(rv, rv); > > for (uint32_t i = 0; i < noiseEntries.Length(); i++) { > - LookupResult *result = results.AppendElement(); > + LookupResult *result = results.AppendElement(fallible); > if (!result) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:463 (Diff revision 1) > > - mProtocolParser = (useProtobuf ? static_cast<ProtocolParser*>(new ProtocolParserProtobuf()) > - : static_cast<ProtocolParser*>(new ProtocolParserV2())); > + mProtocolParser = (useProtobuf ? static_cast<ProtocolParser*>(new (fallible) > + ProtocolParserProtobuf()) > + : static_cast<ProtocolParser*>(new (fallible) > + ProtocolParserV2())); > if (!mProtocolParser) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1345 (Diff revision 1) > auto cacheResult = new CacheResultV2; > > cacheResult->table = result.mTableName; > cacheResult->prefix = result.hash.fixedLengthPrefix; > cacheResult->miss = true; > - mCacheResults->AppendElement(cacheResult); > + // Ok if this fails, just don't cache Because this is the negative cache time, we probably want to return an error. Otherwise, we could end up putting too much load on the server by checking the same prefix over and over again. ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1645 (Diff revision 1) > rv = NS_NewNamedThread("URL Classifier", &gDbBackgroundThread); > if (NS_FAILED(rv)) > return rv; > > - mWorker = new nsUrlClassifierDBServiceWorker(); > + mWorker = new (fallible) nsUrlClassifierDBServiceWorker(); > if (!mWorker) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:2001 (Diff revision 1) > // Create an nsUrlClassifierLookupCallback object. This object will > // take care of confirming partial hash matches if necessary before > // calling the client's callback. > nsCOMPtr<nsIUrlClassifierLookupCallback> callback = > - new nsUrlClassifierLookupCallback(this, c); > + new (fallible) nsUrlClassifierLookupCallback(this, c); > if (!callback) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp:388 (Diff revision 1) > const nsACString &aTable) > { > LOG(("Queuing requested update from %s\n", PromiseFlatCString(aUrl).get())); > > - PendingUpdate *update = mPendingUpdates.AppendElement(); > + PendingUpdate *update = mPendingUpdates.AppendElement(fallible); > if (!update) nit: braces
Attachment #8884733 - Flags: review?(francois) → review-
Status: NEW → ASSIGNED
Comment on attachment 8884733 [details] Bug 1378680 - Refactor usage fallible alloc in url-classifier https://reviewboard.mozilla.org/r/155576/#review163084 The patch looks fine, but I want to ask GCP before we land this. ::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:434 (Diff revisions 1 - 2) > } > > nsTArray<uint32_t> indexStarts; > if (!indexStarts.SetLength(indexSize, fallible) || > !mIndexPrefixes.SetLength(indexSize, fallible) || > - mIndexDeltas.SetLength(indexSize, fallible)) { > + !mIndexDeltas.SetLength(indexSize, fallible)) { Good thing you caught that!
Attachment #8884733 - Flags: review?(francois) → review+
GCP, I remember you saying that making things fallible has the downside of potentially hiding problematic allocations that we should chunk and having them silently fail instead of generating a crash report we can investigate. Do you think we're shooting ourselves in the foot with this patch? Should we only fix the one problematic allocation that's the cause of this bug instead?
Flags: needinfo?(gpascutto)
Looking at some crash reports, this is indeed what is happening here: many crashes show systems with plenty of free memory, but with a long uptime, and we're trying to allocate a 512KB sized chunk into a very fragmented heap. You're prescribing painkillers where surgery is more appropriate.
Flags: needinfo?(gpascutto)
That's not to say not crashing the user isn't a good thing, by the way. The issue of keeping track of these kind of large allocations is something to bring up with platform dev - crashing the user is far from an ideal way to find bugs like this.
So for this particular bug, how about this? 1. Look at the actual crash and determine whether or not that allocation can be broken into smaller pieces. 2. Make all very small allocations (or those which cannot realistically be broken up) fallible since we wouldn't be able to do anything about them.
Flags: needinfo?(gpascutto)
Yes, to (1). I am not sure that we need to bother about (2). If very small allocations start failing, a crash is likely imminent and Firefox is unusable by that time anyway. If you make the small allocations fallible, you need to make sure that the failure is correctly handled in all places. That's one reason why we have infallible allocations - crashing may be more secure than an incorrectly handled error condition. (A SafeBrowsing bypass could be a JS program that manages to fragment the Firefox heap appropriately, for example.) In the context of SafeBrowsing, I think failing to update due to fallible malloc failing is fine, but making SafeBrowsing itself not work, likely isn't.
Flags: needinfo?(gpascutto)
Comment on attachment 8884733 [details] Bug 1378680 - Refactor usage fallible alloc in url-classifier https://reviewboard.mozilla.org/r/155576/#review163988 Apologies for not thinking of this earlier Thomas. I think we should take the approach suggested by GCP in comment 16 instead.
Attachment #8884733 - Flags: review+ → review-
Thanks gcp, francois, Rescanned and removed unnecessary fallible things. Mostly I did change in update and make the possibly large allocations to be fallible. And refactor some places that we did handle oom, but used explicit infallible allocation.
Comment on attachment 8884733 [details] Bug 1378680 - Refactor usage fallible alloc in url-classifier https://reviewboard.mozilla.org/r/155576/#review166530
Attachment #8884733 - Flags: review?(francois) → review+
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/84e29e660f0c Refactor usage fallible alloc in url-classifier r=francois
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Low enough volume that it's probably not worth backporting. Feel free to set the status back to affected and request approval if you feel otherwise.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: