Closed
Bug 1378680
Opened 7 years ago
Closed 7 years ago
Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | mozilla::safebrowsing::LookupCacheV2::ReadCompletions
Categories
(Toolkit :: Safe Browsing, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla56
People
(Reporter: philipp, Assigned: tnguyen)
References
Details
(Keywords: crash, Whiteboard: #sbv4-m8)
Crash Data
Attachments
(2 files)
8.60 KB,
patch
|
Details | Diff | Splinter Review | |
59 bytes,
text/x-review-board-request
|
francois
:
review+
|
Details |
This bug was filed from the Socorro interface and is report bp-68018331-99e4-4edd-8a22-0d0ba0170706. ============================================================= Crashing Thread (28) Frame Module Signature Source 0 mozglue.dll mozalloc_abort(char const* const) memory/mozalloc/mozalloc_abort.cpp:33 1 mozglue.dll mozalloc_handle_oom(unsigned int) memory/mozalloc/mozalloc_oom.cpp:46 2 mozglue.dll moz_xrealloc memory/mozalloc/mozalloc.cpp:107 3 xul.dll nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned int, unsigned int) obj-firefox/dist/include/nsTArray-inl.h:183 4 xul.dll nsTArray_Impl<ColorStop, nsTArrayInfallibleAllocator>::AppendElement<ColorStop, nsTArrayInfallibleAllocator>(ColorStop&&) obj-firefox/dist/include/nsTArray.h:2128 5 xul.dll mozilla::safebrowsing::LookupCacheV2::ReadCompletions() toolkit/components/url-classifier/LookupCache.cpp:529 6 xul.dll mozilla::safebrowsing::LookupCacheV2::Open() toolkit/components/url-classifier/LookupCache.cpp:409 7 xul.dll mozilla::safebrowsing::Classifier::GetLookupCache(nsACString_internal const&, bool) toolkit/components/url-classifier/Classifier.cpp:1370 8 xul.dll mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate() toolkit/components/url-classifier/Classifier.cpp:1035 9 xul.dll mozilla::safebrowsing::Classifier::ApplyUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*) toolkit/components/url-classifier/Classifier.cpp:735 10 xul.dll nsUrlClassifierDBServiceWorker::ApplyUpdate() toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:677 11 xul.dll nsUrlClassifierDBServiceWorker::FinishUpdate() toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:618 12 xul.dll mozilla::detail::RunnableMethodImpl<nsIThread*, nsresult ( nsIThread::*)(void), 1, 0>::Run() xpcom/threads/nsThreadUtils.h:890 13 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1264 14 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:389 15 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:368 16 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 17 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 18 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:495 19 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 20 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 21 ucrtbase.dll _o__CIpow 22 kernel32.dll BaseThreadInitThunk 23 ntdll.dll __RtlUserThreadStart 24 ntdll.dll _RtlUserThreadStart crash reports with this signature started showing up since firefox 52, perhaps related to bug 1305801?
Assignee | ||
Comment 1•7 years ago
|
||
That is oom error which will crash FireFox as default http://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/LookupCache.cpp#691 unless we use AppendElemnt(falliable) then handle oom ourselves. I doubt that the memory was exhausted for some reasons and FF would not work anymore. I think a crash is accepted in this case See https://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/memory/mozalloc/mozalloc_oom.cpp#24,54 https://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/memory/mozalloc/mozalloc_abort.cpp#22,33
Comment 2•7 years ago
|
||
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #1) > That is oom error which will crash FireFox as default > http://searchfox.org/mozilla-central/rev/ > 5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/ > LookupCache.cpp#691 > unless we use AppendElemnt(falliable) then handle oom ourselves. That seems like a reasonable thing to do. In general, we try to "fail open" in these extreme cases (i.e. we give up and leave Safe Browsing in a non-working state but we don't crash). In this specific case, it looks like it will cause LookupCacheV2::Open() to return an error: http://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/LookupCache.cpp#550-560 which we already handle in Classifier::GetLookupCache(): http://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/toolkit/components/url-classifier/Classifier.cpp#1466-1472
Priority: -- → P1
Whiteboard: #sbv4-m8
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → tnguyen
Assignee | ||
Comment 3•7 years ago
|
||
Assignee | ||
Comment 4•7 years ago
|
||
I still have to scan our code to see where we should use infallible aloc. Also, many memory alocs are explicit infallible and we don't need to check null after that. For example http://searchfox.org/mozilla-central/rev/f1472e5b57fea8d4a7bbdd81c44e46958fe3c1ce/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#944
Comment hidden (mozreview-request) |
Assignee | ||
Comment 6•7 years ago
|
||
I scanned url-classifier code and decided to change in several places, to fallible allocator (some places we did check null pointer but it seems not right, the allocator is set as infallible by default. We may crash before going to the null check). This will help us fix couple of OOM crashes for example https://crash-stats.mozilla.com/report/index/557c749f-9f30-4e0d-ab78-9dbff0170710 or https://crash-stats.mozilla.com/report/index/676b06d8-1c49-460e-9071-24ba80170709 Could you please take a look?
Comment 7•7 years ago
|
||
mozreview-review |
Comment on attachment 8884733 [details] Bug 1378680 - Refactor usage fallible alloc in url-classifier https://reviewboard.mozilla.org/r/155576/#review160816 This is very thorough. Thanks Thomas! We should make sure to run all of the tests on this. It's touching a lot of code. ::: toolkit/components/url-classifier/Classifier.cpp:493 (Diff revision 1) > rv = cache->Has(lookupHash, &has, &matchLength, &confirmed); > NS_ENSURE_SUCCESS(rv, rv); > > if (has) { > - LookupResult *result = aResults.AppendElement(); > + LookupResult *result = aResults.AppendElement(fallible); > if (!result) nit: let's add braces around that if statement while we're at it. ::: toolkit/components/url-classifier/HashStore.cpp:204 (Diff revision 1) > } > > -void > +nsresult > TableUpdateV4::NewRemovalIndices(const uint32_t* aIndices, size_t aNumOfIndices) > { > + if (!mRemovalIndiceArray.SetCapacity(aNumOfIndices, fallible)) { Let's add a `MOZ_ASSERT` just before that to ensure that `mRemovalIndiceArray` is empty. The `SetCapacity()` assumes that this is true. ::: toolkit/components/url-classifier/HashStore.cpp:211 (Diff revision 1) > + } > + > for (size_t i = 0; i < aNumOfIndices; i++) { > mRemovalIndiceArray.AppendElement(aIndices[i]); > } > } This is missing a `return NS_OK` statement. ::: toolkit/components/url-classifier/HashStore.cpp:599 (Diff revision 1) > storeIter++; > } > // no match, add > if (storeIter == storeEnd > || storeIter->Compare(updatePrefix) != 0) { > - if (!adds.AppendElement(updatePrefix)) > + if (!adds.AppendElement(updatePrefix, fallible)) nit: braces around the `if` block. ::: toolkit/components/url-classifier/LookupCache.cpp:426 (Diff revision 1) > > const nsACString& host = Substring(begin, iter); > > if (IsCanonicalizedIP(host)) { > - nsCString *key = aHostKeys->AppendElement(); > + nsCString *key = aHostKeys->AppendElement(fallible); > if (!key) nit: braces ::: toolkit/components/url-classifier/LookupCache.cpp:445 (Diff revision 1) > } > > // First check with two domain components > int32_t last = int32_t(hostComponents.Length()) - 1; > - nsCString *lookupHost = aHostKeys->AppendElement(); > + nsCString *lookupHost = aHostKeys->AppendElement(fallible); > if (!lookupHost) nit: braces ::: toolkit/components/url-classifier/LookupCache.cpp:456 (Diff revision 1) > lookupHost->Append("/"); > > // Now check with three domain components > if (hostComponents.Length() > 2) { > - nsCString *lookupHost2 = aHostKeys->AppendElement(); > + nsCString *lookupHost2 = aHostKeys->AppendElement(fallible); > if (!lookupHost2) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:324 (Diff revision 1) > aCount, &noiseEntries); > NS_ENSURE_SUCCESS(rv, rv); > > for (uint32_t i = 0; i < noiseEntries.Length(); i++) { > - LookupResult *result = results.AppendElement(); > + LookupResult *result = results.AppendElement(fallible); > if (!result) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:463 (Diff revision 1) > > - mProtocolParser = (useProtobuf ? static_cast<ProtocolParser*>(new ProtocolParserProtobuf()) > - : static_cast<ProtocolParser*>(new ProtocolParserV2())); > + mProtocolParser = (useProtobuf ? static_cast<ProtocolParser*>(new (fallible) > + ProtocolParserProtobuf()) > + : static_cast<ProtocolParser*>(new (fallible) > + ProtocolParserV2())); > if (!mProtocolParser) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1345 (Diff revision 1) > auto cacheResult = new CacheResultV2; > > cacheResult->table = result.mTableName; > cacheResult->prefix = result.hash.fixedLengthPrefix; > cacheResult->miss = true; > - mCacheResults->AppendElement(cacheResult); > + // Ok if this fails, just don't cache Because this is the negative cache time, we probably want to return an error. Otherwise, we could end up putting too much load on the server by checking the same prefix over and over again. ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1645 (Diff revision 1) > rv = NS_NewNamedThread("URL Classifier", &gDbBackgroundThread); > if (NS_FAILED(rv)) > return rv; > > - mWorker = new nsUrlClassifierDBServiceWorker(); > + mWorker = new (fallible) nsUrlClassifierDBServiceWorker(); > if (!mWorker) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:2001 (Diff revision 1) > // Create an nsUrlClassifierLookupCallback object. This object will > // take care of confirming partial hash matches if necessary before > // calling the client's callback. > nsCOMPtr<nsIUrlClassifierLookupCallback> callback = > - new nsUrlClassifierLookupCallback(this, c); > + new (fallible) nsUrlClassifierLookupCallback(this, c); > if (!callback) nit: braces ::: toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp:388 (Diff revision 1) > const nsACString &aTable) > { > LOG(("Queuing requested update from %s\n", PromiseFlatCString(aUrl).get())); > > - PendingUpdate *update = mPendingUpdates.AppendElement(); > + PendingUpdate *update = mPendingUpdates.AppendElement(fallible); > if (!update) nit: braces
Attachment #8884733 -
Flags: review?(francois) → review-
Assignee | ||
Updated•7 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 8•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4f6168229dab332a85d0e959f8bd2f679
Comment hidden (mozreview-request) |
Assignee | ||
Comment 10•7 years ago
|
||
Try result https://treeherder.mozilla.org/#/jobs?repo=try&revision=82a7e705c51a5beb574299eb0eb7515ad3b24dfc&selectedJob=113533847
Comment 11•7 years ago
|
||
mozreview-review |
Comment on attachment 8884733 [details] Bug 1378680 - Refactor usage fallible alloc in url-classifier https://reviewboard.mozilla.org/r/155576/#review163084 The patch looks fine, but I want to ask GCP before we land this. ::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:434 (Diff revisions 1 - 2) > } > > nsTArray<uint32_t> indexStarts; > if (!indexStarts.SetLength(indexSize, fallible) || > !mIndexPrefixes.SetLength(indexSize, fallible) || > - mIndexDeltas.SetLength(indexSize, fallible)) { > + !mIndexDeltas.SetLength(indexSize, fallible)) { Good thing you caught that!
Attachment #8884733 -
Flags: review?(francois) → review+
Comment 12•7 years ago
|
||
GCP, I remember you saying that making things fallible has the downside of potentially hiding problematic allocations that we should chunk and having them silently fail instead of generating a crash report we can investigate. Do you think we're shooting ourselves in the foot with this patch? Should we only fix the one problematic allocation that's the cause of this bug instead?
Flags: needinfo?(gpascutto)
Comment 13•7 years ago
|
||
Looking at some crash reports, this is indeed what is happening here: many crashes show systems with plenty of free memory, but with a long uptime, and we're trying to allocate a 512KB sized chunk into a very fragmented heap. You're prescribing painkillers where surgery is more appropriate.
Flags: needinfo?(gpascutto)
Comment 14•7 years ago
|
||
That's not to say not crashing the user isn't a good thing, by the way. The issue of keeping track of these kind of large allocations is something to bring up with platform dev - crashing the user is far from an ideal way to find bugs like this.
Comment 15•7 years ago
|
||
So for this particular bug, how about this? 1. Look at the actual crash and determine whether or not that allocation can be broken into smaller pieces. 2. Make all very small allocations (or those which cannot realistically be broken up) fallible since we wouldn't be able to do anything about them.
Flags: needinfo?(gpascutto)
Comment 16•7 years ago
|
||
Yes, to (1). I am not sure that we need to bother about (2). If very small allocations start failing, a crash is likely imminent and Firefox is unusable by that time anyway. If you make the small allocations fallible, you need to make sure that the failure is correctly handled in all places. That's one reason why we have infallible allocations - crashing may be more secure than an incorrectly handled error condition. (A SafeBrowsing bypass could be a JS program that manages to fragment the Firefox heap appropriately, for example.) In the context of SafeBrowsing, I think failing to update due to fallible malloc failing is fine, but making SafeBrowsing itself not work, likely isn't.
Flags: needinfo?(gpascutto)
Comment 17•7 years ago
|
||
mozreview-review |
Comment on attachment 8884733 [details] Bug 1378680 - Refactor usage fallible alloc in url-classifier https://reviewboard.mozilla.org/r/155576/#review163988 Apologies for not thinking of this earlier Thomas. I think we should take the approach suggested by GCP in comment 16 instead.
Attachment #8884733 -
Flags: review+ → review-
Comment hidden (typo) |
Assignee | ||
Comment 19•7 years ago
|
||
Thanks gcp, francois, Rescanned and removed unnecessary fallible things. Mostly I did change in update and make the possibly large allocations to be fallible. And refactor some places that we did handle oom, but used explicit infallible allocation.
Comment hidden (mozreview-request) |
Comment 21•7 years ago
|
||
mozreview-review |
Comment on attachment 8884733 [details] Bug 1378680 - Refactor usage fallible alloc in url-classifier https://reviewboard.mozilla.org/r/155576/#review166530
Attachment #8884733 -
Flags: review?(francois) → review+
Assignee | ||
Comment 22•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f5774a3729bce349ee9348e52f11d06e68852eb8&selectedJob=117815807
Keywords: checkin-needed
Comment 23•7 years ago
|
||
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/84e29e660f0c Refactor usage fallible alloc in url-classifier r=francois
Keywords: checkin-needed
Comment 24•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/84e29e660f0c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment 25•7 years ago
|
||
Low enough volume that it's probably not worth backporting. Feel free to set the status back to affected and request approval if you feel otherwise.
You need to log in
before you can comment on or make changes to this bug.
Description
•