Closed Bug 702217 Opened 13 years ago Closed 13 years ago

OOM crash in nsUrlClassifierStore::ReadPrefixes or nsUrlClassifierPrefixSet::SetPrefixes

Categories

(Toolkit :: Safe Browsing, defect)

9 Branch
x86
All
defect
Not set
critical

Tracking


RESOLVED FIXED
Firefox 12
Tracking Status
firefox9 + affected
firefox10 + affected
firefox11 + fixed
firefox12 --- fixed
firefox-esr10 - wontfix

People

(Reporter: scoobidiver, Assigned: gcp)

References

Details

(Keywords: crash, Whiteboard: [qa-])

Crash Data

Attachments

(3 files, 2 obsolete files)

It's a new crash signature that first appeared in 9.0a1/20110911.
It's #20 top crasher in 9.0b1.

There are two kinds of stack traces:
0 	mozalloc.dll 	mozalloc_abort 	memory/mozalloc/mozalloc_abort.cpp:77
1 	mozalloc.dll 	mozalloc_handle_oom 	memory/mozalloc/mozalloc_oom.cpp:54
2 	xul.dll 	nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity 	obj-firefox/dist/include/nsTArray-inl.h:106
3 	xul.dll 	nsTArray<nsTimerImpl*,nsTArrayDefaultAllocator>::AppendElements<nsTimerImpl*> 	obj-firefox/dist/include/nsTArray.h:811
4 	xul.dll 	nsUrlClassifierStore::ReadPrefixes 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:3562
5 	xul.dll 	nsUrlClassifierDBServiceWorker::ConstructPrefixSet 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:3589
6 	xul.dll 	nsUrlClassifierDBServiceWorker::ApplyUpdate 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:3159
7 	xul.dll 	nsUrlClassifierDBServiceWorker::FinishUpdate 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:3194
8 	xul.dll 	nsRunnableMethodImpl<unsigned int 	obj-firefox/dist/include/nsThreadUtils.h:345
9 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:631
10 	xul.dll 	nsRunnable::Release 	obj-firefox/xpcom/build/nsThreadUtils.cpp:55
...

0 	mozalloc.dll 	mozalloc_abort 	memory/mozalloc/mozalloc_abort.cpp:77
1 	mozalloc.dll 	mozalloc_handle_oom 	memory/mozalloc/mozalloc_oom.cpp:54
2 	xul.dll 	nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity 	obj-firefox/dist/include/nsTArray-inl.h:77
3 	xul.dll 	nsTArray<nsTimerImpl*,nsTArrayDefaultAllocator>::AppendElements<nsTimerImpl*> 	obj-firefox/dist/include/nsTArray.h:811
4 	xul.dll 	nsUrlClassifierPrefixSet::SetPrefixes 	toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:122
5 	xul.dll 	nsUrlClassifierDBServiceWorker::ConstructPrefixSet 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:3611
6 	xul.dll 	nsUrlClassifierDBServiceWorker::ApplyUpdate 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:3159
7 	xul.dll 	nsUrlClassifierDBServiceWorker::FinishStream 	
8 	xul.dll 	nsRunnableMethodImpl<unsigned int 	obj-firefox/dist/include/nsThreadUtils.h:345
9 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:631
10 	xul.dll 	nsRunnable::Release 	obj-firefox/xpcom/build/nsThreadUtils.cpp:55
11 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char%20const*%20const%29%20|%20mozalloc_handle_oom%28%29%20|%20nsTArray_base%3CnsTArrayDefaultAllocator%3E%3A%3AEnsureCapacity%28unsigned%20int%2C%20unsigned%20int%29%20|%20nsTArray%3CnsTimerImpl*%2C%20nsTArrayDefaultAllocator%3E%3A%3AAppendElements%3CnsTimerImpl*%3E%28nsTimerImpl*%20const*%2C%20unsign...
Quite a few of these are repeated, i.e. if it crashes, it will do so multiple times for the same user (I'm unsure how Socorro works, but I presume the same install date+time means the same user). Could these be corrupted databases?

Maybe we should add a sanity check when constructing the tree, and axe the databases if they appear corrupted? Maybe add Telemetry for that, too?
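A minimal sketch of what such a sanity check could look like (hedged; the bound and the function/constant names are made up purely to illustrate the idea):

  // Hypothetical upper bound on plausible prefix counts; anything above this
  // is treated as a corrupted database instead of being attempted as an allocation.
  static const PRUint32 MAX_SANE_PREFIXES = 5 * 1000 * 1000;

  nsresult SanityCheckPrefixCount(PRUint32 aCount)
  {
    if (aCount > MAX_SANE_PREFIXES) {
      // Looks corrupted: the caller should reset the store (and report telemetry).
      return NS_ERROR_FILE_CORRUPTED;
    }
    return NS_OK;
  }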
Crash Signature: unsign... ] → unsign... ] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom() | nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity(unsigned int, unsigned int) | nsTArray<unsigned int nsTArrayDefaultAllocator>::AppendElements<unsigned int>(unsigned int co…
Sorry, filed a dupe because of the signature being slightly different, here's my comment from there:

This is topcrash #33 in Firefox 9 data from yesterday. It mostly happens in 9.0b2 builds, but I'm also seeing 10.0a2 Aurora and 11.0a1 Nightly crashes with that signature in https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char%20const*%20const%29%20|%20mozalloc_handle_oom%28%29%20|%20nsTArray_base%3CnsTArrayDefaultAllocator%3E%3A%3AEnsureCapacity%28unsigned%20int%2C%20unsigned%20int%29%20|%20nsTArray%3Cunsigned%20int%2C%20nsTArrayDefaultAllocator%3E%3A%3AAppendElements%3Cunsigned%20int%3E%28unsigned%20int%20const*%2C%20unsign...
Component: General → Phishing Protection
Product: Toolkit → Firefox
QA Contact: general → phishing.protection
Crash Signature: nsTArrayDefaultAllocator>::AppendElements<unsigned int>(unsigned int const*, unsign... ] → nsTArrayDefaultAllocator>::AppendElements<unsigned int>(unsigned int const*, unsign... ] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom() | nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity(unsigned int unsigned int) | nsTArray<gfxGlyph…
Question about this crash signature: some other code I am working on is doing AppendElements and then checking if that returns a NULL pointer, to detect OOM conditions. However, given that the traces here are aborts inside the allocator and not null pointer dereferences, am I correct that those checks are useless?
I don't think we can do much more than add a sanity check and reset the (corrupted) database if things seem to be going bonkers.
Assignee: nobody → gpascutto
Status: NEW → ASSIGNED
Attachment #578214 - Flags: review?(dcamp)
Attachment #578214 - Flags: approval-mozilla-beta?
Attachment #578214 - Flags: approval-mozilla-aurora?
FWIW, I do think we want to fix/work around this, because users who encounter these issues likely won't be able to use Firefox at all unless they downgrade or kill their entire profile. Uninstall/reinstall won't fix it either.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #4)
> Question about this crash signature: some other code I am working on is
> doing AppendElements and then checking if that returns a NULL pointer, to
> detect OOM conditions. However, given that the traces here are aborts
> inside the allocator and not null pointer dereferences, am I correct that
> those checks are useless?

By default nsTArray is infallible (it will OOM-abort rather than return null). IIRC it used to be fallible.

You can switch back to a fallible version by using FallibleTArray instead of nsTArray. That might be a good idea here, but it could be done in a follow-up.
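A minimal sketch of the difference (assuming the nsTArray/FallibleTArray headers of this era; the helper function is made up for illustration):

  #include "nsTArray.h"
  #include "nsError.h"

  // Made-up helper, only to contrast the two allocation behaviours.
  nsresult AppendPrefix(PRUint32 aPrefix)
  {
    // Infallible (default): on OOM this aborts via mozalloc_handle_oom,
    // which is exactly the abort at the top of the stacks quoted in this bug,
    // so checking the result afterwards never catches anything.
    nsTArray<PRUint32> infallible;
    infallible.AppendElement(aPrefix);

    // Fallible: the append reports failure instead of aborting, so the
    // check below is meaningful and the caller can recover.
    FallibleTArray<PRUint32> fallible;
    if (!fallible.AppendElement(aPrefix)) {
      return NS_ERROR_OUT_OF_MEMORY;
    }
    return NS_OK;
  }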
Comment on attachment 578214 [details] [diff] [review]
Patch 1. Sanity check prefixes size. Pass down error codes.

Review of attachment 578214 [details] [diff] [review]:
-----------------------------------------------------------------

Would it be reasonable to note in telemetry data how often this is happening?

This is a reasonable sanity check, so r+.
Attachment #578214 - Flags: review?(dcamp) → review+
Comment on attachment 578214 [details] [diff] [review]
Patch 1. Sanity check prefixes size. Pass down error codes.

[Triage Comment]
Significant crasher and new to FF9. Let's take it on Aurora and Beta.
Attachment #578214 - Flags: approval-mozilla-beta?
Attachment #578214 - Flags: approval-mozilla-beta+
Attachment #578214 - Flags: approval-mozilla-aurora?
Attachment #578214 - Flags: approval-mozilla-aurora+
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=4be41994deb7
https://hg.mozilla.org/mozilla-central/rev/636ea2bf3366
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 11
Scoobidiver, I have no idea why you set this to fixed for 9 and 10, but I can't see any checkin on mozilla-aurora or mozilla-beta for this, so I'm setting it to affected for them; it should be marked fixed on those by whoever checks in on the branches.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> Scoobidiver, no idea why you did set this to fixed for 9 and 10 but I can't
> see any checkin on mozilla-aurora or mozilla-beta for this
My bad. I saw the Aurora and Beta approval flags, but I didn't check the landings on the branches.
(In reply to Ed Morley [:edmorley] from comment #11)
> https://hg.mozilla.org/mozilla-central/rev/636ea2bf3366

Please land asap on aurora and beta to make this week's build.
Is there something QA can do to verify this fix?
Whiteboard: [qa?]
>Is there something QA can do to verify this fix?

Constructing a corrupted urlclassifier3.sqlite database that has this effect is tricky - I didn't succeed by adding random errors. If you can't get one from one of the users that were seeing this, then no, you can't verify this.
QA- based on comment 18. Can someone who was previously experiencing this bug please verify the fix on Firefox 9 and 10?
Whiteboard: [qa?] → [qa-]
>is rising somewhat on that version. Is this something different?

I don't see anything in that link?

This one is in the same module, and probably related (I didn't look at the code in detail, but it could potentially have the same cause as this one):
https://bugzilla.mozilla.org/show_bug.cgi?id=691364
>is rising somewhat on that version. Is this something different?

Got it now:

https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2011-12-12&signature=mozalloc_abort%28char%20const*%20const%29%20|%20mozalloc_handle_oom%28%29%20|%20nsTArray_base%3CnsTArrayDefaultAllocator%3E%3A%3AEnsureCapacity%28unsigned%20int%2C%20unsigned%20int%29%20|%20nsTArray%3CnsTimerImpl*%2C%20nsTArrayDefaultAllocator%3E%3A%3AAppendElements%3CnsTimerImpl*%3E%28nsTimerImpl*%20const*%2C%20unsign...&version=Firefox%3A9.0b5

It's the same issue as in this bug indeed. (The nsTimerImpl in the signature is bogus.) Should we reopen this one or file a new bug?

Looking at:

http://hg.mozilla.org/releases/mozilla-beta/annotate/dea4e14dc88a/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#l3578

The fix that was done limits it to 5M entries in all cases, each 4 bytes, so 20M, say 40M worst case depending on how nsTArray allocates memory. So we're OOMing trying to allocate at most 40M of RAM, and in typical usage we know it will max out at about 5M of RAM (because the real DB is 600-900k entries at most).

What I'm suspecting is that something else entirely is leaking RAM, and UrlClassifier is hitting it because it's one of the only activities that runs automatically in the background, even when the browser is idle, and allocates memory.
Assignee: gpascutto → nobody
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
> What I'm suspecting is that something else entirely is leaking RAM

That's entirely possible, although it strikes me as a little far-fetched that we'd so often run *almost* entirely out of RAM and then, when idle, OOM.

You could switch your TArrays to fallible.  But I think I'd also put a fatal release-build assertion (in Nightly only) checking that the URL classifier is not using more memory than you think, because it strikes me as more likely that the problem is somehow self-contained.
>You could switch your TArrays to fallible.

What do I do if the allocation fails? I can't just disable SafeBrowsing. 

>But I think I'd also put a fatal release-build assertion (in Nightly only) 
>checking that the URL classifier is not using more memory than you think, because 
>it strikes me as more likely that the problem is somehow self-contained.

The initial patch in this bug puts a hard limit on the maximum memory use of 40M, as explained in comment 22. The code still OOMs. Help?

The OOMs happen in this loop, which has the hard protection. The protection is not very complicated; please check it and see if I'm missing something obvious:
http://hg.mozilla.org/releases/mozilla-beta/annotate/745b004fad4a/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#l3594

or here, which is just making a copy to sort:
http://hg.mozilla.org/releases/mozilla-beta/annotate/745b004fad4a/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#l122
(I think the copy in this case might be avoidable because the callers can tolerate that what they pass gets clobbered, so maybe I should just change the code)

In real use cases (600k entries), let's say even including the extra copy, this code is allocating 10M of RAM. Why should it OOM? There seems to be a disproportionate number of cases on Windows XP, so I'd almost fear that it really is people running out of memory on tiny old systems. But there are many places in Firefox that use way more temporary memory than this, surely?
> The code still OOMs. Help?

Hmm.  So on crash-stats, there's a large spread in uptimes over which this crash occurs.  There are plenty of single-digit uptimes, and there are also some five-digit uptimes.

That suggests this may not be just "collateral OOM" caused by FF using lots of memory.  If it were, I'd expect the crashes to be weighted towards longer sessions.

I guess that's not much help, is it?  :)

Do you need to process the whole thing all at once?  Can you bring it in piecewise?
So should we back this out of beta/aurora or keep it in?
>So should we back this out of beta/aurora or keep it in?

Back what out? The patch here is harmless even if it didn't stop the problem. 

Backing out all recent UrlClassifier changes because of this crash which we don't understand? Bleh, that makes it difficult to progress forward as this is not reproducible.

It might be that we're hitting a *peak* memory usage with the changes that's higher than before, but not more than about 15M (unless there's another copy I missed). I can attempt to reduce this, but that'll be taking time away from getting the *new* urlclassifier landed, so I'd prefer not to unless we judge the amount of crashes we're getting from this to be problematic.
(From comment 26)
> Do you need to process the whole thing all at once?  Can you bring it in piecewise?
UrlClassifierPrefixSet supports piecewise construction, but the entries must be brought in in sorted order. The latter part can (currently) be offloaded to SQLite, though.

So yeah, that should be possible.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #28)
> >So should we back this out of beta/aurora or keep it in?
> 
> Back what out? The patch here is harmless even if it didn't stop the
> problem. 
> 
> Backing out all recent UrlClassifier changes because of this crash which we
> don't understand? Bleh, that makes it difficult to progress forward as this
> is not reproducible.
> 
> It might be that we're hitting a *peak* memory usage with the changes that's
> higher than before, but not more than about 15M (unless there's another copy
> I missed). I can attempt to reduce this, but that'll be taking time away
> from getting the *new* urlclassifier landed, so I'd prefer not to unless we
> judge the amount of crashes we're getting from this to be problematic.

We took a change into aurora and beta to fix a crash and it did not seem to fix it. We should generally back out to the state that was tested on the channels for many weeks to reduce risk, as the reward we were seeking doesn't look to be there.

That being said, it's your call on what you want to do.
Unfortunately the two things I said above didn't actually work, so I had to try something different:

1) Clobbering what's passed to SetPrefixes is annoying because it's an XPCOM-exposed interface. It's simpler to just document that the caller is responsible for sorting the array before passing it in, and do exactly that. The code is adjusted like that now; this saves a complete copy of the array that's OOMing.

2) Although PrefixSet can be constructed in parts, we need the array sorted to do this, and that *isn't* possible from within SQLite because we still rehash the array's entries afterwards, messing up the sorted order. So we need at least one full copy in RAM. However, we can probe what size the array will be before constructing it, and SetCapacity it to the correct size. This might save another doubling of the array size in the worst case.

3) I made the arrays involved fallible, and added telemetry to detect when this triggers. This will cause us to disable SafeBrowsing if we OOM, at least until the next update (30-45 minutes). Disabling a security feature isn't nice, but if the alternative is to crash the browser with OOM... Maybe the Telemetry can help us narrow down the circumstances in which this happens. I considered falling back to SQLite-only lookups, but this (a) complicates the code severely and (b) is likely to give a terrible experience, as there won't be RAM for the SQLite cache either.
Assignee: nobody → gpascutto
Attachment #585702 - Flags: review?(justin.lebar+bug)
Attachment #585702 - Flags: feedback?(dcamp)
This looks virtuous to me, but f- because I'd like to take another look.

># HG changeset patch
># Parent f301341f2e02f2ad33525f616dac019dc0f1e158
># User Gian-Carlo Pascutto <gpascutto@mozilla.com>
>Bug 702217 - Avoid double allocation in UrlClassifier. Handle OOM conditions. r=
>
>diff --git a/toolkit/components/build/nsToolkitCompsCID.h b/toolkit/components/build/nsToolkitCompsCID.h
>--- a/toolkit/components/build/nsToolkitCompsCID.h
>+++ b/toolkit/components/build/nsToolkitCompsCID.h
>@@ -158,19 +158,19 @@
>-// {7902e243-9b00-4673-9288-1b7fc246a8f8}
>+// {f806c3b6-a7cd-4ac9-bb5d-8c8a20a6e175}
> #define NS_URLCLASSIFIERPREFIXSET_CID \
>-{ 0x7902e243, 0x9b00, 0x4673, { 0x92, 0x88, 0x1b, 0x7f, 0xc2, 0x46, 0xa8, 0xf8} }
>+{ 0xf806c3b6, 0xa7cd, 0x4ac9, { 0xbb, 0x5d, 0x8c, 0x8a, 0x20, 0xa6, 0xe1, 0x75} }

The CID (class ID) doesn't need to match the IID (interface ID).  In fact, it
probably shouldn't.  I'd change the CID to a new random value and leave the IID
unchanged, since you didn't modify the interface.

>diff --git a/toolkit/components/url-classifier/nsIUrlClassifierPrefixSet.idl b/toolkit/components/url-classifier/nsIUrlClassifierPrefixSet.idl
>--- a/toolkit/components/url-classifier/nsIUrlClassifierPrefixSet.idl
>+++ b/toolkit/components/url-classifier/nsIUrlClassifierPrefixSet.idl
>@@ -36,17 +36,17 @@
>  *
>  * ***** END LICENSE BLOCK ***** */
> 
> #include "nsISupports.idl"
> #include "nsIFile.idl"
> 
> interface nsIArray;
> 
>-[scriptable, uuid(7902e243-9b00-4673-9288-1b7fc246a8f8)]
>+[scriptable, uuid(f806c3b6-a7cd-4ac9-bb5d-8c8a20a6e175)]
> interface nsIUrlClassifierPrefixSet : nsISupports

You didn't modify the interface, so no need to change the interface ID.
(Unless you want to modify the IID because you changed the contract, requiring
the input to setPrefixes to be sorted?  I guess that's OK, although it's kind
of a bummer that none of the methods are commented here.)

> {
>   void setPrefixes([const, array, size_is(aLength)] in unsigned long aPrefixes,
>                    in unsigned long aLength);
>   void addPrefixes([const, array, size_is(aLength)] in unsigned long aPrefixes,
>                    in unsigned long aLength);
>   boolean contains(in unsigned long aPrefix);
>   boolean probe(in unsigned long aPrefix, in unsigned long aKey,

>-nsresult nsUrlClassifierStore::ReadPrefixes(nsTArray<PRUint32>& array,
>+nsresult nsUrlClassifierStore::ReadPrefixes(FallibleTArray<PRUint32>& array,
>                                             PRUint32 aKey)
> {
>-  mozStorageStatementScoper scoper(mAllPrefixStatement);
>+  mozStorageStatementScoper scoper(mAllPrefixGetStatement);
>+  mozStorageStatementScoper scoperToo(mAllPrefixCountStatement);
>   bool hasMoreData;
>   PRUint32 pcnt = 0;
>   PRUint32 fcnt = 0;
> 
> #if defined(PR_LOGGING)
>   PRIntervalTime clockStart = 0;
>   if (LOG_ENABLED()) {
>     clockStart = PR_IntervalNow();
>   }
> #endif
> 
>-  while (NS_SUCCEEDED(mAllPrefixStatement->ExecuteStep(&hasMoreData)) && hasMoreData) {
>+  // Make sure we allocate no more than we really need, so first
>+  // check how much entries there are
>+  if (NS_SUCCEEDED(mAllPrefixCountStatement->ExecuteStep(&hasMoreData)) && hasMoreData) {
>+    PRUint32 count = mAllPrefixCountStatement->AsInt32(0);
>+    if (!array.SetCapacity(count)) {
>+      return NS_ERROR_OUT_OF_MEMORY;
>+    }
>+  } else {
>+    return NS_ERROR_FILE_CORRUPTED;
>+  }
>+
>+  while (NS_SUCCEEDED(mAllPrefixGetStatement->ExecuteStep(&hasMoreData)) && hasMoreData) {
>     PRUint32 prefixval;
>     PRUint32 domainval;
>     PRUint32 size;
> 
>-    const PRUint8 *blobdomain = mAllPrefixStatement->AsSharedBlob(0, &size);
>+    const PRUint8 *blobdomain = mAllPrefixGetStatement->AsSharedBlob(0, &size);
>     if (!blobdomain || (size != DOMAIN_LENGTH))
>       return false;
> 
>     domainval = *(reinterpret_cast<const PRUint32*>(blobdomain));
> 
>-    const PRUint8 *blobprefix = mAllPrefixStatement->AsSharedBlob(1, &size);
>+    const PRUint8 *blobprefix = mAllPrefixGetStatement->AsSharedBlob(1, &size);
>     if (!blobprefix || (size != PARTIAL_LENGTH)) {
>-      const PRUint8 *blobfull = mAllPrefixStatement->AsSharedBlob(2, &size);
>+      const PRUint8 *blobfull = mAllPrefixGetStatement->AsSharedBlob(2, &size);
>       if (!blobfull || (size != COMPLETE_LENGTH)) {
>         prefixval = domainval;
>         fcnt++;
>       } else {
>         prefixval = *(reinterpret_cast<const PRUint32*>(blobfull));
>       }
>     } else {
>       prefixval = *(reinterpret_cast<const PRUint32*>(blobprefix));
>     }
> 
>     PRUint32 keyedVal;
>     nsresult rv = KeyedHash(prefixval, domainval, aKey, &keyedVal);
>     NS_ENSURE_SUCCESS(rv, rv);
> 
>-    array.AppendElement(keyedVal);
>+    if (!array.AppendElement(keyedVal)) {
>+      return NS_ERROR_OUT_OF_MEMORY;
>+    }

This shouldn't fail because we already called SetCapacity.  Can you assert
false if it fails here?  (You could use MOZ_ASSERT(), which is fatal in debug
builds.)

The DB isn't going to change under us, will it?

>@@ -3618,47 +3639,65 @@ nsresult
> nsUrlClassifierDBServiceWorker::ConstructPrefixSet()
> {
>   Telemetry::AutoTimer<Telemetry::URLCLASSIFIER_PS_CONSTRUCT_TIME> timer;
> 
>   PRUint32 key;
>   nsresult rv = mPrefixSet->GetKey(&key);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>-  nsTArray<PRUint32> array;
>+  FallibleTArray<PRUint32> array;
>   rv = mMainStore.ReadPrefixes(array, key);
>-  NS_ENSURE_SUCCESS(rv, rv);
>+  if (NS_FAILED(rv)) {
>+    goto error_bailout;
>+  }
> 
> #ifdef HASHFUNCTION_COLLISION_TEST
>   array.Sort();
>   PRUint32 collisions = 0;
>   for (int i = 1; i < array.Length(); i++) {
>     if (array[i - 1] == array[i]) {
>       collisions++;
>     }
>   }
>   LOG(("%d collisions in the set", collisions));
> #endif
> 
>+  if (array.IsEmpty()) {
>+    // DB is empty, put a sentinel to show that we loaded it
>+    array.AppendElement(0);

Array is now fallible; need to check return value.

>+  }
>+  // SetPrefixes requires sorted arrays
>+  array.Sort();
>+
>   // clear old tree
>   rv = mPrefixSet->SetPrefixes(nsnull, 0);
>   NS_ENSURE_SUCCESS(rv, rv);
>-  if (array.IsEmpty()) {
>-    // DB is empty, but put a sentinel to show that we looked
>-    array.AppendElement(0);
>+
>+  // construct new prefixset
>+  rv = mPrefixSet->SetPrefixes(array.Elements(), array.Length());
>+  if (NS_FAILED(rv)) {
>+    goto error_bailout;
>   }
>-  // construct new one
>-  rv = mPrefixSet->SetPrefixes(array.Elements(), array.Length());
>-  NS_ENSURE_SUCCESS(rv, rv);
> 
>   // store the new tree to disk
>   rv = mPrefixSet->StoreToFile(mPSFile);
>   NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "failed to store the prefixset");
> 
>   return NS_OK;
>+
>+ error_bailout:
>+  // load an empty prefixset so the browser can work
>+  array.Clear();
>+  array.AppendElement(0);
>+  mPrefixSet->SetPrefixes(array.Elements(), array.Length());
>+  if (rv == NS_ERROR_OUT_OF_MEMORY) {
>+    Telemetry::Accumulate(Telemetry::URLCLASSIFIER_PS_OOM, 1);

Don't you want to accumulate (URLCLASSIFIER_PS_OOM, 0) on success?

Maybe on failure we can fire a memory pressure event and then reschedule the
URL classifier refresh to run sooner than in 30 minutes.  I guess running
without it is better than crashing, but it's not a lot better!

You can fire memory pressure with:

  nsCOMPtr<nsIObserverService> os = services::GetObserverService();
  NS_ENSURE_TRUE(os, NS_ERROR_FAILURE);
  os->NotifyObservers(nsnull, "memory-pressure",
                      NS_LITERAL_STRING("low-memory").get());

>+  }
>+  return rv;
> }
>diff --git a/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp b/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
>--- a/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
>+++ b/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
>@@ -195,21 +195,17 @@ nsUrlClassifierPrefixSet::SetPrefixes(co
>       LOG(("Clearing PrefixSet"));
>       mDeltas.Clear();
>       mIndexPrefixes.Clear();
>       mIndexStarts.Clear();
>       mHasPrefixes = false;
>     }
>   }
>   if (aLength > 0) {
>-    // Ensure they are sorted before adding
>-    nsTArray<PRUint32> prefixes;
>-    prefixes.AppendElements(aArray, aLength);
>-    prefixes.Sort();
>-    AddPrefixes(prefixes.Elements(), prefixes.Length());
>+    AddPrefixes(aArray, aLength);
>   }

Could you add debug-only code which checks that the array is sorted, please?

Also, AddPrefixes can now return NS_ERROR_OOM, so we need to propagate this
error out of SetPrefixes.
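For illustration, a debug-only check along those lines might look like this (a sketch only, not part of the attached patch; parameter names follow the quoted code):

  #ifdef DEBUG
    // Verify the caller honoured the new "input must be sorted" contract.
    for (PRUint32 i = 1; i < aLength; i++) {
      // Fatal in debug builds only.
      MOZ_ASSERT(aArray[i - 1] <= aArray[i]);
    }
  #endif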

>   return NS_OK;
> }
> 
> NS_IMETHODIMP
> nsUrlClassifierPrefixSet::AddPrefixes(const PRUint32 * prefixes, PRUint32 aLength)
> {
>@@ -224,22 +220,28 @@ nsUrlClassifierPrefixSet::AddPrefixes(co
>   mNewIndexPrefixes.AppendElement(prefixes[0]);
>   mNewIndexStarts.AppendElement(mNewDeltas.Length());

These variables aren't members, so their names shouldn't start with 'm'.  Can
you please change that while you're here?

> 
>   PRUint32 numOfDeltas = 0;
>   PRUint32 currentItem = prefixes[0];
>   for (PRUint32 i = 1; i < aLength; i++) {
>     if ((numOfDeltas >= DELTAS_LIMIT) ||
>           (prefixes[i] - currentItem >= MAX_INDEX_DIFF)) {
>-      mNewIndexStarts.AppendElement(mNewDeltas.Length());
>-      mNewIndexPrefixes.AppendElement(prefixes[i]);
>+      if (!mNewIndexStarts.AppendElement(mNewDeltas.Length())) {
>+        return NS_ERROR_OUT_OF_MEMORY;
>+      }
>+      if (!mNewIndexPrefixes.AppendElement(prefixes[i])) {
>+        return NS_ERROR_OUT_OF_MEMORY;
>+      }
>       numOfDeltas = 0;
>     } else {
>       PRUint16 delta = prefixes[i] - currentItem;
>-      mNewDeltas.AppendElement(delta);
>+      if (!mNewDeltas.AppendElement(delta)) {
>+        return NS_ERROR_OUT_OF_MEMORY;
>+      }

Don't you need to change mNewIndexPrefixes and friends from nsTArray to FallibleTArray?

Actually, I'm kind of confused about this whole method: It looks like we copy
mIndexPrefixes/mStarts/mDeltas, modify them, and then discard the originals.

Why don't we just modify the originals?
Comment on attachment 585702 [details] [diff] [review]
Patch 2. Fix double allocation. Reduce peak size. Allow fallible alloc.

(I don't feel comfortable enough with this code to give a formal review, but I'm happy to provide feedback.)
Attachment #585702 - Flags: review?(justin.lebar+bug) → feedback-
>(Unless you want to modify the IID because you changed the contract, requiring
>the input to setPrefixes to be sorted?  I guess that's OK, although it's kind
>of a bummer that none of the methods are commented here.)

Right, the idea is that the interface *did* change. I guess I should move the comments from the .h to the .idl.

>Array is now fallible; need to check return value.

OK. In the error_bailout case we absolutely need it to work, so I guess I should use an nsAutoTArray<PRUint32, 1> there.
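As a rough sketch of that bailout path (names follow the quoted patch; nsAutoTArray keeps its single element in inline storage, so the append cannot hit the allocator):

  // Bailout: load a one-element sentinel prefix set so lookups keep working.
  nsAutoTArray<PRUint32, 1> sentinel;
  sentinel.AppendElement(0);  // inline storage, no heap allocation
  mPrefixSet->SetPrefixes(sentinel.Elements(), sentinel.Length());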

>Don't you want to accumulate (URLCLASSIFIER_PS_OOM, 0) on success?

None of the HISTOGRAM_BOOLEAN users I looked at do this; they only signal the "true" case. I think that's because HISTOGRAM_BOOLEANs are just normal histograms with 2 bins, and not doing anything is the same as accumulating 0. (CC taras)

>Also, AddPrefixes can now return NS_ERROR_OOM, so we need to propagate this
>error out of SetPrefixes.

Oh, f---, missing that is a serious mistake :P

>Actually, I'm kind of confused about this whole method: It looks like we copy
>mIndexPrefixes/mStarts/mDeltas, modify them, and then discard the originals.
>
>Why don't we just modify the originals?

Because they're being accessed by another thread. Note that we don't take the lock at the entry of that function, only just before the pointer swaps.

Now, actually, the way these functions are being used is completely wrong. AddPrefixes was meant to be used to construct the PrefixSet gradually, while not blocking the main thread. But this ended up not being possible, and what is in there right now ends up blocking the main thread anyway because of the SetPrefixes(nsnull) clearing in the callers. SetPrefixes should be folded in with AddPrefixes.
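A rough sketch of the pattern described above - build the replacement arrays off the lock and only publish them under it. This assumes a mozilla::Mutex member (here called mPrefixSetLock, a hypothetical name) and assumes the member arrays end up with the same fallible type; MakePrefixSet is the method Patch 2 later introduces:

  nsresult
  nsUrlClassifierPrefixSet::MakePrefixSet(const PRUint32* aPrefixes, PRUint32 aLength)
  {
    // Build the new index without holding the lock; the old arrays stay
    // readable by the lookup thread the whole time.
    FallibleTArray<PRUint32> indexPrefixes;
    FallibleTArray<PRUint32> indexStarts;
    FallibleTArray<PRUint16> deltas;
    // ... fill the three arrays from aPrefixes, checking every append for OOM ...

    {
      // Only the publish step needs the lock, so readers are blocked just for
      // the duration of the swaps.
      MutexAutoLock lock(mPrefixSetLock);
      mIndexPrefixes.SwapElements(indexPrefixes);
      mIndexStarts.SwapElements(indexStarts);
      mDeltas.SwapElements(deltas);
      mHasPrefixes = (aLength > 0);
    }
    return NS_OK;
  }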
I don't know if it is right or not, but I signal both cases in the boolean telemetry in bug 706164.
gcp's telemetry approach is OK.
Telemetry::Accumulate(Telemetry::URLCLASSIFIER_PS_OOM, rv == NS_ERROR_OUT_OF_MEMORY) would produce more data, but it's not clear that it would be very interesting. Due to limitations in the current telemetry web UI, gcp's existing approach would actually be easier to read.
In most cases it makes sense to report 0/1 because you are counting
a) frequency of something occurring
b) distribution of binary outcome

We are pretty much guaranteed to do url classifier stuff on startup, so recording a) is pointless... existing approach is fine for b)
> existing approach is fine for b)

How do we get this information out of the histogram with the approach currently in the patch?  Aren't we just going to get a graph which says "N out-of-memory events in this time period"?  Then we'd have to correlate this with the number of startups in order to figure out how frequently the OOM occurs?  That's kind of lame.
(In reply to Justin Lebar [:jlebar] from comment #39)
> > existing approach is fine for b)
> 
> How do we get this information out of the histogram with the approach
> currently in the patch?  Aren't we just going to get a graph which says "N
> out-of-memory events in this time period"?  Then we'd have to correlate this
> with the number of startups in order to figure out how frequently the OOM occurs?
> That's kind of lame.

or compare to frequency of another metric. Works fine for tracking extremes.
>We are pretty much guaranteed to do url classifier stuff on startup, so recording 
>a) is pointless... existing approach is fine for b)

What I specifically want is to be able to look at which telemetry submissions had OOM=1, and then compare *all other data* for those submissions to "normal" data. I'm less interested in knowing the frequency than I am in figuring out what's causing it.

We can directly infer that a Telemetry submission that does not have OOM=1 means OOM=0, right? It's one or the other. I understand this is what you mean by (b).
Yup, sounds like your approach is perfect for that. You'll have to get a hadooper to pull the data for you; we don't yet filter by histogram values.
Attachment #585702 - Attachment is obsolete: true
Attachment #585702 - Flags: feedback?(dcamp)
Attachment #586108 - Flags: review?(dcamp)
Attachment #586108 - Flags: feedback?(justin.lebar+bug)
https://tbpl.mozilla.org/?tree=Try&rev=af13fe4bfb0d

Patch 2 implements the review comments. It also gets rid of gradual construction in AddPrefixes and reworks it into MakePrefixSet, which no longer blocks the main thread during PrefixSet construction. This code will disable SafeBrowsing if we run out of memory (and send telemetry back).

Patch 3 implements the fallback to SQLite (instead of disabling) in the latter case. This will make the browser slower, but perhaps it's better than crashing or running with SafeBrowsing disabled. If the user's system is already in a state where we can't allocate a few MB without OOMing, it's probably not going to be "fast" to begin with. Unfortunately, this isn't a trivial change due to the thread interactions.
Attachment #586111 - Flags: review?(dcamp)
Attachment #586111 - Flags: feedback?(justin.lebar+bug)
I added the Mac crash signature.
Crash Signature: unsigned int) | nsTArray<gfxGlyphExtents*, nsTArrayDefaultAllocator>::AppendElements<gfxGlyphExtents*>(gfxGlyphExtents* co... ] → unsigned int) | nsTArray<gfxGlyphExtents*, nsTArrayDefaultAllocator>::AppendElements<gfxGlyphExtents*>(gfxGlyphExtents* co... ] [@ mozalloc_abort | mozalloc_handle_oom | moz_xrealloc]
Depends on: 636113
OS: Windows XP → All
Comment on attachment 586108 [details] [diff] [review]
Patch 2. v2 Fix double allocation. Reduce peak size. Fallible alloc.

Just some nits on the comments; otherwise, it looks sane to me!

> interface nsIUrlClassifierPrefixSet : nsISupports
> {
>+  // Can send an empty Array to clear the tree. A truly "empty tree"
>+  // cannot be represented, so put a sentinel value if that is required
>+  // Requires array to be sorted
>   void setPrefixes([const, array, size_is(aLength)] in unsigned long aPrefixes,
>                    in unsigned long aLength);

Although it may be obvious from the name, please use the first sentence of the
comment to describe what the method does.

Please check the punctuation here (missing periods at the end of the second and third lines).

>+  // Does the PrefixSet contain this prefix? not thread-safe
>   boolean contains(in unsigned long aPrefix);

Please capitalize "not".

Also, if this method is not thread-safe, are the rest of them thread-safe?

>+  // Do a lookup in the PrefixSet
>+  // if aReady is set, we will block until there are any entries
>+  // if not set, we will return in aReady whether we were ready or not
>   boolean probe(in unsigned long aPrefix, in unsigned long aKey,
>                 inout boolean aReady);

Again, please punctuate and capitalize this properly.

>+  // Return a key that is used to randomize the collisions in the prefixes
>   PRUint32 getKey();

Does this return "a" or "the" key used to randomize collisions?
Attachment #586108 - Flags: feedback?(justin.lebar+bug) → feedback+
Comment on attachment 586111 [details] [diff] [review]
Patch 3. Fallback to SQLite lookups at doomsday

Have you used a browser with the SQLite fallback forced-on?  Is it noticeably slower?  Is it usable?
>Have you used a browser with the SQLite fallback forced-on?  Is it noticeably 
>slower?  Is it usable?

I forced it on to test the codepath. I didn't notice anything, but this box has 6G RAM and an SSD, so that's meaningless. I'm obviously not going to see a real OOM on this system either.
It's one async SQLite lookup for each HTML document retrieved, is that right?
Attachment #586111 - Flags: feedback?(justin.lebar+bug) → feedback+
>It's one async SQLite lookup for each HTML document retrieved, is that right?

The lookup is async, but it blocks the page load. Two lookups due to the domain being tried in different permutations.
>The lookup is async, but it blocks the page load. Two lookups due to the domain 
>being tried in different permutations.

And not only for the HTML, but also for the JS and CSS.
>Also, if this method is not thread-safe, are the rest of them thread-safe?

Yes, as implied by the IMPL_THREADSAFE, I think. This method can probably be removed from the XPCOM interface.
OS: All → Windows XP
OS: Windows XP → All
> Yes, as implied by the IMPL_THREADSAFE,

I'd like all the thread-safe/unsafe-ness to be explicitly laid out in the interface comments, if we're going the route of documenting the interface.
Attachment #586108 - Flags: review?(dcamp) → review+
Attachment #586111 - Flags: review?(dcamp) → review+
https://hg.mozilla.org/mozilla-central/rev/a0fb6ed985c6
https://hg.mozilla.org/mozilla-central/rev/9bee22d394a9
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Target Milestone: Firefox 11 → Firefox 12
Attachment #586108 - Flags: approval-mozilla-aurora?
Attachment #586111 - Flags: approval-mozilla-aurora?
This has baked for a few days in Nightly. I no longer see any new crashes. I also don't see any hits on the telemetry that was added, which suggests that reducing the peak memory usage made the problem far less likely to trigger.

- Fixes a topcrasher.
- Reduces peak Firefox memory usage.
- Prevents blocking the main thread during updates.

User impact if declined: occasional OOM crashes, small responsiveness hits every 30-45 minutes
Risk to taking this patch (and alternatives if risky): these fixes are more fundamental and involved, and hence touch more code, than the previous patch.
How many people do you see hitting the SQLite fallback in telemetry?
>How many people do you see hitting the SQLite fallback in telemetry?

>> I also don't see any hits on the telemetry that was added,...
(In reply to Gian-Carlo Pascutto (:gcp) from comment #56)
> - Fixes a topcrasher.

Sorry if this is hidden in the comments already - did the previously landed patches for aurora/beta not mitigate the issue?
The first patch protects against a potential corruption case, but data from the field shows that it is not the cause of the problem, as the number of crashes didn't drop. (There's no reason to back out the additional error protection though, so patches 2 and 3 build on it.)
Comment on attachment 586108 [details] [diff] [review]
Patch 2. v2 Fix double allocation. Reduce peak size. Fallible alloc.

[Triage Comment]
Approved for Aurora - we're going to take the hit on beta given the fact that this carries non-zero risk.
Attachment #586108 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #586111 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I know why the telemetry isn't triggering:

https://crash-stats.mozilla.com/report/index/1d2909f4-3a8f-461e-9948-0c0772120118

We're getting an OOM abort when a FallibleTArray is calling SetCapacity. This appears to be in violation of SetCapacity's documented behavior. So this looks like a bug in nsTArray or in our memory allocation, not in the code fixed here.
Depends on: 719531
It's currently #9 top crasher in the first day of 10.0b5.
I've looked at a bunch of crash reports, and they mostly look like:

"TotalVirtualMemory":     "2147352576"
"AvailableVirtualMemory": "1848963072", 
"SystemMemoryUsePercentage": "90"

"TotalVirtualMemory":     "2147352576"
"AvailableVirtualMemory": "1977462784", 
"SystemMemoryUsePercentage": "80"

These people have plenty of free RAM.
In the second case, we're reporting that the machine is using 162mb of virtual memory.  But on my Windows 7 machine, Firefox uses more than 200mb of vmem after I start up and load about:memory.

Are these crashes happening during startup?
Crash Signature: unsigned int) | nsTArray<gfxGlyphExtents*, nsTArrayDefaultAllocator>::AppendElements<gfxGlyphExtents*>(gfxGlyphExtents* co... ] [@ mozalloc_abort | mozalloc_handle_oom | moz_xrealloc] → unsigned int) | nsTArray<gfxGlyphExtents* nsTArrayDefaultAllocator>::AppendElements<gfxGlyphExtents*>(gfxGlyphExtents* co... ] [@ mozalloc_abort | mozalloc_handle_oom | moz_xrealloc] [@ mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base…
>In the second case, we're reporting that the machine is using 162mb of virtual 
>memory.  But on my Windows 7 machine, Firefox uses more than 200mb of vmem after I 
>start up and load about:memory.

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366589%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366770%28v=vs.85%29.aspx

This is reporting the address space of the Firefox process. So what it's saying is that Firefox is using 162Mb there.

To really see the free memory at OOM, we should log ullAvailPageFile. It's true we don't really know how much free RAM there is.
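For reference, a minimal sketch of reading those fields with the standard Win32 GlobalMemoryStatusEx call (nothing Firefox-specific here):

  #include <windows.h>
  #include <stdio.h>

  int main()
  {
    MEMORYSTATUSEX stat;
    stat.dwLength = sizeof(stat);
    if (GlobalMemoryStatusEx(&stat)) {
      // Address space of this process: what the crash reports call
      // Total/AvailableVirtualMemory.
      printf("TotalVirtualMemory:     %llu\n", stat.ullTotalVirtual);
      printf("AvailableVirtualMemory: %llu\n", stat.ullAvailVirtual);
      // Commit space still available system-wide (RAM + pagefile); this is
      // the ullAvailPageFile value suggested above.
      printf("AvailablePageFile:      %llu\n", stat.ullAvailPageFile);
      // What the reports call SystemMemoryUsePercentage.
      printf("MemoryLoad:             %lu%%\n", stat.dwMemoryLoad);
    }
    return 0;
  }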

>Are these crashes happening during startup?

They're all over the place.
> This is reporting the address space of the Firefox process. So what it's saying is that Firefox is 
> using 162Mb there.

Correct; I should have said "*Firefox* is using 162mb of virtual memory."

What I'm saying is: I've never seen Firefox use so little virtual memory, except, presumably, while it's in the process of starting up.

Are specifically the two crashes you cited startup crashes?  (You should be able to tell by the uptime figure.)
>Are specifically the two crashes you cited startup crashes?  (You should be able 
>to tell by the uptime figure.)

This is the second one:

https://crash-stats.mozilla.com/report/index/ca4a7e4f-33ad-4394-9e9e-f31e32120121

7 seconds, so pretty much. Sampling a few with higher uptimes shows correspondingly more memory in use. This all looks completely normal. Which is creepy.
What I think we should do is:

 * Add AvailablePagefile, like you suggested, and see what that reports.
 * If we're seeing very low numbers there, consider enabling MALLOC_PAGEFILE in jemalloc for Windows, which will let jemalloc swap even when the machine is out of swap.
 * In parallel, fix bug 716638, which will let us report how much memory is being requested with malloc.  (It's still conceivable that we're somehow requesting way too much memory with malloc.)

gcp, do you mind filing a bug on the first item there?
Depends on: 720444
The new code from bug 673470 has a path that should've used a fallible array but didn't. Patch attached.

After Bug 719531 landed, I now see URLCLASSIFIER_OOM Telemetry hits. So we're still OOMing here in some cases. It's too bad that not crashing here makes us unable to use the extra information Bug 720444 or Bug 716638 would bring. This is a top-20 crasher in 10.0, so I would've liked to understand the underlying issue.

Taras, does it make sense to add Telemetry for what Bug 720444 gathers? We could then look at the Telemetry submissions with OOM=1 and see if they really are low on memory. I think reordering the Telemetry recording (done in the patch) to before the memory allocation will give us information on the allocation size, similar to what bug 716638 does.
Attachment #595994 - Flags: review?(dcamp)
Attachment #595994 - Flags: feedback?(justin.lebar+bug)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
It's hard to follow with the tracking flags, as it's marked fixed in 11 and 12 but not in 13.
But according to comments:
* patch 1 landed in Fx 9
* patch 2 & 3 landed in Fx 11
* patch 4 will land in Fx 13
Correct. The code fixed here (in ff9<->ff12) was replaced in Firefox 13, and the new code was missing a fix, so it could potentially crash again with this or a similar signature.
Target Milestone: Firefox 12 → Firefox 13
The problem is in Firefox 13, not in the others. They don't even have the files the patch applies to.
This new patch needs to be in a separate bug, please.  Too many landings here.
Attachment #595994 - Attachment is obsolete: true
Attachment #595994 - Flags: review?(dcamp)
Attachment #595994 - Flags: feedback?(justin.lebar+bug)
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Target Milestone: Firefox 13 → Firefox 12
It's currently #4 top browser crasher in 10.0.5 ESR.
(In reply to Scoobidiver from comment #76)
> It's currently #4 top browser crasher in 10.0.5 ESR.

What does the crash data look like for Firefox 11 and 12? Given that QA does not have a reproducible case for verification, and assuming 11 and 12 show no crashes, it might be worth considering landing this fix on ESR10.

Follow-up question: do we let this fix ride to ESR17, fix it in time for a normally scheduled 10.0.6 ESR release, or fix it for a quick 10.0.6 ESR release?
This likely needs bug 719531 to land too to be effective.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #77)
> it might be worth considering landing this fix on ESR10.

> Follow-up question, do we let this fix ride to ESR17, fix in time for a
> normally scheduled 10.0.6 ESR release, or fix for a quick 10.0.6 ESR release?

Sure thing. A stability fix for ESR is valid, and we can take this for the next ESR (10.0.6) instead of waiting until ESR17 - we won't do a chemspill ESR for this, though. Please go ahead and nominate it for ESR landing.
We're still waiting for a landing for this patch and the one in bug 719531 on ESR to fix this top ESR crasher.
I guess I'll nom the patches, then...
Comment on attachment 578214 [details] [diff] [review]
Patch 1. Sanity check prefixes size. Pass down error codes.

esr10? per comment 80.
Attachment #578214 - Flags: approval-mozilla-esr10?
Attachment #586108 - Flags: approval-mozilla-esr10?
Attachment #586111 - Flags: approval-mozilla-esr10?
I sent some email on 2012-07-22 to point out that I didn't think I could land this. Since it seems to have been missed, I'll repost it here:

This bug depends on Bug 719531 to have any effect. I took a look at
applying the patches to the mozilla-esr10 tree. Bug 719531 applies
pretty cleanly, but bug 702217 needed some rebasing (though nothing that
looked particularly worrying).

However, the result doesn't compile because it depends on
HISTOGRAM_BOOLEAN (and potentially other Telemetry changes) which are
not in the esr10 tree.

Now, I could go and fix that up as well, but that's already 2 layers of
"new" rebasing changes. And ESR releases do not go through
nightly-aurora-beta as far as I know, so making new patches would be
releasing untested code.
Oh, I remember that e-mail now.  I'm sorry I forgot it, Gian-Carlo.
Attachment #578214 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Attachment #586108 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Attachment #586111 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Thanks for the clarification, Gian-Carlo. If we don't have the Telemetry code in esr10, then I'm going to wontfix this, since it doesn't meet ESR criteria anyway (and will also wontfix bug 719531 for ESR). The next ESR (17) will have this included.
Attachment #578214 - Flags: approval-mozilla-esr10+
Attachment #586108 - Flags: approval-mozilla-esr10+
Attachment #586111 - Flags: approval-mozilla-esr10+
Product: Firefox → Toolkit