This bug was filed from the Socorro interface and is report bp-2f54f011-7dfc-4342-8fbf-1116c2140730. ============================================================= nightly was running in the background and crashed Crashing Thread Frame Module Signature Source 0 XUL nsUrlClassifierPrefixSet::MakePrefixSet(unsigned int const*, unsigned int) obj-firefox/x86_64/dist/include/nsTArray.h 1 XUL nsUrlClassifierPrefixSet::SetPrefixes(unsigned int const*, unsigned int) toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp 2 XUL mozilla::safebrowsing::LookupCache::ConstructPrefixSet(FallibleTArray<mozilla::safebrowsing::AddPrefix>&) toolkit/components/url-classifier/LookupCache.cpp 3 XUL mozilla::safebrowsing::LookupCache::Build(FallibleTArray<mozilla::safebrowsing::AddPrefix>&, FallibleTArray<mozilla::safebrowsing::AddComplete>&) toolkit/components/url-classifier/LookupCache.cpp 4 XUL mozilla::safebrowsing::Classifier::ApplyTableUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*, nsACString_internal const&) toolkit/components/url-classifier/Classifier.cpp 5 XUL mozilla::safebrowsing::Classifier::ApplyUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*) toolkit/components/url-classifier/Classifier.cpp 6 XUL nsUrlClassifierDBServiceWorker::FinishUpdate() toolkit/components/url-classifier/nsUrlClassifierDBService.cpp 7 XUL nsRunnableMethodImpl<tag_nsresult (nsIUrlClassifierDBService::*)(), void, true>::Run() obj-firefox/x86_64/dist/include/nsThreadUtils.h 8 XUL nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp 9 XUL NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp 10 XUL mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp 11 XUL MessageLoop::Run() ipc/chromium/src/base/message_loop.cc 12 XUL nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp 13 libnss3.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c Ø 14 libsystem_pthread.dylib libsystem_pthread.dylib@0x1898 Ø 15 libsystem_pthread.dylib libsystem_pthread.dylib@0x1729 Ø 16 libsystem_pthread.dylib libsystem_pthread.dylib@0x5fc8 17 libnss3.dylib libnss3.dylib@0x10668f
This crash trace is unusable, but the Windows ones are more helpful: https://crash-stats.mozilla.com/report/index/de43cbdf-d8ea-4add-9c75-703d32140725 http://hg.mozilla.org/releases/mozilla-release/annotate/529a45c94e5a/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#l118 http://hg.mozilla.org/releases/mozilla-release/annotate/529a45c94e5a/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.h#l64 mDeltas is fallible but the code doesn't deal with that anywhere.
Created attachment 8465466 [details] [diff] [review] Patch 1. Make core arrays unfallible If we don't have the memory to store the core SafeBrowsing data, we can't really do anything except silently disable the feature. I don't think we want that (because then malware can just try to OOM us), so these have to be infallible. They should always have been, but the big rewrite to the new storage engine added reading them directly instead of into a temporary (fallible) copy and the code using these arrays wasn't updated.
Comment on attachment 8465466 [details] [diff] [review] Patch 1. Make core arrays unfallible Review of attachment 8465466 [details] [diff] [review]: ----------------------------------------------------------------- What's the big rewrite to the new storage engine? Also, we should figure out what to do with the phishing components sometime in both toolkit and firefox. Technically the bug description for Core::DOM::Security module reads "Native content-based security features including: Content Security Policy (CSP), Mixed Content Blocker (MCB), and Safe Browsing."
>What's the big rewrite to the new storage engine? Bug 673470. We used to rely exclusively on SQLite, which lead to a 60M DB and having to cache it entirely in memory for decent performance. I originally added UrlClassifierPrefixSet to reduce the in-memory part to 1.2M, then eventually got rid of SQLite entirely.
Created attachment 8465623 [details] [diff] [review] Patch 2. Replace big array by array of arrays Comments such as https://bugzilla.mozilla.org/show_bug.cgi?id=995782#c5 and the continuous appearance of SafeBrowsing bugs that involve OOMs with 1-2M allocations even with free virtual/physical memory suggest that making a >1M infallible allocation in Firefox is sufficiently problematic that we shouldn't do it. This is, to me, a sad and strange state of affairs. This patch reworks the ~1.2M deltas array into an array of arrays, matching the actual datastructure, which means it's about 6000 arrays of 100 2-byte elements each. That's obviously not the most memory optimal new layout, but it's what I could reasonably do without making this code significantly more complex, introducing a huge risk of new bugs and likely writing backing segmented array classes - for which I quite frankly don't have time. So my question now is, does this actually improve the situation over a singular contiguous array of about 1.2M? The actual memory usage jumps from about 1.2M to 1.7M, but it will now work if no more than ~200 byte chunks of continuous memory are available. I worry that this may just make the heap fragmentation even worse. mmc r? for checking the algorithmic changes, dmajor/bsmedberg/anyone appropriate f? for the issue of >1M allocations being such a problem and some advice if this patch is a good idea or not.
jemalloc allocates in aligned chunks of 1 MiB, then hands out smaller arenas from those chunks, so I wouldn't expect this to make fragmentation worse. With any luck those 6000 arrays will be handed out from existing chunks. The reason you're ssing it use more memory in practice is probably because jemalloc rounds up to the next available size class, which for a 200-byte array is probably 256 bytes (and there's more administrative overhead in jemalloc). IIUC the 1.2MiB allocation probably used a huge allocation, which actually rounds up to the next MiB (though on Windows, we decommit the unused parts).
Comment on attachment 8465623 [details] [diff] [review] Patch 2. Replace big array by array of arrays > dmajor/bsmedberg/anyone appropriate f? for the issue of >1M allocations > being such a problem and some advice if this patch is a good idea or not. In general, yes, >1M allocations are a problem on 32-bit builds. At high memory usage, most of the "holes" of free space are 1MB-sized regions that jemalloc has deallocated. According to the diagram in jemalloc.c, 200 bytes isn't a horrible size from a slop perspective, but Nick should have a look for other gotchas.
I can tune the size of the subarrays, within reason, say between 50 and 500 elements, if that improves things or moves us to an optimum. As you can see in the code it's nsTArray<nsTArray<uint16_t> >. >The reason you're ssing it use more memory in practice.. I was expecting that if only for the reason that every 100 entries now require an extra pointer. >IIUC the 1.2MiB allocation probably used a huge allocation, which actually rounds up to the next MiB (though on Windows, we decommit the unused parts). That' surprising, because: http://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/LookupCache.cpp?from=LookupCache.cpp&case=true#635 http://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp?from=nsUrlClassifierPrefixSet.cpp&case=true#263 report non-rounded up values. Our memory reporting doesn't take slack into account?
> Our memory reporting doesn't take slack into account? It does. The 1 MiB+ cases are a bit weird. The amount of address space allocated is a multiple of 1 MiB, but only the requested size gets committed. So the memory reporters only report that committed size (which is the requested size rounded up to the nearest page size).
So this patch should actually reduce the VM memory usage but make the physical memory situation slightly worse. Funny. I guess I can tune the subarrays then by just looking at what results in the lowest memory usage, rather than trying to determine it by counting and then hitting the worst case when I'm off-by-one.
Comment on attachment 8465623 [details] [diff] [review] Patch 2. Replace big array by array of arrays Review of attachment 8465623 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp @@ +317,5 @@ > + uint32_t numInDelta = i == indexSize - 1 ? deltaSize - indexStarts[i] > + : indexStarts[i + 1] - indexStarts[i]; > + mIndexDeltas[i].SetLength(numInDelta); > + mTotalPrefixes += numInDelta; > + toRead = numInDelta*sizeof(uint16_t); spaces around * ::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.h @@ +61,1 @@ > // array containing deltas from indices. This comment is out of date.
Comment on attachment 8465623 [details] [diff] [review] Patch 2. Replace big array by array of arrays Review of attachment 8465623 [details] [diff] [review]: ----------------------------------------------------------------- My feedback: measure the new memory usage via about:memory. Try not to make it much bigger than it was previously. There's obviously a trade-off here between total memory usage and maximum allocation size. I don't have a good sense what kind of reduction in virtual OOMs this is likely to cause.
I made the arrays 120 elements (240 bytes), which seems a good compromise between minimal slop and not accidentally hitting the worst case when we need to store an extra pointer, length field and/or go 64-bit. Memory usage (for the arrays) increased by ~6%, which seems very acceptable for a big win in dealing with VM fragmentation.
I just crashed with this stack, using Firefox32beta :( https://crash-stats.mozilla.com/report/index/d5534af1-1a30-4300-ad7a-60cc92140820
This appears to be quite low volume and a Mac OS X specific signature, so I'm not too keen in uplifting aggressively.
Gian-Carlo, is there any particular way we can verify this bug or add tests for it, other than having a look at crash-stats? Thanks!
I think crash stats are your most reasonable bet.
Looking in Socorro  for crashes with this signature over the past 4 weeks, I see that there are 6 crashes on Firefox 34 Beta, and 2 crashes on Firefox 35 Aurora (0 crashes on 36 Nightly). Given the very low number of crashes I think we can call this fixed. Please reopen if you think otherwise.  https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=nsUrlClassifierPrefixSet%3A%3AMakePrefixSet%28unsigned+int+const%2A%2C+unsigned+int%29#tab-sigsummary