Closed Bug 1046038 Opened 10 years ago Closed 10 years ago

crash in nsUrlClassifierPrefixSet::MakePrefixSet(unsigned int const*, unsigned int)

Categories

(Toolkit :: Safe Browsing, defect)

Platform: All
OS: macOS
Priority: Not set
Severity: critical

Tracking

Status: VERIFIED FIXED
Target Milestone: mozilla34
Tracking Status: firefox34 --- verified

People

(Reporter: cbook, Assigned: gcp)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is report bp-2f54f011-7dfc-4342-8fbf-1116c2140730.
=============================================================

nightly was running in the background and crashed

Crashing Thread
Frame 	Module 	Signature 	Source
0 	XUL 	nsUrlClassifierPrefixSet::MakePrefixSet(unsigned int const*, unsigned int) 	obj-firefox/x86_64/dist/include/nsTArray.h
1 	XUL 	nsUrlClassifierPrefixSet::SetPrefixes(unsigned int const*, unsigned int) 	toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
2 	XUL 	mozilla::safebrowsing::LookupCache::ConstructPrefixSet(FallibleTArray<mozilla::safebrowsing::AddPrefix>&) 	toolkit/components/url-classifier/LookupCache.cpp
3 	XUL 	mozilla::safebrowsing::LookupCache::Build(FallibleTArray<mozilla::safebrowsing::AddPrefix>&, FallibleTArray<mozilla::safebrowsing::AddComplete>&) 	toolkit/components/url-classifier/LookupCache.cpp
4 	XUL 	mozilla::safebrowsing::Classifier::ApplyTableUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*, nsACString_internal const&) 	toolkit/components/url-classifier/Classifier.cpp
5 	XUL 	mozilla::safebrowsing::Classifier::ApplyUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*) 	toolkit/components/url-classifier/Classifier.cpp
6 	XUL 	nsUrlClassifierDBServiceWorker::FinishUpdate() 	toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
7 	XUL 	nsRunnableMethodImpl<tag_nsresult (nsIUrlClassifierDBService::*)(), void, true>::Run() 	obj-firefox/x86_64/dist/include/nsThreadUtils.h
8 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
9 	XUL 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp
10 	XUL 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
11 	XUL 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
12 	XUL 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp
13 	libnss3.dylib 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c
Ø 14 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x1898 	
Ø 15 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x1729 	
Ø 16 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x5fc8 	
17 	libnss3.dylib 	libnss3.dylib@0x10668f
Component: Phishing Protection → DOM: Security
Product: Toolkit → Core
Component: DOM: Security → Phishing Protection
Product: Core → Toolkit
If we don't have the memory to store the core SafeBrowsing data, we can't really do anything except silently disable the feature. I don't think we want that (because then malware can just try to OOM us), so these have to be infallible.

They should always have been infallible, but the big rewrite to the new storage engine started reading data directly into these arrays instead of into a temporary (fallible) copy, and the code using them wasn't updated.
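To illustrate the fallible/infallible distinction at stake here, a minimal sketch in plain C++ (not the Gecko nsTArray/FallibleTArray API; the function names are made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <new>  // std::nothrow

// Fallible: the caller is told when allocation fails and can back out,
// e.g. abort a SafeBrowsing update and keep the previous data.
bool CopyPrefixesFallible(const uint32_t* aPrefixes, size_t aCount,
                          uint32_t** aOut) {
  uint32_t* buf = new (std::nothrow) uint32_t[aCount];
  if (!buf) {
    return false;  // out of memory: report an error instead of crashing
  }
  for (size_t i = 0; i < aCount; ++i) {
    buf[i] = aPrefixes[i];
  }
  *aOut = buf;
  return true;
}

// Infallible: allocation failure is treated as unrecoverable and the process
// aborts. This is the behaviour the core arrays are switched to here, since
// silently dropping them would disable SafeBrowsing entirely.
uint32_t* CopyPrefixesInfallible(const uint32_t* aPrefixes, size_t aCount) {
  uint32_t* buf = new uint32_t[aCount];  // throws/aborts on OOM
  for (size_t i = 0; i < aCount; ++i) {
    buf[i] = aPrefixes[i];
  }
  return buf;
}
```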
Assignee: nobody → gpascutto
Status: NEW → ASSIGNED
Attachment #8465466 - Flags: review?(mmc)
Comment on attachment 8465466 [details] [diff] [review]
Patch 1. Make core arrays infallible

Review of attachment 8465466 [details] [diff] [review]:
-----------------------------------------------------------------

What's the big rewrite to the new storage engine?

Also, we should figure out at some point what to do with the phishing components in both Toolkit and Firefox. Technically, the component description for the Core :: DOM: Security module reads "Native content-based security features including: Content Security Policy (CSP), Mixed Content Blocker (MCB), and Safe Browsing."
Attachment #8465466 - Flags: review?(mmc) → review+
>What's the big rewrite to the new storage engine?

Bug 673470. We used to rely exclusively on SQLite, which led to a 60M DB and having to cache it entirely in memory for decent performance. I originally added UrlClassifierPrefixSet to reduce the in-memory part to 1.2M, then eventually got rid of SQLite entirely.
Comments such as https://bugzilla.mozilla.org/show_bug.cgi?id=995782#c5 and the steady stream of SafeBrowsing bugs involving OOMs on 1-2M allocations even with free virtual/physical memory suggest that making a >1M infallible allocation in Firefox is problematic enough that we shouldn't do it. This is, to me, a sad and strange state of affairs.

This patch reworks the ~1.2M deltas array into an array of arrays, matching the actual data structure, which means it's about 6000 arrays of 100 2-byte elements each. That's obviously not the most memory-optimal new layout, but it's what I could reasonably do without making this code significantly more complex, introducing a huge risk of new bugs, and likely having to write backing segmented-array classes, for which I quite frankly don't have time.

So my question now is: does this actually improve the situation over a single contiguous array of about 1.2M? The actual memory usage jumps from about 1.2M to 1.7M, but it will now work even when only ~200-byte chunks of contiguous memory are available. I worry that this may just make heap fragmentation even worse.

mmc r? for checking the algorithmic changes, dmajor/bsmedberg/anyone appropriate f? for the issue of >1M allocations being such a problem and some advice if this patch is a good idea or not.
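Schematically, the layout change looks like this (a sketch only: std::vector stands in for nsTArray, and the member names approximate the real ones rather than being copied from the patch):

```cpp
#include <cstdint>
#include <vector>

// Before: one contiguous deltas array of roughly 600k uint16_t (~1.2 MB),
// which requires a single >1 MB contiguous allocation.
struct PrefixSetOld {
  std::vector<uint32_t> mIndexPrefixes;  // first full 32-bit prefix of each run
  std::vector<uint32_t> mIndexStarts;    // offset of each run inside mDeltas
  std::vector<uint16_t> mDeltas;         // all 16-bit deltas in one block
};

// After: one small deltas array per index entry, i.e. ~6000 arrays of ~100
// 2-byte elements each, so no individual allocation is larger than a few
// hundred bytes (at the cost of ~6000 extra array headers/pointers).
struct PrefixSetNew {
  std::vector<uint32_t> mIndexPrefixes;             // first full prefix of each run
  std::vector<std::vector<uint16_t>> mIndexDeltas;  // deltas belonging to that run
};
```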
Attachment #8465623 - Flags: review?(mmc)
Attachment #8465623 - Flags: feedback?(dmajor)
Attachment #8465623 - Flags: feedback?(benjamin)
jemalloc allocates in aligned chunks of 1 MiB, then hands out smaller arenas from those chunks, so I wouldn't expect this to make fragmentation worse. With any luck those 6000 arrays will be handed out from existing chunks. The reason you're seeing it use more memory in practice is probably that jemalloc rounds up to the next available size class, which for a 200-byte array is probably 256 bytes (and there's more administrative overhead in jemalloc). IIUC the 1.2 MiB allocation probably used a huge allocation, which actually rounds up to the next MiB (though on Windows, we decommit the unused parts).
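A back-of-the-envelope check of that rounding (a sketch; the 256-byte size class is the guess from the comment above, the ~6000 × ~200-byte figures come from the patch description, and the 8-byte outer-array pointer is an assumption for a 64-bit build):

```cpp
#include <cstdio>

int main() {
  const size_t kArrays    = 6000;  // ~number of inner delta arrays
  const size_t kPayload   = 200;   // ~100 uint16_t deltas per array
  const size_t kSizeClass = 256;   // assumed size class for a ~200-byte request
  const size_t kOuterPtr  = 8;     // assumed per-entry cost in the outer array (64-bit)

  size_t payload   = kArrays * kPayload;                  // 1,200,000 B (~1.14 MiB)
  size_t allocated = kArrays * (kSizeClass + kOuterPtr);  // 1,584,000 B (~1.51 MiB)

  printf("payload   %zu B (~%.2f MiB)\n", payload,   payload   / (1024.0 * 1024.0));
  printf("allocated %zu B (~%.2f MiB)\n", allocated, allocated / (1024.0 * 1024.0));
  // The difference is size-class slop plus per-array bookkeeping, which is in
  // the same ballpark as the observed jump from ~1.2 MB to ~1.7 MB.
  return 0;
}
```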
Comment on attachment 8465623 [details] [diff] [review]
Patch 2. Replace big array by array of arrays

> dmajor/bsmedberg/anyone appropriate f? for the issue of >1M allocations 
> being such a problem and some advice if this patch is a good idea or not.

In general, yes, >1M allocations are a problem on 32-bit builds. At high memory usage, most of the "holes" of free space are 1MB-sized regions that jemalloc has deallocated.

According to the diagram in jemalloc.c, 200 bytes isn't a horrible size from a slop perspective, but Nick should have a look for other gotchas.
Attachment #8465623 - Flags: feedback?(n.nethercote)
Attachment #8465623 - Flags: feedback?(dmajor)
Attachment #8465623 - Flags: feedback?(benjamin)
I can tune the size of the subarrays, within reason, say between 50 and 500 elements, if that improves things or moves us to an optimum. As you can see in the code it's nsTArray<nsTArray<uint16_t> >.

>The reason you're seeing it use more memory in practice...

I was expecting that, if only because every 100 entries now require an extra pointer.

>IIUC the 1.2MiB allocation probably used a huge allocation, which actually rounds up to the next MiB (though on Windows, we decommit the unused parts).

That's surprising, because:

http://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/LookupCache.cpp?from=LookupCache.cpp&case=true#635
http://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp?from=nsUrlClassifierPrefixSet.cpp&case=true#263

report non-rounded up values. Our memory reporting doesn't take slack into account?
> Our memory reporting doesn't take slack into account?

It does. The 1 MiB+ cases are a bit weird. The amount of address space allocated is a multiple of 1 MiB, but only the requested size gets committed. So the memory reporters only report that committed size (which is the requested size rounded up to the nearest page size).
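Concretely, for a single ~1.2 MB deltas allocation that works out roughly as follows (a sketch assuming 4 KiB pages and the 1 MiB rounding described above):

```cpp
#include <cstdio>

int main() {
  const size_t kRequest = 1200000;      // ~1.2 MB contiguous deltas array
  const size_t kPage    = 4096;         // assumed page size
  const size_t kChunk   = 1024 * 1024;  // huge allocations round address space up to 1 MiB

  size_t reserved  = ((kRequest + kChunk - 1) / kChunk) * kChunk;  // 2 MiB of address space
  size_t committed = ((kRequest + kPage  - 1) / kPage)  * kPage;   // 1,200,128 B committed/reported

  printf("address space reserved: %zu B\n", reserved);
  printf("memory committed/reported: %zu B\n", committed);
  return 0;
}
```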
So this patch should actually reduce the VM memory usage but make the physical memory situation slightly worse. Funny. I guess I can tune the subarrays then by just looking at what results in the lowest memory usage, rather than trying to determine it by counting and then hitting the worst case when I'm off-by-one.
Comment on attachment 8465623 [details] [diff] [review]
Patch 2. Replace big array by array of arrays

Review of attachment 8465623 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
@@ +317,5 @@
> +        uint32_t numInDelta = i == indexSize - 1 ? deltaSize - indexStarts[i]
> +                                                 : indexStarts[i + 1] - indexStarts[i];
> +        mIndexDeltas[i].SetLength(numInDelta);
> +        mTotalPrefixes += numInDelta;
> +        toRead = numInDelta*sizeof(uint16_t);

spaces around *

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.h
@@ +61,1 @@
>    // array containing deltas from indices.

This comment is out of date.
Attachment #8465623 - Flags: review?(mmc) → review+
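For context, the hunk quoted above rebuilds the per-index delta runs from a flat delta stream; an illustrative version of that logic, with std::vector standing in for nsTArray and the stream read replaced by copying from an in-memory array, might look like this (a sketch, not the actual patch):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rebuild the per-index delta arrays from a flat delta stream: indexStarts[i]
// is the offset of run i inside the flat stream, so the run length is the
// distance to the next start, or to the end of the stream for the last run.
void SplitDeltas(const std::vector<uint16_t>& aFlatDeltas,
                 const std::vector<uint32_t>& aIndexStarts,
                 std::vector<std::vector<uint16_t>>& aIndexDeltas,
                 uint32_t& aTotalPrefixes) {
  uint32_t deltaSize = static_cast<uint32_t>(aFlatDeltas.size());
  size_t indexSize = aIndexStarts.size();
  aIndexDeltas.resize(indexSize);
  for (size_t i = 0; i < indexSize; ++i) {
    uint32_t numInDelta = (i == indexSize - 1)
                              ? deltaSize - aIndexStarts[i]
                              : aIndexStarts[i + 1] - aIndexStarts[i];
    aIndexDeltas[i].assign(aFlatDeltas.begin() + aIndexStarts[i],
                           aFlatDeltas.begin() + aIndexStarts[i] + numInDelta);
    aTotalPrefixes += numInDelta;
  }
}
```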
Comment on attachment 8465623 [details] [diff] [review]
Patch 2. Replace big array by array of arrays

Review of attachment 8465623 [details] [diff] [review]:
-----------------------------------------------------------------

My feedback: measure the new memory usage via about:memory. Try not to make it much bigger than it was previously. There's obviously a trade-off here between total memory usage and maximum allocation size. I don't have a good sense of what kind of reduction in virtual OOMs this is likely to cause.
Attachment #8465623 - Flags: feedback?(n.nethercote)
I made the arrays 120 elements (240 bytes), which seems like a good compromise between minimal slop and not accidentally hitting the worst case if we ever need to store an extra pointer or length field and/or go 64-bit. Memory usage (for the arrays) increased by ~6%, which seems very acceptable for a big win in dealing with VM fragmentation.
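The arithmetic behind that choice, roughly (a sketch; the 8-byte inner-array header and the 256-byte size class are assumptions carried over from the discussion above, not measurements):

```cpp
#include <cstdio>

int main() {
  const size_t kElements  = 120;  // chosen subarray length
  const size_t kElemSize  = 2;    // uint16_t deltas
  const size_t kHeader    = 8;    // assumed in-allocation nsTArray header (length + capacity)
  const size_t kSizeClass = 256;  // assumed size class just above that

  size_t payload    = kElements * kElemSize;    // 240 B
  size_t allocation = payload + kHeader;        // 248 B, still below 256
  size_t headroom   = kSizeClass - allocation;  // 8 B of slack per subarray

  printf("payload %zu B, allocation %zu B, headroom %zu B\n",
         payload, allocation, headroom);
  return 0;
}
```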
https://hg.mozilla.org/mozilla-central/rev/25c173040e1f
https://hg.mozilla.org/mozilla-central/rev/9d24ecc84a50
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
This appears to be quite low volume and a Mac OS X-specific signature, so I'm not too keen on uplifting aggressively.
Gian-Carlo, is there any particular way we can verify this bug or add tests for it, other than having a look at crash-stats? Thanks!
Flags: needinfo?(gpascutto)
Flags: qe-verify?
I think crash stats are your most reasonable bet.
Flags: needinfo?(gpascutto)
Flags: qe-verify? → qe-verify+
Looking in Socorro [1] for crashes with this signature over the past 4 weeks, I see that there are 6 crashes on Firefox 34 Beta and 2 crashes on Firefox 35 Aurora (0 crashes on 36 Nightly).

Given the very low number of crashes I think we can call this fixed. Please reopen if you think otherwise.

[1] https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=nsUrlClassifierPrefixSet%3A%3AMakePrefixSet%28unsigned+int+const%2A%2C+unsigned+int%29#tab-sigsummary
Status: RESOLVED → VERIFIED
See Also: → 1199617