Closed
Bug 1074196
Opened 10 years ago
Closed 10 years ago
Nightly/Aurora startup crash spike in nsUrlClassifierPrefixSet::Contains(unsigned int, bool*)
Categories
(Toolkit :: Safe Browsing, defect)
Toolkit
Safe Browsing
Tracking
()
VERIFIED
FIXED
mozilla35
Tracking | Status | |
---|---|---|
firefox32 | --- | unaffected |
firefox33 | --- | unaffected |
firefox34 | + | verified |
firefox35 | + | verified |
People
(Reporter: kairo, Assigned: gcp)
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
5.99 KB,
patch
|
mmc
:
review+
lmandel
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-fd16294d-0bbf-4530-b2d4-65c2d2140929. ============================================================= Top frames: 0 xul.dll nsUrlClassifierPrefixSet::Contains(unsigned int, bool*) toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp 1 xul.dll mozilla::safebrowsing::LookupCache::Has(mozilla::safebrowsing::SafebrowsingHash<32, mozilla::safebrowsing::CompletionComparator> const&, bool*, bool*) toolkit/components/url-classifier/LookupCache.cpp 2 xul.dll mozilla::safebrowsing::Classifier::Check(nsACString_internal const&, nsACString_internal const&, nsTArray<mozilla::safebrowsing::LookupResult>&) toolkit/components/url-classifier/Classifier.cpp 3 xul.dll nsUrlClassifierDBServiceWorker::DoLookup(nsACString_internal const&, nsACString_internal const&, nsIUrlClassifierLookupCallback*) toolkit/components/url-classifier/nsUrlClassifierDBService.cpp 4 xul.dll nsUrlClassifierDBServiceWorker::HandlePendingLookups() toolkit/components/url-classifier/nsUrlClassifierDBService.cpp 5 xul.dll nsUrlClassifierDBServiceWorker::Lookup(nsIPrincipal*, nsACString_internal const&, nsIUrlClassifierCallback*) toolkit/components/url-classifier/nsUrlClassifierDBService.cpp 6 xul.dll UrlClassifierDBServiceWorkerProxy::LookupRunnable::Run() toolkit/components/url-classifier/nsUrlClassifierProxies.cpp 7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp [...] We had a few of those crashes in the days before, but this started *really* spike on 2014-09-28 (yesterday), mostly with the builds from the 27th, but builds from previous days see it as well with lower volume. This is mostly on startup, on both Nightly and Aurora, is the #1 crash by a large margin for that date. URLs are not interesting, as you's expect on startup, mostly about:blank, about:sessionrestore, about:home, etc. Looking at installs vs. crashes, it looks like on average, we have 3 crashes per install, so the same people are crashing repeatedly with this one.
Reporter | ||
Comment 1•10 years ago
|
||
For more stats and reports, see https://crash-stats.mozilla.com/report/list?signature=nsUrlClassifierPrefixSet%3A%3AContains%28unsigned%20int%2C%20bool%2A%29
Updated•10 years ago
|
In the while-loop here: http://hg.mozilla.org/mozilla-central/annotate/818f353b7aa6/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#l222 The loop spins around until it hits an unmapped page. |diff| is extremely 'negative' (e.g. 0x8071c2df, 0xabd90594). I guess it never reached exactly zero so it kept going. At the time of crash, mIndexDeltas[i][deltaIndex] is the poison value 0x5a5a, but it's not clear whether entire the array was dead, or whether we're accessing a bad index. gcp can you help?
Flags: needinfo?(gpascutto)
Assignee | ||
Comment 3•10 years ago
|
||
Would be interesting to get a "safebrowsing" folder from someone affected. It's in \Users\XXX\AppData\Local\Mozilla\Firefox\Profiles\XXXX\safebrowsing on Windows, and IIRC in a similar place in .cache on Unix. This spike happening on a certain date could mean that build changes the memory layout such that the crash is triggered, or that an update happened that causes us to write invalid data. Being limited to Aurora/Nightly is likely because bug 1046038 completely changes the memory layout. Notably, it makes the array we're running past much smaller. If there's a logic error in the code, it's not immediately obvious to me, which is why having a crashing profile would help. If you hit this bug you'd be unable to start Firefox until you blow away that directory in your profile.
Flags: needinfo?(gpascutto)
Assignee | ||
Comment 4•10 years ago
|
||
// Now search through the deltas for the target. uint32_t diff = target - mIndexPrefixes[i]; uint32_t deltaSize = mIndexDeltas[i].Length(); uint32_t deltaIndex = 0; while (diff > 0 && deltaIndex < deltaSize) { diff -= mIndexDeltas[i][deltaIndex]; deltaIndex++; } If the mIndexDelta[i][deltaIndex] is running into poison, that must mean "i" is already invalid at this point, because deltaIndex is checked against the size of the array (deltaSize) at each loop iteration. That's reinforced by for example this: https://crash-stats.mozilla.com/report/index/deaa3289-e0a9-490b-83b2-52fc62140925 which is crashing here: http://hg.mozilla.org/mozilla-central/annotate/372ce1f36116/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#l224
Comment 5•10 years ago
|
||
Can the array change while we're in this loop? The crash spike here is likely triggered by an updated safe-browsing list, not a change in binary layout or anything like that.
Updated•10 years ago
|
Assignee: nobody → gpascutto
Assignee | ||
Comment 6•10 years ago
|
||
Going into those lines, the i comes directly from BinSearch, and that can only return garbage if mIndexPrefixes.Length() is 0 but mHasPrefixes is true, which seems unlikely. The alternate explanation is that mIndexPrefixes and mIndexDeltas are of different length. I'm checking now for logic errors that could cause that to happen. MakePrefixSet looks like it can't violate that invariant, but for LoadFromFd I'm not so sure: http://hg.mozilla.org/mozilla-central/annotate/818f353b7aa6/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#l301 http://hg.mozilla.org/mozilla-central/annotate/818f353b7aa6/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#l314 If deltaSize is 0, then the mIndexDeltas array won't be set up properly. Now, constructing a prefixset that has zero deltas is hard, but it's certainly possible if on a fresh profile the first update is very small.
Assignee | ||
Comment 7•10 years ago
|
||
No idea if it fixes the crash spike but this was a real bug. I included a test to catch it.
Attachment #8497654 -
Flags: review?(mmc)
Comment 8•10 years ago
|
||
Comment on attachment 8497654 [details] [diff] [review] Correctly initialize PrefixSets with no deltas. r=mmc Review of attachment 8497654 [details] [diff] [review]: ----------------------------------------------------------------- This is a bug but I don't think it fixes the crash. It looks like in Contains() we access mIndexPrefixes[0] without checking to see if it has at least one element. That seems like a bug.
Attachment #8497654 -
Flags: review?(mmc) → review+
Assignee | ||
Comment 9•10 years ago
|
||
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #8) > This is a bug but I don't think it fixes the crash. It looks like in > Contains() we access mIndexPrefixes[0] without checking to see if it has at > least one element. What makes you believe that? As noted in comment 6, that should be caught by the mHasPrefixes check. http://hg.mozilla.org/mozilla-central/annotate/372ce1f36116/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#l197
Assignee | ||
Comment 10•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c8818fdb16c
Comment 11•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/3c8818fdb16c
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment 12•10 years ago
|
||
Setting NEEDINFO for liz to verify whether this actually goes away in crash-stats.
Comment 13•10 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #9) > (In reply to [:mmc] Monica Chew (please use needinfo) from comment #8) > > > This is a bug but I don't think it fixes the crash. It looks like in > > Contains() we access mIndexPrefixes[0] without checking to see if it has at > > least one element. > > What makes you believe that? As noted in comment 6, that should be caught by > the mHasPrefixes check. > http://hg.mozilla.org/mozilla-central/annotate/372ce1f36116/toolkit/ > components/url-classifier/nsUrlClassifierPrefixSet.cpp#l197 Re-reading again, you are right. Splitting this into multiple state variables instead of using mIndexPrefixes.Length() makes following this logic unnecessarily confusing.
Assignee | ||
Comment 14•10 years ago
|
||
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #13) > Re-reading again, you are right. Splitting this into multiple state > variables instead of using mIndexPrefixes.Length() makes following this > logic unnecessarily confusing. I fully agree, patches welcome ;-) I'll fix it up in the next iteration of the crash doesn't die down.
Comment 15•10 years ago
|
||
Thanks Benjamin. I'll have a look tomorrow and see if the crash rate is going down. If it does over the next few days, maybe this should get uplifted to 34 as well.
Flags: in-testsuite+
Comment 16•10 years ago
|
||
There are 35 crashes for the 2014100103 build of Nightly, and none on mozilla-central builds after that - they all look to be on Aurora. Looks good!
Flags: needinfo?(lhenry)
Assignee | ||
Comment 17•10 years ago
|
||
Comment on attachment 8497654 [details] [diff] [review] Correctly initialize PrefixSets with no deltas. r=mmc Approval Request Comment [Feature/regressing bug #]: Bug 1046038 [User impact if declined]: In very specific situations, startup crash that's only recoverable by clearing profile. [Describe test coverage new/current, TBPL]: Landed in m-c few days ago, adds additional tests. [Risks and why]: Very low. Clear bugfix with new test coverage. [String/UUID change made/needed]: N/A.
Attachment #8497654 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
status-firefox32:
--- → unaffected
status-firefox33:
--- → unaffected
Comment 18•10 years ago
|
||
Comment on attachment 8497654 [details] [diff] [review] Correctly initialize PrefixSets with no deltas. r=mmc Aurora+
Attachment #8497654 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 20•10 years ago
|
||
The stats in Socorro [1] for the past 4 weeks (excluding Firefox 35 prior to October 2nd and Firefox 34 prior to October 7th) show: - 1 crash in 34.0a2 (20141010004001) - 1 crash on 35.0a1 (20141012030203) - 1 crash in 35.0a2 (20141019004001) - 5 crashes in 34.0b1 (20141014134955) - 1 crash in 34.0b2 (20141020184313) There is clearly no spike anymore. I'm guessing this can be marked as Verified (unless you think the few new crashes need additional work). [1] https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=nsUrlClassifierPrefixSet%3A%3AContains%28unsigned+int%2C+bool%2A%29#tab-reports
Comment 21•10 years ago
|
||
Florin, I agree, one or two crashes is nothing like the hundreds/ thousands we were seeing before. Marking verified for 34.
Comment 22•10 years ago
|
||
32 and 33 were only showing a few crashes per week so I think they can be safely marked as wontfix.
Status: RESOLVED → VERIFIED
Comment 24•9 years ago
|
||
I am getting crashes like this with the current tree. Filed bug 1120145 to track it.
You need to log in
before you can comment on or make changes to this bug.
Description
•