Closed Bug 1276636 Opened 8 years ago Closed 8 years ago

Crash in mozilla::places::AsyncFetchAndSetIconForPage::start

Categories

(Toolkit :: Places, defect, P2)

47 Branch
x86
Windows 7
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox47 + wontfix
firefox48 + fixed
firefox49 --- fixed
firefox50 --- fixed

People

(Reporter: Gijs, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-b02e97f5-bccf-48ed-bfc1-705222160528.
=============================================================

This is on beta and aurora.

I'm wondering if the out param is somehow not happy-making, but the reports seem to be from JS calls so that doesn't make an awful lot of sense to me.
yeah, so far it's a mistery to me. event is created with new, that should be infallible and _canceler is checked with NS_ENSURE_ARG_POINTER.
The only alternative I could think of, would be to use NS_IF_ADDREF(*_canceler = event); but it should basically be the same.

As a side note, it looks like a crash on startup.
I wonder if the fact there are no crashes on 48 is just due to a limited number of users, or something has been fixed in 48, but I couldn't find anything related by a quick look.
Priority: -- → P2
(In reply to Marco Bonardo [::mak] from comment #1)
> yeah, so far it's a mistery to me. event is created with new, that should be
> infallible and _canceler is checked with NS_ENSURE_ARG_POINTER.

Is it? Where? I don't see it here:

https://hg.mozilla.org/releases/mozilla-beta/file/tip/toolkit/components/places/AsyncFaviconHelpers.cpp#l377

It wasn't part of the csets that landed for bug 1265420 so it wasn't part of the code that landed on beta.

Even so, as noted, if we're calling from JS I don't understand why the arg pointer would be unusable. :-\
Flags: needinfo?(mak77)
(In reply to :Gijs Kruitbosch from comment #2)
> Is it? Where? I don't see it here:

it's in the API calling point, http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/nsFaviconService.cpp#220
Flags: needinfo?(mak77)
Keywords: regression
The overall crash volume on this one is really low on Beta47. 12 crashes in 7 days. Unless the fix is ready in the next 8 hrs and is low risk, we might have to ship with this on Fx47. 

The risk I am taking here is when we do go to release, this crash might worsen but will it be bad enough to need a dot release, we will have to wait and watch.
All the new crashes are EXCEPTION_ACCESS_VIOLATION_WRITE which could be a security vulnerability. To the extent it seems to be at startup it's probably not exploitable -- but I did find one crash at 112 seconds and another at 226.
bp-341cc34b-bc10-4d28-826c-a13e32160529
bp-851be43e-e506-40f5-b6bc-e073f2160601

Of the 20 crashes I see there are only 5 unique people based on install time. One person crashed 10 times, another 6 times, another twice, and two people crashed once each.

There are some older crashes that might be related, and might indicate the recent changes are exposing a latent problem.

Back in 43.0 I see a use-after-free crash at nsCOMPtr_base::~nsCOMPtr_base | mozilla::places::AsyncFetchAndSetIconForPage::start 
bp-b421bf83-4689-445b-ad26-056282160602
(maybe the callback has gone away?)

In 46.0.1 I see two crashes that are a read access violation on the address 0x7f532015, which is also the top of the stack. (both seem to be startup, and the same person judging by install time.) This is in the constructor AsyncFetchAndSetIconForPage(), so maybe a getter on one of the params, and it's bogus? Maybe the callback again?
bp-e85cc0cd-4b65-435d-b13f-6002a2160521
(In reply to Marco Bonardo [::mak] from comment #1)
> I wonder if the fact there are no crashes on 48 is just due to a limited number of users, or
> something has been fixed in 48, but I couldn't find anything related by a quick look.

In the 20 crashes I found, one person crashed twice on 48.0a2
bp-905fbf3d-ff25-41a2-a09a-1364b2160525
bp-18e9bc20-6dd0-484b-b2e9-102682160526
I plan to keep this bug around for a few more weeks in case this becomes a dot-release driver/ride-along for Fx47. Otherwise, I will wontfix for Fx47 3 weeks from now.
Tracked for Fx47 (to retriage a week from now on whether this needs to be in a dot release or not).
Marco, can we get an assignee here or is this lower priority?
Flags: needinfo?(mak77)
(In reply to Jim Mathies [:jimm] from comment #10)
> Marco, can we get an assignee here or is this lower priority?

I'd be glad to fix it, but I already investigated the code and I cannot see anything wrong, unless I'm missing something about having to use NS_IF_ADDREF rather than .forget(). The only way to tell, for me, would be to be able to capture this at least once in a debugger, and I doubt it's possible since we have no STR.

Maybe someone with a greater expertise in xpcom may notice some detail I missed?

Nathan, could you please help me figuring if there's any wrong xpcom usage in the way we pass through _canceler and assign to it?
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/nsFaviconService.cpp#215
Flags: needinfo?(mak77) → needinfo?(nfroyd)
(In reply to Marco Bonardo [::mak] from comment #11)
> Nathan, could you please help me figuring if there's any wrong xpcom usage
> in the way we pass through _canceler and assign to it?
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/
> nsFaviconService.cpp#215

There's nothing wrong with the xpcom usage.

There is something peculiar about the address we're trying to assign to in Gijs's original report, though; the address lies within the text segment of libxul, which seems a bit of an odd place.  That address was fetched off the stack though; I'm not sure whether that means:

* somebody corrupted the stack;
* somebody didn't properly update the stack location; or
* the compiler has bad codegen here.

I would think that if it was the third option, we'd be seeing a lot more crashes?  Unless this is just uncommonly executed code, of course...

I don't have time at the moment to undertake a detailed analysis of the assembly and figure out whether the compiler is indeed getting things from the correct location.
Flags: needinfo?(nfroyd)
Blocks: 1279208
Blocks: 1283067
Version: unspecified → 47 Branch
No crashes showing up here for 48 in the last 3 weeks. There is one remaining crash in 47. Marking this Worksforme since we don't have a clear idea of the problem or if something we did fixed it.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.