Closed Bug 971888 Opened 10 years ago Closed 10 years ago

nsComponentManagerImpl::RegisterModule calling into CC_AbortIfNull(void*) and crashing

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox28 - unaffected

People

(Reporter: mccr8, Unassigned)

Details

(Keywords: crash, topcrash-win)

Crash Data

This is apparently a top crash on windows. This stack looks pretty bogus, so I'm going to go out on a limb and suggest that a binary component is doing something bad.  Either that or there's a null-checked MOZ_CRASH in here somewhere, and PGO is being clever and merging together two implementations of it.

CC_AbortIfNull(void *)
nsComponentManagerImpl::RegisterModule(mozilla::Module const *,mozilla::FileLocation *)
nsComponentManagerImpl::ManifestBinaryComponent(nsComponentManagerImpl::ManifestProcessingContext &,int,char * const *)
ParseManifest(NSLocationType,mozilla::FileLocation &,char *,bool)
Crash Signature: [@ CC_AbortIfNull(void*) ]
Here's an example crash report:
https://crash-stats.mozilla.com/report/index/16a6f9dc-1208-400c-9ab6-bb51e2140211

I looked at 10 or so reports, and the top of the stack looked like this for all of them.
Could the presence of this signature on the top-crash list be a result of the gigantic spike (776!) of this crash on 0205 builds?  I think that's just another symptom of the media-related OOM bugs that seem to be crowding the top 10 crashes right now.  If that's really all the top crash is, then that should be fixed.
(In reply to Andrew McCreight [:mccr8] from comment #1)
> Here's an example crash report:
> https://crash-stats.mozilla.com/report/index/16a6f9dc-1208-400c-9ab6-
> bb51e2140211

The compiler folded some MOZ_CRASHes together and that confused both crash-stats and my windbg (in different ways). Here is what I've reconstructed by hand from the minidump.

xul!ProcessSelectorMatches
xul!nsComponentManagerImpl::RegisterCIDEntryLocked
xul!nsComponentManagerImpl::RegisterModule
xul!nsComponentManagerImpl::ManifestBinaryComponent
xul!ParseManifest

We crash at the default case of ProcessSelectorMatches [1]. If these leftover registers are correct, then |selector| was some completely bogus value (0x724fc010).

[1] http://mxr.mozilla.org/mozilla-central/source/xpcom/components/nsComponentManager.cpp#433
This is a change to the binary layout of CIDEntry and/or ContractIDEntry from bug 966476. This means that any binary addons will need to be recompiled to avoid crashing.


The only simple way to fix this is to bump the Module::kVersion, which normally matches the train version. I don't know if it would screw up automated version bumps to change that now. NI?release-mgmt
Flags: needinfo?(release-mgmt)
Which versions are affected here? 
Where the layout changes should be applied?
The change landed in bug 966476. The question is whether we should bump the mozilla::Module version or whether that would confuse the release management uplift script which normally bumps the version automatically. If we don't bump the version we're likely to continue seeing these crashes on Nightly until

1) the extension is updated
2) users disable the extension
3) next uplift in 5 weeks
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #6)
> The change landed in bug 966476. 
I think that was meant to be bug 966467.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #6)
> The change landed in bug 966476. The question is whether we should bump the
> mozilla::Module version or whether that would confuse the release management
> uplift script which normally bumps the version automatically. If we don't
> bump the version we're likely to continue seeing these crashes on Nightly
> until
> 
> 1) the extension is updated
> 2) users disable the extension
> 3) next uplift in 5 weeks

We're handing off merge script to Release Engineering -- https://github.com/mozilla/relman-tools/blob/master/merge-day-scripts/merge_helper.py#L61 has the files we do version bumping on, which one is this one for?  I don't see a Module::kVersion there.  That being said, since this will be fixed in 5 weeks, we know why this crash is happening, and it is only an issue on Nightly, I lean towards leaving it alone and not doing an early version bump.
Flags: needinfo?(release-mgmt) → needinfo?(benjamin)
https://github.com/mozilla/relman-tools/blob/3bcec2acb20e303c81c2df743a83dd7423d29147/merge-day-scripts/merge_helper.py#L66 is the line which does the version bump.

ok, let's leave the code as it is. Kairo can you check addon/module correlations to see if there are addon authors we can ping to recompile their XPCOM extensions?
Flags: needinfo?(benjamin) → needinfo?(kairo)
Correlations for Firefox 28.0b Windows NT

Modules with >10% margin:
42% (90/215) vs.  19% (13581/72370) EhStorShell.dll
63% (136/215) vs.  40% (29097/72370) dhcpcsvc.dll
61% (132/215) vs.  39% (27933/72370) cscapi.dll
45% (96/215) vs.  22% (16216/72370) WindowsCodecs.dll
51% (110/215) vs.  29% (21021/72370) dhcpcsvc6.DLL
65% (139/215) vs.  43% (31004/72370) ntshrui.dll
54% (116/215) vs.  32% (23276/72370) slc.dll
62% (134/215) vs.  41% (29467/72370) linkinfo.dll
37% (80/215) vs.  16% (11651/72370) quartz.dll
37% (80/215) vs.  16% (11804/72370) cscui.dll
37% (80/215) vs.  16% (11823/72370) cscdll.dll

Most of those sounds like Windows DLLs from looking at their version numbers.


Add-ons:
10% (21/215) vs.   0% (142/72370) {6C8B07BF-0F6D-4EA4-B96F-FF1CCBAAE553}

This seems to be "FastestTube - YouTube Video Downloader" - but that's all I see in correlations at all here.
Flags: needinfo?(kairo)
Nominating for tracking since this is #15 in Firefox 28 currently.
This bug is not in Firefox 28 at all. Perhaps you're looking at the CC_AbortIfNull(void*) signature and thinking of another bug?
Flags: needinfo?(anthony.s.hughes)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #12)
> This bug is not in Firefox 28 at all. Perhaps you're looking at the
> CC_AbortIfNull(void*) signature and thinking of another bug?

I can see well over 1000 crashes with CC_AbortIfNull(void*) in Firefox 28:
https://crash-stats.mozilla.com/report/list?signature=CC_AbortIfNull%28void*%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&version=Firefox%3A28.0b&hang_type=any&date=2014-02-27+17%3A00%3A00&range_value=1#reports

Maybe this is a different bug, I don't know...
Flags: needinfo?(anthony.s.hughes)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #13)
> Maybe this is a different bug, I don't know...

If you look at the stacks, they are all from the cycle collector calling CC_AbortIfNull, not  nsComponentManagerImpl::RegisterModule.  Most of them had around 95% memory usage, so I think it is just a generic OOM signature.  I should just inline CC_AbortIfNull to make it less confusing.
Just to clarify further: this bug is about a specific subset of the CC_AbortIfNull signature when called from the component manager. There is another bug (bug 941606) which is about the signature when called from the cycle collector, although that bug is primarily about win64.

The crashes on beta appear to be OOM crashes where we aren't going to fix the crash itself but need to figure out why we're using memory and fix it.

Bug this bug doesn't occur on beta at all.
(In reply to Andrew McCreight [:mccr8] from comment #14)
> I should just inline CC_AbortIfNull to make it less confusing.

We also can add it to the "prefix list", which would mean that in crash signatures, the next frame is being appended with a | symbol. If you think that's a good idea, please file a bug for it in the Socorro product and we'll get that done ASAP.
Nah, it is a silly function anyways.  I have a patch in bug 977688 that I'll land today.  Thanks, though.
Coming from bp-2103adc5-718a-413a-9b51-2e1a62140422 in case you need something collected from a crashing profile...
Another one from 29.0.1 bp-c01cf251-2acd-470e-acc8-e6e442140523
Comment 18/19 are not this bug. They are bug 941606.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.