Closed Bug 1100501 Opened 10 years ago Closed 10 years ago

Intermittent WinXP mochitest PROCESS-CRASH | Main app process exited normally | application crashed [@ moz_abort] with base::StatisticsRecorder::~StatisticsRecorder() on the stack

Categories

(Core :: IPC, defect)

All
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
e10s + ---
firefox37 --- wontfix
firefox38 --- fixed
firefox38.0.5 --- fixed
firefox39 --- fixed
firefox40 --- fixed
firefox-esr31 --- unaffected
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: RyanVM, Assigned: jimm)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure)

Attachments

(2 files, 2 obsolete files)

03:56:56 WARNING - PROCESS-CRASH | Main app process exited normally | application crashed [@ moz_abort] 03:56:56 INFO - Crash dump filename: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpdmrkh6.mozrunner\minidumps\ce103272-cc9b-49fc-b3a8-c57e7ee359f2.dmp 03:56:56 INFO - Operating system: Windows NT 03:56:56 INFO - 5.1.2600 Service Pack 3 03:56:56 INFO - CPU: x86 03:56:56 INFO - GenuineIntel family 6 model 30 stepping 5 03:56:56 INFO - 8 CPUs 03:56:56 INFO - Crash reason: EXCEPTION_BREAKPOINT 03:56:56 INFO - Crash address: 0x9e3182 03:56:56 INFO - Assertion: Unknown assertion type 0x00000000 03:56:56 INFO - Thread 0 (crashed) 03:56:56 INFO - 0 mozglue.dll!moz_abort [jemalloc_config.c:085e5a323eed : 33 + 0x34] 03:56:56 INFO - eip = 0x009e3182 esp = 0x0012f4f0 ebp = 0x0012f56c ebx = 0x00000001 03:56:56 INFO - esi = 0x00b000f4 edi = 0x00b000f4 eax = 0x00000000 ecx = 0x00910ad9 03:56:56 INFO - edx = 0x009d0ea0 efl = 0x00000216 03:56:56 INFO - Found by: given as instruction pointer in context 03:56:56 INFO - 1 mozglue.dll!arena_avail_tree_remove [jemalloc.c:085e5a323eed : 3256 + 0x14f] 03:56:56 INFO - eip = 0x009e3ae4 esp = 0x0012f4f4 ebp = 0x0012f56c 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 2 mozglue.dll!arena_run_dalloc [jemalloc.c:085e5a323eed : 3847 + 0x14] 03:56:56 INFO - eip = 0x009e6334 esp = 0x0012f574 ebp = 0x0012f590 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 3 mozglue.dll!arena_dalloc_small [jemalloc.c:085e5a323eed : 4616 + 0x8] 03:56:56 INFO - eip = 0x009e5160 esp = 0x0012f598 ebp = 0x0012f5b4 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 4 mozglue.dll!arena_dalloc [jemalloc.c:085e5a323eed : 4719 + 0xc] 03:56:56 INFO - eip = 0x009e5016 esp = 0x0012f5bc ebp = 0x0012f628 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 5 mozglue.dll!je_free [jemalloc.c:085e5a323eed : 6459 + 0x6] 03:56:56 INFO - eip = 0x009e9f6c esp = 0x0012f630 ebp = 0x0012f638 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 6 xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::_Erase(std::_Tree_node<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *>,void *> *) [xtree : 2083 + 0x6] 03:56:56 INFO - eip = 0x01040125 esp = 0x0012f640 ebp = 0x0012f650 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 7 xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::_Erase(std::_Tree_node<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *>,void *> *) [xtree : 2078 + 0x9] 03:56:56 INFO - eip = 0x01040110 esp = 0x0012f658 ebp = 0x0012f668 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 8 xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::clear() [xtree : 1540 + 0x9] 03:56:56 INFO - eip = 0x01041492 esp = 0x0012f670 ebp = 0x0012f688 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 9 xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::erase(std::_Tree_const_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> > > >,std::_Tree_const_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> > > >) [xtree : 1514 + 0x4] 03:56:56 INFO - eip = 0x01041bc2 esp = 0x0012f67c ebp = 0x0012f688 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 10 xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::~_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >() [xtree : 1192 + 0xd] 03:56:56 INFO - eip = 0x01037b57 esp = 0x0012f690 ebp = 0x0012f6a4 03:56:56 INFO - Found by: call frame info 03:56:56 INFO - 11 xul.dll!base::StatisticsRecorder::~StatisticsRecorder() [histogram.cc:085e5a323eed : 1221 + 0x14] 03:56:56 INFO - eip = 0x010382e4 esp = 0x0012f6ac ebp = 0x0012f6f0 03:56:56 INFO - Found by: call frame info
These are jemalloc aborts under ShutdownXPCOM -> ~StatisticsRecorder. StatisticsRecorder is ipc/chromium code.
Component: Telemetry → IPC
Product: Toolkit → Core
This has gone through the roof lately. Need some IPC engineer love :(
Flags: needinfo?(jmathies)
tracking-e10s: --- → ?
Flags: needinfo?(jmathies)
This code is in the Chromium IPC code, but I think it is more strictly speaking part of our telemetry infrastructure. It looks like we're crashing on a jemalloc assertion on this line: delete it->second; It is very late in shutdown, so maybe jemalloc is in some kind of shut down state already. Maybe this could be explicitly shut down earlier? I'm not sure what kind of process this is. It looks like a child process, based on the XRE_InitChildProcess in the stack, but I don't see any other evidence of any child process being created. Vladan, is there somebody who could look at this?
Flags: needinfo?(vdjeric)
Georg or Dexter, can you take a look a this e10s shutdown crash on WinXP?
Flags: needinfo?(vdjeric)
Flags: needinfo?(gfritzsche)
Flags: needinfo?(alessio.placitelli)
If we can find someone to look at this outside the e10s team that would be great. This is currently flagged for e10s triage, however as long as there are needinfos on the bug it won't show up in our triage list. If we can't find an engineer to look at this, please clear the needinfos and the e10s team will figure out what to do with it.
I don't think we will get to this soon, so clearing the needinfos for now.
Flags: needinfo?(gfritzsche)
Flags: needinfo?(alessio.placitelli)
Flags: needinfo?(alessio.placitelli)
Unfortunately I won't be able to take a look a this bug now, but will take a look as soon as I have a free cycle if it's still there.
Flags: needinfo?(alessio.placitelli)
See Also: → 1148772
See Also: → 1148956
See Also: → 1139802
Tracy, be careful about dumping really bad automation problems in the '+' bucket, something like this should get triaged by the team due to its severity.
Flags: needinfo?(twalker)
Flags: needinfo?(twalker)
Attached patch potential fix (obsolete) — — Splinter Review
Comment on attachment 8589909 [details] [diff] [review] potential fix Georg, I don't know that much about telemetry. But I'm pretty sure this gets initialized early enough, and shutdown soon enough to avoid any problems with jemalloc shutdown. See comment 135 for a brief explanation of the problem.
Attachment #8589909 - Flags: review?(gfritzsche)
Assignee: nobody → jmathies
Comment on attachment 8589909 [details] [diff] [review] potential fix Review of attachment 8589909 [details] [diff] [review]: ----------------------------------------------------------------- This seems reasonable, let's hope that this was the issue. Telemetry should be usable after this triggered: NS_GENERIC_FACTORY_SINGLETON_CONSTRUCTOR(nsITelemetry, TelemetryImpl::CreateTelemetryInstance) That's definitely after the StatisticsRecorder init, right? ::: toolkit/xre/nsAppRunner.cpp @@ +2972,5 @@ > #endif > > ScopedXPCOMStartup* mScopedXPCOM; > ScopedAppData* mAppData; > + base::StatisticsRecorder* mStatisticsRecorder; Maybe use a UniquePtr<StatisticsRecorder> and get rid of the explicit deletes and null-check? @@ +4207,5 @@ > gArgv = argv; > > NS_ENSURE_TRUE(aAppData, 2); > > + // A chromium thing used by our telemetry. Lets keep the original info too, that this is an initializer needed for Telemetry histogram data collection. @@ +4208,5 @@ > > NS_ENSURE_TRUE(aAppData, 2); > > + // A chromium thing used by our telemetry. > + mStatisticsRecorder = new base::StatisticsRecorder(); Shouldn't we do this in the metro startup routines too?
Flags: needinfo?(jmathies)
> @@ +4208,5 @@ > > > > NS_ENSURE_TRUE(aAppData, 2); > > > > + // A chromium thing used by our telemetry. > > + mStatisticsRecorder = new base::StatisticsRecorder(); > > Shouldn't we do this in the metro startup routines too? Not really, that code will get ripped out of mc as soon as I find the time to do it in bug 1039866.
Flags: needinfo?(jmathies)
Attachment #8589909 - Flags: review?(gfritzsche) → review-
Attached patch patch v.2 (obsolete) — — Splinter Review
> Telemetry should be usable after this triggered: > NS_GENERIC_FACTORY_SINGLETON_CONSTRUCTOR(nsITelemetry, TelemetryImpl::CreateTelemetryInstance) > > That's definitely after the StatisticsRecorder init, right? Yes, telemetry initializes when we init XPCOM. Just confirmed this.= with this build.
Attachment #8590263 - Flags: review?(gfritzsche)
Attachment #8589909 - Attachment is obsolete: true
Comment on attachment 8590263 [details] [diff] [review] patch v.2 Review of attachment 8590263 [details] [diff] [review]: ----------------------------------------------------------------- Looks good with maybe the comment addressed. ::: toolkit/xre/nsAppRunner.cpp @@ +2962,5 @@ > #endif > > + UniquePtr<ScopedXPCOMStartup> mScopedXPCOM; > + UniquePtr<base::StatisticsRecorder> mStatisticsRecorder; > + nsAutoPtr<mozilla::ScopedAppData> mAppData; Is this using nsAutoPtr for the implicit conversion to T*? If so, you could probably just use UniquePtr::get() where needed?
Attachment #8590263 - Flags: review?(gfritzsche) → review+
(In reply to Georg Fritzsche [:gfritzsche] from comment #251) > Comment on attachment 8590263 [details] [diff] [review] > patch v.2 > > Review of attachment 8590263 [details] [diff] [review]: > ----------------------------------------------------------------- > > Looks good with maybe the comment addressed. > > ::: toolkit/xre/nsAppRunner.cpp > @@ +2962,5 @@ > > #endif > > > > + UniquePtr<ScopedXPCOMStartup> mScopedXPCOM; > > + UniquePtr<base::StatisticsRecorder> mStatisticsRecorder; > > + nsAutoPtr<mozilla::ScopedAppData> mAppData; > > Is this using nsAutoPtr for the implicit conversion to T*? > If so, you could probably just use UniquePtr::get() where needed? It's not unique, we store it in a global as a raw ptr and then reference that all throughout nsAppRunner. My experience with unique_ptr is low though, so maybe there's a right way around this. Obviously auto ptr works too.
Attached patch patch (r=georg) — — Splinter Review
Attachment #8590263 - Attachment is obsolete: true
Flags: needinfo?(jmathies)
(In reply to Carsten Book [:Tomcat] from comment #270) > sorry had to back this out for test failures like > https://treeherder.mozilla.org/logviewer.html#?job_id=8696212&repo=mozilla- > inbound I'll bet xpcshell needs to init this object as well now somehwere. Will take a look today.
Flags: needinfo?(jmathies)
Attached patch xpcshell patch — — Splinter Review
Add the same init code to xpcshell.
Attachment #8590875 - Flags: review?(gfritzsche)
Attachment #8590875 - Flags: review?(gfritzsche) → review+
See Also: → 1153197
(In reply to Jim Mathies [:jimm] from comment #291) > https://treeherder.mozilla.org/#/jobs?repo=mozilla- > inbound&revision=522aa79beb09 I get the sense that this will be backed out, there's a W(4) failure on XP that doesn't have any annotations. This probably doesn't make any difference anyhow since there ia a jemalloc crash on a retrigger, very similar to these crashes just not in ~StatisticsRecorder(). Looks like we may have a problem with memory allocation somewhere during the test run.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1155127
Please nominate this for Aurora/Beta/b2g37 approval when you get a chance.
Keywords: leave-open
Target Milestone: --- → mozilla40
Comment on attachment 8590875 [details] [diff] [review] xpcshell patch Approval Request Comment [Feature/regressing bug #]: fix for random test failure on shutdown. No known regression bug. [User impact if declined]: tree managers go insane at an earlier age. [Describe test coverage new/current, TreeHerder]: on mc for a two weeks. [Risks and why]: Seems low, this mucks with how we free some chromium memory on shutdown such that it happens earlier, before xpcom shtudown. Seems like a good thing to do in general. [String/UUID change made/needed]: none Requesting for both of these csets, both of which apply cleanly to aurora: https://hg.mozilla.org/mozilla-central/rev/dd396261f189 https://hg.mozilla.org/mozilla-central/rev/522aa79beb09
Flags: needinfo?(jmathies)
Attachment #8590875 - Flags: approval-mozilla-aurora?
Comment on attachment 8590875 [details] [diff] [review] xpcshell patch Approval Request Comment [Feature/regressing bug #]: fix for random test failure on shutdown. No known regression bug. [User impact if declined]: tree managers go insane at an earlier age. [Describe test coverage new/current, TreeHerder]: on mc for a two weeks. [Risks and why]: Seems low, this mucks with how we free some chromium memory on shutdown such that it happens earlier, before xpcom shtudown. Seems like a good thing to do in general. [String/UUID change made/needed]: none Requesting for both of these csets, both of which apply cleanly to aurora: https://hg.mozilla.org/mozilla-central/rev/dd396261f189 https://hg.mozilla.org/mozilla-central/rev/522aa79beb09
Attachment #8590875 - Flags: approval-mozilla-b2g37?
Comment on attachment 8590875 [details] [diff] [review] xpcshell patch 38 is definitely affected and going ESR, so wanted there as well.
Attachment #8590875 - Flags: approval-mozilla-release?
Comment on attachment 8590875 [details] [diff] [review] xpcshell patch This has already been on m-c for a week. Let's get it on Aurora. I'm leaving the Beta call for Sylvestre. Aurora+
Attachment #8590875 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8590875 [details] [diff] [review] xpcshell patch I care about Ryan sanity, let's take it. Should be in 38 beta 7.
Attachment #8590875 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #8590875 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Depends on: 1156857
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: