Intermittent WinXP mochitest PROCESS-CRASH | Main app process exited normally | application crashed [@ moz_abort] with base::StatisticsRecorder::~StatisticsRecorder() on the stack

RESOLVED FIXED in Firefox 38

Status

()

defect
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: RyanVM, Assigned: jimm)

Tracking

(Blocks 2 bugs, {crash, intermittent-failure})

unspecified
mozilla40
All
Windows XP
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s+, firefox37 wontfix, firefox38 fixed, firefox38.0.5 fixed, firefox39 fixed, firefox40 fixed, firefox-esr31 unaffected, b2g-v2.2 fixed, b2g-master fixed)

Details

Attachments

(2 attachments, 2 obsolete attachments)

03:56:56  WARNING -  PROCESS-CRASH | Main app process exited normally | application crashed [@ moz_abort]
03:56:56     INFO -  Crash dump filename: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpdmrkh6.mozrunner\minidumps\ce103272-cc9b-49fc-b3a8-c57e7ee359f2.dmp
03:56:56     INFO -  Operating system: Windows NT
03:56:56     INFO -                    5.1.2600 Service Pack 3
03:56:56     INFO -  CPU: x86
03:56:56     INFO -       GenuineIntel family 6 model 30 stepping 5
03:56:56     INFO -       8 CPUs
03:56:56     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
03:56:56     INFO -  Crash address: 0x9e3182
03:56:56     INFO -  Assertion: Unknown assertion type 0x00000000
03:56:56     INFO -  Thread 0 (crashed)
03:56:56     INFO -   0  mozglue.dll!moz_abort [jemalloc_config.c:085e5a323eed : 33 + 0x34]
03:56:56     INFO -      eip = 0x009e3182   esp = 0x0012f4f0   ebp = 0x0012f56c   ebx = 0x00000001
03:56:56     INFO -      esi = 0x00b000f4   edi = 0x00b000f4   eax = 0x00000000   ecx = 0x00910ad9
03:56:56     INFO -      edx = 0x009d0ea0   efl = 0x00000216
03:56:56     INFO -      Found by: given as instruction pointer in context
03:56:56     INFO -   1  mozglue.dll!arena_avail_tree_remove [jemalloc.c:085e5a323eed : 3256 + 0x14f]
03:56:56     INFO -      eip = 0x009e3ae4   esp = 0x0012f4f4   ebp = 0x0012f56c
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -   2  mozglue.dll!arena_run_dalloc [jemalloc.c:085e5a323eed : 3847 + 0x14]
03:56:56     INFO -      eip = 0x009e6334   esp = 0x0012f574   ebp = 0x0012f590
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -   3  mozglue.dll!arena_dalloc_small [jemalloc.c:085e5a323eed : 4616 + 0x8]
03:56:56     INFO -      eip = 0x009e5160   esp = 0x0012f598   ebp = 0x0012f5b4
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -   4  mozglue.dll!arena_dalloc [jemalloc.c:085e5a323eed : 4719 + 0xc]
03:56:56     INFO -      eip = 0x009e5016   esp = 0x0012f5bc   ebp = 0x0012f628
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -   5  mozglue.dll!je_free [jemalloc.c:085e5a323eed : 6459 + 0x6]
03:56:56     INFO -      eip = 0x009e9f6c   esp = 0x0012f630   ebp = 0x0012f638
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -   6  xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::_Erase(std::_Tree_node<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *>,void *> *) [xtree : 2083 + 0x6]
03:56:56     INFO -      eip = 0x01040125   esp = 0x0012f640   ebp = 0x0012f650
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -   7  xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::_Erase(std::_Tree_node<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *>,void *> *) [xtree : 2078 + 0x9]
03:56:56     INFO -      eip = 0x01040110   esp = 0x0012f658   ebp = 0x0012f668
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -   8  xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::clear() [xtree : 1540 + 0x9]
03:56:56     INFO -      eip = 0x01041492   esp = 0x0012f670   ebp = 0x0012f688
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -   9  xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::erase(std::_Tree_const_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> > > >,std::_Tree_const_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> > > >) [xtree : 1514 + 0x4]
03:56:56     INFO -      eip = 0x01041bc2   esp = 0x0012f67c   ebp = 0x0012f688
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -  10  xul.dll!std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >::~_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,base::Histogram *,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,base::Histogram *> >,0> >() [xtree : 1192 + 0xd]
03:56:56     INFO -      eip = 0x01037b57   esp = 0x0012f690   ebp = 0x0012f6a4
03:56:56     INFO -      Found by: call frame info
03:56:56     INFO -  11  xul.dll!base::StatisticsRecorder::~StatisticsRecorder() [histogram.cc:085e5a323eed : 1221 + 0x14]
03:56:56     INFO -      eip = 0x010382e4   esp = 0x0012f6ac   ebp = 0x0012f6f0
03:56:56     INFO -      Found by: call frame info
Duplicate of this bug: 1119809
These are jemalloc aborts under ShutdownXPCOM -> ~StatisticsRecorder. StatisticsRecorder is ipc/chromium code.
Component: Telemetry → IPC
Product: Toolkit → Core
This has gone through the roof lately. Need some IPC engineer love :(
Flags: needinfo?(jmathies)
tracking-e10s: --- → ?
Flags: needinfo?(jmathies)
This code is in the Chromium IPC code, but I think it is more strictly speaking part of our telemetry infrastructure.  It looks like we're crashing on a jemalloc assertion on this line:
  delete it->second;
It is very late in shutdown, so maybe jemalloc is in some kind of shut down state already.  Maybe this could be explicitly shut down earlier?

I'm not sure what kind of process this is.  It looks like a child process, based on the XRE_InitChildProcess in the stack, but I don't see any other evidence of any child process being created.

Vladan, is there somebody who could look at this?
Flags: needinfo?(vdjeric)
Georg or Dexter, can you take a look a this e10s shutdown crash on WinXP?
Flags: needinfo?(vdjeric)
Flags: needinfo?(gfritzsche)
Flags: needinfo?(alessio.placitelli)
If we can find someone to look at this outside the e10s team that would be great.

This is currently flagged for e10s triage, however as long as there are needinfos on the bug it won't show up in our triage list. If we can't find an engineer to look at this, please clear the needinfos and the e10s team will figure out what to do with it.
I don't think we will get to this soon, so clearing the needinfos for now.
Flags: needinfo?(gfritzsche)
Flags: needinfo?(alessio.placitelli)
Flags: needinfo?(alessio.placitelli)
Unfortunately I won't be able to take a look a this bug now, but will take a look as soon as I have a free cycle if it's still there.
Flags: needinfo?(alessio.placitelli)
See Also: → 1148772
See Also: → 1148956
See Also: → 1139802
Tracy, be careful about dumping really bad automation problems in the '+' bucket, something like this should get triaged by the team due to its severity.
Flags: needinfo?(twalker)
Flags: needinfo?(twalker)
Posted patch potential fix (obsolete) — Splinter Review
Comment on attachment 8589909 [details] [diff] [review]
potential fix

Georg, I don't know that much about telemetry. But I'm pretty sure this gets initialized early enough, and shutdown soon enough to avoid any problems with jemalloc shutdown. See comment 135 for a brief explanation of the problem.
Attachment #8589909 - Flags: review?(gfritzsche)
Assignee: nobody → jmathies
Comment on attachment 8589909 [details] [diff] [review]
potential fix

Review of attachment 8589909 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable, let's hope that this was the issue.

Telemetry should be usable after this triggered:
NS_GENERIC_FACTORY_SINGLETON_CONSTRUCTOR(nsITelemetry, TelemetryImpl::CreateTelemetryInstance)

That's definitely after the StatisticsRecorder init, right?

::: toolkit/xre/nsAppRunner.cpp
@@ +2972,5 @@
>  #endif
>  
>    ScopedXPCOMStartup* mScopedXPCOM;
>    ScopedAppData* mAppData;
> +  base::StatisticsRecorder* mStatisticsRecorder;

Maybe use a UniquePtr<StatisticsRecorder> and get rid of the explicit deletes and null-check?

@@ +4207,5 @@
>    gArgv = argv;
>  
>    NS_ENSURE_TRUE(aAppData, 2);
>  
> +  // A chromium thing used by our telemetry.

Lets keep the original info too, that this is an initializer needed for Telemetry histogram data collection.

@@ +4208,5 @@
>  
>    NS_ENSURE_TRUE(aAppData, 2);
>  
> +  // A chromium thing used by our telemetry.
> +  mStatisticsRecorder = new base::StatisticsRecorder();

Shouldn't we do this in the metro startup routines too?
Flags: needinfo?(jmathies)
> @@ +4208,5 @@
> >  
> >    NS_ENSURE_TRUE(aAppData, 2);
> >  
> > +  // A chromium thing used by our telemetry.
> > +  mStatisticsRecorder = new base::StatisticsRecorder();
> 
> Shouldn't we do this in the metro startup routines too?

Not really, that code will get ripped out of mc as soon as I find the time to do it in bug 1039866.
Flags: needinfo?(jmathies)
Attachment #8589909 - Flags: review?(gfritzsche) → review-
Posted patch patch v.2 (obsolete) — Splinter Review
> Telemetry should be usable after this triggered:
> NS_GENERIC_FACTORY_SINGLETON_CONSTRUCTOR(nsITelemetry, TelemetryImpl::CreateTelemetryInstance)
> 
> That's definitely after the StatisticsRecorder init, right?

Yes, telemetry initializes when we init XPCOM. Just confirmed this.= with this build.
Attachment #8590263 - Flags: review?(gfritzsche)
Attachment #8589909 - Attachment is obsolete: true
Comment on attachment 8590263 [details] [diff] [review]
patch v.2

Review of attachment 8590263 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good with maybe the comment addressed.

::: toolkit/xre/nsAppRunner.cpp
@@ +2962,5 @@
>  #endif
>  
> +  UniquePtr<ScopedXPCOMStartup> mScopedXPCOM;
> +  UniquePtr<base::StatisticsRecorder> mStatisticsRecorder;
> +  nsAutoPtr<mozilla::ScopedAppData> mAppData;

Is this using nsAutoPtr for the implicit conversion to T*?
If so, you could probably just use UniquePtr::get() where needed?
Attachment #8590263 - Flags: review?(gfritzsche) → review+
(In reply to Georg Fritzsche [:gfritzsche] from comment #251)
> Comment on attachment 8590263 [details] [diff] [review]
> patch v.2
> 
> Review of attachment 8590263 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good with maybe the comment addressed.
> 
> ::: toolkit/xre/nsAppRunner.cpp
> @@ +2962,5 @@
> >  #endif
> >  
> > +  UniquePtr<ScopedXPCOMStartup> mScopedXPCOM;
> > +  UniquePtr<base::StatisticsRecorder> mStatisticsRecorder;
> > +  nsAutoPtr<mozilla::ScopedAppData> mAppData;
> 
> Is this using nsAutoPtr for the implicit conversion to T*?
> If so, you could probably just use UniquePtr::get() where needed?

It's not unique, we store it in a global as a raw ptr and then reference that all throughout nsAppRunner. My experience with unique_ptr is low though, so maybe there's a right way around this. Obviously auto ptr works too.
Attachment #8590263 - Attachment is obsolete: true
sorry had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=8696212&repo=mozilla-inbound
Flags: needinfo?(jmathies)
(In reply to Carsten Book [:Tomcat] from comment #270)
> sorry had to back this out for test failures like
> https://treeherder.mozilla.org/logviewer.html#?job_id=8696212&repo=mozilla-
> inbound

I'll bet xpcshell needs to init this object as well now somehwere. Will take a look today.
Flags: needinfo?(jmathies)
Add the same init code to xpcshell.
Attachment #8590875 - Flags: review?(gfritzsche)
Attachment #8590875 - Flags: review?(gfritzsche) → review+
See Also: → 1153197
(In reply to Jim Mathies [:jimm] from comment #291)
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&revision=522aa79beb09

I get the sense that this will be backed out, there's a W(4) failure on XP that doesn't have any annotations.

This probably doesn't make any difference anyhow since there ia a jemalloc crash on a retrigger, very similar to these crashes just not in ~StatisticsRecorder(). Looks like we may have a problem with memory allocation somewhere during the test run.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Blocks: 1155127
Please nominate this for Aurora/Beta/b2g37 approval when you get a chance.
Keywords: leave-open
Target Milestone: --- → mozilla40
Comment on attachment 8590875 [details] [diff] [review]
xpcshell patch

Approval Request Comment
[Feature/regressing bug #]:
fix for random test failure on shutdown. No known regression bug.
[User impact if declined]:
tree managers go insane at an earlier age.
[Describe test coverage new/current, TreeHerder]:
on mc for a two weeks.
[Risks and why]:
Seems low, this mucks with how we free some chromium memory on shutdown such that it happens earlier, before xpcom shtudown. Seems like a good thing to do in general.
[String/UUID change made/needed]:
none

Requesting for both of these csets, both of which apply cleanly to aurora:

https://hg.mozilla.org/mozilla-central/rev/dd396261f189
https://hg.mozilla.org/mozilla-central/rev/522aa79beb09
Flags: needinfo?(jmathies)
Attachment #8590875 - Flags: approval-mozilla-aurora?
Comment on attachment 8590875 [details] [diff] [review]
xpcshell patch

Approval Request Comment
[Feature/regressing bug #]:
fix for random test failure on shutdown. No known regression bug.
[User impact if declined]:
tree managers go insane at an earlier age.
[Describe test coverage new/current, TreeHerder]:
on mc for a two weeks.
[Risks and why]:
Seems low, this mucks with how we free some chromium memory on shutdown such that it happens earlier, before xpcom shtudown. Seems like a good thing to do in general.
[String/UUID change made/needed]:
none

Requesting for both of these csets, both of which apply cleanly to aurora:

https://hg.mozilla.org/mozilla-central/rev/dd396261f189
https://hg.mozilla.org/mozilla-central/rev/522aa79beb09
Attachment #8590875 - Flags: approval-mozilla-b2g37?
Comment on attachment 8590875 [details] [diff] [review]
xpcshell patch

38 is definitely affected and going ESR, so wanted there as well.
Attachment #8590875 - Flags: approval-mozilla-release?
Comment on attachment 8590875 [details] [diff] [review]
xpcshell patch

This has already been on m-c for a week. Let's get it on Aurora. I'm leaving the Beta call for Sylvestre. Aurora+
Attachment #8590875 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+