Closed Bug 872788 Opened 12 years ago Closed 11 years ago

Frequent PGO Windows 7 and XP iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame][@ mozalloc_abort(char const * const)] followed by 'timeout exceeded'

Categories

(Testing :: Talos, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(firefox30 wontfix, firefox31 fixed, firefox32 fixed, firefox-esr24 wontfix)

RESOLVED FIXED
mozilla32
Tracking Status
firefox30 --- wontfix
firefox31 --- fixed
firefox32 --- fixed
firefox-esr24 --- wontfix

People

(Reporter: philor, Assigned: jmaher)

References

Details

(Keywords: intermittent-failure, Whiteboard: [pgo])

https://tbpl.mozilla.org/php/getParsedLog.php?id=22996244&tree=Mozilla-Inbound
Windows 7 32-bit mozilla-inbound pgo talos dromaeojs on 2013-05-15 13:19:07 PDT for push 835e5b42aa10
slave: t-w732-ix-011

Cycle 1(1): loaded http://localhost/page_load_test/dromaeo/dom-attr.html (next: http://localhost/page_load_test/dromaeo/dom-modify.html)
RSS: Main: 192466944
Screen width/height:1600/1200
colorDepth:24
Browser inner width/height: 1008/666
browser_name:Firefox
browser_version:24.0a1
buildID:20130515091549
PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame]
Crash dump filename: c:\users\cltbld\appdata\local\temp\tmpbua8b9\profile\minidumps\b4122b86-00e9-489b-9e08-9ef22f9ed6f1.dmp
stderr from minidump_stackwalk:
2013-05-15 14:39:03: minidump_processor.cc:264: INFO: Processing minidump in file c:\users\cltbld\appdata\local\temp\tmpbua8b9\profile\minidumps\b4122b86-00e9-489b-9e08-9ef22f9ed6f1.dmp
cygwin warning:
  MS-DOS style path detected: c:\users\cltbld\appdata\local\temp\tmpbua8b9\profile\minidumps\b4122b86-00e9-489b-9e08-9ef22f9ed6f1.dmp
  Preferred POSIX equivalent is: /cygdrive/c/users/cltbld/appdata/local/temp/tmpbua8b9/profile/minidumps/b4122b86-00e9-489b-9e08-9ef22f9ed6f1.dmp
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
2013-05-15 14:39:03: minidump.cc:3815: INFO: Minidump opened minidump c:\users\cltbld\appdata\local\temp\tmpbua8b9\profile\minidumps\b4122b86-00e9-489b-9e08-9ef22f9ed6f1.dmp
2013-05-15 14:39:03: minidump.cc:4086: ERROR: ReadBytes: read 0/32
2013-05-15 14:39:03: minidump.cc:3834: ERROR: Minidump cannot read header
2013-05-15 14:39:03: minidump_processor.cc:268: ERROR: Minidump c:\users\cltbld\appdata\local\temp\tmpbua8b9\profile\minidumps\b4122b86-00e9-489b-9e08-9ef22f9ed6f1.dmp could not be read
2013-05-15 14:39:03: minidump.cc:3787: INFO: Minidump closing minidump
2013-05-15 14:39:03: minidump_stackwalk.cc:529: ERROR: MinidumpProcessor::Process failed
minidump_stackwalk exited with return code 1
Failed dromaeo_dom:
Stopped Wed, 15 May 2013 14:39:03
Traceback (most recent call last):
  File "run_tests.py", line 277, in run_tests
    talos_results.add(mytest.runTest(browser_config, test))
  File "C:\slave\talos-data\talos\ttest.py", line 387, in runTest
    raise talosError("timeout exceeded")
talosError: 'timeout exceeded'
Traceback (most recent call last):
  File "run_tests.py", line 330, in <module>
    main()
  File "run_tests.py", line 327, in main
    run_tests(parser)
  File "run_tests.py", line 289, in run_tests
    raise e
utils.talosError: 'timeout exceeded'
program finished with exit code 1

https://tbpl.mozilla.org/php/getParsedLog.php?id=22991966&tree=Mozilla-Inbound t-w732-ix-019
https://tbpl.mozilla.org/php/getParsedLog.php?id=22988149&tree=Mozilla-Inbound t-w732-ix-060
https://tbpl.mozilla.org/php/getParsedLog.php?id=22996903&tree=Mozilla-Inbound t-w732-ix-050
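The "ReadBytes: read 0/32" and "Minidump cannot read header" errors in the log above suggest the .dmp file was empty or cut short, since the minidump header is 32 bytes and begins with the signature "MDMP". A minimal sketch of a pre-check that could tell an empty dump apart from a readable one before handing it to minidump_stackwalk; the function name and return strings are hypothetical and not part of Talos:

```python
import os

# The minidump header is 32 bytes and starts with the ASCII signature "MDMP".
MINIDUMP_SIGNATURE = b"MDMP"
MINIDUMP_HEADER_SIZE = 32


def check_minidump(path):
    """Classify a dump file as 'truncated', 'bad-signature' or 'ok'.

    Hypothetical helper for illustration only; it inspects just the header
    and does not validate the rest of the file.
    """
    # A file shorter than the header cannot be parsed at all; this is the
    # situation a "read 0/32" error points to.
    if os.path.getsize(path) < MINIDUMP_HEADER_SIZE:
        return "truncated"
    with open(path, "rb") as dump:
        header = dump.read(MINIDUMP_HEADER_SIZE)
    if header[:4] != MINIDUMP_SIGNATURE:
        return "bad-signature"
    return "ok"
```

A "truncated" result would indicate that the crash reporter never finished writing the dump, which is consistent with the "[Unknown top frame]" signature here.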
https://tbpl.mozilla.org/php/getParsedLog.php?id=22998759&tree=Mozilla-Inbound t-w732-ix-041 With just the timeout exceeded, without the empty crash.
I'm undecided about whether the comment 2 timeout, timeout, slave disconnect during tp5 is the same thing or a different thing, so I stuck it here while I thought about it.
Summary: Frequent Windows 7 iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame] followed by talosError: 'timeout exceeded' → Frequent Windows 7 iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame][@ mozalloc_abort(char const * const)] followed by talosError: 'timeout exceeded'
Summary: Frequent Windows 7 iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame][@ mozalloc_abort(char const * const)] followed by talosError: 'timeout exceeded' → Frequent Windows 7 and XP iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame][@ mozalloc_abort(char const * const)] followed by talosError: 'timeout exceeded'
And see also bug 798219, since for the most part the WinXP iX slaves don't manage to produce the crash, but hellishly frequently hit the timeout exceeded.
Andrew, I believe that this basic issue has been linked to OOMs in the past. Any chance you could take a look at this? Interestingly, it's a top 10 orange on aurora and beta, but is pretty far down the list (but still happening) on trunk. Not sure what to make of that...
Flags: needinfo?(continuation)
I'm not sure how much I can do. There's no information about what is happening. It looks like we're crashing with PGO builds.
Flags: needinfo?(continuation)
Summary: Frequent Windows 7 and XP iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame][@ mozalloc_abort(char const * const)] followed by talosError: 'timeout exceeded' → Frequent PGO Windows 7 and XP iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame][@ mozalloc_abort(char const * const)] followed by talosError: 'timeout exceeded'
Whiteboard: [pgo]
Talking to bz about this on IRC, it sounds like this is a test issue. Requesting more info from him so we can try to find an owner for this.
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bzbarsky)
(Reducing risk of false-positives against bug 798219 et al)
Summary: Frequent PGO Windows 7 and XP iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame][@ mozalloc_abort(char const * const)] followed by talosError: 'timeout exceeded' → Frequent PGO Windows 7 and XP iX PROCESS-CRASH | dromaeo_dom | application crashed [Unknown top frame][@ mozalloc_abort(char const * const)] followed by 'timeout exceeded'
This is one of our longest-running Talos intermittents that we hit on a regular basis. Any ideas who I should ping to get this moving forward?
Flags: needinfo?(jmaher)
I assume this is some sort of OOM. Interesting that it seems to affect PGO only.
Pretty odd how consistent they are. The last 4 or so I looked at all happen during an nsTextNodeRelease that does a suspect, during ContentUnbinder. That doesn't seem like a run of the mill OOM. Though comment 483 does say that this test has some OOM issue. Did we get whatever upstream fix there seemed to be for that issue?

00:08:57 INFO - 0 mozalloc.dll!mozalloc_abort(char const * const) [mozalloc_abort.cpp:15503d9a400a : 30 + 0x0]
00:08:57 INFO - 1 mozalloc.dll!mozalloc_handle_oom(unsigned int) [mozalloc_oom.cpp:15503d9a400a : 50 + 0x8]
00:08:57 INFO - 2 mozalloc.dll!moz_xmalloc [mozalloc.cpp:15503d9a400a : 54 + 0x7]
00:08:57 INFO - 3 xul.dll!NS_CycleCollectorSuspect3 [nsCycleCollector.cpp:15503d9a400a : 3718 + 0xa]
00:08:57 INFO - 4 xul.dll!nsTextNode::Release() [nsTextNode.cpp:15503d9a400a : 100 + 0x19]
00:08:57 INFO - 5 xul.dll!nsRefPtr<mozilla::dom::Element>::~nsRefPtr<mozilla::dom::Element>() [nsCOMPtr.h:15503d9a400a : 448 + 0x7]
00:08:57 INFO - 6 xul.dll!ContentUnbinder::UnbindSubtree(nsIContent *) [FragmentOrElement.cpp:15503d9a400a : 1166 + 0x7]
00:08:57 INFO - 7 xul.dll!ContentUnbinder::Run() [FragmentOrElement.cpp:15503d9a400a : 1177 + 0xf]
00:08:57 INFO - 8 xul.dll!ContentUnbinder::UnbindAll() [FragmentOrElement.cpp:15503d9a400a : 1203 + 0x5]
00:08:57 INFO - 9 xul.dll!nsCCUncollectableMarker::Observe(nsISupports *,char const *,wchar_t const *) [nsCCUncollectableMarker.cpp:15503d9a400a : 342 + 0x4]
Unfortunately we ended up disabling this for XP instead of living with it. That was a long time ago; it would be nice not to disable this for Windows 7. I have no idea who would be the best person to look into this. Probably someone with a debugger set up on Windows who can just run this test in a loop until the crash is caught.
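For anyone trying the "run it in a loop" approach, here is a rough sketch of a driver that reruns a command until it fails or times out. The actual Talos invocation is not given in this bug, so `cmd` is a placeholder, and `run_until_failure` with its parameters is a hypothetical helper, not something that exists in the harness:

```python
import subprocess


def run_until_failure(cmd, max_runs=100, timeout=None):
    """Rerun cmd until it exits non-zero or exceeds the timeout.

    Returns (attempt_number, reason) for the first failing run, or None
    if every one of max_runs runs succeeded.
    """
    for attempt in range(1, max_runs + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # when the timeout (in seconds) elapses.
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            return attempt, "timeout"
        if result.returncode != 0:
            return attempt, "exit code %d" % result.returncode
    return None
```

With a debugger attached, or with crash dumps enabled, the first failing attempt is the one to inspect; the loop only narrows down how many runs it takes to hit the intermittent.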
There is some talk about the usefulness of dromaeo in general in bug 987136.
Flags: needinfo?(jmaher)
Depends on: 1001565
For those following along (or in my case sifting through the top N OrangeFactor) - fingers crossed for bug 1001565 to fix this - latest progress at rolling it out in bug 1002780 :-)
When we do get this landed, it will fix this error. I plan to uplift the talos changes to Aurora so it will stick on ESR.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Does this mean dromaeojs should be re-enabled on WinXP?
Assignee: nobody → jmaher
Target Milestone: --- → mozilla32
oh, good point, let me see what :bz says
Flags: needinfo?(bzbarsky)
We should consider seeing whether it's green, just to see. That said, I don't really care all that much about the resulting numbers, I think. It's pretty unlikely that they'd be different from the Win7 numbers, and I don't expect press (e.g. Tom's Hardware) to run benchmarks on WinXP and report the results.
Flags: needinfo?(bzbarsky)
I verified it is green on try: https://tbpl.mozilla.org/?tree=Try&rev=24ca3c5ef453 (the failures are related to my development branch being out of sync). We will still see failures on beta or any release earlier than 31 that runs on Windows 7. So we are green on Windows XP, but it sounds like turning it on is not really worth it.
I've spun off the recent failures to bug 1124740. There's clearly a new issue at play here that isn't being properly served by commenting in a bug that'd been largely gone for some time now.