Intermittent tresize | Found crashes after test run, terminating test

RESOLVED DUPLICATE of bug 1345735

Status

Product: Testing
Component: Talos
Status: RESOLVED DUPLICATE of bug 1345735
Opened: 2 years ago
Updated: 2 months ago

People

(Reporter: Treeherder Bug Filer, Unassigned)

Tracking

({intermittent-failure})

Version: Version 3
Keywords: intermittent-failure
Points: ---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [stockwell fixed])

Comment 1

a year ago
5 failures in 836 pushes (0.006 failures/push) were associated with this bug in the last 7 days.  
Repository breakdown:
* autoland: 3
* mozilla-inbound: 1
* mozilla-central: 1

Platform breakdown:
* windows7-32: 3
* windows8-64: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-02-06&endday=2017-02-12&tree=all

Comment 2

a year ago
18 failures in 812 pushes (0.022 failures/push) were associated with this bug in the last 7 days.  
Repository breakdown:
* autoland: 11
* mozilla-inbound: 4
* mozilla-central: 3

Platform breakdown:
* windows8-64: 12
* windows7-32: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-02-20&endday=2017-02-26&tree=all
Frequency increase on Feb 24 may be related to bug 1339594.
See Also: → bug 1339594
Marking as fixed, as the failure rate is greatly reduced.
Whiteboard: [stockwell fixed]

Comment 5

a year ago
42 failures in 783 pushes (0.054 failures/push) were associated with this bug in the last 7 days. 

This is the #23 most frequent failure this week. 

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **

Repository breakdown:
* autoland: 18
* mozilla-inbound: 15
* mozilla-central: 7
* try: 2

Platform breakdown:
* windows8-64: 38
* windows7-32: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-02-27&endday=2017-03-05&tree=all
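
For reference, the numbers in these weekly reports can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the "more than 30" cutoff is taken from the warning text above, not from the OrangeFactor source.

```python
# Hypothetical helpers; names are illustrative, not part of any Mozilla tool.
HIGH_PRIORITY_THRESHOLD = 30  # from "happened more than 30 times this week"


def failure_rate(failures, pushes):
    """Return failures per push, rounded to three decimals as in the reports."""
    if pushes <= 0:
        raise ValueError("pushes must be positive")
    return round(failures / pushes, 3)


def is_high_priority(failures):
    """True when the weekly count exceeds the threshold quoted in the report."""
    return failures > HIGH_PRIORITY_THRESHOLD


# (failures, pushes) pairs copied from the first three weekly reports in this bug.
weekly_reports = [(5, 836), (18, 812), (42, 783)]

for failures, pushes in weekly_reports:
    rate = failure_rate(failures, pushes)
    flag = "high priority" if is_high_priority(failures) else "below threshold"
    print(f"{failures} failures in {pushes} pushes -> {rate} failures/push ({flag})")
```

Only the third week (42 failures) crosses the threshold, which matches the point at which the high-priority warning first appears in this bug.
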
Given that these are almost all win8 pgo/opt, we should look into this. tresize is where we crash; the simple fix is to not run tresize on win8 :)

Any thoughts on why we get an unknown top frame? Do we need to do something different for stackwalk?

Here is an example:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=82538140&lineNumber=1666

11:29:42     INFO -  mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/eG2s1dbXQx-PgU5F3jPmeQ/artifacts/public/build/firefox-55.0a1.en-US.win64.crashreporter-symbols.zip
11:29:46     INFO -  mozcrash Copy/paste: C:\slave\test\build\win32-minidump_stackwalk.exe c:\users\cltbld~1.t-w\appdata\local\temp\tmpkw8zcu\profile\minidumps\05fc0085-4ed2-4af6-93d6-eb03f1953fbd.dmp c:\users\cltbld~1.t-w\appdata\local\temp\tmpbfkner
11:29:46     INFO -  mozcrash Saved minidump as C:\slave\test\build\blobber_upload_dir\05fc0085-4ed2-4af6-93d6-eb03f1953fbd.dmp
11:29:47     INFO -  PROCESS-CRASH | tresize | application crashed [unknown top frame]
11:29:47     INFO -  Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\tmpkw8zcu\profile\minidumps\05fc0085-4ed2-4af6-93d6-eb03f1953fbd.dmp
11:29:47     INFO -  stderr from minidump_stackwalk:
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4359: INFO: Minidump opened minidump c:\users\cltbld~1.t-w\appdata\local\temp\tmpkw8zcu\profile\minidumps\05fc0085-4ed2-4af6-93d6-eb03f1953fbd.dmp
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4479: INFO: Minidump not byte-swapping minidump
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 15 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 7 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 7 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 1197932545 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 6 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 1197932546 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 4 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 5 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4956: INFO: GetStream: type 3 not present
11:29:47     INFO -  2017-03-08 11:29:46: minidump_processor.cc:152: ERROR: Minidump c:\users\cltbld~1.t-w\appdata\local\temp\tmpkw8zcu\profile\minidumps\05fc0085-4ed2-4af6-93d6-eb03f1953fbd.dmp has no thread list
11:29:47     INFO -  2017-03-08 11:29:46: stackwalk.cc:139: ERROR: MinidumpProcessor::Process failed
11:29:47     INFO -  2017-03-08 11:29:46: minidump.cc:4331: INFO: Minidump closing minidump
11:29:47     INFO -  minidump_stackwalk exited with return code 1
11:29:47     INFO -  TEST-UNEXPECTED-ERROR | tresize | Found crashes after test run, terminating test


:ted, do you have any idea why we have no top frame?
Flags: needinfo?(ted)
Whiteboard: [stockwell fixed] → [stockwell needswork]
These are all timeouts:

08:19:51     INFO -  TEST-INFO | started process 2756 (C:\slave\test\build\application\firefox\firefox -profile c:\users\cltbld~1.t-w\appdata\local\temp\tmp2o5cos\profile http://localhost:49280/startup_test/tresize/addon/content/tresize-test.html)
08:22:21     INFO -  Timeout waiting for test completion; killing browser...
08:22:21     INFO -  Terminating psutil.Process(pid=2756, name=u'firefox.exe')

There is a good crash report sometimes, as in https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=81698927&lineNumber=1695, but not often.
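
As a rough way to confirm the timeout pattern in other logs, the elapsed time between the "started process" line and the "Timeout waiting for test completion" line can be computed from the timestamps. The sketch below assumes the `HH:MM:SS     INFO -  message` layout seen in the excerpt above; the function name is hypothetical, and it does not handle a run that crosses midnight.

```python
import re
from datetime import datetime

# Matches the "HH:MM:SS     INFO -  message" layout used in the log excerpt.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+INFO -\s+(.*)$")


def seconds_until_timeout(log_lines):
    """Return seconds from browser start to the harness timeout, or None."""
    started = None
    for line in log_lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if "started process" in message:
            started = stamp
        elif message.startswith("Timeout waiting for test completion"):
            if started is not None:
                return (stamp - started).total_seconds()
    return None


# The two relevant lines from the log quoted in this comment.
LOG_LINES = [
    r"08:19:51     INFO -  TEST-INFO | started process 2756 (C:\slave\test\build\application\firefox\firefox -profile c:\users\cltbld~1.t-w\appdata\local\temp\tmp2o5cos\profile http://localhost:49280/startup_test/tresize/addon/content/tresize-test.html)",
    r"08:22:21     INFO -  Timeout waiting for test completion; killing browser...",
]

print(seconds_until_timeout(LOG_LINES))
```

For the excerpt above this gives 150 seconds, i.e. the browser was killed two and a half minutes after launch without the test reporting completion.
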
This looks like the same stack as the one mentioned in bug 1342685. If so (which makes sense if this is a timeout), then we are having trouble creating a new window, although the problem could instead be in process creation.
See Also: → bug 1345730

Comment 9

a year ago
46 failures in 790 pushes (0.058 failures/push) were associated with this bug in the last 7 days. 

This is the #41 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* mozilla-inbound: 18
* autoland: 15
* mozilla-central: 8
* mozilla-aurora: 5

Platform breakdown:
* windows8-64: 40
* windows7-32: 5
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-03-06&endday=2017-03-12&tree=all
Also for :ted's consideration, there is another Talos, Windows crash report in https://bugzilla.mozilla.org/show_bug.cgi?id=1346707#c0 with breakpad in the stack.
See Also: → bug 1346709
See Also: → bug 1346707
So it appears that :blassey fixed the win7 issues! We now have 100+ failures/week from win8 crashes in talos; I will ask for more help on Monday if there is no action here.
53 failures in 777 pushes (0.068 failures/push) were associated with this bug in the last 7 days. 

This is the #26 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* mozilla-inbound: 20
* autoland: 20
* mozilla-aurora: 8
* mozilla-central: 5

Platform breakdown:
* windows8-64: 50
* windows7-32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-03-13&endday=2017-03-19&tree=all
> 11:29:47     INFO -  2017-03-08 11:29:46: minidump_processor.cc:152: ERROR: Minidump c:\users\cltbld~1.t-w\appdata\local\temp\tmpkw8zcu\profile\minidumps\05fc0085-4ed2-4af6-93d6-eb03f1953fbd.dmp has no thread list

That's the smoking gun--the dump is missing all the useful information we'd need to actually produce a stack. This can happen sometimes when we're writing a minidump for the chrome process but the process is in a really bad state (heap corruption or certain kinds of OOM).
Flags: needinfo?(ted)
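
To triage saved `.dmp` files without running minidump_stackwalk, one option is to read the minidump stream directory directly and check for the thread list. The sketch below relies on the layout documented by Microsoft (a 32-byte `MINIDUMP_HEADER` followed by 12-byte `MINIDUMP_DIRECTORY` entries, with `ThreadListStream` being type 3, which lines up with the "GetStream: type 3 not present" message in the log). The function names are hypothetical, and this is not how breakpad or mozcrash implement the check.

```python
import struct

MDMP_SIGNATURE = 0x504D444D  # the bytes b"MDMP" read as a little-endian uint32
HEADER_SIZE = 32             # size of MINIDUMP_HEADER
DIRECTORY_ENTRY_SIZE = 12    # StreamType, DataSize, Rva (uint32 each)
THREAD_LIST_STREAM = 3       # ThreadListStream in MINIDUMP_STREAM_TYPE


def stream_types(data):
    """Return the stream type of every directory entry in a minidump."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to contain a minidump header")
    signature, _version, count, directory_rva = struct.unpack_from("<4I", data, 0)
    if signature != MDMP_SIGNATURE:
        raise ValueError("missing MDMP signature")
    types = []
    for index in range(count):
        offset = directory_rva + index * DIRECTORY_ENTRY_SIZE
        if offset + DIRECTORY_ENTRY_SIZE > len(data):
            break  # directory is truncated; report what was readable
        stream_type, _size, _rva = struct.unpack_from("<3I", data, offset)
        types.append(stream_type)
    return types


def has_thread_list(data):
    """True if the dump advertises a thread list stream."""
    return THREAD_LIST_STREAM in stream_types(data)
```

Usage would be along the lines of `has_thread_list(open(path, "rb").read())` for each file in the blobber upload directory. A dump that returns False here is one where stackwalk cannot produce a stack, so it can be counted separately from crashes that do have usable reports.
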
So then what do we do to figure this out?  We have almost 200 instances/week of this between all the different talos bugs.  Do we need to fix the minidump tool?
See Also: → bug 1342735
See Also: → bug 1345724
See Also: → bug 1345735
See Also: → bug 1345723
(In reply to Joel Maher ( :jmaher) from comment #14)
> So then what do we do to figure this out?  We have almost 200 instances/week
> of this between all the different talos bugs.  Do we need to fix the
> minidump tool?

The issue here is that Firefox is crashing in a way that it can't generate a valid minidump. The minidump writing code we use on Windows is just MinidumpWriteDump from Microsoft's dbghelp.dll. I don't have any great solutions for you.
29 failures in 898 pushes (0.032 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 15
* mozilla-inbound: 7
* mozilla-central: 6
* mozilla-aurora: 1

Platform breakdown:
* windows8-64: 25
* linux64: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-03-20&endday=2017-03-26&tree=all
Most of these are still Windows 8 with no crash report, as before. I found some crash reports in bug 1345735 and am waiting on a needinfo there.

There are also now a few Linux crashes reported here. Those look different, so I'll spin off a separate bug for Linux.
Depends on: 1351731
See Also: → bug 1351818
36 failures in 845 pushes (0.043 failures/push) were associated with this bug in the last 7 days. 

This is the #50 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* autoland: 15
* mozilla-inbound: 14
* mozilla-central: 7

Platform breakdown:
* windows8-64: 32
* windows7-32: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-03-27&endday=2017-04-02&tree=all
36 failures in 867 pushes (0.042 failures/push) were associated with this bug in the last 7 days. 

This is the #49 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* mozilla-inbound: 16
* mozilla-central: 10
* autoland: 10

Platform breakdown:
* windows8-64: 33
* linux64: 2
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-04-03&endday=2017-04-09&tree=all
Following up here: we are looking more closely at bug 1345735.
39 failures in 894 pushes (0.044 failures/push) were associated with this bug in the last 7 days. 

This is the #38 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* mozilla-inbound: 20
* autoland: 16
* mozilla-central: 3

Platform breakdown:
* windows8-64: 36
* linux64: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-04-10&endday=2017-04-16&tree=all
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → DUPLICATE
Whiteboard: [stockwell needswork] → [stockwell fixed]
Duplicate of bug: 1345735
19 failures in 817 pushes (0.023 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 8
* mozilla-central: 5
* autoland: 4
* try: 2

Platform breakdown:
* windows8-64: 18
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2017-04-17&endday=2017-04-23&tree=all

Comment 24

2 months ago
1 failure in 685 pushes (0.001 failures/push) was associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 1

Platform breakdown:
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1310638&startday=2018-02-12&endday=2018-02-18&tree=all