Closed
Bug 1296819
Opened 8 years ago
Closed 7 years ago
Intermittent valgrind-test | Invalid write of size 4 at mozilla:: / _pt_root / start_thread / clone
Categories
(Core :: General, defect, P3)
Core
General
Tracking
()
VERIFIED
FIXED
mozilla61
People
(Reporter: intermittent-bug-filer, Assigned: jseward)
References
Details
(Keywords: bulk-close-intermittents, intermittent-failure, Whiteboard: [ele:1b][stockwell fixed:product])
Attachments
(1 file, 1 obsolete file)
2.35 KB,
patch
|
baku
:
review+
RyanVM
:
approval-mozilla-esr60+
|
Details | Diff | Splinter Review |
Comment 1•8 years ago
|
||
First occurrence seems to be https://treeherder.mozilla.org/logviewer.html#?job_id=33703148&repo=mozilla-inbound
Comment hidden (mozreview-request) |
Updated•8 years ago
|
Attachment #8783427 -
Attachment is obsolete: true
Comment 3•8 years ago
|
||
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Comment hidden (Intermittent Failures Robot) |
Comment 5•8 years ago
|
||
Am I reading the stack right that this looks like an issue with the hang monitor?
status-firefox50:
--- → affected
status-firefox51:
--- → affected
status-firefox52:
--- → affected
Flags: needinfo?(jseward)
Assignee | ||
Comment 6•8 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #5)
> Am I reading the stack right that this looks like an issue with the hang
> monitor?
Well, I don't know if it is an issue with the hang monitor per se.
The invalid write to 0x0 is this, at the end of "void RunWatchdog(void* arg)"
(the watchdog thread main function):
// Shutdown is apparently dead. Crash the process.
MOZ_CRASH("Shutdown too long, probably frozen, causing a crash.");
So the question that comes to mind is: why is this timing out now when it
didn't previously?
Flags: needinfo?(jseward)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 12•8 years ago
|
||
Mass wontfix for bugs affecting firefox 52.
Comment 13•7 years ago
|
||
Hello, I'm asking your help with an experiment with making decisions on bugs. You've been needinfo'ed on this bug. I'd like you to take one action to help this bug make progress toward a decision. The things you can do include:
* If you know or have a good guess of which product and component this bug belongs to, change the product and component of the bug
* If you know of the right person to ask about this bug, redirect the needinfo to them
* If you cannot reproduce the bug, close it
All we need you to do is one thing that will help us make a decision on the bug or resolve it.
Thank you for your help with this. If you have questions, please contact emma@mozilla.com.
Flags: needinfo?(valentin.gosu)
Whiteboard: [ele:1b]
Updated•7 years ago
|
Flags: needinfo?(valentin.gosu) → needinfo?(hchang)
Comment 14•7 years ago
|
||
Not sure why I am ni'ed but I am trying to give feedback as far as I know:
1) There seems to be a wrong review commit (from me) pushed to this bug but it shouldn't
be relevant to this bug.
2) I didn't find the stack but that reminds me a safebrowsing "long hang crash" which
has been resolved for a while.
Flags: needinfo?(hchang)
Comment 15•7 years ago
|
||
Thanks for that. It also seemed to me that this hasn't been reproduced in a while, but I thought I would ask someone who looked at it previously. Seems I should have looked more carefully.
Closing as WORKSFORME
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Comment 16•7 years ago
|
||
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Keywords: bulk-close-intermittents
Resolution: --- → INCOMPLETE
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Comment hidden (Intermittent Failures Robot) |
Comment 21•7 years ago
|
||
This has become a frequent failure. First failure after a long break was on 2018-02-14.
Julian, can you take a look at this or redirect to an appropriate contact, please?
Flags: needinfo?(jseward)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 25•7 years ago
|
||
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #21)
> Julian, can you take a look at this or redirect to an appropriate contact,
> please?
Looking. At first glance this looks like a shutdown hang which
results in a MOZ_CRASH, causing Valgrind to report an invalid
write at address zero. I wonder if we need to increase some
timeout value for the running-on-valgrind case.
Flags: needinfo?(jseward)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 30•7 years ago
|
||
There have been 63 failures in the last 7 days.
Almost all the failures occur on Linux x64 /opt.
Here is a relevant log file and a snippet with the failure:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=166912132
[task 2018-03-09T03:14:39.963Z] 03:14:39 INFO - 14:09.27 --31681-- WARNING: Serious error when reading debug info
[task 2018-03-09T03:14:39.963Z] 03:14:39 INFO - 14:09.27 --31681-- When reading debug info from /dev/shm/org.chromium.GwrzjR (deleted):
[task 2018-03-09T03:14:39.963Z] 03:14:39 INFO - 14:09.27 --31681-- failed to stat64/stat this file
[task 2018-03-09T03:15:56.146Z] 03:15:56 INFO - 15:25.45 ==31681== Thread 63 Shutdow~minator:
[task 2018-03-09T03:15:56.146Z] 03:15:56 INFO - 15:25.45 TEST-UNEXPECTED-FAIL | valgrind-test | Invalid write of size 4 at mozilla:: / _pt_root / start_thread / clone
[task 2018-03-09T03:15:56.146Z] 03:15:56 INFO - 15:25.45 ==31681== Invalid write of size 4
[task 2018-03-09T03:15:56.146Z] 03:15:56 INFO - 15:25.45 ==31681== at 0x114EC2CD: mozilla::(anonymous namespace)::RunWatchdog(void*) (nsTerminator.cpp:206)
[task 2018-03-09T03:15:56.146Z] 03:15:56 INFO - 15:25.45 ==31681== by 0x642CB80: _pt_root (ptthread.c:201)
[task 2018-03-09T03:15:56.146Z] 03:15:56 INFO - 15:25.45 ==31681== by 0x4E3AB4F: start_thread (pthread_create.c:304)
[task 2018-03-09T03:15:56.146Z] 03:15:56 INFO - 15:25.45 ==31681== by 0x5D51FBC: clone (clone.S:112)
[task 2018-03-09T03:15:56.147Z] 03:15:56 INFO - 15:25.45 ==31681== Address 0x0 is not stack'd, malloc'd or (recently) free'd
:jseward any updates on this?
Flags: needinfo?(jseward)
Assignee | ||
Comment 31•7 years ago
|
||
(In reply to Tiberius Oros[:tiberius_oros] from comment #30)
> :jseward any updates on this?
I have been looking at this in detail this week (sorry, I should
have mentioned this earlier). It's not easy to figure out the root
cause. I currently have 2 bugs in progress: bug 1442603 and bug 1443499
which I believe should improve the situation. I hope to have them
landed by early next week at the latest.
Flags: needinfo?(jseward)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 33•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #31)
> (In reply to Tiberius Oros[:tiberius_oros] from comment #30)
> > :jseward any updates on this?
> [..]
> I currently have 2 bugs in progress: bug 1442603 and bug 1443499
These have both now landed. They reduce the worst-case BHR resource
usage, which I suspect may be a factor here. Let's see if that
makes a difference. If not, I will look into increasing the shutdown
hang timeouts when running on valgrind.
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Whiteboard: [ele:1b][stockwell disable-recommended] → [ele:1b][stockwell fixed:product]
Comment 35•7 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → INCOMPLETE
Comment 36•7 years ago
|
||
This bug is back on mozilla-release and mozilla-esr60, failing on recent pushes 40% of all runs (?).
Log https://treeherder.mozilla.org/logviewer.html#?job_id=176808362&repo=mozilla-release
[task 2018-05-03T18:57:12.162Z] 18:57:12 INFO - 14:49.54 TEST-UNEXPECTED-FAIL | valgrind-test | Invalid write of size 4 at mozilla:: / _pt_root / start_thread / clone
[task 2018-05-03T18:57:12.162Z] 18:57:12 INFO - 14:49.54 ==30231== Invalid write of size 4
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 ==30231== at 0x11111EED: mozilla::(anonymous namespace)::RunWatchdog(void*) (nsTerminator.cpp:206)
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 ==30231== by 0x642B60C: _pt_root (ptthread.c:201)
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 ==30231== by 0x4E3AB4F: start_thread (pthread_create.c:304)
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 ==30231== by 0x5D51FBC: clone (clone.S:112)
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 ==30231== Address 0x0 is not stack'd, malloc'd or (recently) free'd
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 ==30231==
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 {
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 <insert_a_suppression_name_here>
[task 2018-05-03T18:57:12.163Z] 18:57:12 INFO - 14:49.54 Memcheck:Addr4
[task 2018-05-03T18:57:12.164Z] 18:57:12 INFO - 14:49.54 fun:_ZN7mozilla12_GLOBAL__N_111RunWatchdogEPv
[task 2018-05-03T18:57:12.164Z] 18:57:12 INFO - 14:49.54 fun:_pt_root
[task 2018-05-03T18:57:12.164Z] 18:57:12 INFO - 14:49.54 fun:start_thread
[task 2018-05-03T18:57:12.164Z] 18:57:12 INFO - 14:49.54 fun:clone
[task 2018-05-03T18:57:12.164Z] 18:57:12 INFO - 14:49.54 }
This sounds similar to comment 33. Julian, can you take a look at this, please?
Status: RESOLVED → VERIFIED
Flags: needinfo?(jseward)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 41•6 years ago
|
||
Increases the shutdown timeout by a factor of 3 when built --enable-valgrind
*and* actually running on Valgrind. Otherwise no effect.
Flags: needinfo?(jseward)
Attachment #8981448 -
Flags: review?(amarchesini)
Comment 42•6 years ago
|
||
Comment on attachment 8981448 [details] [diff] [review]
bug1296819-increase-shutdown-timeout-1.diff
Review of attachment 8981448 [details] [diff] [review]:
-----------------------------------------------------------------
looks good to me. Thanks.
Attachment #8981448 -
Flags: review?(amarchesini) → review+
Comment 43•6 years ago
|
||
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/17ad383212f4
Intermittent valgrind-test | Invalid write of size 4 at mozilla:: / _pt_root / start_thread / clone. r=amarchesini@mozilla.com.
Comment 44•6 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 46•6 years ago
|
||
Comment on attachment 8981448 [details] [diff] [review]
bug1296819-increase-shutdown-timeout-1.diff
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
For automation runs on Valgrind, Valgrind slows down progress so much
that the Shutdown Hang Reporter sometimes kills the process unnecessarily,
leading the test to fail. This patch increases the SHR timeouts
only for builds which are configured --enable-valgind and which are actually
running on Valgrind.
User impact if declined:
Zero, because it requires (1) config with --enable-valgind and (2) actually
running on Valgrind.
Fix Landed on Version:
62
Risk to taking this patch (and alternatives if risky):
Very low risk (no end-user effect)
String or UUID changes made by this patch:
None.
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8981448 -
Flags: approval-mozilla-esr60?
Comment 47•6 years ago
|
||
Comment on attachment 8981448 [details] [diff] [review]
bug1296819-increase-shutdown-timeout-1.diff
Per bug 1443499 comment 9, taking for ESR60.
Attachment #8981448 -
Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Updated•6 years ago
|
Assignee: nobody → jseward
status-firefox60:
--- → wontfix
status-firefox61:
--- → fixed
status-firefox-esr52:
--- → wontfix
status-firefox-esr60:
--- → affected
Resolution: INCOMPLETE → FIXED
Updated•6 years ago
|
Target Milestone: --- → mozilla61
Comment 48•6 years ago
|
||
bugherder uplift |
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•