Closed
Bug 1361476
Opened 7 years ago
Closed 6 years ago
Intermittent LeakSanitizer | LLVMSymbolizer was unable to allocate memory.
Categories
(Testing :: General, defect)
Tracking
(firefox56 wontfix, firefox57 fixed, firefox58 fixed)
RESOLVED
FIXED
mozilla58
People
(Reporter: intermittent-bug-filer, Assigned: gbrown)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])
Attachments
(2 files)
1.07 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
1.05 KB,
patch
|
RyanVM
:
review+
|
Details | Diff | Splinter Review |
Filed by: archaeopteryx [at] coole-files.de https://treeherder.mozilla.org/logviewer.html#?job_id=96013175&repo=mozilla-inbound https://queue.taskcluster.net/v1/task/I6FlWGUfTQWUyG_yN8iMaQ/runs/0/artifacts/public/logs/live_backing.log See bug 1355834 for the implementation of this as error message.
![]() |
||
Comment 1•7 years ago
|
||
Also shows up as e.g. browser_ext_windows_update.js | application terminated with exit code -9 https://treeherder.mozilla.org/logviewer.html#?job_id=96013177&repo=mozilla-inbound [task 2017-05-02T18:23:43.144614Z] 18:23:43 INFO - TEST-START | browser/components/extensions/test/browser/browser_ext_windows_update.js [task 2017-05-02T18:23:53.044524Z] 18:23:53 INFO - GECKO(1092) | MEMORY STAT | vsize 20974052MB | residentFast 922MB [task 2017-05-02T18:23:53.046592Z] 18:23:53 INFO - TEST-OK | browser/components/extensions/test/browser/browser_ext_windows_update.js | took 9898ms [task 2017-05-02T18:23:53.268178Z] 18:23:53 INFO - checking window state [task 2017-05-02T18:24:02.640290Z] 18:24:02 INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1218 [task 2017-05-02T18:24:02.883361Z] 18:24:02 INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1176 [task 2017-05-02T18:24:03.060844Z] 18:24:03 INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1198 [task 2017-05-02T18:24:03.230339Z] 18:24:03 INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1144 [task 2017-05-02T18:24:06.335985Z] 18:24:06 INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1092 [task 2017-05-02T18:24:06.336370Z] 18:24:06 INFO - TEST-START | Shutdown [task 2017-05-02T18:24:06.336597Z] 18:24:06 INFO - Browser Chrome Test Summary [task 2017-05-02T18:24:06.347194Z] 18:24:06 INFO - Passed: 3936 [task 2017-05-02T18:24:06.347467Z] 18:24:06 INFO - Failed: 0 [task 2017-05-02T18:24:06.349556Z] 18:24:06 INFO - Todo: 0 [task 2017-05-02T18:24:06.352062Z] 18:24:06 INFO - Mode: e10s [task 2017-05-02T18:24:06.354072Z] 18:24:06 INFO - *** End BrowserChrome Test Results *** [task 2017-05-02T18:24:07.359801Z] 18:24:07 INFO - GECKO(1092) | ###!!! [Child][MessageChannel] Error: (msgtype=0x4800B2,name=PContent::Msg_ConsoleMessage) Channel closing: too late to send/recv, messages will be lost [task 2017-05-02T18:24:07.485290Z] 18:24:07 INFO - GECKO(1092) | ###!!! [Child][MessageChannel] Error: (msgtype=0x4800B2,name=PContent::Msg_ConsoleMessage) Channel closing: too late to send/recv, messages will be lost [task 2017-05-02T18:24:07.588698Z] 18:24:07 INFO - GECKO(1092) | ###!!! [Child][MessageChannel] Error: (msgtype=0x4800B2,name=PContent::Msg_ConsoleMessage) Channel closing: too late to send/recv, messages will be lost [task 2017-05-02T18:24:07.778473Z] 18:24:07 INFO - GECKO(1092) | ###!!! [Child][MessageChannel] Error: (msgtype=0x4800B2,name=PContent::Msg_ConsoleMessage) Channel closing: too late to send/recv, messages will be lost [task 2017-05-02T18:24:11.450698Z] 18:24:11 INFO - GECKO(1092) | 1493749451447 Marionette INFO Ceased listening [task 2017-05-02T18:24:14.688861Z] 18:24:14 INFO - GECKO(1092) | ----------------------------------------------------- [task 2017-05-02T18:24:14.689320Z] 18:24:14 INFO - GECKO(1092) | Suppressions used: [task 2017-05-02T18:24:14.689621Z] 18:24:14 INFO - GECKO(1092) | count bytes template [task 2017-05-02T18:24:14.689869Z] 18:24:14 INFO - GECKO(1092) | 1 2032 libglib-2.0.so [task 2017-05-02T18:24:14.690174Z] 18:24:14 INFO - GECKO(1092) | ----------------------------------------------------- [task 2017-05-02T18:24:39.488139Z] 18:24:39 INFO - GECKO(1092) | LLVM ERROR: IO failure on output stream. [task 2017-05-02T18:24:39.540687Z] 18:24:39 INFO - GECKO(1092) | LLVM ERROR: IO failure on output stream. [task 2017-05-02T18:24:39.540964Z] 18:24:39 INFO - GECKO(1092) | LLVM ERROR: IO failure on output stream. [task 2017-05-02T18:24:39.541190Z] 18:24:39 INFO - GECKO(1092) | LLVM ERROR: IO failure on output stream. [task 2017-05-02T18:24:39.542811Z] 18:24:39 INFO - TEST-INFO | Main app process: killed by SIGKILL [task 2017-05-02T18:24:39.543673Z] 18:24:39 INFO - Buffered messages finished [task 2017-05-02T18:24:39.544626Z] 18:24:39 ERROR - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/browser_ext_windows_update.js | application terminated with exit code -9
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 6•6 years ago
|
||
this seemed to be a 1 day spike and has been fixed since then.
Whiteboard: [stockwell unknown]
Comment 7•6 years ago
|
||
I've seen some LSan logs where stack depth truncation is not working. For instance, here there's a leak report where it is reporting all the way to the stop of the stack: https://treeherder.mozilla.org/logviewer.html#?job_id=93338084&repo=mozilla-esr52&lineNumber=9225 However, I've seen other instances where it was properly truncated. That could certainly cause memory issues, though I don't know why that would happen intermittently.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Assignee | |
Comment 13•6 years ago
|
||
This bug was dormant for almost a month until July 17. On July 17, all failures were on try, in https://treeherder.mozilla.org/#/jobs?repo=try&revision=e9941a7448eea58445e4bfc35fe828015809cf53, for bug 1351148. But that bug has not landed any patches yet! On July 18, failures started on autoland: https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=linux%20asan%20mochitest&tochange=8ff4f17b266db9a780efe06f7fbdae629e49f5bc&fromchange=0be298129be1221d42b269a61ab3911349738e70 suggesting failures started with the backout at https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=8ff4f17b266db9a780efe06f7fbdae629e49f5bc, bug 1363361. The backout looks correct/complete to me, and I don't see any sign of this happening before those patches landed...so, this isn't making sense to me. :mccr8 -- Thoughts?
Flags: needinfo?(continuation)
Whiteboard: [stockwell unknown] → [stockwell needswork]
Comment 14•6 years ago
|
||
One odd thing about this is that if it fails, it seems to fail repeatedly. But it is scattered across many test chunks, so it isn't like there's one problem. I think one theory glandium had was that maybe sometimes we produce a binary that is extremely large, but you'd think that it would fail every chunk in that case rather than just one of them at a high rate. Maybe somebody could check if the binaries are oddly large in failing runs? Glandium, do you have any ideas here? This is pretty baffling.
Flags: needinfo?(continuation) → needinfo?(mh+mozilla)
Comment 15•6 years ago
|
||
There's an IDL change in bug 1363361, so maybe this is something a clobber would fix? I don't know how often we clobber ASan builds.
Comment 16•6 years ago
|
||
Decoder said upgrading from ASan 3.9 to 4.0 might help. Or might make it worse...
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 19•6 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #16) > Decoder said upgrading from ASan 3.9 to 4.0 might help. Sorry, I misunderstood what he said. He was talking about some other kind of error.
Comment hidden (Intermittent Failures Robot) |
Comment 21•6 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #14) > One odd thing about this is that if it fails, it seems to fail repeatedly. > But it is scattered across many test chunks, so it isn't like there's one > problem. I think one theory glandium had was that maybe sometimes we produce > a binary that is extremely large, but you'd think that it would fail every > chunk in that case rather than just one of them at a high rate. Maybe > somebody could check if the binaries are oddly large in failing runs? > > Glandium, do you have any ideas here? This is pretty baffling. I don't remember having come up with a theory, but one observation to be made here is that those jobs are running on m1.medium instances, which have 3.75GiB of physical memory. That's not a lot... and I wouldn't be surprised if it's the core problem.
Flags: needinfo?(mh+mozilla)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Assignee | |
Updated•6 years ago
|
Flags: needinfo?(gbrown)
![]() |
Assignee | |
Comment 24•6 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #21) > I don't remember having come up with a theory, but one observation to be > made here is that those jobs are running on m1.medium instances, which have > 3.75GiB of physical memory. That's not a lot... and I wouldn't be surprised > if it's the core problem. There have been 3 "outbreaks" of this failure over the last month. In each one, all failures are linux64-asan/opt and occur in 3 suites: mochitest mochitest-browser-chrome mochitest-devtools-chrome These suites have taskcluster configurations with "instance-size" of "legacy", with a comment referencing bug 1281241. I'll check on try to see if tests fail when these are changed to "default" or "xlarge". If not, perhaps we can "upgrade" one of the suites to see if that eliminates failures on that suite the next time this failure returns.
![]() |
Assignee | |
Comment 25•6 years ago
|
||
My motivation here is to modify one type of test (devtools) to see if, the next time we get a spike/outbreak of this failure, there is a clear difference between devtools and, say, browser-chrome. This is experimental only and intended to test the theory of comment 21. I'm changing to xlarge instead of default since I think legacy and default have the same amount of memory. Changing the instance type to xlarge seems to do no harm: https://treeherder.mozilla.org/#/jobs?repo=try&revision=99c6be8436b1ac9fbb67c678582f3282b71091b5 In fact, it may reduce the frequency of other intermittent failures. Do you have any concerns with this? I'm not sure I understand the history of bug 1281241. I'm taking the bug so that I won't lose track of it, but I don't have plans other than follow-up on this experiment; others are welcome to take it!
Updated•6 years ago
|
Attachment #8893525 -
Flags: review?(jmaher) → review+
Comment 26•6 years ago
|
||
This is happening with unnerving regularity on Beta now too since the uplift yesterday, but I guess it's not anything specific to that.
status-firefox56:
--- → affected
status-firefox57:
--- → affected
![]() |
Assignee | |
Updated•6 years ago
|
Keywords: leave-open
Comment 27•6 years ago
|
||
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/3d0c24a37df9 Run linux64-asan/opt mochitest-devtools-chrome tests on xlarge instances, experimentally; r=jmaher
Comment 28•6 years ago
|
||
If this doesn't help, maybe you could talk to some of the fuzzing people. I think they have experience with running ASan builds on AWS.
![]() |
||
Comment 29•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3d0c24a37df9
Updated•6 years ago
|
Whiteboard: [stockwell needswork] → [stockwell needswork][checkin-needed-beta]
Comment 30•6 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/5fb77be6beb6 Not that I actually expect this to help much on Beta since they've all been mochitest-bc there.
Whiteboard: [stockwell needswork][checkin-needed-beta] → [stockwell needswork]
Comment hidden (Intermittent Failures Robot) |
Comment 33•6 years ago
|
||
No devtools failures since comment 29. Can we consider doing the same for browser-chrome now?
Flags: needinfo?(gbrown)
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Whiteboard: [stockwell needswork] → [stockwell fixed:other]
![]() |
Assignee | |
Comment 35•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #33) > No devtools failures since comment 29. Can we consider doing the same for > browser-chrome now? But there have only been 3 other failures since then too. I'd like to wait for more definitive evidence. Will continue to watch...
Flags: needinfo?(gbrown)
Keywords: leave-open
Whiteboard: [stockwell fixed:other] → [stockwell unknown]
![]() |
Assignee | |
Comment 36•6 years ago
|
||
There have been no failures reported here since Aug 21, on any test type. I don't think my devtools change made any difference...this just fixed itself somehow.
Attachment #8910977 -
Flags: review?(ryanvm)
![]() |
Assignee | |
Comment 37•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=15920d7d5ff36ed5ba8fdc7a1aebd1346b7a7815
Comment 38•6 years ago
|
||
Comment on attachment 8910977 [details] [diff] [review] return to legacy instances for linux64-asan/opt mochitest-devtools-chrome Review of attachment 8910977 [details] [diff] [review]: ----------------------------------------------------------------- rs=me, I guess.
Attachment #8910977 -
Flags: review?(ryanvm) → review+
Comment 39•6 years ago
|
||
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/ad80a6d082c4 Change instance type for linux64-asan mochitest-devtools tests; r=ryanvm
![]() |
||
Comment 40•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ad80a6d082c4
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Updated•6 years ago
|
Comment 41•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/628445067e7c
You need to log in
before you can comment on or make changes to this bug.
Description
•