Closed Bug 1361476 Opened 7 years ago Closed 7 years ago

Intermittent LeakSanitizer | LLVMSymbolizer was unable to allocate memory.

Categories

(Testing :: General, defect)

Version: 3
Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox56 wontfix, firefox57 fixed, firefox58 fixed)

RESOLVED FIXED
mozilla58

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(2 files)

Also shows up as e.g. browser_ext_windows_update.js | application terminated with exit code -9
https://treeherder.mozilla.org/logviewer.html#?job_id=96013177&repo=mozilla-inbound

[task 2017-05-02T18:23:43.144614Z] 18:23:43     INFO - TEST-START | browser/components/extensions/test/browser/browser_ext_windows_update.js
[task 2017-05-02T18:23:53.044524Z] 18:23:53     INFO - GECKO(1092) | MEMORY STAT | vsize 20974052MB | residentFast 922MB
[task 2017-05-02T18:23:53.046592Z] 18:23:53     INFO - TEST-OK | browser/components/extensions/test/browser/browser_ext_windows_update.js | took 9898ms
[task 2017-05-02T18:23:53.268178Z] 18:23:53     INFO - checking window state
[task 2017-05-02T18:24:02.640290Z] 18:24:02     INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1218
[task 2017-05-02T18:24:02.883361Z] 18:24:02     INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1176
[task 2017-05-02T18:24:03.060844Z] 18:24:03     INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1198
[task 2017-05-02T18:24:03.230339Z] 18:24:03     INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1144
[task 2017-05-02T18:24:06.335985Z] 18:24:06     INFO - GECKO(1092) | Completed ShutdownLeaks collections in process 1092
[task 2017-05-02T18:24:06.336370Z] 18:24:06     INFO - TEST-START | Shutdown
[task 2017-05-02T18:24:06.336597Z] 18:24:06     INFO - Browser Chrome Test Summary
[task 2017-05-02T18:24:06.347194Z] 18:24:06     INFO - Passed:  3936
[task 2017-05-02T18:24:06.347467Z] 18:24:06     INFO - Failed:  0
[task 2017-05-02T18:24:06.349556Z] 18:24:06     INFO - Todo:    0
[task 2017-05-02T18:24:06.352062Z] 18:24:06     INFO - Mode:    e10s
[task 2017-05-02T18:24:06.354072Z] 18:24:06     INFO - *** End BrowserChrome Test Results ***
[task 2017-05-02T18:24:07.359801Z] 18:24:07     INFO - GECKO(1092) | ###!!! [Child][MessageChannel] Error: (msgtype=0x4800B2,name=PContent::Msg_ConsoleMessage) Channel closing: too late to send/recv, messages will be lost
[task 2017-05-02T18:24:07.485290Z] 18:24:07     INFO - GECKO(1092) | ###!!! [Child][MessageChannel] Error: (msgtype=0x4800B2,name=PContent::Msg_ConsoleMessage) Channel closing: too late to send/recv, messages will be lost
[task 2017-05-02T18:24:07.588698Z] 18:24:07     INFO - GECKO(1092) | ###!!! [Child][MessageChannel] Error: (msgtype=0x4800B2,name=PContent::Msg_ConsoleMessage) Channel closing: too late to send/recv, messages will be lost
[task 2017-05-02T18:24:07.778473Z] 18:24:07     INFO - GECKO(1092) | ###!!! [Child][MessageChannel] Error: (msgtype=0x4800B2,name=PContent::Msg_ConsoleMessage) Channel closing: too late to send/recv, messages will be lost
[task 2017-05-02T18:24:11.450698Z] 18:24:11     INFO - GECKO(1092) | 1493749451447	Marionette	INFO	Ceased listening
[task 2017-05-02T18:24:14.688861Z] 18:24:14     INFO - GECKO(1092) | -----------------------------------------------------
[task 2017-05-02T18:24:14.689320Z] 18:24:14     INFO - GECKO(1092) | Suppressions used:
[task 2017-05-02T18:24:14.689621Z] 18:24:14     INFO - GECKO(1092) |   count      bytes template
[task 2017-05-02T18:24:14.689869Z] 18:24:14     INFO - GECKO(1092) |       1       2032 libglib-2.0.so
[task 2017-05-02T18:24:14.690174Z] 18:24:14     INFO - GECKO(1092) | -----------------------------------------------------
[task 2017-05-02T18:24:39.488139Z] 18:24:39     INFO - GECKO(1092) | LLVM ERROR: IO failure on output stream.
[task 2017-05-02T18:24:39.540687Z] 18:24:39     INFO - GECKO(1092) | LLVM ERROR: IO failure on output stream.
[task 2017-05-02T18:24:39.540964Z] 18:24:39     INFO - GECKO(1092) | LLVM ERROR: IO failure on output stream.
[task 2017-05-02T18:24:39.541190Z] 18:24:39     INFO - GECKO(1092) | LLVM ERROR: IO failure on output stream.
[task 2017-05-02T18:24:39.542811Z] 18:24:39     INFO - TEST-INFO | Main app process: killed by SIGKILL
[task 2017-05-02T18:24:39.543673Z] 18:24:39     INFO - Buffered messages finished
[task 2017-05-02T18:24:39.544626Z] 18:24:39    ERROR - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/browser_ext_windows_update.js | application terminated with exit code -9
Blocks: LSan
This seemed to be a one-day spike and hasn't recurred since then.
Whiteboard: [stockwell unknown]
I've seen some LSan logs where stack depth truncation is not working.

For instance, here there's a leak report where it is reporting all the way to the top of the stack:
https://treeherder.mozilla.org/logviewer.html#?job_id=93338084&repo=mozilla-esr52&lineNumber=9225

However, I've seen other instances where it was properly truncated. That could certainly cause memory issues, though I don't know why that would happen intermittently.
This bug was dormant for almost a month until July 17.

On July 17, all failures were on try, in https://treeherder.mozilla.org/#/jobs?repo=try&revision=e9941a7448eea58445e4bfc35fe828015809cf53, for bug 1351148. But that bug has not landed any patches yet!

On July 18, failures started on autoland:

https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=linux%20asan%20mochitest&tochange=8ff4f17b266db9a780efe06f7fbdae629e49f5bc&fromchange=0be298129be1221d42b269a61ab3911349738e70

suggesting failures started with the backout at https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=8ff4f17b266db9a780efe06f7fbdae629e49f5bc, bug 1363361. The backout looks correct/complete to me, and I don't see any sign of this happening before those patches landed...so, this isn't making sense to me.


:mccr8 -- Thoughts?
Flags: needinfo?(continuation)
Whiteboard: [stockwell unknown] → [stockwell needswork]
One odd thing about this is that if it fails, it seems to fail repeatedly. But it is scattered across many test chunks, so it isn't like there's one problem. I think one theory glandium had was that maybe sometimes we produce a binary that is extremely large, but you'd think that it would fail every chunk in that case rather than just one of them at a high rate. Maybe somebody could check if the binaries are oddly large in failing runs?

Glandium, do you have any ideas here? This is pretty baffling.
Flags: needinfo?(continuation) → needinfo?(mh+mozilla)
There's an IDL change in bug 1363361, so maybe this is something a clobber would fix? I don't know how often we clobber ASan builds.
Decoder said upgrading from ASan 3.9 to 4.0 might help. Or might make it worse...
See Also: → 1303021
(In reply to Andrew McCreight [:mccr8] from comment #16)
> Decoder said upgrading from ASan 3.9 to 4.0 might help.
Sorry, I misunderstood what he said. He was talking about some other kind of error.
(In reply to Andrew McCreight [:mccr8] from comment #14)
> One odd thing about this is that if it fails, it seems to fail repeatedly.
> But it is scattered across many test chunks, so it isn't like there's one
> problem. I think one theory glandium had was that maybe sometimes we produce
> a binary that is extremely large, but you'd think that it would fail every
> chunk in that case rather than just one of them at a high rate. Maybe
> somebody could check if the binaries are oddly large in failing runs?
> 
> Glandium, do you have any ideas here? This is pretty baffling.

I don't remember having come up with a theory, but one observation to be made here is that those jobs are running on m1.medium instances, which have 3.75GiB of physical memory. That's not a lot... and I wouldn't be surprised if it's the core problem.
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(gbrown)
(In reply to Mike Hommey [:glandium] from comment #21)
> I don't remember having come up with a theory, but one observation to be
> made here is that those jobs are running on m1.medium instances, which have
> 3.75GiB of physical memory. That's not a lot... and I wouldn't be surprised
> if it's the core problem.

There have been 3 "outbreaks" of this failure over the last month. In each one, all failures were on linux64-asan/opt and occurred in 3 suites:

mochitest
mochitest-browser-chrome
mochitest-devtools-chrome

These suites have taskcluster configurations with "instance-size" of "legacy", with a comment referencing bug 1281241. I'll check on try to see if tests fail when these are changed to "default" or "xlarge". If not, perhaps we can "upgrade" one of the suites to see if that eliminates failures on that suite the next time this failure returns.
My motivation here is to modify one type of test (devtools) to see if, the next time we get a spike/outbreak of this failure, there is a clear difference between devtools and, say, browser-chrome. This is experimental only and intended to test the theory of comment 21.

I'm changing to xlarge instead of default since I think legacy and default have the same amount of memory.
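
For reference, the experimental change is roughly of the shape sketched below. This is an illustration only, not the literal patch in attachment 8893525: the file path (taskcluster/ci/test/tests.yml), the by-test-platform keying, and the surrounding structure are assumptions; only the instance-size key, the legacy/xlarge values, and the linux64-asan/opt platform come from this bug.

```yaml
# Sketch of the experiment: run this suite on xlarge instances for ASan only.
# Path and structure are assumed (something like taskcluster/ci/test/tests.yml).
mochitest-devtools-chrome:
    # ...other suite configuration (treeherder symbol, mozharness args, etc.)...
    instance-size:
        by-test-platform:
            linux64-asan/opt: xlarge   # experiment: more memory for ASan runs
            default: legacy            # unchanged elsewhere (see bug 1281241)
```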

Changing the instance type to xlarge seems to do no harm:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=99c6be8436b1ac9fbb67c678582f3282b71091b5

In fact, it may reduce the frequency of other intermittent failures.

Do you have any concerns with this? I'm not sure I understand the history of bug 1281241.


I'm taking the bug so that I won't lose track of it, but I don't have plans other than follow-up on this experiment; others are welcome to take it!
Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
Attachment #8893525 - Flags: review?(jmaher)
Attachment #8893525 - Flags: review?(jmaher) → review+
This is happening with unnerving regularity on Beta now too since the uplift yesterday, but I guess it's not anything specific to that.
Keywords: leave-open
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3d0c24a37df9
Run linux64-asan/opt mochitest-devtools-chrome tests on xlarge instances, experimentally; r=jmaher
If this doesn't help, maybe you could talk to some of the fuzzing people. I think they have experience with running ASan builds on AWS.
Whiteboard: [stockwell needswork] → [stockwell needswork][checkin-needed-beta]
https://hg.mozilla.org/releases/mozilla-beta/rev/5fb77be6beb6

Not that I actually expect this to help much on Beta since they've all been mochitest-bc there.
Whiteboard: [stockwell needswork][checkin-needed-beta] → [stockwell needswork]
No devtools failures since comment 29. Can we consider doing the same for browser-chrome now?
Flags: needinfo?(gbrown)
Whiteboard: [stockwell needswork] → [stockwell fixed:other]
(In reply to Ryan VanderMeulen [:RyanVM] from comment #33)
> No devtools failures since comment 29. Can we consider doing the same for
> browser-chrome now?

But there have also been only 3 other failures since then. I'd like to wait for more definitive evidence. Will continue to watch...
Flags: needinfo?(gbrown)
Keywords: leave-open
Whiteboard: [stockwell fixed:other] → [stockwell unknown]
There have been no failures reported here since Aug 21, on any test type. I don't think my devtools change made any difference...this just fixed itself somehow.
Attachment #8910977 - Flags: review?(ryanvm)
Comment on attachment 8910977 [details] [diff] [review]
return to legacy instances for linux64-asan/opt mochitest-devtools-chrome

Review of attachment 8910977 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me, I guess.
Attachment #8910977 - Flags: review?(ryanvm) → review+
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ad80a6d082c4
Change instance type for linux64-asan mochitest-devtools tests; r=ryanvm
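
The revert is presumably just the inverse of the earlier experiment, restoring the original setting. Again a sketch under the same assumptions, not the literal diff from attachment 8910977:

```yaml
# Illustrative revert: drop the ASan override so the suite runs on
# legacy instances everywhere again (structure assumed as above).
mochitest-devtools-chrome:
    # ...other suite configuration...
    instance-size: legacy    # back to the pre-experiment setting
```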
https://hg.mozilla.org/mozilla-central/rev/ad80a6d082c4
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Blocks: 1429595