Closed Bug 494769 Opened 12 years ago Closed 12 years ago

[MacOSX] mochitest-plain: test_wav_trailing.html (and others) intermittently triggers "malloc: *** error: can't allocate region"

Categories

(Core :: Audio/Video, defect)

1.9.1 Branch
x86
macOS
defect
Not set
normal

Tracking

()

VERIFIED WORKSFORME
mozilla1.9.1

People

(Reporter: sgautherie, Unassigned)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Lately, I've been noticing this quite frequently (for example):
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243160836.1243167228.14501.gz
OS X 10.4 comm-central unit test on 2009/05/24 03:27:16
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243240525.1243243583.22719.gz
OS X 10.4 comm-central unit test on 2009/05/25 01:35:25

*** 28173 INFO Running /tests/content/media/video/test/test_wav_trailing.html...
seamonkey-bin(24704,0x1cc7800) malloc: *** vm_allocate(size=1069056) failed (error code=3)
NEXT ERROR seamonkey-bin(24704,0x1cc7800) malloc: *** error: can't allocate region
seamonkey-bin(24704,0x1cc7800) malloc: *** set a breakpoint in szone_error to debug
[...]
terminate called after throwing an instance of 'St9bad_alloc'
[...]
  what():  St9bad_alloc
TEST-UNEXPECTED-FAIL | (automation.py) | Exited with code -6 during test run
}
Flags: blocking1.9.1?
Blocks: 438871
Whiteboard: [orange]
Flags: blocking1.9.1? → wanted1.9.1+
It looks like this started on 2009/05/16 12:28:07.  I went back another 2-3 weeks before this and couldn't find any earlier occurrences.  It's always the same size allocation that starts failing, but not always during the same test: sometimes it's test_wav_trailing, other times it's test_wav_ended1.

There were some checkins to the media code before this build (http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?startdate=2009-05-16+00%3A00%3A00&enddate=2009-05-16+12%3A38%3A07), but nothing that looks suspicious.
It'd be nice to know why we see this on the SeaMonkey OS X 10.4 unit test machine but not the Firefox 3.5 OS X 10.5 one.  It seems unlikely, but maybe this problem only occurs on OS X 10.4?

An optimized build of mozilla-1.9.1 trunk against the 10.4 SDK (but on 10.5) doesn't reveal any vm_allocate allocations anywhere near this large while running the media tests.  That's not too surprising: that vm_allocate call may only happen when growing internal malloc buffers, and thus could depend on many things.

Since the tests aren't directly causing an allocation that large (at least, not on my configuration), catching this particular vm_allocate in a debugger probably isn't that useful.  My guess is that we're just running out of address space for some reason, and that this happens to be the allocation that ends up failing.
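
A quick sanity check on the failing size, as a minimal sketch. It assumes 4 KiB pages (PAGE_SIZE is an assumption, not something the logs state) and simply splits the 1069056-byte request from the log into pages; describe() is a made-up helper name.

```python
# Break the failing allocation size from the tinderbox log into pages.
# ASSUMPTION: 4 KiB pages. The logs do not state the page size.
PAGE_SIZE = 4096
MIB = 1024 * 1024


def describe(size):
    """Return (whole pages, leftover bytes, pages beyond 1 MiB) for size."""
    pages, leftover = divmod(size, PAGE_SIZE)
    extra_pages = (size - MIB) // PAGE_SIZE
    return pages, leftover, extra_pages


if __name__ == "__main__":
    # Size taken from "vm_allocate(size=1069056) failed" in the log above.
    pages, leftover, extra = describe(1069056)
    print(f"{pages} pages, {leftover} leftover bytes, {extra} pages over 1 MiB")
```

Under that assumption the request is a whole number of pages, slightly over 1 MiB. That would fit the guess that it is an internal malloc region rather than something a test asks for directly, but the logs don't confirm it.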
On IRC, KaiRo wrote "this machine only has 512MB of RAM".
Let me disable the tests for the time being, as in bug 471085:
this happens very frequently and the SeaMonkey box(es) need a chance to go green...
Attachment #380244 - Flags: review?
Attachment #380244 - Flags: review? → review?(roc)
Are these failures happening on comm-central?
Yes, on cb-sea-miniosx01 only.
Comment on attachment 380244 [details] [diff] [review]
(Av1-191) Disable wave test(s) on MacOSX SeaMonkey
[Backout: Comment 28]


http://hg.mozilla.org/releases/mozilla-1.9.1/rev/f8cdd1f61eff
Attachment #380244 - Attachment description: (Av1-191) Disable wave test(s) on MacOSX SeaMonkey → (Av1-191) Disable wave test(s) on MacOSX SeaMonkey [Checkin: Comment 7]
Whiteboard: [orange] → [test(s) disabled on SM/MacOSX] [orange]
Blocks: SmTestFail
(In reply to comment #6)
> Yes, on cb-sea-miniosx01 only.

Doesn't make me happy with disabling then, as that box is slated to die soon and the Leopard boxes should take over then. I hope you'll remember to activate it again at least when that has happened.
Blocks: 494671
(In reply to comment #8)

> Doesn't make me happy with disabling then, as that box is slated to die soon

I know we (seem to) have two different approaches:
I want to try whatever we can to get back to a green state, without keeping
painful oranges we have no solution for at the moment.

NB: I'm a little confused/disappointed, as you're telling me both that
"new SM-Ports is not in production yet" and "current SM production will be
replaced soon". It looks like a "no-no" to me :-/

> and the Leopard boxes should take over then.

Well, that box is the one in the worst shape currently,
yet it looks like this disabling may have actually helped it too... :-)

> I hope you'll remember to activate it again at least when that has happened.

Yes, the bugs stay open!
Blocks: 493450
No longer blocks: 494671
Now the next (wave) test is showing this bug:

http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243600872.1243607597.25938.gz
OS X 10.4 comm-central unit test on 2009/05/29 05:41:12
{
28093 INFO Running /tests/content/media/video/test/test_wav_ended2.html...
seamonkey-bin(8411,0xcd9dc00) malloc: *** vm_allocate(size=1069056) failed (error code=3)
NEXT ERROR seamonkey-bin(8411,0xcd9dc00) malloc: *** error: can't allocate region
}

I'll disable it too, if it happens too often.

***

KaiRo, did you rule out that this box might need a reboot or something?
Or do you believe its 512MB are just not enough (anymore)? (because there are more tests to handle or whatever)
No idea, it could be that it's just too little RAM to usefully test, or could be something not playing nice with our code on Tiger. I remember we had that problem and I rebooted it some time in between, so I don't think it's that. This is not Windows (*g*).
The Wave tests aren't specifically using a lot of memory, so disabling them one by one probably isn't going to achieve much.  Unless someone can reproduce this locally and work out the real cause or we can get similar data from that problematic box, this is going to be tricky to solve.

I think vm_allocate is failing because we're running out of virtual address space, not physical memory (OS X overcommits, and swap is enabled by default anyway).  If it was a physical memory problem, I think we'd just end up swapping badly and see test timeouts...
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243671807.1243675263.7730.gz
OS X 10.4 comm-central unit test on 2009/05/30 01:23:27
(In reply to comment #12)
> The Wave tests aren't specifically using a lot of memory, so disabling them one
> by one probably isn't going to achieve much.

At least it quiets this box, which is my main goal for the time being.

> I think vm_allocate is failing because we're running out of virtual address
> space

"Not enough disk space" then?
(In reply to comment #2)
> It'd be nice to know why we see this on the SeaMonkey OS X 10.4 unit test
> machine but not the Firefox 3.5 OS X 10.5 one.

Actually, it happens on Firefox (3.5) too:
(I just found this (example) while looking for another bug...)

{
Building on: bm-xserve16

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.5/1243677475.1243679463.15473.gz
OS X 10.5.2 mozilla-1.9.1 unit test on 2009/05/30 02:57:55

28442 INFO Running /tests/content/media/video/test/test_wav_ended1.html...
firefox-bin(97213,0xb03b1000) malloc: *** mmap(size=2097152) failed (error code=12)
NEXT ERROR *** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
}

Fwiw, not the same error code...
Summary: [SeaMonkey, MacOSX] mochitest-plain: test_wav_trailing.html intermittently triggers "malloc: *** error: can't allocate region" → [MacOSX] mochitest-plain: test_wav_trailing.html intermittently triggers "malloc: *** error: can't allocate region"
No longer blocks: SmTestFail
It's really useful to know that this is happening on that machine too, thank you!

That's a 10.5 machine, so it's not too surprising the error is slightly different.  The meaning is the same (errno 12 from mmap is ENOMEM).  I was going to try reproducing this on the 10.4 Mac Mini in the office when I was back from holiday, but since this also happens on 10.5 I can try using my laptop.
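
For reference, here is a small sketch that decodes the two codes seen so far. The Mach names are the first few kern_return_t values from <mach/kern_return.h>, copied by hand; the helper name decode() is made up for illustration.

```python
import errno

# First few Mach kern_return_t values, as listed in <mach/kern_return.h>.
# This table is hand-copied for illustration and is not complete.
KERN_RETURN = {
    0: "KERN_SUCCESS",
    1: "KERN_INVALID_ADDRESS",
    2: "KERN_PROTECTION_FAILURE",
    3: "KERN_NO_SPACE",
    4: "KERN_INVALID_ARGUMENT",
    5: "KERN_FAILURE",
}


def decode(call, code):
    """Map the 'error code=N' printed by malloc to a symbolic name.

    vm_allocate reports a Mach kern_return_t; mmap reports a POSIX errno.
    """
    if call == "vm_allocate":
        return KERN_RETURN.get(code, "unknown")
    if call == "mmap":
        return errno.errorcode.get(code, "unknown")
    raise ValueError(f"unexpected call: {call}")


if __name__ == "__main__":
    print(decode("vm_allocate", 3))  # code from the 10.4 SeaMonkey log
    print(decode("mmap", 12))        # code from the 10.5 Firefox log
```

Code 3 from vm_allocate is KERN_NO_SPACE (no suitable address range could be found), which would be consistent with the address-space theory, though on its own it doesn't say why the space ran out.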
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243755523.1243758785.25801.gz
OS X 10.4 comm-central unit test on 2009/05/31 00:38:43
Summary: [MacOSX] mochitest-plain: test_wav_trailing.html intermittently triggers "malloc: *** error: can't allocate region" → [MacOSX] mochitest-plain: test_wav_trailing.html (and others) intermittently triggers "malloc: *** error: can't allocate region"
Let's quiet the SM box some more and find out which (other) tests are triggering this error; that might give a clue as to what the cause is.
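
Something like the following could do the bookkeeping. It is only a sketch: the regular expressions are guesses based on the log excerpts quoted in this bug, blame() is a hypothetical helper, and "the last test mentioned before the failure" is a heuristic, not proof that the test caused it.

```python
import re
from collections import Counter

# Patterns guessed from the log excerpts quoted in this bug (assumptions).
RUNNING = re.compile(r"INFO Running (\S+?)\.\.\.")
TEST_LINE = re.compile(r"INFO TEST-[\w-]+ \| (\S+) \|")
FAILURE = re.compile(r"malloc: \*\*\* (?:vm_allocate|mmap)\(size=\d+\) failed")


def blame(lines):
    """Count allocation failures per most recently mentioned test."""
    current = None
    hits = Counter()
    for line in lines:
        match = RUNNING.search(line) or TEST_LINE.search(line)
        if match:
            # Remember the last test the log mentioned.
            current = match.group(1)
            continue
        if FAILURE.search(line):
            hits[current or "<unknown>"] += 1
    return hits


if __name__ == "__main__":
    sample = [
        "*** 28173 INFO Running /tests/content/media/video/test/test_wav_trailing.html...",
        "seamonkey-bin(24704,0x1cc7800) malloc: *** vm_allocate(size=1069056) failed (error code=3)",
    ]
    for test, count in blame(sample).most_common():
        print(count, test)
```

Feeding it the lines of each failing log and summing the counters would show whether the failures cluster on the wave tests or just land on whatever happens to be running.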

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/bb038569028f
(Bv1-191) Disable test_wav_ended2.html on MacOSX SeaMonkey
Whiteboard: [test(s) disabled on SM/MacOSX] [orange] → [3 tests disabled on SM/MacOSX] [orange]
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243791517.1243794783.10738.gz
OS X 10.4 comm-central unit test on 2009/05/31 10:38:37

28147 INFO TEST-PASS | /tests/content/media/video/test/test_wav_trunc.html | Duration should be around 1.8: 1.8100680112838745
seamonkey-bin(27219,0x76c6c00) malloc: *** vm_allocate(size=1069056) failed (error code=3)
}

{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243844092.1243847303.27129.gz
OS X 10.4 comm-central unit test on 2009/06/01 01:14:52

seamonkey-bin(8502,0x7141000) malloc: *** vm_allocate(size=1069056) failed (error code=3)
}
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/1cc01132dd89
(Cv1-191) Disable test_wav_trunc.html on MacOSX SeaMonkey
Whiteboard: [3 tests disabled on SM/MacOSX] [orange] → [4 tests disabled on SM/MacOSX] [orange]
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243878965.1243882191.5517.gz
OS X 10.4 comm-central unit test on 2009/06/01 10:56:05

28100 INFO TEST-PASS | /tests/content/media/video/test/test_wav_onloadedmetadata.html | No more than 1 onloadeddata events
seamonkey-bin(19935,0x74f4a00) malloc: *** vm_allocate(size=1069056) failed (error code=3)
seamonkey-bin(19935,0x74f4a00) malloc: *** error: can't allocate region
}
Is it possible to get ssh access to the builder that this is happening on frequently?  Or, alternatively, if I were to write up some instructions to investigate it, can someone work through them for me?

I've been trying to reproduce this locally on my 10.5 Macbook Pro, but I haven't seen it so far (after 50 continuous iterations of the full mochitest suite).

Once I get some more free time, I can set up continuous runs on our 10.4 Mac mini, but I suspect getting some kind of access to this builder is going to be a quicker way to find the root cause.
Blocks: 494120
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243908228.1243911539.30383.gz
OS X 10.4 comm-central unit test on 2009/06/01 19:03:48

28108 INFO TEST-PASS | /tests/content/media/video/test/test_wav_list.html | Duration should be around 4.2: 4.264124870300293
seamonkey-bin(9534,0x1ad5600) malloc: *** vm_allocate(size=1069056) failed (error code=3)
seamonkey-bin(9534,0x1ad5600) malloc: *** error: can't allocate region
}
No longer blocks: 494120
(In reply to comment #23)
> http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1243908228.1243911539.30383.gz
> OS X 10.4 comm-central unit test on 2009/06/01 19:03:48

This bug has not happened again on this box since this last failure.

Possible fix timeframe:
http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?startdate=+2009-06-01+18%3A25%3A52&enddate=+2009-06-03+20%3A26%3A31
especially the audio fixes between Jun 01 23:10:23 2009 -0700 and Jun 02 02:06:20 2009 -0700.
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/af6a737e35c4
(Dv1-191) Re-enable test_wav_trunc.html on MacOSX SeaMonkey

Undo Cv1-191, and sort the file names.
Whiteboard: [4 tests disabled on SM/MacOSX] [orange] → [3 tests disabled on SM/MacOSX] [orange]
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/2626fd551f31
(Ev1-191) Re-enable test_wav_ended2.html on MacOSX SeaMonkey

Undo Bv1-191, and sort test names.
http://hg.mozilla.org/mozilla-central/rev/26d9acfe0092
(Fv1) reorder file list
Whiteboard: [3 tests disabled on SM/MacOSX] [orange] → [2 tests disabled on SM/MacOSX] [orange]
Comment on attachment 380244 [details] [diff] [review]
(Av1-191) Disable wave test(s) on MacOSX SeaMonkey
[Backout: Comment 28]


http://hg.mozilla.org/releases/mozilla-1.9.1/rev/a339028e68c3


Bug 493450 / bug 494671 may not like (all) these tests,
but we don't care about this unstable box (anymore).
Attachment #380244 - Attachment description: (Av1-191) Disable wave test(s) on MacOSX SeaMonkey [Checkin: Comment 7] → (Av1-191) Disable wave test(s) on MacOSX SeaMonkey [Backout: Comment 28]
Status: NEW → RESOLVED
Closed: 12 years ago
No longer depends on: 435223
Flags: in-testsuite-
Resolution: --- → WORKSFORME
Whiteboard: [2 tests disabled on SM/MacOSX] [orange] → [orange]
Target Milestone: --- → mozilla1.9.1
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1244722040.1244727765.11225.gz
OS X 10.4 comm-central unit test on 2009/06/11 05:07:20

Error never seen again since comment 23.

V.WFM
Status: RESOLVED → VERIFIED
Whiteboard: [orange]