Closed Bug 1273258 Opened 8 years ago Closed 6 years ago

Browser crash in emunittest suite when loading up the second test page

Categories

(Core :: IPC, defect, P3)

x86_64
Windows 10
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox48 --- affected
firefox49 --- affected

People

(Reporter: jujjyl, Unassigned)

Details

(Whiteboard: btpp-active)

Attachments

(2 files)

Attached file crash_bisect_log.txt
Bug 1268616 seems to have introduced a crash regression in the emunittest 0.5 suite when the suite is run in full with the options "cpuprofiler enabled" and "vsync disabled".

Crash reports:

https://crash-stats.mozilla.com/report/index/cb7b1d18-c686-493c-8a49-dbd322160516
https://crash-stats.mozilla.com/report/index/1f37a564-eee7-4739-b7c9-1ac462160516
https://crash-stats.mozilla.com/report/index/fe9b882e-0f79-4a9e-8162-63d502160516
https://crash-stats.mozilla.com/report/index/dd951099-be20-45cc-b2aa-33d272160516

The crash occurs even with e10s disabled.

Attached is the mozregression bisection log, which ended with:

20:01.23 INFO: Narrowed inbound regression window from [de5ab3fd, bd5c9cfe] (3 revisions) to [92eb1555, bd5c9cfe] (2 revisions) (~1 steps left)
20:01.24 INFO: Oh noes, no (more) inbound revisions :(
20:01.24 INFO: Last good revision: 92eb1555b47c07af66e17c08cac4ed8bb4e67aa5
20:01.25 INFO: First bad revision: bd5c9cfed83db345be921735e61a8bdd63070841
20:01.25 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=92eb1555b47c07af66e17c08cac4ed8bb4e67aa5&tochange=bd5c9cfed83db345be921735e61a8bdd63070841

20:02.40 INFO: Looks like the following bug has the changes which introduced the regression:
https://bugzilla.mozilla.org/show_bug.cgi?id=1268616
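The log above shows mozregression narrowing a window of revisions until a last good and a first bad revision are adjacent. As a rough illustration only, the narrowing step can be sketched as a plain binary search. This is not mozregression's code: the function name `find_first_bad` and the `is_bad` predicate (standing in for "fetch this build, run the suite, report whether it crashed") are hypothetical. The sketch assumes a linear list of revisions whose first entry is good and last entry is bad, and a deterministic test; a flaky test breaks that last assumption and can point at the wrong commit.

```python
from typing import Callable, Sequence


def find_first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad revision in `revisions`.

    Hypothetical sketch. Assumes revisions[0] is known good, revisions[-1]
    is known bad, and that once a revision is bad, every later one is too.
    """
    good, bad = 0, len(revisions) - 1
    # Shrink the [good, bad] window until the two ends are adjacent.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return revisions[bad]
```

For example, with revisions `["a", "b", "c", "d", "e"]` and a test that fails from `"d"` onward, the search checks `"c"` (good), then `"d"` (bad), and reports `"d"` as the first bad revision.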

+++ This bug was initially created as a clone of Bug #1268616 +++
Summary: Very large allocations in call to input_overflow_buf_.reserve() → Browser crash in emunittest suite when loading up the second test page
The stacks in those crash reports are pretty bad.

I wouldn't be surprised if something is just sending a ton of data via IPC, but we'd have to know what was actually doing it.
Any chance you could get a stack trace, Jukka? Also, how does one run emunittest?
Attached file crash_bisect_log_2.txt
For good measure, I reran the bisection in case there was some nondeterminism, but the second run ended at the same offending commit.
(In reply to Jukka Jylänki from comment #3)
> For good measure, I reran the bisection in case there was some
> nondeterminism, but the second run ended at the same offending commit.

That patch limited the size of IPC messages to 128 MB, and from previous OOM bugs you've filed, we've seen that this test suite sends some very large messages, so I'm not surprised that the patch made the test suite crash every time instead of just sometimes. We need a stack trace so that we can figure out which IPDL protocol is sending the large messages. The stacks in comment 0 can't be used to determine this, I think because of known crash-stats issues on Windows.
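The reasoning above (a size cap turning an occasional crash into a constant one) can be illustrated with a toy model. This is a hypothetical Python sketch, not Gecko's IPC code: the names `MAX_IPC_MESSAGE_SIZE` and `send_outcome` are invented for illustration, and whether the real check uses `>` or `>=`, and exactly which size it measures, are assumptions. Without the cap, an oversized message only crashes when the buffer allocation happens to fail, which depends on how much memory is free at that moment; with the 128 MB cap enforced by a release assertion, any larger message crashes every time.

```python
# Hypothetical model of the comment's reasoning; not Gecko code.
MiB = 1024 * 1024
MAX_IPC_MESSAGE_SIZE = 128 * MiB  # the 128 MB cap from the comment; name is hypothetical


def send_outcome(message_size: int, free_memory: int, enforce_cap: bool) -> str:
    """Classify what happens when a message of `message_size` bytes is sent.

    Assumes the buffer allocation fails only when the message is larger than
    the memory available at that moment, which varies from run to run.
    """
    if enforce_cap and message_size > MAX_IPC_MESSAGE_SIZE:
        # With the cap, an oversized message trips a release assertion every time.
        return "crash (size assertion)"
    if message_size > free_memory:
        # Without the cap, failure depends on how much memory happens to be free.
        return "crash (OOM)"
    return "ok"


if __name__ == "__main__":
    for free in (64 * MiB, 1024 * MiB):
        print(send_outcome(200 * MiB, free, enforce_cap=False),
              send_outcome(200 * MiB, free, enforce_cap=True))
```

With a 200 MB message, the uncapped case crashes only on the low-memory run, while the capped case crashes on both runs.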
Blocks: 1268616
No longer depends on: 1268616
A similar-looking crash (i.e. crash stats without a call stack, appearing only on recent Nightly) was reported for the Ski Safari Facebook asm.js game at https://apps.facebook.com/skisafari/, although it did not crash for me on Windows.

I believe the missing call stack is an effect of the crash itself rather than of the OS, since the Ski Safari reports came from OS X and also had very bogus-looking crash reports:

https://crash-stats.mozilla.com/report/index/ce634560-b272-46c2-bd13-083ea2160513
https://crash-stats.mozilla.com/report/index/aebf1502-3b01-47f8-878f-2baac2160513
https://crash-stats.mozilla.com/report/index/d24c7618-ae82-4ba9-b7ed-685ec2160513
That is peculiar. The regressing patch mostly just adds a MOZ_RELEASE_ASSERT(), so I'm not sure how it would make the stacks bad.
For a public test case that reliably reproduces the crash for me on Windows, visit the following page twice:

https://s3.amazonaws.com/mozilla-games/emunittest-0.4/2015-08-28-emunittest_0.4-AngryBots-u5.1.3f1_hg-e1.34.6-release-prof/index.html?playback&novsync&cpuprofiler

i.e. after the test finishes, press Enter in the address bar to reload the page. The crash occurs partway through the second run.

Crash reports:
https://crash-stats.mozilla.com/report/index/6c4be2a4-77dc-41e1-9435-cf5a22160516
https://crash-stats.mozilla.com/report/index/1061a70a-57f3-41fc-b704-4bd162160516
https://crash-stats.mozilla.com/report/index/37da4d4f-d56b-4c2b-b107-6d36f2160516
https://crash-stats.mozilla.com/report/index/2bcea8b1-dc4a-4e0c-835b-3a5972160516

whose call stacks don't make much sense either. These were run with e10s disabled.
Crash volume for signature 'OOM | large | NS_ABORT_OOM | Buffer::try_realloc':
 - nightly (version 51): 0 crashes from 2016-08-01.
 - aurora  (version 50): 0 crashes from 2016-08-01.
 - beta    (version 49): 0 crashes from 2016-08-02.
 - release (version 48): 835 crashes from 2016-07-25.
 - esr     (version 45): 0 crashes from 2016-05-02.

Crash volume over the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       0
 - aurora        0       0       0
 - beta          0       0       0
 - release     364     211     100
 - esr           0       0       0

Affected platform: Windows

Crash rank over the last 7 days:
           Browser   Content     Plugin
 - nightly
 - aurora
 - beta
 - release #76       #62
 - esr
Priority: P1 → P3
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME