Open Bug 1345093 Opened 7 years ago Updated 2 years ago

Windows reftest timeouts after "JavaScript error: chrome://reftest/content/reftest.jsm, line 1741: NS_ERROR_FAILURE:"

Categories

(Testing :: Reftest, defect)

Unspecified
Windows 7
defect

Tracking

(Not tracked)

People

(Reporter: kats, Unassigned)

References

(Depends on 20 open bugs)

Details

Attachments

(1 file)

There are a lot of different intermittent failure bugs filed for what appears to be the same underlying root issue. The Windows 7 R(R) job runs, spits out this error:
  JavaScript error: chrome://reftest/content/reftest.jsm, line 1741: NS_ERROR_FAILURE:
and then hangs. 330 seconds later the test times out and is force-crashed by the harness.

Line 1741 in reftest.jsm is not a particularly interesting line (http://searchfox.org/mozilla-central/rev/1bc7720a434af3c13b68b8d69f92078cae8890c4/layout/tools/reftest/reftest.jsm#1741), so I don't see why it would produce an NS_ERROR_FAILURE.

Example logs:

https://treeherder.mozilla.org/logviewer.html#?job_id=82088875&repo=mozilla-inbound&lineNumber=12663
https://treeherder.mozilla.org/logviewer.html#?job_id=82072350&repo=mozilla-inbound&lineNumber=12510
https://treeherder.mozilla.org/logviewer.html#?job_id=81783775&repo=mozilla-inbound&lineNumber=21205
https://treeherder.mozilla.org/logviewer.html#?job_id=81783793&repo=mozilla-inbound&lineNumber=21239
https://treeherder.mozilla.org/logviewer.html#?job_id=81783799&repo=mozilla-inbound&lineNumber=61369

It would be super if TreeHerder or the reftest harness or mozharness flagged the JavaScript error output as relevant, so that sheriffs can more easily annotate these failures as being the same thing, rather than ending up filing tons of one-off bugs.
The exact line number varies a bit depending on how far back you go, but it always points to the same actual line of code. I'm going to stop adding more bugs here because that would take me all day. Here's a search query to find more:

https://bugzilla.mozilla.org/buglist.cgi?keywords=intermittent-failure%2C%20&keywords_type=allwords&list_id=13475923&short_desc_type=allwordssubstr&short_desc=application%20timed%20out%20after%20330%20seconds%20with%20no%20output&resolution=---&query_format=advanced&component=Graphics&component=Layout&product=Core
Also a warning: this failure might become more frequent in the near future, after bug 1342450 lands. On a recent try push for that bug [1] 44% of the Windows 7 opt R(R) jobs encountered this error.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=08fd35010f0b5d58def870e29b7b858f69cda0ff&filter-searchStr=windows%20reftest&group_state=expanded
Blocks: 1342450
reftest.jsm is preprocessed. Are the line numbers in the errors adjusted for that, or do you need to unpack the post-preprocessing version to see what's actually on line 1741?
Ah, good point. I downloaded the zip file [1] for the try push I linked above, and looked at the post-processed reftest.jsm. Line 1741 points to [2] which seems like a more reasonable culprit.

[1] https://queue.taskcluster.net/v1/task/bQvMnnkFToiXdf5VpYy8Xw/artifacts/public/build/firefox-54.0a1.en-US.win32.reftest.tests.zip
[2] http://searchfox.org/mozilla-central/rev/1bc7720a434af3c13b68b8d69f92078cae8890c4/layout/tools/reftest/reftest.jsm#1757
Also it seems like some (but not all) of the failing reftest jobs have gfx error output, for example:

05:52:19     INFO - [GFX1-]: Failed 2 buffer db=00000000 dw=00000000 for 0, 0, 800, 1000

and this shows up just before the NS_ERROR_FAILURE. My best guess is that we're running out of memory, and so both the GFX code and toDataURL are failing as a result.
Attached file Virt samples
I took the resource-usage file from a sample failing job [1] and extracted the "virt" samples. The sample with the smallest "free" value is still 1233567744; assuming this is virtual-memory data sampled during the run (the format isn't documented), free memory never dropped below ~1.2 GB. That makes it pretty unlikely that running out of memory is the problem, but maybe we have really bad memory fragmentation?

[1] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/77b2dbbe62bccc9204f81d0884a514db0bb6834fc5af5eda5e5afb84e7cbb4015e69db7e28d1b37801d8da4277e5b5132bf24809d54c1143d7c12a8d21c935b1
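For reference, the analysis above can be sketched as a few lines of Node-style JavaScript. The sample shape used here ({ free: <bytes> }) is an assumption for illustration, since the resource-usage format is undocumented; the sample values below are made up, apart from the minimum seen in the real log.

```javascript
// Sketch: scan "virt" samples from a resource-usage dump and report the
// smallest "free" value seen during the run.
// ASSUMPTION: each sample is an object with a numeric `free` field (bytes).

function minFreeBytes(samples) {
  return samples.reduce(
    (min, s) => Math.min(min, s.free),
    Number.POSITIVE_INFINITY
  );
}

// Illustrative samples; 1233567744 is the smallest value from the real log.
const samples = [
  { free: 2147483648 },
  { free: 1233567744 },
  { free: 1500000000 },
];

const min = minFreeBytes(samples);
console.log(`minimum free: ${min} bytes (~${(min / 1e9).toFixed(1)} GB)`);
```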
Indeed it is. But in this case at least the error is being reported to JavaScript, so we should be able to catch it, force-trigger a GC/CC, and try again. I'll write a patch that does that.
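A minimal sketch of that recovery idea: if an operation throws (e.g. an OOM surfacing as NS_ERROR_FAILURE from canvas.toDataURL()), run a recovery step and retry once. In the real reftest.jsm patch the recovery step would presumably be chrome-only calls like Components.utils.forceGC() / Components.utils.forceCC(); here it's a plain callback so the sketch runs outside chrome code, and the failing operation is simulated.

```javascript
// Generic catch-recover-retry wrapper. The recovery callback stands in for
// a forced GC/CC; the operation stands in for canvas.toDataURL().
function withRetry(operation, recover) {
  try {
    return operation();
  } catch (e) {
    recover();          // e.g. force a GC/CC to reclaim memory
    return operation(); // a second failure propagates to the caller
  }
}

// Usage: the first attempt fails, recovery runs, the retry succeeds.
let attempts = 0;
const result = withRetry(
  () => {
    attempts++;
    if (attempts === 1) throw new Error("NS_ERROR_FAILURE"); // simulated OOM
    return "ok";
  },
  () => { /* stand-in for Cu.forceGC() / Cu.forceCC() */ }
);
```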
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #0)
> It would be super if TreeHerder or the reftest harness or mozharness flagged
> the JavaScript error output as relevant, so that sheriffs can more easily
> annotate these failures as being the same thing, rather than ending up
> filing tons of one-off bugs.

Bug 1299274 was my idea. Hasn't gotten any traction since.
Try push with the OOM guard: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7df09d6534a28046016fe8fd9846795ea3672fe0

While testing the patch locally I realized that the error line (a call to canvas.toDataURL()) only gets run if the test is already failing. So this might be exactly the same issue as bug 1300355, just with a different manifestation (i.e. an OOM causes the test to fail, and then a second OOM during the reporting of that failure causes this hang).

Anyhow, I'll retrigger the reftest a bunch of times and see what happens.
So in https://treeherder.mozilla.org/logviewer.html#?job_id=82479713&repo=try&lineNumber=61373 it triggered my OOM guard (I can see its output), but my recovery attempt of running a GC/CC didn't help: the allocation failed even after that, and the job ran into the same timeout. So whatever is fragmenting the memory isn't helped by GC'ing. We should probably fix the fragmentation - I left a comment in bug 1300355.
Depends on: 1341664
Severity: normal → S3