Bug 969986 (Open): opened 10 years ago, updated 2 years ago

jittest suite sometimes fails to detect and kill hangs

Categories: Core :: JavaScript Engine (defect)

People: Reporter: philor; Assignee: Unassigned

Since the switch to m3.medium instances for the linux64 test slaves caused linux64 ASan and debug jittests to frequently (or perhaps always) hang, you can see from https://tbpl.mozilla.org/php/getParsedLog.php?id=34017372&tree=Cedar, https://tbpl.mozilla.org/php/getParsedLog.php?id=34017387&tree=Cedar, and https://tbpl.mozilla.org/php/getParsedLog.php?id=34355492&tree=Try that, rather than noticing at any point during the 30 minutes the suite sits hung with no output, we just wait until the 7200 second total job timer expires and kills the step. The proper thing for a suite that wants this to be visible would be to report "TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/jaeger/bug781859-1.js | Timed out after 330 seconds with no output" rather than the utterly generic "command timed out: 7200 seconds elapsed, attempting to kill", which also wastes 30 or 90 minutes of machine time.
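(For illustration only, a minimal sketch of how a harness-level no-output watchdog could kill a hung child and emit the suggested TEST-UNEXPECTED-FAIL line instead of leaving it to the 7200-second job timer. This is not the real jit_test.py code; the 330-second value and function names are just taken from or invented around the log excerpt above.)

# Illustrative sketch only, not the real harness code: kill a child that has
# produced no output for `no_output_timeout` seconds and report it in the
# suggested TEST-UNEXPECTED-FAIL format.
import subprocess
import threading
import time

def run_with_no_output_watchdog(cmd, test_name, no_output_timeout=330):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    last_output = [time.time()]

    def drain():
        # Forward child output and remember when we last saw any.
        for line in proc.stdout:
            last_output[0] = time.time()
            print(line, end="")

    threading.Thread(target=drain, daemon=True).start()

    while proc.poll() is None:
        if time.time() - last_output[0] > no_output_timeout:
            proc.kill()
            print("TEST-UNEXPECTED-FAIL | %s | Timed out after %d seconds "
                  "with no output" % (test_name, no_output_timeout))
            return False
        time.sleep(1)
    return proc.returncode == 0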
There is a timeout parameter which defaults to 150 seconds, and I'm fairly certain I've seen it work properly in the past. Not sure what went wrong here.
Where does the harness live?
It lives at: js/src/jit-test/jit_test.py.
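(A hedged sketch of how such a per-test wall-clock timeout is typically enforced; this is not the actual jit_test.py implementation, and only the 150-second default is taken from the comment above.)

# Sketch of a per-test wall-clock timeout; not the actual jit_test.py code.
import subprocess

def run_one_test(cmd, timeout=150.0):
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
        return result.returncode, result.stdout
    except subprocess.TimeoutExpired:
        # A timed-out test should be reported by the harness itself instead
        # of being left for the job-level 7200 s timer to clean up.
        return None, "TIMEOUT after %.0f seconds" % timeout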
Component: General Automation → JavaScript Engine
Product: Release Engineering → Core
QA Contact: catlee
Summary: jittest suite needs to detect and kill hangs → jittest suite sometimes fails to detect and kill hangs
I may well have been wrong on my description but right on my choice of product - someone with an m3.medium loaner would need to run the suite to see, but since jit_test.py most certainly does have timeout handling, the failure mode on m3.medium could well be "jit_test.py itself hangs, but we haven't set a particular timeout or no-output-timeout on the step running it, so we just run the whole job out to 7200 seconds."
(In reply to Phil Ringnalda (:philor) from comment #4)
> I may well have been wrong on my description but right on my choice of
> product - someone with an m3.medium loaner would need to run the suite to
> see, but since jit_test.py most certainly does have timeout handling, the
> failure mode on m3.medium could well be "jit_test.py itself hangs, but we
> haven't set a particular timeout or no-output-timeout on the step running
> it, so we just run the whole job out to 7200 seconds."
 
There is a 20 minute no output timeout that I've hit before when playing with the timeout settings (e.g. https://tbpl.mozilla.org/?tree=Try&rev=6f78e053742c&showall=1).

I initially read this as a test harness bug, but if that timer only fails on m3.medium test machines then I'll be happy to move it back to the releng component.
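(For context, a hedged sketch of what setting both a step-level no-output timeout and a total-runtime cap could look like in Buildbot; the command, paths, and values here are placeholders, not the actual releng configuration for this job.)

# Hedged sketch only: command, paths, and values are placeholders.
from buildbot.steps.shell import ShellCommand

jittest_step = ShellCommand(
    name="jittest",
    command=["python", "js/src/jit-test/jit_test.py", "obj/dist/bin/js"],
    timeout=1200,   # no-output timeout: kill the step after 20 idle minutes
    maxTime=7200,   # hard cap on total step runtime
)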
Dunno, it's sort of inconvenient having them currently be unable to start. I also failed to notice that my Try log wasn't hung at all: it started a new test during the very second it was killed for taking 7200 seconds, so that was just a jaw-droppingly slow run.
Blocks: 973900
Has this problem shown up in any recent test runs?
Flags: needinfo?(philringnalda)
No idea: Cedar is a killing field, m3.medium instances are gone, and until yesterday only ASan ran it on try.
Flags: needinfo?(philringnalda)
Severity: normal → S3