Closed Bug 706071 Opened 13 years ago Closed 6 years ago

deep test timeouts since rev 6732:0d3188bf9ed0

Categories

(Tamarin Graveyard :: Build Config, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: pnkfelix, Assigned: dschaffe)

Details

Attachments

(1 file)

From:

  http://asteam.corp.adobe.com/irc/log/ttlogger/tamarin/20111129

fklockii: brbaker: do you have any clue what's going on with the deep tests? Skimming over the changeset you narrowed the problem down to, I don't see how it's possibly related. I'm unable to replicate the timeout issue locally (since the timeout is originating from buildbot, it seems), and I'm hesitant to back out the changeset without being able to get some confidence that doing so will fix things

...

brbaker: and Dan Schaffer added a call to self.getAscVersion(self.asc) which is what was causing the runs to hang before

brbaker: hmmm although where that code was added shouldn't be causing issues

fklockii: From looking at the log of the failure, it seems like runtests.py is running to completion (in terms of it getting through all of its own output during its own five minute run), but the buildbot timeout monitor is claiming that it isn't producing output for 20 minutes… Does this sound like an accurate inference, or am I misinterpreting?

brbaker: That seems like a correct interpretation....

...

brbaker: I just ran a quick -dgreedy run and I got some python errors trying to shut down, and the process has not completed cleanly

brbaker: still running even though I do have the output of # of tests passing/failing etc

brbaker: so for buildbot, the process would sit like this for 20 minutes and then it would be killed

...

brbaker: ok, I suggest we punt to Dan on this and just revert the change

fklockii: yeah okay I'm starting to believe that is a sane plan of attack

brbaker: rev 6731 passed, 6732 fails, and it is 6732 where this was changed
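
For readers unfamiliar with the symptom described in the log above (a summary is printed, yet the process never exits), here is a minimal Python sketch of one general way this can happen: a non-daemon worker thread that is still blocked keeps the interpreter alive after the main thread has finished. This is only an illustration under that assumption; the log does not confirm it as the cause, and `run_with_stuck_worker` is a hypothetical name, not code from runtests.py.

```python
import threading


def run_with_stuck_worker():
    """Simulate a run whose summary is printed while a worker is still blocked.

    Returns (summary, worker_still_alive).
    """
    release = threading.Event()

    # A non-daemon thread: at shutdown the interpreter waits for it to finish.
    worker = threading.Thread(target=release.wait, daemon=False)
    worker.start()

    # The main part of the run finishes and reports its totals...
    summary = "passes: 10, failures: 0"
    print(summary)

    # ...but the worker is still blocked, so a real process would not exit
    # at this point and would produce no further output.
    still_alive = worker.is_alive()

    # Unblock the worker so that this demonstration itself terminates.
    release.set()
    worker.join()
    return summary, still_alive
```

In a real run nothing would set the event, so an external monitor that kills a process after a period without output would eventually fire even though all test results had already been printed.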
Assignee: nobody → dschaffe
changeset: 6746:dd5a8ee68da0
user:      Brent Baker <brbaker@adobe.com>
summary:   Bug 706071: Backed out changeset 0d3188bf9ed0 which was pushed for Bug 689592 (inspired by=brbaker, author=fklockii, r=fklockii, pusher=fklockii, further questions=brbaker).

http://hg.mozilla.org/tamarin-redux/rev/dd5a8ee68da0
changeset: 6747:59d86dbd7381
user:      Brent Baker <brbaker@adobe.com>
summary:   Bug 706071: Merged backout of changeset 0d3188bf9ed0 which was pushed for Bug 689592 (inspired by=brbaker, author=fklockii, r=fklockii, pusher=fklockii, further questions=brbaker).

http://hg.mozilla.org/tamarin-redux/rev/59d86dbd7381
Unfortunately the backout logged in comment 1 and comment 2 does not seem to have brought deep into a total green state; win64-deep and linux-arm-deep are both still having timeout issues.

I wouldn't be surprised if we have just landed some set of tests that are taking a bit too long to run.  (I also wouldn't be completely surprised if I am the source of those tests.)  But this isn't really a full explanation; the timeout failures in the backed-out changeset look like a situation where "runtests.py is running to completion ... but the buildbot timeout monitor is claiming that it isn't producing output for 20 minutes", as discussed in comment 0.

Resolving this in one way or another seems like a very high priority item; I am not eager to start a new Tamarin-to-FRmain integration with any red cells in the deep runs.

It seems like resolving Bug 618980 (which strikes me as a work item) might do a lot of good here.
changeset: 6754:b442a94598f0
user:      Dan Schaffer <dschaffe@adobe.com>
summary:   bug 706071: fix bug causing deep hang from runtests.py --timeout=,  original patch is slightly modified from bug 689592 (r=trbaker)

http://hg.mozilla.org/tamarin-redux/rev/b442a94598f0
Attachment #579059 - Flags: review?(brbaker)
Attachment #579059 - Flags: review?(brbaker) → review+
changeset: 6768:cabd5079eecb
user:      Dan Schaffer <dschaffe@adobe.com>
summary:   bug 706071: workaround for deep failures with --timeout by running tests with threads==1 (r=brbaker)

http://hg.mozilla.org/tamarin-redux/rev/cabd5079eecb
changeset: 7133:509206e6ba68
user:      Brent Baker <brbaker@adobe.com>
summary:   bug 706071: revert workaround for deep failures with --timeout by running tests with threads==1

http://hg.mozilla.org/tamarin-redux/rev/509206e6ba68
changeset: 7134:f2aeb3df7cbe
user:      Brent Baker <brbaker@adobe.com>
summary:   Bug 706071: revert the revert. It was discussed prior to the holiday break that this workaround may not be required anymore... It appears that it is still required since the first deep run did hang again in the mac-deep Release-Dgreedy run. Putting the workaround back in place

http://hg.mozilla.org/tamarin-redux/rev/f2aeb3df7cbe
(In reply to Tamarin Bot from comment #8)
> It was discussed prior to the
> holiday break that this workaround may not be required anymore... It appears
> that it is still required since the first deep run did hang again in the
> mac-deep Release-Dgreedy run.

Just to supply a touch more context: the reason I hypothesized the workaround might be unnecessary was that I pushed a fix for how threadpool.py handles exceptions, as logged on Bug 710587, comment 4.

But it appears that my hypothesis was incorrect, in that the threadpool.py update does not seem to have fixed the mac-deep -Dgreedy hangs.
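
As background on why exception handling in a thread pool matters for hangs of this kind, here is a minimal sketch of a worker loop that always marks a job as done, even when the job raises. It is not the actual threadpool.py from the Tamarin tree, and `run_jobs` is a hypothetical helper; it assumes a plain `queue.Queue` of callables. If a worker died on an exception without calling `task_done()`, `Queue.join()` would block forever.

```python
import queue
import threading


def run_jobs(jobs, num_threads=2):
    """Run callables on worker threads; return (results, errors)."""
    work = queue.Queue()
    results, errors = [], []
    lock = threading.Lock()

    def worker():
        while True:
            job = work.get()
            try:
                if job is None:  # sentinel: no more work for this thread
                    return
                value = job()
                with lock:
                    results.append(value)
            except Exception as exc:
                # Record the failure instead of letting the thread die.
                with lock:
                    errors.append(exc)
            finally:
                # Always balance get() so that work.join() can return.
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for job in jobs:
        work.put(job)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    work.join()
    for t in threads:
        t.join()
    return results, errors
```

As noted above, the real threadpool.py fix did not resolve the mac-deep -Dgreedy hangs, so this pattern should be read as context rather than as the resolution.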
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.