Closed Bug 757684 (opened 13 years ago, closed 13 years ago)

stop eating all exceptions in dmg_signpackage

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

Attachments

(1 file)

stop eating exceptions in dmg_signpackage, 13 years ago, bhearsum@mozilla.com (:bhearsum), 588 bytes, patch; bear: review+, bhearsum: checked-in+

bhearsum@mozilla.com (:bhearsum)

Assignee

Description

13 years ago

I've hit this a few times with Mac signing, and it causes awful problems with l10n: it seems to corrupt the objdir in such a way that subsequent builds on the same slave do not work.

    2012-05-22 15:50:55,695 - DEBUG - 28218: Exceeded timeout
    2012-05-22 15:50:56,695 - DEBUG - 28218: Success!

The first line comes from the main thread, here: https://github.com/mozilla/build-tools/blob/master/release/signing/signing-server.py#L130, and is printed very shortly before kill() is called. The second line comes from the main thread too, here: https://github.com/mozilla/build-tools/blob/master/release/signing/signing-server.py#L155.

The first time through the while loop we print out "Exceeded timeout", and then proceed into the kill() code. That code runs, and doesn't raise any exception. We then hit this code: https://github.com/mozilla/build-tools/blob/master/release/signing/signing-server.py#L148 which polls the process, and I _assume_ receives a return code and breaks out of the loop. Then we get to: https://github.com/mozilla/build-tools/blob/master/release/signing/signing-server.py#L155 which only prints success if rc == 0.

_SO_. It appears to me that the worker processes are somehow getting killed with kill(), as evidenced by the fact that the tarball is corrupt. Additionally, it appears that the workers, despite being kill()'ed, are returning 0, as evidenced by the fact that we get "Success!" in the log.

Still digging into how this is possible, and what exactly we can do about it.

bhearsum@mozilla.com (:bhearsum)

Assignee

Comment 1

13 years ago

Attached patch: stop eating exceptions in dmg_signpackage

21:00 < bhearsum|afk> ok, so there's some sort of race or logic error in the signing server that's causing it to kill workers yet have them think they succeeded - full details of that are in https://bugzilla.mozilla.org/show_bug.cgi?id=757684 - the long and short is that we try to kill the workers with SIGINT, and then check the exit code to see what happened. the worker thread is executing some code that (unfortunately) eats all exceptions and returns False, but we don't check the return code of the function that does that
21:00 < bhearsum|afk> _so_, my theory is that the worker is dying, but the exception that the kill causes gets eaten, and the process exits normally

Here's the test I did to prove that theory:

    ➜ tmp cat test.py
    #!/usr/bin/python

    import time

    try:
        while True:
            time.sleep(1)
    except:
        import traceback
        traceback.print_exc()
    ➜ tmp python test.py
    Traceback (most recent call last):
      File "test.py", line 7, in <module>
        time.sleep(1)
    KeyboardInterrupt
    ➜ tmp echo $?
    0
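
The behaviour in the test above can be sketched in isolation. This is not the attached patch, whose contents are not shown in this thread; it only assumes that the fix direction is to stop using a bare `except:` so that `KeyboardInterrupt` (which derives from `BaseException`, not `Exception`) can propagate. The names `sign_swallowing`, `sign_narrow` and `interrupted` are hypothetical and exist only for illustration.

```python
import traceback


def sign_swallowing(work):
    """Mirror of the problematic pattern: a bare except eats everything,
    including KeyboardInterrupt and SystemExit, and just returns False."""
    try:
        work()
    except:  # deliberately too broad, to reproduce the problem
        traceback.print_exc()
        return False
    return True


def sign_narrow(work):
    """Only ordinary errors are caught; KeyboardInterrupt reaches the caller,
    so an interrupted process can exit with a non-zero status."""
    try:
        work()
    except Exception:
        traceback.print_exc()
        return False
    return True


def interrupted():
    # Stand-in for a SIGINT arriving while the worker is busy.
    raise KeyboardInterrupt
```

Calling `sign_swallowing(interrupted)` returns `False` and carries on, which matches the exit status of 0 seen in the test. Calling `sign_narrow(interrupted)` lets the `KeyboardInterrupt` escape, so a caller that ignores the return value still cannot mistake the interruption for a normal exit.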

Assignee: nobody → bhearsum

Attachment #626276 - Flags: review?(bear)

Mike Taylor [:bear]

Updated

13 years ago

Attachment #626276 - Flags: review?(bear) → review+

bhearsum@mozilla.com (:bhearsum)

Assignee

Updated

13 years ago

Attachment #626276 - Flags: checked-in+

bhearsum@mozilla.com (:bhearsum)

Assignee

Updated

13 years ago

Summary: possible race condition in signing server can cause workers to both succeed and time out → stop eating all exceptions in dmg_signpackage

Chris AtLee [:catlee]

Comment 2

13 years ago

we should probably update the signing server logic so that it's impossible for a worker that's timed out to be considered successful.
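
A minimal sketch of that idea, assuming a POSIX host and a polling loop broadly like the one described in the bug description. It is not the signing server's code: `run_with_timeout`, its parameters, and `SWALLOWING_CHILD` are hypothetical. The point is only that the timeout is remembered separately from the exit code, so a worker that swallows SIGINT and exits 0 is still reported as failed.

```python
import signal
import subprocess
import sys
import time

# Hypothetical worker that eats the KeyboardInterrupt and exits normally.
SWALLOWING_CHILD = [
    sys.executable,
    "-c",
    "import time\ntry:\n    time.sleep(60)\nexcept:\n    pass\n",
]


def run_with_timeout(cmd, timeout, grace=2.0, poll_interval=0.05):
    """Run cmd and return (succeeded, timed_out, returncode)."""
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    timed_out = False
    while True:
        rc = proc.poll()
        if rc is not None:
            break
        elapsed = time.monotonic() - start
        if not timed_out and elapsed > timeout:
            # Remember the timeout before asking the worker to stop.
            timed_out = True
            proc.send_signal(signal.SIGINT)
        elif timed_out and elapsed > timeout + grace:
            # The worker ignored SIGINT; stop waiting politely.
            proc.kill()
        time.sleep(poll_interval)
    # A worker that timed out is never successful, whatever it returned.
    succeeded = rc == 0 and not timed_out
    return succeeded, timed_out, rc
```

With this shape, `run_with_timeout(SWALLOWING_CHILD, timeout=1.0)` reports `timed_out` as true and `succeeded` as false even if the child's exit code is 0.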

bhearsum@mozilla.com (:bhearsum)

Assignee

Comment 3

13 years ago

OK, I think this is fixed now. I'm getting a different error, but I'm pretty sure it's a build system issue. Looking at that in bug 723176.

bhearsum@mozilla.com (:bhearsum)

Assignee

Comment 4

13 years ago

(In reply to Chris AtLee [:catlee] from comment #2)
> we should probably update the signing server logic so that it's impossible
> for a worker that's timed out to be considered successful.

Filed bug 757692 on this.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

12 years ago

Product: mozilla.org → Release Engineering

Nobody; OK to take it and work on it

Updated

7 years ago

Component: General Automation → General

Categories

(Release Engineering :: General, defect)
