Closed Bug 1164816 Opened 5 years ago Closed 4 years ago

symbolstore.py will hang if a background task raises an unhandled exception

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox41 --- fixed
firefox42 --- fixed
firefox43 --- fixed

People

(Reporter: ted, Assigned: ted)

References

Details

Attachments

(3 files, 1 obsolete file)

Due to the way that symbolstore.py manages its background tasks, if one raises an unhandled exception the script will hang. The patch for bug 528092 neglected to update a mock in a unit test and this caused a hang in "make check", which was bad.

This is because the main process is waiting on jobs_condition, but that only gets signalled by way of the callback on Pool.apply_async:
https://hg.mozilla.org/mozilla-central/annotate/1fab94ad196c/toolkit/crashreporter/tools/symbolstore.py#l574

However, the docs state that the callback isn't called if the call fails:
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.apply_async

apply_async does return an AsyncResult object with methods to get the result regardless of success, but the multiprocessing module doesn't have a way to wait for the first result of many AsyncResult objects. I decided to rewrite it to use concurrent.futures instead, which does have a wait() method to wait on the first result of a set of futures.
Assignee: nobody → ted
Attached file MozReview Request: bz://1164816/ted (obsolete) —
/r/8745 - bug 1164816 - Import concurrent.futures into the tree. r?gps
/r/8747 - bug 1164816 - Refactor symbolstore.py to remove WorkerInitializer. r?gps
/r/8749 - bug 1164816 - Rewrite symbolstore.py to use concurrent.futures. r?gps

Pull down these commits:

hg pull -r d8114097bd43e9ae017fc4782e9570bd5c1b72a0 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8605807 - Flags: review?(gps)
https://reviewboard.mozilla.org/r/8745/#review7461

I also have a patch series somewhere where I did this. I'm glad concurrent.futures will finally live in the tree!
Comment on attachment 8605807 [details]
MozReview Request: bz://1164816/ted

https://reviewboard.mozilla.org/r/8743/#review7467

Ship It!
Attachment #8605807 - Flags: review?(gps) → review+
https://reviewboard.mozilla.org/r/8743/#review7469

I granted r+, but it looks like Try is unhappy :/
Comment on attachment 8605807 [details]
MozReview Request: bz://1164816/ted

/r/8745 - bug 1164816 - Import concurrent.futures into the tree. r?gps
/r/8747 - bug 1164816 - Refactor symbolstore.py to remove WorkerInitializer. r?gps
/r/8749 - bug 1164816 - Rewrite symbolstore.py to use concurrent.futures. r?gps

Pull down these commits:

hg pull -r bf4f776a038f06c6c686e36b093195d4825b113d https://reviewboard-hg.mozilla.org/gecko/
Attachment #8605807 - Flags: review+ → review?(gps)
https://reviewboard.mozilla.org/r/8743/#review7487

Yeah, I checked this out locally, it turns out having the tests use ThreadPoolExecutor and the actual script use ProcessPoolExecutor means you can miss things being broken! Specifically, ProcessPoolExecutor needs to pickle all the args you pass to submit(), and one of those is the Dumper in this case, and I added that jobs dict which contains Futures, which are not pickleable. I broke the jobs/executor out into a separate class here, and wrote a test that calls the script like buildsymbols does for my own sanity.
https://reviewboard.mozilla.org/r/8749/#review7799

LGTM! Love Review Board's interdiff support :)

::: toolkit/crashreporter/tools/symbolstore.py:359
(Diff revision 2)
> +class JobPool:

Any reason for not using `class JobPool(object):` here? If so, please document it. If not, please use new-style declarations.

::: toolkit/crashreporter/tools/unit-symbolstore.py:476
(Diff revision 2)
> +            return

raise unittest.SkipTest('...')
Comment on attachment 8605807 [details]
MozReview Request: bz://1164816/ted

https://reviewboard.mozilla.org/r/8743/#review7801

Ship It!
Attachment #8605807 - Flags: review?(gps) → review+
Attachment #8605807 - Attachment is obsolete: true
Attachment #8620291 - Flags: review+
Attachment #8620292 - Flags: review+
Attachment #8620293 - Flags: review+
The Mac failure I hit when I pushed this to Try earlier was pretty dumb: I was missing a comma in a single-item tuple. I fixed that and verified buildsymbols and the unit tests work locally on my Mac. I'm pushing to Try again to make sure there's nothing else hiding, and if that goes green I'll land this.
This worked everywhere except the new unit test failed on the Mac Universal build (of course). I think I can fix that without much trouble.
url:        https://hg.mozilla.org/integration/mozilla-inbound/rev/c1cbf483a450ae2fb878344e5c9af56086205b67
changeset:  c1cbf483a450ae2fb878344e5c9af56086205b67
user:       Ted Mielczarek <ted@mielczarek.org>
date:       Tue May 12 14:52:06 2015 -0400
description:
bug 1164816 - Import concurrent.futures into the tree. r=gps

url:        https://hg.mozilla.org/integration/mozilla-inbound/rev/4a3aa2b923fb66df719199be506e6eef7feb7d2a
changeset:  4a3aa2b923fb66df719199be506e6eef7feb7d2a
user:       Ted Mielczarek <ted@mielczarek.org>
date:       Wed May 13 14:14:08 2015 -0400
description:
bug 1164816 - Refactor symbolstore.py to remove WorkerInitializer. r=gps

concurrent.futures doesn't have a WorkerInitializer equivalent to what
multiprocessing.Pool has, so refactor things slightly to remove that dependency.

url:        https://hg.mozilla.org/integration/mozilla-inbound/rev/d95b0ae9564c1db448941e547839c7cca42bbd51
changeset:  d95b0ae9564c1db448941e547839c7cca42bbd51
user:       Ted Mielczarek <ted@mielczarek.org>
date:       Wed May 13 14:50:11 2015 -0400
description:
bug 1164816 - Rewrite symbolstore.py to use concurrent.futures. r=gps
CLOSED TREE
Blocks: 1191877
You need to log in before you can comment on or make changes to this bug.