Closed Bug 543441 Opened 16 years ago Closed 15 years ago

Crash when using process.kill() on a process that has ended

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.9.3a5

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(2 files, 3 obsolete files)

Possible patch 16 years ago Mike Hommey [:glandium] 404 bytes, patch	mossop : review+	Details \| Diff \| Splinter Review
Test 16 years ago Mike Hommey [:glandium] 1.06 KB, patch	mossop : review-	Details \| Diff \| Splinter Review
Test v2 15 years ago Mike Hommey [:glandium] 925 bytes, patch		Details \| Diff \| Splinter Review
Test v2.01 15 years ago Mike Hommey [:glandium] 933 bytes, patch	mossop : review+	Details \| Diff \| Splinter Review
Patch v2 15 years ago Mike Hommey [:glandium] 795 bytes, patch	mossop : review+	Details \| Diff \| Splinter Review

Mike Hommey [:glandium]

Assignee

Description

•

16 years ago

This was uncovered at the same time as bug 543438. When stdin is not a tty during the nsIProcess unit test, TestBlockingProcess returns immediately, and it appears that is it ends way before process.kill is called from the unit test. What happens then is that PR_KillProcess is called with a NULL argument, which is then dereferenced, leading to a segmentation fault. The problem could be considered a NSPR bug, where PR_KillProcess should be checking for non NULLness of its argument (but maybe the API requires that the argument is not NULL). It could also be considered a bug in nsProcess::Kill, which doesn't check if mProcess is NULL before calling PR_KillProcess. I'm going to attach a possible patch, but I'm not convinced by the return value. Should process.kill() call throw an exception when the process is already done ? That doesn't feel quite right.

Mike Hommey [:glandium]

Assignee

Comment 1

•

16 years ago

Attached patch Possible patch (obsolete) — Details — Splinter Review

Assignee: nobody → mh+mozilla

Attachment #424573 - Flags: review?(benjamin)

Benjamin Smedberg

Updated

•

16 years ago

Attachment #424573 - Flags: review?(benjamin) → review?(dtownsend)

Benjamin Smedberg

Comment 2

•

16 years ago

We should be able to have an automated test for this.

Flags: in-testsuite?

Dave Townsend [:mossop]

Comment 3

•

16 years ago

Comment on attachment 424573 [details] [diff] [review] Possible patch This looks ok to me but please can you write up an automated test, it should just be as simple as blocking the main thread until you're sure a background process has exited and then trying to kill it.

Mike Hommey [:glandium]

Assignee

Comment 4

•

16 years ago

(In reply to comment #3) > (From update of attachment 424573 [details] [diff] [review]) > This looks ok to me but please can you write up an automated test, it should > just be as simple as blocking the main thread until you're sure a background > process has exited and then trying to kill it. I'm okay with writing a test for that, I just wanted to make sure we all agree that failing is the right way to handle the condition, since the test would obviously be different if we'd decide not to fail, but silently return.

Reed Loden [:reed]

Updated

•

16 years ago

Status: NEW → ASSIGNED

Dave Townsend [:mossop]

Comment 5

•

16 years ago

Comment on attachment 424573 [details] [diff] [review] Possible patch Failing is correct I think, r+ but don't land without a test

Attachment #424573 - Flags: review?(dtownsend) → review+

Mike Hommey [:glandium]

Assignee

Comment 6

•

16 years ago

(In reply to comment #5) > (From update of attachment 424573 [details] [diff] [review]) > Failing is correct I think, r+ but don't land without a test Just a stupid question: should that be a unit test or a crashtest ?

Benjamin Smedberg

Comment 7

•

16 years ago

xpcshell unit test

Mike Hommey [:glandium]

Assignee

Comment 8

•

16 years ago

Attached patch Test (obsolete) — Details — Splinter Review

I don't like having to use a global variable, but while https://developer.mozilla.org/en/Writing_xpcshell-based_unit_tests claims it's possible to use a function instead of a string for do_timeout(), it doesn't to be the case (yet) on 1.9.1, which is all I have at hand to do the testing. The do_timeout is necessary to avoid race conditions, which happened a few times when testing without a do_timeout.

Attachment #427200 - Flags: review?(dtownsend)

Dave Townsend [:mossop]

Updated

•

16 years ago

Attachment #427200 - Flags: review?(dtownsend) → review-

Dave Townsend [:mossop]

Comment 9

•

16 years ago

Comment on attachment 427200 [details] [diff] [review] Test You need to use a function for do_timeout not a string, this should work on 1.9.2 as well, but using a string will not work on trunk at all which is where we must land first. Perhaps you are seeing a problem because you are trying to call finish_test_kill rather than finish_test_kill_2? Even after correcting both of those the test passes on trunk without your patch to fix the bug applied so this test is not properly testing the bug. >diff --git a/xpcom/tests/unit/test_nsIProcess.js b/xpcom/tests/unit/test_nsIProcess.js >index 06d8e1e..8bbce3e 100644 >--- a/xpcom/tests/unit/test_nsIProcess.js >+++ b/xpcom/tests/unit/test_nsIProcess.js >@@ -185,6 +185,32 @@ function test_notify_killed() > process.kill(); > } > >+var process; >+ >+function finish_test_kill_2() >+{ Nit: Braces go at the end of the function declaration.

Dave Townsend [:mossop]

Comment 10

•

16 years ago

(In reply to comment #8) > Created an attachment (id=427200) [details] > Test > > I don't like having to use a global variable, but while > https://developer.mozilla.org/en/Writing_xpcshell-based_unit_tests claims it's > possible to use a function instead of a string for do_timeout(), it doesn't to > be the case (yet) on 1.9.1, which is all I have at hand to do the testing. You won't be able to use a function on 1.9.1 so you'll need to test on a more recent branch, ideally trunk.

Mike Hommey [:glandium]

Assignee

Comment 11

•

16 years ago

(In reply to comment #9) > (From update of attachment 427200 [details] [diff] [review]) > You need to use a function for do_timeout not a string, this should work on > 1.9.2 as well, but using a string will not work on trunk at all which is where > we must land first. Perhaps you are seeing a problem because you are trying to > call finish_test_kill rather than finish_test_kill_2? I actually edited that after testing, and it worked properly then, but only when using a string. > Even after correcting both of those the test passes on trunk without your patch > to fix the bug applied so this test is not properly testing the bug. Mmmmm interesting. I must say i didn't try the crash case with do_timeout. What happens if you call finish_test_kill_2() directly ?

Mike Hommey [:glandium]

Assignee

Comment 12

•

16 years ago

> You won't be able to use a function on 1.9.1 so you'll need to test on a more > recent branch, ideally trunk. I won't be able to do so before next week, at which point I may be able to test on 1.9.2. Testing on trunk will take more time.

Mike Hommey [:glandium]

Assignee

Comment 13

•

15 years ago

I've been trying to get the test do something useful, but unfortunately, there is some kind of race here. Unless I'm killing right after the process.run(), without a do_timeout, I have a hard time triggering the segfault, and even then it doesn't happen reliably. And when it doesn't segfault, process.kill *does* kill the process because it is not terminated yet... which makes it difficult to have a test that doesn't fail in such cases... So the only way I can think of is to do a process.run/process.kill loop a bunch of times, catching any exception, but no do_check_* or do_throw... Any better idea?

Mike Hommey [:glandium]

Assignee

Comment 14

•

15 years ago

Attached patch Test v2 (obsolete) — Details — Splinter Review

cf. comment 13, this is the best I could come up with. Usually, it takes between 1 and 100 loops to get a crash on my machine without the patch.

Attachment #427200 - Attachment is obsolete: true

Attachment #437287 - Flags: review?(dtownsend)

Mike Hommey [:glandium]

Assignee

Comment 15

•

15 years ago

Attached patch Test v2.01 — Details — Splinter Review

More accurate comment

Attachment #437287 - Attachment is obsolete: true

Attachment #437288 - Flags: review?(dtownsend)

Attachment #437287 - Flags: review?(dtownsend)

Dave Townsend [:mossop]

Comment 16

•

15 years ago

Comment on attachment 437288 [details] [diff] [review] Test v2.01 Just a thought, what if you run the process then delay for a short time using a simple loop and then kill? Otherwise this feels kind of nasty but I guess if that is all that works then at least we have something.

Attachment #437288 - Flags: review?(dtownsend) → review+

Mike Hommey [:glandium]

Assignee

Comment 17

•

15 years ago

You made me wonder, so I went back to the code and the problem is actually a pure race condition, which I didn't spot first, and is why it's hard to come up with something that works accurately. With my patch, there still remains a race condition, by the way, though it has a much smaller window. What seems to happen without the patch is that in the non blocking case, you can call Kill slightly after nsProcess was notified the process terminated, but before it sets mThread to null. mThread is nulled in nsProcess::ProcessComplete, which happens long enough after mProcess is nulled in Monitor that there is a race condition window. With the patch, there is still a race condition window, though tighter. Kill may be called before Monitor acquires the lock, the mProcess test then occurs, and as being non null, Kill continues, then the scheduler may switch threads, Monitor gets the lock, nulls mProcess, releases the lock, and then whenever the scheduler goes back to the thread currently in Kill(), and PR_KillProcess is called with a null argument. I think the mProcess test in Kill() should be moved after the lock.

Mike Hommey [:glandium]

Assignee

Comment 18

•

15 years ago

Attached patch Patch v2 — Details — Splinter Review

See comment 17

Attachment #424573 - Attachment is obsolete: true

Attachment #441252 - Flags: review?(dtownsend)

Dave Townsend [:mossop]

Updated

•

15 years ago

Attachment #441252 - Flags: review?(dtownsend) → review+

Mike Hommey [:glandium]

Assignee

Comment 19

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/46c6105f5429 http://hg.mozilla.org/mozilla-central/rev/5e3ddc9cd1c5 (with a wrong bug number in the commit message, damnit)

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla1.9.3a5

Mark Banner (:standard8)

Updated

•

15 years ago

Depends on: 620101

You need to log in before you can comment on or make changes to this bug.