Closed
Bug 988878
Opened 11 years ago
Closed 11 years ago
Maverick crash (?) on latest nightly with this testcase
Categories
(Core :: Graphics, defect)
Tracking
()
VERIFIED
FIXED
mozilla31
Tracking | Status | |
---|---|---|
firefox29 | --- | unaffected |
firefox30 | + | fixed |
firefox31 | + | fixed |
firefox-esr24 | --- | unaffected |
People
(Reporter: martijn.martijn, Assigned: bent.mozilla)
References
Details
(5 keywords)
Attachments
(3 files)
250 bytes,
text/html
|
Details | |
22.80 KB,
text/plain
|
Details | |
918 bytes,
patch
|
benjamin
:
review+
lsblakk
:
approval-mozilla-aurora+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
See testcase:
<html>
<head>
</head>
<body style="-moz-column-count: 1;transform: scale(99);" onload="this.removeAttribute('style');">
<embed type="application/x-shockwave-flash"/>
<button style="position: fixed; top: 999999999px;">m</button>
</body>
</html>
With this testcase, my 10.9.2 MacOSX OS seems to crash on the latest nightly build. It doesn't seem to crash with Firefox29beta.
Reporter | ||
Comment 2•11 years ago
|
||
I don't get anything, the whole OS seems to lock or every application seems to close and the only thing I'm allowed to do is a restart.
Flags: needinfo?(martijn.martijn)
Comment 3•11 years ago
|
||
Same here. I'm not sure it's a security bug given that it takes down the whole machine, but let me see if I can catch anything in the debug build.
Comment 4•11 years ago
|
||
Running under debugger doesn't stop it. Not clear what state the computer is in at that point. The display is just white, can't switch to other applications (or see that I've switched.) If I sleep and weak up, I will get the login prompt, but after logging in, the screen is still white. If I'm remotely connected to the computer at the time we hit this bug, I get disconnected. Connecting again, firefox process is gone. Killing the loginwindow process logs me out and then things work properly.
Comment 5•11 years ago
|
||
Turning off OMTC stops the crash. Switching to skia from cg doesn't make a difference. With OMTC on, forcing basic also doesn't make a difference.
Comment 6•11 years ago
|
||
Last thing that seems to happen before we crash is:
IPDL protocol error: unknown union type
###!!! ASSERTION: IPDL error [PLayerTransactionParent]: "unknown union type". Killing child side as a result.: 'Error', file /Users/msreckovic/Repos/mozilla-central/ipc/glue/ProtocolUtils.cpp, line 190
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(bent.mozilla)
Comment 7•11 years ago
|
||
Comment 8•11 years ago
|
||
Complete shot in the dark - we send the message with scale, attribute gets removed, we end up in getting an invalid type? (and call abort). The whole "kicked out of ssh" is interesting...
Comment 9•11 years ago
|
||
Integrated or discrete GPU, makes no difference.
Comment 10•11 years ago
|
||
Come to think of it, "kicked out of ssh" is probably tty going away?
Comment 11•11 years ago
|
||
Process id in ProtocolUtils.FatalError is -1. That means kill all processes owned by the user, or perhaps all processes (if superuser), right? That seems... dangerous? Or am I misinterpreting the meaning of pid of -1?
Assignee | ||
Comment 12•11 years ago
|
||
pid -1 I think means 'cross thread actor' rather than 'cross process actor'.
Flags: needinfo?(bent.mozilla)
Comment 13•11 years ago
|
||
Hmm. Right now if we call KillProcess(pid=-1,...), it calls system kill(-1, SIGTERM), then kill(-1,SIGKILL) if necessary. Which is supposed to terminate/kill all user processes.
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 14•11 years ago
|
||
Yikes. I guess we were just calling kill with a random pid before bug 974356, and now we're taking down the whole system. Amazing.
I think we would have caught this sooner if chromium's pid type and processhandle types were distinct...
Blocks: 974356
Assignee | ||
Comment 15•11 years ago
|
||
Assignee: nobody → bent.mozilla
Status: NEW → ASSIGNED
Attachment #8398882 -
Flags: review?(benjamin)
Comment 16•11 years ago
|
||
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #15)
> Created attachment 8398882 [details] [diff] [review]
> Patch, v1
Is the right thing here to not do anything at all (this case) or actually abort the whole thing? Your change just ignores the FatalError call - nothing gets killed at all - is that the right thing?
Comment 17•11 years ago
|
||
When we do avoid aborting in the FatalError code, I can get the intermittent crash here (usually if loading the page from the command line):
[28044] ###!!! ABORT: Aborting on channel error.: file /Users/msreckovic/Repos/mozilla-central/ipc/glue/MessageChannel.cpp, line 1522
so it may actually be better to crash when we hit this FatalError.
Assignee | ||
Comment 18•11 years ago
|
||
FatalError is meant to be called when we need to kill a child process. For cross-thread actors there's nothing to kill since we implicitly trust our own process. It's just a programmer bug that causes this problem, not a hacked or misbehaving child process. How we handle that is a bit debatable I guess.
I think in theory we should be able to maintain the same model for both cross-process and cross-thread actors where we tear down the channel and send ActorDestroy messages to the child and handle everything cleanly. It might be tough though, and aborting if we get it wrong is probably the only option.
Admittedly error handling in the cross-thread actor code has probably been tested much less than the cross-process case. I would expect you to reliably hit the abort in comment 17, not just sometimes.
Comment 19•11 years ago
|
||
:bsmedberg, (admittedly overeager) review ping :)
:bent, the bug 974356 is in Aurora/30 right now, so we will want an uplift; is the patch the same?
Flags: needinfo?(benjamin)
Assignee | ||
Comment 20•11 years ago
|
||
(In reply to Milan Sreckovic [:milan] (TPE timezone) from comment #19)
> :bent, the bug 974356 is in Aurora/30 right now, so we will want an uplift;
> is the patch the same?
Yeah, assuming this is what we want to do here.
We should really get someone working on comment 6 though!
Comment 21•11 years ago
|
||
Yeah, comment 6 is probably a non-security and a separate bug(?), but, I agree we should look at it and I've been doing that a bit.
Comment 22•11 years ago
|
||
This sounds like a pretty bad DoS if someone could force this so obviously we need to fix it, but doesn't sound like a severe "security" bug.
status-firefox29:
--- → unaffected
status-firefox30:
--- → affected
status-firefox31:
--- → affected
tracking-firefox30:
--- → +
tracking-firefox31:
--- → +
Keywords: csectype-dos,
sec-moderate
Updated•11 years ago
|
Attachment #8398882 -
Flags: review?(benjamin) → review+
Flags: needinfo?(benjamin)
Assignee | ||
Comment 23•11 years ago
|
||
Comment on attachment 8398882 [details] [diff] [review]
Patch, v1
[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Very difficult imo.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No.
Which older supported branches are affected by this flaw?
See flags.
If not all supported branches, which bug introduced the flaw?
Bug 974356 exposed this, but the faulty code has been here essentially ever since we started doing off-main-thread-compositing.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
Same patch.
How likely is this patch to cause regressions; how much testing does it need?
This shouldn't affect anything except preventing us from taking down all of OS X :)
Attachment #8398882 -
Flags: sec-approval?
Comment 24•11 years ago
|
||
Comment on attachment 8398882 [details] [diff] [review]
Patch, v1
Sec-approval+ for trunk. Let's get branch patches created and nominated too.
Attachment #8398882 -
Flags: sec-approval? → sec-approval+
Assignee | ||
Comment 25•11 years ago
|
||
Assignee | ||
Comment 26•11 years ago
|
||
Comment on attachment 8398882 [details] [diff] [review]
Patch, v1
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 974356
User impact if declined: Killing all apps on OS X when we have an OMTC layers bug
Testing completed (on m-c, etc.): on m-i now
Risk to taking this patch (and alternatives if risky): This should be very safe
String or IDL/UUID changes made by this patch: None
Attachment #8398882 -
Flags: approval-mozilla-aurora?
Comment 27•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Reporter | ||
Comment 28•11 years ago
|
||
Ok, this seems to be fixed. But when I load the testcase in the latest Nightly build, it causes Firefox to not repaint anything, so when I switch from one tab to the tab with that page loaded, the previous tab is still shown. I filed bug 992999 for it.
Status: RESOLVED → VERIFIED
Updated•11 years ago
|
Attachment #8398882 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•11 years ago
|
status-firefox-esr24:
--- → unaffected
Comment 29•11 years ago
|
||
Updated•10 years ago
|
Group: core-security
Updated•8 years ago
|
Flags: in-testsuite?
You need to log in
before you can comment on or make changes to this bug.
Description
•