Closed Bug 988878 Opened 11 years ago Closed 11 years ago

Maverick crash (?) on latest nightly with this testcase

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla31

Tracking Flags:

Tracking

Status

firefox29

---

unaffected

firefox30

fixed

firefox31

fixed

firefox-esr24

---

unaffected

People

(Reporter: martijn.martijn, Assigned: bent.mozilla)

References

Details

(5 keywords)

Attachments

(3 files)

crash1.htm 11 years ago Martijn Wargers (dead) 250 bytes, text/html		Details
Debug build log when running the test 11 years ago Milan Sreckovic [:milan] (needinfo for best results) 22.80 KB, text/plain		Details
Patch, v1 11 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 918 bytes, patch	benjamin : review+ lsblakk : approval-mozilla-aurora+ abillings : sec-approval+	Details \| Diff \| Splinter Review

Martijn Wargers (dead)

Reporter

Description

•

11 years ago

Attached file crash1.htm — Details

See testcase: <html> <head> </head> <body style="-moz-column-count: 1;transform: scale(99);" onload="this.removeAttribute('style');"> <embed type="application/x-shockwave-flash"/> <button style="position: fixed; top: 999999999px;">m</button> </body> </html> With this testcase, my 10.9.2 MacOSX OS seems to crash on the latest nightly build. It doesn't seem to crash with Firefox29beta.

Benjamin Smedberg

Comment 1

•

11 years ago

Crash report ID or stack?

Flags: needinfo?(martijn.martijn)

Martijn Wargers (dead)

Reporter

Comment 2

•

11 years ago

I don't get anything, the whole OS seems to lock or every application seems to close and the only thing I'm allowed to do is a restart.

Flags: needinfo?(martijn.martijn)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 3

•

11 years ago

Same here. I'm not sure it's a security bug given that it takes down the whole machine, but let me see if I can catch anything in the debug build.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 4

•

11 years ago

Running under debugger doesn't stop it. Not clear what state the computer is in at that point. The display is just white, can't switch to other applications (or see that I've switched.) If I sleep and weak up, I will get the login prompt, but after logging in, the screen is still white. If I'm remotely connected to the computer at the time we hit this bug, I get disconnected. Connecting again, firefox process is gone. Killing the loginwindow process logs me out and then things work properly.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 5

•

11 years ago

Turning off OMTC stops the crash. Switching to skia from cg doesn't make a difference. With OMTC on, forcing basic also doesn't make a difference.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 6

•

11 years ago

Last thing that seems to happen before we crash is: IPDL protocol error: unknown union type ###!!! ASSERTION: IPDL error [PLayerTransactionParent]: "unknown union type". Killing child side as a result.: 'Error', file /Users/msreckovic/Repos/mozilla-central/ipc/glue/ProtocolUtils.cpp, line 190

Flags: needinfo?(nical.bugzilla)

Flags: needinfo?(bent.mozilla)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 7

•

11 years ago

Attached file Debug build log when running the test — Details

Milan Sreckovic [:milan] (needinfo for best results)

Comment 8

•

11 years ago

Complete shot in the dark - we send the message with scale, attribute gets removed, we end up in getting an invalid type? (and call abort). The whole "kicked out of ssh" is interesting...

Milan Sreckovic [:milan] (needinfo for best results)

Comment 9

•

11 years ago

Integrated or discrete GPU, makes no difference.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 10

•

11 years ago

Come to think of it, "kicked out of ssh" is probably tty going away?

Milan Sreckovic [:milan] (needinfo for best results)

Comment 11

•

11 years ago

Process id in ProtocolUtils.FatalError is -1. That means kill all processes owned by the user, or perhaps all processes (if superuser), right? That seems... dangerous? Or am I misinterpreting the meaning of pid of -1?

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 12

•

11 years ago

pid -1 I think means 'cross thread actor' rather than 'cross process actor'.

Flags: needinfo?(bent.mozilla)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 13

•

11 years ago

Hmm. Right now if we call KillProcess(pid=-1,...), it calls system kill(-1, SIGTERM), then kill(-1,SIGKILL) if necessary. Which is supposed to terminate/kill all user processes.

Flags: needinfo?(nical.bugzilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 14

•

11 years ago

Yikes. I guess we were just calling kill with a random pid before bug 974356, and now we're taking down the whole system. Amazing. I think we would have caught this sooner if chromium's pid type and processhandle types were distinct...

Blocks: 974356

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 15

•

11 years ago

Attached patch Patch, v1 — Details — Splinter Review

Assignee: nobody → bent.mozilla

Status: NEW → ASSIGNED

Attachment #8398882 - Flags: review?(benjamin)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 16

•

11 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #15) > Created attachment 8398882 [details] [diff] [review] > Patch, v1 Is the right thing here to not do anything at all (this case) or actually abort the whole thing? Your change just ignores the FatalError call - nothing gets killed at all - is that the right thing?

Milan Sreckovic [:milan] (needinfo for best results)

Comment 17

•

11 years ago

When we do avoid aborting in the FatalError code, I can get the intermittent crash here (usually if loading the page from the command line): [28044] ###!!! ABORT: Aborting on channel error.: file /Users/msreckovic/Repos/mozilla-central/ipc/glue/MessageChannel.cpp, line 1522 so it may actually be better to crash when we hit this FatalError.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 18

•

11 years ago

FatalError is meant to be called when we need to kill a child process. For cross-thread actors there's nothing to kill since we implicitly trust our own process. It's just a programmer bug that causes this problem, not a hacked or misbehaving child process. How we handle that is a bit debatable I guess. I think in theory we should be able to maintain the same model for both cross-process and cross-thread actors where we tear down the channel and send ActorDestroy messages to the child and handle everything cleanly. It might be tough though, and aborting if we get it wrong is probably the only option. Admittedly error handling in the cross-thread actor code has probably been tested much less than the cross-process case. I would expect you to reliably hit the abort in comment 17, not just sometimes.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 19

•

11 years ago

:bsmedberg, (admittedly overeager) review ping :) :bent, the bug 974356 is in Aurora/30 right now, so we will want an uplift; is the patch the same?

Flags: needinfo?(benjamin)

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 20

•

11 years ago

(In reply to Milan Sreckovic [:milan] (TPE timezone) from comment #19) > :bent, the bug 974356 is in Aurora/30 right now, so we will want an uplift; > is the patch the same? Yeah, assuming this is what we want to do here. We should really get someone working on comment 6 though!

Milan Sreckovic [:milan] (needinfo for best results)

Comment 21

•

11 years ago

Yeah, comment 6 is probably a non-security and a separate bug(?), but, I agree we should look at it and I've been doing that a bit.

Daniel Veditz [:dveditz]

Comment 22

•

11 years ago

This sounds like a pretty bad DoS if someone could force this so obviously we need to fix it, but doesn't sound like a severe "security" bug.

status-firefox29: --- → unaffected

status-firefox30: --- → affected

status-firefox31: --- → affected

tracking-firefox30: --- → +

tracking-firefox31: --- → +

Keywords: csectype-dos, sec-moderate

Benjamin Smedberg

Updated

•

11 years ago

Attachment #8398882 - Flags: review?(benjamin) → review+

Flags: needinfo?(benjamin)

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 23

•

11 years ago

Comment on attachment 8398882 [details] [diff] [review] Patch, v1 [Security approval request comment] How easily could an exploit be constructed based on the patch? Very difficult imo. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No. Which older supported branches are affected by this flaw? See flags. If not all supported branches, which bug introduced the flaw? Bug 974356 exposed this, but the faulty code has been here essentially ever since we started doing off-main-thread-compositing. Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Same patch. How likely is this patch to cause regressions; how much testing does it need? This shouldn't affect anything except preventing us from taking down all of OS X :)

Attachment #8398882 - Flags: sec-approval?

Al Billings [:abillings - ex-MoCo]

Comment 24

•

11 years ago

Comment on attachment 8398882 [details] [diff] [review] Patch, v1 Sec-approval+ for trunk. Let's get branch patches created and nominated too.

Attachment #8398882 - Flags: sec-approval? → sec-approval+

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 25

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/8b908fdf7bc7

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 26

•

11 years ago

Comment on attachment 8398882 [details] [diff] [review] Patch, v1 [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 974356 User impact if declined: Killing all apps on OS X when we have an OMTC layers bug Testing completed (on m-c, etc.): on m-i now Risk to taking this patch (and alternatives if risky): This should be very safe String or IDL/UUID changes made by this patch: None

Attachment #8398882 - Flags: approval-mozilla-aurora?

Phil Ringnalda (:philor)

Comment 27

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/8b908fdf7bc7

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

status-firefox31: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla31

Martijn Wargers (dead)

Reporter

Comment 28

•

11 years ago

Ok, this seems to be fixed. But when I load the testcase in the latest Nightly build, it causes Firefox to not repaint anything, so when I switch from one tab to the tab with that page loaded, the previous tab is still shown. I filed bug 992999 for it.

Status: RESOLVED → VERIFIED

Martijn Wargers (dead)

Reporter

Updated

•

11 years ago

Depends on: 992999

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

11 years ago

Attachment #8398882 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Al Billings [:abillings - ex-MoCo]

Updated

•

11 years ago

status-firefox-esr24: --- → unaffected

Ryan VanderMeulen [:RyanVM]

Comment 29

•

11 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/d63ea81bafed

status-firefox30: affected → fixed

Daniel Veditz [:dveditz]

Updated

•

10 years ago

Group: core-security

Mats Palmgren (inactive)

Updated

•

8 years ago

Flags: in-testsuite?

You need to log in before you can comment on or make changes to this bug.