Closed Bug 1146416 Opened 9 years ago Closed 9 years ago

NS_OpenAnonymousTemporaryFile() runs main thread code when run off the main thread

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- affected
firefox39 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- wontfix
b2g-v1.4 --- unaffected
b2g-v2.0 --- unaffected
b2g-v2.0M --- unaffected
b2g-v2.1 --- wontfix
b2g-v2.1S --- wontfix
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: mccr8, Assigned: jld)

References

Details

(Keywords: csectype-race, sec-moderate, Whiteboard: [adv-main39+])

Crash Data

Attachments

(3 files, 1 obsolete file)

lmandel found this crash report on crash stats:

https://crash-stats.mozilla.com/report/index/c3fccd5e-042e-4939-8751-b91dd2150323

MessageChannel::Send() is being called off the main thread, but somehow triggers a DelayedFireDOMPaintEvent::Run() that gets caught and then run by the script blocker.

Jed, is it expected that we can call NS_OpenAnonymousTemporaryFile() from off the main thread?  I can't tell which layer involved is not handling the off main thread case.
Flags: needinfo?(jld)
Filed bug 1146471 for assertions to catch this.
Crash Signature: [@ NS_CycleCollectorSuspect3 ]
(In reply to Andrew McCreight [:mccr8] from comment #0)
> Jed, is it expected that we can call NS_OpenAnonymousTemporaryFile() from
> off the main thread?  I can't tell which layer involved is not handling the
> off main thread case.

If MediaCache/EncodedBufferCache is meant to be used off the main thread — ni?(roc) about that — then: yes, it's expected, and it's the child process case of NS_OpenAnonymousTemporaryFile that's dropped the ball here (by neither handling nor asserting the absence of the off-main-thread case).

And, if that's the case, it's a little disturbing that nothing caught that mistake until now: it was committed in June 2014.
Flags: needinfo?(jld) → needinfo?(roc)
This is unrelated to MediaCache. This is EncodedBufferCache, which is part of MediaRecorder, and yes EncodedBufferCache::AppendBuffer is intended to run off the main thread.

So then, the question is whether it's better to fix NS_OpenAnonymousTemporaryFile to work off the main thread, or fix EncodedBufferCache to avoid calling NS_OpenAnonymousTemporaryFile off the main thread. We can do the latter, though it would mean pre-opening the temporary file (which is not always needed) or dispatching a runnable to the main thread to open it and making the rest of EncodedBufferCache::AppendBuffer run asynchronously --- not ideal, but not very hard. Is there any easy non-deadlocky way to make NS_OpenAnonymousTemporaryFile work off the main thread?
Flags: needinfo?(roc)
I'll note that since we generally have a goal of keeping I/O off the main thread, letting NS_OpenAnonymousTemporaryFile work off the main thread would be nice.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #4)
> I'll note that since we generally have a goal of keeping I/O off the main
> thread, letting NS_OpenAnonymousTemporaryFile work off the main thread would
> be nice.

There are two main threads here: the real NS_OpenAnonymousTemporaryFile opens the file on the main thread of the parent process (which is strongly discouraged, because it can block), and the IPC client makes a sync call from what would be the main thread of the child process (which is also strongly discouraged) if not for this bug.

The quick fix would be to make the child side use a SyncRunnable, thus blocking its caller and both main threads.  People will complain, but I don't know that we'd want to risk uplifting anything fancier.

Making those people happier should probably be a separate public bug.  Issues to consider: how to async-ify existing callers of NS_OpenAnonymousTemporaryFile, and/or figuring out what thread of the parent process the actual open should happen on (nsIStreamTransportService talks about reading and writing things, but this is not really that).  PBackground is probably not the right tool for this, but documentation on how and when to use PBackground does not seem to be very existent.
PBackground would be fine for the communication protocol but we would still want the parent to bounce the actual I/O calls to STS. Of course, PBackground on the child side requires a thread with an event loop of some kind, so it might not work with all the media threads (I don't know what tricks they're using).
STR: change the definition of MAX_ALLOW_MEMORY_BUFFER in MediaRecorder.h to 0 and run dom/media/test/test_mediarecorder_record_getdata_afterstart.html on a debug build in e10s mode.

Non-OOP MediaRecorder isn't affected by this bug; this might be relevant to the bug status flags.  I don't know if this could be exposed on non-e10s desktop via out-of-process thumbnailing or something along those lines.
Attached patch STR (as a patch)Splinter Review
Better version of the STR: lower the EncodedBufferCache memory/disk cutoff for tests only.  It should probably be landed in a separate bug that isn't publicly linked to this one.
Assignee: nobody → jld
Attached patch bug1146416-quick-fix-hg0.diff (obsolete) — Splinter Review
This will make :bent sad, due to sync operations as discussed above, but it's a low-risk fix that should be easy to uplift to the affected branches.

A followup bug can be filed to make this more async.
Attachment #8584015 - Flags: review?(roc)
This seems fine for now. Performance-wise it's not worse than what we were already doing.
Attached patch Patch [v2]Splinter Review
Adding the magic word "explicit" so the static checking builds pass.  Carrying over r+.

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

Not trivial; the patch is fairly obviously a thread-safety bug, but it leaves open the question of how to get to an actual exploit from there.

> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No.

> Which older supported branches are affected by this flaw?

See bug flags, but see also comment #8 about e10s dependency.

> If not all supported branches, which bug introduced the flaw?

Bug 965724.

> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Not yet.  This patch will confict if merged past bug 1140714.

> How likely is this patch to cause regressions; how much testing does it need?

Regressions should be unlikely; the patch is relatively straightforward.  As for how much testing: part of the problem here is that we don't currently have automated test coverage for this case; the other attachment fixes that.
Attachment #8584015 - Attachment is obsolete: true
Attachment #8584251 - Flags: sec-approval?
Attachment #8584251 - Flags: review+
sec-moderate doesn't need sec-approval, though it is always nice to have the information.
Also, I think anything prior to 39 could be listed as disabled, because as far as I can tell we only hit the code in e10s.
Comment on attachment 8584251 [details] [diff] [review]
Patch [v2]

Yeah, this doesn't need sec-approval as a moderate.

Can you confirm that this is effectively disabled off of trunk?
Attachment #8584251 - Flags: sec-approval?
(In reply to Al Billings [:abillings] from comment #15)
> Can you confirm that this is effectively disabled off of trunk?

I'm not sure about that.  Non-e10s mode uses e10s to do background thumbnail generation for tiles, but the browser has a few adjustments: https://mxr.mozilla.org/mozilla-central/source/toolkit/components/thumbnails/content/backgroundPageThumbsContent.js?rev=e046475a75cb#28

docShell.allowMedia=false prevents audio/video elements from loading, so it *might* not be possible to get >1000KiB into a MediaRecorder buffer and trigger the race condition that way… but I don't see anything that would obviously prevent recording a getUserMedia stream, if the attacker can get permissions for it somehow.  Or if I'm overlooking something else.
(In reply to Bill McCloskey (:billm) from comment #1)
> Filed bug 1146471 for assertions to catch this.

And I've filed bug 1148650 for another assertion that might have caught this sooner — notice that the crash report in comment #0 shows us in the middle of firing a DOM Event while on a non-main thread.
https://hg.mozilla.org/mozilla-central/rev/1628ecf1c71f
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
(In reply to Jed Davis [:jld] {UTC-7} from comment #10)
> A followup bug can be filed to make this more async.

Yes please!
sec-moderates generally aren't taken for B2G v2.1 unless there's a strong reason to do so. That said, please do nominate this for Beta and b2g37 approval :)
(In reply to Jed Davis [:jld] {UTC-7} from comment #16)
> but I don't see anything that would obviously prevent recording a getUserMedia stream, if
> the attacker can get permissions for it somehow.

For what it's worth: I tried capturing a thumbnail for a test page that calls gUM while loading (I used BackgroundPageThumbs directly from chrome script, which dodges the question of how hard it would be for a real attacker to cause a chosen document to be background-thumbnailed).  Neither the gUM success callback nor the error callback appears to be invoked, even if the origin had already been granted “Always Allow” permissions.  I don't know *why* they're not called, so I can't be sure there isn't a way around it.

This also means that it's very unlikely that the code touched by this patch will ever be run on non-e10s desktop, which means the regression risk should be basically zero.
The patch needed some adjustment to cross bug 1140714; tested locally with the patch from bug 1149312 (and, on B2G for b2g37, bug 1148650).
Flags: needinfo?(jld)
Comment on attachment 8585835 [details] [diff] [review]
Patch for beta38/b2g37

Approval Request Comment
[Feature/regressing bug #]: Bug 965724
[User impact if declined]: Possible security vulnerability.  Exploitability on desktop is unclear (comment #16, comment #22), but underestimating exploit developers is usually a bad idea.
[Describe test coverage new/current, TreeHerder]: Bug 1149312 is adding test coverage on mozilla-central; that wouldn't help beta, because we're not testing e10s (or B2G) on beta on treeherder, but I've tested locally.
[Risks and why]: Very unlikely on desktop — reaching this code at all with e10s disabled is difficult enough that I haven't been able to do it deliberately, and I doubt it would happen accidentally.  (More details in comment #16 and comment #22.)
[String/UUID change made/needed]: None.
Attachment #8585835 - Flags: approval-mozilla-beta?
Comment on attachment 8585835 [details] [diff] [review]
Patch for beta38/b2g37

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 965724.
User impact if declined: Security.  Unlike the previous comment about desktop, B2G is always e10s, so this is exposed to web content.  Turning this kind of race condition into an exploit is believed to be nontrivial (thus the sec-moderate), but it's not unheard-of.
Testing completed: locally uplifted bug 1149312 and bug 1148650 to b2g37 — which causes dom/media/test/test_mediarecorder_record_getdata_afterstart.html to reliably crash the content process — and verified that this patch on top of that makes the test pass instead.
Risk to taking this patch (and alternatives if risky): Low.  If this code is reached, that was previously a thread-safety violation (and, on debug builds, at least one assertion failure!); the replacement is unlikely to do worse than that.
String or UUID changes made by this patch: None.
Attachment #8585835 - Flags: approval-mozilla-b2g37?
(In reply to Jed Davis [:jld] {UTC-7} from comment #24)
> [Describe test coverage new/current, TreeHerder]: Bug 1149312 is adding test
> coverage on mozilla-central; that wouldn't help beta, because we're not
> testing e10s (or B2G) on beta on treeherder, but I've tested locally.

Strike that last part.  `mach mochitest --e10s` isn't actually e10s on beta.  This version of the patch has been properly tested, applied to b2g37, on B2G; but not on beta.
Attachment #8585835 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
(In reply to Jed Davis [:jld] {UTC-7} from comment #26)
> (In reply to Jed Davis [:jld] {UTC-7} from comment #24)
> > [Describe test coverage new/current, TreeHerder]: Bug 1149312 is adding test
> > coverage on mozilla-central; that wouldn't help beta, because we're not
> > testing e10s (or B2G) on beta on treeherder, but I've tested locally.
> 
> Strike that last part.  `mach mochitest --e10s` isn't actually e10s on beta.
> This version of the patch has been properly tested, applied to b2g37, on
> B2G; but not on beta.

If I have testing/mochitest/runtests.py set the pref layers.offmainthreadcomposition.testing.enabled to true, then e10s is truly e10s on beta.  With that change I have reproduced the assertion failure, and verified that attachment #8585835 [details] [diff] [review] avoids it and the test in question seems to pass according to its log messages… but it doesn't display anything in the browser window.  This is probably as tested as it's going to get.
(In reply to Jed Davis [:jld] {UTC-7} from comment #24)
> [Risks and why]: Very unlikely on desktop — reaching this code at all with
> e10s disabled is difficult enough that I haven't been able to do it
> deliberately, and I doubt it would happen accidentally.  (More details in
> comment #16 and comment #22.)

As every patch introduces a risk, why should we take it in beta?
Flags: needinfo?(jld)
Flags: needinfo?(jld)
Attachment #8585835 - Flags: approval-mozilla-beta?
Blocks: 1151643
Whiteboard: [adv-main39+]
Group: core-security → core-security-release
Group: core-security-release
Depends on: 1346583
You need to log in before you can comment on or make changes to this bug.