Queued up rendering can allow websites to clickjack
Categories
(Core :: DOM: Navigation, defect)
Tracking
()
People
(Reporter: jgilbert, Assigned: sefeng)
References
(Depends on 1 open bug)
Details
(Keywords: csectype-clickjacking, csectype-priv-escalation, sec-high, Whiteboard: [adv-main119+][adv-ESR115.4+])
Attachments
(3 files, 2 obsolete files)
48 bytes,
text/x-phabricator-request
|
tjr
:
sec-approval+
|
Details | Review |
14.96 KB,
patch
|
dmeehan
:
approval-mozilla-esr115+
|
Details | Diff | Splinter Review |
219 bytes,
text/plain
|
Details |
This is a continuation of the general problem in bug 1695986. (bug 1695986 only patched the specific cert page vuln)
Queued rendering or even just too-fast rendering/navigation can allow websites to misdirect clicks into other websites.
For instance, in bug 1695986, we had a case where a malicious page would tell users "click here", queue extra graphics work, and navigate to a bad-cert page. While rendering lagged behind, users were furiously clicking as directed by the malicious page, and these clicks eventually start getting sent to the navigated-to page, which brings up the bad-cert dialog, and the clicks hit "accept". Even in the case where there isn't queued graphics work (which merely makes this more reliable), we must prevent input events from firing before users can realistically be responding to presented graphics/stimuli. I believe this is an S2.
In bug 1695986, we mitigated this in one vulnerable page in the browser by waiting (as per our security-delay settings), and also by waiting 10 (rendered requestAnimationFrame) frames for extra measure. We need a (quicker) version of this to happen for at least navigation.
I previously created bug 1783252 to serve as the non-sec-bug whitewashed place to deal with this, but without a blocking bug for that, it's likely to be ignored, so I'm filing this bug to clarify that there's still sec-related things to fix here.
I don't know if this is Layout, DOM:Events or DOM:Navigation. Since I believe this is most closely related to navigation, that's where I'm putting it initially.
Reporter | ||
Comment 1•2 years ago
|
||
@smaug, @dholbert: Does that sound right?
Comment 2•2 years ago
|
||
So DOM side would need to get some notification/flag (in the parent process) from gfx side that the page which is being loaded has been represented (painted) to the user. DOM could suspend sending user input events to the page-being-loaded until such notification is received.
Android (non-Fission) would need to get also a notification that the old page is basically gone (but I think we might already have such).
Do you know if we have a valid layersID here https://searchfox.org/mozilla-central/rev/3563da061ca2b32f7f77f5f68088dbf9b5332a9f/dom/events/EventStateManager.cpp#1477?
Perhaps we could use that as a way to notify the DOM side that the new page hasn't be shown to the user?
Updated•2 years ago
|
Comment 4•2 years ago
•
|
||
Olli's suggestion in comment 2 sounds good to me. We may want to have some supplemental pref-controlled delay at some level before we start accepting user input, to avoid problems where a user taps immediately as we paint (in the spirit of waiting until a user "can reasonably respond" per bug 1783252). On the order of tens of milliseconds, perhaps, just to ensure users have a chance to see what they're tapping.
(The "stuff moved around on the page and I clicked the wrong thing" scenario in bug 1568934 is harder to robustly fix, without causing unwanted breakage of legitimate content. I'll add a comment there [left as bug 1568934 comment 2]. I'm not sure whether it's tractable to mitigate that sort of issue, and it's not obvious to me that mitigations over there would prevent security issues. But when navigation is involved, it does seem more concerning and more tractable to fix.)
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 5•2 years ago
|
||
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #2)
So DOM side would need to get some notification/flag (in the parent process) from gfx side that the page which is being loaded has been represented (painted) to the user. DOM could suspend sending user input events to the page-being-loaded until such notification is received.
Android (non-Fission) would need to get also a notification that the old page is basically gone (but I think we might already have such).
On the graphics side, while we can probably plumb something more explicit like that throught, I think it might be even better if DOM could self-host this detection off of requestAnimationFrame. (E.g. spinning RAF 2-3x) Even Gfx kind of doesn't know when something is shown on the screen exactly, but "we're 2-3 frames past that" is probably the sort of thing we'd be doing anyway. We know only when we're handing something to the OS compositor, so we could say, when we hand frame N+1 off, report frame N as "visible", but that's pretty much the same behavior as spinning RAF I believe. (And you shouldn't need any changes from us if you go the spinning-RAF route)
Do you know if we have a valid layersID here https://searchfox.org/mozilla-central/rev/3563da061ca2b32f7f77f5f68088dbf9b5332a9f/dom/events/EventStateManager.cpp#1477?
Perhaps we could use that as a way to notify the DOM side that the new page hasn't be shown to the user?
I don't know about APZ, but I'll find someone who does!
(In reply to Daniel Holbert [:dholbert] from comment #4)
We may want to have some supplemental pref-controlled delay at some level before we start accepting user input, to avoid problems where a user taps immediately as we paint (in the spirit of waiting until a user "can reasonably respond" per bug 1783252). On the order of tens of milliseconds, perhaps, just to ensure users have a chance to see what they're tapping.
Yeah, this sounds ideal to me. Keep in mind it's not just the classic reaction-to-stimulus time, but also interpreting the destination page, moving the mouse, and clicking deliberately. Even as high as 100ms would not be unreasonable for this, but two-frames-plus-30ms is probably undetectable to users and makes things a lot safer.
(The "stuff moved around on the page and I clicked the wrong thing" scenario in bug 1568934 is harder to robustly fix, without causing unwanted breakage of legitimate content. I'll add a comment there [left as bug 1568934 comment 2]. I'm not sure whether it's tractable to mitigate that sort of issue, and it's not obvious to me that mitigations over there would prevent security issues. But when navigation is involved, it does seem more concerning and more tractable to fix.)
Thanks for looking! Makes sense.
Assignee | ||
Comment 6•2 years ago
|
||
So sounds like the proposed solution is we wait for a 2-3 rAFs before responding users action.
Kelsey, if we get the valid layerId in Olli's idea, do we still need to wait 2-3 rAFs? And do you plan to write the patch for this?
Assignee | ||
Comment 7•2 years ago
|
||
I think if we just implement the waiting for 2-3 rAFs
approach, we can downgrade this to S3?
Reporter | ||
Comment 8•2 years ago
|
||
I don't know what a layerId is, but if it's a graphics thing, it probably means we only need to wait for one RAF instead, so I don't think that's the most efficient direction to head with this solution.
N-RAFs-plus-fixed-N-ms is my proposed design. RAFs first to "prove" that (really, convince ourselves that it is likely that) the graphics are visible on the user's actual display, and fixed-ms timer to prevent faster-than-reaction-time false/misguided inputs. I believe that until we do both, that this class of attacks is fundamentally unaddressed, and so should remain an open S2.
I will not be writing a patch unless we're really hitting into graphics code. Graphics donated non-trivial engineering to research and fix the previous case outside our module. This one needs to be fixed by the code owners. This is not fundamentally a bug solvable by graphics, though it can be exacerbated by properties of graphics use (like maliciously queued work). I'm very happy to help with support and/or patching graphics code (we can add bugs that block this bug), but we should efficiently allocate resources here based on expertise with the code where the solution should live.
Comment 9•2 years ago
|
||
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #2)
DOM side would need to get some notification/flag (in the parent process) from gfx side that the page which is being loaded has been represented (painted) to the user
I think there needs to be information flow in both directions:
- DOM needs to tell the compositor when the content being painted corresponds to a new page
- The compositor needs to tell DOM when the content corresponding to the new page has been composited
One potential way this could work:
- DOM maintains a "page sequence number"
- DOM increments the "page sequence number" whenever a navigation occurs
- When triggering a paint, the information sent to the compositor is annotated with the "page sequence number" of the page being rendered
- When receiving an event, the compositor annotates the event with the "page sequence number" of the page that was composited at the time the event was received
- DOM drops events annotated with a "page sequence number" that's older than the currently loaded page
Assignee | ||
Comment 10•2 years ago
|
||
Botond, when DOM knows the new page has been composited, correct me if I am wrong, I don't think this is equivalent to the content is visible to the user, as there still might have a short delay? So we probably need to combine it with the fix-N-ms
approach? I am assuming the fixed-rAFs
can be replaced by this. However, I am not sure the effort and the scope of doing this, maybe the N-RAFs-plus-fixed-N-ms
proposed by Kelsey is the easiest solution.
Reporter | ||
Comment 11•2 years ago
|
||
From my point of view in graphics, the gfx compositor doesn't have events that it sends back upstream to layout or dom. The only info that flows that way is RAF, which will stop being triggered if gfx/the gpu fall behind.
That's why I rely on RAF here, as changing the page, and waiting for raf means that, after raf callbacks end, the page is composited. The next RAF after that, we know the page is on its way towards the screen, though it may still be in IPC or WR processing. If we get ahead by 2-3 though, RAF will stop being triggered, since we're not ready to paint more.
Navigating and then await RAF
x3 should be sufficient to know that the rendered page is on-or-near-the-screen, and shouldn't require any additional API on the GFX side.
In other words:
async function dom_safe_navigate(desc) {
function in_navigation_flux_ignore_events(e) {
warn(`Page contents are in-flux due to navigation, ignoring user event: `, e);
const consider_handled_and_stop_processing = true;
return consider_handled_and_stop_processing;
}
addChromePriorityUserEventListener(in_navigation_flux_ignore_events);
await dom_navigate(desc);
const MIN_REASONABLE_HUMAN_REACTION_MS = 100;
// We've changed the new contents of the page.
// The maybe-slow-for-gfx previous contents of the page are also part of this update.
await new Promise(requestAnimationFrame); // RAF #1
// By now (at callback-invoke time for the very next raf after making changes), all new page contents are about to be locked-in, last chance to change "Frame N".
const earliest_possible_visibility = performance.now();
await new Promise(requestAnimationFrame); // RAF #2
// By now, new page contents in Frame N were locked-in and sent to "painting as far as the html spec is concerned".
// This is last call for Frame N+1.
// Frame N might still be in-transit to compositor, or to gpu, or to screen, though.
await new Promise(requestAnimationFrame); // RAF #3
// If "Frame N" is not done yet, we're likely to not get RAF called until it is.
// After this above await unblocks, we are relatively confident that Frame N is effectively on the screen,
// though this does depend on our max total frame pipelining dom-to-screen depth.
await new Promise(requestAnimationFrame); // RAF #4
// Very safe by now
await new Promise(requestAnimationFrame); // RAF #5
// Something with RAF is very broken if we are not safe to process click events once we get here.
const ms_since_earliest_possible_visibility = performance.now() - earliest_possible_visibility;
const ms_min_reaction_time_remaining = MIN_REASONABLE_HUMAN_REACTION_MS - ms_since_earliest_possible_visibility;
if (ms_min_reaction_time_remaining > 0) {
await new Promise(go => setTimeout(go, ms_min_reaction_time_remaining));
} else {
warn(`Took ${ms_since_earliest_possible_visibility}ms after navigation, longer than MIN_REASONABLE_HUMAN_REACTION_MS : ${MIN_REASONABLE_HUMAN_REACTION_MS}ms which is worth investigating`);
}
removeChromePriorityUserEventListener(in_navigation_flux_ignore_events);
return;
}
Do you believe there is other information needed here from gfx?
Comment 12•2 years ago
|
||
To be honest, I'm not fully clear on the link between compositing and RAF.
If the strategy outlined in comment 11 leverages an existing communication path from the compositor back to content to give us the properties we're looking for, that does seem simpler than plumbing through additional information like a sequence number described in comment 9.
Updated•2 years ago
|
Comment 13•2 years ago
|
||
Hi Sean, could you please share updates here? Thanks.
Assignee | ||
Comment 14•2 years ago
|
||
I am working on the patch for this. Will have something up for review this week.
Assignee | ||
Comment 15•2 years ago
|
||
There's a potential security issue such that malicious websites
could trick the user to furiously clicking and misdirect those
clicks into other websites.
For instance, malicious websites would tell users to "click here",
queue extra graphic works and navigate to a bad cert page. While
rendering lagged behind, users were furiously clicking as directed by
the malicious page. These clicks eventually start getting
sent to the navigated-to page.
This patch introduces some delay to handle user input events. There
are two prefs introduced. One minimum number of RAFs that need s
to be passed and one minimum time that needs to be elapsed. Both
condition need to be met to handle user input events.
Assignee | ||
Comment 16•2 years ago
|
||
Update: Olli suggested to use FinishedWaitingForTransaction()
. This seems to be adaptable for our needs however I think this is webrender only, so maybe its not the best given we need to support non-webrender case. I'll chat with Olli about this.
Updated•1 years ago
|
Assignee | ||
Comment 17•1 years ago
|
||
Comment on attachment 9339135 [details]
Bug 1830820 - Introduce some delays to user input handling r=#dom-core,smaug!
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Should be quite easy.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all branches
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: It shouldn't be hard to create backports for other branches. If this patch works well in Nightly, then the risk is low.
- How likely is this patch to cause regressions; how much testing does it need?: The only testing I did was pushing to try. Since this patch introduces a delay for user input handling, there's a moderate risk that this patch will cause regressions.
- Is Android affected?: Yes
Comment 18•1 years ago
|
||
Comment on attachment 9339135 [details]
Bug 1830820 - Introduce some delays to user input handling r=#dom-core,smaug!
Can we split the patches into two: one for the code, and one for the tests? The patch for the tests should also add the code for isClickjackMitigationTest
and the pref into StaticPrefs so it's not present in the initial code.
Assignee | ||
Comment 19•1 years ago
|
||
Assignee | ||
Comment 20•1 years ago
|
||
Hi Tom, I took your suggestion and split the patch into two. Can you check this again please? I can't change the sec-approval
flag anymore.
Comment 21•1 years ago
|
||
Comment on attachment 9339135 [details]
Bug 1830820 - Introduce some delays to user input handling r=#dom-core,smaug!
Approved to land and uplift
Updated•1 years ago
|
Comment 22•1 years ago
|
||
FYI, we're down to 2 betas left this cycle, so if this needs bake time, we should get it landed ASAP. Also, this will need a rebased patch for ESR102.
Comment 23•1 years ago
|
||
Per discussion with Sean, this needs more bake time than we can give it this cycle.
Comment 24•1 years ago
|
||
Comment 25•1 years ago
|
||
Comment 26•1 years ago
|
||
Push with failures
Failure log (multiple test failures)
Comment 27•1 year ago
|
||
Comment 28•1 year ago
|
||
Comment 29•1 year ago
|
||
Backout link: https://hg.mozilla.org/integration/autoland/rev/a356d0eb9decfcb1ded59073a22058c7c694bad4
Push with failures
Example of failure log 1
Example of failure log 2
Failure log for TV 1
Failure log for TV 2
So far, the marionette jobs failed on Linux 18.04 x64 WebRender opt, Windows 11 x64 22H2 WebRender debug and OS X 10.15 WebRender debug.
Assignee | ||
Comment 30•1 year ago
|
||
We don't need this feature when running geckodriver tests
Comment 31•1 year ago
|
||
Comment 32•1 year ago
|
||
https://hg.mozilla.org/mozilla-central/rev/4cf8515b79ee
https://hg.mozilla.org/mozilla-central/rev/2e560bd558ef
https://hg.mozilla.org/mozilla-central/rev/a1735b4457aa
Updated•1 year ago
|
Updated•1 year ago
|
Comment 33•1 year ago
|
||
The patch landed in nightly and beta is affected.
:sefeng, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox118
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 34•1 year ago
|
||
Is this ready for an ESR115 approval request? Please attach rebased patches and nominate if so.
Comment 35•1 year ago
|
||
a month ago, tjr placed a reminder on the bug using the whiteboard tag [reminder-test 2023-10-10]
.
sefeng, please refer to the original comment to better understand the reason for the reminder.
Assignee | ||
Comment 36•1 year ago
|
||
Update: Those two regressions need to be fixed first before this can be back ported to ESR115.
Assignee | ||
Comment 37•1 year ago
|
||
ESR Approval Request Comment
[Feature/Bug causing the regression]: Introduce some delays to user input handling to fix this sec-high bug
[User impact if declined]: Users may under risk
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: Bug 1856497 and Bug 1856514
[Is the change risky?]: A little bit
[Why is the change risky/not risky?]: The change changes the time of when we handle user input events, so it's risky in its nature. However it has been tested on Nightly for a cycle already and it seems to be working well. Though there were two regressions.
[String changes made/needed]:
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Comment 38•1 year ago
|
||
Comment on attachment 9358324 [details] [diff] [review]
0001-Bug-1830820-Introduce-some-delays-to-user-input-hand.patch
Approved for 115.4esr.
Comment 39•1 year ago
|
||
uplift |
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 40•1 year ago
|
||
Updated•1 year ago
|
Updated•1 year ago
|
Comment 41•9 months ago
|
||
Bulk-unhiding security bugs fixed in Firefox 119-121 (Fall 2023). Use "moo-doctrine-subsidy" to filter
Updated•8 months ago
|
Description
•