Closed Bug 1830820 (CVE-2023-5721) Opened 1 year ago Closed 1 year ago

Queued up rendering can allow websites to clickjack

Categories

(Core :: DOM: Navigation, defect)

defect

Tracking

()

RESOLVED FIXED
119 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox-esr115 119+ fixed
firefox116 --- wontfix
firefox117 --- wontfix
firefox118 - wontfix
firefox119 + fixed

People

(Reporter: jgilbert, Assigned: sefeng)

References

(Depends on 1 open bug)

Details

(Keywords: csectype-clickjacking, csectype-priv-escalation, sec-high, Whiteboard: [adv-main119+][adv-ESR115.4+])

Attachments

(3 files, 2 obsolete files)

This is a continuation of the general problem in bug 1695986. (bug 1695986 only patched the specific cert page vuln)

Queued rendering or even just too-fast rendering/navigation can allow websites to misdirect clicks into other websites.
For instance, in bug 1695986, we had a case where a malicious page would tell users "click here", queue extra graphics work, and navigate to a bad-cert page. While rendering lagged behind, users were furiously clicking as directed by the malicious page, and these clicks eventually start getting sent to the navigated-to page, which brings up the bad-cert dialog, and the clicks hit "accept". Even in the case where there isn't queued graphics work (which merely makes this more reliable), we must prevent input events from firing before users can realistically be responding to presented graphics/stimuli. I believe this is an S2.

In bug 1695986, we mitigated this in one vulnerable page in the browser by waiting (as per our security-delay settings), and also by waiting 10 (rendered requestAnimationFrame) frames for extra measure. We need a (quicker) version of this to happen for at least navigation.

I previously created bug 1783252 to serve as the non-sec-bug whitewashed place to deal with this, but without a blocking bug for that, it's likely to be ignored, so I'm filing this bug to clarify that there's still sec-related things to fix here.

I don't know if this is Layout, DOM:Events or DOM:Navigation. Since I believe this is most closely related to navigation, that's where I'm putting it initially.

@smaug, @dholbert: Does that sound right?

Flags: needinfo?(smaug)
Flags: needinfo?(dholbert)

So DOM side would need to get some notification/flag (in the parent process) from gfx side that the page which is being loaded has been represented (painted) to the user. DOM could suspend sending user input events to the page-being-loaded until such notification is received.
Android (non-Fission) would need to get also a notification that the old page is basically gone (but I think we might already have such).

Do you know if we have a valid layersID here https://searchfox.org/mozilla-central/rev/3563da061ca2b32f7f77f5f68088dbf9b5332a9f/dom/events/EventStateManager.cpp#1477?
Perhaps we could use that as a way to notify the DOM side that the new page hasn't be shown to the user?

Flags: needinfo?(smaug) → needinfo?(jgilbert)
Group: core-security → dom-core-security

I'll mark it sec-high like the other bug.

Keywords: sec-high

Olli's suggestion in comment 2 sounds good to me. We may want to have some supplemental pref-controlled delay at some level before we start accepting user input, to avoid problems where a user taps immediately as we paint (in the spirit of waiting until a user "can reasonably respond" per bug 1783252). On the order of tens of milliseconds, perhaps, just to ensure users have a chance to see what they're tapping.

(The "stuff moved around on the page and I clicked the wrong thing" scenario in bug 1568934 is harder to robustly fix, without causing unwanted breakage of legitimate content. I'll add a comment there [left as bug 1568934 comment 2]. I'm not sure whether it's tractable to mitigate that sort of issue, and it's not obvious to me that mitigations over there would prevent security issues. But when navigation is involved, it does seem more concerning and more tractable to fix.)

Flags: needinfo?(dholbert)
Flags: needinfo?(jgilbert)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #2)

So DOM side would need to get some notification/flag (in the parent process) from gfx side that the page which is being loaded has been represented (painted) to the user. DOM could suspend sending user input events to the page-being-loaded until such notification is received.
Android (non-Fission) would need to get also a notification that the old page is basically gone (but I think we might already have such).

On the graphics side, while we can probably plumb something more explicit like that throught, I think it might be even better if DOM could self-host this detection off of requestAnimationFrame. (E.g. spinning RAF 2-3x) Even Gfx kind of doesn't know when something is shown on the screen exactly, but "we're 2-3 frames past that" is probably the sort of thing we'd be doing anyway. We know only when we're handing something to the OS compositor, so we could say, when we hand frame N+1 off, report frame N as "visible", but that's pretty much the same behavior as spinning RAF I believe. (And you shouldn't need any changes from us if you go the spinning-RAF route)

Do you know if we have a valid layersID here https://searchfox.org/mozilla-central/rev/3563da061ca2b32f7f77f5f68088dbf9b5332a9f/dom/events/EventStateManager.cpp#1477?
Perhaps we could use that as a way to notify the DOM side that the new page hasn't be shown to the user?

I don't know about APZ, but I'll find someone who does!


(In reply to Daniel Holbert [:dholbert] from comment #4)

We may want to have some supplemental pref-controlled delay at some level before we start accepting user input, to avoid problems where a user taps immediately as we paint (in the spirit of waiting until a user "can reasonably respond" per bug 1783252). On the order of tens of milliseconds, perhaps, just to ensure users have a chance to see what they're tapping.

Yeah, this sounds ideal to me. Keep in mind it's not just the classic reaction-to-stimulus time, but also interpreting the destination page, moving the mouse, and clicking deliberately. Even as high as 100ms would not be unreasonable for this, but two-frames-plus-30ms is probably undetectable to users and makes things a lot safer.

(The "stuff moved around on the page and I clicked the wrong thing" scenario in bug 1568934 is harder to robustly fix, without causing unwanted breakage of legitimate content. I'll add a comment there [left as bug 1568934 comment 2]. I'm not sure whether it's tractable to mitigate that sort of issue, and it's not obvious to me that mitigations over there would prevent security issues. But when navigation is involved, it does seem more concerning and more tractable to fix.)

Thanks for looking! Makes sense.

So sounds like the proposed solution is we wait for a 2-3 rAFs before responding users action.

Kelsey, if we get the valid layerId in Olli's idea, do we still need to wait 2-3 rAFs? And do you plan to write the patch for this?

Flags: needinfo?(jgilbert)

I think if we just implement the waiting for 2-3 rAFs approach, we can downgrade this to S3?

I don't know what a layerId is, but if it's a graphics thing, it probably means we only need to wait for one RAF instead, so I don't think that's the most efficient direction to head with this solution.

N-RAFs-plus-fixed-N-ms is my proposed design. RAFs first to "prove" that (really, convince ourselves that it is likely that) the graphics are visible on the user's actual display, and fixed-ms timer to prevent faster-than-reaction-time false/misguided inputs. I believe that until we do both, that this class of attacks is fundamentally unaddressed, and so should remain an open S2.

I will not be writing a patch unless we're really hitting into graphics code. Graphics donated non-trivial engineering to research and fix the previous case outside our module. This one needs to be fixed by the code owners. This is not fundamentally a bug solvable by graphics, though it can be exacerbated by properties of graphics use (like maliciously queued work). I'm very happy to help with support and/or patching graphics code (we can add bugs that block this bug), but we should efficiently allocate resources here based on expertise with the code where the solution should live.

Flags: needinfo?(jgilbert)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #2)

DOM side would need to get some notification/flag (in the parent process) from gfx side that the page which is being loaded has been represented (painted) to the user

I think there needs to be information flow in both directions:

  • DOM needs to tell the compositor when the content being painted corresponds to a new page
  • The compositor needs to tell DOM when the content corresponding to the new page has been composited

One potential way this could work:

  • DOM maintains a "page sequence number"
  • DOM increments the "page sequence number" whenever a navigation occurs
  • When triggering a paint, the information sent to the compositor is annotated with the "page sequence number" of the page being rendered
  • When receiving an event, the compositor annotates the event with the "page sequence number" of the page that was composited at the time the event was received
  • DOM drops events annotated with a "page sequence number" that's older than the currently loaded page

Botond, when DOM knows the new page has been composited, correct me if I am wrong, I don't think this is equivalent to the content is visible to the user, as there still might have a short delay? So we probably need to combine it with the fix-N-ms approach? I am assuming the fixed-rAFs can be replaced by this. However, I am not sure the effort and the scope of doing this, maybe the N-RAFs-plus-fixed-N-ms proposed by Kelsey is the easiest solution.

Flags: needinfo?(botond)

From my point of view in graphics, the gfx compositor doesn't have events that it sends back upstream to layout or dom. The only info that flows that way is RAF, which will stop being triggered if gfx/the gpu fall behind.
That's why I rely on RAF here, as changing the page, and waiting for raf means that, after raf callbacks end, the page is composited. The next RAF after that, we know the page is on its way towards the screen, though it may still be in IPC or WR processing. If we get ahead by 2-3 though, RAF will stop being triggered, since we're not ready to paint more.

Navigating and then await RAFx3 should be sufficient to know that the rendered page is on-or-near-the-screen, and shouldn't require any additional API on the GFX side.

In other words:

async function dom_safe_navigate(desc) {
  function in_navigation_flux_ignore_events(e) {
    warn(`Page contents are in-flux due to navigation, ignoring user event: `, e);
    const consider_handled_and_stop_processing = true;
    return consider_handled_and_stop_processing;
  }
  addChromePriorityUserEventListener(in_navigation_flux_ignore_events);

  await dom_navigate(desc);
  const MIN_REASONABLE_HUMAN_REACTION_MS = 100;
  // We've changed the new contents of the page.
  // The maybe-slow-for-gfx previous contents of the page are also part of this update.
  await new Promise(requestAnimationFrame); // RAF #1
  // By now (at callback-invoke time for the very next raf after making changes), all new page contents are about to be locked-in, last chance to change "Frame N".
  const earliest_possible_visibility = performance.now();
  await new Promise(requestAnimationFrame); // RAF #2
  // By now, new page contents in Frame N were locked-in and sent to "painting as far as the html spec is concerned".
  // This is last call for Frame N+1.
  // Frame N might still be in-transit to compositor, or to gpu, or to screen, though.
  await new Promise(requestAnimationFrame); // RAF #3
  // If "Frame N" is not done yet, we're likely to not get RAF called until it is.
  // After this above await unblocks, we are relatively confident that Frame N is effectively on the screen,
  // though this does depend on our max total frame pipelining dom-to-screen depth.

  await new Promise(requestAnimationFrame); // RAF #4
  // Very safe by now
  await new Promise(requestAnimationFrame); // RAF #5
  // Something with RAF is very broken if we are not safe to process click events once we get here.

  const ms_since_earliest_possible_visibility = performance.now() - earliest_possible_visibility;
  const ms_min_reaction_time_remaining = MIN_REASONABLE_HUMAN_REACTION_MS - ms_since_earliest_possible_visibility;
  if (ms_min_reaction_time_remaining  > 0) {
    await new Promise(go => setTimeout(go, ms_min_reaction_time_remaining));
  } else {
    warn(`Took ${ms_since_earliest_possible_visibility}ms after navigation, longer than MIN_REASONABLE_HUMAN_REACTION_MS : ${MIN_REASONABLE_HUMAN_REACTION_MS}ms which is worth investigating`);
  }

  removeChromePriorityUserEventListener(in_navigation_flux_ignore_events);
  return;
}

Do you believe there is other information needed here from gfx?

To be honest, I'm not fully clear on the link between compositing and RAF.

If the strategy outlined in comment 11 leverages an existing communication path from the compositor back to content to give us the properties we're looking for, that does seem simpler than plumbing through additional information like a sequence number described in comment 9.

Flags: needinfo?(botond)
Assignee: nobody → sefeng

Hi Sean, could you please share updates here? Thanks.

Flags: needinfo?(sefeng)

I am working on the patch for this. Will have something up for review this week.

Flags: needinfo?(sefeng)

There's a potential security issue such that malicious websites
could trick the user to furiously clicking and misdirect those
clicks into other websites.

For instance, malicious websites would tell users to "click here",
queue extra graphic works and navigate to a bad cert page. While
rendering lagged behind, users were furiously clicking as directed by
the malicious page. These clicks eventually start getting
sent to the navigated-to page.

This patch introduces some delay to handle user input events. There
are two prefs introduced. One minimum number of RAFs that need s
to be passed and one minimum time that needs to be elapsed. Both
condition need to be met to handle user input events.

Update: Olli suggested to use FinishedWaitingForTransaction(). This seems to be adaptable for our needs however I think this is webrender only, so maybe its not the best given we need to support non-webrender case. I'll chat with Olli about this.

Attachment #9339135 - Attachment description: Bug 1830820 - Introduce a clickjack mitigation r=#dom-core,smaug! → Bug 1830820 - Introduce some delays to user input handling r=#dom-core,smaug!

Comment on attachment 9339135 [details]
Bug 1830820 - Introduce some delays to user input handling r=#dom-core,smaug!

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Should be quite easy.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all branches
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: It shouldn't be hard to create backports for other branches. If this patch works well in Nightly, then the risk is low.
  • How likely is this patch to cause regressions; how much testing does it need?: The only testing I did was pushing to try. Since this patch introduces a delay for user input handling, there's a moderate risk that this patch will cause regressions.
  • Is Android affected?: Yes
Attachment #9339135 - Flags: sec-approval?

Comment on attachment 9339135 [details]
Bug 1830820 - Introduce some delays to user input handling r=#dom-core,smaug!

Can we split the patches into two: one for the code, and one for the tests? The patch for the tests should also add the code for isClickjackMitigationTest and the pref into StaticPrefs so it's not present in the initial code.

Attachment #9339135 - Flags: sec-approval? → sec-approval-

Hi Tom, I took your suggestion and split the patch into two. Can you check this again please? I can't change the sec-approval flag anymore.

Flags: needinfo?(tom)

Comment on attachment 9339135 [details]
Bug 1830820 - Introduce some delays to user input handling r=#dom-core,smaug!

Approved to land and uplift

Flags: needinfo?(tom)
Attachment #9339135 - Flags: sec-approval- → sec-approval+
Whiteboard: [reminder-test 2023-10-10]

FYI, we're down to 2 betas left this cycle, so if this needs bake time, we should get it landed ASAP. Also, this will need a rebased patch for ESR102.

Per discussion with Sean, this needs more bake time than we can give it this cycle.

Pushed by sefeng@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/da493ba4c9a6 Introduce some delays to user input handling r=smaug https://hg.mozilla.org/integration/autoland/rev/941c44881781 Add tests for user input delay handling r=smaug
Backout by nbeleuzu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3cb800b647d9 Backed out 2 changesets for reftest failure on inline-block-slice-5.html . CLOSED TREE
Pushed by sefeng@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4df110e417d4 Introduce some delays to user input handling r=smaug https://hg.mozilla.org/integration/autoland/rev/e930161a5935 Add tests for user input delay handling r=smaug
Backout by imoraru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a356d0eb9dec Backed out 2 changesets for causing marionette failures on test_actions_wheel.py. CLOSED TREE

We don't need this feature when running geckodriver tests

Pushed by sefeng@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4cf8515b79ee Introduce some delays to user input handling r=smaug https://hg.mozilla.org/integration/autoland/rev/2e560bd558ef Add tests for user input delay handling r=smaug https://hg.mozilla.org/integration/autoland/rev/a1735b4457aa Update geckodriver prefs to disable delayed user input event handling r=whimboo,webdriver-reviewers
Flags: needinfo?(sefeng)

The patch landed in nightly and beta is affected.
:sefeng, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox118 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(sefeng)
Flags: needinfo?(sefeng)
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-

Is this ready for an ESR115 approval request? Please attach rebased patches and nominate if so.

Flags: needinfo?(sefeng)
Regressions: 1856514

a month ago, tjr placed a reminder on the bug using the whiteboard tag [reminder-test 2023-10-10] .

sefeng, please refer to the original comment to better understand the reason for the reminder.

Flags: needinfo?(sefeng)
Whiteboard: [reminder-test 2023-10-10]

Update: Those two regressions need to be fixed first before this can be back ported to ESR115.

ESR Approval Request Comment
[Feature/Bug causing the regression]: Introduce some delays to user input handling to fix this sec-high bug
[User impact if declined]: Users may under risk
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: Bug 1856497 and Bug 1856514
[Is the change risky?]: A little bit
[Why is the change risky/not risky?]: The change changes the time of when we handle user input events, so it's risky in its nature. However it has been tested on Nightly for a cycle already and it seems to be working well. Though there were two regressions.
[String changes made/needed]:

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Attachment #9348648 - Attachment is obsolete: true
Attachment #9351124 - Attachment is obsolete: true
Flags: needinfo?(sefeng)
Attachment #9358324 - Flags: approval-mozilla-esr115?

Comment on attachment 9358324 [details] [diff] [review]
0001-Bug-1830820-Introduce-some-delays-to-user-input-hand.patch

Approved for 115.4esr.

Attachment #9358324 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
Whiteboard: [adv-main119+]
Whiteboard: [adv-main119+] → [adv-main119+][adv-ESR115.4+]
Attached file advisory.txt
Alias: CVE-2023-5721

Bulk-unhiding security bugs fixed in Firefox 119-121 (Fall 2023). Use "moo-doctrine-subsidy" to filter

Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: