1434400 - (GLitch) using WebGL to Rowhammer the GPU ("GLitch")

Reporter

Description

•

6 years ago

From National Cyber Security Centre (Netherlands)
- - - - - - - - - - - - - - - - - - - - - - - - - 
Attached is a paper by several Dutch researchers at the Vrije Universiteit in Amsterdam that they wish to disclose to you. We are coordinating the disclosure on their behalf. In the paper they describe a Rowhammer attack using the GPU and WebGL. The paper already describes some possible mitigations. In addition, in light of the developments around Spectre and Meltdown, the researchers have provided some additional insights, copied below.

The paper has been accepted at the IEEE Security & Privacy conference later this year and is currently awaiting publication. The S&P conference has changed its publication model this year and intends to publish the paper as early as February. Should you require more time, please let us know so that we can ask S&P to extend the embargo on the paper to a later date.

Main questions:
- Could you let us know whether the researchers should ask for an embargo on the paper, or can this be published in February?
- Do you expect to release patches/mitigations for this vulnerability?
- What would be the timeline for these patches/mitigations?
- What would be the preferred publication strategy for this issue?

Once we have received initial feedback from the different parties (Apple, Google (including Android ARM-manufacturers) and Mozilla), we will send an update with a proposed timeline.

Please let us know if you have any further questions.

Kind regards,
NCSC-NL




Additional information of the researchers regarding the current status:

SPECTRE for GLitch:      

None of the currently proposed countermeasures affects our attack. The current countermeasures (i.e., timer throttling and array index masking in Chrome) do not interfere with WebGL, so they do not impede our exploit, since the whole attack runs on the GPU.

GLitch for SPECTRE:     
Spectre (from JS) relies on the possibility of performing fine-grained timing. We introduce some timing techniques that allow you to measure cached vs. uncached accesses on the CPU.

1. GL_TIMESTAMP_EXT, combined with clock edging [1], allows you to distinguish between the two kinds of access. A more precise variant of that technique runs a padding function with a known execution time after the access and checks whether it completes within the boundaries of two clock edges.

2. We discovered a mistake in our evaluation of the GLSync timers and obtained a new resolution of ~250 ns by using ticks-to-signal. We believe that it might be possible to measure cached/uncached accesses when this is combined with the padding technique.
WIP on this side.
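
To make the clock-edging and padding idea in item 1 concrete, here is a minimal simulation. It is only a sketch under stated assumptions: `VirtualClock`, `pollsUntilNextEdge`, and `estimateDurationNs` are hypothetical names, the clock is virtual so the result is deterministic, and nothing here touches WebGL or GL_TIMESTAMP_EXT.

```typescript
// Simulation only: virtual time stands in for a real timer (assumption).
class VirtualClock {
  private trueNs = 0;
  readonly resolutionNs: number;

  constructor(resolutionNs: number) {
    this.resolutionNs = resolutionNs;
  }

  // Advance the hidden high-resolution time.
  advance(ns: number): void {
    this.trueNs += ns;
  }

  // What an observer sees: time rounded down to the clock resolution.
  read(): number {
    return Math.floor(this.trueNs / this.resolutionNs) * this.resolutionNs;
  }
}

// Spin until the coarse clock changes value; returns how many polls that took.
function pollsUntilNextEdge(clock: VirtualClock, pollCostNs: number): number {
  const start = clock.read();
  let polls = 0;
  while (clock.read() === start) {
    clock.advance(pollCostNs); // each poll is "padding" with a known cost
    polls++;
  }
  return polls;
}

// Align to an edge, run the operation, then pad until the next edge.
// The estimate is only meaningful when the operation is shorter than one tick.
function estimateDurationNs(
  clock: VirtualClock,
  operationNs: number,
  pollCostNs: number
): number {
  pollsUntilNextEdge(clock, pollCostNs); // now at the start of a tick
  clock.advance(operationNs); // the access being timed
  const padding = pollsUntilNextEdge(clock, pollCostNs);
  return clock.resolutionNs - padding * pollCostNs;
}
```

In this model, a 1000 ns clock with 10 ns polls separates a 100 ns operation from a 400 ns one even though both are far below the clock resolution.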

MITIGATIONS: 

To tackle SPECTRE:

- We suggest that Mozilla completely disable GL_TIMESTAMP_EXT by returning 0, as Chrome currently does, and that GL_TIMESTAMP_EXT eventually be removed from the WebGL EXT_disjoint_timer_query extension.

- We suggest removing the getSyncParameter function from the WebGL2 specification, since it represents a run-to-completion violation and gives the attacker insight into the current status of GPU operations.

- We suggest modifying the clientWaitSync function to accept a callback that runs after the fence is signalled. This would make it impossible to use the function as a timer, since it would no longer return any information about the status of the GPU at a given time.

To tackle GLitch:

To partially mitigate GLitch, browser vendors should completely disable the EXT_disjoint_timer_query extension so that it is impossible to measure GPU execution time, in addition to doing what is mentioned above. For the contiguous memory side channel we do not rely on the resolution of the timer, since we can simply loop more times over the same access pattern, depending on the granularity, and then compute the mean access time. As a consequence, simply reducing the granularity of such timers is not the optimal solution. If disabling the extension is not an option, we would still suggest reducing the granularity in order to make the attack slower and less practical.
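
The point about looping and averaging can be illustrated with a small model. This is a sketch, not the researchers' code: `coarseElapsedNs` and `meanAccessNs` are hypothetical helpers, and the access costs and the 1 ms granularity are made-up numbers chosen only to show the effect.

```typescript
// Coarse timer model (assumption): elapsed time is rounded down to the granularity.
function coarseElapsedNs(trueElapsedNs: number, granularityNs: number): number {
  return Math.floor(trueElapsedNs / granularityNs) * granularityNs;
}

// Mean per-access time recovered from ONE coarse measurement of `repeats` accesses.
function meanAccessNs(
  perAccessNs: number,
  repeats: number,
  granularityNs: number
): number {
  const total = coarseElapsedNs(perAccessNs * repeats, granularityNs);
  return total / repeats;
}

const GRANULARITY_NS = 1000000; // 1 ms, illustrative only

// Few repetitions: both access patterns vanish below the timer granularity.
const fastFew = meanAccessNs(100, 1000, GRANULARITY_NS);
const slowFew = meanAccessNs(130, 1000, GRANULARITY_NS);

// Many repetitions: the 30 ns per-access difference becomes visible again.
const fastMany = meanAccessNs(100, 1000000, GRANULARITY_NS);
const slowMany = meanAccessNs(130, 1000000, GRANULARITY_NS);
```

Coarsening the timer therefore raises the number of repetitions the attacker needs, which slows the side channel down without closing it.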

FINAL NOTES:
While our main suggestion is to disable timers in order to (partially) block our side channel and other possible ones, we believe that this is only a quick fix against the side channel threat and is effective only in the short run. It should not be considered a long-term solution against this family of attacks since, as shown by this and previous work [1][2], it is possible to craft timers from many different JS interfaces.
 
 
[1] Kohlbrenner, D., & Shacham, H. (2016). Trusted Browsers for Uncertain Times. USENIX Security '16.
[2] Gras, B., Razavi, K., Bosman, E., Bos, H., & Giuffrida, C. (2017). ASLR on the Line: Practical Cache Attacks on the MMU. NDSS '17.

With kind regards,
...........................................................................
National Cyber Security Centre
P.O. box 117 | 2501 CC | The Hague | The Netherlands | www.ncsc.nl

Daniel Veditz [:dveditz]

Reporter

Comment 1

•

6 years ago

Note: paper was encrypted with the wrong key. Working on getting a readable copy.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 2

•

6 years ago

SPECTRE recommendations 1-3 seem over-aggressive. We should be fine applying the same sort of mitigations we've applied to performance.now().

Changing clientWaitSync to use a callback is not an effective change, so we should ignore that aspect of the recommendation.

I'm curious that they finger 'getSyncParameter' but not other synchronous calls, like readPixels, getBufferSubData, and finish(). Maybe I don't understand what they mean by 'run-to-completion violation'. Maybe whole-pipeline-flushing calls are ok, since they reduce the resolution enough. If so, we should be fine simply throttling fence usage.

I'll need to see the paper to know more, but I suspect the mitigation for GLitch is similar to SPECTRE mitigation 1 and performance.now().


Regardless of what we do to address untrusted input, devs will need a way to act as if trusted in order to use these tools for debugging and profiling. We should preserve their presence in the browser, but not allow them to be run by untrusted content. (A pref that doesn't persist across restarts? An ENV variable?)

Kelsey Gilbert [:jgilbert]

Assignee

Comment 3

•

6 years ago

More specifically, the fix for these should be pretty straightforward, though since it's Jan 30, I'm worried about the paper going out in Feb already. We can definitely have this sorted by mid-Feb.


Also, run-to-completion[1] just means that it blocks the main thread/queue, which is known. (though curiously, this is the behavior of clientWaitSync, whereas getSyncParameter is actually non-blocking, though you could loop on it)

[1]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop#Run-to-completion

Kelsey Gilbert [:jgilbert]

Assignee

Comment 4

•

6 years ago

Ah, run-to-completion /violations/ are when run-to-completion isn't done, which is what we get for naive polling like getSyncParameter.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 5

•

6 years ago

For background, "Trusted Browsers for Uncertain Times" is relevant:
https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_kohlbrenner.pdf

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 6

•

6 years ago

(In reply to Jeff Gilbert [:jgilbert] from comment #2)
> SPECTRE recommendations 1-3 seem over-aggressive. We should be fine applying
> the same sort of mitigations we've applied to performance.now().

Can you be more specific? The mitigations we've done with performance.now() (and all timers in fact) are really poor delaying measures. We have proofs of concept that entirely bypass any amount of timer rounding or jittering we do. 

(Also we don't do jittering yet, because it's really hard to do without letting time go backwards. I recently 'broke' Chrome's jitter by going backwards.)

Kelsey Gilbert [:jgilbert]

Assignee

Comment 7

•

6 years ago

Ok, but it doesn't make sense to me to fix webgl and performance.now at different rates, particularly since this would tip off that the performance.now mitigations are insufficient. I do believe we need to bring WebGL up to the same mitigation level as the rest of the browser, but going further earlier seems pointless. If we are investigating further mitigations to performance.now, this work should be folded into that, so we can move forward together.

I don't have access to any of the performance.now bugs, so I can't comment on how we can integrate their findings. I do think we need to have a unified (or at least cohesive) approach here.

Kelsey Gilbert [:jgilbert]

Assignee

Updated

•

6 years ago

Assignee: nobody → jgilbert

Priority: -- → P1

Whiteboard: gfx-noted

Daniel Veditz [:dveditz]

Reporter

Updated

•

6 years ago

Alias: GLitch

Summary: using WebGL to Rowhammer the GPU (plus bonus SPECTRE) → using WebGL to Rowhammer the GPU ("GLitch")

Luke Wagner [:luke]

Comment 8

•

6 years ago

Is the attack here just using WebGL features as a timer or is it actually doing the speculative access (bypassing a bounds check et al) in WebGL?  If it's the former, then I'd agree with Jeff that just using timer clamping analogous to p.now() is  sufficient.  But if it's the latter, then I think Tom's right and we need to stop the speculative access (just like we're doing in the JIT) in *addition* to clamping/fuzzing the WebGL timer.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 9

•

6 years ago

It's not doing spec-access, just using WebGL as a timer source with which to measure other timing attack vectors.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 10

•

6 years ago

At this point we might want to actually pivot this bug into webgl+SPECTRE discussion, and move GLitch to its own bug.

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 11

•

6 years ago

(In reply to Jeff Gilbert [:jgilbert] from comment #9)
> It's not doing spec-access, just using WebGL as a timer source with which to
> measure other timing attack vectors.

Oh okay. Is it easy to wrap the timer call that WebGL returns with ReduceTimePrecisionAs()?  That would feed it through all of our timer clamping stuff.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 12

•

6 years ago

(In reply to Tom Ritter [:tjr] from comment #11)
> (In reply to Jeff Gilbert [:jgilbert] from comment #9)
> > It's not doing spec-access, just using WebGL as a timer source with which to
> > measure other timing attack vectors.
> 
> Oh okay. Is it easy to wrap the timer call that WebGL returns with
> ReduceTimePrecisionAs()?  That would feed it through all of our timer
> clamping stuff.

Yep. I can get that thrown together as a starting point.
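
As a rough illustration of what "wrapping the timer call" could look like, the sketch below clamps whatever a raw timestamp source returns. It is an assumption-laden stand-in: `clampToResolution` and `makeClampedSource` are hypothetical names, the clamping is plain round-down, and this is not the browser's actual time-precision code.

```typescript
// Hypothetical stand-in for a shared clamping helper; not the browser's real code.
function clampToResolution(timestampNs: number, resolutionNs: number): number {
  return Math.floor(timestampNs / resolutionNs) * resolutionNs;
}

// Wrap any raw timestamp source so callers only ever see clamped values.
function makeClampedSource(
  rawSource: () => number,
  resolutionNs: number
): () => number {
  return () => clampToResolution(rawSource(), resolutionNs);
}
```

Routing every exposed timestamp through one wrapper like this keeps the WebGL timers at the same precision as the other clocks, which is the property discussed above.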

Daniel Veditz [:dveditz]

Reporter

Updated

•

6 years ago

Keywords: sec-high

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

6 years ago

status-firefox58: --- → ?

status-firefox59: --- → ?

status-firefox60: --- → affected

tracking-firefox59: --- → +

tracking-firefox60: --- → +

Al Billings [:abillings - ex-MoCo]

Comment 13

•

6 years ago

Attached file Grand Pwning Unit: Accelerating Microarchitectural Attacks with the GPU — Details

Kelsey Gilbert [:jgilbert]

Assignee

Comment 14

•

6 years ago

My initial take is that we might be OK if we neuter the timers, since this may render the physical adjacency detection ineffective.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 15

•

6 years ago

One other mitigation that comes to mind is to throttle the number of sync objects we allow per frame or per second. We might have to do both, because I think you can just edge-detect otherwise. (To my knowledge you can already edge-detect today, since the basic clamping we do doesn't mitigate that.)

Kelsey Gilbert [:jgilbert]

Assignee

Comment 16

•

6 years ago

One of the things we need to keep in mind is that many of the mitigation approaches center on having a central standard time. The GPU timestamp queries a different time counter, so we need to hitch GPU time to our coordinated system time.

Implementation aside for now, if we had a function to map real timestamps to mitigated timestamps, we may be able to apply it directly to real gpu timestamps. 

We should consider establishing an (approximate) offset to link gpu time to system time, so that we functionally expose only one clock.

Dealing with time intervals is trickier. Clamping or fuzzing time intervals separately from mitigated system time seems unsafe to me. I believe we can transform these intervals (TIME_ELAPSED) into timestamp queries and reconstruct the interval from the mitigated timestamps, which is safe. (this is the same as JS diffing performance.now queries)
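
A minimal sketch of that interval reconstruction, assuming a fixed GPU-to-system offset and a placeholder mitigation function. `gpuToSystemNs`, `mitigatedIntervalNs`, and `clampTo1Ms` are hypothetical names, and the 1 ms clamp is only an example of a mitigation, not the one actually used.

```typescript
type Mitigate = (systemTimeNs: number) => number;

// Translate a GPU timestamp into system time using an (approximate) fixed offset.
function gpuToSystemNs(gpuNs: number, offsetNs: number): number {
  return gpuNs + offsetNs;
}

// Rebuild a TIME_ELAPSED-style interval from two mitigated timestamps,
// the same way script would diff two mitigated performance.now() readings.
function mitigatedIntervalNs(
  gpuStartNs: number,
  gpuEndNs: number,
  offsetNs: number,
  mitigate: Mitigate
): number {
  const start = mitigate(gpuToSystemNs(gpuStartNs, offsetNs));
  const end = mitigate(gpuToSystemNs(gpuEndNs, offsetNs));
  return end - start;
}

// Placeholder mitigation (assumption): round down to 1 ms.
const clampTo1Ms: Mitigate = (t) => Math.floor(t / 1000000) * 1000000;
```

With this shape, a 700 µs interval that stays inside one clamped tick reports 0, while a 200 µs interval that straddles a tick boundary reports 1 ms. The interval leaks no more than the mitigated clock itself does.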

Kelsey Gilbert [:jgilbert]

Assignee

Comment 17

•

6 years ago

After discussions with Chrome, we're going to over-restrict a bit:

- Disable disjoint_timer_query for now
- Prevent Sync objects from becoming SIGNALED until the next frame

That's what we're starting with.
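
The second item can be pictured with a small simulated fence. This is a sketch only: `DeferredSync` and its frame counter are hypothetical, and a real implementation would hook into the browser's frame loop and the driver rather than a manual `markGpuDone` call.

```typescript
type SyncStatus = "UNSIGNALED" | "SIGNALED";

// Simulated fence that remembers the frame it was created in.
class DeferredSync {
  private readonly createdFrame: number;
  private gpuDone = false;

  constructor(createdFrame: number) {
    this.createdFrame = createdFrame;
  }

  // Called by the (simulated) driver when the GPU work actually finishes.
  markGpuDone(): void {
    this.gpuDone = true;
  }

  // Status as exposed to content: never SIGNALED within the frame of creation,
  // no matter how quickly the GPU finished.
  status(currentFrame: number): SyncStatus {
    const frameElapsed = currentFrame > this.createdFrame;
    return this.gpuDone && frameElapsed ? "SIGNALED" : "UNSIGNALED";
  }
}
```

Because content can only observe completion at frame granularity, polling the fence in a tight loop no longer yields a sub-frame timing signal.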

We do have some lingering concerns about how essential the physical adjacency detection phase is in order to carry out the rowhammer attack, given the fairly complete understanding of the allocators involved.

Daniel Veditz [:dveditz]

Reporter

Comment 18

•

6 years ago

publication of the paper pushed out to May 5

Whiteboard: gfx-noted → gfx-noted [Paper to be published May 5]

Kelsey Gilbert [:jgilbert]

Assignee

Comment 19

•

6 years ago

Awesome. That should be fine for us.

Worse news is that I suspect the physical adjacency detection is unnecessary, though I haven't proved it.

We did clarify our approach for mitigating timers though, so smooth sailing on that. (Make Sync objects act like Queries, and fix up clientWaitSync to handle the too-soon special case)

Julien Cristau [:jcristau]

Comment 20

•

6 years ago

Setting 58 wontfix based on the publication being pushed out.

status-firefox58: ? → wontfix

Julien Cristau [:jcristau]

Comment 21

•

6 years ago

Jeff, what's the target for landing a mitigation here?  We're a week from 59 rc.

Flags: needinfo?(jgilbert)

Kelsey Gilbert [:jgilbert]

Assignee

Comment 22

•

6 years ago

It'll be in. The patch is written.

Flags: needinfo?(jgilbert)

Julien Cristau [:jcristau]

Updated

•

6 years ago

status-firefox59: ? → affected

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

6 years ago

Flags: needinfo?(milan)

Kelsey Gilbert [:jgilbert]

Assignee

Comment 23

•

6 years ago

Disabling disjoint timer queries: (sec bug, land asap)
https://bugzilla.mozilla.org/show_bug.cgi?id=1442504

Require round-trip for Sync objects: (public response to spec change, can probably ride the trains)
https://bugzilla.mozilla.org/show_bug.cgi?id=1442502

Liz Henry (:lizzard) (relman/hg->git project)

Comment 24

•

6 years ago

Did this actually land? We are building the release candidate today. The Firefox 60 release will be May 9.
We could plan to include this either in an RC2 build for 59 or in a dot release.

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

6 years ago

Flags: needinfo?(jgilbert)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 25

•

6 years ago

Looks like bug 1442504 did land for 59 and esr.
Should we call this fixed for 59 then?

Daniel Veditz [:dveditz]

Reporter

Updated

•

6 years ago

Depends on: 1442502

Liz Henry (:lizzard) (relman/hg->git project)

Comment 26

•

6 years ago

I'm removing the tracking flag because I don't think there's anything left for relman to do here for 59. If there is, please feel free to ask for tracking again.

tracking-firefox59: + → -

Ryan VanderMeulen [:RyanVM]

Comment 27

•

6 years ago

We're due to ship Fx60 on May 9, which is 4 days after this is due to be disclosed IIUC. That seems unfortunate :(

status-firefox-esr52: --- → ?

Daniel Veditz [:dveditz]

Reporter

Updated

•

6 years ago

Depends on: 1442504

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Whiteboard: gfx-noted [Paper to be published May 5] → gfx-noted [Paper to be published May 5][disclose 1442504 in advisory for release when this is public]

Kelsey Gilbert [:jgilbert]

Assignee

Updated

•

6 years ago

Alias: GLitch

Flags: needinfo?(jgilbert)

Daniel Veditz [:dveditz]

Reporter

Comment 28

•

6 years ago

[From NCSC:]

Hello all,

This week we have seen the accidental disclosure of the GLitch vulnerability as described in the "Grand Pwning Unit: Accelerating Microarchitectural Attacks with the GPU" paper of Frigo et al. Previously we had all agreed on an embargo until the beginning of May. This was also communicated to the IEEE S&P conference. However, IEEE accidentally published the paper at the initially proposed publication date.

The paper was quickly removed from the IEEE page, but it had already been downloaded by several readers, as can be seen from Twitter discussions. Unfortunately, one of the visitors to the IEEE page was the crawler of the Wayback Machine of archive.org, which currently still hosts a copy of the PDF. So the paper is still available to people who know the original URL, which unfortunately was also shared on Twitter.

While we have attempted to contain the disclosure, we are now forced to conclude that this containment is not perfect, and the paper should be considered disclosed to the curious public.

We propose to do a public disclosure on Wednesday 21 March at 16:00 CET.

Regards,
NCSC.

Whiteboard: gfx-noted [Paper to be published May 5][disclose 1442504 in advisory for release when this is public] → [Paper leaked, announcement March 21][disclose 1442504 in advisory for release when this is public] gfx-noted

Daniel Veditz [:dveditz]

Reporter

Comment 29

•

6 years ago

NCSC-NL says GLitch has been assigned CVE-2018-10229

Whiteboard: [Paper leaked, announcement March 21][disclose 1442504 in advisory for release when this is public] gfx-noted → [CVE-2018-10229][disclose 1442504 in advisory for release when this is public] gfx-noted

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

6 years ago

Flags: needinfo?(milaninbugzilla)

Julien Cristau [:jcristau]

Updated

•

6 years ago

status-firefox59: affected → wontfix

status-firefox60: affected → wontfix

David Bolter [:davidb] (NeedInfo me for attention)

Comment 30

•

6 years ago

Dan, Jeff, can this bug now be closed fixed?

Flags: needinfo?(jgilbert)

Flags: needinfo?(dveditz)

Kelsey Gilbert [:jgilbert]

Assignee

Comment 31

•

6 years ago

I think so. dveditz, anything else here?

Flags: needinfo?(jgilbert)

Maire Reavy [:mreavy]

Comment 32

•

6 years ago

Adding Jessie (new graphics engineering manager) to all sec-crit and sec-high graphics bugs

Kelsey Gilbert [:jgilbert]

Assignee

Comment 33

•

6 years ago

We seem to be safe here with our removal of the perf queries the exploit used.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Daniel Veditz [:dveditz]

Reporter

Updated

•

5 years ago

Group: gfx-core-security → core-security-release

Daniel Veditz [:dveditz]

Reporter

Updated

•

5 years ago

status-firefox-esr60: --- → fixed

Flags: needinfo?(dveditz)

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox-esr52: ? → wontfix

David Lawrence [:dkl]

Comment 34

•

4 years ago

Removing employee no longer with company from CC list of private bugs.

Daniel Veditz [:dveditz]

Reporter

Updated

•

4 years ago

Group: core-security-release