Closed Bug 464071 Opened 11 years ago Closed 10 years ago
User tracking via Math
.random output and multipart/form-data boundary string
Bug 461204 seems to be about the post boundary. We should at least search the file contents for the boundary string and use a different boundary on conflict. That's probably a separate issue from the random number seeding.
CCing some people from Labs as I believe this also affects the cryptographic aspects of Weave.
We have PR_GetRandomNoise and should seed with that instead of a predictable value.
If you need unpredictable pseudorandom numbers, you need to use NSS's PK11_GenerateRandom function. If the PRNG is not cryptographic secure, it is not enough to seed it with a random value. The PRNG itself needs to be secure, too, to produce unpredictable output (as opposed to just uniformly distributed output).
Just to raise the issue (I think it's not actually a problem, but I wouldn't consider myself an RFC 2616 expert), I assume because our form submission requests have Content-Length headers, the predictability of form submission boundaries can't be used to perform HTTP request splitting, right? (Except in the sense of being able to send two files in the same request where the actual form only offered space for one, that is -- but that's not the traditional meaning.)
As far as Weave goes, all the IWeaveCrypto stuff uses NSS's secure RNG. But it looks like modules/xmpp/authenticationLayer.js uses Math.random() to generate a nonce, that should be fixed. Not sure of the impact without looking closer at the code. The weakness of Math.random() is nothing new, but perhaps not widely known...
If people care, we could also use the uuid generator for form submission boundaries (of course that still falls back to random() on everything but Mac/Linux). Not sure whether the uuid generators on Win/Mac are good enough for what we want here, though.
(In reply to comment #6) > The weakness of Math.random() is nothing new, but perhaps not widely known... The full writeup does mention (p. 8) that "Firefox's implementation of Math.random() is almost identical to Java's util.Random.nextDouble() (). The Java util.Random PRNG is well analyzed (e.g. )" I think the new angle here is that even if you don't care about the actual randomness/predictability of Math.random(), the fact that the seed can be recovered is an issue in itself. This is because the seed is sensitive, as it can be used to identify the browser instance, and it leaks information. Likewise (to some degree), the Math.random() "mileage" can be used to further distinguish among browser instances.
(In reply to comment #5) > Just to raise the issue (I think it's not actually a problem, but I wouldn't > consider myself an RFC 2616 expert), I assume because our form submission > requests have Content-Length headers, the predictability of form submission > boundaries can't be used to perform HTTP request splitting, right? (Except in > the sense of being able to send two files in the same request where the actual > form only offered space for one, that is -- but that's not the traditional > meaning.) That's my understanding (i.e. Content-Length determines overall message size, parts are internal to the message), though I'm sure there are people who know better than me on this list.
Just a quick question: I notice that there's a single Bugzilla case (#464071) for this issue. I am not familiar with Mozilla's procedures, but I think that perhaps two issues should be opened (one for Math.random, another for the boundary string).
I think we should use bug 461204 for the boundary string (and perhaps the exiting Math.random bug for Math.random).
(In reply to comment #11) > I think we should use bug 461204 for the boundary string (and perhaps the > exiting Math.random bug for Math.random). I hope I'm not nitpicking, but the particular problem described in #461204 seems to be around *static* boundary string, which may very well be a separate issue (one which is not normally encountered - in all my experiments with Firefox 2/3 I did not encounter such situation). The issue I submitted earlier this week is about the predictability of the boundary string, and implications of its predictability (e.g. calculating the browser "mileage").
The string is static because we use rand() without seeding it, as described in this bug... Fixing that involves overhauling how the string is generated, which is exactly what's needed to fix the boundary-string issue you submitted.
(In reply to comment #13) > The string is static because we use rand() without seeding it, as described in > this bug... Fixing that involves overhauling how the string is generated, > which is exactly what's needed to fix the boundary-string issue you submitted. Oh, right. I missed the part where this is supposed to be the *first* file upload submission. So yes, there's a point in routing the multipart boundary issue to #461204. At the same time, may I point out that this can be thought of as a security issue (whereas #461204 is a publicly accessible bug), and that fixing #461204 per-se involves merely seeding srand(), whereas fixing the issue I pointed out requires overhauling the string generation. Ultimately it's your call, of course.
Maybe I misunderstood the paper, but it sounded like the problem with the boundary was that it was predictable due to the seeding either not happening or the seed value being recoverable from the JS random number generator seed. So just using a better seed for srand() should help there, as I understand. Or am I missing something?
(In reply to comment #15) > Maybe I misunderstood the paper, but it sounded like the problem with the > boundary was that it was predictable due to the seeding either not happening or > the seed value being recoverable from the JS random number generator seed. So > just using a better seed for srand() should help there, as I understand. Or am > I missing something? There are two distinct (in my opinion) issues, although they share the same concept, sort-of. The first is the preditability of the Math.random() PRNG, and the ability to recover the seed used. The second is the predictability of the boundary string, which uses the CRT rand/srand PRNG. The root cause in this case is the predictability of the CRT rand/srand PRNG (at least on Windows and Mac OS/X, and to some degree on Linux too, and quite possibly on other O/Ses as well). Additionally, since srand is not called, the seed is static and always produces the same sequence (of which, #461204 is a symptom). Fixing one does not automatically fixes the other, and vice versa. Additionally, with the boundary string issue, just seeding srand() with a good seed won't get rid of the issue. First, the PRNG will still be predictable, so knowing the current value would make it easy (at least on Windows and Mac OS/X) to know the next value (read: next boundary string). And quite possibly the PRNG can be rolled back to the point where seeding took place, and thereby identify the seed used and be able to track the browser (this may depend on the seeding scheme).
Any PRNG with a published algorithm will let you predict the next result from the previous one. So as long as we're using a PRNG at all for the boundary string (and not one-way hashing the result or anything) you'll be able to predict future boundaries. I'm not clear on what the threat in predicting future boundaries is. The tracking is the real issue, as I understand it. How does the seeding scheme affect the ability to roll back the PRNG? Just in terms of being able to determine when you've rolled back to the original seed? For the seed we should be able to use truly random bytes, which ought to obviate this problem.
The user tracking is not the single issue here. Since the seed is the start up time, its recovery means that it is possible to deduce for how long the user runs the browser.
(In reply to comment #17) > Any PRNG with a published algorithm will let you predict the next result from > the previous one. Eh? I disagree... Why is this an inherent attribute of any PRNG? it is only correct if you can deduce the internal state of the PRNG given a sequence of outputs. However, a cryptographic PRNG would not let you do that. So as long as we're using a PRNG at all for the boundary > string (and not one-way hashing the result or anything) you'll be able to > predict future boundaries. I'm not clear on what the threat in predicting > future boundaries is. You can use it e.g. for submitting arbitrary data as files to 3rd party sites(i.e. controlling the inner-part headers) as part of a CSRF attack. > The tracking is the real issue, as I understand it. How does the seeding > scheme affect the ability to roll back the PRNG? Just in terms of being able > to determine when you've rolled back to the original seed? Yes. Suppose you seed with 3 bytes of true random noise, and zero-out the fourth byte (this is a cotrived example, of course), then the seed is quite good, but if I'm able to roll back the PRNG I can determine the seed by the zero byte, which will then give me what I need - an identifier for the browser. For the seed we > should be able to use truly random bytes, which ought to obviate this problem. This still depends on the implementation of the CRT rand/srand PRNG. Suppose it allows seeding by 32 bits, but its internal state is much larger (e.g. 48 bits, 64 bits or even 992 bits which is the case with glibc - Linux). Then an attacker can still identify the seed (if the PRNG is weak) and the browser mileage.
Note that we have nsIRandomGenerator if we really want, by the way. For the form boundary that seems perfectly reasonable to me.
(In reply to comment #20) > Note that we have nsIRandomGenerator if we really want, by the way. For the > form boundary that seems perfectly reasonable to me. There are two questions: a. Is nsIRandomGenerator implemented as a cryptographic PRNG? I assume so, but I didn't really dig inside the code. It would be nice to know the algorithm used. b. Is the PRNG seeded with enough high-entropy bits? this is usually a trickier question... Note that if (a) holds, but (b) doesn't, it's still quite bad for the boundary strings, because an attacker may be able to enumerate the possible seeds, and eliminate false guesses using one or more boundary string samples.
nsIRandomGenerator just directly calls PK11_GenerateRandom. Per comment 4, I would assume it's fine (in particular, that it is in fact a cryptographic PRNG, and that the seed is sufficiently long truly random data). One of the NSS developers would have to comment on the details, though.
PK11_GenerateRandom is a good cryptographic PRNG. To use it, you must have first initialized NSS. That initialization will have seeded the PRNG adequately. One of the properties of this PRNG is that, without knowing the internal state (which is very large), one cannot predict the output, even if one has collected all previous inputs.
I received some additional material from the reporter describing attack scenarios which he feels constitute a real security threat rather than privacy-only. I'll attach the report momentarily, but in summary the threats are threefold: 1. An attacker can read state information about another site by sampling Math.random before and after embedding the site in their own page. 2. An attacker can influence the output of Math.random for any site, since all sites share this resource. 3. Math.random output is predictable for all sites so any web apps using it for a source of cryptographic randomness are vulnerable. As Amit points out in the report, the JS Standard doesn't require that Math.random be cryptographically strong, but it is good to at least consider these attack scenarios in our severity rating.
(In reply to comment #22) > nsIRandomGenerator just directly calls PK11_GenerateRandom. Per comment 4, I > would assume it's fine (in particular, that it is in fact a cryptographic PRNG, > and that the seed is sufficiently long truly random data). One of the NSS > developers would have to comment on the details, though. Are you proposing wiring Math.random() to nsIRandomGenerator, or just using nsIRandomGenerator to seed the JS PRNG? (In reply to comment #19) > > The tracking is the real issue, as I understand it. How does the seeding > > scheme affect the ability to roll back the PRNG? Just in terms of being able > > to determine when you've rolled back to the original seed? > > Yes. Suppose you seed with 3 bytes of true random noise, and zero-out the > fourth byte (this is a cotrived example, of course), then the seed is quite > good, but if I'm able to roll back the PRNG I can determine the seed by the > zero byte, which will then give me what I need - an identifier for the browser. Is our seed actually weak in this way? http://mxr.mozilla.org/mozilla-central/source/js/src/jsmath.cpp#434 I could *easily* be misreading, but it looks like giving random_setSeed a good int64 worth of random noise instead of PRMJ_Now() would not leave any of the tell-tale signs you're talking about in terms of identifying the seed.
> Are you proposing wiring Math.random() to nsIRandomGenerator, or just using > nsIRandomGenerator to seed the JS PRNG? I'm proposing using it instead of random() for the form submission multipart separator, so that it would actually not be the same from run to run as it is now). This would have the side effect that it would no longer be possible to go back and forth from those strings to Math.random() results. Not sure what the best way to proceed on Math.random() is.
(In reply to comment #26) > (In reply to comment #19) > > > The tracking is the real issue, as I understand it. How does the seeding > > > scheme affect the ability to roll back the PRNG? Just in terms of being able > > > to determine when you've rolled back to the original seed? > > > > Yes. Suppose you seed with 3 bytes of true random noise, and zero-out the > > fourth byte (this is a cotrived example, of course), then the seed is quite > > good, but if I'm able to roll back the PRNG I can determine the seed by the > > zero byte, which will then give me what I need - an identifier for the browser. > Is our seed actually weak in this way? > http://mxr.mozilla.org/mozilla-central/source/js/src/jsmath.cpp#434 > I could *easily* be misreading, but it looks like giving random_setSeed a good > int64 worth of random noise instead of PRMJ_Now() would not leave any of the > tell-tale signs you're talking about in terms of identifying the seed. If you only give it a strong seed, without changing the weak algorithm, the overall system is still vulnerable. Perhaps a bit less vulnerable, but still vulnerable. An attacker can still find the internal PRNG state at any given moment (because the PRNG algorithm is weak). And then, the attacker can find out e.g. how many times Math.random() was called in between checkpoints. An attacker can also still mount some form of user tracking. If the attacker knows the state at time T1, and then the state at time T2, and the attacker can also assume that say no more than 1000 Math.random() values were consumed by FF in between, then the attacker can roll forward the state from T1 up to 1000 iterations, and compare each intermediate state with the state at T2. If there's a match, the attacker can say with high likelihood of being correct, that the same browser was used in T1 and in T2. Hence user tracking (more demanding on the CPU though).
Trying to contact Mozilla's security group regarding logistics of this issue (somehow emailing security@ didn't work). If you're on the security group, please send me an email off-list. If you know someone in the security group, please ping him/her. Thanks.
This bug was opened in response to your mail to security@, and is where the "logistics" of this issue are being discussed. The commenters here are members of the security group.
Apparently there's already a CVE number assigned to this issue: https://bugzilla.redhat.com/show_bug.cgi?id=480938
bug 475585 is closely related to this one. I recently worked on bug 511328. It made the global per-runtime (one per browser) random number stream thread local. Since most of the browser runs in one thread, that didn't substantially change anything except performance (no more locking needed). Based on the work in bug 511328 I can easily move the random number generator state into each JSContext. The memory overhead will be minimal. My question is whether people agree that this approach would fix this bug here. I plan on doing the work in 475585 (which is open), make this bug depend on 475585, and then resolve this one once the work is done. The main remaining worries if each context has its own random number generator are: 1) the stream of random numbers a script gets will become more predictable (we remove any noise from other consumers of the RN stream). This should be fine. Its already pretty predictable now. 2) we will still not use a cryptographically strong algorithm. I think we will have to stick with this approach. random() is used in benchmarks and silly places all over where performance is expected. We should file a separate bug to introduce a strong random function, and also standardize it. Making our random unilaterally better is not going to help the web. Everyone else seems to use thread-local poor-mans RNGs right now. 3) if the RNG is per context, content might be able to spoof/predict/steer the numbers chrome gets if chrome executes on the same context (which right now it usually does). I think the main security bug is here the fact that chrome re-uses the same context content uses. I talked to mrbkap about splitting these context uses and that seems to be the right way to go. If anyone sees any other weaknesses I didn't mention above, please let me know. Otherwise I will go ahead and mark the dependency and whip up a patch in the other bug.
(In reply to comment #32) > My question is whether people agree that this approach would fix this bug here. > I plan on doing the work in 475585 (which is open), make this bug depend on > 475585, and then resolve this one once the work is done. Honestly, I think we should just open this bug. The paper has been public for a pretty long time, so it's not like there's anything in this bug that isn't already public.
I have no objections to #33.
blocking2.0: --- → ?
Bug 322529 proposes to strengthen Math.random. The benchmarks that use it should change to use a deterministic alternative -- they aren't measuring Math.random performance. Real performance-critical code on the web probably doesn't count on Math.random being fast and crappy; we need evidence for this before optimizing it prematurely. We do know Math.random needs to be less crappy. A new API is not going to see use until years from now. We should avoid adding more peas to the pod if the first one can be strengthened. /be
Will be fixed by 475585.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: CVE-2008-5913
Clearing the blocking nom.
(In reply to comment #36) > Will be fixed by 475585. > > *** This bug has been marked as a duplicate of bug 475585 *** Bug 475585 doesn't take care of the "multipart/form-data boundary string" part of this bug, does it?
Yeah, I concur. We should split the bug. Or re-open and re-task for the boundary string only? Dealers choice.
(In reply to comment #39) > Yeah, I concur. We should split the bug. Or re-open and re-task for the > boundary string only? Dealers choice. I'm converting this bug to a meta bug with bug 461204 and bug 475585 as its dependencies. I think that makes sense.
Assignee: gal → nobody
QA Contact: general → toolkit
Resolution: DUPLICATE → FIXED
You need to log in before you can comment on or make changes to this bug.