Regular expression denial-of-service in mozilla/bleach BleachSanitizerFilter.sanitize_css gauntlet regular expression
Categories
(Webtools :: Bleach-security, defect)
Tracking
(Not tracked)
People
(Reporter: schwag09, Assigned: u581815)
References
()
Details
(Keywords: sec-moderate, wsec-deplib, wsec-dos)
Attachments
(1 file)
Hi there,
I believe I've found a DoS bug in the mozilla/bleach library. The bug itself is in the BleachSanitizerFilter.sanitize_css
function. The vulnerability can be exploited with the following code:
In [1]: import bleach
In [2]: bleach.__version__
Out[2]: '3.1.3'
In [3]: payload = "'" + '"a"' * 64 + '^' + "'"
In [4]: bleach.clean('<a style=' + payload + '></a>', attributes={'a': ['style']})
...Spins...
The bug itself is a ReDoS bug. It occurs due to the gauntlet
regular expression:
gauntlet = re.compile(
r"""^([-/:,#%.'"\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
flags=re.U
)
The issue is mutually inclusive alternation nested inside a quantifier. In other words, two alternation statements (either/or via |
) that have full overlap inside a quantifier (e.g. +
or *
). The overlapping statements are "[\s\w]+"
and [-/:,#%.'"\s!\w]
- these overlap such that '"a"' * 64
in the payload causes catastrophic backtracking. We can also exploit this from \w-\w
and [-/:,#%.'"\s!\w]
with 'a-a' * 64
. The ^
in the payload matches none of the alternation statements, so when $
is hit in the regular expression, it causes it to backtrack.
Note that, as seen in the demo payload, attributes={'a': ['style']}
is required because bleach will not parse the style attribute by default. This is a mitigating factor that, as far as I can tell, prevents the functionality from being vulnerable by default. However, I likely did not exhaustively search all code paths that lead to BleachSanitizerFilter.sanitize_css
. I would double check my work here to ensure there isn't another avenue which makes this exploitable by default.
Regarding a fix, my first choice/suggestion would be to not use regular expressions for sanitization. Not only can it lead to esoteric DoS bugs like this, but it's also difficult to ensure you're allowing the minimal set of text through that you would like. I realize it's a much larger effort to refactor this to not use regular expressions, so we can also patch up the expression itself. To do so we must ensure that none of the alternation statements can completely overlap with another. I believe the following expression should do the trick:
gauntlet = re.compile(
r"""^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
flags=re.U
)
Note that we've removed the -
, '
, and "
from the initial alternation statement. This avoids complete overlap with the 2nd, 3rd, and 4th alternation statements. Since the 1st statement doesn't include a (
it cannot overlap with the 5th statement. If this expression breaks some of the logic necessary in the check we'll have to consider other options.
Hope this is helpful! Let me know if you have any questions and what the next steps are. One quick shoutout: I used Dlint + r2c to find this bug. Check them out if you're interested. If you're curious about ReDoS bugs, I'd recommend checking out this blog post: Finding Python ReDoS bugs at scale using Dlint and r2c.
Copying https://bugzilla.mozilla.org/show_bug.cgi?id=1623650#c1 here:
On my end I need to:
- reproduce and confirm the issue (ideally for multiple versions but it should be anything with that regex in it)
- make sure we recommend users don't pass unbounded strings
- look into tweaking guantlet regex as Matt suggests
my first choice/suggestion would be to not use regular expressions for sanitization
Absolutely, we've had a number of bugs around sanitizing CSS and filtering styles, so ideally we could switch to a different or better CSS parser:
- tinycss for a pure python implementation (with optional cython) https://github.com/mozilla/bleach/issues/248
- something battle-tested from Firefox/servo w/ python bindings e.g. https://github.com/servo/rust-cssparser
CCing :willkg in case he has any additional context or opinions
Comment 4•4 years ago
|
||
Side note--world is crazy now and maybe I'm not thinking at 100%. My apologies for bad thinking.
sanitize_css
is terrible and I continue to regret not putting in the effort to make the situation better. I agree the best plan is to drop that gauntlet regexp because it's terrible and either replace it with a parser or a library.
I think I understand what's going on to cause the DoS and the suggested fix. I don't know if I could figure out whether it affects real-world scenarios--what things aren't parseable now? Even so, I think the specter of possible unknown regressions is ok and we should fix this.
Relatedly, the email and url regexps are also overly complicated. It'd be interesting to know if they suffer similar problems.
https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L46
https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L75
I think I understand what's going on to cause the DoS and the suggested fix. I don't know if I could figure out whether it affects real-world scenarios--what things aren't parseable now? Even so, I think the specter of possible unknown regressions is ok and we should fix this.
I agree, I don't really have a sense of what the 1st alternation is getting at in the expression. It feels kind of like a catchall. I took a look through the git history of the gauntlet
and it seems the ReDoS has always been there (at least since 2012). One interesting note is that the html5lib
version of this function doesn't suffer from the same issue:
https://github.com/mozilla/bleach/blob/master/bleach/_vendor/html5lib/filters/sanitizer.py#L874
I see there's been some changes to the expression over time, which may have fixed minor bugs, but the fact that the html5lib
version isn't totally different and doesn't suffer from ReDoS may be some consolation that there won't be big regressions.
Relatedly, the email and url regexps are also overly complicated. It'd be interesting to know if they suffer similar problems.
https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L46
https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L75
After a cursory look at these expressions, I don't think they're vulnerable to ReDoS.
reproduce and confirm the issue (ideally for multiple versions but it should be anything with that regex in it)
I can confirm the PoC on python 3.8.1 with bleach 3.1.3, 3.1.2, 3.1.1, 3.1.0, 3.0.0, 2.1.4, 2.1.3 and didn't check earlier versions.
git blame shows we landed the current regex with: https://github.com/mozilla/bleach/pull/434 14 months ago and I released it with 3.1.3 this week. I also see I included an incomplete changelog with the 3.1.3 release.
Looking at older versions we used the following for the gauntlet regex:
-
^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$
w/re.U
in 3.1.3 -
^([-/:,#%.'"\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$
from 2.0 to 3.1.2, and before that it was split over two lines at least from 1.4 to 1.5
all of which include vulnerable patterns so I'll hazard that all versions are impacted.
edit: as noted schwag09 already note:
I agree, I don't really have a sense of what the 1st alternation is getting at in the expression. It feels kind of like a catchall. I took a look through the git history of the gauntlet and it seems the ReDoS has always been there (at least since 2012).
Note that, as seen in the demo payload, attributes={'a': ['style']} is required because bleach will not parse the style attribute by default. This is a mitigating factor that, as far as I can tell, prevents the functionality from being vulnerable by default. However, I likely did not exhaustively search all code paths that lead to BleachSanitizerFilter.sanitize_css. I would double check my work here to ensure there isn't another avenue which makes this exploitable by default.
I took a look and BleachSanitizerFilter.sanitize_stream -> .sanitize_token -> .allow_token -> .sanitize_css
looks like the only path to the regex.
I added failing unit test to the attached PR and confirmed that Matt's proposed fix does fix the backtracking issue:
gauntlet = re.compile(
r"""^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
flags=re.U
)
It does introduce regressions against:
FAILED tests/test_css.py::test_allowed_css[<p style="cursor: -moz-grab;">bar</p>-styles5-<p style="cursor: -moz-grab;">bar</p>]
FAILED tests/test_css.py::test_allowed_css[<p style="text-overflow: ',' ellipsis;">bar</p>-styles8-<p style="text-overflow: ',' ellipsis;">bar</p>]
FAILED tests/test_css.py::test_allowed_css[<p style='text-overflow: "," ellipsis;'>bar</p>-styles9-<p style='text-overflow: "," ellipsis;'>bar</p>]
re: replacing with tinycss / https://github.com/mozilla/bleach/issues/248
tinycss is archived https://github.com/Kozea/tinycss
tinycss2 is Python 3.5+ and probably needs a sponsor https://github.com/Kozea/tinycss2
so that's probably not worth pursuing at the moment.
Reporter | ||
Comment 10•4 years ago
|
||
Nice work!
I'm not sure what the best fix will be for the regressions. We may have to consider larger expression changes.
One other quick question: are you planning on requesting a CVE for this bug?
Assignee | ||
Comment 11•4 years ago
|
||
(In reply to schwag09 from comment #10)
I'm not sure what the best fix will be for the regressions. We may have to consider larger expression changes.
OK per discussion with :willkg and :peterbe yesterday, I updated the PR to accept the above regressions.
Running tests with tox:
- 3.8 errors with the known url changes that I need to address elsewhere
- 2.7, 3.6, docs, and vendorverify targets pass
One other quick question: are you planning on requesting a CVE for this bug?
That's not something I can do directly through Github, since Mozilla has not delegated CNA Scope to request CVEs for bleach to Github. However, for the last couple advisories a downstream packager has emailed security@mozilla.org to request a CVE once the advisory was public.
Reporter | ||
Comment 12•4 years ago
|
||
That's not something I can do directly through Github, since Mozilla has not delegated CNA Scope to request CVEs for bleach to Github. However, for the last couple advisories a downstream packager has emailed security@mozilla.org to request a CVE once the advisory was public.
No worries - I can always request a CVE from MITRE once the advisory is public, too, if that's easier.
Assignee | ||
Comment 13•4 years ago
|
||
Patched in 3.1.4: https://pypi.org/project/bleach/3.1.4/
Advisory is public on GitHub: https://github.com/mozilla/bleach/security/advisories/GHSA-vqhp-cxgc-6wmm
Updated https://github.com/mozilla/bleach/issues/248 with some of the options discussed here and out of bug.
I assigned this sec-moderate since it requires non-default args to clean
and is a DoS. Happy to reconsider if someone else on the security team brings something else up.
Thanks Matt! I appreciate the detailed report.
Reporter | ||
Comment 14•4 years ago
|
||
Sure thing, thanks for being so responsive and quick to fix!
Description
•