1623633 - (CVE-2020-6817) Regular expression denial-of-service in mozilla/bleach BleachSanitizerFilter.sanitize_css gauntlet regular expression

Reporter

Description

•

6 years ago

Hi there,

I believe I've found a DoS bug in the mozilla/bleach library. The bug itself is in the BleachSanitizerFilter.sanitize_css function. The vulnerability can be exploited with the following code:

In [1]: import bleach
In [2]: bleach.__version__
Out[2]: '3.1.3'
In [3]: payload = "'" + '"a"' * 64 + '^' + "'"
In [4]: bleach.clean('<a style=' + payload + '></a>', attributes={'a': ['style']})
...Spins...

The bug itself is a ReDoS bug. It occurs due to the gauntlet regular expression:

gauntlet = re.compile(
    r"""^([-/:,#%.'"\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
    flags=re.U
)

The issue is mutually inclusive alternation nested inside a quantifier. In other words, two alternation statements (either/or via |) that have full overlap inside a quantifier (e.g. + or *). The overlapping statements are "[\s\w]+" and [-/:,#%.'"\s!\w] - these overlap such that '"a"' * 64 in the payload causes catastrophic backtracking. We can also exploit this from \w-\w and [-/:,#%.'"\s!\w] with 'a-a' * 64. The ^ in the payload matches none of the alternation statements, so when $ is hit in the regular expression, it causes it to backtrack.

Note that, as seen in the demo payload, attributes={'a': ['style']} is required because bleach will not parse the style attribute by default. This is a mitigating factor that, as far as I can tell, prevents the functionality from being vulnerable by default. However, I likely did not exhaustively search all code paths that lead to BleachSanitizerFilter.sanitize_css. I would double check my work here to ensure there isn't another avenue which makes this exploitable by default.

Regarding a fix, my first choice/suggestion would be to not use regular expressions for sanitization. Not only can it lead to esoteric DoS bugs like this, but it's also difficult to ensure you're allowing the minimal set of text through that you would like. I realize it's a much larger effort to refactor this to not use regular expressions, so we can also patch up the expression itself. To do so we must ensure that none of the alternation statements can completely overlap with another. I believe the following expression should do the trick:

gauntlet = re.compile(
    r"""^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
    flags=re.U
)

Note that we've removed the -, ', and " from the initial alternation statement. This avoids complete overlap with the 2nd, 3rd, and 4th alternation statements. Since the 1st statement doesn't include a ( it cannot overlap with the 5th statement. If this expression breaks some of the logic necessary in the check we'll have to consider other options.

Hope this is helpful! Let me know if you have any questions and what the next steps are. One quick shoutout: I used Dlint + r2c to find this bug. Check them out if you're interested. If you're curious about ReDoS bugs, I'd recommend checking out this blog post: Finding Python ReDoS bugs at scale using Dlint and r2c.

Greg G

Assignee

Comment 2

•

6 years ago

•

Edited

Copying https://bugzilla.mozilla.org/show_bug.cgi?id=1623650#c1 here:

On my end I need to:

reproduce and confirm the issue (ideally for multiple versions but it should be anything with that regex in it)
make sure we recommend users don't pass unbounded strings
look into tweaking guantlet regex as Matt suggests

my first choice/suggestion would be to not use regular expressions for sanitization

Absolutely, we've had a number of bugs around sanitizing CSS and filtering styles, so ideally we could switch to a different or better CSS parser:

tinycss for a pure python implementation (with optional cython) https://github.com/mozilla/bleach/issues/248
something battle-tested from Firefox/servo w/ python bindings e.g. https://github.com/servo/rust-cssparser

Greg G

Assignee

Comment 3

•

6 years ago

CCing :willkg in case he has any additional context or opinions

Greg G

Assignee

Updated

•

6 years ago

URL: https://github.com/mozilla/bleach/sec...

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 4

•

6 years ago

Side note--world is crazy now and maybe I'm not thinking at 100%. My apologies for bad thinking.

sanitize_css is terrible and I continue to regret not putting in the effort to make the situation better. I agree the best plan is to drop that gauntlet regexp because it's terrible and either replace it with a parser or a library.

I think I understand what's going on to cause the DoS and the suggested fix. I don't know if I could figure out whether it affects real-world scenarios--what things aren't parseable now? Even so, I think the specter of possible unknown regressions is ok and we should fix this.

Relatedly, the email and url regexps are also overly complicated. It'd be interesting to know if they suffer similar problems.

https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L46

https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L75

schwag09

Reporter

Comment 5

•

6 years ago

I think I understand what's going on to cause the DoS and the suggested fix. I don't know if I could figure out whether it affects real-world scenarios--what things aren't parseable now? Even so, I think the specter of possible unknown regressions is ok and we should fix this.

I agree, I don't really have a sense of what the 1st alternation is getting at in the expression. It feels kind of like a catchall. I took a look through the git history of the gauntlet and it seems the ReDoS has always been there (at least since 2012). One interesting note is that the html5lib version of this function doesn't suffer from the same issue:

https://github.com/mozilla/bleach/blob/master/bleach/_vendor/html5lib/filters/sanitizer.py#L874

I see there's been some changes to the expression over time, which may have fixed minor bugs, but the fact that the html5lib version isn't totally different and doesn't suffer from ReDoS may be some consolation that there won't be big regressions.

Relatedly, the email and url regexps are also overly complicated. It'd be interesting to know if they suffer similar problems.

https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L46

https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L75

After a cursory look at these expressions, I don't think they're vulnerable to ReDoS.

Greg G

Assignee

Comment 6

•

6 years ago

•

Edited

reproduce and confirm the issue (ideally for multiple versions but it should be anything with that regex in it)

I can confirm the PoC on python 3.8.1 with bleach 3.1.3, 3.1.2, 3.1.1, 3.1.0, 3.0.0, 2.1.4, 2.1.3 and didn't check earlier versions.

git blame shows we landed the current regex with: https://github.com/mozilla/bleach/pull/434 14 months ago and I released it with 3.1.3 this week. I also see I included an incomplete changelog with the 3.1.3 release.

Looking at older versions we used the following for the gauntlet regex:

^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|$[\d,%\.\s]+$)*$ w/ re.U in 3.1.3
^([-/:,#%.'"\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|$[\d,%\.\s]+$)*$ from 2.0 to 3.1.2, and before that it was split over two lines at least from 1.4 to 1.5

all of which include vulnerable patterns so I'll hazard that all versions are impacted.

edit: as noted schwag09 already note:

I agree, I don't really have a sense of what the 1st alternation is getting at in the expression. It feels kind of like a catchall. I took a look through the git history of the gauntlet and it seems the ReDoS has always been there (at least since 2012).

Assignee: nobody → gguthe

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Keywords: wsec-deplib, wsec-dos

Greg G

Assignee

Comment 7

•

6 years ago

Note that, as seen in the demo payload, attributes={'a': ['style']} is required because bleach will not parse the style attribute by default. This is a mitigating factor that, as far as I can tell, prevents the functionality from being vulnerable by default. However, I likely did not exhaustively search all code paths that lead to BleachSanitizerFilter.sanitize_css. I would double check my work here to ensure there isn't another avenue which makes this exploitable by default.

I took a look and BleachSanitizerFilter.sanitize_stream -> .sanitize_token -> .allow_token -> .sanitize_css looks like the only path to the regex.

Greg G

Assignee

Comment 8

•

6 years ago

Attached file WIP GitHub Pull Request — Details

Greg G

Assignee

Comment 9

•

6 years ago

•

Edited

I added failing unit test to the attached PR and confirmed that Matt's proposed fix does fix the backtracking issue:

gauntlet = re.compile(
    r"""^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
    flags=re.U
)

It does introduce regressions against:

FAILED tests/test_css.py::test_allowed_css[<p style="cursor: -moz-grab;">bar</p>-styles5-<p style="cursor: -moz-grab;">bar</p>]
FAILED tests/test_css.py::test_allowed_css[<p style="text-overflow: ',' ellipsis;">bar</p>-styles8-<p style="text-overflow: ',' ellipsis;">bar</p>]
FAILED tests/test_css.py::test_allowed_css[<p style='text-overflow: "," ellipsis;'>bar</p>-styles9-<p style='text-overflow: "," ellipsis;'>bar</p>]

re: replacing with tinycss / https://github.com/mozilla/bleach/issues/248

tinycss is archived https://github.com/Kozea/tinycss
tinycss2 is Python 3.5+ and probably needs a sponsor https://github.com/Kozea/tinycss2

so that's probably not worth pursuing at the moment.

schwag09

Reporter

Comment 10

•

6 years ago

Nice work!

I'm not sure what the best fix will be for the regressions. We may have to consider larger expression changes.

One other quick question: are you planning on requesting a CVE for this bug?

Greg G

Assignee

Comment 11

•

6 years ago

(In reply to schwag09 from comment #10)

I'm not sure what the best fix will be for the regressions. We may have to consider larger expression changes.

OK per discussion with :willkg and :peterbe yesterday, I updated the PR to accept the above regressions.

Running tests with tox:

3.8 errors with the known url changes that I need to address elsewhere
2.7, 3.6, docs, and vendorverify targets pass

One other quick question: are you planning on requesting a CVE for this bug?

That's not something I can do directly through Github, since Mozilla has not delegated CNA Scope to request CVEs for bleach to Github. However, for the last couple advisories a downstream packager has emailed security@mozilla.org to request a CVE once the advisory was public.

schwag09

Reporter

Comment 12

•

6 years ago

That's not something I can do directly through Github, since Mozilla has not delegated CNA Scope to request CVEs for bleach to Github. However, for the last couple advisories a downstream packager has emailed security@mozilla.org to request a CVE once the advisory was public.

No worries - I can always request a CVE from MITRE once the advisory is public, too, if that's easier.

Greg G

Assignee

Comment 13

•

6 years ago

Patched in 3.1.4: https://pypi.org/project/bleach/3.1.4/

Advisory is public on GitHub: https://github.com/mozilla/bleach/security/advisories/GHSA-vqhp-cxgc-6wmm

Updated https://github.com/mozilla/bleach/issues/248 with some of the options discussed here and out of bug.

I assigned this sec-moderate since it requires non-default args to clean and is a DoS. Happy to reconsider if someone else on the security team brings something else up.

Thanks Matt! I appreciate the detailed report.

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Keywords: sec-moderate

Resolution: --- → FIXED

Summary: Regular expression denial-of-service in mozilla/bleach → Regular expression denial-of-service in mozilla/bleach BleachSanitizerFilter.sanitize_css gauntlet regular expression

schwag09

Reporter

Comment 14

•

6 years ago

Sure thing, thanks for being so responsive and quick to fix!

Tom Ritter [:tjr]

Comment 15

•

6 years ago

Please use CVE-2020-6817

Alias: CVE-2020-6817

Greg G

Assignee

Updated

•

6 years ago

Group: webtools-security

Bugzilla

Regular expression denial-of-service in mozilla/bleach BleachSanitizerFilter.sanitize_css gauntlet regular expression

Categories

(Webtools :: Bleach-security, defect)

Tracking

(Not tracked)

People

(Reporter: schwag09, Assigned: u581815)

References

(
URL
)

Details

(Keywords: sec-moderate, wsec-deplib, wsec-dos)

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Updated

Attachment

General

Description

File Name

Content Type