Closed Bug 1623633 (CVE-2020-6817) Opened 4 years ago Closed 4 years ago

Regular expression denial-of-service in mozilla/bleach BleachSanitizerFilter.sanitize_css gauntlet regular expression

Categories

(Webtools :: Bleach-security, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: schwag09, Assigned: u581815)

References

()

Details

(Keywords: sec-moderate, wsec-deplib, wsec-dos)

Attachments

(1 file)

60 bytes, text/x-github-pull-request
Details | Review

Hi there,

I believe I've found a DoS bug in the mozilla/bleach library. The bug itself is in the BleachSanitizerFilter.sanitize_css function. The vulnerability can be exploited with the following code:

In [1]: import bleach
In [2]: bleach.__version__
Out[2]: '3.1.3'
In [3]: payload = "'" + '"a"' * 64 + '^' + "'"
In [4]: bleach.clean('<a style=' + payload + '></a>', attributes={'a': ['style']})
...Spins...

The bug itself is a ReDoS bug. It occurs due to the gauntlet regular expression:

gauntlet = re.compile(
    r"""^([-/:,#%.'"\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
    flags=re.U
)

The issue is mutually inclusive alternation nested inside a quantifier. In other words, two alternation statements (either/or via |) that have full overlap inside a quantifier (e.g. + or *). The overlapping statements are "[\s\w]+" and [-/:,#%.'"\s!\w] - these overlap such that '"a"' * 64 in the payload causes catastrophic backtracking. We can also exploit this from \w-\w and [-/:,#%.'"\s!\w] with 'a-a' * 64. The ^ in the payload matches none of the alternation statements, so when $ is hit in the regular expression, it causes it to backtrack.

Note that, as seen in the demo payload, attributes={'a': ['style']} is required because bleach will not parse the style attribute by default. This is a mitigating factor that, as far as I can tell, prevents the functionality from being vulnerable by default. However, I likely did not exhaustively search all code paths that lead to BleachSanitizerFilter.sanitize_css. I would double check my work here to ensure there isn't another avenue which makes this exploitable by default.

Regarding a fix, my first choice/suggestion would be to not use regular expressions for sanitization. Not only can it lead to esoteric DoS bugs like this, but it's also difficult to ensure you're allowing the minimal set of text through that you would like. I realize it's a much larger effort to refactor this to not use regular expressions, so we can also patch up the expression itself. To do so we must ensure that none of the alternation statements can completely overlap with another. I believe the following expression should do the trick:

gauntlet = re.compile(
    r"""^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
    flags=re.U
)

Note that we've removed the -, ', and " from the initial alternation statement. This avoids complete overlap with the 2nd, 3rd, and 4th alternation statements. Since the 1st statement doesn't include a ( it cannot overlap with the 5th statement. If this expression breaks some of the logic necessary in the check we'll have to consider other options.

Hope this is helpful! Let me know if you have any questions and what the next steps are. One quick shoutout: I used Dlint + r2c to find this bug. Check them out if you're interested. If you're curious about ReDoS bugs, I'd recommend checking out this blog post: Finding Python ReDoS bugs at scale using Dlint and r2c.

Copying https://bugzilla.mozilla.org/show_bug.cgi?id=1623650#c1 here:

On my end I need to:

  • reproduce and confirm the issue (ideally for multiple versions but it should be anything with that regex in it)
  • make sure we recommend users don't pass unbounded strings
  • look into tweaking guantlet regex as Matt suggests

my first choice/suggestion would be to not use regular expressions for sanitization

Absolutely, we've had a number of bugs around sanitizing CSS and filtering styles, so ideally we could switch to a different or better CSS parser:

CCing :willkg in case he has any additional context or opinions

Side note--world is crazy now and maybe I'm not thinking at 100%. My apologies for bad thinking.

sanitize_css is terrible and I continue to regret not putting in the effort to make the situation better. I agree the best plan is to drop that gauntlet regexp because it's terrible and either replace it with a parser or a library.

I think I understand what's going on to cause the DoS and the suggested fix. I don't know if I could figure out whether it affects real-world scenarios--what things aren't parseable now? Even so, I think the specter of possible unknown regressions is ok and we should fix this.

Relatedly, the email and url regexps are also overly complicated. It'd be interesting to know if they suffer similar problems.

https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L46

https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L75

I think I understand what's going on to cause the DoS and the suggested fix. I don't know if I could figure out whether it affects real-world scenarios--what things aren't parseable now? Even so, I think the specter of possible unknown regressions is ok and we should fix this.

I agree, I don't really have a sense of what the 1st alternation is getting at in the expression. It feels kind of like a catchall. I took a look through the git history of the gauntlet and it seems the ReDoS has always been there (at least since 2012). One interesting note is that the html5lib version of this function doesn't suffer from the same issue:

https://github.com/mozilla/bleach/blob/master/bleach/_vendor/html5lib/filters/sanitizer.py#L874

I see there's been some changes to the expression over time, which may have fixed minor bugs, but the fact that the html5lib version isn't totally different and doesn't suffer from ReDoS may be some consolation that there won't be big regressions.

Relatedly, the email and url regexps are also overly complicated. It'd be interesting to know if they suffer similar problems.

https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L46

https://github.com/mozilla/bleach/blob/master/bleach/linkifier.py#L75

After a cursory look at these expressions, I don't think they're vulnerable to ReDoS.

reproduce and confirm the issue (ideally for multiple versions but it should be anything with that regex in it)

I can confirm the PoC on python 3.8.1 with bleach 3.1.3, 3.1.2, 3.1.1, 3.1.0, 3.0.0, 2.1.4, 2.1.3 and didn't check earlier versions.

git blame shows we landed the current regex with: https://github.com/mozilla/bleach/pull/434 14 months ago and I released it with 3.1.3 this week. I also see I included an incomplete changelog with the 3.1.3 release.

Looking at older versions we used the following for the gauntlet regex:

  • ^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$ w/ re.U in 3.1.3

  • ^([-/:,#%.'"\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$ from 2.0 to 3.1.2, and before that it was split over two lines at least from 1.4 to 1.5

all of which include vulnerable patterns so I'll hazard that all versions are impacted.

edit: as noted schwag09 already note:

I agree, I don't really have a sense of what the 1st alternation is getting at in the expression. It feels kind of like a catchall. I took a look through the git history of the gauntlet and it seems the ReDoS has always been there (at least since 2012).

Assignee: nobody → gguthe
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Note that, as seen in the demo payload, attributes={'a': ['style']} is required because bleach will not parse the style attribute by default. This is a mitigating factor that, as far as I can tell, prevents the functionality from being vulnerable by default. However, I likely did not exhaustively search all code paths that lead to BleachSanitizerFilter.sanitize_css. I would double check my work here to ensure there isn't another avenue which makes this exploitable by default.

I took a look and BleachSanitizerFilter.sanitize_stream -> .sanitize_token -> .allow_token -> .sanitize_css looks like the only path to the regex.

I added failing unit test to the attached PR and confirmed that Matt's proposed fix does fix the backtracking issue:

gauntlet = re.compile(
    r"""^([/:,#%.\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
    flags=re.U
)

It does introduce regressions against:

FAILED tests/test_css.py::test_allowed_css[<p style="cursor: -moz-grab;">bar</p>-styles5-<p style="cursor: -moz-grab;">bar</p>]
FAILED tests/test_css.py::test_allowed_css[<p style="text-overflow: ',' ellipsis;">bar</p>-styles8-<p style="text-overflow: ',' ellipsis;">bar</p>]
FAILED tests/test_css.py::test_allowed_css[<p style='text-overflow: "," ellipsis;'>bar</p>-styles9-<p style='text-overflow: "," ellipsis;'>bar</p>]

re: replacing with tinycss / https://github.com/mozilla/bleach/issues/248

tinycss is archived https://github.com/Kozea/tinycss
tinycss2 is Python 3.5+ and probably needs a sponsor https://github.com/Kozea/tinycss2

so that's probably not worth pursuing at the moment.

Nice work!

I'm not sure what the best fix will be for the regressions. We may have to consider larger expression changes.

One other quick question: are you planning on requesting a CVE for this bug?

(In reply to schwag09 from comment #10)

I'm not sure what the best fix will be for the regressions. We may have to consider larger expression changes.

OK per discussion with :willkg and :peterbe yesterday, I updated the PR to accept the above regressions.

Running tests with tox:

  • 3.8 errors with the known url changes that I need to address elsewhere
  • 2.7, 3.6, docs, and vendorverify targets pass

One other quick question: are you planning on requesting a CVE for this bug?

That's not something I can do directly through Github, since Mozilla has not delegated CNA Scope to request CVEs for bleach to Github. However, for the last couple advisories a downstream packager has emailed security@mozilla.org to request a CVE once the advisory was public.

That's not something I can do directly through Github, since Mozilla has not delegated CNA Scope to request CVEs for bleach to Github. However, for the last couple advisories a downstream packager has emailed security@mozilla.org to request a CVE once the advisory was public.

No worries - I can always request a CVE from MITRE once the advisory is public, too, if that's easier.

Patched in 3.1.4: https://pypi.org/project/bleach/3.1.4/

Advisory is public on GitHub: https://github.com/mozilla/bleach/security/advisories/GHSA-vqhp-cxgc-6wmm

Updated https://github.com/mozilla/bleach/issues/248 with some of the options discussed here and out of bug.

I assigned this sec-moderate since it requires non-default args to clean and is a DoS. Happy to reconsider if someone else on the security team brings something else up.

Thanks Matt! I appreciate the detailed report.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Keywords: sec-moderate
Resolution: --- → FIXED
Summary: Regular expression denial-of-service in mozilla/bleach → Regular expression denial-of-service in mozilla/bleach BleachSanitizerFilter.sanitize_css gauntlet regular expression

Sure thing, thanks for being so responsive and quick to fix!

Please use CVE-2020-6817

Alias: CVE-2020-6817
Group: webtools-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: