Closed Bug 818966 Opened 12 years ago Closed 1 year ago

Remove all/most of the bleach whitelist

Categories

(developer.mozilla.org :: Security, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: groovecoder, Unassigned)

"To be safer, we should strip out all the Bleach whitelist exceptions we've been accumulating. This is a fairly large task, since it requires converting examples that depend on Bleach exceptions into live samples."
Just to be clear, when this bug is finished, the *only* exceptions we should retain for Bleach filtering are those that are absolutely necessary for formatting document content. Since live samples allow markup without restriction, all samples / examples / demos should be revised to use that feature.

Ideally, this should result in an *empty* whitelist because Bleach should allow basic document formatting out of the box. But, who knows, we might still have some special cases. 

And, even in those cases, the better solution might be to implement per-page Bleach whitelists. (Thought we had a bug for that already, but can't seem to find it.)
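As a rough illustration of the per-page idea, the effective whitelist for a page could be resolved as a small base set plus whatever exceptions that one page has been granted. This is only a sketch under stated assumptions: `BASE_TAGS`, `PAGE_EXCEPTIONS`, and `allowed_tags_for` are hypothetical names that do not come from Bleach or the MDN codebase, and the tag sets and the page slug are made up.

```python
# Hypothetical sketch of per-page whitelist resolution. None of these names
# come from Bleach or the MDN codebase; the tag sets are illustrative only.

# Tags assumed to be enough for basic document formatting.
BASE_TAGS = frozenset({"p", "a", "em", "strong", "ul", "ol", "li", "pre", "code"})

# Extra tags granted to individual pages, keyed by page slug (made-up example).
PAGE_EXCEPTIONS = {
    "Web/HTML/Element/video": frozenset({"video", "source"}),
}


def allowed_tags_for(slug: str) -> frozenset:
    """Return the base tags plus any exceptions registered for this page."""
    return BASE_TAGS | PAGE_EXCEPTIONS.get(slug, frozenset())
```

The resolved set could then be handed to the sanitizer for that page only (for example through the `tags` argument of `bleach.clean`), so one page's exception would not widen the filter for the rest of the site.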

As a bonus, this bug should allow us to return to normal Bleach releases, since my fork exists mainly to support an edge case produced by certain bits of inline CSS on MDN.

Having stated the goal... yeah, this is a big effort that implies edits to many wiki documents.
I think we should divide this into smaller tasks and make them block this bug. For example, "Convert all existing CSS examples to Live Samples", and so on.

That way we can track what has been fixed, and also list new tasks as we discover them.

Any opposition?
Sure, just file blockers to this one and treat this as the overall tracking bug.
Depends on: 819654
Depends on: 819656
Depends on: 819657
Depends on: 819658
Depends on: 819659
Depends on: 819661
Depends on: 819662
Depends on: 819663
No longer depends on: 819662
Depends on: 819665
I created the main dependent bugs. Note that this is not an exhaustive list of the work to be done, but it covers a fair amount of the work for most elements of the exception lists. Once these are done, we should be able to find the few remaining uses with a find-and-replace tool without being swamped by hundreds of values.
Depends on: 833914
Jean-Yves: Assuming you started today, how long do you think it would take to close all of the documentation-related dependencies of this bug? Trying to get a sense of how far away we are from removing the bleach whitelist.
Flags: needinfo?(jypenator)
Given that this can't be a full-time activity, and that we need to double-check that we have not forgotten anything, we are still months away from being able to remove the bleach whitelist.
Flags: needinfo?(jypenator)
I would like to see this become a priority. This appears to have a negative impact on security, and we were seeing lots of people attempt to exploit this site even before we added it to the bug bounty program, so we should take this very seriously. Can I get an update on what we can do to push up the priority for this, and a timeline for when it can be complete?
The most difficult and time-consuming part of this is the documentation work, which is described by the dependencies of this bug.

Sheppy is probably the best person to speak to that.
Flags: needinfo?(eshepherd)
And we need a way to know which items on the list are in use and where. Some may be used in very few places, and we could remove them quickly.

But we have no way to run such a search right now, so even partial removal can't be done.
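The kind of inventory described here could be sketched as follows, assuming the page sources can be dumped into a dict of slug to HTML (that input format, and the names `TagCollector` and `find_whitelist_usage`, are assumptions for illustration, not existing MDN tooling). It only looks at element names; whitelisted attributes and inline CSS properties would need similar handling.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the names of all elements that appear in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        self.tags.add(tag)


def find_whitelist_usage(pages, whitelist):
    """Map each whitelisted tag to the sorted list of page slugs that use it.

    `pages` is a dict of slug -> HTML source (an assumed input format).
    Tags that are never used do not appear in the result.
    """
    wanted = set(whitelist)
    usage = defaultdict(list)
    for slug in sorted(pages):
        collector = TagCollector()
        collector.feed(pages[slug])
        collector.close()
        for tag in sorted(collector.tags & wanted):
            usage[tag].append(slug)
    return dict(usage)
```

Whitelist entries missing from the result are unused and could be dropped right away; entries with only a handful of slugs are the ones that could be cleaned up quickly by hand.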

Also, management has clearly prioritized new documentation for Fx OS and Apps over updating existing documentation. With the limited resources we have, there is no way we will see this anywhere near finished in the next 6 months.
Like Jean-Yves said, there's no chance of this happening anytime soon given priorities and available manpower.

Plus we need functionality we don't currently have in the system; namely, a global search that would let us specifically find every use of specific keywords so we can track down all uses of HTML elements in content that we need to revise.

I will try to fix some of these things as I work on content reorganization in the coming months, but realistically, I will only get a tiny percentage of the work done.
Flags: needinfo?(eshepherd)
What about a doc sprint specifically aimed at converting samples to use the new live sample system? Aren't we getting a fresh new influx of contributors with the 15-year anniversary thing?
We might also want to consider implementing some systems to track conversion progress, maybe driven by a per-page flag to use a new restricted Bleach whitelist. 

This is very much a manual process, and we have over 50000 documents at last count. Not all of them will need conversion or even review, but there's still a huge corpus to consider. Not even sure how we'd estimate the time to complete this.
Yeah, the amount of work that has to be done is enormous, and we don't even know which pages need the work done. And it's not just about samples. It's also about content that embeds markup directly because it was never set out as a sample, per se. Or even just content that uses embedded styles instead of site CSS.

Basically, we need to find the pages that are even using stuff that should be blacklisted, then we need to go through them by hand and fix them. It's going to take time. Lots of time.
(In reply to Luke Crouch [:groovecoder] from comment #11)
> What about a doc sprint specifically aimed at converting samples to use the
> new live sample system? Aren't we getting a fresh new influx of contributors
> with the 15-year anniversary thing?

We can certainly suggest this as a task for doc sprint participants. It's hard to tell if we are getting new contributors on MDN because I don't have Bug 809991.
I'm going to prioritize this as part of the various sprints we are doing. Let's make this a Q3 goal and see if we can be ready at that point.
Things we need in order to get lots of community involvement on this:

1. A way to find pages that need to be converted, so contributors can easily find and pick pages to do.

2. Clear instructions on how to convert a page to live samples. We have https://developer.mozilla.org/en-US/docs/Project:MDN/Contributing/Editor_guide#Turning_snippets_into_live_samples but it would be nice to separate it out, so it's not in the middle of another big long page of editor instructions.


Nice to have:

* A screencast showing an example of converting a page would also be helpful.

* Visualization of progress on converting samples. Yes, you could just keep going until #1 returns no more pages. But a visualization of how much has been done, as well as how much is left to do, is more motivating (think, giant fundraising thermometer).
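A progress indicator of this kind needs very little once the converted and total page counts are known. Below is a minimal text-only sketch; `progress_bar` is a hypothetical helper, and where the two counts come from is left open.

```python
def progress_bar(converted: int, total: int, width: int = 20) -> str:
    """Render conversion progress as a fixed-width text bar."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Clamp so that bad input cannot overflow or underflow the bar.
    converted = max(0, min(converted, total))
    filled = converted * width // total
    percent = converted * 100 // total
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {percent}% ({converted}/{total})"
```

For example, `progress_bar(5, 20)` returns `[#####---------------] 25% (5/20)`.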
1. Finding pages that need converting is tricky. You can't even use "has <pre> blocks but no live sample embeds" because we use <pre> blocks for syntax descriptions and output from terminal commands, for example.

If we can get a list of pages that have <pre> blocks and no live sample embeds, though, we could whittle it down by hand from there. It'd be a start, anyway.
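A first pass at that list could look like the sketch below. It assumes page sources are available as a dict of slug to source text, and that live samples are embedded with an `EmbedLiveSample` macro call in the source; both are assumptions, and `conversion_candidates` is a hypothetical name. As noted above, the result still contains false positives such as syntax descriptions and terminal output, so it is only a starting point for review by hand.

```python
import re

# Assumption: live samples are embedded with an EmbedLiveSample macro call
# in the page source; adjust the pattern if the wiki uses another marker.
LIVE_SAMPLE_RE = re.compile(r"\{\{\s*EmbedLiveSample\b", re.IGNORECASE)
PRE_BLOCK_RE = re.compile(r"<pre\b", re.IGNORECASE)


def conversion_candidates(pages):
    """Return slugs of pages that contain <pre> but no live sample embed.

    `pages` is a dict of slug -> page source (an assumed input format).
    """
    return sorted(
        slug
        for slug, source in pages.items()
        if PRE_BLOCK_RE.search(source) and not LIVE_SAMPLE_RE.search(source)
    )
```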

2. I've now created this page: https://developer.mozilla.org/en-US/docs/Project:MDN/Contributing/How_to_help/Code_samples

It's simply an introduction and then transcludes the content from the editor guide. Prevents duplication of effort but gives us a page specifically for the purpose as well.
Sheppy, can you please open a bug for building that list you describe in point 1?
Depends on: 1328439

The MDN content is now in Markdown, and the live samples are generated from code fences inside that Markdown, but served separately from the content. This means we no longer use Bleach, so I will go ahead and mark this bug as wontfix.
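To illustrate the general idea of pulling sample code out of Markdown code fences, here is a simplified sketch. It is not the actual MDN build code: `extract_fenced_blocks` is a hypothetical helper, it only handles backtick fences that start at the beginning of a line, and treating the first word of the info string as the language is an assumption.

```python
FENCE = "`" * 3  # the three-backtick fence marker


def extract_fenced_blocks(markdown: str):
    """Return (language, code) pairs for each backtick-fenced block.

    Simplified: ignores indented fences, tilde fences, and nesting.
    """
    blocks = []
    language = ""
    lines = []
    inside = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            if inside:
                # Closing fence: store the block collected so far.
                blocks.append((language, "\n".join(lines)))
                inside = False
            else:
                # Opening fence: the first word of the info string is
                # taken as the language (an assumption of this sketch).
                info = line[len(FENCE):].strip().split()
                language = info[0] if info else ""
                lines = []
                inside = True
        elif inside:
            lines.append(line)
    return blocks
```

Because the extracted blocks are plain source text that is assembled and served separately, the document markup itself no longer has to carry arbitrary HTML, which is what the whitelist exceptions existed to allow.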

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX