Closed
Bug 840973
Opened 12 years ago
Closed 12 years ago
update bleach to v1.2.1
Categories
(support.mozilla.org :: General, defect, P3)
support.mozilla.org
General
Tracking
(Not tracked)
RESOLVED
FIXED
2013Q1
People
(Reporter: willkg, Assigned: rrosario)
References
Details
(Whiteboard: u=dev c=infrastructure p=2 s=2013.8)
We're currently using bleach 1.1.5 and should upgrade to 1.2.
Reporter | ||
Updated•12 years ago
|
Summary: update bleach → update bleach to v1.2
Reporter | ||
Comment 1•12 years ago
|
||
Putting this in the 2013.4 sprint, but it's possible that the changes between 1.1.5 and 1.2 aren't so compelling that we couldn't push it off a sprint if need be.
Whiteboard: u=dev c=infrastructure p= s=2013.4
Updated•12 years ago
|
Priority: -- → P4
Target Milestone: --- → 2013Q1
Reporter | ||
Comment 2•12 years ago
|
||
I think this is probably between a 1 pointer and a 2 pointer. It's a small version jump, but I vaguely remember there being some API changes involved.
James: Do you know offhand if this is a trivial project or not?
Reporter | ||
Comment 3•12 years ago
|
||
I missed the CHANGES file:
Version 1.2
-----------
- Bleach will no longer consider unacceptable protocols when linkifying.
- linkify() now takes a tokenizer argument that allows it to skip
sanitization.
- delinkify() is gone.
- Removed exception handling from _render. clean() and linkify() may now
throw.
- linkify() correctly ignores case for protocols and domain names.
- linkify() correctly handles markup within an <a> tag.
We don't use delinkify, so I think it shouldn't be a hard upgrade. Estimating at 1 point.
Whiteboard: u=dev c=infrastructure p= s=2013.4 → u=dev c=infrastructure p=1 s=2013.4
Comment 4•12 years ago
|
||
Assigning to backlog for an appropriate sprint.
Whiteboard: u=dev c=infrastructure p=1 s=2013.4 → u=dev c=infrastructure p=1 s=2013.backlog
Reporter | ||
Comment 5•12 years ago
|
||
Looked into it briefly. Updating bleach requires us to fix py-wikimarkup because it's passing a nofollow argument into linkify which bleach 1.2 no longer has.
py-wikimarkup needs to be fixed. Something like this:
diff --git a/setup.py b/setup.py
index c312255..dd0cb66 100644
--- a/setup.py
+++ b/setup.py
@@ -13,6 +13,6 @@ setup(
url='http://www.github.com/dcramer/py-wikimarkup/',
zip_safe=False,
include_package_data=True,
- install_requires=['bleach>=1.1.4'],
+ install_requires=['bleach>=1.2'],
package_data = { '': ['README.rst'] },
)
diff --git a/wikimarkup/parser.py b/wikimarkup/parser.py
index 65d1dd0..a42e7e8 100644
--- a/wikimarkup/parser.py
+++ b/wikimarkup/parser.py
@@ -1713,7 +1713,11 @@ class Parser(BaseParser):
if utf8:
text.encode("utf-8")
# Pass output through bleach and linkify
- text = bleach.linkify(text, nofollow=nofollow, tokenizer=HTMLTokenizer)
+ if nofollow:
+ callbacks = [bleach.linkify_callbacks.nofollow]
+ else:
+ callbacks = []
+ text = bleach.linkify(text, callbacks=callbacks, tokenizer=HTMLTokenizer)
return bleach.clean(text, tags=self.tags, attributes=attributes,
styles=styles, strip_comments=False)
After that, all the tests in kitsune pass except one:
FAIL: kitsune.apps.sumo.tests.test_parser:TestWikiImageTags.test_page_link_caption
vim +432 apps/sumo/tests/test_parser.py # test_page_link_caption
eq_('my caption', caption)
vim +31 vendor/packages/nose/nose/tools.py # eq_
assert a == b, msg or "%r != %r" % (a, b)
AssertionError: 'my caption' != '<img class=" "\n src="/media/uploads/gallery/images/2013-02-19-11-53-00-098f6b.jpg"\n alt="my caption"\n /> my caption'
The html that gets returned is pretty crazy. Seems like something is horribly wrong there.
Reporter | ||
Comment 6•12 years ago
|
||
Docs for linkify: http://bleach.readthedocs.org/en/latest/linkify.html#bleach-linkify
Assignee | ||
Comment 7•12 years ago
|
||
Would love to do this in next sprint.
Whiteboard: u=dev c=infrastructure p=1 s=2013.backlog → u=dev c=infrastructure p=1 s=2013.6
Reporter | ||
Comment 8•12 years ago
|
||
This requires patching py-wikimarkup. I'm pretty sure James said we're the only users of that, so we can probably just patch it without really thinking about anyone else.
I did some work on this already, but it probably needs to be a 2 pointer.
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] from comment #8)
> I did some work on this already, but it probably needs to be a 2 pointer.
Changing to 2pts. Moving back to the backlog for now.
Whiteboard: u=dev c=infrastructure p=1 s=2013.6 → u=dev c=infrastructure p=2 s=2013.backlog
Comment 10•12 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] from comment #8)
> This requires patching py-wikimarkup. I'm pretty sure James said we're the
> only users of that, so we can probably just patch it without really thinking
> about anyone else.
AFAIK we're still using Paul C's fork, and we're the only users of that. And I have commit. So if there are patches just @jsocol ping me on GitHub.
Updated•12 years ago
|
Assignee | ||
Comment 12•12 years ago
|
||
Putting into current sprint and upping the priority.
Priority: P4 → P3
Whiteboard: u=dev c=infrastructure p=2 s=2013.backlog → u=dev c=infrastructure p=2 s=2013.8
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → rrosario
Assignee | ||
Comment 13•12 years ago
|
||
v1.2.1 is the latest latest
Summary: update bleach to v1.2 → update bleach to v1.2.1
Assignee | ||
Comment 14•12 years ago
|
||
S(In reply to Will Kahn-Greene [:willkg] from comment #5)
> FAIL:
> kitsune.apps.sumo.tests.test_parser:TestWikiImageTags.test_page_link_caption
> vim +432 apps/sumo/tests/test_parser.py # test_page_link_caption
> eq_('my caption', caption)
> vim +31 vendor/packages/nose/nose/tools.py # eq_
> assert a == b, msg or "%r != %r" % (a, b)
> AssertionError: 'my caption' != '<img class=" "\n
> src="/media/uploads/gallery/images/2013-02-19-11-53-00-098f6b.jpg"\n
> alt="my caption"\n /> my caption'
I've been able to reduce this down to the following difference between old and new bleach.
Bleach 1.1.x:
In [5]: bleach.linkify('<a href="/"><noscript><img src="img.jpg" alt=""/></noscript></a>', tokenizer=HTMLTokenizer)
Out[5]: u'<a href="/" rel="nofollow"><noscript><img src="img.jpg" alt=""/></noscript></a>'
Bleach 1.2.1:
In [37]: bleach.linkify('<a href="/"><noscript><img src="img.jpg" alt=""/></noscript></a>', tokenizer=HTMLTokenizer)
Out[37]: u'<a href="/" rel="nofollow"><noscript><img src="img.jpg" alt=""/></noscript></a>'
For some reason, the inner <img/> is getting escaped. If I don't include the outter <a ...>...</a>, everything is fine.
BTW `<a href="/"><noscript><img src="img.jpg" alt=""/></noscript></a>` is valid HTML according to the W3C validator.
Assignee | ||
Comment 15•12 years ago
|
||
So, I took a look at bleach and html5lib.. From what I can tell, it is an issue with the parser treating the contents of <noscript/> as raw text instead of HTML. Then when it goes to render it back, it escapes everything screwing up the HTML inside.
We don't have this problem in older bleach, because bleach used to just write back the text inside an <a/> without parsing:
https://github.com/jsocol/bleach/commit/fb404b2cc0bc22daa3642f214cdda852855d2553
Comment 16•12 years ago
|
||
OK, a quick reading of the spec... http://www.w3.org/TR/html5/scripting-1.html#the-noscript-element confuses the hell out of me. I think it's to do with html5lib. I *guess* that html5lib is parsing it as if scripting was enabled? But that's more of a thing to test than anything else, and I don't know how to test it.
Assignee | ||
Comment 17•12 years ago
|
||
(In reply to James Socol [:jsocol, :james] from comment #16)
> OK, a quick reading of the spec...
> http://www.w3.org/TR/html5/scripting-1.html#the-noscript-element confuses
> the hell out of me. I think it's to do with html5lib. I *guess* that
> html5lib is parsing it as if scripting was enabled? But that's more of a
> thing to test than anything else, and I don't know how to test it.
Yep, I ended up on that page last night too and was scratching my head. I think you are right, it is parsing as if scripting was enabled. In that case, it is following the spec.
Assignee | ||
Comment 18•12 years ago
|
||
This is all related to how we render out images in the wiki so that they are lazy loaded and still work without js. The markup has something like:
```
<a href=...>
<img class="lazy-img" data-src=... />
<noscript>
<img src=... />
</noscript>
</a>
```
we could just change that to:
```
<a href=...>
<img class="lazy-img" data-src=... />
</a>
<noscript>
<a href=...>
<img src=... />
</a>
</noscript>
```
And we'd probably be fine. If we don't care about bleach/html5lib being perfect for this super edge case, I can just make this change instead.
Assignee | ||
Comment 19•12 years ago
|
||
In a pull request:
https://github.com/mozilla/kitsune/pull/1305
I went with the workaround from Comment 18
Assignee | ||
Comment 20•12 years ago
|
||
Landed on master:
https://github.com/mozilla/kitsune/compare/099230e0c1e5...91f3d8c130a6
Assignee | ||
Comment 21•12 years ago
|
||
Rehan deployed this to prod this morning.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•