Closed Bug 343573 Opened 13 years ago Closed 10 years ago

Allow add-on developers limited use of HTML

Categories

(addons.mozilla.org Graveyard :: Developer Pages, defect, P1, major)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: cameron, Assigned: clouserw)

References

()

Details

Attachments

(5 files, 1 obsolete file)

Add-on developers should be able to use some HTML. I'd suggest: 
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> and <strong>

Would need some sanity checks to make sure they keep it valid and keep it clean (eg. don't put the whole thing in bold, have tags correctly nested, etc.)

Additionally, it would be great if we could automagically link stuff if someone just types http://mozilla.com

We're not trying to allow people to format it to death, but the complete lack of formatting makes shaver cry all the time -> MAJOR
*quiet cough*
Severity: normal → major
I wrote a function for core/inc_global.php that will handle the autolinking. I figured it would be best to do the autolink as a display thing rather than actually inserting the html into the database when they submit.

This method will require passing the description, developer's comments, version notes, and anything else we want autolinked through this function on every page we want it on.

The pages I can think of without looking too hard: moreinfo.php, search.php, developers/main.php, developers/approval.php, developers/itemoverview.php

//Parse for URLs to hyperlink
function autolink($text) {
   return  preg_replace("/\w+:\/\/\S+/", "<a href=\"\\0\">\\0</a>", $text);
}


I'm also working on the limited HTML... allowing the specific tags was the easy part, still working on automatically closing them.
Question: If we do autolinking, do we need to allow the <a> HTML tag? It would probably be safer if people saw the whole URL they were clicking.

OK, I admit I'm asking because I would have to rewrite my function to make sure it's not already in a hyperlink :)
Attached patch autolink patch (obsolete) — Splinter Review
This patch adds the autolink() function and implements it on the Description, Developer's Comments, and Version Notes on the following pages: extensions/showlist.php, extensions/moreinfo.php, developers/main.php, developers/itemoverview.php, developers/approval.php

I can do another patch for autolinking in public comments if requested.
Also, I could not find search.php in the -rMOZILLA_UPDATE_1_0_BRANCH, but it should probably be updated to autolink as well.

Notes about the autolink() function: Right now it will convert anything with :// in it to be a URL, and the end of the URL is considered the first space. As mentioned in a comment above, this will not be compatable with <a> tags if we allow them in HTML.

I also have a patch for allowing limited HTML based on a 30% rule but I am still tweaking it a bit and awaiting response on whether we need to allow <a> tags.

Thanks, Justin
Attachment #232549 - Flags: first-review?(morgamic)
I think developers should be able to use <a> and people can just look in their status bar for the link location. People click on links all the time, it's the nature of the internet. 

Also I've had a request from Mossop for <ul> and <li> - might as well have <ol> while we're at it.

The code for the public part of AMO is in /webtools/addons/ - http://lxr.mozilla.org/mozilla/source/webtools/addons/
Comment on attachment 232549 [details] [diff] [review]
autolink patch

Alrighty, I'll work on that.
Attachment #232549 - Attachment is obsolete: true
Attachment #232549 - Flags: first-review?(morgamic)
Assignee: nobody → fligtar
Status: NEW → ASSIGNED
Depends on: 347759
Depends on: remora-dev
With the recent XSS awareness and other delays, I don't think I'm going to finish this in time for Remora launch.

TM: Future
Assignee: fligtar → nobody
Status: ASSIGNED → NEW
Target Milestone: --- → Future
http://rediscoverer.net/archives/19 has a cakephp component for http://htmlpurifier.org/ which produces validating HTML sans XSS woes, and which would also let us convert URLs such that they bounced off a redirect vhost and didn't get the whitelist application for outbound links.
Target Milestone: Future → 3.2
Target Milestone: 3.2 → 3.3
Retargeting appropriately.
Target Milestone: 3.3 → 3.x (triaged)
This bug scares me, it is extremely difficult to write regexes that only allow 'safe' html through. There are dozens of ways to get evil html through a basic stripping function/regex. (UTF-8/url/hex/html character encoding, protocol filtering, tag and bracket balancing, style tags + javascript inside whitelisted tags)

Using http://htmlpurifier.org seems like a good idea.
How are we doing this for the homepage url?  Could we use that filter?
Duplicate of this bug: 507320
I am not a big fan of the idea of allowing HTML, but if you allow HTML, please make sure that “&lt;” is interpreted correctly as the character “<”.  To me, ability to write “<” in a long description in an add-on is way more important than ability to write an <acronym> tag (currently neither is possible).
Regarding allowing HTML,
I think the OP just want AMO to support basic formatting in reviews, comments etc. Personally I think support of basic formatting is good. It hasn't to be HTML. We can create special tags (like how BBCode does) which can tell the system to bold it, italicize it, or underline it.

Regarding XSS exploits,
I'm interested to know how XSS exploits if you are really strict at what HTML codes can be parsed. I don't see how we are vulnerable to XSS threats if we only parse <b> <i> <u> <strike> <a href> for example. 

Regarding what basic formatting to support,
Please add <u> too if you are going to implement it:

Basic:
<b> <i> <u> <em> <i> <strike> <strong> <a href="" title="">
<ul> <li> <ol>

More (OP suggestions):
<abbr title=""> <acronym title=""> <blockquote cite=""> <code>
Rich Text Format (HTML or BBCode doesn't matter) should be available for add-ons developers (in add-on long description field at least) and collectors (in yet-to-be-implemented collection long description field at least).
(In reply to comment #15)
> Rich Text Format (HTML or BBCode doesn't matter) should be available for
> add-ons developers (in add-on long description field at least) and collectors
> (in yet-to-be-implemented collection long description field at least).

By the way, can't we simply use some WYSIWYG Online Editor (preferably Open Source) such as Xinha or openWYSIWYG?
This bug has a patch I wrote in 2006, omg!

We are tackling this next release (5.3). Spec incoming.
Target Milestone: 4.x (triaged) → 5.3
(In reply to comment #18)
> Proposed mini-spec:
> http://docs.google.com/Doc?docid=0Acwo2Bn17-PrZGZudHRobnJfMjFnYnhyMnFjNQ&hl=en

I thought we were using the fake markup stuff that the editors' tools are already using.
I don't want to make add-on developers that already know HTML use some other made up code. We're about web standards 'n stuff.

And I didn't know the editor tools had a feature needing markup, but I frown upon their fake tags too.
(In reply to comment #20)
> I don't want to make add-on developers that already know HTML use some other
> made up code.

I know HTML but prefer markdown for writing text because it looks nice and reads easier.  I'm sorry that you're scared of new things.

> We're about web standards 'n stuff.

That's nice.
I prefer HTML too, but I'm concerned about injection attacks.  Using an alternative form mitigates that concern.
Can you give some examples of injection attacks you're concerned about?

[url=" <script>alert('XSS');</script> <a href="http://google.com] still converts to XSS unless you do some validation of the parameter, which is the same validation you'd be doing in an HTML tag.
That's not markdown syntax, nonetheless: Doesn't the markdown parser take care of proper input validation/escaping? I am in favor of keeping markup languages consistent across the editor and dev CPs.
I still don't know where in the Editor CP it is used, and while I'm fine with being consistent, I don't think we should use the same thing just because a group of maybe 10 people have previously used it.

If someone tells me where in the Editor CP it is used I can look at the markdown syntax there and update my XSS example appropriately.
(In reply to comment #23)
> [url=" <script>alert('XSS');</script> <a href="http://google.com] still
> converts to XSS unless you do some validation of the parameter, which is the
> same validation you'd be doing in an HTML tag.

It's not the same validation you'd do in an HTML tag.  Special chars like < are completely invalid with alternate syntax.
Right, so you would strip them out. Just like you would for an href="" attribute. The only acceptable value for a URL parameter in both bbcode and HTML should be a valid URL.

I don't see how they are different here, except the learning curve.
The difference is < is a valid character sometimes in HTML and it's never a valid character in markdown.  It's been shown time and again that trying to parse html into tokens where some characters are valid and some aren't is a recipe for failure.

> The redirector is a standalone script that takes a URL parameter and hash and
> redirects to that URL if the hash matches. (The hash will be the URL + a salt)

Can you explain how this works?  What is the hash+salt for?
(In reply to comment #28)
> > The redirector is a standalone script that takes a URL parameter and hash and
> > redirects to that URL if the hash matches. (The hash will be the URL + a salt)
> 
> Can you explain how this works?  What is the hash+salt for?

The idea was that going from addons.mozilla.org -> redirector/<hash>/http://foo.com would prevent the endpoint from having the xpi install privileges of addons.mozilla.org.  It turns out that this does not work as planned.  Firefox extends the privileges to the redirected site.  If we put a <meta> redirect in the middle (instead of a 302), it would work but be pretty lame.

The <hash> part is so that the redirector isn't open for the whole world to bounce things off, gaining credibility from the mozilla name.  The hash would be a secret shared between the redirector site and our sites like personas and AMO.  The problem there is that everyone knows the correct hash once it's been published on our site, though it would still prevent most of the internet from using it as an open relay.
I don't want to argue about technical implementation anymore; that's your call. I would very much like to have real HTML for this feature, even if it takes considerably longer to implement than a wimpy version.

If you tell me it can't be done, I will frown and move on, but I would like us to at least explore the possibility that it's feasible. This is a pretty common problem that people solve, and there are things like http://htmlpurifier.org that on the surface seem to be perfect for our needs.

Can you or an assigned developer assess our available options with the HTML requirement, and if it absolutely can't be done, then I will refine the spec to use whatever else.

> Can you explain how this works?  What is the hash+salt for?

I don't want to fill up this bug with that discussion; I will explain in the separate redirecter bug I file today or tomorrow, since it looks like Personas will not have built this for us in time.
Depends on: 523249
Assignee: nobody → clouserw
Priority: -- → P1
Attached patch v1Splinter Review
As it turns out, we've put a lot of effort into exactly the opposite of this bug and changing that affects a _lot_ of our core code.  This does not excite me.

So, here is v1 which should address storing data for the 4 requested fields in the spec.  Things not addressed yet:
- echoing out unescaped data (it's still htmlspecialchar'd)
- auto-linking fields in add-on summaries & collection descriptions
- routing all links through the redirector (this is waiting on that bug)
- existing URL fields should use redirector (this shouldn't even be a part of this bug)
Attachment #408672 - Flags: review?(fwenzel)
Attachment #408672 - Flags: review?(fwenzel) → review+
Comment on attachment 408672 [details] [diff] [review]
v1

You want to add HTML Purifier to the vendors dir when you commit this.

Don't forget to add your changes to remora.sql, please.

Also, somehow it feels like this should be easier to achieve, considering we are using a web-framework here. Made for web apps and such. But I am preaching to the choir, aren't I :)

Another problem you may want to look into (or at least tell that I did it wrong), at first it complained that /tmp/HTMLPurifier did not exist, until I created it. If it's going to do stuff in /tmp anyway, it might as well create it itself.
Thanks.

> (From update of attachment 408672 [details] [diff] [review])
r54479

> You want to add HTML Purifier to the vendors dir when you commit this.
r54480

> Don't forget to add your changes to remora.sql, please.
r54482

> Another problem you may want to look into (or at least tell that I did it
> wrong), at first it complained that /tmp/HTMLPurifier did not exist, until I
> created it. If it's going to do stuff in /tmp anyway, it might as well create
> it itself.
That's the HTMLPurifier library - it doesn't create directories automatically, I guess.  I don't want to do a check every time since it's not a fatal error we should see the warnings.  I could get rid of the sub directory and just drop it in the root but that is sloppy.  I could also take it off the NETAPP but I thought the servers might benefit from sharing the cache.  Might not be important.

I'll work on patch #2.
Attached patch part 2Splinter Review
Short patch - this unescapes the data for printing in the views.  I probably missed a spot where we are echoing out the data, but it's easy to fix and I'm sure someone will tell us. :)
Attachment #408919 - Flags: review?(fwenzel)
Attached patch part3Splinter Review
Another small patch.

I didn't think save() was ever called directly on fields but it turns out saveTranslations() will call save() if the id doesn't exist in the table (e.g. addon.description is null still).

This is kind of a pain to test, but it happens on newly created add-ons that have never had descriptions before.  This patch adds the _clean stuff to save().
Attachment #409038 - Flags: review?(fwenzel)
Comment on attachment 408919 [details] [diff] [review]
part 2

Looks good, wfm.
Attachment #408919 - Flags: review?(fwenzel) → review+
Attachment #409038 - Flags: review?(fwenzel) → review+
(In reply to comment #36)
> (From update of attachment 408919 [details] [diff] [review])
> Looks good, wfm.
r54542

(In reply to comment #37)
> (From update of attachment 409038 [details] [diff] [review])
> Works!
r54541
summary of outstanding issues, as much for myself as anyone:

1) "Any tags included that are not allowed should be displayed as entities rather than stripped out." This is just adding $this->config->set('Core.EscapeInvalidTags', true) however, we're passing the strings to our view sanitized, and then unsanitizing them when we need them in that form.  Our unsanitize() function uses preg_replace() which doesn't care how many replacements it does, meaning this is a regular occurance:  &amp;lt; -> &lt; -> > .  This effectively ruins our sanitization with mixed levels of escaping.  The current code is stripping out the entities which is working fine.

2) "Existing URL Fields" from the spec.  This is bug 347759 and not a part of this bug.

3) Link-replacement to hook into the redirection script hasn't been done.

4) Link-replacement JS to hide the URL hasn't been done.

5) Auto-linking in add-on summaries and collection descriptions hasn't been done.

If any part of these are not P1s please file separate bugs for them.
<3 clouserw
Should take care of #5 of comment 39
Attachment #409716 - Flags: review?(fwenzel)
Comment on attachment 409716 [details] [diff] [review]
343573.partsomething.patch

That wfm. I hope this won't be abused for spam, but time will tell.
Attachment #409716 - Flags: review?(fwenzel) → review+
Thanks, r54864.  It should be noted I'm leaving the short summaries in the collections in the carousel and the collection descriptions on the right side of the home page as the original text (not linked) since I don't think links fit well there.

I also added a bonus script in r54866 which will seed your database with clean strings.
This will route the URLs through outgoing.m.o.  I'm ignoring URLs that start with mailto:.  There may be others we want to ignore too, but that's all for now.
Attachment #410079 - Flags: review?(jbalogh)
Comment on attachment 410079 [details] [diff] [review]
route URLs through outgoing.m.o

warning: 2 lines add whitespace errors.

r+ after you fix the url encoding issues, see the bottom.


>--- a/site/app/controllers/components/includes/amopurifier.php
>+++ b/site/app/controllers/components/includes/amopurifier.php

>@@ -102,6 +107,54 @@ class AMOPurifier extends Object {
> 
>         return $this->purifier;
>     }
>-    
>+
>+    private function replaceURLs($text) {
>+    }
>+}
>+

Is that suppposed to be here?

>+class HTMLPurifier_URIFilter_AMORedirector extends HTMLPurifier_URIFilter {
>+
>+    public $name = 'AMORedirector';
>+
>+    // Do this after we filter/cleanup
>+    public $post = true;
>+
>+    private $redirect_url;
>+
>+    public function prepare($config) {
>+        $this->redirect_url = parse_url(REDIRECT_URL);
>+    }
>+
>+    public function filter(&$uri, $config, $context) {
>+
>+        if ($uri->scheme == 'mailto') return true;
>+
>+        // Build the full redirect URL
>+            $_user_url = "{$uri->scheme}://";

Tabs vs. spaces?

>+            if (!empty($uri->userinfo)) {
>+                $_user_url .= "{$uri->userinfo}@";
>+            }
>+            $_user_url .= "{$uri->host}";
>+            if (!empty($uri->port)) {
>+                $_user_url .= ":{$uri->port}";
>+            }
>+            $_user_url .= "/{$uri->path}";
>+            if (!empty($uri->query)) {
>+                $_user_url .= "?{$uri->query}";
>+            }
>+            if (!empty($uri->fragment)) {
>+                $_user_url .= "#{$uri->fragment}";
>+            }

That whole thing makes me kinda sad.

>+
>+        $_user_hash = sha1(REDIRECT_SECRET_KEY . $_user_url);
>+
>+        $_path = "{$this->redirect_url['path']}{$_user_hash}/{$_user_url}";

$_user_url needs to be url encoded or else weird things will happen, like fragments getting dropped.
Attachment #410079 - Flags: review?(jbalogh) → review+
Thanks, r55022

> Is that suppposed to be here?
f no

> Tabs vs. spaces?
Seemed like it was all spaces to me.

> That whole thing makes me kinda sad.
I looked for a collapse_url() function or something.  Seems like it would exist.

> $_user_url needs to be url encoded or else weird things will happen, like
> fragments getting dropped.
Done!

(In reply to comment #39)
> 
> 1) "Any tags included that are not allowed should be displayed as entities
> rather than stripped out." This is just adding
> $this->config->set('Core.EscapeInvalidTags', true) however, we're passing the
> strings to our view sanitized, and then unsanitizing them when we need them in
> that form.  Our unsanitize() function uses preg_replace() which doesn't care
> how many replacements it does, meaning this is a regular occurance:  &amp;lt;
> -> &lt; -> > .  This effectively ruins our sanitization with mixed levels of
> escaping.  The current code is stripping out the entities which is working
> fine.

I turned this in to bug 526334.

> 4) Link-replacement JS to hide the URL hasn't been done.

The way htmlpurifier generates the link doesn't make it easy to inject something like a class which we could key off of easily with some JS.  I don't know how we'll do that, but I'm not considering that a blocker for this.  I've filed bug 526373.

With all that, I'm calling this fixed.

QA:  URLs typed in and with <a> tags should be sent through the redirecter.  Talk to me on IRC to get the current status of outgoing.m.o, and the pre-seeded text in db.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Verified FIXED; see the "More about this add-on" section on https://preview.addons.mozilla.org/en-US/firefox/addon/9331.
Status: RESOLVED → VERIFIED
Thanks for all of your work on this, Wil.
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.