Closed
Bug 274386
Opened 20 years ago
Closed 20 years ago
RSS including HTML entities in <description>
Categories
(addons.mozilla.org Graveyard :: Public Pages, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
1.1
People
(Reporter: philor, Assigned: Bugzilla-alanjstrBugs)
References
Details
Attachments
(1 file, 4 obsolete files)
|
9.33 KB,
patch
|
mconnor
:
first-review+
|
Details | Diff | Splinter Review |
The "Newest Firefox Extensions" RSS feed currently includes an <item> for http://update.mozilla.org/extensions/moreinfo.php?id=420&vid=1382 which has a fairly borked description even in the HTML (slashes escaping single quotes), but in the RSS it's pointing out that the generating code is apparently doing a no-no, including HTML entities. That's probably the number one way of making an RSS feed not-well-formed XML, and none of the ways around it (including the RSS 0.91 DTD, or a DTD fragment that defines them, mostly) work at all well: far far better to pick a charset (UTF-8 unless you absolutely have to have something different) and then convert any entities coming out of your db to actual characters, rather than what looks to be happening, converting UTF-8 characters into two broken entities based on the ISO-8859-1 characters that the UTF-8 bytes sort-of resemble (a » converted to »). (For pedantic purposes, if you are using encoding="utf-8", you should also be sending the mime-type application/xml, or at the very least include a charset in the content-type header, because text/xml with no charset *is* US-ASCII, no matter what encoding you put in the XML Declaration. Very few consumers actually follow that requirement, but it's easy enough to avoid the problem entirely.)
The backslashes you see are there because that's what was given to me, so that's what I pasted. But yes, it should be application/xml. I've re-saved the DB entry.
Assignee: psychoticwolf → Bugzilla-alanjstrBugs
Status: NEW → ASSIGNED
Attachment #168605 -
Flags: first-review?(psychoticwolf)
Hmmm. Went back to bug 273206 comment 4. There were no backslashes there. Since I re-saved the data, it isn't appearing with backslashes any more.
Comment 3•20 years ago
|
||
http://update-beta.mozilla.org/rss/?application=firefox&type=E&list=newest displays this: XML Parsing Error: undefined entity Location: http://update-beta.mozilla.org/rss/?application=firefox&type=E&list=newest Line Number 31, Column 36:- Menu Items under 'Go' and 'View--»Sidebar' That's in the description of the "Enhanced History Manager 0.4.1.02". It complains about Â. I don't think that belongs there (it's the letter A with a circumflex), and should be removed.
Comment 5•20 years ago
|
||
Thanks, that fixed it.
Comment 6•20 years ago
|
||
Comment on attachment 168605 [details] [diff] [review] Proper XML mimetype Patch looks fine. Though does it actually decode the html entities (if needed) and ensure DB data will always yield a proper RSS feed?
Attachment #168605 -
Flags: first-review?(psychoticwolf) → first-review+
| Reporter | ||
Comment 7•20 years ago
|
||
(In reply to comment #6) > Though does it actually decode the html entities (if needed) > and ensure DB data will always yield a proper RSS feed? Nope, now that I finally figured out where the source is (I'm slow, I am), what this bug needs is a s/htmlentities/htmlspecialchars/g on inc_rssfeed.php. You don't want to entify everything, only & and < (you'll get > too, but with ENT_NOQUOTES since you aren't sticking anything in an attribute, you can at least avoid the overhead of replacing '/"). Personally, I'd leave it out entirely on things like $sitelanguage, and require callers to behave themselves, since these feeds are going to beat the servers to a bloody pulp anyway. Language is required to be an ISO639/ISO3166 code, which means it's US-ASCII: any caller that doesn't make it so deserves to explode.
I'm not sure if I was supposed to get rid of the UTF stuff. I don't need to make the stuff inside entity-safe?
Attachment #168605 -
Attachment is obsolete: true
Attachment #169340 -
Flags: first-review?(psychoticwolf)
Comment 9•20 years ago
|
||
Comment on attachment 169340 [details] [diff] [review] Wrap lots of stuff in htmlspecialchars There's no need to htmlspecialchars() the same variable more than once. $description is an example. If you've ran the function on the variable when it was assigned you don't need to do it again when it's echoed. This patch is also a mess to read. @Phil: Your thoughts on this patch, with regard to it accomplishing its goal?
Attachment #169340 -
Flags: first-review?(psychoticwolf) → first-review-
Comment 10•20 years ago
|
||
Bumping target milestone, though this change can be applied to the site when its complete. :-)
Target Milestone: 1.0 → 1.1
Version: unspecified → 0.9
| Reporter | ||
Comment 11•20 years ago
|
||
>+echo " <title>" . htmlspecialchars($sitetitle) . "::" . htmlspecialchars($list) . " " . $listType . "</title>\n"; Function calls are cheap, but they aren't free. $list, yes, because you shouldn't break on bad user input, but all of the $sitefoo vars were set by a caller who knew he was creating XML: if he puts in "<stuff&junk>", he should break, rather than making millions of extra function calls every day. >+echo " <description>" . htmlspecialchars($description) . "</description>\n"; $sitedescription, no? >+ $title = htmlspecialchars($row["Title"]); >+ $description = htmlspecialchars($row["Description"]); htmlspecialchars($row["Title"],ENT_NOQUOTES,UTF-8) for both would be better: you aren't putting it in an attribute, so there's no point in escaping quotes, and no sense risking having a multi-byte UTF-8 character where a byte in the middle looks like a < to the default ISO-8859-1 charset. Get those two, right there only, plus the content-type header that you still also need, as header("Content-type: application/xml; charset=utf-8");, and your only risk of being not-well-formed ought to be if someone can put a non-UTF-8 character into the db.
| Assignee | ||
Comment 12•20 years ago
|
||
Yeah, i got carried away. patch later.
| Assignee | ||
Comment 13•20 years ago
|
||
*** Bug 275775 has been marked as a duplicate of this bug. ***
| Assignee | ||
Comment 14•20 years ago
|
||
Attachment #169340 -
Attachment is obsolete: true
Attachment #169463 -
Flags: first-review?(psychoticwolf)
Updated•20 years ago
|
Attachment #168605 -
Flags: first-review+
| Assignee | ||
Comment 15•20 years ago
|
||
Attachment #169463 -
Attachment is obsolete: true
Attachment #169464 -
Flags: first-review?(psychoticwolf)
| Assignee | ||
Comment 16•20 years ago
|
||
Attachment #169464 -
Attachment is obsolete: true
Attachment #169467 -
Flags: first-review?(psychoticwolf)
Attachment #169467 -
Flags: first-review?(psychoticwolf) → first-review?(cst)
Attachment #169467 -
Flags: first-review?(cst) → first-review?(cbeard)
Attachment #169464 -
Flags: first-review?(psychoticwolf)
Attachment #169463 -
Flags: first-review?(psychoticwolf)
Attachment #169467 -
Flags: first-review?(cbeard) → first-review?(mconnor)
Comment 17•20 years ago
|
||
Comment on attachment 169467 [details] [diff] [review] missed a bit >Index: rss/inc_rssfeed.php >=================================================================== >RCS file: /cvsroot/mozilla/webtools/update/rss/inc_rssfeed.php,v >retrieving revision 1.1 >diff --unified=4 -d -p -u -p -r1.1 inc_rssfeed.php >--- rss/inc_rssfeed.php 24 Aug 2004 19:39:49 -0000 1.1 >+++ rss/inc_rssfeed.php 23 Dec 2004 21:21:01 -0000 >@@ -1,95 +1,96 @@ >-<?php >-// ***** BEGIN LICENSE BLOCK ***** >-// Version: MPL 1.1/GPL 2.0/LGPL 2.1 >-// >-// The contents of this file are subject to the Mozilla Public License Version >-// 1.1 (the "License"); you may not use this file except in compliance with >-// the License. You may obtain a copy of the License at >-// http://www.mozilla.org/MPL/ >-// >-// Software distributed under the License is distributed on an "AS IS" basis, >-// WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License >-// for the specific language governing rights and limitations under the >-// License. >-// >-// The Original Code is Mozilla Update. >-// >-// The Initial Developer of the Original Code is >-// Chris "Wolf" Crews. >-// Portions created by the Initial Developer are Copyright (C) 2004 >-// the Initial Developer. All Rights Reserved. >-// >-// Contributor(s): >-// Chris "Wolf" Crews <psychoticwolf@carolina.rr.com> >-// Alan Starr <alanjstarr@yahoo.com> >-// >-// Alternatively, the contents of this file may be used under the terms of >-// either the GNU General Public License Version 2 or later (the "GPL"), or >-// the GNU Lesser General Public License Version 2.1 or later (the "LGPL"), >-// in which case the provisions of the GPL or the LGPL are applicable instead >-// of those above. If you wish to allow use of your version of this file only >-// under the terms of either the GPL or the LGPL, and not to allow others to >-// use your version of this file under the terms of the MPL, indicate your >-// decision by deleting the provisions above and replace them with the notice >-// and other provisions required by the GPL or the LGPL. If you do not delete >-// the provisions above, a recipient may use your version of this file under >-// the terms of any one of the MPL, the GPL or the LGPL. >-// >-// ***** END LICENSE BLOCK ***** >-?> >-<?php >-// http://blogs.law.harvard.edu/tech/rss >- >-switch ($type) { >- case "E": >- $listType = "Extensions"; >- break; >- case "P": >- $listType = "Plugins"; >- break; >- case "T": >- $listType = "Themes"; >- break; >-} >- >-echo "<rss version=\"2.0\">\n"; >-echo "<channel>\n"; >- >-echo " <title>" . htmlentities($sitetitle) . "::" . htmlentities($list) . " " . $listType . "</title>\n"; >-echo " <link>" . htmlentities($siteurl) . "</link>\n"; >-echo " <description>" . htmlentities($description) . "</description>\n"; >-echo " <language>" . htmlentities($sitelanguage) . "</language>\n"; >-echo " <copyright>" . htmlentities($sitecopyright) . "</copyright>\n"; >-echo " <lastBuildDate>" . $currenttime . "</lastBuildDate>\n"; >-echo " <ttl>" . $rssttl . "</ttl>\n"; >-echo " <image>\n"; >-echo " <title>" . htmlentities($sitetitle) . "</title>\n"; >-echo " <link>" . htmlentities($siteurl) . "</link>\n"; >-echo " <url>" . htmlentities($siteicon) . "</url>\n"; >-echo " <width>16</width>\n"; >-echo " <height>16</height>\n"; >-echo " </image>\n"; >- >- $sql_result = mysql_query($sql, $connection) or trigger_error("MySQL Error ".mysql_errno().": ".mysql_error()."", E_USER_NOTICE); >- while ($row = mysql_fetch_array($sql_result)) { >- $id = $row["ID"]; >- $title = htmlentities($row["Title"]); >- $description = htmlentities($row["Description"]); >- $dateupdated = gmdate("r", strtotime($row["DateStamp"])); >- $version = $row["Version"]; >- $vid = $row["vID"]; >- $appname = $row["AppName"]; >- >- echo " <item>\n"; >- echo " <pubDate>" . $dateupdated . "</pubDate>\n"; >- echo " <title>" . $title . " " . $version . " for " . $appname . "</title>\n"; >- echo " <link>http://$sitehostname/" . strtolower($listType) . "/moreinfo.php?id=" . $id . "&vid=" . $vid . "</link>\n"; >- echo " <description>" . $description . "</description>\n"; >- echo " </item>\n"; >- >- } >- >-echo "</channel>\n"; >-echo "</rss>\n"; >- >-?> >+<?php >+// ***** BEGIN LICENSE BLOCK ***** >+// Version: MPL 1.1/GPL 2.0/LGPL 2.1 >+// >+// The contents of this file are subject to the Mozilla Public License Version >+// 1.1 (the "License"); you may not use this file except in compliance with >+// the License. You may obtain a copy of the License at >+// http://www.mozilla.org/MPL/ >+// >+// Software distributed under the License is distributed on an "AS IS" basis, >+// WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License >+// for the specific language governing rights and limitations under the >+// License. >+// >+// The Original Code is Mozilla Update. >+// >+// The Initial Developer of the Original Code is >+// Chris "Wolf" Crews. >+// Portions created by the Initial Developer are Copyright (C) 2004 >+// the Initial Developer. All Rights Reserved. >+// >+// Contributor(s): >+// Chris "Wolf" Crews <psychoticwolf@carolina.rr.com> >+// Alan Starr <alanjstarr@yahoo.com> >+// >+// Alternatively, the contents of this file may be used under the terms of >+// either the GNU General Public License Version 2 or later (the "GPL"), or >+// the GNU Lesser General Public License Version 2.1 or later (the "LGPL"), >+// in which case the provisions of the GPL or the LGPL are applicable instead >+// of those above. If you wish to allow use of your version of this file only >+// under the terms of either the GPL or the LGPL, and not to allow others to >+// use your version of this file under the terms of the MPL, indicate your >+// decision by deleting the provisions above and replace them with the notice >+// and other provisions required by the GPL or the LGPL. If you do not delete >+// the provisions above, a recipient may use your version of this file under >+// the terms of any one of the MPL, the GPL or the LGPL. >+// >+// ***** END LICENSE BLOCK ***** >+?> >+<?php >+// http://blogs.law.harvard.edu/tech/rss >+ >+switch ($type) { >+ case "E": >+ $listType = "Extensions"; >+ break; >+ case "P": >+ $listType = "Plugins"; >+ break; >+ case "T": >+ $listType = "Themes"; >+ break; >+} >+ >+echo "<rss version=\"2.0\">\n"; >+echo "<channel>\n"; >+ >+echo " <title>" . htmlspecialchars($sitetitle,ENT_NOQUOTES,UTF-8) . "::" . htmlspecialchars($list,ENT_NOQUOTES,UTF-8) . " " . $listType . "</title>\n"; >+echo " <link>" . $siteurl . "</link>\n"; >+echo " <description>" . htmlspecialchars($sitedescription,ENT_NOQUOTES,UTF-8) . >+"</description>\n"; >+echo " <language>" . $sitelanguage . "</language>\n"; >+echo " <copyright>" . htmlspecialchars($sitecopyright,ENT_NOQUOTES,UTF-8) . "</copyright>\n"; >+echo " <lastBuildDate>" . $currenttime . "</lastBuildDate>\n"; >+echo " <ttl>" . $rssttl . "</ttl>\n"; >+echo " <image>\n"; >+echo " <title>" . htmlspecialchars($sitetitle,ENT_NOQUOTES,UTF-8) . "</title>\n"; >+echo " <link>" . $siteurl . "</link>\n"; >+echo " <url>" . $siteicon . "</url>\n"; >+echo " <width>16</width>\n"; >+echo " <height>16</height>\n"; >+echo " </image>\n"; >+ >+ $sql_result = mysql_query($sql, $connection) or trigger_error("MySQL Error ".mysql_errno().": ".mysql_error()."", E_USER_NOTICE); >+ while ($row = mysql_fetch_array($sql_result)) { >+ $id = $row["ID"]; >+ $title = htmlspecialchars($row["Title"],ENT_NOQUOTES,UTF-8); >+ $description = htmlspecialchars($row["Description"],ENT_NOQUOTES,UTF-8); >+ $dateupdated = gmdate("r", strtotime($row["DateStamp"])); >+ $version = htmlspecialchars($row["Version"],ENT_NOQUOTES,UTF-8); >+ $vid = $row["vID"]; >+ $appname = htmlspecialchars($row["AppName"],ENT_NOQUOTES,UTF-8); >+ >+ echo " <item>\n"; >+ echo " <pubDate>" . $dateupdated . "</pubDate>\n"; >+ echo " <title>" . $title . " " . $version . " for " . $appname . "</title>\n"; >+ echo " <link>http://$sitehostname/" . strtolower($listType) . "/moreinfo.php?id=" . $id . "&vid=" . $vid . "</link>\n"; >+ echo " <description>" . $description . "</description>\n"; >+ echo " </item>\n"; >+ >+ } >+ >+echo "</channel>\n"; >+echo "</rss>\n"; >+ >+?> >Index: rss/index.php >=================================================================== >RCS file: /cvsroot/mozilla/webtools/update/rss/index.php,v >retrieving revision 1.3 >diff --unified=4 -d -p -u -p -r1.3 index.php >--- rss/index.php 9 Dec 2004 06:34:53 -0000 1.3 >+++ rss/index.php 23 Dec 2004 21:21:01 -0000 >@@ -45,17 +45,16 @@ $type = escape_string($_GET["type"]); // > $list = ucwords(strtolower($_GET["list"])); // Newest, Updated, [Editors], Popular > > $sitetitle = "Mozilla Update"; > $siteicon = "http://www.mozilla.org/images/mozilla-16.png"; >-$siteurl = "http://update.mozilla.org"; >+$siteurl = $_SERVER["SERVER_NAME"]; > $sitedescription = "the way to keep your mozilla software up-to-date"; >-$sitelanguage = "en-us"; >-$sitecopyright = "Creative Commons?"; >+$sitelanguage = "en-US"; >+$sitecopyright = "Copyright 2004-2005 The Mozilla Organization"; > $currenttime = gmdate(r);// GMT > $rssttl = "120"; //Life of feed in minutes > >-//header("Content-Type: application/octet-stream"); >-header("Content-Type: text/xml"); >+header("Content-Type: application/xml; charset=utf-8"); > > // Firefox, extensions, by date added > > $select = "SELECT DISTINCT >@@ -103,5 +102,5 @@ $sql = $select . " " . $from . " WHERE " > > include"inc_rssfeed.php"; > > >-?> >\ No newline at end of file >+?>
Attachment #169467 -
Flags: first-review?(mconnor) → first-review+
| Assignee | ||
Comment 18•20 years ago
|
||
Checked into branch and trunk by mconnor at 2005-01-16 13:37
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 19•20 years ago
|
||
<b>Warning</b> : htmlspecialchars() [ <a href="http://www.php.net/function.htmlspecialchars">function.htmlspecialchars</a> ]: charset `-8' not supported, assuming iso-8859-1 in - <b> D:\cvs-1.11.5\mozilla\webtools\update\rss\inc_rssfeed.php </b> on line <b>58</b>
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Reporter | ||
Comment 20•20 years ago
|
||
Wups, sorry, UTF-8 is a string there, not a constant like ENT_NOQUOTES. htmlspecialchars($sitetitle,ENT_NOQUOTES,"UTF-8") etc.
| Assignee | ||
Comment 21•20 years ago
|
||
Ok, mconnor has the fix on his tree. Hopefully this will land in cvs soon. Not going to bother to open a new bug.
Status: REOPENED → ASSIGNED
| Assignee | ||
Comment 22•20 years ago
|
||
Fix checked into trunk and branch by mconnor.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
Updated•9 years ago
|
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•