Closed
Bug 302133
Opened 19 years ago
Closed 19 years ago
Fix parsing of Atom 1.0 titles for Live Bookmarks
Categories
(Firefox :: Bookmarks & History, defect)
Firefox
Bookmarks & History
Tracking
()
RESOLVED
FIXED
People
(Reporter: philor, Assigned: sayrer)
References
Details
(Keywords: fixed1.8.1)
Attachments
(2 files, 1 obsolete file)
1.50 KB,
application/atom+xml
|
Details | |
13.88 KB,
patch
|
vlad
:
review+
mconnor
:
approval-branch-1.8.1+
|
Details | Diff | Splinter Review |
Atom 1.0 allows the content of titles to be plain-text (@type="text" or no type
attribute), escaped HTML (@type="html"), or an inline XHTML div (@type="xhtml").
For @type="text" we're fine: we pass <b>&copy</b> through as-is,
correctly.
For @type="html" we fail to convert escaped character entity references and
escaped numeric character references to characters (&copy; &#xA9; should
display as two copyright symbol characters), and we fail to strip markup ("this
is <b>bold</b>" should be displayed as "this is bold" rather than "this is
<b>bold</b>").
For @type="xhtml" we fail completely: FindTextInNode looks like it should find
the first descendant text node (when what's needed is actually concatenating all
the text nodes), but rather than that, finding "strong" in <title
type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><strong>strong</strong>
isn't better, it's different</div></title>, or the correct result, "strong isn't
better, it's different" we get nothing: the entry fails to load as if there was
no title.
Reporter | ||
Comment 1•19 years ago
|
||
A variety-pack of Atom titles.
Assignee | ||
Updated•19 years ago
|
Assignee: nobody → sayrer
Assignee | ||
Updated•19 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•19 years ago
|
||
Not sure if this is the optimal strategy, but it passes all of Phil's testcases.
Attachment #205503 -
Flags: review?(vladimir)
Comment on attachment 205503 [details] [diff] [review]
First cut at handling escaped HTML and XML in Atom titles
Thanks for the patch! Some comments..
>-
>+ NS_METHOD FindTextInChildNodes (nsIDOMNode *aNode, nsAString &aString);
>+ NS_METHOD ParseHTMLFragment(nsAString &aFragString, nsIDocument* aTargetDocument, nsCOMPtr<nsIDOMNode> *outNode);
nsIDOMNode **outNode; don't use nsCOMPtr<> for params
> nsresult
>+nsFeedLoadListener::ParseHTMLFragment(nsAString &aFragString,
>+ nsIDocument* aTargetDocument,
>+ nsCOMPtr<nsIDOMNode> *outNode)
nsIDOMNode **outNode
>+{
>+ NS_ENSURE_ARG(aTargetDocument);
>+ nsresult rv;
>+ nsCOMPtr<nsIParser> parser = do_CreateInstance(kCParserCID, &rv);
>+ if (NS_FAILED(rv)) return rv;
>+
>+ // create the html fragment sink
>+ nsCOMPtr<nsIContentSink> sink;
>+ sink = do_CreateInstance(NS_HTMLFRAGMENTSINK2_CONTRACTID);
>+ NS_ENSURE_TRUE(sink, NS_ERROR_FAILURE);
>+ nsCOMPtr<nsIFragmentContentSink> fragSink(do_QueryInterface(sink));
>+ NS_ENSURE_TRUE(fragSink, NS_ERROR_FAILURE);
I'm not sure offhand if we really want to be creating a new instance each time; not sure if we can reuse the parser or not. If we can, it would be better to do so.
>+ *outNode = do_QueryInterface(contextfrag);
CallQueryInterface (contextfrag, outNode);
>+ return rv;
>+}
>+
>+// find all of the text nodes below aNode at return them as a string
>+nsresult
>+nsFeedLoadListener::FindTextInChildNodes (nsIDOMNode *aNode, nsAString &aString)
>+{
>+ NS_ENSURE_ARG(aNode);
>+
>+ nsresult rv;
>+
>+ nsCOMPtr<nsIDOMDocument> aDoc;
>+ aNode->GetOwnerDocument(getter_AddRefs(aDoc));
>+ nsCOMPtr<nsIDOMDocumentTraversal> trav = do_QueryInterface(aDoc, &rv);
>+ if (NS_FAILED(rv)) return rv;
>+
>+ nsCOMPtr<nsIDOMTreeWalker> walker;
>+ rv = trav->CreateTreeWalker(aNode,
>+ nsIDOMNodeFilter::SHOW_TEXT | nsIDOMNodeFilter::SHOW_CDATA_SECTION,
>+ nsnull, PR_TRUE, getter_AddRefs(walker));
>+ if (NS_FAILED(rv)) return rv;
>+
>+ nsCOMPtr<nsIDOMNode> currentNode;
>+ walker->GetCurrentNode(getter_AddRefs(currentNode));
>+ nsCOMPtr<nsIDOMCharacterData> charTextNode;
>+ nsAutoString tempString;
>+ while (currentNode)
>+ {
{ on same line as while, i.e. "while (currentNode) {"
>+ charTextNode = do_QueryInterface(currentNode);
>+ if(charTextNode)
>+ {
same line { again, space after "if" (and in a bunch of other places -- please try to follow the coding style of whatever file you're in :)
>+ charTextNode->GetData(tempString);
>+ aString.Append(tempString);
>+ }
>+ walker->NextNode(getter_AddRefs(currentNode));
>+ }
>+
>+ if(aString.IsEmpty()){
space after "if", before {
>+ return NS_ERROR_FAILURE;
>+ }else{
space around "else"
>+ return NS_OK;
>+ }
>+}
>+
>+nsresult
> nsFeedLoadListener::HandleRDFItem (nsIRDFDataSource *aDS, nsIRDFResource *aItem,
> nsIRDFResource *aLinkResource,
> nsIRDFResource *aTitleResource)
> {
> nsresult rv;
>
> /* We care about this item's link and title */
> nsCOMPtr<nsIRDFNode> linkNode;
>@@ -665,17 +749,53 @@ nsFeedLoadListener::TryParseAsSimpleRSS
> rv = childNode->GetNodeType(&childNtype);
> if (NS_FAILED(rv)) return rv;
>
> if (childNtype == nsIDOMNode::ELEMENT_NODE) {
> nsAutoString childNname;
> rv = childNode->GetNodeName (childNname);
>
> if (childNname.Equals(NS_LITERAL_STRING("title"))) {
>- rv = FindTextInNode (childNode, titleStr);
>+ if(isAtom){
>+ /* Atom titles can contain HTML, so we need to find and remove it */
>+ nsCOMPtr<nsIDOMElement> titleElem = do_QueryInterface(childNode);
>+ if (!titleElem) break; // out of while(childNode) loop
>+
>+ nsAutoString titleMode; // Atom 0.3 only
>+ nsAutoString titleType;
>+ titleElem->GetAttribute(NS_LITERAL_STRING("type"), titleType);
>+ titleElem->GetAttribute(NS_LITERAL_STRING("mode"), titleMode);
>+
>+ /* Cover Atom 0.3 and RFC 4287 together */
>+ if(titleType.Equals(NS_LITERAL_STRING("text")) ||
space after "if"; also use EqualsLiteral("text") -- I know a bunch of other places don't use it, but it wasn't available on the 1.0 branch (but is available on the trunk). We should start using it where possible now.
>+ titleType.Equals(NS_LITERAL_STRING("text/plain")) ||
EqualsLiteral
>+ titleType.IsEmpty())
>+ {
>+ rv = FindTextInNode(childNode, titleStr);
>+ }else if(titleType.Equals(NS_LITERAL_STRING("text/html")) ||
space around "else if"; EqualsLiteral
>+ titleType.Equals(NS_LITERAL_STRING("html")))
>+ {
>+ nsAutoString escapedHTMLStr;
>+ rv = FindTextInNode(childNode, escapedHTMLStr);
>+ if (NS_FAILED(rv)) break; // out of while(childNode) loop
>+
>+ nsCOMPtr<nsIDOMNode> newNode;
>+ nsCOMPtr<nsIDocument> doc = do_QueryInterface(xmldoc);
>+ ParseHTMLFragment(escapedHTMLStr, doc, address_of(newNode));
>+ rv = FindTextInChildNodes(newNode, titleStr);
>+ }else if(titleType.Equals(NS_LITERAL_STRING("xhtml")) ||
spaces, EqualsLiteral
>+ titleType.Equals(NS_LITERAL_STRING("application/xhtml")))
>+ {
>+ rv = FindTextInChildNodes(childNode, titleStr);
>+ }else{
spaces
>+ break; // out of while(childNode) loop
>+ }
>+ }else{
ditto
>+ rv = FindTextInNode(childNode, titleStr);
>+ }
> if (NS_FAILED(rv)) break;
> } else if (childNname.Equals(NS_LITERAL_STRING("pubDate")) ||
> childNname.Equals(NS_LITERAL_STRING("updated")))
> {
> rv = FindTextInNode (childNode, dateStr);
> if (NS_FAILED(rv)) break;
> } else if (!isAtom && childNname.Equals(NS_LITERAL_STRING("guid"))) {
> nsCOMPtr<nsIDOMElement> linkElem = do_QueryInterface(childNode);
Assignee | ||
Comment 4•19 years ago
|
||
FindTextInNode seems to have problems wrt all FeedFormats. Here are some tests:
http://franklinmint.fm/2005/12/18/pathological-rss2.xml
All valid:
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Ffranklinmint.fm%2F2005%2F12%2F18%2Fpathological-rss2.xml
Assignee | ||
Updated•19 years ago
|
Attachment #205503 -
Flags: review?(vladimir)
Assignee | ||
Comment 5•19 years ago
|
||
This patch removes FindTextInNode in favor of FindTextInChildNodes and cleans up whitespace in resulting values.
Tests nasty but valid RSS2 variants:
http://franklinmint.fm/2005/12/18/pathological-rss2.xml
Tests real-world Atom 0.3 escaped HTML titles:
http://compulsorybloggerblog.blogspot.com/atom.xml
Phil's character and stripTags tests:
http://franklinmint.fm/2005/12/09/Atom-title-conversion-testcase.xml
Attachment #206272 -
Flags: review?(vladimir)
Assignee | ||
Updated•19 years ago
|
Attachment #205503 -
Attachment is obsolete: true
Assignee | ||
Comment 6•19 years ago
|
||
(In reply to comment #3)
>
> >+ nsCOMPtr<nsIParser> parser = do_CreateInstance(kCParserCID, &rv);
> >+ if (NS_FAILED(rv)) return rv;
>
> I'm not sure offhand if we really want to be creating a new instance each time;
> not sure if we can reuse the parser or not. If we can, it would be better to
> do so.
I tried reusing it, and got seg faults. I also asked if there was a way to reuse it on IRC and was told "probably not".
Comment on attachment 206272 [details] [diff] [review]
fix formatting, better text handling
Looks fine to me.
r=vladimir
Attachment #206272 -
Flags: review?(vladimir) → review+
Assignee | ||
Updated•19 years ago
|
Whiteboard: [checkin needed]
Reporter | ||
Comment 8•19 years ago
|
||
*** Bug 321606 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 9•19 years ago
|
||
*** Bug 322951 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 10•19 years ago
|
||
*** Bug 253814 has been marked as a duplicate of this bug. ***
Comment 11•19 years ago
|
||
i'm not sure that the desired behavior is ever to pass escaped content through and display it to the user. if someone publishes an rss feed with a title that has escaped html entities in it, they almost certainly want it unescaped. i think firefox should make this assumption. it makes the code much simpler.
so, i think the easy fix here is to make this assumption and always pass the bookmark's title through an unescape function. this is similar to what nsBookmarksService does when it loads the bookmarks file off of the disk.
it appears that someone hacked in an unescape function in nsBookmarksService.cpp (not sure why nsEscape.h isn't used here). i was able to make this function visible to the nsBookmarksFeedLoader and use it to preprocess every live bookmark title. this totally fixed the problem.
so i would not advise employing a complicated solution (the one below) to fix this simple but high visibility bug. just unescape all of the live bookmark titles, and you're done. it takes about 5 lines of code to get this working.
Comment 12•19 years ago
|
||
I should clarify that I'm speaking specifically to the failure to unescape html in the titles. This is by far the most visible aspect of this bug. Fixing the other parts of this bug (parsing bold, italics, etc) is arguably less important, since those tags are much more rare in the live bookmark titles.
Assignee | ||
Comment 13•19 years ago
|
||
I don't think we're going to fix this for 1.5, and FF2 will have a new parser. Not sure why we're talking about RSS2, but I'll gleefully note that the meaning of escaped markup in titles is undefined.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → WONTFIX
Reporter | ||
Comment 14•19 years ago
|
||
*** Bug 311065 has been marked as a duplicate of this bug. ***
No longer blocks: 311065
Updated•19 years ago
|
Whiteboard: [checkin needed]
Reporter | ||
Comment 15•19 years ago
|
||
Since it's looking like we'll be using the old parser after all for 2.0, can we please have this?
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Assignee | ||
Comment 16•19 years ago
|
||
(In reply to comment #15)
> Since it's looking like we'll be using the old parser after all for 2.0, can we
> please have this?
Looks to me like this landed on trunk already:
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsBookmarksFeedHandler.cpp&branch=&root=/cvsroot&subdir=mozilla/browser/components/bookmarks/src&command=DIFF_FRAMESET&rev1=1.10&rev2=1.11
Reporter | ||
Comment 17•19 years ago
|
||
Heh. So it did, though the checkin comment makes it look accidental. Okay, baked on the trunk for a month or so between accidental checkin and Places being on by default with no regressions reported, plus bug 321036 got it into the Places fork where it's also apparently working fine, so all we need is a1.8.1.
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 18•19 years ago
|
||
Comment on attachment 206272 [details] [diff] [review]
fix formatting, better text handling
(Well, approval and for the branch to reopen after a2, that is...)
Attachment #206272 -
Flags: approval-branch-1.8.1?(mconnor)
Updated•19 years ago
|
Attachment #206272 -
Flags: approval-branch-1.8.1?(mconnor) → approval-branch-1.8.1+
Assignee | ||
Comment 19•19 years ago
|
||
checked in on branch.
Checking in Makefile.in;
/cvsroot/mozilla/browser/components/bookmarks/src/Makefile.in,v <-- Makefile.in
new revision: 1.16.8.1; previous revision: 1.16
done
Checking in nsBookmarksFeedHandler.cpp;
/cvsroot/mozilla/browser/components/bookmarks/src/nsBookmarksFeedHandler.cpp,v <-- nsBookmarksFeedHandler.cpp
new revision: 1.8.2.2; previous revision: 1.8.2.1
done
Keywords: fixed1.8.1
Reporter | ||
Comment 20•18 years ago
|
||
*** Bug 347316 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•