Closed
Bug 1256640
Opened 9 years ago
Closed 7 years ago
Make SimpleFeedParser more robust
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: sebastian, Unassigned)
References
Details
In bug 1241810 Michael raised some good points how SimpleFeedParser could be improved.
Copying from the other bug:
> Two concerning issues I saw (marked in the review) were:
> We make assumptions that the feed is valid (e.g. we replace the current
> item if we ever see another <item> tag even if the last one is closed)
> We make assumptions about the parent tag (e.g. if we're not in <image>
> or <source>, update the title. What if there is another parent tag we
> missed?)
>
> I saw an approach that could be helpful here: I read this brief essay
> on pull parsers and it mentioned the benefit is that the code can follow
> the structure of the feed (their example is useful). In this case, we
> could follow their example by:
>
> while (true) {
> // handle simple top-level tags
> // ...
> } else if (start_tag == "item") {
> handleItem(...);
> }
> // ...
> }
>
> where handleItem will handle all containing tags until it sees it's
> matching END_TAG. This creates a more encapsulated and accurate
> state machine where the inputs we parse inside the <item> parent
> tag are more limited (e.g. we could throw if we see another <item>
> tag while handling an item tag).
XMLPullParser has support for some "validation mode":
> I see in the XmlPullParser docs that you can set the parser to run
> in validation mode – is there a reason we didn't run in that? If
> so, I'd add a comment.
Nested tags (in <item>):
> This will cause problems if item or entry tags can appear inside
> one another. While this might be against the feed specification,
> we could still receive an invalid feed – you should probably ensure
> state.currentItem == null before assigning (and it also has the
> benefit of explicitly stating assumptions in the code).
Text nodes that contain tags with text nodes that are not human-readable text:
> I'm unfamiliar with the RSS spec, but this can be concerning if a new
> tag opens and we start appending its mysterious text content, e.g.:
>
> <title>We are trying to get this text<image src=...>Pretend this is alt-text for an emoji</image> and we want this text too</title>
>
> But I can see a case for it with <title><a ...></a></title>.
> Can you add a comment explaining your thinking here?
Add tests for invalid feeds:
> Might be good to add some invalid feeds in here too, e.g. empty feeds,
> feeds with <item> inside <item> tags, invalid xml (e.g. non-closed tags,
> broken tags), etc.
| Reporter | ||
Comment 1•7 years ago
|
||
This code is gone.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
| Assignee | ||
Updated•5 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•