Closed Bug 1256640 Opened 9 years ago Closed 7 years ago

Make SimpleFeedParser more robust

Categories

(Firefox for Android Graveyard :: General, defect)

All
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: sebastian, Unassigned)

References

Details

In bug 1241810 Michael raised some good points how SimpleFeedParser could be improved. Copying from the other bug: > Two concerning issues I saw (marked in the review) were: > We make assumptions that the feed is valid (e.g. we replace the current > item if we ever see another <item> tag even if the last one is closed) > We make assumptions about the parent tag (e.g. if we're not in <image> > or <source>, update the title. What if there is another parent tag we > missed?) > > I saw an approach that could be helpful here: I read this brief essay > on pull parsers and it mentioned the benefit is that the code can follow > the structure of the feed (their example is useful). In this case, we > could follow their example by: > > while (true) { > // handle simple top-level tags > // ... > } else if (start_tag == "item") { > handleItem(...); > } > // ... > } > > where handleItem will handle all containing tags until it sees it's > matching END_TAG. This creates a more encapsulated and accurate > state machine where the inputs we parse inside the <item> parent > tag are more limited (e.g. we could throw if we see another <item> > tag while handling an item tag). XMLPullParser has support for some "validation mode": > I see in the XmlPullParser docs that you can set the parser to run > in validation mode – is there a reason we didn't run in that? If > so, I'd add a comment. Nested tags (in <item>): > This will cause problems if item or entry tags can appear inside > one another. While this might be against the feed specification, > we could still receive an invalid feed – you should probably ensure > state.currentItem == null before assigning (and it also has the > benefit of explicitly stating assumptions in the code). Text nodes that contain tags with text nodes that are not human-readable text: > I'm unfamiliar with the RSS spec, but this can be concerning if a new > tag opens and we start appending its mysterious text content, e.g.: > > <title>We are trying to get this text<image src=...>Pretend this is alt-text for an emoji</image> and we want this text too</title> > > But I can see a case for it with <title><a ...></a></title>. > Can you add a comment explaining your thinking here? Add tests for invalid feeds: > Might be good to add some invalid feeds in here too, e.g. empty feeds, > feeds with <item> inside <item> tags, invalid xml (e.g. non-closed tags, > broken tags), etc.
See Also: → 1256641
This code is gone.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.