If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Replacing RaptureXML/SWXMLHash with a library that is a Carthage compliant, actively maintained Swift project

RESOLVED FIXED

Status

()

Firefox for iOS
Build & Test
P1
normal
Rank:
6
RESOLVED FIXED
2 years ago
11 months ago

People

(Reporter: fluffyemily, Assigned: farhan, Mentored)

Tracking

unspecified
All
iOS

Firefox Tracking Flags

(fxios+)

Details

(Whiteboard: [mobileAS])

Attachments

(2 attachments)

48 bytes, text/x-github-pull-request
fluffyemily
: review-
Details | Review | Splinter Review
48 bytes, text/x-github-pull-request
bnicholson
: review+
Details | Review | Splinter Review
(Reporter)

Description

2 years ago
Code that currently uses RaptureXML should be moved to use SWXMLHash. This is because RaptureXML is not a supported project, we don't need 2 different xml parsing libraries in the app and Rapture is also no carthage compliant.
(Reporter)

Updated

2 years ago
Depends on: 1237578
(Reporter)

Updated

2 years ago
Assignee: etoop → nobody
Rank: 4
tracking-fxios: ? → +
Hardware: Other → All
Rank: 4 → 6
More research: SWXMLHash seems only used for parsing well known XML files (e.g. opensearch.xml), and RaptureXML is used parsing HTML to get Favicons.

This suggests we need to think again. This may mean a departure from downloading and parsing HTML in swift, or adopting another library that is supported.
(Reporter)

Updated

2 years ago
No longer blocks: 1223403

Comment 2

2 years ago
Created attachment 8726978 [details] [review]
Pull Request

I agree with James, it isnt quite clear if SWXMLHash is okay with HTML. It works fine and some of their examples are examples of small HTML pages but it seems more like a library focused on XML and parsing properly structured XML based on the github page. 

I've submitted a PR for this. It's just a PR for everyone to see how it would look with SWXML. I didn't remove RaptureXML but if I get the go ahead with this I'll add that to the PR and squash after. I've also left some comments on the PR. Would love any feedback.

If we want to be safer it would be better to pick a new library that specializes in parsing html and switch to that. We could then use that new library to parse the XML that SWXML is currently being used for and remove SWXML. The top 3 libraries I found for this are 
https://github.com/honghaoz/Ji
https://github.com/tid-kijyun/Kanna
https://github.com/cezheng/Fuzi
Attachment #8726978 - Flags: review?(etoop)

Comment 3

2 years ago
Well. After some more testing. SWXML doesn't work with many pages. More times than it should it falls back to the favicon. I'll look into a new library if we want to close this ticket that way.
(Reporter)

Comment 4

2 years ago
So, I agree that if SWXMLHash doesn't work reliably for favicons then we should choose another library. 

We only parse XML/HTML in 2 places in our codebase, Favicons and OpenSearch, and we use two separate libraries to do that each of those things. One of those libraries, RaptureXML is no longer maintained, is not carthage compliant and is written in Objective C. The other is SWXMLHash, hence the idea of moving Rapture code over to use SWXMLHash. 

The ultimate aim of this issue is to unify our XML/HTML parsing code to use the same library - one that is actively maintained, carthage compliant and, preferably, written in Swift. I shall amend the title of this bug to reflect that.
(Reporter)

Updated

2 years ago
Summary: Migrate code using RaptureXML to SWXMLHash → Replacing RaptureXML/SWXMLHash with a library that is a Carthage compliant, actively maintained Swift project
(Reporter)

Comment 5

2 years ago
(In reply to farhanpatel.17 from comment #2)
> The top 3 libraries I found for this are 
> https://github.com/honghaoz/Ji
> https://github.com/tid-kijyun/Kanna
> https://github.com/cezheng/Fuzi

I've had a look at these. Ji seems to have the cleanest API. I like the way that Fuzi uses errors so you can get actual information from the API as to why things have failed, but the author isn't using the best code to create the project so I would be a little wary of the results. 

All 3 are actively maintained, swift and carthage compliant. My usual approach at this point would be to pick a couple (probably Ji and Kanna), and run a comparison between the two, seeing what the implementation would look like and running performance metrics. However, that is quite a lot to ask a contributor to do :)

If you would like to do this, by all means do. If, however, you'd rather not, leave it with me. I'll run the comparison tests and post back here which library we'd like to go with, if you'd like?
Flags: needinfo?(farhanpatel.17)

Comment 6

2 years ago
It would be best if you could compare the libraries. You'll have a better idea of what performance metrics are important to the project. I can submit a PR once you choose a library.
Thank you.
Flags: needinfo?(farhanpatel.17)
(Reporter)

Comment 7

2 years ago
Comment on attachment 8726978 [details] [review]
Pull Request

I'll run the comparison tests on the possible replacements and let you know which one we decide to go with.

Thanks for the PR :)
Attachment #8726978 - Flags: review?(etoop) → review-

Comment 8

a year ago
(In reply to Emily Toop (:fluffyemily) from comment #5)
> (In reply to farhanpatel.17 from comment #2)
> > The top 3 libraries I found for this are 
> > https://github.com/honghaoz/Ji
> > https://github.com/tid-kijyun/Kanna
> > https://github.com/cezheng/Fuzi
> 
> I've had a look at these. Ji seems to have the cleanest API. I like the way
> that Fuzi uses errors so you can get actual information from the API as to
> why things have failed, but the author isn't using the best code to create
> the project so I would be a little wary of the results. 
> 
> All 3 are actively maintained, swift and carthage compliant. My usual
> approach at this point would be to pick a couple (probably Ji and Kanna),
> and run a comparison between the two, seeing what the implementation would
> look like and running performance metrics. However, that is quite a lot to
> ask a contributor to do :)
> 
> If you would like to do this, by all means do. If, however, you'd rather
> not, leave it with me. I'll run the comparison tests and post back here
> which library we'd like to go with, if you'd like?

Maybe a little bit late to join this topic, but I'm a little bit curious about why you think Fuzi's author isn't using the best code to create the project though?

AFAIK, Fuzi is careful about libxml2's API usage and I has been memory leak free ever since its first release, and many of those libxml2 memory leaks were not fixed until recent versions of Ji or Kanna.

What I personally think about these 3 libraries is that:
1. Fuzi is the fastest(due to heavy use of lazy) and the only one with error reporting.
2. Kanna has the most comprehensive CSS -> XPath conversion logic.
3. Ji has some convenient methods such as parsing from URL directly, but no CSS support

TBH, I think CSS -> XPath conversion shouldn't be used at all since it's just using Regex to do the trick.

Comment 9

a year ago
How these 3 projects were created, AFAIK

Fuzi: Based on Ono, fix some leaks and bugs, improved error reporting.
Ji: Initially based on hpple, but did many optimizations & reduced a lot of copy operations
Kanna: Pretty much inspired directly by nokogiri, and the implementation of the library seems to be original.

For objc libraries, Ono was much faster than hpple due to the heavy use of lazy initialization, Fuzi just took that from Ono.
(Assignee)

Comment 10

a year ago
Thanks for the well thought out analysis! I cant speak for Emily on why she didn't like Fuzi. We haven't had the chance to work on this and if you would like, you are welcome to assign this to yourself and submit a PR using a library you think would work best. 

I would prefer a lazy loading library because in the case of parsing HTML we only use the information in the head tag. We also don't need any fancy css or XPath support. 

The XML libraries are used in 2 places
1) Parsing <link rel="icon"> from HTML web pages
2) Parsing OpenSearch formatted XML files

A simple metric you can use to compare the speed of the libraries for the specific use cases is to use the Instruments Time Profiler to get a rough estimate on performance. This would also be helpful to make sure there aren't any regressions.
Any update? I'm considering Kanna, Ji, Fuzi, etc. for a similar kind of project.

So far I am leaning toward Kanna because the author seems to be the most up to date, such as supporting the new Swift 3 beta. 

Advice is much appreciated.

Comment 12

a year ago
If you don't need to use CSS selectors, I think Fuzi or Ji might be better. For Swift 3, both Kanna & Fuzi has a branch for Swift 3(I just finished migrating one of my personal app that uses Fuzi and it works well). From cocoapods info page Fuzi seems to have a much smaller integration size than Kanna which was the reason I picked Fuzi at the very beginning, they ain't that different after all though
(Assignee)

Updated

a year ago
Assignee: nobody → fpatel
(Assignee)

Comment 13

a year ago
Now seems like a good time to fix this. There are a lot of depreciation warnings related to SWXMLHash. I'll look for a  new library that fixes these warnings and is compat with swift 2.3/3.0
Priority: -- → P2
Whiteboard: [mobileAS]
Priority: P2 → P1
(Assignee)

Comment 14

a year ago
Created attachment 8793126 [details] [review]
Pull Request
Attachment #8793126 - Flags: review?(bnicholson)
See Also: → bug 1303756
Comment on attachment 8793126 [details] [review]
Pull Request

Awesome!
Attachment #8793126 - Flags: review?(bnicholson) → review+
(Assignee)

Comment 16

a year ago
master 2523a9189e1125812be925942d6566668c8477ce
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED

Updated

11 months ago
Iteration: --- → 1.5
You need to log in before you can comment on or make changes to this bug.