Closed Bug 1237577 Opened 10 years ago Closed 9 years ago

Replacing RaptureXML/SWXMLHash with a library that is a Carthage compliant, actively maintained Swift project

Tracking

()

Status:

RESOLVED FIXED

Iteration:

1.5

Tracking Flags:

Tracking

Status

fxios

---

People

(Reporter: fluffyemily, Assigned: farhan, Mentored)

References

Details

(Whiteboard: [mobileAS])

Attachments

(2 files)

Pull Request 9 years ago farhanpatel.17 48 bytes, text/x-github-pull-request	fluffyemily : review-	Details \| Review
Pull Request 9 years ago Farhan Patel [:farhan] 48 bytes, text/x-github-pull-request	bnicholson : review+	Details \| Review

Emily Toop (:fluffyemily)

Reporter

Description

•

10 years ago

Code that currently uses RaptureXML should be moved to use SWXMLHash. This is because RaptureXML is not a supported project, we don't need 2 different xml parsing libraries in the app and Rapture is also no carthage compliant.

Emily Toop (:fluffyemily)

Reporter

Updated

•

10 years ago

Depends on: 1237578

Emily Toop (:fluffyemily)

Reporter

Updated

•

10 years ago

Assignee: etoop → nobody

Richard Newman [:rnewman]

Updated

•

10 years ago

Rank: 4

tracking-fxios: ? → +

Hardware: Other → All

Richard Newman [:rnewman]

Updated

•

10 years ago

Rank: 4 → 6

James Hugman [:jhugman] [@jhugman]

Comment 1

•

10 years ago

More research: SWXMLHash seems only used for parsing well known XML files (e.g. opensearch.xml), and RaptureXML is used parsing HTML to get Favicons. This suggests we need to think again. This may mean a departure from downloading and parsing HTML in swift, or adopting another library that is supported.

Emily Toop (:fluffyemily)

Reporter

Updated

•

10 years ago

No longer blocks: 1223403

farhanpatel.17

Comment 2

•

9 years ago

Attached file Pull Request — Details

I agree with James, it isnt quite clear if SWXMLHash is okay with HTML. It works fine and some of their examples are examples of small HTML pages but it seems more like a library focused on XML and parsing properly structured XML based on the github page. I've submitted a PR for this. It's just a PR for everyone to see how it would look with SWXML. I didn't remove RaptureXML but if I get the go ahead with this I'll add that to the PR and squash after. I've also left some comments on the PR. Would love any feedback. If we want to be safer it would be better to pick a new library that specializes in parsing html and switch to that. We could then use that new library to parse the XML that SWXML is currently being used for and remove SWXML. The top 3 libraries I found for this are https://github.com/honghaoz/Ji https://github.com/tid-kijyun/Kanna https://github.com/cezheng/Fuzi

Attachment #8726978 - Flags: review?(etoop)

farhanpatel.17

Comment 3

•

9 years ago

Well. After some more testing. SWXML doesn't work with many pages. More times than it should it falls back to the favicon. I'll look into a new library if we want to close this ticket that way.

Emily Toop (:fluffyemily)

Reporter

Comment 4

•

9 years ago

So, I agree that if SWXMLHash doesn't work reliably for favicons then we should choose another library. We only parse XML/HTML in 2 places in our codebase, Favicons and OpenSearch, and we use two separate libraries to do that each of those things. One of those libraries, RaptureXML is no longer maintained, is not carthage compliant and is written in Objective C. The other is SWXMLHash, hence the idea of moving Rapture code over to use SWXMLHash. The ultimate aim of this issue is to unify our XML/HTML parsing code to use the same library - one that is actively maintained, carthage compliant and, preferably, written in Swift. I shall amend the title of this bug to reflect that.

Emily Toop (:fluffyemily)

Reporter

Updated

•

9 years ago

Summary: Migrate code using RaptureXML to SWXMLHash → Replacing RaptureXML/SWXMLHash with a library that is a Carthage compliant, actively maintained Swift project

Emily Toop (:fluffyemily)

Reporter

Comment 5

•

9 years ago

(In reply to farhanpatel.17 from comment #2) > The top 3 libraries I found for this are > https://github.com/honghaoz/Ji > https://github.com/tid-kijyun/Kanna > https://github.com/cezheng/Fuzi I've had a look at these. Ji seems to have the cleanest API. I like the way that Fuzi uses errors so you can get actual information from the API as to why things have failed, but the author isn't using the best code to create the project so I would be a little wary of the results. All 3 are actively maintained, swift and carthage compliant. My usual approach at this point would be to pick a couple (probably Ji and Kanna), and run a comparison between the two, seeing what the implementation would look like and running performance metrics. However, that is quite a lot to ask a contributor to do :) If you would like to do this, by all means do. If, however, you'd rather not, leave it with me. I'll run the comparison tests and post back here which library we'd like to go with, if you'd like?

Flags: needinfo?(farhanpatel.17)

farhanpatel.17

Comment 6

•

9 years ago

It would be best if you could compare the libraries. You'll have a better idea of what performance metrics are important to the project. I can submit a PR once you choose a library. Thank you.

Flags: needinfo?(farhanpatel.17)

Emily Toop (:fluffyemily)

Reporter

Comment 7

•

9 years ago

Comment on attachment 8726978 [details] [review] Pull Request I'll run the comparison tests on the possible replacements and let you know which one we decide to go with. Thanks for the PR :)

Attachment #8726978 - Flags: review?(etoop) → review-

cezheng.cs

Comment 8

•

9 years ago

(In reply to Emily Toop (:fluffyemily) from comment #5) > (In reply to farhanpatel.17 from comment #2) > > The top 3 libraries I found for this are > > https://github.com/honghaoz/Ji > > https://github.com/tid-kijyun/Kanna > > https://github.com/cezheng/Fuzi > > I've had a look at these. Ji seems to have the cleanest API. I like the way > that Fuzi uses errors so you can get actual information from the API as to > why things have failed, but the author isn't using the best code to create > the project so I would be a little wary of the results. > > All 3 are actively maintained, swift and carthage compliant. My usual > approach at this point would be to pick a couple (probably Ji and Kanna), > and run a comparison between the two, seeing what the implementation would > look like and running performance metrics. However, that is quite a lot to > ask a contributor to do :) > > If you would like to do this, by all means do. If, however, you'd rather > not, leave it with me. I'll run the comparison tests and post back here > which library we'd like to go with, if you'd like? Maybe a little bit late to join this topic, but I'm a little bit curious about why you think Fuzi's author isn't using the best code to create the project though? AFAIK, Fuzi is careful about libxml2's API usage and I has been memory leak free ever since its first release, and many of those libxml2 memory leaks were not fixed until recent versions of Ji or Kanna. What I personally think about these 3 libraries is that: 1. Fuzi is the fastest(due to heavy use of lazy) and the only one with error reporting. 2. Kanna has the most comprehensive CSS -> XPath conversion logic. 3. Ji has some convenient methods such as parsing from URL directly, but no CSS support TBH, I think CSS -> XPath conversion shouldn't be used at all since it's just using Regex to do the trick.

cezheng.cs

Comment 9

•

9 years ago

How these 3 projects were created, AFAIK Fuzi: Based on Ono, fix some leaks and bugs, improved error reporting. Ji: Initially based on hpple, but did many optimizations & reduced a lot of copy operations Kanna: Pretty much inspired directly by nokogiri, and the implementation of the library seems to be original. For objc libraries, Ono was much faster than hpple due to the heavy use of lazy initialization, Fuzi just took that from Ono.

Farhan Patel [:farhan]

Assignee

Comment 10

•

9 years ago

Thanks for the well thought out analysis! I cant speak for Emily on why she didn't like Fuzi. We haven't had the chance to work on this and if you would like, you are welcome to assign this to yourself and submit a PR using a library you think would work best. I would prefer a lazy loading library because in the case of parsing HTML we only use the information in the head tag. We also don't need any fancy css or XPath support. The XML libraries are used in 2 places 1) Parsing <link rel="icon"> from HTML web pages 2) Parsing OpenSearch formatted XML files A simple metric you can use to compare the speed of the libraries for the specific use cases is to use the Instruments Time Profiler to get a rough estimate on performance. This would also be helpful to make sure there aren't any regressions.

Joel Parker Henderson

Comment 11

•

9 years ago

Any update? I'm considering Kanna, Ji, Fuzi, etc. for a similar kind of project. So far I am leaning toward Kanna because the author seems to be the most up to date, such as supporting the new Swift 3 beta. Advice is much appreciated.

Dooofy

Comment 12

•

9 years ago

If you don't need to use CSS selectors, I think Fuzi or Ji might be better. For Swift 3, both Kanna & Fuzi has a branch for Swift 3(I just finished migrating one of my personal app that uses Fuzi and it works well). From cocoapods info page Fuzi seems to have a much smaller integration size than Kanna which was the reason I picked Fuzi at the very beginning, they ain't that different after all though

Farhan Patel [:farhan]

Assignee

Updated

•

9 years ago

Assignee: nobody → fpatel

Farhan Patel [:farhan]

Assignee

Comment 13

•

9 years ago

Now seems like a good time to fix this. There are a lot of depreciation warnings related to SWXMLHash. I'll look for a new library that fixes these warnings and is compat with swift 2.3/3.0

Priority: -- → P2

Whiteboard: [mobileAS]

Jenn Chaulk (:jchaulk)

Updated

•

9 years ago

Priority: P2 → P1

Farhan Patel [:farhan]

Assignee

Comment 14

•

9 years ago

Attached file Pull Request — Details

Attachment #8793126 - Flags: review?(bnicholson)

Stephan Leroux [:sleroux]

Updated

•

9 years ago

Comment 15

•

9 years ago

Comment on attachment 8793126 [details] [review] Pull Request Awesome!

Attachment #8793126 - Flags: review?(bnicholson) → review+

Farhan Patel [:farhan]

Assignee

Comment 16

•

9 years ago

master 2523a9189e1125812be925942d6566668c8477ce

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Jenn Chaulk (:jchaulk)

Updated

•

9 years ago

Iteration: --- → 1.5

You need to log in before you can comment on or make changes to this bug.