Code that currently uses RaptureXML should be moved to use SWXMLHash. This is because RaptureXML is not a supported project, we don't need 2 different xml parsing libraries in the app and Rapture is also no carthage compliant.
More research: SWXMLHash seems only used for parsing well known XML files (e.g. opensearch.xml), and RaptureXML is used parsing HTML to get Favicons. This suggests we need to think again. This may mean a departure from downloading and parsing HTML in swift, or adopting another library that is supported.
Created attachment 8726978 [details] [review] Pull Request I agree with James, it isnt quite clear if SWXMLHash is okay with HTML. It works fine and some of their examples are examples of small HTML pages but it seems more like a library focused on XML and parsing properly structured XML based on the github page. I've submitted a PR for this. It's just a PR for everyone to see how it would look with SWXML. I didn't remove RaptureXML but if I get the go ahead with this I'll add that to the PR and squash after. I've also left some comments on the PR. Would love any feedback. If we want to be safer it would be better to pick a new library that specializes in parsing html and switch to that. We could then use that new library to parse the XML that SWXML is currently being used for and remove SWXML. The top 3 libraries I found for this are https://github.com/honghaoz/Ji https://github.com/tid-kijyun/Kanna https://github.com/cezheng/Fuzi
Well. After some more testing. SWXML doesn't work with many pages. More times than it should it falls back to the favicon. I'll look into a new library if we want to close this ticket that way.
So, I agree that if SWXMLHash doesn't work reliably for favicons then we should choose another library. We only parse XML/HTML in 2 places in our codebase, Favicons and OpenSearch, and we use two separate libraries to do that each of those things. One of those libraries, RaptureXML is no longer maintained, is not carthage compliant and is written in Objective C. The other is SWXMLHash, hence the idea of moving Rapture code over to use SWXMLHash. The ultimate aim of this issue is to unify our XML/HTML parsing code to use the same library - one that is actively maintained, carthage compliant and, preferably, written in Swift. I shall amend the title of this bug to reflect that.
(In reply to farhanpatel.17 from comment #2) > The top 3 libraries I found for this are > https://github.com/honghaoz/Ji > https://github.com/tid-kijyun/Kanna > https://github.com/cezheng/Fuzi I've had a look at these. Ji seems to have the cleanest API. I like the way that Fuzi uses errors so you can get actual information from the API as to why things have failed, but the author isn't using the best code to create the project so I would be a little wary of the results. All 3 are actively maintained, swift and carthage compliant. My usual approach at this point would be to pick a couple (probably Ji and Kanna), and run a comparison between the two, seeing what the implementation would look like and running performance metrics. However, that is quite a lot to ask a contributor to do :) If you would like to do this, by all means do. If, however, you'd rather not, leave it with me. I'll run the comparison tests and post back here which library we'd like to go with, if you'd like?
It would be best if you could compare the libraries. You'll have a better idea of what performance metrics are important to the project. I can submit a PR once you choose a library. Thank you.
Comment on attachment 8726978 [details] [review] Pull Request I'll run the comparison tests on the possible replacements and let you know which one we decide to go with. Thanks for the PR :)
(In reply to Emily Toop (:fluffyemily) from comment #5) > (In reply to farhanpatel.17 from comment #2) > > The top 3 libraries I found for this are > > https://github.com/honghaoz/Ji > > https://github.com/tid-kijyun/Kanna > > https://github.com/cezheng/Fuzi > > I've had a look at these. Ji seems to have the cleanest API. I like the way > that Fuzi uses errors so you can get actual information from the API as to > why things have failed, but the author isn't using the best code to create > the project so I would be a little wary of the results. > > All 3 are actively maintained, swift and carthage compliant. My usual > approach at this point would be to pick a couple (probably Ji and Kanna), > and run a comparison between the two, seeing what the implementation would > look like and running performance metrics. However, that is quite a lot to > ask a contributor to do :) > > If you would like to do this, by all means do. If, however, you'd rather > not, leave it with me. I'll run the comparison tests and post back here > which library we'd like to go with, if you'd like? Maybe a little bit late to join this topic, but I'm a little bit curious about why you think Fuzi's author isn't using the best code to create the project though? AFAIK, Fuzi is careful about libxml2's API usage and I has been memory leak free ever since its first release, and many of those libxml2 memory leaks were not fixed until recent versions of Ji or Kanna. What I personally think about these 3 libraries is that: 1. Fuzi is the fastest(due to heavy use of lazy) and the only one with error reporting. 2. Kanna has the most comprehensive CSS -> XPath conversion logic. 3. Ji has some convenient methods such as parsing from URL directly, but no CSS support TBH, I think CSS -> XPath conversion shouldn't be used at all since it's just using Regex to do the trick.
How these 3 projects were created, AFAIK Fuzi: Based on Ono, fix some leaks and bugs, improved error reporting. Ji: Initially based on hpple, but did many optimizations & reduced a lot of copy operations Kanna: Pretty much inspired directly by nokogiri, and the implementation of the library seems to be original. For objc libraries, Ono was much faster than hpple due to the heavy use of lazy initialization, Fuzi just took that from Ono.
Thanks for the well thought out analysis! I cant speak for Emily on why she didn't like Fuzi. We haven't had the chance to work on this and if you would like, you are welcome to assign this to yourself and submit a PR using a library you think would work best. I would prefer a lazy loading library because in the case of parsing HTML we only use the information in the head tag. We also don't need any fancy css or XPath support. The XML libraries are used in 2 places 1) Parsing <link rel="icon"> from HTML web pages 2) Parsing OpenSearch formatted XML files A simple metric you can use to compare the speed of the libraries for the specific use cases is to use the Instruments Time Profiler to get a rough estimate on performance. This would also be helpful to make sure there aren't any regressions.
Any update? I'm considering Kanna, Ji, Fuzi, etc. for a similar kind of project. So far I am leaning toward Kanna because the author seems to be the most up to date, such as supporting the new Swift 3 beta. Advice is much appreciated.
If you don't need to use CSS selectors, I think Fuzi or Ji might be better. For Swift 3, both Kanna & Fuzi has a branch for Swift 3(I just finished migrating one of my personal app that uses Fuzi and it works well). From cocoapods info page Fuzi seems to have a much smaller integration size than Kanna which was the reason I picked Fuzi at the very beginning, they ain't that different after all though
Now seems like a good time to fix this. There are a lot of depreciation warnings related to SWXMLHash. I'll look for a new library that fixes these warnings and is compat with swift 2.3/3.0
Created attachment 8793126 [details] [review] Pull Request
Comment on attachment 8793126 [details] [review] Pull Request Awesome!