Closed Bug 858370 Opened 12 years ago Closed 7 years ago

[email] reduce use of regexp's in linkification logic for performance reasons / port gecko text-to-html linkifying logic

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect, P5)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: asuth, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [c= p= u= s=])

Although we have greatly improved our linkification performance in bug 846373, our use of regular expressions still means performance is worse than it could be. See https://bugzilla.mozilla.org/show_bug.cgi?id=846373#c8 for numbers. The current plan is to largely port the logic used by Gecko, as described at https://bugzilla.mozilla.org/show_bug.cgi?id=846373#c9, which I include here:

Based on the jsperf results (thanks to you both!), at our perf standup we determined that:
1) The regex is just way too slow, so we want to eventually port the Gecko solution or at least mimic it.
2) We want to run it on the worker so we can pre-compute the linkification. Because of how our HTML sanitizer works, we can be much more performant for HTML in particular.
3) Because the current implementation is so pathological in certain cases, once :lightsofapollo finishes the review, I think we will land that and then spin off the porting effort.

For the spin-off bug, here's a summary of what to port, with references, which we should copy over. I have no plans to work on this currently.

In a nutshell, Gecko scans the string for ':', '@', or '.':
http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#1170
and then looks for URLs around that character:
http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#493

There is a shockingly well documented header file (yay BenB!):
http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/mozTXTToHTMLConv.h

The mechanism tries 4 modes in succession, which can be briefly summarized as: <URL:blah>, <blah>, proto:blah, blah.
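Here is a minimal JavaScript sketch of the trigger-character approach described above. It is written for illustration only and is not a port of mozTXTToHTMLConv.cpp. The names `findLinks`, `classify` and `schemeOf`, as well as the character sets, are assumptions. The sketch scans for ':', '@' or '.', grows a token outwards from that character without any regexp, and then tries the four modes in order. Unlike Gecko, it does no per-scheme validity checking. Its bare "blah" mode only recognizes "www." hosts and e-mail addresses.

```javascript
// Hypothetical sketch, not Gecko's or Gaia's code: find link candidates by
// scanning for trigger characters instead of running a regexp over the text.
const TRIGGERS = ":@.";
const TRAILING = ".,;:!?"; // sentence punctuation trimmed from a token's end
// Characters that may continue a token; '<', '>', quotes, parentheses and
// whitespace are excluded so that they end it.
const URL_PUNCT = "-._~:/?#[]@!$&*+,;=%";

function isAlpha(ch) {
  return (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z");
}

function isUrlChar(ch) {
  return isAlpha(ch) || (ch >= "0" && ch <= "9") || URL_PUNCT.includes(ch);
}

// Returns the lower-cased scheme if `s` looks like "scheme:rest", else null.
// RFC 3986 also allows '.' in schemes; it is rejected here so that
// "www.example.org:8080" is not mistaken for a scheme.
function schemeOf(s) {
  const colon = s.indexOf(":");
  if (colon < 1 || colon === s.length - 1) return null;
  for (let k = 0; k < colon; k++) {
    const ch = s[k];
    const ok = isAlpha(ch) ||
      (k > 0 && ((ch >= "0" && ch <= "9") || ch === "+" || ch === "-"));
    if (!ok) return null;
  }
  return s.slice(0, colon).toLowerCase();
}

// Tries the four modes in succession on text[start, end).
function classify(text, start, end) {
  const s = text.slice(start, end);
  const bracketed = text[start - 1] === "<" && text[end] === ">";
  if (bracketed && s.startsWith("URL:") && schemeOf(s.slice(4))) {
    return { mode: "<URL:blah>", href: s.slice(4) };
  }
  if (bracketed && schemeOf(s)) return { mode: "<blah>", href: s };
  if (schemeOf(s)) return { mode: "proto:blah", href: s };
  if (s.length > 4 && s.slice(0, 4).toLowerCase() === "www.") {
    return { mode: "blah", href: "http://" + s };
  }
  const at = s.indexOf("@");
  if (at > 0 && s.indexOf(".", at) > at + 1) {
    return { mode: "blah", href: "mailto:" + s };
  }
  return null;
}

function findLinks(text) {
  const links = [];
  let i = 0;
  while (i < text.length) {
    if (!TRIGGERS.includes(text[i])) { i++; continue; }
    // Grow the token outwards from the trigger character.
    let start = i;
    while (start > 0 && isUrlChar(text[start - 1])) start--;
    let tokenEnd = i + 1;
    while (tokenEnd < text.length && isUrlChar(text[tokenEnd])) tokenEnd++;
    // Drop trailing sentence punctuation that is probably not part of a URL.
    let end = tokenEnd;
    while (end > start && TRAILING.includes(text[end - 1])) end--;
    const link = end > start ? classify(text, start, end) : null;
    if (link) links.push({ start, end, ...link });
    // Every other trigger inside [start, tokenEnd) yields the same token.
    i = tokenEnd;
  }
  return links;
}
```

For example, `findLinks("See <URL:http://example.org/a>, www.mozilla.org or asuth@example.com.")` finds three links, with hrefs `http://example.org/a`, `http://www.mozilla.org` and `mailto:asuth@example.com`. Skipping to the end of each token means every character is examined a bounded number of times, which is the property that makes this style cheaper than a backtracking regexp.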
See Also: → 846373
Thanks, Andrew! Here are more high-level docs: http://www.bucksch.org/1/projects/mozilla/16507/
Also relevant (implemented in libmime): http://www.bucksch.org/1/projects/mozilla/31906/

As for the URL recognizer, I really like what it does, but its code could be more elegant. It is good that it uses nsIURI to check whether the concrete URL is valid (a different check per scheme), and that it only linkifies URLs whose scheme actually has an application handler registered on this machine. E.g. if a phone app is installed, tel: will be linkified, otherwise not. This trick prevents a lot of false positives and false negatives. If you can somehow achieve that in Gaia, that would help a *lot*.
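Gaia has no nsIURI. A rough approximation of the suggestion above can be sketched with the standard WHATWG `URL` constructor (available in browsers and Node) plus a set of schemes that have a handler. The `shouldLinkify` name, the `handledSchemes` set and the per-scheme checks are assumptions for illustration. How Gaia would learn which schemes actually have a registered handler is left open.

```javascript
// Hypothetical sketch: only linkify syntactically valid URLs whose scheme has
// a handler. `handledSchemes` (e.g. new Set(["http", "https", "mailto"])) is
// assumed to be supplied by the caller; nothing here queries a real registry.
function shouldLinkify(href, handledSchemes) {
  let url;
  try {
    url = new URL(href); // throws TypeError on invalid or relative input
  } catch (e) {
    return false;
  }
  const scheme = url.protocol.slice(0, -1); // "tel:" -> "tel"
  if (!handledSchemes.has(scheme)) return false;

  // Simple per-scheme checks, loosely in the spirit of nsIURI's
  // scheme-specific parsing (not a reimplementation of it).
  if (scheme === "mailto") {
    const at = url.pathname.indexOf("@");
    return at > 0 && at < url.pathname.length - 1;
  }
  if (scheme === "tel") {
    let digits = 0;
    for (const ch of url.pathname) {
      if (ch >= "0" && ch <= "9") digits++;
      else if (!"+-.()".includes(ch)) return false;
    }
    return digits > 0;
  }
  return true;
}
```

With `new Set(["http", "https", "mailto"])`, a candidate such as `tel:+15550100` is rejected and left as plain text. Adding `"tel"` to the set (a device with a phone app) makes the same number linkify, which mirrors the tel: example above.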
Since you want to use Gecko's implementation or mimic it, be aware of the differences between it and the current implementation. For example, in a Try server mail, the link to the files on the FTP server is broken because the commit hash is not part of the link. As shown in the Mail app on Gecko 1.1.0.0-pre 20130417070204 on Unagi:

http://ftp.mozilla.org /pub/mozilla.org/thunderbird /try-builds /<my-mail-address>- 54b23416992a

"-54b23416992a" is not part of the link. Hopefully the conversion done here will make this issue go away.
Keywords: perf
Whiteboard: [c= p= u= s=]
See Also: → 906592
Summary: [email] reduce use of regexp's in linkification logic for performance reasons → [email] reduce use of regexp's in linkification logic for performance reasons / port gecko text-to-html linkifying logic
Priority: -- → P5
Blocks: 948604
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX