Closed Bug 19251 Opened 26 years ago Closed 26 years ago

improve way to recognize URLs in messages

Categories

(MailNews Core :: MIME, enhancement, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: warrensomebody, Assigned: BenB)

References

(Blocks 1 open bug)

Details

Attachments

(6 files)

The current mechanism for recognizing URLs in mail messages is hard coded to only look for a few URL schemes. We need to make this extensible so that URLs associated with all protocol plugins are recognized. For instance, jar: URLs aren't recognized right now. To do this, I think all we have to do is first detect something that looks like a protocol scheme (e.g. "foo:") and then take the text up to the next whitespace character and hand it to nsIOService::NewURI. If this successfully constructs a URL, then we know that the protocol scheme does correspond to an installed protocol plugin, and that the URL should be converted into an actual link in the text.
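A minimal sketch of the approach proposed above, written in Python for brevity rather than in the Mozilla C++ code. The names `KNOWN_SCHEMES`, `try_new_uri` and `find_links` are hypothetical; `try_new_uri` only stands in for the call to nsIOService::NewURI, and the set of schemes is an assumption standing in for the installed protocol plugins.

```python
import re

# Assumption: stand-in for the protocol handlers that are actually installed.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "news", "jar"}

# Something that looks like a protocol scheme ("foo:") followed by more text.
SCHEME_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9+.-]*):(?=\S)")


def try_new_uri(candidate):
    """Hypothetical stand-in for nsIOService::NewURI.

    Returns the candidate if its scheme has a known handler, else None.
    """
    scheme, _, rest = candidate.partition(":")
    if scheme.lower() in KNOWN_SCHEMES and rest:
        return candidate
    return None


def find_links(text):
    """Return every whitespace-delimited token that parses as a URL."""
    links = []
    pos = 0
    while True:
        match = SCHEME_RE.search(text, pos)
        if match is None:
            break
        # Take the text up to the next whitespace character.
        end = match.start()
        while end < len(text) and not text[end].isspace():
            end += 1
        candidate = text[match.start():end]
        if try_new_uri(candidate) is not None:
            links.append(candidate)
            pos = end
        else:
            # Unknown scheme: keep scanning right after the colon.
            pos = match.end()
    return links
```

With this sketch, `find_links("see jar:file:///a.jar!/x and foo:bar now")` keeps the jar: URL and drops `foo:bar`, because no handler is registered for `foo`.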
Assignee: phil → rhp
Reassign to rhp
Ben had been working on this code (actually working on a rewrite of some of these routines) so he would be the person to look at this. - rhp
That's: Ben Bucksch http://www.bucksch.org
Assignee: rhp → mozilla
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Accepting
Severity: normal → enhancement
Component: Front End → MIME
OS: other → All
Priority: P3 → P1
Target Milestone: M19 → M12
The most basic recognition functionality seems to work. Need to do more testing. Not working: mailto, abbreviated URLs. All the other functions in the class (ParseURL etc.) still don't use Necko and need to be rewritten (by me).
Some description of my code: It works mode-based: modes are tested in sequence (defined by a const), and the first successful one wins. The modes are the following (copied from the source code):
RFC1738, /* Check, if RFC1738, APPENDIX compliant, like <URL:http://www.mozilla.org>. */
RFC2396, /* RFC2396, APPENDIX E allows anglebrackets (like <http://www.mozilla.org>) or quotation marks (like "http://www.mozilla.org") (w/o "URL:"). */
freetext /* assume heading scheme with "[a-zA-Z0-9]*:" like "news:". Certain characters (see code) or any whitespace (including linebreaks) end the URL. Other certain (punctuation) characters (see code) at the end are stripped off. */
/* RFC1738 and RFC2396 type URLs may use multiple lines, whitespace is stripped. Special characters like ',' stay intact. */
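To illustrate the mode-based scheme described above, here is a rough Python sketch, not the actual FindURL code. The regular expressions, the `MODES` table, the `TRAILING_PUNCT` set and the function name `find_url` are all assumptions made for the example; the real terminating and stripped characters are defined in the source ("see code"). The sketch also requires at least one scheme character, whereas the comment above says "[a-zA-Z0-9]*:".

```python
import re

SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*"

# Modes are tried in this order; the first one that matches wins.
# The third field says whether embedded whitespace is stripped.
MODES = (
    ("RFC1738", re.compile(r"<URL:([^<>]+)>"), True),
    ("RFC2396",
     re.compile(r'<(%s:[^<>]+)>|"(%s:[^"]+)"' % (SCHEME, SCHEME)),
     True),
    ("freetext", re.compile(r"([A-Za-z0-9]+:\S+)"), False),
)

# Assumption: punctuation stripped from the end of a freetext URL.
TRAILING_PUNCT = ".,;:!?)'\""


def find_url(text):
    """Return (mode_name, url) for the first successful mode, else None."""
    for name, pattern, multiline in MODES:
        match = pattern.search(text)
        if match is None:
            continue
        url = next(group for group in match.groups() if group)
        if multiline:
            # Delimited URLs may span lines; drop all whitespace.
            url = "".join(url.split())
        else:
            url = url.rstrip(TRAILING_PUNCT)
        return name, url
    return None
```

In this sketch a delimited URL keeps characters such as ',' intact, while a freetext URL loses trailing punctuation, which mirrors the distinction drawn in the comments above.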
Sounds like you're saying that you wrote your own recognizer based on the specs, but I'd rather see us use what necko has for consistency. That way if the thing is highlighted, we'll be assured that we can handle it. If necko's url parsing doesn't meet the specs specified, then we should fix it.
Warren, all my function does is decide where the URL starts and ends. I don't know of a Necko function doing this. After that is done, I leave it up to Necko (NS_NewURI) to decide whether the result is valid or not. I'll attach the current code (work in progress). If you still think it should be moved to Necko, please provide me with the necessary background (knowledge) and I'll integrate it.
I forgot to mention: function in question is FindURL.
Depends on: 19313
Blocks: 18410
Blocks: 5351
*** Bug 7176 has been marked as a duplicate of this bug. ***
Blocks: 19992
The patches/files create a new defunct stream converter with an XPCOM interface and 3 (static) functions: ScanTXT, ScanHTML and CiteLevel. The latter is currently unused; ScanTXT is used by mimetpla.cpp and mimetplf.cpp, ScanHTML by nsMsgSendPart. I changed these functions to use the new class and removed the old functions from nsMimeURLUtils. I will ask Shaver if the licence is OK. The callers still need some work for I18N and perf checking to pass the right modes to the functions, but most if not all points are marked with an XXX comment. rhp, can you please review the mime parts and the 3 functions? valeski, can you please review the converter and its integration in Necko? Is it OK that it registers for text/plain? If not, can you make it register with the Factory, so libmime can access it? Tnx.
Typo: "pref(erences) checking", not "perf checking"
Everyone, It probably makes sense for one person to land all of these changes. If you want, I can step up and take that role. I will probably get this stuff ready to rock over the weekend and look for a Monday landing. If anyone objects, please let me know. - rhp PS: Warren: this will include the other changes we talked about today.
Note that the license for moz(I)TXTToHTMLConv is possibly invalid. I may release it under a different licence (e.g. a modified MPL or a new BSD-style (w/o ad restriction) license).
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
Ok gang, this is all checked in now. There seems to be an issue with the emoticon detection that Ben is working on, but other than that, we seem to be working. - rhp
Judging from Ben's comment about just looking at where the URL starts and ends and then calling NS_NewURI, I'm happy. I haven't looked at the code, though. One thing I'd like to see, which I always considered broken in the 4.x releases: if a URL is broken across a line, there should be heuristics that recognize that fact and pick up the rest of the URL as the continuation, e.g.:
bla bla bla bla bla bla bla bla bla bla bla bla bla http://listings.ebay.com/aw/
listings/list/category1497/index.html bla bla bla bla bla bla
The recognizer should notice the "<text>*:" as the start of the URL, then notice that it ends with the newline, and then look on the next line for a string of text containing slashes, dots, etc., and include it in the URL string.
Warren, bug #5351 (dependent on this) addresses the linebreaks in URLs.
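A rough Python sketch of the line-continuation heuristic suggested above, for illustration only. The function name `join_broken_url` and the regular expression are hypothetical, and the sketch assumes a "scheme://" style URL that runs to the end of the line. It will also pick up false positives (for example a next line that starts with "end."), so a real implementation would need stricter rules.

```python
import re

# A URL-looking token that runs right up to the end of the line.
URL_AT_EOL = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://\S*$")


def join_broken_url(line, next_line):
    """Return the URL ending `line`, extended by the start of `next_line`.

    Returns None if `line` does not end in a URL. The first token of
    `next_line` is appended only if it contains a slash or a dot.
    """
    match = URL_AT_EOL.search(line.rstrip())
    if match is None:
        return None
    url = match.group(0)
    tokens = next_line.split()
    if tokens and ("/" in tokens[0] or "." in tokens[0]):
        url += tokens[0]
    return url
```

Applied to the two example lines above, the sketch joins `http://listings.ebay.com/aw/` with `listings/list/category1497/index.html` and ignores the trailing "bla" words.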
Blocks: 21564
No longer blocks: 21564
QA Contact: lchiang → esther
Can anyone give me some test URLs for this bug? I have verified that a mailto link works OK (comment #9), and in comment #22 a long URL with text in front of and behind it was sent and received as a URL OK. I'm not sure what all the protocols mentioned in the original description are, so if someone can help with this I would appreciate it.
Esther, this code is so old and so visible that you can fairly securely mark this verified. (I won't do so, because I am the one who fixed it.)
Thanks Ben! Verified.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core