Closed Bug 19251 Opened 26 years ago Closed 26 years ago

improve way to recognize URLs in messages

Categories

(MailNews Core :: MIME, enhancement, P1)

Product:

MailNews Core

Component:

MIME

Type:

enhancement

Priority:

P1

Severity:

normal

Tracking

(Not tracked)

Status:

VERIFIED FIXED

Milestone:

M12

People

(Reporter: warrensomebody, Assigned: BenB)

References

(Blocks 1 open bug,
URL
)

Details

Attachments

(6 files)

Current state, URL recognition incl. mailto and abbreviated should work 26 years ago Ben Bucksch (:BenB) 53.59 KB, text/plain
1/5 - mailnews/mime 26 years ago Ben Bucksch (:BenB) 35.12 KB, patch
2/5 - netwerk/streamconv 26 years ago Ben Bucksch (:BenB) 5.17 KB, patch
3/5 - streamconv/public/mozITXTToHTMLConv.idl 26 years ago Ben Bucksch (:BenB) 5.07 KB, patch
4/5 - streamconv/converters/mozTXTToHTMLConv.h 26 years ago Ben Bucksch (:BenB) 10.91 KB, patch
5/5 - streamconv/converters/mozTXTToHTMLConv.cpp 26 years ago Ben Bucksch (:BenB) 26.90 KB, patch

Reporter

Description

•

26 years ago

The current mechanism for recognizing URLs in mail messages is hard-coded to look for only a few URL schemes. We need to make this extensible so that URLs associated with all protocol plugins are recognized. For instance, jar: URLs aren't recognized right now. To do this, I think all we have to do is first detect something that looks like a protocol scheme (e.g. "foo:") and then take the text up to the next whitespace character and hand it to nsIOService::NewURI. If this successfully constructs a URL, then we know that the protocol scheme does correspond to an installed protocol plugin, and that the URL should be converted into an actual link in the text.
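The approach proposed in the description can be illustrated with a minimal Python sketch. It is not the Mozilla code: `KNOWN_SCHEMES`, `try_new_uri` and `find_links` are hypothetical names, and `try_new_uri` only stands in for the check that nsIOService::NewURI would perform against the installed protocol handlers.

```python
import re

# Assumption: a fixed set standing in for the installed protocol plugins.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "news", "jar"}

# Something that looks like a protocol scheme, e.g. "foo:".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def try_new_uri(candidate):
    """Hypothetical stand-in for NewURI: accept only handled schemes."""
    scheme, sep, rest = candidate.partition(":")
    if sep and rest and scheme.lower() in KNOWN_SCHEMES:
        return candidate
    return None


def find_links(text):
    """Return every whitespace-delimited candidate that try_new_uri accepts."""
    links = []
    pos = 0
    while True:
        match = SCHEME_RE.search(text, pos)
        if match is None:
            break
        # Take the text from the scheme up to the next whitespace character.
        end = match.start()
        while end < len(text) and not text[end].isspace():
            end += 1
        candidate = text[match.start():end]
        if try_new_uri(candidate) is not None:
            links.append(candidate)
            pos = end
        else:
            # Not a handled scheme: resume scanning just after the colon.
            pos = match.end()
    return links
```

With this sketch, a jar: URL is picked up as soon as "jar" is in the handled set, while an unknown scheme such as "foo:" is left as plain text. Trailing punctuation is not stripped here, since the description does not cover it.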

Updated

•

26 years ago

Assignee: phil → rhp

Comment 1

•

26 years ago

Reassign to rhp

Comment 2

•

26 years ago

Ben had been working on this code (actually working on a rewrite of some of these routines) so he would be the person to look at this. - rhp

Comment 3

•

26 years ago

That's: Ben Bucksch http://www.bucksch.org

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Assignee: rhp → mozilla

Status: ASSIGNED → NEW

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Status: NEW → ASSIGNED

Ben Bucksch (:BenB)

Assignee

Comment 4

•

26 years ago

Accepting

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Severity: normal → enhancement

Component: Front End → MIME

OS: other → All

Priority: P3 → P1

Target Milestone: M19 → M12

Ben Bucksch (:BenB)

Assignee

Comment 5

•

26 years ago

The most basic recognition functionality seems to work. Need to do more testing. Not working: mailto, abbreviated URLs. All the other functions in the class (ParseURL etc.) still don't use Necko and need to be rewritten (by me).

Ben Bucksch (:BenB)

Assignee

Comment 6

•

26 years ago

Some description of my code: It is mode-based: modes are tested in sequence (defined by a const) and the first successful one wins. The modes are the following (copied from source code):

RFC1738, /* Check if RFC1738, APPENDIX compliant, like <URL:http://www.mozilla.org>. */

RFC2396, /* RFC2396, APPENDIX E allows angle brackets (like <http://www.mozilla.org>) or quotation marks (like "http://www.mozilla.org") (w/o "URL:"). */

freetext /* Assume heading scheme with "[a-zA-Z0-9]*:" like "news:". Certain characters (see code) or any whitespace (including linebreaks) end the URL. Certain other (punctuation) characters (see code) at the end are stripped off. */

/* RFC1738 and RFC2396 type URLs may use multiple lines; whitespace is stripped. Special characters like ',' stay intact. */
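The mode-based scheme described in this comment can be sketched in Python as follows. This is only an illustration, not the code from the attached patches: `MODES`, `KNOWN_SCHEMES`, `TRAILING_PUNCTUATION`, `is_valid` and `find_url` are hypothetical names, the character sets are assumptions (the comment says "see code"), the freetext pattern requires at least one scheme character, and `is_valid` merely stands in for the final check that is left to Necko.

```python
import re

# Assumption: stand-in for the schemes Necko would accept.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "news"}

# Modes in the order they are tried; the first successful one wins.
MODES = [
    ("RFC1738", re.compile(r"<URL:([^<>]+)>")),
    ("RFC2396", re.compile(r"<([^<>]+)>|\"([^\"]+)\"")),
    ("freetext", re.compile(r"[A-Za-z0-9]+:[^\s<>\"]+")),
]

# Assumption: punctuation stripped from the end of a freetext URL.
TRAILING_PUNCTUATION = ".,;:!?)"


def is_valid(url):
    """Hypothetical validity check standing in for NS_NewURI."""
    scheme, sep, rest = url.partition(":")
    return bool(sep and rest) and scheme.lower() in KNOWN_SCHEMES


def find_url(text):
    """Return (mode, url) for the first mode that yields a valid URL."""
    for name, pattern in MODES:
        match = pattern.search(text)
        if match is None:
            continue
        if name == "freetext":
            # Whitespace already ended the match; drop trailing punctuation.
            url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        else:
            # Delimited URLs may span lines: strip all whitespace.
            raw = next(g for g in match.groups() if g is not None)
            url = "".join(raw.split())
        if is_valid(url):
            return name, url
    return None
```

Because the delimited modes come first, a bracketed URL that contains a comma or spans two lines is kept whole, whereas the freetext mode has to guess where the URL ends.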

Reporter

Comment 7

•

26 years ago

Sounds like you're saying that you wrote your own recognizer based on the specs, but I'd rather see us use what necko has for consistency. That way if the thing is highlighted, we'll be assured that we can handle it. If necko's url parsing doesn't meet the specs specified, then we should fix it.

Ben Bucksch (:BenB)

Assignee

Comment 8

•

26 years ago

Warren, all my function does is decide where the URL starts and ends. I don't know of a Necko function that does this. After that is done, I leave it up to Necko (NS_NewURI) to decide whether the result is valid or not. I'll attach the current code (work in progress). If you still think it should be moved to Necko, please provide me with the necessary background (knowledge) and I'll integrate it.

Ben Bucksch (:BenB)

Assignee

Comment 9

•

26 years ago

Attached file Current state, URL recognition incl. mailto and abbreviated should work

Ben Bucksch (:BenB)

Assignee

Comment 10

•

26 years ago

I forgot to mention: function in question is FindURL.

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Depends on: 19313

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Blocks: 18410

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Blocks: 5351

Comment 11

•

26 years ago

*** Bug 7176 has been marked as a duplicate of this bug. ***

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Blocks: 19992

Ben Bucksch (:BenB)

Assignee

Comment 12

•

26 years ago

Attached patch 1/5 - mailnews/mime

Ben Bucksch (:BenB)

Assignee

Comment 13

•

26 years ago

Attached patch 2/5 - netwerk/streamconv

Ben Bucksch (:BenB)

Assignee

Comment 14

•

26 years ago

Attached patch 3/5 - streamconv/public/mozITXTToHTMLConv.idl

Ben Bucksch (:BenB)

Assignee

Comment 15

•

26 years ago

Attached patch 4/5 - streamconv/converters/mozTXTToHTMLConv.h

Ben Bucksch (:BenB)

Assignee

Comment 16

•

26 years ago

Attached patch 5/5 - streamconv/converters/mozTXTToHTMLConv.cpp

Ben Bucksch (:BenB)

Assignee

Comment 17

•

26 years ago

The patches/files create a new defunct stream converter with an XPCOM interface and 3 (static) functions: ScanTXT, ScanHTML and CiteLevel. The latter is currently unused; ScanTXT is used by mimetpla.cpp and mimetplf.cpp, ScanHTML by nsMsgSendPart. I changed these functions to use the new class and removed the old functions from nsMimeURLUtils. I will ask Shaver if the licence is OK. The callers still need some work for I18N and perf checking to pass the right modes to the functions, but most if not all points are marked with an XXX comment. rhp, can you please review the mime parts and the 3 functions? valeski, can you please review the converter and its integration in Necko? Is it OK that it registers for text/plain? If not, can you make it register with the Factory, so libmime can access it? Tnx.

Ben Bucksch (:BenB)

Assignee

Comment 18

•

26 years ago

Typo: "pref(erences) checking", not "perf checking"

Comment 19

•

26 years ago

Everyone, It probably makes sense for one person to land all of these changes. If you want, I can step up and take that role. I will probably get this stuff ready to rock over the weekend and look for a Monday landing. If anyone objects, please let me know. - rhp PS: Warren: this will include the other changes we talked about today.

Ben Bucksch (:BenB)

Assignee

Comment 20

•

26 years ago

Note that the license for moz(I)TXTToHTMLConv is possibly invalid. I may release it under a different licence (e.g. a modified MPL or a new BSD-style (w/o ad restriction) license).

Updated

•

26 years ago

Status: ASSIGNED → RESOLVED

Closed: 26 years ago

Resolution: --- → FIXED

Comment 21

•

26 years ago

Ok gang, this is all checked in now. There seems to be an issue with the emoticon detection that Ben is working on, but other than that, we seem to be working. - rhp

Reporter

Comment 22

•

26 years ago

Judging from Ben's comment about just looking at where the URL starts and ends and then calling NS_NewURI, I'm happy. I haven't looked at the code though. One thing I'd like to see, though, that I always considered broken in 4.x releases: if a URL is broken across a line, there should be heuristics that recognize that fact and pick up the rest of the URL as the continuation, e.g.:

bla bla bla bla bla bla bla bla bla bla bla bla bla http://listings.ebay.com/aw/
listings/list/category1497/index.html bla bla bla bla bla bla

The recognizer should notice the "<text>*:" as the start of the URL, then notice that it ends with the newline, and then look on the next line for a string of text containing slashes, dots, etc., and include it in the URL string.
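The continuation heuristic requested in this comment might look roughly like the following Python sketch. It is an assumption-laden illustration, not what was implemented for this bug or for bug 5351: `URL_START_RE`, `CONTINUATION_RE` and `join_broken_url` are hypothetical names, and "looks like a continuation" is reduced to "the first token on the next line contains a slash or a dot".

```python
import re

# A scheme followed by non-whitespace text that runs to the end of the line.
URL_START_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:\S*$")

# Assumption: a continuation is a leading token containing '/' or '.'.
CONTINUATION_RE = re.compile(r"\s*(\S*[/.]\S*)")


def join_broken_url(line, next_line):
    """Return the URL ending `line`, extended by a plausible continuation."""
    start = URL_START_RE.search(line.rstrip())
    if start is None:
        return None  # the line does not end in something URL-like
    url = start.group(0)
    continuation = CONTINUATION_RE.match(next_line)
    if continuation is not None:
        url += continuation.group(1)
    return url
```

A rule this loose also joins ordinary words that end in a full stop, so a real recognizer would presumably still pass the joined string through a validity check before turning it into a link.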

Ben Bucksch (:BenB)

Assignee

Comment 23

•

26 years ago

Warren, bug #5351 (dependent on this) addresses the linebreaks in URLs.

Updated

•

26 years ago

Blocks: 21564

Ben Bucksch (:BenB)

Assignee

Comment 24

•

26 years ago

Docs at <http://www.bucksch.org/1/projects/mozilla/>

URL: http://www.bucksch.org/1/projects/moz...

Updated

•

25 years ago

No longer blocks: 21564

Updated

•

25 years ago

QA Contact: lchiang → esther

Comment 25

•

24 years ago

Can anyone give me some test URLs for this bug? I have verified that a mailto link works OK (comment #9), and that the long URL from comment #22, with text in front of and behind it, is sent and received as a URL OK. I'm not sure what all the protocols mentioned in the original description are, so if someone can help with this I would appreciate it.

Ben Bucksch (:BenB)

Assignee

Comment 26

•

24 years ago

Esther, this code is so old and so visible that you can fairly securely mark this verified. (I won't do so, because I am the one who fixed it.)

Comment 27

•

24 years ago

Thanks Ben! Verified.

Status: RESOLVED → VERIFIED

Myk Melez [:myk] [@mykmelez]

Updated

•

21 years ago

Product: MailNews → Core

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → MailNews Core
