Closed Bug 116242 Opened 24 years ago Closed 16 years ago

[mozTXTToHTMLConv] Function: Find URL in plaintext string

Tracking

Status:

RESOLVED FIXED

Milestone:

mozilla1.5beta

People

(Reporter: BenB, Assigned: BenB)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Proposed fix, Version 3 (Ben Bucksch (:BenB), 22 years ago, 33.93 KB, patch)

Ben Bucksch (:BenB)

Assignee

Description

24 years ago

Several features in Mozilla (e.g. the spellchecker, an "open selection as URL" context menu item, maybe the urlbar etc.) need to find a URL in a plaintext string: you have a string and suspect it contains a URL, but you don't know where the URL starts and ends. The task is difficult, but Mozilla already has code that does exactly that, namely the TXT->HTML converter mozTXTToHTMLConv in netwerk. I think this code works fairly reliably, so we should reuse it in the other parts of the app. The subject of this bug is to create an XPCOM or C++ function suitable for use in these other features. The signature could look like

void findURLinPlaintext(in string text, out long start, out long end)

If a URL has been found, the function returns a success code (e.g. NS_OK) and fills |start| and |end| with the indices of the start and end of the first URL found (|end| is the index of the last character of the URL, not the character after it). If no URL could be found, it returns a certain (non-fatal) error code.
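To make the proposed calling convention concrete (in particular that |end| is inclusive), here is a minimal C++ sketch. The name FindURLInPlaintextSketch is hypothetical, it returns bool instead of an nsresult for simplicity, and its recognizer is a naive "scheme://" heuristic, not the logic mozTXTToHTMLConv actually uses.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the proposed findURLinPlaintext(): a naive
// scheme-based heuristic, NOT the algorithm used by mozTXTToHTMLConv.
// On success, |start| is the index of the first character of the URL and
// |end| is the index of its LAST character (inclusive), as proposed.
bool FindURLInPlaintextSketch(const std::string& text, long& start, long& end) {
  const std::string::size_type sep = text.find("://");
  if (sep == std::string::npos || sep == 0) {
    return false;  // no "scheme://" marker: report "not found"
  }
  // Walk left over the scheme letters (e.g. "http").
  std::string::size_type s = sep;
  while (s > 0 && std::isalpha(static_cast<unsigned char>(text[s - 1]))) {
    --s;
  }
  if (s == sep) {
    return false;  // "://" with no scheme in front of it
  }
  // Walk right until whitespace or a typical delimiter.
  std::string::size_type e = sep + 3;
  while (e < text.size() &&
         !std::isspace(static_cast<unsigned char>(text[e])) &&
         text[e] != '<' && text[e] != '>' && text[e] != '"') {
    ++e;
  }
  // Drop trailing sentence punctuation such as "." or ",".
  while (e > sep + 3 &&
         std::string(".,;:!?)").find(text[e - 1]) != std::string::npos) {
    --e;
  }
  if (e == sep + 3) {
    return false;  // nothing after "://"
  }
  start = static_cast<long>(s);
  end = static_cast<long>(e) - 1;  // inclusive: last character, not one past it
  assert(start <= end);
  return true;
}
```

Because |end| is inclusive, the URL's length is end - start + 1, so a caller would extract it with text.substr(start, end - start + 1).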

Ben Bucksch (:BenB)

Assignee

Updated

24 years ago

Keywords: mozilla1.0

Summary: Find URL in string → Function: Find URL in plaintext string

Boris Zbarsky [:bzbarsky]

Comment 1

24 years ago

Could we also provide an nsAString version that will take start and end iterators and adjust them to point to the url (a la FindInReadable)?

Ben Bucksch (:BenB)

Assignee

Comment 2

24 years ago

Boris Zbarsky, do you have a concrete use in mind?

Boris Zbarsky [:bzbarsky]

Comment 3

24 years ago

Yes. The concrete use is when I have a Unicode string and don't want to UTF-8-encode it, make a copy, send it through findURLinPlaintext, take the substring and convert it back into UCS-2... This is most likely to be needed by the spellchecker, since I presume the message being composed is stored in UCS-2 internally...

Ben Bucksch (:BenB)

Assignee

Comment 4

24 years ago

I intended to use 16-bit wide strings anyway, if for no other reason than because indices into UTF-8 are ambiguous (do they mean the character index or the byte index?).
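A small C++ sketch of that ambiguity: in the UTF-8 string below, the URL starts at byte offset 3 but at character index 2, because "é" occupies two bytes; in the UTF-16 version the index is simply 2. The helper name CodePointIndexFromByteOffset is hypothetical and assumes valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts a byte offset into a UTF-8 string into a
// code-point index by counting the bytes that start a code point
// (continuation bytes have the bit pattern 10xxxxxx).
// Assumes |utf8| is valid UTF-8.
std::size_t CodePointIndexFromByteOffset(const std::string& utf8,
                                         std::size_t byteOffset) {
  assert(byteOffset <= utf8.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < byteOffset; ++i) {
    if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
      ++count;  // this byte begins a new code point
    }
  }
  return count;
}
```

For example, with "\xC3\xA9 http://example.com" (UTF-8 for "é http://example.com"), find("http") yields byte offset 3, which this helper maps to index 2, the same index u"\u00E9 http://example.com".find(u"http") returns. One caveat: UTF-16 indices count code units, so characters outside the BMP (surrogate pairs) still occupy two positions.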

Boris Zbarsky [:bzbarsky]

Comment 5

24 years ago

Ah. That was not clear from the proposed prototype... :) In that case, what you have is likely fine. It _does_ require a flat string, but that can be worked around... Iterators are just more convenient than numeric indices for a lot of string work, which is why I suggested that.
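The iterator-style interface suggested above can be sketched with standard C++ iterators standing in for Mozilla's nsAString iterators. FindURLInRange is a hypothetical name and its recognizer is only a toy heuristic; the point is the calling convention, where on success the two iterators are narrowed to the match, so no flat copy or index arithmetic is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

using Iter = std::u16string::const_iterator;

// Hypothetical iterator-based variant in the spirit of FindInReadable():
// on success, narrows [aStart, aEnd) to the half-open range of the first
// URL-like token; on failure, leaves the iterators untouched. The
// heuristic (ASCII letters before "://", then up to the next space
// character) is only an illustration, not mozTXTToHTMLConv's recognizer.
bool FindURLInRange(Iter& aStart, Iter& aEnd) {
  const std::u16string marker = u"://";
  const Iter sep = std::search(aStart, aEnd, marker.begin(), marker.end());
  if (sep == aEnd) {
    return false;
  }
  Iter first = sep;
  while (first != aStart) {
    const char16_t c = *(first - 1);
    const bool isAsciiLetter =
        (c >= u'a' && c <= u'z') || (c >= u'A' && c <= u'Z');
    if (!isAsciiLetter) {
      break;
    }
    --first;
  }
  if (first == sep) {
    return false;  // no scheme before "://"
  }
  aStart = first;
  aEnd = std::find(sep, aEnd, u' ');
  return true;
}
```

Note that iterator ranges are conventionally half-open, whereas the proposed index-based API uses an inclusive |end|.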

Ben Bucksch (:BenB)

Assignee

Updated

24 years ago

Blocks: 10080

Ben Bucksch (:BenB)

Assignee

Updated

23 years ago

Blocks: 172186

Summary: Function: Find URL in plaintext string → [mozTXTToHTMLConv] Function: Find URL in plaintext string

Ben Bucksch (:BenB)

Assignee

Comment 6

22 years ago

Attached patch: Proposed fix, Version 3

This is the function:

/**
 * Pass a plaintext string to it and it will try to find/recognize the first
 * URL in it (possibly abbreviated and buried, as in "foo@example.com.", or
 * augmented, as in "<http://www.example.com>") and return the loadable URL
 * (e.g. "mailto:foo@example.com" or "http://www.example.com", respectively).
 *
 * @param text      search for the URL here
 * @param startPos  first character of the URL in |text|
 * @param endPos    last character of the URL in |text|
 * @param url       loadable URL (the URL in |text|, as returned by
 *                  start/endPos, may be abbreviated). You have to
 *                  nsMemory::Free() this
 * @return          URL found
 */
boolean findURLTXT([const] in wstring text, out long startPos, out long endPos, out wstring url);

The trigger was bug 172186, but that doesn't need the url param. However, other expected users of the function, e.g. "load selection as URL" or the URL bar, will need it. This url out param is also what required quite some reworking of the mozTXTToHTMLConv class: FindURL previously generated HTML, but here I need the real, valid, completed URL, so I had to reorganize the functions to get the HTML handling out of FindURL. While I was at it, I also fixed a number of other things, like:
- warnings
- a function rename (ShouldLinkify() -> HasValidScheme())
- lines > 80 chars
- comments

I tested this against my old test cases (created when I initially wrote the class / the converter), and they all still work. The new IDL function is not yet tested, though.
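The difference between the matched text (startPos/endPos) and the loadable url out param can be illustrated with a rough C++ sketch that reproduces the two examples from the doc comment. CompleteURLSketch and its rules are hypothetical simplifications, not the reworked mozTXTToHTMLConv code.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the "loadable URL" completion described for
// findURLTXT(): strip trailing sentence punctuation and surrounding
// "<...>", then add a scheme to abbreviated forms. The rules are
// simplified illustrations only.
std::string CompleteURLSketch(std::string token) {
  // Drop trailing punctuation like the "." in "foo@example.com."
  while (!token.empty() &&
         std::string(".,;:!?").find(token.back()) != std::string::npos) {
    token.pop_back();
  }
  // Unwrap the augmented form "<http://www.example.com>".
  if (token.size() >= 2 && token.front() == '<' && token.back() == '>') {
    token = token.substr(1, token.size() - 2);
  }
  if (token.find("://") != std::string::npos ||
      token.rfind("mailto:", 0) == 0) {
    return token;  // already has a scheme
  }
  const std::string::size_type at = token.find('@');
  if (at != std::string::npos && at > 0 &&
      token.find('.', at) != std::string::npos) {
    return "mailto:" + token;  // bare e-mail address
  }
  if (token.rfind("www.", 0) == 0) {
    return "http://" + token;  // abbreviated web address
  }
  return std::string();  // not recognized as a URL
}
```

Keeping this completion step separate from the HTML generation is what lets the same recognizer serve both the converter and callers that only want the URL.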

Ben Bucksch (:BenB)

Assignee

Comment 7

22 years ago

-mozilla1.0 keyword. I guess I missed that target.

Keywords: mozilla1.0 (removed)

Target Milestone: --- → mozilla1.5beta

Boris Zbarsky [:bzbarsky]

Comment 8

22 years ago

Why wstring instead of nsIURI?

Ben Bucksch (:BenB)

Assignee

Comment 9

22 years ago

No hard reason. I think I could use nsIURI, but I'd have to change a number of internal function signatures.

Darin Fisher

Comment 10

22 years ago

Conversion from nsIURI to wstring is lossy, which sometimes matters. Prefer nsIURI whenever possible. If nothing else, it helps avoid string copies :)

benc

Comment 11

22 years ago

-> qawanted. This is interesting stuff, but until the ns and mz code is unified, I'm going to focus on other technologies, since my time is really limited right now. As I recall, mail used mz and ChatZilla used ns. I guess it doesn't matter what NIM uses anymore...

Keywords: qawanted

QA Contact: benc → nobody

Ben Bucksch (:BenB)

Assignee

Updated

22 years ago

Blocks: 227922

Ben Bucksch (:BenB)

Assignee

Comment 12

21 years ago

Bug 254913 (anti-phishing) is another potential user of this function.

Wayne Mery (:wsmwk)

Updated

20 years ago

No longer blocks: 227922

Phil Ringnalda (:philor)

Updated

16 years ago

QA Contact: nobody → networking

Ben Bucksch (:BenB)

Assignee

Comment 13

16 years ago

This was fixed 2004-02-19 12:44 as part of bug 234936 and bug 172186.

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Ben Bucksch (:BenB)

Assignee

Comment 14

16 years ago

Current API:

/**
 * @param text    --> a wide string to scan for the presence of a URL
 * @param aLength --> the length of the buffer to be scanned
 * @param aPos    --> the position in the buffer at which to start scanning for a URL
 *
 * aStartPos --> index of the first character of a URL (-1 if no URL found)
 * aEndPos   --> index of the last character of the URL (-1 if no URL found)
 */
void findURLInPlaintext(in wstring text, in long aLength, in long aPos, out long aStartPos, out long aEndPos);

<http://mxr.mozilla.org/comm-central/source/mozilla/netwerk/streamconv/public/mozITXTToHTMLConv.idl#111> (m-c)
<http://mxr.mozilla.org/seamonkey/source/netwerk/streamconv/public/mozITXTToHTMLConv.idl#111> (CVS)
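A typical use of aPos is to find every URL in a buffer by restarting the scan just after the previous match. The C++ sketch below shows that loop against a hypothetical stand-in, FindURLInPlaintextStub, which only roughly mirrors the parameter shape of the IDL method (out params as pointers, -1 for "not found") and uses a naive recognizer; a real caller would go through the mozITXTToHTMLConv interface and check the returned nsresult.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in shaped like findURLInPlaintext (out params become
// pointers). The recognizer is a naive "letters://non-spaces" scan, NOT
// Mozilla's.
void FindURLInPlaintextStub(const char16_t* text, int32_t aLength, int32_t aPos,
                            int32_t* aStartPos, int32_t* aEndPos) {
  *aStartPos = -1;  // -1 signals "no URL found", as in the IDL comment
  *aEndPos = -1;
  const std::u16string s(text, static_cast<std::size_t>(aLength));
  const std::size_t from = static_cast<std::size_t>(aPos);
  std::size_t sep = s.find(u"://", from);
  while (sep != std::u16string::npos) {
    std::size_t first = sep;
    while (first > from &&
           ((s[first - 1] >= u'a' && s[first - 1] <= u'z') ||
            (s[first - 1] >= u'A' && s[first - 1] <= u'Z'))) {
      --first;  // walk back over the scheme letters
    }
    std::size_t last = s.find(u' ', sep);
    if (last == std::u16string::npos) last = s.size();
    if (first < sep && last > sep + 3) {
      *aStartPos = static_cast<int32_t>(first);
      *aEndPos = static_cast<int32_t>(last) - 1;  // inclusive last character
      return;
    }
    sep = s.find(u"://", sep + 3);  // try the next candidate
  }
}

// Collects every URL by restarting the scan just past the previous match.
std::vector<std::u16string> FindAllURLs(const std::u16string& text) {
  std::vector<std::u16string> urls;
  const int32_t length = static_cast<int32_t>(text.size());
  int32_t pos = 0;
  while (pos < length) {
    int32_t start = -1, end = -1;
    FindURLInPlaintextStub(text.c_str(), length, pos, &start, &end);
    if (start == -1) break;  // no further URL
    urls.push_back(text.substr(static_cast<std::size_t>(start),
                               static_cast<std::size_t>(end - start + 1)));
    pos = end + 1;  // aEndPos is inclusive, so resume one past it
  }
  return urls;
}
```

Because aEndPos is inclusive, the next scan starts at aEndPos + 1; starting at aEndPos would re-examine the last character of the previous URL.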

Peter Bylenga [:PBylenga]

Updated

11 years ago

Keywords: qawanted
