Closed
Bug 116242
Opened 23 years ago
Closed 15 years ago
[mozTXTToHTMLConv] Function: Find URL in plaintext string
Categories
(Core :: Networking, enhancement)
RESOLVED
FIXED
mozilla1.5beta
People
(Reporter: BenB, Assigned: BenB)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
33.93 KB,
patch
Several features in Mozilla (e.g. the spellchecker, an "open selection as URL" context menu item, maybe the urlbar etc.) need to find a URL in a plaintext string. I.e. you have a string and you suspect it contains a URL, but you don't know where it starts and ends. The task is difficult, but we already have code in Mozilla which performs exactly that, namely the TXT->HTML converter called mozTXTToHTMLConv in netwerk. I think this code works fairly reliably, so we should reuse it in the other parts of the app.

The subject of this bug is to create an XPCOM or C++ function suitable for use in these other features. The signature could look like

void findURLinPlaintext(in string text, out long start, out long end)

If a URL has been found, the function returns NS_SUCCESS and fills |start| and |end| with the indices of the start and end of the first URL found (|end| is the index of the last character of the URL, not the char after it). If no URL could be found, it returns a certain (non-fatal) error code.
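To make the proposed contract concrete (first URL only, |end| inclusive, a non-fatal "not found" result), here is a minimal C++ sketch. The name FindUrlSketch and its heuristic (look for "://", walk left over the scheme letters, walk right to the next delimiter, drop trailing punctuation) are assumptions for illustration only; this is not the mozTXTToHTMLConv algorithm, which recognizes far more cases.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace {
bool IsAsciiLetter(char16_t c) {
  return (c >= u'a' && c <= u'z') || (c >= u'A' && c <= u'Z');
}
bool EndsUrl(char16_t c) {
  return c == u' ' || c == u'\t' || c == u'\n' ||
         c == u'<' || c == u'>' || c == u'"';
}
bool IsTrailingPunctuation(char16_t c) {
  return c == u'.' || c == u',' || c == u';' ||
         c == u'!' || c == u'?' || c == u')';
}
}  // namespace

// Hypothetical stand-in for the proposed function. Returns true and fills
// start/end (both inclusive, in 16-bit code units) for the first
// "scheme://..." run; returns false and sets both to -1 otherwise.
bool FindUrlSketch(const std::u16string& text, int32_t& start, int32_t& end) {
  start = end = -1;
  const std::u16string marker = u"://";
  std::size_t pos = text.find(marker);
  while (pos != std::u16string::npos) {
    // Walk left over the scheme letters.
    std::size_t s = pos;
    while (s > 0 && IsAsciiLetter(text[s - 1])) --s;
    // Walk right until a delimiter or the end of the string.
    const std::size_t bodyStart = pos + marker.size();
    std::size_t e = bodyStart;
    while (e < text.size() && !EndsUrl(text[e])) ++e;
    // A trailing "." or "," usually belongs to the sentence, not the URL.
    while (e > bodyStart && IsTrailingPunctuation(text[e - 1])) --e;
    if (s < pos && e > bodyStart) {
      start = static_cast<int32_t>(s);
      end = static_cast<int32_t>(e - 1);  // inclusive, as proposed above
      return true;
    }
    pos = text.find(marker, pos + 1);
  }
  return false;
}
```

Because |end| is inclusive, a caller computes the length as end - start + 1.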
Assignee | ||
Updated•23 years ago
Keywords: mozilla1.0
Summary: Find URL in string → Function: Find URL in plaintext string
Comment 1•23 years ago
Could we also provide an nsAString version that will take start and end iterators and adjust them to point to the url (a la FindInReadable)?
Assignee | ||
Comment 2•23 years ago
Boris Zbarsky, do you have a concrete use in mind?
Comment 3•23 years ago
Yes. The concrete use is if I have a unicode string and don't want to UTF8-encode it, make a copy, send it through findURLinPlaintext, take the substring and convert it back into UCS2.... This is most likely to be needed by the spellchecker, since I presume the message being composed is in UCS2 internally...
Assignee | ||
Comment 4•23 years ago
I intended to use 16bit wide strings anyway. If for no other reason, then because indices in utf8 are harder (do they mean the char-index or the byte-index? ...).
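The index ambiguity mentioned here can be shown with a small C++ sketch. The helper names are hypothetical. For the same text, the position of "http" differs between UTF-8 bytes and 16-bit code units as soon as a non-ASCII character precedes it; for characters in the Basic Multilingual Plane, the 16-bit index and the character index coincide.

```cpp
#include <cstddef>
#include <string>

// Index of the first "http" in a UTF-16 string, counted in 16-bit code units.
std::size_t Utf16IndexOfHttp(const std::u16string& text) {
  return text.find(u"http");
}

// Index of the first "http" in a UTF-8 string, counted in bytes.
std::size_t Utf8ByteIndexOfHttp(const std::string& utf8) {
  return utf8.find("http");
}

// Converts a UTF-8 byte index into a character (code point) index by
// counting only lead bytes, i.e. bytes that are not of the form 10xxxxxx.
std::size_t Utf8CharIndex(const std::string& utf8, std::size_t byteIndex) {
  std::size_t chars = 0;
  for (std::size_t i = 0; i < byteIndex && i < utf8.size(); ++i) {
    if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) ++chars;
  }
  return chars;
}
```

With the text "é http://example.com", the 16-bit index of "http" is 2, the UTF-8 byte index is 3, and converting that byte index back to a character index gives 2 again.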
Comment 5•23 years ago
Ah. That was not clear from the proposed prototype... :) In that case, what you have is likely fine. It _does_ require a flat string, but that can be worked around... Iterators are just more convenient than numeric indices for a lot of string work, which is why I suggested that.
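For comparison, an iterator-based variant in the spirit of the FindInReadable suggestion might look like the sketch below. It uses std::u16string iterators rather than Mozilla's string classes, and both the name FindHttpRun and the toy heuristic (an "http://" prefix up to the next space) are assumptions for illustration. Note that the narrowed end iterator points one past the last character, unlike the inclusive |end| index of the index-based proposal.

```cpp
#include <algorithm>
#include <string>

using Iter = std::u16string::const_iterator;

// Hypothetical sketch: on success, narrows [begin, end) to the first run that
// starts with "http://" and stops before the next space, and returns true.
// On failure, returns false and leaves both iterators untouched.
bool FindHttpRun(Iter& begin, Iter& end) {
  static const std::u16string kPrefix = u"http://";
  const Iter hit = std::search(begin, end, kPrefix.begin(), kPrefix.end());
  if (hit == end) return false;
  const Iter stop = std::find(hit, end, u' ');
  begin = hit;
  end = stop;  // one past the last character of the match
  return true;
}
```

The caller can build the substring directly from the narrowed pair, without index arithmetic.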
Assignee | ||
Updated•22 years ago
Blocks: 172186
Summary: Function: Find URL in plaintext string → [mozTXTToHTMLConv] Function: Find URL in plaintext string
Assignee | ||
Comment 6•21 years ago
This is the function:

/**
  Pass a plaintext string to it and it will try to find/recognize the first URL in it (possibly abbreviated and buried like in "foo@example.com." or augmented like in "<http://www.example.com>") and return the loadable URL (e.g. "mailto:foo@example.com" or "http://www.example.com", respectively).

  @param text      search for the URL here
  @param startPos  first character of the URL in |text|
  @param endPos    last character of the URL in |text|
  @param url       loadable URL (the URL in |text|, as returned by start/endPos, may be abbreviated). You have to nsMemory::Free() this
  @return          URL found
 */
boolean findURLTXT([const] in wstring text, out long startPos, out long endPos, out wstring url);

The trigger was bug 172186, but that doesn't need the url param. However, other expected users of the function, e.g. load selection as URL or the URLbar, will need it. This url out param is also what required quite some reworking of the mozTXTToHTMLConv class. FindURL previously generated HTML, but here I need the real, valid, completed URL, so I had to reorganize the functions to get the HTML stuff out of FindURL.

While I was at it, I also fixed a number of other things, like
- warnings
- a function rename (ShouldLinkify() -> HasValidScheme())
- lines > 80 chars
- comments

I tested this against my old test cases (created when I initially wrote the class / the converter), and they still all work fine. The new IDL function is not yet tested, though.
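The difference between the text found at start/endPos and the loadable |url| can be sketched as follows. The function name and its rules are assumptions chosen to reproduce the two examples in the comment above ("foo@example.com" and "http://www.example.com"), plus a "www." shortcut; this is not the patch's actual completion code.

```cpp
#include <string>

// Hypothetical sketch: turn the (possibly abbreviated) URL text found in the
// plaintext into a loadable URL. Returns an empty string if no rule applies.
std::u16string CompleteAbbreviatedUrlSketch(const std::u16string& found) {
  const auto npos = std::u16string::npos;
  // Already has a scheme: return unchanged.
  if (found.find(u"://") != npos || found.compare(0, 7, u"mailto:") == 0) {
    return found;
  }
  // Bare e-mail address: make it a mailto: URL.
  if (found.find(u'@') != npos) {
    return u"mailto:" + found;
  }
  // Host name starting with "www.": assume http.
  if (found.compare(0, 4, u"www.") == 0) {
    return u"http://" + found;
  }
  return std::u16string();
}
```

A caller that only needs the position of the URL in the text (as bug 172186 does) can ignore this step; a caller that wants to load the result needs the completed form.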
Assignee | ||
Comment 7•21 years ago
-mozilla1.0 keyword. I guess I missed that target.
Keywords: mozilla1.0
Target Milestone: --- → mozilla1.5beta
Comment 8•21 years ago
Why wstring instead of nsIURI?
Assignee | ||
Comment 9•21 years ago
No hard reason. I think I could use nsIURI, but I'd have to change a number of internal function signatures.
Comment 10•21 years ago
conversion from nsIURI to wstring is lossy, which sometimes matters. prefer nsIURI whenever possible. if nothing else it helps avoid string copies :)
Comment 11•21 years ago
-> qawanted. This is interesting stuff, but until the ns and mz stuff is unified, I'm going to focus on other technologies, since my time is really limited right now. As I recall, the mail used mz, chatzilla used ns. I guess it doesn't matter what NIM uses anymore...
Keywords: qawanted
QA Contact: benc → nobody
Assignee | ||
Comment 12•20 years ago
Bug 254913 (anti-phishing) is another potential user of this function.
Updated•15 years ago
QA Contact: nobody → networking
Assignee | ||
Comment 13•15 years ago
This was fixed 2004-02-19 12:44 as part of bug 234936 and bug 172186.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 14•15 years ago
Current API:

/**
  @param a wide string to scan for the presence of a URL.
  @param aLength --> the length of the buffer to be scanned
  @param aPos --> the position in the buffer to start scanning for a url

  aStartPos --> index into the start of a url (-1 if no url found)
  aEndPos --> index of the last character in the url (-1 if no url found)
 */
void findURLInPlaintext(in wstring text, in long aLength, in long aPos, out long aStartPos, out long aEndPos);

<http://mxr.mozilla.org/comm-central/source/mozilla/netwerk/streamconv/public/mozITXTToHTMLConv.idl#111> (m-c)
<http://mxr.mozilla.org/seamonkey/source/netwerk/streamconv/public/mozITXTToHTMLConv.idl#111> (CVS)
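As an illustration of how the aPos parameter and the -1 convention fit together, the sketch below collects every URL in a string by restarting the scan just past the previous match. FindUrlStandIn is a hypothetical stand-in with the same parameter shape as findURLInPlaintext and a toy heuristic ("http://" up to the next space). It is not the real XPCOM method, and reading aPos as "scan forward from here" is taken from the comment above rather than verified against the implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in mirroring the parameter list of findURLInPlaintext.
// Sets *startPos and *endPos (inclusive) or leaves both at -1 if none found.
void FindUrlStandIn(const char16_t* text, int32_t length, int32_t pos,
                    int32_t* startPos, int32_t* endPos) {
  *startPos = *endPos = -1;
  if (length <= 0 || pos < 0 || pos >= length) return;
  const std::u16string buffer(text, static_cast<std::size_t>(length));
  const std::size_t hit = buffer.find(u"http://", static_cast<std::size_t>(pos));
  if (hit == std::u16string::npos) return;
  std::size_t stop = buffer.find(u' ', hit);
  if (stop == std::u16string::npos) stop = buffer.size();
  *startPos = static_cast<int32_t>(hit);
  *endPos = static_cast<int32_t>(stop - 1);
}

// Collects the inclusive [start, end] range of every URL in |text|.
std::vector<std::pair<int32_t, int32_t>> FindAllUrls(const std::u16string& text) {
  std::vector<std::pair<int32_t, int32_t>> ranges;
  const int32_t length = static_cast<int32_t>(text.size());
  int32_t pos = 0;
  while (pos < length) {
    int32_t start = -1;
    int32_t end = -1;
    FindUrlStandIn(text.data(), length, pos, &start, &end);
    if (start < 0) break;  // -1 means no further URL
    ranges.emplace_back(start, end);
    pos = end + 1;  // resume right after the last character of the match
  }
  return ranges;
}
```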