Closed
Bug 116242
Opened 24 years ago
Closed 16 years ago
[mozTXTToHTMLConv] Function: Find URL in plaintext string
Categories
(Core :: Networking, enhancement)
RESOLVED
FIXED
mozilla1.5beta
People
(Reporter: BenB, Assigned: BenB)
References
(Blocks 1 open bug)
Attachments
(1 file)
33.93 KB, patch
Several features in Mozilla (e.g. the spellchecker, an "open selection as URL"
context menu item, maybe the urlbar etc.) need to find a URL in a plaintext
string. I.e. you have a string and you suspect it contains a URL, but you don't
know where the URL starts and ends.
The task is difficult. But we have code in Mozilla which performs exactly that,
namely in the TXT->HTML converter called mozTXTToHTMLConv in netwerk. I think
this code works fairly reliably, so we should reuse it in the other parts of the
app.
The subject of this bug is to create an XPCOM- or C++-function suitable for use
in these other features. The signature could look like
void findURLinPlaintext(in string text, out long start, out long end)
If a URL has been found, the function returns NS_OK and fills |start| and
|end| with the indices of the start and end of the first URL found (|end| is the
index of the last character of the URL, not the char after it). If no URL could
be found, it returns a certain (non-fatal) error code.
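To make the proposed contract concrete, here is a minimal C++ sketch of the index convention only: |end| is inclusive, and "not found" is reported as a non-fatal result. The names findURLinPlaintextSketch and FindResult are hypothetical, and the matching rule (an "http://" or "https://" prefix running to the next whitespace) is a placeholder assumption, not the mozTXTToHTMLConv logic.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result codes standing in for the success / non-fatal
// error codes mentioned in the description.
enum class FindResult { Found, NotFound };

// Placeholder matcher: first "http://" or "https://" run, extended to the
// next whitespace character. This is NOT the mozTXTToHTMLConv algorithm.
FindResult findURLinPlaintextSketch(const std::u16string& text,
                                    long& start, long& end) {
  static const std::u16string schemes[] = {u"http://", u"https://"};
  std::size_t best = std::u16string::npos;
  for (const auto& scheme : schemes) {
    const std::size_t p = text.find(scheme);
    if (p < best) best = p;  // keep the earliest match
  }
  if (best == std::u16string::npos) return FindResult::NotFound;

  std::size_t stop = text.find_first_of(u" \t\r\n", best);
  if (stop == std::u16string::npos) stop = text.size();

  start = static_cast<long>(best);
  end = static_cast<long>(stop) - 1;  // inclusive: last character of the URL
  return FindResult::Found;
}
```

With the inclusive convention, the length of the match is end - start + 1, which callers need to keep in mind when taking a substring.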
Updated•24 years ago (Assignee)
Keywords: mozilla1.0
Summary: Find URL in string → Function: Find URL in plaintext string
Comment 1•24 years ago
Could we also provide an nsAString version that will take start and end
iterators and adjust them to point to the url (a la FindInReadable)?
Comment 2•24 years ago (Assignee)
Boris Zbarsky, do you have a concrete use in mind?
Comment 3•24 years ago
Yes. The concrete use is if I have a unicode string and don't want to
UTF8-encode it, make a copy, send it through findURLinPlaintext, take the
substring and convert it back into UCS2....
This is most likely to be needed by the spellchecker, since I presume the
message being composed is in UCS2 internally...
Comment 4•24 years ago (Assignee)
I intended to use 16bit wide strings anyway. If for no other reason, then
because indices in utf8 are harder (do they mean the char-index or the
byte-index? ...).
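The index ambiguity is easy to demonstrate. The sketch below (helper names byteOffset and unitOffset are hypothetical) locates the same substring in a UTF-8 string and in a UTF-16 string; the text before the URL contains two non-ASCII letters, so the two offsets differ.

```cpp
#include <cstddef>
#include <string>

// Offset of `needle` counted in UTF-8 bytes.
std::size_t byteOffset(const std::string& utf8, const std::string& needle) {
  return utf8.find(needle);
}

// Offset of `needle` counted in UTF-16 code units.
std::size_t unitOffset(const std::u16string& utf16,
                       const std::u16string& needle) {
  return utf16.find(needle);
}

// The same text in both encodings. U+00FC and U+00DF take two bytes each
// in UTF-8 but one code unit each in UTF-16.
const std::string kUtf8Text = "Gr\xC3\xBC\xC3\x9F" "e http://example.com";
const std::u16string kUtf16Text = u"Gr\u00FC\u00DF" u"e http://example.com";
```

Here byteOffset(kUtf8Text, "http") is 8 while unitOffset(kUtf16Text, u"http") is 6. For characters outside the Basic Multilingual Plane, UTF-16 code units and characters also diverge, so even the 16-bit index is a code-unit index rather than a character index.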
Comment 5•24 years ago
Ah. That was not clear from the proposed prototype... :)
In that case, what you have is likely fine. It _does_ require a flat string, but
that can be worked around...
Iterators are just more convenient than numeric indices for a lot of string
work, which is why I suggested that.
Updated•23 years ago (Assignee)
Blocks: 172186
Summary: Function: Find URL in plaintext string → [mozTXTToHTMLConv] Function: Find URL in plaintext string
Comment 6•22 years ago (Assignee)
This is the function:
/**
  Pass a plaintext string to it and it will try to find/recognize the
  first URL in it (possibly an abbreviated URL, buried as in
  "foo@example.com." or augmented as in "<http://www.example.com>")
  and return the loadable URL (e.g. "mailto:foo@example.com" or
  "http://www.example.com", respectively).
  @param text      search for the URL here
  @param startPos  first character of the URL in |text|
  @param endPos    last character of the URL in |text|
  @param url       loadable URL (the URL in |text|, as returned by
                   start/endPos, may be abbreviated). You have to
                   nsMemory::Free() this.
  @return whether a URL was found
*/
boolean findURLTXT([const] in wstring text,
                   out long startPos, out long endPos, out wstring url);
The trigger was bug 172186, but that doesn't need the url param. However, other
expected users of the function, e.g. load selection as URL or the URLbar, will
need it.
And this url out param is also what required quite some reworking of the
mozTXTToHTMLConv class. FindURL previously generated HTML, but here I need the
real, valid, completed URL, so I had to reorganize the functions to get the
HTML stuff out of FindURL.
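As an illustration of what "loadable URL" means for the two examples in the doc comment, here is a rough C++ sketch. The helper toLoadableURLSketch is hypothetical and covers only those two cases (a mail address followed by a sentence period, and a URL wrapped in angle brackets); the rules the converter actually applies are more extensive and are not reproduced here.

```cpp
#include <string>

// Hypothetical helper: turns a candidate as it appears in the text into a
// loadable URL. Handles only the two cases named in the doc comment.
std::u16string toLoadableURLSketch(std::u16string candidate) {
  // Strip a wrapping <...> pair.
  if (candidate.size() >= 2 && candidate.front() == u'<' &&
      candidate.back() == u'>') {
    candidate = candidate.substr(1, candidate.size() - 2);
  }
  // Drop trailing periods that belong to the sentence, not the URL.
  while (!candidate.empty() && candidate.back() == u'.') {
    candidate.pop_back();
  }
  // An address with '@' and no scheme is treated as a mail address.
  const bool hasScheme =
      candidate.find(u"://") != std::u16string::npos ||
      candidate.compare(0, 7, u"mailto:") == 0;
  if (!hasScheme && candidate.find(u'@') != std::u16string::npos) {
    return u"mailto:" + candidate;
  }
  return candidate;
}
```

The point of the sketch is the separation of concerns described above: the recognizer reports where the URL sits in the text, and a separate step produces the completed URL without generating any HTML.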
While I was at it, I also fixed a number of other things, like
- warnings
- a function rename (ShouldLinkify() -> HasValidScheme())
- lines > 80 chars
- comments
I tested this against my old test cases (created when I initially wrote the
class / the converter), and they still all work fine.
The new IDL function is not yet tested, though.
Comment 7•22 years ago (Assignee)
-mozilla1.0 keyword. I guess I missed that target.
Keywords: mozilla1.0
Target Milestone: --- → mozilla1.5beta
Comment 8•22 years ago
Why wstring instead of nsIURI?
Comment 9•22 years ago (Assignee)
No hard reason. I think I could use nsIURI, but I'd have to change a number of
internal function signatures.
Comment 10•22 years ago
conversion from nsIURI to wstring is lossy, which sometimes matters. prefer
nsIURI whenever possible. if nothing else it helps avoid string copies :)
Comment 11•22 years ago
-> qawanted.
This is interesting stuff, but until the ns and mz stuff is unified, I'm going
to focus on other technologies, since my time is really limited right now. As I
recall, the mail used mz, chatzilla used ns. I guess it doesn't matter what NIM
uses anymore...
Keywords: qawanted
QA Contact: benc → nobody
Comment 12•21 years ago (Assignee)
Bug 254913 (anti-phishing) is another potential user of this function.
Updated•16 years ago
QA Contact: nobody → networking
Comment 13•16 years ago (Assignee)
This was fixed 2004-02-19 12:44 as part of bug 234936 and bug 172186.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 14•16 years ago (Assignee)
Current API:
@param a wide string to scan for the presence of a URL.
@param aLength --> the length of the buffer to be scanned
@param aPos --> the position in the buffer to start scanning for a url
aStartPos --> index into the start of a url (-1 if no url found)
aEndPos --> index of the last character in the url (-1 if no url found)
void findURLInPlaintext(in wstring text, in long aLength, in long aPos, out long aStartPos, out long aEndPos);
<http://mxr.mozilla.org/comm-central/source/mozilla/netwerk/streamconv/public/mozITXTToHTMLConv.idl#111> (m-c)
<http://mxr.mozilla.org/seamonkey/source/netwerk/streamconv/public/mozITXTToHTMLConv.idl#111> (CVS)
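For callers, the aPos parameter and the -1 convention suggest a simple loop for collecting every URL in a buffer. The C++ sketch below uses a stand-in, findURLInPlaintextSketch, with the same parameter shape as the IDL method. Two assumptions are worth flagging: the stand-in's matching rule (an "http://" prefix running to the next space) is a placeholder, and aPos is taken at face value as "where scanning starts", which may not match how the real converter interprets it. The helper findAllURLs is likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in with the same parameter shape as findURLInPlaintext.
// Placeholder matching rule; not the converter's implementation.
void findURLInPlaintextSketch(const char16_t* text, long aLength, long aPos,
                              long* aStartPos, long* aEndPos) {
  *aStartPos = -1;  // -1 in both out params means "no URL found"
  *aEndPos = -1;
  assert(aPos >= 0 && aPos <= aLength);
  const std::u16string buf(text, static_cast<std::size_t>(aLength));
  const std::size_t start =
      buf.find(u"http://", static_cast<std::size_t>(aPos));
  if (start == std::u16string::npos) return;
  std::size_t stop = buf.find(u' ', start);
  if (stop == std::u16string::npos) stop = buf.size();
  *aStartPos = static_cast<long>(start);
  *aEndPos = static_cast<long>(stop) - 1;  // index of the last URL character
}

// Collects (start, end) pairs for every match, resuming after each hit.
std::vector<std::pair<long, long>> findAllURLs(const std::u16string& text) {
  std::vector<std::pair<long, long>> spans;
  const long length = static_cast<long>(text.size());
  long pos = 0;
  while (pos < length) {
    long start = -1, end = -1;
    findURLInPlaintextSketch(text.c_str(), length, pos, &start, &end);
    if (start < 0) break;  // nothing further in the buffer
    spans.emplace_back(start, end);
    pos = end + 1;  // aEndPos is inclusive, so resume one past it
  }
  return spans;
}
```

Because aEndPos is inclusive, the loop resumes at end + 1; resuming at end would rescan the last character of the previous match.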