Closed Bug 24945 Opened 25 years ago Closed 24 years ago

mozTXTToHTMLConv::CheckURLAndCreateHref actually creates real uris...

Categories

(MailNews Core :: MIME, defect, P5)

x86
Other

Tracking

(Not tracked)

VERIFIED WONTFIX

People

(Reporter: mscott, Assigned: BenB)

Details

Hey Rich, can you help me find the right owner for this bug? I think Ben wrote
this stuff, right?

I noticed that mozTXTToHTMLConv::CheckURLAndCreateHref is actually creating real
nsIURI objects for each URL spec it encounters, but it never actually uses the
nsIURI returned. Unfortunately, creating URIs can be expensive, as it triggers a
host of URI event sink getters that get created and associated with the URI.
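(A rough illustration of the pattern being described -- reconstructed from this
comment, not copied from the tree; the |ioService| variable and the exact
NewURI signature are assumptions:)

  // Reconstructed illustration: an nsIURI is built purely as a
  // validity probe and then thrown away.
  nsCOMPtr<nsIURI> uri;
  nsresult rv = ioService->NewURI(txtURL, nullptr, nullptr,
                                  getter_AddRefs(uri));
  // |uri| itself is never touched again; only the success of the call
  // decides whether the text gets turned into a link.
  bool createHref = NS_SUCCEEDED(rv);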


It looks like we're just trying to create an nsIURI to see if the app knows how
to handle the URI. If it does, we know we can create an href for the link.

Here is another idea we could try that would be faster (see the sketch below):
1) Take the IO service, and ask it to extract the URL scheme from the URL spec
stored in txtURL.
2) Take that scheme, and ask the nsIOService (via GetProtocolHandler) to give
you a protocol handler that could handle that spec.

If you are given a protocol handler, then you know we can handle this URL, and
we don't actually have to go create an nsIURI for the URL spec.
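(A minimal sketch of this proposal, assuming nsIIOService exposes ExtractScheme
and GetProtocolHandler roughly as named above; signatures are not verified
against the tree:)

  // Sketch only: check whether any protocol handler is registered for
  // the spec's scheme, without ever constructing an nsIURI.
  nsresult CanHandleURL(nsIIOService* aIOService, const nsACString& aSpec,
                        bool* aCanHandle)
  {
    *aCanHandle = false;

    // 1) Extract the scheme ("http", "mailto", ...) from the spec.
    nsAutoCString scheme;
    nsresult rv = aIOService->ExtractScheme(aSpec, scheme);
    if (NS_FAILED(rv))
      return rv;  // no recognizable scheme at all

    // 2) Ask for a protocol handler for that scheme. Getting one back
    //    means the app knows how to handle URLs of this kind.
    nsCOMPtr<nsIProtocolHandler> handler;
    rv = aIOService->GetProtocolHandler(scheme.get(),
                                        getter_AddRefs(handler));
    *aCanHandle = NS_SUCCEEDED(rv) && handler;
    return NS_OK;
  }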
Hi Ben,
Could you take a look at the changes that Scott is suggesting?

Thanks!

- rhp
Assignee: rhp → mozilla
Thanks for your proposal.

> It looks like we're just trying to create a nsIURI to see if the app knows how
> to handle the URI.
Right, and also the general validity of the URI.

It looks to me like your proposal would just check the scheme part. Am I
right?
We need to check the validity of the full URL/URI, so that (when bug #19313 is
fixed) "mailto:///@@@@////" or "mailto:foo@www/mozilla/com" are not turned into
links. This is a frequently requested feature.
BTW: |CheckURLAndCreateHref| is only rarely invoked.

I would also need more detailed information about the usage of the proposed
functions. I don't know Necko or XPCOM very well.
Status: NEW → ASSIGNED
Priority: P3 → P4
Blocks: 26915
warren,
do you have an idea how to achieve what both mscott *and* I want, i.e. fully
test the validity of the complete URL, but without time-consuming side effects?
We need this for M14.

mscott, if for any reason warren cannot read this, please tell me.
Priority: P4 → P2
Target Milestone: M14
Nope. You can't have it both ways. Either you have to verify whether the URL 
exists (which may involve sending a HEAD request to a server in Nepal), or you 
assume that if it looks like a URL, you can click on it. I think the latter is 
better. Besides, just because the link isn't currently valid doesn't mean it 
never will be (a server may be down, etc.). Also, I think the ratio of valid 
URLs to invalid ones will be high.

You can check the URL syntactically by attempting to construct an nsIURI object 
out of it (which is what I thought you were doing already). This should weed 
out the mailto:///@@@@//// case.
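(For illustration, a syntactic check along these lines might look like the
following -- a sketch, not the actual converter code; the NS_NewURI signature
is assumed:)

  #include "nsCOMPtr.h"
  #include "nsNetUtil.h"  // NS_NewURI

  // Returns true if |aSpec| parses as a URL. No network traffic is
  // involved -- NS_NewURI only runs the parser for the spec's scheme.
  static bool IsSyntacticallyValidURL(const nsACString& aSpec)
  {
    nsCOMPtr<nsIURI> uri;
    nsresult rv = NS_NewURI(getter_AddRefs(uri), aSpec);
    return NS_SUCCEEDED(rv);
  }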
warren,
sorry, I meant syntactical validity. Network traffic is a no-no for these
checks.
Yes, I do try to create a URI (via nsIOService::NewURI), but that's slow. Is
there anything with the same effect, but faster?
First, I don't think NewURI is that slow. Second, you're going to have to do 
that work anyway if you really want to know it's syntactically valid. Finally, 
you said |CheckURLAndCreateHref| is only rarely invoked anyway.

NS_NewURI is the right thing.
>you said |CheckURLAndCreateHref| is only rarely invoked anyway.

After doing some profiling for bug 26915, I saw that, despite the infrequency,
the time taken by NS_NewURI is noticeable (~10% of overall scanner time).

mscott wrote:
>Unfortunately creating uri's can be expensive as they trigger a
>host of uri event sink getters that get created and associated with the uri.

I thought there was a way to circumvent that while still using the Necko
checking routines.
Not to completely scare mscott: the 10% figure is for exceptional cases, which
are our problem right now, like C++ source code. Normal text has a different
profile.
Creating URIs does nothing with event sinks. Just fetching them.

Again, NS_NewURI is the thing to use. If it becomes a bottleneck, let's look at 
ways to speed it up.
OK
I have somewhat more reliable numbers now, and it doesn't look so bad anymore.
A 100K msg with >700 URLs needs 1.2s to go through ScanHTML. I think this is OK
for beta1.
Moving to M15 and removing the dependency on bug #26915.
No longer blocks: 26915
Target Milestone: M14 → M15
Mass-LATER
Priority: P2 → P5
LATER
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → LATER
Ben - here's another one you may want to reopen and then move to "helpwanted", 
assigned to nobody@mozilla.org.  Thanks.
Warren wrote:
> Again, NS_NewURI is the thing to use. If it becomes a bottleneck, let's look
> at ways to speed it up.

So, this bug is really a WONTFIX. We could file a new bug on "Speed up
NS_NewURI" if it remains a problem.
Status: RESOLVED → REOPENED
Resolution: LATER → ---
WONTFIX
Status: REOPENED → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
Verified WONTFIX then.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core