Closed Bug 108654 Opened 20 years ago Closed 17 years ago

[mozTXTToHTMLConv] Message-ID interpreted as mail address

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

RESOLVED WONTFIX

People

(Reporter: 3.14, Assigned: BenB)

Attachments

(1 file, 1 obsolete file)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.5) Gecko/20011012

It is common practice to cite message-ids in news articles, e.g.,
<3BE7BB7F.1070204@logic.univie.ac.at>. Mozilla turns it into a mailto: link, which
is a very bad guess, so the reference cannot be followed. Hence there must be either
a very good heuristic to find out or a user dialog, which must in any case be
available (thru the right mouse button).

pi
There is no realistic way for the link parser to know whether
<3BE7BB7F.1070204@logic.univie.ac.at> is a mailto: address or message-id 
because they look the exact same.

Besides, you usually use full news:// urls (like
<news://news.mozilla.org/3BE7BB7F.1070204@logic.univie.ac.at>) when citing
messages in newsgroups, and that's the way it should be.

Suggesting wontfix.
> There is no realistic way for the link parser to know whether
> <3BE7BB7F.1070204@logic.univie.ac.at> is a mailto: address or message-id
> because they look the exact same.

They are *formally* the same. But when you look at one, your guess will be right
in almost every case. And if you think a heuristic (as I suggested) is not possible,
you are wrong: Forte Agent has had one for many years.

> Besides, you usually use full news:// urls (like
> <news://news.mozilla.org/3BE7BB7F.1070204@logic.univie.ac.at>)

No. This has been common practice on Usenet for much longer than URLs have been
in use. Usenet is not the non-clickable part of the web.

And nobody who understands Usenet will add the server for ordinary messages,
because you usually need permission to access that server.

> when citing messages in newsgroups, and that's the way it should be.

Do we want a newsreader to work for the people or the people to adjust to broken
software? In the latter case, cancel this project and let people use Outbreak
Excess.

> Suggesting wontfix.

You ignored my suggestion to allow the user to decide.

pi
the linkifier is actually a browser:networking component iirc
-> me
Assignee: sspitzer → mozilla
Component: Networking - News → Networking
OS: Linux → All
Product: MailNews → Browser
Hardware: PC → All
Note that this caveat is well documented
(<http://www.bucksch.com/1/projects/mozilla/16507/>, "Known failures").

> Hence there must be either a very good heuristic to find out

Impossible, to my knowledge. Message-IDs are syntactically identical to email
addresses.

> a user dialog, which must in any case be available (thru the right mouse
> button).

huh? right mouse button in a dialog?

Anyway, I don't see how that could be achieved technically.
The converter converts to generic HTML, and could be used for saved documents
or something. So:
- I don't know how to code that at all.
- Even if we did, it would probably depend on Mozilla as the renderer, which might
not be desirable for saved docs.
Status: UNCONFIRMED → NEW
Ever confirmed: true
>> Hence there must be either a very good heuristic to find out
>
>Impossible, to my knowledge. Message-IDs are syntactically identical to email
>addresses.

Yes, but you can make an educated guess; Forte Agent is pretty good at it. For
example, e-mail addresses tend to be much shorter (in the local part) than
message-ids. $ symbols never show up in e-mail (I am not positive if it would be
legal). A long sequence of digits and consonants is unlikely in an e-mail
address. Just a few ideas. Of course, this is not 100% safe.
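The three hints above can be combined into a simple score. This is a minimal Python sketch, not Forte Agent's actual method, which is not described here. The function name, the 12-character length limit and the 6-character run length are arbitrary assumptions for illustration.

```python
import re

# Assumed thresholds, chosen only for illustration.
LONG_LOCAL_PART = 12
# A run of 6+ digits/consonants (no a, e, i, o, u), e.g. "1070204".
NO_VOWEL_RUN = re.compile(r"[0-9b-df-hj-np-tv-z]{6,}", re.IGNORECASE)


def message_id_score(addr: str) -> int:
    """Count how many of the three hints suggest `addr` is a message-id."""
    local = addr.rpartition("@")[0]
    score = 0
    if len(local) > LONG_LOCAL_PART:   # long local part
        score += 1
    if "$" in local:                   # '$' rarely seen in real addresses
        score += 1
    if NO_VOWEL_RUN.search(local):     # long run of digits and consonants
        score += 1
    return score
```

With these settings, `3BE7BB7F.1070204@logic.univie.ac.at` scores 2 and `ben@example.org` scores 0. A long real address such as `cornelia.inschweiler-povierski@bigbank.com` still scores 1 on length alone, so any cutoff would need tuning against real data.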

>> a user dialog, which must in any case be available (thru the right mouse
>> button).
>
>huh? right mouse button in a dialog?

I was thinking of the following: right-click on the link so the context menu
comes up. It should offer something like "send e-mail to that address" and "get
that message-id".

>Anyway, I don't see how that could be achieved technically.
>The converter converts to generic HTML, and could be used for saved documents
>or something. So:
>- I don't know how to code that at all.
>- Even if we did, it would probably depend on Mozilla as the renderer, which might
>not be desirable for saved docs.

That is a problem. But as long as the messages are displayed in Mozilla, we have
access to the context menu. For everything else, the heuristic approach would again be useful.

pi
> you can make an educated guess

An educated guess is OK, but can we make one? (For me, an educated guess is one
that either strictly follows the spec or is right in 99.9%+ of the cases.)

> For example, e-mail addresses tend to be much shorter (in the local part)

Uh. There are very long email addresses. Just today, I had something like
<cornelia.inschweiler-povierski@bigbank.com>.

> $ symbols never show up in e-mail (I am not positive if it would be legal)

They are legal.
RFC 822:
Without quoting (where everything is allowed), the local part must be an atom,
which consists of any char except specials, space and CTLs. CTLs are ASCII <= 31
and 127. Specials are "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" / <">
/ "." / "[" / "]" (but "." is allowed by special exception).
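Here is a minimal Python sketch of this rule for the unquoted form only. The local part is split at the dots, and each piece must be a non-empty atom. Quoted strings and comments are deliberately not handled, and the function names are hypothetical. It confirms that `$` is an ordinary atom character.

```python
# RFC 822 "specials"; '.' is among them but is used here as the separator.
SPECIALS = set('()<>@,;:\\".[]')


def is_atom(word: str) -> bool:
    """Non-empty, printable ASCII (33-126, so no SPACE/CTLs), no specials."""
    return bool(word) and all(
        32 < ord(c) < 127 and c not in SPECIALS for c in word
    )


def is_unquoted_local_part(local: str) -> bool:
    """True if `local` is one or more atoms joined by single dots."""
    return all(is_atom(word) for word in local.split("."))
```

`is_unquoted_local_part("make$fast")` is `True`, while `"a..b"` and `"a b"` are rejected.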

> A long sequence of numbers and consonants are unlikely in an e-mail address.

Uh, there are indeed valid email addresses of that form, e.g. generated ones.

> Of course, this is not 100% safe.

It's too unsafe for my taste. I think that it would confuse users more than it
does today.
To summarize, like I said in my first comment, unless your computer happens to
have a human brain attached to it, there is no realistic chance we can change
this behaviour and make an "educated guess" without breaking more functionality
than we'd add; it would add more pain than gain.

Ben, wontfix?
Håkan, yes, I'm leaning towards wontfix, but I'd like to first give Boris a
chance to figure out some valid options for fixing this bug.
>> you can make an educated guess
> 
> An educated guess is OK, but can we make one? (For me, an educated guess is
> one that either strictly follows the spec or is right in 99.9%+ of the
> cases.)

Well, now we are 100% wrong with message-ids.

>> For example, e-mail addresses tend to be much shorter (in the local part)
> 
> Uh. There are very long email addresses. Just today, I had something like
> <cornelia.inschweiler-povierski@bigbank.com>.

Right, it is just *a* hint.

>> $ symbols never show up in e-mail (I am not positive if it would be legal)
> 
> They are legal.

OK, but I have never seen them in practice, so this is at least the 99.9%
reliability you asked for. I also did a test (with Mozilla): I sent e-mail to
$@piology.org, which actually arrived, but the address was quoted as "$"@piology.org.
The same happened to make$fast@piology.org, which became "make$fast"@piology.org.

>> A long sequence of numbers and consonants are unlikely in an e-mail address.
> 
> Uh, there are indeed valid email addresses of that form, e.g. generated ones.

Yes, they are legal. Any message-id would be legal. But the question is one of
best guessing.

>> Of course, this is not 100% safe.
> 
> It's too unsafe for my taste. I think that it would confuse users more than
> it does today.

If you think so, that is fine with me; for that case I offered the interactive
solution. There must be a convenient way of accessing a message-id, so let the user
say "I want to access this as a message-id" (using the right-click menu).

Not fixing this would be like writing a web browser that displays links you
cannot click: you have to copy each link, paste it into the URL field, and
modify the address there first.

pi
> Well, now we are 100% wrong with message-ids.

The risk is that we would be turning email addresses into msg-ids, which would be
extremely confusing for users.


> I offered the interactive solution

Yes, but while that might be a nice UI, it is really hard to implement in the
converter.


Please note that the linked msg-ids are not very useful anyway, because we can
hardly do anything useful with them. In the best case, we can show a msg that
happens to lie in the same mail folder or on the same news server. This could
change (there is a bug about it), but it is unlikely to happen in the mid-term
future.
I forgot one thing: the $-rule looks OK, I think. Can you come up with more
rules? (Ones that don't risk treating a real email address as a msg-id.)

Note that someone still has to implement the stuff. I probably won't do it myself.
>> I offered the interactive solution
> 
> Yes, but while that might be a nice UI, it is really hard to
> implement in the converter.

You right-click on a mailto: link and choose "get this message-id". What
would Mozilla do? Just replace mailto: with news://server/, where server
is the server you are currently using, or else the default news server.
Then load that URL. Doing that manually is quite annoying; having the
program do it is easy.
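Once the server is known, the rewrite described above is a one-liner. This is a Python sketch that assumes the link's href is exactly `mailto:` followed by the message-id, with no `?subject=` part and no percent-encoding. The helper name and its parameters are hypothetical.

```python
from typing import Optional


def mailto_to_news_url(href: str, current_server: Optional[str],
                       default_server: str) -> str:
    """Turn the href of a mis-linkified message-id into a news:// URL.

    Hypothetical helper: uses the current server if there is one,
    otherwise the default news server, as proposed in the comment.
    """
    prefix = "mailto:"
    if not href.lower().startswith(prefix):
        raise ValueError(f"not a mailto: URL: {href!r}")
    message_id = href[len(prefix):]
    server = current_server or default_server
    return f"news://{server}/{message_id}"
```

Calling it with `"mailto:3BE7BB7F.1070204@logic.univie.ac.at"`, no current server and `"news.mozilla.org"` as the default yields the same news:// form cited earlier in this bug.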

> Please note that the linked msg-ids are not very useful anyway,
> because we can hardly do anything useful with them.

NACK. Someone tells me that the answer is in the message with that id.
So I get it and have the answer. Very useful. Happens all the time.

> In the best case, we can show a msg that happens to lie in the 
> same mail folder or on the same news server.

That's good enough for practical purposes. Or you could (as an option) also
try to fetch it from groups.google.com; Gnus has this option.

A nice argument for news *instead* of mailto right away:
http://www.cbl.ncsu.edu/DiscussionGroups/MHonArc/MHonArc-1998-08-15/msg00024.html

> I forgot one thing: the $-rule looks OK, I think. Can you come up
> with more rules? (Ones that don't risk treating a real
> email address as a msg-id.)

Some newsreaders add their name, like <pine.somestring@somehost>. I couldn't find a
list, but I'm still searching. Forte Agent uses <identifier@4ax.com> for
message-ids, and there are no e-mail addresses at this domain; I am not aware of
another program using this domain approach.

Also, if the domain name is only one part, that is a (broken) message-id and not
a mail address, e.g., <something@localhost>, <somethingelse@myhost>.

OTOH, everything with a short local part (we would have to find a good limit
here) is an e-mail address.
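The rules from this comment could be checked in order, returning no verdict when none applies. This is a Python sketch under several assumptions: the length limit of 8 is made up because the comment leaves it open, the prefix check covers only the `pine.` example, and the function name is hypothetical.

```python
from typing import Optional

# Assumed limit; the comment says "we would have to find a good limit".
SHORT_LOCAL_PART = 8


def classify_by_rules(addr: str) -> Optional[str]:
    """Return "message-id", "mail", or None when no rule applies."""
    local, at, domain = addr.rpartition("@")
    if not at or not local:
        return None
    domain = domain.lower()
    if domain == "4ax.com":                 # Forte Agent's message-id domain
        return "message-id"
    if local.lower().startswith("pine."):   # reader name as prefix
        return "message-id"
    if "." not in domain:                   # one-label domain, e.g. localhost
        return "message-id"
    if len(local) <= SHORT_LOCAL_PART:      # short local part
        return "mail"
    return None
```

Note that the one-label-domain rule also classifies an intranet address such as `ben@myserver` as a message-id.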

pi
> > same mail folder or on the same news server.
> That's good enough for practical purposes.
[...]
> A nice argument for news *instead* of mailto right away:

I think you are talking mostly about news. However, the vast majority of users
never use news at all. So, for me, correct recognition of email addresses has
absolute priority.

As for the attribution lines like
"In your message <a6758ghd74.8123456@foo.org>, you write:",
I'd argue that they are broken and should read <mid:a6758ghd74.8123456@foo.org>.
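For reference, a `mid:` URL (RFC 2392) carries the message-id without its angle brackets. Characters that are not URL-safe are percent-encoded. `/` in particular must be escaped because it separates an optional content-id. Below is a minimal Python sketch with a hypothetical function name. It escapes conservatively: everything except letters, digits, `_.-~` and `@`.

```python
from urllib.parse import quote


def message_id_to_mid_url(msgid: str) -> str:
    """Build an RFC 2392 "mid:" URL from a message-id.

    The angle brackets are dropped; '/' and other unsafe characters
    are percent-encoded, '@' is left as-is.
    """
    bare = msgid.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    return "mid:" + quote(bare, safe="@")
```

For the attribution line above, `message_id_to_mid_url("<a6758ghd74.8123456@foo.org>")` gives `mid:a6758ghd74.8123456@foo.org`.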

> Some readers add their name, like <pine.somestring@somehost>.
> I couldn't find a list, but I keep searching. Forte Agent uses
> <identifier@4ax.com> for message-ids, no e-mails at this domain;
> I am not aware of another program using this domain approach.

Very interesting. This is something safe and implementable.

> Also, if the domain name is only one part, that is a (broken) message-id
> and not a mail address, e.g., <something@localhost>, <somethingelse@myhost>.

Apart from the fact that your assertion is completely wrong (<ben@myserver> is
indeed an email address, just that it is only meaningful in the local network),
it is also unsafe, compare "let's meet@9am".
> As for the attribution lines like
> "In your message <a6758ghd74.8123456@foo.org>, you write:",
> I'd argue that they are broken

Anyway, this happens a lot.

>> Also, if the domain name is only one part, that is a (broken) message-id
>> and not a mail address, e.g., <something@localhost>, <somethingelse@myhost>.
> 
> Apart from the fact that your assertion is completely wrong (<ben@myserver>
> is indeed an email address, just that it is only meaningful in the local
> network),

Yes, so it is not really used outside the local network.

> it is also unsafe, compare "let's meet@9am".

Making this mailto isn't any better.

pi
> it is also unsafe, compare "let's meet@9am".
> Making this mailto isn't any better.

We do nothing at all.
>> it is also unsafe, compare "let's meet@9am".
>> Making this mailto isn't any better.
>
> We do nothing at all.

I overlooked earlier that it cannot be a message-id anyway: the angle
brackets are part of the message-id, so the above is not a problem.
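In other words, bracketing is a necessary condition: only text of the form `<local@domain>` can be a message-id. Here is a Python sketch of extracting such candidates. The regular expression is a simplification that ignores quoted strings and domain literals, and the function name is hypothetical.

```python
import re
from typing import List

# Simplified: "<", local "@" domain, ">" with no whitespace, no nested
# brackets and a single '@'.  Real RFC 822 msg-ids allow more.
BRACKETED_ADDR = re.compile(r"<([^<>\s@]+@[^<>\s@]+)>")


def message_id_candidates(text: str) -> List[str]:
    """Return the bracketed local@domain tokens in `text` (brackets removed)."""
    return BRACKETED_ADDR.findall(text)
```

A bracketed token may still be a plain mail address (`<ben@example.org>` is returned too), so this only narrows the candidates.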

pi
> I forgot one thing: The $-rule looks OK, I think.
> Can you come up with more rules?

I started a discussion at <3BE915AF.4060708@logic.univie.ac.at>. There are
already excellent answers. Even if you don't speak German you will understand
the key points.

Google lags a few hours behind, but you will be able to find it there later:
http://groups.google.com/groups?threadm=3BE915AF.4060708%40logic.univie.ac.at
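The link above follows a simple pattern. The message-id is appended to a `threadm=` query without its brackets, with `@` percent-encoded as `%40`. Here is a Python sketch that reproduces it. The function name is hypothetical, and this Google Groups URL format is taken from the link above and may no longer be served.

```python
from urllib.parse import quote

GOOGLE_THREAD_BASE = "http://groups.google.com/groups?threadm="


def google_thread_url(msgid: str) -> str:
    """Build a Google Groups thread link for a message-id."""
    bare = msgid.strip().strip("<>")
    # safe="" so that '@' (and '/') are percent-encoded as well.
    return GOOGLE_THREAD_BASE + quote(bare, safe="")
```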

Next week I will post a summary here.

pi
Sorry if I'm being ignorant, but I don't really see the point of, or the need for, this...
It's much better to wrongly guess that a *@* string is a mailto: address than to
wrongly guess that it's a message-id, because mailto: addresses are so much more common.

Guessing based on valid characters, the length of the address and so on is not
good enough for me... So, weighing usefulness against reliability, I still
think this should be WONTFIX.
Sorry, this took longer than expected and it is still not finished. But in the
meantime there is an alpha version of a Perl script (it should be easy to understand)
which guesses pretty well: http://piology.org/perl/id-or-mail.pl.html

pi
To add to my last comment: I have developed the script further. On a large test set
(more than 4 million message-ids and more than 0.5 million e-mail addresses) the
error rate is below 1% for message-ids and below 0.33% for e-mail addresses.
Certainly, this could be improved further. So far it is at least a proof of concept.

pi
...for archival purposes.
I think the actual script is more useful than the web page wrapped around it.
This obsoletes attachment 63749, but I cannot mark it as obsolete.

pi
Attachment #63749 - Attachment description: CUrrent version of Boris Perl script (copied from Website) → Current version of Boris' Perl script (copied from Website)
Attachment #63749 - Attachment is obsolete: true
Not even news:3BE7BB7F.1070204@logic.univie.ac.at will work.
It will appear to be a link, and it will show the link in the status bar on mouseover,
but it will NOT open a mail compose window when clicked... actually, *nothing* happens
when clicking it.

NC4.78 and Outlook load the news message when that link is clicked (if it is valid).
Is that another bug?
I programmed a little XPI add-on for Mozilla which helps to find the message for a
given message-id.
It adds a menu item to the message pane context menu, so you just have to
right-click on the message-id and then choose the appropriate news server in the
context menu.

You can find the XPI add-on and some explanations at:
http://messageidfinder.mozdev.org/

Markus

PS: Waiting for feedback
Markus Hossner, this is the wrong bug. You probably want bug 37653.
Summary: Message-ID interpreted as mail address → URL: Message-ID interpreted as mail address
Ben, you just changed the summary. Strictly speaking, this is not about URLs; this
bug is about plain message-ids. news:message-id is bug 108877.

pi
I see this question spawned a long thread and Boris' Perl script, but one short
look at the script tells me that we probably can't implement it well in Mozilla.
There's no way anything like the script could go into Mozilla. It should be 1 or
2 rules (otherwise code bloat) which are correct about 99.9% of the time
(otherwise users get *very* confused about why it sometimes says mailto and sometimes
msgid). Judging from the script, it doesn't look like there is such a rule, so I'm
closing this as WONTFIX.

Note: We still don't have proper support for actually *using* msg-ids, I think.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
Summary: URL: Message-ID interpreted as mail address → [mozHTMLToTXTConv] Message-ID interpreted as mail address
Note: If there is indeed a safe rule, please specify it readably, not in Perl
code, because Perl looks to me like a bunch of strange characters thrown in a mixer.
Oh, and Boris, sorry for first asking and then not using the result, but this
script is so many miles away from what I could or would implement in C++ (even
if I wanted to fix this bug). It's not even the same country. I blame myself for
asking "Can you come up with more [rules]?" in comment 12, but I also said "that
someone still has to implement the stuff. I probably won't do it myself.".
Summary: [mozHTMLToTXTConv] Message-ID interpreted as mail address → [mozTXTToHTMLConv] Message-ID interpreted as mail address
Ben, part of the discussion was to offer both in a context menu so people can
make the decision themselves. You are right that Mozilla cannot deal with
message-ids at all (this includes news and nntp URLs). This is embarrassing.

So even if Mozilla cannot decide, it can offer both.

pi
Boris, as I said several times, there is no way for the converter to offer UI
(not even iframes with DOM and JS would work, because JS is likely to be
disabled). The TXT->HTML conversion is implemented in the backend (in the Gecko
network library), and it's normal HTML in the Mailnews frontend.
Well, there is a context menu, and it works somehow. It should be possible to
launch a message-id from it (once we can do that at all), just as we can already
launch it as mailto.

pi