Open Bug 278713 Opened 20 years ago Updated 2 years ago

[mozTXTToHTMLConv] linkified text must always be displayed embedded left-to-right

Categories

(Core :: Networking, defect, P5)

People

(Reporter: smontagu, Unassigned)

Details

(Whiteboard: [necko-would-take])

http://www.ietf.org/internet-drafts/draft-duerst-iri-11.txt, soon to be an RFC,
says:

 Bidirectional IRIs MUST be rendered in the same way as they would be rendered if
 they were in a left-to-right embedding, i.e.  as if they were preceded by
 U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL
 FORMATTING (PDF).  Setting the embedding direction can also be done in a
 higher-level protocol (e.g.  the dir='ltr' attribute in HTML).
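
A minimal sketch of the embedding the draft describes, assuming the IRI has already been isolated as a string. The helper name `embed_ltr` is hypothetical and not part of mozTXTToHTMLConv; it only shows where the two control characters go.

```python
LRE = "\u202A"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # U+202C POP DIRECTIONAL FORMATTING


def embed_ltr(iri: str) -> str:
    """Wrap an IRI so it is laid out as if inside a left-to-right embedding.

    Hypothetical helper for illustration only.
    """
    return f"{LRE}{iri}{PDF}"


# The sample sentence from this report, with only the IRI wrapped.
sentence = "גלשתי אל " + embed_ltr("http://he.wikipedia.org/מוזילה") + " וקראתי"
```

The surrounding Hebrew text is left untouched, so the paragraph direction is still chosen by the viewer; only the IRI is forced into a left-to-right run.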

Sample steps to reproduce:
 Send yourself a (plain text) mail containing the following sentence:
 גלשתי אל http://he.wikipedia.org/מוזילה וקראתי

Expected results depend on whether BidiMailUI is installed, but the IRI should
always appear as
 http://he.wikipedia.org/מוזילה
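
The IRI can end up displayed in a different order because it mixes strong left-to-right characters (the Latin scheme and host) with strong right-to-left ones (the Hebrew path), so in a right-to-left paragraph the bidi algorithm may reorder its parts. A rough sketch of detecting such IRIs with Python's `unicodedata` module follows; the function name `is_bidirectional` is an assumption made for illustration.

```python
import unicodedata

# Strong right-to-left bidi classes: R (e.g. Hebrew) and AL (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def is_bidirectional(iri: str) -> bool:
    """Return True if the IRI contains at least one strong RTL character.

    Hypothetical helper; it inspects character classes only and does not
    run the bidi algorithm itself.
    """
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in iri)
```

An ASCII-only URL never triggers this check, which is why the problem only shows up with IRIs such as the one above.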

One way to fix this would be to add a CSS rule:

a[class|="moz-txt-link"] {
	direction: ltr;
	unicode-bidi: embed;
}

So this is a bug in the text-to-HTML converter?
If you're talking about plaintext mails, then probably.

What is an "IRI"? A URL with unicode characters in the URL? In your example, the
URL is *not* in unicode, only the rest of the text is.

The recognizer doesn't even attempt to recognize "internationalized" URLs, nor
do I think it should. (It's just a convenience feature, and these countries
usually use HTML mail anyways, because it works better.)

Or are you saying that normal ASCII-URLs are rendered in right-to-left, i.e.
gro.aidepikiw.eh//:ptth
? (If so, what does that have to do with "Internationalized Resource Identifiers
(IRIs)")?
Summary: linkified text must always be displayed embedded left-to-right → [mozTXTToHTMLConv] linkified text must always be displayed embedded left-to-right
(In reply to comment #3)

> What is an "IRI"? A URL with unicode characters in the URL?

Yes. See RFC 3987, referenced in the URL field.

> In your example, the
> URL is *not* in unicode, only the rest of the text is.

That's because of a bug in Bugzilla's URL recognizing (bug 229010, which
unfortunately went off-topic after the first few comments). The IRI in my
examples is the whole of the line following this one, not just the part of it
linkified by bugzilla:
 http://he.wikipedia.org/מוזילה

> The recognizer doesn't even attempt to recognize "internationalized" URLs, nor
> do I think it should.

They are defined in a published RFC, and I see no reason for not recognizing
them. Anyway, your statement is inaccurate: in mail, unlike Bugzilla, the whole
IRI gets linkified, even when reordering splits it into two parts.
> I see no reason for not recognizing them

Because they are a large and complex topic, which I don't understand, because it
basically covers all languages of this world, and if we attempt to recognize
them, people will demand that it works correctly, and that's a pretty futile
attempt, because the recognizer uses language customs like word separation etc.
(e.g. in Japanese, they don't necessarily put spaces between URL and text, I
heard). We have to draw a line somewhere, and I think that international URLs
are beyond it. Esp. considering that plaintext is not terribly suited for
international languages anyways, apparently.

I'm fine with adding a bidi CSS rule, though.
> Esp. considering that plaintext is not terribly suited for
> international languages anyways, apparently.

I wonder where on earth you got such an impression. The apparent proliferation
of emails in HTML in some languages (especially spam) has nothing to do with the
suitability of plain text emails for those languages. 
> I wonder where on earth you got such an impression.

(That's what momoi said, formerly responsible for internationalisation of the
Netscape/Mozilla Mail client and (I think) himself Japanese.)
(In reply to comment #7)
> > I wonder where on earth you got such an impression.

> (That's what momoi said, formerly responsible for internationalisation of the
> Netscape/Mozilla Mail client and (I think) himself Japanese.)

Yes, he's Japanese. Anyway, it's very hard to believe he said that.

Is this bug about adding "direction: ltr; unicode-bidi: embed;" or about making
the recognizer capable of detecting Unicode URLs in all languages?
Yeah, the discussion has been getting a little out of scope here ;-)

Ben, shall I attach a patch with the CSS rule in comment 0? Which file would
that belong in?
themes/.../messenger/messageBody.css, if it's OK to only use it in our viewer.
If it needs to be adhered to when quoting plaintext mails in HTML and sending
that out, it needs to be in the msg source, but that's getting ugly.
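
If the direction did have to travel with the message source, one option is the `dir='ltr'` attribute mentioned in the draft. A minimal sketch, assuming the link markup is generated as a string: the function `linkify_ltr` is hypothetical, and the class suffix `-freetext` is an assumption; only the `moz-txt-link` prefix comes from comment 0.

```python
from html import escape


def linkify_ltr(iri: str, css_class: str = "moz-txt-link-freetext") -> str:
    """Build an anchor whose content is laid out left-to-right.

    Hypothetical helper, not the converter's actual output format.
    """
    safe = escape(iri, quote=True)  # escapes &, <, >, and both quote characters
    return f'<a class="{css_class}" dir="ltr" href="{safe}">{safe}</a>'
```

Because the attribute is part of the markup, it survives quoting and resending, unlike a rule that lives only in the viewer's messageBody.css.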
Assignee: darin → nobody
QA Contact: benc → networking
Whiteboard: [necko-would-take]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
Severity: normal → S3