Closed
Bug 944773
Opened 12 years ago
Closed 10 years ago
XML tags and urls in right-to-left texts in verbatim
Categories
(Webtools Graveyard :: Verbatim, defect)
Webtools Graveyard
Verbatim
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: amir_farsi, Unassigned)
Details
Attachments
(1 file, 1 obsolete file)
370.05 KB, image/png
User Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36
Steps to reproduce:
The problem concerns the formatting of any left-to-right text embedded in right-to-left text (especially code, URLs, tags, XML, and so on). I have this problem in Verbatim: it shows an XML tag error.
Actual results:
Verbatim shows an XML error on complex right-to-left texts. As a result, long and complex texts become untranslatable for right-to-left languages, especially Persian (Farsi).
Reporter
Updated•12 years ago
Severity: normal → major
Updated•12 years ago
Component: fa / Persian → Verbatim
Product: Mozilla Localizations → Webtools
Version: unspecified → Trunk
Comment 1•12 years ago
If there are XML tag errors, then in most cases your XML tags are incorrect.
This is hard to determine in RTL languages. Verbatim uses the standard textarea, so any potential RTL issues here affect everyone.
The most useful keyboard shortcut in these cases is Ctrl+Shift+X, which toggles between RTL and LTR. In LTR you will be able to correct any broken variables or tags.
Reporter
Comment 2•12 years ago
Yes. I used the buttons to switch between RTL and LTR, but that caused other problems while typing.
Anyway, I think the problem is in the processing of LTR text embedded in RTL text.
For example, I can edit texts that contain only one XML code or URL, but when a text has multiple XML codes, or contains long XML strings, brackets, and other symbols such as {}, editing it in an RTL language is very hard.
The RTL and LTR buttons couldn't solve the problem well.
Also, I think the problem is that when the system changes the direction of the text to RTL, it treats all of the text as having RTL direction. For example, the system changes the direction of { to } and } to {, which causes problems in long texts and in texts with multiple XML tags.
I think Ehsan Akhgari will understand my problem. This problem exists in most applications, except professional word processors. Mr. Akhgari, if you can describe the problem, please do so here.
Thanks for your comment.
Reporter
Comment 3•12 years ago
I added a screenshot of the errors and the state of the text.
Reporter
Updated•12 years ago
Attachment #8340598 - Attachment is obsolete: true
Attachment #8340598 - Attachment is patch: true
Attachment #8340598 - Attachment mime type: image/jpeg → text/plain
Updated•12 years ago
Attachment #8340598 - Attachment is patch: false
Updated•12 years ago
Attachment #8340598 - Attachment mime type: text/plain → image/jpeg
Comment 4•12 years ago
OK, I see you solved the bug; I found Shift+Ctrl+X useful to validate that. You had a stray > that caused the error, I think. Can you please close the bug if it is solved?
Reporter
Comment 5•12 years ago
These keys only switch the direction of the whole text between RTL and LTR.
But my problem is not changing the direction of all the text in the translation text box. In Verbatim, for RTL languages, the translation text box is RTL by default, so there is no need to switch from LTR to RTL.
I removed that attachment because the file I uploaded was invalid.
I think this new attachment can describe the problem visually.
My problem is getting the correct direction for LTR text inside RTL text.
Comment 6•12 years ago
Amir, the problem that you're facing is very common. Basically, the Unicode Bidi Algorithm (UBA) which is what pretty much all software use these days to lay out bidirectional text specifies some characters as having strong directionality (examples: a and ب) while some others (such as < and >) have weak directionality. The directionality of those weak characters is basically determined by their surrounding characters (note that I'm over-simplifying here) which means that the angle bracket characters used in HTML tags will get the wrong direction if their preceding character has strong RTL directionality. It's impossible for a program to determine whether this heuristic will result in a bad rendering. It doesn't most of the time, but it does especially when working on bidi HTML source code.
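As a rough illustration of the character classes described above, Python's standard `unicodedata` module can report the bidi class that the Unicode data assigns to each character. Note that it labels `<`, `>`, `{` and `}` as ON (other neutral) rather than one of the formal weak classes; the practical effect is the one described here, since they take their direction from context. The same characters are also flagged as mirrored, which would explain `{` appearing as `}` inside an RTL run. The helper name `describe` is made up for this sketch.

```python
import unicodedata

# Characters discussed in this bug: a Latin letter, an Arabic-script letter,
# markup punctuation, and the two invisible direction marks (LRM, RLM).
SAMPLES = ["a", "\u0628", "<", ">", "{", "}", "\u200e", "\u200f"]


def describe(ch):
    """Return (name, bidi class, mirrored flag) for a single character."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )


if __name__ == "__main__":
    for ch in SAMPLES:
        name, bidi_class, mirrored = describe(ch)
        print(f"U+{ord(ch):04X}  {name:<24} class={bidi_class:<3} mirrored={mirrored}")
```

Letters come out with a strong class (L, or AL for Arabic-script letters), while the brackets come out as ON and mirrored.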
You _can_ work around this problem by adding unicode marker characters (RLM and LRM) which are invisible characters having strong RTL and LTR directionality respectively, but then those characters will end up in your HTML code and will potentially break the actual layout of the rendered HTML.
There is really no good solution here, I'm afraid.
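For anyone who wants to try the RLM/LRM workaround mentioned above while editing, here is a minimal sketch, assuming the marks are only an editing aid and are stripped again before the string is saved. The function names and the simple tag pattern are hypothetical; nothing here is part of Verbatim, and the caveat about marks leaking into the HTML still applies if the stripping step is skipped.

```python
import re

LRM = "\u200e"  # LEFT-TO-RIGHT MARK: invisible, strong LTR
RLM = "\u200f"  # RIGHT-TO-LEFT MARK: invisible, strong RTL

# Deliberately simple pattern for things that look like tags, e.g. <b> or </a>.
TAG = re.compile(r"<[^<>]+>")


def wrap_tags_with_lrm(text):
    """Surround each tag-like span with LRM so its brackets resolve as LTR."""
    return TAG.sub(lambda m: f"{LRM}{m.group(0)}{LRM}", text)


def strip_marks(text):
    """Remove every LRM/RLM, including any that were in the text originally."""
    return text.replace(LRM, "").replace(RLM, "")


if __name__ == "__main__":
    source = "\u0633\u0644\u0627\u0645 <b>\u062f\u0646\u06cc\u0627</b>"
    edited = wrap_tags_with_lrm(source)
    # The round trip should give back the original string.
    print(strip_marks(edited) == source)
```

Because `strip_marks` removes all marks, it is not suitable for strings where a translator placed LRM or RLM on purpose.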
Reporter
Comment 7•12 years ago
Thanks, Ehsan, for your very good description. It shows that the problem I reported is a common one, and how it happens. But Ehsan, shouldn't there be a solution for processing and rendering the directionality of bidirectional text? For example, characters wouldn't have to get their direction from surrounding characters; they could have separate direction processing. At the least, we could have separate direction processing for characters such as < and >, which may be part of HTML and other markup languages (if it were embedded in the rendering systems, software, and algorithms).
I think one day the problems of bidirectional text must be solved. Ehsan, could you run an open source project at Mozilla for more research on solving this problem?
Also, I used those Unicode marker characters, but they caused other problems and did not solve the problem.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #6)
> Amir, the problem that you're facing is very common. Basically, the Unicode
> Bidi Algorithm (UBA) which is what pretty much all software use these days
> to lay out bidirectional text specifies some characters as having strong
> directionality (examples: a and ب) while some others (such as < and >) have
> weak directionality. The directionality of those weak characters is
> basically determined by their surrounding characters (note that I'm
> over-simplifying here) which means that the angle bracket characters used in
> HTML tags will get the wrong direction if their preceding character has
> strong RTL directionality. It's impossible for a program to determine
> whether this heuristic will result in a bad rendering. It doesn't most of
> the time, but it does especially when working on bidi HTML source code.
>
> You _can_ work around this problem by adding unicode marker characters (RLM
> and LRM) which are invisible characters having strong RTL and LTR
> directionality respectively, but then those characters will end up in your
> HTML code and will potentially break the actual layout of the rendered HTML.
>
> There is really no good solution here, I'm afraid.
Comment 8•12 years ago
Well the issue is that the Unicode Bidi Algorithm has not been designed to deal with things such as laying out a bidi XML source code. It's mostly designed with the natural language use cases in mind, so it's not entirely a surprise that it falls down badly in this case.
I've never seen a better proposal which can handle both natural language and these "programming" use cases. Also note that the UBA is implemented in pretty much every piece of software that displays text on any platform (at least those that are bidi aware) so replacing it with another algorithm is a huge challenge, if not impossible.
Comment 9•10 years ago
Resolving this bug as Ehsan has described the problem very well and since this is a UBA-specific issue and the cost is too high to attempt to replace UBA with something else.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Assignee
Updated•10 years ago
Product: Webtools → Webtools Graveyard