
[mozTXTToHTMLConv] structs with leading/trailing international chars not recognized. For example structured plain text */_éfoobar$_/* not displayed as bold, italic, or underline when there are trailing or leading special or accented characters.

NEW
Assigned to

Status

MailNews Core
Backend
--
minor
16 years ago
a year ago

People

(Reporter: Johannes Teveßen, Assigned: smontagu)

Tracking

(Depends on: 2 bugs, {intl, ux-consistency})

Trunk
intl, ux-consistency

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: See dependency bug 415209)

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

16 years ago
Mozilla Mail and News does not highlight text in messages
that is marked with Asterisks ('*') to be bold if the text
between the asterisks contains "international" characters.

This is annoying in languages like German (Umlauts). For
example, the word

  *nichts*

is written in bold, but the word

  *überhaupt*

is not written in bold, presumably because of the Umlaut 'ü'.

The charset of this mail in the example is set by
the sender to iso-8859-1.

Tested with Mozilla Mail and News in 0.9.5 (Release) on WinNT4.

Comment 1

15 years ago
Windows 2000
./nightly/latest-1.0.0/
Build 2002041617

Confirmed using French characters, in my case the "é". HOWEVER: the reported
behaviour is not correctly described.

*écriture* - does not work, is NOT bold.
*téléphone* - WORKS, IS BOLD.

Hence, the problem occurs when the character is the FIRST char after "*".

Comment 2

15 years ago
Not a dupe, confirming.
Blocks: 116842
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 3

15 years ago
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0+) Gecko/20020527

I also see it, marking ALL.

pi
OS: Windows NT → All
Hardware: PC → All

Comment 4

15 years ago
I think there *is* a bug about it, but I cannot find it.

This bug exists because the code uses |nsCRT::IsAsciiAlpha()| to determine whether a
character is an alpha char. Tell me a function that works internationally and I can change it.
Assignee: sspitzer → ben.bucksch
Severity: trivial → minor
Component: Mail Window Front End → Networking
Product: MailNews → Browser
Summary: Message text highlighting broken with 8bit chars → [mozTXTToHTMLConv] structs with leading/trailing 8bit chars not recognized
Target Milestone: --- → Future
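
A minimal sketch of why an ASCII-only letter test rejects the first character of *überhaupt*. The function `is_ascii_alpha` is a hypothetical Python stand-in for the behaviour comment 4 attributes to |nsCRT::IsAsciiAlpha()|; it is not the Mozilla implementation.

```python
def is_ascii_alpha(ch):
    """Hypothetical stand-in: true only for the letters A-Z and a-z."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


# First character after the '*' in the two examples from the description.
print(is_ascii_alpha("n"))  # 'n' from *nichts*    -> True
print(is_ascii_alpha("ü"))  # 'ü' from *überhaupt* -> False
```

Under this assumption, any struct whose first inner character lies outside A-Z/a-z fails the check, which matches the observations in comment 1.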

Comment 5

15 years ago
*** Bug 142507 has been marked as a duplicate of this bug. ***

Comment 6

15 years ago
*** Bug 123326 has been marked as a duplicate of this bug. ***

Comment 7

15 years ago
*** Bug 149245 has been marked as a duplicate of this bug. ***
>Tell me a function that works internationally and I can change it.

well, if it's a char*, isascii() might be enough but works only for the current
locale.

as there are more than 64000 characters, I don't think there's a function that
tells you for each one if it's an alphanumeric char... at least none I know of.

Comment 9

15 years ago
Sure? isascii() only tells you whether a certain character belongs to 7-bit
ASCII. What Ben asked for is a function which tells him whether a certain
character is alphanumeric.
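
The distinction made here can be illustrated with Python's built-in string methods, used purely as an analogy: `str.isascii()` answers "is this 7-bit ASCII?", while `str.isalnum()` answers "is this a letter or digit?". The helper name `classify` is made up for this sketch.

```python
def classify(ch):
    """Return (is 7-bit ASCII, is alphanumeric) for a single character."""
    return (ch.isascii(), ch.isalnum())


for ch in ["a", "7", "$", "ü", "é"]:
    print(repr(ch), classify(ch))
```

The two answers are independent: "$" is ASCII but not alphanumeric, and "ü" is alphanumeric but not ASCII.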

Comment 10

15 years ago
*** Bug 172800 has been marked as a duplicate of this bug. ***

Comment 11

15 years ago
mass assignment of text->HTML bugs to MailNews w/ esther as QA.
Component: Networking → Mail Back End
Product: Browser → MailNews
Target Milestone: Future → ---
Version: Trunk → other

Comment 12

14 years ago
*** Bug 194032 has been marked as a duplicate of this bug. ***
We have some IsUTF8 function, but an IsUTF8Alpha function needs to be written first.

Comment 14

13 years ago
*** Bug 206298 has been marked as a duplicate of this bug. ***
Product: MailNews → Core

Comment 15

13 years ago
*** Bug 272981 has been marked as a duplicate of this bug. ***

Comment 16

13 years ago
(In reply to comment #8)
> >Tell me a function that works internationally and I can change it.
> 

> as there are more than 64000 characters, I don't think there's a function that
> tells you for each one if it's an alphanumeric char... at least none I know of.

Actually, there are |nsIUGenCategory| and |nsIUGenDetailCategory|.
Keywords: intl

Comment 17

12 years ago
*** Bug 280298 has been marked as a duplicate of this bug. ***

Comment 18

12 years ago
*** Bug 292558 has been marked as a duplicate of this bug. ***

Comment 19

9 years ago
It occurs also with UTF-8, ISO-8859-7, Windows-1253. And it is for all three formatting characters: *, /, _

Comment 20

9 years ago
Currently using isAsciiAlpha() to tell whether it's a letter. Need generic isUnicodeAlpha() which works for all Unicode chars/languages.

Filed bug 415209 for this.
Depends on: 415209
Whiteboard: Currently using isAsciiAlpha() to tell whether it's a letter. Need generic isUnicodeAlpha() which works for all international chars/languages. See dependencies.

Updated

9 years ago
Whiteboard: Currently using isAsciiAlpha() to tell whether it's a letter. Need generic isUnicodeAlpha() which works for all international chars/languages. See dependencies. → See dependencies.

Comment 21

9 years ago
I'm not sure whether replacing isAsciiAlpha() with some isUnicodeAlpha()-like is sufficient, as Tb should format phrases like *13 Ιαν*, or _νοκ-άουτ_, i.e., words and phrases that combine any letters with digits, punctuation marks, etc.

Comment 22

9 years ago
> should format phrases like *13 Ιαν*, or_νοκ-άουτ_, i.e., words
> and phrases that combine any letters with digits, punctuation marks

No, it was an intentional decision not to do that. It's too likely to go wrong for (simple) math, ASCII art, etc. It may look a bit arbitrary, but the converter is written with the goal of minimal false positives, even if that means false negatives, especially given that this is just a nicety and nothing depends on this feature.

Comment 23

9 years ago
IMHO, statistics might show you that the majority of Tb users don't have English as their mother tongue. So perhaps you ought to put in a little extra effort for them.
Duplicate of this bug: 422796

Updated

9 years ago
QA Contact: esther → backend

Updated

9 years ago
Duplicate of this bug: 436413
Duplicate of this bug: 437395
Product: Core → MailNews Core
Duplicate of this bug: 465679
Duplicate of this bug: 473947
Duplicate of this bug: 826281
Duplicate of this bug: 828485
Duplicate of this bug: 622057
The summary of this bug, while certainly correct and concise, is too technical for both general users and even QA to find it, which contributes to unnecessary inflation of duplicates (currently 16), which is highly undesirable for bug workflow and management as we are wasting time to analyse, discuss, and add testcases for the same problem all over again in each bug. More so given our current scarce manpower in QA.

For exactly and only those reasons, I will add frequent search words that users associate with this bug to the bug summary, with the unfortunate but inevitable side effect that the summary will be longer and less concise than now. However, that appears to me to be clearly the lesser evil compared to wasting more time on duplicates. Feel free to improve the summary, but pls refrain from removing any of the popular search words (and you have no idea what people search for when they search, it's creative language use for sure). What I usually try is to blend relevant search words into a human-readable summary of the bug, which is also helpful for a better understanding of the bug itself and from search results.

Btw, it's a major shortcoming of Bugzilla that we don't have a separate field for adding freetext searchwords, which would be a simple and much superior solution over stuffing summaries. Lacking that, there's currently no other way than using summary for that purpose.
Summary: [mozTXTToHTMLConv] structs with leading/trailing 8bit chars not recognized → [mozTXTToHTMLConv] structs with leading/trailing 8bit chars not recognized (structured plain text like */_éfoobar$_/* with special or accented characters at beginning or end of string is not displayed as formatted bold, italics, or underlined)
I fully appreciate that when Ben assigned himself to this bug in 2002 (comment 4), he was willing to fix it, but was hindered by bug 415209 on which this bug depends, where bug 415209 is obviously a lot harder than this one.

So this bug and its assignee are apparently waiting for bug 415209 to be done first, but different assignee of bug 415209 unfortunately hasn't touched that one for several years either (I've just invited him to continue, in support of Ben's bug).

The net effect is that while this bug appears "assigned", nobody is working on it. Again, I'm looking at this from a bug workflow and management perspective. Such bugs create a false sense of progress and security where there is neither. We could probably assign the whole database that way asserting that somebody will fix it IF all those other blocking bugs were fixed, and we'd end up with an assignment quota like from a book of fairy tales.

I don't see much benefit in such resting assignments. On the contrary, I'd suspect that an inactive assignment might narrow down that little chance of somebody actually coming along to pick this up or contribute new ideas. Why should somebody try a bug that is already assigned? How can I as a QA volunteer credibly invite active coders to try their creativity and ideas on bugs like this (or their blockers) if it's already occupied by an inactive assignment where they will be shy to interfere?

In conclusion, I'd recommend that we unassign bugs like this one to keep the door open for others who might wish to work on this or even just add alternative ideas, and to have a more truthful reflection of the bug status in our database. Instead of assignment-on-hold, we could just keep a comment from Ben for the record that he's willing to work on this after somebody else has fixed blocking bug 415209.

How do others think about this? Comments welcome.
Flags: needinfo?
Duplicate of this bug: 208522
(Assignee)

Comment 35

4 years ago
Created attachment 796909 [details] [diff] [review]
106028.diff

This builds, but I'm not sure if it's correct and I don't know how to test it.
Attachment #796909 - Flags: feedback?(ben.bucksch)
(Assignee)

Comment 36

4 years ago
Created attachment 796915 [details] [diff] [review]
Patch

hg export did something strange in the previous attachment
Attachment #796909 - Attachment is obsolete: true
Attachment #796909 - Flags: feedback?(ben.bucksch)
Attachment #796915 - Flags: feedback?
(Assignee)

Updated

4 years ago
Attachment #796915 - Flags: feedback? → feedback?(ben.bucksch)
Flags: needinfo?

Comment 37

4 years ago
Comment on attachment 796915 [details] [diff] [review]
Patch

Approach looks good to me.
- Consider speed. This must process large texts. I don't know how
  fast the functions are that you call in comparison to the previous ones.
- I don't know where IsAlpha() comes from, nor
  why the change from PRUnichar to uint32_t (that doesn't look right to me).
Attachment #796915 - Flags: feedback?(ben.bucksch) → feedback+

Comment 38

4 years ago
> - I don't know where IsAlpha() comes from, nor

Ah, from bug 415209
Summary: [mozTXTToHTMLConv] structs with leading/trailing 8bit chars not recognized (structured plain text like */_éfoobar$_/* with special or accented characters at beginning or end of string is not displayed as formatted bold, italics, or underlined) → [mozTXTToHTMLConv] structs with leading/trailing 8bit chars not recognized
(In reply to Ben Bucksch (:BenB) from comment #38)
> > - I don't know where IsAlpha() comes from, nor
> 
> Ah, from bug 415209

Fixing Ben's accidental truncation of summary from comment 38, sorry for spam.

Longer summary, while not ideal, is required for QA workflow because of design shortcoming of Bugzilla, as explained with several clear reasons in my comment 32. In view of that comment, deliberate truncating of summary without refuting those arguments would be offensive, nonsensical and an open violation of cooperative spirit in Bugzilla, especially if you're not pushing large chunks of QA work as I do. So I assume it would be very unfortunate and I don't want to believe that Ben would deliberately insist on offending my work like that by annihilating my changes to the bug, while - all differences aside - I've actually succeeded to get some traction on a bug which was assigned to him since 2002 and hasn't seen any activity except piling up duplicates since 2008.
Summary: [mozTXTToHTMLConv] structs with leading/trailing 8bit chars not recognized → [mozTXTToHTMLConv] structs with leading/trailing 8bit chars not recognized (structured plain text like */_éfoobar$_/* with special or accented characters at beginning or end of string is not displayed as formatted bold, italics, or underlined)

Comment 40

4 years ago
For testing, see http://mxr.mozilla.org/comm-central/source/mozilla/netwerk/test/unit/test_mozTXTToHTMLConv.js
Assignee: ben.bucksch → smontagu
(Assignee)

Comment 41

4 years ago
(In reply to Ben Bucksch (:BenB) from comment #37)
> Comment on attachment 796915 [details] [diff] [review]
> - Consider speed. This must process large texts. I don't know how
>   fast the functions are that you call in comparison to the previous ones.

I believe the functions in nsUnicodeProperties.cpp are well enough optimized for speed that this won't be a problem.

> - I don't know where IsAlpha() comes from, nor
>   why the change from PRUnichar to uint32_t (that doesn't look right to me).

Ideally we should support any Unicode character up to U+10FFFF, not just the range up to U+FFFF that can fit in a PRUnichar. However, doing this properly will require disproportionately more work, and I've decided to postpone it to a follow-up bug.
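
To illustrate the range limit mentioned above: a character up to U+FFFF fits in one 16-bit UTF-16 code unit, while a supplementary character needs two. This is a small Python sketch; `utf16_code_units` is a helper name invented for the example.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units of a single character as a list of ints."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+20000 lies outside the U+0000..U+FFFF range and needs two code units.
for ch in ["é", "υ", "\U00020000"]:
    print(f"U+{ord(ch):04X}", [hex(u) for u in utf16_code_units(ch)])
```

A check that looks at one 16-bit unit at a time therefore never sees a supplementary character as a whole.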

Comment 42

4 years ago
(In reply to Thomas D. from comment #39)
> Fixing Ben's accidental truncation of summary from comment 38, sorry for
> spam.
> 
> Longer summary, while not ideal, is required for QA workflow because of
> design shortcoming of Bugzilla, as explained with several clear reasons in
> my comment 32. In view of that comment, deliberate truncating of summary
> without refuting those arguments would be offensive, nonsensical and an open
> violation of cooperative spirit in Bugzilla, especially if you're not
> pushing large chunks of QA work as I do. So I assume it would be very
> unfortunate and I don't want to believe that Ben would deliberately insist
> on offending my work like that by annihilating my changes to the bug, while
> - all differences aside - I've actually succeeded to get some traction on a
> bug which was assigned to him since 2002 and hasn't seen any activity except
> piling up duplicates since 2008.

Don't CC me to any more bugs, Thomas.  I can't stand reading your self-important pronouncements.

Comment 43

4 years ago
@Thomas @Mike By the way, there are users facing this bug everyday... Time for change? :-) Thanks in advance to you guys for your precious help!
(Assignee)

Comment 44

4 years ago
Created attachment 798299 [details] [diff] [review]
Patch v.2

I removed the attempt to support supplementary characters and added a unit test. I also tested manually in a thunderbird build and everything seems to be working.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=39d247f29958
Attachment #796915 - Attachment is obsolete: true
Attachment #798299 - Flags: review?(ben.bucksch)

Comment 45

4 years ago
Comment on attachment 798299 [details] [diff] [review]
Patch v.2

Review of attachment 798299 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/xpcshell.ini
@@ +15,5 @@
>  [test_auth_proxy.js]
>  [test_authentication.js]
>  [test_authpromptwrapper.js]
>  [test_backgroundfilesaver.js]
> +[test_bug106028.js]

Please use a more descriptive file name.

Updated

4 years ago
Summary: [mozTXTToHTMLConv] structs with leading/trailing 8bit chars not recognized (structured plain text like */_éfoobar$_/* with special or accented characters at beginning or end of string is not displayed as formatted bold, italics, or underlined) → [mozTXTToHTMLConv] structs with leading/trailing international chars not recognized

Comment 46

4 years ago
I wrote in comment 37:
> - Consider speed. This must process large texts.

I retract that, because I remember that we go through this code only when we found one of these */_ marker characters, so it's not a big deal.

smontagu wrote in comment 41:
> Ideally we should support any Unicode character up to U+10FFFF

I don't think there's a need for it. Can you leave this with PRUnichar? r=BenB with this change.

Remember that we never claimed we'd recognize *everything*. Let's not go overboard with supporting Maya language.

(In fact, more important would be to recognize the emphasis in the last sentence, before sentence markers like .,;"' , without triggering any other false positives.)

Code, testcase:
> 	"\u03C5\u03C0\u03BF\u03B3\u03C1\u03AC\u03BC\u03BC\u03B9\u03C3\u03B7",
>       // Greek υπογράμμιση

Minor NIT:
If you can add the Greek characters as a comment, can you add them directly to the string literal instead of the escaped \u? That would require that your HTML doc is parsed as UTF8.
If that poses any problem or costs significant time, please ignore this comment.
(In reply to Ben Bucksch (:BenB) from comment #46)
[...]
> smontagu wrote in comment 41:
> > Ideally we should support any Unicode character up to U+10FFFF
> 
> I don't think there's a need to it. Can you leave this with PRUnichar?
> r=BenB with this change.
> 
> Remember that we never claimed we'd recognize *everything*. Let's not get
> overboard with supporting Maya language.
[...]

Maybe not Maya, but what about CJK Extensions B, C and D in plane 2 (U+20000 to U+2B81F)? My Chinese friends tell me they use some of them.
(Assignee)

Comment 48

4 years ago
(In reply to Ben Bucksch (:BenB) from comment #46)
> smontagu wrote in comment 41:
> > Ideally we should support any Unicode character up to U+10FFFF
> 
> I don't think there's a need to it. Can you leave this with PRUnichar?
> r=BenB with this change.

Well we have to cast to uint32_t at *some* point before passing to mozilla::unicode::GetGenCategory. If it isn't in this patch it will need to be in the patch for bug 415209, so adding needinfo jfkthame.
Flags: needinfo?(jfkthame)
I think we should bite the bullet (it's not *that* hard!) and handle surrogate pairs properly here. Otherwise we risk passing isolated surrogates to the Unicode character-category functions, which is basically meaningless.
Flags: needinfo?(jfkthame)
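
A rough sketch of what "handling surrogate pairs properly" could look like, written in Python for illustration only. The generator `code_points` is a hypothetical helper, not code from the patch; it combines a high and low surrogate into one code point before any category lookup.

```python
import unicodedata


def code_points(units):
    """Yield code points from UTF-16 code units, combining surrogate pairs."""
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Combine the pair into a single supplementary code point.
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield u  # BMP character, or an unpaired surrogate passed through
            i += 1


print([hex(cp) for cp in code_points([0xD840, 0xDC00])])  # ['0x20000']
print(unicodedata.category(chr(0xD840)))   # Cs: an isolated surrogate
print(unicodedata.category(chr(0x20000)))  # Lo: the combined character
```

The last two lines show the risk described above: the isolated surrogate reports category Cs, whereas the combined code point reports a letter category.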
Unless it can be demonstrated that this bug does not cause "(structured plain text like */_éfoobar$_/* with special or accented characters at beginning or end of string is not displayed as formatted bold, italics, or underlined)", a supplementary phrase of this type should remain in the bug summary, for all the reasons that other people have stated.

On behalf of intl users, thanks everyone for working on this.
Severity: minor → normal
Summary: [mozTXTToHTMLConv] structs with leading/trailing international chars not recognized → [mozTXTToHTMLConv] structs with leading/trailing international chars not recognized. For example structured plain text */_éfoobar$_/* not displayed as bold, italic, or underline when there are trailing or leading special or accented characters.

Comment 51

4 years ago
Thanks a lot to you guys for your support. By the way, I recently bumped into a new side effect caused by the / / markup, which could also hide another bug... See the bug 913768 report for more. Thanks in advance for your help!
(Assignee)

Comment 52

4 years ago
On second thoughts, perhaps the test should include "mark" characters as well as "letter" characters, to cover a case like "_souligné_" (using ASCII e and U+0301 COMBINING ACUTE ACCENT rather than U+00E9 LATIN SMALL LETTER E WITH ACUTE)
Severity: normal → minor
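
The "mark" case can be seen with Python's `unicodedata` module, used here only to illustrate the Unicode general categories involved. The predicate `is_letter_or_mark` is a hypothetical name, not a function from the patch.

```python
import unicodedata

decomposed = "souligne\u0301"  # ASCII 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "soulign\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE


def is_letter_or_mark(ch):
    """True if the general category is a letter (L*) or a mark (M*)."""
    return unicodedata.category(ch)[0] in ("L", "M")


print(unicodedata.category(decomposed[-1]))   # Mn (nonspacing mark)
print(unicodedata.category(precomposed[-1]))  # Ll (lowercase letter)
print(is_letter_or_mark(decomposed[-1]), is_letter_or_mark(precomposed[-1]))
```

A test that accepts letters only would reject the trailing character of the decomposed form, even though both strings display identically.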

Comment 53

3 years ago
This bug actually applies only when either the first or last character of the string is not a "Western" alphabetical character.  Thus, while the problem is seen for the string überhaupt, it is not seen for the string xüberhaupt.  

The problem also appears when the first or last character is numeric or a "special" character.  Thus, the problem appears with the following character strings:  
  12345
  1b2c3d4e
  a1b2c3d4
  $1b2c3d4e
  a1b2c3d4#
but not with a1b2c3d4e.  

It was argued in a comment to bug #949066 that the handling of numeric characters -- not applying the markup -- is not a problem, that it is intentional so as not to affect mathematical equations.  That argument is invalid.  My degree is in mathematics.  There are many equations and formulae that have alphabetic terms without any numeric characters.
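
The observations in this comment are consistent with a simple model in which both the first and the last inner character must be an ASCII letter. The Python sketch below encodes that model; it is an assumption inferred from the reported results, not the converter's actual code, and `would_format` is a made-up name.

```python
def is_ascii_alpha(ch):
    """True only for ASCII letters."""
    return ch.isascii() and ch.isalpha()


def would_format(inner):
    """Assumed model: first and last inner characters must be ASCII letters."""
    return bool(inner) and is_ascii_alpha(inner[0]) and is_ascii_alpha(inner[-1])


samples = ["überhaupt", "xüberhaupt", "12345", "1b2c3d4e",
           "a1b2c3d4", "$1b2c3d4e", "a1b2c3d4#", "a1b2c3d4e"]
for s in samples:
    print(f"*{s}*", "formatted" if would_format(s) else "not formatted")
```

Under this model only xüberhaupt and a1b2c3d4e are formatted, matching the results listed above.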

Updated

3 years ago
Duplicate of this bug: 949066

Updated

3 years ago
Summary: [mozTXTToHTMLConv] structs with leading/trailing international chars not recognized. For example structured plain text */_éfoobar$_/* not displayed as bold, italic, or underline when there are trailing or leading special or accented characters. → [mozTXTToHTMLConv] structs with leading/trailing international or numeric chars not recognized. For example structured plain text */_éfoobar$_/* not displayed as bold, italic, or underline when there are trailing or leading special or accented characters.
Created attachment 8347713 [details]
Testcase1.eml: Structs vs. Maths (showing that numbers should just be formatted like any other structs)

Per Bug 949066 Comment 1, plain numbers in structs (more precisely, struct text content where 1st character is numeric) are intentionally not formatted because
> math equations would be messed up.

(In reply to David E. Ross from comment #53)
> That argument is invalid.
+1

Testcase1.eml tries this hypothesis, and proves it wrong:

The only valid, numerical struct...
123 *345* 678
...can never be correct mathematical syntax (wrong spacing), so it should just be formatted like any other alphabetical character struct (this bug).

Otoh, correct mathematical syntax will never be formatted as a struct (wrong spacing again, correctly not recognized as a struct):
123 * 345 * 678
123*345*678
That's valid mathematical syntax, but not a valid struct - no problem again.
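
The spacing argument can be sketched with a regular expression. The pattern below is a hypothetical simplification written for this illustration, not the rule mozTXTToHTMLConv uses: the opening * must follow whitespace or start of text and be followed by a non-space, and the closing * must follow a non-space and precede whitespace or end of text.

```python
import re

# Hypothetical struct pattern: delimiter hugging the inner text on both sides.
STRUCT = re.compile(r"(?<!\S)\*(\S(?:.*?\S)?)\*(?!\S)")


def find_structs(text):
    """Return the inner text of every *...* span that satisfies the spacing rule."""
    return STRUCT.findall(text)


print(find_structs("123 *345* 678"))    # ['345']
print(find_structs("123 * 345 * 678"))  # []
print(find_structs("123*345*678"))      # []
```

With such a rule, the two mathematical spellings are never matched, regardless of whether the inner text is numeric or alphabetic.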
(In reply to Thomas D. from comment #55)

I've already shown in attachment 8347713 [details] that "structs vs. maths" is a myth, so e.g. *123* should just be formatted bold the same way we format *foo*. This has major implications on how we fix this bug, so let me add some more arguments before conclusions:

Arguments supporting that numerical structs *123* should be formatted like alphabetical structs *foo*

1) Valid struct syntax like *123* around numbers will always be invalid maths syntax (wrong spacing, see attachment 8347713 [details]), so formatting that is perfectly ok.

2) Valid maths syntax like 123 * 456 * 789 is always invalid struct syntax, so it will not be formatted anyway (because it's not recognized as a struct, regardless of alphabetic or numerical content). So again, no special rules for numbers required.

3) Suppose there would be valid maths syntax that is also valid struct syntax, and we'd actually format the number to be bold or italics - still no big deal:
When TB message reader displays a struct like 123 *456* 789 (invalid maths syntax), the struct characters (*,/) are always preserved, so any potential mathematical equation would still be correctly printed, albeit with a little formatting - where's the problem?
You can even copy that and paste without formatting, or paste into text editor, or paste into word and just remove formatting. No big deal.
But anyway, unless shown otherwise, this is hypothetical and cannot occur.

4) How likely is it for users to send mathematical equations in a plaintext message, given that there are dozens of tools out there to produce nice graphical equations with proper display of fractions etc.? Sorry, but imho the scenario of mathematical equations in plaintext emails belongs to the realm of plaintext myths, of which there are far too many around TB, and mostly told by the very same few plaintext lovers against better evidence. Times have changed. Mathematical equations are much more likely to be found in appropriate file attachments.

5) On the other hand, given 1), there's actually a pretty high chance that numerical structs like *123* are intentionally used as structs by those users who actually use structs. So for that much more plausible scenario, we currently fail (this bug), which is also ux-inconsistent because there's just no convincing reason why number structs should not be formatted.

q.e.d.
(In reply to Thomas D. from comment #56)

> Arguments supporting that numerical structs *123* should be formatted like alphabetical structs *foo*

(In reply to David E. Ross from comment #53)
> There are many equations and formulae that have alphabetic terms without any numeric characters.

6) Given that many mathematical equations and formulae use alphabetic terms (even without any numeric characters), current algorithm (if it were applicable to valid maths, see 1) would only exclude a small and rather random subset of mathematical expressions from formatting by structs (why numbers only?). That's also ux-inconsistent. Assuming we'd rather keep structs than cater for *invalid* maths syntax (see 1), again, structs win because excluding only one half (or less) of mathematical expressions would not make much sense.
This bug violates ux-consistency as explained in comment 53 to comment 57 for numeric structs like *123*, and by analogy, for structs with leading/trailing international or special characters, too - no reason to exclude such structs from formatting.
Keywords: ux-consistency
(In reply to Thomas D. from comment #56)
> I've already shown in attachment 8347713 [details] that "structs vs.
> maths" is a myth, so e.g. *123* should just be formatted bold the same way
> we format *foo*. This has major implications on how we fix this bug

As shown in testcase1.eml, attachment 8347713 [details], comment 56 and comment 57,
it's wrong and unnecessary to ignore structs with leading/trailing numeric character (e.g. *123* or *10EUR* or *EUR 10*) and treat them differently from alphabetical structs (e.g. *foo*).

Per this bug, it's also wrong to ignore structs with leading/trailing international/special characters (e.g. *écriture*, *$1 US*, *13 Ιαν*) and treat them differently from simple ASCII alphabetical structs (e.g. *foo*).

*** Conclusion (wrt fixing this bug): ***

For formatted rendering of structs, the type of the leading/trailing character of the inner text is irrelevant. We can just remove the entire special-casing of alphabetical characters vs. numeric characters, and render all structs correctly formatted as they occur, regardless of the character type of the first or last character. (I can't think of any leading/trailing character that should cause the user's struct formatting intention to be ignored, can you?)

So we no longer need complicated functions like isUnicodeAlpha() to verify the nature of leading/trailing characters inside structs; iow, this bug no longer depends on Bug 415209.

As a nice side-effect, removing the special-casing of numeric characters, currently realized as incomplete special-casing of alphabetical characters, will also simplify the code and improve performance. And we'll allow our international users to enjoy Ben's struct algorithms.
Looks like a win-win for everyone. :)
No longer depends on: 415209
Whiteboard: See dependencies.

Comment 60

3 years ago
I have also determined that at least some special characters (e.g., $, #) appearing first or last in a string cause the markup to be ignored.  Thus, the problem is seen with the following despite the fact that the first and last alphanumeric characters are alphabetic and not numeric:  
  *a1b2c3d4e$*
  *a1b2c3d4e#*
  *$a1b2c3d4e*
  *#a1b2c3d4e*
I decline to test other special characters.  

I must concur with the conclusion stated in comment #59, especially the sentence:  
> For formatted rendering of structs, the type of the 
> leading/trailing character of the inner text is irrelevant.
(Assignee)

Comment 61

3 years ago
(In reply to Thomas D. from comment #59)
> For formatted rendering of structs, the type of the leading/trailing
> character of the inner text is irrelevant. We can just remove the entire
> special-casing of alphabetical characters vs. numeric characters, and render
> all structs correctly formatted as they occur, regardless of the character
> type of the first or last character. (I can't think of any leading/trailing
> character that should cause the user's struct formatting intention to be
> ignored, can you?)
> 
> So we no longer need complicated functions like isUnicodeAlpha() to verify
> the nature of leading/trailing characters inside structs; iow, this bug no
> longer depends on Bug 415209.

Wait a minute. This bug has now morphed too far from its original description. Bug 949066 was duped to here, even though its scope was rather different and this bug has now been changed into a dupe of that bug. Better to keep bug 949066 as a separate RFE. If that is fixed, this will become WONTFIX.
Summary: [mozTXTToHTMLConv] structs with leading/trailing international or numeric chars not recognized. For example structured plain text */_éfoobar$_/* not displayed as bold, italic, or underline when there are trailing or leading special or accented characters. → [mozTXTToHTMLConv] structs with leading/trailing international chars not recognized. For example structured plain text */_éfoobar$_/* not displayed as bold, italic, or underline when there are trailing or leading special or accented characters.
(Assignee)

Updated

3 years ago
Depends on: 949066

Comment 62

3 years ago
Thomas D. in comment 59:
> No longer depends on: 415209
> Whiteboard: See dependencies.

Don't mess with the dependencies that I added. This is an actual, hard code-level dependency and the whole reason why this bug here exists.
Depends on: 415209
No longer depends on: 949066

Updated

3 years ago
Whiteboard: See dependency bug 415209

Comment 63

3 years ago
To translate: I would *like* to have any accented or non-English alphabetic characters to be recognized.
I *do not* want numbers or other special symbols after * to be recognized as structs. There is a too high risk of false positives, e.g. in ASCII art or other strings of special characters.
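
The behaviour requested here could be sketched as a boundary check that accepts any Unicode letter but still rejects digits and symbols. This Python version uses `unicodedata` as a stand-in for whatever letter test the real fix uses; `is_unicode_letter` and `would_format` are hypothetical names.

```python
import unicodedata


def is_unicode_letter(ch):
    """True if the Unicode general category is a letter (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def would_format(inner):
    """Assumed rule: first and last inner characters must both be letters."""
    return bool(inner) and is_unicode_letter(inner[0]) and is_unicode_letter(inner[-1])


for s in ["überhaupt", "écriture", "υπογράμμιση", "123", "$ 20", "(wth!)"]:
    print(f"*{s}*", would_format(s))
```

The first three samples pass and the last three fail, which is the split this comment asks for.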

Comment 64

3 years ago
xref bug 950605 for numbers/digits.
Depends on: 950606
To everyone affected by this bug 106028:

Please have a look at xref bug 950606.

Ben has just filed xref bug 950606 in which he seeks to ignore even more of what he considers "special characters" in the structs parsing algorithm. Even everyday punctuation like dots, commas, round brackets or $ signs will no longer be possible inside structs. So if that bug succeeds, we'll see even more struct variants fail or continue to fail as they do now:

Ben's Bug 950606 wants structs like the following to *fail* (*not* get formatted *bold* in msg viewer):

a) *(wth!)*                              leading and trailing brackets ("special chars")
b) *I really hate inconsistent design!*  trailing exclamation mark ("special char")
c) *Good design needs user input, design principles, good reasons and cooperation!*
                                 More than 4 words (Ben has determined that structs should not have
                                 more than "up to 3 or 4 words", see bug 950606, comment 0);
                                 inner comma, and exclamation mark ("punctuation")
d) M$ really make *XXL$$$*       trailing $ sign ("special char")
e) *$ 20*                        leading $ sign ("special char")
f) *(!foo.bar)*                  leading round bracket ("special char")
g) *I lv u 4ever. Really!*       inner full stop; trailing exclamation mark ("punctuation")

Fwiw, I think all of these should be recognized and rendered with formatting just like any other struct. That's what I proposed in comment 59. I am interested in what others think.
(In reply to Ben Bucksch (:BenB) from comment #62)
> Thomas D. in comment 59:
> > No longer depends on: 415209
> > Whiteboard: See dependencies.
> 
> Don't mess with the dependencies that I added.

Ben, I never "mess with the dependencies". Pls refrain from such personally abusive language which deliberately tries to discredit my painstaking work of bug triaging. After more than 6 years of being an active contributor and doing bug triage on thousands of bugs (see my BMO profile and activity log), I definitely know what I'm doing, and strangely there have almost never been any such problems with my 12700+ activities on TB bugs except on what you consider "your" bugs.

Pls be advised that such aggressive comments violate not only basic standards of mutual respect and cooperation among fellow contributors, but they can easily be read as a personal attack against me, in violation of the rules here on bmo:
https://bugzilla.mozilla.org/page.cgi?id=etiquette.html
> 3. No abusing people.

On a more factual level, it's inappropriate to complain about "messing with dependencies" while I've provided extensive reasons for this particular *change* in dependencies (and it's easy to undo, so where's the problem?). Instead of railing at your fellow contributors, what about *answering my comments in detail*, starting from my comment 55 and testcase attachment 8347713?

> This is an actual, hard
> code-level dependency and the whole reason why this bug here exists.

Fwiw, that's just not true as stated, and I think you know that (if not, please re-read my comment 59).
It all depends on the solution which the TB community(!) prefers for this bug:

- For /your/ personally preferred solution, yes, bug 415209 is required to create more special cases of accepting international characters in structs and continue to fail for structs containing other characters that /you/ consider "special" characters, like dots(.), commas(,), $-signs and even round brackets (()).
- For the more comprehensive solution which /I/ proposed in comment 59 for solving this bug and other bugs and returning ux-consistency to structs parsing, bug 415209 is *not* required; that's why I correctly removed the dependency.

So it's just that we are heading in different directions:
Ben wants to add *even more* special cases, including more special cases which will be *ignored* by structs parsing (see xref bug 950606), while users and I are requesting the exact opposite:

There's more than enough evidence on this bug and its 18 duplicates (including bug 949066 which unfortunately got moved out from here) that users are not happy with the current special-casing in structs algorithm due to its failure in terms of ux-consistency. (I encourage such users to vote for this bug and bug 949066).

The solution I proposed in comment 59 tries to address these real-life problems faced by users. Afaics it's also in line with advice from the UX lead, Blake Winton, who just commented against special-casing on bug 949066:

(In reply to Blake Winton (:bwinton) from bug 949066, comment #15)
> Also, I strongly suspect that the "maths" argument is already kind of messed
> up due to things like "5*a*b", and perhaps we want to figure out something
> better to do there, instead of adding special cases to the struct parsing.
(In reply to Ben Bucksch (:BenB) from comment #63)
> To translate: I would *like* to have any accented or non-English alphabetic
> characters to be recognized.

> I *do not* want numbers or other special symbols after * to be recognized as
> structs. There is a too high risk of false positives, e.g. in ASCII art or
> other strings of special characters.

So that's what /Ben/ wants and claims.

I've provided detailed arguments why the usecase of ASCII art is another non-argument, in Bug 949066 Comment 12. Swap "maths" for "ASCII art" and imo :bwinton's bug 949066, comment #15 applies seamlessly, again:

(In reply to Blake Winton (:bwinton) from bug 949066, comment #15)
> Also, I strongly suspect that the "maths" [and "ASCII art", T.D.] argument is already kind of messed
> up due to things like "5*a*b", and perhaps we want to figure out something
> better to do there, instead of adding special cases to the struct parsing.

In short, because ASCII art can include any alphabetical character (not just "special" characters), in theory it could already break today on default structs like *foo*, which we render formatted.

Comment 68

3 years ago
Please do NOT add me to the CC list of any bug report.  I have a list of bugs that I am tracking and a query that uses that list.

Comment 69

2 years ago
I would like to propose the following solution.

Start applying the markup when there is a blank before the *, /, or _ but not after; and end applying the markup when there is a blank after the character but not before.  

This would mean a*b*c or a * b * c as mathematical expressions would be preserved with the asterisks visible and the expressions not bold.  On the other hand *a*b*c* would have the internal asterisks visible but the expression bold.  

This would mean 3/4 would be visible as a fraction not Italic, but /3/4/ would be visible as a fraction that is Italic.  

This would mean show_bug (from the URI of this bug report) would have the underline visible as part of the path (not underlined), but _show_bug_ would have the internal underline visible with the phrase underlined.  Depending upon the user's fonts and monitor settings, the internal underline might be hidden by the overall underlining, but that is no different from how a Web page might appear with <span style="text-decoration:underline">show_bug</span>.  While underlining is generally discouraged in Internet communications for this reason, this is a problem for the user and not relevant to fixing this bug report.
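
A minimal Python sketch of the rule proposed above, for illustration only. It is not the mozTXTToHTMLConv implementation; the names MARKERS, _blank and find_structs are hypothetical, and treating the start and end of the text as a blank is an assumption the proposal does not spell out.

```python
# Marker characters and the style each one requests.
MARKERS = {"*": "bold", "/": "italic", "_": "underline"}


def _blank(text: str, i: int) -> bool:
    """True for whitespace; positions outside the string count as blank (assumption)."""
    return i < 0 or i >= len(text) or text[i].isspace()


def find_structs(text: str) -> list:
    """Return (style, inner_text) pairs found under the blank-delimited rule."""
    spans = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Opener: a marker with a blank before it and a non-blank after it.
        if ch in MARKERS and _blank(text, i - 1) and not _blank(text, i + 1):
            for j in range(i + 2, len(text)):
                # Closer: the same marker with a non-blank before and a blank after.
                if text[j] == ch and not _blank(text, j - 1) and _blank(text, j + 1):
                    spans.append((MARKERS[ch], text[i + 1:j]))
                    i = j  # resume scanning after the closing marker
                    break
        i += 1
    return spans
```

Under these assumptions, find_structs("a*b*c"), find_structs("a * b * c"), find_structs("3/4") and find_structs("show_bug") all return an empty list, while "*a*b*c*", "/3/4/" and "_show_bug_" each yield one span with the inner markers kept as literal text. Accented and symbol-led bodies such as "*überhaupt*" or "_$123.74_" are also matched, since the rule looks only at the blanks around the markers.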

Comment 70

2 years ago
Also note that, in a message containing an invoice for payment, the last itemized amount should be underlined before the line containing the total.  But _$123.74_ will not work.