Open Bug 950605 Opened 11 years ago Updated 2 years ago

[mozTXTToHTMLConv] Recognize *structs* that start/end with digits

Categories

(MailNews Core :: Backend, enhancement)

Tracking

(Not tracked)

People

(Reporter: BenB, Unassigned)

This bug is to discuss whether we will recognize (and, if so, to possibly implement recognition of) structs of the form:

There were *300* (in words three hundred!) people.

So far, the current recognizer intentionally excludes structs that start or end with anything that is not a letter, to err on the safe side.
It also requires that around the *, there is a "delimiter", which is defined to be anything other than letters, digits or the character itself.
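
The two rules above can be sketched roughly as follows. This is only a Python model of the rule as described in this comment, not the actual mozTXTToHTMLConv code; the names `is_delimiter` and `find_structs` are made up for illustration, and treating the start or end of the text as a delimiter is an assumption.

```python
def is_delimiter(ch: str, marker: str) -> bool:
    """Delimiter as described above: not a letter, not a digit, not the marker itself."""
    return not (ch.isalpha() or ch.isdecimal() or ch == marker)


def find_structs(text: str, marker: str = "*") -> list[str]:
    """Return the phrases a letters-only rule would treat as structs (illustrative only)."""
    found = []
    i = 0
    while i < len(text):
        # Assumption: the start/end of the text counts as a delimiter.
        before = text[i - 1] if i > 0 else " "
        opens = (
            text[i] == marker
            and is_delimiter(before, marker)
            and i + 1 < len(text)
            and text[i + 1].isalpha()  # the struct must start with a letter
        )
        if opens:
            # Closing marker: preceded by a letter, followed by a delimiter.
            for j in range(i + 2, len(text)):
                after = text[j + 1] if j + 1 < len(text) else " "
                if text[j] == marker and text[j - 1].isalpha() and is_delimiter(after, marker):
                    found.append(text[i + 1:j])
                    i = j  # continue scanning after the closing marker
                    break
        i += 1
    return found


print(find_structs("We ordered type *A*, not type B."))
print(find_structs("There were *300* (in words three hundred!) people."))
```

Under this model the first line yields one struct and the second yields none, which is the behaviour this bug is about.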

Please add testcases involving digits at the start or end a) which could be recognized b) which should not be recognized. Based on this, I'll then make a decision how or if we implement this.
Summary: [nozTXTToHTMLConv] Recognize *structs* that start/end with digits → [mozTXTToHTMLConv] Recognize *structs* that start/end with digits
(In reply to Ben Bucksch (:BenB) from comment #0)
> This bug is to discuss whether we will, and possibly implement to,
> recognize structs of the form:
> 
> There were *300* (in words three hundred!) people.


(In reply to Thomas D. from Bug 949066, comment #10)
> (In reply to Ben Bucksch (:BenB) from comment #8)
> > I've filed bug 950605 about digits.
> 
> The logic of filing a new bug for the very same issue of this bug completely
> escapes me, given that
> - the entire discussion about digits is documented in bug 106028 and here
> - just tweaking two words of this bug's summary could have morphed this bug
> back into its original intention concerning digits in structs:
> Plain-Text Markup Should work even if the first or last character of the
> string is /numeric/
> - briskly closing bugs with a two-liner and without further input from
> anyone except Ben himself, nor any further discussion, does not look like a
> nice, wise, and cooperative way of handling real user problems and find
> creative, and balanced solutions to these ux-problems

For the reasons above, in my capacity as one of the main bug triagers for TB, I close this bug because it's just a duplicate of bug 949066. It violates established procedure of efficient bug management to have a new bug asking for discussion from scratch while the entire discussion is already on record in bug 949066, so we can just take it from there. And with only 12 comments, it's not like the other bug is out of hand.

> So far, the current recognizer intentionally excludes structs that start or
> end with anything that is not a letter, to err on the safe side.

We need to find out what that "safe side" actually is, given the resulting failures in ux-consistency that are on record to cause problems for our users.

> Please add testcases involving digits at the start or end a) which could be
> recognized b) which should not be recognized.

Testcases already referenced in bug 949066, e.g. attachment 8347713 [details] (originally posted in bug 106028 which is essentially the same user story, currently 18 duplicates including twin bug 949066).

> Based on this, I'll then make
> a decision how or if we implement this.

No Ben, it's not you alone who will make that decision, as it's not "your" bug just because you filed it. We work as a team, and notwithstanding your great achievements in some areas of coding, you are just one experienced volunteer among others. Developers, even module owners, need to listen to input from users, volunteer contributors, bug triagers, UX experts, peers and UX leads.

I've invited Mozilla's Blake Winton for further UX input on some of the respective bugs.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
(In reply to Ben Bucksch (:BenB) from comment #0)
> Based on this, I'll then make a decision how or if we implement this.

^^ see my comment 1
REOPEN. And Thomas D. is now forbidden from commenting here, because he disrupts reasonable discussion. See bug 949066 comment 17 for reasons. No place for political campaigns here.

We need a rational discussion, to come to a good solution that works in all cases.
Rational suggestions and solutions are welcome, from any party.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Status: REOPENED → NEW
(In reply to Ben Bucksch (:BenB) from comment #3)
> REOPEN. And Thomas D. is now forbidden from commenting here,

Ben, don't you ever remove my CC from any bugs unless I remove myself. You're not the owner of any bug on BMO, and as a longterm respected contributor on BMO with canconfirm and editbugs privs (see my bmo profile and activity log), I have every right to be present. So removing me is in open violation of BMO rules: https://bugzilla.mozilla.org/page.cgi?id=etiquette.html
Fwiw, I'll probably cut down on comments or perhaps even refrain from commenting here (unless provoked or required for the common good of TB UX), as I'm also tired of discussing with you.

> because he disrupts reasonable discussion. See bug 949066 comment 17 for reasons.
> No place for political campaigns here.

Ben, pls stop the flamewars against me, and turning the truth upside-down. The fact that you're using fewer words than me doesn't make you any less campaigning, it's just campaigning in stealth mode. Your unwarranted and effectively uncommented closing of bugs that you don't like and reopening new bugs like this one for the same story is also campaigning. What you call my "campaigns" always comes with evidence, explanation, and detailed arguments that are out in the public domain for everyone to see and comment in favour or against. I wish the same was true for you. (Fwiw, my reply to bug 949066 comment 17 is bug 949066 comment 19).

I myself have /started/ reasonable and rational discussion by providing a testcase about "maths vs. structs" in bug 106028, comment 55 (attachment 8347713 [details]) proving that "simple maths" in plaintext messages cannot be affected by structs, and providing further evidence in Bug 106028 Comment 56 ff. Simon moved that entire discussion into Bug 949066 (accept leading/trailing digits in structs) where it originated, fair enough (albeit imo neither helpful nor required); and David E. Ross tweaked two words of the summary to widen the scope of that bug to match the more general solution which I proposed in Bug 106028 Comment 59, to which you did not agree. Instead of just tweaking back two words in the summary of Bug 949066 to continue with "digits in structs" there, it was /you/ who then "disrupted reasonable discussion" by just terminating/annihilating the entire rational discussion of Bug 949066 and refiling the same story in this bug, effectively a duplicate of bug 949066, just so that you can have it /your way/ per your dogma of personal ownership and de facto unquestionable authority of this code.

> We need a rational discussion, to come to a good solution that works in all
> cases. Rational suggestions and solutions are welcome, from any party.

+1. (Caveat: Opinions about what's "rational" and who defines that might differ...)

Per Ben's public request in this bug's comment 0, I reserve the right to contribute testcases, analysis, rational suggestions and solutions, as I've already done in both of the other bugs, bug 106028 and bug 949066, but Ben preferred to restart the same discussion here from scratch.
This is intentional. See description: http://www.bucksch.org/1/projects/mozilla/16507/

Please note the general guideline: "Failures should be minimized. A wrong recognition is a failure, not recognizing a structure/formatting is not seen as failure."

Rationale: 5*b*m² = c is a mathematical formula. Given the above guideline, I excluded everything that starts/ends with a number, because it's more likely part of a formula than a bold word.

WONTFIX (works as designed)
Another rationale is that no mathematical formula begins and ends with an asterisk, so in my opinion every string enclosed in asterisks and separated from other text should be recognized. The only false positive I can think of with this approach would be messy formulas written like 5 *b* m^2=c.
(In reply to karaluh from comment #7)
> Another rationale is that no mathematical formula begins and ends with an
> asterisk, so in my opinion every string enclosed in asterisks and separated
> from other text should be recognized. The only false positive I can think of
> with this approach would be messy formulas written like 5 *b* m^2=c.

+1. Thank you karaluh for providing evidence from a mathematical viewpoint.

Testcases (as requested per Ben's comment 0):

5y * 3 * m² = c  valid maths; not a struct
5y / 3 / m² = c  valid maths; not a struct

5y*3*m²=c        valid maths; not a struct
5y/3/m²=c        valid maths; not a struct

5y *3* m²=c      invalid/messy maths; valid struct (should be recognized per this bug, see examples below)
5y /3/ m²=c      invalid/messy maths; valid struct (should be recognized per this bug, see examples below)

5y* 3* m²=c      invalid/messy maths; not a struct
5y/ 3/ m²=c      invalid/messy maths; not a struct

Real life struct examples that start/end with digits:

We ordered *3*, not 5 containers.   no/invalid/messy maths; valid struct (should be recognized per this bug)
We ordered /3/, not 5 containers.   no/invalid/messy maths; valid struct (should be recognized per this bug)

The correct price is *$ 300*, not $ 200.  no maths; valid and intentional struct (should be recognized per this bug)
The correct price is *$300*, not $200.    no maths; valid and intentional struct (should be recognized per this bug)
The correct price is /$ 300/, not $ 200.  no maths; valid and intentional struct (should be recognized per this bug)
The correct price is /$300/, not $200.    no maths; valid and intentional struct (should be recognized per this bug)

For comparison (ux-consistency):

We ordered type *A*, not type B.   no maths -> valid struct (correctly recognized)
We ordered type /A/, not type B.   no maths -> valid struct (correctly recognized)
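
The testcases above can be checked mechanically against a small model of the recognizer. The sketch below is not the real mozTXTToHTMLConv code; `find_structs` and its `allow_digits` switch are hypothetical names, the delimiter rule follows comment 0, and treating the start or end of the line as a delimiter is an assumption. With `allow_digits=True` a struct may start or end with a digit as well as a letter.

```python
def is_delimiter(ch, marker):
    # Per comment 0: anything other than letters, digits or the marker itself.
    return not (ch.isalpha() or ch.isdecimal() or ch == marker)


def find_structs(text, marker="*", allow_digits=False):
    def edge_ok(ch):
        return ch.isalpha() or (allow_digits and ch.isdecimal())

    found, i = [], 0
    while i < len(text):
        before = text[i - 1] if i > 0 else " "  # assumption: line start is a delimiter
        if (text[i] == marker and is_delimiter(before, marker)
                and i + 1 < len(text) and edge_ok(text[i + 1])):
            for j in range(i + 2, len(text)):
                after = text[j + 1] if j + 1 < len(text) else " "
                if text[j] == marker and edge_ok(text[j - 1]) and is_delimiter(after, marker):
                    found.append(text[i + 1:j])
                    i = j
                    break
        i += 1
    return found


# (line, marker, structs expected with digits allowed, per the list above)
CASES = [
    ("5y * 3 * m² = c", "*", []),
    ("5y / 3 / m² = c", "/", []),
    ("5y*3*m²=c", "*", []),
    ("5y/3/m²=c", "/", []),
    ("5y *3* m²=c", "*", ["3"]),
    ("5y /3/ m²=c", "/", ["3"]),
    ("5y* 3* m²=c", "*", []),
    ("5y/ 3/ m²=c", "/", []),
    ("We ordered *3*, not 5 containers.", "*", ["3"]),
    ("We ordered /3/, not 5 containers.", "/", ["3"]),
    ("We ordered type *A*, not type B.", "*", ["A"]),
    ("We ordered type /A/, not type B.", "/", ["A"]),
]

for line, marker, expected in CASES:
    got = find_structs(line, marker, allow_digits=True)
    print(f"{line!r:45} -> {got} ({'ok' if got == expected else 'MISMATCH'})")
```

Note that this digits-only variant still does not match the `*$300*` and `*$ 300*` examples, because `$` is neither a letter nor a digit; those would need a separate, more permissive rule.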

Conclusion:

1) There's no evidence for the claim that the proposal to "Recognize *structs* that start/end with digits" (this bug, current summary) will cause false positives affecting the formatting of plaintext mathematical formulas (and qualified mathematicians concur, see my next comment).

2) There's plenty of evidence from users and real life examples that *NOT* "recognizing *structs* that start/end with digits" is needlessly ux-inconsistent and violating the legitimate formatting expectations of our users, because their intentional use of structs is ignored by the current TB struct recognition algorithm.

So this looks like a valid bug which should be fixed.
(In reply to Ben Bucksch (:BenB) from comment #6)
> This is intentional. See description:
> http://www.bucksch.org/1/projects/mozilla/16507/
> 
> Please note the general guideline: "Failures should be minimized. A wrong
> recognition is a failure, not recognizing a structure/formatting is not seen
> as failure."

With all due respect to this personal opinion of Ben at the time of designing, there's plenty of evidence from our users (e.g. 18 duplicates of similar Bug 106028) that "not recognizing a structure/formatting" *IS* seen as a failure, as it violates their legitimate UX expectations.

> Rationale: 5*b*m² = c is a mathematical formula. Given the above guideline,
> I excluded everything that starts/ends with a number, because it's more
> likely part of a formula than a bold word.

Looking at testcases and real life examples presented in my comment 8, this assumption has been refuted by evidence:
For a given digit x, structs having { *x | x* | /x | x/ } *cannot* be part of a correctly formatted mathematical formula, and thus are more likely to be intentional formatting.
And here's a qualified mathematician who concurs:

(In reply to David E. Ross from bug 106028 comment #53)
> It was argued in a comment to bug #949066 that the handling of numeric
> characters -- not applying the markup -- is not a problem, that it is
> intentional so as not to affect mathematical equations.  That argument is
> invalid.  My degree is in mathematics.  There are many equations and
> formulae that have alphabetic terms without any numeric characters.

Iow, the only case where structs can affect maths is "messy formulas":
a) 5 *4* m^2=c.   invalid/messy maths; valid struct (should be recognized as struct, this bug)
b) 5 *b* m^2=c.   invalid/messy maths; valid struct (alphabetical term in maths formula; currently recognized as struct)

There's no reason to disappoint users' legitimate, real-life struct formatting expectations to "protect" invalid/messy formulas from structs. Moreover, as David points out, there's no way we could protect messy formulas because then we'd have to abandon struct recognition altogether, even for alphabetical characters, as seen in b) above.
(In reply to Thomas D. (currently busy elsewhere; needinfo?me) from comment #9)
> Moreover, as David points out, there's no way we could protect messy
> formulas because then we'd have to abandon structs recognition altogether,
> even for alphabetical characters, as seen in a) above.

Sorry, typo: as seen in *b)* above.

Btw, *1000 thanks to Ben* who implemented the structs recognition. It's a really convenient way of formatting which is loved by those users who know about it. That is why they want it to work better for their real-life usecases. Which may include sending *1000 thanks to Ben* ... ;)
> We ordered type *A*, not type B.

This should be, and is, recognized.

> We ordered *3*, not 5 containers

This should be recognized, it would be nice. But it's not an error or even bug to not recognize it. See more below.

> 5y *3* m²=c
> The correct price is *$ 300*, not $ 200
> as seen in *b)* above.

The above could be structs, but it's not even obviously clear to me as a human. They are messy. They should not be recognized. You've of course deliberately used examples where they are structs, but there are plenty of cases where the same set of characters is part of something else and was not meant to be a struct.

Also, the algorithm works on the paragraph level (in Thunderbird on the line level). If there's a *$ at the start and a )* at the end, everything in-between would be recognized as bold. There are so many cases where this would go completely wrong.
The heuristic is already going wrong in some places, where it's difficult to fix, because it's already very conservative. If we start to allow special characters, it would go completely wrong and wild.

This is a heuristic.
That's why it was defined at the start of the project that a failure to recognize a valid struct is *not* (!!!) a bug.
It *cannot* recognize all cases perfectly and at the same time avoid going wrong in all cases. This is impossible. Trying is a futile exercise, and even attempting it will lead to more false-positive bugs than the false negatives it fixes. This is the crucial part you need to understand. There's a limit to what you can do with heuristics.

The only thing we could do is allow *300* . I.e. require that the start and the end are both numbers. I feel uncomfortable with it, that it might have unintended side effects. And I do not recommend it, and if it's up to me, we would NOT do it. But that's the maximum we could do. We definitely cannot recognize * followed by a special character.
E.g. *1 or *1) , where it's meant to be a footnote. It should not be part of a struct.
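
One possible reading of that maximum is "the first and last character must both be letters, or both be digits". The sketch below models that reading only; it is an interpretation, not code from the tree, and `char_class` and `find_structs_same_class` are hypothetical names. The footnote line used in the example is made up for illustration.

```python
def is_delimiter(ch, marker):
    # Delimiter per comment 0: not a letter, not a digit, not the marker itself.
    return not (ch.isalpha() or ch.isdecimal() or ch == marker)


def char_class(ch):
    """Classify a boundary character; None means it may not start or end a struct."""
    if ch.isalpha():
        return "letter"
    if ch.isdecimal():
        return "digit"
    return None


def find_structs_same_class(text, marker="*"):
    found, i = [], 0
    while i < len(text):
        before = text[i - 1] if i > 0 else " "  # assumption: text start is a delimiter
        first = text[i + 1] if i + 1 < len(text) else " "
        if text[i] == marker and is_delimiter(before, marker) and char_class(first):
            for j in range(i + 2, len(text)):
                after = text[j + 1] if j + 1 < len(text) else " "
                # The last character must be of the same class as the first one.
                if (text[j] == marker
                        and char_class(text[j - 1]) == char_class(first)
                        and is_delimiter(after, marker)):
                    found.append(text[i + 1:j])
                    i = j
                    break
        i += 1
    return found


print(find_structs_same_class("There were *300* (in words three hundred!) people."))
print(find_structs_same_class("See note *1) at the end."))  # hypothetical footnote line
```

In this model `*300*` is accepted, while a lone `*1)` footnote marker never opens a struct that closes, and a phrase that starts with a digit but ends with a letter is rejected.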

> 18 duplicates of similar Bug 106028

I understand what you mean, but technology still has limits. Users don't understand the negative side effects, and would file bugs about them as well. And I expect more than 18 bugs caused by this proposed change here. There's *no way* to make this bulletproof. It's impossible. You need to understand that.
(In reply to Ben Bucksch (:BenB) from comment #12)
> > 18 duplicates of similar Bug 106028
> 
> I understand what you mean, but technology still has limits.

Just a thought:
How about expanding the Pref:"mail.display_struct" from BOOL to Integer:

0: off
1: on (failsafe) [default]
2: on (with alphanum)

This way, each user can decide for himself what he wants.
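
A rough sketch of how such a three-valued pref could map onto the boundary check. The pref name `mail.display_struct` and the values 0/1/2 come from the suggestion above; everything else (`DisplayStruct`, `from_legacy_bool`, `phrase_allowed`) is hypothetical and only illustrates the idea, including the assumed migration of an existing boolean value.

```python
from enum import IntEnum


class DisplayStruct(IntEnum):
    OFF = 0        # no struct recognition
    FAILSAFE = 1   # letters only at start/end (current behaviour, default)
    ALPHANUM = 2   # letters or digits at start/end


def from_legacy_bool(value: bool) -> DisplayStruct:
    """Assumed migration: an existing boolean pref maps to OFF or FAILSAFE."""
    return DisplayStruct.FAILSAFE if value else DisplayStruct.OFF


def edge_allowed(ch: str, mode: DisplayStruct) -> bool:
    """May `ch` be the first or last character of a struct in this mode?"""
    if mode == DisplayStruct.OFF:
        return False
    if mode == DisplayStruct.ALPHANUM:
        return ch.isalpha() or ch.isdecimal()
    return ch.isalpha()


def phrase_allowed(phrase: str, mode: DisplayStruct) -> bool:
    """Check only the boundary characters of a candidate phrase."""
    return bool(phrase) and edge_allowed(phrase[0], mode) and edge_allowed(phrase[-1], mode)


for mode in DisplayStruct:
    print(mode.name, phrase_allowed("300", mode), phrase_allowed("bold", mode))
```

Only the boundary check changes between modes 1 and 2; the delimiter rule around the marker would stay the same.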
(In reply to Ben Bucksch (:BenB) from comment #12)
> E.g. *1 or *1) , where it's meant to be a footnote. It should not be part
> of a struct.

Thanks Ben for your answers. This helps to understand your concerns and investigate them.
You mentioned footnotes. Let's try that and see if or how they could be misinterpreted as a struct.

> The quick brown*1 fox jumps over the lazy*2 dog.     not a struct -> ok.
> The quick brown*1) fox jumps over the lazy*2) dog.   not a struct -> ok.
> The quick brown *1 fox jumps over the lazy *2 dog.   not a struct -> ok.
> The quick brown *1) fox jumps over the lazy *2) dog. not a struct -> ok.

So for all default cases of using footnotes with an asterisk followed by a number, with or without bracket, this does not cause any problem, as the footnotes themselves can never be misinterpreted as a struct. That's because, under normal circumstances, a recognized closing delimiter for the struct is extremely unlikely to appear in any correctly formatted document.
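
The four footnote lines above can be run through a small model of a digits-allowed recognizer to see why none of them produces a struct. This is a sketch under assumptions, not the real implementation: `find_structs` is a hypothetical helper, the delimiter rule is the one from comment 0, and the start or end of the line is assumed to count as a delimiter.

```python
def is_delimiter(ch, marker):
    return not (ch.isalpha() or ch.isdecimal() or ch == marker)


def find_structs(text, marker="*"):
    """Model of the proposed rule: structs may start/end with a letter or a digit."""
    def edge_ok(ch):
        return ch.isalpha() or ch.isdecimal()

    found, i = [], 0
    while i < len(text):
        before = text[i - 1] if i > 0 else " "
        if (text[i] == marker and is_delimiter(before, marker)
                and i + 1 < len(text) and edge_ok(text[i + 1])):
            # An opening marker alone is not enough: a valid closing marker is required.
            for j in range(i + 2, len(text)):
                after = text[j + 1] if j + 1 < len(text) else " "
                if text[j] == marker and edge_ok(text[j - 1]) and is_delimiter(after, marker):
                    found.append(text[i + 1:j])
                    i = j
                    break
        i += 1
    return found


FOOTNOTE_LINES = [
    "The quick brown*1 fox jumps over the lazy*2 dog.",
    "The quick brown*1) fox jumps over the lazy*2) dog.",
    "The quick brown *1 fox jumps over the lazy *2 dog.",
    "The quick brown *1) fox jumps over the lazy *2) dog.",
]

for line in FOOTNOTE_LINES:
    print(find_structs(line), "<-", line)

# For contrast: a bare asterisk directly after a word does close the span.
print(find_structs("The quick brown *1 fox jumps over the lazy* dog."))
```

In the first two lines the asterisk follows a letter, so it never opens a struct; in the last two it opens one, but the second marker is preceded by a space and therefore cannot close it.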

The only way of having false positives of struct recognition with inner leading digit of footnotes would be something like this:

> The quick brown *1 fox jumps over the lazy* dog.   unintended struct -> false positive; messy, highly unlikely to occur
> The quick brown *1) fox jumps over the lazy* dog.  unintended struct -> false positive; messy, highly unlikely to occur

The above false positive is highly unlikely to occur because the most likely intention of an asterisk after a word would be adding a footnote, but in a correctly formatted document numbered and unnumbered footnotes would never co-occur in the sequence required for struct recognition.
I honestly can't think of any other reason to add a single asterisk after a word, and even if there was, I'd think we'd be much safer and match users' expectations better if we err in favor of struct recognition.

Because it's much more likely that this type is actually *intended* as a struct:

> I'd much prefer to implement *1) Always recognize structs that start/end with digits*. (should be recognized as a struct, this bug).

(In reply to Ben Bucksch (:BenB) from comment #11)

>> 5y *3* m²=c

For the avoidance of doubt, I think this above case will never occur unless explicitly intended as a struct.

>> The correct price is *$ 300*, not $ 200
>> as seen in *b)* above.

> The above could be structs, but it's not even obviously clear to me
> as a human. They are messy.

?? I can't think of any correctly formatted non-struct intention which could cause these structs. I'm not sure what's messy about wanting to bold-print "$ 300"?

> there are plenty of cases where the same set of characters are part of
> something else and were not meant to be structs.

So I think for footnotes, it has been shown that they are generally not affected by allowing structs of type *x foo* or *foo x* where x is a digit. Ben, can you please provide some more real-life examples where this bug is likely to cause false positives of struct recognition?

> Also, the algorithm works on the paragraph level (in Thunderbird on the line
> level). If there's a *$ at the start and a )* at the end, everything
> in-between would be recognized as bold. There are so many cases where this
> would go completely wrong.

I'm failing to imagine a real-life example where leading *$ would be followed by trailing )* ?
(In reply to Thomas D. (currently busy elsewhere; needinfo?me) from comment #14)
> The only way of having false positives of struct recognition with inner
> leading digit of footnotes would be something like this:
> 
> > The quick brown *1 fox jumps over the lazy* dog.   unintended struct -> false positive; messy, highly unlikely to occur

For more clarity:
For the footnotes case, the above example does not make sense and is hence unlikely to occur.
Perhaps there could be this case (unnumbered footnote followed by numbered footnote):

> The quick brown* fox jumps over the lazy*1 dog.   not a struct -> no problem.
> The quick brown * fox jumps over the lazy *1 dog. not a struct -> no problem.

Otoh, it's much more likely that surface structures like the above are actually /intended/ as a *struct*:

> We are looking for *1 neutral person* to look into improving our structs UX.
> We are looking for /10 prototypical examples/ where structs with leading or trailing digits fail.
Severity: normal → S3