Open Bug 950606 Opened 11 years ago Updated 2 years ago

[mozTXTToHTMLConv] Should not recognize structs with special characters in the middle, e.g. *(!foo.bar)* or *Cool, eh? Really!*

Categories

(MailNews Core :: Backend, defect)

Tracking

(Not tracked)

People

(Reporter: BenB, Unassigned)

References

Details

(Whiteboard: [wontfix?][violates ux-consistency])

Attachments

(1 file)

The struct phrase recognition doesn't work very well. Very often, there are false positives that span several lines or half of the document.

The easiest way to reproduce and quickly run into it as an end user is to turn on View | Message Body As | Plaintext and then look at some marketing emails from companies. They often contain only HTML and lots of bolding (in HTML, as <b>), which will be turned into *, which will then often not be properly recognized and span half the document.

The intention for structs was to allow one or a few (up to 3 or 4) words to be marked bold, if the user wants to emphasize certain words in a phrase. The intention was never to mark whole sentences bold. And the latter is also the part that causes problems in practice.

To cut this short, any punctuation or special characters like ",", "." or "(" within the struct should disable it.

The intention here is to massively reduce false positives (bad, wrong bold markings). It will also cause false negatives (not recognize bold markings that were intended), but that's accepted. The design document specifically states that we don't care about false negatives very much, but try hard to avoid false positives. In other words, less false positives and more false negatives are a good tradeoff, and that's what we'll do here.
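For illustration, the rule proposed in this comment could be sketched as below. This is only a rough Python sketch, not the converter's actual code (mozTXTToHTMLConv is C++ and its matching logic is not shown in this bug). The names `strict_bold_spans`, `CANDIDATE` and `MAX_WORDS` are hypothetical, and the cap of 4 words is an assumption taken from "up to 3 or 4" above.

```python
import re

# Hypothetical illustration only; not the real mozTXTToHTMLConv logic.
MAX_WORDS = 4  # assumed cap, from "up to 3 or 4" words in the comment above

# Candidate struct: '*', then inner text without '*' or newline, then '*'.
CANDIDATE = re.compile(r"\*([^*\n]+)\*")


def strict_bold_spans(text: str) -> list[str]:
    """Return the inner texts that would be rendered bold under the strict rule."""
    spans = []
    for match in CANDIDATE.finditer(text):
        inner = match.group(1)
        words = inner.split(" ")
        # Accept only short phrases made of purely alphanumeric words.
        # Punctuation such as ",", "." or "(" disables the struct, and so do
        # empty words caused by leading, trailing or doubled spaces.
        if len(words) <= MAX_WORDS and all(w.isalnum() for w in words):
            spans.append(inner)
    return spans
```

Under these assumptions, "this is *very important* stuff" still yields one bold span, while both examples from the bug summary, *(!foo.bar)* and *Cool, eh? Really!*, are left as plain text.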
CC :bwinton for UX input - please read bug 106028 (summary: Bug 106028 Comment 59) and bug 949066 first.

FTR: This bug is an idiosyncrasy based on several unfounded assumptions and personal judgements that are not applicable to the majority of our users and use cases, as evidenced by user feedback like the 18 duplicates of bug 106028. This bug will create more problems of ux-consistency than it solves. I recommend that this bug be closed *WONTFIX*.

I also request that Ben follow the prescribed structure of bugs:
Steps to reproduce (plus testcase)
Actual Result
Expected Result

This structure is prescribed for a reason, namely to create transparency and accountability for the details of a bug, allow objective examination of the bug, and discover flaws in scenarios, observations, conclusions etc.

I'll just mention a few points here but there's certainly more.

1) This bug starts out from an edge case scenario which is not applicable to the majority of our users: How many users would deliberately look at a heavily HTML-formatted "marketing email from companies" with "View | Message Body as | *Plaintext*"? And why would they do that? There was a time in history when converting HTML into plaintext for viewing or otherwise made sense (bandwidth, interoperability, security), but times have changed. Today, the scenario of viewing a heavily formatted HTML message with a plaintext viewer (after down-conversion) does not make any sense (iow, it's nonsense). It's obvious the result will be distorted. No converter in the world, written by Ben Bucksch or otherwise, can create an accurate representation of an HTML message in plaintext. It's a myth that addressing a few random problems of bold formatting conversion would make the entire conversion noticeably better.

2) Otoh, it's a fact that we have a lot of complaints from users who expect structured plaintext formatting deliberately *composed* by themselves or their correspondents to be rendered /correctly/ regardless of the characters used in their text (see 18 duplicates of bug 106028). Not a single user request saying "pls stop recognizing special characters in the middle" (this bug), but plenty of user requests *requesting the exact opposite*. This bug will cause even more bug reports of users complaining that we don't recognize their explicit structs formatting correctly.

3) I'm entirely failing to see why structs like the following should no longer be recognized and formatted as structured plaintext (per this bug, comment 0):
a) *(wtf!)*
b) *I really hate inconsistent design!*
c) *Good design needs user input, design principles, good reasons and cooperation!*
d) M$ really make *$$$*

a) Round brackets - what's special here?
b) A full sentence with more than 3 or 4 words - this should just work?
c) A bold sentence with punctuation - where's the problem?
d) Some "special" characters bolded in the struct - so what?

Why should punctuation or "special characters" prevent formatted rendering of structured plaintext? And since when are commas, dots or round brackets "special characters" for our purpose here? Notwithstanding technical delicacies of getting the algorithm right (let's expose them and solve them!), I'm failing to see the tradeoff. We can't be serious to dictate that users must not use more than 3 or 4 words, and structs formatting will only work for text having certain characters, but not for others, needlessly excluding everyday characters like dots, commas, and round brackets. I want to see that documentation which explains all these random exceptions to our users. Instead of being a balanced "tradeoff", this will introduce even more violations of UX-consistency than we have now. And ux-consistency is a truly recognized design principle beyond doubt...

I've made a very simple proposal to eliminate most of this problem for all everyday scenarios at the end of Bug 106028 Comment 59:
> For formatted rendering of structs, the type of the leading/trailing[/inner]
> character of the inner text is irrelevant. We can just remove the entire
> special-casing of alphabetical characters vs. numeric characters [/"special characters"], and render
> all structs correctly formatted as they occur

4) The fact that Ben as the author of the design document (link please) has personally decided that he considers "less false positives and more false negatives... a good tradeoff" does not mean much in and of itself. TB and its plaintext features are not a one-man show, and we are here to serve our users, not to serve some personal "design documents" hidden somewhere that are more than a decade old. Serving users means listening carefully and open-mindedly to what users want, and finding creative ways of incorporating user input and other requirements. I've looked into thousands of bugs, and I've never heard users complain that their HTML looks broken when they view it with a plaintext viewer. What!? It's obvious that HTML messages are best viewed with HTML viewers, and most of those other use cases are a thing of the past.

TB design is a co-operative venture, and needs constant review, refinement and reform to adjust to changed circumstances, new scenarios, outdated scenarios, and, most of all, users' current needs. To that end, any design decision should come with good reasons and tangible evidence that should seek to be as objective and transparent as possible, look at things from various angles, and be generally agreed upon in a cooperative manner. Ben from his very personal point of view might not care much about more false negatives (not recognizing intended user formatting of structs) and creating more ux-inconsistency, but our users evidently do. And they matter more.

FTR: This bug looks like another instance of Ben's well-known principle of "divide and rule": After briskly closing down active bugs that have productive discussion on the shortcomings of his code (Bug 949066), he needlessly opens a new bug which is not linked to anything and has no CCs, thus elegantly sidelining the entire discussion, pretending it's /his/ initiative to fix his own code, ensuring he's the "owner" of the bug, then tweaking his own bug as he pleases and adding more idiosyncrasies, and finally forcing other contributors to discuss the very same issues all over again if they want them addressed. After that, the new bug is ignored until it rots, and others get tired. The net result, as usual: no progress in these corners of TB.

*Sad. Very!* <- struct with special character in the middle, will no longer work after this bug, per comment 0.
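For comparison, the counter-proposal quoted above from Bug 106028 Comment 59 (no special-casing of the characters inside a struct) could be sketched roughly as follows. This is a hypothetical Python illustration, not code from the converter. The names `permissive_bold_spans` and `PERMISSIVE` are invented for this example, and the requirement that the inner text neither starts nor ends with whitespace is an assumption added here so that a stray "*" in running text is not treated as a delimiter.

```python
import re

# Hypothetical illustration only; not the real mozTXTToHTMLConv logic.
# Inner text may contain any characters except '*' and newline. As an added
# assumption, it must start and end with a non-space, non-'*' character.
PERMISSIVE = re.compile(r"\*([^\s*](?:[^*\n]*[^\s*])?)\*")


def permissive_bold_spans(text: str) -> list[str]:
    """Return the inner texts that would be rendered bold under the permissive rule."""
    return [match.group(1) for match in PERMISSIVE.finditer(text)]


if __name__ == "__main__":
    # Examples a), b) and d) from this comment, plus its closing struct.
    for sample in ("*(wtf!)*",
                   "*I really hate inconsistent design!*",
                   "M$ really make *$$$*",
                   "*Sad. Very!*"):
        print(sample, "->", permissive_bold_spans(sample))
```

With these assumptions every struct listed in this comment is recognized, while an expression such as "2 * 3 * 4" is not. How such a rule behaves on down-converted HTML mail, the case comment 0 is concerned with, is not something this sketch examines.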
Blocks: 106028
Flags: needinfo?(bwinton)
Summary: [mozTXTToHTMLConv] Should not recognize structs with special characters in the middle → [mozTXTToHTMLConv] Should not recognize structs with special characters in the middle, e.g. *(!foo.bar)* or *I lv u 4ever. Really!*
Whiteboard: [wontfix?][violates ux-consistency]
Re comment #1, point #1: There are still many users who do indeed dislike HTML-formatted e-mail, and there are very good reasons for that. See <http://www.rossde.com/internet/ASCII_mail.html> and the related <http://www.rossde.com/internet/ASCIIvsHTML.html>.
(In reply to Ben Bucksch (:BenB) from comment #0)
> The intention here is to massively reduce false positives (bad, wrong bold
> markings). It will also cause false negatives (not recognize bold markings
> that were intended), but that's accepted. The design document specifically
> states that we don't care about false negatives very much, but try hard to

I also sometimes use inclusive "we", but in this case, "we" is just Ben afaics, as he's the author of both the code and the design document. Users, other contributors, and qualified UX input might differ.

> avoid false positives. In other words, less false positives and more false
> negatives are a good tradeoff, and that's what we'll do here.

So that's what /Ben/ *wants* to do here. Let's see what others think.

Ben's design document is found here: http://www.bucksch.org/1/projects/mozilla/16507/
FTR, I have attached a snapshot of that document (taken on 24th May 2013). Pls note that the design document, I quote,
> is not 100% correct for simplicity.

The part Ben refers to in comment 0 is the second "Goals" section, I quote:
> Failures should be minimized. A wrong recognition is a failure,
> not recognizing a structure/formatting is not seen as failure.

"...not /seen/ as a failure" - awesome, that needs close reading. So whatever it is, rest assured, it's not a bug, it's a feature... ;)
Thomas D., stop campaigning. We need solutions, not political speeches. I'm removing everybody that Thomas D. added to CC. If any of you is really interested, feel free to re-add yourself. I also value constructive input that leads to a solution that's good for everybody.
Summary: [mozTXTToHTMLConv] Should not recognize structs with special characters in the middle, e.g. *(!foo.bar)* or *I lv u 4ever. Really!* → [mozTXTToHTMLConv] Should not recognize structs with special characters in the middle, e.g. *(!foo.bar)* or *Cool, eh? Really!*
Okay, so, here's my UX input, to clear the needinfo request. I think it would be nice to follow the lead of the other plaintext markup languages as much as possible, so that we confuse as few people as possible. Markdown, in particular, seems close to what we already handle.
Flags: needinfo?(bwinton)
Severity: normal → S3