Closed Bug 947588 Opened 6 years ago Closed 6 years ago

Invalid characters displayed as hexboxes

Categories

(Core :: Layout: Text and Fonts, defect, P4, minor)

defect

Tracking

()

VERIFIED FIXED
mozilla30
Tracking Status
firefox27 --- unaffected
firefox28 + verified
firefox29 + verified
firefox30 --- verified

People

(Reporter: roc, Assigned: roc)

References

(Depends on 1 open bug)

Details

(Keywords: dev-doc-needed)

Attachments

(8 files, 2 obsolete files)

140.07 KB, image/png
Details
9.34 KB, patch
Details | Diff | Splinter Review
2.83 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
813 bytes, patch
heycam
: review+
Details | Diff | Splinter Review
2.98 KB, patch
heycam
: review+
Details | Diff | Splinter Review
2.18 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
10.41 KB, patch
heycam
: review+
Details | Diff | Splinter Review
4.89 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
http://www.stuff.co.nz/entertainment/books/9488488/Author-proves-book-agent-wrong
There are 0x07 characters scattered through the text. It's unclear why.

Chrome doesn't display these characters. Neither should we.
I'm not sure I agree that we should hide these invalid chars. This screenshot shows an excerpt from the site above, as displayed by Chrome and by Firefox (Nightly). IMO, the Firefox version, where the invalid control chars render as hexboxes, is the better option. In Chrome, it's hard for the reader to tell what the intended reading might be; at least with the visible control characters, the intended words are visually separated from the surrounding "junk" letters (presumably some kind of word-processor-command residue). Although the result isn't pretty, it's a more useful rendering - both for the reader who simply wants to know what the text was meant to say, and for the author who is more likely to notice that something went wrong.
I was searching to see if CSS or HTML specs said something about it - I couldn't find
anything but I found this instead:
http://lists.w3.org/Archives/Public/www-style/2003Jan/0079.html

I agree with Jonathan, the hexboxes alerts the user that something is wrong which
may help when trying to make sense of the text.  Also, it's more likely that it will
result in an error report to the author which is good in the long run.
Jonathan: To avoid a flood of these "funny characters" bugs, maybe you should document bug 909344's new hexbox behavior in a post to a Mozilla mailing list (dev-platform?) or a Mozilla blog that will be read by web developers? I'm also curious to know which control characters *shouldn't* be displayed as hexboxes. :)

Here is another instance:

The <title> and headline of the following Computer World article is "Congress wants to know: <U+0018>Who the blank is responsible<U+0019> for healthcare.gov mess?" The headline contains a quote delimited by Unicode characters U+0018 ('CANCEL') and U+0019 ('END OF MEDIUM'):

http://www.computerworld.com/s/article/9244035/Congress_wants_to_know_Who_the_blank_is_responsible_for_healthcare.gov_mess_
Blocks: 909344
Component: Layout → Layout: Text
Flags: needinfo?(jfkthame)
OS: Linux → All
Hardware: x86_64 → All
(In reply to Chris Peterson (:cpeterson) from comment #4)
> The <title> and headline of the following Computer World article is
> "Congress wants to know: <U+0018>Who the blank is responsible<U+0019> for
> healthcare.gov mess?" The headline contains a quote delimited by Unicode
> characters U+0018 ('CANCEL') and U+0019 ('END OF MEDIUM'):

Probably the upper half of U+2018 ('LEFT SINGLE QUOTATION MARK') and U+2019 ('RIGHT SINGLE QUOTATION MARK') has been truncated. This is the perfect example of finding author's error.
(In reply to Jonathan Kew (:jfkthame) from comment #2)
> Created attachment 8344170 [details]
> comparison of Chrome (left) and Firefox (right) rendering
> 
> I'm not sure I agree that we should hide these invalid chars. This
> screenshot shows an excerpt from the site above, as displayed by Chrome and
> by Firefox (Nightly). IMO, the Firefox version, where the invalid control
> chars render as hexboxes, is the better option. In Chrome, it's hard for the
> reader to tell what the intended reading might be; at least with the visible
> control characters, the intended words are visually separated from the
> surrounding "junk" letters (presumably some kind of word-processor-command
> residue). Although the result isn't pretty, it's a more useful rendering -
> both for the reader who simply wants to know what the text was meant to say,
> and for the author who is more likely to notice that something went wrong.

That's true for this one chunk of text you carefully selected :-). It's not true in the other place 0x07 occurs in that article, and it's not true anywhere in the article in comment #0, and it's not true in the article in comment #4.

(In reply to Masatoshi Kimura [:emk] from comment #5)
> Probably the upper half of U+2018 ('LEFT SINGLE QUOTATION MARK') and U+2019
> ('RIGHT SINGLE QUOTATION MARK') has been truncated. This is the perfect
> example of finding author's error.

It's also a fine example of how displaying errors to the user makes the user's experience worse without affecting author behavior.

This story has played out on the Web over and over again. We know how it ends: users mostly don't report bugs; either they tolerate errors or silently switch browsers. If they do report bugs, those bugs are ignored. Or if those bugs aren't ignored, the response is "looks good to me in Chrome, I suggest you switch browsers". There's a reason we're not all using XHTML.
I'd be more sympathetic if the cleanup of the Web we're trying to achieve here was significant. But it's not, it's utterly trivial.
I also want to remind people that I'm all in favour of displaying these hexboxes in editable text, in case Firefox is actually used in a content production step where displaying the hexboxes can actually make a difference.
Soliciting hsivonen for an independent opinion.
Flags: needinfo?(hsivonen)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #6)
> (In reply to Masatoshi Kimura [:emk] from comment #5)
> > Probably the upper half of U+2018 ('LEFT SINGLE QUOTATION MARK') and U+2019
> > ('RIGHT SINGLE QUOTATION MARK') has been truncated. This is the perfect
> > example of finding author's error.
> 
> It's also a fine example of how displaying errors to the user makes the
> user's experience worse without affecting author behavior.

I'm not sure it's worse than being not noticed the lack of the quotation marks.
Obviously users will not report errors if they don't notice the errors.
FWIW, note that we've also had bug reports filed against our previous behavior (see bug 757521). If we "fix" this we'll be "unfixing" that.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #8)
> I also want to remind people that I'm all in favour of displaying these
> hexboxes in editable text, in case Firefox is actually used in a content
> production step where displaying the hexboxes can actually make a difference.

Wouldn't this mean that merely adding or removing contentEditable on an element could change the layout of its text? That seems undesirable to me.

Moreover, if we display such control-char hexboxes only when an element switches into "editing" mode, and hide them as soon as the author clicks "Done" or whatever, I'd guess people are liable to think they're some kind of artifact of the editing view, and not worry about cleaning them up.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #9)
> Soliciting hsivonen for an independent opinion.

So:
 * Trident (tested IE11) doesn't render the hexboxes but exports them to the clipboard, so the cruft is there in the DOM.
 * Chrome does the same thing.
 * Safari doesn't render the hexboxes and I didn't bother testing DOM/clipboard behavior.
 * There's a legitimate use case for rendering the hexboxes in View Source
 * We have an additional bug report (bug 757521) that doesn't provide enough context to judge its merit.
 * This sort of cruft actually shows up on real sites.

I think trying to show Web authors the errors in their ways by surfacing errors to end users is a loser position to be in when all other browsers hide the error. For this reason, I'm inclined to say that we shouldn't render the hex boxes for code points below U+0020 in Web content.

Since there is value in rendering the hexboxes in View Source, I think it would make sense to be able to render them there if that would be feasible technically and in terms of the amount of implementation work required. For example, if it's easy and not a perf problem to have a magic CSS property for showing hexboxes below U+0020, I think it would sense to have such property defaulting to not rendering the hexboxes and then having View Source and various dev tools flip the property to turn hexbox rendering on.

If this isn't feasible and it's necessary to pick one behavior, then I'd pick the behavior of not rendering the hexboxes (for visual compatibility with other browsers).
Flags: needinfo?(hsivonen)
Not all other browsers hide the errors: in Internet Explorer on WinXP, I see most of the control characters rendered as various symbols from some font or other.
(In reply to Jonathan Kew (:jfkthame) from comment #14)
> Not all other browsers hide the errors: in Internet Explorer on WinXP, I see
> most of the control characters rendered as various symbols from some font or
> other.

In that case, the direction in IE has been towards not rendering the control characters. (I tested IE11 on Windows 8.1.)
(In reply to Henri Sivonen (:hsivonen) from comment #13)
> Since there is value in rendering the hexboxes in View Source, I think it
> would make sense to be able to render them there if that would be feasible
> technically and in terms of the amount of implementation work required. For
> example, if it's easy and not a perf problem to have a magic CSS property
> for showing hexboxes below U+0020, I think it would sense to have such
> property defaulting to not rendering the hexboxes and then having View
> Source and various dev tools flip the property to turn hexbox rendering on.

That shouldn't be a problem and I think we should definitely do it.

(In reply to Jonathan Kew (:jfkthame) from comment #12)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #8)
> > I also want to remind people that I'm all in favour of displaying these
> > hexboxes in editable text, in case Firefox is actually used in a content
> > production step where displaying the hexboxes can actually make a difference.
> 
> Wouldn't this mean that merely adding or removing contentEditable on an
> element could change the layout of its text? That seems undesirable to me.

It's undesirable, but I don't think it actually breaks anything, and if it does it's only in this very edge case that we want to steer authors away from anyway, so I think it's tolerable.

> Moreover, if we display such control-char hexboxes only when an element
> switches into "editing" mode, and hide them as soon as the author clicks
> "Done" or whatever, I'd guess people are liable to think they're some kind
> of artifact of the editing view, and not worry about cleaning them up.

If they have no idea what the hexboxes mean then there is no argument for displaying them under any circumstances.
Given the opinion of Henri, whom I trust more than myself, I'm close to pulling rank here and forcing a backout of bug 909344 and incrementally re-adding the display of hexboxes to View Source, and editable text if we form a consensus around that. But I hate having to do that :-(. Jonathan, is there any other process you'd like to follow here? Anyone else you'd like to consult?
BTW we also have the option of not displaying hexboxes in contenteditable, and but displaying them in <input type="text"> and <textarea>. I really don't see any downsides to the latter option.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17)
> Given the opinion of Henri, whom I trust more than myself, I'm close to
> pulling rank here and forcing a backout of bug 909344 and incrementally
> re-adding the display of hexboxes to View Source, and editable text if we
> form a consensus around that. But I hate having to do that :-(. Jonathan, is
> there any other process you'd like to follow here? Anyone else you'd like to
> consult?

That's OK, I guess. I've stated what I think - that it's more useful to display hexboxes, despite their ugliness, than to hide the garbage that's present in cases like this - but clearly I'm not going to persuade you of that. I suppose if people care enough, they'll eventually file new bugs and we can debate it all over again. :)
Flags: needinfo?(jfkthame)
For what it's worth, I probably lean (not strongly, though) a little more towards Jonathan's side that we ought to display hexboxes.  Though it's not at all clear how many sites it'll show up on; we'd probably have a better idea of that if this hits beta or release.
Someone could search http://commoncrawl.org, but I'm not sure it really matters; both the costs and benefits of this change are proportional to the rate at which these characters show up.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #22)
> More here: http://www.nzwomansweekly.co.nz/food/homemade-brandy-snaps/

That's a curious example: note how an occurrence of 001E or 001F immediately precedes every "fi" or "fl", except for the one instance of "flour" in the list of ingredients.

I guess some broken tool, confused workflow or misguided author has been trying to do something "clever" to the text with presentation-form codepoints for the fi and fl ligatures, but it's all gone haywire.

Also note the example in step 1 of the recipe, where the "fl" of "flour" is completely missing, all that remains is the spurious 001E. In Chrome, all we see is the word "our", with no visible hint that anything's wrong except that it doesn't actually make sense.

Personally, I still don't think anyone really benefits from hiding these errors.
Users obviously benefit from not seeing these junk characters.
More specifically, if we agree that quality of text rendering is important to user experience (I think we do) spicing up the rendering with hexboxes is surely detrimental.
I wasn't going to continue the debate, but just once more......

If we hide the errors, or if we make them visible only in special modes such as "view source" or editable text areas, etc., then many users *and authors* will never see them; I'm pretty sure that much of the text we see on the web gets there without ever being edited directly within Firefox.

I suggest that making the errors visible within the normal browser view substantially increases the chance that they'll get fixed, either because the author or publisher notices them or because a user reports the problem. And thus it contributes to improving the quality of the information being published on the Web. This is good for the health of the Web as a whole, at least if we agree that the Web is not *only* about displaying pages prettily, but also about their information content, and what people can do with it - searching, sharing, reusing, publishing, etc.

By hiding the problems, we're hurting the Web by perpetuating the presence of low-quality data that is less valuable than it could and should be.
If we decide to suppress these hexboxes then I would suggest that we
log an error about it in the console (once per document) to inform
the author about the problem.
Severity: normal → minor
Priority: -- → P4
I've got a new suggestion. I think we should raise with the CSS WG the issue of how these characters should be rendered. If the group agrees they should be rendered, that gets put in a spec, and we coordinate with other browser vendors (at least Chrome) to ensure that we both turn on support for rendering them.

In the meantime, I suggest we introduce a private CSS property to control whether they're rendered, and set that property in view-source, in text inputs, in all the devtools panes, etc, but not for all Web content. And do what Mats suggested in comment #27.

How does that sound?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28)
> How does that sound?

Sounds OK as long as our default behavior really ends up matching Chrome/IE11/Safari during the time that the CSS WG takes to ponder this and the other browsers take to implement whatever the WG decides. (I expect the WG to decide not to render non-tab, non-LF characters below U+0020 by default in order to minimize the change from what most browsers are doing.)
https://bugzilla.mozilla.org/show_bug.cgi?id=957359 has an interesting example. A log file containing 0x7 characters (the beep for an NS_ASSERTION) was pasted into Bugzilla. If we regard this as author error, it's not clear whose error it is. Maybe we shouldn't regard this as an error.
Henri: in comment 29, you recommended that Firefox's default behavior should match Chrome/IE11/Safari while the CSS WG deliberates, i.e. don't render hexboxes. However this bug is still unfixed, so Beta users will soon see hexboxes when Aurora 28 to Beta in two weeks. Should this bug be fixed or the regressing bug 909344 be backed out?

btw, here is another affected page: this article's byline displays Line Tabulation (U+000B) characters on Nightly 29 and Aurora 28: http://www.tabletpcreview.com/default.asp?newsID=4703
Flags: needinfo?(hsivonen)
(In reply to Chris Peterson (:cpeterson) from comment #31)
> Henri: in comment 29, you recommended that Firefox's default behavior should
> match Chrome/IE11/Safari while the CSS WG deliberates, i.e. don't render
> hexboxes. However this bug is still unfixed, so Beta users will soon see
> hexboxes when Aurora 28 to Beta in two weeks. Should this bug be fixed or
> the regressing bug 909344 be backed out?

I don't know how much work fixing this in the way described in the second paragraph of comment 28. I think that would be the ideal fix. If that fix can't be made before the window of opportunity to back out bug 909344 closes, I'd recommend backing out bug 909344.
Flags: needinfo?(hsivonen)
Blocks: 963252
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #30)
> https://bugzilla.mozilla.org/show_bug.cgi?id=957359 has an interesting
> example. A log file containing 0x7 characters (the beep for an NS_ASSERTION)
> was pasted into Bugzilla. If we regard this as author error, it's not clear
> whose error it is. Maybe we shouldn't regard this as an error.

Some thoughts on this:

Obviously, the purpose of the 0x7 is originally intended to produce a beep as part of displaying the error message on the console. That was a legitimate use of this control code within the context of a stream of text being dynamically produced and presented in real time to the user.

It doesn't make as much sense, though, when that text is captured and displayed on a static page; how would the "beep" then be manifested? A single audible beep (or one per 0x07 code) when the page is loaded isn't much use - there'd be no way for the user to associate the beep with the appropriate place in the page - and we certainly don't want a continuous beep as long as it's being displayed, nor a beep every time that fragment is repainted.

So if the presence of the beep is to be retained at all (one option would be to filter it out of the text that's pasted in, but that's also problematic in that it's lossy), it needs to be made visible rather than audible. If we hide the hexbox, users have no way of seeing that there was a beep code present at all; even if they use Find in Page to search for it, they won't be able to see where the found text is, as there's nothing to highlight.

It would arguable be "friendlier" to render the 0x07 as some kind of mnemonic or visual representation of the BELL control code, in preference to a hexbox; e.g. replacing it with U+2407 ␇ or even U+1F514, which probably won't work in bugzilla. But this would also be misleading, as it wouldn't be clear whether the text actually contained one of these visible characters, or the invisible control code.

IMO, none of these options are really satisfactory. In contexts where users may be expected to deal with such codes relatively often, a convention such as inverse-video ^G or <07>, handled as a single entity for editing purposes, etc., is probably more helpful than a hexbox, as it's easier to read; but these would also be more expensive to implement as they'd need additional special-case processing. So as a default behavior for our text rendering code, I think the hexbox is the best option we currently have. Specific features such as View Source might still want to consider replacing it with an enhanced, author-friendlier visualization.
Roc, what's your final call here?  I'll track this if we need to ensure a backout, otherwise this won't be a tracking issue but we might want to consider a release note.

My 2 cents that we should the backout and then bring this up to CSS WG since users seeing the hexboxes will probably go check the page on Chrome and then think Firefox is broken, not that Chrome is just hiding display of characters that are invalid.
Flags: needinfo?(roc)
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 909344
User impact if declined: hexboxes appear in pages with control characters
Testing completed (on m-c, etc.): none
Risk to taking this patch (and alternatives if risky): low risk, this is a straightforward backout
String or IDL/UUID changes made by this patch: none
Attachment #8368415 - Flags: approval-mozilla-beta?
Flags: needinfo?(roc)
Comment on attachment 8368415 [details] [diff] [review]
backout 909344 on beta

Moving the approval request to aurora, as this hasn't actually gone to beta yet (bug 909344 landed for mozilla-28).
Attachment #8368415 - Flags: approval-mozilla-beta? → approval-mozilla-aurora?
Comment on attachment 8368415 [details] [diff] [review]
backout 909344 on beta

Tracking for 28/29, approving the backout, not sure how long the final call on how to handle this will take but we can track for 29 and watch the progression to get a sense of that.
Attachment #8368415 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
As it's the weekend already for Roc, I went ahead and pushed this to aurora:
https://hg.mozilla.org/releases/mozilla-aurora/rev/0f8d3ce6be17
Blocks: 757521
We need to do something so we're not deciding whether to do the backout before every release.

I propose this plan:
1) Add an internal CSS property "-moz-text-discard: none | control-characters", inherited, initial value "control-characters". chrome-only. -moz-text-discard:none enables the changes in bug 909344. This gets us to a stable state with 909344 effectively backed out everywhere.
2) Set this property to "none" in view-source, text <input>s, <textarea>, everywhere in devtools (including Web Console etc), plain text documents, and XUL documents ... fixing 909344 as filed.
3) Talk to www-style about whether there's interest in displaying control characters across browsers, and/or standardizing this property.
This seems like a reasonable approach for now, but I'm not thrilled about the name of -moz-text-discard; I wonder if we can come up with something clearer?

Maybe

  -moz-text-hide: none | control-characters;

would be better? (Or even -moz-text-hide-characters.) I think "hide" is more accurate than "discard", as they are still present in the text (as proved by copy-and-paste, for example).

Or more explicitly,

  -moz-text-control-characters: show | hide;

would perhaps be the clearest.

The -moz-text-(discard|hide) version could in principle be extended with other values, e.g., if we ever wanted a "show invisibles" option to make bidi formatting controls, joiners, etc., render visibly.

Though in that case, it might be easier to deal with the opposite naming:

  -moz-text-show-invisibles: none | ( control-characters | bidi-controls | joiners )# ;

But if we're unlikely ever to do such things, then I think a narrowly-focused -moz-text-control-characters, designed to address just this one issue, might be the most appropriate.
Comment on attachment 8370628 [details] [diff] [review]
Part 2: Add gfxTextRunFactory::TEXT_DISCARD_CONTROL_CHARACTERS to guard usage of IsInvalidControlChar

Review of attachment 8370628 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine, modulo possibly reversing the sense of the flag - WDYT?

::: gfx/thebes/gfxFont.h
@@ +1140,5 @@
> +        /**
> +         * When set, the textrun should discard control characters instead of
> +         * turning them into hexboxes.
> +         */
> +        TEXT_DISCARD_CONTROL_CHARACTERS = 0x0400,

How about reversing the sense of this, to make it TEXT_SHOW_CONTROL_CHARACTERS? That would eliminate the negation where it's tested in gfxFont.cpp, and so help code readability, IMO.
Attachment #8370628 - Flags: review?(jfkthame) → review+
Comment on attachment 8370627 [details] [diff] [review]
Part 1: Add "-moz-text-discard" property to the style system

Review of attachment 8370627 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/nsCSSPropList.h
@@ +2888,5 @@
>      eStyleAnimType_Custom)
> +#ifndef CSS_PROP_LIST_EXCLUDE_INTERNAL
> +CSS_PROP_TEXT(
> +    -moz-text-discard,
> +    text_discard,

_moz_text_discard

@@ +2889,5 @@
> +#ifndef CSS_PROP_LIST_EXCLUDE_INTERNAL
> +CSS_PROP_TEXT(
> +    -moz-text-discard,
> +    text_discard,
> +    TextDiscard,

CSS_PROP_DOMPROP_PREFIXED(TextDiscard)

(although probably doesn't matter given the CSS_PROP_LIST_EXCLUDE_INTERNAL)

@@ +2894,5 @@
> +    CSS_PROPERTY_PARSE_VALUE,
> +    "",
> +    VARIANT_HK,
> +    kTextDiscardStyleKTable,
> +    offsetof(nsStyleText, mTextDiscard),

Use CSS_PROP_NO_OFFSET since the property isn't declared to be animatable.

::: layout/style/nsStyleConsts.h
@@ +750,5 @@
>  #define NS_STYLE_TEXT_TRANSFORM_LOWERCASE       2
>  #define NS_STYLE_TEXT_TRANSFORM_UPPERCASE       3
>  #define NS_STYLE_TEXT_TRANSFORM_FULLWIDTH       4
>  
> +// See nsStyleText

// See nsStyleText::mTextDiscard
Attachment #8370627 - Flags: review?(cam) → review+
Attachment #8370631 - Flags: review?(cam) → review+
Comment on attachment 8370632 [details] [diff] [review]
Part 4: Set -moz-text-discard:none in devtools, XUL, text inputs, plaintext documents, view-source documents

Review of attachment 8370632 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/content/xul.css
@@ +23,5 @@
>  
>  :root {
>    text-rendering: optimizeLegibility;
>    -moz-binding: url("chrome://global/content/bindings/general.xml#root-element");
> +  -moz-text-discard: none;

Is this one for window title bars etc.?
Attachment #8370632 - Flags: review?(cam) → review+
(In reply to Jonathan Kew (:jfkthame) from comment #47)
> This seems like a reasonable approach for now, but I'm not thrilled about
> the name of -moz-text-discard; I wonder if we can come up with something
> clearer?

Sure. I was semi-subconsciously channeling a proposal for "white-space-collapse:discard" but that's not compelling.

>   -moz-text-hide: none | control-characters;
> 
> would be better? (Or even -moz-text-hide-characters.) I think "hide" is more
> accurate than "discard", as they are still present in the text (as proved by
> copy-and-paste, for example).

Not too bad.

> Or more explicitly,
> 
>   -moz-text-control-characters: show | hide;
> 
> would perhaps be the clearest.

I think the property name isn't specific enough in this case.

How about -moz-control-characters-visibility: visible | hidden?

> The -moz-text-(discard|hide) version could in principle be extended with
> other values, e.g., if we ever wanted a "show invisibles" option to make
> bidi formatting controls, joiners, etc., render visibly.
> 
> Though in that case, it might be easier to deal with the opposite naming:
> 
>   -moz-text-show-invisibles: none | ( control-characters | bidi-controls |
> joiners )# ;

In this case, you probably want to control visibility of various classes independently, so we should stick with the above anyway.
(In reply to Jonathan Kew (:jfkthame) from comment #48)
> Looks fine, modulo possibly reversing the sense of the flag - WDYT?
> 
> ::: gfx/thebes/gfxFont.h
> @@ +1140,5 @@
> > +        /**
> > +         * When set, the textrun should discard control characters instead of
> > +         * turning them into hexboxes.
> > +         */
> > +        TEXT_DISCARD_CONTROL_CHARACTERS = 0x0400,
> 
> How about reversing the sense of this, to make it
> TEXT_SHOW_CONTROL_CHARACTERS? That would eliminate the negation where it's
> tested in gfxFont.cpp, and so help code readability, IMO.

I don't think you want this, because you want "show control characters" to be the default and so that should be 0.

(In reply to Cameron McCormack (:heycam) from comment #50)
> ::: toolkit/content/xul.css
> @@ +23,5 @@
> >  
> >  :root {
> >    text-rendering: optimizeLegibility;
> >    -moz-binding: url("chrome://global/content/bindings/general.xml#root-element");
> > +  -moz-text-discard: none;
> 
> Is this one for window title bars etc.?

No. This effectively enables "show control characters" everywhere in all XUL documents. So if someone slips them into chrome or an extension UI, we'll see it. It also has the nice side effect of enabling "show control characters" in the DOM Inspector extension and probably other similar extensions (Firebug maybe).
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51)

> How about -moz-control-characters-visibility: visible | hidden?

Sure, that sounds fine.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #52)

> > Looks fine, modulo possibly reversing the sense of the flag - WDYT?

> I don't think you want this, because you want "show control characters" to
> be the default and so that should be 0.

Fair enough. So something like TEXT_HIDE_CONTROL_CHARACTERS, to associate it with the property value above?
I'll go with TEST_CONTROL_CHARACTERS_HIDDEN.
and -moz-control-character-visibility.
actually I'll go with TEXT_HIDE_CONTROL_CHARACTERS after all.
Attached patch Part 1 v2Splinter Review
Updated for the property changes discussed in the bug.
Attachment #8370627 - Attachment is obsolete: true
Attachment #8372064 - Flags: review?(cam)
Attached patch Part 3 v2Splinter Review
Updated for property changes
Attachment #8370630 - Attachment is obsolete: true
Attachment #8370630 - Flags: review?(jfkthame)
Attachment #8372066 - Flags: review?(jfkthame)
Attachment #8372064 - Flags: review?(cam) → review+
Bug 968584 argues that the current behavior is good for security. We might want to keep at least some invalid characters displaying as hexboxes.
Comment on attachment 8372064 [details] [diff] [review]
Part 1 v2

> +  // -moz-text-discard: enum, inherit, initial
> +    "-moz-control-character-visibility",

These don't match?
Comment on attachment 8372064 [details] [diff] [review]
Part 1 v2

When adding new CSS properties, please add tests:
* Add to property_database.css to test parsing and serialization (and to inform to the fuzzer)
* Add static reftests
* Add dynamic reftests
Comment on attachment 8370633 [details] [diff] [review]
Part 5: Fix tests to pass even if regular HTML content is not displaying hexboxes for control characters

Review of attachment 8370633 [details] [diff] [review]:
-----------------------------------------------------------------

This seems fine; but I think you should also add reftests for the behavior of the new property.
Attachment #8370633 - Flags: review?(jfkthame) → review+
Comment on attachment 8372066 [details] [diff] [review]
Part 3 v2

Review of attachment 8372066 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM - though given bug 968584, do you want to reconsider whether we really want this behavior for web content?
Attachment #8372066 - Flags: review?(jfkthame) → review+
(In reply to Jesse Ruderman from comment #59)
> Bug 968584 argues that the current behavior is good for security. We might
> want to keep at least some invalid characters displaying as hexboxes.

I don't think hexboxes really help with that bug. I've commented over there.
(In reply to Jesse Ruderman from comment #61)
> When adding new CSS properties, please add tests:
> * Add to property_database.css to test parsing and serialization (and to
> inform to the fuzzer)

That doesn't seem to work, probably because I made the property internal-only so there's no DOM API.

> * Add static reftests
> * Add dynamic reftests

Done that.
What's the default value of the control-character-visibility CSS option?
Comment on attachment 8368415 [details] [diff] [review]
backout 909344 on beta

Let's take this backout on Aurora (it's already been backed out on Beta) now that we have a longer-term fix on trunk. We could uplift the trunk fix but it's probably not worth the work.
Attachment #8368415 - Flags: approval-mozilla-aurora+ → approval-mozilla-aurora?
Attachment #8368415 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
User agents:
[1] Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
[2] Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:30.0) Gecko/20100101 Firefox/30.0
[3] Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0

I was able to confirm the fix for this issue on:
- the latest Beta (Build ID: 20140210161136) [1]
- the latest Nightly (Build ID: 20140210030201) [2] 

The issue is still reproducible on:
- the latest Aurora (Build ID: 20140210004002) [3]
What does it mean that the property is "internal only"? It seems odd that web pages can set it using CSS but not using DOM. Are there any other properties like that?
Properties that are defined within #ifdef CSS_PROP_LIST_EXCLUDE_INTERNAL guards in nsCSSProps.h are not exposed on CSS DOM declaration objects.  Properties that are defined in that file with the CSS_PROPERTY_PARSE_INACCESSIBLE flag are not parsed by the CSS parser.  This would be the only property that is not exposed on the DOM but is recognised in style sheets.

Things that the CSS parser only allows in chrome style sheets have a check for mUnsafeRulesEnabled in nsCSSParser.cpp, but so far this is limited to (a) a couple of MathML-related properties that really only exist for internal implementation purposes, and which if exposed to Web content would break some invariants the MathML code relies on, and (b) some pseudo-elements for HTML form elements that we want to style using CSS but which we haven't yet agreed on how to expose to Web content.

Is it important to hide this property from DOM CSSStyleDeclaration objects?  Is there any chance we'd want to standardise this property?
Depends on: 973280
Filed bug 973280 for exposing it to the DOM.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.