909344 - view source does not represent all characters

Reporter

Description

•

12 years ago

Attached file test.html — Details

User Agent: Steps to reproduce: 1. Open testcase. 2. View Page Source. Actual results: 2. Source appears as if it were syntactically correct. Expected results: 2. Display unallowed characters after 'u' in 'document'.

Stefan

Reporter

Updated

•

12 years ago

OS: Other → Linux

Hardware: Other → x86_64

Stefan

Reporter

Updated

•

12 years ago

Attachment #795438 - Attachment mime type: text/plain → text/html

Alice0775 White

Comment 1

•

12 years ago

Regression window(m-c) Good: http://hg.mozilla.org/mozilla-central/rev/8ae16e346bd0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0a1) Gecko/20120105 Firefox/12.0a1 ID:20120106015923 Bad: http://hg.mozilla.org/mozilla-central/rev/fcc32e70c95f Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0a1) Gecko/20120105 Firefox/12.0a1 ID:20120106042423 Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8ae16e346bd0&tochange=fcc32e70c95f Regression window(m-i) Good: http://hg.mozilla.org/integration/mozilla-inbound/rev/511078d51f71 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0a1) Gecko/20120105 Firefox/12.0a1 ID:20120105035122 Bad: http://hg.mozilla.org/integration/mozilla-inbound/rev/c0b62edd2917 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0a1) Gecko/20120105 Firefox/12.0a1 ID:20120105041225 Pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=511078d51f71&tochange=c0b62edd2917 Regressed by: bug 703100

Blocks: 703100

Status: UNCONFIRMED → NEW

status-firefox22: --- → affected

status-firefox23: --- → affected

status-firefox24: --- → affected

status-firefox25: --- → affected

status-firefox26: --- → affected

status-firefox-esr17: --- → affected

Component: Untriaged → Layout: Text

Ever confirmed: true

Keywords: regression

OS: Linux → All

Product: Firefox → Core

Version: 20 Branch → 12 Branch

Jonathan Kew [:jfkthame]

Assignee

Comment 2

•

12 years ago

I don't think this is a bug. Control characters (except for a few such as <tab> and <return>) "have no visual or spatial representation" .... i.e. they're invisible. If you sprinkle them through source code, it won't look any different, but it'll certainly break. The same would apply to other zero-width, invisible characters such as U+200B, U+2060, U+034F, U+2062, and many more; if you add these to your source, they're unlikely to be visible yet they will break the functionality. Similarly if you replace the "o" (U+006F) in "document" with "ο" (U+03BF); it may not look any different, but don't expect Javascript to understand it!

Stefan

Reporter

Comment 3

•

12 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #2) > Control characters (except for a few such as > <tab> and <return>) "have no visual or spatial representation" .... i.e. > they're invisible. They have no representation in a presentation type context/mode. But view-source does not constitute what type of context but rather an edit type context. And all editors I have checked (vi, nano, kate, kwrite, emacs) represent these control characters properly.

Alice0775 White

Comment 4

•

12 years ago

Attached image screenshot viewsource and indpector on ubuntu — Details

I think that invisible char should be indicate in viewsource and inspector at least. because debuging make easy. The visualization of invisible char, I think that something like Ubuntu and adding special color is nice.

Jonathan Kew [:jfkthame]

Assignee

Comment 5

•

12 years ago

"Properly" meaning... what, exactly? There are many conventions for representing control characters in visible form. (I notice that if I view the sequence <U+0061,U+0001,U+0062> in nano, there's no visible difference from the sequence <U+0061,U+005E,U+0041,U+0062>. To detect the difference, it's necessary to cursor through the text and notice that the "^A" is a single unit rather than two characters.) If we displayed the control characters as hexboxes, would that be satisfactory? Or should view-source use some other convention, such as converting them to the appropriate escaped or encoded form for HTML or CSS or JS or whatever is being viewed?

Jonathan Kew [:jfkthame]

Assignee

Comment 6

•

12 years ago

Attached patch render stray control characters as hexboxes instead of invisible — Details — Splinter Review

One option would be a patch like this, which makes control characters in general render as hexboxes rather than remaining invisible. Note that this will apply -all- text rendering, not only view-source, so it may affect web pages that (accidentally?) include control characters and didn't expect them to be visible. However, I think it's probably a reasonable thing to do on the whole. (If people want view-source to display control chars in some other form, such as a vim/emacs-style "colored ^X" notation, that would need to be implemented as part of the view-source feature instead.)

Attachment #795729 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Updated

•

12 years ago

Assignee: nobody → jfkthame

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 7

•

12 years ago

I don't think we should do this for all Web content, just for view-source. For regular Web content I see no upside and there is compatibility risk,

Jonathan Kew [:jfkthame]

Assignee

Comment 8

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7) > I don't think we should do this for all Web content, just for view-source. > For regular Web content I see no upside and there is compatibility risk, The benefit is that "stray" control chars will no longer be invisible to the user, which means that if Find, for example, is not giving the expected result (try searching for the word "benefit" here by typing it into the search bar), the user has some clue as to why. Similarly, it avoids unexpected results when copy-pasting text from a page into some other application where the presence of control chars may have unwanted effects, or the possibility of submitting invisibly-incorrect data via web forms. The compatibility risk is minimal, IMO, especially in view of the fact that prior to Fx12 it was possible (depending on what font happened to get chosen during fallback, and whether it mapped the control chars to invisible glyphs or to a .notdef box) for such control characters to show up as visible blots on the page.

Jonathan Kew [:jfkthame]

Assignee

Comment 9

•

11 years ago

How about if we try landing this after the upcoming merge, and then see if any compatibility issues show up as it progresses through the channels towards release? It'd be easy to revert if necessary during Aurora or even Beta. (Personally, I suspect instances of this are sufficiently rare that we won't encounter any significant compatibility issues; and for the rare cases that are affected, I believe the changed behavior is actually an improvement for all web content, not only for view-source.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 10

•

11 years ago

What do other browsers do? This should really be specced.

Stefan

Reporter

Comment 11

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10) > What do other browsers do? This should really be specced. I prefer Jonathan Kew's approach. It ensures that authors who use a Firefox browser to enter their story become aware of a possible control character pollution. Take a look at this newpaper article: http://www.faz.net/aktuell/feuilleton/medien/gema-alternative-c3s-legaler-tausch-schliesst-profit-nicht-aus-12578461.html Do you see any of the 24 U+0084 and U+0093 characters or any of the six U+0096? How do I as an editor using a current Firefox browser ensure that my story does not contain these strange characters?

Jonathan Kew [:jfkthame]

Assignee

Comment 12

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10) > What do other browsers do? It varies. On my Win8.1 (preview) VM, for example, Internet Explorer shows a "box" glyph for one of them, and blank space for a couple of others, while most remain invisible. On the same machine, Chrome leaves them all invisible. See attached screenshot. I suspect the results are often font-dependent, as the browsers are simply letting these characters go through their normal rendering path, potentially with font fallback (as many fonts will omit them from the character map entirely). Note the selection highlighting in the screenshot of IE, which suggests that the control characters are being "drawn" with at least a couple of different fonts, resulting in different heights for the selection highlight. > This should really be specced. In which spec?

Jonathan Kew [:jfkthame]

Assignee

Comment 13

•

11 years ago

Attached image ctrl-chars.png — Details

Screenshot showing examples of IE and Chrome rendering of text with stray control characters on Win8.1.

Jonathan Kew [:jfkthame]

Assignee

Comment 14

•

11 years ago

(See http://people.mozilla.org/~jkew/ctrlchars.html for the HTML page used for that screenshot.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

11 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #12) > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10) > > This should really be specced. > > In which spec? CSS. (In reply to Stefan from comment #11) > Do you see any of the 24 U+0084 and U+0093 characters or any of the six > U+0096? How do I as an editor using a current Firefox browser ensure that my > story does not contain these strange characters? There are a few options other than showing them to users in the browser: -- Examine view-source -- Web developer tools warnings -- Have your HTML editor, CMS or Web framework check automatically

Jonathan Kew [:jfkthame]

Assignee

Comment 16

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #15) > (In reply to Stefan from comment #11) > > Do you see any of the 24 U+0084 and U+0093 characters or any of the six > > U+0096? How do I as an editor using a current Firefox browser ensure that my > > story does not contain these strange characters? > > There are a few options other than showing them to users in the browser: > -- Examine view-source > -- Web developer tools warnings > -- Have your HTML editor, CMS or Web framework check automatically These are (or would be) useful options to help web authors/content creators avoid inadvertently including spurious control characters in pages. However, they don't address the issue for the end user viewing a page where such characters have (by whatever means) been included. Their presence affects the usability, and perhaps even the meaning, of the page; I suggest we are doing users a disservice by hiding them.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 17

•

11 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #16) > These are (or would be) useful options to help web authors/content creators > avoid inadvertently including spurious control characters in pages. However, > they don't address the issue for the end user viewing a page where such > characters have (by whatever means) been included. Their presence affects > the usability, and perhaps even the meaning, of the page; I suggest we are > doing users a disservice by hiding them. Can you point to any real pages where users (not developers) would benefit from seeing control characters rendered as hexboxes?

Stefan

Reporter

Comment 18

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17) > Can you point to any real pages where users (not developers) would benefit > from seeing control characters rendered as hexboxes? How do users benefit from automatic ligatures (fi, fl)? According to Eric E. Schmidt I would like to suggest a different perspective: If you have characters you don't want anyone to see, maybe they should not have been written in the first place.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 19

•

11 years ago

(In reply to Stefan from comment #18) > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17) > > Can you point to any real pages where users (not developers) would benefit > > from seeing control characters rendered as hexboxes? > > How do users benefit from automatic ligatures (fi, fl)? They make the text look better. Hexboxes don't make the page look better. > According to Eric E. Schmidt I would like to suggest a different > perspective: If you have characters you don't want anyone to see, maybe they > should not have been written in the first place. That's fine if you're the author of the page. Most users aren't.

Stefan

Reporter

Comment 20

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19) > (In reply to Stefan from comment #18) > > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17) > > > Can you point to any real pages where users (not developers) would benefit > > > from seeing control characters rendered as hexboxes? > > > > How do users benefit from automatic ligatures (fi, fl)? > > They make the text look better. Do they really? First, that was probably neither intended at the time of their invention nor is it empirically true that they make text look better in general. There are also at least two kinds of 'false' ligatures. The first one is when ligatures are formed in monospace text: It does not look better but also breaks a possible text alignment--making text look worse even for non-expert eyes. The second one is when the processor forms ligatures contrary to the typesetting rules, e.g. in "Auﬂösung" a morpheme boundary is crossed. Does it look better if these rules are disregarded? Does it make text look that better if a processor, which is unable to get kerning/positioning right, makes use of fancy typography? > Hexboxes don't make the page look better. That's not the intended benefit. > > According to Eric E. Schmidt I would like to suggest a different > > perspective: If you have characters you don't want anyone to see, maybe they > > should not have been written in the first place. > > That's fine if you're the author of the page. Most users aren't. Nowadays probably most users /are/ authors of content.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 21

•

11 years ago

(In reply to Stefan from comment #20) > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19) > > They make the text look better. > > Do they really? That's the intended benefit, anyway. If you want to argue that we should drop ligature support because you don't like ligatures, please make it in some other forum. > > Hexboxes don't make the page look better. > > That's not the intended benefit. Of course. So what is the intended benefit to users who aren't the authors of the page? > > > According to Eric E. Schmidt I would like to suggest a different > > > perspective: If you have characters you don't want anyone to see, maybe they > > > should not have been written in the first place. > > > > That's fine if you're the author of the page. Most users aren't. > > Nowadays probably most users /are/ authors of content. A user being an author of *some* content doesn't mean that they care to see the errors made by authors of *other* content. This is obvious. I fear you are being intentionally obtuse.

Jonathan Kew [:jfkthame]

Assignee

Comment 22

•

11 years ago

The presence of spurious control characters will disrupt normally-expected functionality, such as search (both Find-in-Page and search-engine indexing) and copy/paste reuse of text, that may be relevant to all users - not only to the original author. Making the control characters visible rather than invisible is a service to all users because it means the "unexpected" behavior such text will exhibit is reflected in a visible indication that "all is not what it seems". It makes the platform more transparent to users, less liable to mislead or surprise them. Of course, there are many other ways in which authors may (intentionally or otherwise) produce content that behaves less-than-optimally without giving users any visible clue as to why it is behaving poorly. But in this case, as there is (AFAICT) no legitimate use case for invisible C0/C1 control characters within web content, and given that the web platform has never guaranteed nor consistently implemented this behavior (see attachment 807732 [details]), I don't see any reason to continue hiding them.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 23

•

11 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #22) > Making the control characters visible rather than invisible is a service to > all users because it means the "unexpected" behavior such text will exhibit > is reflected in a visible indication that "all is not what it seems". It > makes the platform more transparent to users, less liable to mislead or > surprise them. Control-characters disrupting copy-and-paste and find-in-page would occur rather less often than they occur in documents. So I think this benefit is only incurred on pages where showing hexboxes has already annoyed or confused a larger number of users. I don't think it can be a net win. I think it makes sense to do this in view-source, in devtools, in HTML text input elements, and contenteditable content. I still think it's a bad idea for regular Web content.

Stefan

Reporter

Comment 24

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #21) > (In reply to Stefan from comment #20) > > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19) > > > They make the text look better. > > > > Do they really? > > That's the intended benefit, anyway. Ligatures - especially those which come automatically (recall the contraindication monospace and crossed morpheme boundaries) - actually > If you want to argue that we should > drop ligature support because you don't like ligatures, please make it in > some other forum. No and no. Sorry, you got me wrong. My point is that > > > Hexboxes don't make the page look better. > > > > That's not the intended benefit. > > Of course. So what is the intended benefit to users who aren't the authors > of the page? > > > > > According to Eric E. Schmidt I would like to suggest a different > > > > perspective: If you have characters you don't want anyone to see, maybe they > > > > should not have been written in the first place. > > > > > > That's fine if you're the author of the page. Most users aren't. > > > > Nowadays probably most users /are/ authors of content. > > A user being an author of *some* content doesn't mean that they care to see > the errors made by authors of *other* content. This is obvious. I fear you > are being intentionally obtuse.

Stefan

Reporter

Comment 25

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #21) > (In reply to Stefan from comment #20) > > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19) > > > They make the text look better. > > > > Do they really? > > That's the intended benefit, anyway. Ligatures, especially those which come automatically (recall the contraindication monospace and crossed morpheme boundaries), actually do not empirically make "text look better". I don't know of any empirical study made in the group of nontypographs proving that automatic ligatures makes "text look better". > If you want to argue that we should drop ligature support because you don't like ligatures, please > make it in some other forum. No and no. Sorry, you got me wrong. My point is that features (automatic ligatures) are implemented though they are of – let's face the truth – no use to the ordinary user: They do more harm than good for readers and authors of web pages. > > > Hexboxes don't make the page look better. > > > > That's not the intended benefit. > > Of course. So what is the intended benefit to users who aren't the authors > of the page? What is the benefit of presenting � to the user? Shall the user really be confronted with illegal bytes in UTF-8 sequences? (Yes) > > > > According to Eric E. Schmidt I would like to suggest a different > > > > perspective: If you have characters you don't want anyone to see, maybe they > > > > should not have been written in the first place. > > > > > > That's fine if you're the author of the page. Most users aren't. > > > > Nowadays probably most users /are/ authors of content. > > A user being an author of *some* content doesn't mean that they care to see > the errors made by authors of *other* content. How do you know? How many authors of some content have you asked?

Jonathan Kew [:jfkthame]

Assignee

Comment 26

•

11 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #23) > (In reply to Jonathan Kew (:jfkthame) from comment #22) > > Making the control characters visible rather than invisible is a service to > > all users because it means the "unexpected" behavior such text will exhibit > > is reflected in a visible indication that "all is not what it seems". It > > makes the platform more transparent to users, less liable to mislead or > > surprise them. > > Control-characters disrupting copy-and-paste and find-in-page would occur > rather less often than they occur in documents. So I think this benefit is > only incurred on pages where showing hexboxes has already annoyed or > confused a larger number of users. I don't think it can be a net win. Do we have any evidence that such control characters (that we're currently hiding) are actually widespread on web pages, and that making them visible would be likely to annoy or confuse many users? I suspect the most common scenario is where incorrect management of encodings somewhere in the authoring chain leads to the Windows-1252 characters in the x80-x9f range being treated as though they were Unicode C1-Controls values, and then UTF-8-encoded on the page. I've seen this happen to things like the "typographic quotes" and em-/en-dashes: data:text/html;charset=utf-8,%C2%93hello%C2%94%20foo%C2%97bar where the author's intention was: data:text/html;charset=windows-1252,%93hello%94%20foo%97bar With our current behavior, these incorrectly-encoded punctuation characters are completely invisible to the reader. This can actually be quite confusing, and make the page more difficult to read - even distort its apparent meaning. I'd argue that while hexboxes on the page are ugly, they are in fact a BETTER experience - for the reader as well as the author - in this situation. They give the reader a clue that SOMETHING was meant to be there, and it's then possible to make a reasonable guess as to what the author intended; whereas making them invisible leaves the reader likely to mis-read without even realizing it.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 27

•

11 years ago

Comment on attachment 795729 [details] [diff] [review] render stray control characters as hexboxes instead of invisible Review of attachment 795729 [details] [diff] [review]: ----------------------------------------------------------------- Alright, I give up. I guess we'll see what happens.

Attachment #795729 - Flags: review?(roc) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 28

•

11 years ago

OK, here goes. :) We'll see... https://hg.mozilla.org/integration/mozilla-inbound/rev/739edf98ca0b

Target Milestone: --- → mozilla28

Jonathan Kew [:jfkthame]

Assignee

Comment 29

•

11 years ago

Attached patch reftest for display of stray control characters — Details — Splinter Review

Provided the patch here sticks (oops - after pushing it, I realized I couldn't remember if I've previously run this through tryserver), we should also have a reftest for it.

Attachment #830712 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

11 years ago

Attachment #830712 - Flags: review?(roc) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 30

•

11 years ago

I backed this out because it caused layout/reftests/bugs/388367-1.html to fail. https://hg.mozilla.org/integration/mozilla-inbound/rev/44cce34b7073 The testcase in 388367-1 includes a "formfeed" character U+000c, which became visible and thus caused the comparison with about:blank to fail.

Jonathan Kew [:jfkthame]

Assignee

Comment 31

•

11 years ago

Attached patch load testcase 388367-1 as a crashtest rather than reftest — Details — Splinter Review

Bug 388367 was about an assertion, not really about the 'rendering' of the form-feed control character; so I suggest we change its testcase to load as a simple crashtest instead of comparing to about:blank.

Attachment #830753 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

11 years ago

Attachment #830753 - Flags: review?(roc) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 32

•

11 years ago

This also led to crashtest orange (due to an UNEXPECTED-PASS) on Linux/Linux64 only: REFTEST TEST-UNEXPECTED-PASS | file:///builds/slave/test/build/tests/reftest/tests/layout/generic/crashtests/398332-3.html | assertion count 0 is less than expected 2 assertions I notice that the testcase includes a control character (0x15), so it's plausible that the change here could affect it; by explicitly making this a missing-glyph rather than dropping it, we'll have affected the positioning of the following glyphs in the run, at least. Given the comments in bug 436123 that seem to indicate people have seen various numbers of assertions in this testcase, presumably depending on platform/version issues, I think we should just change the annotation there from asserts(2) to asserts(0-2).

Jonathan Kew [:jfkthame]

Assignee

Comment 33

•

11 years ago

Attached patch testcase 398332-3.html might not trigger assertions — Details — Splinter Review

This just marks the testcase with asserts(0-2). I suppose we could be more precise, and mark it as asserting everywhere except Linux, but I suspect that whether the assertions are triggered is not really dependent on the platform as such, but more likely on exactly what fonts/metrics are involved. The real solution is to fix the underlying bug (436123) so that the assertion never happens anywhere.

Attachment #830776 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

11 years ago

Attachment #830776 - Flags: review?(roc) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 34

•

11 years ago

Relanded, with the manifest fixes that should keep tests green this time: https://hg.mozilla.org/integration/mozilla-inbound/rev/662c12a8cf8b https://hg.mozilla.org/integration/mozilla-inbound/rev/bcbbd944a4a2 https://hg.mozilla.org/integration/mozilla-inbound/rev/30b962e3c46b https://hg.mozilla.org/integration/mozilla-inbound/rev/f0d6d67fbc1e

Mats Palmgren (inactive)

Comment 35

•

11 years ago

FYI, one of the tests fails on OSX 10.8: REFTEST TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/tests/reftest/tests/layout/reftests/text/control-chars-01c.html | image comparison (!=)

Ed Morley [:emorley]

Comment 36

•

11 years ago

Comment on attachment 830712 [details] [diff] [review] reftest for display of stray control characters This part backed out for reftest failures: https://tbpl.mozilla.org/php/getParsedLog.php?id=30470592&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=30469983&tree=Mozilla-Inbound remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/d6348e85016f

Attachment #830712 - Flags: checkin-

Jonathan Kew [:jfkthame]

Assignee

Comment 37

•

11 years ago

So the old tests are now fine, but one of the new ones misbehaved. :( I wonder why it's platform-specific. Marking the bug as [leave open] to investigate and reland the tests.

Whiteboard: [leave open]

Alistair Vining

Comment 38

•

11 years ago

Vertical tab and form feed used to show as boxes, leading to bug 106311, but it has been fixed by (?accidental) changes for years without being closed. When this bug reaches a satisfactory end, it would be a good time to close bug 106311, too. The original fontfont.de website has changed and the page is not on archive.org, but I think it had vertical tab at the start of every paragraph or perhaps every (layout) table row, and looked a mess.

Jonathan Kew [:jfkthame]

Assignee

Comment 39

•

11 years ago

Attached patch treat U+007F as a control, not a printable character. — Details — Splinter Review

The problem with testcase 1c on OS X happened because we fail to treat U+007F (DEL) as a control character in gfxFontGroup::IsInvalidChar; instead, it gets handled as though it were a printable character. That's seems wrong - and leads to platform/available-font-dependent behavior; OS X happens to have a font where this character code maps to a zero-width invisible glyph, and so the testcase failed, but the other platforms it must have generated space and/or a .notdef box, and so the test passed. Instead, we should simply treat it like the rest of the control characters.

Attachment #831003 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

11 years ago

Attachment #831003 - Flags: review?(roc) → review+

Ryan VanderMeulen [:RyanVM]

Comment 40

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/662c12a8cf8b https://hg.mozilla.org/mozilla-central/rev/bcbbd944a4a2 https://hg.mozilla.org/mozilla-central/rev/30b962e3c46b

Jonathan Kew [:jfkthame]

Assignee

Comment 41

•

11 years ago

Pushed follow-up fix (see comment 39), and relanded tests: https://hg.mozilla.org/integration/mozilla-inbound/rev/1cb4343bea0d https://hg.mozilla.org/integration/mozilla-inbound/rev/f8a78d1723d8

Whiteboard: [leave open]

Jonathan Kew [:jfkthame]

Assignee

Updated

•

11 years ago

Attachment #830712 - Flags: checkin-

Ryan VanderMeulen [:RyanVM]

Comment 42

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/1cb4343bea0d https://hg.mozilla.org/mozilla-central/rev/f8a78d1723d8

Status: NEW → RESOLVED

Closed: 11 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Alice0775 White

Updated

•

11 years ago

Depends on: 941940

Fanolian

Updated

•

11 years ago

Depends on: 943334

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

Blocks: 757521

Chris Peterson [:cpeterson]

Updated

•

11 years ago

Depends on: 947588

Mats Palmgren (inactive)

Updated

•

11 years ago

Depends on: 963252

Ryan VanderMeulen [:RyanVM]

Comment 43

•

11 years ago

Backed out from Fx28 in bug 947588.

status-firefox27: --- → affected

status-firefox28: --- → affected

status-firefox29: --- → fixed

Ryan VanderMeulen [:RyanVM]

Comment 44

•

11 years ago

This was backed out from Firefox 28 and Firefox 29 for causing bug 947588. The fix for that bug is in Firefox 30 but was deemed too risky to backport.

status-firefox22: affected → wontfix

status-firefox23: affected → wontfix

status-firefox24: affected → wontfix

status-firefox25: affected → wontfix

status-firefox26: affected → wontfix

status-firefox27: affected → wontfix

status-firefox28: affected → wontfix

status-firefox29: fixed → wontfix

status-firefox30: --- → fixed

status-firefox-esr17: affected → wontfix

status-firefox-esr24: --- → wontfix

Xidorn Quan [:xidorn] UTC+10

Comment 45

•

9 years ago

> } else if ((ch & 0x7f) < 0x20 || ch == 0x7f) { I wonder why we need to "& 0x7f" here? Shouldn't it simply compare ch with 0x20?

Flags: needinfo?(jfkthame)

Masatoshi Kimura [:emk]

Comment 46

•

9 years ago

It is equivalent to (ch == 0x20 || ch == 0xa0). That is, normal space or no-break space.

Jonathan Kew [:jfkthame]

Assignee

Comment 47

•

9 years ago

(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #45) > > } else if ((ch & 0x7f) < 0x20 || ch == 0x7f) { > I wonder why we need to "& 0x7f" here? Shouldn't it simply compare ch with > 0x20? We also want to catch the C1 controls 0x80..0x9f. Though on looking at the code now, it does look a bit odd how it's written; it would match the first 32 codepoints of any 128-character Unicode block! Something like (ch & ~0x80) < 0x20 would be more logical. (I suspect it doesn't matter in practice, because this isn't applied to arbitrary Unicode characters, only to the "invalid" character codes that have been identified as terminating a run to be shaped. But it does seem confusing.)

Flags: needinfo?(jfkthame)

Masatoshi Kimura [:emk]

Comment 48

•

9 years ago

The type of ch is uint8_t. Is this really a Unicode character?

Jonathan Kew [:jfkthame]

Assignee

Comment 49

•

9 years ago

(In reply to Masatoshi Kimura [:emk] from comment #48) > The type of ch is uint8_t. Is this really a Unicode character? All this has moved somewhat since the patch here, so maybe we're not looking at the same code. In current m-c code, we have an expression like this in gfxFontGroup::IsInvalidChar(uint8_t ch) where it is indeed an 8-bit value, so there's no issue there. But there's also similar code in the static bool IsInvalidControlChar(uint32_t aCh) helper (see gfxFont.cpp), which I think is where the original fragment Xidorn quoted has moved; and that is used by both 8-bit and 16-bit codepaths.

Alice0775 White

Updated

•

9 years ago

Depends on: 1234039

test.html 12 years ago Stefan 972 bytes, text/html		Details
screenshot viewsource and indpector on ubuntu 12 years ago Alice0775 White 518.55 KB, image/png		Details
render stray control characters as hexboxes instead of invisible 12 years ago Jonathan Kew [:jfkthame] 1.97 KB, patch	roc : review+	Details \| Diff \| Splinter Review
ctrl-chars.png 11 years ago Jonathan Kew [:jfkthame] 209.43 KB, image/png		Details
reftest for display of stray control characters 11 years ago Jonathan Kew [:jfkthame] 2.79 KB, patch	roc : review+	Details \| Diff \| Splinter Review
load testcase 388367-1 as a crashtest rather than reftest 11 years ago Jonathan Kew [:jfkthame] 1.92 KB, patch	roc : review+	Details \| Diff \| Splinter Review
testcase 398332-3.html might not trigger assertions 11 years ago Jonathan Kew [:jfkthame] 967 bytes, patch	roc : review+	Details \| Diff \| Splinter Review
treat U+007F as a control, not a printable character. 11 years ago Jonathan Kew [:jfkthame] 1.17 KB, patch	roc : review+	Details \| Diff \| Splinter Review