Closed Bug 838954 Opened 7 years ago Closed 3 years ago

Newlines lost in .innerHTML serialization of <pre> when first character is a newline

Categories

(Core :: DOM: Core & HTML, defect)

18 Branch
x86_64
Linux
defect
Not set

Tracking

()

RESOLVED INVALID
Tracking Status
firefox51 --- fixed
firefox52 --- wontfix

People

(Reporter: roan.kattouw, Assigned: jdai)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tw-dom])

Attachments

(2 files, 4 obsolete files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0
Build ID: 20130201190337

Steps to reproduce:

See the following JS console session:
>>> div = document.createElement('div')
<div>

>>> div.innerHTML = '<pre>\n\n\nFoo</pre>'
"<pre>


Foo</pre>"

>>> div.firstChild.firstChild
<TextNode textContent="\n\nFoo">

>>> div.innerHTML
"<pre>

Foo</pre>"

>>> div2 = document.createElement('div')
<div>

>>> div2.innerHTML = div.innerHTML
"<pre>

Foo</pre>"

>>> div2.firstChild.firstChild
<TextNode textContent="\nFoo">


Actual results:

When parsing the <pre>, the first newline is stripped and the resulting TextNode has two newlines. This is correct behavior: the first newline in a <pre> is ignored.

However, when the <pre> is serialized back to HTML, Firefox outputs two newlines, rather than three. This is bad, because parsing it back to DOM again will result in a TextNode with one newline as demonstrated. This means parsing then serializing does not round-trip cleanly, which is bad.

For a more amusing display of this, open up a console and run the following:
div = document.createElement('div');
div.innerHTML = '<pre>\n\n\n\n\nFoo</pre>';
div.innerHTML = div.innerHTML;
div.innerHTML = div.innerHTML; // and repeat a few more times


Expected results:

Firefox should follow the spec for the HTML fragment serialization algorithm, see http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm . Step 2.2 states "If current node is a pre, textarea, or listing element, and the first child node of the element, if any, is a Text node whose character data has as its first character a U+000A LINE FEED (LF) character, then append a U+000A LINE FEED (LF) character." The author of the spec intended this specifically to prevent data loss, see https://www.w3.org/Bugs/Public/show_bug.cgi?id=14833#c1 .

In practical terms, when I set div.innerHTML = '<pre>\n\n\nFoo</pre>'; then read div.innerHTML , its value should be '<pre>\n\n\nFoo</pre>' rather than '<pre>\n\nFoo</pre>' .

This bug is not specific to Firefox, the same broken behavior occurs in Chrome and Internet Explorer as well.
Component: Untriaged → DOM: Core & HTML
Product: Firefox → Core
Confirming, though the observation that Chrome and IE don't add a new line is troubling. Fixing this might be a compat risk.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Smaug, want to just fix this in the innerHTML serializer?
Yup, if we want to fix this.
Whiteboard: [tw-dom]
Duplicate of this bug: 1259723
Assignee: nobody → jdai
Attachment #8777639 - Flags: review?(bugs)
Attachment #8777639 - Flags: review?(bugs) → review+
Comment on attachment 8777640 [details] [diff] [review]
Bug 838954 - Add a newline in .innerHTML serialization of <pre> when first character is a newline.

>+++ b/dom/base/nsContentUtils.cpp
>@@ -8693,32 +8693,48 @@ public:
>   bool ToString(nsAString& aOut)
>   {
>     if (!aOut.SetCapacity(mLength, fallible)) {
>       return false;
>     }
> 
>     for (StringBuilder* current = this; current; current = current->mNext) {
>       uint32_t len = current->mUnits.Length();
>+      bool isAppendLFChar = false;
>+      bool isStartTag = false;
>       for (uint32_t i = 0; i < len; ++i) {
>         Unit& u = current->mUnits[i];
>         switch (u.mType) {
>           case Unit::eAtom:
>+            if (isStartTag &&
>+               (u.mAtom == nsGkAtoms::pre ||
>+                u.mAtom == nsGkAtoms::textarea ||
>+                u.mAtom == nsGkAtoms::listing)) {
>+              isAppendLFChar = true;
>+            }
>+            isStartTag = false;
>             aOut.Append(nsDependentAtomString(u.mAtom));
>             break;
>           case Unit::eString:
>             aOut.Append(*(u.mString));
>             break;
>           case Unit::eStringWithEncode:
>             EncodeAttrString(*(u.mString), aOut);
>             break;
>           case Unit::eLiteral:
>+            if (!nsCRT::strcmp(u.mLiteral, "<")) {
>+              isStartTag = true;
>+            }
>             aOut.AppendASCII(u.mLiteral, u.mLength);
>             break;
>           case Unit::eTextFragment:
>+            if (isAppendLFChar && u.mTextFragment->CharAt(0) == '\n') {
>+              aOut.AppendLiteral("\n");
>+            }
>+            isAppendLFChar = false;
So there is existing preliminary code for this bug just few lines below in
StartElement() and that is where the fix should happen.
StringBuilder doesn't really have this kind of special cases.

But, given that this is very regression risky, even if we try this, we need some pref around that code.
Use AddBoolVarCache somewhere and then in the
  if (aContent->IsHTMLElement()) {
    if (localName == nsGkAtoms::pre || localName == nsGkAtoms::textarea ||
        localName == nsGkAtoms::listing) {
add a check also for the bool pref. We could try to enable the pref for some time in Nightly only.
And if we don't get complains ask other browsers to fix this too. If there are regressions, we just say this isn't web compatible and the
spec must be changed (reopen https://github.com/whatwg/html/issues/944 ).
Attachment #8777640 - Flags: review?(bugs) → review-
Comment on attachment 8778845 [details] [diff] [review]
Bug 838954 - Add a newline in .innerHTML serialization of <pre> when first character is a newline. v2

>--- a/dom/base/nsContentUtils.cpp
>+++ b/dom/base/nsContentUtils.cpp
>@@ -274,16 +274,17 @@ bool nsContentUtils::sIsFrameTimingPrefEnabled = false;
> bool nsContentUtils::sIsPerformanceTimingEnabled = false;
> bool nsContentUtils::sIsResourceTimingEnabled = false;
> bool nsContentUtils::sIsUserTimingLoggingEnabled = false;
> bool nsContentUtils::sIsExperimentalAutocompleteEnabled = false;
> bool nsContentUtils::sEncodeDecodeURLHash = false;
> bool nsContentUtils::sGettersDecodeURLHash = false;
> bool nsContentUtils::sPrivacyResistFingerprinting = false;
> bool nsContentUtils::sSendPerformanceTimingNotifications = false;
>+bool nsContentUtils::sIsSerializeAppendLF = false;
Maybe call this just sAppendLFInSerialization

>+  static bool IsSerializeAppendLF()
And this then 
static bool AppendLFInSerialization()
Attachment #8778845 - Flags: review?(bugs) → review+
Attachment #8777639 - Attachment description: Bug 838954 - Remove redundant trailing spaces. → Bug 838954 - Remove redundant trailing spaces. r=smaug
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fc593dd9a280
Remove redundant trailing spaces. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/b80d560e0102
Add a newline in .innerHTML serialization of <pre> when first character is a newline. r=smaug
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/fc593dd9a280
https://hg.mozilla.org/mozilla-central/rev/b80d560e0102
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
I don't know if this is the right place to write this but that has an impact on CKEditor (cf. http://dev.ckeditor.com/ticket/14814#ticket) and, indirectly, on sites that use it (e.g. MDN)
Commented in https://github.com/whatwg/html/issues/944
We may need to change the spec and backout the patch.
John, do you think you could back this out. If we try this again, we should do it when other browsers are ready for it too, so that not only FF is broken on some web sites.

(Or, we just back out the spec change and the patch here and mark this bug wontfix. Let's see how the spec bug goes)
Flags: needinfo?(jdai)
Flags: needinfo?(jdai)
Hi sheriff,
Could you help me back this patches out? Thank you.
Status: RESOLVED → REOPENED
Flags: needinfo?(wkocher)
Resolution: FIXED → ---
I had to backout the backouts because it caused web-platform linter failures: https://treeherder.mozilla.org/logviewer.html#?job_id=3619162&repo=mozilla-aurora

https://hg.mozilla.org/integration/mozilla-inbound/rev/0cd2fc866f12
https://hg.mozilla.org/releases/mozilla-aurora/rev/573d14896134

I'm not sure what the issue is. Maybe I didn't get the backout exactly right? 
John, could you post a patch that'll back out the patches correctly?
James, could you explain what that linter failure means?
Flags: needinfo?(jdai)
Flags: needinfo?(james)
Attachment #8793692 - Flags: review?(bugs) → review+
Attachment #8793692 - Attachment description: Bug 838954 - Backout newlines lost in .innerHTML serialization of <pre> when first character is a newline. → Bug 838954 - Backout newlines lost in .innerHTML serialization of <pre> when first character is a newline. r=smaug
Keywords: checkin-needed
Flags: needinfo?(james)
Keywords: leave-open
Target Milestone: mozilla51 → ---
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8224271b1f3
Backout newlines lost in .innerHTML serialization of <pre> when first character is a newline. r=smaug
Keywords: checkin-needed
Comment on attachment 8793692 [details] [diff] [review]
Bug 838954 - Backout newlines lost in .innerHTML serialization of <pre> when first character is a newline. r=smaug

Approval Request Comment
[Feature/regressing bug #]:bug 838954 introduced this bug.
[User impact if declined]: It only affects Nightly user.
[Describe test coverage new/current, TreeHerder]:The patch has an automated test in dom/base/test/test_bug744830.html.
[Risks and why]: Extremely low. This is a back-out patch.
[String/UUID change made/needed]: none.

Try result:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=029b8b7e37c7462cbee252704a70a5445455da8b&filter-tier=1
Attachment #8793692 - Flags: approval-mozilla-aurora?
Hi Sheriff,
This bug has been waiting aurora approval for a long time. Is there any concerns to hold up uplift to aurora? Or did I miss anything? Thank you.
Flags: needinfo?(wkocher)
Flags: needinfo?(cbook)
Hi John,

we sheriffs uplift patches but we don't approve them for aurora,beta etc - thats done by relman. Redirecting your needinfo to ritu and sylvestre from the relman team :)
Flags: needinfo?(wkocher)
Flags: needinfo?(sledru)
Flags: needinfo?(rkothari)
Flags: needinfo?(cbook)
(In reply to Carsten Book [:Tomcat] from comment #29)
> Hi John,
> 
> we sheriffs uplift patches but we don't approve them for aurora,beta etc -
> thats done by relman. Redirecting your needinfo to ritu and sylvestre from
> the relman team :)

Thanks for helping me redirect to correct people. :)
Comment on attachment 8793692 [details] [diff] [review]
Bug 838954 - Backout newlines lost in .innerHTML serialization of <pre> when first character is a newline. r=smaug

Patch back out from Aurora51 (perhaps we will be trying to fix it differently).

GChang (release owner of 51), just fyi.
Flags: needinfo?(rkothari) → needinfo?(gchang)
Attachment #8793692 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
has problems uplifting to aurora:

grafting 365875:b8224271b1f3 "Bug 838954 - Backout newlines lost in .innerHTML serialization of <pre> when first character is a newline. r=smaug"
merging dom/base/nsContentUtils.cpp
merging dom/base/nsContentUtils.h
merging modules/libpref/init/all.js
merging testing/web-platform/meta/MANIFEST.json
warning: conflicts while merging modules/libpref/init/all.js! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(jdai)
Flags: needinfo?(gchang)
Too late for firefox 52, mass-wontfix.
Dropping newlines is now as per the HTML Standard. Closing this.
Status: REOPENED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → INVALID
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.