eval doesn't handle \u2028 (LINE SEPARATOR) and \u2029 the same as JSON.parse

RESOLVED FIXED in Firefox 3.6a1

Status

()

defect
--
major
RESOLVED FIXED
11 years ago
Last year

People

(Reporter: morac, Assigned: zeniko)

Tracking

({fixed1.9.1, regression})

3.5 Branch
Firefox 3.6a1
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090327 Minefield/3.6a1pre

As of Firefox 3.1b3, the SessionStore API is returning strings containing UTF-8 characters if the session data contains UTF-8 characters, instead of an ASCII string representation.  If this data is then passed back into one of SessionStore API functions an "ILLEGAL VALUE" exception is thrown because the nsISessionStore.idl defines all the parameters as AString and not AUTF8String.

In addition as the sessionstore.js file also now can contain UTF-8 characters.  If this occurs, the sessionstore.js file will fail to be restored at browser start up.

Basically any web site containing UTF-8 characters anywhere in the saved data (including the URL and title) will result in a session that cannot be loaded.

I believe this problem is a result of switching to native JSON (bug 407110) since the problem does not occur in Firefox 3.0.

In Firefox 3.0, an ASCII representation of a UTF-8 character (example "\u2028") is written into sessionstore.js.  In Firefox 3.1b3, the actual UTF-8 character is written into the file.

So as JSON encoding does not convert UTF-8 characters into nice ASCII strings, SessionStore needs to handle this.



Reproducible: Always

Steps to Reproduce:
1. Go to this URL: http://www.semanticuniverse.com/articles-ontology-based-mining-digital-text-%E2%80%A8internet-monitoring-relation-investor-relations.html
2. Type the following in the error console:

var ss = Components.classes["@mozilla.org/browser/sessionstore;1"].getService(Components.interfaces.nsISessionStore); ss.setBrowserState(ss.getBrowserState())

3. Set the browser to restore the tabs and windows on startup and then close and open the browser.
Actual Results:  

Because the web page title contains a \u2028 UTF-8 character, the following exception occurs:

Error: uncaught exception: [Exception... "'Illegal value' when calling method: [nsISessionStore::setBrowserState]"  nsresult: "0x80070057 (NS_ERROR_ILLEGAL_VALUE)"  location: "JS frame :: javascript:%20var%20ss%20=%20Components.classes["@mozilla.org/browser/sessionstore;1"].getService(Components.interfaces.nsISessionStore);%20ss.setBrowserState(ss.getBrowserState()) :: <TOP_LEVEL> :: line 1"  data: no]


When starting the browser a "SessionStartup: The session file is invalid: SyntaxError: unterminated string literal" message displays in the Error Console.


Expected Results:  

No exception should occur and the data should be passed to setBrowserState().

Web page should be restored on browser start up.



I'm curious as to why UTF-8 characters are even showing up in the data returned from the SessionStore functions since the return value is supposed to be an AString.
This means that we either have to special case this character (and possibly others) or always try to use JSON.parse first and only when it throws fall back to evalInSandbox for backwards compatibility.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
OS: Windows XP → All
Hardware: x86 → All
Summary: Sessions containing UTF-8 characters cannot be restored. → eval doesn't handle \u2028 (LINE SEPARATOR) the same as JSON.parse
Version: unspecified → 3.1 Branch
Flags: wanted-firefox3.5?
I looked into this more and it appears the problem here is that when a \u2028 is in a string, javascript just dumps the \u2028 as well as everything after it.  This results in an unterminated string.  So both JSON.parse and evalInSandbox will fail on it as can be seen from these two errors that show up when restarting Firefox while on a "bad" page:

SessionStartup: The session file is invalid: SyntaxError: unterminated string literal
SessionStore: The session file is invalid: SyntaxError: unterminated string literal

The reason I'm getting an "ILLEGAL VALUE" exception when calling the APIs is that an unterminated string is obviously not an AString.  In fact if I try to load one of the "bad" sessions, that I saved in Firefox 3.1b3 into Firefox 3.0, I get the same exceptions.

So simply trying to do a evalInSandbox instead of a JSON.parse won't fix this problem.  

Either the SessionStore should strip out these kind of characters from the data or the session data should be UTF8 encoded before written to disk and returned via the API calls.  An alternative would be to UTF8 encode it when reading from disk.  This should have less of a performance hit, but that won't help with the API calls just fails before SessionStore even runs if the data is "bad".


In the mean time, in my code I worked around the problem by doing a UTF8 encode before passing any session data to SessionStore.  That way the data is a valid AString.  I'm not sure how much of a performance hit this would cause though with large sessions.
In the above I meant to say a \u2028 character.  Obviously the text "\u2028" is okay.
(In reply to comment #2)
> I looked into this more and it appears the problem here is that when a \u2028
> is in a string, javascript just dumps the \u2028 as well as everything after
> it.

Only when eval'ing it, as \u2028 is treated the same as \u0010: a new line. JSON.parse doesn't (and IMO rightly so), at least in my tests.

> or the session data should be UTF8 encoded before written to disk

Note: All session data is already UTF-8 encoded when written to disk and decoded when read back (and always has been).

> In the mean time, in my code I worked around the problem by doing a UTF8 encode
> before passing any session data to SessionStore.  That way the data is a valid
> AString.

It'll be a different string, though. I'd recommend to rather just strip \u2028 and \u2029 from the string (which are the only two characters causing this issue).
Depends on: 387859
Summary: eval doesn't handle \u2028 (LINE SEPARATOR) the same as JSON.parse → eval doesn't handle \u2028 (LINE SEPARATOR) and \u2029 the same as JSON.parse
Posted patch wall-paper for Firefox 3.5 (obsolete) — Splinter Review
The correct fix for this is fixing bug 387859. Since that might be too late for Firefox 3.5 (as it would involve some of our API to become stricter in what it accepts), we'll likely have to go with this wall-paper patch instead which just makes sure that both offending characters aren't left as-is for evalInSandbox to trip over them.
Assignee: nobody → zeniko
Status: NEW → ASSIGNED
Attachment #369776 - Flags: review?(dietrich)
Comment on attachment 369776 [details] [diff] [review]
wall-paper for Firefox 3.5

r=me. a concern is that this work is usually not required. maybe could test() first, and only replace if it returns true?
Attachment #369776 - Flags: review?(dietrich) → review+
Posted patch for check-inSplinter Review
Attachment #369776 - Attachment is obsolete: true
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/0233d2bb8a07
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 3.6a1
Backed out; the push this was part of seems to be causing random leak orange on Mac.
Status: RESOLVED → REOPENED
Flags: in-testsuite+ → in-testsuite?
Resolution: FIXED → ---
Looks like the random leak orange is still happening, so this can reland as desired...
Actually, scratch that.  This orange is on Linux, whereas the orange from the push this landed as part of was on Mac.  I recommend landing this separately from the other 4 patches that landed with it.
The leak exists on 1.9.1, so this didn't cause it.
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/7b69f5a05b8d
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Flags: in-testsuite? → in-testsuite+
Keywords: checkin-needed
Resolution: --- → FIXED
Attachment #370447 - Flags: approval1.9.1?
Comment on attachment 370447 [details] [diff] [review]
for check-in

a191=beltzner
Attachment #370447 - Flags: approval1.9.1? → approval1.9.1+
Flags: wanted-firefox3.5?
Keywords: checkin-needed
This doesn't work anymore. If HTML contains that character - firefox does not show it for chars u2028 and u2029.

Steps:
1. Please open in Chrome and Firefox the same page
https://unicode-table.com/ru/2029/
Actual Result:
Firefox does not show that characters.
Expected result: 
Firefox shows that char as Chrome does.

Thanks.
You need to log in before you can comment on or make changes to this bug.