Closed Bug 1505129 Opened 5 years ago Closed 3 years ago

js::PrintError should probably print its data as UTF-8

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla78
Tracking Status
firefox65 --- wontfix
firefox78 --- fixed

People

(Reporter: Waldo, Assigned: ptomato)

References

Details

Attachments

(1 file)

[jwalden@find-waldo-now src]$ cat ~/moz/inflight/tc-utf8/burrito-1F32F.js 
function f() {
  var x = `
🌯🌯🌯🌯🌯🌯`; 🌯; } f();
[jwalden@find-waldo-now src]$ dbg/js/src/js -f ~/moz/inflight/tc-utf8/burrito-1F32F.js 
/home/jwalden/moz/inflight/tc-utf8/burrito-1F32F.js:3:15 SyntaxError: illegal character:
/home/jwalden/moz/inflight/tc-utf8/burrito-1F32F.js:3:15 </</</</</</`; </; } f();
/home/jwalden/moz/inflight/tc-utf8/burrito-1F32F.js:3:15 ...............^

The reason for the </ goo is, unsurprisingly, that the UTF-16 code units are getting cast-truncated to char when individually fputc'd by PrintErrorLine, called by js::PrintError:

491	        for (size_t i = 0; i < n; i++) {
(rr) n
492	            fputc(static_cast<char>(linebuf[i]), file);
(rr) n
<491	        for (size_t i = 0; i < n; i++) {
(rr) n
492	            fputc(static_cast<char>(linebuf[i]), file);
(rr) n
/491	        for (size_t i = 0; i < n; i++) {

(note the '<' and '/' at the starts of the lines) with context being

(rr) bt 5
#0  PrintErrorLine (file=0x7f6920f25680 <_IO_2_1_stderr_>, prefix=0x7f69127c46a0 "/home/jwalden/moz/inflight/tc-utf8/burrito-1F32F.js:3:27 ", report=0x7f69126bc7e0) at /home/jwalden/moz/after/js/src/vm/JSContext.cpp:491
#1  0x00005574c5361c7e in PrintSingleError<JSErrorReport> (cx=0x7f6913618000, file=0x7f6920f25680 <_IO_2_1_stderr_>, toStringResult=..., report=0x7f69126bc7e0, kind=PrintErrorKind::Error) at /home/jwalden/moz/after/js/src/vm/JSContext.cpp:577
#2  0x00005574c53617f2 in js::PrintError (cx=0x7f6913618000, file=0x7f6920f25680 <_IO_2_1_stderr_>, toStringResult=..., report=0x7f69126bc7e0, reportWarnings=false) at /home/jwalden/moz/after/js/src/vm/JSContext.cpp:603
#3  0x00005574c4f1c5b1 in js::shell::AutoReportException::~AutoReportException (this=0x7ffd78921210) at /home/jwalden/moz/after/js/src/shell/js.cpp:9114
#4  0x00005574c4f22873 in Shell (cx=0x7f6913618000, op=0x7ffd78921418, envp=0x7ffd78921658) at /home/jwalden/moz/after/js/src/shell/js.cpp:10537
(More stack frames follow...)

We could just change this to print out UTF-8, but sadly at the moment js::PrintError doesn't print all data in wholly consistent encoding -- earlier in the line a file name is printed out, and at present file names are not consistently encoded as UTF-8.

So, this probably requires multiple steps of work beyond just changing the aforementioned loop to print UTF-16 code points, not truncated code units.
(In reply to Jeff Walden [:Waldo] from comment #0)
> [...] at present file names are not consistently encoded as UTF-8.

Marking as depends on bug 1492090.
Depends on: 1492090
(In reply to André Bargull [:anba] from comment #1)
> (In reply to Jeff Walden [:Waldo] from comment #0)
> > [...] at present file names are not consistently encoded as UTF-8.
> 
> Marking as depends on bug 1492090.

Huh, nice find!  Maybe someone should review that patchwork.
Priority: -- → P3
Fission Milestone: --- → ?

[Tracking Requested - why for this release]:

Release Note Request (optional, but appreciated)
[Why is this notable]:
[Affects Firefox for Android]:
[Suggested wording]:
[Links (documentation, blog post, etc)]:

Tracking and relnote flag nominations require a comment. Resetting.

Fission Milestone: ? → ---

We could just change this to print out UTF-8, but sadly at the moment js::PrintError doesn't print all data in wholly consistent encoding -- earlier in the line a file name is printed out, and at present file names are not consistently encoded as UTF-8.

So, this probably requires multiple steps of work beyond just changing the aforementioned loop to print UTF-16 code points, not truncated code units.

Given that the file name stuff is probably fixed by bug 1492090, which has patches waiting for review already, would it be reasonable to just fix the loop here?

This would be nicer if we changed the loop that prints tokenOffset()
number of '.' characters, to take into account the visual column width
of the code points that it's supposed to line up with. But I don't see
any readily available function for that.

Depends on D73519

Assignee: nobody → philip.chimento
Status: NEW → ASSIGNED
Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/faeed99f813e
Print code points in PrintErrorLine. r=jwalden

I've updated the patches from this bug and bug 1506323 to fix the build failures. Here's a retry of the jobs that failed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=880f1222a163c8cbd31c1ebf88a25603755990b4

Flags: needinfo?(philip.chimento)
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/96a3f8a62db6
Print code points in PrintErrorLine. r=jwalden
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78
You need to log in before you can comment on or make changes to this bug.