Closed Bug 1556381 Opened 5 years ago Closed 5 years ago

Textual diffs containing non-UTF-8 text cannot be represented on Phabricator

Categories

(Conduit :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: hsivonen, Unassigned)

References

Details

It appears that Lando silently replaces bytes that are invalid UTF-8 with the REPLACEMENT CHARACTER. This is bad when landing test cases that try to test non-UTF-8 encodings.

Here's a changeset landed with Lando:
https://hg.mozilla.org/integration/autoland/rev/f593045cc48f

Here's the same changeset without data loss landed by pushing to inbound directly:
https://hg.mozilla.org/integration/mozilla-inbound/rev/76dbcd8529c4

(Use e.g. curl to pipe to less to see the difference. The two pages look the same in Firefox.)

(In reply to Henri Sivonen (:hsivonen) from comment #0)

It appears that Lando silently replaces bytes that are invalid UTF-8 with the REPLACEMENT CHARACTER.

This isn't Lando / transplant's doing, the problem is with Phabricator and the tool used to upload the diff (in this case, phlay).

From https://secure.phabricator.com/book/phabricator/article/utf8/:

Phabricator stores all internal text data as UTF-8, processes all text data as UTF-8, outputs in UTF-8, and expects all inputs to be UTF-8. Principally, this means that you should write your source code in UTF-8. In most cases this does not require you to change anything, because ASCII text is a subset of UTF-8.

I believe we might be able to work around this by treating files that aren't valid UTF-8 as binary when uploading. The reviewing experience will be poor, requiring the reviewer to inspect the file outside the browser, but I think it should make it through Lando / Transplant properly.

Component: Transplant → General

I vaguely remember this coming up in the past. :zalun, do you know if moz-phab is doing the correct thing here?

Flags: needinfo?(pzalewa)
Summary: Lando silently changes non-UTF-8 text → Textual diffs containing non-UTF-8 text cannot be represented on Phabricator
See Also: → 1521980

I think this should be fixed for the beta version of phabsend-moz now, thanks for the reminder smacleod.
(@ probably fails, but differently to phlay, since it's still using the same createrawdiff call as upstream phabsend, which looks okay in the Phab UI but based on that automated failure probably can't be applied by lando)

I vaguely remember this coming up in the past. :zalun, do you know if moz-phab is doing the correct thing here?

moz-phab and arc will mark non-utf8 files as binary (arc with confirmation, moz-phab automatically):

Invalid Content Encoding (Non-UTF8)
This diff includes files which are not valid UTF-8 (they contain invalid byte
sequences). You can either stop this workflow and fix these files, or
continue. If you continue, these files will be marked as binary.
You can learn more about how Phabricator handles character encodings (and how
to configure encoding settings and detect and correct encoding problems) by
reading 'User Guide: UTF-8 and Character Encoding' in the Phabricator
documentation.
    AFFECTED FILES
    modules/freetype2/builds/amiga/README
    modules/freetype2/builds/amiga/include/config/ftconfig.h
    modules/freetype2/builds/amiga/include/config/ftmodule.h
    modules/freetype2/builds/amiga/makefile
    modules/freetype2/builds/amiga/makefile.os4
    modules/freetype2/builds/amiga/smakefile
    modules/freetype2/builds/amiga/src/base/ftdebug.c
    modules/freetype2/builds/amiga/src/base/ftsystem.c
    Do you want to mark these files as binary and continue? Yes
Uploaded binary data for "README".
Uploaded binary data for "README".
Uploaded binary data for "ftconfig.h".
Uploaded binary data for "ftconfig.h".
Uploaded binary data for "ftmodule.h".
Uploaded binary data for "ftmodule.h".
Uploaded binary data for "makefile".
Uploaded binary data for "makefile".
Uploaded binary data for "makefile.os4".
Uploaded binary data for "makefile.os4".
Uploaded binary data for "smakefile".
Uploaded binary data for "smakefile".
Uploaded binary data for "ftdebug.c".
Uploaded binary data for "ftdebug.c".
Uploaded binary data for "ftsystem.c".
Uploaded binary data for "ftsystem.c".
Upload complete.

the files in the above case are latin1 in-tree.

steven - i think this is a WONTFIX: phabricator is UTF8 only, and the official clients do the right thing.

Flags: needinfo?(pzalewa) → needinfo?(smacleod)

(In reply to Byron Jones ‹:glob› 🎈 from comment #4)

steven - i think this is a WONTFIX: phabricator is UTF8 only, and the official clients do the right thing.

Yup.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(smacleod)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.