Closed Bug 844509 Opened 12 years ago Closed 5 years ago

Handle Unicode in config.status


(Firefox Build System :: General, defect)

Not set


(firefox70 fixed)

Tracking Status
firefox70 --- fixed


(Reporter: gps, Assigned: glandium)




(4 files)

As described in bug 784841 comment #265 and later, our Unicode story for config.status is lacking. Notably, Python 3 compatibility is going to be very challenging since we have a lot of code that works in Python 3 mode where strings are Unicode and a lot of (mostly older) code that uses str/bytes.

We should decide whether we want config.status dealing with bytes or Unicode then go from there. There is also a hybrid solution where we could add/modify autoconf macros to state the value is explicitly bytes or Unicode. From there, we could normalize to UTF-8 or something.

If we decide config.status is to be bytes only, we should change config.status.m4 to emit b'' strings so Python 3 works as expected.

This is kind of a wavy hands bug. I don't expect movement on it until we get serious about Python 3 compatibility.
I would love to transition configure to a Python script at some point, which would let us be more explicit about things. We could probably do it piecemeal by replacing just some autoconf macros with Python invocations.
I actually started working on a configure replacement in python, and I'm already thinking about the possibility of an incremental transition.

I think we're likely to switch away from autoconf before we ever switch to python 3, so maybe WONTFIX?
Product: Core → Firefox Build System

Didn't know this bug existed, but the problem is getting fixed as part of bug 1473498.

(In reply to Mike Hommey [:glandium] from comment #2)

I think we're likely to switch away from autoconf before we ever switch to
python 3, so maybe WONTFIX?


Closed: 5 years ago
Resolution: --- → DUPLICATE

Actually, this is not a dupe. This was filed before python configure, and was about config.status before it was python. Which it is not anymore.


Re-actually, there's a reference to this bug in, so let's make this about killing the manual decoding in there.

Assignee: nobody → mh+mozilla
Depends on: 1575135
Resolution: INVALID → ---

The configure sandbox has wrapped subprocess methods to do it since
bug 1520394.

OTOH, and while we're here, none was actually using the right encoding
for this conversion, so fix the configure sandbox to use the right one,
and make it stop using encode(), which does deep recursion that is not
necessary here, and that I'm trying to remove entirely.

Also while here, remove unused import of encode().

Attachment #9086628 - Attachment is obsolete: true
Attachment #9086628 - Attachment is obsolete: false
Attachment #9086628 - Attachment description: Bug 1575135 - Don't encode environment in get_cmd_output and old_configure. → Bug 844509 - Don't encode environment in get_cmd_output and old_configure.

With bug 1575135, we ensure nothing gets out of the configure sandbox
as a bytes string. We can thus now avoid the encode() pass in We still need, however, to normalize the configuration
so that it doesn't contain unexpected types, and conformning to what
indented_repr does to the configuration in config.status.

While here, convert some obj.iteritems() to six.iteritems(obj).
And remove the now unused encode function.

Now that the configuration comes in without bytes strings, there is no
need to convert it anymore.

Now that there is no bytes strings in it, we don't need to store
config.status in the system encoding to keep those valid. Moreover, the
system encoding is lossy, which utf-8 is not.

Blocks: 1575375
Pushed by
Don't encode environment in get_cmd_output and old_configure. r=nalexander
Don't encode the configuration that configure passes to config_status. r=nalexander
Don't create a separate unicode version of the build config. r=nalexander
Always encode config.status as utf-8. r=nalexander
Regressions: 1575774
Regressions: 1575959
You need to log in before you can comment on or make changes to this bug.