Closed Bug 1642081 Opened 5 years ago Closed 5 years ago

error: array subscript is of type 'char'

Categories

(Core :: General, defect)

76 Branch
defect

Tracking

()

RESOLVED INVALID

People

(Reporter: roland.illig, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36

Steps to reproduce:

build firefox 76.0.1 with GCC's compiler option -Werror=char-subscripts

Actual results:

The build failed with several error messages. The affected files are:

gfx/cairo/cairo/src/cairo-type1-subset.c
media/libtheora/lib/info.c
media/libvorbis/lib/vorbis_info.c
security/nss/lib/freebl/mpi/mpi.c

Expected results:

The build succeeds since all places where functions from <ctype.h> are called are safe and do not invoke undefined behavior, even if they are called with non-ASCII characters.

References: The C standard (any version), 7.4p1.

Hi,
I don't have the technical knowledge to confirm this issue.
I'll add product and component so the team can make some research on it.
Hopefully, someone with a more deep understanding of this matter can help. Feel free to route this ticket to the corresponding team.

Regards,
Jerónimo.

Component: Untriaged → General
Flags: needinfo?(overholt)
Product: Firefox → Core

I guess we'd ideally have 4 separate bugs for the various errors but breaking this into 4 is perhaps a bit much to ask :)

Nathan, do you have any suggestions as to where this bug should live? Build system?

Flags: needinfo?(overholt) → needinfo?(nfroyd)

In https://github.com/NetBSD/pkgsrc/commit/c07103795ee1322602887fd3af497b9028a6bc19, the work is already split into 4 pieces, so it wouldn't involve much work to split the issue. In the end, I think it is most efficient if one person just handles all 4 patches at the same time.

I don't know how you manage patches for third-party software. If it's simply to spread the 4 patch files to the appropriate subdirectories, that should be an easy task.

(In reply to Andrew Overholt [:overholt] from comment #3)

I guess we'd ideally have 4 separate bugs for the various errors but breaking this into 4 is perhaps a bit much to ask :)

Nathan, do you have any suggestions as to where this bug should live? Build system?

The patches would need to be split up across three separate components: Graphics, Media, and NSS.

It's not clear to me that this is a Firefox problem, though. The commit message suggests that the values are being used directly in some sort of macro, and looking at NetBSD's <ctype.h> and its additional <sys/ctype_inline.h>:

https://github.com/NetBSD/src/blob/trunk/include/ctype.h
https://github.com/NetBSD/src/blob/trunk/sys/sys/ctype_inline.h

that suggestion would appear to be correct. In which case the headers are wrong; the macros should be casting the argument to int so as to represent the argument promotion that the C standard would require if you were calling the out-of-line function. See e.g. the OpenBSD <ctype.h>, which accomplishes this via inline functions, or the glibc <ctype.h>:

https://github.com/openbsd/src/blob/master/include/ctype.h
https://sourceware.org/git/?p=glibc.git;a=blob;f=ctype/ctype.h;h=351495aa4feaf23993fe65afc0760615268d044e;hb=HEAD#l63 and the following lines

Flags: needinfo?(nfroyd)

I'm going to close this as not a Firefox bug.

Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID

The NetBSD headers are not wrong. They do exactly what is mandated by the C standard.

Casting to int is completely wrong, as detailed in https://stackoverflow.com/a/60696378.

The OpenBSD header is, well, interesting. In an ISO 8859-1 locale and with char having the same representation as signed char, it returns a wrong result for U+00FF when the function is called with ((signed char) 255). This is because that value is converted to -1, and isalpha wrongly thinks this would mean EOF.

The glibc header fixes the underlying problem by converting the undefined behavior to well-defined behavior. It has the same problem as the OpenBSD header though, in that it doesn't classify U+00FF as a letter.

Closing this issue because it is "not a Firefox bug" ignores the problem. Firefox packages these third-party libraries, therefore these form part of the Firefox code base.

How are the NetBSD headers not wrong? They are choosing to implement things that are clearly defined as functions that take an int in the C standard as macros, which completely leaks the implementation detail that they're using a subscript expression.
It seems to me that if anything should be doing an (unsigned char) cast, it's the macro themselves. Or they should be replaced with inline functions.

The NetBSD headers implement exactly what is required by the C standard in section 7.4.
Leaking the implementation detail of an array subscript is allowed, that's why the C standard explicitly restricts the valid arguments in 7.4p1.
Leaking this implmentation detail, combined with the GCC warning, is an effective way of detecting code that doesn't adhere to the C standard and thus invokes undefined behavior. That's something the other ctype.h implementations (OpenBSD, glibc) don't do.
The C standard says in 7.1.4p1: "Any function declared in a header may be additionally implemented as a function-like macro".
Implementing these functions as macros is a natural choice because they typically reduce to a few instructions only, avoiding the large cost of a function call, especially in tight loops during parsing.

(In reply to Roland Illig from comment #9)

Leaking this implmentation detail, combined with the GCC warning, is an effective way of detecting code that doesn't adhere to the C standard and thus invokes undefined behavior. That's something the other ctype.h implementations (OpenBSD, glibc) don't do.
The C standard says in 7.1.4p1: "Any function declared in a header may be additionally implemented as a function-like macro".

Yes, but it's also implied that the function-like macro has to act as if the function was still being called. Which implies the conversion to int.

I guess the counterargument is that if the program rigorously follows the spec, the conversion to int is pointless, so why bother?

I checked Apple's headers, musl, and MSVC's headers; MSVC will at least check for invalid input in debug mode (I don't know offhand whether char is signed or not on Windows), but Apple does some kind of masking thing. Even musl masks!

(In reply to Nathan Froyd [:froydnj] from comment #10)

(In reply to Roland Illig from comment #9)

The C standard says in 7.1.4p1: "Any function declared in a header may be additionally implemented as a function-like macro".

Yes, but it's also implied that the function-like macro has to act as if the function was still being called. Which implies the conversion to int.

The "which implies" may be wrong here. Since values outside "EOF || unsigned char" invoke undefined behavior, the macro only has to work for the valid argument values, which it does.

I checked Apple's headers, musl, and MSVC's headers; MSVC will at least check for invalid input in debug mode (I don't know offhand whether char is signed or not on Windows), but Apple does some kind of masking thing. Even musl masks!

https://git.musl-libc.org/cgit/musl/tree/include/ctype.h

musl does not use an array for implementing the ctype.h functions, so why bother.

https://docs.microsoft.com/de-de/cpp/c-runtime-library/is-isw-routines?view=vs-2019

The warning explicitly says "don't use char as argument type", and the Firefox code does exactly this. Therefore the code should be fixed.

You need to log in before you can comment on or make changes to this bug.