Closed Bug 70631 (opened 23 years ago, closed 23 years ago)

ANSI C preprocessor gives trigraph warning for Mac '????'

Categories

(SeaMonkey :: Build Config, defect, P3)

Tracking

(Not tracked)

VERIFIED WONTFIX
mozilla0.8.1

People

(Reporter: dr, Assigned: dr)

Attachments

(1 file)

ANSI C has the concept of a trigraph, which is a three-character code for a
different character which may not be present on some keyboards or systems. The
following are ANSI C trigraphs:

trigraph | char
---------+-----
   ??=   |  #
   ??-   |  ~
   ??(   |  [
   ??)   |  ]
   ??<   |  {
   ??>   |  }
   ??'   |  ^
   ??!   |  |
   ??/   |  \

In modern C and C++ use, these are only ever used to obfuscate code.
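
To make the table concrete, here is a minimal sketch of the replacement a trigraph-aware preprocessor performs on source text. The function names are made up for illustration and are not from any compiler; the scan simply goes left to right, which is enough to reproduce the '????' case described in this bug.

```c
#include <stddef.h>

/* Map the third character of a "??x" sequence to its replacement,
 * or return 0 when "??x" is not one of the nine trigraphs.
 * (Hypothetical helper, named for this sketch only.) */
static char trigraph_replacement(char third)
{
    switch (third) {
    case '=':  return '#';
    case '-':  return '~';
    case '(':  return '[';
    case ')':  return ']';
    case '<':  return '{';
    case '>':  return '}';
    case '\'': return '^';
    case '!':  return '|';
    case '/':  return '\\';
    default:   return 0;
    }
}

/* Copy src to dst, replacing trigraphs from left to right.
 * dst must have room for strlen(src) + 1 bytes. */
static void replace_trigraphs(const char *src, char *dst)
{
    size_t i = 0, j = 0;
    while (src[i] != '\0') {
        char r = 0;
        /* Only look at the third character once two question marks match. */
        if (src[i] == '?' && src[i + 1] == '?')
            r = trigraph_replacement(src[i + 2]);
        if (r != 0) {
            dst[j++] = r;   /* three input characters become one */
            i += 3;
        } else {
            dst[j++] = src[i++];
        }
    }
    dst[j] = '\0';
}
```

Fed the six characters of the Mac literal (quote, four question marks, quote), this yields a quote, two question marks and a caret, with the closing quote consumed by the trigraph.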

The mac has a concept of a four-character code to represent file types/creators.
It is defined as:

  typedef unsigned long                   FourCharCode;
  typedef FourCharCode                    OSType;

One such four-character code is '????', which should be recognized by the mac
compiler at compile-time, after pre-processing. However, the C preprocessor
recognizes the last three characters, ??', as a trigraph representing the
character ^.

This incorrect preprocessing does not result in any errors on other platforms,
since usage of '????' only appears in XP_MAC code (in other words, the
preprocessor tries to turn '????' into '??^ and then deletes that entire block
of code), but the preprocessor does notice that trigraph, which causes a warning:

  dr@midget ~/src/trunk/mozilla/mailnews/mime/src -> make mimemoz2.o
  c++ -o mimemoz2.o [blah blah blah] mimemoz2.cpp
  mimemoz2.cpp:1016:27: warning: trigraph ??' ignored

There are several ways to fix this. One is with macros -- we already have a
FOUR_CHAR_CODE macro for safety, but it doesn't help this particular problem, so
we'd need to come up with a better one. Another is by telling compilers to
ignore trigraphs (gcc -Wno-trigraphs). What should be done?
Relevant comments from 70386:

------- Additional Comments From Simon Fraser 2001-03-01 12:08 -------

Mac uses four character codes for file types/creators, using the Mac type 
'OSType', which is defined:

typedef unsigned long                   FourCharCode;
typedef FourCharCode                    OSType;

and the Mac compilers allow you to fill one of these using a 4-char literal 
enclosed in single quotes:
OSType foo = 'TEXT';
or
OSType bar = '????';

For cross-platform compatibility, these values should really be assigned using a
macro that works on other platforms (taking endianness into account). The Mac
headers have a macro for this:

Mac:
#define FOUR_CHAR_CODE(x) (x) 

Windows:
#define FOUR_CHAR_CODE(x) (((unsigned long) ((x) & 0x000000FF)) << 24) \
                        | (((unsigned long) ((x) & 0x0000FF00)) << 8) \
                        | (((unsigned long) ((x) & 0x00FF0000)) >> 8) \
                        | (((unsigned long) ((x) & 0xFF000000)) >> 24)

The code should then read
PRUint32 foo = FOUR_CHAR_CODE('????');


------- Additional Comments From Dan Rosen 2001-03-01 12:31 -------

kick ass. but will those macros still prevent the trigraph-ish misinterpretation
of the ??'s?


------- Additional Comments From Simon Fraser 2001-03-01 13:05 -------

I still don't understand how the linux compiler sees the line containing '????'.


------- Additional Comments From Mike Shaver 2001-03-01 13:28 -------

How does that FOUR_CHAR_CODE macro help? It's going to expand to

(((unsigned long) (('????') & 0x000000FF)) << 24) ...

and you're back to square one.

#define FOUR_CHAR_CODE(a, b, c, d) ((a) << 24 | (b) << 16 | (c) << 8 | (d))

FOUR_CHAR_CODE('?', '?', '?', '?')
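
A slightly more defensive sketch of the per-character macro above, in case it helps. MOZ_FOUR_CHAR_CODE and the two constant names are hypothetical, and the byte order (first character in the most significant byte) is an assumption taken from the shifts in the macro above. The casts are only there so that characters above 0x7F don't get shifted as negative ints.

```c
/* Hypothetical name. Packs four single-character constants into one
 * value, first character in the most significant byte. Each argument
 * is written separately, so the source never contains two adjacent
 * question marks and no trigraph can form. */
#define MOZ_FOUR_CHAR_CODE(a, b, c, d)                 \
    ((((unsigned long)(unsigned char)(a)) << 24) |     \
     (((unsigned long)(unsigned char)(b)) << 16) |     \
     (((unsigned long)(unsigned char)(c)) << 8)  |     \
      ((unsigned long)(unsigned char)(d)))

/* Illustrative constants; the names are not from the Mozilla tree. */
static const unsigned long kUnknownType = MOZ_FOUR_CHAR_CODE('?', '?', '?', '?');
static const unsigned long kTextType    = MOZ_FOUR_CHAR_CODE('T', 'E', 'X', 'T');
```

Under those assumptions kUnknownType comes out as 0x3F3F3F3F and kTextType as 0x54455854.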


------- Additional Comments From Dan Rosen 2001-03-01 13:59 -------

Simon: the compiler doesn't see it, the preprocessor does. The #ifdef XP_MAC
block gets dealt with by the preprocessor at the same time that the trigraph
does -- there's no precedence or ordering as far as I can tell.

Shaver: exactly. We should really have them not in quotes, and have the macros
be something like:

Mac:
#define FOUR_CHAR_CODE(a, b, c, d) 'a##b##c##d'
Other:
#define FOUR_CHAR_CODE(a, b, c, d) ((a) << 24 | (b) << 16 | (c) << 8 | (d))

and call FOUR_CHAR_CODE(?, ?, ?, ?), but that's still unwieldy, since you don't
really want to have to call FOUR_CHAR_CODE(T, E, X, T)... Perhaps since only mac
uses it, you could say:

Mac:
#define FOUR_CHAR_CODE(x) 'x'
Other:
#define FOUR_CHAR_CODE(x) #error blah blah blah

or something...
Blocks: 70386
Status: NEW → ASSIGNED
The offending line in particular is:

  http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimemoz2.cpp#1016

for mimemoz2.cpp's CVS rev 1.135.
On irc, sfraser suggests using 0x3F3F3F3F in place of '????', scc suggests
setting a const:

  const OSType nsUnknownOSType = 0x3F3F3F3F;

I still think it'd probably be more mac-developer-friendly to have some macro
wrapping the text ????, but I can't think of a way to do that that wouldn't
introduce more of these trigraph warnings (since you can't say FOO(????) without
the last three characters being misinterpreted as the trigraph for ']').

The places this occurs in the code, by the way, are:

  mailnews/import/eudora/src/nsEudoraMac.cpp (2 instances)
  mailnews/import/src/ImportOutFile.cpp (1 instance)
  xpinstall/packager/mac/ASEncoder/src/nsAppleSingleEncoder.cpp (1 instance)
  xpinstall/src/nsAppleSingleDecoder.cpp (1 instance)

I still think it's a valid option, though, to just say "-Wno-trigraphs" given
that we know compilers are inconsistent on this matter, that mac developers
really just want to write '????', and that we can expect that nobody is
obfuscating mozilla code with trigraphs.
Summary: ANSI C preprocessor misinterprets Mac code '????' as trigraph → ANSI C preprocessor gives trigraph warning for Mac '????'
------- Additional Comments From ducarroz@netscape.com 2001-03-01 19:22 -------
You can replace '????' with '\077\077\077\077'. Would that fix your problem?
-
it might fix the problem but it also obfuscates the code. if we did that we'd 
need to use // '????' or something to explain what we're doing, but in that 
process we'd trigger the bug.

Actually, since "????" doesn't contain a trigraph, we can do:
  const OSType nsUnknownOSType = 0x3F3F3F3F; // MacOS unknown type "????"
anyone searching w/ lxr for ???? will find it. still not ideal, but it might be 
the best we can do.
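
For anyone wondering where 0x3F3F3F3F comes from: 0x3F is the ASCII code for '?', repeated four times. A rough sketch of the const plus a decoder, assuming the first character sits in the most significant byte. The typedefs are the ones quoted earlier in this bug; four_char_code_to_string is a made-up helper, not an existing API.

```c
typedef unsigned long FourCharCode;
typedef FourCharCode  OSType;

/* The const suggested above: four bytes of 0x3F, the ASCII question mark. */
static const OSType nsUnknownOSType = 0x3F3F3F3FUL;

/* Hypothetical debugging helper: unpack a four-character code into a
 * NUL-terminated string, most significant byte first. */
static void four_char_code_to_string(OSType code, char out[5])
{
    out[0] = (char)((code >> 24) & 0xFF);
    out[1] = (char)((code >> 16) & 0xFF);
    out[2] = (char)((code >> 8) & 0xFF);
    out[3] = (char)(code & 0xFF);
    out[4] = '\0';
}
```

Decoding nsUnknownOSType this way gives back the four question marks, and 0x54455854 gives back TEXT.
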
You can't search in lxr for "????" (it has no index-able characters).

I still maintain that we want to be as mac-developer-friendly as possible. It
will be easy to replace all current instances of '????' with a const, but I
don't think anybody doing new development will know about this const.

I also don't believe there's any good reason to be confusing moz code with
trigraphs... So I'm very, very tempted just to turn trigraphs off.
scc says: "trigraphs are standard, mac's usage of '????' is not. we should make
everybody use a const and turn on trigraphs on the mac to force mac developers
to use that const."

i'm compelled to agree.

scc: i'll produce a patch if you can flip that switch...
->0.8.1 (can and will fix by 3/14)
Priority: -- → P3
Target Milestone: --- → mozilla0.8.1
I just noticed that there are quite a few instances of unintended trigraphs in
our code. They seem to all be in mailnews, for some reason or another... Besides
the mac '????', there also are instances of things that look like:

  printf("myFunction(%s, ??)", foo);

but ??) is a trigraph meaning ].

I'm tempted to go through the whole mozilla source looking for all unintended
trigraphs, but I fear this would be an enormous waste of five perfectly good
minutes of my life :)
Yes  ;)  Why not just deal with the few warnings, and go and fix something that 
impacts users?
heresy!
So a couple things I'm not sure about:

- Do trigraphs in the XPIDL files (which aren't, to my knowledge, anything
special to idl) end up in generated c++ code?

- Do some of these trigraphs matter? I mean, there are things like (???) in
comments, just indicating "not sure" or something, and yes, they might get
processed to become (?], but they disappear anyway.

- What to do about all the mailnews printf("blah(??)") trigraphs? Is there a way
to escape a question mark to indicate literals?

Also, if we're going to turn on trigraph processing on the Mac, should we turn
it on for all platforms? I believe the warnings we're getting on Linux are just
warnings, and the trigraphs are not actually being processed...
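
On the escaping question: C and C++ do have an escape for a literal question mark, \?, so a string can be written without two adjacent question marks in the source. A small sketch using the printf format from above (the constant names are made up for illustration):

```c
/* "\?" is a standard escape for a literal question mark. Escaping the
 * second one means the source text never contains two question marks
 * in a row, so no trigraph can form before the closing parenthesis. */
static const char kFormat[] = "myFunction(%s, ?\?)";

/* Octal 077 is the ASCII code for the question mark, which is what the
 * '\077\077\077\077' suggestion earlier in this bug relies on. */
static const char kQuestionOctal = '\077';
```

The resulting string is byte-for-byte what the unescaped version was meant to be, and the warning goes away without touching compiler flags.
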
Okay, talked to scc. We have a better idea: turn off the stupid trigraph
warnings and leave everybody alone. The gcc option -Wall turns on all warnings,
and -Wno-trigraphs should turn trigraph warnings back off. They are not
currently processed on any platform and should never be.

Doing autoconf-fu...
AARGH! http://gcc.gnu.org/ml/gcc/2000-06/msg00464.html
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
verified wontfix.
Status: RESOLVED → VERIFIED
Product: Browser → Seamonkey