Closed Bug 56777 Opened 24 years ago Closed 23 years ago

[tracking bug] char vs. unsigned char bugs (ctype, stdio, char==unsigned)

Categories

(SeaMonkey :: General, defect, P3)

All
Solaris
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: h.b.furuseth, Assigned: h.b.furuseth)

Details

Attachments

(4 files)

Some systematic bugs: - the argument to the ctype.h functions is usually char, but should be in the range of unsigned char or EOF. Negative arguments other than EOF yield garbage. - getc() returns EOF or unsigned char, not char. - Don't compare an int with an 'unsigned char' value to a 'char' or vice versa! I enclose some quick patches. Trivial but not well tested. Each patch should only affect bad use of negative char values; they'll not affect unsigned-char hosts. Each file can be patched independently, though I combined patches to several source file in one patch file since I didn't quite know how to group them. Patched files in first patch: config/gtscc.c config/makedep.cpp config/mantomak.c config/mkdepend/cppsetup.c config/mkdepend/ifparser.c config/mkdepend/parse.c db/mork/src/morkParser.cpp dbm/src/mktemp.c extensions/cookie/nsCookie.cpp extensions/psm-glue/src/nsCrypto.cpp extensions/wallet/src/wallet.cpp extensions/xmlterm/linetest/lterm.c gc/boehm/cord/de.c gc/boehm/typeinfo.cpp htmlparser/src/nsHTMLTokens.cpp htmlparser/tests/htmlgen/htmlgen.cpp htmlparser/tests/outsinks/Convert.cpp intl/locale/src/nsLocaleFactory.cpp intl/locale/src/nsLocaleService.cpp jpeg/cdjpeg.c js/src/jsapi.c js/src/jscntxt.c js/src/jsdate.c js/src/jsfile.c layout/base/src/nsFrameUtil.cpp lib/mac/NSStdLib/src/nsEnvironment.cpp modules/libjar/nsWildCard.cpp modules/libreg/tests/interp.c netwerk/protocol/http/src/nsHTTPChannel.cpp netwerk/streamconv/converters/nsHTTPChunkConv.cpp nsprpub/pr/src/io/prscanf.c nsprpub/pr/src/misc/prinit.c nsprpub/pr/src/misc/prtime.c rdf/opendir/gs.c security/psm/lib/client/cmtutils.c security/psm/lib/client/sample/sample.c webshell/tests/viewer/nsBrowserWindow.cpp widget/src/gtk/nsFontRetrieverService.cpp widget/src/photon/nsFontRetrieverService.cpp xpcom/io/nsSpecialSystemDirectory.cpp xpcom/typelib/xpidl/xpidl_header.c xpcom/typelib/xpidl/xpidl_java.c xpfe/appfilelocprovider/src/nsAppFileLocationProvider.cpp xpinstall/wizard/unix/src2/nsFTPConn.cpp Patched files in MailNews patch: mailnews/addrbook/src/nsDirPrefs.cpp mailnews/base/search/tests/filterTest.cpp mailnews/base/util/nsMsgI18N.cpp mailnews/compose/src/nsMsgSend.cpp mailnews/compose/tests/smtp/smtpTest.cpp mailnews/imap/src/nsImapSearchResults.cpp mailnews/imap/src/nsImapServerResponseParser.cpp mailnews/imap/tests/harness/imapProtocolTest.cpp mailnews/import/text/src/nsTextAddress.cpp mailnews/local/tests/mailbox/mailboxTest.cpp mailnews/mime/cthandlers/vcard/nsVCard.cpp mailnews/news/src/nsNNTPHost.cpp Patched files in OS2 patch: config/os2/dirent.c xpcom/io/nsFileSpecOS2.cpp xpfe/bootstrap/nsNativeAppSupportOS2.cpp Patched files in Windows patch: gc/boehm/cord/de_win.c nsprpub/pr/src/md/windows/ntio.c nsprpub/pr/src/md/windows/w95io.c profile/Acct/dialshr.cpp xpfe/bootstrap/nsNativeAppSupportWin.cpp xpinstall/wizard/windows/ds32/ds32.cpp xpinstall/wizard/windows/nsinstall/nsinstall.cpp xpinstall/wizard/windows/nsztool/nsztool.c xpinstall/wizard/windows/setup/extra.c xpinstall/wizard/windows/test/testxpi.c xpinstall/wizard/windows/uninstall/extra.c xpinstall/wizard/windows/uninstall/parser.c
Attached patch Patch to OS2Splinter Review
Attached patch Patch to WindowsSplinter Review
hmmm, cc: brendan - are these patches valid (at least the basic idea behind them)? If yes, I will collect the file owners and cc em here.
Most of these patches are unnecessary and undesirable, because <ctype.h>, the Fine Man Page, and all relevant POSIS, etc., standards say that the isxxx() macros/functions take an int, not an unsigned type. But htmlparser fix looks like a good patch, please open a specific bug on that one. When you update this bug with the htmlparser bug's number, I'll resolve this bug as INVALID. Sorry you went to the trouble to hack these menial casts into the code, it would have been better to air this issue in a newsgroup before doing the work. /be
s/POSIS/POSIX/ /be
Wrong. Sure, the argument type should be int, but integer promotions take care of that. The _range_ of that int, however, is EOF + the range of 'unsignded char'. When 'char' is signed, non-ASCII 'char' values are negative, which generally yields garbage when passed to the ctype.h functions. Worse, sometimes you do the equivalent of char *s = something; int ch = getc(f) or toupper(x) or whatever; if (ch == *s) ... which fails if *s < 0 (i.e. a non-ASCII character).
Of course you can't use char for non-ASCII characters with <ctype.h> macros, and Mozilla does not. Mozilla uses not-quite-UCS2 in PRUnichar vectors and scalars for localizable strings nad characters. The examples you patched do not deal with ISO-Latin-1 characters that aren't in ASCII. If you know of any way to get non-ASCII data into those char variables, please file separate bugs. /be
The only exception granted for char in Mozilla is for utf-8, and such strings are not processed character-wise by <ctype.h> macros. What's more, I've worked on Unix systems in past lives, and the <ctype.h> macros in modern Unixes are "safe" against negative values (they add one to the int arg and cast to unsigned before indexing into a larger array, or they use a biased pointer to index into the middle of a big-enough array of flags). I give you RedHat6.1's <ctype.h>, relevant fragment: /* These are defined in ctype-info.c. The declarations here must match those in localeinfo.h. These point into arrays of 384, so they can be indexed by any `unsigned char' value [0,255]; by EOF (-1); or by any `signed char' value [-128,-1). ISO C requires that the ctype functions work for `unsigned char' values and for EOF; we also support negative `signed char' values for broken old programs. The case conversion arrays are of `int's rather than `unsigned char's because tolower (EOF) must be EOF, which doesn't fit into an `unsigned char'. But today more important is that the arrays are also used for multi-byte character sets. */ extern __const unsigned short int *__ctype_b; /* Characteristics. */ extern __const __int32_t *__ctype_tolower; /* Case conversions. */ extern __const __int32_t *__ctype_toupper; /* Case conversions. */ What Unix(en) are you running that do not defend? Please cite vendor and version if you can. /be
Via e-mail, Hallvard informed me that Solaris, at least, misbehaves when given ISO-Latin-1 characters in (signed) chars. Doron, this bug should become a tracking bug for newly-filed dependency bugs that are assigned to individual owners of files and modules touched by the patch. I'm settin OS to Solaris, too, as All is over-broad. /be
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: All → Solaris
sure. Are you gonna file the seperate bugs or should I do that?
Summary: char vs. unsigned char bugs (ctype, stdio, char==unsigned) → [tracking bug] char vs. unsigned char bugs (ctype, stdio, char==unsigned)
Hallvard has kindly offered to work with you on getting the bugs filed. I was going to do help otherwise, but I'll weasel out if I can -- anyway Hallvard is better for the job, since he has looked deeply into this issue (to which I was busily turning a blind eye to work on other bugs!). /be
Since this has morphed into a tracking bug I'm assigning it to Hallvard B Furuseth who will be filing dependency bugs.
Assignee: asa → h.b.furuseth
Sorry to be late; getting back to this Real Soon Now (or maybe next weekend)...
Status: NEW → ASSIGNED
Hallvard B Furuseth: Is there any progress on this bug ? If you don't want to fix it, please close or reassign it Thanks Matthias
marking worksforme - no comment in 1 1/2 years
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: