Closed Bug 153562 Opened 23 years ago Closed 23 years ago

crash trashes user directory, requires new profile

Categories

(Core :: Internationalization, defect)

Sun
SunOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: ron, Assigned: masaki.katakai)

References

Details

(Keywords: dataloss, intl)

Attachments

(3 files, 3 obsolete files)

From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.1a) Gecko/20020614 BuildID: 2002061401 When Mozilla-1.1a crashes, which it unfortunately does, it trashes the user's directory, so that restart brings up the profile manager, which protests that the directory of the user's profile information is inaccessible. The only way to restart Mozilla-1.1a is to create a new profile. This is a not-ready-for-prime-time bug. Reproducible: Always Steps to Reproduce: 1.Wait for Mozilla-1.1a to crash 2.Try to restart it 3. Actual Results: Prifile manager says it cannot find the directory for user's profile. The user has to either create a new profile or restore the .mozilla file from backups. Expected Results: Restart without protest or hassle.
-> Profile Manager Backend
Assignee: Matti99 → ccarlen
Component: Browser-General → Profile Manager BackEnd
QA Contact: imajes-qa → ktrina
Keywords: dataloss
This one is nasty, and only (re?)surfaced recently. To reproduce: 1. run mozilla 2. pkill -9 mozilla 3. mozilla cannot find the preferences dir. The reason it cannot find it: after a crash, it looks for .mozilla/paul/garbase.sl instead of the .slt. Somebody suggested to symlink the .sl to .slt, but after the next crash, Mozilla further truncated the filename to end in .s and once again couldn't find my profile. Apparantly a character gets 'eaten' when a crash happens. Restoring your profile: In the profile manager, remove the profile in question, then recreate it with the same name. Because the .slt-directory doesn't get erased, all your profile data will still be there and you can use Mozilla again until the next crash.
I'm observing the same problem on Solaris 8 (sparc & x86), with recent mozilla versions compiled from the 1.0 branch. I'm using an ISO8859-1 locale, either % locale LANG=en_US.ISO8859-1 LC_CTYPE=en_US.ISO8859-1 LC_NUMERIC=en_US.ISO8859-1 LC_TIME=en_US.ISO8859-1 LC_COLLATE=en_US.ISO8859-1 LC_MONETARY=en_US.ISO8859-1 LC_MESSAGES=C LC_ALL= or % locale LANG= LC_CTYPE=de_DE.ISO8859-1 LC_NUMERIC=de_DE.ISO8859-1 LC_TIME=de_DE.ISO8859-1 LC_COLLATE=de_DE.ISO8859-1 LC_MONETARY=de_DE.ISO8859-1 LC_MESSAGES=de LC_ALL= I've tried to debug this problem, and came to the conclusion that the problem is probably caused by the checkin for bug 147333. With the checkin for 147333, iconv(3C) is used to translate filenames in the local "codeset" to/from unicode encoded filenames (xpcom/io/nsNativeCharsetUtils.cpp; used by xpcom/io/nsLocalFileUnix.cpp). Apparently the user's profile directory is the first filename that is run through the nsNativeCharsetConverter::NativeToUnicode() conersion method. Now the problem seems to be, that the solaris iconv converter for translating ISO8859-1 -> UCS-2 inserts a special unicode character '\ufeff' in front of the first translated UCS-2 word in the output stream. (Is this magic unicode character intended to enable detection of the byte order for the UCS-2 stream??). This extra unicode character in the converted output (for the first native -> ucs-2 conversion) confuses the NS_CopyNativeToUnicode() subroutine in xpcom/io/nsNativeCharsetUtils.cpp. This subroutine assumes that the number of unicode characters produced in the unicode output is not greater than the number of native input characters -> not the entire input buffer is translated into unicode characters. In a mozilla debug build this problem shows in a failed assertion (nsNativeCharsetUtils.cpp, line 701): % /tmp/mozilla/mozilla ... ProfileStruct::InternalizeLocation: /files/jk/.mozilla/default/9cyi1lsp.slt ###!!! ASSERTION: did not consume entire input buffer: 'bufLeft == 0', file nsNativeCharsetUtils.cpp, line 701 ###!!! Break: at file nsNativeCharsetUtils.cpp, line 701 ###!!! ASSERTION: data in U+0100-U+FFFF will be lost: '(*first < 256)', file bufferRoutines.h, line 317 ###!!! Break: at file bufferRoutines.h, line 317 ProfileStruct::ExternalizeLocation, 1: ./files/jk/.mozilla/default/9cyi1lsp.sl ... ( I've added the "ProfileStruct::{In,Ex}ternalizeLocation ..." printfs to my sources, the first InternalizeLocation still shows the correct profile path, but the second ExternalizeLocation has mangled the profile path)
Depends on: 147333
The attached iconv.c source demonstrates the problem with iconv(3C) convertin from ISO8859-1 to UCS-2. The ISO8859-1 String "Hello" is converted to UCS-2 twice. Note that the first conversion produces 6 unicode characters output, the second conversion produces 5: % iconv converted 5 bytes input to 12 bytes output ff fe 48 00 65 00 6c 00 6c 00 6f 00 converted 5 bytes input to 10 bytes output 48 00 65 00 6c 00 6c 00 6f 00
This is a first attempt to fix the problem. I've just increased the maxium output buffer size for the UCS-2 string the NS_CopyNativeToUnicode function, to make space for the extra byte order mark (BOM 'uFEFF') character. While this fixes the trashed user profile directory problem on solaris 8, I'm not 100% sure I like this patch. Probably the BOM character should not appear at all in the converted Native->UCS-2 output string...
I'm also seeing this many times but I don't think this is causes by bug 147333... Actually the patch could fix this problem. I've verified on Solaris. Shall we move this to intl and try to put this to 1.0.1 branch? What do you think, Roland? This problem is very serious for Solaris users.
Marking NEW for patch loving.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Since this is a charset conversion problem, changing component to intl.
Assignee: ccarlen → yokoyama
Component: Profile Manager BackEnd → Internationalization
QA Contact: ktrina → ruixu
katakai-san: can you own this bug?
*** Bug 154054 has been marked as a duplicate of this bug. ***
Another possible way to fix the problem: Perform a dummy conversion, after the iconv converters are opened and before they are used; intended to soak up the BOM. Has the advantage that the BOMs do not make it into the Native->Unicode result string, which might be a problem with my first attempt to fix this problem (with my first patch, the BOMs are part of the Unicode result string). Idea from comment #206 of bug 147333 http://bugzilla.mozilla.org/show_bug.cgi?id=147333#c206
Yet another possible way to fix the problem: Try to avoid the use of the UCS-2 iconv converter module on solaris 8; use the UCS-2BE or UCS-2LE converter module instead - because these iconv converters does not insert the problematic BOMs. Has the advantage that the BOMs do not make it into the Native->Unicode result string, which might be a problem with my first attempt to fix this problem (with my first patch, the BOMs are part of the Unicode result string). Idea from comment #206 of bug 147333 http://bugzilla.mozilla.org/show_bug.cgi?id=147333#c206
Jurgen please contact the owner of this module for a review and cc: reviewers@mozilla.org (so your patch can get the nessecary reviews for checkin)
Keywords: intl
Attached patch patch v1Splinter Review
Updated version of the 2nd variant of this patch with minor fixes for the comment text.
Attachment #90076 - Attachment is obsolete: true
Attachment #90617 - Attachment is obsolete: true
Attachment #90618 - Attachment is obsolete: true
A BOM, along with a number of other unicode characters, are perfectly legitimate parts of a unicode string. This feels to me like we're fixing this problem my side effect, not by fixing the real problem which appears to be in NS_CopyNativeToUnicode(), right?
Masaki Katakai wrote > Shall we move this to intl and try to put this to 1.0.1 branch? > What do you think, Roland? Currently I see two possible solutions: 1. Add a workaround to avoid the BOM (see attachment 90748 [details] [diff] [review]) 2. Fix the problems in |NS_CopyNativeToUnicode()| as suggested by blizzard I would vote for [2] if possible, but [1] may be sufficient as a workaround for the 1.0.1-branch (which should be backed-out again once we land [2] in the branch).
There's nothing wrong with a decoder sticking byte order marks or any other characters into the stream which is going to cause problems down the road some time. We need to fix this the right way.
Looks like my reassignment to katakai-san didn't work. Trying again
Assignee: yokoyama → katakai
Re to #15: Yes, NS_CopyNativeToUnicode() is part of the problem because it assumes the unicode output string is never larger than the native input string length. This assumption is coded in the way the 'resultLeft' variable is initialized in NS_CopyNativeToUnicode(), and passed to the NativeToUnicode() member function. The idea in the original code is probably: in a multibyte locale (china/korea/japan/...) multiple input bytes are translated into a single unicode character, so the worst case is when we're running in one of the single byte locales (e.g. ISO8859-1) where the iconv converter produces one unicode character output for every single input character. The problem with this assumption is, that Sun's Solaris8 iconv module adds one additional unicode character - the byte order mark - during the *first* iconv() call. [ This assumption would also badly fail in a locale with a native encoding that would translate single native input bytes into multiple unicode output characters. I must admit than I'm not an 'i18n' or 'unicode' expert - I've no idea if such a locale exists in reality. ] On the other hand, the GNU based iconv routines on linux and XXXbsd do not insert BOMs. I'm not sure what happens on other unix platforms with iconv support (AIX, IRIX, ...). While we could change NS_CopyNativeToUnicode() to allow for a larger unicode output buffer then the number of native input bytes passed to it - to allow for a sporadic BOM character, I'm not sure if that's a good idea (my first patch attempt implements this). With such a change, on linux / XXXbsd the converted unicode strings would never include BOMs, while on solaris 8 there would be exactly *one* native-to-unicode converted string that does include the extra BOM. And we cannot be sure which part of mozilla will see this unicode string with the extra BOM on solaris 8 (and if that part is able to cope with the extra BOM). Currently this is the profile directory path, but this string could appear in any part of mozilla that happens to call NS_CopyNativeToUnicode() for the first time. For that reason I've decided it would be better if solaris never adds the BOM to the native->unicode converted strings, to get exactly the same unicode output strings we get on linux / XXXbsd.
Re to #17: I just compiled mozilla on solaris8 using my first patch, which performs the complete native->unicode translation for the profile directory *and* includes a BOM prepended to that path: http://bugzilla.mozilla.org/attachment.cgi?id=90076&action=view Then I started and killed the browser on solaris 8. The ~/.mozilla/appreg file contains an (UTF-8 encoded) BOM followed by the profile directory name. Now I tried to run mozilla on a linux box, using my NFS shared home directory. The linux version was unable to open my profile; an truss on linux revealed a failed access() system call: 28756 access("ÿ/home/jk/.mozilla/jk/57rwjd3f.slt", F_OK) = -1 ENOENT (No such file or directory) (Note the 0xff character at the start of the path). I'm not so sure if it's a good idea to let the BOMs through.
The generic unicode decoder should let the BOMs through - they are valid Unicode characters. Maybe the nsIFile code should filter them, though. Do we have any unicode utilities that does decompisition and will remove characters like the BOM?
I'm sorry I could not find the right fix yet for this problem. Could anyone help please?
> I'm sorry I could not find the right fix yet for this problem. What exactly do you mean by "right fix"? Ideas so far are: 1. Fix |NS_CopyNativeToUnicode()| to allow for a destination unicode string with more characters than the byte length of native input string, so that we can store the additional BOM character added by the solaris 8 iconv module. 2. Implement 1. + "postprocess" the unicode output string and remove the BOMs, to note confuse other parts of the mozilla code (comment #21). 3. Just add a "workaround" to strip the BOMs when running on the solaris 8 platform. 1. is implemented by the "First attempt to fix this problem" http://bugzilla.mozilla.org/attachment.cgi?id=90076&action=view A possible enhancement for this patch would be to add some generic code to enlarge the unicode output buffer in |NS_CopyNativeToUnicode()| whenever we run out of space and still have unconverted input characters left. 2. As far as I understand it, this is Christopher's idea on how to fix the problem; but a patch that implements this is missing. I'm unsure what unicode should be stripped from the converted output string (just look for BOM characters and strip them, or are there other possible unicode characters that we want to strip?) 3. Patches for solaris 8 "workarounds" are: http://bugzilla.mozilla.org/attachment.cgi?id=90618&action=view http://bugzilla.mozilla.org/attachment.cgi?id=90748&action=view
J・gen, I mean patch for 2. As your comment #20, 1. is not good enough, right? so we should fix this problem by 2. or 3.?
*** Bug 157608 has been marked as a duplicate of this bug. ***
I think the patch of workaround to avoid the BOM (attachment 90748 [details] [diff] [review]) is the best solution for now. Any objection? Can anyone r=/sr= please?
I have verified the fix works fine on my local build on Solaris 8 both SPARC and Intel.
*** Bug 158531 has been marked as a duplicate of this bug. ***
*** Bug 157080 has been marked as a duplicate of this bug. ***
Per jonjgc@yahoo.com (jon labadie) on bug 158531 (marked dupe): I only use one profile, so there should be no need to enter the profile manager. However, quite often, and always after crashes, I am told my default directory is not available, please select a new profile. I have tracked down the problem to a scrambling of part of the appreg file. At the end of the 30K appreg file is a copy of the path to my default (only) profile. When the problem occurs this path is shifted 3 bytes further into the file, with 3 characters preceeding the leading "/" of the path. Each of these characters is > 127 (non-ascii). The path normally ends in ".slt". When corrupted, the last character of the file is the "l". I can repair the file with a binary editor and all seems fine again until another crash or ???
Who can review the suggested patch for us please? Margaret
Keywords: review
Comment on attachment 90748 [details] [diff] [review] patch v1 sr=darin this patch looks fine to me, but do we have to worry about this BOM after reseting the converter? after calling xp_iconv_reset that is?
Attachment #90748 - Flags: superreview+
Thank you for review, Yes, it's only first time.
Status: NEW → ASSIGNED
Comment on attachment 90748 [details] [diff] [review] patch v1 r=Roland.Mainz@informatik.med.uni-giessen.de - but please keep the bug either open or file a new one for further investigation of the (real) issue.
Attachment #90748 - Flags: review+
Well, the real issue is in |NS_CopyNativeToUnicode()|. It assumes the internal unicode buffer that holds the Native->Unicode conversion result need not have space for more unicode characters than the byte length of the native input string. And the error handling - in case the assumption was wrong - is just a NS_ASSERTION(bufLeft == 0, "did not consume entire input buffer"); This will silently truncate the converted unicode string in a release build (the debug build truncates and prints a warning but continues, too) See comment #3. The patch in this attachment increases the capacity of the unicode output string in |NS_CopyNativeToUnicode()| when there's still some input left that must be converted but the unicode output buffer is full. This replaces the |NS_ASSERTION(bufLeft == 0, "did not consume entire input buffer")|. The dummy conversion to get rid of the BOM on solaris 8 is still included in this version of the patch, inside a conditional "#if SOLARIS ... #endif" block to better document that part as a workaround intended for solaris systems only. With the fixed |NS_CopyNativeToUnicode()| it is not really needed any more, but has the benefit that no invocation of |NS_CopyNativeToUnicode()| adds a BOM on solaris. -------- I'm not sure if I'm using the mozilla string classes 100% correct (probably not); the part that resizes the output string using the |SetLength()| method and resumes writing in the resized unicode output string may need to be improved. According to a comment in nsAString.h the way I use |SetLength()| to append to a string is deprecated (anyway, it works for me :-).... Any mozilla string class expert around, who can improve this?
ok, this looks better, but why include both solutions? the comments in the new code you are adding suggest that katakai's solution would also work. why not just strip the BOM? then the optimization in NativeToUnicode would be fine, right? what am i missing? that said, i do like your addition because it makes the code more robust.
darin: I vote for attachment 90748 [details] [diff] [review] to get rid of this issue in the 1.0.1-branch/MachV and current 1.1final tree - this patch has been tested since a while and involves only a small risk. The real fix for the bug should then backout the workaround from attachment 90748 [details] [diff] [review] when it's being checked-in.
roland: yeah, i think you're right. katakai: can you add the #ifdef SOLARIS to the patch? thx!
darin wrote: > katakai: can you add the #ifdef SOLARIS to the patch? thx! I am not sure whether we want that. Adding a comment would be fine, but adding |#ifdef| stuff there means that the code - and all possible side-effects - are Solaris-only then. I would perfer the the "all platforms run it"-way since the various Unix/Linux variants may have a simlar functionality like Solaris iconv library (and we get more testers for the patch, too :)
yeah, that does make sense. not that our linux builds use iconv, but other unix builds do, and if solaris has this problem then it's likely that others like AIX and HPUX may also have the problem. ok... make it XP.
Asking a= of attachment 90748 [details] [diff] [review] for 1.1final.
Comment on attachment 90748 [details] [diff] [review] patch v1 a=asa (on behalf of drivers) for checkin to 1.1
Attachment #90748 - Flags: approval+
katakai: i can check this in, if you like... that is, if you don't have CVS access yourself ;-)
Thanks for asking, Darin. I have CVS access. I'll check-in the patch soon.
checked into 1.1final.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment on attachment 90748 [details] [diff] [review] patch v1 a=chofmann for 1.0.1 add fixed1.0.1 to the keywords after checking into the branch
Keywords: mozilla1.0.1+
checked into branch.
Keywords: fixed1.0.1
*** Bug 160098 has been marked as a duplicate of this bug. ***
*** Bug 160691 has been marked as a duplicate of this bug. ***
Could Ronald or someone else please verify with the latest build to see if it is still reproducible? since we don't have SunOS here. Thanks!
Yes, I can verify that the problem is gone. If I do the same as in Additional Comment 2 (http://bugzilla.mozilla.org/show_bug.cgi?id=153562#c2), which is to do a kill -9 on Mozilla and then start it up again, Mozilla works. Build date is 2 aug., platform is Solaris 9 (Sparc).
Thanks for your help, Paul! Do we need to check in this fix to trunk?
It is already checked in on the trunk and the 1.0 branch, or did I miss something? RCS file: /cvsroot/mozilla/xpcom/io/nsNativeCharsetUtils.cpp,v Trunk: revision 1.7 MOZILLA_1_0_BRANCH: revision 1.5.2.2
Thanks, Jürgen! Mark bug as verified as per comment #51.
Status: RESOLVED → VERIFIED
There is one thing left - see comment #34: -- snip -- ... please keep the bug either open or file a new one for further investigation of the (real) issue. -- snip -- Can anyone file a bug to squish the real issue, please ?
*** Bug 161475 has been marked as a duplicate of this bug. ***
*** Bug 162550 has been marked as a duplicate of this bug. ***
*** Bug 162624 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: