Closed Bug 72518 Opened 23 years ago Closed 13 years ago

If compreg.dat is corrupted, reregister from scratch

Categories

(Core :: XPCOM, defect, P4)

x86
FreeBSD
defect

RESOLVED FIXED
Future

People

(Reporter: jesup, Unassigned)

Details

(Keywords: crash, helpwanted)

Attachments

(1 file)

FreeBSD 4.1 20010319xx

I had built an optimized build (gmake -f client.mk clobber checkout build) and
when I ran it in the debugger, the debugger exited when it hit a breakpoint on
a harmless assertion for a missing file:

WARNING: CSSLoaderImpl::LoadAgentSheet: Load of URL
'file:///home/jesup/.mozilla/default/chrome/userChrome.css' failed.  Error code:
16389, file nsCSSLoader.cpp, line 1543

The fact that the debugger exited on the optimized build isn't this issue.  The
problem was that after that failed run of mozilla under the debugger (which
appeared to register all the components), any time I ran mozilla (from a shell) I
got a fatal assertion error while processing the component registry.  The only
solution was to delete component.reg and re-run mozilla.

I was not able to recreate the problem, and the component.reg that created the
problem has been removed.  However, I do have a core file and I'll attach a
backtrace and some relevant structure dumps.  It appears the registry is
truncated and that the code that reads it asserts when it attempts to read
data past the end of the file.
Adding crash keyword.
Keywords: crash
When the program exits during component registration in Init_XPCOM, there is a
possibility it leaves the registry in an unprocessable state.  I do not see this
as a problem.  Don't let the code exit there.  This seems to have been a
debugger / assert interaction that does not happen during user runs.  If it does
happen during user runs, then someone can fix that, or remove or modify the
assertion.  This does not seem out of the ordinary to me.
>Don't let the code exit there. 

We can't guarantee that the code will not exit at any particular time (or that
some external process won't mung our registry file, or that the machine won't
crash while writing it, etc.).

Code to read a file should never fail (crash) due to bad input.  Reject the
input; sure.  Complain, fine.  Crash, never.  Now, if PR_ASSERT() is a no-op in
production code (quite possibly, I didn't look), and the code properly handles
this case in that situation (which is hard to test with a debugging version, and
I'd wonder how well it's been covered), then I withdraw my objection.
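
To make "reject, don't crash" concrete, here is a minimal sketch in Python.
The record layout (a 4-byte big-endian length followed by the payload) and the
names read_records and CorruptRegistryError are assumptions made up for
illustration; this is not the actual component.reg format or the XPCOM reader.
The point is only that every read is bounds-checked and a truncated file
produces an error the caller can handle.

```python
import struct


class CorruptRegistryError(Exception):
    """Raised when the input cannot be parsed; the caller decides what to do."""


def read_records(data):
    """Parse length-prefixed records from `data` (bytes).

    Hypothetical format: each record is a 4-byte big-endian length
    followed by that many payload bytes.
    """
    records = []
    offset = 0
    while offset < len(data):
        # Check that the length field itself is fully present.
        if offset + 4 > len(data):
            raise CorruptRegistryError(
                f"truncated length field at offset {offset}")
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        # Check that the payload does not run past the end of the file.
        if offset + length > len(data):
            raise CorruptRegistryError(
                f"record at offset {offset} runs past end of file")
        records.append(data[offset:offset + length])
        offset += length
    return records
```

A caller can then catch CorruptRegistryError and choose to complain,
regenerate, or exit cleanly, rather than dying inside the parser.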

A real user who got into this state somehow would not be able to ever get out of
it except by uninstalling mozilla - if we don't handle it correctly in
production code.

If it's supposed to be handled in production code, perhaps that should be a
warning or somesuch that doesn't dump the program when compiled with debugging.
Your principle seems far too broad to me.  Making all data processing self
correcting so that the intent of the user is always carried out even if the data
is missing or corrupt is not realistic.

Code to read a file never fails due to bad input.  It is the code to
process the file that fails.  Failure happens all the time.  If the failures are
not in critical systems, then we smile at the missing graphic.  If they are, there
is likely to be failure that will prevent the program from executing.  There is
no way to make a missing registry as unimportant as a missing graphic when the
architecture says it is important.

Shared libraries are an example of another critical file.  They could have viruses
or other corruption.  If they do, the program is likely to crash, not on reading
but on processing.  Given the critical relationship of the DLLs to the system,
we rely on preventing corruption, not on somehow fixing it.  Fixing it is done
by installation.

Besides core DLLs and scripts, I cannot think of another file more critical than
the component registry.  If the registry is bad, we cannot contemplate proper
execution of Netscape without resolving the problem.  Whether it exits with a
crash dump or some other error message doesn't seem that significant.

In a debug development environment, the DLLs and the registry file are being
continuously overwritten leaving an ongoing risk of corruption.  In a production
environment, only an installation triggers these files being modified.  We could
try to create a default error handler with lots of logic so that every time the
program crashes, we go out and reregister all the components, perhaps check to
make sure all the DLLs are there, and I am sure there are a number of other
things we might try with other critical systems that may have failed, which
could easily produce loops if the attempted actions did not fix the problem, and
there is no guarantee it would.  This is what your statement about never failing
to read a file would seem to imply.  I disagree.  If the registry doesn't have
all the info it should, the program SHOULD fail, IMO.

To me, a far better approach for this type of critical file is:

a.  Prevent corruption from happening.  If we know there is something that
crashes and causes the file to be corrupt / incomplete, then fix the bug.
b.  The registry is really an installation issue.  Make certain that a
reinstallation (or rebuild in a debug / development environment) fixes the
problem.  Since the registry is automatically generated, the installation should
be permitted to delete the registry or install a non-corrupt prebuilt version.
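
As an illustration of point (a), one common way to reduce the chance of
leaving a truncated file behind when a process dies mid-write is to write to a
temporary file and rename it over the target only once the write is complete.
The sketch below is in Python and the function name write_atomically is
hypothetical; it is not how Mozilla writes its registry, just an example of
the technique.

```python
import os
import tempfile


def write_atomically(path, payload):
    """Write `payload` (bytes) to `path` via a temp file plus rename.

    If the process dies before os.replace(), the old file at `path`
    is left untouched instead of being half-overwritten.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic on POSIX systems
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```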

If you have evidence of a bug that produces registry corruption in a production
system or prevents an installation from overwriting corruption, I think it would
be something to work on.  Developers are going to have a less stable system
regardless, but if certain assertions are making it less convenient for them,
that can be fixed, too.
Assignee: rayw → kandrot
>Your principle seems far too broad to me.  Making all data processing self
>correcting so that the intent of the user is always carried out even if
>the data is missing or corrupt is not realistic.

        That wasn't my principle (or what I meant at least); it was that
at the minimum we should try to recover from errors that we can expect to
happen, and errors we cannot recover from should be as friendly as
possible.  IMHO a truncated registry file is in fact an expectable error.

        I consider this problem similar to the fact that people who ran
betas or dailies of mozilla found it wouldn't start with later versions
because some binary format file (cache.db probably) had changed and it
didn't version-check, forcing users to frequently blow away profiles.
One can argue that that's expected, however it indicates a fragility
in the system if data isn't checked for sanity (or version) when it's
read.  Also, people who may have once tried it on a whim long ago may find
that when a formal open beta or final release appears, they can't start
it, and they don't understand why not.
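
        A version check of the kind meant here could look roughly like the
Python sketch below.  The signature bytes, the version number, and the name
check_header are all hypothetical; no real Mozilla file format is being
described.  The idea is that the reader inspects a small header first and
returns a reason it can report, instead of processing data it does not
understand.

```python
import struct

MAGIC = b"HYPO"          # hypothetical 4-byte file signature
SUPPORTED_VERSION = 2    # hypothetical current format version


def check_header(data):
    """Return None if the header is usable, else a human-readable reason."""
    if len(data) < 6:
        return "file too short to contain a header"
    if data[:4] != MAGIC:
        return "unrecognized file signature"
    # The 2-byte big-endian version follows the signature.
    (version,) = struct.unpack_from(">H", data, 4)
    if version != SUPPORTED_VERSION:
        return (f"format version {version} is not supported "
                f"(expected {SUPPORTED_VERSION})")
    return None
```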

>Code to read a file never fails due to bad input.  It is the code to
>process the file that fails.  Failure happens all the time.  If the
>failures are not in critical systems, then we smile at the missing
>graphic.  If they are, there is likely to be failure that will prevent the
>program from executing.  There is no way to make a missing registry as
>unimportant as a missing graphic when the architecture says it is
>important.

        True, though in many cases (such as this one), there is a fallback:
if the file doesn't pass sanity checks, do not exit; instead, invoke the
process that creates the file in the first place.  The file is effectively
just a cache anyway; it has no data that is not regenerable.

>Besides core DLLs and scripts, I cannot think of another file more
>critical than the component registry.  If the registry is bad, we cannot
>contemplate proper execution of Netscape without resolving the problem.
>Whether it exits with a crash dump or some other error message doesn't
>seem that significant.

        However, unlike a bad DLL, we do have a way of regenerating
the registry.  Also, even if we didn't have a way of resolving the problem,
something more descriptive like "Corrupt <whatever>, please re-install"
would be better.  Better yet would be things like "Corrupt bookmarks file.
Continue and remove the corrupt file, or Exit leaving the file intact?".

>In a debug development environment, the DLLs and the registry file are
>being continuously overwritten leaving an ongoing risk of corruption.  In
>a production environment, only an installation triggers these files being
>modified.  We could try to create a default error handler with lots of
>logic so that every time the program crashes, we go out and reregister all
>the components, perhaps check to make sure all the DLLs are there, and I
>am sure there are a number of other things we might try with other
>critical systems that may have failed, which could easily produce loops if
>the attempted actions did not fix the problem, and there is no guarantee
>it would.  This is what your statement about never failing to read a file
>would seem to imply.  I disagree.  If the registry doesn't have all the
>info it should, the program SHOULD fail, IMO.

        The problem is that while that's all well and good for a developer
or power-user, it's very bad for Joe User.  It doesn't work, and no matter
how many times they restart it, it still doesn't work.  The only solution
they have is to delete the application and re-install - or more likely just
delete it and go to IE.

>To me, a far better approach for this type of critical file is:
>
>a.  Prevent corruption from happening.  If we know there is something that
>crashes and causes the file to be corrupt / incomplete, then fix the bug.

        Certainly.

>b.  The registry is really an installation issue.  Make certain that a
>reinstallation (or rebuild in a debug / development environment) fixes the
>problem.  Since the registry is automatically generated, the installation
>should be permitted to delete the registry or install a non-corrupt
>prebuilt version.

        Reinstallation is something that people who don't have broadband
connections would dread.  Not to mention that if the registry is corrupt
we _do_ have a way to resolve it, or at least make the attempt, and if it
fails again bail and tell them to reinstall.
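
        The "try once, then bail" idea also answers the worry about loops:
the regeneration attempt is bounded.  A rough Python sketch follows; the names
start_with_one_retry and StartupError are hypothetical, and load and
regenerate are caller-supplied functions, with load assumed to raise
ValueError on a corrupt file.

```python
class StartupError(Exception):
    """Raised when the registry cannot be loaded even after regeneration."""


def start_with_one_retry(load, regenerate):
    """Try load(); on failure regenerate exactly once and retry."""
    try:
        return load()
    except ValueError:
        # First failure: make a single attempt to rebuild the file.
        regenerate()
    try:
        return load()
    except ValueError as err:
        # Second failure: stop here, so there is no retry loop.
        raise StartupError(
            "Registry is still unreadable after regeneration; "
            "please reinstall.") from err
```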

reassigning kandrot bugs.
Assignee: kandrot → dougt
Keywords: helpwanted
Target Milestone: --- → Future
*** Bug 121619 has been marked as a duplicate of this bug. ***
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and
<http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss
bugs are of critical or possibly higher severity.  Only changing open bugs to
minimize unnecessary spam.  Keywords to trigger this would be crash, topcrash,
topcrash+, zt4newcrash, dataloss.
Severity: major → critical
mass reassigning to nobody.
Assignee: dougt → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: rayw → xpcom
Component: XPCOM Registry → XPCOM
Priority: -- → P4
Summary: Crash with corrupt (truncated?) component.reg → If compreg.dat is corrupted, reregister from scratch
compreg.dat doesn't exist anymore, so this bug has been fixed... :-)
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED