Closed Bug 29992 Opened 25 years ago Closed 24 years ago

Continual debug assertions

Categories

(Core :: DOM: Navigation, defect, P3)

x86
Windows NT
defect

Tracking

()

VERIFIED WONTFIX

People

(Reporter: morse, Assigned: Bienvenu)

Details

Attachments

(2 files)

With a tree that I just pulled, I can't do anything without getting continual 
debug assertions.  I had to comment out the assert statement in order to be able 
to run the browser.  Here's the stack trace at the point of assertion.

(Bugzilla insists that I select a component but I have absolutely no idea what 
component this is.  So assigning to browser general although I suspect that this 
is wrong.)

NTDLL! 77f76274()
nsDebug::Assertion(const char * 0x028ecb28, const char * 0x028eca34,
const char * 0x028eca0c, int 78) line 189 + 13 bytes
mork_assertion_signal(const char * 0x028ecb28) line 78 + 31 bytes
morkEnv::NewError(const char * 0x028ecf70) line 369 + 19 bytes
morkStore::AddAlias(morkEnv * 0x024ebbc0, const morkMid & {...},
unsigned long 0) line 975
morkBuilder::OnAlias(morkEnv * 0x024ebbc0, const morkSpan & {...}, const
morkMid & {...}) line 635
morkParser::ReadAlias(morkEnv * 0x024ebbc0) line 956
morkParser::ReadDict(morkEnv * 0x024ebbc0) line 1230
morkParser::ReadContent(morkEnv * 0x024ebbc0, unsigned char 1) line 1322
morkParser::ReadGroup(morkEnv * 0x024ebbc0) line 1116
morkParser::ReadAt(morkEnv * 0x024ebbc0, unsigned char 0) line 1148
morkParser::ReadContent(morkEnv * 0x024ebbc0, unsigned char 0) line 1325
+ 16 bytes
morkParser::OnPortState(morkEnv * 0x024ebbc0) line 1359 + 14 bytes
morkParser::ParseLoop(morkEnv * 0x024ebbc0) line 1415 + 12 bytes
morkParser::ParseMore(morkEnv * 0x024ebbc0, long * 0x0012f740, unsigned
char * 0x024ebef8, unsigned char * 0x024ebef9) line 1454
morkThumb::DoMore_OpenFileStore(morkEnv * 0x024ebbc0) line 433
morkThumb::DoMore(morkEnv * 0x024ebbc0, unsigned long * 0x0012f868,
unsigned long * 0x0012f888, unsigned char * 0x0012f86c, unsigned char *
0x0012f864) line 353 + 12 bytes
orkinThumb::DoMore(nsIMdbEnv * 0x024ebfd8, unsigned long * 0x0012f868,
unsigned long * 0x0012f888, unsigned char * 0x0012f86c, unsigned char *
0x0012f864) line 230
nsGlobalHistory::OpenDB() line 1688 + 37 bytes
nsGlobalHistory::Init() line 1610 + 8 bytes
NS_NewGlobalHistory(nsISupports * 0x00000000, const nsID & {...}, void *
* 0x0012f99c) line 533 + 8 bytes
nsGenericFactory::CreateInstance(nsGenericFactory * const 0x025069f0,
nsISupports * 0x00000000, const nsID & {...}, void * * 0x0012f99c) line
46
nsComponentManagerImpl::CreateInstance(nsComponentManagerImpl * const
0x00c94520, const nsID & {...}, nsISupports * 0x00000000, const nsID &
{...}, void * * 0x0012f99c) line 1253 + 24 bytes
nsComponentManager::CreateInstance(const nsID & {...}, nsISupports *
0x00000000, const nsID & {...}, void * * 0x0012f99c) line 82
nsServiceManagerImpl::GetService(nsServiceManagerImpl * const
0x00c94890, const nsID & {...}, const nsID & {...}, nsISupports * *
0x0012fa58, nsIShutdownListener * 0x00000000) line 293 + 19 bytes
nsServiceManagerImpl::GetService(nsServiceManagerImpl * const
0x00c94890, const char * 0x003998f4, const nsID & {...}, nsISupports * *
0x0012fa58, nsIShutdownListener * 0x00000000) line 432
nsServiceManager::GetService(const char * 0x003998f4, const nsID &
{...}, nsISupports * * 0x0012fa58, nsIShutdownListener * 0x00000000)
line 545
nsGetServiceByProgID::operator()(const nsID & {...}, void * *
0x0012fa58) line 63 + 22 bytes
nsCOMPtr<nsIGlobalHistory>::assign_from_helper(const nsCOMPtr_helper &
{...}, const nsID & {...}) line 795 + 18 bytes
nsCOMPtr<nsIGlobalHistory>::nsCOMPtr<nsIGlobalHistory>(const
nsCOMPtr_helper & {...}) line 498
nsDocShell::SetTitle(nsDocShell * const 0x02332e30, const unsigned short
* 0x10083548 gCommonEmptyBuffer) line 1410 + 28 bytes
nsWebShell::SetTitle(nsWebShell * const 0x02332e30, const unsigned short
* 0x10083548 gCommonEmptyBuffer) line 3566
nsHTMLDocument::SetTitle(nsHTMLDocument * const 0x024a3d9c, const
nsString & {""}) line 784
HTMLContentSink::DidBuildModel(HTMLContentSink * const 0x024a3150, int
0) line 2274 + 38 bytes
CNavDTD::DidBuildModel(CNavDTD * const 0x024a2b50, unsigned int 0, int
1, nsIParser * 0x024a3490, nsIContentSink * 0x024a3150) line 631 + 14
bytes
nsParser::DidBuildModel(unsigned int 0) line 721 + 55 bytes
nsParser::ResumeParse(nsIDTD * 0x00000000, int 1) line 1170
nsParser::OnStopRequest(nsParser * const 0x024a3494, nsIChannel *
0x0233dba0, nsISupports * 0x00000000, unsigned int 0, const unsigned
short * 0x00000000) line 1560 + 19 bytes
nsDocumentOpenInfo::OnStopRequest(nsDocumentOpenInfo * const 0x0233d920,
nsIChannel * 0x0233dba0, nsISupports * 0x00000000, unsigned int 0, const
unsigned short * 0x00000000) line 277
nsInputStreamChannel::OnStopRequest(nsInputStreamChannel * const
0x0233dba4, nsIChannel * 0x0233d7b0, nsISupports * 0x00000000, unsigned
int 0, const unsigned short * 0x00000000) line 358 + 45 bytes
nsOnStopRequestEvent::HandleEvent(nsOnStopRequestEvent * const
0x0233ee30) line 292
nsStreamListenerEvent::HandlePLEvent(PLEvent * 0x0233e360) line 97 + 12
bytes
PL_HandleEvent(PLEvent * 0x0233e360) line 526 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x01027370) line 487 + 9 bytes
_md_EventReceiverProc(HWND__ * 0xa00004e2, unsigned int 49361, unsigned
int 0, long 16937840) line 975 + 9 bytes
USER32! 77e71268()
reassigning.
Assignee: bienvenu → davidmc
I suppose I should add another mode to Mork assertions, so that for folks not 
developing Mork code, the first one will cause a fatal error for the Mork db, 
which signals a soft error to the db users letting them know the db is no good.
Actually something like this should already be happening, so maybe I have a loop 
somewhere in parsing where I don't check the outstanding error status.
Has this ever happened again after the first observation?

I looked at the code, but the parser always seems to stop after the first error, 
because the loops check for ev->Good(), which stays false after an error.  So 
maybe a continual sequence of assertions might be once for every mork file, for
some reason that's a bit hard to imagine.
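
The stop-after-first-error behavior described above can be illustrated with a minimal sketch. The `Env` class and `ParseLoop` function are hypothetical stand-ins, not the actual morkEnv or morkParser code; they only show how a sticky error flag checked in the loop condition ends parsing after a single error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an environment object with sticky error state:
// once an error is recorded, Good() stays false.
class Env {
public:
    bool Good() const { return errorCount_ == 0; }
    void NewError(const std::string& message) {
        ++errorCount_;
        lastError_ = message;
    }
    int ErrorCount() const { return errorCount_; }

private:
    int errorCount_ = 0;
    std::string lastError_;
};

// Hypothetical parse loop: treats '!' as a malformed byte.
// The loop condition checks ev.Good(), so parsing stops right after
// the first error. Returns the number of characters consumed.
int ParseLoop(Env& ev, const std::string& input) {
    int consumed = 0;
    for (std::size_t i = 0; i < input.size() && ev.Good(); ++i) {
        if (input[i] == '!') {
            ev.NewError("malformed byte");
        }
        ++consumed;
    }
    return consumed;
}
```

With a loop of this shape, one open attempt should produce at most one error, so repeated assertions would point at repeated open attempts rather than at the parser itself.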
When I study the stack crawl more, I see this is happening when opening a history 
db in response to a shut down listener notification.  Is that right?  Why would 
that happen?  Anyway, that seems to rule out the multiple mork file idea which 
might apply when opening a bunch of mail/news summaries.

So then a new theory might be -- what if opening the history db fails, but this 
is not accepted by the caller?  What if they try to open the history db again?  
If the history db seems to become corrupt, do we throw it away and start fresh?
It's happening to me with a fresh tree that I just pulled and built this 
morning.  It happens on startup -- before the first screen ever comes up.  
Things were so bad that I had to comment out the Break in nsDebug.cpp in order 
to get any useful work done.
Over lunch, someone suggested that this might have got corrupted because you ran
two builds (e.g., release and debug) against the same profile at the same time.
If so, that could corrupt history files. You could just rename your history file
to avoid the asserts (but don't delete it, in case that could help David
recreate the problem).
Here's something for Chris Waterson: since it's possible for Mork db's to become 
corrupt, is it data loss to throw away an old db?  (By entropic theory and also 
empirical observation, all db's eventually get corrupt, so software needs a plan 
to cope when it happens.)

If history db's hold info important to preserve, then you might want to keep a 
backup copy of the second to last db which looked good.  So if the most recent db 
becomes corrupt, you might fall back on the secondary copy.

I should go look at the history db code now and see whether it makes a new db 
when the old one seems un-openable.
An excellent theory.  I renamed the history.dat file and it no longer asserts.  
I'll attach my old history.dat file so you can do a post-mortem on it.
Here's another question for Chris Waterson: got plans for a different db?  If 
not, we might consider a server version of MDB interfaces in a few months.  I'm 
working on a language with a small runtime engine, which might make it easy to 
write a Mork implementation which handles an arbitrary number of clients.  It's 
just a home hobby thing, but eventually it will make some things possible at work 
which otherwise must be rejected for lack of time and resources.
davidmc, (re: "precious") That probably depends on who you ask :-)

It'd be interesting to try to figure out *why* the history DB got corrupted. If 
it's because two processes were writing to it, I'd say that this is a problem 
that we WONTFIX.

I think the right thing to do here is to detect that the history database is 
corrupted, bail, and create a new, empty db file in its place. This may require 
extending the APIs a bit (e.g., maybe other Mork users would rather try to 
untangle a corrupted DB).
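
As a rough sketch of the detect, bail, and recreate approach: the helper below moves an unreadable file aside instead of deleting it, so it stays available for a post-mortem, and then creates an empty file in its place. `OpenOrReplace`, the `tryOpen` callback, and the `.corrupt` suffix are all hypothetical names chosen for illustration; no existing Mork or history API is implied.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>

// Tries to open the database at `path` with `tryOpen`, a caller-supplied
// (hypothetical) callback that returns false when the file cannot be parsed.
// On failure the bad file is renamed to `path + ".corrupt"` and a new,
// empty file is created at `path`.
// Returns true if the original file was usable, false if it was replaced.
bool OpenOrReplace(const std::string& path,
                   const std::function<bool(const std::string&)>& tryOpen) {
    if (tryOpen(path)) {
        return true;
    }
    const std::string aside = path + ".corrupt";
    // Drop any older quarantined copy; a failure here is ignored on purpose.
    std::remove(aside.c_str());
    // Keep the bad file for later analysis rather than deleting it.
    std::rename(path.c_str(), aside.c_str());
    // Create (or truncate to) an empty file for the caller to start from.
    std::ofstream fresh(path, std::ios::trunc);
    return false;
}
```

A client that would rather try to untangle the corrupted file could skip this helper entirely, which is one reason the policy probably belongs with the caller rather than inside the database layer.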
I think this is up to the mork client - the mailnews code does this today (well,
maybe not, but we do throw databases away that are out of date - in 4.x, we
throw away .snm files when they get corrupted)
re: server. I have no immediate plans to use a different DB for history. I 
don't think that we need to solve the multiple-accessors problem for the 
browser's global history (or mail folders).

That said, if you did it, and it was smaller, faster, stronger, etc., it might 
make sense to switch implementations behind the scenes.
Maybe on some platform we are failing to open with exclusive access, so both 
app instances could try appending a commit transaction, which would almost certainly 
corrupt a file, when the longer transaction goes first so it extends past the end 
of the shorter transaction.  (But I don't see obvious signs of two writers in the 
posted file causing assertions.)

Otherwise if we have exclusive access, I don't see how two app instances can 
manage to corrupt the file, even though this has been correlated in the past with 
observed file corruptions.

Note that when db software is rather stable, most reported file corruptions have 
little apparent explanation.  Anecdotal evidence suggests memory whackage was 
underway prior to a crash, which managed to propagate to memory buffers destined 
for disk i/o.  (This is the reason why no database is safe from corruption, if 
any bugs exist in software running in a process that can access disk i/o 
buffers.)

I only say all that to explain why corruption cause seems more often a mystery 
than not.  Repeatable corruption is usually a findable bug, and rare transient 
corruption is usually written off to bad timing of memory chaos.
No wait, I found what looks like two writers appending the same transaction 
number of 171 to the end of the file:

@$${171{@

<(43CE=951942485280000)(43CA=951942483007000)(43FE=951942559286000)
  (45AA=951946071386000)(45AC=951946072007000)(45AE=951946074731000)
  (45A2=951946006223000)(44D5=951942622818000)(45A4=951946021475000)
  (43CC=951942483167000)(44D6=951942622908000)(45A8=951946063655000)
  (45AB=951946071416000)(45AD=951946072027000)(45AF=951946074741000)
  (45A3=951946006303000)(44D4=951942622537000)(450E=http://abcnews.com/)
  (450F=951945907010000)(4512=A$00B$00C$00N$00E$00W$00S$00.$00c$00o$00m$00)
  (4510=http://abcnews.go.com/)(459D=951945991211000)>
{1:^80 {/*r=0*/ (k^81:c)(s=9u)} 
  [-4E /*r=1*/ (^82^450E)(^84^450F)(^85^4512)]
  [-4F /*r=1*/ (^82^4510)(^84^459D)(^85^4512)]}
[1:^80 /*r=2*/ (^84^43CE)]
[2:^80 /*r=2*/ (^84^43CA)]
[3:^80 /*r=2*/ (^84^43FE)]
[5:^80 /*r=2*/ (^84^45AA)]
[6:^80 /*r=2*/ (^84^45AC)]
[7:^80 /*r=2*/ (^84^45AE)]
[8:^80 /*r=2*/ (^84^45A2)]
[A:^80 /*r=2*/ (^84^44D5)]
[C:^80 /*r=2*/ (^84^45A4)]
[2F:^80 /*r=1*/ (^84^43CC)]
[31:^80 /*r=1*/ (^84^44D6)]
[34:^80 /*r=1*/ (^84^45A8)]
[40:^80 /*r=1*/ (^84^45AB)]
[41:^80 /*r=1*/ (^84^45AD)]
[42:^80 /*r=1*/ (^84^45AF)]
[43:^80 /*r=1*/ (^84^45A3)]
[46:^80 /*r=1*/ (^84^44D4)]
@$$}171}@

@$${171{@
<(43CE=951942301666000)>[1:^80 /*r=2*/ (^84^43CE)]
<(43CA=951942298551000)>[2:^80 /*r=2*/ (^84^43CA)]
<(43D0=951942303188000)>[3:^80 /*r=2*/ (^84^43D0)]
<(43CC=951942298872000)>[2F:^80 /*r=1*/ (^84^43CC)]
<(4469=951942338419000)>[47:^80 /*r=1*/ (^84^4469)]
@$$}171}@

I might have expected an error report on non-ascending transaction numbers; but 
actually I was only concerned that start and end numbers match.  I could add a 
better warning to the code when a transaction number repeats, to tell users they 
might have multiple writers.
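
A repeated-transaction check along these lines might look like the following sketch. It assumes group-start markers have the form `@$${<id>{@`, as in the excerpt above, and that the ids are hexadecimal. The function names are hypothetical and this is a standalone text scan, not the Mork parser.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Collects the ids of all group-start markers of the form "@$${<hex>{@".
// Markers with empty, non-hex, or overly long ids are skipped.
std::vector<unsigned long> GroupStartIds(const std::string& text) {
    static const std::string kOpen = "@$${";
    static const std::string kClose = "{@";
    std::vector<unsigned long> ids;
    std::size_t pos = 0;
    while ((pos = text.find(kOpen, pos)) != std::string::npos) {
        const std::size_t idBegin = pos + kOpen.size();
        const std::size_t idEnd = text.find(kClose, idBegin);
        if (idEnd == std::string::npos) {
            break;
        }
        const std::string hex = text.substr(idBegin, idEnd - idBegin);
        const bool allHex =
            hex.find_first_not_of("0123456789abcdefABCDEF") == std::string::npos;
        if (!hex.empty() && hex.size() <= 8 && allHex) {
            ids.push_back(std::stoul(hex, nullptr, 16));
        }
        pos = idBegin;  // continue scanning after this marker's prefix
    }
    return ids;
}

// Returns every id that starts more than one group, each reported once,
// in the order the repeats are first seen.
std::vector<unsigned long> RepeatedGroupIds(const std::string& text) {
    std::set<unsigned long> seen;
    std::set<unsigned long> reported;
    std::vector<unsigned long> repeats;
    for (unsigned long id : GroupStartIds(text)) {
        if (!seen.insert(id).second && reported.insert(id).second) {
            repeats.push_back(id);
        }
    }
    return repeats;
}
```

A non-empty result from `RepeatedGroupIds` would be the trigger for a "possible multiple writers" warning. Checking for non-ascending ids would be a small extension of the same scan.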
Is this really a browser-general bug?  If not, can someone move it to the
correct Product/Component?
changing component to 'history', though it's an interaction with MDB, of course.
Component: Browser-General → History
these are mine now, I guess.
Assignee: davidmc → bienvenu
I had forgotten about this bug (I filed it so long ago) so I'm glad that 
bienvenu just posted to it and I got the Bugzilla announcement.

I just got hit by this again two days ago and forgot what the fix was.  So I had 
to zero in on a bad history.dat file by a process of elimination.  I still have 
that bad history.dat so I'll attach it here in case it gives any more clues over 
the last attachment that I made.
moving to m20 - probably won't fix for reasons described above.
Target Milestone: --- → M20
accepting
Status: NEW → ASSIGNED
updating qa contact
QA Contact: asa → claudius
nav triage team: WONTFIX
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
mass-verifying WontFix bugs which haven't changed since 2001-12-31.

use the search string "BoletusEdulis" if you want to filter out this msg.
Status: RESOLVED → VERIFIED
Component: History: Session → Document Navigation
QA Contact: claudius → docshell