Closed Bug 189832 Opened 22 years ago Closed 22 years ago

another case where fastload file can get corrupted (post bug 188744)

Categories

(Core :: XUL, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: jrgmorrison, Assigned: brendan)

References

(Blocks 1 open bug)

Details

(Whiteboard: fixed1.3)

Attachments

(2 files, 3 obsolete files)

This is a comment carried over from bug 188744. I need to look into this more.

        --- Additional Comment #5 From John Morrison 2003-01-16 22:35 ------- 

The corruption I'm seeing is reproducible as follows:
  1) start nav; quit
  2) start nav; quit
  3) start nav; start msgcompo; quit
  4) start nav; start messenger; triggers commonDialog; \
       start messengercompose.xul; send a message (get \
       progress dialog, and cert dialog [clientauthask.xul] \
       quit
  5) start messenger (-mail and avoid profile manager [-P]); \
       get commonDialog login; start AccountManager.xul; \
       ASSERT!!! and bork mObjectMap



What I'm seeing in debug: I come in at nsFastLoadFileUpdater::Open, where it
populates the object map.  That all goes fine and I have 196 objects with
first mOID being 8.  Next, I pick it up on first call to
nsFastLoadFileWriter::WriteObjectCommon after the PL_DHASH_ADD and inside of
the !entry->mObject test, so we've added another entry. 

But entryCount is 196. Huh?  When I inspect raw memory for mObjectMap, the
entry for mOID of 8 is gone (?).  Now, the entry count of 196 means we now get
a duplicate mOID of 1576 in the map and that leads to trouble down the road.
But I can't see where something would kill that first entry.  [By the way, I
have reproduced this in todays' mozilla win32 build so this corruption is 
independent of this fix, it would seem].  (I'm also referring to the untagged 
values for mOID [(index+1)<<3]).
I need to go back and confirm that I really did add an entry to the hash as 
described above. And need to trace through where the first entry is "going".

Status: NEW → ASSIGNED
Actually, I was somewhat confused earlier (what's new)

> That all goes fine and I have 196 objects with first mOID being 8.

Actually, that's not true. 

In nsFastLoadFileUpdater::Open, aReader->mFooter.mNumSharpObjects is equal to 
197. I start populating the mObjectMap hash with the vector from the disk 
file. At index 80 of the vector, I get back an 'obj' (mReadEntry) with the 
same value that I had for the entry at index 0 of the vector. So, the 
PL_DHASH_ADD winds up overwriting that hash entry. At the end of the loop,
197 disk entries have resulted in 196 hash entries. (And the last entry has 
mOID of 1576 (index 196)). 

When it goes to write this mObjectMap out to disk, it can't find an entry 
with index 0 because it was overwritten above. It may also do an ABW, because
it allocates a vector of 196 elements and then tries to copy into 'objvec[196]'
(although I need to go check this).
[Just dumping some notes here for my own memory as I flail about in this
code. Still pondering if this is an edge case that doesn't have bad affects
or if it's something more...].

So, that first object (objvec[0]) is (no surprise) an nsSystemPrincipal
(nsIPrincipal) with the initial serializing call to
nsFastLoadFileWriter::WriteObjectCommon coming from nsTranscodeJSPrincipals.

objvec[80] (i.e., mOID 648) gets added to object map when I open
messengercompose.xul from navigator.xul, after a session where I had
previously serialized all of navigator.xul into the fastload file.

(Note below that objvec[0] is read in with obj of 0x9. (Didn't I earlier
serialize an obj ref into that slot?)).

*x*: StartMuxDoc    035CB8C8, R, 016C9F38, 00000000, 
chrome://messenger/content/messengercompose/messengercompose.xul
$$$: nsFastLoadFileUpdater::Open, i =   0, obj = 0x00000009,
	              oid =  8, off =    709 (2c5),   str = 807, wk = 0
$$$: nsFastLoadFileUpdater::Open, i =   1, obj = 0x00000011,
	              oid = 16, off = 304195 (4a443), str = 1,   wk = 0
$$$: nsFastLoadFileUpdater::Open, i =   2, obj = 0x00000019,
	              oid = 24, off = 305865 (4aac9), str = 1,   wk = 0
...

I've found that I can get a number of related assertions when I just open with
'-browser -mail -edit' and then clicking around in the pref tree to switch
panels. I also got one hang and one crash in a current trunk opt. build, so we
still need to fix this bug. It all revolves around the serialization /
deserialization of the nsSystemPrincipal object in a situation where there are
several "not-live" instances in the fastload file and one (or more) instance
that are live in memory and then triggering nsFastLoadFileUpdater::Open for a
dialog that is not already in the fastload file. However, I don't really
understand the problem fully enough to directly fix it for 1.3. I do know (or
think I know) how I can add a band-aid to ::Open so that it will defend
against some of the badness, and I'm going to work on that.

Simpler testcase: 

1) set up mailnews to not login to the server on startup (so we don't have to
   deal with commonDialog.xul).
2) delete the fastload file.
3) start with "./mozilla -P myProf -browser" and quit.
4) start with "./mozilla -P myProf -browser", open message compose and quit.
5) start with "./mozilla -P myProf -mail", and quit.
6) start with "./mozilla -P myProf -mail", and open any dialog (which given
   the above will not be in the fastload file).
7) mObjectMap will be corrupted (actually, add an assertion to the top of 
   nsFastLoadFileWriter::WriteSharpObjectInfo that says the following to 
   know that you are doomed)

    NS_ASSERTION(aInfo.mCIDOffset  != 0 && aInfo.mCIDOffset < 5000000,
                 "about to hork aInfo.mCIDOffset!!");

   [where 5000000 is an arbitrary upper bound on the size of the fastload file;
    it should never be this big in practice.]

The (sucky) defense is that in nsFastLoadFileUpdater::Open, the mCIDOffset
values must monotonically increase by definition. If they don't, then the
fastload file is broken and we should (unfortunately, no other choice right
now) abort the fastload.  I just have to make sure that the higher level xul
prototype and js code will recover gracefully from an abort at this point.
Flags: blocking1.3b+
(this is the real fastload blocker, methinks).
Flags: blocking1.3b+ → blocking1.3b?
I need to test this further and think about whether this is what I want to do
for 1.3b. But this seems like a reasonable way to guard against this bug
(albeit by occasionally blowing away the fastload file :-/ ).
Flags: blocking1.3b? → blocking1.3b+
Comment on attachment 112946 [details] [diff] [review]
patch to guard against bogus entries in the object map

actually, I don't have any changes to make. This is the patch.

jst: could you r or sr? I went over this with jag for about a half hour
already. If you have any questions, let me know.
Attachment #112946 - Attachment description: draft patch to guard against bogus entries in the object map → patch to guard against bogus entries in the object map
Attachment #112946 - Flags: superreview?(jst)
Attachment #112946 - Flags: review?(jaggernaut)
I should note that at '@@ -1624,6 +1631,11 @@' where I don't bail out, it's 
okay because we will pick the bogus zero offset out at the beginning of the 
next fastload session, in either of the two other places where I bail.
Comment on attachment 112946 [details] [diff] [review]
patch to guard against bogus entries in the object map

+    //XXXjrgm bug #189832 [temp. hack]; if an offset is zero, we've hit
+    // the bug and most abort fastload now.
+    NS_ASSERTION(aInfo->mCIDOffset != 0, 
+		  "fastload reader: Offset into file cannot be zero!");
+    if (aInfo->mCIDOffset == 0) 
+	 return NS_ERROR_UNEXPECTED;

Change assert(!foo); if (foo) { ... } to if (!foo) { NS_ERROR(); ... }, and
replace "most" with "must" in the comment.

Same thing in the last hunk of this patch.

sr=jst with that change.
Attachment #112946 - Flags: superreview?(jst) → superreview+
Comment on attachment 112946 [details] [diff] [review]
patch to guard against bogus entries in the object map

r=ben@netscape.com
Attachment #112946 - Flags: review?(jaggernaut) → review+
Comment on attachment 112946 [details] [diff] [review]
patch to guard against bogus entries in the object map

a=asa (on behalf of drivers) for checkin to 1.3beta
Attachment #112946 - Flags: approval1.3b+
Checked in to trunk. (Discussed on irc with jst, and leaving this as
is; when we get a real fix, the assertions should stay, but the memset
becomes ifdef debug, and the runtime checks will be removed.)

I realized that on bailing out from the updater, we don't arrange to
nuke the fastload file, although the check in the reader must always
happen before the check in the updater so this is still an ok fix and
a file with a bogus entry won't be used and will be destroyed.

Leaving open to look into a better fix [and removing blocking1.3b+]

(Also, I can still get "ASSERTION: redundant multiplexed document?:
'docMapEntry->mString == nsnull'" when I start with all three of
navigator, composer and mailnews concurrently, but doesn't seem to
harmful at first look. I can also get that assertion while rapidly
clicking through pref panels that are not yet in the fastload
file. [And, I also seem to get a crash, shortly after that assertion
for pref panels, in PresShell::sPaintSuppressionCallback where the
presshell is trashed. But I don't know how that would relate to the
mux assertion].)
Flags: blocking1.3b+ → blocking1.3b-
Depends on: 192341
Maybe my comment is useful, since it is correlated with fastload:

Since today, without reason, mozilla has started to freeze EACH time I select
Bookmarks/ManageBookmarks, while all the other features work normally.  top
prints that mozilla uses 99% of CPU.

I started mozilla several times, but the problem persisted.

I replaced bookmarks.html from my profile with a fresh one, but the problem
persisted.

I read on Internet that starting with version 1.2 the fastload was introduced
and, if mozilla freezes, remove XUL.mfasl.  I removed it, and the
Bookmarks/ManageBookmarks works again...

I can send you my XUL.mfasl if you need it (4MB), maybe you want to study this
corrupted file.

I use Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226
Debian/1.2.1-9
Eugen, you are using a Mozilla build from last December.

As this bug is summarized "...post bug 188744" and bug 188744 was checked in in
mid-January, I don't think John is interested in your fastload file.
Additionally, the comment right before your one says that a fix was checked in
at January 30th.

So I think only fastload files from after January 17th/30th are interesting now.
John also states this in bug 169777 comment #172.

Thanks for trying to help anyway.
Attached patch proposed fix (obsolete) — Splinter Review
Testing help craved.

/be
Attached patch remove bogus assertion (obsolete) — Splinter Review
Thanks to jrgm for the testing help.

/be
Attachment #114270 - Attachment is obsolete: true
We should fix this for 1.3.

/be
Assignee: jrgm → brendan
Status: ASSIGNED → NEW
Flags: blocking1.3?
Flags: blocking1.3? → blocking1.3+
This should do it.

/be
Attachment #114286 - Attachment is obsolete: true
Attachment #114376 - Flags: superreview?(ben)
Attachment #114376 - Flags: review?(jrgm)
Ok, I'm done.

/be
Attachment #114376 - Attachment is obsolete: true
Attachment #114376 - Flags: superreview?(ben)
Attachment #114376 - Flags: review?(jrgm)
Attachment #114469 - Flags: superreview?(ben)
Attachment #114469 - Flags: review?(jrgm)
Comment on attachment 114469 [details] [diff] [review]
optimize nsFastLoadFileUpdater::Open to minimize Tells and Seeks

sr=ben@netscape.com
Attachment #114469 - Flags: superreview?(ben) → superreview+
I've thrown this build down the stairs in about as many weird ways as I could
imagine, and it all works correctly. The fastload file looks sane, and there
are no obvious problems. It definitely solves the specific case noted at the
beginning of this bug. I also checked perf on linux and I don't see any
significant change in startup time.

Patch looks good! r=jrgm, and thanks!

[Note: I can get some "redundant multiplexed document?" assertions, either, in
some cases, by starting multiple master documents concurrently (e.g., mail,
edit, browser), or by rapidly changing panels in the pref dialog. But that is
a separate issue from the hangs since it appears to defend correctly. I filed
bug 193530 for investigation].
Comment on attachment 114469 [details] [diff] [review]
optimize nsFastLoadFileUpdater::Open to minimize Tells and Seeks

r=jrgm
Attachment #114469 - Flags: review?(jrgm)
Attachment #114469 - Flags: review+
Attachment #114469 - Flags: approval1.3?
Fix checked in for 1.3 final.  Many thanks to jrgm -- onward to bug 193530!

/be
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: fixed1.3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: