Closed Bug 189832 Opened 23 years ago Closed 23 years ago

another case where fastload file can get corrupted (post bug 188744)

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: jrgmorrison, Assigned: brendan)

References

Details

(Whiteboard: fixed1.3)

Attachments

(2 files, 3 obsolete files)

patch to guard against bogus entries in the object map 23 years ago John Morrison 2.29 KB, patch	bugs : review+ jst : superreview+ asa : approval1.3b+	Details \| Diff \| Splinter Review
proposed fix 23 years ago Brendan Eich [:brendan] 7.97 KB, patch		Details \| Diff \| Splinter Review
remove bogus assertion 23 years ago Brendan Eich [:brendan] 8.66 KB, patch		Details \| Diff \| Splinter Review
set mSkipOffset when "pre-reading" a singleton on update 23 years ago Brendan Eich [:brendan] 10.08 KB, patch		Details \| Diff \| Splinter Review
optimize nsFastLoadFileUpdater::Open to minimize Tells and Seeks 23 years ago Brendan Eich [:brendan] 10.84 KB, patch	jrgmorrison : review+ bugs : superreview+ dbaron : approval1.3+	Details \| Diff \| Splinter Review

John Morrison

Reporter

Description

•

23 years ago

This is a comment carried over from bug 188744. I need to look into this more. --- Additional Comment #5 From John Morrison 2003-01-16 22:35 ------- The corruption I'm seeing is reproducible as follows: 1) start nav; quit 2) start nav; quit 3) start nav; start msgcompo; quit 4) start nav; start messenger; triggers commonDialog; \ start messengercompose.xul; send a message (get \ progress dialog, and cert dialog [clientauthask.xul] \ quit 5) start messenger (-mail and avoid profile manager [-P]); \ get commonDialog login; start AccountManager.xul; \ ASSERT!!! and bork mObjectMap What I'm seeing in debug: I come in at nsFastLoadFileUpdater::Open, where it populates the object map. That all goes fine and I have 196 objects with first mOID being 8. Next, I pick it up on first call to nsFastLoadFileWriter::WriteObjectCommon after the PL_DHASH_ADD and inside of the !entry->mObject test, so we've added another entry. But entryCount is 196. Huh? When I inspect raw memory for mObjectMap, the entry for mOID of 8 is gone (?). Now, the entry count of 196 means we now get a duplicate mOID of 1576 in the map and that leads to trouble down the road. But I can't see where something would kill that first entry. [By the way, I have reproduced this in todays' mozilla win32 build so this corruption is independent of this fix, it would seem]. (I'm also referring to the untagged values for mOID [(index+1)<<3]).

John Morrison

Reporter

Comment 1

•

23 years ago

I need to go back and confirm that I really did add an entry to the hash as described above. And need to trace through where the first entry is "going".

Status: NEW → ASSIGNED

Serge Gautherie (:sgautherie)

Updated

•

23 years ago

Blocks: FastLoadHang

John Morrison

Reporter

Comment 2

•

23 years ago

Actually, I was somewhat confused earlier (what's new) > That all goes fine and I have 196 objects with first mOID being 8. Actually, that's not true. In nsFastLoadFileUpdater::Open, aReader->mFooter.mNumSharpObjects is equal to 197. I start populating the mObjectMap hash with the vector from the disk file. At index 80 of the vector, I get back an 'obj' (mReadEntry) with the same value that I had for the entry at index 0 of the vector. So, the PL_DHASH_ADD winds up overwriting that hash entry. At the end of the loop, 197 disk entries have resulted in 196 hash entries. (And the last entry has mOID of 1576 (index 196)). When it goes to write this mObjectMap out to disk, it can't find an entry with index 0 because it was overwritten above. It may also do an ABW, because it allocates a vector of 196 elements and then tries to copy into 'objvec[196]' (although I need to go check this).

John Morrison

Reporter

Comment 3

•

23 years ago

[Just dumping some notes here for my own memory as I flail about in this code. Still pondering if this is an edge case that doesn't have bad affects or if it's something more...]. So, that first object (objvec[0]) is (no surprise) an nsSystemPrincipal (nsIPrincipal) with the initial serializing call to nsFastLoadFileWriter::WriteObjectCommon coming from nsTranscodeJSPrincipals. objvec[80] (i.e., mOID 648) gets added to object map when I open messengercompose.xul from navigator.xul, after a session where I had previously serialized all of navigator.xul into the fastload file. (Note below that objvec[0] is read in with obj of 0x9. (Didn't I earlier serialize an obj ref into that slot?)). *x*: StartMuxDoc 035CB8C8, R, 016C9F38, 00000000, chrome://messenger/content/messengercompose/messengercompose.xul $$$: nsFastLoadFileUpdater::Open, i = 0, obj = 0x00000009, oid = 8, off = 709 (2c5), str = 807, wk = 0 $$$: nsFastLoadFileUpdater::Open, i = 1, obj = 0x00000011, oid = 16, off = 304195 (4a443), str = 1, wk = 0 $$$: nsFastLoadFileUpdater::Open, i = 2, obj = 0x00000019, oid = 24, off = 305865 (4aac9), str = 1, wk = 0 ...

John Morrison

Reporter

Comment 4

•

23 years ago

I've found that I can get a number of related assertions when I just open with '-browser -mail -edit' and then clicking around in the pref tree to switch panels. I also got one hang and one crash in a current trunk opt. build, so we still need to fix this bug. It all revolves around the serialization / deserialization of the nsSystemPrincipal object in a situation where there are several "not-live" instances in the fastload file and one (or more) instance that are live in memory and then triggering nsFastLoadFileUpdater::Open for a dialog that is not already in the fastload file. However, I don't really understand the problem fully enough to directly fix it for 1.3. I do know (or think I know) how I can add a band-aid to ::Open so that it will defend against some of the badness, and I'm going to work on that. Simpler testcase: 1) set up mailnews to not login to the server on startup (so we don't have to deal with commonDialog.xul). 2) delete the fastload file. 3) start with "./mozilla -P myProf -browser" and quit. 4) start with "./mozilla -P myProf -browser", open message compose and quit. 5) start with "./mozilla -P myProf -mail", and quit. 6) start with "./mozilla -P myProf -mail", and open any dialog (which given the above will not be in the fastload file). 7) mObjectMap will be corrupted (actually, add an assertion to the top of nsFastLoadFileWriter::WriteSharpObjectInfo that says the following to know that you are doomed) NS_ASSERTION(aInfo.mCIDOffset != 0 && aInfo.mCIDOffset < 5000000, "about to hork aInfo.mCIDOffset!!"); [where 5000000 is an arbitrary upper bound on the size of the fastload file; it should never be this big in practice.] The (sucky) defense is that in nsFastLoadFileUpdater::Open, the mCIDOffset values must monotonically increase by definition. If they don't, then the fastload file is broken and we should (unfortunately, no other choice right now) abort the fastload. I just have to make sure that the higher level xul prototype and js code will recover gracefully from an abort at this point.

Flags: blocking1.3b+

John Morrison

Reporter

Comment 5

•

23 years ago

(this is the real fastload blocker, methinks).

Flags: blocking1.3b+ → blocking1.3b?

John Morrison

Reporter

Comment 6

•

23 years ago

Attached patch patch to guard against bogus entries in the object map — Details — Splinter Review

I need to test this further and think about whether this is what I want to do for 1.3b. But this seems like a reasonable way to guard against this bug (albeit by occasionally blowing away the fastload file :-/ ).

Asa Dotzler [:asa]

Updated

•

23 years ago

Flags: blocking1.3b? → blocking1.3b+

John Morrison

Reporter

Comment 7

•

23 years ago

Comment on attachment 112946 [details] [diff] [review] patch to guard against bogus entries in the object map actually, I don't have any changes to make. This is the patch. jst: could you r or sr? I went over this with jag for about a half hour already. If you have any questions, let me know.

Attachment #112946 - Attachment description: draft patch to guard against bogus entries in the object map → patch to guard against bogus entries in the object map

Attachment #112946 - Flags: superreview?(jst)

Attachment #112946 - Flags: review?(jaggernaut)

John Morrison

Reporter

Comment 8

•

23 years ago

I should note that at '@@ -1624,6 +1631,11 @@' where I don't bail out, it's okay because we will pick the bogus zero offset out at the beginning of the next fastload session, in either of the two other places where I bail.

Johnny Stenback (:jst)

Comment 9

•

23 years ago

Comment on attachment 112946 [details] [diff] [review] patch to guard against bogus entries in the object map + //XXXjrgm bug #189832 [temp. hack]; if an offset is zero, we've hit + // the bug and most abort fastload now. + NS_ASSERTION(aInfo->mCIDOffset != 0, + "fastload reader: Offset into file cannot be zero!"); + if (aInfo->mCIDOffset == 0) + return NS_ERROR_UNEXPECTED; Change assert(!foo); if (foo) { ... } to if (!foo) { NS_ERROR(); ... }, and replace "most" with "must" in the comment. Same thing in the last hunk of this patch. sr=jst with that change.

Attachment #112946 - Flags: superreview?(jst) → superreview+

Ben Goodger (use ben at mozilla dot org for email)

Comment 10

•

23 years ago

Comment on attachment 112946 [details] [diff] [review] patch to guard against bogus entries in the object map r=ben@netscape.com

Attachment #112946 - Flags: review?(jaggernaut) → review+

Asa Dotzler [:asa]

Comment 11

•

23 years ago

Comment on attachment 112946 [details] [diff] [review] patch to guard against bogus entries in the object map a=asa (on behalf of drivers) for checkin to 1.3beta

Attachment #112946 - Flags: approval1.3b+

John Morrison

Reporter

Comment 12

•

23 years ago

Checked in to trunk. (Discussed on irc with jst, and leaving this as is; when we get a real fix, the assertions should stay, but the memset becomes ifdef debug, and the runtime checks will be removed.) I realized that on bailing out from the updater, we don't arrange to nuke the fastload file, although the check in the reader must always happen before the check in the updater so this is still an ok fix and a file with a bogus entry won't be used and will be destroyed. Leaving open to look into a better fix [and removing blocking1.3b+] (Also, I can still get "ASSERTION: redundant multiplexed document?: 'docMapEntry->mString == nsnull'" when I start with all three of navigator, composer and mailnews concurrently, but doesn't seem to harmful at first look. I can also get that assertion while rapidly clicking through pref panels that are not yet in the fastload file. [And, I also seem to get a crash, shortly after that assertion for pref panels, in PresShell::sPaintSuppressionCallback where the presshell is trashed. But I don't know how that would relate to the mux assertion].)

Flags: blocking1.3b+ → blocking1.3b-

Olivier Cahagne

Updated

•

23 years ago

Depends on: 192341

Eugen Dedu

Comment 13

•

23 years ago

Maybe my comment is useful, since it is correlated with fastload: Since today, without reason, mozilla has started to freeze EACH time I select Bookmarks/ManageBookmarks, while all the other features work normally. top prints that mozilla uses 99% of CPU. I started mozilla several times, but the problem persisted. I replaced bookmarks.html from my profile with a fresh one, but the problem persisted. I read on Internet that starting with version 1.2 the fastload was introduced and, if mozilla freezes, remove XUL.mfasl. I removed it, and the Bookmarks/ManageBookmarks works again... I can send you my XUL.mfasl if you need it (4MB), maybe you want to study this corrupted file. I use Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9

Andreas Kunz

Comment 14

•

23 years ago

Eugen, you are using a Mozilla build from last December. As this bug is summarized "...post bug 188744" and bug 188744 was checked in in mid-January, I don't think John is interested in your fastload file. Additionally, the comment right before your one says that a fix was checked in at January 30th. So I think only fastload files from after January 17th/30th are interesting now. John also states this in bug 169777 comment #172. Thanks for trying to help anyway.

Brendan Eich [:brendan]

Assignee

Comment 15

•

23 years ago

Attached patch proposed fix (obsolete) — Details — Splinter Review

Testing help craved. /be

Brendan Eich [:brendan]

Assignee

Comment 16

•

23 years ago

Attached patch remove bogus assertion (obsolete) — Details — Splinter Review

Thanks to jrgm for the testing help. /be

Attachment #114270 - Attachment is obsolete: true

Brendan Eich [:brendan]

Assignee

Comment 17

•

23 years ago

We should fix this for 1.3. /be

Assignee: jrgm → brendan

Status: ASSIGNED → NEW

Flags: blocking1.3?

Asa Dotzler [:asa]

Updated

•

23 years ago

Flags: blocking1.3? → blocking1.3+

Brendan Eich [:brendan]

Assignee

Comment 18

•

23 years ago

Attached patch set mSkipOffset when "pre-reading" a singleton on update (obsolete) — Details — Splinter Review

This should do it. /be

Attachment #114286 - Attachment is obsolete: true

Brendan Eich [:brendan]

Assignee

Updated

•

23 years ago

Attachment #114376 - Flags: superreview?(ben)

Attachment #114376 - Flags: review?(jrgm)

Brendan Eich [:brendan]

Assignee

Comment 19

•

23 years ago

Attached patch optimize nsFastLoadFileUpdater::Open to minimize Tells and Seeks — Details — Splinter Review

Ok, I'm done. /be

Attachment #114376 - Attachment is obsolete: true

Brendan Eich [:brendan]

Assignee

Updated

•

23 years ago

Attachment #114376 - Flags: superreview?(ben)

Attachment #114376 - Flags: review?(jrgm)

Brendan Eich [:brendan]

Assignee

Updated

•

23 years ago

Attachment #114469 - Flags: superreview?(ben)

Attachment #114469 - Flags: review?(jrgm)

Ben Goodger (use ben at mozilla dot org for email)

Comment 20

•

23 years ago

Comment on attachment 114469 [details] [diff] [review] optimize nsFastLoadFileUpdater::Open to minimize Tells and Seeks sr=ben@netscape.com

Attachment #114469 - Flags: superreview?(ben) → superreview+

John Morrison

Reporter

Comment 21

•

23 years ago

I've thrown this build down the stairs in about as many weird ways as I could imagine, and it all works correctly. The fastload file looks sane, and there are no obvious problems. It definitely solves the specific case noted at the beginning of this bug. I also checked perf on linux and I don't see any significant change in startup time. Patch looks good! r=jrgm, and thanks! [Note: I can get some "redundant multiplexed document?" assertions, either, in some cases, by starting multiple master documents concurrently (e.g., mail, edit, browser), or by rapidly changing panels in the pref dialog. But that is a separate issue from the hangs since it appears to defend correctly. I filed bug 193530 for investigation].

John Morrison

Reporter

Comment 22

•

23 years ago

Comment on attachment 114469 [details] [diff] [review] optimize nsFastLoadFileUpdater::Open to minimize Tells and Seeks r=jrgm

Attachment #114469 - Flags: review?(jrgm)

Attachment #114469 - Flags: review+

Attachment #114469 - Flags: approval1.3?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

23 years ago

Attachment #114469 - Flags: approval1.3? → approval1.3+

Brendan Eich [:brendan]

Assignee

Comment 23

•

23 years ago

Fix checked in for 1.3 final. Many thanks to jrgm -- onward to bug 193530! /be

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Asa Dotzler [:asa]

Updated

•

23 years ago

Whiteboard: fixed1.3

You need to log in before you can comment on or make changes to this bug.