Closed Bug 341595 Opened 19 years ago Closed 18 years ago

Crash opening mail [@ nsXULPrototypeElement::Deserialize][@ nsXULPrototypeAttribute::Finalize]

Categories

(Core :: DOM: Core & HTML, defect)

x86
All
defect
Not set
critical

VERIFIED FIXED

People

(Reporter: ajschult784, Assigned: brendan)

Details

(5 keywords, Whiteboard: Please try comment 60 before commenting further. Thanks.)

Attachments

(9 files, 1 obsolete file)

If I open seamonkey mail with trunk build 2006061409, it usually crashes with:
  terminate called after throwing an instance of 'std::bad_alloc'
    what():  St9bad_alloc
This seems to have regressed between 2006061209 and 2006061309 (it's not 100% consistent), but the top of the stack was touched by bug 255942, so that's probably the culprit.
Attached file stacktrace
(gdb) frame 15
#15 0x0696cfd1 in nsXULPrototypeElement::Deserialize (this=0x9135498, aStream=0x89812b0, aGlobal=0x90fde40, aDocumentURI=0x9009dd8, aNodeInfos=0xbf9a9c20) at /build/andrew/moz-debug/mozilla/content/xul/content/src/nsXULElement.cpp:2600
2600        mAttributes = new nsXULPrototypeAttribute[mNumAttributes];
(gdb) p mNumAttributes
$1 = 2130706559
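A quick sanity check on the value gdb printed, as a minimal sketch in Python. The 8-byte element size is an assumption for illustration only (the real size of nsXULPrototypeAttribute is not given in this report). The count has a suspiciously regular byte pattern, and with any element size of that order the requested array would not fit in a 32-bit address space, which would be consistent with the std::bad_alloc.

```python
# Value printed by gdb for mNumAttributes in frame 15.
num_attributes = 2130706559

# In hex the count is 7f 00 00 7f, which looks like stray data rather than a
# real attribute count.
as_hex = hex(num_attributes)

# ASSUMPTION: 8 bytes per array element, chosen only for illustration.
ASSUMED_ELEMENT_SIZE = 8
requested_bytes = num_attributes * ASSUMED_ELEMENT_SIZE

# A 32-bit process can address at most 2**32 bytes.
ADDRESS_SPACE_32BIT = 2 ** 32
cannot_fit = requested_bytes > ADDRESS_SPACE_32BIT

print(as_hex, requested_bytes, cannot_fit)
```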
Blocks: dom-agnostic
Mark, could your changes have changed the format of the fastload file in any way? If so, we'd need to rev that version number...
(In reply to comment #2)
> Mark, could your changes have changed the format of the fastload file in any
> way? If so, we'd need to rev that version number...
XUL_FASTLOAD_FILE_VERSION needs to be bumped (dropped, really) by one. Do not change MFL_FILE_VERSION; that versions the XPCOM-only "envelope", not the XUL-specific "contents", of the fastload file format.
/be
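The reason for revving a format version can be sketched as follows. This is a toy model in Python, not Mozilla's code: the header layout, the constant TOY_CACHE_VERSION, and both helper functions are hypothetical. The idea is only that a cache file records the format version it was written with, and a reader built for a different version discards the file instead of parsing it with the wrong layout.

```python
import struct

# HYPOTHETICAL version constant for this toy format. It stands in for the idea
# behind XUL_FASTLOAD_FILE_VERSION and is not the real value.
TOY_CACHE_VERSION = 2


def write_cache(payload: bytes, version: int = TOY_CACHE_VERSION) -> bytes:
    """Prefix the payload with a 4-byte big-endian format version."""
    return struct.pack(">I", version) + payload


def read_cache(blob: bytes, expected_version: int = TOY_CACHE_VERSION):
    """Return the payload, or None if the file was written in another format."""
    if len(blob) < 4:
        return None
    (version,) = struct.unpack_from(">I", blob, 0)
    if version != expected_version:
        return None  # stale format: the caller should regenerate the cache
    return blob[4:]


old_file = write_cache(b"old layout", version=1)
new_file = write_cache(b"new layout")
print(read_cache(old_file), read_cache(new_file))
```

In this sketch a file written by the older build is rejected outright, so its bytes are never interpreted under the new layout.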
Fast load version was fixed, as per Brendan's comments, in:
  Checking in content/xul/document/public/nsIXULPrototypeCache.h; new revision: 1.33; previous revision: 1.32
  Checking in xpcom/io/nsFastLoadFile.h; new revision: 3.23; previous revision: 3.22
This should be ok now.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
I _did_ see this with earlier builds on Windows XP too, but now it's fixed with build 2006-06-14-20 of SeaMonkey trunk on Windows XP. (I know the build ID doesn't look like it should include Mark's fix, which landed at 2006-06-14 20:06 PDT, but the FTP server from which I downloaded this CREATURE build shows a deposit timestamp of 15-Jun-2006 01:12, so its build ID must be the time it pulled its CVS source files.) Verified FIXED.
Status: RESOLVED → VERIFIED
OS: Linux → All
Actually, this bug is still alive and well. My new steps to reproduce are:
1. seamonkey -mail
2. kill _seamonkey_pid_
3. seamonkey -mail [crash]
Same stack as before. A different profile doesn't exhibit the bug, so I'll try to track down the important difference.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Actual steps to reproduce (works with any profile):
1a. seamonkey
1b. start mail
2. kill _seamonkey_pid_
3. seamonkey -mail [crash]
*** Bug 341927 has been marked as a duplicate of this bug. ***
There are about 192 Talkback crash incidents [@ nsXULPrototypeAttribute::Finalize] at the moment, starting with the SeaMonkey 2006061309 build and still occurring with current nightlies. So I think it is still the same core bug, probably caused by the checkin for Bug 255942. Reproducing the crash is very simple: start a SeaMonkey nightly after install/unpack, close it, and start it again. From the second start onward, it crashes every time I try to open MailNews.
Blocks: 342132
OK, with the steps from comment 8 I can reliably reproduce this. I even get the same crash if I just make nsXULPrototypeElement::Serialize bail out immediately with NS_ERROR_FAILURE. So _something_ is seriously wrong...
Flags: blocking1.9a1+
If I do bail out of nsXULPrototypeDocument::Write instead, things are "ok" (lots of asserts, but no crash).
OK, so the caller of nsXULPrototypeDocument::Write just ignores the return value; that's probably a separate bug here. When I actually open mail, I don't see Write() returning an error... Why does the fact that the process gets killed matter? Does that somehow affect the fastload file in a bad way?
OK, killing doesn't matter either. Just quitting via Ctrl-Q has the same result (and produces the same fastload file). So something is just busted elsewhere...
OK, I did some more tracking. We die when deserializing the nsXULPrototypeElement for (line 70):
  <script type="application/x-javascript" src="chrome://communicator/content/findUtils.js"/>
in mailnews/base/resources/content/mailWindowOverlay.xul. This comes right after the nsXULPrototypeScript for (line 69):
  <script type="application/x-javascript" src="chrome://messenger/content/mail-offline.js"/>
which is the first script in that overlay whose prototype ends up with a non-null mScriptObject (and for which therefore SerializeOutOfLine got called). Now I'm not sure whether that's relevant, but it does seem a little suspicious.
I did look at the data that SerializeOutOfLine writes (hopefully to a different area of the fastload file), and it doesn't look like the bogus data we end up reading back later... So it's still not clear to me what's up. I'm not going to be able to debug this until I get back in mid-July, though, so someone else should really take over.
Is this a duplicate of Bug 342332 and Bug 342510?
*** Bug 342510 has been marked as a duplicate of this bug. ***
Blocks: 342332
*** Bug 342332 has been marked as a duplicate of this bug. ***
No longer blocks: 342332
It seems that this bug was introduced with the fix for Bug 341709 .
No, it's not. See comment 0. How can this bug be caused by a fix that was committed two days after this was filed?
(In reply to comment #21)
> No, it's not. See comment 0. How can this bug be caused by a fix that was
> committed two days after this was filed?
Andrew, at least in the Mac version this error only appeared AFTER the other bug was fixed.
In the Fastload process, it seems that the read operation for XUL.mfasl doesn't follow the binary stream order. That is to say, the read and write operations don't match exactly. Is it designed to be that way? To improve performance, the real content of a duplicated entry isn't written to the fastload file a second time. If seamonkey reads the duplicate first and the entry with the real content afterwards, it crashes.
Er... Reads and writes should be completely symmetric for a fastload file. Where is there an asymmetry?
I added some printf statements to nsXULPrototypeElement::Serialize and nsXULPrototypeElement::Deserialize to print the attributeValue being read and written. The two sequences aren't symmetric in my output.
*** Bug 342132 has been marked as a duplicate of this bug. ***
Huh. I'd stopped checking the actual values at some point... Which node in which XUL file is the first node they differ for? How does that compare to comment 16, if at all?
Here are the logs for the read and write runs. They're a little big, so I packed them. From the logs, we can see that the first XUL nodes read and written are different:
  Deserialize (read): chrome://navigator/content/navigator.xul
  Serialize (write): chrome://reporter/content/reporterOverlay.xul
From the write log, we can see that seamonkey crashes just when deserializing mail-offline.js. It's treated as a duplicate, which isn't correct.
(In reply to comment #28)
> From the write log, we can get that seamonkey crashes just when deserializing
Should be the read log.
Ah. Files can happen in different orders, sure. The important thing is that within any given file reads and writes are the same order. And it does look like you're seeing the same thing I am in terms of where we crash. We deserialize the mail-offline.js prototype element fine, then bad things happen. In particular, I'm interested in the:
  #########The number of Children read: 0######################
  #########The number of Attributes: 0######################
lines. Where are those zeros coming from? By "duplicate" do you mean the XUL cache?
As I searched around the code, I suspect that deserializing mail-offline.js causes the crash. From the read log, we can see that it's the second time we come to mail-offline.js. Another mail-offline.js has already been deserialized and put into the cache. I call the one that causes the crash the duplicate. For the duplicate, nsXULPrototypeScript::DeserializeOutOfLine searches the cache first and gets a hit, and the deserializing process stops at that point. Going through the binary file XUL.mfasl, we find that the duplicate has the real content of the js file. Skipping it leads to the zero-attribute output, which causes the crash.
(In reply to comment #31)
> As I searched around the code, I suspect that deserializing mail-offline.js
> causes the crash.
>
> From the read log, we can get that it's the second time we come to
> mail-offline.js. Another mail-offline.js has already been deserialized and put
> into the cache.
What other mail-offline.js? What is its URI?
/be
Summary: Crash opening mail → Crash opening mail [@ nsXULPrototypeElement::Deserialize]
*** Bug 343241 has been marked as a duplicate of this bug. ***
(In reply to comment #32)
> What other mail-offline.js? What is its URI?
Maybe I didn't make myself clear. It's not a different mail-offline.js. What I mean is that this file has been serialized (or deserialized) more than once. The URIs are the same, and they're the keys for searching the cache. That results in a cache hit.
Wait. So the fastload file contains two copies of mail-offline.js? And one of them is basically sitting in the middle of our XUL document or something? The following code in nsXULPrototypeScript::SerializeOutOfLine should protect against that, I would think:
  PRBool exists = PR_FALSE;
  fastLoadService->HasMuxedDocument(urispec.get(), &exists);
Does that not work right?
(In reply to comment #35)
> Does that not work right?
I checked the fastload file. It contains two mail-offline.js. The first one has the real content of that file. I think "HasMuxedDocument" works, so the second one doesn't have the content, just the name in the fastload file.
> I checked the fastload file. It contains two mail-offline.js.
How did you determine that, if I might ask?
(In reply to comment #37)
> How did you determine that, if I might ask?
I'm using UltraEdit on Windows. I just searched for the binary string "6d00610069006c002d006f00660066006c0069006e006500" and found two occurrences.
What does that binary string correspond to?
It's "m\0a\0i\0l\0-\0o\0f\0f\0l\0i\0n\0e\0". And the write log also shows that the mail-offline.js has been written twice I think.
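The hex string can be checked directly. The sketch below, in Python, decodes it as UTF-16LE and counts occurrences in a byte buffer; the buffer is synthetic (made up for the example), not data from a real XUL.mfasl.

```python
# Hex pattern searched for in the fastload file (from the comment above).
needle_hex = "6d00610069006c002d006f00660066006c0069006e006500"
needle = bytes.fromhex(needle_hex)

# Each character is followed by a zero byte, i.e. UTF-16 little-endian text.
decoded = needle.decode("utf-16-le")


def count_occurrences(data: bytes, pattern: bytes) -> int:
    """Count non-overlapping occurrences of pattern in data."""
    return data.count(pattern)


# SYNTHETIC buffer standing in for the file contents.
sample = b"\x00\x01" + needle + b"\xff" * 8 + needle + b"\x02"
print(decoded, count_occurrences(sample, needle))
```

Note that the pattern covers only the file name, so two hits show that the name is stored twice; they do not by themselves show that the script body is stored twice.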
Alfred, I was asking what those bytes correspond to in terms of Gecko data structures...
I do think we're talking about the same thing, from two slightly different angles. Just to make sure we're on the same page, I do confirm that when we crash is indeed when we deserialize the XULPrototypeElement right after a call to DeserializeOutOfLine for mail-offline.js which finds it in the cache.
Furthermore, I just looked at what we actually end up writing out when we serialize that out-of-line script. We write out the mLineNo, then the mLangVersion, then start doing XDR stuff. The bytes end up looking like:
  0x0000003e 0x00000000 0x00000238 (mLineNo, mLangVersion, size of xdr data)
  0xdead0005 0x0000004d 0x0000002d 0x00000000 0x0000004c 0x00000000 0x7f00007f
  0x007f0100 0x03007f02 0x7d04007d 0x007d0500 0x07007d06 0x7d08007d 0x007d0900
  0x0b007d0a 0x7d0c007d 0x007d0d00 0x00006c0e 0x00006d40 0x01003b51 (the XDR data).
When we're deserializing, we crash at a point when we've read the following data for the nsXULPrototypeElement following this script:
  mType == eType_Element == 0
  mScriptTypeID == 0x4c000000
  nodeinfo index == 0
  mNumAttributes = 0x7f00007f
Modulo endianness issues, this looks like we're reading starting from the 4th word of the XDR data.
Further, if I actually look at the stream that's been passed to nsXULPrototypeElement::Deserialize, it's a nsFastLoadFileReader. Its mInputStream is a binary input stream, and for that stream we have:
  (gdb) p $t->mCursor
  $24 = 406
  (gdb) x/30xw $t->mBuffer + 374
  0x8384cb6: 0x38020000 0xdead0005 0x0000004d 0x0000002d
  0x8384cc6: 0x00000000 0x0000004c 0x00000000 0x7f00007f
  0x8384cd6: 0x007f0100 0x03007f02 0x7d04007d 0x007d0500
  0x8384ce6: 0x07007d06 0x7d08007d 0x007d0900 0x0b007d0a
  0x8384cf6: 0x7d0c007d 0x007d0d00 0x00006c0e 0x00006d40
  0x8384d06: 0x01003b51 0x02003b51 0x03003b51 0x00000051
  0x8384d16: 0x00000000 0x00000000 0x000000c3 0x0000000f
  0x8384d26: 0x00000000 0x00000004
That last set of data, modulo endianness, is exactly the size of the XDR data, followed by the XDR data itself. So we're definitely reading the XDR data here while actually deserializing a totally different file (!). So I guess the question is whether the fastload file is corrupt (implicating the serialization code) or whether the deserialization code is somehow wrong...
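The "modulo endianness" step can be replayed mechanically. The following Python sketch rebuilds the raw buffer bytes from the first eight words of the gdb dump and re-reads them the way four consecutive 32-bit reads ending at mCursor would. It assumes that gdb printed the words in little-endian host order and that the stream's 32-bit read interprets bytes as big-endian; both assumptions are consistent with 0x0000004c showing up as 0x4c000000, but neither is stated explicitly in the comment.

```python
import struct

# First eight words of `x/30xw $t->mBuffer + 374`, as printed by gdb.
gdb_words = [0x38020000, 0xdead0005, 0x0000004d, 0x0000002d,
             0x00000000, 0x0000004c, 0x00000000, 0x7f00007f]
DUMP_START = 374  # buffer offset of the first dumped word
CURSOR = 406      # mCursor as printed by gdb

# ASSUMPTION: gdb shows host-order (little-endian) words, so packing them
# little-endian recovers the raw bytes in the buffer.
raw = struct.pack("<8I", *gdb_words)


def read32_be(buf: bytes, offset: int) -> int:
    """ASSUMPTION: a 32-bit stream read interprets four bytes as big-endian."""
    return struct.unpack_from(">I", buf, offset)[0]


# The word just before the XDR data is its size.
xdr_size = read32_be(raw, 0)

# The four reads that ended at the cursor, labelled with the fields that
# Deserialize was trying to fill in.
first_read = CURSOR - DUMP_START - 4 * 4
names = ["mType", "mScriptTypeID", "nodeinfo index", "mNumAttributes"]
misread = {name: read32_be(raw, first_read + 4 * i)
           for i, name in enumerate(names)}

print(hex(xdr_size), {k: hex(v) for k, v in misread.items()})
```

Under these assumptions the four values match the ones listed in the comment, and the last one, 0x7f00007f, is the decimal 2130706559 from the original stack trace.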
Brendan, do these logs tell you anything interesting?
Just in case you want a talkback ID TB20554439Y is one such.
Summary: Crash opening mail [@ nsXULPrototypeElement::Deserialize] → Crash opening mail [@ nsXULPrototypeElement::Deserialize nsXULPrototypeAttribute::Finalize]
TB20522062X TB20522104X TB20556624K Any workaround for starting mail? All attempts now fail with SM 1.5a
(In reply to comment #45)
> TB20522062X
> TB20522104X
> TB20556624K
>
> Any workaround for starting mail?
> All attempts now fail with SM 1.5a
Yes; move away or delete your XUL.mfl (might have a different extension name in Linux/Mac).
(In reply to comment #46)
> (In reply to comment #45)
> > Any workaround for starting mail?
> > All attempts now fail with SM 1.5a
>
> Yes; move away or delete your XUL.mfl (might have a different extension name in
> Linux/Mac).
Hmmm, I don't have such a file to move away/delete. Fails on second starting of a new profile for me too.
(In reply to comment #42)
> Created an attachment (id=227793)
> DEBUG_MUX trace from first startup (writing)
From that trace:
  start 0xb1739238 (0xb173923c) chrome://messenger/content/mailWindowOverlay.xul
  select prev chrome://messenger/content/folderPane.xul offset 1020924
  select 0xb1739238 (0xb173923c) offset 1031767
  start 0xb17c3568 (0xb17c356c) chrome://messenger/content/mail-offline.js
  select prev chrome://messenger/content/mailWindowOverlay.xul offset 1031767
  select 0xb17c3568 (0xb17c356c) offset 1042360
  end 0xb17c3568 (0xb17c356c)
So far, so good. Wish we had the final offset for this segment at the time end was traced. Continuing:
  select prev chrome://messenger/content/mail-offline.js offset 1042360
  select 0xb1739238 (0xb173923c) offset 1053316
  end 0xb1739238 (0xb173923c)
The "select prev" after end, with no further mention of mail-offline.js or the end'ed URI pointer 0xb1739238, is sub-optimal on its face, and possibly a sign of the bug. It means some code called SelectMuxedDocument on a URI that was already loaded. The likely call site is http://lxr.mozilla.org/mozilla/source/content/xul/content/src/nsXULElement.cpp#2856.
The sub-optimality exposes an old hazard in xpcom/io/nsFastLoadFile.cpp: nsFastLoadFileWriter::EndMuxedDocument leaves mCurrentDocumentMapEntry non-null, so if some errant client of the FastLoad service continues to serialize after calling EndMuxedDocument, and before selecting any other document, unmapped bytes of goodness will be lost "between segments" in the mux.
This hazard exists because nsFastLoadFileWriter closes open segments lazily. That is, it expects callers never to select after end'ing, and it seeks back to the segment header to record the final length of the segment only when switching segments via select, or when closing the entire muxed file.
Back to the trace. The line that follows the "select prev" line shows a select for 0xb1739238, which is mailWindowOverlay.xul. The line immediately after this select shows the end of mailWindowOverlay.xul. The offset at which the first byte of the final segment of data from that doc was serialized to the FastLoad file is 1053316, but the "select prev" for mail-offline.js traced the first byte of the open segment it was closing at offset 1042360. If the left-over mCurrentDocumentMapEntry hazard mentioned above is biting, then more data *was* serialized for mail-offline.js even though it had already been end'ed, which would explain the bug.
What code might call SelectMuxedDocument after EndMuxedDocument? To put it another way, what code calls Select without calling End in the same method? The same "re-select the old URI returned when we selected, in which we nested" code at http://lxr.mozilla.org/mozilla/source/content/xul/content/src/nsXULElement.cpp#2856.
But Select after End should result in an assertion: "SelectMuxedDocument without prior StartMuxedDocument?" Is anyone seeing this? It's important to run with XPCOM_DEBUG_BREAK=trap in your environment, in a debugger.
Obviously, something hard to see in mhammond's landing screwed up order of ops here, and the underlying hazard bit hard. I will patch nsFastLoadFile.cpp so it closes open segments eagerly, so that end'ing and then select'ing is an even more obvious error. But the assertion "SelectMuxedDocument without prior StartMuxedDocument" should already be botching. If it isn't, we have some other bug that I do not yet understand.
/be
No assertions fire before the crash
(In reply to comment #49)
> No assertions fire before the crash
Not before the crash -- in the session before that, when creating the XUL.mfl file. Try removing that file and restarting with XPCOM_DEBUG_BREAK=trap, and report any stack backtraces that lead to botching nsFastLoadFile.cpp assertions.
/be
Right. I thought of that after I submitted the comment, but there were also no assertions in the first session, after removing XUL.mfasl.
> So far, so good. Wish we had the final offset for this segment at the time end
> was traced.
I added
  mSeekableOutput->Tell(&saveOffset);
  TRACE_MUX(('w', "end %p (%p) offset %ld\n", aURI, key.get(), saveOffset));
to the end trace to print out the offset. Is this what you want?
I don't get the assertion here in the first session either, with XPCOM_DEBUG_BREAK=trap set in my debug build.
*** Bug 343533 has been marked as a duplicate of this bug. ***
Summary: Crash opening mail [@ nsXULPrototypeElement::Deserialize nsXULPrototypeAttribute::Finalize] → Crash opening mail [@ nsXULPrototypeElement::Deserialize][@ nsXULPrototypeAttribute::Finalize]
> Hmmm, I don't have such a file to move away/delete. Fails on second starting of
> a new profile for me too.
Ian, on the Mac the file is named XUL.mfasl and is located here: /Users/Your_Name/Library/Caches/Mozilla/Profiles/Name_of_Profile
(In reply to comment #41)
> Alfred, I was asking what those bytes correspond to in terms of Gecko data
> structures...
"nsAutoString attributeValue" in function "nsXULPrototypeElement::Deserialize" corresponds to the bytes I searched for. I just wanted to know how the data is stored in the file and to search around.
bz, I do believe that we're on the same page. The following is the debug information I've found. Hope that will be useful.
In the read process, the URI chrome://messenger/content/mail-offline.js is deserialized twice.
The first time, the process goes like this. Data block:
  01a8a70 6a002e00 00007300 00000000 00000100
  01a8a80 6a010200 9ea2e209 00b87937 ff010000
  ......
  01a8b30 01000000 bb79379e 00000000 02000000
  01a8b40 11000000 02000000 12000000 18000000
(The first column is the offset in the binary file, and all the data is in little endian.)
I omit some parts, including mScriptTypeID, mNodeInfo, mNumAttributes, the loop that runs mNumAttributes times (index for aNodeInfos, attributeValue), and mNumChildren.
Read begins from offset 0x1a8a7a: childType=>0x00000001, langID=>0x00000002, script->mOutOfLine=>0x01. Then ReadObject runs until offset 0x1a8b38.
In "nsXULPrototypeScript::Deserialize", the statement "aStream->Read32(&mLineNo);" will change the offset.
  t@1 (l@1) stopped in nsFastLoadFileReader::Read at line 546 in file "nsFastLoadFile.cpp"
    546       if (entry) {
  (dbx) x entry /8
  0x083af238: 0xd071b668 0x083ad6a0 0x08abb820 0x000fe04a
  0x083af248: 0x000fe04a 0x00000000 0x00000000 0x00000000
The entry information is set by "nsFastLoadFileReader::SelectMuxedDocument". The new offset is 0x000fe04a, with new data block:
  00fe040 00000001 379e0100 0000bb79 00000000
  00fe050 0000cc2a 00004500 00000000 00053802
  00fe060 004ddead 002d0000 00000000 004c0000
entry->mNextSegmentOffset=>0x00000000, entry->mBytesLeft=>0x00002acc, mLineNo=>0x00000045, mLangVersion=>0x00000000, size of xdr data=>0x00000238, and then the xdr data.
At last, the mScriptObject will be put into the cache.
The SelectMuxedDocument call that follows EndMuxedDocument switches the "current entry" back to the old one. In the next read "rv |= aStream->Read32(&number);", the offset is also set back.
  t@1 (l@1) stopped in nsFastLoadFileReader::Read at line 546 in file "nsFastLoadFile.cpp"
    546       if (entry) {
  (dbx) x entry /8
  0x083af4f8: 0xd9351796 0x083ad8e8 0x08744b20 0x0015d287
  0x083af508: 0x001aaa12 0x80000172 0x001a8b38 0x00000000
And the deserialize process goes on to chrome://messenger/content/phishingDetector.js.
The second time, the process begins the same way. Data block:
  00fdf80 2e006500 73006a00 00000000 01000000
  00fdf90 02000000 e5096a01 79379eca 000000b8
  ......
  00fe040 00000001 379e0100 0000bb79 00000000
  00fe050 0000cc2a 00004500 00000000 00053802
Read begins from offset 0xfdf8c: childType=>0x00000001, langID=>0x00000002, script->mOutOfLine=>0x01. ReadObject runs until 0xfe04a.
In "nsXULPrototypeScript::DeserializeOutOfLine", there is a cache hit and it returns. Then childType=>0x00000000, mScriptTypeID=>0x00002acc (invalid). The read process goes wrong here.
The offset should be set by the next read "rv |= aStream->Read32(&number);". The entry information is as follows:
  t@1 (l@1) stopped in nsFastLoadFileReader::Read at line 546 in file "nsFastLoadFile.cpp"
    546       if (entry) {
  (dbx) x entry /8
  0x083ae418: 0x52243ad6 0x0838df90 0x08a5ea40 0x000fb6e9
  0x083ae428: 0x00100b16 0x000000b3 0x000fdee4 0x00000000
  (dbx) print (char *)0x0838df90
  (char *) 0x838df90 = 0x838df90 "chrome://messenger/content/mailWindowOverlay.xul"
Here, the entry->mNeedToSeek == 0 and entry->mBytesLeft != 0, so the offset doesn't change, which causes the crash at the end.
brendan, any clue from you?
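The field values quoted for the data block at 0xfe040 can be reproduced from the dump itself. This is a minimal Python sketch under two assumptions that are inferred from the numbers rather than stated in the comment: the dump prints each 32-bit group in little-endian order, and the segment header and script fields are stored big-endian in the file. The field names are the ones used in the comment.

```python
import struct

# Words copied from the data block at file offset 0xfe040 in the comment above.
BLOCK_BASE = 0xfe040
dump_words = [0x00000001, 0x379e0100, 0x0000bb79, 0x00000000,
              0x0000cc2a, 0x00004500, 0x00000000, 0x00053802,
              0x004ddead, 0x002d0000, 0x00000000, 0x004c0000]

# ASSUMPTION: each printed group is a little-endian 32-bit word, so packing
# little-endian recovers the raw file bytes.
raw = struct.pack("<12I", *dump_words)

# The out-of-line script's segment starts at file offset 0xfe04a.
SEGMENT_START = 0xfe04a - BLOCK_BASE

# ASSUMPTION: these five fields are stored big-endian.
names = ["mNextSegmentOffset", "mBytesLeft", "mLineNo", "mLangVersion",
         "xdr size"]
fields = dict(zip(names, struct.unpack_from(">5I", raw, SEGMENT_START)))

# The XDR data that follows starts with a magic word kept in host order.
xdr_magic = struct.unpack_from("<I", raw, SEGMENT_START + 20)[0]

# Second pass: after the cache hit nothing seeks, so the same bytes are
# consumed as the next element's childType and mScriptTypeID.
child_type, script_type_id = struct.unpack_from(">2I", raw, SEGMENT_START)

print({k: hex(v) for k, v in fields.items()}, hex(xdr_magic))
print(hex(child_type), hex(script_type_id))
```

With these assumptions the decoded header matches the values listed in the comment, and the second pass yields childType 0 and mScriptTypeID 0x2acc, i.e. the segment header read as element data. The decoded mLineNo, 0x45, is 69, which appears to correspond to the line of the mail-offline.js script tag in mailWindowOverlay.xul mentioned earlier in this bug.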
> brendan, any clue from you?
What you describe is consistent with what I wrote, I think. The problem is not in the reader, but in the earlier application session that wrote the fastload file. That session, the one where TRACE_MUX('w', ...) calls are made conditionally by nsFastLoadFile.cpp, is the one to debug further.
/be
One high-level question: is mail-offline.js included by separate app-components in seamonkey, but not in thunderbird? That might be part of the chain of cause and effect that leads to this bug biting only seamonkey and not thunderbird (if it does bite only seamonkey -- does it?). /be
I'm getting this crash in Firefox on trunk from 7/6/2006.
*** Bug 343838 has been marked as a duplicate of this bug. ***
Workaround: Disable the XUL cache by checking the checkbox in "Pref -> Debug -> Networking -> Disable XUL Cache"
You mean, edit the prefs.js file in the profile directory - as one can't change prefs via the UI if we're crashing on startup...
Attached patch Patch v1 (obsolete)
> (dbx) x entry /8
> 0x083ae418: 0x52243ad6 0x0838df90 0x08a5ea40 0x000fb6e9
> 0x083ae428: 0x00100b16 0x000000b3 0x000fdee4 0x00000000
> (dbx) print (char *)0x0838df90
> (char *) 0x838df90 = 0x838df90
> "chrome://messenger/content/mailWindowOverlay.xul"
>
> Here, the entry->mNeedToSeek == 0 and entry->mBytesLeft != 0, so the offset
> doesn't change, which causes the crash at the end.
From the above debug info, we can see that mBytesLeft is 0xb3. It would have to be zero to trigger the Seek in Read at that point. If an object definition has already been read, ReadObject will skip it. I think the problem here is that mBytesLeft should also be adjusted at that time.
> The "select prev" after end, with no further mention of mail-offline.js or the
> end'ed URI pointer 0xb1739238, is sub-optimal on its face, and possibly a sign
> of the bug.
After debugging, I found that all the js files referenced by mailWindowOverlay.xul, except mail-offline.js, had already been loaded. So no *MuxedDocument functions are called after "end 0xb1739238". I think that's why there is no further mention after that.
I don't know whether this is the right way to go. Brendan, any opinion?
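The suspected failure mode can be illustrated with a toy model. The Python sketch below is not nsFastLoadFileReader: the class, its fields, and the data layout are all hypothetical, and only the idea comes from the comment above. A reader walks one document's segments in a multiplexed file and relies on a bytes-left counter to know when to seek to the next segment; if skipping an already-read object moves the file position without decreasing that counter, the next read runs into another document's bytes.

```python
class ToyMuxReader:
    """HYPOTHETICAL reader for one document stored as (offset, length) segments."""

    def __init__(self, data, segments, adjust_on_skip):
        self.data = data
        self.segments = list(segments)
        self.adjust_on_skip = adjust_on_skip
        self.index = 0
        self.pos, self.bytes_left = self.segments[0]

    def read(self, count):
        out = bytearray()
        while count > 0:
            if self.bytes_left == 0:
                # Segment exhausted: seek to this document's next segment.
                self.index += 1
                self.pos, self.bytes_left = self.segments[self.index]
            take = min(count, self.bytes_left)
            out += self.data[self.pos:self.pos + take]
            self.pos += take
            self.bytes_left -= take
            count -= take
        return bytes(out)

    def skip(self, count):
        """Step over an object that was already deserialized earlier."""
        self.pos += count
        if self.adjust_on_skip:
            self.bytes_left -= count  # the bookkeeping suggested above


# Layout: document A's first segment ("AAAA" plus a skippable object "SSSS"),
# then bytes of another document ("BBBB"), then A's second segment ("CCCC").
DATA = b"AAAA" + b"SSSS" + b"BBBB" + b"CCCC"
A_SEGMENTS = [(0, 8), (12, 4)]


def read_after_skip(adjust_on_skip):
    reader = ToyMuxReader(DATA, A_SEGMENTS, adjust_on_skip)
    reader.read(4)   # consume "AAAA"
    reader.skip(4)   # skip the already-known object "SSSS"
    return reader.read(4)


print(read_after_skip(False), read_after_skip(True))
```

In this toy, the reader without the adjustment returns the other document's bytes, while the adjusted reader seeks to the next segment first. Whether the real reader behaves the same way is exactly what the patch under review is meant to settle.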
Attachment #228432 - Flags: review?(brendan)
One line is deleted in the patch. I take it as a redundant one.
I'll have a look later, but why patch nsFastLoadFile.cpp unless it had a latent bug that surfaced only because Mark Hammond's Python-for-XUL patch exposed it? If so, what change of Mark's actually made the latent bug bite? I still smell a problem at a higher layer. /be
(In reply to comment #61)
> You mean, edit the prefs.js file in the profile directory - as one can't change
> prefs via the UI if we're crashing on startup...
I assume that you can start up the SeaMonkey browser (not mail) normally. If not, try setting "nglayout.debug.disable_xul_cache" to true in prefs.js...
(In reply to comment #60)
> Workaround: Disable XUL cache in by checking the checkbox in
> "Pref -> Debug -> Networking -> Disable XUL Cache"
That solved the problem for me. My thanks to whoever had the foresight to put that pref in SeaMonkey!
> The sub-optimality exposes an old hazard in xpcom/io/nsFastLoadFile.cpp:
> nsFastLoadFileWriter::EndMuxedDocument leaves mCurrentDocumentMapEntry
> non-null, so if some errant client of the FastLoad service continues to
> serialize after calling EndMuxedDocument, and before selecting any other
> document, unmapped bytes of goodness will be lost "between segments" in the
> mux.
This really gave me a hint for debugging. I added some print statements to nsBufferedInputStream::ReadSegments, nsBufferedOutputStream::Write and nsBufferedOutputStream::WriteSegments to get the "count" value of each read and write. They should match each other, but they don't for mailWindowOverlay.xul. In the read process, mailWindowOverlay.xul has already been deserialized with the specific oid and is skipped. If the Python-for-XUL patch has some latent effects, why the object has been deserialized before the reference may be a spot to investigate further.
Just to confirm that Pref -> Debug -> Networking -> Disable XUL Cache 'solved' the problem for me.
Works for me with build Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.9a1) Gecko/20060709 SeaMonkey/1.5a. Does not crash, even with XUL Cache enabled!
(In reply to comment #69)
> Works for me with built Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US;
> rv:1.9a1) Gecko/20060709 SeaMonkey/1.5a.
>
> Does not crash, even with XUL Cache enabled!
That's clear from comment 0; please read all comments carefully before adding more.
Happens also in BeOS port of SeaMonkey, latest tested version is from 2006-07-08-trunk sources
Hi guys, and sorry for my invisibility. I'm inclined to agree with Brendan that the problem is the writing of the .mfl file rather than the reading. I don't see the "SelectMuxedDocument without prior StartMuxedDocument?" assertion. Coming at this from the "problem is writing" angle, I came up with the following:
To get a valid XUL.mfl:
* remove XUL.mfl
* start seamonkey with "-mail", and press Ctrl+Q
If you follow this process, you will always be able to restart seamonkey with "-mail" or with no args - it will always work. The XUL.mfl file will always be exactly the same size in bytes each time you follow this process.
To get an invalid XUL.mfl:
* remove XUL.mfl
* start seamonkey with no args (ie, Nav), and press Ctrl+Q
* start seamonkey -mail and press Ctrl+Q
From this point on, "seamonkey -mail" will always crash at startup. Navigator never seems to. I've seen a couple of different sizes for this file while testing.
I had a bit of a look at the DEBUG_MUX tracing. Of note, when following the process to get an invalid .mfl, the various mux.wtrace files show *no* mail specific items being written, including for the "-mail" run - eg, none of these runs ever write 'mailWindowOverlay.xul' nor 'mail-offline.js' to mux.wtrace. In comparison, when following the process to generate a valid .mfl file, both mailWindowOverlay.xul and mail-offline.js appear in mux.wtrace, for all runs. In all cases, these files *do* appear in the various mux.rtrace files.
That *seems* significant to me, but I'm not sure yet, and out of time for today. I hope this is some help - I'll try and dig more tomorrow.
For what it's worth, the simplest explanation for why this didn't break before is that it was broken all along, but we got a bogus index into the nodeinfo array, got a null nodeinfo, and bailed with NS_ERROR_UNEXPECTED. With the extra 32-bit language IDs around, the bogus bytes happen for mNumAttributes instead, which crashes. Doesn't really help with the issue of _what_ is broken, though, even if all that is true. Brendan, I'm not quite sure I follow comment 48. Are you saying that serializing bits of the script data after calling EndMuxedDocument is the problem? Because it seems correct to me to reselect the URI that was selected before we called SelectMuxedDocument, no? And that Select call would come after EndMuxedDocument. In other words, the pattern in nsXULPrototypeScript::Serialize looks right to me. What I'm not sure about is whether the offset listed for mailWindowOverlay.xul should always be the same... It's the same when we initially select mailWindowOverlay.xul and when we select mail-offline.js, but when we reselect mailWindowOverlay.xul we now have a different offset (1042360 vs 1031767). Could that cause issues? If so, how could that arise?
OK, as an experiment I changed the code in nsXULPrototypeElement::Deserialize on the 1.8 branch from:
  3117         mNodeInfo = aNodeInfos->SafeObjectAt(number);
  3118         if (!mNodeInfo)
  3119             return NS_ERROR_UNEXPECTED;
to:
          mNodeInfo = aNodeInfos->SafeObjectAt(number);
          if (!mNodeInfo) {
              NS_ERROR("DEAD");
              return NS_ERROR_UNEXPECTED;
          }
and tried the steps from this bug. I hit the NS_ERROR when I do that; in gdb we have:
  (gdb) frame
  #3 0xb67c8b3d in nsXULPrototypeElement::Deserialize (this=0xb4aab1d8, aStream=0x83a3b88, aContext=0xb4a6df30, aDocumentURI=0x83ce0e0, aNodeInfos=0xbfffe8d0) at ../../../../../mozilla/content/xul/content/src/nsXULElement.cpp:3119
  3119             NS_ERROR("DEAD");
  (gdb) p number
  $4 = 10956
  (gdb) p aNodeInfos->Count()
  $5 = 119
so we're pretty definitely reading bogus data there on branch too; it just happens that we're reading it in a place where we "detect" that it's bogus and abort fastload.
Given that, I'm pretty sure the effect of Mark's patch is indeed to just shift the data around in the fastload file a bit so we get a sane, though incorrect, nodeinfo index (0 to be exact) and then get a totally bogus number for the _next_ Read32.
So I'd say the right place to look for the bug isn't really in the changes Mark made. Good thing, because I'd checked those over with a fine-toothed comb and they all looked OK to me. ;)
(In reply to comment #73)
> What I'm not sure about is whether the offset listed for mailWindowOverlay.xul
> should always be the same... It's the same when we initially select
> mailWindowOverlay.xul and when we select mail-offline.js, but when we reselect
> mailWindowOverlay.xul we now have a different offset (1042360 vs 1031767).
> Could that cause issues? If so, how could that arise?
I think the offset for mailWindowOverlay.xul doesn't need to be the same. When we reselect mailWindowOverlay.xul, the offset value will be changed:
http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsFastLoadFile.cpp#1531
Actually, mCurrentSegmentOffset stores the offset of the placeholder (nextSegmentOffset and length) for the previous segment. The next time select is called, the length value will be updated:
http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsFastLoadFile.cpp#1492
In this case, 1031767 is the starting offset for mailWindowOverlay.xul, 1042360 should be the starting offset for mail-offline.js, and 1053316 should be the offset when the serialization of mail-offline.js is finished.
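The arithmetic on those three offsets is worth spelling out. The Python sketch below only subtracts the offsets from the DEBUG_MUX write trace quoted in comment 48; the variable names are made up for the example.

```python
# Offsets from the DEBUG_MUX write trace quoted in comment 48.
overlay_start = 1031767  # first select of mailWindowOverlay.xul
script_start = 1042360   # select of mail-offline.js
script_end = 1053316     # reselect of mailWindowOverlay.xul

# Bytes written between consecutive selects.
overlay_first_segment = script_start - overlay_start
script_segment = script_end - script_start

print(overlay_first_segment, script_segment, hex(script_segment))
```

The second difference is 10956, or 0x2acc. That is the same number as the mBytesLeft value decoded from the segment header in the read-side dump, and the same as the bogus `number` printed by gdb on the 1.8 branch. The traces come from different builds and machines, so the match may be partly a coincidence, but it is at least suggestive that the segment length field of mail-offline.js is being consumed as element data.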
(In reply to comment #74)
> So I'd say the right place to look for the bug isn't really in the changes Mark
> made. Good thing, because I'd checked those over with a fine-toothed comb and
> they all looked OK to me. ;)

Nice experiment and good point! Hope that your vacation wasn't affected by this bug :-)
(In reply to comment #74)
> so we're pretty definitely reading bogus data there on branch too; it just
> happens that we're reading it in a place where we "detect" that it's bogus and
> abort fastload.

Argh, I ass-u-med that we had no such "latent" bug, or it would have been reported already due to the Aborted.mfasl turds. I'll take this bug and evaluate Alfred's patch in detail tomorrow.

We need a FasterLoad bug on file, to deCOMtaminate the fastload code and switch it to use mmap (and to compute the checksum on smaller pieces of the file, and avoid deserializing "cold" JS functions, etc. -- some of these bugs are on file but are not linked to a metabug).

/be
Assignee: general → brendan
Status: REOPENED → NEW
Unfortunately there is no mmap in BeOS, so if the bugfix includes an explicit mmap call, we will probably be forced to use some ifdef-ed hack in the port :(
deserialize constitutes 460 of ~500 TB reports with kernel32.dll as top of stack
Keywords: topcrash
System has been unused for two weeks. Downloaded latest build. Seamonkey crashes on first startup of mail. Only way to read mail is to use Mozilla.
Whiteboard: Please try comment 60 before commenting further. Thanks.
Applying Alfred's patch locally fixes this crash for me, as far as I can tell.
Attached patch Patch, v1.1
Alfred, thanks very much for pursuing this fix. I'm surprised this bug did not bite till now. I've added a few assertions and beefed up an invariant elsewhere, in addition to removing the useless assignment you spotted. I will check this in ASAP, ideally with your review, crediting you and citing my own review. /be
Attachment #228432 - Attachment is obsolete: true
Attachment #231520 - Flags: review?(alfred.peng)
Attachment #228432 - Flags: review?(brendan)
Comment on attachment 228432 Patch v1

Noting r+ for the record. /be
Attachment #228432 - Flags: review+
Comment on attachment 231520 Patch, v1.1

>- // we no longer need it, and we do not want to extend its lifetime.
>- if (uriMapEntry->mDocMapEntry)
>+ // we no longer need it, and we do not want to extend its lifetime. Also
>+ // null mCurrentDocumentMapEntry if aURI is currently selected.
>+ if (uriMapEntry->mDocMapEntry) {
> NS_RELEASE(uriMapEntry->mDocMapEntry->mURI);
>+ if (uriMapEntry->mDocMapEntry == mCurrentDocumentMapEntry)
>+ mCurrentDocumentMapEntry = nsnull;
>+ }

I'm fine with the previous part of the patch. For this part, I checked the code a little bit. If we make such a change here, the write operation of the segment length will be skipped: http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsFastLoadFile.cpp#1492.

Actually, for select after end without start, I think the length information is useless. The offset change will be triggered by http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsFastLoadFile.cpp#541, not by "entry->mBytesLeft == 0".

Just putting some of my opinion here. r+=alfred.
Attachment #231520 - Flags: review?(alfred.peng) → review+
Thanks, Alfred. I want select after end without start to be an error. That will take more work than this patch deserves. Nulling mCurrentDocumentMapEntry at least does not leave that "previous" entry around to cause a bogus offset to be saved, as you note. Fixed, sorry it took so long. Nominate for branches if it seems important -- for some reason it seems to have bitten only on the trunk. /be
Status: NEW → RESOLVED
Closed: 19 years ago → 18 years ago
Resolution: --- → FIXED
No longer blocks: 342132
Brendan, alfred, how safe is the patch? If it's safe, we should land it at least for 1.8, imo.
The patch is safe, in that the first hunk makes state more consistent, the second removes a redundant assignment (easy to prove from local static analysis), the third nulls a pointer that should not be used after EndMuxedDocument returns. I think it can be nominated after a few days' baking. /be
This seems to have turned balsa firefox trunk orange. /be
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Backed out, not sure how to proceed. Can anyone who tried Patch v1 please try Patch v2 and try to reproduce what the tinderbox saw: ###!!! ASSERTION: demux segment length botch!: 'entry->mBytesLeft >= 8', file /builds/tinderbox/Firefox-gcc3.4/Linux_2.4.7-10_Depend/mozilla/xpcom/io/nsFastLoadFile.cpp, line 579 /be
I can reproduce that assert with the following steps:

1) Apply patch
2) Rebuild xpcom/
3) Nuke XUL.mfasl
4) seamonkey -mail
5) Quit
6) env XPCOM_DEBUG_BREAK=trap seamonkey -g -mail

    0xb7f5f761 in nsFastLoadFileReader::Read (this=0x8365968, aBuffer=0x8397170 "", aCount=4, aBytesRead=0xbfffd41c) at ../../../mozilla/xpcom/io/nsFastLoadFile.cpp:579
    579 NS_ASSERTION(entry->mBytesLeft >= 8, "demux segment length botch!");
    (gdb) p entry->mBytesLeft
    $1 = 0

I have to go eat dinner now, but I'll be back tonight if you want to debug this remotely...
Attached patch Patch, v1.2
Try this one, it doesn't have the mCurrentDocumentMapEntry nulling hunk. /be
Yeah, patch v1.2 doesn't seem to assert over here.
The placeholder for nextSegmentOffset and length takes 8 bytes, so the segment length should be at least 8; a smaller value is what triggers the assertion. Can we do a little hack here: http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsFastLoadFile.cpp#1537

Replace the statement with:

    rv = Write32(8);

to bypass this assertion?
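The invariant behind that assertion can be sketched in isolation. This is not the nsFastLoadFileReader::Read code: SegmentCursor, kSegmentHeaderSize and ConsumeSegmentHeader are hypothetical names, and the sketch assumes only what the thread states, namely that each segment begins with an 8-byte header and that the reader tracks the bytes left in the current segment.

```cpp
#include <cassert>
#include <cstdint>

// Assumed header size: next-segment offset plus length, 4 bytes each.
constexpr uint32_t kSegmentHeaderSize = 8;

// Hypothetical reader-side state for the segment being entered.
struct SegmentCursor {
    uint32_t bytesLeft;  // recorded segment length not yet consumed
};

// Step past the segment header. Returns false, leaving the cursor untouched,
// when the recorded length cannot even hold the header. That is the
// condition the "demux segment length botch!" assertion guards against.
bool ConsumeSegmentHeader(SegmentCursor& cursor) {
    if (cursor.bytesLeft < kSegmentHeaderSize)
        return false;
    cursor.bytesLeft -= kSegmentHeaderSize;
    return true;
}
```

A recorded length of 0, as seen in the gdb session above, fails this check, while a length of exactly 8 describes a valid segment with an empty payload. That is the case the proposed Write32(8) would produce.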
As bz stated in comment 74, the bug also affects 1.8. It should be nominated.
I also crash sometimes when I start Firefox so this is not a mail only issue as the subject states. See bug 347120
*** Bug 347120 has been marked as a duplicate of this bug. ***
No longer blocks: 347120
Fixed on trunk again. Nominating for branch-based releases. /be
Status: REOPENED → RESOLVED
Closed: 18 years ago → 18 years ago
Flags: blocking1.8.1?
Flags: blocking1.8.0.7?
Resolution: --- → FIXED
Flags: blocking1.8.1? → blocking1.8.1+
Comment on attachment 231683 Patch, v1.2

This is baking on the trunk, it landed around 3pm Pacific yesterday. /be
Attachment #231683 - Flags: review+
Attachment #231683 - Flags: approval1.8.1?
This used to bite instantly, and no longer does, even with mail-window pane layout switches, theme switches, etc (I tried to test things which might cause the XUL.mfl to change; my knowledge of how this works is limited, but I tested well). Verified FIXED on trunk using SeaMonkey build 2006-08-04-09 under Windows XP.
Status: RESOLVED → VERIFIED
Verified FIXED also on Mac version.
Comment on attachment 231683 Patch, v1.2

a=schrep for drivers.
Attachment #231683 - Flags: approval1.8.1? → approval1.8.1+
This entrains the fix for bug 313575, which I merged automatically via

    1508 cvs up -j3.3{6,7} nsFastLoadFile.cpp
    1509 cvs up -j3.4{2,3} nsFastLoadFile.cpp
    1518 cvs up -j3.1{8,9} nsFastLoadFile.h

(from history | grep).

    Checking in nsFastLoadFile.cpp;
    /cvsroot/mozilla/xpcom/io/nsFastLoadFile.cpp,v <-- nsFastLoadFile.cpp
    new revision: 3.36.18.1; previous revision: 3.36
    done
    Checking in nsFastLoadFile.h;
    /cvsroot/mozilla/xpcom/io/nsFastLoadFile.h,v <-- nsFastLoadFile.h
    new revision: 3.18.28.1; previous revision: 3.18
    done

/be
Keywords: fixed1.8.1
Flags: blocking1.8.0.7? → blocking1.8.0.7+
Comment on attachment 232444 patch I'm committing to the 1.8 branch

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #232444 - Flags: approval1.8.0.7+
Fixed on the 1.8.0 branch too. /be
Keywords: fixed1.8.0.7
*** Bug 349629 has been marked as a duplicate of this bug. ***
*** Bug 345608 has been marked as a duplicate of this bug. ***
Depends on: 443866
Crash Signature: [@ nsXULPrototypeElement::Deserialize] [@ nsXULPrototypeAttribute::Finalize]
Component: DOM → DOM: Core & HTML