Closed
Bug 1368786
Opened 7 years ago
Closed 7 years ago
Crash in morkRow::GetCell via XPTC__InvokebyIndex
Categories
(Thunderbird :: General, defect)
Tracking
(thunderbird_esr52 unaffected, thunderbird55 wontfix, thunderbird56 fixed, thunderbird57 fixed)
VERIFIED
FIXED
Thunderbird 57.0
Tracking | Status | |
---|---|---|
thunderbird_esr52 | --- | unaffected |
thunderbird55 | --- | wontfix |
thunderbird56 | --- | fixed |
thunderbird57 | --- | fixed |
People
(Reporter: wsmwk, Assigned: jorgk-bmo)
References
Details
(Keywords: crash, regression, topcrash-thunderbird, Whiteboard: [regression:TB55])
Crash Data
Attachments
(1 file, 2 obsolete files)
1.53 KB,
patch
|
rkent
:
review+
jorgk-bmo
:
approval-comm-beta+
|
Details | Diff | Splinter Review |
#1 crash for nightly 55.0a1. First seen for build 20170422030207 bp-5975a332-7eeb-4708-8ea5-4e72b0170422. 0 xul.dll morkRow::GetCell(morkEnv*, unsigned int, int*) C:/builds/moz2_slave/tb-c-cen-w64-ntly-000000000000/build/db/mork/src/morkRow.cpp:516 1 xul.dll XPTC__InvokebyIndex xpcom/reflect/xptcall/md/win32/xptcinvoke_asm_x86_64.asm:58 2 @0x20bf405d7ff 3 xul.dll XPTC__InvokebyIndex xpcom/reflect/xptcall/md/win32/xptcinvoke_asm_x86_64.asm:97 4 @0xe3d7ff7187 5 xul.dll XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) js/xpconnect/src/XPCWrappedNative.cpp:1296 6 xul.dll XPC_WN_GetterSetter(JSContext*, unsigned int, JS::Value*) js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1019 7 xul.dll js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) js/src/vm/Interpreter.cpp:470 8 xul.dll js::CallGetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) js/src/vm/Interpreter.cpp:649 9 xul.dll CallGetter js/src/vm/NativeObject.cpp:1788 10 xul.dll js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) js/src/vm/NativeObject.cpp:2101 11 xul.dll js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, js::PropertyName*, JS::MutableHandle<JS::Value>) js/src/jsobj.h:845 12 xul.dll js::GetProperty(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, JS::MutableHandle<JS::Value>) js/src/vm/Interpreter.cpp:4399 13 xul.dll GetPropertyOperation js/src/vm/Interpreter.cpp:193 14 xul.dll Interpret js/src/vm/Interpreter.cpp:2740 15 xul.dll js::RunScript(JSContext*, js::RunState&) js/src/vm/Interpreter.cpp:410 16 xul.dll js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) js/src/vm/Interpreter.cpp:488 17 xul.dll js::jit::DoCallFallback js/src/jit/BaselineIC.cpp:2452 18 @0x1349838c169 Does not appear to be add-on related. Some of the users see other crash signatures. like morkRow::GetRowSpaceStore bp-5d1ea52f-52c2-47e4-9652-9048e0170516 - but that is not a new signature
Reporter | ||
Comment 1•7 years ago
|
||
> #1 crash for nightly 55.0a1. #1 crash for 55.0b2. And still happening in recent nightlies, like bp-c45fd99f-0883-41bd-b8dc-9a0430170718 Most user are reporting delete related problems. Reporter of bp-277c6aa4-a7d0-488c-bc9a-8a0b40170720 tells me repairing trash folder solved the problem > First seen for build 20170422030207... At rate of several crashes per day [1], so I'd say the regression checkin occurred in the prior 1-2 days https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-04-18+03%3A05%3A00&enddate=2017-04-22+03%3A05%3A00 [1] https://crash-stats.mozilla.com/signature/?release_channel=nightly&signature=morkRow%3A%3AGetCell&date=%3E%3D2017-01-21T09%3A23%3A04.000Z&date=%3C2017-04-25T09%3A23%3A00.000Z&_columns=date&_columns=build_id&_columns=user_comments&_sort=-date&page=1
status-thunderbird55:
--- → affected
OS: Windows 10 → All
Reporter | ||
Comment 2•7 years ago
|
||
bug 1358341 ?
Assignee | ||
Comment 3•7 years ago
|
||
No way. I don't see anything in the range you gave in comment #1. I looked at bug 1358444, in particular https://hg.mozilla.org/comm-central/rev/2bf4d1d2bf58#l1.12 but that looks right, too. Kent, can you confirm, please. https://crash-stats.mozilla.com/report/index/277c6aa4-a7d0-488c-bc9a-8a0b40170720 has a little more call stack. It doesn't need to be a C-C change, something can have changed slightly in M-C exposing a memory mishandling which now results in a crash.
Flags: needinfo?(rkent)
Assignee | ||
Comment 4•7 years ago
|
||
On average, less than one crash per day: https://crash-stats.mozilla.com/signature/?release_channel=nightly&signature=morkRow%3A%3AGetCell&date=%3E%3D2017-07-07T11%3A38%3A24.000Z&date=%3C2017-07-21T11%3A38%3A24.000Z&_columns=date&_columns=build_id&_columns=user_comments&_sort=-date&page=1 Some interesting and varied call stacks in there.
Comment 5•7 years ago
|
||
> I looked at bug 1358444, in particular https://hg.mozilla.org/comm-central/rev/2bf4d1d2bf58#l1.12
but that looks right, too. Kent, can you confirm, please.
It looks OK to me too.
I looked over this bug, and could not get any good ideas on the cause. I'm not sure I really understand the crash itself. Like, how could we crash on line #510:
morkCell* cells = mRow_Cells;
Flags: needinfo?(rkent)
Assignee | ||
Comment 6•7 years ago
|
||
(In reply to Kent James (:rkent) from comment #5) > Like, how could we crash on line #510: > morkCell* cells = mRow_Cells; Agreed, however, this one https://crash-stats.mozilla.com/report/index/e11b8c61-cc19-4f54-b858-642550170709 claims to crash on line #516: mork_column col = cells->GetColumn(); where cells was assigned on line #510: morkCell* cells = mRow_Cells;
Reporter | ||
Comment 7•7 years ago
|
||
FWIW, the user of these crashes also crashed with bug 1377277 signature, which is also a topcrash for 55.0b2 bp-1b07b684-18d3-4af1-b3b3-cc33b0170726 bp-43e0cb0b-0518-40da-abf4-511a00170725
Flags: needinfo?(m_kato)
Comment 8•7 years ago
|
||
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #7) > FWIW, the user of these crashes also crashed with bug 1377277 signature, > which is also a topcrash for 55.0b2 > > bp-1b07b684-18d3-4af1-b3b3-cc33b0170726 comment #6. But I don't know why cells is invalid address. > bp-43e0cb0b-0518-40da-abf4-511a00170725 simply shutdown hang like bug 1377277. But 2 sqlite3 thread tries accessing database without deadlock, so this might be slow on user PC/disk.
Flags: needinfo?(m_kato)
Reporter | ||
Comment 9•7 years ago
|
||
Still #1 crash for 55.0b2 and exists for 56.0a1, so expecting to see this in 56.0b1. This signature might be coming in part from users who had different crash signatures prior to 55.0b2. For example bug 1255903
status-thunderbird56:
--- → affected
tracking-thunderbird56:
--- → ?
Reporter | ||
Comment 10•7 years ago
|
||
#3 crash for 56.0b2 (but only because it's behind *new* topcrashes). Might this be relalated to gcc6?
Comment 11•7 years ago
|
||
it's been a while since I've last looked at this part of the code (if ever), and I have zero idea what's going on. that said, I was wondering if someone can clarify: MORK_USED_1(ev); morkCell* cells = mRow_Cells; if ( cells ) { morkCell* end = cells + mRow_Length; while ( cells < end ) { mork_column col = cells->GetColumn(); As I understand this (which is extremely limited), class->GetColumn() calls : https://hg.mozilla.org/comm-central/file/tip/db/mork/src/morkCell.h#l48 mork_column GetColumn() const { return morkDelta_Column(mCell_Delta); } Which morkDelta_Column is defined in line 20: #define morkDelta_Column(self) ((self) >> morkDelta_kShift) So just wondering if: 1) mRow_Cells is defined. If so, what is it defined to? I'm looking at the crash reason "EXCEPTION_ACCESS_VIOLATION_READ" and I'm wondering if cells is defined. Looking at https://dxr.mozilla.org/comm-central/source/db/mork/src/morkRow.cpp#187, mRow_Cells = 0 (at line 199) only if (ioSpace && iOPool && inOid && inLength <= morkRow_kMaxLength && inOid->mOid_Id != morkRow_kMinusOneRid ) are all true. Assuming they are all true, and in line 212, if (inLength) is true (from https://dxr.mozilla.org/comm-central/source/db/mork/src/morkRow.cpp#213) does ioPool->NewCells(ev, inLength, zone); return something valid? Again, I have no clue, but just asking since the easiest for me is to figure out where mRow_Cells is defined and work from there. I apologize if it's irrelevant to the issue.
Comment 12•7 years ago
|
||
Edmund, I wish I could answer your questions, but I cannot. The mork code was written in the earliest days of Mozilla, and adapted in a non-standard way to XPCOM. None of us really understand it, or want to understand it. Replacing Mork has been a high priority for a very long time with no traction.
Comment 13•7 years ago
|
||
Has anyone else encountered this crash? (In reply to Jorg K (GMT+2) from comment #6) > (In reply to Kent James (:rkent) from comment #5) > > Like, how could we crash on line #510: > > morkCell* cells = mRow_Cells; > Agreed, however, this one > https://crash-stats.mozilla.com/report/index/e11b8c61-cc19-4f54-b858- > 642550170709 claims to crash on line #516: > mork_column col = cells->GetColumn(); > where cells was assigned on line #510: > morkCell* cells = mRow_Cells; So if cells == 0, would it crash?
Assignee | ||
Comment 14•7 years ago
|
||
To answer the question, let's look at the code again: morkCell* morkRow::GetCell(morkEnv* ev, mdb_column inColumn, mork_pos* outPos) const { MORK_USED_1(ev); morkCell* cells = mRow_Cells; if ( cells ) { morkCell* end = cells + mRow_Length; while ( cells < end ) { 516: mork_column col = cells->GetColumn(); If cells==0 then line 516 is not executed. It will most likely crash is cells contains an invalid value it obtained from the assignment a few lines above.
Reporter | ||
Comment 15•7 years ago
|
||
what's the conclusion then - msf corrupted, by something else much earlier, and we should not avoid the crash but rather find the regression range causing the corruption?
Assignee | ||
Comment 17•7 years ago
|
||
Triggered by the new bug 1398402 I've looked at this again. That bug has some very consistent crash reports. If you look at them you see what happens: Some JS part of Thunderbird uses messageHeader.messageId and that invokes nsMsgHdr::GetMessageId() at mailnews/db/msgdb/src/nsMsgHdr.cpp:580. There is plenty of JS that does this: https://dxr.mozilla.org/comm-central/search?q=.messageId&redirect=false The regressing bug is bug 853881 which landed on 22nd April 2017 in two changesets, bug 853881 comment #34 and bug 853881 comment #42. In bug 853881 Joshua replaced the "old technology" PLDHashTable with a more modern one. The bug was abandoned, Aceman and later Kent had the initiative to get it landed. So I spent a considerable amount of time on it. In the end, there was some doubt left re. reference counting, see for example bug 853881 comment #34. So I believe what happens is that the new caching mechanism somehow hangs on to a stale row. When we come to access that header again, the new hash table hands back the stale row and when accessing it ... Crash!
Assignee | ||
Comment 18•7 years ago
|
||
I have the impression that what I described above is related to the code that got removed here: https://hg.mozilla.org/comm-central/rev/62bfaab2731fc47ee7ed26aefa001e63c4f2c371#l2.259
Assignee | ||
Comment 19•7 years ago
|
||
Joshua, looks like your hash table replacement has done some damage, in fact, caused the #1 top crash :-( What do you think of this proposal?
Attachment #8906332 -
Flags: feedback?(Pidgeot18)
Assignee | ||
Comment 20•7 years ago
|
||
Comment on attachment 8906332 [details] [diff] [review] 1368786-mork-crash.patch Maybe Kent passes by before Joshua comments. I have a keen user in bug 1398402 eager to try this out, but sadly C-C try builds don't work (bug 1398554), so I can't get a binary to him.
Attachment #8906332 -
Flags: review?(rkent)
Assignee | ||
Comment 21•7 years ago
|
||
Kent, user "doug2" over in bug 1398402 has been running a try build with this patch for a few days and his crashes have gone away. Repeating: The patch restores part of the code that was removed here: https://hg.mozilla.org/comm-central/rev/62bfaab2731fc47ee7ed26aefa001e63c4f2c371#l2.259 to avoid stale handles to Mork rows, if we can believe the comment.
Reporter | ||
Updated•7 years ago
|
Blocks: 853881
Keywords: regressionwindow-wanted
Comment 22•7 years ago
|
||
20:37 no crashes today. A couple of instances of disappearing message windows (like the window crashed, but not the app). Otherwise stable. I'll stay on daily until I hear otherwise.
Assignee | ||
Comment 23•7 years ago
|
||
Kent, this user has been crash-free ever since he started using a version containing this patch, so for a few days now. Before he crashed a few times a day. Doug2, the Daily version you're using is pretty stable, I'm using it myself. Please stay on it.
Comment 24•7 years ago
|
||
Comment on attachment 8906332 [details] [diff] [review] 1368786-mork-crash.patch Review of attachment 8906332 [details] [diff] [review]: ----------------------------------------------------------------- I spent many hours looking at this, trying to understand the underlying memory model a little better. I'm still not comfortable that I understand it, but really that relates more to the changes in bug 853881 than this one. As I understand the memory model pre bug 853881, the COM reference to the nsIMdbRow of m_mdbRow in nsMsgHdr belongs to nsMsgHdr, such that any nulling of m_mdbRow required NS_RELEASE(m_mdbRow). In the implementation of bug 853881, you went over this issue quite a bit, and ultimately removed the call to NS_RELEASE(m_mdbRow) in nsMsgHdr::~nsMsgHdr(). You requested many times that I investigate that, but I don't really understand the mork memory model, and it is non-trivial (as in days or weeks) to try to really understand it. So I'm really not very happy with patch https://hg.mozilla.org/comm-central/rev/5c3191edbbd4fe552450b582d520ba9be4cd431b from bug 853881. That patch seems to me to be incorrect. You believed that the destructor code in morkNode::CutWeakRef and morkNode::ZapOld (which calls the destructor explicitly!) was removing the object, but I'm not sure that is correct. It seems to me that there is no longer anything that releases m_mdbRow, which would result in a fairly substantial memory leak. So prior to that patch, I would have required here in the current patch NS_IF_RELEASE(m_mdbRow) (actually of iter.Data()->m_mdbRow) prior to nulling, but that assumes that the last patch in bug 853881 is invalid. I'm really not sure what to do here because I don't really like the previous patch. Can you explain where you believe that the m_mdbRow is being deleted? Do you believe it is in mork somewhere? Yet prior to bug 853881 there were clearly NS_RELEASE calls that would have resulted in it being deleted. What has changed? Really the problem here is that the technical debt of the mork code is catching up with us. We all hope that someone else will figure this out, when the reality is that the code is hopelessly obsolete and none of us is going to be able to figure it out. If you can justify the last patch in bug 853881, then this patch is fine. But really what we should do is revert that previous patch, and try to really understand what is going on there. Or you can just leave the current patch as is, and we can accept the massive leaks that we may get from these two changes. Or prove why they don't occur.
Assignee | ||
Comment 25•7 years ago
|
||
(In reply to Kent James (:rkent) from comment #24) > So I'm really not very happy with patch > https://hg.mozilla.org/comm-central/rev/ > 5c3191edbbd4fe552450b582d520ba9be4cd431b from bug 853881. Hi Kent, thanks for taking the time to look at this thoroughly. As you recall, Joshua's work was abandoned, I refreshed the patches, addressed compile errors, and did a "pick-and-chose" from Joshua's patch not using some hunks. The original code at that spot can be seen here: https://hg.mozilla.org/comm-central/rev/62bfaab2731fc47ee7ed26aefa001e63c4f2c371#l3.46 and in a second patch, I changed it to what you're not happy about: https://hg.mozilla.org/comm-central/rev/5c3191edbbd4fe552450b582d520ba9be4cd431b#l1.20 Why did I do that? Two reasons: The first version crashed, due to double free, see bug 853881 comment #39 and bug 853881 comment #41. Back then, I looked at it with the debugger, and clearly, things got double free'd. I came into Mork and it tried to free a location that contained garbage. Since https://hg.mozilla.org/comm-central/rev/62bfaab2731fc47ee7ed26aefa001e63c4f2c371#l3.46 was my own picking and choosing from Joshua's patch, I then went to his original hunk, attachment 728244 [details] [diff] [review]: nsMsgHdr::~nsMsgHdr() { - if (m_mdbRow) - { - if (m_mdb) - { - NS_RELEASE(m_mdbRow); - m_mdb->RemoveHdrFromUseCache((nsIMsgDBHdr *) this, m_messageKey); - } - } + if (m_mdb) + m_mdb->RemoveHdrFromUseCache(this, m_messageKey); NS_IF_RELEASE(m_mdb); } and the problem went away. I trusted that he and the author and you as the reviewer had inspected that original patch. In the review, no one complained about that hunk, so I assumed it was right. Sorry, that's the practical answer. You know I have multiple roles here, sheriff, developer and jack-of-all-trades. So this is the sheriff's answer. I'll post this comment in case you're still awake, and then I'll look into the issue a little more.
Assignee | ||
Comment 26•7 years ago
|
||
OK, if I understood your comment #24 correctly, you want to undo two hunks of Joshua's original patch. Note that releasing with the macro also nulls the pointer. I started the application, clicked through a few folders and no crash. Would you like me to give this to our test user to try? Sadly debug builds are currently busted (as of just now, bug 1399746), so they crash anyway and I cannot try this on try before fixing that issue.
Attachment #8907961 -
Flags: feedback?(rkent)
Comment 27•7 years ago
|
||
As the test user, I would be happy to help when you can get me a daily or test build. I wish I could help understand the larger issue (memory model), but it's probably over my head.
Assignee | ||
Comment 28•7 years ago
|
||
Thanks, Doug, a new test version is on the way. I'll advise you here.
Assignee | ||
Comment 29•7 years ago
|
||
Kent, I noticed that the crashes from bug 1399746 are Mac only for some reason. So here's a try with the new patch on Windows: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ecb54ea446274101ea438d0a2353ad82d2c14f51
Comment 30•7 years ago
|
||
This is the test user. 1. I would be happy to test the modified code when you can create a "daily" or try for me to test. 2. Being a little radical here, am I the only user who cares about this configuration: MS Windows + POP + new messages in windows-not-tabs? Don't know if POP (vs MAPI) has anything to do with it. If I'm the only user in that config, maybe I should move to some other mail reader? (Not my preference, but several people pulling hair out to fix one person's problem is not effective use of your time.) 3. I'd love to try to help understand the memory model, but it would probably take more of your time than you have to get me started.
Assignee | ||
Comment 31•7 years ago
|
||
(In reply to doug2 from comment #30) > This is the test user. I know, thanks for helping us. > 1. I would be happy to test the modified code when you can create a "daily" > or try for me to test. It will be ready soon. > 2. Being a little radical here, am I the only user who cares about this > configuration: MS Windows + POP + new messages in windows-not-tabs? Don't > know if POP (vs MAPI) has anything to do with it. If I'm the only user in > that config, maybe I should move to some other mail reader? (Not my > preference, but several people pulling hair out to fix one person's problem > is not effective use of your time.) I use Windows and POP, but not "message in window". Anyway, that should be supported and with your help, we can sort out the problems. First we need to address the crash since you're not the only person crashing. > 3. I'd love to try to help understand the memory model, but it would > probably take more of your time than you have to get me started. Please understand that we have some legacy code dating back to the Netscape days in 2000. The original developers have left and even the current head developers tear their hair out, see comment #24. I'll let you know when the new binary becomes available.
Assignee | ||
Comment 32•7 years ago
|
||
Kent, the try is green apart from the "usual" failures, so proceed to review ;-) Doug, your new binary is here, it should perform just like the previous one, although there are internal changes: https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-ecb54ea446274101ea438d0a2353ad82d2c14f51/try-comm-central-win32/thunderbird-57.0a1.en-US.win32.installer.exe
Comment 33•7 years ago
|
||
Comment on attachment 8907961 [details] [diff] [review] 1368786-mork-crash-take2.patch Review of attachment 8907961 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp @@ +653,5 @@ > { > + // Clear out m_mdbRow member variable - the db is going away, which means > + // that this member variable might very well point to a mork db that is gone. > + for (auto iter = m_headersInUse.ConstIter(); !iter.Done(); iter.Next()) { > + NS_IF_RELEASE(iter.Data()->m_mdbRow); Actually I believe this line is likely the root cause of the double release. We are releasing m_mdbRow here - but not nulling m_mdbRow in the process. In nsMsgHdr::~nsMsgHdr, which will still get called, we avoid the m_mdbRow release if m_mdbRow is null - but where is the nulling here? Won't the destructor release it again? So actually I believe that you need to replace that with: if (iter.Data()->m_mdbRow) { NS_RELEASE(iter.Data()->m_mdbRow); iter.Data()->m_mdbRow = null; } Or so I think today and part of yesterday. I get so easily confused by this that my opinion changes regularly. I would be more comfortable if we could reproduce the crash from bug 853881, and convince ourselves that this change fixes it.
Attachment #8907961 -
Flags: feedback?(rkent) → feedback+
Assignee | ||
Comment 34•7 years ago
|
||
The crash is easily reproduced by applying: nsMsgHdr::~nsMsgHdr() { - if (m_mdb) - m_mdb->RemoveHdrFromUseCache(this, m_messageKey); + if (m_mdbRow) + { + if (m_mdb) + { + NS_RELEASE(m_mdbRow); + m_mdb->RemoveHdrFromUseCache(this, m_messageKey); + } + } NS_IF_RELEASE(m_mdb); } Then I can see whether the other hunk fixes it, which it does, since I ran TB with both patches applied, and the try is green. When it crashes, the try was red. But I can repeat the exercise with the two patches separately. As for: NS_RELEASE(iter.Data()->m_mdbRow); iter.Data()->m_mdbRow = null; Well, the NS_IF_RELEASE() is safer since it won't dereference a nullptr, and as I said, it nulls the pointer:http://searchfox.org/mozilla-central/source/xpcom/base/nsISupportsUtils.h#96 I'll get back to you in 30 minutes, when I made it crash, OK?
Comment 35•7 years ago
|
||
I see what you mean about NS_IF_RELEASE, I was not aware of that. I'm a little confused though about what set of changes crashes or not. Is there a variation that does not crash that includes a release of m_mdbRow in ~nsMsgHdr?
Assignee | ||
Comment 36•7 years ago
|
||
OK, this code crashes: nsMsgHdr::~nsMsgHdr() { if (m_mdbRow) { if (m_mdb) { NS_RELEASE(m_mdbRow); m_mdb->RemoveHdrFromUseCache(this, m_messageKey); } } NS_IF_RELEASE(m_mdb); } That's the original code *before* Joshua got to it. The NS_IF_RELEASE() in the loop will fix it.
Assignee | ||
Comment 37•7 years ago
|
||
OK, as I said, restoring that code above crashes at application shutdown time after visiting a few folders in TB and clicking a few messages, very reproducible. Stack:
> xul.dll!morkRowObject::CloseRowObject(morkEnv * ev) Line 572 C++
xul.dll!morkRowObject::~morkRowObject() Line 61 C++
[External Code]
xul.dll!morkObject::Release() Line 35 C++
xul.dll!morkRowObject::Release() Line 88 C++
xul.dll!nsMsgHdr::~nsMsgHdr() Line 111 C++
We're accessing junk.
Now, the new loop:
void nsMsgDatabase::ClearUseHdrCache()
{
// Clear out m_mdbRow member variable - the db is going away, which means
// that this member variable might very well point to a mork db that is gone.
for (auto iter = m_headersInUse.ConstIter(); !iter.Done(); iter.Next()) {
NS_IF_RELEASE(iter.Data()->m_mdbRow);
}
m_headersInUse.Clear();
}
makes the crash go away. That's the confirmation you asked for, no?
We can wait for Doug's report on the binary I gave him.
Comment 38•7 years ago
|
||
Running the "daily" from comment #32 now. Have not had a crash yet. Definitely still has the "new message window disappears" issue, however. Let me know when you want me to run something else.
Assignee | ||
Comment 39•7 years ago
|
||
While we're here, why is this so twisted? Why do I only call NS_RELEASE(m_mdbRow) if m_mdb != null? nsMsgHdr::~nsMsgHdr() { if (m_mdbRow) { if (m_mdb) { NS_RELEASE(m_mdbRow); m_mdb->RemoveHdrFromUseCache(this, m_messageKey); } } NS_IF_RELEASE(m_mdb); } Why can't it just be: nsMsgHdr::~nsMsgHdr() { NS_IF_RELEASE(m_mdbRow); <== Do we need this? if (m_mdb) m_mdb->RemoveHdrFromUseCache(this, m_messageKey); NS_IF_RELEASE(m_mdb); }
Comment 40•7 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #39) > While we're here, why is this so twisted? ... > > Why can't it just be: > > nsMsgHdr::~nsMsgHdr() > { > NS_IF_RELEASE(m_mdbRow); <== Do we need this? > if (m_mdb) > m_mdb->RemoveHdrFromUseCache(this, m_messageKey); > NS_IF_RELEASE(m_mdb); > } That seems like it should work.
Assignee | ||
Comment 41•7 years ago
|
||
Doesn't crash. Can you enlighten me why you think we need to release in the DTOR? All those got released when we tossed the cache away, no?
Attachment #8906332 -
Attachment is obsolete: true
Attachment #8907961 -
Attachment is obsolete: true
Attachment #8906332 -
Flags: review?(rkent)
Attachment #8906332 -
Flags: feedback?(Pidgeot18)
Attachment #8908213 -
Flags: review?(rkent)
Comment 42•7 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #41) > Created attachment 8908213 [details] [diff] [review] > 1368786-mork-crash-take2.patch (v2) > > Doesn't crash. Can you enlighten me why you think we need to release in the > DTOR? All those got released when we tossed the cache away, no? Yes, but blowing away the use cache is not the typical way that a nsMsgHdr gets released. It is those other cases that need to release in the DTOR.
Comment 43•7 years ago
|
||
Comment on attachment 8908213 [details] [diff] [review] 1368786-mork-crash-take2.patch (v2) Review of attachment 8908213 [details] [diff] [review]: ----------------------------------------------------------------- I'm still a little confused about what crashed and what does not. You say this is not crashing now, but isn't this what crashed previously? But if this seems to solve the issues in these bugs, then the code looks correct to me.
Attachment #8908213 -
Flags: review?(rkent) → review+
Assignee | ||
Comment 44•7 years ago
|
||
Maybe the confusion comes from the fact that there are two crashes. In the current state, no patch, TB crashes for some users, in fact, it's topcrash #1 on beta right now. I'm using Daily and it never crashed for me. So this is a "rare" crash. Adding the loop in nsMsgDatabase.cpp makes that "rare" crash go away, as confirmed by Doug over the last few days. There is a second very reliable crash obtained by releasing the row in the DTOR, see comment #36. That's what crashed "previously", hence my change with which you weren't happy at all since it leaked. This second very reliable crash is also fixed by that loop. I hope it's clear now. I'm doing a build for Doug to try with the final version before we release this to the masses ;-)
Assignee | ||
Comment 45•7 years ago
|
||
Doug, thanks for your patience, this would be the final version: https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-82c7d73473a74b459884c56fddeb29dd23ea770a/try-comm-central-win32/thunderbird-57.0a1.en-US.win32.installer.exe Once we're done with this bug here, I promise to look at bug 711204 for you.
Comment 46•7 years ago
|
||
Running fine so far. Thanks. 711204 is a minor pain, not an app crash.
Comment 47•7 years ago
|
||
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/a0bd8f2b70be release m_mdbRow in hash table entry and also in DTOR. r=rkent
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → jorgk
status-thunderbird57:
--- → fixed
tracking-thunderbird56:
? → ---
Target Milestone: --- → Thunderbird 57.0
Assignee | ||
Comment 48•7 years ago
|
||
Comment on attachment 8908213 [details] [diff] [review] 1368786-mork-crash-take2.patch (v2) Let's uplift this if we don't get negative feedback from Doug.
Attachment #8908213 -
Flags: approval-comm-beta+
Reporter | ||
Comment 49•7 years ago
|
||
We're so close to merge I'm thinking we won't do another beta 56, so uplift should not be needed. As long as Tom's watershed/56.0b3 test can be done using 57.0b
Assignee | ||
Comment 50•7 years ago
|
||
OK, but what about the things we need for TB 52.4 ESR?
Comment 51•7 years ago
|
||
No crashes since I installed the latest try. (See comment #45)
Assignee | ||
Comment 52•7 years ago
|
||
Thanks so much for your help. Since the changes are now integrated into the code base, you could switch to an official Daily from here:https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32/1505428582/thunderbird-57.0a1.en-US.win32.installer.exe I suggest to switch automatic updates off since some Dailies don't work, that's life and the course of the development. Then at the end of the month, you switch to TB 57 beta when it's released. Sounds like a plan?
Comment 53•7 years ago
|
||
You are very welcome. Glad to help anytime. Thank you for all the time you spent on this and that you spend creating & updating Thunderbird. Got the official daily and running with it.
Assignee | ||
Comment 54•7 years ago
|
||
Beta (TB 56): https://hg.mozilla.org/releases/comm-beta/rev/8cb3077054c23c5590469eafd25f1e5c78db2f4c
Comment 55•7 years ago
|
||
Beta 56.0b3 2 crashes today. Both while reading new mail. 1st one does not show in my crash reports. Crash file contents "1507845030" Was on daily (57.0a1, 9/22/17) until yesterday. See comment #52 above. ?? when I went to the Beta Channel, I got 56 instead of 57, so I guess it is not released yet. That means crashes are probably just same as before. I'll go back to daily.
Assignee | ||
Comment 56•7 years ago
|
||
It would be in TB 56 beta 4. We've been struggling with that release, it's not ready so far. TB 57 beta will come after that.
Reporter | ||
Comment 57•7 years ago
|
||
graph looks good and with >50% uptake so far there are no crashes with 56.0b4 https://crash-stats.mozilla.com/signature/?release_channel=beta&signature=morkRow%3A%3AGetCell&date=%3E%3D2017-07-23T19%3A15%3A09.000Z&date=%3C2017-10-23T19%3A15%3A09.000Z#graphs
Status: RESOLVED → VERIFIED
Comment 58•7 years ago
|
||
Thanks. Just installed it. Will monitor.
You need to log in
before you can comment on or make changes to this bug.
Description
•