OK, I've spent the morning printf debugging, and here's what I've got:
A database is being
ForceClosed() during the test, and there's still a
nsMsgHdr held somewhere which references it.
I guess that's not too surprising - lots of places hold
nsIMsgDBHdr pointers. There's no way for the database to know about them all and to ask them to relinquish their DB refcounts... so even
ForceClosed() can't really guarantee that a database will be freed.
If file handles are bound to nsMsgDatabase lifetimes, then
ForceClosed() is just inherently doomed to be a bit crap. There's a bit of stuff in the DB to track outstanding message iterators, but there's no way to be sure there aren't other things holding a refcount on the DB.
So, if the patch fixes the immediate issue, I say we should just land it as is, and I'll open a new bug to try and sort out DB lifetime management properly.
Making ForceClosed() unconditional, the failing line in the test run is indeed:
let junkScore = parseInt(msgHdr.getProperty("junkscore"), 10) || 0;
The failure is occurring in
nsMsgHdr::GetProperty(). It bails out because
m_mdbRow is null.
Working back from there, I can see that particular nsMsgHdr previously being created fine, with a database (m_mdb) and a non-null row (m_mdbRow). Here's a heavily-edited exerpt of some of my printfs while the test runs:
0:04.39 pid:408695 nsMsgHdr::nsMsgHdr(db=0x7f2399efb120, row=0x7f237714ef28) this=0x7f237714efa0
[ So our nsMsgHdr has now been created, with a non-null db and row. ]
0:04.39 pid:408695 nsMsgHdr::SetMessageKey() this=0x7f237714efa0 mdb=0x7f2399efb120 row=0x7f237714ef28
0:04.39 pid:408695 nsMsgHdr::SetStringProperty('storeToken') this=0x7f237714efa0 db=0x7f2399efb120 row=0x7f237714ef28
[... snip lots of other nsMsgHdr calls, all with the same db and row as in the ctor... ]
0:04.40 pid:408695 nsMsgDatabase::ForceClosed() this=0x7f2399efb120
[and there goes our database - byebye! ]
0:04.40 pid:408695 nsMsgHdr::GetProperty('junkscore') this=0x7f237714efa0 db=0x7f2399efb120 row=(nil) canary='XXXXXXXX'
[uhoh... row is now null!]
0:04.40 pid:408695 ===> NS_ERROR_NULL_POINTER!!!
m_mdbRow is private and the only place I can see which clears it is nsMsgHdr::ReleaseMDBRow(). But that's not triggering at all, according to the printf I added to it.
I did think maybe something else might have been blatting over the memory... but I replaced the m_mdbRow pointer with a char filled with 'X's, and moved m_mdbRow elsewhere in the class. And my 'X's remained intact (you can see the
canary='XXXXXXXX' in the output above). So I can't be 100% sure, but I don't think it's a memory overwrite.
So something is clearing out
m_mdbRow, but I have no idea what or how :-(
And it's not the real issue anyway, so I'm going to stop digging.
One other mystery:
GetProperty() dies because
m_mdbRow is null, and returns
Oh, the failure happens for local folders - you can bypass the other runs of the same test with
./mach xpcshell-test --tag local comm/mail/components/extensions/test/xpcshell/test_ext_messages.js