Closed Bug 1611708 Opened 5 years ago Closed 3 months ago

C-C TB uninitialized memory access during xpcshell-test comm/mailnews/local/test/unit/test_over4GBMailboxes.js, Uninitialised value was created by a stack allocation at nsFolderCompactState::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long.

Categories

(Thunderbird :: Testing Infrastructure, defect)

defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: ishikawa, Assigned: ishikawa)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Yet another and final unitialised memory access during xpcshell-test that I am aware of.
(There could be a few more if I can run all the test of xpcshell-test under valgrind.
Due to slower execution, some tests timeout and I cannot run them fully.)

This is the valgrind error signature:

37:10.97 pid:600151 ==600151== Conditional jump or move depends on uninitialised value(s)
37:10.97 pid:600151 ==600151==	  at 0x5829CCB: nsFolderCompactState::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long, unsigned int) (nsMsgFolderCompactor.cpp:763)
37:10.97 pid:600151 ==600151==	  by 0x5B568DB: nsMailboxProtocol::ReadMessageResponse(nsIInputStream*, unsigned long, unsigned int) (nsMailboxProtocol.cpp:605)
37:10.97 pid:600151 ==600151==	  by 0x5B57037: nsMailboxProtocol::ProcessProtocolState(nsIURI*, nsIInputStream*, unsigned long, unsigned int) (nsMailboxProtocol.cpp:684)
37:10.97 pid:600151 ==600151==	  by 0x58CF74E: nsMsgProtocol::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long, unsigned int) (nsMsgProtocol.cpp:298)
37:10.97 pid:600151 ==600151==	  by 0x5FC553B: nsInputStreamPump::OnStateTransfer() (nsInputStreamPump.cpp:571)
37:10.97 pid:600151 ==600151==	  by 0x5FC5C49: nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) (nsInputStreamPump.cpp:416)
37:10.97 pid:600151 ==600151==	  by 0x5FB01EF: nsBufferedInputStream::OnInputStreamReady(nsIAsyncInputStream*) (nsBufferedStreams.cpp:711)
37:10.97 pid:600151 ==600151==	  by 0x5DB4E73: mozilla::SlicedInputStream::OnInputStreamReady(nsIAsyncInputStream*) (SlicedInputStream.cpp:414)
37:10.97 pid:600151 ==600151==	  by 0x5DC32AA: nsInputStreamReadyEvent::Run() (nsStreamUtils.cpp:171)
37:10.97 pid:600151 ==600151==	  by 0x5E632EC: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:1220)
37:10.97 pid:600151 ==600151==	  by 0x5E47957: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:486)
37:10.97 pid:600151 ==600151==	  by 0x5E5827D: bool mozilla::SpinEventLoopUntil<(mozilla::ProcessFailureBehavior)1, nsThreadManager::SpinEventLoopUntilInternal(nsINestedEventLoopCondition*, bool)::{lambda()#1}>(nsThreadManager::SpinEventLoopUntilInternal(nsINestedEventLoopCondition*, bool)::{lambda()#1}&&, nsIThread*) (nsThreadUtils.h:346)
37:10.97 pid:600151 ==600151==	  by 0x5E58457: nsThreadManager::SpinEventLoopUntilInternal(nsINestedEventLoopCondition*, bool) (nsThreadManager.cpp:694)
37:10.97 pid:600151 ==600151==	  by 0x5E58543: nsThreadManager::SpinEventLoopUntil(nsINestedEventLoopCondition*) (nsThreadManager.cpp:673)
37:10.97 pid:600151 ==600151==	  by 0x5E75A09: ??? (xptcinvoke_asm_x86_64_unix.S:106)
37:10.97 pid:600151 ==600151==	  by 0x6D98BE7: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (XPCWrappedNative.cpp:1643)
37:10.97 pid:600151 ==600151==	  by 0x6DA970F: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) (XPCWrappedNativeJSOps.cpp:947)
37:10.97 pid:600151 ==600151==	  by 0xBBE835B: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) (Interpreter.cpp:450)
37:10.97 pid:600151 ==600151==	  by 0xBC0D6F8: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (Interpreter.cpp:542)
37:10.98 pid:600151 ==600151==	  by 0xBC0E224: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (Interpreter.cpp:605)
37:10.98 pid:600151 ==600151==	  by 0xBC0E3A8: js::CallFromStack(JSContext*, JS::CallArgs const&) (Interpreter.cpp:609)
37:10.98 pid:600151 ==600151==	  by 0xBC059C0: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:3018)
37:10.98 pid:600151 ==600151==	  by 0xBC0CF8C: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:422)
37:10.98 pid:600151 ==600151==	  by 0xBC0DC57: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (Interpreter.cpp:577)
37:10.98 pid:600151 ==600151==	  by 0xBC0E224: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (Interpreter.cpp:605)
37:10.98 pid:600151 ==600151==	  by 0xBC0E3A8: js::CallFromStack(JSContext*, JS::CallArgs const&) (Interpreter.cpp:609)
37:10.98 pid:600151 ==600151==	  by 0xC7A2DE7: js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICCall_Fallback*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) (BaselineIC.cpp:2906)
37:10.98 pid:600151 ==600151==	  by 0x3875A68739E6: ???
37:10.98 pid:600151 ==600151==	  by 0x182131B7: ???
37:10.98 pid:600151 ==600151==	  by 0x3875A6869A23: ???
37:10.98 pid:600151 ==600151==	  by 0xC89F508: EnterBaseline(JSContext*, js::jit::EnterJitData&) (BaselineJIT.cpp:114)
37:10.98 pid:600151 ==600151==	  by 0xC89FD65: js::jit::EnterBaselineInterpreterAtBranch(JSContext*, js::InterpreterFrame*, unsigned char*) (BaselineJIT.cpp:187)
37:10.98 pid:600151 ==600151==	  by 0xBBFFAFF: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:1909)
37:10.98 pid:600151 ==600151==	  by 0xBC0CF8C: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:422)
37:10.98 pid:600151 ==600151==	  by 0xBC11CAA: js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::AbstractFramePtr, JS::Value*) (Interpreter.cpp:798)
37:10.98 pid:600151 ==600151==	  by 0xBC125FA: js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*) (Interpreter.cpp:831)
37:10.98 pid:600151 ==600151==	  by 0xBDFAFD5: bool EvaluateSourceBuffer<mozilla::Utf8Unit>(JSContext*, js::ScopeKind, JS::Handle<JSObject*>, JS::ReadOnlyCompileOptions const&, JS::SourceText<mozilla::Utf8Unit>&, JS::MutableHandle<JS::Value>) (CompilationAndEvaluation.cpp:564)
37:10.98 pid:600151 ==600151==	  by 0xBDFB137: JS::EvaluateDontInflate(JSContext*, JS::ReadOnlyCompileOptions const&, JS::SourceText<mozilla::Utf8Unit>&, JS::MutableHandle<JS::Value>) (CompilationAndEvaluation.cpp:596)
37:10.98 pid:600151 ==600151==	  by 0x6D931DE: ProcessArgs(mozilla::dom::AutoJSAPI&, char**, int, XPCShellDirProvider*) (XPCShellImpl.cpp:1001)
37:10.98 pid:600151 ==600151==	  by 0x6D9E146: XRE_XPCShellMain(int, char**, char**, XREShellData const*) (XPCShellImpl.cpp:1401)
37:10.98 pid:600151 ==600151==	  by 0xBAB2291: mozilla::BootstrapImpl::XRE_XPCShellMain(int, char**, char**, XREShellData const*) (Bootstrap.cpp:54)
37:10.98 pid:600151 ==600151==	  by 0x11587B: main (xpcshell.cpp:66)
37:10.98 pid:600151 ==600151==	Uninitialised value was created by a stack allocation
37:10.98 pid:600151 ==600151==	  at 0x582927B: nsFolderCompactState::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long, unsigned int) (nsMsgFolderCompactor.cpp:716)

I am attaching the full log. (I have removed a few repetitive debug dumps. Due to the large amount of lines processed for this test, header line envelope debug dump lines were too numerous and made the log very large. I have removed them from the attached log.)

I have not had the time to check the cause the yet. But here is the info so that someone may be able to look into this ASAP.

Line numbers may be slightly off due to local changes, but the function
nsFolderCompactState::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long, unsigned int) (nsMsgFolderCompactor.cpp:716) is hard to miss.

TIA

Big Question:
To what value should |statsOffset| be initialized when m_startOfMsg = 0 ?

Analysis:
I have run the test with the attached patch to dump some variables and to make sure the compaction to run ALWAYS (as opposed to the random behavior (approximately once in 100 tries).

I think I found the culprit by checking the log.

When m_startOfMsg = 0,
statusOffset is not initialized and used at
https://searchfox.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#758
The uninitialized value can be anything.

When m_startOfMsg != 0,
statusOffset is initialized by the call to
(void)m_curSrcHdr->GetStatusOffset(&statusOffset);
https://searchfox.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#733

The code is sprinkled with code snippet to check for bogus statusOffset value.
At least in one control flow path It is not initialized before use and thus very likely that the
the check would kick in and then keyword processing is not done most of the cases(!).
For example, the usage mentioned above is part of a sanity check.
https://searchfox.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#758
If the value seems insane, checkForKeyword = false; and no keyword related processing takes place.

There is another one at
https://searchfox.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#739
too.

Thus, I think the failure to initialize |statusOffset| is the cause of previously observed
loss of keyword or tag information after compaction(?).

Possibly related bugs?
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=key%20compact&list_id=15087348

Big Question:
To what value should |statusOffset| be initialized when m_startOfMsg = 0?

I am attaching an excerpt from the log
in another post.

Assignee: nobody → ishikawa

Line that looks like the following with 2779096485
shows that statusOffset is about to be used without proper initialization.

14:17.76 pid:673222 available = bc000
14:17.76 pid:673222 {debug} OnDataAvailable
14:17.76 pid:673222 m_starOfMsg = 00
14:17.76 pid:673222 {debug} OnDataAvailable
14:17.76 pid:673222 count = 786432, statusOffset = 2779096485
14:17.76 pid:673222 {debug} OnDataAvailable
14:17.76 pid:673222 available = c0000
14:17.76 pid:673222 {debug} OnDataAvailable

The bogus value is observed ONLY when m_startOfMsg == 0 (oops, there is a misspelling of m_startOfMsg.)

Component: Untriaged → Testing Infrastructure

This path seems to be taken only when 4GB compaction is attempted as noted earlier.

I modified the code to set |statusOffset| to zero always.
I have no idea if that is right.
But I don't see any bad behavior in the test in some builds (to test other patches).
I simply put the local patch for this bugzilla near the start of the local patch queue.

Just in case, I am submitting a job to test this patch and a couple other simpler patches.

https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=067170d1724573f8dfe529494891a42ceed8c71a

If it does not work, maybe I need to create |m_statusOffset| and use it instead of the local variable.

(In reply to ISHIKAWA, Chiaki from comment #3)

If it does not work, maybe I need to create |m_statusOffset| and use it instead of the local variable.

No actually, |m_statusOffset| is already there. Maybe we need to use |m_statusOffset| instead of |statusOffset| in a few places.
That makes sense. Doesn't it?

I have noticed that comm/mailnews/local/test/unit/test_over4GBMailboxes.js crashes TB lately. :-(
See Bug 1737488.

Further notice, this test has been touched by me, see https://bugzilla.mozilla.org/show_bug.cgi?id=1737488#c4

Severity: normal → S3

Is this now gone from nightly builds?

Flags: needinfo?(ishikawa)

(In reply to Wayne Mery (:wsmwk) from comment #7)

Is this now gone from nightly builds?

I am trying to create a C-C TB binary that will run under valgrind. But for some mysterious reasons, valgrind+thunderbird has failed to run for the last few months.
valgrind kills thunderbird after trying to run it for a while. I don't even get to the main window. So I cannot vouch if I can run this anymore. :-(
That is one reason I have been testing my patches using ASAN build for any memory issues it might produce.

Flags: needinfo?(ishikawa)
Flags: needinfo?(ishikawa)

(In reply to Wayne Mery (:wsmwk) from comment #7)

Is this now gone from nightly builds?

Wayne, the compaction code has been rewritten by BenC and I believe the problem is gone.
I forgot about that.

Short summary:
The original question about if this bug is still there. The short answer is No.

Comment:
Just for completeness's sake, I have been trying to run the test under valgrind. It is an uphill battle.
valgrind is a bit flakey and I am not sure if I can run it in a reasonable time at all.
See Bug 1732848 (the latest comments).

Also, this test requires 4352MiB free space. But I am not entirely sure where the extra space is required.
Is this under TMP directory, the current directory (I think, No.)
My linux image inside virtualbox is rather tight in terms of space and I need to figure out where I can put the 4GB free space.
My local test harness consists of many shell scripts, and I may have set the temporary directory in a strange place. Ugh.
(Yes, I just realized the temporary directory is set to a memory-based filesystem of 1GB only. I had to tweak it to run this test.)

Well, the test timed out and error was recorded. A good sign that valgrind+thunderbird seems to be able to tackle this particular test after appropriate temporary file system setting.
But I think I need to set the time out much, much longer. I set it to "x25" the original value and am trying to see how it goes.
Basically, xpshell does not use graphical screen and this fares well under valgrind.
mochitest, OTOH, uses graphics screen and like my attempt to run C-C TB under valgrind to access test mail folders,
it may not fare well. I could not get the main screen to operate.
I could do it a couple of years ago, IIRC. Too bad.

Flags: needinfo?(ishikawa)

Thanks for the update

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: