Closed Bug 1996828 Opened 6 months ago Closed 2 months ago

Fuzzing calls exit() which runs static destructors and can cause MemoryTelemetry to be used after free.

Categories

(Thunderbird :: Testing Infrastructure, task)

task

Tracking

(Not tracked)

RESOLVED FIXED
149 Branch

People

(Reporter: jtracey, Assigned: pbone)

References

Details

Attachments

(4 files)

Since bug 1505277 is now wontfix, we should get Thunderbird fuzzing running on Mozilla infrastructure instead.

It would be good to describe what the next steps are.

As I understand it, fuzzing-capable binaries are now produced by Thunderbird CI.

I assume the next step is to have a task running that fetches those builds, and regularly executes them, captures the result and produces some kind of report.

What's necessary for that?

It sounds like at least Fuzzfetch support is needed, but how the Mozilla fuzzing infra works beyond that, I don't know.

Flags: needinfo?(twsmith)

Once fuzzfetch support is added the fuzzing team can add the target to our fuzzing infrastructure.

Flags: needinfo?(twsmith)

Now that the PR is merged, what are the next steps here?

Flags: needinfo?(twsmith)

Thanks for the fuzzzfetch PR!

guided-fuzzing-daemon is used in automation to run the fuzzing targets. AFL and Libfuzzer support will be required. Have a look and let us know if you have any questions.

Flags: needinfo?(twsmith)

That said we may be able to get the libfuzzer working. I'll try and let you know.

Flags: needinfo?(twsmith)

We got the infrastructure changes in place but the target is crashing when running replay or corpus merge. These issues will need to be resolved before the target will run properly in automation. This can be reproduced locally by running:

replay (add a single 'a' to test.bin):

FUZZER=MimeDecoder ./thunderbird test.bin

corpus reduce:

FUZZER=MimeDecoder ./thunderbird -merge=1 corpus/ empty/
Flags: needinfo?(twsmith)

I've been looking into this, but haven't made a ton of progress. Here is the output with ASAN's backtraces:

FUZZER=MimeDecoder ./thunderbird test.bin
*** You are running in headless mode.
Running Fuzzer tests...
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1874693350
INFO: Loaded 2 modules   (2478618 inline 8-bit counters): 14715 [0x77262b6534e0, 0x77262b656e5b), 2463903 [0x7726254725d0, 0x7726256cbe6f), 
INFO: Loaded 2 PC tables (2478618 PCs): 14715 [0x77262b656e60,0x77262b690610), 2463903 [0x7726256cbe70,0x772627c64860), 
./thunderbird: Running 1 inputs 1 time(s) each.
Running: test.bin
Executed test.bin in 47 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***
=================================================================
==1254668==ERROR: AddressSanitizer: heap-use-after-free on address 0x77a62da56df8 at pc 0x77260ed0525e bp 0x7ffe405b1630 sp 0x7ffe405b1628
READ of size 1 at 0x77a62da56df8 thread T0
    #0 0x77260ed0525d in mozilla::MemoryTelemetry::Poke() /builds/worker/checkouts/gecko/xpcom/base/MemoryTelemetry.cpp:160:8
    #1 0x77261083ee60 in XPCJSContext::AfterProcessTask(unsigned int) /builds/worker/checkouts/gecko/js/xpconnect/src/XPCJSContext.cpp:1539:28
    #2 0x77260eee1311 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1220:24
    #3 0x77260eedb349 in NS_ProcessPendingEvents(nsIThread*, unsigned int) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:426:19
    #4 0x77260ec96887 in mozilla::AppShutdown::AdvanceShutdownPhaseInternal(mozilla::ShutdownPhase, bool, char16_t const*, nsCOMPtr<nsISupports> const&) /builds/worker/checkouts/gecko/xpcom/base/AppShutdown.cpp:407:5
    #5 0x77261bd3fa2f in ~ScopedXPCOM /builds/worker/checkouts/gecko/tools/fuzzing/interface/harness/FuzzerTestHarness.h:82:7
    #6 0x77261bd3fa2f in mozilla::_InitFuzzer::DeinitXPCOM() /builds/worker/checkouts/gecko/tools/fuzzing/interface/harness/FuzzerRunner.cpp:25:23
    #7 0x7b262e6488a0 in __run_exit_handlers stdlib/exit.c:118:8
    #8 0x7b262e64897d in exit stdlib/exit.c:148:3
    #9 0x572ceef5b44c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerDriver.cpp
    #10 0x77261bd3f5fb in mozilla::FuzzerRunner::Run(int*, char***) /builds/worker/checkouts/gecko/tools/fuzzing/interface/harness/FuzzerRunner.cpp:87:13
    #11 0x77261bc8bdba in XREMain::XRE_mainStartup(bool*) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4817:35
    #12 0x77261bc9759f in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:6148:12
    #13 0x77261bc98703 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:6246:21
    #14 0x572ceee718b8 in do_main /builds/worker/checkouts/gecko/comm/mail/app/nsMailApp.cpp:218:22
    #15 0x572ceee718b8 in main /builds/worker/checkouts/gecko/comm/mail/app/nsMailApp.cpp:414:16
    #16 0x7b262e62a574 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #17 0x7b262e62a627 in __libc_start_main csu/../csu/libc-start.c:360:3
    #18 0x572ceed8b9b8 in _start (/home/user/Documents/fuzzfetch/thunderbird-fuzzing3/thunderbird-bin+0xcf9b8) (BuildId: 814d5880744d72c1bafd6f9f161c20cec1ca34c2)
 
0x77a62da56df8 is located 88 bytes inside of 96-byte region [0x77a62da56da0,0x77a62da56e00)
freed by thread T0 here:
    #0 0x572ceee2cab6 in free /builds/worker/fetches/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:51:3
    #1 0x77260ed0436d in operator delete /builds/worker/workspace/obj-build/dist/include/mozilla/cxxalloc.h:64:10
    #2 0x77260ed0436d in mozilla::MemoryTelemetry::Release() /builds/worker/checkouts/gecko/xpcom/base/MemoryTelemetry.cpp:122:1
    #3 0x7b262e6488a0 in __run_exit_handlers stdlib/exit.c:118:8
    #4 0x7b262e64897d in exit stdlib/exit.c:148:3
    #5 0x572ceef5b44c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerDriver.cpp
    #6 0x77261bd3f5fb in mozilla::FuzzerRunner::Run(int*, char***) /builds/worker/checkouts/gecko/tools/fuzzing/interface/harness/FuzzerRunner.cpp:87:13
    #7 0x77261bc8bdba in XREMain::XRE_mainStartup(bool*) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4817:35
    #8 0x77261bc9759f in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:6148:12
    #9 0x77261bc98703 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:6246:21
    #10 0x572ceee718b8 in do_main /builds/worker/checkouts/gecko/comm/mail/app/nsMailApp.cpp:218:22
    #11 0x572ceee718b8 in main /builds/worker/checkouts/gecko/comm/mail/app/nsMailApp.cpp:414:16
    #12 0x7b262e62a574 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #13 0x7b262e62a627 in __libc_start_main csu/../csu/libc-start.c:360:3
    #14 0x572ceed8b9b8 in _start (/home/user/Documents/fuzzfetch/thunderbird-fuzzing3/thunderbird-bin+0xcf9b8) (BuildId: 814d5880744d72c1bafd6f9f161c20cec1ca34c2)
 
previously allocated by thread T0 here:
    #0 0x572ceee2cd54 in malloc /builds/worker/fetches/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3
    #1 0x572ceee823a5 in moz_xmalloc /builds/worker/checkouts/gecko/memory/mozalloc/mozalloc.cpp:51:15
    #2 0x77260ed0463e in operator new /builds/worker/workspace/obj-build/dist/include/mozilla/cxxalloc.h:33:10
    #3 0x77260ed0463e in mozilla::MemoryTelemetry::Get() /builds/worker/checkouts/gecko/xpcom/base/MemoryTelemetry.cpp:146:17
    #4 0x77261083ee58 in XPCJSContext::AfterProcessTask(unsigned int) /builds/worker/checkouts/gecko/js/xpconnect/src/XPCJSContext.cpp:1539:5
    #5 0x77260eee1311 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1220:24
    #6 0x77260eeeb828 in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:461:10
    #7 0x77260acbf476 in bool mozilla::SpinEventLoopUntil<(mozilla::ProcessFailureBehavior)1, mozilla::net::FuzzingStreamListener::waitUntilDone()::'lambda'()>(nsTSubstring<char> const&, mozilla::net::FuzzingStreamListener::waitUntilDone()::'lambda'()&&, nsIThread*) /builds/worker/workspace/obj-build/dist/include/mozilla/SpinEventLoopUntil.h:176:25
    #8 0x77260d5b226f in waitUntilDone /builds/worker/workspace/obj-build/dist/include/mozilla/fuzzing/FuzzingStreamListener.h:23:5
    #9 0x77260d5b226f in FuzzingMimeDecoder /builds/worker/checkouts/gecko/comm/mailnews/mime/fuzz/TestMimeFuzz.cpp:84:19
    #10 0x77260d5b226f in LibFuzzerTestMimeDecoder(unsigned char const*, unsigned long) /builds/worker/checkouts/gecko/comm/mailnews/mime/fuzz/TestMimeFuzz.cpp:89:1
    #11 0x572ceef6a346 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerLoop.cpp:619:13
    #12 0x572ceef55b68 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerDriver.cpp:335:6
    #13 0x572ceef5b2dd in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerDriver.cpp:871:9
    #14 0x77261bd3f5fb in mozilla::FuzzerRunner::Run(int*, char***) /builds/worker/checkouts/gecko/tools/fuzzing/interface/harness/FuzzerRunner.cpp:87:13
    #15 0x77261bc8bdba in XREMain::XRE_mainStartup(bool*) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4817:35
    #16 0x77261bc9759f in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:6148:12
    #17 0x77261bc98703 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:6246:21
    #18 0x572ceee718b8 in do_main /builds/worker/checkouts/gecko/comm/mail/app/nsMailApp.cpp:218:22
    #19 0x572ceee718b8 in main /builds/worker/checkouts/gecko/comm/mail/app/nsMailApp.cpp:414:16
    #20 0x7b262e62a574 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #21 0x7b262e62a627 in __libc_start_main csu/../csu/libc-start.c:360:3
    #22 0x572ceed8b9b8 in _start (/home/user/Documents/fuzzfetch/thunderbird-fuzzing3/thunderbird-bin+0xcf9b8) (BuildId: 814d5880744d72c1bafd6f9f161c20cec1ca34c2)
 
SUMMARY: AddressSanitizer: heap-use-after-free /builds/worker/checkouts/gecko/xpcom/base/MemoryTelemetry.cpp:160:8 in mozilla::MemoryTelemetry::Poke()

It's clear that the problem is the final drain of the event loop is still invoking the memory telemetry after it was already freed, but I don't understand why that's happening or how to prevent it.

Chris, do you know who might be a good person for us to ask about this?

Flags: needinfo?(chutten)

I think :pbone is the one most likely to know how best to work out "Fuzzer's deinit of XPCOM is triggering the XPCJSContext to poke an uninit'd MemoryTelemetry reporter"

Flags: needinfo?(chutten) → needinfo?(pbone)

Those stacks paint a clear picture. I'll fix it on Monday.

Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(pbone)

It seems strange that the fuzzing code would call exit(), this is running destructors for static variables which have no defined execution order and causing the use after free:

    #2 0x77260ed0436d in mozilla::MemoryTelemetry::Release() /builds/worker/checkouts/gecko/xpcom/base/MemoryTelemetry.cpp:122:1
    #3 0x7b262e6488a0 in __run_exit_handlers stdlib/exit.c:118:8
    #4 0x7b262e64897d in exit stdlib/exit.c:148:3
    #5 0x572ceef5b44c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerDriver.cpp
    #6 0x77261bd3f5fb in mozilla::FuzzerRunner::Run(int*, char***) /builds/worker/checkouts/gecko/tools/fuzzing/interface/harness/FuzzerRunner.cpp:87:13
    #7 0x77261bc8bdba in XREMain::XRE_mainStartup(bool*) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4817:35
    #8 0x77261bc9759f in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:6148:12

I will fix this use after free but there may be others that get caused by calling exit() without shutting down XPCOM.

(In reply to Justin Tracey [:jtracey] from comment #9)

I've been looking into this, but haven't made a ton of progress. Here is the output with ASAN's backtraces:

FUZZER=MimeDecoder ./thunderbird test.bin

Hi Justin, I can see this is how to run the test. Can you show me how to build for this test and if there's anything else I need to run it? Or is it best to upload some patches and ask you if you could test them?

Thanks.

Flags: needinfo?(jtracey)

:truber, I vaguely remember we had a similar problem before with libFuzzer calling exit() causing issues with our destructor order? how did we fix that?

Flags: needinfo?(jschwartzentruber)

There's _exit() which skips the destructors, I think. Although I remember there's some complication about it? Looking at AppShutdown::DoImmediateExit, on Windows we do something different. Honestly you could just call DoImmediateExit() and maybe that would be best.

(In reply to Paul Bone [:pbone] from comment #14)

Hi Justin, I can see this is how to run the test. Can you show me how to build for this test and if there's anything else I need to run it? Or is it best to upload some patches and ask you if you could test them?

Thanks.

To run the test, you'd need to build thunderbird with fuzzing enabled (including the gtest build, as documented there). Submitting changes for me to try is fine too though.

(In reply to Andrew McCreight [:mccr8] from comment #16)

There's _exit() which skips the destructors, I think. Although I remember there's some complication about it? Looking at AppShutdown::DoImmediateExit, on Windows we do something different. Honestly you could just call DoImmediateExit() and maybe that would be best.

We're only fuzzing on Linux for now, FWIW, though longer term that could change. If terminating more quickly is the solution, and ASAN is fine with it, I'm good with that, but I would still like to understand why we need that (and don't in Firefox fuzzing targets, at least from what I can see).

Flags: needinfo?(jtracey)

(In reply to Christian Holler (:decoder) from comment #15)

:truber, I vaguely remember we had a similar problem before with libFuzzer calling exit() causing issues with our destructor order? how did we fix that?

We fixed this in bug 1695761 by patching a lot of the exit() calls in libfuzzer to return instead. It looks like there's a lot more that could be done.

Flags: needinfo?(jschwartzentruber)
See Also: → 1695761

MemoryTelemetry was already ref counted, but it also had a static global
that was assumed to be never cleared/freed. This patch makes it
entirely reference counted and places a reference in TelemetryImpl so
that TelemetryImpl can always access it. It also requires threadsafe
reference counting since some of the memory telemetry is collected
off-thread.

Until telemetry is initialised MemoryTelemetry will consume no memory.

This shutdown observer flushes telemetry from content processes during
shutdown. It may have been created for memory telemetry but it's more
general.

Pushed by pbone@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/0147760b374d https://hg.mozilla.org/integration/autoland/rev/085f296a1221 Use reference counting for MemoryTelemetry r=toolkit-telemetry-reviewers,chutten https://github.com/mozilla-firefox/firefox/commit/bad6494dd217 https://hg.mozilla.org/integration/autoland/rev/feb6702151bc Make creation of MemoryTelemetry explicit r=toolkit-telemetry-reviewers,chutten https://github.com/mozilla-firefox/firefox/commit/bc6ccb4c0628 https://hg.mozilla.org/integration/autoland/rev/4fd494628aac Move the shutdown observer into TelemetryImpl r=toolkit-telemetry-reviewers,chutten https://github.com/mozilla-firefox/firefox/commit/4fb3fa392553 https://hg.mozilla.org/integration/autoland/rev/aca3a656d078 MemoryTelemetry doesn't need to inherrit from nsISupports r=toolkit-telemetry-reviewers,chutten
Pushed by agoloman@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/4bea86c2ae8d https://hg.mozilla.org/integration/autoland/rev/797ae648ee02 Revert "Bug 1996828 - MemoryTelemetry doesn't need to inherrit from nsISupports r=toolkit-telemetry-reviewers,chutten" for causing mda failures @browser_encrypted_play_time_telemetry.js.

Backed out for causing mda failures @browser_encrypted_play_time_telemetry.js.

Flags: needinfo?(pbone)

I just realized that only one commit in the stack was reverted, and the rest of the stack seems to be sufficient for our purposes. :tsmith, can we try fuzzing Thunderbird again?

Flags: needinfo?(twsmith)

Thanks for the ni? I have the fuzzer up and running in our infrastructure. I will log any issues that are reported.

Flags: needinfo?(twsmith)

Excellent! I'm going to close this now, since everything in this component is complete. If there's lingering issues that still need to be fixed with MemoryTelemetry, that's probably best tracked in its own issue at this point. Thanks everyone for helping get this over the finish line!

Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 149 Branch
See Also: → 2018949

Hello Tyson:

In comment 8 you said you were seeing crashes and fixing the crashes is necessary for the target to run properly in automation.
Early december crash fixes were landed, but they were backed out for causing other failures (see comment 25).

In comment 27 you said, that the fuzzer is correctly running.
Does that mean it is no longer necessary to fix the crashes? Or did the crashes disappear?

Thanks also for your offer to let us know about any issue that you'll see reported.
I'm wondering, is there a way for the Thunderbird team to have a look at logs to confirm the MIME fuzzer is executed as expected?
Is there any help needed from our side to keep fuzzing running?

Flags: needinfo?(twsmith)

(In reply to Kai Engert [:KaiE:] from comment #29)

In comment 8 you said you were seeing crashes and fixing the crashes is necessary for the target to run properly in automation.
Early december crash fixes were landed, but they were backed out for causing other failures (see comment 25).

See bug 2018949, but this does not seem to be affecting our automation runs.

In comment 27 you said, that the fuzzer is correctly running.
Does that mean it is no longer necessary to fix the crashes? Or did the crashes disappear?

Yes, it is running in automation but only runs for 20 minutes before OOMing. This is a separate issue, the memory usage seems to continue to grow over time, perhaps a memory leak in the target? In the short term we can give the machine more memory.

Thanks also for your offer to let us know about any issue that you'll see reported.

Check your access to: https://fuzzmanager.fuzzing.mozilla.org. If you can login we can grant access to Thunderbird reports, but there aren't any at the moment.

I'm wondering, is there a way for the Thunderbird team to have a look at logs to confirm the MIME fuzzer is executed as expected?

Nothing beyond checking locally at this time.

Is there any help needed from our side to keep fuzzing running?

Investigating if there is indeed a memory leak in the target and getting that resolved would allow for full fuzzing runs.

Flags: needinfo?(twsmith)

Fixing this now. I see the problem. The observer was holding a weak reference to the telemetry class. it needs a strong reference to live long enough to get the content-child-shutdown message.

Flags: needinfo?(pbone)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: fuzz Thunderbird on Mozilla infrastructure → Fuzzing calls exit() which runs static destructors and can cause MemoryTelemetry to be used after free.
Duplicate of this bug: 2018949
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: