Closed Bug 1898292 Opened 1 year ago Closed 11 months ago

Timeout in NetworkWebsocket fuzzer

Categories

(Core :: Fuzzing, defect, P2)

defect
Points:
5

Tracking

()

RESOLVED FIXED
130 Branch
Tracking Status
firefox130 --- fixed

People

(Reporter: jkratzer, Assigned: kershaw)

Details

(Whiteboard: [fuzzblocker])

Attachments

(6 files)

Attached file corpora.zip

Replaying the attached corpus triggers a timeout which has a negative impact on fuzzing. Executing both tests are required to trigger the timeout.

$ unzip corpora.zip 
$ FUZZER=NetworkWebsocket ./libfuzzer/firefox -rss_limit_mb=3500 -timeout=5 ./corpora/* 
*** You are running in headless mode.
Running Fuzzer tests...
INFO: Seed: 796195894
INFO: Loaded 2 modules   (3887989 inline 8-bit counters): 15344 [0x7442f9455e50, 0x7442f9459a40), 3872645 [0x7442f0f0dea0, 0x7442f12bf625), 
INFO: Loaded 2 PC tables (3887989 PCs): 15344 [0x7442f9459a40,0x7442f9495940), 3872645 [0x7442f12bf628,0x7442f4dd6e78), 
./libfuzzer/firefox: Running 2 inputs 1 time(s) each.
Running: ./corpora/0701e82da7a246e42be944256cddb4f92cf68e66
Executed ./corpora/0701e82da7a246e42be944256cddb4f92cf68e66 in 74 ms
Running: ./corpora/070c32b493e14da935746f8e2365c1fe06c48e59
ALARM: working on the last Unit for 5 seconds
       and the timeout value is 5 (use -timeout=N to change)
==157738== ERROR: libFuzzer: timeout after 5 seconds
    #0 0x5bf339a3c7d1 in __sanitizer_print_stack_trace /builds/worker/fetches/llvm-project/compiler-rt/lib/asan/asan_stack.cpp:87:3
    #1 0x5bf339c617c1 in fuzzer::PrintStackTrace() /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerUtil.cpp:210:5
    #2 0x5bf339c4a430 in fuzzer::Fuzzer::AlarmCallback() /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerLoop.cpp:309:5
    #3 0x7442fc4d651f  (/lib/x86_64-linux-gnu/libc.so.6+0x4251f) (BuildId: 962015aa9d133c6cbcfb31ec300596d7f44d3348)
    #4 0x7442fc525116 in __futex_abstimed_wait_common64 nptl/futex-internal.c:57:12
    #5 0x7442fc525116 in __futex_abstimed_wait_common nptl/futex-internal.c:87:9
    #6 0x7442fc525116 in __GI___futex_abstimed_wait_cancelable64 nptl/futex-internal.c:139:10
    #7 0x7442fc527a40 in __pthread_cond_wait_common nptl/pthread_cond_wait.c:503:10
    #8 0x7442fc527a40 in pthread_cond_wait nptl/pthread_cond_wait.c:627:10
    #9 0x5bf339b5216b in mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) /builds/worker/checkouts/gecko/mozglue/misc/ConditionVariable_posix.cpp:106:11
    #10 0x7442d2875c29 in Wait /builds/worker/workspace/obj-build/dist/include/mozilla/CondVar.h:58:11
    #11 0x7442d2875c29 in mozilla::TaskController::GetRunnableForMTTask(bool) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:631:19
    #12 0x7442d28ba3e0 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1134:38
    #13 0x7442d28c862a in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
    #14 0x7442cebcc57b in SpinEventLoopUntil<(mozilla::ProcessFailureBehavior)1, (lambda at /builds/worker/checkouts/gecko/netwerk/test/fuzz/TestWebsocketFuzzing.cpp:217:24)> /builds/worker/workspace/obj-build/dist/include/mozilla/SpinEventLoopUntil.h:176:25
    #15 0x7442cebcc57b in mozilla::net::FuzzingRunNetworkWebsocket(unsigned char const*, unsigned long) /builds/worker/checkouts/gecko/netwerk/test/fuzz/TestWebsocketFuzzing.cpp:216:5
    #16 0x5bf339c4b8db in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerLoop.cpp:570:11
    #17 0x5bf339c38c22 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerDriver.cpp:301:6
    #18 0x5bf339c3d8d1 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /builds/worker/checkouts/gecko/tools/fuzzing/libfuzzer/FuzzerDriver.cpp:810:9
    #19 0x7442e2d8be6e in mozilla::FuzzerRunner::Run(int*, char***) /builds/worker/checkouts/gecko/tools/fuzzing/interface/harness/FuzzerRunner.cpp:75:13
    #20 0x7442e2c9a1c6 in XREMain::XRE_mainStartup(bool*) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4698:35
    #21 0x7442e2cab77f in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:5970:12
    #22 0x7442e2cacac1 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:6039:21
    #23 0x5bf339a718b2 in do_main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:230:22
    #24 0x5bf339a718b2 in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:448:16
    #25 0x7442fc4bdd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #26 0x7442fc4bde3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #27 0x5bf339995f78 in _start (/home/jkratzer/builds/libfuzzer/firefox+0xd9f78) (BuildId: 3e3be701051dffbadc26ac0a42058db5b55a2627)

SUMMARY: libFuzzer: timeout
Attached file stderr.txt
Whiteboard: [fuzzblocker]
Severity: -- → S3
Points: --- → 5
Priority: -- → P2
Whiteboard: [fuzzblocker] → [fuzzblocker][necko-triaged][necko-priority-next]

This bug prevents fuzzing from making progress; however, it has low severity. It is important for fuzz blocker bugs to be addressed in a timely manner (see here why?).
:kershaw, could you consider increasing the severity?

For more information, please visit BugBot documentation.

Flags: needinfo?(kershaw)
Severity: S3 → S2
Flags: needinfo?(kershaw)
Whiteboard: [fuzzblocker][necko-triaged][necko-priority-next] → [fuzzblocker][necko-triaged][necko-priority-queue]
Assignee: nobody → rjesup

Something is wrong in the assumptions encoded in FuzzyLayer.cpp. I've poked at it, took an RR trace, ran it with logging (and both corpora separately), and it's clear that the state machines embedded in FuzzyLayer/TestWebSocketFuzzing have some disconnect with how things work.

I do note that I see network activity/calls after the first test ends when only running the first test; and I see that when I run both as well. I suspect that interaction is involved in the problem.

I don't see anything wrong in the WebSocket/network code here.

Assignee: rjesup → nobody
Component: Networking → Fuzzing
Whiteboard: [fuzzblocker][necko-triaged][necko-priority-queue] → [fuzzblocker][necko-triaged]
Whiteboard: [fuzzblocker][necko-triaged] → [fuzzblocker]

Has there been any progress on this? I'm considering disabling this fuzzer if a resolution is not found soon.

(In reply to Jason Kratzer [:jkratzer] from comment #7)

Has there been any progress on this? I'm considering disabling this fuzzer if a resolution is not found soon.

Sorry that I have no idea how to reproduce this on my mac book.
Could you try to get a log with setting MOZ_LOG to timestamp,sync,nsHttp:5,nsWebSocket:5,nsSocketTransport:5,nsHostResolver:5?
I can try to take a look at the log. Thanks.

Flags: needinfo?(jkratzer)

Kershaw, does the testcase not reproduce for you or you don't know how to reproduce it? If it's the latter, you can do the following:

$ pip install fuzzfetch
$ python -m fuzzfetch -a --fuzzing --target gtest -n libfuzzer
$ unzip corpora.zip 
$ FUZZER=NetworkWebsocket ./libfuzzer/firefox -rss_limit_mb=3500 -timeout=5 ./corpora/*

If you still can't reproduce the issue please let me know and I'll get you the logs.

Flags: needinfo?(jkratzer)

I sent a log with those and nsFuzzingNecko:5 to kershaw

Note: you need to build firefox with --enable-fuzzing, and probably use "./firefox" instead of ./libfuzzer/firefox

I saw this socket error from the log:

2024-07-29 21:49:37.311469 UTC - [Parent 3093518: Socket Thread]: V/nsFuzzingNecko [FuzzyConnect] Denying additional connection.
2024-07-29 21:49:37.311480 UTC - [Parent 3093518: Socket Thread]: D/nsSocketTransport ErrorAccordingToNSPR [in=-5978 out=80004005]
2024-07-29 21:49:37.311485 UTC - [Parent 3093518: Socket Thread]: D/nsSocketTransport   after event [this=7fc8860d7300 cond=80004005]

This is the reason of the websocket failure.
That error is from the fuzzing code here.
Jason, could you take a look? It seems that necko can't do anything about this error.

Flags: needinfo?(jkratzer)

:decoder, can you take a look at this?

Flags: needinfo?(jkratzer) → needinfo?(choller)

I don't understand why the log in comment 12 should make this time out.

The code in every network fuzzing target spins until the channel associated with the fuzzing request is destroyed and the connection closed.

https://searchfox.org/mozilla-central/source/netwerk/test/fuzz/TestWebsocketFuzzing.cpp#207-218

If we hit the failure above while opening an additional connection (what connection is this, btw?), why don't we close/destroy the channel then? Could it be that we drop this error somewhere and don't tear down properly? That was the case with other (HTTP) connections before.

Flags: needinfo?(choller)
Flags: needinfo?(kershaw)

Jason, also, if you can reproduce the issue, can you check which of the two conditions we are waiting for while timing out?

Flags: needinfo?(jkratzer)

(In reply to Christian Holler (:decoder) from comment #14)

I don't understand why the log in comment 12 should make this time out.

The code in every network fuzzing target spins until the channel associated with the fuzzing request is destroyed and the connection closed.

https://searchfox.org/mozilla-central/source/netwerk/test/fuzz/TestWebsocketFuzzing.cpp#207-218

If we hit the failure above while opening an additional connection (what connection is this, btw?), why don't we close/destroy the channel then? Could it be that we drop this error somewhere and don't tear down properly? That was the case with other (HTTP) connections before.
So, I finally figured out the reason for the timeout.

The test times out because we are waiting for gFuzzingConnClosed here:
https://searchfox.org/mozilla-central/rev/669fac9888b173c02baa4c036e980c0c204dfe02/netwerk/test/fuzz/TestWebsocketFuzzing.cpp#217

There are two WebSocket connections created in the test, and both are closed at the end. gFuzzingConnClosed is set to false here because PR_SendTo is called:
https://searchfox.org/mozilla-central/rev/669fac9888b173c02baa4c036e980c0c204dfe02/netwerk/base/FuzzyLayer.cpp#190

2024-07-29 21:49:37.313163 UTC - [Parent 3093518: Socket Thread]: I/nsHttp Http3Session::ProcessOutput sending packet with 1357 bytes to 127.0.0.1 port=443 [this=7fc8860d7000].
2024-07-29 21:49:37.313170 UTC - [Parent 3093518: Socket Thread]: V/nsFuzzingNecko [FuzzySendTo] Successfully opened connection: 7fc88531b970

PR_SendTo is called because Necko created an HTTP/3 connection to verify the Alt-Svc received header. Since the HTTP/3 connection timeout (30s) is larger than the timeout (5s) used in this test, we timeout.

2024-07-29 21:49:37.310523 UTC - [Parent 3093518: Main Thread]: D/nsHttp AltSvcMapping created npnToken=h3
2024-07-29 21:49:37.310530 UTC - [Parent 3093518: Main Thread]: D/nsHttp AltSvcMapping ctor 7fc8a986de40 https://3+fdct:443 to :443

A quick workaround might be disabling HTTP/3 in NetworkWebsocket fuzzer.

Flags: needinfo?(kershaw)
Flags: needinfo?(jkratzer)
Assignee: nobody → kershaw
Status: NEW → ASSIGNED
Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a76a20a4048b Disable HTTP/3 when fuzzing websocket, r=decoder
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 130 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: