1916558 - fosstodon.org doesn't load with network.http.http3.use_nspr_for_io=false

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Description

•

17 days ago

•

Mozregression says it's in this range: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=4d36a2f69e7027dcc1e3535e563aa57e556d9f70&tochange=c38029641964591e518535856e2d7b3038b134ad

Could not pinpoint the commit because of mozregression aarch64 support issue (WARNING: Skipping build b94ec5ba05c9: Unable to find build info using the taskcluster route 'gecko.v2.autoland.shippable.revision.b94ec5ba05c96151a539e0c05403f7384829b008.firefox.win64-aarch64-opt) but I see bug 1910360 being the only network change there.

Bisected by mozregression --good 2024-07-03 --bad 2024-08-14 -a https://fosstodon.org/explore. On a bad build the page never finishes loading, and on a good build it loads instantly.

Edit: On MSEdge the page always loads instantly.

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Updated

•

17 days ago

OS: Unspecified → Windows 11

Hardware: Unspecified → ARM64

Version: unspecified → Firefox 130

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 1

•

17 days ago

Setting network.http.http3.use_nspr_for_io to true indeed fixes the issue.

Summary: fosstodon.org doesn't load since nightly 2024-08-02 → fosstodon.org doesn't load with network.http.http3.use_nspr_for_io=false

BugBot [:suhaib / :marco/ :calixte]

Comment 2

•

17 days ago

Set release status flags based on info from the regressing bug 1910360

:mail, since you are the author of the regressor, bug 1910360, could you take a look?

For more information, please visit BugBot documentation.

status-firefox130: --- → affected

status-firefox131: --- → affected

status-firefox132: --- → affected

status-firefox-esr115: --- → unaffected

status-firefox-esr128: --- → unaffected

Flags: needinfo?(mail)

Max Inden

Assignee

Comment 3

•

16 days ago

Thank you Kagami Rosylight for tracking this down. I will take a look.

Max Inden

Assignee

Comment 4

•

16 days ago

Given that network.http.http3.use_nspr_for_io is only set to false on Nightly, i.e. quinn-udp only used Nightly, I don't think this affects firefox130, firefox131 and firefox132. Am I missing something?

# Use NSPR for HTTP3 UDP IO
- name: network.http.http3.use_nspr_for_io
  type: RelaxedAtomicBool
# Always use NSPR on Android x86. On Android x86 sendmsg and recvmmsg, both used
# by quinn-udp, are disallowed by seccomp. Future fix and further details
# tracked in https://github.com/quinn-rs/quinn/issues/1947.
#if defined(ANDROID) && !defined(HAVE_64BIT_BUILD)
  value: true
#else
  value: @IS_NOT_NIGHTLY_BUILD@
#endif
  mirror: always

https://searchfox.org/mozilla-central/rev/8fffdc727aa507ee4955042ec2d6f71d23c9c2de/modules/libpref/init/StaticPrefList.yaml#13672-13683

Max Inden

Assignee

Comment 5

•

16 days ago

I am unable to reproduce this bug on x86-64 Linux and Arm M2 Mac.

Kagami Rosylight can you reproduce the bug once more, with the following log level enabled, and send the logs to necko@mozilla.com and minden@mozilla.com?

timestamp,sync,nsHttp:5,nsSocketTransport:5,nsHostResolver:5,neqo_http3::*:5,neqo_transport::*:5,quinn_udp::*:5,neqo_udp::*:5,neqo_glue::*:5

Flags: needinfo?(krosylight)

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 6

•

16 days ago

•

Edited

Sure. A perf profile with a fresh firefox profile (select Socket Thread and see Marker Table to get the logs): https://share.firefox.dev/3XdSqo5

Flags: needinfo?(krosylight)

Kershaw Chang [:kershaw]

Comment 7

•

16 days ago

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #6)

Sure. A perf profile with a fresh firefox profile (see Marker Table to get the logs): https://share.firefox.dev/3XdSqo5

(I see zero rust logs this way 🤔)

Could you choose logging to a file instead?
Thanks.

Flags: needinfo?(krosylight)

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 8

•

16 days ago

LogMessages — (neqo_transport::) [neqo_transport] Crypto operation failed NssError { name: "SEC_ERROR_BAD_DATA", code: -8190, desc: "security library: received bad data." }
LogMessages — (neqo_transport::) [neqo_transport::stats] [Client ...] Dropped received packet: Decryption failure; Total: 27

This constantly happens, interesting.

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 9

•

16 days ago

(In reply to Kershaw Chang [:kershaw] from comment #7)

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #6)

Sure. A perf profile with a fresh firefox profile (see Marker Table to get the logs): https://share.firefox.dev/3XdSqo5

(I see zero rust logs this way 🤔)

Could you choose logging to a file instead?
Thanks.

You checked my comment too early! I just selected a wrong thread (main thread lol), select socket thread and you see all the logs.

Flags: needinfo?(krosylight)

Lars Eggert [:lars]

Comment 10

•

16 days ago

FWIW I tried it in Nightly Version 132.0a1 (2024-09-03) (64-bit) in a Windows for ARM VM on my Macbook M3 and fosstodon.org loads fine.

Lars Eggert [:lars]

Comment 11

•

16 days ago

•

Edited

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #8)

LogMessages — (neqo_transport::) [neqo_transport] Crypto operation failed NssError { name: "SEC_ERROR_BAD_DATA", code: -8190, desc: "security library: received bad data." }
LogMessages — (neqo_transport::) [neqo_transport::stats] [Client ...] Dropped received packet: Decryption failure; Total: 27

This constantly happens, interesting.

This is possibly a red herring - this is the message that gets printed when QUIC packets are padded with garbage. We should probably lower the log level of that message, or only log it if it is not a coalesced garbage packet that triggers it.

Edit: But there sure are more of those than there should be...

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 12

•

16 days ago

•

Edited

(In reply to Lars Eggert [:lars] from comment #10)

FWIW I tried it in Nightly Version 132.0a1 (2024-09-03) (64-bit) in a Windows for ARM VM on my Macbook M3 and fosstodon.org loads fine.

Hmm, could this be device specific? x86 32bit binary (by --arch 32 to mozregression) shows the same problem on my Surface Pro 11 (Qualcomm Snapdragon X) on the BER office network. (Edit: the profile above was recorded in a different network)

status-firefox130: affected → disabled

status-firefox131: affected → disabled

status-firefox132: affected → disabled

Version: Firefox 130 → Trunk

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 13

•

16 days ago

And with the same configuration my AMD desktop has no problem.

Lars Eggert [:lars]

Comment 14

•

16 days ago

•

Edited

@krosylight, am I understanding you correctly that you think this may happen on 32-bit builds/platforms but not 64-bit ones? Or only on 32-bit ARM?

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 15

•

16 days ago

•

Edited

No, I wanted to say that different archs shows the same problem on my device. All aarch64/x86/x86-64 nightly builds have the same issue.

Max Inden

Assignee

Comment 16

•

16 days ago

•

Edited

Thank you for the additional logs via mail!

Paraphrasing the above to make sure I understand correctly. On an ARM machine running Windows for ARM (Surface Pro 11) fosstodon.org does not load with aarch64, x86 and x86-64 Firefox Nightly builds with network.http.http3.use_nspr_for_io set to false. The latter two (x86, x86-64) are presumably run via Windows for ARM emulation layer.

Do I understand correctly that all other H3 pages work fine, e.g. https://quic.nginx.org/ or https://cloudflare-quic.com/ ?

Flags: needinfo?(krosylight)

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 17

•

16 days ago

•

Edited

(In reply to Max Inden from comment #16)

Thank you for the additional logs via mail!

Paraphrasing the above to make sure I understand correctly. On an ARM machine running Windows for ARM (Surface Pro 11) fosstodon.org does not load with aarch64, x86 and x86-64 Firefox Nightly builds with network.http.http3.use_nspr_for_io set to false. The latter two (x86, x86-64) are presumably run via Windows for ARM emulation layer.

Yes.

Do I understand correctly that all other H3 pages work fine, e.g. https://quic.nginx.org/ or https://cloudflare-quic.com/ ?

I did not check others as I don't know who uses H3. https://quic.nginx.org/ shows "You're not using QUIC right now, but don't despair," with NSPR pref off, while it shows "Congratulations! You're connected over QUIC." with NSPR pref on. Tested with mozregression --launch 2024-09-04 -a https://quic.nginx.org/ --pref network.http.http3.use_nspr_for_io:true. Similar result on the cloudflare page.

But as I showed to Kershaw offline, the test result is a bit flaky. Sometimes it connects well with QUIC (and then breaks on a few reload and then permanently stuck on HTTP2), or just goes straight to HTTP2 from the start.

Edit: And with the nspr pref on, things work consistently without flakyness.

Flags: needinfo?(krosylight)

Max Inden

Assignee

Comment 18

•

16 days ago

Thank you for your help!

Do you have Rust installed on this laptop? If so, would you mind trying the following and posting the output here?

# In some directory of your choice:
git clone https://github.com/quinn-rs/quinn.git
cd quinn/quinn-udp 
cargo test

Context:

When you set network.http.http3.use_nspr_for_io to false your Firefox instance uses quinn-udp instead of NSPR.
Instead of debugging quinn-udp through Firefox, we might as well debug it directly, running its unit tests on your machine.

Flags: needinfo?(krosylight)

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 19

•

16 days ago

Test seems clean:

> cargo test
warning: D:\quinn\quinn-udp\Cargo.toml: unused manifest key: target.cfg(any(target_os = "linux", target_os = "windows")).bench
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.09s
     Running unittests src\lib.rs (D:\quinn\target\debug\deps\quinn_udp-528859568f696d14.exe)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests\tests.rs (D:\quinn\target\debug\deps\tests-9710717108edc830.exe)

running 6 tests
test basic ... ok
test ecn_v4_mapped_v6 ... ok
test ecn_v6 ... ok
test ecn_v6_dualstack ... ok
test ecn_v4 ... ok
test gso ... ok

test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

   Doc-tests quinn_udp

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Flags: needinfo?(krosylight)

Max Inden

Assignee

Comment 20

•

16 days ago

Thank you for trying this so quickly. This is surprising.

Do I understand correctly that you only alter the network.http.http3.use_nspr_for_io pref?

Flags: needinfo?(krosylight)

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 21

•

16 days ago

•

Edited

Yes, as each time mozregression creates a new profile in a temp directory, which then only includes the default prefs unless told otherwise by --pref.

Flags: needinfo?(krosylight)

Lars Eggert [:lars]

Comment 22

•

15 days ago

Is this only a problem from the Berlin office network, or does it happen elsewhere?

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 23

•

15 days ago

It happens elsewhere too.

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 24

•

11 days ago

So the connection tend to work initially (in https://quic.nginx.org/) and then breaks in a few reload. Can we try making a test simulating such situation?

Valentin Gosu [:valentin] (he/him)

Updated

•

7 days ago

Assignee: nobody → mail

Severity: -- → S3

Priority: -- → P2

Whiteboard: [necko-triaged][necko-priority-next]

Max Inden

Assignee

Comment 25

•

4 days ago

If I understand Firefox's build system correctly, debug and trace level logging is not available in release builds.

Would you mind reproducing the issue at hand once more, this time in a debug build, and again send us the logs?

As far as I know, the following addition to mozconfig should do, but I assume you know better than me.

ac_add_options --enable-debug
ac_add_options --enable-debug-symbols

Please enable the following log level:

timestamp,sync,nsHttp:5,nsSocketTransport:5,nsHostResolver:5,neqo_http3::*:5,neqo_transport::*:5,quinn_udp::*:5,neqo_udp::*:5,neqo_glue::*:5,neqo_common::*:5

Thank you for your help!

Flags: needinfo?(mail) → needinfo?(krosylight)

Kagami Rosylight [:saschanaz] (they/them)

Reporter

Comment 26

•

4 days ago

Oh hey, I can see tons of dropped packets in debug build. Sending the log!

Flags: needinfo?(krosylight)