Closed Bug 1919678 Opened 26 days ago Closed 4 days ago

Crash in [@ neqo_common::codec::Encoder::encode_uint<T>]

Categories

(External Software Affecting Firefox :: Other, defect, P1)

Other
Windows

Tracking

(relnote-firefox 131+, firefox-esr115132+ affected, firefox-esr128132+ fixed, firefox131+ fixed, firefox132+ fixed, firefox133+ fixed)

RESOLVED FIXED
Tracking Status
relnote-firefox --- 131+
firefox-esr115 132+ affected
firefox-esr128 132+ fixed
firefox131 + fixed
firefox132 + fixed
firefox133 + fixed

People

(Reporter: release-mgmt-account-bot, Assigned: mail)

References

(Blocks 1 open bug)

Details

(Keywords: crash, Whiteboard: [necko-triaged])

Crash Data

Attachments

(2 files)

Crash report: https://crash-stats.mozilla.org/report/index/da8fc03b-0e89-4049-8f4e-57bf40240914

MOZ_CRASH Reason: assertion failed: n > 0 && n <= 8

Top 10 frames of crashing thread:

0  xul.dll  MOZ_Crash  mfbt/Assertions.h:317
0  xul.dll  RustMozCrash  mozglue/static/rust/wrappers.cpp:18
1  xul.dll  mozglue_static::panic_hook  mozglue/static/rust/lib.rs:102
2  xul.dll  core::ops::function::Fn::call<void   /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:166
3  xul.dll  alloc::boxed::impl$50::call  library/alloc/src/boxed.rs:2084
3  xul.dll  std::panicking::rust_panic_with_hook  library/std/src/panicking.rs:808
4  xul.dll  std::panicking::begin_panic_handler::closure$0  library/std/src/panicking.rs:667
5  xul.dll  std::sys::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0, never$>  library/std/src/sys/backtrace.rs:168
6  xul.dll  std::panicking::begin_panic_handler  library/std/src/panicking.rs:665
7  xul.dll  core::panicking::panic_fmt  library/core/src/panicking.rs:74

By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:

  • First crash report: 2024-09-14
  • Process type: Parent
  • Is startup crash: No
  • Has user comments: No
  • Is null crash: No
Component: General → Networking: DNS

Hey Max, FYI
Do you have any idea what recent changes might have caused this?

Flags: needinfo?(mail)
Whiteboard: [necko-triaged]
Assignee: nobody → mail
Flags: needinfo?(mail)

Thank you for triaging Sunil. I will take a more in-depth look.

Tracking upstream in github.com/mozilla/neqo#2132.

Severity: -- → S3
Priority: -- → P2

As :yjuglaret noted in the neqo issue, this crash is highly correlated with Avast, and the fact that it's showing up across many versions of Firefox strongly indicate this is an issue with Avast. I'm moving this bug to External Software Affecting Firefox and will ping our Avast contacts to see if they can take a look.

Component: Networking: DNS → Other
Product: Core → External Software Affecting Firefox

People are creating forum threads about this too. Since it is making Firefox unusable for them, shouldn't this issue be set to P1?

I don't have enough experience with crash-stats.mozilla.org to judge the importance of this issue.

Maybe Greg Stoll, can you comment here?

We could handle the error more gracefully, failing the connection instead of panicking. Though I assume that would make it more difficult to actually find and fix the root cause of the issue.

Flags: needinfo?(gstoll)

For the record, we have extended the assert with additional metadata. This will be available with #1922479.

It's the #2 topcrash in Nightly 133. Since there are also several public complaints we have more evidence that this is affecting many people.

While this does seem to affect all channels, it looks worse in 133 nightly, which could mean a nightly-only feature or pref increases the chance of the crash. Or, there could be some other new issue in 133 that makes it worse, in which case it will be a high volume crash for 133 beta and release. So my tendency would be to bump up the priority here and in bug 1922479. jesup, you set the priority in bug 1992479 so maybe you can weigh in .

Flags: needinfo?(rjesup)

The bug is marked as tracked for firefox133 (nightly). However, the bug still has low severity.

:gcp, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(gpascutto)
Flags: needinfo?(rjesup)

shouldn't this issue be set to P1?

That only really serves a purpose if we can fix it, not if the issue is with a third party. We can't force Avast to prioritize this. But maybe we can mitigate it...somehow?

Since there are also several public complaints we have more evidence that this is affecting many people.

Nightly only or release?

Severity: S3 → S2
Flags: needinfo?(gpascutto)
Priority: P2 → P1
Depends on: 1923287

Max - will bug 1923287 fix this crash? (or is it just adding more logging?)

Flags: needinfo?(gstoll) → needinfo?(mail)

bug 1923287 contains mozilla/neqo#2150, which terminates a connection, instead of panicking, when receiving an invalid ACK, i.e. an ACK for a packet that was never sent.

While this will most likely resolve the panic itself, it is still an open question why we are seeing an increase in invalid ACKs.

Flags: needinfo?(mail)

Note that neqo_transport::connection::Connection::add_packet_number is the new signature for this crash after we added some additional metadata in neqo v0.9.1.

As of neqo v0.9.2, we believe that Firefox should close the connection if we get an unexpected ACK instead of panicking.

Crash Signature: [@ neqo_common::codec::Encoder::encode_uint<T>] → [@ neqo_common::codec::Encoder::encode_uint<T>] [@ neqo_transport::connection::Connection::add_packet_number]
Duplicate of this bug: 1923565

Looking at the Nightly data the fix bug 1923287 seems to have eliminated the crash, although presumably connections are instead being closed so the underlying issue remains.

Duplicate of this bug: 1923895

Copying crash signatures from duplicate bugs.

Crash Signature: [@ neqo_common::codec::Encoder::encode_uint<T>] [@ neqo_transport::connection::Connection::add_packet_number] → [@ neqo_common::codec::Encoder::encode_uint<T>] [@ neqo_transport::connection::Connection::add_packet_number] [@ neqo_common::codec::Encoder::encode_uint]

For the record, I have reached out to Avast, informing them that the crash itself is fixed, but the underlying issue now surfaces as expected through connections being closed. They are investigating whether this is related to their logic that "intercepts Quic traffic".

Crash Signature: [@ neqo_common::codec::Encoder::encode_uint<T>] [@ neqo_transport::connection::Connection::add_packet_number] [@ neqo_common::codec::Encoder::encode_uint] → [@ neqo_common::codec::Encoder::encode_uint<T>] [@ neqo_transport::connection::Connection::add_packet_number] [@ neqo_common::codec::Encoder::encode_uint]
See Also: → 1923943

Since we have fixed the crash, I'm closing this bug (and will request uplifts) and filed bug 1923943 to cover the QUIC connection being closed as a result of the unexpected ACK.

Status: NEW → RESOLVED
Closed: 4 days ago
Resolution: --- → FIXED
Attachment #9430284 - Flags: approval-mozilla-release?

release Uplift Approval Request

  • User impact if declined: Possible crash
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: N/A
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Patch is verified
  • String changes made/needed: N/A
  • Is Android affected?: yes
Attachment #9430284 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9430354 - Flags: approval-mozilla-esr128?

esr128 Uplift Approval Request

  • User impact if declined: possible crash
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: N/A
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Patch is verified
  • String changes made/needed: N/A
  • Is Android affected?: yes
Attachment #9430354 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+
See Also: → 1924423
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: