Open Bug 1800152 Opened 3 years ago Updated 3 years ago

Stack overflow, with nested event loop in nsSyncStreamListener::Create

Categories

(Core :: Networking, defect, P3)

x86
Windows 11
defect


People

(Reporter: dholbert, Unassigned)

Details

(Keywords: crash, Whiteboard: [necko-triaged])

Crash report: https://crash-stats.mozilla.org/report/index/9a4f6ed1-064c-4bae-86d5-1e1420221109

Reason: EXCEPTION_STACK_OVERFLOW

Top 10 frames of crashing thread:

0  xul.dll  nsLineBreaker::AppendText  dom/base/nsLineBreaker.cpp:340
1  xul.dll  BuildTextRunsScanner::SetupBreakSinksForTextRun  layout/generic/nsTextFrame.cpp:2890
2  xul.dll  BuildTextRunsScanner::BuildTextRunForFrames  layout/generic/nsTextFrame.cpp:2665
3  xul.dll  BuildTextRunsScanner::FlushFrames  layout/generic/nsTextFrame.cpp:1750
4  xul.dll  BuildTextRunsScanner::ScanFrame  layout/generic/nsTextFrame.cpp:2110
5  xul.dll  BuildTextRunsScanner::ScanFrame  layout/generic/nsTextFrame.cpp:2164
6  xul.dll  BuildTextRunsScanner::ScanFrame  layout/generic/nsTextFrame.cpp:2164
7  xul.dll  nsTextFrame::EnsureTextRun  layout/generic/nsTextFrame.cpp:3096
8  xul.dll  nsTextFrame::ReflowText  layout/generic/nsTextFrame.cpp:9572
9  xul.dll  nsLineLayout::ReflowFrame  layout/generic/nsLineLayout.cpp:873

That backtrace is mostly irrelevant; it just indicates that we happen to be doing text reflow when we run out of stack space. Text reflow isn't the reason we run out of stack space, though. The crash annotations tab shows that we're deeply nested in event loops spun inside nsSyncStreamListener::Create, for some reason.

Not sure where to file this bug; I'll start it off in Networking, since nsSyncStreamListener::Create is a networking function. The root issue here is that we spin the event loop while we're inside that function, and something (possibly frontend code?) then decides to call nsSyncStreamListener::Create again from inside that nested event loop.
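A toy, single-threaded model of the suspected re-entrancy may make the stack growth easier to see. This is a sketch under stated assumptions: ToyEventLoop, SpinUntil and SyncOpen are hypothetical names, not Gecko APIs, and the queued events stand in for whatever code calls back into the synchronous open. Each pending caller that opens synchronously adds one more level of nesting, so depth grows with the number of such callers; on a real thread every level is a chain of stack frames, which is how unrelated code such as text reflow can end up being the frame that hits EXCEPTION_STACK_OVERFLOW.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Toy, single-threaded model of nested event-loop spinning.
// ToyEventLoop and SyncOpen are hypothetical; they are not Gecko APIs.
struct ToyEventLoop {
  std::deque<std::function<void()>> queue;  // pending events, FIFO
  std::size_t depth = 0;                    // current nesting depth
  std::size_t maxDepth = 0;                 // deepest nesting seen so far

  // Run queued events until done() is true or the queue is empty.
  // Each event runs on top of the caller's stack, so an event that
  // spins again adds another level of nesting.
  template <typename Pred>
  void SpinUntil(Pred done) {
    ++depth;
    if (depth > maxDepth) maxDepth = depth;
    while (!done() && !queue.empty()) {
      std::function<void()> ev = std::move(queue.front());
      queue.pop_front();
      ev();
    }
    --depth;
  }
};

// Hypothetical blocking open: post a completion event, then spin the
// loop until that event has run (the pattern of a synchronous Open()).
inline void SyncOpen(ToyEventLoop& loop) {
  auto finished = std::make_shared<bool>(false);
  loop.queue.push_back([finished] { *finished = true; });
  loop.SpinUntil([finished] { return *finished; });
}
```

With five queued events that each call SyncOpen, a single outer SyncOpen call reaches a nesting depth of six before anything unwinds; with no re-entrant callers the depth stays at one.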

This is a crash report for Thunderbird 102.4.2, so if there's some faulty frontend code here, it's possible that it's Thunderbird code.

I think the problem is that something is creating either a lot of nsMsgProtocols or long-lived ones here. This could be avoided if it used nsMsgProtocol::AsyncOpen instead of nsMsgProtocol::Open.
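For contrast, a sketch of the asynchronous pattern, again with hypothetical names (ToyLoop, RunAll and AsyncOpen are not the real nsMsgProtocol or channel interfaces): the open posts its completion and returns immediately, and the caller continues in a callback. The top-level loop is never re-entered, so nesting depth stays at one no matter how many opens are pending.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Toy single-threaded loop; names are hypothetical, not Gecko/Thunderbird APIs.
struct ToyLoop {
  std::deque<std::function<void()>> queue;  // pending events, FIFO
  std::size_t depth = 0;                    // current nesting depth
  std::size_t maxDepth = 0;                 // deepest nesting seen so far

  // Top-level dispatch: drain the queue. Callers never spin it themselves.
  void RunAll() {
    ++depth;
    if (depth > maxDepth) maxDepth = depth;
    while (!queue.empty()) {
      std::function<void()> ev = std::move(queue.front());
      queue.pop_front();
      ev();
    }
    --depth;
  }
};

// Hypothetical async open: schedule delivery of the data and return at once.
// The caller's follow-up work lives in onData instead of a nested loop.
inline void AsyncOpen(ToyLoop& loop, std::function<void(std::string)> onData) {
  loop.queue.push_back([cb = std::move(onData)] { cb("data"); });
}
```

If five queued consumers each call AsyncOpen, RunAll delivers all five results while the depth never exceeds one, which is the property that would keep the stack flat.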

It's possible this is also a bug in Firefox, but I'm not really sure how you'd hit it there.
We should also record the name of the class that calls NS_ImplementChannelOpen when it spins the event loop here, so we get an idea of which consumers are causing problems.
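One way to collect that information, sketched with hypothetical names (SpinReasons, AutoSpinReason and SpinAnnotation are not existing Gecko functions, and the " > " separator is an arbitrary choice): keep a per-thread stack of labels, push the calling class's name for the duration of each spin, and join the stack into a string that a crash reporter could attach, similar in spirit to the XPCOMSpinEventLoopStack annotation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-thread stack of reasons for nested event-loop spins.
// The format is an assumption, not Gecko's actual annotation format.
inline std::vector<std::string>& SpinReasons() {
  thread_local std::vector<std::string> reasons;
  return reasons;
}

// RAII guard: records the caller's class name for the duration of a spin.
class AutoSpinReason {
 public:
  explicit AutoSpinReason(std::string name) {
    SpinReasons().push_back(std::move(name));
  }
  ~AutoSpinReason() { SpinReasons().pop_back(); }
  AutoSpinReason(const AutoSpinReason&) = delete;
  AutoSpinReason& operator=(const AutoSpinReason&) = delete;
};

// Join the stack into one string, outermost spin first.
inline std::string SpinAnnotation() {
  std::string out;
  for (const auto& r : SpinReasons()) {
    if (!out.empty()) out += " > ";
    out += r;
  }
  return out;
}
```

A crash deep inside repeated nsMsgProtocol opens would then report a string like "nsMsgProtocol > nsMsgProtocol > ...", pointing directly at the consumer that keeps spinning.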

Put this in our review queue, since we might have an idea about how to debug this.

Severity: -- → S3
Priority: -- → P3
Whiteboard: [necko-triaged][necko-priority-review]

Side note:

  • I removed nsSyncStreamListener::Create in bug 1800344, since Firefox/Gecko didn't have any direct callers aside from NS_NewSyncStreamListener() (which it's now folded into).
  • Thunderbird did have some other callers, from JS, but those are being removed in bug 1800606.
  • So it's likely that the XPCOMSpinEventLoopStack value (for crashes of this sort) will change as a result of one or both of those bugs. This might even go away if the Thunderbird changes happen to avoid it somehow, but I'm not sure.
See Also: → 1800606
Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged]