Closed Bug 1658197 Opened 4 years ago Closed 3 years ago

Crash in [@ mozilla::dom::RTCStatsTimestampMaker::RTCStatsTimestampMaker]

Categories

(Core :: WebRTC, defect, P2)

72 Branch
defect

Tracking

()

RESOLVED DUPLICATE of bug 1728574
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- wontfix
firefox-esr91 --- fixed
firefox79 --- wontfix
firefox80 --- wontfix
firefox81 --- wontfix
firefox82 --- wontfix
firefox83 --- wontfix
firefox87 --- wontfix
firefox88 --- wontfix
firefox89 --- wontfix
firefox90 --- wontfix
firefox91 --- wontfix
firefox92 --- wontfix
firefox93 --- wontfix

People

(Reporter: philipp, Assigned: jib)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 obsolete file)

This bug is for crash report bp-9005b00d-371d-4baf-8b90-231ac0200808.

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::RTCStatsTimestampMaker::RTCStatsTimestampMaker media/webrtc/signaling/src/peerconnection/RTCStatsReport.cpp:20
1 xul.dll mozilla::PeerConnectionImpl::PeerConnectionImpl media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:309
2 xul.dll static mozilla::PeerConnectionImpl::Constructor media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:230
3 xul.dll mozilla::dom::PeerConnectionImpl_Binding::_constructor dom/bindings/PeerConnectionImplBinding.cpp:1788
4 xul.dll static xpc::DOMXrayTraits::construct js/xpconnect/wrappers/XrayWrapper.cpp:1786
5 xul.dll xpc::XrayWrapper<js::CrossCompartmentWrapper, xpc::DOMXrayTraits>::construct const js/xpconnect/wrappers/XrayWrapper.cpp:2154
6 xul.dll static js::Proxy::construct js/src/proxy/Proxy.cpp:563
7 xul.dll Interpret js/src/vm/Interpreter.cpp:3323
8 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:612
9 xul.dll JS::Call js/src/jsapi.cpp:2837

this is a low-volume content crash signature that started showing up across all platforms since firefox 72.

Looks like GetPerformance() returns null. Crashing code was added in bug 1588588, so this is likely a regression from that refactor. Byron, you added this. Thoughts?

Flags: needinfo?(docfaraday)

Why on earth is GetPerformance() returning null?

Any ideas on why this might be happening? Are we somehow ending up with a PeerConnection but no document?

Flags: needinfo?(nika)

It's possible to have an inner window without a document both early during the object's lifecycle (e.g. before it's had a document assigned to it), and late during said lifecycle (e.g. after it's been unlinked by the cycle collector).

That being said, it seems like the more likely cause here would be the Document::GetNavigationTiming returning nullptr, as if that's the case you'll never create the mPerformance object here: https://searchfox.org/mozilla-central/rev/fa7f47027917a186fb2052dee104cd06c21dd76f/dom/base/nsGlobalWindowInner.cpp#2319-2322.

Flags: needinfo?(nika)

Are there any routine cases where we would expect the navigation timing not to be set? If there are, the webrtc-stats spec may need to altered to reflect this.

@jib, If this is truly an exceptional case, maybe we can just have the stats query fail?

Flags: needinfo?(nika)
Flags: needinfo?(jib)

I don't know much about the navigation timing stuff unfortunately, so I can't really comment. Wouldn't surprise me much if we don't have this object in some cases, though. There's also a chance this is somehow related to the issues with resource timing and process switches (like bug 1658097).

Flags: needinfo?(nika)

(In reply to Byron Campen [:bwc] from comment #5)

Are there any routine cases where we would expect the navigation timing not to be set? If there are, the webrtc-stats spec may need to altered to reflect this.

Without repro steps it is hard to rule out, but I can't think of any. To early: wait to resolve. Too late: resolve with state cached at pc.close(). Out of scope pc reference: either never resolve or resolve with an empty RTCStatsReport.

@jib, If this is truly an exceptional case, maybe we can just have the stats query fail?

The stack shows this is the RTCPeerConnection constructor, not a stats query, so that's the first thing: we're creating this before we really need it.

FWIW typing [...(await new RTCPeerConnection().getStats()).values()][0] into web console did not crash Firefox for me, and in Chrome produces the following which I believe is to spec (using Chrome because of bug 1641015):

{id: "RTCPeerConnection", timestamp: 1597759365322.165, type: "peer-connection", dataChannelsOpened: 0, dataChannelsClosed: 0}

So we clearly need a timestamp before we can resolve the promise, but I would not fail getStats over this since that method is not designed to fail. If we want to wallpaper this, I think I'd be comfortable with returning an empty RTCStatsReport (which is what we do for the line above today) until we figure out what's causing this.

Or since the count is low, we could block on bug 1658097 and come back to it once that is fixed to see if repro steps have materialized.

Flags: needinfo?(jib)

Hi Dan, can you give a severity to this bug? Thank you!

Flags: needinfo?(dminor)
Severity: -- → S3
Flags: needinfo?(dminor)
Priority: -- → P3

The volume seems to have increased since the beginning of the year and this is currently the #17 overall top content process crash in Fx88. Can we revisit the priority?

Flags: needinfo?(jib)
Flags: needinfo?(jib)
Priority: P3 → P2

Hi Emilio, in bug 1665792 comment 6, you say "there are legit reasons for mTiming to be null." I was wondering: what are they?

In this bug we have JS calling new someWindow.RTCPeerConnection() — an API that needs mTiming in order to perform (timestamps in pc.getStats()) — but it looks like mTiming is null (comment 4).

I'm not able to reproduce it, but I kind of think this is a bug elsewhere and that we should be able to expect mTiming to be there for any JS API that runs. Thoughts? Any insight you may have would be helpful, thanks.

Flags: needinfo?(jib) → needinfo?(emilio)

So the test-case in that bug, for example, involves window.stop() etc. In that particular case nulling out the timing at that point seems wrong, but a document object can exist without a navigation timing object... I'd think that yeah, if you're running JS, you should always get a working performance and navigation timing objects, but I'm not super-familiar with the navigation timing stuff other than having looked into the bug mentioned above.

Olli, IIRC you reviewed the navigation timing stuff. It seems a bit weird / complicated the way it's hooked up. Maybe you have some insight based on the comments above?

Flags: needinfo?(emilio) → needinfo?(bugs)

I couldn't find anything obvious. Timing object handling looked quite reasonable.
Performance object is created asap when the js object for window is created
https://searchfox.org/mozilla-central/source/dom/webidl/Window.webidl#398 (notice StoreInSlot)

Of course it is marked nullable, so it may return null in some edge cases. That is why the C++ method is called GetPerformance and not Performance.
Is it possible that some code manages to access window after unlink? In that case mDoc would be null and mPerformance already cleared.

Flags: needinfo?(bugs)
Assignee: nobody → jib
Status: NEW → ASSIGNED

Looks like the culprit may have been window.print.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Attachment #9241869 - Attachment is obsolete: true
Flags: needinfo?(docfaraday)
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: