Closed Bug 1304009 Opened 8 years ago Closed 8 years ago

Intermittent Assertion failure: isValid, at TelemetryHistogram.cpp:2190 | application crashed [@ TelemetryHistogram::AccumulateChild]

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1304519
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 - affected

People

(Reporter: intermittent-bug-filer, Assigned: chutten)

References

Details

(Keywords: assertion, intermittent-failure, Whiteboard: [measurement:client])

Maybe related to bug 1218576?
Flags: needinfo?(chutten)
Keywords: assertion
Definitely related to bug 1218576.

Here is the failure:
> 05:14:49     INFO -  Assertion failure: isValid, at c:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/src/toolkit/components/telemetry/TelemetryHistogram.cpp:2190
> 05:15:07     INFO -  #01: mozilla::dom::ContentParent::RecvAccumulateChildHistogram(nsTArray<mozilla::Telemetry::Accumulation> &&) [dom/ipc/ContentParent.cpp:5442]
> 05:15:07     INFO -  #02: mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const &) [obj-firefox/ipc/ipdl/PContentParent.cpp:6999]

This is the assert here:
https://hg.mozilla.org/integration/mozilla-inbound/file/42a77283ee64b4528b054104fa75f90fbbcfb515/toolkit/components/telemetry/TelemetryHistogram.cpp#l2190
... which means that the parent process received, from the child process, an accumulation with an invalid histogram ID.

This is odd, because the latest patches for bug ... introduced an ID validity check that also runs in the child:
https://hg.mozilla.org/integration/mozilla-inbound/file/42a77283ee64b4528b054104fa75f90fbbcfb515/toolkit/components/telemetry/TelemetryHistogram.cpp#l1355
https://hg.mozilla.org/integration/mozilla-inbound/file/42a77283ee64b4528b054104fa75f90fbbcfb515/toolkit/components/telemetry/TelemetryHistogram.cpp#l1372
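To make the puzzle concrete, here is a minimal JavaScript model of the two checks. The real code is C++ in TelemetryHistogram.cpp, and everything below is an assumption for illustration: the function names, the `{id, sample}` accumulation shape, and the single `histogramCount` bound are hypothetical stand-ins for whatever the real validity check compares against.

```javascript
// Hypothetical sketch only: the {id, sample} shape of an accumulation and the
// plain-array storage are illustrative assumptions, not Telemetry's real types.
function isValidId(id, histogramCount) {
  return Number.isInteger(id) && id >= 0 && id < histogramCount;
}

// Child side: reject invalid IDs before they are queued for IPC.
function childAccumulate(pending, id, sample, histogramCount) {
  if (!isValidId(id, histogramCount)) {
    return false; // dropped in the child, never sent to the parent
  }
  pending.push({ id, sample });
  return true;
}

// Parent side: every ID in a received batch is expected to be valid.
function parentAccumulateChild(storage, batch, histogramCount) {
  for (const { id, sample } of batch) {
    if (!isValidId(id, histogramCount)) {
      // Stands in for the debug-build assertion that fails in this bug.
      throw new Error(`Assertion failure: isValid (id ${id})`);
    }
    storage[id].push(sample);
  }
}

// Both processes agree on the histogram count, as they should.
const COUNT = 3;
const pending = [];
childAccumulate(pending, 1, 42, COUNT); // accepted and queued
childAccumulate(pending, 99, 7, COUNT); // dropped by the child-side check
const storage = Array.from({ length: COUNT }, () => []);
parentAccumulateChild(storage, pending, COUNT); // no assertion fires
```

In this model, when both sides use the same bound, an ID that passes the child check always passes the parent check. The parent assertion can then only fire if the ID changes in transit or if the two processes disagree about what a valid ID is.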

Possibilities I can think of:
* data corruption on the wire in IPC
* some bug corrupting the ID in Telemetry on the child side, before sending it up
* ...?

Chris, what do you think?
Blocks: 1218576
Priority: -- → P1
Whiteboard: [measurement:client]
I wonder if the chrome and content process builds could get out of sync in the CI infrastructure.
If that happened between revisions that changed the number of histograms, it would be possible to have valid ids in the child that are not valid in the parent.
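A tiny worked example of that hypothesis: assume, purely for illustration, that the content-process build comes from a revision with one more histogram than the chrome-process build, and that the new histogram received the last ID. The counts and the `isValidId` helper are hypothetical, not taken from real builds or from Telemetry's code.

```javascript
// Hypothetical worked example: the counts below are illustrative and are
// not taken from any real Firefox build.
const CHILD_HISTOGRAM_COUNT = 4;  // content process: newer revision
const PARENT_HISTOGRAM_COUNT = 3; // chrome process: older revision

function isValidId(id, histogramCount) {
  return Number.isInteger(id) && id >= 0 && id < histogramCount;
}

// Assume the histogram added in the newer revision received the last ID.
const newId = CHILD_HISTOGRAM_COUNT - 1; // 3

const childAccepts = isValidId(newId, CHILD_HISTOGRAM_COUNT);   // true: sent over IPC
const parentAccepts = isValidId(newId, PARENT_HISTOGRAM_COUNT); // false: parent asserts
```

Under these assumptions the child-side check passes and the parent-side check fails, so only the parent's assertion would trip.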

It's also interesting that we keep hitting this issue specifically with this dom/media test; there must be something special about it (media/webrtc code doing bad things, or a problematic test setup?).
:gfritzsche nailed it: there are asserts on both sides of the pipe, and only the destination (parent) side is being tripped here.

The test failing in the treeherder link is dom/media/tests/mochitest/test_peerConnection_bug1013809.html

It is a very small test. Here it is repeated in full:
<!DOCTYPE HTML>
<html>
<head>
  <script type="application/javascript" src="pc.js"></script>
</head>
<body>
<pre id="test">
<script type="application/javascript">
  createHTML({
    bug: "1013809",
    title: "Audio-only peer connection with swapped setLocal and setRemote steps"
  });

  var test;
  runNetworkTest(function (options) {
    test = new PeerConnectionTest(options);
    var sld = test.chain.remove("PC_REMOTE_SET_LOCAL_DESCRIPTION");
    test.chain.insertAfter("PC_LOCAL_SET_REMOTE_DESCRIPTION", sld);
    test.setMediaConstraints([{audio: true}], [{audio: true}]);
    test.run();
  });
</script>
</pre>
</body>
</html>

Nothing telemetry-related stands out in this test in particular, nor in its "pc.js" include (at 1800 lines, I won't paste it).

I'll set up another local debug build and see whether looping the test enough times reproduces it locally.
Flags: needinfo?(chutten)
Judging from the log entries (and considering the async/off-main-thread operations in WebRTC/media code), this might also be triggered by the preceding test.
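The `RecvAccumulateChildHistogram(nsTArray<mozilla::Telemetry::Accumulation> &&)` frame in the log shows the parent receiving accumulations as an array, i.e. in batches. A hypothetical sketch of why that makes the preceding test a suspect: if the child only sends its queue some time after recording, an accumulation made by async media code in one test can reach the parent while the next test is running, and the crash gets attributed to the later test. The `ChildBatcher` class, the explicit `flush()` call (standing in for whatever timer or trigger the real code uses), and the file name `previous_test.html` are all invented for illustration.

```javascript
// Hypothetical model: the child queues accumulations and only sends them
// when flush() is called; the real flush trigger is not modelled here.
class ChildBatcher {
  constructor(send) {
    this.pending = [];
    this.send = send;
  }

  accumulate(id, sample, testName) {
    this.pending.push({ id, sample, recordedDuring: testName });
  }

  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}

// Parent side just records which test was running when each batch arrived.
const received = [];
let currentTest = null;
const batcher = new ChildBatcher((batch) => {
  for (const acc of batch) {
    received.push({ ...acc, arrivedDuring: currentTest });
  }
});

currentTest = "previous_test.html";                  // hypothetical earlier test
batcher.accumulate(7, 1, currentTest);               // recorded by async media code
currentTest = "test_peerConnection_bug1013809.html"; // harness moves on
batcher.flush();                                     // batch arrives during the next test
```

In this model the accumulation is recorded during one test but processed by the parent during the next, which is why the failing test in the log is not necessarily the one that produced the bad ID.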
Summary: Intermittent Assertion failure: isValid, at c:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/src/toolkit/components/telemetry/TelemetryHistogram.cpp:2190 → Intermittent Assertion failure: isValid, at TelemetryHistogram.cpp:2190 | application crashed [@ TelemetryHistogram::AccumulateChild]
[Tracking Requested - why for this release]:
This is a regression from bug 1218576, with potential stability or data-correctness implications if this is also occurring in normal usage.
Assignee: nobody → chutten
Tracking 52+ for the reason George notes in Comment 10.
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1304009 shows reports dying away immediately as bug 1304519's fix landed.

This evidence supports my hypothesis that this isValid assertion failure shares its race-condition root cause with bug 1304519, so I'll dupe it to that bug.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
This is tracked in the dupe bug, adjusting the tracking flag.