Open Bug 1541033 Opened 5 months ago Updated 4 months ago

missing windows/nightly data

Categories: Cloud Services :: Mission Control, defect, Not set
Tracking: Not tracked
Status: REOPENED
People: Reporter: jcristau, Unassigned

https://missioncontrol.telemetry.mozilla.org/#/nightly/windows currently has content crashes but nothing else.

The dev instance at https://data-missioncontrol.dev.mozaws.net/#/nightly/windows does have data.

Duplicate of this bug: 1541036

Checking New Relic, I think database updates are timing out, seemingly in the phase where we calculate rates.

Things to do:

  1. Set higher thresholds. Currently the soft timeout (which, despite its name, is effectively the limit on how long things can take) is set at 10 minutes.
  2. If we're failing here, it probably indicates that we have a db query that's not hitting an index and is taking a long time. This might be hard to fix in a robust way, but maybe we could consider expiring old data (> 6 months). I'll investigate this.
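
One way to test the "not hitting an index" theory is to ask the database for its query plan. The sketch below is only an illustration: it uses SQLite from the Python standard library as a stand-in for the production database, and the `measurement` table, its columns, and the index name are hypothetical, not taken from the Mission Control schema.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' strings from SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the human-readable detail.
    return [row[3] for row in rows]


def uses_index(conn, sql, params=()):
    """True if any step of the plan mentions an index (hypothetical helper)."""
    return any("INDEX" in detail for detail in query_plan(conn, sql, params))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for whatever the rate calculation reads.
    conn.execute(
        "CREATE TABLE measurement (id INTEGER PRIMARY KEY, timestamp TEXT, value REAL)"
    )
    sql = "SELECT value FROM measurement WHERE timestamp > ?"
    print(query_plan(conn, sql, ("2019-01-01",)))  # full table scan
    conn.execute("CREATE INDEX idx_measurement_timestamp ON measurement (timestamp)")
    print(query_plan(conn, sql, ("2019-01-01",)))  # index search
```

Other databases expose the same idea through their own `EXPLAIN` variants, with different output formats.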

I just noticed this is happening again with Win nightly, see https://missioncontrol.telemetry.mozilla.org/#/?channel=nightly. We can use the dev instance for channel tomorrow if need be.

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #3)
> I just noticed this is happening again with Win nightly, see https://missioncontrol.telemetry.mozilla.org/#/?channel=nightly. We can use the dev instance for channel tomorrow if need be.

Yeah, sorry, I've been working on this the whole time and found a big pile of other issues with mission control which needed to be fixed. The dev instance should be more reliable for now (I expired a bunch of older data on it, so there should be few if any timeouts there).
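
For reference, expiring older data can be sketched roughly as below. This is not the code from the linked commit; it is a minimal illustration using SQLite from the Python standard library, with a hypothetical `measurement` table whose `timestamp` column holds ISO 8601 strings in a single timezone. Deleting in small batches is an assumption made here to keep each transaction short.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def expire_old_rows(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in batches; return how many were deleted.

    Assumes a hypothetical `measurement` table with ISO 8601 text timestamps,
    all written in the same format so string comparison matches time order.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM measurement WHERE rowid IN ("
            "SELECT rowid FROM measurement WHERE timestamp < ? LIMIT ?)",
            (cutoff.isoformat(), batch_size),
        )
        conn.commit()  # keep each batch in its own short transaction
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurement (timestamp TEXT, value REAL)")
    now = datetime.now(timezone.utc)
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?)",
        [((now - timedelta(days=d)).isoformat(), 1.0) for d in (400, 200, 30)],
    )
    # Roughly six months, matching the "> 6 months" idea discussed earlier.
    print(expire_old_rows(conn, now - timedelta(days=183)))
```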

https://github.com/mozilla/missioncontrol/commit/95b2cf473361b0ffecc74f5be59df34cd770d6d5 should fix this when it's applied to production (I'll do a deploy tomorrow, after we've verified bug 1542820 is fixed).

Status: NEW → RESOLVED
Closed: 5 months ago
Depends on: 1542820
Resolution: --- → FIXED

(In reply to William Lachance (:wlach) (use needinfo!) from comment #5)
> https://github.com/mozilla/missioncontrol/commit/95b2cf473361b0ffecc74f5be59df34cd770d6d5 should fix this when it's applied to production (I'll do a deploy tomorrow, after we've verified bug 1542820 is fixed).

Filed bug 1543541 to get this deployed.

Looks like we might need some manual expiry to get things working again. See bug 1544444.

Depends on: 1541036
Depends on: 1544444
No longer depends on: 1541036
Duplicate of this bug: 1544577

This is still not completely fixed -- things still appear to be timing out when doing summary calculations even with the data expiry. Filed bug 1544801 to deploy a release with increased timeouts, which will hopefully help.
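
To illustrate why a longer time limit can help a multi-step summary calculation, here is a rough sketch of a cooperative time budget. The bug does not say how Mission Control enforces its timeouts, so everything here is an assumption: `TimeBudgetExceeded` and `run_with_budget` are hypothetical names, and the check happens only between steps rather than interrupting a running one.

```python
import time


class TimeBudgetExceeded(Exception):
    """Raised when a job overruns its time budget (hypothetical name)."""


def run_with_budget(steps, budget_seconds, clock=time.monotonic):
    """Run callables in order, aborting once elapsed time exceeds the budget.

    The budget is checked before each step, so a single slow step is never
    interrupted midway; it only prevents later steps from starting.
    """
    start = clock()
    results = []
    for step in steps:
        if clock() - start > budget_seconds:
            raise TimeBudgetExceeded(
                f"exceeded {budget_seconds}s after {len(results)} steps"
            )
        results.append(step())
    return results
```

With a budget that is too small, the later steps never run and the whole update is lost, which matches the symptom of data missing for one platform while other parts are present.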

Status: RESOLVED → REOPENED
Depends on: 1544801
Resolution: FIXED → ---

I think the 1.10 update should finally fix this, once deployed.
