stage refresh causing problems for nightly builds

RESOLVED INVALID

Status

RESOLVED INVALID
4 years ago
4 years ago

People

(Reporter: rhelmer, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

4 years ago
crash-stats staging symbol upload is being tested by Firefox nightlies right now, and when the crash-stats staging DB is down, symbol upload doesn't work.
(Reporter)

Comment 1

4 years ago
How long does the stage DB refresh take, and can we shorten the downtime? pir and philor say it's been down for hours this morning; is that normal?
Flags: needinfo?(mpressman)
(Reporter)

Comment 2

4 years ago
Can we make symbol upload more robust in the face of errors like postgres being unreachable? I know we need postgres for auth, so maybe we can't get around that easily... any thoughts?
Flags: needinfo?(peterbe)
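One client-side way to tolerate a briefly unreachable backend is to retry the upload with exponential backoff. The following is only a minimal sketch, not what the nightly build tooling or crash-stats actually does: `upload_with_retry`, `UploadUnavailable`, and the `upload` callable are hypothetical names introduced for illustration. Retries like this would only paper over short outages, not a stage refresh that lasts for hours.

```python
import random
import time


class UploadUnavailable(Exception):
    """Hypothetical error raised by the upload callable when the server
    (or the database behind it) cannot be reached."""


def upload_with_retry(upload, attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call upload() until it succeeds or the attempts are exhausted.

    upload is any zero-argument callable (an assumption for this sketch)
    that raises UploadUnavailable on a transient failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return upload()
        except UploadUnavailable:
            # Out of attempts: let the caller see the failure.
            if attempt == attempts - 1:
                raise
            # Exponential backoff, capped at max_delay, with random jitter
            # so many builds do not retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

A build step could wrap its upload call in `upload_with_retry` and treat a final `UploadUnavailable` as a warning rather than a tree-breaking error, if losing one night's symbols is acceptable.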
(Reporter)

Comment 3

4 years ago
This is one downside to using crash-stats stage in a way that may break the tree :) Are we satisfied enough with testing here to make this production soon? If not, we should take more care not to disable staging until this is ready.
Flags: needinfo?(ted)
I confess I haven't actually looked at the results. If you are happy with how things have been uploaded then I'm fine switching it over to prod. (It's a 1-line change.)
Flags: needinfo?(ted)
I put a patch to that effect in bug 1085557.
(In reply to Robert Helmer [:rhelmer] from comment #2)
> Can we make symbol upload more robust in the face of errors like postgres
> being unreachable? I know we need postgres for auth, so maybe we can't get
> around that easily... any thoughts?

We depend on postgres for auth and sessions, but we also store a record of every upload, so moving auth entirely to memcache or something like that wouldn't remove the postgres dependency anyway.
Flags: needinfo?(peterbe)
I can't say for certain how long it takes. I would say hours is normal, since it needs to copy over the entirety of the current prod set, and the exact time depends on when raw_crashes was last purged. In this case, the purge was done just last week and the amount of data is still at 1.2TB.
Flags: needinfo?(mpressman)
Wait! Isn't this bug moot now? Didn't we stop depending on stage, Ted?
Resolve->Invalid?
(Reporter)

Comment 9

4 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #8)
> Wait! Isn't this bug moot now? Didn't we stop depending on stage, Ted?
> Resolve->Invalid?

Yep agreed, this was a temporary thing and not an ideal situation (to break the nightly-builds tree if crash-stats stage is down), but we didn't have better options due to the way Firefox nightly builds work right now.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → INVALID