Closed Bug 1537710 Opened 5 years ago Closed 5 years ago

nightlies based on same revision end up adding two entry points for the same file in Balrog blobs

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), defect, P1)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1579415

People

(Reporter: mtabara, Unassigned)

References

Details

(Whiteboard: [releaseduty])

Somewhat related to bug 1501167: we hit something interesting earlier today on mozilla-central.

Mozilla-central was closed on Wednesday, 20 March for most of the day, so the 10:00 and 22:00 UTC nightlies were based on the same revision. Because of that, they shared the same decision task and hence the same buildid, which comes from moz_build_date in parameters.yml.

This ended up in Balrog as two entries with the same URL but different sizes and hashes:

     "completes": [
            {
              "fileUrl": "https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar",
              "filesize": 58334587,
              "from": "*",
              "hashValue": "7c46b0cce414b9ca0e757a71fc3a8d9e80ba1cdb33be6caa7fe321995fc2daac875bb8a4317c41915cdd5b19bd3dff74b945f35a6aa8b110856bbcf9414c2c90"
            },
            {
              "fileUrl": "https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar",
              "filesize": 58328179,
              "from": "*",
              "hashValue": "de545dc946ea19c84fedbbfec6534f01312ed28cf7559637938440438987bb7ad778a72283359ef0db85377eb7c2da9306948308b921e2b12c62f3773dc5d0cb"
            }
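
A minimal sketch of how such a conflict could be detected in a list of patch entries, assuming the entries are plain dicts shaped like the excerpt above. The helper name `find_conflicting_patches` is hypothetical and not part of Balrog, and the hash values are shortened to their first eight characters for readability.

```python
from collections import defaultdict


# Hypothetical helper for illustration; this is not Balrog code.
def find_conflicting_patches(patches):
    """Return {fileUrl: [entries]} for URLs listed with differing size or hash."""
    by_url = defaultdict(list)
    for patch in patches:
        by_url[patch["fileUrl"]].append(patch)

    conflicts = {}
    for url, entries in by_url.items():
        # The same URL is only a problem when the metadata disagrees.
        fingerprints = {(e["filesize"], e["hashValue"]) for e in entries}
        if len(fingerprints) > 1:
            conflicts[url] = entries
    return conflicts


MAR_URL = (
    "https://archive.mozilla.org/pub/firefox/nightly/2019/03/"
    "2019-03-20-11-29-39-mozilla-central-l10n/"
    "firefox-68.0a1.zh-TW.mac.complete.mar"
)

# Sizes taken from the excerpt; hashes shortened (assumption for brevity).
completes = [
    {"fileUrl": MAR_URL, "filesize": 58334587, "from": "*", "hashValue": "7c46b0cc"},
    {"fileUrl": MAR_URL, "filesize": 58328179, "from": "*", "hashValue": "de545dc9"},
]

for url, entries in find_conflicting_patches(completes).items():
    print(url, [e["filesize"] for e in entries])
```

Grouping by URL and comparing (size, hash) pairs means that an exact duplicate entry would not be flagged, only entries that disagree about the same file.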

I suppose this confused Balrog and we've seen nightlies fail to update with:

AUS:SVC Downloader:_selectPatch - found existing patch with state: null
AUS:SVC Downloader:downloadUpdate - url: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar, path: /Users/shawn/Library/Caches/Mozilla/updates/Applications/Firefox Nightly/updates/0/update.mar, interval: 0
AUS:SVC Downloader:onStartRequest - original URI spec: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar, final URI spec: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar
AUS:SVC Downloader:onProgress - progress: 30864/58328179
AUS:SVC Downloader:onProgress - maxProgress: 58328179 is not equal to expected patch size: 58334587
AUS:SVC Downloader: cancel
AUS:SVC Downloader:onProgress - progress: 0/58328179
AUS:SVC Downloader:onProgress - maxProgress: 58328179 is not equal to expected patch size: 58334587
AUS:SVC Downloader: cancel
AUS:SVC Downloader:onStopRequest - original URI spec: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar, final URI spec: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar, status: 2147549183
AUS:SVC Downloader:onStopRequest - status: 2147549183, current fail: 0, max fail: 10, retryTimeout: 2000
AUS:SVC Downloader:onStopRequest - non-verification failure
AUS:SVC getStatusTextFromCode - transfer error: 失敗 (不明原因), default code: 2152398849
AUS:SVC Downloader:onStopRequest - setting state to: download-failed
AUS:SVC Downloader:onStopRequest - notifying observers of error. topic: update-error, status: download-attempts-exceeded, downloadAttempts: 23 maxAttempts: 2
AUS:SVC UpdateManager:_writeUpdatesToXMLFile - no updates to write. removing file: /Users/shawn/Library/Caches/Mozilla/updates/Applications/Firefox Nightly/active-update.xml
UTM:SVC TimerManager:registerTimer - id: telemetry_modules_ping
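
The numbers in the log line up with the two blob entries: the expected patch size (58334587) is the first entry's filesize, while the size reported during download (58328179) is the second entry's. The following is only a sketch of that comparison, assuming a simple equality check; the function name `download_should_continue` is hypothetical and this is not the actual updater code.

```python
# Hypothetical stand-in for the size check visible in the log; not updater code.
def download_should_continue(expected_size, max_progress):
    """Continue only if the reported total size equals the advertised size."""
    return max_progress == expected_size


blob_sizes = [58334587, 58328179]  # filesize of the two "completes" entries
expected_size = 58334587           # "expected patch size" in the log
max_progress = 58328179            # "maxProgress" in the log

if not download_should_continue(expected_size, max_progress):
    # The reported size matches a blob entry, just not the expected one.
    entry = blob_sizes.index(max_progress)
    print(f"size mismatch: served file matches blob entry {entry}")
```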

Temporary solution to unblock this: we've frozen nightlies to the previously known-good buildid, 20190319215514.

Whiteboard: [releaseduty]

:RyanVM raised this during the channel meeting today and indicated that we didn't have this problem in the Buildbot (BB) days. We should amend our logic so that a nightly graph is not scheduled if one already exists for that revision.

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #2)

> :RyanVM raised this during the channel meeting today and indicated that we didn't have this problem in the Buildbot (BB) days. We should amend our logic so that a nightly graph is not scheduled if one already exists for that revision.

That might be tricky while nightlies are a hook creating a new graph, but might be doable.
I think it'll become easier when we're triggering promotion on shippable builds.

callek, should we block this on the Nightly Promotion project? We plan to do this in Q2 after shippable is done. Is there a placeholder bug for that yet?

Flags: needinfo?(bugspam.Callek)

(In reply to Aki Sasaki [:aki] from comment #3)

> (In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #2)
>
> > :RyanVM raised this during the channel meeting today and indicated that we didn't have this problem in the Buildbot (BB) days. We should amend our logic so that a nightly graph is not scheduled if one already exists for that revision.
>
> That might be tricky while nightlies are a hook creating a new graph, but might be doable.
> I think it'll become easier when we're triggering promotion on shippable builds.

Could we have the hook logic check something in the index to determine if we've already done nightlies for a given revision?

This was discussed briefly at the channel meeting. We could have something in the index to check whether nightlies were already run, but it would be kludgy and harder to validate (e.g. do we want to prevent triggering Windows again on a given revision if all the Linux tasks finished?).

That said, there will be work in Q2 that should solve that aspect as well, which is Nightly Promotion. I say we wait for that unless this becomes more of a prevalent problem.
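
As a rough illustration of the "check before triggering" idea discussed above, here is a sketch that uses an in-memory dictionary in place of a real index. Everything in it is an assumption made for illustration: `NightlyIndex`, `maybe_trigger_nightly`, and the `create_graph` callback are hypothetical names, not Taskcluster or hook APIs. It also tracks whole graphs only, so it does not address the per-platform question raised above.

```python
class NightlyIndex:
    """Hypothetical in-memory stand-in for an index of triggered nightly graphs."""

    def __init__(self):
        self._graphs = {}  # revision -> id of the first nightly graph

    def lookup(self, revision):
        return self._graphs.get(revision)

    def record(self, revision, graph_id):
        self._graphs[revision] = graph_id


def maybe_trigger_nightly(index, revision, create_graph):
    """Create a nightly graph unless one is already recorded for the revision.

    Returns (graph_id, created), where created is False when an existing
    graph was reused.
    """
    existing = index.lookup(revision)
    if existing is not None:
        return existing, False
    graph_id = create_graph(revision)
    index.record(revision, graph_id)
    return graph_id, True


# Usage with a fake graph factory; the revision and ids are placeholders.
index = NightlyIndex()
counter = []


def fake_create_graph(revision):
    counter.append(revision)
    return f"graph-{len(counter)}"


print(maybe_trigger_nightly(index, "rev-a", fake_create_graph))
print(maybe_trigger_nightly(index, "rev-a", fake_create_graph))
```

With this shape, a manual trigger followed by a scheduled one on the same revision would produce one graph, and the second call would simply report the existing one.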

Flags: needinfo?(bugspam.Callek)
Priority: -- → P1

This happened again.

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #8)

> This happened again.

It turns out we had two nightlies triggered last night, based on the same revision:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=nightly&revision=b283a7ef186c216d765631f6cb1260a3fa2ee42c

At 2019-08-15T19:45:20.912Z, the first decision task was triggered - https://tools.taskcluster.net/groups/c9bVskeaT5Os0Uy7IJ7ZxA/tasks/c9bVskeaT5Os0Uy7IJ7ZxA/details

At 2019-08-15T22:00:26.923Z, the second decision task was triggered - https://tools.taskcluster.net/groups/dlzpxOqnTxGRm5l0aBLkxw/tasks/dlzpxOqnTxGRm5l0aBLkxw/details

Since revision b283a7ef186c was a "Merge inbound to mozilla-central. a=merge" commit, I'm tempted to believe that someone manually triggered the nightlies. Then, 2h15min later, the cron job fired automatically and, since no other revision had been pushed in between, it triggered nightlies again based on the same revision.
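
The 2h15min gap can be checked directly from the two decision task timestamps quoted above. This is a small sketch using only the Python standard library; the helper name `parse_utc` is made up for illustration.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse a timestamp such as 2019-08-15T19:45:20.912Z (UTC, naive)."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")


first = parse_utc("2019-08-15T19:45:20.912Z")   # first decision task
second = parse_utc("2019-08-15T22:00:26.923Z")  # second decision task

gap = second - first
hours, remainder = divmod(gap.seconds, 3600)
minutes = remainder // 60
print(f"{hours}h{minutes}min between the two decision tasks")
```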

Your theory is correct. Sheriffs had to trigger Nightlies earlier (to ship the backout).

Blocks: 1579125

Marking this as a duplicate of bug 1579415, where we'll track the first solution: not triggering nightlies if another graph already exists for the revision.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE
Product: Release Engineering → Release Engineering Graveyard