Closed Bug 1332821 Opened 6 years ago Closed 5 years ago

mochitest-browser-chrome-screenshots jobs with screenshots don't get marked as completed

Categories

(Testing :: mozscreenshots, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: MattN, Unassigned)

Details

The problem seemed to start after bug 1329262 landed, which added 60 more captured screenshot artifacts. Note that this job only captures screenshots on "Nightly" builds (not regular m-c builds), so any green builds you see since bug 1329262 landed on m-c (rev: fe22af79bacf) are probably non-Nightly builds with no screenshot artifacts captured.

I'm guessing there is a problem ingesting so many screenshot artifacts. Maybe a request or database columns size limit?

I think I'll back bug 1329262 out for now and hopefully there are server logs somewhere that will pinpoint the issue.
Blocks: 1329262
I don't think this is a treeherder issue, but a buildbot one. Coop, could you get someone to investigate (ordinarily I would ask :catlee, but he seems to be away)?
Flags: needinfo?(coop)
I'll check the database, but from a casual check of the running and pending interfaces I don't think these are actually scheduled.

I'm curious how the extra runs were scheduled in the first place. Did someone retrigger?
Flags: needinfo?(coop)
(In reply to Chris Cooper [:coop] from comment #3)
> I'll check the database, but from a casual check of the running and pending
> interfaces I don't think these are actually scheduled.

BuildAPI shows them as completed and the logs were uploaded to archive.mozilla.org. See [2] in comment 0.

> I'm curious how the extra runs were scheduled in the first place. Did
> someone retrigger?

There are three ss for each of the affected platforms (from left to right):
1) Regular PGO/opt scheduling in buildbot. These are green and completed because screenshots (at the time of push) don't get captured if the update channel isn't Nightly. Regular PGO/OPT use "default" as the channel. This is done in the code of the tests themselves since there wasn't a way to only schedule the jobs on Nightlies via BB.
2) These were scheduled for the Nightly builds that got triggered on this push. Since the update channel is "nightly" this job was expected to generate dozens of png artifacts using blobber. In [2] you see that the images were successfully captured and uploaded but TH didn't get told about the finished job.
3) These were scheduled by me re-triggering the #2 jobs when I saw that they weren't finishing, to see whether it was an intermittent infra issue or a permanent one. After seeing them also not complete in the expected time, I filed this bug.
(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #1) 
> I'm guessing there is a problem ingesting so many screenshot artifacts.
> Maybe a request or database columns size limit?

Looking at the job output directly on the buildbot-master, I see the following:

blobber_files 	{"20170120030214-primaryUI_099_tabsOutsideTitlebar_fiveTabs_maximized_allToolbars_compactLight.png": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/f46de60815d0077800e24b39ba8064503d79cb3a4be7b2c619d6a4a96bcefa3e118aadcab2b4af40bd35d2968764c4d82ebd811246bbe4b561e5386d679f3969", "20170120030214-controlCenter_011_noLWT_mixedPassive.png": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/0484f32a58dc9e3dedae5682d4c4ce80cf93b40475931418d7b182bd0b9 .. [property value too long]

That value is going into a TEXT field which by default in MySQL holds 65,535 chars. I checked the complete blobber_files prop string, and it's 187,866 chars.
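
As a rough illustration of the size check described above, here is a minimal Python sketch that serializes a blobber_files-style mapping and compares its size against the TEXT limit. The helper names (fits_in_text_column, make_blobber_files) and the synthetic file names and hashes are assumptions made up for this example; they are not buildbot or blobber code.

```python
import json

# Capacity of a MySQL TEXT column, as quoted in this bug.
MYSQL_TEXT_MAX_BYTES = 65_535


def fits_in_text_column(value, limit=MYSQL_TEXT_MAX_BYTES):
    """Return (fits, size_in_bytes) for the JSON-serialized form of value."""
    serialized = json.dumps(value)
    size = len(serialized.encode("utf-8"))
    return size <= limit, size


def make_blobber_files(count):
    """Build a synthetic mapping shaped like the blobber_files property.

    Hypothetical data: the file names and the all-zero sha512 digests are
    placeholders, not real artifacts.
    """
    url_prefix = (
        "http://mozilla-releng-blobs.s3.amazonaws.com"
        "/blobs/mozilla-central/sha512/"
    )
    return {
        f"20170120030214-example_{i:03d}.png": url_prefix + "0" * 128
        for i in range(count)
    }


if __name__ == "__main__":
    for count in (100, 800):
        fits, size = fits_in_text_column(make_blobber_files(count))
        print(f"{count} artifacts -> {size} bytes, fits in TEXT: {fits}")
```

Because every entry carries a full S3 URL with a 128-character sha512 digest, the serialized property grows by a couple of hundred bytes per screenshot, so the limit is reached after a few hundred artifacts.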

We're going to have to rethink how we do this. I would suggest figuring out a way to make this work in TaskCluster vs expending effort in buildbot.
(In reply to Chris Cooper [:coop] from comment #5)
> (In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from
> comment #1) 
> > I'm guessing there is a problem ingesting so many screenshot artifacts.
> > Maybe a request or database columns size limit?
> 
> Looking at the job output directly on the buildbot-master, I see the
> following:
> 
> blobber_files 
> {"20170120030214-
> primaryUI_099_tabsOutsideTitlebar_fiveTabs_maximized_allToolbars_compactLight
> .png":
> "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/
> f46de60815d0077800e24b39ba8064503d79cb3a4be7b2c619d6a4a96bcefa3e118aadcab2b4a
> f40bd35d2968764c4d82ebd811246bbe4b561e5386d679f3969",
> "20170120030214-controlCenter_011_noLWT_mixedPassive.png":
> "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/
> 0484f32a58dc9e3dedae5682d4c4ce80cf93b40475931418d7b182bd0b9 .. [property
> value too long]
> 
> That value is going into a TEXT field which by default in MySQL holds 65,535
> chars. I checked the complete blobber_files prop string, and it's 187,866
> chars.

Thanks for investigating, Chris! I figured some limit was getting hit.

> We're going to have to rethink how we do this. I would suggest figuring out
> a way to make this work in TaskCluster vs expending effort in buildbot.

OK, I just landed a patch in bug 1332727 for linux64 builds to use TaskCluster. The problem is that I need this job to run on every OS configuration where the UI differs, and TaskCluster doesn't have full OS support yet AFAIK.

It seems like I should switch to TC as much as possible, but for the remaining OSs I guess I will have to split the job into two parts if there's no simple fix in BB.
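
To estimate how many parts such a split would need, a minimal Python sketch is shown below. It greedily partitions an artifact mapping so that each part's JSON form stays under the 65,535 limit quoted earlier. The function name split_to_fit and the sample data are hypothetical, and this only models the property size, not how buildbot schedules jobs. Note that, going by the 187,866-char figure above, an even split would need at least three parts rather than two, assuming the limit applies to each job's property separately.

```python
import json

# TEXT column capacity quoted in this bug.
TEXT_LIMIT = 65_535


def serialized_size(mapping):
    """Size in bytes of the JSON form of mapping."""
    return len(json.dumps(mapping).encode("utf-8"))


def split_to_fit(files, limit=TEXT_LIMIT):
    """Greedily split files into dicts whose JSON form fits within limit.

    Insertion order is preserved. Raises ValueError if a single entry is
    already larger than the limit, since no split can help in that case.
    """
    chunks = []
    current = {}
    for name, url in files.items():
        if serialized_size({name: url}) > limit:
            raise ValueError(f"entry {name!r} alone exceeds the limit")
        candidate = {**current, name: url}
        if serialized_size(candidate) > limit:
            # Adding this entry would overflow: close the current part.
            chunks.append(current)
            current = {name: url}
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical sample data, not real screenshot artifacts.
    sample = {
        f"shot_{i:04d}.png": "http://example.invalid/" + "a" * 200
        for i in range(800)
    }
    parts = split_to_fit(sample)
    print(len(parts), [serialized_size(p) for p in parts])
```

The sketch re-serializes the candidate part on every step, which is quadratic in the worst case but keeps the size accounting exact and is cheap at the scale of a few hundred artifacts.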
Component: Treeherder: Data Ingestion → General
Product: Tree Management → Firefox
Version: --- → unspecified
Component: General → mozscreenshots
Product: Firefox → Testing
We've switched to TC and will disable BB in bug 1411811.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX