maximum file size for signing server

RESOLVED FIXED

Status

Release Engineering
General Automation
P2
normal
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: catlee, Assigned: coop)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [signing])

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

6 years ago
sometimes we have l10n jobs that go crazy and don't clean up after themselves. the result of this is that the next l10n job includes the results of the previous one, effectively doubling its size, which it then tries to get signed. the signing servers then explode trying to sign 1G .exe files.

the signing servers should support a maximum file size for the different signing formats to protect against this.
(Assignee)

Comment 1

6 years ago
(In reply to Chris AtLee [:catlee] from comment #0)
> sometimes we have l10n jobs that go crazy and don't clean up after
> themselves. the result of this is that the next l10n job includes the
> results of the previous one, effectively doubling its size, which it then
> tries to get signed. the signing servers then explode trying to sign 1G .exe
> files.

I'm guessing this caused the rash of red/purple on aurora repacks yesterday?

Is there a bug on file for fixing the cleanup step(s)? I'd be happy to tackle that part.
(Reporter)

Comment 2

6 years ago
(In reply to Chris Cooper [:coop] from comment #1)
> (In reply to Chris AtLee [:catlee] from comment #0)
> > sometimes we have l10n jobs that go crazy and don't clean up after
> > themselves. the result of this is that the next l10n job includes the
> > results of the previous one, effectively doubling its size, which it then
> > tries to get signed. the signing servers then explode trying to sign 1G .exe
> > files.
> 
> I'm guessing this caused the rash of red/purple on aurora repacks yesterday?

The initial batch of red/purple were caused by the migration of the redis VM as part of bug 734728. I mistakenly thought that signing could survive a brief redis outage. After that Axel asked if I could re-build all the failed l10n jobs, which I did.

However, there's a bug with the l10n logic somewhere where it can end up including the previous locale's package in the new package. So a machine in this state will package, say, af, and then on the next job move onto ak. The resulting setup.exe file will contain all the regular ak files, as well as firefox-14.0a1.af.win32.installer.exe. This doubles the size of the installer for each repack that's run, and you very quickly end up trying to sign files that are hundreds of megabytes large.

> Is there a bug on file for fixing the cleanup step(s)? I'd be happy to
> tackle that part.

No, I didn't file one. I'm not really sure what's going on, or how to reproduce either.
(Reporter)

Comment 3

6 years ago
also, I believe the build system eventually crashed from out of memory errors, leaving the signing scripts still running in the background trying to sign gigantic files while the slave had moved on to other builds.
Created attachment 610515 [details] [diff] [review]
allow max_filesize per format

Something like this?
Attachment #610515 - Flags: feedback?(catlee)
(Reporter)

Comment 5

6 years ago
Comment on attachment 610515 [details] [diff] [review]
allow max_filesize per format

Review of attachment 610515 [details] [diff] [review]:
-----------------------------------------------------------------

looks good!

::: release/signing/signing-server.py
@@ +356,5 @@
>          self.formats = [f.strip() for f in config.get('signing', 'formats').split(',')]
> +        self.max_filesize = dict()
> +        for f in self.formats:
> +            try:
> +                self.max_filesize = config.getint('security', 'max_filesize_%s' % f)

self.max_filesize[f] = ...

::: release/signing/signing.ini.template
@@ +27,5 @@
>  allowed_filenames = .*
>  # Minimum filesize that we'll sign
>  min_filesize = 10
> +# Maximum filesize, per format. 52428800 = 50MB
> +max_filesize_gpg = 52428800

I think this will need to be larger since we sign our source bundles with gpg.
Attachment #610515 - Flags: feedback?(catlee) → feedback+
May as well assign this to myself, I'll try to get this tested in the next week or two.
Assignee: nobody → bhearsum
Created attachment 613799 [details] [diff] [review]
small fix

I accidentally overwrote the max_filesize dict instead of just an item, oops. I also updated the maximum gpg size to be 500MB. Currently, our largest source bundle seems to be around 350MB, so I futureproofed us. I can lower it a bit if you think it's too high.
Attachment #610515 - Attachment is obsolete: true
Attachment #613799 - Flags: review?(catlee)
Created attachment 613802 [details] [diff] [review]
add max filesize, and a bonus --reload

needs testing still, but should resolve bug 723812 too.
Comment on attachment 613802 [details] [diff] [review]
add max filesize, and a bonus --reload

I tested this on signing3 and the mac signing machines, and it worked fine.
Attachment #613802 - Flags: review?(catlee)
(Reporter)

Updated

6 years ago
Attachment #613799 - Flags: review?(catlee) → review+
Created attachment 613969 [details] [diff] [review]
add an onlyif, so that we don't try to reload a server that isn't running
Attachment #613969 - Flags: review?(catlee)
(Reporter)

Updated

6 years ago
Attachment #613969 - Flags: review?(catlee) → review+
(Reporter)

Updated

6 years ago
Attachment #613802 - Flags: review?(catlee)
Attachment #613802 - Attachment is obsolete: true
Comment on attachment 613799 [details] [diff] [review]
small fix

I updated the SIGNING_SERVER tag, too.
Attachment #613799 - Flags: checked-in+
Comment on attachment 613969 [details] [diff] [review]
add an onlyif, so that we don't try to reload a server that isn't running

After landing the tools patch and manually updating those checkouts, I updated the puppet master with this patch and verified by watching /var/log/messages for success.
Attachment #613969 - Flags: checked-in+
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Benjamin is hitting this on Try, where he's creating 60-135mb tarballs and is failing packaging.  We're unable to grab the files to inspect because the machine picks up a new build and wipes it.

I'm currently respinning and gracefulled/slavealloc-disabled the slave so I can inspect post-build.  We may want to consider increasing the 50mb limit and fixing the l10n jobs to not do that.

Updated

5 years ago
Blocks: 773496
(Assignee)

Comment 14

5 years ago
Need to bump the max size for the XR sdk for 21.0b1. We should be able to bump this back down for the 22.0 beta cycle once bug 812105 merges into mozilla-beta.
Assignee: bhearsum → coop
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 15

5 years ago
Created attachment 732597 [details] [diff] [review]
Bump gpg limit to 550MB to accommodate XR linux sdk

Ben: will this get deployed automatically to the signing servers once landed? I think that's what your mentioned in channel today.
Attachment #732597 - Flags: review?(bhearsum)
(In reply to Chris Cooper [:coop] from comment #15)
> Created attachment 732597 [details] [diff] [review]
> Bump gpg limit to 550MB to accommodate XR linux sdk
> 
> Ben: will this get deployed automatically to the signing servers once
> landed? I think that's what your mentioned in channel today.

It will, but why are we doing this when there's a known fix? That should be backported instead IMO. The large packages are bad for our systems and worse for users. I commented in bug 812105 to that affect.
Comment on attachment 732597 [details] [diff] [review]
Bump gpg limit to 550MB to accommodate XR linux sdk

Review of attachment 732597 [details] [diff] [review]:
-----------------------------------------------------------------

Come to think of it, we should probably take this now anyways, so we can ship some sort of SDK with beta 1. We can back it out after bug 812105 is backported or the next uplift, whichever comes first.
Attachment #732597 - Flags: review?(bhearsum) → review+
(Assignee)

Comment 18

5 years ago
Comment on attachment 732597 [details] [diff] [review]
Bump gpg limit to 550MB to accommodate XR linux sdk

https://hg.mozilla.org/build/tools/rev/d1312fd9d4a1
Attachment #732597 - Flags: checked-in+
(Assignee)

Comment 19

5 years ago
https://hg.mozilla.org/build/puppet-manifests/rev/8f16f9569378
(Assignee)

Comment 20

5 years ago
Linux XULRunner sdk for 21.0b1 is now signed.
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago5 years ago
Resolution: --- → FIXED
Comment on attachment 732597 [details] [diff] [review]
Bump gpg limit to 550MB to accommodate XR linux sdk

Sizes are back to normal:
	xulrunner-21.0b1.en-US.linux-i686.sdk.tar.bz2	03-Apr-2013 11:25 	502M	 

vs.

xulrunner-21.0b2.en-US.linux-i686.sdk.tar.bz2	08-Apr-2013 17:54 	53M

I've backed out the latest size limit.

https://hg.mozilla.org/build/tools/rev/46307fbad179
https://hg.mozilla.org/build/puppet-manifests/rev/be256774eac5
Attachment #732597 - Flags: checked-in+ → checked-in-
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.