Closed Bug 1311410 Opened 8 years ago Closed 8 years ago

Linux64 nightlies permafailing due to symbol upload failure (gmake[4]: *** [uploadsymbols] Error 1)

Categories

(Release Engineering :: Release Automation: Other, defect)

defect
Not set
critical

Tracking

(firefox50 unaffected, firefox51 unaffected, firefox52 fixed)

VERIFIED FIXED

People

(Reporter: pascalc, Assigned: peterbe)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Looks like symbol upload is busted:

https://treeherder.mozilla.org/logviewer.html#?job_id=5323535&repo=mozilla-central#L53649

 02:30:19     INFO -  Uploading symbol file "dist/firefox-52.0a1.en-US.linux-x86_64.crashreporter-symbols-full.zip" to "https://crash-stats.mozilla.com/symbols/upload"
 02:30:19     INFO -  Attempt 1 of 5...
 02:30:27     INFO -  /builds/slave/m-cen-l64-ntly-000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
 02:30:27     INFO -    SNIMissingWarning
 02:30:27     INFO -  /builds/slave/m-cen-l64-ntly-000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
 02:30:27     INFO -    InsecurePlatformWarning
 02:30:27     INFO -  Error: ('Connection aborted.', error(32, 'Broken pipe'))
 02:30:27     INFO -  Retrying...
 02:30:38     INFO -  Attempt 2 of 5...
 02:30:51     INFO -  /builds/slave/m-cen-l64-ntly-000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
 02:30:51     INFO -    InsecurePlatformWarning
 02:30:52     INFO -  Error: ('Connection aborted.', error(32, 'Broken pipe'))
 02:30:52     INFO -  Retrying...
 02:31:09     INFO -  Attempt 3 of 5...
 02:31:11     INFO -  /builds/slave/m-cen-l64-ntly-000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
 02:31:11     INFO -    InsecurePlatformWarning
 02:31:12     INFO -  Error: ('Connection aborted.', error(32, 'Broken pipe'))
 02:31:12     INFO -  Retrying...
 02:31:38     INFO -  Attempt 4 of 5...
 02:31:40     INFO -  /builds/slave/m-cen-l64-ntly-000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
 02:31:40     INFO -    InsecurePlatformWarning
 02:31:41     INFO -  Error: ('Connection aborted.', error(32, 'Broken pipe'))
 02:31:41     INFO -  Retrying...
 02:32:21     INFO -  Attempt 5 of 5...
 02:32:23     INFO -  /builds/slave/m-cen-l64-ntly-000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
 02:32:23     INFO -    InsecurePlatformWarning
 02:32:23     INFO -  Error: ('Connection aborted.', error(32, 'Broken pipe'))
 02:32:23     INFO -  Retrying...
 02:32:23     INFO -  Maximum retries hit, giving up!

Ted, any ideas?
Flags: needinfo?(ted)
Apparently my c#2 was incorrect: that was a failure on linux32 for Tuesday's nightly, and c#1 actually reflects the failure seen for the second day in a row. Sorry for the noise.
peterbe maintains the Socorro API that we upload to. Peter: looks like we're having trouble reliably uploading symbols.

I don't know if that SNIMissingWarning is a real problem or just log spam.
Flags: needinfo?(ted) → needinfo?(peterbe)
I was looking at an OS X build log (for unrelated reasons) and it shows:
09:30:18     INFO -  Uploading symbol file "dist/firefox-50.0.en-US.mac.crashreporter-symbols-full.zip" to "https://crash-stats.mozilla.com/symbols/upload"
09:30:18     INFO -  Attempt 1 of 5...
09:30:18     INFO -  /builds/slave/m-beta-m64-0000000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
09:30:18     INFO -    SNIMissingWarning
09:30:18     INFO -  /builds/slave/m-beta-m64-0000000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
09:30:18     INFO -    InsecurePlatformWarning
09:30:33     INFO -  Uploaded successfully!

...so I don't think the SNIMissingWarning is a problem.
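Both warnings in these logs come from the Python `ssl` module on the build machines lacking SNI support and a real `SSLContext`. A minimal sketch for confirming that from the same Python environment follows; the function name `tls_capabilities` is made up for illustration and is not part of `upload_symbols.py`.

```python
import ssl


def tls_capabilities():
    """Report which TLS features this interpreter's ssl module exposes."""
    return {
        # ssl.HAS_SNI is absent on older Python 2.7 builds, hence getattr.
        "sni": bool(getattr(ssl, "HAS_SNI", False)),
        # urllib3 emits InsecurePlatformWarning when SSLContext is missing.
        "ssl_context": hasattr(ssl, "SSLContext"),
        "openssl": ssl.OPENSSL_VERSION,
    }


if __name__ == "__main__":
    for name, value in sorted(tls_capabilities().items()):
        print("{}: {}".format(name, value))
```

If the same values are reported on machines where uploads succeed, that would support treating the warnings as log spam rather than the cause.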
When did this start to be a problem?
We haven't seen any errors inside Django related to this. 

A few short weeks ago we upgraded our webheads to use the latest version of Python. But if it's a problem at that level we should see exceptions bubble up and get my attention. 

Those kinds of errors, from requests.post presumably, usually happen on the client when it's unable to talk SSL to a remote server. Can you check, inside that taskcluster job or something I guess, if you can do something trivial like...

>>> import requests
>>> requests.get('https://crash-stats.mozilla.com')
>>> requests.get('https://github.com')

Perhaps JP has some ideas too.
Flags: needinfo?(peterbe)
(In reply to Peter Bengtsson [:peterbe] from comment #6)
> When did this start to be a problem?

I've had reliable daily updates for quite a while (the most recent break was an intentional close). But as with comment 0 I'm stuck on the build from Monday the 17th.
(In reply to Peter Bengtsson [:peterbe] from comment #6)
> Those kinds of errors, from requests.post presumably, usually happen on the
> client when it's unable to talk SSL to a remote server. Can you check,
> inside that taskcluster job or something I guess, if you can do something
> trivial like...
> 
> >>> import requests
> >>> requests.get('https://crash-stats.mozilla.com')
> >>> requests.get('https://github.com')

Even easier: we do Linux32 nightlies on the exact same pool of AWS instances, so it's possible to see that the exact same requests.post works from the exact same set of instances, as long as it isn't posting Linux64 symbols or isn't using the Auth-Token that the Linux64 nightlies use, if that invisible thing could possibly be different from the Linux32 nightlies.
(In reply to Phil Ringnalda (:philor) from comment #8)
> (In reply to Peter Bengtsson [:peterbe] from comment #6)
> > Those kinds of errors, from requests.post presumably, usually happen on the
> > client when it's unable to talk SSL to a remote server. Can you check,
> > inside that taskcluster job or something I guess, if you can do something
> > trivial like...
> > 
> > >>> import requests
> > >>> requests.get('https://crash-stats.mozilla.com')
> > >>> requests.get('https://github.com')
> 
> Even easier: we do Linux32 nightlies on the exact same pool of AWS
> instances, so it's possible to see that the exact same requests.post works
> from the exact same set of instances, as long as it isn't posting Linux64
> symbols or isn't using the Auth-Token that the Linux64 nightlies use, if
> that invisible thing could possibly be different from the Linux32 nightlies.

So what we need is for someone to run those tests I pointed out in a Python shell, from the same Python environment too.

Who can do that?
Burned today's nightly again. I retriggered, though I'm not sure how useful that'll actually be.
Severity: normal → critical
Summary: No 64bits Nighly Linux builds in the last 2 days → Linux64 nightlies failing due to symbol upload failure (gmake[4]: *** [uploadsymbols] Error 1)
Summary: Linux64 nightlies failing due to symbol upload failure (gmake[4]: *** [uploadsymbols] Error 1) → Linux64 nightlies permafailing due to symbol upload failure (gmake[4]: *** [uploadsymbols] Error 1)
Can we get this fixed as soon as possible?
Sooo.. this is failing on linux64 on `date` as well, while linux32 on `date` is not (or at least there is no log output from it trying).

(linux32 opt seems to have `export MOZ_AUTOMATION_UPLOAD_SYMBOLS=0`)

linux32 nightly uses the same environment (afaict) as linux64 but is passing

10:51:47     INFO -  /builds/slave/m-cen-lx-ntly-0000000000000000/build/src/obj-firefox/_virtualenv/bin/python -u /builds/slave/m-cen-lx-ntly-0000000000000000/build/src/toolkit/crashreporter/tools/upload_symbols.py 'dist/firefox-52.0a1.en-US.linux-i686.crashreporter-symbols-full.zip'
10:51:47     INFO -  Uploading symbol file "dist/firefox-52.0a1.en-US.linux-i686.crashreporter-symbols-full.zip" to "https://crash-stats.mozilla.com/symbols/upload"
10:51:47     INFO -  Attempt 1 of 5...
10:51:55     INFO -  /builds/slave/m-cen-lx-ntly-0000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
10:51:55     INFO -    SNIMissingWarning
10:51:55     INFO -  /builds/slave/m-cen-lx-ntly-0000000000000000/build/src/python/requests/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
10:51:55     INFO -    InsecurePlatformWarning
10:53:48     INFO -  Uploaded successfully!
What is the difference between the .zip files uploaded for Linux32 compared to Linux64?
In particular, what's the difference in terms of file size (mean, median, std)?
Also, can we isolate exactly which types of builds work and which don't?
E.g. a tally like this:

* (ENV: SUCCESS RATE)
* linux32: 100%
* linux64: 0%
* win32: ???
* win64: ???
* android: ??
* osx: ???
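The tally and size statistics asked for above could be computed with something like the following sketch. It assumes the upload outcomes and archive sizes have already been collected by hand from the build logs; the helper names `success_rates` and `size_summary` are hypothetical.

```python
import statistics
from collections import defaultdict


def success_rates(results):
    """Map each platform to its upload success rate in percent.

    `results` is an iterable of (platform, succeeded) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # platform -> [successes, attempts]
    for platform, succeeded in results:
        if succeeded:
            counts[platform][0] += 1
        counts[platform][1] += 1
    return {p: 100.0 * ok / total for p, (ok, total) in counts.items()}


def size_summary(sizes):
    """Return mean, median and sample standard deviation of archive sizes."""
    return {
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
        # stdev needs at least two data points.
        "stdev": statistics.stdev(sizes) if len(sizes) > 1 else 0.0,
    }
```

Feeding it one `(platform, succeeded)` pair per nightly upload job would fill in the `???` entries once the other platforms' logs have been checked.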
So it looks like we have giant symbol archives now:
-rw-rw-r--   1 cltbld mock_mozilla 577K Oct 20 04:27 mozharness.zip
-rw-rw-r--   1 cltbld mock_mozilla  14M Oct 20 05:31 firefox-52.0a1.en-US.linux-x86_64.talos.tests.zip
-rw-rw-r--   1 cltbld mock_mozilla  17M Oct 20 04:27 jsshell-linux-x86_64.zip
-rw-rw-r--   1 cltbld mock_mozilla  22M Oct 20 05:31 firefox-52.0a1.en-US.linux-x86_64.common.tests.zip
-rw-rw-r--   1 cltbld mock_mozilla  31M Oct 20 05:31 firefox-52.0a1.en-US.linux-x86_64.xpcshell.tests.zip
-rw-rw-r--   1 cltbld mock_mozilla  37M Oct 20 05:31 firefox-52.0a1.en-US.linux-x86_64.reftest.tests.zip
-rw-rw-r--   1 cltbld mock_mozilla  40M Oct 20 05:31 firefox-52.0a1.en-US.linux-x86_64.web-platform.tests.zip
-rw-rw-r--   1 cltbld mock_mozilla  69M Oct 20 05:31 firefox-52.0a1.en-US.linux-x86_64.mochitest.tests.zip
-rw-rw-r--   1 cltbld mock_mozilla  76M Oct 20 05:31 firefox-52.0a1.en-US.linux-x86_64.cppunittest.tests.zip
-rw-rw-r--   1 cltbld mock_mozilla 161M Oct 20 05:35 firefox-52.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
-rw-rw-r--   1 cltbld mock_mozilla 461M Oct 20 05:32 firefox-52.0a1.en-US.linux-x86_64.gtest.tests.zip
-rw-rw-r--   1 cltbld mock_mozilla 1.1G Oct 20 05:36 firefox-52.0a1.en-US.linux-x86_64.crashreporter-symbols-full.zip
Largest 20 files in the symbols archive:
   677172  10-20-2016 05:33   test_sanity/686B4218CB76625CADDEA23C57A57FD80/test_sanity.sym
   677752  10-20-2016 05:34   libfreeblpriv3.so/D7ABBBE635B7399EBFD2A8B4C7A898A10/libfreeblpriv3.so.sym
   679698  10-20-2016 05:33   test_audio/F96EFD1C3F68C998B8D07D6744147EAE0/test_audio.sym
   985252  10-20-2016 05:33   plugin-container/AC6031663BA6D9F04EE7F4E2AAEA60880/plugin-container.sym
  1148934  10-20-2016 05:33   minidump-analyzer/F06B7FBC49288762015F12527C8C59E10/minidump-analyzer.sym
  1892684  10-20-2016 05:33   libnss3.so/A1F901E2FE918CEE531A64C22B5D91520/libnss3.so.sym
  1943859  10-20-2016 05:33   libmozsqlite3.so/725A8D36801E252CFC4FB861B2122E4A0/libmozsqlite3.so.sym
  2409834  10-20-2016 05:33   libmozavcodec.so/58CE644CC3926AF1CF30294F38798B920/libmozavcodec.so.sym
 28389552  10-20-2016 05:32   gdb-tests/1D56408788E4C0B2294051436DC6E0E20/gdb-tests.sym
 28768457  10-20-2016 05:32   js/0C1030CDD7E1129439F8470CDE111A120/js.sym
 29594602  10-20-2016 05:33   sdp_file_parser/322D2A9BD85F3365C9DFE479119402E10/sdp_file_parser.sym
 29675060  10-20-2016 05:33   mediaconduit_unittests/954A804E3D3E868658F0FB606D1BBAB40/mediaconduit_unittests.sym
 29738552  10-20-2016 05:33   mediapipeline_unittest/3C0AFFBD1424FC3E5787771F23AF8D5A0/mediapipeline_unittest.sym
 29855259  10-20-2016 05:33   jsep_track_unittest/7765F85FFCBD6C4E6777A4B6780102BF0/jsep_track_unittest.sym
 30510171  10-20-2016 05:33   jsep_session_unittest/F242130ED2B4AE96B14256035A59CE180/jsep_session_unittest.sym
 30510931  10-20-2016 05:32   jsapi-tests/FF8F6DBC474E5202716D270EA5C3C2620/jsapi-tests.sym
 30609126  10-20-2016 05:33   signaling_unittests/71BF54A0639C49664D053F5D181C32B80/signaling_unittests.sym
 31284667  10-20-2016 05:32   sdp_unittests/DA98C363AA9B1926B20C0A81FDDAD7390/sdp_unittests.sym
197947372  10-20-2016 05:33   libxul.so/8390927E37E99CA335B099F7432858080/libxul.so.sym
218804976  10-20-2016 05:33   libxul.so/7DC2FE813271C517027AF7F84139CB520/libxul.so.sym
Sorry, that was for the non-full package. Largest 20 files in the full symbols archive are:
   850652  10-20-2016 05:33   OCSPStaplingServer/AC5C8148A1A44ECE10F22FC634EEB7F10/OCSPStaplingServer.dbg.gz
   852766  10-20-2016 05:33   GenerateOCSPResponse/BA61702D38EEE2AB38106BDEA00189790/GenerateOCSPResponse.dbg.gz
   985252  10-20-2016 05:33   plugin-container/AC6031663BA6D9F04EE7F4E2AAEA60880/plugin-container.sym
  1148934  10-20-2016 05:33   minidump-analyzer/F06B7FBC49288762015F12527C8C59E10/minidump-analyzer.sym
  1345262  10-20-2016 05:33   libmozsqlite3.so/725A8D36801E252CFC4FB861B2122E4A0/libmozsqlite3.so.dbg.gz
  1472286  10-20-2016 05:33   plugin-container/AC6031663BA6D9F04EE7F4E2AAEA60880/plugin-container.dbg.gz
  1524568  10-20-2016 05:33   libmozavcodec.so/58CE644CC3926AF1CF30294F38798B920/libmozavcodec.so.dbg.gz
  1663844  10-20-2016 05:33   libnss3.so/A1F901E2FE918CEE531A64C22B5D91520/libnss3.so.dbg.gz
  1711700  10-20-2016 05:33   minidump-analyzer/F06B7FBC49288762015F12527C8C59E10/minidump-analyzer.dbg.gz
  1892684  10-20-2016 05:33   libnss3.so/A1F901E2FE918CEE531A64C22B5D91520/libnss3.so.sym
  1943859  10-20-2016 05:33   libmozsqlite3.so/725A8D36801E252CFC4FB861B2122E4A0/libmozsqlite3.so.sym
  2409834  10-20-2016 05:33   libmozavcodec.so/58CE644CC3926AF1CF30294F38798B920/libmozavcodec.so.sym
 28768457  10-20-2016 05:32   js/0C1030CDD7E1129439F8470CDE111A120/js.sym
 29594602  10-20-2016 05:33   sdp_file_parser/322D2A9BD85F3365C9DFE479119402E10/sdp_file_parser.sym
 37569662  10-20-2016 05:33   sdp_file_parser/322D2A9BD85F3365C9DFE479119402E10/sdp_file_parser.dbg.gz
 62795296  10-20-2016 05:32   js/0C1030CDD7E1129439F8470CDE111A120/js.dbg.gz
197947372  10-20-2016 05:33   libxul.so/8390927E37E99CA335B099F7432858080/libxul.so.sym
218804976  10-20-2016 05:33   libxul.so/7DC2FE813271C517027AF7F84139CB520/libxul.so.sym
423641207  10-20-2016 05:33   libxul.so/8390927E37E99CA335B099F7432858080/libxul.so.dbg.gz
450136296  10-20-2016 05:33   libxul.so/7DC2FE813271C517027AF7F84139CB520/libxul.so.dbg.gz
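A listing like the ones above can be reproduced with Python's `zipfile` module. This is only a sketch of one way to do it, not necessarily how the listings in this bug were generated; `largest_members` is a made-up helper name.

```python
import sys
import zipfile


def largest_members(zip_source, count=20):
    """Return (uncompressed_size, name) for the largest members, ascending."""
    with zipfile.ZipFile(zip_source) as archive:
        entries = [(info.file_size, info.filename) for info in archive.infolist()]
    return sorted(entries)[-count:]


if __name__ == "__main__":
    # Usage: python largest_members.py path/to/symbols.zip
    for size, name in largest_members(sys.argv[1]):
        print("{:>10}  {}".format(size, name))
```

Note that `file_size` is the uncompressed size of each member, so the sizes can add up to more than the size of the .zip on disk.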
Commits pushed to master at https://github.com/mozilla/socorro-infra

https://github.com/mozilla/socorro-infra/commit/b3382b35d41c37482df04d13e5ecf5cd6a3219f6
bug 1311410 - up nginx file upload size limit

https://github.com/mozilla/socorro-infra/commit/8f647c46784d8b3932af6cd78a0f4536c3056da2
Merge pull request #259 from peterbe/bug-1311410-up-nginx-file-upload-size-limit

bug 1311410 - up nginx file upload size limit
Turns out, our Nginx file upload size limit was set to 1GB.
Addressed with: https://github.com/mozilla/socorro-infra/pull/259
Assignee: nobody → peterbe
Prod Nginx now accepts up to 2GB file uploads. 

Please verify once those linux64 symbol uploads start to work.
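To make the arithmetic explicit: the full symbols archive was listed at about 1.1G, which is over the old 1GB limit and under the new 2GB one. A small sketch of a pre-upload size check follows. It assumes GB here means 1024^3 bytes and that 1.1G is the rounded figure from the directory listing; `fits_upload_limit` and `archive_fits` are hypothetical helpers, not part of `upload_symbols.py` or Socorro.

```python
import os

GIB = 1024 ** 3  # assumption: "GB" in this bug means 1024^3 bytes


def fits_upload_limit(size_bytes, limit_bytes):
    """Return True if an upload of size_bytes is within limit_bytes."""
    return size_bytes <= limit_bytes


def archive_fits(path, limit_bytes):
    """Check a file on disk against the server's upload size limit."""
    return fits_upload_limit(os.path.getsize(path), limit_bytes)


if __name__ == "__main__":
    approx_archive_size = int(1.1 * GIB)  # rounded size from the listing
    print("old 1GB limit:", fits_upload_limit(approx_archive_size, 1 * GIB))
    print("new 2GB limit:", fits_upload_limit(approx_archive_size, 2 * GIB))
```

A check like this on the client side would turn a string of "Broken pipe" retries into an immediate, readable error when an archive outgrows the limit again.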
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
The latest Linux64 nightly retrigger just went green.
Status: RESOLVED → VERIFIED
See Also: → 1311725