Closed Bug 1424236 Opened 7 years ago Closed 6 years ago

Error connecting to symbols.mozilla.org causes exception in nightly symbol upload jobs

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: NarcisB, Unassigned)

Flags: needinfo?(peterbe)
# run
symbol_zip=$(basename ${symbol_url})
basename ${symbol_url}
++ basename https://queue.taskcluster.net/v1/task/UuaW5XKDQcegzKbv7Eg5yg/artifacts/public/build/target.crashreporter-symbols-full.zip
+ symbol_zip=target.crashreporter-symbols-full.zip
script_name=$(basename ${SCRIPT_PATH})
basename ${SCRIPT_PATH}
++ basename toolkit/crashreporter/tools/upload_symbols.py
+ script_name=upload_symbols.py
python -u ${script_name} ${symbol_zip}
+ python -u upload_symbols.py target.crashreporter-symbols-full.zip
Uploading symbol file "target.crashreporter-symbols-full.zip" to "https://symbols.mozilla.org/upload/"
Attempt 1 of 5...
Error: HTTPSConnectionPool(host='symbols.mozilla.org', port=443): Max retries exceeded with url: /upload/ (Caused by <class 'httplib.BadStatusLine'>: '')
Retrying...
Attempt 2 of 5...
Error: HTTPSConnectionPool(host='symbols.mozilla.org', port=443): Max retries exceeded with url: /upload/ (Caused by <class 'httplib.BadStatusLine'>: '')
Retrying...
Attempt 3 of 5...

[taskcluster:error] Task timeout after 600 seconds. Force killing container.
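The log above shows the script still retrying when the 600-second task limit force-killed the container, so the job never reported its own failure. One way to avoid that is to bound the retry loop by an overall deadline as well as an attempt count. This is only a rough sketch, not the actual upload_symbols.py logic: `upload_with_deadline`, `do_upload`, `UploadFailed`, and the 540-second default are all hypothetical names and values chosen for illustration.

```python
import time


class UploadFailed(Exception):
    """Raised when no attempt succeeds within the configured limits."""


def upload_with_deadline(do_upload, max_attempts=5, deadline_seconds=540,
                         pause=5, clock=time.monotonic, sleep=time.sleep):
    """Call do_upload() until it succeeds, attempts run out, or time is up.

    do_upload is a hypothetical callable that performs one upload and
    raises an exception on failure.
    """
    start = clock()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # Stop before starting an attempt if the overall budget is spent.
        if clock() - start >= deadline_seconds:
            break
        print("Attempt %d of %d..." % (attempt, max_attempts))
        try:
            return do_upload()
        except Exception as exc:  # a real script would catch narrower errors
            last_error = exc
            print("Error: %s" % exc)
        if attempt < max_attempts:
            print("Retrying...")
            sleep(pause)
    raise UploadFailed(
        "gave up after %.0f seconds: %s" % (clock() - start, last_error)
    )
```

Note that the deadline is only checked between attempts, so a single hung request can still overrun it. Each attempt would also need its own per-request timeout for the job to exit cleanly before the task limit.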
I'm working on figuring this out.

I downloaded that .zip file onto an EC2 test instance (based in us-east; the Tecken server is in us-west).
I then uploaded it manually from there with curl, and it worked. Once received, Tecken took 33 seconds to process it.

There are no exceptions logged in Sentry. 

However, I do see lots of errors like this in the logs:

127.0.0.1 - - [08/Dec/2017:12:17:23 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"
127.0.0.1 - - [08/Dec/2017:12:18:02 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"
127.0.0.1 - - [08/Dec/2017:12:19:04 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"
127.0.0.1 - - [08/Dec/2017:12:20:00 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"
127.0.0.1 - - [08/Dec/2017:12:20:59 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"
127.0.0.1 - - [08/Dec/2017:12:21:57 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"
127.0.0.1 - - [08/Dec/2017:12:22:56 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"
127.0.0.1 - - [08/Dec/2017:12:52:35 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"
127.0.0.1 - - [08/Dec/2017:12:52:48 +0000] "POST /upload/ HTTP/1.1" 403 22 "-" "python-requests/2.9.1"

(This output is grep-filtered; the requests didn't all come in at the same time.)
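For anyone wanting to check the same thing without eyeballing grep output, here is a small sketch that tallies response codes for `/upload/` from access-log lines in the format shown above. The function name `count_statuses` and the regular expression are my own and only assume the log layout visible in this comment.

```python
import re
from collections import Counter

# Matches lines such as:
# 127.0.0.1 - - [08/Dec/2017:12:17:23 +0000] "POST /upload/ HTTP/1.1" 403 22 ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def count_statuses(lines, path="/upload/"):
    """Return a Counter mapping HTTP status code to number of requests for path."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == path:
            counts[int(match.group("status"))] += 1
    return counts
```

Feeding it the log file line by line (for example `count_statuses(open("access.log"))`, where the file name is a placeholder) gives a quick view of how many uploads were rejected with 403 versus accepted.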
Another possibility is that the ELB configuration for symbols.mozilla.org hasn't been changed to allow for long, slow requests. That has definitely been set for Stage, and I vaguely remember seeing these kinds of strange HTTPSConnectionPool errors before we made that change there.
I've checked with CloudOps. The ELB is correctly configured on symbols.mozilla.org.
We've established that the migration of API keys from Socorro doesn't seem to have worked. As a quick remedy, Ted will generate a key in his name and upload it to the TaskCluster secrets service.
I went through and manually uploaded symbols to symbols.mo for all of the nightly builds where symbol uploading failed:
build-linux-nightly/opt-upload-symbols
https://queue.taskcluster.net/v1/task/Y_f8XoDYR5ekxRXY5JYEPQ/artifacts/public/build/target.crashreporter-symbols-full.zip
https://symbols.mozilla.org/uploads/upload/13

build-linux64-nightly/opt-upload-symbols
https://queue.taskcluster.net/v1/task/btSlyhUeTRaizi1nGpwUlw/artifacts/public/build/target.crashreporter-symbols-full.zip
https://symbols.mozilla.org/uploads/upload/14

build-macosx64-nightly/opt-upload-symbols
https://queue.taskcluster.net/v1/task/HThxwLZTQwW31tWbuD54Kg/artifacts/public/build/target.crashreporter-symbols-full.zip
https://symbols.mozilla.org/uploads/upload/15

build-android-api-16-nightly/opt-upload-symbols
https://queue.taskcluster.net/v1/task/UuaW5XKDQcegzKbv7Eg5yg/artifacts/public/build/target.crashreporter-symbols-full.zip
https://symbols.mozilla.org/uploads/upload/16

build-android-api-16-old-id-nightly/opt-upload-symbols
https://queue.taskcluster.net/v1/task/KInFbI9OQCmatcv2aijQxw/artifacts/public/build/target.crashreporter-symbols-full.zip
https://symbols.mozilla.org/uploads/upload/17

build-android-x86-nightly/opt-upload-symbols
https://queue.taskcluster.net/v1/task/BiYIzmLlTyC7rjtpH2MsoA/artifacts/public/build/target.crashreporter-symbols-full.zip
https://symbols.mozilla.org/uploads/upload/18

build-android-x86-old-id-nightly/opt-upload-symbols
https://queue.taskcluster.net/v1/task/Vt7_WH_iQdCGnPDi1Wqw8g/artifacts/public/build/target.crashreporter-symbols-full.zip
https://symbols.mozilla.org/uploads/upload/19

build-android-aarch64-nightly/opt-upload-symbols
https://queue.taskcluster.net/v1/task/VE0Qt-CXTLuNmN-mvahTng/artifacts/public/build/target.crashreporter-symbols-full.zip
https://symbols.mozilla.org/uploads/upload/20
While I was doing that, I realized that the Windows nightly *builds* had failed because they were trying to upload symbols *from the build task*, so I filed bug 1424323. I have a patch there to disable that; hopefully it will land and get merged in time for the next nightly so those builds don't burn again.

I also landed the patches in bug 1422740 on autoland, which will make the upload-symbols tasks use the token stored in a Taskcluster secret instead of the one hardcoded in a Docker image. I generated a new upload token for my account on symbols.mo and put it in the Taskcluster secret, so if those patches get merged in time, that should fix symbol upload for nightlies.
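To illustrate the idea of reading the token from a Taskcluster secret at run time rather than baking it into the image, here is a minimal sketch. It is not the code from bug 1422740. The `SYMBOL_SECRET` environment variable, the `http://taskcluster/secrets/v1/secret/` proxy URL, and the `token` key inside the secret are all assumptions for the example, and `get_upload_token` and `fetch_json` are hypothetical helper names.

```python
import json
import os
from urllib.request import urlopen

# Assumed address of the Taskcluster proxy as seen from inside the task.
SECRETS_URL = "http://taskcluster/secrets/v1/secret/"


def fetch_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def get_upload_token(env=os.environ, fetch=fetch_json):
    """Return the upload token named by the (assumed) SYMBOL_SECRET variable."""
    secret_name = env.get("SYMBOL_SECRET")
    if not secret_name:
        raise RuntimeError("SYMBOL_SECRET is not set; no upload token available")
    payload = fetch(SECRETS_URL + secret_name)
    # Assumed layout: {"secret": {"token": "..."}}
    return payload["secret"]["token"]
```

Passing `fetch` as a parameter keeps the lookup testable without network access, and rotating the token then only means updating the secret, not rebuilding the Docker image.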
This should be fixed now that those other patches have merged to central. If the next set of nightlies uploads symbols successfully, we can close this bug.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> This should be fixed now that those other patches have merged to central. If
> the next set of nightlies uploads symbols successfully, we can close this
> bug.

Yup, symbols uploaded without issues for the next set of nightlies, so I'll close this.
e.g. https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=6e2181b6137c87fde58434bb926ea3f21fc1ed11&filter-searchStr=upload%20symbols
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard