Closed Bug 1314345 Opened 6 years ago Closed 6 years ago

add linux nightly symbols upload to taskgraph

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kmoir, Assigned: kmoir)

Attachments

(1 file)

Like I did for Android in bug 1313190.
Assignee: nobody → kmoir
Attachment #8806473 - Flags: review?(bugspam.Callek) → review+
Linux symbol uploads are failing like this:

https://tools.taskcluster.net/push-inspector/#/EuS1VOotTS6jxwO26T5zFw/ZUr_XGh0RDi1qhdFxT5k9g?_k=hj4n4a

run
symbol_zip=$(basename ${symbol_url})
basename ${symbol_url}
++ basename https://queue.taskcluster.net/v1/task/YVX3OjE5T-mR5YsMh8KsFQ/artifacts/public/build/target.crashreporter-symbols.zip
+ symbol_zip=target.crashreporter-symbols.zip
script_name=$(basename ${SCRIPT_PATH})
basename ${SCRIPT_PATH}
++ basename toolkit/crashreporter/tools/upload_symbols.py
+ script_name=upload_symbols.py
python -u ${script_name} ${symbol_zip}
+ python -u upload_symbols.py target.crashreporter-symbols.zip
Uploading symbol file "target.crashreporter-symbols.zip" to "https://crash-stats.mozilla.com/symbols/upload"
Attempt 1 of 5...
Error: HTTPSConnectionPool(host='crash-stats.mozilla.com', port=443): Read timed out.
Retrying...
Attempt 2 of 5...
Error: HTTPSConnectionPool(host='crash-stats.mozilla.com', port=443): Read timed out.
Retrying...
Attempt 3 of 5...
Error: HTTPSConnectionPool(host='crash-stats.mozilla.com', port=443): Read timed out.
Retrying...
Attempt 4 of 5...


It looks like the Fennec symbols upload fine; I'm not sure why the Linux ones time out while connecting to crash-stats. I retriggered the job to see if it was an intermittent issue, but the same error occurred.
I also retriggered the same task with the one-click loaner and the same thing occurred: it just keeps retrying.

https://tools.taskcluster.net/one-click-loaner/#B3bdzGj4SAOFPZaVNeDvQA

:peterbe, is there a way to look at the logs on crash-stats, given a time and the IP address that tried to upload, so I can get some insight into why the symbols aren't being uploaded?
Flags: needinfo?(peterbe)
The processor logs every missing symbol and dumps it into a huge database.
Then there's a cron job [0] that queries that Postgres table for the last 24 hours and creates a "latest.csv" file in S3, which is publicly available here: https://s3-us-west-2.amazonaws.com/org-mozilla-missingsymbols/latest.csv

This was built for Ted, who has a cron job running somewhere that goes through it.


[0] https://github.com/mozilla/socorro/blob/master/socorro/cron/jobs/missingsymbols.py
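For anyone who wants to look at that file directly, here is a minimal Python sketch for pulling it down and reading the rows. The column layout of latest.csv is not described in this bug, so the sketch treats each row as a plain list of strings; `parse_missing_symbols` and `fetch_missing_symbols` are hypothetical helper names, not part of Socorro.

```python
import csv
import io
import urllib.request

# URL quoted in the comment above.
LATEST_CSV_URL = (
    "https://s3-us-west-2.amazonaws.com/org-mozilla-missingsymbols/latest.csv"
)


def parse_missing_symbols(csv_text):
    """Return the CSV rows as lists of strings, skipping blank lines.

    Assumption: the column layout is unknown here, so rows are left as-is.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row]


def fetch_missing_symbols(url=LATEST_CSV_URL, timeout=30):
    """Download the CSV and parse it (hypothetical helper, needs network)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_missing_symbols(response.read().decode("utf-8"))
```

Calling `fetch_missing_symbols()` needs network access; `parse_missing_symbols` can be exercised on any CSV text.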
Flags: needinfo?(peterbe)
I don't think that answered kmoir's question. She's asking whether you have logs for the symbol upload api.

kmoir: we hit a similar error recently (bug 1311410) with linux64 nightly builds failing to upload symbols because the symbol upload api had a 1GB file size limit, and our symbols.zip crossed that. It was bumped to 2GB as a result. It's possible the symbol upload is just a little slower in taskcluster, or something like that, and so we're timing out trying to upload it?
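One quick way to rule the size limit in or out is to check the archive size before uploading. This is only a sketch: it assumes "2GB" means 2 * 1024**3 bytes (the server may count differently), and `check_upload_size` is a hypothetical helper, not something in upload_symbols.py.

```python
import os

# Assumption: the "2GB" limit is taken as 2 * 1024**3 bytes.
ASSUMED_UPLOAD_LIMIT_BYTES = 2 * 1024 ** 3


def check_upload_size(path, limit_bytes=ASSUMED_UPLOAD_LIMIT_BYTES):
    """Return (size_in_bytes, fits) for the file at `path`."""
    size = os.path.getsize(path)
    return size, size <= limit_bytes
```

For example, `check_upload_size("target.crashreporter-symbols.zip")` returns the size and whether it fits under the assumed limit.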
We have logs of ALL symbol *uploads*. And an API to query that. 
https://crash-stats.mozilla.com/api/#UploadedSymbols
You can do a string search on the contents (kinda like `unzip -l symbols.zip`) 

The API is protected by the "View all Symbol Uploads" permission. 
You need to be in the "Hackers Plus" group to have that permission. Kim, you are not in that group. Ted is :)
Do you want to have that access? If so please file a Socorro::General bug.
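For a local equivalent of that `unzip -l` style search, something like the following Python sketch lists the member names of a symbols zip and filters them by substring. `find_in_symbols_zip` is a hypothetical helper and has nothing to do with the crash-stats API itself.

```python
import zipfile


def find_in_symbols_zip(zip_source, needle):
    """Return the archive member names that contain `needle`.

    `zip_source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist() if needle in name]
```

For example, `find_in_symbols_zip("target.crashreporter-symbols.zip", "libxul")` would list the entries whose path mentions libxul, if any.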
(In reply to Peter Bengtsson [:peterbe] from comment #8)
> We have logs of ALL symbol *uploads*. And an API to query that. 
> https://crash-stats.mozilla.com/api/#UploadedSymbols
> You can do a string search on the contents (kinda like `unzip -l
> symbols.zip`) 

Right--but the problem here is that the symbol upload is timing out. It's possible that it's actually succeeding, and the server is just taking long enough to process the contents that the HTTP request times out waiting for a response.
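To make that failure mode concrete, here is a rough Python sketch of a retry loop shaped like the "Attempt N of 5" output in the log above. It is not the code in upload_symbols.py: `upload_with_retries` and the `do_upload` callable are hypothetical, and the sketch only assumes that a timeout surfaces as `TimeoutError`. Note that a read timeout says nothing about whether the server finished processing the file, so every retry sends the whole archive again.

```python
import time


def upload_with_retries(do_upload, attempts=5, delay_seconds=0.0, log=print):
    """Call `do_upload` until it succeeds or `attempts` is exhausted.

    Returns True on success, False if every attempt timed out.
    Assumption: `do_upload` raises TimeoutError when the response is late.
    """
    for attempt in range(1, attempts + 1):
        log("Attempt %d of %d..." % (attempt, attempts))
        try:
            do_upload()
            return True
        except TimeoutError as error:
            # The server may still have received the upload at this point.
            log("Error: %s" % error)
            if attempt < attempts:
                log("Retrying...")
                time.sleep(delay_seconds)
    return False
```

If the server really is just slow to respond, raising the client-side read timeout would help more than adding retries.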
Depends on: 1315086
Looking at both the Android and Linux builds on tc on the date branch: as Ted noted, we are uploading target.crashreporter-symbols.zip instead of target.crashreporter-symbols-full.zip.

I've spent some time looking at the tc code and the in-tree mh code, and I'm not really sure how the symbols_url is set, but this looks suspicious:

https://dxr.mozilla.org/mozilla-central/rev/ade8d4a63e57560410de106450f37b50ed71cca5/testing/mozharness/mozharness/mozilla/taskcluster_helper.py#254
Depends on: 1315287
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: TaskCluster → Firefox Build System