Closed Bug 1295997 Opened 8 years ago Closed 6 years ago

Add limit to size of log we will parse

Categories

(Tree Management :: Treeherder, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: emorley)

References

Details

Attachments

(3 files, 1 obsolete file)

We should skip parsing logs that are too large, so that a ridiculously large log (like 2GB) doesn't stall our processing.
Assignee: nobody → cdawson
See Also: → 1294548, 1165356
Part of this work is to switch over to requests, which lets us determine the log size before we try to parse, so I will likely fix bug 1165356 while I'm at it.
Blocks: 1294548
Depends on: 1165356
Comment on attachment 8782981 [details] [review] [treeherder] mozilla:requests-log-parser-responses > mozilla:master Hey Ed-- I've been beating my head against a wall on this one for a while now. :) I'd love any ideas you may have. Thanks for taking a look. Not a huge rush, because I'm going to task-switch to clear my head a bit. :) Perhaps some distance will give me better perspective...
Attachment #8782981 - Flags: feedback?(emorley)
Comment on attachment 8782981 [details] [review] [treeherder] mozilla:requests-log-parser-responses > mozilla:master I've had a look but can't see anything too obvious. Will take a deeper look when I have more time :-)
Attachment #8782981 - Flags: feedback?(emorley)
Blocks: 1343831
Assignee: cdawson → emorley
In addition to the instances in bug 1294548 and bug 1343831, we just had another today:

23:28 <emorley> gbrown: this try run is creating 400MB logs which momentarily caused log parser backlogs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b2c3fe5bbe2cb539e9513067d333891ed4e511c
23:29 <gbrown> emorley: wow, spectacular failure. sorry. cancelled.
23:30 <emorley> gbrown: np, treeherder should handle this case better (doing so in bug 1294544, though requires changing the way we record log parser failures so we can improve the UX and explain that log parsing was skipped deliberately)
23:31 <emorley> it's just that in the meantime, 500 jobs x 200-400MB logs is quite a bit of parsing time :-)

In today's instance the largest logs were 400MB uncompressed but only 22.5MB compressed (since lots of repetition). Unfortunately the log parser can only see the Content-Length of the compressed log, so we'll have to see whether we can set a compressed size limit of say 10-20MB to catch these, or whether that would be too low for logs that don't have much repetition. It might be that for such high-compression cases we'll have to rely instead on the time limit of bug 1294544.
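
A minimal sketch of that mismatch, assuming Python with the requests library and a hypothetical log URL: the Content-Length header reports the bytes on the wire (the gzip-compressed size), whereas reading the body yields the decompressed text, which can be an order of magnitude larger.

    from contextlib import closing

    import requests

    # Hypothetical URL; real logs come from the job's log references.
    url = "https://example.com/live_backing.log"

    with closing(requests.get(url, stream=True)) as response:
        response.raise_for_status()
        # Size on the wire, ie the gzip-compressed size when the server
        # sends Content-Encoding: gzip.
        compressed_bytes = int(response.headers.get("Content-Length", 0))
        # .content forces the full download plus transparent gunzipping,
        # so this can be far larger (22.5MB vs 400MB in today's case).
        decompressed_bytes = len(response.content)

    print(compressed_bytes, decompressed_bytes)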
See Also: → 1294544
00:11 <emorley> gbrown: Aryx: though the jobs do have a failure message that warns about the size (https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/buildbot.py?q=%22Log+file+size%22&redirect_type=single#91) -- perhaps that should truncate the logs down to the max size, so the full log never gets uploaded?
Depends on: 1347956
Blocks: 1347945
Attachment #8877268 - Flags: review?(cdawson)
Attachment #8782981 - Attachment is obsolete: true
Attachment #8877268 - Flags: review?(cdawson) → review+
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/ba4bf09e5ebc1da3bbf46a99e5cf69e605173310

Bug 1295997 - Record the size of the unstructured log download

This will help determine what maximum size threshold is appropriate to allow the common log sizes, but still prevent the extreme offenders. The Content-Encoding is also recorded, to check if there are any other logs being served without gzip.
Whilst working on the implementation for blocking downloads, I spotted a piece of requests daftness, which I fixed and which has now been released: https://github.com/requests/requests/pull/4137 https://github.com/requests/requests/blob/master/HISTORY.rst#2180-2017-06-14 Once we update to the new release (https://github.com/mozilla/treeherder/pull/2560) we can remove the `closing()` boilerplate from here: https://github.com/mozilla/treeherder/blob/ba4bf09e5ebc1da3bbf46a99e5cf69e605173310/treeherder/log_parser/artifactbuildercollection.py#L90
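
For reference, a before/after sketch of the boilerplate in question (hypothetical URL; the real code streams the log to the parser in chunks):

    from contextlib import closing

    import requests

    url = "https://example.com/live_backing.log"  # hypothetical

    # Before requests 2.18.0: Response is not a context manager, so streamed
    # downloads need contextlib.closing() to guarantee the connection is released.
    with closing(requests.get(url, stream=True)) as response:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            pass  # hand each chunk to the parser

    # From requests 2.18.0: Response implements __enter__/__exit__, so the
    # wrapper is no longer needed.
    with requests.get(url, stream=True) as response:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            pass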
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/ce5527c820aba79ddef0e76c8e1756ca11d01499

Bug 1295997 - Send the unstructured log size to New Relic as ints

New Relic Insights doesn't coerce strings to integers, so doesn't allow the graphing of custom attributes sent as strings. HTTP headers are always exposed as strings, even for fields that are expected to represent numbers, so we must explicitly cast Content-Length.
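
A minimal sketch of the cast, assuming the Python New Relic agent's add_custom_parameter API; the attribute names are illustrative rather than the exact ones Treeherder records.

    from contextlib import closing

    import newrelic.agent
    import requests

    url = "https://example.com/live_backing.log"  # hypothetical

    with closing(requests.get(url, stream=True)) as response:
        response.raise_for_status()
        # Header values are always strings; cast before recording, otherwise
        # Insights stores the attribute as a string and won't graph it.
        size = int(response.headers.get("Content-Length", 0))
        newrelic.agent.add_custom_parameter("log_download_size", size)
        # Also record the encoding, to spot logs served without gzip.
        newrelic.agent.add_custom_parameter(
            "log_encoding", response.headers.get("Content-Encoding", ""))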
I've written a query to generate a histogram of download sizes: https://insights.newrelic.com/accounts/677903/dashboards/339080 The fix in comment 11 was only just deployed, but we'll gradually get more data over the next few days (our non-pro Insights plan keeps 7 days of data).
Component: Treeherder: Data Ingestion → Treeherder: Log Parsing & Classification
Priority: -- → P1
This try push created logs that were 486 MB compressed and a whopping 12 GB uncompressed! https://treeherder.mozilla.org/#/jobs?repo=try&revision=f9627aa74b1083fd2dab6f4f39fae24fcd9ebbd2
To summarise the current state here: for a while I've had a WIP that uses requests to check the Content-Length of log files before performing the download; however, it needs some additional changes to Treeherder so we can store a "log parsing failed" reason to display in the UI, otherwise people will blame Treeherder rather than their logs. I also added the New Relic stats instrumentation above to help pick a threshold, though not everything is compressed, so it's hard to pick a threshold that's fair for both cases (plus, depending on how much duplication there is in the over-verbose log lines, the appropriate compressed threshold can vary dramatically).
Blocks: 1455721
See Also: → 1372668
See Also: → 1530357
Status: NEW → ASSIGNED

https://github.com/mozilla/treeherder/commit/52d6017c5b3751d547a445f0a3fc891c3406d52b

Logs whose download size (ie before decompression, for those that are compressed) is larger than 5MB will now be skipped by the log parser.
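
For illustration, a rough sketch of the shape of that check, assuming the requests library; the names here are illustrative rather than the actual Treeherder code in the commit above.

    import requests

    MAX_DOWNLOAD_SIZE_BYTES = 5 * 1024 * 1024  # the new 5MB limit

    class LogSizeError(Exception):
        """Raised when a log's download size exceeds the limit (illustrative name)."""

    def fetch_log_lines(url):
        # Stream so the headers can be inspected before the body is read.
        with requests.get(url, stream=True) as response:
            response.raise_for_status()
            size = int(response.headers.get("Content-Length", 0))
            if size > MAX_DOWNLOAD_SIZE_BYTES:
                # Skip parsing entirely; the check is on the download size, ie
                # before decompression for gzip-served logs.
                raise LogSizeError("Download size of %s bytes exceeds the limit" % size)
            for line in response.iter_lines():
                yield line.decode("utf-8", errors="replace")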

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED

(This will prevent one of the main causes of rabbitmq queue size alerts in production.)

Component: Treeherder: Log Parsing & Classification → TreeHerder