Closed Bug 1607793 Opened 6 years ago Closed 5 years ago

taskclusteretl - redesign the live_backing.log parser

Categories

(Data Platform and Tools :: General, enhancement, P1)

enhancement
Points:
3

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: trink, Assigned: trink)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

Attachments

(1 file)

Currently the parsers were created by analyzing a fixed set of logs (based on platform, symbol, groupSymbol) and a static parse table was generated. This needs to be more flexible going forward; either the parse table must dynamically change or we need to efficiently scan for everything we are interested in every log.

Assignee: nobody → mtrinkala
Status: NEW → ASSIGNED
Points: --- → 3
Priority: -- → P1

Please let us know if some improvements in the build process could make your life easier!

Attached file components
Since this is/was experimental we are extracting a bunch of data because we could, not because it was needed.

Before the parser is redesigned it would be useful to spec out what data is needed and its use (also, if necessary get it added to the logs). I have attached a list of what we currently extract (many of those are simply the perfherder data flattened out). The others are not as straight forward, here are some of the pain points and the reason for the 54 unique log parsers that only consume a curated subset of the logs:

  1. Different timestamp logging within a single log and between different tasks/workers e.g.
    • [taskcluster 2019-05-28T23:58:58.896Z]
    • [task 2019-05-29T13:13:24.382Z] 13:13:24 INFO -
    • 2019-05-29 10:02:50,127 - signingscript.script - INFO -
    • '00:24:06 INFO - '
    • no timestamp
  2. Inconsistent summary results e.g.
[task 2019-05-29T03:03:12.817Z] + exit 0
[taskcluster 2019-05-29 03:03:13.626Z] === Task Finished ===
[taskcluster 2019-05-29 03:03:18.742Z] Successful task run with exit code: 0 completed in 1088.948 seconds
[taskcluster 2019-05-29T03:12:50.089Z]       Result: SUCCEEDED
[taskcluster 2019-05-29T03:12:50.089Z] === Task Finished ===
[taskcluster 2019-05-29T03:12:50.089Z] Task Duration: 1h9m16.452924784s
2019-05-29 02:10:25,450 - datadog.threadstats - DEBUG - No events to flush. Continuing.
exit code: 0
  1. Numerous test result status lines depending on test suite being run e.g.
    • "%s | %s | took %dms"
    • "%s | %s | test completed (time: %dms)"
    • "REFTEST TEST-START | %s == %s" (sadly the status has to be retrieved from a line that looks like this "%s | %s %s" and the timing is computed from the delta between the two
  2. many durations have to be computed by taking the delta from the closest line, that has a timestamp (above or below depending on circumstance). In general this is good enough for cost analysis.
  3. long log lines, there was an issue with some 400K plus character lines
  4. multi-line logs
    • debug dumps... mainly waste CPU cycles since the are treated like valid log lines with/without timestamps
    • individually timestamped pretty printed JSON is entertaining (fortunately we don't extract anything from those... yet)
06:31:17     INFO - Dumping config to /builds/task_1559111460/workspace/logs/localconfig.json.
06:31:17     INFO - {'append_to_log': False,
06:31:17     INFO -  'avds_dir': '/builds/worker/workspace/build/.android',
06:31:17     INFO -  'base_work_dir': '/builds/task_1559111460/workspace',
06:31:17     INFO -  'config_files': ('/builds/task_1559111460/workspace/mozharness/configs/android/android_common.py',
....
  1. It is much more efficient to only enable the parser components for the specific type of logs. By cherry picking what we want the throughput is up to 50x faster. However, what is useful can change over time but I think this will have to be manually managed unless we standardize the logging of everything we want to extract.
    1. Some task logs don't contain a lot of useful timing information and the fetch can be skipped all together, only the total task time is recorded.

For our tests we have structured logging provided by mozlog. It might make sense to adapt this to mozharness's logger and run-task so that we can provide structured logs rather than relying on complex regex's.

I realize now that this is more generic than just parsing build logs, so I'll reiterate that for tests we already have "raw logs" which are more-or-less json parsable. That would cover all the "task" portion of the live_backing.log. For example here's a reftest live_backing.log that's "human readable" and here's the raw log for the test portion.

Mike, if you can fill some individual bugs about the various issues that you are mentioning, we can update the output to make it consistent.
Most of the time, I guess it is just a few minutes of work (last famous words) ;)

See Also: → 1607456

(In reply to Eric Rahm [:erahm] from comment #5)

I realize now that this is more generic than just parsing build logs, so I'll reiterate that for tests we already have "raw logs" which are more-or-less json parsable. That would cover all the "task" portion of the live_backing.log. For example here's a reftest live_backing.log that's "human readable" and here's the raw log for the test portion.

Thanks, I looked at that in the past but since the text log basically contains the same information and I have to parse it, fetching the JSON file only adds complexity and cost. With the initial focus being on cost analysis: test name, duration (the text log also has sub second resolution), and status were sufficient. Do we need something specific from the JSON that is not in the log?

[task 2020-01-08T23:45:34.825Z] 23:45:34     INFO - REFTEST INFO | Running with e10s: True
[task 2020-01-08T23:45:34.825Z] 23:45:34     INFO - REFTEST INFO | Application command: /builds/worker/workspace/build/application/firefox/firefox -marionette -profile /tmp/tmpt8502z.mozrunner
[task 2020-01-08T23:45:35.143Z] 23:45:35     INFO - 1578527135138	addons.webextension.doh-rollout@mozilla.org	WARN	Loading extension 'doh-rollout@mozilla.org': Reading manifest: Invalid extension permission: networkStatus
...
[task 2020-01-08T23:45:38.396Z] 23:45:38     INFO - REFTEST TEST-START | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-enabled.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-ref.html
[task 2020-01-08T23:45:38.397Z] 23:45:38     INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-enabled.html | 0 / 11 (0%)
[task 2020-01-08T23:45:38.733Z] 23:45:38     INFO - REFTEST INFO | drawWindow flags = DRAWWINDOW_DRAW_CARET | DRAWWINDOW_DRAW_VIEW | DRAWWINDOW_USE_WIDGET_LAYERS; window size = 800,1000; test browser size = 800,1000
[task 2020-01-08T23:45:38.735Z] 23:45:38     INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-ref.html | 0 / 11 (0%)
[task 2020-01-08T23:45:38.763Z] 23:45:38     INFO - REFTEST TEST-PASS | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-enabled.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-ref.html | image comparison, max difference: 0, number of differing pixels: 0
{"source": "reftest harness", "thread": "MainThread", "time": 1578527125475, "action": "log", "message": "Running with e10s: True", "level": "INFO", "pid": 1017}
{"source": "reftest harness", "thread": "MainThread", "time": 1578527125475, "action": "log", "message": "Application command: /builds/worker/workspace/build/application/firefox/firefox -marionette -profile /tmp/tmpsgkRsI.mozrunner", "level": "INFO", "pid": 1017}
{"source": "reftest harness", "process": "GECKO(1029)", "thread": "ProcessReader", "time": 1578527125809, "action": "process_output", "data": "1578527125809\taddons.webextension.doh-rollout@mozilla.org\tWARN\tLoading extension 'doh-rollout@mozilla.org': Reading manifest: Invalid extension permission: networkStatus", "pid": 1017}
...
{"thread": "None", "pid": null, "source": "reftest", "test": "file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-enabled.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-ref.html", "time": 1578527138394, "action": "test_start"}
{"source": "reftest harness", "process": "GECKO(1248)", "thread": "ProcessReader", "time": 1578527138395, "action": "process_output", "data": "REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-enabled.html | 0 / 11 (0%)", "pid": 1017}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138394, "action": "log", "exc_info": false, "message": "START file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-enabled.html"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138699, "action": "log", "exc_info": false, "message": "[CONTENT] Using browser remote=true\n"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138713, "action": "log", "exc_info": false, "message": "[CONTENT] OnDocumentLoad triggering AfterOnLoadScripts"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138725, "action": "log", "exc_info": false, "message": "Initializing canvas snapshot"}
{"thread": "None", "level": "INFO", "pid": null, "source": "reftest", "time": 1578527138726, "action": "log", "exc_info": false, "message": "drawWindow flags = DRAWWINDOW_DRAW_CARET | DRAWWINDOW_DRAW_VIEW | DRAWWINDOW_USE_WIDGET_LAYERS; window size = 800,1000; test browser size = 800,1000"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138726, "action": "log", "exc_info": false, "message": "DoDrawWindow 0,0,800,1000"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138730, "action": "log", "exc_info": false, "message": "[CONTENT] RecordResult fired"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138730, "action": "log", "exc_info": false, "message": "RecordResult fired"}
{"source": "reftest harness", "process": "GECKO(1248)", "thread": "ProcessReader", "time": 1578527138732, "action": "process_output", "data": "REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-ref.html | 0 / 11 (0%)", "pid": 1017}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138730, "action": "log", "exc_info": false, "message": "START file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-ref.html"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138745, "action": "log", "exc_info": false, "message": "[CONTENT] OnDocumentLoad triggering AfterOnLoadScripts"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138752, "action": "log", "exc_info": false, "message": "Initializing canvas snapshot"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138752, "action": "log", "exc_info": false, "message": "DoDrawWindow 0,0,800,1000"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138757, "action": "log", "exc_info": false, "message": "[CONTENT] RecordResult fired"}
{"thread": "None", "level": "DEBUG", "pid": null, "source": "reftest", "time": 1578527138757, "action": "log", "exc_info": false, "message": "RecordResult fired"}
{"status": "PASS", "thread": "None", "extra": {"status_msg": "Pass"}, "pid": null, "source": "reftest", "test": "file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-enabled.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-ref.html", "time": 1578527138758, "action": "test_status", "subtest": "image comparison, max difference: 0, number of differing pixels: 0"}
{"status": "OK", "thread": "None", "pid": null, "source": "reftest", "test": "file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-enabled.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/css-disabled/fieldset/fieldset-ref.html", "time": 1578527138758, "action": "test_end"}
Depends on: 1608988
Depends on: 1608992

There is probably some existing code in ActiveData to parse logs which you could re-use.

The parser was refactored and still handles all the variations above. The refactor was deployed Jan 31 so Feb 1 will be the first day with a full updated dataset.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: Datasets: General → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: