Closed Bug 1863290 Opened 7 months ago Closed 7 months ago

logs for web platform tests more likely to be too big to parse, get skipped by Treeherder

Categories

(Testing :: web-platform-tests, defect)

defect

Tracking

(firefox121 fixed)

RESOLVED FIXED
121 Branch
Tracking Status
firefox121 --- fixed

People

(Reporter: aryx, Assigned: jgraham)

References

Details

Attachments

(1 file)

Over the last week, a web platform test log was approximately ten times more likely to get skipped than during the previous three weeks.

See this query and change the week counter in line 27.

James, could you take a look?

Flags: needinfo?(james)

The link to the query is broken; it would help to know exactly which jobs are causing problems.

Flags: needinfo?(james) → needinfo?(aryx.bugmail)

OK, so they're almost all wdspec jobs. I'm not seeing anything obvious in the recent changes but Henrik might remember if we increased the logging verbosity or similar.

Flags: needinfo?(hskupin)

In nearly all cases the huge logs seem to occur only for the wd2 jobs, and there only for debug builds. I have seen an increase in leak check lines; maybe all these additional entries caused the issue?

Also, it looks like the same leak log for a tab/pid gets processed multiple times, e.g. search for runtests_leaks_1541_tab_pid4639.log in the following log: https://treeherder.mozilla.org/logviewer?job_id=434206201&repo=mozilla-central

Shouldn't all the logs be removed after the leakcheck is done for this particular Firefox process?
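
One way to check for repeats is to count how often each leak log file name appears in a raw log. The following is a minimal Python sketch, assuming the file names follow the shape of the example above (`runtests_leaks_<number>_<type>_pid<pid>.log`). The pattern and the function names are illustrative only, and a repeated mention is just a hint that a file was processed more than once, not proof.

```python
import re
from collections import Counter

# Assumed file name shape, derived from the single example in this bug.
LEAK_LOG_NAME = re.compile(r"runtests_leaks_\d+(?:_[A-Za-z]+_pid\d+)?\.log")


def count_leak_log_mentions(log_text):
    """Map each leak log file name to the number of times it is mentioned."""
    return Counter(LEAK_LOG_NAME.findall(log_text))


def repeated_leak_logs(log_text):
    """Return only the leak log names that are mentioned more than once."""
    counts = count_leak_log_mentions(log_text)
    return {name: n for name, n in counts.items() if n > 1}
```

Running `repeated_leak_logs()` over the downloaded text of a job log would list the candidates worth inspecting by hand.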

Flags: needinfo?(hskupin) → needinfo?(james)

Regarding the files being deleted: I think the code currently deletes the file for the parent process, but it was never updated to handle possible content-process files (https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox.py#586-587 vs https://searchfox.org/mozilla-central/source/testing/mozbase/mozleak/mozleak/leaklog.py#225-248). So maybe process_leak_log should return a list of processed files for cleanup, or it should handle the cleanup itself; in general, mozleak's design is a bit sketchy.
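
To illustrate the "return a list of processed files" idea, here is a rough Python sketch. It is not the mozleak API: `process_leak_logs` and `cleanup_leak_logs` are hypothetical names, and the assumption that per-process logs sit next to the parent log and share its stem (`<stem>_<type>_pid<pid>.log`) is taken only from the file name quoted earlier in this bug.

```python
from glob import escape
from pathlib import Path


def process_leak_logs(leak_log_file):
    """Hypothetical processor: return every log file it looked at.

    Assumes child-process logs are named <stem>_<type>_pid<pid><suffix>
    and live in the same directory as the parent-process log.
    """
    base = Path(leak_log_file)
    processed = []
    if base.exists():
        processed.append(base)
    pattern = f"{escape(base.stem)}_*_pid*{base.suffix}"
    processed.extend(sorted(base.parent.glob(pattern)))
    # The actual leak analysis of each file would happen here.
    return processed


def cleanup_leak_logs(paths):
    """Delete every processed log, ignoring files that are already gone."""
    for path in paths:
        path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
```

With this shape the caller no longer needs to know how child-process logs are named; it simply deletes whatever the processor reports.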

Flags: needinfo?(james)
Assignee: nobody → james
Status: NEW → ASSIGNED

Marking as leave-open so that we can check if the provided patch is enough or if there is something else that needs to be investigated.

Keywords: leave-open
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8a8da65f9aee
Cleanup all leak log files after processing, r=ahal,Sasha
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/43042 for changes under testing/web-platform/tests
Upstream PR merged by moz-wptsync-bot

I can still see log files that are too large for Wd2.

Please note that because bug 1851376 (using a dummy URL for the remote settings server) has now landed, we also get a lot of extra log lines. Let's wait for the fix on bug 1812040.

Depends on: 1812040
See Also: → 1851376

Because we added quite a few new tests in the past weeks, I assume that some tests have been moved between the different chunks. wd2 now contains the PDF tests, and I can see that we do not truncate the URLs, which causes quite a lot of extra data. I've filed bug 1864389 to get that fixed.
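
For illustration, truncating long URLs before they reach the log could look roughly like the Python sketch below. This is not the patch from bug 1864389: the function name, the URL schemes matched, and the 100-character limit are all assumptions made for the example.

```python
import re

# Assumed set of schemes; the real logging code may see others.
URL_PATTERN = re.compile(r"\b(?:https?|data|about|file):[^\s\"'<>]+")


def truncate_urls(line, max_len=100):
    """Shorten every URL in `line` that is longer than `max_len` characters."""

    def shorten(match):
        url = match.group(0)
        if len(url) <= max_len:
            return url
        omitted = len(url) - max_len
        return f"{url[:max_len]}... ({omitted} more characters)"

    return URL_PATTERN.sub(shorten, line)
```

Short URLs pass through unchanged, so only lines carrying very long URLs (for example large inline payloads) shrink.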

With the patch from bug 1864389 this bug should finally be fixed as well. I'll re-check once it's on mozilla-central.

The situation is much better now. I was able to find only a single wd2 debug job whose error summary log was too large:
https://treeherder.mozilla.org/jobs?repo=mozilla-central&revision=acdcddac0b907d2e59d6d0fa1f96fff98453093a&selectedTaskRun=Dn3nV-6OTdWmcym_NFHUJA.0

Aryx, can you please check again? What is the current mean file size? Is it still close to the cut-off size, or do only some random jobs exceed it now? If there are still many, we might want to split the wdspec jobs for debug builds into one more chunk, because right now I cannot see anything else.

Note that the job listed above actually runs for a very long time, so I assume we had some shutdown hang issues which caused more log output.

Flags: needinfo?(aryx.bugmail)
No longer blocks: 1864644

Treeherder uses the size of the gzipped logs because that can be checked; to my knowledge, the plain file size is not publicly available.

Gzipped log sizes for wdspec web platform with plain config on Linux debug:

  • Wd1: 3.78 MB (as shown by devtools)
  • Wd2: 5.20 MB (which is 4.96 MiB, below the threshold of 5 MiB)
  • Wd3: 2.82 MB
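
The Wd2 figures are only consistent if 5.20 is read as decimal megabytes (10^6 bytes) and 4.96 as mebibytes (2^20 bytes). A small Python sketch of that arithmetic follows; it assumes the listed sizes are decimal MB and that the cut-off is 5 MiB, neither of which is confirmed in this bug.

```python
MB = 1000 ** 2   # decimal megabyte in bytes
MIB = 1024 ** 2  # mebibyte in bytes


def mb_to_mib(size_mb):
    """Convert a size in decimal megabytes to mebibytes."""
    return size_mb * MB / MIB


def below_threshold(size_mb, threshold_mib=5.0):
    """True if the size stays under the assumed 5 MiB cut-off."""
    return mb_to_mib(size_mb) < threshold_mib


for name, size_mb in [("Wd1", 3.78), ("Wd2", 5.20), ("Wd3", 2.82)]:
    status = "ok" if below_threshold(size_mb) else "too large"
    print(f"{name}: {size_mb:.2f} MB = {mb_to_mib(size_mb):.2f} MiB ({status})")
```

Under these assumptions Wd2 sits within about 1% of the limit, which matches the observation that only some jobs get skipped.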
Flags: needinfo?(aryx.bugmail)

The severity field is not set for this bug.
:jgraham, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(james)
Severity: -- → S3
Flags: needinfo?(james)
No longer depends on: 1866322

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #16)

Gzipped log sizes for wdspec web platform with plain config on Linux debug:

  • Wd1: 3.78 MB (as shown by devtools)
  • Wd2: 5.20 MB (which is 4.96 MiB, below the threshold of 5 MiB)
  • Wd3: 2.82 MB

So Wd2 is still running close to the limit then. Did we have other recent instances where the size exceeded it? If there are more, I would suggest that we split the Wd2 debug jobs into 4 chunks.

Flags: needinfo?(aryx.bugmail)

The last time a wdspec log was too large was 3 days ago. If we observe this again, we can increase the chunk count.

Flags: needinfo?(aryx.bugmail)

Sounds good. In that case let me close this bug for now as fixed for Firefox 121 (the fix for bug 1864389 was the final one), and we can have a new bug for splitting the chunks when necessary.

Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 121 Branch