Open Bug 1821646 Opened 2 years ago Updated 2 years ago

Generate a warnings statistics file at the end of the execution of tests

Categories

(Testing :: General, enhancement)

Default
enhancement

Tracking

(Not tracked)

People

(Reporter: marco, Unassigned)

References

Details

This would allow us to more easily and quickly identify things such as bug 1821633.

We have two options:

  1. Generate a summary file and upload it as an artifact, then write a script to analyze a set of tasks or ingest it into BigQuery and query it from there
  2. Upload the log files to BigQuery so we can perform any kind of query on the log files

FWIW for 1. a simple command like

cat input.log | grep WARNING | sed 's/ WARNING/\nWARNING/g' | grep WARNING | sed 's/[0-9]\+[.s ]/n/g' | sort | uniq -c | sort -nr > output.log

might be a first start. It could be improved so that it does not suppress interesting numbers (some warnings include time measurements as numbers, which I wanted to aggregate together).
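One way to avoid suppressing every number is to collapse only the tokens that are expected to vary between runs. The sketch below is an assumption about what those tokens look like, not a tested rule for Gecko logs: it treats a time measurement as digits (optionally with decimals), an optional space, then `ms` or `s`, and replaces it with a hypothetical `<time>` placeholder. Line numbers and error codes are kept. It uses awk to split lines that carry several warnings, as a portable stand-in for the GNU-specific `\n` in the `sed` replacement above. The function name `summarize_warnings` is hypothetical.

```shell
#!/bin/sh
# Sketch: count distinct WARNING messages in a log, collapsing only
# time measurements instead of every number.
# Assumption: a time measurement is digits (optionally with decimals),
# an optional space, then "ms" or "s", followed by a non-alphanumeric
# character or the end of the line.
# Step 1 (awk): start a new record at every "WARNING", so a line that
#   carries several warnings is split into several records.
# Step 2 (sed): replace durations with the placeholder <time>; other
#   numbers such as line numbers and error codes are left untouched.
# Step 3: count identical messages, most frequent first.
summarize_warnings() {
  awk '{
    n = split($0, parts, "WARNING")
    for (i = 2; i <= n; i++) print "WARNING" parts[i]
  }' "$1" |
    sed -E \
      -e 's/[0-9]+(\.[0-9]+)? ?m?s([^A-Za-z0-9])/<time>\2/g' \
      -e 's/[0-9]+(\.[0-9]+)? ?m?s$/<time>/' |
    sort | uniq -c | sort -rn
}
```

Usage: `summarize_warnings input.log > output.log`. Narrowing the replacement this way trades some aggregation for readability; other volatile tokens (IDs, addresses) would need their own rules.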

I took a quick look at this. Overall it is doable with a service that downloads all the raw logs, parses them, and uploads the results to a database. I imagine we would find a short list of warnings, and from there we could determine which ones are common.

Do we care about frequency per log, or just overall?
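Both views can come from the same pass. The minimal sketch below assumes the raw logs have already been downloaded locally (downloading and uploading to a database are out of scope). It emits one tab-separated row per distinct warning per log; the overall totals are then just a sum over those rows. Tab separation is chosen because warning messages often contain commas, and it assumes the messages contain no tabs. The function names `per_log_counts` and `overall_counts` are hypothetical.

```shell
#!/bin/sh
# Sketch: per-log and overall WARNING frequencies from a set of raw logs.
# Assumption: each argument is a locally downloaded raw log file and
# warning messages contain no tab characters.

# Print "<log>\t<count>\t<warning>" for every distinct warning in each log.
per_log_counts() {
  for log in "$@"; do
    awk -v name="$log" '{
      n = split($0, parts, "WARNING")
      for (i = 2; i <= n; i++) count["WARNING" parts[i]]++
    }
    END { for (w in count) printf "%s\t%d\t%s\n", name, count[w], w }' "$log"
  done
}

# Read per_log_counts output on stdin and sum it into overall totals,
# printed as "<count>\t<warning>", most frequent first.
overall_counts() {
  awk 'BEGIN { FS = "\t" } { total[$3] += $2 }
       END { for (w in total) printf "%d\t%s\n", total[w], w }' |
    sort -rn
}
```

Usage: `per_log_counts logs/*.log > per_log.tsv` keeps the per-log view, and `overall_counts < per_log.tsv` gives the overall one, so neither has to be chosen up front.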

The example above from comment 1 is cool; I got a lot of good hits. For something that we would parse and store in a database, I think we would write a raw log parser similar to the command-line tools, but one that could account for things like:

      1 WARNING: [548CEB4B06An8AB4A01248190595]: Could not be introduced to peer D44464934178C6FC.91C3651B1CFBA19A: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:605
      1 WARNING: [548CEB4B06An8AB4A01248190595]: Could not be introduced to peer D2451B742F00638C.C7AD7FC269BCEFBB: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:605
      1 WARNING: [548CEB4B06An8AB4A01248190595]: Could not be introduced to peer 8FD63FC3060999Dn69F77B3497972AD3: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:605
      1 WARNING: [548CEB4B06An8AB4A01248190595]: Could not be introduced to peer 7DBCD309FB73BnCDB4DFE360903100: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:605

and:

      1 WARNING: [n1]: Rejecting introduction request from '548CEB4B06An8AB4A01248190595' for unknown peer 'D5C00632ECB72DnA5229BE72B110B9C': file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:675
      1 WARNING: [n1]: Rejecting introduction request from '548CEB4B06An8AB4A01248190595' for unknown peer 'D44464934178C6FC.91C3651B1CFBA19A': file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:675
      1 WARNING: [n1]: Rejecting introduction request from '548CEB4B06An8AB4A01248190595' for unknown peer 'D2451B742F00638C.C7AD7FC269BCEFBB': file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:675
      1 WARNING: [n1]: Rejecting introduction request from '548CEB4B06An8AB4A01248190595' for unknown peer '8FD63FC3060999Dn69F77B3497972AD3': file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:675

Most likely we would find the top 20 patterns like the ones above and report them in a simplified form:

WARNING: [n1]: Rejecting introduction request from '548CEB4B06An8AB4A01248190595' for unknown peer <>: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:675
WARNING: [548CEB4B06An8AB4A01248190595]: Could not be introduced to peer <>: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:605

Assuming that is OK, we could reduce something like 400 unique warnings to fewer than 250 in a given log file.
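The simplification above can be sketched by rewriting only the peer name to `<>`, which reproduces the two simplified patterns shown. Assumptions: a peer name is two runs of at least eight uppercase hex digits joined by a `.`, and it appears right after the word `peer`, optionally single-quoted. The function names are hypothetical; `distinct_before` and `distinct_after` print the kind of before/after counts quoted above for a single log.

```shell
#!/bin/sh
# Sketch: collapse IPC peer names so otherwise identical warnings aggregate.
# Assumption: a peer name is two runs of 8+ uppercase hex digits joined by
# "." and follows the word "peer", optionally wrapped in single quotes.

# One record per WARNING, even when a line carries several.
extract_warnings() {
  awk '{
    n = split($0, parts, "WARNING")
    for (i = 2; i <= n; i++) print "WARNING" parts[i]
  }' "$1"
}

# Replace only the peer name; the node's own name in brackets is kept.
normalize_peers() {
  sed -E "s/peer '?[0-9A-F]{8,}\.[0-9A-F]{8,}'?/peer <>/g"
}

# Number of distinct warnings without and with the normalization.
distinct_before() { extract_warnings "$1" | sort -u | wc -l | tr -d ' '; }
distinct_after()  { extract_warnings "$1" | normalize_peers | sort -u | wc -l | tr -d ' '; }
```

Keeping the rule anchored on `peer` is deliberately conservative: it merges the variants listed above without touching other hex-looking tokens, at the cost of needing one rule per message shape.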

See Also: → 1824185

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #2)

> Do we care about frequency per log, or just overall?

Probably both. For overall statistics, one challenge may be keeping the connection to the source code across different revisions. For QM_TRY monitoring we used rust-code-analysis-cli to get function names as stable anchors. But maybe overall statistics at the m-c level for each nightly build would be enough to avoid this hassle?
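A cruder alternative to function-name anchors, sketched here as an assumption and not as what was done for QM_TRY, is to drop the checkout prefix and the line number so that a warning keeps the same key across revisions as long as its file and message do not change. The trade-off is that identical messages emitted from different lines of the same file are merged. The prefix `/builds/worker/checkouts/gecko/` is taken from the logs quoted in this bug; other prefixes are left alone. The function name `stable_location` is hypothetical.

```shell
#!/bin/sh
# Sketch: make a warning's source location stable across revisions.
# Assumptions: the warning ends in "file <path>:<line>" and the checkout
# prefix is /builds/worker/checkouts/gecko/ as in the logs in this bug.
stable_location() {
  sed -E \
    -e 's#/builds/worker/checkouts/gecko/##' \
    -e 's#(file [^ ]+):[0-9]+$#\1#'
}
```

Usage: pipe extracted warnings through `stable_location` before `sort | uniq -c`.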

:mccr8 has https://github.com/amccreight/log-spam-hell which might be relevant here?

Flags: needinfo?(continuation)

Eric Rahm wrote that years ago, and I just got it working again recently because I wanted something to download a bunch of logs from Treeherder. It would definitely be worth looking over what all the scripts are doing. For many years he was kind of doing a one-person operation to fix warning spam. A lot of the bugs he filed were against bug 765224.

Here's the top ten of the output from a Linux try run I did a few weeks ago:

820262 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file toolkit/xre/nsXREDirProvider.cpp:475
432192 WARNING: '!Theme::ThemeSupportsWidget(aFrame->PresContext(), aFrame, aAppearance)', file widget/gtk/nsNativeThemeGTK.cpp:1092
193943 WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file modules/libpref/Preferences.cpp:4687
193917 WARNING: XPCOM_MEM_BLOAT_LOG is set, disabling native allocations.: file tools/profiler/core/platform.cpp:345
181620 WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file xpcom/base/nsCycleCollector.cpp:3426
180025 WARNING: '!scrollbar', file widget/Theme.cpp:1103
179939 WARNING: NS_ENSURE_TRUE(InitStaticMembers()) failed: file /builds/worker/workspace/obj-build/dist/include/mozilla/Preferences.h:129
169226 WARNING: could not set real-time limit in CubebUtils::InitLibrary: file dom/media/CubebUtils.cpp:655
116182 WARNING: JSWindowActorChild::SendRawMessage (Conduits, ConduitClosed) not sent: !CanSend() || !mManager || !mManager->CanSend(): file dom/ipc/jsactor/JSWindowActorChild.cpp:61
113869 WARNING: IPC Connection Error: [Parent][PCompositorManagerParent] RunMessage(msgname=PCompositorBridge::Msg___delete__) Channel closing: too late to send/recv, messages will be lost: file ipc/glue/MessageChannel.cpp:1927

Flags: needinfo?(continuation)

FWIW, the first row is bug 1821633.

It might be worth just reusing that old meta bug and linking new findings there, too?

It looks like, once you've picked an individual warning to investigate, his scripts can break down how common it is in specific test suites and even in specific tests, as seen in bug 1542374.

Interesting. I'd still like to see the statistics calculated on a regular basis and made easily accessible somewhere (like a dashboard), but those scripts might do a lot of what :jmaher was talking about in comment 2.
