Open Bug 1865396 Opened 2 years ago Updated 2 years ago

Leak logging overwrites existing logs with duplicate PIDs

Categories

(Core :: XPCOM, task)

task

Tracking

()

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

The leak logs that XPCOM generates for use by leakcheck have names like runtests_leaks_tab_pid9424.log. The test harness gives a new temp directory for each directory of tests it runs, but it is possible for the same PID to get used multiple times in a single directory.

It appears that at least on Windows this is quite common. In one log I looked at, I see a single PID being used four times. There are more than a hundred duplicate logs.

These logs are opened with "w", so in effect the last log will overwrite all of the earlier ones, meaning that we are potentially missing out on a large number of leaks.

Two possible fixes I can think of are:

  • Make the leak log files get opened with append, and change leaklog.py to deal with multiple reports in a single file. (I think it will complain right now if you do that.)
  • Look for an existing file, and use a different name if one exists. This might require leaklog.py changes. I don't remember how much it cares about what the log names are.

This is probably a "regression" from Fission, because we create an absolutely immense number of content processes with Fission, compared to e10s.

I found this because I'm writing a detailed log analyzer to help understand what is going on in bug 1863874. It might be good to wait to fix this until after I work out what is going on there, because our leak rate may increase massively.

One complication that I forgot about is that NoteIntentionalCrash() writes a special annotation into the leak log so that the leak checker won't complain about the log being empty. It does this by figuring out what the name of the leak log is and appending to it. One way around this would be to move the implementation of NoteIntentionalCrash() into nsTraceRefCnt.cpp and then it could just use the same file handle rather than figuring it out from scratch, but maybe there's some weird mozglue reason why it is defined inline. Another possibility would be to name multiple logs in a guessable way with a total ordering, like runtests_leaks_tab_pid9424-1.log, runtests_leaks_tab_pid9424-2.log and have it look for the highest value it can find, though that's a pain.

Along similar lines, if we share leak logs, we'll have to write something into the leak log indicating that we've started logging for that process (if we don't already?). Otherwise, we could have a situation where process A with pid N successfully creates a log and writes to it, then process B with pid N crashes before writing to the log, and the leak checker can't tell that B existed.

See Also: → 1069479
You need to log in before you can comment on or make changes to this bug.