Closed Bug 1582357 Opened 5 years ago Closed 5 years ago

Running talos tests with --geckoProfile is very slow or crashes

Categories

(Core :: Gecko Profiler, defect, P1)

Tracking

RESOLVED FIXED
mozilla71
Tracking Status
firefox71 --- fixed

People

(Reporter: jdescottes, Assigned: mozbugz)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

It seems like running talos tests locally with the --geckoProfile option is now either very slow (for small tests) or crashes (for bigger tests).

For instance, on my machine (macOS):
./mach talos-test --activeTests damp --subtest custom.debugger --cycles 1 --tppagecycles 1 --geckoProfile finishes with a tab crash (there is no issue without --geckoProfile).

It crashes with the following error on the command line:

05:52:06     INFO -  PID 86064 | dyld: Library not loaded: @executable_path/libmozglue.dylib
05:52:06     INFO -  PID 86064 |   Referenced from: /[...]/Nightly.app/Contents/MacOS/crashreporter.app/Contents/MacOS/minidump-analyzer
05:52:06     INFO -  PID 86064 |   Reason: no suitable image found.  Did find:
05:52:06     INFO -  PID 86064 | 	/Users/jdescottes/lib/libmozglue.dylib: stat() failed with errno=20

Also, running ./mach talos-test --activeTests damp --subtest simple.debugger --cycles 1 --tppagecycles 1 --geckoProfile finishes and opens the profiler in a new window, but only after ~20 seconds.

According to a bisect, the regression was introduced by Bug 1581049, more precisely by: https://hg.mozilla.org/integration/autoland/rev/4001594f15dd

:gerald, any idea why we are getting this crash?

Flags: needinfo?(gsquelart)

Thank you Julian.
I'm able to reproduce it locally, so hopefully I'll be able to find the cause...
Here it seems to regress in the previous revision, https://hg.mozilla.org/integration/autoland/rev/1ceccc3c86a49e41, which looks to me like a more likely culprit than the simpler nearby revision you found (but, as we discussed, you were using artifact builds, so maybe your bisection was not as precise).

Assignee: nobody → gsquelart
Severity: normal → critical
Flags: needinfo?(gsquelart)
Priority: -- → P1

In some situations, entries may in fact take more than half the buffer size
(e.g., when duplicating a stack into a small temporary buffer).
So we now allow blocks to take the full buffer size -- but not more, as they
would start overwriting themselves!
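
To illustrate the size rule described above, here is a minimal sketch of a ring buffer that accepts blocks up to its full capacity and refuses anything larger. It is not the Gecko Profiler code: the class ToyRingBuffer, its max_block parameter and the byte-string blocks are hypothetical names chosen for illustration, and the "old" limit of half the buffer size is an assumption inferred from the commit message.

```python
from collections import deque


class ToyRingBuffer:
    """Toy ring buffer of byte blocks (hypothetical, not the Gecko code)."""

    def __init__(self, capacity, max_block=None):
        self.capacity = capacity
        # A block can never exceed the capacity: it would overwrite itself.
        limit = capacity if max_block is None else max_block
        self.max_block = min(limit, capacity)
        self.blocks = deque()  # oldest block first
        self.used = 0

    def put(self, block):
        """Store a block, evicting the oldest ones; return False if refused."""
        size = len(block)
        if size > self.max_block:
            return False
        # Drop the oldest blocks until the new one fits.
        while self.used + size > self.capacity:
            self.used -= len(self.blocks.popleft())
        self.blocks.append(block)
        self.used += size
        return True


old_policy = ToyRingBuffer(capacity=8, max_block=4)  # assumed old limit: half
new_policy = ToyRingBuffer(capacity=8)               # new limit: full size

big = b"x" * 6  # stands in for a stack duplicated into a small buffer
```

With these two buffers, old_policy.put(big) is refused while new_policy.put(big) succeeds, and a 9-byte block is refused by both, since it could not fit without overwriting itself.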

Pushed by gtatum@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/318c6aa896db
Allow profiler entries up to the size of the buffer - r=gregtatum
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Julian, the crashes should not happen anymore, starting with Nightly 71.0a1 (2019-09-20).

But I'm guessing that slow tests may still be an issue. Could you please try again and, if there is still a problem, open a new bug describing how slow the tests are? Thank you.

Flags: needinfo?(jdescottes)

Thanks a lot for fixing this!

I quickly measured the time needed to get profiles on my machine for:

  • custom.debugger: 12s
  • simple.debugger: 10s

I compared with FIREFOX_NIGHTLY_70_END, and I get the same timings.
So, for me it's back to normal and definitely usable.

Flags: needinfo?(jdescottes)
Has Regression Range: --- → yes