Open Bug 1541603 Opened 5 years ago Updated 7 days ago

MakeOrSetMinidumpPath shouldn't bother doing IO and just set the env var, leaving the crash reporter to create the minidumps folder if necessary

Categories

(Toolkit :: Crash Reporting, defect, P3)

defect

Tracking

()

REOPENED
Performance Impact medium

People

(Reporter: Gijs, Unassigned)

References

(Blocks 2 open bugs)

Details

(4 keywords, Whiteboard: [fxperfsize:S])

Crash Data

Attachments

(1 file)

As in summary. The code adds insult to injury by doing an exists() check followed by a create() - but I think the correct solution is to just do this work in the crashreporter, and just update the env var when we have a profile without doing any IO.

Keywords: main-thread-io, perf
Whiteboard: [fxperf]
Whiteboard: [fxperf] → [fxperf:p2]

This looks fairly straightforward...

Whiteboard: [fxperf:p2] → [fxperf:p2] [fxperfsize:S]
Assignee: nobody → mcheang
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b59c9bfa328d
remove creation of minidumps folder on startup, create it lazily in crash reporter. r=gsvelto
Flags: needinfo?(mcheang)
Backout by shindli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/16ee100ad0c9
Backed out changeset b59c9bfa328d for causing xpcshell failure in toolkit/crashreporter/test/unit_ipc/test_content_annotation.js CLOSED TREE

Mandy, it looks like the minidump & extra files are not being moved correctly in the test environment. You'll have to run the tests locally and see what's going on. The only difference between a normal run and a test run is that ShouldReport() will return false and the target minidump directory will be different. I find it odd that your change doesn't work under those conditions but we might have missed some corner-case that comes up in tests.

Status: NEW → ASSIGNED
Priority: -- → P1

Thanks Gabriele, I will take a look at why those tests are failing.

Flags: needinfo?(mcheang)

Adding bug 1546460, since we'll no longer be creating the minidump folder for brand new user profiles right at startup anymore.

Blocks: 1546460

My patch moved the creation of the minidumps directory out of start-up. Instead we can create it lazily (in the content process). This change caused many test failures and I have attempted to fix these failures over the last several weeks. The test infrastructure heavily relies on the minidumps directory to be available at start-up.

While fixing this problem, I ran into several issues. I fixed a subset of tests to correctly find the minidump file, while there are other tests that will fail due to this fix. Some tests want to find the dump file in the "profile/minidumps" folder, while other tests want to find the dump file in "temp" directory.

All the failing tests have a similar theme. The problem is that it cannot find the dump file in the directory it wants to find it. The directory where the dump file is created is inconsistent across different test suites.

I've fixed most of the test failures by adding a few things:

  • I've added creating it lazily in the parent and plugin process to fix test failures.
  • I've added an externallySetMinidumpPath boolean to keep track of whether to or not the crashReporter.minidumpPath has been set externally by a test. If it has, then let's use the externally set path for the test.

I've pushed my most recent patch on phabricator: https://phabricator.services.mozilla.com/D36370
My most recent push to TRY: https://treeherder.mozilla.org/#/jobs?repo=try&revision=def246ee2a53c5adbd9dd3a2563a728179517150

This is the version with the least test failures (from few hundred down to less than 10 test failures):

There are 4 remaining test failures I'm aware of:

  • Windows 2012 build is failing because the test was not able to log "PROCESS CRASH".
  • There is another set of XPCshell X2 test failures:
    • toolkit/crashreporter/test/unit/test_crash_with_memory_report.js and
    • toolkit/crashreporter/test/unit/test_crash_with_memory_report.js
  • An android Xpcshell test failure: toolkit/components/extensions/test/xpcshell/test_ext_webRequest_cached.js

These are my findings as of August 28, 2019. My internship ends on August 30, 2019. I plan to continue working on this after I receive volunteer contributor access. I'll be around in IRC and slack.

Mandy, you've done some excellent work on a very complex issue. I'm always available to help out if you want to finish this. As for your last try run I suspect that the failures in toolkit/crashreporter/test/unit/test_crash_with_memory_report.js are due to the memory report file which should "ride along" with the minidump and it might have been created in the wrong directory instead. The Windows issue on the other hand is quite odd and I can't think of an obvious cause.

Assignee: mcheang → mandycheang135
Assignee: mandycheang135+personal → mandycheang135

Mandy, looks like through a series of bugzilla account changes this is still assigned to you - are you likely to come back to it? :-)

Performance Impact: --- → P2
Flags: needinfo?(mcheang)
Whiteboard: [fxperf:p2] [fxperfsize:S] → [fxperfsize:S]

Hi Gijs, thanks for following up on this one.

Yes - I'd love to get back to this bug when I have a chance in the next upcoming weeks. I've saved all my notes from my intern days on it and I roughly remember the solution I was implementing. I got stuck because it failed a bunch of tests and I couldn't figure out why. Next steps are to track back my steps of my original solution and investigate the test failures.

I'll aim to get it in for Fx 101 if not, then Fx 102.

Flags: needinfo?(mcheang)

Oh and - I apologize for not getting back to this earlier. School and part-time work got really busy for me in the last two years! But now I'm back at Mozilla, I'll love to take a look at this bug again. :)

Severity: normal → S3

I started working on this again in my extra time. It has been a while since I worked with C++ and I needed some guidance. [:botond] - who's on the layouts team has been gracious enough to help me out! We have been pair programming on this bug in our extra time. Here are our findings so far:

April 14, 2023

Summary of today's session:

  • Finished copying the rest of the code
  • Finished a full build but there are compile errors we can look at next time
10:25.30 /Users/mandycheang/central/toolkit/components/osfile/NativeOSFileInternals.cpp:824:24: warning: result of comparison 'unsigned long' > 18446744073709551615 is always false [-Wtautological-type-limit-compare]
10:25.30         needed.value() > std::numeric_limits<nsAString::size_type>::max()) {
10:25.30         ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10:25.38 toolkit/components/printingui/ipc
10:25.66 1 warning generated.
10:25.71 toolkit/components/processtools
10:25.77 toolkit/components/protobuf
10:26.46 toolkit/components/reflect
10:26.87 toolkit/components/remote
10:27.01 toolkit/components/reputationservice
10:27.14 toolkit/components/resistfingerprinting
10:27.80 /Users/mandycheang/central/toolkit/components/protobuf/src/google/protobuf/generated_message_tctable_lite.cc:1631:10: warning: 'return' will never be executed [-Wunreachable-code-return]
10:27.80   return Error(PROTOBUF_TC_PARAM_PASS);
10:27.80          ^~~~~
10:27.89 toolkit/components/satchel
10:28.75 1 warning generated.
10:28.79 toolkit/components/sessionstore
10:28.95 toolkit/components/startup
10:29.18 toolkit/components/statusfilter
10:29.29 toolkit/components/telemetry/TelemetryHistogramData.inc.stub
10:29.67 toolkit/components/terminator
10:29.73 /Users/mandycheang/central/toolkit/components/protobuf/src/google/protobuf/io/zero_copy_stream_impl_lite.cc:375:10: warning: 'return' will never be executed [-Wunreachable-code-return]
10:29.73   return true;
10:29.73          ^~~~
10:29.76 toolkit/components/typeaheadfind
10:29.81 1 warning generated.
10:29.93 toolkit/components/uniffi-js
10:30.30 toolkit/components/url-classifier
10:30.71 toolkit/components/viaduct
10:30.99 toolkit/components/windowwatcher
10:32.95 toolkit/crashreporter/breakpad-client/mac/crash_generation
10:32.97 toolkit/crashreporter/breakpad-client/mac/handler
10:33.12 toolkit/crashreporter/breakpad-client
10:33.41 toolkit/crashreporter/google-breakpad/src/common/mac
10:33.42 toolkit/crashreporter/google-breakpad/src/common
10:33.80 toolkit/crashreporter
10:34.31 toolkit/library/buildid.cpp.stub
10:34.39 In file included from Unified_cpp_components_protobuf0.cpp:47:
10:34.39 /Users/mandycheang/central/toolkit/components/protobuf/src/google/protobuf/stubs/strutil.cc:506:11: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only.  Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead. [-Wdeprecated-declarations]
10:34.39           sprintf(dest + used, (use_hex ? "\\x%02x" : "\\%03o"),
10:34.39           ^
10:34.40 /Users/mandycheang/.mozbuild/MacOSX13.0.sdk/usr/include/stdio.h:188:1: note: 'sprintf' has been explicitly marked deprecated here
10:34.40 __deprecated_msg("This function is provided for compatibility reasons only.  Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.")
10:34.40 ^
10:34.40 /Users/mandycheang/.mozbuild/MacOSX13.0.sdk/usr/include/sys/cdefs.h:215:48: note: expanded from macro '__deprecated_msg'
10:34.40         #define __deprecated_msg(_msg) __attribute__((__deprecated__(_msg)))
10:34.40                                                       ^
10:34.47 toolkit/mozapps/extensions
10:34.53 toolkit/profile
10:34.80 In file included from Unified_cpp_crashreporter0.cpp:11:
10:34.80 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1461:24: error: use of undeclared identifier 'NS_LITERAL_STRING'
10:34.80   miniDumpsDir->Append(NS_LITERAL_STRING("minidumps"));
10:34.80                        ^
10:34.82 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1528:41: error: use of undeclared identifier 'memoryReportLocalPath'
10:34.84     copy_file(memoryReportPath.c_str(), memoryReportLocalPath);
10:34.84                                         ^
10:34.84 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1594:7: error: no viable conversion from 'CrashReporter::xpstring' (aka 'basic_string<char, char_traits<char>, allocator<char>>') to 'bool'
10:34.84   if (memoryReportPath) {
10:34.85       ^~~~~~~~~~~~~~~~
10:34.85 /Users/mandycheang/.mozbuild/MacOSX13.0.sdk/usr/include/c++/v1/string:903:5: note: candidate function
10:34.85     operator __self_view() const _NOEXCEPT { return __self_view(data(), size()); }
10:34.85     ^
10:34.85 In file included from Unified_cpp_crashreporter0.cpp:11:
10:34.86 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1598:5: error: no matching function for call to 'copy_file'
10:34.86     copy_file(memoryReportPath, memoryReportLocalPath);
10:34.86     ^~~~~~~~~
10:34.86 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:501:6: note: candidate function not viable: no known conversion from 'CrashReporter::xpstring' (aka 'basic_string<char, char_traits<char>, allocator<char>>') to 'const char *' for 1st argument
10:34.87 bool copy_file(const char* from, const char* to) {
10:34.87      ^
10:34.87 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1613:17: error: no matching function for call to 'LaunchProgram'
10:34.87   returnValue = LaunchProgram(crashReporterPath, minidumpPathInMinidumpsDir);
10:34.87                 ^~~~~~~~~~~~~
10:34.99 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1195:13: note: candidate function not viable: no known conversion from 'CrashReporter::xpstring' (aka 'basic_string<char, char_traits<char>, allocator<char>>') to 'const CrashReporter::XP_CHAR *' (aka 'const char *') for 1st argument
10:34.99 static bool LaunchProgram(const XP_CHAR* aProgramPath,
10:34.99             ^
10:34.99 5 errors generated.
10:34.99 make[4]: *** [Unified_cpp_crashreporter0.o] Error 1
10:34.99 make[4]: *** Waiting for unfinished jobs....
10:35.03 toolkit/system/commonproxy
10:35.07 1 warning generated.
10:35.12 toolkit/system/osxproxy
10:35.30 make[3]: *** [toolkit/crashreporter/target-objects] Error 2
10:35.30 make[3]: *** Waiting for unfinished jobs....
10:41.01 make[2]: *** [compile] Error 2
10:41.01 make[1]: *** [default] Error 2
10:41.01 make: *** [build] Error 2
10:41.04 1311 compiler warnings present.
10:41.12 Notification center failed: Install terminal-notifier to get a notification when the build finishes.
➜  central

Discussion:

  • // This was tricky to understand, but this is "equivalent" to declaring and assigning a variable in a higher language 
    static XP_CHAR minidumpPathInMinidumpsDir[XP_PATH_MAX];
    p = Concat(minidumpPathInMinidumpsDir, path.c_str(), &size);
    
  • We figured out how to declare a variable before calling createFileFromPath in this code here]

  • nsCOMPtr<nsIFile> minidump;
    CreateFileFromPath(aFilePath, getter_AddRefs(minidump));
    

March 15, 2023

Summary of today's session:

  • Understood the scope of the problem: The patch is finished and approved. What’s left is to fix the remaining failing tests.
  • Since the patch is over 3 years old now, we started copying over the code into the latest files. This will help so we don’t have to rebase a deal with merge conflicts.

Discussion:

  • The parent directory is profile/

  • The dump directory is profile/minidumps

  • nsComPtr are smart pointers and it knows how to automatically deallocate and free destroyed object memory

NS_GetSpecialDirectory(const char* aSpecialDirName, nsIFile** aResult)
  • Investigated the NS_GetSpecialDirectory function signature and learned that it takes two arguments:

    • const char* a SpecialDirName

    • nsIFile** aResult

    • It knows about a special directory, and in the code we're looking at, it's "ProfD" or profileDirectory

    • It creates an nsIFile object representing the profile directory and stores it into nsIFile** aResult

    • nsCOMPtr<nsIFile> file;
      rv = NS_GetSpecialDirectory("ProfD", getter_AddRefs(file));
      
    • We looked into NS_GetSpecialDirectory to understand what getter_AddRefs() was returning

Attachment #9074858 - Attachment description: Bug 1541603 - remove creation of minidumps folder on startup, create it lazily in crash reporter. r?mconley,gsvelto → Bug 1541603 - remove creation of minidumps folder on startup, create it lazily in crash reporter. r?mconley,gsvelto,botond

April 26, 2023

Summary of today's session:

  • Fixed 3 out of 5 compile errors:
  • I learned that NS_LITERAL_STRING is no longer used. This changed happened 3 years ago in this newsletter https://groups.google.com/g/mozilla.dev.platform/c/6TfibZmQHaY. We changed miniDumpsDir->Append(NS_LITERAL_STRING("minidumps")); to miniDumpsDir->Append(u"minidumps"_ns);.
  • I removed an extra piece of code that included memoryReportLocalPath. I had missed removing this earlier when copying the code over to the new commit.
  • memoryReportPath is not longer a pointer to XP_CHAR. It is now a xpstring type. In order to check an xpstring type of variable within an if conditional, we need to use empty() method. This method checks if the first character of the string is null.

Logs of the remaining 2 errors to fix for next time:

 0:04.78     Finished release [optimized] target(s) in 3.00s
 0:04.92 In file included from Unified_cpp_crashreporter0.cpp:11:
 0:04.92 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1590:5: error: no matching function for call to 'copy_file'
 0:04.92     copy_file(memoryReportPath, memoryReportLocalPath);
 0:04.92     ^~~~~~~~~
 0:04.92 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:501:6: note: candidate function not viable: no known conversion from 'CrashReporter::xpstring' (aka 'basic_string<char, char_traits<char>, allocator<char>>') to 'const char *' for 1st argument
 0:04.92 bool copy_file(const char* from, const char* to) {
 0:04.92      ^
 0:04.92 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1605:17: error: no matching function for call to 'LaunchProgram'
 0:04.92   returnValue = LaunchProgram(crashReporterPath, minidumpPathInMinidumpsDir);
 0:04.92                 ^~~~~~~~~~~~~
 0:04.92 /Users/mandycheang/central/toolkit/crashreporter/nsExceptionHandler.cpp:1195:13: note: candidate function not viable: no known conversion from 'CrashReporter::xpstring' (aka 'basic_string<char, char_traits<char>, allocator<char>>') to 'const CrashReporter::XP_CHAR *' (aka 'const char *') for 1st argument
 0:04.94 static bool LaunchProgram(const XP_CHAR* aProgramPath,
 0:04.94             ^
 0:04.97 ipc/glue/test/gtest
 0:05.08 2 errors generated.
 0:05.08 make[4]: *** [Unified_cpp_crashreporter0.o] Error 1
 0:05.08 make[3]: *** [toolkit/crashreporter/target-objects] Error 2
 0:05.08 make[3]: *** Waiting for unfinished jobs....
 0:09.81 make[2]: *** [compile] Error 2
 0:09.81 make[1]: *** [default] Error 2
 0:09.81 make: *** [build] Error 2
 0:09.83 1078 compiler warnings present.
➜  central

I think the remaining two errors have a straightforward fix: we are passing an xpstring to functions that expect a const char*, so we need to call the c_str() method on the xpstring to get a const char* from it.

May 3, 2023

Summary of today's session:

  • Fixed the last 2 compile errors by calling .c_str() method on the first xpstring argument to the functions. It converted the xpstring to char* .

  • Build was sucessful. Pushed to TRY: https://treeherder.mozilla.org/jobs?repo=try&revision=a990a3aeaf3b6156ee015e70125c9c3bc59418fb

  • Ran the the test in toolkit/crashreporter/test/unit/ locally on my MacOS

  • This test here timed out: toolkit/crashreporter/test/unit/test_crash_with_memory_report.js

  •  0:02.57 INFO "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
     0:02.58 INFO (xpcshell/head.js) | test MAIN run_test pending (1)
     0:02.58 INFO (xpcshell/head.js) | test run_next_test 0 pending (2)
     0:02.58 INFO (xpcshell/head.js) | test MAIN run_test finished (2)
     0:02.58 INFO running event loop
     0:02.58 INFO toolkit/crashreporter/test/unit/test_crash_with_memory_report.js | Starting run_test
     0:02.58 INFO (xpcshell/head.js) | test run_test pending (2)
     5:02.46 TEST_END: TIMEOUT
     5:02.46 INFO xpcshell return code: None
     5:02.46 INFO toolkit/crashreporter/test/unit/test_crash_with_memory_report.js | Process still running after test!
    ^C 7:56.31 INFO toolkit/crashreporter/test/unit/test_crash_with_memory_report.js failed or timed out, will retry.
     7:56.31 INFO INFO | Result summary:
     7:56.31 INFO INFO | Passed: 0
     7:56.31 INFO INFO | Failed: 0
     7:56.31 INFO INFO | Todo: 0
     7:56.31 INFO INFO | Retried: 0
     7:56.31 ERROR TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
     7:56.31 INFO Node moz-http2 server shutting down ...
     7:56.31 INFO Node server moz-http2 already dead -2
     7:56.31 INFO http3Server server shutting down ...
     7:56.31 INFO Http3 server http3Server already dead -2
     7:56.31
    Overall Summary
    ===============
    
    xpcshell
    ~~~~~~~~
    Ran 1 checks (1 tests)
    Expected results: 1
    Unexpected results: 0
    
    Unexpected Results
    ------------------
    ERROR TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
    
  • Investigating the Timeout and looked into test_crash_with_memory_report.js and saw that this test calls await do_crash function

  • Looked at the function body of do_crash and it calls handleMinidump

  • We suspect that is what causing the time out. We can think about adding a debugger to the do_crash function and see what's happening there next time.

After we got my patch to build successfully, I wanted to check it actually solved the issue. If it did then, browser_start_up_mainthreadio.js should fail without my patch because without my patch it's doing main thread IO on start-up.

However, browser_startup_mainthreadio.js should succeed with my patch because it no longer creates the minidumps directly on startup, and instead loads it lazily.

browser_startup_mainthreadio.js wasn't working though. The test is always skipped. See Bug 1831406.

May 17, 2023

Summary of today's Session:

  • Continued to investigate the timeout for test_crash_with_memory_report.js
  • We wanted to use the jsdebugger to see what's happening. But we weren't able to use it.
mach xpcshell-test --jsdebugger toolkit/crashreporter/test/unit/test_crash_with_memory_report.js
  • Error:
ERROR Failed to initialize debugging: Error: resource://devtools appears to be inaccessible from the xpcshell environment.
This can usually be resolved by adding:
  firefox-appdir = browser
to the xpcshell.ini manifest.
It is possible for this to alter test behevior by triggering additional browser code to run, so check test behavior after making this change.
See also https://bugzil.la/1215378.
_setupDevToolsServer@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:407:11
_initDebugging@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:456:66
_execute_test@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:531:21
@-e:1:1
  console.log("!!! before process.run");
  try {
    process.run(true, args, args.length);
  } catch (ex) {
    // on Windows we exit with a -1 status when crashing.
  } finally {
    Services.env.set("CRASHES_EVENTS_DIR", "");
  }
  console.log("!!! after process.run");

May 18, 2023

pid:73242 console.log: "!!! before calling do_crash"
 0:02.31 pid:73242 console.log: "!!! Start do_crash"
 0:02.31 pid:73242 console.log: "!!! before do_get_tempdir"
 0:02.31 pid:73242 console.log: "!!! before process.run"
 0:02.40 pid:73242 console.log: "!!! before _tmpd declaration"
 0:02.40 pid:73242 console.log: "!!! after _tmpd declaration"
 0:02.40 pid:73242 console.log: "!!! before UpdateCrashEventsDir"
 0:02.40 pid:73242 console.log: "!!! after minidumpPath assignment to _tmpd"
 0:02.40 pid:73242 console.log: "!!! after shouldDelay"
 0:02.40 pid:73242 console.log: "!!! inside shouldDelay"
 0:02.40 pid:73242 console.log: "!!! before crash"
 0:02.40 pid:73242 console.log: "!!! crashType" 0

May 24, 2023

  • We're not printing out !!! after crash after CrashTestUtils.crash because the process has successfully crashed. This means the process has been killed and won't run any code after it has crashed.

  • The main problem we are trying to solve is to get more information on why test_crash_with_memory_report.js is hanging. It's challenging to get more information.

  • Troubleshooting methods

  • First we tried using the jsdebugger again. This time it worked. Added firefox-appdir = browser line right after the support files https://searchfox.org/mozilla-central/rev/2ca95198a2a0806de358a0484d96d4354e3cbaab/toolkit/crashreporter/test/unit/xpcshell.ini#8. But this did not get us far becuase process.run is in cpp.

  • Second, we tried using lldb to attched to the process id before crashing. It did not hang but failed at the assertion. Here's the stack:

  • pid:87876 console.log: "!!! before calling do_crash"
     0:02.37 pid:87876 console.log: "!!! Start do_crash"
     0:02.37 pid:87876 console.log: "!!! before do_get_tempdir"
     0:02.37 pid:87876 console.log: "!!! before process.run"
     0:02.46 pid:87876 console.log: "!!! before _tmpd declaration"
     0:02.46 pid:87876 console.log: "!!! after _tmpd declaration"
     0:02.46 pid:87876 console.log: "!!! before UpdateCrashEventsDir"
     0:02.46 pid:87876 console.log: "!!! after minidumpPath assignment to _tmpd"
     0:02.47 pid:87876 console.log: "!!! after shouldDelay"
     0:02.47 pid:87876 console.log: "!!! inside shouldDelay"
     0:02.47 pid:87876 console.log: "!!! before crash"
     0:02.47 pid:87876 console.log: "!!! crashType" 0
     0:02.47 pid:87876 console.log: "!!! process id" 87877
     2:36.62 pid:87876 console.log: "!!! after process.run"
     2:36.62 pid:87876 console.log: "!!! before handleMinidump"
     2:36.62 pid:87876 console.log: "!!! inside handleMinidump"
     2:36.62 pid:87876 console.log: "!!! minidump" ({})
     2:36.62 PASS run_test - [run_test : 13] true == true
     2:36.62 INFO (xpcshell/head.js) | test run_next_test 0 finished (2)
     2:36.62 FAIL run_test - [run_test : 45] false == true
    /Users/mandycheang/central-full-build/obj-opt/_tests/xpcshell/toolkit/crashreporter/test/unit/test_crash_with_memory_report.js:run_test/<:45
    /Users/mandycheang/central-full-build/obj-opt/_tests/xpcshell/toolkit/crashreporter/test/unit/head_crashreporter.js:handleMinidump:201
    /Users/mandycheang/central-full-build/testing/xpcshell/head.js:_do_main:241
    /Users/mandycheang/central-full-build/testing/xpcshell/head.js:_execute_test:588
    -e:null:1
     2:36.62 INFO exiting test
     2:36.62 ERROR Unexpected exception NS_ERROR_ABORT:
    _abort_failed_test@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:866:20
    do_report_result@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:967:5
    Assert<@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:71:21
    Assert.prototype.report@resource://testing-common/Assert.sys.mjs:240:10
    Assert.prototype.ok@resource://testing-common/Assert.sys.mjs:266:10
    run_test/<@/Users/mandycheang/central-full-build/obj-opt/_tests/xpcshell/toolkit/crashreporter/test/unit/test_crash_with_memory_report.js:45:14
    handleMinidump@/Users/mandycheang/central-full-build/obj-opt/_tests/xpcshell/toolkit/crashreporter/test/unit/head_crashreporter.js:201:11
    _do_main@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:241:6
    _execute_test@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:588:5
    @-e:1:1
    
     2:36.62 INFO exiting test
     2:36.63 TEST_END: Test FAIL. Subtests passed 1/2. Unexpected 1
     2:36.63 INFO toolkit/crashreporter/test/unit/test_crash_with_memory_report.js failed or timed out, will retry.
     2:36.63 INFO INFO | Result summary:
     2:36.63 INFO INFO | Passed: 0
     2:36.63 INFO INFO | Failed: 0
     2:36.63 INFO INFO | Todo: 0
     2:36.63 INFO INFO | Retried: 0
     2:36.63 SUITE_END
     2:36.63 INFO Node moz-http2 server shutting down ...
     2:36.63 INFO http3Server server shutting down ...
     2:36.74
    Overall Summary
    ===============
    
    xpcshell
    ~~~~~~~~
    Ran 3 checks (2 subtests, 1 tests)
    Expected results: 2
    Unexpected results: 1
      subtest: 1 (1 fail)
    
    Unexpected Results
    ------------------
    toolkit/crashreporter/test/unit/test_crash_with_memory_report.js
      FAIL run_test - [run_test : 45] false == true
    /Users/mandycheang/central-full-build/obj-opt/_tests/xpcshell/toolkit/crashreporter/test/unit/test_crash_with_memory_report.js:run_test/<:45
    /Users/mandycheang/central-full-build/obj-opt/_tests/xpcshell/toolkit/crashreporter/test/unit/head_crashreporter.js:handleMinidump:201
    /Users/mandycheang/central-full-build/testing/xpcshell/head.js:_do_main:241
    /Users/mandycheang/central-full-build/testing/xpcshell/head.js:_execute_test:588
    -e:null:1
    ERROR Unexpected exception NS_ERROR_ABORT:
    _abort_failed_test@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:866:20
    do_report_result@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:967:5
    Assert<@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:71:21
    Assert.prototype.report@resource://testing-common/Assert.sys.mjs:240:10
    Assert.prototype.ok@resource://testing-common/Assert.sys.mjs:266:10
    run_test/<@/Users/mandycheang/central-full-build/obj-opt/_tests/xpcshell/toolkit/crashreporter/test/unit/test_crash_with_memory_report.js:45:14
    handleMinidump@/Users/mandycheang/central-full-build/obj-opt/_tests/xpcshell/toolkit/crashreporter/test/unit/head_crashreporter.js:201:11
    _do_main@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:241:6
    _execute_test@/Users/mandycheang/central-full-build/testing/xpcshell/head.js:588:5
    @-e:1:1
    
  • We want to find out more information on why test_crash_with_memory_report.js hangs. We might need to see the stack trace before the process crashes. Since I'm on Mac OS, we were thinking about finding the code when the crashed process is sent SIGSEGV to terminate.

  • Resources on Signals

  • After reading these two links, we think we will need SIGSEGV code because our crash is based on invalid memory access

  • Maybe what we are looking for is in here: https://searchfox.org/mozilla-central/search?q=SIGSEGV&path=breakpad&case=false&regexp=false

  • I'm going to ask Gabriele for some hints here because I'm not too familiar with the inner workings of Crashreporter/breakpad-client

I had a look at the breakpad-client code and, as best as I can tell, the entry point to the Mac equivalent of a "signal handler" for an invalid memory access is here. It would be interesting to add some logging there to see if the crashing process gets there.

(Later on in that function there is a call to WriteMinidumpWithException(), that could be where things are going wrong.)

Based on our discussion today, a potential idea to try could be to move the call to NS_GetSpecialDirectory() to a function that gets called on startup (for example, from the same place in XRE_mainStartup() where MakeOrSetMinidumpPath() was called before the patch), and have it store the result for later consumption by the minidump callback.

I'm not sure yet about the details, e.g. whether we can store the result as an nsCOMPtr<nsIFile> or if we should just store a path instead.

Attachment #9074858 - Attachment description: Bug 1541603 - remove creation of minidumps folder on startup, create it lazily in crash reporter. r?mconley,gsvelto,botond → Bug 1541603 - remove creation of minidumps folder on startup, create it lazily in crash reporter. r?gsvelto,botond
Attachment #9074858 - Attachment description: Bug 1541603 - remove creation of minidumps folder on startup, create it lazily in crash reporter. r?gsvelto,botond → WIP: Bug 1541603 - remove creation of minidumps folder on startup, create it lazily in crash reporter. r?gsvelto,botond

I've pushed my latest update on Phabricator. The patch works while manually testing it, but there's many many test failures.

The latest test I fixed is: browser_browser_toolbox_debugger.js

After fixing that test I pushed to try again and it's not looking good: TRY ~1900 failures.
The failures might be due to a small set of common causes, but at this point it's hard to predict how much time would be needed to fix all of them.

Since it’s been several months of small effort in fixing the tests, there's still quite a lot of failures. The performance gain on this ticket is very small so for the effort vs benefit, it’s not worth continuing at this point I think.

The solution on the patch is still a valid approach but tests need to be fixed moving forward if anyone has time to continue the work on it. As for myself, I am moving on to higher priority items on search team.

Priority: P1 → P3
Attachment #9074858 - Attachment is obsolete: true
Attachment #9074858 - Attachment is obsolete: false
Attachment #9074858 - Attachment description: WIP: Bug 1541603 - remove creation of minidumps folder on startup, create it lazily in crash reporter. r?gsvelto,botond → Bug 1541603 - remove creation of minidumps folder on startup, create it lazily in crash reporter. r?gsvelto,botond
Pushed by fqueze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6c295712b30c
remove creation of minidumps folder on startup, create it lazily in crash reporter. r=gsvelto,geckoview-reviewers
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 126 Branch

Backed out because we observed crash reports which failed to write minidumps after this landed.

Example crash report: https://crash-stats.mozilla.org/report/index/507ee3d8-a4dd-4a36-9901-22fe80240324

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0  xul.dll  CrashReporter::PairedDumpCallback  toolkit/crashreporter/nsExceptionHandler.cpp:3919
1  xul.dll  google_breakpad::ExceptionHandler::WriteMinidumpForChild  toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc:889
2  xul.dll  CrashReporter::CreateMinidumpsAndPair  toolkit/crashreporter/nsExceptionHandler.cpp:3993
3  xul.dll  mozilla::ipc::CrashReporterHost::GenerateMinidumpAndPair<mozilla::dom::ContentParent>  ipc/glue/CrashReporterHost.h:78
4  xul.dll  mozilla::dom::ContentParent::GeneratePairedMinidump  dom/ipc/ContentParent.cpp:4514
5  xul.dll  mozilla::dom::ContentParent::KillHard  dom/ipc/ContentParent.cpp:4562
6  xul.dll  nsTimerImpl::Fire::<lambda_14>::operator const  xpcom/threads/nsTimerImpl.cpp:681
6  xul.dll  mozilla::detail::VariantImplementation<unsigned char, 3, nsTimerImpl::FuncCallback, nsTimerImpl::ClosureCallback>::matchN  mfbt/Variant.h:309
6  xul.dll  mozilla::detail::VariantImplementation<unsigned char, 2, nsCOMPtr<nsIObserver>, nsTimerImpl::FuncCallback, nsTimerImpl::ClosureCallback>::matchN  mfbt/Variant.h:318
6  xul.dll  mozilla::detail::VariantImplementation<unsigned char, 1, nsCOMPtr<nsITimerCallback>, nsCOMPtr<nsIObserver>, nsTimerImpl::FuncCallback, nsTimerImpl::ClosureCallback>::matchN  mfbt/Variant.h:318
Status: RESOLVED → REOPENED
Crash Signature: [@ CrashReporter::PairedDumpCallback]
Flags: needinfo?(mcheang)
Resolution: FIXED → ---
Target Milestone: 126 Branch → ---

I added the obvious NS_ENSURE_SUCCESS with https://phabricator.services.mozilla.com/D36370?vs=840457&id=840539#toc to avoid the crash mentioned in comment 29, but I'm not sure if it's expected that we actually hit that error case.

Gabriele, do you know if this is expected or if relanding the path with the additional error handling might just hide a real problem and make us lose some crash reports?

Flags: needinfo?(gsvelto)

The NS_ENSURE_SUCCESS call should be added, but this code normally works so there's also something else that needs to be investigated. The last error value in the crashes is ERROR_INVALID_PARAMETER indicating that maybe we passed something wrong to CreateFileFromPath() that ultimately caused a system-level function to fail. One way to test this is to set a minuscule value to the process shutdown timeout (see here) so that a content process shutting down is always killed (and a minidump taken).

Flags: needinfo?(gsvelto)
Flags: needinfo?(mcheang)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:mcheang, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(mcheang)
Keywords: topcrash

The crash was caused by the patch that was backed out.

Flags: needinfo?(mcheang)
Assignee: mcheang → nobody

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: