Closed Bug 1588877 Opened 5 years ago Closed 5 years ago

Intermittent GECKO(2361) | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down

Categories

(Toolkit :: Crash Reporting, defect, P5)

defect

Tracking

()

RESOLVED INVALID

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure)

Filed by: rmaries [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=271355925&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/bqHnXrq3SDCEhkeQYcrthQ/runs/0/artifacts/public/logs/live_backing.log


[task 2019-10-15T17:33:15.370Z] 17:33:15 INFO - TEST-OK | dom/cache/test/mochitest/test_cache_orphaned_cache.html | took 442ms
[task 2019-10-15T17:33:15.406Z] 17:33:15 INFO - TEST-START | dom/cache/test/mochitest/test_cache_overwrite.html
[task 2019-10-15T17:33:15.845Z] 17:33:15 INFO - GECKO(2361) | firefox: ../nptl/pthread_mutex_lock.c:144: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
[task 2019-10-15T17:33:15.873Z] 17:33:15 INFO - GECKO(2361) | [Parent 2361, Gecko_IOThread] WARNING: pipe error (75): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 358
[task 2019-10-15T17:33:15.937Z] 17:33:15 ERROR - GECKO(2361) | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2019-10-15T17:33:16.061Z] 17:33:16 INFO - GECKO(2361) | JavaScript error: resource://gre/modules/IndexedDB.jsm, line 370: InvalidStateError: A mutation operation was attempted on a database that did not allow mutations.
[task 2019-10-15T17:33:16.108Z] 17:33:16 INFO - GECKO(2361) | JavaScript error: resource://gre/modules/IndexedDB.jsm, line 370: InvalidStateError: A mutation operation was attempted on a database that did not allow mutations.
[task 2019-10-15T17:33:16.566Z] 17:33:16 INFO - GECKO(2361) | 1571160796562 Marionette TRACE Received observer notification xpcom-will-shutdown
[task 2019-10-15T17:33:16.567Z] 17:33:16 INFO - GECKO(2361) | 1571160796563 Marionette INFO Stopped listening on port 2828
[task 2019-10-15T17:33:16.568Z] 17:33:16 INFO - GECKO(2361) | 1571160796563 Marionette DEBUG Remote service is inactive
[task 2019-10-15T17:33:16.869Z] 17:33:16 INFO - GECKO(2361) | -----------------------------------------------------
[task 2019-10-15T17:33:16.870Z] 17:33:16 INFO - GECKO(2361) | Suppressions used:
[task 2019-10-15T17:33:16.870Z] 17:33:16 INFO - GECKO(2361) | count bytes template
[task 2019-10-15T17:33:16.871Z] 17:33:16 INFO - GECKO(2361) | 27 832 nsComponentManagerImpl
[task 2019-10-15T17:33:16.871Z] 17:33:16 INFO - GECKO(2361) | 611 17713 libfontconfig.so
[task 2019-10-15T17:33:16.873Z] 17:33:16 INFO - GECKO(2361) | 1 29 libglib-2.0.so
[task 2019-10-15T17:33:16.874Z] 17:33:16 INFO - GECKO(2361) | -----------------------------------------------------
[task 2019-10-15T17:33:18.363Z] 17:33:18 INFO - GECKO(2361) | -----------------------------------------------------
[task 2019-10-15T17:33:18.364Z] 17:33:18 INFO - GECKO(2361) | Suppressions used:
[task 2019-10-15T17:33:18.365Z] 17:33:18 INFO - GECKO(2361) | count bytes template
[task 2019-10-15T17:33:18.365Z] 17:33:18 INFO - GECKO(2361) | 24 728 nsComponentManagerImpl
[task 2019-10-15T17:33:18.366Z] 17:33:18 INFO - GECKO(2361) | 4 832 mozJSComponentLoader::LoadModule
[task 2019-10-15T17:33:18.367Z] 17:33:18 INFO - GECKO(2361) | 611 17509 libfontconfig.so
[task 2019-10-15T17:33:18.367Z] 17:33:18 INFO - GECKO(2361) | 1 29 libglib-2.0.so
[task 2019-10-15T17:33:18.368Z] 17:33:18 INFO - GECKO(2361) | -----------------------------------------------------
[task 2019-10-15T17:33:18.464Z] 17:33:18 INFO - TEST-INFO | Main app process: exit 0

Maybe it helps but in a ccov build I can see the following failure:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=285527261&repo=mozilla-central&lineNumber=32686

[task 2020-01-19T10:31:49.814Z] 10:31:49 INFO - JavaScript error: resource://gre/modules/CrashService.jsm, line 193: NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsICrashReporter.getMinidumpForID]

Which means that the crash report hasn't been written to disk. Gabriele, do you have an idea?

For now moving over to crash reporting component, which seems to be more appropriate here.

Component: DOM: Service Workers → Crash Reporting
Flags: needinfo?(gsvelto)
Product: Core → Toolkit
See Also: → 1536796

Reads familiar with bug 1536796.

Thanks Henrik; I'll have to investigate that ccov error because it seems like the minidump wasn't written at all. It might be that we have configurations where we call CrashService.addCrash() after a crash w/o having written a minidump. The other stack traces for this bug are all different so they're probably different issues.

Flags: needinfo?(gsvelto)

Ah ok. So maybe best file a new bug for the ccov issue and making it blocking this one.

Flags: needinfo?(gsvelto)

Sorry this fell through the cracks.

I had another look and my analysis in comment 15 is likely wrong. In fact I'm a bit puzzled as to why these errors show up under this bug. The actual issues are different depending on the tests:

  • This one and this other one have the same assertion failure:

    18:26:59     INFO - Assertion failure: [GFX1]: [D2D1.1] 3CreateBitmap failure Size(153544,153561) Code: 0x8007000e format 0, at z:/build/build/src/obj-firefox/dist/include/mozilla/gfx/Logging.h:749
    

    However the line is marked with INFO rather than ERROR. Scrolling the logs further there's the actual crash entry:

    18:27:12     INFO - REFTEST PROCESS-CRASH | gfx/tests/crashtests/1470437.html | application crashed [@ mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::WriteLog(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const&)]
    

    Which is again marked as INFO

  • These two also have a different problem which is in this line:

    13:23:25     INFO - GECKO(3210) | ThreadSanitizer: nested bug in the same thread, aborting.
    

    But again it's marked as INFO and only the other line is marked as ERROR

Do you know which script is responsible for parsing these logs and deciding where are the errors? I think that the solution here is to correctly mark the error lines that we're currently missing as ERROR and mark the A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down ones as INFO because they don't represent a crash. They're just the main process informing us that one of the content processes crashed and it will be shutting down too because MOZ_CRASHREPORTER_SHUTDOWN was set.

Once that's fixed we should be able to tell these issues apart and file separate bugs for each of them.

Flags: needinfo?(gsvelto) → needinfo?(hskupin)

(In reply to Gabriele Svelto [:gsvelto] from comment #23)

Do you know which script is responsible for parsing these logs and deciding where are the errors? I think that the solution here is to correctly mark the error lines that we're currently missing as ERROR and mark the A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down ones as INFO because they don't represent a crash. They're just the main process informing us that one of the content processes crashed and it will be shutting down too because MOZ_CRASHREPORTER_SHUTDOWN was set.

As far as I can remember Andrew made those changes a while ago. But I cannot find the bug right now.

Flags: needinfo?(hskupin) → needinfo?(ahal)

Oh I can see now. It was on bug 1539449. Maybe some code needs to be adjusted?

Yes, that's exactly what I was looking for! We just need to add the necessary entries for the lines that aren't being recognized correctly.

Andrew does comment 23 give you enough information to adjust the regexps that are used to detect the assertions and crashes?

I don't think the INFO vs ERROR thing is relevant. Looking at all the logs you linked, the proper error is showing up in the treeherder errorsummary. Fyi we use structured logs to propagate assertions/crashes up (for most suites). Those mozharness regexes are just fallbacks for suites that don't support structured logging.

The issue is that this bug has a very generic title, so sheriffs are starring anything that crashes against it. I believe we force-crash the browser if we hit an assertion too, so that's why those are also appearing here. Basically anything with the string "A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down".

So the solution is to file separate bugs for each underlying issue, and ask the sheriffs to star each issue against the appropriate bug (possibly even closing this to prevent further starring here).

Flags: needinfo?(ahal)

Thinking about it, we should not be filing intermittents here at all. Each crash is its own potentially unique issue. So if we hit a new intermittent crash, it'll likely get lost in all the noise here and we'll never know about it.

Sheriffs: For issues that have this line, please see if there is another error in the errorsummary that could be more topical. If this is the only thing that appears in the errorsummary, I guess you can re-open this. But I'm closing it for now to prevent this from showing up as a related bug.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID

Thanks for clearing that up Andrew, this info is very useful to me.

You need to log in before you can comment on or make changes to this bug.