Closed Bug 1024669 Opened 10 years ago Closed 7 years ago

Send the name of threads

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: Yoric, Assigned: cyu)

References

(Depends on 1 open bug)

Details

(Whiteboard: [ps-radar])

Attachments

(3 files, 4 obsolete files)

At the moment, we send thread stacks with crash reports. Would it also be possible to also send the names of the threads?
If you can gather it in a way that's memory-safe at crash time, then yes. Note that on Windows thread names aren't stored anywhere, they're just gathered by the debugger at thread creation time.
This WIP patch adds a new field in the .extra file when crash happens:

ThreadIdNameMapping=6524:"Timer",4124:"Link Monitor",7664:"Socket Thread",7072:"JS Watchdog",13488:"Hang Monitor",3436:"BgHangManager",(omitted for brevity)
This patch makes minidump_stackwalk output the native thread id like:

Thread 55(thread_id=13340)
 0  ntdll.dll + 0xa6154
    rax = 0x00000000000000e5   rdx = 0x00000286b76005d0
... (omitted)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #2)
> Created attachment 8827086 [details] [diff] [review]
> WIP for sending thread id -> thread name mapping
How the patch works:
* ntdll!NtCreateThreadEx is interposed by CrashReporter::InterposedNtCreateThreadEx() to create an instance of struct ThreadInfo and store the pointer to a TLS variable. The instance is added to a linked list for traversal when crash happens.
* Wrap PR_SetCurrentThreadName() with NS_SetCurrentThreadName(), which calls CrashReporter::SetCurrentThreadName() to set thread id and name to the ThreadInfo instance for the current thread.
* When crash happens, nsExceptionHandler calls CrashReporter::GetThreadInfoAnnotation() to traverse the ThreadInfo instances and build a serialized thread id -> thread name mapping and save it to the .extra file.
Todos:
* Support all process types/platforms
* Patch to socorro to show thread names.
Ted, Gabriele, what do you think about the proposed approach for sending thread names in the crash report? Thanks.
Flags: needinfo?(ted)
Flags: needinfo?(gsvelto)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #6)
> Ted, Gabriele, what do you think about the proposed approach for sending
> thread names in the crash report? Thanks.

The general approach seems sound to me but I feel a bit uneasy having the code walk a linked-list after a crash. I'd rather lump every mapping into a string and just write it out upon a crash. Since you're already handling a global structure with synchronization and all I think it might be feasible within SetCurrentThreadName(). Cleanup might be a little more complicated since you'd need to strip the mapping info from the string when a thread dies but overall I think it would be safer.
Flags: needinfo?(gsvelto)
Assignee: nobody → cyu
Attachment #8827086 - Attachment is obsolete: true
Attachment #8833896 - Flags: review?(gsvelto)
Comment on attachment 8833896 [details] [diff] [review]
Annotate crash reports with thread names

Review of attachment 8833896 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good overall but I'm puzzled about the way you remove an annotation when a thread exit, see my comment below about it. I'm also not 100% sure I can review this on my own so I'll ping Ted later and also give this patch a spin on my machine before finishing the review.

::: toolkit/crashreporter/ThreadAnnotation.cpp
@@ +1,4 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +

You should add #include "ThreadAnnotation.h"

@@ +11,5 @@
> +
> +#ifdef MOZ_WIDGET_ANDROID
> +#include <pthread.h>
> +#endif
> +#include <stddef.h>

nit: The system include should go before the mozilla-specific ones

@@ +85,5 @@
> +          end <= mData.Length())) {
> +      return;
> +    }
> +
> +    mData.Cut(begin, end - begin);

I may be wrong on this one, but doesn't Cut() actually remove the specified part of the buffer? If that's the case then after doing this the following ThreadAnnotationSpan structures will be invalid since their offsets were based on the size of the uncut string.

@@ +108,5 @@
> +//  template<typename U>
> +//  MOZ_IMPLICIT DeleteWithLock(const DeleteWithLock<U>& aOther,
> +//                             typename mozilla::EnableIf<mozilla::IsConvertible<U*, T*>::value,
> +//                                               int>::Type aDummy = 0)
> +//  {}

nit: Remove the commented-out code

@@ +173,5 @@
> +  }
> +
> +#ifdef MOZ_WIDGET_ANDROID
> +  int rv = pthread_key_create(&sTLSThreadInfoKey, &ThreadExitCallback);
> +  if (rv) {

nit: An NS_WARN_IF() check would be appropriate here; since it's very unlikely that this fails it's good to report it

::: toolkit/crashreporter/nsExceptionHandler.h
@@ +281,5 @@
>  
>  #endif
> +
> +// Annotates the crash report with the name of the calling thread.
> +void SetCurrentThreadName(const char* aName);

nit: Leave an empty line here

::: toolkit/crashreporter/test/unit/test_crash_thread_annotation.js
@@ +1,3 @@
> +function run_test() {
> +  if (!("@mozilla.org/toolkit/crash-reporter;1" in Components.classes)) {
> +    dump("INFO | test_crash_oom.js | Can't test crashreporter in a non-libxul build.\n");

nit: test_crash_thread_annotation.js
(In reply to Gabriele Svelto [:gsvelto] from comment #10)
> Comment on attachment 8833896 [details] [diff] [review]
> Annotate crash reports with thread names
> 
> Review of attachment 8833896 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> 
> @@ +85,5 @@
> > +          end <= mData.Length())) {
> > +      return;
> > +    }
> > +
> > +    mData.Cut(begin, end - begin);
> 
> I may be wrong on this one, but doesn't Cut() actually remove the specified
> part of the buffer? If that's the case then after doing this the following
> ThreadAnnotationSpan structures will be invalid since their offsets were
> based on the size of the uncut string.
> 
You're right. That would cause OOB access to the string data. It'll need to readjust ThreadAnnotationSpan instances after we cut the string. I'll update with that.
Attachment #8833896 - Attachment is obsolete: true
Attachment #8833896 - Flags: review?(gsvelto)
Attachment #8834347 - Flags: review?(ted)
Attachment #8834347 - Flags: review?(gsvelto)
Comment on attachment 8834347 [details] [diff] [review]
Annotate crash reports with thread names (v2)

Review of attachment 8834347 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM with just a couple of nits.

BTW I'd like to recommend splitting this kind of patch in two for ease of review with one patch introducing the new API and the other applying it everywhere. I've read it recently in the mozreview guide and it's quite helpful: https://mozilla-version-control-tools.readthedocs.io/en/latest/mozreview/commits.html#prefer-more-smaller-commits-over-large-monolithic-commits

::: toolkit/crashreporter/ThreadAnnotation.cpp
@@ +116,5 @@
> +    mData.Cut(begin, cutLength);
> +
> +    // Adjust the ThreadAnnotationSpan affected by data shifting.
> +    size_t index = mDataSpans.BinaryIndexOf(&aThreadInfo,
> +                                        ThreadAnnotationSpan::Comparator());

nit: Indentation

@@ +141,5 @@
> +  // The flat representation of thread annotations.
> +  nsCString mData;
> +
> +  // This array tracks the created ThreadAnnotationSpan instances so that we
> +  // can make adjustments accordingly when we

nit: The phrase seems cut in half
Attachment #8834347 - Flags: review?(gsvelto) → review+
Comment on attachment 8834348 [details] [diff] [review]
Test cases for thread name annotations in the crash report

Review of attachment 8834348 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, thanks for the excellent test coverage!

::: toolkit/crashreporter/test/gtest/TestCrashThreadAnnotation.cpp
@@ +84,5 @@
> +    aMonitor.NotifyAll();
> +  });
> +  nsCOMPtr<nsIThread> thread;
> +  nsresult rv = NS_NewThread(getter_AddRefs(thread), setNameRunnable);
> +  mozilla::Unused << rv;

nit: You can just write

Unused << NS_NewThread(getter_AddRefs(thread), setNameRunnable);
Attachment #8834348 - Flags: review?(gsvelto) → review+
Flags: needinfo?(ted)
Sorry for nagging by resetting ni flag. Ted, will you have time for reviewing attachment 8834347 [details] [diff] [review]? Or is there anyone you can delegate the request to? Thanks.
Flags: needinfo?(ted)
Whiteboard: [ps-radar]
Comment on attachment 8834347 [details] [diff] [review]
Annotate crash reports with thread names (v2)

Review of attachment 8834347 [details] [diff] [review]:
-----------------------------------------------------------------

The concept seems sound, but it feels like things are overly complicated by trying to serialize the data to a string. I think this data would be better off inside the minidump, which shouldn't be too hard to do. In that case you could just store (thread id, name) pairs and have the minidump writing code write them out to a separate stream.

This should be very easy to do on Linux/Mac, and slightly more of a pain on Windows, where the API of MinidumpWriteDump is a little more limited (you have to provide the full data of extra streams up front).

For Linux you could just add an extra function here:
https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/breakpad-client/linux/minidump_writer/minidump_writer.cc#170

For Mac you'd add it here:
https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/breakpad-client/mac/handler/minidump_generator.cc#224

For Windows you'd add an extra stream here:
https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc#926

...and then fill in the data.

To get the data from those places it'd be fine to add a `#include "ThreadAnnotation.h"` to the client code nowadays.

You'll have to pick an arbitrary value for the stream type, Breakpad has a list of reserved names here:
https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/google-breakpad/src/google_breakpad/common/minidump_format.h#337

...but we could just invent one with a similar convention and use it in our code. We'll have to add some code to the Socorro stackwalker to pull the thread names out and display them.

::: toolkit/xre/nsEmbedFunctions.cpp
@@ +296,5 @@
> +{
> +#if defined(XP_WIN) || defined(XP_MACOSX) || defined(OS_LINUX)
> +  return CrashReporter::UnsetRemoteExceptionHandler();
> +#else
> +#  error "OOP crash reporter unsupported on this platform"

I don't think the ifdef adds much here, unsupported platforms will error elsewhere in crashreporter code.
Attachment #8834347 - Flags: review?(ted)
Flags: needinfo?(ted)
...I realize that I am giving a conflicting review to what gsvelto said. I apologize, I don't want to make busywork for you. I do think this data would fit better in the minidump itself, but I don't want to make you rewrite it all after having already done so in response to gsvelto's review.

With that in mind, you can land this as-is and we can always change things in the future.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #18)
> With that in mind, you can land this as-is and we can always change things
> in the future.

Yeah, if you don't mind the patch as it is we can make a follow up with the adjustments to Breakpad. Maybe in another language. After all we want to rewrite this stuff anyway, don't we ;-)
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a52d887bceaf68b568bf865ba2e0d065465cee6
Bug 1024669 - Part 1: Annotate crash reports with thread names. r=gsvelto

https://hg.mozilla.org/integration/mozilla-inbound/rev/cf7d1b64cc12699be55b914ac3d0cd29f82f7a49
Bug 1024669 - Part 2: Test cases for thread name annotation in the crash report. r=gsvelto
Got the warning as error on Windows if we declare friend class referring to a struct. Will fix that and repush.
Flags: needinfo?(cyu)
https://hg.mozilla.org/integration/mozilla-inbound/rev/7fdcabdbb67559377ccf624f648019944d2c8e6f
Bug 1024669 - Part 1: Annotate crash reports with thread names. r=gsvelto

https://hg.mozilla.org/integration/mozilla-inbound/rev/a2b8da35820b66916f0afa28be9289efa55f0908
Bug 1024669 - Part 2: Test cases for thread name annotation in the crash report. r=gsvelto
Backed out

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=a2b8da35820b66916f0afa28be9289efa55f0908&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable

Failure log talos: https://treeherder.mozilla.org/logviewer.html#?job_id=90172688&repo=mozilla-inbound
09:51:32     INFO -  TEST-START | tresize
09:51:32     INFO -  operating with platform_type : linux_
09:51:32     INFO -  Initialising browser for tresize test...
09:51:32     INFO -  TEST-INFO | started process 20607 (/builds/slave/test/build/application/firefox/firefox -profile /tmp/tmpVFfejN/profile http://localhost:52033/getInfo.html)
09:51:32     INFO -  PID 20607 | XPCOMGlueLoad error for file /builds/slave/test/build/application/firefox/libxul.so:
09:51:32     INFO -  PID 20607 | /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.7' not found (required by /builds/slave/test/build/application/firefox/libxul.so)
09:51:32     INFO -  PID 20607 | Couldn't load XPCOM.
09:51:32     INFO -  TEST-INFO | 20607: exit 255
09:51:32     INFO -  Could not find __metrics(.*)__metrics in browser output
09:51:32     INFO -  Raw results:XPCOMGlueLoad error for file /builds/slave/test/build/application/firefox/libxul.so:
09:51:32     INFO -  /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.7' not found (required by /builds/slave/test/build/application/firefox/libxul.so)
09:51:32     INFO -  Couldn't load XPCOM.
09:51:32     INFO -  TEST-UNEXPECTED-ERROR | tresize | browser failed to close after being initialized
09:51:32    ERROR -  Traceback (most recent call last):
09:51:32     INFO -    File "/builds/slave/test/build/tests/talos/talos/run_tests.py", line 202, in run_tests
09:51:32     INFO -      talos_results.add(mytest.runTest(browser_config, test))
09:51:32     INFO -    File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 69, in runTest
09:51:32     INFO -      with FFSetup(browser_config, test_config) as setup:
09:51:32     INFO -    File "/builds/slave/test/build/tests/talos/talos/ffsetup.py", line 162, in __enter__
09:51:32     INFO -      self._run_profile()
09:51:32     INFO -    File "/builds/slave/test/build/tests/talos/talos/ffsetup.py", line 138, in _run_profile
09:51:32     INFO -      raise TalosError("browser failed to close after being initialized")
09:51:32     INFO -  TalosError: browser failed to close after being initialized

Failure log mda: https://treeherder.mozilla.org/logviewer.html#?job_id=90173258&repo=mozilla-inbound
> dom/media/test/test_eme_stream_capture_blocked_case2.html | This test left crash dumps behind, but we weren't expecting it to

Failure log GTest: https://treeherder.mozilla.org/logviewer.html#?job_id=90178875&repo=mozilla-inbound
> TEST-UNEXPECTED-FAIL | TestCrashThreadAnnotation.TestGetFlatThreadAnnotation_ShutdownOneThread | Value of: !strstr(aAnnotation, "Thread1")

Please check the push for more failures and do a Try run with tests. Thank you
Flags: needinfo?(cyu)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #28)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=3eb0214912b117580cc073ac2d0c9d1e03472379

Shutdown the created named threads after each test case completes. Also don't run the test cases on Windows because the PR_JoinThread() called by nsThread::Shutdown() doesn't wait until the native thread is destroyed, which is a required by the test cases.
Flags: needinfo?(cyu)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #35)
> Created attachment 8859451 [details] [diff] [review]
> Annotate crash reports with thread names (v3)

Changes to v2:
* Use NSPR TLS functions instead of C++11 thread_local keyword and pthread TLS functions. I make this change for several reasons:
1. Talos bustage that libxul cannot be loaded because of unsupported ABI. I guess the machines running Talos has older libstdc++.so.6, which doesn't support the newer ABI for using the thread_local keyword. This indicates that we might break some linux users after upgrade.
2. Destructor functions registered using the thread_local keyword gets called *after* PR_JoinThread() (called internally in nsThread::Shutdown()). Using thread_local forces me to disable the gtest test cases for this bug on Windows.

* Unhook InitThreadAnnotation() from crash reporter in the child process. CrashReporter::UnsetRemoteExceptionHandler() is not called in the child process. In previous revision I added it back in the child process, but media plugin process has SIGSYS crashes during shutdown. So I unhook init/shutdown of thread annotation from the crash reporter and use RAII instead.

* A minor change: move the declaration of NS_SetCurrentThreadName() in nsThreadUtils.h.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #36)
> Created attachment 8859452 [details] [diff] [review]
> Test cases for thread name annotations in the crash report (v2)

Changes to v1:
* Shutdown the threads created in each test case.
Comment on attachment 8859451 [details] [diff] [review]
Annotate crash reports with thread names (v3)

Gabriele, I'd like you to look at this patch again because it has nontrivial changes. What and why of the changes are in comment #37.

And sorry that it's also a monolithic patch for you to interdiff more easily.
Attachment #8859451 - Flags: review?(gsvelto)
Comment on attachment 8859452 [details] [diff] [review]
Test cases for thread name annotations in the crash report (v2)

Carryover r+.
Attachment #8859452 - Flags: review+
Attachment #8834347 - Attachment is obsolete: true
Attachment #8834348 - Attachment is obsolete: true
Comment on attachment 8859451 [details] [diff] [review]
Annotate crash reports with thread names (v3)

I've given this a brief spin on my machine and looked at the interdiff and everything looks fine to me.
Attachment #8859451 - Flags: review?(gsvelto) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/fcd98542af390cfb873aee712e43388c779b07e6
Bug 1024669 - Part 1: Annotate crash reports with thread names. r=gsvelto

https://hg.mozilla.org/integration/mozilla-inbound/rev/1105f666b01948b7bac4bf3566402baf2359fdb4
Bug 1024669 - Part 2: Test cases for thread name annotation in the crash report. r=gsvelto
https://hg.mozilla.org/mozilla-central/rev/fcd98542af39
https://hg.mozilla.org/mozilla-central/rev/1105f666b019
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Blocks: 1358457
Blocks: 1365203
Depends on: 1376773
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: