Note: There are a few cases of duplicates in user autocompletion which are being worked on.

Send the name of threads

RESOLVED FIXED in Firefox 55

Status

()

Toolkit
Crash Reporting
RESOLVED FIXED
3 years ago
2 months ago

People

(Reporter: Yoric, Assigned: cyu)

Tracking

unspecified
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

(Whiteboard: [ps-radar])

Attachments

(3 attachments, 4 obsolete attachments)

At the moment, we send thread stacks with crash reports. Would it also be possible to also send the names of the threads?
If you can gather it in a way that's memory-safe at crash time, then yes. Note that on Windows thread names aren't stored anywhere, they're just gathered by the debugger at thread creation time.
(Assignee)

Comment 2

6 months ago
Created attachment 8827086 [details] [diff] [review]
WIP for sending thread id -> thread name mapping

This WIP patch adds a new field in the .extra file when crash happens:

ThreadIdNameMapping=6524:"Timer",4124:"Link Monitor",7664:"Socket Thread",7072:"JS Watchdog",13488:"Hang Monitor",3436:"BgHangManager",(omitted for brevity)
(Assignee)

Comment 3

6 months ago
Created attachment 8827087 [details] [diff] [review]
WIP for output thread id in minidump_stackwalk

This patch makes minidump_stackwalk output the native thread id like:

Thread 55(thread_id=13340)
 0  ntdll.dll + 0xa6154
    rax = 0x00000000000000e5   rdx = 0x00000286b76005d0
... (omitted)
(Assignee)

Comment 4

6 months ago
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #2)
> Created attachment 8827086 [details] [diff] [review]
> WIP for sending thread id -> thread name mapping
How the patch works:
* ntdll!NtCreateThreadEx is interposed by CrashReporter::InterposedNtCreateThreadEx() to create an instance of struct ThreadInfo and store the pointer to a TLS variable. The instance is added to a linked list for traversal when crash happens.
* Wrap PR_SetCurrentThreadName() with NS_SetCurrentThreadName(), which calls CrashReporter::SetCurrentThreadName() to set thread id and name to the ThreadInfo instance for the current thread.
* When crash happens, nsExceptionHandler calls CrashReporter::GetThreadInfoAnnotation() to traverse the ThreadInfo instances and build a serialized thread id -> thread name mapping and save it to the .extra file.
(Assignee)

Comment 5

6 months ago
Todos:
* Support all process types/platforms
* Patch to socorro to show thread names.
(Assignee)

Comment 6

6 months ago
Ted, Gabriele, what do you think about the proposed approach for sending thread names in the crash report? Thanks.
Flags: needinfo?(ted)
Flags: needinfo?(gsvelto)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #6)
> Ted, Gabriele, what do you think about the proposed approach for sending
> thread names in the crash report? Thanks.

The general approach seems sound to me but I feel a bit uneasy having the code walk a linked-list after a crash. I'd rather lump every mapping into a string and just write it out upon a crash. Since you're already handling a global structure with synchronization and all I think it might be feasible within SetCurrentThreadName(). Cleanup might be a little more complicated since you'd need to strip the mapping info from the string when a thread dies but overall I think it would be safer.
Flags: needinfo?(gsvelto)
(Assignee)

Comment 8

5 months ago
Created attachment 8833896 [details] [diff] [review]
Annotate crash reports with thread names
Assignee: nobody → cyu
Attachment #8827086 - Attachment is obsolete: true
Attachment #8833896 - Flags: review?(gsvelto)
(Assignee)

Comment 9

5 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e547a72f134f
Comment on attachment 8833896 [details] [diff] [review]
Annotate crash reports with thread names

Review of attachment 8833896 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good overall but I'm puzzled about the way you remove an annotation when a thread exit, see my comment below about it. I'm also not 100% sure I can review this on my own so I'll ping Ted later and also give this patch a spin on my machine before finishing the review.

::: toolkit/crashreporter/ThreadAnnotation.cpp
@@ +1,4 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +

You should add #include "ThreadAnnotation.h"

@@ +11,5 @@
> +
> +#ifdef MOZ_WIDGET_ANDROID
> +#include <pthread.h>
> +#endif
> +#include <stddef.h>

nit: The system include should go before the mozilla-specific ones

@@ +85,5 @@
> +          end <= mData.Length())) {
> +      return;
> +    }
> +
> +    mData.Cut(begin, end - begin);

I may be wrong on this one, but doesn't Cut() actually remove the specified part of the buffer? If that's the case then after doing this the following ThreadAnnotationSpan structures will be invalid since their offsets were based on the size of the uncut string.

@@ +108,5 @@
> +//  template<typename U>
> +//  MOZ_IMPLICIT DeleteWithLock(const DeleteWithLock<U>& aOther,
> +//                             typename mozilla::EnableIf<mozilla::IsConvertible<U*, T*>::value,
> +//                                               int>::Type aDummy = 0)
> +//  {}

nit: Remove the commented-out code

@@ +173,5 @@
> +  }
> +
> +#ifdef MOZ_WIDGET_ANDROID
> +  int rv = pthread_key_create(&sTLSThreadInfoKey, &ThreadExitCallback);
> +  if (rv) {

nit: An NS_WARN_IF() check would be appropriate here; since it's very unlikely that this fails it's good to report it

::: toolkit/crashreporter/nsExceptionHandler.h
@@ +281,5 @@
>  
>  #endif
> +
> +// Annotates the crash report with the name of the calling thread.
> +void SetCurrentThreadName(const char* aName);

nit: Leave an empty line here

::: toolkit/crashreporter/test/unit/test_crash_thread_annotation.js
@@ +1,3 @@
> +function run_test() {
> +  if (!("@mozilla.org/toolkit/crash-reporter;1" in Components.classes)) {
> +    dump("INFO | test_crash_oom.js | Can't test crashreporter in a non-libxul build.\n");

nit: test_crash_thread_annotation.js
(Assignee)

Comment 11

5 months ago
(In reply to Gabriele Svelto [:gsvelto] from comment #10)
> Comment on attachment 8833896 [details] [diff] [review]
> Annotate crash reports with thread names
> 
> Review of attachment 8833896 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> 
> @@ +85,5 @@
> > +          end <= mData.Length())) {
> > +      return;
> > +    }
> > +
> > +    mData.Cut(begin, end - begin);
> 
> I may be wrong on this one, but doesn't Cut() actually remove the specified
> part of the buffer? If that's the case then after doing this the following
> ThreadAnnotationSpan structures will be invalid since their offsets were
> based on the size of the uncut string.
> 
You're right. That would cause OOB access to the string data. It'll need to readjust ThreadAnnotationSpan instances after we cut the string. I'll update with that.
(Assignee)

Comment 12

5 months ago
Created attachment 8834347 [details] [diff] [review]
Annotate crash reports with thread names (v2)
Attachment #8833896 - Attachment is obsolete: true
Attachment #8833896 - Flags: review?(gsvelto)
Attachment #8834347 - Flags: review?(ted)
Attachment #8834347 - Flags: review?(gsvelto)
(Assignee)

Comment 13

5 months ago
Created attachment 8834348 [details] [diff] [review]
Test cases for thread name annotations in the crash report
Attachment #8834348 - Flags: review?(gsvelto)
Comment on attachment 8834347 [details] [diff] [review]
Annotate crash reports with thread names (v2)

Review of attachment 8834347 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM with just a couple of nits.

BTW I'd like to recommend splitting this kind of patch in two for ease of review with one patch introducing the new API and the other applying it everywhere. I've read it recently in the mozreview guide and it's quite helpful: https://mozilla-version-control-tools.readthedocs.io/en/latest/mozreview/commits.html#prefer-more-smaller-commits-over-large-monolithic-commits

::: toolkit/crashreporter/ThreadAnnotation.cpp
@@ +116,5 @@
> +    mData.Cut(begin, cutLength);
> +
> +    // Adjust the ThreadAnnotationSpan affected by data shifting.
> +    size_t index = mDataSpans.BinaryIndexOf(&aThreadInfo,
> +                                        ThreadAnnotationSpan::Comparator());

nit: Indentation

@@ +141,5 @@
> +  // The flat representation of thread annotations.
> +  nsCString mData;
> +
> +  // This array tracks the created ThreadAnnotationSpan instances so that we
> +  // can make adjustments accordingly when we

nit: The phrase seems cut in half
Attachment #8834347 - Flags: review?(gsvelto) → review+
Comment on attachment 8834348 [details] [diff] [review]
Test cases for thread name annotations in the crash report

Review of attachment 8834348 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, thanks for the excellent test coverage!

::: toolkit/crashreporter/test/gtest/TestCrashThreadAnnotation.cpp
@@ +84,5 @@
> +    aMonitor.NotifyAll();
> +  });
> +  nsCOMPtr<nsIThread> thread;
> +  nsresult rv = NS_NewThread(getter_AddRefs(thread), setNameRunnable);
> +  mozilla::Unused << rv;

nit: You can just write

Unused << NS_NewThread(getter_AddRefs(thread), setNameRunnable);
Attachment #8834348 - Flags: review?(gsvelto) → review+
(Assignee)

Updated

5 months ago
Flags: needinfo?(ted)
(Assignee)

Comment 16

5 months ago
Sorry for nagging by resetting ni flag. Ted, will you have time for reviewing attachment 8834347 [details] [diff] [review]? Or is there anyone you can delegate the request to? Thanks.
Flags: needinfo?(ted)
Whiteboard: [ps-radar]
Comment on attachment 8834347 [details] [diff] [review]
Annotate crash reports with thread names (v2)

Review of attachment 8834347 [details] [diff] [review]:
-----------------------------------------------------------------

The concept seems sound, but it feels like things are overly complicated by trying to serialize the data to a string. I think this data would be better off inside the minidump, which shouldn't be too hard to do. In that case you could just store (thread id, name) pairs and have the minidump writing code write them out to a separate stream.

This should be very easy to do on Linux/Mac, and slightly more of a pain on Windows, where the API of MinidumpWriteDump is a little more limited (you have to provide the full data of extra streams up front).

For Linux you could just add an extra function here:
https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/breakpad-client/linux/minidump_writer/minidump_writer.cc#170

For Mac you'd add it here:
https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/breakpad-client/mac/handler/minidump_generator.cc#224

For Windows you'd add an extra stream here:
https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc#926

...and then fill in the data.

To get the data from those places it'd be fine to add a `#include "ThreadAnnotation.h"` to the client code nowadays.

You'll have to pick an arbitrary value for the stream type, Breakpad has a list of reserved names here:
https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/google-breakpad/src/google_breakpad/common/minidump_format.h#337

...but we could just invent one with a similar convention and use it in our code. We'll have to add some code to the Socorro stackwalker to pull the thread names out and display them.

::: toolkit/xre/nsEmbedFunctions.cpp
@@ +296,5 @@
> +{
> +#if defined(XP_WIN) || defined(XP_MACOSX) || defined(OS_LINUX)
> +  return CrashReporter::UnsetRemoteExceptionHandler();
> +#else
> +#  error "OOP crash reporter unsupported on this platform"

I don't think the ifdef adds much here, unsupported platforms will error elsewhere in crashreporter code.
Attachment #8834347 - Flags: review?(ted)
Flags: needinfo?(ted)
...I realize that I am giving a conflicting review to what gsvelto said. I apologize, I don't want to make busywork for you. I do think this data would fit better in the minidump itself, but I don't want to make you rewrite it all after having already done so in response to gsvelto's review.

With that in mind, you can land this as-is and we can always change things in the future.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #18)
> With that in mind, you can land this as-is and we can always change things
> in the future.

Yeah, if you don't mind the patch as it is we can make a follow up with the adjustments to Breakpad. Maybe in another language. After all we want to rewrite this stuff anyway, don't we ;-)
(Assignee)

Comment 20

3 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a52d887bceaf68b568bf865ba2e0d065465cee6
Bug 1024669 - Part 1: Annotate crash reports with thread names. r=gsvelto

https://hg.mozilla.org/integration/mozilla-inbound/rev/cf7d1b64cc12699be55b914ac3d0cd29f82f7a49
Bug 1024669 - Part 2: Test cases for thread name annotation in the crash report. r=gsvelto
backed out for bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=90056142&repo=mozilla-inbound
Flags: needinfo?(cyu)

Comment 22

3 months ago
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fea7bf84ef7f
Backed out changeset cf7d1b64cc12 
https://hg.mozilla.org/integration/mozilla-inbound/rev/44e2cdb39bd2
Backed out changeset 7a52d887bcea for bustage
(Assignee)

Comment 23

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=844c964cb05fcffccacb353d374c55ea5e3b333c
(Assignee)

Comment 24

3 months ago
Got the warning as error on Windows if we declare friend class referring to a struct. Will fix that and repush.
Flags: needinfo?(cyu)
(Assignee)

Comment 25

3 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/7fdcabdbb67559377ccf624f648019944d2c8e6f
Bug 1024669 - Part 1: Annotate crash reports with thread names. r=gsvelto

https://hg.mozilla.org/integration/mozilla-inbound/rev/a2b8da35820b66916f0afa28be9289efa55f0908
Bug 1024669 - Part 2: Test cases for thread name annotation in the crash report. r=gsvelto
Backed out

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=a2b8da35820b66916f0afa28be9289efa55f0908&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable

Failure log talos: https://treeherder.mozilla.org/logviewer.html#?job_id=90172688&repo=mozilla-inbound
09:51:32     INFO -  TEST-START | tresize
09:51:32     INFO -  operating with platform_type : linux_
09:51:32     INFO -  Initialising browser for tresize test...
09:51:32     INFO -  TEST-INFO | started process 20607 (/builds/slave/test/build/application/firefox/firefox -profile /tmp/tmpVFfejN/profile http://localhost:52033/getInfo.html)
09:51:32     INFO -  PID 20607 | XPCOMGlueLoad error for file /builds/slave/test/build/application/firefox/libxul.so:
09:51:32     INFO -  PID 20607 | /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.7' not found (required by /builds/slave/test/build/application/firefox/libxul.so)
09:51:32     INFO -  PID 20607 | Couldn't load XPCOM.
09:51:32     INFO -  TEST-INFO | 20607: exit 255
09:51:32     INFO -  Could not find __metrics(.*)__metrics in browser output
09:51:32     INFO -  Raw results:XPCOMGlueLoad error for file /builds/slave/test/build/application/firefox/libxul.so:
09:51:32     INFO -  /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.7' not found (required by /builds/slave/test/build/application/firefox/libxul.so)
09:51:32     INFO -  Couldn't load XPCOM.
09:51:32     INFO -  TEST-UNEXPECTED-ERROR | tresize | browser failed to close after being initialized
09:51:32    ERROR -  Traceback (most recent call last):
09:51:32     INFO -    File "/builds/slave/test/build/tests/talos/talos/run_tests.py", line 202, in run_tests
09:51:32     INFO -      talos_results.add(mytest.runTest(browser_config, test))
09:51:32     INFO -    File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 69, in runTest
09:51:32     INFO -      with FFSetup(browser_config, test_config) as setup:
09:51:32     INFO -    File "/builds/slave/test/build/tests/talos/talos/ffsetup.py", line 162, in __enter__
09:51:32     INFO -      self._run_profile()
09:51:32     INFO -    File "/builds/slave/test/build/tests/talos/talos/ffsetup.py", line 138, in _run_profile
09:51:32     INFO -      raise TalosError("browser failed to close after being initialized")
09:51:32     INFO -  TalosError: browser failed to close after being initialized

Failure log mda: https://treeherder.mozilla.org/logviewer.html#?job_id=90173258&repo=mozilla-inbound
> dom/media/test/test_eme_stream_capture_blocked_case2.html | This test left crash dumps behind, but we weren't expecting it to

Failure log GTest: https://treeherder.mozilla.org/logviewer.html#?job_id=90178875&repo=mozilla-inbound
> TEST-UNEXPECTED-FAIL | TestCrashThreadAnnotation.TestGetFlatThreadAnnotation_ShutdownOneThread | Value of: !strstr(aAnnotation, "Thread1")

Please check the push for more failures and do a Try run with tests. Thank you
Flags: needinfo?(cyu)
Changesets of the backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fc485cdc98629aa93835c05ea2bf1bc41686149f
https://hg.mozilla.org/integration/mozilla-inbound/rev/75c3e9d4404580ddc40b5e6478da4f05386a9392
(Assignee)

Comment 28

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3eb0214912b117580cc073ac2d0c9d1e03472379
(Assignee)

Comment 29

3 months ago
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #28)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=3eb0214912b117580cc073ac2d0c9d1e03472379

Shutdown the created named threads after each test case completes. Also don't run the test cases on Windows because the PR_JoinThread() called by nsThread::Shutdown() doesn't wait until the native thread is destroyed, which is a required by the test cases.
Flags: needinfo?(cyu)
(Assignee)

Comment 30

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=19c93319933c2842f3f6e06e731de95ad98f9c0a
(Assignee)

Comment 31

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3eff33e6f0c771814ff81623c8b25c999eaeac9f
(Assignee)

Comment 32

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=48cf0a697ed36d9e7272b714096705f20e76442c
(Assignee)

Comment 33

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fee2e26c50a630412ae70bf648290fca9a17b728
(Assignee)

Comment 34

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5ed4b4e230b99658a7954a38fedc9d3aef3afeb0
(Assignee)

Comment 35

3 months ago
Created attachment 8859451 [details] [diff] [review]
Annotate crash reports with thread names (v3)
(Assignee)

Comment 36

3 months ago
Created attachment 8859452 [details] [diff] [review]
Test cases for thread name annotations in the crash report (v2)
(Assignee)

Comment 37

3 months ago
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #35)
> Created attachment 8859451 [details] [diff] [review]
> Annotate crash reports with thread names (v3)

Changes to v2:
* Use NSPR TLS functions instead of C++11 thread_local keyword and pthread TLS functions. I make this change for several reasons:
1. Talos bustage that libxul cannot be loaded because of unsupported ABI. I guess the machines running Talos has older libstdc++.so.6, which doesn't support the newer ABI for using the thread_local keyword. This indicates that we might break some linux users after upgrade.
2. Destructor functions registered using the thread_local keyword gets called *after* PR_JoinThread() (called internally in nsThread::Shutdown()). Using thread_local forces me to disable the gtest test cases for this bug on Windows.

* Unhook InitThreadAnnotation() from crash reporter in the child process. CrashReporter::UnsetRemoteExceptionHandler() is not called in the child process. In previous revision I added it back in the child process, but media plugin process has SIGSYS crashes during shutdown. So I unhook init/shutdown of thread annotation from the crash reporter and use RAII instead.

* A minor change: move the declaration of NS_SetCurrentThreadName() in nsThreadUtils.h.
(Assignee)

Comment 38

3 months ago
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #36)
> Created attachment 8859452 [details] [diff] [review]
> Test cases for thread name annotations in the crash report (v2)

Changes to v1:
* Shutdown the threads created in each test case.
(Assignee)

Comment 39

3 months ago
Comment on attachment 8859451 [details] [diff] [review]
Annotate crash reports with thread names (v3)

Gabriele, I'd like you to look at this patch again because it has nontrivial changes. What and why of the changes are in comment #37.

And sorry that it's also a monolithic patch for you to interdiff more easily.
Attachment #8859451 - Flags: review?(gsvelto)
(Assignee)

Comment 40

3 months ago
Comment on attachment 8859452 [details] [diff] [review]
Test cases for thread name annotations in the crash report (v2)

Carryover r+.
Attachment #8859452 - Flags: review+
(Assignee)

Updated

3 months ago
Attachment #8834347 - Attachment is obsolete: true
(Assignee)

Updated

3 months ago
Attachment #8834348 - Attachment is obsolete: true
Comment on attachment 8859451 [details] [diff] [review]
Annotate crash reports with thread names (v3)

I've given this a brief spin on my machine and looked at the interdiff and everything looks fine to me.
Attachment #8859451 - Flags: review?(gsvelto) → review+
(Assignee)

Comment 42

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3574438c2a8e2c903c9bd1829cff30fe36e39978
Try push looks good. I will land using this version.
(Assignee)

Comment 43

3 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/fcd98542af390cfb873aee712e43388c779b07e6
Bug 1024669 - Part 1: Annotate crash reports with thread names. r=gsvelto

https://hg.mozilla.org/integration/mozilla-inbound/rev/1105f666b01948b7bac4bf3566402baf2359fdb4
Bug 1024669 - Part 2: Test cases for thread name annotation in the crash report. r=gsvelto
https://hg.mozilla.org/mozilla-central/rev/fcd98542af39
https://hg.mozilla.org/mozilla-central/rev/1105f666b019
Status: NEW → RESOLVED
Last Resolved: 3 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Assignee)

Updated

3 months ago
Blocks: 1358457
(Assignee)

Updated

2 months ago
Blocks: 1365203
You need to log in before you can comment on or make changes to this bug.