Closed Bug 1393710 Opened 5 years ago Closed 5 years ago

Add thread name into NS_DebugBreak for better debugging.

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: JamesCheng, Assigned: JamesCheng)

Details

Attachments

(1 file)

I think it is handy to print warning with thread id.

When debugging or trace code, we might want to know which thread runs this code.
Attachment #8901087 - Flags: review?(nfroyd)
Comment on attachment 8901087 [details]
Bug 1393710 - Add thread name into NS_DebugBreak for better debugging.

https://reviewboard.mozilla.org/r/172586/#review179032

Sorry for taking a while to get to this.  I think this is OK, but I am curious on your thoughts below.

::: xpcom/base/nsDebugImpl.cpp:356
(Diff revision 1)
> -  buf.print("%d] %s", base::GetCurrentProcId(), nonPIDBuf.buffer);
> +  buf.print("pid=%d, tid=%llu] %s",
> +            base::GetCurrentProcId(),
> +            static_cast<unsigned long long>(PlatformThread::CurrentId()),
> +            nonPIDBuf.buffer);

The prefix is getting a little long now--pushing 30 characters--but maybe that's OK?

We name a lot of our threads; do you think it's more useful to print the numeric thread ID, or the name that we assigned to them?
Attachment #8901087 - Flags: review?(nfroyd) → review+
Thanks, 
I think it's useful printing the thread name. Naively, if we print the thread name or id, we have chance to know if it's a race condition or we can understand the threading model more easier.

I have some questions about your comment~

I prefer to change to print the thread name instead like PR_LOG using[1], but I cannot find the Windows' version of PR_GetThreadName from SearchFox or DXR... Do you know where it is?

What does "numeric thread ID" mean? Could you guide me where it is?

I have no "Main Thread instance" here in nsDebugImpl.cpp, so I would like to try printing the PR_GetThreadName directly on main thread first or is there any API that I can easily get the PRThread instance of mainthread?

Thank you.
[1]
http://searchfox.org/mozilla-central/rev/cd82cacec2cf734768827ff85ba2dba90a534c5e/xpcom/base/Logging.cpp#419
Flags: needinfo?(nfroyd)
By testing, I got nullptr by getting the name from main thread.
Can I use [1] to check mainthread on content or chrome process(tested OK by including "MainThreadUtils.h", but not sure because it is guarded by MOZILLA_INTERNAL_API)?

[1]
http://searchfox.org/mozilla-central/rev/cd82cacec2cf734768827ff85ba2dba90a534c5e/xpcom/threads/nsThreadManager.cpp#35
(In reply to James Cheng[:JamesCheng] from comment #3)
> I prefer to change to print the thread name instead like PR_LOG using[1],
> but I cannot find the Windows' version of PR_GetThreadName from SearchFox or
> DXR... Do you know where it is?

PR_GetThreadName is platform independent, but if you're looking for the actual implementation, searchfox won't index it because it's not compiling for Windows, just Linux.  The Windows implementation is here:

http://searchfox.org/mozilla-central/source/nsprpub/pr/src/md/windows/ntthread.c#291

> What does "numeric thread ID" mean? Could you guide me where it is?

I mean the value returned by PlatformThread::CurrentId(), which is just a number.

> I have no "Main Thread instance" here in nsDebugImpl.cpp, so I would like to
> try printing the PR_GetThreadName directly on main thread first or is there
> any API that I can easily get the PRThread instance of mainthread?

mozilla::GetCurrentVirtualThread() will do what you want.
Flags: needinfo?(nfroyd)
Summary: Add thread id into NS_DebugBreak for better debugging. → Add thread name into NS_DebugBreak for better debugging.
Comment on attachment 8901087 [details]
Bug 1393710 - Add thread name into NS_DebugBreak for better debugging.

Changing thread id to thread name is more useful for debugging, logging and understanding the threading model.

I'd like to request review it again.

Thanks.
Attachment #8901087 - Flags: review+ → review?(nfroyd)
Comment on attachment 8901087 [details]
Bug 1393710 - Add thread name into NS_DebugBreak for better debugging.

https://reviewboard.mozilla.org/r/172586/#review179606

Thank you!
Attachment #8901087 - Flags: review?(nfroyd) → review+
Pushed by jacheng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/855146d44333
Add thread name into NS_DebugBreak for better debugging. r=froydnj
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/4fc9b0383fb2
Backed out changeset 855146d44333 for mass failing tests at dist/include/mozilla/ThreadLocal.h:161 on Android debug. r=backout on a CLOSED TREE
Backed out for mass failing tests at dist/include/mozilla/ThreadLocal.h:161 on Android debug:

https://hg.mozilla.org/integration/autoland/rev/4fc9b0383fb2c992f1b78b089f91f79113c5be71

Push with failures (ignore the build bustage on Linux and Windows): https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=855146d4433352ac00bb0ce56d5f259362ea72aa&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=127285954&repo=autoland
[task 2017-08-31T06:09:15.516592Z] 06:09:15     INFO -  08-30 23:08:58.508 F/MOZ_Assert(  795): Assertion failure: initialized(), at /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/ThreadLocal.h:161
Flags: needinfo?(jacheng)
On Fennec,
the crash is caused by "I called NS_IsMainThread too early even earlier than all function call by[1]"
There must be a debug message that has been invoked in global variable initialization, in other word it is on main thread.

The possible fix would be,

1.
bool
NS_IsMainThread()
{
  if (!sTLSIsMainThread.initialized()) {
    // Call this method too early. Must be main thread.
    return true;
  }
  return sTLSIsMainThread.get();
}

2. 
Expose an API in MainThreadUtils.h
like:
bool 
NS_IsTLSIsMainThreadInitialized()
{
  return sTLSIsMainThread.initialized();
}
then do the check every time in NS_DebugBreak before using NS_IsMainThread.
if NS_IsTLSIsMainThreadInitialized return false, I will skip calling NS_IsMainThread and print the thread as Unnamed.

3. If there is no better approach to handle this case, I would rollback to the original thought(printing thread id instead).


Hi Nathan,
Do you have any thought about this?
Or you can indicate a better way to solve this.

Thank you.

[1]
http://searchfox.org/mozilla-central/search?q=NS_SetMainThread()&case=false&regexp=false&path=
Flags: needinfo?(jacheng) → needinfo?(nfroyd)
Thanks for the information.

Do you know if I use "mach try" to push my code onto treeherder, does it a clobber build?

The question is also for "autoland build"

If not a clobber build, how could I make it clobber? Push a modified CLOBBER file?

If clobber works, I need to test since I don't know why my change needs a clobber.

Thanks.
Flags: needinfo?(aryx.bugmail)
As far as I know:
Try always does a clobber.
Autoland and inbound don't.
You have to touch/modify the CLOBBER file to force it. But maybe froydnj can tell you more (and why it was busted).
Flags: needinfo?(aryx.bugmail)
The earliest call to NS_DebugBreak is by this

http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/xpcom/base/nsTraceRefcnt.cpp#171

Log
11419 is GeckoMainthread so maybe we can use approach 1 by comment 12.

09-01 12:11:46.430 11388 11419 I Gecko   : [11388, Unnamed thread da30c080] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/james/Projects/gecko-fennec/xpcom/base/nsTraceRefcnt.cpp, line 171
09-01 12:11:46.431 11388 11419 E GeckoLibLoad: Loaded libs in 65.868652ms total, 10ms(73ms) user, 3ms(6ms) system, 0(1) faults
09-01 12:11:46.433 11388 11419 D GeckoProfile: Found profile dir.
09-01 12:11:46.433 11388 11419 W GeckoThread: zerdatime 357932361 - runGecko
09-01 12:11:46.433 11388 11419 I Gecko   : [11388, Unnamed thread da30c080] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/james/Projects/gecko-fennec/xpcom/base/nsTraceRefcnt.cpp, line 171
09-01 12:11:46.433 11388 11419 I Gecko   : !!!!!!!!!!!!!!!!!!!!! NS_SetMainThread
09-01 12:11:46.433 11388 11419 I Gecko   : !!!!!!!!!!!!!!!!!!!!! NS_IsMainThread
(In reply to James Cheng[:JamesCheng] from comment #12)
> On Fennec,
> the crash is caused by "I called NS_IsMainThread too early even earlier than
> all function call by[1]"
> There must be a debug message that has been invoked in global variable
> initialization, in other word it is on main thread.
> 
> The possible fix would be,
> 
> 1.
> bool
> NS_IsMainThread()
> {
>   if (!sTLSIsMainThread.initialized()) {
>     // Call this method too early. Must be main thread.
>     return true;
>   }
>   return sTLSIsMainThread.get();
> }

I would worry about having a non-main-thread, non-gecko thread calling this through other means, though.  I don't think we can rely on this.

> 2. 
> Expose an API in MainThreadUtils.h
> like:
> bool 
> NS_IsTLSIsMainThreadInitialized()
> {
>   return sTLSIsMainThread.initialized();
> }
> then do the check every time in NS_DebugBreak before using NS_IsMainThread.
> if NS_IsTLSIsMainThreadInitialized return false, I will skip calling
> NS_IsMainThread and print the thread as Unnamed.

This seems reasonable.
Flags: needinfo?(nfroyd)
Comment on attachment 8901087 [details]
Bug 1393710 - Add thread name into NS_DebugBreak for better debugging.

I'd like to request r? again to let nfroyd know my change according to comment 17.

I'm also curious why the declaration of "NS_IsMainThread" is guarded by "MOZILLA_INTERNAL_API"[1] but the definition of it did not be guarded?

Is it intended to do that...it seems unreasonable.

[1]
http://searchfox.org/mozilla-central/rev/51eae084534f522a502e4b808484cde8591cb502/xpcom/threads/MainThreadUtils.h#30-31
Attachment #8901087 - Flags: review+ → review?(nfroyd)
Comment on attachment 8901087 [details]
Bug 1393710 - Add thread name into NS_DebugBreak for better debugging.

https://reviewboard.mozilla.org/r/172586/#review183984
Attachment #8901087 - Flags: review?(nfroyd) → review+
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a104181ac18b
Add thread name into NS_DebugBreak for better debugging. r=froydnj
When I pressed "Land commit", I got

A server error occurred: Autoland returned an error message during communications.

Then I refreshed web page and realized Nathan has pushed the patch.

It is weird that I still can press the "Land commit" button temporarily.

Thank you Nathan for helping me land the patch.
Hi Nathan,
I got the error before in

https://bugzilla.mozilla.org/show_bug.cgi?id=1392885#c7

The situation was the same.

Does that mean I have no privilege to land the patch so you need to help me to do this?

Is there a mysterious mechanism or autoland bug that I will trigger you to push(I pressed land then you have pressed already, what a coincidence) that means you didn't press "land commit" manually but automatically instead.


Or just a coincidence that you just do me a favor landing the code.

Thank you.
Flags: needinfo?(nfroyd)
https://hg.mozilla.org/mozilla-central/rev/a104181ac18b
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
(In reply to James Cheng[:JamesCheng] from comment #24)
> Or just a coincidence that you just do me a favor landing the code.

I had requested autolanding your patch earlier in the day, that's all.  Hope that was OK!
Flags: needinfo?(nfroyd)
You need to log in before you can comment on or make changes to this bug.