Closed
Bug 1397052
Opened 7 years ago
Closed 7 years ago
stylo: Long hang in GCAtomTableLocked
Categories
(Core :: CSS Parsing and Computation, enhancement, P2)
Core
CSS Parsing and Computation
Tracking
RESOLVED
FIXED
mozilla57
Tracking | Status
---|---
firefox57 | fixed
People
(Reporter: xidorn, Assigned: bholley)
References
Details
(Keywords: hang, Whiteboard: [stylo])
Attachments
(3 files)
- 2.57 KB, patch
- 1.52 KB, patch (froydnj: review+)
- 59 bytes, text/x-review-board-request (froydnj: review+)
I've been experiencing some significant long hangs in the content process recently, and I managed to capture a profile once, which shows that we spend several seconds inside GCAtomTableLocked while some Rust code releases an atom.
Stylo has significantly increased the use of atoms (to avoid string conversion costs between Rust and C++), so the atom table is expected to hold far more atoms, as well as more short-lived atoms. This may contribute to a much longer atom table GC than before.
Unfortunately I lost the profile (after I pressed the Share button and copied the URL, I closed the tab, but the URL doesn't seem to work at all). I will try to catch another one when I see this kind of hang again.
Reporter
Comment 1•7 years ago
Also note that I'm on Beta 56 with Stylo experiment turned on. It doesn't seem to me this is something we've changed between Beta and Trunk, though.
Reporter
Updated•7 years ago
Priority: -- → P2
Reporter
Comment 2•7 years ago
Eventually I got another profile which shows a much shorter hang than the previous one: https://perfht.ml/2xMQg2d
GCAtomTableLocked isn't triggered from Rust code in this profile, but I still suspect that the wide usage of atoms may be one cause of this issue.
It is weird that we run into GCAtomTableLocked continuously. Do we really release that many atoms sometimes? If so, we probably want to trigger an atom table GC only at particular points (e.g. after a GC) rather than having it trigger randomly when we release an atom.
If we should not be releasing so many atoms, we would need to investigate why this happens.
Summary: stylo: Long hang in GCAtomTableLocked → Long hang in GCAtomTableLocked
Whiteboard: [stylo]
Reporter
Comment 3•7 years ago
And it seems this is really an issue, and it happens mostly with Servo code on the stack: https://squarewave.github.io/?durationSpec=2048_65536&payloadID=131360179ef146ab85e01052d16ebe89&search=GCAtomTable&thread=0
Reporter
Comment 4•7 years ago
Assignee
Updated•7 years ago
Summary: Long hang in GCAtomTableLocked → stylo: Long hang in GCAtomTableLocked
Reporter
Comment 5•7 years ago
It doesn't seem to me that we should be releasing that many atoms, and the profile in comment 2 shows that almost all the time is spent just doing GCAtomTable. Nothing there appears to be releasing tons of atoms, and not much actual atom destruction shows up either, so it feels more likely that we are messing up gUnusedAtomCount somehow.
Since it is an unsigned integer, there is a chance that an extra decrement wraps it around to what would be a negative value as an intermediate state in some cases, but that is unlikely to happen while we are on the main thread :/
Reporter
Comment 6•7 years ago
And yeah, I think bug 1388632 could be related. Given that we are less parallel in Nightly, I would expect this to happen less often there as well... We can see whether the hang reports stay low in the coming days.
Assignee
Comment 7•7 years ago
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #6)
> And yeah, I think bug 1388632 could be related. Given that we are less
> parallel in Nightly, I would probably expect this to happen less as well...
> We can see if the hang reports keep being low in coming days.
Hm, "doing less parallelism means we hit the race condition less" doesn't make me sleep much easier. I really hope we can figure out why gUnusedAtomCount is getting messed up...
Reporter
Comment 8•7 years ago
I successfully attached MSVC to the content process while it was hanging (it hung for long enough this time that I had time to attach).
So as expected, this is integer overflow: gUnusedAtomCount shows 4294967295, which is basically -1 if the variable were signed, when DynamicAtom::Release calls into GCAtomTable. (This is on Beta, so before the merge of StaticAtom and DynamicAtom.)
Reporter
Comment 9•7 years ago
And mysteriously, no other thread was operating on atoms at all at that moment. So gUnusedAtomCount still being -1 after we execute ++ on it in DynamicAtom::Release means it must have accumulated incorrect decrements three times before this happened.
Reporter
Comment 10•7 years ago
And yes, the implementation of gUnusedAtomCount is not thread-safe.
This is a test patch modified from the one I wrote for debugging bug 1386158. It addrefs/releases a single atom many times concurrently, and prints the value of gUnusedAtomCount after all threads finish.
Local runs give me results like 4294967290 and 4294967295, which clearly show the issue. And apparently, this triggers bug 1389826.
Reporter
Comment 11•7 years ago
So the following code is not balanced:
> if (++gRefCnt == 1) {
> --gCount;
> }
> if (--gRefCnt == 0) {
> ++gCount;
> }
with gRefCnt being a ThreadSafeAutoRefCnt and gCount an Atomic<uint32_t, ReleaseAcquire>.
It produces a non-zero result in gCount under concurrent access, but gRefCnt is correctly reset to zero at the end, so we are probably safe from leaks / use-after-free.
Reporter
Comment 12•7 years ago
So, the problem here is actually in ThreadSafeAutoRefCnt. Its operator-- looks like: [1]
> nsrefcnt result = mValue.fetch_sub(1, std::memory_order_release) - 1;
> if (result == 0) {
> result = mValue.load(std::memory_order_acquire);
> }
> return result;
This is troublesome for atoms, because the reference count can change between the fetch_sub() and the load(), which means that --gRefCnt is not guaranteed to return 0 when it is actually releasing the last reference.
I'm not familiar with atomic operations, so it's not clear to me why it needs to be written this way. But I guess we could fix this by using Atomic<nsrefcnt, Relaxed> instead of ThreadSafeAutoRefCnt for Atom's mRefCnt.
froydnj, what do you think?
Also, this behavior of ThreadSafeAutoRefCnt could be confusing. I'm not sure whether any other objects do something similar to the atom table, where a cache holds objects unconditionally. If there is a similar pattern elsewhere, we should probably audit its usage as well.
[1] https://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/xpcom/base/nsISupportsImpl.h#348-363
Blocks: 1277709
Flags: needinfo?(nfroyd)
Assignee
Comment 13•7 years ago
Ahah! Thanks for tracking that down.
The current Release code assumes that refcounted objects will never be brought back from the dead, which breaks for our Atom setup. We still need the barrier, but it seems to me that we shouldn't assign the value of the load into |result|. This matches what Rust's std::Arc does (it just uses a fence and doesn't do a load).
Nathan, does this sound right to you?
Assignee
Comment 14•7 years ago
MozReview-Commit-ID: 1Yp4kFXA5gV
Attachment #8905154 -
Flags: review?(nfroyd)
Assignee
Updated•7 years ago
Assignee: nobody → bobbyholley
Assignee
Comment 15•7 years ago
xidorn, can you verify that this patch fixes your testcase?
Flags: needinfo?(nfroyd) → needinfo?(xidorn+moz)
Comment 16•7 years ago
Comment on attachment 8905154 [details] [diff] [review]
Don't update |result| after the acquire fence in nsThreadSafeRefCnt::operator--. v1
Review of attachment 8905154 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/base/nsISupportsImpl.h
@@ +362,5 @@
> + // that the consumer uses a GC model and allows re-acquiring strong
> + // references for objects whose reference count dropped to zero (e.g.
> + // the Atom table). We leave |result| with its original value from
> + // the fetch_sub so that consumers can use it to balance counters.
> + mValue.load(std::memory_order_acquire);
Thanks for the explanation here. This is OK, but if you really wanted to micro-optimize, you'd use
std::atomic_thread_fence(std::memory_order_acquire);
instead, and avoid the load entirely.
Attachment #8905154 -
Flags: review?(nfroyd) → review+
Assignee
Comment 17•7 years ago
(In reply to Nathan Froyd [:froydnj] from comment #16)
> Comment on attachment 8905154 [details] [diff] [review]
> Don't update |result| after the acquire fence in
> nsThreadSafeRefCnt::operator--. v1
>
> Review of attachment 8905154 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: xpcom/base/nsISupportsImpl.h
> @@ +362,5 @@
> > + // that the consumer uses a GC model and allows re-acquiring strong
> > + // references for objects whose reference count dropped to zero (e.g.
> > + // the Atom table). We leave |result| with its original value from
> > + // the fetch_sub so that consumers can use it to balance counters.
> > + mValue.load(std::memory_order_acquire);
>
> Thanks for the explanation here. This is OK, but if you really wanted to
> micro-optimize, you'd use
>
> std::atomic_thread_fence(std::memory_order_acquire);
>
> instead, and avoid the load entirely.
Good point. I guess that also lets me get rid of the comment.
Comment 18•7 years ago
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/80e9cd4126b4
Don't update |result| after the acquire fence in nsThreadSafeRefCnt::operator--. r=froydnj
This added a hazard to the build: https://treeherder.mozilla.org/logviewer.html#?job_id=129006338&repo=autoland
Backed out in https://hg.mozilla.org/integration/autoland/rev/6169c1d97aab3ef1fcad843619916040fe698e96
Flags: needinfo?(bobbyholley)
Assignee
Comment 21•7 years ago
Steve, any idea why this 1-line change added a hazard? Should I just bump the threshold?
Flags: needinfo?(bobbyholley) → needinfo?(sphink)
Assignee
Comment 22•7 years ago
Oh I see. Just the external function usage.
Flags: needinfo?(sphink)
Assignee
Comment 23•7 years ago
Reporter
Comment 24•7 years ago
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #15)
> xidorn, can you verify that this patch fixes your testcase?
I've tested before that as long as we don't assign the result again, everything works fine, so I believe it will fix that.
Flags: needinfo?(xidorn+moz)
Reporter
Comment 25•7 years ago
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #23)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=07f3cfe898eb4d9acd666c1392e77b73906d25b7
You should add __atomic_thread_fence I guess.
Comment hidden (mozreview-request)
Reporter
Comment 27•7 years ago
attachment 8905340 [details] contains two tests checking that concurrent access works as expected.
It currently fails (unfortunately, only intermittently), and it is expected to pass with the patch from bholley applied.
Reporter
Comment 28•7 years ago
try push for the gtest: https://treeherder.mozilla.org/#/jobs?repo=try&revision=34197ed4d315809cd031cead84992c28cb289263
Assignee
Comment 29•7 years ago
Comment 30•7 years ago
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d65d668da45f
Don't update |result| after the acquire fence in nsThreadSafeRefCnt::operator--. r=froydnj
Assignee
Comment 31•7 years ago
NI xidorn to make sure the gtest passes with the patch in this bug, and to land it.
Flags: needinfo?(xidorn+moz)
Comment 32•7 years ago
bugherder
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 33•7 years ago
mozreview-review
Comment on attachment 8905340 [details]
Bug 1397052 - Add gtest for concurrent accessing.
https://reviewboard.mozilla.org/r/177122/#review182372
Thank you!
::: xpcom/ds/nsAtomTable.cpp:813
(Diff revision 1)
> +uint32_t
> +NS_GetUnusedAtomCount(void)
Man, I wish there was a way to limit this to gtest-only code.
Attachment #8905340 -
Flags: review?(nfroyd) → review+
Reporter
Comment 34•7 years ago
mozreview-review-reply
Comment on attachment 8905340 [details]
Bug 1397052 - Add gtest for concurrent accessing.
https://reviewboard.mozilla.org/r/177122/#review182372
> Man, I wish there was a way to limit this to gtest-only code.
Yeah. The `NS_GetNumberOfAtoms` above is also used only in gtest.
Reporter
Comment 35•7 years ago
All pass now: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fa35220e65a9a12173bc375789a129b898afd33d
Flags: needinfo?(xidorn+moz)
Comment 36•7 years ago
Pushed by xquan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/44fb9477f42f
Add gtest for concurrent accessing. r=froydnj
Comment 37•7 years ago
bugherder