Closed Bug 1397052 Opened 7 years ago Closed 7 years ago

stylo: Long hang in GCAtomTableLocked

Categories

(Core :: CSS Parsing and Computation, enhancement, P2)


Tracking


RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: xidorn, Assigned: bholley)


Details

(Keywords: hang, Whiteboard: [stylo])

Attachments

(3 files)

I've been experiencing some significant long hangs in the content process recently, and I managed to capture a profile once, which showed that we spend several seconds inside GCAtomTableLocked while some Rust code was releasing an atom. Stylo has significantly increased the use of atoms (to avoid string conversion costs between Rust and C++), so the atom table is expected to contain far more atoms, as well as more short-lived atoms. This may contribute to a much longer atom table GC than before. Unfortunately I lost the profile I got (after I pressed the Share button and copied the URL, I closed the tab, and the URL doesn't seem to work at all). I will try to capture another one the next time I see this kind of hang.
Also note that I'm on Beta 56 with the Stylo experiment turned on. It doesn't seem to me that this is something we've changed between Beta and Trunk, though.
Priority: -- → P2
Eventually I got another profile, which shows a much shorter hang than the previous one: https://perfht.ml/2xMQg2d

GCAtomTableLocked isn't triggered from Rust code in this profile, but I still suspect that the wide use of atoms may be one cause of this issue. It is weird that we keep running into GCAtomTableLocked. Do we really release so many atoms sometimes? If so, we probably want to trigger an atom table GC only at particular points (e.g. after a GC) rather than having it triggered more or less randomly whenever we release an atom. If we should not be releasing that many atoms, we need to investigate why it happens.
Summary: stylo: Long hang in GCAtomTableLocked → Long hang in GCAtomTableLocked
Whiteboard: [stylo]
And it seems this really is an issue; the majority of these hangs have Servo code on the stack: https://squarewave.github.io/?durationSpec=2048_65536&payloadID=131360179ef146ab85e01052d16ebe89&search=GCAtomTable&thread=0
Summary: Long hang in GCAtomTableLocked → stylo: Long hang in GCAtomTableLocked
It doesn't seem to me that we should be releasing that many atoms. The profile in comment 2 shows that almost all of the time is spent just in GCAtomTable; nothing in it looks like it is releasing tons of atoms, and not much actual atom destruction shows up either. So it feels more likely that we are messing up gUnusedAtomCount somehow. Since it is an unsigned integer, there is a chance that we mess it up by wrapping it below zero (into a huge value) as an intermediate state in some cases, but that seems unlikely to happen when we have only been on the main thread :/
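As a minimal standalone illustration of that wrap-around (not Gecko code; std::atomic<uint32_t> here stands in for the Atomic<uint32_t, ReleaseAcquire> backing gUnusedAtomCount in nsAtomTable.cpp), a single unmatched decrement is enough to turn the counter into the huge value seen in the debugger:

#include <atomic>
#include <cstdint>
#include <iostream>

int main() {
  // Stand-in for gUnusedAtomCount.
  std::atomic<uint32_t> unusedAtomCount{0};

  --unusedAtomCount;  // one decrement that was never matched by an increment

  // Prints 4294967295, i.e. what -1 looks like as an unsigned 32-bit value.
  std::cout << unusedAtomCount.load() << std::endl;
  return 0;
}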
Depends on: 1397130
And yeah, I think bug 1388632 could be related. Given that we are less parallel in Nightly, I would probably expect this to happen less there as well... We can see whether the hang reports stay low in the coming days.
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #6)
> And yeah, I think bug 1388632 could be related. Given that we are less
> parallel in Nightly, I would probably expect this to happen less there as
> well... We can see whether the hang reports stay low in the coming days.

Hm, "doing less parallelism means we hit the race condition less" doesn't make me sleep much easier. I really hope we can figure out why gUnusedAtomCount is getting messed up...
I successfully attached MSVC to the content process while it was hanging (it hung for long enough this time that I had time to attach). As expected, this is integer overflow: gUnusedAtomCount shows 4294967295, which is -1 if you treat the variable as signed, at the point where DynamicAtom::Release calls into GCAtomTable. (This is on Beta, so before the merge of StaticAtom and DynamicAtom.)
And mysteriously, no other thread is operating on atoms at all at that moment. So the fact that gUnusedAtomCount is still -1 after we execute ++ on it in DynamicAtom::Release means it must already have accumulated an incorrect decrement three times before this happened.
Attached patch: patch for test (Splinter Review)
And yes, the implementation of gUnusedAtomCount is not thread-safe. This is a test patch modified from the one I wrote for debugging bug 1386158. It addrefs/releases a single atom lots of times concurrently, and prints the value of gUnusedAtomCount after all threads finish. Local runs give me results like 4294967290 and 4294967295, which clearly show the issue. And apparently, this also triggers bug 1389826.
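For reference, here is a standalone model of what the test exercises. This is not the attached gtest patch; the names and threading are illustrative, and the simulated Release mirrors the real ThreadSafeAutoRefCnt decrement discussed in the next comments:

#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

// One shared "atom": its refcount and the global unused-atom counter.
std::atomic<uint32_t> gRefCnt{0};
std::atomic<uint32_t> gUnusedAtomCount{1};  // the single atom starts out unused

void AddRefAtom() {
  // Models the AddRef path: reviving an unused atom decrements the
  // unused-atom counter.
  if (gRefCnt.fetch_add(1, std::memory_order_relaxed) + 1 == 1) {
    --gUnusedAtomCount;
  }
}

void ReleaseAtom() {
  // Models the Release path on top of ThreadSafeAutoRefCnt's operator--:
  // a release fetch_sub followed by an acquire re-load whose value
  // overwrites |result|.
  uint32_t result = gRefCnt.fetch_sub(1, std::memory_order_release) - 1;
  if (result == 0) {
    result = gRefCnt.load(std::memory_order_acquire);
  }
  if (result == 0) {
    ++gUnusedAtomCount;  // the atom became unused again
  }
}

int main() {
  std::vector<std::thread> threads;
  for (int t = 0; t < 8; ++t) {
    threads.emplace_back([] {
      for (int i = 0; i < 100000; ++i) {
        AddRefAtom();
        ReleaseAtom();
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  // With the re-load in ReleaseAtom() this intermittently prints huge values
  // such as 4294967295 instead of 1; without it, it is always 1.
  std::cout << "gUnusedAtomCount = " << gUnusedAtomCount.load() << std::endl;
  return 0;
}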
So the following code is not balanced:

> if (++gRefCnt == 1) {
>   --gCount;
> }
> if (--gRefCnt == 0) {
>   ++gCount;
> }

with gRefCnt being a ThreadSafeAutoRefCnt and gCount being an Atomic<uint32_t, ReleaseAcquire>. Under concurrent access it can leave gCount with a non-zero result, but gRefCnt is correctly reset to zero at the end, so we are probably safe from leaks / use-after-free.
So, the problem here is actually in ThreadSafeAutoRefCnt. Its operator-- looks like this: [1]

> nsrefcnt result = mValue.fetch_sub(1, std::memory_order_release) - 1;
> if (result == 0) {
>   result = mValue.load(std::memory_order_acquire);
> }
> return result;

This is troublesome for atoms, because the reference count can change between the fetch_sub() and the load(), which means that --gRefCnt is not guaranteed to return 0 even when it really is releasing the last reference. I'm not familiar enough with atomic operations to see why it needs to be written this way, but I guess we could fix this by using Atomic<nsrefcnt, Relaxed> instead of ThreadSafeAutoRefCnt for Atom's mRefCnt. froydnj, what do you think?

Also, this behavior of ThreadSafeAutoRefCnt could be confusing. I'm not sure whether any other object does something similar to the atom table, where a cache holds objects unconditionally and can revive them after their refcount drops to zero. If a similar pattern exists elsewhere, we should probably audit those usages as well.

[1] https://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/xpcom/base/nsISupportsImpl.h#348-363
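Concretely, the interleaving that breaks the bookkeeping looks like this (a worked example: thread A holds the only reference, thread B re-acquires the atom from the table at just the wrong moment):

  A: Release(): fetch_sub() takes the refcount from 1 to 0, result = 0
  B: AddRef():  fetch_add() takes the refcount from 0 to 1; ++gRefCnt == 1,
                so B does --gUnusedAtomCount
  A: Release(): the acquire load() now reads 1 and overwrites result, so
                result != 0 and A skips ++gUnusedAtomCount

A's drop to zero is never matched by an increment of gUnusedAtomCount, while B's AddRef still decremented it, so every time this race is lost the counter drifts one further below its true value and eventually wraps.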
Blocks: 1277709
Flags: needinfo?(nfroyd)
Ahah! Thanks for tracking that down. The current Release code assumes that refcounted objects will never be brought back from the dead, which breaks for our Atom setup. We still need the barrier, but it seems to me that we shouldn't assign the value of the load into |result|. This matches what Rust's std::Arc does (it just uses a fence, and doesn't do a load). Nathan, does this sound right to you?
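A rough, simplified sketch of that change (standalone, with nsrefcnt and the class name as stand-ins for the real types): keep the acquire operation purely for its barrier effect, but stop assigning the loaded value into |result|.

#include <atomic>
#include <cstdint>

using nsrefcnt = uint64_t;  // stand-in for the real typedef

class ThreadSafeRefCnt {
 public:
  nsrefcnt operator--() {
    nsrefcnt result = mValue.fetch_sub(1, std::memory_order_release) - 1;
    if (result == 0) {
      // Acquire barrier only: still synchronize with other threads'
      // releases, but deliberately ignore the loaded value so that callers
      // (e.g. the atom table) see 0 from the release that actually took the
      // count to zero.
      mValue.load(std::memory_order_acquire);
    }
    return result;
  }

 private:
  std::atomic<nsrefcnt> mValue{1};
};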
MozReview-Commit-ID: 1Yp4kFXA5gV
Attachment #8905154 - Flags: review?(nfroyd)
Assignee: nobody → bobbyholley
xidorn, can you verify that this patch fixes your testcase?
Flags: needinfo?(nfroyd) → needinfo?(xidorn+moz)
Blocks: 1388632
Comment on attachment 8905154 [details] [diff] [review]
Don't update |result| after the acquire fence in nsThreadSafeRefCnt::operator--. v1

Review of attachment 8905154 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsISupportsImpl.h
@@ +362,5 @@
> +    // that the consumer uses a GC model and allows re-acquiring strong
> +    // references for objects whose reference count dropped to zero (e.g.
> +    // the Atom table). We leave |result| with its original value from
> +    // the fetch_sub so that consumers can use it to balance counters.
> +    mValue.load(std::memory_order_acquire);

Thanks for the explanation here. This is OK, but if you really wanted to micro-optimize, you'd use

  std::atomic_thread_fence(std::memory_order_acquire);

instead, and avoid the load entirely.
Attachment #8905154 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #16)
> Comment on attachment 8905154 [details] [diff] [review]
> Don't update |result| after the acquire fence in
> nsThreadSafeRefCnt::operator--. v1
>
> Review of attachment 8905154 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: xpcom/base/nsISupportsImpl.h
> @@ +362,5 @@
> > +    // that the consumer uses a GC model and allows re-acquiring strong
> > +    // references for objects whose reference count dropped to zero (e.g.
> > +    // the Atom table). We leave |result| with its original value from
> > +    // the fetch_sub so that consumers can use it to balance counters.
> > +    mValue.load(std::memory_order_acquire);
>
> Thanks for the explanation here. This is OK, but if you really wanted to
> micro-optimize, you'd use
>
>   std::atomic_thread_fence(std::memory_order_acquire);
>
> instead, and avoid the load entirely.

Good point. I guess that also lets me get rid of the comment.
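With that micro-optimization, the decrement sketched earlier would look roughly like this (same stand-in class as above; requires <atomic>):

  nsrefcnt operator--() {
    nsrefcnt result = mValue.fetch_sub(1, std::memory_order_release) - 1;
    if (result == 0) {
      // Same acquire semantics as the discarded load, without touching mValue.
      std::atomic_thread_fence(std::memory_order_acquire);
    }
    return result;
  }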
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/80e9cd4126b4
Don't update |result| after the acquire fence in nsThreadSafeRefCnt::operator--. r=froydnj
Steve, any idea why this 1-line change added a hazard? Should I just bump the threshold?
Flags: needinfo?(bobbyholley) → needinfo?(sphink)
Oh I see. Just the external function usage.
Flags: needinfo?(sphink)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #15)
> xidorn, can you verify that this patch fixes your testcase?

I've already tested that, as long as we don't assign the result again, everything works fine, so I believe it will fix it.
Flags: needinfo?(xidorn+moz)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #23)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=07f3cfe898eb4d9acd666c1392e77b73906d25b7

You should add __atomic_thread_fence, I guess.
Attachment 8905340 [details] contains two tests checking that concurrent access works as expected. They currently fail (unfortunately, only intermittently), and they are expected to pass once the patch from bholley is applied.
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d65d668da45f
Don't update |result| after the acquire fence in nsThreadSafeRefCnt::operator--. r=froydnj
NI xidorn to make sure the gtest passes with the patch in this bug and land it.
Flags: needinfo?(xidorn+moz)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment on attachment 8905340 [details]
Bug 1397052 - Add gtest for concurrent accessing.

https://reviewboard.mozilla.org/r/177122/#review182372

Thank you!

::: xpcom/ds/nsAtomTable.cpp:813
(Diff revision 1)
> +uint32_t
> +NS_GetUnusedAtomCount(void)

Man, I wish there was a way to limit this to gtest-only code.
Attachment #8905340 - Flags: review?(nfroyd) → review+
Comment on attachment 8905340 [details]
Bug 1397052 - Add gtest for concurrent accessing.

https://reviewboard.mozilla.org/r/177122/#review182372

> Man, I wish there was a way to limit this to gtest-only code.

Yeah. The `NS_GetNumberOfAtoms` above is also used only in gtest.
Pushed by xquan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/44fb9477f42f
Add gtest for concurrent accessing. r=froydnj
Blocks: 1389826