Closed Bug 1392755 Opened 2 years ago Closed 2 years ago

lazyIdleThread::ShutdownThread failures in test automation

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: jmaher, Assigned: asuth)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [stockwell fixed:product])

Attachments

(2 files)

we have noticed more than a few intermittent failures, crashes, assertions, leaks which reference lazyIdleThread::Shutdown.

Possibly there is a single root cause, or there are many different failures and we end up closing this bug out.
Whiteboard: [stockwell needswork]
:ryanvm, do you know who would know more about this set of failures and could look into the code?
Flags: needinfo?(ryanvm)
froydnj would probably be who I'd start with (/me ducks)
Component: General → XPCOM
Flags: needinfo?(ryanvm) → needinfo?(nfroyd)
Attached file log excerpt
The LazyIdleThread is shutting itself down (for being idle?), and while it's spinning a nested event loop for that, QuotaManager tries to shut itself down.  So this is unexpected re-entrancy on the part of LazyIdleThread.

Relatedly, QuotaManager's use of LazyIdleThread causes other problems like confusing mozStorage and its memory pressure events because mozStorage latches the wrong event target.  (Unless LazyIdleThread's reported eventtarget has been cleaned up so that the LazyIdleThread event target is returned instead of the current nsThread in use?)  Plus, QuotaManager seems like one of the more latency-sensitive and important threads we have.

Not that we shouldn't fix LazyIdleThread, but :janv, how would you feel about switching to a normal persistent nsThread for QM?
Flags: needinfo?(jvarga)
Yeah, we don't need the "lazy" behavior anymore.
Flags: needinfo?(jvarga)
Andrew, are you volunteering for this?? :)
Flags: needinfo?(nfroyd) → needinfo?(bugmail)
Yes, but due to scheduling concerns, Andrew from 1-1.5 days in the future is going to handle this.
Assignee: nobody → bugmail
Depends on: 1389309
Any chance you'll get to this soon, Andrew? It's possible that eliminating the use of LazyIdleThread in the quota manager would fix bug 1391895.
Yes; I should have this pushed to try and up for review later today.
So simple :)
QuotaManager is now a hip, popular subsystem used by many consumers, many of
which are now latency sensitive.  There is no longer any meaningful benefit to
using LazyIdleThread, but there are latency downsides.  Also, LazyIdleThread
has some bugs at shutdown and complications for callers that use
NS_GetCurrentThread().  So, begone LazyIdleThread!
Attachment #8902856 - Flags: review?(bkelly)
Comment on attachment 8902856 [details] [diff] [review]
Use a normal thread instead of LazyIdleThread for QuotaManager IO thread

Review of attachment 8902856 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Jan Varga [:janv] from comment #11)
> So simple :)

Yeah, I thought it would be harder.  Also, apparently you're up, so I'll let you do the r+ honors.  Also, I apparently have no idea when you're awake.  I'm pretty sure it's all the time.
Attachment #8902856 - Flags: review?(bkelly) → review?(jvarga)
Comment on attachment 8902856 [details] [diff] [review]
Use a normal thread instead of LazyIdleThread for QuotaManager IO thread

Review of attachment 8902856 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for driving this!
Attachment #8902856 - Flags: review?(jvarga) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/6e0eb137038981bb7d8781d2842cdf982361c82c
Bug 1392755 - Use a normal thread instead of LazyIdleThread for QuotaManager IO thread. r=jvarga
Now that the tree is open again and I've triaged the many non-android oranges to make sure there's no subtle regressions, I've pushed and we'll declare victory.  I saw no intermittents related to QM consumers, closest thing was LocalStorage bug 1347690 and that's only got storage in the name, it doesn't touch QM.  (And I'll be fixing that one as part of labeling in bug 1378716.)
Flags: needinfo?(bugmail)
Comment on attachment 8902856 [details] [diff] [review]
Use a normal thread instead of LazyIdleThread for QuotaManager IO thread

OVERVIEW: This bug wants to be uplifted to make the tree less orange.  There is no risk to users.  RyanVM will throw a small celebration if this gets uplifted.

Approval Request Comment
[Feature/Bug causing the regression]:
Longstanding issue.

[User impact if declined]:
Users won't notice one way or the other unless they're running a debug build.  Then they'll notice fewer assertion-related crashes.

[Is this code covered by automated tests?]:
Yes.  They go orange more without this fix.

[Has the fix been verified in Nightly?]:
I verified locally on nightly, plus pushed to try on top of trunk and examined the failures to make sure there were no crazy invariants violated.

[Needs manual test from QE? If yes, steps to reproduce]:
No.  Not meaningfully possible to manually test.

[List of other uplifts needed for the feature/fix]:
None.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
Without the patch, we use LazyIdleThread for the QuotaManager IO thread.  This is a normal thread wrapped so that the underlying "real" thread can be terminated when it hasn't done anything for a while.  We then spin up a new real thread when needed.  This wrapping is slightly buggy at shutdown and introduces edge-cases that are known to be problematic for subsystems like mozStorage's memory-pressure-triggered cleanup.  By eliminating the lazy thread and going to an always-there thread, we're eliminating these problems.

Additionally, we got a clean try run and all QuotaManager logic and consumer logic is inherently async and the use of LazyIdleThread or not using LazyIdleThread is not observable to them.  (Except for the edge cases.)

[String changes made/needed]:
No localized string changes made.

Disclaimer: Note that the thread name changed.  This is an internal-only debugging name exposed only to debuggers and system monitoring tools like "pstree" on linux.  The change from "Storage I/O" to "QuotaManager IO" is to aid people filing bugs or debugging Gecko.  "Storage" is an ambiguous term and a red herring, whereas QuotaManager IO exactly describes what's going on.
Attachment #8902856 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/6e0eb1370389
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Comment on attachment 8902856 [details] [diff] [review]
Use a normal thread instead of LazyIdleThread for QuotaManager IO thread

Fix test automation failures. Beta56+.
Attachment #8902856 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.