Closed Bug 1253516 Opened 8 years ago Closed 8 years ago

Intermittent test_restyles.html | application timed out after 330 seconds with no output

Categories

(Core :: DOM: Animation, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- affected
firefox48 --- fixed

People

(Reporter: philor, Assigned: fitzgen)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

I guess this was caused by bug 1244897. ni? to me.
Flags: needinfo?(hiikezoe)
See Also: → 1244897
This is tough, I can't catch this while running this test with choas mode repeat=1000!
(In reply to Hiroyuki Ikezoe (:hiro) from comment #2)
> This is tough, I can't catch this while running this test with choas mode
chaos... There is no choas mode.
I thought the dead lock might be caused from other thread but it was in the same, the main thread!

A stack trace at the point:

06:43:31     INFO -   7  libxul.so!mozilla::TimelineConsumers::IsEmpty() [TimelineConsumers.cpp:77192f60c4a5 : 172 + 0xd]
06:43:31     INFO -   8  libxul.so!mozilla::CycleCollectedJSRuntime::GCNurseryCollectionCallback(JSRuntime*, JS::GCNurseryProgress, JS::gcreason::Reason) [CycleCollectedJSRuntime.cpp:77192f60c4a5 : 852 + 0x10]
06:43:31     INFO -   9  libxul.so!js::gcstats::Statistics::beginNurseryCollection(JS::gcreason::Reason) [Statistics.h:77192f60c4a5 : 199 + 0x14]
06:43:31     INFO -  10  libxul.so!js::Nursery::collect(JSRuntime*, JS::gcreason::Reason, mozilla::Vector<js::ObjectGroup*, 0u, js::SystemAllocPolicy>*) [Nursery.cpp:77192f60c4a5 : 402 + 0x5]
06:43:31     INFO -  11  libxul.so!js::gc::GCRuntime::minorGCImpl(JS::gcreason::Reason, mozilla::Vector<js::ObjectGroup*, 0u, js::SystemAllocPolicy>*) [jsgc.cpp:77192f60c4a5 : 6653 + 0x19]
06:43:31     INFO -  12  libxul.so!js::gc::GCRuntime::evictNursery(JS::gcreason::Reason) [GCRuntime.h:77192f60c4a5 : 604 + 0x15]
06:43:31     INFO -  13  libxul.so!js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) [jsgc.cpp:77192f60c4a5 : 6263 + 0xf]
06:43:31     INFO -  14  libxul.so!js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) [jsgc.cpp:77192f60c4a5 : 6424 + 0x1b]
06:43:31     INFO -  15  libxul.so!js::gc::GCRuntime::gcSlice(JS::gcreason::Reason, long long) [jsgc.cpp:77192f60c4a5 : 6497 + 0x4a]
06:43:31     INFO -  16  libxul.so!js::gc::GCRuntime::gcIfRequested(JSContext*) [jsgc.cpp:77192f60c4a5 : 6717 + 0x1d]
06:43:31     INFO -  17  libxul.so!js::gc::GCRuntime::gcIfNeededPerAllocation(JSContext*) [Allocator.cpp:77192f60c4a5 : 34 + 0xf]
06:43:31     INFO -  18  libxul.so!JSExternalString* js::Allocate<JSExternalString, (js::AllowGC)1>(js::ExclusiveContext*) [Allocator.cpp:77192f60c4a5 : 55 + 0x8]
06:43:31     INFO -  19  libxul.so!JSExternalString::new_(JSContext*, char16_t const*, unsigned int, JSStringFinalizer const*) [String-inl.h:77192f60c4a5 : 319 + 0x5]
06:43:31     INFO -  20  libxul.so!JS_NewExternalString(JSContext*, char16_t const*, unsigned int, JSStringFinalizer const*) [jsapi.cpp:77192f60c4a5 : 1548 + 0x1d]
06:43:31     INFO -  21  libxul.so!XPCStringConvert::StringBufferToJSVal(JSContext*, nsStringBuffer*, unsigned int, JS::MutableHandle<JS::Value>, bool*) [xpcpublic.h:77192f60c4a5 : 242 + 0x7]
06:43:31     INFO -  22  libxul.so!XPCStringConvert::ReadableToJSVal(JSContext*, nsAString_internal const&, nsStringBuffer**, JS::MutableHandle<JS::Value>) [XPCString.cpp:77192f60c4a5 : 95 + 0x15]
06:43:31     INFO -  23  libxul.so!xpc::NonVoidStringToJsval(JSContext*, nsAString_internal&, JS::MutableHandle<JS::Value>) [XPCString.cpp:77192f60c4a5 : 116 + 0xa]
06:43:31     INFO -  24  libxul.so!xpc::NonVoidStringToJsval(JSContext*, nsAString_internal const&, JS::MutableHandle<JS::Value>) [xpcpublic.h:77192f60c4a5 : 310 + 0xf]
06:43:31     INFO -  25  libxul.so!mozilla::dom::ProfileTimelineStackFrame::ToObjectInternal(JSContext*, JS::MutableHandle<JS::Value>) const [ProfileTimelineMarkerBinding.cpp:77192f60c4a5 : 560 + 0xd]
06:43:31     INFO -  26  libxul.so!mozilla::JavascriptTimelineMarker::AddDetails(JSContext*, mozilla::dom::ProfileTimelineMarker&) [ToJSValue.h:77192f60c4a5 : 231 + 0xe]
06:43:31     INFO -  27  libxul.so!mozilla::ObservedDocShell::PopMarkers(JSContext*, nsTArray<mozilla::dom::ProfileTimelineMarker>&) [ObservedDocShell.cpp:77192f60c4a5 : 141 + 0x1b]

We need to be careful not to invoke cycle collection while PopMarkers is being processed.  But how?

I don't know much about cycle collection.  Nick, you are the author who introduced AutoGlobalTimelineMarker in cycle collection codes in bug 1141614.  Any ideas?
Flags: needinfo?(hiikezoe) → needinfo?(nfitzgerald)
Thanks for the digging, Hiroyuki! This indeed should not be re-entrant, and I'm not sure why it is. Investigating.
Assignee: nobody → nfitzgerald
Status: NEW → ASSIGNED
Flags: needinfo?(nfitzgerald)
So this doesn't actually involve CC at all.

PopTimelineMarkers
      |
      V
acquire lock
      |
      V
create JS objects for markers
      |
      V
nursery GC
      |
      V
attempt to acquire lock to add marker for nursery GC
      |
      V
dead lock

Hack #1: AutoSuppressGC while creating timeline markers. Not a good solution.

Hack #2: Some atomic global flag or something that we set while popping timeline markers and then unset when we're done. We would check this before adding nursery markers, and if it was set not add the marker. Less bad thna #1, but still pretty gross.

Solution #3: Swap the timeline markers into a new vec while we have the lock, but do not allocate any JS obejcts yet, release the lock, operate on our vec to create the JS objects. This would be most ideal, but now I need to learn how the heck all this stuff works.

Looking into #3 more.
Actually, `sMutex` is acquired by `TimelineConsumers::PopMarkers` here[0] so that it can touch `mMarkerStores` according to the comment, but then it doesn't touch `mMarkerStores` nor does `ObserveDocShell::PopTimelineMarkers`. So assuming the comment is correct, I think we can just delete this lock acquisition!

[0] https://dxr.mozilla.org/mozilla-central/source/docshell/base/timeline/TimelineConsumers.cpp#303
`sMutex` protects the `mMarkersStores` data member, but neither
`TimelineConsumers::PopMarkers` nor `ObservedDocShell::PopMarkers` use that
data member.

Removing this lock acquisition also fixes a dead lock where we re-entrantly
attempt to acquire the lock when triggering a GC when allocating JS objects
inside `ObservedDocShell::PopMarkers`.
Attachment #8730325 - Flags: review?(vporof)
Great Nick!

(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #9)
> Actually, `sMutex` is acquired by `TimelineConsumers::PopMarkers` here[0] so
> that it can touch `mMarkerStores` according to the comment, but then it
> doesn't touch `mMarkerStores` nor does
> `ObserveDocShell::PopTimelineMarkers`. So assuming the comment is correct, I
> think we can just delete this lock acquisition!

That was my mistake.  As I commented in comment#0, I thought the deadlock we had been seeing in bug 1244897 might be caused from other threads.  What I don't quite understand is that the assertion message in bug 1244897 is accurate or not.
To be honest, I don't really understand the locking scheme in use here and exactly what is protected by which lock and where the critical sections are. I defer to :tromey and :vporof for that information, as they were much more involved in authoring the timeline consumers infrastructure than I was.
Attachment #8730325 - Flags: review?(vporof) → review+
https://hg.mozilla.org/mozilla-central/rev/ffdeaea4ceae
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: