Intermittent test_restyles.html | application timed out after 330 seconds with no output

RESOLVED FIXED in Firefox 48

Status

()

Core
DOM: Animation
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: philor, Assigned: fitzgen)

Tracking

({intermittent-failure})

Trunk
mozilla48
intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox47 affected, firefox48 fixed)

Details

Attachments

(1 attachment)

I guess this was caused by bug 1244897. ni? to me.
Flags: needinfo?(hiikezoe)
See Also: → bug 1244897
This is tough, I can't catch this while running this test with choas mode repeat=1000!
(In reply to Hiroyuki Ikezoe (:hiro) from comment #2)
> This is tough, I can't catch this while running this test with choas mode
chaos... There is no choas mode.
I thought the dead lock might be caused from other thread but it was in the same, the main thread!

A stack trace at the point:

06:43:31     INFO -   7  libxul.so!mozilla::TimelineConsumers::IsEmpty() [TimelineConsumers.cpp:77192f60c4a5 : 172 + 0xd]
06:43:31     INFO -   8  libxul.so!mozilla::CycleCollectedJSRuntime::GCNurseryCollectionCallback(JSRuntime*, JS::GCNurseryProgress, JS::gcreason::Reason) [CycleCollectedJSRuntime.cpp:77192f60c4a5 : 852 + 0x10]
06:43:31     INFO -   9  libxul.so!js::gcstats::Statistics::beginNurseryCollection(JS::gcreason::Reason) [Statistics.h:77192f60c4a5 : 199 + 0x14]
06:43:31     INFO -  10  libxul.so!js::Nursery::collect(JSRuntime*, JS::gcreason::Reason, mozilla::Vector<js::ObjectGroup*, 0u, js::SystemAllocPolicy>*) [Nursery.cpp:77192f60c4a5 : 402 + 0x5]
06:43:31     INFO -  11  libxul.so!js::gc::GCRuntime::minorGCImpl(JS::gcreason::Reason, mozilla::Vector<js::ObjectGroup*, 0u, js::SystemAllocPolicy>*) [jsgc.cpp:77192f60c4a5 : 6653 + 0x19]
06:43:31     INFO -  12  libxul.so!js::gc::GCRuntime::evictNursery(JS::gcreason::Reason) [GCRuntime.h:77192f60c4a5 : 604 + 0x15]
06:43:31     INFO -  13  libxul.so!js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) [jsgc.cpp:77192f60c4a5 : 6263 + 0xf]
06:43:31     INFO -  14  libxul.so!js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) [jsgc.cpp:77192f60c4a5 : 6424 + 0x1b]
06:43:31     INFO -  15  libxul.so!js::gc::GCRuntime::gcSlice(JS::gcreason::Reason, long long) [jsgc.cpp:77192f60c4a5 : 6497 + 0x4a]
06:43:31     INFO -  16  libxul.so!js::gc::GCRuntime::gcIfRequested(JSContext*) [jsgc.cpp:77192f60c4a5 : 6717 + 0x1d]
06:43:31     INFO -  17  libxul.so!js::gc::GCRuntime::gcIfNeededPerAllocation(JSContext*) [Allocator.cpp:77192f60c4a5 : 34 + 0xf]
06:43:31     INFO -  18  libxul.so!JSExternalString* js::Allocate<JSExternalString, (js::AllowGC)1>(js::ExclusiveContext*) [Allocator.cpp:77192f60c4a5 : 55 + 0x8]
06:43:31     INFO -  19  libxul.so!JSExternalString::new_(JSContext*, char16_t const*, unsigned int, JSStringFinalizer const*) [String-inl.h:77192f60c4a5 : 319 + 0x5]
06:43:31     INFO -  20  libxul.so!JS_NewExternalString(JSContext*, char16_t const*, unsigned int, JSStringFinalizer const*) [jsapi.cpp:77192f60c4a5 : 1548 + 0x1d]
06:43:31     INFO -  21  libxul.so!XPCStringConvert::StringBufferToJSVal(JSContext*, nsStringBuffer*, unsigned int, JS::MutableHandle<JS::Value>, bool*) [xpcpublic.h:77192f60c4a5 : 242 + 0x7]
06:43:31     INFO -  22  libxul.so!XPCStringConvert::ReadableToJSVal(JSContext*, nsAString_internal const&, nsStringBuffer**, JS::MutableHandle<JS::Value>) [XPCString.cpp:77192f60c4a5 : 95 + 0x15]
06:43:31     INFO -  23  libxul.so!xpc::NonVoidStringToJsval(JSContext*, nsAString_internal&, JS::MutableHandle<JS::Value>) [XPCString.cpp:77192f60c4a5 : 116 + 0xa]
06:43:31     INFO -  24  libxul.so!xpc::NonVoidStringToJsval(JSContext*, nsAString_internal const&, JS::MutableHandle<JS::Value>) [xpcpublic.h:77192f60c4a5 : 310 + 0xf]
06:43:31     INFO -  25  libxul.so!mozilla::dom::ProfileTimelineStackFrame::ToObjectInternal(JSContext*, JS::MutableHandle<JS::Value>) const [ProfileTimelineMarkerBinding.cpp:77192f60c4a5 : 560 + 0xd]
06:43:31     INFO -  26  libxul.so!mozilla::JavascriptTimelineMarker::AddDetails(JSContext*, mozilla::dom::ProfileTimelineMarker&) [ToJSValue.h:77192f60c4a5 : 231 + 0xe]
06:43:31     INFO -  27  libxul.so!mozilla::ObservedDocShell::PopMarkers(JSContext*, nsTArray<mozilla::dom::ProfileTimelineMarker>&) [ObservedDocShell.cpp:77192f60c4a5 : 141 + 0x1b]

We need to be careful not to invoke cycle collection while PopMarkers is being processed.  But how?

I don't know much about cycle collection.  Nick, you are the author who introduced AutoGlobalTimelineMarker in cycle collection codes in bug 1141614.  Any ideas?
Flags: needinfo?(hiikezoe) → needinfo?(nfitzgerald)

Comment 5

2 years ago
6 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 4
* mozilla-central: 1
* fx-team: 1

Platform breakdown:
* linux64: 3
* linux32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1253516&startday=2016-02-29&endday=2016-03-06&tree=all
Thanks for the digging, Hiroyuki! This indeed should not be re-entrant, and I'm not sure why it is. Investigating.
Assignee: nobody → nfitzgerald
Status: NEW → ASSIGNED
Flags: needinfo?(nfitzgerald)

Comment 7

2 years ago
20 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 9
* fx-team: 7
* try: 2
* mozilla-central: 2

Platform breakdown:
* linux64: 16
* linux32: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1253516&startday=2016-03-07&endday=2016-03-13&tree=all
So this doesn't actually involve CC at all.

PopTimelineMarkers
      |
      V
acquire lock
      |
      V
create JS objects for markers
      |
      V
nursery GC
      |
      V
attempt to acquire lock to add marker for nursery GC
      |
      V
dead lock

Hack #1: AutoSuppressGC while creating timeline markers. Not a good solution.

Hack #2: Some atomic global flag or something that we set while popping timeline markers and then unset when we're done. We would check this before adding nursery markers, and if it was set not add the marker. Less bad thna #1, but still pretty gross.

Solution #3: Swap the timeline markers into a new vec while we have the lock, but do not allocate any JS obejcts yet, release the lock, operate on our vec to create the JS objects. This would be most ideal, but now I need to learn how the heck all this stuff works.

Looking into #3 more.
Actually, `sMutex` is acquired by `TimelineConsumers::PopMarkers` here[0] so that it can touch `mMarkerStores` according to the comment, but then it doesn't touch `mMarkerStores` nor does `ObserveDocShell::PopTimelineMarkers`. So assuming the comment is correct, I think we can just delete this lock acquisition!

[0] https://dxr.mozilla.org/mozilla-central/source/docshell/base/timeline/TimelineConsumers.cpp#303
Created attachment 8730325 [details] [diff] [review]
Do not acquire a lock that protects data that is not used

`sMutex` protects the `mMarkersStores` data member, but neither
`TimelineConsumers::PopMarkers` nor `ObservedDocShell::PopMarkers` use that
data member.

Removing this lock acquisition also fixes a dead lock where we re-entrantly
attempt to acquire the lock when triggering a GC when allocating JS objects
inside `ObservedDocShell::PopMarkers`.
Attachment #8730325 - Flags: review?(vporof)
Great Nick!

(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #9)
> Actually, `sMutex` is acquired by `TimelineConsumers::PopMarkers` here[0] so
> that it can touch `mMarkerStores` according to the comment, but then it
> doesn't touch `mMarkerStores` nor does
> `ObserveDocShell::PopTimelineMarkers`. So assuming the comment is correct, I
> think we can just delete this lock acquisition!

That was my mistake.  As I commented in comment#0, I thought the deadlock we had been seeing in bug 1244897 might be caused from other threads.  What I don't quite understand is that the assertion message in bug 1244897 is accurate or not.
To be honest, I don't really understand the locking scheme in use here and exactly what is protected by which lock and where the critical sections are. I defer to :tromey and :vporof for that information, as they were much more involved in authoring the timeline consumers infrastructure than I was.
Attachment #8730325 - Flags: review?(vporof) → review+
Keywords: checkin-needed

Comment 15

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ffdeaea4ceae
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48

Comment 16

2 years ago
11 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 9
* try: 1
* fx-team: 1

Platform breakdown:
* linux64: 6
* linux32: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1253516&startday=2016-03-14&endday=2016-03-20&tree=all

Updated

2 years ago
Duplicate of this bug: 1283887
You need to log in before you can comment on or make changes to this bug.