Closed Bug 1418806 Opened 6 years ago Closed 6 years ago

Crash in core::result::unwrap_failed<T> | geckoservo::glue::Servo_AnimationCompose

Categories

(Core :: DOM: Animation, defect, P3)

58 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox57 --- unaffected
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fixed
firefox63 --- fixed

People

(Reporter: philipp, Assigned: hiro)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files)

This bug was filed from the Socorro interface and is
report bp-55abfbeb-143a-4150-9122-b470e0171118.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll std::panicking::rust_panic_with_hook src/libstd/panicking.rs:617
1 xul.dll std::panicking::begin_panic&lt;alloc::string::String&gt; src/libstd/panicking.rs:572
2 xul.dll std::panicking::begin_panic_fmt src/libstd/panicking.rs:522
3 xul.dll core::panicking::panic_fmt src/libcore/panicking.rs:70
4 xul.dll core::result::unwrap_failed&lt;hashglobe::FailedAllocationError&gt; src/libcore/macros.rs:41
5 xul.dll geckoservo::glue::Servo_AnimationCompose servo/ports/geckolib/glue.rs:606
6 xul.dll mozilla::dom::KeyframeEffectReadOnly::ComposeStyleRule dom/animation/KeyframeEffectReadOnly.cpp:666
7 xul.dll mozilla::dom::KeyframeEffectReadOnly::ComposeStyle&lt;RawServoAnimationValueMap&amp;&gt; dom/animation/KeyframeEffectReadOnly.cpp:731
8 xul.dll mozilla::dom::Animation::ComposeStyle&lt;RawServoAnimationValueMap&amp;&gt; dom/animation/Animation.cpp:1018
9 xul.dll mozilla::dom::Animation::ComposeStyle&lt;RawServoAnimationValueMap&amp;&gt; dom/animation/Animation.cpp:1018

=============================================================

crashes with this signature are newly showing up on windows since firefox 58 - first affected build was 58.0a1 20171029102300.
they all come with MOZ_CRASH Reason 'called `Result::unwrap()` on an `Err` value: FailedAllocationError { reason: "out of memory when allocating RawTable" }' and the system memory use % is quite high already in most cases.
This happened after we landed the minor refactoring of Servo_AnimationCompose() from Bug 1340005, but I still cannot figure out why this panic happened while memory usage is high.
Assignee: nobody → boris.chiou
Status: NEW → ASSIGNED
Blocks: 1340005
See Also: → 1416903
It seems my refactoring in bug 1340005 doesn't add extra clone()s. Astley helps me to check the crash urls, and this panics sometimes happens on "about:blank" (https://crash-stats.mozilla.com/report/index/852fc0ed-e584-42a6-8cc9-73f790171121) or "about:newTab". Bug 1416903 also has the "out of memory" panic. So this may happen on Chrome document? Still have no idea. Manish, do you have any suggestion for debugging this kind of panic?
Flags: needinfo?(manishearth)
(In reply to Boris Chiou [:boris] (away Dec 16–31) from comment #2)
> It seems my refactoring in bug 1340005 doesn't add extra clone()s. Astley
> helps me to check the crash urls, and this panics sometimes happens on
> "about:blank"
> (https://crash-stats.mozilla.com/report/index/852fc0ed-e584-42a6-8cc9-
> 73f790171121) or "about:newTab". Bug 1416903 also has the "out of memory"
> panic. So this may happen on Chrome document? Still have no idea. Manish, do
> you have any suggestion for debugging this kind of panic?

Besides, the available physical memory of the above report is still 1.34GB.
(In reply to Boris Chiou [:boris] (away Dec 16–31) from comment #3)
> (In reply to Boris Chiou [:boris] (away Dec 16–31) from comment #2)
> > It seems my refactoring in bug 1340005 doesn't add extra clone()s. Astley
> > helps me to check the crash urls, and this panics sometimes happens on
> > "about:blank"
> > (https://crash-stats.mozilla.com/report/index/852fc0ed-e584-42a6-8cc9-
> > 73f790171121) or "about:newTab". Bug 1416903 also has the "out of memory"
> > panic. So this may happen on Chrome document? Still have no idea. Manish, do
> > you have any suggestion for debugging this kind of panic?
> 
> Besides, the available physical memory of the above report is still 1.34GB.

Yeah, but that doesn't say anything about how big is the table we're requesting or such. We can probably are more debug information to the panic message so it displays the amount of memory we're requesting.
Yeah, I agree with emilio here, add more stuff to the crash message and see what happens.
Flags: needinfo?(manishearth)
(In reply to Manish Goregaokar [:manishearth] from comment #5)
> Yeah, I agree with emilio here, add more stuff to the crash message and see
> what happens.

Likely the next act is to load more information in code and see what happens.
(In reply to Boris Chiou [:boris] (away Dec 16–31) from comment #7)
> Created attachment 8934881 [details] [review]
> Servo PR, #19504 (dump requesting table size)

Thanks, Boris. Please create an uplift request to beta 58 so that we can have more information in following crash reports.
Comment on attachment 8934881 [details] [review]
Servo PR, #19504 (dump requesting table size)

Approval Request Comment
[Feature/Bug causing the regression]: Don't know yet.
[User impact if declined]: We need this patch to dump more crash information for this bug on beta.
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes 
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: No
[Is the change risky?]: No
[Why is the change risky/not risky?]: We just add a debug field, so we can understand the size we are requesting for a hash table.
[String changes made/needed]: No
Attachment #8934881 - Flags: approval-mozilla-beta?
When did this land in m-c?
Flags: needinfo?(boris.chiou)
(In reply to Julien Cristau [:jcristau] from comment #10)
> When did this land in m-c?

Oh sorry, this change set:
https://hg.mozilla.org/mozilla-central/rev/f3ea755aa8c4
Flags: needinfo?(boris.chiou)
Comment on attachment 8934881 [details] [review]
Servo PR, #19504 (dump requesting table size)

add allocation size annotation for servo oom crashes, beta58+
Attachment #8934881 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
This hasn't been fixed.  A diagnostic code has just landed.  That's said, no new crash has been reported since 20171204150510, though.
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
The diagnostic code is not in 58.0b10, still waiting for next beta release.
By far there are 10 reports in 58.0b11 and size of the table seems small enough and should not fail to allocate.
 - allocation_size: Some(8448)
 - allocation_size: Some(8960)

Also in the report bp-269a0331-4d29-4861-a412-0a1120171213, the available memory is quite enough.
Emilio, do you have any suggestions to forward?
Flags: needinfo?(emilio)
Sorry for the lag here... I don't, though I've seen some other allocations fail with relatively high available memory usage. Maybe somebody familiar with the allocator (Glandium / Erahm?) can help out here? Not sure if this is expected.
Flags: needinfo?(emilio) → needinfo?(mh+mozilla)
Maybe there's something wrong with the requested alignment? Could it be logged along the allocation_size?
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #18)
> Maybe there's something wrong with the requested alignment? Could it be
> logged along the allocation_size?

Sure, we could add one more field into FailedAllocationError. Thanks for this suggestion.
Attached file Servo PR, #19607
(In reply to Boris Chiou [:boris] from comment #20)
> Created attachment 8937847 [details] [review]
> Servo PR, #19607

This has been merged into m-c:
https://hg.mozilla.org/mozilla-central/rev/be2a93f7d4b4

I should uplift it.
Comment on attachment 8937847 [details] [review]
Servo PR, #19607

Approval Request Comment
[Feature/Bug causing the regression]: Don't know yet.
[User impact if declined]: We need this patch to dump more crash information for this bug on beta. And according to Comment 18, it's better to uplift this so we have the memory alignment information for the hashtable.
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]:  No
[List of other uplifts needed for the feature/fix]: No. (other uplifts have been merged into beta)
[Is the change risky?]: No.
[Why is the change risky/not risky?]: We just add one more debug field, so we can understand the alignment we are requesting for a hash table.
[String changes made/needed]: No.

This PR is merged into m-c already:
https://hg.mozilla.org/mozilla-central/rev/be2a93f7d4b4
Attachment #8937847 - Flags: approval-mozilla-beta?
bp-15fc0dac-abc3-494e-bb34-fec5d0180104 submitted from Nightly yesterday includes new added debug info(alignment:8).
Comment on attachment 8937847 [details] [review]
Servo PR, #19607

Take this to dump more crash information for this bug. Beta58+.
Attachment #8937847 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
It's not fixed but just added more debug info in the code.
A lot of the reports really don't seem to be in a particularly low memory state at all, I don't see anything we can do here. Brian, WDYT?
Flags: needinfo?(bbirtles)
See Also: → 1421345
I've gone through those patches again and I don't have any good ideas.

I notice that on the C++ side we have to jump through a few hoops to handle the reference to the pointer to the the hashmap and the reference to the RefPtr<AnimValuesStyleRule> alike. I wonder if it's possible we got something wrong there (e.g. by using Forward we ended up using move semantics were we didn't intend to and on Windows the compiler is treating that memory as freed?). Until now we've been focusing on the rust side, but perhaps the call stack just before the call to Servo_AnimationCompose could do with an extra audit? That's about all I can think of.
Flags: needinfo?(bbirtles)
(In reply to Brian Birtles (:birtles) from comment #28)
> I've gone through those patches again and I don't have any good ideas.

Thanks for your feedback.
Given that the crash volume is quite low and no concrete action we can take in short-term(FF58), make this bug wontfix for 58.
Assignee: boris.chiou → nobody
Hey Brian, is this still a P1? If so we need to find an owner.
Flags: needinfo?(bbirtles)
Target Milestone: mozilla59 → ---
I discussed this with Hiro this morning. The crash volume is still low so this doesn't need to be P1.

(Hiro wondered if bug 1446108 comment 2 was related, i.e. we can get malformed stylo structures when using IPC, and that somehow we were ending up with corrupted data here.)
Flags: needinfo?(bbirtles)
Priority: P1 → P3
Almost 400 crashes on release in the past week. Any chance we could revisit this at some point?
Crash Signature: [@ core::result::unwrap_failed<T> | geckoservo::glue::Servo_AnimationCompose] → [@ core::result::unwrap_failed<T> | geckoservo::glue::Servo_AnimationCompose] [@ static <T> core::result::unwrap_failed<T> | void geckoservo::glue::Servo_AnimationCompose]
I have an idea to reduce the possibility that the animation value map tries to allocate a new memory.
Assignee: nobody → hikezoe
Status: REOPENED → ASSIGNED
Flags: needinfo?(hikezoe)
https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=f5b1725a3cf0d422678b0d84be116907afe72164
I did cap the pre-allocation hash map count for the number of animatable properties just in case.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #35)
> https://treeherder.mozilla.org/perf.html#/
> comparechooser?newProject=try&newRevision=f5b1725a3cf0d422678b0d84be116907afe
> 72164
> I did cap the pre-allocation hash map count for the number of animatable
> properties just in case.

Wrong link.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f5b1725a3cf0d422678b0d84be116907afe72164
Comment on attachment 8987654 [details]
Bug 1418806 - Introduce a constant variable to represents the number of all animatable longhands.

https://reviewboard.mozilla.org/r/252898/#review259468

::: servo/components/style/properties/properties.mako.rs:103
(Diff revision 1)
>  #[allow(missing_docs)]
>  pub mod longhands {
>      % for style_struct in data.style_structs:
>      include!("${repr(os.path.join(OUT_DIR, 'longhands/{}.rs'.format(style_struct.name_lower)))[1:-1]}");
>      % endfor
> +    pub const ANIMATABLE_PROPERTY_COUNT: usize = ${len([prop for prop in data.longhands if prop.animatable])};

Maybe `sum(1 for prop in data.longhands if prop.animatable)` is faster.
Attachment #8987654 - Flags: review?(xidorn+moz) → review+
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #40)
> Comment on attachment 8987654 [details]
> Bug 1418806 - Introduce a constant variable to represents the number of all
> animatable longhands.
> 
> https://reviewboard.mozilla.org/r/252898/#review259468
> 
> ::: servo/components/style/properties/properties.mako.rs:103
> (Diff revision 1)
> >  #[allow(missing_docs)]
> >  pub mod longhands {
> >      % for style_struct in data.style_structs:
> >      include!("${repr(os.path.join(OUT_DIR, 'longhands/{}.rs'.format(style_struct.name_lower)))[1:-1]}");
> >      % endfor
> > +    pub const ANIMATABLE_PROPERTY_COUNT: usize = ${len([prop for prop in data.longhands if prop.animatable])};
> 
> Maybe `sum(1 for prop in data.longhands if prop.animatable)` is faster.

Oh neat!  Didn't know that. Thanks!
Comment on attachment 8987655 [details]
Bug 1418806 - Try to allocate possible size for AnimationValueMap before composing.

https://reviewboard.mozilla.org/r/252900/#review259470

I think there are many cases where this is not accurate -- e.g. when we have both animations and transitions but I think it's worth trying this as an initial approximation.

I think it wouldn't be too hard to get a more accurate number of properties, if we need to. e.g. If we iterate the effect set on the Gecko side, skip effects whose animation's cascade level doesn't match, add the properties to an nsCSSPropertyIDSet, then return its length, I think we'd be pretty close (the main difference being from transitions that get overridden) without increasing the complexity significantly.

::: commit-message-8effd:44
(Diff revision 2)
> +
> +Note that when the hash map is expanded, we do allocate a new RawTable with the
> +new size then replace the old one with the new one, so I believe this change
> +will reduce the crash rate to some extent.
> +
> +[1] https://hg.mozilla.org/mozilla-central/file/15c95df467be/servo/components/hashglobe/src/hash_map.rs#l734

This footnote isn't actually referenced from anywhere in the comment.

::: layout/style/ServoBindings.h:260
(Diff revision 2)
> +// Returns the number of animation effects on the given element.  This number
> +// is equals to animating property count in most cases.

I'd prefer we drop the comment here (especially because I'm not sure we should say it is equal in most cases) and use the comment suggested below.

::: layout/style/ServoBindings.h:262
(Diff revision 2)
>                              ComputedStyleBorrowedOrNull aOldComputedValues,
>                              ComputedStyleBorrowedOrNull aComputedValues,
>                              mozilla::UpdateAnimationsTasks aTasks);
> +// Returns the number of animation effects on the given element.  This number
> +// is equals to animating property count in most cases.
> +size_t Gecko_AnimationEffectCountOnElement(RawGeckoElementBorrowed aElementOrPseudo);

I wonder if we need "OnElement" here. Perhaps that's obvious from the arguments it takes? And it seems like a lot of functions in this file use the Get/Set naming, so perhaps Gecko_GetAnimationEffectCount?

::: servo/components/style/gecko/wrapper.rs:954
(Diff revision 2)
>      cascade_level: CascadeLevel,
>  ) -> Option<Arc<Locked<PropertyDeclarationBlock>>> {
>      use gecko_bindings::sugar::ownership::HasSimpleFFI;
> +    use properties::longhands::ANIMATABLE_PROPERTY_COUNT;
> +
> +    let effect_count = unsafe { Gecko_AnimationEffectCountOnElement(element.0) };

Let's add a comment:

"There's a very rough correlation between the number of effects (animations) on an element and the number of properties it is likely to animate, so we use that as an initial guess for the size of the AnimationValueMap in order to reduce the number of re-allocations needed."
Attachment #8987655 - Flags: review?(bbirtles) → review+
Thanks for reviewing!

(In reply to Brian Birtles (:birtles) from comment #42)
> Comment on attachment 8987655 [details]
> Bug 1418806 - Try to allocate possible size for AnimationValueMap before
> composing.
> 
> https://reviewboard.mozilla.org/r/252900/#review259470
> 
> I think there are many cases where this is not accurate -- e.g. when we have
> both animations and transitions but I think it's worth trying this as an
> initial approximation.
> 
> I think it wouldn't be too hard to get a more accurate number of properties,
> if we need to. e.g. If we iterate the effect set on the Gecko side, skip
> effects whose animation's cascade level doesn't match, add the properties to
> an nsCSSPropertyIDSet, then return its length, I think we'd be pretty close
> (the main difference being from transitions that get overridden) without
> increasing the complexity significantly.

Yeah, totally agree.  We can make this more accurate, but I did want make this initial attempt pretty small and simple to be able to be uplifted (if it didn't have any negative impacts on nightly).

https://treeherder.mozilla.org/#/jobs?repo=try&revision=86a9258e7599b22eaea15f1b2567738276c03230
This will be a final try.
Pushed by hikezoe@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ca338b516f46
Introduce a constant variable to represents the number of all animatable longhands. r=xidorn
https://hg.mozilla.org/integration/autoland/rev/463ef9c8540f
Try to allocate possible size for AnimationValueMap before composing. r=birtles
Flags: needinfo?(hikezoe)
Comment on attachment 8987655 [details]
Bug 1418806 - Try to allocate possible size for AnimationValueMap before composing.

Approval Request Comment
[Feature/Bug causing the regression]: Not particularly, it started from the beginning of stylo I think
[User impact if declined]: Possibly keep suffering from the crash
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes, but not sure this patch fixes the crash
[Needs manual test from QE? If yes, steps to reproduce]: No, we have no way to reproduce the crash
[List of other uplifts needed for the feature/fix]: https://hg.mozilla.org/mozilla-central/rev/ca338b516f46
[Is the change risky?]: very low risk
[Why is the change risky/not risky?]: This patch just changed the memory allocation timing, before this change we re-allocate memory for animation on each time when we calculate values for each property, but this patch allocates the memory for the bunch of properties at once before we start calculating each values for properties
[String changes made/needed]: None
Attachment #8987655 - Flags: approval-mozilla-beta?
I'll tentatively call this fixed in 63.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
There are a few crashes in beta 3 but I'd like to give this a few more days (to see that it may fix the issue in 63 and also to see the crash volume in 62 without the potential fix)
Comment on attachment 8987655 [details]
Bug 1418806 - Try to allocate possible size for AnimationValueMap before composing.

Let's try this on beta 6 later in the week. We can see a steady low volume of crashes on 62 beta, so we should be able to get more information once this patch is in 62.
Attachment #8987655 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1475377
You need to log in before you can comment on or make changes to this bug.