Note: There are a few cases of duplicates in user autocompletion which are being worked on.

stylo: possible stack overflow when processing very deep DOM

NEW
Assigned to

Status

()

Core
CSS Parsing and Computation
P2
normal
2 months ago
23 days ago

People

(Reporter: jseward, Assigned: jseward)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 3 obsolete attachments)

(Assignee)

Description

2 months ago
Whilst working with variants of the bloom-basic test for Stylo perf 
profiling, I noticed that it is easy to cause Firefox to segfault on
DOMs with a depth of more than about 1500 elements.  I don't know whether
this is really a Stylo issue, but it seems like a possible DOS and
so I am reporting it.

Unfortunately the debuginfo for libxul.so is somehow not right
(rustc problem?) and GDB won't show the fault point at all.  Valgrind
manages just one frame:

nodes at depth 2000: 1
nodes at depth 2001: 0
total nodes: 10380
--27771-- sync signal handler: signal=11, si_code=2, EIP=0x142e8b48, eip=0x101c2a1bc7, from kernel
--27771-- SIGSEGV: si_code=2 faultaddr=0x28300e68 tid=27 ESP=0x28300e68 seg=0x28300000-0x28300fff
--27771-- delivering signal 11 (SIGSEGV):2 to thread 27
--27771-- push_signal_frame (thread 27): signal 11
==27771==    at 0x142E8B48: style::properties::StyleBuilder::mutate_box (/checkout/src/libcore/mem.rs:334)
Segmentation fault (core dumped)


GDB offers this:
nodes at depth 1999: 1
nodes at depth 2000: 1
nodes at depth 2001: 0
total nodes: 10380
QQQ iteration 1
Dwarf Error: Cannot find DIE at 0x2bce02d8 referenced from DIE at 0x2b7bc8ea [in module /space/MOZ/MC-STYLO/gcc-Og-nondebug-stylo/dist/bin/libxul.so]
Dwarf Error: Cannot find DIE at 0x2bce02d8 referenced from DIE at 0x2b7bc8ea [in module /space/MOZ/MC-STYLO/gcc-Og-nondebug-stylo/dist/bin/libxul.so]
Missing separate debuginfos, use: dnf debuginfo-install gtk3-3.22.12-2.fc25.x86_64
Dwarf Error: Cannot find DIE at 0x2bce02d8 referenced from DIE at 0x2b7bc8ea [in module /space/MOZ/MC-STYLO/gcc-Og-nondebug-stylo/dist/bin/libxul.so]
(gdb) where
Python Exception <class 'gdb.error'> Frame is invalid.: 
Dwarf Error: Cannot find DIE at 0x2bce02d8 referenced from DIE at 0x2b7bc8ea [in module /space/MOZ/MC-STYLO/gcc-Og-nondebug-stylo/dist/bin/libxul.so]


STR: load my-bloom-basic.html (attached)
(Assignee)

Comment 1

2 months ago
Created attachment 8872120 [details]
my-bloom-basic.html
(Assignee)

Comment 2

2 months ago
Created attachment 8872121 [details]
my-util.js
(Assignee)

Comment 3

2 months ago
(In reply to Julian Seward [:jseward] from comment #0)
> (gdb) where
> Python Exception <class 'gdb.error'> Frame is invalid.: 
> Dwarf Error: Cannot find DIE at 0x2bce02d8 referenced from DIE at 0x2b7bc8ea
> [in module /space/MOZ/MC-STYLO/gcc-Og-nondebug-stylo/dist/bin/libxul.so]

Nathan, have you seen anything like that before?  This is built with
"clang version 3.9.0 (tags/RELEASE_390/final)" and "rustc 1.17.0 (56124baa9 2017-04-24)"
on Fedora 25.
Flags: needinfo?(nfroyd)
(Assignee)

Comment 4

2 months ago
It's interesting that Valgrind can't unwind out of it, because AFAIK
it doesn't care (at least for unwinding) about the DIEs at all, to
the extent of not even trying to read them.  That says to me that the
stack might be trashed.
(In reply to Julian Seward [:jseward] from comment #3)
> (In reply to Julian Seward [:jseward] from comment #0)
> > (gdb) where
> > Python Exception <class 'gdb.error'> Frame is invalid.: 
> > Dwarf Error: Cannot find DIE at 0x2bce02d8 referenced from DIE at 0x2b7bc8ea
> > [in module /space/MOZ/MC-STYLO/gcc-Og-nondebug-stylo/dist/bin/libxul.so]
> 
> Nathan, have you seen anything like that before?  This is built with
> "clang version 3.9.0 (tags/RELEASE_390/final)" and "rustc 1.17.0 (56124baa9
> 2017-04-24)"
> on Fedora 25.

I haven't seen anything like this.  It seems pretty unusual that there'd be invalid dwarf generated by clang, so perhaps this is confusion somewhere else?
Flags: needinfo?(nfroyd)
(Assignee)

Comment 6

2 months ago
I used objcopy to remove the .debug_info section on the basis that
that's where the DIEs that GDB doesn't like, came from:

objcopy --remove-section=.debug_info libxul.so-ORIG libxul.so

and like that I can at least get a stack trace.  It does look
like a stack overflow caused by recursion in 
style::parallel::traverse_nodes::h08dccb9d87aa4e5f.  There are
about 1750 such frames on the stack.  See attachment.
(Assignee)

Comment 7

2 months ago
Created attachment 8872282 [details]
Crash backtrace
In general, we don't treat stack overflows as critical security issues, since they're just DoS. Also, stylo isn't shipping to users yet. I can't seem to unhide this bug though. Julian, can you do it?

I think what we want here is just to detect the recursion depth in the tail-call stuff in parallel.rs and eliminate the tail-call optimization if we get too big (say, 500?). Julian, want to whip up a patch and retest?
Component: DOM → CSS Parsing and Computation
Flags: needinfo?(jseward)
Summary: Stylo: possible stack overflow when processing very deep DOM → stylo: possible stack overflow when processing very deep DOM
Julian asked me to unhide this bug pointing at comment 8.
Group: core-security
(Assignee)

Comment 10

2 months ago
Created attachment 8872651 [details] [diff] [review]
Limits recursion depth to 500, per comment 8
Assignee: nobody → jseward
Flags: needinfo?(jseward)
(Assignee)

Comment 11

2 months ago
Bobby, how does that look?  Without it, Stylo segfaults for trees more
than about 1700 deep.  With it, I can process trees up to at least 4500
deep.  Deeper than that and my JS tree-generation routines give up due
to too much recursion, but that's a different problem.

I am vaguely concerned that this assertion at the end of |traverse_nodes|
might fail

  // If this is a tail call, bypass rayon for the first chunk and invoke top_down_dom
  // directly.
  debug_assert_eq!(first_chunk.is_some(), mode.is_tail_call());

because the equivalence (if there ever was one, and if that is indeed
asserting it) "exactly one child == we're doing a tail call" now no longer
holds.  I haven't tested with a debug build.  Any insight on that?
Flags: needinfo?(bobbyholley)
Comment on attachment 8872651 [details] [diff] [review]
Limits recursion depth to 500, per comment 8

Review of attachment 8872651 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/style/parallel.rs
@@ +226,5 @@
>  impl DispatchMode {
>      fn is_tail_call(&self) -> bool { matches!(*self, DispatchMode::TailCall) }
>  }
>  
> +static MAX_RECURSION_DEPTH: u32 = 500;

Nit: I'd make this a usize (here and where it's passed as an arg), since that's the convention for counters, though I'd be curious if you have any particular insight on the tradeoffs.

@@ +232,4 @@
>  #[inline]
>  fn traverse_nodes<'a, 'scope, E, D>(nodes: NodeList<E::ConcreteNode>,
> +                                    mode_orig: DispatchMode,
> +                                    recursion_depth_orig: u32,

Nit: I would make both of these |mut| and nix the _orig business.

@@ +242,5 @@
>            D: DomTraversal<E>,
>  {
>      debug_assert!(!nodes.is_empty());
>  
> +    // Disable the tailcall optimisation after a given depth, so as to avoid

Nit: en-US spelling of optimization.

@@ +259,5 @@
>          } else {
>              // The caller isn't done yet. Append to the queue and return synchronously.
>              scope.spawn(move |scope| {
>                  let nodes = nodes;
> +                top_down_dom(&nodes, recursion_depth,

This is wrong - we don't want to propagate the depth into scope.spawn() calls, because we'll unwind the stack before processing those.

@@ +274,5 @@
>                  let boxed = chunk.iter().cloned().collect::<Vec<_>>().into_boxed_slice();
>                  let traversal_data_copy = traversal_data.clone();
>                  scope.spawn(move |scope| {
>                      let b = boxed;
> +                    top_down_dom(&*b, recursion_depth, root,

Same here.

@@ +283,5 @@
>          }
>  
>          // If this is a tail call, bypass rayon for the first chunk and invoke top_down_dom
>          // directly.
>          debug_assert_eq!(first_chunk.is_some(), mode.is_tail_call());

You can fix this by doing the following:

(1) Rename MAX_RECURSION_DEPTH to RECURSION_DEPTH_LIMIT
(2) Make the mode adjustment >= instead of >.
(3) Change this to debug_assert_eq!(first_chunk.is_some(), mode.is_tail_call() || recursion_depth == RECURSION_DEPTH_LIMIT);
Attachment #8872651 - Flags: review-
Attachment #8872651 - Flags: feedback+
Flags: needinfo?(bobbyholley)
(Worth re-testing your testcase with those fixes, and also doing a try push with |-b do -p linux64-stylo -u all -t none| before landing).
(Assignee)

Comment 14

2 months ago
Created attachment 8872985 [details] [diff] [review]
bug1368302-2.cset

Addresses all comments in comment 12.  Verified working up to depth 5500.

This sets a depth limit of 500.  I also tried with 1000 but some testing with
bloom-basic variants didn't show any change to the level of achieved
parallelism (nor CPU) so I stayed with 500 so as to be conservative.

Looks OK on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2cec3f616bf699490e57ebb99b33b0a963548210

One detail: the recursion_depth incrementer at the start of traverse_nodes
turned out more complex than I expected, due to the need to cause it to stick
at exactly RECURSION_DEPTH_LIMIT so as to make the previously discussed
assertion happy.  Not sure if there's a simpler way to do that.
Attachment #8872651 - Attachment is obsolete: true
Attachment #8872985 - Flags: review?(bobbyholley)
Comment on attachment 8872985 [details] [diff] [review]
bug1368302-2.cset

Review of attachment 8872985 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with those three changes.

::: servo/components/style/parallel.rs
@@ +226,5 @@
>  impl DispatchMode {
>      fn is_tail_call(&self) -> bool { matches!(*self, DispatchMode::TailCall) }
>  }
>  
> +static RECURSION_DEPTH_LIMIT: usize = 500;

Make this const instead.

@@ +254,5 @@
> +    }
> +    else {
> +        debug_assert!(recursion_depth == RECURSION_DEPTH_LIMIT);
> +        mode = DispatchMode::NotTailCall;
> +    }

This is more complicated than it needs to be. Let's make this:

recursion_depth += 1;
debug_assert!(recursion_depth <= RECURSION_DEPTH_LIMIT);
if recursion_depth == RECURSION_DEPTH_LIMIT {
  mode = DispatchMode::NotTailCall;
}

@@ +293,5 @@
>          // If this is a tail call, bypass rayon for the first chunk and invoke top_down_dom
>          // directly.
> +        debug_assert_eq!(first_chunk.is_some(),
> +                         mode.is_tail_call()
> +                         || recursion_depth == RECURSION_DEPTH_LIMIT);

Looking at this again, you actually don't need to modify this assertion at all, because |mode| will already be fixed up by the time first_chunk is computed. Please revert this change.
Attachment #8872985 - Flags: review?(bobbyholley) → review+
I think we may actually not need the patch in this bug, because I think I need to remove the tail call optimization entirely in bug 1365692.
Depends on: 1365692
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #16)
> I think we may actually not need the patch in this bug, because I think I
> need to remove the tail call optimization entirely in bug 1365692.

The tail call optimization is still there, but in reduced form. So I think we'll still want to land this on top of that bug once it lands.
Priority: -- → P2
This patch needs rebasing. Also worth re-measuring that we crash without the patch and don't crash with it.
Flags: needinfo?(jseward)
(Assignee)

Comment 19

29 days ago
Created attachment 8880300 [details] [diff] [review]
bug1368302-3.cset

Rebased, and takes into account comment 15, to the extent that it is still
relevant.

I reduced the threshold from 500 to 200 in this version, having observed
with GDB that the current parallel.rs will fail with a stack overflow at
only 600 levels of recursion.  So setting it to 200 seems like a
conservative margin -- in particular given that we're assuming that limits
that are OK on Linux also carry over to other platforms, particularly
Windows.  IIUC almost no real pages will have a DOM that deep, so 200 will
not have any perf impact for real pages.

I verified that this successfully handles a 5000-deep chain tree DOM, that
definitely segfaults without the patch.
Attachment #8872985 - Attachment is obsolete: true
Flags: needinfo?(jseward)
(Assignee)

Updated

29 days ago
Attachment #8880300 - Flags: review?(bobbyholley)
Comment on attachment 8880300 [details] [diff] [review]
bug1368302-3.cset

Review of attachment 8880300 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/style/parallel.rs
@@ +240,5 @@
>  impl DispatchMode {
>      fn is_tail_call(&self) -> bool { matches!(*self, DispatchMode::TailCall) }
>  }
>  
> +const RECURSION_DEPTH_LIMIT: usize = 200;

Please add a describing the measurements involved in setting this.

@@ +261,5 @@
>      // This is a tail call from the perspective of the caller. However, we only
>      // want to actually dispatch the job as a tail call if there's nothing left
>      // in our local queue. Otherwise we need to return to it to maintain proper
> +    // breadth-first ordering. We also need to take care to avoid stack
> +    // overflow due to excessive tail recursion. See bug 1368302.

Add a comment here indicating that this limit isn't observable to content, since we're still totally correct, we just skip the tail call optimization for one iteration.

@@ +270,5 @@
> +                && !pool.current_thread_has_pending_tasks().unwrap()
> +        } else {
> +            debug_assert!(recursion_depth == RECURSION_DEPTH_LIMIT);
> +            false
> +        };

Nit: I think we could clean this up now a bit as follows:

debug_assert!(recursion_depth <= RECURSION_DEPTH_LIMIT);
let may_dispatch_tail = mode.is_tail_call() &&
                        recursion_depth != RECURSION_DEPTH_LIMIT &&
                        !pool.current_thread_has_pending_tasks().unwrap();

@@ +277,5 @@
>      // case we can pass the SmallVec directly and avoid extra allocation.
>      if nodes.len() <= WORK_UNIT_MAX {
>          let work = nodes.iter().cloned().collect::<WorkUnit<E::ConcreteNode>>();
>          if may_dispatch_tail {
> +            top_down_dom(&work, recursion_depth, root,

And then just pass recursion_depth + 1 here.
Attachment #8880300 - Flags: review?(bobbyholley) → review+
Just a reminder to land this. :-)
Blocks: 1376883
(Assignee)

Comment 22

23 days ago
Created attachment 8882034 [details] [diff] [review]
Rebased, FTR
Attachment #8880300 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.