Closed Bug 1677774 Opened 4 years ago Closed 4 years ago

Crash in [@ IPCError-browser | CommitFromIPC Invalid Transaction from Child - CanSet failed for field(s): TouchEventsOverrideIntern]

Categories

(Core :: DOM: Navigation, defect, P1)

Unspecified
All
defect

Tracking

()

RESOLVED FIXED
85 Branch
Fission Milestone M6c
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- unaffected
firefox84 --- wontfix
firefox85 --- fixed
firefox86 --- fixed
firefox87 --- fixed

People

(Reporter: gsvelto, Assigned: smacleod)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files)

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/f5747459-4531-408e-a210-254b00201116

Reason: EXC_BREAKPOINT / EXC_I386_BPT

Top 10 frames of crashing thread:

 0  libsystem_kernel.dylib __psynch_mutexwait
 1  libsystem_pthread.dylib _pthread_mutex_firstfit_lock_slow
 2  XUL google_breakpad::ExceptionHandler::WriteMinidump exception_handler.cc:302
 3  XUL google_breakpad::ExceptionHandler::WriteMinidump exception_handler.cc:317
 4  XUL CrashReporter::CreateMinidumpsAndPair nsExceptionHandler.cpp:3805
 5  XUL bool mozilla::ipc::CrashReporterHost::GenerateMinidumpAndPair CrashReporterHost.h:71
 6  XUL mozilla::dom::ContentParent::GeneratePairedMinidump ContentParent.cpp:3830
 7  XUL mozilla::dom::ContentParent::KillHard ContentParent.cpp:3865
 8  XUL mozilla::ipc::IPCResult::Fail ProtocolUtils.cpp:61
 9  XUL mozilla::dom::syncedcontext::Transaction SyncedContextInlines.h:92

Not all the crashes under this signature have fission enabled but some do. I will attach the actual stack trace shortly.

Steven, it looks like you moved this field to the BC. Maybe you could take a look at this crash? Thanks.

Flags: needinfo?(smacleod)
Keywords: regression
Regressed by: 1606061
Has Regression Range: --- → yes

(In reply to Andrew McCreight [:mccr8] from comment #2)

Steven, it looks like you moved this field to the BC. Maybe you could take a look at this crash? Thanks.

Sure thing, thanks for calling this out.

Taking another look at the code involved here, I have a theory for what's happening:

When BrowsingContext::SetTouchEventsOverride is called, we walk the context's descendants and set their TouchEventsOverrideinternal to TouchEventsOverride::None (if it isn't already). I must have forgotten about this when writing the BrowsingContext::CanSet for TouchEventsOverrideInternal and made it CheckOnlyOwningProcessCanSet. If any of these descendants are OOP and have an override set this crash will be triggered.

I can fix this by either:

  1. Allowing cross-origin sets (so if a content process is pwned, it could change the TouchEventsOverrideInternal for cross origin documents)
  2. Change the semantics of setting TouchEventsOverrideInternal to not match the old pre-fission behaviour in one of the following ways:
    A. Don't inherit overrides from ancestor contexts at all.
    B. Inherit from ancestor contexts, but don't clear any descendent overrides when an ancestor's override changes.

:whimboo, I think you'd be in a much better position to make the call (or point me at who should) about which path to take ^.

Assignee: nobody → smacleod
Status: NEW → ASSIGNED
Flags: needinfo?(smacleod) → needinfo?(hskupin)

(In reply to Steven MacLeod [:smacleod] from comment #3)

:whimboo, I think you'd be in a much better position to make the call (or point me at who should) about which path to take ^.

Woops, confused myself because you wrote a related patch in https://phabricator.services.mozilla.com/D96705.

jdescottes, you're who I'm looking for. Please see Comment 3.

Flags: needinfo?(hskupin) → needinfo?(jdescottes)

Sorry I forgot to reply here!

In my opinion this is not something we would want to expose to a DevTools (or test automation) end user. I don't think users would benefit from this. Enabling touch simulation should remain a global feature.

If we go for 2.A then DevTools would have to emulate the inheritance somehow.
I think 2.B can be a good compromise. We will only ever set overrides on the topmost context. So as long as updating the override is properly inherited by descendent contexts which didn't set a specific override this would be ok.

Would the following scenario work with 2.B?

  • assuming two contexts, parent and child
  • both have initially touchEventsOverride set to "none" (default)
  • start touch simulation, which sets parent.touchEventsOverride = "enabled"
  • child should now inherit from this and be "enabled" as well
  • stop touch simulation, which sets parent.touchEventsOverride = "none"
  • child should also revert back to "none"

If this works with 2.B, I think we should be ok. Adding :micah, :bwerth and :rcaliman in cc.

Flags: needinfo?(jdescottes)

(In reply to Julian Descottes [:jdescottes] from comment #5)

If this works with 2.B, I think we should be ok.

Yup, that scenario will work. The difference would only be noticed in the following scenario:

  1. assuming two contexts, parent and child.
  2. both initially have touchEventsOVerride set to "none" (default)
  3. child.touchEventsOverride = "enabled"
  4. parent.touchEventsOverride = "disabled"

The old behaviour would result in both parent and child being effectively "disabled" and child.touchEventsOverride == "none" . Going with 2.B, child would remain "enabled" and not inherit from the parent because it had previously been set directly.

If touchEventsOverride is only ever set directly on the top context, 2.B shouldn't result in any differences.

Great, I think we are good with 2.B from a DevTools & test automation standpoint

This should prevent content process crashes when there are
cross-origin descendants with an override set.

There is a slight change in behaviour when an override is set after
a descendant context has previously set one. This shouldn't be a
problem since devtools only wants to expose touch simulation at the
tab level. Additionally this new behaviour is reasonable and would
allow devtools to implement the same semantics by doing the clearing
themselves (if needed).

Severity: -- → S3
Fission Milestone: --- → M6c
Priority: -- → P1
Pushed by smacleod@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ea3b56f48dfd don't clear descendant's overrides when setting TouchEventsOverride. r=farre
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

There is one crash with this signature for the first Nightly which contains the commit from this bug: bp-1f565991-e715-488d-ab7e-a07a20201201
Please take a look if anything is left to be done.

Flags: needinfo?(smacleod)
Status: RESOLVED → REOPENED
Flags: needinfo?(smacleod)
Resolution: FIXED → ---

smacleod said: "Theory is that responsive design mode triggers this more often than you’d expect because by design it causes a navigation at the same time as changing touch events simulation."
Steven is working on a fix to address this theory.

Nika says, IIUC, all uses of BrowsingContext::CheckOnlyOwningProcessCanSet are buggy.

I filed bug 1688722 to replace uses of BrowsingContext::CheckOnlyOwningProcessCanSet. (BrowsingContext::CheckOnlyOwningProcessCanSet was just renamed to LegacyCheckOnlyOwningProcessCanSet in bug 1688715.)

See Also: → 1688722
Blocks: 1688948

As a temporary fix to prevent crashes we are loosening the CanSet
restrictions for TouchEventsOverrideInternal. In the future this will
be fixed properly by changing DevTools to set this property from the
parent only, at which time tighter restrictions on setting
TouchEventsOverrideInternal may be added.

Pushed by smacleod@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6969e6d61e8b Relax CanSet for TouchEventsOverrideInternal. r=nika
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED

Change the status for beta to have the same as nightly and release.
For more information, please visit auto_nag documentation.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: