Closed Bug 423737 Opened 12 years ago Closed 10 years ago

Avoid causing DOM events at unsafe times

Categories

(Core :: Disability Access APIs, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta2+

People

(Reporter: aaronlev, Assigned: davidb)

References

Details

(Keywords: access, Whiteboard: [sg:nse-meta][ccbr])

Attachments

(1 file)

This is only an issue on windows where ATs like to be in process.

If we fire an MSAA event synchronously, an AT such as JAWS may respond to that event by using accDoDefaultAction(), accSelect(), IAccessibleEditableText methods etc. We cannot allow those things to occur at unsafe times because it will allow content to run at an unsafe time.
The easiest thing be to create internal events on a delay for those actions. We already have a mechanism for delayed events. Right now it uses timer, which is apparently wasteful compared to posting an nsIRunnable to the thread, but I don't think it's a problem for how often we use it.
If you want things to execute before returning to content javascript, but not before it's safe, you can use the nsContentUtils::AddScriptRunner interface.
It appears that at least do action is mostly taken care of, via nsAccessible::DoAction() which uses a timer.
Aaron, can you give more details about unsafe times, what's wrong can happen?
Any time that nsContentUtils::IsSafeToRunScript returns false it is unsafe to fire any DOM events. If an event fires at this point content can do things, usually mutate the DOM or navigate to another page, that causes us to crash. These crashes can be exploited to run arbitrary binary code on the users system.
Based on the suggestion in comment 2 and the reporter's comment 3, is there still specific action that can be taken on this bug?
Aaron/Jonas/Alexander/Marco,   can you add something about security risk for this bug?  how likely is it be discovered or exploited and what has to be in place to exploit?  should we really be trying to get this into Firefox 3.1?
I have no idea where our code fires MSAA events, so it's hard to assess the risk.

However I would imagine that all you'd need to do is to run some good fuzzers combined with an AT tool to find this.

So I would think the current sg:critical rating is correct here.
Alexander, have you had any opportunity to look at this bug?  Is there any way we can make progress on it without a testcase that triggers a MSAA event?  This one has been open for a while and it would be great if we can start to address it.
Sorry, I'm inactive on this bug. It's partially because I'm not well familiar with these stuffs.

(In reply to comment #8)
> I have no idea where our code fires MSAA events, so it's hard to assess the
> risk.

Actually it concerns to not MSAA only, JS is able to listen a11y events and invoke actions. 

What is the best approach: should we delay events or should we delay actions when nsContentUtils::IsSafeToRunScript is false?

In the mantime we fire a11y events when

1) we handle the following DOM events http://mxr.mozilla.org/mozilla-central/source/accessible/src/base/nsRootAccessible.cpp#274

2) we listen nsIMutationObserver

3) layout changed (for example, in nsPresShell, http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp#3336)

4) selection changes (in nsCaretAccessible.cpp)
(In reply to comment #10)
> What is the best approach: should we delay events or should we delay actions
> when nsContentUtils::IsSafeToRunScript is false?
Yeah, probably best to wait until nsContentUtils::IsSafeToRunScript returns 
true, so in other words use a script runner to do whatever you need to do.
See nsContentUtils::AddScriptRunner.
I hope that is accessible also outside gklayout. Jonas?

Other option, dispatching nsIRunnable to the main thread
and do the work in nsIRunnable::Run() might work well enough too.
That is certainly available outside gklayout.

Handling DOM Events isn't the problem, but any cases where a11y modifies DOM
or *fires* DOM events at unsafe time.
(I don't know whether or when a11y does that.)
There is currently no way to add scriptrunners if you're outside gklayout, however if you need it it would be pretty trivial to add. I'd be happy to review a patch :)

However I wonder if dispatching to the current thread isn't what you want to do anyway. nsContentUtils::AddScriptRunner is more appropriate if it's important that you take action before webpage javascript is run, for example if you need to update some property that the webpage might read.
It's been a while since we've heard anything on this.  jst, can you make a call on who is the right person to own this bug?
(In reply to comment #11)

> Handling DOM Events isn't the problem, but any cases where a11y modifies DOM
> or *fires* DOM events at unsafe time.
> (I don't know whether or when a11y does that.)

Do I understand correct: whenever a11y changes the DOM, selection or focus we should ensure it is safe time before we'll prepare an action, and if it's unsafe then we should invoke an action by sContentUtils::AddScriptRunner?
Yes.

Not sure if the accessibility code is able to use nsContentUtils. Otherwise you can always use NS_DispatchToMainThread
(In reply to comment #15)
> Yes.
> 
> Not sure if the accessibility code is able to use nsContentUtils. Otherwise you
> can always use NS_DispatchToMainThread

I think it can't. What's about analogue for nsContentUtils::IsSafeToRunScript?
Could you simply assume IsSafeToRunScript would return false and always use NS_DispatchToMainThread? Alternatively, we could wrap a service around nsContentUtils that expose things like IsSafeToRunScript/AddScriptRunner through some new nsIContentUtilsService interface.
Btw, how does NS_DispatchToMainThread works? Is nsIRunnable::Run invoked after timeout always so that every AT action (accessibility methods changing DOM or anything else) will be invoked asynchronously?
nsIRunnable::Run is called from the main event loop. So yes, it's always called asynchronously. There's no real timeout though, we don't wait any given amount of time, instead it's processed as soon as we can (but always from the event loop).

Well.. I should say, from the event loop or anything that pumps it. But both those situations are ok to run script.
(In reply to comment #17)
> Alternatively, we could wrap a service around
> nsContentUtils that expose things like IsSafeToRunScript/AddScriptRunner
> through some new nsIContentUtilsService interface.
There is a partial patch for this:
https://bugzilla.mozilla.org/attachment.cgi?id=386370&action=edit

Instead of adding dispatchTrustedEvent, one needs to add addScriptRunner method.
Just a notion. We should check with screen reader developers they don't rely on sync actions.
Using a script runner would be probably safer than async nsRunnable.
Although script runner does also postpone the action.
(In reply to comment #21)
> Just a notion. We should check with screen reader developers they don't rely on
> sync actions.

Yes it is great to bring their attention to this necessary change. Maybe the best thing is to roll a patch based on Olli's suggestion and point the vendors at the try build.
(In reply to comment #22)
> Using a script runner would be probably safer than async nsRunnable.
> Although script runner does also postpone the action.

Is it always pending action? Now when AT calls doAction() method then the action is executed after timeout 0 (nsITimer service is used). It is because of bug 277888: if doAction() is result of open dialog window then AT waits until dialog is closed. I wonder can I stop to use timer service and use AddScriptRunner instead? It allows to have the same syntax for all accessible actions.
Not sure I understand what you're saying. If you always fire events from a timeout (using nsITimer), then you're always firing events when it's safe.
It concerns to nsIAccessible::doAction() method only, we have couple of actions more. I just want to follow unique approach in action methods implementation. Therefore I asked can I replace timer service usage on AddScriptRunner for doAction?
If you want to run something asynchronously, the best way is to use NS_DispatchToMainThread, that's simpler than using nsITimer.

You should only use AddScriptRunner if you need to fire an event before webpage script runs. Hmm.. we really need to document this API on MDC somewhere.
Jonas, I still didn't get an answer on my question in comment #24. Can I replace nsITimer usage on AddScriptRunner in the case nsIAccessible::doAction(). I'll try to describe original problem in another words why we started to use nsITimer.

Let's webpage has a button which opens modal dialog on click. We expose "click" action for AT on every button's accessible object which can be activated by nsIAccessible::doAction(). When screen reader calls doAction method (via MSAA for example) then we send DOM click event to the button and modal dialog window is open. Previously screen reader was forced to wait when modal window is closed because we opened modal dialog window synchronously. Therefore we started to use nsITimer to avoid this. So if I replace nsITimer on AddScriptRunner then do I get desired behaviour as well?
AddScriptRunner isn't asynchronous. It runs the runnable it gets as a parameter
when it is safe to run it. So it may even run it synchronously.
What's about NS_DispatchToMainThread? I think it's better to have async actions always, at least it get rid the case when AT thinks action is prepared sync, but we invoke an action something synchronously sometimes asynchronously because this can lead to confusions and hard-to-catch bugs.
I'm not sure I understand what you're asking. If you want to do something asynchronously then NS_DispatchToMainThread is always the simplest way to do so.
Ok, great. David are you happy to make our actions async always?
(In reply to comment #32)
> Ok, great. David are you happy to make our actions async always?

Yes.
Ok, it sounds all our AT vendors are happy as well.
Status: NEW → ASSIGNED
Attached patch wipSplinter Review
How does approach look?
Looks okay to me, but I don't fully understand the macro magic (yet).
Olli, Jonas what is your opinion? I tried to follow the approach from nsThreadUtils.h but I was failed to pass interface method to template based classes (like nsRunnableMehtod class). Therefore I used macros. I think it would be preferable to use template based approach rather than macros one but I couldn't do this.
So I thought you were going to make the method calls to be always truly async
(I mean, dispatch runnable to the current thread).
But perhaps there is some reason to use script runners in some case.
Or maybe I misunderstood "happy to make our actions async always?"

NS_RUNNABLEMETHOD_INVOKE is strange name. It sounds like the method is invoked
when that code runs, but that doesn't actually happen.
Better name could be something like NS_RUN_WHEN_SAFE.

I would have thought template based classes worked if the right interface is
passed as class name.
(In reply to comment #38)
> So I thought you were going to make the method calls to be always truly async
> (I mean, dispatch runnable to the current thread).
> But perhaps there is some reason to use script runners in some case.
> Or maybe I misunderstood "happy to make our actions async always?"

Yes, you're right. I just realized I need to create new method for every existing method and I scared. And I thought if all AT said it shouldn't be a problem then we could run them when it' safe, i.e. probably there is no big problem if they might be sync and sometimes async.

> NS_RUNNABLEMETHOD_INVOKE is strange name. It sounds like the method is invoked
> when that code runs, but that doesn't actually happen.
> Better name could be something like NS_RUN_WHEN_SAFE.

Thank you. I like this more.

> 
> I would have thought template based classes worked if the right interface is
> passed as class name.

The problem if I get right in interface method declaration. Interface method uses
NS_IMETHOD/NS_IMETHODIMP, if I try to typedef NS_IMETHOD/NS_IMETHODIMP .... then compiler fails (analogue from http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsThreadUtils.h#254). If I use typedef nsresult ... then compiler sais I can't cast interface method (nsresult__stdcall) to (unsigned long __thiscall). However I didn't try to pass interface. I used class implementing interface.
Ah, ok. Seems like no one really uses interface methods with runnables.
I can see one possible problem: async call of the method hides errors, so we always return NS_OK (S_OK) which might be not good.
(In reply to comment #11)

> Handling DOM Events isn't the problem, but any cases where a11y modifies DOM
> or *fires* DOM events at unsafe time.
> (I don't know whether or when a11y does that.)

Olli, can you point when time to change the DOM might be not safe? In comment #10 I listed cases when our reaction is to fire a11y events. Which of this cases is unsafe?
It is probably not safe to change DOM when nsIMutationObserver methods are
called.
And seems like PresShell::InvalidateAccessibleSubtree is called while there
are script blockers.
(In reply to comment #43)
> It is probably not safe to change DOM when nsIMutationObserver methods are
> called.
> And seems like PresShell::InvalidateAccessibleSubtree is called while there
> are script blockers.

Ok, great. So if we'll ensure we fire any a11y events asynchronously (via timeout 0) in these cases then we don't need to fix anything here, right?
It is definitely not safe to mutate the DOM from nsIMutationObserver notifications. Out of the list in comment #10 only from 1) is it safe to fire events or mutate the DOM. And that's assuming that then events that you listen to in 1) only fire when it's safe to mutate the DOM ;)
Yes, if you do things asynchronously then you are fine.

Though I'd recommend using NS_DispatchToMainThread rather than a 0-timeout. That's just a code cleanlyness/performance thing though, and not something that affects stability/security.
(In reply to comment #46)

> Though I'd recommend using NS_DispatchToMainThread rather than a 0-timeout.
> That's just a code cleanlyness/performance thing though, and not something that
> affects stability/security.

I filed bug 523785.
I prepared a table https://wiki.mozilla.org/Accessibility/SafeA11yEvents showing how a11y interacts with gecko. Could you look at it and check yes?/no? third column showing if the call can happens at safe or unsafe time? If it's possible to get rid red fields in last column (showing a11y events are not safe) then I would prefer to go this way instead of making all a11y action async.
um, you put security sensitive information to a public wiki.
Moved to https://intranet.mozilla.org/Accessibility/SafeA11yEvents

If the public page hasn't been deleted yet, perhaps someone with admin can do it?
(In reply to comment #49)
> um, you put security sensitive information to a public wiki.

Olli can you file a server ops bug if you feel the public page should be blown away?
sounds like this might also block bug 525579
Blocks: 525579
any ideas on what reasonable target milestone is for this bug?
Good question. We need it ASAP to reduce crashiness.

It sounds like anytime we modify the DOM in any way we should dispatch that activity to the main thread (bug 523785). Note bug 523785 is unassigned, Alexander are you looking at both of these bugs? What is the difference? Let's fix these (ping me).

Note there has been no edits to the wiki page your started.
if I understand this right it sounds like at least "wanted 1.9.2" to avoid possible crash regressions due to frame poisoning.
Flags: wanted1.9.2?
Flags: blocking1.9.2?
(In reply to comment #54)
> Good question. We need it ASAP to reduce crashiness.
> 
> It sounds like anytime we modify the DOM in any way we should dispatch that
> activity to the main thread (bug 523785). Note bug 523785 is unassigned,
> Alexander are you looking at both of these bugs? What is the difference? Let's
> fix these (ping me).

bug 523785 is clean up bug, if I get right nsITimer works good in our case also.

> Note there has been no edits to the wiki page your started.

I think we need to ask Olli or somebody else to look at this table and check layout related column "IsSafe". Then we can start to change the code to fire events at safe time. I think this might be preferable than to call AT API method at safe time.
On the layout side, I'm not sure what '?' means. Those are marked 'safe'.


FireValueChangeEvent / nsEventDispatcher:: DispatchDOMEvent should be safe.

PostHandleEvent / nsEventDispatcher:: DispatchDOMEvent should be safe.

I have no idea what 
"toolkit 	 radio.xml 	 yes?" means.

But in general, if nsContentUtils::IsSafeToRunScript returns false, you should
not change anything in the DOM.
(In reply to comment #57)
> But in general, if nsContentUtils::IsSafeToRunScript returns false, you should
> not change anything in the DOM.
This means you should not change any attribute values, add or remove nodes, dispatch DOM events etc.
(In reply to comment #57)
> On the layout side, I'm not sure what '?' means. Those are marked 'safe'.

This means I'm not sure if it's safe to change the DOM and etc.

> I have no idea what 
> "toolkit      radio.xml      yes?" means.

Iiirc this means DOM event is fired from radio.xml, I assumed it should be safe but I wasn't 100% sure therefore I marked as "yes?".
I think we'd take a patch with tests here, but I don't think this blocks based on the comments in this bug (only happens with ATs in some circumstances) - if sg thinks that assessment is incorrect, please renominate.
Flags: wanted1.9.2?
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
Let's smash this bug. I think Alexander landed at least a partial fix for this as bug 523785 and I think we are protected against the cases mentioned in comment 0 except maybe for the editable text actions. Alexander do you have a list of remaining AT actions we need to dispatch to the main thread?
(In reply to comment #61)
> Let's smash this bug. I think Alexander landed at least a partial fix for this
> as bug 523785 and I think we are protected against the cases mentioned in
> comment 0 except maybe for the editable text actions. Alexander do you have a
> list of remaining AT actions we need to dispatch to the main thread?

Actually I would like to be protected by events firing at safe time. We need to update that wiki page and fix accessibility events firing. Btw, I don't think bug 523785 is related with this one since nsITimer usage should be safe as NS_DispatchToMainThread.
(In reply to comment #62)
> (In reply to comment #61)
> > Let's smash this bug. I think Alexander landed at least a partial fix for this
> > as bug 523785 and I think we are protected against the cases mentioned in
> > comment 0 except maybe for the editable text actions. Alexander do you have a
> > list of remaining AT actions we need to dispatch to the main thread?
> 
> Actually I would like to be protected by events firing at safe time. We need to
> update that wiki page and fix accessibility events firing. Btw, I don't think
> bug 523785 is related with this one since nsITimer usage should be safe as
> NS_DispatchToMainThread.

That is true, I thought you had added additional cases there. Will you be updating the wiki? Moving our remaining unsafe-times event firing to the main thread makes sense to me.
Whiteboard: [sg:critical] → [sg:critical][ccbr]
blocking2.0: --- → beta1+
Note this bug report is valid but general and speculative so if anyone has seen relevant crash stat reports please link them here, to help guide the work.
what would the stacks/signatures look like? I can see if we can do a deep dive to dig some data out of socorro.

This listing has some places that accessibility (and other code) is the top source line on the stack, and then we crash.

http://people.mozilla.com/~chofmann/crash-data/source-module-report.html
(In reply to comment #65)
> what would the stacks/signatures look like? I can see if we can do a deep dive
> to dig some data out of socorro.

Oh hmmm. I suppose we could have many kinds of stacks unfortunately, but probably with accessible module call(s) near the top. We'd be calling into content or layout when we shoudn't be. I suppose something dom eventy might be in the stack. Olli probably has good ideas here.

> This listing has some places that accessibility (and other code) is the top
> source line on the stack, and then we crash.
> 
> http://people.mozilla.com/~chofmann/crash-data/source-module-report.html

Interesting. Is that from the last two weeks for all FF versions? (I'm guessing from the links -- which don't seem to work fully for me).
Note that the main problem here isn't concern that a lot of people are crashing causing them annoyance.

The concern is that an evil person can use this bug to cause an exploitable crash and attack users.

All it takes is for one evil-doer to find this out and then we'll have to firedrill.
its a sample from a single days worth of crash reports I was playing with around april 15.  yeah its all firefox versions.   I need to clean up the the output link queries into the socorro data but the signature and source line info is correct.
(In reply to comment #67)
> Note that the main problem here isn't concern that a lot of people are crashing
> causing them annoyance.
> 
> The concern is that an evil person can use this bug to cause an exploitable
> crash and attack users.

OK thanks for clarifying that Jonas. We definitely don't want accessibility to be an attack surface and I hate fire drills.
(In reply to comment #67)
> Note that the main problem here isn't concern that a lot of people are crashing
> causing them annoyance.
> 

 that source line report isn't from topcrash data, its a sample of all 350,000 crashes we received on the 15th. so this kind of thing would tell us if something out there is malicious, or if something was just unintentionally hitting a crash that could be turned into an exploit.
We need to figure out the places we are vulnerable and fix them.

The wiki table is confusing (https://intranet.mozilla.org/Accessibility/SafeA11yEvents) and I think this needs to be cleared up before we can properly collaborate on a fix here.

Alexander, can you describe what the first "Safe?" column is describing?
Comment on attachment 406428 [details] [diff] [review]
wip

I like this idea of making AT actions async. I think it is worth updating this patch and doing some try-build outreach to see if this causes AT pain.
(In reply to comment #71)
> We need to figure out the places we are vulnerable and fix them.
> 
> The wiki table is confusing
> (https://intranet.mozilla.org/Accessibility/SafeA11yEvents) and I think this
> needs to be cleared up before we can properly collaborate on a fix here.
> 
> Alexander, can you describe what the first "Safe?" column is describing?

The fist safe means if we are safe to call unsafe methods in a11y.

(In reply to comment #72)
> (From update of attachment 406428 [details] [diff] [review])
> I like this idea of making AT actions async. I think it is worth updating this
> patch and doing some try-build outreach to see if this causes AT pain.

I would prefere to not change how the actions work because
1) it might be hard to find AT regressions
2) mochitests will be more complex
3) we need to make text change events async in any way since related mutation events are async already and it's not good for AT to get events in reverse order
> (In reply to comment #72)
> > (From update of attachment 406428 [details] [diff] [review] [details])
> > I like this idea of making AT actions async. I think it is worth updating this
> > patch and doing some try-build outreach to see if this causes AT pain.
> 
> I would prefere to not change how the actions work because
> 1) it might be hard to find AT regressions
> 2) mochitests will be more complex
> 3) we need to make text change events async in any way since related mutation
> events are async already and it's not good for AT to get events in reverse
> order

I agree with 2 and 3, but not sure about 1 yet.
(In reply to comment #74)
> > (In reply to comment #72)
> > > (From update of attachment 406428 [details] [diff] [review] [details] [details])
> > > I like this idea of making AT actions async. I think it is worth updating this
> > > patch and doing some try-build outreach to see if this causes AT pain.
> > 
> > I would prefere to not change how the actions work because
> > 1) it might be hard to find AT regressions

> I agree with 2 and 3, but not sure about 1 yet.

1.1) the patch attached to the bug makes actions work async only if they were called at unsafe time. 
1.2) AT is complex stuff and action methods aren't everyday usage methods I think, I mean they might be called not to often (it depends on markup or user actions), they are not like accessible state or name methods which are called every time when AT gets an accessible object. Therefore I afraid the testing might not trigger proper AT code.
Yep, I see your point I think. It is a risk/trade-off. I'm just not convinced it is a large risk (yet). Also, believe me I wouldn't look forward to adding more timers to our mochitests ;)
(In reply to comment #73)
> (In reply to comment #71)
> > We need to figure out the places we are vulnerable and fix them.
> > 
> > The wiki table is confusing
> > (https://intranet.mozilla.org/Accessibility/SafeA11yEvents) and I think this
> > needs to be cleared up before we can properly collaborate on a fix here.
> > 
> > Alexander, can you describe what the first "Safe?" column is describing?
> 
> The fist safe means if we are safe to call unsafe methods in a11y.

Hmmm okay, and thanks for the IRC chat. So. My understanding is if this column has a "no" in it, it means we need to make sure there is a "yes" in the right most "Safe" column for that row(s).
Whiteboard: [sg:critical][ccbr] → [sg:critical][ccbr][critsmash:investigating]
Any progress on this?
(In reply to comment #78)
> Any progress on this?

eventually this bug has been turned to meta bug: real work happens in dependent bugs
sg:nse-meta per comment 79
Whiteboard: [sg:critical][ccbr][critsmash:investigating] → [sg:nse-meta][ccbr]
Assignee: surkov.alexander → bolterbugz
Regarding the wiki showing remaining holes, I'm not sure ::DestroyCaret is exploitable.
Marking DestroyCaret safe... 2 dependencies left.
blocking2.0: beta1+ → beta2+
I think we're done here. Thanks everyone!
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Group: core-security
You need to log in before you can comment on or make changes to this bug.