Closed Bug 1432846 (CVE-2018-5179) Opened 6 years ago Closed 6 years ago

Self-update service worker to stay alive

Categories

(Core :: DOM: Service Workers, defect, P2)

57 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- disabled
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- fixed

People

(Reporter: contact, Assigned: catalinb)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-dos, privacy, sec-low, Whiteboard: [Embargo until Chrome?][adv-main60+])

Attachments

(3 files, 5 obsolete files)

Attached file sw.go
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/604.4.7 (KHTML, like Gecko) Version/11.0.2 Safari/604.4.7

Steps to reproduce:

VULNERABILITY DETAILS
Service workers can self update to keep at least one version running.
This was also reported to Chrome (Spec bug?).

REPRODUCTION CASE
1. Create index.html to load SW
     <!DOCTYPE html>
     <script>
       navigator.serviceWorker.register('/sw.js');
     </script>

2. Create sw.js, and make sure the file will change every time the SW updates.
One simple way of doing this is by adding the current timestamp to sw.js.
(I attached a simple Go program for this).
      // Time: {now}
      function wait(ms) {
        // This is only to show
        const i = setInterval(() => console.log('is alive'), 1000);
        return new Promise(resolve => setTimeout(() => {
          clearInterval(i);
          resolve();
        }, ms));
      }

      self.addEventListener('activate', event => {
        // Don't wait too long or the worker will be killed and this does not work.
        event.waitUntil(wait(120000).then(() => {
             self.registration.update();
          }));
      });

3. Open the html file (http://localhost:5000/index.html).
This will register the SW and fire the |activate| event, which will wait some time and then self-update the SW. Because we include a unique timestamp in sw.js, SW will update and fire the |activate| event on the new worker.
This can be repeated forever to keep at least one version running.
Group: firefox-core-security → core-security
Component: Untriaged → DOM: Service Workers
Product: Firefox → Core
Summary: Security: Self-update service worker to stay alive → Self-update service worker to stay alive
This is bad for battery and memory, but its unclear to me if its a security issue.
I think this is a security issue because it violates the user's privacy.
Let's say we use the running SW to ping a server instead of logging a message.
Websites can now use this to track their users after they close the page, at least until they close the browser.
If these websites have permission to send push notifications, they are able to track users even if they restart Firefox like this:
1. SW pings server.
2. If server did not receive a ping for some time, it sends an invisible push notification to the user which will be delivered when the user opens Firefox again.
If a service worker has push notifications permission then the server can already send push notifications to start the service worker in the background.  It can use waitUntil() to hold the worker alive 5 minutes or so currently.  Its unclear to me if we have throttling that would prevent push notifications coming once every 5 minutes.

By track, what exactly do you mean?  Tracking based on their ip address?
Tracking could be done based on ip addresses, or by including a unique token in the name of the service worker script.
The ability to track a user was only an example of how it can be a privacy issue. People could also use this to execute tasks which consume a lot of cpu or memory.
I think there are good reasons for the restriction that service workers cannot do this, and the example above shows a way to bypass it.

If you decide to close this, i'd like to ask you to not make this report public yet. Chromium has a similar issue which can be used to run a service worker every time the browser starts and keep it running until the user closes the browser, but Chromium does not require the user to grant any permission to do this.
I'm not saying we shouldn't fix the issue.  I was trying to determine if its a security issue and if so, how severe it is:

https://wiki.mozilla.org/Security_Severity_Ratings

We may need a spec change to deal with the self update() call.
Do we have any way to have spec discussions about private bugs like this?
Flags: needinfo?(annevk)
There's no formalized way of handling it. I suggest discussing it here or in an equivalent bug of another browser. I copied Jake and Jungkee for their perspective.
Flags: needinfo?(annevk)
I'm also now in an email thread with Matt Falkenhagen  and Ali Alabbas.

The main concern so far is using this to mine bitcoin, etc.

There are a few solutions on the table.  In order of difficulty:

1. Reject self-update calls by service workers if there are no window/worker clients open.
2. Progressively delay the start of the update process every time self-update is called by a service worker while there are no clients open.  It will work a few times, but eventually the delay will be greater than the service worker timeout and it will be killed.  Opening a controlled client would reset the progressive delay.
3. Propagate the lifetime budget of the service worker through the update process to the new worker's install and activate events.  Those events budgets are limited by the initial budget of the service worker that did a self-update.  This is similar to how we solved self-postMessage(), but its a bit weird since the update results fires two events instead of one.

I think we ultimately want to do (3), but we don't have very good infrastructure to support that.  (1) might break some real sites.  I think we probably want to do (2) for now.

Once we implement budget lifetime propagation for self.postMessage() then we can maybe make updates use that mechanism as well.
May not need to be hidden unless chrome also wants to treat it as sensitive. Privacy issue (unlike Push the user hasn't agreed to it) and avenue for resource abuse, but not an exploit.
Marion, we should probably work this in the FF60 time frame.  We probably want to do one of our short term fixes (probable 2 from comment 8).
Flags: needinfo?(mdaly)
Assignee: nobody → catalin.badea392
Flags: needinfo?(mdaly)
Priority: -- → P2
Catalin, would you want to take a look at this for us?
Flags: needinfo?(catalin.badea392)
Group: core-security → dom-core-security
Hey Catalin, I think I mispoke in our meeting about this bug the other day.  I think I said something like we should store the current update delay on the WorkerPrivate so it resets when the SW thread is killed.  I don't think that is correct.

Instead we should probably store the current update delay on ServiceWorkerInfo and the ServiceWorkerManager should call a method to reset it when the list of controlled clients goes from zero to one entry.  Basically, if the user opened a window to this site, then clear the progressive delay mitigation.

Sorry for any confusion I might have introduced at the meeting.
Comment on attachment 8951365 [details] [diff] [review]
Delay update runnables from service workers that don't control any clients. r=bkelly

With this patch, calling registration.update() while not controlling clients will incurr a 5s * |multiplier| delay before the update is processed. The multiplier doubles for each request and resets when the worker starts controlling a document.
Flags: needinfo?(catalin.badea392)
Attachment #8951365 - Flags: review?(bkelly)
Attachment #8951366 - Flags: review?(bkelly)
Comment on attachment 8951365 [details] [diff] [review]
Delay update runnables from service workers that don't control any clients. r=bkelly

Review of attachment 8951365 [details] [diff] [review]:
-----------------------------------------------------------------

Please see the comments about not delaying update() calls that are initiated from other contexts then the SW's own thread.

Overall, I'm also a bit nervous about putting this in ServiceWorkerJobQueue.  That code has been particularly sensitive to breakage in the past.  The delayed job concept does not fit very well either as its unclear how a delayed job related to the queue concept.  Finally, since this is a temporary mitigation, I'd like to avoid putting it too deep in the life cycle code.

What about making update() in the service worker setup an nsITimer that then calls UpdateInternal()?  So the job isn't created until this timer executes.  The timer would have to lookup the registration again in case it went away, etc.

::: dom/serviceworkers/ServiceWorkerJobQueue.cpp
@@ +117,5 @@
> +{
> +  uint32_t delay = Preferences::GetInt("dom.serviceWorkers.update_delay",
> +                                       5000);
> +  uint32_t timeout = delay * mDelayMultiplier;
> +  mDelayMultiplier *= 2;

We need to reset the delay multiplier to zero when the registration starts controlling a client.  The easiest way to do this would be to put the mDelayedMultiplier on ServiceWorkerRegistrationInfo and reset it in StartControllingClient.

::: dom/serviceworkers/ServiceWorkerManager.cpp
@@ +327,5 @@
> +
> +  auto jobQueueTrigger = MakeScopeExit([&] {
> +    nsAutoCString scopeKey;
> +    nsresult rv = PrincipalToScopeKey(aRegistrationInfo->Principal(), scopeKey);
> +    Unused << rv;

nit: Don't declare rv and just do Unused << PrincipalToScopeKey() if that is the behavior you want.

@@ +2940,5 @@
>  
> +  if (registration->IsControllingClients()) {
> +    queue->ScheduleJob(job);
> +  } else {
> +    queue->ScheduleDelayedJob(job);

I don't think we want to delay update() calls from the main window.  We only want to delay self-update from the service worker.

Maybe you could add something to ServiceWorkerUpdateFinishCallback to tell if its a self-update.  Then you could do something like:

if (aCallback->IsSelfUpdate() && !registration->IsControllingClients()) {
  queue->ScheduleJob(job);
} else {
  queue->ScheduledDelayedJob(job);
}

See my other comment on a possibly different way to do it without modifying ServiceWorkerJob.
Attachment #8951365 - Flags: review?(bkelly) → review-
(In reply to Ben Kelly [:bkelly] from comment #16)
> Comment on attachment 8951365 [details] [diff] [review]
> Delay update runnables from service workers that don't control any clients.
> r=bkelly
> 
> Review of attachment 8951365 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Please see the comments about not delaying update() calls that are initiated
> from other contexts then the SW's own thread.
That's definitely a bug, yes.
> 
> Overall, I'm also a bit nervous about putting this in ServiceWorkerJobQueue.
> That code has been particularly sensitive to breakage in the past.  The
> delayed job concept does not fit very well either as its unclear how a
> delayed job related to the queue concept.  Finally, since this is a
> temporary mitigation, I'd like to avoid putting it too deep in the life
> cycle code.
> 
> What about making update() in the service worker setup an nsITimer that then
> calls UpdateInternal()?  So the job isn't created until this timer executes.
> The timer would have to lookup the registration again in case it went away,
> etc.
I think no matter how we approach this, something on the main thread needs to
own the pending update runnables/jobs until their timer fires. This allows
us to do proper cleanup at shutdown or when a registration is removed.

My initial choice with ServiceWorkerJobQueue was because it kept things sort
of contained.  After talking to baku, I think ServiceWorkerPrivate should own it:
If the SW is terminated and there's a pending update job on the main thread
holding a promise proxy, the shutdown procedure won't complete until
that promsie proxy is released. We might end up keeping dangling terminating
workers for a long time.

> 
> ::: dom/serviceworkers/ServiceWorkerJobQueue.cpp
> @@ +117,5 @@
> > +{
> > +  uint32_t delay = Preferences::GetInt("dom.serviceWorkers.update_delay",
> > +                                       5000);
> > +  uint32_t timeout = delay * mDelayMultiplier;
> > +  mDelayMultiplier *= 2;
> 
> We need to reset the delay multiplier to zero when the registration starts
> controlling a client.  The easiest way to do this would be to put the
> mDelayedMultiplier on ServiceWorkerRegistrationInfo and reset it in
> StartControllingClient.
> 
> ::: dom/serviceworkers/ServiceWorkerManager.cpp
> @@ +327,5 @@
> > +
> > +  auto jobQueueTrigger = MakeScopeExit([&] {
> > +    nsAutoCString scopeKey;
> > +    nsresult rv = PrincipalToScopeKey(aRegistrationInfo->Principal(), scopeKey);
> > +    Unused << rv;
> 
> nit: Don't declare rv and just do Unused << PrincipalToScopeKey() if that is
> the behavior you want.
> 
> @@ +2940,5 @@
> >  
> > +  if (registration->IsControllingClients()) {
> > +    queue->ScheduleJob(job);
> > +  } else {
> > +    queue->ScheduleDelayedJob(job);
> 
> I don't think we want to delay update() calls from the main window.  We only
> want to delay self-update from the service worker.
> 
> Maybe you could add something to ServiceWorkerUpdateFinishCallback to tell
> if its a self-update.  Then you could do something like:
> 
> if (aCallback->IsSelfUpdate() && !registration->IsControllingClients()) {
>   queue->ScheduleJob(job);
> } else {
>   queue->ScheduledDelayedJob(job);
> }
> 
> See my other comment on a possibly different way to do it without modifying
> ServiceWorkerJob.
Attachment #8951365 - Attachment is obsolete: true
Comment on attachment 8952339 [details] [diff] [review]
Delay update runnables from service workers that don't control any clients.

Review of attachment 8952339 [details] [diff] [review]:
-----------------------------------------------------------------

Looking better, but I think there are still some things to sort through.  Let me know if the comments below make sense.  Thanks!

::: dom/serviceworkers/ServiceWorkerInfo.cpp
@@ +324,5 @@
>                                           mCreationTimeStamp).ToMicroseconds());
>  }
>  
> +void
> +ServiceWorkerInfo::ResetUpdateDelay()

ResetUpdateDelay() doesn't seem to be called anywhere.

@@ +330,5 @@
> +  mDelayMultiplier = 0;
> +}
> +
> +uint32_t
> +ServiceWorkerInfo::GetUpdateDelay(bool aIsControllingClients)

I think the delay probably needs to be stored on ServiceWorkerRegistrationInfo instead of the ServiceWorkerInfo.  In a chained update scenario we will have multiple ServiceWorkerInfo objects cycling through.  We don't want to lose the delay the install of each new worker.

::: dom/serviceworkers/ServiceWorkerManager.h
@@ +316,5 @@
> +  // This method generates a key using appId and isInElementBrowser from the
> +  // principal. We don't use the origin because it can change during the
> +  // loading.
> +  static nsresult
> +  PrincipalToScopeKey(nsIPrincipal* aPrincipal, nsACString& aKey);

I don't think you need to make this public.

::: dom/serviceworkers/ServiceWorkerRegistrationImpl.cpp
@@ +321,5 @@
> +      return NS_OK;
> +    }
> +
> +    nsAutoCString scopeKey;
> +    nsresult rv = ServiceWorkerManager::PrincipalToScopeKey(principal, scopeKey);

It seems like scopeKey is not used here and can be removed.

@@ +333,5 @@
> +      return NS_OK;
> +    }
> +
> +    uint64_t id = mPromiseProxy->GetWorkerPrivate()->ServiceWorkerID();
> +    RefPtr<ServiceWorkerInfo> worker = registration->GetServiceWorkerInfoById(id);

If we store the delay on the registration this bit can go away.

@@ +341,5 @@
> +    // if we have a timer object, it means we've already been delayed once.
> +    if (delay && !mTimer) {
> +      auto cb = [] (nsITimer* aTimer, void* aClosure) {
> +        MOZ_ASSERT(aClosure);
> +        static_cast<SWRUpdateRunnable*>(aClosure)->Run();

This is unsafe.  The closure is not ref-counted by the timer.  It is held alive by the ServiceWorkerPrivate::StoreISupports(), but that reference is cleared in TerminateWorker().  If TerminateWorker() is called before the timer fires then this will hit a UAF.

Rather than using StoreISupports(), I think it might be better to just add an explicit timer list:

  nsTArray<nsCOMPtr<nsITimer>> mTimerList;

The timer can hold a strong ref to the runnable like:

  RefPtr<ServiceWorkerPrivate> swp = worker->WorkerPrivate();
  nsCOMPtr<nsIRunnable> self = this;
  auto cb = [self = Move(self), swp = Move(swp)] (nsITimer* aTimer, void* aClosure) {
    swp->ClearTimer(aTimer);
    self->Run()
  }

The ServiceWorkerPrivate can then also cancel the timers in NoteDeadServiceWorkerInfo(), etc.  This would be necessary to break the ref-cycle from timer->SWP->timer in a timely manner when we are shutting down.

Or maybe make the SWP hold a list of the update runnables themselves and give it an explicit cancel method, etc.  Not sure which would be cleaner.

@@ +348,5 @@
> +
> +      Result<nsCOMPtr<nsITimer>, nsresult> result =
> +        NS_NewTimerWithFuncCallback(cb, this, delay, nsITimer::TYPE_ONE_SHOT,
> +                                    "dom::SWRUpdateRunnable::TimerRunnable",
> +                                    GetCurrentThreadEventTarget());

Instead of GetCurrentThreadEventTarget(), please use SystemGroup::GetEventTarget(TaskCategory::Other).

@@ +350,5 @@
> +        NS_NewTimerWithFuncCallback(cb, this, delay, nsITimer::TYPE_ONE_SHOT,
> +                                    "dom::SWRUpdateRunnable::TimerRunnable",
> +                                    GetCurrentThreadEventTarget());
> +      mTimer = result.unwrapOr(nullptr);
> +      MOZ_ASSERT(mTimer);

Can we make this a runtime check and only save our state on SWP if we pass it?
Attachment #8952339 - Flags: review?(bkelly) → review-
The timer should be cancelled on release, so we shouldn't hit a UAF. Unless I'm wrong?
https://searchfox.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#326
SWRUpdateRunnable now explicitly calls Cancel in its destructor. Addressed other
comments.
Attachment #8952339 - Attachment is obsolete: true
Attachment #8952746 - Attachment is obsolete: true
Attachment #8952748 - Attachment is obsolete: true
Attachment #8952748 - Flags: review?(bkelly)
Comment on attachment 8952749 [details] [diff] [review]
Delay update runnables from service workers that don't control any clients.

Review of attachment 8952749 [details] [diff] [review]:
-----------------------------------------------------------------

Just took a quick look at some of your changes.
Let me know if the comments below make sense. Thanks!

::: dom/serviceworkers/ServiceWorkerRegistrationImpl.cpp
@@ +350,5 @@
> +      worker->WorkerPrivate()->StoreISupports(static_cast<nsIRunnable*>(this));
> +      return NS_OK;
> +    }
> +
> +    if (mTimer) {

That logic looks wrong to me.
If update() is delayed for the first time, |mTimer| will be nullptr and the update will be delayed because the condition in line 336 is true. If a worker immediately calls update() again without waiting for the promise to settle, that condition won't be true because we have a timer now.
If we have a timer, this condition here will be true, the timer will be deleted and the update is executed without delay. This means workers would just need to call update two times in a row to keep themselves alive.

::: dom/serviceworkers/ServiceWorkerRegistrationInfo.cpp
@@ +719,5 @@
> +uint32_t
> +ServiceWorkerRegistrationInfo::GetUpdateDelay()
> +{
> +  if (!mControlledClientsCounter) {
> +    ++mDelayMultiplier;

I think you might want to do this after calculating the current delay to not delay the first update.

@@ +730,5 @@
> +  if (mDelayMultiplier >= INT_MAX / (delay ? delay : 1)) {
> +    return INT_MAX;
> +  }
> +
> +  delay *= mDelayMultiplier;

Wouldn't it be better to store the delay and double it every time this method is called?
This way the backoff would be exponential instead of linear.

Matt Falkenhagen pointed out in the Chromium review that workers get a lot of time on every update() (It's 5 min install + 5 min activate in Chromium, not sure about the exact times in Firefox, but I expect them to be similar).
Calling update() n times will result in n*10 minutes runtime, and if the delay is increased only 5 seconds every time update() is called, that will result a lot of runtime.
(In reply to Yannic Bonenberger from comment #24)
> ::: dom/serviceworkers/ServiceWorkerRegistrationImpl.cpp
> @@ +350,5 @@
> > +      worker->WorkerPrivate()->StoreISupports(static_cast<nsIRunnable*>(this));
> > +      return NS_OK;
> > +    }
> > +
> > +    if (mTimer) {
> 
> That logic looks wrong to me.
> If update() is delayed for the first time, |mTimer| will be nullptr and the
> update will be delayed because the condition in line 336 is true. If a
> worker immediately calls update() again without waiting for the promise to
> settle, that condition won't be true because we have a timer now.
> If we have a timer, this condition here will be true, the timer will be
> deleted and the update is executed without delay. This means workers would
> just need to call update two times in a row to keep themselves alive.

I don't think this works like you are interpreting it.  The mTimer member is on on the SWRUpdateRunnable and you get a new runnable every time script calls update().  So each update() call gets its own mTimer member.  Stacked update() calls should not interfere with each other.
You're right, I interpreted this to be one mTimer member on ServiceWorkerRegistrationImpl. If every call gets its own timer, than this works as it's supposed to.

(In reply to Ben Kelly [:bkelly] from comment #25)
> (In reply to Yannic Bonenberger from comment #24)
> > ::: dom/serviceworkers/ServiceWorkerRegistrationImpl.cpp
> > @@ +350,5 @@
> > > +      worker->WorkerPrivate()->StoreISupports(static_cast<nsIRunnable*>(this));
> > > +      return NS_OK;
> > > +    }
> > > +
> > > +    if (mTimer) {
> > 
> > That logic looks wrong to me.
> > If update() is delayed for the first time, |mTimer| will be nullptr and the
> > update will be delayed because the condition in line 336 is true. If a
> > worker immediately calls update() again without waiting for the promise to
> > settle, that condition won't be true because we have a timer now.
> > If we have a timer, this condition here will be true, the timer will be
> > deleted and the update is executed without delay. This means workers would
> > just need to call update two times in a row to keep themselves alive.
> 
> I don't think this works like you are interpreting it.  The mTimer member is
> on on the SWRUpdateRunnable and you get a new runnable every time script
> calls update().  So each update() call gets its own mTimer member.  Stacked
> update() calls should not interfere with each other.
Comment on attachment 8952749 [details] [diff] [review]
Delay update runnables from service workers that don't control any clients.

Review of attachment 8952749 [details] [diff] [review]:
-----------------------------------------------------------------

Still looking better, but I'm sorry I can't get over the raw void * pointer passed in the closure.  I've just had to fix too many of those in the past, even if its nsTimer code handles it better today.  I would really prefer a strong ref.

::: dom/serviceworkers/ServiceWorkerManager.cpp
@@ +324,5 @@
>  
>    const ServiceWorkerDescriptor& active =
>      aRegistrationInfo->GetActive()->Descriptor();
>  
> +  aRegistrationInfo->ResetUpdateDelay();

I think it would be cleaner to make ResetUpdateDelay() private and call it from ServiceWorkerRegistrationInfo::StartControllingClient().

https://searchfox.org/mozilla-central/rev/9a8d3f8191b15cdca89a7ce044c7bea2dd0462dc/dom/serviceworkers/ServiceWorkerRegistrationInfo.h#94

Or just set mUpdateMultiplier to zero directly there.

::: dom/serviceworkers/ServiceWorkerRegistrationImpl.cpp
@@ +286,5 @@
>    SWRUpdateRunnable(PromiseWorkerProxy* aPromiseProxy, const nsAString& aScope)
>      : Runnable("dom::SWRUpdateRunnable")
>      , mPromiseProxy(aPromiseProxy)
>      , mScope(aScope)
> +    , mTimer(nullptr)

nit: You don't need to explicitly initialize nsCOMPtr<> with nullptr.

@@ +313,5 @@
>          return NS_OK;
>        }
>  
>        principal = mPromiseProxy->GetWorkerPrivate()->GetPrincipal();
> +      id = mPromiseProxy->GetWorkerPrivate()->ServiceWorkerID();

Rather than use the ServiceWorkerID(), please use ServiceWorkerDescriptor.  We currently have some problems with matching only on ID in multi-e10s.  The descriptor provides more of a safe match.

I think it would be a bit preferable to capture this on the worker thread in the constructor.

@@ +328,5 @@
> +    if (NS_WARN_IF(!registration)) {
> +      return NS_OK;
> +    }
> +
> +    RefPtr<ServiceWorkerInfo> worker = registration->GetServiceWorkerInfoById(id);

Please use ServiceWorkerRegistrationInfo::GetByDescriptor() here.

Also, please add a comment that a lot of this is very specific to our partial implementation.  It currently does not handle:

1. ServiceWorkerRegistration.update() called from other worker types.  It assumes its only ever called from a ServiceWorker global.
2. It assumes the registration corresponds to self.registration on the service worker global.  It doesn't handle a registration for a different scope from the current service worker.

These work today because the only worker context ServiceWorkerRegistration we expose is self.registration.  When we expose ServiceWorkerContainer and ServiceWorker on worker contexts fully this code will have to handle those other situations.

@@ +335,5 @@
> +    // if we have a timer object, it means we've already been delayed once.
> +    if (delay && !mTimer) {
> +      auto cb = [] (nsITimer* aTimer, void* aClosure) {
> +        MOZ_ASSERT(aClosure);
> +        static_cast<SWRUpdateRunnable*>(aClosure)->Run();

I'm sorry, but I still think using an un-ref'd closure is a huge risk.  I've spent too much time removing this kind of usage from SWM to add another one.  I've even talked with Nathan about removing it from the nsITimer API because its a footgun.

I feel like we should be able to avoid it in this case without too much effort.

For example:

1. Remove the mTimer member on SWRUpdateRunnable and replace it with a mDelayed boolean.
2. Store the nsITimer in the ServiceWorkerPrivate::StoreISupports();
3. Have the lambda capture a strong ref to the runnable instead of passing it through as a void* closure.

This will rely on the auto-cancel on last ref, but I think I prefer that over use a raw pointer for closure here.  And if we wanted to we could even add an explicit cancel by making TerminateWorker QI to nsITimer and call Cancel(), etc.

@@ +346,5 @@
> +      if (NS_WARN_IF(!mTimer)) {
> +        return NS_OK;
> +      }
> +
> +      worker->WorkerPrivate()->StoreISupports(static_cast<nsIRunnable*>(this));

Please change the MOZ_ASSERT(mWorkerPrivate) in StoreISupports() to a MOZ_DIAGNOSTIC_ASSERT().  I want to make sure future changes don't accidentally call StoreISupports on a ServiceWorkerPrivate that does not have a running worker.  Again, its not possible with our self.registration partial implementation, but it would be easy to accidentally do in the future.

Also, please add a comment indicating we are storing the runnable on the calling workers private and we expect the reference to be dropped when that SWP terminates.

::: dom/serviceworkers/ServiceWorkerRegistrationInfo.cpp
@@ +730,5 @@
> +  if (mDelayMultiplier >= INT_MAX / (delay ? delay : 1)) {
> +    return INT_MAX;
> +  }
> +
> +  delay *= mDelayMultiplier;

I do agree we're probably not aggressive enough with the delay.  I think we should aim for an exponential delay progression like:

0
30 seconds
900 seconds

This would give a malicious script about 30 minutes of processing time.
Attachment #8952749 - Flags: review?(bkelly) → review-
Comment on attachment 8951366 [details] [diff] [review]
Add test for self updating service worker. r=bkelly

Review of attachment 8951366 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!  r=me with comments addressed.

::: dom/serviceworkers/test/self_update_worker.sjs
@@ +9,5 @@
> +    for (i = 0; i < clients.length; i++) {
> +      clients[i].postMessage({version: version});
> +    }
> +  }).then(function() {
> +    self.registration.update();

Do we want waitUntil() to block on the update() promise?  Its not being returned at the moment.

@@ +11,5 @@
> +    }
> +  }).then(function() {
> +    self.registration.update();
> +  });
> +  event.waitUntil(promise);

nit: We can use async functions to make this a bit easier to read:

event.waitUntil(async function() {
  let clients = await clients.matchAll({ includeUncontrolled: true });
  await self.registration.update();
}());

@@ +16,5 @@
> +};
> +`;
> +
> +function handleRequest(request, response) {
> +  let count = getState("count");

Nice.  I didn't know about getState() and setState() in .py scripts here.

::: dom/serviceworkers/test/test_self_update_worker.html
@@ +35,5 @@
> +  }
> +
> +  await SpecialPowers.pushPrefEnv({"set": [
> +    ["dom.serviceWorkers.idle_timeout", 299999],
> +    ["dom.serviceWorkers.idle_timeout", 0],

This sets idle_timeout twice?

@@ +47,5 @@
> +      return waitForState(worker, 'activated', registration);
> +    });
> +
> +  // We need to wait a reasonable time to give the self updating worker a chance
> +  // to change to a newer version. Register and activate an empty worker two times.

I think maybe we should go through more iterations of the dummy scope registration/activation/uninstall.  You do two here, but we also only check that two of the test script are activated as well.  It seems like its possible for those to end up matching simply by chance.  Can we do something like 10 iterations of the dummy stuff instead?

@@ +52,5 @@
> +  await navigator.serviceWorker.register("empty.js", { scope: "./empty/"})
> +    .then(function(registration) {
> +      worker = registration.installing;
> +      return waitForState(worker, 'activated', registration).then(function() {
> +        registration.unregister();

Please wait for the unregister() promise to settle.

@@ +56,5 @@
> +        registration.unregister();
> +      });
> +    });
> +
> +  await navigator.serviceWorker.register("empty.js", { scope: "./empty/"})

Maybe use a cache busting scope here, like `"./empty?q=" + Date.now()`.

@@ +62,5 @@
> +      worker = registration.installing;
> +      return waitForState(worker, 'activated', registration).then(function() {
> +        registration.unregister();
> +      });
> +    });

nit: Can you refactor this into a "waitForDummyScope" or something?

@@ +67,5 @@
> +
> +
> +  await registration.unregister();
> +  await SpecialPowers.popPrefEnv();
> +  await SpecialPowers.popPrefEnv();

I don't think you should have to explicitly popPrefEnv().  The test harness does it automatically at the end.
Attachment #8951366 - Flags: review?(bkelly) → review+
Delay now is 1s * [0, 30, 900, ..]. SWP now holds the timer object instead of the runnable.
Addressed other comments too.
Attachment #8954360 - Flags: review?(bkelly)
Attachment #8952749 - Attachment is obsolete: true
Comment on attachment 8954360 [details] [diff] [review]
Delay update runnables from service workers that don't control any clients.

Review of attachment 8954360 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: dom/serviceworkers/ServiceWorkerRegistrationInfo.h
@@ +213,5 @@
>    const ServiceWorkerRegistrationDescriptor&
>    Descriptor() const;
>  
> +  void
> +  ResetUpdateDelay();

I think we can remove this declaration now.
Attachment #8954360 - Flags: review?(bkelly) → review+
Flags: needinfo?(catalin.badea392)
Group: dom-core-security → core-security-release
Depends on: 1445475
Whiteboard: [adv-main60+]
Alias: CVE-2018-5179
Yannic: I can't find a chrome bug on this which may mean it's still hidden (I searched for bugs filed by you, which might also be the problem if you use a different name there). Please add a link to the chrome bug in a comment here so we can check when they've gone public with it.
Flags: needinfo?(contact)
Whiteboard: [adv-main60+] → [Embargo until Chrome?][adv-main60+]
Hi Catalin!

Would this fix benefit from manual testing?

If so, could you provide us some clear steps to reproduce this issue? I tried running the proof of concept from comment 0 but I can't manage to verify if this issue was fixed or not.

Thanks!
Flags: needinfo?(catalin.badea392)
Responding for :catalinb who has moved on from MoCo.  The "Add test for self updating service worker." patch included an automated test which should provide the necessary test coverage.  :bkelly performed the initial review of the automated test and I also inspected the test when reviewing a follow-up patch, so I think we have sufficient confidence in the test.  So there shouldn't be a need for additional manual testing.
Flags: needinfo?(catalin.badea392)
Awesome.

Thanks Andrew!
Flags: qe-verify-
Daniel: Sorry that I didn't reply earlier. The Chromium bug is http://crbug.com/805496
Flags: needinfo?(contact)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: