Closed Bug 1204596 Opened 9 years ago Closed 9 years ago

Remove the code added in bug 1164397 for handling redirections

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox43 --- wontfix
firefox44 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)

References

Details

Attachments

(4 files)

The underlying cause of bug 1164397 was that we were previously not tainting the response of the fetch() when it was being redirected cross origin, which Ben has fixed in 1184607.  It seems that some of the code added in that bug is no longer necessary, and should be removed.
Status: NEW → ASSIGNED
I believe this means we can remove the redirect URL from Cache.  That can wait until later, though.  We should stop overriding the final URL in the intercepted channel now, though.
Making this block v1 since this will fix our Response.url implementation.
I have adjusted a few tests to expect a different Response.url value when there is a service worker present.

Requesting review from Ben on the cache stuff, and Nikhil on the rest, although it would be nice if Ben can also skim the rest and sanity check!
Attachment #8661562 - Flags: review?(nsm.nikhil)
Attachment #8661562 - Flags: review?(bkelly)
The migration code ended up larger than what I liked, but I think it's straightforward, as I'm just following the instructions in the sqlite docs.  One important point is that I have modified things so that each migrator function is expected to start its own transaction if it wants to, as I needed to disable foreign key checking, and that can only be done outside of transactions.  The test_migration.js change is checking reading something from the response_headers table which has a foreign key referencing the entries table, and verifies that the migration leaves the foreign key constraints valid.
Attachment #8661563 - Flags: review?(bkelly)
The last remaining thing to do in this bug is to figure out what we want to do with all of the tests I added in bug 1164397.  I think they are of no value and we should remove them all.  I have asked Ben to verify my sanity on that, and if he gives the green light, I'll prepare a removal patch.  It should be simple.
(In reply to Ehsan Akhgari (don't ask for review please) from comment #5)
> The last remaining thing to do in this bug is to figure out what we want to
> do with all of the tests I added in bug 1164397.  I think they are of no
> value and we should remove them all.  I have asked Ben to verify my sanity
> on that, and if he gives the green light, I'll prepare a removal patch.  It
> should be simple.

I think we should keep the tests for now.  The WPT tests don't run on mulet, android, or b2g, so its nice to have some coverage there.  In the long term I still think we should be moving to WPT so I prefer writing new tests there, but no sense to throw these away since they already exist.
OK, fair enough!
This needs to be done so that we match the manual redirect mode
for navigations when the response is stored in the cache.
Attachment #8661834 - Flags: review?(bkelly)
Comment on attachment 8661562 [details] [diff] [review]
Part 1: Avoid overriding the channel final URI when it gets intercepted

Review of attachment 8661562 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/cache/DBSchema.cpp
@@ -1603,5 @@
>        "response_body_id, "
>        "response_security_info_id, "
>        "response_principal_info, "
> -      "response_redirected, "
> -      "response_redirected_url, "

This is fine with the other patches, but I don't think it would work on its own.  There is no default value and the columns are marked NOT NULL.

::: dom/tests/mochitest/fetch/reroute.js
@@ -7,5 @@
>  
>      e.respondWith(fetch(url, {
>        method: e.request.method,
>        headers: e.request.headers,
> -      body: e.request.body,

While this code was definitely wrong, we still effectively strip the POST body off here.  I tried fixing that at one point and some cases started failing.  We might want to look at that.

I think it needs to be something like:

  e.respondWith(e.request.text().then(function(text) {
    var body = text === '' ? null : text;
    return fetch(url, {
      method: e.request.method,
      headers: e.request.headers,
      body: body,
      mode: e.request.mode,
      credentials: e.request.credentials,
      cache: e.request.cache
    });
  });

We also probably want redirect: e.request.redirect now.

We should file a follow up bug if you don't want to fix it here.
Attachment #8661562 - Flags: review?(bkelly) → review+
Comment on attachment 8661563 [details] [diff] [review]
Part 2: Update the schema of the DOM Cache database to remove the response_redirected and response_redirected_url columns

Review of attachment 8661563 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the transaction issue fixed.  Thanks!

::: dom/cache/DBSchema.cpp
@@ +373,5 @@
>  
> +  // Start a transaction after any possible migrations are finished.
> +  // Migrations that require transactions are responsible to set up their own.
> +  mozStorageTransaction trans(aConn, false,
> +                              mozIStorageConnection::TRANSACTION_IMMEDIATE);

Can we disable the foreign key checking at the top of CreateOrMigrateSchema?  And only restore the foreign key checking on return if it was actually disabled?

Then we could keep a single transaction for all migrations and Validate().  I would like to keep them in a single transaction if possible so the database ends up in its original state if the validation fails.

@@ +2517,5 @@
> +  // response_redirected_url columns from the entries table.  sqlite doesn't
> +  // support removing a column from a table using ALTER TABLE, so we need to
> +  // create a new table without those columns, fill it up with the existing
> +  // data, and then drop the original table and rename the new one to the old
> +  // one.

An alternative I thought up this morning:

  - Delete all current values from response_redirected and response_redirected_url.
  - Rename the columns to "unused_int_1" and "unused_text_1".
  - Leave them in the schema and we will use them when we next need to add an integer or text column.

That would probably be less code bloat and have minimal impact on the database.

I'm ok doing what you have here, though.  Just offering that as an alternative.

@@ +2634,5 @@
> +  // Revalidate the foreign key constraints, and ensure that there are no
> +  // violations.
> +  nsCOMPtr<mozIStorageStatement> state;
> +  rv = aConn->CreateStatement(NS_LITERAL_CSTRING(
> +    "PRAGMA foreign_key_check;"

ExecuteSimpleSQL() here as it doesn't look like we're binding any parameters.

@@ +2649,5 @@
> +  // breaking the checks in Validate().
> +  rv = RewriteEntriesSchema(aConn);
> +  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }
> +
> +  // Commit the transaction before we attempt to restore foreign key checking.

It seems you restore foreign key checking above.  Is this comment accurate?

@@ +2653,5 @@
> +  // Commit the transaction before we attempt to restore foreign key checking.
> +  rv = trans.Commit();
> +  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }
> +
> +  rv = aConn->SetSchemaVersion(17);

Any reason not to include setting the schema version in the transaction?
Attachment #8661563 - Flags: review?(bkelly) → review+
Attachment #8661834 - Flags: review?(bkelly) → review+
(In reply to Ben Kelly [:bkelly] from comment #9)
> Comment on attachment 8661562 [details] [diff] [review]
> Part 1: Avoid overriding the channel final URI when it gets intercepted
> 
> Review of attachment 8661562 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/cache/DBSchema.cpp
> @@ -1603,5 @@
> >        "response_body_id, "
> >        "response_security_info_id, "
> >        "response_principal_info, "
> > -      "response_redirected, "
> > -      "response_redirected_url, "
> 
> This is fine with the other patches, but I don't think it would work on its
> own.  There is no default value and the columns are marked NOT NULL.

Yeah, it indeed doesn't work on its own!

> ::: dom/tests/mochitest/fetch/reroute.js
> @@ -7,5 @@
> >  
> >      e.respondWith(fetch(url, {
> >        method: e.request.method,
> >        headers: e.request.headers,
> > -      body: e.request.body,
> 
> While this code was definitely wrong, we still effectively strip the POST
> body off here.  I tried fixing that at one point and some cases started
> failing.  We might want to look at that.
> 
> I think it needs to be something like:
> 
>   e.respondWith(e.request.text().then(function(text) {
>     var body = text === '' ? null : text;
>     return fetch(url, {
>       method: e.request.method,
>       headers: e.request.headers,
>       body: body,
>       mode: e.request.mode,
>       credentials: e.request.credentials,
>       cache: e.request.cache
>     });
>   });
> 
> We also probably want redirect: e.request.redirect now.
> 
> We should file a follow up bug if you don't want to fix it here.

Filed bug 1205495.  Will fix tomorrow.
(In reply to Ben Kelly [:bkelly] from comment #10)
> ::: dom/cache/DBSchema.cpp
> @@ +373,5 @@
> >  
> > +  // Start a transaction after any possible migrations are finished.
> > +  // Migrations that require transactions are responsible to set up their own.
> > +  mozStorageTransaction trans(aConn, false,
> > +                              mozIStorageConnection::TRANSACTION_IMMEDIATE);
> 
> Can we disable the foreign key checking at the top of CreateOrMigrateSchema?
> And only restore the foreign key checking on return if it was actually
> disabled?
> 
> Then we could keep a single transaction for all migrations and Validate(). 
> I would like to keep them in a single transaction if possible so the
> database ends up in its original state if the validation fails.

Sure, that's a good idea!

> @@ +2517,5 @@
> > +  // response_redirected_url columns from the entries table.  sqlite doesn't
> > +  // support removing a column from a table using ALTER TABLE, so we need to
> > +  // create a new table without those columns, fill it up with the existing
> > +  // data, and then drop the original table and rename the new one to the old
> > +  // one.
> 
> An alternative I thought up this morning:
> 
>   - Delete all current values from response_redirected and
> response_redirected_url.
>   - Rename the columns to "unused_int_1" and "unused_text_1".
>   - Leave them in the schema and we will use them when we next need to add
> an integer or text column.
> 
> That would probably be less code bloat and have minimal impact on the
> database.
> 
> I'm ok doing what you have here, though.  Just offering that as an
> alternative.

Hmm, since I have this code already working, I prefer to leave it as is.  :-)

> @@ +2634,5 @@
> > +  // Revalidate the foreign key constraints, and ensure that there are no
> > +  // violations.
> > +  nsCOMPtr<mozIStorageStatement> state;
> > +  rv = aConn->CreateStatement(NS_LITERAL_CSTRING(
> > +    "PRAGMA foreign_key_check;"
> 
> ExecuteSimpleSQL() here as it doesn't look like we're binding any parameters.

I did this because I need to check to make sure this query doesn't return any data.  It's supposed to return a list of violations that it finds, if any.

> @@ +2649,5 @@
> > +  // breaking the checks in Validate().
> > +  rv = RewriteEntriesSchema(aConn);
> > +  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }
> > +
> > +  // Commit the transaction before we attempt to restore foreign key checking.
> 
> It seems you restore foreign key checking above.  Is this comment accurate?

This code is gone now.

> @@ +2653,5 @@
> > +  // Commit the transaction before we attempt to restore foreign key checking.
> > +  rv = trans.Commit();
> > +  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }
> > +
> > +  rv = aConn->SetSchemaVersion(17);
> 
> Any reason not to include setting the schema version in the transaction?

The whole thing is wrapped in a single transaction now!

Thanks for the review!
And on those rare times when we ran debug Android tests, a crash there, https://treeherder.mozilla.org/logviewer.html#?job_id=14223953&repo=mozilla-inbound
(In reply to Phil Ringnalda (:philor) from comment #15)
> And on those rare times when we ran debug Android tests, a crash there,
> https://treeherder.mozilla.org/logviewer.html#?job_id=14223953&repo=mozilla-
> inbound

Nikhil, this seems to be a bug in the worker promise code.  Does this look familiar?
Flags: needinfo?(nsm.nikhil)
(In reply to Ehsan Akhgari (don't ask for review please) from comment #16)
> (In reply to Phil Ringnalda (:philor) from comment #15)
> > And on those rare times when we ran debug Android tests, a crash there,
> > https://treeherder.mozilla.org/logviewer.html#?job_id=14223953&repo=mozilla-
> > inbound
> 
> Nikhil, this seems to be a bug in the worker promise code.  Does this look
> familiar?

Is this reproducible consistently? We clean up the promise if the worker has already started shutting down, but in that case WorkerRun() should not be invoked. Try adding some printfs in PromiseWorkerProxy::CleanProperties's callers (Create() and Notify()) to see who calls it and sets mWorkerPromise to null for that. Maybe this is a synchronization issue, but mWorkerPromise is only ever touched on the worker thread.

Perhaps there is a edge case where the worker runnable can run even if the worker is in the canceling state.
Flags: needinfo?(nsm.nikhil)
(In reply to Nikhil Marathe [:nsm] (please needinfo?) from comment #17)
> (In reply to Ehsan Akhgari (don't ask for review please) from comment #16)
> > (In reply to Phil Ringnalda (:philor) from comment #15)
> > > And on those rare times when we ran debug Android tests, a crash there,
> > > https://treeherder.mozilla.org/logviewer.html#?job_id=14223953&repo=mozilla-
> > > inbound
> > 
> > Nikhil, this seems to be a bug in the worker promise code.  Does this look
> > familiar?
> 
> Is this reproducible consistently?

Yes, this is 100% reproducible on the Android emulator with just my test changes from this bug <https://treeherder.mozilla.org/#/jobs?repo=try&revision=faaf798a8817> (ignore the test failures before the crash since they are a result of the test changes not being accompanied by the code changes.)

> We clean up the promise if the worker has
> already started shutting down, but in that case WorkerRun() should not be
> invoked. Try adding some printfs in PromiseWorkerProxy::CleanProperties's
> callers (Create() and Notify()) to see who calls it and sets mWorkerPromise
> to null for that.

OK.

> Maybe this is a synchronization issue, but mWorkerPromise
> is only ever touched on the worker thread.

It's not obvious to me how this is true actually.  mWorkerPromise() is touched from CleanUp() which can happen both from the worker and the main thread.  In Fetch.cpp, the worker consumers of PromiseWorkerProxy do no synchronization, but the main thread ones do.  I don't understand how that can be safe...

> Perhaps there is a edge case where the worker runnable can run even if the
> worker is in the canceling state.

Any idea how I would verify that?
Flags: needinfo?(nsm.nikhil)
After trying a few things on the try server today, this change in the first patch in this series turns the test green:

diff --git a/dom/tests/mochitest/fetch/test_fetch_cors.js b/dom/tests/mochitest/fetch/test_fetch_cors.js
--- a/dom/tests/mochitest/fetch/test_fetch_cors.js
+++ b/dom/tests/mochitest/fetch/test_fetch_cors.js
@@ -1272,21 +1272,21 @@ function testRedirects() {
     var request = new Request(req.url, { method: req.method,
                                          headers: req.headers,
                                          body: req.body });
     fetches.push((function(request, test) {
       return fetch(request).then(function(res) {
         ok(test.pass, "Expected test to pass for " + test.toSource());
         is(res.status, 200, "wrong status in test for " + test.toSource());
         is(res.statusText, "OK", "wrong status text for " + test.toSource());
-        var reqHost = (new URL(req.url)).host;
         // If there is a service worker present, the redirections will be
         // transparent, assuming that the original request is to the current
         // site and would be intercepted.
         if (isSWPresent) {
+          var reqHost = (new URL(req.url)).host;
           if (reqHost === location.host) {
             is((new URL(res.url)).host, reqHost, "Response URL should be original URL with a SW present");
           }
         } else {
           is((new URL(res.url)).host, (new URL(test.hops[test.hops.length-1].server)).host, "Response URL should be redirected URL");
         }
         return res.text().then(function(v) {
           is(v, "<res>hello pass</res>\n",


That's pretty solid evidence that this is caused by a timing issue, and that we're dealing with a race condition.

The b2g failures still persist:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4dea7c4d88e

Oh, and adding any logging will make the bug go away.
(In reply to Ehsan Akhgari (don't ask for review please) from comment #18)
> 
> > Maybe this is a synchronization issue, but mWorkerPromise
> > is only ever touched on the worker thread.
> 
> It's not obvious to me how this is true actually.  mWorkerPromise() is
> touched from CleanUp() which can happen both from the worker and the main
> thread.  In Fetch.cpp, the worker consumers of PromiseWorkerProxy do no
> synchronization, but the main thread ones do.  I don't understand how that
> can be safe...

I don't see where mWorkerPromise is touched on the main thread, there are assertions in all methods touching mWorkerPromise that assert on worker thread. CleanUp() can only be called on the worker thread. CleanedUp() is different.
Flags: needinfo?(nsm.nikhil)
(In reply to Ehsan Akhgari (don't ask for review please) from comment #19)
> After trying a few things on the try server today, this change in the first
> patch in this series turns the test green:
> 
> diff --git a/dom/tests/mochitest/fetch/test_fetch_cors.js
> b/dom/tests/mochitest/fetch/test_fetch_cors.js
> --- a/dom/tests/mochitest/fetch/test_fetch_cors.js
> +++ b/dom/tests/mochitest/fetch/test_fetch_cors.js
> @@ -1272,21 +1272,21 @@ function testRedirects() {
>      var request = new Request(req.url, { method: req.method,
>                                           headers: req.headers,
>                                           body: req.body });
>      fetches.push((function(request, test) {
>        return fetch(request).then(function(res) {
>          ok(test.pass, "Expected test to pass for " + test.toSource());
>          is(res.status, 200, "wrong status in test for " + test.toSource());
>          is(res.statusText, "OK", "wrong status text for " +
> test.toSource());
> -        var reqHost = (new URL(req.url)).host;
>          // If there is a service worker present, the redirections will be
>          // transparent, assuming that the original request is to the current
>          // site and would be intercepted.
>          if (isSWPresent) {
> +          var reqHost = (new URL(req.url)).host;
>            if (reqHost === location.host) {
>              is((new URL(res.url)).host, reqHost, "Response URL should be
> original URL with a SW present");
>            }
>          } else {
>            is((new URL(res.url)).host, (new
> URL(test.hops[test.hops.length-1].server)).host, "Response URL should be
> redirected URL");
>          }
>          return res.text().then(function(v) {
>            is(v, "<res>hello pass</res>\n",
> 
> 
> That's pretty solid evidence that this is caused by a timing issue, and that
> we're dealing with a race condition.
> 
> The b2g failures still persist:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4dea7c4d88e
> 
> Oh, and adding any logging will make the bug go away.

The URL API implementation on workers uses runnables dispatched to the main thread that *block the worker thread* by spinning a nested event loop to give an illusion of synchronicity.
(In reply to Nikhil Marathe [:nsm] (please needinfo?) from comment #21)
> The URL API implementation on workers uses runnables dispatched to the main
> thread that *block the worker thread* by spinning a nested event loop to
> give an illusion of synchronicity.

Hmm, what does that nested event loop do?  Could this situation result in the WorkerFetchResponseRunnable and WorkerFetchResponseEndRunnable runnables to run on the worker out of order?

Also, couldn't we CleanUp() the PromiseWorkerProxy object at the end of WorkerFetchResponseRunnable::WorkerRun() after resolving/rejecting as the documentation in PromiseWorkerProxy.h suggests?
Flags: needinfo?(nsm.nikhil)
Cleaning up immediately in WorkerFetchResponseRunnable::WorkerRun() doesn't help... <https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a008525a1f9>
(In reply to Ehsan Akhgari (don't ask for review please) from comment #22)
> (In reply to Nikhil Marathe [:nsm] (please needinfo?) from comment #21)
> > The URL API implementation on workers uses runnables dispatched to the main
> > thread that *block the worker thread* by spinning a nested event loop to
> > give an illusion of synchronicity.
> 
> Hmm, what does that nested event loop do?  Could this situation result in
> the WorkerFetchResponseRunnable and WorkerFetchResponseEndRunnable runnables
> to run on the worker out of order?

I will have to think about this.

> 
> Also, couldn't we CleanUp() the PromiseWorkerProxy object at the end of
> WorkerFetchResponseRunnable::WorkerRun() after resolving/rejecting as the
> documentation in PromiseWorkerProxy.h suggests?

WorkerFetchResolveRunnable is dispatched from OnStartRequest at which point we don't have the entire body downloaded yet. The PromiseWorkerProxy is a WorkerFeature that tries to keep the worker alive (subject to terms and conditions related to the worker being close()d, terminate()d or losing all main thread references). We delay removing it as a feature until the body is completely received so that the fetch() promise handler can read out a valid body. This mainly makes sense for an edge case like this:

  onclose = function() { fetch("large_file").then((r) => r.text().then(_ => do something )) }
  close();

That is, if a worker calls fetch() in an onclose handler and at some point it calls close() so that the onclose handler is allowed to take infinite time, then the fetch() in the handler will be able to keep the worker alive until the promise handlers can execute. Calling CleanUp() in WorkerFetchResponseRunnable::WorkerRun(), which is dispatched from onStartRequest, means that while the fetch() promise will resolve, any further promises (r.text(), r.json()) will not get to run.

onclose hasn't been in the worker spec for years and is a gecko only thing, so i guess we could say don't use fetch() in onclose.

That said, this timing issue is very likely due to the sync runnable from URL.
Hmm, thanks for the explanation, but I still don't really have any ideas on what to try next.

I did try this: <https://treeherder.mozilla.org/#/jobs?repo=try&revision=f407eed7b22b> which is basically avoiding an extra call into the URL API from this patch.  That works around the Android test failure but the b2g test failure still persists.  Given that I can't build the emulator locally any more, I don't think I can land this bug before the uplift (or really after the uplift either, because I have run out of ideas.)
I think we should work around this for now.  Disable the test on b2g (like all the other b2g disables) and write a follow up bug to fix.
Note that the b2g tests are failing because of this: <http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/TestRunner.js#773>  In other words, the checks succeed, but we do the checks after we have called finish().  I cannot explain why.
(In reply to Ben Kelly [:bkelly] from comment #26)
> I think we should work around this for now.  Disable the test on b2g (like
> all the other b2g disables) and write a follow up bug to fix.

This is a fetch() test, not a SW test...  :(
khuey had an idea yesterday on IRC about the possibility of the sync loop terminating with events in its queue which will get appended to the main event queue breaking the ordering, and I pushed a patch to try to add a MOZ_CRASH to that code to see if we trigger it.  To my big surprise, it seems like the Android debug failure doesn't happen any more <https://treeherder.mozilla.org/#/jobs?repo=try&revision=8299e8456f81>.

However, the b2g emulator failure still happens and doesn't hit the MOZCRASH <https://treeherder.mozilla.org/#/jobs?repo=try&revision=058ea5f10fad>.  So I don't think that is our bug.
Depends on: 1208687
This is my try push on top of bug 1208687, and the b2g mochitest-8/19 suites are still orange: <https://treeherder.mozilla.org/#/jobs?repo=try&revision=0938d2a56aba>
Depends on: 1210552
Attachment #8668634 - Flags: review?(bkelly)
Attachment #8668634 - Flags: review?(bkelly) → review+
Comment on attachment 8661562 [details] [diff] [review]
Part 1: Avoid overriding the channel final URI when it gets intercepted

(Approval being requested for the whole patch series)

Approval Request Comment
[Feature/regressing bug #]: Service workers
[User impact if declined]: Blocker for shipping SW
[Describe test coverage new/current, TreeHerder]: Has tests.
[Risks and why]: The changes here are mostly related to service workers and the DOM Cache API.  SW is not shipped yet, but Cache is, however the Cache part is fairly straightforward and not very risky.
[String/UUID change made/needed]: None.
Attachment #8661562 - Flags: approval-mozilla-aurora?
Comment on attachment 8661562 [details] [diff] [review]
Part 1: Avoid overriding the channel final URI when it gets intercepted

Please uplift all 4 patches to aurora.
Attachment #8661562 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
part 4 failed to apply:

rafting 306451:f49b6f48c993 "Bug 1204596 - Part 4: Disable test_fetch_cors.html on b2g; r=bkelly"
merging dom/tests/mochitest/fetch/mochitest.ini
warning: conflicts during merge.
merging dom/tests/mochitest/fetch/mochitest.ini incomplete! (edit conflicts, then use 'hg resolve --mark')

could you take a look, thanks!
Flags: needinfo?(ehsan)
Flags: needinfo?(ehsan)
This may depend on bug 1188545 in some way...
Depends on: 1188545
We decided to not ship SW in 43.
Flags: needinfo?(ehsan)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: