Closed Bug 1843001 Opened 1 year ago Closed 8 days ago

GeckoView fails to return web requests after GeckoNetworkManager is marked as down

Categories

(Core :: Networking, defect, P2)

All
Android
defect

Tracking

()

RESOLVED FIXED
132 Branch
Tracking Status
firefox132 --- fixed

People

(Reporter: jw, Assigned: edenchuang)

Details

(Whiteboard: [necko-triaged])

Attachments

(2 files)

Steps to reproduce:

We have an application using GeckoView. It is served content from both the Android "asset" folder and a locally running web server.

Up to v108 this worked fine. We couldn't use v109 to v113 due to bugs in GeckoView (see bug 1817249). We have since tried v114/v115.

However, it seems in this version that if GeckoView determines the network is down it can no longer serve up content, even if that content is from a local web server that is running just fine.

In the logs, if we get "GeckoNetworkManager: New network state: DOWN, NONE, UNKNOWN", we then get errors like the following when making a Request/fetch from JavaScript:

[Exception... "Success" nsresult: "0x0 (NS_OK)" location: "<unknown>" data: no]

For example:

const request = new Request(...);
fetch(request).then(response => {
  if (response.ok) {
    // ... do some work ...
  }
  else {
    // ... the actual response isn't a Response at all; instead it's an
    // exception with the details listed above ...
  }
});
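
For reference, a minimal sketch of how the two failure modes can be told apart. Under the Fetch standard, fetch() rejects its promise on a network-level failure, whereas an HTTP error status still resolves, with response.ok set to false. The helper name describeFetchOutcome and the injected fetchImpl parameter are hypothetical, introduced only so the sketch can be run with a stub outside a browser.

```javascript
// Hypothetical helper: reports how a fetch attempt ended.
// fetchImpl is injected so the sketch can be exercised with a stub.
function describeFetchOutcome(fetchImpl, url) {
  return fetchImpl(url).then(
    // The promise resolved: the server answered, possibly with an error status.
    (response) =>
      response.ok
        ? { kind: "ok", status: response.status }
        : { kind: "http-error", status: response.status },
    // The promise rejected: no Response object is available at all.
    (error) => ({ kind: "rejected", message: String(error) })
  );
}
```

In a page or worker this could be called as describeFetchOutcome(fetch, url).then(console.log); the "rejected" branch is where the failure described in this report would be expected to show up.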

Actual results:

When the network is down, local content can no longer be fetched by GeckoView. This is a regression from v108, where this worked fine.

Interestingly, if you start in a network "down" state, GeckoView appears to serve content from local web servers just fine. It seems to be the transition to a "down" state causing the issue.

Expected results:

GeckoView should still be able to complete requests for locally served content, even if the phone thinks that Wi-Fi/Cellular is in a "down" state.

The severity field is not set for this bug.
:boek, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(jboek)

Any update on this please? It's a serious bug that makes GeckoView unusable if locally serving content.

Nominating this in our next triage

:jw what is the locally running server? Also, have you tried reproducing with GeckoView Example?

Flags: needinfo?(jw)

I've not tried with the GeckoView example, but have no reason to believe it would behave any differently. This functionality used to work on v108 and only broke on the newer versions of GeckoView.

There are various local web servers available on Android. We've used TinyWebServer and NanoHTTP in the past, but these days just do a TCP SocketServer and serve the content we need ourselves (it's trivial enough to implement something that reads/responds HTTP for our use case).

We use the local web server to serve up dynamic map tiles to MapLibre GL JS as part of our line-of-business app. What we see with v114 onward is that we can no longer do this if the device doesn't have outbound Internet connectivity (i.e. GeckoNetworkManager is marked as down), even though, in our case, not having the Internet has no impact in our ability to serve these dynamic map tiles to GeckoView.

So, for example, in our case, our app stops working if someone wanders out of Wi-Fi coverage. GeckoView doesn't even attempt to make the HTTP request; we just get an error in the console log "[Exception... "Success" nsresult: "0x0 (NS_OK)" location: "<unknown>" data: no]" when in this state.

I'll have a look at mocking up an example using GeckoView, but, hopefully, this is enough information to go on for now?

Flags: needinfo?(jw)

Just wondering if there have been any updates on this? Thanks.

Severity: -- → S2
Flags: needinfo?(jboek)
Priority: -- → P2
Version: Firefox 114 → Firefox 108

Sorry to chase this, but we can't move to a newer build of GeckoView due to this bug.

Is there anything that is needed to move this forward?

Or, is there any mechanism to turn off "down detection" in GeckoView as a workaround for this?

I can't reproduce this. Could you help by providing more detailed steps or a video?

I did the following steps:

  1. Install LWS (lightweight web server) from F-Droid and have it listen at 127.0.0.1:8080 while Wi-Fi is off (otherwise it will listen on the Wi-Fi interface's IP) and no mobile data is available.
  2. Turn Wi-Fi on, then open GeckoView Example and load http://127.0.0.1:8080 (loaded successfully). Also do a fetch to http://127.0.0.1:8080/random1 in the remote debugging console (loaded successfully).
  3. Turn Wi-Fi off, then load http://127.0.0.1:8080/random2 (loaded successfully). Also do a fetch to http://127.0.0.1:8080/random3 in the remote debugging console (loaded successfully).

Thanks for looking. I've had another go at reproducing this on GeckoView 120.0.20231129155202.

The missing step is that the Request() fails only when initiated from a WebWorker.

The steps I followed are:

  1. Turn Wi-Fi ON
  2. Do a Request from the Remote Debugging Console [SUCCESSFUL]
  3. Do a Request from a test WebWorker [SUCCESSFUL]
  4. Turn Wi-Fi OFF (and go into Airplane mode if you have Cellular)
  5. Do a Request from the Remote Debugging Console [SUCCESSFUL]
  6. Do a request from a test WebWorker [FAILS]

I've attached a screenshot of the console while I was doing this. Some of the console.log() lines are just notes to show what I was doing. My example WebWorker is:

self.onmessage = function (e) {
  console.log("TestWebWorker: Got message (e.data=" + e.data + ")");

  var request = new Request(e.data);
  var response = fetch(request);
  response.then(response => {
    if (response.ok) {
      console.log("TestWebWorker: Response OK");
    } else {
      console.log("TestWebWorker: Response ERROR");
    }
  });

  console.log("TestWebWorker: Finished, postMessage back");
  postMessage("All done");
};
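
A variant of the worker above, as a sketch only: it posts back once the fetch has settled and catches a rejected promise, so the page can see which requests failed rather than receiving "All done" before the request completes. The function name fetchAndReport, the injected fetchImpl and post parameters, and the message shape are assumptions, not part of the original test case.

```javascript
// Hypothetical worker-side helper: fetch a URL and report the outcome
// to the page once the request has settled.
async function fetchAndReport(url, fetchImpl, post) {
  try {
    const response = await fetchImpl(url);
    post({ url, ok: response.ok, status: response.status });
  } catch (error) {
    // A rejected fetch (for example, the offline failure) lands here.
    post({ url, ok: false, error: String(error) });
  }
}

// Wire the helper up only when running inside a dedicated worker.
if (typeof self !== "undefined" && typeof importScripts === "function") {
  self.onmessage = (e) =>
    fetchAndReport(e.data, fetch, (message) => self.postMessage(message));
}
```

On the page side, worker.onmessage would then receive one object per request, which makes it easier to compare the Wi-Fi ON and Wi-Fi OFF runs.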

The URL is just for a map data tile. It doesn't really matter what it is because, when the fetch fails, the error stops the request from happening at all.

I've tried to see if it's something to do with running commands in the context of the Remote Debugging Console vs from the main JavaScript thread, but this doesn't seem to be the case. If I set up a Request() in a setInterval() from the main JavaScript thread, then the Request() appears to work fine both with and without Wi-Fi. It does seem to be something specific to do with WebWorkers?

If you need any more information, please let me know :)

Attached image GeckoViewDebugLog.png

Thanks for the updated info. I can reproduce that. Fetch in a web worker fails after Wi-Fi switches from ON to OFF.

Since widget/android/GeckoNetworkManager.h and mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoNetworkManager.java have not been changed for a long time, I think the problem may be related to the dom/fetch part. I'd recommend moving this ticket to Product: Core, Component: DOM: Networking and CC/needinfo someone from the dom-worker-reviewers group.

There are several pieces of code related to mOffline in dom/fetch/FetchService.cpp, but I'm not sure why the normal fetch works (it looks like it doesn't use code from dom/fetch?) while the web worker's fetch doesn't.

From the code, it may be related to https://bugzilla.mozilla.org/show_bug.cgi?id=1757147; however, that looks like a security bug that is restricted to specific groups.

Also, I think that even when the network is offline, a fetch to localhost should never be canceled.


Some log:


==== Wi-Fi on, fetch in the web page's web worker
11-30 23:23:26.728 17610 17852 I GeckoFetch: FetchChild::FetchChild [0xb456d2e0]
11-30 23:23:26.728 17610 17852 I GeckoFetch: FetchChild::DoFetchOp [0xb456d2e0]
11-30 23:23:26.758 17560 17594 I GeckoFetch: FetchService::Fetch (WorkerFetch)
11-30 23:23:26.758 17560 17594 I GeckoFetch: FetchInstance::Initialize [0xc31d1710] request[0xb339e3e0]
11-30 23:23:26.758 17560 17594 I GeckoFetch: FetchInstance::Fetch [0xc31d1710], mRequest URL: http://127.0.0.1:8080/init_request mPrincipal: http://127.0.0.1:8080/worker.js
11-30 23:23:26.759 17560 17594 I GeckoFetch: FetchInstance::OnNotifyNetworkMonitorAlternateStack [0xc31d1710]
11-30 23:23:26.759 17560 17601 I GeckoFetch: FetchInstance::NotifyNetworkMonitorAlternateStack, Runnable
11-30 23:23:26.759 17560 17594 I GeckoFetch: FetchService::Fetch entry[0xac1e7a40] of FetchInstance[0xc31d1710] added
11-30 23:23:26.770 17560 17594 I GeckoFetch: FetchInstance::OnResponseAvailableInternal [0xc31d1710]
11-30 23:23:26.770 17560 17594 I GeckoFetch: FetchInstance::OnResponseAvailableInternal [0xc31d1710] response body: 0xb2e201c0
11-30 23:23:26.770 17560 17601 I GeckoFetch: FetchInstance::OnResponseAvailableInternal Runnable
11-30 23:23:26.771 17610 17852 I GeckoFetch: FetchChild::RecvOnResponseAvailableInternal [0xb456d2e0]
11-30 23:23:26.771 17560 17594 I GeckoFetch: FetchInstance::OnReportPerformanceTiming [0xc31d1710]
11-30 23:23:26.771 17560 17601 I GeckoFetch: FetchInstance::OnReportPerformanceTiming, Runnable
11-30 23:23:26.771 17560 17594 I GeckoFetch: FetchInstance::OnResponseEnd [0xc31d1710] eNetworking
11-30 23:23:26.771 17560 17594 I GeckoFetch: FetchInstance::FlushConsoleReport [0xc31d1710]
11-30 23:23:26.771 17560 17594 I GeckoFetch: FetchInstance::OnResponseEnd entry of responsePromise[0xac1e7a40] is removed
11-30 23:23:26.771 17560 17601 I GeckoFetch: FetchInstance::FlushConsolReport, Runnable
11-30 23:23:26.771 17610 17852 I GeckoFetch: FetchChild::RecvOnReportPerformanceTiming [0xb456d2e0]
11-30 23:23:26.771 17560 17601 I GeckoFetch: FetchInstance::OnResponseEnd, Runnable
11-30 23:23:26.771 17610 17852 I GeckoFetch: FetchChild::RecvOnFlushConsoleReport [0xb456d2e0]
11-30 23:23:26.771 17610 17852 I GeckoFetch: FetchChild::RecvOnResponseEnd [0xb456d2e0]
11-30 23:23:26.771 17610 17852 I GeckoFetch: FetchChild::Recv__delete__ [0xb456d2e0]
11-30 23:23:26.771 17610 17852 I GeckoFetch: FetchChild::ActorDestroy [0xb456d2e0]

==== Wi-Fi off, then postMessage to the web worker from the debugging console
11-30 23:24:27.716 17560 17594 I GeckoFetch: FetchService::Observe topic: network:offline-status-changed
11-30 23:24:32.480 17610 17852 I GeckoFetch: FetchChild::FetchChild [0xb2f2f790]
11-30 23:24:32.480 17610 17852 I GeckoFetch: FetchChild::DoFetchOp [0xb2f2f790]
11-30 23:24:32.482 17560 17594 I GeckoFetch: FetchService::Fetch (WorkerFetch)
11-30 23:24:32.482 17560 17594 I GeckoFetch: FetchService::Fetch network offline
11-30 23:24:32.482 17560 17601 I GeckoFetch: FetchService::PropagateErrorResponse runnable aError: 0x804B0010
11-30 23:24:32.483 17610 17852 I GeckoFetch: FetchChild::RecvOnResponseAvailableInternal [0xb2f2f790]
11-30 23:24:32.483 17610 17852 I GeckoFetch: FetchChild::RecvOnResponseAvailableInternal [0xb2f2f790] response type is Error(0x804b0010)
11-30 23:24:32.483 17610 17852 I GeckoFetch: CustomLog FetchChild::RecvOnResponseAvailableInternal MSG_FETCH_FAILED
11-30 23:24:32.484 17610 17852 I GeckoFetch: FetchChild::RecvOnResponseEnd [0xb2f2f790]
11-30 23:24:32.484 17610 17852 I GeckoFetch: FetchChild::Recv__delete__ [0xb2f2f790]
11-30 23:24:32.484 17610 17852 I GeckoFetch: FetchChild::ActorDestroy [0xb2f2f790]

Also, when I comment out the code, the web worker's fetch works normally.

Thanks for the update jackyzy823. Is there anything that needs to be done to push this forward? We are really keen to see this fixed :)

Flags: needinfo?(echuang)

This is a case where fetch() in Workers fails because the network status is offline. However, fetch() to localhost should not fail even when the network is offline.
I think I know what to do and will submit a patch later.
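
The expectation described here can be sketched as follows. This is not the patch for this bug; it only illustrates the rule that an offline flag should not block loopback targets. The function names isLoopbackUrl and shouldAttemptFetch, and the exact set of hosts treated as loopback, are assumptions made for the example.

```javascript
// Hypothetical check: does this URL point at the local machine?
function isLoopbackUrl(urlString) {
  const host = new URL(urlString).hostname;
  return (
    host === "localhost" ||
    host === "[::1]" || // IPv6 loopback; URL.hostname keeps the brackets
    /^127\.\d+\.\d+\.\d+$/.test(host) // IPv4 loopback range 127.0.0.0/8
  );
}

// Hypothetical policy: skip a fetch only when offline AND the target is remote.
function shouldAttemptFetch(urlString, isOffline) {
  return !isOffline || isLoopbackUrl(urlString);
}
```

With this rule, a request to http://127.0.0.1:8080/ would still be attempted while offline, and a request to a remote host would not.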

Flags: needinfo?(echuang)
Assignee: nobody → echuang
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Apologies for chasing, but, has there been any progress on this issue? Thanks.

Any further update? Thanks.

Please, can this be looked at? This is critical for our app and stopping us upgrading to a new GeckoView.

Just a hint: if this really matters for your application, you could build GeckoView yourself with the pending patch, or by reverting the previous change, or by just commenting out the lines.

Understood, and thank you for the pointer to the lines to comment out.

I guess I was hoping that, as a patch was under discussion 4 months ago, this issue could be resolved centrally, rather than us having to maintain our own private build of GeckoView.

Is there any way to get that patch pushed forward? It seems to have kicked off some discussions but then stopped before being committed.

Is there any documentation about how to build GeckoView, as I'm struggling to pinpoint it?

Much appreciated.

Component: Core → General

> Is there any way to get that patch pushed forward

Maybe leaving a comment under that patch or pinging (needinfo) the author here and there?

> Is there any documentation about how to build Geckoview, as I'm struggling to pinpoint it?

Some documentation for building GeckoView:

[1] https://firefox-source-docs.mozilla.org/mobile/android/geckoview/contributor/for-gecko-engineers.html
[2] https://firefox-source-docs.mozilla.org/mobile/android/geckoview/contributor/geckoview-quick-start.html#include-geckoview-as-a-dependency

Please, can this bug be resolved or moved to someone else to be looked at again? This bug has been open for a year and still impacts the latest GeckoView (v128).

It's becoming critical for us to have a fix for this. Is there anything I can do to push this forward?

Thanks.

Hi Andrew, can you please review the patch?

Flags: needinfo?(bugmail)

The patch is appropriately marked as needs revision.

Flags: needinfo?(bugmail) → needinfo?(echuang)

ok, thank you!

Component: General → Networking
Product: GeckoView → Core
Version: Firefox 108 → unspecified

Moved the bug because it looks like it affects not just mobile, and the solution lies within networking.

Flags: needinfo?(echuang)
Pushed by echuang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/87f698d4fb8e Allow fetch() to localhost in Workers while network is offline. r=asuth,necko-reviewers,valentin

Backed out for causing mochitest-plain failures on test_offline_localhost_fetch.html.
So far, the failures appeared only on Android 7.0 x86-64.

[task 2024-08-15T08:46:04.219Z] 08:46:04     INFO -  TEST-START | dom/serviceworkers/test/test_offline_localhost_fetch.html
[task 2024-08-15T08:51:07.268Z] 08:51:07  WARNING -  TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/test_offline_localhost_fetch.html | Test timed out. -
[task 2024-08-15T08:51:07.269Z] 08:51:07  WARNING -  TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/test_offline_localhost_fetch.html | [SimpleTest.finish()] No checks actually run. (You need to call ok(), is(), or similar functions at least once.  Make sure you use SimpleTest.waitForExplicitFinish() if you need it.)
[task 2024-08-15T08:51:07.269Z] 08:51:07     INFO -      SimpleTest.ok@SimpleTest/SimpleTest.js:426:16
[task 2024-08-15T08:51:07.269Z] 08:51:07     INFO -      afterCleanup@SimpleTest/SimpleTest.js:1477:18
[task 2024-08-15T08:51:07.269Z] 08:51:07     INFO -      executeCleanupFunction@SimpleTest/SimpleTest.js:1562:7
[task 2024-08-15T08:51:07.269Z] 08:51:07     INFO -      SimpleTest.finish@SimpleTest/SimpleTest.js:1582:3
[task 2024-08-15T08:51:07.269Z] 08:51:07     INFO -      killTest@SimpleTest/TestRunner.js:200:22
[task 2024-08-15T08:51:07.269Z] 08:51:07     INFO -      async*delayedKillTest@SimpleTest/TestRunner.js:243:17
[task 2024-08-15T08:51:07.269Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:241:17
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.270Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:255:15
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      TestRunner.runTests/<@SimpleTest/TestRunner.js:535:16
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      Async*TestRunner.runTests@SimpleTest/TestRunner.js:522:48
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      RunSet.runtests@SimpleTest/setup.js:285:14
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      RunSet.runall@SimpleTest/setup.js:264:12
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      getPrefList/<@SimpleTest/setup.js:351:14
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      loadFile/req.onload@SimpleTest/setup.js:80:19
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      EventHandlerNonNull*loadFile@SimpleTest/setup.js:75:3
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      getPrefList@SimpleTest/setup.js:349:13
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      hookupTests@SimpleTest/setup.js:372:5
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -  parseTestManifest@http://mochi.test:8888/manifestLibrary.js:53:13
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -  getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:66:28
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -  EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:62:3
[task 2024-08-15T08:51:07.271Z] 08:51:07     INFO -      hookup@SimpleTest/setup.js:337:20
[task 2024-08-15T08:51:07.272Z] 08:51:07     INFO -  EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&logFile=%2Fdata%2Flocal%2Ftmp%2Ftest_root%2Flogs%2Fmochitest.log&fileLevel=INFO&consoleLevel=INFO&hideResultsTable=1&manifestFile=tests.json&dumpOutputDirectory=%2Fdata%2Flocal%2Ftmp%2Ftest_root&ignorePrefsFile=ignorePrefs.json:10:32
[task 2024-08-15T08:51:07.272Z] 08:51:07  WARNING -  TEST-UNEXPECTED-FAIL | /tests/dom/serviceworkers/test/test_offline_localhost_fetch.html logged result after SimpleTest.finish(): [SimpleTest.finish()] No checks actually run. (You need to call ok(), is(), or similar functions at least once.  Make sure you use SimpleTest.waitForExplicitFinish() if you need it.)
[task 2024-08-15T08:51:07.272Z] 08:51:07     INFO -  TEST-OK | dom/serviceworkers/test/test_offline_localhost_fetch.html | took 307387ms
[task 2024-08-15T08:51:07.272Z] 08:51:07     INFO -  TEST-START | dom/serviceworkers/test/test_onmessageerror.html
Flags: needinfo?(echuang)
Whiteboard: [necko-triaged]

Hi Eden,

It seems the patch has been backed out for a while. Do you need any help from necko team?
Is there anything we can do to help you unblock?

Thanks.

Flags: needinfo?(echuang)

Apologies for chasing, but is there anything that can be done to get this fix pushed back in? We are really keen to update our version of GeckoView and this bug is blocking this for us. Many thanks.

Pushed by echuang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/92b03eef5d65 Allow fetch() to localhost in Workers while network is offline. r=asuth,necko-reviewers,valentin
Status: ASSIGNED → RESOLVED
Closed: 8 days ago
Resolution: --- → FIXED
Target Milestone: --- → 132 Branch
Flags: needinfo?(echuang)