Closed Bug 535793 Opened 15 years ago Closed 5 years ago

Connect in gtk port on n900 works, but you get a page load error directly after connection setup and need to try again.

Categories

(Firefox for Android Graveyard :: General, defect)

Fennec 1.1
ARM
Maemo
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Assigned: jbos)

References

Details

Attachments

(1 file)

With build 20091218 I was testing the auto-connect feature. I loaded a simple web page and it timed out, giving me a page load error. Then I looked at my device and the connection had been established... a "try again" works just fine to load the page.
I have seen this when opening a GSM connection. The process takes too long and we time out. Not sure if Mozilla has a way to extend the timeout.
The current fix for auto-connect is a bit of a workaround. Bug 530075 should be a more general purpose solution.

Since you can still connect, after an additional "try again", this bug isn't so bad, imo.
tracking-fennec: --- → ?
Bug 530075 is a platform bug and won't land for Fennec 1.1
I have an identical issue when I use a GPRS connection. Fennec keeps loading and loading but cannot open any page. I tried Fennec v1.0 and the latest nightly build.
With other browsers (Micro-B, Chromium...) I don't have this issue. Fennec also works fine if I use a WiFi connection.
Keywords: qablocker
Given comment 3, I don't see how we can qablock on this bug
Keywords: relnote
Component: Linux/Maemo → General
Depends on: 530075
I fixed the issue by resetting the timeout to the default value after the connection has been established successfully.

With that, the connection error is no longer reproducible.
Assignee: nobody → jeremias.bosch
Attachment #462273 - Flags: review?(doug.turner)
blocking2.0: --- → ?
Blocks: 583135
No longer depends on: 530075
A side note:
The request to establish the connection blocks the network thread and not the main thread, which means that the timers still get processed.

As a result, the nsSocketTransport timeout can actually run out, since we can't know how long establishing a connection takes. The time depends on the network, on whether the user needs to go through a dialog, and so on.

The solution is to reset the timeout after establishing the connection. This avoids the wait being wrongly interpreted as a slow network, an unresponsive server, etc.
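
A minimal sketch of the reset idea described above, not the code from the attached patch. `ConnectDeadline` and its members are hypothetical names, and the current time is passed in explicitly so the behaviour is easy to follow without real waiting:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical illustration only; this is not nsSocketTransport code.
// Tracks a connect deadline that can be restarted once the (possibly slow,
// user-driven) link setup has finished.
class ConnectDeadline {
public:
  using Clock = std::chrono::steady_clock;
  using Duration = std::chrono::milliseconds;

  ConnectDeadline(Duration timeout, Clock::time_point now)
    : mTimeout(timeout), mStart(now) {
    assert(timeout.count() > 0);
  }

  // Restart the countdown, e.g. right after the link came up, so the time
  // spent in the connection dialog is not counted against the server.
  void Reset(Clock::time_point now) { mStart = now; }

  // True once at least mTimeout has passed since construction or last Reset.
  bool Expired(Clock::time_point now) const {
    return now - mStart >= mTimeout;
  }

private:
  Duration mTimeout;
  Clock::time_point mStart;
};
```

With a 1 second timeout and a 30 second connection dialog, `Expired()` would report true right after the dialog closes unless `Reset()` is called first.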
Attachment #462273 - Flags: review?(doug.turner) → review?(cbiesinger)
(In reply to comment #7)
> The request to establish the connection does block the network thread and not
> the mainthread which means that the timers get processed. 

No it doesn't, it is a nonblocking request. See http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsSocketTransport2.cpp#866

> The solution is to reset the timeout after establishing the connection. This
> avoid that this waiting gets wrongly interpreted as a slow network / not
> responding server / ...

What is mPollTimeout here if not the connect timeout? And why did it get set to anything else?
I think that's exactly the point. There is this "socket thread", and the main thread is not blocked by it. That means the timeout can run out when, e.g., the server does not react, and that's absolutely fine.

But when we want to establish a new connection, we need to block the socket thread ourselves; it does not make sense to load anything without an internet connection established first. That wait is interpreted the same way as a server not responding, since the timeout timer is not paused.
OK, but why does the patch work? mPollTimeout should already be equal to mTimeouts[TIMEOUT_CONNECT], right?

Furthermore, why is there a timeout at all here? By default necko doesn't have timeouts.
(In reply to comment #10)
> OK, but why does the patch work? mPollTimeout should already be equal to
> mTimeouts[TIMEOUT_CONNECT], right?
> 
I'm really no expert in that area :(. Might be that this is just a symptom of some problem deeper in necko.  
 
> Furthermore, why is there a timeout at all here? By default necko doesn't have
> timeouts.

That I can't answer.

This bug is one of the most important to fix for MeeGo / Maemo, since it makes testing nearly impossible (the timeout even happens on automatic connection), creates a really bad user experience, and is just wrong.
(In reply to comment #11)
> (In reply to comment #10)
> > OK, but why does the patch work? mPollTimeout should already be equal to
> > mTimeouts[TIMEOUT_CONNECT], right?
> > 
> I'm really no expert in that area :(. Might be that this is just a symptom of
> some problem deeper in necko.  

How about you add a printf displaying both mPollTimeout and mTimeouts[TIMEOUT_CONNECT] and see what the two values are when the bug happens?

I'm not going to r+ this patch without understanding why the bug happens.

> > Furthermore, why is there a timeout at all here? By default necko doesn't have
> > timeouts.
> 
> That i cant answer.

Maybe you could do some investigations?
I have seen it working well with that patch, but the newest tests are broken again. Something else is really wrong here.
Summary: autoconnect on my n900 works, but I get a page load error in the process → Connect on maemo/meego works, but you get a page load error directly after connection setup and need to try again.
Some more information; this happens sometimes:

1) disconnect from the web
2) start the browser
3) cancel the first request to establish a connection
4) try to load http://www.google.com
   -> nsSocketTransport::RecoverFromError()  gets called 
5) establish the connection through the dialog
   -> Try Again == true.
6) after some time
   -> nsSocketTransport::RecoverFromError()  gets called 
   -> Try Again == true.
7) (No dialogs are displayed)...
   -> nsSocketTransport::RecoverFromError()  gets called
   -> hangs somewhere; nothing happens

1) disconnect from the web
2) start the browser
3) cancel the first request to establish a connection
4) try to load http://www.google.com
   -> nsSocketTransport::RecoverFromError()  gets called 
5) establish the connection through the dialog
   -> Try Again == true.
6) after some time
   -> nsSocketTransport::RecoverFromError()  gets called 
   -> return with false.
7) Error Page is displayed
Re: comment 12: jbos also reports that both mPollTimeout and mTimeouts[TIMEOUT_CONNECT] are INT_MAX when the bug occurs.
OK, I was talking with romaxa; this might be a problem with libconic, which reports too early that the connection is established. He said that we should wait one event loop iteration before trying to reconnect.
Alright, I tested this: by adding a wait and processing the GMainLoop it _seems_ to work. I'm careful not to call it a fix, and I would like to have someone who can reproduce it.

The "hack" is very easy.

nsMaemoNetworkManager::OpenConnectionSync(){
  [...................]

  MonitorAutoEnter mon(*gMonitor);

  while (!gConnectionCallbackInvoked)
    mon.Wait();
}
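+  // Hack: give libconic time to settle by iterating the default GLib main
+  // context once per second, five times.
+  int test = 0;
+  printf("Now wait 5 sek\n");
+  while (test < 5) {
+    PR_Sleep(PR_SecondsToInterval(1));
+    g_main_context_iteration(NULL, FALSE);
+    test++;
+  }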
    
  if (gInternalState == InternalState_Connected)
    return PR_TRUE;

  return PR_FALSE;
}
For the Qt port we fix this in bug 585636; renaming this one to a GTK N900-only bug. The solution there is to wait for at least one event loop iteration, and maybe to send another signal internally.
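
A rough sketch of the "wait for at least one event loop iteration" idea, assuming the early "connected" report leaves some follow-up work queued on the main loop. `TinyEventLoop`, `LinkState` and `OpenConnectionAndSettle` are hypothetical stand-ins, not libconic, GLib or Qt APIs:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a main loop (e.g. a GMainContext or Qt loop).
class TinyEventLoop {
public:
  void Post(std::function<void()> task) { mQueue.push_back(std::move(task)); }

  // Run the tasks that were pending when the call started; tasks posted
  // while running wait for the next iteration. Returns how many tasks ran.
  std::size_t RunOneIteration() {
    std::size_t pending = mQueue.size();
    for (std::size_t i = 0; i < pending; ++i) {
      std::function<void()> task = std::move(mQueue.front());
      mQueue.pop_front();
      task();
    }
    return pending;
  }

private:
  std::deque<std::function<void()>> mQueue;
};

// Hypothetical connection state: the callback may fire before the link is
// actually usable.
struct LinkState {
  bool callbackInvoked = false;  // the "connected" callback has fired
  bool usable = false;           // queued follow-up work has completed
};

// Report success only after the loop has had one more iteration to deliver
// whatever was queued alongside the "connected" callback.
inline bool OpenConnectionAndSettle(TinyEventLoop& loop, LinkState& state) {
  assert(state.callbackInvoked);
  loop.RunOneIteration();
  return state.usable;
}
```

Compared with the fixed five-second sleep in the hack above, this waits only as long as one loop iteration takes, but it relies on the assumption that the remaining work is already queued when the callback fires.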
No longer blocks: 583135
Summary: Connect on maemo/meego works, but you get a page load error directly after connection setup and need to try again. → Connect in gtk port on n900 works, but you get a page load error directly after connection setup and need to try again.
tracking-fennec: ? → 2.0+
blocking2.0: ? → ---
Comment on attachment 462273 [details] [diff] [review]
This patch fix the timeout/connection issue

per comment 12, this patch has no effect.
Attachment #462273 - Flags: review?(cbiesinger) → review-
I have an update on this one...

Mobile devices need up to 6-10 seconds for the first UDP packet; most of the UDP packets actually get lost.

1) Adding a second resolve attempt in nsHostResolver::ThreadFunc in case the first one failed.

2) We return a lookup failure ("Unknown Host") in DNSService::onLookupComplete(..) even when the resolve in nsHostResolver::ThreadFunc is still ongoing.
I can't say whether that is actually OK, but it is the cause of this bug. The error reaches nsSocketTransport2 in onLookupComplete, so we stop waiting for the resolve and load the error page.

The interesting part is that later (after 2-3-8 seconds) another DNSService::onLookupComplete(...) call arrives, this time with success, but since nsSocketTransport2 has already stopped and reported the error, it is too late.

What I did to make it work is to (hackishly) expose mResolving of HostRecord. The DNS service checks whether it is true; if it is, we do not report the error to nsSocketTransport2 and only take care of our references.

This makes it work in about half of the cases.

So now we need a real solution; exposing mResolving does not seem like a good idea.

I think help from some Necko experts is needed here.
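
To make the intended behaviour concrete, here is a small sketch of "do not surface a lookup failure while another resolve attempt is still outstanding". It is not Necko code: `PendingLookup`, `LookupStatus` and the attempt counter are hypothetical, and the real fix would have to live in nsHostResolver / the DNS service:

```cpp
#include <cassert>
#include <string>

enum class LookupStatus { Pending, Resolved, Failed };

// Hypothetical model of one host lookup that allows several resolve attempts.
class PendingLookup {
public:
  explicit PendingLookup(int maxAttempts) : mAttemptsLeft(maxAttempts) {
    assert(maxAttempts > 0);
  }

  // Called once per finished attempt. A failure only becomes final when no
  // attempts remain; until then the consumer keeps waiting (Pending).
  LookupStatus OnAttemptComplete(bool ok, const std::string& address) {
    if (mStatus != LookupStatus::Pending)
      return mStatus;  // result is already final; ignore late callbacks
    --mAttemptsLeft;
    if (ok) {
      mAddress = address;
      mStatus = LookupStatus::Resolved;
    } else if (mAttemptsLeft == 0) {
      mStatus = LookupStatus::Failed;
    }
    return mStatus;
  }

  const std::string& Address() const { return mAddress; }

private:
  int mAttemptsLeft;
  LookupStatus mStatus = LookupStatus::Pending;
  std::string mAddress;
};
```

With two attempts allowed, a lost first UDP query leaves the lookup in `Pending` instead of producing "Unknown Host", and the later successful answer can still be delivered to the socket transport.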
cc-ing Patrick since he's fairly conversant with our DNS code and connection timeout stuff.
tracking-fennec: 2.0+ → 2.0-
tracking-fennec: 2.0- → ---
Keywords: qablocker
Keywords: relnote
Closing all open bugs in a graveyard component.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX