535793 - Connect in gtk port on n900 works, but you get a page load error directly after connection setup and need to try again.

Reporter

Description

15 years ago

With build 20091218 I was testing the auto-connect feature. I load a simple webpage and it times out, giving me a page load error. When I then look at my device, the connection has been established; a "Try Again" loads the page just fine.

Mark Finkle (:mfinkle) (use needinfo?)

Comment 1

15 years ago

I have seen this when opening a GSM connection. The process takes too long and we time out. Not sure if Mozilla has a way to extend the timeout.

Mark Finkle (:mfinkle) (use needinfo?)

Comment 2

14 years ago

The current fix for auto-connect is a bit of a workaround. Bug 530075 should be a more general-purpose solution.

Since you can still connect after an additional "Try Again", this bug isn't so bad, IMO.

Aakash Desai [:aakashd]

Updated

14 years ago

tracking-fennec: --- → ?

Mark Finkle (:mfinkle) (use needinfo?)

Comment 3

14 years ago

Bug 530075 is a platform bug and won't land for Fennec 1.1.

kolos

Comment 4

14 years ago

I have the identical issue when I use a GPRS connection. Fennec keeps loading but cannot open any page. I tried Fennec v1.0 and the latest nightly build.
Other browsers (Micro-B, Chromium, ...) don't have this issue, and Fennec works fine if I use a WiFi connection.

Aakash Desai [:aakashd]

Updated

14 years ago

Keywords: qablocker

Mark Finkle (:mfinkle) (use needinfo?)

Comment 5

14 years ago

Given comment 3, I don't see how we can qablock on this bug.

Aakash Desai [:aakashd]

Updated

14 years ago

Keywords: relnote

Doug Turner (:dougt)

Updated

14 years ago

Component: Linux/Maemo → General

timeless

Updated

14 years ago

Depends on: 530075

Jeremias Bosch (:jbos)

Assignee

Comment 6

14 years ago

Attached patch: This patch fix the timeout/connection issue

I fixed the issue by resetting the timeout to its default value after the connection has been successfully established.

With that, the connection error is no longer reproducible.

Assignee: nobody → jeremias.bosch

Attachment #462273 - Flags: review?(doug.turner)

Jeremias Bosch (:jbos)

Assignee

Updated

14 years ago

blocking2.0: --- → ?

Jeremias Bosch (:jbos)

Assignee

Updated

14 years ago

Blocks: 583135

No longer depends on: 530075

Jeremias Bosch (:jbos)

Assignee

Comment 7

14 years ago

A side note:
The request to establish the connection blocks the network thread but not the main thread, which means that the timers still get processed.

As a result, the nsSocketTransport timeout can actually run out, since we can't know how long establishing a connection takes. The time depends on the network, on whether the user needs to go through a dialog, and so on.

The solution is to reset the timeout after the connection has been established. This prevents the wait from being misinterpreted as a slow network, an unresponsive server, and so on.
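The reasoning in this comment can be sketched as a connect timeout whose deadline is measured from the last time it was armed. This is a minimal model under stated assumptions, not nsSocketTransport's actual code: the `ConnectTimeout` class and its `Reset`/`Expired` methods are hypothetical. Note also that later comments (15 and 19) conclude that the attached patch had no effect, so this only illustrates the argument being made here.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of a connect timeout; this is NOT nsSocketTransport's
// real code. The deadline is measured from the last time the timer was
// armed, so any time spent bringing up the GSM/GPRS link counts against it
// unless the timer is re-armed once the link is up.
class ConnectTimeout {
 public:
  using Clock = std::chrono::steady_clock;

  explicit ConnectTimeout(std::chrono::milliseconds limit) : limit_(limit) {
    assert(limit.count() > 0);
  }

  // Arm (or re-arm) the timer, counting from `now`.
  void Reset(Clock::time_point now) { start_ = now; }

  // True once more than `limit_` has elapsed since the last Reset().
  bool Expired(Clock::time_point now) const { return now - start_ > limit_; }

 private:
  std::chrono::milliseconds limit_;
  Clock::time_point start_{};
};
```

With, say, a 10-second limit and a link that takes 12 seconds to come up, `Expired()` is already true when the link is ready; calling `Reset()` at that point gives the actual page request the full limit again.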

timeless

Updated

14 years ago

Attachment #462273 - Flags: review?(doug.turner) → review?(cbiesinger)

Christian :Biesinger (don't email me, ping me on IRC)

Comment 8

14 years ago

(In reply to comment #7)
> The request to establish the connection does block the network thread and not
> the mainthread which means that the timers get processed. 

No, it doesn't; it is a nonblocking request. See http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsSocketTransport2.cpp#866

> The solution is to reset the timeout after establishing the connection. This
> avoid that this waiting gets wrongly interpreted as a slow network / not
> responding server / ...

What is mPollTimeout here if not the connect timeout? And why did it get set to anything else?

Jeremias Bosch (:jbos)

Assignee

Comment 9

14 years ago

I think that's exactly the point. There is this "socket thread", and the main thread is not blocked by it, which means the timeout can run out when, e.g., the server does not respond, and that's absolutely fine.

But when we want to establish a new connection, we need to block the socket thread ourselves; it does not make sense to load anything before an internet connection has been established. This wait, however, is interpreted the same way as an unresponsive server, because the timeout timer is not paused.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 10

14 years ago

OK, but why does the patch work? mPollTimeout should already be equal to mTimeouts[TIMEOUT_CONNECT], right?

Furthermore, why is there a timeout at all here? By default necko doesn't have timeouts.

Jeremias Bosch (:jbos)

Assignee

Comment 11

14 years ago

(In reply to comment #10)
> OK, but why does the patch work? mPollTimeout should already be equal to
> mTimeouts[TIMEOUT_CONNECT], right?
> 
I'm really no expert in that area :(. It might be that this is just a symptom of a deeper problem in necko.
 
> Furthermore, why is there a timeout at all here? By default necko doesn't have
> timeouts.

That I can't answer.

This bug is one of the most important to fix for MeeGo/Maemo, since it makes testing nearly impossible (the timeout even happens on automatic connection), creates a really bad user experience, and is just wrong.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 12

14 years ago

(In reply to comment #11)
> (In reply to comment #10)
> > OK, but why does the patch work? mPollTimeout should already be equal to
> > mTimeouts[TIMEOUT_CONNECT], right?
> > 
> I'm really no expert in that area :(. Might be that this is just a symptom of
> some problem deeper in necko.  

How about you add a printf displaying both mPollTimeout and mTimeouts[TIMEOUT_CONNECT] and see what the two values are when the bug happens?

I'm not going to r+ this patch without understanding why the bug happens.

> > Furthermore, why is there a timeout at all here? By default necko doesn't have
> > timeouts.
> 
> That i cant answer.

Maybe you could do some investigations?

Jeremias Bosch (:jbos)

Assignee

Comment 13

14 years ago

I have seen it working well with that, but the newest tests are broken again. Something else is really wrong here.

Jeremias Bosch (:jbos)

Assignee

Updated

14 years ago

Summary: autoconnect on my n900 works, but I get a page load error in the process → Connect on maemo/meego works, but you get a page load error directly after connection setup and need to try again.

Jeremias Bosch (:jbos)

Assignee

Comment 14

14 years ago

Some more information. Sometimes this happens:

1) disconnect from the web
2) start the browser
3) cancel the first request to establish a connection
4) try to load http://www.google.com
   -> nsSocketTransport::RecoverFromError()  gets called 
5) establish the connection through the dialog
   -> Try Again == true.
6) after some time
   -> nsSocketTransport::RecoverFromError()  gets called 
   -> Try Again == true.
7) (No dialogs are displayed)...
   -> nsSocketTransport::RecoverFromError()  gets called
   -> hangs somewhere; nothing happens

1) disconnect from the web
2) start the browser
3) cancel the first request to establish a connection
4) try to load http://www.google.com
   -> nsSocketTransport::RecoverFromError()  gets called 
5) establish the connection through the dialog
   -> Try Again == true.
6) after some time
   -> nsSocketTransport::RecoverFromError()  gets called 
   -> return with false.
7) Error Page is displayed

Jason Duell

Comment 15

14 years ago

Re: comment 12: jbos also reports that both mPollTimeout and
mTimeouts[TIMEOUT_CONNECT] are INT_MAX when the bug occurs.

Jeremias Bosch (:jbos)

Assignee

Comment 16

14 years ago

OK, I was talking with romaxa: this might be a problem in libconic, which reports too early that the connection is established. He said that we should wait one event loop iteration before trying to reconnect.

Jeremias Bosch (:jbos)

Assignee

Comment 17

14 years ago

Alright, I tested this: by adding a wait and processing the GLib main loop, it _seems_ to work. I'm careful about calling it a fix, and I would like someone who can reproduce the bug to try it.

The "hack" is very simple:

PRBool nsMaemoNetworkManager::OpenConnectionSync() {
  [...................]

  MonitorAutoEnter mon(*gMonitor);

  while (!gConnectionCallbackInvoked)
    mon.Wait();
}
+// Hack: wait 5 seconds, running one non-blocking iteration of the default
+// GLib main context per second so that pending events get dispatched.
+int test = 0;
+printf("Now wait 5 sec\n");
+while (test < 5)
+{
+    PR_Sleep(PR_SecondsToInterval(1));
+    g_main_context_iteration(NULL, FALSE);
+    test++;
+}

  if (gInternalState == InternalState_Connected)
    return PR_TRUE;

  return PR_FALSE;
}
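A variant of this hack that always runs at least one main-loop iteration (as romaxa suggested in comment 16) but stops as soon as the connection is usable, instead of always sleeping five seconds, might look like the sketch below. It is written in standard C++ rather than against GLib/NSPR, and `WaitForConnection`, `pumpOnce` and `isConnected` are hypothetical names; in the real code `pumpOnce` would presumably wrap `g_main_context_iteration(NULL, FALSE)` and `isConnected` would check `gInternalState`. Whether a single iteration is actually enough is exactly what this comment leaves open.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical bounded wait: pump the event loop once per step and stop as
// soon as the caller-supplied check reports a usable connection. The loop
// always pumps before checking, so at least one iteration runs even if the
// state already says "connected" (the libconic "reported too early" case).
bool WaitForConnection(const std::function<void()>& pumpOnce,
                       const std::function<bool()>& isConnected,
                       int maxIterations,
                       std::chrono::milliseconds interval) {
  assert(maxIterations > 0);
  for (int i = 0; i < maxIterations; ++i) {
    pumpOnce();  // let pending events (e.g. connectivity signals) dispatch
    if (isConnected())
      return true;
    std::this_thread::sleep_for(interval);
  }
  return isConnected();
}
```

The design choice is to bound the worst case (`maxIterations * interval`) while not penalizing the common case where the link is ready after one iteration.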

Jeremias Bosch (:jbos)

Assignee

Comment 18

14 years ago

For the Qt port we fixed this with bug 585636; renaming this one to a GTK/N900-only bug. The solution there is to wait for at least one event loop iteration, possibly sending another internal signal.

No longer blocks: 583135

Summary: Connect on maemo/meego works, but you get a page load error directly after connection setup and need to try again. → Connect in gtk port on n900 works, but you get a page load error directly after connection setup and need to try again.

Mark Finkle (:mfinkle) (use needinfo?)

Updated

14 years ago

tracking-fennec: ? → 2.0+

Benjamin Smedberg

Updated

14 years ago

blocking2.0: ? → ---

Christian :Biesinger (don't email me, ping me on IRC)

Comment 19

14 years ago

Comment on attachment 462273
This patch fix the timeout/connection issue

Per comment 12, this patch has no effect.

Attachment #462273 - Flags: review?(cbiesinger) → review-

Christian :Biesinger (don't email me, ping me on IRC)

Comment 20

14 years ago

Sorry, that should've said comment 15.

Jeremias Bosch (:jbos)

Assignee

Comment 22

14 years ago

I have an update on this one... 

Mobile devices need up to 6-10 seconds for the first UDP packet, and most of the UDP packets actually get lost.

1) Adding a second resolve attempt in nsHostResolver::ThreadFunc in case the first one failed.

2) We return a "Lookup Failed / Unknown Host" error from DNSService::onLookupComplete(..) even while the resolve in nsHostResolver::ThreadFunc is still ongoing.
I can't say whether that is actually OK, but it is the cause of this bug: the error reaches nsSocketTransport2 in onLookupComplete, which makes it stop waiting for the resolve and load the error page.

The interesting part is that later (2-3-8 seconds) we get another DNSService::onLookupComplete(...) call, this time with success, but since nsSocketTransport2 has already stopped and reported the error, it's too late.

To make it work, I (hackishly) exposed mResolving of the HostRecord. The DNS service checks whether it is true; if it is, we simply do not report the error to nsSocketTransport2 and only take care of our references.

This makes it work in about 50% of cases.

So now we need a real solution; exposing mResolving does not seem like a good idea.

I think help from some Necko experts is needed here.
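The workaround described in this comment (retry the lookup, and withhold a failure from nsSocketTransport2 while a resolve is still in progress) can be modeled roughly as follows. This is an illustrative sketch only: `HostRecord`, `LookupForwarder`, `ResolveWithRetry`, `Attempt` and the `resolving` flag (standing in for the exposed `mResolving`) are hypothetical and do not mirror necko's real interfaces. As the comment itself reports, the real hack only works about half the time, so this shows the idea, not a fix.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a host record; `resolving` plays the role of
// the exposed mResolving flag. Not necko's real HostRecord.
struct HostRecord {
  std::string host;
  bool resolving = false;
  std::optional<std::string> address;  // set when a lookup succeeds
};

// Hypothetical listener between the resolver and the consumer (the role
// nsSocketTransport2 plays). A failure that arrives while another attempt
// is still pending is swallowed, so the consumer only sees the final result.
class LookupForwarder {
 public:
  explicit LookupForwarder(std::function<void(bool)> consumer)
      : consumer_(std::move(consumer)) {}

  void OnLookupComplete(const HostRecord& rec, bool success) {
    if (!success && rec.resolving)
      return;  // a retry is still pending: don't report "unknown host" yet
    consumer_(success);
  }

 private:
  std::function<void(bool)> consumer_;
};

// One blocking lookup attempt: returns an address, or nullopt on failure.
using Attempt = std::function<std::optional<std::string>(const std::string&)>;

// Resolve with up to `maxAttempts` tries, in the spirit of item 1) above.
void ResolveWithRetry(HostRecord& rec, LookupForwarder& fwd,
                      const Attempt& attempt, int maxAttempts) {
  assert(maxAttempts > 0);
  for (int i = 0; i < maxAttempts; ++i) {
    rec.address = attempt(rec.host);
    const bool ok = rec.address.has_value();
    // Still "resolving" if this attempt failed and another one will follow.
    rec.resolving = !ok && (i + 1 < maxAttempts);
    fwd.OnLookupComplete(rec, ok);
    if (ok)
      return;
  }
}
```

With two attempts where the first one fails (e.g. a lost UDP packet) and the second succeeds, the consumer is notified exactly once, with success; if both fail, it gets a single failure after the last attempt.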

Jason Duell

Comment 23

14 years ago

cc-ing Patrick since he's fairly conversant with our DNS code and connection timeout stuff.

Brad Lassey [:blassey] (use needinfo?)

Updated

14 years ago

tracking-fennec: 2.0+ → 2.0-

Mark Finkle (:mfinkle) (use needinfo?)

Updated

13 years ago

tracking-fennec: 2.0- → ---

Jason Smith [:jsmith]

Updated

10 years ago

Keywords: qablocker

Sylvestre Ledru [:Sylvestre]

Updated

10 years ago

Keywords: relnote

BMO Automation

Comment 25

5 years ago

Closing all open bugs in a graveyard component

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → WONTFIX