Closed Bug 60968 Opened 24 years ago Closed 22 years ago

Talkback crash incidents cannot be sent

Categories

(Core Graveyard :: Talkback Client, defect, P3)

x86
All

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 189702

People

(Reporter: max1, Assigned: namachi)

References

Details

From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (WinNT; U)
BuildID:    2000112020

Netscape Quality Feedback Agent cannot send incidents. The proxy works fine and
I have a stable connection, but the Agent fails to send them. I tried sending
through different proxies and without a proxy, but the results are the same. The
Agent tries to send incidents, fails, waits 5 minutes, fails, etc. When I send
manually, I get the following message:

The Agent is unable to connect to the server. Please check your Proxy
Server settings or try again later.

I captured TCP packets (sent & received) and analysed them. This is what I 
discovered:

1. The agent sends data via POST with "Content-Length: 153"
2. It receives the server reply "HTTP/1.1 200 OK"
3. It sends more data via POST with "Content-Length: 63641"
4. It waits some time for the server reply, then displays the error message
5. 10-20 seconds after the error message, I capture a packet with the server
reply: "HTTP/1.1 200 OK"

I'm not using a very fast connection, so I think the agent needs a longer
timeout.
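
A minimal sketch of the failure mode described in the capture above, assuming only that the agent gives up after a fixed receive timeout. A local stand-in server delays its reply, and the client either waits long enough or reports a timeout even though the reply would eventually arrive. The function names, the `/collector` path, and the delays are hypothetical; this is not the Talkback agent's code.

```python
import socket
import threading
import time


def slow_server(listener: socket.socket, delay_s: float) -> None:
    """Accept one connection, read the request, reply after delay_s seconds."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)        # read the (small) request
        time.sleep(delay_s)     # stand-in for a busy server or a slow link
        try:
            conn.sendall(b"HTTP/1.1 200 OK\r\n\r\n")
        except OSError:
            pass                # the client may already have given up


def try_send(reply_delay_s: float, client_timeout_s: float) -> str:
    """Return the reply's status line, or 'timeout' if the client gave up first."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))     # any free local port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=slow_server, args=(listener, reply_delay_s), daemon=True
    )
    server.start()
    try:
        with socket.create_connection(
            ("127.0.0.1", port), timeout=client_timeout_s
        ) as client:
            client.sendall(b"POST /collector HTTP/1.0\r\n\r\n")
            try:
                return client.recv(1024).split(b"\r\n")[0].decode()
            except socket.timeout:
                return "timeout"
    finally:
        server.join()
        listener.close()
```

With a reply delay longer than the client timeout, `try_send` returns `"timeout"` although the server does answer later, which matches step 5 of the capture.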

Reproducible: Always
Steps to Reproduce:
1. Run talkback build
2. Reproduce any crash bug


Actual Results:  Agent tries to send incidents, but it will never send them

Expected Results:  Agent sends incidents successfully

I cannot send incidents on Mozilla builds or on Netscape 4.7x builds either.
Since the traffic is very high on the server end, the QFA agent tries multiple
times to send before it succeeds.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Yes, the agent tries to send every 5 minutes, but in my case it fails every
time, because the timeout is too short. No incidents have been sent during the
year.
Summary: incidents could not be sent → incidents cannot be sent
Has something changed on the talkback server since this bug report? I have
successfully sent 2 incidents on the first attempt with the same build. This is
the first time I have been able to send incidents.
As I said earlier the weekend traffic is low and the agent is able to 
successfully connect and send the box
Talkback agent also fails to send its bug reports here. I'm also using Mozilla
0.8 under Win2k with a 56k line (and it's even weekend)
--> tom ( curt@netscape.com)
Assignee: namachi → curt
Status: ASSIGNED → NEW
I've been experiencing this problem with Talkback ever since it was introduced 
in NS4.x. On a dial-up line the situation has always been very bad. During the 
past couple of years I've used a number of different Win32 machines, with all 
different kinds of configurations and 56K dialup ISPs. What I saw was the modem 
actually transferring data for a few seconds and in the end an error came back 
reporting that it "can't contact the server...", just as the initial post 
mentions. In my case it kept transferring data until right before it popped up 
the error message.

For the past few months I've been on a T1 line. I still get such errors 
relatively frequently, although not as frequently as before. Large bandwidth 
definitely improves the situation but doesn't solve it.

Another problem related to talkback while it's trying to send is that it eats up
all my CPU cycles. Whenever it's trying to contact the server, my CPU usage
shoots up to 100%. And when I'm on a dialup, this lasts for enough seconds to
disrupt my work and eventually force me to shut down the Talkback agent :-(  I'm
on a PII / 450MHz btw. This _clearly_ needs to be fixed.

The past 3 months I've been using Mozilla milestones and nightly releases under 
Linux and Win2K. Same exact problems, only in the Linux case the error message 
is different, claiming: "Network Error - Receive Failed".

Sorry for the long post but I feel that Talkback never worked adequately enough, 
and causes major problems while using low-bandwidth connections such as dialup. 
On fast connections the actual events reported here and their side-effects do 
not become so easily apparent but they're still there. If Talkback actually 
helps the Mozilla team, then I guess something should be done about it.

Maybe someone should change the "OS" to "all" and add the words "Talkback/QFA"
at the beginning of the summary. I'm not sure if I'm allowed/able to make such
changes myself.
This latest set of feedback on talkback is something
to follow up on and might be the first we have heard
of such a problem.  We watch the talkback system
handle thousands of blackboxes per hour under occasional
peak loads when millions of people are on the internet
using all the 4.5+ browsers.  Maybe we can run some tests
where we can trace boxes from alecka to figure out if
there is something related to the specific system or
network configurations alecka has been running or
if there is something more systematic we can find.
Reassigning to me.
Assignee: curt → greer
I see this problem with Moz 0.9.3 (build 2001080110) on Windows 2000.
Also see on build 2001080104 on Linux (RH 7.1)
Wow!  Within minutes of appending my comments to this bug, 1 of the 2
queued incidents on Win2000 was delivered.  The second one is still
"In Queue"
For the last two days we had a power outage which prevented us from receiving
incident data. We have now restored the servers and are receiving incidents again.

Sorry for the inconvenience.
I am still seeing difficulties transferring incidents via talkback.
I do not think the power failure explanation is the correct one.
This seems like a problem with bandwidth on Netscape's Talkback server. I never
have problems sending the crash reports to Microsoft, but the
talkback.netscape.com server is just too busy to handle crash reports. Netscape
really needs to upgrade the server--otherwise reporting crash bugs will be very
difficult without a Talkback ID. During off-peak hours I might successfully send
in a report, though it happens very slowly.

Random question: Would the crash data which is sent to Microsoft (in Windows XP)
be a good thing to paste into a crash bug report, until this is fixed? This data
is also generated when Mozilla hangs, which doesn't happen with Talkback.
Summary: incidents cannot be sent → Talkback crash incidents cannot be sent
I don't see any problem on the collector side of our server. In fact the server
is running at only half of its capacity. If you can give me some idea of the
time of day you are experiencing the delay in submission, it will help us
understand the issue.


Pasting regular debug information into the appropriate bug will certainly help.
Report of TB experience (after session with namachi)

I use a dialup connection 28.8 and when sending talkback incidents it appears
that my incidents consistently fail to send. They remain in the unsent queue
without incident ids and each time I start the browser and attempt to send the
unsent incidents it fails to do so.

This happens to me both behind the firewall and outside of the firewall, when
sending talkbacks for commercial builds or for Mozilla trunk builds.

From my discussion with Namachi, it appears that the incident is actually being
received; however, my talkback client is not receiving the update telling it that
the incident has been received and need not be sent again. The result is that I
continually attempt to send the same incidents. The talkback server thinks the
2nd attempt is a duplicate; however, it fails to update my client, which continues
to think the incident has not yet been sent.
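
A minimal sketch of one way a collector could handle this retry, assuming incidents can be recognized by a hash of their content: a duplicate submission is answered with the original incident ID, so the client can clear its queue. The `Collector` and `Client` classes, the `reply_lost` flag, and the ID format are all hypothetical and are not taken from the Talkback system.

```python
import hashlib
from typing import Dict, List, Tuple


class Collector:
    """Hypothetical server side: deduplicates incidents by content hash."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}

    def submit(self, payload: bytes) -> Tuple[str, bool]:
        """Store the incident once; return (incident_id, was_duplicate)."""
        key = hashlib.sha256(payload).hexdigest()
        was_duplicate = key in self._ids
        if not was_duplicate:
            # Made-up ID format, for illustration only.
            self._ids[key] = f"TB{len(self._ids) + 1:07d}"
        return self._ids[key], was_duplicate

    def stored_count(self) -> int:
        return len(self._ids)


class Client:
    """Hypothetical client side: an incident stays queued until acknowledged."""

    def __init__(self, collector: Collector, incidents: List[bytes]) -> None:
        self.collector = collector
        self.queue: List[bytes] = list(incidents)
        self.sent: Dict[bytes, str] = {}

    def send_all(self, reply_lost: bool = False) -> None:
        for payload in list(self.queue):
            incident_id, _ = self.collector.submit(payload)
            if reply_lost:
                continue  # server stored it, but the client never saw the reply
            self.sent[payload] = incident_id
            self.queue.remove(payload)


collector = Collector()
client = Client(collector, [b"crash dump 1"])
client.send_all(reply_lost=True)    # first attempt: the acknowledgement is lost
queued_after_first = len(client.queue)
client.send_all()                   # retry: duplicate is acknowledged, same ID
```

The point of the design is that a retry is safe: the server stores the incident once, and the second reply gives the client what it missed the first time.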

From a user perspective, this is a continual pita and if I didn't care about the
project I would disable talkback, and for Mozilla builds, stop downloading the
talkback builds at all.
Here's an update. Since I got a broadband connection, I am still having the same
problem getting Talkback reports to send. I'm unsure whether the problem is with
how the program sends the data or how the server receives it, but it seems not
as simple as looking at the server's bandwidth, and it's obviously not related
to the user's bandwidth. Someone who is responsible for the talkback system
needs to find a way to simulate a crash in Mozilla (run a query for bugs with
keyword crash, there are still plenty of them ;) ) and witness the problem for
himself. It might work better (or worse, rather) if the tester is not
geographically near the talkback server.
Skewer thanks for the input.
  A couple of weeks ago I sent out email to users on this bug and bug 128906
asking for recent experiences with Talkback submissions. In March, I changed
submission timeouts and several of the responders tell me that those changes
seem to have helped. 

Today, I have increased three of the timeouts to see if we can pick up any
"edge" cases of users for whom the previous settings were still insufficient:
- On the collecting server I have doubled the "accept" and "receive" timeouts
from 10sec to 20sec.
- On the repeater server (which pushes info through the firewall) I have
increased the "accept" timeout from 2500ms to 25000ms. (Comparing other server
timeouts in the system, I believe that 2500ms was a mistake, missing the extra 0.)
This may be part of your trouble.

Notice the issue that Bob was seeing in comment #17 -
  "it appears that the incident is actually being received however my talkback
client is not receiving the update telling it that the incident has been
received and need not be sent again. The result is that I continually attempt to
send the same incidents."

If you have a problem submitting this week, email me. (That goes for anyone on
the bug.) Together we can find out if your submission actually completes and you
are seeing the same issue that Bob has seen.

Regarding your suggestions in comment #18. I typically work approx 150mi. from
the system, using a broadband connection and have not had the trouble you are
experiencing either inside or outside the firewall.

Again, let me know if you have troubles with Talkback. We want to make sure it
is running well for everyone.

I have seen a change in talkback this week. I have been unable to send any
talkbacks and have them received. Contact me on AIM today if you would like to
watch while I try to send my most recent tb since I will upgrade to a newer
build soon.
I just tried sending two talkbacks with my Bugzilla e-mail. Both times the
software reported that the connection failed. Did they go through?

If so, Talkback's software needs to acknowledge that these bug reports are being
properly sent so people don't get these stupid error messages (it's bad enough
they had to go through a crash!).

If not, the talkback.netscape.com server needs an upgrade. I estimate this isn't
the problem because there's no delay or latency, the "sending" dialog is only up
for a split second before the software reports failure.
Shock! Today I got a crash report to successfully send! Anyone getting favorable
results on narrow-band?
Skewer,
When you submit your Talkback incidents do you include your email address with
the incident? We need some way of identifying which crashes are yours. When you
entered comment #21 I tried looking for the incidents that you mentioned, but
didn't find any under the email that shows up in your Bugzilla comments.

If the opportunity comes for us to see if the Talkback system accepts your
submission but doesn't acknowledge the submission on the client side (scenario
in comment #17), the email address would be the only thing we could use to
verify that is what's happening. (Because in that scenario, talkback would not
acknowledge the incident ID to you, the client.)

If you are submitting an email address that you don't want posted in the bug,
you could email it to me at greer@netscape.com .
>When you submit your Talkback incidents do you include your email address with
>the incident? We need some way of identifying which crashes are yours. When you
>entered comment #21 I tried looking for the incidents that you mentioned, but
>didn't find any under the email that shows up in your Bugzilla comments.

I was using SkewerMZ@skewer100.cjb.net those two times. They were not
acknowledged in the client ("In queue") and apparently never went through.

When I posted comment #22 that was a crash incident I sent without any e-mail.
According to the client it worked. The next time I crash I will try the e-mail
again.

>If the opportunity comes for us to see if the Talkback system accepts your
>submission but doesn't acknowledge the submission on the client side (scenario
>in comment #17), the email address would be the only thing we could use to
>verify that is what's happening. (Because in that scenario, talkback would not
>acknowledge the incident ID to you, the client.)

Apparently that wasn't what happened to me when I posted comment #21. The client
correctly reported that the reports did not go through. There could possibly be
a problem where the client reports success and doesn't send it, but that will be
confirmed or proven false the next time I send a Talkback. The reason this bug
was reported was because Talkbacks were not being sent, but that seems to have
been fixed the last time I tried.
*** Bug 153046 has been marked as a duplicate of this bug. ***
Skewer,

> There could possibly be a problem where the client reports success and doesn't
> send it

No, the client will not report success (and a valid Talkback incident ID)
without a success status returned from the Talkback system.
Then in that case there were no problems for me the last time I sent a Talkback,
and we just need a narrow-band user (and maybe someone overseas) to confirm that
it's fixed.
I am an overseas narrowband user, and I am unable to submit any incidents with
the QFA.    A few hours ago I tried several times to send two incidents, and
each time it would say:

Status: Connecting
Sending incident 1 of 2
http://talkback5.netscape.com/spiral-bin/Collector.dll

After a few seconds it would change to "Status: Receiving".  Then it would
quickly flicker back to connecting, receiving, and connecting again.  Then after
a few more seconds it would change to receiving once more, which would remain
for precisely 30 seconds before being replaced by a window that said "The Agent
is unable to connect to the server. Please check your Proxy Server settings or
try again later."

This happens irrespective of whether I configure the QFA to use my ISP's web
cache or not.  I use Freeserve (www.freeserve.com) and get 45Kbps.  I am using
2002071004 on Windows 2000.

I have been trying to submit these for a couple of days now, at times ranging
from 18:00 to 03:00 GMT, with the email address <petef61 at fordy.org>.

In the past I have successfully (AFAIK) transmitted crash data with talkback  -
with the same PC and OS, but with broadband Internet and a different Mozilla build.

I am of course happy to provide any further details, or other co-operation.
Now, just a few hours later, I believe I have (accidentally) successfully submitted
some or all of the crash data!  The QFA no longer loads automatically with
Mozilla (as it has been these last few days).  If I run talkback.exe it tells me
that incident TB8261140G has been sent.  Based on the "captured at" date, I
think this is one of the two incidents I was trying to submit.  I don't know
what has happened to the other one.

I haven't changed anything that would cause this to suddenly work.
I experience this often over 56K on WinXP.  Eventually, after about a week or
near the middle of the night, they will go through, but I am rarely on in the
middle of the night.
OS: Windows NT → All
Today I had another "unable to connect" error message. The problem's returned.
Skewer, your problem yesterday is not surprising. We had a huge influx of data
due to some added monitoring we took on. We are cutting some of it back starting
today, and hope to add some systems soon to further share the load and avoid the
type of problem you were seeing.

Your input helps to verify the need for (and justify) those additions. Thanks.
We might consider another bug-capturing alternative such as Bugtoaster at
www.bugtoaster.com.  They tell me that they are working on something for
open-source communities such as Mozilla where the fee will be reduced or free. 
Also, they tell me that the Talkback system is no longer supported and is less
convenient because it requires its own server.  Bugtoaster allows access from
any system, captures data in an XML format, captures its own crashes (weird, but
true) and provides a way for users to monitor their crashes and see if
resolutions are available.  If any information is wanted about this, I can relay
the information or give the answer if I know it.  To avoid bugspam, my email is
brantgurganus2001@cherokeescouting.org.  I have worked with the people at
Bugtoaster many times.  I can direct people interested in this to sample crash
data and reports.
The Talkback server has enough problems handling the current load without having
new stuff added to it. If you plan to add new capabilities to Talkback, that's
great but be prepared to upgrade the machine to handle it.

What there needs to be is a way to monitor hangs. Windows XP captures hang data
but Talkback doesn't. 
Unable to send crash reports for 3 days now. 8 incidents "in queue", on this
here 28.8 connection!  What good do these talkback incidents do, if they sit on
my computer for (many) days at a time without being sent?  I have recently begun
deleting talkback incidents, and even turned off TalkBack for a while, because
it was incredibly annoying for it to try and send incidents, but continually 

WinXP, 28.8k, No Proxy, No firewall, No sign of life from TalkBack server
</rant>

Yes.. I agree with Brant in Comment #33 regarding BugToaster.. I find that
BugToaster:
(1) Is much more user-friendly
(2) Able to catch more crashes than TalkBack (from what I've seen)
(3) has no connectivity problems ;)
(4) helpful to both users and software engineers
(5) etc.. 

Over 1,300 "mozilla.exe" crashes have already been caught by BugToaster:
http://www.bugtoaster.com/dw15/Reports/AllCrashes.asp?ShowApps=true&CompanyName=&MaxCrashItems=50&ShowDlls=true&BaseName=mozilla.exe&Submit=Search
(link may wrap, copy/paste if necessary)
I do realize this is off the bug topic.  Is there a bug "Replace Ye Olde
TalkBack software" floating around? Shall I create one?
I managed to squeeze 1 (of 8) talkback incidents by repeatedly hitting "send". 
It took about 8 minutes (about 18 times I pressed "send") for the one incident.
I continued to try (and will continue to do so), but was unsuccessful in getting
the other 7 to send.

Packet sniffing reveals:
(1) About every 490ms, there are 40bytes of data per packet (all Headers, no
Data) sent from sun-006336.asset.aol.com (207.200.79.6)
(2) Tries for exactly 30 seconds before giving up
(3) After 30 sec, a packet of 210 bytes from the server:
-----
HTTP/1.1 500 collector replied: Collector session not found
Server: Netscape-Enterprise/4.1
Date: Fri, 23 Aug 2002 12:51:37 GMT
Content-type: text/plain
Connection: close
-----
(4) (occasionally) A 149 byte packet with the only data:
FCMP

I have tried on weekends, weekdays, 'business hours' and off-peak hours.
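
The 210-byte reply in item (3) of the packet trace is an HTTP error from the server, not a failure to connect. A minimal sketch of reading its status line and headers follows; the `parse_response_head` helper is hypothetical, and the sample text is copied from the capture above.

```python
from typing import Dict, Tuple


def parse_response_head(raw: str) -> Tuple[int, str, Dict[str, str]]:
    """Split a raw HTTP response head into (status code, reason, headers)."""
    lines = raw.strip().splitlines()
    _version, code, reason = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line.strip():
            break                       # blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


CAPTURED = """\
HTTP/1.1 500 collector replied: Collector session not found
Server: Netscape-Enterprise/4.1
Date: Fri, 23 Aug 2002 12:51:37 GMT
Content-type: text/plain
Connection: close
"""

status, reason, headers = parse_response_head(CAPTURED)
server_replied_with_error = status >= 500
```

A client that inspected the status code this way could report a server-side error instead of a generic "unable to connect" message.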
I am behind an authenticating firewall on a T1 at work.  I have been using
Mozilla fairly regularly since 0.9.3.  I have NEVER successfully sent (according
to talkback anyway) an error, yet at home on dialup I have sent several and
don't recall ever not successfully sending one from there.
I'd just like to mention something 'interesting' I noticed about the
talkback-send procedure:  The progress bar is useless and is possibly misleading
people as to why their talkback reports don't 'send' properly.  I am
(unfortunately and unavoidably) on 28.8k dialup and this is what happened:

1. Boot machine, start win2k

2. On the start of win2k, talkback immediately tries to send the crash data from
last night, but I know it can't because the modem has not yet been dialed.

3. ***I noticed that the progress meter still reached 100% even though there was
no internet connection***

This is why I believe that the progress bar is useless.  You might think that at
100%, all data has been received by the server, but this is apparently not the
case because it reaches 100% even when the server has not been contacted!

Thus, people might think that there is some problem with the client detecting
that the data has been sent, when in fact it could be a connectivity or
bandwidth problem because the progress bar proves nothing.

4. Dial modem, talkback report sends successfully.
see bug 189702

I'm having this problem with 1.3a on talkback5 server.
What I forgot: I am sitting behind a router.
I've also been having this problem for some months; I can leave my machine on
overnight with Talkback activated and trying every 5 mins, with no joy. I am on
a 56K dialup connection, using Naviscope as a proxy (disabling this makes no
difference) and ZoneAlarm. I have *very* occasionally actually been able to
submit crash data, but I don't recall it succeeding at all in at least the last
month.

Naviscope logs the following HTTP headers:
-----------------------------------------------------------------------
POST http://talkback.netscape.com:80/spiral-bin/Collector.dll HTTP/1.0
Content-Length: 151
User-Agent: talkback/1.0; Win32
Content-Type: application/x-spiral-fcmp

FCMP
-----------------------------------------------------------------------
HTTP/1.0 200 OK
Date: Wed, 26 Feb 2003 11:13:52 GMT
Content-Length: 85
Content-Type: application/x-spiral-fcmp
Server: Netscape-Enterprise/4.1
Via: 1.1 webcacheB03 (NetCache NetApp/5.2.1R3)
-----------------------------------------------------------------------
POST http://talkback.netscape.com:80/spiral-bin/Collector.dll HTTP/1.0
Content-Length: 121017
User-Agent: talkback/1.0; Win32
Content-Type: application/x-spiral-fcmp

FCMP
-----------------------------------------------------------------------
These happen in the space of a second or so; after the last request from my
machine (the third item above), there is approx 25 second delay (Naviscope shows
'Waiting for response') then I get the 'Unable to connect' message.

To me, it looks like the 25000ms timeout mentioned in comment 19 isn't long enough!
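
A rough back-of-the-envelope check of that suspicion, assuming the roughly 25 second wait is a fixed limit and ignoring protocol overhead, latency, and any modem compression. The 33.6 kbit/s figure is the V.90 upstream ceiling for a 56K modem; the T1 rate is included only for comparison, and the helper name is hypothetical.

```python
def min_upload_seconds(payload_bytes: int, link_bits_per_s: float) -> float:
    """Lower bound on upload time: payload size in bits divided by link rate."""
    return payload_bytes * 8 / link_bits_per_s


PAYLOAD_BYTES = 121017    # Content-Length of the second POST in the log above
OBSERVED_WAIT_S = 25.0    # approximate delay before the error appeared

LINKS = {
    "28.8k modem": 28_800,
    "56K modem upstream (V.90 limit)": 33_600,
    "T1": 1_544_000,
}

for name, bps in LINKS.items():
    seconds = min_upload_seconds(PAYLOAD_BYTES, bps)
    verdict = "exceeds" if seconds > OBSERVED_WAIT_S else "fits within"
    print(f"{name}: at least {seconds:.1f} s, {verdict} a {OBSERVED_WAIT_S:.0f} s limit")
```

Under these assumptions the upload alone cannot finish within 25 seconds on either dialup rate, which is consistent with a timeout that works on fast links and fails on modems.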
Russell:  Can you try to delete the oldest incident waiting to send and try to
manually send the next one with the button in the Talkback UI?  That worked for
me yesterday during some testing with dialup connections.  Let me know what you see.
Reassigning to Shiva.
Assignee: greer → namachi
I only have one incident in the queue at the moment - they all get lost when I
install the latest version. If I get another one, I'll try it and post result
here. Anyone else able to give this a try?
OK, I generated an incident manually using the button in the UI, deleted the
existing incident, and hit the button. Still no joy.
Here is a possible isolation of a variable:

Talkback always works at my place on my cable connection (no router, NAT, etc.),
but if I take the laptop to my parents' place and put it on the 28.8/Win2K-NAT,
it almost always fails.  Same machine, same config, the only variable is the
connection.

This is on an iBook G3 with Mozilla Build ID 2003021017.
I am having the same problem many others seem to be having.  To me it looks like
a simple timeout issue, as it is sending data upstream as fast as my modem
connection allows the entire time until it abruptly fails.

How about adding a feature where we could dump the report to a file and send it
via email to an address set up just for this?  I wouldn't mind the crashes as
much if I felt I was able to at least do some good by getting these talkback
reports submitted.

If I leave the talkback agent running for DAYS eventually the stars are aligned
right (connection's running a little faster?  server is a little less busy? 
Internet less congested?) and the reports usually get through.  But it sure
would be nice to just be able to send them when they happen, even if I had to do
a little work (the email solution I mentioned above) to make it happen.
Bugs 60968, 167536, and 189702 seem to all be the same.  People have been having
this problem for over two years, and still no resolution.

Contrary to what I said in a previous message, the problem does seem to be with
connecting.  I installed a firewall and watched what was going on, and it never
did successfully make a connection (despite maxing out my modem's upstream
bandwidth the entire time...what on earth is it doing?).

If I copy the talkback data and take it into work, it goes through with no
problem the first time I try it, every time.  My work connection is faster but
much more restricted, so that's surprising.  Perhaps this wasn't written with
modem users in mind.  Perhaps the upstream bandwidth maxing out shows there is
some flaw in the code, such as an unrealistic timeout for connecting, so that it
keeps retrying and retrying, never really giving the server a chance to respond.  As
I've said before, once in a blue moon it does work, so the capability is there.

This problem has been there for years, and I bet it's affecting a LOT of people
(how many bother to go into Bugzilla and complain?  Most probably just turn it
off).  I also know that it's not being taken seriously enough to get fixed.  You
have three identical bugs, and nobody's even noticed that.  That shows how
seriously it's taken.  Oh, and all three are still marked as "NEW" even though
one is over two years old.

We've asked for this to get fixed.  I've asked for an alternative method of
sending these (email or uploading) if that can't be done.

I guess it's just time to say if you don't care, then we don't care...time to
turn this feature off.

*** This bug has been marked as a duplicate of 189702 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
-v
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard