Closed Bug 1154298 Opened 9 years ago Closed 9 years ago

Crashes on Windows XP link to about/throttling by default

Categories

(Socorro :: General, task)

x86
Windows XP
task
Not set
normal

Tracking

(firefox38-, firefox41 fixed)

RESOLVED FIXED
Tracking Status
firefox38 - ---
firefox41 --- fixed

People

(Reporter: FlorinMezei, Assigned: away)

Attachments

(1 file)

Reproducible with: 
- Firefox 38 Beta 4 - BuildID: 20150413143743
- also reproduced with Firefox 37 Beta 7 and Firefox 36.0.1

Reproducible on: Windows XP SP2 x86

Steps to reproduce:
1. Open Firefox with a fresh profile and install the crashme add-on (http://people.mozilla.org/~tmielczarek/crashme/).
2. After installation, go to Tools -> "Crash me!".
3. When Firefox crashes, enter some data in the Crash Reporter as desired, then choose to Restart/Quit Firefox.
4. Open Firefox again and go to about:crashes and check the link for the previous crash.

Expected results:
Link should be in the form "bp-1b8ce92f-0524-413f-bfb3-061602150324" and point to the correct page on Socorro (https://crash-stats.mozilla.com/report/index/bp-1b8ce92f-0524-413f-bfb3-061602150324).

Actual results:
Link is in the form "7bb2be59-0065-47a0-8b31-f045a2620982" and points to https://crash-stats.mozilla.com/about/throttling. More details:
- if I right clicked on the link and opened it in a separate tab then I got the about/throttling page
- if I left clicked on the link, then it changed to a link with correct form, but a different ID - the link may have been correct, but I don't see the comments submitted so I cannot confirm
- after a few minutes a few of the remaining links also changed to correct form with different IDs

IDs before any link changed:
7bb2be59-0065-47a0-8b31-f045a2620982
4390092c-ea51-4fd5-818e-b63cdeac35e3
c433603a-69cb-437c-89d1-215e41801e11
77a7ec8c-07fc-42d8-81e3-93670d74e72c
8552fd17-9a3e-4c12-8ea9-db5f507c63ee
2be1f9d6-08b4-43ca-be2a-f23b5dd5fc45
38694dc2-afe7-4a3f-96e3-9af6383a9619
b449cdf5-706c-4532-9656-d665b4bceb5a

IDs after several links changed:
bp-ee066903-3d6c-4fff-85e7-8b7272150414
bp-cdf01fe1-a9b7-4e7b-8cfd-089642150414
bp-c7754228-fd20-4bf3-b4b7-6dfdb2150414
bp-4a5466d8-199b-4629-bcb9-ea0522150414
bp-16fb0150-ffb7-428c-a593-b8f002150414
bp-f503531b-637d-4f8c-a659-cf2ee2150414
c433603a-69cb-437c-89d1-215e41801e11
77a7ec8c-07fc-42d8-81e3-93670d74e72c

Notes:
- the issue seems to always reproduce on Win XP x86 (no issues on Win 7 or 8, Mac or Ubuntu)
- the Crash Reporter says that the crash reports were submitted successfully
That sounds like they are not submitted correctly, actually, otherwise they already would have IDs that end in a date, like the ones in the second list (after clicking them, which triggers a re-submit). Sounds to me like the -xpsp2 submission end point is not working correctly.
Blocks: 1138794
I think this is a confusing aspect of the about:crashes UI (and possibly a bug we should fix). It's been like this for a long time, and I can repro this on Mac Nightly.

Crashes that have not been submitted yet should be in all caps e.g.:
67031626-7E1F-4808-B373-C871AD7A71F8

If you right-click on such a crash, you'll be able to open a link in a new tab to the /about/throttling page (which no longer exists because it's not true: we don't do the client-side throttling the page used to describe). If you left-click, the crash will be submitted and the server will return a crash ID replacing it (such as bp-e6d3cfec-6912-487e-80f6-d35432150414).
Flags: needinfo?(florin.mezei)
What rhelmer says is correct. I second Kairo's idea that this is likely the xpsp2 endpoint not working properly. If you look in %APPDATA%\Mozilla\Firefox\Crash Reports\submit.log you should see error messages which might hint as to why the submission initially failed.
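
For anyone checking submit.log by hand, here is a minimal sketch of a parser for it. It assumes every entry follows the `[MM/DD/YY HH:MM:SS] message` pattern seen in the submit.log lines quoted elsewhere in this bug, and that the success message is exactly "Crash report submitted successfully". The function name `parse_submit_log` is made up for illustration and is not part of Firefox or Socorro.

```python
import re
from datetime import datetime

# Pattern inferred from the submit.log lines quoted in this bug:
# "[MM/DD/YY HH:MM:SS] message"
LINE_RE = re.compile(r"^\[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")

def parse_submit_log(text):
    """Return a list of (timestamp, message, succeeded) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        message = match.group(2)
        succeeded = message == "Crash report submitted successfully"
        entries.append((stamp, message, succeeded))
    return entries

# Two entries copied from this bug: one success, one failure.
sample = """\
[04/15/15 10:30:17] Crash report submitted successfully
[05/28/15 10:44:23] Crash report submission failed: A connection with the server could not be established
"""
entries = parse_submit_log(sample)
failures = [entry for entry in entries if not entry[2]]
```

Filtering on the third tuple element gives the failed submissions, which are the ones worth comparing against what about:crashes shows.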
Investigated a bit more today, on the same machine, and got the following results:

1. Crashed Firefox 5 times, and submit.log indicates all 5 crashes were submitted without error (same thing reported by the Crash Reporter):
[04/15/15 10:30:17] Crash report submitted successfully
[04/15/15 10:33:52] Crash report submitted successfully
[04/15/15 10:34:14] Crash report submitted successfully
[04/15/15 10:35:24] Crash report submitted successfully
[04/15/15 10:35:53] Crash report submitted successfully

2. about:crashes displayed only 3 of the crashes above (the first two and the last one). It seems it displays only the crashes for which I chose to "Restart Firefox" in the Crash Reporter. For the 3rd and the 4th I chose to "Quit Firefox" and they did not show up in about:crashes. The crashes displayed in about:crashes (all lowercase, and all pointing to about/throttling):
52ea64a4-d6f4-4876-a045-f101b4bae6f9 	4/15/2015	10:35 AM
e7215661-0275-484c-8624-fd58b6374f73 	4/15/2015	10:33 AM
21658f7b-dc66-4e9f-b49c-3ac953ed54eb 	4/15/2015	10:30 AM

3. Left clicking on each of the 3 crashes that showed up in about:crashes changed them to the following:
bp-26247221-d200-4bcb-9826-6ad122150415	4/15/2015	10:35 AM
bp-4f775f6d-3e3b-4543-a6dc-6cde52150415	4/15/2015	10:33 AM
bp-de989c81-2290-4198-954f-6cbf02150415	4/15/2015	10:30 AM

Let me know if you need any other info on this. Also, if you think that issue #2 should be tracked separately.
Flags: needinfo?(florin.mezei)
Should the crash report GUID be appended to "Crash report submitted successfully"?
Something is definitely going wrong with submission and handling the submission there. The client creates a really random ID for the crashes, and then the collector (the part of Socorro that actually receives the crash reports) should assign one ending in the date (150415 in your case) and the client replaces its own with that one. Somehow that second part does not seem to happen.
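
The difference between the two ID forms can be sketched as follows. This assumes, based only on the IDs listed in this bug, that a collector-assigned ID is "bp-" plus a UUID whose last six characters read as YYMMDD, and that a client-side ID is a bare UUID in either case. The helper name `classify_crash_id` and the choice to read the year as 20YY are assumptions made for illustration.

```python
import re
from datetime import date

UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
SUBMITTED_RE = re.compile(r"^bp-(" + UUID + r")$")
LOCAL_RE = re.compile(r"^" + UUID + r"$")

def classify_crash_id(crash_id):
    """Return ("submitted", date), ("local", None) or ("unknown", None)."""
    match = SUBMITTED_RE.match(crash_id)
    if match:
        tail = match.group(1)[-6:]  # last six characters, read as YYMMDD
        try:
            # Assumption: two-digit years belong to the 2000s.
            day = date(2000 + int(tail[0:2]), int(tail[2:4]), int(tail[4:6]))
        except ValueError:
            return ("unknown", None)  # tail was not a valid date
        return ("submitted", day)
    if LOCAL_RE.match(crash_id):
        return ("local", None)  # client-generated, not yet accepted
    return ("unknown", None)

submitted = classify_crash_id("bp-26247221-d200-4bcb-9826-6ad122150415")
local = classify_crash_id("21658f7b-dc66-4e9f-b49c-3ac953ed54eb")
```

An entry that stays "local" after the Crash Reporter claimed success is exactly the symptom described in this bug.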
[Tracking Requested - why for this release]:
Given this means that Windows XP crash reports will not show up in crash stats, I'm nominating this for all not-yet-released trains.
rhelmer, do you think this should be reported separately? Does this happening on a Mac mean that Mac crashes may also be underreported?
Flags: needinfo?(rhelmer)
Tracking for 39+.  Is this a problem in Socorro itself or in Firefox?
This is purely about an issue with the (new) Windows XP SP2 end point of crash reporting, if there's something on Mac, that would need an entirely new bug.
(In reply to Liz Henry (:lizzard) from comment #9)
> Tracking for 39+.  Is this a problem in Socorro itself or in Firefox?

Socorro is set up to receive crashes from XP (we have a special weaker HTTPS endpoint for it) set up in bug 1138794, that might need further adjustment but it seemed to test OK from XP at the time.

Finding someone with the right version of XP is the trick.

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #10)
> This is purely about an issue with the (new) Windows XP SP2 end point of
> crash reporting, if there's something on Mac, that would need an entirely
> new bug.

Yes agreed, the XP situation is quite unique and unlikely to be related to any other problem.
Flags: needinfo?(rhelmer)
Richard, does the end point set up in bug 1138794 comment #55 depend on SNI possibly? I just realized that we need to make sure we don't depend on that for this end point as WinXP didn't support SNI yet.
Flags: needinfo?(rsoderberg)
No, it's not. You might open IE6 SP2 and verify that the endpoint loads correctly without SSL errors, since that's a test I cannot perform myself.
Flags: needinfo?(rsoderberg)
(That endpoint has a dedicated IP address shared by no other uses, because we can't mix 'permit SSLv3' and 'prohibit SSLv3' on any given IP address.)
Florin, can you do the test stated in comment #13?
Flags: needinfo?(florin.mezei)
(In reply to Richard Soderberg [:atoll] from comment #13)
> No, it's not. You might open IE6 SP2 and verify that the endpoint loads
> correctly without SSL errors, since that's a test I cannot perform myself.

If this means loading crash-reports-xpsp2.mozilla.com in IE6 SP2, I've tried this today on Windows XP SP2, but the page did not load:
- http://crash-reports-xpsp2.mozilla.com - tells me it cannot find the page
- https://crash-reports-xpsp2.mozilla.com - redirects to https://www.mozilla.org/en-US/firefox/new/

Used IE6:
Version: 6.0.2900.2180.xpsp_sp2_gdr.100216-1441
Update Versions: SP2
Flags: needinfo?(florin.mezei)
It's https only and redirected you without SSL errors, which means it's working as configured anyways.
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #16)
> (In reply to Richard Soderberg [:atoll] from comment #13)
> > No, it's not. You might open IE6 SP2 and verify that the endpoint loads
> > correctly without SSL errors, since that's a test I cannot perform myself.
> 
> If this means loading crash-reports-xpsp2.mozilla.com in IE6 SP2, I've tried
> this today on Windows XP SP2, but the page did not load:
> - http://crash-reports-xpsp2.mozilla.com - tells me it cannot find the page
> - https://crash-reports-xpsp2.mozilla.com - redirects to
> https://www.mozilla.org/en-US/firefox/new/
> 
> Used IE6:
> Version: 6.0.2900.2180.xpsp_sp2_gdr.100216-1441
> Update Versions: SP2

This endpoint accepts only HTTP POST, GETs are redirected.
:kairo, is there anything else we can diagnose for you server-side, or do you have what you needed for comment 1?
Flags: needinfo?(kairo)
(In reply to Richard Soderberg [:atoll] from comment #19)
> :kairo, is there anything else we can diagnose for you server-side, or do
> you have what you needed for comment 1?

It doesn't sound to me like we'd be any closer to a result. We'd need someone with a WinXP SP2 machine and someone who can watch the server side (including the Socorro collector) to work together and see where things flow correctly and where they get stuck.
Flags: needinfo?(kairo)
If I can get timestamped logs from a crash reporter running on SP2, I can correlate that with logs on the server side to ensure that we're passing the traffic correctly through Zeus, and then you and the Socorro team can work together directly to trace the issue beyond that.
Florin, are the logs mentioned in comment #21 something you could provide?
Flags: needinfo?(florin.mezei)
(In reply to Richard Soderberg [:atoll] from comment #21)
> If I can get timestamped logs from a crash reporter running on SP2, I can
> correlate that with logs on the server side to ensure that we're passing the
> traffic correctly through Zeus, and then you and the Socorro team can work
> together directly to trace the issue beyond that.

Richard, can you provide some details on how to set up and get these logs?
Flags: needinfo?(florin.mezei) → needinfo?(rsoderberg)
I cannot, sorry. My scope of knowledge for this bug is restricted to the load balancer component:

Firefox Client -> crash-reports-xpsp2 Load Balancer VIP -> Socorro cluster

Given a timestamped log from the Firefox Client, or a source IP and a window of time within +/- 60 seconds of the test, I should be able to locate the Load Balancer logs to confirm a successful SSL negotiation from the client. Once that's confirmed, the Socorro team can investigate using that same timestamped log to try and understand what's going wrong. Additionally, the log from the client may well expose the issue without any further investigation.
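
As a rough sketch of that correlation, the snippet below matches client timestamps against server timestamps within the +/- 60 second window. The server timestamps are invented for illustration, the function name `match_within_window` is hypothetical, and clock skew or time zone differences between client and load balancer are not handled.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=60)  # the +/- 60 second tolerance mentioned above

def match_within_window(client_times, server_times, window=WINDOW):
    """Map each client timestamp to the server timestamps within +/- window."""
    return {
        client: [s for s in server_times if abs(s - client) <= window]
        for client in client_times
    }

fmt = "%m/%d/%y %H:%M:%S"  # timestamp format used by submit.log in this bug
client = [
    datetime.strptime("04/15/15 10:30:17", fmt),
    datetime.strptime("04/15/15 10:33:52", fmt),
]
# Hypothetical load balancer timestamps, invented purely for illustration.
server = [datetime(2015, 4, 15, 10, 30, 40), datetime(2015, 4, 15, 10, 40, 0)]
matches = match_within_window(client, server)
```

A client entry with an empty match list would suggest the request never reached the load balancer in that window.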

I have assumed up to this point that the client supports debug logging for crash report submission, as it appears to be extremely difficult to analyze any issues with submissions without a debug log (as evidenced by the previous twenty comments). Could someone please point us to the instructions for enabling crash reporting debug logging on the client so that we can get more certainty about what's happening here?
Flags: needinfo?(rsoderberg)
:rhelmer, in comment 2 you were able to reproduce this issue somehow - could you please give us the steps to reproduce and provide the client crash reporting logs from that effort?
cc :kjozwiak who has an XP SP2 Windows VM of the type necessary to test this issue, in case anyone needs one.
Too late for 38.
Comment 4 and Comment 16 suggest that this is a client bug -- at least, there doesn't look to be much evidence that there's a server problem.

Comment 3 and Comment 6 seem to disagree, but if so, a Windows dev is needed to investigate.

Still, if this is tracking-39, it should have an assignee.

Benjamin, could you triage and assign, please?
Flags: needinfo?(benjamin)
Redirecting to dmajor.
Flags: needinfo?(benjamin) → needinfo?(dmajor)
(In reply to Florin Mezei, QA (:FlorinMezei) [PTO - May 18-29] from comment #0)
> - the Crash Reporter says that the crash reports were submitted successfully

I am not seeing that behavior.

Environment: Fresh XPSP2 install, fresh Firefox 38.0.1 install

- I forced a crash with CrashMe
- On the crash reporter I said "Quit Firefox"
- "There was a problem submitting your report."
- Nothing on about:crashes
- Forced another crash
- On the crash reporter I said "Restart Firefox"
- "There was a problem submitting your report."
- Non bp- crash on about:crashes

Tested the following in IE (to simulate crashreporter's network stack):
- https://crash-reports.mozilla.com/ -> error
- https://crash-reports-xpsp2.mozilla.com/ -> redirect to mozilla.org

How can I debug this further?
Flags: needinfo?(dmajor) → needinfo?(ted)
I guess I would crash Firefox on the Windows XP SP2 box, then attach a debugger to the crashreporter process and step through and see what's happening.
Flags: needinfo?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #31)
> I guess I would crash Firefox on the Windows XP SP2 box, then attach a
> debugger to the crashreporter process and step through and see what's
> happening.

HttpSendRequest gets a 302.

Now that I think about it, this redirect shouldn't be happening:
> Tested the following in IE (to simulate crashreporter's network stack):
> - https://crash-reports-xpsp2.mozilla.com/ -> redirect to mozilla.org

Shouldn't crash-reports-xpsp2 be set up just like crash-reports?
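
For reference, a POST that does not follow redirects can be probed from a script roughly like this. It is only a sketch: the "/submit" path is an assumption that does not come from this bug, both function names are hypothetical, and a modern Python TLS stack will not negotiate the way the XP SP2 network stack does, so it can show the redirect but not any SSL problem.

```python
import http.client

def describe_response(status, location=None):
    """Summarize what an HTTP status means for a submission attempt."""
    if 200 <= status < 300:
        return "accepted"
    if 300 <= status < 400:
        return "redirected to %s" % (location or "an unspecified location")
    return "rejected with HTTP %d" % status

def probe_post(host, path="/submit", timeout=10):
    """POST an empty body and report the raw status.

    http.client never follows redirects, so a 302 is reported as-is.
    The "/submit" default path is an assumption, not taken from this bug.
    """
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("POST", path, body=b"", headers={"Content-Length": "0"})
        response = conn.getresponse()
        return describe_response(response.status, response.getheader("Location"))
    finally:
        conn.close()
```

Calling `probe_post("crash-reports-xpsp2.mozilla.com")` and `probe_post("crash-reports.mozilla.com")` side by side would show whether the two host names are answered differently.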
Flags: needinfo?(rsoderberg)
(In reply to David Major [:dmajor] from comment #32)
> Shouldn't crash-reports-xpsp2 be set up just like crash-reports?

It should, just with weaker security.
The socorro-collectorX.webapp.phx1 nodes are behaving differently depending on whether the domain is 'crash-reports-xpsp2.mozilla.com' or 'crash-reports.mozilla.com', which they should not be.

I verified that the load balancers are configured equivalently and that everything is routed to the same worker pool (socorro-collectorX) regardless of which IP/domain the request is for.

This is where I would normally move this to the Socorro component for further triage - only we're already in the Socorro component.
Flags: needinfo?(rsoderberg)
+ServerAlias crash-reports-xpsp2.mozilla.com

Shipping this to the crash-reports httpd config.
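
Why a missing ServerAlias can produce a redirect: with Apache name-based virtual hosts, a request whose Host header matches no ServerName or ServerAlias is served by the first virtual host listed. The sketch below models that rule in simplified form (exact matches only, no wildcards). The vhost layout, including the idea that the first vhost was the one issuing the redirect, is an assumption for illustration and not the actual server config.

```python
def select_vhost(vhosts, host_header):
    """Pick a vhost by exact ServerName/ServerAlias match, else the first one."""
    wanted = host_header.lower()
    for vhost in vhosts:
        names = [vhost["server_name"]] + vhost.get("server_aliases", [])
        if wanted in (name.lower() for name in names):
            return vhost
    return vhosts[0]  # no match: the first listed vhost acts as the default

# Hypothetical layout before the fix: no alias for the xpsp2 host name.
before = [
    {"server_name": "default.example", "handler": "redirect"},
    {"server_name": "crash-reports.mozilla.com", "handler": "collector"},
]
# Same layout with the alias from the fix added to the collector vhost.
after = [
    before[0],
    {
        "server_name": "crash-reports.mozilla.com",
        "server_aliases": ["crash-reports-xpsp2.mozilla.com"],
        "handler": "collector",
    },
]
handler_before = select_vhost(before, "crash-reports-xpsp2.mozilla.com")["handler"]
handler_after = select_vhost(after, "crash-reports-xpsp2.mozilla.com")["handler"]
```

Under these assumptions the xpsp2 host name lands on the redirecting default before the change and on the collector afterwards.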
With the server fix in place, crash reports are working. The crash reporter UI reports success (with both "Quit" and "Restart"), and Firefox shows the entries in about:crashes.
Yay!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Untracking this for 39 and 40 since this is now fixed (but not fixed in any particular version of Firefox).
Verified fixed with 41.0a1 (Build ID: 20150527135446) and 39.0b1 build 2 (Build ID: 20150523155636), under XP 32-bit SP2 and SP3.

Although, under XP SP2 x64 with the same two builds (41.0a1 and 39.0b1), this issue is reproducible - please see the details below.

Testing results:
* the links via about:crashes are in this form -> "7bb2be59-0065-47a0-8b31-f045a2620982" and point to https://crash-stats.mozilla.com/about/throttling
* when I right click on any link and select to open it in a separate tab, the about/throttling page is displayed
* when I left click on any link, it changes to a link with the correct form, but a different ID
* none of the comments are submitted
* '[05/28/15 10:44:23] Crash report submission failed: A connection with the server could not be established' is logged in submit.log for every submitted crash
* when loading crash-reports-xpsp2.mozilla.com (both http and https) in IE6 SP2 (version 6.0.3790.3959), it redirects to ‘The page cannot be displayed’ page

IDs before any link changed:
9fa9d56e-fbe4-4c17-bbe8-505d245103f3
08aaa34f-f9ca-4b40-b0c7-a628e627b00e
e593bed3-b46f-4d85-9ddf-b6958a4f0dd6

IDs after several links changed:
bp-3a2134db-44b6-42f4-a6b9-ecd952150528
bp-bb9fdcc4-f326-4652-a2fd-3786e2150528
bp-17986936-0de3-4a92-80a7-5e7b12150528
Flags: needinfo?(rsoderberg)
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #41)
> Although, under XP SP2 x64 with the same two builds (41.0a1 and 39.0b1),
> this issue is reproducible - please see the details below.

David, is the mechanism to use the XP SP2 endpoint even detecting XP 64bit correctly?

That said, there are probably very few people on 64bit XP.
Flags: needinfo?(dmajor)
> David, is the mechanism to use the XP SP2 endpoint even detecting XP 64bit
> correctly?

Probably not, because XP x64 isn't really Windows XP! Under the hood it's the same code branch as Windows Server 2003.
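
To illustrate how a version check can miss XP x64: 32-bit XP reports version 5.1, while XP x64 and Server 2003 both report 5.2. Both functions below are hypothetical. Neither is the check Firefox actually used nor the patch that landed for this bug; they only show how a test keyed to 5.1 skips 5.2 systems.

```python
# Windows version numbers: XP 32-bit is 5.1, XP x64 and Server 2003
# are both 5.2, and Vista is 6.0.

def naive_needs_legacy_endpoint(major, minor, service_pack):
    """Hypothetical check that only recognizes 32-bit XP before SP3."""
    return (major, minor) == (5, 1) and service_pack < 3

def broader_needs_legacy_endpoint(major, minor, service_pack):
    """Hypothetical alternative: treat every pre-Vista system alike."""
    return (major, minor) < (6, 0)

xp32_sp2 = naive_needs_legacy_endpoint(5, 1, 2)   # 32-bit XP SP2
xp64_sp2 = naive_needs_legacy_endpoint(5, 2, 2)   # XP x64 SP2 reports 5.2
```

The naive version returns False for any 5.2 system regardless of service pack, so those machines would keep using the regular endpoint.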

Alexandra, can you try the same test on Windows Server 2003 SP1 and Server 2003 SP2?
Flags: needinfo?(rsoderberg)
Flags: needinfo?(dmajor)
Flags: needinfo?(alexandra.lucinet)
I've tested on Windows Server 2003 SP1 and Windows Server 2003 SP2 using Firefox 39.0b1 build 2 (buildID: 20150523155636) and I have the following results: 

- on Windows Server 2003 SP2 (32bit and 64bit), the issue is fixed

- on Windows Server 2003 SP1 (32bit and 64bit), the issue is reproducible - please see the details below:

       * the links via about:crashes are in this form -> "7bb2be59-0065-47a0-8b31-f045a2620982" and point to https://crash-stats.mozilla.com/about/throttling
       * when I right click on any link and select to open it in a separate tab, the about/throttling page is displayed
       * when I left click on any link, it changes to a link with the correct form, but a different ID
       * none of the comments are submitted

       IDs before any link changed:
         d6a3478a-8fb7-44ba-91c1-b5c44bca61da
         38c7a5bb-e7d8-46d3-902e-932f2ece4413
         a58b5311-9b94-4f92-82de-42e07bca12f7
         637c72dd-4989-41dd-9070-a064d0826e86

       IDs after several links changed:
         bp-d18c29c7-27d5-49e1-8b22-1aeb72150529
         bp-41c12a97-434a-4061-8d9a-54af42150529
         bp-098e7b04-97a2-4d2c-b284-50b3f2150529
         bp-427dc611-120c-46af-8893-64b232150529
Flags: needinfo?(alexandra.lucinet)
Can you please try this build: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dmajor@mozilla.com-ab337a57ebf1/try-win32/

On XP SP2 (32 and 64), XP SP3, Server 2003 SP1, and Server 2003 SP2? Thanks!
Flags: needinfo?(camelia.badau)
I've tested on Windows XP SP2 (32bit and 64bit), Windows XP SP3 (32bit), Windows Server 2003 SP1 (32bit and 64bit) and Windows Server 2003 SP2 (32bit and 64bit) using the try build from comment 45 (http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dmajor@mozilla.com-ab337a57ebf1/try-win32/) and the issue is fixed.
Flags: needinfo?(camelia.badau)
Assignee: nobody → dmajor
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This is almost more trouble than it's worth :-/
Attachment #8614068 - Flags: review?(ted)
Attachment #8614068 - Flags: review?(ted) → review+
https://hg.mozilla.org/mozilla-central/rev/dc43c6bdd468
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
¡Hola Florin!

I've found that middle clicking on unsubmitted crash reports on Nightly 64-bit on Windows 7 [Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0 ID:20150916030203 CSet: 3e8dde8f8c174cce2c0b65c951808f88e35d1875] also sends directly to https://crash-stats.mozilla.com/about/throttling

Is this the same bug or a different one?

¡Gracias!
Flags: needinfo?(florin.mezei)
You'd see that any time a crash fails to submit. Crashes can fail to submit for any number of reasons.
¡Hola Ted!

But how come left-clicking submits those and then points you in the right direction?

Could something be done so that middle-click does what left-click does?

¡Gracias!
Flags: needinfo?(ted)
That's a longstanding bug--bug 557739.
Flags: needinfo?(ted)
Removing the ni? since Ted has answered pretty much all questions. Note that this issue was filed for Windows XP SP2 x86 always showing all crash links in unsubmitted form, despite reporting that crashes were successfully submitted. As Ted explained, crashes can fail to submit for various reasons, so we can still get this issue every now and then on various OSes.
Flags: needinfo?(florin.mezei)
The check is not even correct; I think it depends on whether a hotfix/update is installed on Server 2003.