Closed Bug 1161720 Opened 9 years ago Closed 8 years ago

Snappy Symbolication Server isn't returning any data

Categories

(Socorro :: Symbols, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: vladan, Unassigned)

References

Details

Profiling Nightly fails with error message "Could not understand response from symbolication server at http://symbolapi.mozilla.org/"

Sending a test request to the symbol server returns a 0-length response.

$ curl -d '{"stacks":[[[0,11723767],[1, 65802]]],"memoryMap":[["xul.pdb","44E4EC8C2F41492B9369D6B9A059577C2"],["wntdll.
pdb","D74F79EB1F8D4A45ABCD2F476CCABACC2"]],"version":4}' http://symbolapi.mozilla.org/

curl: (18) transfer closed with outstanding read data remaining

$ curl --http1.0 -d '{"stacks":[[[0,11723767],[1, 65802]]],"memoryMap":[["xul.pdb","44E4EC8C2F41492B9369D6B9A059577C2"]
,["wntdll.pdb","D74F79EB1F8D4A45ABCD2F476CCABACC2"]],"version":4}' http://symbolapi.mozilla.org/

<exits with an empty response>

You might want to just check the logs and just restart the server before doing any further debugging.
Flags: needinfo?(ted)
I don't have access to the production server, I don't know what's going on. The code got updated a while ago, but maybe it's wedged or something?
Flags: needinfo?(ted) → needinfo?(smani)
Btw can I get SSH access to the symbol server to fix future issues?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #2)
> I don't have access to the production server, I don't know what's going on.
> The code got updated a while ago, but maybe it's wedged or something?

I kicked it fwiw.

The first test URL is a malformed request? That always fails for me. The second one works fine :

shyam@katniss ~ $ curl -v --http1.0 -d '{"stacks":[[[0,11723767],[1, 65802]]],"memoryMap":[["xul.pdb","44E4EC8C2F41492B9369D6B9A059577C2"],["wntdll.pdb","D74F79EB1F8D4A45ABCD2F476CCABACC2"]],"version":4}' http://symbolapi.mozilla.org/
* Hostname was NOT found in DNS cache
*   Trying 63.245.217.177...
* Connected to symbolapi.mozilla.org (63.245.217.177) port 80 (#0)
> POST / HTTP/1.0
> User-Agent: curl/7.37.1
> Host: symbolapi.mozilla.org
> Accept: */*
> Content-Length: 163
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 163 out of 163 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.3 Python/2.6.6
< Content-type: application/json
< Date: Thu, 07 May 2015 04:33:45 GMT
< Connection: close
<
* Closing connection 0
{"symbolicatedStacks": [["XREMain::XRE_mainRun() (in xul.pdb)", "KiUserCallbackDispatcher (in wntdll.pdb)"]], "knownModules": [true, true]}%
Flags: needinfo?(smani)
Your command returns no response when I try it. I'm on the Mozilla network in SF office

$  curl -v --http1.0 -d '{"stacks":[[[0,11723767],[1,65802]]],"memoryMap":[["xul.pdb","44E4EC8C2F41492B9369D6B9A059577C2"],["wntdll.pdb","D74F79EB1F8D4A45ABCD2F476CCABACC2"]],"version":4}' http://symbolapi.mozilla.org/

* STATE: INIT => CONNECT handle 0x80048140; line 1048 (connection #-5000)
* Added connection 0. The cache now contains 1 members
*   Trying 63.245.217.177...
* STATE: CONNECT => WAITCONNECT handle 0x80048140; line 1101 (connection #0)
* Connected to symbolapi.mozilla.org (63.245.217.177) port 80 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x80048140; line 1197 (connectio                     n #0)
* STATE: SENDPROTOCONNECT => DO handle 0x80048140; line 1215 (connection #0)
> POST / HTTP/1.0
> Host: symbolapi.mozilla.org
> User-Agent: curl/7.42.1
> Accept: */*
> Content-Length: 163
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 163 out of 163 bytes
* STATE: DO => DO_DONE handle 0x80048140; line 1305 (connection #0)
* STATE: DO_DONE => WAITPERFORM handle 0x80048140; line 1432 (connection #0)
* STATE: WAITPERFORM => PERFORM handle 0x80048140; line 1445 (connection #0)
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.3 Python/2.6.6
< Content-type: application/json
< Date: Thu, 07 May 2015 22:28:58 GMT
< Connection: close
<

* STATE: PERFORM => DONE handle 0x80048140; line 1615 (connection #0)
* Closing connection 0
* The cache now contains 0 members
Flags: needinfo?(smani)
Ping? We're pretty blind here for getting profiles from contributors until this gets fixed.
Flags: needinfo?(srich)
Mike: For the time-being, we can deploy the symbol server locally, since the symbols are available publicly in AWS anyways. 

1. Clone https://github.com/mozilla/Snappy-Symbolication-Server
2. Run it with "python symbolicationWebService.py 8000"
3. Point Nightly's profiler.symbolicationUrl pref to http://127.0.0.1:8000/ You don't need to restart Nightly

I had to fix a bug in the server to get this working (same bug in production?), but at least we can profile again

Shyam: can you deploy my fix on the production server? I already merged it into the github repo
On second thought, production almost definitely does not have the bug (it would only affect Windows servers). Also, users still can't capture profiles for us
I believe Vladan and I just fixed this over Vidyo. Required a complete kill of all running processes and a clean restart.
Flags: needinfo?(srich)
Flags: needinfo?(smani)
Yes, this is now working for me, thank you!
Were you able to figure out what was going on, or did you just kill it and hope for the best? I'm worried that the recent changes may have introduced some bug that doesn't show up until the server has been running for a while.
We just killed it, but at one point there was more than 1 symbolication server running, so that's one possible explanation
.. we also removed the mru symbols .json file
The server is down again. It looks like requests are timing out after 45 seconds. Now that the symbol server is fetching the data remotely from AWS instead of reading from NFS, the requests might be taking longer to process (> 45s), so the load balancer or something similar might be terminating the connections after 45 seconds.
Until the symbol server is working properly, here's an alternative server: http://symbolapi.mocotoolsstaging.net/
(In reply to Mike Conley (:mconley) - PTO until June 2nd from comment #17)
> Until the symbol server is working properly, here's an alternative server:
> http://symbolapi.mocotoolsstaging.net/

This is the staging site, we just brought up what will be the new production symbolapi which should be more stable:

http://symbolapi.mocotoolsprod.net/

At some point in the not-too-distant future, symbolapi.mozilla.org will point here too.
Is symbolapi.mozilla.org right and working now?
Component: General → Symbols
Not sure if it's related, but the current version of the gecko profiler fails to load profiles with the following error on the console:

> TypeError: profile.tasktracer is undefined Symbolicate Worker.js:285
We updated the addon on Wednesday to mark it as e10s compatible. I can confirm that it's broken at the moment (stuck at "Gathering unresolved symbols"), so it sounds like it was not e10s compatible after all. I'm sorry about that, I'll update it again today to revert that change.
I've released a new version which keeps the "multiprocess compatible" flag but fixes the tasktracer error, and profiling is working again on Mac and Linux. I haven't tested Windows which is the only platform that needs the symbol server IIRC.
It works now.

There also is a conflict with HTTPS Everywhere since the symbol server does not support https.
(In reply to The 8472 from comment #24)
> It works now.
> 
> There also is a conflict with HTTPS Everywhere since the symbol server does
> not support https.

Sad panda
I have filed https://github.com/EFForg/https-everywhere/issues/6613 with them so they can add an exception.

... unless TLS could be added quickly to the server?
We should have SSL on this unless there's a reason not to? Specifically since it's mozilla.org URL?
This issue is resolved. Please file the SSL issue or any others as new bugs. Thank you.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.