Closed
Bug 687888
Opened 13 years ago
Closed 13 years ago
Missing symbols in crash reports for recent nightly builds
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: alice0775, Assigned: nmaul)
Details
Build Identifier: http://hg.mozilla.org/mozilla-central/rev/648d084ca28e
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0a1) Gecko/20110920 Firefox/9.0a1 ID:20110920030905

Crash Reporter always sends the signature xul.dll@xxxxx or mozjs.dll@xxxx, as it would for an hourly build, instead of the exact signature.

Reproducible: Always

Steps to Reproduce:
1. Start the browser.
2. Crash the browser.

Actual Results:
Always sends the signature xul.dll@xxxxx or mozjs.dll@xxxx.

Expected Results:
Should send the exact signature.
Comment 1•13 years ago
Symbols were successfully uploaded for this build, so I suspect it's a problem on the Socorro side.
Component: Release Engineering → Socorro
Product: mozilla.org → Webtools
QA Contact: release → socorro
Reporter
Comment 2•13 years ago
This has been happening since http://hg.mozilla.org/mozilla-central/rev/5319b0100025
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0a1) Gecko/20110919 Firefox/9.0a1 ID:20110919030912
Comment 3•13 years ago
Can you paste a link to one of your crash reports?
Reporter
Comment 4•13 years ago
Wrong signature on Build ID 20110919030912: bp-b9949de2-d944-4b48-b7b9-37c572110920
Wrong signature on Build ID 20110920030905: bp-e9e04be9-9837-4aaf-92b4-9800d2110920

The signatures above should be the same as this one:
Correct signature on Build ID 20110918030911: bp-94216f13-5f95-44f8-9abb-6bba62110920
Comment 5•13 years ago
Okay, I can reproduce the same issue using "Crash me now": https://crash-stats.mozilla.com/report/index/bp-eaee3815-ce8d-4ff9-b066-601562110920 I've looked on the symbol store in SJC (dm-symbolpush01) and the symbols appear to be present, so perhaps there's an issue with the syncing to the PHX datastore.
Assignee
Comment 6•13 years ago
The sync job runs on dm-symbolpush01 in SJC and syncs to ip-admin01.phx in PHX1. I can confirm that it appears to be working properly and hasn't emailed us any errors. Ted gave me a sample file to check:

SJC:
[root@dm-symbolpush01 ~]# md5sum /mnt/netapp/breakpad/symbols_ffx/mozjs.pdb/471CD365868F4341ACB60D2C33D9D9992/mozjs.sym
57832686a69bf0d7b82f6b7e5929c6c0 /mnt/netapp/breakpad/symbols_ffx/mozjs.pdb/471CD365868F4341ACB60D2C33D9D9992/mozjs.sym

PHX:
[root@ip-admin01 ~]# md5sum /mnt/pio_symbols/symbols_ffx/mozjs.pdb/471CD365868F4341ACB60D2C33D9D9992/mozjs.sym
57832686a69bf0d7b82f6b7e5929c6c0 /mnt/pio_symbols/symbols_ffx/mozjs.pdb/471CD365868F4341ACB60D2C33D9D9992/mozjs.sym

So the file exists on both sides, and the copies are identical.
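The check above compares the MD5 digests of the SJC and PHX copies with md5sum. A minimal Python sketch of the same comparison, assuming both paths are visible from the machine running it (in the comment they were checked on two different hosts, so in practice each digest would be computed where its file lives). `md5_of` and `copies_match` are hypothetical helper names, not part of any existing tool:

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file; the same value md5sum prints."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large .sym files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source: Path, replica: Path) -> bool:
    """True if the replica exists and has the same MD5 digest as the source."""
    # Check existence first so a missing replica reports False instead of raising.
    return replica.is_file() and md5_of(source) == md5_of(replica)
```

Run against the two mozjs.sym paths above, `copies_match` would return True, matching the identical digests in the md5sum output.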
Comment 7•13 years ago
So either things aren't being synced quickly enough (which seems unlikely: the script runs every 5 minutes and the files appear to be there), or something has gone wrong with our processor config and it can't see the symbols.
Severity: normal → blocker
Comment 8•13 years ago
I crashed my almost-month-out-of-date Linux build and it had symbols: https://crash-stats.mozilla.com/report/index/bp-656a50ec-716e-4180-8fed-1e8322110920 I updated to today's nightly build and crashed again and no symbols: https://crash-stats.mozilla.com/report/index/bp-106685f2-08b2-4892-bd13-84e1d2110920 It looks like the syncing script isn't working quickly enough, or something like that.
Assignee: nobody → server-ops
Component: Socorro → Server Operations
Product: Webtools → mozilla.org
QA Contact: socorro → cshields
Updated•13 years ago
Assignee: server-ops → nmaul
Updated•13 years ago
Assignee: nmaul → server-ops
Assignee
Comment 9•13 years ago
It syncs from sjc->phx every 5 minutes. Would you be able to give more examples of symbols files that seem to be missing? The one from comment 6 is good, are there others we can check? Why do we believe this is a sync issue from SJC to PHX? Dropping prio to avoid paging on-call so quickly.
Assignee: server-ops → nmaul
Severity: blocker → critical
Comment 10•13 years ago
Per comment 8, an older Linux nightly worked fine (which indicates that symbols aren't completely broken), but today's nightly had no symbols (which indicates that it's not just Windows nightlies). Right after viewing that crash report, I checked that one of its symbol files was present on dm-symbolpush01:

symbols_ffx/libxul.so/42037E50B4400C0C59F9F9F08F81174C0/libxul.so.sym

So the only explanation I can think of is that it's not being synced to PHX properly. I just tried crashing that same build again, and the symbols are still not showing up: https://crash-stats.mozilla.com/report/index/16ad86be-9c20-4791-a3cb-2476c2110920

Is that symbol file present in PHX?
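The path quoted above follows the Breakpad symbol store layout, `<debug file>/<debug ID>/<symbol file>`: a Windows `.pdb` name has its extension replaced by `.sym` (as in `mozjs.pdb/.../mozjs.sym` from comment 6), while other names simply get `.sym` appended. The sketch below builds that relative path and looks for it under several roots, which is one way to ask "is it present in PHX?" of each copy. `symbol_relpath` and `find_symbol` are hypothetical helpers, not Socorro functions:

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, Optional


def symbol_relpath(debug_file: str, debug_id: str) -> PurePosixPath:
    """Relative path of a .sym file in a Breakpad-style symbol store."""
    if debug_file.lower().endswith(".pdb"):
        # Windows: mozjs.pdb -> mozjs.sym
        sym_name = debug_file[:-4] + ".sym"
    else:
        # Linux/Mac: libxul.so -> libxul.so.sym
        sym_name = debug_file + ".sym"
    return PurePosixPath(debug_file, debug_id, sym_name)


def find_symbol(roots: Iterable[Path], debug_file: str, debug_id: str) -> Optional[Path]:
    """Return the first existing .sym file across the given roots, or None."""
    rel = symbol_relpath(debug_file, debug_id)
    for root in roots:
        candidate = Path(root) / rel
        if candidate.is_file():
            return candidate
    return None
```

For example, `symbol_relpath("libxul.so", "42037E50B4400C0C59F9F9F08F81174C0")` yields the `libxul.so/.../libxul.so.sym` path quoted in this comment, relative to `symbols_ffx`.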
Updated•13 years ago
Summary: Nightly9.0a1 ID:20110920030905 , Something wrong in crash report Signature → Missing symbols in crash reports for recent nightly builds
Comment 11•13 years ago
The sync process doesn't appear to be picking everything up. For example, the symbols for Ted's crash:

In SJC, in /mnt/pio_symbols/symbols_ffx/mozjs.pdb:
-rw-r--r-- 1 ffxbld users 5119934 Sep 20 11:37 mozjs.pdb/9C6EB556CE72441BAAAED9531D1FDFCA2/mozjs.sym

In PHX, looking in /mnt/socorro/symbols/symbols_ffx/mozjs.pdb: this file doesn't exist, 24 hours later.
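The spot check above (file present in SJC, absent in PHX) can be generalized by comparing the relative file paths of the two trees. A minimal sketch, assuming both trees are reachable from one host; `missing_in_replica` is a hypothetical helper. It walks every file, so on a symbol store this size it would be slow (comment 13 puts the full sync at about 5 hours), and you would probably want to point it at a subtree such as symbols_ffx/mozjs.pdb:

```python
from pathlib import Path
from typing import List, Set


def _relative_files(root: Path) -> Set[str]:
    """All regular files under root, as POSIX-style paths relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def missing_in_replica(source_root: Path, replica_root: Path) -> List[str]:
    """Files present under source_root but absent under replica_root, sorted."""
    return sorted(_relative_files(source_root) - _relative_files(replica_root))
```

This only detects missing files, not stale or truncated ones; combining it with a digest comparison like comment 6's md5sum check would cover both.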
Comment 12•13 years ago
How can we verify the processor config? Here is what is in /etc/socorro/common.conf:

export processorSymbolsPathnameList="/mnt/socorro/symbols/symbols_ffx,/mnt/socorro/symbols/symbols_sea,/mnt/socorro/symbols/symbols_tbrd,/mnt/socorro/symbols/symbols_mob,/mnt/socorro/symbols/symbols_penelope,/mnt/socorro/symbols/symbols_sbrd,/mnt/socorro/symbols/symbols_camino,/mnt/socorro/symbols/symbols_os,/mnt/socorro/symbols/symbols_solaris,/mnt/socorro/symbols/symbols_opensuse,/mnt/socorro/symbols/symbols_ubuntu,/mnt/socorro/symbols/symbols_fedora"

Can someone verify whether that is correct, or whether any other config values should be checked?
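One way to start on the question above is to confirm that every root in processorSymbolsPathnameList exists on each processor. A sketch, assuming the config is a shell-style `export NAME="a,b,c"` line exactly as quoted; it does not use Socorro's own config loader, and `parse_symbol_roots` and `absent_roots` are hypothetical names. An existing but empty directory (for example, an NFS mount point with nothing mounted on it) passes this check, so it is only a first step:

```python
import os
import re
from typing import List

# Matches the quoted value of the export line shown in common.conf.
_SYMBOL_LIST_RE = re.compile(r'processorSymbolsPathnameList="([^"]*)"')


def parse_symbol_roots(conf_text: str) -> List[str]:
    """Extract the comma-separated symbol roots from common.conf text."""
    match = _SYMBOL_LIST_RE.search(conf_text)
    if match is None:
        return []
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


def absent_roots(roots: List[str]) -> List[str]:
    """Roots that do not exist as directories on this host."""
    return [r for r in roots if not os.path.isdir(r)]
```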
Assignee
Comment 13•13 years ago
With fresh eyes, the problem is obvious: the box in PHX that SJC syncs to has had its NFS share unmounted. This happened on Sunday, just minutes before the commit in comment 2. The server was rebooted, and this mount point was not set in /etc/fstab or puppet. I've added it to puppet, and it's mounted properly now.

I don't know whether the incremental job will be sufficient to fill in the gaps; we may need to re-run the complete job. That job usually runs Sunday at 3am and takes 5 hours. I suspect it'll take a bit longer during the day, and it will block the incremental sync while it's running. The incremental sync is running now, just in case that is sufficient. If it isn't, I'll start the full sync right away.
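Besides adding the mount to puppet, a sync job can guard against this failure mode by refusing to run when its destination isn't mounted. A minimal sketch using Python's `os.path.ismount`; the actual sync script isn't shown in this bug, so `require_mount` and `MountMissingError` are hypothetical names, and where the job would call the guard is an assumption:

```python
import os


class MountMissingError(RuntimeError):
    """Raised when a sync destination is not an active mount point."""


def require_mount(path: str) -> None:
    """Refuse to proceed unless `path` is currently a mount point.

    If an NFS share is not mounted, its mount point is just an ordinary
    local directory, so copying into it would write to the wrong filesystem.
    """
    if not os.path.ismount(path):
        raise MountMissingError(f"{path} is not a mount point; skipping sync")
```

Calling this on the destination directory before each incremental run would turn a silent gap into an error the job could email about.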
Assignee
Comment 14•13 years ago
The incremental sync doesn't seem to be getting the job done. I've started a full sync job, which should take 4-6 hours to complete. Once that's done, I believe things will be back to normal.
Comment 15•13 years ago
And for the record, bug 688186 covers replacing dm-symbolpush01 (which lives in SJC) with an equivalent upload server in PHX so that we can stop this syncing process.
Assignee
Comment 16•13 years ago
The full sync completed, and the incrementals have been running since. The link in comment 10 still doesn't show symbols, but is that normal? I don't know if it pulls symbols on every page hit, or just during the actual crash. Would someone be able to verify if this is working properly again?
Assignee
Comment 17•13 years ago
I think this is working again: https://crash-stats.mozilla.com/report/index/0f3361e6-f566-4605-9c3e-fd4132110922 I'm guessing it's expected behavior for crash-stats to not go back and fill in the data on reports generated during the problem interval. In any case I don't think there's anything more I can do here. Closing this out. If anyone knows how we might easily populate the symbols for crashes that have already been reported, please let me know.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 18•13 years ago
I think it is possible to re-process those crashes. I'd guess we'd need to redo all crashes between 20:20 Sunday and now. I think lars and jberkus have previously cooked up a way to tell postgres to re-queue crashes for processing. CC'ing them.
Comment 19•13 years ago
Note that it's Nightly (9.0a1) and Aurora (8.0a2) that need reprocessing for this timeframe. Reprocessing up to somewhere between midnight and morning Pacific today should be enough (I'm not sure exactly when new reports started getting symbols again).
Comment 20•13 years ago
(In reply to Jake Maul [:jakem] from comment #17)
> I'm guessing it's expected behavior for crash-stats to not go back and fill
> in the data on reports generated during the problem interval. In any case I
> don't think there's anything more I can do here. Closing this out.

This is correct, FWIW. We only use the symbols at the time of processing. We can re-process crashes to pick up new symbols, but it doesn't happen automatically.
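Because symbols are applied only at processing time, the set of crashes to re-queue is defined by when each crash was processed and which version it came from. A sketch of that selection over hypothetical in-memory records; the postgres re-queue mechanism lars and jberkus used isn't described in this bug, `ProcessedCrash` and `crashes_to_reprocess` are illustrative names rather than Socorro's schema, and the window bounds are left as parameters because the end of the bad interval is only approximately known:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, List

# Per comment 19: Nightly (9.0a1) and Aurora (8.0a2) were affected.
AFFECTED_VERSIONS: FrozenSet[str] = frozenset({"9.0a1", "8.0a2"})


@dataclass(frozen=True)
class ProcessedCrash:
    """Hypothetical minimal record; Socorro's real schema is not shown here."""
    uuid: str
    version: str
    processed_at: datetime


def crashes_to_reprocess(
    crashes: Iterable[ProcessedCrash],
    window_start: datetime,
    window_end: datetime,
    versions: FrozenSet[str] = AFFECTED_VERSIONS,
) -> List[str]:
    """UUIDs of affected-version crashes processed in [window_start, window_end)."""
    return [
        c.uuid
        for c in crashes
        if c.version in versions and window_start <= c.processed_at < window_end
    ]
```

Using a half-open window makes it easy to extend the end bound later without double-counting crashes processed exactly at the boundary.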
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard