Closed Bug 485336 Opened 12 years ago Closed 12 years ago

uploadsymbols failures when mozilla-central & tracemonkey upload at same time

Categories

(Firefox Build System :: General, defect, P3)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: nthomas)

Attachments

(1 file)

Hit this a couple of times recently, eg 
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1238061720.1238064888.5771.gz
...
  inflating: libalerts_s.dylib/AFFC07027E0971FFED4931FBDE77F71B0/libalerts_s.dylib.dSYM.tar.bz2  
  error:  invalid compressed data to inflate
 bad CRC 00000000  (should be 4c73b72c)
file #17:  bad zipfile offset (local header sig):  598050
file #18:  bad zipfile offset (local header sig):  612710

I tried uploading that to stage and unzipping there and had no problems, same on the bm-xserve16 itself. Transmission errors ?

Similar problem in the Mar 23 nightly, bm-xserve19.

Any ideas Ted ? Or should we push this over to server ops to check on dm-symbolpush01.mozilla.org ?
I'm kind of stumped, I can't imagine how any changes I've made would cause CRC errors in the zipfile. The "bad CRC 00000000" seems pretty suspicious, too. I think IT should definitely have a look at this.
Looks like 
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/tools/upload_symbols.sh
is unconditionally removing the archive, so we can't look at what bits it got on upload. If it was still there we'd expect
 dm-symbolpush01.mozilla.org:
    /mnt/netapp/breakpad/symbols_ffx/
    firefox-3.6a1pre.en-US.mac.crashreporter-symbols.zip
to have a SHA1 sum of 6cfe3cb061a07360677ed5983a9322b21b792e06.

Is dm-symbolpush01 hale and hearty ? Would it have been affected by the eql storage issues ?
Assignee: nobody → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
187999099 bytes in size.
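For anyone repeating this comparison, here is a minimal Python sketch of computing the SHA1 and byte size of a local archive so they can be checked against the copy on the server. The helper name `archive_fingerprint` and the chunk size are illustrative assumptions, not something upload_symbols.sh provides.

```python
import hashlib
import os


def archive_fingerprint(path, chunk_size=1024 * 1024):
    """Return (sha1_hex, size_in_bytes) for the file at `path`.

    Hypothetical helper: reads in chunks so a large symbols zip
    does not have to be held in memory all at once.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)
```

Running it on the build machine and again on the server copy, if one survives, would show whether the bytes differ.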
Assignee: server-ops → aravind
This box seems to be fine.  The crash symbols are put on an nfs share anyway, so the recent eql problems shouldn't be affecting it.  The VM itself is on a netapp as well, so this shouldn't be an eql issue.

Also, I looked at the box briefly, and I/O wait etc. look fine.

From the server names you posted, it looks like these are happening only on OS X servers?  Maybe something specific to that platform is off?
We hit this one more time,
  http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1238752924.1238755931.28579.gz
on April 3rd from bm-xserve17, so it's specific to xserves at this point. The build directory was still around so I tried uploading the symbols by hand, which worked fine on three attempts. 

I think it might be a problem with both mozilla-central and tracemonkey uploading symbols with the same filename, when both builds get allocated xserves and the build takes about the same time. The uploadsymbols calls were at
 tracemonkey - Start: Fri Apr 3 03:50:59 2009, End: Fri Apr 3 03:51:49 2009
 moz-central - Start: Fri Apr 3 03:51:25 2009, End: Fri Apr 3 03:52:51 2009
so it seems pretty likely that the moz-central uploadsymbols (which succeeded) clobbered the tracemonkey file during or after upload.
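
As a quick check of the timing argument, a small Python sketch that computes how long the two upload steps overlapped. The timestamps are the ones quoted above; the function name `overlap_seconds` is just an illustrative choice.

```python
from datetime import datetime


def overlap_seconds(start_a, end_a, start_b, end_b):
    """Return how many seconds two [start, end] intervals overlap (0 if none)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max(0.0, (earliest_end - latest_start).total_seconds())


# Start/end times of the two uploadsymbols steps on Fri Apr 3 2009.
tracemonkey = (datetime(2009, 4, 3, 3, 50, 59), datetime(2009, 4, 3, 3, 51, 49))
moz_central = (datetime(2009, 4, 3, 3, 51, 25), datetime(2009, 4, 3, 3, 52, 51))

print(overlap_seconds(*tracemonkey, *moz_central))  # 24.0
```

So both steps were running at once for about 24 seconds, which fits the theory that they wrote to the same remote filename.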

We already set MOZ_SYMBOLS_EXTRA_BUILDID=tracemonkey to get manifests named like this 
 firefox-3.6a1pre-Darwin-20090403030630-tracemonkey-symbols.txt
Could we include that variable in the zip filename also ?

Thanks for looking Aravind.
Assignee: aravind → nobody
Component: Server Operations → Build Config
Product: mozilla.org → Core
QA Contact: mrz → build-config
Summary: uploadsymbols failures on mac mozilla-central nightlies → uploadsymbols failures on mac mozilla-central & tracemonkey nightlies
Version: other → Trunk
Another regression from bug 478221, fun! We can't change the zip filenames from the start, since Talos needs to be able to find and download them by name. We could, however, upload to a more-unique filename on the server to avoid these collisions. Should be easy enough to patch.
Assignee: nobody → ted.mielczarek
Blocks: 478221
Duplicate of this bug: 494606
Summary: uploadsymbols failures on mac mozilla-central & tracemonkey nightlies → uploadsymbols failures when mozilla-central & tracemonkey upload at same time
Tested on Linux - the symbol filenames seem to be the same on all platforms but the quoting should be OK anyway.

We have openssl on all our build platforms, and it's generally ubiquitous. I'm not sure that this change makes openssl a general build requirement anyway.
Attachment #379498 - Flags: review?
Attachment #379498 - Flags: review? → review?(ted.mielczarek)
As soon as I hit submit on this I had second thoughts, wondering if the symbol file hash can be identical if tracemonkey is sufficiently close to mozilla-central in code terms. But we should be OK by virtue of the zip format encoding timestamps.
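To illustrate the timestamp point, a minimal Python sketch that builds two in-memory zips with identical member data but different member timestamps and compares their SHA1 sums. The member name, payload, and helper name `zip_sha1` are made up for the example. Note that zip member times have two-second resolution, so this shows why a collision is unlikely rather than impossible.

```python
import hashlib
import io
import zipfile


def zip_sha1(name, data, date_time):
    """Build an in-memory zip with one member and return the archive's SHA1."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # date_time is (year, month, day, hour, minute, second) and is
        # stored in the member's header, so it changes the archive bytes.
        info = zipfile.ZipInfo(name, date_time=date_time)
        zf.writestr(info, data)
    return hashlib.sha1(buf.getvalue()).hexdigest()


payload = b"same symbol data"
a = zip_sha1("libfoo.sym", payload, (2009, 4, 3, 3, 50, 58))
b = zip_sha1("libfoo.sym", payload, (2009, 4, 3, 3, 51, 24))
print(a != b)  # True
```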
Assignee: ted.mielczarek → nthomas
Comment on attachment 379498 [details] [diff] [review]
prefix target filename with sha1sum on scp

sha1sum might be a bit overkill, but it's not that big of a deal.
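
For readers without the attachment to hand, a rough Python sketch of the idea: derive the remote filename from the SHA1 of the archive so two branches never write to the same path on the server, while the local zip keeps its original name. This is not the patch itself, which works in the shell script using openssl; the function name `unique_remote_name` and the `-` separator are assumptions made for illustration.

```python
import hashlib
import os


def unique_remote_name(local_path):
    """Return '<sha1>-<basename>' for the file at local_path (hypothetical scheme)."""
    digest = hashlib.sha1()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    # Only the upload target changes; the local filename is left alone.
    return f"{digest.hexdigest()}-{os.path.basename(local_path)}"
```

Two archives that share a basename but differ in content then map to different upload targets.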
Attachment #379498 - Flags: review?(ted.mielczarek) → review+
Priority: -- → P3
Whiteboard: Waiting for tree to reopen
Will close this after confirming the next set of nightlies work OK.
Whiteboard: Waiting for tree to reopen
All the mozilla-central Firefox and XULRunner nightlies worked fine with this patch in. We didn't collide with tracemonkey but that'll be fixed now.

Noticed that XULRunner symbols are pretty wonky: linux is just the .txt manifest, windows is that plus MOZCRT19.pdb; Mac looks plausible. Do we care ? Should we just disable symbols for XULRunner nightly runs ?
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: Core → Firefox Build System