Closed Bug 669947 Opened 13 years ago Closed 13 years ago

Deploy minidump stackwalker to new vhost on build.mozilla.org

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Unassigned)

Details

(Whiteboard: [buildbot-configs])

Can we create a new CNAME and vhost on build.mozilla.org called stackwalker.pvt.build.mozilla.org?

This should be accessible only to machines in the build network.

The source is located at http://hg.mozilla.org/users/tmielczarek_mozilla.com/minidump-stackwalk-cgi/file/3eb9c6e47b97

Build hosts need to be able to hit http://stackwalker.pvt.build.mozilla.org/stackwalker.cgi

ted, any special deployment instructions?
You need a minidump_stackwalk binary. You can grab the one from build/tools:
http://hg.mozilla.org/build/tools/raw-file/5998154615cf/breakpad/linux/minidump_stackwalk

You'll need to copy config.py.in -> config.py and adjust the paths. MINIDUMP_STACKWALK should point to the binary from above, and SYMBOL_CACHE_PATH needs to be a writable directory where the script can store symbol files. We'll probably also need to set up a cron to remove old files from that directory.
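
For illustration, a config.py along these lines might look like the sketch below. Only the two setting names come from this comment; the paths are placeholders, check_config is a hypothetical helper that is not part of the CGI, and the real config.py.in may define other settings.

```python
# Hypothetical config.py sketch. Only MINIDUMP_STACKWALK and
# SYMBOL_CACHE_PATH are named in the bug; the paths are placeholders.
import os

# Path to the minidump_stackwalk binary (placeholder location).
MINIDUMP_STACKWALK = "/path/to/minidump_stackwalk"

# Writable directory where the script can store symbol files (placeholder).
SYMBOL_CACHE_PATH = "/path/to/symbol_cache"


def check_config(binary=MINIDUMP_STACKWALK, cache=SYMBOL_CACHE_PATH):
    """Return a list of problems found; an empty list means both paths look usable.

    Hypothetical helper for a quick deployment sanity check.
    """
    problems = []
    if not (os.path.isfile(binary) and os.access(binary, os.X_OK)):
        problems.append("not an executable file: %s" % binary)
    if not (os.path.isdir(cache) and os.access(cache, os.W_OK)):
        problems.append("not a writable directory: %s" % cache)
    return problems
```

Running check_config() as the web server user would be the more telling test, since that is the account that has to write to the cache directory.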
Assignee: server-ops-releng → server-ops
Component: Server Operations: RelEng → Server Operations
QA Contact: zandr → mrz
Assignee: server-ops → server-ops-releng
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
I can take care of the webby parts of this - I'll need infra folks to take care of the DNS parts.
Assignee: server-ops-releng → dustin
I added a CNAME pointing stackwalker.pvt.build.mozilla.org at build.mozilla.org.
vhost is up at:
  http://stackwalker.pvt.build.mozilla.org/index.txt
accessible only from the build network.

I followed the instructions in comment 1.  SYMBOL_CACHE_PATH is /builds/stackwalker_symbols, since there was lots of space on that partition.  The CGI is now up at:
  http://stackwalker.pvt.build.mozilla.org/stackwalk.cgi

However, it seems to expect hashlib, which is new in Python 2.5 and so is not available in Python 2.4.3, the version installed on this system.  That's usually pretty easy to replace with the md5 module.  Can you fix that up and I'll put a new copy on there?

What sort of old files should the crontask look for?  Just by date, or only certain files?
I wrote up docs at
 https://mana.mozilla.org/wiki/display/SpecOps/Stackwalker+CGI

Still to do, once it's working:
* puppetize (I'll learn how this works on bug 604688)
* set up and document crontab
I pushed a fix to use md5 if hashlib isn't available, so it should work on Python 2.4 now.
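
A fallback of this kind usually has the shape sketched below. This is not the code from the CGI; the helper name md5_hexdigest is made up for illustration.

```python
# Sketch of a hashlib -> md5 fallback; md5_hexdigest is a hypothetical name.
try:
    import hashlib  # available from Python 2.5 onward

    def md5_hexdigest(data):
        """Return the hex MD5 digest of a byte string."""
        return hashlib.md5(data).hexdigest()
except ImportError:
    import md5  # Python 2.4 fallback; this module no longer exists in Python 3

    def md5_hexdigest(data):
        """Return the hex MD5 digest of a byte string."""
        return md5.new(data).hexdigest()
```

Wrapping both variants behind one function keeps the rest of the script unaware of which module was imported.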

It looks like I was thinking ahead when I wrote the CGI, and it updates the mtime on the cache directory if it uses an already-downloaded set of symbols, so you should be able to find directories that are immediate children of SYMBOL_CACHE_PATH and rm any whose mtime is older than whatever time period we decide. (24 hours?)
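
As a rough sketch of that scheme (not the CGI's own code), the two halves could look like this in Python. The function names and the age threshold are assumptions; only the idea of bumping the directory mtime on use and removing stale immediate children comes from the comment above.

```python
import os
import shutil
import time


def touch_cache_dir(path):
    """Set a cache directory's mtime to now, marking it as recently used."""
    os.utime(path, None)


def prune_cache(cache_root, max_age_seconds, now=None):
    """Remove immediate child directories of cache_root whose mtime is
    older than max_age_seconds. Returns the sorted list of removed paths."""
    if now is None:
        now = time.time()
    removed = []
    for name in sorted(os.listdir(cache_root)):
        path = os.path.join(cache_root, name)
        # Only real directories directly under the cache root; skip symlinks
        # and plain files.
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        if now - os.path.getmtime(path) > max_age_seconds:
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

For a 24-hour window the call would be prune_cache(SYMBOL_CACHE_PATH, 24 * 60 * 60), with SYMBOL_CACHE_PATH taken from config.py.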
OK, I think the CGI is working.  I get:

  Error: no minidump or no symbols

Also, I set up the following in cron.d:

MAILTO=dustin@mozilla.com
@daily root find  /builds/stackwalker_symbols/ -mindepth 1 -maxdepth 1 -mtime +7 -type d -exec rm -rf \{} \;

Once I stop getting any interesting emails, I'll change the MAILTO to release@

I think this is it for the ops side of things.  Next steps:

 - do a test stackwalk to ensure permissions are correct, etc. (I can do this if you tell me what to type, or I can get you Build VPN access with releng's permission)

 - adjust the buildbot config to use this (a releng project)
Okay, try this. Clone the minidump-stackwalk-cgi repo to your local machine (with VPN access to the machine hosting the CGI). Download this minidump file locally:
http://people.mozilla.com/~tmielczarek/6404faf2-deac-09f3-6f26264d-562d535c.dmp

Then run:
python testsubmit.py 6404faf2-deac-09f3-6f26264d-562d535c.dmp http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1311761319/firefox-8.0a1.en-US.linux-x86_64.crashreporter-symbols.zip http://stackwalker.pvt.build.mozilla.org/stackwalk.cgi

Then pastebin the output and send me the link on IRC.
So there was no output from this.  Internally, it's running

/var/www/html/stackwalker/minidump_stackwalk /tmp/tmp2f1kt6 /builds/stackwalker_symbols/90a461124f840d5d74fff3a542fd261d

which, when run on the console, gives

/var/www/html/stackwalker/minidump_stackwalk: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /var/www/html/stackwalker/minidump_stackwalk)
/var/www/html/stackwalker/minidump_stackwalk: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by /var/www/html/stackwalker/minidump_stackwalk)

So again we're being bitten by an ancient CentOS.  If there's source for this somewhere (maybe committed to the repo?), I'll be happy to recompile locally.  For the record:
  Red Hat Enterprise Linux Server release 5.5 (Tikanga)
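
One rough way to confirm this kind of mismatch before deploying a prebuilt binary is to compare the GLIBCXX version tags embedded in the binary against those in the system libstdc++. The sketch below is a heuristic byte scan, not a real ELF parser, and both function names are made up for illustration. It uses a bytes regex, so it assumes a newer Python than the 2.4 on this host.

```python
import re

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.11 in raw bytes.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(path):
    """Return the GLIBCXX version tags found in a file, sorted numerically."""
    with open(path, "rb") as f:
        data = f.read()
    found = set(m.group(1).decode("ascii") for m in _GLIBCXX.finditer(data))
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def missing_versions(binary_path, libstdcxx_path):
    """Return tags present in the binary but absent from the library."""
    provided = set(glibcxx_versions(libstdcxx_path))
    return [v for v in glibcxx_versions(binary_path) if v not in provided]
```

Called as missing_versions("/var/www/html/stackwalker/minidump_stackwalk", "/usr/lib/libstdc++.so.6"), a non-empty result would suggest the binary needs a newer libstdc++ than the host provides, which is what the loader errors above report.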
Yeah, you can grab it from SVN:
http://code.google.com/p/google-breakpad/source/checkout

Run configure && make, and the binary will wind up in src/processor.
Success!

/var/www/html/stackwalker/minidump_stackwalk /tmp/tmpGbVkxU /builds/stackwalker_symbols/90a461124f840d5d74fff3a542fd261d
Operating system: Linux
                  0.0.0 Linux 2.6.32-32-generic #62-Ubuntu SMP Wed Apr 20 21:52:38 UTC 2011 x86_64
CPU: amd64
     family 6 model 15 stepping 11
     4 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 0 (crashed)
 0  libcrashme.so + 0x2f1
    rbx = 0x0000000000000008   r12 = 0x00007f52011abbb8
    r13 = 0x0000000000000004   r14 = 0x0000000000000001
    r15 = 0x00007fff514e7258   rip = 0x00007f521fe042f1
    rsp = 0x00007fff514e7020   rbp = 0x00007fff514e7040
    Found by: given as instruction pointer in context
 1  libxul.so!js::mjit::ic::NativeCall [MonoIC.cpp:0a936ddb70e9 : 1031 + 0x4]

........ etc. 


Over to release engineering to set this up in the buildbot configs, then.  I also updated the docs to indicate that this needs to be compiled.
Assignee: dustin → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
(In reply to comment #11) 
> Over to release engineering to set this up in the buildbot configs, then.  I
> also updated the docs to indicate that this needs to be compiled.

What exactly does releng need to do here? Just set SYMBOL_CACHE_PATH in the env (since I think we already set MINIDUMP_STACKWALK)?
OS: Linux → All
Priority: -- → P3
Hardware: x86_64 → All
Whiteboard: [buildbot-configs]
I think our end of things is bug 561754.
Yeah, this bug was just to set up the CGI.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering