Closed Bug 829690 Opened 7 years ago Closed 5 years ago

create_talos_zip.py incorrectly packages binaries on Windows, resulting in: "OSError: [Errno 13] Permission denied" at mozcrash.py line 115 in the check_for_crashes minidump stackwalk call

Categories

(Testing :: Talos, defect)


Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

References

Details

(Whiteboard: talos-android)

Attachments

(1 file)

Android Tegra 250 mozilla-inbound talos remote-tsvg on 2013-01-11 09:38:35 PST for push 6be5ab4c2fde

slave: tegra-239

https://tbpl.mozilla.org/php/getParsedLog.php?id=18719819&tree=Mozilla-Inbound

{
NOISE: __end_tp_report
NOISE: __start_cc_report
NOISE: _x_x_mozilla_cycle_collect,2944
NOISE: __end_cc_report
NOISE: __startTimestamp1357897693007__endTimestamp
NOISE: 
NOISE: __startBeforeLaunchTimestamp1357926237802__endBeforeLaunchTimestamp
NOISE: __startAfterTerminationTimestamp1357926499699__endAfterTerminationTimestamp
getting files in '/mnt/sdcard/tests/profile/minidumps/'
Traceback (most recent call last):
  File "run_tests.py", line 308, in <module>
    main()
  File "run_tests.py", line 305, in main
    run_tests(parser)
  File "run_tests.py", line 258, in run_tests
    talos_results.add(mytest.runTest(browser_config, test))
  File "/builds/tegra-239/talos-data/talos/ttest.py", line 410, in runTest
    self.cleanupAndCheckForCrashes(browser_config, profile_dir, test_config['name'])
  File "/builds/tegra-239/talos-data/talos/ttest.py", line 162, in cleanupAndCheckForCrashes
    test_name=test_name)
  File "/builds/tegra-239/talos-data/talos/mozcrash.py", line 115, in check_for_crashes
    stderr=subprocess.PIPE)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 623, in __init__
    errread, errwrite)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 1141, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
program finished with exit code 1
elapsedTime=288.874387

========= Finished 'python run_tests.py ...' failed (results: 2, elapsed: 4 mins, 48 secs) (at 2013-01-11 09:48:22.287576) =========
}
Temporarily backed bug 829580 out to see if it helps with this (struggling to see how it could be the cause, but worth a shot).
Hmm they seem to have died down after the backout :-/

This therefore must be due to one of:
https://github.com/mozilla/mozbase/commit/80752fd0f0d3887f2c563a27c03e9a2624cccf70
https://hg.mozilla.org/build/talos/rev/bb89d81b4624
Blocks: 829580
Diffing the former in-production talos.zip against my new one confirms there are only two changes and nothing else snuck in. I've sent each of these to Try to see which it is:

Just the mozcrash s/log/print/ change:
https://tbpl.mozilla.org/?tree=Try&rev=0c689e1738c9

Just the "pass test_name" change:
https://tbpl.mozilla.org/?tree=Try&rev=0a0bb9406105
maybe you are creating the talos.zip incorrectly?
Seeing how weird this is, I'm not going to rule anything out.

That said, I extracted both the zip that create_talos_zip.py created for me and the one being used in production before doing the diff. So if my zip was somehow corrupt or otherwise missing files, that would have shown up.

Boo talos, boo Android!
(In reply to Joel Maher (:jmaher) from comment #7)
> maybe you are creating the talos.zip incorrectly?

I've generated a zip of what should be the same as production, to see if this is the case:
https://tbpl.mozilla.org/?tree=Try&rev=e819ed9f083a
And I get the error on that run too (the one supposedly identical to what works on production, apart from the zip being generated on my Windows machine):
https://tbpl.mozilla.org/php/getParsedLog.php?id=18780742&tree=Try

Hilarity.
Bah, forgot the breakpad binary was in the talos repo too - which leads to the pretty obvious (in retrospect) conclusion that the permissions on the binary are being set incorrectly when creating talos.zip on Windows.

We'll soon be switching to using breakpad binaries out of the tools repo - at which point this will be a non-issue.
OS: Android → All
Hardware: ARM → All
Summary: Intermittent Android "OSError: [Errno 13] Permission denied" at mozcrash.py line 115 in the check_for_crashes minidump stackwalk call → create_talos_zip.py incorrectly packages binaries on Windows, resulting in: "OSError: [Errno 13] Permission denied" at mozcrash.py line 115 in the check_for_crashes minidump stackwalk call
No longer blocks: 829580
When the zip is created on Windows, we need to set the ZipInfo external_attr on the breakpad files so they are stored with executable permissions.

Something like:
http://stackoverflow.com/questions/434641/how-do-i-set-permissions-attributes-on-a-file-in-a-zip-file-using-pythons-zip
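
A minimal sketch of that idea, not the actual create_talos_zip.py code. The helper name `add_executable` and the placeholder file contents are made up for illustration; the archive path is the one from the breakpad layout in the talos repo. It assumes the consumer of the zip honors Unix mode bits stored in the upper 16 bits of `external_attr`.

```python
import io
import stat
import zipfile


def add_executable(zf, arcname, data, mode=0o755):
    """Store `data` under `arcname` with Unix permission bits recorded.

    `add_executable` is a hypothetical helper, not part of talos.
    """
    info = zipfile.ZipInfo(arcname)
    # 3 means "Unix": tells unzip tools to interpret external_attr as a mode.
    info.create_system = 3
    # The upper 16 bits of external_attr carry the Unix st_mode value.
    info.external_attr = (stat.S_IFREG | mode) << 16
    info.compress_type = zipfile.ZIP_DEFLATED
    zf.writestr(info, data)


# Build a tiny in-memory archive; the bytes are a stand-in for the real binary.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_executable(zf, "talos/breakpad/linux/minidump_stackwalk", b"placeholder")
```

Because the mode is written explicitly, the result should not depend on what the Windows filesystem reports for the file.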
(This isn't an intermittent failure bug, this is about creating a package for talos locally)
(In reply to Ed Morley (Away 29th-1st, UK public holiday) [:edmorley UTC+0] from comment #16)
> (This isn't an intermittent failure bug, this is about creating a package
> for talos locally)

Fair enough, filed bug 856687.
No longer blocks: 856687
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
> (In reply to Ed Morley (Away 29th-1st, UK public holiday) [:edmorley UTC+0]
> from comment #16)
> > (This isn't an intermittent failure bug, this is about creating a package
> > for talos locally)
> 
> Fair enough, filed bug 856687.

Just to give some more context - this bug is "if someone working on Talos on their local machine tries to create a new talos.zip to upload to build.m.o (and then update the in-tree talos json reference to use the new zip), test runs will fail, since the script in the talos repo that creates talos.zip (which is only ever run locally) is busted when run on Windows".
We could have the script error out if run on Windows. I don't really have an easy solution, nor do I understand the problem here. If someone who uses Windows is inclined to fix the script, that would be best, but I don't use Windows regularly. I'm not even 100% sure it works on Mac! While this is awful, for context: the whole script was yet-another-quick-hack-that-would-be-gone-soon.
Depends on: 857039
(In reply to Jeff Hammel [:jhammel] from comment #21)
> We could have the script err out if run on windows

WFM; patch in bug 857039 - will leave this bug open for the fix, should anyone be inclined (but likely not worth the effort for now).
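
For illustration only, a guard along these lines would make the script error out on Windows. This is a sketch, not the patch from bug 857039; the function name `refuse_on_windows` and the message text are assumptions.

```python
import sys


def refuse_on_windows(platform=None):
    """Exit with an error if running on Windows (hypothetical guard)."""
    platform = sys.platform if platform is None else platform
    # sys.platform is "win32" on Windows, for both 32-bit and 64-bit Python.
    if platform.startswith("win"):
        raise SystemExit(
            "Creating talos.zip on Windows is not supported: "
            "executable permissions on the breakpad binaries would be lost."
        )


if __name__ == "__main__":
    refuse_on_windows()
```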
Here's a candidate patch. It seems to work for me: I created a Talos zip on my Windows machine, and running zipinfo on it on my Linux machine shows:
-rwxr-xr-x  2.0 unx  1803072 b- defN 13-Apr-03 14:20 talos/breakpad/linux/minidump_stackwalk
-rwxr-xr-x  2.0 unx  1926016 b- defN 13-Apr-03 14:20 talos/breakpad/linux64/minidump_stackwalk
-rwxr-xr-x  2.0 unx  1316240 b- defN 13-Apr-03 14:20 talos/breakpad/osx/minidump_stackwalk
-rwxr-xr-x  2.0 unx  1316240 b- defN 13-Apr-03 14:20 talos/breakpad/osx64/minidump_stackwalk

The implementation here is pretty simplistic: it takes all files whose basename is "minidump_stackwalk" and stores them with Unix permission 0755.
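
A rough sketch of that basename rule, assuming Python 3. It is not the attached patch: `write_tree` and `list_modes` are hypothetical names, and the 0644 default for every other file is an assumption.

```python
import os
import stat
import zipfile

# Basenames that must be stored as executable.
EXECUTABLE_BASENAMES = {"minidump_stackwalk"}


def unix_mode_for(arcname):
    """Return 0755 for known executables, 0644 (assumed default) otherwise."""
    basename = arcname.rsplit("/", 1)[-1]
    return 0o755 if basename in EXECUTABLE_BASENAMES else 0o644


def write_tree(zip_path, root):
    """Zip the directory `root`, recording Unix modes explicitly."""
    root = os.path.abspath(root)
    parent = os.path.dirname(root)
    with zipfile.ZipFile(zip_path, "w") as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Zip entries always use forward slashes, even on Windows.
                arcname = os.path.relpath(full, parent).replace(os.sep, "/")
                info = zipfile.ZipInfo(arcname)
                info.create_system = 3  # Unix
                info.external_attr = (stat.S_IFREG | unix_mode_for(arcname)) << 16
                info.compress_type = zipfile.ZIP_DEFLATED
                with open(full, "rb") as fh:
                    zf.writestr(info, fh.read())


def list_modes(zip_path):
    """Map each entry name to an ls-style mode string, similar to zipinfo."""
    with zipfile.ZipFile(zip_path) as zf:
        return {
            i.filename: stat.filemode(i.external_attr >> 16)
            for i in zf.infolist()
        }
```

Calling `list_modes` on the resulting archive gives a quick cross-check against the zipinfo listing without needing a second machine.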
Whiteboard: talos-android
We are moving the remaining Android Talos tests to autophone this quarter, so there will be no need for talos.zip after that.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX