As per https://bugzilla.mozilla.org/show_bug.cgi?id=546010#c12 (warning: brain damage), we should do a better job flagging executable binaries. I'll try to find a comprehensive list of executable file extensions, and include the ones that make sense for the validator. Wenzel also suggested running 'file' on the files. I'm not sure how difficult that may be with the current validator design.
Reassigning to basta, who is working on the new awesome validator.
Currently, the new validator seeks out extensions on files: "dll", "exe", "dylib", "so", "sh", "class", "swf" As a side-note, the validator doesn't currently check inside ZIP files for blacklisted files, though it will check inside JAR and XPI files. This should probably also be added. http://github.com/mattbasta/amo-validator/blob/tracemonkey/validator/testcases/packagelayout.py#L24 The validator, as it stands, does not currently extract the add-on to a directory. Everything is managed in-memory. Files are extracted to memory streams if they require further inspection. This would make it difficult to run any sort of non-Python-based operation on files. Perhaps this could be achieved with memory-mapped files? Additionally, many of the file types that ideally would be detected likely wouldn't be properly identified by the "file" command. For instance, SWF files would likely be identified simply as "data". Windows-specific files may not be identified properly. Also, another concern would be false positives. JS 1.9 looks, in some circumstances, a LOT like C/C++ code; packed scripts may have the appearance of being text; etc. etc. Perhaps a smart way to go about this would be to prepare a list of "expected" types (JS, various image formats, etc.) and attempt to confirm files with those extensions (in-memory) using a Pythonic method. For files whose extensions aren't expected, we could then run some sort of simple heuristic. Any file whose extension doesn't match its detected type will throw a warning, and any file whose type is unexpected and whose contents appear suspicious will throw an error. Perhaps I'm over-complicating this. Feedback?
I like the idea of reversing it to more of a whitelist behavior. Sounds more reliable, even if it is a little noisier.
If the conclusion of this bug gets anywhere past looking at file extensions, this becomes a low priority enhancement. Developing any kind of heuristic stuff is a lot of investment for little return. Using an existing library is more appealing, but still a low priority.
Having a whitelist of file types is definitely the best means of making this work well. The obvious downside to it, though, is that someone could simply throw executable binary content into, say, a PNG file. Also, if we're whitelisting, it means that we have to block things, which would necessitate a heuristic. If it's so easy to get around having your file tested (changing the extension), we'd might as well scan every file. I feel like the noise from this test (and the time required to execute it during validation) would simply be a hinderance. If it was an epidemic on AMO, I'd say let's go for it. In the words of Bert Lance, "If it ain't broke, don't fix it."
Just pushed a new test to the validator: https://github.com/mattbasta/amo-validator/commit/e2bae4d74afef33f0762555fe791a3e98456fead This will examine the magic number of each file in the package (only unpacking the first four characters for performance). If it matches the bad ones, it throws an error. The only other thing that we could possibly do is byte frequency analysis, but that's extremely slow: we would need to unpack every file in the package, calculate the byte frequencies, and then run a comparison against numerous rounds of tests to avoid false positives. Conservatively, such an item in the validation process could cause validation to take upwards of a minute for each add-on. It's impractical to even consider something like that for the future. Anyway, this is about as good as we can get for now. It's pretty simple to disguise the magic number, but this at least prevents folks from simply changing the extension to hide their executable files.
Reclassifying editor bugs and changing to a new whiteboard flag. Spam, spam, spam, spam...