Closed Bug 563522 Opened 14 years ago Closed 13 years ago

Add Jetpack common file checks to code validator

Categories

(addons.mozilla.org Graveyard :: Admin/Editor Tools, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED
6.0.12

People

(Reporter: jorgev, Assigned: basta)

References

Details

(Whiteboard: [required amo-editors][patch][needs review])

Attachments

(1 file)

We need to do a checksum verification to all common Jetpack files in generated extensions, in order to ensure that they aren't tampered with. The Jetpack SDK is kind of a moving target at the moment, but I think it should be possible to have something we can update whenever the SDK is updated, even if only a few weeks later.
Moving to 5.12. Busy month...
Target Milestone: 5.11 → 5.12
The Jetpack SDK is changing quickly, but it will hopefully reach some sort of stability once Firefox 4 is released. We should do this then, and using the new code tools.
Assignee: jorge → mbasta
Status: ASSIGNED → NEW
Target Milestone: 5.12 → 4.x (triaged)
In response to Wil:
It would be better to hash each file individually and whitelist the hashes. While it would be nice to be able to whitelist a directory, there's no way to be absolutely certain that its contents haven't been meddled with. I have a tool that will hash a whole directory's worth of files; if someone can point me to the latest SDK, I can try to get on this ASAP.
Target Milestone: 4.x (triaged) → Q1 2011
Fixed:

https://github.com/mattbasta/amo-validator/commit/94e9ffee9267f45739b91081d397b902d6ae75fb

All non-essential files (things that aren't install.rdf, chrome.manifest, etc.) are now hashed before validation and tested to determine whether the hash is whitelisted. If it is, the file is skipped.

I downloaded the Jetpack SDK and extracted the /packages folder. I then wrote a little Python script that recursively hashes all the files in a directory. The tool is in the /extras directory (build_whitelist.py).

Anyway, if there's any need to add additional files to the whitelist or a new version of the Jetpack SDK comes out, shoot me an email or file a bug listing exactly what needs to be included.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: Q1 2011 → 5.12.6
Attached file Validator output —
If I understand your patch correctly, it simply hides known versions of SDK files from the validator, but doesn't notify us of our existence, or skip/identify them in the source viewer. If I'm correct, I don't think that this is a useful solution. It means that we still have to manually verify that the files are safe or authentic, which seems absurd when we have a checksum database and validator in existence, and impractical at any rate.

Ideally, I think, we should have a database of checksums along with a description of where that checksum comes from so we can extend it to include external libraries like jQuery, which are commonly included by authors. Sources of files should be reported to editors, and matching files should be perhaps highlighted differently in the source code viewer. It would be a good idea to pre-convert newline characters to a pre-defined format, since a lot of archiving tools alter them, but I can open a different bug for that.

Attached is the output from the validator I use for a Jetpack add-on.
Perhaps I misunderstood the purpose of the fix. My understanding was to prevent the Jetpack code from throwing errors in the validator. As it stands, if a file is hashed and is on the whitelist, it is indeed safe (assuming the files in the SDK are all safe). You can't fudge the hash (well you could, but if you have the CPU power to do it, you'd might as well be cracking into the CIA or something).

From a practical standpoint, the validator doesn't provide a comprehensive output of each individual file. It does check files for malignant code, which could help with the use case you describe.

To the best of my knowledge, there's no way to show that any given file is a tampered-with version of a Jetpack SDK file. It wouldn't match any checksum, and for all the validator knows, it could just be a user-provided JS file.

It seems like you're after a different kind of tool than what the validator provides. This validator provides no reports on full source codes, nor does it provide reports on paths for files. It's designed to give a pass/fail validation of add-ons as a quick preliminary scan. It is NOT designed as a tool to inspect add-ons past the automated test. If you wish to have different or modified functionality, I would suggest filing a bug for what you're specifically interested in seeing come out of the project.


Also--for the record--jQuery is disallowed by the validator as part of another series of tests. The validator blacklists all major JS libraries in much the same manner that Jetpack code is whitelisted.
While it's convenient that the validator hides all the warnings that are being generated by the SDK code, it's also necessary to make sure those are actual SDK files, not modified versions of them. If we trust the validator and only look at the "user" file, then a malicious developer could easily tamper with the SDK files to do bad things. We need to make sure those files have not been modified.

(In reply to comment #10)
> Also--for the record--jQuery is disallowed by the validator as part of another
> series of tests. The validator blacklists all major JS libraries in much the
> same manner that Jetpack code is whitelisted.

That shouldn't be the case. Add-ons are allowed to include JS libraries, even if we generally discourage them. Just like I would expect for Jetpack, the remora validator basically used checksums to verify the files are in fact what they claim they are.
(In reply to comment #11)
> While it's convenient that the validator hides all the warnings that are being
> generated by the SDK code, it's also necessary to make sure those are actual
> SDK files, not modified versions of them. 

Matt, instead of skipping the files it sounds like the validator needs to raise an error or warning if a common file does match the whitelisted hash.

One caveat is we don't want to lock a dev out of submitting an add-on the day a new jetpack sdk comes out if the validator has not updated the whitelists for it yet.
We definitely need to maintain close communication with the SDK dev team when they are planning on releasing a new version.
(In reply to comment #11)
> While it's convenient that the validator hides all the warnings that are being
> generated by the SDK code, it's also necessary to make sure those are actual
> SDK files, not modified versions of them.

Since we're only whitelisting files that match known hashes of SDK files, modified or tampered SDK files will still be inspected by the validator using the standard validation tests.

> Add-ons are allowed to include JS libraries, even
> if we generally discourage them.

This was part of the Google Doc that contained the spec for the validator:

https://docs.google.com/Doc?docid=0Abey5P3VyBI4ZGdyZHNuN2pfMTIzZHFxZzdic3I&hl=en

If it should be changed, please let me know.

(In reply to comment #12)
> Matt, instead of skipping the files it sounds like the validator needs to raise
> an error or warning if a common file does match the whitelisted hash.

From what I can tell, the paths for Jetpack files are not directly mapped (directory for directory) to their SDK counterpart, making it difficult, if not impossible to directly compare SDK files to their counterparts in the add-on. I guess the problem is actually determining whether a file is a common file or not. If the hash for a file is not in the whitelist, that doesn't mean it's an error.

Based on the attachment, it seems that the tool that generates that output matches the hashes of each file to its SDK counterpart, and lists all of the files that don't match files from an SDK. That wouldn't work for the validator because the files listed as non-matches are things like install.rdf, favicon files, settings, files, etc.

Additionally, Jetpack add-ons are not distinguished from standard extensions, meaning a normal extension would light up like a Christmas tree, listing each file in the add-on as a non-Jetpack SDK file.

Perhaps this would require inspection of the package.json file or some other file in the add-on. If this is the case, I'm going to need someone to point me towards some documentation or instructions.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #14)
> (In reply to comment #11) 
> > Add-ons are allowed to include JS libraries, even
> > if we generally discourage them.
> 
> This was part of the Google Doc that contained the spec for the validator:
> 
> https://docs.google.com/Doc?docid=0Abey5P3VyBI4ZGdyZHNuN2pfMTIzZHFxZzdic3I&hl=en
> 
> If it should be changed, please let me know.

The doc doesn't say that libraries are not allowed, as far as I can tell. It does say that it's necessary to compare them with the checksums, which is what we do in remora and what we need in zamboni.
 
> Additionally, Jetpack add-ons are not distinguished from standard extensions,
> meaning a normal extension would light up like a Christmas tree, listing each
> file in the add-on as a non-Jetpack SDK file.
> 
> Perhaps this would require inspection of the package.json file or some other
> file in the add-on. If this is the case, I'm going to need someone to point me
> towards some documentation or instructions.

I think the presence of harness-options.json in the root of the package should serve to identify jetpack add-ons.
(In reply to comment #14)
> (In reply to comment #12)
> > Matt, instead of skipping the files it sounds like the validator needs to raise
> > an error or warning if a common file does match the whitelisted hash.
> 
> From what I can tell, the paths for Jetpack files are not directly mapped
> (directory for directory) to their SDK counterpart, making it difficult, if not
> impossible to directly compare SDK files to their counterparts in the add-on. 

Ah, this makes sense.  Skipping them sounds fine then.  As you mentioned, any tampered with files will undergo validation for malicious code, etc.
The patch as it stands is fine and is enough to close this bug.  Jorge does mention a good point that unless a new/modified jetpack file raises a warning, no one will know it was modified.  The ideal solution here would be:

1) Build a complete whitelist of the jetpack sdk on a per version basis (we may be supporting several versions at once.  oh yes.)  For example: hash_whitelist_jetpack_sdk_0.9

2) Detect a jetpack add-on (the harness-options file is good)

3) Detect what version of the SDK it's using (I have no idea how to do this, see bug 623633).  If we have to, start hashing and see which (if any) whitelist matches.

4) After you've finished hashing, compare the known good whitelist to the user input.  If any files on the known good whitelist haven't been found, that's a warning (a basic check for library completion and/or tampering).

Let me know the ETA on this change.  If it's long, we may need to kick it to 5.12.7.
(In reply to comment #17)
> ...Jorge does
> mention a good point that unless a new/modified jetpack file raises a warning,
> no one will know it was modified.

I thought Matt had a good point about this.  Even though the validator won't specifically say "error: file foobar was modified" it will not skip foobar.  Thus, it will run foobar (the modified file) through normal validation and that will check for malicious code or anything suspicious.
(In reply to comment #18)
> (In reply to comment #17)
> > ...Jorge does
> > mention a good point that unless a new/modified jetpack file raises a warning,
> > no one will know it was modified.
> 
> I thought Matt had a good point about this.  Even though the validator won't
> specifically say "error: file foobar was modified" it will not skip foobar. 
> Thus, it will run foobar (the modified file) through normal validation and that
> will check for malicious code or anything suspicious.

Right, but if the (mal|susp)icious code is written in a way to avoid setting off warnings, we get no notification that there was modification.  The point of this bug is to verify the jetpack code is unmodified and complete.
Kris: Could you send me a link to the tool that generated the output that you posted? I'd like to try to base some validator code on it, if I could.
The patch we have now is a good start and is in the code.  I'm bumping this to .7 so basta can continue the stuff in comment 17
Target Milestone: 5.12.6 → 5.12.7
Target Milestone: 5.12.7 → 5.12.8
Target Milestone: 5.12.8 → 5.12.9
Kris, can you send us the source for what made the validation output in comment 9?
Target Milestone: 5.12.9 → 5.12.10
Target Milestone: 5.12.10 → 5.12.11
What's the status of this bug?
In progress. Hopefully I'll have something for this week coming up.
Target Milestone: 5.12.11 → 5.12.12
I've got a database built (and a script to build the database from git) that contains all the paths and hashes for all JS files in the Jetpack SDK for every release (tag on Github).

What I'm running into (looking at comment 17) is that there's really no way to determine if a piece of the SDK is tampered with by checking if the file's hash isn't present. The addon-sdk's `cfx xpi` command only bundles packages that are used by the add-on. This means that unless cfx bundles every script in the SDK, there's no way we're going to be matching all of them.

Based on the output that Kris attached way back when, you can even see that only a very small subset of scripts are included. In addition, most (if not all) of the test packages aren't included.

If I'm not missing something or completely misinterpreting the problem, what would need to be built is something to parse the harness-options.json file, determine which packages were included by cfx, and test each of those packages and their dependencies.

It's also interesting to note that harness-options includes a (sha256?) hash for each of the files that it lists in the manifest. Perhaps the best solution would simply be to a.) Test that the given hash matches the hash of the file b.) Test that the given hash is in the hash database that we have generated from the SDK and c.) Test that the file's name matches up with the corresponding file's name in the database.
(In reply to comment #25)

> It's also interesting to note that harness-options includes a (sha256?) hash
> for each of the files that it lists in the manifest. Perhaps the best solution
> would simply be to a.) Test that the given hash matches the hash of the file
> b.) Test that the given hash is in the hash database that we have generated
> from the SDK and c.) Test that the file's name matches up with the
> corresponding file's name in the database.

That sounds alright, but I'd like someone more familiar with jetpacks to comment.
Target Milestone: 5.12.12 → 4.x (triaged)
(In reply to comment #26)
> (In reply to comment #25)
> 
> > It's also interesting to note that harness-options includes a (sha256?) hash
> > for each of the files that it lists in the manifest. Perhaps the best solution
> > would simply be to a.) Test that the given hash matches the hash of the file
> > b.) Test that the given hash is in the hash database that we have generated
> > from the SDK and c.) Test that the file's name matches up with the
> > corresponding file's name in the database.
> 
> That sounds alright, but I'd like someone more familiar with jetpacks to
> comment.

While I'm no expert on Jetpacks, this sounds good to me.
Target Milestone: 4.x (triaged) → 6.0.7
(In reply to comment #25)
> It's also interesting to note that harness-options includes a (sha256?) hash
> for each of the files that it lists in the manifest. Perhaps the best solution
> would simply be to a.) Test that the given hash matches the hash of the file
> b.) Test that the given hash is in the hash database that we have generated
> from the SDK and c.) Test that the file's name matches up with the
> corresponding file's name in the database.

That seems reasonable to me as well.  harness-options.js's `manifest` is the transitive closure of all modules the addon seems like it is going to load, so the modules it describes are the ones you want to check.  And this plan for checking them seems like a good one.

Note that the SDK doesn't currently prevent addons from loading modules that aren't in the manifest.  And we obviously need to fix that in order for this plan to actually work.  But Brian is going to fix that soon as part of a broader change, at which point addons will only be able to load those modules that are listed in its manifest.

That broader change, incidentally, is to stop bundling modules that an addon doesn't use (bug 627607).  The last time we spoke, Brian wasn't planning to change the locations of the modules nor the format of their resource: URLs.  I'm not sure if he's changing the format of the manifest, though.

Brian: can you confirm?


Also note that you must additionally check /bootstrap.js and /components/harness.js, which are responsible for reading the manifest and loading the addon.  Otherwise, a malicious addon could hack those files to load something different from what the manifest says it is loading.
Yeah. The plan is to have the (machine-readable JSON) manifest contain a
description of every part of the add-on beyond the basic loader. When the
validation tool realizes that it's dealing with a jetpack addon, it should:

 * figure out what version of the SDK produced the XPI (currently in
   harness-options.json in the "sdkVersion" property, might move in the
   future), use it to find a table of known files for that version
 * assert that the loader files (bootstrap.js, components/harness.js, a few
   others) are present and that their contents match the known files
 * load the manifest, walk through all items, build up a table of modules.
   For each entry:
   * compare the "jsSHA256" hash in the manifest against the contents of the
     file in the .XPI, reject if they differ (this indicates malicious
     deception: the XPI was modified after the SDK finished generating it)
   * compare the hash against the table of known files. If found, replace the
     entry with a note that says "module XYZ from the SDK"
 * present the reviewer with the table of modules: either recognized as being
   from the SDK, or new (and thus requiring review and analysis)

Take a look at
http://people.mozilla.com/~bwarner/jetpack/components/analysis-tool.html for
early notes about the sort of reviewer-helper tool that I've been aiming at.
The hashes and data structures in the manifest are designed to support this
sort of tool. Note that there are two sorts of review going on: review of
individual modules to figure out what they do (which can hopefully be
amortized among multiple addons), and review of the overall "table of
modules" to figure out how they're supposed to interact. The latter is really
the interesting part.

The linker change I'm hoping to land next week (bug 627607) will indeed stop
including unused modules: the XPI should contain just the modules which are
actually used (plus the loader, the manifest, and metadata like install.rdf).
I'm going tl leave the URLs alone for 627607, but I hope to change them
shortly afterwards to make the XPI easier to work with (specifically to make
it possible to leave the XPI packed, instead of unpacking it upon install).

I hope to change the format of the manifest slightly too, but maybe not
before we release 1.0 . Currently the manifest is contained in the "manifest"
property of harness-options.json . I want to move the manifest out to a
separate file ("manifest.json"), so I can put a signature on it that's
independent of the other stuff in harness-options.json . I'm pretty happy
with the basic format (a dict, indexed by the URI that points into the XPI
file for each module), but I might clean up the property names or add a few
before everything settles down.

Regarding addons loading off-manifest modules: yeah, the intention is to
reject modules which do that. The SDK scans each module for require()
statements when it builds the XPI (that's how it constructs the manifest and
decides what to copy into the XPI), so any module which uses dynamic tricks
to load something that's not on the manifest should both fail at runtime and
be rejected by review.
Target Milestone: 6.0.7 → 6.0.8
Blocks: 616669
Target Milestone: 6.0.8 → 6.0.9
Priority: P3 → P1
Target Milestone: 6.0.9 → 6.0.10
Hey folks, I've got some code that should be sufficient:

https://github.com/mattbasta/amo-validator/commit/6975b32fdc4a1bd10d17f1e45cfc16fe6bea79de

If someone wants to look at it and make sure it's acceptable before I start writing tests for it, that will save some headaches. If I don't get any protests, I'll go ahead and try to get this done today.

Here's some sample output. It only shows when verbose mode is enabled:


Jetpack add-on detected.
Identified files:
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-api-utils-data/bootstrap-remote-process.js
  ./packages/api-utils/data/bootstrap-remote-process.js : 1.0b5rc2
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-api-utils-lib/preferences-service.js
  ./packages/api-utils/lib/preferences-service.js : 1.0b5rc2
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-api-utils-lib/unit-test.js
  ./packages/api-utils/lib/unit-test.js : 1.0b4rc3
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-tests/
  ./python-lib/cuddlefish/tests/bug-588119-files/packages/no-icon/lib/main.js : 1.0b5rc2
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-api-utils-lib/errors.js
  ./packages/api-utils/lib/errors.js : 1.0b5rc2
 ...

Unknown files:
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-data/js/options.js
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-data/js/jquery.md5.js
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-data/edit.html
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-data/js/detect.js
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-data/css/options.css
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-data/css/main.css
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-data/js/jquery-ui.js
 resources/jid0-gxjllfbcoax0lcltedfrekqdqpi-as-ff-data/js/autogrowinput.js
 ...


All of this is also available in the JSON metadata under:

is_jetpack
jetpack_sdk_version
jetpack_unknown_files
jetpack_identified_files
jetpack_loaded_modules
That's great.  I think Kumar is out for the day, but I'll ask him to check it out on Monday.
I've just added some tweaks, the tests, and much-needed improvements:

https://github.com/mattbasta/amo-validator/commit/2fb4b6505e60d20c7a0a079f49ce38fa5e86cbb1

A few notes:

- harness-options.js won't be listed as an unknown file, since we're parsing it and inspecting its contents automatically.
- Files found to be Jetpack files won't be inspected in the second tier of validation (since they're part of the SDK).
- Only JS files are tested to determine whether they're part of the SDK, though any non-image file will be listed in the metadata as an unknown file. No errors/warnings/failures will be raised for these, however.
- .DS_Store files will not be listed as unknown files.
- Items in the harness-option.json's manifest section must be resource:// URIs. Anything else is considered dangerous.
Whiteboard: [required amo-editors] → [required amo-editors][patch][needs review]
Target Milestone: 6.0.10 → 6.0.11
Got caught up this week, we'll get this next week.
Target Milestone: 6.0.11 → 6.0.12
Merged:

https://github.com/mozilla/amo-validator/commit/a6b4c63155e7a273aa202f5dea11bcaa97754c4b
Status: REOPENED → RESOLVED
Closed: 14 years ago13 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: