Closed Bug 776017 Opened 12 years ago Closed 12 years ago

Submit addon for admin review to bypass validation (fixes: Yoono addon fails on validation during upload)

Component: addons.mozilla.org Graveyard :: Add-on Validation
Type: defect
Severity: major
Status: RESOLVED FIXED
Closed: 2012-10-18
Reporter: eviljeff
Assignee: basta
Attachments: 1 file
After uploading the xpi, validation always fails with a "timeout after 90 seconds" error.

I've reproduced the error on prod and staging.
basta, anything we can do to speed this up? Any immediate workarounds you can suggest for the developer?

$ time ./addon-validator -o json --determined ~/tmp/yoono_7.7.12_full.xpi
...
real	1m0.605s
user	0m45.263s
sys	0m13.983s
Of the 60 seconds that the timeout is set to locally, 52 involve JS processing. Of those 52 seconds, ~30 are simply the calls out to Spidermonkey to parse the JS; 7 seconds alone are spent parsing JSON. There's a LOT of JS in the add-on.

My best advice is to simply cut out unnecessary JS and use approved libraries (like Underscore) whenever possible instead of reinventing the wheel. Concatenating JS files that are loaded together will also reduce overhead. Otherwise, this isn't really a defect in the validator, it's just that there's more work to be done than can be accomplished in the 90 seconds that the add-on has to validate.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
This shouldn't be a fatal error that prevents the add-on from being uploaded to AMO, though. At the very least it should let the upload proceed, and we'll decide whether the add-on has unnecessary JS and fits policy. Ideally, any results gathered before the timeout is hit should be returned.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I stand by my original resolution of WONTFIX:

1. We can make the timeout message a warning, but then nobody would pay attention to it.
2. JavaScript validation happens at the very end of validation. If validation times out, the bulk of JS validation will simply get tossed out.
3. Compatibility bumps will also time out and compatibility warnings and errors will not be raised.
4. The add-on is 11.4MB (!) decompressed, and the vast majority of that is JavaScript. If the validator isn't able to process the contents of the package in 90 seconds, it's doubtful that a human would be capable of finding a problem buried in all that code. (It's not that I don't have faith in our reviewers; it's the sheer volume of the code.) We've only ever seen one add-on like this before (which one escapes me), but we should have an alternate channel for getting add-ons like this reviewed.

As a sidebar, do we have policies relating to the performance impact of add-ons? If so, I'd say we should reject this one on those principles. The amount of resources consumed by this add-on is conceivably astronomical.
There isn't an alternative way of getting the add-on onto AMO. If you want to implement a checkbox we (staff) can click to bypass validation, or an alternative upload form, then I'm okay with that.

Yoono is a featured addon and has been on AMO for a long while.  The previous version was uploaded fine and doesn't seem significantly different in terms of scale.  The symptoms point to validator changes (perhaps more checks?) rather than add-on changes.

We don't need to review all the code - just what's changed - and we don't prohibit large and complicated addons by policy.  Performance concerns are out of the scope of this bug really.
Thanks Andrew!

Matt, we do take the performance recommendations for add-ons into account as much as we can, which is for instance why we don't merge all of our scripts, so that we can load some of them only if and when they are needed.
In this release, a large part of the modifications are related to namespacing some stuff to make sure there is no interference with other add-ons, as requested by the reviewer from the previous release.

Another large part of the modifications deals with removing vendor prefixes in CSS: -moz-border-radius-bottomleft becoming border-bottom-left-radius, so that the appearance is still fine in FF 14.
The rest is mostly the replacement of the Facebook REST API with Graph API since the REST API is deprecated.

The size of the add-on is very similar to the previous one if not even a little smaller.

If we did sometimes re-invent the wheel, it's because Yoono came before the wheel (namely Underscore), at a time when things like Prototype wouldn't do, performance-wise as well as security-wise, since it contained evals, for instance.

Yoono is very rich in features (try it ;), which is why it's big. I guess you don't assume it's all junk that we forgot to delete :)
It was shown to make people stick with Firefox when the Firefox-based Rockmelt browser was first released :)

However, I did find a few obsolete files; removing them lowers the size of the XPI by about 60 KB, which is not much and still won't make it pass validation. And we are working on a new feature that is really big and not even included in this release...

If no other (or just one other) extension hits the timeout issue, then increasing it should have no impact...
 
We are aware it's a lot to review, which is why we provide a diff file by email for each new submission we make, to help reviewers deal with the volume.

Thank you all for your understanding!
Xavier: Thanks for adding your feedback. To address a few things:

- The validator is unlike Firefox or other Mozilla apps in that the performance of the script itself doesn't impact execution time; the size of the script does. Even if files aren't loaded by the add-on, they still get validated. Our custom JS interpreter isn't built to accurately execute your code; it's designed to find patterns that could cause problems with your add-on's execution.

The reason we're sensitive to having many files is that--in order to provide you with useful filenames, context snippets, and line numbers in your validation report--we have to fire up an instance of Spidermonkey for every chunk of JS in your add-on. The more chunks of JavaScript, the more calls to Spidermonkey.

One recommendation I'd make is to try to concatenate JavaScript files that would otherwise be loaded together. By concatenating these, fewer (expensive) calls to Spidermonkey need to be made, fewer strings need to be built, allocated, and deallocated, etc., helping the validation to move a lot more quickly.
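
For illustration, a concatenation build step could be as simple as the following sketch (the file names and layout are hypothetical, not taken from Yoono's actual tree):

# concat_js.py -- hypothetical build step: merge scripts that are always
# loaded together into one file, so validation makes one Spidermonkey call
# instead of several.

# Assumption: these modules are always loaded by the same overlay.
SOURCES = ["content/js/util.js", "content/js/network.js", "content/js/ui.js"]

def concatenate(sources, target):
    with open(target, "w") as out:
        for path in sources:
            with open(path) as src:
                out.write("/* --- %s --- */\n" % path)  # keep some traceability
                out.write(src.read())
                out.write("\n;\n")  # guard against a missing trailing semicolon

if __name__ == "__main__":
    concatenate(SOURCES, "content/js/bundle.js")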

I would also highly recommend looking into shimming libraries like Underscore or jQuery in where you can. I recognize that you've got a lot of legacy code, but each file that can be replaced with a library gets immediate perf benefits (both during validation and likely during actual usage). The validator knows to ignore files that it recognizes, preventing that code from being validated unnecessarily. I see that you've already got a copy of jQuery included, though it's not a canonical copy. Consider downloading the official 1.6.3 release and including it (unmodified) instead.

Another way to improve validation performance is to minimize the number of warnings and notices being generated. An example that I can see from the validation output is to remove json.js, JSON.js, and json2.js (which are rendered rather defunct in the last 15 versions or so of Firefox), which generate a handful of errors. Each bit of output requires the validator to grab a copy of the file and clip it to form a context snippet.
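
Schematically, building each context snippet costs something like the following per reported message, which is why fewer messages means faster runs (names are illustrative, not the validator's actual code):

def context_snippet(file_lines, lineno, radius=2):
    # Clip a few lines around the offending (1-based) line for the report.
    start = max(0, lineno - 1 - radius)
    return "\n".join(file_lines[start:lineno + radius])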

I'll talk to Kumar about getting the timeout raised, but that's a temporary (and with a big enough bump, potentially dangerous) fix. The timeout is to catch bugs in the validator that lead to infinite loops and to prevent add-ons that consume incredible amounts of resources from fubaring the celery boxes. There's also a timeout on celery tasks themselves which will cause validation to fail in nondescript, 500-error kinds of ways. Having a validator-level timeout prevents that from happening and returns (at the very least) a list of problems found up until the timeout occurred.
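
As a rough sketch of this kind of validator-level timeout (illustrative names, not the validator's actual internals; signal-based alarms are Unix-only):

import signal

class ValidationTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    raise ValidationTimeout()

def run_with_timeout(validate, package, seconds=90):
    # `validate(package, results)` is assumed to append findings to `results`
    # as it goes, so partial output survives a timeout.
    results = []
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        validate(package, results)
    except ValidationTimeout:
        results.append({"type": "error", "message": "Validation timed out."})
    finally:
        signal.alarm(0)  # always clear the pending alarm
    return results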

In the meantime, try refactoring with some of the tips I mentioned above.
As a temporary workaround we can try upping the timeout. Let's try it on -dev to see if it lets this add-on through.

Possible long-term solutions:

1. From eviljeff: Allow editors to bypass the validator with a checkbox or something so that authors can at least submit an xpi

2. Split validation into subtasks. This is the best solution but would require a big change. Instead of running all the validation in one task, we could validate each file in a separate task and stitch the results together later (see the sketch below this comment).

Since this add-on is only the 2nd we know of that exceeds our timeout, solution #1 might be better to try first.
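
For reference, a minimal sketch of the shape solution #2 could take with celery's primitives (the task names and the run_js_checks helper are hypothetical):

from celery import chord, shared_task

@shared_task
def validate_file(path):
    # Run the per-file JS checks; run_js_checks is an assumed helper that
    # returns a list of messages for one file.
    return run_js_checks(path)

@shared_task
def merge_reports(per_file_reports):
    # Stitch the per-file results back into a single report.
    merged = []
    for report in per_file_reports:
        merged.extend(report)
    return merged

def validate_addon(js_paths):
    # A chord fans out one task per file, then calls the merge task with
    # the collected results.
    return chord(validate_file.s(p) for p in js_paths)(merge_reports.s())
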
Could we have the timeout be configurable per task? A checkbox where only editors could submit an addon and then choose not to have it time out. I still think the value of having the validator run should be kept.
The option would also have to disable Celery's built-in timeouts. We have a validator timeout to begin with so that we can at least get *something* out of the validation process if it takes a long time rather than getting an error code back.
Yep, the celery timeout is per process so we can't adjust that (but we can set it really high). We can adjust the validator timeout at runtime based on an editor setting, sure.
Hi again, any information on this issue?
Since Facebook broke its API again, we would like to release pretty soon; otherwise our users can't add their Facebook account.

Thanks
I increased the timeout by about fifteen seconds. If it hasn't gone live already, it'll go live tomorrow.
Hi again and thanks.

Unfortunately, either it's not enough, or it is not live yet :(
It actually timed out at 80 seconds, although my measurement can't be accurate since validation seems to start while the upload progress bar has not yet reached 100% (the title goes from 'sending' to 'validating' at about 80%).

I'll try again later.

If I can share my 2 cents about solutions proposed earlier:

To me it would make more sense to modify Spidermonkey so that one instance can process several files (okay, easier said than done) instead of launching one instance per file.

This would benefit ALL the thousands of extensions at each validation, even if not by much, and would significantly decrease the total load on the validation servers.
With recent features such as sockets, workers, etc., I would not be surprised if complex, hence big, extensions (like Yoono) became less rare than they are right now.

A faster validation time would also encourage developers to actually take the warnings into consideration, fix them, and submit again, knowing they wouldn't have to wait too long for the results.

We had a similar issue with the Yahoo JS code compressor that we use. The Java program would only process one file at a time, and launching one VM for each file would take an unrealistic amount of time, so we modified it so that it could process several files, which it now does in a mere few seconds. The modification was admittedly probably easier than the one Spidermonkey would require.
(In reply to Xavier from comment #14)
> To me it would make more sense to modify Spidermonkey so that one instance
> can process several files (okay, easier said than done) instead of launching
> one instance per file.

I've already discussed this with a few of the other guys on the team and there are some issues that are preventing us from doing this:

Approach 1:
If we concatenate all the JS, that would speed up the process significantly. Unfortunately, a single syntax error would eliminate all validation for the entire add-on. Syntax errors aside, it would be impossible to get useful line numbers and contexts out of the results.

Approach 2:
Spin up a single Spidermonkey instance and iterate over the many JS files. This has a number of problems: 1.) If validation crashes hard, lots and lots of temp files get left on the disk. If the disk fills up, the server fails hard. 2.) There are no huge perf gains, since a large chunk of the work takes place in the reflection code, which would still be run individually and synchronously.

Approach 3:
Use Python threads to parallelize calls to Spidermonkey. This has some benefits, but they are almost wholly negated by the complexity introduced into the codebase. Most of the objects in the validator are not thread safe or immutable, meaning most of the code would need to be rewritten. The other problem is that we can't simply spam the CPU with processes: if an add-on has a few hundred or a few thousand JS files, it's unreasonable to spin up a few hundred processes that each want to max out a hardware thread. Instead, we'd need to use a pool, which (even with a relatively high cap) would still incur overhead in spinning up processes, since Python's GIL makes all of those things block.
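
For concreteness, the pooled version of this approach would look roughly like the sketch below (the binary path and driver script name are assumptions); experiments described later in this bug found it to be a net slowdown:

import subprocess
from multiprocessing.pool import ThreadPool

SPIDERMONKEY = "/usr/bin/js"  # assumed location of the SM shell

def parse_one(path):
    # One process spin-up per file: exactly the overhead under discussion.
    # "reflect.js" stands in for whatever driver script feeds SM the source.
    proc = subprocess.Popen([SPIDERMONKEY, "reflect.js", path],
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return path, out

def parse_all(paths, workers=8):
    # A bounded pool, so hundreds of JS files don't become hundreds of
    # simultaneous processes; the GIL still serializes much of the work.
    pool = ThreadPool(workers)
    try:
        return dict(pool.map(parse_one, paths))
    finally:
        pool.close()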

Approach 4:
Break the validation of each file into its own celery task and have the server figure it out. This is viable, but it's not at all a small undertaking. It's not reasonable for developers to need Rabbit installed locally to test add-ons with the CLI, so the existing code would still need to work. _Lots_ of code would need to be written and rewritten, with little benefit for anybody but a handful of add-ons.

> This would benefit ALL the thousands of extensions at each validation, even
> if not by much, and would significantly decrease the total load on the
> validation servers.
> With recent features such as sockets, workers, etc., I would not be
> surprised if complex, hence big, extensions (like Yoono) became less rare
> than they are right now.

In general, the vast majority of validations take under a few seconds. Internally, we test add-ons and monitor the performance of the validator, and over 90% of runs take under three or four seconds. Strictly speaking, there are very, very few add-ons that have enough JavaScript to even come close to the timeout.

I should also note that the benefits that I mentioned in comment #7 (especially those regarding the use of libraries) are not insignificant. Simply using canonical versions of libraries rather than re-minified versions is an easy benefit because it skips a few hundred thousand node traversals. Refactoring old code to effectively use the libraries that you're already including also helps validator performance (and in theory, real-life performance thanks to JIT in Firefox) in a non-insignificant way.
Hi all,

Unfortunately, there is no way we can refactor that much code (about 260K lines) for now, let alone deal with the no-regression tests on all the platforms (Google Chrome, iPhone, XULRunner application for Mac and Windows, and Firefox) that use all or part of this code. We simply don't have the resources for that, as much as we wish we did.

Our only alternative is to have the latest release (7.7.14) available from our website only, but most of the 400k users are stuck with release 7.7.10 for now :(

In October, Facebook will make breaking changes to its OAuth token handling, which we handle in 7.7.14, and which will cause 7.7.10 to no longer connect to Facebook.

Twitter has already announced its 1.1 API, which will also break 7.7.10 badly.

There is not much difference in size (and no difference at all in architecture, libraries or number of files) between release 7.7.10, which passed validation, and the new ones that do not. So are we not overlooking another possible cause for the timeout?

It would be amazingly great if validation could be skipped with a check box somewhere :)

Thanks all for all the thoughts given to this issue so far !
Hi Xavier, 
I understand refactoring all the code is infeasible in the short term, but just replacing a few libraries with versions taken directly from the developer (e.g. jQuery) will make a difference on its own. (And that is required now for any new files anyway.)

A checkbox to bypass validation isn't likely to be implemented particularly quickly, and isn't really a solution to the underlying issues either.
Although we've had regressions in the past when updating jQuery, I'm willing to try that, but I need to know what the validator uses to recognize jQuery as such.
I'm surprised it doesn't already, since we are using vanilla 1.6.3.
Just tried again, with jQuery and jQueryUI 1.8.0, still timing out.
Is there a chance that it's not recognized?
Is the file name important?

One more argument "against" such a large refactoring is that there is absolutely no guarantee that it would then pass validation.
Also, our next big feature will add about 50K lines of Javascript.

We can't put all our other projects (vital for our company's future) on hold on the mere chance that we would meet the requirements of an intermediate tool, requirements that the target itself does not really have (even if meeting them might theoretically benefit the target).

Not having a validation process for big applications does not encourage feature-rich applications that really add value to the browser.

It would be sad to have Firefox, with all its APIs, workers, embedded SQLite, sockets, and all, just see more to-do list management extensions... ;)

Let me know if you have other ideas I could test, thanks a lot.
An idea I had this morning, posting so it's not lost:

We catch the timeout as usual, but instead of bubbling the error up we re-submit the validation with a flag having it skip the complex JS processing.  The new validation results are stored as usual and, instead of going into the regular approval queue, the new version gets automatically flagged for admin review (with a note saying why).

I don't think that would be too technically difficult, since the flow (upload into a queue) is mostly unchanged; it's just a matter of flagging it. A simpler and faster version would be to skip the second validation altogether and simply put the timed-out package into the admin queue, saying it is too large to process. Then the admin reviewers could manually review it (including running the validator themselves from the command line).
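
In rough pseudocode, the proposed flow would be along these lines (all names here are hypothetical; nothing in this sketch is implemented yet):

def handle_upload(xpi):
    results = run_validation(xpi)  # assumed existing entry point
    if results.timed_out:
        # Re-run without the expensive JS traversal, then route the add-on
        # to admins instead of the regular approval queue.
        results = run_validation(xpi, skip_js_processing=True)
        flag_for_admin_review(xpi, reason="validation timed out; JS checks skipped")
    return results
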
That sounds good to me!
(In reply to Xavier from comment #18)
> Although we've had regressions in the past when updating jQuery, I'm willing
> to try that, but I need to know what the validator uses to recognize jQuery
> as such.
> I'm surprised it doesn't already, since we are using vanilla 1.6.3.

The validator compares a hash of the file to one pulled from jquery.com (or wherever), so the file has to be identical.  Even re-saving the file in an IDE can mess up the encoding slightly, and any build scripts need to skip it.
Filenames aren't important.

I just realised we didn't have jquery 1.8 so that wouldn't work till the next push anyway.
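
In other words, the recognition step is an exact-hash lookup along these lines (the table contents and hash choice are illustrative; the real validator ships its own list of known hashes):

import hashlib

KNOWN_LIBRARY_HASHES = {
    "<sha256 of the official jquery-1.6.3.min.js>": "jquery-1.6.3.min.js",
}

def recognized_library(file_bytes):
    # Any byte-level difference (re-saving, re-minifying, a build script
    # touching the file) changes the digest and defeats recognition.
    digest = hashlib.sha256(file_bytes).hexdigest()
    return KNOWN_LIBRARY_HASHES.get(digest)  # None: validate the file fully
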
Thanks Andrew,

So I guess jQueryUI would not benefit from this, since it can be customized to only load useful stuff?


And thanks Will and Jorge, this feature would be uber awesome.
(In reply to Xavier from comment #23)
> Thanks Andrew,
> 
> So I guess jQueryUI would not benefit from this, since it can be customized
> to only load useful stuff?

If its the full file it works (e.g. http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.23/jquery-ui.min.js) but otherwise no.
Assignee: nobody → mattbasta
Target Milestone: --- → 2012-09-06
Target Milestone: 2012-09-06 → 2012-09-13
I've done some experiments with parallelizing Spidermonkey access, though this doesn't provide any worthwhile benefit in CPython due to the GIL. Some quick benchmarks actually showed a slowdown of about 10% with this code, mostly due to contention around the subprocess module and locking and unlocking of semaphores to prevent process spamming.

My next plan involves running all JavaScript parsing through a single instance of Spidermonkey, since that seems to be the single largest bottleneck in any given validation run. Currently, Spidermonkey processes are synchronously spun up and torn down for every script that's encountered. Ideally, a list of files would be passed to SM, which would iteratively parse each one and store the result in a corresponding temporary file. The files can be read and decoded asynchronously as the Spidermonkey process generates them. I can use the code from the previous experiment to make all this happen. I'll be working on this in the coming weeks.

This is quick and dirty and should (for the time being) solve timeout issues for exceedingly large add-ons.
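
A sketch of that plan (the driver script name and its line-per-result protocol are assumptions about the design, not the final code):

import json
import subprocess

def parse_batch(js_paths, sm_binary="/usr/bin/js"):
    # One Spidermonkey spin-up for the whole batch instead of one per file.
    proc = subprocess.Popen([sm_binary, "driver.js"],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    # Assumption: the driver reads newline-separated paths on stdin and
    # emits one JSON parse result per line, in the same order.
    out, _ = proc.communicate("\n".join(js_paths))
    return [json.loads(line) for line in out.splitlines() if line.strip()]
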
Hi guys,

Facebook changed some data format in its API (a string is now an object), which makes it impossible to add a Facebook account into the Yoono extension :(

Most of our users use Yoono for Facebook.

Is there a way we could submit our fix? Thanks
Summary: Yoono addon fails on validation during upload → Submit addon for admin review to bypass validation (fixes: Yoono addon fails on validation during upload)
https://github.com/mozilla/amo-validator/commit/6635895bc2d1011b4e34d6a10593334f5cb988c5

This commit will cause the validator to skip exhaustive validation for any add-on that contains more than 1000 JS files. On a cold run, Yoono validates in 13 seconds. I'll refrain from reiterating the drawbacks of this.
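
Schematically, the committed behavior amounts to something like this (the actual implementation is in the linked commit):

from zipfile import ZipFile

JS_FILE_LIMIT = 1000  # threshold from the commit above

def should_skip_exhaustive(xpi_path):
    # Count the scripts up front; past the limit, fall back to lightweight
    # validation and emit a warning so editors can see what was skipped.
    with ZipFile(xpi_path) as xpi:
        js_count = sum(1 for name in xpi.namelist()
                       if name.lower().endswith(".js"))
    return js_count > JS_FILE_LIMIT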

This effectively implements an equivalent solution to the proposal in comment #20.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Thanks Matt. This appears to only affect the validator. Can you open an additional bug (or bugs) to hook this up to the submission flow? We need to flag these add-ons for reviewers so that they know to spend additional time on the JS. Also, this needs a requirements.txt bump so it arrives in Zamboni.
The requirements file has been bumped.

The patch reports the big add-on with a warning message for now, so the editors can see (from the validation report) that the exhaustive validation process didn't take place. Did you want to show more/other messages?
hm, I guess that'll probably work. I thought we might need to send it to a different review queue but it will go into pending like any other add-on so that's probably fine. Thanks!
Thanks all.
Is it already online? I tried submitting and didn't see a difference in the result, so I'm unsure whether the file is queued for review or not.
This fix will be pushed live later today. I recommend that you try again tomorrow, and reopen the bug if the problem persists.
♬ ♩ ♪ Oops it did it again ♫ ♩ ♬

I'm still getting the Timeout issue :(
Increasing severity due to Britney Spears references.
Severity: normal → major
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 2012-09-13 → 2012-10-18
I hit it, baby, one more time and it works for me:

http://imgur.com/a/oBK6p
Just in case anyone needs a vaccine against the depravity in comment 33, for which the poster should be ashamed,

♫ Help! I need somebody. Help! Not just anybody. Help! You know I need someone. Help! ♫


♬ I know the pieces fit, 'cause I watched them tumble down. No fault, none to blame... ♬


♪ Freude, schöner Götterfunken, Tochter aus Elysium... ♪
I may be missing something obvious, but we don't seem to be using the same interface; here is what I have:
http://imgur.com/PDBWs

(And, yes, Kris, I'm ashamed, what do you think. Sounds like the vaccine came in too late anyway. And next time I'll do it Oppa Gangnam Style)
Hi guys, any news? Should I threaten with more Britney references? Thanks.
comment 35 makes this sound like it worked.  Kris or Jorge, can you try?
But comment #37 shows in a screenshot that it fails, at least for me, and I'm stuck there...
If you would like me to send the xpi I'm trying to submit, let me know.
Thanks again.
That would be useful.
Ok Matt, email with attachment on its way, thanks.
Lowered the threshold for aborting the exhaustive JS traversal:

https://github.com/mozilla/amo-validator/commit/c402e4a82cfa262f30fa6d8bb29a05be53caeee7
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
I confirm it's no longer timing out, it's *great*, thanks a lot!
(In reply to Xavier from comment #44)
> I confirm it's no longer timing out, it's *great*, thanks a lot!

awesome.  thanks everyone.
Status: RESOLVED → VERIFIED
Reopening, as the implementation only addresses this specific case of an add-on with an obscene number of JS files. This still seems to be an issue for Zotero, which merely has an obscene amount of code.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
The validator timeout is preventing us from submitting a Firefox 17–compatible version of Zotero.

Recent versions of Zotero now also fail to validate, so this is not due to any changes in Zotero.
There isn't inherently a problem with having lots of JS to validate; the vast majority of the overhead in JS validation comes from the system calls to spin up Spidermonkey, which doesn't impact add-ons with relatively low numbers of JS files.

There has been a sharp increase in validation times in recent weeks that I'm debugging on bug 812538. That bug should be used to capture any information about Zotero and other recently failing add-ons.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blocks: 813274
I am still having problems uploading the new version (for FF17) of my add-on: https://addons.mozilla.org/en-US/firefox/addon/key-manager/.
I had the same problem when I tried to submit the FF16 compatibility update.

When I try to upload the Windows-only version (file size: ~2.2MB), I get an "Unexpected server error while validating." message after a long pause.
When I try to upload the version for all platforms (file size: ~7.9MB), I get a "Timeout" error immediately.

I have approximately 110 JS files, irrespective of platform. Is that too much?
The add-on also contains binary files. 

Any suggestion how to get around this problem? 
Is there any way I can get a one-time exemption from the validation error until this problem is fixed?

Thanks.
Please send your file to amo-admin-reviews AT mozilla.org and we'll help you.
Product: addons.mozilla.org → addons.mozilla.org Graveyard