Closed Bug 838524 Opened 11 years ago Closed 11 years ago

Plugincheck down - doesn't check plugins

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

x86
All
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cbook, Assigned: nmaul)

Attachments

(1 file)

Attached image screenshot
See screenshot. I got pinged by the German community about the problem. Schalk, did we change anything that could cause this problem?
Moving to the AMO Ops queue. I've checked and verified that plugins.m.o seems to be functioning normally and isn't returning 500. The test URL for Nagios monitoring is returning 200 OK with valid data too.
Assignee: server-ops-webops → server-ops-amo
Component: Server Operations: Web Operations → Server Operations: AMO Operations
QA Contact: nmaul → oremj
Can confirm that https://www.mozilla.org/en-US/plugincheck displays a blank plugins area on both the latest Nightly (2013-02-05) and Beta (19.0).
There are JavaScript errors, quite possibly related:

[16:12:25.211] SyntaxError: unterminated string literal @ https://www.mozilla.org/js/plugincheck.js:20
[16:12:25.212] ReferenceError: Pfs is not defined @ https://www.mozilla.org/en-US/plugincheck/:708

plugincheck.js version: da5f1e56d5d67a1867d28f12c27b7289
By the way, stage (https://www.allizom.org/en-US/plugincheck/) is working, so it's an issue on production.
Has there been a recent deployment to production that involved rebuilding the plugincheck.js file? I ran into these exact problems when I started working on the rewrite of the front end, and they were related to some funky things happening when building the minified bundle from Perfidies.
One user in the support forums mentioned:

"Ironically, this seems to be caused by some new script from newrelic.com to monitor problems with the application! "

The only reference I found is on line 724:

<script type="text/javascript">if(!NREUMQ.f){NREUMQ.f=function(){NREUMQ.push(["load",new Date().getTime()]);var e=document.createElement("script");e.type="text/javascript";e.src=(("http:"===document.location.protocol)?"http:":"https:")+"//"+"d1ros97qkrwjf5.cloudfront.net/42/eum/rum.js";document.body.appendChild(e);if(NREUMQ.a)NREUMQ.a();};NREUMQ.a=window.onload;window.onload=NREUMQ.f;};NREUMQ.push(["nrfj","beacon-1.newrelic.com","6faa84b0ff","2194368","MVEDNkFSDxYEB0VaDAgbNBBaHAgLBghEVwYVGxEQVlUEEQYMH0MLFg==",0,30,new Date().getTime(),"","","","",""]);</script>


No clue whether this is actually the reason for the problem.
plugincheck.js is different on stage vs prod:

➜  ~  diff plugincheck.js.stage plugincheck.js.prod 
20c20,22
< document.open();window[successCallbackName]=function(json){done=1;pageCacheFlag&&(pageCache[finalUrl]={s:json});later(function(){cleanUp();notifySuccess(json);});};window[errorCallbackName]=function(state){(!state||state=="complete")&&!done++&&later(errorFunction);};xOptions.abort=cleanUp=function(){clearTimeout(timeoutTimer);document.open();removeVariable(errorCallbackName);removeVariable(successCallbackName);document.write(empty);document.close();frame.remove();};document.write(['<html><head><script src="',finalUrl,'" onload="',errorCallbackName,'()" onreadystatechange="',errorCallbackName,'(this.readyState)"></script></head><body onload="',errorCallbackName,'()"></body></html>'].join(empty));document.close();timeout>0&&(timeoutTimer=setTimeout(function(){!done&&errorFunction(empty,"timeout");},timeout));});return xOptions;}
---
> document.open();window[successCallbackName]=function(json){done=1;pageCacheFlag&&(pageCache[finalUrl]={s:json});later(function(){cleanUp();notifySuccess(json);});};window[errorCallbackName]=function(state){(!state||state=="complete")&&!done++&&later(errorFunction);};xOptions.abort=cleanUp=function(){clearTimeout(timeoutTimer);document.open();removeVariable(errorCallbackName);removeVariable(successCallbackName);document.write(empty);document.close();frame.remove();};document.write(['<html><head><script type="text/javascript">var NREUMQ=NREUMQ||[];NREUMQ.push(["mark","firstbyte",new Date().getTime()]);</script>
> <script src="',finalUrl,'" onload="',errorCallbackName,'()" onreadystatechange="',errorCallbackName,'(this.readyState)"></script></head><body onload="',errorCallbackName,'()"><script type="text/javascript">if(!NREUMQ.f){NREUMQ.f=function(){NREUMQ.push(["load",new Date().getTime()]);var e=document.createElement("script");e.type="text/javascript";e.src=(("http:"===document.location.protocol)?"http:":"https:")+"//"+"d1ros97qkrwjf5.cloudfront.net/42/eum/rum.js";document.body.appendChild(e);if(NREUMQ.a)NREUMQ.a();};NREUMQ.a=window.onload;window.onload=NREUMQ.f;};NREUMQ.push(["nrfj","beacon-1.newrelic.com","6faa84b0ff","2194368","MVEDNkFSDxYEB0VaDAgbNBBaHEtLDxc=",0,1,new Date().getTime(),"","","","",""]);</script>
> </body></html>'].join(empty));document.close();timeout>0&&(timeoutTimer=setTimeout(function(){!done&&errorFunction(empty,"timeout");},timeout));});return xOptions;}
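The diff above would explain the "unterminated string literal" error on line 20: in prod, the injected markup puts raw line breaks inside a single-quoted JavaScript string, and a quoted string literal cannot span lines. A minimal sketch of that effect follows. The two source strings are shortened, hypothetical stand-ins for the stage and prod lines (`x.js` and `syntaxErrorName` are illustration names, not part of plugincheck.js).

```javascript
// Returns the name of the error thrown while compiling `source`,
// or null if it compiles. new Function only parses here; nothing is run.
function syntaxErrorName(source) {
  try {
    new Function(source);
    return null;
  } catch (e) {
    return e.name;
  }
}

// Shortened stand-in for the stage line: HTML kept inside one string literal.
const stageLine =
  "var html = ['<html><head><script src=\"x.js\"></script></head></html>'].join('');";

// Shortened stand-in for the prod line: the same literal, but with an
// injected <script> block followed by a raw newline (the \n below becomes
// a real line break in the source text being compiled).
const prodLine =
  "var html = ['<html><head><script>var NREUMQ=NREUMQ||[];</script>\n" +
  "<script src=\"x.js\"></script></head></html>'].join('');";

console.log(syntaxErrorName(stageLine)); // null
console.log(syntaxErrorName(prodLine));  // "SyntaxError"
```

Once the file fails to parse, nothing in it is defined, which is consistent with the follow-on "Pfs is not defined" error reported earlier.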
CCing @jakem for assistance related to bedrock.
I set newrelic.browser_monitoring.auto_instrument [1] to false in /etc/php.d/newrelic.ini (puppet/trunk/modules/secrets/files/webapp/newrelic/newrelic.ini). This has removed the New Relic injection from the PHP side of things.

FYI, for now this breaks New Relic for every PHP site on every cluster that was using this feature (probably all or most of them), due to how the puppet module was structured.

If you want user monitoring, you will need to code around this manually.

Leaving this bug open as we need to have a discussion as to a real fix, this is really just a temporary workaround.


[1] https://newrelic.com/docs/php/real-user-monitoring-in-php#auto-rum
Assignee: server-ops-amo → server-ops-webops
Component: Server Operations: AMO Operations → Server Operations: Web Operations
QA Contact: oremj → nmaul
Assignee: server-ops-webops → nmaul
[root@bedrockadm.private.phx1 data]# find . -type f -name "plugincheck.js" -print0 | xargs -0 md5sum
130338c4e5950e0541fb57bd993a7a3b  ./bedrock/src/www.mozilla.org/js/plugincheck.js
130338c4e5950e0541fb57bd993a7a3b  ./bedrock/www/www.mozilla.org/js/plugincheck.js
130338c4e5950e0541fb57bd993a7a3b  ./bedrock-dev/src/www-dev.allizom.org/js/plugincheck.js
4333126888e15ddf4bd9831733d1bef0  ./bedrock-dev/src/www-demo3.allizom.org-django/bedrock/media/js/plugincheck/plugincheck.js
130338c4e5950e0541fb57bd993a7a3b  ./bedrock-dev/www/www-dev.allizom.org/js/plugincheck.js
4333126888e15ddf4bd9831733d1bef0  ./bedrock-dev/www/www-demo3.allizom.org-django/bedrock/media/js/plugincheck/plugincheck.js
130338c4e5950e0541fb57bd993a7a3b  ./bedrock-stage/src/www.allizom.org/js/plugincheck.js
130338c4e5950e0541fb57bd993a7a3b  ./bedrock-stage/www/www.allizom.org/js/plugincheck.js


Stage and prod have the exact same base file, although this doesn't account for any minification and/or combination that might be happening. I can't speak to that. Maybe pmac can?
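The md5sum listing above is easier to compare when grouped by checksum. The following is a minimal sketch that assumes the default text-mode md5sum output (checksum, whitespace, path); `groupByChecksum` is a hypothetical helper name, and the sample is a subset of the lines above.

```javascript
// Group md5sum output lines by checksum so identical files end up together.
function groupByChecksum(md5sumOutput) {
  const groups = new Map();
  for (const line of md5sumOutput.split("\n")) {
    // Expect: 32 hex characters, whitespace, then the path.
    const match = /^([0-9a-f]{32})\s+(.+)$/.exec(line.trim());
    if (!match) continue; // skip blank or unrecognised lines
    const [, checksum, path] = match;
    if (!groups.has(checksum)) groups.set(checksum, []);
    groups.get(checksum).push(path);
  }
  return groups;
}

// Three lines copied from the listing above.
const listing = [
  "130338c4e5950e0541fb57bd993a7a3b  ./bedrock/www/www.mozilla.org/js/plugincheck.js",
  "4333126888e15ddf4bd9831733d1bef0  ./bedrock-dev/www/www-demo3.allizom.org-django/bedrock/media/js/plugincheck/plugincheck.js",
  "130338c4e5950e0541fb57bd993a7a3b  ./bedrock-stage/www/www.allizom.org/js/plugincheck.js",
].join("\n");

const groups = groupByChecksum(listing);
for (const [checksum, paths] of groups) {
  console.log(checksum, paths.length, "file(s)");
}
```

With this sample, the prod and stage paths land in the same group and the www-demo3 Django copy is on its own. As noted, this only compares the base files, not whatever is finally served.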

One difference is that stage did not have a proper app name set for New Relic. That shouldn't matter though, it reports either way... this just changes what name it reports under.
So the plugincheck.js file should be exactly the same, since it is built from https://github.com/ozten/Perfidies-of-the-Web
At this time we believe this should be resolved for all use cases. If anyone is still experiencing this issue, please comment and (if possible) include what IP "www.mozilla.org" is resolving to at the time it fails, and (again if possible) some HTTP headers... it might allow us to isolate any still-occurring issues down to a particular backend node.
The problem is that when https://newrelic.com/docs/php/real-user-monitoring-in-php#auto-rum is turned on, the New Relic PHP agent injects its own JavaScript into the JS being served, and this breaks the plugin behavior.

So either the JS needs to be patched to be compatible with the New Relic JS, or we have to leave Real User Monitoring turned off for bedrock.
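To illustrate the two options, here is a rough sketch under stated assumptions. `naiveInject` is a hypothetical stand-in for the auto-instrumentation step, modelled only on what the stage/prod diff shows (markup inserted after `<head>` and before `</body>`); the agent's real matching rules are not known from this bug. The "patched" variant simply avoids spelling those tags literally in the source, which is one possible way to make the JS compatible, not a confirmed fix.

```javascript
// Hypothetical model of the injection: a purely textual insert after the
// first "<head>" and before the last "</body>", whatever the content is.
function naiveInject(body, headerSnippet, footerSnippet) {
  const headAt = body.indexOf("<head>");
  const bodyEndAt = body.lastIndexOf("</body>");
  if (headAt === -1 || bodyEndAt === -1 || bodyEndAt < headAt) {
    return body; // nothing that looks like a page, leave untouched
  }
  const afterHead = headAt + "<head>".length;
  return (
    body.slice(0, afterHead) +
    headerSnippet +
    body.slice(afterHead, bodyEndAt) +
    footerSnippet +
    body.slice(bodyEndAt)
  );
}

// True if `source` parses as a function body (it is never executed).
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// Placeholder snippets; each ends in a newline, as in the prod diff.
const header = "<script>var NREUMQ=NREUMQ||[];</script>\n";
const footer = "<script>/* timing footer */</script>\n";

// JS that builds a small HTML document inside a string literal.
const original =
  "var doc = ['<html><head><script src=\"', url, '\"></script></head><body></body></html>'].join('');";

// Same JS, with the tags split so the literal markers never appear.
const patched =
  "var doc = ['<html><he' + 'ad><script src=\"', url, '\"></script></he' + 'ad><body></bo' + 'dy></html>'].join('');";

console.log(compiles(naiveInject(original, header, footer))); // false
console.log(compiles(naiveInject(patched, header, footer)));  // true
```

A minifier may fold the split strings back together, so a workaround like this would need to be checked against the built file rather than the source.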
I'm not convinced that's true... I've disabled puppet on bedrock1.dev.webapp.phx1 (www-dev.allizom.org) and re-enabled this setting. The plugincheck page still works for me.

In fact, I have *never* been able to replicate this problem... on dev, stage, or prod, even before (albeit timing is questionable) we disabled it.

I suspect this affects different browsers or OSes differently.
We have fixed a totally unrelated problem that was breaking www-dev.allizom.org/plugincheck and www.allizom.org/plugincheck.

At the moment, here's what we're set up with on dev, stage, and prod:

dev: newrelic.browser_monitoring.auto_instrument = true, puppet disabled until 2013-02-06 17:26
stage: newrelic.browser_monitoring.auto_instrument = false, puppet enabled
prod: newrelic.browser_monitoring.auto_instrument = false, puppet enabled

In my testing, all three environments work without issue, even though we currently have the auto-instrumentation enabled on dev.
At the moment, as Jake said, plugincheck is up and running, but maybe we should leave this bug open until the cause of the problem is fixed.
This was resolved.

dev/stage had a bad cert for plugins.stage.mozilla.com. This caused /plugincheck/ to break on www-dev.allizom.org and www.allizom.org, but only for users who hadn't previously visited (and accepted) the out of date cert on p.s.m.c. I had, so I couldn't replicate. Dev and stage were *not* affected by the New Relic RUM setting... on or off, no problem. This ultimately contributed greatly to the confusion.

Prod, at the moment (and at the time of the incident), uses a totally different /plugincheck/ than dev or stage. It *is* affected by the New Relic RUM instrumentation.


/plugincheck/ is being redesigned/rewritten, which is why dev and stage are different. The hope is that soon we will be rid of the old one, and we'll be able to safely enable this bit of New Relic, should we want to. For now though, it's disabled, and we have no plans to turn it on unilaterally. Additionally, the cert on p.s.m.c is good for another year now, so that problem is also resolved.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard