Closed Bug 667495 Opened 13 years ago Closed 13 years ago

QMO staging site needs more RAM allocated to PHP

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: davehunt, Assigned: nmaul)

Details

(Whiteboard: waiting on feedback)

Attachments

(1 file)

I have started running automated tests for QMO, and found that they often fail with a fatal error displayed to the user. The error message is always something like:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 272010 bytes) in /data/www/quality-new.stage.mozilla.com/wp-includes/functions.php on line 251
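For anyone triaging these failures from test logs, the message can be broken into its parts. The following is a minimal sketch in Python, not part of the QMO test suite; the function name parse_memory_error and the regular expression are illustrative assumptions based only on the message text quoted above.

```python
import re

# Pattern for PHP's "Allowed memory size ... exhausted" fatal error text,
# written against the message quoted in this bug (an assumption, not a PHP API).
FATAL_RE = re.compile(
    r"Allowed memory size of (?P<limit>\d+) bytes exhausted "
    r"\(tried to allocate (?P<requested>\d+) bytes\) "
    r"in (?P<file>\S+) on line (?P<line>\d+)"
)


def parse_memory_error(message):
    """Return the fields of a PHP memory-exhaustion error, or None if no match."""
    match = FATAL_RE.search(message)
    if match is None:
        return None
    limit = int(match.group("limit"))
    return {
        "limit_bytes": limit,
        "limit_mib": limit / (1024 * 1024),  # convert bytes to MiB
        "requested_bytes": int(match.group("requested")),
        "file": match.group("file"),
        "line": int(match.group("line")),
    }


message = (
    "Fatal error: Allowed memory size of 33554432 bytes exhausted "
    "(tried to allocate 272010 bytes) in "
    "/data/www/quality-new.stage.mozilla.com/wp-includes/functions.php on line 251"
)
info = parse_memory_error(message)
print(info["limit_mib"])  # 33554432 bytes is 32 MiB
```

The limit of 33554432 bytes works out to exactly 32 MiB, so the failing allocation of 272010 bytes was a small request that happened to cross an already nearly exhausted limit.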
Matt, it looks like enough memory isn't allocated on the staging server for PHP, perhaps? Who could take a look at this for us?
Can we get an update on this? Our automated tests are still regularly failing.
Moving this to server operations so they can allocate more RAM for PHP on the staging box. Can you guys have it match production, please?
Assignee: nobody → server-ops
Component: Website → Server Operations
Product: quality.mozilla.org → mozilla.org
QA Contact: website → mrz
Summary: Frequent fatal errors when loading the QMO staging site → QMO staging site needs more RAM allocated to PHP
Version: unspecified → other
Any update on this? I saw this today while trying to work on staging and our Selenium scripts still fail when they run due to this.

The error I got today is:


Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 595687 bytes) in /data/www/quality-new.stage.mozilla.com/wp-includes/wp-db.php on line 774
For the record:

staging is quality-new.stage.mozilla.com, lives on mrapp-stage02
prod is quality.mozilla.org, lives on dm-qmo01


Both have PHP memory_limit set to 32MB. Do the same Selenium jobs fail on prod?
I don't know if we run the Selenium scripts on Prod (as it is new) but I've never seen that error message on the production site.
The automated tests are not currently running on production.
According to comment 2, the automated test fails every time on stage. Could we run the failing test against prod, to verify that it fails there in the same way?

I have bumped the mem limit on stage up from 32MB to 48MB... let me know if that helps (at the very least it should change the number reported in the error). We really need to test prod too... if prod is susceptible to the same issue then it should be fixed as well.
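As a rough check of what number the error should report after the bump, the sketch below converts php.ini-style shorthand sizes to bytes. It assumes the limit is written in PHP's shorthand notation (for example "48M"), where K, M and G are powers of 1024; how the value was actually set on the server is not stated in this bug, and the helper name php_shorthand_to_bytes is hypothetical.

```python
def php_shorthand_to_bytes(value):
    """Convert a php.ini size such as '48M' to bytes (K, M, G are powers of 1024)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    # No recognised suffix: the value is already a plain byte count.
    return int(value)


old_limit = php_shorthand_to_bytes("32M")
new_limit = php_shorthand_to_bytes("48M")
print(old_limit, new_limit)
```

If the new limit is in effect, any remaining memory errors on stage should report 50331648 bytes instead of 33554432.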


What exactly is being done when this error is thrown? It's somewhat unusual for a script to use that much RAM intentionally, but not exceptionally so. I just want to try and confirm it's doing something that makes sense and that we're fixing the right thing. :)
There are currently only two automated tests. One submits a valid new user, and the other attempts to submit a user with an invalid username. Typically I have seen this failure after submitting the create new user form (which both tests do).

I'm not sure I want to run the create new user test on production as it will leave these orphaned accounts. I will mark the unhappy path test as prod safe, and will start running it on prod shortly.
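One way to express the "prod safe" split is sketched below. This is not how the QMO tests are actually organised; the prod_safe decorator, the test names and the select_tests helper are all hypothetical and exist only to illustrate the idea of running the non-destructive test on production.

```python
# Hypothetical registry; the names here are illustrative, not from the QMO test suite.
PROD_SAFE_TESTS = set()


def prod_safe(test_func):
    """Mark a test as safe to run against production (it leaves no data behind)."""
    PROD_SAFE_TESTS.add(test_func.__name__)
    return test_func


def test_create_valid_user():
    """Creates a real account, so it is NOT marked prod safe."""


@prod_safe
def test_create_user_with_invalid_username():
    """The submission is rejected, so no account is left behind."""


def select_tests(tests, environment):
    """Return every test on staging, but only prod-safe tests on production."""
    if environment == "production":
        return [t for t in tests if t.__name__ in PROD_SAFE_TESTS]
    return list(tests)


all_tests = [test_create_valid_user, test_create_user_with_invalid_username]
selected = select_tests(all_tests, "production")
print([t.__name__ for t in selected])
```

With this split, only the invalid-username test would be selected for production, so no orphaned accounts are created there.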
Assignee: server-ops → nmaul
I'm unable to do this because of the CAPTCHA on the registration page. The answer differs depending on the environment, and now staging appears to be using reCAPTCHA. This causes our tests to fail even if the memory limit issue does not occur.
I've disabled reCAPTCHA on stage so you can run your tests.
This was added yesterday. As I mentioned in e-mail, we can turn it off on staging.
The tests have been re-enabled without the CAPTCHA. I will monitor the results and update if we see the original failure again.
Whiteboard: waiting on feedback
Any news on this? If this is fixed on stage, I'd like to make the same change to prod, to keep them consistent.
ping abillings / davehunt ... did the stage memory_limit fix the issue? If so I want to make the same change on prod to keep them consistent.
I haven't seen the issue on staging since the change, however I've been unable to run the automated tests for some days now due to unresolved issues with the version of Selenium we were using.
(In reply to Dave Hunt (:davehunt) from comment #16)
> I haven't seen the issue on staging since the change, however I've been
> unable to run the automated tests for some days now due to unresolved issues
> with the version of Selenium we were using.

How about now?
I'm still blocked. We rolled back to Selenium 1.0.8 due to http://code.google.com/p/selenium/issues/detail?id=2037. I tried setting up 2.x on a single node but am now also blocked by http://code.google.com/p/selenium/issues/detail?id=2199.
FWIW, we are working on a new QMO dev/stage/prod environment in bug 641627. Hopefully it will be done in the coming week. It will be a different platform. I can't promise that it will be better, but I want to note here that things will change.
This should be invalid now with dev and stage on the new cluster.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard