532498 - Error message "Service Unavailable" with every edit I make

Reporter

Description

•

15 years ago

Every time I try to save an edit on SUMO, I get an error page:

"Service Unavailable
The service is temporarily unavailable. Please try again later." 

When I resubmit the POST data from this error page, everything works fine. Chris Ilias said that this problem may be related to the Amsterdam server, because he doesn't see the problem and I'm in Europe.

This problem was discussed in Contributors forum:
https://support.mozilla.com/en-US/forum/3/513605

Stephen Donner [:stephend] Not actively reading bugmail

Comment 1

•

15 years ago

Pavel, can you get and attach (as a plain text file) Live HTTP Headers [1] output of just these submission attempts?

[1] https://addons.mozilla.org/en-US/firefox/addon/3829

Thanks!

Tom Ellins [:TMZ]

Comment 2

•

15 years ago

IT please check this out.

http://mozilla-uk.org/headers

Assignee: nobody → server-ops

Component: General → Server Operations

Product: support.mozilla.com → mozilla.org

QA Contact: general → mrz

Version: unspecified → other

Tom Ellins [:TMZ]

Updated

•

15 years ago

Severity: normal → blocker

Tom Ellins [:TMZ]

Comment 3

•

15 years ago

per jsocol's request. http://mozilla-uk.org/service.JPG

Derek Moore [:dmoore]

Updated

•

15 years ago

Assignee: server-ops → dmoore

Shyam Mani [:fox2mike]

Comment 4

•

15 years ago

I think Derek's fixed this. TMZ confirmed over IRC.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Pavel Cvrcek [:JasnaPaka]

Reporter

Comment 5

•

15 years ago

Attached file Headers — Details

Looks like it still doesn't work for me. See my headers.

Assignee: dmoore → pcvrcek

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Tom Ellins [:TMZ]

Comment 6

•

15 years ago

This appears to be an intermittent issue. Three people in the EU cannot reproduce it. Fox2mike, could you look at this?

Tom Ellins [:TMZ]

Comment 7

•

15 years ago

Test article on prod where this can be tested without editing live docs: https://support.mozilla.com/en-US/kb/tmztest?bl=n

[:Cww]

Comment 8

•

15 years ago

re-assign to server-ops if you need IT attention.

Assignee: pcvrcek → server-ops

Derek Moore [:dmoore]

Updated

•

15 years ago

Assignee: server-ops → dmoore

Derek Moore [:dmoore]

Comment 9

•

15 years ago

I believe we have isolated the issue related to SUMO timeouts in Amsterdam. Any client request (particularly edits) which took more than 10 seconds to complete could trigger a service timeout for other users of the site. We've removed the monitor which controlled this behavior and we've slightly expanded the monitoring timeouts in general.

I'll leave this bug open for now, as I'd like for everyone to confirm their issues have been resolved.
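
As a rough illustration of the failure mode described above, the sketch below models a monitor that takes a site out of rotation whenever a probe exceeds a time limit. It is a simplified model, not the actual load-balancer configuration (which this bug does not show): the `Backend` class, the probe durations, and the function names are hypothetical, and only the 10-second limit comes from the comment.

```python
from dataclasses import dataclass

TIMEOUT_SECONDS = 10.0  # limit mentioned in the comment above


@dataclass
class Backend:
    """Hypothetical stand-in for the proxied site."""
    name: str
    healthy: bool = True


def run_health_check(backend, probe_seconds, timeout=TIMEOUT_SECONDS):
    """Mark the backend unhealthy if a probe took longer than the timeout."""
    backend.healthy = probe_seconds <= timeout


def serve(backend):
    """Return the HTTP status any user would get from the proxy."""
    return 200 if backend.healthy else 503


sumo = Backend("sumo")
run_health_check(sumo, probe_seconds=2.0)
status_before = serve(sumo)   # fast probe: site stays in rotation
run_health_check(sumo, probe_seconds=45.0)
status_after = serve(sumo)    # one slow request: everyone sees 503
```

Under this model, raising the timeout (or removing the monitor) keeps a single slow edit from producing "Service Unavailable" for unrelated users, which is the kind of change the comment describes.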

Pavel Cvrcek [:JasnaPaka]

Reporter

Comment 10

•

15 years ago

At this moment all works fine for me. Thanks!

Underpass

Comment 11

•

15 years ago

Just had the error editing the Italian version of ((How to make Firefox the default browser)).

Derek Moore [:dmoore]

Comment 12

•

15 years ago

Thanks, Simone. I've investigated and confirmed the problem. We'll continue to work on it.

Michele Rodaro [michro]

Comment 13

•

15 years ago

Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; it; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5

Just had the error when I approved an edit I've made to the Italian version of ((Firefox consumes a lot of CPU resources)).

It happens even when I try to connect to the "all Knowledge Base articles" page
(after a refresh I can view the page).

Michele

Thomas Schwecherl

Comment 14

•

15 years ago

The situation has been unchanged for several weeks:
"Service Unavailable" error after every(!) approval or save of an article, and most times when I try to open https://support.mozilla.com/kb/all+Knowledge+Base+articles or https://support.mozilla.com/kb/Localization Dashboard

Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5

Kadir Topal [:atopal]

Comment 15

•

15 years ago

I can reproduce, just saw this error when I opened this page:
https://support.mozilla.com/en-US/kb/All%20Knowledge%20Base%20articles
Not editing anything.

Derek Moore [:dmoore]

Comment 16

•

15 years ago

Thanks for the input, everyone. We're still tracking problems with pages which take a significant amount of time to load (such as the All Articles link, above).

Boersenfeger

Comment 17

•

15 years ago

The problem occurs on every SUMO page that I translated or changed in the last week, and there were a lot of them. I live in Germany, so maybe the server in Amsterdam is responsible? Sorry, my English is bad. Have a nice Christmas, all.

Vito Smolej

Comment 18

•

15 years ago

I get it pretty regularly (sl l10n, located in Munich). It has no negative effects, as far as I am concerned - to me, "it's just another process timing out on you".

matthew zeier [:mrz]

Comment 19

•

15 years ago

Vito, I dumbed down the health check more.  Want to let this bake a bit before rolling this to other sites.

matthew zeier [:mrz]

Updated

•

15 years ago

Whiteboard: [baking]

matthew zeier [:mrz]

Comment 20

•

15 years ago

I'm going to call this baked and close it.  We'll use this same strategy for other sites.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Chris Ilias [:cilias]

Comment 21

•

14 years ago

We're getting reports of this still happening. Has this been rolled out?

matthew zeier [:mrz]

Comment 22

•

14 years ago

Yes, long time ago.  Still seeing it with sumo?

Vito Smolej

Comment 23

•

14 years ago

I >>NEVER<< can move the SUMO material from the staging area to the knowledge base without seeing this. The process finishes OK, I just need to refresh the page. 

It works for me; it's just a drag.

Boersenfeger

Comment 24

•

14 years ago

Nothing has changed here. See my first post, comment 17!
I see the "Service Unavailable
The service is temporarily unavailable. Please try again later." message every time I write or change a page on SUMO. I have to send the changed information again, then it works.

Underpass

Comment 25

•

14 years ago

Same here (Italy). It happens about 9 times out of 10.

Thought you're still working on this.

matthew zeier [:mrz]

Comment 26

•

14 years ago

> Thought you're still working on this.

After comment 19 we let it sit for a week, didn't hear any complaints and assumed the issue resolved.  No one's been working on it since largely because this bug was marked resolved.

I'll re-open and we'll re-investigate this week.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Derek Moore [:dmoore]

Comment 27

•

14 years ago

Everyone,

We've made more changes to improve the proxy stability between Amsterdam and San Jose. We have also migrated some prior experimental changes on non-SSL sumo to all SSL-enabled sites.

Please update here if you are still experiencing problems. It is particularly helpful if you include the URL where you encounter a timeout and the timestamp of when it occurred.

Underpass

Comment 28

•

14 years ago

Hello,

This URL

https://support.mozilla.com/it/kb/all+Knowledge+Base+articles

always gives me "Connection reset"

Underpass

Comment 29

•

14 years ago

Just had this error when editing the article ((*Eliminare i cookie))

Underpass

Comment 30

•

14 years ago

Again saving this article 

https://support.mozilla.com/it/kb/*Firefox+cannot+load+websites+but+other+programs+can

Boersenfeger

Comment 31

•

14 years ago

This Site https://support.mozilla.com/de/kb/Article+list?style_mode=inproduct says
Service Unavailable

The service is temporarily unavailable. Please try again later.

Derek Moore [:dmoore]

Comment 32

•

14 years ago

Thank you for the specific feedback, everyone. There are several problems here, both on the San Jose and the Amsterdam side, and the specific examples are very useful for tracking them down.

Guillermo López :willyaranda (probably SLOW response)

Comment 33

•

14 years ago

I'm getting the problem *every* time I edit, or try to approve a change. It's so frustrating.

Anyway, the slowness of the site is painful.

Boersenfeger

Comment 34

•

14 years ago

This page was changed: https://support.mozilla.com/de/kb/*Das+Download-Fenster+%C3%B6ffnet+sich+nicht?bl=n
Same problem ...

Michele Rodaro [michro]

Comment 35

•

14 years ago

Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; it; rv:1.9.2) Gecko/20100115 Firefox/3.6

Always the same error.
Just had "Service Unavailable" saving the edits to the articles

https://support.mozilla.com/it/kb/*Improvvisa+apertura+di+molte+schede+o+finestre+in+Firefox

https://support.mozilla.com/it/kb/*Errore+nel+caricamento+di+un+sito+web

and every time I try to connect to https://support.mozilla.com/it/kb/all+Knowledge+Base+articles

David Tenser [:djst]

Comment 36

•

14 years ago

Same here, getting this problem every time I edit (from Sweden). What's the status of this bug? What does [baking] mean?

John Daggett (:jtd)

Comment 37

•

14 years ago

This was happening to me three hours ago, grrr....

David Tenser [:djst]

Comment 38

•

14 years ago

I can confirm comment 28, loading https://support.mozilla.com/it/kb/all+Knowledge+Base+articles always times out.

The other steps to reproduce require editing an article. Every time I save an edit, it times out.

Boersenfeger

Comment 39

•

14 years ago

https://support.mozilla.com/de/kb/*Firefox+als+Standardbrowser+festlegen+funktioniert+nicht?bl=n
and others; the problem persists.
Is it necessary to report every article where this problem occurs? Every article I have worked on shows it.
(Bad English, sorry)

matthew zeier [:mrz]

Updated

•

14 years ago

Assignee: dmoore → jeremy.orem+bugs

matthew zeier [:mrz]

Comment 40

•

14 years ago

oremj is going to poke at this today.  If we can't find a quick fix we'll turn down the GLB service and investigate more without impacting you.

Jeremy Orem [:oremj]

Comment 41

•

14 years ago

Let's see if moving to zeus magically fixes this.

Boersenfeger

Comment 42

•

14 years ago

(In reply to comment #41)
The problem is not fixed. I changed this article https://support.mozilla.com/de/kb/*Lesezeichen+zum+Internet+Explorer+exportieren?bl=n and it happened again.

Jeremy Orem [:oremj]

Comment 43

•

14 years ago

We haven't moved it yet. I'll comment when it has been moved.

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

14 years ago

Comment 44

•

14 years ago

Looks like moving to zeus didn't fix the problem.

Jeremy Orem [:oremj]

Comment 45

•

14 years ago

Found the root of this problem! Anytime a page is created it sends out a ton of e-mails and zeus eventually just sees this as timing out.

If I hit the webhead directly it doesn't time out, but my current page create is up to 240 seconds with no end in sight. This also explains why we don't see the issue in stage, it doesn't send e-mail.

I'll reassign to support and it will be up to them to figure out how to send e-mail faster (most likely needs to be asynchronous).
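
To make "asynchronous" concrete, here is a minimal sketch, assuming a single background worker and an in-process queue, of how a page save can return before any notification is sent. It is written in Python purely for illustration (SUMO at this point is PHP), and `send_email`, `save_page`, and the example addresses are hypothetical names, not part of the SUMO code.

```python
import queue
import threading
import time

mail_queue = queue.Queue()
sent = []


def send_email(recipient):
    """Hypothetical slow mail call; the sleep stands in for SMTP latency."""
    time.sleep(0.01)
    sent.append(recipient)


def mail_worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        recipient = mail_queue.get()
        if recipient is None:
            break
        send_email(recipient)


def save_page(watchers):
    """Handle the edit: enqueue notifications and return immediately."""
    for recipient in watchers:
        mail_queue.put(recipient)
    return "saved"


worker = threading.Thread(target=mail_worker)
worker.start()

watchers = ["user%d@example.com" % i for i in range(50)]
result = save_page(watchers)   # returns without waiting for the 50 sends

mail_queue.put(None)           # stop the worker once the backlog is done
worker.join()
```

The request path only pays for the enqueue, so a proxy timeout would no longer depend on how many watchers a page has. A real deployment would more likely use a separate worker process so that queued mail does not depend on the web request's process staying alive.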

Assignee: jeremy.orem+bugs → nobody

Component: Server Operations → Knowledge Base Software

Product: mozilla.org → support.mozilla.com

QA Contact: mrz → kb-software

Version: other → unspecified

Jeremy Orem [:oremj]

Comment 47

•

14 years ago

A quick workaround would be to turn off sending e-mails on page create until this is fixed.

Vito Smolej

Comment 48

•

14 years ago

re "Looks like moving to zeus didn't fix the problem." 

- it did not make it worse either. 

smo

Laura Thomson :laura

Comment 49

•

14 years ago

We can't turn off those emails.  Ways to fix:

- Flush user output and then send the emails (hiding problems from the user)
- Move the emails out of the process, sample code here
http://gearman.org/index.php?id=php_-_mail_queue
but requires gearman.  Jeremy, shall I file a separate bug for IT for gearman for SUMO?

James Socol [:jsocol, :james]

Assignee

Comment 50

•

14 years ago

If IT is OK with it, it feels like Gearman is the way to go. Assigning to me, 1.5.3, Major. Any objections?

Assignee: nobody → james

Severity: blocker → critical

Priority: -- → P1

Target Milestone: --- → 1.5.3

David Tenser [:djst]

Comment 51

•

14 years ago

(In reply to comment #50)
> Any objections?

Only cheerful excitement!

Jeremy Orem [:oremj]

Comment 52

•

14 years ago

I'm okay with gearman, but I've heard rumors of the amo team switching to celery (http://ask.github.com/celery/). Should probably talk to them first.

James Socol [:jsocol, :james]

Assignee

Comment 53

•

14 years ago

(In reply to comment #52)
> I'm okay with gearman, but I've heard rumors of the amo team switching to
> celery (http://ask.github.com/celery/). Should probably talk to them first.

I asked in #amo and no one said they were moving. Also, I couldn't find PHP APIs (maybe I'm looking under the wrong names?) where Gearman has published APIs in a bunch of languages, including the PECL extension.

James Socol [:jsocol, :james]

Assignee

Comment 54

•

14 years ago

Attached patch patch, moves sending wiki edit notifications to a gearman worker — Details — Splinter Review

Sorry this took so long. I spent a long time trying to follow what Tiki does when sending these notifications, and I'm fairly sure I understand why it's so slow. However, I couldn't duplicate that functionality in a reasonable amount of time without just letting Tiki do it, itself.

So, this patch basically does a copy-paste of the contents of sendWikiEmailNotifications() (in webroot/lib/notification/notificationemaillib.php) and puts it in a worker (in scripts/gearman/notification.php). To make this work, the worker needs to cd to webroot and include tiki-setup.php, which means that the worker needs a full checkout of SUMO to work. (I apologize for that but after a week in the rabbit hole, I needed to solve this.)

This adds a new configuration option to webroot/db/local.php.dist, namely $gearman_servers, an array. Each member of $gearman_servers is an array with 'host' and 'port' vars.

To run the worker, make sure gearmand is running, then just type
php notification.php -d

The worker will start a daemon (the '-d' bit) and write its PID to scripts/gearman/etc/notification.pid.

Now, edit a page (I'll attach something in a second that helps not send mail). The email should all still get sent. If you kill the worker process, and edit a page, the notifications will get sent out the next time you start the worker.
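
The expected behaviour when the worker is down can be pictured with a small model, assuming the job server simply holds submitted jobs until some worker asks for them. `JobQueue` below is a hypothetical in-memory stand-in written in Python for illustration; it is not the Gearman API.

```python
from collections import deque


class JobQueue:
    """Hypothetical in-memory stand-in for a job server."""

    def __init__(self):
        self._pending = deque()

    def submit(self, job):
        # Submitting never depends on a worker being connected.
        self._pending.append(job)

    def drain(self, handler):
        """Called by a worker on start-up: process everything queued so far."""
        done = 0
        while self._pending:
            handler(self._pending.popleft())
            done += 1
        return done


delivered = []
jobs = JobQueue()

jobs.submit("notify: page A edited")    # worker running
jobs.drain(delivered.append)

jobs.submit("notify: page B edited")    # worker killed, edit still succeeds
jobs.submit("notify: page C edited")
backlog = jobs.drain(delivered.append)  # worker restarted, backlog delivered
```

Whether the real setup behaves this way depends on how the jobs are submitted and on the job server's configuration, which is worth verifying in testing.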

This bit of code is shockingly deep. I can understand how it would slow down the response so much. If Jeremy was correct that this is the pain point, this patch will go a long way toward alleviating that.

Attachment #434416 - Flags: review?(paulc)

James Socol [:jsocol, :james]

Assignee

Comment 55

•

14 years ago

Attached patch helpful testing patch — Details — Splinter Review

This helped me test the notifications on real data. Shutting down the mail services might also work, if you can find a way to clear the queue before restarting them. (I couldn't, frustratingly. Still working on that.)

All this does is patch webroot/lib/webmail/htmlMimeMail.php (which at least _some_ things use to do their sending) to write out some information to the file /tmp/mailq instead of sending mail. It let me test with real data.

Paul Craciunoiu [:paulc]

Comment 56

•

14 years ago

Comment on attachment 434416 [details] [diff] [review]
patch, moves sending wiki edit notifications to a gearman worker

Looks good. It was a lot easier than I expected. I did a diff of the sendWikiEmailNotification() function contents just to make sure all that deep code stayed the same and that looks fine too.

And I saw useful output in /tmp/mailq, so your test file helped a lot! ;)


One thing I wasn't able to test is resuming. Perhaps I missed a step? Here's what I did:
* after seeing expected output in /tmp/mailq (in other words, things were working), I killed the worker and made another edit to the page.
* started the worker with |php notification.php -d| as before
* checked /tmp/mailq for an update

The update wasn't there. Maybe I'm missing something? An explicit "resume"? I'm cool with having another look at this if so.

However, emails get sent out of the main thread so for that purpose, r+.

Attachment #434416 - Flags: review?(paulc) → review+

James Socol [:jsocol, :james]

Assignee

Comment 57

•

14 years ago

r64841. Still talking to fox2mike in bug 551513 about getting it running.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 14 years ago

Resolution: --- → FIXED

Whiteboard: [baking]

Guillermo López :willyaranda (probably SLOW response)

Comment 58

•

14 years ago

I'm getting this problem while adding a new translation every time with:

https://support.mozilla.com/tiki-edit_translation.php?locale=es&page=Firefox%20crashes%20when%20you%20exit%20it

And I don't think that this could be cause by mailing since this not involves any mail…

James Socol [:jsocol, :james]

Assignee

Updated

•

14 years ago

Depends on: 551513

James Socol [:jsocol, :james]

Assignee

Comment 59

•

14 years ago

Ran into a problem that I'd missed locally: occasionally instead of sending notifications, the worker will spew a bunch of HTML to the terminal and then die.

My theory is that the database server is closing the connection and the client is not trying to reopen it before sending queries. I need to verify this tomorrow but it's by far my best lead.

Assuming that's the case, what needs to happen: the DB connection needs to be made in the task function (see webroot/db/tiki-db.php for what happens). This probably means creating and destroying a tikilib object in the task as well. Quite possibly it also means putting that DB connection object into the global scope, and then creating all the objects that extend TikiLib. Joy.

Blocks: 555003

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

James Socol [:jsocol, :james]

Assignee

Comment 60

•

14 years ago

Attached patch patch to patch 1, moves DB connecting/closing inside the function and adds debug output — Details — Splinter Review

This patch does a lot to the normal environment set up by tiki-setup.php:

1. Destroys a number of DB connection-dependent globals immediately after they're created.
2. Connects to the DB and recreates the globals when it needs to (when it receives a job).
3. Destroys the globals and disconnects again after the job is done.

It also adds a constant (DAEMON) that is TRUE if the script was run with the -d flag. It then uses that constant to control printing debug statements to the terminal (it only does if DAEMON is FALSE). The debugging statements are prettied-up for easy reading.
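
The connect-per-job pattern from the three steps above can be sketched as follows. This is a Python illustration, not the patch itself: `sqlite3` stands in for the real database server, and `handle_job`, `DB_PATH`, and the `notified` table are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file; sqlite3 stands in for the real DB server.
DB_PATH = os.path.join(tempfile.mkdtemp(), "sumo_sketch.db")


def handle_job(page):
    """Connect, do the work for one job, and always disconnect afterwards."""
    conn = sqlite3.connect(DB_PATH)   # connect only when a job arrives
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS notified (page TEXT)")
        conn.execute("INSERT INTO notified (page) VALUES (?)", (page,))
        conn.commit()
        (count,) = conn.execute("SELECT COUNT(*) FROM notified").fetchone()
        return count
    finally:
        conn.close()                  # disconnect even if the job failed


first = handle_job("Clearing cookies")
second = handle_job("Default browser")
```

Because no connection is held between jobs, a long idle period cannot leave the worker with a connection the server has already closed, at the cost of one connect per job.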

James Socol [:jsocol, :james]

Assignee

Updated

•

14 years ago

Attachment #436085 - Flags: review?(paulc)

James Socol [:jsocol, :james]

Assignee

Updated

•

14 years ago

Attachment #436085 - Flags: review?(paulc) → review?(laura)

Laura Thomson :laura

Comment 61

•

14 years ago

Comment on attachment 436085 [details] [diff] [review]
patch to patch 1, moves DB connecting/closing inside the function and adds debug output

Code itself works (easier than I thought, too, yay!)...my only general comment is consider putting the debug info into an error log if you're running in DAEMON mode, as this may help to debug problems in prod down the track.
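
One possible shape for that suggestion, sketched in Python rather than the worker's PHP: send debug output to a log file when running as a daemon and to the terminal otherwise. The `make_logger` helper, the logger names, and the `worker.log` default path are assumptions for illustration; only the `-d` flag comes from this bug.

```python
import logging
import sys


def make_logger(daemon, log_path="worker.log"):
    """Log to a file when daemonized, otherwise to the terminal."""
    name = "notification_worker." + ("daemon" if daemon else "foreground")
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()           # avoid duplicate handlers on re-use
    if daemon:
        handler = logging.FileHandler(log_path)
    else:
        handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# "-d" mirrors the flag described in this bug; the rest is illustrative.
DAEMON = "-d" in sys.argv[1:]
log = make_logger(DAEMON)
log.debug("worker started, daemon=%s", DAEMON)
```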

Attachment #436085 - Flags: review?(laura) → review+

James Socol [:jsocol, :james]

Assignee

Comment 62

•

14 years ago

(In reply to comment #61)
> (From update of attachment 436085 [details] [diff] [review])
> Code itself works (easier than I thought, too, yay!)...my only general comment
> is consider putting the debug info into an error log if you're running in
> DAEMON mode, as this may help to debug problems in prod down the track.

That's a good idea, but I'm going to hold off for now as I don't know the best place/way to write out logs, and Shyam's not here today.

r65138.

Pinging the on-call to reboot the worker.

Status: REOPENED → RESOLVED

Closed: 14 years ago → 14 years ago

Resolution: --- → FIXED

James Socol [:jsocol, :james]

Assignee

Comment 63

•

14 years ago

Backed out in r65201.

I've spent a week or so on this, and while I've made progress, I'm also heading down the rabbit hole, which is a bad way to allocate our resources right now.

This has been a problem for a while. It will probably be a problem for a little while longer, but it will be fixed in Kitsune, and it's better for us to focus on that now.

I'm not WONTFIXing this, but it also can't block 1.5.3 any more.

Severity: critical → major

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Target Milestone: 1.5.3 → Future

James Socol [:jsocol, :james]

Assignee

Updated

•

14 years ago

No longer blocks: 555003

Guillermo López :willyaranda (probably SLOW response)

Comment 64

•

14 years ago

I don't want to be arrogant, and maybe in the US you don't see this, but I think this problem is too big in the Europe datacenter to leave it for Kitsune to fix (how long until we can see it in production, BTW?).

I'm getting reports from people who try to help with SUMO translations, and everyone is having this issue while saving. And they lose their work while translating.

In my community only a few of us know about this problem and, since SUMO is a wiki by nature, we can't reach everyone who wants to help. I think a lot of people have lost time and effort while saving because of this bug.

James Socol [:jsocol, :james]

Assignee

Comment 65

•

14 years ago

We realize that this is a huge pain, especially for localizers. You should know that we didn't just back this out of 1.5.3 because it was frustrating or we didn't feel like working on it. We did it because it was basically eating up all of our resources and preventing other work from getting done, and it looked like it would continue to do so. Especially if, like comment 58 indicates, our approach wouldn't actually solve everything.

SUMOdev is a two-person team right now. If one of us is completely dedicated to one bug, the other person can't get code reviewed, check in, get feedback, etc., at least not quickly enough for it to be helpful. While that's fine for a day or so, this was eating up all of my time for a week with no end in sight. Essentially every day spent on this bug delays Kitsune by a day right now.

Ultimately, it was a decision of resource allocation and how to best serve the SUMO project and community as a whole. While fixing this bug would be a good use of our time, there are higher priorities right now.

For an overview of the rough plan for the next 6-9 months, you can look here:
* https://wiki.mozilla.org/Support/Kitsune_Milestones
* https://wiki.mozilla.org/Support/SUMOdev_Meeting_Notepad/2010_Q1#Mar_30.2C_2010

I'll try to write up a blog post about what we've been doing. I know it hasn't been highly visible (though check http://support-stage-new.mozilla.com/en/search hopefully sometime tomorrow) or seemed very important, since it's just search results. But it's actually hugely important in laying the foundations for the new platform.

Vito Smolej

Comment 66

•

14 years ago

as for me "... this is a huge pain, especially for localizers ..." does not hold. I mean, as long as I can assume that "it's not a bug, it's a feature" (g), I'm doing fine (...press "reload" and continue). Unfortunately, it seems that every new member of SUMO is bound to hit this pothole sooner or later...

Goes without saying, that your work is much appreciated.

regards

smo

Boersenfeger

Comment 67

•

14 years ago

Sure, you did not have enough manpower. But new translators who lost their work are not amused and have walked away from translating on SUMO. Is that any better? I have never lost my text because of this error. Maybe you can warn translators that they should save their text before they click Send! Have a nice Easter weekend, all.

[:Cww]

Comment 68

•

14 years ago

As far as I know, the submit still works and no data is lost.  It just seems that way because the server gets so caught up processing the request that it throws an error message.  Would it help if we put a notice above the submit button saying that you may see an error, but that it's just because the server is working on your request and you don't have to resubmit anything?

Vito Smolej

Comment 69

•

14 years ago

There ?may? be a situation, leading to loss of data, namely (just a scenario of what happened to me some time ago) that you get a fresh file, localize it and send it off >>without making a staging copy<< (i.e. translated to a beginner's mindset >>knowing<< there's something like a staging and a production copy). 

Checking for that would make sense in any case and probably keep some feathers unruffled. 

Re msgbox on timeout, something like "...in case you get XYZ message, press reload to continue" would make sense - it would indicate the problem is known, and help the user climb over it.

Kadir Topal [:atopal]

Comment 70

•

14 years ago

Yes, I'd also suggest we inform users that we know about the problem and provide them with a workaround. Generally I'd also say that this problem is unacceptable, but given the situation, trying to fix it would mean spending resources on a dying piece of software. Instead we should try to bear with it just a little bit longer until this part of the KB is replaced by Django-Code as well.

I know this isn't really satisfying and being from Europe I hate that as much as you, but the alternative is even worse :/

David Tenser [:djst]

Comment 71

•

14 years ago

We currently have 1.5 people working on SUMO web development (James x 1, Paul x 0.5) and James spent approximately a full week trying to nail this down (see comment 63). We can't justify having James or Paul spend another week or more going around in circles and further delaying the development of SUMO 2.0.

mrz: Is there *any* way this problem can at least be reduced with some sort of IT quickfix -- e.g. more DB mirrors, more RAM, faster servers, etc? If so, we really should look into that now so we can improve the situation.

This really hurts the usability of SUMO for anyone editing pages, and seems to be happening mostly in Europe (although I'm not 100% sure about that).

Kadir Topal [:atopal]

Comment 72

•

14 years ago

Just wanted to add that it's not only editing anymore, I also got the same message when I used the search and when I replied to a forum thread, again from Europe.

matthew zeier [:mrz]

Comment 73

•

14 years ago

> mrz: Is there *any* way this problem can at least be reduced with some sort of
> IT quickfix -- e.g. more DB mirrors, more RAM, faster servers, etc? If so, we
> really should look into that now so we can improve the situation.
> 
> This really hurts the usability of SUMO for anyone editing pages, and seems to
> be happening mostly in Europe (although I'm not 100% sure about that).

Wasn't aware this was still an issue!

It has to do with the proxy setup we have in Amsterdam.  US-based hits aren't having any problems.  The only quick fix is to disable the Amsterdam proxy and force everyone to come to the US.

Should we do that?

David Tenser [:djst]

Comment 74

•

14 years ago

Could we try that to see if it's a net positive? Would this only affect logged in sessions? If so, it seems like the right thing to do (given my lack of insights about the potential downsides of such a change).

matthew zeier [:mrz]

Comment 75

•

14 years ago

This will affect all users.  

I took Amsterdam out of the GLB pool.  Let me know if there are still issues (and if there are, please let me know what IP address your computer is getting for support.mozilla.com).

Tobias (:Tobbi) Markus

Comment 76

•

14 years ago

(In reply to comment #75)
> This will affect all users.  
> 
> I took Amsterdam out of the GLB pool.  Let me know if there are still issues
> (and if there are, please let me know what IP address your computer is getting
> for support.mozilla.com).

There are definitely still issues with Service Unavailable:

support.mozilla.com has the IP 63.245.213.89 for me.

matthew zeier [:mrz]

Comment 77

•

14 years ago

you probably have to wait for dns ttls to expire.

Jeremy Orem [:oremj]

Comment 78

•

14 years ago

(In reply to comment #58)
> I'm getting this problem while adding a new translation every time with:
> 
> https://support.mozilla.com/tiki-edit_translation.php?locale=es&page=Firefox%20crashes%20when%20you%20exit%20it
> 
> And I don't think that this could be cause by mailing since this not involves
> any mail…

I think this is bug 549961.

matthew zeier [:mrz]

Comment 79

•

14 years ago

Still seeing those Amsterdam IPs (63.245.213.0/24)?
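
For anyone checking their own address against that range, a quick sketch using Python's standard `ipaddress` module follows. The /24 is the one quoted above and the first address is one reported earlier in this bug; the second is simply an example outside the range, and the helper name `is_amsterdam` is made up for illustration.

```python
import ipaddress

# Range quoted in the comment above as the Amsterdam addresses.
AMSTERDAM_NET = ipaddress.ip_network("63.245.213.0/24")


def is_amsterdam(address):
    """Return True if the address falls inside the Amsterdam range."""
    return ipaddress.ip_address(address) in AMSTERDAM_NET


checks = {ip: is_amsterdam(ip) for ip in ("63.245.213.89", "63.245.209.139")}
```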

Kadir Topal [:atopal]

Comment 80

•

14 years ago

What I get is: 63.245.213.89 not sure if that's Amsterdam as well

matthew zeier [:mrz]

Comment 81

•

14 years ago

Anything in 63.245.213.0/24 is.  Are you on a unix-like machine?  Can you send me the output of "host -v developer.mozilla.org" ?

Kadir Topal [:atopal]

Comment 82

•

14 years ago

sure

Trying "developer.mozilla.org"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13716
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;developer.mozilla.org.         IN      A

;; ANSWER SECTION:
developer.mozilla.org.  517     IN      CNAME   developer-mozilla-org.geo.mozilla.com.
developer-mozilla-org.geo.mozilla.com. 3517 IN CNAME devmo.glb.mozilla.net.
devmo.glb.mozilla.net.  97      IN      A       63.245.209.139

Received 141 bytes from 192.168.178.1#53 in 30 ms
Trying "devmo.glb.mozilla.net"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59184
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;devmo.glb.mozilla.net.         IN      AAAA

;; AUTHORITY SECTION:
glb.mozilla.net.        300     IN      SOA     ns.mozilla.org. sysadmins.mozilla.org. 2010042100 10800 3600 604800 1800

Received 99 bytes from 192.168.178.1#53 in 43 ms
Trying "devmo.glb.mozilla.net"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13479
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;devmo.glb.mozilla.net.         IN      MX

;; AUTHORITY SECTION:
glb.mozilla.net.        300     IN      SOA     ns.mozilla.org. sysadmins.mozilla.org. 2010042100 10800 3600 604800 1800

matthew zeier [:mrz]

Comment 83

•

14 years ago

> developer-mozilla-org.geo.mozilla.com. 3517 IN CNAME devmo.glb.mozilla.net.
> devmo.glb.mozilla.net.  97      IN      A       63.245.209.139

209.139 is San Jose.  Something in your system is still returning the wrong address but DNS is working. 

Nothing in /etc/hosts right?

Kadir Topal [:atopal]

Comment 84

•

14 years ago

I still see this (63.245.213.88) and caches should've expired by now. And no, nothing in /etc/hosts.

matthew zeier [:mrz]

Comment 85

•

14 years ago

A bit of a disconnect - DNS is returning the correct results for you but you're not seeing the right address.  What tool are you using to get 213.88?

Kadir Topal [:atopal]

Comment 86

•

14 years ago

the ping command

David Tenser [:djst]

Comment 87

•

14 years ago

*Just* got a service unavailable error when posting a forum reply at https://support.mozilla.com/en-US/forum/3/656442. In Sweden, one minute ago.


Output:

Trying "developer.mozilla.org"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24996
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;developer.mozilla.org.		IN	A

;; ANSWER SECTION:
developer.mozilla.org.	159	IN	CNAME	developer-mozilla-org.geo.mozilla.com.
developer-mozilla-org.geo.mozilla.com. 2904 IN CNAME devmo.glb.mozilla.net.
devmo.glb.mozilla.net.	27	IN	A	63.245.209.139

;; AUTHORITY SECTION:
glb.mozilla.net.	294	IN	NS	ns4-glb.mozilla.net.
glb.mozilla.net.	294	IN	NS	ns1-glb.mozilla.net.

;; ADDITIONAL SECTION:
ns1-glb.mozilla.net.	167	IN	A	63.245.208.15
ns4-glb.mozilla.net.	167	IN	A	63.245.212.25

Received 217 bytes from 213.80.98.2#53 in 7 ms
Trying "devmo.glb.mozilla.net"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31179
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;devmo.glb.mozilla.net.		IN	AAAA

;; AUTHORITY SECTION:
glb.mozilla.net.	290	IN	SOA	ns.mozilla.org. sysadmins.mozilla.org. 2010042100 10800 3600 604800 1800

Received 99 bytes from 213.80.98.2#53 in 12 ms
Trying "devmo.glb.mozilla.net"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28539
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;devmo.glb.mozilla.net.		IN	MX

;; AUTHORITY SECTION:
glb.mozilla.net.	290	IN	SOA	ns.mozilla.org. sysadmins.mozilla.org. 2010042100 10800 3600 604800 1800

Received 99 bytes from 213.80.98.2#53 in 12 ms

matthew zeier [:mrz]

Comment 88

•

14 years ago

This seems more app related then - you're hitting San Jose on that last query.

Tobias (:Tobbi) Markus

Comment 89

•

14 years ago

I'm still hitting Amsterdam (I'm located in Northern Germany). See my traceroute paste here:

Tracing route to support.mozilla.com [63.245.213.89] over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  192.168.1.1
  2    41 ms    41 ms    40 ms  217.0.119.143
  3    43 ms    43 ms    41 ms  217.0.86.6
  4    44 ms    46 ms    45 ms  hh-eb1-i.HH.DE.NET.DTAG.DE [62.154.32.230]
  5    44 ms    43 ms    44 ms  so-7-1.car2.Hamburg1.Level3.net [4.68.127.241]
  6    44 ms    43 ms    43 ms  ae-11-11.car1.Hamburg1.Level3.net [4.69.133.177]
  7    50 ms    50 ms    50 ms  ae-4-4.ebr1.Dusseldorf1.Level3.net [4.69.133.182]
  8    50 ms    50 ms    49 ms  ae-1-100.ebr2.Dusseldorf1.Level3.net [4.69.141.150]
  9    53 ms    53 ms    53 ms  ae-47-47.ebr1.Amsterdam1.Level3.net [4.69.143.205]
 10    54 ms    53 ms    53 ms  ae-12-51.car2.Amsterdam1.Level3.net [4.69.139.131]
 11    54 ms    54 ms    55 ms  212.72.43.14
 12    54 ms    54 ms    54 ms  92.60.240.130
 13    54 ms    54 ms    54 ms  sumo02.zlb.nl.mozilla.com [63.245.213.89]

Trace complete.

matthew zeier [:mrz]

Comment 90

•

14 years ago

For the second time I've taken Amsterdam out of the GLB pool.  I don't know why it put itself back in but it's out now.

James Socol [:jsocol, :james]

Assignee

Comment 91

•

14 years ago

Everyone in Europe:

Please keep track of timeouts (they may appear as blank pages, "Service Unavailable" or "Server is not responding" errors) over the next few days (give the change a little window to propagate first). If the timeout rate is noticeably higher or lower, please let us know here.
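
One simple way to do the bookkeeping requested above is to note the outcome of each save or post attempt and compare the timeout rate before and after the change. This is only a sketch: the function name is hypothetical, the symptom labels are taken from this comment, and the sample lists are made-up placeholders, not real measurements.

```python
from collections import Counter

# Symptoms that comment 91 says should be counted as timeouts.
TIMEOUT_SYMPTOMS = {
    "blank page",
    "service unavailable",
    "server is not responding",
}


def timeout_rate(outcomes):
    """Fraction of attempts that ended in one of the timeout symptoms."""
    if not outcomes:
        return 0.0
    counts = Counter(outcome.strip().lower() for outcome in outcomes)
    timeouts = sum(counts[symptom] for symptom in TIMEOUT_SYMPTOMS)
    return timeouts / len(outcomes)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    before = ["ok", "Service Unavailable", "ok", "ok"]
    after = ["ok", "blank page", "ok", "ok", "ok"]
    print(f"before: {timeout_rate(before):.0%}")
    print(f"after:  {timeout_rate(after):.0%}")
```

Counting all three symptoms together matters here because, as noted later in the thread, the same underlying error can show up under a different message once Amsterdam is out of the pool.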

Kadir Topal [:atopal]

Comment 92

•

14 years ago

Okay, I don't see Europe in the traceroute any more, however SUMO is still giving me blank pages from time to time, but that's probably a different bug. Does anyone else still see "service unavailable" messages?

Pavel Cvrcek [:JasnaPaka]

Reporter

Comment 93

•

14 years ago

I worked with SUMO last week and I didn't see the "service unavailable" message, but many times I got a blank page (as Kadir said). My location: Czech Republic, Central Europe.

Boersenfeger

Comment 94

•

14 years ago

(In reply to comment #93)
I have had both issues since my first post in this bug. Every time I change an article on SUMO or make a translation and try to save it, this happens. When I look at the page that lists all articles, I often get this message: "Service Unavailable
The service is temporarily unavailable. Please try again later."

Thomas Schwecherl

Comment 95

•

14 years ago

I don't see the "service unavailable" message any more - but a blank page instead (on large pages) or the request to download the page (after an action, e.g. saving a review). Location: Upper Austria.

David Tenser [:djst]

Comment 96

•

14 years ago

Via James: "The error message changed and so it's been difficult to get an answer on whether it's better or not. The test is: do you see blank pages more or less often than you saw "Service Unavailable?" It's the same error, just without the AMS Zeus error message."

Thomas, that download dialog is bug 549961. We're trying to figure out what we can do about that. 

These two bugs are both results of the current Tiki-based SUMO not scaling to meet the increased load from more users, more contributors, and more KB articles. We're trying to devote most of our resources to building the new Django-based SUMO, but bug 549961 in particular is enough of a problem right now that we need to spend cycles on fixing it before continuing with the next-gen SUMO. Stay tuned, and thanks everyone for your patience and understanding.

Kadir Topal [:atopal]

Comment 97

•

14 years ago

We will increase the Apache timeout window on Thursday when SUMO is moved to its own hardware, so please report back if that fixes the blank page issues. The change is tracked in bug 569412.

Michele Rodaro [michro]

Comment 98

•

14 years ago

Hi from Italy,

I haven't gotten blank pages or the Save/Open file dialog (bug 549961) for the past two or three days when editing or approving articles.
I hope that this issue has been definitively fixed.

Boersenfeger

Comment 99

•

14 years ago

Over the last few days, I have read and changed nearly 20 articles for the German SUMO. Every time I click Save after a change, a white page appears; then I have to press F5 and click OK in the dialog that follows... It still happens for me. :-(

James Socol [:jsocol, :james]

Assignee

Updated

•

14 years ago

Status: REOPENED → RESOLVED

Closed: 14 years ago → 14 years ago

Resolution: --- → INCOMPLETE

Headers 15 years ago Pavel Cvrcek [:JasnaPaka] 5.65 KB, text/plain		Details
patch, moves sending wiki edit notifications to a gearman worker 14 years ago James Socol [:jsocol, :james] 15.26 KB, patch	paulc : review+
helpful testing patch 14 years ago James Socol [:jsocol, :james] 1.51 KB, patch
patch to patch 1, moves DB connecting/closing inside the function and adds debug output 14 years ago James Socol [:jsocol, :james] 8.16 KB, patch	laura : review+