Closed Bug 532498 Opened 15 years ago Closed 14 years ago

Error message "Service Unavailable" with every my edit

Categories

(support.mozilla.org :: Knowledge Base Software, task, P1)

Tracking

(Not tracked)

RESOLVED INCOMPLETE
Future

People

(Reporter: JasnaPaka, Assigned: jsocol)

Attachments

(4 files)

Every time I try to save an edit on SUMO, I get this error page:

"Service Unavailable
The service is temporarily unavailable. Please try again later." 

When I resubmit the POST data from this error page, everything works fine. Chris Ilias suggested that the problem may be related to the Amsterdam server, because he doesn't see it and I'm in Europe.

This problem was discussed in Contributors forum:
https://support.mozilla.com/en-US/forum/3/513605
Pavel, can you get and attach (as a plain text file) Live HTTP Headers [1] output of just these submission attempts?

[1] https://addons.mozilla.org/en-US/firefox/addon/3829

Thanks!
IT please check this out.

http://mozilla-uk.org/headers
Assignee: nobody → server-ops
Component: General → Server Operations
Product: support.mozilla.com → mozilla.org
QA Contact: general → mrz
Version: unspecified → other
Severity: normal → blocker
per jsocol's request. http://mozilla-uk.org/service.JPG
Assignee: server-ops → dmoore
I think Derek's fixed this. TMZ confirmed over IRC.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attached file Headers
Looks like it still doesn't work for me. See my headers.
Assignee: dmoore → pcvrcek
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This appears to be an intermittent issue. Three people in the EU cannot reproduce it. Fox2mike, could you look at this?
Test article on prod where this can be tested without editing live docs: https://support.mozilla.com/en-US/kb/tmztest?bl=n
re-assign to server-ops if you need IT attention.
Assignee: pcvrcek → server-ops
Assignee: server-ops → dmoore
I believe we have isolated the issue related to SUMO timeouts in Amsterdam. Any client request (particularly edits) which took more than 10 seconds to complete could trigger a service timeout for other users of the site. We've removed the monitor which controlled this behavior and we've slightly expanded the monitoring timeouts in general.
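
For readers following along, here is a minimal sketch of the failure mode described above: one request slower than the monitor's threshold takes the backend out of rotation, so later requests from other users get "Service Unavailable". This is a simplified model under stated assumptions, not the actual monitor logic used in Amsterdam. The names `Backend`, `PassiveMonitor`, `route`, `simulate` and `sumo-origin` are hypothetical, and no recovery step is modeled.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True


@dataclass
class PassiveMonitor:
    """Hypothetical monitor: one slow response marks the backend unhealthy."""
    timeout_s: float = 10.0

    def observe(self, backend: Backend, response_time_s: float) -> None:
        if response_time_s > self.timeout_s:
            backend.healthy = False


def route(backends):
    """Return the first healthy backend, or None if there is none left."""
    for backend in backends:
        if backend.healthy:
            return backend
    return None


def simulate(response_times, timeout_s=10.0):
    """Feed a sequence of response times (seconds) through the model."""
    backend = Backend("sumo-origin")  # hypothetical backend name
    monitor = PassiveMonitor(timeout_s)
    outcomes = []
    for seconds in response_times:
        target = route([backend])
        if target is None:
            # No healthy backend: the user sees "Service Unavailable".
            outcomes.append("503")
            continue
        monitor.observe(target, seconds)
        outcomes.append("ok" if seconds <= timeout_s else "timeout")
    return outcomes
```

With this model, `simulate([1, 12, 2, 3])` gives `["ok", "timeout", "503", "503"]`: the 12-second edit times out, and the two fast requests that follow fail as well, even though they would have completed quickly.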

I'll leave this bug open for now, as I'd like for everyone to confirm their issues have been resolved.
At this moment all works fine for me. Thanks!
Just had the error editing the Italian version of ((How to make Firefox the default browser)).
Thanks, Simone. I've investigated and confirmed the problem. We'll continue to work on it.
Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; it; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5

Just had the error when I approved an edit I've made to the Italian version of ((Firefox consumes a lot of CPU resources)).

It happens even when I try to connect to the "all Knowledge Base articles" page
(after a refresh I can view the page).

Michele
The situation has been unchanged for several weeks:
"Service Unavailable" error every(!) time I approve or save an article, and most times when I try to open https://support.mozilla.com/kb/all+Knowledge+Base+articles or https://support.mozilla.com/kb/Localization Dashboard

Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
I can reproduce, just saw this error when I opened this page:
https://support.mozilla.com/en-US/kb/All%20Knowledge%20Base%20articles
Not editing anything.
Thanks for the input, everyone. We're still tracking problems with pages which take a significant amount of time to load (such as the All Articles link, above).
The problem occurs in SUMO on every page that I translated or changed in the last week, and there were a lot of them. I live in Germany, so maybe the server in Amsterdam is responsible here!? Sorry, my English is bad. Have a nice Christmas, all.
I get it pretty regularly (sl l10n, located in Munich). It has no negative effects, as far as I am concerned - to me, "it's just another process timing out on you".
Vito, I dumbed down the health check more.  Want to let this bake a bit before rolling this to other sites.
Whiteboard: [baking]
I'm going to call this baked and close it.  We'll use this same strategy for other sites.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
We're getting reports of this still happening. Has this been rolled out?
Yes, long time ago.  Still seeing it with sumo?
I >>NEVER<< can move the SUMO material from the staging area to the knowledge base without seeing this. The process finishes OK, I just need to refresh the page. 

It works for me, it's just a drag.
Nothing has changed here. See my first post, comment 17!
I see the "Service Unavailable
The service is temporarily unavailable. Please try again later." message every time I write or change a page on SUMO. I have to send the changed information again, then it works.
Same here (Italy). It happens about 9 times out of 10.

Thought you're still working on this.
> Thought you're still working on this.

After comment 19 we let it sit for a week, didn't hear any complaints and assumed the issue was resolved.  No one's been working on it since, largely because this bug was marked resolved.

I'll re-open and we'll re-investigate this week.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Everyone,

We've made more changes to improve the proxy stability between Amsterdam and San Jose. We have also migrated some prior experimental changes on non-SSL sumo to all SSL-enabled sites.

Please update here if you are still experiencing problems. It is particularly helpful if you include the URL where you encounter a timeout and the timestamp of when it occurred.
Hello,

This URL

https://support.mozilla.com/it/kb/all+Knowledge+Base+articles

always gives me "Connection reset"
Just had this error when editing the article ((*Eliminare i cookie))
This page https://support.mozilla.com/de/kb/Article+list?style_mode=inproduct says
Service Unavailable

The service is temporarily unavailable. Please try again later.
Thank you for the specific feedback, everyone. There are several problems here, both on the San Jose and the Amsterdam side, and the specific examples are very useful for tracking them down.
I'm getting the problem *every* time I edit, or try to approve a change. It's so frustrating.

Anyway, the slowness of the site is painful.
Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; it; rv:1.9.2) Gecko/20100115 Firefox/3.6

Always the same error.
Just had "Service Unavailable" saving the edits to the articles

https://support.mozilla.com/it/kb/*Improvvisa+apertura+di+molte+schede+o+finestre+in+Firefox

https://support.mozilla.com/it/kb/*Errore+nel+caricamento+di+un+sito+web

and every time I try to connect to https://support.mozilla.com/it/kb/all+Knowledge+Base+articles
Same here, getting this problem every time I edit (from Sweden). What's the status of this bug? What does [baking] mean?
This was happening to me three hours ago, grrr....
I can confirm comment 28, loading https://support.mozilla.com/it/kb/all+Knowledge+Base+articles always times out.

The other steps to reproduce require editing an article. Every time I save an edit, it times out.
https://support.mozilla.com/de/kb/*Firefox+als+Standardbrowser+festlegen+funktioniert+nicht?bl=n
and others; the problem persists.
Is it necessary to commit every article again when this problem occurs? Every article I have worked on shows this!
(Sorry for my bad English.)
Assignee: dmoore → jeremy.orem+bugs
oremj is going to poke at this today.  If we can't find a quick fix we'll turn down the GLB service and investigate more without impacting you.
Let's see if moving to zeus magically fixes this.
(In reply to comment #41)
The problem is not fixed. I changed this article https://support.mozilla.com/de/kb/*Lesezeichen+zum+Internet+Explorer+exportieren?bl=n and it happened again.
We haven't moved it yet. I'll comment when it has been moved.
Looks like moving to zeus didn't fix the problem.
Found the root of this problem! Anytime a page is created it sends out a ton of e-mails and zeus eventually just sees this as timing out.

If I hit the webhead directly it doesn't time out, but my current page create is up to 240 seconds with no end in sight. This also explains why we don't see the issue in stage, it doesn't send e-mail.

I'll reassign to support and it will be up to them to figure out how to send e-mail faster (most likely needs to be asynchronous).
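
A small sketch of why users see an error even though the edit is saved, assuming the front-end proxy simply gives up waiting while the webhead keeps working. The function names and the timings are made up for illustration; nothing here is Zeus or Tiki code.

```python
import concurrent.futures
import time


def slow_page_save(log, seconds):
    """Stand-in for the webhead: save the page, then send mail inline."""
    time.sleep(seconds)  # simulated time spent sending notification mail
    log.append("saved")
    return "200 OK"


def proxy_request(seconds, proxy_timeout):
    """Return what the client sees plus what the backend actually did."""
    log = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_page_save, log, seconds)
        try:
            client_sees = future.result(timeout=proxy_timeout)
        except concurrent.futures.TimeoutError:
            # The proxy stops waiting, but the backend request keeps running.
            client_sees = "503 Service Unavailable"
    # Leaving the with-block waits for the backend work to finish.
    return client_sees, log
```

Calling `proxy_request(0.2, 0.05)` returns the 503 to the client while the log still ends up containing `"saved"`, which matches what people report here: the error page appears, yet a reload shows the edit went through.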
Assignee: jeremy.orem+bugs → nobody
Component: Server Operations → Knowledge Base Software
Product: mozilla.org → support.mozilla.com
QA Contact: mrz → kb-software
Version: other → unspecified
A quick workaround would be to turn off sending e-mails on page create until this is fixed.
re "Looks like moving to zeus didn't fix the problem." 

- it did not make it worse either. 

smo
We can't turn off those emails.  Ways to fix:

- Flush user output and then send the emails (hiding problems from the user)
- Move the emails out of the process, sample code here
http://gearman.org/index.php?id=php_-_mail_queue
but requires gearman.  Jeremy, shall I file a separate bug for IT for gearman for SUMO?
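
To illustrate the second option (moving the emails out of the request), here is a minimal sketch that uses Python's standard `queue` and `threading` modules as a stand-in for Gearman. It is not the Gearman API and not SUMO code; `send_email`, `notification_worker`, `save_page` and `demo` are hypothetical names, and the "mail" is just appended to a list.

```python
import queue
import threading


def send_email(recipient, page, outbox):
    """Stand-in for a slow mail call; here it only records the message."""
    outbox.append((recipient, page))


def notification_worker(jobs, outbox):
    """Process queued jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            page, recipients = job
            for recipient in recipients:
                send_email(recipient, page, outbox)
        finally:
            jobs.task_done()


def save_page(page, watchers, jobs):
    """Handle the edit and return at once; mail is queued, not sent inline."""
    # ... the page itself would be persisted here ...
    jobs.put((page, list(watchers)))
    return "200 OK"


def demo():
    jobs = queue.Queue()
    outbox = []
    worker = threading.Thread(
        target=notification_worker, args=(jobs, outbox), daemon=True
    )
    worker.start()
    status = save_page("tmztest", ["a@example.com", "b@example.com"], jobs)
    jobs.put(None)  # tell the worker to stop after the queued job
    jobs.join()     # wait until every queued item has been processed
    return status, outbox
```

The point of the design is that `save_page` returns before any mail goes out, so the response time no longer depends on how many people watch the page. A real job server adds what this sketch lacks: a queue that lives outside the web process.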
If IT is OK with it, it feels like Gearman is the way to go. Assigning to me, 1.5.3, Major. Any objections?
Assignee: nobody → james
Severity: blocker → critical
Priority: -- → P1
Target Milestone: --- → 1.5.3
(In reply to comment #50)
> Any objections?

Only cheerful excitement!
I'm okay with gearman, but I've heard rumors of the amo team switching to celery (http://ask.github.com/celery/). Should probably talk to them first.
(In reply to comment #52)
> I'm okay with gearman, but I've heard rumors of the amo team switching to
> celery (http://ask.github.com/celery/). Should probably talk to them first.

I asked in #amo and no one said they were moving. Also, I couldn't find PHP APIs (maybe I'm looking under the wrong names?) where Gearman has published APIs in a bunch of languages, including the PECL extension.
Sorry this took so long. I spent a long time trying to follow what Tiki does when sending these notifications, and I'm fairly sure I understand why it's so slow. However, I couldn't duplicate that functionality in a reasonable amount of time without just letting Tiki do it itself.

So, this patch basically does a copy-paste of the contents of sendWikiEmailNotifications() (in webroot/lib/notification/notificationemaillib.php) and puts it in a worker (in scripts/gearman/notification.php). To make this work, the worker needs to cd to webroot and include tiki-setup.php, which means that the worker needs a full checkout of SUMO to work. (I apologize for that but after a week in the rabbit hole, I needed to solve this.)

This adds a new configuration option to webroot/db/local.php.dist, namely $gearman_servers, an array. Each member of $gearman_servers is an array with 'host' and 'port' vars.

To run the worker, make sure gearmand is running, then just type
  php notification.php -d

The worker will start a daemon (the '-d' bit) and write its PID to scripts/gearman/etc/notification.pid.

Now, edit a page (I'll attach something in a second that helps not send mail). The email should all still get sent. If you kill the worker process, and edit a page, the notifications will get sent out the next time you start the worker.

This bit of code is shockingly deep. I can understand how it would slow down the response so much. If Jeremy was correct that this is the pain point, this patch will go a long way toward alleviating that.
Attachment #434416 - Flags: review?(paulc)
This helped me test the notifications on real data. Shutting down the mail services might also work, if you can find a way to clear the queue before restarting them. (I couldn't, frustratingly. Still working on that.)

All this does is patch webroot/lib/webmail/htmlMimeMail.php (which at least _some_ things use to do their sending) to write out some information to the file /tmp/mailq instead of sending mail. It let me test with real data.
Comment on attachment 434416 [details] [diff] [review]
patch, moves sending wiki edit notifications to a gearman worker

Looks good. It was a lot easier than I expected. I did a diff of the sendWikiEmailNotification() function contents just to make sure all that deep code stayed the same and that looks fine too.

And I saw useful output in /tmp/mailq, so your test file helped a lot! ;)


One thing I wasn't able to test is resuming. Perhaps I missed a step? Here's what I did:
* after seeing expected output in /tmp/mailq (in other words, things were working), I killed the worker and made another edit to the page.
* started the worker with |php notification.php -d| as before
* checked /tmp/mailq for an update

The update wasn't there. Maybe I'm missing something? An explicit "resume"? I'm cool with having another look at this if so.

However, emails get sent out of the main thread so for that purpose, r+.
Attachment #434416 - Flags: review?(paulc) → review+
r64841. Still talking to fox2mike in bug 551513 about getting it running.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 14 years ago
Resolution: --- → FIXED
Whiteboard: [baking]
I'm getting this problem while adding a new translation every time with:

https://support.mozilla.com/tiki-edit_translation.php?locale=es&page=Firefox%20crashes%20when%20you%20exit%20it

And I don't think that this could be cause by mailing since this not involves any mail…
Depends on: 551513
Ran into a problem that I'd missed locally: occasionally instead of sending notifications, the worker will spew a bunch of HTML to the terminal and then die.

My theory is that the database server is closing the connection and the client is not trying to reopen it before sending queries. I need to verify this tomorrow but it's by far my best lead.

Assuming that's the case, what needs to happen: the DB connection needs to be made in the task function (see webroot/db/tiki-db.php for what happens). This probably means creating and destroying a tikilib object in the task as well. Quite possibly it also means putting that DB connection object into the global scope, and then creating all the objects that extend TikiLib. Joy.
Blocks: 555003
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This patch does a lot to the normal environment set up by tiki-setup.php:

1. Destroys a number of DB connection-dependent globals immediately after they're created.
2. Connects to the DB and recreates the globals when it needs to (when it receives a job).
3. Destroys the globals and disconnects again after the job is done.
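
The connect-per-job pattern in steps 1 to 3 can be sketched as follows. This is an illustration only: it uses Python's `sqlite3` instead of the Tiki DB layer, and the `watchers` table, `handle_job` and `demo` are hypothetical. The idea it shows is that a long-running worker holds no connection while idle, so a server-side idle timeout cannot leave it with a dead handle.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def handle_job(db_path, page):
    """Connect when the job arrives, query, and disconnect before returning."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT email FROM watchers WHERE page = ? ORDER BY email",
            (page,),
        ).fetchall()
    return [email for (email,) in rows]


def demo():
    """Build a throwaway database and run two jobs back to back."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        with closing(sqlite3.connect(db_path)) as conn:
            conn.execute("CREATE TABLE watchers (page TEXT, email TEXT)")
            conn.executemany(
                "INSERT INTO watchers VALUES (?, ?)",
                [
                    ("tmztest", "b@example.com"),
                    ("tmztest", "a@example.com"),
                    ("other", "c@example.com"),
                ],
            )
            conn.commit()
        # Each call opens and closes its own connection.
        return [handle_job(db_path, "tmztest") for _ in range(2)]
```

The cost is one connection setup per job, which is usually negligible next to sending a batch of mail.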

It also adds a constant (DAEMON) that is TRUE if the script was run with the -d flag. It then uses that constant to control printing debug statements to the terminal (it only does if DAEMON is FALSE). The debugging statements are prettied-up for easy reading.
Attachment #436085 - Flags: review?(paulc)
Attachment #436085 - Flags: review?(paulc) → review?(laura)
Comment on attachment 436085 [details] [diff] [review]
patch to patch 1, moves DB connecting/closing inside the function and adds debug output

Code itself works (easier than I thought, too, yay!)...my only general comment is consider putting the debug info into an error log if you're running in DAEMON mode, as this may help to debug problems in prod down the track.
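
A rough sketch of that suggestion, assuming a `-d` flag like the worker's: debug output goes to the terminal in the foreground and to a log file in daemon mode. It is written with Python's `argparse` and `logging` rather than the PHP worker's code, and the `--log` option, the default `worker.log` path and the logger name are invented for the example.

```python
import argparse
import logging
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Sketch of a notification worker")
    parser.add_argument("-d", "--daemon", action="store_true",
                        help="run detached and log to a file")
    parser.add_argument("--log", default="worker.log",
                        help="hypothetical log file path")
    return parser.parse_args(argv)


def build_logger(daemon, log_path):
    """Log to a file in daemon mode, to the terminal otherwise."""
    logger = logging.getLogger("notification-worker")
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    if daemon:
        # delay=True: the file is only opened when the first record is written.
        handler = logging.FileHandler(log_path, delay=True)
    else:
        handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    args = parse_args(sys.argv[1:])
    log = build_logger(args.daemon, args.log)
    log.debug("worker started (daemon=%s)", args.daemon)
```

Either way the same debug statements are emitted; only the destination changes, so problems in prod leave a trace without anyone watching a terminal.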
Attachment #436085 - Flags: review?(laura) → review+
(In reply to comment #61)
> (From update of attachment 436085 [details] [diff] [review])
> Code itself works (easier than I thought, too, yay!)...my only general comment
> is consider putting the debug info into an error log if you're running in
> DAEMON mode, as this may help to debug problems in prod down the track.

That's a good idea, but I'm going to hold off for now as I don't know the best place/way to write out logs, and Shyam's not here today.

r65138.

Pinging the on-call to reboot the worker.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Backed out in r65201.

I've spent a week or so on this, and while I've made progress, I'm also heading down the rabbit hole, which is a bad way to allocate our resources right now.

This has been a problem for a while. It will probably be a problem for a little while longer, but it will be fixed in Kitsune, and it's better for us to focus on that now.

I'm not WONTFIXing this, but it also can't block 1.5.3 any more.
Severity: critical → major
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 1.5.3 → Future
No longer blocks: 555003
I don't want to sound arrogant, and maybe you don't see this in the US, but I think this problem is too big in the European datacenter to leave it for Kitsune to fix (how long until we can see that in production, BTW?).

I'm getting reports from people who try to help with SUMO translations, and all of them hit this issue while saving. And they lose their work while translating.

In my community only a few of us know about this problem and, since SUMO by nature is a wiki, we can't reach everyone who wants to help. I think a lot of people have lost time and effort while saving because of this bug.
We realize that this is a huge pain, especially for localizers. You should know that we didn't just back this out of 1.5.3 because it was frustrating or we didn't feel like working on it. We did it because it was basically eating up all of our resources and preventing other work from getting done, and it looked like it would continue to do so. Especially if, like comment 58 indicates, our approach wouldn't actually solve everything.

SUMOdev is a two-person team right now. If one of us is completely dedicated to one, the other person can't get code reviewed, check in, get feedback, etc--at least not quickly enough for it to be helpful. While that's fine for a day or so, this was eating up all of my time for a week with no end in sight. Essentially every day spent on this bug delays Kitsune by a day right now.

Ultimately, it was a decision of resource allocation and how to best serve the SUMO project and community as a whole. While fixing this bug would be a good use of our time, there are higher priorities right now.

For an overview of the rough plan for the next 6-9 months, you can look here:
* https://wiki.mozilla.org/Support/Kitsune_Milestones
* https://wiki.mozilla.org/Support/SUMOdev_Meeting_Notepad/2010_Q1#Mar_30.2C_2010

I'll try to write up a blog post about what we've been doing--I know it hasn't been highly visible (though check http://support-stage-new.mozilla.com/en/search hopefully sometime tomorrow) or seem very important--it's just search results. But it's actually hugely important in laying the foundations for the new platform.
As for me, "... this is a huge pain, especially for localizers ..." does not hold. I mean, as long as I can assume that "it's not a bug, it's a feature" (g), I'm doing fine (press "reload" and continue). Unfortunately, it seems every new member of SUMO is bound to hit this pothole sooner or later...

Goes without saying, that your work is much appreciated.

regards

smo
Sure, you don't have enough manpower. But new translators who lost their work are not amused and have walked away from translating on SUMO. Is that better? I have never lost my text to this error. Maybe you can warn translators that they should save their text before they click on send! Have a nice Easter weekend, all.
As far as I know, the submit still works and no data is lost.  It just seems that way because the server gets so caught up processing the request that it throws an error message.  Would it help if we put a notice above the submit button saying that you may see an error, but that it's just because the server is working on your request and you don't have to resubmit anything?
There ?may? be a situation, leading to loss of data, namely (just a scenario of what happened to me some time ago) that you get a fresh file, localize it and send it off >>without making a staging copy<< (i.e. translated to a beginner's mindset >>knowing<< there's something like a staging and a production copy). 

Checking for that would make sense in any case and probably keep some feathers unruffled. 

Re msgbox on timeout, something like "...in case you get XYZ message, press reload to continue" would make sense - it would indicate the problem is known, and help the user climb over it.
Yes, I'd also suggest we inform users that we know about the problem and provide them with a workaround. Generally I'd also say that this problem is unacceptable, but given the situation, trying to fix it would mean spending resources on a dying piece of software. Instead we should try to bear with it just a little bit longer until this part of the KB is replaced by Django code as well.

I know this isn't really satisfying and being from Europe I hate that as much as you, but the alternative is even worse :/
We currently have 1.5 people working on SUMO web development (James x 1, Paul x 0.5) and James spent approximately a full week trying to nail this down (see comment 63). We can't justify having James or Paul spend another week or more going around in circles and further delaying the development of SUMO 2.0.

mrz: Is there *any* way this problem can at least be reduced with some sort of IT quickfix -- e.g. more DB mirrors, more RAM, faster servers, etc? If so, we really should look into that now so we can improve the situation.

This really hurts the usability of SUMO for anyone editing pages, and seems to be happening mostly in Europe (although I'm not 100% sure about that).
Just wanted to add that it's not only editing anymore, I also got the same message when I used the search and when I replied to a forum thread, again from Europe.
> mrz: Is there *any* way this problem can at least be reduced with some sort of
> IT quickfix -- e.g. more DB mirrors, more RAM, faster servers, etc? If so, we
> really should look into that now so we can improve the situation.
> 
> This really hurts the usability of SUMO for anyone editing pages, and seems to
> be happening mostly in Europe (although I'm not 100% sure about that).

Wasn't aware this was still an issue!

It has to do with the proxy setup we have in Amsterdam.  US-based hits aren't having any problems.  The only quick fix is to disable the Amsterdam proxy and force everyone to come to the US.

Should we do that?
Could we try that to see if it's a net positive? Would this only affect logged in sessions? If so, it seems like the right thing to do (given my lack of insights about the potential downsides of such a change).
This will affect all users.  

I took Amsterdam out of the GLB pool.  Let me know if there are still issues (and if there are, please let me know what IP address your computer is getting for support.mozilla.com).
(In reply to comment #75)
> This will affect all users.  
> 
> I took Amsterdam out of the GLB pool.  Let me know if there are still issues
> (and if there are, please let me know what IP address your computer is getting
> for support.mozilla.com).

There are definitely still issues with Service Unavailable:

support.mozilla.com has the IP 63.245.213.89 for me.
you probably have to wait for dns ttls to expire.
(In reply to comment #58)
> I'm getting this problem while adding a new translation every time with:
> 
> https://support.mozilla.com/tiki-edit_translation.php?locale=es&page=Firefox%20crashes%20when%20you%20exit%20it
> 
> And I don't think that this could be cause by mailing since this not involves
> any mail…

I think this is bug 549961.
Still seeing those Amsterdam IPs (63.245.213.0/24)?
What I get is: 63.245.213.89 not sure if that's Amsterdam as well
Anything in 63.245.213.0/24 is.  Are you on a unix-like machine?  Can you send me the output of "host -v developer.mozilla.org" ?
sure

Trying "developer.mozilla.org"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13716
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;developer.mozilla.org.         IN      A

;; ANSWER SECTION:
developer.mozilla.org.  517     IN      CNAME   developer-mozilla-org.geo.mozilla.com.
developer-mozilla-org.geo.mozilla.com. 3517 IN CNAME devmo.glb.mozilla.net.
devmo.glb.mozilla.net.  97      IN      A       63.245.209.139

Received 141 bytes from 192.168.178.1#53 in 30 ms
Trying "devmo.glb.mozilla.net"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59184
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;devmo.glb.mozilla.net.         IN      AAAA

;; AUTHORITY SECTION:
glb.mozilla.net.        300     IN      SOA     ns.mozilla.org. sysadmins.mozilla.org. 2010042100 10800 3600 604800 1800

Received 99 bytes from 192.168.178.1#53 in 43 ms
Trying "devmo.glb.mozilla.net"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13479
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;devmo.glb.mozilla.net.         IN      MX

;; AUTHORITY SECTION:
glb.mozilla.net.        300     IN      SOA     ns.mozilla.org. sysadmins.mozilla.org. 2010042100 10800 3600 604800 1800
> developer-mozilla-org.geo.mozilla.com. 3517 IN CNAME devmo.glb.mozilla.net.
> devmo.glb.mozilla.net.  97      IN      A       63.245.209.139

209.139 is San Jose.  Something in your system is still returning the wrong address but DNS is working. 

Nothing in /etc/hosts right?
I still see this (63.245.213.88) and caches should've expired by now. And no, nothing in /etc/hosts.
A bit of a disconnect - DNS is returning the correct results for you but you're not seeing the right address.  What tool are you using to get 213.88?
the ping command
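
One way to compare what an application sees with what a direct DNS query returns is sketched below. It assumes the 63.245.213.0/24 range quoted earlier in this bug really is the Amsterdam block; the function names are hypothetical. `socket.getaddrinfo` goes through the operating system's resolver (hosts file, local caches), which is the path tools like ping and the browser take, whereas `host` queries the nameserver itself. That difference may explain the mismatch, but it is only a guess.

```python
import ipaddress
import socket

# Range quoted in this bug as the Amsterdam addresses.
AMSTERDAM_NET = ipaddress.ip_network("63.245.213.0/24")


def is_amsterdam(address):
    """True if the IPv4 address falls inside the quoted Amsterdam range."""
    return ipaddress.ip_address(address) in AMSTERDAM_NET


def resolve_like_an_application(hostname):
    """Resolve via the OS resolver, the same path ping and browsers use."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def report(hostname):
    """Print each resolved address with the site it appears to belong to."""
    for address in resolve_like_an_application(hostname):
        site = "Amsterdam" if is_amsterdam(address) else "not Amsterdam"
        print(f"{hostname} -> {address} ({site})")
```

Running `report("support.mozilla.com")` on an affected machine and pasting the output alongside the `host -v` result for the same hostname would show whether the two lookups disagree.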
*Just* got a service unavailable error when posting a forum reply at https://support.mozilla.com/en-US/forum/3/656442. In Sweden, one minute ago.


Output:

Trying "developer.mozilla.org"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24996
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;developer.mozilla.org.		IN	A

;; ANSWER SECTION:
developer.mozilla.org.	159	IN	CNAME	developer-mozilla-org.geo.mozilla.com.
developer-mozilla-org.geo.mozilla.com. 2904 IN CNAME devmo.glb.mozilla.net.
devmo.glb.mozilla.net.	27	IN	A	63.245.209.139

;; AUTHORITY SECTION:
glb.mozilla.net.	294	IN	NS	ns4-glb.mozilla.net.
glb.mozilla.net.	294	IN	NS	ns1-glb.mozilla.net.

;; ADDITIONAL SECTION:
ns1-glb.mozilla.net.	167	IN	A	63.245.208.15
ns4-glb.mozilla.net.	167	IN	A	63.245.212.25

Received 217 bytes from 213.80.98.2#53 in 7 ms
Trying "devmo.glb.mozilla.net"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31179
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;devmo.glb.mozilla.net.		IN	AAAA

;; AUTHORITY SECTION:
glb.mozilla.net.	290	IN	SOA	ns.mozilla.org. sysadmins.mozilla.org. 2010042100 10800 3600 604800 1800

Received 99 bytes from 213.80.98.2#53 in 12 ms
Trying "devmo.glb.mozilla.net"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28539
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;devmo.glb.mozilla.net.		IN	MX

;; AUTHORITY SECTION:
glb.mozilla.net.	290	IN	SOA	ns.mozilla.org. sysadmins.mozilla.org. 2010042100 10800 3600 604800 1800

Received 99 bytes from 213.80.98.2#53 in 12 ms
This seems more app related then - you're hitting San Jose on that last query.
I'm still hitting Amsterdam (I'm located in Northern Germany). See my traceroute paste here:

Tracing route to support.mozilla.com [63.245.213.89] over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  192.168.1.1
  2    41 ms    41 ms    40 ms  217.0.119.143
  3    43 ms    43 ms    41 ms  217.0.86.6
  4    44 ms    46 ms    45 ms  hh-eb1-i.HH.DE.NET.DTAG.DE [62.154.32.230]
  5    44 ms    43 ms    44 ms  so-7-1.car2.Hamburg1.Level3.net [4.68.127.241]
  6    44 ms    43 ms    43 ms  ae-11-11.car1.Hamburg1.Level3.net [4.69.133.177]
  7    50 ms    50 ms    50 ms  ae-4-4.ebr1.Dusseldorf1.Level3.net [4.69.133.182]
  8    50 ms    50 ms    49 ms  ae-1-100.ebr2.Dusseldorf1.Level3.net [4.69.141.150]
  9    53 ms    53 ms    53 ms  ae-47-47.ebr1.Amsterdam1.Level3.net [4.69.143.205]
 10    54 ms    53 ms    53 ms  ae-12-51.car2.Amsterdam1.Level3.net [4.69.139.131]
 11    54 ms    54 ms    55 ms  212.72.43.14
 12    54 ms    54 ms    54 ms  92.60.240.130
 13    54 ms    54 ms    54 ms  sumo02.zlb.nl.mozilla.com [63.245.213.89]

Trace complete.
For the second time I've taken Amsterdam out of the GLB pool.  I don't know why it put itself back in but it's out now.
Everyone in Europe:

Please keep track of timeouts (they may appear as blank pages, "Service Unavailable" or "Server is not responding" errors) over the next few days (give the change a little window to propagate first). If the timeout rate is noticeably higher or lower, please let us know here.
Okay, I don't see Europe in the traceroute any more, however SUMO is still giving me blank pages from time to time, but that's probably a different bug. Does anyone else still see "service unavailable" messages?
I worked with SUMO last week and I didn't see the "service unavailable" message, but many times I got a blank page (as Kadir said). My location: Czech Republic, Central Europe.
(In reply to comment #93)
I have had both issues since my first post in this bug tracker. Every time I change an article on SUMO or make a translation and try to save it, this happens. When I open the page that lists all articles, I often get this message: "Service Unavailable
The service is temporarily unavailable. Please try again later."
I don't see the "service unavailable" message any more - but a blank page instead (on large pages) or the request to download the page (after an action, e.g. saving a review). Location: Upper Austria.
Via James: "The error message changed and so it's been difficult to get an answer on whether it's better or not. The test is: do you see blank pages more or less often than you saw "Service Unavailable?" It's the same error, just without the AMS Zeus error message."

Thomas, that download dialog is bug 549961. We're trying to figure out what we can do about that. 

These two bugs are both results of the current Tiki-based SUMO not scaling to meet the increased load from more users, more contributors, and more KB articles. We're trying to strike a balance: most of our resources go to building the new Django-based SUMO, but bug 549961 especially is enough of a problem right now that we need to spend cycles on fixing it before continuing with the next-gen SUMO. Stay tuned, and thanks everyone for your patience and understanding.
We will increase the Apache timeout window on Thursday when SUMO is moved to its own hardware, so please report back if that fixes the blank page issues. The change is tracked in bug 569412.
Hi from Italy,

I haven't had blank pages or the Save/Open file dialog (bug 549961) for the last two or three days when editing or approving articles.
I hope that this issue has been fixed for good.
In the last few days, I read and changed nearly 20 articles for the German SUMO. Every time I click Save on a change, a white page appears, then I have to press F5 and say OK to the question that follows... It still happens for me. :-(
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → INCOMPLETE