Closed Bug 412798 Opened 17 years ago Closed 17 years ago

MDC Japan access to developer.mozilla.org blocked again.

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: potappo, Assigned: oremj)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; ja; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11
Build Identifier: 

MDC Japan's access to developer.mozilla.org is blocked again. Previous occurrence: Bug 392468.
MDC Japan hosts a bot program that supports translating new documents in MDC, but it currently cannot work.
http://mdc.mozilla.gr.jp/bot/entoja/enja.cgi?showlang=en


Reproducible: Always

Steps to Reproduce:
1. Run the bot (submit from http://mdc.mozilla.gr.jp/bot/entoja/enja.cgi).
Actual Results:  
The response is an error.

Expected Results:  
The bot works.
Assignee: nobody → server-ops
Component: Administration → Server Operations
Product: Mozilla Developer Center → mozilla.org
QA Contact: administration → justin
Version: unspecified → other
Removing critical-ness, because we don't need to be paging people for this.

potappo: is the bot still throttled to every 10 sec?
Severity: critical → normal
I remember Jeremy blocking someone spidering MDC a few days back, since it was killing the cluster.  CCing him on the bug.  What IP does this bot come from?
Assignee: server-ops → aravind
Yes, I did.  The bot hitting us wasn't rate limited, so it ended up taking down all three boxes.
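
For reference, the kind of client-side rate limiting discussed here could look roughly like the sketch below. This is not the bot's actual code; the class name `Throttle` and its parameters are hypothetical, and the 10-second figure is only the interval mentioned in this bug.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests.

    Hypothetical helper for illustration; not taken from the MDC Japan bot.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the behaviour can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A bot would create one `Throttle(10)` and call `wait()` before every request, so that no two requests are sent less than 10 seconds apart regardless of how quickly each response comes back.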
Is the bot doing something that keeps its requests from being serviced out of the netscaler every time?  Is it hitting a lot of pages that nobody else hits?

oremj: if you can let us know (maybe by a post to dev-mdc) when you have to block someone, it would be great; MDC admins tend to be the first line of contact for people suffering from such blocks, and I don't think we can look them up anywhere at present.
I'm pretty sure the bot is just hitting a lot of pages that aren't in cache.  Yes, I can either e-mail a list or put the IPs on a wiki page somewhere.
Can we just start a low-frequency crawl of the site, such that everything gets into the cache and stays there until it's changed?  I'm pretty sure that's the state other large-scale wikis aim for, and it means that for non-logged-in users the load on the systems is related to the rate of change, and not the rate of access.

(If the bot's logged in, then we should make it not be unless it needs to edit, for sure.)
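
To illustrate the point about load following the rate of change rather than the rate of access, here is a toy model. It is only a sketch under simplifying assumptions: `PageCache` and the `render` callback are hypothetical names, and a real deployment would rely on the front-end cache rather than an in-process dictionary.

```python
class PageCache:
    """Toy cache in front of an expensive page renderer (illustrative only)."""

    def __init__(self, render):
        self._render = render   # stands in for the expensive backend render
        self._source = {}       # title -> wiki text
        self._cache = {}        # title -> rendered output
        self.render_count = 0   # how many times the backend was hit

    def edit(self, title, text):
        """Store new text and invalidate the cached rendering."""
        self._source[title] = text
        self._cache.pop(title, None)

    def get(self, title):
        """Return the rendered page, rendering only on a cache miss."""
        if title not in self._cache:
            self.render_count += 1
            self._cache[title] = self._render(self._source[title])
        return self._cache[title]
```

In this model, once every page has been fetched once (for example by a slow crawl), further reads never reach the renderer; `render_count` grows only when a page is edited and then read again.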
# Please add me to the CC list > potappo.

(In reply to comment #1)
> potappo: is the bot still throttled to every 10 sec?

I haven't modified our code since then. So, YES.

(In reply to comment #5)
> I'm pretty sure the bot is just hitting a lot of pages that aren't in cache. 
> Yes, I can either e-mail a list or put the ips on a wiki page somewhere.

To reduce the effect on the server, our main bot (page update log DB) queries with action=raw. With this option, MediaWiki returns the raw page data (as stored in the DB) rather than rendered HTML.
I've commented out the cron job for the log DB v.2 updater, so the request rate from our rat1.mozilla.gr.jp should go down, I think.
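
For readers unfamiliar with action=raw, a request of that kind can be built roughly as follows. This is a sketch, not the bot's code: the function name `raw_page_url` and the example base URL are assumptions, and the actual path to index.php on a given wiki may differ.

```python
from urllib.parse import urlencode


def raw_page_url(index_url, title):
    """Build a MediaWiki URL that asks for the unrendered wikitext of a page.

    index_url is the wiki's index.php entry point (assumed, site-specific).
    """
    query = urlencode({"title": title, "action": "raw"})
    return f"{index_url}?{query}"


# Example with a placeholder host, not a real MDC address.
print(raw_page_url("https://wiki.example.org/index.php", "Main Page"))
```

Because the server returns the stored wikitext without parsing it into HTML, such a request should be cheaper than fetching the normal page view.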
Assignee: aravind → oremj
Status: UNCONFIRMED → NEW
Ever confirmed: true
Block removed.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard