Closed
Bug 254632
Opened 21 years ago
Closed 18 years ago
Delay addon releases until mirrors sync
Categories
(addons.mozilla.org Graveyard :: Public Pages, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
3.2
People
(Reporter: Bugzilla-alanjstrBugs, Assigned: wenzel)
References
Details
Attachments
(3 files)
The mirrors take between 10 and 30 minutes to sync, depending on the mirror.
Unfortunately, when we approve something, it appears on the site right away. We
should set up some sort of queuing system that waits 30 minutes before posting to
the front page. We can add a column to t_approvallog to indicate whether
t_version has been updated or not.
We should ask BMO how they handle this stuff.
Comment 1•21 years ago
BMO doesn't use the mirrors, IIRC. Personally, IMO, a 10-30 minute delay (compared
to what could be hours) is not worth the time it would take to set up such a
convoluted-sounding system to filter new records out. Keep in mind that the site does not
itself ever touch the approvallog to get info from there. That's record keeping
for blame on the editor staff mostly. So having to touch the approvallog would
be yet another join on already complex queries, which I'm not in favor of.
I'd much rather work on reducing the delay involved (some of which is between
rodan and ftp-stage, which'll be ironed out between Bug 252676 and Bug 254925)
than try to be perfect and avoid the occasional 404, which if a mirror fails to
rsync would happen anyway.
Status: NEW → RESOLVED
Closed: 21 years ago
Hardware: PC → All
Resolution: --- → WONTFIX
Comment 2•21 years ago
fwiw, we usually put bugzilla tarballs up about an hour before we update the
links on the website, for just that reason. On the other hand, our releases are
infrequent enough, they're all uploaded by hand anyway. (and so is the website
update)
Updated•21 years ago
Component: Update → Developers
Product: mozilla.org → Update
Version: other → unspecified
*** Bug 298773 has been marked as a duplicate of this bug. ***
I think we should consider this. Many, many users are affected by this - I'd say it's probably our biggest complaint. It's very unprofessional - especially for a site of our size.
Reopening, reassigning to default and changing to enhancement.
Severity: normal → enhancement
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: Links appear on front page before mirrors sync → Delay addon releases until mirrors sync
Assignee: bugtrap → nobody
Status: REOPENED → NEW
QA Contact: mozilla.update → developers
Comment 5•19 years ago
I thought the app already did this... did it get implemented on another bug? (I'm pretty sure it waits 30 minutes after approval before listing it).
If we had an "ok to list" flag, it would be pretty easy to have a cron job periodically check all the mirrors for new approvals and flag them as displayable once they all have it...
(In reply to comment #5)
> I thought the app already did this... did it get implemented on another bug?
> (I'm pretty sure it waits 30 minutes after approval before listing it).
>
Confirming that this does NOT happen. I'm able to reproduce "not a valid install package" errors.
I don't remember it ever being implemented. Maybe it was, maybe it wasn't, but it isn't now :)
Updated•19 years ago
Assignee: nobody → morgamic
Severity: enhancement → critical
Depends on: 314019
Target Milestone: --- → 2.1
*** Bug 299377 has been marked as a duplicate of this bug. ***
*** Bug 297712 has been marked as a duplicate of this bug. ***
Comment 9•19 years ago
*** Bug 357275 has been marked as a duplicate of this bug. ***
Updated•19 years ago
Assignee: morgamic → nobody
Comment 10•19 years ago
This is indeed a pain in the butt. Vlad wrote a script for me to check that the file was present and identical on all the mirrors in the rotation, which I'll attach here. Some sort of "waiting for mirrors" state might be useful (as would more frequent rsyncing by the mirrors) as a way to mitigate this.
Severity: critical → major
Component: Developer Pages → Public Pages
Priority: P1 → --
Target Milestone: 2.1 → 3.0
Comment 11•19 years ago
In case it's helpful in pursuing this!
Comment 12•19 years ago
We will end up fixing this somewhat laterally, by just not using mirrors, yay.
Flags: blocking-remora-launch+
| Assignee | ||
Comment 13•19 years ago
Yup, in AMOv3, this is fixed: Add-ons are not pushed into the mirror network anymore.
Status: NEW → RESOLVED
Closed: 21 years ago → 19 years ago
Resolution: --- → FIXED
Comment 14•19 years ago
We had to go back to storing public files on the mirrors. Our proposed solution (by justdave) is to serve the files from the webheads for the first 20 minutes (probably by datestatuschanged) and then redirect to the mirrors.
Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•18 years ago
Target Milestone: 3.0 → 3.2
Comment 15•18 years ago
Just wanted to confirm as a problem, and, IMHO, a pretty serious one.
I believe that not only AMO visitors will get the "Download Error" pop-up when trying to install, but also many existing users - they will see "Update available" notification and will get the error when trying to update. With popular extensions having a lot of users, this can actually happen to quite a lot of people over the 30 minute outage.
I think something like this has been already suggested, but maybe this can be solved on an SQL query level - 'if the time difference between the approval time and current time is less than 40 minutes, serve the previous version on AMO'.
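The query-level idea in this comment can be illustrated in Python rather than SQL. This is only a sketch under assumptions: `version_to_serve`, the `(name, approved_at)` tuples and the 40-minute `hold` default are hypothetical names chosen for illustration and are not taken from the AMO code or schema.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def version_to_serve(
    versions: Iterable[Tuple[str, datetime]],
    now: datetime,
    hold: timedelta = timedelta(minutes=40),  # assumed window from the comment
) -> Optional[str]:
    """Return the newest version approved at least `hold` ago, or None."""
    eligible = [
        (approved_at, name)
        for name, approved_at in versions
        if now - approved_at >= hold
    ]
    if not eligible:
        return None
    # max() compares approval times first, so the newest eligible version wins.
    return max(eligible)[1]
```

A version approved less than `hold` ago is simply skipped, so the previous version keeps being offered until the window has passed.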
Updated•18 years ago
Assignee: nobody → sancus
Status: REOPENED → NEW
| Assignee | ||
Comment 17•18 years ago
Absolutely.
I will go with Justin's suggestion of serving the file from the app cluster for a short period of time (30 minutes, maybe) and only after that switching to the mirror system. Sounds like a pretty straightforward thing to do.
| Assignee | ||
Comment 18•18 years ago
Alright, this patch should do the trick: If the status was changed more than 30 minutes ago, we redirect to the releases mirror, otherwise, we serve the file straight up from the application (shaken, not stirred).
Please review.
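For readers without access to the attachment, the decision described here might look roughly like the following. This is a hedged Python sketch, not the attached patch: `download_source`, `MIRROR_DELAY` and the `"app"`/`"mirror"` return values are hypothetical names, and only the 30-minute threshold and the use of the status-change time come from the thread.

```python
from datetime import datetime, timedelta

# Assumed default; the thread notes that the delay is configurable.
MIRROR_DELAY = timedelta(minutes=30)


def download_source(
    date_status_changed: datetime,
    now: datetime,
    delay: timedelta = MIRROR_DELAY,
) -> str:
    """Pick where a file download should be served from.

    Returns "mirror" if the status changed more than `delay` ago,
    otherwise "app" (serve the file directly from the application).
    """
    if now - date_status_changed > delay:
        return "mirror"
    return "app"
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the rule easy to test with fixed timestamps.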
Attachment #286281 -
Flags: review?(morgamic)
| Reporter | ||
Comment 19•18 years ago
1) Can you change bug status to Assigned?
2) Is 30 minutes enough?
| Assignee | ||
Comment 20•18 years ago
1) yes, if that makes you feel better :)
2) I would assume so, considering the OP said 10-30, and Justin suggested 20.
Reed, do you know how long mirror propagation will take from AMO? I assume there is a cronjob running and doing some sort of rsync? Can you comment on if 30 minutes will suffice for mirror propagation?
Status: NEW → ASSIGNED
Comment 21•18 years ago
(In reply to comment #20)
> Reed, do you know how long mirror propagation will take from AMO? I assume
> there is a cronjob running and doing some sort of rsync? Can you comment on if
> 30 minutes will suffice for mirror propagation?
I don't have those numbers available to me to give to you, but I bet somebody in IT can help you with that. CC'ing a few folks that can hopefully better answer your question.
Comment 22•18 years ago
The rsync between the netapp backing store that addons manages and stage.mozilla.org happens every 5 minutes. OSL and TDS (our primary relay mirrors) both pull from stage every 10 minutes. The other mirrors all *should* pull from them every 15 minutes. You can monitor how far mirrors are lagged from stage by looking at https://nagios.mozilla.org/ftplag/ (use guest/guest for the login). Clicking on a time will give you a graph of the last 24 hours. You can get the last week by changing "daily" to "weekly" in the URL to the graph.
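As a rough back-of-the-envelope check, the three pull intervals above can be added up to estimate the scheduled worst case. The sketch below assumes that each hop has just missed the previous one and so waits a full cycle, and it ignores transfer time and mirrors that fall behind schedule; the dictionary keys are descriptive labels, not real hostnames.

```python
# Pull intervals in minutes, taken from the comment above.
SYNC_INTERVALS_MINUTES = {
    "netapp -> stage": 5,
    "stage -> OSL/TDS relays": 10,
    "relays -> other mirrors": 15,
}


def worst_case_lag_minutes(intervals: dict) -> int:
    """Sum the intervals: each hop may have to wait one full cycle."""
    return sum(intervals.values())


print(worst_case_lag_minutes(SYNC_INTERVALS_MINUTES))  # 30
```

So 30 minutes is what the schedule alone allows for; any mirror that skips or delays a pull will exceed it.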
| Reporter | ||
Comment 23•18 years ago
> 2) I would assume so, considering the OP said 10-30, and Justin suggested 20.
I am the OP and I didn't have the hard facts.
Dave -
Thanks for the stats. It looks like there are spikes of 40 to 60 minutes sometimes (or worse for ftp.scarlet.be) for releases.m.o. I don't know why there's only one machine under ftp.m.o, but I've been out of the loop.
Comment 24•18 years ago
ftp.m.o is 2.2 TB now. Nobody else has the disk space to mirror it anymore. :)
| Assignee | ||
Comment 25•18 years ago
It seems like 30 minutes is a pretty decent average, though there are peaks where it takes longer than 60 minutes. Can we live with these, or do we want to increase the time to 60 or even 90 minutes?
Fwiw, the time is configurable.
Comment 26•18 years ago
I was pointing out on IRC that it would be pretty trivial to just have a flag in the database that says whether the file has made it to the mirrors or not and just have a cron job check all the files that don't have that flag set yet periodically, and set it as soon as they show up everywhere. I have a script already that does something similar that we've been using to check the mime types when people complain about a server sending the wrong mime type on a file. I was asked to upload it here. It turns out that this script is actually pretty similar to vlad's that's already posted here, except that mine actually looks up the IP addresses in the releases pool (making it forward-compatible) instead of using a static list. (It's also Perl instead of Bash)
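The attached scripts are Perl and Bash and are not reproduced here. Purely as an illustration of the approach described (look up the pool's addresses, ask each mirror for the file, flip the flag only when every mirror has it), a Python sketch could look like this. All function names are hypothetical, the pool hostname and file path are left as parameters, and plain HTTP on port 80 is an assumption.

```python
import http.client
import socket
from typing import Callable, Iterable, List


def resolve_pool(hostname: str) -> List[str]:
    """Look up every IPv4 address currently behind the pool hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def mirror_has_file(ip: str, hostname: str, path: str) -> bool:
    """Send a HEAD request to one mirror; True only on a 200 response."""
    conn = http.client.HTTPConnection(ip, 80, timeout=10)
    try:
        # Address the mirror by IP but keep the pool name in the Host header.
        conn.request("HEAD", path, headers={"Host": hostname})
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def all_mirrors_have_file(ips: Iterable[str], check: Callable[[str], bool]) -> bool:
    """True only if there is at least one mirror and every one passes `check`."""
    ips = list(ips)
    return bool(ips) and all(check(ip) for ip in ips)
```

A cron job could then call `all_mirrors_have_file(resolve_pool(host), lambda ip: mirror_has_file(ip, host, path))` for each unflagged file and set the database flag when it returns `True`. Treating an empty pool as "not ready" is a deliberate choice, so that a failed DNS lookup never marks a file as displayable.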
Comment 27•18 years ago
We have two options here:
1) don't even show the link until mirrors are available
2) show the link but use the app servers to serve the file until mirrors have it
When it comes to checking:
1) use an arbitrarily long time window (30-60 min)
2) use a script that checks individual releases.m.o mirrors for the file and flip a bit when all check out (just like sentry!)
So... my two cents:
1) We should show the link and use the app servers for the 30 min window... thought about load, and this is not a high risk
2) The sentry script is a good idea, but I am not sure if the added complexity (cron, db change, etc.) is worth it -- what is wrong with using the time window if that's simpler?
Comment 28•18 years ago
Comment on attachment 286281 [details] [diff] [review]
patch delaying mirror redirect for 30 minutes
I think this is acceptable as a stop-gap. Not sure if we want to implement a full sentry-like daemon just for this when we can just fall back on the local FS temporarily. Thoughts?
Attachment #286281 -
Flags: review?(morgamic) → review+
Comment 29•18 years ago
My vote: let's see how much this helps before we invest in a sentry. Do we have nagios tracking of mirror latency for this stuff? Or stats on the download rates over the first 30/60/90 mins, to model the app-server load before we deploy this?
(Are we using X-Send-File now to avoid php-copying of XPIs shipped from the app server?)
Comment 30•18 years ago
(In reply to comment #29)
> Do we have nagios tracking of mirror latency for this stuff?
Yes, and there's even graphs. See comment 22.
And over the last 4 or 5 days, there's been several mirrors running 90 to 120 minutes frequently because of network issues of some sort.
| Assignee | ||
Comment 31•18 years ago
(In reply to comment #30)
> And over the last 4 or 5 days, there's been several mirrors running 90 to 120
> minutes frequently because of network issues of some sort.
I wouldn't mind setting the delay time to 2 hours or so, then, if our app servers can handle it (which I think they can, since the vast majority will be served from the mirrors, and this only kicks in with updated add-ons).
Comment 32•18 years ago
Then AMO will get the initial hit when a popular add-on is updated - rather than distributing it on the mirrors. My logs indicate that the download numbers in the first few hours after an update are huge which is why I stopped linking to a copy on my own server while waiting for AMO to update. But maybe in terms of regular AMO traffic this really isn't too bad.
Comment 33•18 years ago
(In reply to comment #32)
> Then AMO will get the initial hit when a popular add-on is updated - rather
> than distributing it on the mirrors. My logs indicate that the download numbers
> in the first few hours after an update are huge which is why I stopped linking
> to a copy on my own server while waiting for AMO to update. But maybe in terms
> of regular AMO traffic this really isn't too bad.
>
Could delay updating the information that indicates to clients that the add-on has an update (can't remember what it's called - update.rdf?)
Also delay showing the add-on in the Updated add-ons rss feed?
| Assignee | ||
Comment 34•18 years ago
Comment on attachment 286281 [details] [diff] [review]
patch delaying mirror redirect for 30 minutes
Delay code is committed to SVN r7910.
| Assignee | ||
Updated•18 years ago
Keywords: push-needed
Comment 36•18 years ago
Fred, this doesn't work -- in prod it is still not there -- do you think the file redirects are still being cached?
| Assignee | ||
Comment 37•18 years ago
Are you sure? Was there an add-on with recently published files that leads to a file-not-found error? I don't think they are still cached, and if anything, the delay time should be over for most of the redirects that could be cached anyway.
Comment 38•18 years ago
morgamic doesn't read his IMs!
I remembered Fred saying that the admin panel doesn't change datestatuschanged when manually pushing a file public. Both of the instances we've had issues with this today were pushed via the admin panel.
| Assignee | ||
Comment 39•18 years ago
Thanks for clarifying this, Justin! Is there a reason for us not to add this datestatuschanged update to the admin panel also?
Comment 40•18 years ago
We should; it will just require some additional logic to make sure that a specific status was changed before updating it. Otherwise all the files would be changed every time the form is submitted, which would mess up sorting for other pages.
Filed bug 404411
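The extra logic mentioned in this comment amounts to comparing the old and new status before touching the timestamp. A minimal Python sketch of that guard follows; it assumes a file record represented as a dict with `status` and `datestatuschanged` keys, and `apply_admin_edit` is a hypothetical helper, not the admin panel's actual code.

```python
from datetime import datetime


def apply_admin_edit(file_row: dict, new_status: str, now: datetime) -> dict:
    """Return an updated copy of the row.

    datestatuschanged is bumped only when the status really changes, so
    resubmitting the form with unchanged values leaves sort order intact.
    """
    updated = dict(file_row)
    if new_status != file_row["status"]:
        updated["status"] = new_status
        updated["datestatuschanged"] = now
    return updated
```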
| Assignee | ||
Comment 41•18 years ago
With the closing of bug 404411 (and its pending push), the mirror release delay will be functional. Closing FIXED.
If problems appear, please file a new bug. Thanks.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago → 18 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: addons.mozilla.org → addons.mozilla.org Graveyard