Closed Bug 254632 Opened 21 years ago Closed 18 years ago

Delay addon releases until mirrors sync

Categories

(addons.mozilla.org Graveyard :: Public Pages, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Bugzilla-alanjstrBugs, Assigned: wenzel)

References

Details

Attachments

(3 files)

The mirrors take between 10 and 30 minutes to sync, depending on the mirror. Unfortunately, when we approve something, it appears on the site right away. We should set up some sort of queuing system that waits 30 minutes before posting to the front page. We can add a column to t_approvallog to indicate whether t_version has been updated or not. We should ask BMO how they handle this stuff.
BMO doesn't use the mirrors, IIRC. Personally, IMO, 10-30minutes delay compared to say, what could be, hours is not worth the time to set up such a convoluted sounding system to filter new records out. Keeping in mind, the site does not itself ever touch the approvallog to get info from there. That's record keeping for blame on the editor staff mostly. So having to touch the approvallog would be yet another join on already complex queries, which I'm not in favor of. I'd much rather work on reducing the delay involved (some of which is between rodan and ftp-stage, which'll be ironed out between Bug 252676 and Bug 254925) than try to be perfect and avoid the occasional 404, which if a mirror fails to rsync would happen anyway.
Status: NEW → RESOLVED
Closed: 21 years ago
Hardware: PC → All
Resolution: --- → WONTFIX
fwiw, we usually put bugzilla tarballs up about an hour before we update the links on the website, for just that reason. On the other hand, our releases are infrequent enough, they're all uploaded by hand anyway. (and so is the website update)
Component: Update → Developers
Product: mozilla.org → Update
Version: other → unspecified
*** Bug 298773 has been marked as a duplicate of this bug. ***
I think we should consider this. Many, many users are affected by this - I'd say it's probably our biggest complaint. It's very unprofessional - especially for a site of our size. Reopening, reassigning to default and changing to enhancement.
Severity: normal → enhancement
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: Links appear on front page before mirrors sync → Delay addon releases until mirrors sync
Assignee: bugtrap → nobody
Status: REOPENED → NEW
QA Contact: mozilla.update → developers
I thought the app already did this... did it get implemented on another bug? (I'm pretty sure it waits 30 minutes after approval before listing it). If we had an "ok to list" flag, it would be pretty easy to have a cron job periodically check all the mirrors for new approvals and flag them as displayable once they all have it...
(In reply to comment #5)
> I thought the app already did this... did it get implemented on another bug?
> (I'm pretty sure it waits 30 minutes after approval before listing it).
Confirming that this does NOT happen. I'm able to reproduce "not a valid install package" errors. I don't remember it ever being implemented. Maybe it was, maybe it wasn't, but it isn't now :)
Assignee: nobody → morgamic
Severity: enhancement → critical
Depends on: 314019
Target Milestone: --- → 2.1
*** Bug 299377 has been marked as a duplicate of this bug. ***
*** Bug 297712 has been marked as a duplicate of this bug. ***
Priority: -- → P1
*** Bug 357275 has been marked as a duplicate of this bug. ***
Assignee: morgamic → nobody
This is indeed a pain in the butt. Vlad wrote a script for me to check that the file was present and identical on all the mirrors in the rotation, which I'll attach here. Some sort of "waiting for mirrors" state might be useful (as would more frequent rsyncing by the mirrors) as a way to mitigate this.
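A minimal sketch of the "present and identical" check described in the comment above. It is not Vlad's script (that one is Bash and is attached to the bug); the function name, the SHA-256 choice, and the idea of passing already-fetched bytes are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Optional


def mirrors_out_of_sync(reference: bytes,
                        copies: Dict[str, Optional[bytes]]) -> List[str]:
    """Return the mirrors whose copy is missing or differs from the reference.

    `copies` maps a mirror name to the bytes it served, or None when the
    mirror did not have the file. Fetching is left to the caller (assumption).
    """
    expected = hashlib.sha256(reference).hexdigest()
    return sorted(
        name
        for name, data in copies.items()
        if data is None or hashlib.sha256(data).hexdigest() != expected
    )
```

An empty result would mean every mirror in the rotation serves the same bytes as the master copy.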
Severity: critical → major
Component: Developer Pages → Public Pages
Priority: P1 → --
Target Milestone: 2.1 → 3.0
In case it's helpful in pursuing this!
We will end up fixing this somewhat laterally, by just not using mirrors, yay.
Flags: blocking-remora-launch+
Yup, in AMOv3, this is fixed: Add-ons are not pushed into the mirror network anymore.
Status: NEW → RESOLVED
Closed: 21 years ago → 19 years ago
Resolution: --- → FIXED
We had to go back to storing public files on the mirrors. Our proposed solution (by justdave) is to serve the files from the webheads for the first 20 minutes (probably by datestatuschanged) and then redirect to the mirrors. Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 3.0 → 3.2
Just wanted to confirm as a problem, and, IMHO, a pretty serious one. I believe that not only AMO visitors will get the "Download Error" pop-up when trying to install, but also many existing users - they will see "Update available" notification and will get the error when trying to update. With popular extensions having a lot of users, this can actually happen to quite a lot of people over the 30 minute outage. I think something like this has been already suggested, but maybe this can be solved on an SQL query level - 'if the time difference between the approval time and current time is less than 40 minutes, serve the previous version on AMO'.
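The "serve the previous version" idea in the comment above could be sketched as follows. This is only an illustration in Python rather than the SQL the comment has in mind; the function name and the (version, approval time) tuple layout are hypothetical, and the 40-minute window is the figure suggested in the comment.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

GRACE = timedelta(minutes=40)  # window suggested in the comment above


def version_to_serve(versions: List[Tuple[str, datetime]],
                     now: datetime,
                     grace: timedelta = GRACE) -> Optional[str]:
    """Return the newest version approved at least `grace` ago.

    `versions` holds (version string, approval time) pairs in any order.
    Returns None when every version is still inside the grace window.
    """
    # Walk from the most recent approval backwards.
    for version, approved_at in sorted(versions, key=lambda v: v[1], reverse=True):
        if now - approved_at >= grace:
            return version
    return None
```

What to do for a brand-new add-on with no older version is left open here (the sketch returns None); the thread does not settle that case.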
Assignee: nobody → sancus
Status: REOPENED → NEW
Fred can you pick this up?
Assignee: sancus → fwenzel
Absolutely. I will go with Justin's suggestion of serving the file from the app cluster for a short period of time (30 minutes, maybe) and only after that switching to the mirror system. Sounds like a pretty straightforward thing to do.
Alright, this patch should do the trick: If the status was changed more than 30 minutes ago, we redirect to the releases mirror, otherwise, we serve the file straight up from the application (shaken, not stirred). Please review.
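The decision described in the comment above can be sketched like this. It is not the attached patch (attachment 286281 is the authoritative version); the function name and return values are hypothetical, and only the 30-minute threshold and the use of the status-change time come from the thread.

```python
from datetime import datetime, timedelta

MIRROR_DELAY = timedelta(minutes=30)  # the thread notes this is configurable


def download_action(date_status_changed: datetime,
                    now: datetime,
                    delay: timedelta = MIRROR_DELAY) -> str:
    """Decide how to answer a download request for a public file.

    Returns "redirect" once the status change is older than `delay`
    (mirrors are assumed to have the file by then), otherwise
    "serve_local" so the application serves the file itself.
    """
    if now - date_status_changed > delay:
        return "redirect"
    return "serve_local"
```

Because the check is purely time-based, it relies on the status-change timestamp being updated whenever a file goes public.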
Attachment #286281 - Flags: review?(morgamic)
1) Can you change bug status to Assigned? 2) Is 30 minutes enough?
1) yes, if that makes you feel better :) 2) I would assume so, considering the OP said 10-30, and Justin suggested 20. Reed, do you know how long mirror propagation will take from AMO? I assume there is a cronjob running and doing some sort of rsync? Can you comment on if 30 minutes will suffice for mirror propagation?
Status: NEW → ASSIGNED
(In reply to comment #20)
> Reed, do you know how long mirror propagation will take from AMO? I assume
> there is a cronjob running and doing some sort of rsync? Can you comment on if
> 30 minutes will suffice for mirror propagation?
I don't have those numbers available to me to give to you, but I bet somebody in IT can help you with that. CC'ing a few folks that can hopefully better answer your question.
The rsync between the netapp backing store that addons manages and stage.mozilla.org happens every 5 minutes. OSL and TDS (our primary relay mirrors) both pull from stage every 10 minutes. The other mirrors all *should* pull from them every 15 minutes. You can monitor how far mirrors are lagged from stage by looking at https://nagios.mozilla.org/ftplag/ (use guest/guest for the login). Clicking on a time will give you a graph of the last 24 hours. You can get the last week by changing "daily" to "weekly" in the URL to the graph.
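As a rough worked example of the intervals quoted above: if a file just misses each sync in turn, the waits add up. The sketch assumes transfers themselves take no time and that no sync run fails, so real lag can be higher; the dictionary labels are made up for readability.

```python
# Sync intervals in minutes, taken from the comment above.
SYNC_INTERVALS_MIN = {
    "netapp -> stage": 5,
    "stage -> OSL/TDS relays": 10,
    "relays -> other mirrors": 15,
}


def worst_case_lag_minutes(intervals: dict) -> int:
    """Sum the intervals: the lag when every hop is missed by a hair."""
    return sum(intervals.values())


print(worst_case_lag_minutes(SYNC_INTERVALS_MIN))  # 5 + 10 + 15 = 30
```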
> 2) I would assume so, considering the OP said 10-30, and Justin suggested 20.
I am the OP and I didn't have the hard facts. Dave - Thanks for the stats. It looks like there are spikes of 40 to 60 minutes sometimes (or worse for ftp.scarlet.be) for releases.m.o. I don't know why there's only one machine under ftp.m.o, but I've been out of the loop.
ftp.m.o is 2.2 TB now. Nobody else has the disk space to mirror it anymore. :)
It seems like 30 minutes is a pretty decent average, though there are peaks where it takes longer than 60 minutes. Can we live with these, or do we want to increase the time to 60 or even 90 minutes? Fwiw, the time is configurable.
I was pointing out on IRC that it would be pretty trivial to just have a flag in the database that says whether the file has made it to the mirrors or not and just have a cron job check all the files that don't have that flag set yet periodically, and set it as soon as they show up everywhere. I have a script already that does something similar that we've been using to check the mime types when people complain about a server sending the wrong mime type on a file. I was asked to upload it here. It turns out that this script is actually pretty similar to vlad's that's already posted here, except that mine actually looks up the IP addresses in the releases pool (making it forward-compatible) instead of using a static list. (It's also Perl instead of Bash)
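A minimal sketch of the approach described in the comment above: look up the addresses behind the pool name at run time, then report which pending files every address already serves so their flag can be set. This is not the attached Perl script; the function names and the injectable `has_file` probe are assumptions, and only `socket.getaddrinfo` is a real library call.

```python
import socket
from typing import Callable, Iterable, List


def pool_addresses(hostname: str, port: int = 80,
                   resolve=socket.getaddrinfo) -> List[str]:
    """Return the unique IP addresses currently behind a round-robin DNS name."""
    infos = resolve(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def files_now_on_all_mirrors(pending: Iterable[str],
                             addresses: Iterable[str],
                             has_file: Callable[[str, str], bool]) -> List[str]:
    """Return the pending paths that every address reports as present.

    `has_file(address, path)` is a hypothetical probe, for example an HTTP
    HEAD request. An empty address list never counts as "all mirrors have it".
    """
    addresses = list(addresses)
    return [
        path for path in pending
        if addresses and all(has_file(addr, path) for addr in addresses)
    ]
```

A cron job could call `files_now_on_all_mirrors` on the files whose flag is still unset and flip the flag for whatever it returns. Resolving the pool on every run is what keeps the check current when mirrors are added or removed.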
We have two options here:
1) don't even show the link until mirrors are available
2) show the link but use the app servers to serve the file until mirrors have it
When it comes to checking:
1) use an arbitrarily long time window (30-60 min)
2) use a script that checks individual releases.m.o mirrors for the file and flip a bit when all check out (just like sentry!)
So... my two cents:
1) We should show the link and use the app servers for the 30 min window... thought about load, and this is not a high risk
2) The sentry script is a good idea, but I am not sure if the added complexity (cron, db change, etc.) is worth it -- what is wrong with using the time window if that's simpler?
Comment on attachment 286281 [details] [diff] [review] patch delaying mirror redirect for 30 minutes I think this is acceptable as a stop-gap. Not sure if we want to implement a full sentry-like daemon just for this when we can just fallback on the local FS temporarily. Thoughts?
Attachment #286281 - Flags: review?(morgamic) → review+
My vote: let's see how much this helps before we invest in a sentry. Do we have nagios tracking of mirror latency for this stuff? Or stats on the download rates over the first 30/60/90 mins, to model the app-server load before we deploy this? (Are we using X-Send-File now to avoid php-copying of XPIs shipped from the app server?)
(In reply to comment #29)
> Do we have nagios tracking of mirror latency for this stuff?
Yes, and there's even graphs. See comment 22. And over the last 4 or 5 days, there's been several mirrors running 90 to 120 minutes frequently because of network issues of some sort.
(In reply to comment #30)
> And over the last 4 or 5 days, there's been several mirrors running 90 to 120
> minutes frequently because of network issues of some sort.
I wouldn't mind setting the delay time to 2 hours or so, then, if our app servers can handle it (which I think they can, since the vast majority will be served from the mirrors, and this only kicks in with updated add-ons).
Then AMO will get the initial hit when a popular add-on is updated - rather than distributing it on the mirrors. My logs indicate that the download numbers in the first few hours after an update are huge which is why I stopped linking to a copy on my own server while waiting for AMO to update. But maybe in terms of regular AMO traffic this really isn't too bad.
(In reply to comment #32)
> Then AMO will get the initial hit when a popular add-on is updated - rather
> than distributing it on the mirrors. My logs indicate that the download numbers
> in the first few hours after an update are huge which is why I stopped linking
> to a copy on my own server while waiting for AMO to update. But maybe in terms
> of regular AMO traffic this really isn't too bad.
Could delay updating the information that indicates to clients that the add-on has an update (can't remember what it's called - update.rdf?) Also delay showing the add-on in the Updated add-ons rss feed?
Comment on attachment 286281 [details] [diff] [review] patch delaying mirror redirect for 30 minutes Delay code is committed to SVN r7910.
Keywords: push-needed
These bugs were pushed last Thursday.
Keywords: push-needed
Fred, this doesn't work -- in prod it is still not there -- do you think the file redirects are still being cached?
Are you sure? Was there an add-on with recently published files that leads to a file-not-found error? I don't think they are still cached, and if anything, the delay time should be over for most of the redirects that could be cached anyway.
morgamic doesn't read his IMs! I remembered Fred saying that the admin panel doesn't change datestatuschanged when manually pushing a file public. Both of the instances we've had issues with this today were pushed via the admin panel.
Thanks for clarifying this, Justin! Is there a reason for us not to add this datestatuschanged update to the admin panel also?
We should, it will just require some additional logic of making sure that a specific status was changed before updating it, otherwise all the files would be changed every time the form is submitted, which would mess up sorting for other pages. Filed bug 404411
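The extra logic mentioned in the comment above might look roughly like this. It is a sketch, not the fix from bug 404411; the row is modeled as a plain dict, and the function name and the `status` key are assumptions (only `datestatuschanged` is named in the thread).

```python
from datetime import datetime


def apply_admin_update(file_row: dict, new_status: int, now: datetime) -> dict:
    """Return a copy of the row with the submitted status applied.

    `datestatuschanged` is touched only when the status really differs,
    so resubmitting the form unchanged leaves the timestamp (and any
    sorting based on it) alone.
    """
    updated = dict(file_row)
    if new_status != file_row["status"]:
        updated["status"] = new_status
        updated["datestatuschanged"] = now
    return updated
```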
Depends on: 404411
With the closing of bug 404411 (and its pending push), the mirror release delay will be functional. Closing FIXED. If problems appear, please file a new bug. Thanks.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago → 18 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard