Closed Bug 453475 Opened 16 years ago Closed 15 years ago

Set up redirects for archived pages

Categories

(www.mozilla.org :: General, defect)

Type: defect
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fantasai.bugs, Assigned: reed)

References

Details

(Keywords: helpwanted)

Attachments

(1 file)

We should set up automatic redirects for every archived page once the archive has a disclaimer up. The process should be:

  1. Manually set up a redirect for pages where there's an up-to-date
     alternative (e.g. on MDC).
  2. Manually set up a 410 for pages that really should have been deleted.
     Maybe include a not-very-obvious link to the archive on the 410 page.
  3. Script redirects to the archive site for everything else that has been
     moved to the archive.
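The three steps above amount to a lookup order for any path that no longer exists on the main site. A minimal sketch of that order, in Python purely for illustration: every path and host name below is a hypothetical placeholder, and the choice of a 301 for the archive redirect is an assumption, not something decided in this bug.

```python
from typing import Optional, Tuple

# Hypothetical example data; the real lists would come from the site itself.
MANUAL_REDIRECTS = {"/docs/old-guide.html": "https://docs.example.org/new-guide"}
GONE = {"/scratch/obsolete.html"}
ARCHIVED = {"/projects/old-project/index.html"}
ARCHIVE_BASE = "https://archive.example.org"  # placeholder archive host


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target or None) for a requested path."""
    if path in MANUAL_REDIRECTS:   # step 1: an up-to-date alternative exists
        return 301, MANUAL_REDIRECTS[path]
    if path in GONE:               # step 2: page should have been deleted
        return 410, None
    if path in ARCHIVED:           # step 3: everything else that was archived
        return 301, ARCHIVE_BASE + path
    return 404, None               # never existed, or not archived
```

Checking the manual lists first means a hand-picked replacement always wins over the scripted archive redirect.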
OS: Linux → All
Hardware: PC → All
This script:
  http://fantasai.inkedblade.net/temp/moz/missing.txt (1.4MB)
will redirect pages if they are in the archive, and return a 404 if not. It needs the 404 template I'm attaching right now (which should replace the current 404.html). To set up the script, someone needs to make it executable and edit the appropriate configs to use this script instead of the 404.html page.
Something like adding
  ErrorDocument 404 /missing.pl
to the top-level .htaccess
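For readers who want the shape of such a handler: below is a rough sketch, in Python rather than Perl, and not the actual missing.pl. It assumes the script runs as a CGI ErrorDocument handler, in which case Apache passes the originally requested path in the REDIRECT_URL environment variable. The archive host, the archived.txt list file, and the 301 status are hypothetical choices made for illustration.

```python
import os
import sys

ARCHIVE_BASE = "https://archive.example.org"  # placeholder archive host


def load_archived(list_path):
    """Read one archived path per line; a missing file yields an empty set."""
    try:
        with open(list_path, encoding="utf-8") as fh:
            return {line.strip() for line in fh if line.strip()}
    except OSError:
        return set()


def respond(requested, archived, template_path="404.html"):
    """Build the raw CGI response (headers, blank line, body) as a string."""
    if requested in archived:
        return ("Status: 301 Moved Permanently\n"
                "Location: " + ARCHIVE_BASE + requested + "\n\n")
    try:
        with open(template_path, encoding="utf-8") as fh:
            body = fh.read()
    except OSError:
        body = "<h1>404 Not Found</h1>"  # fallback if the template is absent
    # Explicit Content-Type so the error page is rendered as HTML.
    return ("Status: 404 Not Found\n"
            "Content-Type: text/html; charset=utf-8\n\n" + body)


if __name__ == "__main__":
    # Apache sets REDIRECT_URL to the path that triggered the ErrorDocument.
    requested = os.environ.get("REDIRECT_URL", "")
    sys.stdout.write(respond(requested, load_archived("archived.txt")))
```

The script has to be executable and the directory has to allow CGI execution for the ErrorDocument line to do anything useful.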
Assignee: nobody → fantasai.bugs
Status: NEW → ASSIGNED
Ok, the banner is up on the archive site. Assigning to reed: please set this up or reassign to an appropriate sysadmin. :)
Assignee: fantasai.bugs → reed
You need to get all this checked in. I'm not deploying anything that isn't in a version-controlled repository, as it makes management a nightmare. You should set +x on the file as you check it in, and CVS will preserve that. You can probably do all this yourself... is there anything you actually need a sysadmin to set up?
Assignee: reed → fantasai.bugs
I don't know. I wasn't sure about the executable bits. I'll try it tomorrow and let you know if there's a problem.
All right reed, back to you. Make 
  http://www.mozilla.org/missing.pl
execute. Because I have no idea how.
Assignee: fantasai.bugs → reed
Would it be possible to put that script somewhere else?  I'd like to try to keep things out of the top-level of the site as much as possible.
Where do you want it put? 404 pages /are/ a top-level sort of thing, but I don't think it matters much in this case since nobody's supposed to point /at/ the script.

I want it put wherever it'll actually execute. :)
Other scripts seem to be in the tools directory.  Maybe putting it there will fix the issue with making it execute?  This is just a guess though, so I'll step aside to let Reed tell us what to do :)
The tools directory is for scripts that are used to build the directory structure that is served by Apache.  Things that need to live in the directory structure that is served by Apache should be elsewhere.  I'm not sure which this is (I sort of hope it's the former, but suspect it's the latter).
Blocks: 460423
Blocks: 455696
Blocks: 455042
Blocks: 462905
Blocks: 329296
Blocks: 254203
Blocks: 458865
Marking this as a blocker since archiving has almost entirely stopped without this in place.  I'm interested in figuring out how to get this resolved.  If this is still open by the next www.mozilla.org planning meeting, I'd like to make it the top agenda item there.
Severity: normal → blocker
Keywords: helpwanted
I'm only starting on this so please forgive my ignorance if what I say is wrong.

It looks like fantasai's script does the right thing. However, since it requires a mapping from old URLs to new ones, it would be nice if the mapping were self-updating. (1)
From the few bits I know, there are plans for switching the mozilla.org website to PHP includes. (2)

Given the above two, one of each page's files could include information that defines which old URL should redirect to it. This would cost very little in performance but make maintenance much easier (the mapping is automatic).

It would also be good to implement a SELECT box of current valid pages to go to, and a logger/counter in the 404 file. Ideally:
* when a user lands on this 404 page, they can use the SELECT to request that the URL they tried should redirect to a certain existing page. Someone with the right privileges gets notified by email about this selection, and gets to approve or deny it.
* the counter keeps track of which URLs land on the 404, and how many times so we can focus on the "most missed" pages.
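The counter part is simple to prototype. A minimal sketch in Python, assuming the 404 handler appends each missed URL to some log that can be read back as a list of strings (that log is hypothetical; nothing like it exists in the current script):

```python
from collections import Counter


def most_missed(missed_urls, top=10):
    """Return the `top` most frequently missed URLs as (url, count) pairs."""
    return Counter(missed_urls).most_common(top)


# Example with made-up paths:
sample_log = ["/a.html", "/b.html", "/a.html", "/c.html", "/a.html", "/b.html"]
print(most_missed(sample_log, top=2))  # [('/a.html', 3), ('/b.html', 2)]
```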

Unfortunately, I know no Perl at all. So I would be able to do this in PHP, or fantasai or someone else can help implement it in Perl :)
Depends on: 487011
That's totally overkill. We have a dated archived copy of the website. It's not intended to ever change. If we hit a 404, and there's a copy of the file on that website, we redirect there. Otherwise we don't and return a 404. If there's a "better" page to direct to, we don't do it with a 404 script handler, we do it with a 301 redirect in the .htaccess. The script doesn't need improvement, it needs to be set up so that it *executes*.
I like the idea of making the 404 page a little more useful by allowing people to more easily provide feedback and allowing us to gather stats about the most missed pages.  As fantasai mentioned though, if a page exists there but not on the main site we should just go ahead and send somewhere there.  For a better 404 page in general though, we could add a new bug.

As for being able to do things in PHP vs. Perl, that should be an option on the new site.  Not sure if that's something we can do now or not.
I should proof-read things better before submitting...

I meant to say 'if a page exists on the archive server'.
(In reply to comment #13)
I'm suggesting we make 404 pages more useful. I forgot to mention that the page should also include search results.
Perhaps my initial implementation idea wasn't quite optimal, but the aims are:
1. Allow the community to decide on new redirects that aren't set up yet -- do some of the work for us, we're overloaded I'm sure.
2. Keep the redirections updated, perhaps by having a separate script that goes through the existing URLs and creates a mapping dynamically. I honestly think that, after a while, we will forget about updating the text file with the mapping, or at least it would be nice to not have to do so, but just run a script.
3. Show search results containing the terms entered in the URL. If we have clean and friendly URLs, then page links should contain useful words that we could search on.
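As an illustration of point 3, here is one possible way to pull search terms out of a requested URL. This is only a sketch in Python; the stop-word list is an arbitrary assumption, and how the terms would be fed to a site search is left open.

```python
import re
from urllib.parse import unquote, urlsplit

# Words that carry no search value in a path (an arbitrary starter list).
STOP_WORDS = {"html", "htm", "index", "www"}


def search_terms(url):
    """Split the path of `url` into lowercase words usable as search terms."""
    path = unquote(urlsplit(url).path)
    words = re.split(r"[^A-Za-z0-9]+", path)
    return [w.lower() for w in words if w and w.lower() not in STOP_WORDS]


print(search_terms("/projects/calendar/release-notes.html"))
# ['projects', 'calendar', 'release', 'notes']
```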

Since getting to the 404 page means the user is lost, I think performance is not the primary focus, so we could sacrifice a little there, so that the user finds his/her way back to the right pages more easily.
Yay, it's alive!
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Reed, that's awesome!  Very cool news.  I'll get you a free lunch when I'm in the office tomorrow :)
This might have caused bug 492726, serving the 404 page as plain text.
Blocks: 492726
Depends on: 501414
Component: www.mozilla.org → General
Product: Websites → www.mozilla.org