there's a *lot* of code that, for better or worse, pulls files from mxr with 'raw=1' in the query-string.

when mxr urls are redirected to dxr the correct file loads (yay!) however it's dxr's html view, not the file source. this is likely to break things.

it would be nice if dxr supported 'raw' mode, and just returned the file contents as text/plain.

two common examples on github are:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
https://mxr.mozilla.org/mozilla/source/security/nss/lib/ckfw/builtins/certdata.txt?raw=1

certdata.txt is important because old versions of our cacert.pem pointed to mxr (eg. https://github.com/magichuihui/alipay-test/blob/c769bcc00aa6cc5d93f242a410a896f689795969/cacert.pem).
Along the lines of bug 1279952 comment 12, I think we should explicitly break this case - albeit probably by redirecting to a warning page rather than redirecting to a non-raw HTML version of the file.
(In reply to Ed Morley [:emorley] from comment #1)
> Along the lines of bug 1279952 comment 12, I think we should explicitly
> break this case - albeit probably by redirecting to a warning page rather
> than redirecting to a non-raw HTML version of the file.
> [quoting that comment]
> If we do set up redirects for old MXR links I *really really* think we
> should explicitly break any links that use `?raw=1`, so people can't
> silently unsafely use these resources over HTTP.
> (Whilst DXR is HTTPS-only and sets HSTS headers, the initial redirect can be MITMed).

i don't fully agree:
- if we're to prevent 'raw' for that reason, then we should only block http requests, not https requests
- a warning page isn't going to work when a raw url is consumed by code (eg. https://github.com/barseghyanartur/tld)

i think the redirect over in bug 1279952 should probably prevent http raw requests (using a 403 response), however https should be allowed and dxr should support these; either directly, or by redirecting to hgweb/etc.
Ok I agree a 403/404 is preferred to the warning page. I'd very much like us to not encourage people to misuse DXR as a CDN, so ideally 403 (or 404) for all `?raw=1` requests, including HTTPS. Though if that's not seen as acceptable, we should 301 redirect the `?raw=1` requests to hg.mozilla.org (perhaps just for those two common files, and let the rest 403/404).
(In reply to Byron Jones ‹:glob› from comment #0)
> there's a *lot* of code that, for better or worse, pulls files from mxr with
> 'raw=1' in the query-string.

I feel very strongly that we should actively break/discourage this behavior. The code indexing sites are meant to be used for searching and reading code, not distributing it. That's what hg.m.o is for!

> certdata.txt is important because old versions of our cacert.pem pointed to
> mxr (eg.
> https://github.com/magichuihui/alipay-test/blob/
> c769bcc00aa6cc5d93f242a410a896f689795969/cacert.pem).

And I feel that including an MXR url in a product like that was a gross mistake.

(In reply to Byron Jones ‹:glob› from comment #2)
> - a warning page isn't going to work when a raw url is consumed by code (eg.
> https://github.com/barseghyanartur/tld)

For some/many use cases that may be entirely correct, but we already have evidence that some people will investigate why things are broken and attempt to fix them. At some point we have to stop perpetuating past mistakes and let things break.
A possible solution for the MXR transition...

There are a few key files that many people use because the documentation in days past told them to. I propose we do an actual HTTP redirect to the canonical version stored in VCS for those files (skipping the interstitial page). We also put comments in those files listing the location of the canonical version of the file so that later archeology will show the correct location from which to retrieve the file. For the rest of the ?raw=1 files, we explicitly let them fail.

glob poked around all github/mozilla repos and found the following files referenced using ?raw=1. List follows, with his guesses at a canonical location.

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
→ https://publicsuffix.org/list/public_suffix_list.dat

http://mxr.mozilla.org/mozilla/source/security/nss/lib/ckfw/builtins/certdata.txt?raw=1
→ https://hg.mozilla.org/mozilla-central/raw-file/tip/security/nss/lib/ckfw/builtins/certdata.txt

http://mxr.mozilla.org/mozilla/source/extensions/spellcheck/locales/en-US/hunspell/README.txt?raw=1
→ http://hg.mozilla.org/mozilla-central/raw-file/tip/extensions/spellcheck/locales/en-US/hunspell/README_en_US.txt
(i don't think we should redirect this one)
(In reply to Kendall Libby [:fubar] from comment #4)
> I feel very strongly that we should actively break/discourage this behavior.
> The code indexing sites are meant to be used for searching and reading code,
> not distributing it. That's what hg.m.o is for!
>
> And I feel that including an MXR url in a product like that was a gross
> mistake.

i agree 100%.

(In reply to Amy Rich [:arr] [:arich] from comment #5)
> [redirect a few key files]
> For the rest of the ?raw=1 files, we explicitly let them fail.

that works for me - broader search results indicate that we only need to be concerned with effective_tld_names.dat and certdata.txt. other files are referenced, but with nowhere near the frequency of those two.

as far as i understand it, a redirect already exists for effective_tld_names.dat; morphing bug to add a certdata.txt redirect.
No longer blocks: 1097091
Summary: dxr doesn't support the raw=1 parameter → redirect mxr requests for security/nss/lib/ckfw/builtins/certdata.txt?raw=1 to hg.mozilla.org
Another +1 for Amy's solution. This keeps the icky stuff in the web server config and out of the more complicated DXR codebase. Incidentally, DXR does provide raw files but only for images (e.g. https://dxr.mozilla.org/mozilla-central/raw/mobile/android/tests/browser/chrome/tp5/twitter.com/a0.twimg.com/profile_images/316019228/326994260_1117936370_0_mini.jpeg), as necessary to fulfill requests for its <img> and <svg> tags.
I'm not sure that Zeus can order the rules, so I've combined the existing outage-redirect (ie hardhat) and the mxr-publicsuffix-301 rules and added a stanza for certdata.txt. If the URL path matches effective_tld_names.dat, it 301s to publicsuffix.org; if it matches certdata.txt, it 301s to hg; otherwise, 301 to hardhat. Also tweaked it so that it matches for mxr.m.o and lxr.m.o, JIC. Applied to both http://mxr and https://mxr and it all appears to work. Would appreciate another set of eyes.
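For readers unfamiliar with the load balancer rules, the decision order described in the comment above can be sketched roughly as follows. This is Python rather than the actual Zeus rule, which is not shown in this bug. The function name `choose_redirect` is made up for illustration, `HARDHAT_URL` is a hypothetical placeholder because the real outage page address is not given here, and the two other targets are the canonical locations guessed in comment 5.

```python
from urllib.parse import urlsplit

# Targets taken from comment 5 of this bug.
PUBLICSUFFIX_URL = "https://publicsuffix.org/list/public_suffix_list.dat"
CERTDATA_URL = (
    "https://hg.mozilla.org/mozilla-central/raw-file/tip/"
    "security/nss/lib/ckfw/builtins/certdata.txt"
)
# Hypothetical placeholder: the real hardhat (outage page) URL is not in this bug.
HARDHAT_URL = "https://hardhat.example/mxr-retired"

# The rule was applied to both mxr.m.o and lxr.m.o.
LEGACY_HOSTS = {"mxr.mozilla.org", "lxr.mozilla.org"}


def choose_redirect(url):
    """Return (status, location) for a legacy MXR/LXR URL, or None for other hosts."""
    parts = urlsplit(url)
    if parts.hostname not in LEGACY_HOSTS:
        return None
    # Substring match on the path, mirroring the loose matching noted in review.
    if "effective_tld_names.dat" in parts.path:
        return (301, PUBLICSUFFIX_URL)
    if "certdata.txt" in parts.path:
        return (301, CERTDATA_URL)
    # Everything else falls through to the outage page.
    return (301, HARDHAT_URL)
```

The checks run top to bottom and the first match wins, which is why combining the three cases into one rule avoids depending on rule ordering.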
(In reply to Kendall Libby [:fubar] from comment #8)
> Would appreciate another set of eyes.

looks like you're matching 'substring appears anywhere in url' instead of the full path to the files.

ie. http://mxr.mozilla.org/cheeseeffective_tld_names.datmonkey redirects, as does http://mxr.mozilla.org/some/other/project/certdata.txt

i'm not sure if this is a problem or not :)

i'm thinking about if another project uses the same filename, would it be bad to redirect to publicsuffix/hg.m.o instead of dxr? (i'm leaning towards this is an edge case we shouldn't worry about).

other than that it looks good to me.
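To make the difference between the two matching styles concrete, here is a small Python sketch. It is not the deployed rule. The helper names are invented for illustration, and `TLD_PATH` assumes the mozilla-central path from comment 0 is the only one that would count as a full-path match.

```python
from urllib.parse import urlsplit

# Assumed "full path" for the TLD file, taken from the URL in comment 0.
TLD_PATH = "/mozilla-central/source/netwerk/dns/effective_tld_names.dat"


def substring_match(url, filename):
    """Loose match: the filename appears anywhere in the URL path."""
    return filename in urlsplit(url).path


def full_path_match(url, full_path):
    """Strict match: the URL path (query string excluded) equals full_path."""
    return urlsplit(url).path == full_path


odd_url = "http://mxr.mozilla.org/cheeseeffective_tld_names.datmonkey"
real_url = (
    "http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/"
    "effective_tld_names.dat?raw=1"
)

print(substring_match(odd_url, "effective_tld_names.dat"))  # loose rule fires
print(full_path_match(odd_url, TLD_PATH))                   # strict rule does not
print(full_path_match(real_url, TLD_PATH))                  # strict rule fires
```

A strict match would stop redirecting copies of the file that live under other repo paths, so the loose match trades a few false positives for broader coverage.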
yeah, the original tld redirect was vague, because it was getting hit in a few different repos. doesn't seem to have hurt anything, so I'm happy to continue with certdata, unless/until something actually breaks
(In reply to Richard Soderberg [:atoll] from comment #11)
> Comment 5: I wish we weren't redirecting to hg.mozilla.org, but oh well. We
> probably already have protection classes for these two URLs anyways.

I would be happy to reconsider or have a conversation if you wished to proffer why. There are protection classes for several services, but nothing URL based; just the pre-existing rule 301'ing the tld file.

> Comment 8: Zeus rules are executed top to bottom, unless a rule halts
> execution by sending a reply.

Ah, TIL. Is there an easy way to rearrange the list?
We can't serve stale data, so there's no point in redirecting anywhere else. hg is the only possible choice. You can rearrange the list with drag and drop.
These redirects seem to be in place: can this bug be resolved FIXED or is there something else that needs to be finished up?
(In reply to Richard Soderberg [:atoll] from comment #13)
> We can't serve stale data, so there's no point in redirecting anywhere else.
> hg is the only possible choice.

But <http://mxr.mozilla.org/mozilla/source/security/nss/lib/ckfw/builtins/certdata.txt?raw=1> was stale for a long time because <http://mxr.mozilla.org/mozilla/> was an index of a CVS-era repo.
I accept whatever solution :fubar considers appropriate, and have no further requests of this bug.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED