Closed Bug 891654 Opened 11 years ago Closed 11 years ago

Validator can't find hosted manifest that works fine in every other way

Categories

(Marketplace Graveyard :: Validation, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED WONTFIX
2013-07-11

People

(Reporter: Harald, Unassigned)

Details

Steps:
 - Validate http://www.segundamano.es/firefox_manifest.webapp
 - Error: No manifest was found at that URL. Check the address and try again

Expected result:
 - Validation OK

The manifest returns the correct MIME type and status code, as verified via curl with Firefox OS and Firefox for Android user agents.

This blocks submission for a carrier launch partner.
Without a UA string, the server returns an empty response.

curl http://www.segundamano.es/firefox_manifest.webapp
curl: (52) Empty reply from server
It also blocks urllib:
% curl --user-agent 'Python-urllib/2.6' http://www.segundamano.es/firefox_manifest.webapp
curl: (52) Empty reply from server
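
For reference, a minimal Python 2 sketch of the same request with stock urllib2; the URL is the one from this bug, everything else here is just for illustration:

  import httplib
  import urllib2

  url = "http://www.segundamano.es/firefox_manifest.webapp"
  try:
      resp = urllib2.urlopen(url, timeout=10)
      print resp.getcode(), resp.info().gettype()
  except (urllib2.URLError, httplib.HTTPException) as e:
      # The server drops the connection for the default Python-urllib
      # User-Agent, so the request fails before any manifest is read.
      print "request failed:", repr(e)
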
Target Milestone: --- → 2013-07-11
Bug 888085 would fix it so we send a custom user-agent. Let's do that.
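
For the record, a rough sketch of what that could look like with urllib2; the UA string below is just a placeholder, not whatever bug 888085 ends up choosing:

  import urllib2

  req = urllib2.Request(
      "http://www.segundamano.es/firefox_manifest.webapp",
      # Placeholder UA for illustration only.
      headers={"User-Agent": "MozillaMarketplaceValidator/0.1 (placeholder)"},
  )
  resp = urllib2.urlopen(req, timeout=10)
  print resp.getcode(), resp.info().gettype()
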
I'm inclined to WONTFIX. If the server is explicitly blocking specific UAs, how are we supposed to get around that? If we changed the UA to FXOS's UA and the server started blocking that one too, should we change it to a different UA? Should we use googlebot's UA? At what point do we draw the line?

If this were something that zamboni/the validator were doing incorrectly I'd say that we should fix it, but really any application with default settings would run into this same problem.
Sending no user agent seems like an unexpected request, and I am not surprised that it triggers unexpected results. Is there a motivation behind that, like suppressing UA detection?

If this is an expected limitation in the Validator, it might help to call it out when the response isn't what was expected. Maybe developers could expand more information to see the request headers that were sent and the response headers that were received.
(In reply to Harald Kirschner :digitarald from comment #5)
> If this is an expected limitation in the Validator, it might help to call
> it out when the response isn't what was expected. Maybe developers could
> expand more information to see the request headers that were sent and the
> response headers that were received.

We're very limited in the information we can return to the user about what failed in HTTP requests, for security reasons. We've had to cut back the amount of data in the past and have an open bug to cut it back even more. At one point we gave you very detailed information about the result of bad requests, but unfortunately that's no longer an option.
Is there a bug that explains the security/privacy concerns?

The partner disabled those requests to block spiders that don't provide a UA. Does Marketplace make the request from a fixed IP so the partner can unblock it?
Flags: needinfo?(mattbasta)
Principles of the open web aside (we should never encourage partners to filter requests just because they don't include identifying information), no, there is no guarantee that the validator will be run from a particular IP.
Flags: needinfo?(mattbasta)
OK, if the IP can change, we shouldn't let them filter by IP.

What is the decision on bug 888085? I would really prefer that we faithfully request resources with the right UA.

Is there a bug that explains the security/privacy concerns?
Bug 878368 explains why we can't give more information about request failures. PM me if you need to get access to it.

> I would really prefer that we faithfully request resources with the right UA.

Make no mistake, we are sending a user agent; it's the default one provided by Python's urllib2 module. It should look something like:

  Python-urllib/2.7
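
You can confirm what urllib2 advertises by default with a quick check (a sketch, assuming the same Python version the validator runs under):

  import urllib2

  # The default opener carries a "User-agent: Python-urllib/<version>" header.
  opener = urllib2.build_opener()
  print dict(opener.addheaders).get("User-agent")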

The requests we're sending are a.) not at all malformed and b.) not in any way incorrect (we are indeed requesting the manifest from Python's urllib2 module). I don't think we should be special casing our code for Firefox OS.

If another platform adds WebAPI support, will its users not be able to install apps because the remote server is filtering out their user agent? How many times should we request the manifest with different UAs before we determine that the remote server isn't giving priority or preference to one UA over another?

And what about Android? What if the remote server returns a manifest for a Firefox OS UA, but for a Firefox for Android UA it has an Apache redirect in place that sends all Android traffic to a "mobile site" that isn't a manifest at all?

At the end of the day, it's a wash. But in any case I don't believe specifying a UA is going to do anything more than gloss over the one-off cases where developers have poorly-conceived spider (or otherwise) blocking mechanisms installed on their servers. This is why robots.txt exists.
% curl 'http://www.segundamano.es'
curl: (52) Empty reply from server

Repeating the remarks above, it's pretty unacceptable and anti-the-way-of-the-web to be checking against some magic whitelist of UAs.

To move forward: is there a way the partner can request their IT team/hosting provider of segundamano.es to not block on empty/unknown User-Agents?
I am requesting that, but I wanted to make sure I'm not putting additional work on a partner while bug 888085 hasn't been discussed yet.

Can somebody cc me on bug 878368, please?
(In reply to Christopher Van Wiemeersch [:cvan] from comment #11)
> To move forward: is there a way the partner can request their IT
> team/hosting provider of segundamano.es to not block on empty/unknown
> User-Agents?

This is the right thing to do. Python-urllib is a standard library on the web, and having us conform to other companies' arbitrary lists of blocked UAs feels like a maintenance headache for no purpose.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX