Bug 690391 (Closed): Opened 13 years ago, Closed 13 years ago

Run web spider to determine what sites only check for 1 major version number digit

Categories: mozilla.org :: Miscellaneous
Type: task
Priority: Not set
Severity: normal

Tracking: Not tracked
Status: RESOLVED FIXED

Reporter: christian
Assigned: bc

We know a bunch of version-sniffing code will break with Firefox 10 (on mozilla-central) because it doesn't expect 2 or more digits in the major version number.

Can we spider the web and figure out sites that blatantly do this with "broken" regular expressions?
I'm not sure quite how to go about this. Let me think a bit. Suggestions welcome.
Yeah, me neither. I was thinking we could do a dumb search for navigator.appVersion or ".test(navigator.userAgent)" and, with some fuzzy context (5 lines?), look for a regex that starts with [0-9]\. or \d\. It probably wouldn't give us that good a picture, though.
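For illustration, a rough Python sketch of that dumb search (the patterns and the 5-line context window are the ones suggested above; the function name and inputs are hypothetical):

import re

# Scan fetched JS source for UA sniffing near a regex literal that only
# expects one major-version digit, e.g. /Firefox\/(\d)\./, which reads
# "Firefox/10.0" as major version "1".
SNIFF = re.compile(r'navigator\.(appVersion|userAgent)')
ONE_DIGIT = re.compile(r'(\[0-9\]|\\d)\\\.')  # literal "[0-9]\." or "\d\." in the source

def suspicious_lines(js_source, context=5):
    lines = js_source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if SNIFF.search(line):
            window = "\n".join(lines[max(0, i - context):i + context + 1])
            if ONE_DIGIT.search(window):
                hits.append((i + 1, line.strip()))
    return hits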

Good news is that my first googling on how to do version detection gives me something that actually works for 10.0, so at least the top-hit example code isn't 100% wrong.
One option would be to crawl with a user-agent that's 10.0 and a user-agent that's 9.0 and compare the results. Just comparing the HTTP request streams would probably be a good start; if you get different resources, or the set of requests is different, we probably have a problem. If some sites are serving nondeterministic resources then we can identify those false positives by loading them twice with each UA string and ignoring sites where loads differ with the same UA string.

E.g. for a given site X we could do
1) load X with UA 9.0
2) load X with UA 10.0
3) load X again with UA string 9.0
If the resources loaded by 1) are the same as the resources loaded by 3) but different from the resources loaded by 2), report a warning and have someone manually test the site.

Then do the same thing but instead of comparing resources loaded, compare screenshots.

Do all this running Adblock :-).
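A minimal sketch of that flow, assuming a load_resources(url, ua) helper that returns the set of resource URLs fetched for a page load (the helper and the UA strings below are placeholders, not an actual harness):

# Placeholder UA strings for the two versions being compared.
UA_9 = "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
UA_10 = "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0"

def check_site(url, load_resources):
    first_9 = load_resources(url, UA_9)    # step 1
    with_10 = load_resources(url, UA_10)   # step 2
    second_9 = load_resources(url, UA_9)   # step 3
    # Warn only when the two 9.0 loads agree but the 10.0 load differs;
    # sites serving nondeterministic resources fail the first check and are skipped.
    if first_9 == second_9 and with_10 != first_9:
        return "WARNING: %s served different resources to the 10.0 UA" % url
    return None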
roc: I like that idea. I had considered the different UAs but not the comparison of the actual loaded resources. Screenshots should be useful/possible as well. The Adblock idea is simply brilliant.
I did a quick check of the dependencies of bug uafx10.

Using differences in the loaded resources would have flagged each of them. Bug 691679 included differences for google-analytics.com and plusone.google.com/, which would have caused a false positive if they had been the only differences. I could add additional Adblock filters to remove those. I would need to add similar blocks for others... yahoo|live|facebook? Suggestions?

One thing about Adblock's filter subscriptions is that they are country/language specific. If I restrict the run to the top 500 sites in the United States I can just use Fanboy's English list, but if we want to do more I would need to include more (all?) of the available filters.

Visually comparing the pages in the uafx10 dependent bugs shows that they would have been flagged by comparing "screenshots". I would expect "screenshots" to be less sensitive to hidden tracking code than the resource checking.

How many sites are we interested in checking? With this overhead, I *guess* I'd be able to do about 2000 sites per day per machine. I have two Linux boxes at Seneca I can use.

roc: How important are the 3 steps above, do you think? I was thinking of a single pass:

1. load with 10 ua.
   record loaded resources
   take a "screenshot"
2. load with 9 ua.
   record loaded resources
   take a "screenshot"
3. compare and report.

I was thinking of just using text output in the reports and not keeping screenshots or other data apart from the text comparison report. Ok?
If that works without too many false positives, that's fine. I was just worrying about sites that display random non-ad stuff every time they load.
Couldn't we restrict failures to >10% difference in screenshot pixels? That would probably help reduce false positives.
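Something along these lines (a sketch using Pillow; the 10% cutoff is just the number suggested above, and the paths are placeholders) could implement that threshold, assuming both screenshots are captured at the same size:

from PIL import Image, ImageChops

# Flag a site only when more than `threshold` of the pixels differ between
# the Firefox 9 and Firefox 10 screenshots.
def differs_significantly(path_fx9, path_fx10, threshold=0.10):
    a = Image.open(path_fx9).convert("RGB")
    b = Image.open(path_fx10).convert("RGB")
    diff = ImageChops.difference(a, b)  # per-pixel absolute difference
    changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    return changed / float(diff.size[0] * diff.size[1]) > threshold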

I'm worried about sites that require a login, though (like Blackboard, Facebook, etc.)... probably not much we can do there.
I can do some test runs on limited sets and see how bad the false positive situation is with respect to different approaches. I'm still looking for some guidance on how many sites you are interested in.
At least the Google top 1000. Preferably the top 1000 in the countries there as well (http://www.google.com/adplanner/static/top1000/).
I attempted to detect problematic sites by:

* Using private browsing mode
* For each response domain, removing the data that is kept temporarily even in private browsing mode, before attempting the next load.

* Using Adblock with all of Fanboy's subscriptions installed
* Using Flashblock.

Adblock would hopefully eliminate many ads, while Flashblock would eliminate the Flash animations that would cause problems for image comparisons.

For each site, this meant:

1. performing two successive loads with Firefox 9 UA
   recording responses
   capturing image of the page
   comparing the responses and images
2. performing two successive loads with Firefox 10 UA
   recording responses
   capturing image of the page
   comparing the responses and images
3. comparing the first Firefox 9 load's responses and image
   to the first Firefox 10 load's responses and images.

The hope was that identical responses and images for each UA with different responses and images for the two UAs would flag sites to be investigated. Unfortunately, the responses were not a reliable indicator since they would differ even for the same UA on successive loads. The same was true for comparing the images for successive loads using the same UA.

The response and image differences for successive loads using the same UA were caused by a variety of factors:

* image capture occurring at differing points during the loading of the resources. I don't think this was a major issue, but I do plan to add a delay after page load to give the page more time to arrive at a stable state.

* rotating content such as internal ads, stories, user profile pictures, captchas, and web based animations or animated gifs.

* slight pixel differences in layout causing insignificant image differences. This appeared to be worse on Linux and somewhat better on Windows, and could possibly be related to reflow and timing issues.

These same variations also caused the images for the two different UAs to differ.

I settled on using the comparison in step 3 above, where I ignored the responses and used only the image differences between Firefox 9 and Firefox 10. I did this partly to reduce the size of the collected data, since the images were rather large (I used a 1200x1024 image size), and it did not appear that showing both loads and their differences for each UA would be that helpful. Even with this limitation in the output, the data was almost 500M in size. Testing the URLs from bug 690287 showed this approach would easily find sites that showed an unsupported-browser message and would include sites, such as smartusa.com, which presented significantly different content to the two UAs.

I ran this approach on the top 1000 sites from Alexa (which I had handy and which compares favorably to Google's list) using Linux and Windows. Linux completed 998 and Windows attempted 893 sites out of the thousand; DNS and timeout issues account for the failure to load all 1000 sites. I extracted the data into a series of pages with 10 sites per page, where I showed a message if the site failed to load; a message that they were identical if the images for the two UAs were identical; or the original images for each UA along with a difference image for easy comparison if the site successfully loaded and showed a difference between the two UAs' images.
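(For the curious, a toy version of that report layout; the file names and the shape of the results are invented for this sketch.)

import html

# One HTML page per batch of sites: a message for failed or identical loads,
# otherwise the two screenshots plus a difference image side by side.
def write_report_page(results, page_num):
    rows = []
    for site, outcome in results:  # outcome: "failed", "identical", or "diff"
        if outcome == "failed":
            rows.append(f"<p>{html.escape(site)}: failed to load</p>")
        elif outcome == "identical":
            rows.append(f"<p>{html.escape(site)}: identical for both UAs</p>")
        else:
            rows.append(f'<p>{html.escape(site)}</p><img src="{site}-fx9.png"> '
                        f'<img src="{site}-fx10.png"> <img src="{site}-diff.png">')
    with open(f"report-{page_num:03d}.html", "w", encoding="utf-8") as out:
        out.write("<!DOCTYPE html><meta charset='utf-8'>\n" + "\n".join(rows))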

I did not find any sites which displayed an unsupported-browser message. It was not always clear whether the image differences were due to normal ad/content rotations, to load-time/image-capture-time differences, or to actual significant differences between Firefox 9 and Firefox 10. For example, I'm not sure I would have recognized smartusa as problematic. I did limited investigations into some sites, but finding the browser detection code is not always easy, especially with Firebug, which, though *shiny*, lacks some of the utility of Venkman. I miss Venkman so much!

I intend to re-run the test for the google list, including the country specific lists, with the following changes:

* introduce a delay after page load before the responses and images are finalized.
* output the response differences (unique responses for each load), images and image differences for each of the 3 steps outlined above.
* extract the data into a series of pages where I can easily visually compare the images and their differences. This should help in determining if the variations between Firefox 9 and Firefox 10 are significantly different from the variations between successive loads using the same UA.

Distributing the results is somewhat problematic due to the size of the data (which will probably triple), possible copyright issues, and hard-core porn images.

I'll have a follow up by Friday I hope.
I completed reviewing the results of the Google lists, including the country-specific top 100s (2100 in total). I found no obvious Firefox 10 issues. Of course, I could have missed something in the 14,700 images (Firefox 9 two passes and comparison, Firefox 10 two passes and comparison, Firefox 9 compared to Firefox 10). I did find that hotwire.com considers Firefox 9 and 10 to be out of date, but thinks Firefox 8 is ok. Of course, this was only the homepages, and there could be pages deeper on the sites that flag Firefox 10.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED