<img> inside <noscript> preloaded when <noscript> appears in document.write after <script src>

RESOLVED FIXED in Firefox 5

Status

()

Core
HTML: Parser
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: Mike Cherichetti, Assigned: hsivonen)

Tracking

({regression})

Trunk
mozilla6
regression
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox5 fixed)

Details

(URL)

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

6 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15 ( .NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0

If you trace the HTTP requests you can see the <SCRIPT> and <NOSCRIPT> blocks are both executed for all of the advertisements in the page, which all contain /advertpro/ in their URL so that can be used to filter for them.

Reproducible: Always

Steps to Reproduce:
1. Visit web site
2. Allow page to complete loading
3. Review HTTP traces
Actual Results:  
Both <NOSCRIPT> and <SCRIPT> are executed.

Expected Results:  
Only <SCRIPT> should be executed.

Here are the results of a HTTP trace that I did with HTTP Analyzer, which shows the <SCRIPT> call executes and then right after it the <NOSCRIPT> part of the code loads an <IMG> tag.

NO.            Starred  OffSet   Timeline  Hints  Duration(s)  Method  Result  Received  Type             URL                                                                                                                                                                                                                                                                                                                                                                                                                                  RedirectURL                                                
-  00:00:00.656    firefox.exe[2528] (Count=15, Sent=12.73 K, Received=15.38 K, ElapsedTime=2.969 s)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
   53        False    + 1.281            True   0.126 s      GET     200     1.12 K    text/javascript  http://ads.contactmusic.com/advertpro/servlet/view/banner/javascript/zone?zid=162&pid=0&random=44211242&millis=1300470897418&referrer=http%3A//ads.contactmusic.com/advertpro/servlet/view/banner/javascript/html/zone%3Fzid%3D89%26pid%3D28%26custom1%3Dy%26custom2%3Dy%26custom3%3Dy%26custom4%3Dy%26random%3D70707430%26millis%3D1300470896657%26referrer%3Dhttp%253A//www.contactmusic.com/                                                                                                 
   54        False    + 1.296            False  0.142 s      GET     302     334       text/html        http://ads.contactmusic.com/advertpro/servlet/view/banner/image/zone?zid=162&pid=0&position=1                                                                                                                                                                                                                                                                                                                                         http://ads.contactmusic.com/advertpro/servlet/file?id=16  
   56        False    + 1.343            True   0.111 s      GET     200     1.13 K    text/javascript  http://ads.contactmusic.com/advertpro/servlet/view/banner/javascript/zone?zid=163&pid=0&random=87306750&millis=1300470897423&referrer=http%3A//ads.contactmusic.com/advertpro/servlet/view/banner/javascript/html/zone%3Fzid%3D88%26pid%3D28%26custom1%3Dy%26custom2%3Dy%26custom3%3Dy%26custom4%3Dy%26random%3D20405627%26millis%3D1300470896768%26referrer%3Dhttp%253A//www.contactmusic.com/                                                                                                 
   70        False    + 1.359            False  0.345 s      GET     302     334       text/html        http://ads.contactmusic.com/advertpro/servlet/view/banner/image/zone?zid=163&pid=0&position=1                                                                                                                                                                                                                                                                                                                                         http://ads.contactmusic.com/advertpro/servlet/file?id=12  
   99        False    + 1.812            True   0.173 s      GET     200     1.12 K    text/javascript  http://ads.contactmusic.com/advertpro/servlet/view/banner/javascript/zone?zid=164&pid=0&random=13977396&millis=1300470897951&referrer=http%3A//www.contactmusic.com/                                                                                                                                                                                                                                                                                                                            
   100       False    + 1.828            False  0.157 s      GET     302     333       text/html        http://ads.contactmusic.com/advertpro/servlet/view/banner/image/zone?zid=164&pid=0&position=1                                                                                                                                                                                                                                                                                                                                         http://ads.contactmusic.com/advertpro/servlet/file?id=7
(Reporter)

Comment 1

6 years ago
Created attachment 520248 [details]
Example HTTP Trace Results
(Reporter)

Updated

6 years ago
Version: unspecified → Trunk

Comment 2

6 years ago
Do you see this in both, Fx 3.6 and Fx 4 ?
Severity: major → normal
Component: General → General
Product: Firefox → Core
QA Contact: general → general
(Reporter)

Comment 3

6 years ago
No, I'm only seeing this happen with Firefox 4 RC.  It does not happen with Firefox 3.6.15
Component: General → HTML: Parser
QA Contact: general → parser
That said, I'm seeing the <img> end up as text in the DOM....

Henri, are we doing prefetch on things inside <noscript> perchance?
In a simple experiment we don't seem to be...

I can reproduce the zid=162/163/164 images from comment 0 be being loaded, but those urls do not appear in the source of the page (much less inside <noscript>)....  nor in the DOM of the final page.  So they're presumably being dynamically added inside some iframe somehere.
Alright.  The load of the zid=163 image is happening when the script loaded from <http://oascentral.artistdirect.com/RealMedia/ads/adstream_jx.ads/www.contactmusic.com/homepage/1675279910@Top,Right,Left!Right> does a document.write of this string:

  <IMG src="http://ads.contactmusic.com/advertpro/servlet/view/banner/image/zone?zid=163&pid=0&position=1" hspace="0" vspace="0" border="0" alt="Click Here!">

followed by a newline.  This is certainly NOT sitting inside a <noscript> block; it's being explicitly document.written into the page.

Unfortunately, the data returned for that script seems to depend on something other than the url, because what the scriptloader sees and what I see if I load the url directed are totally different (e.g. an order of magnitude different in size).
Oh, and to be clear, the load in comment 6 was in fact a preload for the image.

I'll attach the script in question in a sec, but this looks like a parser bug.  The <img> tag is in fact inside <noscript>, but they are in separate document.write calls.  Henri?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Created attachment 520587 [details]
Script in question
(Assignee)

Comment 9

6 years ago
(In reply to comment #7)
> I'll attach the script in question in a sec, but this looks like a parser bug. 
> The <img> tag is in fact inside <noscript>, but they are in separate
> document.write calls.  Henri?

This is by design. See http://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5Parser.cpp#411

Getting the preloads always right when multi-level document.write is involved would be disproportionately complex compared to the relative harmlessness of doing an extra preload in a special case. After all, document.writing a <noscript> is rather pointless in the first place.

Things should work right for <noscript> coming from the network stream and for document.write when multi-level document.write is not involved.

Marking WONTFIX, since the code is the way it is intentionally for complexity avoidance.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX
Summary: <NOSCRIPT> is executed in addition to <SCRIPT> when JavaScript is enabled → <img> inside document.written <noscript> after multi-level document.write preloaded
(Reporter)

Comment 10

6 years ago
How can this be considered correct behavior when it breaks the contract of <noscript> ?

This can be a big problem for any system that uses images as tracking assets.  In this case it will cause over counting of delivered ads, which advertisers will be charged extra money for.  The financial impact of that can be very large for sites that serve 100's of millions or billions of ads per month, which a number of our customers do.

It's quite common for ad servers to have nested document.write calls because they exchange ads with each other.  In this case the contact music site is displaying a third-party ad.  However, they don't have any ads to display at this time.  So, they revert to sending back a code from contact music that displays different ads, which again is quite a common practice for all ad servers.  The reason the code has <noscript> is that the publishing ad server might display the code with <script> or <iframe> tags, which isn't known in advance.

I can appreciate not wanting to complicate code, but the potential impact of this is very large for systems which are designed to depend on tracking assets only being loaded in <script> or <noscript> but not both.

May I suggest that a potential fix could be to simply disable pre-loading of images in all <noscript> blocks when JavaScript is enabled?  I don't understand why the <noscript> code is even being parsed when JavaScript is enabled.  Can't the parser simply ignore the content until it reaches the </noscript> tag?  The reverse is also true, if JavaScript is disabled it shouldn't parse what's in the <script> tags.
> This can be a big problem for any system that uses images as tracking assets. 

Any sort of speculative prefetch is a big problem for systems like that, yes.  They just need to deal with it....  There's no guarantee that an <img> tag in a webpage means exactly one image load even in the absence of speculative prefetch: it could be none, it could be more than one.

> I don't understand why the <noscript> code is even being parsed when JavaScript
> is enabled.

The point is that the speculative image prescan in this case doesn't know that it's inside a <noscript>, because that was passed to the parser in a separate call.

The speculative prescan is _speculative_.  It doesn't happen from the main parse; by the time you're parsing the stuff for real you know whether you need to load the image, so it's not speculation.
Blocks: 649294
(Assignee)

Comment 12

6 years ago
I misdiagnosed this the first time round. The speculative tree builder was configured to run the HTML5 tree building algorithm as if scripts were disabled, so the noscript contents got tokenized as tags regardless of multilevel document.write issues.
Assignee: nobody → hsivonen
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(Assignee)

Comment 13

6 years ago
Created attachment 528542 [details] [diff] [review]
Make the noscript treatment in the speculative doc.write tree builder match the non-speculative doc.write tree builder

I copied the state of the other tree builder instead of just passing PR_TRUE in case there are situations where scripting is "off" but an extensions can still run scripts.
Attachment #528542 - Flags: review?(bzbarsky)
(Assignee)

Comment 14

6 years ago
Created attachment 528543 [details] [diff] [review]
Mochitest
Attachment #528543 - Flags: review?(bzbarsky)
(Assignee)

Updated

6 years ago
Status: REOPENED → ASSIGNED
Summary: <img> inside document.written <noscript> after multi-level document.write preloaded → <img> inside <noscript> preloaded when <noscript> appears in document.write after <script src>
(Assignee)

Updated

6 years ago
Duplicate of this bug: 649294
(Assignee)

Updated

6 years ago
OS: Windows 7 → All
Hardware: x86 → All
(Assignee)

Updated

6 years ago
Attachment #528543 - Attachment is obsolete: true
Attachment #528543 - Flags: review?(bzbarsky)
(Assignee)

Comment 16

6 years ago
Created attachment 528546 [details] [diff] [review]
Mochitest, v2
Attachment #528546 - Flags: review?(bzbarsky)
Comment on attachment 528542 [details] [diff] [review]
Make the noscript treatment in the speculative doc.write tree builder match the non-speculative doc.write tree builder

r=me
Attachment #528542 - Flags: review?(bzbarsky) → review+
Comment on attachment 528546 [details] [diff] [review]
Mochitest, v2

I assume you made sure this fails without the patch, right?

If so, r=me.
Attachment #528546 - Flags: review?(bzbarsky) → review+
(Assignee)

Comment 19

6 years ago
(In reply to comment #18)
> I assume you made sure this fails without the patch, right?

I did.

Thanks. 

http://hg.mozilla.org/mozilla-central/rev/b16fc4a77dac
http://hg.mozilla.org/mozilla-central/rev/557165a66267
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
(Assignee)

Comment 20

6 years ago
Comment on attachment 528542 [details] [diff] [review]
Make the noscript treatment in the speculative doc.write tree builder match the non-speculative doc.write tree builder

Nominating for Aurora, because this fix is extremely low-risk but keeping this bug unfixed causes grief to Internet advertising bureaus that are trying to count ad impressions.
Attachment #528542 - Flags: approval-mozilla-aurora?
(Assignee)

Updated

6 years ago
Attachment #528546 - Flags: approval-mozilla-aurora?

Updated

6 years ago
Keywords: regression

Updated

6 years ago
Attachment #528542 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

6 years ago
Attachment #528546 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 21

6 years ago
Landed in Aurora, too:
http://hg.mozilla.org/mozilla-aurora/rev/036daba8643f
http://hg.mozilla.org/mozilla-aurora/rev/c5a406ca5980
status-firefox5: --- → fixed
Flags: in-testsuite+
Target Milestone: --- → mozilla6

Updated

6 years ago
Duplicate of this bug: 656791

Comment 23

6 years ago
(In reply to comment #10)
> How can this be considered correct behavior when it breaks the contract of
> <noscript> ?
> 
> This can be a big problem for any system that uses images as tracking
> assets.  In this case it will cause over counting of delivered ads, which
> advertisers will be charged extra money for.  The financial impact of that
> can be very large for sites that serve 100's of millions or billions of ads
> per month, which a number of our customers do.
> 
> It's quite common for ad servers to have nested document.write calls because
> they exchange ads with each other.  In this case the contact music site is
> displaying a third-party ad.  However, they don't have any ads to display at
> this time.  So, they revert to sending back a code from contact music that
> displays different ads, which again is quite a common practice for all ad
> servers.  The reason the code has <noscript> is that the publishing ad
> server might display the code with <script> or <iframe> tags, which isn't
> known in advance.
> 
> I can appreciate not wanting to complicate code, but the potential impact of
> this is very large for systems which are designed to depend on tracking
> assets only being loaded in <script> or <noscript> but not both.
> 
> May I suggest that a potential fix could be to simply disable pre-loading of
> images in all <noscript> blocks when JavaScript is enabled?  I don't
> understand why the <noscript> code is even being parsed when JavaScript is
> enabled.  Can't the parser simply ignore the content until it reaches the
> </noscript> tag?  The reverse is also true, if JavaScript is disabled it
> shouldn't parse what's in the <script> tags.

This is already becoming a big problem as Firefox 4 gains adoption.  At Rubicon we're considering a change to our ad tags to remove the <noscript> block entirely.  This may cause detrimental web experience for users that choose not to use javascript (albeit a very small percentage at this point).  The alternative is to have a double-count scenario which causes a lot of problems for web publishers trying to evaluate performance of their online properties.

Is there any chance that this can make it into a Firefox 4 patch?
Ed, there are no Firefox 4 patches planned before the release of Firefox 5 in about a month, unless a critical security problem comes up.

This patch will be in Firefox 5 when that ships; Firefox 4 users will be automatically updated to Firefox 5.

Comment 25

6 years ago
(In reply to comment #23)
> we're considering a change to our ad tags to remove the <noscript>
> block entirely.  This may cause detrimental web experience for users that
> choose not to use javascript

How can a user with javascript disabled get any web experience from content that is written by javascript?

Comment 26

6 years ago
Exactly the point.  Normally such users' web experience would be fulfilled by the content called by code within the <noscript> blocks.  However, this bug is causing serious problems in web performance evaluation as noted above -- leading to a possible temporary solution by publishers of removing the <noscript> blocks entirely (which they shouldn't have to do) to mitigate the effects of this bug.  This will result in detrimental performance for users with JavaScript disabled.
bkearns, those noscript blocks currently don't make it into the page at all if JavaScript is disabled.  They're written out only if a script runs.  At least in all the cases that have been brought up so far.

Comment 28

6 years ago
Hi, we are experiencing the same counting issues and again are looking to remove noscript tags as a result, which is a shocking thing for usability. Whilst Boris makes the point that publisher side Javascript is required to output the no script tag in the first place, the fact is that we as third party adservers do not have visibility into how the publishers put the tags on the page, so we have to remove the noscript tag from all our tags even ones which are directly output, or output by the publisher in iFrames and these break too. This is having a major impact on online advertising companies and causing many many hours of extra work to try and untangle the double counting issue.

Comment 29

6 years ago
No need for pushing, this bug is fixed. Ships with Firefox 5 final on 2011-06-21.
You need to log in before you can comment on or make changes to this bug.