Closed Bug 642908 Opened 13 years ago Closed 13 years ago

<img> inside <noscript> preloaded when <noscript> appears in document.write after <script src>

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla6
Tracking Status
firefox5 --- fixed

People

(Reporter: mike.cherichetti, Assigned: hsivonen)

References

()

Details

(Keywords: regression)

Attachments

(4 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15 ( .NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0

If you trace the HTTP requests you can see the <SCRIPT> and <NOSCRIPT> blocks are both executed for all of the advertisements in the page, which all contain /advertpro/ in their URL so that can be used to filter for them.

Reproducible: Always

Steps to Reproduce:
1. Visit web site
2. Allow page to complete loading
3. Review HTTP traces
Actual Results:  
Both <NOSCRIPT> and <SCRIPT> are executed.

Expected Results:  
Only <SCRIPT> should be executed.

Here are the results of a HTTP trace that I did with HTTP Analyzer, which shows the <SCRIPT> call executes and then right after it the <NOSCRIPT> part of the code loads an <IMG> tag.

NO.            Starred  OffSet   Timeline  Hints  Duration(s)  Method  Result  Received  Type             URL                                                                                                                                                                                                                                                                                                                                                                                                                                  RedirectURL                                                
-  00:00:00.656    firefox.exe[2528] (Count=15, Sent=12.73 K, Received=15.38 K, ElapsedTime=2.969 s)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
   53        False    + 1.281            True   0.126 s      GET     200     1.12 K    text/javascript  http://ads.contactmusic.com/advertpro/servlet/view/banner/javascript/zone?zid=162&pid=0&random=44211242&millis=1300470897418&referrer=http%3A//ads.contactmusic.com/advertpro/servlet/view/banner/javascript/html/zone%3Fzid%3D89%26pid%3D28%26custom1%3Dy%26custom2%3Dy%26custom3%3Dy%26custom4%3Dy%26random%3D70707430%26millis%3D1300470896657%26referrer%3Dhttp%253A//www.contactmusic.com/                                                                                                 
   54        False    + 1.296            False  0.142 s      GET     302     334       text/html        http://ads.contactmusic.com/advertpro/servlet/view/banner/image/zone?zid=162&pid=0&position=1                                                                                                                                                                                                                                                                                                                                         http://ads.contactmusic.com/advertpro/servlet/file?id=16  
   56        False    + 1.343            True   0.111 s      GET     200     1.13 K    text/javascript  http://ads.contactmusic.com/advertpro/servlet/view/banner/javascript/zone?zid=163&pid=0&random=87306750&millis=1300470897423&referrer=http%3A//ads.contactmusic.com/advertpro/servlet/view/banner/javascript/html/zone%3Fzid%3D88%26pid%3D28%26custom1%3Dy%26custom2%3Dy%26custom3%3Dy%26custom4%3Dy%26random%3D20405627%26millis%3D1300470896768%26referrer%3Dhttp%253A//www.contactmusic.com/                                                                                                 
   70        False    + 1.359            False  0.345 s      GET     302     334       text/html        http://ads.contactmusic.com/advertpro/servlet/view/banner/image/zone?zid=163&pid=0&position=1                                                                                                                                                                                                                                                                                                                                         http://ads.contactmusic.com/advertpro/servlet/file?id=12  
   99        False    + 1.812            True   0.173 s      GET     200     1.12 K    text/javascript  http://ads.contactmusic.com/advertpro/servlet/view/banner/javascript/zone?zid=164&pid=0&random=13977396&millis=1300470897951&referrer=http%3A//www.contactmusic.com/                                                                                                                                                                                                                                                                                                                            
   100       False    + 1.828            False  0.157 s      GET     302     333       text/html        http://ads.contactmusic.com/advertpro/servlet/view/banner/image/zone?zid=164&pid=0&position=1                                                                                                                                                                                                                                                                                                                                         http://ads.contactmusic.com/advertpro/servlet/file?id=7
Version: unspecified → Trunk
Do you see this in both, Fx 3.6 and Fx 4 ?
Severity: major → normal
Product: Firefox → Core
QA Contact: general → general
No, I'm only seeing this happen with Firefox 4 RC.  It does not happen with Firefox 3.6.15
Component: General → HTML: Parser
QA Contact: general → parser
That said, I'm seeing the <img> end up as text in the DOM....

Henri, are we doing prefetch on things inside <noscript> perchance?
In a simple experiment we don't seem to be...

I can reproduce the zid=162/163/164 images from comment 0 be being loaded, but those urls do not appear in the source of the page (much less inside <noscript>)....  nor in the DOM of the final page.  So they're presumably being dynamically added inside some iframe somehere.
Alright.  The load of the zid=163 image is happening when the script loaded from <http://oascentral.artistdirect.com/RealMedia/ads/adstream_jx.ads/www.contactmusic.com/homepage/1675279910@Top,Right,Left!Right> does a document.write of this string:

  <IMG src="http://ads.contactmusic.com/advertpro/servlet/view/banner/image/zone?zid=163&pid=0&position=1" hspace="0" vspace="0" border="0" alt="Click Here!">

followed by a newline.  This is certainly NOT sitting inside a <noscript> block; it's being explicitly document.written into the page.

Unfortunately, the data returned for that script seems to depend on something other than the url, because what the scriptloader sees and what I see if I load the url directed are totally different (e.g. an order of magnitude different in size).
Oh, and to be clear, the load in comment 6 was in fact a preload for the image.

I'll attach the script in question in a sec, but this looks like a parser bug.  The <img> tag is in fact inside <noscript>, but they are in separate document.write calls.  Henri?
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #7)
> I'll attach the script in question in a sec, but this looks like a parser bug. 
> The <img> tag is in fact inside <noscript>, but they are in separate
> document.write calls.  Henri?

This is by design. See http://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5Parser.cpp#411

Getting the preloads always right when multi-level document.write is involved would be disproportionately complex compared to the relative harmlessness of doing an extra preload in a special case. After all, document.writing a <noscript> is rather pointless in the first place.

Things should work right for <noscript> coming from the network stream and for document.write when multi-level document.write is not involved.

Marking WONTFIX, since the code is the way it is intentionally for complexity avoidance.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Summary: <NOSCRIPT> is executed in addition to <SCRIPT> when JavaScript is enabled → <img> inside document.written <noscript> after multi-level document.write preloaded
How can this be considered correct behavior when it breaks the contract of <noscript> ?

This can be a big problem for any system that uses images as tracking assets.  In this case it will cause over counting of delivered ads, which advertisers will be charged extra money for.  The financial impact of that can be very large for sites that serve 100's of millions or billions of ads per month, which a number of our customers do.

It's quite common for ad servers to have nested document.write calls because they exchange ads with each other.  In this case the contact music site is displaying a third-party ad.  However, they don't have any ads to display at this time.  So, they revert to sending back a code from contact music that displays different ads, which again is quite a common practice for all ad servers.  The reason the code has <noscript> is that the publishing ad server might display the code with <script> or <iframe> tags, which isn't known in advance.

I can appreciate not wanting to complicate code, but the potential impact of this is very large for systems which are designed to depend on tracking assets only being loaded in <script> or <noscript> but not both.

May I suggest that a potential fix could be to simply disable pre-loading of images in all <noscript> blocks when JavaScript is enabled?  I don't understand why the <noscript> code is even being parsed when JavaScript is enabled.  Can't the parser simply ignore the content until it reaches the </noscript> tag?  The reverse is also true, if JavaScript is disabled it shouldn't parse what's in the <script> tags.
> This can be a big problem for any system that uses images as tracking assets. 

Any sort of speculative prefetch is a big problem for systems like that, yes.  They just need to deal with it....  There's no guarantee that an <img> tag in a webpage means exactly one image load even in the absence of speculative prefetch: it could be none, it could be more than one.

> I don't understand why the <noscript> code is even being parsed when JavaScript
> is enabled.

The point is that the speculative image prescan in this case doesn't know that it's inside a <noscript>, because that was passed to the parser in a separate call.

The speculative prescan is _speculative_.  It doesn't happen from the main parse; by the time you're parsing the stuff for real you know whether you need to load the image, so it's not speculation.
Blocks: 649294
I misdiagnosed this the first time round. The speculative tree builder was configured to run the HTML5 tree building algorithm as if scripts were disabled, so the noscript contents got tokenized as tags regardless of multilevel document.write issues.
Assignee: nobody → hsivonen
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I copied the state of the other tree builder instead of just passing PR_TRUE in case there are situations where scripting is "off" but an extensions can still run scripts.
Attachment #528542 - Flags: review?(bzbarsky)
Attached patch Mochitest (obsolete) — Splinter Review
Attachment #528543 - Flags: review?(bzbarsky)
Status: REOPENED → ASSIGNED
Summary: <img> inside document.written <noscript> after multi-level document.write preloaded → <img> inside <noscript> preloaded when <noscript> appears in document.write after <script src>
OS: Windows 7 → All
Hardware: x86 → All
Attachment #528543 - Attachment is obsolete: true
Attachment #528543 - Flags: review?(bzbarsky)
Attached patch Mochitest, v2Splinter Review
Attachment #528546 - Flags: review?(bzbarsky)
Comment on attachment 528542 [details] [diff] [review]
Make the noscript treatment in the speculative doc.write tree builder match the non-speculative doc.write tree builder

r=me
Attachment #528542 - Flags: review?(bzbarsky) → review+
Comment on attachment 528546 [details] [diff] [review]
Mochitest, v2

I assume you made sure this fails without the patch, right?

If so, r=me.
Attachment #528546 - Flags: review?(bzbarsky) → review+
(In reply to comment #18)
> I assume you made sure this fails without the patch, right?

I did.

Thanks. 

http://hg.mozilla.org/mozilla-central/rev/b16fc4a77dac
http://hg.mozilla.org/mozilla-central/rev/557165a66267
Status: ASSIGNED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Comment on attachment 528542 [details] [diff] [review]
Make the noscript treatment in the speculative doc.write tree builder match the non-speculative doc.write tree builder

Nominating for Aurora, because this fix is extremely low-risk but keeping this bug unfixed causes grief to Internet advertising bureaus that are trying to count ad impressions.
Attachment #528542 - Flags: approval-mozilla-aurora?
Attachment #528546 - Flags: approval-mozilla-aurora?
Keywords: regression
Attachment #528542 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #528546 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Landed in Aurora, too:
http://hg.mozilla.org/mozilla-aurora/rev/036daba8643f
http://hg.mozilla.org/mozilla-aurora/rev/c5a406ca5980
Flags: in-testsuite+
Target Milestone: --- → mozilla6
(In reply to comment #10)
> How can this be considered correct behavior when it breaks the contract of
> <noscript> ?
> 
> This can be a big problem for any system that uses images as tracking
> assets.  In this case it will cause over counting of delivered ads, which
> advertisers will be charged extra money for.  The financial impact of that
> can be very large for sites that serve 100's of millions or billions of ads
> per month, which a number of our customers do.
> 
> It's quite common for ad servers to have nested document.write calls because
> they exchange ads with each other.  In this case the contact music site is
> displaying a third-party ad.  However, they don't have any ads to display at
> this time.  So, they revert to sending back a code from contact music that
> displays different ads, which again is quite a common practice for all ad
> servers.  The reason the code has <noscript> is that the publishing ad
> server might display the code with <script> or <iframe> tags, which isn't
> known in advance.
> 
> I can appreciate not wanting to complicate code, but the potential impact of
> this is very large for systems which are designed to depend on tracking
> assets only being loaded in <script> or <noscript> but not both.
> 
> May I suggest that a potential fix could be to simply disable pre-loading of
> images in all <noscript> blocks when JavaScript is enabled?  I don't
> understand why the <noscript> code is even being parsed when JavaScript is
> enabled.  Can't the parser simply ignore the content until it reaches the
> </noscript> tag?  The reverse is also true, if JavaScript is disabled it
> shouldn't parse what's in the <script> tags.

This is already becoming a big problem as Firefox 4 gains adoption.  At Rubicon we're considering a change to our ad tags to remove the <noscript> block entirely.  This may cause detrimental web experience for users that choose not to use javascript (albeit a very small percentage at this point).  The alternative is to have a double-count scenario which causes a lot of problems for web publishers trying to evaluate performance of their online properties.

Is there any chance that this can make it into a Firefox 4 patch?
Ed, there are no Firefox 4 patches planned before the release of Firefox 5 in about a month, unless a critical security problem comes up.

This patch will be in Firefox 5 when that ships; Firefox 4 users will be automatically updated to Firefox 5.
(In reply to comment #23)
> we're considering a change to our ad tags to remove the <noscript>
> block entirely.  This may cause detrimental web experience for users that
> choose not to use javascript

How can a user with javascript disabled get any web experience from content that is written by javascript?
Exactly the point.  Normally such users' web experience would be fulfilled by the content called by code within the <noscript> blocks.  However, this bug is causing serious problems in web performance evaluation as noted above -- leading to a possible temporary solution by publishers of removing the <noscript> blocks entirely (which they shouldn't have to do) to mitigate the effects of this bug.  This will result in detrimental performance for users with JavaScript disabled.
bkearns, those noscript blocks currently don't make it into the page at all if JavaScript is disabled.  They're written out only if a script runs.  At least in all the cases that have been brought up so far.
Hi, we are experiencing the same counting issues and again are looking to remove noscript tags as a result, which is a shocking thing for usability. Whilst Boris makes the point that publisher side Javascript is required to output the no script tag in the first place, the fact is that we as third party adservers do not have visibility into how the publishers put the tags on the page, so we have to remove the noscript tag from all our tags even ones which are directly output, or output by the publisher in iFrames and these break too. This is having a major impact on online advertising companies and causing many many hours of extra work to try and untangle the double counting issue.
No need for pushing, this bug is fixed. Ships with Firefox 5 final on 2011-06-21.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: