nsStandardURL::SetRef broken in Win32 PGO with new PGO input (Yahoo News Photo Highlight slideshow does not properly advance under windows)

RESOLVED FIXED

Status

()

RESOLVED FIXED
10 years ago
8 years ago

People

(Reporter: wgianopoulos, Assigned: ted)

Tracking

({fixed1.9.1, regression})

Trunk
x86
Windows XP
fixed1.9.1, regression
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9.2 +

Firefox Tracking Flags

(status1.9.2 beta1-fixed)

Details

(URL)

Attachments

(4 attachments)

(Reporter)

Description

10 years ago
Using recent official Firefox nightly builds, the slideshow int he URL above will not properly advance to slide 4 form slide3.  It instead goes back to slide 1.

I cannot reproduce this bug in my own Windows build.  Therefore I suspect a PGO issue.

The regression range has been determined to be:

http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2009-01-13+03%3A00%3A00&enddate=2009-01-13+06%3A00%3A00

All of the bugs in that range are places related except for bug 472706.

I suspect there was a latent PGO issue that this checkin somehow brought to the surface.
Can you get an exact regression range using the changesets in about:buildconfig? (Not that I don't believe the range.)
Also, I can reproduce this in:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090122 Minefield/3.2a1pre
but not in my self-built (non-PGO):
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090123 Minefield/3.2a1pre
(Reporter)

Comment 3

10 years ago
(In reply to comment #1)
> Can you get an exact regression range using the changesets in
> about:buildconfig? (Not that I don't believe the range.)

The regression range was not identified by me, I got it from this post on Mozillazine:

http://forums.mozillazine.org/viewtopic.php?p=5571555#p5571555

According to that post, this is the range by changeset:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a7274359673d&tochange=8c4f6932ab41
I know it's a pain, but a reduced testcase would help a lot here. I suspect this is a latent bug that just gets exposed by PGO, like that other layout bug we had with PGO.
I dunno, the bug URL WFM on my own win32 PGO build.

Comment 6

10 years ago
Is it just me that "save page, complete" freezes the browser?
(Reporter)

Comment 7

10 years ago
(In reply to comment #6)
> Is it just me that "save page, complete" freezes the browser?

Also froze the browser for me.
(Reporter)

Updated

10 years ago
Summary: yahoo slideshow does not properly advance under windows → Yahoo News Photo Hightlight slideshow does not properly advance under windows
(Reporter)

Updated

10 years ago
Summary: Yahoo News Photo Hightlight slideshow does not properly advance under windows → Yahoo News Photo Highlight slideshow does not properly advance under windows
Just for the record, turning off content JIT does not affect the presence of this bug.
A reduced testcase would be really nice. I still can't reproduce locally with my own PGO builds (which use the same set of pages as the official builds).
I started a reduction over the weekend, but it's god-awful hard. The page includes a huge chunk of YUI, and it's very hard to tease out what parts of it can be removed. If anyone's interested in continuing to try to reduce, I can send you my partially-reduced copy.
To expound, I took the main JS file of the page down from 6444 lines to 4754 lines. (6444 is after reformatting it, of course, because it had been minified.) The HTML file I stripped down from 495 to 218 lines, including removing all stylesheets.
Flags: blocking1.9.2?
Jesse, Martijn, do you think you can help get this minimized?
We need to either fix this (preferably) or back out the PGO input changes (admitting defeat).

I'll upload my somewhat reduced copy somewhere in a minute.
Flags: blocking1.9.2? → blocking1.9.2+
Created attachment 362574 [details]
script file
Created attachment 362576 [details]
testcase

Click "Next" three times to see the bug.

Comment 17

10 years ago
I can think of three ways to approach this:

* Create a Lithium script with the criterion "Loading this in a PGO build and a non-PGO build gives different results".  Automate the clicking and add some sort of output.  Use the tips in 
http://www.squarefree.com/2009/01/11/reducing-real-world-scripts/.

* Do some special fuzz-testing that generates scripts and tells you if it finds any whose behavior depends on PGO.  If it finds something, it will be much easier for Lithium to reduce the fuzz testcase than the Yahoo code.

* Run the static analysis script from bug 420069 comment 19 on JS, XPConnect, and/or DOM.  This script is designed to catch bugs where the code depends on undefined order-of-evaluation, a known PGO trap.

I don't use Windows and I don't have dehydra set up, so I'm not likely to be able to help directly.
Created attachment 363031 [details]
testcase

I think this is the initial cause of the Yahoo issue. window.location.hash doesn't seem to work anymore as it should, afaict.
So I've stepped through a nightly build in a debugger, with bz's help, and determined that things are mostly ok up until the end of nsStandardURL::SetRef:
http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsStandardURL.cpp#2242
after this "mRef.mLen = refLen", mRef.mLen == 0, which is clearly broken.

Out of time for now, but more debugging is warranted.
(Assignee)

Updated

10 years ago
Assignee: nobody → ted.mielczarek
Status: NEW → ASSIGNED
So I stepped through assembly in both a non-PGO build and a PGOed nightly, and I can't figure that this is anything other than a compiler bug. The return value of ReplaceSegment is ignored there, but in the PGOed version, the value of eax is stored in mRef.mLen, which is wrong. The disassembly looks like so:
PGO:
100DD824  call        nsStandardURL::ReplaceSegment (10135D7Fh) 
    mPath.mLen += (refLen - mRef.mLen);
100DD829  sub         edi,dword ptr [esi+8Ch] 
    mRef.mLen = refLen;
    return NS_OK;
100DD82F  lea         ecx,[esp+24h] 
100DD833  add         dword ptr [esi+54h],edi 
100DD836  mov         dword ptr [esi+8Ch],eax 

non-PGO:
10075734  call        nsStandardURL::ReplaceSegment (10072CE5h) 
    mPath.mLen += (refLen - mRef.mLen);
10075739  mov         eax,ebx 
1007573B  sub         eax,dword ptr [esi+8Ch] 
    mRef.mLen = refLen;
    return NS_OK;
10075741  lea         ecx,[ebp-60h] 
10075744  add         dword ptr [esi+54h],eax 
10075747  mov         dword ptr [esi+8Ch],ebx 

[esi+8Ch] here is mRef.mLen. You can see that in the non-PGO case, it's keeping refLen in ebx, and just copies that over. In the PGOed case, it uses eax, despite not preserving the value there from the function call. I'll see if I can fiddle the code to make the compiler happy here. Maybe change the ReplaceSegment call to (void)ReplaceSegment, or assign it to a dummy variable? I don't know. I guess we could actually save the return value of ReplaceSegment and use it like almost every other Set* method does in this class.
Summary: Yahoo News Photo Highlight slideshow does not properly advance under windows → nsStandardURL::SetRef broken in Win32 PGO with new PGO input(Yahoo News Photo Highlight slideshow does not properly advance under windows)
(Assignee)

Updated

10 years ago
Blocks: 419893
If we had unit tests running on PGO builds on trunk, test_standardurl.js would have caught this. I did a local PGO build, and that test fails here.
I have a little patch that uses the return value of ReplaceSegment, and it fixes the code generation in a local PGO build. test_standardurl.js passes locally, and Martijn's testcase does as well. Disassembly looks like:

10099E64  call        nsStandardURL::ReplaceSegment (100698EFh) 
    mPath.mLen += shift;
10099E69  add         dword ptr [esi+54h],eax 
    mRef.mLen = refLen;
    return NS_OK;
10099E6C  lea         ecx,[esp+24h] 
10099E70  mov         dword ptr [esi+8Ch],edi 

Will attach in a second.
Component: Build Config → Networking
QA Contact: build-config → networking
Summary: nsStandardURL::SetRef broken in Win32 PGO with new PGO input(Yahoo News Photo Highlight slideshow does not properly advance under windows) → nsStandardURL::SetRef broken in Win32 PGO with new PGO input (Yahoo News Photo Highlight slideshow does not properly advance under windows)
Created attachment 364506 [details] [diff] [review]
use the return value of ReplaceSegment in SetRef
Attachment #364506 - Flags: superreview?(bzbarsky)
Attachment #364506 - Flags: review?(bzbarsky)
(Assignee)

Updated

10 years ago
Duplicate of this bug: 480303
Comment on attachment 364506 [details] [diff] [review]
use the return value of ReplaceSegment in SetRef

Looks fine, though it's not clear to me that the temporary is needed...
Attachment #364506 - Flags: superreview?(bzbarsky)
Attachment #364506 - Flags: superreview+
Attachment #364506 - Flags: review?(bzbarsky)
Attachment #364506 - Flags: review+
Yeah, not strictly necessary, but it's consistent with SetFileName's pattern.
Pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/1ddb1be9ba85
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
Attachment #364506 - Flags: approval1.9.1?
Comment on attachment 364506 [details] [diff] [review]
use the return value of ReplaceSegment in SetRef

Can we land this on 1.9.1 so we can get the PGO improvements landed there also?
Did you report this to Microsoft?
No. I thought about trying to make a reduced testcase and just couldn't muster up the energy. Also, I don't know where the canonical location to report bugs to Microsoft is.
http://connect.microsoft.com/VisualStudio ?

Also couldn't you just give them the full preprocessed source file?
Well, the bug only shows up with PGO, so you'd need something you could actually compile, link and run.
Comment on attachment 364506 [details] [diff] [review]
use the return value of ReplaceSegment in SetRef

a191=beltzner
Attachment #364506 - Flags: approval1.9.1? → approval1.9.1+
Mass change: adding fixed1.9.2 keyword

(This bug was identified as a mozilla1.9.2 blocker which was fixed before the mozilla-1.9.2 repository was branched (August 13th, 2009) as per this query: http://is.gd/2ydcb - if this bug is not actually fixed on mozilla1.9.2, please remove the keyword. Apologies for the bugspam)
Keywords: fixed1.9.2
status1.9.2: --- → beta1-fixed
Keywords: fixed1.9.2
You need to log in before you can comment on or make changes to this bug.