Closed Bug 396509 Opened 17 years ago Closed 16 years ago

Standard Vista account with Parental Controls turned on cannot launch Firefox

Categories

(Firefox :: General, defect, P1)

x86
Windows Vista
defect

Tracking

VERIFIED FIXED
Firefox 3 beta3

People

(Reporter: abillings, Assigned: jimm)

Details

(Keywords: relnote)

Attachments

(1 file, 2 obsolete files)

If a "Standard" account on Windows Vista has Parental Controls turned on for it at all, when that account tries to launch Firefox, the "busy" mouse pointer will appear for a second and then disappear. Nothing else happens and Firefox does not launch. 

If Parental Controls are turned off for the same user, Firefox will launch. If they are turned back on, it will fail to launch again.

This was found in the M8 RC1 on Windows Vista Ultimate edition: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9a8) Gecko/2007091216 GranParadiso/3.0a8
Does this also occur with M7? If not this may have to do with some of the recent work that was done with parental controls.
I installed M7 in a different directory on the same machine and it has the same issue.
Hmmm... I'm surprised M7 didn't work. This has worked in the past, so a regression range would be appreciated.
Flags: blocking-firefox3?
Can one of the other QA with Vista Ultimate provide a regression range? I'm not in the office tonight or tomorrow (Tuesday) and our virtual machines are, as it turns out, Vista Business and do not have Parental Controls. One of the QA lab machines has Ultimate.
The hold up occurs before main is called, during dll initialization.

normal load: 
..
Loaded 'C:\Windows\System32\psapi.dll'
Loaded 'C:\Windows\System32\samlib.dll'
Loaded 'C:\Windows\System32\mswsock.dll'
Loaded 'C:\Windows\System32\wship6.dll'
Loaded 'C:\Windows\System32\uxtheme.dll'
Loaded 'C:\Windows\System32\setupapi.dll'
..

Frozen load:

Loaded 'C:\Windows\System32\mswsock.dll'
Loaded 'C:\Windows\System32\wship6.dll'
(freeze)

The exe isn't frozen, it's waiting, here's the call stack - 

ntdll.dll!__LdrpInitialize@8()  - 0x2cf7 bytes	
ntdll.dll!_LdrInitializeThunk@8()  + 0x10 bytes	

It's basically looping around in LdrpInitialize waiting on something to finish up. A regression window might be handy, tracking this down could be a real pain otherwise.

mconnor and I are thinking that this doesn't fully block the M8 release, but obviously blocks final release.
Flags: blocking-firefox3? → blocking-firefox3+
Keywords: relnote
Target Milestone: --- → Firefox 3 M9
I am hunting down a regression range. This doesn't work with Alpha6 either.
Jim, didn't you do some work with parental controls a while back? If so, what version of the trunk were you using?
I just wanted to note quickly how I am testing this:

1. Install Minefield build into the default directory.
2. Make sure PC are turned on for the other user on my machine.
3. With Minefield running, switch to the other user account.
4. Attempt to launch Minefield.
5. I do see the busy mouse pointer, but then it goes away and you really don't know where you stand.
Yes. I ran into this on some pre-M8 work with the download manager, but I made the assumption it was something funny in working with debug builds. Most of the testing I was doing was with a cgi proxy I had set up on my desktop.
I tested it the same way as Marcia except I stopped the browser in step 3 before logging in with the other account. I did try and uninstall and reinstall with both admin and standard accounts as well. Standard required an admin password to install (as expected).
I was asked to test Alpha 1 of Minefield (Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9a1) Gecko/20061204 GranParadiso/3.0a1), and I'll note that it's not working there, either.
(In reply to comment #5)
> The hold up occurs before main is called, during dll initialization.
> 
> normal load: 
> ..
> Loaded 'C:\Windows\System32\psapi.dll'
> Loaded 'C:\Windows\System32\samlib.dll'
> Loaded 'C:\Windows\System32\mswsock.dll'
> Loaded 'C:\Windows\System32\wship6.dll'
> Loaded 'C:\Windows\System32\uxtheme.dll'
> Loaded 'C:\Windows\System32\setupapi.dll'
> ..
> 
> Frozen load:
> 
> Loaded 'C:\Windows\System32\mswsock.dll'
> Loaded 'C:\Windows\System32\wship6.dll'
> (freeze)
> 
> The exe isn't frozen, it's waiting, here's the call stack - 
> 
> ntdll.dll!__LdrpInitialize@8()  - 0x2cf7 bytes  
> ntdll.dll!_LdrInitializeThunk@8()  + 0x10 bytes 
> 
> It's basically looping around in LdrpInitialize waiting on something to finish
> up. A regression window might be handy, tracking this down could be a real pain
> otherwise.

It seems you pointed this out in https://bugzilla.mozilla.org/show_bug.cgi?id=355554#c40, Jim, where you said, "I've been meaning to get
back and figure out why locally built release builds don't launch under these
accounts, but haven't had the time. The exe will spawn, but main never gets
called, seems to have something to do with dll loading."

I tried running even the 2007-08-22 builds with Parental Controls enabled, and they didn't finish loading (firefox.exe was spawned, though no UI came up), until I elevated the process and re-ran as Admin, which worked...
Yep. So how do you guys handle this type of thing? Just randomly pick snapshots prior to December 2006's A1 in an effort to zoom in on it?
I usually use http://archive.mozilla.org/pub/ and use those builds to narrow down the range
Archives, that's handy. :) From my understanding, MOZILLA_1_8_BRANCH was branched 2005-08-12, and there's a 1.8 release in there from 2005-08-09. So, correct me if I'm wrong, but if the latest 1.8 build from this week runs correctly (2.0.0.7 I believe), we then do a manual binary search between 2005-08-09 and M1 (December 2006?) on the trunk builds until we nail it down?
Sounds good to me
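The manual search over archived builds described above can be sketched as a bisection. This is only an illustration: `first_bad_build` and `demo_build_works` are hypothetical names, and the predicate stands in for a person installing a build and trying to launch it under a Parental Controls account. It assumes the oldest build in the list works, the newest fails, and the behavior flips exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: returns true if the given build launches.
   In practice this is a manual test, not a function call. */
typedef bool (*BuildWorksFn)(const char *build_id);

/* Builds are sorted oldest to newest; builds[0] is known to work and
   builds[n - 1] is known to fail. Returns the index of the first
   failing build, testing only O(log n) builds along the way. */
size_t first_bad_build(const char *const builds[], size_t n, BuildWorksFn works)
{
    assert(n >= 2);
    size_t good = 0;      /* newest build known to work */
    size_t bad = n - 1;   /* oldest build known to fail */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (works(builds[mid]))
            good = mid;
        else
            bad = mid;
    }
    return bad;
}

/* Demo stand-in for the manual test: pretends every build dated
   before 2005-12-14 works. ISO dates compare correctly as strings. */
bool demo_build_works(const char *build_id)
{
    return strcmp(build_id, "2005-12-14") < 0;
}
```

The loop keeps one known-good and one known-bad build and halves the gap each round, so the result is always an adjacent WORKS/FAILS pair of builds.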
I'll work on getting a regression range within a day or two, max (if nobody beats me to it).
(In reply to comment #18)
> I'll work on getting a regression range within a day or two, max (if nobody
> beats me to it).

Regression range is:

2005-12-13 06 WORKS
2005-12-14 06 FAILS

Bonsai query of checkin window:

http://tinyurl.com/2froq3

That leaves bug 316416 as the likely suspect.
Blocks: 316416
Keywords: qawanted
If the hang happens before we enter main(), I don't see how this could be blamed on any code change in our product... a linker or compiler change, perhaps.

Are there any other threads running? It might be nice to get a stack from all threads to see if there's a deadlock.
ntdll.dll!_KiFastSystemCallRet@0() 	
ntdll.dll!_NtWaitForKeyedEvent@16()  + 0xc bytes	
ntdll.dll!_TppWaitpSet@16()  + 0x84 bytes	
ntdll.dll!_TppSetWaitInterrupt@12()  + 0x7d bytes	
ntdll.dll!_RtlRegisterWait@24()  + 0x178 bytes	
kernel32.dll!_RegisterWaitForSingleObject@24()  + 0x5e bytes	
wpclsp.dll!74dbcade() 	
[Frames below may be incorrect and/or missing, no symbols loaded for wpclsp.dll]	
wpclsp.dll!74db87a0() 	
ws2_32.dll!__SEH_epilog4_GS()  + 0xa bytes	
ws2_32.dll!DPROVIDER::Initialize()  + 0x28c bytes	
ws2_32.dll!_WSASocketW@24()  + 0x9d bytes	
ws2_32.dll!_socket@12()  + 0x56 bytes	
nspr4.dll!_PR_MD_SOCKET(int af=0x00000017, int type=0x00000001, int flags=0x00000000)  Line 152 + 0x1f bytes	C
nspr4.dll!_pr_test_ipv6_socket()  Line 1276 + 0xb bytes	C
nspr4.dll!_pr_init_ipv6()  Line 315	C
nspr4.dll!PR_NewLogModule(const char * name=0x609714a0)  Line 367 + 0xe bytes	C
xul.dll!`dynamic initializer for 'sCookieLog''()  Line 204 + 0xb bytes	C++
>	msvcr80.dll!_initterm(void (void)* * pfbegin=0x6095f80c, void (void)* * pfend=0x6095f914)  Line 855	C


ntdll.dll!_LdrpCallInitRoutine@16()  + 0x14 bytes	
> 	ntdll.dll!_LdrpRunInitializeRoutines@4()  - 0x7e1 bytes	
ntdll.dll!_LdrpInitializeProcess@8()  - 0x1cc bytes	
ntdll.dll!__LdrpInitialize@8()  - 0x4e01 bytes	
ntdll.dll!_LdrInitializeThunk@8()  + 0x10 bytes	

>	ntdll.dll!_KiFastSystemCallRet@0() 	
ntdll.dll!_NtDelayExecution@8()  + 0xc bytes	
ntdll.dll!__LdrpInitialize@8()  - 0x2cdf bytes	
ntdll.dll!_LdrInitializeThunk@8()  + 0x10 bytes	

>	ntdll.dll!__LdrpInitialize@8()  - 0x2ce7 bytes	
ntdll.dll!_LdrInitializeThunk@8()  + 0x10 bytes	

>	ntdll.dll!_KiFastSystemCallRet@0() 	
ntdll.dll!_NtDelayExecution@8()  + 0xc bytes	
ntdll.dll!__LdrpInitialize@8()  - 0x2cdf bytes	
ntdll.dll!_LdrInitializeThunk@8()  + 0x10 bytes	
I'm not familiar with our module code, but it looks like the changes in bug 316416 in combination with our ipv6 networking are the problem.
Sorry, some clipping occurred on the top thread:
ntdll.dll!_KiFastSystemCallRet@0() 	
ntdll.dll!_NtWaitForKeyedEvent@16()  + 0xc bytes	
ntdll.dll!_TppWaitpSet@16()  + 0x84 bytes	
ntdll.dll!_TppSetWaitInterrupt@12()  + 0x7d bytes	
ntdll.dll!_RtlRegisterWait@24()  + 0x178 bytes	
kernel32.dll!_RegisterWaitForSingleObject@24()  + 0x5e bytes	
wpclsp.dll!74dbcade() 	
[Frames below may be incorrect and/or missing, no symbols loaded for wpclsp.dll]	
wpclsp.dll!74db87a0() 	
ws2_32.dll!__SEH_epilog4_GS()  + 0xa bytes	
ws2_32.dll!DPROVIDER::Initialize()  + 0x28c bytes	
ws2_32.dll!_WSASocketW@24()  + 0x9d bytes	
ws2_32.dll!_socket@12()  + 0x56 bytes	
nspr4.dll!_PR_MD_SOCKET(int af=0x00000017, int type=0x00000001, int flags=0x00000000)  Line 152 + 0x1f bytes	C
nspr4.dll!_pr_test_ipv6_socket()  Line 1276 + 0xb bytes	C
nspr4.dll!_pr_init_ipv6()  Line 315	C
nspr4.dll!PR_NewLogModule(const char * name=0x609714a0)  Line 367 + 0xe bytes	C
xul.dll!`dynamic initializer for 'sCookieLog''()  Line 204 + 0xb bytes	C++
>	msvcr80.dll!_initterm(void (void)* * pfbegin=0x6095f80c, void (void)* * pfend=0x6095f914)  Line 855	C
xul.dll!_CRT_INIT(void * hDllHandle=0x603d0000, unsigned long dwReason=0x00000000, void * lpreserved=0x0012fd24)  Line 316 + 0xf bytes	C
xul.dll!__DllMainCRTStartup(void * hDllHandle=0x603d0000, unsigned long dwReason=0x00000000, void * lpreserved=0x00000000)  Line 492 + 0x8 bytes	C
xul.dll!_DllMainCRTStartup(void * hDllHandle=0x603d0000, unsigned long dwReason=0x00000001, void * lpreserved=0x0012fd24)  Line 462 + 0x11 bytes	C
Updates - I tracked down a symbol file for the parental controls lib, and built a fresh debug build from a checkout yesterday. The call stack changed from the original release I was working with, but the culprit remains the call to ipv6 init which calls into wpclsp.dll.

ntdll.dll!_KiFastSystemCallRet@0() 	
ntdll.dll!_NtWaitForKeyedEvent@16()  + 0xc bytes	
ntdll.dll!_TppWaitpSet@16()  + 0x84 bytes	
ntdll.dll!_TppSetWaitInterrupt@12()  + 0x7d bytes	
ntdll.dll!_RtlRegisterWait@24()  + 0x178 bytes	
kernel32.dll!_RegisterWaitForSingleObject@24()  + 0x5e bytes	
wpclsp.dll!CSocketInfo::Initialize()  + 0x37c bytes	
wpclsp.dll!WSPSocket()  + 0x1ae bytes	
ws2_32.dll!_WSASocketW@24()  + 0x9d bytes	
ws2_32.dll!_socket@12()  + 0x56 bytes	
nspr4.dll!_PR_MD_SOCKET(int af=0x00000017, int type=0x00000001, int flags=0x00000000)  Line 152 + 0x11 bytes	C
nspr4.dll!_pr_test_ipv6_socket()  Line 1276 + 0xb bytes	C
nspr4.dll!_pr_probe_ipv6_presence()  Line 306	C
nspr4.dll!_pr_init_ipv6()  Line 314 + 0x5 bytes	C
nspr4.dll!_PR_InitStuff()  Line 255	C
nspr4.dll!_PR_ImplicitInitialization()  Line 266	C
nspr4.dll!PR_NewLogModule(const char * name=0x003601e8)  Line 369	C
xpcom_core.dll!`dynamic initializer for 'gTimerLog''()  Line 56 + 0xe bytes	C++
msvcr80d.dll!__initterm()  + 0x1a bytes	
xpcom_core.dll!_CRT_INIT(void * hDllHandle=0x00280000, unsigned long dwReason=0x00000001, void * lpreserved=0x0012fd24)  Line 316 + 0xf bytes	C
xpcom_core.dll!__DllMainCRTStartup(void * hDllHandle=0x00280000, unsigned long dwReason=0x00000001, void * lpreserved=0x0012fd24)  Line 492 + 0x11 bytes	C
xpcom_core.dll!_DllMainCRTStartup(void * hDllHandle=0x00280000, unsigned long dwReason=0x00000001, void * lpreserved=0x0012fd24)  Line 462 + 0x11 bytes	C
ntdll.dll!_LdrpCallInitRoutine@16()  + 0x14 bytes	
ntdll.dll!_LdrpRunInitializeRoutines@4()  - 0x7e1 bytes	
ntdll.dll!_LdrpInitializeProcess@8()  - 0x1cc bytes	
ntdll.dll!__LdrpInitialize@8()  - 0x4e01 bytes	
ntdll.dll!_LdrInitializeThunk@8()  + 0x10 bytes	
wtc, is there a way to avoid the socket calls from _PR_ImplicitInitialization?
The socket(AF_INET6, SOCK_STREAM, 0) call detects whether
IPv6 support is enabled.  If there is another way to detect
that, we can avoid the socket call.  We can also perform
this test lazily, which should be done only as a last resort.

Please find out why the socket call is problematic.
Otherwise I don't know how to prevent a regression if I
change _PR_ImplicitInitialization in the future.  Also
please let me know if commenting out the socket call
in mozilla/nsprpub/pr/src/io/prsocket.c fixes the hang:

1272 PR_IMPLEMENT(PRBool) _pr_test_ipv6_socket()
1273 {
#if 0  <=== ADD
1274     PROsfd osfd;
1275 
1276     osfd = _PR_MD_SOCKET(AF_INET6, SOCK_STREAM, 0);
1277     if (osfd != -1) {
1278         _PR_MD_CLOSE_SOCKET(osfd);
1279         return PR_TRUE;
1280     }
#endif  <=== ADD
1281     return PR_FALSE;
1282 }
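For readers without the NSPR tree at hand, the probe under discussion amounts to the following. This is a minimal sketch using POSIX sockets rather than Winsock or NSPR's own wrappers, and `ipv6_socket_available` is a hypothetical name, not an NSPR function.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns true if the OS will hand out an IPv6 stream socket.
   The socket is closed immediately; it is only created to see
   whether the address family is supported. */
bool ipv6_socket_available(void)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd == -1)
        return false;   /* e.g. the address family is not supported */
    close(fd);
    return true;
}
```

The call looks cheap, but on Vista with Parental Controls it is routed through wpclsp.dll, which is where the stacks above show the process waiting.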
http://litmus.mozilla.org/show_test.cgi?id=4690

in-litmus + (Marcia wrote the testcase for this; thanks, Marcia!)
Flags: in-litmus+
I'm guessing a bit:

* the Windows DLL initialization code holds a lock while it calls DllMain, which is held while we initialize static objects
* the parental controls DLL creates a thread in its socket hook; that thread is blocked from execution waiting on the DLL init lock
* The main thread issues a blocking wait for the helper thread to do something and deadlocks

From the MSDN docs: "Calling functions that require DLLs other than Kernel32.dll may result in problems that are difficult to diagnose. For example, calling User, Shell, and COM functions can cause access violation errors, because some functions load other system components. Conversely, calling functions such as these during termination can cause access violation errors because the corresponding component may already have been unloaded or uninitialized.

Because DLL notifications are serialized, entry-point functions should not attempt to communicate with other threads or processes. Deadlocks may occur as a result."

If we want to say that NSPR initialization should not occur during DLLMain, Mozilla could try to remove all our uses of
static PRLogModuleInfo *gFoo = PR_NewLogModule("Foo");

But this pattern is pretty ingrained into the Mozilla codebase.
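For illustration only, moving such a log module out of load-time initialization could look roughly like the sketch below. It is plain C with hypothetical names (`LogModule`, `create_log_module`, `get_cookie_log`); it does not use NSPR, and nothing in this bug says Mozilla adopted this approach. The module is created on first use, so nothing runs while the loader is still initializing DLLs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a log module; not NSPR's PRLogModuleInfo. */
typedef struct {
    char name[32];
} LogModule;

static _Atomic(LogModule *) g_cookie_log = NULL; /* nothing created at load time */
static atomic_int g_create_calls;                /* counts creations, for the demo */

/* Hypothetical constructor; in the real code this is where NSPR
   would be implicitly initialized. */
static LogModule *create_log_module(const char *name)
{
    assert(name != NULL);
    LogModule *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    snprintf(m->name, sizeof m->name, "%s", name);
    atomic_fetch_add(&g_create_calls, 1);
    return m;
}

/* Lazy accessor: the first caller creates and publishes the module.
   If two threads race, the loser frees its copy and uses the winner's. */
LogModule *get_cookie_log(void)
{
    LogModule *m = atomic_load(&g_cookie_log);
    if (m != NULL)
        return m;
    LogModule *fresh = create_log_module("cookie");
    if (fresh == NULL)
        return NULL;
    LogModule *expected = NULL;
    if (atomic_compare_exchange_strong(&g_cookie_log, &expected, fresh))
        return fresh;
    free(fresh);
    return expected;   /* the module another thread published first */
}
```

Callers would use `get_cookie_log()` wherever they previously read the static variable, which is why changing an ingrained pattern like this touches so many files.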
-> Jim to drive to resolution, we can relnote this for b1...
Assignee: nobody → jmathies
Target Milestone: Firefox 3 M9 → Firefox 3 M10
>Also
>please let me know if commenting out the socket call
>in mozilla/nsprpub/pr/src/io/prsocket.c fixes the hang:
>
>1272 PR_IMPLEMENT(PRBool) _pr_test_ipv6_socket()
>1273 {
>#if 0  <=== ADD

Yes, this fixes the issue. With this commented out, you can even run locally built debug builds, assuming you approve the app and the standalone windbgdlg through parental controls.
> Please find out why the socket call is problematic.

We are calling this from within the process-attach call on xpcom_core.dll. That entry point is not reentrant and will lock up threads that loop back around with thread-attach calls. My guess is that a thread is being created (most likely in RegisterWaitForSingleObject within the parental controls lib) and that it in turn is trying to attach to the library, causing the hang.

Chen has a nice little write-up on this behavior:
http://blogs.msdn.com/oldnewthing/archive/2007/09/04/4731478.aspx

>The socket(AF_INET6, SOCK_STREAM, 0) call detects whether
>IPv6 support is enabled.  If there is another way to detect
>that, we can avoid the socket call.

From what I've found on various msdn related pages, and specifically this blog post - 

http://blogs.msdn.com/wndp/archive/2006/08/29/Creating-IP-Agnostic-Applications--Part--1.aspx

"For Winsock developers, this is accomplished by calling the socket or WSASocket function with the address family parameter set to AF_INET to test IPv4 availability and AF_INET6 for IPv6."

we're doing the test the right way, just at the wrong time.

Target Milestone: Firefox 3 M10 → Firefox 3 M11
Status: NEW → ASSIGNED
Priority: -- → P3
Attached patch parental controls ipv6 patch v.1 (obsolete) — Splinter Review
Hey all, I'd welcome comments on this. I looked at bug 316416 and decided not to mess with it. So I went with a simple delay on the ipv6 init code. Tested on a Parental Controls account and everything is working fine.
Attachment #291750 - Flags: review?(wtc)
Comment on attachment 291750 [details] [diff] [review]
parental controls ipv6 patch v.1

Jim, thanks for the patch.  Your patch is in the right direction
but it has two problems.

1. The way you ensure that _pr_init_ipv6() is called only once is
not thread-safe:

    if (!_pr_ipv6_initialized) _pr_init_ipv6();

We initialize NSPR this way because we assume threads are created
using NSPR functions, so the program is single-threaded before any
NSPR function is called.  But we can't make this assumption for
_pr_init_ipv6().

You can use PR_CallOnce for this purpose.

2. Any function that tests the global variable _pr_ipv6_is_present
must ensure that _pr_init_ipv6() has been called.  You can find
all such functions using this LXR query:

http://lxr.mozilla.org/nspr/ident?i=_pr_ipv6_is_present

Note that there is a second implementation of PR_Socket for Unix
in ptio.c, and several functions in prnetdb.c, that need to ensure
that _pr_init_ipv6() has been called.
Attachment #291750 - Flags: review?(wtc) → review-
Cool - thanks for the feedback. Another patch forthcoming.
Jim,

Instead of littering the NSPR source tree with PR_CallOnce
calls for _pr_init_ipv6(), I suggest that we take a different
approach.

Rename the global variable _pr_ipv6_is_present and
make it static.

Replace the references to _pr_ipv6_is_present by a
new internal function with the same name.

Add PR_CallOnce calls to the _pr_ipv6_is_present() and
_pr_push_ipv6toipv4_layer() functions.

The reason this approach works is that the global variables
initialized by _pr_init_ipv6()

    _pr_ipv6_is_present (to be renamed ipv6_is_present)
    _pr_ipv6_to_ipv4_id
    ipv6_to_v4_tcpMethods
    ipv6_to_v4_udpMethods

are only used by _pr_ipv6_is_present() and
_pr_push_ipv6toipv4_layer().

I outline this approach below:

static PRBool ipv6_is_present;  /* renamed and made static */

static PRCallOnceType _pr_init_ipv6_once;

static PRStatus PR_CALLBACK _pr_init_ipv6(void)  /* note the static and PR_CALLBACK */
{
    ...
}

PRBool _pr_ipv6_is_present(void)  /* this name is now a function */
{
    if (PR_CallOnce(&_pr_init_ipv6_once, _pr_init_ipv6) != PR_SUCCESS)
        return PR_FALSE;
    return ipv6_is_present;
}

PR_IMPLEMENT(PRStatus) _pr_push_ipv6toipv4_layer(PRFileDesc *fd)
{
    PRFileDesc *ipv6_fd = NULL;

    if (PR_CallOnce(&_pr_init_ipv6_once, _pr_init_ipv6) != PR_SUCCESS)
        return PR_FAILURE;

    ...
}

PR_CallOnce is documented at http://developer.mozilla.org/en/docs/PR_CallOnce.
Please let me know if you have any questions.
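For anyone who wants to try the shape of this outline without building NSPR, here is a self-contained sketch. It substitutes a small spin-based once guard built on C11 atomics for PR_CallOnce, and every name in it (`Once`, `once_call`, `probe_ipv6_stub`, `ipv6_present`) is hypothetical. The stub pretends IPv6 is available instead of creating a socket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { ONCE_NEW = 0, ONCE_RUNNING, ONCE_DONE };

typedef struct {
    atomic_int state;   /* ONCE_NEW when zero-initialized */
    bool ok;            /* result of the init function */
} Once;

typedef bool (*OnceFn)(void);

/* Runs fn exactly once across all callers and returns its result.
   Late callers spin until the first caller finishes; a production
   guard would block instead, and fn must not call once_call on the
   same guard. */
static bool once_call(Once *once, OnceFn fn)
{
    assert(once != NULL && fn != NULL);
    int expected = ONCE_NEW;
    if (atomic_compare_exchange_strong(&once->state, &expected, ONCE_RUNNING)) {
        once->ok = fn();
        atomic_store(&once->state, ONCE_DONE);
    } else {
        while (atomic_load(&once->state) != ONCE_DONE) {
            /* wait for the initializing thread */
        }
    }
    return once->ok;
}

static Once ipv6_once;            /* static storage, so starts as ONCE_NEW */
static bool ipv6_is_present;      /* only read through ipv6_present() */
static atomic_int probe_calls;    /* counts probe runs, for the demo */

/* Hypothetical stand-in for the real probe. */
static bool probe_ipv6_stub(void)
{
    atomic_fetch_add(&probe_calls, 1);
    ipv6_is_present = true;       /* pretend result */
    return true;
}

/* Accessor: the probe runs on the first call, not at load time. */
bool ipv6_present(void)
{
    if (!once_call(&ipv6_once, probe_ipv6_stub))
        return false;
    return ipv6_is_present;
}
```

Because the flag is only reachable through the accessor, no caller can read it before the guarded init has completed.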
Movin' up the priority as I'd really like to get this in Beta3
Priority: P3 → P1
Hey Jim, can you update the status here?
Attached patch parental controls ipv6 patch v.2 (obsolete) — Splinter Review
Attachment #291750 - Attachment is obsolete: true
Attachment #295828 - Flags: review?(wtc)
Jim, thanks for the patch.  I fixed some minor problems I
found during code review and took the opportunity to clean
up the code a little bit.  Could you please review and test
it?

I checked in the patch on the NSPR trunk for NSPR 4.7.
I will create a new NSPR CVS tag for the Mozilla trunk to
pull next week.

Checking in misc/prinit.c;
/cvsroot/mozilla/nsprpub/pr/src/misc/prinit.c,v  <--  prinit.c
new revision: 3.47; previous revision: 3.46
done
Checking in misc/prnetdb.c;
/cvsroot/mozilla/nsprpub/pr/src/misc/prnetdb.c,v  <--  prnetdb.c
new revision: 3.56; previous revision: 3.55
done
Checking in io/pripv6.c;
/cvsroot/mozilla/nsprpub/pr/src/io/pripv6.c,v  <--  pripv6.c
new revision: 3.13; previous revision: 3.12
done
Checking in io/prsocket.c;
/cvsroot/mozilla/nsprpub/pr/src/io/prsocket.c,v  <--  prsocket.c
new revision: 3.61; previous revision: 3.60
done
Checking in pthreads/ptio.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptio.c,v  <--  ptio.c
new revision: 3.108; previous revision: 3.107
done
Attachment #295828 - Attachment is obsolete: true
Attachment #296771 - Flags: review?(jmathies)
Attachment #295828 - Flags: review?(wtc)
Upgraded the NSPR tag in mozilla/client.mk on the Mozilla trunk
to NSPR_HEAD_20080113 to pick up this fix.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Thanks for the help - confirmed with a checkout this morning. This is working correctly: trunk is now booting correctly on parental controls accounts under Vista.
Attachment #296771 - Flags: review?(jmathies) → review+
Verified FIXED in both Windows Vista Ultimate and Windows Vista Home Premium with:

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b3pre) Gecko/2008011405 Minefield/3.0b3pre
Status: RESOLVED → VERIFIED