Browser built with mingw-w64 crashes on shutdown (Faulting module name: nssckbi.dll_unloaded)

RESOLVED FIXED in 3.39

Status

enhancement
RESOLVED FIXED
2 years ago
11 months ago

People

(Reporter: gk, Assigned: tjr)

Tracking

(Blocks 2 bugs)

other
3.39
All
Windows
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [tor])

Attachments

(1 attachment, 10 obsolete attachments)

Reporter

Description

2 years ago
After we switched Tor Browser to ESR 52 we got reports of crashes during shutdown (Faulting module name: nssckbi.dll_unloaded).

It turns out we had to deal with those already quite some time ago (see: https://trac.torproject.org/projects/tor/ticket/10761 and bug 1151447). Back then during the ESR 38 timeframe they got fixed by removing the -mnop-fun-dllimport flag for NSS builds. And this was before library folding (bug 856404) hit m-c.

During ESR 45 we had to revert bug 856404 due to bug 1248552 and did not get a single report about shutdown crashes. They only showed up again shortly after switching to ESR 52 which has library folding enabled.
Reporter

Comment 1

2 years ago
Jacek, I am not sure what is going on. Could that be related to the library folding being enabled and/or MOZ_FOLD_LIBS_FLAGS="-mnop-fun-dllimport" (still) being available in old-configure.in (https://dxr.mozilla.org/mozilla-esr52/source/old-configure.in#1005)?

FWIW the bug on our side is https://trac.torproject.org/projects/tor/ticket/22581.
Flags: needinfo?(jacek)
Whiteboard: [tor]

Comment 2

2 years ago
Yes, it may be related. Now that libraries are folded, there should not be inter-library calls that were problematic previously, but there may be other similar calls (I can't test myself ATM, sorry). Ideally we wouldn't use -mnop-fun-dllimport, but that's impossible with current way libraries are folded. Maybe there is a workaround for that crash, but we'd need to know which exact call is problematic.
Flags: needinfo?(jacek)
Reporter

Comment 3

2 years ago
Okay, I'll try to get a debug build done and some stack traces from the problematic machine. It could take a while, though.
Reporter

Comment 4

2 years ago
Yesterday I got a machine in my hands exhibiting the problem. Here is the gdb backtrace I got with a non-stripped build:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1048.0x1cc]
0x6edfda4c in ?? ()
(gdb) bt
#0  0x6edfda4c in ?? ()
#1  0x634cca60 in _PR_DestroyThreadPrivate ()
   from C:\Users\user\Desktop\Tor Browser\Browser\nss3.dll
#2  0x634cbb1a in _PR_CleanupThread ()
   from C:\Users\user\Desktop\Tor Browser\Browser\nss3.dll
#3  0x634ca36c in _PR_NativeRunThread ()
   from C:\Users\user\Desktop\Tor Browser\Browser\nss3.dll
#4  0x634bc19e in pr_root@4 ()
   from C:\Users\user\Desktop\Tor Browser\Browser\nss3.dll
#5  0x6c29c556 in endthreadex ()
   from C:\Users\user\Desktop\Tor Browser\Browser\msvcr100.dll
#6  0x6c29c600 in endthreadex ()
   from C:\Users\user\Desktop\Tor Browser\Browser\msvcr100.dll
#7  0x7572ef8c in KERNEL32!AcquireSRWLockExclusive ()
   from C:\Windows\system32\kernel32.dll
#8  0x76ed367a in ntdll!RtlInsertElementGenericTableAvl ()
   from C:\Windows\system32\ntdll.dll
#9  0x76ed364d in ntdll!RtlInsertElementGenericTableAvl ()
   from C:\Windows\system32\ntdll.dll
#10 0x00000000 in ?? ()

Jacek: Does that ring some bell?
Flags: needinfo?(jacek)

Comment 5

2 years ago
Sorry for late response. I can't be sure, but it looks like -mnop-fun-dllimport may be the problem once again. The thing is that we can't do libraries folding without it. Maybe, if it's just one place that's problematic, we could hack it in problematic cases. It crashes when destroying thread private data via registered destructors. It would be interesting to know, where the problematic destructor was registered.
Flags: needinfo?(jacek)
Reporter

Comment 6

2 years ago
Okay, thanks. I'll have access to the machine where this is reproducible in a couple of weeks again. I guess I'll compile a bundle without library folding again and see whether that fixes it. Meanwhile, how would I find out where the destructor got resgistered assuming I have an unstripped build and gdb handy? Would it help to have stack traces from all threads and not just the crashing one? Or...?
Flags: needinfo?(jacek)

Comment 7

2 years ago
One way would be to try to track address of instruction on top of the stack. It's 0x6edfda4c in backtrace from comment 4. If the crash is indeed similar to observed previously, it will be a thunk in unloaded library.

Another thing worth trying is a breakpoint at PR_NewThreadPrivateIndex and watch dtor arguments passed there.
Flags: needinfo?(jacek)

Updated

Last year
Product: Core → Firefox Build System
Assignee

Comment 8

Last year
Turns out :dmajor and I have been chasing this bug for the past couple of weeks.

I think there's five non-null dtors that get registered:
A: https://searchfox.org/mozilla-central/source/ipc/glue/BackgroundImpl.cpp#413
B: https://searchfox.org/mozilla-central/source/toolkit/crashreporter/ThreadAnnotation.cpp#206
C: https://searchfox.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#100
D: https://searchfox.org/mozilla-central/source/nsprpub/pr/src/threads/prrwlock.c#417
E: https://searchfox.org/mozilla-central/source/nsprpub/pr/src/malloc/prmem.c#453

I've seen successful calls to:
22:37:39     INFO -  GECKO(4164) | function pointer call index 4: 00000000030488E0
22:37:40     INFO -  GECKO(4164) | function pointer call index 3: 00000000038AA910
22:37:40     INFO -  GECKO(4164) | function pointer call index 7: 000000006B6A2EB0

And then 
22:37:40     INFO -  GECKO(4164) | function pointer call index 8: 0000000069A81458 <- Failure

3 and 4 are in xul.dll:
22:37:41     INFO -  0x02620000 - 0x0d48bfff  xul.dll  60.0.0.6697  (WARNING: No symbols, , 4F252EBC1D196EB8F28B802D8950E0951)
And are probably A, B, or C

7 is in nss:
22:37:41     INFO -  0x6b640000 - 0x6b93bfff  nss3.dll  60.0.0.6697  (WARNING: No symbols, , BB0FF785D171D372AB55BFCF3F03C9C61)

:dmajor found the unloaded module and connected out issue to this bug:
> <Unloaded_nssckbi.dll>+0x1458:
> 00000000`69a81458 ??              ???
Assignee: nobody → tom
The only one from that list that I could find in nssckbi was this: https://searchfox.org/mozilla-central/rev/53afcfdbabed96883126d0ebbcac499b358e32f2/security/nss/lib/base/error.c#68

Which on an MSVC build looks like:
nssckbi!error_once_function [z:\build\build\src\security\nss\lib\base\error.c @ 67]:
00000001`800022b4 488b15b5be0000  mov     rdx,qword ptr [nssckbi!_imp_PR_Free (00000001`8000e170)]
00000001`800022bb 488d0dceaf0500  lea     rcx,[nssckbi!error_stack_index (00000001`8005d290)]
00000001`800022c2 48ff2587be0000  jmp     qword ptr [nssckbi!_imp_PR_NewThreadPrivateIndex (00000001`8000e150)]

PR_Free is imported from nss3 by nssckbi. When nssckbi registers `PR_Free` as that function pointer, we go and grab the value out of the import table and so the function pointer we register is actually code in NSS, and we can survive the unloading of nssckbi. That's on an MSVC build.

What if, for some reason mingw instead registers some thunk within nssckbi that says "call qword ptr [nssckbi!_imp_PR_Free]"? Then if nssckbi gets unloaded, that function pointer is no longer usable.
> What if, for some reason mingw instead registers some thunk within nssckbi
> that says "call qword ptr [nssckbi!_imp_PR_Free]"? Then if nssckbi gets
> unloaded, that function pointer is no longer usable.

https://searchfox.org/mozilla-central/rev/53afcfdbabed96883126d0ebbcac499b358e32f2/old-configure.in#885-886
I wonder if we could remove -mnop-fun-dllimport just for the one .c file containing error_once_function? Or even move that function to its own file...
Or maybe worst case do a GetProcAddress to get to the actual code of PR_Free.
Assignee

Comment 13

Last year
(In reply to David Major [:dmajor] from comment #11)
> I wonder if we could remove -mnop-fun-dllimport just for the one .c file
> containing error_once_function? Or even move that function to its own file...

I have been messing with the NSS gyp files for a bit and have had no luck at moving the flag into them (to subsequently remove it for the one file.)  I think I'm going to need some NSS help on how to make this happen....

(I confirmed it doesn't build without the flag entirely.)

Comment 14

Last year
We will soon need to change -mnop-fun-dllimport for supporting clang builds (bug 1390583) since it's not supported by clang. Proposed change may be useful here as well:

-        MOZ_FOLD_LIBS_FLAGS="-mnop-fun-dllimport"
+        MOZ_FOLD_LIBS_FLAGS="-Ddllimport=noop"

Once we have that, we could just

#undef dllimport

wherever we need in source/header files. We could even combine that with #pragma push_macro/pop_macro to limit its scope to problematic functions.
Assignee

Comment 15

Last year
Still analyzing the results but changing -mnop-fun-dllimport to -Ddllimport=noop did not work (was it intended to work with gcc?)

[task 2018-05-07T14:18:27.482Z] 14:18:27     INFO -  /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/mingw32/bin/x86_64-w64-mingw32-gcc -std=gnu99 -mwindows -o error.o -c -DDEBUG -DNSS_FIPS_DISABLED -DNSS_NO_INIT_SUPPORT -DNSS_X86_OR_X64 -DNSS_X64 -DNSS_USE_64 -DUSE_UTIL_DIRECTLY -DNO_NSPR_10_SUPPORT -DSSL_DISABLE_DEPRECATED_CIPHER_SUITE_NAMES -D_WINDOWS -DWIN95 -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_WARNINGS -DNSS_DISABLE_LIBPKIX -I/builds/worker/workspace/build/src/security/nss/lib/base -I/builds/worker/workspace/build/src/obj-firefox/security/nss/lib/base/base_nssb -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/private/nss -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -I/builds/worker/workspace/build/src/obj-firefox/dist/include -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wduplicated-cond -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=free-nonheap-object -Wno-unknown-pragmas -Wno-unused-function -Wno-switch -Wno-enum-compare -fno-strict-aliasing -mms-bitfields -fno-keep-inline-dllexport -ffunction-sections -fdata-sections -Wa,-mbig-obj -fno-math-errno -pthread -pipe -g -fno-omit-frame-pointer -Ddllimport=noop  -MD -MP -MF .deps/error.o.pp   /builds/worker/workspace/build/src/security/nss/lib/base/error.c

> [task 2018-05-07T14:18:27.483Z] 14:18:27     INFO -  /builds/worker/workspace/build/src/obj-firefox/dist/include/nspr/prlock.h:51:1: warning: 'noop' attribute directive ignored [-Wattributes]

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dde068be0fcc2719ffdf49356a6150a9c9662111
Assignee

Comment 16

Last year
(In reply to Tom Ritter [:tjr] from comment #15)
> Still analyzing the results but changing -mnop-fun-dllimport to
> -Ddllimport=noop did not work (was it intended to work with gcc?)

Sorry ignore that, I need to analyze the results more...
Assignee

Comment 17

Last year
(In reply to Tom Ritter [:tjr] from comment #16)
> (In reply to Tom Ritter [:tjr] from comment #15)
> > Still analyzing the results but changing -mnop-fun-dllimport to
> > -Ddllimport=noop did not work (was it intended to work with gcc?)
> 
> Sorry ignore that, I need to analyze the results more...

Okay, I can confirm that while the build did compile with -Ddllimport=noop; undefing it for the one function, as shown in this build https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c53a08e2be15c51099a921709b1ccd409271640 , the thunk is still generated:

> nss3!PR_DelSleepQ+0x335:
> 00000000`6b6d6965 ffd0            call    rax {nssckbi+0x1458 (00000000`69a81458)}
> 0:070> uf nssckbi+0x1458
> nssckbi+0x1458:
> 00000000`69a81458 ff25da0e0700    jmp     qword ptr [nssckbi!C_GetFunctionList+0x6f878 (00000000`69af2338)]  Branch
> 0:070> t
> nssckbi+0x1458:
> 00000000`69a81458 ff25da0e0700    jmp     qword ptr [nssckbi!C_GetFunctionList+0x6f878 (00000000`69af2338)] ds:00000000`69af2338={nss3!PR_Free (00000000`6b6a2e80)}
> 0:070> t
> nss3!PR_Free:
> 00000000`6b6a2e80 55              push    rbp
> undefing it for the one function, as shown in this build
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=0c53a08e2be15c51099a921709b1ccd409271640 , the thunk
> is still generated:

That patch didn't really undefine the culprit dllimport. Presumably, somewhere in the preprocessed error.c, there is some #included definition of PR_Free containing a dllimport declaration. *That's* what you want to undefine. Try moving the #undef dllimport to the top of this file, before any #includes.
Or, perhaps you'll need...
#undef dllimport
#define dllimport noop
...to match the spirit of the previous patch.

For good measure you may want to move error_once_function to its own .c file in case the remainder of error.c doesn't want its dllimport definition messed with.
Assignee

Updated

Last year
Blocks: 1434316
Assignee

Comment 20

Last year
Okay, we tried three approaches to fix this.  

1) https://hg.mozilla.org/try/rev/e9d1d7d60c9034526008709dd3823a428c1a07bf

This adds a new function 
>PR_IMPLEMENT(PRStatus) PR_NewThreadPrivateIndexWithPR_Free(
>    PRUintn *newIndex)
>{
>    return PR_NewThreadPrivateIndex(newIndex, PR_Free);
>}
that we call in error_once_function instead, to avoid storing the address of a thunk.

2) https://hg.mozilla.org/try/rev/d85a8ca8b416b26055ce23bd7456e6c0a350a5c6

This changes mnop-fun-dllimport to Ddllimport=noop and then undefines it for PR_Free only in error.c

The uglier part is that without it, it's looking for __imp_PR_Free which doesn't exist. So we we add it with 'typeof(PR_Free) *__imp_PR_Free = PR_Free;'

3) https://hg.mozilla.org/try/rev/19f8badaebec136c917a49980ad88835f308d279

This is the same idea as #2, but it uses mno-nop-fun-dllimport. To make it work, we have to move error_once_function to a new file, and then add a new gyp target. 


#3 is extremely ugly. #2 seems less ugly than #1.  So the patch proposal is #2.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Updated

Last year
Attachment #8974759 - Flags: review?(franziskuskiefer)
Comment on attachment 8974759 [details]
Bug 1389967 In MinGW, work around a pointer to a function thunk disappearing when we unload nssckbi

https://reviewboard.mozilla.org/r/243152/#review249538

As your name says this is pretty hacky... Looking at the 3 options you mention I think I'd prefer #1. It gives us a real solution to the problem instead of hacking around it.

::: nsprpub/pr/src/malloc/prmem.c:462
(Diff revision 3)
>      else
>          free(ptr);
>  }
>  
> +// See the comment about MINGW_HACK in prmem.h
> +#if defined(__MINGW32__)

Is this always needed or only `#ifdef MINGW_HACK`?
Attachment #8974759 - Flags: review?(franziskuskiefer)

Comment 25

Last year
mozreview-review
Comment on attachment 8974759 [details]
Bug 1389967 In MinGW, work around a pointer to a function thunk disappearing when we unload nssckbi

https://reviewboard.mozilla.org/r/243152/#review249592

::: nsprpub/pr/include/prmem.h:63
(Diff revision 3)
> + ** we're not doing the fixup there! So inside prmem.c we create a
> + ** global function pointer to the real PR_Free that has the incorrect
> + ** name: __imp_PR_Free
> + */
> +#ifdef MINGW_HACK
> +#pragma push_macro("dllimport")

Do we want to limit the change to error.c? If we're going to support it anyway, it may be a good idea to have it enabled for other files as well. Other callers may also benefit from that. I'd suggest to change it to #ifdef __MINGW32__ or even #ifdef __MINGW_IMP_SYMBOL and leave error.c unchanged.

::: nsprpub/pr/src/malloc/prmem.c:520
(Diff revision 3)
>  #endif
>  }
>  
> +// See the comment about MINGW_HACK in prmem.h
> +#if defined(__MINGW32__)
> +#if defined(HAVE_64BIT_BUILD)

mingw-w64 provides __MINGW_IMP_SYMBOL macro for that. It may be better to use it.

Comment 26

Last year
(In reply to Franziskus Kiefer [:fkiefer or :franziskus] from comment #24)
> As your name says this is pretty hacky... Looking at the 3 options you
> mention I think I'd prefer #1. It gives us a real solution to the problem
> instead of hacking around it.

FWIW (I'm not suggesting to do that, but it's probably too much of complication, I think we should take one of Tom's solution, but I think it's worth mentioning), I think that the real non-hack solution would be changing build system to not link the same object files into different modules, where sometimes they need dllimport, sometimes live in the same library as imported functions. MSVC somehow can handle that (although AFAIK it gives warnings and code may be sub-optimal), but it's not really the right thing to do. I recall that we had a bug for that, but I can't find it.
Comment hidden (mozreview-request)
Comment on attachment 8974759 [details]
Bug 1389967 In MinGW, work around a pointer to a function thunk disappearing when we unload nssckbi

https://reviewboard.mozilla.org/r/243152/#review250200

I agree with Jacek that the way this is built is the real issue. But that's unfortunately nothing we can easily change.
From the doable solutions I like this one best.
To get this landed we need two separate patches for NSS and NSPR though.
Attachment #8974759 - Flags: review?(franziskuskiefer)
Assignee

Updated

Last year
Attachment #8974759 - Attachment is obsolete: true
Assignee

Comment 29

Last year
Posted patch nss.patch (obsolete) — Splinter Review
Attachment #8976283 - Flags: review?(franziskuskiefer)
Assignee

Comment 30

Last year
Posted patch nspr.patch (obsolete) — Splinter Review
Attachment #8976284 - Flags: review?(franziskuskiefer)
Assignee

Comment 31

Last year
Franziskus - I will want to uplift these to esr60; what is the process for doing that with NSS/NSPR?
Assignee

Updated

Last year
Blocks: 1462143
Comment on attachment 8976283 [details] [diff] [review]
nss.patch

Review of attachment 8976283 [details] [diff] [review]:
-----------------------------------------------------------------

thanks. Can you put up a patch that I can apply? :) (this patch has windows line endings)
Attachment #8976283 - Flags: review?(franziskuskiefer) → review+
Comment on attachment 8976284 [details] [diff] [review]
nspr.patch

Review of attachment 8976284 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #8976284 - Flags: review?(franziskuskiefer) → review+
(In reply to Tom Ritter [:tjr] from comment #31)
> Franziskus - I will want to uplift these to esr60; what is the process for
> doing that with NSS/NSPR?

Well that's a little difficult. We'd have to make new point releases for NSS and NSPR. There are currently no plans for an NSS 3.36.2 release and I don't know if we can require a new NSPR version for an NSS point release. So I'm not sure if this going to happen. Kai, what do you think?
Flags: needinfo?(kaie)
Assignee

Comment 35

Last year
Posted patch nss.patch (obsolete) — Splinter Review
I'm not sure why one had Windows line endings and the other didn't, but maybe this will work...?
Attachment #8976283 - Attachment is obsolete: true
Attachment #8976530 - Flags: review?(franziskuskiefer)
Attachment #8976530 - Flags: review?(franziskuskiefer) → review+
IIUC this fix is only required for the Windows environment.

The version dependency chain is tricky. If we change NSS on a stable branch on all platforms to require a new NSPR function, and if any packager fails to increase the version dependency on the NSPR package, things break. We had this happen in the past, when we changed NSS to use PR_GetEnvSecure instead of getenv.

Because you're changing nss/lib/base/error.c I suspect that the effect won't be limited to nssckbi, but will affect all NSS libs, which would then all require the newer NSPR version with the new symbol.

Could we limit the use of this new API to the platform that requires it? Could the error_once_function contain an #idef that uses the new API on mingw, only, and keeps using the old API on all other environments?

Are nspr/nss ever packaged separately from the application with mingw builds? If in all mingw builds the application is always bundled together with the nspr/nss libraries, then you have full control, and could use a local patch in your application to fix this, because you don't depend on external nspr/nss packages to contain this fix.

Is this correct? If yes, it would avoid a lot of work, because we wouldn't have to work on the uplifting to older branches.

Instead of branch uplifting, we could limit this change to the latest development versions. We could add the new API in NSPR 4.20. And NSS 3.38 could contain the updated error_once_function with the #ifdef to use the new API on the mingw environment, only. We can pick that up for Firefox 62.

If you require this to be added to the official Firefox 60 ESR.x branch, could you please briefly justify why?

If it's really necessary, we'd have to create an NSPR 4.19.1 release, and also NSS 3.36.2 (ESR 60) and NSS 3.37.1 (FF 61) releases.

In any case, there's one more change that the NSPR patch will require. All exported NSPR functions must be declared in file pr/src/nspr.def. You'd have to add a new section at the end of that file, with the first version of the library that provides it.
Flags: needinfo?(kaie)
Also, I have an idea for a cleaner and more general solution.

IIUC, you require a mechanism that returns a function pointer from the point of view from within the NSPR library.

How about a new API PR_GetNSPRScopeFunctionPointer(const char *name_of_function).

This function could contain a switch that returns function pointers based on the given name parameter.
If we ever require additional function pointers, besides PR_Free, we could reuse that API, and could simply extend the implementation of PR_GetNSPRScopeFunctionPointer to support more names. We'd still have to increase the version number dependency on future enhancements. However, with this approach, code that uses this API could be implemented in a semi failsafe way, because they could fall back like this:
  ptr = PR_GetNSPRScopeFunctionPointer("PR_Free");
  if (!ptr) { ptr = PR_Free; }

With this approach, we wouldn't introduce new immediate breakage if the required implementation isn't available in a secondary nspr package. Instead, it would be limited to the crash-at-shutdown that we have today.
Either like this:

static PRStatus
error_once_function(void)
{
#ifdef __MINGW32__
    PRLibrary *lib = NULL;
    if (PR_FindFunctionSymbolAndLibrary("PR_NewThreadPrivateIndexWithPR_Free",
                                        &lib)) {
        PR_UnloadLibrary(lib);
        return PR_NewThreadPrivateIndexWithPR_Free(&error_stack_index);
    }
#endif
    return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free);
}


or like this:

static PRStatus
error_once_function(void)
{
#ifdef __MINGW32__
    PRLibrary *lib = NULL;
    PRFuncPtr *func = NULL;
    if (PR_FindFunctionSymbolAndLibrary("PR_GetNSPRScopeFunctionPointer",
                                        &lib)) {
        PR_UnloadLibrary(lib);
        func = PR_GetNSPRScopeFunctionPointer("PR_Free");
        if (func) {
            return PR_NewThreadPrivateIndex(&error_stack_index, func);
        }
    }
#endif
    return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free);
}
I'd like to ask for the use of PR_FindFunctionSymbolAndLibrary for any official stable branches this must be backported to.

For Firefox 62 and newer, we don't need PR_FindFunctionSymbolAndLibrary but can assume the functions exist.
If MINGW never packages NSPR/NSS separately, and if the use of PR_NewThreadPrivateIndexWithPR_Free (or PR_GetNSPRScopeFunctionPointer) is limited with #idef __MINGW32__, then we don't need PR_FindFunctionSymbolAndLibrary on the stable branch either.
Assignee

Comment 41

Last year
(In reply to Kai Engert (:kaie:) from comment #36)
> IIUC this fix is only required for the Windows environment.

Yes.

> Could we limit the use of this new API to the platform that requires it?
> Could the error_once_function contain an #idef that uses the new API on
> mingw, only, and keeps using the old API on all other environments?

Yes.

> Are nspr/nss ever packaged separately from the application with mingw
> builds? 

This is needed for -esr60 (and eventually -central) builds in TaskCluster and for Tor (who will track esr60 in their own repo).

> If in all mingw builds the application is always bundled together
> with the nspr/nss libraries, then you have full control, and could use a
> local patch in your application to fix this, because you don't depend on
> external nspr/nss packages to contain this fix.

While Tor could apply a local patch in their repo, we would like to have these patches in TaskCluster builds of esr60.  I don't know what the options are for this: apply a patch directly onto -esr60? apply a patch during the ./mach build process?
 
> If you require this to be added to the official Firefox 60 ESR.x branch,
> could you please briefly justify why?

Our goal (in ~1-2 months) is to get MinGW x86/x64 running in esr60 in TC with passing tests. Without this fix, all the tests fail. We'd like to have TC esr60 building and running the MinGW build to avoid unexpected build breakage from changes in esr60.

We can delay bringing this into esr60 until we've cleared all other test blockers, but we would like to bring it in in that timeframe (1-2 months).

> How about a new API PR_GetNSPRScopeFunctionPointer(const char
> *name_of_function).

This approach is fine by me.
Tom, thanks for your response. You also clarified (on IRC) that none of the mingw environments require separate packaging of nss/nspr.

I'd like to dismiss my idea about the generic PR_GetNSPRScopeFunctionPointer API. It doesn't really create a benefit, if the availability of this API cannot guarantee that any function pointer can be obtained with it, but that some kind of version dependency would still be necessary. Also, I'd like to dismiss the idea about checking whether the symbol is present in a shared library. If we are careful about not using it outside of mingw on the stable branches, we should never run into a situation were it isn't available.

I still think that function PR_NewThreadPrivateIndexWithPR_Free isn't ideal.

So here's a new suggestion:

- Instead of PR_NewThreadPrivateIndexWithPR_Free, we create a new NSPR API:
  PR_GetFuncPtr_PR_Free, which just does "return PR_Free".

- we create a new NSPR 4.19.1 release that offers PR_GetFuncPtr_PR_Free
  for all platforms.
  (Both ESR 60 and FF 61 use NSPR 4.19, so this release can be sufficient
   for both.)

- ESR 60 uses NSS 3.36.1
  We release a new NSS 3.36.2, which only uses the new NSPR API on the
  mingw platform. This way we should avoid all dependency issues for
  all platforms that package it separately.

    static PRStatus
    error_once_function(void)
    {
    #ifdef __MINGW32__
        return PR_NewThreadPrivateIndex(&error_stack_index, PR_GetFuncPtr_PR_Free());
    #else
        return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free);
    #endif
    }

- Because FF 61 uses NSS 3.37, we should probably release NSS 3.37.1,
  which contains the same platform specific fix. This will ensure that upgrading
  to a newer version doesn't reintroduce the same issue.

- In theory, for FF 62 and newer, we could get rid of the platform specific
  implementation of the error_once_function.
  However, it might also be fine to keep it. It would be a good documentation
  why this is necessary.
  So my suggestion is, use the same #ifdef on NSS trunk for NSS 3.38+.
(In reply to Kai Engert (:kaie:) from comment #42)
> - Instead of PR_NewThreadPrivateIndexWithPR_Free, we create a new NSPR API:
>   PR_GetFuncPtr_PR_Free, which just does "return PR_Free".

(Sorry if I'm intruding)

Since all of this is limited to MinGW anyway, is `PR_NewThreadPrivateIndexWithPR_Free` much different from just calling the system `GetProcAddress`?
(Er, I meant `PR_GetFuncPtr_PR_Free`.)
I don't know the answer to comment 43 and 44. I have the impression that what matters is the location of the caller. If you try to obtain the function address from within NSS, what you get is apparently different from what you get, if you obtain that address from within the NSPR scope. That's how I understood Tom's explanation. I don't know if GetProcAddress would be independent of the scope from which it is called.
One minor change.

Instead of calling the NSPR release 4.19.1, I'd prefer to call it 4.20, but it depends on being able to convince the ESR 60.x drivers to accept such an upgrade.

So far, the NSPR trunk hasn't received any functional changes. Only platform specific build defines had been added. These changes should be fine for the ESR 60 branch. Calling it 4.20 would be more aligned with the rule that new APIs aren't introduced on stable NSPR branches.

You'd have to ask that ESR 60 gets upgraded to NSPR 4.20 (instead of 4.19.1), which also picks up these two minor changes for the Risc-V and FreeBSD platforms. I think this should be accepteable for the ESR 60.x release drivers.
(In reply to Kai Engert (:kaie:) from comment #45)
> I don't know if GetProcAddress would be independent of the scope from which it is called.

Windows would look it up using nss3.dll's export table, so it would not be from error_once_function's perspective.
Posted patch 1389967-nspr-v4.patch (obsolete) — Splinter Review
NSPR part for my latest suggestion.
Posted patch 1389967-nss-v4.patch (obsolete) — Splinter Review
NSS part
(In reply to David Major [:dmajor] from comment #47)
> (In reply to Kai Engert (:kaie:) from comment #45)
> > I don't know if GetProcAddress would be independent of the scope from which it is called.
> 
> Windows would look it up using nss3.dll's export table, so it would not be
> from error_once_function's perspective.

If GetProcAddress(PR_Free) would solve the issue, that would be great. No change to NSPR would then be necessary.

Tom, would you like to try that?
Assignee

Comment 51

Last year
(In reply to Kai Engert (:kaie:) from comment #50)
> (In reply to David Major [:dmajor] from comment #47)
> > (In reply to Kai Engert (:kaie:) from comment #45)
> > > I don't know if GetProcAddress would be independent of the scope from which it is called.
> > 
> > Windows would look it up using nss3.dll's export table, so it would not be
> > from error_once_function's perspective.
> 
> If GetProcAddress(PR_Free) would solve the issue, that would be great. No
> change to NSPR would then be necessary.
> 
> Tom, would you like to try that?

Sure I'll give it a shot. Won't have results until next week.
Assignee

Comment 52

Last year
Posted patch nss-getprocaddress.patch (obsolete) — Splinter Review
Seems to work!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3dc264cbf93b32256802110a71ebb7de007849f8&filter-searchStr=mingw&selectedJob=179518769


I'd like to hold off on landing this for the time being until I've got everything else in place and know for certain this is sufficient.
Attachment #8976284 - Attachment is obsolete: true
Attachment #8976530 - Attachment is obsolete: true
Attachment #8977033 - Attachment is obsolete: true
Attachment #8977034 - Attachment is obsolete: true
Attachment #8979285 - Flags: review?(franziskuskiefer)
Attachment #8979285 - Flags: review?(dmajor)
Attachment #8979285 - Flags: feedback?(kaie)
Comment on attachment 8979285 [details] [diff] [review]
nss-getprocaddress.patch

Review of attachment 8979285 [details] [diff] [review]:
-----------------------------------------------------------------

I can't review in nss land but this looks good to me!

Are we sure that nss3.dll will always be loaded at the point of this call? Otherwise GetModuleHandle will return null and GetProcAddress will be unhappy. I'll let the reviewers weigh in but IMO we shouldn't worry about it unless we have a known case where nss3 isn't present.

::: lib/base/error.c
@@ +83,5 @@
> + * crash. So we do this bit of ugly to avoid that crash. Fortunately
> + * this is the only place we've had to do this.
> + */
> +#if defined(__MINGW32__)
> +    HMODULE nss3 = GetModuleHandle("nss3");

Minor nitpick, I'd write this as GetModuleHandleW(L"nss3.dll") to avoid making Windows do the translation for you. Same 'W' and 'L' on GetProcAddress.
Attachment #8979285 - Flags: review?(dmajor) → feedback+
Might as well use A variants rather than W.
Why? 'A' means that Windows has to do the conversion for you.
Great, so we don't need a new NSPR API, which simplifies the situation a lot.

The implementation of the PR_Free function isn't inside nss3, it's implemented in the nspr4 library.

The Firefox build configuration for Windows uses library folding. Which DLL contains the NSPR library code? You should use GetModuleHandle with that library name.

(I cannot comment on the use of character conversion functions on Windows.)

Once we have found the final code (after the library name and conversion discussion), we should still make a new NSS release for the ESR 60.x branch, IMHO.
On Windows, the implementation of PR_Free is indeed in nss3.dll.
Assignee

Comment 58

Last year
Posted patch nss-getprocaddress.patch (obsolete) — Splinter Review
Updated patch to use W variants.
Attachment #8979285 - Attachment is obsolete: true
Attachment #8979285 - Flags: review?(franziskuskiefer)
Attachment #8979285 - Flags: feedback?(kaie)
Attachment #8980590 - Flags: review?(franziskuskiefer)
Assignee

Updated

Last year
Attachment #8980590 - Attachment is patch: true
Assignee

Comment 59

Last year
Posted patch nss-getprocaddress.patch (obsolete) — Splinter Review
Attachment #8980590 - Attachment is obsolete: true
Attachment #8980590 - Flags: review?(franziskuskiefer)
Attachment #8980633 - Flags: review?(franziskuskiefer)
Comment on attachment 8980633 [details] [diff] [review]
nss-getprocaddress.patch

Review of attachment 8980633 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, except for the comment.
Let me know then when you want this landed.

::: lib/base/error.c
@@ +84,5 @@
> + * this is the only place we've had to do this.
> + */
> +#if defined(__MINGW32__)
> +    HMODULE nss3 = GetModuleHandleW(L"nss3");
> +    return PR_NewThreadPrivateIndex(&error_stack_index, GetProcAddress(nss3, "PR_Free"));

I'd like to be prudent here and check that nss3 and the result of GetProcAddress are not null.
Attachment #8980633 - Flags: review?(franziskuskiefer)

Comment 61

Last year
FWIW, in mingw clang build, I managed to support library folding without ignoring dllimport attribute. This means that clang builds shouln't be affected by this bug.

Updated

Last year
Depends on: mingw-clang
Assignee

Updated

11 months ago
Blocks: mingw-clang
No longer depends on: mingw-clang

Comment 62

11 months ago
I'm not sure it should block 1465790. In clang builds we don't use dllimport tricks, so this bug should not happen.
Assignee

Comment 63

11 months ago
Yes; you're right.
No longer blocks: mingw-clang
Assignee

Comment 64

11 months ago
Posted patch nss-getprocaddess.patch (obsolete) — Splinter Review
Forget to update this patch to check return values; finally did so.
Attachment #8980633 - Attachment is obsolete: true
Attachment #8993135 - Flags: review?(franziskuskiefer)
Comment on attachment 8993135 [details] [diff] [review]
nss-getprocaddess.patch

Review of attachment 8993135 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm but can you make one that doesn't have windows line endings?
Attachment #8993135 - Flags: review?(franziskuskiefer)
Assignee

Comment 66

11 months ago
(In reply to Franziskus Kiefer [:fkiefer or :franziskus] from comment #65)
> Comment on attachment 8993135 [details] [diff] [review]
> nss-getprocaddess.patch
> 
> Review of attachment 8993135 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> lgtm but can you make one that doesn't have windows line endings?

Hopefully this works. I don't know why that happens, I didn't write it on Windows, I didn't post it on Windows, etc....
Attachment #8993135 - Attachment is obsolete: true
Assignee

Updated

11 months ago
Attachment #8993399 - Flags: review?(franziskuskiefer)
Attachment #8993399 - Flags: review?(franziskuskiefer) → review+
https://hg.mozilla.org/projects/nss/rev/783d13e6d3458f291d2b1ae79c7e67bebfa931ed
Status: NEW → RESOLVED
Closed: 11 months ago
Component: General → Libraries
Product: Firefox Build System → NSS
Resolution: --- → FIXED
Target Milestone: --- → 3.39
Version: 52 Branch → other
You need to log in before you can comment on or make changes to this bug.