Closed Bug 1451255 Opened 6 years ago Closed 3 years ago

Assertion failure: false (Tried and failed to create an XPCOM log), at src/xpcom/base/nsTraceRefcnt.cpp:715 and crash when "XPCOM_MEM_BLOAT_LOG" contains a unicode character

Categories

(Core :: XPCOM, defect, P3)

60 Branch
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: froydnj)

References

(Blocks 2 open bugs)

Details

(Keywords: assertion, crash)

Attachments

(4 files)

To reproduce have a debug or logrefcnt build and then run:

> XPCOM_MEM_BLOAT_LOG="
Here the log output and the still to fix stack:

### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to 
As it looks like Bugzilla doesn't handle unicode characters at the moment and cuts-off all remaining content from the comment. So here the output with the characters removed:

### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to %unicode%.log
[30915, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/workspace/build/src/js/xpconnect/src/nsXPConnect.cpp, line 572
[30915, Main Thread] WARNING: NS_ENSURE_SUCCESS_VOID(rv) failed with result 0x80004005: file /builds/worker/workspace/build/src/js/xpconnect/loader/mozJSComponentLoader.cpp, line 594
Assertion failure: globalObj, at /builds/worker/workspace/build/src/js/xpconnect/loader/mozJSComponentLoader.cpp:655
#01: nsXPTCStubBase::Stub249()[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0xb1d66f]
#02: nsXPTCStubBase::Stub249()[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0xb1bbda]
#03: nsXPTCStubBase::Stub249()[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0xb1d452]
#04: nsXPTCStubBase::Stub249()[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x135320]
#05: catch_exception_raise[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c17ec4]
#06: catch_exception_raise[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c21267]
#07: catch_exception_raise[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c21fb7]
#08: start[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/firefox +0xd49]
Component: XPConnect → XPCOM
I'm not sure if this is XPCOM or nsXPConnect. I don't know how setting XPCOM_MEM_BLOAT_LOG could cause global creation to fail.
Maybe this is related to artifact builds? That's what I build locally.

If I use a non-optimized debug build from [1] the build starts, but throws tons of assertions and I'm not able to load a single URL because the tab always crashes.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=build%20debug&filter-tier=1&filter-tier=2&filter-tier=3&bugfiler&selectedJob=171731575
I actually missed a line in the log of the downloaded debug build:

> Hit MOZ_CRASH(Failed to create or init an nsIFile) at /builds/worker/workspace/build/src/dom/ipc/ContentChild.cpp:1606

https://dxr.mozilla.org/mozilla-central/source/dom/ipc/ContentChild.cpp#1606

I assume this is the reason for all the crashes, which is caused by the Unicode character for the leak log filename.
PR_GetEnv and InitWithNativePath should not be used on Windows because they are lossy.

Probably we should:
1. add a GetEnv wrapper to use _wgetenv on Windows,
2. use mozilla::filesystem::Path::value_type instead of char (or PathString instead of nsACString), and
3. use the nsLocalFile constructor instead of nsIFile::InitWithPath.
Blocks: 1423897
(In reply to Masatoshi Kimura [:emk] from comment #6)
> 3. use the nsLocalFile constructor instead of nsIFile::InitWithPath.

You removed nsILocalFile via bug 1375125. So this won't be doable, right?
nsLocalFile, not ns*I*LocalFile.
Oh sorry. I misread.

Btw. this happens for all environment variables for log files like MOZ_LOG too.
Summary: Assertion failure: globalObj, at js/xpconnect/loader/mozJSComponentLoader.cpp:655 when "XPCOM_MEM_BLOAT_LOG" contains a unicode character → Assertion failure: globalObj, at js/xpconnect/loader/mozJSComponentLoader.cpp:655 for log file environment variables with unicode characters included
Severity: normal → critical
Keywords: assertion, crash
Summary: Assertion failure: globalObj, at js/xpconnect/loader/mozJSComponentLoader.cpp:655 for log file environment variables with unicode characters included → Assertion failure: globalObj, at js/xpconnect/loader/mozJSComponentLoader.cpp:655 and crash for log file environment variables with unicode characters included
If we can find someone to work on that bug that would be fantastic. I'm currently working on tests to get coverage for profiles with Unicode characters, and this assertion/crash is somewhat blocking me given that the leak logs are getting created inside the profile. And I would like to keep that location for test coverage.
How does it fail with MOZ_LOG?  That environment variable isn't even referenced in ContentChild.cpp, and it seems like that would be a separate bug.
Flags: needinfo?(hskupin)
(In reply to Masatoshi Kimura [:emk] from comment #6)
> PR_GetEnv and InitWithNativePath should not be used on Windows because they
> are lossy.
> 
> Probably we should:
> 1. add a GetEnv wrapper to use _wgetenv on Windows,
> 2. use mozilla::filesystem::Path::value_type instead of char (or PathString
> instead of nsACString), and
> 3. use the nsLocalFile constructor instead of nsIFile::InitWithPath.

I don't think this has anything to do with Windows, because the particular ContentChild.cpp code that is linked in comment 5 is in Mac-specific code for Mac sandboxing.

That being said, I don't know what happens to environment variables with Unicode characters on Mac.  Is it possible that the Unicode is being converted to Latin-1 somewhere along the way (presumably somewhere in Python)?
(In reply to Nathan Froyd [:froydnj] from comment #12)
> That being said, I don't know what happens to environment variables with
> Unicode characters on Mac.  Is it possible that the Unicode is being
> converted to Latin-1 somewhere along the way (presumably somewhere in
> Python)?

Actually I guess Python would not come into play here, since presumably the STR in comment 0 is some command you can execute from the shell?  I can't tell whether the getenv() calls inside xul (i.e. the debug log in comment 2) is getting the correct unicode (UTF-8?) characters or not.

Python code setting XPCOM_MEM_BLOAT_LOG would have to be careful that the correct value is getting set, of course.
(In reply to Nathan Froyd [:froydnj] from comment #11)
> How does it fail with MOZ_LOG?  That environment variable isn't even
> referenced in ContentChild.cpp, and it seems like that would be a separate
> bug.

Sorry, you are correct. I just tried with a Nightly build of Firefox and it works fine with MOZ_LOG_FILE (not MOZ_LOG). But running `mach run` for my debug build it now fails all the time with this failure even with no such env variable set. Given that I tested MOZ_LOG_FILE with the mach command it was a red herring, and I have to check why I cannot start Firefox at all.

(In reply to Nathan Froyd [:froydnj] from comment #12)
> I don't think this has anything to do with Windows, because the particular
> ContentChild.cpp code that is linked in comment 5 is in Mac-specific code
> for Mac sandboxing.

ContentChild.cpp has been changed meanwhile, so here an updated link for the line it crashes in:
https://dxr.mozilla.org/mozilla-central/rev/2f5ffe4fa2153a798ed8b310a597ea92abd1b868/dom/ipc/ContentChild.cpp#1590

And yes, I see it on MacOS here. But where do you see that this is Mac-specific code? 

(In reply to Nathan Froyd [:froydnj] from comment #13)
> Actually I guess Python would not come into play here, since presumably the
> STR in comment 0 is some command you can execute from the shell?  I can't

Yes, something like `XPCOM_MEM_BLOAT_LOG=unicode_path ./mach run` reproduces it for me.

> Python code setting XPCOM_MEM_BLOAT_LOG would have to be careful that the
> correct value is getting set, of course.

So for our test harnesses which use mozrunner, I fixed the encoding so we always encode Unicode strings to UTF-8 for all environment variables. Before that we only did that for an interactive session. See bug 1451246.
Flags: needinfo?(hskupin)
Summary: Assertion failure: globalObj, at js/xpconnect/loader/mozJSComponentLoader.cpp:655 and crash for log file environment variables with unicode characters included → Assertion failure: globalObj, at js/xpconnect/loader/mozJSComponentLoader.cpp:655 and crash when "XPCOM_MEM_BLOAT_LOG" contains a unicode character
(In reply to Henrik Skupin (:whimboo) from comment #14)
> (In reply to Nathan Froyd [:froydnj] from comment #12)
> > I don't think this has anything to do with Windows, because the particular
> > ContentChild.cpp code that is linked in comment 5 is in Mac-specific code
> > for Mac sandboxing.
> 
> ContentChild.cpp has been changed meanwhile, so here an updated link for the
> line it crashes in:
> https://dxr.mozilla.org/mozilla-central/rev/
> 2f5ffe4fa2153a798ed8b310a597ea92abd1b868/dom/ipc/ContentChild.cpp#1590
> 
> And yes, I see it on MacOS here. But where do you see that this is
> Mac-specific code? 

Because GetDirectoryPath is only called from:

https://dxr.mozilla.org/mozilla-central/rev/2f5ffe4fa2153a798ed8b310a597ea92abd1b868/dom/ipc/ContentChild.cpp#1704-1715

which is part of:

https://dxr.mozilla.org/mozilla-central/rev/2f5ffe4fa2153a798ed8b310a597ea92abd1b868/dom/ipc/ContentChild.cpp#1611-1612

which presumably would not be getting run on non-Mac systems. ;)

> (In reply to Nathan Froyd [:froydnj] from comment #13)
> > Actually I guess Python would not come into play here, since presumably the
> > STR in comment 0 is some command you can execute from the shell?  I can't
> 
> Yes, something like `XPCOM_MEM_BLOAT_LOG=unicode_path ./mach run` reproduces
> it for me.

OK, so that involves Python code, right?  And from your earlier comment, it sounds like with Nightly, you can do:

XPCOM_MEM_BLOAT_LOG="<unicode-containing-path" ./firefox

(OK, maybe not Nightly, per se, since the leak tracking is only active in DEBUG builds, which Nightly is not; but this ought to work OK with a debug build from try, for instance?  Or perhaps MOZ_LOG_FILE="<unicode-containing-path>", since that should work on all builds.)

Is that correct?  If executing firefox directly with a unicode environment variable works, but running it from within `./mach run` does not, I am inclined to say that there is something in Python that is causing problems.
Flags: needinfo?(hskupin)
(In reply to Nathan Froyd [:froydnj] from comment #15)
> > And yes, I see it on MacOS here. But where do you see that this is
> > Mac-specific code? 
> 
> Because GetDirectoryPath is only called from:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> 2f5ffe4fa2153a798ed8b310a597ea92abd1b868/dom/ipc/ContentChild.cpp#1704-1715
> 
> which is part of:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> 2f5ffe4fa2153a798ed8b310a597ea92abd1b868/dom/ipc/ContentChild.cpp#1611-1612
> 
> which presumably would not be getting run on non-Mac systems. ;)
 
Ah, I see. Thanks. Maybe we should do it similar as what has been done on bug 685236 for other components?

> > Yes, something like `XPCOM_MEM_BLOAT_LOG=unicode_path ./mach run` reproduces
> > it for me.
> 
> OK, so that involves Python code, right?  And from your earlier comment, it
> sounds like with Nightly, you can do:
> 
> XPCOM_MEM_BLOAT_LOG="<unicode-containing-path" ./firefox

No, it fails in both cases, so the Python code shouldn't be related here. 

> MOZ_LOG_FILE="<unicode-containing-path>", since that should work on all
> builds.)

It was a false assumption on my side as mentioned in the last comment. So we don't have to further concentrate us on that part.
Flags: needinfo?(hskupin)
I pushed another try build with some other recent fixes in mozbase for Unicode characters and all debug builds are failing due to this problem with a hang and crash. 

https://treeherder.mozilla.org/#/jobs?repo=try&revision=453a50bc94e144403cded0c6e274e09b2d29422d&filter-searchStr=debug&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable

On Windows we actually get a different assertion:

https://treeherder.mozilla.org/logviewer.html#?job_id=172808189&repo=try&lineNumber=1616-1629

> 05:46:37     INFO -  ### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to c:\users\genericworker\appdata\local\temp\tmpshxu7q.mozrunner??\runreftest_leaks.log
> 05:46:37     INFO -  Assertion failure: false (Tried and failed to create an XPCOM log), at z:/build/build/src/xpcom/base/nsTraceRefcnt.cpp:714
> 05:46:48     INFO -  #01: static void InitTraceLog() [xpcom/base/nsTraceRefcnt.cpp:745]
> 05:46:48     INFO -  #02: NS_LogCtor [xpcom/base/nsTraceRefcnt.cpp:1168]
> 05:46:48     INFO -  #03: nsTArray_base<nsTArrayInfallibleAllocator,nsTArray_CopyWithMemutils>::nsTArray_base<nsTArrayInfallibleAllocator,nsTArray_CopyWithMemutils>() [xpcom/ds/nsTArray-inl.h:16]
> 05:46:48     INFO -  #04: nsXREDirProvider::nsXREDirProvider() [toolkit/xre/nsXREDirProvider.cpp:99]
> 05:46:48     INFO -  #05: XREMain::XREMain() [toolkit/xre/nsAppRunner.cpp:3119]
> 05:46:48     INFO -  #06: XRE_main(int,char * * const,mozilla::BootstrapConfig const &) [toolkit/xre/nsAppRunner.cpp:5071]
> 05:46:48     INFO -  #07: static int do_main(int, char * *, char * *) [browser/app/nsBrowserApp.cpp:232]
> 05:46:48     INFO -  #08: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:306]
> 05:46:48     INFO -  #09: wmain [toolkit/xre/nsWindowsWMain.cpp:132]
> 05:46:48     INFO -  #10: static int __scrt_common_main_seh() [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:283]
> 05:46:48     INFO -  #11: KERNEL32.DLL + 0x12774
> 05:46:48     INFO -  #12: ntdll.dll + 0x70d61
Blocks: 918097
Not sure what happened but the MacOS assertion seems to be gone for me and also the try build doesn't show that for debug builds:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1df3f7469bfbcd391e8e1ef5128b24c5b45fe7ba&filter-searchStr=mac

Instead we still have perma assertions and crashes on Windows as mentioned in the last comment:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1df3f7469bfbcd391e8e1ef5128b24c5b45fe7ba&filter-searchStr=windows%20debug
Summary: Assertion failure: globalObj, at js/xpconnect/loader/mozJSComponentLoader.cpp:655 and crash when "XPCOM_MEM_BLOAT_LOG" contains a unicode character → Assertion failure: false (Tried and failed to create an XPCOM log), at src/xpcom/base/nsTraceRefcnt.cpp:715 and crash when "XPCOM_MEM_BLOAT_LOG" contains a unicode character
OK, that makes a little more sense; the remedy for that is something along the lines of comment 6.
Here the line of failure:
https://dxr.mozilla.org/mozilla-central/rev/325ef357e5b73d63794e47c02c7f8e7cf58ccb48/xpcom/base/nsTraceRefcnt.cpp#706

> stream = ::fopen(fname.get(), "w" FOPEN_NO_INHERIT);

It fails to open the file, but the path seem to be logged correctly right before the assertion:

https://treeherder.mozilla.org/logviewer.html#?job_id=173462556&repo=try&lineNumber=1565

>     INFO -  ### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to c:\users\task_1523612942\appdata\local\temp\tmpaupiel.mozrunner
I have what I think is a patch for the Windows problem; will need to ensure it compiles locally for Linux and then push to try and deal with Windows issues before it's ready for testing.
...on Windows, at least.  Apparently if you have environment variables
set that contain multibyte characters, and ask for them with getenv, you
get garbage.  Or perhaps you get something sensible, but then passing it
to fopen produces garbage.  Either way, the most reasonable way to
handle this is to use the Windows wide-character APIs all over.

Henrik, do you want to see if this addresses your problem?
Attachment #8967856 - Flags: review?(continuation)
Attachment #8967856 - Flags: feedback?(hskupin)
Attachment #8967856 - Flags: review?(continuation) → review+
Nathan, I will be back on Monday. If the patch is all fine please go ahead and land it, I will verify everything on Monday. Thanks a lot for this quick fix!
Comment on attachment 8967856 [details] [diff] [review]
make nsTraceRefcnt handle multibyte characters in log files

Review of attachment 8967856 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsTraceRefcnt.cpp
@@ +673,3 @@
>  #ifdef XP_WIN
> +  // This is gross, I know.
> +  const wchar_t* envvar = reinterpret_cast<const wchar_t*>(aEnvVar);

I suggest `MOZ_ASSERT(NS_IsAscii(aEnvvar)) + NS_ConvertASCIItoUTF16(aEnvvar)` because `aEnvvar` is under our control and it is very unlikely to be non-ASCII.

@@ +673,4 @@
>  #ifdef XP_WIN
> +  // This is gross, I know.
> +  const wchar_t* envvar = reinterpret_cast<const wchar_t*>(aEnvVar);
> +  const char16_t* value = reinterpret_cast<const char16_t*>(::_wgetenv(envvar));

`char16ptr_t` would be prettier although it is just a suger for reinterpret_cast.

@@ +706,5 @@
> +        NS_ConvertUTF8toUTF16 pt(processType);
> +#else
> +        nsDependentCString pt(processType);
> +#endif
> +        fname.Append(pt);

`fname.AppendASCII(processType);` instead of an #ifdef clutter.
(In reply to Masatoshi Kimura [:emk] from comment #24)
> ::: xpcom/base/nsTraceRefcnt.cpp
> @@ +673,3 @@
> >  #ifdef XP_WIN
> > +  // This is gross, I know.
> > +  const wchar_t* envvar = reinterpret_cast<const wchar_t*>(aEnvVar);
> 
> I suggest `MOZ_ASSERT(NS_IsAscii(aEnvvar)) +
> NS_ConvertASCIItoUTF16(aEnvvar)` because `aEnvvar` is under our control and
> it is very unlikely to be non-ASCII.

I understand the MOZ_ASSERT.  I don't understand the NS_ConvertASCIItoUTF16 bit.  Are you saying that I should pass in the environment variable name as `const char*` always, and then convert for Windows only?  I am not sure that gains anything, since I still need the `wchar_t*` and the `char16_t*` values...

> @@ +706,5 @@
> > +        NS_ConvertUTF8toUTF16 pt(processType);
> > +#else
> > +        nsDependentCString pt(processType);
> > +#endif
> > +        fname.Append(pt);
> 
> `fname.AppendASCII(processType);` instead of an #ifdef clutter.

Ah, that would be better, thank you!
(In reply to Nathan Froyd [:froydnj] from comment #25)
> I understand the MOZ_ASSERT.  I don't understand the NS_ConvertASCIItoUTF16
> bit.  Are you saying that I should pass in the environment variable name as
> `const char*` always, and then convert for Windows only?

Yes.

>  I am not sure that
> gains anything, since I still need the `wchar_t*` and the `char16_t*`
> values...

It will reduce some macro tricks such as ENVVAR_PRINTF or ENVVAR(x). But up to you.
Comment on attachment 8967856 [details] [diff] [review]
make nsTraceRefcnt handle multibyte characters in log files

I pushed the patch together with my changes for mozprofile to try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3cf83f7f3c15a8ebe766620b2a0e0ee5846fa92
Attachment #8967856 - Flags: feedback?(hskupin)
I accidentally pushed an artifact build to try which then doesn't contain the changes from this bug. So here a new try build:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c875a13c647fcf0ae3f3df2d95aa59682f36c30c&selectedJob=173887765
Ok, so I cannot verify via try until this remaining mozprofile unit test is fixed. But I tried locally via a Mingw32 shell and noticed that it still fails on Windows 10:

$ XPCOM_MEM_BLOAT_LOG=../m\?\?/leak.log firefox.exe
[..]
### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to ../m??/leak.log
Assertion failure: false (Tried and failed to create an XPCOM log), at z:/build/build/src/xpcom/base/nsTraceRefcnt.cpp:733

Most likely I did something wrong on my side because I think that you tested that yourself when working on the patch.
So I didn't actually test this; I just iterated on try until it compiled and ran debug tests successfully, because that meant I at least hadn't broken any functionality first.

(In reply to Henrik Skupin (:whimboo) from comment #30)
> $ XPCOM_MEM_BLOAT_LOG=../m\?\?/leak.log firefox.exe
> [..]
> ### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to
> ../m??/leak.log
> Assertion failure: false (Tried and failed to create an XPCOM log), at
> z:/build/build/src/xpcom/base/nsTraceRefcnt.cpp:733

It's not obvious to me from the example here...What unicode character is \? supposed to be?  Just any unicode character?  Because '?' is ASCII, so if we can't create filenames with '?' in them, that's a different problem...

Assuming it is a unicode character, I think the phrasing of the message means that we're correctly getting the unicode value from the environment, but we can't open the file?  Does the directory exist?  If the directory exists, I have no insight into why we wouldn't be able to open the file.
Could there be sandboxing issues? The content process creates the file, so you have to specify a valid temp directory.
(In reply to Nathan Froyd [:froydnj] from comment #31)
> It's not obvious to me from the example here...What unicode character is \?
> supposed to be?  Just any unicode character?  Because '?' is ASCII, so if we

It's what auto-completion in the mingw32 shell suggested. But yes, any kind of Unicode character can be used here to get it reproduced.

> Assuming it is a unicode character, I think the phrasing of the message
> means that we're correctly getting the unicode value from the environment,
> but we can't open the file?  Does the directory exist?  If the directory
> exists, I have no insight into why we wouldn't be able to open the file.

At least we don't assert and crash anymore. That's what I can at least say.
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d89a66dea5f
make nsTraceRefcnt handle multibyte characters in log files; r=mccr8
(In reply to Andrew McCreight [:mccr8] from comment #32)
> Could there be sandboxing issues? The content process creates the file, so
> you have to specify a valid temp directory.

I think the log for the content process would include a PID marker in the filename, though?

I pushed the patch with the small AppendASCII change from comment 24; I didn't want to iterate on try with trying to get the types lined up for the other changes, so that can be a followup.

Let's see what happens with comment 30 and fixing the mozprofile test.  If things are still broken on try with unicode profile directories after the mozprofile test(s?) are fixed, we can investigate further.
Assignee: nobody → nfroyd
Severity: critical → normal
Keywords: leave-open
OS: Unspecified → All
Priority: -- → P3
Hardware: Unspecified → All
I updated my mozprofile patch and marked the failing test as skip. Lets see if the landed patch fixes everything:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=85ef901be939607846a699ea60e59bb663679e6d
Given that a mozrunner change caused some other bustage now I had to push another try build:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ebc18ad26b5437fa8954e777cb72aae2ee7ebbc7&selectedJob=174315051

And we are still failing in reftests:

https://treeherder.mozilla.org/logviewer.html#?job_id=174315051&repo=try&lineNumber=1592-1604

> 13:07:58     INFO -  ### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to c:\users\task_1524056521\appdata\local\temp\tmpstljs5.mozrunner?Assertion failure: false (Tried and failed to create an XPCOM log), at z:/build/build/src/xpcom/base/nsTraceRefcnt.cpp:727
> 13:08:14     INFO -  #01: static void InitTraceLog() [xpcom/base/nsTraceRefcnt.cpp:764]
> 13:08:14     INFO -  #02: NS_LogCtor [xpcom/base/nsTraceRefcnt.cpp:1190]
> 13:08:14     INFO -  #03: nsTArray_base<nsTArrayInfallibleAllocator,nsTArray_CopyWithMemutils>::nsTArray_base<nsTArrayInfallibleAllocator,nsTArray_CopyWithMemutils>() [xpcom/ds/nsTArray-inl.h:15]
> 13:08:14     INFO -  #04: mozilla::BootstrapImpl::XRE_main(int,char * * const,mozilla::BootstrapConfig const &) [toolkit/xre/Bootstrap.cpp:49]
> 13:08:14     INFO -  #05: static int do_main(int, char * *, char * *) [browser/app/nsBrowserApp.cpp:232]
> 13:08:14     INFO -  #06: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:304]
> 13:08:14     INFO -  #07: wmain [toolkit/xre/nsWindowsWMain.cpp:132]
> 13:08:14     INFO -  #08: static int __scrt_common_main_seh() [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:283]
> 13:08:14     INFO -  #09: kernel32.dll + 0x53c45
> 13:08:14     INFO -  #10: ntdll.dll + 0x637f5
> 13:08:14     INFO -  #11: ntdll.dll + 0x637c8
> 13:10:58  WARNING -  TEST-UNEXPECTED-FAIL | None | application terminated with exit code 2147483651

Note that after the log output we also miss a carriage return, so that the assertion is concatenated.
Status: NEW → ASSIGNED
Flags: needinfo?(nfroyd)
Debugging output suggests that something is very fishy with the passed-in environment variable:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f5365e69c1409fe44d4af6312d0d7a92da98ff6

16:15:05     INFO -  REFTEST INFO | Application command: Z:\task_1524153997\build\application\firefox\firefox.exe -marionette -profile c:\users\task_1524153997\appdata\local\temp\tmpbgfkv6.mozrunner
Flags: needinfo?(nfroyd)
Sigh, bugzilla, thanks for eating my comment.  Once more, with the unicode characters removed.

Debugging output suggests that something is very fishy with the passed-in environment variable:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f5365e69c1409fe44d4af6312d0d7a92da98ff6

16:15:05     INFO -  REFTEST INFO | Application command: Z:\task_1524153997\build\application\firefox\firefox.exe -marionette -profile c:\users\task_1524153997\appdata\local\temp\tmpbgfkv6.mozrunner$COOKIE
16:15:05     INFO -  Running command: Z:\task_1524153997\build\application\firefox\firefox.exe -marionette -profile c:\users\task_1524153997\appdata\local\temp\tmpbgfkv6.mozrunner$COOKIE
16:15:05     INFO -  ## utf8[0] = 99
16:15:05     INFO -  ## utf8[1] = 58
16:15:05     INFO -  ## utf8[2] = 92
16:15:05     INFO -  ## utf8[3] = 117
16:15:05     INFO -  ## utf8[4] = 115
16:15:05     INFO -  ## utf8[5] = 101
16:15:05     INFO -  ## utf8[6] = 114
16:15:05     INFO -  ## utf8[7] = 115
16:15:05     INFO -  ## utf8[8] = 92
16:15:05     INFO -  ## utf8[9] = 116
16:15:05     INFO -  ## utf8[10] = 97
16:15:05     INFO -  ## utf8[11] = 115
16:15:05     INFO -  ## utf8[12] = 107
16:15:05     INFO -  ## utf8[13] = 95
16:15:05     INFO -  ## utf8[14] = 49
16:15:05     INFO -  ## utf8[15] = 53
16:15:05     INFO -  ## utf8[16] = 50
16:15:05     INFO -  ## utf8[17] = 52
16:15:05     INFO -  ## utf8[18] = 49
16:15:05     INFO -  ## utf8[19] = 53
16:15:05     INFO -  ## utf8[20] = 51
16:15:05     INFO -  ## utf8[21] = 57
16:15:05     INFO -  ## utf8[22] = 57
16:15:05     INFO -  ## utf8[23] = 55
16:15:05     INFO -  ## utf8[24] = 92
16:15:05     INFO -  ## utf8[25] = 97
16:15:05     INFO -  ## utf8[26] = 112
16:15:05     INFO -  ## utf8[27] = 112
16:15:05     INFO -  ## utf8[28] = 100
16:15:05     INFO -  ## utf8[29] = 97
16:15:05     INFO -  ## utf8[30] = 116
16:15:05     INFO -  ## utf8[31] = 97
16:15:05     INFO -  ## utf8[32] = 92
16:15:05     INFO -  ## utf8[33] = 108
16:15:05     INFO -  ## utf8[34] = 111
16:15:05     INFO -  ## utf8[35] = 99
16:15:05     INFO -  ## utf8[36] = 97
16:15:05     INFO -  ## utf8[37] = 108
16:15:05     INFO -  ## utf8[38] = 92
16:15:05     INFO -  ## utf8[39] = 116
16:15:05     INFO -  ## utf8[40] = 101
16:15:05     INFO -  ## utf8[41] = 109
16:15:05     INFO -  ## utf8[42] = 112
16:15:05     INFO -  ## utf8[43] = 92
16:15:05     INFO -  ## utf8[44] = 116
16:15:05     INFO -  ## utf8[45] = 109
16:15:05     INFO -  ## utf8[46] = 112
16:15:05     INFO -  ## utf8[47] = 98
16:15:05     INFO -  ## utf8[48] = 103
16:15:05     INFO -  ## utf8[49] = 102
16:15:05     INFO -  ## utf8[50] = 107
16:15:05     INFO -  ## utf8[51] = 118
16:15:05     INFO -  ## utf8[52] = 54
16:15:05     INFO -  ## utf8[53] = 46
16:15:05     INFO -  ## utf8[54] = 109
16:15:05     INFO -  ## utf8[55] = 111
16:15:05     INFO -  ## utf8[56] = 122
16:15:05     INFO -  ## utf8[57] = 114
16:15:05     INFO -  ## utf8[58] = 117
16:15:05     INFO -  ## utf8[59] = 110
16:15:05     INFO -  ## utf8[60] = 110
16:15:05     INFO -  ## utf8[61] = 101
16:15:05     INFO -  ## utf8[62] = 114
16:15:05     INFO -  ## utf8[63] = -61
16:15:05     INFO -  ## utf8[64] = -80
16:15:05     INFO -  ## utf8[65] = -59
16:15:05     INFO -  ## utf8[66] = -72
16:15:05     INFO -  ## utf8[67] = -62
16:15:05     INFO -  ## utf8[68] = -115
16:15:05     INFO -  ## utf8[69] = -62
16:15:05     INFO -  ## utf8[70] = -86
16:15:05     INFO -  ## utf8[71] = 92

utf[54..62] spell out 'mozrunner', at which point you'd expect to see the UTF8 bytes for \U0001f36a COOKIE.  But instead, you see:

map(lambda x: x + 256, [-61, -80, -59, -72, -62, -115, -62, -86]) => [195, 176, 197, 184, 194, 141, 194, 170]

which is a lot of bytes for \U0001f36a.  Emacs tells me that we should be seeing [240, 159, 141, 170] ([0xf0, 0x9f, 0x8d, 0xaa]) for the UTF8 bytes for that character.

Trying:

''.join(map(chr, [195, 176, 197, 184, 194, 141, 194, 170])).decode('utf8')

gives:

u'\xf0\u0178\x8d\xaa'

which is...not at all close to what we want to see, even if I had some off-by-one errors in there.  I have no idea where those bits are even coming from.  Actually, those are *very* close to the UTF8 bytes for \U0001f36a, but we have this crazy \u0178 in there.  I don't know where that character is coming from.

I would suggest:

1) Using a different unicode character, at least, something < 16 bits, just to make sure it's not an issue with \U0001f36a.  I am not be totally confident that our string processing code does the right thing with non-16 bit characters, but that should be a separate bug if it is indeed a problem.
2) Being *very* *very* sure that something in our Python testing code is not mangling the unicode characters before executing the firefox process.  Because that is sure what it looks like is happening.
Attached file python test
Sorry for the delay here in checking your fix but I was swamped with other major stuff the last weeks. But finally I was able to write a small Python testcase which demonstrates that the landed patch didn't fix the problem.

Just replace the path of the binary with the debug build on your own system. Then run the script and notice the assertion. When you comment out the bloat log environment variable, you will see that the code will work fine for the HTTP_LOG. So something is broken on Windows in handling the Unicode profile string. 

### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to c:\users\mozauto
\appdata\local\temp\tmphr27lf.leáklog/bloat.log
Assertion failure: false (Tried and failed to create an XPCOM log), at z:/build/
build/src/xpcom/base/nsTraceRefcnt.cpp:727
#01: ???[c:\mozilla\bin\debug\xul.dll +0xc23079]
#02: ???[c:\mozilla\bin\debug\xul.dll +0xc29f22]
#03: ???[c:\mozilla\bin\debug\xul.dll +0xbe3f15]
#04: workerlz4_maxCompressedSize[c:\mozilla\bin\debug\xul.dll +0x4c5b851]
#05: workerlz4_maxCompressedSize[c:\mozilla\bin\debug\xul.dll +0x4c4c9e7]
#06: workerlz4_maxCompressedSize[c:\mozilla\bin\debug\xul.dll +0x4c53b6b]
#07: ???[c:\mozilla\bin\debug\firefox.exe +0x3d0f]
#08: ???[c:\mozilla\bin\debug\firefox.exe +0x386a]
#09: ???[c:\mozilla\bin\debug\firefox.exe +0x4198]
#10: TargetNtUnmapViewOfSection[c:\mozilla\bin\debug\firefox.exe +0x5c630]
#11: BaseThreadInitThunk[C:\Windows\system32\kernel32.dll +0x15a4d]
#12: RtlUserThreadStart[C:\Windows\SYSTEM32\ntdll.dll +0x2b831]
Flags: needinfo?(nfroyd)
Does MOZ_LOG_FILE still work if the character is not a Latin-1 character?  I suspect not?
Flags: needinfo?(nfroyd)
When I change the code to use a Cyrillic character like U+043B Firefox doesn't start at all with the filesystem encoding because it cannot find the profile, or doesn't pick up the profile path, creates it's own profile, and logs all to stdout with "utf-8". 

Please note that we have a similar Unicode discussion currently going-on on bug 1457399. Maybe I make mistakes in correctly encoding the Unicode strings when passing those to subprocess.Popen.
I was able to start Firefox by switching to mozprocess because it makes use of `CreateProcessW` what `subprocess.Popen` isn't doing for Python 2.7.x. So no it also doesn't work, and just prints all the logs to stdout, BUT doesn't crash like we do here.
Please also see bug 1466097 with troubles related to MOZ_LOG_FILE by using an extended ASCII character.

Nathan, can you please tell me how you tested your patch on Windows? I would like to know how you set/encoded the environment variable via the console. I would need a working example to be more helpful with testing this patch. Thanks.
Flags: needinfo?(nfroyd)
I didn't test it, because the code is pretty obviously correct: if you want to get non-latin1 environment variable values, you need to use wgetenv, problem solved.  If that's not correct, then there's something wrong with the program invoking Firefox, i.e. Python.

Testing with a full Firefox build isn't helpful, because there's a lot of other stuff going on; we need a small test program to demonstrate clearly what's happening.  I wrote this one to see what happens when we're invoked from Python and retrieving environment variable values via getenv and wgetenv.  I had to modify the python test program in more-or-less obvious ways, passing in the environment:

env = {
    "ASCII_VAR": "ascii",
    "LATIN1_VAR": u"l\xe1tin1",
    "UNICODE_BMP_VAR": u"bmp\u263a", # smiley face
    "UNICODE_NON_BMP_VAR": u"!bmp\U0001f36a",
}

The results are not encouraging:

Filesystem encoding: mbcs
var UNICODE_NON_BMP_VAR = ['21', '62', '6d', '70', 'd83c', 'df6a']
var UNICODE_BMP_VAR = ['62', '6d', '70', '263a']
var ASCII_VAR = ['61', '73', '63', '69', '69']
var LATIN1_VAR = ['6c', 'e1', '74', '69', '6e', '31']

forcing UNICODE_NON_BMP_VAR as ['21', '62', '6d', '70', 'f0', '9f', '8d', 'aa']
forcing UNICODE_BMP_VAR as ['62', '6d', '70', 'e2', '98', 'ba']
forcing ASCII_VAR as ['61', '73', '63', '69', '69']
forcing LATIN1_VAR as ['6c', 'c3', 'a1', '74', '69', '6e', '31']

Examining var ASCII_VAR
getenv returns: ascii
0x61 0x73 0x63 0x69 0x69
wgetenv returns: ascii
0x0061 0x0073 0x0063 0x0069 0x0069

Examining var LATIN1_VAR
getenv returns: látin1
0x6c 0xc3 0xa1 0x74 0x69 0x6e 0x31
wgetenv returns: látin1
0x006c 0x00c3 0x00a1 0x0074 0x0069 0x006e 0x0031

Examining var UNICODE_BMP_VAR
getenv returns: bmpΓÿ║
0x62 0x6d 0x70 0xe2 0x98 0xba
wgetenv returns: bmpΓ0x0062 0x006d 0x0070 0x00e2 0x02dc 0x00ba

Examining var UNICODE_NON_BMP_VAR
getenv returns: !bmp🍪
0x21 0x62 0x6d 0x70 0xf0 0x9f 0x8d 0xaa
wgetenv returns: !bmp≡0x0021 0x0062 0x006d 0x0070 0x00f0 0x0178 0x008d 0x00aa

What this tells us is that we're passing in utf8 values for the environment variables, which wgetenv happily interprets as garbage.  This is incorrect; we should be passing in utf16 values, which is possibly why mozprocess works a little better.  In fact, you can't actually pass in utf16 values with Python 2.7, because it doesn't use the unicode-native APIs on Windows (I believe) and will complain about nulls in the environment (!).  Python 3.x appears to DTRT; you can just have unicode string environment variable values and they are correctly converted to utf16 when an external process is invoked:

Filesystem encoding: utf-8
var ASCII_VAR = ['61', '73', '63', '69', '69']
var LATIN1_VAR = ['6c', 'e1', '74', '69', '6e', '31']
var UNICODE_BMP_VAR = ['62', '6d', '70', '263a']
var UNICODE_NON_BMP_VAR = ['21', '62', '6d', '70', '1f36a']

Examining var ASCII_VAR
getenv returns: ascii
0x61 0x73 0x63 0x69 0x69
wgetenv returns: ascii
0x0061 0x0073 0x0063 0x0069 0x0069

Examining var LATIN1_VAR
getenv returns: lßtin1
0x6c 0xe1 0x74 0x69 0x6e 0x31
wgetenv returns: lßtin1
0x006c 0x00e1 0x0074 0x0069 0x006e 0x0031

Examining var UNICODE_BMP_VAR
getenv returns: bmp?
0x62 0x6d 0x70 0x3f
wgetenv returns: bmp0x0062 0x006d 0x0070 0x263a

Examining var UNICODE_NON_BMP_VAR
getenv returns: !bmp??
0x21 0x62 0x6d 0x70 0x3f 0x3f
wgetenv returns: !bmp0x0021 0x0062 0x006d 0x0070 0xd83c 0xdf6a

Dunno why the filesystem encoding is different, though.

We could instead pass in utf8 values and have the leak checker and the logging framework convert from UTF8 to UTF16.  My sense is that behavior is not "Windows-native", for lack of a better term--it'd probably fail when you passed in environment variables manually from the command line or similar.  So the best thing to do is to ensure Python is passing in correctly-encoded environment variable values.
Flags: needinfo?(nfroyd)
Nathan, so how would this look like with Python3? I tried several ways but I cannot pass-in a UTF-8 encoded string as environment variable to `subprocess.Popen()`. Do you have some code which clearly works for you?

The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?

Flags: needinfo?(nfroyd)

No, this is still a problem given that nothing landed here. Maybe we should remove the intermittent-failure keyword.

We should leave this bug open. It might not belong in XPCOM, though.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #47)

Nathan, so how would this look like with Python3? I tried several ways but I
cannot pass-in a UTF-8 encoded string as environment variable to
subprocess.Popen(). Do you have some code which clearly works for you?

Attached is the Python code I used to generate the output in comment 46 with the C++ program already attached in this bug; Python3 seems to translate UTF-8 encoded values into UTF-16 values suitable to retrieve with wgetenv with subprocess.Popen() for me.

Flags: needinfo?(nfroyd)
Attachment #9040099 - Attachment mime type: application/octet-stream → text/plain

The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?

Flags: needinfo?(nfroyd)
Flags: needinfo?(nfroyd)

The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?

Flags: needinfo?(nfroyd)

I guess we can freely use Python3 now, so we can make this issue go away?

Flags: needinfo?(nfroyd)

The leave-open keyword is there and there is no activity for 6 months.
:nika, maybe it's time to close this bug?

Flags: needinfo?(nika)

As it looks like this seems to work nowadays with a Firefox Nightly debug build:

XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to ./firefox-log♥.txt_tab_pid83770

Not sure what else needs to be done in Gecko specifically. Maybe we should close as WFM?

Actually we landed a patch for Firefox 61 about 3 years ago (see comment 39), and the bug was kept open due to remaining issues when running Firefox through mozprocess. I think we should better get this bug marked as fixed, and file a new bug in case problems remain with Python 3.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(nika)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: