Closed Bug 694570 Opened 13 years ago Closed 8 years ago

Stop using libgnome and libgnomeui on Linux

Categories

(Toolkit :: Startup and Profile System, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: chrisccoulson, Assigned: oliver.henshaw)

References

Details

(Keywords: regression)

Attachments

(5 files, 5 obsolete files)

Firefox still depends on libgnome and libgnomeui (specifically, GnomeClient - see toolkit/xre/nsNativeAppSupportUnix.cpp) for talking to the session manager. The problem with this is:

1) Those are both deprecated, and have been for a long time
2) They have quite heavy dependencies on a lot of other deprecated libraries (orbit, bonobo, gnomevfs, gconf, glade, gnomecanvas)

We dropped these entirely from the default install in Ubuntu, and this now results in Firefox not shutting down cleanly when users log out of their session (leading to the "Well, this is embarassing" screen when they next start Firefox.

Unfortunately, there isn't really a good replacement for GnomeClient, but the options are:

1) Use the gnome-session DBus interface directly. However, this won't work outside of gnome.
2) Use libsm directly to talk to the session managers xsmp interface, rather than relying on a higher level library. This is more work, but lighter dependencies and would work outside of gnome too.
3) Embed a copy of EggSMClient from http://git.gnome.org/browse/libegg/tree/libegg/ , which has a xsmp backend and is light on dependencies (just libsm AFAICT)

Gnome applications seem to do a combination of 1 and 3, but I think option 1 is a non-starter for us, as the dbus interface is completely gnome-specific and doesn't have any API stability guarantees. The only software I know which does 2 is metacity.

What do other people think?
Considering first option: I understand that using dbus is suitable only for Gnome but libgnome(ui) is also Gnome-only, right? So far only Gnome platform was supported.

I also like third option but I'm unsure what's Mozilla's attitude to embedding foreign sources.

What is for sure that we need to get rid of libgnome(ui). Sooner is better.
(In reply to jhorak from comment #1)
> Considering first option: I understand that using dbus is suitable only for
> Gnome but libgnome(ui) is also Gnome-only, right? So far only Gnome platform
> was supported.

No: libgnome/libgnomeui are using xsmp, so this (supposedly) works on all desktops.
Indeed, libgnome/libgnomeui does work on all desktops. The fact that it pulls in half of gnome to work is unfortunate, but it does work
I like the third option too. If we can get some agreement that that is the best way to go, then I will start work on this
Hmm, EggSMClient is LGPL only, right?

Not sure but I think that this is not compatible with Mozilla's tri-license when embedded codewise.
Wolfgang: ah, good point. We can ask the code to be relicensed if needed, I guess. There shouldn't be tons of contributors to it (a quick count gives me 13 people, most of them being well-known and easy to reach).
Assignee: nobody → chrisccoulson
Any news on this subject? I may try to create some patch for this one.
Ah, I've already got one for it here. Will attach what I have in a moment, but I'm just doing something else first
This is what I've got so far, although it's not ready for review just yet. I haven't made any corresponding changes in configure.in or any other part of the build system yet.

Any feedback is welcome on the approach.

This makes session manager integration work without embedding eggsmclient, depending on the GNOME-specific dbus interface, adding any other desktop-specific dependencies or depending on Gtk 3.4 (which has session management in GtkApplication)

It should work outside of GNOME too, although I didn't try this yet.

I've not added a runtime dependency on libSM or libICE, as there wasn't one before. There is a build-time dependency on these though, although the headers for these are already installed on the build machines.

I made a couple of other changes to the behaviour too:
- It now only sends the "session-save" notification when the save style is SmSaveLocal or SmSaveBoth. Saving internal state with a save style of SmSaveGlobal is actually incorrect. This means that Firefox now distinguishes between a normal session exit and a session exit with session saving enabled.
- As "quit-application-requested" might pop up a dialog, it only does this if the interact style is not SmInteractStyleNone
- "quit-application-requested" is only sent after sending SmcInteractRequest and receiving an interact message. If we were using eggsmclient, this would actually happen transparently anyway (using the EggSMClient::quit_requested signal)

I have to say, libSM and libICE are probably the most horrible API's I've ever had to work with :)
I'm getting "IO error occured doing Protocol Setup on connection" when calling SmcOpenConnection.
Libs used:
libSM-1.2.1-2.fc18.x86_64
libICE-1.0.8-2.fc18.x86_64
it dies somewhere in _IceRead function.
Ignore last comment, problem was with Fedora's Gnome shell.

Basically it looks like it's working fine. Problem is that Firefox quit, ie. nsIAppStartup::Quit takes some time and session manager is sometimes impatient and kills Firefox sooner that it actually quits which leads to another 'Well, this is embarrassing' page.

There seems to be session saving code missing yet. Session saving should be supported in eggsmc.
I think problem is that nsIAppStartup::Quit have some async calls and returns sooner than application is actually quit. The session manager is expecting that Firefox quit is finished when it returns from nsNativeAppSupportUnix::DieCB method.

We need to wait in nsNativeAppSupportUnix::DieCB until application actually finish quitting. Is there something for this in current code?
(In reply to jhorak from comment #12)
> I think problem is that nsIAppStartup::Quit have some async calls and
> returns sooner than application is actually quit. The session manager is
> expecting that Firefox quit is finished when it returns from
> nsNativeAppSupportUnix::DieCB method.
> 
> We need to wait in nsNativeAppSupportUnix::DieCB until application actually
> finish quitting. Is there something for this in current code?

We don't need to wait in DieCB() until the app has done most of the quit, but I expect SmcCloseConnection() should not be called until later in the shutdown sequence.

http://lesstif.sourceforge.net/doc/super-ux/g1ae04e/chap12.html#12.4.1.2

Perhaps there is some later message that can be used to close the SM connection.
Perhaps nsINativeAppSupport::Stop() is appropriate.  I'm not clear from the docs that Stop() will always be called.

Perhaps reordering the operations in DieCB() to close the SM connection after nsIAppStartup::Quit may be sufficient.

I wonder whether GnomeClient was explicitly closing the connection, or whether is just got closed with the Display connection.
The SmcCloseConnection called after nsIAppStartup::Quit improved situation a bit, but it's not 100%. The best so far was not to call SmcCloseConnection from DieCB at all. Destructor of nsNativeAppSupportUnix calls DisconnectFromSM and according to my testing it seems to be working just fine.
so what is the status of this bug now? i'm sorta confused based on the last comment. Also, should it be a blocker for wayland support?
Blocks: 557601
There is really no good reason for this code to exist anymore.  I ended up with libgnome installed as a dependency for another old program and my Firefox install started misbehaving as a (nearly untracable) sideeffect of that.  It took me quite a while to figure out that a Firefox bug could be triggered as a result of installing an unrelated program.

The fact that I didn't have libgnome installed before, and everything was working just fine, is as much of an indication as any that this code needs to die.
(In reply to desrt from comment #16)
> There is really no good reason for this code to exist anymore.  I ended up
> with libgnome installed as a dependency for another old program and my
> Firefox install started misbehaving as a (nearly untracable) sideeffect of
> that.  It took me quite a while to figure out that a Firefox bug could be
> triggered as a result of installing an unrelated program.
> 
> The fact that I didn't have libgnome installed before, and everything was
> working just fine, is as much of an indication as any that this code needs
> to die.
I think this is explained by bug #557601 comment #8 - firefox blocks shutdown for important asynchronous events when you ask it to quit, but not when the session manager asks. I don't think attachment 608321 [details] [diff] [review] addresses this any better than the current code does, so a fix is needed for bug #557601 in any case.

I hope someone with more knowledge of asynchronous shutdown can tell whether my analysis is correct and can point to the best way to fix this.
Actually, comment #14 is accurate.Calling SmcCloseConnection from ~nsNativeAppSupportUnix should always be safe, as all shutdown phases will have completed by this point (in the current code)

I managed to understand the shutdown sequence a little better by attaching gdb and sprinkling some breakpoints around then doing a normal quit. I couldn't get much direct insight into quit-on-logout as just attaching a debugger seemed to slow things down enough that firefox would always SIGHUP, but from my newly-aqcquired understanding, the code paths *are* the same.


Breakpoint 7, nsObserverService::NotifyObservers (this=0x7f029b4d0330, aSubject=0x0, aTopic=0x7f02b12745a1 "profile-before-change", aSomeData=0x7f02b1592000 <nsXREDirProvider::DoShutdown()::kShutdownPersist> u"shutdown-persist") at /usr/src/debug/firefox-37.0.1/mozilla-release/xpcom/ds/nsObserverService.cpp:319
319 {
[Current thread is 1 (Thread 0x7f02b3d5f780 (LWP 8791))]
#0 nsObserverService::NotifyObservers (this=0x7f029b4d0330, aSubject=0x0, aTopic=0x7f02b12745a1 "profile-before-change", aSomeData=0x7f02b1592000 <nsXREDirProvider::DoShutdown()::kShutdownPersist> u"shutdown-persist") at /usr/src/debug/firefox-37.0.1/mozilla-release/xpcom/ds/nsObserverService.cpp:319
#1 0x00007f02b0ab1fe0 in nsXREDirProvider::DoShutdown (this=0x7fffeed02f88) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsXREDirProvider.cpp:905
#2 0x00007f02b0aaa102 in ScopedXPCOMStartup::~ScopedXPCOMStartup (this=0x7f02b29ac7c0, __in_chrg=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsAppRunner.cpp:1324
#3 0x00007f02b0aafc63 in XREMain::XRE_main (this=0x7fffeed02f48, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsAppRunner.cpp:4312
#4 0x00007f02b0aafeac in XRE_main (argc=1, argv=0x7fffeed04468, aAppData=0x7fffeed03158, aFlags=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsAppRunner.cpp:4505
#5 0x00000000004040fd in do_main (argc=argc@entry=1, argv=argv@entry=0x7fffeed04468, xreDirectory=0x7f02b2949780) at /usr/src/debug/firefox-37.0.1/mozilla-release/browser/app/nsBrowserApp.cpp:292
#6 0x000000000040389f in main (argc=1, argv=0x7fffeed04468) at /usr/src/debug/firefox-37.0.1/mozilla-release/browser/app/nsBrowserApp.cpp:661

i.e. profile-before-change (and web-workers-shutdown which happens just after) happens when  toolkit/xre/nsAppRunner.cpp:XREMain::XRE_main destroys the ScopedXPCOMStartup object, just after exiting the event loop.

Breakpoint 3, nsNativeAppSupportUnix::~nsNativeAppSupportUnix (this=0x7f02b29e5460, __in_chrg=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsNativeAppSupportUnix.cpp:126
126 class nsNativeAppSupportUnix : public nsNativeAppSupportBase
[Current thread is 1 (Thread 0x7f02b3d5f780 (LWP 8791))]
#0 nsNativeAppSupportUnix::~nsNativeAppSupportUnix (this=0x7f02b29e5460, __in_chrg=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsNativeAppSupportUnix.cpp:126
#1 0x00007f02b0ab0e3e in nsNativeAppSupportBase::Release (this=0x7f02b29e5460) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsNativeAppSupportBase.cpp:16
#2 0x00007f02b0aafebd in XRE_main (argc=1, argv=0x7fffeed04468, aAppData=0x7fffeed03158, aFlags=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsAppRunner.cpp:4507
#3 0x00000000004040fd in do_main (argc=argc@entry=1, argv=argv@entry=0x7fffeed04468, xreDirectory=0x7f02b2949780) at /usr/src/debug/firefox-37.0.1/mozilla-release/browser/app/nsBrowserApp.cpp:292
#4 0x000000000040389f in main (argc=1, argv=0x7fffeed04468) at /usr/src/debug/firefox-37.0.1/mozilla-release/browser/app/nsBrowserApp.cpp:661

...whereas ~nsNativeAppSupportUnix is called when it goes out of scope when XRE_main returns,so is necessarily later.

As for the appService->Quit() called from die_cb (DieCB in the patch), it does indeed deliberately queue the work of exiting for after it returns:

// Quit the application. We will asynchronously call the appshell's
// Exit() method via nsAppExitEvent to allow one last pass
// through any events in the queue. This guarantees a tidy cleanup.


So attachment 608321 [details] [diff] [review] (with minor tweaks) should indeed fix bug #557601. I'm not sure whether it can be fixed in the current code.
As a minor aside, it would be desirable to do as much work as possible between SaveYourselfCB() and SmcSaveYourselfDone(), and as little as possible between DieCB() and SmcCloseConnection().

At least in KDE, the time-out for the first phase is slightly longer (15 seconds and configurable versus 10 seconds and not configurable IIRC)

But probably the convolutions required to accomplish this aren't worth it.
I'm guessing libsm/libice won't work on wayland? 

Looks like we might be able to get shutdown (but not logout) support from systemd. -https://wiki.freedesktop.org/www/Software/systemd/inhibit/
(In reply to Bryan Quigley from comment #20)
> I'm guessing libsm/libice won't work on wayland? 
> 
> Looks like we might be able to get shutdown (but not logout) support from
> systemd. -https://wiki.freedesktop.org/www/Software/systemd/inhibit/

Well, the ICE protocol was deliberately independent of the X11 protocol so I guess classical session managers could still work with wayland. In practice, I don't know.

A systemd-style logout inhibit would let us delay logout until firefox has saved everything it needs to. But it doesn't provide a way to save restore the position and desktop of each window, as libsm/xsmp does. Nor does it restore applications on the next login or let word-processor style apps restore with the currently-opened file.
I've updated the patch above, and have had some success with it, but it still needs some work in order to disconnect from the SM at the right time. It's been compiled and tested a little on top of 37.0.1, it just needed a couple of trivial conflicts resolved to rebase it to master, but hasn't been tested there.

Changes from v1
---------------

* Rebase to 37.0.1 and resolve conflicts
  - Drop MOZ_WIDGET_GTK == 2 checks
  - Unbitrot to match "Bug 777292 part 2 - Change all nsnull to
nullptr" (Note that this was actually Bug 626472)
  - Unbitrot to match "Bug 773151: Convert nsCAutoString->nsAutoCString
CLOSED TREE r=bsmedberg"
  - Unbitrot to match "Bug 784739 - Switch from NULL to nullptr in
toolkit/;r=ehsan"
  - Avoid "Wwrite-strings" warning for static string conversion
* Further changes to the functionality of the patch
  - Don't DisconnectFromSM() until destructor.
  - Add comments explaining the rationale for disconnect timing wrt
shutdown phases
  - Expand some comments with more detail on the standard or
implementation issues
  - Don't IceSetErrorHandler, and add comment about the motivation for
IceSetIOErrorHandler
  - Rename client_id to prev_client_id; Rename new_client_id to
client_id
  - Set SM_CLIENT_ID on client leader window as per ICCM specified

Questions
---------

* Is the #define gtk_function gtk_function_, #undef gtk_function dance
in the gtk compat defines the right approach?
* are gio watch callbacks always guaranteed to be on the main thread?
* what should I use to log instead of printf?

Remaining issues
----------------
* Doesn't solve bug #557601 - DisconnectFromSM is called at the right
time when quitting firefox, but is too early on shutdown. I don't
understand why yet.
* The interaction call/response with the session manager needs some work
* Doesn't set a couple of SM properties that gnome-client did
* Doesn't use grabs to prevent interaction at the wrong time like
gnome-client did
To expand on it not yet solving bug #557601 - on normal application quit I see New state = DISCONNECTED
 after phase "web-workers-shutdown"; on SM-triggered shutdown/quit I've seen it DISCONNECTED just after phase "profile-change-teardown"..
Updated this patch again, and tested on master and on top of 45.0.1. It works
with both gtk2 and gtk3 builds but I'd still like feedback about whether I'm
doing the right thing with all the gtk plumbing.

Solved Issues
-------------

"Doesn't solve bug #557601 - DisconnectFromSM is called at the right time when
quitting firefox, but is too early on shutdown. I don't understand why yet."

This was just PEBKAC. Turns out that if you run your test firefox build from a
terminal it will reliably be killed when the terminal quits - who knew? Running
with nohup or from e.g. krunner works and firefox survives until it's finished
cleaning up. Firefox started by session restore also behaves as expected.

Changes from v2
---------------

* Rebase to master
  - Unbitrot to match "Bug 1207245 - part 6 - rename nsRefPtr<T> to RefPtr<T>"
  - Unbitrot: add missing nsThreadUtils.h include
* Add keyword for SetClientState logging
* Add symbol to mozgtk so that gtk3 builds link
* Written a new commit message

Questions
---------

* Is the #define gtk_function gtk_function_, #undef gtk_function dance in the
  gtk compat defines the right approach?
* Is the mozgtk stub symbol definition sufficient for gtk3 builds?
* are gio watch callbacks always guaranteed to be on the main thread?
* what should I use to log instead of printf? What log facilities are still
 available in late shutdown?

Remaining issues
----------------

* The interaction call/response with the session manager may need some work
  - in fact it needs to be tested at all.
* Doesn't set a couple of SM properties that gnome-client did
* Doesn't use grabs to prevent interaction at the wrong time like gnome-client
  did. But the libegg SM implementation doesn't do anything to prevent
  interaction, as far as I can tell. So I'm not sure what the correct behaviour
  is.
Attachment #8737263 - Flags: feedback?(karlt)
Attachment #8605868 - Attachment is obsolete: true
(In reply to Oliver Henshaw from comment #18)
> I managed to understand the shutdown sequence a little better by attaching
> gdb and sprinkling some breakpoints around then doing a normal quit.

https://wiki.mozilla.org/XPCOM_Shutdown may be useful, but it sounds like
you've worked things out anyway.

(In reply to Oliver Henshaw from comment #24)
> Questions
> ---------
> 
> * Is the #define gtk_function gtk_function_, #undef gtk_function dance in the
>   gtk compat defines the right approach?

It's probably possible to use macros to save some of the repetition, if you
like.  See, for example,
https://dxr.mozilla.org/mozilla-central/rev/68c0b7d6f16ce5bb023e08050102b5f2fe4aacd8/toolkit/system/gnome/nsPackageKitService.cpp#48
https://dxr.mozilla.org/mozilla-central/rev/68c0b7d6f16ce5bb023e08050102b5f2fe4aacd8/dom/gamepad/linux/udev.h#74

But what's there already looks fine, except that I don't know why the #undefs
are required.

> * Is the mozgtk stub symbol definition sufficient for gtk3 builds?

Yes, good.

> * are gio watch callbacks always guaranteed to be on the main thread?

Yes, if set up on the main thread.

> * what should I use to log instead of printf? What log facilities are still
>  available in late shutdown?

MOZ_LOG and LazyLogModule from
https://dxr.mozilla.org/mozilla-central/source/xpcom/base/Logging.h
should work fine, I think.
That's the preferred system if it works, and I don't see it being shut down
at any stage.

> Remaining issues
> ----------------
> 
> * The interaction call/response with the session manager may need some work
>   - in fact it needs to be tested at all.

I haven't looked at that yet.  Will make some time to look over this patch,
thanks.

> * Doesn't set a couple of SM properties that gnome-client did

Which ones are those?

> * Doesn't use grabs to prevent interaction at the wrong time like
>   gnome-client did. But the libegg SM implementation doesn't do anything to
>   prevent interaction, as far as I can tell. So I'm not sure what the
>   correct behaviour is.

I don't know of a good reason why we need to care about that.
This fails to compile in latest trunk with undefined reference to NS_NewRunnableMethod.
Clang thinks I meant to type NewRunnableMethod.
Was the function renamed?
Changes from v3
---------------

* Rebase to master
  - Unbitrot to match "Bug 1268313: Part 7 - Move NS_NewRunnableMethod and
    friends to mozilla::NewRunnableMethod"
* Set client state to disconnected before closing connection, so that the log
  message appears before the actual disconnection.
* Use mozilla::Logging in SetClientState
  - The first possible MOZ_LOG user is very early in startup so I need to
    add a LogModule::Init() in nsNativeAppSupportUnix::Start as nothing else
    has initialised the logging yet.
* Rework setting of SM Properties
  - Add helper functions setSMValue, setSMProperty.
  - Set all the SM Properties required by the spec
  - Try to set a reasonable-ish fallback value if can't determine a correct value
    (NB: XREMain::XRE_mainInit will exit early if AppData->name is not set)
  - Warn if setting a fallback value
  - Use appropriate name for each SMProp and SMPropValue[]

Solved Issues
-------------

Tested interaction during shutdown (i.e. "quit-application-requested") by
having multiple tabs open and set browser.showQuitWarning to true, and
patching with:

diff --git a/browser/components/nsBrowserGlue.js b/browser/components/nsBrowserGlue.js
index eec2e82..ee0d186 100644
--- a/browser/components/nsBrowserGlue.js
+++ b/browser/components/nsBrowserGlue.js
@@ -312,7 +312,7 @@ BrowserGlue.prototype = {
       case "session-save":
         this._setPrefToSaveSession(true);
         subject.QueryInterface(Ci.nsISupportsPRBool);
-        subject.data = true;
+        subject.data = false;
         break;
       case "places-init-complete":
         if (!this._migrationImportsDefaultBookmarks)
@@ -1377,8 +1377,8 @@ BrowserGlue.prototype = {
     // browser.showQuitWarning specifically covers quitting
     // browser.tabs.warnOnClose is the global "warn when closing multiple tabs" pref

-    var sessionWillBeRestored = Services.prefs.getIntPref("browser.startup.page") == 3 ||
-                                Services.prefs.getBoolPref("browser.sessionstore.resume_session_once");
+    var sessionWillBeRestored = false; //Services.prefs.getIntPref("browser.startup.page") == 3 ||
+                               // Services.prefs.getBoolPref("browser.sessionstore.resume_session_once");
     if (sessionWillBeRestored || !Services.prefs.getBoolPref("browser.warnOnQuit"))
       return;
Attachment #8737263 - Attachment is obsolete: true
Attachment #8737263 - Flags: feedback?(karlt)
Attachment #8754795 - Flags: review?(karlt)
Attachment #8754796 - Flags: review?(karlt)
Comment on attachment 8754795 [details] [diff] [review]
(1/2) - Change modelines to those recommended by coding style

Thanks!
Attachment #8754795 - Flags: review?(karlt) → review+
Comment on attachment 8754796 [details] [diff] [review]
(2/2) - Stop using libgnome and libgnomeui on Linux v4

Andrew might be able to look over this before I get a chance.  Still there's quite a bit going on here, so it may take some time.  Thanks, Andrew!
Attachment #8754796 - Flags: review?(karlt) → review?(andrew)
As indicated in bug 557601, this is a regression with the port to GTK3, for those that still had the libgnome libraries installed.
Blocks: gtk3, 1186003
Keywords: regression
Comment on attachment 8754796 [details] [diff] [review]
(2/2) - Stop using libgnome and libgnomeui on Linux v4

Review of attachment 8754796 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks! There are a few style issues, but the file's style is highly inconsistent anyway. While I'm not that familiar with X11 session management, things seem quite sane according to https://www.x.org/releases/X11R7.6/doc/libSM/SMlib.html.

::: toolkit/xre/nsNativeAppSupportUnix.cpp
@@ +429,5 @@
> +
> +  --gArgc;
> +}
> +
> +static void setSMValue(SmPropValue& val, const nsCString& data)

Please capitalize function names and add a newline after the return type declaration in implementations.

Also, this block of code isn't guarded by MOZ_X11. Systems without X11 won't be able to resolve SmPropValue.

@@ +436,5 @@
> +  val.length = data.Length();
> +}
> +
> +static void setSMProperty(SmProp& prop, const char* name, const char* type,
> +                          int numVals, SmPropValue vals[])

Style and guard as described above.

@@ +491,5 @@
> +    char *arg = *curarg;
> +    if (arg[0] == '-' && arg[1] == '-') {
> +      arg += 2;
> +      if (!strcmp(arg, "sm-disable")) {
> +        RemoveArg(curarg);

Is there any reason we can't just increment argv / decrement argc here instead of shifting the char* pointers in RemoveArg?

@@ +576,5 @@
> +    char errbuf[256];
> +    mSessionConnection = SmcOpenConnection(nullptr, this, SmProtoMajor,
> +                                           SmProtoMinor, mask, &callbacks,
> +                                           prev_client_id.get(), &client_id,
> +                                           256, errbuf);

Replace 256 with sizeof(errbuf).
Attachment #8754796 - Flags: review?(andrew) → review+
Changes from v4
---------------

Made myself the author, since I deserve the blame even though I don't
deserve the credit. Added a note that Chris Coulson wrote the original
patch. If there's a better way to handle this, perhaps someone with
mercurial-fu can do the magic?

Addressed review comments:

> ::: toolkit/xre/nsNativeAppSupportUnix.cpp
> @@ +429,5 @@
> > +
> > +  --gArgc;
> > +}
> > +
> > +static void setSMValue(SmPropValue& val, const nsCString& data)
>
> Please capitalize function names and add a newline after the return type
> declaration in implementations.
>
> Also, this block of code isn't guarded by MOZ_X11. Systems without X11 won't
> be able to resolve SmPropValue.

Done.

>
> @@ +436,5 @@
> > +  val.length = data.Length();
> > +}
> > +
> > +static void setSMProperty(SmProp& prop, const char* name, const char* type,
> > +                          int numVals, SmPropValue vals[])
>
> Style and guard as described above.
and done.

> @@ +491,5 @@
> > +    char *arg = *curarg;
> > +    if (arg[0] == '-' && arg[1] == '-') {
> > +      arg += 2;
> > +      if (!strcmp(arg, "sm-disable")) {
> > +        RemoveArg(curarg);
>
> Is there any reason we can't just increment argv / decrement argc here
> instead of shifting the char* pointers in RemoveArg?
The argument to remove might be at an arbitrary position in argv, so I
don't think there's a better implementation. Actually, it's copied from
toolkit/xre/nsAppRunner.cpp so might it be worth a followup
de-duplicating it?

>
> @@ +576,5 @@
> > +    char errbuf[256];
> > +    mSessionConnection = SmcOpenConnection(nullptr, this, SmProtoMajor,
> > +                                           SmProtoMinor, mask, &callbacks,
> > +                                           prev_client_id.get(), &client_id,
> > +                                           256, errbuf);
>
> Replace 256 with sizeof(errbuf).

Done

Note
----

An easier way to test interaction during shutdown (i.e.
"quit-application-requested") by having multiple tabs open and set
browser.showQuitWarning to true, then configuring the session manager to
not save state (so it only asks for SmSaveGlobal)
Attachment #8754796 - Attachment is obsolete: true
https://treeherder.mozilla.org/#/jobs?repo=try&revision=24c95faf3d3e
is  a compile-only try run. Not sure what tests, if any, are suitable.
sorry had to back out for memory leaks like https://treeherder.mozilla.org/logviewer.html#?job_id=9625415&repo=fx-team
Flags: needinfo?(chrisccoulson)
toolkit/xre/nsAppRunner.cpp:SaveToEnv() intentionally leaks memory because it ends up calling putenv(3) which 'leaks' by design and by specification - this was suppressed for valgrind in bug #793534. But I don't understand why this doesn't trigger the leak sanitizer in the existing code - also there's nothing in these patches that touches nsAppRunner.cpp.

I can't reproduce this locally but I made a couple of wild guesses as to what change could trigger the lsan warning (https://hg.mozilla.org/try/rev/9770e1ff4923 and https://hg.mozilla.org/try/rev/69de1e6095ae ) but the try job still fails - https://treeherder.mozilla.org/#/jobs?repo=try&revision=59094121359b&selectedJob=22442721

If there was an nspr function that used setenv(3) instead of putenv(3) then the strings would be copied inside libc and the existing lsan suppression for libc would cover this.

needinfo'ing mccr8 in case he has any insight into this.
Flags: needinfo?(continuation)
I suggest continuing to reduce the patch to find the minimal change that triggers the leak.

Still, it might be helpful to know how PR_SetEnv() leaks are usually suppressed for the leak sanitizer.
I have no idea why this wouldn't show up before. I'm not an expert on PR_SetEnv, but all of the callers of SaveToEnv are passing in string constants, so it doesn't look to me that you actually need to do a strdup. The header for SetEnv just says that it has to be a "constant, persistent string", and passing in a string literal should be fine. A few call sites do that already. So maybe just inline SaveToEnv and remove the strdup.
Flags: needinfo?(continuation)
nsAppRunner is compiled into libxul.so.  If it leaves static strings in the environment then that can cause shutdown crashes after libxul.so is unloaded - bug 555894 and bug 473629.

My best guess here is that something is or was changing the environment, in a way that changed whether lsan thought the strings were still in use.  The fix may be just to add suppressions for these leaks, but it would be helpful to know why lsan thinks these are no longer in the environment.
LSan treats things reachable from global variables as not being leaks, which is different than Valgrind.
A fairly reduced change is https://hg.mozilla.org/try/rev/def1122a86ba4a17de9da4ed6fa04323c819b753 (with the no-op https://hg.mozilla.org/try/rev/487f69ba0d14 as parent) - see https://treeherder.mozilla.org/#/jobs?repo=try&revision=959751bdb468. I think setenv and putenv share some code in glibc - so perhaps there's some interaction there, i.e. setenv calls prevent putenv leaks but removing the calls reveals them?

If the putenv leaks are something that is fixed in later glibc versions (than the try builder uses) that may explain why I can't reproduce this locally.
(In reply to Oliver Henshaw from comment #43)
> A fairly reduced change is
> https://hg.mozilla.org/try/rev/def1122a86ba4a17de9da4ed6fa04323c819b753

Interesting, thanks.  So either setenv or unsetenv changes something about the other environment variables.

I think these leak reports should just be suppressed.
I guess the way to do that is to add to
http://searchfox.org/mozilla-central/source/build/sanitizers/lsan_suppressions.txt
Flags: needinfo?(chrisccoulson)
Attachment #8754795 - Attachment is obsolete: true
glandium suggested using MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT directly in
the code.
Attachment #8763602 - Flags: review?(continuation)
Minor change that I spotted while trying to track down the setenv
problem - drop unneeded xlib.h include.
Attachment #8757425 - Attachment is obsolete: true
Another un-needed include, this one was left over after the removal of
the N900 code.
Attachment #8763607 - Flags: review?(karlt)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=76ba509b5927 shows the lsan annotation works.

Is there any need to re-run the normal try tests from comment #34?
Comment on attachment 8763602 [details] [diff] [review]
(2/4) - Annotate deliberate leak in SaveToEnv

(In reply to Oliver Henshaw from comment #46)
> glandium suggested using MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT directly in
> the code.

Yes, that's much better than my suggestion, thanks.

LGTM, but to comply with documentation, I'll leave review to Andrew.
Attachment #8763602 - Flags: feedback+
Attachment #8763607 - Flags: review?(karlt) → review+
(In reply to Oliver Henshaw from comment #49)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=76ba509b5927 shows
> the lsan annotation works.
> 
> Is there any need to re-run the normal try tests from comment #34?

No, those ASAN tests should be enough, thanks.
Comment on attachment 8763602 [details] [diff] [review]
(2/4) - Annotate deliberate leak in SaveToEnv

Review of attachment 8763602 [details] [diff] [review]:
-----------------------------------------------------------------

Be sure to #include "mozilla/MemoryChecking.h" in this file please.
Attachment #8763602 - Flags: review?(continuation) → review+
(In reply to Andrew McCreight (PTO-ish) [:mccr8] from comment #52)
> Comment on attachment 8763602 [details] [diff] [review]
> (2/4) - Annotate deliberate leak in SaveToEnv
> 
> Review of attachment 8763602 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Be sure to #include "mozilla/MemoryChecking.h" in this file please.

It's already there. Which is good, because I can pretend I checked.
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/fx-team/rev/10617f9fdc57
(1/4) - Change modelines to those recommended by coding style. r=karlt
https://hg.mozilla.org/integration/fx-team/rev/8996273af30f
(2/4) - Annotate deliberate leak in SaveToEnv. r=karlt
https://hg.mozilla.org/integration/fx-team/rev/c37c930d089f
(3/4) - Stop using libgnome and libgnomeui on Linux. r=acomminos
https://hg.mozilla.org/integration/fx-team/rev/ecd3562339dc
(4/4) - Drop unused Xatom.h include. r=karlt
Keywords: checkin-needed
Thank you for your thoroughness and persistence, Oliver!
Assignee: chrisccoulson → oliver.henshaw
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: