Closed Bug 888900 Opened 6 years ago Closed 6 years ago

Clipboard is emptied(paste command gray out) when exiting the browser since Beta23.0b1, nsWindow destructor doesn't get called

Categories

(Core :: General, defect)

24 Branch
x86_64
Windows 7
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla25
Tracking Status
firefox22 --- unaffected
firefox23 + verified
firefox24 + verified
firefox25 + verified

People

(Reporter: alice0775, Assigned: billm)

References

Details

(Keywords: regression)

Attachments

(1 file)

Build Identifier:
http://hg.mozilla.org/releases/mozilla-aurora/rev/30a7b726833d
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130630 Firefox/24.0 ID:20130630004018
http://hg.mozilla.org/mozilla-central/rev/cbb24a4a96af
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130630 Firefox/25.0 ID:20130630031138


Reproducible: very often

Steps to Reproduce:
1. Start Firefox
2. Copy selected text
3. Exit browser
4. Try to paste on Notepad.exe(or other text editor)

Actual Results:
Paste command gray out

Expected Results:
Clipboard text should be pasted


Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/b31bfbb2bdc4
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130403 Firefox/23.0 ID:20130403083520
Bad:
http://hg.mozilla.org/mozilla-central/rev/b5cb88ccd907
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130403 Firefox/23.0 ID:20130403084327
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b31bfbb2bdc4&tochange=b5cb88ccd907


Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/32125e0a9954
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130403 Firefox/23.0 ID:20130403072721
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/798d734dc32a
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130403 Firefox/23.0 ID:20130403091421
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=32125e0a9954&tochange=798d734dc32a


Regression window(ionmonkey)
Good:
http://hg.mozilla.org/projects/ionmonkey/rev/73009fa09525
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20130308 Firefox/22.0 ID:20130308040205
Bad:
http://hg.mozilla.org/projects/ionmonkey/rev/ef0e134ef78f
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20130312 Firefox/22.0 ID:20130312040201
Pushlog:
http://hg.mozilla.org/projects/ionmonkey/pushloghtml?fromchange=73009fa09525&tochange=ef0e134ef78f


In local build(from ionmonkey):
Last Good: eccf45749400
First Bad: b942f88d95c5
Regressed by:
b942f88d95c5	Jan de Mooij — Merge from mozilla-central.
This is happening to me too in UX Nightly. It began about 3 updates ago.

I started UX in Safe Mode and it fixed it. So then I started back in normal mode and disabled all of my add-ons and restarted. I still couldn't copy text and then paste it after closing UX, so now I'm thinking that it may be either some custom setting I made in about:config, or it's a bug.
Alice, thanks again for reporting and bisecting.

According to comment 0 this is a regression from the baseline compiler landing. Shouldn't it also affect Firefox 23 in that case?
Flags: needinfo?(alice0775)
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
(In reply to Jan de Mooij [:jandem] from comment #2)
> Alice, thanks again for reporting and bisecting.
> 
> According to comment 0 this is a regression from the baseline compiler
> landing. Shouldn't it also affect Firefox 23 in that case?

Yeah, the problem happens in Beta23.0b1 and latest tinderbox build.
http://hg.mozilla.org/releases/mozilla-beta/rev/f863495cdd41
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130701 Firefox/23.0 ID:20130701155331
Flags: needinfo?(alice0775)
Summary: Clipboard is emptied(paste command gray out) when exit browser since Aurora24.0a2 → Clipboard is emptied(paste command gray out) when exit browser since Beta23.0b1
I can't reproduce this on OS X... Will try Windows now.
OK, I see this on Windows. Setting the baselinejit.chrome pref to |false| fixes it. Looks like it's caused by a chrome script that runs during shutdown.
(In reply to Jan de Mooij [:jandem] from comment #6)
> OK, I see this on Windows. Setting the baselinejit.chrome pref to |false|
> fixes it. Looks like it's caused by a chrome script that runs during
> shutdown.

How is baselinejit.chrome set to false? Can you tell me exactly what to do, step-by-step?

Thank you Jan de Mooij!
(In reply to twocables from comment #7)
> How is baselinejit.chrome set to false? Can you tell me exactly what to do,
> step-by-step?

Type about:config in the address bar, hit enter and click the "I'll be careful" button. Then in the search field type baselinejit.chrome There should be one result, double click on it to change the value to "false".

This could make the browser or extensions slower though, so do this at your own risk.
In reply to Jan de Mooij [:jandem] from comment #8)
> (In reply to twocables from comment #7)
> > How is baselinejit.chrome set to false? Can you tell me exactly what to do,
> > step-by-step?
> 
> Type about:config in the address bar, hit enter and click the "I'll be
> careful" button. Then in the search field type baselinejit.chrome There
> should be one result, double click on it to change the value to "false".
> 
> This could make the browser or extensions slower though, so do this at your
> own risk.

Oh, I was trying that in Firefox 22 thinking that I was in UX. Facepalm. :)

Anyway, it doesn't work for me. Is there anything else that can be tried?
(In reply to twocables from comment #9)
> Anyway, it doesn't work for me. Is there anything else that can be tried?

Did you restart the browser after changing that pref to |false|? If you're still seeing the problem after a restart it may be a different issue.
Yeah, I even did it twice just to be sure.
Alice, is it possible the regression range in comment 0 is wrong or for another bug? This reproduces reliably for me with newer builds, but works fine with nightlies from 1-2 months ago. Bisecting now.
I think the problem of twocables@comcast.net is different bug because he/she is using UX build.
Hm bisecting points to a regression around June 25, but when I bisect on inbound the first revision fails.

This does not match comment 0, bug 873484 (filed in May) and bug 883554 (filed two weeks ago), so it's likely a timing/GC/threading issue...
(In reply to Alice0775 White from comment #13)
> I think the problem of twocables@comcast.net is different bug because he/she
> is using UX build.

Oh. I just assumed that this was for UX because it was posted by someone participating in a UX thread on the MozillaZine forum.

I will submit a new bug. Thank you for pointing it out, Alice!
twocables is filing a separate bug (please do reference it here!).

For the original issue, is anybody able to repro this regression range? (qawanted for a Windows repro) If not, we won't have reason to track. Thanks!
Keywords: qawanted
I'm reporting a bug for UX Nightly. Should I choose "Trunk", or "25 Branch"?
Has anyone reproduced yet?
QA Contact: anthony.s.hughes
Firefox 23.0a1 2013-04-02 (aae004a3c5d9) - UNAFFECTED
Firefox 23.0a1 2013-04-03 (97cfc16ba5dc) - AFFECTED

Pushlog:
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aae004a3c5d9&tochange=97cfc16ba5dc

Note that I'm unable to regress this any further on mozilla-central due to the age of this regression.
Keywords: qawanted
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #19)
> Firefox 23.0a1 2013-04-02 (aae004a3c5d9) - UNAFFECTED
> Firefox 23.0a1 2013-04-03 (97cfc16ba5dc) - AFFECTED
> 
> Pushlog:
> > http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aae004a3c5d9&tochange=97cfc16ba5dc

Anybody correct me if I'm wrong, but if I'm reading this [0] right it broke a day before the baseline compiler merge. At least for Anthony, Alice and I get different ranges...

[0] http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2013-04-01&enddate=2013-04-05
If you're able to reproduce, do we need to discuss a regression range any longer? Sounds like it's time to dig in.
The range of comment20 includes the range of comment#0.

The offending changeset is b942f88d95c5	definitely.
When this bug happened,
*Paste command is grayed out on notepad.exe and iexplore.exe.
*However, Paste command is enabled on Thunderbird.exe/wordpad.exe. And the paste command is carried out successfully.
I'm on it. I hope to have more info later today.
I used binary search on script->lineno to see which Baseline-compiled script is responsible (have the compiler abort on scripts with lineno not in the current range, etc). The culprit seems to be WindowsPreviewPerTab.jsm:357, PreviewController.prototype.handleEvent.

The problem is likely event-related so this makes sense I think. Now trying to find out what's going wrong.
This is pretty weird. Here's what happens:

(1) The Baseline JIT compiles PreviewController.prototype.handleEvent, WindowsPreviewPerTab.jsm:357.

(2) We attach a Native Call stub for the call to item here:

        let r = clientRects.item(i);

(3) GC marks the JSFunction* (clientRects.item) stored inside this stub.

(4) Somehow, (3) causes us to never run nsWindow::~nsWindow with sInstanceCount == 0, so we never call OleFlushClipboard() and the clipboard is emptied.

The question is why we don't call nsWindow::~nsWindow for the last instance. Here's how this used to happen:

     xul.dll!nsWindow::~nsWindow()  Line 398    C++
     xul.dll!nsWindow::`scalar deleting destructor'()  + 0x8 bytes    C++
     xul.dll!nsBaseWidget::Release()  Line 71 + 0x1b bytes    C++
     xul.dll!nsDeviceContext::~nsDeviceContext()  Line 242 + 0x33 bytes    C++
     xul.dll!nsDeviceContext::Release()  Line 26 + 0x18 bytes    C++
     xul.dll!nsPresContext::~nsPresContext()  Line 316 + 0xc1 bytes    C++
     xul.dll!nsPresContext::`scalar deleting destructor'()  + 0x8 bytes    C++
     xul.dll!nsPresContext::DeleteCycleCollectable()  Line 324 + 0xf bytes    C++
     xul.dll!mozilla::dom::BarProp::cycleCollection::DeleteCycleCollectableImpl(void * p)  Line 38 + 0xa bytes    C++
     xul.dll!SnowWhiteKiller::~SnowWhiteKiller()  Line 2169    C++
     xul.dll!nsCycleCollector::FreeSnowWhite(bool aUntilNoSWInPurpleBuffer)  Line 2280 + 0xa bytes    C++
     xul.dll!nsCycleCollector::Shutdown()  Line 3017    C++
     xul.dll!nsCycleCollector_shutdown()  Line 3337    C++
     xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager * servMgr)  Line 679    C++
     xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup()  Line 1128    C++

I think that thanks to (3) we are no longer able to collect our PresContext or something. Does this make sense? Any idea how to fix this? CC'ing people who should know more...
At first glance, it seems like a problem that we're relying on a destructor to run at shutdown. We've been optimizing the shutdown sequence so that we can kill the process rather than having to destroy everything. This means that side effects like file I/O and flushing the clipboard should be done somewhere else to ensure that they run.

It seems like just bad luck that the baseline compiler triggered this. I'm guessing it just changes marking a little bit so that the window stays alive longer and doesn't get destroyed before shutdown.
It looks like bug 402439 maybe had something to do with this. It moved the clipboard flushing from an observer to the nsWindow destructor. Maybe we could do the flushing in both places?

I pushed a simple change to try to do this.
https://tbpl.mozilla.org/?tree=Try&rev=6274250d37c5

Jan, could you maybe try this out tomorrow if it looks green?
That's believable.  Is the immediate problem that we're not calling OleFlushClipboard?  It should be easy to move that somewhere else (like the widget XPCOM module dtor).
(In reply to Bill McCloskey (:billm) from comment #28)
> Jan, could you maybe try this out tomorrow if it looks green?

Your patch fixes the problem, I tested both the Try build (comment 30) and a local build just to be sure. Thanks again! Assigning this to you.
Assignee: jdemooij → wmccloskey
This patch causes us to flush the clipboard when we destroy the last nsWindow and when we get a shutdown notification. That way it shouldn't regress bug 402439, since we're guaranteed to flush the clipboard before shutting down OLE (which will still happen in ~nsWindow).

I'm not sure whether it's safe to do CloseClipboard(). The old code did it, but I don't know enough about the Win32 API to say whether it's right. Maybe someone else knows?

It seems like this bug could be more widespread than just the issue reported here. As far as I can tell, if we ever leak a window during shutdown (which, despite our best efforts, happens a lot) then we won't write out the clipboard. That seems bad, so I think we should consider backporting a fix for this aggressively if it works on trunk.
Attachment #776772 - Flags: review?(jmathies)
Duplicate of this bug: 894412
(In reply to Bill McCloskey (:billm) from comment #27)
> At first glance, it seems like a problem that we're relying on a destructor
> to run at shutdown. We've been optimizing the shutdown sequence so that we
> can kill the process rather than having to destroy everything. This means
> that side effects like file I/O and flushing the clipboard should be done
> somewhere else to ensure that they run.
> 
> It seems like just bad luck that the baseline compiler triggered this. I'm
> guessing it just changes marking a little bit so that the window stays alive
> longer and doesn't get destroyed before shutdown.

How early might this happen? We free resources in the Windows destroy native window callback OnDestroy(). That code *must* run.

Not running a class destructor in the name of quicker shutdown perf (why do we care about that if the main window is already hidden?) seems wrong.
Summary: Clipboard is emptied(paste command gray out) when exit browser since Beta23.0b1 → Clipboard is emptied(paste command gray out) when exiting the browser since Beta23.0b1, nsWindow destructor doesn't get called
(In reply to Bill McCloskey (:billm) from comment #32)
> Created attachment 776772 [details] [diff] [review]
> flush the clipboard in an observer

How about moving this call to Destroy() inside a check of sInstanceCount?
We also free ime related resources in the dtor which might not get called.
(In reply to Jim Mathies [:jimm] from comment #36)
> We also free ime related resources in the dtor which might not get called.

nsTextStore::Terminate() and nsIMM32Handler::Terminate() just clean up the global instances. Perhaps, it might not cause any problems.

Cannot we use ModuleDtor for the cleaning up?
http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWidgetFactory.cpp#261

I mean, can we make nsWindow::Shutdown() for the |sInscanceCount == 0| case?
Is there anything actually wrong with flushing in an observer?

> How about moving this call to Destroy() inside a check of sInstanceCount?

sInstanceCount is decremented in ~nsWindow. In general, we don't know if or when that will be called because it may be run by the cycle collector. So I don't think it's a good idea to do something that depends on sInstanceCount being correct.

I think the main reason we care about shutdown performance is that it matters for B2G. Optimizing it for desktop naturally fell out of that. Shutdown performance matters for desktop as well if you're trying to restart your browser. I've had shutdowns that took tens of seconds, and you have to wait until it's done to re-open Firefox or else you'll be told that the profile is in use.

The basic idea of all this as I understand it is that stuff that destructors should only be responsible for freeing memory while observers handle important stuff like writing out to disk. That way we can skip the destructors.
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #37)
> (In reply to Jim Mathies [:jimm] from comment #36)
> > We also free ime related resources in the dtor which might not get called.
> 
> nsTextStore::Terminate() and nsIMM32Handler::Terminate() just clean up the
> global instances. Perhaps, it might not cause any problems.
> 
> Cannot we use ModuleDtor for the cleaning up?
> http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWidgetFactory.
> cpp#261
> 
> I mean, can we make nsWindow::Shutdown() for the |sInscanceCount == 0| case?

I like this more, assuming modules get shut down properly. Rather than duplicating code, I'd like to identify a location we know won't get skipped where we can do global shutdown work like OleFlushClipboard and OleUninitialize.
(In reply to Bill McCloskey (:billm) from comment #38)
> Is there anything actually wrong with flushing in an observer?
> 
> > How about moving this call to Destroy() inside a check of sInstanceCount?
> 
> sInstanceCount is decremented in ~nsWindow. In general, we don't know if or
> when that will be called because it may be run by the cycle collector. So I
> don't think it's a good idea to do something that depends on sInstanceCount
> being correct.

Ok. Per Masayuki's post, do you think nsWidgetWindowsModuleDtor() is safe?

> I think the main reason we care about shutdown performance is that it
> matters for B2G. Optimizing it for desktop naturally fell out of that.
> Shutdown performance matters for desktop as well if you're trying to restart
> your browser. I've had shutdowns that took tens of seconds, and you have to
> wait until it's done to re-open Firefox or else you'll be told that the
> profile is in use.

I'm really surprised we approved this kind of blanket change without thinking it through. Any idea which bug or platform post this was discussed in?

> The basic idea of all this as I understand it is that stuff that destructors
> should only be responsible for freeing memory while observers handle
> important stuff like writing out to disk. That way we can skip the
> destructors.

I'm fine with moving stuff out of our dtors as long as we find a suitable place to do the work we really need to do to get a clean shutdown.

More generally, I wonder about other areas of the code base, for example our acceleration related code and how failing the cleanup might effect graphic driver performance. We also allocate system memory in areas like IPC, drag and drop, and I'm sure there are others. I would love to say with confidence that once the process shuts down the system is smart enough to free everything up, but I'm not sure that is a safe bet.
The main discussion about shutting down is in bug 662444 I think. The change that stopped us from running the cycle collector at shutdown was bug 818739.

I'm not sure whether XPCOM modules are shut down properly. Let's ask bsmedberg.
Flags: needinfo?(benjamin)
(In reply to Jim Mathies [:jimm] from comment #40)
 
> > I think the main reason we care about shutdown performance is that it
> > matters for B2G. Optimizing it for desktop naturally fell out of that.
> > Shutdown performance matters for desktop as well if you're trying to restart
> > your browser. I've had shutdowns that took tens of seconds, and you have to
> > wait until it's done to re-open Firefox or else you'll be told that the
> > profile is in use.
> 
> I'm really surprised we approved this kind of blanket change without
> thinking it through. Any idea which bug or platform post this was discussed
> in?
It was thought through. Any code relying on dtors to do something during shutdown is just plain wrong,
especially in CCed/GCed environment. One should use various shutdown notifications, not dtors.
> How early might this happen? We free resources in the Windows destroy native
> window callback OnDestroy(). That code *must* run.

Why? If we've already decided to shut down, freeing resources is just extra work. The plan of record is to kill the Firefox process once we've shut down the profile and skip all of the rest of shutdown (except when doing leak checking in some debug builds).

I'm not sure what "XPCOM modules are shut down properly" means. I believe that (since we haven't done the exit patch yet) the XPCOM module destructors will run. But nobody should be relying on those destructors to do anything except clean up leaks. This doesn't have anything to do with JSM modules, which are handled by xpconnect.
Flags: needinfo?(benjamin)
The fact (running destructor isn't guaranteed) sounds there might be other bugs which fail cleaning up system wide API on other platforms too.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #43)
> > How early might this happen? We free resources in the Windows destroy native
> > window callback OnDestroy(). That code *must* run.
> 
> Why? If we've already decided to shut down, freeing resources is just extra
> work. The plan of record is to kill the Firefox process once we've shut down
> the profile and skip all of the rest of shutdown (except when doing leak
> checking in some debug builds).

Poking through the code a bit, I see we use calls like CreateFileMapping, which return a HANDLE which must be closed in order to free the associated memory. Also while I'm not familiar with our D3D/D2D backend and how the system manages memory there, I'm concerned an improper shutdown might leave resources allocated in gfx drivers.

Plus, this bug is a perfect example of what can go wrong without proper shutdown.
Why don't we contain the discussion here to this particular bug. bsmedberg, when we're ready to land the "terminate before proper shutdown" work I'd suggest we post to dev platform first to discuss possible side effects.
(In reply to Jim Mathies [:jimm] from comment #45)
> Plus, this bug is a perfect example of what can go wrong without proper
> shutdown.
Note, we have *never* guaranteed the order of dtors in general. That would be hard in refcounted
environment, and even harder when GC/CC are involved.
Windows closes all open resources when a process terminates, including open HANDLEs and whatnot. We have to be prepared for this anyway, since it will terminate processes after we finish from WM_ENDSESSION. Raymond Chen is talking about Firefox here: http://blogs.msdn.com/b/oldnewthing/archive/2012/01/05/10253268.aspx

So I will strongly assert that we should not try to free all our D2D/D3D resources on exit, nor should we try to close all our filemappings, or anything like that. We should make sure that profile data is properly flushed, and let Windows clean up everything else.

Our strategy for exit(0) was already discussed and decided on dev-platform.
Comment on attachment 776772 [details] [diff] [review]
flush the clipboard in an observer

I'm ok with this as a work around. I'll file a follow up on the code in ~nsWindow not executing properly on shutdown.
Attachment #776772 - Flags: review?(jmathies) → review+
Comment on attachment 776772 [details] [diff] [review]
flush the clipboard in an observer

I'm going to request approval for all branches, although it seems debatable. The risk to taking this patch seems pretty low to me. The problem is pretty severe when it happens (loss of data in the clipboard). On Aurora and Beta, it seems like this happens pretty frequently because of a quirk with the baseline compiler. However, I think it's a problem on all Firefox version since 2007: if we happen to leak any windows during shutdown, then we won't flush the clipboard.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: it can cause the data in the clipboard to be lost
User impact if declined: data loss
Fix Landed on Version: 25
Risk to taking this patch (and alternatives if risky): seems pretty low
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): immediate cause was the baseline compiler, but this has always been somewhat broken
User impact if declined: loss of clipboard data
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #776772 - Flags: approval-mozilla-esr17?
Attachment #776772 - Flags: approval-mozilla-beta?
Attachment #776772 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/e74e95c046e7
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Comment on attachment 776772 [details] [diff] [review]
flush the clipboard in an observer

We'll take this low risk fix on aurora/beta but it doesn't meet ESR criteria and will have to ride the trains to be in the 24.0 ESR release coming up in another cycle.
Attachment #776772 - Flags: approval-mozilla-esr17?
Attachment #776772 - Flags: approval-mozilla-esr17-
Attachment #776772 - Flags: approval-mozilla-beta?
Attachment #776772 - Flags: approval-mozilla-beta+
Attachment #776772 - Flags: approval-mozilla-aurora?
Attachment #776772 - Flags: approval-mozilla-aurora+
Verified as fixed on Firefox 23 beta 8:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0
Build ID: 20130722172257
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130723 Firefox/24.0 buildid 20130723004004
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20130723 Firefox/25.0 buildid 20130723030205

Verified fixed on aurora 24.0a2, nightly 25.0a1 with STR in comment 0.
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.