Closed Bug 455283 Opened 16 years ago Closed 16 years ago

RoboForm extension causes Firefox 3.0.2 to crash.

Categories

(Firefox :: Extension Compatibility, defect)

3.0 Branch
x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: ivank, Assigned: roc)

Details

(Keywords: verified1.9.0.2, verified1.9.0.4)

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071128 SeaMonkey/1.1.7 Mnenhy/0.7.5.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071128 SeaMonkey/1.1.7 Mnenhy/0.7.5.0

Crash report:
http://crash-stats.mozilla.com/report/index/74469270-8038-11dd-8fb3-0013211cbf8a

nsIFrame interface has been changed in Firefox 3.0.2.
Virtual method GetScreenRectInAppUnitsExternal has been added.
RoboForm uses nsIFrame VFT from Firefox 3.0, and RoboForm call of nsIFrame->GetWindow() causes Firefox to crash.

Due to your article:
http://developer.mozilla.org/En/Updating_extensions_for_Firefox_3
Firefox 3.0.* API shouldn't be changed since Firefox 3.0 release.

Is it possible to stop Firefox 3.0.2 release and to return back the API?
Or at least to move GetScreenRectInAppUnitsExternal to the end of nsIFrame definition?

Here is the code from RoboForm:
nsIFrame *frame = nsnull;
frame = presShell->GetPrimaryFrameFor(pContent);
if(!frame) return E_FAIL;
nsIWidget *widget = nsnull;
widget = frame->GetWindow();
if(!widget) return E_FAIL;
HWND hwnd = (HWND)widget->GetNativeData(NS_NATIVE_WINDOW);

Reproducible: Always

Steps to Reproduce:
1. 
2.
3.
Flags: blocking1.9.0.3?
Flags: blocking1.9.0.2?
nsIFrame is not, and has never been, any kind of public API. Don't use it.
Component: General → Extension Compatibility
QA Contact: general → extension.compatibility
see also https://bugzilla.mozilla.org/show_bug.cgi?id=436302#c18
i got two more crash reports in the pipe, but those keep lasting in pending state for hours :(
bp-2ec12b00-826f-11dd-859c-001a4bd43ed6
bp-77560d54-826f-11dd-9e66-001321b13766

a workaround is to disable the roboform toolbar (actually before the crash happens).
RoboForm has a history of not using public APIs, fwiw. I bet they're not alone. Is there clear documentation (or indication in the code) of which APIs are public (and thus not altered in dot releases) vs. private (and thus shouldn't be used by add-ons distributed to Firefox users)?
Status: UNCONFIRMED → NEW
Ever confirmed: true
FWIW, RoboForm claims about 100k users on Firefox 3 at the moment.
(In reply to comment #3)
> RoboForm has a history of not using public APIs, fwiw. I bet they're not alone.
> Is there clear documentation (or indication in the code) of which APIs are
> public (and thus not altered in dot releases) vs. private (and thus shouldn't
> be used by add-ons distributed to Firefox users)?

Well the issue here is that the "frozen" api's generally don't get you very far at all. But at least pointing out that any interface that isn't in an idl is probably not good to use would be good.
The change was made in bug 422607 which fixed a regression with window positioning.

Bug 436302 comment 18 indicates that the issue might be with a deprecated DLL file; is that the case, or is that a red herring?
this is a new bug.

you change your supposedly frozen APIs and things start crashing

he reported it here:
https://bugzilla.mozilla.org/show_bug.cgi?id=455283

do you REALLY have to change APIs in every dot version and cause crashes of all binary addons?
Vadim: please see comment 1 - this API isn't frozen.
frozen APIs are not sufficient to do anything serious.

we need this API to get coordinates of objects on the page.

our developer should provide more detailed comments.

there is no any other api that does it.

the truth is that your frozen APIs are not complete,
so you cannot change indices of unfrozen APIs entry points.

then again, unless you have a pleasure of crashing and trashing otherwise working products.
actually i see that the problem is described at the top of the ticket.

securoty releases (and 3.0.2 is probably the one) are not supposed to change any APIs at all.

what exactly is the point of adding calls in the middle other 
than to increase crashes of addons that use this APIs?
APIs that should be stable on security releases (although not between major releases) are:
http://mxr.mozilla.org/seamonkey/source/dom/public/idl/core/nsIDOMNSElement.idl
http://mxr.mozilla.org/seamonkey/source/dom/public/idl/html/nsIDOMNSHTMLElement.idl

If you really need external APIs for other things, you should let Mozilla developers know what those other things are, rather than just using internal ones.  (It's not like you haven't run into this problem before.)
i asked our FF developers to compile list of un-frozen APIs 
that we use with justification for why we use them.

meanwhile is it possible to move method GetScreenRectInAppUnitsExternal to the end of the list, so that all other methods retains their indices in vtable.

this looks like a simple change and it will fix the crash problem.
Adding to the end of the list sounds reasonable; I have no idea what the state of 3.0.2 is, though.
Ivan Khlistov:
Examples of using unfrozen interfaces in Roboform:

*nsIFrame to get html element frame (to get element coordinates)
*nsIDocShellTreeItem used to search for HTML docshells
*nsIMutationObserver to catch DOM tree mutation events
*nsIWidget to get handles of firefox windows (for example requires for Basic Auth dialogs formfill)
Roboform requires info about Firefox windows. So we need HWND and have to use dependent unfrozen interfaces.
*nsIDOMDocumentXBL used to receive XUL elements
*nsIDOMCSS2Properties used to work with CSS
(In reply to comment #15)
> Ivan Khlistov:
> Examples of using unfrozen interfaces in Roboform:
> 
> *nsIFrame to get html element frame (to get element coordinates)

see comment 12 for public interfaces to do this.  (Public interfaces that shouldn't change for security releases, but might change for major releases -- in fact, the two interfaces I pointed you to will be combined in the next major release.)

> *nsIDocShellTreeItem used to search for HTML docshells

You'd need to provide more information about what it is you actually want to do here (although maybe somebody else knows of frozen interfaces for this, or is willing to say that this interface shouldn't change for security releases, which might be the case) -- and this bug report probably isn't the best place.  Maybe discuss in one of the newsgroups?

> *nsIMutationObserver to catch DOM tree mutation events

The frozen DOM event APIs should let you do this.

> *nsIWidget to get handles of firefox windows (for example requires for Basic
> Auth dialogs formfill)
> Roboform requires info about Firefox windows. So we need HWND and have to use
> dependent unfrozen interfaces.

Same comment as above about more information being needed.  (But unlike the previous case, nsIWidget is not potentially public.)

> *nsIDOMDocumentXBL used to receive XUL elements
> *nsIDOMCSS2Properties used to work with CSS

These are public interfaces that shouldn't change in security releases, although they may change between major releases.
so can firefox people fix this particular crash by moving GetScreenRectInAppUnitsExternal to the
end of the list, so that all other methods retain their indices in vtable.

RoboForm Adapter does not expect Firefox APIs to change for security dot releases and there is no any protecttion against that.
we only expect things to change for major releases.

i think firefox crashing for about 300K users is a problem that deserves a fix.
nsIFrame is not an API. I'm sorry it seems like one, but it's not even XPCOM, and it's for private use within layout. You need to stop using it, although I'm pretty sure we've already had this conversation.

nsIDocShellTreeItem will not change for minor releases
so can you restore the order of entry points for FF 3.0.2 and 
we will meanwhile figure out how *not* to use this API.

the problem is that RF version that uses it is already out there and
it will start crashing once FF 3.0.2 comes out.

even if we figure out how to do it without nsIFrame,
we cannot change the existing version.
Another thing you should look at is the accessibility APIs. They are supported and offer a lot of functionality for looking at the contents of documents, changing the contents of editors, etc.
Attached patch patch as requested by roc (obsolete) — Splinter Review
Comment on attachment 338760 [details] [diff] [review]
patch as requested by roc

We need this in 1.9.0.2.
Attachment #338760 - Flags: approval1.9.0.2?
Roc: is there need beyond Roboform? This bug isn't marked blocking yet, so I'm having trouble understanding if you're saying we need to respin or not.
The other possible option is to remove this from nsIFrame on branch entirely and put it in nsLayoutUtils, no?  Or just remove it completely, since we don't have any callers of GetScreenRectInAppUnitsExternal that I can see.

Vadim, this has come up before, and we've repeatedly told you guys what APIs you _should_ be using.  We'll fix this on our end, but please fix your code.  It's been years now that this problem has persisted on your end.
Based on comments here, I'm comfortable stating that this does block 1.9.0.3; I'm not yet convinced that we need to block 1.9.0.2 for it. If we have reasons to respin, we might be able to take this patch. Shaver seems to be blocking and tackling the idea of getting 1.9.0.2+this patch builds generated for people to test against.

If we *don't* block for 1.9.0.2 on this bug, then we'll need to blocklist this version of RoboForm for Firefox 3.0.2 users.
Flags: blocking1.9.0.3? → blocking1.9.0.3+
what do you mean? your tetminology here is a bit heavy.
"respin", "shaver" -- i do not know what this is.

do you say FF 3.0.2 will crash or block RF and 
then FF 3.0.3 will work ok with RoboForm?

can you just keep the order of calls in FF 3.0.2,
so that RF does not crash and we have time to get rid of offending calls.
my general proposal is *not* to make any changes that change existing function vtable indices for dot security releases.

any binary addon is much more likely to crash if you do this.

such changes are better done in major releases such as upcoming FF 3.1
where addon developers have time to test them and make them work.

we are going to fully convert to "frozen" interfaces for the FF 3.1 release,
but we may need assistance from Mozilla developers if we cannot get necessary functionality from "frozen" interfaces.
Vadim: sorry, didn't mean to confuse you.

(In reply to comment #26)
> do you say FF 3.0.2 will crash or block RF and 
> then FF 3.0.3 will work ok with RoboForm?

For Firefox 3.0.2, we have two options:
1. Delay the release of Firefox 3.0.2, take this patch & rebuild.
2. Ship Firefox 3.0.2 tomorrow and blocklist the version of RoboForm that crashes.

I am saying that at this point we are unwilling to delay the release of Firefox 3.0.2 any further, so we will likely be following option #2.

We *will* take Roc's patch for Firefox 3.0.3, at which point RoboForm should work again. Or you can update RoboForm to not break, I guess, but then when we take this patch for 3.0.3, the vtable will go back to how it was before 3.0.2 :(

> can you just keep the order of calls in FF 3.0.2,
> so that RF does not crash and we have time to get rid of offending calls.

To do this would add a week to our shipping schedule. We are still considering it, but we would like to be able to ship this release sooner rather than later.

(In reply to comment #27)
> my general proposal is *not* to make any changes that change existing function
> vtable indices for dot security releases.

While I think that's a good general guideline for us to work to, I don't think it's something we can guarantee for reasons that have been expressed to you before. First and foremost being that we don't support this as a way of add-ons calling into our code.

> any binary addon is much more likely to crash if you do this.

Agreed, which is why it's a good guideline to follow; still not guaranteed.

> we are going to fully convert to "frozen" interfaces for the FF 3.1 release,
> but we may need assistance from Mozilla developers if we cannot get necessary
> functionality from "frozen" interfaces.

Good to hear. Please join us in IRC (irc.mozilla.org #developers) if you need any help.
> "respin", "shaver" -- i do not know what this is.

"respin" means throwing away the already-built final bits of Firefox 3.0.2, making changes, and recreating those bits across all locales, etc.  As beltzner says, this means a week-long delay in ship or so.

"Shaver" is a name.  Specifically the last name of Mike Shaver, currently VP of engineering.
>We *will* take Roc's patch for Firefox 3.0.3, at which point RoboForm should
>work again. Or you can update RoboForm to not break, I guess, but then when we
>take this patch for 3.0.3, the vtable will go back to how it was before 3.0.2

1. we cannot really "upgrade roboform not to break", not the version that users already have. we can offer user to upgrade and then user upgrades or not.

2. by blocking RF or letting it crash you will be effectively destroying our business and not helping your reputation either. even microsoft -- which is an evil corporation -- is not evil enough to change binary interfaces in security updates.

3. we really do not like to see our business destroyed, 
so we request and stongly insist that this problem must be fixed in 3.0.2.

4. not fixing it in 3.0.2 and fixing it in 3.0.3 makes no sense at all, it will just further confuse the issue, because now we will have 3 different setups to address. given that sometimes declared firefox version does not correspond to the actual version, it may create potentially unfixable crashes. if you are not going to fix it in 3.0.2, then you should just leave it as is -- so that you do not make it worse.
Update: it looks like bug 454406 might require us to rebuild Firefox 3.0.2 anyway, so we'll probably take this fix.

Vadim: looks like things will be OK, but we'll want you to test RoboForm against these new builds when they're available, OK?
Comment on attachment 338760 [details] [diff] [review]
patch as requested by roc

Please land on cvs trunk as well as GECKO190_20080827_RELBRANCH
Attachment #338760 - Flags: approval1.9.0.2? → approval1.9.0.2+
(In reply to comment #32)
> (From update of attachment 338760 [details] [diff] [review])
> Please land on cvs trunk as well as GECKO190_20080827_RELBRANCH

Done:
mozilla/layout/generic/nsIFrame.h 	3.403
mozilla/layout/generic/nsIFrame.h 	3.402.2.1
(In reply to comment #30)
> even microsoft -- which is an evil corporation -- is not evil enough to
> change binary interfaces in security updates.

How do you know? They don't publish their source code. We certainly didn't
change any of the supported interfaces we export to $(OBJDIR)/dist/sdk which
more or less correspond with the most you'd be able to use in a MS product.

> 3. we really do not like to see our business destroyed, 

Nor do we want to inconvenience the thousands of Firefox users who also use
Roboform. If you're going to base your product on Firefox internals you really
should be testing with Firefox nightly builds just as Firefox developers do to
catch regressions sooner. This could have been caught a month ago rather than
the day before we were scheduled to ship, after we've been testing "final"
builds for a week or two.

To get on the nightly update train -- at least one of your testers -- go to
ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/ and install one of the
builds that end in "mozilla1.9.0". Every day you'll get prompted to install the
latest development patches.

If you still support Firefox 2 another tester should install one of the
"mozilla1.8" builds so you can make sure we don't accidentally break that, too.
Attached patch better fixSplinter Review
We simply don't need this function. It was only added by way of analogy to GetScreenRectExternal, there are no users.

We shouldn't rev the IID on nsIFrame. It hasn't changed on 1.9.0.x yet and it shouldn't change on that branch. Here we're just reverting the vtable change so we don't need to change the IID for this.

I'll land this on trunk as well.
Assignee: nobody → roc
Attachment #338760 - Attachment is obsolete: true
Attachment #338911 - Flags: superreview?(bzbarsky)
Attachment #338911 - Flags: review?(bzbarsky)
Attachment #338911 - Flags: approval1.9.0.2?
I guess on trunk, we should rev the IID though. I'll do that.
Attachment #338911 - Flags: superreview?(bzbarsky)
Attachment #338911 - Flags: superreview+
Attachment #338911 - Flags: review?(bzbarsky)
Attachment #338911 - Flags: review+
>How do you know? They don't publish their source code. We certainly didn't
>hange any of the supported interfaces we export to $(OBJDIR)/dist/sdk which
>ore or less correspond with the most you'd be able to use in a MS product.

yes, they do not publish their source code and we do not really use any internal functions, that's true. still we never had a situation where a MS security release comes out and RF starts to crash the IE. And believe me, we try to dig pretty deep into IE too, so there are things that might have failed due to changes in unpublished internals -- but they do not.

we do have problems only when they go from IE6 to IE7 or from IE7 to IE8
or from XP to Vista. but this is announced like 6 months in advance and there is plenty of time to test betas.

if you want to win over MS one day in this browser war (and we would like to encourage you to do it), i recommend that you adopt that tactic of theirs -- do not change anything that would cause binary incompatibilities for small releases. it does not look like much but it fixes many problems that you do not even see before the official release.

> This could have been caught a month ago rather than 
> the day before we were scheduled to ship, after we've been testing "final"
> builds for a week or two.

Yes, the timing was not good on our side either.
Me personally, I learned about this yesterday or day before.
I will instruct developers to jump onto testing nightly releases right away.

The long term solution (FF 3.1) is to switch to using only official frozen interfaces.

We still support FF 2.0 and even 1.5, but they have a lot less users.
but we will be testing them too, esp 2.0, as it is still being used by many.
Comment on attachment 338911 [details] [diff] [review]
better fix

a=beltzner, please land on cvs trunk and the relbranch
Attachment #338911 - Flags: approval1.9.0.2? → approval1.9.0.2+
(In reply to comment #37)
> The long term solution (FF 3.1) is to switch to using only official frozen
> interfaces.

In bug 256418 comment 43 you indicated that was the plan, which was over three years ago.  What went wrong then, and what can we do to prevent this from happening again?  Considering that 3.1 is rapidly approaching beta, this seems difficult at best, unless we act immediately.

As seems fairly clear from dbaron's responses, there are generally alternatives to private interfaces, and I'm sure we can provide a lot of those easily, if we are actually asked for help.  Where there is actually no safe way to access important information, we can certainly implement APIs to expose that data, if we know what consumers of the platform actually need.

Can you have your developers start by filing bug reports detailing what they need for which there isn't a frozen, or at least public, interface?  We can only give you what you need if you ask for it.
I checked in the "better fix" to CVS and GECKO190_20080827_RELBRANCH.
we did do it last time 3 years ago, 
we got rid of unfrozen interfaces.

then we started to improve quality of form filling, 
so we needed better coordinates of objects on the page.

and our developer apparently have started 
using unfrozen interfaces again to get these coordinates. 

then it was about getting info about AJAX navigation 
and they got it from private interfaces.
it is wrong that they came after easy pickings in the private world.

now we will do another pass and get rid of them again.
we are now preparing list of what we use of the private APIs and
we will get rid of them or will ask for replacement APIs at the IRC/forum.
I also would like to thank everybody here for helping us and
not releasing the FF 3.0.2 that would cause crashes.
We really appreciate it.

As a "thank you", we can give free licenses to RoboForm and GoodSync
to everybody involved. Just submit a "check/money order" order here
http://www.roboform.com/php/pums/rfprepay.php?lang=en&lic=default&currency=USD&dc=F29&snc=2
The notify me at vm (a-t) siber.com and I will activate this order for free.
Verified fix on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.2) Gecko/2008091620 Firefox/3.0.2.   Installed roboform on build 6, and used test:  
http://www.roboform.com/test.html.   No crashes.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Version: unspecified → 3.0 Branch
Flags: blocking1.9.0.2?
Tony, can you verify this with the nightly build for 1.9.0.4 as well since we, apparently, had to check this into both the release branch and the main one?
looks like roboform isnt compat with 3.0.4pre.  I'll come back to this once the 3.0.4 bits are built.
2 Tony.
Roboform extension has internal version check and it doesn't accept Firefox nightly builds.
To resolve this problem you should modify preference
general.useragent.extra.firefox = "Firefox/3.0.4pre".

I am checking Roboform almost everyday with nightly builds and it works properly.
(In reply to comment #46)
> I am checking Roboform almost everyday with nightly builds and it works
> properly.

I will mark this as verified then for 1.9.0.4. I'm glad that there are no issues.
Roboform finally is being implemented on frozen API. It will be in a beta stage in  a week.
We have found that almost everything required API exists in Gecko SDK.
But there is only one problem. 
We need nsIBaseWindow::GetParentNativeWindow() method.
Is it possible to implement this method in Gecko 1.9.1 SDK?
For example as exported dll function:
nsresult NS_DocShell_GetHWND(nsISupports *pDocShell, void **pHWND);
We really need it.
For what it's worth, nsIBaseWindow should really be frozen (and we have a bug on that)...  It would be fairly reasonable for you to treat it as frozen.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: