Closed Bug 701944 Opened 13 years ago Closed 12 years ago

need warning UI for users who run into bug 691847 that encourages a fresh install

Categories

(Firefox :: Installer, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: asa, Unassigned)

Details

Attachments

(3 files, 2 obsolete files)

(filing this bug on behalf of the 8.0.1 chemspill team)

We're crashing on startup when users use Windows System Restore and end up on a previous Firefox version.  See bug 691847.

We want to be able to identify that problem before a startup crash and raise a dialog that says "We're sorry. It appears Firefox is broken. Please click on the link below to download and install a new Firefox."
need L10N and UX on this to verify the string.
Basically, we can't get the string localized at this point, mostly because there's nothing hooked up for the release repositories.

But as we have l10n-merge on for the release builds anyway, adding this in English shouldn't require tweaking the patch that does that, if we want to have it on central etc, too.

This is a case where I excempt string freeze, as the UX is clearly worse without this patch than with this patch and the string showing up in English. Being a distinct piece of UI also helps, as it doesn't mix languages.

Once we land this, we should announce this and the rationale in .l10n.
(In reply to Axel Hecht [:Pike] from comment #2)
> Basically, we can't get the string localized at this point, mostly because
> there's nothing hooked up for the release repositories.

Sorry I don't know all of the localization context to fully understand your comment. Are you saying that we can't get this localized because the string is not yet available? Or that we won't be able to localize this string ever?
We're not going to be able to localize this string for the chem-spill 8.0.1, and we'll have smaller coverage for 9, and possibly even 10.
I'm a little confused. If users are crashing on startup, they won't be able to update to 8.0.1 (or whatever version we add this to) without a manual update, no?

From reading bug 691847 I don't understand why a prompt would be needed... Doesn't that bug fix the issue? If not, can't Firefox flash caches or whatever itself instead of asking the user for a manual reinstall?
(In reply to Justin Dolske [:Dolske] from comment #5)
> I'm a little confused. If users are crashing on startup, they won't be able
> to update to 8.0.1 (or whatever version we add this to) without a manual
> update, no?

This will not fix an 8.0.1 system restored to 7, but will prevent a similar occurrence if people do a system restore from 9.0 to 8.0.1 in the future.

> From reading bug 691847 I don't understand why a prompt would be needed...
> Doesn't that bug fix the issue? If not, can't Firefox flash caches or
> whatever itself instead of asking the user for a manual reinstall?

From our read of the fix being worked on in 691847 earlier today, it did not seem to resolve the issue. Perhaps we were mistaken, but nobody has yet corrected us.

Jeff - could you clarify if 691847 will resolve the issue of 9 downgraded to 8.0.1?
(In reply to Asa Dotzler [:asa] from comment #0)
> We want to be able to identify that problem before a startup crash and raise
> a dialog that says "We're sorry. It appears Firefox is broken. Please click
> on the link below to download and install a new Firefox."

PS: If Firefox is broken enough to not start up, clicking on a link won't work, right? Need a plain text url to copy and paste into IE?
(In reply to Axel Hecht [:Pike] from comment #7)
> (In reply to Asa Dotzler [:asa] from comment #0)
> > We want to be able to identify that problem before a startup crash and raise
> > a dialog that says "We're sorry. It appears Firefox is broken. Please click
> > on the link below to download and install a new Firefox."
> 
> PS: If Firefox is broken enough to not start up, clicking on a link won't
> work, right? Need a plain text url to copy and paste into IE?

Thanks for bringing this up - this was discussed today but didn't make it into the bug. We'd like to explore the possibility of forcing the link to be opened in IE (as opposed to the default browser, which may very well be Firefox), since this issue is Windows specific for now.
I think it will resolve the issue as far as JS serialization compatibility goes.  However, it will not address the issue as far as all the other contents of omni.jar go -- XUL, CSS, images, &c.  The new bytecode version allows you to detect that your omni.jar is incompatible.  But it can't possibly do anything about all the other resources in omni.jar being wrong.
(In reply to Jeff Walden (remove +bmo to email) from comment #9)
> I think it will resolve the issue as far as JS serialization compatibility
> goes.  However, it will not address the issue as far as all the other
> contents of omni.jar go -- XUL, CSS, images, &c.  The new bytecode version
> allows you to detect that your omni.jar is incompatible.  But it can't
> possibly do anything about all the other resources in omni.jar being wrong.

I understand now. We need to discuss simple low risk methods of determining whether an omni.jar file is out of sync.
I've started looking into this, but I should make a strong disclaimer: once the JS engine does a bytecode version check as bug 691847's patch has it do, the issue is more the provenance of the JS component loader, the JS subscript loader, and the startup cache.  Someone who knows that code would be a much better point man on this, and would be much more knowledgeable about any issues.

That said, I can follow control flow and call graphs with MXR as well as the next man, and I think I might have at least a plausible start at what an implementation would look like.  It appears to me that the right place to put a precautionary check is probably somewhere in js/xpconnect/loader/mozJSLoaderUtils.cpp in ReadCachedScript.  (Right at startup cache initialization would be better, but at that point there's no JSContext to use to check the bytecode version, and spinning one up using JSAPI primitives would do things like touch per-thread data and other things that seem pretty risky.)  Dumping this at the start of that method would -- I think -- maybe -- do the job.  I am making some assumptions here that should really really really be verified by a startupcache/omnijar person:

  static bool haveDoneBadBytecodeCheck = false;
  if (!haveDoneBadBytecodeCheck) {
    haveDoneBadBytecodeCheck = true;

    JSScript *dummy = NULL;
    NS_NAMED_LITERAL_CSTRING(xpcomUtils, "jsloader/resource/gre/modules/XPCOMUtils.jsm");
    if (!ReadCachedScript(cache, xpcomUtils, cx, &dummy))
      FailWithReinstallWarningAndDie();
  }

I'm using XPCOMUtils.jsm only because when I debugged into a crash with MSVC, that seemed to be the script that was dodgy, so I'm guessing I can "rely" on it always being there.  I'm also manually PathifyURI'ing it because creating a URI for it seems like unwanted chance for error.  (I totally cargo-culted the pathified URI from a comment in StartupCacheUtils.cpp.)

I don't know if ReadCachedScript is as early as things go awry -- if other non-script stuff gets read out of the cache earlier, its contents could also cause things to go awry.  But that's a rather speculative fear, based mostly on my not knowing the first thing about the startup cache or about omnijar.

Anyway.  There's a start at it.  But you really don't want me driving this aspect of it if at all possible.
Er, that would be |if (!NS_FAILED(ReadCachedScript(...)))|, probably.
I just noticed that if I perform the steps to reproduce using upgrading from 6.0.1 to 8.0, I don't even get a crash. The process doesn't seem to be launched. Could this prevent fx from showing anything at all, and does it make this fix a little more complex?
Yes, all bets are off if we're mixing and matching random versions of omni.jar and binary components. Implementing a generic defense against pieces of the install being corrupted is error-prone and generally not something we should be considering in a chemspill, I think.

If we really need to mitigate the impact of this scenario in a chemspill, bug 701875 might be the more reasonable patch to take.
(In reply to Asa Dotzler [:asa] from comment #0)
> We want to be able to identify that problem before a startup crash and raise
> a dialog that says "We're sorry. It appears Firefox is broken. Please click
> on the link below to download and install a new Firefox."

As long as we make sure to launch this with IE (on Windows) and Safari (should this ever apply to Mac), this looks good to me.
This probably needs security review as well.  

Setting precedence with a dialog with this kind of message, and redirecting users to download software, sounds pretty scary.   

It's likely to be spoofed in the future and lead to installation of rouge software.  

If we can think of ways that allow users to authenticate the source of the message and the source of download package we should be trying to do that.
Attached patch Check buildid (obsolete) — Splinter Review
This patch does the checking part but not the UI/notification part. I haven't tested whether it works properly after l10n repacking. It embeds the buildid into the omnijar and checks the buildid right after opening the omnijar and before any real data is read from the omnijar.
Oh, and this patch also some additional packaging parts to not break Android. I can test and polish this part if someone can put together the windows dialog part.
From reading everything over, it appears this has always been a problem in our update process, we just happened to pick a really bad day to do an update. Now that day has passed, so presumably the problem for the rest of the pre-update users reverts back to what it was previously, which I think we have established was a rare problem. Also with bug 701875 fixed, it sounds like this will never be a problem again. So I’m wondering if people still feel this “bug” needs fixing? Seems like it may not.
We need to do a lot more work to make Firefox survive a system restore and it's something we're going to have to roll out over a couple of releases.  The bugs we have on file don't cover enough and we're likely to end up with a frankeninstall which is - at least in my opinion - worse than a message that tells someone how to get back to a good state.

Our hope was that a warning would at least give users a chance to know what to do in the 8 -> 9 transition and however many releases it takes us to get to a full fix.

So this is tactical, but very useful for sure.

And so far this isn't a rare event, it was just hidden from view since we didn't have data on it.  The rate of crash/failure per upgrade doesn't seem to be decreasing much right now, but we'll know more tomorrow based on today's experiments.
I would really like bug 701875 to be the correct fix, but I'm not sure that we yet understand what regressions it may cause. For instance, does renaming omni.jar affect builds in releng or possibly add-on compatibility?

Until then, I'd like to continue evaluating the risk of this solution to leave our options open.
No longer blocks: 702045
Attached patch dialog patch v.1 (obsolete) — Splinter Review
This is a starting point. I've added a dialog template to the widget.rc which embeds in xul.dll plus the code to display it. Additional localizations can also be defined here as well, or we can load strings dynamically from some where and set them via code. This doesn't have the download link in it yet, which will require a bit of string/click trickery.
(In reply to Alex Keybl [:akeybl] from comment #21)
> I would really like bug 701875 to be the correct fix, but I'm not sure that
> we yet understand what regressions it may cause. For instance, does renaming
> omni.jar affect builds in releng or possibly add-on compatibility?
> 

I do not expect any regressions from renaming omni.jar.
A couple open questions - 

1) is this fx specific, or will we need to break up resources based on the app being built?

2) localization of strings - how we handle it has an impact on how we create and display the dialog. Can we open up string bundles here or is that functionality not working?
As per face-to-face talk with legneato, this is not on the critical path for 8.0.x.

For a real UX, he had more comments, but I'll let him chime in here in case there's more back channel dialog.

For the strings, they should probably be next or in updater.ini, and read from that ini file in a similar fashion to what the updater does, possibly with en-US fallbacks hard-coded in case that file is corrupt. Other fallbacks are not required.
Assignee: nobody → jmathies
Attachment #574089 - Attachment is obsolete: true
Attached patch base patch v.1Splinter Review
Attachment #574017 - Attachment is obsolete: true
Attached patch win patch v.1Splinter Review
Attached image dialog
Before I start filling in on the string stuff, we need to figure out what we want this dialog to look like, and what we want it to say. This is just a rough cut on what we might put in here. One caveat, if we do add linked text to launch IE, it should be stand alone in the dialog and not embedded in a paragraph of text. Static text controls are rectangular in shape and don't overlap well, so breaking these up from other text makes the coding much simpler.

I can also add an icon if we want which would need to be embedded in xul or the exe, and I can bold/italics/change font sizes similar to what we do on the crash reporter dialog.

I also think someone mentioned this would need a security review. I'll kick that off once we have a better idea on look and feel.
Attachment #574446 - Flags: ui-review?(limi)
We're no longer considering this for 8.0.1 - we've decided any of the options that mitigate bug 691847 are too high risk for a chemspill.

Rob is going to lead the effort on FF9 code changes associated with system restores. Please sync up with him to see if this option is still on the table.

Thanks for putting together the work so far!
Comment on attachment 574446 [details]
dialog

(In reply to Alex Keybl [:akeybl] from comment #29)
> We're no longer considering this for 8.0.1 - we've decided any of the
> options that mitigate bug 691847 are too high risk for a chemspill.
> 
> Rob is going to lead the effort on FF9 code changes associated with system
> restores. Please sync up with him to see if this option is still on the
> table.
> 
> Thanks for putting together the work so far!

Ok, putting additional development on hold until those decisions get made.
Attachment #574446 - Flags: ui-review?(limi)
Assignee: jmathies → nobody
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: