Open Bug 1667772 Opened 4 years ago Updated 2 years ago

"XML Parsing Error: undefined entity" when opening Nightly

Categories

(Toolkit :: Startup and Profile System, defect)

x86_64
Windows 10
defect

Tracking

()

People

(Reporter: saschanaz, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(3 files)

Attached image image.png

This prevents opening Nightly.

Attached image wrong icon

Not sure it's related, but the icon is also incorrectly the installer one.

Here's one try. From conversation on Slack, you seem to be able to create a new profile, and that's working.

  1. Open about:support and check the Startup Cache table
  2. Open that path in File Explorer, but move to the folder for your old profile (the broken one)
  3. Make a copy of the startupCache folder, then remove all content, and try to restart with your old profile

Does that change anything?
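For anyone who prefers to script the steps above, here is a minimal sketch. It assumes you pass the local profile folder (the one under the path shown in the Startup Cache row of about:support) and that the browser is closed while it runs. The function name and the startupCache.bak backup name are made up for illustration.

```python
import shutil
from pathlib import Path
from typing import Optional


def reset_startup_cache(profile_dir: Path) -> Optional[Path]:
    """Copy <profile_dir>/startupCache to a backup folder, then empty the original.

    Returns the backup path, or None if the profile has no startupCache folder.
    The backup name "startupCache.bak" is an arbitrary choice for this sketch.
    """
    cache = profile_dir / "startupCache"
    if not cache.is_dir():
        return None

    backup = profile_dir / "startupCache.bak"
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")

    # Step 3a: make a copy of the folder so the old cache can be compared later.
    shutil.copytree(cache, backup)

    # Step 3b: remove all content but keep the (now empty) folder itself.
    for entry in cache.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return backup
```

Keeping the backup means the old cache files are still available if someone wants to diff them against a working profile.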

It's startupCache but the existing profile does not have that directory. Copy-pasting from the new one doesn't help.

Ah no, I went to the wrong directory: it should be %LOCALAPPDATA%\Mozilla\Firefox\Profiles\ but I went to %APPDATA%\Mozilla\Firefox\Profiles\ 👀

Okay, I can load my previous local build and it runs on my existing profile. So it is something in the latest build.

Let's see if someone has other ideas to debug this. Could you attach your raw data from about:support to the bug?

Loading the profile from the previous build did something, now the latest build just works. Hmm.

Did you make a copy of the profile folder before, by any chance? Just to see if there are differences.

I had no backup so this is after it's somehow fixed, but anyway...

While I had no backup, "IgnoreDiskCache": true looks interesting to me as this is false by default in the new one. (Edit: It's somehow now false and still works well)

Thank you Kagami! We have been chasing this issue for a while and it's very hard to find a way to reproduce it.

My leading hypothesis at the moment (not yet validated by any experiment) is that sometimes, for some reason, we load one of the files as empty (zero bytes long). The error you are seeing fits that: if browser.dtd is loaded as zero bytes, browser.xhtml cannot resolve any of the DTD strings from that file, and the result you see (the Yellow Screen of Death) indicates that the very first string from that file could not be loaded.
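To illustrate why an empty DTD would produce exactly this message, here is a small sketch using Python's expat bindings rather than Firefox's own parser. It is a simplification: the DTD text is inlined as an internal subset instead of being loaded from a separate browser.dtd file, and the entity name mainWindow.title is a made-up example.

```python
import xml.parsers.expat as expat


def undefined_entity_error(dtd_text):
    """Parse a tiny document whose entity definitions come from dtd_text.

    Returns None if parsing succeeds, otherwise the expat error message.
    """
    document = (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE window [\n" + dtd_text + "\n]>\n"
        "<window>&mainWindow.title;</window>"
    )
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError as error:
        return expat.ErrorString(error.code)
    return None


# With the entity defined, the document parses.
print(undefined_entity_error('<!ENTITY mainWindow.title "Nightly">'))
# With an empty DTD, the first entity reference cannot be resolved.
print(undefined_entity_error(""))
```

The document itself is identical in both calls. Only the DTD content differs, which matches the idea that the markup is fine and the strings file arrived empty.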

(In reply to Kagami :saschanaz from comment #4)

Okay, I can load my previous local build and it runs on my existing profile. So it is something in the latest build.

That's a very important point! Thank you for testing it! It fits the zero-byte hypothesis and indicates that the problem is not the profile (since you can load the previous build with the current profile), but the build.

So, what might be happening is that you have a working build with a working profile, and then we perform a partial update which for some reason leaves your new build with a zero-byte-long browser.dtd.

Your build does not contain any personal data (that lives in the profile), and it may be crucial for figuring out what's going on. Would you feel comfortable zipping your build (not the profile) and sending it to me for investigation? My email is zbraniecki at mozilla dot com.

If you prefer not to do that, I can try to debug remotely with your help:

There are two omni.ja files in your build: one is for "browser" and the other is for "toolkit". Your browser.xhtml should live in the browser one, so I'd like you to zero in on the one at a path like /Applications/Firefox/Resources/browser/omni.ja[1].

This is a zip file, can you unzip it (unzip omni.ja -d ./unpacked-omni) and then check the file ./unpacked-omni/chrome/en-US/locale/browser/browser.dtd? Does it look normal? Is it empty?

[1] There should be another omni.ja one level up, that's the toolkit one.
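Once omni.ja has been unpacked with the unzip command above, the check can be scripted. This is a minimal sketch that works on the unpacked folder only. The function name and the returned keys are invented for illustration, and counting "<!ENTITY" occurrences is just a rough sanity signal, not a validation of the file.

```python
from pathlib import Path

# Path of the DTD inside the unpacked browser omni.ja, as given above.
DTD_RELATIVE_PATH = Path("chrome/en-US/locale/browser/browser.dtd")


def inspect_dtd(unpacked_root):
    """Report whether browser.dtd exists, its size in bytes, and how many
    entity declarations it appears to contain."""
    dtd = Path(unpacked_root) / DTD_RELATIVE_PATH
    if not dtd.is_file():
        return {"exists": False, "size": 0, "entities": 0}
    data = dtd.read_bytes()
    return {
        "exists": True,
        "size": len(data),
        "entities": data.count(b"<!ENTITY"),
    }
```

For example, inspect_dtd("./unpacked-omni") returning a size of 0 would support the zero-byte hypothesis, while a file of normal size with many entities would point away from a damaged build.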

mhowell: what are the best logs to log out to determine if this is a failed (partial) update? I think that these are *.log files in the update directory itself.

Kagami: can you find the update directory (listed in about:support) and try to find as many *.log files as possible? Thanks!

Flags: needinfo?(mhowell)
Flags: needinfo?(krosylight)

Oops, I already rebased and rebuilt, so it's no longer the build I mentioned earlier. Would it still be enough? BTW, do you mean the failing build or the succeeding build? The failing one was just from the public nightly channel.

I have no omni.ja in my local build (probably because it's a debug build), but obj-x86_64-pc-mingw32\dist\bin\browser\chrome\en-US\locale\browser\browser.dtd looks normal with its 12 KB size.

Flags: needinfo?(krosylight)

(In reply to Nick Alexander :nalexander [he/him] from comment #12)

mhowell: what are the best logs to log out to determine if this is a failed (partial) update? I think that these are *.log files in the update directory itself.

Yes, the update directory is the right place; from the directory that the button in about:support opens in the build that was broken, go into the subdirectory called updates and you should see last-update.log and backup-update.log; those files should tell us if anything went wrong with an application update.

Flags: needinfo?(mhowell)

Kagami: can you find the update directory (listed in about:support) and try to find as many *.log files as possible? Thanks!

There is a big file with 61MB size, I'm sending it to your email address instead.

(In reply to Kagami :saschanaz from comment #15)

Kagami: can you find the update directory (listed in about:support) and try to find as many *.log files as possible? Thanks!

There is a big file with 61MB size, I'm sending it to your email address instead.

Thanks -- I see it. There are 28 update directories, which just means that you've had many versions of Firefox installed at various times. I will try to figure out which was the "failing one [...] from the public nightly channel"... and I think it's C:\ProgramData\Mozilla\updates\308046B0AF4A39CB, which corresponds to C:\Program Files\Mozilla Firefox.

backup-update.log looks healthy. The only thing interesting is:

...
PREPARE ADD defaultagent_localized.ini
...

so we have a new INI file, which nominally feels connected to l10n/YSOD. But it's hard to see how it could interact.

last-update.log looks healthy:

Performing a replace request
PATCH DIRECTORY C:\ProgramData\Mozilla\updates\308046B0AF4A39CB\updates\0
INSTALLATION DIRECTORY C:\Program Files\Mozilla Firefox
WORKING DIRECTORY C:\Program Files\Mozilla Firefox\updated
Begin moving destDir (C:\Program Files\Mozilla Firefox) to tmpDir (C:\Program Files\Mozilla Firefox.bak)
rename_file: proceeding to rename the directory
Begin moving newDir (C:\Program Files\Mozilla Firefox.bak/updated) to destDir (C:\Program Files\Mozilla Firefox)
rename_file: proceeding to rename the directory
Now, remove the tmpDir
ensure_remove: failed to remove file: C:\Program Files\Mozilla Firefox.bak/updater.exe, rv: -1, err: 13
ensure_remove_recursive: unable to remove directory: C:\Program Files\Mozilla Firefox.bak, rv: -1, err: 41
Removing tmpDir failed, err: -1
remove_recursive_on_reboot: file will be removed on OS reboot: C:\Program Files\Mozilla Firefox\tobedeleted\rep8f9391ff-8d10-4d74-936f-0c7737cbc85e
succeeded
calling QuitProgressUI

That just means the updater couldn't delete itself while running; I think it's perfectly normal.

There's nothing in the update logs to suggest a bad update, but the logs are not so rich that we can rule that situation out.
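With 28 update directories to go through, a rough first pass can be scripted. The sketch below flags lines in *.log files that contain "failed" or "unable to". Those keywords are an assumption taken from the excerpt above and not a documented list of updater error markers, and the function name is invented.

```python
from pathlib import Path

# Keywords guessed from the log excerpt above; not an official list.
SUSPICIOUS = ("failed", "unable to")


def suspicious_lines(updates_dir):
    """Return (file name, line number, line) for each log line that
    contains one of the SUSPICIOUS keywords, case-insensitively."""
    hits = []
    for log in sorted(Path(updates_dir).glob("*.log")):
        text = log.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if any(keyword in lowered for keyword in SUSPICIOUS):
                hits.append((log.name, number, line.strip()))
    return hits
```

A hit is only a pointer for manual review. As noted above, the updater failing to remove its own updater.exe looks like normal behaviour, so flagged lines do not by themselves indicate a bad update.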

Nightly channel installs to C:\Program Files\Firefox Nightly, and this one was failing. Sorry for confusing you 🙏

Bouncing needinfo for comment #17, and tentatively moving to the updater component. We can move it elsewhere if we narrow the problem down to somewhere else.

Component: General → Application Update
Flags: needinfo?(nalexander)
Product: Firefox → Toolkit

For what it's worth, I don't think this is an installer problem.

Kagami was able to create a new profile with the same (supposedly) broken build, and it was working. If the executable was damaged, that wouldn't have been possible, would it? Everything seems to point to a cache problem, or something else broken in the profile that was "fixed" by the other build.

(In reply to Francesco Lodolo [:flod] from comment #19)

For what it's worth, I don't think this is an installer problem.

I agree: per #c16, and due to the other information below, this looks like (yet more) startup cache/omnijar cache interaction.

Kagami was able to create a new profile with the same (supposedly) broken build, and it was working. If the executable was damaged, that wouldn't have been possible, would it? Everything seems to point to a cache problem, or something else broken in the profile that was "fixed" by the other build.

I wonder if we should start maintaining a log of the startup cache invalidations that we do in the wild, so that we have some record of how this transient behaviour occurs? I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1668051 to discuss.

Kagami was able to create a new profile

To be clear, I used an empty profile that I created earlier for testing.

(In reply to Kagami :saschanaz from comment #21)

Kagami was able to create a new profile

To be clear, I used an empty profile that I created earlier for testing.

OK, that's not what I understood. Having said that, I think comment 19 still stands (if the build is broken, even using an existing profile would break).

Sadly, this seems to have ground out. My belief is that the update mechanism was not involved; this sounds much more like further fallout from the startup caching work, perhaps interacting with XPI files in ways that we don't understand (similar, perhaps, to Bug 1656515). But we just don't know at this point :(

Clearing NI since I answered in #c20.

Flags: needinfo?(nalexander)

The current theory is that these types of errors are not due to faulty updates (or installs) but instead due to I/O errors of some sort when reading the omnijar. Refiling as such so that it gets triaged next to the YSOD meta ticket (Bug 1675823).

Component: Application Update → XML
Product: Toolkit → Core

(not really an XML issue. If XML parser gets bogus data, it is expected to fail.)

Component: XML → Startup and Profile System
Product: Core → Toolkit

The severity field is not set for this bug.
:mossop, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dtownsend)
Severity: -- → S2
Flags: needinfo?(dtownsend)

Using developer edition. This problem has re-surfaced for me except reported ERL is 743, stops use of browser. Have not changed any settings since. What info is required?

What info is required?

As much detail as possible about what you did right before it started happening, and if you get it to "fix itself" what steps led to that.

We're struggling to come up with "steps to reproduce", and we're hunting this down based on vague descriptions of what people did that led to the problem. At the moment we don't even know whether it always happens after an update, or whether the update is completely unrelated.

Ok. My background includes php/mysql/frontend dev, more recently cybersecurity.

I also have a LOT of bookmarks in the dev edition, and would like to extract these back to beta version.

The problem with "what was I doing before the error message" is:

  1. I was doing a lot
    webex
    jetbrains/php
    jetbrains/pycharm/mysql-connector/hashlib
    Dragon
    Word
    excel
    virtual box/windows/ubuntu/kali
    git
    the problems with Virt Box inhabitants wiping IP settings serendipitously
    the problems with Dell BSOD DRIVER_POWER_STATE_FAILURE with no real resolution (Nvidia p620 perhaps) serendipitously
    etc

  2. the updates are frequent
    and, frankly, I have been impressed with how well the dev edition has operated without complaint at 7 days a week, 18 hours per day of use for a very long time.

Given direction, I can find the current version/date, and I assume you know the xml file involved - does it change frequently?

I assume you know the xml file involved - does it change frequently?

The problem is not related to the file in question. It is a red herring.
In certain circumstances that we don't yet understand, the correct file is loaded incomplete or empty, and that results in the XML error.
