Closed Bug 1788173 Opened 2 years ago Closed 2 years ago

pending update prevents network connections and some GUI operations

Categories

(Core :: IPC, defect)

Firefox 104
defect

Tracking

()

VERIFIED FIXED
113 Branch
Tracking Status
firefox113 --- verified

People

(Reporter: david.balazic, Assigned: gerard-majax)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(6 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:104.0) Gecko/20100101 Firefox/104.0

Steps to reproduce:

I was running two profiles with Firefox v104

Suddenly, web pages did not load any more. As if the network were down. I checked with MS Edge and it worked.

I checked if there are any "modal" dialogs from firefox under the windows (a known issue, see bug 1676604 or bug 696976), but there were none.

I suspected an update was under way (I had a similar experience lately), so I exited both profiles and waited, as some Firefox.exe processes were still hanging around after all FF windows were closed. After about 10-20 seconds, the processes disappeared. I restarted FF; it was now version 104.0.1.

Other symptoms: while trying to load a web page (which did not work), I tried to open the About dialog with alt-f, left, up, enter, but the menus did not appear (they do in normal situations).

To make it clear: after restarting (the new version of) Firefox, everything worked again

Also: this PC uses "Microsoft Intune Management Extension" which did the Firefox update
Intune has a habit of making SW (partly) non-functional during the update process :-( (VirtualBox in my case)

Hi, are you still able to reproduce this? If yes, could you try to capture a http log?
What do you mean by "As if the network were down"? Is there any error page shown?
Thanks.

Flags: needinfo?(david.balazic)

(In reply to Kershaw Chang [:kershaw] from comment #2)

Hi, are you still able to reproduce this?

Can't test until the next update is released.

What do you mean by "As if the network were down"? Is there any error page shown?

No error page, the pages just appear to load, but never finish. Like in the related bug I mentioned. The pages try to load, but there is no progress at all. Maybe a timeout would be reported after a minute or two, but I did not wait that long, as it was obvious there is something wrong with firefox, so I restarted it.

Flags: needinfo?(david.balazic)

The Bugbug bot thinks this bug should belong to the 'Core::Networking' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Networking
Product: Firefox → Core

(In reply to Kershaw Chang [:kershaw] from comment #2)

Hi, are you still able to reproduce this? If yes, could you try to capture a http log?

It'd be really helpful if you can recreate this and record a http log.
Thanks.

Flags: needinfo?(david.balazic)

I created a http log, but am concerned about privacy issues.
Can I mark the attachment private or something like that?

Flags: needinfo?(david.balazic) → needinfo?(kershaw)

also, it is 90 MB big, should I compress it before uploading?

(In reply to David Balažic from comment #7)

also, it is 90 MB big, should I compress it before uploading?

Yes, please compress it.
If you have any privacy concern, please send the log to necko@mozilla.com. Thanks.

Flags: needinfo?(kershaw)

sent zipped http log to that email

This time I was running Firefox 104.0.1 and the update for 104.0.2 was in progress
I had two firefox profiles running when this happened (now and in the first report)

The log shows that some requests were cancelled with an unknown reason.
For example:

2022-09-11 14:55:02.485000 UTC - [Parent 29828: Main Thread]: D/nsHttp HttpChannelParent::RecvCancel [this=26284aec240]
2022-09-11 14:55:02.485000 UTC - [Parent 29828: Main Thread]: D/nsHttp HttpChannelParent::RecvCancel: [this=25f3a204c00] cancelled call in child process from script: https://mail.google.com/_/scs/mail-static/_/js/k=gmail.main.en_GB.y1MTJxmZ87s.es5.O/am=wOVu8H8JPwKAIcSR-e3Az4Rh4QmAAPYnA0AACDBisprW9ydKsuOHBAAgAO2sGMMYAHDNA9BxQACTAS4AAAAAAAAAAAAAAAAAAABgF5D8/d=1/exm=Cmvl0d,DL8jZe,E1P0kd,HXLjIb,J41knb,M25vPb,N35dyc,OIxRw,anc,dLIOJe,ebdd,f,gYOl6d,hkjXJ,i20jfd,it,l,lKrWxc,lLYctc,lSiOKd,m_i,ml,n73qwf,oRmHt,pA5mjb,pE92lb,puPi7e,spit,sps,t,uuoH9c,v2eEBc,xaQcye/ed=1/im=1/br=1/rs=AHGWq9C0oP0GF39mVM_6so1449a46ioejA/ee=cEt90b:ws9Tlc;U9MqLc:O9Wdw;lw1rzf:mDFAuf;a0qnvf:O18Eyf;iFQyKf:vfuNJf;nAFL3:NTMZac;oGtAuc:sOXFj;yxTchf:KUM7Z;qddgKe:xQtZb;wR5FRb:siKnQd;dIoSBb:SpsfSb;eBAeSb:zbML3c;NPKaK:PVlQOd;LBgRLc:XVMNvd;Y6au2e:H6lfpe;AoStPd:kKbhQb;wBjAMd:ONcH1e;XBqggf:fjv93c;zxjSOb:VJrhKe;BNeSrd:PLFcdb;m5fIF:nbXCtd;vSg7wf:qVdf8c;yR6u0d:oOUfr;JFpGO:k00cOd;O9G84b:VjR9rd/m=rn4kU,yWJZbc,PaBahd,MMhUM,zm225,o2ajQe,kRtote,Av2a7c,H6lfpe,kbPIy,igbF5,Trl7bc,cv,dFpypf,MigGy,jVZ0pe,rMQdJc,Sz7W7c,CTcde,NVcOs,PZhDZb,Aihl6c,J03Die:142:362
2022-09-11 14:55:02.485000 UTC - [Parent 29828: Main Thread]: D/nsHttp HttpAsyncAborter::AsyncAbort [this=26298917500 status=804b0002]
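
For anyone going through a log of this size, a small filter script can pull out the cancellations. This is only a rough sketch that assumes the line layout shown in the excerpt above; the function name `summarize_cancellations` and both regular expressions are made up for illustration and are not part of any Firefox tooling.

```python
import re
from collections import Counter

# Matches the status field on HttpAsyncAborter::AsyncAbort lines.
ABORT_RE = re.compile(
    r"HttpAsyncAborter::AsyncAbort \[this=[0-9a-f]+ status=([0-9a-f]+)\]"
)
# Matches HttpChannelParent::RecvCancel lines, with or without a trailing colon.
CANCEL_RE = re.compile(r"HttpChannelParent::RecvCancel:? \[this=[0-9a-f]+\]")


def summarize_cancellations(lines):
    """Count RecvCancel calls and tally AsyncAbort status codes."""
    cancels = 0
    statuses = Counter()
    for line in lines:
        if CANCEL_RE.search(line):
            cancels += 1
        match = ABORT_RE.search(line)
        if match:
            statuses[match.group(1)] += 1
    return cancels, statuses
```

Feeding it the lines of the log file (for example via `open(path, encoding="utf-8", errors="replace")`) gives a quick count per status code. The status `804b0002` seen above is the hex value of NS_BINDING_ABORTED.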

Hi David, could you try to download the latest nightly and try to make the log again?
The newer version of Firefox also logs the reason for cancellation, so maybe we can get an idea of what's going on.
Thanks.

Severity: -- → S3
Flags: needinfo?(david.balazic)
Priority: -- → P2
See Also: → 1760580
Whiteboard: [necko-triaged]

I think this problem happens due to a "background" process (in this case "Microsoft Intune Management Extension") updating Firefox while it is running.

If I start two profiles, I can not trigger an update from the about dialog, as it says:
"Nightly is being updated by another instance"

Also MS Intune will not update the nightly, at least not when I want it.

Is there another way to do an update? While at least two Firefox profiles are running?

Flags: needinfo?(david.balazic)

This happened again, updating from 105.0.3 to 106.0

Description:

I had two profiles open, with several windows and tabs in both.

Suddenly, web pages did not load any more. Also ctrl+t opened a new tab, but ctrl+tab did not switch to it. Tab switching does not work with the mouse either (point and click on a tab).

After exiting one profile, the other "came to life". There I opened the about dialog, which downloaded an update, then immediately downloaded an update again and finally showed the "Restart to update" button.

This happened again, updating to 106.0.3

This time only one profile was running.

Is there a reason you're not letting Firefox update itself?

It's company policy.

If Intune management is updating our files while we are running, I'm not sure there's much we can do...

Hey, I'm encountering the same behavior in Nightly every day since I switched from using one profile to 2 profiles. And it's not about not letting Firefox update, but rather that I do not realize there is an update until the browser stops loading any new page that I want to navigate to, and that happens twice a day since there are 2 Nightly builds each day.

The steps I do are:

  1. Double-click on the Firefox icon, and from profiler open profile 1 (work profile).
  2. Open profile 2 by clicking a link + profiler, as that's my everything else/personal profile, so all clicked links are opening in it, and I can separate them from my work tabs.

If I recall right, the hang happens on my work profile, which is annoying as hell, because sometimes it stops me from loading Zoom pages that connect me to a meeting. So in order to fix it, I need to close both my profiles, watch in task manager until all Firefox processes are closed (~30s-1m), and then restart both my profiles.
Could be that updates will happen on the 2nd (last) profile opened and that's why the 1st profile hangs?

I'm on Windows 11 with Firefox Nightly as my default browser if it matters. Let me know if I can provide additional details.

Flags: needinfo?(mozilla)
Flags: needinfo?(kershaw)

Putting this over to the install team.

Flags: needinfo?(mozilla) → needinfo?(bytesized)

I suspect that these problems are basically duplicates of Bug 1705217 for David, and the very closely related Bug 1480452 for Paul. Except that, for some reason, the "Sorry. We just need to do one small thing to keep going" message isn't being shown.

It would help verify this if, when network connections start failing, you could check if Firefox has been updated by checking the version that Firefox thinks that it is (via about:support or the Help->About dialog) against the version of the file on the disk (right click the binary->Properties->Details->Product version). This is a bit trickier on Nightly because most Nightly updates don't change the major or minor version and I'm not actually sure how to figure out the build ID just by inspecting the files on the disk.

Work is underway on Bug 1480452 and Bug 1705217, so I'm not sure that there is much to additionally be done there. But it seems like a new bug that the "Sorry. We just need to do one small thing to keep going" message isn't being shown.

@gerard-majax I know that you have worked on build ID mismatch problems in the past. I was hoping maybe you would have some valuable debugging advice?

Flags: needinfo?(bytesized) → needinfo?(lissyx+mozillians)

(In reply to Robin Steuber (they/them) [:bytesized] from comment #19)

This is a bit trickier on Nightly because most Nightly updates don't change the major or minor version and I'm not actually sure how to figure out the build ID just by inspecting the files on the disk.

Ah, you can inspect application.ini to get the version and the build version, which can then be compared with the values in about:support.

If you are experiencing this problem, can you check whether those values match when it happens again?

Flags: needinfo?(poiegas)
Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(david.balazic)

The verification we do is at https://searchfox.org/mozilla-central/rev/a3a9112d4d73d1323eabbc7faa9937cd9aae6465/dom/base/nsFrameLoader.cpp#3723-3747, i.e. it compares the live libxul's buildID against the one in the on-disk application.ini

And if they don't match, we would show about:restartrequired as per: https://searchfox.org/mozilla-central/rev/a3a9112d4d73d1323eabbc7faa9937cd9aae6465/dom/base/nsFrameLoader.cpp#3786-3807 and if for some reason buildID verification fails, we should still trigger the event that shows about:restartrequired
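
To make the decision described here easier to follow, this is a rough Python sketch of it. It is not the nsFrameLoader.cpp code: the function names `read_disk_build_id` and `page_to_show` are hypothetical, and it assumes application.ini keeps the build ID as `BuildID` under an `[App]` section.

```python
import configparser


def read_disk_build_id(ini_text):
    """Return the BuildID from the [App] section of application.ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser["App"]["BuildID"]


def page_to_show(running_build_id, ini_text):
    """Pick the page for a dead content process, following the logic above."""
    try:
        disk_build_id = read_disk_build_id(ini_text)
    except (configparser.Error, KeyError):
        # Verification itself failed: still ask the user to restart.
        return "about:restartrequired"
    if disk_build_id != running_build_id:
        # The files on disk were replaced by an update while we were running.
        return "about:restartrequired"
    # Same build on disk and in memory: treat it as an ordinary crash.
    return "about:tabcrashed"
```

Either way the user should end up on one of the two pages, which is why a tab that just spins forever points at this code path not being reached at all.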

This looks like not a networking bug.

Component: Networking → Application Update
Flags: needinfo?(kershaw)
Product: Core → Toolkit

Hi, sorry for the late reply, but I had 1 week of medical leave. I've run into the issue again and did the checks in about:support and in the application.ini file.

The build ID in the opened profile that stopped loading pages is - 20230306094520
The build ID written in the ini file is - 20230306211718
The build ID from the other profile opened that is still loading pages is - 20230306211718

There's a clear difference, and I assume that if the greater number (after the date) means that it is a newer build, then the profile that stops working remains behind on an older version.
Is this going to change in the future after the fixes are done, or anyone that uses multiple profiles at the same time will be forced to restart the browser when an update occurs on one of the profiles?
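
As a side note on comparing the two IDs: the sketch below assumes a build ID is a 14-digit timestamp in YYYYMMDDHHMMSS form, which is what the values above look like. The helper names `parse_build_id` and `is_behind` are made up for illustration.

```python
from datetime import datetime


def parse_build_id(build_id):
    """Interpret a build ID as a YYYYMMDDHHMMSS timestamp (assumed format)."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


def is_behind(running_build_id, disk_build_id):
    """True when the running build is older than the one installed on disk."""
    return parse_build_id(running_build_id) < parse_build_id(disk_build_id)
```

With the values reported above, `is_behind("20230306094520", "20230306211718")` is true, matching the reading that the stuck profile is the one left on the older build.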

Let me know if I can help in any other way.

Flags: needinfo?(poiegas)

(In reply to Paul Oiegas [:poiegas], PM for (Ecosystem QA, Services QA, WebCompat QA, Hubs QA teams) from comment #23)

The build ID in the opened profile that stopped loading pages is - 20230306094520
The build ID written in the ini file is - 20230306211718

Well it sounds like it really ought to be showing about:restartrequired but, for some reason, it isn't. I believe that the other changes that have been made regarding build id mismatches and about:restartrequired have been put in Core::IPC, so I'm going to move this bug there.

Severity: S3 → --
Component: Application Update → IPC
Priority: P2 → --
Product: Toolkit → Core

Paul, if the symptom is "nothing happens" and you don't get about:restartrequired or about:tabcrashed I worry this is unrelated to my work. As mentioned in comment 21, you should get one or the other: https://searchfox.org/mozilla-central/rev/416f31a51174620f04fc994d248b664b54517699/browser/base/content/tabbrowser.js#6009-6010

Even if IPC was hitting some race and thinks BuildID matches, you should have about:tabcrashed: https://searchfox.org/mozilla-central/rev/a3a9112d4d73d1323eabbc7faa9937cd9aae6465/dom/base/nsFrameLoader.cpp#3786, so maybe for some reason you don't even hit this code: https://searchfox.org/mozilla-central/rev/a3a9112d4d73d1323eabbc7faa9937cd9aae6465/dom/base/nsFrameLoader.cpp#3749 (MaybeNotifyCrashed not called, or called with a condition that makes it return early).

Flags: needinfo?(poiegas)
See Also: → 1821421
1. Start about:processes
2. rename xul.dll to Xxul.dll
3. copy new xul.dll
4. kill all Preallocated
5. about:processes shows no preallocated
6. open a new tab, try to browse (it works)
7. get about:restartrequired, close it
8. wait a bit
9. ghost Preallocated appears
10. Open a new tab, try to browse
11. observe indefinite spinner, nothing loads

Screenshot with ghost preallocated and tab trying to load google.com, spinner running indefinitely

Attached file repro.log

MOZ_LOG=Process:5 log when reproducing the screenshot situation
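
For anyone wanting to capture the same kind of log, here is a small sketch of setting up the environment before launching Firefox. `MOZ_LOG` and `MOZ_LOG_FILE` are the usual Gecko logging variables; the helper name `logging_env`, the log file location and the binary path in the commented launch line are assumptions to adjust locally.

```python
import os


def logging_env(modules="Process:5", log_file=None, base_env=None):
    """Return a copy of the environment with MOZ_LOG (and MOZ_LOG_FILE) set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = modules
    if log_file is not None:
        env["MOZ_LOG_FILE"] = log_file
    return env


# Example launch (paths are assumptions; adjust for the local install):
# import subprocess
# subprocess.Popen(
#     [r"C:\Program Files\Mozilla Firefox\firefox.exe", "-P"],
#     env=logging_env(log_file=r"C:\temp\repro.log"),
# )
```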

In case it matters, it happened again today.
I was running two profiles.

The one that had the problem (not loading pages):

about:support shows Build ID 20230227191043

C:\Program Files\Mozilla Firefox\application.ini has:
BuildID=20230309232128

Other profile that still works:

about:support shows Build ID 20230227191043

Flags: needinfo?(david.balazic)

(In reply to Alexandre LISSY :gerard-majax from comment #25)

Paul, if the symptom is "nothing happens" and you don't get about:restartrequired or about:tabcrashed I worry this is unrelated to my work. As mentioned in comment 21, you should get one or the other: https://searchfox.org/mozilla-central/rev/416f31a51174620f04fc994d248b664b54517699/browser/base/content/tabbrowser.js#6009-6010

Even if IPC was hitting some race and thinks BuildID matches, you should have about:tabcrashed: https://searchfox.org/mozilla-central/rev/a3a9112d4d73d1323eabbc7faa9937cd9aae6465/dom/base/nsFrameLoader.cpp#3786, so maybe for some reason you don't even hit this code: https://searchfox.org/mozilla-central/rev/a3a9112d4d73d1323eabbc7faa9937cd9aae6465/dom/base/nsFrameLoader.cpp#3749 (MaybeNotifyCrashed not called, or called with a condition that makes it return early).

Sadly I'm not a developer so I'm not sure I completely understand the code you mentioned. After looking over the code I am inclined to say that nothing happens in general. I am not receiving any page that asks me to restart, and neither do I encounter a browser crash/tab crash.
I did have a couple of crashes, in the last few days, but I think they were exceptions or results of extra steps that I did. In general, I do not encounter a crash when this happens.

Today I encountered this again and I have the same symptoms, the profile that was opened from the icon stops loading pages (ID 20230314161405), while the one opened through a link, continues to work (ID 20230315162649).
The only thing that signals that this happened, is the green dot on the "hamburger menu" button informing me that there is an update. That's how I know why my pages stop loading. And the dot only appears on the profile that stops working.
Other than that, there's nothing else, every page I try to load populates the address bar, but the page remains blank with no message on it.

Flags: needinfo?(poiegas)
Attached image Browser console

I'm attaching a screenshot of the Browser console from the profile that stops working, if it's of any help.
However, when I try to open a new page, nothing extra appears in the console, so probably these are all the logs recorded until it happened.

I've tried also to open Dev Tools to see the network tab, but the browser does not allow that while on a page that does not load, either by key shortcut or from menu.

Also, here are my latest 3 crashes from March 08, 10 and 11.
https://crash-stats.mozilla.org/report/index/d53f18c9-f1ae-4a8b-8456-bbde20230308
https://crash-stats.mozilla.org/report/index/0ff3a51c-4e5a-4084-8209-b095e0230310
https://crash-stats.mozilla.org/report/index/b8187443-b6ee-4264-872b-3e4730230310

(In reply to Paul Oiegas [:poiegas], PM for (Ecosystem QA, Services QA, WebCompat QA, Hubs QA teams) from comment #30)

Other than that, there's nothing else, every page I try to load populates the address bar, but the page remains blank with no message on it.

With the loading spinner animating forever?

Can you try when you repro to have a look, on the failing profile, at about:processes and share a screenshot?

Flags: needinfo?(poiegas)
Flags: needinfo?(david.balazic)

I'd be curious if you get something similar to my screenshot in comment 27: either missing any preallocated process, or having some with weird memory usage (2MB when it should be ~20MB).

It could help also if you have a look at Process Explorer's output filtering on firefox.exe (make sure we see the PIDs), so we could identify if those preallocated are really existing on the system. If we get into a similar situation as my screenshot, it means we have a shot at reproducing and maybe better notifying the user of tabcrashed and/or restartrequired.

I've restarted the profiles after my comments, but I'll check next time when I reproduce it. I'll leave the NI so I won't forget.

Attached image Processes and PIDs

Here you go. Funny thing that I managed to reproduce this on both the profiles today, because somehow when I clicked a link on a 3rd party app, Fx tried to open a 3rd instance, and the update tried to happen on that 3rd instance, so I ended up with both the other 2 profiles already open dead.

But I managed to get you the data from one of the profiles, the first one that I realized it's not loading pages.

Flags: needinfo?(poiegas)

(In reply to Paul Oiegas [:poiegas], PM for (Ecosystem QA, Services QA, WebCompat QA, Hubs QA teams) from comment #35)

Created attachment 9324187 [details]
Processes and PIDs

Here you go. Funny thing that I managed to reproduce this on both the profiles today, because somehow when I clicked a link on a 3rd party app, Fx tried to open a 3rd instance, and the update tried to happen on that 3rd instance, so I ended up with both the other 2 profiles already open dead.

But I managed to get you the data from one of the profiles, the first one that I realized it's not loading pages.

Thanks, from the screenshot it looks like:

  • you have processes with very low "weird" memory usage
  • those are tab processes
  • they are not visible under the main process in Process Explorer so they likely are dead

This would suggest my repro is on par, and we have processes dying in a way we don't catch, so no warning is shown to the user.
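
The cross-check between about:processes and Process Explorer can be written down as a small sketch. Everything in it is illustrative: the function name `find_ghost_processes`, the input shapes, and the 5 MB cut-off are arbitrary choices, picked only because a healthy preallocated process was said to sit around 20 MB and the odd ones around 2 MB.

```python
def find_ghost_processes(listed, live_pids, suspicious_below_mb=5.0):
    """Flag content processes that look dead.

    listed: iterable of (pid, memory_mb) pairs read off about:processes.
    live_pids: set of firefox.exe PIDs the OS actually reports.
    Returns (pid, missing_from_os, tiny_memory) for each suspicious entry.
    """
    ghosts = []
    for pid, memory_mb in listed:
        missing = pid not in live_pids
        tiny = memory_mb < suspicious_below_mb
        if missing or tiny:
            ghosts.append((pid, missing, tiny))
    return ghosts
```

An entry that is both missing from the OS list and tiny in memory matches the "ghost Preallocated" from the repro steps.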

Chang, you marked See also for bug 1760580 ; would a disappearing socket process explain the behavior that you hinted on comment 10 ?

Flags: needinfo?(kershaw)

(In reply to Alexandre LISSY :gerard-majax from comment #37)

Chang, you marked See also for bug 1760580 ; would a disappearing socket process explain the behavior that you hinted on comment 10 ?

No, socket process is not used for networking requests yet.
comment #10 only showed that some networking requests were cancelled, but we don't know why and whether the cancellation has any impact in this bug.

Flags: needinfo?(kershaw)
Assignee: nobody → lissyx+mozillians
Attachment #9324969 - Attachment description: WIP: Bug 1788173 - Force IPC channel error on timeout waiting for client connection → Bug 1788173 - Force IPC channel error on timeout waiting for client connection r?#ipc-reviewers
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Duplicate of this bug: 1711143
Pushed by alissy@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f72d594cc624 Force IPC channel error on timeout waiting for client connection r=ipc-reviewers,jld
Flags: needinfo?(david.balazic)

So the timeout change is landing; it should likely be in the next nightly. If it works as intended, what you should see when reproducing the issue is that, after ~10s, the new tab stops loading forever and you get about:tabcrashed. We will know the buildid for sure when this hits mozilla-central, and you will probably have to wait for a second nightly update to trigger the issue (since you need the "old running" version to be the one with those changes).
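
To illustrate the idea behind the change (not the actual patch), here is a minimal Python sketch of turning "wait forever for the client to connect" into "wait up to a deadline, then report a channel error". The names `ChannelError` and `wait_for_client` are hypothetical, and the 10 second default only echoes the ~10s mentioned above.

```python
import threading


class ChannelError(Exception):
    """Stands in for the IPC channel error raised when no client connects."""


def wait_for_client(connected, timeout_s=10.0):
    """Block until `connected` is set; raise ChannelError on timeout."""
    if not connected.wait(timeout_s):
        # Without a deadline the caller would wait forever (endless spinner).
        raise ChannelError(f"no client connection after {timeout_s}s")
    return True
```

Once the error is raised, the normal crash handling can run and show about:tabcrashed or about:restartrequired instead of a spinner that never ends.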

Please keep an eye on this and keep us informed :)

Flags: needinfo?(poiegas)
Flags: needinfo?(david.balazic)
Blocks: 1826795
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch

Hi Alexandre, I think my issue was fixed through the message you were mentioning. Now, when I'm on 2 profiles, one updates, and one remains behind, the one that remains behind tries to load the page for a few seconds, and afterwards the "Restart to Keep Using Nightly" message is displayed.

Let me know if you need any other details.

Flags: needinfo?(poiegas)

If the "a few seconds" is around 10 secs, that means yes, we nailed it with that fix :)

Flags: needinfo?(poiegas)

To be honest, I haven't counted :D, but if I remember correctly it is under 10s.

Flags: needinfo?(poiegas)

Counted last time when it happened, it took like 6-7s on my PC. Not sure though if it is also hardware dependent besides networking.

With the latest update (to 113.0.1) I saw the "Firefox updating .. must restart" tab.
So I guess this is fixed.

Flags: needinfo?(david.balazic)

I'll mark this as verified since both the reporter and I have seen the fix working as it should.

Status: RESOLVED → VERIFIED

Just a note.

This feature works, but it is a bit weird. Today I got the update from 116.0.1 to 116.0.2 and observed different behaviors, apparently depending on the URL (this all happened after the update and before the restart):

  • gmail.com (page already loaded before the update)
    It shows a yellow popup (by Javascript in the web page) saying:

Not connected. Connecting in 28s… Try now

  • open a new tab, enter ijs.si into the URL bar, prints:

Gah. Your tab just crashed.
We can help!

Choose Restore This Tab to reload the page.

opens a new tab, after a few seconds saying:

Restart to Keep Using Firefox

An update to Firefox started in the background. You’ll need to restart to finish the update.

Your windows and tabs will be quickly restored, but private ones will not.
