Closed Bug 896276 Opened 11 years ago Closed 11 years ago

"It looks like you haven't started Nightly in a while ..." message appears when it shouldn't

Categories

(Firefox :: General, defect)

25 Branch
x86_64
Windows 8
defect
Not set
normal

Tracking

()

VERIFIED FIXED
Firefox 25
Tracking Status
firefox25 + verified

People

(Reporter: streetwolf52, Assigned: dao)

References

Details

(Keywords: dataloss, regression)

Attachments

(3 files)

Attached image cfi.png
User Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:25.0) Gecko/20130721 Firefox/25.0 (Nightly/Aurora)
Build ID: 20130721030203

Steps to reproduce:

Start Fx25
Takes a few tries.


Actual results:

Receive the attached notification message at the bottom of the Fx window:




Expected results:

No notification.
I get this with a new profile and in safe mode.  Not all the time but even one time is too many IMO. What controls this?  Can I turn it off?
Component: Untriaged → General
It should only be shown after 60 days the profile was not used, but for a new profile it should surely not appear.
Looking at the code the only viable case seems to be if Services.appinfo.replacedLockTime has a very wrong value in the past or the process start up time is in the future.

Which steps did you follow exactly, created a new profile and launched firefox continuously until the notification appeared?
The message appeared within a couple of restarts o Fx after a new profile. I tried to discover a pattern but couldn't do so. I don't seem to get them as often now.  Maybe a couple of times a day.

If Services.appinfo.replacedLockTime is a pref I don't have it.
I just got the message again after restarting Fx about 4 times.
so, **when you get that message**, could you please open the Browser Console (from DevTools) and execute these 2 commands in the bottom command prompt:

Services.appinfo.replacedLockTime
and
Services.startup.getStartupInfo().process

This should print 2 large numbers in the console, please paste them here (indicating which number is for each command)
Services.appinfo.replacedLockTime
[07:50:15.906] 1374504548542

Services.startup.getStartupInfo().process
[07:51:04.273] [object Date]

I get the same result for the second one even when I don't have the reset message.

I'm not sure if it's related but the console is repeatedly displaying this message:

--
[07:52:15.351] [Exception... "'you cannot set the current.totalTime pref to the number -2698473446, as number pref values must be in the signed 32-bit integer range -(2^31-1) to 2^31-1.  To store numbers outside that range, store them as strings.' when calling method: [nsIObserver::observe]"  nsresult: "0x8057001e (NS_ERROR_XPC_JS_THREW_STRING)"  location: "native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0"  data: no]
I'll take this when I have a chance.

(In reply to Gary [:streetwolf] from comment #6)
> Services.appinfo.replacedLockTime
> [07:50:15.906] 1374504548542

What timezone are you in? Are you running this profile on multiple computers?

> Services.startup.getStartupInfo().process
> [07:51:04.273] [object Date]
> 
> I get the same result for the second one even when I don't have the reset
> message.

Can you add ".toString()" to the end: Services.startup.getStartupInfo().process.toString()
Assignee: nobody → mnoorenberghe+bmo
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Just noticed from entering 'Services.startup.getStartupInfo().process.toString()' I am set to Pacific Standard Time in Windows.  I set my timezone to EST and restarted Fx.  Now it displays EST.  Was this the cause of my problem?
Still getting the reset message even after setting my TZ to EDT.  In fact I get it more often now.

I don't run my profile on multiple computers

[19:40:07.425] Services.startup.getStartupInfo().process.toString()
[19:40:07.427] "Sun Jan 25 2099 03:04:50 GMT-0500 (Eastern Daylight Time)"
This should be considered a regression, and somehow a priority. If we suggest to reset profiles to users who don't need that, they may suffer dataloss without any benefit.
If we can't figure the cause and hit rate, we should disable the notification before release, imo.

(In reply to Gary [:streetwolf] from comment #9)
> Still getting the reset message even after setting my TZ to EDT.  In fact I
> get it more often now.
> 
> I don't run my profile on multiple computers
> 
> [19:40:07.425] Services.startup.getStartupInfo().process.toString()
> [19:40:07.427] "Sun Jan 25 2099 03:04:50 GMT-0500 (Eastern Daylight Time)"

here is the issue, process start up time in the future, now the funny thing will be to figure what's causing that.
Notice I found a bug related to the process start up time in bug 896025, but that bug may only cause that time to be in the past. So this looks like a different bug.
Didn't even notice the 2099 year.  All that time and we are still at Fx25 :-)
I just rebooted my machine and now I get:

[19:53:08.010] Services.startup.getStartupInfo().process.toString()
[19:53:08.012] "Mon Jul 22 2013 19:52:29 GMT-0400 (Eastern Standard Time)"
Jut got the reset message and my year changed again:

[19:56:38.617] Services.startup.getStartupInfo().process.toString()
[19:56:38.618] "Sun Jan 25 2099 04:00:25 GMT-0500 (Eastern Daylight Time)"
(In reply to Marco Bonardo [:mak] from comment #10)
> This should be considered a regression, and somehow a priority.

I can't prioritize this above Australis but I will definitely look at this as we get closer to Aurora. If you want to take this then feel free. At this point, nothing jumps out in the code so I think more diagnosis with Gary is needed to pinpoint the cause. Doing a quick bugzilla search showed bug 164070 so I wonder if this is 2099 date is from calling some date function with -1 or some other negative number?

(In reply to Gary [:streetwolf] from comment #14)
> Jut got the reset message and my year changed again:
> 
> [19:56:38.617] Services.startup.getStartupInfo().process.toString()
> [19:56:38.618] "Sun Jan 25 2099 04:00:25 GMT-0500 (Eastern Daylight Time)"

How are you restarting the browser? Does your computer's clock show the correct date? Can you reproduce this in safe mode? Thanks for your help.
Blocks: 498181
No longer blocks: 888560
(In reply to Matthew N. [:MattN] from comment #15)
> I can't prioritize this above Australis but I will definitely look at this
> as we get closer to Aurora.

Sure, I just said I think we should have at least a workaround before release (we could just check the date is not in the future, for example).

(In reply to Gary [:streetwolf] from comment #14)
> [19:56:38.618] "Sun Jan 25 2099 04:00:25 GMT-0500 (Eastern Daylight Time)"

I wonder if there's a bug in the ticks to milliseconds conversion, or viceversa.
Which cpu do you have? Could you please tell us model and frequency?
Could be unrelated, but the most information we have the better we can investigate.
Yes it does happen in safe mode.  Here's my machines specs:

Intel Core i7 Quad 3770K @ 5.0Ghz[b]|[/b]ASUS P8Z77-V Deluxe[b]|[/b]Corsair 1050W PSU[b]|[/b]Corsair H100 Liquid CPU Cooler[b]|[/b]16GB RAM @ 2400 Mhz[b]|[/b]Sapphire HD7970 3GB OC Edition[b]|[/b]Dual ASUS PA246Q IPS 24" LCD[b]|[/b]Samsung SSD 830 256GB[b]|[/b]1-2TB Seagate SATA3, 2-3TB Seagate SATA3 in RAID 0 [b]|[/b]Windows 8 Pro x64
Computer clock shows the correct date.  I've exited Fx and then ran it from a shortcut s well s doing a restart within Fx.  Both produce the reset message.

Keep in mind this doesn't happen 100% of the time.  But when it does happen I get the 2099 year until I exit out of the message then it returns to 2013 until the next time it happens again.
I might be on to something.  I noticed that I didn't have a Metro tile on my Metro start screen, so on a whim I went into RegEdit and deleted these keys.

[HKEY_CURRENT_USER\Software\Mozilla\Firefox\Metro]
[HKEY_CURRENT_USER\Software\Mozilla\Firefox\Metro\Prefs]
[HKEY_CURRENT_USER\Software\Mozilla\Firefox\Metro\Prefs\128]
"app.update.auto"="true"
"app.update.enabled"="true"
"app.update.service.enabled"="true"
"app.update.metro.enabled"="true"
[HKEY_CURRENT_USER\Software\Mozilla\Firefox\Metro\Prefs\32]
[HKEY_CURRENT_USER\Software\Mozilla\Firefox\Metro\Prefs\64]

After I started up Fx the keys were created again and I now have the Fx Metro tile.

What's interesting is that I haven't had a reset message since doing this.  The keys above are the new ones, I didn't save a copy of the old ones.  I don't think they were different but I'm not certain.  Stay tuned.
Oh well.  I went a long time without a reset message but finally just got one.
(In reply to Gary [:streetwolf] from comment #17)
> Yes it does happen in safe mode.  Here's my machines specs:
> 
> Intel Core i7 Quad 3770K @ 5.0Ghz

This is a quite high overclock, if I'm not wrong. May you try working for a couple days closed to the cpu spec, like 3.9 or 4.0? Since the use the cpu ticks to calculate the time, I wonder if there may be an overflow somewhere regarding that.
I'm thinking the i7 has turbo clock, so in some situations it may be higher than 5GHz, we may be doing an overflowing calculation in such situations (the code uses LONGLONG and does quite some multiplications), that would explain the random behavior.
I'll lower my speed and see what happens even though I'm not having any problems with other applications and programs.  I still would go back to 5Ghz so perhaps you could come up with another solution to this problem.  Maybe I've exposed a bug that you never figured on.
It's possible there's a real bug, the experiment is useful to confirm there's a relation.
I went down to 4Ghz and ran into the problem.  I don't get the message as often as I did even running at 5Ghz.  Also, I have my UEFI BIOS setup where my CPU speed stays low unless I really need it. I enable EIST and C1E CPU mode to accomplish this. I hardly think starting Fx would require the maximum speed even if that was the cause of the messages.

Turbo mode is off so I never go above 5Gz.
Hmm.. as I'm looking at the code at [1] (not that I would fully understand) it seems like you are using TimeStamp for an absolute wall clock time.  That is not correct.

I think it's like comparing apples and grapes when comparing last access to profile and QueryPerformanceCounter value.

The code is simply wrong.  This bug is not coming from an issue in TimeStamp, but from wrong use of TimeStamp, IMO.


[1] http://hg.mozilla.org/mozilla-central/annotate/b3fcd828cadc/toolkit/components/startup/nsAppStartup.cpp#l677
Nathan & Gabriele: Could you take a look at comment 26 about nsAppStartup::GetStartupInfo?
Assignee: mnoorenberghe+bmo → nobody
Status: ASSIGNED → NEW
Component: General → Startup and Profile System
Flags: needinfo?(nfroyd)
Flags: needinfo?(gsvelto)
Product: Firefox → Toolkit
I upped the Vcore of my CPU a little bit and I haven't had a single reset message all day.  I'll monitor things for another day and if I don't get any messages I think I found the fix for me anyway.

Strange though that doing a date calculation requires more voltage from my CPU than more CPU intensive programs I use.  Maybe this still exposes a bug of sorts.
As explained in comment 26, there's a misuse of the TimeStamp API for Services.startup.getStartupInfo().process, it has been intended only to measure intervals in a session, as diff of TimeStamps, not as an absolute time value comparable across sessions. So there's definitely a bug, your changes may just be wallpapering it.
Services.appinfo.replacedLockTime is instead correct, it's a common PRTime got from the file lock lastModified.
(In reply to Honza Bambas (:mayhemer) from comment #26)
> Hmm.. as I'm looking at the code at [1] (not that I would fully understand)
> it seems like you are using TimeStamp for an absolute wall clock time.  That
> is not correct.

Unfortunately we had to opt for the lesser of two evils: this information is used both to generate differences (for which we need TimeStamp) and absolute values (for which we need PR_Now() or something similar). Basically this code serves two masters and its hard to make them both happy at the same time.

All the related discussion is in bug 793735 but in a nutshell the problem is the following: we had a number of intermittent failures which were caused by certain events in the startup timeline apparently happening in reverse order (e.g. the main() timestamp appearing before the process creation one); this was due to using different time sources for the different events and to a lesser extent by the inherent instability of PR_Now() (due to DST and NTP adjustments). So we decided to switch all measures to TimeStamp this however introduced the problem of the absolute-time consumers; since I didn't want to break the existing interface we decided to translate intervals into absolute time by fixing a starting point. This might obviously yield wrong results if the time is adjusted afterwards but no more wrong than what would appear with the existing code.

> I think it's like comparing apples and grapes when comparing last access to
> profile and QueryPerformanceCounter value.
> 
> The code is simply wrong.  This bug is not coming from an issue in
> TimeStamp, but from wrong use of TimeStamp, IMO.

I think the problem is different here: the wrong timestamp is the process creation timestamp which is not sampled (as it's impossible to sample process creation) but synthesized; the relevant code is here:

https://mxr.mozilla.org/mozilla-central/source/xpcom/ds/TimeStamp_windows.cpp#610

Note that we sanitize the computed process creation timestamp by comparing it with the earliest "real" timestamp we could take and replace it if it looks like it happened after it. Taking comment 28 into account then my guess is that the first timestamp taken is wrong and the valid process creation timestamp is replaced with it (in his case the TSC should be part of the uncore logic and live in a different voltage zone compared to the cores; so it might be possible that overclocking is messing up its value). To verify if this is the case it might be enough to check for telemetry errors as the above code records one if it detects a wrong process startup time.
I think Gabriele's answer addresses what needinfo? was used for.
Flags: needinfo?(nfroyd)
Flags: needinfo?(gsvelto)
So I haven't had a reset message in two days after I upped my Vcore. I wanted to test something out on Birch and lo and behold the message came back with the same Vcore. Wonder what's up with this?
(In reply to Gary [:streetwolf] from comment #32)
> So I haven't had a reset message in two days after I upped my Vcore. I
> wanted to test something out on Birch and lo and behold the message came
> back with the same Vcore. Wonder what's up with this?

When this happens could you check in about:telemetry if the STARTUP_MEASUREMENT_ERRORS histogram is present and contains errors? If my guess is correct and the process startup time is wrong then one error should show up in the histogram's 0 entry.
Will do.
Simple Measurements
 
Property 	Value
AMI_startup_begin 	-2698479145685
XPI_startup_begin 	-2698479145681
XPI_bootstrap_addons_begin 	-2698479145620
XPI_bootstrap_addons_end 	-2698479145571
XPI_startup_end 	-2698479145571
AMI_startup_end 	-2698479145571
delayedStartupStarted 	-2698479144897
sessionRestoreInitialized 	-2698479144885
delayedStartupFinished 	-2698479144849
uptime 	-44974648
addonManager 	{"installedUnpacked":9}
startupInterrupted 	0
debuggerAttached 	0
js 	{"setProto":0,"customIter":0}
shutdownDuration 	196
maximalNumberOfConcurrentThreads 	31
startupSessionRestoreReadBytes 	33805795
startupSessionRestoreWriteBytes 	58342
startupWindowVisibleReadBytes 	34428387
startupWindowVisibleWriteBytes 	58342
savedPings 	54
(In reply to Marco Bonardo [:mak] from comment #16)
> (In reply to Matthew N. [:MattN] from comment #15)
> > I can't prioritize this above Australis but I will definitely look at this
> > as we get closer to Aurora.
> 
> Sure, I just said I think we should have at least a workaround before
> release (we could just check the date is not in the future, for example).

Or use Date.now(). Why are we using startup.getStartupInfo().process? The difference shouldn't matter for this feature.
Attached patch patchSplinter Review
Assignee: nobody → dao
Status: NEW → ASSIGNED
Attachment #782110 - Flags: review?(mak77)
Comment on attachment 782110 [details] [diff] [review]
patch

Review of attachment 782110 [details] [diff] [review]:
-----------------------------------------------------------------

yes, this makes sense, we don't need that level of precision to compare with 60 days.
Attachment #782110 - Flags: review?(mak77) → review+
I guess someone should file a new bug for nsIAppStartup.getStartupInfo().process being off?

https://hg.mozilla.org/integration/fx-team/rev/5f7ad1bfd200
Component: Startup and Profile System → General
Product: Toolkit → Firefox
Summary: "It looks like you haven't started Nightly in a while ..." message → "It looks like you haven't started Nightly in a while ..." message appears when it shouldn't
It so happens that I get a message from LastPass that my LP session has expired at least once a day.  This coincides with receiving the reset messages I believe.  Do you think this could be another similar situation in calculating dates?
https://hg.mozilla.org/mozilla-central/rev/5f7ad1bfd200
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 25
(In reply to Gary [:streetwolf] from comment #41)
> It so happens that I get a message from LastPass that my LP session has
> expired at least once a day.  This coincides with receiving the reset
> messages I believe.  Do you think this could be another similar situation in
> calculating dates?

It depends on what LastPass uses for the comparison, we don't have control over that.
I could not reproduce the initial issue on the 21st of July Nightly.
Gary since you were able to reproduce it, could you please verify if the issue is fixed on the latest Nightly and on the latest Aurora?

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/
Flags: needinfo?(garyshap)
I would say it is fixed.
Flags: needinfo?(garyshap)
Per comment 45
Status: RESOLVED → VERIFIED
Hi, sorry to barge in, but that bug is back, at least for me. It happens every single time I launch Firefox 25.0 or 25.0.1 on my MacBook. I believe the problem started with version 25.0, but I am not entirely sure. I have 3 partitions on my MacBook, NTFS and Mac OS for the OSes (Mac OS X 10.7.5 and Windows 7 Pro, 32 bit edition), and a FAT32 partition to share data between the OSes, including my Mozilla profiles. I have some odd issues like the addon for AVG is considered "new" the first time I launch Firefox after a reboot into Windows, but this issue is happening every single time I launch Firefox on the Mac side (not sure at this point about Windows side).

Because the MacBook deals with time differently in Windows and Mac OS, the clock is 3 hours off when I reboot until it can connect to the internet for clock adjustment, so I could understand that files may have a creation time which could be up to 3h in the future on occasion, but not 61 days off. Please fix that bug or indicate a way to disable that feature. If I hit the reset button by mistake, I might as well just look for a different browser than have to rebuild my customization over and over.
By the way, from above request to previous user:
[22:09:42.730] Services.appinfo.replacedLockTime
[22:09:42.733] 1340812112000
[22:09:53.937] Services.startup.getStartupInfo().process
[22:09:53.940] [object Date]
[22:10:32.489] Services.startup.getStartupInfo().process.toString()
[22:10:32.491] "Sun Nov 17 2013 22:07:59 GMT-0500 (EST)"

SYSTEM INFO:
MacBook Pro
15-inch, Mid 2012

Processor  2.3 GHz Intel Core i7
Memory  4 GB 1600 MHz DDR3
Graphics  Intel HD Graphics 4000 384 MB
Software  Mac OS X Lion 10.7.5 (11G63)
(In reply to vincent from comment #47)
> Because the MacBook deals with time differently in Windows and Mac OS, the
> clock is 3 hours off when I reboot until it can connect to the internet for
> clock adjustment, so I could understand that files may have a creation time
> which could be up to 3h in the future on occasion, but not 61 days off.

From your description of the problem and from the data you posted I think that we're seeing a different issue here, see below:

(In reply to vincent from comment #48)
> [22:09:42.730] Services.appinfo.replacedLockTime
> [22:09:42.733] 1340812112000
> [22:09:53.937] Services.startup.getStartupInfo().process
> [22:09:53.940] [object Date]
> [22:10:32.489] Services.startup.getStartupInfo().process.toString()
> [22:10:32.491] "Sun Nov 17 2013 22:07:59 GMT-0500 (EST)"

Your replacedLockTime is off by over a year compared to the current startup time. It would be interesting to see those two values when running in Windows to compare; my gut feeling is that on Mac the modified time of files on a FAT32 device might be interpreted incorrectly.

Either way the problem is different than what we faced in this bug so could you please open a new one by cloning this bug? It would be very helpful if you could put the info above when running on both Mac and Windows.

In the meantime as a workaround you could consider using separate profiles and sync them via Firefox Sync which works very well for me across multiple devices and OSes.
Ok, done, see Bug 940140. Locktime looks very different on Windows, no message appears in Firefox. I may try Firefox Sync and avoid the issue, hopefully.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: