SetInterval stops firing callback on x86_64 platform

RESOLVED FIXED in mozilla1.8.1beta2

Status

()

defect
RESOLVED FIXED
13 years ago
a month ago

People

(Reporter: sylvain.pasche, Assigned: jst)

Tracking

({64bit, fixed1.8.1})

Trunk
mozilla1.8.1beta2
x86
Linux
Points:
---
Bug Flags:
blocking1.8.1 +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

13 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9a1) Gecko/20060710 Minefield/3.0a1
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9a1) Gecko/20060710 Minefield/3.0a1

When using setInterval with small interval values (<100ms) on amd64, the callback stops being called after some times.

I tried with two branch builds one x86_64 and the other i686, and the problem only appears with the x86_64 build.

Reproducible: Always
(Reporter)

Comment 1

13 years ago
Posted file testcase (obsolete) —
The number should not stop incrementing. On x86_64, it stops if you interact with the browser (change tab, ...).

Other test case:

Open the firefox credits dialog box. The name of the contributors stops scrolling when you do something described above (the about dialog also uses setInterval for scrolling)
(Reporter)

Updated

13 years ago
Blocks: 237202
Keywords: 64bit
(Reporter)

Updated

13 years ago
Summary: SetInterval not firing on x86_64 platform → SetInterval stops firing callback on x86_64 platform

Updated

13 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Reporter)

Comment 2

13 years ago
When compiling nsGlobalWindow.cpp, I saw this gcc message:

/.../nsGlobalWindow.cpp: In member function 'void nsGlobalWindow::RunTimeout(nsTimeout*)':
/.../nsGlobalWindow.cpp:6599: warning: comparison between signed and unsigned integer expressions

What we have there is:

      if (delay < ((PRTime)DOM_MIN_TIMEOUT_VALUE * PR_USEC_PER_MSEC)) {
        delay = (PRTime)DOM_MIN_TIMEOUT_VALUE * PR_USEC_PER_MSEC;

So I tried casting PR_USEC_PER_MSEC to PRTime, and that solves the issue.

I was wondering why this was not an issue on 32bit arch. It looks likes type coercion of PR_USEC_PER_MSEC (= 1000UL = long unsigned int) with PRTime (= long long int) is signed. However, that is unsigned on 64bit (where PRTime is long int).

There are other occurrences of PR_USE_PER_MSEC on that file, which may suffer from the same issue. If the way to go is to cast them everywhere, I can provide a patch.

Updated

13 years ago
Flags: blocking1.8.1?

Comment 3

13 years ago
Johnny - can you take a look?
Assignee: general → jst
Flags: blocking1.8.1? → blocking1.8.1+
(Assignee)

Comment 4

13 years ago
This looks like a compiler bug to me. The trivial test:

  int64_t a = -1;

  if (a < 1UL) {
    return 1;
  }

  return 0;

returns 0 when compiled on a x86_64 Linux box, and returns 1 on a 32-bit x86 box (both Win32 and Linux). I looked at all the other occurances of PR_USEC_PER_MSEC in nsGlobalWindow.cpp but I don't see any others that need fixing. Patch based on Sylvain's research coming up.
Status: NEW → ASSIGNED
(Assignee)

Comment 5

13 years ago
Posted patch Fix.Splinter Review
This fixes this bug by forcing both sides of the < comparison to be of the same type (PRType), and that way the delay ends up being set to a sane value if it is negative at the time of this comparison.
Attachment #229611 - Flags: superreview?
Attachment #229611 - Flags: review?(mrbkap)
(Assignee)

Updated

13 years ago
Attachment #229611 - Flags: superreview? → superreview?(dbaron)
(In reply to comment #4)
> This looks like a compiler bug to me. The trivial test:
> 
>   int64_t a = -1;
> 
>   if (a < 1UL) {
>     return 1;
>   }
> 
>   return 0;
> 
> returns 0 when compiled on a x86_64 Linux box, and returns 1 on a 32-bit x86
> box (both Win32 and Linux).

From looking at the C standard, this behavior is correct.  In C, when comparing a signed type and an unsigned type (per 6.3.1.8 clause 1 of ISO/IEC 9899:1999), conversion to the unsigned type is used unless the signed type
 (a) has greater rank (long long int > long int > int > ..., etc.) and
 (b) can represent all the values in the signed type (which implies (a),
     as far as I can tell).
This means that when comparing uint32 and int64, they are converted to int64, but when comparing uint64 and int64, they are converted to uint64.

I can't find anything in the C++ standard that defines this, or for that matter, anything that makes it acceptable for the compiler to not give an error when a signed type and unsigned type of the same size are compared (the error required when there isn't a unique "Best Viable Function").
Comment on attachment 229611 [details] [diff] [review]
Fix.

sr=dbaron, although I think it would probably be good to drop the cast on the next line as well, and perhaps even add a comment referring to the need to cast for 64-bit platforms.

Also (a larger change, probably not for this bug), it would be nice if these constants carried their units, especially since DOM_MIN_TIMEOUT_VALUE and DOM_MAX_TIMEOUT_VALUE have different units (MSEC, INTERVAL).
Attachment #229611 - Flags: superreview?(dbaron) → superreview+
Attachment #229611 - Flags: review?(mrbkap) → review+
(Assignee)

Comment 8

13 years ago
This is what I checked in.
(Assignee)

Comment 9

13 years ago
Fix checked in on the trunk. Thank you Sylvain for finding what caused this!
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
(Assignee)

Updated

13 years ago
Attachment #229662 - Flags: approval1.8.1?
Target Milestone: --- → mozilla1.8.1beta2
(Reporter)

Comment 10

13 years ago
Attachment #229015 - Attachment is obsolete: true
(Reporter)

Comment 11

13 years ago
great! I had a crash while writing this comment :-/. I will try to remember what I was writing.

If you try the updated testcase, you will see that the lateness value (second line) sometimes takes quite wrong values (large negative ones) on x86_64. I tried changing this line in nsGlobalWindow.cpp:

-      handler->SetLateness((PRIntervalTime)(lateness / PR_USEC_PER_MSEC));
+      handler->SetLateness((PRIntervalTime)(lateness / (PRTime)PR_USEC_PER_MSEC));

And then it is ok.

To understand what's going on, I tried running that:

#include <stdio.h>
#ifdef __x86_64__
typedef long int int64;
#else
typedef long long int int64;
#endif

int main() {
  int64 l = -100;
  printf("%i\n", (int64)(l / 10UL));
}

Which shows me on x86_64 (gcc 4.0.3):

-1717986929

and on 32bit:

-10

Is it about the same rule as comment 6?

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 229662 [details] [diff] [review]
Same as above with dbaron's suggestions.

a=dbaron on behalf of drivers.  Please check in to MOZILLA_1_8_BRANCH and mark
fixed1.8.1 once you have.
Attachment #229662 - Flags: approval1.8.1? → approval1.8.1+
(Reporter)

Comment 13

13 years ago
I think I forgot to add some explanations to my comment 11 above.

First of all, thanks for the patch. That solves this specific issue.

However, there are still some problems with the handling of the PR_USEC_PER_MSEC constant in some situations. The other situation I mention is when you write a callback function used by setInterval, and use the "secret" argument in the callback which returns the lateness in milliseconds. In this case, there are some issues with this lateness value (see updated testcase). 

The problem may not be serious, as this argument is secret. However, the main issue is that there may still be some bad behavior on 64bit platforms with the handling of this constant. The lateness is just one example I could see visually.

I could have opened a new bug about this, but this one already contains valuable information about this issue.

Maybe the title should be changed to something like "Issues with signedness of PR_USEC_PER_MSEC in nsGlobalWindow.cpp on x86_64 affecting window.SetInterval and potentially others".
(In reply to comment #6)
> I can't find anything in the C++ standard that defines this, or for that
> matter, anything that makes it acceptable for the compiler to not give an error
> when a signed type and unsigned type of the same size are compared (the error
> required when there isn't a unique "Best Viable Function").

I found it:  it's in [expr.rel] paragraph 2, which says "The usual arithmetic conversions are performed on operands of arithmetic or enumeration type."  Those are defined in [expr], paragraph 9.
(In reply to comment #11)
> -      handler->SetLateness((PRIntervalTime)(lateness / PR_USEC_PER_MSEC));
> +      handler->SetLateness((PRIntervalTime)(lateness /
> (PRTime)PR_USEC_PER_MSEC));

Yes, we should do this too.  This has the same problem -- on 64bit platforms it will be an unsigned division rather than a signed division.
(Assignee)

Comment 16

13 years ago
This does what Sylvain suggested and adds a comment about this too. I looked at  all the places where we divide PRTime with an unsigned long and there's two other places but in those cases the we don't necessarily need it since we know the values are always positive (at least in the one case), but to prevent cut-n-paste creep of incorrect code I made the same change in those cases as well.
(Assignee)

Comment 17

13 years ago
I just checked this in on the trunk, and I just realized that I talked about operands of different sizes in my checkin comment when I meant *same* sizes. Either way, this is FIXED.
Status: REOPENED → RESOLVED
Last Resolved: 13 years ago13 years ago
Resolution: --- → FIXED
(Assignee)

Comment 18

13 years ago
Comment on attachment 229963 [details] [diff] [review]
Change division of PRTime and unsigned long to work right as well.

This one goes hand-in-hand with the previous patch (already approved) in this bug.
Attachment #229963 - Flags: approval1.8.1?
(Reporter)

Comment 19

13 years ago
Thanks.

>+      // Note: We must cast the rhs expression to PRTime to work
>+      // around what looks like a compiler bug on x86_64.

Maybe that should be adapted because of comment 14 ?
Comment on attachment 229963 [details] [diff] [review]
Change division of PRTime and unsigned long to work right as well.

a=dbaron on behalf of drivers.  Please check in to MOZILLA_1_8_BRANCH and mark fixed1.8.1 once you have done so.
Attachment #229963 - Flags: approval1.8.1? → approval1.8.1+
(Assignee)

Comment 22

13 years ago
Fixed on the 1.8.1 branch.
Keywords: fixed1.8.1
(Assignee)

Comment 23

13 years ago
And once again, thank you Sylvain for your help here!
(Assignee)

Comment 24

13 years ago
This for some reason didn't actually land on the 1.8 branch when I added the fixed-1.8.1 keyword, but I just checked it in and it's in now, for sure. 

Comment 25

13 years ago
*** Bug 352607 has been marked as a duplicate of this bug. ***
(Reporter)

Updated

12 years ago
Duplicate of this bug: 403620

Comment 27

6 years ago
I'm seeing this problem today, I don't know if it's some regression or what.

Running the following in my console I only see it count to 30 and then stop.

var x = 1;
setInterval(function () {
    console.log(x+=1);
}, 0);


Name: Firefox
Version: 23.0.1
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0
Could you please file a new bug instead of commenting a random old bug?

Comment 29

6 years ago
No.

I did not comment on a "random" bug. I commented on the exact bug that I am seeing. Just because it was a bug before doesn't mean that it is not a bug now. The information and patches attached to the exact bug I am seeing may be useful to fixing it, again.

I don't care how old the bug is, regression is when code revisits an old bug so my report will revisit the old bug report. Common courtesy is to search a bug database for the bug you've found before creating new bug reports.

Would you have marked my new bug as a duplicate of this one if I had filed a new one, well you could. You could also then say "hurr it's been fixed years ago".

I saw a bug. I looked in the database to see if it has been identified, it has, and I posted a new sighting of it. You're not making sense asking me to file a separate report.
Kastor, unless this bug is reopened, commenting here does no good.  Keep in mind the original bug report is against Linux, not Windows as your user-agent indicates.

:emk is correct in asking you file your report as a new bug, particularly since the last traffic on this bug was over 5 years ago.  There have been so many changes in this timespan (the original bug was against Mozilla 1.8 code, which is Firefox 2!) that the odds of a correlation are pretty low.

It's just how the Mozilla community does things - a "local custom", if you will.  Please do file as a new bug, particularly if you can generate your testcase as a HTML page.  (That'd make it easier for us, by the way, to add your testcase to our test suites, to prevent it from regressing again!)

Comment 31

6 years ago
ok ok, I will abide by the "local custom" and file a new report. 

For the testcase I'll follow the document at https://developer.mozilla.org/en-US/docs/Mozilla/QA/Reducing_testcases and just write up a regular html page. Let me know if there's some special format for test cases.
Component: DOM → DOM: Core & HTML
Product: Core → Core
You need to log in before you can comment on or make changes to this bug.