Closed Bug 344439 Opened 18 years ago Closed 18 years ago

SetInterval stops firing callback on x86_64 platform

Categories

(Core :: DOM: Core & HTML, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.8.1beta2

People

(Reporter: sylvain.pasche, Assigned: jst)

References

Details

(Keywords: 64bit, fixed1.8.1)

Attachments

(5 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9a1) Gecko/20060710 Minefield/3.0a1
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9a1) Gecko/20060710 Minefield/3.0a1

When using setInterval with small interval values (<100ms) on amd64, the callback stops being called after some times.

I tried with two branch builds one x86_64 and the other i686, and the problem only appears with the x86_64 build.

Reproducible: Always
Attached file testcase (obsolete) —
The number should not stop incrementing. On x86_64, it stops if you interact with the browser (change tab, ...).

Other test case:

Open the firefox credits dialog box. The name of the contributors stops scrolling when you do something described above (the about dialog also uses setInterval for scrolling)
Blocks: 237202
Keywords: 64bit
Summary: SetInterval not firing on x86_64 platform → SetInterval stops firing callback on x86_64 platform
Status: UNCONFIRMED → NEW
Ever confirmed: true
When compiling nsGlobalWindow.cpp, I saw this gcc message:

/.../nsGlobalWindow.cpp: In member function 'void nsGlobalWindow::RunTimeout(nsTimeout*)':
/.../nsGlobalWindow.cpp:6599: warning: comparison between signed and unsigned integer expressions

What we have there is:

      if (delay < ((PRTime)DOM_MIN_TIMEOUT_VALUE * PR_USEC_PER_MSEC)) {
        delay = (PRTime)DOM_MIN_TIMEOUT_VALUE * PR_USEC_PER_MSEC;

So I tried casting PR_USEC_PER_MSEC to PRTime, and that solves the issue.

I was wondering why this was not an issue on 32bit arch. It looks likes type coercion of PR_USEC_PER_MSEC (= 1000UL = long unsigned int) with PRTime (= long long int) is signed. However, that is unsigned on 64bit (where PRTime is long int).

There are other occurrences of PR_USE_PER_MSEC on that file, which may suffer from the same issue. If the way to go is to cast them everywhere, I can provide a patch.
Flags: blocking1.8.1?
Johnny - can you take a look?
Assignee: general → jst
Flags: blocking1.8.1? → blocking1.8.1+
This looks like a compiler bug to me. The trivial test:

  int64_t a = -1;

  if (a < 1UL) {
    return 1;
  }

  return 0;

returns 0 when compiled on a x86_64 Linux box, and returns 1 on a 32-bit x86 box (both Win32 and Linux). I looked at all the other occurances of PR_USEC_PER_MSEC in nsGlobalWindow.cpp but I don't see any others that need fixing. Patch based on Sylvain's research coming up.
Status: NEW → ASSIGNED
Attached patch Fix.Splinter Review
This fixes this bug by forcing both sides of the < comparison to be of the same type (PRType), and that way the delay ends up being set to a sane value if it is negative at the time of this comparison.
Attachment #229611 - Flags: superreview?
Attachment #229611 - Flags: review?(mrbkap)
Attachment #229611 - Flags: superreview? → superreview?(dbaron)
(In reply to comment #4)
> This looks like a compiler bug to me. The trivial test:
> 
>   int64_t a = -1;
> 
>   if (a < 1UL) {
>     return 1;
>   }
> 
>   return 0;
> 
> returns 0 when compiled on a x86_64 Linux box, and returns 1 on a 32-bit x86
> box (both Win32 and Linux).

From looking at the C standard, this behavior is correct.  In C, when comparing a signed type and an unsigned type (per 6.3.1.8 clause 1 of ISO/IEC 9899:1999), conversion to the unsigned type is used unless the signed type
 (a) has greater rank (long long int > long int > int > ..., etc.) and
 (b) can represent all the values in the signed type (which implies (a),
     as far as I can tell).
This means that when comparing uint32 and int64, they are converted to int64, but when comparing uint64 and int64, they are converted to uint64.

I can't find anything in the C++ standard that defines this, or for that matter, anything that makes it acceptable for the compiler to not give an error when a signed type and unsigned type of the same size are compared (the error required when there isn't a unique "Best Viable Function").
Comment on attachment 229611 [details] [diff] [review]
Fix.

sr=dbaron, although I think it would probably be good to drop the cast on the next line as well, and perhaps even add a comment referring to the need to cast for 64-bit platforms.

Also (a larger change, probably not for this bug), it would be nice if these constants carried their units, especially since DOM_MIN_TIMEOUT_VALUE and DOM_MAX_TIMEOUT_VALUE have different units (MSEC, INTERVAL).
Attachment #229611 - Flags: superreview?(dbaron) → superreview+
Attachment #229611 - Flags: review?(mrbkap) → review+
This is what I checked in.
Fix checked in on the trunk. Thank you Sylvain for finding what caused this!
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Attachment #229662 - Flags: approval1.8.1?
Target Milestone: --- → mozilla1.8.1beta2
Attachment #229015 - Attachment is obsolete: true
great! I had a crash while writing this comment :-/. I will try to remember what I was writing.

If you try the updated testcase, you will see that the lateness value (second line) sometimes takes quite wrong values (large negative ones) on x86_64. I tried changing this line in nsGlobalWindow.cpp:

-      handler->SetLateness((PRIntervalTime)(lateness / PR_USEC_PER_MSEC));
+      handler->SetLateness((PRIntervalTime)(lateness / (PRTime)PR_USEC_PER_MSEC));

And then it is ok.

To understand what's going on, I tried running that:

#include <stdio.h>
#ifdef __x86_64__
typedef long int int64;
#else
typedef long long int int64;
#endif

int main() {
  int64 l = -100;
  printf("%i\n", (int64)(l / 10UL));
}

Which shows me on x86_64 (gcc 4.0.3):

-1717986929

and on 32bit:

-10

Is it about the same rule as comment 6?

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 229662 [details] [diff] [review]
Same as above with dbaron's suggestions.

a=dbaron on behalf of drivers.  Please check in to MOZILLA_1_8_BRANCH and mark
fixed1.8.1 once you have.
Attachment #229662 - Flags: approval1.8.1? → approval1.8.1+
I think I forgot to add some explanations to my comment 11 above.

First of all, thanks for the patch. That solves this specific issue.

However, there are still some problems with the handling of the PR_USEC_PER_MSEC constant in some situations. The other situation I mention is when you write a callback function used by setInterval, and use the "secret" argument in the callback which returns the lateness in milliseconds. In this case, there are some issues with this lateness value (see updated testcase). 

The problem may not be serious, as this argument is secret. However, the main issue is that there may still be some bad behavior on 64bit platforms with the handling of this constant. The lateness is just one example I could see visually.

I could have opened a new bug about this, but this one already contains valuable information about this issue.

Maybe the title should be changed to something like "Issues with signedness of PR_USEC_PER_MSEC in nsGlobalWindow.cpp on x86_64 affecting window.SetInterval and potentially others".
(In reply to comment #6)
> I can't find anything in the C++ standard that defines this, or for that
> matter, anything that makes it acceptable for the compiler to not give an error
> when a signed type and unsigned type of the same size are compared (the error
> required when there isn't a unique "Best Viable Function").

I found it:  it's in [expr.rel] paragraph 2, which says "The usual arithmetic conversions are performed on operands of arithmetic or enumeration type."  Those are defined in [expr], paragraph 9.
(In reply to comment #11)
> -      handler->SetLateness((PRIntervalTime)(lateness / PR_USEC_PER_MSEC));
> +      handler->SetLateness((PRIntervalTime)(lateness /
> (PRTime)PR_USEC_PER_MSEC));

Yes, we should do this too.  This has the same problem -- on 64bit platforms it will be an unsigned division rather than a signed division.
This does what Sylvain suggested and adds a comment about this too. I looked at  all the places where we divide PRTime with an unsigned long and there's two other places but in those cases the we don't necessarily need it since we know the values are always positive (at least in the one case), but to prevent cut-n-paste creep of incorrect code I made the same change in those cases as well.
I just checked this in on the trunk, and I just realized that I talked about operands of different sizes in my checkin comment when I meant *same* sizes. Either way, this is FIXED.
Status: REOPENED → RESOLVED
Closed: 18 years ago18 years ago
Resolution: --- → FIXED
Comment on attachment 229963 [details] [diff] [review]
Change division of PRTime and unsigned long to work right as well.

This one goes hand-in-hand with the previous patch (already approved) in this bug.
Attachment #229963 - Flags: approval1.8.1?
Thanks.

>+      // Note: We must cast the rhs expression to PRTime to work
>+      // around what looks like a compiler bug on x86_64.

Maybe that should be adapted because of comment 14 ?
Comment on attachment 229963 [details] [diff] [review]
Change division of PRTime and unsigned long to work right as well.

a=dbaron on behalf of drivers.  Please check in to MOZILLA_1_8_BRANCH and mark fixed1.8.1 once you have done so.
Attachment #229963 - Flags: approval1.8.1? → approval1.8.1+
Fixed on the 1.8.1 branch.
Keywords: fixed1.8.1
And once again, thank you Sylvain for your help here!
This for some reason didn't actually land on the 1.8 branch when I added the fixed-1.8.1 keyword, but I just checked it in and it's in now, for sure. 
*** Bug 352607 has been marked as a duplicate of this bug. ***
I'm seeing this problem today, I don't know if it's some regression or what.

Running the following in my console I only see it count to 30 and then stop.

var x = 1;
setInterval(function () {
    console.log(x+=1);
}, 0);


Name: Firefox
Version: 23.0.1
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0
Could you please file a new bug instead of commenting a random old bug?
No.

I did not comment on a "random" bug. I commented on the exact bug that I am seeing. Just because it was a bug before doesn't mean that it is not a bug now. The information and patches attached to the exact bug I am seeing may be useful to fixing it, again.

I don't care how old the bug is, regression is when code revisits an old bug so my report will revisit the old bug report. Common courtesy is to search a bug database for the bug you've found before creating new bug reports.

Would you have marked my new bug as a duplicate of this one if I had filed a new one, well you could. You could also then say "hurr it's been fixed years ago".

I saw a bug. I looked in the database to see if it has been identified, it has, and I posted a new sighting of it. You're not making sense asking me to file a separate report.
Kastor, unless this bug is reopened, commenting here does no good.  Keep in mind the original bug report is against Linux, not Windows as your user-agent indicates.

:emk is correct in asking you file your report as a new bug, particularly since the last traffic on this bug was over 5 years ago.  There have been so many changes in this timespan (the original bug was against Mozilla 1.8 code, which is Firefox 2!) that the odds of a correlation are pretty low.

It's just how the Mozilla community does things - a "local custom", if you will.  Please do file as a new bug, particularly if you can generate your testcase as a HTML page.  (That'd make it easier for us, by the way, to add your testcase to our test suites, to prevent it from regressing again!)
ok ok, I will abide by the "local custom" and file a new report. 

For the testcase I'll follow the document at https://developer.mozilla.org/en-US/docs/Mozilla/QA/Reducing_testcases and just write up a regular html page. Let me know if there's some special format for test cases.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: