Closed Bug 528734 Opened 15 years ago Closed 13 years ago

spike in crashes [@ js_Interpret ] in nov 2009

Product/Component: Core :: JavaScript Engine
Type: defect
Hardware: x86
OS: macOS
Priority: Not set
Severity: normal
Status: RESOLVED WORKSFORME
Reporter: chofmann
Assignee: Unassigned
Keywords: crash, regression
Whiteboard: [crashkill]

Crash Data

bouncing along between 786 and 965 crashes with this signature between 10/04 and 10/18 with mild fluctuations
 
786   total crashes for js_Interpret on 20091004-crashdata.csv
...
965   total crashes for js_Interpret on 20091018-crashdata.csv

934   total crashes for js_Interpret on 20091019-crashdata.csv
929   total crashes for js_Interpret on 20091020-crashdata.csv
994   total crashes for js_Interpret on 20091021-crashdata.csv
939   total crashes for js_Interpret on 20091022-crashdata.csv
915   total crashes for js_Interpret on 20091023-crashdata.csv
868   total crashes for js_Interpret on 20091024-crashdata.csv
949   total crashes for js_Interpret on 20091025-crashdata.csv
1005   total crashes for js_Interpret on 20091026-crashdata.csv
1039   total crashes for js_Interpret on 20091027-crashdata.csv
998   total crashes for js_Interpret on 20091028-crashdata.csv
967   total crashes for js_Interpret on 20091029-crashdata.csv
913   total crashes for js_Interpret on 20091030-crashdata.csv

fx3.6b1 released

928   total crashes for js_Interpret on 20091031-crashdata.csv
1119   total crashes for js_Interpret on 20091101-crashdata.csv
1126   total crashes for js_Interpret on 20091102-crashdata.csv
1320   total crashes for js_Interpret on 20091103-crashdata.csv
1321   total crashes for js_Interpret on 20091104-crashdata.csv
1884   total crashes for js_Interpret on 20091105-crashdata.csv

fx3.5.5 released

3125   total crashes for js_Interpret on 20091106-crashdata.csv
3328   total crashes for js_Interpret on 20091107-crashdata.csv
3894   total crashes for js_Interpret on 20091108-crashdata.csv
3857   total crashes for js_Interpret on 20091109-crashdata.csv
3879   total crashes for js_Interpret on 20091110-crashdata.csv

fx3.6b2 released

4240   total crashes for js_Interpret on 20091111-crashdata.csv
4013   total crashes for js_Interpret on 20091112-crashdata.csv

still some work to do, but the data above and the preliminary analysis in https://bugzilla.mozilla.org/show_bug.cgi?id=510996#c45 make it look like the spike correlates more strongly with a regression in the 3.6 betas than with one in 3.5.5, as the 3.6 beta population grows.

one or more bugs need to block 3.6 if that is the case.

blocking on a tracking bug is not ideal, but we need to block all releases on something until this spike is diagnosed, understood, and maybe fixed.

work in bug 510996 and bug 519363 seems to provide the best hope for reducing this spike in crash volume so far.
Flags: blocking1.9.2?
Whiteboard: [crashkill]
https://wiki.mozilla.org/CrashKill/Crashr#Release_3.6b2 also shows a doubling
of the 3.6b1 population from 65k to 125k around the time we see the first
increase in volume before 3.5.5 was released.  then from 11/02 to 11/11 we had
another 88% increase in beta testers on 3.6.
The most used 3.5.x release in service to the ~40M 3.5.x user base has been floating along at 600-700 crashes per day on js_Interpret

here is a snapshot

 659 20091015-crashdata.csv js_Interpret 3.5.3
 642 20091016-crashdata.csv js_Interpret 3.5.3
 640 20091017-crashdata.csv js_Interpret 3.5.3
 704 20091018-crashdata.csv js_Interpret 3.5.3
 686 20091019-crashdata.csv js_Interpret 3.5.3
 705 20091020-crashdata.csv js_Interpret 3.5.3
 752 20091021-crashdata.csv js_Interpret 3.5.3
 670 20091022-crashdata.csv js_Interpret 3.5.3
 658 20091023-crashdata.csv js_Interpret 3.5.3
 631 20091024-crashdata.csv js_Interpret 3.5.3
 712 20091025-crashdata.csv js_Interpret 3.5.3
 706 20091026-crashdata.csv js_Interpret 3.5.3
 775 20091027-crashdata.csv js_Interpret 3.5.3
 564 20091028-crashdata.csv js_Interpret 3.5.3

3.5.4 ships

 495 20091029-crashdata.csv js_Interpret 3.5.4
 539 20091030-crashdata.csv js_Interpret 3.5.4
 540 20091031-crashdata.csv js_Interpret 3.5.4
 654 20091101-crashdata.csv js_Interpret 3.5.4
 674 20091102-crashdata.csv js_Interpret 3.5.4
 740 20091103-crashdata.csv js_Interpret 3.5.4
 677 20091104-crashdata.csv js_Interpret 3.5.4
 655 20091105-crashdata.csv js_Interpret 3.5.4
 452 20091106-crashdata.csv js_Interpret 3.5.4

  15 20091105-crashdata.csv js_Interpret 3.5.5

3.5.5 ships

 231 20091106-crashdata.csv js_Interpret 3.5.5
 528 20091107-crashdata.csv js_Interpret 3.5.5
 666 20091108-crashdata.csv js_Interpret 3.5.5
 690 20091109-crashdata.csv js_Interpret 3.5.5
 692 20091110-crashdata.csv js_Interpret 3.5.5
 713 20091111-crashdata.csv js_Interpret 3.5.5
 699 20091112-crashdata.csv js_Interpret 3.5.5

I think a regression in 3.5.x can be ruled out.
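
A quick way to check this against the snapshot above is to compare the mean daily count for 3.5.3 at steady state with the mean for 3.5.5 once its rollout had settled. This is only a sketch: the choice of windows (10/15 to 10/27 for 3.5.3, 11/08 to 11/12 for 3.5.5) is an assumption made here to leave out the days when a release was ramping up or being replaced.

```python
from statistics import mean

# Daily js_Interpret counts copied from the snapshot above.
# Window choice is an assumption: ramp-up and hand-over days are left out.
counts_3_5_3 = [659, 642, 640, 704, 686, 705, 752,
                670, 658, 631, 712, 706, 775]      # 10/15 to 10/27
counts_3_5_5 = [666, 690, 692, 713, 699]           # 11/08 to 11/12


def relative_change(before, after):
    """Fractional change of the mean daily count from `before` to `after`."""
    return (mean(after) - mean(before)) / mean(before)


if __name__ == "__main__":
    print(f"3.5.3 mean/day: {mean(counts_3_5_3):.1f}")
    print(f"3.5.5 mean/day: {mean(counts_3_5_5):.1f}")
    print(f"relative change: {relative_change(counts_3_5_3, counts_3_5_5):+.1%}")
```

A change that is small next to the day-to-day fluctuation inside either window would be consistent with ruling out a 3.5.x regression.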
here are the signature growth counts for 3.6 betas

host-5-95:crashdata chofmann$ ./sig-count-by-version.sh ^js_Interpret ^3.6b1\$ 20091*
   3 20091015-crashdata.csv js_Interpret 3.6b1
   9 20091016-crashdata.csv js_Interpret 3.6b1
   5 20091017-crashdata.csv js_Interpret 3.6b1
  11 20091018-crashdata.csv js_Interpret 3.6b1
  13 20091019-crashdata.csv js_Interpret 3.6b1
  15 20091020-crashdata.csv js_Interpret 3.6b1
  19 20091021-crashdata.csv js_Interpret 3.6b1
  23 20091022-crashdata.csv js_Interpret 3.6b1
  14 20091023-crashdata.csv js_Interpret 3.6b1
  16 20091024-crashdata.csv js_Interpret 3.6b1
  22 20091025-crashdata.csv js_Interpret 3.6b1
  24 20091026-crashdata.csv js_Interpret 3.6b1
  16 20091027-crashdata.csv js_Interpret 3.6b1
  24 20091028-crashdata.csv js_Interpret 3.6b1
  26 20091029-crashdata.csv js_Interpret 3.6b1
  17 20091030-crashdata.csv js_Interpret 3.6b1
  81 20091031-crashdata.csv js_Interpret 3.6b1
 140 20091101-crashdata.csv js_Interpret 3.6b1
 160 20091102-crashdata.csv js_Interpret 3.6b1
 211 20091103-crashdata.csv js_Interpret 3.6b1
 288 20091104-crashdata.csv js_Interpret 3.6b1
 845 20091105-crashdata.csv js_Interpret 3.6b1
2022 20091106-crashdata.csv js_Interpret 3.6b1
2224 20091107-crashdata.csv js_Interpret 3.6b1
2610 20091108-crashdata.csv js_Interpret 3.6b1
2576 20091109-crashdata.csv js_Interpret 3.6b1
2596 20091110-crashdata.csv js_Interpret 3.6b1
1913 20091111-crashdata.csv js_Interpret 3.6b1
1004 20091112-crashdata.csv js_Interpret 3.6b1
host-5-95:crashdata chofmann$ ./sig-count-by-version.sh ^js_Interpret ^3.6b2\$ 20091*
   2 20091109-crashdata.csv js_Interpret 3.6b2
  70 20091110-crashdata.csv js_Interpret 3.6b2
1137 20091111-crashdata.csv js_Interpret 3.6b2
1894 20091112-crashdata.csv js_Interpret 3.6b2
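
The script itself is not shown in this bug. As a rough sketch of the kind of counting it appears to do, the function below tallies rows whose signature and version match the two regular expressions given on the command line. The column positions (`sig_col`, `version_col`) and the tab delimiter are assumptions made for illustration, not the real layout of the crashdata.csv files.

```python
import csv
import re
from collections import Counter


def count_by_version(rows, sig_pattern, version_pattern, sig_col=0, version_col=1):
    """Count rows per version where signature and version match the patterns.

    sig_col and version_col are hypothetical column positions.
    """
    sig_re = re.compile(sig_pattern)
    ver_re = re.compile(version_pattern)
    counts = Counter()
    for row in rows:
        if len(row) <= max(sig_col, version_col):
            continue  # skip short or malformed rows
        if sig_re.search(row[sig_col]) and ver_re.search(row[version_col]):
            counts[row[version_col]] += 1
    return counts


def count_file(path, sig_pattern, version_pattern, delimiter="\t"):
    """Apply count_by_version to one daily file (the delimiter is an assumption)."""
    with open(path, newline="", encoding="utf-8", errors="replace") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        return count_by_version(reader, sig_pattern, version_pattern)
```

Anchoring the version pattern with `$`, as the commands above do, keeps a version such as 3.6b1pre out of the 3.6b1 count.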
trunk shows an interesting spike around 11/06

   3 20091004-crashdata.csv js_Interpret 3.7a1pre
   9 20091005-crashdata.csv js_Interpret 3.7a1pre
  11 20091006-crashdata.csv js_Interpret 3.7a1pre
  10 20091007-crashdata.csv js_Interpret 3.7a1pre
   9 20091008-crashdata.csv js_Interpret 3.7a1pre
   6 20091009-crashdata.csv js_Interpret 3.7a1pre
   8 20091010-crashdata.csv js_Interpret 3.7a1pre
   4 20091011-crashdata.csv js_Interpret 3.7a1pre
  10 20091012-crashdata.csv js_Interpret 3.7a1pre
  10 20091013-crashdata.csv js_Interpret 3.7a1pre
   9 20091014-crashdata.csv js_Interpret 3.7a1pre
  19 20091015-crashdata.csv js_Interpret 3.7a1pre
  10 20091016-crashdata.csv js_Interpret 3.7a1pre
  12 20091017-crashdata.csv js_Interpret 3.7a1pre
  11 20091018-crashdata.csv js_Interpret 3.7a1pre
  11 20091019-crashdata.csv js_Interpret 3.7a1pre
  19 20091020-crashdata.csv js_Interpret 3.7a1pre
  12 20091021-crashdata.csv js_Interpret 3.7a1pre
  10 20091022-crashdata.csv js_Interpret 3.7a1pre
  16 20091023-crashdata.csv js_Interpret 3.7a1pre
  12 20091024-crashdata.csv js_Interpret 3.7a1pre
  10 20091025-crashdata.csv js_Interpret 3.7a1pre
  19 20091026-crashdata.csv js_Interpret 3.7a1pre
  20 20091027-crashdata.csv js_Interpret 3.7a1pre
  16 20091028-crashdata.csv js_Interpret 3.7a1pre
  25 20091029-crashdata.csv js_Interpret 3.7a1pre
  13 20091030-crashdata.csv js_Interpret 3.7a1pre
  18 20091031-crashdata.csv js_Interpret 3.7a1pre
  14 20091101-crashdata.csv js_Interpret 3.7a1pre
   7 20091102-crashdata.csv js_Interpret 3.7a1pre
  17 20091103-crashdata.csv js_Interpret 3.7a1pre
  20 20091104-crashdata.csv js_Interpret 3.7a1pre
  23 20091105-crashdata.csv js_Interpret 3.7a1pre
  81 20091106-crashdata.csv js_Interpret 3.7a1pre
  36 20091107-crashdata.csv js_Interpret 3.7a1pre
 114 20091108-crashdata.csv js_Interpret 3.7a1pre
  92 20091109-crashdata.csv js_Interpret 3.7a1pre
 105 20091110-crashdata.csv js_Interpret 3.7a1pre
  69 20091111-crashdata.csv js_Interpret 3.7a1pre
  36 20091112-crashdata.csv js_Interpret 3.7a1pre
the build date where the largest number of crashes was seen is around 11/8.  129 crashes on builds from that day.

crashes by build
   1 js_Interpret 20091013
   6 js_Interpret 20091014
   3 js_Interpret 20091016
   4 js_Interpret 20091019
   4 js_Interpret 20091020
   9 js_Interpret 20091021
   3 js_Interpret 20091022
   5 js_Interpret 20091023
   4 js_Interpret 20091024
   7 js_Interpret 20091025
  12 js_Interpret 20091026
  10 js_Interpret 20091027
   5 js_Interpret 20091028
  12 js_Interpret 20091029
  18 js_Interpret 20091030
   9 js_Interpret 20091031
  12 js_Interpret 20091101
  13 js_Interpret 20091102
  21 js_Interpret 20091103
  19 js_Interpret 20091104
  45 js_Interpret 20091105
  56 js_Interpret 20091106
  48 js_Interpret 20091107
 129 js_Interpret 20091108
  72 js_Interpret 20091109
  49 js_Interpret 20091110
  35 js_Interpret 20091111
   3 js_Interpret 20091112
we bounced between 1 and 18 crashes on builds made during all of august and september, so that appears to be the baseline.   builds from the trunk made after 11/05 seem to be where we break from that pattern and the possible regression was introduced.  that doesn't help to explain the increased rate of crashes in 3.6b1, which was already out the door by then.

were there any changes made to branch for the 3.6b1 release that didn't go to the trunk first?

it's also possible that the update rate of trunk users is distorting these trunk build crash rates.  if the regression was checked into the trunk and 3.6b1 back in late oct. or early nov. but not many users tested those early builds, it might have stayed hidden in the crash numbers in comment 5.
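
To make the "break from baseline" test concrete, here is a minimal sketch that scans the crashes-by-build listing above for the first build date whose count exceeds a threshold. The baseline maximum of 18 comes from the text above; the doubled threshold is a hypothetical margin added here, not something anyone in this bug proposed.

```python
# Trunk js_Interpret crashes by build date, copied from the listing above.
crashes_by_build = {
    "20091030": 18, "20091031": 9, "20091101": 12, "20091102": 13,
    "20091103": 21, "20091104": 19, "20091105": 45, "20091106": 56,
    "20091107": 48, "20091108": 129, "20091109": 72, "20091110": 49,
    "20091111": 35, "20091112": 3,
}

BASELINE_MAX = 18  # upper end of the august/september range quoted above


def first_build_above(counts, threshold):
    """Return the earliest build date whose count exceeds threshold, or None."""
    for build in sorted(counts):
        if counts[build] > threshold:
            return build
    return None


if __name__ == "__main__":
    print(first_build_above(crashes_by_build, BASELINE_MAX))
    print(first_build_above(crashes_by_build, 2 * BASELINE_MAX))
```

With the threshold at the baseline maximum the first build flagged is 20091103 (21 crashes); with the doubled threshold it is 20091105 (45 crashes). Either way, the newest builds have had less time to accumulate reports, so their low counts say little.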
from comment 14

when we saw that first spike of 81 crashes on 11/6

 81 20091106-crashdata.csv js_Interpret 3.7a1pre

here is the breakdown of builds those trunk testers were using...  mostly builds from that day and the day before.

./sig-count-on-trunk.sh js_Interpret  20091106*   | more
crashes by build
   1 js_Interpret 20091019
   2 js_Interpret 20091021
   1 js_Interpret 20091024
   2 js_Interpret 20091025
   1 js_Interpret 20091026
   1 js_Interpret 20091028
   6 js_Interpret 20091029
   3 js_Interpret 20091031
   1 js_Interpret 20091101
   1 js_Interpret 20091102
   3 js_Interpret 20091103
   2 js_Interpret 20091104
  32 js_Interpret 20091105
  25 js_Interpret 20091106
Summary: spike in crashes [@ js_Interp ] in nov → spike in crashes [@ js_Interpret ] in nov
Summary: spike in crashes [@ js_Interpret ] in nov → spike in crashes [@ js_Interpret ] in nov 2009
I would definitely like to look into this. But can you tell me how you are generating this data? As we were talking about in the other bug, I find this kind of data analysis incredibly useful but I don't have a clean way of generating the data.
OK, first I'm going to try to identify the "features" in the data that need explanation. I think they are these:

1. Rapid crash growth in 3.6b1 during 10/30-11/06. Crashes were stably low before, and stably high after (if we consider the total of 3.6b1 + 3.6b2):

  17 20091030-crashdata.csv js_Interpret 3.6b1
  81 20091031-crashdata.csv js_Interpret 3.6b1
 140 20091101-crashdata.csv js_Interpret 3.6b1
 160 20091102-crashdata.csv js_Interpret 3.6b1
 211 20091103-crashdata.csv js_Interpret 3.6b1
 288 20091104-crashdata.csv js_Interpret 3.6b1
 845 20091105-crashdata.csv js_Interpret 3.6b1
2022 20091106-crashdata.csv js_Interpret 3.6b1
2224 20091107-crashdata.csv js_Interpret 3.6b1
2610 20091108-crashdata.csv js_Interpret 3.6b1

2. Rapid crash growth on trunk somewhere around 11/4-11/8.

Going by crash date, we have the increase occurring all on 11/06:

  23 20091105-crashdata.csv js_Interpret 3.7a1pre
  81 20091106-crashdata.csv js_Interpret 3.7a1pre

Going by build date, the bump occurs with the 11/05 build:

  19 js_Interpret 20091104
  45 js_Interpret 20091105

with a transient spike for the 11/08 and 11/09 builds.

  48 js_Interpret 20091107
 129 js_Interpret 20091108
  72 js_Interpret 20091109
  49 js_Interpret 20091110

Did I catch all the jumps we need to look at?
One thing I forgot to mention above is that I'm not sure I understand the data. In the reports above, when it says "3.6b1", does that mean only the 3.6b1 release, or does it also include the "3.6b1pre" nightlies? Similarly, on the "Crashr" page on the wiki, I see there is a table of values for "3.6b1", but "b1 released" is a note on the table, which seems to imply that the table is also including "3.6b1pre" nightlies. Is that right?

Regarding (1), the rapid crash growth in 3.6 beta during 10/30-11/06, it looks to me like it is probably a function of getting an increased number of users. I can't say for sure without knowing the answers to the questions above, but as comment 1 notes, there was huge growth in the number of beta users near that date range.
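
One way to test this would be to compare how fast crashes grew with how fast the tester population grew over a window where both are known. The sketch below uses the 11/02 to 11/11 window, for which comment 1 gives an 88% increase in 3.6 beta testers. It assumes that figure covers 3.6b1 and 3.6b2 together and that the share of crashes that get submitted did not change over the window; neither assumption is confirmed in this bug.

```python
def growth_factor(before, after):
    """Ratio of a later count to an earlier one."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


# Figures quoted earlier in this bug.
crashes_1102 = 160             # 3.6b1 on 11/02
crashes_1111 = 1913 + 1137     # 3.6b1 + 3.6b2 on 11/11
population_growth = 1.88       # "another 88% increase" from 11/02 to 11/11

crash_growth = growth_factor(crashes_1102, crashes_1111)

# Growth in crashes per tester, under the assumptions stated above.
per_user_growth = crash_growth / population_growth

if __name__ == "__main__":
    print(f"crash growth:    x{crash_growth:.2f}")
    print(f"per-user growth: x{per_user_growth:.2f}")
```

A per-user factor near 1 would mean population growth alone accounts for the rise; a factor well above 1 would suggest something else is also contributing.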
re: when it says "3.6b1", does that mean...

that means when releng has flipped the product name in the nightly or release build scripts.

"3.6b1" might mean a few days' worth of builds when respin changes are happening.  if you want to get really precise you could use the combination of release number and build_id_date or "build", which is in field #9

so the sequence might look like

releng sets version = 3.6b1pre
many days of builds are produced as development grinds on

code freeze happens
code freeze really happens ;-)

releng sets  version = 3.6b1

builds made that day or the next  -- 10's or 100's download
bug found 
respin for new build  10's or 100's download
rinse and repeat for a few days..

3.6b1 ships...   volume of testers ramps up fast and continues to +100k
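
As a sketch of the "release number plus build" filter suggested above, the function below keeps only rows for one version whose build ID falls on chosen dates. `build_col=8` is field #9 counted from 1; the position of the version field is not given in this bug, so `version_col` must be supplied by the caller. The sketch also assumes the build ID starts with a YYYYMMDD date.

```python
def select_builds(rows, version, build_dates, version_col, build_col=8):
    """Keep rows for one version whose build ID starts with one of build_dates.

    build_col=8 is field #9 counted from 1, as mentioned above.
    version_col has no default because its position is not given here.
    Assumes the build ID begins with a YYYYMMDD date.
    """
    wanted = tuple(build_dates)
    selected = []
    for row in rows:
        if len(row) <= max(version_col, build_col):
            continue  # skip short or malformed rows
        if row[version_col] == version and row[build_col].startswith(wanted):
            selected.append(row)
    return selected
```

Filtering this way would separate the final shipped build from the earlier respins that carry the same version string.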
This looks to now be a meta bug, which can't block. We've blocked on the two dependencies, and AIUI may have ended up knocking down a bunch of the crashes because of it. We should definitely keep tracking it, though.
Flags: wanted1.9.2+
Flags: blocking1.9.2? → blocking1.9.2-
spun off bug 537039, which might be part of the overall 3.6 rise on js_Interpret since it seems to appear only on that branch.
Depends on: 537039
er, bug 537039 == js_Interpret:916 is the top crash of these on 3.6b* and the most interesting one to look at for future stable releases.

there are other more frequent crashes on 3.5.x 

313 js_Interpret:4436 (3.5.x)

  checking --- 20091227-crashdata.csv js_Interpret:4436
  release total-crashes
                js_Interpret:4436 crashes
                         pct.
  all     207108  314     0.00151612
  3.0.15  2940            0
  3.0.16  32075           0
  3.5.5   8122    10      0.00123122
  3.5.6   110490  291     0.00263372
  3.6b5   20527           0
  3.6b4   2074            0
  3.6b3   628             0
  3.6b2   634             0
  3.6b1   1937            0

76 js_Interpret:4192 (3.0.x)

  checking --- 20091227-crashdata.csv js_Interpret:4192
  release total-crashes
                js_Interpret:4192 crashes
                           pct.
  all     207108  76      0.000366958
  3.0.15  2940    2       0.000680272
  3.0.16  32075   72      0.00224474
  3.5.5   8122            0
  3.5.6   110490          0
  3.6b5   20527           0 
  3.6b4   2074            0
  3.6b3   628             0
  3.6b2   634             0
  3.6b1   1937            0

73 js_Interpret:4432  (3.5.x)

   checking --- 20091227-crashdata.csv js_Interpret:4432
   release total-crashes
                 js_Interpret:4432 crashes
                            pct.
   all     207108  73      0.000352473
   3.0.15  2940            0
   3.0.16  32075           0
   3.5.5   8122    2       0.000246245
   3.5.6   110490  70      0.000633541
   3.6b5   20527           0
   3.6b4   2074            0
   3.6b3   628             0
   3.6b2   634             0
   3.6b1   1937            0

65 js_Interpret:1501  (maybe this was fixed in beta3?)

   checking --- 20091227-crashdata.csv js_Interpret:1501
   release total-crashes
                 js_Interpret:1501 crashes
                            pct.
   all     207108  65      0.000313846
   3.0.15  2940            0
   3.0.16  32075           0
   3.5.5   8122            0
   3.5.6   110490          0
   3.6b5   20527           0
   3.6b4   2074            0
   3.6b3   628             0
   3.6b2   634     12      0.0189274
   3.6b1   1937    53      0.0273619

61 js_Interpret:916 (3.6X)

   checking --- 20091227-crashdata.csv js_Interpret:916
   release total-crashes
                 js_Interpret:916 crashes
                            pct.
   all     207108  61      0.000294532
   3.0.15  2940            0
   3.0.16  32075           0
   3.5.5   8122            0
   3.5.6   110490          0
   3.6b5   20527   42      0.00204609
   3.6b4   2074    11      0.00530376
   3.6b3   628             0
   3.6b2   634     3       0.00473186
   3.6b1   1937    5       0.00258131

22 js_Interpret:904
   checking --- 20091227-crashdata.csv js_Interpret:904
   release total-crashes
                 js_Interpret:904 crashes
                            pct.
   all     207108  22      0.000106225
   3.0.15  2940            0
   3.0.16  32075           0
   3.5.5   8122            0
   3.5.6   110490          0
   3.6b5   20527   16      0.000779461
   3.6b4   2074    1       0.00048216
   3.6b3   628             0
   3.6b2   634     4       0.00630915
   3.6b1   1937    1       0.000516262

20 js_Interpret:4423

   checking --- 20091227-crashdata.csv js_Interpret:4423
   release total-crashes
                 js_Interpret:4423 crashes
                            pct.
   all     207108  20      9.6568e-05
   3.0.15  2940            0
   3.0.16  32075           0
   3.5.5   8122            0
   3.5.6   110490  17      0.00015386  also a few reports on 3.5 and 3.5.3
   3.6b5   20527           0 
   3.6b4   2074            0
   3.6b3   628             0
   3.6b2   634             0
   3.6b1   1937            0


others with lower volume.

20 js_Interpret:3883
17 @0x0 | js_Interpret:2240
15 js_Interpret:859
13 js_Interpret:4213
12 js_Interpret:6184
11 js_Interpret:4514
10 js_Interpret:1528
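
For reference, the "pct." column in the tables above appears to be the signature's crash count divided by the release's total crashes, expressed as a fraction rather than a percentage (0.00151612 is about 0.15%). A minimal sketch of that calculation, using a few rows from the js_Interpret:4436 table and treating blank cells as 0:

```python
# (total crashes, js_Interpret:4436 crashes) per release on 20091227,
# copied from the first table above. Blank cells are taken as 0.
table_4436 = {
    "all": (207108, 314),
    "3.5.5": (8122, 10),
    "3.5.6": (110490, 291),
    "3.6b5": (20527, 0),
}


def crash_share(total, matching):
    """Fraction of a release's crashes that carry the given signature."""
    return matching / total if total else 0.0


shares = {rel: crash_share(total, hits) for rel, (total, hits) in table_4436.items()}

if __name__ == "__main__":
    for rel, share in shares.items():
        print(f"{rel:8s} {share:.6g}")
```

Dividing by each release's own total is what makes the releases comparable despite their very different crash volumes.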
Crash Signature: [@ js_Interpret ]
Resolving as works for me. Let's assume the spike is over. We have other bugs logged for variations of this signature.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME