Closed Bug 630798 Opened 13 years ago Closed 13 years ago

PHX throttle setting incorrect for Camino, Seamonkey, Thunderbird

Categories

(Socorro :: General, task)

x86
macOS
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: alqahira, Assigned: jabba)

Attachments

(1 file, 2 obsolete files)

Our 14-day topcrash report for 2.0.6 (see URL field) usually reports it covers ~72% of all 2.5-3.5K or so crashes.

Last Tuesday night (when I do my pre-meeting report), we only had 1967 crashes.  I explained that away with the recent downtime, the move to PHX and Hbase/collector stuff (bug 626944 and bug 626953).

However, this week we're down another ~60%, to only 819 crashes.

I'd love to believe a release version has, in two weeks, suddenly become 75% less crashy, but I don't think that's possible.  Our 2.0.6 user numbers remain very similar week-in and week-out (they've actually increased slightly over the course of January), so I really have to wonder if something about the move to PHX or to Socorro 1.7.6 is causing crashes to go missing.
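The arithmetic behind these figures can be checked with a short sketch. The 3000 baseline is an assumed midpoint of the usual 2.5-3.5K range, chosen only for illustration; the other two counts are the ones quoted in this report.

```python
def percent_drop(before, after):
    """Return the percentage decline from `before` to `after`."""
    return (before - after) / before * 100.0


# Crash counts quoted in this report.
last_week = 1967
this_week = 819
# Assumption: midpoint of the usual 2.5-3.5K range, for illustration only.
typical = 3000

week_over_week = percent_drop(last_week, this_week)
versus_typical = percent_drop(typical, this_week)
print(f"week over week: {week_over_week:.1f}% drop")
print(f"versus typical: {versus_typical:.1f}% drop")
```

Both results land close to the rounded figures used in the report (about 60% week over week, and roughly 70-75% against a typical two-week total).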

Can someone please investigate?  Any chance we accidentally got throttled by the processor config during the move (Camino's supposed to be processed at 100% of all submitted crashes)?
(Is this only applicable to Camino? If Firefox crash volume has artificially dropped as well, that could cause issues with upcoming releases.)
Firefox volume looks normal.  Let's start by checking the throttle config.
Ok, looks like Camino is being throttled.  Turning over to jabba to update configs.
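As a rough illustration of what percentage throttling means here, the sketch below keeps a fixed share of submitted crashes per product. It is not the actual Socorro collector config: the rule table, the Firefox percentage, the default of 0, and the function names are all assumptions made up for this example.

```python
import random

# Hypothetical rule table: (product name, percent of crashes to process).
# These names and numbers are illustrative assumptions, not the real config.
THROTTLE_RULES = [
    ("Firefox", 10),
    ("Camino", 100),
    ("Seamonkey", 100),
    ("Thunderbird", 100),
]
DEFAULT_PERCENT = 0  # assumed fallback for products with no rule


def throttle_percent(product, rules=THROTTLE_RULES, default=DEFAULT_PERCENT):
    """Return the processing percentage for the first rule matching `product`."""
    for name, percent in rules:
        if name == product:
            return percent
    return default


def should_process(product, rng=random.random):
    """Decide whether a submitted crash is kept for processing."""
    # rng() is uniform in [0, 1), so 100 always passes and 0 never does.
    return rng() * 100.0 < throttle_percent(product)
```

Under a scheme like this, a product whose rule is missing or set too low after a config migration would show a sudden drop in processed crashes with no change in the number of crashes submitted, which matches the symptom described above.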
Assignee: nobody → jdow
(In reply to comment #3)
> Ok, looks like Camino is being throttled.  Turning over to jabba to update
> configs.

Any reason not to set these to 100% in the default configs?

I'll attach a patch; if that looks good, we can build a socorro package with this change and push that out.

I'd like to make the object less complicated so we can override it using environment variables, so the /etc/socorro/ override system would be usable.
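One possible shape for a simpler, environment-overridable setting is sketched below. This is only a sketch under assumptions: the flat product-to-percent mapping, the `SOCORRO_THROTTLE` variable name, and the `Name:percent` syntax are all hypothetical and do not describe the existing config object or the /etc/socorro/ override system.

```python
import os

# Hypothetical flat defaults: product name -> percent of crashes to process.
DEFAULT_THROTTLE = {"Camino": 100, "Seamonkey": 100, "Thunderbird": 100}


def parse_throttle(spec):
    """Parse 'Name:percent,Name:percent' into a dict of ints."""
    result = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, percent = item.partition(":")
        value = int(percent)  # raises ValueError on a missing or bad number
        if not 0 <= value <= 100:
            raise ValueError(f"percent out of range for {name!r}: {value}")
        result[name.strip()] = value
    return result


def load_throttle(environ=os.environ, var="SOCORRO_THROTTLE"):
    """Return defaults, overridden by the (hypothetical) environment variable."""
    merged = dict(DEFAULT_THROTTLE)
    merged.update(parse_throttle(environ.get(var, "")))
    return merged
```

A plain mapping like this is easy to express as a single string, which is what makes an environment override practical; a structure holding callables would not survive that round trip.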
Attached patch config from old SJC prod (obsolete) — Splinter Review
From SJC collector02.

Do we want all the differences here?
Assignee: jdow → rhelmer
Status: NEW → ASSIGNED
Attachment #509154 - Flags: review?(lars)
Attachment #509154 - Flags: feedback?(laura)
Comment on attachment 509154
config from old SJC prod

Yes, we want the same config as in SJC.
Attachment #509154 - Flags: feedback?(laura) → feedback+
Comment on attachment 509154
config from old SJC prod

This patch contains more than just a config change. It also appears to contain stuff checked into trunk for 1.7.7, and we don't want that pushed to production.
Attachment #509154 - Flags: review?(lars) → review-
Ouch, that will teach me to scroll down.

r+ to the diff on collectorconfig.py.dist, only.
Attached patch config from old SJC prod (obsolete) — Splinter Review
Sorry, was lazy with my last diff. Here's just the intended changes.
Attachment #509154 - Attachment is obsolete: true
Attachment #509181 - Flags: review?(lars)
Thunderbird looks to be down as well.  What I see in https://crash-stats.mozilla.com/topcrasher/byversion/Thunderbird/3.1.7/ (261 for #1 crasher) strikes me as being roughly 10-20% of normal.  https://crash-stats.mozilla.com/topcrasher/byversion/Thunderbird/3.1.7/14 has 1675 for the same #1 crasher
One more try.. remove hard linebreaks inserted from copy/paste from SJC.
Attachment #509181 - Attachment is obsolete: true
Attachment #509207 - Flags: review?(lars)
Attachment #509181 - Flags: review?(lars)
(In reply to comment #10)
> Thunderbird looks to be down as well.  What I see in
> https://crash-stats.mozilla.com/topcrasher/byversion/Thunderbird/3.1.7/ (261
> for #1 crasher) strikes me as being roughly 10-20% of normal. 
> https://crash-stats.mozilla.com/topcrasher/byversion/Thunderbird/3.1.7/14 has
> 1675 for the same #1 crasher

The attached patch should take care of Thunderbird and Seamonkey as well as Camino.
Target Milestone: --- → 1.7.7
Comment on attachment 509207
config from old SJC prod

add the email thing here, too, if it is appropriate...
Attachment #509207 - Flags: review?(lars) → review+
Summary: Camino 2.0.6 crash volume has declined precipitously since PHX deployment → PHX throttle setting incorrect for Camino, Seamonkey, Thunderbird
(In reply to comment #13)
> Comment on attachment 509207
> config from old SJC prod
> 
> add the email thing here, too, if it is appropriate...

Went ahead and checked this in, for starters:
Committed r2906

Looking into the SJC revision control history to see if we used to bypass throttling based on email before.
Made a new build based on our existing one:
https://hudson.mozilla.org/job/socorro/237/artifact/trunk/socorro-attachment509207.tar.gz

You can see both plus checksums here (md5sum for new one is 8fa413bc956dea66cbf2e48e5562f2f2)
https://hudson.mozilla.org/job/socorro/237/
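For anyone who wants to check a downloaded tarball against the published md5sum without the `md5sum` tool, a minimal sketch follows. The function names are made up for this example; only `hashlib.md5` is a standard library call.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's MD5 against an expected hex string, ignoring case."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_checksum(tarball_path, "8fa413bc956dea66cbf2e48e5562f2f2")`, with `tarball_path` pointing at the downloaded new build, should return True if the download is intact.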

tar zxf socorro.tar.gz
# apply patch, copy collectorconfig.py.dist to collectorconfig.py
# tar it up the way hudson does:
tar --mode 755 --owner 0 --group 0 -zcf socorro-attachment509207.tar.gz socorro

I put it up alongside the existing build:

jabba, the install script will make a backup directory, so if you want to make extra sure that there are no other changes (there should not be) you could diff after install like:

$ diff -r /data/socorro.21-01-2011_15_01_49/ /data/socorro
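If a summary of which paths changed is more convenient than full `diff -r` output, something like the sketch below could be used. The function name and labels are hypothetical, and the labels assume the backup directory is passed first. Note that `filecmp.dircmp` is not a byte-for-byte equivalent of `diff -r`: by default it treats files with identical type, size, and modification time as equal without reading them, and it skips a small built-in list of directory names such as `CVS` and `RCS`.

```python
import filecmp


def tree_differences(left, right):
    """Recursively list paths that differ between two directory trees."""
    differences = []

    def walk(cmp, prefix):
        for name in cmp.left_only:
            differences.append(("only in backup", prefix + name))
        for name in cmp.right_only:
            differences.append(("only in install", prefix + name))
        for name in cmp.diff_files:
            differences.append(("differs", prefix + name))
        for name in cmp.funny_files:
            differences.append(("could not compare", prefix + name))
        # Descend into directories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix + name + "/")

    walk(filecmp.dircmp(left, right), "")
    return sorted(differences)
```

With the backup path and `/data/socorro` as arguments, the expectation from the comment above is that only the collector config shows up in the result.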
Assignee: rhelmer → jdow
This should be resolved now
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Collectors have the config change from attachment 509207. This is also landed on trunk so should not regress again, sorry for the hassle.

Smokey/Wayne, please reopen if things don't start looking normal again tomorrow.
Thanks rhelmer and jabba for fixing this so quickly!

(In reply to comment #17)
> Smokey/Wayne, please reopen if things don't start looking normal again
> tomorrow.

I don't have a good idea of what our daily crash volume typically is, so it'll be hard for me to say if it's fixed 24 hours after deployment.  It's definitely better already, though, just based on some rough math, and I'll keep an eye on it over the next few days.
Hi Smokey, I noticed that you're close to the problem and might be able to pinpoint more quickly whether the fix was correct. If you have time, could you verify the fix?
Thx
Component: Socorro → General
Product: Webtools → Socorro