Closed Bug 971799 Opened 10 years ago Closed 8 years ago

Log # network bytes transmitted/received to Graphite

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Platform: x86 Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: jhopkins, Unassigned)

Attachments

(1 file, 1 obsolete file)

We currently only log packets, not actual bytes transmitted/received.
Attachment #8374927 - Flags: review?(jwatkins)
Comment on attachment 8374927 [details] [diff] [review]
[puppet] configure collectd::plugins::ethstat

This looks fine.  My only concern is that it will add about 50 more metrics per node.  I would like to determine how many new metrics this adds up to in total and double-check with :ericz on how it might impact the graphite cluster.  When new metrics are added, carbon rate-limits the creation of whisper db files, so it might also take a day or two for these new metrics to appear.

I don't remember exactly how many releng CentOS and Ubuntu instances there are in total.

:jhopkins, would you know?

r+ if it won't bring down graphite :-)
Attachment #8374927 - Flags: review?(jwatkins) → review+
Flags: needinfo?(eziegenhorn)
What I could do instead is "Map" the metrics I am interested in and limit reporting to just those metrics.  For example:

  Map "rx_bytes" "if_rx_bytes"
  Map "tx_bytes" "if_tx_bytes"
  Map "rx_errors" "if_rx_errors"
  Map "tx_errors" "if_tx_errors"
  MappedOnly true
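
As a rough illustration of what the Map / MappedOnly settings above do, here is a minimal Python sketch. It is not collectd's implementation: `filter_mapped` is a hypothetical helper, the sample counter names and values are invented, and passing unmapped counters through under their raw names when MappedOnly is false is a simplification.

```python
# Hypothetical illustration of the ethstat Map / MappedOnly idea.
# Not collectd code; counter names and values below are invented.

MAPPING = {
    "rx_bytes": "if_rx_bytes",
    "tx_bytes": "if_tx_bytes",
    "rx_errors": "if_rx_errors",
    "tx_errors": "if_tx_errors",
}


def filter_mapped(raw_stats, mapping, mapped_only=True):
    """Return the metrics that would be reported for one interface."""
    reported = {}
    for name, value in raw_stats.items():
        if name in mapping:
            # Mapped counters are reported under their mapped name.
            reported[mapping[name]] = value
        elif not mapped_only:
            # Simplification: unmapped counters keep their raw name.
            reported[name] = value
    return reported


sample = {
    "rx_bytes": 1200,
    "tx_bytes": 800,
    "rx_errors": 0,
    "tx_errors": 0,
    "rx_csum_offload_good": 42,  # invented unmapped counter
}
print(filter_mapped(sample, MAPPING))
```

With `mapped_only=True` only the four mapped counters survive, which is what keeps the per-host metric count small.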

Still would be good to know what impact logging all the metrics has.

The number of machines checking in varies.  There are ~1,050 AWS instances running at the moment, plus whatever physical CentOS and Ubuntu machines we have in Inventory.
We certainly don't have room for 50 more metrics per host.  :jhopkins, are you saying you could add just 4 more?  Four would work but be tight in SCL3 (bug 971389 tracks freeing up more space there).  PHX1 is already overloaded, so I'm not sure it would really matter; that is being worked on in bug 919038.
Flags: needinfo?(eziegenhorn)
I could use just two more metrics (rx_bytes, tx_bytes), even, since rx/tx errors are already covered by the Interface module.
Works for me.
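
For a sense of scale, the options discussed above (about 50, 4, or 2 new metrics per host) can be compared with a back-of-the-envelope calculation. This is only a sketch: the ~1,050 figure covers AWS instances alone, the physical host count is not given in this bug, and `new_metrics` is a hypothetical helper. Each new metric corresponds to one whisper file on the Graphite side.

```python
# Rough estimate of new Graphite metrics for the AWS instances only.
# The ~1,050 count comes from this bug; physical hosts are not included
# because their number is not stated here.
AWS_INSTANCES = 1050


def new_metrics(hosts, metrics_per_host):
    """Total new metrics (one whisper file each) for a group of hosts."""
    return hosts * metrics_per_host


for per_host in (50, 4, 2):
    total = new_metrics(AWS_INSTANCES, per_host)
    print(f"{per_host} metrics/host -> {total} new metrics")
```

This prints 52500, 4200 and 2100 new metrics respectively, before counting any physical machines.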
Attachment #8374927 - Attachment is obsolete: true
Attachment #8375701 - Flags: review?(jwatkins)
Comment on attachment 8375701 [details] [diff] [review]
[puppet] reduce list of metrics vs. original patch

Looks good, although I'm fairly certain ethstat isn't compatible with OS X.  You might want to take out the Darwin case so it fails if added to the darwin profile.
Attachment #8375701 - Flags: review?(jwatkins) → review+
Comment on attachment 8375701 [details] [diff] [review]
[puppet] reduce list of metrics vs. original patch

Removed 'Darwin' case and landed as 
https://hg.mozilla.org/build/puppet/rev/34372e5c9d6d
Attachment #8375701 - Flags: checked-in+
Merged to production.  New ethstat/* metrics should show up in the next day or two (mentioned in comment 2) for Linux hosts.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Unfortunately, this is only working for physical hosts.  EC2 hosts report no stats, and I've reproduced the same result from the console with "ethtool -S eth0".
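
A quick way to check whether a host exposes the byte counters is to parse the "ethtool -S" output. The sketch below is an assumption-laden example, not part of the landed patch: `parse_ethtool_stats` and `has_byte_counters` are hypothetical helpers, and the sample text is invented to resemble typical "name: value" output.

```python
# Hypothetical helpers for inspecting `ethtool -S <iface>` output.
# The sample text is invented; real output varies by driver.

SAMPLE_OUTPUT = """NIC statistics:
     rx_packets: 1000
     tx_packets: 900
     rx_bytes: 123456
     tx_bytes: 654321
"""


def parse_ethtool_stats(output):
    """Parse 'name: value' lines into a {name: int} dict."""
    stats = {}
    for line in output.splitlines():
        # Split on the last colon so names containing colons still work.
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        if sep and name and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def has_byte_counters(stats):
    """True if both rx_bytes and tx_bytes are present."""
    return {"rx_bytes", "tx_bytes"} <= stats.keys()


stats = parse_ethtool_stats(SAMPLE_OUTPUT)
print(has_byte_counters(stats))
# A host that reports no statistics yields an empty dict.
print(has_byte_counters(parse_ethtool_stats("")))
```

The text could be captured with `subprocess.run(["ethtool", "-S", "eth0"], capture_output=True, text=True).stdout` on a host where ethtool is installed.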
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'm currently working on win64-related work, which is a higher priority (per taras).  Unassigning myself so I don't block someone else from working on this bug.
Assignee: jhopkins → nobody
Depends on: 975047
Status: REOPENED → RESOLVED
Closed: 10 years ago → 8 years ago
Resolution: --- → INCOMPLETE
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard