Closed Bug 1156498 Opened 9 years ago Closed 9 years ago

Test effect of switching to TCP communication with rsyslog

Categories

(Content Services Graveyard :: Tiles: Ops, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mostlygeek, Assigned: mostlygeek)

Details

Currently onyx (python) sends log messages to rsyslog via UDP. This has shown to lose some messages when size and message rate get too high. The TCP transport doesn't have these limitations but has higher overhead. 

This bug is meant to determine the effects of the overhead and what it means to the service running in production. 

Testing procedure: 

- create a testing onyx box in stage, a single c3.large
- create a load sending box (siege) that can talk to onyx directly (no ELB)
- measure with seige the req/sec UDP mode allows 
- measure message delivery percentage
- switch onyx to TCP mode
- run siege again, measuring req/sec and message delivery percentage
Assignee: nobody → bwong
Siege testing notes:

SIEGE URL FILE: (max.txt) 
===
# 10x small pings
http://ec2-52-6-108-164.compute-1.amazonaws.com/v2/links/click POST {"locale":"en-US", "d":"0k_ping"}
.... repeated 9 more times

# 5x 2K pings
http://ec2-52-6-108-164.compute-1.amazonaws.com/v2/links/view POST < 2k_ping.txt
... repeat 4 more times

# 3x 3K pings
http://ec2-52-6-108-164.compute-1.amazonaws.com/v2/links/view POST < 3k_ping.txt
...

# one big 4K ping
http://ec2-52-6-108-164.compute-1.amazonaws.com/v2/links/view POST < 4k_ping.txt

====

The POST files, 2k_ping.txt, 3k_ping.txt and 4k_ping.txt are the same with diff amounts of filler:

{"locale":"en-US", "d":"3k_ping 01234567890 ...filler data... "}

APPROXIMATE SIZES: 

[ec2-user@ip-172-31-27-64 ~]$ ls -l *_ping.txt
-rw-rw-r-- 1 ec2-user ec2-user 2036 Apr 20 22:39 2k_ping.txt
-rw-rw-r-- 1 ec2-user ec2-user 3091 Apr 20 22:39 3k_ping.txt
-rw-rw-r-- 1 ec2-user ec2-user 4037 Apr 20 22:40 4k_ping.txt
UDP TEST #1
===========

[ec2-user@ip-172-31-27-64 ~]$ siege -c 10 -i -q -b -t 30s -f max.txt

Lifting the server siege...      done.

Transactions:                  29895 hits
Availability:                 100.00 %
Elapsed time:                  29.37 secs
Data transferred:               0.00 MB
Response time:                  0.01 secs
Transaction rate:            1017.88 trans/sec
Throughput:                     0.00 MB/sec
Concurrency:                    9.99
Successful transactions:       29895
Failed transactions:               0
Longest transaction:            1.00
Shortest transaction:           0.00

= ONYX SIDE = (for some reason counts are always 10+ more than seige counts)
29905 impression.log

grep -o '.k_ping' impression.log  | sort | uniq -c | sort -rn
  14849 0k_ping
   7863 2k_ping
   4750 3k_ping
   2443 4k_ping

= CONCLUSION =

100% of the pings made it through at the maximum speed of a c3.large servers.
UDP TEST #2
===========
siege -c 10 -i -q -b -t 30s -f max.txt

Lifting the server siege...      done.

Transactions:                  30228 hits
Availability:                 100.00 %
Elapsed time:                  29.70 secs
Data transferred:               0.00 MB
Response time:                  0.01 secs
Transaction rate:            1017.78 trans/sec
Throughput:                     0.00 MB/sec
Concurrency:                    9.99
Successful transactions:       30228
Failed transactions:               0
Longest transaction:            1.02
Shortest transaction:           0.00

= ONYX SIDE =
30238 impression.log

grep -o '.k_ping' impression.log  | sort | uniq -c | sort -rn
  14972 0k_ping
   8050 2k_ping
   4809 3k_ping
   2407 4k_ping

= CONCLUSION =

100% pings made it through
After much investigation I'm nixing using TCP since it will require rewriting onxy's logging code to handle the nuances and differences in syslog transport protocols. 

Using UDP, each datagram is a single message. In TCP mode messages come as a stream with each message framed by a LF (line feed). [1]

While testing a hacked for tcp version of onyx, the logging code does not handle TCP logging well. TCP logging requires the onyx python process to maintain a socket connection between HTTP requests. UDP being connection and stateless is much easier to work with. 

Finally, the tests above show that even with UDP we don't lose messages even when we hit the maximum requests a c3.large server can handle. 

[1] http://www.rsyslog.com/doc/v7-stable/concepts/messageparser.html
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Hm, so are we able to make rsyslog lose data over UDP at all?  can we try the test again with 16, 64, 256K messages to see if we can break it?
You need to log in before you can comment on or make changes to this bug.