GTests keep reading after read error

RESOLVED FIXED

Status

NSS
Test
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: ekr, Unassigned)

Tracking

trunk

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
You're not supposed to do this.
(Reporter)

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
What about DTLS?

See https://public-artifacts.taskcluster.net/HfsKhOR2R2mVLDxOIAN_2w/1/public/logs/live_backing.log

[ RUN      ] DatagramOnly/TlsConnectDatagram.ShortRead/0
Version: DTLS 1.3
rsa: Changing state from INIT to CONNECTING
client: Changing state from INIT to CONNECTING
client: Handshake
client: Would have blocked
rsa: Handshake
rsa: Would have blocked
Poll() waiters = 2 timers = 17
client: Readable
client: Handshake
client: Handshake success
client: Changing state from CONNECTING to CONNECTED
client: Handshake success
client: Readable
client: ReadBytes -1
client: Read error -5998: The operation would have blocked
client: Re-arming
Poll() waiters = 2 timers = 15
rsa: Readable
rsa: Handshake
rsa: Handshake success
rsa: Changing state from CONNECTING to CONNECTED
rsa: Handshake success
Connected with version 772 cipher suite TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
rsa: Writing 1200 bytes
Poll() waiters = 2 timers = 16
client: Readable
client: ReadBytes -1
client: Read error -12155: Not enough room in buffer for DTLS record.
client: Writing 50 bytes
rsa: Writing 50 bytes
Poll() waiters = 2 timers = 17
rsa: Readable
rsa: ReadBytes 50
rsa: Re-arming
Poll() waiters = 2 timers = 17
rsa: Readable
rsa: ReadBytes -1
rsa: Read error -5998: The operation would have blocked
rsa: Re-arming
Poll() waiters = 2 timers = 16
Poll() waiters = 2 timers = 15
Poll() waiters = 2 timers = 14
Poll() waiters = 2 timers = 13
Poll() waiters = 2 timers = 12
Poll() waiters = 2 timers = 11
tls_connect.cc:378: Failure
Value of: client_->received_bytes()
  Actual: 0
Expected: amount
Which is: 50
[  FAILED  ] DatagramOnly/TlsConnectDatagram.ShortRead/0, where GetParam() = 772 (2179 ms)
Flags: needinfo?(ekr)
Maybe that test is invalid, but I think that SSL_ERROR_RX_SHORT_DTLS_READ is recoverable.
(Reporter)

Comment 4

2 years ago
Yeah, that seems like a bug in the tests, because we should be recovering in that case.
Flags: needinfo?(ekr)
Created attachment 8759516 [details] [diff] [review]
bug1276899-1.patch

Is this the sort of thing you are looking for?
Attachment #8759516 - Flags: review?(ekr)
(Reporter)

Comment 6

2 years ago
Comment on attachment 8759516 [details] [diff] [review]
bug1276899-1.patch

Review of attachment 8759516 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. But, why are the tests passing in automation without this?
Attachment #8759516 - Flags: review?(ekr) → review+
It's manifesting as an intermittent.
(Reporter)

Comment 8

2 years ago
That's concerning. Any theory on why?
Probably not answering your question, but adding some more information: TlsConnectDatagram.ShortRead tests currently fail almost permanently when run by ASan builds on Taskcluster. This seems be due to a timing issue because ASan debug builds and the docker host are both somewhat slow. I can reproduce it locally by running ssl_gtests (built with USE_ASAN=1) in the Taskcluster docker image limited to 50% of a single core. With mt's patch above the failures are gone.
(Reporter)

Comment 10

2 years ago
Yeah, my question is "Why does it ever work"
Well, your change only stops the polling from happening.  It doesn't block future attempts to call ReadBytes().  Reads also happen as a result of the handshake timer.  Maybe in those builds where it succeeds it is reading as a consequence of the handshake timer.
You need to log in before you can comment on or make changes to this bug.