Closed Bug 654202 Opened 14 years ago Closed 11 years ago

Masters getting 421's from smtp.mozilla.org

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Unassigned)

Details

(Whiteboard: [buildmasters][puppet])

Note that dm-mail03 is the same as smtp.mozilla.org.

The TinderboxMailNotifier connects directly to mail.build.mozilla.org = smtp.mozilla.org = dm-mail03 to send its tinderbox messages. It looks like TinderboxMailNotifier does not handle 4xx (transient) errors correctly - it should store and retry the message.

If these happen frequently, we may need to add such error handling to the mail notifier, or consider pointing the mail notifier at the local postfix install on the master (which will retry on its behalf). If this was just a few errors, we can probably let them slide.

Note that the local machine *was* able to send email, as the error report here is in an email.

-------- Original Message --------
Subject: twistd.log exceptions on buildbot-master04.build.scl1.mozilla.com bm04-tests1
Date: Mon, 02 May 2011 19:00:01 -0000
From: Client Builder <cltbld@buildbot-master04.build.scl1.mozilla.com>
To: release@mozilla.com

The following exceptions (total 2) were detected on buildbot-master04.build.scl1.mozilla.com bm04-tests1:

Exception in /builds/buildbot/tests1/master/twistd.log:
2011-05-02 11:53:36-0700 [ESMTPSender,client] Unhandled Error
Traceback (most recent call last):
Failure: twisted.mail.smtp.SMTPDeliveryError: 421 No recipients accepted
tinderbox-daemon@tinderbox.mozilla.org: 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
>>> MAIL FROM:<talos.buildbot@build.mozilla.org>
<<< 250 2.1.0 Ok
>>> RCPT TO:<tinderbox-daemon@tinderbox.mozilla.org>
<<< 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
--------------------------------------------------------------------------------
Exception in /builds/buildbot/tests1/master/twistd.log:
2011-05-02 11:53:36-0700 [ESMTPSender,client] Unhandled Error
Traceback (most recent call last):
Failure: twisted.mail.smtp.SMTPDeliveryError: 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
>>> RCPT TO:<tinderbox-daemon@tinderbox.mozilla.org>
<<< 250 2.1.5 Ok
>>> DATA
<<< 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
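The store-and-retry behaviour asked for above could look roughly like the sketch below. It is an illustration only, not TinderboxMailNotifier's code: `classify_reply`, `RetryQueue`, and the `send` callable are hypothetical names. The only fact it relies on is the SMTP convention that 4xx replies (such as the 421 seen here) are transient and 5xx replies are permanent.

```python
from collections import deque


def classify_reply(code):
    """Map an SMTP reply code to 'ok', 'transient', or 'permanent'."""
    if 200 <= code < 400:
        return "ok"
    if 400 <= code < 500:
        return "transient"   # e.g. 421: worth trying again later
    return "permanent"       # 5xx (or anything unexpected): do not retry


class RetryQueue:
    """Hold messages that hit a transient error so they can be re-sent."""

    def __init__(self, send, max_attempts=5):
        self.send = send                  # hypothetical callable: message -> reply code
        self.max_attempts = max_attempts  # total attempts allowed per message
        self.pending = deque()            # (message, attempts made so far)
        self.failed = []                  # messages that were given up on

    def submit(self, message):
        """Try to send a new message, queueing it on a transient failure."""
        self._attempt(message, 0)

    def _attempt(self, message, attempts):
        outcome = classify_reply(self.send(message))
        if outcome == "ok":
            return
        if outcome == "transient" and attempts + 1 < self.max_attempts:
            self.pending.append((message, attempts + 1))
        else:
            self.failed.append(message)

    def retry_pending(self):
        """Re-send everything queued so far; intended to be called from a timer."""
        for _ in range(len(self.pending)):
            message, attempts = self.pending.popleft()
            self._attempt(message, attempts)
```

A local MTA does the same job with a queue that survives restarts, which is why pointing the notifier at the local postfix is the simpler option.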
I've advocated using the local mail transport agent before (bug 493817) but we decided it wasn't worth the effort when mail was reliable. Still think it's worth doing, as long as we monitor that we're not queuing lots of mail locally.
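One way to monitor the local queue is to parse the summary that `postqueue -p` prints. The sketch below assumes Postfix's usual output, which ends with either `Mail queue is empty` or a line such as `-- 23 Kbytes in 5 Requests.`; that format should be checked against the installed version. The function names and the threshold of 50 are hypothetical.

```python
import re

# Summary line printed at the end of `postqueue -p` when mail is queued
# (assumed format; verify against the Postfix version in use).
_SUMMARY = re.compile(r"^-- \d+ Kbytes in (\d+) Requests?\.", re.MULTILINE)


def queued_message_count(postqueue_output):
    """Return the number of queued messages reported by `postqueue -p`."""
    if "Mail queue is empty" in postqueue_output:
        return 0
    match = _SUMMARY.search(postqueue_output)
    if match is None:
        raise ValueError("unrecognised postqueue output")
    return int(match.group(1))


def queue_is_backing_up(postqueue_output, threshold=50):
    """True when more than `threshold` messages are waiting locally."""
    return queued_message_count(postqueue_output) > threshold
```

A monitoring check could feed it the result of `subprocess.check_output(["postqueue", "-p"], text=True)` and alert when `queue_is_backing_up` returns true.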
We can configure it with puppet, too, so it's easy to make sure it's working correctly. And a stub configuration for postfix is very simple.
Priority: -- → P3
Whiteboard: [buildmasters][puppet]
What happens once this timeout occurs? Does it give up, so the email is never sent? What steps are needed to make this change (set up sendmail)? I am curious whether fixing this would help mitigate bug 652812. My gut feeling says "not completely", as it seems that tbox is taking a long time to process mail (I guess it would help insofar as it retries).
Using a local MTA is the correct solution. SMTP mail is a reasonable MQ if you use all of it. If both ends don't implement the retry mechanisms, all bets are off. We may want to set the retry timeouts very short on the buildmasters, since I think the defaults are 4hrs in some cases.
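To get a feel for what "very short" retry timeouts mean in practice, the sketch below computes a capped, doubling retry schedule. It is not Postfix's scheduling algorithm, which is governed by its own configuration; `retry_delays` and the example values of 30 and 300 seconds are hypothetical.

```python
def retry_delays(first_delay, max_delay, attempts):
    """Return the wait in seconds before each retry, doubling up to max_delay."""
    delays = []
    delay = first_delay
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay *= 2
    return delays


# Example: start at 30 seconds and never wait more than 5 minutes.
schedule = retry_delays(first_delay=30, max_delay=300, attempts=6)
```

With a cap measured in minutes, a message delayed by a brief 421 goes out soon after the mail server recovers, rather than hours later.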
I got a few more yesterday:

Exception in /builds/buildbot/tests1/master/twistd.log:
2011-05-24 17:57:25-0700 [ESMTPSender,client] Unhandled Error
Traceback (most recent call last):
Failure: twisted.mail.smtp.SMTPDeliveryError: 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
<<< 250 2.1.5 Ok
>>> DATA
<<< 354 End data with <CR><LF>.<CR><LF>
<<< 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
I got a few more at 04:49 (all at the same time). In the previous comment, too, all instances happened at the same time.

Exception in /builds/buildbot/tests1/master/twistd.log:
2011-05-24 04:49:03-0700 [ESMTPSender,client] Unhandled Error
Traceback (most recent call last):
Failure: twisted.mail.smtp.EHLORequiredError: 502 Server does not support ESMTP Authentication
<<< 220 dm-mail03.mozilla.org ESMTP Postfix
>>> EHLO buildbot-master06.build.scl1.mozilla.com
<<< 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
Another 15 at the same time from buildbot-master06:

Exception in /builds/buildbot/tests1/master/twistd.log.1:
2011-05-25 13:22:22-0700 [ESMTPSender,client] Unhandled Error
Traceback (most recent call last):
Failure: twisted.mail.smtp.SMTPDeliveryError: 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
<<< 250-8BITMIME
<<< 250 DSN
>>> MAIL FROM:<talos.buildbot@build.mozilla.org>
<<< 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
A few more today:
- buildbot-master06: a few from 12:18:08 to 12:53:11
- buildbot-master5: at 12:18:00
- buildbot-master04: several from 12:57:38 to 12:58:08

All test masters so far.

Traceback (most recent call last):
Failure: twisted.mail.smtp.SMTPDeliveryError: 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
<<< 250-8BITMIME
<<< 250 DSN
>>> MAIL FROM:<talos.buildbot@build.mozilla.org>
<<< 421 4.4.2 dm-mail03.mozilla.org Error: timeout exceeded
Found in triage. RelEng has not used the tinderbox server since ~Sep 2012. Is this still an issue?
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
Product: mozilla.org → Release Engineering
(In reply to John O'Duinn [:joduinn] from comment #9)
> Found in triage.
>
> RelEng has not used the tinderbox server since ~Sep 2012. Is this still an issue?

Probably not!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: General Automation → General