Bug 772467 (Closed)
Figure out (and fix!) stale buildslave connections in AWS
Opened 13 years ago; closed 13 years ago
Categories: Release Engineering :: General, defect, P2
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: Reporter: catlee; Assigned: catlee
Details: Whiteboard: [ec2]
Attachments (1 file): attachment 640910, "give up on old slave connections" (patch, 2.13 KB); dustin: review+; catlee: checked-in+
If you reboot a build slave in AWS without first stopping buildbot, the master never learns that the old instance has disconnected, and it refuses connections from the new instance indefinitely.
Comment 1 • 13 years ago (Assignee)
I'm not at all sure why the AWS slaves hit this problem more often than our other machines, but this patch seems to work around it.
Instead of relying on callRemote() to make the old TCP session die, we add a timeout (30 s here). If we haven't heard back from the old slave before the timeout expires, we disconnect it, which lets the next slave connection succeed.
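Here is a minimal sketch of the timeout-then-disconnect idea. It uses Python's asyncio as a stand-in for the Twisted callRemote() machinery that buildbot actually uses. `OldConnection`, `ping()`, `check_old_connection()` and `accept_new_slave()` are hypothetical names, not buildbot APIs. The 30 s default comes from the comment. The rule that the master rejects the new slave only when the old one answers is an assumption of this sketch.

```python
import asyncio
from typing import Optional

# Default timeout in seconds, matching the 30 s mentioned in the comment.
PING_TIMEOUT = 30.0


class OldConnection:
    """Hypothetical handle for a slave connection the master still believes is alive."""

    def __init__(self, responsive: bool):
        self.responsive = responsive
        self.disconnected = False

    async def ping(self) -> str:
        # Stand-in for callRemote(): a rebooted instance never replies,
        # so this simulated call simply blocks forever.
        if not self.responsive:
            await asyncio.Event().wait()
        return "pong"

    def disconnect(self) -> None:
        self.disconnected = True


async def check_old_connection(old: OldConnection, timeout: float = PING_TIMEOUT) -> bool:
    """Return True if the old connection answers within `timeout` seconds.

    On timeout, drop the old connection so the slave can reconnect.
    """
    try:
        await asyncio.wait_for(old.ping(), timeout)
        return True
    except asyncio.TimeoutError:
        old.disconnect()
        return False


async def accept_new_slave(old: Optional[OldConnection], timeout: float = PING_TIMEOUT) -> bool:
    """Return True if a new connection from the same slave should be accepted."""
    if old is None:
        return True
    # Reject the newcomer only if the old connection really answered (assumed policy).
    return not await check_old_connection(old, timeout)
```

For example, `asyncio.run(accept_new_slave(OldConnection(responsive=False), timeout=0.1))` returns `True` after roughly 0.1 s and marks the old connection as disconnected. Without the `wait_for` timeout, the same call would hang, which mirrors the "refuses the new instance forever" symptom.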
Attachment #640910: Flags: review?(dustin)
Comment 2 • 13 years ago
Comment on attachment 640910: give up on old slave connections
LGTM. getPeer() includes both the remote IP and the port, so a collision is unlikely (slaves don't reuse ports).
Attachment #640910: Flags: review?(dustin) → review+
Comment 3 • 13 years ago (Assignee)
Comment on attachment 640910: give up on old slave connections
Landed on bm35 only for now.
Attachment #640910: Flags: checked-in+
Updated • 13 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 4 • 13 years ago (Assignee)
Not sure whether we want or need this on the other buildbot masters as well?
Updated • 12 years ago
Product: mozilla.org → Release Engineering
Updated • 7 years ago
Component: General Automation → General