Closed
Bug 948244
Opened 11 years ago
Closed 11 years ago
creating a request takes too long
Categories
(Testing Graveyard :: Mozpool, defect)
Testing Graveyard
Mozpool
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: dustin)
Details
Attachments
(1 file)
9.03 KB,
patch
|
dividehex
:
review+
|
Details | Diff | Splinter Review |
https://tbpl.mozilla.org/php/getParsedLog.php?id=31710912&tree=Mozilla-Inbound
> 16:46:09 INFO - retry: Calling <built-in function remove> with args: ('tmpfile_stdout',), kwargs: {}, attempt #1
> 16:46:19 ERROR - Try 1: Can't get http://mobile-imaging-007.p7.releng.scl1.mozilla.com/api/device/panda-0649/request/: HTTPConnectionPool(host='mobile-imaging-007.p7.releng.scl1.mozilla.com', port=80): Request timed out. (timeout=10)!
but
> request.693659 INFO - [2013-12-09 16:46:10,889] entering state finding_device
> request.693659 INFO - [2013-12-09 16:46:11,695] Finding device.
> request.693659 INFO - [2013-12-09 16:46:14,415] Assigning device panda-0649.
> request.693659 INFO - [2013-12-09 16:46:15,516] Request succeeded.
> request.693659 INFO - [2013-12-09 16:46:16,017] entering state contacting_lifeguard
> 127.0.0.1:46788 - - [09/Dec/2013 16:46:19] "HTTP/1.1 POST /api/device/panda-0649/request/" - 200 OK
so the HTTP transaction took slightly more than the 10s timeout in mozpoolclient. HTTP requests really shouldn't be blocking for that long!
Here's what's going on between those lines
> (initial HTTP request)
mozpool/mozpool/handlers.py
39 args, body = templeton.handlers.get_request_parms()
40 try:
41 assignee = body['assignee']
42 duration = int(body['duration'])
43 image_name = body['image']
44 environment = body.get('environment', 'any')
45 except (KeyError, ValueError):
46 raise web.badrequest()
47
48 try:
49 image = self.db.images.get(image_name)
50 except exceptions.NotFound:
51 raise web.notfound()
52
53 boot_config = {}
54 for k in image['boot_config_keys']:
55 try:
56 boot_config[k] = body[k]
57 except KeyError:
58 raise web.badrequest()
59
60 request_id = self.db.requests.add(device_name, environment, assignee,
61 duration, image['id'], boot_config)
62 mozpool.mozpool.driver.handle_event(request_id, 'find_device', None)
mozpool/statedriver.py
104 machine = self._get_machine(machine_name)
105 machine.handle_event(event, args)
mozpool/statemachine.py
187 handler = self._event_methods.get(event)
188 if handler:
189 handler(self, args)
mozpool/mozpool/requestmachine.py
105 self.machine.goto_state(finding_device)
mozpool/statemachine.py
100 if isinstance(new_state_name_or_class, type) and issubclass(new_state_name_or_class, State):
101 new_state_name_or_class = new_state_name_or_class.state_name
102
103 self.state.on_exit()
104
105 # only log actual state changes, rather than re-entries of the same state
106 if self.state.state_name != new_state_name_or_class:
107 self.logger.info('entering state %s' % (new_state_name_or_class,))
> request.693659 INFO - [2013-12-09 16:46:10,889] entering state finding_device
109 self.state = self._make_state_instance(new_state_name_or_class)
110 self.write_state(new_state_name_or_class, self.state.TIMEOUT)
111
112 self.state.on_entry()
mozpool/mozpool/requestmachine.py
125 self.find_device()
131 self.logger.info('Finding device.')
> request.693659 INFO - [2013-12-09 16:46:11,695] Finding device.
132 device_name = None
133 count = self.machine.increment_counter(self.state_name)
134 self.db.device_requests.clear(self.machine.request_id)
135 request = self.db.requests.get_info(self.machine.request_id)
136 image_is_reusable = self.db.images.is_reusable(request['image'])
137
138 avail_devices = self.db.devices.list_available(
139 environment=request['environment'],
140 device_name=request['requested_device'])
141
142 if avail_devices:
143 if image_is_reusable:
144 devices_with_image = [x for x in avail_devices
145 if x['image'] == request['image'] and
146 util.from_json(x['boot_config']) ==
147 util.from_json(request['boot_config'])]
148 if devices_with_image:
149 avail_devices = devices_with_image
150
151 # pick a device at random from the returned list
152 device_name = random.choice(avail_devices)['name']
153 self.logger.info('Assigning device %s.' % device_name)
> request.693659 INFO - [2013-12-09 16:46:14,415] Assigning device panda-0649.
154 if self.db.device_requests.add(self.machine.request_id,
155 device_name):
156 self.logger.info('Request succeeded.')
> request.693659 INFO - [2013-12-09 16:46:15,516] Request succeeded.
157 self.machine.goto_state(contacting_lifeguard)
mozpool/statemachine.py
100 if isinstance(new_state_name_or_class, type) and issubclass(new_state_name_or_class, State):
101 new_state_name_or_class = new_state_name_or_class.state_name
102
103 self.state.on_exit()
104
105 # only log actual state changes, rather than re-entries of the same state
106 if self.state.state_name != new_state_name_or_class:
107 self.logger.info('entering state %s' % (new_state_name_or_class,))
> request.693659 INFO - [2013-12-09 16:46:16,017] entering state contacting_lifeguard
109 self.state = self._make_state_instance(new_state_name_or_class)
110 self.write_state(new_state_name_or_class, self.state.TIMEOUT)
111
112 self.state.on_entry()
mozpool/mozpool/requestmachine.py
186 req = self.db.requests.get_info(self.machine.request_id)
187 device_name = req['assigned_device']
188 device_state = self.db.devices.get_machine_state(device_name)
189 if device_state != 'ready':
190 self.logger.error('Assigned device %s is in unexpected state %s '
191 'when about to contact lifeguard.' %
192 (device_name, device_state))
193 self.machine.goto_state(finding_device)
194 # note that there's still a small chance of a race here between
195 # mozpool and lifeguard: lifeguard begins self-testing the device
196 # after this check and before it receives the event below; the
197 # device eventually returns to the 'ready' state, but has not sent
198 # the request. The timeout for 'pending' will catch this rare
199 # situation.
200 return
201
202 # If the requested image is reusable and we got a device with that
203 # image and the requested bootconfig, just power cycle it.
204 # Otherwise, image it. Note that there will be a failure if the
205 # image is not installed and the device is not imageable.
206 event = ''
207 device_request_data = {}
208 assigned_device_name = req['assigned_device']
209
210 if self.db.images.is_reusable(req['image']):
211 dev = self.db.devices.get_image(req['assigned_device'])
212 if (dev['image'] == req['image'] and
213 util.from_json(dev['boot_config']) ==
214 util.from_json(req['boot_config'])):
215 event = 'please_power_cycle'
216
217 if not event:
218 # Use the device's hardware type and requested image to find the
219 # pxe config, if any.
220 event = 'please_image'
221 device_request_data['boot_config'] = req['boot_config']
222 device_request_data['image'] = req['image']
223
224 # try to ask lifeguard to start imaging or power cycling
225 device_url = 'http://%s/api/device/%s/event/%s/' % (
226 self.db.devices.get_imaging_server(assigned_device_name),
227 assigned_device_name, event)
228 def posted(result):
229 if result.status_code != 200:
230 self.logger.warn("got %d from Lifeguard" % result.status_code)
231 return
232 mozpool.driver.handle_event(self.machine.request_id, 'lifeguard_contacted', {})
233 async.requests.post.start(posted, device_url,
234 data=json.dumps(device_request_data))
> 127.0.0.1:46788 - - [09/Dec/2013 16:46:19] "HTTP/1.1 POST /api/device/panda-0649/request/" - 200 OK
That's an awful lot to be doing before returning, but not bad given Mozpool's assumptions that it's OK to block on DB operations. The 0.5s between "Request succeeded." and "entering state contacting_lifeguard" must have been due to the inserts into the request_log table. That sucks, but we should be resilient to slowish databases.
We could easily finish the HTTP request after "Request succeeded". Probably the easiest way to accomplish that is to transition to a "contact_lifeguard" state which has no on_entry method but a 0s timeout leading to the "contacting_lifeguard" state.
Assignee | ||
Comment 1•11 years ago
|
||
Attachment #8345321 -
Flags: review?(jwatkins)
Comment 2•11 years ago
|
||
Attachment #8345321 -
Flags: review?(jwatkins) → review+
Assignee | ||
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•9 years ago
|
Product: Testing → Testing Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•