Closed Bug 948244 Opened 11 years ago Closed 11 years ago

creating a request takes too long

Categories

(Testing Graveyard :: Mozpool, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

Attachments

(1 file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=31710912&tree=Mozilla-Inbound > 16:46:09 INFO - retry: Calling <built-in function remove> with args: ('tmpfile_stdout',), kwargs: {}, attempt #1 > 16:46:19 ERROR - Try 1: Can't get http://mobile-imaging-007.p7.releng.scl1.mozilla.com/api/device/panda-0649/request/: HTTPConnectionPool(host='mobile-imaging-007.p7.releng.scl1.mozilla.com', port=80): Request timed out. (timeout=10)! but > request.693659 INFO - [2013-12-09 16:46:10,889] entering state finding_device > request.693659 INFO - [2013-12-09 16:46:11,695] Finding device. > request.693659 INFO - [2013-12-09 16:46:14,415] Assigning device panda-0649. > request.693659 INFO - [2013-12-09 16:46:15,516] Request succeeded. > request.693659 INFO - [2013-12-09 16:46:16,017] entering state contacting_lifeguard > 127.0.0.1:46788 - - [09/Dec/2013 16:46:19] "HTTP/1.1 POST /api/device/panda-0649/request/" - 200 OK so the HTTP transaction took slightly more than the 10s timeout in mozpoolclient. HTTP requests really shouldn't be blocking for that long! Here's what's going on between those lines > (initial HTTP request) mozpool/mozpool/handlers.py 39 args, body = templeton.handlers.get_request_parms() 40 try: 41 assignee = body['assignee'] 42 duration = int(body['duration']) 43 image_name = body['image'] 44 environment = body.get('environment', 'any') 45 except (KeyError, ValueError): 46 raise web.badrequest() 47 48 try: 49 image = self.db.images.get(image_name) 50 except exceptions.NotFound: 51 raise web.notfound() 52 53 boot_config = {} 54 for k in image['boot_config_keys']: 55 try: 56 boot_config[k] = body[k] 57 except KeyError: 58 raise web.badrequest() 59 60 request_id = self.db.requests.add(device_name, environment, assignee, 61 duration, image['id'], boot_config) 62 mozpool.mozpool.driver.handle_event(request_id, 'find_device', None) mozpool/statedriver.py 104 machine = self._get_machine(machine_name) 105 machine.handle_event(event, args) mozpool/statemachine.py 187 handler = self._event_methods.get(event) 188 if handler: 189 handler(self, args) mozpool/mozpool/requestmachine.py 105 self.machine.goto_state(finding_device) mozpool/statemachine.py 100 if isinstance(new_state_name_or_class, type) and issubclass(new_state_name_or_class, State): 101 new_state_name_or_class = new_state_name_or_class.state_name 102 103 self.state.on_exit() 104 105 # only log actual state changes, rather than re-entries of the same state 106 if self.state.state_name != new_state_name_or_class: 107 self.logger.info('entering state %s' % (new_state_name_or_class,)) > request.693659 INFO - [2013-12-09 16:46:10,889] entering state finding_device 109 self.state = self._make_state_instance(new_state_name_or_class) 110 self.write_state(new_state_name_or_class, self.state.TIMEOUT) 111 112 self.state.on_entry() mozpool/mozpool/requestmachine.py 125 self.find_device() 131 self.logger.info('Finding device.') > request.693659 INFO - [2013-12-09 16:46:11,695] Finding device. 132 device_name = None 133 count = self.machine.increment_counter(self.state_name) 134 self.db.device_requests.clear(self.machine.request_id) 135 request = self.db.requests.get_info(self.machine.request_id) 136 image_is_reusable = self.db.images.is_reusable(request['image']) 137 138 avail_devices = self.db.devices.list_available( 139 environment=request['environment'], 140 device_name=request['requested_device']) 141 142 if avail_devices: 143 if image_is_reusable: 144 devices_with_image = [x for x in avail_devices 145 if x['image'] == request['image'] and 146 util.from_json(x['boot_config']) == 147 util.from_json(request['boot_config'])] 148 if devices_with_image: 149 avail_devices = devices_with_image 150 151 # pick a device at random from the returned list 152 device_name = random.choice(avail_devices)['name'] 153 self.logger.info('Assigning device %s.' % device_name) > request.693659 INFO - [2013-12-09 16:46:14,415] Assigning device panda-0649. 154 if self.db.device_requests.add(self.machine.request_id, 155 device_name): 156 self.logger.info('Request succeeded.') > request.693659 INFO - [2013-12-09 16:46:15,516] Request succeeded. 157 self.machine.goto_state(contacting_lifeguard) mozpool/statemachine.py 100 if isinstance(new_state_name_or_class, type) and issubclass(new_state_name_or_class, State): 101 new_state_name_or_class = new_state_name_or_class.state_name 102 103 self.state.on_exit() 104 105 # only log actual state changes, rather than re-entries of the same state 106 if self.state.state_name != new_state_name_or_class: 107 self.logger.info('entering state %s' % (new_state_name_or_class,)) > request.693659 INFO - [2013-12-09 16:46:16,017] entering state contacting_lifeguard 109 self.state = self._make_state_instance(new_state_name_or_class) 110 self.write_state(new_state_name_or_class, self.state.TIMEOUT) 111 112 self.state.on_entry() mozpool/mozpool/requestmachine.py 186 req = self.db.requests.get_info(self.machine.request_id) 187 device_name = req['assigned_device'] 188 device_state = self.db.devices.get_machine_state(device_name) 189 if device_state != 'ready': 190 self.logger.error('Assigned device %s is in unexpected state %s ' 191 'when about to contact lifeguard.' % 192 (device_name, device_state)) 193 self.machine.goto_state(finding_device) 194 # note that there's still a small chance of a race here between 195 # mozpool and lifeguard: lifeguard begins self-testing the device 196 # after this check and before it receives the event below; the 197 # device eventually returns to the 'ready' state, but has not sent 198 # the request. The timeout for 'pending' will catch this rare 199 # situation. 200 return 201 202 # If the requested image is reusable and we got a device with that 203 # image and the requested bootconfig, just power cycle it. 204 # Otherwise, image it. Note that there will be a failure if the 205 # image is not installed and the device is not imageable. 206 event = '' 207 device_request_data = {} 208 assigned_device_name = req['assigned_device'] 209 210 if self.db.images.is_reusable(req['image']): 211 dev = self.db.devices.get_image(req['assigned_device']) 212 if (dev['image'] == req['image'] and 213 util.from_json(dev['boot_config']) == 214 util.from_json(req['boot_config'])): 215 event = 'please_power_cycle' 216 217 if not event: 218 # Use the device's hardware type and requested image to find the 219 # pxe config, if any. 220 event = 'please_image' 221 device_request_data['boot_config'] = req['boot_config'] 222 device_request_data['image'] = req['image'] 223 224 # try to ask lifeguard to start imaging or power cycling 225 device_url = 'http://%s/api/device/%s/event/%s/' % ( 226 self.db.devices.get_imaging_server(assigned_device_name), 227 assigned_device_name, event) 228 def posted(result): 229 if result.status_code != 200: 230 self.logger.warn("got %d from Lifeguard" % result.status_code) 231 return 232 mozpool.driver.handle_event(self.machine.request_id, 'lifeguard_contacted', {}) 233 async.requests.post.start(posted, device_url, 234 data=json.dumps(device_request_data)) > 127.0.0.1:46788 - - [09/Dec/2013 16:46:19] "HTTP/1.1 POST /api/device/panda-0649/request/" - 200 OK That's an awful lot to be doing before returning, but not bad given Mozpool's assumptions that it's OK to block on DB operations. The 0.5s between "Request succeeded." and "entering state contacting_lifeguard" must have been due to the inserts into the request_log table. That sucks, but we should be resilient to slowish databases. We could easily finish the HTTP request after "Request succeeded". Probably the easiest way to accomplish that is to transition to a "contact_lifeguard" state which has no on_entry method but a 0s timeout leading to the "contacting_lifeguard" state.
Attachment #8345321 - Flags: review?(jwatkins)
Attachment #8345321 - Flags: review?(jwatkins) → review+
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: