Bug 1417024 - Adding deepspeech-gpu instances
Status: RESOLVED WONTFIX (opened 8 years ago, closed 8 years ago)
Component: Taskcluster :: Operations and Service Requests (task)
Tracking: Not tracked
Reporter: gerard-majax
Assignee: Unassigned
To perform GPU-enabled testing, we would need:
"instanceTypes": [
{
"instanceType": "p2.xlarge",
"capacity": 1,
"utility": 1,
"secrets": {},
"scopes": [],
"userData": {
"isolatedContainers": true
},
"launchSpec": {
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvdb",
"Ebs": {
"DeleteOnTermination": true,
"VolumeSize": 120,
"VolumeType": "gp2"
}
}
]
}
}
],
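For reference, a minimal sketch of how a definition like this could be pushed to the provisioner with the Python taskcluster client. This assumes the aws-provisioner's updateWorkerType endpoint and elides the other required worker-type fields (AMIs, regions, scaling limits, maxPrice), so treat it as illustrative rather than the exact call that was used:

import taskcluster

# Assumes Taskcluster credentials are available in the environment.
provisioner = taskcluster.AwsProvisioner()

definition = {
    # ... other required fields (regions, AMIs, minCapacity,
    # maxCapacity, maxPrice, ...) elided for brevity ...
    "instanceTypes": [{
        "instanceType": "p2.xlarge",
        "capacity": 1,
        "utility": 1,
        "secrets": {},
        "scopes": [],
        "userData": {"isolatedContainers": True},
        "launchSpec": {
            "BlockDeviceMappings": [{
                "DeviceName": "/dev/xvdb",
                "Ebs": {
                    "DeleteOnTermination": True,
                    "VolumeSize": 120,
                    "VolumeType": "gp2",
                },
            }],
        },
    }],
}

provisioner.updateWorkerType("deepspeech-gpu", definition)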
I've created aws-provisioner/deepspeech-gpu as a copy of aws-provisioner/deepspeech-worker, but with the updated instanceTypes definition from comment 0 above and an increased maxPrice of 0.6 (deepspeech-worker had 0.2).
Note, you'll need to set up one or more new clients like https://tools.taskcluster.net/auth/clients/project%2Fdeepspeech%2Fworker%2Fmacos-light-6 but for the new workers that will claim these tasks.
Flags: needinfo?(lissyx+mozillians)
I'm talking nonsense, these will be spawned by AWS - I was thinking these were workers you were managing yourself!
Flags: needinfo?(lissyx+mozillians)
I think this is done. Can you confirm whether you are able to schedule tasks OK, and whether you get the ps.xlarge instance types like you should?
Flags: needinfo?(lissyx+mozillians)
*p2.xlarge*
Comment 7 • 8 years ago (Reporter)
Thanks, I'll take a look at that as soon as I can :)
Comment 8 • 8 years ago (Reporter)
I have a task that is starting:
> [taskcluster 2017-11-27 12:05:03.100Z] Worker Node Type: p2.xlarge
> [taskcluster 2017-11-27 12:05:03.100Z] Worker Type: deepspeech-gpu
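For anyone retracing this, here is a hedged sketch of what submitting a smoke-test task against the new worker type could look like with the Python taskcluster client; the image, command, and metadata below are placeholders, not the actual DeepSpeech CI configuration:

import taskcluster

queue = taskcluster.Queue()

task_id = taskcluster.slugId()
queue.createTask(task_id, {
    "provisionerId": "aws-provisioner-v1",
    "workerType": "deepspeech-gpu",
    "created": taskcluster.fromNow("0 seconds"),
    "deadline": taskcluster.fromNow("2 hours"),
    "payload": {
        "image": "nvidia/cuda:8.0-runtime",  # placeholder image
        "command": ["nvidia-smi"],           # quick GPU visibility check
        "maxRunTime": 3600,
    },
    "metadata": {
        "name": "deepspeech-gpu smoke test",
        "description": "Check that the GPU is visible from the container",
        "owner": "someone@example.com",      # placeholder
        "source": "https://bugzilla.mozilla.org/show_bug.cgi?id=1417024",
    },
})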
Flags: needinfo?(lissyx+mozillians)
Comment 9 • 8 years ago (Reporter)
So far, trying to run |nvidia-smi| in the container fails:
> + nvidia-smi
> modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-112-generic/modules.dep.bin'
> modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-112-generic/modules.dep.bin'
> modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_340'
> modprobe: ERROR: could not insert 'nvidia_340': Function not implemented
> NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
:wcosta is checking that the host EC2 is properly configured to run Docker with NVIDIA.
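The modprobe failures mean the container is trying to load the NVIDIA kernel module itself, which can only happen on the host: the driver has to be loaded on the EC2 host, and only the device nodes and user-space driver libraries get mapped into the container. Before nvidia-docker, that mapping was typically done by hand; a hedged sketch of that style of workaround follows (the device paths and library path are illustrative, with the 340 driver version taken from the log above):

import subprocess

# Hypothetical manual GPU pass-through, the pre-nvidia-docker approach.
# This only works if the nvidia kernel module is already loaded on the
# EC2 host -- which is exactly what the modprobe errors above show is
# missing here.
subprocess.run([
    "docker", "run", "--rm",
    # host device nodes created by the loaded driver
    "--device", "/dev/nvidiactl",
    "--device", "/dev/nvidia-uvm",
    "--device", "/dev/nvidia0",
    # user-space driver libraries; must match the host kernel module
    # version (path is illustrative, based on the nvidia_340 driver
    # named in the log)
    "-v", "/usr/lib/nvidia-340:/usr/lib/nvidia-340:ro",
    "nvidia/cuda:8.0-runtime",  # placeholder image
    "nvidia-smi",
], check=True)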
Flags: needinfo?(wcosta)
Comment 10 • 8 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #9)
> So far, trying to run |nvidia-smi| in the container fails:
>
> > + nvidia-smi
> > modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-112-generic/modules.dep.bin'
> > modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-112-generic/modules.dep.bin'
> > modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_340'
> > modprobe: ERROR: could not insert 'nvidia_340': Function not implemented
> > NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
>
This error goes away once you install the linux-image-3.13.0-112-generic package. However, another error then comes up, related to grub-probe.
> :wcosta is checking that the host EC2 is properly configured to run Docker
> with NVIDIA.
I confirmed that the EC2 image the deepspeech-gpu worker type runs on is just another generic instance; no special NVIDIA package was installed, AFAIK.
Flags: needinfo?(wcosta)
Comment 11 • 8 years ago
Some more info: getting an EC2 image set up with the CUDA driver won't be an easy task, IIUC. docker-worker EC2 instances run on Ubuntu 14.04, while nvidia-docker needs 16.04. I heard migrating docker-worker to run on 16.04 is... tough.
Comment 12 • 8 years ago (Reporter)
(In reply to Wander Lairson Costa [:wcosta] from comment #11)
> Some more info: getting an EC2 image set up with the CUDA driver won't be
> an easy task, IIUC. docker-worker EC2 instances run on Ubuntu 14.04, while
> nvidia-docker needs 16.04. I heard migrating docker-worker to run on 16.04
> is... tough.
Hm, maybe we can get lucky; according to this issue, some people have it running on 14.04: https://github.com/NVIDIA/nvidia-docker/issues/475
Comment 13 • 8 years ago (Reporter)
If there's no way to quickly hack that on top of the current 14.04, do you know when docker-worker is expected to work on 16.04?
Flags: needinfo?(wcosta)
Comment 14 • 8 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #13)
> If there's no way to quickly hack that on top of the current 14.04, do you
> know when docker-worker is expected to work on 16.04?
I am experimenting with upgrading the AWS images to 16.04 right now.
Flags: needinfo?(wcosta)
Comment 15 • 8 years ago (Reporter)
Looks like I failed to follow this bug, but long story short: Wander spent a lot of time trying to get that working, and failed. So I guess we should just remove the worker for now, and revisit later when the bits are ready.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(pmoore)
Resolution: --- → WONTFIX
I've deleted the deepspeech-gpu worker type.
Flags: needinfo?(pmoore)
Updated • 7 years ago (Assignee)
Component: Service Request → Operations and Service Requests