Creating a LocalCluster on a multi-core machine can take a while #2450
cc @mt-jones, who ran into this as well, I think.
So given a number of cores (like 32) we try to find some factors (like 8 and 4) that we can use for processes and threads. I'm inclined to split the difference and try to find numbers that are roughly equal, with a preference given to processes, especially for low numbers. Intuitively I like the following splits:

(table of suggested splits not preserved)
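The factor-splitting idea above can be sketched in a few lines. This is an illustrative reconstruction, not the code that landed in distributed; the function name and the exact tie-breaking are assumptions:

```python
import math

def split_prefer_processes(n_cores):
    """Split n_cores into (processes, threads): walk down from
    sqrt(n_cores) to the nearest divisor, then give the larger
    factor to processes."""
    threads = int(math.sqrt(n_cores))
    while n_cores % threads:  # step down until we hit a divisor
        threads -= 1
    return n_cores // threads, threads
```

For 32 cores this yields (8, 4), the split mentioned above, with processes getting the larger factor.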
cc @jhamman, who has some experience with high-core-count nodes.
When we have a large number of available cores we should set the default number of threads per process to be higher than one. This implements a policy that aims for a number of processes equal to the square root of the number of cores, at least above a certain number of cores. Partially addresses dask#2450
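A minimal sketch of that policy follows. The function name, the cutoff of 4 cores, and the rounding choices here are assumptions for illustration, not necessarily what the patch implements:

```python
import math

def default_process_thread_split(n_cores):
    """Default (processes, threads) split: one single-threaded process
    per core on small machines; above an (assumed) cutoff of 4 cores,
    aim for roughly sqrt(n_cores) processes."""
    if n_cores <= 4:  # assumed cutoff for "small" machines
        return n_cores, 1
    n_proc = math.ceil(math.sqrt(n_cores))
    return n_proc, n_cores // n_proc
```

On an 80-core machine this gives 9 processes with 8 threads each; when sqrt(n_cores) is not an integer the split slightly under-subscribes the machine rather than over-subscribing it.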
Partially addressed in #2452
Happened upon this and wanted to give some quick feedback. I have been working with 80- and 160-core machines recently, and the heuristic that has worked best for me trends towards fewer workers and more threads (because I send millions of tasks). My heuristic is to take sqrt(cores) and walk down until I find a good divisor.
This gives me (10, 16) (workers, threads) on a 160-core machine, and (8, 10) on an 80-core machine.
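That heuristic can be written down directly. A small sketch (the function name is mine) that reproduces the numbers reported above; it is the mirror of a processes-first split, giving the larger factor to threads:

```python
import math

def fewer_workers_split(n_cores):
    """Take sqrt(n_cores) and walk down to the nearest divisor;
    use the smaller factor for workers and the larger for threads."""
    workers = int(math.sqrt(n_cores))
    while n_cores % workers:  # step down until we hit a divisor
        workers -= 1
    return workers, n_cores // workers
```

On 160 cores this returns (10, 16) and on 80 cores (8, 10), matching the splits above.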
I've played around with this a bit as well. In general, I find that we usually want more processes than threads. @bluecoconut's
Thank you for chiming in with your experience, @bluecoconut; that is useful information. The approach in #2452 is the one that @jhamman suggests. It's reassuring that we all came to almost the same approach, though the exact right split will always be workload dependent. Regardless, it would be nice to reduce (or at least identify) the cost of creating new workers with forkserver, if anyone has spare cycles (no obligation to anyone, of course).
I'm playing with a machine that has 80 logical cores. In this case creating a LocalCluster or raw Client can take a while. I believe that this is because it tries creating 80 workers at once with forkserver.

There are a few potential approaches to solve this:

1. Use the fork approach rather than forkserver (though dask breaks for other reasons here)

(remaining list items not preserved)

Probably we should do some combination of 1 and 3. Any thoughts or suggestions?