Settings
Max Parallel GPU

If you need more than 10 GPUs concurrently, contact us at
founders@comfydeploy.com.

Workflow timeout
The maximum amount of time a workflow is allowed to run. If a workflow run exceeds this time, the run is cancelled. We can increase this limit to up to 24 hours; contact us at
founders@comfydeploy.com.

Warm time
After your workflow has finished running on a GPU, you can keep the GPU warm for a set amount of time to reduce cold starts for your next request. Warm time is still charged, so this is a trade-off between cost and performance.
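To make the cost side of that trade-off concrete, here is a rough Python sketch of how billed GPU time grows with warm time. It is an illustration only: the function name `billed_gpu_seconds` is hypothetical, and it assumes that idle warm seconds are billed like run seconds and that the warm window ends early when the next request reuses the GPU. The actual billing rules may differ.

```python
def billed_gpu_seconds(run_seconds, idle_gaps, warm_time):
    """Estimate billed seconds for one GPU (hypothetical billing model).

    run_seconds: duration of each run on this GPU, in order.
    idle_gaps:   idle time after each run before the next request arrives;
                 use float("inf") when no further request arrives.
    warm_time:   the configured warm time in seconds.
    """
    # Assumption: after each run the GPU stays warm (and billed) until the
    # next request arrives or the warm time elapses, whichever comes first.
    warm_billed = sum(min(gap, warm_time) for gap in idle_gaps)
    return sum(run_seconds) + warm_billed


# Two 30-second runs, 20 seconds apart, with no request after the second.
with_warm = billed_gpu_seconds([30, 30], [20, float("inf")], warm_time=60)
without_warm = billed_gpu_seconds([30, 30], [20, float("inf")], warm_time=0)
print(with_warm, without_warm)
```

Under these assumptions, the 60-second warm time adds 80 billed seconds (20 between the runs, 60 after the last one), and in exchange the second request avoids a cold start.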
Keep warm
For the highest-performance workloads, you can keep your GPUs warm to reduce cold starts to zero.

Example situations
These examples show what happens with different request patterns under the same settings. In these examples:
- max parallel gpu is set to 2
- warm time is set to 1 minute
- always warm GPUs is set to 0
Example 1: Basic
We have only 1 request.
- r1 comes in, a GPU spins up.
- r1 finishes.
- The GPU is kept warm for 1 minute before spinning down.

Example 2: Taking advantage of warm GPUs
This time we have 2 requests, where the 2nd request uses a warm GPU.
- r1 comes in, a GPU spins up.
- r1 finishes.
- r2 comes in before r1_f + warm time (where r1_f is the time r1 finished), so we reuse the same GPU.
- r2 is faster than r1 because the GPU was warm.
- The GPU is kept warm for 1 minute before spinning down.

Example 3: Scaling up and hitting max GPUs
We have 2 requests, and we’ll spin up 2 GPUs.
- r1 comes in, a GPU spins up.
- r2 comes in before r1 finishes, a new GPU spins up.
- r1 finishes.
- r1's GPU spins down after staying warm for 1 min.
- r2 finishes.
- r2's GPU spins down after staying warm for 1 min.

Now suppose a third request, r3, comes in while our 2 requests are running (between r2 and r1_f).
The third request has to wait for one of the GPUs to finish before it can start, as we’ve hit our max GPU limit.
Once a GPU is free, r3 starts.
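
The three examples can be replayed with a small simulation. This is a minimal sketch of the behavior described above, not the platform's actual scheduler. The `Settings` class, the `simulate` function, the 20-second cold start, and the 30-second run times are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    max_parallel_gpu: int = 2
    warm_time: float = 60.0   # seconds a GPU stays warm after a run
    cold_start: float = 20.0  # hypothetical cold start duration in seconds


def simulate(requests, settings):
    """Replay requests given as (name, arrival, run_seconds), sorted by arrival.

    Returns a list of (name, kind, start, finish), where kind is
    "cold" (new GPU), "warm" (reused warm GPU), or "queued" (waited at the limit).
    """
    gpus = []  # for each live GPU: the time its latest run finishes
    results = []
    for name, arrival, run_seconds in requests:
        # GPUs whose warm window ended before this arrival have spun down.
        gpus = [f for f in gpus if arrival <= f + settings.warm_time]
        idle = [f for f in gpus if f <= arrival]
        if idle:
            # A warm GPU is available: reuse it, no cold start.
            gpus.remove(max(idle))
            start, kind = arrival, "warm"
        elif len(gpus) < settings.max_parallel_gpu:
            # Below the limit: spin up a new GPU and pay the cold start.
            start, kind = arrival + settings.cold_start, "cold"
        else:
            # At the limit: wait for the GPU that finishes first.
            earliest = min(gpus)
            gpus.remove(earliest)
            start, kind = earliest, "queued"
        finish = start + run_seconds
        gpus.append(finish)
        results.append((name, kind, start, finish))
    return results


# Example 3 with a third request arriving while r1 and r2 are still running.
for row in simulate([("r1", 0, 30), ("r2", 10, 30), ("r3", 15, 30)], Settings()):
    print(row)
```

With these inputs, r1 and r2 each get a cold GPU, and r3 is reported as queued and starts when r1's GPU becomes free. Changing the arrival times reproduces Example 1 (a single request) and Example 2 (a second request that arrives within the warm window).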
