founders@comfydeploy.com
max parallel gpu set to 2
warm time set to 1 minute
always warm GPUs set to 0

r1 comes in, a GPU spins up.
r1 finishes.

r1 comes in, a GPU spins up.
r1 finishes.
r2 comes in before r1_f + warm time (r1_f is when r1 finishes), so we reuse the same GPU.
r2 is faster than r1 because the GPU was warm.

r1 comes in, a GPU spins up.
r2 comes in before r1 finishes, a new GPU spins up.
r1 finishes.
r1 GPU spins down after staying warm for 1 min.
r2 finishes.
r2 GPU spins down after staying warm for 1 min.

r3 comes in while our 2 requests are running (between r2 and r1_f).
The third request has to wait for one of the GPUs to finish before it can start, as we’ve hit our max GPU limit.
r3 starts.
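
The scenarios above can be replayed with a small simulation. This is only a rough sketch of the described behavior, not ComfyDeploy's actual scheduler: the `simulate` function, the `Result` record, and the 20-second `cold_start` penalty are all hypothetical names and values chosen for illustration, and the always warm GPUs setting (0 here) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    start: float    # time the request is handed to a GPU
    finish: float   # time the request completes
    gpu: int        # id of the GPU that served it
    cold: bool      # True if a new GPU had to spin up
    waited: float   # time spent queued because of the max GPU limit


def simulate(requests, max_parallel_gpu=2, warm_time=60.0, cold_start=20.0):
    """Replay (name, arrival, run_time) tuples, sorted by arrival time.

    cold_start is an assumed spin-up penalty in seconds; the source only
    says that a warm GPU is faster, not by how much.
    """
    free_at = {}  # gpu id -> time its current or last request finishes
    next_id = 0
    results = []
    for name, arrival, run_time in requests:
        # GPUs idle for longer than warm_time have already spun down.
        free_at = {g: t for g, t in free_at.items() if t + warm_time >= arrival}
        idle = [g for g, t in free_at.items() if t <= arrival]
        if idle:
            # A warm, idle GPU exists: reuse it with no spin-up cost.
            gpu, start, cold = idle[0], arrival, False
        elif len(free_at) < max_parallel_gpu:
            # Below the limit: spin up a new (cold) GPU.
            gpu, start, cold = next_id, arrival, True
            next_id += 1
        else:
            # At the limit: wait for the GPU that finishes first.
            gpu = min(free_at, key=free_at.get)
            start, cold = free_at[gpu], False
        finish = start + (cold_start if cold else 0.0) + run_time
        free_at[gpu] = finish
        results.append(Result(name, start, finish, gpu, cold, start - arrival))
    return results


if __name__ == "__main__":
    # Three overlapping requests with max parallel gpu set to 2.
    for r in simulate([("r1", 0, 30), ("r2", 10, 30), ("r3", 20, 30)]):
        print(r)
```

With three overlapping requests, the first two each get a cold GPU and the third is queued until the earlier of the two finishes, which mirrors the last scenario. Spacing two requests less than the warm time apart reuses the same GPU without the cold-start cost.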