The Facts
The GPU shortage has reared its ugly head again in the last few weeks, in particular for smaller companies trying to rent GPUs for experimentation:
In practice, most smaller players I’ve talked to have recently had difficulty leasing any GPU instances from the major providers (Azure, AWS, GCP, Lambda Labs, CoreWeave).
Why it matters
A few months ago, I wrote about how the GPU shortage impacts large model providers — it forces them to make awkward tradeoffs between the UX of existing products and research for advanced products.
In the last few weeks, the impact of GPU shortages on startups and practitioners has been much more front-and-center. Right now, the best way to get GPUs is to sign long-term leases for large numbers of GPUs — often multi-million dollar contracts. For most smaller players, this is simply not a real option.
This leaves a few options for startups and tinkerers:
Use models built by model providers (who already have huge GPU supplies).
Use services provided by larger startups — companies like Replicate have raised enough money to reserve GPUs.
Try to get lucky and get access to on-demand GPUs when they become available.
In practice, this shortage limits innovation: it is unreasonably hard for smaller teams to experiment with fine-tuning models. Most of the fine-tuning activity I have seen has been on the smallest models (Llama 2 7B or 13B) rather than the more powerful 70B model — the smaller models fit on smaller GPUs, which have more available supply.
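The size gap is easy to make concrete with a back-of-envelope memory estimate. The sketch below (GPU names and capacities are illustrative assumptions) computes the floor cost of just holding a model's weights in half precision — actual fine-tuning needs several times more for gradients, optimizer states, and activations:

```python
# Back-of-envelope: GPU memory needed just to HOLD model weights in
# half precision (2 bytes per parameter). Real fine-tuning needs far
# more (gradients, optimizer states, activations), so these are
# floor estimates, not full requirements.

def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory in GB to hold the weights alone at the given precision."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# Illustrative model sizes (billions of parameters) and GPU capacities (GB).
models = {"Llama 2 7B": 7, "Llama 2 13B": 13, "Llama 2 70B": 70}
gpus = {"24 GB consumer card": 24, "80 GB datacenter card": 80}

for name, size_b in models.items():
    need = weight_memory_gb(size_b)
    fits = [gpu for gpu, cap in gpus.items() if need <= cap]
    print(f"{name}: ~{need:.0f} GB of weights -> {fits or 'multi-GPU only'}")
```

At fp16, a 7B model's weights (~14 GB) fit on a single consumer GPU, a 13B model (~26 GB) needs a datacenter card, and a 70B model (~140 GB) requires multiple 80 GB GPUs before training even begins — exactly the hardware that is hardest to lease right now.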
My thoughts
Two quick thoughts:
Although I think the fine-tuning era should be starting, GPU availability will dramatically slow it down. Providers like Lamini, who can aggregate demand and reserve GPUs, stand to benefit.
I think the popularity of hosted models is artificially inflated by the lack of alternatives. If the GPU shortage becomes less dramatic, I expect more teams to experiment with their own models.