Author's note: Back on my normal schedule this week after the last few weeks of travel!
The Facts
MosaicML released a really interesting blog post a few weeks back about how they were able to train modern LLMs on AMD GPUs.
Performance (in terms of power consumption and training speed) slightly lagged the NVIDIA A100.
I’m aware of some large companies that have also seen similar results performing inference on AMD GPUs.
Why it matters
As we’ve discussed in the past, GPU constraints severely limit the ability of current LLM providers to iterate on their products. OpenAI recently mentioned that they’re allocating 20% of their compute capacity to alignment research for superhuman AGI! That is a lot of GPUs they can’t use to increase rate limits on the GPT-4 API!
AMD doesn’t have the distribution needed with cloud providers to end the GPU shortage overnight, but their training stack becoming competitive could have two really significant impacts on the LLM landscape:
The current GPU shortage may be shorter than previously expected if we can begin moving more workloads to AMD GPUs. Remember, even moving inference to AMD frees up NVIDIA GPUs for training or fine-tuning models.
Many factors contribute to the cost of training LLMs, but expensive NVIDIA GPUs are certainly a big chunk of it. Chipping away at NVIDIA’s margin would be great for the cost of development overall.
AMD isn’t all of the way there yet: support for training workloads is still really only for PyTorch models, and performance still lags NVIDIA’s last-generation GPUs.
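For a sense of what “PyTorch support” means in practice: AMD’s ROCm build of PyTorch exposes AMD GPUs through the familiar torch.cuda API, so most existing training code runs without modification. Here’s a minimal sketch, assuming a ROCm build of PyTorch is installed and a supported AMD GPU is visible; the tiny model and values are illustrative, not taken from the MosaicML post.

```python
import torch

# On a ROCm build of PyTorch, AMD GPUs show up through the usual torch.cuda
# namespace (HIP is mapped onto it), so "cuda" device code runs as-is.
print(torch.version.hip)          # set on ROCm builds, None on CUDA builds
print(torch.cuda.is_available())  # True if a supported AMD (or NVIDIA) GPU is visible

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy forward/backward pass -- the same code path a real training loop uses,
# just at miniature scale.
model = torch.nn.Linear(512, 512).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 512, device=device)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

The point is less the code itself than the fact that nothing AMD-specific appears in it; that’s what makes moving existing PyTorch workloads over plausible.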
My thoughts
This blog post is one of the first signs I’ve seen of AMD GPUs actually seeing real use in large-scale model training. Just the fact that it’s possible is exciting! In recent months we’d gotten mixed signals about AMD GPU capabilities (George Hotz was recently quite frustrated developing with them for his new venture).
It will take time, but I expect market share to start to balance out moving forward. The stakes are too high for the industry to accept a monopoly in LLM acceleration.