This post follows on from the previous post, Distributed training of a GPT model using DeepSpeed. There we discussed that an ML model allows for three dimensions of parallelism: data, pipeline, and tensor (model). We covered distributed data parallelism and sharded data parallelism in the previous post. Here we will discuss pipeline and model (tensor) parallelism. 3D parallelism aims at partitioning (color-coded) computer resources across the 3D space of data, pipeline and tensor (model) dimensi...
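To make the idea of a 3D space of devices concrete, here is a minimal sketch in plain Python that assigns each global rank a coordinate along the data, pipeline and tensor axes and then lists which ranks would communicate along each axis. The function names `build_3d_grid` and `groups_along`, and the choice to make the tensor axis vary fastest, are assumptions made for illustration; this is not DeepSpeed's own topology code.

```python
from collections import defaultdict
from itertools import product


def build_3d_grid(dp, pp, tp):
    """Map each global rank to a hypothetical (data, pipeline, tensor) coordinate.

    Assumption: the tensor axis varies fastest, so neighbouring ranks
    (often GPUs on the same node) end up in the same tensor-parallel group.
    """
    coords = product(range(dp), range(pp), range(tp))
    return {rank: coord for rank, coord in enumerate(coords)}


def groups_along(grid, axis):
    """Group ranks that differ only along `axis` (0=data, 1=pipeline, 2=tensor)."""
    groups = defaultdict(list)
    for rank, coord in grid.items():
        # Ranks sharing every coordinate except `axis` belong to one group.
        key = coord[:axis] + coord[axis + 1:]
        groups[key].append(rank)
    return sorted(groups.values())


# 8 devices split as 2 (data) x 2 (pipeline) x 2 (tensor)
grid = build_3d_grid(dp=2, pp=2, tp=2)
print(groups_along(grid, axis=2))  # tensor-parallel groups
print(groups_along(grid, axis=1))  # pipeline-parallel groups
print(groups_along(grid, axis=0))  # data-parallel groups
```

With 8 devices, every rank belongs to exactly one group per axis: one tensor-parallel group, one pipeline and one data-parallel group. Real frameworks build their communication groups from a similar rank-to-coordinate mapping, though the axis ordering they use may differ from the one assumed here.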