Pipeline Parallelism (GPipe)

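[Figure: Training data is split into four micro-batches (MB 0 to MB 3). The full model (4 layers) is partitioned across three devices: GPU 0 (Layers 0-1), GPU 1 (Layers 1-2), GPU 2 (Layers 2-3). A device utilization timeline shows each GPU's forward, backward, and idle slots.]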
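
The device utilization timeline for a GPipe-style schedule can be sketched in a few lines of Python. This is a minimal illustration, not the visualization's own code: it uses the three GPUs and four micro-batches named above, and it assumes that every forward and backward step takes one equal time slot and that backward passes run in reverse micro-batch order once all forward passes have finished. The function names `gpipe_schedule` and `idle_fraction` are hypothetical.

```python
def gpipe_schedule(num_stages, num_microbatches):
    """Build a timeline grid[stage][t] holding 'F<i>', 'B<i>', or '.' (idle).

    Assumption: each forward or backward step occupies exactly one slot.
    """
    p, m = num_stages, num_microbatches
    total_slots = 2 * (m + p - 1)
    grid = [["." for _ in range(total_slots)] for _ in range(p)]

    # Forward phase: stage s can start micro-batch i once stage s-1 has
    # finished it, so it runs at time s + i.
    for s in range(p):
        for i in range(m):
            grid[s][s + i] = f"F{i}"

    # Backward phase: starts after the last forward step, flows from the
    # last stage back to the first, with micro-batches in reverse order.
    start = m + p - 1
    for s in range(p):
        for i in range(m):
            t = start + (p - 1 - s) + (m - 1 - i)
            grid[s][t] = f"B{i}"
    return grid


def idle_fraction(grid):
    """Fraction of all device time slots in which a device does nothing."""
    slots = sum(len(row) for row in grid)
    idle = sum(cell == "." for row in grid for cell in row)
    return idle / slots


if __name__ == "__main__":
    timeline = gpipe_schedule(num_stages=3, num_microbatches=4)
    for gpu, row in enumerate(timeline):
        print(f"GPU {gpu}: " + " ".join(cell.ljust(2) for cell in row))
    print(f"idle fraction: {idle_fraction(timeline):.3f}")
```

Under these assumptions the idle share works out to (p - 1) / (m + p - 1) for p stages and m micro-batches, so adding micro-batches shrinks the idle region of the timeline while adding stages grows it.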