Pipeline Parallelism (GPipe)
Training Data: one mini-batch split into 4 micro-batches (MB 0, MB 1, MB 2, MB 3)
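The split into micro-batches can be sketched in plain Python. This assumes the mini-batch is simply a sequence of samples, and `split_into_microbatches` is a hypothetical helper, not an API from any GPipe library. It cuts the batch into contiguous, near-equal chunks that flow through the pipeline one after another.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_microbatches(batch: Sequence[T], num_microbatches: int) -> List[List[T]]:
    """Split a mini-batch into contiguous micro-batches of near-equal size (hypothetical helper)."""
    if num_microbatches <= 0:
        raise ValueError("num_microbatches must be positive")
    base, extra = divmod(len(batch), num_microbatches)
    microbatches: List[List[T]] = []
    start = 0
    for i in range(num_microbatches):
        # The first `extra` micro-batches absorb the remainder, one sample each.
        size = base + (1 if i < extra else 0)
        microbatches.append(list(batch[start:start + size]))
        start += size
    return microbatches


print(split_into_microbatches(list(range(10)), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

In GPipe, gradients from all micro-batches are accumulated and applied in one synchronous update per mini-batch, so splitting changes the schedule but not the optimizer step.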
Full Model (4 Layers), partitioned so each layer lives on exactly one GPU (one pipeline stage per GPU):
GPU 0 (Layers 0-1) → GPU 1 (Layer 2) → GPU 2 (Layer 3)
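A minimal sketch of the layer-to-stage assignment follows. It assumes stages should be balanced by layer count only. A real partitioner would usually balance estimated compute or memory per stage instead. `partition_layers` is a hypothetical helper. Each stage gets a contiguous, non-overlapping range of layer indices, so a micro-batch's activations only ever move forward to the next GPU.

```python
from typing import List


def partition_layers(num_layers: int, num_stages: int) -> List[range]:
    """Assign every layer index to exactly one stage as contiguous ranges (hypothetical helper)."""
    if not 0 < num_stages <= num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    stages: List[range] = []
    start = 0
    for s in range(num_stages):
        # Earlier stages take one extra layer each until the remainder is used up.
        size = base + (1 if s < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


for gpu, layers in enumerate(partition_layers(4, 3)):
    print(f"GPU {gpu}: layers {list(layers)}")
# GPU 0: layers [0, 1]
# GPU 1: layers [2]
# GPU 2: layers [3]
```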
Device Utilization Timeline (legend: Forward, Backward, Idle)
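The timeline can be reproduced with a small simulation of the fill-drain schedule. This is a sketch under a simplifying assumption: every forward and every backward step takes one time unit on every stage. In practice a backward step usually costs more than a forward step. `gpipe_timeline` and `bubble_fraction` are hypothetical helpers. All forward passes run first, staggered by one step per stage. After the pipeline flushes, backward passes run from the last stage back to GPU 0. Idle cells are the pipeline "bubble", and under this assumption each device is idle for a fraction (S - 1) / (M + S - 1) of the time, with S stages and M micro-batches.

```python
from typing import List


def gpipe_timeline(num_stages: int, num_microbatches: int) -> List[List[str]]:
    """Return grid[stage][time] holding 'F<m>', 'B<m>', or '.' for idle."""
    S, M = num_stages, num_microbatches
    phase = M + S - 1  # steps to push every micro-batch through all stages once
    grid = [["." for _ in range(2 * phase)] for _ in range(S)]
    for m in range(M):
        for s in range(S):
            # Forward: micro-batch m reaches stage s after s hops.
            grid[s][m + s] = f"F{m}"
            # Backward: starts after the flush and runs from the last stage back to stage 0.
            grid[s][phase + m + (S - 1 - s)] = f"B{m}"
    return grid


def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle share of each device's time when F and B steps cost the same."""
    return (num_stages - 1) / (num_microbatches + num_stages - 1)


# 3 GPUs and 4 micro-batches, matching the diagram above.
for s, row in enumerate(gpipe_timeline(3, 4)):
    print(f"GPU {s}: " + " ".join(f"{c:>2}" for c in row))
print(f"bubble fraction: {bubble_fraction(3, 4):.3f}")  # bubble fraction: 0.333
```

Raising M while keeping S fixed shrinks the bubble, which is why GPipe splits each mini-batch into several micro-batches rather than sending it through the pipeline whole.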