[Figure: two-system control architecture.
Inputs: robot images; text command (“Pick up the butter and hand it over to the robot on your left.”); robot state (joint angles, finger positions).
System 2 (GPU 2): 7B pretrained VLM, infrequent vision-language semantic reasoning; passes a latent vector to System 1.
System 1 (GPU 1): 80M transformer, fast, reactive control; outputs whole upper body control.
Rate labels shown in the figure: 20 Hz, 7-9 Hz, 200 Hz.]
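
The two-rate split named in the figure labels above can be illustrated with a small simulation. This is a minimal sketch, not the system's implementation: it assumes the 200 Hz label belongs to System 1 and the 7-9 Hz label to System 2 (8 Hz is picked here for round numbers), and the functions `slow_reasoner` and `fast_policy` are hypothetical stand-ins for the 7B VLM and the 80M transformer. The real systems presumably run asynchronously on separate GPUs; here a single synchronous loop is used so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import List, Tuple

FAST_HZ = 200  # assumed System 1 rate, taken from the "200 Hz" label
SLOW_HZ = 8    # assumed System 2 rate, chosen from the "7-9 Hz" label


@dataclass
class Latent:
    """Placeholder for the latent vector passed from System 2 to System 1."""
    version: int  # index of the slow update that produced this latent


def slow_reasoner(update_index: int) -> Latent:
    # Hypothetical stand-in for the slow vision-language model.
    return Latent(version=update_index)


def fast_policy(latent: Latent, tick: int) -> Tuple[int, int]:
    # Hypothetical stand-in for the fast controller: the "action" just
    # records which latent it was conditioned on and at which tick.
    return (latent.version, tick)


def simulate(seconds: int = 1) -> Tuple[int, List[Tuple[int, int]]]:
    """Run the fast loop at FAST_HZ, refreshing the latent at SLOW_HZ."""
    ticks_per_update = FAST_HZ // SLOW_HZ  # fast ticks between slow updates
    latent = None
    slow_updates = 0
    actions: List[Tuple[int, int]] = []
    for tick in range(seconds * FAST_HZ):
        if tick % ticks_per_update == 0:
            # The slow system publishes a fresh latent.
            latent = slow_reasoner(slow_updates)
            slow_updates += 1
        # The fast system always acts on the most recent latent.
        actions.append(fast_policy(latent, tick))
    return slow_updates, actions


if __name__ == "__main__":
    updates, actions = simulate(1)
    print(updates, len(actions))
```

Under these assumptions, the fast controller reuses each latent for 25 consecutive control steps before the slow model supplies a new one, which is the point of decoupling the two rates.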