"More importantly, we map an image to a feature that encodes not only object identities but also object poses, and such a feature map can generalize better to images of novel objects the robot has ...