Given a single image with multiple concepts, annotated by loose segmentation masks, our method can learn a distinct token for each concept, and use natural language guidance to re-synthesize the ...
A group of young men in their bathing suits are swimming in a body of water located in the foreground of the scene. On the right-hand side of the foreground, one man is pulling another out of the ...