Multimodal Understanding & Generation Group (MUG), Intelligence Computing Lab, Tsinghua University