Although the main task is 3D part segmentation, previous work finetunes the 2D VLM (GLIP) with the VLM’s own objective rather than an objective for the main task. We therefore propose a task adaptation approach that adapts the 2D task to the 3D task, in place of conventional domain adaptation, which fails to fully exploit the 3D segmentation results.
However, because our new objective function (3D mRIoU loss) is not differentiable w.r.t. GLIP’s output, we propose an alternative approach: Weight Prediction & Score Reformulation. (Please refer to the main paper for details.)
Additionally, we improve performance by using the bounding boxes predicted by GLIP as conditions for SAM to perform mask refinement.
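As a rough illustration of the box-conditioned refinement step, the sketch below passes each bounding box to a SAM-style predictor and collects one mask per box. It is not the pipeline from the paper: the function name `refine_boxes_with_sam` and the stand-in class `BoxFillPredictor` are hypothetical, and the predictor is only assumed to follow the `set_image` / `predict(box=..., multimask_output=...)` interface of `segment_anything.SamPredictor`.

```python
import numpy as np


def refine_boxes_with_sam(predictor, image, boxes):
    """Return one boolean mask per box by prompting a SAM-style predictor.

    `predictor` is assumed to expose the `set_image` / `predict` interface of
    `segment_anything.SamPredictor`; `boxes` are (x0, y0, x1, y1) in pixels.
    """
    predictor.set_image(image)  # embed the image once, reuse it for every box
    masks = []
    for box in boxes:
        # multimask_output=False requests a single mask for this prompt.
        mask, _scores, _low_res = predictor.predict(
            box=np.asarray(box, dtype=np.float32),
            multimask_output=False,
        )
        masks.append(mask[0].astype(bool))  # (1, H, W) -> (H, W)
    return masks


class BoxFillPredictor:
    """Hypothetical stand-in for SAM: it simply fills the prompted box."""

    def set_image(self, image):
        self.shape = image.shape[:2]

    def predict(self, box, multimask_output=False):
        x0, y0, x1, y1 = (int(v) for v in box)
        mask = np.zeros((1, *self.shape), dtype=bool)
        mask[0, y0:y1, x0:x1] = True
        return mask, np.ones(1), np.zeros((1, *self.shape))


# Toy usage: one 4x4 box on an 8x8 image.
image = np.zeros((8, 8, 3), dtype=np.uint8)
masks = refine_boxes_with_sam(BoxFillPredictor(), image, [(1, 2, 5, 6)])
```

With the real library, the stand-in would be replaced by a `SamPredictor` built from a loaded SAM checkpoint. How the refined masks are then lifted to 3D is described in the main paper and is not covered here.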