Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
CLIP goes 3D: Leveraging prompt tuning for language grounded 3D recognition. (2023). ICCV 2023 Workshop paper.