None. That said, the referenced article "Generative AI Must Disclose Its Training Data" (https://www.hani.co.kr/arti/economy/economy_general/1128825.html) is worth reading.
VGGNet: https://arxiv.org/abs/1409.1556
ResNet: https://arxiv.org/abs/1512.03385
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization: https://arxiv.org/abs/1610.02391
Grad-CAM++: https://arxiv.org/abs/1710.11063
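The Grad-CAM computation in the two papers above is compact enough to sketch. A minimal numpy version, assuming the feature maps and their gradients have already been extracted from a network (both arrays here are placeholders, not a real model):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from precomputed tensors.
    activations, gradients: (K, H, W) last-conv feature maps and
    the gradients of the target class score w.r.t. them."""
    weights = gradients.mean(axis=(1, 2))                  # alpha_k: global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=(0, 0))  # weighted sum over the K channels
    cam = np.maximum(cam, 0.0)                             # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                                   # normalize to [0, 1] for visualization
    return cam
```

In practice the activations and gradients come from a forward/backward hook on the last convolutional layer; Grad-CAM++ replaces the plain gradient average with higher-order weighting.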
mixup: Beyond Empirical Risk Minimization: https://arxiv.org/abs/1710.09412
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features: https://arxiv.org/abs/1905.04899
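The mixup idea above is simple enough to sketch directly; a minimal numpy version, assuming a pair of inputs with one-hot labels (function and argument names are illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels with a Beta-sampled weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)      # mixing coefficient lambda in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2   # convex combination of the inputs
    y = lam * y1 + (1.0 - lam) * y2   # same combination of the (soft) labels
    return x, y, lam
```

CutMix uses the same soft-label idea but pastes a rectangular patch of one image into the other instead of blending pixel values, with lambda set to the patch's area ratio.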
Fully Convolutional Networks: https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf
R-CNN : https://arxiv.org/abs/1311.2524
Focal Loss (RetinaNet): https://arxiv.org/abs/1708.02002
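The focal loss from the RetinaNet paper is a one-liner worth remembering: FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). A binary numpy sketch (illustrative, not the paper's reference code):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: the (1 - p_t)^gamma factor down-weights easy examples.
    p: predicted probability of the positive class; y: 0/1 label."""
    p_t = np.where(y == 1, p, 1.0 - p)              # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With gamma = 0 it falls back to (alpha-weighted) cross-entropy; larger gamma suppresses the loss of well-classified examples, which is the point for dense detectors with extreme foreground/background imbalance.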
Panoptic Segmentation: https://arxiv.org/abs/1801.00868
Uni-DVPS: (not publicly available -> can be obtained via school library access.)
Grounded SAM : https://arxiv.org/abs/2401.14159
Real-World Single Image Super-Resolution: A New Benchmark and A New Model (https://arxiv.org/abs/1904.00523)
Real-World Blur Dataset for Learning and Benchmarking Deblurring Algorithms (https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123700188.pdf)
Blind Super-Resolution Kernel Estimation using an Internal-GAN (https://arxiv.org/abs/1909.06581)
(Recent motion-estimation paper) SpatialTracker: Tracking Any 2D Pixels in 3D Space (https://arxiv.org/abs/2404.04319)
SRGAN : https://arxiv.org/abs/1609.04802
Learning Blind Video Temporal Consistency: https://openaccess.thecvf.com/content_ECCV_2018/papers/Wei-Sheng_Lai_Real-Time_Blind_Video_ECCV_2018_paper.pdf
CLIP huggingface implementation: https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_clip.py
ImageBIND official implementation: https://github.com/facebookresearch/ImageBind
LanguageBIND: https://arxiv.org/abs/2310.01852
ZeroCap (zero-shot image captioning): https://arxiv.org/abs/2111.14447
StyleGAN: https://arxiv.org/pdf/1812.04948
ImageBind: https://arxiv.org/abs/2305.05665
DALL-E 2: https://arxiv.org/abs/2204.06125
Flamingo pytorch implementation: https://github.com/lucidrains/flamingo-pytorch/blob/main/flamingo_pytorch/flamingo_pytorch.py
Flamingo : https://arxiv.org/abs/2204.14198
LLaVA: https://llava-vl.github.io/
Supplementary papers on Q-Former training (recommended to read in order)
-- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (https://arxiv.org/abs/2201.12086)
-- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (https://arxiv.org/abs/2301.12597)
Visual Programming: https://arxiv.org/abs/2211.11559
LDM (Stable Diffusion): https://arxiv.org/abs/2112.10752
PixelRNN: https://arxiv.org/abs/1601.06759
ControlNet: https://arxiv.org/abs/2302.05543
Marigold(Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation):
https://arxiv.org/abs/2312.02145
[3D MACHINE LEARNING] - 3D DATA REPRESENTATIONS: https://www.antoinetlc.com/blog-summary/3d-data-representations
Mesh R-CNN : https://arxiv.org/abs/1906.02739
3DGS (3D Gaussian Splatting): https://arxiv.org/abs/2308.04079
DreamFusion: https://arxiv.org/abs/2209.14988
Structure-from-Motion Revisited:
https://openaccess.thecvf.com/content_cvpr_2016/papers/Schonberger_Structure-From-Motion_Revisited_CVPR_2016_paper.pdf
shapeNet : https://arxiv.org/abs/1512.03012
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
https://arxiv.org/abs/1804.01654
Loper et al., SMPL: A Skinned Multi-Person Linear Model: SIGGRAPH 2015.
https://dl.acm.org/doi/10.1145/2816795.2818013
Bogo et al., Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image: ECCV 2016.
https://arxiv.org/abs/1607.08128
Anguelov et al., SCAPE: Shape Completion and Animation of People: SIGGRAPH 2005.
https://dl.acm.org/doi/10.1145/1073204.1073207
Fast R-CNN
https://arxiv.org/abs/1504.08083
Faster R-CNN
https://arxiv.org/abs/1506.01497
Mask R-CNN
https://arxiv.org/abs/1703.06870
Live2Diff
https://arxiv.org/pdf/2407.08701
Meta-learning (MAML)
https://arxiv.org/pdf/1703.03400
Gaussian Mixture
https://arxiv.org/pdf/1711.06929
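The standard way to fit a Gaussian mixture is EM, which is worth having in muscle memory before reading mixture-model papers. A 1-D sketch of one EM update (generic textbook EM, not any specific paper's method; names are illustrative):

```python
import numpy as np

def em_step(x, pi, mu, var):
    """One EM update for a 1-D Gaussian mixture.
    x: (N,) data; pi, mu, var: (K,) mixture weights, means, variances."""
    # E-step: responsibilities r[n, k] proportional to pi_k * N(x_n | mu_k, var_k)
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = pi * dens
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the responsibilities
    nk = r.sum(axis=0)
    pi_new = nk / len(x)
    mu_new = (r * x[:, None]).sum(axis=0) / nk
    var_new = (r * (x[:, None] - mu_new) ** 2).sum(axis=0) / nk
    return pi_new, mu_new, var_new
```

Iterating this monotonically increases the data log-likelihood; real implementations work in log space and add a variance floor for numerical stability.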
Multilabel Image Classification Using Deep Learning
https://www.mathworks.com/help/deeplearning/ug/multilabel-image-classification-using-deep-learning.html
Fine-Grained Image Analysis with Deep Learning: A Survey
https://arxiv.org/abs/2111.06119
Few-shot Classification
https://arxiv.org/pdf/2303.07502
One-shot Classification
https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf
Skin color-based classification
https://arxiv.org/pdf/1708.02694
A survey on Image Data Augmentation for Deep learning
https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0
AutoAugment: Learning Augmentation Policies from Data
https://arxiv.org/abs/1805.09501
RandAugment: Practical automated data augmentation with a reduced search space
https://arxiv.org/abs/1909.13719
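RandAugment's whole point is collapsing AutoAugment's policy search to two scalars: how many ops to apply (N) and one global magnitude (M). A toy numpy sketch with a placeholder op pool (the real method uses ~14 standard image ops):

```python
import numpy as np

def rand_augment(img, ops, n=2, magnitude=0.5, rng=None):
    """Apply n randomly chosen ops at one global magnitude (RandAugment's two knobs)."""
    rng = rng or np.random.default_rng()
    for i in rng.choice(len(ops), size=n):
        img = ops[i](img, magnitude)
    return img

# toy op pool for images with values in [0, 1]; illustrative only
ops = [
    lambda im, m: np.flip(im, axis=1),          # horizontal flip (magnitude unused)
    lambda im, m: np.clip(im * (1 + m), 0, 1),  # brightness-like scaling
    lambda im, m: np.rot90(im),                 # 90-degree rotation
]
```

N and M are then tuned with a plain grid search instead of the RL-based policy search that AutoAugment needs.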
A ConvNet for the 2020s
https://arxiv.org/abs/2201.03545
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
https://arxiv.org/abs/2010.11929
CoAtNet: Marrying Convolution and Attention for All Data Sizes
https://arxiv.org/abs/2106.04803
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
https://arxiv.org/abs/2103.10697
CNN for NLP
https://emnlp2014.org/papers/pdf/EMNLP2014181.pdf
DataPerf: Benchmarks for Data-Centric AI Development
https://arxiv.org/abs/2207.10062
Expectation-Maximization Attention Networks for Semantic Segmentation
https://openaccess.thecvf.com/content_ICCV_2019/papers/Li_Expectation-Maximization_Attention_Networks_for_Semantic_Segmentation_ICCV_2019_paper.pdf
Generalized Video Deblurring for Dynamic Scenes
https://arxiv.org/abs/1507.02438
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
https://arxiv.org/abs/1701.06538
From Sparse to Soft Mixtures of Experts
https://arxiv.org/abs/2308.00951
It's hard to review every one of these papers, but they are all excellent, so try to read them all if at all possible..!
The further I go, the harder things are to understand without reading the papers..? Lectures alone probably aren't enough, haha.
Grad-CAM
https://arxiv.org/abs/1610.02391
Grad-CAM++
https://arxiv.org/abs/1710.11063
Saliency map
https://arxiv.org/abs/1312.6034
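The vanilla saliency map from the Simonyan et al. paper above is just the input gradient. A numpy sketch, assuming the gradient of the class score w.r.t. the input image has already been computed by backprop (the array here is a placeholder):

```python
import numpy as np

def saliency_map(input_grad):
    """Vanilla saliency: per-pixel max of |d(score)/d(input)| over color channels.
    input_grad: (C, H, W) gradient of the class score w.r.t. the input image."""
    return np.abs(input_grad).max(axis=0)
```

This is the baseline that Grad-CAM and perturbation methods like IGOS++ improve on; raw input gradients tend to be noisy and scattered.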
IGOS++
https://arxiv.org/abs/2012.15783
Code: https://github.com/khorrams/IGOS_pp