Before: RNN/LSTM are slow to train. Can we parallelize sequential data?

Transformers
- The input sequence can be processed **in parallel**
- No concept of a time step
- Pass all the words simultaneously and compute the word embeddings simultaneously (an RNN passes input words one after another)
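A minimal PyTorch sketch of the contrast (the tensor sizes here are arbitrary placeholders, not from any real model): the RNN must loop over time steps because step t depends on step t-1, while a Transformer encoder layer consumes the whole sequence in one call.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 5, 16)  # batch of 2 sequences, 5 tokens, 16-dim embeddings

# RNN: tokens are consumed one time step at a time; the loop is inherent.
rnn = nn.RNN(input_size=16, hidden_size=16, batch_first=True)
h = torch.zeros(1, 2, 16)            # initial hidden state
for t in range(x.size(1)):           # sequential: step t depends on step t-1
    out, h = rnn(x[:, t:t+1, :], h)

# Transformer encoder layer: every token attends to every other token in a
# single pass, so the whole sequence is processed in parallel.
encoder = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
y = encoder(x)                       # shape (2, 5, 16), computed in one shot
print(y.shape)
```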
"CoCa: Contrastive Captioners are Image-Text Foundation Models" History of Vision and Language training Vision pretraining pretrain ConvNets or Transformers on large-scale data such as ImageNet, Instagram to solve visual recognition problem these models only learn modes for the vision modality-> not applicable to joint reasoning task over both image and text inputs Vision-Lan
- Models
  1. Initial "mymodel.yaml"
  2. "convnext_base.yaml": I followed the same hyperparameters as convnext-base-224 fine-tuned on ImageNet annotations. Learning rate
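Loading hyperparameters from such a YAML file is straightforward with PyYAML; here is a hedged sketch where every field name and value is an illustrative placeholder, not the actual contents of "convnext_base.yaml":

```python
import yaml

# Hypothetical stand-in for "convnext_base.yaml"; the keys and values below
# are assumptions for illustration, not the real config.
cfg_text = """
model: convnext_base
image_size: 224
optimizer: adamw
learning_rate: 5.0e-5
weight_decay: 0.05
batch_size: 32
epochs: 30
"""

cfg = yaml.safe_load(cfg_text)
print(cfg["model"], cfg["learning_rate"])  # convnext_base 5e-05
```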
Batch size and learning rate (https://openreview.net/pdf?id=B1Yy1BxCZ)

1. Batch size
   - small: converges quickly, at the cost of noise in the training process
   - large: converges slowly, but with accurate estimates of the error gradient
2. Learning rate
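A common heuristic that ties these two hyperparameters together is linear scaling: when the batch size grows by a factor of k, grow the learning rate by k as well. A minimal sketch, where the base values and the placeholder model are assumptions rather than tuned settings:

```python
import torch

base_lr = 0.1          # learning rate tuned at the reference batch size
base_batch_size = 256  # reference batch size that base_lr was tuned for

def scaled_lr(batch_size: int) -> float:
    """Linear scaling: learning rate grows proportionally with batch size."""
    return base_lr * batch_size / base_batch_size

model = torch.nn.Linear(10, 2)  # placeholder model
for bs in (64, 256, 1024):
    opt = torch.optim.SGD(model.parameters(), lr=scaled_lr(bs))
    print(bs, opt.param_groups[0]["lr"])  # 0.025, 0.1, 0.4
```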
Hyperparameter Tuning: the process of finding the right combination of hyperparameters that maximizes model performance.

Hyperparameter tuning methods:
- Random Search / Grid Search: each iteration tries a combination of hyperparameters in a specific order, fits the model on each combination, records the model performance, and returns the best model with the best hyperparameters.
- Bayesian Optimization: Tree-structured Parzen Estimators (TPE)
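A minimal scikit-learn sketch of the grid vs. random search workflow described above; the dataset, model, and parameter grid are illustrative choices, not from these notes.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_digits(return_X_y=True)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
}

# Grid search: exhaustively tries every combination, fits a model on each,
# records the cross-validated score, and keeps the best hyperparameters.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

# Random search: samples a fixed number of combinations at random instead
# of trying them all, which is cheaper when the grid is large.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=5, cv=3, random_state=0)
rand.fit(X, y)
print(rand.best_params_, rand.best_score_)
```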
The second GDSC-ML session was about converting a Jupyter Notebook MNIST CNN model into Python scripts. Like most people, I was used to doing ML projects in Jupyter notebooks. They have the big advantage that I can validate and check the code easily by just pressing Shift + Enter. But Jupyter Notebooks have some drawbacks in data science projects: they get unorganized, and it is hard to keep track of what I wrote.
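As a hedged sketch of what the converted script might look like, here is a minimal train.py; the file name, CNN layout, and CLI flags are my assumptions for illustration, not the session's actual code.

```python
# train.py: the notebook cells become named functions, and argparse replaces
# hand-edited cells, so runs are reproducible from the command line.
import argparse

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


def build_model() -> nn.Module:
    """Small CNN for MNIST (1x28x28 grayscale -> 10 classes)."""
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(64 * 7 * 7, 10),
    )


def train(epochs: int, batch_size: int, lr: float) -> None:
    data = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    model = build_model()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--epochs", type=int, default=1)
    p.add_argument("--batch-size", type=int, default=64)
    p.add_argument("--lr", type=float, default=1e-3)
    args = p.parse_args()
    train(args.epochs, args.batch_size, args.lr)
```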