AI for Generative Art: create animated mangas with AI & image processing

Hyun Lee·2023년 7월 6일
0

University

목록 보기
4/9

< Python Basics for Data Science >

1. Python Basics

2. Python Functions and Modules

3. Python Libraires

Numpy

: provides vectorization of mathematical operations on arrays and
matrices which significantly improves the performance

Pandas

: provides tools for data manipulation: reshaping, merging,
sorting, slicing, aggregation etc.

< Data Science and Machine Learning >

1. Data Science and Machine Learning

Data Science

  • Data Understanding & Data Acquisition
    - Data Cleaning & Data Transformation
    - Model Building
    - Evaluation
    - Deployment
  • Data Preparation

    Data Collection
    Data Processing
    Missing and Repeated Values
    Outliers and Errors
    Data Preparation: Data Transformation
    Scaling

  • Training data and test data

Machine Learning

  • Classification: redict answers to Yes/No questions

    Evaluation of Classification

  • Regression: Predict real values

    Linear Regression
    Evaluation of Regression

  • Clustering: Find patterns of similar objects

  • Supervised Learning
    1. Binary classification:
      1. Multiclass classification
      2. Multilabel classification
  • Unsupervised learning ( Clustering )

  • Training, Validation and Test data

  • Cross Validation

  • Overfitting

2. DATA PREPROCESSING FOR MACHINE LEARNING

3. MACHINE LEARNING MODELS

< Deep Learning for Computer Vision >

1 . Deep Learning for Computer Vision

Deep Learning

  • The Neuron

  • activation function
    : 네트워크에 non- linear properties를 적용해주는 역할

  • Softmax Activation Function

  • Training Process

  • Loss Function ( Cost Function )
    error를 measure하기 위해 사용

  • Gradient Descent
    The Loss/Cost is used to calculate the gradients. And gradients are used to update the weights of the Neural network.

  • Training a Deep Learning Model

  • Transfer Learning

    이미 학습된 모델의 지식을 다른 관련 작업에 적용하는 것을 의미. 많은 작업에서는 이러한 양의 데이터를 확보하기가 어렵거나 시간과 비용이 많이 드는 경우가 있습니다. 이때 Transfer learning은 유용한 방법이 될 수 있다

  • Fine-tuning

    Fine-tuning은 전이 학습(transfer learning) 기법의 한 형태로, 사전 학습된 모델을 새로운 작업에 맞게 조정하는 과정을 말합니다. 이 과정에서는 사전 학습된 모델의 일부 레이어 또는 가중치를 동결(freeze)하고, 나머지 부분을 새로운 데이터셋으로 다시 학습시킵니다. Fine-tuning의 목적은 사전 학습된 모델이 이미 학습한 일반적인 특징을 유지하면서, 새로운 작업에 필요한 특징을 미세하게 조정하는 것이다.
    freeze는 copy하는 것이고 fine-tuning은 regerate하는 것임.

Computer Vision

  • Image Classification

  • Object Detection

  • Semantic Segmentation

  • Instance Segmentation

    instance means object

CNN (Convolutional neural netwokrks)


A convolutional layer applies a series of convolutional filters (also known as kernels) to the input data.
Each filter is a small matrix of weights that is convolved with the input data, performing a mathematical operation called convolution

The Kernel
Conventionally, the first ConvLayer is responsible for capturing the Low-Level features such as edges, color, gradient orientation, etc. With added layers, the architecture adapts to the High-Level features as well, giving us a network that has a wholesome understanding of images in the dataset.
gray scale feature: 2 dimentional(width & height). 1 channel
RGB feature: 3 channel . Height X Width X Channel

Pooling Layer

* dropout: to avoid overefitting, randomly abandon some neurons

Example of ConvNet

Pyramid architecture

Autoencoder
When training CNNs, one of the problems is that we need a lot of labeled data. In the case of image classification, we need to separate images into different classes, which is a manual effort.
=> However, we might want to use raw (unlabeled) data for training CNN feature extractors, which is called self-supervised learning. Instead of labels, we will use training images as both network input and output.
=> The main idea of autoencoder is that we will have an encoder network that converts input image into some latent space (normally it is just a vector of some smaller size), then the decoder network, whose goal would be to reconstruct the original image.

OpenCV

: OpenCV is a powerful image processing library written in C++, which has become the de factostandard for image processing. It has a convenient Python interface.

  • Loading Images

  • Resizing Images

  • Cropping Images

2 . ConvNet for image classification

< Generative AI for Computer Vision >

Generative AI

  • Generative VS Discriminative

GAN (Generative Adversarial Network)

signal means the losss
=> Discriminator : a classifier
=> Generator : Learning


  • Image-to-Image

  • Head Models

  • Image Synthesis
    $

  • Text-to-Image

  • Diffusion Model

  • Transformer

RNN (Recurrent Neural Network)

  • Encoder-Decoder

Attention

to decoder to flexibily use the most relevant parts of the input sequences
It combines weighted combinations of all the encoded input vectors, with the most relevant vectors being assigned the highest weights.
Attention provides a better measure of long-term dependencies and enables models to more efficiently memorize relevant information.

  • Scaled Dot-Product Attention

  • Self Dot-Product Attention

  • Self Attentio nand Cross Attention

  • Multo-head attention

Transformer


cross attention은 encoder -> decoder 사이

Vision Transformer (VIT)


MLP: multi layer perception, fully connected

DETR (Dectection Transformer)

Hugging Face

Hugging Face Website

Example Code

Stable Difusion for Image Generation

Diffusion Models

Stable Diffusion

  • Process

  • The Componet of Stable Diffusion

    다른 모델들과 다르게 stabel diffusion은 latent space를 밖으로 꺼내서

  1. Text encoder

  2. The UNet Noise Predictor

  3. The Auto Enscoder-Decoder Model

  • Training Stable diffusion

  • Stable diffusion Inference
    ( Inference: 이미 모델을 가지고 있음. training을 이미 끝냄 )

  • What is Diffusionn Anyway?

  • How diffusison works

  • How to generate images

  • Generate images with text

  • Prompts

  • LoRA models
    : LoRA (Low-Rank Adaptation) is a training technique for fine-tuning Stable Diffusion models.

  • ControlNet
    : ControlNet is a neural network model for controlling Stable Diffusion models. You can use ControlNet along with any Stable Diffusion models. With ControlNet, Stabel Diffusion users fianlly have a way to contrl where the subjects are and how they look with precision.
    Canny: edge detector

  • Stable Diffusion

    stabel diffusion
    github

AUTOMATIC1111

Text-to-image tab

Summary

  • Data Science is the discovery of hidden patterns from raw data using various tools, skills, and techniques.
  • If you want to answer a yes/no question, this is classification
  • If you want to predict a numerical value, this is regression
  • If you want to group observations into similar-looking groups, this is clustering
  • If you want to recommend someone an item (e.g., book/movie/product) based on ratings data from customers, this is a recommender system
  • Deep learning algorithms are constructed with connected layers.
    deep learning” is coined for machine learning models built with many hidden layers: deep neural networks.
  • The structure of a Neuron: linear func, activation func
  • By applying transfer learning to a new task, one can achieve significantly higher performance than training with only a small amount of data. Transfer learning is a machine learning approach where a pre-trained model's knowledge is leveraged to solve a new task, often requiring less data and training time.
  • Fine-tuning is the process of taking a pre-trained model and continuing its training on a new task with a smaller learning rate to adapt the model's parameters specifically to the new data.transforming an architecture to suit a new purpose based on an existing learned model and finely adjusting the weight of an already learned model.
  • Computer vision : image classification
  • A Convolutional Neural Network (CNN)is a type of deep learning neural network commonly used for analyzing visual data, such as images or videos. A convolutional layer applies a series of convolutional filters (also known as kernels) to the input data.
    Most of the CNNs used for image processing follow a so-called pyramid architecture.
  • Generative AI enables users to quickly generate new content based on a variety of inputs. Inputs and outputs to these models can include text, images, sounds, animation, 3D models, or other types of data.
  • Discriminative models are primarily used for classification tasks, where the goal is to assign labels to new, unseen instances based on their features
  • A Generative Adversarial Network (GAN) is a class of deep learning models where two neural networks, a generator and a discriminator, are pitted against each other in a competition,
  • A Recurrent Neural Network (RNN) is a type of deep learning model that is capable of processing sequential data by utilizing internal memory to retain information and make decisions based on previous inputs.
  • Attention is a mechanism in deep learning that allows a model to focus on specific parts of input data, assigning different weights to different elements, enabling better performance on tasks requiring context and long-range dependencies.
  • The Transformer is a deep learning model architecture based on the attention mechanism, designed to process sequential data by leveraging self-attention layers, making it highly effective for tasks such as machine translation and natural language understanding.
  • Vision Transformer (ViT) applies the transformer's self-attention mechanism to images by dividing them into patches and transforming them into sequences, allowing for effective image recognition and classification.
  • DETR ( detection Transformer) el is a computer vision model used for object detection in images
  • Hugging Face is a popular open-source library and platform that provides a wide range of natural language processing (NLP) models, tools, and resources
  • Diffusion Models are a method of creating data that is similar to a set of training data. They train by destroying the training data through the addition of noise, and then learning to recover the data by reversing this noising process.
  • stable Diffusion is a text-to-image AI model. It is trained on millions of image and text description pairs found on the internet.
    Text encoder/ Unet noise predictor/ auto encoder-decoder model
    Stable Diffusion works on the latent space of images rather than on the pixel space of images.
  • LoRA (Low-Rank Adaptation) is a training technique for fine-tuning Stable Diffusion models.
    LoRA models are small Stable Diffusion models that apply tiny changes to standard checkpoint models.
  • ControlNet is a neural network model for controlling Stable Diffusion models.

Machine Learning Intermediate

Preliminary of machine learning

Modle Space

Biad and Variance

  • Bias : is referring to the gap between predicted value of the data
    and actual value of the data.
  • Variance : refers to the model's sensitivity to small fluctuations or noise in the traning data.

Feature Space

Decision Boundary

Linear seperabel and Non-linear seperable

Model Genernality

Overfitting Underfitting

Cures of demensionality

Interpretability

Interpretability in machine learning refers to the ability to understand and explain how a machine learning model arrives at its predictions or decisions. It is an essential characteristic, especially in domains where decisions have significant
consequences, such as healthcare, finance, legal, and critical systems.

Probalistic perspecive

Parametric and Non-parametreic

Problaistic perspective

Optimization

Regularigation

profile
01 | Computer Science.

0개의 댓글