Construction of a mathematical model relating a molecular struture to a chemical property or biological effect by means of statistical techniques
Workflow for predictive QSAR modeling
QSAR model을 통해 virtual screening & Molecluar design 수행
*curated data를 만들어주는 프로그램들도 있다. [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00456-1]
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5992871/]
removal of mixutre, inorganic : 혼합화합물 제외, 무기화합물 제외
Structural conversion cleaning/removal of salts: 염화수소 제외..?
어떤 질병 target에 binding 하는지에 대한 정보
*Any form of experimental observation, not limited to biological activites
observations are consistent (preferably obtained from a single experimental source)
다른 assay에서 얻어진 data는 가능한한 단일 모델에 combining하면 안된다.
⇒ Data consistency checking! 서로 다른 기관에서 얻어진 데이터는 consistency가 떨어진다.
A QSAR model can be expected to provide predictions on compound that are represented by the structures in the training set.
적용이 가능한가?
데이터셋의 사이즈는 너무 작으면 안됨
When a dataset includes more compounds
Select a diverse subset of compounds
Cluster a dataset and build models separately for each cluster
In case of classification or category QSAR when compounds belong to a small data number of activity classes or categories, it is possible to exclude many compounds from model development.
Lower limit
Continuous repose variable (activity)
the number of compounds in the traning set should be at least 20
about 10 compounds should be in each of the test and external evlaution set
Classification or category reponse variable
traing : at least 10 compounds of each class
test and external evfaluation less than 5 compounds
Activity values
Continuous response variable
Total range of activities: at least five times higher than the experimental error →전체 데이터 범위가 실험오류의 범위보다 5배 이상 정도 커야한다.?
No large gaps : that exceed 10%-15% of the entire range of activities between two consecutive values of activities
Classification or category QSAR
적어도 20개 compound of each class
class간 compound 개수는 같을수록 좋다.
(*Ligand-based VS)
From Chemical Library ——————— Virtual Screening ———————> Potential Hits
## in virtual screening
1. Empirical Rules
: 신약이 될 가능성이 전혀 없는것들은 걸러냄 (ex) Ro5, QED
2. Chemical Similarity Filter
: Similarity가 너무 높거나 낮아서 QSAR모델을 적용하기 어려운 것들을 filtering
3. QSAR-based Filters
4. Feasibility/Availability
: SAS(synthetic Accessibility) 등
From 이미 알려져있는 약물
약물에 대해 각각의 Target에 대한 QSAR 모델을 테스트하면, 어떤 target에 binding하는 지 알 수 있다.
Specificity를 높이고, off-target에 대한 binding affinity를 낮출 수 있도록 lead optimization 가능
출처 : LAIDD 강의