Prediction of LC50 Value using Quantitative Structure-Activity Relationship (QSAR) Models
Project Statement
Quantitative structure-activity relationship (QSAR) modeling is the construction of predictive models that express the biological activities of a compound library as a function of the compounds' structural and molecular information. The molecular parameters typically used to account for electronic properties, hydrophobicity, steric effects, and topology can be determined empirically through experimentation or theoretically via computational chemistry.
A given compilation of data (a set of multiple molecular descriptor values) is then subjected to data preprocessing and data modeling using statistical and/or machine learning techniques. Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models make it possible to predict the activities/properties of a given compound as a function of its molecular substituents. Essentially, new and untested compounds whose molecular features are similar to those of the compounds used to develop a QSAR/QSPR model are assumed to possess similar activities/properties as well.
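The similarity principle in the last sentence can be illustrated with k-nearest-neighbour regression: an untested compound is assigned the mean activity of the k training compounds closest to it in descriptor space. This is a minimal sketch for illustration only, not the model used in this project; the function name `knn_predict`, the use of Euclidean distance on standardized descriptors, and the default `k=3` are all assumptions.

```python
import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: predict the activity of x_new as the mean activity
    of its k nearest training compounds in standardized descriptor space."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Standardize each descriptor so no single scale dominates the distance
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant descriptors
    Z_train = (X_train - mean) / std
    z_new = (x_new - mean) / std

    # Euclidean distance from the new compound to every training compound
    dist = np.linalg.norm(Z_train - z_new, axis=1)
    nearest = np.argsort(dist)[:k]

    # Similar structures are assumed to have similar activities
    return y_train[nearest].mean()


# Toy example with two hypothetical descriptors per compound
X_demo = [[0, 0], [0, 1], [10, 10], [10, 11]]
y_demo = [1.0, 2.0, 5.0, 6.0]
print(knn_predict(X_demo, y_demo, [0, 0.5], k=2))
```

In the toy call, the new compound sits between the first two training compounds, so its prediction is the average of their activities.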
The construction of a QSAR/QSPR model typically comprises two main steps: (i) description of the molecular structure and (ii) multivariate analysis that correlates the molecular descriptors with the observed activities/properties. An essential preliminary step in model development is data understanding. Data preprocessing and statistical evaluation are further intermediate steps that are crucial for the successful development of such models.
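The multivariate-analysis and statistical-evaluation steps can be sketched with multiple linear regression, a classic QSAR technique, evaluated on held-out compounds. This is a minimal sketch assuming synthetic data: the 200 compounds, the coefficients, the noise level, and the 80/20 split are hypothetical stand-ins, not this project's dataset or results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 compounds x 6 descriptors with a synthetic linear response
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([0.8, -0.5, 0.3, 0.0, 0.2, 1.1])
y = X @ true_coef + 2.0 + rng.normal(scale=0.3, size=200)

# Hold out 20% of the compounds for external validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Multivariate step: correlate the descriptors with the activity
mlr = LinearRegression().fit(X_train, y_train)
y_pred = mlr.predict(X_test)

# Statistical evaluation on compounds the model has not seen
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Test R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
print("Coefficients:", np.round(mlr.coef_, 2))
```

Evaluating on a held-out set rather than the training data gives a more honest estimate of how the model will behave on new, untested compounds.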
The goal here is to build an end-to-end automated machine learning model that predicts the LC50 value (the concentration of a compound that causes 50% lethality of fish in a test batch over a 96-hour exposure) from six given molecular descriptors.
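One way to make such a pipeline "automated" is to cross-validate several candidate regressors, keep the best one, and persist it for later prediction. The following is a minimal sketch under stated assumptions: the synthetic six-descriptor data, the two candidates (ridge regression and a random forest), the 5-fold R² criterion, and the file name `lc50_model.joblib` are all hypothetical choices, and the project's actual models (the library list also includes xgboost and keras) may differ.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: 6 descriptor columns and an LC50 target
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = 3.0 + X[:, 0] - 0.5 * X[:, 1] + 0.4 * X[:, 2] ** 2 + rng.normal(scale=0.3, size=300)

# Candidate models; the ridge pipeline re-fits its scaler inside each CV fold
candidates = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Automated selection: keep the candidate with the best mean 5-fold CV R^2
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)

# Persist the fitted model so it can be reloaded for prediction
model_path = os.path.join(tempfile.gettempdir(), "lc50_model.joblib")
joblib.dump(best_model, model_path)
reloaded = joblib.load(model_path)

print({k: round(v, 3) for k, v in scores.items()}, "->", best_name)
print("Predicted LC50 for first compound:", reloaded.predict(X[:1])[0])
```

Wrapping the scaler and regressor in one pipeline keeps preprocessing and model together, so the saved file can be applied directly to raw descriptor values.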
Reference: Isarankura-Na-Ayudhya C, Naenna T, Nantasenamat C, Prachayasittikul V. A practical overview of quantitative structure-activity relationship.
Project Demonstration
Data Used
Programming Languages Used
Python
Python Libraries Used
dvc==2.45.1
ipykernel==6.9.1
ipython==8.1.1
joblib==1.2.0
keras==2.9.0
numpy==1.24.2
pandas==1.5.3
PyYAML==6.0
scikit-learn==1.2.1
seaborn==0.12.2
xgboost==1.7.4
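The pins above use pip's `name==version` requirements format, so if they are saved to a requirements file they can be installed with `pip install -r`. As a minimal sketch, an existing environment can also be checked against them with the standard library's `importlib.metadata`; the helper names `parse_pins` and `check_environment` are hypothetical, and the comparison is a plain exact-string match.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above
PINNED = """\
dvc==2.45.1
ipykernel==6.9.1
ipython==8.1.1
joblib==1.2.0
keras==2.9.0
numpy==1.24.2
pandas==1.5.3
PyYAML==6.0
scikit-learn==1.2.1
seaborn==0.12.2
xgboost==1.7.4
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "==" in line:
            name, ver = line.split("==", 1)
            pins[name.strip()] = ver.strip()
    return pins


def check_environment(pins):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in check_environment(parse_pins(PINNED)).items():
        print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```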