Small Molecule Drug Development for the BRAF V600 Mutation

San Jose State University, December 2022

KEYWORDS

BRAF-V600E, Machine Learning, SVM, Random Forest Classifier, QuaSAR

ABSTRACT

This report presents findings on the use of computational (in-silico) methods to find therapeutic targets. Such methods allow the massive amounts of data currently available to be integrated effectively and the effectiveness of a candidate molecule to be predicted accurately, here for molecules that could potentially inhibit the expression of the most common B-Raf Proto-Oncogene, Serine/Threonine Kinase (BRAF) mutation. To find small chemical molecules that may prevent the expression of this most prevalent BRAF oncogenic mutation, machine-learning algorithms such as the Support Vector Machine (SVM) were used. An SVM classifies data points by fitting a separating hyperplane whose position is determined by the support vectors, and it is widely used for classification. Complemented with a Random Forest Classifier, the linear SVM model was trained on a dataset of 243 different compounds and achieved, on average across 50 independent iterations, 0.976 precision, 0.975 recall, 0.966 accuracy, and a 0.962 area under the receiver operating characteristic curve. Ten features were present in all 50 iterations, which provides computational evidence that these features directly affect the model's classification. The model is not limited to identifying compounds: it can also be used to determine whether a given feature truly affects the classification, and therefore whether a QuaSAR descriptor truly correlates with the potential of a compound to inhibit the expression of the BRAF mutation. Performance was consistently high in every iteration. Future work will implement an improved feature selection process to bring performance closer to perfect, carry out a deeper analysis of feature importances, and evaluate alternative classification models.
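The two summary statistics reported above, metrics averaged over independent iterations and the set of features common to every iteration, can be illustrated with a minimal Python sketch. It is not the report's implementation: the function names, the binary label convention (1 = predicted inhibitor), the descriptor names, and the toy data are all assumptions chosen for illustration, and the area under the ROC curve is omitted for brevity.

```python
from statistics import mean


def classification_metrics(y_true, y_pred):
    """Precision, recall, and accuracy for binary labels (assumed: 1 = inhibitor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


def average_metrics(per_iteration):
    """Average each metric over a list of per-iteration metric dictionaries."""
    return {k: mean(m[k] for m in per_iteration) for k in per_iteration[0]}


def common_features(selected_per_iteration):
    """Features that were selected in every single iteration."""
    return set.intersection(*(set(s) for s in selected_per_iteration))


# Hypothetical toy data for two iterations; these are not the study's results.
runs = [
    classification_metrics([1, 1, 0, 0], [1, 1, 0, 1]),
    classification_metrics([1, 0, 1, 0], [1, 0, 1, 0]),
]
averaged = average_metrics(runs)

# Placeholder descriptor names, assumed for illustration only.
shared = common_features([["logP", "TPSA", "weight"], ["TPSA", "logP", "rings"]])
```

In a full pipeline, each iteration would supply its own held-out predictions and its own list of selected descriptors; intersecting those lists over all 50 iterations is what would yield a stable set of features like the ten reported here.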
