University of California, Santa Barbara
Suman Sirimulla, Computational Science
Faculty Sponsor's Department(s):
Materials Research Laboratory
Increasing the Accuracy of Anti-SARS-CoV-2 Assay Prediction Models Using Multi-Layer Combinatorial Fusion Methods
Testing and discovering pharmaceutical molecules is slow and expensive, and the sheer number of molecules that could possess desirable pharmaceutical properties makes exhaustive screening seemingly impossible. In situations like the ongoing SARS-CoV-2 pandemic, time is extremely valuable, which demands a more efficient and practical method of drug identification. Machine learning algorithms, such as those implemented in scikit-learn, provide an economically viable and relatively fast alternative for identifying which molecules are most likely to possess the desired properties. Using data from PubChem and open-source cheminformatics tools like RDKit, chemical fingerprints, physicochemical properties, and pharmacophoric properties can be extracted and used to train supervised learning models that identify molecules of interest. This process can be expedited by applying unsupervised learning, via clustering, to preprocess the data and eliminate molecules that are unlikely to possess properties of interest. While machine learning provides a baseline prediction, our previous models were only approximately 60% accurate on our test sets (KC), leaving room for improvement. We therefore propose using multi-layer combinatorial fusion (MCF) to train more accurate models. MCF builds on combinatorial fusion analysis (CFA), which employs multiple scoring systems (MSS), the rank-score characteristic (RSC) function, and cognitive diversity (CD). MCF may provide a more accurate model that can better identify molecules that have anti-SARS-CoV-2 traits.
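The CFA ingredients named above can be illustrated with a minimal sketch. It is not the models or the fusion pipeline used in this work. It assumes each scoring system is simply a list of scores over the same molecules, uses min-max normalization, and takes one common form of cognitive diversity (the root-mean-square gap between two RSC functions). The scores in `model_a` and `model_b` are hypothetical, and all function names are chosen here for illustration.

```python
from math import sqrt

def normalize(scores):
    """Min-max scale scores to [0, 1] (an assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def ranks(scores):
    """Rank 1 is the highest score; ties keep their original order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def rsc(scores):
    """Rank-score characteristic: normalized score found at rank 1..n."""
    return sorted(normalize(scores), reverse=True)

def cognitive_diversity(scores_a, scores_b):
    """RMS difference between the RSC functions of two scoring systems."""
    fa, fb = rsc(scores_a), rsc(scores_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))

def score_combination(systems):
    """Average the normalized scores of several systems per molecule."""
    norm = [normalize(s) for s in systems]
    return [sum(col) / len(col) for col in zip(*norm)]

def rank_combination(systems):
    """Average the ranks of several systems per molecule (lower is better)."""
    rk = [ranks(s) for s in systems]
    return [sum(col) / len(col) for col in zip(*rk)]

# Hypothetical activity scores from two models over the same four molecules.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.7, 0.1, 0.8, 0.5]

print(rank_combination([model_a, model_b]))
print(score_combination([model_a, model_b]))
print(cognitive_diversity(model_a, model_b))
```

In CFA, a larger cognitive diversity between two reasonably accurate scoring systems is generally taken as a sign that combining them is more likely to help, and the choice between score and rank combination is made empirically.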