Under copyright
Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
Use: This work is available from the UC San Diego Library. This digital copy of the work is intended to support research, teaching, and private study.
The COVID-19 pandemic has wreaked havoc worldwide and caused the deaths of millions. Scientists are working around the clock to understand the mechanisms through which the SARS-CoV-2 virus infects the human body, aided by complex molecular dynamics simulations. Scientists at the Amaro Lab at the University of California San Diego have simulated the spike protein of the SARS-CoV-2 virus. The spike is understood to bind the ACE2 receptor on the human cell when it is in the open state, with its receptor binding domain in the up position. The aim of this project is to use machine learning techniques to help the Amaro Lab use their simulation data to understand the mechanisms by which the SARS-CoV-2 spike protein enters the open state and infects the human cell. Ultimately, a Stochastic Gradient Descent classifier was successfully trained to predict when the spike was in the open state, with precision and recall both greater than 0.99. This success indicates that machine learning can be used to mine spike simulation data for insights into the substructures relevant to SARS-CoV-2 infection and to discover therapeutic targets for drug development.
Research Data Curation Program, UC San Diego, La Jolla, 92093-0175 (https://lib.ucsd.edu/rdcp)
Montgomery, Mary Kate; Arora, Shitiz; Hope-Bell, Eleanor; Sun, Hao; Xue, Kayla; Amaro, Rommie; Altintas De Callafon, Ilkay (2022). Predicting Effects of SARS-CoV-2 Variant Mutations on Spike Protein Dynamics and Mechanism. In Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0ZP4692
Scripts: FinalExtractedFeature.csv contains added feature extraction scripts.
Input data: For load_data_test.ipynb, dropped density features and expanded RBD neighborhood to 40 nm, then re-ran from feature extraction.
Output data: For spike_dashboard.py, changed folder parsing to only list datasets that have had features extracted.
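The abstract reports a classifier trained by stochastic gradient descent and scored by precision and recall. As a rough illustration of what those terms mean, the sketch below trains a small logistic-loss linear classifier with per-sample gradient updates and then computes both metrics. It is not the project's code: the data are synthetic, the two features are invented, and every function name here is hypothetical. The record does not say which library or loss the authors used.

```python
import math
import random


def make_synthetic_frames(n, rng):
    """Generate toy (features, label) pairs; label 1 stands in for 'open'."""
    frames = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Two made-up features whose means differ between the two states.
        center = 2.0 if label == 1 else -2.0
        features = [rng.gauss(center, 1.0), rng.gauss(-center, 1.0)]
        frames.append((features, label))
    return frames


def linear_score(weights, bias, features):
    """Weighted sum of the features plus the bias term."""
    return bias + sum(w * x for w, x in zip(weights, features))


def train_sgd(frames, rng, epochs=5, learning_rate=0.1):
    """Fit a logistic-loss linear classifier, updating after every sample."""
    weights = [0.0] * len(frames[0][0])
    bias = 0.0
    order = list(frames)  # shuffle a copy so the caller's list is untouched
    for _ in range(epochs):
        rng.shuffle(order)
        for features, label in order:
            # Clip the score so math.exp cannot overflow.
            score = max(-30.0, min(30.0, linear_score(weights, bias, features)))
            prob = 1.0 / (1.0 + math.exp(-score))
            error = prob - label  # gradient of the log loss w.r.t. the score
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def precision_recall(frames, weights, bias):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = fp = fn = 0
    for features, label in frames:
        guess = 1 if linear_score(weights, bias, features) > 0 else 0
        if guess == 1 and label == 1:
            tp += 1
        elif guess == 1 and label == 0:
            fp += 1
        elif guess == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


rng = random.Random(0)  # fixed seed so the run is repeatable
train_frames = make_synthetic_frames(800, rng)
test_frames = make_synthetic_frames(200, rng)
weights, bias = train_sgd(train_frames, rng)
precision, recall = precision_recall(test_frames, weights, bias)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision is the share of frames predicted open that really are open, and recall is the share of truly open frames that the classifier finds. Scores on this toy data say nothing about the spike simulation results reported above.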
Type
dataset
Identifier
ark:/20775/bb8610679f
Language
English
Subject
Machine learning; Molecular dynamics simulations; Drug development; Capstone projects; Kubernetes; Stochastic Gradient Descent classifier; Spike protein; Data Science & Engineering Master of Advanced Study (DSE MAS); Receptor-binding domain; COVID-19; DSE MAS - 2022 Cohort