Under copyright
Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
Use: This work is available from the UC San Diego Library. This digital copy of the work is intended to support research, teaching, and private study.
Biologists work with a multitude of protein sequences, each represented as a string of letters. Because a protein is written as an amino acid sequence, natural language processing (NLP) algorithms from machine learning can be applied to it to predict enzyme classification, which is indicative of both protein structure and function. Our goal is to propose a multi-level classification solution that predicts the class of a given enzyme from its protein sequence. Our method uses BERT (Bidirectional Encoder Representations from Transformers) models to create embeddings, or feature vectors, and a variety of machine learning models to predict the class and subclass of an enzyme.
Research Data Curation Program, UC San Diego, La Jolla, 92093-0175 (https://lib.ucsd.edu/rdcp)
Baldino, Breanne; Dohkani, Tahamtan; Pinto, Matteo; Sundaresan, Ambika; Yu, Cindy; Rose, Peter (2021). Prediction of Enzyme Classification using Protein Sequence Embeddings. In Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0736QSX
Type
dataset
Identifier
ark:/20775/bb34560073
Language
English
Subject
Enzyme classification; Protein data analysis; Data Science & Engineering Master of Advanced Study (DSE MAS); Task: Classification; Capstone projects; Enzyme class multi-class classification; DSE MAS - 2021 Cohort