Title
LLNL 3D Protein-Ligand Dataset for Anti-viral Screening against SARS-CoV-2
Creator
Allen, Jonathan
Jones, Derek
Kim, Hyojin
Kirshner, Dan
Lightstone, Felice
Zhang, Xiaohua
Date Created and/or Issued
2020-04
Contributing Institution
UC San Diego, Research Data Curation Program
Collection
Lawrence Livermore National Laboratory (LLNL) Open Data Initiative
Rights Information
Under copyright
Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
Use: This work is available from the UC San Diego Library. This digital copy of the work is intended to support research, teaching, and private study.
Rights Holder and Contact
Lawrence Livermore National Laboratory
Description
This dataset contains protein-ligand complexes in a 3D representation for anti-viral drug screening against SARS-CoV-2. It is part of the Lawrence Livermore National Laboratory Covid-19 Therapeutic Design database, but is specifically designed to facilitate machine learning and other data science tasks with regard to both efficacy (protein-ligand binding affinity) and safety. The complex dataset, called "ml-hdf", comprises ligands and four potential binding pockets of the SARS-CoV-2 protein targets in a 3D atomic representation. The ligands in this dataset include Food and Drug Administration (FDA) approved drugs and "Other-world-approved" drugs that have been approved for use by the EU, Canada and Japan. The compounds were docked against two binding pockets from the Spike protein (spike, spike1) and two conformations of the main protease (protease, protease2).
Funding: LLNL’s Laboratory Directed Research and Development (LDRD) program, tracking numbers 20-ERD-065 and 20-ERD-062. AHA CRADA: funded by the American Heart Association Center for Accelerated Drug Discovery under a collaborative research and development agreement (CRADA TC02274).
Research Data Curation Program, UC San Diego, La Jolla, 92093-0175 (https://lib.ucsd.edu/rdcp)
Kim, Hyojin; Jones, Derek; Zhang, Xiaohua; Kirshner, Dan; Lightstone, Felice; Allen, Jonathan (2020). LLNL 3D Protein-Ligand Dataset for Anti-viral Screening against SARS-CoV-2. In Lawrence Livermore National Laboratory (LLNL) Open Data Initiative. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0KW5DK5
Is Supplement To: Jones, Derek; Kim, Hyojin; Zhang, Xiaohua; Zemla, Adam; Stevenson, Garrett; Bennett, William D.; Kirshner, Dan; Wong, Sergio; Lightstone, Felice; Allen, Jonathan E. (2020). Improved Protein-ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference. arXiv preprint arXiv:2005.07704. https://arxiv.org/pdf/2005.07704.pdf
Directory Structure
There are four directories, each of which contains ligands docked against a particular binding pocket. The cut-off for the binding pocket regions is 8 angstroms, similar to the PDBbind database.
- spike: complex data for one binding site of the spike protein (6m0j), stabilized by the disulfide Cys480-Cys488. This is an allosteric binding site.
- spike1: complex data for another binding site of the spike protein (6m0j), in the proximity of the beta-turn formed by residues 501-505. This is the receptor binding domain to ACE2.
- protease: complex data for one binding site conformation of the main protease (6lu7)
- protease2: complex data for another binding site conformation of the main protease (6y84)
In each subdirectory, there are about 90 ml-hdf files, each of which contains about 100 complex poses (10 docking poses per complex). The poses were down-selected using AutoDock Vina together with the AMBER molecular simulation package.
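The directory layout described above can be checked after download with a short script. The following is a minimal sketch in Python, assuming the four directories sit directly under a local root folder. The function name `summarize_dataset`, the `root` path, and the `pattern` argument are illustrative and not part of the dataset. The record does not state the ml-hdf file extension, so the default pattern matches every file.

```python
from pathlib import Path

# Directory names and PDB IDs exactly as listed in the record.
POCKETS = {
    "spike": "6m0j",
    "spike1": "6m0j",
    "protease": "6lu7",
    "protease2": "6y84",
}


def summarize_dataset(root, pattern="*"):
    """Count the files matching `pattern` in each pocket directory under `root`.

    Returns a dict mapping directory name to a file count,
    or to None when the directory is missing.
    `pattern` is a glob; the default "*" is an assumption, since the
    record does not say which extension the ml-hdf files use.
    """
    root = Path(root)
    summary = {}
    for name in POCKETS:
        pocket_dir = root / name
        if not pocket_dir.is_dir():
            summary[name] = None
            continue
        summary[name] = sum(1 for p in pocket_dir.glob(pattern) if p.is_file())
    return summary


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python summarize.py /path/to/download
    dataset_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, count in summarize_dataset(dataset_root).items():
        status = "missing" if count is None else f"{count} files"
        print(f"{name} (PDB {POCKETS[name]}): {status}")
```

According to the description, each directory should report about 90 files; a much lower count may indicate an incomplete download.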
Type
Dataset
Language
English
Subject
Virtual drug screening
3D protein-ligand complex
Covid19 antiviral screening
Deep learning
Machine learning
