Under copyright
Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
Use: This work is available from the UC San Diego Library. This digital copy of the work is intended to support research, teaching, and private study.
Rights Holder and Contact
Lawrence Livermore National Laboratory
This dataset contains protein-ligand complexes in a 3D representation for anti-viral drug screening against SARS-CoV-2. It is part of the Lawrence Livermore National Laboratory Covid-19 Therapeutic Design database, but is specifically designed to facilitate machine learning and other data science tasks with regard to both efficacy (protein-ligand binding affinity) and safety. The complex dataset, called "ml-hdf", comprises ligands and four potential binding pockets of the SARS-CoV-2 protein targets in a 3D atomic representation. The ligands in this dataset include Food and Drug Administration (FDA) approved drugs and "Other-world-approved" drugs that have been approved for use by the EU, Canada and Japan. The compounds were docked against two binding pockets from the Spike protein (spike, spike1) and two conformations of the main protease (protease, protease2).

LLNL's Laboratory Directed Research and Development (LDRD), tracking # 20-ERD-065 and 20-ERD-062.
AHA CRADA: funded by the American Heart Association Center for Accelerated Drug Discovery under a collaborative research and development agreement (CRADA TC02274).

Research Data Curation Program, UC San Diego, La Jolla, 92093-0175 (https://lib.ucsd.edu/rdcp)

Kim, Hyojin; Jones, Derek; Zhang, Xiaohua; Kirshner, Dan; Lightstone, Felice; Allen, Jonathan (2020). LLNL 3D Protein-Ligand Dataset for Anti-viral Screening against SARS-CoV-2. In Lawrence Livermore National Laboratory (LLNL) Open Data Initiative. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0KW5DK5

Is Supplement To: Derek Jones, Hyojin Kim, Xiaohua Zhang, Adam Zemla, Garrett Stevenson, William D. Bennett, Dan Kirshner, Sergio Wong, Felice Lightstone, and Jonathan E. Allen (2020). Improved Protein-ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference. arXiv preprint arXiv:2005.07704. https://arxiv.org/pdf/2005.07704.pdf
Directory Structure

There are four directories, each of which contains ligands docked against a particular binding pocket. The cut-off for the binding pocket regions is 8 angstroms, similar to the PDBbind database.

- spike: complex data for one binding site of the spike protein (6m0j), stabilized by the disulfide bond Cys480-Cys488. This is an allosteric binding site.
- spike1: complex data for another binding site of the spike protein (6m0j), in the proximity of the beta-turn formed by residues 501-505. This is the receptor binding domain to ACE2.
- protease: complex data for the binding site of one conformation of the main protease (6lu7)
- protease2: complex data for the binding site of another conformation of the main protease (6y84)

In each subdirectory, there are about 90 ml-hdf files, each of which contains about 100 complex poses (10 docking poses per complex). The poses were down-selected using AutoDock Vina with the AMBER molecular simulation package.
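The directory layout described above can be walked with a few lines of Python. The following is only a minimal sketch: the function name `index_ml_hdf` is hypothetical, and the `*.hdf` file pattern is an assumption, since the file extension of the ml-hdf files is not stated here. Adjust the pattern to match the files actually present in a local copy.

```python
from pathlib import Path

# The four pocket directories named in the dataset description.
POCKETS = ("spike", "spike1", "protease", "protease2")


def index_ml_hdf(root, pattern="*.hdf"):
    """Map each pocket directory under `root` to a sorted list of its files.

    `pattern` is an assumption (the extension is not given in the
    description); pass a different glob if the files are named otherwise.
    """
    root = Path(root)
    index = {}
    for pocket in POCKETS:
        pocket_dir = root / pocket
        # A missing directory yields an empty list rather than an error.
        if pocket_dir.is_dir():
            index[pocket] = sorted(pocket_dir.glob(pattern))
        else:
            index[pocket] = []
    return index


if __name__ == "__main__":
    import sys

    # Use the first command-line argument as the dataset root, if given.
    dataset_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for pocket, files in index_ml_hdf(dataset_root).items():
        print(f"{pocket}: {len(files)} files")
```

With a complete copy of the dataset, each pocket should report roughly 90 files. Each file can then be opened with an HDF5 reader such as h5py (`h5py.File(path, "r")`); the group and dataset names inside the files are not described here, so inspect them before relying on any particular key.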
Virtual drug screening; 3D protein-ligand complex; Covid19 antiviral screening; Deep learning; Machine learning