Title
Data from: Exploration and Explanation in Computational Notebooks
Creator
Hollan, James D
Rule, Adam
Tabard, Aurélien
Date Created and/or Issued
July 2017
Contributing Institution
UC San Diego, Research Data Curation Program
Collection
Data from: Exploration and Explanation in Computational Notebooks
Rights Information
Under copyright
Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
Use: This work is available from the UC San Diego Library. This digital copy of the work is intended to support research, teaching, and private study.
Rights Holder and Contact
UC Regents
Description
In July 2017, our team queried, downloaded, and analyzed approximately 1.25 million Jupyter Notebooks in public repositories on GitHub. By our calculation this was about 95% of all Jupyter Notebooks publicly available on GitHub at the time.
This dataset includes:
- ~1.25 million Jupyter Notebooks
- Metadata about each notebook
- Metadata about each of the nearly 200,000 public repositories that contained a Jupyter Notebook
- Top level README files for nearly 150,000 repositories containing a Jupyter Notebook
In addition to this core data, these data include:
- A smaller, starter dataset with 1000 randomly selected repositories containing ~6000 notebooks
- CSV files summarizing and indexing the notebooks, repositories, and READMEs
- Log files documenting when each file was downloaded
- Scripts for our initial analysis of the dataset
This research was funded by NSF grants #1319829 and #1735234 as well as NLM grant #T15LM011271.
Research Data Curation Program, UC San Diego, La Jolla, 92093-0175 (https://lib.ucsd.edu/rdcp)
Rule, Adam; Tabard, Aurélien; Hollan, James D. (2018). Data from: Exploration and Explanation in Computational Notebooks. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0JW8C39
Rule A, Tabard A, and Hollan J. (2018) Exploration and Explanation in Computational Notebooks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’18). ACM Press, New York, NY. doi:10.1145/3173574.3173606.
Type
Dataset
Language
English
Subject
Data science
Interactive notebooks
Jupyter Notebook
Data analysis
