Title
Using NLP to Predict the Severity of Cyber Security Vulnerabilities
Date Created and/or Issued
2021-01 to 2021-06
Contributing Institution
UC San Diego, Research Data Curation Program
Collection
Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects
Rights Information
Under copyright
Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
Use: This work is available from the UC San Diego Library. This digital copy of the work is intended to support research, teaching, and private study.
Rights Holder and Contact
Cook, Bryan; Janamian, Saba; Lim, Teck; Logan, James; Ulloa, Ivan
Description
Cyber-attacks continue to be one of the world’s foremost safety and economic threats, and in recent years they have become more numerous and severe. Cybersecurity engineers use industry-standard “Common Vulnerabilities and Exposures” (CVE) records to understand and address known threats. CVE records generally contain “Common Vulnerability Scoring System” (CVSS) scores, which indicate a human-determined level of severity. These scores are important to cybersecurity engineers in threat prioritization. Unfortunately, nearly half of all CVE records have not yet been assigned CVSS v3 scores, a critical component of the overall CVSS score. The VulnerWatch product is introduced as a machine learning solution for predicting CVSS v3 scores. Bidirectional Encoder Representations from Transformers (BERT) is applied to CVE record text descriptions to predict eight metrics that, in aggregate, determine a CVSS v3 score. VulnerWatch provides the user with a prioritized list of CVE records that do not have human-determined CVSS v3 scores, along with a predicted score for each. It also allows the engineer to manually enter text describing a threat and receive a predicted CVSS v3 score in near real-time. Prediction accuracy for the metrics that determine CVSS v3 scores averages close to 0.9, with similar levels of precision and recall. The resulting CVSS v3 score predictions are also reasonably accurate (MSE = 1.27, MAE = 0.5, R² = 0.51). At this level of accuracy, VulnerWatch is deemed successful in providing a valuable tool for combating cyber-attacks.
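To illustrate how eight predicted metrics can be aggregated into a single score, the following is a minimal sketch of the base score calculation, assuming the equations and weights published in the CVSS v3.1 specification. The record does not say which v3 revision or which aggregation code VulnerWatch uses, so the function names (`roundup`, `base_score`) and the single-letter metric codes are illustrative assumptions, not the project's implementation.

```python
# Metric weights as published in the CVSS v3.1 specification (base metrics).
# Assumption: the project's own aggregation step may differ from this sketch.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}    # Attack Vector
AC = {"L": 0.77, "H": 0.44}                         # Attack Complexity
UI = {"N": 0.85, "R": 0.62}                         # User Interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}              # Confidentiality/Integrity/Availability
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}    # Privileges Required, Scope unchanged
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}       # Privileges Required, Scope changed


def roundup(value):
    """Round up to one decimal place using the integer method from CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a):
    """Combine the eight base metrics (given as letter codes) into a 0-10 score."""
    changed = scope == "C"
    # Impact sub-score from the three impact metrics.
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    # Exploitability sub-score from the four exploitability metrics.
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


# Vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
print(base_score("N", "L", "N", "N", "U", "H", "H", "H"))  # 9.8
```

Because the score is a deterministic function of the eight metrics, a single misclassified metric can shift the final score noticeably, which is one plausible reason the per-metric accuracy and the score-level error figures are reported separately.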
Research Data Curation Program, UC San Diego, La Jolla, 92093-0175 (https://lib.ucsd.edu/rdcp)
Cook, Bryan; Janamian, Saba; Lim, Teck; Logan, James; Ulloa, Ivan; Altintas, Ilkay; Gupta, Amarnath (2021). Using NLP to Predict the Severity of Cyber Security Vulnerabilities. In Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0TX3F89
Type
Dataset
Language
English
Subject
Cyber attack
Cybersecurity
Bidirectional Encoder Representation (BERT)
Transfer learning
Natural Language Processing (NLP)
Common Vulnerabilities and Exposure (CVE)
Capstone projects
Data Science & Engineering Master of Advanced Study (DSE MAS)
DSE MAS - 2021 Cohort
