Secret breach prevention in software issue reports

BUET ILS
BUET Institutional Repository: Home
→
Dissertations/Theses
→
Dissertations/Theses - Department of Computer Science and Engineering
→
View Item

dc.contributor.advisor	Shahriyar, Dr. Rifat
dc.contributor.author	Wahab, Zahin
dc.date.accessioned	2025-02-17T09:58:24Z
dc.date.available	2025-02-17T09:58:24Z
dc.date.issued	2024-06-03
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6956
dc.description.abstract	In the digital age, the exposure of sensitive information poses a significant threat to security. Leveraging the ubiquitous nature of code-sharing platforms like GitHub and BitBucket, developers often accidentally disclose credentials and API keys, granting unauthorized access to critical systems. Despite the availability of tools for detecting such breaches in source code, detecting secret breaches in software issue reports remains largely unexplored. This thesis presents a novel technique for secret breach detection in software issue reports using a combination of language models and state-of-the-art regular expressions. We highlight the challenges posed by noise, such as log files, URLs, commit IDs, stack traces, and dummy passwords, which complicate the detection process. By employing relevant pre-processing techniques and leveraging the capabilities of advanced language models, we aim to mitigate potential breaches effectively. Drawing insights from existing research on secret detection tools and methodologies, we propose an approach combining the strengths of state-of-the-art regexes with the contextual understanding of language models. Our method aims to reduce false positives and improve the accuracy of secret breach detection in software issue reports. We have also developed a secret breach mitigator tool for GitHub, which will warn the user if there is any secret in the posted issue report. By addressing this critical gap in contemporary research, our work aims at enhancing the overall security posture of software development practices. We have curated a benchmark dataset of 25000 instances with only 437 true positives. Although the data is highly skewed, our model performs well with a 0.6347 F1-score, whereas state-of-the-art regular expression hardly manages to get a 0.0341 F1-Score with a very poor precision score.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), BUET	en_US
dc.subject	Computer software	en_US
dc.title	Secret breach prevention in software issue reports	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	0421052013	en_US
dc.identifier.accessionNumber	119772
dc.contributor.callno	001.6425/ZAH/2024	en_US

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

Dissertations/Theses - Department of Computer Science and Engineering
Post graduate dissertations (Theses) of Computer Science Engineering (CSE)

Show simple item record

Search BUET IR

Advanced Search

Browse

All of IR
This Collection

Secret breach prevention in software issue reports

Files in this item

This item appears in the following Collection(s)

Search BUET IR

Browse

All of IR

This Collection

My Account