Automating Threat Intelligence Knowledge Graph Construction through Named Entity Recognition with DeepSeek
Authors
Dendi Renaldo Permana, Nurul Afifah, Septiani Kusuma Ningrum, Deris Stiawan, Mohd Yazid Idris, Rahmat Budiarto
Published in
8th International Conference of Reliable Information and Communication Technology
Abstract
The proliferation of cyber threats has created a deluge of unstructured cyber threat intelligence (CTI), making manual analysis impractical and overwhelming. Knowledge graphs (KGs) offer a powerful way to structure this data, but their construction is bottlenecked by the foundational task of named entity recognition (NER). While state-of-the-art NER often relies on massive general-purpose models or resource-intensive domain-specific ones, this paper explores a novel alternative: a lightweight, code-centric language model, DeepSeek Coder 1.3B. We hypothesize that its pre-training on a 2-trillion-token corpus, heavily skewed towards source code (87%), provides a strong inductive bias for recognizing the quasi-syntactic nature of cybersecurity entities. To test this hypothesis, we fine-tune DeepSeek Coder on the CyberNER dataset, a public corpus derived from security blogs and annotated with 10 distinct entity types using the BIO tagging scheme. This research evaluates the viability of code-specialized models as an efficient and effective method for populating cybersecurity knowledge graphs, addressing a critical challenge in automating threat intelligence analysis.
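As a rough illustration of the BIO tagging scheme mentioned in the abstract, the sketch below collapses token-level BIO tags into entity spans of the kind that could later become knowledge graph nodes. The function name `bio_to_spans`, the example sentence, and the labels `ThreatActor` and `Malware` are hypothetical and chosen only for illustration; they are not taken from the CyberNER label set or from the paper's pipeline.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level BIO tags into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity; close any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical sentence and labels, for illustration only.
tokens = ["Lazarus", "Group", "deployed", "WannaCry"]
tags = ["B-ThreatActor", "I-ThreatActor", "O", "B-Malware"]
entities = bio_to_spans(tokens, tags)
```

In this sketch an `I-` tag that does not follow a matching `B-` tag is simply dropped; other decoders treat such a tag as the start of a new entity, and the choice here is an assumption rather than the authors' method.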
Author Team
Dendi Renaldo Permana
Universitas Sriwijaya
Nurul Afifah
Universitas Sriwijaya
Septiani Kusuma Ningrum
Universitas Sriwijaya
Deris Stiawan
Universitas Sriwijaya
Mohd Yazid Idris
Universiti Teknologi Malaysia
Rahmat Budiarto
Al-Baha University
