Transforming Unstructured Text into Actionable Cyber Intelligence

Andri Wijaya, M.T.I.

Article overview

Advances in digital technology have led to a massive increase in unstructured data, including cybersecurity reports, social media posts, digital documents, and various other forms of electronic communication. This has driven the need for technologies capable of automatically understanding and processing textual data. Natural Language Processing (NLP) has emerged as a key approach in supporting information extraction, text classification, contextual analysis, and the development of data-driven intelligent systems. In the field of cybersecurity, NLP is utilized to support Cyber Threat Intelligence through the process of extracting threat information from unstructured cybersecurity reports. Information such as malware, Indicators of Compromise (IoC), attack techniques, and patterns of threat actor activity can be automatically identified to generate more structured and easily analyzable intelligence. Furthermore, advancements in transformer architecture and Large Language Models (LLMs) are expanding NLP’s capabilities to understand language context in a more complex and adaptive manner. The integration of NLP, artificial intelligence, deep learning, and cybersecurity presents significant opportunities for building smarter, automated, and context-aware cyber threat analysis systems. This research discusses the development of NLP, its application in cybersecurity and cyber threat intelligence, the challenges of processing the Indonesian language, and future development directions based on Large Language Models to support digital transformation and strengthen cybersecurity.

Natural Language Processing • Cyber Threat Intelligence • Cybersecurity • Large Language Models • Artificial Intelligence • Deep Learning • Text Mining • Information Extraction

COMNETS View

Vol. 1 • No. 2 • 2026

According to Xue and Liu [1], the development of digital technology in recent years has led to a massive increase in unstructured data, such as digital documents, social media, cybersecurity reports, online articles, and other forms of electronic communication. Ferrag et al. [2] explain that technologies capable of automatically understanding and processing text data have become increasingly important in supporting data-driven decision-making and modern intelligent systems. In addition, advances in transformer architectures and Large Language Models (LLMs) have expanded NLP’s ability to understand the context of human language in more complex and adaptive ways.

Arazzi et al. [3] explain that Natural Language Processing (NLP) has evolved as a solution for information extraction, text classification, context analysis, and the development of intelligent systems capable of automatically understanding human language. NLP draws on disciplines such as linguistics, machine learning, deep learning, data mining, and artificial intelligence to transform text data into more structured and easily analyzable information. With these capabilities, NLP is now widely applied in fields such as cybersecurity, business intelligence, social media analysis, healthcare, and decision support systems.

In the field of cybersecurity, research focuses on applying Cyber Threat Intelligence to support the automated detection, mitigation, and analysis of digital threats. According to Ismail [4], most cyber threat information is still available only as unstructured text, such as threat intelligence reports, incident reports, security advisories, and malware analysis reports. As a result, analyzing cyber threats manually is often slow. NLP is therefore used to extract cyber threat information from these reports and produce more structured and easily analyzable intelligence data.
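As a rough illustration of the extraction step described above, the following sketch pulls a few kinds of indicators out of a free-text report using regular expressions. It is a minimal example rather than the method used in the cited studies: the pattern set, the short list of top-level domains, and the function names `refang` and `extract_iocs` are all assumptions made for this illustration.

```python
import re

# Illustrative patterns only (assumed for this sketch); production systems
# need broader coverage and stricter validation.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    # Only a handful of top-level domains are listed, to keep the sketch short.
    "domain": re.compile(
        r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|info|io)\b", re.IGNORECASE
    ),
}


def refang(text):
    """Undo common 'defanging' (hxxp, [.]) that reports use to disarm links."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_iocs(report):
    """Return a dict mapping each IoC type to a sorted list of unique matches."""
    cleaned = refang(report)
    return {
        name: sorted(set(pattern.findall(cleaned)))
        for name, pattern in IOC_PATTERNS.items()
    }


if __name__ == "__main__":
    # Invented report text using documentation-reserved addresses and domains.
    sample = (
        "The dropper contacted 203.0.113.45 and "
        "hxxp://update-check.example[.]net/payload."
    )
    print(extract_iocs(sample))
```

Regular expressions suit indicators with a fixed surface form, such as addresses and hashes. Entities whose form varies, such as actor names or attack descriptions, generally need the statistical or model-based techniques discussed later in the article.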

Research in this field focuses on developing text mining and information extraction methods to identify critical details in cyber threat reports, such as malware names, indicators of compromise (IoC), attack techniques, attack targets, and the activity patterns of threat actors. According to Albarrak et al. [5], applying NLP in cyber threat intelligence makes threat identification more effective and aids the development of more proactive and adaptive cybersecurity systems. Within the cybersecurity domain, research is also directed toward the analysis of Android malware. Raju et al. [6] explain that the rapid growth of mobile devices has made the Android operating system one of the primary targets for malware attacks. Android malware continues to evolve with increasingly complex attack techniques, which calls for more adaptive and automated analytical approaches. In this context, NLP is used to analyze cyber threat reports related to Android malware and to generate intelligence datasets that can support the development of malware detection models based on machine learning and deep learning.
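One simple way to identify entities such as malware names and attack techniques is dictionary (gazetteer) matching combined with a pattern for technique identifiers. The sketch below is an assumption-laden illustration, not the approach of the cited works: the three family names form a hypothetical gazetteer, the identifier pattern follows the MITRE ATT&CK style (`T` plus four digits and an optional three-digit sub-technique), and `Entity` and `tag_entities` are names invented here.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer for illustration; in practice it would be loaded
# from a curated, regularly updated source.
MALWARE_FAMILIES = ["FluBot", "Cerberus", "Anubis"]

# ATT&CK-style technique identifiers, e.g. "T1437" or "T1406.002".
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


@dataclass(frozen=True)
class Entity:
    label: str  # "MALWARE" or "TECHNIQUE"
    text: str   # surface form as it appears in the report
    start: int  # character offset of the match


def tag_entities(report):
    """Return all gazetteer and pattern matches, ordered by position."""
    entities = []
    for name in MALWARE_FAMILIES:
        pattern = rf"\b{re.escape(name)}\b"
        for match in re.finditer(pattern, report, re.IGNORECASE):
            entities.append(Entity("MALWARE", match.group(0), match.start()))
    for match in TECHNIQUE_ID.finditer(report):
        entities.append(Entity("TECHNIQUE", match.group(0), match.start()))
    return sorted(entities, key=lambda entity: entity.start)


if __name__ == "__main__":
    # Invented sentence; the identifiers only illustrate the format.
    for entity in tag_entities("The FluBot sample was mapped to T1406.002 and T1437."):
        print(entity)
```

Gazetteer matching is precise for known names but cannot find families it has never seen, which is one reason learned named entity recognition models are attractive for this task.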

The development of threat intelligence-based datasets is a key focus of this research. According to studies by Rahman et al. [7] and Xu et al. [8], the integration of NLP, artificial intelligence, and cybersecurity offers significant opportunities for building smarter and more automated cybersecurity systems. Datasets generated through automated extraction from cybersecurity reports are expected to provide richer intelligence context compared to conventional malware datasets. The extracted data is then used in classification, prediction, and threat pattern analysis processes using various artificial intelligence and deep learning approaches.
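To make the idea of a dataset built from extracted intelligence more concrete, the sketch below converts per-report extraction results into multi-hot feature vectors that a classifier could consume. The record layout, the placeholder values (`FamilyA`, `T0001`, and so on), and the function names are hypothetical; the source does not specify a dataset schema.

```python
# Hypothetical extraction output for two reports; all values are placeholders.
records = [
    {"report_id": "r1", "entities": {"malware": ["FamilyA"], "technique": ["T0001"]}},
    {"report_id": "r2", "entities": {"malware": ["FamilyB"], "technique": ["T0001", "T0002"]}},
]


def entity_pairs(record):
    """Flatten one record into a set of (field, value) pairs."""
    return {
        (field, value)
        for field, values in record["entities"].items()
        for value in values
    }


def build_vocabulary(all_records):
    """Collect every distinct (field, value) pair in a stable, sorted order."""
    pairs = set()
    for record in all_records:
        pairs |= entity_pairs(record)
    return sorted(pairs)


def to_multi_hot(record, vocabulary):
    """Encode a record as 0/1 flags, one per vocabulary entry."""
    present = entity_pairs(record)
    return [1 if feature in present else 0 for feature in vocabulary]


if __name__ == "__main__":
    vocabulary = build_vocabulary(records)
    for record in records:
        print(record["report_id"], to_multi_hot(record, vocabulary))
```

A multi-hot encoding is only one possible representation. It discards word order and context, so richer datasets may keep the original sentences or entity relations alongside these flags.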

In its implementation, the research also uses various machine learning and deep learning approaches to support natural language analysis. Ainslie et al. [9] explain that techniques such as text classification, topic modeling, sentiment analysis, named entity recognition (NER), and relation extraction play a crucial role in understanding the structure and context of text data more deeply. Furthermore, transformer-based language models and Large Language Models open new opportunities for artificial intelligence-based cyber threat analysis systems that can perform analysis automatically and in real time.
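Of the techniques listed above, text classification is the easiest to show in a few lines. The following is a minimal multinomial Naive Bayes classifier with add-one smoothing, written in plain Python as a teaching sketch. It stands in for the far more capable transformer-based models mentioned in the text, and the four training sentences and two labels are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # skip words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training data, far too small for real use.
TRAIN_TEXTS = [
    "ransomware encrypts files and demands payment",
    "ransomware note demands bitcoin payment",
    "phishing email steals login credentials",
    "fake login page used in phishing email",
]
TRAIN_LABELS = ["ransomware", "ransomware", "phishing", "phishing"]

if __name__ == "__main__":
    model = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(model.predict("email with a fake login page"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word from forcing a class probability to zero.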

NLP is also applied in other fields such as business intelligence, social media analysis, data analytics, and decision support systems. According to Ismail [4], NLP can be used to analyze customer opinions, understand market trends, and identify user behavior patterns from available digital data. In social media analysis, NLP is used for sentiment analysis, topic detection, and mapping public opinion on specific issues. This demonstrates that NLP plays a crucial role in supporting data-driven digital transformation across industrial sectors and modern organizations.

NLP research is also directed toward supporting the Indonesian language, which still faces various challenges compared to high-resource languages such as English. These challenges include limited datasets, variations in language structure, the use of informal language, and a scarcity of linguistic resources. Research in this area is therefore expected to contribute to natural language processing technologies that are better adapted to the characteristics of the Indonesian language, particularly in the domains of cybersecurity and intelligence analysis.
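Informal language is one of the challenges named above that can be partly addressed with simple preprocessing. The sketch below normalizes a few common informal Indonesian spellings before further analysis. The small lexicon and the `normalize` function are assumptions made for illustration; a usable system would need a much larger, curated resource, which is exactly the kind of linguistic resource the text describes as scarce.

```python
import re

# Small illustrative lexicon mapping informal forms to standard Indonesian.
INFORMAL_TO_STANDARD = {
    "gak": "tidak",
    "nggak": "tidak",
    "yg": "yang",
    "udah": "sudah",
    "dgn": "dengan",
    "utk": "untuk",
    "krn": "karena",
}


def normalize(text):
    """Lowercase, collapse elongated letters, and map informal tokens."""
    text = text.lower()
    # Collapse a letter repeated three or more times ("bangeeet" -> "banget").
    # Digits are left alone so that numbers such as 1000 are not altered.
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [INFORMAL_TO_STANDARD.get(token, token) for token in tokens]


if __name__ == "__main__":
    print(normalize("Akun yg kena phishing udah gak bisa login"))
```

Dictionary-based normalization cannot resolve forms whose meaning depends on context, so it is best treated as a first pass ahead of a model trained on Indonesian text.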

Looking ahead, development is directed toward integrating NLP, artificial intelligence, cyber intelligence, and Large Language Models to build smarter, automated, and context-aware cyber threat analysis systems. According to Ahi and Valizadeh [10], LLMs have great potential in cybersecurity to enhance a system’s ability to understand threat contexts, reason about cyber threat data, and generate more accurate and relevant intelligence. With these technological advancements, NLP is expected to become a key foundation for supporting digital transformation, strengthening cybersecurity, and developing data-driven intelligent systems.

References
[1] H. Xue and W. Liu, “Bibliometric Analysis of Natural Language Processing Technology in Education: Hot Topics, Frontier Evolution, and Future Prospects,” SAGE Open, vol. 15, no. 1, 2025, doi: 10.1177/21582440251319891.
[2] M. A. Ferrag et al., “Revolutionizing Cyber Threat Detection with Large Language Models: A Privacy-Preserving BERT-Based Lightweight Model for IoT/IIoT Devices,” IEEE Access, vol. 12, pp. 23733–23750, 2024, doi: 10.1109/ACCESS.2024.3363469.
[3] M. Arazzi et al., “NLP-based techniques for Cyber Threat Intelligence,” Computer Science Review, vol. 58, p. 100765, 2025, doi: 10.1016/j.cosrev.2025.100765.
[4] W. S. Ismail, “Threat Detection and Response Using AI and NLP in Cybersecurity,” Journal of Internet Services and Information Security, vol. 14, no. 1, pp. 195–205, 2024, doi: 10.58346/JISIS.2024.I1.013.
[5] M. Albarrak, K. Salonitis, and S. Jagtap, “Natural Language Processing (NLP)-Based Frameworks for Cyber Threat Intelligence and Early Prediction of Cyberattacks in Industry 4.0: A Systematic Literature Review,” Applied Sciences, vol. 16, no. 2, p. 619, 2026, doi: 10.3390/app16020619.
[6] A. D. Raju, I. Y. Abualhaol, R. S. Giagone, Y. Zhou, and S. Huang, “A Survey on Cross-Architectural IoT Malware Threat Hunting,” IEEE Access, vol. 9, pp. 91686–91708, 2021, doi: 10.1109/ACCESS.2021.3091427.
[7] M. A. Rahman et al., “A Survey of Large Language Models (LLMs) for Cybersecurity: Opportunities and Directions,” in Proc. 2025 IEEE Int. Big Data Conference (BigData), 2025, pp. 4333–4342, doi: 10.1109/BigData66926.2025.11402639.
[8] H. Xu et al., “Large Language Models for Cybersecurity: A Systematic Literature Review,” ACM Transactions on Software Engineering and Methodology, 2025, doi: 10.1145/3769676.
[9] S. Ainslie, D. Thompson, S. Maynard, and A. Ahmad, “Cyber-threat intelligence for security decision-making: A review and research agenda for practice,” Computers & Security, vol. 132, p. 103352, 2023, doi: 10.1016/j.cose.2023.103352.
[10] K. Ahi and S. Valizadeh, “Large Language Models (LLMs) and Generative AI in Cybersecurity and Privacy: A Survey of Dual-Use Risks, AI-Generated Malware, Explainability, and Defensive Strategies,” in 2025 Silicon Valley Cybersecurity Conference (SVCC), 2025, pp. 1–8, doi: 10.1109/SVCC65277.2025.11133642.