An R-based System Implementation for Automated Threat Intelligence Analysis Integrating Text Mining and Machine Learning
Abstract
The escalating threat of global cyberattacks that target network systems and exfiltrate sensitive information has rendered traditional cybersecurity defense mechanisms increasingly inadequate. Consequently, helping enterprises construct a comprehensive Cyber Threat Intelligence (CTI) aggregation and analysis framework has become critically important. This study addresses a major pain point for cybersecurity analysts, namely the significant time and resources required to manually process large volumes of unstructured CTI reports, by proposing an automated analytical solution. The proposed framework integrates the R programming language with Text Mining and Machine Learning (ML) techniques. First, CTI reports collected from institutions such as Taiwan’s National Information Sharing and Analysis Center (N-ISAC) were processed through text mining: lexical features were extracted using Term Frequency–Inverse Document Frequency (TF-IDF) and assembled into a structured dataset. Next, multiple machine learning classifiers, including the C4.5 decision tree, Naive Bayes, Logistic Regression, and Classification and Regression Tree (CART) models, were implemented and evaluated for automatically identifying and categorizing potential threats and vulnerabilities [1]. In the experiments, the Naive Bayes classifier achieved the highest performance, with an accuracy of 95.48% on the CTI dataset. The study also implemented a CTI analysis system with visualization capabilities. Empirical validation confirmed that the system significantly reduces the time cybersecurity professionals need to assess threat intelligence and helps them rapidly generate remediation or hardening strategies for risk mitigation.
The proposed system provides enterprises with an efficient, accurate, and scalable CTI analysis tool that enhances organizational cybersecurity resilience and forensic integrity within Security Information and Event Management (SIEM) and Managed Detection and Response (MDR) environments.
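To make the pipeline described above concrete, the following is a minimal sketch of TF-IDF feature extraction followed by a multinomial Naive Bayes classifier. It is not the authors' implementation: the study was carried out in R, whereas this illustration uses plain Python, and the toy reports, the class labels, every function name, the smoothed IDF variant (log(N/df) + 1), and the Laplace smoothing constant are all assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for labeled CTI report snippets.
REPORTS = [
    ("phishing email steals user credentials", "phishing"),
    ("spoofed login page harvests credentials via email", "phishing"),
    ("ransomware encrypts files and demands payment", "ransomware"),
    ("attackers encrypt servers and demand ransom payment", "ransomware"),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def fit_idf(docs):
    """Inverse document frequency per term: log(N / df) + 1 (assumed variant)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are dropped."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items() if t in idf}

def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes on TF-IDF weights with Laplace smoothing."""
    class_counts = Counter(labels)
    term_weight = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            term_weight[label][term] += weight
    model = {}
    for label, n_label in class_counts.items():
        total = sum(term_weight[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_label / len(labels))
        log_lik = {
            t: math.log((term_weight[label][t] + alpha) / total) for t in vocab
        }
        model[label] = (log_prior, log_lik)
    return model

def predict(model, vec):
    """Return the class with the highest log-posterior score."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(w * log_lik[t] for t, w in vec.items())
    return max(model, key=score)

# Build the structured dataset and fit the classifier.
docs = [tokenize(text) for text, _ in REPORTS]
labels = [label for _, label in REPORTS]
idf = fit_idf(docs)
model = train_nb([tfidf(d, idf) for d in docs], labels, vocab=set(idf))

def classify(text):
    """Classify a new report snippet with the fitted model."""
    return predict(model, tfidf(tokenize(text), idf))

if __name__ == "__main__":
    print(classify("email with fake login page asks for credentials"))
```

In an R workflow, comparable steps would typically be delegated to existing text-mining and classification packages rather than written by hand; the explicit version here is meant only to show how term weights feed the classifier.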
Keywords
Threat intelligence, Text mining, Feature selection, Machine learning
Citation Format:
Hung-Cheng Yang, I-Long Lin, Yu-Shan Lin, Chorng-Ming Chen, "An R-based System Implementation for Automated Threat Intelligence Analysis Integrating Text Mining and Machine Learning," Journal of Internet Technology, vol. 27, no. 1, pp. 121-133, Jan. 2026.
Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314 E-mail: jit.editorial@gmail.com
