TY - GEN
T1 - Implementation of Token Parsing Technique for Regex Based Classification of Unstructured Data for Cyber Threat Analysis
AU - Mohd Pakhari, Muhammad Hazim
AU - Jamil, Norziana
AU - Rusli, Mohd Ezanee
AU - Abdul Rahim, Azril Azam
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/8/24
Y1 - 2020/8/24
N2 - Cyber Threat Intelligence (CTI) is information about cyber threats that has been analysed, structured, and refined. This information helps organizations understand the current risks, at their different levels, that might harm their enterprises. CTI can also help organizations plan defensive countermeasures and protect themselves from damaging attacks. In this paper, we introduce a token parsing technique for regex-based classification of unstructured data for a cyber threat analytic (CTA) engine that performs threat analysis based on data crawled from several public resources. Our engine crawls and fetches data from these public resources in time series, analyses the data, and provides meaningful information to the user along with the timeline of the fetched parameter. The collected data, which appear unstructured, are converted by the engine into structured data and then inserted into the database. The engine then analyses the threat data by modelling it before useful information is returned to the user. The challenge is to obtain structured data that is useful for analysis. This paper explains how our token parsing technique is useful in regex-based classification to convert the unstructured data into useful structured data.
AB - Cyber Threat Intelligence (CTI) is information about cyber threats that has been analysed, structured, and refined. This information helps organizations understand the current risks, at their different levels, that might harm their enterprises. CTI can also help organizations plan defensive countermeasures and protect themselves from damaging attacks. In this paper, we introduce a token parsing technique for regex-based classification of unstructured data for a cyber threat analytic (CTA) engine that performs threat analysis based on data crawled from several public resources. Our engine crawls and fetches data from these public resources in time series, analyses the data, and provides meaningful information to the user along with the timeline of the fetched parameter. The collected data, which appear unstructured, are converted by the engine into structured data and then inserted into the database. The engine then analyses the threat data by modelling it before useful information is returned to the user. The challenge is to obtain structured data that is useful for analysis. This paper explains how our token parsing technique is useful in regex-based classification to convert the unstructured data into useful structured data.
KW - Cyber Threat Analysis
KW - Cyber Threat Intelligence
KW - Pastebin
KW - Threat Intelligence
KW - Threat Modeling
UR - http://www.scopus.com/inward/record.url?scp=85097642842&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097642842&partnerID=8YFLogxK
U2 - 10.1109/ICIMU49871.2020.9243415
DO - 10.1109/ICIMU49871.2020.9243415
M3 - Conference contribution
AN - SCOPUS:85097642842
T3 - 2020 8th International Conference on Information Technology and Multimedia, ICIMU 2020
SP - 395
EP - 398
BT - 2020 8th International Conference on Information Technology and Multimedia, ICIMU 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Conference on Information Technology and Multimedia, ICIMU 2020
Y2 - 24 August 2020 through 25 August 2020
ER -