TY - GEN
T1 - Feature based techniques for auto-detection of novel email worms
AU - Masud, Mohammad M.
AU - Khan, Latifur
AU - Thuraisingham, Bhavani
PY - 2007
Y1 - 2007
N2 - This work focuses on applying data mining techniques to detect email worms. We apply a feature-based detection technique in which features are extracted using various statistical and behavioral analyses of emails sent over a certain period of time. Because the number of features thus extracted is too large, our goal is to select the best set of features that can efficiently distinguish between normal and viral emails using classification techniques. First, we apply Principal Component Analysis (PCA) to reduce the high dimensionality of the data and to find a projected, optimal set of attributes. We observe that applying PCA to a benchmark dataset improves the accuracy of detecting novel worms. Second, we apply the J48 decision tree algorithm to determine the relative importance of features based on information gain. We identify a subset of features, along with a set of classification rules, that performs better at detecting novel worms than either the original feature set or the PCA-reduced feature set. Finally, we compare our results with published results and discuss our future plans to extend this work.
AB - This work focuses on applying data mining techniques to detect email worms. We apply a feature-based detection technique in which features are extracted using various statistical and behavioral analyses of emails sent over a certain period of time. Because the number of features thus extracted is too large, our goal is to select the best set of features that can efficiently distinguish between normal and viral emails using classification techniques. First, we apply Principal Component Analysis (PCA) to reduce the high dimensionality of the data and to find a projected, optimal set of attributes. We observe that applying PCA to a benchmark dataset improves the accuracy of detecting novel worms. Second, we apply the J48 decision tree algorithm to determine the relative importance of features based on information gain. We identify a subset of features, along with a set of classification rules, that performs better at detecting novel worms than either the original feature set or the PCA-reduced feature set. Finally, we compare our results with published results and discuss our future plans to extend this work.
KW - Classification technique
KW - Data mining
KW - Email worm
KW - Feature selection
KW - Principal component analysis
UR - http://www.scopus.com/inward/record.url?scp=38049178750&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049178750&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-71701-0_22
DO - 10.1007/978-3-540-71701-0_22
M3 - Conference contribution
AN - SCOPUS:38049178750
SN - 9783540717003
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 205
EP - 216
BT - Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings
PB - Springer Verlag
T2 - 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007
Y2 - 22 May 2007 through 25 May 2007
ER -