TY - GEN
T1 - An effective support vector machines (SVMs) performance using hierarchical clustering
AU - Awad, Mamoun
AU - Khan, Latifur
AU - Bastani, Farokh
AU - Yen, I. Ling
PY - 2004/12/1
Y1 - 2004/12/1
AB - The training time for SVMs to compute the maximum-margin hyperplane is at least O(N^2) in the data set size N, which makes SVMs unfavorable for large data sets. This paper presents a study on reducing the training time of SVMs, specifically for large data sets, using hierarchical clustering analysis. We use the Dynamically Growing Self-Organizing Tree (DGSOT) algorithm for clustering because it has been shown to overcome the drawbacks of traditional hierarchical clustering algorithms. Clustering analysis helps find the boundary points between two classes, which are the most qualified data points for training SVMs. We present a new approach that combines SVMs and DGSOT: it starts with an initial training set and expands it gradually using the clustering structure produced by the DGSOT algorithm. We compare our approach with the Rocchio Bundling technique in terms of accuracy loss and training time gain on two real benchmark data sets.
UR - http://www.scopus.com/inward/record.url?scp=16244387316&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=16244387316&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2004.26
DO - 10.1109/ICTAI.2004.26
M3 - Conference contribution
AN - SCOPUS:16244387316
SN - 076952236X
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 663
EP - 667
BT - Proceedings - 16th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2004
A2 - Khoshgoftaar, T.M.
T2 - Proceedings - 16th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2004
Y2 - 15 November 2004 through 17 November 2004
ER -