
urn:lsid:arphahub.com:pub:3dc5f44e-8666-58db-bc76-a455210e8891

JUCS - Journal of Universal Computer Science


ISSN 0948-695X, 0948-6968


DOI: 10.3217/jucs-015-04-0705


Research Article

ACM categories: I.2.6 - Learning; I.5.0 - General; I.5.1 - Models; I.5.3 - Clustering

An Efficient Data Preprocessing Procedure for Support Vector Clustering

Jeen-Shing Wang (jeenshin@mail.ncku.edu.tw) and Jen-Chieh Chiang

National Cheng Kung University, Tainan City, Taiwan


Corresponding author: Jeen-Shing Wang (jeenshin@mail.ncku.edu.tw).


28 February 2009

Volume 15, Issue 4, pages 705-721


This article is freely available under the J.UCS Open Content License.

Abstract

This paper presents an efficient data preprocessing procedure for support vector clustering (SVC) that reduces the size of the training dataset. In the SVC training procedure, solving the optimization problem and assigning cluster labels to the data points are both time-consuming, which makes SVC inefficient for large datasets. To address this problem, we propose a data preprocessing procedure that combines a shared nearest neighbor (SNN) algorithm with the concept of unit vectors to eliminate insignificant data points from the dataset. Computer simulations on artificial and benchmark datasets demonstrate the effectiveness of the proposed method.
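The abstract names the two ingredients (SNN and unit vectors) without giving the procedure itself. The sketch below is a hypothetical illustration of how they could be combined, not the authors' algorithm. It assumes that the "insignificant" points are interior points: in SVC, cluster contours are defined by support vectors, which tend to lie near the edges of the data. SNN similarity is computed in the usual way, as the number of k-nearest neighbors two points share. The unit-vector test averages the unit vectors from a point toward its neighbors; the average is short when neighbors surround the point and long when they all lie on one side. All function names and the parameters `k`, `min_shared`, and `threshold` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_sets(points: Sequence[Point], k: int) -> List[Set[int]]:
    """For each point, the indices of its k nearest neighbors (itself excluded).

    Brute force O(n^2 log n); ties are broken by index for determinism.
    """
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in ranked[:k]})
    return result


def snn_similarity(neighbors: List[Set[int]], i: int, j: int) -> int:
    """Shared-nearest-neighbor similarity: size of the overlap of two kNN lists."""
    return len(neighbors[i] & neighbors[j])


def unit_vector_score(points: Sequence[Point], i: int, nbrs: Set[int]) -> float:
    """Length of the mean unit vector from point i toward its neighbors.

    Near 0 when the neighbors surround the point (interior); near 1 when
    they all lie on one side (boundary).
    """
    total = [0.0] * len(points[i])
    count = 0
    for j in nbrs:
        diff = [a - b for a, b in zip(points[j], points[i])]
        length = math.hypot(*diff)
        if length == 0.0:
            continue  # duplicate point: direction is undefined, skip it
        total = [t + d / length for t, d in zip(total, diff)]
        count += 1
    return math.hypot(*total) / count if count else 0.0


def reduce_dataset(points: Sequence[Point], k: int = 8, min_shared: int = 2,
                   threshold: float = 0.3) -> List[int]:
    """Hypothetical reduction step: return indices of points kept for SVC.

    Each point's neighborhood is its kNN list filtered to neighbors sharing at
    least `min_shared` neighbors with it (an SNN criterion). Points whose mean
    unit vector is shorter than `threshold` are treated as interior and dropped;
    points left with no SNN neighbors score 0 and are dropped as well.
    """
    neighbors = knn_sets(points, k)
    kept = []
    for i in range(len(points)):
        snn_nbrs = {j for j in neighbors[i]
                    if snn_similarity(neighbors, i, j) >= min_shared}
        if unit_vector_score(points, i, snn_nbrs) >= threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # 5 x 5 grid: interior points should be removed, corner points kept.
    grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
    kept = reduce_dataset(grid)
    print(len(grid), "->", len(kept))
```

On the 5 x 5 grid, the center point (index 12) has neighbors on every side, so its unit vectors cancel and it is dropped, while the corner point (index 0) sees all its neighbors in one quadrant and is kept. A practical implementation would replace the brute-force neighbor search with a spatial index.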