- Introduction
- Installation Guide
- User Guide
- Dataset Format
- Support
- Citing LightTwinSVM
- Contributing
- FAQ
- Donations
- Numerical Experiments
LightTwinSVM is a simple and fast implementation of the standard Twin Support Vector Machine. It is licensed under the terms of the GNU GPL v3. Anyone interested in machine learning and classification can use this program for their work or projects.
The main features of the program are the following:
- A simple console program for running the TwinSVM classifier
- Fast optimization algorithm: The clipDCD algorithm was improved and is implemented in C++ for solving optimization problems of TwinSVM.
- Linear, RBF, and Rectangular kernels are supported.
- Binary and Multi-class classification (One-vs-All & One-vs-One) are supported.
- The OVO estimator is compatible with scikit-learn tools such as GridSearchCV, cross_val_score, etc. (see the sketch after this list).
- The classifier can be evaluated using either K-fold cross-validation or Training/Test split.
- It supports grid search over C and gamma parameters.
- CSV and LIBSVM data files are supported.
- Detailed classification results are saved in a spreadsheet file.
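Because the OVO estimator follows the scikit-learn estimator API, it can be plugged directly into scikit-learn's model-selection tools. The sketch below is only illustrative: the import path and class name (`OVO_TSVM` from `ltsvm.twinsvm`) and the parameter names (`kernel`, `C`) are assumptions that may differ in your installed version, so check the package documentation for the actual names.

```python
# A minimal sketch of combining the OVO estimator with scikit-learn tools.
# NOTE: the import path, class name, and parameter names are assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score

from ltsvm.twinsvm import OVO_TSVM  # assumed import path and class name

X, y = load_iris(return_X_y=True)

# Grid search over the penalty parameter C (parameter name assumed).
param_grid = {'C': [2 ** i for i in range(-4, 5)]}
search = GridSearchCV(OVO_TSVM(kernel='linear'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)

# Since the estimator is scikit-learn compatible, cross_val_score works as well.
print(cross_val_score(search.best_estimator_, X, y, cv=5).mean())
```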
Twin Support Vector Machine classifier was proposed by:
Khemchandani, R., & Chandra, S. (2007). Twin support vector machines for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 905-910.
The clipDCD algorithm was proposed by:
Peng, X., Chen, D., & Kong, L. (2014). A clipping dual coordinate descent algorithm for solving support vector machines. Knowledge-Based Systems, 71, 266-278.
Currently, the supported operating systems are as follows. Choose your OS from the list below for detailed installation instructions.
- Debian-based Linux systems (Ubuntu 14.04, Ubuntu 16.04, Ubuntu 17.10, Ubuntu 18.04 and Linux Mint 18)
- RPM-based Linux systems (Fedora)
- Mac OSX
- Microsoft Windows
First of all, Python 3.5 interpreter or newer is required. Python 3 is usually installed by default on most Linux distributions. In order to build and run the program, the following Python packages are needed:
- NumPy
- SciPy
- Scikit-learn
- Pandas
- Pybind11
- Cython (to build the C++ extension module on Windows)
- PyInstaller (to generate a binary executable for the Windows platform)
In order to build the C++ extension module (the optimizer), the following tools and libraries are required:
- GNU C++ Compiler (For Linux systems)
- Apple XCode (For OSX systems)
- Visual Studio (For Windows systems)
- Armadillo C++ Linear Algebra Library
- LAPACK and BLAS Library
A shell script is provided to help users download the required dependencies and install the program automatically. However, make sure that Git and the GNU C++ compiler are installed on your system.
A note for MacOS users: Make sure that Apple XCode is installed on your system.
To install the program, open a terminal and execute the following commands:
git clone https://github.com/mir-am/LightTwinSVM.git
cd LightTwinSVM && ./setup.sh
If the installation was successful, you will be asked whether to delete the temporary installation directory. You can also run the unit tests to check the functionality of the program. Finally, a shell script "ltsvm.sh" is created to run the program.
After a successful installation, the LightTwinSVM program should look like this in the terminal:
First, download Git from here if it is not installed on your system. Also, Visual Studio 2015 or newer should be installed so that the C++ extension module can be compiled. Before proceeding further, make sure that all the required Python packages are installed. The dependencies are listed here.
A note for Windows users: If this is the first time you are running a PowerShell script, make sure that the ExecutionPolicy is set appropriately on your system; otherwise, you cannot run the setup script on Windows. Please check out this answer on Stack Overflow, which explains how to set the ExecutionPolicy.
To install the program on Windows, open a PowerShell terminal and run the following commands:
git clone https://github.com/mir-am/LightTwinSVM.git
cd LightTwinSVM && .\win-setup.ps1
When the installation is finished, a batch file "ltsvm.bat" will be created to run the program.
It is highly recommended to install the LightTwinSVM program automatically using the setup script. If for some reason you still want to build the program manually, a step-by-step guide for Linux and OSX systems is provided here.
LightTwinSVM is a simple console application. Classification is done in four steps, each of which is explained below:
Step 1: Choose your dataset by pressing Enter key. A file dialog window will be shown to help you find and select your dataset. CSV and LIBSVM files are supported. It is highly recommended to normalize your dataset.
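If your data is not yet normalized, a minimal sketch of doing so with scikit-learn's MinMaxScaler is shown below. The file names are illustrative, and the first column is assumed to hold the class labels, as required by the CSV format described in the Dataset Format section.

```python
# A minimal sketch of normalizing a CSV dataset before feeding it to the program.
# File names are illustrative; the first column is assumed to hold the labels.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = pd.read_csv('my_dataset.csv')                 # first column: labels (1 / -1)
labels, features = data.iloc[:, 0], data.iloc[:, 1:]

# Scale every feature to the [0, 1] range.
features_scaled = pd.DataFrame(MinMaxScaler().fit_transform(features),
                               columns=features.columns)

pd.concat([labels, features_scaled], axis=1).to_csv('my_dataset_scaled.csv',
                                                    index=False)
```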
Step 2: Choose a kernel function among Linear, Gaussian (RBF), and Rectangular. The RBF kernel often produces better classification results but takes more time. However, if you want a non-linear kernel and your dataset is large, consider the Rectangular kernel (see the sketch after the prompt below).
Step 2/4: Choose a kernel function:(Just type the number. e.g 1)
1-Linear
2-RBF
3-RBF(Rectangular kernel)
-> 2
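The Rectangular option is interpreted here as a reduced (rectangular) RBF kernel, i.e. the RBF kernel evaluated against a random subset of the training samples instead of all of them, which shrinks the kernel matrix for large datasets. This interpretation and the subset fraction below are assumptions for illustration, not the program's exact implementation.

```python
# A rough sketch contrasting the full RBF kernel matrix with a "rectangular"
# (reduced) one computed against a random subset of the training samples.
# The 10% subset fraction is an assumption chosen for illustration.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
gamma = 0.5

K_full = np.exp(-gamma * cdist(X, X, 'sqeuclidean'))        # shape (1000, 1000)

subset = X[rng.choice(len(X), size=100, replace=False)]     # 10% of the samples
K_rect = np.exp(-gamma * cdist(X, subset, 'sqeuclidean'))   # shape (1000, 100)

print(K_full.shape, K_rect.shape)
```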
Step 3: To evaluate the TwinSVM classifier's performance, you can either use K-fold cross-validation or split your data into training and test sets.
Step 3/4: Choose a test methodology:(Just type the number. e.g 1)
1-K-fold cross validation
2-Train/test split
-> 1
Determine number of folds for cross validation: (e.g. 5)
-> 5
Step 4: You need to determine the range of the C penalty parameter, and of gamma if the RBF kernel was selected, for the exhaustive grid search (see the sketch after the example below).
An example:
Step 4/4:Type the range of C penalty parameter for grid search:
(Two integer numbers separated by space. e.g. -> -5 5
-> -4 4
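The two numbers are usually exponent bounds rather than raw parameter values. Assuming the common base-2 convention (an assumption; check the program's output for the exact grid it builds), the range `-4 4` above would expand to the following candidate values:

```python
# Hypothetical illustration of how an exponent range such as "-4 4" maps to
# candidate parameter values, assuming the usual base-2 convention.
lower, upper = -4, 4
candidates = [2 ** i for i in range(lower, upper + 1)]
print(candidates)   # [0.0625, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16]
```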
After completing the above steps, the exhaustive grid search starts. When the search is finished, detailed classification results are saved in a spreadsheet file. In this file, all the common evaluation metrics (e.g. accuracy, recall, precision, and F1) are provided.
An example of a spreadsheet file containing classification results can be seen here.
LightTwinSVM can be imported as a Python package in your project. Currently, a Jupyter notebook, "A Step-by-Step Guide on How to Use Multi-class TwinSVM", is available here.
To run the notebooks, make sure that Jupyter is installed on your system. If not, use the following command to install it:
pip3 install jupyter
For more details, check out Jupyter documentation.
Aside from the program's command-line interface, you may want to use LightTwinSVM's Python package in your project. All you have to do is copy the "ltsvm" folder (the installed version) into the root folder of your project. Then you can import the "ltsvm" package in any module of interest.
The documentation of LightTwinSVM's estimators and tools can be found here.
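As a quick illustration, a minimal sketch of importing and fitting an estimator from the copied "ltsvm" package is shown below. The class name `TSVM`, its import path, and its parameter names are assumptions made for illustration; consult the estimator documentation linked above for the actual API.

```python
# A minimal sketch of using LightTwinSVM as a Python package.
# NOTE: the import path, class name, and parameter names are assumptions.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from ltsvm.twinsvm import TSVM  # assumed import path and class name

# A toy binary dataset; labels must be 1 and -1.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = np.where(X[:, 0] > 0.5, 1, -1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = TSVM(kernel='RBF', C1=1, C2=1, gamma=2 ** -2)   # parameter names assumed
clf.fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))
```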
- LIBSVM data files are supported. Note that the extension of this file should be '*.libsvm'.
- For comma-separated value (CSV) files, make sure that your dataset follows these rules (see the sketch at the end of this section):
- The first row can contain header names. (This is optional.)
- The first column should contain the sample labels. Moreover, labels of positive and negative samples should be 1 and -1, respectively.
- All values in the dataset, except the header names, should be numerical. Nominal values are not allowed.
To help you prepare your dataset and test the program, three datasets are included here.
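To make the expected layout concrete, the sketch below writes a toy dataset in both supported formats; the file names and values are made up for illustration.

```python
# A toy dataset written in both supported formats. File names and values are
# illustrative; labels are 1 / -1 and all feature values are numerical.
import numpy as np
import pandas as pd
from sklearn.datasets import dump_svmlight_file

X = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.2]])
y = np.array([1, 1, -1, -1])

# CSV: optional header row, labels in the first column.
df = pd.DataFrame(X, columns=['feat_1', 'feat_2'])
df.insert(0, 'label', y)
df.to_csv('toy.csv', index=False)

# LIBSVM: note the '*.libsvm' extension expected by the program.
dump_svmlight_file(X, y, 'toy.libsvm')
```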
Have a question about the software?
You can contact me via email. Feedback and suggestions for improvements are welcome.
Have a problem with the software or found a bug?
To let me know and fix it, please open an issue here.
To report a problem or bug, please provide the following information:
- Error messages
- Output of the program.
- Explain how to reproduce the problem if possible.
If you use the LightTwinSVM program in your research work, please cite the following paper:
- Mir et al., (2019). LightTwinSVM: A Simple and Fast Implementation of Standard Twin Support Vector Machine Classifier. Journal of Open Source Software, 4(35), 1252, https://doi.org/10.21105/joss.01252
BibTeX entry:
@article{ltsvm2019,
title = {LightTwinSVM: A Simple and Fast Implementation of Standard Twin Support Vector Machine Classifier},
author = {Mir, Amir M. and Nasiri, Jalal A.},
journal = {Journal of Open Source Software},
volume = {4},
issue = {35},
pages = {1252},
year = {2019},
doi = {10.21105/joss.01252},
url = {https://doi.org/10.21105/joss.01252}
}
Thanks for considering contributing to the LightTwinSVM project. Contributions are highly appreciated and welcome. For guidance on how to contribute to the LightTwinSVM project, please see the contributing guideline.
- What is the main idea of TwinSVM classifier?
TwinSVM performs classification using two non-parallel hyperplanes, as opposed to the single hyperplane of the standard SVM. In TwinSVM, each hyperplane is as close as possible to the samples of its own class and as far as possible from the samples of the other class. To learn more about TwinSVM and its optimization problems, you can read this blog post.
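For reference, these are the two primal problems of the linear TwinSVM as formulated by Khemchandani & Chandra (2007), where A and B are the matrices of positive and negative samples, e_1 and e_2 are vectors of ones, c_1 and c_2 are penalty parameters, and xi and eta are slack variables:

```latex
% Primal problems of the linear TwinSVM (Khemchandani & Chandra, 2007).
\begin{aligned}
(\mathrm{TWSVM1})\quad & \min_{w_1,\, b_1,\, \xi}\;
    \tfrac{1}{2}\,\lVert A w_1 + e_1 b_1 \rVert^2 + c_1\, e_2^{\top} \xi \\
& \;\text{s.t.}\;\; -(B w_1 + e_2 b_1) + \xi \ge e_2,\quad \xi \ge 0, \\[6pt]
(\mathrm{TWSVM2})\quad & \min_{w_2,\, b_2,\, \eta}\;
    \tfrac{1}{2}\,\lVert B w_2 + e_2 b_2 \rVert^2 + c_2\, e_1^{\top} \eta \\
& \;\text{s.t.}\;\; (A w_2 + e_1 b_2) + \eta \ge e_1,\quad \eta \ge 0.
\end{aligned}
```

A new sample is then assigned to the class whose hyperplane is closer to it.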
If you have used the LightTwinSVM program and found it helpful, please consider making a donation via PayPal to support this work. It also motivates me to maintain the program.
To demonstrate the effectiveness of LightTwinSVM in terms of accuracy, experiments were conducted to compare it with scikit-learn's SVM on several UCI benchmark datasets. As in most research papers on classification, K-fold cross-validation was used to evaluate the classifiers (with K set to 5). Grid search was also used to find the optimal values of the hyper-parameters. The table below shows the accuracy comparison between LightTwinSVM and scikit-learn's SVM.
Datasets | LightTwinSVM (%) | scikit-learn's SVM (%) | Difference in Accuracy |
---|---|---|---|
Pima-Indian | 78.91±3.73 | 78.26±2.62 | 0.65 |
Australian | 87.25±2.27 | 86.81±3.22 | 0.44 |
Haberman | 76.12±4.79 | 76.80±2.68 | -0.68 |
Cleveland | 85.14±5.45 | 84.82±4.04 | 0.32 |
Sonar | 84.62±4.89 | 64.42±6.81 | 20.2 |
Heart-Statlog | 85.56±2.96 | 85.19±2.62 | 0.37 |
Hepatitis | 86.45±5.16 | 83.23±3.55 | 3.22 |
WDBC | 98.24±1.36 | 98.07±0.85 | 0.17 |
Spectf | 81.68±5.35 | 79.78±0.19 | 1.9 |
Titanic | 81.93±2.59 | 82.27±1.83 | -0.34 |
Mean Accuracy | 84.59 | 81.94 | 2.65 |
From the table above, it can be seen that LightTwinSVM outperforms scikit-learn's SVM in terms of accuracy on most datasets. All in all, if you have been using SVM for your task or project, the LightTwinSVM program may give you better prediction accuracy for your classification task. More information on this experiment can be found in the project's paper here.
- For testing and experimenting with the LightTwinSVM program, the Wine and Pima-Indian datasets from the UCI machine learning repository are included in the project.
- Thanks to Stefan van der Walt and Nicolas P. Rougier for reviewing this project, which was published in the Journal of Open Source Software. (March 31, 2019)
- Thanks to idejie for testing and support on macOS. (Dec 8, 2018)