<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../../nlm/tax-treatment-NS0.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:tp="http://www.plazi.org/taxpub" article-type="research-article" dtd-version="3.0" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">109</journal-id>
      <journal-id journal-id-type="index">urn:lsid:arphahub.com:pub:3dc5f44e-8666-58db-bc76-a455210e8891</journal-id>
      <journal-title-group>
        <journal-title xml:lang="en">JUCS - Journal of Universal Computer Science</journal-title>
        <abbrev-journal-title xml:lang="en">jucs</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0948-695X</issn>
      <issn pub-type="epub">0948-6968</issn>
      <publisher>
        <publisher-name>Journal of Universal Computer Science</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3217/jucs-022-06-0760</article-id>
      <article-id pub-id-type="publisher-id">23272</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
        <subj-group subj-group-type="scientific_subject">
          <subject>E.1 - DATA STRUCTURES</subject>
          <subject>H.0 - GENERAL</subject>
          <subject>H.4 - INFORMATION SYSTEMS APPLICATIONS</subject>
          <subject>M.1 - KNOWLEDGE ENGINEERING METHODOLOGIES</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>A Proposal for Recommendation of Feature Selection Algorithm based on Data Set Characteristics</article-title>
      </title-group>
      <contrib-group content-type="authors">
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Goswami</surname>
            <given-names>Saptarsi</given-names>
          </name>
          <email xlink:type="simple">saptarsi007@gmail.com</email>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Chakrabarti</surname>
            <given-names>Amlan</given-names>
          </name>
          <xref ref-type="aff" rid="A2">2</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Chakraborty</surname>
            <given-names>Basabi</given-names>
          </name>
          <xref ref-type="aff" rid="A3">3</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>
        <addr-line content-type="verbatim">Institute of Engineering and Management, Kolkata, India</addr-line>
        <institution>Institute of Engineering and Management</institution>
        <addr-line content-type="city">Kolkata</addr-line>
        <country>India</country>
      </aff>
      <aff id="A2">
        <label>2</label>
        <addr-line content-type="verbatim">Calcutta University, Kolkata, India</addr-line>
        <institution>Calcutta University</institution>
        <addr-line content-type="city">Kolkata</addr-line>
        <country>India</country>
      </aff>
      <aff id="A3">
        <label>3</label>
        <addr-line content-type="verbatim">Iwate Prefectural University, Takizawa, Japan</addr-line>
        <institution>Iwate Prefectural University</institution>
        <addr-line content-type="city">Takizawa</addr-line>
        <country>Japan</country>
      </aff>
      <author-notes>
        <fn fn-type="corresp">
          <p>Corresponding author: Saptarsi Goswami (<email xlink:type="simple">saptarsi007@gmail.com</email>).</p>
        </fn>
        <fn fn-type="edited-by">
          <p>Academic editor: </p>
        </fn>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2016</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>06</month>
        <year>2016</year>
      </pub-date>
      <volume>22</volume>
      <issue>6</issue>
      <fpage>760</fpage>
      <lpage>781</lpage>
      <uri content-type="arpha" xlink:href="http://openbiodiv.net/CCA9137D-EE21-5274-A20C-094E39CF26BC">CCA9137D-EE21-5274-A20C-094E39CF26BC</uri>
      <uri content-type="zenodo_dep_id" xlink:href="https://zenodo.org/record/5505241">5505241</uri>
      <history>
        <date date-type="received">
          <day>30</day>
          <month>11</month>
          <year>2015</year>
        </date>
        <date date-type="accepted">
          <day>28</day>
          <month>05</month>
          <year>2016</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>Saptarsi Goswami, Amlan Chakrabarti, Basabi Chakraborty</copyright-statement>
        <license license-type="creative-commons-attribution" xlink:href="" xlink:type="simple">
          <license-p>This article is freely available under the J.UCS Open Content License.</license-p>
        </license>
      </permissions>
      <abstract>
        <label>Abstract</label>
        <p>Feature selection is an important prerequisite of any pattern recognition, machine learning or data mining problem. Many algorithms for feature subset selection have been developed to reduce the dimensionality of a data set in order to achieve high recognition accuracy at low computational cost. However, some algorithms work well on some data sets and perform poorly on others. For any particular data set, it is difficult to find the most suitable algorithm without a trial-and-error process. It seems that the characteristics of the data set might affect the performance of a feature selection algorithm. In this work, data set characteristics are studied in order to recommend an appropriate feature selection algorithm for a particular data set. A new proposal in terms of intra-attribute relationship, together with a measure called MVS (multivariate score), is introduced to quantify the correlation structure of a data set and to group different data sets into several categories on that basis. The measure is used to group 63 publicly available benchmark data sets according to their characteristics. The performance of different feature selection algorithms on the different groups of data sets is then studied by simulation experiments to verify the relationship between data set characteristics and the feature selection algorithm. The effect of some other data set characteristics has also been studied. Finally, a framework for recommending the choice of a proper feature selection algorithm is outlined.</p>
      </abstract>
    </article-meta>
  </front>
</article>
