<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../../nlm/tax-treatment-NS0.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:tp="http://www.plazi.org/taxpub" article-type="research-article" dtd-version="3.0" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">109</journal-id>
      <journal-id journal-id-type="index">urn:lsid:arphahub.com:pub:3dc5f44e-8666-58db-bc76-a455210e8891</journal-id>
      <journal-title-group>
        <journal-title xml:lang="en">JUCS - Journal of Universal Computer Science</journal-title>
        <abbrev-journal-title xml:lang="en">jucs</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0948-695X</issn>
      <issn pub-type="epub">0948-6968</issn>
      <publisher>
        <publisher-name>Journal of Universal Computer Science</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3897/jucs.93533</article-id>
      <article-id pub-id-type="publisher-id">93533</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
        <subj-group subj-group-type="scientific_subject">
          <subject>C.2.4 - Distributed Systems</subject>
          <subject>H.2.8 - Database Applications</subject>
          <subject>H.4 - INFORMATION SYSTEMS APPLICATIONS</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Big Data Provenance Using Blockchain for Qualitative Analytics via Machine Learning</article-title>
      </title-group>
      <contrib-group content-type="authors">
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Mehboob Khan</surname>
            <given-names>Kashif</given-names>
          </name>
          <email xlink:type="simple">kashifmehboobkhan@yahoo.com</email>
          <uri content-type="orcid">https://orcid.org/0000-0002-7208-6072</uri>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Haider</surname>
            <given-names>Warda</given-names>
          </name>
          <uri content-type="orcid">https://orcid.org/0000-0002-5054-2313</uri>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Ahmed Khan</surname>
            <given-names>Najeed</given-names>
          </name>
          <uri content-type="orcid">https://orcid.org/0000-0003-1986-7192</uri>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Saleem</surname>
            <given-names>Darakhshan</given-names>
          </name>
          <uri content-type="orcid">https://orcid.org/0000-0001-8712-3617</uri>
          <xref ref-type="aff" rid="A2">2</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>
        <addr-line content-type="verbatim">NED University of Engineering and Technology, Karachi, Pakistan</addr-line>
        <institution>NED University of Engineering and Technology</institution>
        <addr-line content-type="city">Karachi</addr-line>
        <country>Pakistan</country>
      </aff>
      <aff id="A2">
        <label>2</label>
        <addr-line content-type="verbatim">SSUET, Karachi, Pakistan</addr-line>
        <institution>SSUET</institution>
        <addr-line content-type="city">Karachi</addr-line>
        <country>Pakistan</country>
      </aff>
      <author-notes>
        <fn fn-type="corresp">
          <p>Corresponding author: Kashif Mehboob Khan (<email xlink:type="simple">kashifmehboobkhan@yahoo.com</email>).</p>
        </fn>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2023</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>28</day>
        <month>05</month>
        <year>2023</year>
      </pub-date>
      <volume>29</volume>
      <issue>5</issue>
      <fpage>446</fpage>
      <lpage>469</lpage>
      <uri content-type="arpha" xlink:href="http://openbiodiv.net/2A773D27-71C8-586A-B7EA-7E6FC49A13FD">2A773D27-71C8-586A-B7EA-7E6FC49A13FD</uri>
      <history>
        <date date-type="received">
          <day>17</day>
          <month>08</month>
          <year>2022</year>
        </date>
        <date date-type="accepted">
          <day>19</day>
          <month>01</month>
          <year>2023</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>Kashif Mehboob Khan, Warda Haider, Najeed Ahmed Khan, Darakhshan Saleem</copyright-statement>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by-nd/4.0/" xlink:type="simple">
          <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution-NoDerivatives License (CC BY-ND 4.0). This license allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.</license-p>
        </license>
      </permissions>
      <abstract>
        <label>Abstract</label>
        <p>The volume of data is growing rapidly as more devices are connected to the Internet. Big data has many uses and benefits, but it also poses challenges that must be resolved to improve the quality of available services, including data integrity and security, analytics, insight, and organization of big data. While seeking the best way to manage, systematize, integrate, and secure big data, we concluded that blockchain technology contributes significantly. Its approaches to decentralized data management, digital property reconciliation, and Internet of Things data interchange have a major impact on how big data will advance. Unauthorized access to the data is very difficult because data in the blockchain network is stored in encrypted and decentralized form. This paper presents insights into specific big data applications that can be analyzed by machine learning algorithms, driven by data provenance, and coupled with blockchain technology to increase data trustworthiness by providing tamper-resistant information about the lineage and chronology of data records. The scenario of record tampering and big data provenance is illustrated using a diabetes prediction use case. The study carries out an empirical analysis on hundreds of patient records to evaluate the approach and to observe the impact of tampered records on big data analysis, i.e., diabetes model prediction. From our experiments, we infer that under our blockchain-based system the immutable, tamper-proof metadata describing the source and evolution of records makes the acquired data verifiable and thus yields high accuracy for our diabetes prediction model.</p>
      </abstract>
    </article-meta>
  </front>
</article>
