Big Data Analysis in Public Sector: Opportunities, Tools, and Ethics
Hendratna Mutaqin
Abstract
Big data plays a significant role in helping individuals, corporations, and governments make better decisions supported by credible evidence. Problems emerge when traditional approaches cannot handle big data, so sophisticated tools such as Hadoop, visualization products, and NoSQL databases are required to deal with it. Despite its many advantages, users have to follow the ethical rules of big data. Big data ethics must be observed to avoid accusations from other parties.
Introduction
Big data analytics helps users make predictions and solve problems. It can reduce mistakes in prediction, reduce the cost of living, and increase income. Because the data is vast, unstructured, and beyond the reach of traditional methods, analyzing big data is a challenge. This paper discusses the opportunities of big data for the public sector, the technologies used to handle it, and the ethics involved.
Literature Review
Big data refers to large data sets with a variety of instances, attributes, and sometimes classes. Yiu (2012) defines big data as data sets that are hard to work with using hands-on database management tools and hard to analyze with traditional methods. Moreover, Bhoola et al. (2014) describe big data as data so large and complex that it is difficult to process with either traditional processing applications or database management tools. The differences between these definitions may stem from the complexity of the data and the different types of database management tools. Whenever current database management tools cannot analyze big data sets, new ones need to be created. Yiu (2012) defines big data analytics as the process of examining and interrogating big data assets to derive insights of value for decision-making. Big data analytics thus refers to making predictions and inferences by analyzing large data sets. The characteristics of big data comprise volume, velocity, variety, veracity, and value (Bhoola et al., 2014). There are five types of big data analytics: text, audio, video, social media, and predictive (Gandomi and Haider, 2015).
Big Data Opportunity for the Public Sector
For government, big data also has tremendous potential to improve public administration and services. The opportunities of big data fall into five classes: sharing, learning, personalizing, solving, and innovating for growth (Bhoola et al., 2014).
Figure 1 The Opportunity of Big Data
- Sharing: Sharing and linking citizens' personal data could save time and money for citizens and taxpayers. The UK driving licence process is a good example: connecting the UK passport service with the Driver and Vehicle Licensing Agency can improve the process of obtaining a driving licence.
- Learning: Previous activities recorded in a database can serve as references for the government to assess the health and efficiency of its operations and improve its performance.
- Personalizing: In-depth knowledge of each citizen's personal information can help the government serve citizens more precisely. The government can analyze citizens' habits, such as purchase history, complaints, and reviews.
- Solving: A large data set combined with advanced analytics reveals patterns in previous conditions and suggests alternative solutions. It can be a tool for addressing government problems by analyzing previous solutions and their impact under similar trends.
- Innovating for Growth: By implementing big data analytics, the government can identify underperforming areas and reallocate resources held in resource-rich areas.
Government services will be improved by implementing big data analytics. However, the quality and security of the data should be a concern, so that services can be maintained in the face of threats such as cyber-attacks.
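The sharing opportunity above can be illustrated with a minimal sketch. It assumes two hypothetical agency data sets that share a citizen identifier; the names `identity_records`, `licence_applications`, and `link_applications`, and all field names, are invented for illustration and do not describe any real government system.

```python
# Hypothetical records held by one agency, keyed by a shared citizen ID.
identity_records = {
    "C001": {"name": "A. Smith", "photo_on_file": True},
    "C002": {"name": "B. Jones", "photo_on_file": False},
}

# Hypothetical licence applications received by a second agency.
licence_applications = [
    {"citizen_id": "C001", "licence_type": "car"},
    {"citizen_id": "C002", "licence_type": "motorcycle"},
    {"citizen_id": "C003", "licence_type": "car"},
]


def link_applications(applications, identities):
    """Attach identity data to each application with a matching citizen ID."""
    linked = []
    for application in applications:
        identity = identities.get(application["citizen_id"])
        linked.append({
            **application,
            "name": identity["name"] if identity else None,
            # The applicant can skip a step only if a photo is already held.
            "can_reuse_photo": bool(identity and identity["photo_on_file"]),
        })
    return linked


linked = link_applications(licence_applications, identity_records)
for row in linked:
    print(row)
```

In this sketch, an applicant whose photo is already on file would not need to supply it again, which is the kind of time saving the sharing class describes. Applicants with no matching record are kept in the output with empty fields rather than dropped.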
Tools to Handle Big Data
Currently, both government and businesses are collecting more data. With new skills and a new management style, data becomes competitive gold; without proper handling, it becomes useless. Handling big data is not easy, and data analytics tools could be the answer to the problem. Users can apply several data analytics tools to deal with big data problems:
Figure 2 The Popular Big Data Analytic Tools
1. Hadoop
An open-source software framework with multiple functions, whose major components are MapReduce and the Hadoop Distributed File System (HDFS) (Ashraf et al., 2015). MapReduce is a programming model, not an algorithm, consisting of two processing steps: Map and Reduce. Hadoop is an implementation of MapReduce. It breaks the data down into smaller pieces and uses many processors to process them in parallel (Bhoola et al., 2014).
2. Visualization Product
Malthy (2011) describes a visualization product as software that can compare models and data sets and that supports decision-making using both quantitative and qualitative data.
3. Non-Relational (NoSQL) Databases
These tools comprise a wide variety of database technologies developed in response to the rise in the amount of data stored about users, objects, and products. NoSQL can work not only in real time but also offline (Ashraf et al., 2015). Some NoSQL databases store data in rows and columns and accept SQL queries, while others do not rely on SQL to retrieve data. The ability to store information in any structure is another strength of these tools (Watson, 2014).
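The idea of storing information in any structure and retrieving it without SQL can be sketched with a toy in-memory document collection. This is only an illustration under stated assumptions: the class `DocumentCollection` and its `insert` and `find` methods are hypothetical and are not the API of any real NoSQL product.

```python
import json


class DocumentCollection:
    """Toy in-memory document store; not the API of any real NoSQL product."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, document):
        """Store a document of any JSON-compatible structure and return its ID."""
        doc_id = self._next_id
        # Round-trip through JSON to keep an independent copy of the document.
        self._docs[doc_id] = json.loads(json.dumps(document))
        self._next_id += 1
        return doc_id

    def find(self, **criteria):
        """Return documents whose top-level fields equal every given criterion."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


users = DocumentCollection()
# Each document has a different set of fields; no schema is declared up front.
users.insert({"name": "Ana", "city": "Jakarta"})
users.insert({"name": "Budi", "city": "Jakarta", "purchases": ["book", "pen"]})
users.insert({"name": "Citra", "complaints": 2})

jakarta_users = users.find(city="Jakarta")
print([doc["name"] for doc in jakarta_users])
```

The three documents carry different fields, yet they live in the same collection and are queried by field value rather than through a fixed table schema. A production NoSQL database adds persistence, indexing, and distribution, none of which this sketch attempts.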
Many data analytics tools, free or paid, are available at this time. Zakir et al. (2015) identify ten new, sophisticated software tools for analyzing big data: Apache Flume, Apache Sqoop, Apache Pig, Apache Hive, Apache ZooKeeper, MongoDB, Apache Cassandra, Apache Hadoop, MapReduce, and Splunk.
One of the popular big data analytics tools is Apache Hadoop. Its strengths are that it is an open-source framework, uses simple programming models, and scales from a single server to thousands of machines, each offering local storage and computation (Ishwarappa, 2015). The modules of Apache Hadoop, as explained by Ishwarappa (2015), comprise the Hadoop Distributed File System (HDFS), Hadoop YARN/MapReduce, HBase, Pig, Hive, Sqoop, ZooKeeper, Avro, Cassandra, Mahout, Tez, Spark, and Flume. Every part of Hadoop has a different function; take HDFS and Mahout as examples. HDFS is a distributed, fault-tolerant file system whose role is to store big data reliably and with good performance. Mahout is a machine learning and data-mining library focused on collaborative filtering, clustering, and classification.
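The MapReduce model, with its Map and Reduce steps, can be sketched in plain Python as a word count. This is a single-process illustration under simplifying assumptions, not Hadoop's actual API: the function names `map_phase`, `shuffle`, and `reduce_phase` are invented for this example, and the grouping step that Hadoop performs between the two phases is written out by hand.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key so that each key reaches a single reduce call."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key into a single result."""
    return key, sum(values)


# The input is broken into smaller chunks, as described above.
chunks = ["big data needs tools", "big data needs ethics"]

mapped = []
for chunk in chunks:
    # On a cluster, each map call could run on a different machine.
    mapped.extend(map_phase(chunk))

word_counts = dict(
    reduce_phase(key, values) for key, values in shuffle(mapped).items()
)
print(word_counts)
```

Because each chunk is mapped independently and each key is reduced independently, both phases can in principle be spread across many processors, which is what makes the model suitable for large data sets.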
Working with several tools is recommended to maximize the value obtained from big data. Users of NoSQL can also examine the results of different tools to compare them and minimize the risk of misrepresenting the data. Most big data analytics tools use SQL, which is well known to many IT professionals, so it is fairly easy to combine, access, and analyze data of any structure.
Big Data Ethics
Balancing the use of data against human values such as identity, confidentiality, privacy, transparency, and free choice can reduce the problems of big data. Four principles of big data ethics should be paramount: privacy, confidentiality, transparency, and identity (Richards and King, 2014).
Figure 3 The Principles of Big Data Ethics
- Privacy: The definition of privacy is a matter of public debate. Hence, the government should establish privacy protection rules covering how personal data is collected and how the information flows. The important point is to have rules that govern the use and disclosure of information in legal, social, and other contexts.
- Confidentiality: Trust is the key to confidentiality, which aims to let a sender and a receiver share private information under terms they have already agreed. Confidentiality law has to be created and implemented, for example to keep employees from revealing business secrets to other parties.
- Transparency: Transparency plays a crucial role in keeping all government activities accountable. In turn, the government should be able to request reports on transparency policies from all parties that collect, share, and use public data.
- Identity: Big data must be reconciled with identity. Identity can mean many things, such as a specific name belonging to a specific person. Companies can shape the identity of their customers by analyzing their habits. Customer identity must be protected by the institutions that use the data.
Conclusion
Large and complex data is hard to process with either traditional processing applications or database management tools unless the knowledge and sophisticated tools needed to handle it are available. Big data has many advantages as long as it can be analyzed with the right knowledge and proper tools. By implementing big data analytics, the government can gain many benefits, namely the opportunities of sharing, learning, personalizing, solving, and innovating for growth. Hadoop, visualization products, and NoSQL databases are several of the sophisticated tools available for performing big data analytics. Implementing big data analytics has to be accompanied by ethics, such as protecting significant societal values like privacy, confidentiality, transparency, and identity.
References
Ashraf, A., El-Bakry, H., El-razek, S., and El-Mashad, Y. (2015). Handling big data in e-learning. International Journal of Advanced Research in Computer Science & Technology (IJARCST), Vol. 3.
Bhoola, K., Kruger, K., Peick, J., and Tshabalala, N. (2014). Big data analytics. Actuarial Society of South Africa's 2014 Convention, Cape Town International Convention Centre.
Gandomi, A., and Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35, 137-144.
Ishwarappa, and Anuradha, J. (2015). A brief introduction on big data 5Vs characteristics and Hadoop technology. Bhubaneswar: International Conference on Intelligent Computing, Communication & Convergence.
Richards, N., and King, J. (2014). Big data ethics. Retrieved from: http://pacscenter.stanford.edu/sites/all/files/RIchards%20and%20King%20Ethics.pdf
Watson, H. J. (2014). Tutorial: Big data analytics: Concepts, technologies, and applications. Communications of the Association for Information Systems, Vol. 34, Article 65.
Yiu, C. (2012). The big data opportunity. London: Policy Exchange.
Zakir, J., Seymour, T., and Berg, K. (2015). Big data analytics. Issues in Information Systems, Volume 16, Issue II, pp. 81-90.



