Big Data Quality Dimensions: A Systematic Literature Review
DOI:
https://doi.org/10.4301/S1807-1775202017003
Keywords:
data, big data, quality, dimensions, assessment
Abstract
Although big data has become an integral part of business and society, concerns remain about its quality. Past research has focused on identifying various dimensions of big data. However, that research is scattered, and there is a need to synthesize the ever-evolving phenomenon of big data. This research aims to provide a systematic literature review of the quality dimensions of big data. Based on a review of 17 articles from academic research, we present a set of key quality dimensions of big data.