Big Data Quality Dimensions: A Systematic Literature Review

Anandhi Ramasamy, Soumitra Chowdhury

Abstract

Although big data has become an integral part of business and society, concerns remain about its quality. Past research has focused on identifying various quality dimensions of big data. However, this research is fragmented, and there is a need to synthesize findings on the ever-evolving phenomenon of big data. This study aims to provide a systematic literature review of the quality dimensions of big data. Based on a review of 17 academic articles, we present a set of key quality dimensions of big data.

Keywords

data, big data, quality, dimensions, assessment

DOI: http://dx.doi.org/10.4301/S1807-1775202017003