Accelerating the Internet in the presence of Big Data: Reducing user delays by leveraging historical user request patterns for web caching

Chetan Kumar, Sean Marston


Approximately 4 billion people have access to the Internet, and 23 billion devices were connected as of 2018. This has allowed substantial growth in data collection, which in turn has allowed Big Data to flourish. The continued increase in users, devices, and Big Data usage has significantly intensified Internet traffic, which has the potential to increase user delays when accessing data on the Internet. Among the ways to reduce user latency, web caching can reduce web user delays while also reducing network traffic and the load on web servers. In this study, we propose a proxy-level web caching mechanism that leverages historical request patterns to help reduce user latency and accelerate the Internet. In addition, we survey the state of the art of other caching approaches. Our investigation shows that effective proxy caching mechanisms that exploit historical request patterns have the potential to significantly reduce delays for web users if deployed in large-scale networks in this Big Data era.


Big Data, User delays, Web caching, Proxy cache, Historical request patterns

