
Professor Joshua Zhexue Huang will give an open lecture "Approximate Computing for Big Data Analysis" at NSU

Professor Joshua Zhexue Huang (Big Data Institute, Shenzhen University, China) will give an open lecture-tutorial "Approximate Computing for Big Data Analysis".

The lecture will be held on December 8 from 17:00 to 19:45 in an online format. We invite you to participate!

Join the Zoom meeting (ID: 831 8963 0550, password: 984871).

In the era of big data, datasets with millions of objects and thousands of features have become common in many organizations. Such datasets, often hundreds of gigabytes or even terabytes in size, can easily exceed the memory of a cluster system, which creates computing problems in big data analysis. How to process and analyze terabyte-scale data effectively with limited resources is therefore both a theoretical and a technical challenge in current big data research.

In this tutorial, we will discuss the issues of distributed data computing, with a particular focus on approximate computing for big data. I will start with a general introduction to big data and the challenges of big data analysis, and continue with a discussion of the technologies currently used in big data analysis and their shortcomings. Then I will introduce approximate computing for big data and a new method that uses multiple random samples to compute approximate results for big data. Finally, I will present the new technologies and algorithms that enable approximate computing, including the random sample partition (RSP) data model, the LMGI computing framework, and the algorithm that generates RSP data models from HDFS big data files. LMGI is a non-MapReduce framework that allows serial algorithms to be executed independently on local nodes or virtual machines without data communication among the nodes. The new technologies offer the following breakthroughs in big data computing: analyzing big data without a memory limit, executing serial algorithms directly in a distributed computing environment, and extending the scalability of data analysis to terabytes on small clusters.
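The idea of computing an approximate result from a few random samples can be illustrated with a toy, single-machine sketch in Python. It is not the RSP or LMGI implementation that the lecture will present: the function names `make_rsp_blocks` and `approximate_mean`, the synthetic data, and the block counts are all assumptions chosen for illustration. The sketch shuffles a dataset into blocks so that each block is a random sample of the whole, runs an ordinary serial computation (a mean) on a few blocks independently, and combines the per-block results.

```python
import random
import statistics


def make_rsp_blocks(data, num_blocks, seed=0):
    """Hypothetical helper: shuffle the records and split them into
    num_blocks disjoint blocks, so each block is a random sample of data."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Striding over the shuffled list gives disjoint blocks of near-equal size.
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def approximate_mean(blocks, num_selected, seed=0):
    """Hypothetical helper: estimate the global mean from a few blocks."""
    rng = random.Random(seed)
    selected = rng.sample(blocks, num_selected)
    # Each block is processed on its own by a plain serial function;
    # no data is exchanged between blocks during this step.
    block_means = [statistics.fmean(block) for block in selected]
    # Combine the per-block results (blocks have equal size here,
    # so an unweighted average is appropriate).
    return statistics.fmean(block_means)


# Synthetic stand-in for a large dataset (an assumption for this sketch).
data_rng = random.Random(42)
data = [data_rng.gauss(50.0, 10.0) for _ in range(100_000)]

blocks = make_rsp_blocks(data, num_blocks=100)
estimate = approximate_mean(blocks, num_selected=5)
exact = statistics.fmean(data)

print(f"estimate from 5 of 100 blocks: {estimate:.3f}")
print(f"exact mean over all records:   {exact:.3f}")
```

In this sketch only 5% of the records are read to produce the estimate, and selecting more blocks would tighten it at the cost of more computation. On a real cluster the per-block step could run on separate nodes, since it needs no communication between blocks.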

Please pay attention to the following rules for conducting an Internet seminar:
  • Please log in with your first and last name
  • During the talk, please mute your microphone
  • For the convenience of the presenter, it is recommended to submit your questions in the chat in advance