On the evening of November 19th, E Weinan, Deputy Director of the Beijing International Mathematics Research Center, Dean of Yuanpei College of Peking University, Professor at Princeton University in the United States, and Academician of the Chinese Academy of Sciences, visited the Caizhai lecture hall. Using examples from data research, he led the audience into the world of “big data”.

E Weinan took Google, the world’s largest search engine, as an example to analyze the application of data computing in advertising delivery. Since its founding, Google has faced fierce competition from other search engines, yet its market value grew tenfold within less than ten years of going public. Search engines must deal with complex network data, and the question of how to find sound algorithms that extract useful information accurately and efficiently has pushed the search industry toward “cloud computing” in the context of big data. Google is no exception, but while continuing to develop and improve its search functions, it has distinctively tied advertising delivery to users’ search preferences. In 2012, its search ads achieved a click-through rate of 3.47% and a conversion rate of 5.63%, bringing in daily advertising revenue of 100 million U.S. dollars. Google’s strategy of combining data computing with advertising delivery even gave birth to a new discipline: computational advertising.

Data computing was already in use in many fields before the Internet age. In the 17th century, the German astronomer Kepler discovered the three laws of planetary motion by working through a large body of astronomical data collected by his predecessors. Later, Newton used his second law and the law of universal gravitation to derive Kepler’s laws rigorously in mathematical terms and to reveal their physical meaning, thereby achieving “not only knowing what is happening, but also knowing why.”
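The step from data to law can be illustrated with a small sketch that was not part of the lecture. Assuming modern rounded values for the semi-major axis (in astronomical units) and orbital period (in years) of six planets, a least-squares fit on a log-log scale recovers the exponent in Kepler's third law, which states that the square of the period is proportional to the cube of the semi-major axis. The function name `fit_power_law` and the data table are illustrative choices, not anything Kepler or the speaker used.

```python
import math

# Semi-major axis (AU) and orbital period (years) for six planets.
# Modern rounded values, used here only for illustration.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(points):
    """Least-squares fit of log(T) = slope * log(a) + intercept."""
    xs = [math.log(a) for a, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_power_law(list(PLANETS.values()))
print(f"fitted exponent: {slope:.3f}")
```

The fitted exponent comes out very close to 1.5, which is the data-driven form of the law; explaining why the exponent is 3/2 is what Newton's theory later supplied.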

Image data processing and recognition is currently a hot research topic and another real-world application of data computing. E Weinan pointed out that image recognition technology relies more on model-based mathematical operations than on object-oriented computer algorithms. Unfortunately, current image recognition and search technology still “does not exceed the level of web search before the advent of Google.”

Expert recommendation systems are another major application of data computing. E Weinan used the online movie rental provider Netflix to illustrate this point. Netflix records users’ viewing habits, analyzes the data with sophisticated algorithms, and then makes detailed, personalized video recommendations based on each user’s preferences. Users can watch these “tailor-made” programs on a PC, a TV, or mobile devices such as the iPad and iPhone. Besides online movie rental, shopping websites such as Amazon and Taobao and dating websites such as Jiayuan also rely on expert recommendation systems.
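The report does not describe how Netflix's algorithms work, so the following is only a minimal sketch of one classic technique, user-based collaborative filtering, and should not be read as the company's method. The users, titles, and ratings are made up, and the helper names `cosine_similarity` and `recommend` are hypothetical.

```python
import math

# Hypothetical ratings (1-5); users and titles are invented for illustration.
RATINGS = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob": {"Movie A": 4, "Movie B": 5, "Movie D": 5},
    "carol": {"Movie A": 1, "Movie C": 5, "Movie D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Rank items the user has not seen by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {i: scores[i] / weights[i] for i in scores if weights[i] > 0}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)


print(recommend("alice", RATINGS))
```

Here "alice" agrees with "bob" far more than with "carol", so bob's high rating of "Movie D" dominates the prediction. Production systems typically add mean-centering of ratings, matrix factorization, and many other signals that this sketch leaves out.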

In addition, big data has produced notable results in video processing, social network analysis, and public opinion analysis.

(Photo: lecture scene)

Finally, E Weinan introduced the basic concepts of data science. The basic problem data science needs to solve is to find, from the given data, the model that generated it, so data analysis is in essence an inverse problem. In the Internet age, data is complex and noisy, so how should such data be modeled? E Weinan offered an approach: for sets of data points, Bayesian models, Gaussian mixture models, and the like can help; generalized time series data, such as text and biological macromolecules, can be handled with hidden Markov models; and two-dimensional field data such as images can be handled with conditional random field models. E Weinan summarized this approach as “maximum likelihood estimation, maximum a posteriori estimation”.
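The difference between the two estimation principles named above can be seen in the simplest possible model. The sketch below, which is an illustration and not an example from the lecture, estimates a click probability from a handful of observations: maximum likelihood uses the data alone, while maximum a posteriori estimation also weighs in a prior. The Beta prior parameters `alpha` and `beta` and the counts are assumed values chosen for illustration.

```python
def bernoulli_mle(successes, trials):
    """Maximum likelihood estimate of a Bernoulli success probability."""
    return successes / trials


def bernoulli_map(successes, trials, alpha, beta):
    """Maximum a posteriori estimate under a Beta(alpha, beta) prior.

    This is the mode of the Beta(alpha + k, beta + n - k) posterior and
    assumes both posterior parameters exceed 1.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


# Hypothetical data: 3 clicks in 10 impressions.
clicks, impressions = 3, 10

mle = bernoulli_mle(clicks, impressions)
# Assumed prior belief that clicks are rare (prior mode at 1/20).
map_estimate = bernoulli_map(clicks, impressions, alpha=2.0, beta=20.0)

print(f"MLE: {mle:.3f}")
print(f"MAP: {map_estimate:.3f}")
```

With so little data, the prior pulls the estimate well below the raw frequency; with a uniform prior (`alpha = beta = 1`) the two estimates coincide, and as the number of observations grows the prior's influence fades.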

The basic methods of computational science have three dimensions: assigning mathematical structure to data, building statistical models, and finding algorithms. E Weinan particularly emphasized that computational mathematics deals with algorithms for functions, that is, algorithms for continuous problems, such as function approximation, differentiation, integration, optimization, differential equations, and numerical algebra. Computer science, by contrast, deals with algorithms for computer systems (including networks), such as numerical and matrix operations, network algorithms, sorting, and combinatorial optimization. Data algorithms sit between the two and combine the strengths of both.

The ubiquity and complexity of data pose many problems and difficulties for data science. Data science is also interdisciplinary. On the one hand, it involves many disciplines, such as statistics, machine learning, bioinformatics, astroinformatics, computational advertising, and computational sociology; on the other hand, there is unity across these disciplines: hidden Markov models, for example, are used in both natural language processing and gene sequence analysis. Peking University has established undergraduate and graduate majors in data science.
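One way to see this unity is that the decoding routine for a hidden Markov model does not care what the symbols mean. The sketch below is a generic Viterbi decoder, offered as an illustration and not as material from the lecture; it is applied to a toy DNA sequence, but the same function would accept words and part-of-speech tags. The two hidden states and all probabilities are made-up values.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a sequence of observations."""
    # best[t][s] = (log-probability of the best path ending in s at time t,
    #               state preceding s on that path)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Toy model with assumed probabilities: GC-rich versus AT-rich regions.
STATES = ("GC-rich", "AT-rich")
START = {"GC-rich": 0.5, "AT-rich": 0.5}
TRANS = {
    "GC-rich": {"GC-rich": 0.9, "AT-rich": 0.1},
    "AT-rich": {"GC-rich": 0.1, "AT-rich": 0.9},
}
EMIT = {
    "GC-rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT-rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path = viterbi("GGCGCATTATA", STATES, START, TRANS, EMIT)
print(path)
```

For this toy input the decoder labels the first five bases as GC-rich and the remaining six as AT-rich. Swapping in word emissions and tag transitions would turn the same function into a simple part-of-speech tagger, which is the kind of cross-disciplinary reuse the paragraph describes.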

“Big data” has recently been one of the hottest topics in the media, and it has been widely applied in all walks of life to the benefit of ordinary people, but how to put “big data” into practice remains a question every sector needs to consider. E Weinan pointed out that only by establishing and refining the discipline of data science and cooperating closely with industry can “big data” finally be put into practice and the historic opportunities of the “big data” era truly be seized.

(Photo: audience questions)

After the lecture, E Weinan had a lively interaction with the teachers and students on the topic of data science. (Edited from Peking University News Network)