It’s just the chain-rule of multivariable partial derivative
My ML course teacher said this in a lecture this year while teaching the backpropagation formula for deep learning (DL). He is a maths genius who ranked №1 as an undergraduate at Peking University and then went to Princeton to pursue a Ph.D. under a top professor in online learning. His original words were: “DL is nothing fancy, it’s just the chain-rule of multivariable partial derivative”. He was commenting on DL from a mathematical point of view: as an ML theory researcher, he works on all kinds of mathematical problems that are more complicated than backprop.
Yes, I hate exaggerating the current AI trend the way news articles from “BATTMD” do (the six Chinese Internet giants: Baidu, Alibaba, Tencent, Toutiao, Meituan, Didi). Over the past two years (2016–2018), they have regularly published big news declaring that “Our AI technology is reforming the X traditional field”. Has their AI really changed traditional fields like transportation, health, and education? I don’t believe so. We still have a long way to go.
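To make the remark concrete, here is a minimal sketch in plain Python. It is not taken from the lecture: the one-hidden-unit network, the weights `w1` and `w2`, the tanh activation, and the squared-error loss are all assumptions chosen for illustration. It computes the gradients by applying the chain rule step by step, then checks them against finite differences.

```python
import math

def forward(w1, w2, x, y):
    """Toy network (illustrative): h = tanh(w1 * x), y_hat = w2 * h,
    loss = 0.5 * (y_hat - y)^2."""
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    loss = 0.5 * (y_hat - y) ** 2
    return h, y_hat, loss

def backward(w1, w2, x, y):
    """Gradients of the loss w.r.t. w1 and w2, one chain-rule factor at a time."""
    h, y_hat, _ = forward(w1, w2, x, y)
    dloss_dyhat = y_hat - y            # d(loss)/d(y_hat)
    dloss_dw2 = dloss_dyhat * h        # because y_hat = w2 * h
    dloss_dh = dloss_dyhat * w2        # propagate back through y_hat = w2 * h
    dh_dz = 1.0 - h ** 2               # derivative of tanh(z), with z = w1 * x
    dloss_dw1 = dloss_dh * dh_dz * x   # because z = w1 * x
    return dloss_dw1, dloss_dw2

def numeric_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    g1 = (forward(w1 + eps, w2, x, y)[2] - forward(w1 - eps, w2, x, y)[2]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, y)[2] - forward(w1, w2 - eps, x, y)[2]) / (2 * eps)
    return g1, g2

# Arbitrary example values, chosen only for the check.
analytic = backward(0.5, -1.2, 2.0, 1.0)
numeric = numeric_grad(0.5, -1.2, 2.0, 1.0)
print(analytic, numeric)
```

The two pairs of numbers should agree to several decimal places. Frameworks automate exactly this bookkeeping for networks with millions of parameters, but each step is still a product of partial derivatives.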
Not a joke
Here is a popular joke on Twitter:
CEO: We need to use AI for this project.
CTO: Machine learning algorithms will be used for the project.
Project manager: We just need a simple neural net.
Data engineer: Linear regression will do the trick.
Software engineer: if (x>5) then…
Seriously, this is not a joke. This kind of implementation was prevalent before 2016, when AI was not so hot. In the early days of the Internet industry, we didn’t have “big data”, so fancy data-driven methods didn’t make sense. Even now, rule-based algorithms have their value in production. Sometimes human priors are more accurate than a DL model.
Think about driverless cars. Would you dare to sit in a car driven by an unexplainable deep learning model? As the picture below shows ([Junfeng Yang, Deepxplore, SOSP 17]), a simple left-or-right bug like this could cost a human life. Of course, nobody dares to trust their life to such an unstable system.
While companies and researchers in China are thinking about how to follow the AI trend, American researchers are thinking in a more realistic way: “Can we fix this kind of DL bad case just by modifying a simple piece of if-else code?” That is why combining rule-based algorithms with ML models, and software verification for ML, are also valuable research topics in Computer Science. Human priors can even be more accurate. Not every problem is a good fit for statistical machine learning. (Note: I am not working on ML/DL software verification, but I am curious about progress in this subarea.)
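As a rough illustration of patching a model's bad case with a piece of if-else code, here is a minimal Python sketch. Everything in it is hypothetical: `learned_steering` is a hand-written stand-in for a trained model, and the thresholds are made-up numbers. It is not the method from the DeepXplore paper, only a picture of a human prior wrapped around a learned prediction.

```python
MAX_SAFE_ANGLE = 30.0   # hypothetical limit on the steering angle, in degrees
MAX_JUMP = 20.0         # hypothetical limit on the change between two frames

def learned_steering(brightness):
    """Stand-in for a trained model (not a real one): it behaves sensibly
    on bright input and produces a wild angle on dark input."""
    return 5.0 if brightness > 0.3 else -80.0

def guarded_steering(brightness, previous_angle):
    """Rule-based guard: keep the previous angle when the model's output
    violates a simple human prior about plausible steering."""
    angle = learned_steering(brightness)
    if abs(angle) > MAX_SAFE_ANGLE or abs(angle - previous_angle) > MAX_JUMP:
        return previous_angle   # the rule overrides the model
    return angle

print(guarded_steering(0.8, 4.0))   # model output passes the rules
print(guarded_steering(0.1, 4.0))   # model output is rejected by the rules
```

The guard does not make the model explainable. It only bounds the damage of a bad prediction, which is the modest kind of help a rule can offer in production.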
The attitude towards AI
Although I used to be the chief software architect of a so-called AI product and am now an AI researcher, I decline to exaggerate the current AI trend. I am not the only practitioner who does. Pioneering researchers like Eric Xing and Michael Jordan are excellent role models in the AI community. They insist on wearing two hats. One is refusing to exaggerate AI. The other is admitting that today’s statistics-based AI is only a very early stage of humanity’s ultimate dream of AI, while still devoting all their energy to giving AI systems real human intelligence, such as inference, learning from limited samples, and creativity.
Have a look at what Michael Jordan said this year:
Artificial Intelligence — The Revolution Hasn’t Happened Yet https://medium.com/@mijordan3/artificial-intelligence-the-revolution-hasnt-happened-yet-5e1d5812e1e7
I admire MJ and Eric Xing’s attitude towards AI.
Thanks to Prof. Kai-fu Lee’s lectures, interviews, and book publication in the USA this month, friends in America can understand China better. But I am not interested in the book. Most of its readers are probably government officials, entrepreneurs, product managers, salespeople, and general AI fans. True researchers know what our 1-year-old AI infant can do.
Standing on the shoulders of traditional EECS achievements
After first graduating from college, I have worked and studied in Computer Science and the Internet industry for ten years. Over those ten years, I’ve worked in traditional Computer Science research areas including Data Mining, Database Systems, Mobile OS, Distributed Systems, and Cloud Computing. From the perspective of system architecture design and optimization, Deep Learning is nothing fancy. It’s just standing on the shoulders of giants. These giants are Operating Systems, Distributed Systems, Advanced Programming Languages, Traditional Algorithms, Computer Networks, and Hardware. All of these are core research areas of traditional Computer Science. The achievements of traditional EECS are what make large-scale machine learning computation possible. The workflow of tuning parameters in TensorFlow and PyTorch stands on the same foundations.
Can the three DL pioneers and gurus (Yann LeCun, Geoff Hinton, Yoshua Bengio) share the Turing Award for their Deep Learning achievements? We cannot predict whether, several years from now, DL will be proved explainable by mathematics. But at least the most recent Turing Award was granted to John L. Hennessy and David A. Patterson (https://www.youtube.com/watch?v=3LVeEjsn8Ts), who pioneered a systematic, quantitative approach to the design and evaluation of computer architectures with an enduring impact on the microprocessor industry. Modern computer architecture is where human AI begins. AI stands on modern computer architecture, which has a 40-year history. That’s where we are.
Again, “Deep Learning is nothing fancy, it’s just the chain-rule of multivariable partial derivative”.