Has Big Data Become Obsolete?

Rather than saying that "big data" has become obsolete, it may be more accurate to say that its era has not truly begun. As long as Moore's Law continues to hold (roughly, that the number of transistors on an integrated circuit doubles every 18 to 24 months), the age of big data is still only approaching. As computing power keeps growing and storage costs keep falling, people have ever more resources to collect larger volumes of data and to analyze them at finer granularity. In traditional data analysis, however, once the volume of data reaches a certain scale, the results stop improving.
Take the simplest example: linear classification. Imagine red balls and blue balls scattered across a plane. We draw a straight line that separates the two groups as well as possible, and then use that line to predict the color of any new ball from its position alone, even if the ball itself is hidden from view. It is easy to see that because the model is so simple, just a single straight line, massive amounts of data do little to improve its accuracy. This is one of the problems traditional data science ran into, and it is also the main bottleneck of machine learning, its primary analytical tool: beyond a certain point, more data does not mean more value, as the brief sketch below illustrates.
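Here is a minimal sketch of the red-ball/blue-ball example, assuming NumPy and scikit-learn are available; the dataset sizes and parameters are illustrative. It fits a linear classifier on growing slices of the same data and shows accuracy flattening out, since a single straight line can only do so well.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Two overlapping point clouds on a plane: "red" vs. "blue" balls.
X, y = make_classification(n_samples=100_000, n_features=2, n_informative=2,
                           n_redundant=0, class_sep=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

for n in [100, 1_000, 10_000, 80_000]:
    clf = LogisticRegression().fit(X_train[:n], y_train[:n])
    acc = clf.score(X_test, y_test)
    print(f"{n:>6} training points -> test accuracy {acc:.3f}")
# Typical output shows accuracy climbing at first, then plateauing:
# the model (one line) saturates long before the data runs out.
```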
Deep learning broke through this bottleneck. Put simply, it passes data through many layers of simple computational operators, which makes it possible to build models complex enough to keep improving as the data grows. The approach is also called a neural network, because each operator is tiny and densely interconnected, much like a neuron. The name carries no real biomimetic significance; the structure simply resembles a web of nerves (a minimal sketch of this layered structure follows). Under this approach, larger datasets often yield higher accuracy, and quantitative growth in data can even produce a qualitative leap in performance. Data scientists' appetite for data has surged as a result, and big data science has grown up around it.
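The sketch below, in plain NumPy, shows the "many layers of tiny operators" idea in miniature: each unit is a weighted sum followed by a nonlinearity, and stacking layers lets the model trace boundaries far more complex than a single straight line. The layer sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)  # the nonlinearity applied at each "neuron"

# Layer 1: maps 2 input features (a position on the plane) to 16 hidden units.
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
# Layer 2: maps the 16 hidden units to a single score.
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def forward(X):
    h = relu(X @ W1 + b1)                     # first layer of interconnected operators
    return 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid -> probability of "red"

X = rng.normal(size=(5, 2))   # five points on the plane
print(forward(X).ravel())     # untrained scores; training would fit the weights
```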
One criticism of deep learning is that as models grow more complex, people can no longer read off the machine's classification criteria the way they can read off a straight line. With that black box in the way, machine learning can look like witchcraft. For example, if we train a model on a set of strong essays and weak essays, it can then score new essays, but those scores are derived only from the examples it has learned from, and the machine cannot explain why it assigned any particular score. This significantly undermines trust in the results. That said, recent work on explaining the principles behind deep learning algorithms may be the first step in turning deep learning from "witchcraft" into a science with theoretical support.
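A hypothetical sketch of the essay-scoring setup follows; the features and scores are invented for illustration, not taken from any real system. The model learns a mapping from essay features to grades, but its answer to "why this score?" is buried in thousands of learned weights, nothing like reading off the slope of a line.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-in features for 200 graded essays (e.g. length, vocabulary richness,
# syntax statistics) and the scores human graders assigned them.
X_train = rng.normal(size=(200, 8))
y_train = rng.uniform(1, 10, size=200)

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(X_train, y_train)

new_essay = rng.normal(size=(1, 8))
print(f"predicted score: {model.predict(new_essay)[0]:.1f}")
# The model returns a number, but offers no human-readable justification:
# the "explanation" is distributed across all of its internal weights.
```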
In any case, with the rapid development of deep learning, big data has probably only just lifted a corner of the curtain; it is still far from having fully arrived. As deep learning, and the artificial intelligence increasingly built on top of it, continue to advance, the demand for data will only keep growing in scale. Perhaps only then will the true "big data era" really begin.


