Working with models vs. working with data
3 years ago
If I were to name the single most common mistake of rookie Data Scientists, it would be their focus on the model instead of the data.
“If I only tweak this hyperparameter, if I only change this neural layer, suddenly all the pieces of the puzzle will fall into place and my results will skyrocket,” they often think.
Well, it’s almost never true.
To understand why changing the model will not change much, I propose looking at the training process somewhat differently. Let us see it as a way of extracting information. In this view, the model doesn’t “learn” anything; as training continues, it just extracts more and more useful information. Only at the very end is this “information set” translated into classes, probabilities, bounding boxes, or whatever the model’s task was.
This change in perspective allows us to see two fundamental assumptions that we have when working with any model: that the data does have this “useful information” in it, and that the model is able to extract it.
And the latter is the key to scoring high. One has to understand the limitations of the model in this regard.
For example, decision trees don’t understand ratios. Each split compares a single feature with a threshold, so if the task requires dividing one feature by another, the tree has no direct means to do it. The division has to be done by a Data Scientist and served to the model “on a plate”; otherwise the model will not use it. And no amount of hyperparameter tuning will help.
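To make the ratio point concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not a real tree implementation: a single threshold split stands in for one node of a decision tree, the data is synthetic, and the helper name `best_stump_accuracy` is made up for this example. The label depends only on whether `a / b` is greater than 1.

```python
import random


def best_stump_accuracy(values, labels):
    """Best accuracy a single threshold split on one feature can reach.

    Hypothetical helper: it tries every observed value as a threshold,
    in both directions, much like one node of a decision tree would.
    """
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values)):
        # Rule: predict 1 when the value is above the threshold.
        correct = sum((v > threshold) == bool(y) for v, y in zip(values, labels))
        # The flipped rule (predict 1 when below) scores n - correct.
        best = max(best, correct / n, (n - correct) / n)
    return best


# Synthetic data (an assumption of this sketch): two independent features.
rng = random.Random(0)
a = [rng.uniform(1.0, 10.0) for _ in range(500)]
b = [rng.uniform(1.0, 10.0) for _ in range(500)]

# The target depends only on the ratio of the two features.
labels = [int(x / y > 1.0) for x, y in zip(a, b)]

# The hand-made feature that the Data Scientist has to provide.
ratio = [x / y for x, y in zip(a, b)]

acc_a = best_stump_accuracy(a, labels)
acc_b = best_stump_accuracy(b, labels)
acc_ratio = best_stump_accuracy(ratio, labels)

print(f"split on a:     {acc_a:.2f}")
print(f"split on b:     {acc_b:.2f}")
print(f"split on a / b: {acc_ratio:.2f}")
```

A split on either raw feature stays well short of perfect, while one split on the engineered ratio separates the classes exactly. A deeper tree could approximate the diagonal boundary with a staircase of splits on `a` and `b`, but it would need more depth and more data to do what one hand-made feature does in a single step.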
Another example: convolutional neural networks and rotations. They don’t “understand” rotations, so they can give two very different predictions for the same object once it is rotated. To minimize this effect, a Data Scientist must make sure the network sees a lot of rotated objects during training. Swapping one layer for another will not fix this blind spot.
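The rotation fix lives in the data pipeline, not in the model. A rough sketch of the idea, assuming images are plain 2D grids (lists of rows) and using only quarter-turn rotations; the function names `rotate90` and `augment_with_rotations` are hypothetical, and a real pipeline would more likely use arbitrary angles from an image library:

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment_with_rotations(image, label):
    """Return the image at 0, 90, 180 and 270 degrees, all with the same label."""
    samples = []
    current = [list(row) for row in image]  # copy, so the input is not modified
    for _ in range(4):
        samples.append((current, label))
        current = rotate90(current)
    return samples


# A tiny "image" of a corner shape; the label is made up for illustration.
image = [
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
]

augmented = augment_with_rotations(image, "corner")

for rotated, label in augmented:
    print(label)
    for row in rotated:
        print(row)
```

One training sample becomes four, each showing the same object in a different orientation, so the network gets a chance to extract the information that the label does not depend on rotation.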
And this part of the job is really hard, since it’s way easier to just change a number here or a layer there than to analyze the data and figure out what kind of new feature or new augmentation is needed. Especially since this is usually not taught in Machine Learning courses.
Nevertheless, this field is called Data Science, not Data Models, for a reason :)