Performance metrics aren't everything


Lately I’ve been getting pretty annoyed by the obsession with performance metrics. It’s like someone let the word out about what the area under the ROC curve is, and suddenly everyone thinks it’s the only measure of whether a data science project is ‘good’ or not.

The problem is, whether a model is good or not typically relies on much more than [insert your favorite performance metric here]. Yes, if your model is predicting at chance, it’s almost certainly useless. But even slightly above chance it might be immensely useful (conversely, even with perfect predictions it might be useless). The usefulness of a predictive model is a function of what it enables you to do that you wouldn’t be able to do without it. What actions or interventions will the project as a whole allow that would not otherwise happen, and how valuable are those?

Unfortunately, a little knowledge can be a dangerous thing. I’ve had projects where the predictive accuracy of a model was completely irrelevant - the model was built just to eliminate the effects of a few variables on some outcome (i.e. to get the residual). After I showed how well the model captured the trends in those few variables, the non-technical people who had seen a predictive model before immediately started asking about the area under the ROC curve, and judging the project on it. Yet the value of the project was simply removing those trends from the outcome, which the model did beautifully.

Return On Investment (brought up often and referred to as ROI by those who want you to know they’ve taken a business class) is a better concept to keep in mind when thinking about the value of a project, even if in practice it’s often not possible to work it out exactly. What is the (value - cost) of having this model in production? The costs of false predictions and the payoffs of true predictions are just as important to figuring this out as the predictive accuracy of the model. There is a lot that a data scientist can do to alter these costs and payoffs, and sometimes that’s a better place to focus effort than getting an extra 0.00001 precision in your model.
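A back-of-the-envelope version of that (value - cost) calculation might look like the sketch below. All the numbers are invented for illustration - the point is that a model barely above chance can still come out well ahead when correct predictions are valuable and errors are cheap.

```python
# Rough expected value of putting a model in production, using a
# hypothetical confusion matrix and made-up dollar values.

def expected_value(tp, fp, tn, fn,
                   payoff_tp, cost_fp, payoff_tn, cost_fn,
                   run_cost):
    """Net value = payoffs from correct predictions
    minus costs of errors and of running the model."""
    gains = tp * payoff_tp + tn * payoff_tn
    losses = fp * cost_fp + fn * cost_fn
    return gains - losses - run_cost

# ~52% accuracy on 2000 cases: only slightly above chance,
# but true positives are worth a lot and mistakes cost little.
value = expected_value(
    tp=520, fp=480, tn=520, fn=480,
    payoff_tp=100.0, cost_fp=5.0,
    payoff_tn=0.0, cost_fn=5.0,
    run_cost=10_000.0,
)
print(value)  # → 37200.0, positive despite near-chance accuracy
```

Conversely, plug in a tiny payoff or a huge cost per error and even a near-perfect model can come out negative - which is the whole point.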

For example, often a model is expected not only to make a prediction, but to give some idea of why that prediction was made. This is what makes tools like LIME, which explain where a prediction has come from, so useful. Unfortunately, they don’t completely solve the issue, because explanations are complicated. Some explanations are less useful than others. Explaining that a patient is likely to be sick because they are old might be true, but that doesn’t make the explanation or the prediction useful. The additional work of picking out those features that yield explanations of interest (or grouping features together in a way that makes sense) is also necessary. Doing this work well might create a tool that drastically improves the efficiency of some process. Doing it poorly might mean your model directs effort to the wrong places, creating cost and no value. Both are in the realm of possibility regardless of how accurate your model is.
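That "picking out features of interest" step can be as simple as a thin post-processing layer over the explanation output. The sketch below assumes a LIME-style explanation as a list of (feature, weight) pairs; the feature names, weights, and the notion of an "unactionable" feature are all invented for illustration.

```python
# Filter a LIME-style explanation (a list of (feature, weight) pairs)
# down to features the end user can actually act on.
# Feature names and weights here are hypothetical.

UNACTIONABLE = {"age", "sex"}  # true contributors, but not useful to act on

def actionable_explanation(explanation, top_n=3):
    """Keep the strongest contributors the user can actually change,
    ranked by the magnitude of their weight."""
    actionable = [(f, w) for f, w in explanation if f not in UNACTIONABLE]
    actionable.sort(key=lambda fw: abs(fw[1]), reverse=True)
    return actionable[:top_n]

exp = [("age", 0.41), ("blood_pressure", 0.25),
       ("smoker", 0.18), ("sex", -0.10), ("exercise", -0.07)]
print(actionable_explanation(exp))
# → [('blood_pressure', 0.25), ('smoker', 0.18), ('exercise', -0.07)]
```

Notice that "age" was the single biggest contributor, yet it never reaches the user - the useful explanation is the one they can do something about.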

There are plenty of other examples of ways you can use a predictive model beyond just its prediction, or add value on top of that prediction. Knowing a performance metric doesn’t mean you can judge a project’s value. Real projects aren’t Kaggle competitions where the only thing that matters is predictive accuracy.




