2 Comments

Fascinating example -- this reminds me of Milton Friedman's thermostat in many ways. Outside of your randomization approach, I'm not sure you can really apply a conventional confusion matrix analysis here. (Incidentally, have you tried determining an AUC for the model? It might be a more useful way of capturing the tradeoff between sensitivity and specificity here than a single point estimate.)

author

Interesting point on the thermostat - it's a very similar problem. For those unfamiliar with it (including me; I had to brush up), here's a link on that:

https://worthwhile.typepad.com/worthwhile_canadian_initi/2010/12/milton-friedmans-thermostat.html

I think we agree on the confusion matrix NOT being the right approach. And I was about to look at ROC/AUC for the model... but I didn't because:

1) The issue wasn't model selection or threshold selection, so I'm not sure how helpful a counterfactual like "what if they had chosen the top 10% as a threshold" (or some other model) would be. There weren't really competing models, and there was no data to evaluate them on. I was looking at how well they did, given the choices the groups made, using metrics like accuracy, F1, precision, recall, etc.

There was talk in the Census community about other metrics they should have incorporated, like broadband connectivity rates (there was an online option, after all), that weren't part of the Low Response Score. But those suffer from the same thermostat issue: if outreach had been funneled that way, the results might have been different.

2) To your point, the purpose of the exercise was to show how inadequate a conventional model-evaluation approach is because of the thermostat issues. You could easily come to the wrong conclusion using an AUC or confusion-matrix approach.

The wonky way to think about it: for the tracts in the false-positive and false-negative buckets, you can't tell whether they landed there because the interventions worked or because the predictions didn't (misclassification error).

Where I do think a confusion-matrix approach is helpful: you can use it to put bounds on effectiveness. For example, say you thought outreach could at most lift a geography's self-response rate by 5 percentage points. You could apply that correction to the true positives and false positives and see how it moves the confusion matrix. Not a scientific approach, but helpful for getting a ballpark.
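A minimal sketch of that bounding exercise. Everything here is made up for illustration -- the 60% threshold, the 5-point maximum lift, and the tract data are assumptions, not numbers from the actual Census analysis:

```python
# Hypothetical sketch: strip an assumed maximum outreach lift from the
# targeted (predicted-low-response) tracts, then see how the confusion
# matrix moves. All names, thresholds, and data are illustrative.

THRESHOLD = 60.0  # tracts below this self-response rate count as "low response"
MAX_LIFT = 5.0    # assumed max boost (percentage points) outreach could deliver

# (predicted_low_response, observed_self_response_rate) per tract;
# predicted-low tracts are the ones that received extra outreach
tracts = [
    (True, 58.0), (True, 63.0), (True, 71.0),
    (False, 74.0), (False, 59.0), (False, 66.0),
]

def confusion(tracts, lift):
    """Tally (TP, FP, FN, TN) after removing an assumed lift from treated tracts."""
    tp = fp = fn = tn = 0
    for predicted_low, observed in tracts:
        # counterfactual rate: what the tract might have done without outreach
        rate = observed - lift if predicted_low else observed
        actual_low = rate < THRESHOLD
        if predicted_low and actual_low:
            tp += 1
        elif predicted_low and not actual_low:
            fp += 1
        elif actual_low:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

print("no correction: ", confusion(tracts, 0.0))
print("5-pt correction:", confusion(tracts, MAX_LIFT))
```

With the correction applied, one "false positive" flips to a true positive -- a tract that looks like a miss may just be a tract where the outreach worked, which is exactly the ambiguity the bound is meant to ballpark.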
