Note: This post touches on a challenging but important topic — diversity and inclusion. I want to recognize ahead of time that as a cis white man, I speak of this from a place of privilege. If I get things wrong or say something offensive, I’d encourage you to shoot me an email (databetter@substack.com) and I’ll do my best to learn and correct things. I’m always trying to get better on these issues and have a long way to go. Thank you in advance for understanding.
When I began working on 2020 US census outreach, I thought it would be a wonky policy pursuit, not a central focus of political debate. By adding a question about citizenship at the 11th hour, the Trump administration ensured that every time I mentioned I was working on the census, eyebrows went up: the nationwide uproar was regularly discussed in both the headlines of the New York Times and the chyrons of Fox News.
I’m not going to dive deep into the politics of the debate, because that’s not the purpose of this project. But I do think it is a case study that provides some insight into the challenges of collecting sensitive data that too often get overlooked.
At first blush, it’s a bit surprising that the citizenship question was so controversial. This wasn’t anything new — the Census Bureau had been asking this same question for years without controversy on a different survey, the American Community Survey (ACS). ACS is not a small obscure survey: it surveys 2M-3M people a year. Moreover, both immigration advocates and critics support gathering better statistics on immigrant communities. It’s hard to make a case about immigration issues, regardless of your politics, without credible numbers. And historically, data from the Census Bureau has been considered one of the most trustworthy non-partisan sources of this information.1
Beyond the obvious politics, there were serious operational implications posed by this question.
Specifically:
Asking someone for their immigration status on an official government survey when the administration has a strong anti-immigration stance might scare some people away.23
People won’t respond accurately: they intentionally or unintentionally misstate their status at high rates, so the question is a poor way to collect these data.4
In 2018 Census staff analyzed ACS data and concluded that millions of households, particularly in the Latinx community, would skip the 2020 Census due to this question. Independent researchers from Harvard ran a large survey experiment and came up with even more dire predictions (see footnotes for links to the studies).
If people didn’t respond to the census themselves, it would mean the bureau would have to spend tens of millions of dollars paying people to go door-to-door to try to get responses.5 As you might imagine, this is not only expensive but pretty ineffective. Households unwilling to take a paper survey because of a sensitive question would also be unlikely to provide information to the census takers at their door. And even if people answered the question, did they know their status, and did they want to disclose it accurately?6
It’s important to recognize that the citizenship question wasn’t the only recent effort to add questions to the Census. The most notable other push was to get a question on sexual orientation and gender identity added to the 2020 Census.7 It wasn’t front-page news, but it was a serious effort. Had it been included, you might predict that this question would stir up a huge backlash from progressives and potentially alienate millions of people.
But you’d probably be wrong. In fact, it was progressive and LGBTQI+ advocacy groups, like the Human Rights Campaign, who pressed the bureau to include this question on many of its surveys.8 The Bureau had performed extensive testing and was basically ready to add it. Having talked to some of the amazing Census advocates from the LGBTQI+ community, I think the question would almost certainly have been included if Hillary Clinton had won the presidency.
The bottom line is that context matters a lot for data collection. Seemingly innocuous — or even intentionally inclusive — questions can become highly problematic depending on the context.
A more recent example of this is the State of Minnesota’s process to make COVID vaccine appointments. Their form has mandatory questions on race, ethnicity, gender, and sexual orientation (see screenshot below) that appear before one makes an appointment. Most of these questions have “prefer not to say” options, but they’re hidden in dropdown menus. I’m assuming that Minnesota is collecting these data for noble reasons — they want to figure out which communities may be lagging on signups. However, it would be very reasonable for someone to be skeptical of why the government needs this information for an appointment.
Ironically, asking these questions on a signup form may discourage the very people the State of Minnesota is trying to help. If you’re going to collect this information, a much better time would be while someone is in the waiting room after they’ve received the vaccine. Or if you’re using the data to see if there’s an issue with no-shows for appointments in certain subgroups, at least ask the question after they’ve been given an appointment.
There’s an interesting epilogue to the citizenship question story. Because of the controversy, the Census Bureau commissioned a very large-scale randomized controlled trial to figure out how the question might affect operations. Nearly 480K Americans were mailed a Census form in 2019, half with the citizenship question and half without.9 Surprisingly, the question had little effect: there were barely any differences in overall response rates, and the differences for subgroups like Latinx respondents were minor, much smaller than expected. The Census Bureau staff breathed a sigh of relief. And indeed, one way to interpret these results is that the liberal world got all worked up for no reason.
Another way to look at it is that the damage was done across the board and wasn’t detectable by their experiment. After all, you had to actually open the envelope (or online survey) and read the form to know whether it had the question on it. Having surveyed and interviewed a number of Census skeptics, I can tell you that most people assumed that the question was on the Census form, even after the Supreme Court had struck it down. My guess is that people who were worried about the citizenship question put the experimental mailer from the Census Bureau in the trash. Once the toothpaste is out of the tube, it’s hard to ram it back in.
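For readers curious how an experiment like the bureau’s is typically analyzed, here’s a rough sketch of a two-proportion z-test comparing response rates between the two arms. The counts below are invented for illustration; they are not the bureau’s actual results.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two response rates.

    x1, x2: number of responding households in each arm
    n1, n2: number of households mailed a form in each arm
    """
    p1, p2 = x1 / n1, x2 / n2
    # Pool the two arms to estimate the standard error under the null
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value


# Hypothetical counts, loosely modeled on a ~480K mailer split evenly
# between a form with the question and a form without it.
diff, z, p = two_proportion_z_test(124_800, 240_000, 126_000, 240_000)
print(f"difference = {diff:.4f}, z = {z:.2f}, p = {p:.4f}")
```

One thing a sketch like this makes concrete: with hundreds of thousands of households per arm, even a half-point gap in response rates is statistically detectable, which is why a null result carried real weight. It also shows the limit the paragraph above points at: the test can only compare the two arms to each other, so any chilling effect that hit both arms equally would be invisible to it.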
In sum, asking sensitive questions well is a really tricky thing.
Here are a few tips:
Don’t do it unless you have to! Look for other sources of data if they exist;
Check whether the current political and social climate will affect responses;
Pay special attention to the timing and language of the question;
And most importantly, talk to people from these groups to understand their concerns and try to address them;
If you’re on the fence, do a small-scale test to see if it’s an issue.
I’ll dive into more detail on these topics in future posts. But hopefully, it’s a helpful start. If you have suggestions/stories from your own experience, leave a comment below.
I recognize that the Census Bureau does have a sordid history of disclosing sensitive data to assist in detaining Japanese-Americans during WWII. But in recent history, it’s been fairly trusted and actively supported by advocacy groups.
Brown, J.D., Heggeness, M.L., Dorinski, S.M. et al. Predicting the Effect of Adding a Citizenship Question to the 2020 Census. Demography 56, 1173–1194 (2019). https://link.springer.com/article/10.1007/s13524-019-00803-4
Baum, M., Dietrich, B., Goldstein, R., & Sen, M. (2019). Estimating the effect of asking about citizenship on the US census: Results from a randomized controlled trial. https://scholar.harvard.edu/files/mbaum/files/baum_et_al_citizenship_question.pdf
Brown, J.D., Heggeness, M.L., Dorinski, S.M. et al. Predicting the Effect of Adding a Citizenship Question to the 2020 Census. Demography 56, 1173–1194 (2019). https://link.springer.com/article/10.1007/s13524-019-00803-4
See John Abowd’s January 18th, 2018 Memorandum https://assets.documentcloud.org/documents/4500115/Census-Admin-Record-Part-3.pdf
If people don’t respond, the Census Bureau makes a statistical guess (hot deck imputation, for the stats folks).
There was also a push to add a so-called “MENA” category for people who identify as Middle-Eastern or North African.
For example, see https://www.npr.org/2018/09/20/649752485/trump-officials-did-not-want-census-survey-to-ask-about-sexual-orientation
For the results of the Census’s citizenship question test, see this report https://www2.census.gov/programs-surveys/decennial/2020/program-management/census-tests/2019/2019-census-test-report.pdf
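As a toy illustration of the hot deck imputation mentioned in the footnotes: when a household doesn’t respond, a value is borrowed from a responding “donor” in the same group. This is a deliberately simplified sketch, not the bureau’s actual procedure, and the field names are invented for the example.

```python
import random


def hot_deck_impute(records, key):
    """Fill in missing values by borrowing from a randomly chosen
    respondent ("donor") in the same geographic group (here, a
    hypothetical "tract" field). A toy version of hot deck imputation.
    """
    # Build a pool of observed values per group
    donors = {}
    for r in records:
        if r[key] is not None:
            donors.setdefault(r["tract"], []).append(r[key])
    # Replace each missing value with a random same-group donor value
    for r in records:
        if r[key] is None:
            pool = donors.get(r["tract"])
            if pool:
                r[key] = random.choice(pool)
    return records


households = [
    {"tract": "A", "size": 3},
    {"tract": "A", "size": 2},
    {"tract": "A", "size": None},  # non-responding household
    {"tract": "B", "size": 4},
    {"tract": "B", "size": None},  # non-responding household
]
print(hot_deck_impute(households, "size"))
```

The real method conditions on far more than geography, but the core idea is the same: missing answers are filled with plausible values from similar respondents rather than left blank.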
Is there a commonly used way to ask sensitive questions that isn't "push polling"? Some sort of proxy data, or side-channel like international money transfers? Or is there a mandate that questions be more direct for the census in particular?
For the 2020 census citizenship question, reading the press it certainly seemed like kneecapping the response was the point. So other than serving as an example of what not to do, it doesn't really bear on someone making a good-faith effort to collect data. Is there an example of good data collection on a politically sensitive topic that you can point to as a contrast?