Race and the Politics of Statistics

When data collection = political power

May 09, 2021

A friend of mine asked me the other day why some surveys ask a question on race AND a different question on ethnicity. He was referring to surveys having one question on whether you identify as Latinx, Hispanic, or of Spanish origin and a separate question on racial category, without a Latinx option. Why not just ask one?

The short answer is that’s what the US Census Bureau does. And when people want to adjust survey responses to be “representative” of a population, what they do the vast majority of the time is use Census data to give more weight to respondents from underrepresented groups and less weight to respondents from overrepresented ones. You can only do this if your demographic questions align very precisely with the Census. The Census Bureau sets the standard because it has the most reliable baseline data.
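To make the weighting idea concrete, here is a minimal sketch of simple post-stratification in Python. It is one common way to reweight, not necessarily what any particular pollster does. The function name, the group labels, and the population shares are all made up for illustration; they are not real Census figures.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent.

    Each weight is the group's population share divided by the group's
    share of the sample, so underrepresented groups get weights above 1
    and overrepresented groups get weights below 1.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    # Every group in the sample needs a matching population category.
    # This is why survey questions have to line up with the baseline data.
    unmatched = set(counts) - set(population_shares)
    if unmatched:
        raise ValueError(f"No population share for groups: {sorted(unmatched)}")
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of numeric survey answers."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical baseline: group_b is 40% of the population...
population_shares = {"group_a": 0.6, "group_b": 0.4}
# ...but only 2 of our 10 respondents (20%) are in group_b.
sample_groups = ["group_a"] * 8 + ["group_b"] * 2
# 1 = agrees with some survey statement, 0 = disagrees.
answers = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

weights = poststratification_weights(sample_groups, population_shares)
unweighted = sum(answers) / len(answers)
weighted = weighted_mean(answers, weights)
```

In this toy sample, each group_b respondent counts roughly twice and each group_a respondent counts roughly three quarters, so the weighted share agreeing rises above the unweighted share. If a respondent lands in a category the baseline data doesn't have, the function raises an error, which is the code-level version of being stuck with the Census's question format.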

2020 Census Race and Ethnicity Questions

One of the proposed combined questions from the 2015 National Content Test (Sorry for the poor resolution - I couldn’t find a high-res version)

The longer answer involves digging into the contentious history of race categorization in America.1 Racial categorization has always been about power. Inclusion in data collection is official recognition. If you’re not counted, you don’t count. That’s why subgroups push for their own distinct categories and statistics.

It’s also why opponents of expanded rights for minorities vigorously oppose data collection for these groups. In a vacuum, this position doesn’t make any sense. Opponents of minority rights should want to track their “successes” just as much as advocates for these rights do. Yet at every turn, they fight against data collection because legitimizing these groups through data collection is in and of itself a “failure.”

The recent insistence on two separate questions for race and ethnicity is primarily about depressing and muddling the statistics on the Latinx community. Advocacy groups and scholars - including Rob Santos, the nominee for Census Bureau Director - have pushed for combining the questions.2 Specifically, they’ve proposed adding Latinx as a new category on the existing race question. The Census Bureau has done extensive tests which show that a combined question produces substantially better data precisely because it’s less confusing.3

This is common sense to most people; for example, I’ve never met someone from the Latinx community who has said they identify as a “Latina white female.” Similarly, I don’t introduce myself as a “non-Hispanic white male.” Yet that is exactly how the Census Bureau’s surveys expect people to think about race and ethnicity.4

It’s likely not an accident that the recent census and other official surveys haven’t adopted this new format. The Trump administration quietly stopped the adoption of a combined race and ethnicity question in its tracks.5 One can only speculate as to the motivations for keeping the old confusing format… but considering the administration’s insistence on a flawed question on citizenship, it doesn’t take too much imagination to see a Stephen Miller acolyte having a hand in this.

So what is a data collector to do? I’m a fan of a combined question, but I don’t think it’s necessarily the right choice in every case. If you’re doing a survey where you need to reweight responses relative to population data from the Census, you’re stuck with the existing Census two-question format. Though that format is imperfect, consistency is really helpful from a measurement perspective because it lets you compare apples to apples. If you don’t need weights, save a question and stick to the combined format.

Political debates on data collection aren’t unique to government; organizations confront similar issues all the time. For example, if you’re trying to build an inclusive organization, tracking statistics on subgroups is a critical step towards recognizing the issue and laying the groundwork for improvement.

In theory, this should be as easy as adding a question or two to an employee survey. In practice, the politics are much more complex. Those in power may resist data collection out of fear that the results might become public and be used against them. Not collecting data is a great way to sideline issues.6 People from subgroups may also not want to divulge personal information, such as sexual orientation, in company statistics for fear of further discrimination or of being tokenized. Getting buy-in from members of subgroups before advocating for data collection is critical to making sure these well-intentioned efforts aren’t themselves exclusionary.

I’d maintain it’s a worthy fight in most cases, even if there’s some initial reluctance.


I’ll be writing about this in some future posts. It’s a complex topic, so I don’t want to give it short shrift.

https://www.urban.org/urban-wire/separating-race-ethnicity-surveys-risks-inaccurate-picture-latinx-community

https://www.census.gov/newsroom/releases/archives/2010_census/cb12-146.html

There are people whose identities are intersectional (e.g., people who identify as biracial). The two-question format is even MORE confusing for them because there’s a separate racial category for “2 or more races.” This is particularly puzzling for people who identify as both Latinx AND white, who don’t really fit in the Census categories at all.

https://www.census.gov/newsroom/press-releases/2018/2020-race-questions.html

https://www.npr.org/2017/12/02/567843266/trump-administration-delays-decision-on-race-ethnicity-data-for-census

See Scott Ganz’s awesome work on this: https://pubsonline.informs.org/doi/10.1287/orsc.2017.1164

Marion

May 10, 2021

Living in North Africa, really interesting to see also what other categorisations have been explicitly pulled out and what is expected to be lumped together under other... and think of which other groups as a result might not be counted well if results splinter into many subgroups !


Benjamin Tseng

May 9, 2021

Thank you for explaining that which has always confused me about those surveys...

Data Better
