People holding umbrellas walk through New York’s Times Square in 2019. The US Census Bureau plans to change the way it protects the confidentiality of people’s information in the detailed demographic data it produces from the 2020 census. Mary Altaffer / AP
As the country waits for more results from last year’s national head count, the US Census Bureau faces an increasingly tricky balancing act.
How will the largest public data source in the US continue to protect people’s privacy while sharing the detailed demographic information that will be used to redraw voting districts, guide federal funding, and inform policy and research for the next decade?
Concerns have been growing among census watchers about how the bureau will strike that balance, beginning with the redistricting data, which is due to be released by mid-August.
This release is expected to be the first set of 2020 census statistics to include controversial new safeguards that bureau officials say are needed to keep individuals in public data anonymous and prevent the exploitation of their personal information. However, based on early tests, many data users are alarmed that the new privacy protections could render some of the new census statistics unusable.
The state of Alabama has filed a federal lawsuit to try to block the bureau from putting these new safeguards in place. The case is currently before a three-judge court that is expected to rule soon on a request for an emergency court order. Either way, the case is likely to reach the US Supreme Court. The legal challenge could ultimately derail the bureau’s schedule for releasing the data that many state and local redistricting officials need to prepare for upcoming elections.
Here’s what else you need to know:
Why does the Census Bureau need to protect people’s privacy?
Under current law, the federal government is not allowed to release personally identifiable information until 72 years after it was collected for the constitutionally required count. The bureau has relied on this promise of confidentiality to encourage many of the country’s residents to volunteer their information once a decade, especially Black people, immigrants, and other historically undercounted groups who may be unsure of how their answers could be used against them.
However, it is becoming harder for the bureau to keep that promise while continuing to publish statistics from the census. Advances in computing, along with access to voter registration lists and commercial records that can be cross-referenced, have made it easier to trace supposedly anonymized information back to an individual.
To get out of this bind, the bureau has built a new data protection system based on a mathematical concept known as differential privacy. The concept was invented at Microsoft’s research arm and has served as a framework for privacy protections in smaller Census Bureau projects as well as at some technology companies.
“Differential privacy is in every iPhone and iPad,” said Cynthia Dwork, a computer scientist at Microsoft Research and Harvard University, who co-invented differential privacy. “That may be larger than the number of respondents who took part in the ten-year US census, but there is an entity and commitment to privacy that are different here,” Dwork added of the bureau’s 2020 census plans.
How did the bureau protect people’s privacy in previous census data?
For decades, the bureau has been removing names and addresses from census records before turning them into anonymized statistics. This information is broken down by race, ethnicity, age, and gender to levels as detailed as a neighborhood.
But even in a sea of statistics, certain households – especially those that are in the minority within their community – can stand out because they live in remote areas or have other distinctive features that might make it easier to identify who they are.
To add privacy protection over the years, the bureau has withheld some tables of data, and sometimes certain cells in tables, from the public. The bureau has also added “noise” – or data that blurs the census results – to certain tables before they were published. Starting with data from the 1990 census, it used a technique called “swapping” to exchange data on specific households with data on households in other locations.
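The swapping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bureau’s actual procedure: the household records, the `swap_households` function, and the rule of pairing households of the same size are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict

def swap_households(households, swap_rate=0.5, seed=0):
    """Return a copy in which some same-size households trade census blocks.

    Assumption for this sketch: households are paired only if they have the
    same number of people, so block population totals do not change.
    """
    rng = random.Random(seed)
    result = [dict(h) for h in households]  # leave the input untouched
    by_size = defaultdict(list)
    for i, h in enumerate(result):
        by_size[h["size"]].append(i)
    for indices in by_size.values():
        rng.shuffle(indices)
        # Walk through the shuffled list two households at a time.
        for a, b in zip(indices[0::2], indices[1::2]):
            different_blocks = result[a]["block"] != result[b]["block"]
            if different_blocks and rng.random() < swap_rate:
                result[a]["block"], result[b]["block"] = (
                    result[b]["block"], result[a]["block"])
    return result

def block_population(households):
    """Total number of people per block."""
    totals = defaultdict(int)
    for h in households:
        totals[h["block"]] += h["size"]
    return dict(totals)

# Made-up records, for illustration only.
households = [
    {"id": 1, "block": "A", "size": 2},
    {"id": 2, "block": "B", "size": 2},
    {"id": 3, "block": "A", "size": 4},
    {"id": 4, "block": "C", "size": 4},
    {"id": 5, "block": "B", "size": 1},
    {"id": 6, "block": "C", "size": 3},
]
swapped = swap_households(households, swap_rate=1.0)
```

In this toy version, the published block totals stay the same while the link between a particular household and its block becomes uncertain, which is the general intent of swapping.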
What made the bureau choose differential privacy to protect the 2020 census data?
In 2016, the bureau’s researchers began in-house experiments to test the strength of the privacy protections used for the 2010 census data. Based on the results, the agency’s officials concluded that they could no longer rely on data swapping.
Using a fraction of the census data the bureau released a decade ago, researchers were able to reconstruct a complete set of records for every person included in the 2010 census figures. After comparing this reconstructed data with records from commercial databases, they were able to re-identify 52 million people by name, according to a court filing by John Abowd, the bureau’s chief scientist. In the worst case, the bureau’s researchers estimate that attackers with access to more commercial data could expose the identities of 179 million people, or 58% of the 2010 census population.
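The matching step described above can be pictured as a simple record linkage. The sketch below is a toy under stated assumptions, not the bureau’s experiment: the field names (`block`, `age`, `sex`), the `link_records` function, and every record are invented for illustration.

```python
from collections import defaultdict

def link_records(reconstructed, commercial, keys=("block", "age", "sex")):
    """Pair each reconstructed record with a name when exactly one
    commercial record shares the same values for the chosen keys."""
    index = defaultdict(list)
    for person in commercial:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for rec in reconstructed:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # an ambiguous match is not counted
            matches.append((rec, candidates[0]))
    return matches

# Invented records: anonymized rows rebuilt from published tables ...
reconstructed = [
    {"block": "1001", "age": 34, "sex": "F"},
    {"block": "1001", "age": 71, "sex": "M"},
    {"block": "1002", "age": 29, "sex": "M"},
]
# ... and a hypothetical commercial file that carries names.
commercial = [
    {"name": "Resident A", "block": "1001", "age": 34, "sex": "F"},
    {"name": "Resident B", "block": "1001", "age": 71, "sex": "M"},
    {"name": "Resident C", "block": "1002", "age": 29, "sex": "M"},
    {"name": "Resident D", "block": "1002", "age": 29, "sex": "M"},
]
matches = link_records(reconstructed, commercial)
names = [name for _, name in matches]
```

Here two of the three rebuilt records line up with a single named person, while the third stays ambiguous because two people share the same block, age, and sex. The more distinctive a person is within a small area, the more likely a match like this becomes.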
In an attempt to better protect people’s privacy for the 2020 census, the bureau announced in 2017 that it would create a new system based on differential privacy. According to bureau officials, it allows them to add the least amount of noise needed to preserve privacy in most of the published data and to strike a balance between confidentiality and usability.
“Of course it’s not that simple,” said the bureau’s acting director, Ron Jarmin, at the annual meeting of the Population Association of America this month, adding that the bureau decided against data swapping and withholding certain tables as alternative safeguards. “To achieve a similar level of data protection using these traditional methods, I think it would have resulted in a product that was even less useful to data users than what we are currently considering.”
How will differential privacy affect 2020 census data?
The bureau says no noise was added for privacy protection to the new state population figures, including those used to reapportion congressional seats and Electoral College votes, as well as the figures for Washington, DC, and Puerto Rico. The bureau also plans to release, without privacy noise, the total number of housing units in each census block, as well as the number of prisons, student dormitories, and other group quarters in each block.
However, it remains unclear how the bureau’s differential privacy plans will affect other new redistricting data expected by Aug. 16, including population numbers and demographic details for counties, cities and other smaller areas.
It depends on how much noise the bureau decides to add and how it tries to offset the effects of adding noise. Bureau officials plan to make their decisions on the new redistricting data in early June. Separate privacy decisions for other 2020 datasets are expected to be made later, after more public feedback is obtained.
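The trade-off between noise and accuracy can be illustrated with the textbook Laplace mechanism, a standard way to achieve differential privacy for a count. This is a minimal sketch and not the bureau’s production algorithm: the privacy parameter values, the function names, and the clean-up step in `postprocess` are assumptions chosen for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: a smaller epsilon means more noise, more privacy."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def postprocess(value):
    """Assumed clean-up step: publish a non-negative whole number."""
    return max(0, round(value))

def mean_abs_error(true_count, epsilon, trials=10000, seed=0):
    """Average distance between the noisy and the true count."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials

# A hypothetical block with 25 residents, under three privacy settings.
for epsilon in (0.1, 1.0, 10.0):
    rng = random.Random(42)
    published = postprocess(noisy_count(25, epsilon, rng))
    print(epsilon, published, round(mean_abs_error(25, epsilon), 2))
```

With this mechanism the typical error is roughly 1 divided by epsilon, so a count for a small block can be off by several people at strict privacy settings, while a count in the millions barely changes in relative terms. That is one reason the effects are expected to matter most for small areas and small groups.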
Why are the bureau’s differential privacy plans controversial?
Bureau officials have emphasized that their differential privacy plans are still a work in progress. They currently have until May 28 to collect feedback from the public before finalizing plans for the redistricting data next month.
Meanwhile, Alabama filed a federal lawsuit in March seeking to block the bureau from using differential privacy. The state claims the data will become unusable for redrawing voting maps. Sixteen states, most of which also have Republican-controlled legislatures, back Alabama’s claims in an amicus brief.
More lawsuits over differential privacy could come later, including from civil rights groups that have been monitoring the bureau’s test data to see if the new safeguards make it more difficult to ensure fair representation of people of color during redistricting.
“At this point in time, it doesn’t seem at all clear that anything the bureau has published precludes the possibility that the Voting Rights Act and its enforcement could be compromised by differential privacy,” said Thomas Saenz, president and general counsel of the Mexican American Legal Defense and Educational Fund, who also serves on one of the bureau’s external advisory committees.
What if the courts block the bureau from using differential privacy?
The release of the 2020 census redistricting data – which is already delayed due to the coronavirus pandemic and the Trump administration’s interference with the census schedule – could be delayed by “several months” beyond August, warned Abowd, the bureau’s chief scientist, in a court filing.
“This delay is inevitable as the Census Bureau would have to develop and test new systems and software,” added Abowd, later estimating the work could take at least six to seven months.
Editor’s note: Apple and Microsoft are among the financial backers of NPR.