Surveillance Academia

Sabyasachi Das

Sabyasachi Das is an Associate Professor of Economics at Ahmedabad University. His research lies at the intersection of political economy, public economics, and institutional analysis, with a focus on how governance structures, electoral processes, and public policy shape inequality and resource distribution. He holds a PhD in Economics from Yale University.

Academics and researchers sometimes have an unhealthy relationship with data. We are always craving ever larger and more granular datasets, hoping they will uncover deeper and more interesting relationships between social and economic phenomena. The craving turns toxic when the urge to access larger datasets overrides other concerns about how data is generated. This is especially worrying as we move from active forms of data generation to passive ones, driven by the omnipresence of digital technologies in our lives and the surveillance apparatus built around them.

Traditionally, micro-data collection has been largely manual. Researchers either conduct surveys—which are time-consuming and labor-intensive—or access internal records of firms, government agencies, or other organizations. The latter typically requires digitizing and compiling physical records, again involving substantial labor. These modes of data generation, which I refer to as active processes, remain prevalent in research. They require active engagement with human subjects for the purpose of data generation. As a result, the act of collecting data about some human or organizational activity (such as fertility choices, consumption decisions, or firm production and revenues) is separate from the activity itself. This decoupling collapses when data collection becomes passive. Consider a telecom company collecting cellphone metadata from its customers. These data are generated automatically—passively—during routine cellphone use. Data collection no longer requires additional labor, but it does require additional capital: investments in data-collection technologies and storage infrastructure. This shift toward a surveillance-based, capital-intensive model of data production has several important implications for research.

When data production is active, usage purposes are generally better defined, and research practices typically include built-in protocols that require informed consent from participants and adherence to ethical norms during data collection (such as asking only ethically permissible questions) and storage (for example, maintaining anonymity). This is especially true for survey-based research. Even firm-level data is often collected by government agencies operating under legal and ethical constraints. For instance, India’s Annual Survey of Industries, which collects firm-level information on revenues, employment, and capital, is conducted by the Ministry of Statistics and Programme Implementation. In conducting the survey, the Ministry must adhere to the guidelines of the Collection of Statistics Act, first passed in 1953 and amended in 2008. Similarly, in the United States, Title 13 of the U.S. Code (passed in 1954) provides the legal framework for census data collection, including firm data. Nordic countries, known for maintaining detailed social registries for the entire population, have independent public institutions (such as the Norwegian Data Protection Authority) whose primary purpose is to ensure compliance with data protection laws.

This is not to say that survey or institutional data collection is devoid of ethical issues [Khera, R. (2023). Some questions of ethics in randomized controlled trials. Review of Development Economics, 28(4), 2072–2087.]. With passive data collection, however, the ability to impose similar guardrails may be limited, as data access increasingly becomes a private process. Even if passive data collection can potentially be regulated, the use of the resulting data remains difficult to monitor and regulate. This issue arises both in the context of surveillance data held by private firms and data accessed or used by governments.

Researchers are increasingly collaborating with private organizations that collect vast amounts of micro-data, such as cellphone records, purchase histories, and transportation activity on ride-sharing platforms. As a result, researchers’ access to such data depends heavily on their relationship with the organization that holds the data. In practice, researchers must offer a value proposition in exchange for access. Academic researchers can help firms with strategy and, more importantly, can influence policy in the firms’ favor. The terms of these collaborations are often implicit and private, making them difficult for outsiders to evaluate.

A 2022 Guardian investigation [Lawrence, F. (2022, July 13). Uber paid academics six-figure sums for research to feed to the media. The Guardian.] examined leaked communications between academics and Uber executives, revealing the nature of such collaborations in that case. Academics were granted access to Uber’s data with the understanding that they would produce paid research favorable to the company, to be used for lobbying. Even unpaid research that the academics wanted to do with the data was supervised by Uber executives. As one executive put it, they would “work with” the researchers on framing the study and “decide what data we share with him.”

Sometimes, the nature of the research question itself signals the underlying relationship. For example, in a paper titled “Personalized Pricing and Consumer Welfare” [Dubé, J., & Misra, S. (2022). Personalized pricing and consumer welfare. Journal of Political Economy.], Dubé and Misra conduct experiments in collaboration with a digital job-matching firm. They train a machine-learning algorithm using the firm’s internal customer data to predict optimal personalized prices. (The customers for this firm are the potential employers seeking prospective employees.) The authors estimate that such pricing could increase the firm’s profits by 86%, and report that the firm’s subsequent pricing policy at least partly implemented their finding. The stated research question concerns the consumer welfare implications of personalized pricing. Yet, in the course of answering it, the firm learned how to design a more profitable pricing strategy.

There are similar examples of such “firm-sanctioned” studies. The research questions in these papers are all valid and intellectually interesting exercises. However, using firms’ surveillance data on their customers for research creates an externality for both the firm and its customers. The customers generated the data in the first place, but did not, and likely cannot, meaningfully consent to their data being used in this way.

Academics also increasingly collaborate with governments to study the effectiveness of using surveillance data—generated either within government systems or by private organizations—to improve policy implementation and governance outcomes. One prominent policy area, particularly in developing countries, is the targeting of welfare programs. Identifying poor households is a central challenge in implementing such programs, and governments traditionally rely on surveys for this purpose.

A growing body of research asks whether individual cellphone records can help governments identify poor households more effectively than surveys. The underlying hypothesis is that the cellphone usage patterns of poorer individuals differ systematically from those of others. If true, such methods could, in principle, be scaled up to reduce the inclusion and exclusion errors associated with survey-based targeting. Many of these studies are conducted in difficult settings where conducting surveys is particularly challenging: Afghanistan [Aiken, E. L., Bedoya, G., Blumenstock, J. E., & Coville, A. (2022). Program targeting with machine learning and mobile phone data: Evidence from an anti-poverty intervention in Afghanistan. Journal of Development Economics.], Rohingya refugee settlements in Bangladesh [Aiken, E., Ashraf, A., Blumenstock, J., Guiteras, R., Mobarak, A. M., & National Bureau of Economic Research. (2025). Scalable Targeting of Social Protection: When do algorithms Out-Perform surveys and community knowledge?], and crisis-affected contexts such as COVID-era Togo [Aiken, E., Bellue, S., Karlan, D., Udry, C., & Blumenstock, J. E. (2022). Machine learning and phone data can improve targeting of humanitarian aid. Nature, 603(7903).] and the Democratic Republic of Congo [Mukherjee, A. N., Bermeo Rojas, L. X., Okamura, Y., Muhindo, J. V., & Bance, P. G. A. (2023). Digital-first approach to emergency cash transfers: STEP-KIN in the Democratic Republic of Congo. World Bank.]. The data collection processes in these studies are generally careful about safeguarding individual privacy and make special efforts to meet ethical standards. The findings, however, are sobering. Across contexts, predictive models trained on cellphone records do not significantly outperform survey-based methods or methods using institutionally collected data.

The Afghanistan study, for example, finds that cellphone-based targeting “is nearly as accurate as the commonly employed asset- and consumption-based methods.” The Bangladesh study concludes that while survey-based targeting (the Proxy Means Test) “is more costly than phone-based targeting, it is also more accurate.” As a result, the policy conclusions are nuanced: phone-based methods cannot replace traditional approaches but may at best complement them. As the Togo study notes,

our results do not imply that mobile-phone-based targeting should replace traditional approaches reliant on proxy means tests or community-based targeting. Rather, these methods provide a rapid and cost-effective supplement that may be most useful in crisis settings or in contexts where traditional data sources are incomplete or out of date.

Nevertheless, providing governments, especially those with limited capacity, with alternative digital technologies that let them forgo investments in data-collection institutions carries risks. It may nudge them toward wider adoption of such tools than is appropriate, leading to underinvestment in (active) data collection mechanisms, which the research finds to be generally superior. Inappropriate or hasty adoption of these technologies in governance practices can indeed generate adverse outcomes for the beneficiaries, as has been discussed with respect to two different welfare programs in India [Jaandaraz. (2022, March 24). On the perils of embedded experiments. Developing Economics.] [Muralidharan, K., Niehaus, P., & University of Virginia. (2022). Identity verification standards in welfare programs: experimental evidence from India.].

In some cases, the benefits of state surveillance are easier to demonstrate than the costs, shaping which research questions are asked, or asked first. In January 2026, the Journal of Development Economics, a leading journal in the field, published a paper examining the impact of China’s nationwide expansion of facial-recognition-enabled surveillance cameras on crime. Ominously titled “Keeping an Eye on the Villain” [Ma, H., Xu, M., You, W., & Feng, J. (2025). Keeping an eye on the villain: Assessing the impact of surveillance cameras on crime. Journal of Development Economics.], the paper estimates that camera installation significantly reduced crime and increased citizens’ satisfaction with the government and their sense of security, especially among women. The authors estimate that preventing each crime costs about $6,000, which they describe as “highly cost-effective.” They point out that concerns about “human rights and privacy” vary across contexts and that ethical standards on this therefore need not be uniform.

Surveillance has effectively shifted the responsibility for data generation from governments and researchers to human subjects themselves, with a wide scope of use. This has undoubtedly made some researchers’ lives easier, and intellectually (and sometimes materially) richer. In the absence of clear guidelines governing the academic collaborations that enable passive data access, however, several ethical issues arise. Before passive data becomes the default benchmark for research, we should pause to reflect on its broader consequences for society and the economy at large.

Acknowledgement

The author thanks Reetika Khera, Ankur Sarin, Parikshit Ghosh and Ritwik Banerjee for valuable feedback and comments.