Few subjects are as polarising as statistics. There are those whose eyes sparkle at the glimpse of a chart and those who want to flee when they hear the word. Some see a source of divine truth where others find a liar’s accomplice.
Even those who make their living crunching numbers diverge in explaining what they do, a difference ingrained in history: Statisticians moved from claiming extreme objectivity in the early days (when they restricted their work to data gathering and refused any analysis) to becoming active collaborators in scientific investigations. The debate over where data stands in the hierarchy of knowledge is still evolving, but it is no longer merely academic.
Numbers have flooded the public sphere and critical engagement with data—the ability to distinguish sincere and helpful conclusions from skullduggery and plain idiocy—is increasingly necessary to strengthen democracy. In a fine new book, Whole Numbers And Half-Truths: What Data Can And Cannot Tell Us About Modern India, data journalist Rukmini S. offers what she calls a “toolkit” to enable that engagement, and shows how seemingly dry numbers capture the nuance and complexity of India. (Full disclosure: Rukmini was my editor when we both worked at The Hindu.)
This is not a book about data—it’s jargon-free, equation-free, econ-speak-free. Neither is it a book about Indian statistics. We meet the organisations and people behind the numbers but they are not the book’s central focus. This is a book about modern India, the story of India told through the lens of its data. Through its narration and choice of questions, the book reveals both the power and limitations of data journalism.
The book is structured around 10 questions (how India votes, lives, works, spends money and so on), each analysed in a separate chapter. Barring a few pages with collections of numerical facts, the numbers are interspersed with paragraph-length stories to show what statistics don’t reveal: the individuals whose lives these numbers summarise.
The breakdown of a 42-year-old cook’s family budget is a great example. I knew that official data from the National Sample Survey Office (NSSO) says people spending more than ₹8,500 per month in urban India lie in the country’s top 5% and my reaction was similar to that of almost every person who hears this for the first time: How is that possible?
The cook’s story offers an insight: She makes ₹19,000 a month and her older son, ₹15,000. So, is she in the top 5%? No. After accounting for money that goes towards clearing pre-committed payments—including loans (for a total debt of ₹1.65 lakh) and self-help group contributions—the family of four with two earning members is left with a monthly per capita spend of ₹5,875, placing them in the top 20%. The NSSO data only tells us the threshold; the cook’s story shows why cash flow matters to understand it.
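The arithmetic behind the cook's budget can be sketched in a few lines of Python. The incomes, the household size and the per capita figure are the ones quoted above. The size of the pre-committed payments is not stated in this review; it is only implied by those figures, on the assumption that the two salaries are the household's entire income.

```python
# Figures quoted in the review (rupees per month)
cook_income = 19_000
son_income = 15_000
household_size = 4
per_capita_spend = 5_875  # after pre-committed payments, as quoted

# Assumption: the two salaries are the household's only income
household_income = cook_income + son_income

# Per capita figure if pre-committed payments are ignored
naive_per_capita = household_income / household_size

# Implied, not stated in the review: the total left to spend, and the
# amount that must therefore go to loans and self-help group contributions
disposable = per_capita_spend * household_size
implied_precommitted = household_income - disposable

print(f"Household income:             {household_income}")
print(f"Naive per capita:             {naive_per_capita:.0f}")
print(f"Left to spend:                {disposable}")
print(f"Implied pre-committed outgo:  {implied_precommitted}")
```

Read this way, the family sits at ₹8,500 a head on income alone, right at the quoted top-5% threshold, and roughly ₹10,500 a month in prior commitments (an inferred figure) is what brings the per capita spend down to ₹5,875.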
The book’s rigorous research shows that data has a lot to say about India. Data-phobics may even fall in love with data. Readers, especially those not hooked to the news, will find socioeconomic insights that confound preconceived notions. But the book’s key contribution is giving the public access to a sharp data journalist’s process for engaging with Indian statistics that do not find space in news stories a few hundred words long. Data analysis and interpretation necessarily involve good judgement; they are not a mechanical game of running numbers. How to argue for or against an empirical claim and spot bullshit is what readers should seek to learn from the book, and they can then use that framework to answer their own questions about India.
The book’s strength is rooted in Rukmini’s reporting. Her analytical framework is informed by the often untold story behind the statistics (the one conveniently hidden behind an asterisk). Numbers don’t exist in isolation, she says, showing how sociopolitical forces impact their creation, administration and interpretation. Ignoring these forces leads to misleading and garbage conclusions. (The implicit lesson for emerging journalists is quite clear: Spreadsheets don’t tell the full story—pick up the damn phone!)
What the author does not explain—possibly to avoid technicalities—is the process to choose and reject data sources, a crucial step for anyone engaging with statistics, especially to sort out the nonsense manufactured by private agencies. That process matters: The data one decides to use or not use is a considered choice. It’s not strictly formulaic.
This choice is often abused in popular discourse, where ideological motivations are disguised as methodological arguments. The current government is a repeat offender: Official employment and consumption data that did not suit its chosen narrative was discredited or suppressed, hurting the independence of India’s statistical institutions.
These decisions have polarised the debate on learning about India from official data. Data-cynicism is rising. The immediate urge to say data is “fraud” or “fake”—especially if it points to facts that do not conform with your world view—does not help. Yes, the fight for good data is political, Rukmini says, but warns against blind scepticism and vague criticism.
The book ends with a five-point diagnosis of India’s data problems: not collecting essential data, not publishing what exists in a usable format, obfuscating and suppressing some data, overselling what limited data says, and knee-jerk criticism of inconvenient data. Data is not the problem; problematic data and selective readings of data are.
The book could have explored the second half of its title, “what data cannot tell us about modern India”, in more detail. Yes, we do learn about the questions that cannot be answered for lack of good data (hate crime trends, for example), but those are questions that could be answered if better data existed.
I am pointing to a deeper—and more difficult—epistemological question: What can we not learn about India even with high-quality data? A nuanced understanding of this question is crucial in an increasingly data-obsessed world where numbers are given the ultimate authority as a source of truth.
“What official data alone can sometimes not do very well is tell us the why,” Rukmini writes in the conclusion. (Nine of her 10 chosen questions answer “how”, one deals with “what”—there are no “why” questions.) “That is where high quality privately collected data can step in.”
Maybe yes, maybe no: It’s a debate that has kept statisticians and philosophers occupied for decades. Data is profoundly dumb about causal explanations, which make up the bulk of what we know, computer scientist Judea Pearl writes in The Book Of Why. “You are smarter than your data,” says Pearl. “Data do not understand causes and effects; humans do.” (Pearl’s argument falls in the school of thought I subscribe to. It has its own critics.)
American historian Jill Lepore notes that the use of data in journalism followed the empirical revolution in social science. But the cultural worship of data and raising it to a pedestal, she says, diminishes all other ways of knowing. “There’s a whole set of assumptions in that world that we should be talking about. I mean, not to say there’s no amazing extraordinary research being done that is data-driven or that falls under the heading of data science, but there were a lot of mistakes made when people decided in the 1890s that social science would solve every problem. It was kind of important for other people to say, ‘You know what, social science can’t necessarily solve every problem. It’s really useful, but it’s important to think about when we should use it and when not.’”
Prasanta Chandra Mahalanobis was the man responsible for organising and developing the Indian statistical system. “Statistics must have a purpose,” he stressed on multiple occasions. Those responsible for data collection must request the authorities to “explain as clearly as possible the purposes for which the information would be used,” Mahalanobis said. That should guide the concepts, definitions and standards for data collection.
Rukmini’s thoughtful writing shows the power of statistics if the purpose is understanding India or using them as a tool for problem-solving. But she also exposes us to the dangers if the purpose turns political. The idea of coupling the delayed 2021 census with the National Population Register (NPR) will lead to a massive counting challenge, she writes. Statisticians are worried that linking these two exercises—one that establishes citizenship, another that forms the bedrock of key administrative decisions—will hurt data integrity as a movement is building up, particularly among Muslims, to reject the NPR, fearing alienation.
When statistics become a tool for politics, the powerful wield more power. And the power of statistics to elevate our understanding of India diminishes. The book makes a compelling case for why it must be preserved.
Samarth Bansal is a journalist based in Landour. He tweets at @PySamarth