There’s also the story of Compliance Scanner, which started in 2014 as Data Watch. After a first year spent meandering in search of a business purpose, it pivoted into a useful tool that allowed anyone to instantly identify the statutory compliance status of Indian firms under various laws. Regulators, lawyers, consultants, civic-minded citizens and NGOs all relied on it to spot variances between what a company claimed in public and what it practiced in private. It was single-handedly responsible for dozens of well-known companies being found to have fallen short of their provident fund (EPFO) obligations.
This is where we must apologize and tell you that none of the above three examples are true.
Know Your Defaulter, Best Trip and Compliance Scanner do not exist.
In fact, these startups cannot exist (and neither can the efficiency they generate) because of the way various Indian governments and regulators have either ignored or actively stymied the “Open Data” initiative.
What is open data?
“Open data and content can be freely used, modified, and shared by anyone for any purpose.”
Data = Innovation Fuel
A 2013 McKinsey research report showed how governments around the world could unlock an additional $3 trillion (yes, that’s right) in economic value merely by enabling open data across seven domains.
To quote from the report: “An estimated $3 trillion in annual economic potential could be unlocked across seven domains. These benefits include increased efficiency, development of new products and services, and consumer surplus (cost savings, convenience, better-quality products). We consider societal benefits, but these are not quantified. For example, we estimate the economic impact of improved education (higher wages), but not the benefits that society derives from having well-educated citizens. We estimate that the potential value would be divided roughly between the United States ($1.1 trillion), Europe ($900 billion) and the rest of the world ($1.7 trillion).”
The seven domains McKinsey identified were education, transportation, consumer products, electricity, oil & gas, healthcare and consumer finance.
Needless to say, these are some of the sectors where India needs innovation and transformation at scale. Today. If only.
Even though there were early indicators that the Indian government was interested in furthering open data and transparency initiatives, the last few years have been a big let-down. After finding the Right to Information (RTI) Act “somewhat wanting”, the current government laid special emphasis on the need to release raw data in a machine-readable format. The Open Government Data platform data.gov.in was supposed to do exactly that. Back in 2011, before the current government took office, India was also a formative member of the Open Government Partnership (which has since been joined by 75 countries), but it withdrew just before the kickoff.
With the benefit of hindsight, these pronouncements seem hollow at best.
Higher-value data sets remain behind access walls. In some cases, access has become more difficult since the government started driving its open data movement. Much of this data was earlier open to the public, but access has increasingly been restricted. The government is opening up less useful data sets while blocking access to richer ones. This is entirely contrary to the government’s own open data policy, which says that data collected by the government with public money shall be in the open.
Data death by a thousand CAPTCHAs
Here, an important question must be asked. What does it mean for data to be truly open, in a way that makes it an innovation multiplier for a country’s economy? Here’s how McKinsey described it:
For any activity that needs to be done at scale, at speed and with few errors, machine readability is crucial. Scanned documents or badly formatted PDFs force diligence and other processes into manual mode, which is slow, error-prone and costly, and at sufficient volume altogether impossible.
Diligence (at the individual level as well as the sector level) is increasingly an ongoing activity rather than a one-time exercise at initiation. Without crawlability, ongoing diligence might as well be written off.
Unfortunately, in India we’re on the “closed” end of the spectrum on almost all counts. CAPTCHAs, paywalls and secretiveness are increasingly what we encounter while collating most government data.
Case statuses and details in district courts were originally crawlable (meaning they could be read and indexed by software spiders) but have now been put behind CAPTCHAs. This means human intervention is required to download every single unit of data. One argument is that anyone who needs to search for a case can simply solve a CAPTCHA on the government website and search there. However, unless service providers can pre-download all of the data, various matching and tracking algorithms cannot be run. Further, unless these databases can be continuously updated, early warning systems (of the type recommended by the RBI, for instance) cannot possibly be built.
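To make the point about matching and tracking concrete, here is a minimal sketch in Python of what a service provider might run once case data can be downloaded in bulk. It is an illustration only: the record fields (case_id, respondent), the suffix list and the function names are hypothetical, and real court records would need far more careful name matching. The idea is simply that two successive snapshots of the data are compared, and any newly filed case naming a watched company is flagged.

```python
import re

# Hypothetical list of company-name suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "pvt", "private"}

def normalize(name: str) -> str:
    """Lowercase a name, strip punctuation and drop common company suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)

def new_cases_for_watchlist(previous, current, watchlist):
    """Return cases present in `current` but not in `previous`
    whose respondent matches a watched company name."""
    seen_ids = {case["case_id"] for case in previous}
    watched = {normalize(n) for n in watchlist}
    return [
        case for case in current
        if case["case_id"] not in seen_ids
        and normalize(case["respondent"]) in watched
    ]

# Two made-up snapshots of a bulk download, one day apart.
yesterday = [{"case_id": "A-1", "respondent": "Acme Traders Pvt. Ltd."}]
today = yesterday + [
    {"case_id": "A-2", "respondent": "ACME TRADERS PRIVATE LIMITED"},
    {"case_id": "A-3", "respondent": "Some Other Firm Ltd"},
]

alerts = new_cases_for_watchlist(yesterday, today, ["Acme Traders Pvt Ltd"])
print([c["case_id"] for c in alerts])  # prints ['A-2']
```

Both steps depend on holding the full data set locally. A CAPTCHA-gated search box answers one query at a time, so neither the snapshot comparison nor the watchlist match can be run against it.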