UIDAI, NIC And India's Data Security Nightmare

Should the worst happen to India’s official information technology infrastructure, AS4758 is a term that will feature prominently in it. The term denotes a unique name/number (ASN) for a network that is used for routing traffic over IP networks and AS4758 is operated by the National Informatics Center. This prefix represents a vast majority of the servers and sites (the 164.100.0.0 – 164.100.255.255 IP address range) operated by the NIC. Some of the key sites operating from this network include UIDAI, website of the Chief Electoral Officer, Delhi and the NIC Certifying Authority. These three are just a minor part of the vast array of sites and services, that cover everything from the personal information of the citizens of the country, to key information about the government itself.
This post is one that I have been putting off writing for a while. The main reason is that it is not right to identify weak points in our key IT infrastructure in such a public manner. But the fact is that the speed with which we are going ahead to centralize a lot of this information, without thinking through the requisite safeguards is an issue that overrides that concern. Improperly secured, this information is a grave risk to everyone, including the government. And from the evidence seen in public, there is not adequate knowledge or expertise within the system to even take a call on what is adequate security for an undertaking this grave in nature. The secondary reason is the inadequacies of the underlying technology in mining this information. They are immature and not accurate enough and it will lead to a flood of false positives in a system where the legal system itself is under-equipped to make key differentiation when it comes to the evidence that supports the case made by the false positive.
Another point to note is that I am hardly a security expert, the little that I know is what I need to know to keep my applications secure. Whatever I have seen is a tiny percentage of what is available for everyone to see. Information security has become such a complicated and specialized field now that it is no longer good enough to know some of the factors involved in keeping an application and infrastructure secure from prying eyes. I would not dare to certify a client website/application as secure based on my own knowledge. I would rather get a specialized security firm to do that, even if they cost a lot of money. The important bit here is that if I can see these issues, someone with malicious intent can see a hundred other things that can be used to gain unauthorized access.
All Eggs In One Basket
Coming back to As4758, it is a case of keeping too many eggs in one basket. From the outside, it looks like multiple vendors have access to the servers on that network. Forget forcing users to SSL-enabled versions of the sites, most of them don’t even give that as an option. This is true of both the UIDAI website and the Delhi CEO’s website where users have to enter personal information to retrieve more personal information. A compromised machine on the network can easily listen to all network traffic and silently harvest all this data without anyone knowing about it.
A year ago, NISG, which is one of the key service providers for the NATGRID and UIDAI project was running its website on an old Windows desktop (Windows XP or 97, if I remember correctly). Thankfully, NISG seems to have moved to a Linux machine recently. Also, the NISG set-up is not hosted within the NIC’s network, so any the possibility of damage from the machine would have been comparatively lower. Though, we will never know for sure.
That said, even being on different networks won’t provide iron-clad security, if you don’t design networks, access protocols and authentication as the first order of business. Done as an afterthought, it will never be as effective as it needs to be. Agencies often require data from each other to be mashed up (example: overlay UIDAI data over NATGRID data) and this is often managed at the protocol level by restricting access by IP. In the hypothetical case of the NISG server being allowed access to UIDAI data and the former is compromised, you have a scenario where even the most secure UIDAI data center will leak information due to compromise in another network.
Cart Before Horse
A moot point here is the assumption that the UIDAI infrastructure is secure enough in the first place. An NISG requirement for a data center security and risk manager position does not give us confidence in that assumption one bit. As the saying goes, the chain is only as strong as its weakest link and in this case, it seems that security is an afterthought. Part of the problem is that there is not enough experience within the government machinery to even determine what is secure enough. A simple rule about getting work done by someone is that you need to know, better than the person you are engaging to get that work done, what you are looking to get done. We just don’t have that in place in India at the moment.
These systems need to be designed primarily with security in mind and that does not seem to be the case. My fear with these systems is not as much that the government itself will misuse the data (which is a valid and important concern for me), but that it will be quietly pilfered away by foreign players and nobody would know about it. Having such information about all of the citizens of a country opens up millions of avenues for the malicious players to recruit people to their cause as all those people become potential targets to blackmail. Since we are going to collect information about everyone in the country, the potential of who can be blackmailed can range from the richest and most powerful, to the poorest and the weakest. And the best part is that what exposes people to blackmail need not even be illegal behaviour, it can be perfectly legal behaviour that affects social and professional standing of an important person.
We are going to present all of that information to interested parties with a nice bow on top.
Access, Identity, Authentication, Logging

Any secure system will require you to control access to the resource as a whole and/or parts of the resource itself. This planning has to start from physical access to the core and nodes that access the core and it has to then take into account the applications that will provide access to the information and the applications that will access this information from the nodes.
Any secure system will have a clear policy in assigning identities to people who can access those resources. This needs to be consistent across the core and the nodes. This makes the system rather inflexible and a pain to operate, but it is necessary to mitigate even the weakest of attacks.
Any secure system will clear mechanism of of authenticating the identity of a valid user in the system. There cannot be any backdoors built into such a system as it has been proven time and again that the backdoors become a point of major weakness over time.
Any secure system will log all actions at all levels in the system and establish triggers for any out-of-band activity that covers even legitimate use.

The above four points are just an amateur attempt by me at defining the outlines of a reasonably secure system. A proper attempt at this by a real security professional will have a hell of a lot more of points and also go into a great deal of detail. But these points should give you a rough idea about the complexity involved in designing security for systems like these. You simply cannot slap on top security as an afterthought here.
Mining Nightmares
Which brings us to the issue of accuracy in data mining for initiatives like NATGRID.
Personally, I do believe that there is a valid case for governments to either collect or have access to information of any kind. What I do not like is unfettered collection, mining and access and zero oversight on any of those processes.
The reason why mining big data as a sort of Google search for suspicious activity is a terrible idea is simple. It does not work accurately enough to be of use in enforcement. The same technology that results in mis-targeted marketing phone calls and the tech that serves you ads that are irrelevant to you are the ones that are going to be used to determine whether a person or a group of people are likely to do bad things. Even in marketing or advertising it works with an appalling rate of failure, using it in intelligence, surveillance and enforcement will lead to an ocean of false positives and wind up putting a lot of innocent people behind bars for no good reason.
Even worse is the fact that legal system itself has such a weak grasp on these matters that appeals are likely to fall on deaf ears as the evidence is likely to be considered the gospel as there is no understanding available within the system that can say it is not the case. And then there is the potential for real abuse — not limited to planting evidence through spyware — that can ruin lives of anyone and everyone.
Conclusion
Our approach to security and centralized information collection is terrible beyond what can be expressed in words. It needs to be stopped in its tracks and reviewed closely and should be redesigned from the ground-up to keep security as the first objective and data collection as a final objective. We need to codify access laws to data collected in this manner and ensure that all of it does not reside in a single place and access to a complete picture is available only in the rarest and most exceptional of circumstances. What is happening right now is none of that and I am afraid we will find that out in the most painful manner in the coming years.