The bridge between personal and not-so personal data

This excerpt from Siddharth Sonkar's new book – ‘What Privacy Means’ – looks at the Data Protection Bill 2021 and how data anonymization works

When data is irreversibly anonymized, it is assumed that the individual cannot be identified anymore. (iStock)

The Data Protection Bill, 2021 (DP Bill), despite its limitations, is a significant step towards protecting our personal data. However, all personal data protection laws across the globe share some inherent limitations. Chief among them is that such a law protects only personal data, that is, information which relates directly or indirectly to individuals. Personal data which has been ‘anonymized’ is no longer considered to relate to anyone in particular. But what is anonymization? It is a process that makes it impossible to identify the underlying individual from the data. For instance, in a study of labour statistics, information about the wages that workers in a particular state are generally entitled to could be anonymized data, since it does not relate to specific workers and need not reveal anything about their gender or ethnicity. Similarly, the gap in salaries between workers in different regions could be anonymized data, even though it is ultimately derived from personal information, namely wages attributable to specific individuals.
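To make the wage example concrete, here is a minimal Python sketch of turning individual wage records into regional figures. The worker names, regions and wages are hypothetical, and `average_wage_by_region` is an illustrative helper, not a method described in the book.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical worker records: personal data, since each wage is tied to a name.
records = [
    {"name": "Worker A", "region": "North", "wage": 12000},
    {"name": "Worker B", "region": "North", "wage": 14000},
    {"name": "Worker C", "region": "South", "wage": 15000},
    {"name": "Worker D", "region": "South", "wage": 17000},
]

def average_wage_by_region(rows):
    """Collapse individual wage records into one average wage per region.

    The result keeps no names and no per-person wages, only group-level figures.
    """
    wages = defaultdict(list)
    for row in rows:
        wages[row["region"]].append(row["wage"])
    return {region: mean(values) for region, values in wages.items()}

averages = average_wage_by_region(records)
# The regional salary gap is derived from personal data but names no one.
gap = averages["South"] - averages["North"]
print(averages, gap)
```

One caution the sketch makes visible: if a region contained only one worker, its "average" would simply be that person's wage, so aggregation alone does not guarantee that nobody can be identified.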

When data is irreversibly anonymized, it is assumed that the individual cannot be identified anymore. Consequently, the Data Protection Bill 2021 is not applicable to such data. This is because the Bill applies to personal data, or data which specifically relates to a particular individual. In other words, if information is anonymized and it does not relate to individuals anymore, the obligations in the personal data protection law will cease to apply.

The relaxation of these obligations (for example, the obligation to obtain the user's consent before processing a dataset) gives businesses an incentive to reduce the risk that individuals can be pinpointed or identified from their information. Anonymized data is, of course, only one type of ‘non-personal data’ (the other types, which did not relate to individuals in the first place, have been discussed earlier). At this juncture, a visualization of how anonymization works may be helpful. Let us take the example of a data fiduciary that runs a contact-tracing application.

A visualization of how anonymization works. Let us take the example of a data fiduciary that runs a contact-tracing application (Hachette India)

Table 1 contains all the information we need to uniquely identify Alisha. Using a combination of name and phone number, it is fairly easy to tell whether Alisha or Ankit is COVID-19 positive. This information could be quite sensitive, since insurers could use it to charge them higher premiums. Let us look at what happens when their information undergoes some degree of anonymization:

From Table 2, it is very difficult to identify who this information is really about. (Hachette India)

From Table 2, it is very difficult to identify who this information is really about. At the same time, this information on its own may not be very useful to an insurer. The insurer cannot tell which patients the information relates to (whether it is Alisha or Ankit), and so it cannot charge higher premiums to specific customers such as Alisha or Ankit.
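The step from Table 1 to Table 2 can be sketched as removing the fields that point directly to a person. Since the book's tables appear only as images, the column names and values below (including who is marked positive) are hypothetical stand-ins, and `drop_direct_identifiers` is an illustrative helper.

```python
# Hypothetical rows in the spirit of Table 1; columns and values are
# illustrative assumptions, not the book's actual table.
table_1 = [
    {"name": "Alisha", "phone": "phone-1", "birth_year": 1967,
     "city": "Mumbai", "covid_positive": True},
    {"name": "Ankit", "phone": "phone-2", "birth_year": 1967,
     "city": "Mumbai", "covid_positive": False},
]

# Direct identifiers: fields that single out a person on their own.
DIRECT_IDENTIFIERS = {"name", "phone"}

def drop_direct_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return new rows with the direct-identifier fields removed (inputs untouched)."""
    return [{k: v for k, v in row.items() if k not in identifiers} for row in rows]

table_2 = drop_direct_identifiers(table_1)
for row in table_2:
    print(row)
```

Note that birth year and city survive this step; whether they can still single someone out depends on what other data exists, which is the problem the excerpt turns to next.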

However, with a larger collection of records about patients born in 1967, predictions could be made about the probability of someone of that age contracting COVID-19. Even though this information may not make it possible to pinpoint either Alisha or Ankit, it may still prejudice the interests of that age group of patients: once technological pattern prediction has created the group, the insurance premium charged to that particular age bracket may go up (discussed in more detail in the next chapter).
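A group-level inference of this kind needs no names at all. The minimal sketch below, using hypothetical anonymized rows and an illustrative helper `positivity_rate_by_birth_year`, computes the share of positive cases per birth year.

```python
from collections import Counter

# Hypothetical anonymized rows: no names or phone numbers, as in Table 2.
anonymized = [
    {"birth_year": 1967, "covid_positive": True},
    {"birth_year": 1967, "covid_positive": False},
    {"birth_year": 1967, "covid_positive": True},
    {"birth_year": 1985, "covid_positive": False},
]

def positivity_rate_by_birth_year(rows):
    """Return the fraction of positive cases for each birth year."""
    totals = Counter()
    positives = Counter()
    for row in rows:
        totals[row["birth_year"]] += 1
        if row["covid_positive"]:
            positives[row["birth_year"]] += 1
    return {year: positives[year] / totals[year] for year in totals}

rates = positivity_rate_by_birth_year(anonymized)
print(rates)
```

No row identifies a person, yet the output attaches a risk figure to everyone born in a given year, which is the kind of group-level prejudice described above.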

It is also worth noting that anonymization is the bridge between data which is personal and data which is not: personal attributes are removed from a database of information about an individual in such a way that the individual cannot be identified from the combination of information available to a person. Once a data fiduciary takes adequate steps to anonymize the data (for example, removing phone numbers to make it harder to identify the person), the information is no longer considered personal data. As mentioned above, when personal information is irreversibly anonymized under the Data Protection Bill 2021, obligations under the Bill that can at times be onerous for businesses (such as those discussed in the previous chapters) cease to apply to it. The data fiduciary therefore has greater freedom in deciding what to do with this information. Anonymized datasets are indeed very useful in unlocking the value of our data, since they help businesses identify useful statistical patterns and carry out data analytics that facilitate innovation, without undermining user privacy.

However, if we assume that anonymization protects us from the undesirable consequences that can come with information relating to us, we may be getting ahead of ourselves. Researchers across the globe strongly believe that anonymized data is susceptible to ‘re-identification’, given the increasing sophistication of hackers. Take the following example (Table 3), where the data has been anonymized only weakly:

An example where data anonymization is done in a weak manner. (Hachette India)

If only one person named Alisha from Mumbai has that birth date, just removing the phone number may not be sufficient. If another publicly available register (such as a voter directory) contains the names, dates of birth, phone numbers, ZIP codes or home addresses of Alisha and Ankit from the above table, along with those of others, it may be possible to identify them uniquely and find out a lot more about them. Our information does not sit in isolated databases. A person interested in finding out more about us can keep snooping around until they are able to build a complete picture of us.
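The linking described above can be sketched as a join on the fields the two datasets share. Everything here is hypothetical: the rows only loosely follow Table 3, the "public register" is invented, and `link_records` is an illustrative helper rather than any particular tool.

```python
# Hypothetical weakly anonymized rows in the spirit of Table 3: the phone
# number is gone, but name, birth date and city remain.
table_3 = [
    {"name": "Alisha", "birth_date": "1967-03-14", "city": "Mumbai", "covid_positive": True},
    {"name": "Ankit", "birth_date": "1967-08-02", "city": "Mumbai", "covid_positive": False},
]

# Hypothetical public register (e.g. a voter directory) sharing some fields.
public_register = [
    {"name": "Alisha", "birth_date": "1967-03-14", "city": "Mumbai", "phone": "phone-1"},
    {"name": "Alisha", "birth_date": "1990-01-01", "city": "Mumbai", "phone": "phone-3"},
    {"name": "Ankit", "birth_date": "1967-08-02", "city": "Mumbai", "phone": "phone-2"},
]

LINK_KEYS = ("name", "birth_date", "city")

def link_records(released, register, keys=LINK_KEYS):
    """Attach register details to released rows whose keys match exactly one entry."""
    index = {}
    for entry in register:
        index.setdefault(tuple(entry[k] for k in keys), []).append(entry)
    linked = []
    for row in released:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # a unique match pins the row to one person
            linked.append({**row, **matches[0]})
    return linked

linked = link_records(table_3, public_register)
for person in linked:
    print(person["name"], person["phone"], person["covid_positive"])
```

The second "Alisha" in the register does not prevent the match, because the birth date separates the two; that is exactly why removing the phone number alone is not enough.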

In fact, back in the 1990s, a private corporation in the United States operating in the health sector reported an instance where anonymized information was re-identified in this manner. Names had been removed from a dataset containing health information. However, other attributes such as ZIP code, gender and date of birth remained, and these attributes were also available in public registers such as voter lists. Even with such limited information, researchers were able to re-identify specific individuals with a success rate of about 80 per cent by linking the dataset to publicly available records, such as voter lists, that contained the same attributes.
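Whether ZIP code, gender and date of birth are enough to single people out can be checked by counting how many records are unique on that combination. The minimal sketch below uses hypothetical rows and an illustrative helper `unique_fraction`; it does not reproduce the study's data or its figure.

```python
from collections import Counter

# The attributes that survived de-identification in the example above.
QUASI_IDENTIFIERS = ("zip", "gender", "date_of_birth")

def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Fraction of rows whose combination of key attributes appears only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in rows)
    unique = sum(1 for r in rows if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(rows)

# Hypothetical de-identified health rows (names already removed).
rows = [
    {"zip": "02138", "gender": "F", "date_of_birth": "1960-01-05", "diagnosis": "A"},
    {"zip": "02138", "gender": "F", "date_of_birth": "1960-01-05", "diagnosis": "B"},
    {"zip": "02139", "gender": "M", "date_of_birth": "1955-07-31", "diagnosis": "C"},
    {"zip": "02140", "gender": "F", "date_of_birth": "1972-11-11", "diagnosis": "D"},
]

print(unique_fraction(rows))
```

Every row that is unique on these attributes can be re-identified by anyone holding a register with the same three fields and a name.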

The book cover of What Privacy Means, by Siddharth Sonkar. (Hachette India)

In other words, anonymization must be thorough enough to conceal the identity of the person to whom the information belongs. This is often difficult for businesses that want to analyse our information, because the more heavily data is anonymized, the fewer inferences can be drawn from it and the less useful it becomes. Some of these inferences may not relate to specific persons, as we saw in the case of Table 2, where decisions about insurance premiums could still be made about specific age groups rather than specific individuals.
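The trade-off between concealment and usefulness can be illustrated with generalization: coarsening values so that more people share them. The sketch below measures the size of the smallest group sharing the same coarsened values (the measure known as k-anonymity). The records, the `generalize` and `smallest_group` helpers and their parameters are all hypothetical.

```python
from collections import Counter

# Hypothetical records; fields and values are illustrative only.
rows = [
    {"zip": "400050", "birth_date": "1967-03-14", "covid_positive": True},
    {"zip": "400052", "birth_date": "1967-08-02", "covid_positive": False},
    {"zip": "400051", "birth_date": "1967-11-20", "covid_positive": True},
    {"zip": "400093", "birth_date": "1985-05-09", "covid_positive": False},
    {"zip": "400097", "birth_date": "1985-12-01", "covid_positive": True},
]

def generalize(row, zip_digits, keep_full_date):
    """Coarsen the identifying fields: mask the end of the ZIP/PIN code and
    optionally reduce the full birth date to the birth year."""
    masked_zip = row["zip"][:zip_digits] + "*" * (len(row["zip"]) - zip_digits)
    date = row["birth_date"] if keep_full_date else row["birth_date"][:4]
    return (masked_zip, date)

def smallest_group(rows, **settings):
    """Return k, the size of the smallest group sharing the same generalized values."""
    groups = Counter(generalize(r, **settings) for r in rows)
    return min(groups.values())

print(smallest_group(rows, zip_digits=6, keep_full_date=True))   # raw values
print(smallest_group(rows, zip_digits=4, keep_full_date=False))  # coarsened
```

With raw values every person stands alone; after coarsening, each person hides in a group of at least two, but an analyst can no longer see exact birth dates or neighbourhoods. Birth year survives, so inferences about age groups, like the insurance example, remain possible.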

But doesn’t privacy mean looking at a space only from the lens of the individual? Is this book only about putting the individual at the centre? Then, where does the community fit in?

Excerpted with permission from What Privacy Means, by Siddharth Sonkar, published by Hachette India.