This week, various media outlets reported that WhatsApp exposed user data on a massive scale. Some called it the “largest data leak in history,” others considered the situation to be even worse than it might appear at first glance. We somewhat disagree. A brief assessment.
Austrian researchers wrote a program that runs through all possible phone numbers and checks whether each one is linked to a WhatsApp account. If this is the case, the program stores not only the phone number but also the associated profile data (the profile picture and the About text), provided this profile data is public.
This way, the researchers were able to compile a complete directory of all phone numbers linked to a WhatsApp account, including public profile data.
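To illustrate the approach, here is a minimal sketch of such an enumeration scraper in Python. The function lookup_public_profile is a hypothetical placeholder for whatever interface the researchers used to query WhatsApp; it does not correspond to any real API.

```python
# Minimal sketch of an enumeration scraper (illustration only).
# lookup_public_profile() is a hypothetical placeholder for the interface
# the researchers used to query WhatsApp; it is NOT a real API.

from typing import Dict, Optional


def lookup_public_profile(phone_number: str) -> Optional[Dict[str, str]]:
    """Hypothetical stand-in: return public profile data if the number
    is linked to a WhatsApp account, otherwise None."""
    return None  # placeholder


def build_directory(country_code: str, start: int, end: int) -> Dict[str, Dict[str, str]]:
    """Run through a range of phone numbers and store every number that is
    linked to an account, together with its public profile data."""
    directory = {}
    for n in range(start, end):
        number = f"{country_code}{n}"
        profile = lookup_public_profile(number)
        if profile is not None:
            directory[number] = profile  # e.g. profile picture, About text
    return directory


# Example: enumerate a (tiny) block of numbers with a given prefix.
directory = build_directory("+43660", start=1000000, end=1001000)
print(f"{len(directory)} numbers in this block are linked to an account")
```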
According to media reports, the ability to create such a user directory entails the following undesirable consequences, which are particularly concerning from a data protection perspective:
The generated user directory reveals more about WhatsApp than Meta might be comfortable with (for competitive and regulatory reasons), such as how many individuals and companies use WhatsApp in each country or how high the user churn is.
The directory discloses personal information. Random samples suggest that around two-thirds of public profile data includes a human face as a profile picture, and About fields sometimes contain email addresses, other personal data, or references to the users’ identities.
The leaked information could be life-threatening. If authorities in countries where WhatsApp is banned are able to identify those using WhatsApp illegally, the consequences can be drastic.
The generated user directory certainly provides much more detailed information about Meta’s messaging platform than can be derived from app store charts or similar sources. However, this issue does not directly affect user privacy, especially since the information relates to the platform as a whole and not to individual users.
It is important to stress that the accessed profile data was all public. If a WhatsApp user makes their About text accessible to everyone, it can be viewed (and potentially saved) by anyone – this is not unexpected. Apparently, around 30% of all WhatsApp users have a public About text, while for the other 70%, it was not possible to retrieve this data.
Because WhatsApp uses the phone number as its unique identifier, it is also not unexpected that it must be possible to determine whether a given phone number is associated with a WhatsApp user account. Otherwise, it would not be possible to add contacts in WhatsApp.
In light of this, it is somewhat misleading to speak of a “data leak.” What we have here is classic “scraping”: the researchers succeeded in systematically exporting public information.
Nevertheless, the generated user directory poses a serious threat from a data protection perspective. As stated above, around two-thirds of the public profile pictures show a face. So if a person’s face is known, facial recognition software could potentially be used to search the directory to obtain that person’s phone number.
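As a rough sketch of how such a reverse lookup might work in principle: one could compute a face embedding for every public profile picture in the scraped directory and compare a known face against them. The face_embedding function below is a hypothetical stand-in for any off-the-shelf facial recognition model; nothing here reflects the researchers’ actual setup.

```python
# Sketch of a reverse lookup by face (illustration only).
# face_embedding() stands in for any off-the-shelf facial recognition model
# that maps an image to a feature vector; it is NOT a real library call.

import math
from typing import Dict, List, Tuple


def face_embedding(image_path: str) -> List[float]:
    """Hypothetical stand-in: map a face image to a feature vector."""
    return [0.0] * 128  # placeholder


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_phone_number(known_face: str,
                      directory: Dict[str, str],
                      threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Compare a known face against all public profile pictures in the
    scraped directory and return candidate phone numbers, closest first."""
    query = face_embedding(known_face)
    candidates = []
    for phone_number, picture_path in directory.items():
        distance = euclidean(query, face_embedding(picture_path))
        if distance < threshold:
            candidates.append((phone_number, distance))
    return sorted(candidates, key=lambda c: c[1])
```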
The third point, the potential danger for users in countries where WhatsApp is banned, is the most sensitive. Without doubt, it is extremely worrying if authorities in totalitarian states gain access to a directory of all WhatsApp users’ phone numbers.
However, using WhatsApp in such a scenario is a very bad idea to begin with. Like any service that uses phone numbers as identifiers, WhatsApp must verify a user’s phone number via SMS. SMS messages are not end-to-end encrypted, and in totalitarian states, it must be assumed that this communication channel is compromised.
Consequently, authorities can find out that a phone number is linked to a WhatsApp user account as soon as it is registered – no periodic scraping is required.
As has been shown, it is somewhat of an exaggeration to speak of a “data leak” in this case, given that the accessed information was public. When users make information public, it is in the nature of things that anyone can view (and potentially save) it. Having said that, many users may not have been aware of the scope of the profile privacy settings or what they actually mean.
Even if this is essentially “only” scraping, it is, of course, highly problematic – and just as surprising as it is concerning – that Meta did not implement effective measures to prevent scraping on this massive scale.
Still, even the best anti-scraping mechanisms cannot prevent anyone from determining whether a given phone number is associated with a WhatsApp user account. For example, when a security researcher made Mark Zuckerberg’s private phone number public, it was revealed that Zuckerberg himself uses the Signal app.
This demonstrates the fundamental issue with using phone numbers as unique identifiers. For a whole range of reasons relating to data protection, phone numbers are not an ideal means for this purpose:
They cannot be easily changed (e.g., after a data leak).
They provide information about which country they belong to.
They are not anonymous – in many countries, official identification is required for registration.
If different platforms require a phone number, users can be identified across these platforms.
Phone numbers are recycled: a number assigned to a new user may previously have belonged to someone else, which can have various unexpected consequences.
For these reasons, Threema deliberately does not require a phone number and instead uses a random string of characters as its unique identifier. This Threema ID is completely anonymous and can be revoked at any time.
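For comparison, generating such a random identifier is trivial. The sketch below assumes an 8-character ID drawn from uppercase letters and digits; the point is simply that an identifier of this kind reveals nothing about its owner and can be replaced at any time.

```python
# Sketch: generating a random, anonymous identifier instead of using a
# phone number. Assumes an 8-character ID from uppercase letters and digits.

import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # A-Z, 0-9


def generate_id(length: int = 8) -> str:
    """Return a cryptographically random identifier that reveals nothing
    about its owner (no country, no name, no official registration)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_id())  # e.g. "K7Q2M9XA"
```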