If you are on WhatsApp, you will have noticed the outcry over its new terms, in which it discloses what data it will share with its parent company, Facebook. Many tech pundits noted that the changes are not dramatic and do not even matter for Europeans. But as Adam Smith wrote:
“WhatsApp might not share data to Facebook but it feels like it does, in the same way, that your phone is not actually listening to your conversations but the mass of information about your location, preferences, chat history, all collected by advertisers, means that the TV programme you were talking about only yesterday is now the first advert you see on the site.”
Welcome to the post-factual world. It does not matter what Facebook does; it matters that we don’t trust Facebook. Why? Because we don’t know what happens with our data. Therefore, the more important question is not who has access to the data but what the data is used for.
Data can be helpful. From Waze to Netflix, recommendation systems are great. We want those apps to use our data and give us the service we love. Let’s look at healthcare. We want doctors to use health data to diagnose our potential issues. But we do not expect our doctor to share this health information with an insurance company that will then, in turn, change the pricing of our life insurance.
The healthcare industry is regulated, and laws prohibit your doctor from sharing your data. But the social media industry is not. Already in 2015, I was approached by a large European insurance company that wanted to know whether it could use deep learning to identify riskier sports, like skydiving, in social media photos. You might find this creepy, but in some places this has already become a reality. Data should be used to make our lives better and more productive. But if data is used without our knowledge to assess us and to make decisions about us, then we will feel uncomfortable, and rightly so.
Facebook is in the spotlight because it is big and has loads of potent data about us. Michal Kosinski showed that a model using Facebook data could accurately predict a range of personal attributes, including sexual orientation, ethnicity, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The model even managed to distinguish between Democrats and Republicans in 85% of cases. We don’t trust Facebook anymore because we learned that they gave away our data.
We should note that WhatsApp is probably not the biggest privacy offender we have on our phones. We know that our phone companies sold our location data. Additionally, there is an ecosystem of apps that share a lot of information about us. Work from Latanya Sweeney details how much data is shared by some apps: Drugs.com, a medication guide rated almost 5 stars by over 12k users, is, according to DataMaps, sharing your medical information with quantserve, scorecardresearch, etc.
For the most part, those companies operate hidden from the public eye, but in 2020, much of the location data market became visible. During the peak of the pandemic, many companies created dashboards that showed aggregated movement data. Their intent was good. This type of information can help predict COVID-19 outbreaks and help governments and health officials prepare. It was surprising to see the types of companies that produced those dashboards. There were, of course, the big corporations you would expect to publish those datasets. (Disclaimer: I work not only for Cornell but also for Google. All opinions are my own.) But then there were many smaller companies as well. Where did they get the data from? From the ecosystem of apps on your phone, and from companies you have probably never heard of before. One report described a company that gathered location information via more than 500 apps, including many that provide local news. Again, the question should be: what do you need this information for? If location helps to render a better service to me, then yes, please use my information. If the whole purpose is to sell my location information to others who might then use it in a way I don’t know or understand, then “no – stop using my data.”
We need a better framework to work with data. Regulations in the EU and California are leading the way here. GDPR demands that organizations collect personal data only for a specific, clearly stated purpose. However, as a data scientist and product manager, I can tell you that this is not easy. Companies often do not know the value of the data they collect; thus, they will struggle to detail that specific purpose upfront. Facebook is an ideal example. As Antonio García Martínez detailed, Facebook did not know for quite a while what data would drive the value of its business model. But those challenges should not stop us. Think about when it is acceptable to use a particular piece of data, such as gender, for a particular decision. There are no easy answers, and only if the use of data and the application of AI are transparent will we, as a society, be able to discuss them.
Regulation of data use will force companies to ask for consent before developing new algorithms. This extra step will inevitably slow down innovation. We might agree that this is an acceptable consequence, but such regulation might be a factor in the global innovation competition. As I have described before:
“But Western nations are not the only industrialized nations using data. If centralization and access are advantages, centralized governments without a lot of regulation have a competitive advantage. For example, China has established a social scoring system to rate its citizens. […] Not bound by any regulatory constraints, China can now correlate and analyze which of the datasets will help them best in achieving their objectives. Such datasets will have an impact outside of China on our economy and our norms.”
We will need not only to regulate our markets but also to expect our international partners to comply with the same rules. We see this discourse taking place already between the EU and the US. In this national and international setup, grassroots movements will not be long-lived. We saw this in 2018, and 2021 will be no different. All of my various WhatsApp groups are back on WhatsApp after a brief stint on Signal. Despite that, we should push for clear regulations and agree on international norms on transparency, accountability, and data ownership. Only then will we start to trust again.