How AI can identify people even in anonymized datasets

How you interact with a crowd may help you stick out from it, at least to artificial intelligence.

When fed information about a target individual’s mobile phone interactions, as well as their contacts’ interactions, AI can correctly pick the target out of more than 40,000 anonymous mobile phone service subscribers more than half the time, researchers report January 25 in Nature Communications. The findings suggest humans socialize in ways that could be used to pick them out of datasets that are supposedly anonymized.

It’s no surprise that people tend to remain within established social circles and that these regular interactions form a stable pattern over time, says Jaideep Srivastava, a computer scientist from the University of Minnesota in Minneapolis who was not involved in the study. “But the fact that you can use that pattern to identify the individual, that part is surprising.”

According to the European Union’s General Data Protection Regulation and the California Consumer Privacy Act, companies that collect information about people’s daily interactions can share or sell this data without users’ consent. The catch is that the data must be anonymized. Some organizations might assume that they can meet this standard by giving users pseudonyms, says Yves-Alexandre de Montjoye, a computational privacy researcher at Imperial College London. “Our results are showing that this is not true.”

De Montjoye and his colleagues hypothesized that people’s social behavior could be used to pick them out of datasets containing information on anonymous users’ interactions. To test their hypothesis, the researchers taught an artificial neural network, an AI that simulates the neural circuitry of a biological brain, to recognize patterns in users’ weekly social interactions.

For one test, the researchers trained the neural network with data from an unidentified mobile phone service that detailed 43,606 subscribers’ interactions over 14 weeks. This data included each interaction’s date, time, duration, type (call or text), the pseudonyms of the involved parties and who initiated the communication.

Each user’s interaction data were organized into web-shaped data structures consisting of nodes representing the user and their contacts. Strings threaded with interaction data connected the nodes. The AI was shown the interaction web of a known person and then set loose to search the anonymized data for the web that bore the closest resemblance.
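
To make the matching idea concrete, here is a minimal Python sketch. It is not the researchers' method: the study used a trained neural network, while this sketch swaps in a hand-built numeric profile and cosine similarity. The names `Interaction`, `build_web`, `profile`, `cosine` and `best_match`, and the choice of edge features, are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Interaction:
    """One record; field names are hypothetical, not the study's schema."""
    initiator: str   # pseudonym of the party who started the call or text
    receiver: str    # pseudonym of the other party
    kind: str        # "call" or "text"
    duration: float  # seconds; 0 for texts


def build_web(user, records):
    """Map each contact of `user` to [calls, texts, total duration, initiated by user]."""
    web = defaultdict(lambda: [0, 0, 0.0, 0])
    for r in records:
        if user not in (r.initiator, r.receiver):
            continue
        contact = r.receiver if r.initiator == user else r.initiator
        edge = web[contact]
        edge[0 if r.kind == "call" else 1] += 1
        edge[2] += r.duration
        edge[3] += r.initiator == user
    return dict(web)


def profile(web, top_k=3):
    """Flatten a web into a fixed-length vector that ignores pseudonyms.

    Contacts are ordered by how often they interact with the user, so two
    webs can be compared even when every name in them is different.
    """
    edges = sorted(web.values(), key=lambda e: (e[0] + e[1], e[2]), reverse=True)
    flat = [x for e in edges[:top_k] for x in e]
    return flat + [0.0] * (4 * top_k - len(flat))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(target_web, anonymous_webs):
    """Return the pseudonym whose web profile is most similar to the target's."""
    target = profile(target_web)
    return max(anonymous_webs, key=lambda pid: cosine(target, profile(anonymous_webs[pid])))
```

Given a dictionary of webs built from the anonymized records, `best_match(build_web("alice", known_records), anonymous_webs)` returns the pseudonym whose interaction pattern looks most like the known person's. A hand-built profile like this discards most of the structure a neural network could learn, so it only illustrates the shape of the search, not its reported accuracy.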

The neural network linked just 14.7 percent of individuals to their anonymized selves when it was shown interaction webs containing information about a target’s phone interactions that occurred one week after the latest records in the anonymous dataset. But it identified 52.4 percent of people when given not just information about the target’s interactions but also those of their contacts. When the researchers provided the AI with the target’s and contacts’ interaction data collected 20 weeks after the anonymous dataset, the AI still correctly identified users 24.3 percent of the time, suggesting social behavior remains identifiable for long periods of time.
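
The percentages above are identification rates: the share of targets whose best-matching anonymous record was in fact their own. A short sketch of that arithmetic follows. The function names and the toy data are hypothetical, and the chance baseline assumes the search pool is the full set of 43,606 subscribers, which the article does not state explicitly.

```python
def identification_rate(predicted, actual):
    """Fraction of targets whose top-ranked anonymous record is the correct one."""
    if not actual:
        raise ValueError("no targets to score")
    hits = sum(predicted.get(target) == pid for target, pid in actual.items())
    return hits / len(actual)


def chance_rate(pool_size):
    """Success rate of picking one record uniformly at random from the pool."""
    return 1 / pool_size


# Toy data, not from the study: two of four targets are matched correctly.
actual = {"alice": "u1", "bob": "u2", "carol": "u3", "dan": "u4"}
predicted = {"alice": "u1", "bob": "u2", "carol": "u4", "dan": "u3"}
rate = identification_rate(predicted, actual)  # 0.5
```

Under that pool-size assumption, `chance_rate(43606)` comes to roughly 0.002 percent, which puts the reported 14.7, 24.3 and 52.4 percent figures in perspective.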
