Social Media Listening has two lessons for ChatGPT
Even ChatGPT learns every day
Most of the complaints OpenAI has faced in Europe boil down to two:
1. “I asked about myself, and that bastard ChatGPT responded with inaccurate information: I FILE A COMPLAINT.”
Or simply:
2. “I asked about myself, and that bastard ChatGPT knew who I was: it was trained on my personal data without my consent: I FILE A COMPLAINT.”
I’m not saying there aren’t other data protection issues, but these are the most notable ones, so far.
Every week in Zero Party Data, we break down the latest developments in technology, AI, and data protection from a legal perspective.
This post is the second part of this one:
AI vs GDPR: Three alternative paths to compliance
[Golden Girls’ Sophia Petrillo speaking]: “Remember: Europe, December 18, 2024”
Social Media Listening
Social media platforms (“social networks” but also the digital versions of legacy media) have democratized freedom of expression and information, giving a voice to the anonymous citizen, who can gain traction and popularity on these platforms to become a “citizen journalist” or the “visible face of a shared opinion” independently, without relying on traditional media outlets.
This phenomenon brings a significant increase in public exposure (with all that entails), especially for those who reach the status of Influencer or Key Opinion Leader, as the specialized terminology puts it.
Social Media Listening (SML) emerges as a hybrid between traditional public opinion surveys or representative metrics from the 20th century and today’s ability to capture the opinions of an entire community present on a particular social network—while weighting those opinions based on the varying influence of its members.
The Double DNA Strand of SML from a Data Protection Perspective
The legitimacy of personal data processing in social media listening rests on an inversely proportional double rule, stated by both the CJEU and the EDPB:
The more followers (and influence) a person has on social media, the more weight their opinion carries (and the greater the interest in identifying or distinguishing them from the masses), because they serve as a catalyst for many others. Conversely, the identity of each individual who is (or is not) influenced is merely a statistical data point of no inherent interest.
The more public or publicly interesting a person is—whether due to voluntary exposure or circumstances of public interest—the smaller their protected privacy becomes.
Explanation with Pictures
The second point deserves some explanation. I think it’s best understood with the example of the Rodríguez Zapatero family:
Case 1: Zapatero
While José Luis Rodríguez Zapatero was the President of Spain, everything he did was of general interest. He was a public figure.
An entire body of case law and doctrine from regulatory authorities recognizes that public figures have a reduced right to data protection, limited to their personal and family sphere. And I’d argue not even that—because if a public figure gets too close to someone who is not as public (and not their spouse), no journalist, professional or amateur, will be punished for making it public.
Because private behaviors that contradict public ones are matters of general interest.
This same principle, read in reverse, explains why Trump voters were not affected at all when their champion was found liable for sexual abuse.
No one saw an inconsistency between his public and private conduct.
Neither his voters nor his detractors.
Case 2: A Random Gothic Girl
She is not a public figure. We don’t know who she is or what her name is. Her right to personal data protection fully covers her.
But now we reach an interesting case: the in-between situations.
Case 3: Zapatero’s Daughters
When the Obamas visited Spain, they took a picture with the Rodríguez Zapatero family—including his daughters.
Not a single loudmouth refrained from making unsolicited comments about the outfits of the two girls, who were minors at the time.
This case was in a gray area because the girls posed for an official photo that was made public.
In my opinion, they should have been more protected.
Case 4: Alba, Zapatero’s Daughter Again
But what happens when Alba, one of Zapatero’s daughters, takes the step of "becoming an influencer"—that is, publicly sharing her opinions and views on certain topics in an attempt to shape public opinion?
Without going too deep (because this isn’t the main topic of the post), from a data protection perspective, the key criteria for properly assessing these situations are “the legitimate expectations of the data subject” and “the consequences of processing for the data subject.”
Now it’s clearer why we are not all equal under the GDPR (or to Social Media Listeners):
The scope of personal data protection diminishes significantly as a person moves from being (i) a “random anonymous gothic girl” to (ii) the “daughter of a public figure,” then to (iii) an “influencer” (and influencers come in many shapes and sizes), and finally to (iv) a “public figure” like Zapatero.
That’s why, when feeding a dataset to a multimodal model that will learn from all the published data and works, the statistical significance of a person’s appearances seems like a criterion worth considering when setting thresholds above or below which the model (preferably the model itself, rather than the surrounding system) simply does not respond, no matter how much it is asked.
There’s More, of Course
This, of course, is not a complete solution—it’s just a starting point.
My approach is always the same: start with what is already legally settled and build from there.
It’s obvious, for example, that OpenAI has processed and may continue to process the data of “non-public individuals,” even if it restricts access to that data for its users.
This filtering or thresholding should be applied when collecting the data that will form the training dataset (before model training), or during training itself, so that the model ignores information about “statistically insignificant” individuals, so to speak.
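To make the idea concrete, here is a minimal sketch of what such a prominence threshold could look like at the data-collection stage. Everything here is an illustrative assumption, not anyone’s actual pipeline: the name list, the threshold value, and the naive string matching stand in for what a real system would do with proper named-entity recognition.

```python
from collections import Counter

def filter_by_prominence(documents, names, threshold):
    """Keep names whose mention count across the corpus meets the
    threshold (treated as 'public' for this sketch); redact the rest
    before the documents enter the training set."""
    counts = Counter()
    for doc in documents:
        for name in names:
            counts[name] += doc.count(name)
    # Names at or above the threshold are statistically significant
    public = {name for name in names if counts[name] >= threshold}
    redacted_docs = []
    for doc in documents:
        for name in names:
            if name not in public:
                doc = doc.replace(name, "[REDACTED]")
        redacted_docs.append(doc)
    return redacted_docs, public

docs = [
    "Zapatero met Obama. Zapatero spoke at the UN.",
    "Zapatero gave a speech. Jane Doe posted a comment.",
]
clean, public = filter_by_prominence(docs, ["Zapatero", "Jane Doe"], threshold=3)
# "Zapatero" appears three times and is kept;
# "Jane Doe" appears once and is redacted before training.
```

The same threshold logic could instead be enforced at inference time (refusing to answer about low-prominence individuals), but applying it before training is the stronger guarantee, since the model never sees the data at all.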
As for other rights, such as the right to erasure, we’ll likely have to wait for the CJEU to work its magic with one of its landmark rulings—like the Google Spain case—balancing technical limitations with the application of the rule of law and guarantees for individual rights.
Those very principles and guarantees that, with each passing day, we are getting closer to calling "European" principles and guarantees—properly, and without condescension.
Have a great week.
Jorge García Herrero