It would be horrifying to discover that every conversation I'd had in public had been recorded and indexed. It would be even more horrifying if those conversations were collected by a third party. A third party with a blatant lack of understanding of how to anonymize data? Paralyzing.
This happens constantly, though. We talk with our friends and our conversations are swept up and tucked away for days, months, years, even decades. We know it's happening… that's how text-based conversations work.
Social media is a bit different than a conversation in a cafe or online chat room. We're posting for a bigger audience. Sometimes we're posting to anyone who will read our words. But as anyone who had a Facebook account as a teenager (and hasn't deleted it completely yet) knows… we can say some embarrassing stuff, and the Internet remembers. Luckily we can delete our angsty, edgy posts. But should we have to do this consciously?
Why does something we said 10 years ago not fade into memory?
Who benefits from these memories? Who is scrolling back even six months in your Timeline?
I can understand, perhaps, the desire for a diary of sorts. I've never been able to keep a regular journal—my parents taught me that anything written down can be read—so social media posts and old chat logs have occasionally been a substitute, a font of nostalgia and sometimes chagrin. But it's possible to save these memories without leaving them up for the world (or even just your Friends List) to see. Archives are tedious to trawl through, but that's something that could change.
Some Mastodon instances have set up ephemeral posting, so that old posts are automatically deleted from the server. There are services that purge old Toots from Mastodon, and scripts exist to purge old Tweets and old Tumblr posts. Signal (the messaging app) lets you automatically delete old messages. But these things aren't baked into most of our social media. And I'm not sure why we don't think about it more often, especially when our public conversations are copied, stored, processed, and published without our consent.
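The core of those purge scripts is simple: fetch your posts, keep the recent ones, delete the rest. Here's a minimal sketch of that filtering step in Python. The post structure (dicts with `id` and `created_at` keys) is a hypothetical stand-in for whatever a real API returns; an actual script would fetch posts from the server and call its delete endpoint for each returned ID.

```python
from datetime import datetime, timedelta, timezone

def select_for_deletion(posts, max_age_days=180, now=None):
    """Return IDs of posts older than the retention window.

    `posts` is a list of dicts with hypothetical keys "id" and
    "created_at" (an ISO 8601 timestamp). A real purge script
    would page through the API, call this, then delete each ID.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        p["id"]
        for p in posts
        if datetime.fromisoformat(p["created_at"]) < cutoff
    ]

# Example: with a 180-day window, only the year-old post is selected.
posts = [
    {"id": "1", "created_at": "2019-01-01T00:00:00+00:00"},
    {"id": "2", "created_at": "2019-12-30T00:00:00+00:00"},
]
fixed_now = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(select_for_deletion(posts, max_age_days=180, now=fixed_now))
```

The point isn't the code itself; it's that this logic is trivial, which makes it strange that ephemerality isn't a built-in option nearly everywhere.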
Luckily, Harvard took down the article that prompted this blog post (hopefully it stays down), with this explanation:
"Legal issue or Data Usage Agreement: Many entries in the datasets do not fulfill the law about personal data release since they allow identification of personal information."
This should never have happened. But it did, and it will certainly happen again, and we should consider how to mitigate the damage of this invasion of privacy. The fact that our privacy is never 100% safe doesn't mean we can't make it safer by limiting what future bad actors can get their grimy hands on.