Don’t blame Slack for training its AI on your sensitive data

An abstract image of digital security.
(Image credit: Shutterstock)

Slack has come under fire for using customer data to train its global AI models and generative AI add-on. Sure, requiring users to manually opt out via email seems sneaky (isn’t avoiding email the whole point of Slack?), but the messaging app doesn’t bear all the responsibility here. The most popular workplace apps have all integrated AI into their products, including Slack AI, Jira AI-Powered Virtual Agent, and Gemini for Google Workspace. Anyone using technology today, especially for work, should assume their data will be used to train AI. That’s why it’s up to individuals and companies to avoid sharing sensitive data with third-party apps. Anything less is naive and risky.

Rohan Sathe

Co-founder and CTO of Nightfall AI.

Trust no one

There’s a valid argument floating around the internet that Slack’s opt-out policy sets a dangerous precedent for other SaaS apps to automatically opt customers in to sharing data with AI models and LLMs. Regulators will likely examine this, especially for companies operating in jurisdictions covered by the General Data Protection Regulation (GDPR). The California Consumer Privacy Act, by contrast, allows businesses to process personal data without permission until a user opts out. Until then, anyone using AI, which IBM estimates is more than 40% of enterprises, should assume shared information will be used to train models.

We could dive into the ethics of training AI on individuals’ billion-dollar business ideas that come to life in Slack threads, but surely someone on the internet has already written that. Instead, let’s focus on what’s actually important: whether Slack’s AI models are trained on its users’ sensitive data. This means personally identifiable information (PII) like Social Security numbers, names, email addresses, and phone numbers; protected health information (PHI); or secrets and credentials that can expose PII, PHI, and other valuable business and customer information. This matters because AI trained on this information creates risks of sensitive data exposure, prompt injection attacks, model abuse, and more. And those are the things that can make or break a company.

While Slack’s updated privacy principles state, “For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of Customer Data,” companies should take it upon themselves to ensure that their sensitive data doesn’t come in contact with any third party’s AI models. Here’s how.

Adopt a shared responsibility model

This isn’t the first time the question of who bears the onus for security, the service provider or the technology user, has come up. In fact, it was such an important topic of discussion during the mass migration to the cloud that the National Institute of Standards and Technology (NIST) came up with an answer: a framework that clearly defines the responsibilities of cloud service providers (CSPs) and cloud consumers so that both parties contribute to security and compliance. This is called the cloud shared responsibility model, and it’s been working well for more than a decade.

The same shared responsibility model can be applied if you substitute Slack (or any other SaaS app that uses AI) for the CSP. Slack should be responsible for securing its underlying infrastructure, platform, and services, and Slack customers should be responsible for securing their sensitive company and customer data. In this model, here are some ways Slack customers can ensure that sensitive data isn’t used to train Slack’s AI.

- Use a human firewall. Employees are the first line of defense against sensitive data entering a third-party application like Slack. While regular security training is important, it is best combined with a solution that identifies potential policy violations and lets employees remove or encrypt sensitive data before sharing.
- Filter inputs. The best way to prevent sensitive data from being input into Slack’s AI model is not to share it with Slack in the first place. Companies should use a solution that intercepts outgoing Slack messages and scrubs or encrypts sensitive data before it’s shared with Slack.
- Never share secrets, keys, or credentials on Slack. At a minimum, this information should be encrypted and stored and shared using a password manager or vault. In addition, companies should leverage the tips for using a human firewall and filtering inputs above to ensure that these keys to the kingdom don’t accidentally get shared via Slack (or email or GitHub; we’ve seen how that goes).
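
To make the "filter inputs" idea concrete, here is a minimal Python sketch of scrubbing a message before it leaves the company. It is an illustration under stated assumptions, not how Slack or any commercial product works: the names `PATTERNS` and `scrub` are hypothetical, the regular expressions cover only a few US-style formats, and a real deployment would need far broader detection plus a hook into wherever outgoing messages are actually intercepted.

```python
import re

# Hypothetical, illustrative detectors. Real tools use much broader
# and better-validated detection than these few regular expressions.
# Order matters: the key pattern runs first so later patterns cannot
# partially rewrite a credential.
PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 characters.
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # US Social Security number written as 123-45-6789.
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # US-style 10-digit phone number with optional separators.
    "PHONE": re.compile(r"(?<!\d)\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"),
}


def scrub(message):
    """Replace anything matching PATTERNS with a redaction tag.

    Returns the scrubbed text and a dict of {label: match count},
    which could feed a warning shown to the employee before sending.
    """
    findings = {}
    for label, pattern in PATTERNS.items():
        message, count = pattern.subn(f"[REDACTED {label}]", message)
        if count:
            findings[label] = count
    return message, findings


if __name__ == "__main__":
    text = "Reach me at jane.doe@example.com, SSN 123-45-6789"
    clean, found = scrub(text)
    print(clean)
    print(found)
```

The returned counts are what would let a tool flag a potential policy violation to the sender, which is the human firewall working alongside the filter rather than instead of it.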

Perhaps the Hacker News community is rightfully pissed that they didn’t know they needed to opt out of letting Slack use their data to train its global AI models and Slack AI. And for those opting out now, there are still many unanswered questions, such as whether their data will be retroactively deleted from Slack’s models and what compliance implications that may pose. This has surely prompted discussions about transparency around AI model training in conference rooms or Slack channels (too soon?) across the industry, and we’re likely to see more companies updating their privacy policies in the coming months to prevent user backlash similar to what Slack has seen this week.

No matter what those policies say, the best way to prevent AI training on your sensitive data is to avoid exposing it in the first place.

This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

Rohan Sathe is the co-founder and CTO of Nightfall AI, the first DLP platform to leverage generative AI to discover, classify and protect sensitive data across the modern enterprise.