Introducing a New Privacy Testing Library in TensorFlow
Figure: Overview of a membership inference attack. An attacker tries to figure out whether certain examples were part of the training data.
Privacy is an emerging topic in the machine learning community, and there are no canonical guidelines for producing a private model. A growing body of research shows that a machine learning model can leak sensitive information about its training dataset, creating a privacy risk for users whose data appears in the training set.
Last year, we launched TensorFlow Privacy, enabling developers to train their models with differential privacy. Differential privacy adds noise to hide individual examples in the training dataset. However, this noise is designed for academic worst-case scenarios and can significantly affect model accuracy.
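To make the mechanism concrete, here is a minimal numpy sketch of one differentially private gradient step in the style of DP-SGD: clip each example's gradient, sum, and add Gaussian noise. This is an illustrative simplification, not the TensorFlow Privacy implementation; the function name and parameters are chosen for exposition.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One differentially private gradient step (illustrative sketch).

    per_example_grads: array of shape (batch_size, num_params).
    Each example's gradient is clipped to `clip_norm`, the clipped
    gradients are summed, Gaussian noise scaled by
    `noise_multiplier * clip_norm` is added, and the result is averaged.
    """
    rng = rng or np.random.default_rng(0)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale            # bound each example's influence
    summed = clipped.sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)
```

The clipping bounds any single example's influence on the update, and the noise masks what remains, which is exactly why larger noise multipliers cost model accuracy.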
These challenges led us to tackle privacy from a different perspective. Research on the privacy properties of machine learning models began to emerge a few years ago, including cost-efficient “membership inference attacks,” which predict whether a specific example was used during training. If an attacker can make this prediction with high accuracy, the model is leaking information about which data it was trained on. The biggest advantage of a membership inference attack is that it is easy to perform: it requires no re-training of the model.
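The simplest form of such an attack can be sketched in a few lines: because models tend to have lower loss on examples they were trained on, an attacker can predict "member" whenever an example's loss falls below some threshold. The sketch below (not the TensorFlow Privacy API; names are illustrative) reports the best attacker accuracy achievable over all thresholds.

```python
import numpy as np

def loss_threshold_attack(train_losses, test_losses):
    """Threshold-based membership inference attack (illustrative sketch).

    Predicts 'member' when an example's loss is at or below a threshold,
    and returns the best attacker accuracy over all candidate thresholds.
    0.5 means the attack is no better than a coin flip.
    """
    losses = np.concatenate([train_losses, test_losses])
    # 1 = member (training example), 0 = non-member (held-out example)
    is_member = np.concatenate([np.ones_like(train_losses),
                                np.zeros_like(test_losses)])
    best_acc = 0.0
    for t in np.unique(losses):
        preds = (losses <= t).astype(float)  # low loss -> predict member
        best_acc = max(best_acc, np.mean(preds == is_member))
    return best_acc
```

An accuracy near 0.5 suggests train and held-out losses are indistinguishable, while values approaching 1.0 indicate the model's behavior reveals membership.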
Each test produces a vulnerability score that indicates how much the model leaks information about its training set. We found that this vulnerability score often decreases when common heuristics are applied, such as early stopping or training with DP-SGD.
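One common way to turn an attack into a single vulnerability score is the attacker's "membership advantage": the best achievable gap between true-positive rate and false-positive rate over all thresholds. The sketch below is a hedged illustration of that idea, not the score the library computes; the function name is hypothetical.

```python
import numpy as np

def membership_advantage(train_losses, test_losses):
    """Vulnerability score sketch: max over thresholds of (TPR - FPR).

    0.0 means the attack is no better than random guessing;
    1.0 means losses perfectly separate members from non-members.
    """
    best = 0.0
    for t in np.unique(np.concatenate([train_losses, test_losses])):
        tpr = np.mean(train_losses <= t)   # members correctly flagged
        fpr = np.mean(test_losses <= t)    # non-members wrongly flagged
        best = max(best, tpr - fpr)
    return best
```

Heuristics such as early stopping narrow the gap between train and held-out losses, which is exactly what drives this score toward zero.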
Unsurprisingly, differential privacy helps in reducing these vulnerability scores. Even with very small amounts of noise, the vulnerability score decreased.
After using membership inference tests internally, we’re sharing them with developers to help them build more private models: exploring better architecture choices, applying regularization techniques such as early stopping, dropout, weight decay, and input augmentation, or collecting more data. Ultimately, these tests can help the developer community identify more architectures, privacy design principles, and data-processing choices that reduce leakage.
We hope this library will be the starting point of a robust privacy testing suite that can be used by any machine learning developer around the world. Moving forward, we’ll explore the feasibility of extending membership inference attacks beyond classifiers and develop new tests. We’ll also explore adding this test to the TensorFlow ecosystem by integrating with TFX.
Reach out to firstname.lastname@example.org and let us know how you’re using this new module. We’re keen on hearing your stories, feedback, and suggestions!
Acknowledgments: Yurii Sushko, Andreas Terzis, Miguel Guevara, Niki Kilbertus, Vadym Doroshenko, Borja De Balle Pigem, Ananth Raghunathan.