The SmartNoise SDK includes two packages:
- smartnoise-sql: Run differentially private SQL queries
- smartnoise-synth: Generate differentially private synthetic data
To get started, see the examples below. Click into each project for more detailed examples.
pip install smartnoise-sql
import snsql
from snsql import Privacy
import pandas as pd
csv_path = 'PUMS.csv'
meta_path = 'PUMS.yaml'
data = pd.read_csv(csv_path)
privacy = Privacy(epsilon=1.0, delta=0.01)
reader = snsql.from_connection(data, privacy=privacy, metadata=meta_path)
result = reader.execute('SELECT sex, AVG(age) AS age FROM PUMS.PUMS GROUP BY sex')
print(result)
See the SQL project
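To give a feel for what the `epsilon` parameter in the query above controls, here is a minimal standard-library sketch of the Laplace mechanism applied to a simple count. It is an illustration only, not how `snsql` is implemented: the helper names `laplace_noise` and `dp_count` and the sample `ages` list are hypothetical, and the library's own mechanisms, clamping, and budget accounting are more involved.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws follows Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values matching `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only to make the illustration repeatable
ages = [23, 35, 41, 52, 67, 18, 29, 74]  # hypothetical data, not PUMS
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print(noisy)  # the true count is 4; the printed value is 4 plus noise
```

A smaller `epsilon` means a larger noise scale, so results are more private but less accurate.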
pip install smartnoise-synth
import snsynth
import pandas as pd
import numpy as np
pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
pums = pums.drop(['income'], axis=1)
nf = pums.to_numpy().astype(int)
synth = snsynth.MWEMSynthesizer(epsilon=1.0, split_factor=nf.shape[1])
synth.fit(nf)
sample = synth.sample(10) # get 10 synthetic rows
print(sample)
import pandas as pd
import numpy as np
from snsynth.pytorch.nn import PATECTGAN
from snsynth.pytorch import PytorchDPSynthesizer
pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
pums = pums.drop(['income'], axis=1)
synth = PytorchDPSynthesizer(1.0, PATECTGAN(regularization='dragan'), None)
synth.fit(pums, categorical_columns=pums.columns.values.tolist())
sample = synth.sample(10) # synthesize 10 rows
print(sample)
See the Synthesizers project
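After sampling, a quick sanity check is to compare how a column is distributed in the real data and in the synthetic sample. The sketch below uses only the standard library and is not part of `snsynth`. The helpers `column_distribution` and `total_variation_distance` and the toy rows are hypothetical stand-ins for real and synthesized records.

```python
from collections import Counter


def column_distribution(rows, index):
    # Relative frequency of each value in the column at position `index`.
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def total_variation_distance(p, q):
    # Half the L1 distance between two discrete distributions:
    # 0.0 means identical, 1.0 means disjoint support.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy rows standing in for real and synthetic records (hypothetical values).
real_rows = [(0, 1), (1, 1), (0, 0), (1, 0)]
synthetic_rows = [(0, 1), (0, 1), (0, 0), (1, 0)]

tvd = total_variation_distance(
    column_distribution(real_rows, 0),
    column_distribution(synthetic_rows, 0),
)
print(tvd)  # 0.25 for the toy rows above
```

With only a handful of synthetic rows, sampling noise dominates this kind of comparison, so a meaningful check needs a much larger sample than 10 rows.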
- You are encouraged to join us on GitHub Discussions.
- Please use GitHub Issues for bug reports and feature requests.
- For other requests, including security issues, please contact us at [email protected].
Please let us know if you encounter a bug by creating an issue.
We appreciate all contributions. Please review the contributors guide. Pull requests that fix bugs are welcome without prior discussion.
If you plan to contribute new features, utility functions, or extensions to this system, please open an issue first and discuss the feature with us.