This is an implementation of an algorithm to search over encrypted data.
Specifically, given two tables A and B, some join attributes A.a and B.b, and a filter clause A.x = c_a and B.y = c_b, we would like to perform the following query:
SELECT * FROM A
INNER JOIN B ON A.x = B.x
WHERE A.a IN (s_1) AND B.b IN (t_1)
(for constants s_1, t_1), with minimal leakage. That is, using a (master) secret key, we would like to encrypt the tables (i.e. their rows) and execute an encrypted query on them that retrieves the correct rows.
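As a plaintext reference for what the encrypted query is expected to return, the following minimal sketch runs the same query with Python's built-in sqlite3 module. The table contents and the constants s_1 = 1 and t_1 = 7 are made up for illustration; nothing here is encrypted, and none of it is taken from this repository's code.

```python
import sqlite3


def plaintext_join(rows_a, rows_b, s_1, t_1):
    """Run the target query on unencrypted data (reference semantics only)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-column schemas: a join column x plus one filter column.
    conn.execute("CREATE TABLE A (x INTEGER, a INTEGER)")
    conn.execute("CREATE TABLE B (x INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO A VALUES (?, ?)", rows_a)
    conn.executemany("INSERT INTO B VALUES (?, ?)", rows_b)
    result = conn.execute(
        "SELECT * FROM A INNER JOIN B ON A.x = B.x "
        "WHERE A.a IN (?) AND B.b IN (?) ORDER BY A.x",
        (s_1, t_1),
    ).fetchall()
    conn.close()
    return result


# Illustrative data only.
rows_a = [(1, 1), (2, 1), (3, 5)]
rows_b = [(1, 7), (2, 8), (3, 7)]
matches = plaintext_join(rows_a, rows_b, s_1=1, t_1=7)
print(matches)
```

A correct encrypted join over the same data should return exactly the rows this plaintext version returns, which makes it a convenient check when testing.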
This implementation uses a function-hiding inner product encryption scheme whose code is contained in fhipe. Section 2.3 of the paper is the API specification for the functionality implemented in fhipe/ipe.py. See the fhipe project's readme for more details.
Make sure you have the following installed:
$ git clone --recursive https://github.com/EquiJoins/EquiJoinsOverEncryptedData.git
$ cd EquiJoinsOverEncryptedData
$ sudo make install # (or use `make install-mac` if running on Mac OS X)
See the following list for common build errors and how to fix them.
- Error: charm/core/math/integer/integermodule.c:129:19: error: dereferencing pointer to incomplete type ‘BIGNUM {aka struct bignum_st}’
  Resolution: this is a known issue with an older version of charm. Using the latest dev branch of the charm repo should resolve this (i.e. cd charm and git checkout dev).
In hash_based_join.py, we run a hash-based join on TPC-H data (schema found here).
In the hash_based_impl/data/ folder, one can find the datasets separated by the scale factor used to generate them. We only use the orders table and the customer table. For these tables, we have appended a column selectivity to have granular control over which rows are selected in a WHERE clause. For example, for selectivity 1/100, there will be num_rows / 100 rows that have value 100 for the selectivity attribute.
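The selectivity column described above can be sketched as follows. This is only an illustration of the idea: the helper name add_selectivity_column, the choice to mark the first rows, and the filler value 0 for unselected rows are all assumptions, not the script that produced the datasets in this repository.

```python
def add_selectivity_column(rows, denominator):
    """Append a selectivity value to every row.

    Exactly len(rows) // denominator rows receive the value `denominator`.
    All other rows receive 0 (the filler value is an assumption).
    """
    num_selected = len(rows) // denominator
    return [
        row + [denominator if i < num_selected else 0]
        for i, row in enumerate(rows)
    ]


# Illustrative table with 1000 rows and selectivity 1/100.
rows = [[i, f"customer_{i}"] for i in range(1000)]
tagged = add_selectivity_column(rows, 100)
selected = [r for r in tagged if r[-1] == 100]
print(len(selected))
```

A WHERE clause that tests the selectivity attribute against 100 then selects num_rows / 100 rows, regardless of the other column values.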
The py submodules are organized as follows:
- fhipe/ipe.py contains the code required to generate the vectors, ciphertext and tag
- fhipe/join.py contains the code to run the actual join query - currently it is a nested inner loop join
- fhipe/encrypt_functions.py contains the code required to encrypt a specific row or table - typically done as a preprocessing step
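For readers unfamiliar with the join strategy mentioned above, here is a minimal plaintext sketch of a nested-loop join. It is not the code in fhipe/join.py: the function name, the matches predicate, and the sample rows are hypothetical, and in the encrypted setting the equality test would be replaced by a comparison on encrypted values.

```python
def nested_loop_join(table_a, table_b, matches):
    """Compare every row of table_a with every row of table_b.

    `matches` is a caller-supplied predicate deciding whether two rows join.
    The cost is len(table_a) * len(table_b) predicate evaluations.
    """
    joined = []
    for row_a in table_a:
        for row_b in table_b:
            if matches(row_a, row_b):
                joined.append(row_a + row_b)
    return joined


# Illustrative rows: join on the first column of each table.
customers = [[1, "alice"], [2, "bob"]]
orders = [[1, "order-10"], [1, "order-11"], [3, "order-12"]]
joined = nested_loop_join(customers, orders, lambda a, b: a[0] == b[0])
print(joined)
```

The quadratic number of comparisons is the main cost of this strategy.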
Typical workflow
- Load a table from a CSV file into a 2D array
- Determine the number of attributes that are in the query; the vector length is equal to the number of attributes plus 4
- Generate a secret key by calling ipe.setup(int) and passing in the vector length
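The first two workflow steps can be sketched in plain Python as below. The helper names load_table and vector_length, the sample CSV contents, and the attribute count of 2 are assumptions made for illustration. The final step is left as a comment because it needs the fhipe package, and only the call shape stated above (ipe.setup taking the vector length) is relied on.

```python
import csv
import io

EXTRA_SLOTS = 4  # the workflow above adds 4 to the number of query attributes


def load_table(csv_file):
    """Read an open CSV file into a 2D list of strings."""
    return [row for row in csv.reader(csv_file)]


def vector_length(num_query_attributes):
    """Vector length used for key generation: attributes in the query plus 4."""
    return num_query_attributes + EXTRA_SLOTS


# Illustrative in-memory CSV standing in for a file on disk.
sample = io.StringIO("1,alice,100\n2,bob,0\n")
table = load_table(sample)
n = vector_length(2)  # hypothetical query with 2 attributes
print(table, n)

# With fhipe installed, the next step would be the call described above:
# secret_key = ipe.setup(n)
```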