Implementation of the "A Parallel Spatial Co-location Mining Algorithm Based on MapReduce" paper.
A spatial colocation pattern is a set of features that co-occur in space. For example, two crimes, say Robbery and Assault, would form a colocation pattern if they are reported together at many places. Think of spatial colocation pattern mining as association rule mining in the spatial domain.
- Download and set up Scala, Hadoop (with HDFS) and HBase for the versions given here.
- Refer to this for sample values for Hadoop and HBase configurations in pseudo-distributed mode, and to this for some known issues when setting up HBase.
- Start Hadoop using `$HADOOP_HOME/sbin/start-dfs.sh` and HBase using `$HBASE_HOME/bin/start-hbase.sh`.
- Verify that Hadoop and HBase are working properly by opening http://localhost:50070/ and http://localhost:16010/ respectively.
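  - If either UI does not come up, a quick sanity check (assuming a standard pseudo-distributed setup) is to run `jps` and confirm the HDFS daemons (NameNode, DataNode, SecondaryNameNode) and the HBase daemons (HMaster, HRegionServer, and HQuorumPeer when HBase manages ZooKeeper) are running.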
- Copy `src/main/resources/reference.conf.sample` to `src/main/resources/reference.conf` and populate the values.
- Run `mvn clean install` in the project folder to build `target/uber-locis-0.0.1-SNAPSHOT.jar`, which is used by the commands below.
- Obtain an application token from the Socrata portal and copy it to the `socrata.key` field in `reference.conf`.
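  - The full set of keys comes from `reference.conf.sample`; the snippet below is only an illustrative sketch showing where the token goes, with a placeholder value.

    ```
    # Illustrative sketch only -- copy reference.conf.sample for the real set of keys.
    # socrata.key is the only field named in this README; the value is a placeholder.
    socrata.key = "<your_socrata_app_token>"
    ```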
- Copy the schema from `scripts/schema`.
- Run `python scripts/scrapper/socrata.py`.
- Run `scala -cp target/uber-locis-0.0.1-SNAPSHOT.jar com.github.locis.apps.DataLoader <input_path_to_write_raw_data>`
  - If no path is provided, it writes to `/user/locis/input/data`.
- A very small dataset (6 rows) can be found in the `sampleData/data` file. It can be used for testing the different MapReduce tasks without having to download the Socrata dataset.
- Add the file to HDFS using the put command `$HADOOP_HOME/bin/hdfs dfs -put <path_to_locis>/sampleData/data <input_path_to_write_raw_data>` and proceed to run the MapReduce tasks.
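  - To confirm the upload before launching the jobs, you can list the target path, e.g. `$HADOOP_HOME/bin/hdfs dfs -ls <input_path_to_write_raw_data>`.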
- Run `$HADOOP_HOME/bin/hadoop jar target/uber-locis-0.0.1-SNAPSHOT.jar com.github.locis.apps.NeighborSearch <input_path_to_read_raw_data> <output_path_to_write_neighbors>`
- Run `$HADOOP_HOME/bin/hadoop jar target/uber-locis-0.0.1-SNAPSHOT.jar com.github.locis.apps.NeighborGrouping <input_path_to_read_neighbors> <output_path_to_write_neighbor_groups>`
- Run `$HADOOP_HOME/bin/hadoop jar target/uber-locis-0.0.1-SNAPSHOT.jar com.github.locis.apps.CountInstance <input_path_to_read_neighbor_groups> <output_path_to_write_instance_count>`
- Run `$HADOOP_HOME/bin/hadoop jar target/uber-locis-0.0.1-SNAPSHOT.jar com.github.locis.apps.PatternSearch <input_path_to_read_neighbor_groups> <output_path_to_write_prevalence_scores> <size_of_colocation>`

Note that to run the colocation pattern search task for size k, the results for sizes 1 to k-1 must already be in the db. So, to find colocation patterns of size k, run the task for every size from 1 to k, not just k. This can easily be automated with a bash script, for example the sketch below.
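The following is only a minimal sketch of such a driver script: the jar, class name, and argument order are taken from the PatternSearch command above, while the variable names and the HDFS output layout (one output directory per size) are illustrative placeholders, not something prescribed by the project.

```bash
#!/usr/bin/env bash
# Sketch: run PatternSearch for sizes 1..K in order, since the results for
# sizes 1..k-1 must already be in the db before size k can be mined.
# NEIGHBOR_GROUPS and PREVALENCE_BASE are placeholders -- point them at the
# NeighborGrouping output and wherever the prevalence scores should be written.
set -euo pipefail

K="${1:?usage: $0 <max_colocation_size>}"
JAR=target/uber-locis-0.0.1-SNAPSHOT.jar
NEIGHBOR_GROUPS=/user/locis/output/neighbor_groups
PREVALENCE_BASE=/user/locis/output/prevalence_scores

for ((k = 1; k <= K; k++)); do
  "$HADOOP_HOME/bin/hadoop" jar "$JAR" com.github.locis.apps.PatternSearch \
    "$NEIGHBOR_GROUPS" "$PREVALENCE_BASE/size_$k" "$k"
done
```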