MapReduce Join Algorithms for RDF

I implemented two join algorithms: the sort-merge join and the improved repartition join. I am currently evaluating them on Amazon EC2 and Elastic MapReduce. Generating, storing, and transferring terabyte-sized files on Amazon S3 while keeping costs low takes a lot of work.

Sort-Merge Join

The sort-merge (reduce-side) join is a join algorithm for MapReduce environments such as Hadoop. Each mapper reads its local data blocks, extracts the join attribute from each record, and emits it as the map output key. The records are then shuffled to the reducer responsible for their join key, where they arrive sorted by that key, and the actual comparison is done on the reducer nodes.
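The data flow above can be illustrated with a small in-memory Java sketch that imitates the map, shuffle, and reduce phases without a Hadoop cluster. It assumes input rows of the form `relation,joinKey,value` with relation `R` or `S`. The class and method names (`SortMergeJoinSketch`, `map`, `shuffle`, `reduce`) are hypothetical and not taken from this repository, and a `TreeMap` stands in for Hadoop's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Minimal in-memory simulation of a reduce-side (sort-merge) join.
 * Assumption: rows are "relation,joinKey,value" strings with relation "R" or "S".
 * map/shuffle/reduce are hypothetical helpers, not Hadoop classes.
 */
public class SortMergeJoinSketch {

    // Map phase: extract the join attribute as the output key and
    // tag the value with the relation it came from.
    static Map.Entry<String, String> map(String row) {
        String[] f = row.split(",", 3);
        return Map.entry(f[1], f[0] + ":" + f[2]);
    }

    // Shuffle phase: Hadoop sorts map output by key and groups equal keys;
    // a TreeMap reproduces that sorted grouping in memory.
    static TreeMap<String, List<String>> shuffle(List<Map.Entry<String, String>> mapped) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> e : mapped) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return groups;
    }

    // Reduce phase: split one key's values by relation tag (both sides are
    // buffered) and emit every R x S combination for that key.
    static List<String> reduce(String key, List<String> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("R:")) left.add(v.substring(2));
            else right.add(v.substring(2));
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(key + "\t" + l + "\t" + r);
            }
        }
        return out;
    }

    // Runs map -> shuffle -> reduce over all rows and collects the joined output.
    static List<String> join(List<String> rows) {
        List<Map.Entry<String, String>> mapped = new ArrayList<>();
        for (String row : rows) mapped.add(map(row));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> g : shuffle(mapped).entrySet()) {
            result.addAll(reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "R,product1,type=Product",
            "S,product1,label=Widget",
            "R,product2,type=Product",
            "S,product3,label=Gadget");
        join(rows).forEach(System.out::println);
    }
}
```

Keys that appear in only one relation (`product2` and `product3` above) produce no output, which gives inner-join semantics.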

Repartition Join

The repartition join uses a compound key to identify which relation each row originates from. It relies on a custom Hadoop Partitioner, sort comparator, and grouping comparator. The algorithm is described in detail in the following paper:

[1] Blanas, Spyros, et al. "A comparison of join algorithms for log processing in mapreduce." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010.
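The compound-key mechanics can be illustrated with another in-memory sketch, loosely following the improved repartition join of [1]: the partitioner hashes only the join key, records are sorted by (join key, tag), and grouping compares only the join key. Each reduce call therefore sees the rows of the buffered relation R before those of the streamed relation L. `TaggedRecord`, `partition`, `SORT`, and `GROUP` are hypothetical stand-ins for a Hadoop key class, Partitioner, sort comparator, and grouping comparator; they are not taken from this repository.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * In-memory simulation of an improved repartition join with a compound key.
 * Assumption: tag 0 marks the buffered relation R, tag 1 the streamed relation L.
 * All names here are hypothetical, not Hadoop or repository APIs.
 */
public class RepartitionJoinSketch {

    // Compound key (join attribute + relation tag) carried with the row's payload.
    static final class TaggedRecord {
        final String joinKey; // join attribute, e.g. an RDF subject
        final int tag;        // 0 = buffered relation R, 1 = streamed relation L
        final String value;   // remainder of the row

        TaggedRecord(String joinKey, int tag, String value) {
            this.joinKey = joinKey;
            this.tag = tag;
            this.value = value;
        }
    }

    // Partitioner: hash the join key only (same formula as Hadoop's
    // HashPartitioner), so rows of both relations with one key meet.
    static int partition(TaggedRecord r, int numReducers) {
        return (r.joinKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort comparator: order by join key, then by tag so R rows precede L rows.
    static final Comparator<TaggedRecord> SORT =
        Comparator.comparing((TaggedRecord r) -> r.joinKey).thenComparingInt(r -> r.tag);

    // Grouping comparator: compare the join key only, ignoring the tag.
    static final Comparator<TaggedRecord> GROUP =
        Comparator.comparing((TaggedRecord r) -> r.joinKey);

    static List<String> join(List<TaggedRecord> input, int numReducers) {
        List<List<TaggedRecord>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) partitions.add(new ArrayList<>());
        for (TaggedRecord r : input) partitions.get(partition(r, numReducers)).add(r);

        List<String> out = new ArrayList<>();
        for (List<TaggedRecord> part : partitions) {
            part.sort(SORT); // stable sort by (joinKey, tag)
            int i = 0;
            while (i < part.size()) {
                TaggedRecord first = part.get(i);
                List<String> buffered = new ArrayList<>(); // R rows of this group only
                // One "reduce call": every record equal to `first` under GROUP.
                while (i < part.size() && GROUP.compare(part.get(i), first) == 0) {
                    TaggedRecord r = part.get(i++);
                    if (r.tag == 0) {
                        buffered.add(r.value); // R rows arrive first and are buffered
                    } else {
                        // L rows are streamed and joined against the buffer.
                        for (String b : buffered) out.add(r.joinKey + "\t" + b + "\t" + r.value);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<TaggedRecord> input = List.of(
            new TaggedRecord("product1", 1, "offer=o1"),
            new TaggedRecord("product1", 0, "label=Widget"),
            new TaggedRecord("product1", 1, "offer=o2"),
            new TaggedRecord("product2", 0, "label=Gadget"),
            new TaggedRecord("product3", 1, "offer=o3"));
        join(input, 2).forEach(System.out::println);
    }
}
```

Because the tag sorts R rows ahead of L rows within each group, the reducer only has to hold one side of each key in memory, whereas a join that separates the relations inside the reducer, like the sort-merge sketch, buffers both.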

Software

The joins are implemented using Hadoop 2.2.0 and HBase 0.94.7.
