This code is released under the Apache License Version 2.0 http://www.apache.org/licenses/.
It is a library that compresses and uncompresses arrays of integers very fast. The assumption is that most (but not all) values in your array use less than 32 bits. Arrays of this sort often come up when using differential coding in databases and information retrieval (e.g., in inverted indexes or column stores).
This library is used by ClueWeb Tools (https://github.com/lintool/clueweb).
It is a Java port of the FastPFor C++ library (https://github.com/lemire/FastPFor). There is also a Go port (https://github.com/reducedb/encoding). The C++ library is used by the zsearch engine (http://victorparmar.github.com/zsearch/) as well as in GMAP and GSNAP (http://research-pub.gene.com/gmap/).
Some CODECs ("integrated codecs") assume that the integers are in sorted order. Most others do not.
Using this code in your own project is easy with Maven: just add the following to your pom.xml file:
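To illustrate why sorted input matters when differential coding is involved, here is a minimal, self-contained sketch. It does not use this library: the class DeltaSketch and its methods are hypothetical names chosen for the example, and the sketch assumes non-negative, sorted input. It replaces each value by its difference from the previous one, then reports how many bits the largest difference needs.

```java
import java.util.Arrays;

// Hypothetical illustration of differential (delta) coding; not part of JavaFastPFOR.
class DeltaSketch {
    // Replace each value by its difference from its predecessor, in place.
    // The first value is left unchanged (difference from an implicit 0).
    static void delta(int[] data) {
        for (int i = data.length - 1; i > 0; --i) {
            data[i] -= data[i - 1];
        }
    }

    // Undo delta(): a running prefix sum restores the original values.
    static void inverseDelta(int[] data) {
        for (int i = 1; i < data.length; ++i) {
            data[i] += data[i - 1];
        }
    }

    // Bits needed to store the largest value in data[from..], treated as unsigned.
    static int maxBits(int[] data, int from) {
        int accumulator = 0;
        for (int i = from; i < data.length; ++i) {
            accumulator |= data[i];
        }
        return 32 - Integer.numberOfLeadingZeros(accumulator);
    }

    public static void main(String[] args) {
        int[] sorted = {1000000, 1000003, 1000010, 1000011, 1000042};
        int[] work = sorted.clone();
        System.out.println("bits per value before: " + maxBits(work, 0));
        delta(work);
        System.out.println("deltas: " + Arrays.toString(work));
        System.out.println("bits per delta (after the first value): " + maxBits(work, 1));
        inverseDelta(work);
        System.out.println("round trip ok: " + Arrays.equals(work, sorted));
    }
}
```

When the input is sorted, the differences are small non-negative numbers, which is the situation the compression schemes are designed for. With unsorted input, some differences would be negative and would occupy all 32 bits.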
<dependencies>
    <dependency>
        <groupId>me.lemire.integercompression</groupId>
        <artifactId>JavaFastPFOR</artifactId>
        <version>0.0.11</version>
    </dependency>
</dependencies>
Naturally, you should replace the version number with the version you desire.
You can also download JavaFastPFOR from the Maven central repository: http://repo1.maven.org/maven2/me/lemire/integercompression/JavaFastPFOR/
We found no library that implemented state-of-the-art integer coding techniques such as Binary Packing, NewPFD, OptPFD, Variable Byte, Simple 9 and so on in Java. We wrote one.
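As a rough idea of what one of these techniques does, the following sketch shows a common textbook form of Variable Byte coding. It is not this library's implementation: the class VariableByteSketch is a hypothetical name, and the byte layout (seven data bits per byte, low-order bits first, high bit set when more bytes follow) is an assumption that may differ from the convention used by the CODECs here.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of Variable Byte coding; not part of JavaFastPFOR.
class VariableByteSketch {
    // Encode each integer using 1 to 5 bytes, 7 data bits per byte.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            // While more than 7 significant bits remain, emit the low 7 bits
            // with the high bit set to signal that another byte follows.
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v); // final byte, high bit clear
        }
        return out.toByteArray();
    }

    // Decode exactly count integers from bytes.
    static int[] decode(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; ++i) {
            int v = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] data = {1, 128, 300};
        byte[] packed = encode(data);
        System.out.println(data.length * 4 + " bytes raw, " + packed.length + " bytes encoded");
    }
}
```

Small values take a single byte instead of four, which is why this kind of scheme pairs well with arrays where most values use far fewer than 32 bits.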
Main contributors
- Daniel Lemire, http://lemire.me/en/
- Muraoka Taro, https://github.com/koron
with contributions by
- Di Wu, http://www.facebook.com/diwu1989
- Stefan Ackermann, https://github.com/Stivo
In our tests, Kamikaze PForDelta does not fare well. See the benchmarkresults directory for some results.
You need a recent Java compiler. Java 7 or better is recommended.
Good instructions on installing Java 7 on Linux:
http://forums.linuxmint.com/viewtopic.php?f=42&t=93052
See example.java for a simple demonstration.
Compile the code and execute me.lemire.integercompression.benchmarktools.Benchmark.
I recommend running all the benchmarks with the "-server" flag on a desktop machine.
Speed is always reported in millions of integers per second.
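For readers who want to relate that unit to their own timings, here is a small sketch of the arithmetic. It is not the benchmark harness shipped with this library: the class ThroughputSketch and its method are hypothetical names, and the timed loop is only a placeholder workload.

```java
// Hypothetical helper showing how "millions of integers per second" is derived.
class ThroughputSketch {
    // integers per second = count * 1e9 / nanos; dividing by 1e6 gives count * 1e3 / nanos.
    static double millionsOfIntegersPerSecond(long integerCount, long elapsedNanos) {
        return integerCount * 1000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 20];
        for (int i = 0; i < data.length; ++i) {
            data[i] = i;
        }
        long start = System.nanoTime();
        long sum = 0;
        for (int v : data) {
            sum += v; // placeholder workload standing in for a codec call
        }
        long elapsed = Math.max(1, System.nanoTime() - start);
        System.out.println("checksum " + sum + ", speed "
                + millionsOfIntegersPerSecond(data.length, elapsed) + " mis");
    }
}
```

A single short timing like this is noisy; repeated runs after warm-up give more stable numbers.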
mvn compile
mvn exec:java
If you use Apache Ant, please try this:
$ ant Benchmark
or:
$ ant Benchmark -Dbenchmark.target=BenchmarkBitPacking
http://lemire.me/docs/javafastpfor/
We wrote a research paper which documents many of the CODECs implemented here:
Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization, Software Practice & Experience (to appear) http://arxiv.org/abs/1209.2137
Daniel Lemire, Leonid Boytsov, Nathan Kurz, SIMD Compression and the Intersection of Sorted Integers, arXiv:1401.6399, 2014 http://arxiv.org/abs/1401.6399
Ikhtear Sharif wrote his M.Sc. thesis on this library:
Ikhtear Sharif, Performance Evaluation of Fast Integer Compression Techniques Over Tables, M.Sc. thesis, UNB 2013. http://hdl.handle.net/1882/45703
He also posted his slides online: http://www.slideshare.net/ikhtearSharif/ikhtear-defense