Trident-CF is a highly scalable recommendation engine. It is built on top of Storm, a distributed stream processing framework that runs on a cluster of machines and supports horizontal scaling.
This library implements a user-based collaborative filtering algorithm with binary ratings.
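To make "user-based collaborative filtering with binary ratings" concrete, here is a minimal sketch of how the similarity between two users can be measured when a rating is just "the user has this item or not". It uses the Jaccard index, which is an assumption made for illustration: this document does not say which similarity measure Trident-CF uses, and the class and method names below are hypothetical, not part of the library.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not part of the Trident-CF API.
final class BinarySimilaritySketch {

    // Jaccard similarity between two users' item sets: |A ∩ B| / |A ∪ B|.
    // Assumption: Trident-CF may use a different measure.
    // Returns 0.0 when both sets are empty.
    static double jaccard(Set<Long> itemsA, Set<Long> itemsB) {
        if (itemsA.isEmpty() && itemsB.isEmpty()) {
            return 0.0;
        }
        Set<Long> intersection = new HashSet<>(itemsA);
        intersection.retainAll(itemsB);
        int unionSize = itemsA.size() + itemsB.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<Long> alice = Set.of(1L, 2L, 3L);
        Set<Long> bob = Set.of(2L, 3L, 4L);
        // 2 shared items out of 4 distinct items
        System.out.println(jaccard(alice, bob));
    }
}
```

With binary ratings, each user is fully described by a set of item ids, so set overlap is a natural way to compare two users.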
Note that Trident-CF is still in beta and isn't production-ready. For now, it only provides the core algorithm.
Trident-CF is based on Trident, a high-level abstraction for doing realtime computing. If you're familiar with high-level batch processing tools like Pig or Cascading, the concepts of Trident will be very familiar.
It's recommended to read the Storm and Trident documentation.
The Trident-CF algorithm is built on a TridentTopology and processes a stream of binary ratings to measure the similarity between users. Binary ratings are added in real time, while the similarities of changed users are re-processed on demand using a trigger stream. The TridentCollaborativeFilteringBuilder helps you build the recommendation engine.
For the purposes of illustration, this example processes an existing stream of binary ratings and re-computes users' similarities every time the trigger stream emits a tuple:
// Your trident topology
TridentTopology topology = ...;
// Stream which contains the binary ratings
Stream preferenceStream = ...;
// Stream which emits an empty tuple when user similarities must be re-computed
Stream triggerStream = ...;
// Create collaborative filtering topology
TridentCollaborativeFiltering tcf = new TridentCollaborativeFilteringBuilder()
    .use(topology)
    .process(preferenceStream)
    .updateSimilaritiesOn(triggerStream)
    .build();
Note that the preference stream must contain at least two fields ("user" and "item"), while the trigger stream doesn't need any fields.
Trident-CF provides two spout implementations that can be used to create the trigger stream: DelayedSimilaritiesUpdateLauncher and PermanentSimilaritiesUpdateLauncher.
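The split between real-time rating ingestion and on-demand similarity updates can be pictured with a small, Storm-free sketch. Everything in it is hypothetical (class name, method names, the "changed users" bookkeeping); it only illustrates the idea that preference tuples are cheap to absorb while the expensive similarity work waits for a trigger, and it makes no claim about how Trident-CF stores its state internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not part of the Trident-CF API.
final class OnDemandUpdateSketch {

    // user id -> ids of the items the user has a (binary) preference for
    private final Map<Long, Set<Long>> preferences = new HashMap<>();
    // users whose preferences changed since the last trigger
    private final Set<Long> changedUsers = new HashSet<>();

    // Called for every preference tuple: cheap, happens in real time.
    void addPreference(long user, long item) {
        boolean added = preferences.computeIfAbsent(user, u -> new HashSet<>()).add(item);
        if (added) {
            changedUsers.add(user);
        }
    }

    // Called for every trigger tuple: only the changed users need re-processing.
    // Returns the users that were re-processed and resets the bookkeeping.
    Set<Long> onTrigger() {
        Set<Long> processed = new HashSet<>(changedUsers);
        // A real implementation would re-compute similarities for these users here.
        changedUsers.clear();
        return processed;
    }
}
```

A trigger that fires often keeps similarities fresh at a higher processing cost, while a rarely firing trigger does the opposite; which trade-off fits depends on how quickly your ratings change.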
Item recommendations are generated by aggregating the preferences of the most similar users. Here's the code to process a recommendation query stream and retrieve item recommendations:
// The Trident-CF algorithm
TridentCollaborativeFiltering tcf = ...;
// Recommendations parameters
int nbItems = 10;
int neighborhoodSize = 100;
// Stream containing recommendation queries
Stream recommendationQueryStream = ...;
// Create a new stream which contains a single field: "recommendedItems" (a List of RecommendedItem).
Stream recommendationStream = tcf.createItemRecommendationStream(recommendationQueryStream, nbItems, neighborhoodSize);
Note that the recommendation query stream must contain a "user" field containing a user id (long).
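To show what "aggregating the preferences of the most similar users" can look like, here is a plain-Java sketch that takes the two parameters used above (nbItems and neighborhoodSize) and produces a ranked list of item ids. It is a sketch under stated assumptions: scoring each item by the summed similarity of the neighbors who have it, and skipping items the user already has, are illustrative choices, not confirmed Trident-CF behavior. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration, not part of the Trident-CF API.
final class NeighborhoodRecommenderSketch {

    static List<Long> recommend(long user,
                                Map<Long, Set<Long>> preferences,
                                Map<Long, Double> similarityToUser,
                                int nbItems,
                                int neighborhoodSize) {
        Set<Long> alreadyKnown = preferences.getOrDefault(user, Set.of());

        // Highest value first, ties broken by id to keep the result deterministic.
        Comparator<Map.Entry<Long, Double>> byValueDesc = (a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : Long.compare(a.getKey(), b.getKey());
        };

        // Keep only the neighborhoodSize most similar users.
        List<Map.Entry<Long, Double>> neighbors = similarityToUser.entrySet().stream()
                .filter(e -> e.getKey() != user)
                .sorted(byValueDesc)
                .limit(neighborhoodSize)
                .collect(Collectors.toList());

        // Assumption: an item's score is the summed similarity of the neighbors who have it.
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Double> neighbor : neighbors) {
            for (Long item : preferences.getOrDefault(neighbor.getKey(), Set.of())) {
                if (!alreadyKnown.contains(item)) {
                    scores.merge(item, neighbor.getValue(), Double::sum);
                }
            }
        }

        // Return the nbItems best-scored item ids.
        return new ArrayList<>(scores.entrySet()).stream()
                .sorted(byValueDesc)
                .limit(nbItems)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

A larger neighborhoodSize lets more users contribute candidates, at the cost of including weaker matches; nbItems only caps how many of the scored items are returned.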
You can configure Trident-CF by passing a custom Options instance to the TridentCollaborativeFilteringBuilder:
// Custom options
Options options = ...;
// Create collaborative filtering topology
TridentCollaborativeFiltering tcf = new TridentCollaborativeFilteringBuilder()
    .use(topology)
    .with(options)
    .process(preferenceStream)
    .updateSimilaritiesOn(triggerStream)
    .build();
Options lets you specify other StateFactory implementations and different parallelism settings.
By default, Trident-CF uses non-transactional in-memory states; it also provides non-transactional Redis states. You can easily instantiate pre-configured Options with Redis states:
Options options = Options.redis();
To use Trident-CF, you'll need the jar on your classpath. Trident-CF is hosted on Clojars (a Maven repository). Either download the latest version of the jar and add it to your project's classpath, or use Maven and declare Trident-CF as a dependency in your pom.xml:
<repository>
    <id>clojars.org</id>
    <url>http://clojars.org/repo</url>
</repository>
<dependency>
    <groupId>com.github.pmerienne</groupId>
    <artifactId>trident-cf</artifactId>
    <version>0.0.1</version>
</dependency>