storm/trident based highly scalable recommendation engine

jinbochen/iterative-cf

Trident-CF is a highly scalable recommendation engine. The library is built on top of Storm, a distributed stream processing framework that runs on a cluster of machines and scales horizontally.

This library implements a user-based collaborative filtering algorithm with binary ratings.
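To make "user-based collaborative filtering with binary ratings" concrete, here is a minimal plain-Java sketch of one common way to compare two users when a rating is only "has the item" or "does not have it". It uses Jaccard similarity, which is an assumption: this README does not state which similarity measure Trident-CF uses. The class and method names are hypothetical and are not part of the Trident-CF API.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: hypothetical names, not part of the Trident-CF API.
final class BinarySimilaritySketch {

    // Jaccard similarity between two users' item sets: |A ∩ B| / |A ∪ B|.
    // Assumption: the measure actually used by Trident-CF may differ.
    static double jaccard(Set<Long> itemsA, Set<Long> itemsB) {
        if (itemsA.isEmpty() && itemsB.isEmpty()) {
            return 0.0; // no ratings at all: treat the users as unrelated
        }
        Set<Long> intersection = new HashSet<>(itemsA);
        intersection.retainAll(itemsB);
        int unionSize = itemsA.size() + itemsB.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<Long> alice = Set.of(1L, 2L, 3L);
        Set<Long> bob = Set.of(2L, 3L, 4L);
        // 2 shared items out of 4 distinct items
        System.out.println(jaccard(alice, bob)); // prints 0.5
    }
}
```

With binary ratings, each user reduces to a set of item ids, so similarity only depends on how much two sets overlap.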

Note that Trident-CF is still in beta and isn't production-ready. It only provides the core algorithm.

Usage

Trident-CF is based on Trident, a high-level abstraction for realtime computing. If you're familiar with high-level batch processing tools like Pig or Cascading, the concepts of Trident will feel very familiar.

It's recommended to read the Storm and Trident documentation.

Build collaborative filtering topology

The Trident-CF algorithm is built on a TridentTopology and processes a stream of binary ratings in order to measure similarity between users. Binary ratings are added in real time, while the similarities for changed users are re-processed on demand using a trigger stream. The TridentCollaborativeFilteringBuilder helps you build the recommendation engine.

For the purposes of illustration, this example processes an existing stream of binary ratings and re-computes users' similarities every time the trigger stream emits a tuple.

// Your trident topology
TridentTopology topology = ...;

// Stream that contains the binary ratings
Stream preferenceStream = ...;

// Stream that emits an empty tuple when user similarities must be re-computed
Stream triggerStream = ...;

// Create collaborative filtering topology
TridentCollaborativeFiltering tcf = new TridentCollaborativeFilteringBuilder()
    .use(topology)
    .process(preferenceStream)
    .updateSimilaritiesOn(triggerStream)
    .build();

Note that the preference stream must contain at least two fields ("user" and "item"), while the trigger stream doesn't need any field.

Trident-CF provides two spout implementations that can be used to create the trigger stream: DelayedSimilaritiesUpdateLauncher and PermanentSimilaritiesUpdateLauncher.
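The split between "ratings arrive continuously" and "similarities are refreshed when a trigger fires" can be sketched outside Storm in plain Java. The sketch below is only a model of that idea, not Trident-CF's implementation: the class, its methods, and the use of Jaccard similarity are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical names, not part of the Trident-CF API.
final class OnDemandUpdateSketch {
    private final Map<Long, Set<Long>> itemsByUser = new HashMap<>();
    private final Set<Long> changedUsers = new HashSet<>();
    private final Map<Long, Map<Long, Double>> similarities = new HashMap<>();

    // Called for every ("user", "item") tuple: cheap, only records the rating.
    void addPreference(long user, long item) {
        if (itemsByUser.computeIfAbsent(user, u -> new HashSet<>()).add(item)) {
            changedUsers.add(user);
        }
    }

    // Called for every trigger tuple: re-computes similarities for changed users only.
    // Returns the number of users that were re-processed.
    int onTrigger() {
        int processed = changedUsers.size();
        for (long user : changedUsers) {
            for (long other : itemsByUser.keySet()) {
                if (other == user) {
                    continue;
                }
                double sim = jaccard(itemsByUser.get(user), itemsByUser.get(other));
                similarities.computeIfAbsent(user, u -> new HashMap<>()).put(other, sim);
                similarities.computeIfAbsent(other, u -> new HashMap<>()).put(user, sim);
            }
        }
        changedUsers.clear();
        return processed;
    }

    // Last computed similarity, or 0.0 if the pair has never been processed.
    double similarity(long a, long b) {
        return similarities.getOrDefault(a, Map.of()).getOrDefault(b, 0.0);
    }

    // Assumed similarity measure (Jaccard): |A ∩ B| / |A ∪ B|.
    private static double jaccard(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        OnDemandUpdateSketch cf = new OnDemandUpdateSketch();
        cf.addPreference(1L, 1L);
        cf.addPreference(1L, 2L);
        cf.addPreference(2L, 2L);
        cf.onTrigger();
        System.out.println(cf.similarity(1L, 2L)); // prints 0.5
    }
}
```

Deferring the pairwise work to the trigger keeps the per-rating cost low; how often the trigger fires then becomes a trade-off between freshness and load.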

Get item recommendations

Item recommendations are generated by aggregating the preferences of the most similar users. The following code processes a stream of recommendation queries to retrieve item recommendations:

// The Trident-CF algorithm
TridentCollaborativeFiltering tcf = ...;

// Recommendations parameters
int nbItems = 10;
int neighborhoodSize = 100;

// Stream containing recommendation queries
Stream recommendationQueryStream = ...;

// Create a new stream that contains a single field: "recommendedItems" (a List of RecommendedItem).
Stream recommendationStream = tcf.createItemRecommendationStream(recommendationQueryStream, nbItems, neighborhoodSize);

Note that the recommendation query stream must contain a "user" field containing a user id (long).
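As a rough model of what nbItems and neighborhoodSize control, the following plain-Java sketch ranks items by summing the similarities of the nearest neighbours who have each item. This is an assumed aggregation scheme written for illustration; the README does not describe how Trident-CF scores items, and every name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical names, not part of the Trident-CF API.
final class RecommendationSketch {

    // Highest value first, ties broken by smallest key so results are deterministic.
    private static final Comparator<Map.Entry<Long, Double>> BEST_FIRST =
            Map.Entry.<Long, Double>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<Long, Double>comparingByKey());

    static List<Long> recommend(long user, Map<Long, Set<Long>> itemsByUser,
                                int nbItems, int neighborhoodSize) {
        Set<Long> own = itemsByUser.getOrDefault(user, Set.of());

        // Score every other user against the target user, dropping unrelated users.
        List<Map.Entry<Long, Double>> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Set<Long>> e : itemsByUser.entrySet()) {
            double sim = jaccard(own, e.getValue());
            if (e.getKey() != user && sim > 0.0) {
                neighbors.add(Map.entry(e.getKey(), sim));
            }
        }
        neighbors.sort(BEST_FIRST);

        // Aggregate the preferences of the closest neighbours, skipping items the user already has.
        Map<Long, Double> scores = new HashMap<>();
        int limit = Math.min(neighborhoodSize, neighbors.size());
        for (Map.Entry<Long, Double> n : neighbors.subList(0, limit)) {
            for (long item : itemsByUser.get(n.getKey())) {
                if (!own.contains(item)) {
                    scores.merge(item, n.getValue(), Double::sum);
                }
            }
        }

        // Keep the nbItems best-scored items.
        List<Map.Entry<Long, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(BEST_FIRST);
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : ranked.subList(0, Math.min(nbItems, ranked.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    // Assumed similarity measure (Jaccard): |A ∩ B| / |A ∪ B|.
    private static double jaccard(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> ratings = Map.of(
                1L, Set.of(1L, 2L),
                2L, Set.of(1L, 2L, 3L),
                3L, Set.of(2L, 4L));
        System.out.println(recommend(1L, ratings, 10, 2)); // prints [3, 4]
    }
}
```

A larger neighborhoodSize pulls in weaker neighbours and broadens the candidate items, while nbItems only truncates the final ranked list.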

Configure the Trident-CF topology

You can configure Trident-CF by passing a custom Options instance to the TridentCollaborativeFilteringBuilder:

// Custom options
Options options = ...;

// Create collaborative filtering topology
TridentCollaborativeFiltering tcf = new TridentCollaborativeFilteringBuilder()
    .use(topology)
    .with(options)
    .process(preferenceStream)
    .updateSimilaritiesOn(triggerStream)
    .build();

Options lets you specify other StateFactory implementations and different parallelism settings.

Trident-CF states

By default, Trident-CF uses non-transactional in-memory states. It also provides non-transactional Redis states. You can easily instantiate pre-configured Options with Redis states:

Options options = Options.redis();

Maven

To use Trident-CF, you'll need the jar on your classpath. Trident-CF is hosted on Clojars (a Maven repository). You can either download the latest version of the jar and add it to your project's classpath, or use Maven and declare Trident-CF as a dependency in your pom.xml:

<repository>
  <id>clojars.org</id>
  <url>http://clojars.org/repo</url>
</repository>
<dependency>
  <groupId>com.github.pmerienne</groupId>
  <artifactId>trident-cf</artifactId>
  <version>0.0.1</version>
</dependency>
