This is a Java project. I used Eclipse to develop and run it.
- I have not tested this extensively.
- The similarity measure used to recommend relevant videos is the Jaccard coefficient, a simple set-based similarity measure.
All the configuration can be set in "data/config/config.properties". The main functions are the following:
- Tagger
- Recommender
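For reference, a file in the standard .properties format, such as data/config/config.properties, can be read with java.util.Properties. The following is only a minimal sketch: the class name ConfigSketch and the key "video.title" are hypothetical and may not match the names this project actually uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ConfigSketch {

    /** Parses key=value pairs in the standard .properties format. */
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            // Rethrow unchecked so callers do not have to declare IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Inline text stands in for the real file; "video.title" is a hypothetical key.
        String text = "video.title=Some video title\n";
        Properties props = load(new StringReader(text));
        // The second argument is the fallback used when the key is missing.
        System.out.println(props.getProperty("video.title", "(not set)"));
    }
}
```

To read the real file instead of the inline text, pass a java.io.FileReader opened on "data/config/config.properties" to load.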
Tagger parses the JSON file provided with the YouTube descriptions. It then uses the Zemanta API to spot entities in the description and title of each video, without duplication. The entities are used as tags, and the following output files are generated:
- outputTaggedJson.json: This JSON file is used by the Recommender, which works from its tags and categories. Reusing it avoids hitting the Zemanta API unnecessarily.
- outputTaggedVideos: This file is a human-readable view of the tagged videos.
Recommender uses the files generated by the Tagger. Further information on the configuration options can be found in config.properties. The title of the video for which recommendations are wanted can also be set in config.properties. Recommender uses a simple Jaccard coefficient over both categories and tags to recommend other videos.
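To illustrate the measure, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below is not the project's actual code: the class name JaccardSketch is hypothetical, and merging each video's tags and categories into a single set is an assumption about how the two are combined.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class JaccardSketch {

    /** Jaccard coefficient |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b); // keep only the elements also present in b
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Assumption: tags and categories are merged into one feature set per video.
        Set<String> target = new HashSet<>(Arrays.asList("java", "eclipse", "Education"));
        Set<String> candidate = new HashSet<>(Arrays.asList("java", "maven", "Education"));
        // 2 shared elements out of 4 distinct ones
        System.out.println(jaccard(target, candidate)); // prints 0.5
    }
}
```

Scoring every other video against the target this way and sorting by descending score gives a ranked list of recommendations.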
Let me know if you have any questions.