dbackhausen/TaskClustering

TaskClustering

This project aims to create task-based clusters from the users' browser history.

The project is split into three subprojects:

  • BHCBaseDataCollector analyzes the history and fetches the contents of visited pages from the web.
  • BHCTopicExtractor uses Maui to collect topics from the fetched documents.
  • BHCPageClusterer applies hierarchical clustering to the document collection and labels each cluster using the topics generated by BHCTopicExtractor.

At the moment the project is limited to history files of Mozilla Firefox browsers (places.sqlite).
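To get a feel for what such a history file contains, the following Python sketch lists the most recently visited pages. It is not part of this project and does not reflect how BHCBaseDataCollector reads the file; the function name `read_history` is hypothetical. It relies on Firefox's `moz_places` table and its `url`, `title`, and `last_visit_date` columns, the latter holding microseconds since the Unix epoch. Firefox may keep the file locked while it is running, so working on a copy is advisable.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def read_history(db_path, limit=10):
    """Return (url, title, last_visit) tuples, most recent first.

    Hypothetical helper for illustration only.
    """
    # Open read-only so the history file is never modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT url, title, last_visit_date FROM moz_places "
            "WHERE last_visit_date IS NOT NULL "
            "ORDER BY last_visit_date DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    # Convert microseconds since the Unix epoch to timezone-aware datetimes.
    return [
        (url, title, datetime.fromtimestamp(ts / 1_000_000, tz=timezone.utc))
        for url, title, ts in rows
    ]
```

Calling `read_history("places.sqlite", limit=5)` on a copy of the file would return the five most recently visited entries.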

First Steps

Build all projects: In the TaskClustering directory, run: ant -f buildAll.xml

Run all projects sequentially: In the TaskClustering directory, run: ant -f runAll.xml

Each project can also be run individually with its own ant file, stored in the TaskClustering directory:

  • runBaseDataCollector.xml
  • runTopicExtractor.xml
  • runPageClusterer.xml

Note: Each project expects the output of the previous stage to be in place.
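Because each stage depends on the output of the previous one, a small wrapper that stops at the first failing stage can be convenient when running the stages individually. The following Python sketch is an illustration only: it assumes `ant` is available on the PATH, and the names `stage_commands` and `run_stages` are hypothetical, not part of the project.

```python
import subprocess

# Ant build files named above, in pipeline order.
STAGE_FILES = [
    "runBaseDataCollector.xml",
    "runTopicExtractor.xml",
    "runPageClusterer.xml",
]


def stage_commands(stage_files=STAGE_FILES):
    """Build one 'ant -f <file>' command per stage."""
    return [["ant", "-f", name] for name in stage_files]


def run_stages(project_dir, dry_run=False):
    """Run the stages in order and return the commands that were processed.

    With dry_run=True nothing is executed; the commands are only collected.
    """
    processed = []
    for cmd in stage_commands():
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a later stage never runs without its input in place.
            subprocess.run(cmd, cwd=project_dir, check=True)
        processed.append(cmd)
    return processed
```

For example, `run_stages("path/to/TaskClustering")` would run the three stages one after another, where the path is a placeholder for the local checkout.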

Configuration

Each project requires the file 'config.ini' in the TaskClustering directory. Most entries can be left unchanged. The relevant exceptions are:

  • Entry 'history' in the [filesystem] section: the filename of the browser history file to be analyzed (usually places.sqlite).
  • Entry 'canopy_ranges' in the [clustering] section: defines how many levels the cluster hierarchy has and how aggressively the clusters of one level are split to build the next deeper level. It is a list of double values, separated by blanks.
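As an illustration of the two entries, the Python sketch below parses a sample configuration and turns 'canopy_ranges' into a list of floats. The section and entry names come from the description above; the values 0.9, 0.7 and 0.5 are made up for the example, the real config.ini contains further entries not shown here, and the function name `load_settings` is hypothetical. That the number of values corresponds to the number of hierarchy levels is an assumption based on the description.

```python
import configparser

# Sample content only; the canopy_ranges values are invented for illustration.
SAMPLE_CONFIG = """
[filesystem]
history = places.sqlite

[clustering]
canopy_ranges = 0.9 0.7 0.5
"""


def load_settings(text):
    """Return the history filename and the canopy ranges as floats."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    history = parser.get("filesystem", "history")
    # The entry is a blank-separated list of double values.
    raw_ranges = parser.get("clustering", "canopy_ranges")
    ranges = [float(value) for value in raw_ranges.split()]
    return history, ranges


history, ranges = load_settings(SAMPLE_CONFIG)
print(history, ranges)
```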
