Carrot2

Carrot2 is a programming library for clustering text. It can automatically discover groups of related documents and label them with short key terms or phrases.

Carrot2 can turn, for example, search result titles and snippets into groups like these:

[Figure: search result titles and snippets (left) and the corresponding cluster labels (right).]

Installation

Carrot2 is a software component and typically integrates with other software as a library dependency (see the API documentation available with each release).

Binary releases are published on GitHub. They ship with an HTTP/JSON REST API service called the DCS (Document Clustering Server), which allows integration with other languages.
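As a rough illustration of calling the DCS from another language, here is a minimal Python sketch that builds a clustering request and posts it over HTTP. The service address (`http://localhost:8080/service/cluster`), the request fields (`language`, `algorithm`, `documents`, `title`, `snippet`) and the algorithm name `Lingo` are assumptions, not taken from this README. Check them against the DCS documentation shipped with your release before relying on them.

```python
import json
import urllib.request

# Assumed address of a locally running DCS; adjust to your installation.
DCS_URL = "http://localhost:8080/service/cluster"


def build_cluster_request(documents, language="English", algorithm="Lingo"):
    """Return the JSON request body (as bytes) for a clustering call.

    `documents` is an iterable of (title, snippet) pairs. The field names
    used here are assumptions about the DCS request format.
    """
    payload = {
        "language": language,
        "algorithm": algorithm,
        "documents": [{"title": t, "snippet": s} for t, s in documents],
    }
    return json.dumps(payload).encode("utf-8")


def cluster(documents, url=DCS_URL):
    """POST the documents to the DCS and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_cluster_request(documents),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a DCS instance running, `cluster([("Title one", "First snippet"), ("Title two", "Second snippet")])` would send both documents in a single request. The shape of the response depends on the DCS version, so inspect it before parsing.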

Integration with document retrieval services is possible via the Apache Solr plugin and the Elasticsearch plugin.

Building from Sources

If you need to build the distribution from sources, run:

./gradlew -p distribution assemble

The distribution is placed under distribution/build/dist/, and a compressed version is available at distribution/build/distZip/.

Documentation

Source code

The source code is hosted on GitHub.

Contact and more information

License

Carrot2 is licensed under the BSD license.