scoppens/eic-entity-tool
Tool for TREC Entity Track 2011

This tool is a demo for indexing and querying the Sindice dataset 2011 available
at [1]. Its purpose is to show how to extract entities from the dataset and how
to index them using SIREn [2]. Example queries can be found in the JUnit tests.
The tool is composed of the following classes:
- Indexing: it takes care of creating the index;
- Sindice{ED,DE}Indexing: it extracts entities from the dataset, depending on
  the format;
- Entity: it is the class representing an Entity;
- Utils: it contains utility methods used for indexing.

The command line interface is the class org.sindice.siren.index.IndexingCLI.

### Basic build steps ###

  0) Install JDK 1.6 (or greater) and Maven 2.0.9 (or greater)
  1) Move to the root folder of the tool
  2) Run Maven

Step 0) Set up your development environment (JDK 1.6 or greater, Maven 2.0.9 or
greater)

We'll assume that you know how to get and set up the JDK. If you don't, we
suggest starting at http://java.sun.com and learning more about Java before
continuing. This tool is based on SIREn, which runs with JDK 1.6 and later.

Like many open source Java projects, SIREn uses Apache Maven for build control.
Specifically, you MUST use Maven version 2.0.9 or greater.
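
The Maven requirement above can be checked from the shell before building. The
following is only a minimal sketch: the helper name version_ge is hypothetical
(it is not part of the tool), and it assumes a sort(1) that supports the -V
option, as GNU coreutils does.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; assumes sort(1) supports -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v mvn >/dev/null 2>&1; then
  # The first line of `mvn -version` normally reads "Apache Maven X.Y.Z ...".
  mvn_version=$(mvn -version 2>/dev/null | awk '/Apache Maven/ { print $3; exit }')
  if version_ge "$mvn_version" 2.0.9; then
    echo "Maven $mvn_version is recent enough"
  else
    echo "Maven $mvn_version is older than 2.0.9" >&2
  fi
else
  echo "mvn not found in PATH" >&2
fi
```

The same helper could be applied to the JDK version, but note that
`java -version` writes to standard error and its format varies between vendors.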

Step 1) From the command line, change (cd) into the root directory of the tool

The root directory contains the project pom.xml file. By default, you do not
need to change any of the settings in this file, but you do need to run Maven
from this location so it knows where to find pom.xml.

Step 2) Run Maven

Assuming you have Maven in your PATH, typing the following commands at the shell
prompt should run Maven.

  1-    To create a jar with Maven:
                $ mvn assembly:assembly
        This will create the jar target/siren-eostool-0.1-SNAPSHOT-assembly.jar

  2-    To run the JUnit tests with Maven:
                $ mvn test

  3-    To create an index for the Sindice-ED dataset format:
                $ java -cp $JAR org.sindice.siren.index.IndexingCLI     \
                                --dumps-dir src/test/resources/         \
                                --index-dir /tmp/test-ED                \
                                --format SINDICE_ED
                where $JAR is set to target/siren-eostool-0.1-SNAPSHOT-assembly.jar
        This will create an index at /tmp/test-ED from the files at
        src/test/resources/ corresponding to the dataset format SINDICE_ED.
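
As a convenience, the indexing command can be assembled with a small shell
wrapper that sets the JAR variable explicitly. This is a sketch only: the
function name index_cmd is hypothetical and not part of the tool, and the
function prints the command rather than running it, so it can be inspected
first.

```shell
#!/bin/sh
# Path of the assembly jar produced by `mvn assembly:assembly`;
# can be overridden from the environment.
JAR="${JAR:-target/siren-eostool-0.1-SNAPSHOT-assembly.jar}"

# index_cmd DUMPS_DIR INDEX_DIR FORMAT
# Hypothetical helper: prints the IndexingCLI command line for the
# given dumps directory, index directory, and dataset format.
index_cmd() {
  printf 'java -cp %s org.sindice.siren.index.IndexingCLI --dumps-dir %s --index-dir %s --format %s\n' \
    "$JAR" "$1" "$2" "$3"
}

# Dry run: show the command for the Sindice-ED example.
index_cmd src/test/resources/ /tmp/test-ED SINDICE_ED
```

Once the jar has been built, the printed line can be pasted at the shell prompt
to create the index.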

[1] http://data.sindice.com/trec2011/index.html
[2] https://github.com/rdelbru/SIREn

About

Ingestor for building Siren index based on ntuple files
