
nddsg/discrmetapath


discrmetapath

Usage:

  1. Please clone this repo to your local machine.

  2. Download Wikipedia's database from here.

  3. Import this database into your MySQL Server.

  4. Edit DiscrMetaPath/src/main/java/edu/nd/dsg/util/ConnectionPool.java and set URL, USER, and PASS to match your MySQL server.

  5. Extract DiscrMetaPath/data.tar.gz into DiscrMetaPath/. Afterwards, all the data should be under DiscrMetaPath/data.

  6. Build the project by running make wikibuild. The jar file will be generated under DiscrMetaPath/target/.

  7. Run the generated jar file with: java -jar JAR_FILE_YOU_GENERATED
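Before building, it can help to confirm that the credentials from step 4 actually reach the imported database. The class below is a hypothetical standalone check, not part of this repository. It assumes MySQL Connector/J is on the classpath, and the host, port, database name, user, and password are placeholders to replace with the values you put in ConnectionPool.java.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical connectivity check; not part of DiscrMetaPath. */
public class ConnectionCheck {

    // Placeholder credentials: use the same USER and PASS as in ConnectionPool.java.
    static final String USER = "wiki_user";
    static final String PASS = "wiki_password";

    /** Builds a MySQL JDBC URL of the form jdbc:mysql://host:port/database. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Hypothetical host, port and database name; match them to your import.
        String url = jdbcUrl("localhost", 3306, "wikipedia");
        // try-with-resources closes the connection even if the metadata call fails
        try (Connection conn = DriverManager.getConnection(url, USER, PASS)) {
            System.out.println("Connected to MySQL "
                    + conn.getMetaData().getDatabaseProductVersion());
        } catch (SQLException e) {
            // Also reached when no MySQL driver is on the classpath ("No suitable driver")
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```

A "No suitable driver" message points to a missing Connector/J jar rather than wrong credentials; an access-denied message points to USER or PASS.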

The command line arguments are:

    Usage
    Generate paths: -GEN [-NoSQL cache types first to speedup] [-all get all paths instead of pathLength == 2] [-p build patent]
    Translate paths: -TRANS [-a output all paths] [-nd do not get most discri/similar paths] [-oNum get NUM paths between discri&similar paths] [-p build patent]
    Generate Term frequency: -TERM [-BuildWikiTF generate term frequency] [-BuildPatentTF generate term frequency] [-BuildWikiDF generate document frequency] [-BuildPatentDF generate document frequency]
    Generate Cos distance frequency(sequential): -COS [-p build patent]
    Generate BM25 score: -BM [-ACC accumulative (x,y),(x+y,z),...] [-NODE  sequential (x,y),(y,z),...] [-p build patent]
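The -ACC and -NODE options of -BM differ in which pairs along a path are scored: accumulative pairs (x,y),(x+y,z),... versus sequential pairs (x,y),(y,z),.... The sketch below only enumerates both pairings for a path of article nodes and does no BM25 scoring. Reading "x+y" as the accumulated prefix of the path is an assumption, and the class and method names are hypothetical, not the project's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the -NODE and -ACC pairings; not the project's code. */
public class PathPairs {

    /** Sequential (-NODE style): each node paired with the next, (x,y),(y,z),... */
    static List<String> sequentialPairs(List<String> path) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < path.size(); i++) {
            pairs.add("(" + path.get(i) + "," + path.get(i + 1) + ")");
        }
        return pairs;
    }

    /** Accumulative (-ACC style): the prefix so far paired with the next node, (x,y),(x+y,z),... */
    static List<String> accumulativePairs(List<String> path) {
        List<String> pairs = new ArrayList<>();
        if (path.isEmpty()) {
            return pairs;
        }
        StringBuilder prefix = new StringBuilder(path.get(0));
        for (int i = 1; i < path.size(); i++) {
            pairs.add("(" + prefix + "," + path.get(i) + ")");
            prefix.append("+").append(path.get(i)); // grow the accumulated prefix
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> path = List.of("x", "y", "z", "w");
        System.out.println(sequentialPairs(path));   // [(x,y), (y,z), (z,w)]
        System.out.println(accumulativePairs(path)); // [(x,y), (x+y,z), (x+y+z,w)]
    }
}
```

Both schemes produce one pair per edge of the path. They differ only in whether the left side is the single preceding node or everything seen so far.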

Results:

If you are only interested in our results, you can get the data from the result folder. The format of each file is:

  • For CrowdFlower result files:

      _unit_id,
      _golden,
      _canary,
      _unit_state,
      _trusted_judgments,
      _last_judgment_at,
      choose_path,    // Path chosen by the human annotator
      choose_path:confidence,
      end,    // End article
      path_1, // Path between start and end, generated by our algorithm
      path_2,
      path_3,
      path_4,
      path_5,
      start   // Start article
    
  • For other csv files:

      groupId, // Each unique groupId represents one CrowdFlower task
      pathId, //  Equivalent to CrowdFlower's path_*
      nodeId, //  Score of the node at position `i` in path_*
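To see which generated path a worker picked in a CrowdFlower result file, one can compare choose_path against path_1 to path_5. The helper below is a hypothetical sketch, not part of the repository. It assumes the file's header uses the field names listed above, that choose_path holds the same text as one of the path_* columns, and that lines are plain comma-separated values with no quoted fields or embedded commas (a real CSV parser would be needed otherwise). The sample row is made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading a CrowdFlower result row; not part of DiscrMetaPath. */
public class ChosenPath {

    /**
     * Returns i (1-5) if choose_path equals path_i, or -1 if no path matches.
     * Assumes plain comma-separated lines without quoted fields or embedded commas.
     */
    static int chosenPathIndex(String headerLine, String rowLine) {
        List<String> header = new ArrayList<>();
        for (String name : headerLine.split(",", -1)) {
            header.add(name.trim()); // tolerate spaces around header names
        }
        String[] row = rowLine.split(",", -1); // -1 keeps empty trailing fields
        if (row.length != header.size()) {
            throw new IllegalArgumentException("row has " + row.length
                    + " fields, header has " + header.size());
        }
        int chooseCol = header.indexOf("choose_path"); // exact match, so not choose_path:confidence
        if (chooseCol < 0) {
            throw new IllegalArgumentException("no choose_path column");
        }
        String chosen = row[chooseCol].trim();
        for (int i = 1; i <= 5; i++) {
            int col = header.indexOf("path_" + i);
            if (col >= 0 && row[col].trim().equals(chosen)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String header = "_unit_id,_golden,_canary,_unit_state,_trusted_judgments,"
                + "_last_judgment_at,choose_path,choose_path:confidence,end,"
                + "path_1,path_2,path_3,path_4,path_5,start";
        // Made-up row for illustration only; real files contain Wikipedia article paths
        String row = "u1,false,,state,3,time,pathC,0.67,End,"
                + "pathA,pathB,pathC,pathD,pathE,Start";
        System.out.println(chosenPathIndex(header, row)); // 3
    }
}
```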
    

About

Discriminative Meta Paths
