Skip to content

bejvisek/htools

 
 

Repository files navigation

HTools

This project contains a set of shared tools across projects:

  • io: transparent reading/writing to local and Hadoop FS to allow the same application to run on both platforms (underlying implementations are for instance FSPth and HDFSPath)
  • search: factory for fast search of regular expressions in byte arrays
  • collection: many collection types that allow easy processing for specific purposes
  • fcollection: many collection types that allow efficient processing for specific purposes
  • extract: modular extraction pipeline that operates on raw byte arrays, for instance to remove noise, convert data, and extract tokens.
  • compressed: support for compressed tar archives
  • struct: support for writing structured files using records, e.g. xml, json, tsv, binary.
  • web: simple multi-threaded crawler for web pages
  • words: stemmers and stopword lists
  • buffer: buffered reader and writer that transparently uses an in memory byte array or can be connected to a data stream, that supports reading an writing of a wide range of datatypes.
  • hadoop: wraps hadoop classes with additional functionality
  • hadoop.io: custom inputformat and outputformat to automatically setup jobs that read and write custom key-value classes
  • hadoop.archivereader: configurable readers to process data stored in an archive format
  • hadoop.backup: backup/update/restore a backup of files or folders on HDFS
  • hbase: primitive HBase support
  • lib: several (static) libraries with general purpose tools

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Java 99.5%
  • ANTLR 0.5%