HTools

This project contains a set of shared tools across projects:

io: transparent reading/writing to local and Hadoop FS to allow the same application to run on both platforms (underlying implementations are for instance FSPth and HDFSPath)
search: factory for fast search of regular expressions in byte arrays
collection: many collection types that allow easy processing for specific purposes
fcollection: many collection types that allow efficient processing for specific purposes
extract: modular extraction pipeline that operates on raw byte arrays, for instance to remove noise, convert data, and extract tokens.
compressed: support for compressed tar archives
struct: support for writing structured files using records, e.g. xml, json, tsv, binary.
web: simple multi-threaded crawler for web pages
words: stemmers and stopword lists
buffer: buffered reader and writer that transparently uses an in memory byte array or can be connected to a data stream, that supports reading an writing of a wide range of datatypes.
hadoop: wraps hadoop classes with additional functionality
hadoop.io: custom inputformat and outputformat to automatically setup jobs that read and write custom key-value classes
hadoop.archivereader: configurable readers to process data stored in an archive format
hadoop.backup: backup/update/restore a backup of files or folders on HDFS
hbase: primitive HBase support
lib: several (static) libraries with general purpose tools

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
.idea		.idea
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
htools.iml		htools.iml
nb-configuration.xml		nb-configuration.xml
nbactions-createrelease.xml		nbactions-createrelease.xml
nbactions-default.xml		nbactions-default.xml
nbactions-release-profile.xml		nbactions-release-profile.xml
nbactions.xml		nbactions.xml
pom.xml		pom.xml
pom.xml.releaseBackup		pom.xml.releaseBackup
release.properties		release.properties

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.idea

.idea

src

src

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

htools.iml

htools.iml

nb-configuration.xml

nb-configuration.xml

nbactions-createrelease.xml

nbactions-createrelease.xml

nbactions-default.xml

nbactions-default.xml

nbactions-release-profile.xml

nbactions-release-profile.xml

nbactions.xml

nbactions.xml

pom.xml

pom.xml

pom.xml.releaseBackup

pom.xml.releaseBackup

release.properties

release.properties

Repository files navigation

HTools

About

Releases

Packages

Contributors 2

Languages

License

htools/htools

Folders and files

Latest commit

History

Repository files navigation

HTools

About

Resources

License

Stars

Watchers

Forks

Languages