forked from internetarchive/tnh

(T)he (N)ew (H)otness. Improved full-text search of archival web data.

The New Hotness
2010-06-16

The New Hotness (TNH) is a (near) drop-in replacement for
NutchWAX-based search services.  It is primarily intended to be used
internally at Internet Archive, but may be of use/interest to other
NutchWAX users.

TNH started as an experiment to prototype a Lucene TopDocCollector
that collapses results based on the 'site' field as the documents are
scored, rather than collapsing after the results are collected.  The
result is the CollapsingCollector class.
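
The idea of collapsing while collecting can be illustrated with a
minimal sketch.  This is not the CollapsingCollector source and it
does not use the Lucene API; the class name CollapseSketch, the Hit
type, and the hitsPerSite parameter are all hypothetical, chosen only
to show the per-site limit being enforced as each scored hit arrives.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class CollapseSketch {
    /** A scored hit; 'site' stands in for the value of the index's 'site' field. */
    static final class Hit {
        final int docId;
        final String site;
        final float score;

        Hit(int docId, String site, float score) {
            this.docId = docId;
            this.site = site;
            this.score = score;
        }
    }

    // Orders hits from lowest to highest score, so a PriorityQueue acts as a min-heap.
    private static final Comparator<Hit> BY_SCORE =
            Comparator.comparingDouble((Hit h) -> h.score);

    private final int hitsPerSite; // hypothetical knob: max hits any one site may keep
    private final Map<String, PriorityQueue<Hit>> bySite = new HashMap<>();

    CollapseSketch(int hitsPerSite) {
        this.hitsPerSite = hitsPerSite;
    }

    /** Called once per scored document; the collapse happens here, not afterwards. */
    void collect(int docId, String site, float score) {
        PriorityQueue<Hit> best =
                bySite.computeIfAbsent(site, s -> new PriorityQueue<>(BY_SCORE));
        best.add(new Hit(docId, site, score));
        if (best.size() > hitsPerSite) {
            best.poll(); // evict this site's lowest-scoring hit
        }
    }

    /** Returns up to n surviving hits, highest score first. */
    List<Hit> topHits(int n) {
        List<Hit> all = new ArrayList<>();
        for (PriorityQueue<Hit> q : bySite.values()) {
            all.addAll(q);
        }
        all.sort(BY_SCORE.reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        CollapseSketch collector = new CollapseSketch(1);
        collector.collect(1, "a.org", 0.90f);
        collector.collect(2, "a.org", 0.80f);
        collector.collect(3, "a.org", 0.70f);
        collector.collect(4, "b.org", 0.85f);
        for (Hit h : collector.topHits(2)) {
            System.out.println(h.docId + " " + h.site + " " + h.score);
        }
    }
}
```

Without collapsing, the top two hits in this example would both come
from a.org.  With one hit allowed per site, the second slot goes to
b.org instead.  The sketch keeps a small heap for every site it sees;
a production collector would presumably also bound the total number of
sites it tracks.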

Once that class was developed, an OpenSearch web service was built, as
well as metasearch across multiple OpenSearch servers.
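
As a rough illustration of the metasearch step, the sketch below
merges result lists that have already been fetched from several
servers into a single ranked list.  It is not TNH code: the class
MetasearchMergeSketch, the Result type, and the server names are
hypothetical, fetching and parsing the OpenSearch responses is left
out, and it assumes that scores from different servers are directly
comparable, which may not hold for independently built indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class MetasearchMergeSketch {
    /** One search result as returned by a (hypothetical) backend server. */
    static final class Result {
        final String server;
        final String url;
        final float score;

        Result(String server, String url, float score) {
            this.server = server;
            this.url = url;
            this.score = score;
        }
    }

    /** Merges per-server result lists and returns the n highest-scoring results. */
    static List<Result> merge(List<List<Result>> perServer, int n) {
        // Max-heap on score: the best remaining result is always at the head.
        PriorityQueue<Result> heap =
                new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
        for (List<Result> results : perServer) {
            heap.addAll(results);
        }
        List<Result> merged = new ArrayList<>();
        while (merged.size() < n && !heap.isEmpty()) {
            merged.add(heap.poll());
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Result> fromA = Arrays.asList(
                new Result("server-a", "http://example.org/1", 0.9f),
                new Result("server-a", "http://example.org/2", 0.4f));
        List<Result> fromB = Arrays.asList(
                new Result("server-b", "http://example.com/1", 0.7f));
        for (Result r : merge(Arrays.asList(fromA, fromB), 2)) {
            System.out.println(r.server + " " + r.url + " " + r.score);
        }
    }
}
```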

The last piece was the ability to read Nutch segments to generate
snippets, which lets TNH use index+segment shards built by NutchWAX.