Introduction

The primary function of NetarchiveSuite is to plan, schedule and archive web harvests of parts of the Internet using Heritrix, the open source web crawler from the Internet Archive.

NetarchiveSuite is a complete web archiving software package, developed from 2004 onwards. It scales to a wide range of tasks, from small thematic harvests (e.g. related to special events or special domains) to harvesting and archiving the content of an entire national domain. The software has built-in bit preservation functionality. The system architecture allows the software to be distributed across several machines, possibly in more than one geographical location.
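The README does not describe how the built-in bit preservation works internally. As a rough illustration of the general idea behind bit preservation across replicas on different machines (comparing checksums of the same archived file in every replica), here is a minimal Java sketch. The class name `ReplicaChecksumCheck`, its methods, the choice of MD5 and the map-based input are all hypothetical and are not the NetarchiveSuite API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a replica consistency check: each replica reports a
 * checksum per archived file, and files whose checksums disagree (or that are
 * missing from some replica) are flagged as candidates for repair.
 */
public class ReplicaChecksumCheck {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    public static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b)); // byte formatted as unsigned hex
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Takes replica name -> (file name -> checksum) and returns the sorted set
     * of file names that are missing from at least one replica or whose
     * checksums differ between replicas.
     */
    public static Set<String> findInconsistentFiles(Map<String, Map<String, String>> replicas) {
        Set<String> allFiles = new TreeSet<>();
        for (Map<String, String> listing : replicas.values()) {
            allFiles.addAll(listing.keySet());
        }
        Set<String> inconsistent = new TreeSet<>();
        for (String file : allFiles) {
            Set<String> distinctChecksums = new TreeSet<>();
            boolean missing = false;
            for (Map<String, String> listing : replicas.values()) {
                String checksum = listing.get(file);
                if (checksum == null) {
                    missing = true;
                } else {
                    distinctChecksums.add(checksum);
                }
            }
            if (missing || distinctChecksums.size() > 1) {
                inconsistent.add(file);
            }
        }
        return inconsistent;
    }

    public static void main(String[] args) {
        String good = md5Hex("example record".getBytes(StandardCharsets.UTF_8));
        String bad = md5Hex("corrupted record".getBytes(StandardCharsets.UTF_8));
        Map<String, Map<String, String>> replicas = new TreeMap<>();
        replicas.put("replicaA", Map.of("file-a.warc", good, "file-b.warc", good));
        replicas.put("replicaB", Map.of("file-a.warc", good, "file-b.warc", bad));
        replicas.put("replicaC", Map.of("file-a.warc", good));
        System.out.println(findInconsistentFiles(replicas)); // prints [file-b.warc]
    }
}
```

Collecting every file name first means a file that exists in only some replicas is still reported, not only files with conflicting checksums.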

See the wiki documentation for further details.

History

NetarchiveSuite was initially developed jointly by The Royal Danish Library (KB) and The Danish State and University Library (SB) as part of the national Netarkivet.dk initiative. The first release of the platform was in July 2005, and Netarkivet.dk has since used NetarchiveSuite to harvest Danish websites as required by the latest Danish Legal Deposit Act. As of January 1st, 2017, The Danish State and University Library is part of The Royal Danish Library. NetarchiveSuite was released as Open Source under the LGPL license in July 2007. The French National Library (BnF) and the Austrian National Library (ONB) joined the project in 2008.

Getting Started

See the Quickstart manual for a short guide on how to set up a small standalone system.

Developer documentation

See the developer wiki.

Repository contents:
.run
archive
build-tools
common
deploy
hadoop-uber-jar-invoker
hadoop-uber-jar
harvester
integration-test
monitor
quickstart-vagrant-environment
quickstart
wayback
.gitignore
.travis.yml
LICENSE.txt
Narcana.md
README.md
pom.xml
precommit.sh
reset_gitignore_files.sh
