chandana1332/News-Indexer

News-Indexer

NewsIndexer is a document retrieval system. A retrieval system has two main tasks: finding the documents relevant to a user query, and ranking the matching results by relevance. This project parses simple news articles and indexes a reasonably sized subset of the given news corpus. Indexing is a way of storing data so that information can be retrieved quickly and accurately.

The system consists of two main components: a parser and an indexer. The Parser converts a given text file into a Document representation, where a Document is simply a collection of fields. Once a file has been converted into a Document, the IndexWriter writes its fields to the corresponding indexes. There are four kinds of index: Term index, Place index, Author index and Category index.

The project also provides an index introspection mechanism that can later be built upon to support queries.
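The parser-to-indexer flow described above can be sketched in a few lines. This is only an illustrative sketch, not the project's actual code: the `Key: value` header layout, the `parse` function, the `lookup` and `key_count` methods, and the whitespace tokenizer are all assumptions made for the example. Only the names `Document` and `IndexWriter` and the four index kinds come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """A Document is simply a collection of named fields."""
    doc_id: str
    fields: dict = field(default_factory=dict)


def parse(doc_id, text):
    """Convert raw text into a Document.

    Assumed (hypothetical) layout: 'Key: value' header lines,
    a blank line, then the article body.
    """
    header, _, body = text.partition("\n\n")
    doc = Document(doc_id)
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            doc.fields[key.strip().lower()] = value.strip()
    doc.fields["content"] = body.strip()
    return doc


class IndexWriter:
    """Writes each field of a Document to its corresponding index."""

    KINDS = ("term", "place", "author", "category")

    def __init__(self):
        # One inverted index per kind: key -> set of document ids.
        self.indexes = {kind: defaultdict(set) for kind in self.KINDS}

    def add_document(self, doc):
        # Single-valued fields go to their own index, keyed by lowercase value.
        for kind in ("place", "author", "category"):
            value = doc.fields.get(kind)
            if value:
                self.indexes[kind][value.lower()].add(doc.doc_id)
        # Naive tokenization (an assumption): split on whitespace,
        # lowercase, and trim surrounding punctuation.
        for raw in doc.fields.get("content", "").lower().split():
            token = raw.strip(".,;:!?\"'()")
            if token:
                self.indexes["term"][token].add(doc.doc_id)

    def lookup(self, kind, key):
        """Return the sorted ids of documents containing the key."""
        return sorted(self.indexes[kind].get(key.lower(), set()))

    def key_count(self, kind):
        """Simple introspection: number of distinct keys in an index."""
        return len(self.indexes[kind])
```

Under these assumptions, indexing an article is `writer.add_document(parse("d1", text))`, after which `writer.lookup("author", "Jane Doe")` returns the ids of the documents by that author. Keeping one dictionary per index kind means a field-specific lookup never has to scan unrelated terms.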
