Apache Nutch

What Does Apache Nutch Mean?

Apache Nutch is an open-source web crawler that can be used to aggregate data from the web. It is often used alongside other Apache tools, such as Hadoop, to process and analyze the collected data.


Techopedia Explains Apache Nutch

Apache Nutch is an open-source project developed under the Apache Software Foundation and released under the Apache License. The foundation hosts a range of open-source software projects, including tools that can store, sort and analyze data. One of its best-known technologies is Apache Hadoop, a framework for distributed storage and processing of large data sets that is very popular in the business community.

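Within this ecosystem, tools like Apache Hadoop handle file storage and large-scale processing, while the role of Nutch is to collect and store data from the web: a crawler starts from a list of seed URLs, fetches each page, extracts the links it contains, and queues those links to be fetched in turn.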
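As an illustration of what a web crawling algorithm does in general, here is a minimal breadth-first sketch in Python. It is not Nutch's implementation; the function names, the `TOY_WEB` link graph and the `example.com` URLs are all made up for this example, and a real crawler would fetch pages over the network and respect robots.txt and politeness limits.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB: Dict[str, List[str]] = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
    "http://example.com/b": [],
}


def toy_fetch_links(url: str) -> List[str]:
    """Stand-in for fetching a page and extracting its outgoing links."""
    return TOY_WEB.get(url, [])


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], List[str]],
    max_pages: int = 100,
) -> List[str]:
    """Breadth-first crawl: visit each URL once, queueing newly found links."""
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(queue)
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    for page in crawl(["http://example.com/"], toy_fetch_links):
        print(page)
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, and `max_pages` bounds the work done in a single run.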

Users can run simple commands in Apache Nutch to crawl a list of URLs and collect the content found there. Users typically pair Apache Nutch with another open-source tool, a search platform called Apache Solr, which indexes the data collected with Apache Nutch and makes it searchable.
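Once crawled pages have been indexed, they can be searched through Solr's HTTP `select` endpoint. The sketch below only builds such a query URL. The host, port, the core name `nutch` and the field name `content` are assumptions for illustration and depend on how a given Solr instance is actually configured.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's select handler; does not send any request."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


if __name__ == "__main__":
    # Assumed local Solr instance with a core named "nutch" (hypothetical setup).
    print(build_solr_query_url("http://localhost:8983", "nutch", "content:hadoop", rows=5))
```

The resulting URL can be opened in a browser or passed to any HTTP client to retrieve matching documents as JSON.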


Margaret Rouse

Margaret Rouse is an award-winning technical writer and teacher known for her ability to explain complex technical subjects to a non-technical, business audience. Over the past twenty years her explanations have appeared on TechTarget websites and she's been cited as an authority in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine and Discovery Magazine. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other’s highly specialized languages. If you have a suggestion for a new definition or how to improve a technical explanation, please email Margaret or contact her…