What Does Content Scraping Mean?
Content scraping is the practice, often unlawful, of copying original content from a legitimate website and posting it on another site without the knowledge or permission of the content’s owner. Content scrapers often attempt to pass off the copied content as their own and fail to provide attribution to the content’s owners.
Content scraping can be done by manual copy and paste, or with more sophisticated techniques such as dedicated scraping software, scripted HTTP requests, or HTML and DOM parsers.
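To make the parser-based technique concrete, the following is a minimal sketch of what an HTML parser does with a page: it walks the markup and keeps only the text it is told to look for. It uses Python's standard `html.parser` module; the class name `ParagraphExtractor` and the helper `extract_paragraphs` are illustrative names, not part of any particular scraping tool, and the input is a hard-coded string rather than a fetched page.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements of an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False  # True while the parser is inside a <p>
        self._buffer = []           # text fragments of the current paragraph
        self.paragraphs = []        # finished paragraphs, in document order

    def _flush(self):
        # Join the fragments and collapse runs of whitespace into one space.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_paragraph:
                self._flush()  # an unclosed <p> ends where the next one starts
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the text of every <p> element in the given HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>First <b>bold</b> paragraph.</p><p>Second one.</p>"
    for paragraph in extract_paragraphs(sample):
        print(paragraph)
```

The same few lines that extract article text here are why markup alone offers little protection: anything a browser can render, a parser can read.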
Much of the content that falls prey to scraping is copyrighted material; reposting it without the copyright owner’s permission generally constitutes copyright infringement. However, scraper sites are hosted all over the world, and scrapers who are asked to remove copyrighted content may simply switch domains or disappear.
Techopedia Explains Content Scraping
Content scrapers are able to drive traffic to their websites by scraping high-quality, keyword-dense content from other sites. Bloggers are particularly susceptible to this, probably because individual bloggers are unlikely to take legal action against scrapers. Scrapers are encouraged to continue this practice because search engines have not yet found an effective way to distinguish original content from scraped copies, so the copies can still rank and attract visitors.
Website administrators can protect themselves against scraping through simple measures, such as adding links to their own site within the content. A scraped copy that keeps those links will at least send some traffic back to the original site. More sophisticated methods of dealing with scraping by bots include:
- Commercial anti-bot applications
- Catching bots with a honeypot and blocking their IP addresses
- Blocking bots with JavaScript-based checks, which many simple bots cannot execute
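The honeypot approach in the list above can be sketched in a few lines. The idea is to publish a trap URL that no human visitor would request (for example, one reached only through a hidden link and disallowed in robots.txt so that well-behaved crawlers skip it), then refuse further requests from any IP address that fetches it. This is a minimal in-memory sketch under those assumptions; the class name `HoneypotBlocklist` and the path `/trap-do-not-follow` are hypothetical, and a real deployment would also need persistence, expiry of old entries and handling of shared IP addresses.

```python
class HoneypotBlocklist:
    """Tracks client IPs that request a trap URL and refuses them afterwards."""

    def __init__(self, trap_paths):
        # Hypothetical trap URLs that no human visitor should ever request.
        self.trap_paths = set(trap_paths)
        self.blocked_ips = set()

    def allow(self, ip, path):
        """Return True if the request should be served, False if it is blocked."""
        if ip in self.blocked_ips:
            return False              # this client was caught earlier
        if path in self.trap_paths:
            self.blocked_ips.add(ip)  # client followed the trap link
            return False
        return True


if __name__ == "__main__":
    blocklist = HoneypotBlocklist(["/trap-do-not-follow"])
    requests = [
        ("203.0.113.7", "/articles/1"),
        ("203.0.113.7", "/trap-do-not-follow"),
        ("203.0.113.7", "/articles/2"),
        ("198.51.100.4", "/articles/2"),
    ]
    for ip, path in requests:
        verdict = "served" if blocklist.allow(ip, path) else "blocked"
        print(ip, path, verdict)
```

In a web application, `allow` would typically be called once per incoming request, before any content is rendered, with a blocked request answered by an error status instead of the page.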