<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Crazy Mind &#187; Internet security</title>
	<atom:link href="http://www.lunaticmarks.com/category/internet-security/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lunaticmarks.com</link>
	<description>A Webmaster Blog</description>
	<lastBuildDate>Mon, 08 Oct 2007 10:00:15 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Scrapers exploit the sitemap.xml and make easy money</title>
		<link>http://www.lunaticmarks.com/scrapers-exploit-the-sitemapxml-and-make-easy-money/</link>
		<comments>http://www.lunaticmarks.com/scrapers-exploit-the-sitemapxml-and-make-easy-money/#comments</comments>
		<pubDate>Mon, 07 May 2007 06:04:48 +0000</pubDate>
		<dc:creator>Ravi</dc:creator>
				<category><![CDATA[Internet security]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Scraper sites]]></category>
		<category><![CDATA[cloaking]]></category>
		<category><![CDATA[keywords]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[search engine spiders]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[website security]]></category>

		<guid isPermaLink="false">http://www.lunaticmarks.com/?p=118</guid>
		<description><![CDATA[What gets scrapers to target your website in the first place?
1. If your website is popular in its niche and gets a lot of search engine traffic, its URLs are crawled frequently, which makes it easy for scrapers to steal the content and make [...]]]></description>
			<content:encoded><![CDATA[<p><strong>What gets scrapers to target your website in the first place?</strong></p>
<p>1. If your website is popular in its niche and gets a lot of search engine traffic, its URLs are crawled frequently, which makes it easy for scrapers to steal your content and republish it on <strong>Made for AdSense (MFA)</strong> sites.</p>
<p>2. Some scrapers are marketing analytics services that gather data about you and your company and sell it to advertisers for profit. Their marketing strategies involve continuous observation of the following factors:</p>
<ul>
<li>Charting your Internet mind share and buzz index; sites like compete.com, quantcast.com or spyfu.com give good information about your websites</li>
<li>Tracking online opinion and issues</li>
<li>Listening in on word of mouth</li>
<li>Customer-generated media: blogs, consumer portals, special interest sites, political cause networks, online news services, and archives</li>
</ul>
<p>Recently, many people have been posting about content problems related to sitemap.xml. In the sitemap you list the title, description and URL of each page on your website.</p>
<blockquote><p>Are the title and meta tags of new content scraped before sitemap generators submit the sitemap to Google? The answer is <strong>YES</strong>.</p></blockquote>
<p>The sitemap.xml file hands a list of your website&#8217;s URLs directly to any scraper who wants to use it for cloaking.</p>
<blockquote><p><strong>Cloaking is primarily used to show an optimized page to search engines and a different page to humans.</strong></p></blockquote>
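<p>To see how little work a sitemap leaves for a scraper, here is a minimal Python sketch that pulls every URL out of one. It assumes the standard sitemaps.org layout, where each url entry carries a loc element. The sample document, the example.com addresses and the function name list_sitemap_urls are made up for illustration.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content; the example.com URLs are placeholders.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/first-post/</loc></url>
</urlset>"""

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text):
    """Return every page URL listed in a sitemap document, in file order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


if __name__ == "__main__":
    for url in list_sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

<p>A few lines are enough to turn the file into a complete crawl list, which is why a predictable sitemap location is so convenient for anyone who wants to copy a site.</p>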
<p>Excessively scraped sites can struggle in the SERPs. When someone mirrors your content, your page or site can get hit with a <strong>duplicate content penalty</strong>.</p>
<p><strong>Some ideas to make it hard for scrapers</strong></p>
<ul>
<li>Stop referencing the sitemap in robots.txt. Instead, submit sitemaps via ping to every search engine that uses them, and give the file a randomly generated name each time a sitemap is created.</li>
<li>Search engines could offer a separate tool that generates the .xml sitemap. Since these files are only for search engine use, I see no reason why the file name could not be randomly generated, and the tool could also delete the previous sitemap file.</li>
<li>A safe sitemap generator is better in many ways than a free sitemap generator that might send information to scraper sites without your knowledge. I would trust one from the search engines.</li>
</ul>
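<p>The random file name idea can be sketched in Python roughly as follows. This is only an illustration under my own assumptions, not an existing tool: the function name write_random_sitemap and the "sitemap-" prefix are hypothetical, and pinging the search engines with the new file name is left out.</p>

```python
import secrets
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def write_random_sitemap(urls, directory, prefix="sitemap-"):
    """Write a sitemap under a fresh random name and delete older ones.

    The prefix and naming scheme are assumptions made for this sketch.
    """
    directory = Path(directory)

    # Remove sitemaps left behind by earlier runs.
    for old in directory.glob(prefix + "*.xml"):
        old.unlink()

    # Build the urlset document with one url/loc pair per page.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url

    # 8 random bytes give a 16-character hex name that is hard to guess.
    path = directory / (prefix + secrets.token_hex(8) + ".xml")
    ET.ElementTree(urlset).write(str(path), encoding="utf-8", xml_declaration=True)
    return path
```

<p>Calling write_random_sitemap(["http://www.example.com/"], "/path/to/webroot") returns the path of the new file, which you would then submit to the search engines yourself. The name is only as secret as the places it is published, so it must stay out of robots.txt and out of any public link.</p>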
<p>But&#8230;</p>
<p>Any time you give scrapers a clear path around honey pots and spider traps, they&#8217;ll use it. That said, scrapers can simply scrape a search engine first, using site:mydomain.com, to get the equivalent of a sitemap and avoid your spider traps anyway.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lunaticmarks.com/scrapers-exploit-the-sitemapxml-and-make-easy-money/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
