techno power...: September 2010

Monday, September 13, 2010

Some more to know about Google...

Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site.

Googlebot was designed to be distributed on several machines to improve performance and scale as the web grows. Also, to cut down on bandwidth usage, Google runs many crawlers on machines located near the sites they're indexing in the network.

Once you've created your robots.txt file, there may be a small delay before Googlebot discovers your changes. If Googlebot is still crawling content you've blocked in robots.txt, check that the robots.txt is in the correct location. It must be in the top directory of the server (e.g., www.myhost.com/robots.txt); placing the file in a subdirectory won't have any effect.

If you want to prevent Googlebot from following any links on a page of your site, you can use the nofollow meta tag. To prevent Googlebot from following an individual link, add the rel="nofollow" attribute to the link itself.
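To make the two forms concrete, here is a small sketch in Python that scans a page for the page-level robots meta tag and for link-level rel="nofollow" attributes. The sample HTML strings, the class name NofollowScanner, and the helper scan() are made up for illustration; this is only a rough check using the standard library, not how Googlebot itself parses pages.

```python
from html.parser import HTMLParser

# Hypothetical sample pages, for illustration only.
PAGE_LEVEL = (
    '<html><head><meta name="robots" content="nofollow"></head>'
    '<body><a href="/a.html">A</a></body></html>'
)
LINK_LEVEL = (
    '<a href="/a.html">A</a> '
    '<a href="http://example.com/" rel="nofollow">B</a>'
)


class NofollowScanner(HTMLParser):
    """Collects page-level and link-level nofollow hints."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False   # True if <meta name="robots"> lists nofollow
        self.nofollow_links = []     # hrefs of <a> tags carrying rel="nofollow"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "nofollow" in directives:
                self.page_nofollow = True
        elif tag == "a":
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel_values:
                self.nofollow_links.append(attrs.get("href"))


def scan(html):
    """Return (page_nofollow, nofollow_links) for an HTML string."""
    scanner = NofollowScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.page_nofollow, scanner.nofollow_links


if __name__ == "__main__":
    print(scan(PAGE_LEVEL))
    print(scan(LINK_LEVEL))
```

The first sample uses the meta tag, which covers every link on the page; the second marks only one link.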

Test that your robots.txt is working as expected. The Test robots.txt tool in Webmaster Tools lets you see exactly how Googlebot will interpret the contents of your robots.txt file. The Google user-agent is (appropriately enough) Googlebot.
The Fetch as Googlebot tool in Webmaster Tools helps you understand exactly how your site appears to Googlebot. This can be very useful when troubleshooting problems with your site's content or discoverability in search results.
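If you want a quick local check of a robots.txt before (or alongside) the Webmaster Tools test, Python's standard urllib.robotparser module can evaluate rules for a given user-agent. This is only a sketch: the robots.txt contents and paths below are invented, the helper is_allowed() is hypothetical, and the standard-library parser may not match Googlebot's interpretation in every detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://www.myhost.com/private/page.html"),
        ("Googlebot", "http://www.myhost.com/index.html"),
        ("OtherBot", "http://www.myhost.com/tmp/x"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that a crawler with its own User-agent group follows that group rather than the catch-all * group, so in this sample Googlebot is only kept out of /private/.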

The IP addresses used by Googlebot change from time to time. The best way to identify accesses by Googlebot is to use the user-agent (Googlebot). You can verify that a bot accessing your server really is Googlebot by using a reverse DNS lookup.
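A reverse DNS check can be sketched in Python roughly as follows. The post only mentions the reverse lookup; the extra forward lookup (confirming that the returned hostname resolves back to the same IP) and the accepted domain suffixes googlebot.com and google.com are assumptions added here, and the function name is_googlebot() is hypothetical. The lookup functions are passed in as parameters so the logic can be tried without touching the network.

```python
import socket


def is_googlebot(ip,
                 reverse_lookup=socket.gethostbyaddr,
                 forward_lookup=socket.gethostbyname_ex):
    """Rough check that an IP address belongs to Googlebot.

    Step 1: reverse DNS lookup on the IP to get a hostname.
    Step 2: require the hostname to end in an expected domain (assumed list).
    Step 3: forward lookup on that hostname and confirm it maps back to the IP.
    """
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        # No reverse DNS record, or the lookup failed.
        return False

    # Assumed suffixes; adjust to whatever the crawler operator documents.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        # gethostbyname_ex is IPv4-only; IPv6 would need socket.getaddrinfo.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    return ip in addresses


if __name__ == "__main__":
    # Example address only; the result depends on live DNS.
    print(is_googlebot("66.249.66.1"))
```

Checking the user-agent string alone is easy to fake, which is why the DNS round trip is worth the extra lookups.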

More to know: In search engine optimization (SEO) terminology, a backlink is a hyperlink on another web page that links back to your own web page or website. Also called an inbound link (IBL), these links are important in determining the popularity (or importance) of your website. Some search engines, including Google, will consider websites with more backlinks more relevant in search results pages. The term may also be written as two separate words: back link.

Bypass techniques

BYPASS FILTERS....

1: Typing the IP address instead of the domain name
Check out the site baremetal.com, where you can look up the IP address of just about any site.
For the security department, a better approach is to ignore the IP/URL altogether and examine the data on the web page itself. This is a little more resource intensive, but far more effective. It's also much more accurate, since a website such as Google or Yahoo can call data from other sites.
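Here is a very rough sketch in Python of the difference between looking at the address and looking at the content. The blocked-term list and both function names are hypothetical, and a real filter would use far more than simple keyword matching; the point is only that the content check gives the same answer whether the page was reached by domain name or by IP address.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Hypothetical policy list, for illustration only.
BLOCKED_TERMS = {"casino", "poker"}


def host_is_ip_literal(url):
    """Return True if the URL's host is a raw IP address, not a domain name.

    The URL needs a scheme (e.g. http://) for the host to be parsed.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def content_is_blocked(page_text, blocked_terms=BLOCKED_TERMS):
    """Return True if the page text contains any blocked term.

    The URL is deliberately not an input: the decision rests on content only.
    """
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    return not words.isdisjoint(blocked_terms)


if __name__ == "__main__":
    print(host_is_ip_literal("http://93.184.216.34/index.html"))
    print(content_is_blocked("Play online POKER tonight"))
```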

2: Finding a cached version
Search providers like Google cache websites on a regular basis, which basically means that they save a version of the site on their own servers. You can navigate to a cached page in Google by clicking the 'cached' link after the search result; you are then still at an address run by Google, which may be unblocked. The strategy for the security department here is the same as with IP addresses: disregard the URL and inspect the content itself.

3: Hiding behind encryption
Entering HTTPS in front of the web address will often give you a stripped-down version of the restricted site and can be used as another technique to gain verboten access.
There is also SSH, encrypted SOCKS, and other alternative channels that masquerade as web traffic on not-so-intelligent network devices.
To counter this, many companies are now opting to implement web proxies and gateways that allow this type of content to be analyzed by creating a pit stop along the way.

4: Using proxy servers and other privacy-friendly tools
Employees can set up their browser so that their web queries go through an encrypted tunnel to an external server, which may give them unrestricted online access. GhostFox, a Firefox browser extension, has a privacy bar just below the URL bar where users can select a proxy that is privacy friendly.
If the proxy server is unencrypted, then you can inspect the traffic and block it, either by blocking proxy connections at your firewall or by looking at web page content, or both.
There may be ways to fingerprint Tor with something like an Intrusion Detection System.

5: Using smartphones
"Devices such as BlackBerrys that are owned and managed by the business can be restricted through group policies and proxy servers, much the same way that laptops and desktops are."