The googlebot monopoly

1 · Dan Luu · May 27, 2015, midnight
TIL that Bell Labs and a whole lot of other websites block archive.org, not to mention most search engines. Turns out I have a broken website link in a GitHub repo, caused by the deletion of an old webpage. When I tried to pull the original from archive.org, I found that it's not available because Bell Labs blocks the archive.org crawler in their robots.txt:

User-agent: Googlebot
User-agent: msnbot
User-agent: LSgsa-crawler
Disallow: /RealAudio/
Disallow: /bl-traces/
Disallow: /fast-os/
Disallo...
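
To make the effect of a file like this concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It is an illustration under assumptions, not a copy of the Bell Labs file: only the three `User-agent` lines and the three complete `Disallow` paths come from the excerpt above. The closing catch-all group (`User-agent: *` with `Disallow: /`), the `ia_archiver` agent name used for the archive.org crawler, the `example.invalid` host, and the helper names `build_parser` and `allowed` are all assumptions introduced for the example.

```python
from urllib.robotparser import RobotFileParser

# First group: taken from the excerpt above (complete lines only).
# Second group: ASSUMED for illustration. The excerpt is cut off before
# showing the rest of the file, so this catch-all is a guess at the kind
# of rule that would shut out every crawler not named in the first group.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: msnbot
User-agent: LSgsa-crawler
Disallow: /RealAudio/
Disallow: /bl-traces/
Disallow: /fast-os/

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, agent, path):
    """Return True if `agent` may fetch `path` on a hypothetical host."""
    return parser.can_fetch(agent, "http://example.invalid" + path)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "/index.html"),
        ("Googlebot", "/RealAudio/clip.ra"),
        ("ia_archiver", "/index.html"),  # assumed archive.org agent name
    ]
    for agent, path in checks:
        print(agent, path, allowed(rules, agent, path))
```

Under these assumptions, the named crawlers can fetch everything outside the listed directories, while any crawler that is not named falls through to the catch-all group and is refused everywhere. That is the mechanism by which a whitelist-style robots.txt can keep a page out of an archive even though it remains visible to a few large search engines.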