Curious entries in web logs


Bowtie
03-13-2005, 01:09 AM
I noticed recently that the entire contents of my web server were cached by Yahoo's search engine. I wasn't too thrilled about that, since I had personal photos that I don't feel like sharing with the entire world, so I moved the photos outside of the document root where they wouldn't be visible anymore. I also noticed that some of the images were being linked from various message boards and my bandwidth was getting hammered. I'm going through my Apache logs, and this is just one of many similar entries I came across:

202.157.192.162 - - [12/Mar/2005:02:27:07 -0600] "GET /photo/misc/owned.gif HTTP/1.1" 404 307 "http://www.myspace.com/index.cfm?fuseaction=user.viewProfile&friendID=131527&Mytoken=20050311233225" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;.NET CLR 1.1.4322)"
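
To see how many requests like this come in from other sites, an entry in this format can be picked apart with a short script. This is only a rough sketch: it assumes the log uses Apache's "combined" format (as the entry above appears to), and the names LOG_PATTERN, external_referers, and the own_host value www.example.com are placeholders made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def external_referers(lines, own_host):
    """Count requests per referring host, skipping empty referers and our own site."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # line is not in the assumed format; skip it
        referer_host = urlparse(match.group("referer")).hostname
        if referer_host and referer_host != own_host:
            counts[referer_host] += 1
    return counts


# The entry from the post, used as sample input.
sample = (
    '202.157.192.162 - - [12/Mar/2005:02:27:07 -0600] '
    '"GET /photo/misc/owned.gif HTTP/1.1" 404 307 '
    '"http://www.myspace.com/index.cfm?fuseaction=user.viewProfile'
    '&friendID=131527&Mytoken=20050311233225" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;.NET CLR 1.1.4322)"'
)

# own_host is a placeholder; substitute the real site name.
referer_counts = external_referers([sample], own_host="www.example.com")
print(referer_counts.most_common())
```

Running it over the whole access log instead of the single sample line would show which outside sites are linking to the images most often.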

Is there a way to block stuff like this, and if so, what's A) the best way, or B) the easiest/quickest way? I would imagine hosts.deny and hosts.allow are going to come into play here. Forgive the lengthy post; I'm just trying to throw out as much info as I can. Thanks in advance.

Johnny

madcompnerd
03-13-2005, 12:54 PM
Yes, there is. You can quickly put password protection on it by inserting the right directives into a .htaccess file; you can read about this on Apache's website.
You can keep well-behaved search engines from indexing publicly available things by creating a file called "robots.txt" at the top level of your site (crawlers look for it there, not in each subdirectory) and putting the right rules in it (once again, I think Apache's site may have good info on how to do that; slashdot.org has a pretty extensive robots.txt to use as an example).
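
As a rough illustration of what "the right rules" might look like, the sketch below holds a two-line robots.txt in a string and checks it with Python's standard urllib.robotparser module. The /photo/ path is taken from the log entry in the first post; the crawler name "Slurp" (Yahoo's crawler) and the rules themselves are assumptions for the example, not a recommendation for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask every crawler to stay out of /photo/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /photo/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "Slurp" is used here as an example crawler name.
photo_allowed = parser.can_fetch("Slurp", "/photo/misc/owned.gif")
index_allowed = parser.can_fetch("Slurp", "/index.html")

print("photo allowed:", photo_allowed)
print("index allowed:", index_allowed)
```

Keep in mind that robots.txt is only a request that crawlers choose to honor; it does nothing about people linking the images from message boards.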