[CCP14 Home: (Frames | No Frames)]
CCP14 Mirrors: [UK] | [CA] | [US] | [AU]

(This Webpage Page in No Frames Mode)

Collaborative Computational Project Number 14

for Single Crystal and Powder Diffraction

CCP14

Search Engines giving bad links into the CCP14 site (27th March 2001)

(Bad Bots, Bad Bots - what ya gonna do? What ya gonna do when they come for you?!)

The CCP14 Homepage is at http://www.ccp14.ac.uk

[Back to CCP14 Web/Config Main Page] | [Back to CCP14 Web/Config Misc Things]

Summary: According to user feedback, some web spiders that trawl websites (e.g., AltaVista) seem to mishandle HTTP error documents, so they keep listing pages that no longer exist.

Solution: Try using Google Search, which seems to be the most 'with it' search engine at the moment.

27th March 2001

There have been complaints that some search engines (such as AltaVista) have been linking to non-existent pages on the CCP14 site (mirrored webpages and other material that have changed over time). As a result, people using these search engines get masses of bad-link pages.

e.g.,

http://www.ccp14.ac.uk/ccp/ccp14/ftp-mirror/ghostscript/ghost/aladdin/lpng105.zip
or
http://www.ccp14.ac.uk/ccp/web-mirrors/ccsl/dif/mansub/gmsame.htm

This seems to be due either to broken web spider software used by AltaVista, or to AltaVista not rechecking its links as often as rival search engine companies. It is presently outside the control of the CCP14 site administration.
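To illustrate why the recheck interval matters, here is a minimal Python sketch of how a spider might prune its index on a re-crawl. It assumes a hypothetical index (a dict mapping each URL to a cached page title) and a caller-supplied `fetch_status` function that re-requests a URL and returns its HTTP status code; the names are illustrative only and this is not AltaVista's actual software.

```python
from typing import Callable, Dict


def prune_index(index: Dict[str, str],
                fetch_status: Callable[[str], int]) -> Dict[str, str]:
    """Return a copy of a (hypothetical) search index minus URLs now returning 404.

    index        -- maps each indexed URL to its cached page title
    fetch_status -- re-requests a URL and returns the HTTP status code
    """
    kept = {}
    for url, title in index.items():
        # Only a fresh re-request reveals that a page has gone; until the
        # spider recrawls, the stale entry stays in the index.
        if fetch_status(url) != 404:
            kept[url] = title
    return kept
```

A dead link only disappears once the spider actually re-requests it and sees the 404, so a spider that rechecks rarely (or ignores the status code) keeps handing out stale links.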

According to the AltaVista information at http://web.altavista.com/cgi-bin/query?pg=addurl: "URLs that return a 404 Error code will be removed from the index."

This is exactly what the Apache webserver on the CCP14 site is configured to return for bad page requests, as per the following line from the configuration file:

ErrorDocument 404 /bad-link.html
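The point of this directive is that the friendly bad-link page is still sent with a 404 status code, so a well-behaved spider can tell the page is gone. The following Python sketch simulates that behaviour with the standard library's `http.server` rather than Apache; the handler name, page contents and paths are hypothetical, and only `/index.html` exists on the simulated site. `fetch_status` reports the status code a spider would see.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site contents: only /index.html exists.
PAGES = {"/index.html": b"<html><body>CCP14 test page</body></html>"}
BAD_LINK_PAGE = b"<html><body>Bad link: try trimming the URL.</body></html>"


class BadLinkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            # Mimics "ErrorDocument 404 /bad-link.html": a friendly body,
            # but still a 404 status line so spiders know the page is gone.
            self.send_response(404)
            body = BAD_LINK_PAGE
        else:
            self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


# Bypass any proxy settings from the environment so localhost is hit directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url: str) -> int:
    """Return the HTTP status code a client (or spider) receives for url."""
    try:
        with _OPENER.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        err.close()
        return err.code


def demo():
    server = HTTPServer(("127.0.0.1", 0), BadLinkHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        return (fetch_status(base + "/index.html"),
                fetch_status(base + "/ccp/ccp14/missing.htm"))
    finally:
        server.shutdown()
        server.server_close()
```

`demo()` should return `(200, 404)`: the existing page is served normally, while the missing one gets the bad-link body with a 404 header. A server that instead answered 200 for missing pages would give spiders no reason to drop them.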


Apache Webserver Configuration at CCP14

The Apache webserver is configured to deliberately output a bad-link document (with the appropriate HTTP error status, such as 404, in the response header) while retaining the malfunctioning URL in the browser's address bar, in case someone has typed something in incorrectly. This makes the address easy to fix, rather than requiring a complete retype, as would be needed if the browser were sent to a separate page such as http://www.ccp14.ac.uk/bad-link.html

E.g., if a page is broken:
http://www.ccp14.ac.uk/ccp/ccp14/ftp-mirror/ghostscript/ghost/aladdin/lpng105.zip

You could then simply delete lpng105.zip from the end of the address and browse around the parent directory to re-find things.
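The same trimming can be done programmatically. This is a small Python sketch using the standard `urllib.parse` module; `parent_url` is a hypothetical helper name, and any query string or fragment is simply dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_url(url: str) -> str:
    """Drop the last path segment (e.g. a missing file name) from a URL."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    # Strip a trailing slash first so ".../mansub/" goes up to ".../dif/".
    trimmed = path.rstrip("/")
    # Keep everything up to and including the last remaining "/";
    # fall back to the site root if nothing is left.
    parent = trimmed[: trimmed.rfind("/") + 1] or "/"
    return urlunsplit((scheme, netloc, parent, "", ""))
```

For example, `parent_url("http://www.ccp14.ac.uk/ccp/ccp14/ftp-mirror/ghostscript/ghost/aladdin/lpng105.zip")` gives `http://www.ccp14.ac.uk/ccp/ccp14/ftp-mirror/ghostscript/ghost/aladdin/`.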

Apache is also set up to do simple correction of errors in case (HTM instead of htm) and of minor spelling errors (htm instead of html).
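The page does not say which Apache feature is used; Apache's mod_speling module (enabled with `CheckSpelling On`) is the usual way to get this behaviour. The Python sketch below only illustrates the idea, under the assumption that a request is corrected when it differs from an existing file name by case alone or by a single character slip (insertion, omission, wrong character or swapped neighbours). It is not mod_speling's actual code, and `suggest` and `within_one_edit` are hypothetical names.

```python
from typing import Iterable, List


def within_one_edit(a: str, b: str) -> bool:
    """True if b differs from a by one insertion, deletion, substitution or adjacent swap."""
    if a == b:
        return True
    la, lb = len(a), len(b)
    if abs(la - lb) > 1:
        return False
    if la == lb:
        diffs = [i for i in range(la) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one wrong character
        # two neighbouring characters swapped, e.g. "htlm" vs "html"
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])
    # Lengths differ by one: skip one character in the longer string.
    short, long_ = (a, b) if la < lb else (b, a)
    i = 0
    while i < len(short) and short[i] == long_[i]:
        i += 1
    return short[i:] == long_[i + 1:]


def suggest(requested: str, available: Iterable[str]) -> List[str]:
    """Return existing names that plausibly match a mistyped request."""
    names = list(available)
    # 1. Case-only mismatch, e.g. "INDEX.HTML" vs "index.html".
    case_hits = [n for n in names if n.lower() == requested.lower()]
    if case_hits:
        return case_hits
    # 2. Single-character slip, e.g. "index.htm" vs "index.html".
    return [n for n in names if within_one_edit(requested.lower(), n.lower())]
```

For instance, `suggest("index.htm", ["index.html", "bad-link.html"])` returns `["index.html"]`. Checking case first means an exact case-insensitive match is preferred over a looser spelling match.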


Possible Solutions

Resubmitting the CCP14 site is presently problematic. There is much corporate gloss on the AltaVista site (combined with encouragement to pay for URL submissions), and the free submission route (http://doc.altavista.com/addurl/) is itself problematic.

An e-mail has been sent to AltaVista describing this problem.

In the meantime, try using Google Search as your default search engine (27th March 2001).


If you have any queries or comments, please feel free to contact the CCP14