The Isearch Indexing and Search Engine is free and available for UNIX; plus:
The Isearch ftp download area is at ftp://ftp.cnidr.org/pub/software/Isearch/.
Also be aware that there appear to be other distributions and further information at:
In the case of the CCP14, all html, htm and relevant text files are indexed for each virtual domain (www, alife, programming, netlib, gnu, etc). For possible regional mirroring purposes, it was decided to keep the domains separate.
For CCP14, because there are a variety of different "virtual domains" with their own search databases, each search database is put in its own cgi-bin directory.
In this case, for the ccp14web index, the three config files (ifetch, ihtml and isearch) are put in /usr/local/etc/httpd/cgi-bin/ccp14/ as designated by the apache 1.3.x configuration setup. The CGI executables themselves are in /usr/local/etc/httpd/cgi-bin/ so that the different virtual domains (www, netlib, gnu, programming, alife) use the same executables.
#!/bin/sh
# From this script, run the isrch_fetch utility and pass 4 arguments:
# the directory where your databases are stored, followed by $1 $2 $3.
#
# For example:
#
#   /path/to/Isearch-cgi/isrch_fetch /path/to/my/databases $1 $2 $3
exec /usr/local/apache/share/cgi-bin/isrch_fetch /web_disc/ccp14/web_area/isearch/ccp14web $1 $2 $3
#!/bin/sh
# From this script, run the isrch_html utility and pass a single argument
# that is the directory where your databases are stored.
#
# For example:
#
#   /path/to/Isearch-cgi/isrch_html /path/to/my/databases
exec /usr/local/apache/share/cgi-bin/isrch_html /web_disc/ccp14/web_area/isearch/ccp14web
#!/bin/sh
# From this script, run the isrch_srch utility and pass a single argument
# that is the directory where your databases are stored.
#
# For example:
#
#   /path/to/Isearch-cgi/isrch_srch /path/to/my/databases
exec /usr/local/apache/share/cgi-bin/isrch_srch /web_disc/ccp14/web_area/isearch/ccp14web
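Because each virtual domain needs the same wrappers with only the database directory changed, the wrappers could be generated rather than edited by hand. The following is a minimal sketch in POSIX sh, not part of the Isearch-cgi distribution: the function name write_wrapper is hypothetical, and it forwards all CGI arguments with "$@" rather than the explicit $1 $2 $3 used above.

```shell
#!/bin/sh
# Hypothetical helper: write a wrapper script that runs one shared CGI
# executable against one database directory.
#   $1 = full path of the shared executable (e.g. .../isrch_srch)
#   $2 = directory holding the database for this virtual domain
#   $3 = wrapper file to create
write_wrapper() {
    cgi_prog=$1
    db_dir=$2
    out_file=$3
    # The heredoc expands $cgi_prog and $db_dir now; \$@ stays literal so the
    # generated wrapper forwards whatever arguments it is called with.
    cat > "$out_file" <<EOF
#!/bin/sh
exec $cgi_prog $db_dir "\$@"
EOF
    chmod 755 "$out_file"
}
```

For instance, write_wrapper /usr/local/apache/share/cgi-bin/isrch_srch /web_disc/ccp14/web_area/isearch/ccp14web ./isearch would produce a wrapper equivalent to the last script above.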
As automirroring of webpages is implemented between 1am and 5am each morning using WGET, it is necessary that the Iindex database reflects the changes after the auto-mirroring session. While an incremental update is feasible using the "-a" option, the Isearch mailing list subscribers recommend simply generating the database from scratch under these circumstances.
Note: If the cron script does not seem to be working, check that you have either specified the full path for running Iindex or that its directory is included in the default PATH.
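A quick way to test the point in the note is to check, from the same environment cron uses, whether the program can be found at all. This is a small sketch in POSIX sh (the indexing script itself is csh); the function name check_program is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the program named in $1 can be run, either
# because it is an explicit path or because it is found via PATH.
check_program() {
    prog=$1
    if command -v "$prog" >/dev/null 2>&1; then
        echo "found: $(command -v "$prog")"
        return 0
    fi
    # Show the PATH in effect, since cron usually runs with a shorter one.
    echo "not found: $prog (PATH is $PATH)" >&2
    return 1
}

# Example use near the top of a cron job:
#   check_program /usr/local/bin/Iindex || exit 1
```

Running this from a temporary cron entry, rather than an interactive shell, shows the PATH that cron actually provides.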
In the .crontab file (which can then be passed into the crontab using the command crontab .crontab), put the script file that is going to be run after the automirroring. In this case, the script will run each morning at 5.07am.
07 05 * * * ./isearch.index.script
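The first two crontab fields are minute and then hour, which is easy to reverse by mistake. As an illustration only, this POSIX sh sketch (the function name cron_time is hypothetical) prints the daily run time that a crontab line encodes.

```shell
#!/bin/sh
# Crontab field order: minute hour day-of-month month day-of-week command
# Hypothetical helper: print HH:MM for the crontab line given in $1.
cron_time() {
    # A heredoc is used so the * fields are never glob-expanded.
    read -r minute hour rest <<EOF
$1
EOF
    printf '%s:%s\n' "$hour" "$minute"
}

cron_time '07 05 * * * ./isearch.index.script'   # prints 05:07
```

Swapping the two numbers gives 07:05 instead, which would run the job two hours after the mirroring has finished.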
The crontab entry calls a script file that regenerates the index using the recommended method (generating a file listing all the files to be indexed, then running Iindex on this list) and then moves the new index over the old one, so that the search is unavailable for only a fraction of a second. The last line sends an email to ccp14@dl.ac.uk confirming that the script has run and the time it completed.
#!/bin/csh
# You should CHANGE THE NEXT 3 LINES to suit your local setup
setenv LOGDIR ./web_area/mirrorbin/logs # directory for storing logs
setenv PROGDIR ./web_area/mirrorbin # location of executable
setenv PUTDIR ./web_area/web_live/ccp # relative directory for mirroring
# relative due to possible kludge in wget
#can change to absolute if you wish - some internal links may not work
set DATE=(`date`)
sed "/START_Iindex/s/NOT_FINISHED/Regeneration_STARTED $DATE/" ./report-template.txt > ./report.txt.new
mv report.txt.new report.txt
# Rebuild the ccp14web index (HTML files, then text files appended with -a)
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/web_live/ -name "*.html" -type f -print > web_area/isearch/tmpfile.txt
find web_area/web_live/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.txt
find web_area/web_live/ -name "*.txt" -type f -print > web_area/isearch/tmpfile.txt2
find web_area/web_live/ -name "readme.1st" -type f -print >> web_area/isearch/tmpfile.txt2
find web_area/web_live/ -name "readme.2nd" -type f -print >> web_area/isearch/tmpfile.txt2
grep -v Ray-Tracing-News web_area/isearch/tmpfile.txt > web_area/isearch/tmpfile.txta
grep -v CCP14-by-OS web_area/isearch/tmpfile.txt2 > web_area/isearch/tmpfile.txt2a
grep -v ccp14-by-program web_area/isearch/tmpfile.txt2a > web_area/isearch/tmpfile.txt2b
/usr/local/bin/Iindex -d web_area/isearch/temp/ccp14web -m 16 -t SGMLTAG -f web_area/isearch/tmpfile.txta > web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/ccp14web -m 16 -t SIMPLE -a -f web_area/isearch/tmpfile.txt2b >> web_area/isearch/summary.txt
mv web_area/isearch/ccp14web web_area/isearch/ccp14webold
mv web_area/isearch/temp web_area/isearch/ccp14web
rm -rf web_area/isearch/ccp14webold
# 2>&1 - puts standard err to the file as well.
# Rebuild the wwwxrd index (HTML files only, excluding web_stats)
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/xrd/web/ -name "*.html" -type f -print > web_area/isearch/tmpfile.txt
find web_area/xrd/web/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.txt
grep -v web_stats web_area/isearch/tmpfile.txt > web_area/isearch/tmpfile.txta
/usr/local/bin/Iindex -d web_area/isearch/temp/wwwxrd -m 16 -t SGMLTAG -f web_area/isearch/tmpfile.txta > web_area/isearch/summary.txt
mv web_area/isearch/wwwxrd web_area/isearch/wwwxrdold
mv web_area/isearch/temp web_area/isearch/wwwxrd
rm -rf web_area/isearch/wwwxrdold
set DATE=(`date`)
sed "/WWWXRD_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt
set DATE=(`date`)
sed "/WWW_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" ./report.txt > ./report.txt.new
mv report.txt.new report.txt
# Rebuild the programming index
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/programming/ -name "*.html" -type f -print > web_area/isearch/tmpfile.programming.txt
find web_area/programming/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.programming.txt
find web_area/programming/ -name "*.txt" -type f -print > web_area/isearch/tmpfile2.programming.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/programming -m 15 -t SGMLTAG -f web_area/isearch/tmpfile.programming.txt > web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/programming -m 15 -t SIMPLE -a -f web_area/isearch/tmpfile2.programming.txt >> web_area/isearch/summary.txt
mv web_area/isearch/programming web_area/isearch/progwebold
mv web_area/isearch/temp web_area/isearch/programming
rm -rf web_area/isearch/progwebold
set DATE=(`date`)
sed "/PROGRAMMING_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt
# Rebuild the alife index
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/alife/ -name "*.html" -type f -print > web_area/isearch/tmpfile.alife.txt
find web_area/alife/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.alife.txt
find web_area/alife/ -name "*.txt" -type f -print > web_area/isearch/tmpfile2.alife.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/alife -m 15 -t SGMLTAG -f web_area/isearch/tmpfile.alife.txt > web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/alife -m 15 -t SIMPLE -a -f web_area/isearch/tmpfile2.alife.txt >> web_area/isearch/summary.txt
mv web_area/isearch/alife web_area/isearch/alifewebold
mv web_area/isearch/temp web_area/isearch/alife
rm -rf web_area/isearch/alifewebold
set DATE=(`date`)
sed "/ALIFE__Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt
# Rebuild the netlib index (HTML files, then source and documentation files)
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/netlib/ -name "*.html" -type f -print > web_area/isearch/tmpfile.netlib.html.txt
find web_area/netlib/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.netlib.html.txt
find web_area/netlib/ -name "*.txt" -type f -print > web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "readme" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.c" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.src" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.f" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "manual" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "manlc" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "helplc" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "imsl" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "nag" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "port" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "siam" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "index" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "doc" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "source" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.text" -type f -print >> web_area/isearch/tmpfile.netlib.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/netlib -m 25 -t SGMLTAG -f web_area/isearch/tmpfile.netlib.html.txt > web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/netlib -m 25 -t SIMPLE -a -f web_area/isearch/tmpfile.netlib.txt >> web_area/isearch/summary.txt
mv web_area/isearch/netlib web_area/isearch/netlibwebold
mv web_area/isearch/temp web_area/isearch/netlib
rm -rf web_area/isearch/netlibwebold
set DATE=(`date`)
sed "/NETLIB_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt
# Rebuild the wwwxrd index again, this time without the web_stats filter
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/xrd/web/ -name "*.html" -type f -print > web_area/isearch/tmpfile.txt
find web_area/xrd/web/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/wwwxrd -m 16 -t SGMLTAG -f web_area/isearch/tmpfile.txt > web_area/isearch/summary.txt
mv web_area/isearch/wwwxrd web_area/isearch/wwwxrdold
mv web_area/isearch/temp web_area/isearch/wwwxrd
rm -rf web_area/isearch/wwwxrdold
/usr/sbin/Mail -s "Isite_Isearch_Creation_Results `date`" ccp14@ccp14.ac.uk < ./report.txt
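Each section of the script above repeats the same pattern: list the files with find, filter out unwanted paths with grep -v, build the index in a temporary directory, then swap that directory into place. The pattern can be sketched in POSIX sh as below. The function names make_file_list and swap_in are hypothetical, and the Iindex step is left as a comment because it depends on the local installation.

```shell
#!/bin/sh
# Hypothetical helper: list files under directory $1 whose names match the
# pattern $2, dropping any path that contains the string $3.
make_file_list() {
    find "$1" -name "$2" -type f -print | grep -v "$3"
}

# Hypothetical helper: replace the live directory $2 with the freshly built
# directory $1. Only the two mv calls separate old from new, so the live
# directory is missing for a very short time.
swap_in() {
    new=$1
    live=$2
    rm -rf "$live.old"
    if [ -d "$live" ]; then
        mv "$live" "$live.old"
    fi
    mv "$new" "$live"
    rm -rf "$live.old"
}

# Example, following the wwwxrd section of the script:
#   make_file_list web_area/xrd/web "*.html" web_stats > list.txt
#   (run Iindex with -d web_area/isearch/temp/wwwxrd and -f list.txt here)
#   swap_in web_area/isearch/temp web_area/isearch/wwwxrd
```

Building in a separate directory means a failed or slow indexing run never leaves the live search database half-written.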
Operation of Isearch-cgi
------------------------
1) Create access points to databases
Create a base HTML file with the program search_form. It takes
two arguments: the path to your databases, and the name of the
database this new page should access. The page is printed to
standard output, so you may redirect it to a file if you like.
search_form /home/databases TEST > form.html
There is another, optional argument that indicates to
search_form which type of search page you wish to generate. The
form types are:
-simple
-boolean
-advanced
-html
If no type is given to search_form, it will default to -simple.
Examples:
search_form -simple /home/databases TEST > form.html
search_form -boolean /home/databases TEST > boolean.html
search_form -advanced /home/databases TEST > advanced.html
search_form -html /home/databases TEST > htmlform.html
For example, to generate these pages for the CCP14 crystallographic Iindex database, you would use the command lines:
search_form -simple /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > form.html
search_form -boolean /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > boolean.html
search_form -advanced /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > advanced.html
search_form -html /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > htmlform.html
Then edit the resulting html files to get them in the form you like. In the case of the CCP14 Crystallographic search page, only the boolean and advanced searches have been used. Full Text, TITLE, HEAD and ADDRESS are the searchable fields, with "Full Text" being the default. TITLE, HEAD and ADDRESS are the result display options, with TITLE being the default. AND, OR, NOT and NEAR are menu-selected options for relating keywords, with AND being the default.
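With several virtual domains, the four search_form invocations can be wrapped in a loop so that each database gets the same set of pages. This is a sketch in POSIX sh under stated assumptions: the function name make_forms and the SEARCH_FORM override variable are hypothetical, while the form types and output file names follow the examples above.

```shell
#!/bin/sh
# Hypothetical helper: generate the four search pages for one database.
#   $1 = directory holding the database
#   $2 = database name
#   $3 = directory to write the html pages into
# SEARCH_FORM (an assumption) may name the search_form program explicitly;
# otherwise search_form is looked up via PATH.
make_forms() {
    db_dir=$1
    db_name=$2
    out_dir=$3
    for type in simple boolean advanced html; do
        # Map each form type to the output file name used in the examples.
        case $type in
            simple) out=form.html ;;
            html)   out=htmlform.html ;;
            *)      out=$type.html ;;
        esac
        "${SEARCH_FORM:-search_form}" "-$type" "$db_dir" "$db_name" \
            > "$out_dir/$out" || return 1
    done
}

# Example:
#   make_forms /web_disc/ccp14/web_area/isearch/ccp14web ccp14web .
```

The generated pages still need the hand editing described above, so the loop is best kept for the initial setup of a new database.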
Isearch-CGI Setup for the web
(There is also a Word Document that goes into the setup of the Web Interface for Isearch)