ESPOIR

A special purpose Reverse Monte Carlo code for ab initio crystal structure determination, either really from scratch or by molecule location, fitting either to "|Fobs|" extracted by powder diffractometry or to single crystal data

A. Le Bail - April-June 1999 - version 2.01
alb@cristal.org - http://www.cristal.org/

DOWNLOAD ESPOIR
(compressed .zip file, 600 KB)


Structures may come across by chance,
thanks to Monte Carlo ;-)


Content
Introduction and how to run the program
Package
Parameters in the .dat file
Structure factors .hkl file
Output files
Strategies
Next to do in ESPOIR


Introduction

The RMC (Reverse Monte Carlo) code built into ESPOIR was strongly inspired by the RMCA program written by Malcolm Howe, using version 2.11 for glass structure modelling (last altered on 26th May 1992). However, please note that a more recent version (RMCA 3.04) is available. See R.L. McGreevy, Nucl. Instr. and Meth. in Phys. Res. A354 (1995) 1-16, for one of the most recent reviews of RMC, and visit the RMC homepage at Studsvik. That paper states that "RMC modelling will certainly not enable ab initio crystal structure models to be obtained starting from random initial structure" (p. 10, § 4.2). On the contrary, ESPOIR shows that this is possible with a suitable strategy, though the limit seems to be about 30 independent atoms. This limit can be overcome by using the Molecular Replacement (MR) method, also included in ESPOIR (version 2).

A more thorough introduction will be published in the next CPD Newsletter (you may download a preprint as a MS Word97 document compressed by Winzip).

ESPOIR in French = HOPE in English (something not to lose when dealing with structure determination). However, do not put too much hope in ESPOIR: you could be disappointed.

Running the program

This manual will explain how ESPOIR can be used, hopefully. Two files are necessary :

                         name.hkl       containing the "|Fobs|", from any origin (powder data or single crystal)
                         name.dat       containing the parameters defining the model (scratch or molecule fragment)
                                              and piloting the Monte Carlo process
                                              (it is like trying to pilot a bottle in the ocean : you can only hope to reach
                                               your expected destination, but ESPOIR will guide you somewhere that
                                               you may not have wished).
                                              This file can be partly or fully prepared by answering the questions
                                               of PRESPOIR, in interactive mode.

The PC version is run by clicking on espoir.exe, which opens a DOS box in which the generic name of the two files above is asked for (do not give any extension, only the name). You will then have to wait, possibly a long time (between a few seconds and several days, depending on the complexity of your problem).

Source code and latest modifications

You may consider building a version for your own computer if you possess a FORTRAN compiler. The source code is included in the package. Moreover, the GNU licence allows you to hack the code at your convenience.

ESPOIR 0.9
The main modifications in ESPOIR 0.9 relative to the original RMCA code were the addition of the |F| calculations, for working on crystalline compounds instead of glasses, and the possibility of permuting atoms.

MODIFICATIONS IN ESPOIR 1.0 (still available)
The main modification since the first ESPOIR 0.9 version, still available, is the possibility to work with any space group, not only P1. Moreover, the constraints on distances and coordination numbers were found useless and were removed (with or without those constraints, the retained atom moves are almost the same), which saves computer time. Some simulated annealing was introduced, progressively reducing the distances the atoms can move. The possibility of accepting events that do not improve the fit was introduced, annealed too. When facing an obviously false minimum, with the structure model frozen at a high R factor, the calculation automatically restarts from a different random configuration, according to parameters selected by the user. Many simple and understandable parameters were added that allow the user to control more closely the way ESPOIR works. The |F| calculation was optimized by keeping everything in memory and changing only the parts of the arrays concerned by the particular moving atom or the pair of atoms being permuted.

MODIFICATIONS IN ESPOIR 2.0
Two main modifications concern :
-  Fit possible on a pseudo powder pattern regenerated from the extracted "|Fobs|" - this allows keeping the whole set of extracted structure factors, and speed is enhanced compared with a fit on the true pattern. Moreover, the step is variable : indexed on the FWHM. An output in .prf format readable by DMPLOT is built (however, it will be fully operational only if the step is constant, of course : use U=V=0 in the Caglioti law).
-  Molecular replacement method by rotation + translation of (only) one fragment (molecule location). Test files for molecular replacement are : pyrene (from X-ray data) ; 1-methylfluorene (without C14) ; cimetidine (from synchrotron data) ; SDPDRR sample II (tetracycline hydrochloride, test successful without considering the Cl atom) ; SDPDRR sample I (cobalt amine, test successful by just searching for a CoN5O octahedron)
Minor modifications concern :
-  Option of minimal interatomic distances reinserted.
-  Annealing law revisited.
-  Special positions considered (not all possibilities, but you have the source code, don't you ?).

Version 2.01 allows some data affected by twinning by merohedry to be treated. See the parameter ns=2.

Coping with several fragments simultaneously, and with torsion angles, is planned for version 3, maybe.


Package

This ESPOIR 2.0 package, espoir2.zip (600 KB), contains :

          espoir.exe      :  ESPOIR 2.0, executable for Win95/98/NT
          espoir.f          :  Fortran code
          espoir.ico      :  Icon for ESPOIR
          prespoir.exe   :  Executable for preparing the .dat file
          prespoir.f       :  Fortran code
          prespoir.ico   :  Icon for PRESPOIR
          random.exe    :  Executable for random positions generation
          espoir.html    :  This manual in HTML language
          espoir.gif       :  Logo
          al2o3.gif        :  Figure showing a pattern "regenerated" from the "|Fobs|"
          license.html   :  The GNU license applying to this software, in HTML language

together with the example files (in principle, 3 files are given for each example : name.dat containing the instructions, name.hkl containing the reflections, and name.imp containing the results) :

The codes used below mean :
(MR) = Molecule replacement ;
(S) = from scratch (i.e. random starting model) ;
(P) = from regenerated powder pattern ;
(Fo) = from raw extracted "|Fobs|" ;
(Fc) = from calculated exact |F| ;
(GP) = guessed special positions ;
(DC) = distance constraint ;
(GO) = guessed occupation number ;
(CC) = Cartesian coordinates ;
(FC) = fractional coordinates ;
(PR) = restarted from a previous result ;
(EX) = experimental data ;
(CA) = calculated data : from ICSD or CSD atomic coordinates, a theoretical
            powder pattern is built with the U,V,W of PbSO4 (Rietveld Round Robin),
            then the "|Fobs|" are extracted from this pattern by FULLPROF ;
(E1) = with ESPOIR version 1 (beware of data incompatibilities with version 2).

Some .dat files contain unused data at the end : frequently the exact coordinates that you should find.

    generic file name

Data from the Endeavour challenge (Scratch)
           al2o3          :  Al2O3 in R-3c                                           (S)(P)(GO)(CA)
           al2o3F        :  as above, but                                             (S)(Fo)(GO)(CA)
           al2o3GP     :  as above, but                                             (S)(P)(GP)(GO)
           aragonite    :  CaCO3 in Pmcn                                         (S)(P)(GO)(CA)
           caf2            : CaF2 in Fm3m                                           (S)(P)(GO)(CA)
           calcite        : CaCO3 in R-3c                                          (S)(P)(GO)(CA)
           forsterite    : Mg2SiO4 in Pbnm                                      (S)(P)(GO)(CA)
ESPOIR original data (Scratch), many are in ESPOIR 1.00 format
          cuvo3           : CuVO3 in P 1                                           (S)(P)(CA)
          cuvo3c         : CuVO3 in P -1                                          (S)(P)(CA)
          TeI               : TeI in P 1                                                 (S)(P)(CA)
          TeIC            :  TeI in P -1                                              (S)(Fc)(E1)
          pbso4P1      :  PbSO4 in P1                                            (S)(Fc)(E1)
          pbso4          : PbSO4 in Pnma all atoms in general position  (S)(Fc)(P)
          pbso41        : PbSO4 in Pnma with guessed special positions (S)(Fc)(P)(GP)(E1)
          Im2m           : Ba2CdP3O10(OH) in Im2m space group (S)(P)(GP)(EX)
          coamin        : [Co(NH3)5CO3]NO3.H2O in P1, good |Fs| (the famous SDPD Round Robin sample I) (E1)
          coP21         :  [Co(NH3)5CO3]NO3.H2O in P21, good |Fs| (E1)
          zhu5           :  [Co(NH3)5CO3]NO3.H2O in P21, with "|Fobs|" extracted (E1)
                                   from the SDPD Round Robin sample I data, by the Le Bail method,
                                   excluding reflections having a neighbouring one at less than 0.05 two-theta degrees
          cim01         : Test for cimetidine in P21/n with good |Fs|      (E1)
New Molecule Replacement (MR) files
          zhutest         : hypothetical pattern for 2 CoN6 octahedra in P21             (MR)(CC)
          zhu              :  [Co(NH3)5CO3]NO3.H2O in P21, full "|Fobs|" set        (MR)(P)(EX)
          cim0            : Test for cimetidine in P21/n, full "|Fobs|" set                   (MR)(P)(EX)
          pyrene         : pyrene                                                                               (MR)(P)(CA)
          methyl         : 1-methylfluorene                                                               (MR)(P)(CA)
          testtetra       : SDPDRR sample II : tetracycline hydrochloride              (MR)(P)(EX)


Parameters in the .dat file

An example delivered with this version (cuvo3c.dat) is detailed below :
 

Test on CuVO3                            :   text for this run
4.9646 5.4023 4.9154 90.32 119.13 63.93  :   cell parameters
P -1                                     :   SG : Space group
1.54056 4 5 3 1                          :   wa, kxr, na, nt, ns
0.09 -0.03 0.04 3                        :   U, V, W, Nstep (optional line if ns = 1)
cu  v   o                                :   atom type names (nt names, no capital letters)
1 1 3                                    :   ni : numbers of atoms in each type (nt values)
1.1 0 0 0 1                              :   bov, nocc, ncon, nspe, ipri      
1.0 0.008                                :   sigma, reject
3. 3. 3.                                 :   delta : max moves for each atom types
2.                                       :   nanneal : allowing to reduce delta according to a defined law (see below)
5000                                     :   n1 : print after n1 moves
100000 20000                             :   n2 and n3 : end at n2 moves; save at n3 moves
20000 0.25 2                             :   nstart, rmax, ichi  
10                                       :   n4 : try one permutation after n4 moves or one translation after n4 rotations (MR)
10                                       :   n5 : number of random different starting models
 -0.0760  0.3091 -0.5059 1.              :   na lines of x,y,z coordinates and occupation number
 -0.6436 -0.8249 -0.7986 1.                       to be given only if n5 = 0
  0.3537  0.6792  0.6986 1.                       (given here just to show, since n5=10)
  0.9300 -0.1930  0.9905 1.
 -0.1040 -0.0487 -0.3215 1.              :   be sure to end this last line with a return


Parameters definitions

text         (format 20A4)   A title for the run.

cell         The cell parameters a, b, c, alpha, beta, gamma (free format, 6 real).

SG        The space group to be interpreted by Prof. Burzlaff's subroutine.
                   In principle, any SG (with inversion center at the origin) should work.
                   Examples (use blanks appropriately) :
                   P 1                     P-1               P 21/C              P 21/N                C C                C 2/M
                   P MMM            P N M A       I M 2 M            F M M M            C M C M       C 21 21 21
                   I 41/A               I 4/M M M    ...
                   P 63/M M M     R -3 C          P 3 2 1              ...
                   I M 3                 P -4 3 M       I M 3 M            F D 3 M        etc
                   Verify the printed symmetry operators (24 maximum, since adding atoms due to F and I
                   Bravais lattices has no influence on intensity, and inversion center is treated apart).

wa, kxr, na, nt, ns
                  One real and 4 integers (free format)
                   wa    = Wavelength, only the following ones are recognized for the delta-f'
                               and delta-f" anomalous dispersion terms for X-rays
                                2.28962  1.93597  1.54051  0.70926  0.556363
                   kxr   = Selects neutron data (kxr=0) or X-ray data (kxr=4)
                   na    = Total number of atoms (max = 200) in the asymmetric unit
                   nt     = Number of different atom types (max = 8)
                   ns    = Code defining the job type according to data
                              ns = 0 for working on "|Fobs|" (they must be quite good, without too much overlapping)
                                                            or single crystal data (ns = 0 recommended ! why degrading data ?)
                              ns = 1 for working on the regenerated powder pattern (overlap does not matter),
                                      note that the profile shapes are Gaussian but this is not so important, the
                                      trick is to treat overlapped data as overlapped data, no more ;-)
                             ns = 2 assumes that you have a twinning hypothesis on single crystal data.
                                     Only merohedry is considered with 2 domains (ns=2) at 50% in volume.
                                     When ns = 2, the next 3 lines should be the 3x3 matrix transforming the
                                     hkl of domain 1 into the hkl of domain 2. For instance :
                                     0 1 0
                                     1 0 0        will transform hkl into kh-l
                                     0 0 -1
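
The action of such a matrix on the indices can be checked with a few lines of Python. This is only a sketch: the function name transform_hkl is hypothetical and is not an ESPOIR routine, and the matrix is applied row by row to the (h, k, l) column (for the symmetric matrix of the example, the convention makes no difference).

```python
def transform_hkl(matrix, hkl):
    """Apply a 3x3 integer matrix, given as three rows, to one (h, k, l) triple."""
    return tuple(sum(row[j] * hkl[j] for j in range(3)) for row in matrix)

# The three matrix lines of the example above, as they would appear in the .dat file
twin = [(0, 1, 0),
        (1, 0, 0),
        (0, 0, -1)]

print(transform_hkl(twin, (1, 2, 3)))  # (2, 1, -3), i.e. hkl becomes k h -l
```

Running it on a few reflections of your own data is a quick way to verify a twin law before launching a long job.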
 

U, V, W, nstep      Optional line occurring only if ns = 1 (see above)
                  U, V, W = Caglioti law refined when the "|Fobs|" were extracted
                  nstep = number of points that you estimate useful above the FWHM
                              (try nstep = 3 to 5), if nstep is given smaller than 3, it will be reset to 3.
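
For orientation, the Caglioti law is usually written FWHM**2 = U*tan(theta)**2 + V*tan(theta) + W. The sketch below evaluates it in Python. The function names are hypothetical, and the relation step = FWHM/nstep is only an assumption made for illustration: the manual says no more than that the step is indexed on the FWHM.

```python
import math

def caglioti_fwhm(two_theta_deg, u, v, w):
    """FWHM (degrees 2-theta) from FWHM**2 = U*tan(theta)**2 + V*tan(theta) + W."""
    t = math.tan(math.radians(two_theta_deg / 2.0))
    return math.sqrt(u * t * t + v * t + w)

def local_step(two_theta_deg, u, v, w, nstep):
    """ASSUMPTION: step = FWHM / nstep (not confirmed by the ESPOIR source)."""
    nstep = max(nstep, 3)  # the manual states that nstep smaller than 3 is reset to 3
    return caglioti_fwhm(two_theta_deg, u, v, w) / nstep

# U, V, W, nstep from the cuvo3c.dat example
for angle in (20.0, 60.0, 100.0):
    print(angle, caglioti_fwhm(angle, 0.09, -0.03, 0.04), local_step(angle, 0.09, -0.03, 0.04, 3))
```

With U = V = 0 the FWHM no longer depends on the angle, so under this assumption the step becomes constant.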

name1, name2...        n atom names  (characters in format nA4 where n = nt)
           DO NOT USE CAPITAL LETTERS
                                   In principle, ionic definitions are recognized
                                   like o-2, al+3, ca+2, f-1, ba+2, etc. (max =8)

ni1, ni2...         nt values giving the number of atoms in each type in the asymmetric unit
                         in the same order as their names above (integers in free format).
                         (sum of ni values is = na, max 200)

bov, nocc, ncon, nspe, ipri : one real and 4 integers (free format)
                    bov = Overall thermal B factor (real in free format)
                              Use a value near 1.0 or 1.5 for inorganic materials
                              and 3.0 or more for organic compounds.
                    nocc = code for reading individual guessed occupation factors
                               if = 1 : a next optional line should give all the occup. factors
                               if = 0 : the program generates all occup = 1.
                    ncon = code for constraints on shortest interatomic distances
                               if = 1 : read rcut below
                               if = 0 : no constraints
                    nspe = code for general or special positions
                              if = 1 : read special positions codes nsp below
                              if = 0 : all atoms in general position, do not read anything
                    ipri = code for printing Iobs/Icalc and Fobs/Fcalc at the end of a run
                              if = 0 : no printout
                              if = 1 : printout

Of course, if nocc=1, ncon=1 and nspe=1, then 3 optional lines should be given, in that same order.

And the special position codes (nsp) currently defined in the program are :

 nsp code    position
   1         x,y,z (general position)
   2         x,x,x
   6         x,0,z
   9         x,y,0
  10         0,y,z
  13         x,0,0
  12         0,y,0
   8         0,0,z
   4         x,1/4,z
   5         0,1/2,z
   7         x,0,1/4
  11         1/2,y,0
   3         0,0,0
If you want more : ask me for them or do it yourself (you have the source code, don't you ?). There are two places where you will have to act. In the main program :
9000  IF(MR.NE.0) GO TO 400
      GO TO (1,2,3,4,5,6,7,8,9,10,11,12,13), nsp(i)
and in the esp_genmove subroutine.
 

So, be aware that there can be up to 3 optional lines here :
        occupation factors (real in free format)
        rcut values (real in free format) (shortest interatomic distances
                            given in the order 11, 12, 13, 22, 23, 33 for 3 different atom types, for instance)
        nsp codes for special positions (integers in free format)

If you have 100 atoms, your optional lines may extend over several lines. The important point is that
       the program finds the expected number of values.

sigma, reject       (reals)
                              sigma =  standard deviation on |F| (an overall value).
                                    The best is to explore different values. Data are arbitrarily normalized
                                    for having a mean |F|=50. You may try sigma=1. at the beginning with
                                    delta values (see below) of the order of the maximum cell parameter,
                                    and then reduce to sigma = 0.1 or 0.01 in further tests with delta values
                                    in the range 0.1-0.5A.
                                    This parameter is of no use if you select ichi=2 below.
                             reject = test for accepting randomly 40% moves that do not improve the fit
                                          Anyway, all events that lead to delta(R) < -reject are really rejected,
                                          where R (< 0) is the reliability on |F|
                                          Try reject = 0.01 or 0.005, and observe the number of kept events.
                                          Remember that a global decrease of R is searched for, so reduce
                                          reject if R does not eventually decrease.
                                          This could help in not being trapped in a false minimum.
                                          The value of reject is damped by the nanneal parameter (see below),
                                          and is progressively reduced to zero until n2 (see below), the total number
                                          of events, is reached.

delta1, delta2..        (reals)     The maximum  move  for each type of atom.
                                  Recommended values are in the range 0.1-0.5 in the final stages.
                                  Use values of the order of 5 Angstroms at the beginning (or more, up to
                                  the cell parameters). Otherwise, you may get stuck in a false minimum.
                                  A value of zero is possible and will allow only some types of atoms to move.
                                  delta values are progressively damped by the nanneal parameter below.
                                  (max = 8)

nanneal       (real)      Move amplitudes will be progressively reduced following the equation :
                                    move=move*dump
                                   dump=(1.-ngent/ngenmax)**nanneal
                                   ngent = number of generated events during the program execution
                                   ngenmax = maximum number of events allowed (see n2 below)
                                             for nanneal=1, the reduction will be linear
                                                  it is suggested to use nanneal=2
                                  note that dump will apply on atom moves but also on molecule translations if
                                                 Rp(F) or RF < rmax (see rmax definition below)
 

                               This is a way of doing some simulated annealing, sometimes avoiding
                                  the necessity to make two steps (one step with large move amplitudes, and
                                  a subsequent step with smaller move amplitudes)
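
The damping law above is easy to tabulate. The following Python sketch assumes that the maximum move at a given moment is the initial delta multiplied by dump; whether the program instead rescales the current move cumulatively is not stated here, so treat this as one reading of the equation. The function name damping is hypothetical.

```python
def damping(ngent, ngenmax, nanneal):
    """dump = (1 - ngent/ngenmax)**nanneal, as given in the manual."""
    return (1.0 - ngent / ngenmax) ** nanneal

delta = 3.0       # initial maximum move, as in cuvo3c.dat
ngenmax = 100000  # n2 in cuvo3c.dat

# ASSUMPTION: the effective maximum move is delta * dump
for ngent in (0, 25000, 50000, 75000, 100000):
    print(ngent, delta * damping(ngent, ngenmax, 2.0))
# prints 3.0, 1.6875, 0.75, 0.1875 and 0.0 for the five values of ngent
```

With nanneal=2, half of the run is already spent with moves below a quarter of the initial delta, which is why one run can replace the two-step procedure.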

n1            (integer)   Determines how often a summary will be written to the standard output.
                                   It will be after every n1 generated events (moves + permutations), except that
                                   it will only occur when an event is accepted.

n2, n3     (integers)  n2 = The total number of events the program should run for.
                                   n3 = The number of events after which the results will be saved
                                          (possibly several times in a run).

nstart, rmax, ichi     If after nstart (integer) events (moves + permutations), the R factor is
                                 still higher than rmax (real), then restart from a new random configuration,
                                 unless the total number of allowed starting models (n5 see below) is
                                 already attained.
                                 ichi (integer) determines the test made for accepting or rejecting an event :
                                      ichi = 1 : the test is made on the decreasing of
                                                   Sum on   (|Fobs|-|Fcalc|)**2/sigma**2
                                      ichi = 2 : the test is made on the decreasing of R :
                                                   Sum on  | |Fobs|-|Fcalc| | / Sum on |Fobs|
                                      Try both, however, there seems to be no clear difference.
                                     nstart = 40000 and rmax = 0.2 - 0.3 is fine for small structures
                                     nstart = 120000 and rmax = 0.35 -0.40 could work for large structures

n4           (integer)  This parameter may have 2 meanings according to the choice of
                                   a run from scratch (random atoms), or a run from a molecular model

                                  If scratch : try permutations of atoms after n4 moves. Examples :
                                  If n4 = 10, the ratio of atom moves and permutations will be 10 for 1.
                                  If n4 = 1, only permutations will occur
                                  If n4 = 0 only atom moves will occur
                                  Be aware that some combinations of constrained occupation numbers may not
                                  allow any permutations. If permutations are not allowed, the program will
                                  loop forever, but you will be given a message ;-)
                                  try n4=10 to 100 as you wish (most test files use n4=10)

                                 Or, in case of Molecular Replacement,
                                                  try translations of model after n4 rotations
                                  If n4 = 10, the ratio of rotations and translations will be 10 for 1.
                                  If n4 = 1, only translations will occur (not recommended in the general case)
                                  If n4 = 0 only model rotations will occur (not recommended as well)
                                      Rotations are made around the molecule or fragment center of gravity
                                  try n4 = 2 to 25 or more if you wish (many test files use n4=2 or 4)

n5         (integer)   |n5| is the number of runs (try 5, 10 or 50 or 100... but mind the computer time)
                                 if n5 > 0 the job concerns random starting models and the data stop there
                                 if n5 = 0 the job will reuse previous atomic coordinates and the x,y,z,occup
                                               should be given just below
                                 if n5 < 0 the job concerns Molecule Replacement and the next line should be
                                       either a,b,c,alpha,beta,gamma of the cell in which the molecule is described
                                       or  0. 0. 0. 90. 90. 90. if the model is described with Cartesian coordinates
                                          and the following lines will then be the x,y,z,occup values as described below

x,y,z,occup     (reals)    na lines of atomic coordinates and occupation numbers (max 100 atoms)
                                        To be given only if n5 = 0 or n5 < 0
                                         occup=1 means a general position fully occupied
                                        Note that if n5 is different from 0, then occup will always be = 1
                                        for any atom (the program cannot decide in your place), or defined by nocc parameters.
                                        How many atoms in general and special positions ? You have to guess !
                                        You may put there either :
                                                      - random coordinates obtained from RANDOM.EXE
                                                      - one result from a previous test, that you want to continue
                                                         with different Monte Carlo parameters.
                                                      - your fragment or molecular model in either cartesian or fractional
                                                        coordinates
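
If your fragment is known in fractional coordinates of some other cell and you prefer to give it in Cartesian coordinates, the usual orthogonalization can be sketched in Python as below. The axis convention (x along a, y in the a,b plane) is an assumption chosen for illustration and is not taken from the ESPOIR source; the function name is hypothetical.

```python
import math

def fractional_to_cartesian(frac, cell):
    """Convert fractional (x, y, z) to Cartesian Angstroms for cell = (a, b, c, alpha, beta, gamma).

    ASSUMED convention: x axis along a, y axis in the (a, b) plane.
    """
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(ang)) for ang in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    fx, fy, fz = frac
    x = a * fx + b * cg * fy + c * cb * fz
    y = b * sg * fy + c * (ca - cb * cg) / sg * fz
    z = c * v / sg * fz
    return (x, y, z)

# Cell of the cuvo3c.dat example; the point (0, 0, 1) is the end of the c axis
cuvo3_cell = (4.9646, 5.4023, 4.9154, 90.32, 119.13, 63.93)
end_of_c = fractional_to_cartesian((0.0, 0.0, 1.0), cuvo3_cell)
print(math.sqrt(sum(q * q for q in end_of_c)))  # length of c, close to 4.9154
```

Since only the fragment geometry matters for molecule location, any orthonormal convention should give an equivalent model; the interatomic distances are what must be preserved.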


Structure factors .hkl file

The organization in the .hkl file is quite simple :
One line for the number N of hkl (N maximum is 1000), and then N lines including h, k, l, |Fobs|. An example is below. Data are not formatted, just list 3 integers and one real in free format.

About 10 reflections per atom in the asymmetric unit can be a sufficient number of hkl.

 120
   0   1   0  21.580          further values may follow on each line in the test files,
   0   0   1  39.622                       but they are ignored
   1   1   0   9.749   
   1   0  -1  29.746   
   1   0   0 195.923   
    ...
    ...
    ...
   4   1  -2 128.143   
   3   4   0 159.884   
   0   3   3 142.716   
   3   2   1   8.925   
   2  -2   0 349.860        Care to make a return here.
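
For checking or preparing such a file outside ESPOIR, a small reader can be sketched in Python. The function name read_hkl is hypothetical; the sketch assumes there are no blank lines between the count and the reflections, and it ignores anything after the fourth value on a line, as described above.

```python
def read_hkl(lines):
    """Read the count N, then N lines of h, k, l, |Fobs| in free format."""
    it = iter(lines)
    n = int(next(it).split()[0])
    if n > 1000:
        raise ValueError("the manual gives 1000 as the maximum number of hkl")
    reflections = []
    for _ in range(n):
        fields = next(it).split()
        h, k, l = (int(x) for x in fields[:3])
        reflections.append((h, k, l, float(fields[3])))  # extra fields are ignored
    return reflections

sample = [
    " 3",
    "   0   1   0  21.580   0.123",
    "   0   0   1  39.622",
    "   1   1   0   9.749",
]
reflections = read_hkl(sample)
print(len(reflections), reflections[0])  # 3 (0, 1, 0, 21.58)
```

The same function accepts an open file, for instance read_hkl(open("name.hkl")), since it only iterates over lines.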

Output files

ESPOIR will create 5 output files :
             name.imp            :    will contain all intermediate results
             name.res             :    will contain the best result in SHELX format
             name.spf             :    will contain the best configuration almost ready for searching
                                             symmetry by PLATON
             namenew.cfg      :    will contain the best configuration ready for a copy-paste in name.dat
                                             for further cycles with n5=0 for instance.
             name.prf             :    if the option ns = 1 is chosen. Will contain the "observed" and
                                             regenerated powder patterns in a format readable by DMPLOT.
                                             Note however that when the U and V parameters are different from 0,
                                             the step will be variable so that DMPLOT will not produce the correct
                                             angles and reflection positions.


Strategies

From Scratch
Endeavour challenge examples
CuVO3
TeI
PbSO4
Ba2CdP3O10(OH)
[Co(NH3)5CO3]NO3.H2O
Cimetidine (pharmaceutical)

Molecular Replacement
Finding an octahedron
[Co(NH3)5CO3]NO3.H2O
Cimetidine (pharmaceutical)
pyrene
1-methylfluorene
tetracycline hydrochloride (pharmaceutical)

Last words


ESPOIR needs a cell, a space group and structure factors. That means that you should have indexed the cell, guessed a space group, and either extracted the "|Fobs|" from a powder pattern by any method you like (see the SDPD tutorial), or recorded single crystal data. Pay attention to the quality of your "|Fobs|"...

The n5 parameter is the key for applying the chosen strategy :
                    n5 > 0 : scratch (i.e. random starting model)
                    n5 < 0 : molecule location.

The other key is the ns parameter, depending on the data quality (overlap or not) :
                   ns = 0 : working on "|Fobs|" (they must be quite good, without too much overlapping)
                               or single crystal data (ns = 0 recommended ! why degrade the data with option ns=1?)
                   ns = 1 : working on the powder pattern regenerated from the "|Fobs|" (overlap does not matter),
                                note that the profile shapes are Gaussian but this is not so important, the
                                trick is to treat overlapping data as overlapping data, and gain speed, no more ;-)

How many reflections ?
                 The first 100 "|Fobs|" were sufficient for all Molecule Replacement (MR) examples.
                 Ten reflections per independent atom is the minimum for a scratch test.

How many Monte Carlo events ?
                 For scratch : 5000 to 8000000 were used, depending on the problem complexity.
                 For Molecule Replacement : 20000 to 1000000 should be sufficient.
 

From Scratch

The recommended strategy is to try all the possible space groups (for instance, you may have a case where you should try Immm, I222, I212121, Imm2, Im2m, and I2mm), but perhaps also P1 if your problem does not exceed 30 independent atoms in that space group. Deciding whether atoms are on special or general positions is your problem : think. However, you will see below that Pb in PbSO4 in the Pnma space group is always found near the special position x, 1/4, z, whether you decide to fix the occupation number to 1 or to 0.5 (heavy atom problems are the simplest).

ESPOIR generally works better in the P1 space group. If you decide to give P1 a try, you need to present the "|Fobs|" as if they corresponded to the P1 space group, by separating those hkl reflections having a multiplicity greater than 2 (however, do not separate hkl and -h-k-l, because the program adds them as if the data came from a powder diffraction measurement). For instance, in the monoclinic P21/m space group, you may have extracted the following "|Fobs|":

   1   0   1    35.
   1   1   0    75.
   0   1   1    27.
   1   0  -1    38.
   1   1   1    45.
   1   0   2    34.
   1   1  -1    89.
   0   2   0    23.
And you may decide to try triclinic acentric. Then these data must be presented differently to ESPOIR in order to match the P1 space group :
   0   1   0     0.   unique : add all forbidden reflections with "|Fobs|" = 0.
   1   0   1    35.   unique (the 1 0 -1 should appear later)
   1   1   0    75.   double (but keep the same "|Fobs|" value)
   1  -1   0    75.
   0   1   1    27.   double
   0  -1   1    27.
   1   0  -1    38.   unique
   1   1   1    45.   double
   1  -1   1    45.
   1   0   2    34.   unique
   1   1  -1    89.   double
   1  -1  -1    89.
   0   2   0    23.   unique
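
The splitting shown above can be sketched in a few lines of Python. This is only an illustration, not part of ESPOIR : the function name expand_to_p1 is hypothetical, and the rule assumes a monoclinic cell with b as unique axis (Laue class 2/m), where h k l and h -k l are equivalent. Because hkl and -h-k-l stay merged, the partner h -k l is only added when it is not simply the Friedel mate of h k l (that is, when k is not 0 and h, l are not both 0). Forbidden reflections with "|Fobs|" = 0 (like 0 1 0 above) still have to be added by hand.

```python
def expand_to_p1(reflections):
    """Split monoclinic (b-unique) reflections into their P1 components.

    hkl and -h-k-l are kept merged, so h k l only gains the partner
    h -k l when that partner is not its own Friedel mate.
    """
    expanded = []
    for h, k, l, f in reflections:
        expanded.append((h, k, l, f))
        if k != 0 and (h, l) != (0, 0):
            # same "|Fobs|" value for the second member of the pair
            expanded.append((h, -k, l, f))
    return expanded


# The P21/m list given in the text
monoclinic = [
    (1, 0, 1, 35.0), (1, 1, 0, 75.0), (0, 1, 1, 27.0), (1, 0, -1, 38.0),
    (1, 1, 1, 45.0), (1, 0, 2, 34.0), (1, 1, -1, 89.0), (0, 2, 0, 23.0),
]
p1_list = expand_to_p1(monoclinic)
for h, k, l, f in p1_list:
    print(f"{h:4d}{k:4d}{l:4d}{f:8.1f}")
```

For any other Laue class the test on the indices would have to be rewritten from the list of equivalent reflections.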


Extinctions obviously correspond to |Fhkl| = 0, and this is quite useful information for ESPOIR that you should absolutely include. Of course, the quality of the "|Fobs|" is essential. When using a set of structure factors extracted from powder data by either the Pawley or the Le Bail method, you should keep only those reflections that are reasonably reliable (use a dataset reduced with the OVERLAP software, for instance).

The starting configurations are built by a generator of random positions inserted in ESPOIR. But if you want only one test with ESPOIR (n5=0), the program RANDOM provided with the package will help you to prepare the data. Just give RANDOM the number of random positions to generate (corresponding to na, the total number of atoms in the asymmetric unit). The result will be saved in a .cfg file that you may edit. Copy and paste your coordinates into the .dat file, and guess whether your atoms occupy special or general positions. If n5 is equal to 1 or more, ESPOIR automatically generates the random starting configurations, with full occupation numbers for all atoms.

Note that you may not retrieve exactly the same results as in the examples below because the random number generator is really efficient (hence the bottle in the ocean)...  Those examples correspond to moderately complex crystal structures, and may give you the limits of feasibility with the current ESPOIR version 2.0. In fact, the first two cases really belong to the P-1 space group, and the third case (PbSO4) is one of the Rietveld Round Robin samples (Pnma). Sample I of the SDPD Round Robin (cobaltamine), for which no participant proposed a model, is also treated, as well as the famous cimetidine. But the SDPDRR sample II structure could not be obtained from scratch up to now (33 atoms in general position in the P212121 space group, which would lead to 132 atoms in P1 : not tried, we have to wait for 10 or 100 GHz computer speed).

Those examples are sorted from the simplest to the most complex.

Endeavour challenge examples (Al2O3, aragonite, CaF2, calcite, forsterite)
The following examples are taken from the Endeavour software list of test files, in the context of the ESPOIR-Endeavour challenge announced on the SDPD Mailing List.

Al2O3
This example illustrates how guessed constraints on occupation numbers and special positions make it more likely to obtain the solution, provided the constraints are true...

The first example (al2o3.dat) has ns=1 and so works on the pattern regenerated from the "|Fobs|", and n5>0, thus working in scratch mode, with n5=10 (10 independent runs) :

Al2O3 R-3c 
4.764 4.764 13.009 90.0 90.0 120.0
R -3 C
1.54056 4 2 2 1                              <-- last value ns = 1
0.02511  -0.04562   0.03019  3               if ns = 1 : this line occurs : U,V,W,nstep
al+3o-2                                      use scattering factors of al+3 and o-2
1 1                                          one atom for each atom type
1.0 1 0 0 1
2. 3.                                       occupation factors for Al and O
1.0 0.002
6. 6.                                       maximum move for each atom type : 6 angstroems
2                                           annealing law : second order
5000   
20000 20000
5000 0.2 2
10
10                                          <-- n5 = 10


The results lead to R values in the range 0.135-0.005, with a success rate of 7/10 for R=0.005. The propositions at R=0.135 are false minima. Below is the result :
 

     66 moves acc.   18000 tested; Chi**2=0.491E-02, R=0.005
      0 perm. acc.    1999 tested
      1 events did not improved the fit, dump = 0.000000
 Final coordinates x,y,z and occupation numbers
al 1    0.99153    0.99167    0.14792      2.000
o- 1    0.64258    0.96913    0.91818      3.000
The Al coordinates are not exactly 0,0,z, and the O coordinates are hard to recognize as being x,0,1/4. However, this is an R Bravais lattice, so you must think of the lattice translations : 0.91=0.25+0.66 and 0.64 is close to 0.66, so you will have to permute x and y for the O atom position.

The true positions are :
Al    0.00000  0.00000  0.35210
O     0.30600  0.00000  0.25000
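
A small Python sketch may help with this kind of recognition. It simply applies the three translations of the R lattice in hexagonal axes, (0,0,0), (2/3,1/3,1/3) and (1/3,2/3,2/3), to a coordinate triplet and wraps the result back into the cell. The function name centering_images is hypothetical and nothing here comes from ESPOIR itself ; the obverse setting is assumed.

```python
# R-centering translations, hexagonal axes, obverse setting (assumed)
R_CENTERING = [(0.0, 0.0, 0.0), (2 / 3, 1 / 3, 1 / 3), (1 / 3, 2 / 3, 2 / 3)]


def centering_images(xyz, translations=R_CENTERING):
    """Return xyz shifted by each lattice translation, wrapped into [0, 1)."""
    return [
        tuple(round((c + t) % 1.0, 5) for c, t in zip(xyz, shift))
        for shift in translations
    ]


# O position proposed by ESPOIR in the al2o3.dat run above
found_o = (0.64258, 0.96913, 0.91818)
for image in centering_images(found_o):
    print(image)
```

One of the three images has z close to 1/4, which is the 0.91=0.25+0.66 remark made above. The remaining symmetry operations of the space group are not applied by this sketch.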

Now look at the .prf file with the DMPLOT software :

You may note that the hkl markers are at the right positions but the peaks are not (because the calculation is made with a variable step, indexed on the FWHM). Agreement between hkl markers and peak positions will only occur if U=V=0, giving a constant FWHM related to W. So you have now seen what the pattern "regenerated" from the "|Fobs|" looks like. Of course, it bears no relation to the true powder pattern as far as peak heights are concerned. But this artifact makes it possible to cope with overlapping peaks.
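
To see how the FWHM varies with angle for a given U, V, W line, here is a minimal Python sketch. It assumes that ESPOIR uses the usual Caglioti relation, FWHM**2 = U tan**2(theta) + V tan(theta) + W, with the FWHM in degrees 2-theta ; this manual section does not state the formula, so treat it as an assumption. The function name fwhm_deg is hypothetical.

```python
import math


def fwhm_deg(two_theta_deg, u, v, w):
    """FWHM from the Caglioti relation (assumed form, degrees 2-theta)."""
    t = math.tan(math.radians(two_theta_deg / 2.0))
    return math.sqrt(u * t * t + v * t + w)


# U, V, W values of the al2o3.dat example
U, V, W = 0.02511, -0.04562, 0.03019
for tt in (20.0, 60.0, 100.0):
    print(f"2theta = {tt:6.1f}   FWHM = {fwhm_deg(tt, U, V, W):.4f}")

# With U = V = 0 the width no longer depends on the angle
print(fwhm_deg(20.0, 0.0, 0.0, W), fwhm_deg(100.0, 0.0, 0.0, W))
```

Under that assumption, a step indexed on the FWHM is constant only when U=V=0, which is the case where the hkl markers and the peaks coincide.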

The second example for Al2O3 uses ns=0 rather than ns=1 (the fit is made directly on the "|Fobs|") (al2o2F.dat) :
 

Al2O3 R-3c
4.764 4.764 13.009 90.0 90.0 120.0
R -3 C
1.54056 4 2 2 0            <-- ns=0 , no need for the U,V,W,nstep line
al+3o-2
1 1
1.0 1 0 0 1
2. 3.
1.0 0.002
6. 6.
2
5000   
20000 20000
5000 0.2 2
10
10
This is at least 5 times faster than the previous run on the regenerated pattern. The results are similar because there are few overlap problems in such a very simple case.

The third test for Al2O3 makes use of constraints on special positions : 0,0,z for Al and x,0,1/4 for O. Such positions could be guessed knowing the chemical formula and being sure of the space group (this will not always be so easy...) (al2o3GP.dat) :
 

Al2O3 R-3c
4.764 4.764 13.009 90.0 90.0 120.0
R -3 C
1.54056 4 2 2 1
0.02511  -0.04562   0.03019  3
al+3o-2
1 1
1.0 1 0 1 1       We have here the codes for reading occupations and special positions
2. 3.             occupations for Al and O
8 7               special positions : 8 is code for 0,0,z  and 7 is for x,0,1/4
1.0 0.002
6. 6.
2
5000   
20000 20000
5000 0.2 2
0               Note that in such a case, permutations are impossible. If you
10              try, the program will loop infinitely, but will tell you...


In this way, a 100% success rate is ensured. This is because the number of degrees of freedom (DoF) was considerably reduced : 2 unknown parameters instead of 6. And the final result is :
 

     54 moves acc.   19999 tested; Chi**2=0.602E-02, R=0.006
      0 perm. acc.       0 tested
     17 events did not improved the fit, dump = 0.000000
 Final coordinates x,y,z and occupation numbers
al 1    0.00000    0.00000    0.35210      2.000
o- 1    0.69395    0.00000    0.25000      3.000
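
The count of degrees of freedom quoted above (2 instead of 6) can be reproduced with a trivial Python sketch. The site templates are written as in the text ; the function name free_parameters is hypothetical.

```python
def free_parameters(site):
    """Count the coordinates of a site template that are free variables."""
    return sum(1 for c in site.split(",") if c.strip() in ("x", "y", "z"))


general = ["x,y,z", "x,y,z"]          # Al and O on general positions
constrained = ["0,0,z", "x,0,1/4"]    # special positions of al2o3GP.dat

dof_general = sum(free_parameters(s) for s in general)
dof_constrained = sum(free_parameters(s) for s in constrained)
print(dof_general, dof_constrained)   # prints: 6 2
```

The sketch only handles templates where each free coordinate appears as a single letter ; coupled coordinates such as x,x,z or x,2x,z would need a smarter parser.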


Aragonite

This example is a bit more complex with 4 independent atoms (12 DoF) (aragonite.dat) :

Aragonite CaCO3
4.961   7.967   5.741  90.00  90.00  90.00
P M C N
1.54056 4 4 3 1
0.02511  -0.04562   0.03019  3.
ca  c   o
1 1 2
1.0 1 0 0 1
0.5 0.5 0.5 1.
1.0 0.01
4. 4. 4.
2
5000   
60000 60000
20000 0.3 2
10
10
We should find :
Ca        0.25000  0.41508  0.24046  0.50000     
C         0.25000  0.76211  0.08518  0.50000         
O1        0.25000  0.92224  0.09557  0.50000         
O2        0.47347  0.68065  0.08726  1.00000
According to the formula and Z = 4, we know that Ca and C are necessarily on a special position with 4 equivalents. But the O atoms could be distributed either on one general plus one special position or on 3 special positions. By chance ;-), the right choice was made here and the success rate is 8/10 :
    412 moves acc.   59999 tested; Chi**2=0.436E-01, R=0.044
     87 perm. acc.    5999 tested
    211 events did not improved the fit, dump = 0.000000
 Final coordinates x,y,z and occupation numbers
ca 1    0.24469    0.08486    0.74000      0.500
c  1    0.74805    0.26311    0.41341      0.500
o  1    0.74901    0.42197    0.40669      0.500
o  2    0.02773    0.68130    0.08825      1.000
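
A quick way to check a set of guessed occupation numbers against the formula is to convert them back to a number of atoms per cell. The Python sketch below assumes that the occupation number is the site multiplicity divided by the multiplicity of the general position (8 in Pmcn), as in the aragonite file above ; it does not apply to files such as al2o3.dat, where the occupations are given on another scale. The function name cell_content is hypothetical.

```python
from collections import defaultdict


def cell_content(sites, general_multiplicity):
    """Sum occupation * general multiplicity for each element."""
    totals = defaultdict(float)
    for element, occupation in sites:
        totals[element] += occupation * general_multiplicity
    return dict(totals)


# Occupation numbers of aragonite.dat : Ca, C, O1 special, O2 general
aragonite = [("Ca", 0.5), ("C", 0.5), ("O", 0.5), ("O", 1.0)]
content = cell_content(aragonite, general_multiplicity=8)
print(content)   # prints: {'Ca': 4.0, 'C': 4.0, 'O': 12.0}
```

The result, Ca4 C4 O12, is the CaCO3 formula with Z = 4, as expected.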


CaF2
This seems to be a rather simple example. However, the success rate (3/50) is far from the Al2O3 success rate (7/10). Why ? There seem to be a lot of false minima with R~15% or higher (caf2.dat) :

CaF2 Fm3m
5.462   5.462   5.462  90.0  90.0  90.0
F M 3 M
1.54056 4 2 2 1
0.02511  -0.04562   0.03019  3.
ca+2f-1
1 1
1.0 1 0 0 1
1. 2.
1.0 0.02
6. 6.
2
5000   
60000 60000
10000 0.1 2
10
50
The success rate would certainly have been enhanced if the special positions had been guessed.
 
    787 moves acc.   59999 tested; Chi**2=0.951E-02, R=0.010
      0 perm. acc.    5999 tested
    401 events did not improved the fit, dump = 0.000000
 Final coordinates x,y,z and occupation numbers
ca 1    0.50174    0.00162    0.49287      1.000
f- 1    0.26616    0.74359    0.74955      2.000


Expected positions were :

Ca     0.    0.    0.
F     1/4  1/4   1/4
 

Calcite
This case is just a bit more complex than Al2O3, with the same space group (calcite.dat) :
 

Calcite CaCO3
4.990   4.990  17.061  90.0  90.0 120.0
R -3 C
1.54056 4 3 3 1
0.02511  -0.04562   0.03019  3.
ca  c   o
1 1 1
1.0 1 0 0 1
6. 6. 18.
1.0 0.005
9. 9. 9.
2
5000   
60000 10000
20000 0.3 2
10
20
The expected results are :

Ca   0.00000  0.00000  0.00000
C    0.00000  0.00000  0.25000
O    0.25682  0.00000  0.25000

Again, the right occupation numbers were guessed for the O atom, leading to a success rate of 8/20. Below is the result :
 

    516 moves acc.   59999 tested; Chi**2=0.375E-02, R=0.004
      3 perm. acc.    5999 tested
    220 events did not improved the fit, dump = 0.000000
 Final coordinates x,y,z and occupation numbers
ca 1    0.00124    0.00162    0.49998      6.000
c  1    0.33332    0.66213    0.91922      6.000
o  1    0.66684    0.07675    0.08548     18.000
Again, one has to retrieve the special positions by thinking of the R Bravais lattice.

Forsterite
Forsterite is the most complex example found in the Endeavour package, with up to 6 independent atoms (forsterite.dat) :
 

Forsterite Mg2SiO4
4.755  10.198   5.979  90.0  90.0  90.0
P B N M
1.54056 4 6 3 1
0.02511  -0.04562   0.03019  3.
mg+2si+4o-2
2 1 3
1.0 1 0 0 1
0.5 0.5 0.5 0.5 0.5 1.0
1.0 0.005
5. 5. 5.
2
10000   
100000 50000
40000 0.3 2
10
40
And the expected result is :
Mg1    0.00000  0.00000  0.00000   0.50000
Mg2    0.99130  0.27730  0.25000   0.50000
Si        0.42610  0.09400  0.25000   0.50000
O1       0.76580  0.09190  0.25000   0.50000
O2       0.22100  0.44700  0.25000   0.50000
O3       0.27740  0.16300  0.03290   1.00000

Miraculously, the true occupation numbers were guessed (but special positions were not forced to occur). The number of DoF is now 18. And the result is obtained with a rather low 3/40 success rate :
 

    362 moves acc.   99999 tested; Chi**2=0.173E-01, R=0.017
     66 perm. acc.    9999 tested
    101 events did not improved the fit, dump = 0.000000
 Final coordinates x,y,z and occupation numbers
mg 1    0.00838    0.72263    0.75846      0.500
mg 2    0.49021    0.50067    0.00196      0.500
si 1    0.92644    0.40621    0.74431      0.500
o- 1    0.23412    0.90803    0.75301      0.500
o- 2    0.21983    0.44714    0.25644      0.500
o- 3    0.77804    0.33711    0.96673      1.000


That's all for the Endeavour challenge examples taken from the Endeavour side. Now come the test files strictly from ESPOIR. Some of the results below were already obtained with ESPOIR version 1.0 :
 

CuVO3
In order to illustrate the difference in behaviour of ESPOIR on the same problem treated in P 1 and in P -1, observe the results below (corresponding to the test files, with some annealing) :

in P 1, with 10 atoms in general position, for 10 tests starting from different random models (file cuvo3.dat).

Test N°   Moves      Perm.   Events without  Starting R     Final R 
        accepted   accepted  fit improvement    (%)          (%)
  1        236        77         114            72.2         11.7
  2        206       134         123            80.4         11.6
  3        269        22         102            81.7         16.7
  4        151        43          53            80.4          3.8
  5        264        35          99            82.4          4.7
  6        290        69         131            80.3          4.5
  7           failed to attain R = 30% after 40000 events
  8        239        69          93            80.0          4.5
  9        281        29         101            84.8          5.5
 10        237        16          79            85.3         11.8


in P -1, with 5 atoms guessed to be in general position (which is true), for 10 tests (file cuvo3C.dat).
 

Test N°   Moves      Perm.   Events without  Starting R     Final R 
        accepted   accepted  fit improvement    (%)          (%)
  1         87         3          13            82.6          5.8
  2 to 5      failed to attain R = 30% after 40000 events
  6        109         7          17            80.8          3.7
  7 and 8     failed to attain R = 30% after 40000 events
  9        145         3          43            82.9         16.5
 10        119        20          32            84.2          5.0
The difference is that when an atom is moving, two atoms really move in P-1, according to a completely arbitrary origin. Generally, you will observe far fewer accepted moves in any space group than for the same problem described in P1. The problem is due (I think) to the impossibility of building a truly random starting model, except in P1. Anyway, ESPOIR does the job in 4 tests out of 10, to be compared to a success rate of 9/10 in the P1 space group.
 

TeI

This example is considerably more complex for two reasons : more atoms (16 in P 1), and almost the same scattering factors for both atom types. If ESPOIR succeeds here, this should mean that organic materials at least as complex as TeI could be solved from scratch by ESPOIR. Again, you can compare the performances in P 1 and in P -1.

in P 1, with 16 atoms in general position, for 20 tests starting from different random models (file tei.dat).

Test N°   Moves      Perm.   Events without  Starting R     Final R 
        accepted   accepted  fit improvement    (%)          (%)
  1        126       411           0            92.2          7.3
  2 to 11   failed
  12       127       436           0            89.1          5.7
  13-20     failed


in P -1, the above success rate (2/20) is reduced by a factor of 2, but ESPOIR still works (file teiC.dat).

Test N°   Moves      Perm.   Events without  Starting R     Final R 
        accepted   accepted  fit improvement    (%)          (%)
  1 to 16   failed
  17       100       189           0            97.0          5.4
  18-20     failed
Clearly, one should not expect Te and I to be well differentiated here (they are not, of course). Looking at interatomic distances would nevertheless allow the Te and I atoms to be recognized (no direct I-I contact, but Te-I and Te-Te contacts are allowed).
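
Interatomic distances are easy to compute from the fractional coordinates and the cell parameters. The Python sketch below uses the metric tensor of the cell and looks for the shortest contact among the neighbouring cell translations. It is not part of ESPOIR ; the function names are hypothetical, the coordinates are assumed to lie between 0 and 1, and the cell and positions in the example are made up for illustration.

```python
import itertools
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Metric tensor from cell lengths (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]


def shortest_distance(p, q, cell):
    """Shortest distance between p and any image of q in adjacent cells."""
    g = metric_tensor(*cell)
    best = float("inf")
    for shift in itertools.product((-1, 0, 1), repeat=3):
        d = [qi + s - pi for pi, qi, s in zip(p, q, shift)]
        d2 = sum(d[i] * g[i][j] * d[j] for i in range(3) for j in range(3))
        best = min(best, math.sqrt(max(d2, 0.0)))
    return best


# Hypothetical cell and positions, for illustration only
cell = (10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
print(shortest_distance((0.0, 0.0, 0.0), (0.9, 0.0, 0.0), cell))
```

Listing the shortest contacts around each atom is then enough to spot atoms that have to be relabelled.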
 

PbSO4

In P1, there are 24 atoms, but the large success rate below (9/20) is certainly due to the easy location of the 4 heavy Pb scatterers. Anyway, in the best solutions, the S and many of the O atoms were also located (file pbso4P1.dat).
 

Test N°   Moves      Perm.   Events without  Starting R     Final R 
        accepted   accepted  fit improvement    (%)          (%)
  1      51636      1267       24647            75.6          9.0
  2         failed
  3      55264      1335       26235            83.0          8.8
  4      57811      1447       27484            83.5         10.4
  5 to 10   failed
 11      58931      1485       28067            84.6         10.5
 12-13      failed
 14      55237      1245       26162            82.8          8.8
 15         failed
 16      54465      1317       25839            80.5         10.1
 17      56120      1304       26794            81.2         13.0
 18      55048      1228       26217            83.5         10.6
 19      51325      1479       24593            81.3          9.3
 20         failed
Determining the space group, finding a new origin, if any.

Finding symmetry elements can then be attempted by using PLATON on the name.spf output :

TITL Test on PbSO4                                             
CELL    8.4820   5.3980   6.9590  90.0000  90.0000  90.0000
SPGR P1
ATOM pb1    0.42876    0.65290    0.18283
ATOM pb2    0.81621    0.16184    0.49586
ATOM pb3    0.31170    0.14734    0.69931
ATOM pb4    0.93629    0.65219    0.02093
Testing with PLATON the above .spf file containing only the Pb atoms as proposed by ESPOIR is sufficient to retrieve the true space group (Pnma) :
====================================================================================================================================
ADDSYM - CHECK  (cf. MISSYM (C): Le Page, Y., J. Appl. Cryst. (1987), 20, 264-269; J. Appl. Cryst. (1988), 21, 983-984)
------------------------------------------------------------------------------------------------------------------------------------
 
- This ADDSYM Search is run on ALL NON-H Chemical Types
- Number of Input Atoms Included in Search  =     4
- Density based on Input Atom set = 4.319 g.cm-3 - Vol / Non-H atom = 79.7 Ang3
- The Structure implies the following Symmetry Elements subject to the Criteria:
   1.00 Deg., (metric)  0.25 Ang. (distances) and   0.45 Ang. (inv. and transl.)
 
Symm.  Input    Reduced (Ang)         (Deg)   (Ang)                 Input Cell
Elem Cell Row  Cell Row   d  Type Dot Angle  Max. dev.             x     y     z
--------------------------------------------------------------------------------
 a * [ 0 0 1]  [ 0-1 0]  6.959  2  1   0.00   0.070    through     0     0 0.102
                                           Pb2  -Pb3   Glide =   1/2     0     0
 m * [ 0 1 0]  [ 1 0 0]  5.398  2  1   0.00   0.048    through     0 0.653     0
                                           Pb2  -Pb2
 n * [ 1 0 0]  [ 0 0 1]  8.482  2  1   0.00   0.159    through 0.370     0     0
                                           Pb4  -Pb2   Glide =     0   1/2   1/2
-1 * ======================================   0.151    at      0.122 0.407 0.339
                                           Pb3  -Pb4
 
   Reduced->Convent         Input->Reduced       T = Input->Convent:    a' = T a
--------------------------------------------------------------------------------
(     0     0     1 )   (     0    -1     0 )   (    -1     0     0 )     Det(T)
(     1     0     0 ) X (     0     0     1 ) = (     0    -1     0 )       =
(     0     1     0 )   (    -1     0     0 )   (     0     0     1 )     1.000
 
 
Cell Lattice  a       b       c    alpha   beta  gamma Volume CrystalSystem Laue
--------------------------------------------------------------------------------
Input   aP  8.482   5.398   6.959  90.00  90.00  90.00    319    Triclinic    -1
Reduced  P  5.398   6.959   8.482  90.00  90.00  90.00    319
Convent oP  8.482   5.398   6.959  90.00  90.00  90.00    319 Orthorhombic   mmm
 
          Conventional, New or Pseudo Symmetry
================================================================================
 
Space Group Pnma                   No:  62, Laue:  mmm [Hall: -P 2ac 2n        ]
 
Lattice Type oP,  Centric, Orthorhombic, Order   8( 4) [Shoenflies: D2h^16     ]
 
  Nr            ***** Symmetry Operation(s) *****
 
   1               X ,              Y ,              Z
   2         1/2 - X ,            - Y ,        1/2 + Z
   3         1/2 + X ,        1/2 - Y ,        1/2 - Z
   4             - X ,        1/2 + Y ,            - Z
   5             - X ,            - Y ,            - Z
   6         1/2 + X ,              Y ,        1/2 - Z
   7         1/2 - X ,        1/2 + Y ,        1/2 + Z
   8               X ,        1/2 - Y ,              Z
 
:: Origin shifted to:-0.122,-0.407, 0.339 after transformation
 
 
:: * Symmetry Elements preceded by an Asterisk are New and indicate
:: Missed/Pseudosymmetry Summary
:: M/P Test on PbSO4                aP => oP 0.000 0.00 0.500 100% Pnma
The ESPOIR proposition of S and O atoms is only partially exact for PbSO4 in the P 1 space group, although many atoms are close to the right positions. This is understandable because of the heavy weight of the Pb atoms.

In the case of CuVO3 and TeI, the best ESPOIR propositions are almost fully correct in the P 1 space group, so that PLATON has no difficulty in locating the inversion center (the true structures are both P-1) from the whole set of atomic positions, provided the I atoms are also labelled Te, to allow for some misplacement.
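
If you need to rebuild such a PLATON input by hand (for instance to keep only the heavy atoms, as done for PbSO4), the layout of the .spf listing shown earlier is simple enough to be written by a few lines of Python. This sketch only mimics the TITL, CELL, SPGR and ATOM lines of that listing ; the function name write_spf is hypothetical, and the column widths are copied from the example, not taken from the PLATON documentation.

```python
def write_spf(title, cell, atoms):
    """Return the text of a minimal P1 .spf file like the PbSO4 listing."""
    lines = [f"TITL {title}"]
    lines.append("CELL " + "".join(f"{v:9.4f}" for v in cell))
    lines.append("SPGR P1")
    for label, x, y, z in atoms:
        lines.append(f"ATOM {label}{x:11.5f}{y:11.5f}{z:11.5f}")
    return "\n".join(lines) + "\n"


# Pb positions proposed by ESPOIR for PbSO4 in P1 (from the listing above)
pb_atoms = [
    ("pb1", 0.42876, 0.65290, 0.18283),
    ("pb2", 0.81621, 0.16184, 0.49586),
    ("pb3", 0.31170, 0.14734, 0.69931),
    ("pb4", 0.93629, 0.65219, 0.02093),
]
spf_text = write_spf("Test on PbSO4", (8.482, 5.398, 6.959, 90.0, 90.0, 90.0), pb_atoms)
print(spf_text)
```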

In Pnma, the problem is of course to decide which and how many atoms are on special positions. With Z = 4, there is little doubt that Pb and S are on special positions, but the question remains for the O atoms. You can postulate that, owing to the Pb weight, this will not be important and try all atoms in general position (this will only affect the scale factor).

The two ways to run the PbSO4 test case, either in automatic mode (all atoms on general position), or by guessing if atoms could be on special positions, are shown below.

Best final result at R=5.1%, automatic run (all occupation factors = 1.) :

pb 1    0.68700    0.24790    0.16692      1.000
s  1    0.56372    0.26778    0.68455      1.000
o  1    0.41881    0.95491    0.19341      1.000
o  2    0.40791    0.23496    0.59896      1.000
o  3    0.31096    0.75743    0.45229      1.000
and the corresponding starting pbso4.dat file :
Test on PbSO4 Pnma
8.482 5.398 6.959 90. 90. 90.
P N M A
1.54056 4 5 3
pb  s   o
1 1 3
1.0
1.0 0.005
5. 5. 5.
5.
20000
300000 100000
40000 0.3 2
10
10


The 10 tests produce a 100% success rate with 5.1 < R < 12.8 %. There is not much difference from one test using a set of positions produced by RANDOM and guessed occupation factors. Of course, this is still due to the high Pb weight.

Best result at R=5.5%, one run with guessed occupation factors

pb 1    0.81187    0.75229    0.16561      0.500
s  1    0.06220    0.26771    0.31318      0.500
o  1    0.08402    0.47410    0.19429      1.000
o  2    0.30732    0.74682    0.95795      0.500
o  3    0.90995    0.24051    0.40013      0.500


and the corresponding starting pbso41.dat file :
 

Test on PbSO4 Pnma
8.482 5.398 6.959 90. 90. 90.
P N M A
1.54056 4 5 3
pb  s   o
1 1 3
1.0
1.0 0.005
5. 5. 5.
5.
20000
300000 100000
40000 0.3 2
10
0
  0.9951962  0.2052748  0.1227539 0.5
  0.4912747  0.8526265  0.0630862 0.5
  0.3038542  0.9097319  0.2772063 1.0
  0.3659430  0.3213414  0.7311531 0.5
  0.0127584  0.2117186  0.1964869 0.5


Ba2CdP3O10(OH)

The structure of this compound is solved in Im2m ; however, all the other possible space groups had to be tried. This example illustrates a more complicated manual choice of the occupation numbers (file im2m.dat).
 

Ba2CdP3O10(OH) Im2m
11.9031   7.3407   5.5533  90.0  90.0  90.0
I M 2 M
1.54056 4 9 4
ba  cd  p   o
1 1 2 5
1.0
1.0 0.001
5. 5. 5. 5.
5.
20000
1000000 100000
40000 0.3 2
10
0
  0.3545507  0.4620205  0.2947413  0.5
  0.4923450  0.7782467  0.7220183  0.25
  0.0822101  0.1718911  0.3469331  0.5
  0.3218593  0.5036837  0.9342188  0.25
  0.5619266  0.7106229  0.0156163  1.0
  0.6033217  0.8264415  0.4910100  0.5
  0.5698345  0.9051973  0.0746788  0.5
  0.9920360  0.9374517  0.8545054  0.5
  0.8335328  0.2782474  0.2700855  0.25
and the result after 24 minutes on a Pentium II 333 MHz is :

    201 moves acc.  980206 tested; Chi**2=0.879E-01, R=0.088
    993 perm. acc.   98020 tested
         527 events did not improved the fit

 Final coordinates x,y,z and occupation numbers

ba 1    0.80476    0.53371    0.51686      0.500
cd 1    0.00181    0.80944    0.99895      0.250
p  1    0.87267    0.87782    0.71433      0.500
p  2    0.00169    0.39609    0.00658      0.250
o  1    0.33724    0.66500    0.95494      1.000
o  2    0.50340    0.00505    0.72417      0.500
o  3    0.09766    0.36232    0.03478      0.500
o  4    0.60878    0.52288    0.99350      0.500
o  5    0.38358    0.52165    0.99919      0.250


Many of the true positions are special positions with 0 or 1/2 coordinates. You should not expect ESPOIR to give you such exact values. You will have to look at the International Tables for Crystallography.
In such a case, the automatic mode will not work ! Can you easily guess those occupation numbers ? Not for all the oxygen atoms, but there are not many possibilities for the Ba, Cd and P atoms, so that a part of the solution is attainable, at least. Fortunately, many space groups do not present any special positions, and many organic compounds have all their atoms in general positions (like cimetidine in P21/n - last example below). Ahem, note that the final structure was found to be distorted in the monoclinic system, with beta=90.09°...
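
The guessing game can be made a little more systematic by listing which combinations of site multiplicities add up to the required number of atoms per cell. The Python sketch below does this for the present example. It rests on assumptions that are not stated in this manual : Z = 2 (inferred from the occupation numbers of im2m.dat), the Wyckoff multiplicities 8, 4 and 2 of Imm2 (No. 44) as listed in the International Tables, and H atoms ignored. The function name site_combinations is hypothetical.

```python
from itertools import combinations_with_replacement


def site_combinations(n_sites, atoms_per_cell, multiplicities):
    """List the ways n_sites Wyckoff sites can hold atoms_per_cell atoms."""
    ordered = sorted(multiplicities, reverse=True)
    return [
        combo
        for combo in combinations_with_replacement(ordered, n_sites)
        if sum(combo) == atoms_per_cell
    ]


IMM2_MULTIPLICITIES = (8, 4, 2)   # assumed : general 8, mirror 4, mm2 2

# (element, number of independent atoms, atoms per cell for Z = 2)
problem = [("Ba", 1, 4), ("Cd", 1, 2), ("P", 2, 6), ("O", 5, 22)]
for element, n_sites, total in problem:
    for combo in site_combinations(n_sites, total, IMM2_MULTIPLICITIES):
        print(element, combo, [m / 8 for m in combo])
```

With these assumptions there is a single choice for Ba, Cd and P, and two for the five O atoms, one of which is the set 1.0, 0.5, 0.5, 0.5, 0.25 used in im2m.dat.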

[Co(NH3)5CO3]NO3.H2O

In all the previous examples, the structure factors presented to ESPOIR were excellent ones (as provided by a single crystal study). The present case is the SDPD round robin sample I, for which no participant proposed a model (although it was solved by the organizers, see : Solid State Sciences 1, 1999, 55-62). Below is shown the ESPOIR performance on this compound with good data and with selected "|Fobs|" extracted from the Round Robin X-ray pattern.

- Good data in P 1 (file coamin.dat). ESPOIR works fine for this 30-atom problem, but you will have to apply PLATON to search for the missing symmetry operators, and some atoms are certainly misplaced or not well identified. Have a look at the 2 Co atoms related by a y+1/2 translation, which is a good sign :

    640 moves acc. 1981001 tested; Chi**2=0.783E-01, R=0.078
  22227 perm. acc.  198100 tested
       10743 events did not improved the fit

 Final coordinates x,y,z and occupation numbers

co 1    0.08745    0.07298    0.55174      1.000
co 2    0.73690    0.57190    0.43416      1.000
n  1    0.11759    0.56399    0.53891      1.000
n  2    0.72046    0.12305    0.95530      1.000
n  3    0.65802    0.71939    0.60319      1.000
n  4    0.81151    0.43511    0.25895      1.000
n  5    0.82421    0.44626    0.66004      1.000
n  6    0.10960    0.62280    0.02686      1.000
n  7    0.66342    0.70737    0.19529      1.000
n  8    0.34270    0.98788    0.61994      1.000
n  9    0.16966    0.20956    0.79560      1.000
n 10    0.47434    0.48721    0.35990      1.000
n 11    0.17457    0.22209    0.37140      1.000
n 12    0.01310    0.94498    0.32230      1.000
c  1    0.01814    0.94085    0.73032      1.000
c  2    0.14872    0.06910    0.54082      1.000
o  1    0.85626    0.20192    0.98168      1.000
o  2    0.31789    0.98017    0.11658      1.000
o  3    0.24503    0.69159    0.54010      1.000
o  4    0.71507    0.00071    0.87474      1.000
o  5    0.56270    0.19712    0.95573      1.000
o  6    0.27031    0.69413    0.03566      1.000
o  7    0.57408    0.18812    0.46404      1.000
o  8    0.16760    0.47710    0.60812      1.000
o  9    0.96780    0.70361    0.02286      1.000
o 10    0.11143    0.49537    0.11406      1.000
o 11    0.66362    0.99035    0.37812      1.000
o 12    0.50667    0.47603    0.88218      1.000
o 13    0.83425    0.17602    0.44959      1.000
o 14    0.99477    0.67122    0.53212      1.000


- Good data in P21, which is more direct (cop21.dat file) :

    344 moves acc. 1981123 tested; Chi**2=0.365E-01, R=0.037
  15846 perm. acc.  198112 tested
        7769 events did not improved the fit

 Final coordinates x,y,z and occupation numbers
 

co 1    0.81950    0.23785    0.93907      1.000
n  1    0.74516    0.10563    0.69804      1.000
n  2    0.89369    0.37553    0.76021      1.000
n  3    0.26168    0.59025    0.88503      1.000
n  4    0.09917    0.87128    0.82629      1.000
n  5    0.80493    0.69140    0.46587      1.000
n  6    0.56948    0.32317    0.87206      1.000
c  1    0.79570    0.70602    0.94995      1.000
o  1    0.34462    0.11217    0.53715      1.000
o  2    0.74427    0.83145    0.88826      1.000
o  3    0.07663    0.13665    0.04575      1.000
o  4    0.59402    0.33707    0.37846      1.000
o  5    0.05891    0.11206    0.51844      1.000
o  6    0.79752    0.81448    0.38107      1.000
o  7    0.34391    0.11612    0.04028      1.000
The success rate here is 4/40, with 17 hours of calculation on a Pentium II 266 MHz, and the structure is almost perfect, with no error in the C, N and O assignments...

Now, if we examine the result with real data, as extracted from the powder pattern distributed with the SDPD Round Robin, the result is certainly not as beautiful. Zhu5.hkl corresponds to a set of reflections extracted by the Le Bail method with Fullprof, excluding those reflections having a neighbouring one at less than 0.05 degrees 2-theta. Nearly 150 reflections (at the lowest angles) are used (for 15 atoms to be found in P21). Below are the best results :

    179 moves acc. 1980792 tested; Chi**2=0.193    , R=0.193
   5133 perm. acc.  198079 tested
        2639 events did not improved the fit

 Final coordinates x,y,z and occupation numbers

co 1    0.82093    0.36020    0.94044      1.000
n  1    0.37594    0.07268    0.14666      1.000
n  2    0.62631    0.90305    0.47297      1.000
n  3    0.83421    0.84308    0.44808      1.000
n  4    0.37878    0.39791    0.31452      1.000
n  5    0.78679    0.38652    0.44956      1.000
n  6    0.69884    0.09903    0.45740      1.000
c  1    0.16988    0.23056    0.21554      1.000
o  1    0.79673    0.36041    0.97713      1.000
o  2    0.63487    0.91408    0.05895      1.000
o  3    0.43222    0.33256    0.09743      1.000
o  4    0.43146    0.66930    0.73072      1.000
o  5    0.40683    0.95036    0.59913      1.000
o  6    0.78341    0.77434    0.91579      1.000
o  7    0.74343    0.48008    0.05556      1.000
This is not a complete solution but many atoms are already well placed, including the Co atom.
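
The selection rule used for Zhu5.hkl (drop every reflection that has a neighbour at less than 0.05 degrees 2-theta) can be sketched as follows in Python. This is not the OVERLAP program nor the procedure actually used by the author ; the function name drop_close_neighbours and the data layout (2-theta first in each tuple) are hypothetical, and the reflection list is made up for illustration.

```python
def drop_close_neighbours(reflections, min_gap=0.05):
    """Keep reflections whose nearest neighbours are at least min_gap away."""
    ordered = sorted(reflections, key=lambda r: r[0])
    kept = []
    for i, refl in enumerate(ordered):
        left_ok = i == 0 or refl[0] - ordered[i - 1][0] >= min_gap
        right_ok = (i == len(ordered) - 1
                    or ordered[i + 1][0] - refl[0] >= min_gap)
        if left_ok and right_ok:
            kept.append(refl)
    return kept


# Made-up data : (2-theta, h, k, l, "|Fobs|")
data = [
    (10.00, 1, 0, 0, 12.0), (10.03, 0, 1, 0, 30.0), (11.00, 0, 0, 1, 7.0),
    (12.00, 1, 1, 0, 55.0), (12.04, 1, 0, -1, 41.0), (12.50, 0, 1, 1, 9.0),
]
selected = drop_close_neighbours(data)
for refl in selected:
    print(refl)
```

Both members of a close pair are removed, so only the reflections that are isolated on both sides survive.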

Cimetidine

The success rate is of the order of 1/50. The best result is shown below with R = 3.7% after 8000000 moves and 7 hours of calculation (Pentium II 266 MHz). If you have a big and powerful computer at your disposal, try to run more tests on it, and let me know the result.

    432 moves acc. 7990498 tested; Chi**2=0.372E-01, R=0.037
  15679 perm. acc.  799049 tested
        7894 events did not improved the fit

 Final coordinates x,y,z and occupation numbers

s  1    0.98255    0.91396    0.69786      1.000
c  1    0.08635    0.83805    0.07537      1.000
c  2    0.64513    0.60618    0.14642      1.000
c  3    0.06324    0.65779    0.72502      1.000
c  4    0.72905    0.74969    0.67829      1.000
c  5    0.12604    0.54853    0.25767      1.000
c  6    0.90039    0.78266    0.20823      1.000
c  7    0.03221    0.78999    0.16923      1.000
c  8    0.46558    0.90546    0.90365      1.000
c  9    0.47914    0.41234    0.50753      1.000
c 10    0.73601    0.53650    0.28145      1.000
n  1    0.87490    0.26547    0.77556      1.000
n  2    0.40569    0.09148    0.11088      1.000
n  3    0.29048    0.31663    0.43408      1.000
n  4    0.33178    0.01164    0.35182      1.000
n  5    0.95375    0.62665    0.56134      1.000
n  6    0.63445    0.05413    0.25410      1.000


Molecular Replacement

Finding an octahedron
A powder pattern was calculated by using the SDPDRR sample I characteristics, but keeping only 2 CoN6 octahedra related by the 21 axis. Then ESPOIR 2 was run with (zhutest.dat) :

Finding an octahedron CoN6
7.662 9.626 7.072 90. 106.20 90.     
P 21
1.54056 4 7 2 1
0.25000  -0.26925   0.12779   0.04
co  n   
1 6     
1.5 0 1 0 1      
5. 2.0 2.5     
1. 0.01
7. 7.
2.
500   
5000 5000
5000 0.25 2
2
-10
10. 10. 10. 90. 90. 90.
   0.00   0.00   0.001  1.00
   0.207  0.00   0.001  1.00
  -0.207  0.00   0.001  1.00
   0.00   0.207  0.001  1.00
   0.00  -0.207  0.001  1.00
   0.00   0.00   0.208  1.00
   0.00   0.00  -0.206  1.00
Note that the chances of finding an octahedron are enhanced by a factor of 6 because of the equivalent positions which superpose the octahedron corners by 90° rotations. Thus the total number of events (rotations + translations) may be reduced to 5000 here for a full success ratio (10/10), using the first 100 reflections of the powder pattern. The final R values are in the range 0.150-0.052. Below is the best result. The original position was 0,0,0 for Co. The proposed position 0.49,0.15,0.51 is equivalent to it, because in the P21 space group the origin is free along y and may be shifted by 1/2 along x and z.
     16 rot+tr acc.    2500 gen. and    2499 tested; Chi**2=0.519E-01, R=0.052
      6 trans. acc.    2498 tested
      4 events did not improved the fit, dump = 0.000000
 Final coordinates x,y,z and occupation numbers
co 1    0.49117    0.15466    0.50805      1.000
n  1    0.73961    0.25510    0.56978      1.000
n  2    0.24274    0.05423    0.44631      1.000
n  3    0.56692    0.02909    0.30678      1.000
n  4    0.41542    0.28024    0.70931      1.000
n  5    0.38305    0.29745    0.28762      1.000
n  6    0.59930    0.01188    0.72847      1.000
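
A quick way to check that the located fragment is still a regular octahedron is to compute the Co-N distances from the final fractional coordinates and the cell parameters. The sketch below does this with the general triclinic metric formula; the helper name is illustrative and is not part of ESPOIR.

```python
import math


def metric_distance(cell, p, q):
    """Distance in Angstrom between two fractional positions p and q."""
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    dx, dy, dz = (qi - pi for pi, qi in zip(p, q))
    d2 = ((a * dx) ** 2 + (b * dy) ** 2 + (c * dz) ** 2
          + 2 * a * b * cg * dx * dy
          + 2 * a * c * cb * dx * dz
          + 2 * b * c * ca * dy * dz)
    return math.sqrt(d2)


CELL = (7.662, 9.626, 7.072, 90.0, 106.20, 90.0)  # from zhutest.dat
CO = (0.49117, 0.15466, 0.50805)
N_ATOMS = [
    (0.73961, 0.25510, 0.56978),
    (0.24274, 0.05423, 0.44631),
    (0.56692, 0.02909, 0.30678),
    (0.41542, 0.28024, 0.70931),
    (0.38305, 0.29745, 0.28762),
    (0.59930, 0.01188, 0.72847),
]

co_n = [metric_distance(CELL, CO, n) for n in N_ATOMS]
for i, d in enumerate(co_n, start=1):
    print(f"Co-N{i}: {d:.3f} A")
```

The six distances should all be close to the 2.07 Å of the starting model (0.207 in the 10 Å cubic cell), since the fragment is moved as a rigid body.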


[Co(NH3)5CO3]NO3.H2O
Now, the question is: will ESPOIR locate the CoN5O octahedra from the SDPDRR sample I data? In fact the C atom was added without completing the CO3 group, in order to build a CoN5OC unit. The test was made on the true SDPDRR data, using the first 100 extracted "|Fobs|" (zhu.dat) :
 

Test on Zhu3f
7.662 9.626 7.072 90. 106.20 90.     
P 21
1.54056 4 8 4 1
0.25000  -0.26925   0.12779   3.
co  n   c   o
1 5 1 1    
1.5 0 1 0 1      
5. 2.0 3.0 2.0
2.5 2.5 2.5
5. 1.25
2.5     
1. 0.005
7. 7. 7. 7.
2.
5000   
60000 30000
60000 0.30 2
4
-10
10. 10. 10. 90. 90. 90.
   0.00   0.00   0.001  1.00
   0.207  0.00   0.001  1.00
  -0.207  0.00   0.001  1.00
   0.00   0.207  0.001  1.00
   0.00  -0.207  0.001  1.00
   0.00   0.00   0.208  1.00
   0.00   0.00  -0.336  1.00
   0.00   0.00  -0.206  1.00
The final R values for 10 tests are in the range 0.296-0.327, with the best below :
     41 rot. acc.   45000 gen. and   44965 tested; Chi**2=0.296    , R=0.296
     10 trans. acc.   14999 tested
     13 events did not improved the fit, dump = 0.000000
 Final coordinates x,y,z and occupation numbers
co 1    0.79439    0.21918    0.44450      1.000
n  1    0.85490    0.10401    0.70181      1.000
n  2    0.73387    0.33436    0.18718      1.000
n  3    0.70579    0.38121    0.58665      1.000
n  4    0.88299    0.05716    0.30234      1.000
n  5    0.53432    0.13718    0.36395      1.000
c  1    0.21779    0.35268    0.57563      1.000
o  1    0.05446    0.30119    0.52504      1.000
This is to be compared to the positions in the publication (Solid State Sciences 1, 1999, 55-62) :
 
Co      0.8192     0.25*      0.4408      
N  5    0.899      0.119      0.668      
N  1    0.742      0.383      0.198      
N  2    0.746      0.395      0.626      
N  3    0.893      0.113      0.271      
N  4    0.567      0.162      0.364      
C  1    0.210      0.280      0.557      
O  1    0.081      0.351      0.537
*fixed
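
The comparison can be made quantitative by computing, atom by atom, the distance between the ESPOIR position and the published one. The sketch below assumes that the atoms are paired in the row order of the two lists above, and that the found model may be shifted along y so that Co sits at the published y = 0.25 (the origin is free along y in P21). The helper names are illustrative.

```python
import math

CELL = (7.662, 9.626, 7.072, 90.0, 106.20, 90.0)

# (label, ESPOIR position, published position), paired in row order
PAIRS = [
    ("Co", (0.79439, 0.21918, 0.44450), (0.8192, 0.25, 0.4408)),
    ("N a", (0.85490, 0.10401, 0.70181), (0.899, 0.119, 0.668)),
    ("N b", (0.73387, 0.33436, 0.18718), (0.742, 0.383, 0.198)),
    ("N c", (0.70579, 0.38121, 0.58665), (0.746, 0.395, 0.626)),
    ("N d", (0.88299, 0.05716, 0.30234), (0.893, 0.113, 0.271)),
    ("N e", (0.53432, 0.13718, 0.36395), (0.567, 0.162, 0.364)),
    ("C", (0.21779, 0.35268, 0.57563), (0.210, 0.280, 0.557)),
    ("O", (0.05446, 0.30119, 0.52504), (0.081, 0.351, 0.537)),
]


def distance(cell, p, q):
    """Shortest distance (Angstrom) between fractional positions p and q."""
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    # wrap each difference to the nearest lattice-equivalent value
    dx, dy, dz = (d - round(d) for d in (qi - pi for pi, qi in zip(p, q)))
    d2 = ((a * dx) ** 2 + (b * dy) ** 2 + (c * dz) ** 2
          + 2 * a * b * cg * dx * dy + 2 * a * c * cb * dx * dz
          + 2 * b * c * ca * dy * dz)
    return math.sqrt(d2)


# Assumption: align the free y origin on the Co atom
y_shift = PAIRS[0][2][1] - PAIRS[0][1][1]

deviations = {}
for label, found, published in PAIRS:
    shifted = (found[0], found[1] + y_shift, found[2])
    deviations[label] = distance(CELL, shifted, published)
    print(f"{label:4s} {deviations[label]:.2f} A")
```

With the numbers above, the Co, N and O atoms come out within about 0.5 Å of the published positions, while the C atom is about 1 Å away.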


So, it is a success, isn't it? The remaining atoms should be found by difference Fourier synthesis.

Cimetidine (pharmaceutical)
If you knew the full molecule of cimetidine, could you locate it with ESPOIR version 2 ? Undoubtedly yes (cim0.dat) :

Test on cimetidine C10H16N6S
10.7001 18.8206  6.8255  90.0 111.28  90.0
P 21/N
1.52904 4 17 3 1
0.01176 -0.00481 0.00223 3.
s   c   n   
1 10 6 
3.0 0 0 0 1
1.0 0.005
7. 7. 7.
2.
5000   
400000 100000
400000 0.35 2
4
-10
10.7001 18.8206  6.8255  90.0 111.28  90.0
        -0.01477  0.51390  0.29953  1.0000
         0.23174  0.35195  0.77730  1.0000   
         0.08631  0.43676  0.67645  1.0000
         0.02869  0.38828  0.76868  1.0000
        -0.10273  0.38175  0.80964  1.0000
         0.02448  0.51120  0.59391  1.0000   
         0.14323  0.49332  0.24463  1.0000   
         0.23598  0.56322  0.37858  1.0000   
         0.46070  0.50489  0.50155  1.0000   
         0.56444  0.44226  0.82692  1.0000   
         0.62392  0.54993  0.35794  1.0000   
         0.45268  0.47273  0.66131  1.0000   
         0.21030  0.41723  0.66634  1.0000   
         0.36874  0.54625  0.34632  1.0000   
         0.12808  0.33514  0.82688  1.0000   
         0.59352  0.50863  0.49126  1.0000   
         0.66651  0.58746  0.25019  1.0000
After 6 tests (300000 rotations + 100000 translations for each run, using the first 100 extracted "|Fobs|" for regenerating the powder pattern), the R values are in the range 0.514-0.121, with several proposals at R~0.25. The solution at R=0.121 is correct and good enough to start a subsequent Rietveld refinement.

Pyrene
This compound was used as an example for the GAP program (Genetic Algorithm Program, still unavailable) (Zeit. Kristallogr. 212, 1997, 550-552). With GAP, a solution for the whole C16D10 molecule (ToF neutron data) could be obtained in 33 seconds.

ESPOIR is not able to do as well. However, the C16 group is located here from X-ray data. The R values are in the range 0.140-0.317 for 10 tests (pyrene.dat) :

Test on Pyrene
13.649 9.256 8.470 90.0 100.28 90.0
P 21/A
1.52904 4 16 1 1
0.02511  -0.04562   0.03019  3
c      
16
3.0 0 1 0 1
1.3
1.0 0.01
7.
2.
2000   
400000 100000
400000 0.25 2
4
-10
   0.00  0.00 0.00  90.00  90.00 90.00
 4.2223   -0.3721    3.4328    1.00 
 4.6117    0.2277    2.2644    1.00 
 3.9412   -0.0713    1.0618    1.00 
 4.2967    0.5248   -0.1984    1.00 
 3.6721    0.2194   -1.3151    1.00 
 2.5940   -0.6831   -1.3384    1.00 
 1.8878   -1.0209   -2.5169    1.00 
 0.8355   -1.9345   -2.4719    1.00 
 0.4684   -2.5417   -1.3284    1.00 
 1.1167   -2.2177   -0.1092    1.00 
 0.7541   -2.8416    1.1301    1.00 
 1.3758   -2.5658    2.2552    1.00 
 2.4843   -1.6059    2.2694    1.00 
 3.1909   -1.3069    3.4678    1.00 
 2.8695   -1.0098    1.0859    1.00 
 2.1862   -1.3042   -0.1133    1.00
The Cartesian coordinates were taken from the Cambridge Structural Database (CSD); they are recognized as Cartesian because a=b=c=0 on the cell line that precedes them.
Testing 300000 rotations and 100000 translations (100 "|Fobs|" used for building the regenerated powder pattern) took 84 min on a Pentium II 266 MHz, which was loaded with 3 or 4 other calculations, so that the true CPU time could be ~20 min. Lower R values could probably be obtained if more runs were done. Anyway, at R=0.140, the structure is determined.
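
To compare the Cartesian pyrene model above with fractional coordinates, a conversion between the two is needed. The sketch below uses the common convention with a along x and b in the xy plane; this choice is an assumption, not necessarily the one made inside ESPOIR. A different convention only changes the starting orientation of the fragment, which is rotated at random anyway. The function names are illustrative.

```python
import math


def orthogonalization_matrix(cell):
    """Upper-triangular matrix M with cart = M . frac (a along x, b in xy)."""
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # reduced volume V / (a b c)
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [[a, b * cg, c * cb],
            [0.0, b * sg, c * (ca - cb * cg) / sg],
            [0.0, 0.0, c * v / sg]]


def frac_to_cart(cell, f):
    m = orthogonalization_matrix(cell)
    return tuple(sum(m[i][j] * f[j] for j in range(3)) for i in range(3))


def cart_to_frac(cell, x):
    """Invert the upper-triangular system by back-substitution."""
    m = orthogonalization_matrix(cell)
    fz = x[2] / m[2][2]
    fy = (x[1] - m[1][2] * fz) / m[1][1]
    fx = (x[0] - m[0][1] * fy - m[0][2] * fz) / m[0][0]
    return (fx, fy, fz)


PYRENE_CELL = (13.649, 9.256, 8.470, 90.0, 100.28, 90.0)
C_FIRST = (4.2223, -0.3721, 3.4328)   # first atom of the model
C_SECOND = (4.6117, 0.2277, 2.2644)   # second atom of the model

bond = math.dist(C_FIRST, C_SECOND)   # Cartesian distance in Angstrom
frac_first = cart_to_frac(PYRENE_CELL, C_FIRST)
back = frac_to_cart(PYRENE_CELL, frac_first)

print(f"first C-C distance: {bond:.3f} A")
print("fractional coordinates of first atom:", tuple(round(v, 4) for v in frac_first))
```

The first C-C distance comes out near 1.37 Å, a reasonable aromatic bond length, which is a cheap sanity check on the model before a long run.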

1-methylfluorene
The structure of this compound was determined by the OCTOPUS96 program (still unavailable, Monte Carlo approach) (J. Mater. Chem. 6, 1996, 1601-1604). The solution was obtained from a C13 fragment at move 18251 in a run of 20000 events, with Rwp = 33.7%.

The same conditions were used here: data range 5-55° (2-theta), with the C13 fragment. However, only the first 100 extracted "|Fobs|" were retained for building the regenerated powder pattern. The R values obtained for 10 runs are in the range 0.403-0.250 (success ratio: 2/10) for 400000 events (300000 rotations + 100000 translations, needing ~20 min for one run on a Pentium II 266 MHz). At R=0.25, the structure is determined, although one C atom is still missing. It should be located by difference Fourier synthesis (methyl.dat).

Test on 1-methylfluorene
14.2973   5.7011  12.3733  90.00  95.1060 90.
P 21/N
1.52904 4 13 1 1
0.02511  -0.04562   0.03019  3
c      
13
3.0 0 0 0 1
1.0 0.01
7.
2.
2000   
400000 100000
400000 0.25 2
4
-10
   14.297300   5.701100  12.3733  90.00  95.1060 90.
     0.29800  1.24300 -0.03500  1.00  
     0.37100  1.08300 -0.04400  1.00  
     0.39900  0.91000  0.03700  1.00  
     0.33900  0.86300  0.11900  1.00  
     0.26500  1.01800  0.12500  1.00  
     0.19700  1.02800  0.20900  1.00  
     0.18600  0.88300  0.29700  1.00  
     0.11300  0.93100  0.36300  1.00  
     0.05300  1.12000  0.33600  1.00  
     0.06100  1.25900  0.24400  1.00  
     0.13500  1.21300  0.18300  1.00  
     0.16300  1.52200  0.08400  1.00  
     0.24300  1.20000  0.05000  1.00


Tetracycline hydrochloride
This is the SDPDRR sample II. Two participants solved it before the deadline: one used a GOM (Global Optimization Method, program DRUID, still unavailable) and obtained the whole structure from the first 100 reflections extracted by the Pawley method; the other successful participant used the Patterson method. It was also solved after the deadline by the commercial PowderSolve software (MSI).

ESPOIR version 2 cannot currently cope with a fragment and an isolated atom (Cl here) together. But can the tetracycline fragment, without the Cl atom, be located from the SDPDRR data? Absolutely yes, by using the same model (tetracycline hexahydrate from CSD) that gave the success to the GOM procedure (testtetra.dat):

Test on tetracycline hydrochloride C22H25ClN2O8
10.980181  12.852233  15.733344  90.0  90.0  90.0
P 21 21 21
0.692 4 32 3 1
0.00600   0.00446   0.00257   3.
c   n   o
22 2 8
3.5 0 0 0 1
1.0 0.005
8. 8. 8.
2
5000   
400000 100000
400000 0.35 2
4
-10
   10. 10. 10. 90. 90. 90.
     -0.22081   -0.53245   -0.35051  1.0
     -0.27849   -0.66194   -0.37186  1.0
     -0.41052   -0.69479   -0.31734  1.0
     -0.20611   -0.75874   -0.44949  1.0
     -0.05317   -0.75703   -0.43833  1.0
     -0.00174   -0.76502   -0.68536  1.0
      0.14454   -0.87923   -0.52443  1.0
      0.01312   -0.62129   -0.40945  1.0
      0.07153   -0.62111   -0.26783  1.0
      0.15275   -0.49490   -0.23964  1.0
      0.19505   -0.48280   -0.09118  1.0
      0.27360   -0.60591   -0.04888  1.0
      0.27359   -0.35248   -0.07258  1.0
      0.37728   -0.34364    0.01915  1.0
      0.44414   -0.22175    0.03971  1.0
      0.40770   -0.10912   -0.03139  1.0
      0.30356   -0.11522   -0.12295  1.0
      0.23606   -0.23711   -0.14507  1.0
      0.12660   -0.24238   -0.24374  1.0
      0.07787   -0.37109   -0.28573  1.0
     -0.02680   -0.37653   -0.37392  1.0
     -0.08748   -0.50566   -0.42679  1.0
     -0.44952   -0.82301   -0.32213  1.0
      0.00533   -0.83355   -0.55292  1.0
     -0.26825   -0.44164   -0.28126  1.0
     -0.48680   -0.60976   -0.26366  1.0
     -0.25540   -0.85358   -0.51172  1.0
      0.07413   -0.47422   -0.01185  1.0
      0.27169   -0.00178   -0.19072  1.0
      0.07709   -0.13458   -0.28894  1.0
     -0.08530   -0.26761   -0.42377  1.0
     -0.12576   -0.48989   -0.56368  1.0
After 10 tests (300000 rotations + 100000 translations for each run, using the first 100 extracted "|Fobs|" for regenerating the powder pattern), the R values are in the range 0.368-0.264. The solution at R=0.264 is correct and good enough to start a subsequent Rietveld refinement and to find the remaining Cl atom by difference Fourier synthesis. You could also try to find it with ESPOIR, by fixing the tetracycline model (with amplitudes of moves = 0) and using the "scratch" option to locate the Cl atom...
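
The idea of keeping a fragment fixed while one extra atom moves at random can be illustrated with a toy model. The sketch below is not ESPOIR's algorithm: it assumes space group P1, point scatterers with made-up weights, a direct fit to |F| instead of a regenerated powder pattern, and it accepts a move only when R decreases. All names, positions and weights are hypothetical.

```python
import cmath
import math
import random


def amplitudes(atoms, hkls):
    """|F(hkl)| for point scatterers in P1; atoms = [(weight, (x, y, z)), ...]."""
    result = []
    for h, k, l in hkls:
        f = sum(w * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
                for w, (x, y, z) in atoms)
        result.append(abs(f))
    return result


def r_factor(f_obs, f_calc):
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def locate_free_atom(fixed, weight, f_obs, hkls, n_moves=3000, max_step=0.1, seed=1):
    """Random walk of one atom; the fixed atoms never move (amplitude 0)."""
    rng = random.Random(seed)
    pos = tuple(rng.random() for _ in range(3))
    r_start = r_factor(f_obs, amplitudes(fixed + [(weight, pos)], hkls))
    r_best = r_start
    for _ in range(n_moves):
        trial = tuple((p + rng.uniform(-max_step, max_step)) % 1.0 for p in pos)
        r = r_factor(f_obs, amplitudes(fixed + [(weight, trial)], hkls))
        if r < r_best:  # simplification: keep only moves that improve the fit
            pos, r_best = trial, r
    return pos, r_start, r_best


HKLS = [(h, k, l) for h in range(3) for k in range(3) for l in range(3)
        if (h, k, l) != (0, 0, 0)]
FIXED = [(6.0, (0.10, 0.20, 0.30)), (6.0, (0.25, 0.15, 0.40)), (8.0, (0.40, 0.30, 0.35))]
TRUE_FREE = (0.70, 0.60, 0.80)  # hidden position used to simulate the data
F_OBS = amplitudes(FIXED + [(17.0, TRUE_FREE)], HKLS)

position, r_start, r_end = locate_free_atom(FIXED, 17.0, F_OBS, HKLS)
print(f"R went from {r_start:.3f} to {r_end:.3f}")
print("final position:", tuple(round(v, 3) for v in position))
```

Such a strict downhill walk can get trapped in a false minimum, so several runs with different seeds may be needed, as with the real program.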

Last words

You may try to modify/improve ESPOIR in order to include the simultaneous search of several molecular fragments. If you do so, please contact me, and remember that you must keep the source code open (GNU license).

In the "scratch" option, ESPOIR uses "brute force", thanks to fast and cheap computers. Is it really more capable of solving your structure than the classical Patterson and direct methods? This is far from certain. Also have a look at the more conventional methods as described in the SDPD tutorial. More generally, visit the SDPD Database, and subscribe to the SDPD Mailing List.


Next to do in ESPOIR

Introduce the possibility of including several fragments to be simultaneously translated and rotated, or fragments + isolated atoms. Add the possibility of treating torsion angles.

If you want to help, or do the whole job, you are welcome !


*** You may have to find the best way by yourself, since this program did not exist at all last month!

GNU license
Copyright © 1999 - Armel Le Bail