CIF Syntax
Syntax for the CIF Format
We have already covered the syntax employed in CIFs by example. This section
gives a more formal summary of the rules, including some details we have
not yet considered.
- A text string is a string of printable ASCII characters bounded by
blanks, matching single quotes (') or double quotes ("), or (if the string
extends over several physical records) by a semicolon as the first
character of the first and trailing lines.
- A data name is a text string starting with an underline (_) character.
- A data item is a text string not starting with an underline, but
preceded by a data name to identify it.
- A data loop is a list of data names, preceded by loop_ and followed by
a list of data items.
- A data block is a collection of data names (looped or not) and data
items preceded by a data_xxxx code record (the xxxx represents an arbitrary
text string). A data name must be unique within a data block. A data block
is terminated by another data_ statement or by the end of file.
- A data file is a collection of data blocks. The block codes must be
unique within a data file.
- A hash character (#) introduces a comment: all further text to the end
of the line may be ignored.
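
The rules above can be sketched as a small Python reader. This is only a minimal illustration under stated assumptions, not a conforming CIF or STAR parser. The names TOKEN, tokenize, parse and SAMPLE are hypothetical, the data values in SAMPLE are invented, and the sketch assumes that a quoted string closes at a quote followed by a blank or the end of the line.

```python
import re

# One token per match: a comment, a quoted string, or a blank-delimited word.
TOKEN = re.compile(r"""
    (?P<comment>\#.*)            # hash character to the end of the line
  | '(?P<sq>.*?)'(?=\s|$)        # single-quoted string (closing rule assumed)
  | "(?P<dq>.*?)"(?=\s|$)        # double-quoted string (closing rule assumed)
  | (?P<bare>\S+)                # anything else bounded by blanks
""", re.VERBOSE)


def tokenize(text):
    """Yield (kind, value) pairs: kind is block, loop, name or item."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line.startswith(";"):
            # Multi-line text: runs until the next line starting with ';'.
            body = [line[1:]]
            while i < len(lines) and not lines[i].startswith(";"):
                body.append(lines[i])
                i += 1
            i += 1  # skip the closing semicolon line
            # Surrounding whitespace is dropped here for simplicity.
            yield "item", "\n".join(body).strip()
            continue
        for m in TOKEN.finditer(line):
            kind, word = m.lastgroup, m.group(m.lastgroup)
            if kind == "comment":
                continue
            if kind != "bare":
                yield "item", word          # quoted strings are always items
            elif word.lower().startswith("data_"):
                yield "block", word[5:]
            elif word.lower() == "loop_":
                yield "loop", word
            elif word.startswith("_"):
                yield "name", word
            else:
                yield "item", word


def parse(text):
    """Return {block code: {data name: value, or list of values if looped}}."""
    blocks, block = {}, None
    names, in_header, looped, count = [], False, False, 0
    for kind, value in tokenize(text):
        if kind == "block":
            if value in blocks:
                raise ValueError(f"duplicate block code: {value}")
            block = blocks[value] = {}
            names, in_header, looped = [], False, False
        elif block is None:
            raise ValueError("content found before any data_ record")
        elif kind == "loop":
            names, in_header, looped, count = [], True, True, 0
        elif kind == "name":
            if value in block:
                raise ValueError(f"duplicate data name: {value}")
            if in_header:
                names.append(value)
                block[value] = []
            else:
                names, looped = [value], False
                block[value] = None
        else:  # a data item
            in_header = False
            if not names:
                raise ValueError(f"data item without a data name: {value!r}")
            if looped:
                # Items fill the looped names in rotation.
                block[names[count % len(names)]].append(value)
                count += 1
            else:
                block[names[0]] = value
                names = []


    return blocks


# Invented sample data, for illustration only.
SAMPLE = """\
data_example   # the block code is 'example'
_cell_length_a  5.959(1)
_chemical_name_common 'sample compound'
loop_
_atom_site_label
_atom_site_fract_x
O1 0.5505(5)
C2 0.4009(5)
_publ_section_comment
; first line
second line
;
"""
```

Calling parse(SAMPLE) gives one block keyed by its block code, with plain strings for single data items and lists for looped ones. A repeated data name or block code raises ValueError, mirroring the uniqueness rules.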
These rules are a large subset of the syntax rules governing Self-Defining
Text Archive and Retrieval (STAR) files, as described by Hall (1991). The
Crystallographic Information File is a particular application of STAR, with
some additional restrictions to facilitate crystallographic use. These are:
- Lines must not exceed 80 characters in length.
- Data names and block codes may not exceed 32 characters in length, and
should be treated as case-insensitive. Note: this limit applies only to CIFs
that conform to the Core Dictionary Version 1. There is no formal
restriction in Version 2 (though in practice the length is restricted to 76
characters).
- Data items are recognised as being of number or character type. A text
string that is more than 80 characters long, and so extends over more than
one line, is of type text, which may be regarded as a subset of the
character type.
- A data item is of type number if it starts with a digit, plus, minus or
period, i.e. one of the characters [0-9+.-].
- A number may be given in integer, floating-point or scientific notation.
A trailing integer within parentheses is understood to be the estimated
standard deviation in the final digit(s) of the number.
- A data item is of type text if it extends over more than one line.
Semicolons as the first character of the first and last lines bound the
data.
- A data item is of type character if it is not a number or text.
- Only one level of loop_ is permitted. Nested loops must be stored as
lists within a text field.
- Numeric data with physical significance have a default unit stated in
the CIF Dictionary. Some alternative units are permitted for certain data
items. The indexing data name then has a units extension as specified in
the CIF Dictionary.
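
The typing rules and the parenthesised standard deviation can be illustrated with a short Python sketch. The helper names item_type, parse_number and long_lines are hypothetical. The sketch assumes that a multi-line value still contains its line breaks when it is classified, and the NUMBER pattern is a stricter check than the first-character rule stated above, added only so that the value can actually be converted.

```python
import re
from decimal import Decimal

# Integer, floating-point or scientific notation, with an optional trailing
# integer in parentheses. This pattern is an assumption for illustration.
NUMBER = re.compile(
    r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\((\d+)\))?"
)


def item_type(raw):
    """Classify a data item as 'text', 'number' or 'character'."""
    if "\n" in raw:
        return "text"                       # extends over more than one line
    if raw and raw[0] in "0123456789+-.":
        return "number"                     # first-character rule from the list
    return "character"


def parse_number(raw):
    """Return (value, esd) as Decimals; esd is None when none is given."""
    m = NUMBER.fullmatch(raw)
    if m is None:
        raise ValueError(f"not a recognisable number: {raw!r}")
    value = Decimal(m.group(1))
    if m.group(2) is None:
        return value, None
    # The esd applies to the final digit(s), so scale it to the position of
    # the last digit of the value.
    esd = Decimal(int(m.group(2))).scaleb(value.as_tuple().exponent)
    return value, esd


def long_lines(text, limit=80):
    """Return the 1-based numbers of lines longer than the limit."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]
```

For instance, parse_number("5.959(1)") separates the value 5.959 from a standard deviation of 0.001 in its last digit. Decimal is used rather than float so that the scaled standard deviation stays exact.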