SMILES stands for Simplified Molecular Input Line Entry System
Introduction to SMILES
- SMILES is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions.
- The primary reason SMILES is more useful than a connection table is that it is a linguistic construct, rather than a computer data structure.
- SMILES is a true language, albeit with a simple vocabulary (atom and bond symbols) and only a few grammar rules.
- SMILES representations of structure can in turn be used as "words" in the vocabulary of other languages designed for storage of chemical information (information about chemicals) and chemical intelligence (information about chemistry).
A) Canonicalization
SMILES denotes a molecular structure as a graph with optional chiral indications. This is essentially the two-dimensional picture chemists draw to describe a molecule.
- "unique SMILES" - the one special generic SMILES that a canonicalization algorithm selects from among all valid SMILES for a given structure.
- "isomeric SMILES" - SMILES written with isotopic and chiral specifications.
Examples:
Input SMILES | Unique SMILES |
OCC | CCO |
[CH3][CH2][OH] | CCO |
OC(=O)C(Br)(Cl)N | NC(Cl)(Br)C(=O)O |
ClC(Br)(N)C(=O)O | NC(Cl)(Br)C(=O)O |
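The idea behind a unique SMILES can be illustrated with a brute-force sketch in Python. This is not the canonicalization algorithm used by Daylight or any toolkit: it simply writes out every valid spelling of a small molecule and keeps the shortest, then alphabetically first, one. It assumes an acyclic, non-aromatic molecule without charges or stereo, and that bracket atoms such as [CH3] carry their normal hydrogen count. The names `parse`, `spellings` and `canonical` are hypothetical.

```python
import re
from itertools import permutations, product

# Bracket atoms, two-letter organic-subset atoms, one-letter ones, bonds, parentheses.
# Assumption: ring-closure digits, aromatic atoms and stereo marks are NOT handled.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[-=#()]")


def parse(smiles):
    """Parse an acyclic, aliphatic SMILES into atom symbols and an adjacency map."""
    atoms, adj = [], {}
    stack, prev, bond = [], None, ""
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            stack.append(prev)               # remember the atom the branch hangs off
        elif tok == ")":
            prev = stack.pop()               # return to that atom after the branch
        elif tok in ("-", "=", "#"):
            bond = "" if tok == "-" else tok  # single bonds are written implicitly
        else:
            # Keep only the element symbol (assumes [CH3]-style atoms have normal H counts).
            sym = re.match(r"\[\d*([A-Z][a-z]?)", tok).group(1) if tok[0] == "[" else tok
            idx = len(atoms)
            atoms.append(sym)
            adj[idx] = {}
            if prev is not None:
                adj[prev][idx] = adj[idx][prev] = bond
            prev, bond = idx, ""
    return atoms, adj


def spellings(atoms, adj, node, parent):
    """Every SMILES string for the subtree at `node`, over all branch orderings."""
    children = [c for c in adj[node] if c != parent]
    options = [[adj[node][c] + s for s in spellings(atoms, adj, c, node)] for c in children]
    out = []
    for order in permutations(range(len(children))):
        for combo in product(*(options[i] for i in order)):
            # All but the last child become (branches); the last one continues the chain.
            branches = "".join(f"({p})" for p in combo[:-1])
            out.append(atoms[node] + branches + "".join(combo[-1:]))
    return out


def canonical(smiles):
    """Brute-force canonical form: the shortest, then alphabetically first, spelling."""
    atoms, adj = parse(smiles)
    candidates = (s for start in range(len(atoms)) for s in spellings(atoms, adj, start, None))
    return min(candidates, key=lambda s: (len(s), s))


if __name__ == "__main__":
    for smi in ["OCC", "[CH3][CH2][OH]", "OC(=O)C(Br)(Cl)N", "ClC(Br)(N)C(=O)O"]:
        print(f"{smi:20} -> {canonical(smi)}")
```

Because its ordering rule differs, this sketch returns BrC(C(=O)O)(Cl)N rather than the NC(Cl)(Br)C(=O)O shown in the table; what matters is that both input spellings give the same string. The search grows exponentially with the number of branches, which is why practical algorithms such as Weininger's CANON rank atoms by graph invariants instead of enumerating strings.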
B) Atoms
Atoms are represented by their atomic symbols. In general, each non-hydrogen atom is specified independently by its atomic symbol enclosed in square brackets, [ ]. Elements in the "organic subset" (B, C, N, O, P, S, F, Cl, Br, and I) may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with explicit bonds.
- Atoms in aromatic rings are specified by lower-case letters, while aliphatic atoms use the usual capitalized symbols: aliphatic carbon is written C.
- Aromatic carbon is written as lower-case c.
The following bare atomic symbols are valid SMILES; the attached hydrogens are implied by the normal valence.
Examples:
SMILES | Name | Formula |
C | methane | CH4 |
N | ammonia | NH3 |
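The hydrogen rule for the organic subset can be written as a small Python function. It is a sketch under stated assumptions: the normal valences listed (B 3; C 4; N 3 or 5; O 2; P 3 or 5; S 2, 4 or 6; halogens 1) follow the usual SMILES convention, charges and aromaticity are ignored, and `implicit_hydrogens` is a hypothetical helper name.

```python
# Normal valences of the organic subset, lowest first.
# Assumption: usual SMILES convention; charged and aromatic atoms are not handled.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol: str, bond_order_sum: int = 0) -> int:
    """Hydrogens implied for a bare (unbracketed) organic-subset atom."""
    for valence in NORMAL_VALENCES[symbol]:
        # Pick the lowest normal valence that can accommodate the explicit bonds.
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # more bonds than any normal valence: no hydrogens are implied


print(implicit_hydrogens("C"))     # -> 4 (methane, CH4)
print(implicit_hydrogens("N"))     # -> 3 (ammonia, NH3)
print(implicit_hydrogens("S", 3))  # -> 1 (three bonds round S up to valence 4)
```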
Atoms with valences other than "normal" and elements not in the "organic subset" must be described in brackets [ ].
Examples:
SMILES | Meaning |
[S] | elemental sulphur |
[Au] | elemental gold |
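The difference between bare and bracketed atoms shows up when a bracket atom is parsed: inside brackets, hydrogens are never implied, so [S] has zero hydrogens while a bare S would imply two (H2S). The sketch below uses a simplified regular expression (an assumption; it covers isotope, element symbol, chirality, hydrogen count and charge, but not atom classes or repeated charge signs such as ++), and `parse_bracket_atom` is a hypothetical name.

```python
import re

# Simplified bracket-atom grammar: [isotope? symbol chirality? H-count? charge?]
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?(?P<symbol>[A-Z][a-z]?|[cnops])(?P<chiral>@@?)?"
    r"(?P<h>H\d*)?(?P<charge>[+-]\d*)?\]"
)


def parse_bracket_atom(text):
    """Return (symbol, hydrogen_count, charge) read only from what is written."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported bracket atom: {text}")
    h = m["h"]
    hcount = 0 if h is None else int(h[1:] or 1)  # "H" alone means one hydrogen
    charge = 0
    if m["charge"]:
        magnitude = int(m["charge"][1:] or 1)     # "+" alone means +1
        charge = magnitude if m["charge"][0] == "+" else -magnitude
    return m["symbol"], hcount, charge


for atom in ["[S]", "[Au]", "[CH3]", "[NH4+]", "[Fe+2]"]:
    print(atom, parse_bracket_atom(atom))
```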
C) Bonds
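Single, double, triple, and aromatic bonds are represented by the symbols -, =, #, and :, respectively. Adjacent atoms are assumed to be joined by a single (or, between aromatic atoms, aromatic) bond, so - and : are usually omitted: CC means C-C.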
Examples:
SMILES | Name | Formula |
CC | ethane | CH3CH3 |
C=O | formaldehyde | CH2O |
O=C=O | carbon dioxide | CO2 |
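Bond symbols and implicit hydrogens together are enough to recover the formulas in the table. The Python sketch below handles only unbranched, ring-free chains of organic-subset atoms, assumes the usual SMILES normal valences, and writes the result in Hill order (C, then H, then the rest alphabetically); `chain_formula` is a hypothetical helper.

```python
import re
from collections import Counter

BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Assumption: usual SMILES normal valences, lowest first.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h(symbol, used):
    """Hydrogens needed to reach the lowest normal valence >= the bonds used."""
    return next((v - used for v in NORMAL_VALENCES[symbol] if v >= used), 0)


def chain_formula(smiles):
    """Molecular formula of an unbranched, ring-free chain like 'CC' or 'O=C=O'."""
    atoms, bonds, pending = [], [], 1
    for tok in re.findall(r"Cl|Br|[BCNOPSFI]|[-=#]", smiles):
        if tok in BOND_ORDER:
            pending = BOND_ORDER[tok]
        else:
            if atoms:
                bonds.append(pending)  # bond between the previous atom and this one
            atoms.append(tok)
            pending = 1                # no bond symbol means a single bond
    counts = Counter(atoms)
    for i, sym in enumerate(atoms):
        # Atom i touches bond i-1 on its left and bond i on its right.
        used = (bonds[i - 1] if i > 0 else 0) + (bonds[i] if i < len(bonds) else 0)
        counts["H"] += implicit_h(sym, used)
    if "C" in counts:
        order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order if counts[e])


for smi in ["CC", "C=O", "O=C=O"]:
    print(smi, chain_formula(smi))
```

Hill order writes ethane as C2H6 rather than CH3CH3, but the atom counts are the same.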
D) Branches
Branches are specified by enclosing them in parentheses, and can be nested or stacked. In all cases, the implicit connection to a parenthesized expression (a "branch") is to the left.
Example:
These are examples of SMILES that I made using ACD/ChemSketch.
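The branch rule of this section (a parenthesized group always attaches to the atom on its left) maps naturally onto a stack: on "(" remember the current atom, on ")" return to it. A minimal Python sketch, assuming ring-free, non-aromatic SMILES; `branch_bonds` is a hypothetical name, and the test string CC(C)C(=O)O is isobutyric acid.

```python
import re

# Bracket atoms, two-letter and one-letter organic-subset atoms, bonds, parentheses.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[-=#()]")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def branch_bonds(smiles):
    """Return (atoms, bonds) for a ring-free SMILES; each bond is (i, j, order)."""
    atoms, bonds = [], []
    stack, prev, order = [], None, 1
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            stack.append(prev)      # the branch hangs off the atom to its left
        elif tok == ")":
            prev = stack.pop()      # after the branch, continue from that atom
        elif tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        else:
            atoms.append(tok)
            i = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, i, order))
            prev, order = i, 1      # unwritten bonds are single
    return atoms, bonds


atoms, bonds = branch_bonds("CC(C)C(=O)O")
for i, j, order in bonds:
    print(f"{atoms[i]}{i} -{order}- {atoms[j]}{j}")
```

Because the stack holds one entry per open parenthesis, nested branches such as C(C(C)C)C and stacked branches such as C(C)(C)C fall out of the same two rules.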
SMILES is not only comprehensive but also well documented!