Chemistry LibreTexts

2.5: Structural Data Files

  • Page ID
    154857
  • Learning Objectives:

    • Gain understanding of chemical structural data files
    • Survey data formats
    • Survey molecular visualization and manipulation software and web services

     

    Introduction

    Structural data files are the files that software agents typically use when processing chemical structural information, and they can also contain additional information such as molecular spectra.  In principle, any structural data file has two major components: a simplified connection table and additional information.  The InChI line notation loosely mirrors this split, in that its main layer plays the role of the simplified connection table and the other layers carry the additional information.  One difference is that in a structural data file hydrogens can be implicit or explicit, whereas in the InChI they are explicit.  Accordingly, the different types of structural data files all have an atom table and a bond table.  Information about individual atoms, such as isotopic definitions, is associated with the atom table.  The atom table may also give the 3D coordinates associated with a specific environment; if that information is missing, software agents will use an energy minimization calculation to determine the 3D structure of the isolated molecule.

     

    In this section we will give a brief overview of the different types of structural data files and a survey of software programs and web services that can be used to display and manipulate them, with a focus on open source options.  There will be some overlap between these software programs and the next section on chemical resolvers, which allow you to convert between file types.

     

    Common Types of Structural Data Files

    There are a variety of file formats, and the most common are based on the MDL Molfile.  Of the Molfile versions, V2000 is the most common, although V3000 is also widely used.  The SDF (Structure Data File) is based on the Molfile, and Figure 2.5.1 shows an SDF file for acetone obtained through the NCI/CADD Chemical Identifier Resolver.

    Molfile  

     

    The following is a molfile for acetone obtained from the NCI chemical resolver.  All molfiles have a header block and a connection table (CTAB); the connection table starts with a counts line, which is followed by two blocks, the Atom Block and the Bond Block.

    The Header block has three lines.  The first gives the name or formula of the molecule (if known) and is of variable format.  The second gives the program that made the file, the date and time it was made, and whether 2D or 3D coordinates are given (the file in Figure 2.5.1 was created June 5, 2019 at 22:46 and has 3D coordinates).  The third is a comment line, which is often left blank.

    The counts line tells us acetone has 10 atoms and 9 bonds; it also provides the version number of the molfile (here V2000).
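    As a rough illustration of the header and counts line just described, the following Python sketch reads the first four lines of the acetone file shown in this section.  The function name parse_header and the dictionary keys are made up for this example, and the timestamp is located with a regular expression rather than by strict column position, which is a simplifying assumption rather than a full implementation of the format.

```python
import re
from datetime import datetime

# First four lines of the NCI acetone molfile from this section:
# title line, program/timestamp line, comment line, counts line.
HEADER_LINES = [
    "C3H6O",
    "APtclcactv06051922463D 0   0.00000     0.00000",
    "",
    " 10  9  0  0  0  0  0  0  0  0999 V2000",
]


def parse_header(lines):
    """Return a small dict describing a V2000 header and counts line.

    Hypothetical helper for illustration only.
    """
    title, program_line, _comment, counts = lines[:4]

    # Assumption: the timestamp is ten digits (MMDDYYHHmm) followed
    # directly by the dimensional code "2D" or "3D".
    stamp = re.search(r"(\d{10})(\dD)", program_line)
    created = datetime.strptime(stamp.group(1), "%m%d%y%H%M") if stamp else None

    return {
        "title": title.strip(),
        "created": created,
        "dimension": stamp.group(2) if stamp else None,
        # The counts line uses fixed three-character fields.
        "n_atoms": int(counts[0:3]),
        "n_bonds": int(counts[3:6]),
        "version": counts.split()[-1],
    }


info = parse_header(HEADER_LINES)
print(info)
```

    Reading the atom and bond counts by column position rather than by splitting on whitespace matters for larger molecules, where adjacent three-character fields can run together.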

     


    Figure 2.5.1: Molfile for acetone

     


    Activity 2.5.1

    Go to the NCI Chemical Identifier Resolver (https://cactus.nci.nih.gov/chemical/structure).  In the Structure Identifier box type "acetone", choose to convert to SD File, and submit.


    Compare your file to Figure 2.5.1; ideally, the only difference you will see is the date the file was generated.  We will discuss chemical resolvers in the next section.
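    The same request can also be made from a script.  The sketch below assumes the resolver's URL pattern of base address, identifier, and output representation separated by slashes, with "sdf" as the representation name; the function names resolver_url and fetch_structure are invented for this example, and the fetch needs a network connection.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the resolver used in the activity above.
BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier, representation="sdf"):
    """Build a resolver URL (hypothetical helper).

    Assumes the pattern BASE_URL/<identifier>/<representation>.
    The identifier is percent-encoded so that names or SMILES strings
    with special characters survive in a URL.
    """
    return f"{BASE_URL}/{quote(identifier, safe='')}/{representation}"


def fetch_structure(identifier, representation="sdf", timeout=30):
    """Download the requested representation as text (needs network access)."""
    with urlopen(resolver_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(resolver_url("acetone"))
    # Uncomment to contact the server:
    # print(fetch_structure("acetone"))
```

    Keeping the URL construction separate from the download makes it easy to check the address in a browser before running the request from code.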


    For an earlier offering of this class, Professor Bob Hanson at Saint Olaf College created a program called Hack-a-Mol, which we will use to explore data files.

    https://chemapps.stolaf.edu/jmol/jsmol/hackamol.htm

     


    Activity 2.5.2

    Open Hack-a-Mol in a new window and search NCI for "acetone".

    Now compare the resulting molfile to the one from Activity 2.5.1.  What is the difference, and can you explain what is going on?

     


    Activity 2.5.3

    Open a new browser window, load Hack-a-Mol, and search for "acetone", but use PubChem instead of NCI.

     

     


     

     

    Note that the atom numbers in the atom block are implicit, starting with 1 on the first line and running down to 10 on the last.  In the NCI file (Figure 2.5.1), the oxygen is atom 4.  From the bond block we see that atom 4 is attached to atom 2 (a carbon) by a double bond.  We also see that atom 2 is involved in two additional bonds, one to atom 1 and the other to atom 3, and both of those atoms are carbons.  The connection table defines the molecule's connectivity and, when coupled with 3D coordinates, gives its geometric shape.  In this particular table the hydrogens are included explicitly, but they could have been omitted.  Also note that the file ends with four dollar signs, $$$$.
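    The reading of the connection table described above can be sketched in a few lines of Python.  The example uses the NCI acetone molfile from this section; the helper names parse_ctab and neighbors are invented for illustration, and the parser only handles the simple V2000 layout seen here, so treat it as a sketch rather than a general-purpose reader.

```python
import math

# NCI acetone molfile from this section (header, counts line, atom block, bond block).
MOLFILE = """\
C3H6O
APtclcactv06051922463D 0   0.00000     0.00000

 10  9  0  0  0  0  0  0  0  0999 V2000
    1.3051    0.6772    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -0.0763   -0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3051    0.6772   -0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -1.2839   -0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.1059    1.7488   -0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.8767    0.4138    0.8900 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.8767    0.4138   -0.8900 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1059    1.7488    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8767    0.4138   -0.8900 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8767    0.4138    0.8900 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  2  4  2  0  0  0  0
  1  5  1  0  0  0  0
  1  6  1  0  0  0  0
  1  7  1  0  0  0  0
  3  8  1  0  0  0  0
  3  9  1  0  0  0  0
  3 10  1  0  0  0  0
M  END
"""


def parse_ctab(molfile_text):
    """Return (atoms, bonds) from a simple V2000 molfile (illustrative helper)."""
    lines = molfile_text.splitlines()
    counts = lines[3]  # the fourth line is the counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    # Atom numbers are implicit: the first atom line is atom 1, and so on.
    atoms = {}
    for number, line in enumerate(lines[4:4 + n_atoms], start=1):
        x, y, z, symbol = line.split()[:4]
        atoms[number] = (symbol, (float(x), float(y), float(z)))

    # Each bond line holds: first atom, second atom, bond order (3 characters each).
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


def neighbors(atom_number, atoms, bonds):
    """List (neighbor number, element symbol, bond order) for one atom."""
    found = []
    for first, second, order in bonds:
        if atom_number in (first, second):
            other = second if first == atom_number else first
            found.append((other, atoms[other][0], order))
    return found


atoms, bonds = parse_ctab(MOLFILE)
print(neighbors(2, atoms, bonds))                # atoms bonded to the carbonyl carbon
print(math.dist(atoms[2][1], atoms[4][1]))       # C=O distance from the 3D coordinates
```

    The first print lists what is attached to atom 2, and the second shows how the coordinates turn connectivity into geometry, here the length of the carbon-oxygen double bond in the units of the file.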

     

    Figure 2.5.2 is the same data

     

     

     

     

    C3H6O
    APtclcactv06051922463D 0   0.00000     0.00000
     
     10  9  0  0  0  0  0  0  0  0999 V2000
        1.3051    0.6772    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.0000   -0.0763   -0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -1.3051    0.6772   -0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -0.0000   -1.2839   -0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
        1.1059    1.7488   -0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
        1.8767    0.4138    0.8900 H   0  0  0  0  0  0  0  0  0  0  0  0
        1.8767    0.4138   -0.8900 H   0  0  0  0  0  0  0  0  0  0  0  0
       -1.1059    1.7488    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
       -1.8767    0.4138   -0.8900 H   0  0  0  0  0  0  0  0  0  0  0  0
       -1.8767    0.4138    0.8900 H   0  0  0  0  0  0  0  0  0  0  0  0
      1  2  1  0  0  0  0
      2  3  1  0  0  0  0
      2  4  2  0  0  0  0
      1  5  1  0  0  0  0
      1  6  1  0  0  0  0
      1  7  1  0  0  0  0
      3  8  1  0  0  0  0
      3  9  1  0  0  0  0
      3 10  1  0  0  0  0
    M  END
    
    ADDITIONAL INFORMATION CAN BE ADDED HERE
    $$$$
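    The additional information in an SD file sits between the "M  END" line and the closing "$$$$", as data items: a header line of the form "> <NAME>", one or more value lines, and a blank line.  The sketch below reads such items into a dictionary.  The sample text is an abbreviated excerpt built from values in the PubChem acetone record listed in this section, and the function name parse_data_items is invented for this example.

```python
import re

# Abbreviated tail of an SD record, using values from the PubChem
# acetone record in this section (most data items omitted).
SD_RECORD_TAIL = """\
M  END
> <PUBCHEM_COMPOUND_CID>
180

> <PUBCHEM_IUPAC_NAME>
propan-2-one

> <PUBCHEM_MOLECULAR_FORMULA>
C3H6O

$$$$
"""


def parse_data_items(sdf_text):
    """Collect the data items of the first record into a dict (illustrative helper)."""
    items = {}
    tag = None
    in_data = False
    for line in sdf_text.splitlines():
        if line.startswith("$$$$"):        # end of the record
            break
        if line.startswith("M  END"):      # end of the molfile part
            in_data = True
            continue
        if not in_data:
            continue
        if line.startswith(">"):           # data header, e.g. "> <NAME>"
            match = re.search(r"<([^>]+)>", line)
            tag = match.group(1) if match else None
            if tag is not None:
                items[tag] = []
        elif tag is not None and line.strip():
            items[tag].append(line)        # value line belonging to the current item
        else:
            tag = None                     # a blank line closes the current item
    return {name: "\n".join(values) for name, values in items.items()}


data = parse_data_items(SD_RECORD_TAIL)
print(data)
```

    Values are kept as text, since an SD file does not say whether an item is a number, a name, or an encoded fingerprint; converting them is left to the program that knows what each item means.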

     

    180
      -OEChem-06051922532D
    
     10  9  0     0  0  0  0  0  0999 V2000
        3.7320    0.7500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
        2.8660    0.2500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0000    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.8660   -0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.3100    1.2869    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
        1.4631    1.0600    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
        1.6900    0.2131    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
        2.2460   -0.7500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
        2.8660   -1.3700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
        3.4860   -0.7500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
      1  2  2  0  0  0  0
      2  3  1  0  0  0  0
      2  4  1  0  0  0  0
      3  5  1  0  0  0  0
      3  6  1  0  0  0  0
      3  7  1  0  0  0  0
      4  8  1  0  0  0  0
      4  9  1  0  0  0  0
      4 10  1  0  0  0  0
    M  END
    > <PUBCHEM_COMPOUND_CID>
    180
    
    > <PUBCHEM_COMPOUND_CANONICALIZED>
    1
    
    > <PUBCHEM_CACTVS_COMPLEXITY>
    26.3
    
    > <PUBCHEM_CACTVS_HBOND_ACCEPTOR>
    1
    
    > <PUBCHEM_CACTVS_HBOND_DONOR>
    0
    
    > <PUBCHEM_CACTVS_ROTATABLE_BOND>
    0
    
    > <PUBCHEM_CACTVS_SUBSKEYS>
    AAADcYBAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGgAAAAAACASAgAACAAAAAAAIAIAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
    
    > <PUBCHEM_IUPAC_OPENEYE_NAME>
    acetone
    
    > <PUBCHEM_IUPAC_CAS_NAME>
    2-propanone
    
    > <PUBCHEM_IUPAC_NAME_MARKUP>
    propan-2-one
    
    > <PUBCHEM_IUPAC_NAME>
    propan-2-one
    
    > <PUBCHEM_IUPAC_SYSTEMATIC_NAME>
    propan-2-one
    
    > <PUBCHEM_IUPAC_TRADITIONAL_NAME>
    acetone
    
    > <PUBCHEM_IUPAC_INCHI>
    InChI=1S/C3H6O/c1-3(2)4/h1-2H3
    
    > <PUBCHEM_IUPAC_INCHIKEY>
    CSCPPACGZOOCGX-UHFFFAOYSA-N
    
    > <PUBCHEM_XLOGP3_AA>
    -0.1
    
    > <PUBCHEM_EXACT_MASS>
    58.042
    
    > <PUBCHEM_MOLECULAR_FORMULA>
    C3H6O
    
    > <PUBCHEM_MOLECULAR_WEIGHT>
    58.08
    
    > <PUBCHEM_OPENEYE_CAN_SMILES>
    CC(=O)C
    
    > <PUBCHEM_OPENEYE_ISO_SMILES>
    CC(=O)C
    
    > <PUBCHEM_CACTVS_TPSA>
    17.1
    
    > <PUBCHEM_MONOISOTOPIC_WEIGHT>
    58.042
    
    > <PUBCHEM_TOTAL_CHARGE>

     

     

     

     

     

     

     

    Hack-a-Mol

    This page is designed especially for students of cheminformatics who are just starting to learn about how chemical structures are represented digitally.

    With this page you can draw a structure in 2D, compare that with its 3D structure, and also see its structural data in a variety of formats. You can also enter a chemical identifier -- a chemical name, a SMILES string, or a Chemical Abstracts Registry Number, for instance -- in the box under the JSmol window.

    If you hack the structural data (carefully!) and then press ENTER, the 2D and 3D structures will update.

    You can also drag-drop a structure file into the JSmol window or copy/paste it into the textarea.

    How It Works

    Author: Bob Hanson
     
     
     