Chemistry LibreTexts

1: Introduction to Data



    "Data" is the plural form of the Latin word "datum", which refers to a "fact" or "something given". In modern English it is usually treated as a mass noun, so the same form is used for a single piece of data and for a set of data (you do not say "give me the titration datas", but "give me the titration data"). Although the word refers to "information", its context differs between empirical science and computer science, and the difference is worth stepping back to look at. To the empirical scientist, data is used to understand natural (observable) phenomena; to the computer scientist, data is the digital representation of information, which often leads to computational tasks, algorithm development and information processing.

    • Data and the Empirical Scientist
      • Focus on Observation and Measurable Values
      • Raw material from which scientists draw conclusions and postulate hypotheses
      • Analyzed and interpreted to draw meaningful conclusions 
    • Data and the Computer Scientist
      • Represents digital information and how that information is stored
        • integers, floating-point numbers, string literals, boolean values
      • Databases and complex data structures
      • Emphasis on organizing, transforming and extracting meaning from large data sets
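The basic value types named in the list above can be illustrated with a minimal Python sketch. The titration record and all of its field names and values are hypothetical, chosen only for illustration. The last two lines show why floating-point values are usually compared with a tolerance rather than with exact equality.

```python
import math

# Hypothetical titration record; names and values are illustrative only.
burette_readings = 3            # integer
titrant_volume_ml = 24.35       # floating-point number
indicator = "phenolphthalein"   # string literal
endpoint_reached = True         # boolean value

record = {
    "burette_readings": burette_readings,
    "titrant_volume_ml": titrant_volume_ml,
    "indicator": indicator,
    "endpoint_reached": endpoint_reached,
}

# Map each field name to the name of the Python type that stores it.
type_names = {key: type(value).__name__ for key, value in record.items()}
print(type_names)

# Floating-point numbers are stored in binary, so some decimal sums are inexact.
sum_is_exact = (0.1 + 0.2 == 0.3)
sum_is_close = math.isclose(0.1 + 0.2, 0.3)
print(sum_is_exact, sum_is_close)
```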


    Empirical Science Data

    These are essentially the results of observations and measurements, and they fit into two broad categories:

    1. Qualitative Data
      • Descriptive features (rocky, sandy, wet, dry, 3 lobed leaves, 4 lobed leaves, 5 lobed leaves....)
      • Amenable to Boolean Algebra (true or false statements)
    2. Quantitative (Numerical) Data
      • Two Types
        • Counted - has a number and an entity (two moths)
          • Exact number without uncertainty
          • Can be represented by integer numbers
        • Measured - has a number, unit and entity (2.2 grams of moth)
          • Inexact number with uncertainty
          • Can be represented by floating-point numbers
      • Amenable to Arithmetic and Boolean Algebra
    3. Can be used to describe functional (causal) relationships between two or more variables
      • y = f(x1, x2, x3... )
        • y = dependent variable, x1, x2, x3... = independent variables
      • Stored in files
        • csv (comma separated values), tsv (tab separated values)
        • XML (eXtensible Markup Language) that include metadata
          • SensorML (Sensor Model Language, an Open Geospatial Consortium standard)
          • AnIML (Analytical Information Markup Language)
    4. Used to validate scientific theories
    5. Historically shared through printed artifacts (Gutenberg era publications)
    6. Digital Data
      1. Legacy Data excerpted from the primary literature (printed journals)
      2. Assay data acquired through automated techniques applied to empirical experiments
      3. IoT (Internet of Things) data deposited to online databases in real time through environmental monitoring
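As a rough sketch of how the categories above might look in a csv file, the following Python example parses a small in-memory table. The column names and all values are hypothetical. The qualitative column is read as a Boolean, the counted column as an integer, and the measured column as a floating-point number with a separately recorded uncertainty.

```python
import csv
import io

# Hypothetical field observations; every name and value is illustrative only.
CSV_TEXT = """site,is_wet,moth_count,moth_mass_g,mass_uncertainty_g
A,true,2,2.2,0.1
B,false,5,5.4,0.1
C,true,3,3.1,0.1
"""


def parse_row(row):
    """Convert the text fields of one csv row into typed Python values."""
    return {
        "site": row["site"],                                      # label
        "is_wet": row["is_wet"] == "true",                        # qualitative
        "moth_count": int(row["moth_count"]),                     # counted
        "moth_mass_g": float(row["moth_mass_g"]),                 # measured
        "mass_uncertainty_g": float(row["mass_uncertainty_g"]),   # uncertainty
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]

# Arithmetic on counted data.
total_moths = sum(r["moth_count"] for r in rows)

# Boolean selection on qualitative data.
wet_sites = [r["site"] for r in rows if r["is_wet"]]

# A simple derived quantity y = f(x1, x2): mean mass per moth at each site.
mass_per_moth = {r["site"]: r["moth_mass_g"] / r["moth_count"] for r in rows}

print(total_moths, wet_sites, mass_per_moth)
```

A plain csv file like this carries no metadata beyond its column names, which is one reason the XML-based formats listed above exist.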


    FAIR Data

    Figure 1: FAIR data principles (Wikimedia Commons, CC 4.0)


    Fourth paradigm science involves data-intensive discovery across often disparate data sets. For this to occur, the data needs to be Findable, Accessible, Interoperable and Reusable, which led to the FAIR data principles. Proper metadata structures are key to FAIR data.



    Metadata is data about the data, and it is critically important to understanding the meaning of the data. It can range from the parameters an instrument was operating with, to the solvent used, to factors like the temperature or GPS location at which the data was obtained. Traditionally the laboratory notebook was where the experimental parameters (metadata) were recorded, but science is evolving, with ELNs (Electronic Laboratory Notebooks) and LIMS (Laboratory Information Management Systems) enabling researchers across organizations to have instant access to the data of their collaborators, and metadata schemas make that data discoverable. Data standards efforts like SensorML and AnIML invest tremendous effort in developing schemas for capturing metadata, and tomorrow's scientists will need to be versed in these technologies. Furthermore, FAIR data principles can allow data to be shared by researchers who are performing seemingly unrelated studies. In one such case, data from the International Monitoring System of the global Comprehensive Nuclear-Test-Ban Treaty (CTBT), which was set up to detect nuclear explosions, was used for tracking icebergs and understanding climate change. (Note: the link is to the Wayback Machine of the Internet Archive.)
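To make the idea of keeping metadata alongside data concrete, here is a minimal Python sketch that stores a measurement and its experimental parameters in one JSON document. The field names and values are hypothetical and do not follow the SensorML or AnIML schemas. Real metadata standards define their own, much richer structures.

```python
import json

# Hypothetical measurement record; this is not a SensorML or AnIML document.
measurement = {
    "data": {
        "concentration_mM": [1.0, 2.0, 4.0],
        "absorbance": [0.12, 0.25, 0.49],
    },
    "metadata": {
        "instrument": "UV-Vis spectrophotometer",  # assumed instrument type
        "wavelength_nm": 520,                      # instrument parameter
        "solvent": "water",
        "temperature_C": 25.0,
        "gps": {"lat": 34.7, "lon": -92.3},        # made-up location
    },
}

# Serialize to text that can be shared, then read it back.
serialized = json.dumps(measurement, indent=2, sort_keys=True)
restored = json.loads(serialized)

print(serialized)
print(restored["metadata"]["solvent"])
```

Because the parameters travel with the values, a collaborator who receives only the serialized text can still tell under what conditions the numbers were obtained.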


    1: Introduction to Data is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
