Chemistry LibreTexts

1: String (atomic)


    In the pre-digital era, data was preserved in books and data tables; data excerpted from the primary literature is often classified as legacy data.  A string character is simply a digital representation of a letter, digit, or other symbol, and it cannot be used directly in mathematical operations: if you add the characters "1" and "1" you get "11", not 2.  That is, the result is two ones next to each other.

    Figure 2.12.1: Royal Typewriter (Public Domain, Wikimedia Commons); Christian Herald Press Room, 1898 (Public Domain, Wikimedia Commons)
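    The difference between adding characters and adding numbers can be seen in a short Python sketch. The variable names are illustrative only; the point is that "+" joins strings but sums integers.

```python
# Characters typed as text are strings, so "+" places them side by side.
text_sum = "1" + "1"

# Converting each character to an integer first gives ordinary arithmetic.
number_sum = int("1") + int("1")

print(text_sum)    # 11
print(number_sum)  # 2
```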

    ASCII (American Standard Code for Information Interchange)

    One of the first things that needed to be encoded if people were to move from typewriters to word processors was the characters of the alphabet. In the early days of computers the 8-bit byte became a common unit of memory, and the American Standard Code for Information Interchange (ASCII) was developed as a code that fits within one byte. ASCII, shown in the figure below, is one set of codes that allows computers to interact with a keyboard to store information.  Note that the first 32 ASCII codes (0 to 31) are non-printing codes used to control devices, and most of the remaining codes store symbols like digits and the letters of the alphabet.  We are calling this an atomic code level because each byte (8 bits) encodes a single character of the class string.  As we shall see, a word is also a string, but a string of multiple characters.

    Figure \(\PageIndex{3}\): ASCII Code. (CC BY-SA; Yuriy Arabskyy)
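    The split between control codes and printable characters in 7-bit ASCII can be checked with a minimal Python sketch. It relies on the built-in str.isprintable() method, which treats the space character as printable; the list names are illustrative.

```python
# The 128 code values of 7-bit ASCII, converted to one-character strings.
ascii_chars = [chr(code) for code in range(128)]

# str.isprintable() is False for control codes and True for letters,
# digits, punctuation, and the space character.
control = [c for c in ascii_chars if not c.isprintable()]
printable = [c for c in ascii_chars if c.isprintable()]

# Codes 0 to 31 are all control codes; code 127 (delete) is one as well.
first_32_are_control = all(not chr(code).isprintable() for code in range(32))

print(len(control), len(printable))  # 33 95
```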

    The original ASCII code was 7-bit (128 characters), and extended ASCII is 8-bit, so there are 256 characters.  The complete table can be seen at www.ascii-code.com, and a few representative entries are shown in the table below.

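    | Dec | Binary   | Symbol         | Description     |
    |-----|----------|----------------|-----------------|
    | 0   | 00000000 | (non-printing) | Null Char       |
    | 10  | 00001010 | (non-printing) | Line Feed       |
    | 48  | 00110000 | 0              | Zero            |
    | 68  | 01000100 | D              | Uppercase D     |
    | 100 | 01100100 | d              | Lowercase d     |
    | 128 | 10000000 | €              | Euro            |
    | 197 | 11000101 | Å              | Angstrom symbol |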
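    The entries in the table can be reproduced with a short Python sketch. The helper name describe is hypothetical, and the use of the Windows-1252 code page ("cp1252") for the values above 127 is an assumption about which extended ASCII table is meant; other 8-bit code pages assign different symbols to those values.

```python
def describe(code, encoding="ascii"):
    """Return (decimal value, 8-bit binary string, decoded character) for one byte."""
    bits = format(code, "08b")                 # zero-padded binary, 8 digits
    char = bytes([code]).decode(encoding)      # interpret the byte as a character
    return code, bits, char

# Values 0-127 are plain ASCII; 128 and 197 need an 8-bit code page
# (cp1252 is assumed here).
rows = [
    describe(48),
    describe(68),
    describe(100),
    describe(128, "cp1252"),
    describe(197, "cp1252"),
]

for dec, bits, char in rows:
    print(dec, bits, char)
```

    Going the other way, ord("D") returns 68, the decimal code of the character.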

    One of the shortcomings of extended ASCII is that a single byte allows only 256 possible representations, so there is a limit to the number of symbols (or commands like "tab") that can be encoded.

    Unicode & UTF-8 

    Work on Unicode began in the late 1980s, and the Unicode Consortium was incorporated as a public benefit (non-profit) corporation in 1991.  Unicode assigns a numeric code point to each character, and UTF-8 (Unicode Transformation Format, 8-bit) is a variable-length scheme for encoding those code points as bytes.  UTF-8 is backward compatible with ASCII, and its variable length of between 1 and 4 eight-bit bytes allows many more characters to be encoded (two unconstrained bytes alone would give 256 x 256 = 65,536 combinations, although UTF-8 reserves some bits in each byte to mark the length of the sequence).  This makes room for various symbols and sub/superscripts beyond the original ASCII.
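    The variable length of UTF-8 can be illustrated with a minimal Python sketch. The four sample characters are chosen only for illustration: an ASCII letter, the letter Å, the euro sign, and one character outside the first 65,536 code points.

```python
# One character from each UTF-8 length class (1 to 4 bytes).
samples = ["d", "\u00c5", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the number of bytes, and the bytes in hexadecimal.
    print(f"U+{ord(ch):04X}", len(encoded), encoded.hex())

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# Backward compatibility: an ASCII character has the same single byte in UTF-8.
same_as_ascii = "d".encode("utf-8") == "d".encode("ascii")
```

    Note that Å occupies one byte (197) in an 8-bit code page but two bytes in UTF-8, which is why the encoding of a file has to be known before its bytes can be read as text.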

    STIX fonts - the Scientific and Technical Information Exchange (STIX) project provides Unicode-based fonts for scientists and is integrated into Google Fonts.

     

    In Python, a string is a class that can hold multiple characters, and what is being described here is the representation of a single character of this class.  See the section on string (container) to learn about the methods associated with a string.
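    As a brief illustration, Python has no separate character type: a single character taken from a string is itself a string of length one. The variable names below are illustrative.

```python
word = "atom"          # a string of multiple characters
first = word[0]        # indexing returns a one-character string

print(type(word).__name__, len(word))    # str 4
print(type(first).__name__, len(first))  # str 1
```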

     

    References

    Information Science for Chemists by Stuart Chalk, 2015 Cheminformatics OLCC


    This page titled 1: String (atomic) is shared under a not declared license and was authored, remixed, and/or curated by Robert Belford.
