2: Python Data Types


What is data?

To the chemist, data are the measured or counted values that are collected or produced to understand relationships among observable or computed phenomena germane to the practice of science (both empirical and computational). There are different types of data, defined by how the data are generated, like the mass or temperature of a sample, or the spectrum of a compound. These data are often stored on a computer in a file or database, and can subsequently be processed by various software programs.

To the computer scientist or software program, data has a different meaning: there are different data types that represent how the computer stores information. That is, a computer does not store a measured phenomenon like the temperature of a sample, but a digital data type, a representation of the temperature that a software agent can interact with. For example, a letter of the alphabet is a different type of data than a number, because you cannot do arithmetic calculations on letters like you do on numbers.

We need to understand both meanings of the concept of data. In this section we will learn how computers store data, and the different types of data from the perspective of programming and software agents. Then we will move on to data in the chemistry sense of the word.

What is a database?

Databases are a way for computers to store information so that it can be retrieved. You use databases all the time. Do you realize that as you read this web page you are using a database? This web page is not a digital file like an MS Word document, which saves the information like a sheet of paper. Instead, the web browser is displaying information that was pulled from a database as the page loaded. That is, LibreTexts is a Wiki hosted on the MindTouch knowledge management platform, and the information you see is drawn from a database when the page is loaded. Web pages that are pulled from databases are often called dynamic web content, and those that are files are called static web content. Of course, databases can store different types of information, and this class will be using databases that store information related to chemical compounds. But it is important to realize that the use of databases in the twenty-first century is pervasive.

How do databases store information?

Databases store data, which is the representation of information through a binary code that computing machines can read. A bit is the smallest binary value, with two possibilities: 0 or 1. This data needs to be stored on a physical medium so the machine can read it. In the old days data was stored on punch cards (Figure \(\PageIndex{1}\)), which allowed for a binary representation at each position, which could be either punched or not punched (bitten or not bitten). If each location of memory is allowed a certain number of bits, then you can generate different combinations, and give those different combinations different meanings.

Figure \(\PageIndex{1}\): 5081 data processing card containing a line of DOS JCL code. The code reads: //STEP2 EXEC PROC=SLINK,TESTPGM=DADK,ACCT=DADK (CC BY-SA; Dick Kutz at English Wikipedia)

Figure \(\PageIndex{2}\): Old Fortran punch card, one of the earliest computer based means for storing data (CC BY-SA; Arnold Reinhold)

A quick look at these possibilities shows that n bits gives 2ⁿ possible combinations.

1 bit has two (2¹) possibilities: 0 or 1, and so can represent two different things.
2 bits have four (2²) possibilities: 00, 01, 10, or 11, and so can represent four different things.
3 bits have eight (2³) possibilities: 000, 001, 010, 011, 100, 101, 110, or 111, and so can represent eight different things.
8 bits have 256 (2⁸) possibilities, ranging from 00000000 to 11111111, and we won't write them all down here.
n bits have 2ⁿ possibilities.
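The counting rule above can be checked with a short Python sketch. The helper name `bit_patterns` is made up for this illustration; it simply lists every combination of 0s and 1s for a given number of bits.

```python
from itertools import product


def bit_patterns(n):
    """Return every n-bit pattern as a string of 0s and 1s, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


for n in (1, 2, 3):
    patterns = bit_patterns(n)
    # The number of patterns always equals 2**n.
    print(n, "bit(s):", len(patterns), "patterns:", patterns)

# For 8 bits there are too many patterns to list, so we only count them.
print("8 bits:", len(bit_patterns(8)), "patterns, which equals 2**8 =", 2**8)
```

Each extra bit doubles the number of patterns, because every existing pattern can be extended with either a 0 or a 1.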

A byte of data is defined as 8 bits and so has 2⁸ or 256 possible values (which run from 0 to 255). In the early days of computers 8-bit memory chips were common, and the American Standard Code for Information Interchange (ASCII) was developed as one code that allows computers to interact with a keyboard and store text. ASCII itself is a 7-bit code that defines 128 characters (codes 0 to 127), and each character is typically stored in one 8-bit byte. Note that the first 32 ASCII codes (0 to 31), together with code 127, are unprintable codes used to control devices, and the remaining 95 codes are used to store printable symbols like the digits, punctuation and the letters of the alphabet.

Figure \(\PageIndex{3}\): ASCII Code. (CC BY-SA; Yuriy Arabskyy)
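As a small illustration of the ASCII table, Python's built-in `ord()` and `chr()` functions convert between a character and its integer code. The sample characters and the variable names below are arbitrary choices for this sketch.

```python
# ord() gives the integer code of a character; chr() goes the other way.
code = ord("A")
print(code)                 # 65
print(chr(code))            # A

# The same code written as an 8-bit binary pattern (one byte).
bits = format(code, "08b")
print(bits)                 # 01000001

# Encoding a short string as ASCII gives one byte per character.
data = "H2O".encode("ascii")
print(list(data))           # [72, 50, 79]
```

Note that the digit character "2" is stored as the code 50, not as the number 2, which is one reason text and numbers are treated as different data types.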

How does a hard drive work?

The take-home message here is that everything is stored on the computer in the form of binary bits, be it a text document, picture, molecular structure data or a spectral file. Each of these represents a different data type, so when you interact with a database you need to know what type of data is stored, and then use software that can "read" that type of data. Likewise, if you write a simple script to interact with data, you need to recognize the data type you are interacting with. For example, you can do math with numbers but not with letters, and so a number needs to be a different data type than a letter.
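The difference between numbers and letters can be seen directly in Python. The following is a minimal sketch with arbitrary values: the `+` operator adds numbers, joins strings, and raises an error when the two are mixed.

```python
number_sum = 2 + 3        # arithmetic on two integers
text_join = "2" + "3"     # joining two strings of characters
print(number_sum)         # 5
print(text_join)          # 23

# Mixing the two types is an error, because Python will not guess
# whether "2" should be treated as text or as a number.
try:
    "2" + 3
    mixed_ok = True
except TypeError as err:
    mixed_ok = False
    print("TypeError:", err)

# Converting the text to a number first makes the arithmetic possible.
converted_sum = int("2") + 3
print(converted_sum)      # 5
```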

Today we do not use punch cards, but we still store data as a binary representation on a physical device that can be read electronically, like magnetic tape, hard drives, flash drives, SSDs (solid-state drives) and the like. Magnetic storage devices work through the North-South alignment of a magnetic field, where one alignment (N-S) is given the value of 1, and the other (S-N) the value of 0. If you are interested in learning how a hard drive works, there is a very good 6-minute video on Nick Parlante's computer science page from Stanford. Flash drives and SSDs have no moving parts and are not based on magnetism, but represent ones and zeros by the ability of tiny channels (gates) within a transistor to conduct (1) or not conduct (0) electricity. It should be noted that after 10-20 years flash drives can lose their memory. In fact, surprisingly, magnetic tape is the longest-lasting digital storage, although it is the slowest to use.

Python Data Types

There are five basic data types built into Python: Numeric, Sequence, Boolean, Set and Dictionary types. Some of these types have subtypes. A variable can hold a value of any of these types, and to find the type of a value you can use the type() function.
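As a quick sketch, `type()` can be applied to one value from each of the categories named above. The sample values are arbitrary illustrations, not data from this course.

```python
samples = [
    42,             # integer (Numeric)
    3.14,           # float (Numeric)
    2 + 3j,         # complex (Numeric)
    "benzene",      # string (Sequence)
    [1, 2, 3],      # list (Sequence)
    (1, 2, 3),      # tuple (Sequence)
    True,           # Boolean
    {1, 2, 3},      # set
    {"C": 12.011},  # dictionary
]

for value in samples:
    # type(value) returns the class of the value; __name__ is its short name.
    print(repr(value), "->", type(value).__name__)

type_names = [type(value).__name__ for value in samples]
```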

Numeric Data Type

Numeric data types include:

Integers: whole numbers that are positive, negative or zero, with no decimal or fractional part. An integer can be of virtually any length, as there is no set limit in Python.

In Python integers can be represented as decimal, binary, octal and hexadecimal numbers. A sequence of decimal digits without a prefix will be interpreted by Python as a decimal number. See the table below for integer values with a base other than 10.

Prefix	Base	Interpretation
0b or 0B	2	Binary
0o or 0O	8	Octal
0x or 0X	16	Hexadecimal
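The prefixes in the table can be tried out directly. In this minimal sketch the same value, ten, is written in each base, and the built-in `bin()`, `oct()`, `hex()` and `int()` functions convert between the representations.

```python
from_binary = 0b1010   # base 2
from_octal = 0o12      # base 8
from_hex = 0xA         # base 16
print(from_binary, from_octal, from_hex)   # 10 10 10

# Going the other way: an integer shown as text in another base.
print(bin(10), oct(10), hex(10))           # 0b1010 0o12 0xa

# int() can also parse text written in a given base.
parsed = int("1010", 2)
print(parsed)                              # 10

# Python integers have no fixed size, so very large values still work.
big = 2**100
print(big)                                 # 1267650600228229401496703205376
```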

A float is a real number with a floating-point representation. All numbers written with a decimal point are floats. Numbers in scientific notation, containing the character “e” or “E”, are also considered floats.

Most platforms represent a Python float as a 64-bit “double-precision” value. This format is defined in the IEEE 754 standard for floating-point arithmetic, according to which a floating-point format is specified by a base (either binary or decimal), a precision, and an exponent range from e_min to e_max, where e_min = 1 - e_max.

Figure \(\PageIndex{4}\): 64-bit “double-precision” float representation. (Codekaizen / CC BY-SA; Wikimedia Commons)
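The sketch below shows a few float literals and one practical consequence of the binary representation: some decimal fractions cannot be stored exactly. It assumes a platform that uses IEEE 754 double precision, which is the usual case; the variable names are illustrative.

```python
import math
import sys

decimal_point = 1.5
scientific = 6.022e23      # scientific notation is also a float
print(type(decimal_point).__name__, type(scientific).__name__)   # float float
print(1e3)                                                        # 1000.0

# Number of binary digits in the significand (53 for IEEE 754 doubles).
print(sys.float_info.mant_dig)

# 0.1 and 0.2 have no exact binary representation, so their sum is
# very close to, but not exactly, 0.3.
exact_match = (0.1 + 0.2 == 0.3)
close_match = math.isclose(0.1 + 0.2, 0.3)
print(exact_match, close_match)
```

For this reason, floats from measurements or calculations are usually compared with a tolerance, for example with `math.isclose()`, rather than with `==`.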

The last category of Numeric data types is complex numbers. Complex numbers consist of a real part and an imaginary part. Imaginary numbers are real multiples of the imaginary unit j (written i in mathematics), which is defined as a number whose square is equal to -1.
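A short sketch of complex numbers in Python, using an arbitrary example value `z`:

```python
z = 3 + 4j
print(type(z).__name__)   # complex
print(z.real, z.imag)     # 3.0 4.0

# The magnitude (absolute value) of 3 + 4j is 5.
print(abs(z))             # 5.0

# The imaginary unit squared equals -1.
unit_squared = 1j * 1j
print(unit_squared)       # (-1+0j)

# The complex conjugate flips the sign of the imaginary part.
print(z.conjugate())      # (3-4j)
```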

Sequence Data Type

A sequence is an ordered collection of items, which may be of the same or of different data types. There are several sequence data types in Python:

String is a chain of Unicode characters. You can use single, double or triple quotes to indicate a string. There’s no separate data type for characters. In Python we use a string of length one for characters.
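The following sketch shows the three quoting styles and how a single character is just a string of length one. The compound names are arbitrary examples.

```python
single = 'NaCl'
double = "NaCl"
triple = """NaCl"""
# All three quoting styles produce the same string.
print(single == double == triple)   # True

name = "ethanol"
first = name[0]                     # indexing starts at 0
print(first, type(first).__name__, len(first))   # e str 1

# Slicing returns part of the string; len() counts its characters.
print(name[:3])                     # eth
print(len(name))                    # 7
```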

A list is an ordered collection of items that can be of different data types. A list can hold strings, numbers and other values, and you can also nest a list inside a list. Lists can be modified after they are created.
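A minimal sketch of a list that mixes data types, contains a nested list, and is then modified in place. The contents are made up for illustration.

```python
sample = ["water", 18.015, 3, ["H", "H", "O"]]

# Lists are mutable: items can be replaced and new items appended.
sample[2] = 4
sample.append(True)
print(sample)          # ['water', 18.015, 4, ['H', 'H', 'O'], True]

# Items of a nested list are reached with a second index.
print(sample[3][2])    # O
print(len(sample))     # 5
```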

A tuple is similar to a list, but once created it cannot be changed. In Python we use commas to separate the items in a tuple. Items in a tuple can be of any data type (string, integer, list, etc.).
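The sketch below, with arbitrary example values, shows that it is the comma that makes a tuple, and that trying to change an item raises an error.

```python
element = ("Na", 11, 22.99)
print(element[0], len(element))   # Na 3

# Tuples are immutable: assigning to an item raises a TypeError.
try:
    element[1] = 12
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)

# The comma, not the parentheses, creates the tuple.
packed = 1, 2, 3
one_item = (5,)
just_a_number = (5)
print(type(packed).__name__, type(one_item).__name__, type(just_a_number).__name__)
# tuple tuple int
```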

Boolean Data Type

This data type has only two possible values: True and False. Note that you have to use a capital “T” and “F” for these values to avoid an error. Non-Boolean objects can also be evaluated as True or False in Boolean expressions.
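As a brief sketch, the built-in `bool()` function shows how non-Boolean objects are evaluated: zero, empty containers and None count as False, and most other values count as True. The test values are arbitrary.

```python
print(type(True).__name__)   # bool
comparison = 5 > 3           # comparisons produce Boolean values
print(comparison)            # True

truthiness = {}
for value in (0, 1, "", "H", None):
    truthiness[repr(value)] = bool(value)
    print(repr(value), "->", bool(value))
print([], "->", bool([]))    # [] -> False
print([0], "->", bool([0]))  # [0] -> True

# Lowercase "true" is not a Boolean value, just an undefined name.
try:
    true
    lowercase_ok = True
except NameError as err:
    lowercase_ok = False
    print("NameError:", err)
```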