Chemistry LibreTexts

2.4: Strings and Lists


    Introduction

    In lecture A2.2: "Data and Operators" we introduced strings and lists as two of the basic Python data types and structures. Both strings and lists are ordered sequences of data items.  Strings are immutable, designated by quotation marks, and consist of only one kind of item, characters.  Lists are mutable, designated by square brackets, and can hold items of many different data types, such as strings, integers, floats, dictionaries, tuples, sets and even other lists.

    Strings and lists are classes of objects that were included with Python when you installed it, much like the built-in functions we covered in the last chapter (print(), input() and type()). Like functions, classes can also be imported from libraries or be user defined. A class defines a data structure and may have methods associated with it, which are like internal functions that operate on objects of the class. You call a method on an object by typing a period after the object's name, followed by the method name.  For example, the string method upper() converts all characters of a string to uppercase, so if you have a string named string_name, the expression string_name.upper() returns an uppercase copy of it.   In this chapter we are going to develop an understanding of how to identify and use the functions and methods associated with the classes string and list.
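
    As a minimal sketch of the period notation (the variable names are illustrative and not from the lecture), the string method upper() returns a new string and leaves the original unchanged, because strings are immutable, while a list method such as append() changes the list itself:

```python
# Illustrative names; not part of the original lecture code
element = "chlorine"
shouted = element.upper()      # upper() returns a new string
print(element, shouted)        # the original string is unchanged

halogens = ["fluorine", "chlorine"]
halogens.append("bromine")     # append() changes the list in place
print(halogens)
```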

    We will be running code in Script boxes that use Binder to access a Jupyter Hub that is running in the background of LibreTexts.  The first time you run code it will take a minute to load the kernel.  Once the kernel is loaded things should run quickly, and as long as you are running the same kernel instance, items you have created in earlier script boxes will be available for later ones.  If you are reopening a page you may need to rerun earlier code if the scripts you are running use items loaded or generated in earlier script boxes.

    The following code creates two variables, one a string and one a list, and then prints the data type of each.  Note that the last print statement prints the type of the print() function itself.

    Script \(\PageIndex{1}\): Identifying types

    Predict what the following code will do:

    s_string="this sentence counts 1 to 2"
    l_list=["1", 1,"2",2,s_string]
    print(type(s_string))
    print(type(l_list))
    print(type(print))

    Note:  The above script shows the type of each of the three Python objects: a string, a list and the built-in function print().

     

    In lecture A2.3 "Formatting and Logic Control Structures" we introduced how to format strings and logic control structures like for-loops. For-loops are often called for-in loops because you iterate with the pattern for [item] in [data structure of multiple items].  We can apply for-loops to the items in a list or string.

    Script \(\PageIndex{2}\) using for-loop to print a list or string

    Predict what the following code will do

    s_string="this sentence counts 1 to 2"
    l_list=["1", 1,"2",2,s_string]
    
    for s in s_string:
        print(s)
    print(10*" * ")
    for l in l_list:
        print(l)

    Note:  The above script uses a for-loop to print each character in the string, then ten stars, and then each item in the list, the last of which is the string itself.  Notice how the last item of the list prints on one line everything that was printed one character per line above the ten stars.

     

     

    Useful Functions

    We will also go over some useful functions for working with strings and lists.  In this section we will look at two types of functions. First are stand-alone functions that can take a string, list or other item as their arguments. Second are methods, which are associated with an object's class.  In this chapter we will look at the methods associated with objects of the classes string and list.

    len()

    The len() function is a built-in function that returns the number of items in an object.  

    Script \(\PageIndex{3}\): len() function

    Predict what the following code will do:

    s_string="this sentence has characters 50 (including spaces)"
    l_list=["1", 1,"2",2,s_string]
    
    print(f'The length of s_string is {len(s_string)} characters')
    print(f'The length of l_list is: {len(l_list)} items')
    
    #the following prints the length of the 5th item, which is s_string
    print(10*" * ")
    print(f"The fifth item in l_list is s_string and its length is: {len(l_list[4])} characters")

    Note:  In the above script we used the len() function to determine the number of characters in a string and the number of items in a list.  The last item of the list was the string, and we used an index number of 4 to get the number of characters of the string that was embedded in the list.  We will cover indexing in the very next section. We also used f-string formatting, which you need to become familiar with.

     

     

    range()

    range() is a built-in that generates integers and is very useful with for-loops, as it increments by integer values. It produces its values one at a time as they are needed rather than building the whole sequence in memory.  Later we will look at other functions that generate data, like random(), which creates random numbers. The syntax for the range function is:

    range(start, end, increment)

    • start - initial integer, default is zero
    • end - the first value not generated (it generates up to the end)
    • increment - default is one, allows you to increment by values other than one

     

    Script \(\PageIndex{4}\): Use of range() function

    Predict what the following code will do

    print(f'range(11) returns values from 0 to 10')
    for num in range(11):
        print(num)
    print("  ")
    print(f'range(5,11) returns values from 5 to 10')
    for num in range(5,11):
        print(num)
    print("  ")
    print(f'range(0,11,2) returns values from 0 to 10 in increments of 2')
    for num in range(0,11,2):
        print(num)

    Note:  The end value of range() is exclusive (less than, not less than or equal), so range(11) does not print 11, as 11 is not less than 11.
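
    A quick way to inspect what a range produces is to pass it to list(). The following sketch is an illustration rather than part of the original scripts; it also shows that a negative increment counts downward:

```python
# list() collects the integers that range() produces so they print on one line
print(list(range(11)))         # 0 through 10
print(list(range(5, 11)))      # 5 through 10
evens = list(range(0, 11, 2))  # even numbers 0 through 10
print(evens)

# A negative increment counts down; the end value (0) is still excluded
countdown = list(range(10, 0, -1))
print(countdown)
```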

     

    Strings

    Strings are ordered sequences of characters. In Python 3 these are Unicode characters, which include the ASCII characters as well as the symbols of multiple languages. As strings are ordered sequences they can be indexed, that is, each position in the string can be assigned an index number. By knowing the index number of a character in a string we can do a variety of string manipulations, like slicing (sometimes loosely called splicing) and concatenating. In slicing we cut strings into sections and in concatenation we combine strings.

    You may ask, why should students in an IOST class know techniques like string slicing and concatenation?  The reason is that the first step in dealing with data is cleaning it up. Numerical data is often transmitted over the internet as string characters, which not only need to be converted to numerical values, but also need to have extraneous items or even empty spaces removed.  This is the data workup aspect of data science.
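
    As a rough sketch of this kind of cleanup, assume a reading arrives as the text shown below (the reading and the variable names are hypothetical). The string methods strip() and split() remove the extra whitespace and break the text into pieces, and float() converts the numeric piece to a number:

```python
# Hypothetical raw reading, as it might arrive as text from a sensor
raw_reading = "  temp: 23.5 C \n"

cleaned = raw_reading.strip()               # remove leading/trailing whitespace
label, value_text, unit = cleaned.split()   # split on whitespace into three pieces
temperature = float(value_text)             # convert the numeric text to a float

print(label, temperature, unit)
print(type(value_text), type(temperature))
```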

    As covered in lecture A2.2: Data and Operators, strings in Python are identified by quotation marks.  To create a string variable you use the assignment operator, as in the following line, which creates a variable of the type string with the label s_string.

    s_string="Index numbers identify the position of a character in a string."

    String indexing

    Because a string is an ordered sequence of characters the location of each character can be indexed, as shown in the following image, where each character is assigned to an index number. Note the first index position is zero and empty spaces are indexed as if they contained a character.

    Figure \(\PageIndex{1}\): Index positions for the line of characters in a string (Copyright; Belford, CC0.0)

    The syntax for extracting a character by the index number is:

    string_variable_name[index number]

    Run the code and compare the output to Figure \(\PageIndex{1}\).

     

    Script \(\PageIndex{5}\): Index number basics

    Predict what the following code will do:

    s_string="Index numbers identify the position of a character in a string."
    print(type(s_string))
    print(s_string[0])
    print(s_string[5])
    print(s_string[10])
    print(s_string[62])

    Note:  The first output line verifies that s_string is indeed a string.  We then look at the character at each position by using its index number, written in square brackets ([i]). Note how s_string[5] prints what looks like an empty line; that is because the character at index 5 is a blank space.

     

     

    Predict \(\PageIndex{1}\):

    The string of Figure \(\PageIndex{1}\) has a final index position of 62.  Will the following two code statements give the same result? (You can change the code in script box \(\PageIndex{5}\) to test this).

    s_string="Index numbers identify the position of a character in a string."
    
    print(s_string[62])
    print(len(s_string))
    

     

    Exercise \(\PageIndex{1}\)

    Why is the length of the string in Predict \(\PageIndex{1}\) equal to 63 when the last character of the string is the period and has an index number of 62?

    Answer

    In Python the first index position is zero, so there are 63 index positions in the above string, with numbers ranging from 0 to 62.  You need to be careful: if you want the fifth character in the string, it is at index position 4, as the first character is at index position 0.

     

    Reverse indexing

    You can also index backward, which is often useful if you need to operate on the last character of a string.  Note, negative zero is the same as positive zero, and so the initial position in reverse indexing is -1.

    Script \(\PageIndex{6}\): Reverse Indexing

    The following code uses reverse index numbers. Can you predict what the code will do?

    s_string="Index numbers identify the position of a character in a string."
    
    print(s_string[-1])
    print(s_string[-2])

    Note:  Reverse indexing is very valuable as you often need to manipulate values at the end of a string.

     

    Script \(\PageIndex{7}\): More reverse string indexing

    Which of the following print statements will give a different value?

    s_string="Index numbers identify the position of a character in a string."
    print(s_string[0])
    print(s_string[-0])
    print(s_string[-63])
    print(s_string[-len(s_string)])

    Note how index numbers [0], [-0], [-63] and [-len(s_string)] are all the same character, the first character, capital I. Note how the last output line uses a function to generate the index number.

     

    Exercise \(\PageIndex{2}\)

    From the above code we see the value of print(s_string[-63]) is the same as print(s_string[0]).  What is the value of print(s_string[63])?

    Answer

    The program will give an "index out of range" error. Forward index numbers start at zero and run from 0 to 62, while reverse index numbers start at -1 and run from -1 to -63, so -63 is a valid index number but 63 is not.
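
    The following sketch illustrates the answer. It is not one of the original scripts; it uses a try/except block so the error is reported without stopping the program:

```python
s_string = "Index numbers identify the position of a character in a string."

last = len(s_string) - 1        # the last forward index is one less than the length
print(last, s_string[last])     # index 62 holds the period
print(s_string[-63])            # the first character, reached by reverse indexing

try:
    print(s_string[63])         # one position past the end of the string
except IndexError as error:
    print("IndexError:", error)
```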

    String slicing

    String slicing (called splicing in some texts) uses index numbers to grab sections of a string.

    string_name[:Index] - gives all values before the index number
    string_name[Index:] - gives the value at the index number and all values after
    string_name[Index1:Index2] - gives the value at the first index number and all values up to (but not including) the final index number
    string_name[Index1:Index2:i] - gives the values from the first index number up to (but not including) the final index number, in increments of "i"

     

    Script \(\PageIndex{8}\)

    The following code slices two strings, one with numbers and one with letters (words). Predict the output for the following, and then run the code.

    given:    

    • num_string="0123456789"
    • s_string="Index numbers identify the position of a character in a string."

    predict

    • s_string[:10]
    • s_string[10:]
    • s_string[:10]+s_string[10:]
    • num_string[0:len(num_string):2]
    • num_string[::-1]
    • num_string[1:9:3]

    Solution

    num_string="0123456789"
    s_string="Index numbers identify the position of a character in a string."
    
    print(f'i=index number\n')
    
    print(f's_string is "{s_string}"\n')
    print(f'  Using s_string[:10] \n    print first 10 characters (i=0 to 9).\n{s_string[:10]}\n')
    print(f'  Using s_string[10:] \n    print everything after 10th character (i=10 and beyond).\n{s_string[10:]}\n')
    print(f'  Using s_string[:10]+s_string[10:] \n    combine the first 10 characters with everything after.\n{s_string[:10]+s_string[10:]}\n')
    
    print(f'num_string is "{num_string}"\n')
    print(f'    using the len() function \n  print every other value of num_string \n{num_string[0:len(num_string):2]}\n')
    print(f'    using reverse indexing \n  print num_string in reverse order \n{num_string[::-1]}\n')
    print(f'    using num_string[1:9:3] \n   print every third character from i=1 up to i=9 \n{num_string[1:9:3]}')
    

     

     

    String methods

    Methods are like functions that belong to a class. They are called on an object of the class by placing a period after the object's name, followed by the method name.  You can identify the available methods by typing a period after an object's name and pressing the tab key.

    Figure \(\PageIndex{2}\): You can identify available methods by pressing tab after you type a period after a string object. (Copyright; Belford CC0.0)

     

    kite

    Kite has a very valuable lookup service that can help you identify the methods available for various Python objects (https://www.kite.com/python/docs/).  If you choose to install the Anaconda or miniconda package in this class you can also run the Spyder IDE, and Kite has an AI code completion app that can be added to IDEs like Spyder, but not Thonny. We will not be doing that much coding in this course, but their Python docs lookup service will be very useful once you start importing libraries and want to work with custom class methods.

    It should be noted that many of these methods are available for more than one object class. The first two methods we will look at are join() and split(), which in a way are opposite processes.  In a join() operation we take multiple objects, like the items of a list, and join them into a single string, separating them by some separator, which could be a blank space.  In a split() we take a single string and break it into multiple strings in a list.

    .join()

    - takes the items of an iterable (list, string, tuple) and joins them into a string, separated by a separator string, which can be nothing ("")

    Script \(\PageIndex{9}\): join method

    Predict the output for the following code.

    halogen=['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine','tennessine']
    print('1.',halogen,'\n',type(halogen),'\n')
    
    name1_halogen=' | '.join(halogen)
    print('2.',name1_halogen,'\n',type(name1_halogen),'\n')
    
    print('3.',' '.join(halogen),'\n',type(halogen),'\n')
    
    halogen=' '.join(halogen)
    print('4.',halogen,'\n',type(halogen),'\n')
    Note:  Pay attention to the type reported on each of the four output lines.

    Exercise \(\PageIndex{3}\)

    Why, in the above code, do output lines 1 & 3 report the type list, while output lines 2 & 4 report the type string?

    Answer

    Line one simply prints the initial halogen list. In line three the .join() method was used inside the print statement, so the joined string was printed but never assigned to a variable, and type(halogen) still reports the original list.  In lines two and four the result of the join statement was assigned to a variable (name1_halogen in line two, halogen itself in line four), and that variable is a string.
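
    One point the script above does not show is that join() only accepts strings. As a sketch under that assumption, the list of atomic numbers used later in this chapter must have each integer converted with str() before joining:

```python
z_halogen = [9, 17, 35, 53, 85, 117]

# ', '.join(z_halogen) would raise a TypeError because the items are integers,
# so each item is converted to a string first
z_text = ', '.join(str(z) for z in z_halogen)
print(z_text, type(z_text))
```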

     

    .split()

    The split method uses a delimiter to break a string into a list of strings. If the delimiter is not specified, the string is split on whitespace. The syntax is

    string_object.split(delimiter, maxsplit)

    maxsplit is the maximum number of splits to make, so the list contains at most maxsplit+1 strings (zero gives one string).

    Script \(\PageIndex{10}\): Split() method

    Predict the output for the following code.

    halogen_string='fluorine, chlorine, bromine, iodine, astatine,tennessine'
    print(halogen_string,type(halogen_string),'\n')
    halogen_list=halogen_string.split(',',len(halogen_string))
    print(halogen_list,type(halogen_list))
    Note:  In the above script we used the length of the string as the maximum number of splits, which is more than enough to split at every comma. Try changing it to a number like zero, or removing it entirely.
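
    Splitting on the comma alone leaves the blank space after each comma attached to the following item. A minimal cleanup sketch (the names raw_items and clean_items are illustrative) removes it with strip(), and also shows the effect of a small maximum number of splits:

```python
halogen_string = 'fluorine, chlorine, bromine, iodine, astatine,tennessine'

raw_items = halogen_string.split(',')                # some items keep a leading space
clean_items = [item.strip() for item in raw_items]   # strip() removes surrounding whitespace
print(raw_items)
print(clean_items)

# A maximum of 2 splits gives a list of three strings
first_three = halogen_string.split(',', 2)
print(first_three)
```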

     


    .replace()

    The replace method returns a copy of a string in which every occurrence of one substring is replaced with another.  The syntax is:

    string_object.replace(old, new, max occurrences)

    Script \(\PageIndex{11}\):Replace method

    Predict the output for the following code

    initial_string='You wish to make 500 mL of a solution with a 500 ml volumetric flask'
    print(initial_string)
    final_string=initial_string.replace('500','250')
    print(final_string)
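
    The optional third argument limits how many occurrences are replaced. The sketch below reuses the string from the script above; it also illustrates that replace() is case sensitive:

```python
initial_string = 'You wish to make 500 mL of a solution with a 500 ml volumetric flask'

# A maximum of 1 replaces only the first occurrence of '500'
first_only = initial_string.replace('500', '250', 1)
print(first_only)

# replace() is case sensitive: 'ml' matches only the lowercase unit
fixed_units = initial_string.replace('ml', 'mL')
print(fixed_units)
```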

     

     

    method          syntax                           function
    .capitalize()   .capitalize()                    capitalizes the first letter of the string
    .count()        .count(substring, start, end)    counts how many times a substring occurs; start and end are optional index numbers
    .endswith()     .endswith(suffix, start, end)    returns a Boolean (True/False)
    .find()         .find(substring, start, end)     gives the lowest index of the substring, or -1 if it is not found
    .index()        .index(substring, start, end)    gives the lowest index of the substring; raises an error if it is not found
    .upper()        .upper()                         converts the entire string to upper case
    .lower()        .lower()                         converts the entire string to lower case
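
    The following short sketch, which is not one of the original scripts, tries several of these methods on the string used earlier in this chapter:

```python
s_string = "Index numbers identify the position of a character in a string."

print(s_string.count('i'))           # number of lowercase 'i' characters
print(s_string.endswith('.'))        # True, the string ends with a period
where = s_string.find('position')    # index where 'position' starts
print(where, s_string[where:where + 8])
print(s_string.find('xyz'))          # -1, because 'xyz' is not in the string
print(s_string.upper())
print('chlorine'.capitalize())       # Chlorine
```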

     

    Lists

    Lists are a Python object class that represents one of the fundamental data structures and are similar to strings in that they are ordered sequences of items.  But a list is fundamentally different in that the items of a list can be of different data types (those of a string must be characters) and the items of the list are mutable.  That is, the items of a list can be replaced. The syntax for a list is to use square brackets to define the list and to separate the items with commas.

    [item1,item2,item3,...,itemn]

    The following code shows four lists: one of strings, one of integers, one of floats, and a fourth list in which the first three lists are nested. So the list "halogen" holds the names of the halogens, their atomic numbers and their atomic weights.

    Script \(\PageIndex{12}\): lists of various types of items

    Predict the output of the following code

    name_halogen=['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine','tennessine']
    z_halogen=[9,17,35,53,85,117]
    a_halogen=[18.99840316,35.45,79.90,126.9045,209.98715,294.211]
    halogen=[name_halogen, z_halogen,a_halogen]
    
    print(name_halogen,'\n',a_halogen,'\n',z_halogen,'\n',10*' *!* ','\n',halogen)
    Note:  The first list contains strings, the second integers, the third floats, and the fourth is a list of lists.

     

    Script \(\PageIndex{13}\):Single line list assignment

    Compare the assignment statement for this script with the last one. Is there a difference in the value of halogen?

    name_halogen,z_halogen,a_halogen=['fluorine', 'chlorine', 'bromine','iodine','astatine',
        'tennessine'],[9,17,35,53,85,117],[18.99840316,35.45,79.90,126.9045,209.98715,294.211]
    halogen=[name_halogen, z_halogen,a_halogen]
    
    print(halogen)
    Note:  Here we created the three lists in one line of code (continued onto a second line inside the brackets).  Although this may be shorter code, it is harder to compare the differences between the lists.

     

    Script \(\PageIndex{14}\): List Constructor

    Use of the List Constructor

    name_halogen=list(('fluorine', 'chlorine', 'bromine','iodine','astatine','tennessine'))
    print(name_halogen)
    Note:  The list constructor list() generates a list from an iterable, here a tuple of strings.
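
    As a brief illustration (not one of the original scripts), list() accepts other iterables as well, such as a string or a range, and gives an empty list when called with no argument:

```python
letters = list('HCl')          # a string gives a list of its characters
print(letters)

numbers = list(range(1, 6))    # a range gives a list of integers
print(numbers)

empty = list()                 # no argument gives an empty list
print(empty, len(empty))
```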

     

    List Indices

    Like strings, lists are ordered sequences and so the items of a list can be indexed.

    [NOTE: you must run the code at the beginning of this section to create the halogen list.]

    Script \(\PageIndex{15}\): List Indices & Embedded Lists

    Predict the output of the following code:

    name_halogen=['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine','tennessine']
    z_halogen=[9,17,35,53,85,117]
    a_halogen=[18.99840316,35.45,79.90,126.9045,209.98715,294.211]
    halogen=[name_halogen, z_halogen,a_halogen]
    print(name_halogen[2],'\n',a_halogen[1],'\n',halogen[0],'\n',halogen[0][2])
    Note:  The list halogen is a list of lists, and halogen[0][2] gave the third item of the first list. You can slice lists like you can strings.

     

    Script \(\PageIndex{16}\): List slicing with embedded lists

    Predict the output of the following code.  Note, you must have run the above code in the kernel to generate the lists

    #you must make the list halogen by running code above in the kernel
    print(name_halogen[1:3])
    print(halogen[0][1:3])
    Note:  As each of the nested lists within halogen uses the same index position for the same element, we can use the index numbers to extract the information of that element. You had to run the code that creates the halogen list (Script \(\PageIndex{15}\)) before executing this script.

     

    Script \(\PageIndex{17}\): for-loop with embedded lists

    Predict the output for the following code:

    print(halogen[0][1],halogen[1][1],halogen[2][1],'\n')
    print('Using a for-loop:')
    for atom in halogen[0:3]:
        print(atom[1])
    Note:  You had to run the code that creates the halogen list (Script \(\PageIndex{15}\)) before executing this script.

     

    Negative Indexing

    Like strings, lists can be indexed from the end using negative index numbers, so name_halogen[-1] is the last item of the list name_halogen.
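
    A short sketch of negative indexing on lists, written for illustration with the halogen lists used throughout this section:

```python
name_halogen = ['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine', 'tennessine']
z_halogen = [9, 17, 35, 53, 85, 117]
halogen = [name_halogen, z_halogen]

print(name_halogen[-1])     # last item of the list
print(name_halogen[-2:])    # last two items, as a list
print(halogen[-1][-1])      # last item of the last nested list
```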

     

    List Operations

    Changing List Items

    In the methods section we will cover additional ways to change list items.

    You can use index numbers, but the methods below like append(), extend() and insert(), and the del statement, are more common.

    Script \(\PageIndex{19}\): Replacing an item

    Predict the output of the following code

    name_halogen=['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine','tennessine']
    bad=name_halogen
    bad[1]='MISTAKE'
    print(bad)

    Note: We used the index position to replace the second item (i=1) with the string "MISTAKE".
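
    One consequence of the assignment bad=name_halogen is that it does not copy the list; both names refer to the same list, so changing an item through one name changes it for the other. The sketch below (the names alias and independent are illustrative) contrasts this with the copy() method:

```python
name_halogen = ['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine', 'tennessine']

alias = name_halogen                 # a second name for the same list
independent = name_halogen.copy()    # a separate copy of the list

alias[1] = 'MISTAKE'
print(name_halogen[1])               # MISTAKE, changed through the alias
print(independent[1])                # chlorine, the copy is unaffected
print(alias is name_halogen, independent is name_halogen)
```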

     

    Script \(\PageIndex{20}\)

    Predict the output of the following code:

    name_halogen=list(('fluorine', 'chlorine', 'bromine','iodine','astatine','tennessine'))
    #to change a list item
    name_halogen[1]="Chlorine"
    print(name_halogen)
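    #to change a range of list items
    name_halogen[2:5]=["Bromine","Iodine","Astatine"]
    print(name_halogen)

    Note:  The first assignment replaces a single item by its index number. The second assigns a list to the slice [2:5], replacing the items at index positions 2, 3 and 4.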

    Nested for-loops

    The following code uses nested for-loops to extract the information for each element of the halogen list:

    Script \(\PageIndex{21}\)

    Predict the output of the following code:

    name_halogen=['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine','tennessine']
    z_halogen=[9,17,35,53,85,117]
    a_halogen=[18.99840316,35.45,79.90,126.9045,209.98715,294.211]
    halogen=[name_halogen, z_halogen,a_halogen]
    
    print(halogen[0][0],halogen[1][0],halogen[2][0],'\n')
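    print('Using nested for-loops:')
    for j in range(0,6):        #outer loop: j steps through the six elements
        print(' ')
        for i in range(0,3):    #inner loop: i steps through the three nested lists
            atom=halogen[i][j]
            print(atom)

    Note:  The outer loop steps through the six elements and the inner loop steps through the three nested lists, so the name, atomic number and atomic weight of each element are printed together.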
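
    An alternative to nested loops over index numbers, offered here only as a sketch and not as the approach of the original script, is the built-in zip() function, which pairs up the items that share an index position:

```python
name_halogen = ['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine', 'tennessine']
z_halogen = [9, 17, 35, 53, 85, 117]
a_halogen = [18.99840316, 35.45, 79.90, 126.9045, 209.98715, 294.211]

# zip() yields one tuple per element: (name, atomic number, atomic weight)
rows = list(zip(name_halogen, z_halogen, a_halogen))
for name, z, a in rows:
    print(f"{name}: Z = {z}, atomic weight = {a}")
```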

     

    List Functions

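    • del - removes an item or slice; it is written as a statement, del list_name[i]
    • max() - returns largest value or word that starts nearest end of alphabet
    • min() - returns smallest value or word that starts nearest front of alphabet
    • len() - returns length of list
    • insert(index,'object to be inserted') - a list method, called as list_name.insert(index, object)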
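
    A minimal sketch of these functions applied to the halogen lists (illustrative, not one of the original scripts):

```python
name_halogen = ['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine', 'tennessine']
z_halogen = [9, 17, 35, 53, 85, 117]

print(len(z_halogen), min(z_halogen), max(z_halogen))
print(min(name_halogen), max(name_halogen))   # strings are compared alphabetically

del z_halogen[-1]                             # remove the last item
print(z_halogen)
```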

     

    List Comprehension

    List comprehension is a one-line technique for creating lists with an expression and a for-loop.

    Script \(\PageIndex{22}\): List Comprehension

    You may see code that runs a for-loop over a range in a single line.  These can even run mathematical operations on each item.

    l_list=[x for x in range(1,11)]
    print(l_list)
    e_list=[2*x for x in range(1,11)]
    print(e_list)

    Note: In the second list we applied an expression (2*x) to each value as we generated the list.
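
    List comprehensions are also handy for the data cleanup described earlier in this chapter. In the following sketch the text values are hypothetical; the first comprehension converts each string to a float and the second keeps only the values that pass a condition:

```python
# Hypothetical values that arrived as text
text_values = ["18.998", "35.45", "79.90"]
float_values = [float(v) for v in text_values]   # convert each string to a float
print(float_values)

# An optional if clause keeps only the items that satisfy a condition
evens = [x for x in range(1, 11) if x % 2 == 0]
print(evens)
```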

     

    List Membership

    Script \(\PageIndex{23}\): in, not in

    Note here we are using "in" and "not in" to determine membership. Before running this you should think of what kind of data this is, and what the output would be.

    name_halogen=['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine','tennessine']
    print('fluorine' in name_halogen)
    print('fluorine' not in name_halogen)
    print('argon' in name_halogen)
    print('argon' not in name_halogen)

    Note: This script outputs Boolean values, which can be used in the logic flow of your program.
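
    As a sketch of how such a Boolean can steer the logic flow, the hypothetical helper function classify() below uses "in" inside an if statement:

```python
name_halogen = ['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine', 'tennessine']

def classify(element):
    """Hypothetical helper: report whether an element name is in the halogen list."""
    if element in name_halogen:
        return f"{element} is a halogen"
    return f"{element} is not a halogen"

print(classify('iodine'))
print(classify('argon'))
```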

     

     

    List iteration

    Script \(\PageIndex{24}\)

    Predict the output of the following code

    name_halogen=['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine','tennessine']
    for element in name_halogen:
        print(f"{element} is a halogen")

     

    List Methods

    In the following list x is an item and i is an index number.

    • append(x) - adds item x to the end of the list
    • clear() - removes all items, leaving an empty list
    • copy() - makes a copy of the list
    • count(x) - returns the number of times item x occurs in the list
    • extend(iterable) - adds all items of an iterable (list, tuple, dict,...) to the end of the list
    • index(x,start,end) - returns the index of the first item that matches x
    • insert(i,x) - inserts item x at index i
    • pop(i) - removes and returns the item at index i
    • remove(x) - removes the first item that matches x
    • reverse() - reverses the order of the list
    • sort() - sorts the items of the list in place

    The following code uses methods to add and remove items from a list

    Script \(\PageIndex{25}\): Assorted List Methods

    After running this code you should play around and try different things.

    name_halogen=list(('fluorine', 'chlorine', 'bromine','iodine','astatine','tennessine'))
    print(name_halogen)                                      
    #to remove the item 'chlorine' (remove() takes the value, not the index)
    name_halogen.remove('chlorine')
    print(name_halogen)
    #to put chlorine back at index position 1
    name_halogen.insert(1,'chlorine')
    print(name_halogen)
    #to remove the last item
    name_halogen.pop(-1)
    print(name_halogen)
    #to append the last item back
    name_halogen.append('tennessine')
    print(name_halogen)
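
    The remaining methods in the list above can be tried the same way. The following sketch is an illustration rather than one of the original scripts; note that sort() and reverse() change the list in place:

```python
name_halogen = ['fluorine', 'chlorine', 'bromine', 'iodine', 'astatine', 'tennessine']

name_halogen.sort()                      # alphabetical order, in place
alphabetical = name_halogen.copy()       # keep a copy of the sorted order
print(alphabetical)

name_halogen.reverse()                   # reverse the sorted list, in place
print(name_halogen)

position = name_halogen.index('iodine')  # index of the first match
occurrences = name_halogen.count('iodine')
print(position, occurrences)
```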

     

     

    Script \(\PageIndex{26}\): combining two lists with .extend() 

    Predict the output of the following code:

    name_halogen=list(('fluorine', 'chlorine', 'bromine','iodine','astatine','tennessine'))
    z_halogen=[9,17,35,53,85,117]
    #extend() adds each item of a second iterable to the end of the list
    name_halogen.extend(z_halogen)
    print(name_halogen)

     Note: We are in effect adding the two lists (try: name_halogen=name_halogen+z_halogen)
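
    The differences between +, extend() and append() are easy to mix up. The sketch below uses shortened versions of the lists for illustration: + builds a new list, extend() changes the existing list, and append() adds the second list as a single nested item:

```python
name_halogen = ['fluorine', 'chlorine']
z_halogen = [9, 17]

combined = name_halogen + z_halogen    # + builds a new list; the originals are unchanged
print(combined, name_halogen)

nested = name_halogen.copy()
nested.append(z_halogen)               # append() adds the whole list as one item
print(nested, len(nested))

name_halogen.extend(z_halogen)         # extend() changes name_halogen in place
print(name_halogen == combined)
```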

     

    OK, hopefully the above gives you enough familiarity with strings and lists to be able to look at code and figure out what is going on.


    This page titled 2.4: Strings and Lists is shared under a not declared license and was authored, remixed, and/or curated by Robert Belford.
