Chemistry LibreTexts

Python_Assignment_2B

  • Interconversion between PubChem records

    Downloadable Files

    Lecture03_list_conversion.ipynb

    • Download the ipynb file and run it in your Jupyter notebook.
      • You can use the notebook you created in section 1.5 or the Jupyter hub at LibreTexts: https://jupyter.libretexts.org (see your instructor if you do not have access to the hub).
      • This page is an HTML version of the above .ipynb file.
        • If you have questions on this assignment, use this web page and the hypothes.is annotation to post a question (or comment) to the 2019OLCCStu class group. Contact your instructor if you do not know how to access the 2019OLCCStu group within the hypothes.is system.

     

     

     

    PUG-REST can be used to retrieve PubChem records that are related to other PubChem records. PUG-REST takes an input list of records in one of the three PubChem databases (Compound, Substance, and BioAssay) and returns a list of related records in the same or a different database. The kind of relationship between the input and output records can be specified with an optional parameter. This makes it possible to retrieve, for example (but not limited to):

    • Depositor-provided records (i.e., substances) that are standardized to a given compound.
    • Mixture compounds that contain a given component compound.
    • Stereoisomers/isotopomers of a given compound.
    • Compounds that are tested to be active in a given assay.
    • Compounds that have similar structures to a given compound.

     

    Getting depositor-provided records for a given compound

     

    First let's import the requests package necessary to make a web service request.

    In [1]:

    import requests
    

    The code snippet below retrieves the substance record associated with a given CID (CID 129825914).

    In [2]:

    prolog    = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
    
    pr_input  = "compound/cid/129825914"
    pr_oper   = "sids"
    pr_output = "txt"
    url       = prolog + '/' + pr_input + '/' + pr_oper + '/' + pr_output
    
    res = requests.get(url)
    print(res.text)
    
    341669951
    
    

    It is also possible to provide a comma-separated list of CIDs as input identifiers.

    In [3]:

    pugin   = "compound/cid/129825914,129742624,129783988"
    pugoper = "sids"
    pugout  = "txt"
    url     = prolog + '/' + pugin + '/' + pugoper + '/' + pugout
    
    res = requests.get(url)
    print(res.text)
    
    341669951
    341492923
    341577059
    345261280
    368769438
    
    

    In the example above, the input list has three CIDs, but the PUG-REST request returned five SIDs. This means that at least one CID must be associated with multiple SIDs, but the flat output does not show which one. Therefore, we want the SIDs grouped by their corresponding CIDs. This can be done by using the optional parameter "list_return=grouped" and changing the output format to json.

    In [4]:

    pugin   = "compound/cid/129825914,129742624,129783988"
    pugoper = "sids"
    pugout  = "json"
    pugopt  = "list_return=grouped"
    url     = prolog + '/' + pugin + '/' + pugoper + '/' + pugout + "?" + pugopt
    
    res = requests.get(url)
    print(res.text)
    
    {
      "InformationList": {
        "Information": [
          {
            "CID": 129825914,
            "SID": [
              341669951
            ]
          },
          {
            "CID": 129742624,
            "SID": [
              341492923
            ]
          },
          {
            "CID": 129783988,
            "SID": [
              341577059,
              345261280,
              368769438
            ]
          }
        ]
      }
    }
    
    

    Note that the json output format is used in the above request. The "txt" output format in PUG-REST returns data as a single column, and a grouped result (each CID with its own list of SIDs) does not fit into a single column.
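
    Once the grouped JSON has been retrieved, it can be turned into a Python dictionary that maps each CID to its SIDs. The following is a minimal sketch that works offline on a copy of the output shown above. The names sample and sids_by_cid are illustrative, and the use of .get() to tolerate a record without an "SID" key is a defensive assumption, not something the output above demonstrates.

```python
import json

# A copy of the grouped JSON output shown above (no network request is made here).
sample = """
{
  "InformationList": {
    "Information": [
      {"CID": 129825914, "SID": [341669951]},
      {"CID": 129742624, "SID": [341492923]},
      {"CID": 129783988, "SID": [341577059, 345261280, 368769438]}
    ]
  }
}
"""

data = json.loads(sample)

# Map each CID to its list of SIDs.
# Assumption: a record might lack the "SID" key, so fall back to an empty list.
sids_by_cid = {
    record["CID"]: record.get("SID", [])
    for record in data["InformationList"]["Information"]
}

for cid, sids in sids_by_cid.items():
    print(cid, len(sids), sids)
```

    In the notebook, the same dictionary could be built from a live response by replacing json.loads(sample) with res.json().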

    If you want output records to be "flattened", rather than being grouped by the input identifiers, use "list_return=flat".

    In [5]:

    pugopt  = "list_return=flat"
    url     = prolog + '/' + pugin + '/' + pugoper + '/' + pugout + "?" + pugopt
    
    res = requests.get(url)
    print(res.text)
    
    {
      "IdentifierList": {
        "SID": [
          341492923,
          341577059,
          341669951,
          345261280,
          368769438
        ]
      }
    }
    
    

    The default value for the "list_return" parameter is:

    • "flat" when the output format is TXT
    • "grouped" when the output format is JSON and XML
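
    Because the default depends on the output format, it can be safer to set "list_return" explicitly. The sketch below wraps the URL assembly used in the cells above in a small helper. The function name build_pug_url and its keyword-argument interface are hypothetical conveniences introduced here, not part of PUG-REST or the requests package.

```python
from urllib.parse import urlencode

PROLOG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def build_pug_url(pr_input, pr_oper, pr_output, **options):
    """Assemble a PUG-REST URL; options become the query string."""
    url = "/".join([PROLOG, pr_input, pr_oper, pr_output])
    if options:
        # urlencode turns {"list_return": "flat"} into "list_return=flat"
        url += "?" + urlencode(options)
    return url

# Ask for a flat list explicitly, even though JSON would default to grouped.
url = build_pug_url(
    "compound/cid/129825914,129742624,129783988",
    "sids",
    "json",
    list_return="flat",
)
print(url)
```

    The resulting string can be passed to requests.get() exactly as in the earlier cells.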

    It is also possible to specify the input list implicitly, rather than providing the input identifiers explicitly. For example, the following code cell uses a chemical name to specify the input list.

    In [6]:

    # Input CIDs are provided using a chemical name
    url = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/lactose/cids/txt'
    res = requests.get(url)
    cids = res.text.split()
    print("# CIDs returned:", len(cids))
    print(",".join(cids))
    
    # Input CIDs are provided using the name, then converted to SIDs.
    url = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/lactose/sids/txt'
    res = requests.get(url)
    sids = res.text.split()
    print("# SIDs returned (method 1):", len(sids))
    #print(",".join(sids))
    
    # Input *SIDs* are provided using the name, and the same SIDs are returned.
    url = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/name/lactose/sids/txt'
    res = requests.get(url)
    sids = res.text.split()
    print("# SIDs returned (method 2):", len(sids))
    #print(",".join(sids))
    
    # CIDs returned: 7
    6134,440995,84571,294,439186,49837892,69301022
    # SIDs returned (method 1): 419
    # SIDs returned (method 2): 125
    

    The above example illustrates how the list conversion works.

    • In the first request, the name "lactose" is searched for against the Compound database and the resulting 7 CIDs are returned.
    • If you change the operation part from "cids" to "sids" (as in the second request), the same name search is done first against the Compound database, followed by the list conversion from the resulting 7 CIDs to the 419 associated SIDs.
    • In the third request, the name search is performed against the Substance database, and the resulting 125 SIDs are returned.

    Exercise 1a Statins are a class of drugs that lower cholesterol levels in the blood. Retrieve in JSON the substance records associated with the compounds whose names contain the string "statin".

    • Make only one PUG-REST request.
    • For partial name matching, set the name_type parameter to "word" (See the PUG-REST document for an example).
    • Group the substances by the corresponding compound records.
    • Print the json output using print()

    In [7]:

    # Write your code in this cell.
    

     

    Getting mixture/component molecules for a given molecule

     

    The list interconversion may be used to retrieve mixtures that contain a given molecule as a component. To do this, the input molecule should be a single-component compound (that is, with only one covalently-bound unit), and the optional parameter "cids_type=component" should be provided.

    In [8]:

    prolog    = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
    
    url = prolog + "/compound/name/tylenol/cids/txt?cids_type=component"
    res = requests.get(url)
    cids = res.text.split()
    print(len(cids))
    print( cids )
    
    349
    ['137528085', '137524012', '136090259', '134821662', '134821661', '134539182', '132232560', '130454883', '129031778', '129031742', '129031741', '129010982', '122707137', '122640664', '122640662', '121470966', '118988662', '118437740', '118437739', '118437726', '118437724', '118437719', '118437718', '118096890', '110191848', '101532036', '91971975', '91844722', '91809174', '91799326', '91373323', '91304928', '91249665', '90190366', '89154873', '88829467', '88808486', '88375685', '88265592', '87793048', '87767189', '87689257', '87616094', '87616016', '87550936', '87316578', '87130989', '87105096', '86763272', '86649518', '86566332', '86345462', '77910487', '73774600', '72721109', '72707116', '72700784', '72188553', '71572418', '71550786', '71550785', '71550784', '71548672', '71467818', '71369141', '71362726', '71361566', '70976727', '70682632', '70657241', '70553665', '70542675', '70537575', '70505414', '70504628', '70475261', '70472316', '70426385', '70426383', '70406374', '70406139', '70405742', '70376964', '70303418', '70300153', '70291296', '70289957', '70288195', '70269572', '70242946', '70211817', '70189253', '70022384', '70020949', '69998447', '69993029', '69967975', '69933339', '69929845', '69854774', '69764657', '69722397', '69679639', '69611065', '69606026', '69573144', '69540272', '69512447', '69466568', '69459739', '69423341', '69405977', '69304461', '69272079', '69265432', '69233272', '69230240', '69225615', '68897200', '68897197', '68110089', '67882622', '67820890', '67783664', '67744130', '67737493', '67731544', '67686669', '67660813', '67660812', '67654146', '67625285', '67592310', '67591390', '67575749', '67458691', '67450933', '67439236', '67438851', '67437760', '67437722', '67437466', '67437444', '67436790', '67427214', '67409095', '67355967', '67340968', '67286835', '67230371', '67113962', '67113814', '67108121', '67107882', '67107874', '67071608', '66986843', '66883337', '66788721', '66723746', '66645004', '57455278', '57415653', '57351140', 
'56843745', '56841857', '56841820', '56841818', '56841593', '56841580', '56832576', '54738332', '54695506', '53243122', '51001930', '50987060', '50986678', '50986663', '50986655', '49837973', '49837932', '49821783', '49821277', '46917107', '46916977', '46916926', '46916925', '46842915', '46841220', '46841219', '45102769', '44631883', '44539701', '44363480', '44363323', '44231128', '44195152', '44153587', '44152261', '44150691', '44150058', '44149583', '44147053', '44146829', '44144793', '25256727', '25134193', '24947116', '24936175', '24847942', '24847941', '24847940', '24847939', '24847938', '24847937', '24847936', '24847935', '24847934', '24847933', '24847798', '24847789', '24832363', '24812777', '24765810', '24748207', '23722664', '23690943', '23690172', '23667500', '23663623', '23653632', '23653596', '23617244', '23615516', '23446931', '23424531', '23376922', '23351294', '23351235', '22215565', '22119336', '22119335', '22119334', '22119330', '22119329', '22119328', '22119327', '22097997', '21400158', '21219338', '21126091', '21115975', '21115970', '21115962', '19905501', '19879675', '19829568', '18521256', '16750359', '16654983', '16214982', '16126867', '16034831', '15953843', '11993742', '11988033', '11979875', '11979874', '11957781', '11949635', '11949633', '11857249', '11643341', '11593372', '11581230', '11526511', '11468318', '11336902', '11319975', '11247932', '11221915', '10198442', '10176464', '9962537', '9961898', '9938297', '9937997', '9932573', '9931843', '9928049', '9918711', '9917788', '9915846', '9911255', '9904930', '9892141', '9887612', '9884980', '9875625', '9875580', '9873798', '9872842', '9871873', '9871508', '9863408', '9862775', '9854311', '9853988', '9852063', '9845722', '9844700', '9839998', '9831424', '9830967', '9829022', '9828641', '9825027', '9819411', '9810502', '9808295', '9806105', '6321309', '6321307', '6321228', '5748525', '5492657', '5492530', '5492179', '5491280', '5491240', '5491012', '5487068', '4301564', '3081207', '196124', 
'193442', '187072', '174272', '174131', '171290', '163158', '157897', '156411', '154366', '150304', '131293', '127238', '123857', '83966']
    


    Note that, if the input molecule is a multi-component compound, the option "cids_type=component" returns the components of that compound. For example, the following code cell shows how to get all components of the first molecule in the "cids" list generated in the previous example.

    In [9]:

    url = prolog + "/compound/cid/" + cids[0] + "/cids/txt?cids_type=component"
    res = requests.get(url)
    component_cids = res.text.split()
    print( "CID:", cids[0])
    print( "Number of Components", len(component_cids))
    print( component_cids )
    
    CID: 137528085
    Number of Components 3
    ['446155', '4891', '1983']
    

    Exercise 2a: Many over-the-counter drugs contain more than one active ingredient. In this exercise, we want to find the component molecules that occur in mixtures with three common painkillers (aspirin, tylenol, advil).

    Step 1. Define a list that contains three drug names (aspirin, tylenol, advil).

    In [10]:

    # Write your code in this cell.
    

    Step 2. Using a for loop, retrieve the PubChem CIDs corresponding to the three drugs and store them in a new list. In order not to overload the PubChem servers, pause the program for 0.2 seconds in each iteration of the for loop (using sleep()).

    In [11]:

    # Write your code in this cell.
    

    Step 3. Using another for loop, do the following things for each drug:

    • Get the PubChem CIDs of the mixture compounds that contain each drug and store them in a list.
    • Get the PubChem CIDs of the components that occur in any of the returned mixtures, by setting the "list_return" parameter to "flat". This can be done with a single request.
    • Print all the components.
    • Pause the code for 0.2 seconds using sleep() each time a PUG-REST request is made.
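
    The pause between requests mentioned above follows a general pattern that can be sketched independently of the exercise. In the sketch below, fetch_throttled is a hypothetical helper name, and the fetch argument stands in for whatever function actually performs the request. The demonstration runs offline with str.upper as a stand-in and a delay of zero.

```python
import time

def fetch_throttled(urls, fetch, delay=0.2):
    """Call fetch(url) for each URL, pausing `delay` seconds after every call."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(delay)  # space out the requests
    return results

# Offline demonstration: str.upper stands in for a real request function,
# and delay=0.0 avoids waiting. No network access is made.
demo = fetch_throttled(["url-a", "url-b", "url-c"], fetch=str.upper, delay=0.0)
print(demo)
```

    In the notebook, one could pass fetch=lambda u: requests.get(u).text and keep the default delay of 0.2 seconds.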

    In [12]:

    # Write your code in this cell.
    

     

    Getting compounds tested in a given assay

     

    PUG-REST may be used to retrieve compounds tested in a given assay. For example, the following code cell shows how to get all compounds tested in AID 1207599.

    In [13]:

    url = prolog + "/assay/aid/" + "1207599" + "/cids/txt"
    res = requests.get(url)
    cids = res.text.split()
    print(len(cids))
    print(cids)
    
    791
    ['6175', '6197', '8547', '10219', '14169', '17558', '21389', '68050', '84677', '95783', '95996', '142779', '177894', '180548', '182792', '241056', '253602', '302770', '348623', '379338', '408190', '427456', '453048', '456183', '458959', '463795', '467892', '467895', '467898', '467900', '467902', '468692', '493035', '540335', '615754', '628093', '653020', '658095', '659146', '659572', '660337', '660996', '661700', '664853', '665381', '670727', '678644', '679624', '684193', '686636', '692799', '696459', '697239', '701785', '705510', '709466', '711950', '718105', '722343', '726776', '728907', '732311', '742641', '745456', '746602', '759319', '763219', '780973', '783532', '787413', '787416', '805487', '807557', '819039', '819041', '826058', '826108', '826140', '865238', '866779', '871153', '876820', '879749', '899915', '929152', '933766', '934186', '935739', '939076', '940283', '945743', '951335', '951809', '962627', '972880', '973099', '991453', '1000261', '1036940', '1042562', '1046660', '1091318', '1092462', '1104079', '1104215', '1104245', '1121234', '1132057', '1160282', '1187199', '1187221', '1202678', '1214451', '1230978', '1249688', '1253822', '1256574', '1271219', '1272562', '1312134', '1315452', '1316619', '1327632', '1330474', '1331204', '1347726', '1355348', '1388836', '1434951', '1486593', '1489416', '1489418', '1507416', '1531863', '1546587', '1579827', '1591101', '1643514', '1777249', '1836886', '1848339', '1891362', '1915905', '1923383', '1929483', '1931935', '1945930', '1954158', '1979800', '1987880', '2000188', '2043476', '2190897', '2192323', '2226126', '2227970', '2229100', '2253996', '2261114', '2454286', '2526359', '2728441', '2767328', '2767330', '2788193', '2789798', '2790585', '2790771', '2826655', '2827013', '2829106', '2837334', '2840340', '2840397', '2840651', '2841226', '2842878', '2844171', '2853195', '2855062', '2856734', '2858891', '2864616', '2864698', '2865851', '2871670', '2871881', '2876588', '2877048', '2877655', '2878059', 
'2887065', '2890365', '2895488', '2897031', '2897388', '2897581', '2900550', '2910877', '2917555', '2917883', '2918568', '2918689', '2920941', '2921451', '2923731', '2927543', '2929712', '2938150', '2942140', '2946841', '2948608', '2979608', '2988155', '2996938', '3010592', '3020289', '3098392', '3107646', '3114195', '3117161', '3119532', '3124081', '3124283', '3127328', '3129553', '3130863', '3160767', '3164076', '3191424', '3210926', '3235634', '3249970', '3257554', '3269624', '3304735', '3338861', '3351585', '3368424', '3392688', '3406365', '3454585', '3491998', '3499680', '3503392', '3504865', '3525061', '3546208', '3557460', '3559623', '3606116', '3614497', '3618367', '3665408', '3706362', '3732278', '3742333', '3749024', '3805646', '3808173', '3812294', '3814649', '3869813', '3973344', '3978205', '3978878', '4103922', '4109525', '4136614', '4139926', '4145335', '4204024', '4207389', '4208062', '4242282', '4266124', '4267688', '4329630', '4400747', '4411107', '4523414', '4524296', '4539365', '4571213', '4617997', '4655782', '4727895', '4827679', '4909257', '4970781', '4970834', '4986662', '5051873', '5061949', '5065884', '5073170', '5151552', '5158191', '5243139', '5304895', '5308471', '5311382', '5322214', '5322341', '5328733', '5332967', '5440108', '5736589', '6404647', '6459920', '6464095', '6472985', '6493903', '6603416', '6603435', '6617681', '6798868', '7086352', '7150960', '7161317', '7267121', '7292609', '7292627', '7292667', '7292689', '7294801', '7294819', '7693212', '7693819', '7708127', '8441907', '8741335', '9549410', '9549480', '9550693', '9802843', '10066728', '10173796', '10215271', '10237991', '10432767', '11237028', '11534555', '11834445', '11837127', '11953179', '12005317', '13751046', '15987793', '15998601', '16012811', '16015353', '16015355', '16018313', '16020383', '16025899', '16032335', '16033711', '16189535', '16192614', '16192765', '16193792', '16241953', '16247239', '16437129', '16454233', '16475153', '16682138', '16799558', 
'16806786', '16815088', '16931822', '17236226', '17325420', '17354463', '17367378', '17369361', '17373844', '17375211', '17388866', '17454029', '17577933', '18177365', '18566671', '18566672', '18574682', '18589704', '19988685', '19988703', '19988741', '20080889', '20856297', '20857481', '20883002', '20883057', '20883959', '20886787', '20925640', '20925641', '21120338', '21626819', '21626821', '21758532', '21758548', '21758549', '21783605', '22334090', '22518254', '22549208', '22549211', '22549224', '22549315', '22549348', '22578577', '22583924', '22767708', '23341191', '23602449', '23604751', '23724273', '23878297', '24070578', '24219693', '24761496', '24792460', '25132194', '25279797', '25282247', '25295758', '25459838', '26325235', '26458299', '27443667', '27478178', '27478226', '27509345', '35281662', '40481893', '40481904', '40504628', '40565203', '42210143', '42484148', '42518226', '44351802', '44521965', '44521972', '44522016', '44522026', '44522038', '44522052', '44522061', '44522064', '44522108', '44522109', '44522111', '44522112', '44522191', '44522192', '44522193', '44522210', '44522254', '44522281', '44522286', '44522288', '44522300', '44522358', '44522363', '44522367', '44522370', '44522371', '44522432', '44522445', '44522455', '44522502', '44522583', '44522691', '44522796', '44522817', '44522830', '44522835', '44522837', '44522956', '44523101', '44523176', '44523178', '44523216', '44523221', '44523267', '44523276', '44523308', '44523367', '44523483', '44523550', '44523664', '44523722', '44523726', '44523772', '44523822', '44523824', '44523826', '44523840', '44523901', '44523903', '44523905', '44523906', '44523907', '44523909', '44523913', '44524270', '44524414', '44524415', '44524421', '44524435', '44524839', '44524878', '44524880', '44525180', '44525320', '44525321', '44525322', '44525323', '44525593', '44525750', '44525761', '44525872', '44526408', '44526710', '44526754', '44526756', '44526800', '44526805', '44526815', '44526825', '44526827', 
'44527011', '44527222', '44527478', '44527483', '44527516', '44527532', '44527534', '44527536', '44527538', '44527542', '44527548', '44527677', '44527682', '44527731', '44527733', '44528139', '44528412', '44528435', '44528594', '44528595', '44528645', '44528698', '44528753', '44528762', '44528799', '44528811', '44528836', '44528911', '44529050', '44529089', '44529095', '44529350', '44529362', '44529363', '44529528', '44529532', '44529555', '44529567', '44529731', '44529772', '44529819', '44529820', '44529854', '44529925', '44529971', '44530112', '44530114', '44530124', '44530128', '44530132', '44530383', '44530466', '44530480', '44530841', '44530965', '44531000', '44531061', '44531085', '44531113', '44531131', '44531143', '44531146', '44531161', '44531163', '44531197', '44531233', '44531242', '44531284', '44531288', '44531375', '44531377', '44531379', '44531383', '44531412', '44531418', '44531426', '44531489', '44531491', '44531500', '44531502', '44531504', '44531514', '44531628', '44531666', '44531672', '44531674', '44531677', '44531683', '44531695', '44531710', '44531722', '44531723', '44531724', '44531726', '44531727', '44531751', '44531757', '44531761', '44531771', '44531780', '44531797', '44531818', '44531830', '44531839', '44531851', '44531878', '44531889', '44531897', '44531904', '44531905', '44531909', '44531911', '44531922', '44531928', '44531939', '44531943', '44531945', '44531948', '44531949', '44531952', '44531954', '44531967', '44531968', '44531969', '44531988', '44531998', '44531999', '44532003', '44532037', '44532413', '44532446', '44532568', '44532601', '44532621', '44532840', '44532843', '44532903', '44532938', '44532999', '44533015', '44533033', '44533053', '44533115', '44533186', '44533212', '44533402', '44533413', '44533485', '44533746', '44533750', '44533804', '44533849', '44533946', '44533970', '44533978', '44533982', '44534040', '44534061', '44534063', '44534153', '44534159', '44534215', '44534218', '44534219', '44534263', '44534576', 
'44534852', '44534854', '44534856', '44534973', '44534975', '44535067', '44535073', '44535098', '44535166', '44535181', '44535556', '44535558', '44535562', '44535738', '44535749', '44535752', '44535819', '44535855', '44535969', '44536025', '44536135', '44536165', '44536181', '44536183', '44536356', '44536565', '44536569', '44536587', '44536598', '44536603', '44536616', '44536663', '44536677', '44536707', '44537143', '44537238', '44537280', '44537288', '44537308', '44537313', '44537359', '45176114', '45210433', '45480698', '45480775', '45480913', '45480945', '45481017', '45481028', '45481049', '45481085', '45481291', '45481409', '45481490', '45488429', '45488433', '45488525', '45488843', '45489280', '45489322', '45489331', '45489340', '45489378', '45489391', '52949030', '57504040', '135400261', '135403990', '135414241', '135415440', '135421442', '135440751', '135444435', '135479358', '135543311', '135551662', '135609775', '135731891', '135752941', '135752970', '135752975', '135857472', '135880900', '135925134', '135925140', '135925147', '135925159', '135925164', '135971382', '136032603']
    

    If you are interested in only the compounds that are tested "active" in a given assay, set the "cids_type" parameter to "active", as shown in the code below.

    In [14]:

    url = prolog + "/assay/aid/" + "1207599" + "/cids/txt?cids_type=active"
    res = requests.get(url)
    cids = res.text.split()
    print(len(cids))
    print(cids)
    
    435
    ['6197', '10219', '14169', '17558', '68050', '177894', '182792', '253602', '348623', '453048', '456183', '458959', '463795', '467892', '467895', '467898', '467900', '540335', '628093', '697239', '701785', '742641', '745456', '807557', '826140', '972880', '973099', '1092462', '1104215', '1104245', '1187199', '1253822', '1272562', '1330474', '1507416', '1591101', '1929483', '1931935', '2226126', '2229100', '2454286', '2526359', '2788193', '2826655', '2840340', '2840651', '2865851', '2871881', '2876588', '2877655', '2895488', '2897031', '2900550', '2917883', '2918568', '2923731', '2946841', '3010592', '3020289', '3098392', '3114195', '3124081', '3124283', '3304735', '3351585', '3732278', '4524296', '4827679', '4970781', '5065884', '5311382', '5322214', '5322341', '5328733', '6404647', '6603435', '7086352', '7292609', '7292627', '7292667', '7292689', '7294801', '7294819', '9549410', '9549480', '9802843', '10066728', '10173796', '10215271', '10237991', '10432767', '11237028', '11534555', '11953179', '13751046', '16012811', '16018313', '16032335', '16192614', '16192765', '16193792', '17325420', '17354463', '17388866', '18566671', '18566672', '18589704', '19988685', '19988703', '19988741', '20080889', '20883959', '20925640', '21626819', '21626821', '21758532', '21758548', '21758549', '21783605', '22767708', '23341191', '23604751', '24070578', '25132194', '25459838', '27478178', '27478226', '27509345', '42210143', '44351802', '44521965', '44522016', '44522026', '44522038', '44522052', '44522061', '44522064', '44522108', '44522191', '44522192', '44522193', '44522254', '44522281', '44522286', '44522300', '44522358', '44522363', '44522367', '44522371', '44522432', '44522445', '44522455', '44522502', '44522583', '44522691', '44522796', '44522817', '44522830', '44522835', '44522837', '44522956', '44523101', '44523176', '44523178', '44523216', '44523221', '44523267', '44523276', '44523308', '44523367', '44523483', '44523550', '44523664', '44523722', '44523726', '44523772', 
'44523822', '44523824', '44523826', '44523840', '44523901', '44523903', '44523905', '44523906', '44523907', '44523909', '44523913', '44524270', '44524414', '44524415', '44524421', '44524435', '44524839', '44524878', '44524880', '44525180', '44525320', '44525321', '44525322', '44525323', '44525593', '44525750', '44525761', '44525872', '44526408', '44526710', '44526754', '44526756', '44526800', '44526805', '44526815', '44526825', '44526827', '44527011', '44527222', '44527478', '44527483', '44527516', '44527532', '44527534', '44527536', '44527538', '44527542', '44527548', '44527677', '44527682', '44527731', '44527733', '44528139', '44528412', '44528594', '44528595', '44528645', '44528698', '44528753', '44528762', '44528799', '44528811', '44528836', '44528911', '44529050', '44529089', '44529095', '44529350', '44529362', '44529363', '44529528', '44529532', '44529555', '44529567', '44529731', '44529772', '44529819', '44529820', '44529854', '44529925', '44529971', '44530112', '44530114', '44530124', '44530128', '44530132', '44530383', '44530466', '44530480', '44530841', '44530965', '44531061', '44531085', '44531113', '44531131', '44531143', '44531146', '44531161', '44531163', '44531197', '44531233', '44531242', '44531284', '44531288', '44531375', '44531377', '44531379', '44531383', '44531412', '44531418', '44531426', '44531489', '44531491', '44531500', '44531502', '44531504', '44531514', '44531666', '44531672', '44531674', '44531677', '44531683', '44531695', '44531710', '44531722', '44531723', '44531724', '44531726', '44531727', '44531751', '44531757', '44531771', '44531780', '44531797', '44531818', '44531830', '44531839', '44531851', '44531878', '44531889', '44531897', '44531904', '44531911', '44531922', '44531939', '44531943', '44531945', '44531948', '44531949', '44531952', '44531954', '44531968', '44531969', '44531988', '44531998', '44531999', '44532003', '44532037', '44532413', '44532568', '44532601', '44532621', '44532840', '44532843', '44532903', '44532999', 
'44533015', '44533033', '44533053', '44533115', '44533186', '44533212', '44533402', '44533413', '44533485', '44533746', '44533750', '44533804', '44533849', '44533946', '44533970', '44533978', '44533982', '44534040', '44534061', '44534063', '44534153', '44534159', '44534215', '44534218', '44534219', '44534263', '44534576', '44534852', '44534854', '44534856', '44534973', '44534975', '44535067', '44535073', '44535098', '44535166', '44535181', '44535556', '44535558', '44535562', '44535738', '44535749', '44535752', '44535819', '44535855', '44535969', '44536025', '44536135', '44536165', '44536181', '44536183', '44536356', '44536565', '44536569', '44536587', '44536598', '44536603', '44536616', '44536663', '44536677', '44536707', '44537143', '44537238', '44537280', '44537288', '44537308', '44537313', '44537359', '45480698', '45480775', '45480913', '45480945', '45481017', '45481049', '45481085', '45481291', '45481409', '45481490', '45489378', '135415440', '135421442', '135551662', '135731891', '135752941', '135752970', '135752975', '135857472', '135925134', '135925140', '135925147', '135925159', '135925164', '136032603']
    

    It is also possible to specify the input assay list implicitly. For example, the following code cell retrieves compounds tested in any assay targeting human carbonic anhydrase 2 (CA2), whose accession number is P00918.

    In [15]:

    url = prolog + "/assay/target/accession/" + "P00918" + "/cids/txt"
    res = requests.get(url)
    cids = res.text.split()
    print(len(cids))
    #print(cids)
    
    23978
    

    Exercise 3a: Find compounds that are tested to be active against human acetylcholinesterase (accession: P08173) and retrieve SMILES strings for those compounds.

    • Split the CID list into smaller chunks (with a chunk size of 100).
    • Print the retrieved data in a CSV format (CID and SMILES strings in the first and second columns, respectively).
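
    Splitting a long identifier list into chunks is a generic step that can be sketched without making any request. In the sketch below, chunked is a hypothetical helper name and example_ids holds placeholder identifiers generated for illustration only, not the result of any PubChem query.

```python
def chunked(items, size):
    """Yield successive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder identifiers for illustration only (not real query results).
example_ids = [str(n) for n in range(1, 251)]

chunks = list(chunked(example_ids, 100))
print([len(chunk) for chunk in chunks])

# Each chunk can be joined into the comma-separated form used in PUG-REST URLs.
first_chunk_str = ",".join(chunks[0])
print(first_chunk_str[:20])
```

    With 250 placeholder identifiers and a chunk size of 100, this produces three chunks, the last one shorter than the others.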

    In [16]:

    # Write your code in this cell.
    


     

     

     
