Final Project Spring 2019

Data Visualization

Data visualization is a powerful mode of communication that is increasingly becoming a must-have tool for anyone who deals with data. Your brain is much better at interpreting visual representations of data than understanding pages of numbers. A well-designed visualization of data conveys enormous amounts of information much faster than lists in a spreadsheet. For example, imagine that you need to decide which geographical areas in the United States were hardest hit by the opioid epidemic. Table 1 and Figure 1 present the same data but a quick glance at Figure 1 reveals that many of the states with the highest rate of overdose deaths coincide with what is commonly referred to as the “rust belt” in the United States. Trends or relationships, lost when viewing columns of thousands or millions of numbers in a spreadsheet, jump out with the appropriate graphical representation.

Table 1 Source: https://www.cdc.gov/drugoverdose/data/statedeaths.html

Figure 1. Drug Overdose Death Rate per 100,000 Residents for 2017. For an explanation and similar maps, see https://www.cdc.gov/drugoverdose/data/statedeaths.html.

Final Project Overview

Through the CSCI 203 Final Project, you will gain experience with data visualization. Specifically, you will determine the answers to several questions by analyzing big data sets with Python and displaying the answers graphically using Matplotlib, a 2-D plotting tool. For Part 1 of the CSCI 203 Final Project, you will answer the following questions:

  1. How has the yearly total number of unintentional overdose deaths varied from 1999 to 2017?
  2. Compare the states with the five highest rates (deaths per 100,000 people) of unintentional overdose deaths in 1999 with those in 2017. Did the states with the highest rates of unintentional overdose deaths change from 1999 to 2017? Were the values of the largest rates in 1999 different from those in 2017?

For Part 2 of the CSCI 203 Final Project, you will pose your own question about a public-health issue that can be answered by analyzing data obtained from the Centers for Disease Control and Prevention website called CDC Wonder at https://wonder.cdc.gov/. You will then design and implement a graphical representation of your answer using Matplotlib.

For both parts, you will need to write Python functions and incorporate techniques learned this semester. The following text explains the requirements for the project proposal and the project.

Due Dates

  • Project Proposal: Sunday, April 14, 2019
  • Final Project
    • Part 1: Monday, April 22, 2019
    • Part 2 (code): Thursday, April 25, 2019
    • Part 2 (summary, final revisions): Sunday, April 28, 2019

Provided Data for Part 1

The linked zip folder part_1_project_files_spring_2019 has the following files to be analyzed for Part 1:

Project Preparation

Before you write your proposal, you should read this entire document to understand the scope of the project. Along with reading this document to fully grasp the expectations for the final project and looking at https://wonder.cdc.gov/ for ideas of available datasets, you should:

Project Proposal (To be done individually.)

Your project proposal should describe the functions that you will write for both parts of the CSCI 203 Final Project. For Part 2, you need to pose a question to be answered using data from the CDC Wonder website and describe the functions needed to analyze the data and visualize the answer to your posed question.

Details on Part 1 of the Proposal

For your proposal, you should describe the functions that you will use in Part 1 to complete the required elements. Propose names for your Python functions and include docstrings describing what will be passed into the function and what will be returned if anything. While we explain Part 1 in significantly more detail lower in the document, below is a quick summary:

For Part 1, you will analyze unintended overdose deaths in the United States over the years 1999 to 2017. The data is located in the file overdose_X40_X44_1999_2017.csv.

To answer the question “How has the yearly number of unintentional overdose deaths varied from 1999 to 2017?”, you need to:

  • create a Python dictionary with the year as the key and the total of unintended overdose deaths for that year as the value.
  • plot the yearly total of unintended overdose deaths as a function of year using Matplotlib.
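
The first step above can be sketched as follows. This is only a minimal illustration: the function name yearly_totals, the column positions, and the sample rows are assumptions made for the example, so check the header of overdose_X40_X44_1999_2017.csv for the actual layout.

```python
def yearly_totals(rows, year_col, deaths_col):
    """
    Return a dictionary with the year as the key and the total
    number of deaths recorded for that year as the value.
    Input rows: list of rows, each a list of strings (header already skipped)
    Input year_col, deaths_col: column indices (hypothetical; they depend on the file)
    """
    totals = {}
    for row in rows:
        year = int(row[year_col])
        deaths = int(row[deaths_col])
        totals[year] = totals.get(year, 0) + deaths   # accumulate per year
    return totals


# Made-up rows in an assumed layout: location, year, deaths
sample_rows = [['Ohio', '1999', '10'],
               ['Utah', '1999', '5'],
               ['Ohio', '2000', '12']]
print(yearly_totals(sample_rows, 1, 2))   # {1999: 15, 2000: 12}
```

Using totals.get(year, 0) avoids a separate check for whether the year is already a key.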

To compare the states with the five highest rates (deaths per 100,000 people) of unintentional overdose deaths in 1999 with those in 2017, you need to:

  • create a Python dictionary with the location (state or District of Columbia) as the key and rate of unintended overdose deaths (as deaths per 100,000 residents) in 1999 as the value.
  • create a Python dictionary with the location (state or District of Columbia) as the key and rate of unintended overdose deaths (as deaths per 100,000 residents) in 2017 as the value.
  • draw a bar plot, with Matplotlib, showing the five states with the highest rate of unintended overdose deaths for 1999 and 2017.
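
The rate dictionaries described above could be built along these lines. The function name death_rates and the sample numbers are made up for illustration, and the sketch assumes you already have one dictionary of death counts and one of populations, both keyed by location.

```python
PER_100K = 100000   # rates are reported as deaths per 100,000 residents


def death_rates(deaths_by_state, pop_by_state):
    """
    Return a dictionary with the location as the key and the
    death rate (deaths per 100,000 residents) as the value.
    Locations missing from pop_by_state are skipped.
    """
    rates = {}
    for state, deaths in deaths_by_state.items():
        if state in pop_by_state:
            rates[state] = deaths * PER_100K / pop_by_state[state]
    return rates


# Made-up counts and round populations, chosen for easy checking
deaths = {'A': 50, 'B': 30, 'C': 7}
pops = {'A': 1000000, 'B': 200000}
print(death_rates(deaths, pops))   # {'A': 5.0, 'B': 15.0}
```

Location 'C' is dropped because it has no population entry; whether to skip or report such locations is a design choice to describe in your docstring.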

Details on Part 2 of the Proposal

For Part 2 of the proposal, you will:

  • pose a public-health related question,
  • select a dataset from https://wonder.cdc.gov/ corresponding to your area of inquiry,
  • outline functions to be saved in a file named my_analysis.py to
    • analyze the data with Python to answer your question, and
    • illustrate your results using Matplotlib.

For ideas of public-health issues, go to cdc.gov. We encourage you to think ambitiously here. The amount of available data is vast, as are the many questions that you could ask. The idea is that you are using your new skills to investigate a question that is meaningful to you.

After formulating a question about public health, thoughtfully consider the steps needed to complete your analysis. Your proposal should be broken into the following sections:

  • Overview What question(s) do you wish to answer?
  • Motivation What led you to this question and/or why do you find this question interesting?
  • Dataset(s) that you will analyze including the type of public-health data and the source of the data.
    • Select a dataset that aligns with the overarching question(s) that you are planning to answer.
      • Go to https://wonder.cdc.gov/ and click on a topic at this link related to your area of inquiry.
      • For some topics, you need to agree to the terms of use before being forwarded to data selection menus.
    • Include a preliminary count of the number of measurements in your dataset. The total number of cells in your spreadsheet should be at least 10K. If you cannot find 10K from one downloaded dataset, consider combining more than one dataset in your study.
  • Analysis: How will you analyze your data? There are many potential analyses of data such as determination of maximum values, minimum values, means, running averages, or variance. Your analysis for Part 2 should be significantly different from that done in Part 1. As with any good analysis, think of an interesting question first. Then, determine which kinds of analysis will help you answer that question.
  • Hand-drawn sketch with annotations of how you would like to visualize your data. You can scan or take a picture of your hand-drawn sketch and insert it in your proposal. Remember to include labels on your axes, if axes are used in your graphic, to show what you will be plotting. Since you do not know the results of your analysis, guess how your final plots might look and sketch that. You will probably want to flip through existing plots on Matplotlib to get a sense of what’s possible: http://matplotlib.org/gallery.html. We encourage you to learn new ways to plot or show your results from viewing the code given in the Matplotlib gallery. Remember to always cite any code used from a resource by providing the reference in your docstring.
  • List of major functions that you will need for your project. Think through the logical components you need to accomplish your analysis. What high-level functions do you need? We don’t expect you to perfectly imagine all the functions that you need, but we do expect you to think carefully about the complexity of your problem. Include the proposed function name, the information that needs to be passed into the function, followed by a docstring-like description, and what, if anything, is returned from the function.
  • Challenges that you may face in completing the project. Describe your personal assessment of the difficulties of your project and how your personal strengths and weaknesses will contribute or hinder your work. What will involve the largest investment of time? What resources will you need?
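
For the preliminary count of measurements mentioned in the Dataset(s) item, a short script can do the counting for you. The sketch below is one possible approach, not a required one: the name count_cells is hypothetical, and it assumes that the header row and empty cells should not count toward the 10K total.

```python
import csv
import io


def count_cells(csvfile):
    """
    Return the number of non-empty cells in an open csv file,
    not counting the header row.
    """
    reader = csv.reader(csvfile)
    next(reader, None)                      # skip the header, if any
    return sum(1 for row in reader for cell in row if cell.strip())


# Stand-in for a downloaded file; with a real file, pass
# open(filename, newline='') instead of io.StringIO(...)
sample = io.StringIO('State,Year,Deaths\nOhio,1999,10\nUtah,1999,\n')
print(count_cells(sample))   # 5
```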

Communication is critical in computer science, as it is in any other major. We will grade your proposal on its ability to convey your ideas as well as its fulfillment of the required elements and overall quality. Meaningful feedback from us will be impossible if you present a haphazard plan.

For Submission on Sunday, April 14, 2019

Submit your project proposal as a PDF of about two single-spaced pages, but no more than three, on Moodle. Organize the text into sections with section titles. Here is a checklist for your submission:

  • Overview
  • Motivation
  • Data
    • Source
    • Approximate size (number of cells to be analyzed)
  • Analysis
  • Sketches
  • Challenges

Proposal Grading

The proposal is worth 20 points, distributed as follows:

  • 10 pts: Completeness: Did you do everything that we asked you to do?
  • 10 pts: Quality: Was your writing clear? Does your document look professional in quality? Is your document structured well?

While you should wait for an instructor’s feedback on your proposal before moving to programming Part 2, you can (and should) work on Part 1 well before that date.

Final Project (To be done individually.)

For your final project, you will perform two different kinds of analyses. The first is to make sure you can nail down the basics. The second is an implementation and visualization of the answer to your posed question.

Part 1: Drug Use and Visualization

One of the current major health issues is the opioid epidemic. For Part 1, you will explore aspects of this problem by analyzing unintentional overdose deaths from 1999 to 2017. Before you begin, scroll through the provided data to understand what values are given in each column of each file. In particular, the data in overdose_X40_X44_1999_2017.csv:

  • include only deaths classified under the 10th revision of the International Classification of Diseases (ICD-10) as X40-X44, unintentional overdoses. (For more details on classification of overdose deaths, see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547584/. The full reference for the article at this link is given at the bottom of this page.)
  • do not include data for every month and year from every location. For example, no values are included for overdose deaths for Alaska before 2003. In some cases, data is suppressed when an insufficient number of deaths occur in a location over a time span. Imagine that one death occurred this month in Lewisburg and the identity of the person who died was reported in the newspaper but not the cause of death. If the CDC reported that an overdose death occurred in Lewisburg this month, the identity of the person could be determined. See https://wonder.cdc.gov/wonder/help/faq.html#Privacy for details on data suppression due to privacy concerns.

For overdose_analysis.py, you need functions to:

  • read the data from the file overdose_X40_X44_1999_2017.csv.
  • create a dictionary with the year as the key and the total of unintended overdose deaths for that year as the value.
  • print a table with the year and the total of unintended overdose deaths for that year as the value.
  • plot with Matplotlib the yearly total of unintended overdose deaths as a function of year.
  • read the population of each state from the file census2000_state_pop.csv if the year to be analyzed is 2005 or before and census2010_state_pop.csv if the year to be analyzed is 2006 or later.
  • create a dictionary with the location (state or District of Columbia) as the key and rate of unintended overdose deaths (as deaths per 100,000 residents) in a specific year as the value.
  • print a table with the location and unintended overdose death rate for that location in a specific year.
  • determine the five states with the highest rates of unintended overdose deaths for a given year.
  • draw a bar plot, with Matplotlib, showing the five states with the highest rates of unintended overdose deaths for 1999 and 2017.
  • call the other helper functions and produce the desired results. This last function should be called main. When the program runs, main should automatically be called which in turn calls functions to:
    • print a table with the year and the yearly total of unintended overdose deaths for the United States for the years 1999 to 2017.
    • plot the yearly total of unintended overdose deaths in the United States as a function of year for the years 1999 to 2017.
    • print a table with the unintended overdose death rate for 1999 for each location
    • print a table with the unintended overdose death rate for 2017 for each location
    • draw a bar plot showing the five states with the highest unintended overdose death rate for 1999 and 2017.

It is important that your program works exactly as described above. The program should automatically call a main() function when it is run. For example:

def main():
    print("This is where my code would go")

main()   # Begins running the code

Examples

To check your code and show possible formats for your results for Part 1, review the following examples.

Example 1: For Part 1, you need to determine the yearly total of unintended overdose deaths for the United States for the years 1999 to 2017. To give you an idea of how your results could be displayed, Table 2 and Figure 2 show the yearly total of unintended overdose deaths for just Pennsylvania for the years 1999 to 2017.  The data for these results come from the provided file overdose_X40_X44_1999_2017.csv.

Table 2: Unintended Overdose Deaths in Pennsylvania

(Data source: https://wonder.cdc.gov/)

Figure 2. Unintended Overdose Deaths in Pennsylvania, ICD-10 classification: X40-X44. (Data source: https://wonder.cdc.gov/)

Example 2: For Part 1, you need to determine the yearly unintended overdose death rate for each state and the District of Columbia for the years 1999 and 2017. This example shows the results for the yearly unintended overdose death rate for each state and the District of Columbia for the years 2000 and 2010, instead of 1999 and 2017, in Tables 3 and 4 and Figure 3. The data for these results come from the provided file overdose_X40_X44_1999_2017.csv.

Table 3. Unintended Overdose Death Rate (deaths per 100,000 residents) for the United States for the Year 2000 (Data source: https://wonder.cdc.gov/)

 

Table 4. Unintended Overdose Death Rate (deaths per 100,000 residents) for the United States for the Year 2010 (Data source: https://wonder.cdc.gov/)

 

Figure 3. Five states with the highest overdose death rate (deaths per 100,000 residents) for the years 2000 and 2010. (Data source: https://wonder.cdc.gov/)

All of your plots should be labeled, with proper titles, and legends when appropriate. Any csv files used in your analysis should be included along with your code that you submit in Moodle. Finally, think carefully about your code in Part 1. You should be able to use (copy-paste) the components you build for this part, such as your function for reading data from a file, into the next one as well.

Scroll down to Programming Tips and Suggestions for some help on these points.
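
As one illustration of a labeled plot, the sketch below draws yearly totals with axis labels, a title, and a legend. The function name plot_yearly_totals and the numbers are made up for the example (they are not CDC data), and the 'Agg' backend line is an assumption that lets the code run without opening a window.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: draw off-screen; remove this line to show a window
import matplotlib.pyplot as plt


def plot_yearly_totals(totals, place):
    """
    Plot yearly totals as a function of year.
    Input totals: dictionary with the year as key and total deaths as value
    Input place: string used in the title and the legend
    Return the Matplotlib axes so the caller can show or save the figure
    """
    years = sorted(totals)                      # x values in increasing order
    counts = [totals[year] for year in years]   # matching y values
    fig, ax = plt.subplots()
    ax.plot(years, counts, marker='o', label=place)
    ax.set_xlabel('Year')
    ax.set_ylabel('Unintended overdose deaths')
    ax.set_title('Unintended Overdose Deaths in ' + place)
    ax.legend()
    return ax


# Made-up values for illustration only
ax = plot_yearly_totals({1999: 10, 2000: 12, 2001: 9}, 'Example State')
```

With an interactive backend, calling plt.show() after the function call displays the figure.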

Part 2: Your Visualization and Analysis

You should complete your own analysis and data visualization (using Matplotlib) that you described in your proposal. If at some point you find that your proposed analysis was too ambitious, it’s fine to change it. Simply explain the reason(s) in your README.txt at the end of the project. The analysis to answer the question that you proposed should include at least 10K cells in the data file.

We encourage you to tackle an interesting question, not just an easy one! You will be graded partially on the creativity that you exhibit in your analysis. An analysis that requires very few changes from Part 1 will receive a lower grade than one that requires more significant changes.

This is a complex task, and will differ for different individuals, so before you get started make sure to take a good look at the feedback you get from instructors on the project proposal, and think carefully about the structure of your program.

Similar to Part 1, the program should automatically call the main() function when it is run without forcing the user to type the function name in the Python shell prompt.

For Submission on Monday, April 22, 2019

Submit the following on Moodle:

  • overdose_analysis.py, a Python program that:
    •  prints and plots the total overdose deaths in the United States from 1999 to 2017
    • prints the overdose death rates of each state and the District of Columbia for 1999 and 2017
    • graphically shows the five states with the highest overdose death rate in 1999 and 2017.
  • README.txt, a text file that:
    • provides clear instructions on how to run your program. (This should be short. Describe the files needed to run the program. Then, hopefully, you just need to instruct us to load and run your program.)
    • explains what does or doesn’t work in your project.
  • Any data files used by the program, e.g. census2000_state_pop.csv, census2010_state_pop.csv, and overdose_X40_X44_1999_2017.csv.
  • A screenshot (or screenshots) of what your visualization(s) look like when you run them on your computer. This provides us context for what you think is working, even if it doesn’t work on our computers for some reason. To avoid difficulties due to differences in your version of Python and ours, run your program in the lab with the Linux operating system before submission and ensure that your program works on the system on which your program will be tested.

For Submission on Thursday, April 25, 2019

Submit the following on Moodle:

  • my_analysis.py, a Python program that contains your analysis and visualization that you proposed. This will take a significant amount of time! Don’t put this off!
  • README.txt, a text file that:
    • gives clear instructions about how to run your analysis. (Again this should be short.)
    • explains what does or doesn’t work in your project.
  • Any data files used by the program.
  • A screenshot (or screenshots) of what your visualization(s) look like when you run them on your computer. Again, this is to give us an idea of what you think works.

For Submission on Sunday, April 28, 2019

Submit the following on Moodle by 11:55 pm on Sunday, April 28, 2019:

  • a pdf file titled “CSCI203_Final_Project.pdf” that:
    • includes a concise reflection about your experience with the project. Include a paragraph about any Python or Matplotlib skills that you learned or improved as a result of this project. If you encountered technical difficulties or changed your project from that described in your project proposal, provide a second paragraph on the roadblocks that you encountered and what your final project became.
    • includes a concise reflection on what you learned from the results of your analyses in Parts 1 and 2. Include a paragraph with answers to the questions posed for Part 1 and a second paragraph for answers to the question(s) that you posed in your proposal. For both parts, did you learn anything about a public health issue or how big datasets can be used to understand issues?
  • project files (even if you have not revised them from a previous submission):
    • overdose_analysis.py,
    • my_analysis.py,
    • any files needed to run your programs including:
      • overdose_X40_X44_1999_2017.csv
      • census2000_state_pop.csv,
      • census2010_state_pop.csv
      • files needed for doctests, and
      • data files for Part 2
  • all screenshots of what your visualizations look like when you run them on your computer.
  • README.txt, a text file that combines the text in your previously-submitted README.txt files and:
    • gives clear instructions about how to run your analysis. (Again this should be short.)
    • explains what does or doesn’t work in your project.

Email to your lecture instructor by 11:55 pm on Sunday, April 28, 2019 a copy of your favorite graphic (plot or other visualization) resulting from Part 2 of your final project.

  • Name the file with your last and first name, e.g., baish_susan.png.
  • Prepare to briefly describe in class with roughly four sentences:
    • the question that you asked for Part 2,
    • the results shown in your graphic,
    • any observations (something you learned, future questions that you would like to address on your topic, …)

Final Project Grading

  • 8 pts: How well the organization of your program’s functions separates the tasks in a manner that is easy to understand and follow.
  • 8 pts: Use of good programming style, as defined by our programming style guidelines.
  • 10 pts: Use of doctests for each function that you write.
  • 8 pts: Correct submission, including the README and all necessary data files to run the program.
  • 40 pts: The functionality of the two programs. Do they work as specified? Any run time errors?
  • 6 pts: Creativity in choice of analysis and design of the visualization.

Programming Tips and Suggestions

Reading in text file(s)

The following code can be used to read data from the file census2000_state_pop.csv or census2010_state_pop.csv. Each file includes two columns of data: location (a state or the District of Columbia) and the population. The columns with indices 0 and 1 correspond to location and population, respectively.

import csv

def read_state_pop_file(desired_year):
    '''
    Function to read a csv file.
    Input desired_year: an integer representing year under analysis
    If the desired year is <= 2005, the 2000 census data is used.
    If the desired year is after 2005, the 2010 census data is used.
    Column 0 of the file has the state
    Column 1 of the file has the population of the state
    Return lists of state, population
    The index of a state in the list state is the same
    index of the population of the same state in the list population.
    '''
    if desired_year <= 2005:                            # select file by desired_year
        filename = 'census2000_state_pop.csv'
    else:
        filename = 'census2010_state_pop.csv'
    state = []                                          # initialize state list
    population = []                                     # initialize population list
    with open(filename, newline='') as csvfile:         # creates a file object
        csvreader = csv.reader(csvfile, delimiter=',')  # set for reading csv file
        next(csvreader)                                 # skips header
        for row in csvreader:
            if row[0] and row[1]:               # check for data
                state += [row[0]]               # adds state name as a string in the list
                population += [int(row[1])]     # adds state population as an integer in the list 
    return state, population

Column headers or missing data?

Your code needs to be able to handle column headers and missing measurements. Examine the above function read_state_pop_file to see how these issues are handled. The line next(csvreader) skips the first row, the header, in the data file. Since Python treats an empty string as False, the line if row[0] and row[1]: checks that neither the state nor the population is an empty string.
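
A cell can also be non-empty and still not be a number, for example a text flag marking a suppressed value. The sketch below shows one way to guard against that. The helper name to_int_or_none and the flag text 'Suppressed' are assumptions for illustration; check your own data file for what actually appears.

```python
def to_int_or_none(cell):
    """
    Convert one csv cell (a string) to an integer.
    Return None when the cell is empty or is not a whole number.
    >>> to_int_or_none('42')
    42
    >>> to_int_or_none('') is None
    True
    """
    try:
        return int(cell)
    except ValueError:       # raised for '', 'Suppressed', '1,234', ...
        return None


# Made-up rows: location, deaths
rows = [['Ohio', '12'], ['Alaska', ''], ['Utah', 'Suppressed']]
usable = [row for row in rows if to_int_or_none(row[1]) is not None]
print(usable)   # [['Ohio', '12']]
```

Comparing with None rather than testing truthiness keeps a legitimate count of 0 from being discarded.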

A Dictionary with an Order?

Dictionaries can be very handy for your project. For example, imagine that you need a dictionary called d with keys that are locations (states and the District of Columbia) and the values are the populations for the location. (This dictionary could be useful in determining the overdose death rate, deaths per 100,000 residents of a state.)  The following code creates such a dictionary.

def state_pop_d(state, population):
    """
    Return a dictionary with the state as the key
    and the population of the state as the value
    Input state: list of states and District of Columbia
    Input population: list of population of
    corresponding state or District of Columbia
    Return d: a dictionary
    """
    d = {}             # create an empty dictionary
    n = len(state)     # determine length of the list with locations
    for i in range(n):  # add each location as a key and population as a value
        d[state[i]] = population[i]
    return d

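The Python dictionary created by the above function is not sorted by key or by value. (Starting with Python 3.7, a dictionary keeps its items in the order in which they were added, but that is not the same as sorted order.) Similarly, the dictionary d created below is not sorted.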

>>> # Create a small dictionary
>>> d = {'Ohio': 11353140, 'Alaska': 626932, 'Utah': 2233169, 'Maine': 1274923}

However, you can use something called an ordered dictionary to sort a dictionary by the keys or the values in ascending or descending order. For example, the dictionary d created in the above sample code could be ordered by the keys, the states, in alphabetically ascending or descending order. Or you could order your dictionary by the value, the state population, in ascending or descending numerical order. To create an ordered dictionary, you need to import OrderedDict as shown below. Here are a few examples of how a Python ordered dictionary works.

>>> from collections import OrderedDict
>>> ord_by_value_ascend = OrderedDict(sorted(d.items(), key=lambda t: t[1], reverse=False))
>>> ord_by_value_ascend
OrderedDict([('Alaska', 626932), ('Maine', 1274923), ('Utah', 2233169), ('Ohio', 11353140)])
>>> # Now the states are ordered from least populous state to most.
>>> ord_by_key_descend = OrderedDict(sorted(d.items(), key=lambda t: t[0], reverse=True))
>>> ord_by_key_descend
OrderedDict([('Utah', 2233169), ('Ohio', 11353140), ('Maine', 1274923), ('Alaska', 626932)])
>>> # Now the states are ordered in reverse alphabetical order.
>>> ord_by_key_ascend = OrderedDict(sorted(d.items(), key=lambda t: t[0], reverse=False))
>>> ord_by_key_ascend
OrderedDict([('Alaska', 626932), ('Maine', 1274923), ('Ohio', 11353140), ('Utah', 2233169)])
>>> # Now the states are in alphabetical order.

To order by the keys, use t[0] in the above code. To order by the values, use t[1] in the code. Whether the values are ascending or descending is determined by the reverse parameter with True for descending and False for ascending. The function order_d takes a dictionary as an argument and returns a dictionary ordered on the value from largest to smallest. Here is the code for order_d:

from collections import OrderedDict
def order_d(d):
    """
    Returns an ordered dictionary
    The returned dictionary is ordered on the value of the original dictionary,
    from highest value to lowest value
    Input: d, an unordered dictionary
    """
    return OrderedDict(sorted(d.items(), key=lambda t: t[1], reverse=True))

Here is an example of a function call of order_d:

>>> order_d(d)
OrderedDict([('Ohio', 11353140), ('Utah', 2233169), ('Maine', 1274923), ('Alaska', 626932)])
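
Once a dictionary is ordered from largest to smallest value, the entries with the highest values are simply the first few items. The sketch below shows one way to take them; the function name top_n is hypothetical, and the same idea would apply to a dictionary of death rates.

```python
from collections import OrderedDict


def top_n(d, n):
    """
    Return a list of the n (key, value) pairs with the largest values,
    ordered from largest to smallest value.
    """
    ordered = OrderedDict(sorted(d.items(), key=lambda t: t[1], reverse=True))
    return list(ordered.items())[:n]     # slice off the first n pairs


d = {'Ohio': 11353140, 'Alaska': 626932, 'Utah': 2233169, 'Maine': 1274923}
print(top_n(d, 2))   # [('Ohio', 11353140), ('Utah', 2233169)]
```

If n is larger than the number of items, the slice returns every pair without raising an error.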

Printing a Dictionary

The function print_d uses the Python dictionary method items to loop through each of the keys and the corresponding values and then print the keys and values. The code for the function is:

def print_d(d):
    """
    Print the keys and values of the dictionary
    The keys are assumed to be locations.
    The values are assumed to be population of the location.
    """
    # print header for table of values
    print(36*'-')
    print(' {0:^20s} {1:^14s} '.format("Location", "Population"))
    print(36*'-')
    # print keys and values
    for key, value in d.items():
        print(' {0:^20s} {1:^14,} '.format(key, value))

Putting It All Together

For both of your programs, overdose_analysis.py and my_analysis.py, you need a main function that executes when you run each program. The main function is different for each program. An example of a main function showing how to call read_state_pop_file, state_pop_d, and order_d and use the returned values is shown below.

def main():
    '''
    Function to call helper functions that
    read population data,
    create a dictionary with the location and corresponding population
    order the locations from largest to smallest populations
    print the locations and corresponding populations
    '''
    # Provide year for analysis
    desired_year = 2000
    # Read location and corresponding population for 2000
    state, population = read_state_pop_file(desired_year)
    # Create a dictionary with the location as the key
    # and population as the value
    d = state_pop_d(state, population)
    # Order the dictionary
    ord_d = order_d(d)
    # Print the dictionary
    print_d(ord_d)  

Running this code produces:

>>> main()
------------------------------------
       Location         Population   
------------------------------------
      California        33,871,648   
        Texas           20,851,820   
       New York         18,976,457   
       Florida          15,982,378   
       Illinois         12,419,293   
     Pennsylvania       12,281,054   
         Ohio           11,353,140   
       Michigan         9,938,444    
      New Jersey        8,414,350    
       Georgia          8,186,453    
    North Carolina      8,049,313    
       Virginia         7,078,515    
    Massachusetts       6,349,097    
       Indiana          6,080,485    
      Washington        5,894,121    
      Tennessee         5,689,283    
       Missouri         5,595,211    
      Wisconsin         5,363,675    
       Maryland         5,296,486    
       Arizona          5,130,632    
      Minnesota         4,919,479    
      Louisiana         4,468,976    
       Alabama          4,447,100    
       Colorado         4,301,261    
       Kentucky         4,041,769    
    South Carolina      4,012,012    
       Oklahoma         3,450,654    
        Oregon          3,421,399    
     Connecticut        3,405,565    
         Iowa           2,926,324    
     Mississippi        2,844,658    
        Kansas          2,688,418    
       Arkansas         2,673,400    
         Utah           2,233,169    
        Nevada          1,998,257    
      New Mexico        1,819,046    
    West Virginia       1,808,344    
       Nebraska         1,711,263    
        Idaho           1,293,953    
        Maine           1,274,923    
    New Hampshire       1,235,786    
        Hawaii          1,211,537    
     Rhode Island       1,048,319    
       Montana           902,195     
       Delaware          783,600     
     South Dakota        754,844     
     North Dakota        642,200     
        Alaska           626,932     
       Vermont           608,827     
 District of Columbia    572,059     
       Wyoming           493,782     



Writing Good Code

Your project will be judged not only on whether your code works, but also on whether you wrote good code. Refer to Python Style Requirements on the home page for general information, but here are the main points:

  • Make sure that each of your Python functions encapsulates one particular defined task. A function shouldn’t do too much at once. If one of your functions ends up with many lines of code, that is probably a good sign that you should split it up into different functions.
  • Each function should have a docstring.
  • Each function should have tests in the docstring (much like we’ve been doing in lab). When a doctest is inappropriate (for example, it involves graphical output), you may describe how you tested it in a sentence or two. If you write doctests as you go (instead of at the end), it should save you an enormous amount of time.
  • Avoid magic numbers and global variables!
  • Use descriptive function and variable names. A function name like printBoard is more descriptive than pb. A variable name like colors or numColors is more descriptive than c or x.
  • Don’t repeat yourself. If you have the same code more than once, it probably means that you should move that code into a function and then call it from wherever you need it.
  • Comments should be used for any code that isn’t obvious.
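
Several of the points above (one task per function, a docstring with tests, no magic numbers) can be seen together in a small sketch. The function death_rate is a made-up example rather than a required part of the project, and the guarded block at the bottom is one common way to run the doctests when the file is executed directly.

```python
import doctest

PER_100K = 100000   # named constant instead of a magic number


def death_rate(deaths, population):
    """
    Return the death rate as deaths per 100,000 residents.
    Input deaths: number of deaths (integer)
    Input population: number of residents (integer greater than 0)

    >>> death_rate(5, 100000)
    5.0
    >>> death_rate(0, 2500)
    0.0
    """
    return deaths * PER_100K / population


if __name__ == '__main__':
    doctest.testmod()   # reports only the doctests that fail
```

In your own programs, the same guarded block is also a natural place for the call to main().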

Reference

Slavova, S., O’Brien, D. B., Creppage, K., Dao, D., Fondario, A., Haile, E., Hume, B., Largo, T., Nguyen, C., Sabel, J., Wright, D., and members of the Council of State and Territorial Epidemiologists Overdose Subcommittee (2015). “Drug Overdose Deaths: Let’s Get Specific”. Public health reports (Washington, D.C. : 1974), 130(4), 339–342.
