Final Project Spring 2019
Data Visualization
Data visualization is a powerful mode of communication that is increasingly becoming a must-have tool for anyone who deals with data. Your brain is much better at interpreting visual representations of data than at understanding pages of numbers. A well-designed visualization of data conveys enormous amounts of information much faster than lists in a spreadsheet. For example, imagine that you need to decide which geographical areas in the United States were hardest hit by the opioid epidemic. Table 1 and Figure 1 present the same data, but a quick glance at Figure 1 reveals that many of the states with the highest rates of overdose deaths coincide with what is commonly referred to as the “rust belt” in the United States. Trends or relationships that are lost when viewing columns of thousands or millions of numbers in a spreadsheet jump out with the appropriate graphical representation.
Table 1. Source: https://www.cdc.gov/drugoverdose/data/statedeaths.html
Figure 1. Drug Overdose Death Rate per 100,000 Residents for 2017. For an explanation and similar maps, see https://www.cdc.gov/drugoverdose/data/statedeaths.html.
Final Project Overview
Through the CSCI 203 Final Project, you will gain experience with data visualization. Specifically, you will determine the answers to several questions by analyzing big data sets with Python and displaying the answers graphically using Matplotlib, a 2-D plotting tool. For Part 1 of the CSCI 203 Final Project, you will answer the following questions:
- How has the yearly total number of unintentional overdose deaths varied from 1999 through 2017?
- Compare the states with the five highest rates (deaths per 100,000 people) of unintentional overdose deaths in 1999 with those in 2017. Did the states with the highest rates of unintentional overdose deaths change from 1999 to 2017? Were the values of the largest rates in 1999 different from those in 2017?
For Part 2 of the CSCI 203 Final Project, you will pose your own question about a public-health issue that can be answered by analyzing data obtained from the Centers for Disease Control and Prevention website called CDC Wonder at https://wonder.cdc.gov/. You will then design and implement a graphical representation of your answer using Matplotlib.
For both parts, you will need to write Python functions and incorporate techniques learned this semester. The following text explains the requirements for the project proposal and the project.
Due Dates
- Project Proposal: Sunday, April 14, 2019
- Final Project
- Part 1: Monday, April 22, 2019
- Part 2 (code): Thursday, April 25, 2019
- Part 2 (summary, final revisions): Sunday, April 28, 2019
Provided Data for Part 1
The linked zip folder part_1_project_files_spring_2019 has the following files to be analyzed for Part 1:
- overdose_X40_X44_1999_2017.csv, a spreadsheet with the date, month, and location of unintentional overdose deaths from 1999 to 2017, obtained from https://wonder.cdc.gov/controller/datarequest/D76.
- United States census data from https://factfinder.census.gov/faces/nav/jsf/pages/community_facts.xhtml:
- census2000_state_pop.csv, census data from the year 2000
- census2010_state_pop.csv, census data from the year 2010
Project Preparation
Before you write your proposal, you should read this entire document to understand the scope of the project. Along with reading this document to fully grasp the expectations for the final project and looking at https://wonder.cdc.gov/ for ideas of available datasets, you should:
- look over the tutorial for Matplotlib that we have created for you,
- read instructions of how to install Matplotlib if you’d like to install the library on your own computer,
- read the Programming Tips and Suggestions section of this document for some hints, and
- look over the provided datasets to see the data format.
Project Proposal (To be done individually.)
Your project proposal should describe the functions that you will write for both parts of the CSCI 203 Final Project. For Part 2, you need to pose a question to be answered using data from the CDC Wonder website and describe the functions needed to analyze the data and visualize the answer to your posed question.
Details on Part 1 of the Proposal
For your proposal, you should describe the functions that you will use in Part 1 to complete the required elements. Propose names for your Python functions and include docstrings describing what will be passed into each function and what, if anything, will be returned. While we explain Part 1 in significantly more detail later in the document, below is a quick summary:
For Part 1, you will analyze unintended overdose deaths in the United States over the years 1999 to 2017. The data is located in the file overdose_X40_X44_1999_2017.csv.
To answer the question “How has the yearly number of unintentional overdose deaths varied from 1999 to 2017?”, you need to:
- create a Python dictionary with the year as the key and the total of unintended overdose deaths for that year as the value.
- plot the yearly total of unintended overdose deaths as a function of year using Matplotlib.
To compare the states with the five highest rates (deaths per 100,000 people) of unintentional overdose deaths in 1999 with those in 2017, you need to:
- create a Python dictionary with the location (state or District of Columbia) as the key and rate of unintended overdose deaths (as deaths per 100,000 residents) in 1999 as the value.
- create a Python dictionary with the location (state or District of Columbia) as the key and rate of unintended overdose deaths (as deaths per 100,000 residents) in 2017 as the value.
- draw a bar plot, with Matplotlib, showing the five states with the highest rate of unintended overdose deaths for 1999 and 2017.
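The rate in the bullets above is the death count divided by the population and then scaled to 100,000 residents: rate = deaths × 100,000 / population. The sketch below is a minimal version. It assumes you already have one dictionary of deaths and one of populations, both keyed by the same location names. The names RATE_BASE, rate_per_100k, and rate_dict are hypothetical, and the numbers are made up.

```python
RATE_BASE = 100000  # rates are expressed per 100,000 residents


def rate_per_100k(deaths, population):
    """Return the number of deaths per 100,000 residents."""
    # Multiply before dividing, which avoids an extra rounding step
    return deaths * RATE_BASE / population


def rate_dict(deaths_d, pop_d):
    """Return {location: deaths per 100,000} for locations in both dictionaries.

    Locations missing from the population dictionary are skipped
    rather than guessed.
    """
    rates = {}
    for location, deaths in deaths_d.items():
        if location in pop_d:
            rates[location] = rate_per_100k(deaths, pop_d[location])
    return rates


# Hypothetical numbers, for illustration only
deaths_by_state = {'A': 30, 'B': 10}
pop_by_state = {'A': 600000, 'B': 500000, 'C': 250000}
print(rate_dict(deaths_by_state, pop_by_state))  # {'A': 5.0, 'B': 2.0}
```

The population file has more locations than the death counts when some counts are missing. Looping over the deaths dictionary therefore keeps the rate dictionary limited to locations that actually have data.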
Details on Part 2 of the Proposal
For Part 2 of the proposal, you will:
- pose a public-health related question,
- select a dataset from https://wonder.cdc.gov/ corresponding to your area of inquiry,
- outline functions to be saved in a file named my_analysis.py to
- analyze the data with Python to answer your question, and
- illustrate your results using Matplotlib.
For ideas of public-health issues, go to cdc.gov. We encourage you to think ambitiously here. The amount of available data is vast, as are the many questions that you could ask. The idea is that you are using your new skills to investigate a question that is meaningful to you.
After formulating a question about public health, thoughtfully consider the steps needed to complete your analysis. Your proposal should be broken into the following sections:
- Overview: What question(s) do you wish to answer?
- Motivation: What led you to this question and/or why do you find this question interesting?
- Dataset(s) that you will analyze including the type of public-health data and the source of the data.
- Select a dataset that aligns with the overarching question(s) that you are planning to answer.
- Go to https://wonder.cdc.gov/ and click on a topic at this link related to your area of inquiry.
- For some topics, you need to agree to the terms of use before being forwarded to data selection menus.
- Include a preliminary count of the number of measurements in your dataset. The total number of cells in your spreadsheet should be at least 10K. If you cannot find 10K from one downloaded dataset, consider combining more than one dataset in your study.
- Analysis: How will you analyze your data? There are many potential analyses of data, such as determination of maximum values, minimum values, means, running averages, or variance. Your analysis for Part 2 should be significantly different from that done in Part 1. As with any good analysis, think of an interesting question first. Then, determine which kinds of analysis will help you answer that question.
- Hand-drawn sketch with annotations of how you would like to visualize your data. You can scan or take a picture of your hand-drawn sketch and insert it in your proposal. Remember to include labels on your axes, if axes are used in your graphic, to show what you will be plotting. Since you do not know the results of your analysis, guess how your final plots might look and sketch that. You will probably want to flip through existing plots on Matplotlib to get a sense of what’s possible: http://matplotlib.org/gallery.html. We encourage you to learn new ways to plot or show your results from viewing the code given in the Matplotlib gallery. Remember to always cite any code used from a resource by providing the reference in your docstring.
- List of major functions that you will need for your project. Think through the logical components you need to accomplish your analysis. What high-level functions do you need? We don’t expect you to perfectly imagine all the functions that you need, but we do expect you to think carefully about the complexity of your problem. Include the proposed function name, the information that needs to be passed into the function, followed by a docstring-like description, and what, if anything, is returned from the function.
- Challenges that you may face in completing the project. Describe your personal assessment of the difficulties of your project and how your personal strengths and weaknesses will contribute to or hinder your work. What will involve the largest investment of time? What resources will you need?
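For the preliminary cell count, a short script can tally rows, columns, and non-empty cells in a downloaded file. Rows times columns gives the total number of cells. The sketch below is minimal and assumes a comma-separated file with a single header row. count_cells is a hypothetical name, and the sample file is made up.

```python
import csv
import os
import tempfile


def count_cells(filename):
    """Return (rows, columns, non_empty_cells) for a CSV file, header excluded."""
    rows = 0
    max_columns = 0
    non_empty = 0
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row, if there is one
        for row in reader:
            rows += 1
            max_columns = max(max_columns, len(row))
            non_empty += sum(1 for cell in row if cell.strip())
    return rows, max_columns, non_empty


# Build a tiny throwaway file to show the output (hypothetical data)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    with open(path, 'w', newline='') as f:
        f.write('Year,Deaths\n1999,10\n2000,\n')
    sample_counts = count_cells(path)
print(sample_counts)  # (2, 2, 3)
```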
Communication is critical in computer science, as it is in any other major. We will grade your proposal on its ability to convey your ideas as well as its fulfillment of the required elements and overall quality. Meaningful feedback from us will be impossible if you present a haphazard plan.
For Submission on Sunday, April 14, 2019
Submit your project proposal as a PDF of about two single-spaced pages, but no more than three, on Moodle. Organize the text into sections with section titles. Here is a checklist for your submission:
- Overview
- Motivation
- Data
- Source
- Approximate size (number of cells to be analyzed)
- Analysis
- Sketches
- Challenges
Proposal Grading
The proposal is worth 20 points, distributed as follows:
- 10 pts: Completeness: Did you do everything that we asked you to do?
- 10 pts: Quality: Was your writing clear? Does your document look professional in quality? Is your document structured well?
While you should wait for an instructor’s feedback on your proposal before moving to programming Part 2, you can (and should) work on Part 1 well before that date.
Final Project (To be done individually.)
For your final project, you will perform two different kinds of analyses. The first is to make sure you can nail down the basics. The second is an implementation and visualization of the answer to your posed question.
Part 1: Drug Use and Visualization
One of the current major health issues is the opioid epidemic. For Part 1, you will explore aspects of this problem by analyzing unintentional overdose deaths from 1999 to 2017. Before you begin, scroll through the provided data to understand what values are given in each column of each file. In particular, the data in overdose_X40_X44_1999_2017.csv:
- include only deaths classified under the 10th revision of the International Classification of Diseases (ICD-10) as X40-X44, unintentional overdoses. (For more details on classification of overdose deaths, see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547584/. The full reference for the article at this link is given at the bottom of this page.)
- do not include data for every month and year from every location. For example, no values are included for overdose deaths for Alaska before 2003. In some cases, data is suppressed when an insufficient number of deaths occur in a location over a time span. Imagine that one death occurred this month in Lewisburg and the identity of the person who died was reported in the newspaper but not the cause of death. If the CDC reported that an overdose death occurred in Lewisburg this month, the identity of the person could be determined. See https://wonder.cdc.gov/wonder/help/faq.html#Privacy for details on data suppression due to privacy concerns.
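Some cells are blank or suppressed, so it helps to convert each count in one place and decide explicitly what a missing value means. The sketch below is minimal. It assumes a missing count shows up either as an empty cell or as a word such as Suppressed or Missing, so check the actual file before relying on it. MISSING_MARKERS and parse_count are hypothetical names.

```python
# Assumed spellings of a missing count (compared in lowercase)
MISSING_MARKERS = {'', 'suppressed', 'missing'}


def parse_count(cell):
    """Return cell as an int, or None if it is blank or marked as missing."""
    text = cell.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return int(text.replace(',', ''))  # tolerate thousands separators


cells = ['12', '', 'Suppressed', '1,024']
counts = [parse_count(c) for c in cells]
print(counts)  # [12, None, None, 1024]
print(sum(c for c in counts if c is not None))  # 1036
```

Returning None instead of 0 keeps "no data" distinct from "zero deaths", so later code can skip missing values deliberately.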
For overdose_analysis.py, you need functions to:
- read the data from the file overdose_X40_X44_1999_2017.csv.
- create a dictionary with the year as the key and the total of unintended overdose deaths for that year as the value.
- print a table with the year and the total of unintended overdose deaths for that year.
- plot with Matplotlib the yearly total of unintended overdose deaths as a function of year.
- read the population of each state from the file census2000_state_pop.csv if the year to be analyzed is 2005 or before and census2010_state_pop.csv if the year to be analyzed is 2006 or later.
- create a dictionary with the location (state or District of Columbia) as the key and rate of unintended overdose deaths (as deaths per 100,000 residents) in a specific year as the value.
- print a table with the location and unintended overdose death rate for that location in a specific year.
- determine the five states with the highest rates of unintended overdose deaths for a given year.
- draw a bar plot, with Matplotlib, showing the five states with the highest rates of unintended overdose deaths for 1999 and 2017.
- call the other helper functions and produce the desired results. This last function should be called main. When the program runs, main should automatically be called, which in turn calls functions to:
- print a table with the year and the yearly total of unintended overdose deaths for the United States for the years 1999 to 2017.
- plot the yearly total of unintended overdose deaths in the United States as a function of year for the years 1999 to 2017.
- print a table with the unintended overdose death rate for 1999 for each location
- print a table with the unintended overdose death rate for 2017 for each location
- draw a bar plot showing the five states with the highest unintended overdose death rate for 1999 and 2017.
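Two of the helpers listed above, the yearly totals and the top five, can be sketched without depending on the file format. The sketch assumes each data row has already been turned into a (year, deaths) pair, with None for suppressed or blank counts. The names yearly_totals and top_n are hypothetical, and the data is made up.

```python
def yearly_totals(records):
    """Return {year: total deaths} from (year, deaths) pairs.

    Pairs whose deaths value is None (missing or suppressed) are skipped.
    """
    totals = {}
    for year, deaths in records:
        if deaths is None:
            continue
        totals[year] = totals.get(year, 0) + deaths
    return totals


def top_n(d, n=5):
    """Return the n (key, value) pairs with the largest values, largest first."""
    return sorted(d.items(), key=lambda item: item[1], reverse=True)[:n]


records = [(1999, 10), (1999, 5), (2000, None), (2000, 7)]
print(yearly_totals(records))  # {1999: 15, 2000: 7}

rates = {'A': 3.1, 'B': 9.4, 'C': 5.0, 'D': 1.2, 'E': 7.7, 'F': 6.6}
print(top_n(rates))  # [('B', 9.4), ('E', 7.7), ('F', 6.6), ('C', 5.0), ('A', 3.1)]
```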
It is important that your program works exactly as described above. The program should automatically call a main() function when it is run. For example:

def main():
    print("This is where my code would go")


main()  # Begins running the code
Examples
To check your code and show possible formats for your results for Part 1, review the following examples.
Example 1: For Part 1, you need to determine the yearly total of unintended overdose deaths for the United States for the years 1999 to 2017. To give you an idea of how your results could be displayed, Table 2 and Figure 2 show the yearly total of unintended overdose deaths for just Pennsylvania for the years 1999 to 2017. The data for these results come from the provided file overdose_X40_X44_1999_2017.csv.
Table 2. Unintended Overdose Deaths in Pennsylvania (Data source: https://wonder.cdc.gov/)
Figure 2. Unintended Overdose Deaths in Pennsylvania, ICD-10 classification: X40-X44. (Data source: https://wonder.cdc.gov/)
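A plot like Figure 2 needs the years in order on the x-axis, axis labels, a title, and a legend. The sketch below is minimal and assumes the yearly totals are already in a dictionary keyed by year. plot_yearly_totals is a hypothetical name, and the sample values are made up. The Agg backend draws off-screen so the sketch runs without a display. Remove the matplotlib.use('Agg') line when you want plt.show() to open a window.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; remove to use plt.show()
import matplotlib.pyplot as plt


def plot_yearly_totals(totals, title):
    """Plot {year: total deaths} as a labeled line plot and return the Figure."""
    years = sorted(totals)  # plot in chronological order
    values = [totals[year] for year in years]
    fig, ax = plt.subplots()
    ax.plot(years, values, marker='o', label='Unintended overdose deaths')
    ax.set_xlabel('Year')
    ax.set_ylabel('Number of deaths')
    ax.set_title(title)
    ax.legend()
    return fig


fig = plot_yearly_totals({1999: 100, 2000: 120, 2001: 150},
                         'Example totals (made-up data)')
# fig.savefig('yearly_totals.png') writes the plot to a file for a screenshot
```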
Example 2: For Part 1, you need to determine the yearly unintended overdose death rate for each state and the District of Columbia for the years 1999 and 2017. This example shows the results for the yearly unintended overdose death rate for each state and the District of Columbia for the years 2000 and 2010, instead of 1999 and 2017, in Tables 3 and 4 and Figure 3. The data for these results come from the provided file overdose_X40_X44_1999_2017.csv.
Table 3. Unintended Overdose Death Rate (deaths per 100,000 residents) for the United States for the Year 2000 (Data source: https://wonder.cdc.gov/)
Table 4. Unintended Overdose Death Rate (deaths per 100,000 residents) for the United States for the Year 2010 (Data source: https://wonder.cdc.gov/)
Figure 3. Five states with the highest overdose death rate (deaths per 100,000 residents) for the years 2000 and 2010. (Data source: https://wonder.cdc.gov/)
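The five states can differ between the two years, so one way to draw a chart like Figure 3 is two side-by-side bar charts that share a y-axis. Shared axes keep the bar heights comparable. The sketch below assumes the top-five (state, rate) lists for each year are already computed. plot_top_five is a hypothetical name, and the values are made up. A single axis with grouped bars would be an equally valid design.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; remove to use plt.show()
import matplotlib.pyplot as plt


def plot_top_five(top_by_year):
    """Draw one bar chart per year from {year: [(state, rate), ...]}.

    Returns the Figure; the panels share a y-axis so heights are comparable.
    """
    years = sorted(top_by_year)
    fig, axes = plt.subplots(1, len(years), sharey=True, squeeze=False)
    for ax, year in zip(axes[0], years):
        states = [state for state, rate in top_by_year[year]]
        rates = [rate for state, rate in top_by_year[year]]
        positions = list(range(len(states)))
        ax.bar(positions, rates)
        ax.set_xticks(positions)
        ax.set_xticklabels(states, rotation=45, ha='right')
        ax.set_title(str(year))
        ax.set_xlabel('State')
    axes[0][0].set_ylabel('Deaths per 100,000 residents')
    fig.suptitle('Five highest unintended overdose death rates')
    fig.tight_layout()
    return fig


example = {
    2000: [('A', 9.0), ('B', 8.5), ('C', 7.1), ('D', 6.0), ('E', 5.2)],
    2010: [('F', 20.3), ('A', 15.0), ('G', 14.2), ('B', 13.9), ('H', 12.0)],
}  # made-up values, for illustration only
fig = plot_top_five(example)
```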
All of your plots should be labeled, with proper titles and, when appropriate, legends. Any csv files used in your analysis should be included with the code that you submit on Moodle. Finally, think carefully about your code in Part 1. You should be able to reuse (copy and paste) the components you build for this part, such as your function for reading data from a file, in Part 2 as well.
Scroll down to Programming Tips and Suggestions for some help on these points.
Part 2: Your Visualization and Analysis
You should complete your own analysis and data visualization (using Matplotlib) that you described in your proposal. If at some point you find that your proposed analysis was too ambitious, it’s fine to change it. Simply explain the reason(s) in your README.txt at the end of the project. The analysis to answer the question that you proposed should include at least 10K cells in the data file.
We encourage you to tackle an interesting question, not just an easy one! You will be graded partially on the creativity that you exhibit in your analysis. An analysis that requires very few changes from Part 1 will receive a lower grade than one that requires more significant changes.
This is a complex task, and will differ for different individuals, so before you get started make sure to take a good look at the feedback you get from instructors on the project proposal, and think carefully about the structure of your program.
Similar to Part 1, the program should automatically call the main() function when it is run, without forcing the user to type the function name at the Python shell prompt.
For Submission on Monday, April 22, 2019
Submit the following on Moodle:
- overdose_analysis.py, a Python program that:
- prints and plots the total overdose deaths in the United States from 1999 to 2017
- prints the overdose death rates of each state and the District of Columbia for 1999 and 2017
- graphically shows the five states with the highest overdose death rate in 1999 and 2017.
- README.txt, a text file that:
- provides clear instructions on how to run your program. (This should be short. Describe the files needed to run the program. Then, hopefully, you just need to instruct us to load and run your program.)
- explains what does or doesn’t work in your project.
- Any data files used by the program, e.g., census2000_state_pop.csv, census2010_state_pop.csv, and overdose_X40_X44_1999_2017.csv.
- A screenshot (or screenshots) of what your visualization(s) look like when you run them on your computer. This provides us context for what you think is working, even if it doesn’t work on our computers for some reason. To avoid difficulties due to differences between your version of Python and ours, run your program in the lab with the Linux operating system before submission and ensure that your program works on the system on which it will be tested.
For Submission on Thursday, April 25, 2019
Submit the following on Moodle:
- my_analysis.py, a Python program that contains your analysis and visualization that you proposed. This will take a significant amount of time! Don’t put this off!
- README.txt, a text file that:
- gives clear instructions about how to run your analysis. (Again, this should be short.)
- explains what does or doesn’t work in your project.
- Any data files used by the program.
- A screenshot (or screenshots) of what your visualization(s) look like when you run them on your computer. Again, this is to give us an idea of what you think works.
For Submission on Sunday, April 28, 2019
Submit the following on Moodle by 11:55 pm on Sunday, April 28, 2019:
- a pdf file titled “CSCI203_Final_Project.pdf” that:
- includes a concise reflection about your experience with the project. Include a paragraph about any Python or Matplotlib skills that you learned or improved as a result of this project. If you encountered technical difficulties or changed your project from that described in your project proposal, provide a second paragraph on the roadblocks that you encountered and what your final project became.
- includes a concise reflection on what you learned from the results of your analyses in Parts 1 and 2. Include a paragraph with answers to the questions posed for Part 1 and a second paragraph for answers to the question(s) that you posed in your proposal. For both parts, did you learn anything about a public health issue or how big datasets can be used to understand issues?
- project files (even if you have not revised them from a previous submission):
- overdose_analysis.py,
- my_analysis.py,
- any files needed to run your programs including:
- overdose_X40_X44_1999_2017.csv
- census2000_state_pop.csv
- census2010_state_pop.csv
- files needed for doctests, and
- data files for Part 2
- all screenshots of what your visualizations look like when you run them on your computer.
- README.txt, a text file that combines the text in your previously submitted README.txt files and:
- gives clear instructions about how to run your analysis. (Again, this should be short.)
- explains what does or doesn’t work in your project.
Email your lecture instructor a copy of your favorite graphic (plot or other visualization) resulting from Part 2 of your final project by 11:55 pm on Sunday, April 28, 2019.
- Name the file with your last and first name, e.g., baish_susan.png.
- Prepare to briefly describe in class, in roughly four sentences:
- the question that you asked for Part 2,
- the results shown in your graphic,
- any observations (something you learned, future questions that you would like to address on your topic, …)
Final Project Grading
- 8 pts: How well the organization of your program’s functions separates the tasks in a manner that is easy to understand and follow.
- 8 pts: Use of good programming style, as defined by our programming style guidelines.
- 10 pts: Use of doctests for each function that you write.
- 8 pts: Correct submission, including the README and all necessary data files to run the program.
- 40 pts: The functionality of the two programs. Do they work as specified? Any run time errors?
- 6 pts: Creativity in choice of analysis and design of the visualization.
Programming Tips and Suggestions
Reading in text file(s)
The following code can be used to read data from the file census2000_state_pop.csv or census2010_state_pop.csv, each of which includes two columns of data: location (a state or the District of Columbia) and population. The columns with indices 0 and 1 correspond to location and population, respectively.
import csv


def read_state_pop_file(desired_year):
    '''
    Function to read a csv file.
    Input desired_year: an integer representing year under analysis
    If the desired year is <= 2005, the 2000 census data is used.
    If the desired year is after 2005, the 2010 census data is used.
    Column 0 of the file has the state
    Column 1 of the file has the population of the state
    Return lists of state, population
    The index of a state in the list state is the same index of the
    population of the same state in the list population.
    '''
    if desired_year <= 2005:  # select file by desired_year
        filename = 'census2000_state_pop.csv'
    else:
        filename = 'census2010_state_pop.csv'
    state = []       # initialize state list
    population = []  # initialize population list
    with open(filename, newline='') as csvfile:  # creates a file object
        csvreader = csv.reader(csvfile, delimiter=',')  # set for reading csv file
        next(csvreader)  # skips header
        for row in csvreader:
            # skip blank or short rows, then check for data in both columns
            if len(row) >= 2 and row[0] and row[1]:
                state += [row[0]]  # adds state name as a string in the list
                population += [int(row[1])]  # adds state population as an integer
    return state, population
Column headers or missing data?
Your code needs to be able to handle column headers and missing measurements. Examine the above function read_state_pop_file to see how these issues are handled. The line next(csvreader) skips the first row, the header, in the data file. Since Python treats an empty string as False, the condition row[0] and row[1] in the if statement checks that neither the state nor the population is an empty string.
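An alternative to skipping the header with next is csv.DictReader. It reads the header row itself and returns each following row as a dictionary keyed by column name, which also lets you build the location-to-population dictionary in one step. The sketch below is minimal. The column names State and Population and the function name read_two_columns are assumptions, so match them to the actual header in your file.

```python
import csv
import os
import tempfile


def read_two_columns(filename, key_column, value_column):
    """Return {key: int(value)} from a CSV file, looking columns up by header name.

    Rows where either cell is empty are skipped, as in read_state_pop_file.
    """
    result = {}
    with open(filename, newline='') as csvfile:
        for row in csv.DictReader(csvfile):  # the header row supplies the keys
            key = row.get(key_column)
            value = row.get(value_column)
            if key and value:  # empty strings and missing cells are falsy
                result[key] = int(value.replace(',', ''))
    return result


# Tiny throwaway file with one empty row (hypothetical data)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pop.csv')
    with open(path, 'w', newline='') as f:
        f.write('State,Population\nOhio,11353140\n,\nUtah,2233169\n')
    sample_pop = read_two_columns(path, 'State', 'Population')
print(sample_pop)  # {'Ohio': 11353140, 'Utah': 2233169}
```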
A Dictionary with an Order?
Dictionaries can be very handy for your project. For example, imagine that you need a dictionary called d whose keys are locations (states and the District of Columbia) and whose values are the populations of those locations. (This dictionary could be useful in determining the overdose death rate, deaths per 100,000 residents of a state.) The following code creates such a dictionary.
def state_pop_d(state, population):
    """
    Return a dictionary with the state as the key and the population
    of the state as the value
    Input state: list of states and District of Columbia
    Input population: list of population of corresponding state or District of Columbia
    Return d: a dictionary
    """
    d = {}  # create an empty dictionary
    n = len(state)  # determine length of the list with locations
    for i in range(n):
        # add each location as a key and population as a value
        d[state[i]] = population[i]
    return d
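The Python dictionary created by the above function does not keep its keys and values in any sorted order. (Since Python 3.7, a dictionary remembers the order in which keys were inserted, but it never sorts itself.) Similarly, the dictionary d created below is not sorted.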
>>> # Create a small dictionary
>>> d = {'Ohio': 11353140, 'Alaska': 626932, 'Utah': 2233169, 'Maine': 1274923}
However, you can combine the built-in function sorted with an ordered dictionary to arrange a dictionary’s items by the keys or by the values, in ascending or descending order. For example, the dictionary d created in the above sample code could be ordered by the keys, the states, in alphabetically ascending or descending order. Or you could order your dictionary by the value, the state population, in ascending or descending numerical order. To create an ordered dictionary, you need to import OrderedDict as shown below. Here are a few examples of how a Python ordered dictionary works.
>>> from collections import OrderedDict
>>> ord_by_value_ascend = OrderedDict(sorted(d.items(), key=lambda t: t[1], reverse=False))
>>> ord_by_value_ascend
OrderedDict([('Alaska', 626932), ('Maine', 1274923), ('Utah', 2233169), ('Ohio', 11353140)])
>>> # Now the states are ordered from least populous state to most.
>>> ord_by_key_descend = OrderedDict(sorted(d.items(), key=lambda t: t[0], reverse=True))
>>> ord_by_key_descend
OrderedDict([('Utah', 2233169), ('Ohio', 11353140), ('Maine', 1274923), ('Alaska', 626932)])
>>> # Now the states are ordered in reverse alphabetical order.
>>> ord_by_key_ascend = OrderedDict(sorted(d.items(), key=lambda t: t[0], reverse=False))
>>> ord_by_key_ascend
OrderedDict([('Alaska', 626932), ('Maine', 1274923), ('Ohio', 11353140), ('Utah', 2233169)])
>>> # Now the states are in alphabetical order
To order by the keys, use t[0] in the above code. To order by the values, use t[1] in the code. Whether the order is ascending or descending is determined by the reverse parameter: True for descending and False for ascending. The function order_d takes a dictionary as an argument and returns a dictionary ordered on the value from largest to smallest. Here is the code for order_d:
from collections import OrderedDict


def order_d(d):
    """
    Returns an ordered dictionary
    The returned dictionary is ordered on the value of the original
    dictionary, from highest value to lowest value
    Input: d, an unordered dictionary
    """
    return OrderedDict(sorted(d.items(), key=lambda t: t[1], reverse=True))
Here is an example of a function call of order_d:
>>> order_d(d)
OrderedDict([('Ohio', 11353140), ('Utah', 2233169), ('Maine', 1274923), ('Alaska', 626932)])
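On Python 3.7 and later, a regular dict also keeps insertion order, so the same sorted call can build an ordinary dictionary. operator.itemgetter(1) does the same job as lambda t: t[1]. The sketch below shows that variant; order_d_plain is a hypothetical name. OrderedDict remains a fine choice and behaves the same way for this purpose.

```python
from operator import itemgetter


def order_d_plain(d):
    """Return a new dict ordered from highest value to lowest value.

    Relies on plain dicts keeping insertion order (guaranteed since Python 3.7).
    """
    return dict(sorted(d.items(), key=itemgetter(1), reverse=True))


d = {'Ohio': 11353140, 'Alaska': 626932, 'Utah': 2233169, 'Maine': 1274923}
print(order_d_plain(d))
# {'Ohio': 11353140, 'Utah': 2233169, 'Maine': 1274923, 'Alaska': 626932}
print(list(order_d_plain(d))[:2])  # ['Ohio', 'Utah']
```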
Printing a Dictionary
The function print_d uses the Python dictionary method items to loop through each of the keys and the corresponding values and then print the keys and values. The code for the function is:
def print_d(d):
    """
    Print the keys and values of the dictionary
    The keys are assumed to be locations.
    The values are assumed to be population of the location.
    """
    # print header for table of values
    print(36*'-')
    print(' {0:^20s} {1:^14s} '.format("Location", "Population"))
    print(36*'-')
    # print keys and values
    for key, value in d.items():
        print(' {0:^20s} {1:^14,} '.format(key, value))
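The rate tables (Tables 3 and 4) can use the same format-string approach with a float format such as .2f, which prints two decimal places. The sketch below assumes the rates are floats. Left-aligning the names and right-aligning the numbers keeps the decimal points lined up. print_rate_table is a hypothetical name, and the sample rates are made up.

```python
def print_rate_table(rate_d, year):
    """Print locations and death rates, with rates rounded to two decimals."""
    width = 36
    print(width * '-')
    print(' {0:<20s} {1:>13s} '.format('Location', 'Rate ({0})'.format(year)))
    print(width * '-')
    for location, rate in rate_d.items():
        print(' {0:<20s} {1:>13.2f} '.format(location, rate))


print_rate_table({'State A': 12.5, 'State B': 3.25}, 2017)
```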
Putting It All Together
For both of your programs, overdose_analysis.py and my_analysis.py, you need a main function that executes when you run each program. The main function is different for each program. An example of a main function showing how to call read_state_pop_file, state_pop_d, order_d, and print_d and use the returned values is shown below.
def main():
    '''
    Function to call helper functions that
    read population data,
    create a dictionary with the location and corresponding population,
    order the locations from largest to smallest populations, and
    print the locations and corresponding populations
    '''
    # Provide year for analysis
    desired_year = 2000
    # Read location and corresponding population for 2000
    state, population = read_state_pop_file(desired_year)
    # Create a dictionary with the location as the key
    # and population as the value
    d = state_pop_d(state, population)
    # Order the dictionary
    ord_d = order_d(d)
    # Print the dictionary
    print_d(ord_d)
Running this code produces:
>>> main()
------------------------------------
       Location         Population
------------------------------------
      California        33,871,648
        Texas           20,851,820
       New York         18,976,457
       Florida          15,982,378
       Illinois         12,419,293
     Pennsylvania       12,281,054
         Ohio           11,353,140
       Michigan         9,938,444
      New Jersey        8,414,350
       Georgia          8,186,453
    North Carolina      8,049,313
       Virginia         7,078,515
    Massachusetts       6,349,097
       Indiana          6,080,485
      Washington        5,894,121
      Tennessee         5,689,283
       Missouri         5,595,211
      Wisconsin         5,363,675
       Maryland         5,296,486
       Arizona          5,130,632
      Minnesota         4,919,479
      Louisiana         4,468,976
       Alabama          4,447,100
       Colorado         4,301,261
       Kentucky         4,041,769
    South Carolina      4,012,012
       Oklahoma         3,450,654
        Oregon          3,421,399
     Connecticut        3,405,565
         Iowa           2,926,324
     Mississippi        2,844,658
        Kansas          2,688,418
       Arkansas         2,673,400
         Utah           2,233,169
        Nevada          1,998,257
      New Mexico        1,819,046
    West Virginia       1,808,344
       Nebraska         1,711,263
        Idaho           1,293,953
        Maine           1,274,923
    New Hampshire       1,235,786
        Hawaii          1,211,537
     Rhode Island       1,048,319
       Montana           902,195
       Delaware          783,600
     South Dakota        754,844
     North Dakota        642,200
        Alaska           626,932
       Vermont           608,827
 District of Columbia    572,059
       Wyoming           493,782
Writing Good Code
Your project will be judged not only on whether your code works, but also on whether you wrote good code. Refer to Python Style Requirements on the home page for general information, but here are the main points:
- Make sure that each of your Python functions encapsulates one particular defined task. A function shouldn’t do too much at once. If one of your functions ends up with many lines of code, that is probably a good sign that you should split it up into different functions.
- Each function should have a docstring
- Each function should have tests in the docstring (much like we’ve been doing in lab). When a doctest is inappropriate (for example, it involves graphical output), you may describe how you tested it in a sentence or two. If you write doctests as you go (instead of at the end), it should save you an enormous amount of time.
- Avoid magic numbers and global variables!
- Use descriptive function and variable names. A function name like printBoard is more descriptive than pb. A variable name like colors or numColors is more descriptive than c or x.
- Don’t repeat yourself. If you have the same code more than once, it probably means that you should move that code into a function and then call it from wherever you need it.
- Comments should be used for any code that isn’t obvious.
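Doctests can be run with the standard doctest module. The sketch below is minimal, and total_known and the body of main are hypothetical. doctest.testmod() finds the examples in the module’s docstrings and prints a report only when one fails. If you adopt the __name__ guard shown here, placing the main() call inside it still runs main automatically when the file is executed as a program. Importing the file does not run it.

```python
def total_known(counts):
    """Return the sum of the counts, ignoring None (missing or suppressed) entries.

    >>> total_known([12, None, 30])
    42
    >>> total_known([])
    0
    """
    return sum(count for count in counts if count is not None)


def main():
    """Run the analysis (placeholder for the real calls)."""
    print(total_known([12, None, 30]))


if __name__ == '__main__':
    import doctest
    doctest.testmod()  # silent unless a doctest fails
    main()             # still runs automatically when the file is executed
```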
Reference
Slavova, S., O’Brien, D. B., Creppage, K., Dao, D., Fondario, A., Haile, E., Hume, B., Largo, T., Nguyen, C., Sabel, J., Wright, D., and members of the Council of State and Territorial Epidemiologists Overdose Subcommittee (2015). “Drug Overdose Deaths: Let’s Get Specific”. Public Health Reports (Washington, D.C.: 1974), 130(4), 339–342.