Lab 03: Recursion
0. Credits
The lab content is the result of collective work of CSCI 204 instructors.
1. Objectives
In this lab, you will practice the following:
- Understanding the basic structure of Linux directories and files;
- Implementing a recursive solution to traverse the Linux file system.
2. Preparation
Begin by opening a terminal window and creating a directory for this lab in your course directory.
cd ~/
cd csci204/
cd labs
mkdir lab03
cd lab03
Then copy all files, including the skeleton program dirs.py, from the course Linux directory using the following command. If you prefer to work on your own computer, you can use the scp command to copy the files from the Linux server instead.
cp -rp ~csci204/student-labs/lab03/* .
The command-line option -r means to copy recursively, since there is another directory with files inside the lab03 directory. The option -p means to preserve file attributes, such as the time stamps of the files and directories.
There are a few text files other than the Python program in the directory. They can help you practice the Linux diff command as described below.
Linux command diff
Linux provides many useful commands for programmers, and diff is one of them. The diff command compares two text files and prints out the differences between the two. It can be used in many ways. For example, you can use it to see whether the output of your program is the same as, or very similar to, the required output. Please read the following post to understand the general meaning of diff output.
https://unix.stackexchange.com/questions/81998/understanding-of-diff-output
After reading the post, try the diff command on the four text files you just copied. For example:
diff file1.txt file2.txt
diff file1.txt file3.txt
diff file2.txt file4.txt
Make sure you can explain the output of these comparisons.
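A comparison similar in spirit to diff can also be run from Python. The following is a minimal sketch, assuming the standard difflib module. The helper name diff_lines and the sample lists are made up for illustration, and the unified format it prints is not the same as the default output of the Linux diff command.

```python
import difflib


def diff_lines(expected, actual):
    """Return unified-diff lines describing how actual differs from expected."""
    return list(difflib.unified_diff(expected, actual,
                                     fromfile="expected", tofile="actual",
                                     lineterm=""))


# Hypothetical program output compared against the required output.
expected = ["index.html", "readme.txt", "test.html"]
actual = ["index.html", "README.txt", "test.html"]

for line in diff_lines(expected, actual):
    print(line)
```

Lines that start with "-" appear only in the expected list, and lines that start with "+" appear only in the actual list. Two identical lists produce no output at all.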
3. Recursively Listing Files
Recursive algorithms are natural for solutions to problems with hierarchical structure. An example problem is listing all the files in a directory and all of its sub-directories. Since the Linux file system is hierarchical, we should immediately think of using a recursive approach. You can see this in action with the following command.
ls -R ~csci204/student-labs/lab03/testpages/
The command lists all files and directories recursively. As you can probably guess, the option -R tells ls to descend recursively through the directory tree.
For this part of the lab, you will write a Python program that lists all of the files in a directory and, recursively, in all of its sub-directories, in a fashion similar to the ls -R command.
Read the code in dirs.py, which you copied at the beginning of the lab, to get an idea of what is involved in the program. Then try the following commands with the program in a Linux terminal. If you are running your program from IDLE, Spyder, or another IDE, read the comments at the end of the program and revise the program accordingly. Note that IDLE or Spyder may not work with paths that contain the tilde character ~. In these cases, you have to use the full path. To find the full path of a directory, use the command pwd, e.g.,
cd ~/
pwd
You should see the full path to your home directory.
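The tilde can also be expanded inside Python itself. The following is a small sketch, assuming the standard os.path functions expanduser() and abspath(); the helper name full_path is made up for illustration and is not part of dirs.py.

```python
import os.path


def full_path(path):
    """Expand a leading ~ and return an absolute version of path."""
    return os.path.abspath(os.path.expanduser(path))


# Prints the full path of the home directory, like the pwd command above.
print(full_path("~"))
print(full_path("~/csci204"))
```

A helper like this lets the same path string work from a terminal and from an IDE that does not expand the tilde on its own.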
The following description assumes the use of command-line Python. For Spyder or IDLE, the operations are similar, except that the tilde character has to be replaced by a full path.
python dirs.py dirs.py
python dirs.py ../
python dirs.py ~/csci204
Observe the behavior of the program as it stands before you make any modifications. The given version of dirs.py lists the names of the files and sub-directories in a directory, but does not recursively list the files under any sub-directories. You'll notice that it checks whether any command-line arguments are passed to it. If no argument is given, the program prints a usage message asking the user to supply one.
You can also run the Linux command to list the above directories recursively, as a comparison to what dirs.py does. Try the following commands:
ls -R ../
ls -R ~/csci204
You are to modify the dirs.py program to recursively print the names of all the files in all sub-directories.
Here are some details.
- You must use a recursive solution.
- File names should be printed one per line.
- Just before any recursive calls, print out the message "-- Entering [dir-name]", where your program should fill in the directory name for "dir-name". The function os.path.basename() will extract the directory name from the path argument. Note that if the path ends with a '/' character, this function returns an empty string, so it is worth adding a check that removes the trailing '/'.
- Just after any recursive calls, print out "-- Leaving [dir-name]", where your program should fill in the directory name.
- Python provides a few useful functions in the os module and the os.path module that you need to use (read the relevant Python online documentation for details).
- You need to import the os and os.path modules. Since you do not know which parts of them you want, use an * to get all the parts:
from os import *
from os.path import *
- The os module has a function called listdir() that returns a Python list of the contents of a directory. The list contains both files and sub-directories. Read this link to understand how it works: http://www.tutorialspoint.com/python/os_listdir.htm
- While a directory contains a list of files or sub-directories, a file doesn't contain anything that can be listed. Thus you can't use the function listdir() to generate a list if the argument is itself a file. To check whether an object is a file or a directory, you can use the functions isfile() and isdir() in the os.path module, respectively.
Not sure how they work? Search for them online! Lots of people use them, and there is also official Python documentation. Search for os.path.isfile().
TIP: when checking whether an object is a file or a directory, give isfile() and isdir() a full, absolute path. A bare name or relative path is interpreted relative to the current working directory, which is usually not the directory you are listing. A relative or shorthand path looks like one of the following:
~csci204/2019-fall/, or ~/message.txt, or ../readme.txt
while an absolute path is a complete path from the root directory, such as:
/home/accounts/student/s/sam023/hello.txt
The problem is that listdir() doesn't give you an absolute path (ugh!). It gives you only the file name or directory name. The good news is that os.path has you covered:
os.path.abspath(curr_dir_name) - takes the name of a directory as input and converts it to an absolute path;
os.path.join(absolute_path, file_or_dir_name) - intelligently combines an absolute path with your file or directory name.
- To check your program's output, compare your result with that of the following Linux command:
ls -R ~csci204/student-labs/lab03/testpages/
where the -R option is used to recursively list sub-directories encountered. Your listing should contain the same folders and files listed (albeit in a different format, or a different order since Python doesn't list files in a fixed order.)
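The path-handling details above can be tried out in isolation before they go into dirs.py. The following is a minimal sketch, not the lab solution: the helper names dir_name and classify are made up for illustration, and the demonstration runs on a throwaway directory rather than on testpages. It uses import os instead of from os import * so that the built-in open() is not shadowed by os.open().

```python
import os
import tempfile


def dir_name(path):
    """Return the last component of path, ignoring trailing '/' characters."""
    return os.path.basename(path.rstrip("/"))


def classify(dir_path):
    """Map each name returned by listdir() to 'dir', 'file', or 'other'."""
    base = os.path.abspath(dir_path)
    kinds = {}
    for name in os.listdir(base):
        full = os.path.join(base, name)  # listdir() gives bare names only
        if os.path.isdir(full):
            kinds[name] = "dir"
        elif os.path.isfile(full):
            kinds[name] = "file"
        else:
            kinds[name] = "other"
    return kinds


# Try the helpers on a throwaway directory (the names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "level2"))
    with open(os.path.join(tmp, "readme.txt"), "w") as f:
        f.write("hello\n")
    print(dir_name(tmp + "/"))
    print(classify(tmp))
```

Note that dir_name() still returns an empty string for the root directory "/", so a real program may want to treat that case separately.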
Here is a sample output:
--Entering testpages
grading.html
index.html
level2
readme.txt
test2.html
test.html
--Entering level2
page1.html
page2.html
--Leaving level2
--Leaving testpages
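The Entering/Leaving pattern in the sample output can be explored without touching the file system. The sketch below is an illustration only: it uses a nested dictionary as a hypothetical stand-in for a directory tree (None marks a file, a dictionary marks a sub-directory), and the function name walk is made up. Your program must work on real directories with listdir().

```python
def walk(name, tree, lines):
    """Append the listing of tree (a dict standing in for a directory) to lines."""
    lines.append("--Entering " + name)
    for child in tree:                 # first list every entry by name
        lines.append(child)
    for child in tree:                 # then recurse into each sub-directory
        if tree[child] is not None:
            walk(child, tree[child], lines)
    lines.append("--Leaving " + name)
    return lines


# Hypothetical stand-in for the testpages directory.
testpages = {
    "grading.html": None,
    "index.html": None,
    "level2": {"page1.html": None, "page2.html": None},
    "readme.txt": None,
    "test2.html": None,
    "test.html": None,
}

for line in walk("testpages", testpages, []):
    print(line)
```

Each call prints its own Entering message first and its own Leaving message last, so the messages for a sub-directory always nest inside those of its parent.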
Run your program on at least two more directories, each of which has files and one or more sub-directories. You must test your program using ~csci204/student-labs/lab03/testpages/. Remember to deal with the trailing '/'. Make sure your program is well commented. Save and upload your program to Moodle.
4. Get File Statistics When Traversing Directories
If you list files on Linux using commands such as ls -l or ls -lt, you will find that other file statistics are printed, including the size of each file and the date when it was last modified. Try these commands in your own home directory.
ls -l ~/
ls -lt ~/
For now, we will concentrate on just two pieces of information: the size of a file and the date when the file was last modified. These are the two middle columns in the above listing. For example, the first file, grading.html, in the testpages directory was last modified on September 11, 2019 and has a size of 667 bytes, or 667 characters, since this is a text file in which each character is one byte long.
Your tasks
Create a new program based on the dirs.py program you just finished, so that the new program can count the total number of bytes all the files use on the disk and report the maximum and minimum file sizes, as well as the oldest and newest time stamps of the files in the directories your program visits.
The basic logic of the program is to update the maximum and minimum sizes, as well as the oldest and newest time stamps, each time a file is visited. Once you know the size and the time stamp of a file, you can compare them with the maximum and minimum values seen so far.
Make a copy of your existing program and name the new program dirattrib.py, for directory attributes. Modify dirattrib.py so that it can accomplish the following tasks.
4.1 Develop a FileStats class
- Since you are going to collect a series of statistics, create a new Python class FileStats with the following data attributes: max_size, min_size, oldest_time, and newest_time.
- You should define three methods within FileStats.
- The constructor, where you define your data attributes;
- The print_results(self) method, which prints the statistics in the following format.
maximum size of files : 667
minimum size of files : 53
oldest time : Thu Aug 29 15:56:41 2019
newest time : Thu Aug 29 16:07:47 2019
- An update(self, filename) method, which retrieves the file statistics and updates the values held in your FileStats object.
- To retrieve file statistics, Python's os module provides a function called lstat(). The function lstat() returns an object. Among other pieces of information, the returned object contains the size of the file and the time stamp when the file was last modified. These two data members are called st_size and st_mtime. You can use them to collect the required statistics. Read the relevant Python documentation to make sure you understand how to access these values.
- The number of bytes a file takes is an integer, so you can print the maximum and minimum sizes directly when the directory traversal is completed. However, the time when a file was last modified is stored as the number of seconds since the epoch (January 1, 1970), which is a huge value that is hard for a human being to read.
For example, a time stamp for March 16, 2011 at about 1:30 in the afternoon would read something like 1300296729.762571 seconds. The time module in Python provides a function called ctime() that converts a value in seconds to a human-readable time such as Wed Mar 16 13:32:09 2011. While you should use the time value directly to find the minimum and maximum (oldest and newest time), you must use the ctime() function to print these values in a human-readable format. Remember to import the time module for these tasks.
4.2 Other Modifications
Complete the following tasks to make the program work.
- Modify the list_dir() method so that it takes a FileStats object as a parameter. We ask that the list_dir() method recursively call itself, passing your updated FileStats object each time.
- Initialize a FileStats object in the main() method before calling the list_dir() method. During the creation of your FileStats object, you'll be initializing its data attributes. What should those initial values be?
Here are two hints.
- Python provides sys.maxsize as a reasonable maximum integer value. (Note that there is really no limit on the values of integers in Python.)
- On Linux systems, file size is measured in bytes and time is measured as the number of seconds since January 1, 1970. Both are non-negative values.
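The two hints above point toward a common pattern for running minima and maxima: start each value at an extreme that any real measurement will beat. The following is a sketch of that pattern on a plain list of numbers; the function name size_range and the sample sizes are made up for illustration, and adapting the idea to the FileStats attributes is left to you.

```python
import sys


def size_range(sizes):
    """Return (smallest, largest) among non-negative integer sizes."""
    smallest = sys.maxsize   # assumed to exceed any real file size
    largest = 0              # sizes are never negative
    for size in sizes:
        if size < smallest:
            smallest = size
        if size > largest:
            largest = size
    return smallest, largest


print(size_range([667, 53, 210]))
```

If no sizes are seen at all, the starting values are returned unchanged, which is one way to detect an empty traversal.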
The Final Product
Show that your program works correctly on at least two directory listings, each of which must have multiple levels of directories. The first must be the directory
~csci204/student-labs/lab03/testpages/
The test run should show the following values:
maximum size of files : 667
minimum size of files : 53
oldest time : Wed Sep 11 20:51:49 2019
newest time : Thu Sep 12 09:02:40 2019
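Sizes and time stamps like these come from os.lstat() and are formatted with time.ctime(). The following is a minimal sketch on a throwaway file, not on the testpages directory: the helper name size_and_mtime, the file name sample.txt, and the use of os.utime() to set a known modification time are all assumptions made for the demonstration.

```python
import os
import tempfile
import time


def size_and_mtime(path):
    """Return (size in bytes, last-modified time in seconds since the epoch)."""
    info = os.lstat(path)
    return info.st_size, info.st_mtime


# Build a throwaway 53-byte file with a known modification time.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"x" * 53)
    os.utime(path, (1300296729, 1300296729))  # (access time, modification time)
    size, mtime = size_and_mtime(path)
    print("size in bytes  :", size)
    print("raw time stamp :", mtime)
    print("readable time  :", time.ctime(mtime))
```

The readable time printed by ctime() depends on the time zone of the machine, so the same time stamp can display differently on different systems.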
Save your newly completed program dirattrib.py, as well as your modified dirs.py file, and upload both to Moodle.
Congratulations! You just completed the lab exercises!