CSCI 363 — Computer Networks -- Labs

Lab 01 : HTTP Protocol and C Programming Tools of GDB and Valgrind

Goals

Review how to use GDB, the GNU Project Debugger, within Emacs.
Learn or review how to use Valgrind, a tool to detect memory management bugs.
Learn some basics of the HTTP protocol and program them in C.

Setup

By now you should have set up your GitLab account, created your csci363 project (directory), and shared it with the instructor. If you haven't, please do so before continuing. You can visit this file for some brief guidelines.
Open a terminal window and run the following sequence of commands (the directory "csci363" was created when you set up GitLab):

cd csci363

mkdir labs

mkdir labs/lab01

git add labs

git commit -m "lab01 created"

git push

cd labs/lab01

Now, in your csci363/labs/lab01 directory, create a file to contain answers to the questions in this assignment using the command below.

touch answers.txt

After running this command in your shell, open the file in a text editor such as vi or emacs and write down the lab information including lab number, your name, and the date of the lab. This should be a standard heading that you are required to use in all lab reports.

Copy the files for this lab to your lab01 directory

cp -r ~cs363/Spring16/student/labs/lab01/* .
Note that there is a space and a dot '.' at the end of the command, and that the '-r' option means copy recursively, including all sub-directories. You should see four files (dlist.c, dlist.h, Makefile, test_dlist.c) and a directory simple-web-server-client, which contains a collection of program files.

Problem 1: Using GDB to find where segmentation faults are

The programs you just copied consist of an implementation of a simple doubly linked list and a test program. You are asked to compile and run the test program, then find and fix the bugs in the implementation using GDB.

Compile and run the program.

make

./test_dlist < /usr/share/dict/words

Run the program multiple times. You will see that the program behaves normally. If you encounter an error (e.g., a segmentation fault), ignore it; running the program again will most likely result in normal output.

Now use an editor such as vi or emacs to examine the program. You will find that the program reads from standard input one word at a time and inserts each word into a doubly linked list. After it finishes reading and building the list, the program traverses the list and prints out all the words it has read. The number printed before each word can be considered an order, or a count. Its meaning isn't significant in this program. The program then removes a word at a random position in the list and prints the list again.

To make a good test, modify the program test_dlist.c so that it removes the very first node and the very last node in the list, in addition to removing a node at a random spot in the list. In this program, the very first node stores word[0] and the very last node stores word[NUM_WORD-1].

You will now encounter a segmentation fault when removing the first or the last node from the list. You can certainly use printf to find out where the errors are, but a debugging tool such as gdb is much more flexible and easier to use.

Debug programs with GDB

GDB is a very powerful debugging tool. Programmers can use gdb to examine the state of a program, such as the value of a variable, the address held in a pointer, and the elements of a structure, among others. One can also set the value of a variable and thus alter the execution sequence of the program. You have been using gdb since taking CSCI 206, so consider this exercise a review, and learn to use gdb within the emacs editor if you have not used it this way before. General information about GDB and emacs is widely available on the web. We will work with a subset of commonly used gdb features. This website has a list of these commands. This file contains an abbreviated list that is easier to use.

In order to use gdb, you need to compile your program with the -g option, e.g.,

gcc -c -g myprog.c
gcc -o myprog myprog.o

Then you can run gdb either as an independent program

gdb myprog

or run it within the emacs editor as in the following exercise.

Compile the programs you copied from the course directory: make
Start the emacs editor: emacs
Start gdb: M-x gdb followed by the name of the program you want to debug, e.g., test_dlist. The sequence M-x is read as meta-x, meaning that you type Esc followed by the letter x. This sequence allows you to type a command in emacs.
Once gdb is running, you can take various actions with its commands. Here are some of the common ones. You should try them out with the compiled program test_dlist.

Setting breakpoints: You can set the break point by a program line number, or by a function name. Try the following. (Note that (gdb) is the prompt.)

(gdb) b main to break at the main function.
(gdb) b dlist_create to break at the function dlist_create
(gdb) b dlist.c:56 to break at line 56 in the file dlist.c. This is at the end of the function dlist_insert.

Running a program: Once you set the break points at the places you want, you can run the program. You can certainly add break points as you go. Type the following to run the program.
(gdb) r < /usr/share/dict/words

The above command executes the program within gdb and takes the file /usr/share/dict/words as the standard input.
Examining various values: With break points set, the program in gdb will stop at these points. You can examine or set various values at any break point. Once done with the actions at a break point, you can continue the execution of the program by simply typing c for continue. Try the following. (At this point, the program should stop at the first line of the main function.)
(gdb) p argc to check the value of the first parameter of the main function. You should see
```
(gdb) p argc
$1 = 1
(gdb) 
```
(gdb) c to continue the program. The program will stop at the second break point, which is in the function dlist_create. Let's examine the variable new_list.
(gdb) whatis new_list
(gdb) p new_list

The first command reports the type of the variable new_list; the second prints its value. You should see the following.
```
(gdb) whatis new_list
type = struct dlist *
(gdb) p new_list
$2 = (struct dlist *) 0x7fffffffd950
(gdb) 
```
(gdb) c to continue the program. The program will stop at the third break point which is at the end of the function dlist_insert. Let's examine more values.
(gdb) p *the_list which prints the value of *the_list
(gdb) p a_node which prints the value of a_node
(gdb) p *a_node which prints the value of *a_node

What is the difference between *a_node and a_node? What are they?

You can continue the program and stop at this break point a couple more times.
Stepping through the program: If you'd like to go through the program one line at a time, you can simply type the next command in gdb. Try a few times with the command.

(gdb) n
(gdb) n
(gdb) n
Viewing breakpoints: You can see a list of break points by the command info.

(gdb) info b
(gdb) info break
Deleting breakpoints: You can delete break points by specifying which break point to delete. For example, if you want to delete the first break point in the main() function, do the following.

(gdb) d 1
Disabling or enabling breakpoints: Sometimes you want to temporarily disable a break point without deleting it, and later on you can enable the break point again. Use the command disable/enable.

(gdb) disable 2
(gdb) enable 2
Finding out where the current line of execution is: You can find out where the current line of execution is by the command where. This is especially useful if you encounter a segmentation fault or a bus error.
(gdb) where
Automating the execution of a sequence of commands: Often when debugging a program, we want to examine certain values many times (e.g., while traversing a long list), and it is tedious to stop and print the values one at a time. You can use the following gdb commands to automate the process. The next time you stop at break point 3 (at the end of the function dlist_insert), type the following command.

(gdb) command
Type commands for breakpoint(s) 3, one per line.
End with a line saying just "end".
>p *the_list
>p a_node
>p *a_node
>c
>end
(gdb)

Here you defined a set of commands to be executed at this break point without stopping the program because the last command you issued is a continuation of the program execution. If you issue a continuation command (c) now, the program will continue and print information specified in the above commands.
Leaving GDB: When you are done with gdb, simply type the command quit or q.

(gdb) q

Now that you have reviewed the commonly used gdb commands, let's put that knowledge to use.

Your work

Do the following.

Edit the program test_dlist.c so that, in addition to deleting a node at a random spot in the list, it also deletes the first and last nodes in the list. After all, it is critical to test the boundary conditions. The first and last nodes can be specified by deleting the nodes with the first word word[0] and the last word word[NUM_WORD - 1]. (You have done this step above when discussing GDB.)
Compile and run the program again. Now your program will result in the infamous segmentation fault. Although the segmentation faults here are simple and you could use other schemes to find them, you are asked to use gdb to locate the problem(s). Copy and paste into answers.txt the gdb message you see after you identify the location of the segmentation fault using the gdb command where. Explain briefly what causes the segmentation faults and how you fix the problem(s). Label this part of your answer as Problem 1.1.
Revise the program dlist.c so that the segmentation faults mentioned above won't happen again. Make sure the driver program test_dlist.c prints proper information when removing the first node and the last node. Copy and paste the now correct output into answers.txt. Label this part of your answer as Problem 1.2.

Problem 2: Using Valgrind to eliminate memory leaks

You have corrected the segmentation fault problem in your program, and the program now runs fine and does what it is supposed to do. However, the program still has memory leaks: it does not release memory that is no longer needed. If you visually inspect the program, you will see that it has a number of calls to malloc() but no calls to free(). Again, in this relatively simple program, you could probably fix the problem without other tools by simply adding free() calls at the proper locations. But we will use a tool, valgrind, to help identify the memory leaks. You can then fix these problems.

Your work

Compile the program. (No special C flags are needed to use valgrind.) Run the program with valgrind.

% make
% valgrind --leak-check=full ./test_dlist < /usr/share/dict/words

You will see the report from valgrind, something similar to the following.

==13203== 
==13203== HEAP SUMMARY:
==13203==     in use at exit: 771 bytes in 41 blocks
==13203==   total heap usage: 41 allocs, 0 frees, 771 bytes allocated
==13203== 
==13203== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 1 of 4
==13203==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==13203==    by 0x40093A: make_node (dlist.c:126)
==13203==    by 0x400A2F: main (test_dlist.c:33)
==13203== 
==13203== 735 (16 direct, 719 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 4
==13203==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==13203==    by 0x400775: dlist_create (dlist.c:28)
==13203==    by 0x4009C5: main (test_dlist.c:23)
==13203== 
==13203== LEAK SUMMARY:
==13203==    definitely lost: 48 bytes in 2 blocks
==13203==    indirectly lost: 723 bytes in 39 blocks
==13203==      possibly lost: 0 bytes in 0 blocks
==13203==    still reachable: 0 bytes in 0 blocks
==13203==         suppressed: 0 bytes in 0 blocks
==13203== 
==13203== For counts of detected and suppressed errors, rerun with: -v
==13203== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6)

Some heading information from valgrind and the normal output of your program are not included here. This report clearly tells you that while calls to malloc() or the like are made, no memory is freed, resulting in memory leaks. Your work now is to put calls to free() in the proper places to release the allocated memory blocks that are no longer needed.

You should do the following.

Read manual pages of library calls malloc() and free() to review what they do.
Put calls to free() in the proper places so that the program still works correctly and no memory leaks remain.

When done properly, you should see something similar to the following from valgrind.

==13410== 
==13410== HEAP SUMMARY:
==13410==     in use at exit: 0 bytes in 0 blocks
==13410==   total heap usage: 201 allocs, 201 frees, 3,761 bytes allocated
==13410== 
==13410== All heap blocks were freed -- no leaks are possible
==13410== 
==13410== For counts of detected and suppressed errors, rerun with: -v
==13410== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

Include in your answers.txt as Problem 2 a screen capture (text copied and pasted from the screen, not an actual image) of executing the following two commands.

% make
% valgrind --leak-check=full ./test_dlist < /usr/share/dict/words

Clear and commit your work

Clear the directory by make realclean, then add, commit, and push the files dlist.c, dlist.h, Makefile, test_dlist.c to your git repository.

Problem 3: Working with HTTP (Hypertext Transfer Protocol)

HTTP has been the most popular application protocol on the web in the last decade. It uses a collection of simple, text-based commands to send and receive files such as web pages between a web server such as www.google.com and a client such as a browser. All web browsers follow the HTTP protocol to communicate with a web server to retrieve web pages. You can consult many web resources (e.g., http://www.w3.org/Protocols/ or https://www.jmarshall.com/easy/http/) for the details of the protocol. In this exercise, you will experiment with the protocol by sending text-based HTTP commands to a simple web server and observing its behavior. You will then augment the server program so that it can serve the HEAD request.

The two most frequently used commands in HTTP are GET and POST (method names are case-sensitive and written in uppercase). The GET command requests a file from the web server, and the POST command sends information to the server for processing, e.g., submitting a form.

The basic flow of work is described as follows.

An HTTP server is up and running.
On the client side, a socket connection is established.
The client can then request a web page from the server by sending, at a minimum, the following line-based text commands (HTTP supports many other commands):
```
GET <path> HTTP/1.1\r\n
Host: <host name where the server is running>\r\n
\r\n
```
where <path> is the path of the web page to be retrieved and the Host line gives the name of the computer where the server is running. The end of the client request is signaled by an empty line, that is, \r\n by itself. For example, the command
```
GET /index.html HTTP/1.1\r\n
Host: brki164-lnx-10\r\n
\r\n
```
asks for the web page index.html from the server that is running on the computer brki164-lnx-10.
The client can also send data to the server and ask for processing by sending the following line-based text commands,
```
POST <path-to-action> HTTP/1.1\r\n
Host: <host name where the server is running>\r\n
name1=value1&name2=value2\r\n
\r\n
```
where <path-to-action> specifies the path to a pre-defined action on the server, and name1=value1&name2=value2 are the name/value pairs submitted by the client for processing. The client may submit as many pairs as needed. Upon receiving the request, the server will extract the data and take the proper action based on the requested <path-to-action>. For example, the command
```
POST /form HTTP/1.1\r\n
Host: brki164-lnx-10\r\n
Name=XMeng&Major=CS&Status=Graduated\r\n
\r\n
```
asks the server to take action on the three submitted pairs of data (see the following lab exercises for the meaning of actions here).

Your work

Compile, run, and experiment with the programs in the directory of simple-web-server-client which is a part of the programs you copied at the beginning of the lab. Specifically, do the following.

Compile and run the server,
% cd simple-web-server-client
% make
% ./webserver <port-number>

where <port-number> should be one of your assigned port numbers. Then in a separate terminal window (it could be in the same terminal window if you run your server in the background) run the client program.
Run the client to access your own server,
% ./webclient <server-name> <GET|HEAD> <path> [port-number]

where <server-name> is the name of the computer on which the server is running, e.g., dana132-lnx-3, <GET|HEAD> is whichever of the two commands you want to run, <path> is the file path to the document you'd like to retrieve from the server, and [port-number] is the port at which the server program is running. The following example shows the case of running the server on computer dana132-lnx-3 at port 6789 and running the client to retrieve a web page called default.html from the server.

dana132-lnx-3 % ./webserver 6789

dana132-lnx-3 % ./webclient dana132-lnx-3 GET /default.html 6789

dana132-lnx-3 % ./webclient dana132-lnx-3 HEAD /default.html 6789

If the server is running at the standard HTTP port, 80, the client doesn't have to supply the port number argument. By convention, a pair of brackets '[' and ']' means the argument inside is optional.
Run the client to access other public servers,
% ./webclient www.bucknell.edu GET /
% ./webclient www.eg.bucknell.edu GET /
% ./webclient www.example.org GET /
Run a web browser against your own server. Assume your server is running on dana132-lnx-3 at port 6789. Use a web browser to access your server by the URL
dana132-lnx-3:6789/

Include in your answers.txt as Problem 3.1 a summary of what you saw in the above exercises in a couple of paragraphs. In particular, describe the relation between the program webclient.c and a browser, as well as between the program webserver.c and a general web server such as www.bucknell.edu or www.example.org. You will note that the HEAD function has not been implemented yet.

Examine both programs, webclient.c and webserver.c. Get a general idea of how the two programs work. You are then asked to implement the function process_head() following the pattern of process_get() in the program webserver.c. Note that currently the process_head() function is a skeleton. You need to complete the function. In addition, the function process_get() works only for a set of specific files. Your process_head(), however, should work for files of any name. You can limit the file types to text (html) and image (jpeg, png). Read the information about the HEAD method from sources such as HTTP Method Definitions.
Include in your answers.txt as Problem 3.2 a screen capture of running the client program webclient against your webserver with the following requests. Before running the following commands, change the file protection for default.html so that it is not readable by others, that is, chmod 640 default.html.

% ./webclient dana132-lnx-3 HEAD / 5678
% ./webclient dana132-lnx-3 HEAD /JLH.jpg 5678
% ./webclient dana132-lnx-3 HEAD /default.html 5678
% ./webclient dana132-lnx-3 HEAD /none.html 5678

Clear and commit your work

Clear the directory simple-web-server-client by make realclean inside the directory, then go up to your lab01 directory, add, commit, and push the directory simple-web-server-client to your git repository.

Lastly add and commit your answers.txt to your git repository.

Deliverables: You should have added and committed the following files to your git repository.

The answers.txt file which contains answers to the three sets of problems.
The complete collection of files dlist.c, dlist.h, Makefile, test_dlist.c, the directory simple-web-server-client and all the files in that directory.

Congratulations! You have just completed this lab exercise!

Extra credit work

If you have time and are interested in exploring further, consider implementing the following as extra credit work. These pieces of work are not dependent on each other, so you can pick any to try.

If you complete any extra credit work, please indicate so in the answers.txt and tell the instructor how to test your extra credit work.

Implement HTTP cookies: Modify your simple web server and client so that HTTP cookies are implemented. Refer to the resource page on the course web site, or other sites on the web, for information about HTTP cookies. Test the cookie mechanism between your own server and client, and between your server and a regular browser.
Implement the HTTP conditional GET method: Modify your simple web server and client so that both support the HTTP conditional GET method. Refer to the textbook, the resource page on the course web site, or other sites on the web, for information about the conditional GET method.