'''
==========================================================================
Python for Parallelism in Introductory Computer Science Education
SC '13 HPC Educators Program
Steven Bogaerts, Wittenberg University
Joshua Stough, Washington and Lee
http://www.joshuastough.com/SC13
MIT License: see README_LICENSE.txt

file: spawningProcesses.py
author: bogaerts
Summary: These examples demonstrate the basic creation of processes,
including passing arguments and a customized name. The notion of a lock
is necessarily introduced in order to facilitate printing.
==========================================================================
'''

'''
==========================================================================
Background
==========================================================================
See slides:
- Parallel Programming Mechanisms
- Programming Background
'''

'''
------
Tuples
------
This example is not particularly about accelerated computing, but it
provides an early introduction to the concept of tuples, which are needed
to pass arguments to child processes in the multiprocessing module of Python.
'''
def testTuple():
    a = (6 + 2)      # just arithmetic in parentheses: an int
    print a, type(a)

    b = (6)          # still an int, not a tuple
    print b, type(b)

    c = (6,)         # the trailing comma makes a one-item tuple
    print c, type(c)

    d = (6, 7,)      # a trailing comma is allowed with multiple items
    print d, type(d)

    e = (6, 7)
    print e, type(e)

    f = ()           # the empty tuple
    print f, type(f)

'''
KEY IDEAS
- Tuples are much like lists, except that they are written with parentheses
  instead of brackets and cannot be modified once created.
- If you want a tuple with just one item in it, you must put a comma after
  that item (in the parentheses). Otherwise the parentheses will be
  interpreted as ordinary mathematical grouping, and you won't get a tuple!
- For a tuple with multiple items, you don't have to use the comma at the
  end, but you can. You still need commas *between* the items.
- If you make a change in the editor window, don't forget to close the
  shell, reopen it, and say from ______ import * again.
'''

'''
-----------------
Keyword Arguments
-----------------
Again, this is not particularly about accelerated computing, but is a small
bit of background needed to use the multiprocessing module. It demonstrates
the ability to specify arguments in an arbitrary order using "keyword
arguments". Keyword arguments also let you omit any argument that has a
default value, but this fact is not needed at this point.
'''
def myFunc(a, b, c):
    print a, b, c

def testMyFunc():
    myFunc(c='Bow', a='Wow', b='Yo')

'''
KEY IDEAS
- When a function takes multiple arguments, you can use the notation above
  to pass the arguments in any order you want.
'''

'''
Also:
from random import randint
x = randint(low, high)  # x is now between low and high, *inclusive*

from time import sleep
sleep(5)  # wait here for 5 seconds
'''

'''
------------------
Spawning a Process
------------------
See slide: Spawning Processes

Our first look at multiprocessing. A child process is created, and process
IDs are shown for the child and parent.
'''
# Import only the names this file uses.
from multiprocessing import Process, Lock, current_process

def sayHi():
    print "Hi from process", current_process().pid

def procEx():
    print "Hi from process", current_process().pid, "(main process)"

    # Construct the process (but it doesn't start automatically).
    # target must be a function, without the parentheses.
    # args must be a tuple, containing the arguments to pass to the target,
    # or () if no arguments.
    otherProc = Process(target=sayHi, args=())

    # Start the process we just constructed.
    otherProc.start()

'''
KEY IDEAS
- To create a child process, call the Process constructor, specifying a
  target (the function to be run - without parentheses), and also
  specifying the arguments that will go to that target function.
- Use sleep to make a process wait for some number of seconds.
- The Process class has many instance variables, including pid and name,
  which are public and can be used to identify the process.
  current_process().pid gives the pid of the currently running process
  (same idea for name).
- current_process() is a function defined in the multiprocessing module.
  Calling it returns a Process object representing the currently running
  process.
- The Process class has a start method that must be called for the process
  to start doing its task.
'''

'''
---------------------------
Spawning Multiple Processes
---------------------------
Exercise: Copy the "Spawning a Process" example above, and modify it to
create 3 processes, each of which says hi as above.
'''
def procEx2():
    print "Hi from process", current_process().pid, "(main process)"

    # Construct the processes (but they don't start automatically).
    # target must be a function, without the parentheses.
    # args must be a tuple, containing the arguments to pass to the target,
    # or () if no arguments.
    p1 = Process(target=sayHi, args=())
    p2 = Process(target=sayHi, args=())
    p3 = Process(target=sayHi, args=())

    # Start the processes we just constructed.
    p1.start()
    p2.start()
    p3.start()

'''
KEY IDEAS
- You can make multiple child processes simply by calling the Process
  constructor multiple times. These processes are independent of each other.
'''

# ---------------------------------------------------------------------
# In this example, we're using the args parameter to pass an argument.
def sayHi2(n):
    print "Hi", n, "from process", current_process().pid

def manyGreetings():
    print "Hi from process", current_process().pid, "(main process)"

    name = "Jimmy"
    p1 = Process(target=sayHi2, args=(name,))  # Note that I'm passing an argument!
    p2 = Process(target=sayHi2, args=(name,))
    p3 = Process(target=sayHi2, args=(name,))

    p1.start()
    p2.start()
    p3.start()

'''
KEY IDEAS
- When you create multiple processes, they run independently of each other.
- The args tuple can be set to contain whatever data you want to pass to
  the target function.
- The args tuple must contain exactly as many items as the target takes
  arguments.
  The arguments must be provided in the correct order (the args tuple
  itself can't use the keyword arguments style; the Process constructor
  takes a separate kwargs dictionary for that).
'''

'''
-------------------
Anonymous Processes
-------------------
Exercise: Write a function that first asks for your name, and then asks how
many processes to spawn. That many processes are created, each greets you
by name, and gives its pid.
'''
def manyGreetings2():
    name = raw_input("Enter your name: ")
    # Convert the typed text to an int explicitly, rather than letting
    # input() evaluate whatever the user types.
    numProc = int(raw_input("How many processes? "))

    for i in range(numProc):
        #(Process(target=sayHi2, args=(name,))).start()
        p = Process(target=sayHi2, args=(name,))
        p.start()

def f():
    for a in range(10):
        grade = 97  # This doesn't become gr0de, gr1de, etc...

'''
KEY IDEAS
- A process (or any object) doesn't have to be stored in a variable to be
  used. But if you don't store it in a variable, you won't be able to use
  it after the first time. An object not stored in a variable is called an
  anonymous object.
- If you need to access the process objects later, you could store them in
  a list when you make them. To do this, make a list accumulator (starting
  with []) that you append Process objects onto, so that you can loop
  through the list later.
- Once a process has been started, changing the variable that stored it
  won't stop the process.
- For loops don't substitute values into the middle of variable names.
'''

'''
------------------------
Specifying Process Names
------------------------
This example demonstrates that if we aren't happy with the automatically
assigned process IDs, we are welcome to provide our own additional names.
'''
def sayHi3(personName):
    # current_process() is defined in the multiprocessing module.
    # It returns the currently running process, from which we can get
    # the name.
    print "Hi", personName, "from process", current_process().name, "- pid", current_process().pid

def manyGreetings3():
    print "Hi from process", current_process().pid, "(main process)"

    personName = "Jimmy"
    for i in range(10):
        Process(target=sayHi3, args=(personName,), name=str(i)).start()

'''
--------------------------------
Using a Lock to Control Printing
--------------------------------
See slide: Locks

In the examples above, output from the different processes is likely to get
mixed up, because the processes share stdout and may be context-switched in
the middle of printing. The example below shows how to prevent this using
a lock.
'''
def sayHi4(lock, name):
    lock.acquire()
    print "Hi", name, "from process", current_process().pid
    lock.release()

# Named manyGreetings4 so that it no longer replaces manyGreetings3 above.
def manyGreetings4():
    lock1 = Lock()

    print "Hi from process", current_process().pid, "(main process)"

    name = "Jimmy"
    for i in range(10):
        Process(target=sayHi4, args=(lock1, name)).start()

'''
KEY IDEAS
- Locks prevent multiple processes from trying to do something at the same
  time that they shouldn't. For example, multiple processes should not try
  to print (access stdout) at the same time.
- Define the lock in the parent process, so it can be passed to all the
  children.
- Don't forget to use lock.release() sometime after every lock.acquire();
  otherwise any other processes waiting for the lock will wait forever.
'''

'''
-------------
Digging Holes
-------------
Exercise: Imagine that you have 10 hole diggers, named A, B, C, D, E, F, G,
H, I, and J. Think of each of these as a process, and complete the function
assignDiggers() started below, so that it creates 10 processes with these
worker names working on hole 0, 1, 2, ..., 9, respectively.

When you're done, you should get output like the following (except perhaps
in a different order):

>>> assignDiggers()
>>> Hiddy-ho! I'm worker G and today I have to dig hole 6
Hiddy-ho! I'm worker A and today I have to dig hole 0
Hiddy-ho! I'm worker C and today I have to dig hole 2
Hiddy-ho! I'm worker D and today I have to dig hole 3
Hiddy-ho! I'm worker F and today I have to dig hole 5
Hiddy-ho! I'm worker I and today I have to dig hole 8
Hiddy-ho! I'm worker H and today I have to dig hole 7
Hiddy-ho! I'm worker J and today I have to dig hole 9
Hiddy-ho! I'm worker B and today I have to dig hole 1
Hiddy-ho! I'm worker E and today I have to dig hole 4
>>>
'''
def dig(workerName, holeID, lock):
    lock.acquire()
    print "Hiddy-ho! I'm worker", workerName, "and today I have to dig hole", holeID
    lock.release()

def assignDiggers():
    lock = Lock()
    workerNames = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

    # Complete the code for assignDiggers here
    for holeID in range(len(workerNames)):
        currWorkerName = workerNames[holeID]
        p1 = Process(target=dig, args=(currWorkerName, holeID, lock))
        p1.start()
        # Process(target=dig, args=(workerNames[holeID], holeID, lock)).start()

    # Loop by element (inefficient)
    # for workerName in workerNames:
    #     Process(target=dig, args=(workerName, workerNames.index(workerName), lock)).start()

'''
KEY IDEAS
- If it doesn't matter what order the processes run in, then code like the
  above can run very efficiently.
- This is a good example of a situation where you need to choose carefully
  between looping by index and looping by element. By index is ideal here.
'''

#if __name__ == '__main__':
#    testTuple()
#    testMyFunc()
#    procEx()
#    manyGreetings()
#    manyGreetings2()
#    manyGreetings3()
#    manyGreetings4()
#    assignDiggers()
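'''
---------------------------------
Keeping Process Objects in a List
---------------------------------
The "Anonymous Processes" key ideas mention a list accumulator for Process
objects that are needed later. The sketch below is one possible way to do
that; it is not part of the original examples. The names greet and
spawn_and_wait are hypothetical, and the use of join() and exitcode (both
real parts of the Process class) is an assumption about what "later" might
mean here: waiting for each child to finish and checking how it exited.
The single-argument print(...) form is used so that the sketch runs under
both Python 2 and Python 3.
'''

```python
from multiprocessing import Process, current_process


def greet(name):
    # Runs in the child process.
    print("Hi %s from process %s" % (name, current_process().pid))


def spawn_and_wait(name, num_procs):
    procs = []  # list accumulator for the Process objects
    for _ in range(num_procs):
        p = Process(target=greet, args=(name,))
        p.start()
        procs.append(p)  # keep a reference so we can use it later

    # Loop through the list later: wait for each child to finish.
    for p in procs:
        p.join()

    # exitcode is 0 for a child that finished without an error.
    return [p.exitcode for p in procs]


if __name__ == '__main__':
    print(spawn_and_wait("Jimmy", 3))
```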