TFCC Newsletter
The IEEE Computer Society Task Force on Cluster Computing (TFCC) has been in existence for almost a year. Co-Chair Rajkumar Buyya received confirmation of the formation of TFCC from the IEEE Computer Society Technical Activities Board on December 17, 1998. And what a fruitful year it has been for TFCC! Much of that activity is reflected in this, the second TFCC Newsletter.
Included below are the Executive Summary and a link to the TFCC Annual Report, as well as a link to a draft of TFCC's White Paper.
See below for details on our first conference, IWCC'99, in Melbourne, Australia, December 2, 1999. TFCC had a presence at SuperComputing '99 (SC'99) with two BOFs. We encourage you to participate in our second conference, CLUSTER 2000, at Technische Universität Chemnitz, Saxony, Germany, November 28 - December 2, 2000. Many other conferences and workshops are listed as well.
TFCC's Educational Promotion Program for Book Donations has been very successful, with four books and two journals donated so far.
See especially the new book SCI: Scalable Coherent Interface: Architecture and Software for High-Performance Compute Clusters, edited by Hermann Hellwagner and Alexander Reinefeld.
See the two short articles, System I/O Effort Renamed InfiniBand™ Trade Association and Sandia's CPlant Cluster Ranked 44th in Top 500 List. Please note that I am seeking short articles on hot topics in cluster computing for the April 2000 TFCC Newsletter.
I view the TFCC Newsletter as an on-line dynamic document. Therefore, my strategy as Newsletter Editor has been to provide useful links to information in an organized manner. I have organized each Newsletter to be printed as a single document. If any of you have comments or suggestions on the format or contents, please send me a note.
Dan
Executive Summary
The IEEE Computer Society Task Force on Cluster Computing (TFCC) has been in existence since early February 1999. In its short life the TFCC has started to have an impact on the cluster computing research and development community in both academia and industry. Some evidence of this can be seen in the number of influential people willing to get involved in TFCC activities. Furthermore, the number of TFCC-based events and the number of volunteers prepared to promote our activities provide ample evidence that the TFCC is not only succeeding, but also having an impact on the community.
One particularly successful TFCC activity has been our cluster computing educational programme, through which we are attempting to promote the inclusion of cluster computing and its related technologies in the core curriculum of universities around the world. Our efforts in this area also include a book donation programme: in conjunction with influential international authors and publishers, we have donated more than 250 books to academic institutions around the world.
The TFCC is also starting to co-organize and sponsor a number of technical events in addition to its own annual event. The first annual event, the International Workshop on Cluster Computing (IWCC'99), is scheduled for early December 1999 in Melbourne, Australia.
TFCC members may obtain on-line subscriptions to the Cluster Computing Journal at a low cost of $40. Members should fill in the order form at http://www.baltzer.nl/cluster/cluster.order.html and mention being a "TFCC member". Only individual, non-library subscriptions are allowed.
The donation is sponsored by the ESPRIT Network of Excellence, Working Group 22582 (SCIWG). Thanks to Wolfgang Karl, TUM, for making the donation possible. For book requests, please send an email to Mrs. Petra Fehlhauer, fehlhauer@zib.de, with your complete postal address.
Thanks to Greg Pfister, Barry Wilkinson, Wolfgang Gentzsch, Alexander Reinefeld, and Wolfgang Karl for your support of the TFCC Education Promotion Program. We would also like to thank the publishers Baltzer Science Publishers, Elsevier, Morgan Kaufmann, and Prentice Hall for their donations.
If you are interested in using any of the above books or issues in your course, please contact TFCC Co-Chair Rajkumar Buyya.
Contributed Papers Due: EXTENDED to November 15, 1999 (was October 25)
Paper submission: September 15, 1999
Final papers due: December 31, 1999
Chair: Rajkumar Buyya
URL: http://ceng.usc.edu/~hjin/apscc2000/
Clustering is becoming increasingly popular, as cluster systems can deliver better performance than traditional mainframes and supercomputers at a much lower hardware cost, while also offering high performance, scalability, and high availability to organizations. This comprehensive guide covers every key issue associated with high-performance cluster computing, such as networking, light-weight protocols, resource management systems, and representative cluster computing systems. Topics covered include cluster middleware, single system image, active messages, process migration and load balancing, metacomputing, Beowulf clusters, and much more.
High Performance Cluster Computing: Programming and Applications, Volume 2, Edited by Rajkumar Buyya, Prentice Hall, NJ, USA, 1999.
This is the only comprehensive source for up-to-the-minute research on programming and applications for state-of-the-art, highly parallel "commodity supercomputers." The book is organized into three areas: programming environments and development tools; Java as a language of choice for development on highly parallel systems; and state-of-the-art high-performance algorithms and applications. All three areas have seen major advances in recent years, and in all three this book offers unprecedented breadth and depth. Now, in this second volume, Rajkumar Buyya brings together contributions from some 30 leaders in the field, addressing its most critical programming and applications challenges.
More information and additional resources at http://www.dgs.monash.edu.au/~rajkumar/cluster/index.html
Sandia's Scalable Clusters Workshop, November 1997:
http://rocs-pc.ca.sandia.gov/scw/scw.html
Note: some broken links! Individual files can be accessed at http://rocs-pc.ca.sandia.gov/scw/
A new switched-fabric input/output (I/O) connectivity standards group formerly known as System I/O has recently been renamed the InfiniBand™ Trade Association. The new organization is led by seven steering companies: Compaq, Dell, Hewlett-Packard, IBM, Intel, Microsoft, and Sun Microsystems. Sponsoring companies include 3Com, Adaptec, Cisco, Fujitsu-Siemens, Hitachi, Lucent, NEC, and Nortel Networks. Since another thirteen companies are also members, the Association has wide representation within the industry. The Association states it is dedicated to developing a new common I/O specification to deliver a channel-based, switched-fabric technology that the entire industry can adopt.
The players in the industry are moving quickly to replace the existing shared-bus I/O bottleneck. Back in August 1999, seven of the computer industry's leading companies (Compaq, Dell, Hewlett-Packard, IBM, Intel, Microsoft, and Sun Microsystems) merged the best ideas of the Future I/O (FIO) and Next Generation I/O (NGIO) architectures into one called System I/O. Two months later, in October 1999, these same seven companies became the steering members of the new InfiniBand™ Trade Association. The Association claims to be on schedule to deliver a comprehensive draft specification to members by the end of 1999. A final release of the specification is targeted for early 2000, and initial products based on the specification are expected to be in production in 2001.
The new initiative aims to provide a single, powerful, and scalable I/O architecture for the computer industry. The Association is developing an industry specification for a channel-based, switched-fabric architecture that provides a scalable performance range of 500 MB/s to 6 GB/s per link, meeting needs from entry-level to high-end enterprise systems.
Of particular interest to TFCC members, the initiative will provide an industry standard for a system area network (SAN) fabric that efficiently supports both conventional server I/O and inter-processor communication within parallel clusters.
Details of the specification aren't yet available because it is still being drafted. However, one can glean some ideas from the presentations of the TFCC-sponsored Birds of a Feather (BOF) session on High Speed Interconnects for COTS Cluster Computing at SuperComputing '99, Portland, Oregon, USA, on November 16, 1999. The presentations are on-line at http://www.atoll-net.de/news-bofsc99.html
It is clear to many in the industry that there will be dramatic increases in I/O requirements in the near future, and current bus-based I/O architectures such as PCI or its extension PCI-X won't be able to keep up. The new specification appears to be aiming at a channel-based I/O architecture connecting two address spaces. Protected direct memory access (DMA) engines are driven by a work queue at each end: channel communication uses packet switching and is controlled by a Host Channel Adaptor (HCA) driven by one work queue and a Target Channel Adaptor (TCA) driven by a second work queue.
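To make the work-queue model above a little more concrete, here is a minimal Python sketch of a host and target exchanging a packet through paired send and receive queues. The names used (WorkQueue, ChannelAdaptor, transfer) are illustrative inventions for this newsletter, not part of the draft specification, which has not yet been published.

    from collections import deque

    class WorkQueue:
        """Holds work requests posted by software; a DMA engine drains it."""
        def __init__(self):
            self.requests = deque()

        def post(self, request):
            self.requests.append(request)

        def next_request(self):
            return self.requests.popleft() if self.requests else None

    class ChannelAdaptor:
        """Stands in for an HCA or TCA: one send queue, one receive queue."""
        def __init__(self, name):
            self.name = name
            self.send_queue = WorkQueue()
            self.recv_queue = WorkQueue()

    def transfer(source, sink, payload):
        # Software posts a send work request on the host side and a matching
        # receive work request on the target side.
        source.send_queue.post({"op": "send", "data": payload})
        sink.recv_queue.post({"op": "recv", "buffer": None})
        # The "DMA engines" at each end drain their queues and move the packet
        # directly into the target's buffer, without a shared bus in between.
        send_req = source.send_queue.next_request()
        recv_req = sink.recv_queue.next_request()
        recv_req["buffer"] = send_req["data"]
        return recv_req["buffer"]

    hca = ChannelAdaptor("host HCA")
    tca = ChannelAdaptor("target TCA")
    print(transfer(hca, tca, b"block of I/O data"))  # -> b'block of I/O data'

The point of the sketch is simply that each end owns its own queue of work requests and its own DMA engine, so the two address spaces communicate over a switched channel rather than contending for a shared bus.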
Hints from TFCC member Greg Pfister of IBM, who is actively involved in the specification, indicate that InfiniBand™'s target includes the "mass market" (high volume and cheap), and that the necessary components may eventually appear on processor chips.
More information on InfiniBand™ is available at http://www.sysio.org/.
On November 11, 1999, in conjunction with SuperComputing '99, Jack Dongarra of the University of Tennessee and Hans-Werner Meuer of Mannheim University published the 14th edition of the TOP 500 Supercomputing Sites list (http://www.top500.org/). Interestingly, the machine ranked 44th is described as "Self-made"! Two other "Self-made", or build-your-own, computing clusters are listed as well. Computing clusters have moved into the big leagues!
The 44th entry is held by the CPlant cluster of Sandia National Laboratories, Albuquerque, New Mexico, USA (http://www.cs.sandia.gov/cplant/). The compute partition of the Computational Plant (CPlant) is composed of 592 Compaq XP1000 workstations.
The 592 compute node workstations are connected with Myricom's Myrinet gigabit networking hardware. Each node contains a 64-bit, 33 MHz Myrinet LANai-7 network interface card connected to a 16-port SAN/LAN switch.
The ranking of the TOP 500 is determined by the best Linpack benchmark performance. A 580-node CPlant configuration achieved an Rmax of 232.6 GFLOPS, compared with the number-one-ranked, Intel-built ASCI Red, also at Sandia National Labs, Albuquerque (Rmax of 2379.6 GFLOPS).
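As a back-of-the-envelope illustration (using only the figures quoted above, so treat it as a rough comparison rather than an official benchmark analysis), the per-node Linpack rate and the gap to ASCI Red work out as follows:

    # Linpack (Rmax) figures reported in the 14th TOP 500 list, in GFLOPS.
    cplant_rmax, cplant_nodes = 232.6, 580
    asci_red_rmax = 2379.6

    per_node_gflops = cplant_rmax / cplant_nodes  # roughly 0.40 GFLOPS per XP1000 node
    ratio = asci_red_rmax / cplant_rmax           # ASCI Red delivers roughly 10x the Rmax
    print(f"CPlant per-node Rmax: {per_node_gflops:.2f} GFLOPS; "
          f"ASCI Red / CPlant: {ratio:.1f}x")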
The other two "Self-made" TOP 500 machines are the Avalon cluster of Los Alamos National Laboratory (ranked 265th at an Rmax of 48.6 GFLOPS, http://cnls.lanl.gov/avalon/) and the Parnass2 cluster at the University of Bonn (ranked 454th at an Rmax of 34.23 GFLOPS, http://wwwwissrech.iam.uni-bonn.de/research/projects/parnass).