Q-Logic IB6054601-00 D User Manual

Proper User Manual

Regulations oblige the seller to hand over the Q-Logic IB6054601-00 D user manual to the buyer together with the product. A missing manual, or incorrect information given to the consumer, is grounds for a complaint that the device does not conform to the contract. The law permits a manual to be supplied in a form other than paper, which has recently become very common: a graphical or electronic manual for the Q-Logic IB6054601-00 D, or instructional videos for users. The condition is that the form be legible and understandable.

What is a user manual?

The word "instruction" comes from the Latin "instructio", meaning to arrange. Accordingly, the Q-Logic IB6054601-00 D manual describes the steps of each procedure. The purpose of a manual is to instruct and to simplify start-up, use of the device, or the performance of particular tasks. A manual is a collection of information about an item or a service, a guide.

Unfortunately, few users take the time to read the Q-Logic IB6054601-00 D user manual. A good manual not only introduces a number of additional functions of the purchased device, but also helps avoid many mistakes.

So what should an ideal user manual contain?

The Q-Logic IB6054601-00 D user manual should contain, above all:
- Technical data for the Q-Logic IB6054601-00 D
- The manufacturer's name and the year of manufacture of the Q-Logic IB6054601-00 D
- Principles of operation, adjustment, and maintenance of the Q-Logic IB6054601-00 D
- Safety markings and certificates confirming compliance with the relevant standards

Why don't we read user manuals?

The usual reasons are lack of time and confidence that we already know how the purchased device works. Unfortunately, connecting and starting the Q-Logic IB6054601-00 D is not enough. A manual contains guidance on specific functions, safety rules, maintenance methods (even which cleaning agents to use), possible faults of the Q-Logic IB6054601-00 D, and ways to solve problems that may arise during use. Finally, the manual lists the contact number for Q-Logic service in case the suggested solutions do not work. Manuals in the form of animations or instructional videos are currently gaining popularity because they engage users better than a brochure. This kind of manual is meant to ensure that the user watches the whole video without skipping the detailed and complicated technical descriptions of the Q-Logic IB6054601-00 D, as happens with paper manuals.

Why read user manuals?

Above all, the user manual explains how the Q-Logic IB6054601-00 D is built and what it can do, how to use particular accessories, and other information that lets you use all of its functions and conveniences.

After buying the device, set aside some time to get to know each part of the Q-Logic IB6054601-00 D manual. Manuals today are carefully prepared or translated so that they are not only understandable to users but also fulfil their basic purpose of helping and informing.

Table of contents of the user manual

  • Page 1

    IB6054601-00 D Page i Q Simplify InfiniPath User Guide Version 2.0[...]

  • Page 2

    Information furnished in this manual is believed to be accurate and reliable. However, QLogic Corporation assumes no responsibility for its use, nor for any infringements of patents or other rights of third parties which may result from its use. QLogic Corporation reserves the right[...]

  • Page 3

    Added info about using MPI over uDAPL. Need to load modules rdma_cm and rdma_ucm. 3.7
    Added section: Error messages generated by mpirun. This explains more about the types of errors found in the sub-sections. Also added error messages related to failed connections between nodes C.8.12[...]

  • Page 4

    © 2006, 2007 QLogic Corporation. All rights reserved worldwide. © PathScale 2004, 2005, 2006. All rights reserved. First Published: August 2005. Printed in U.S.A.[...]

  • Page 5

    Table of Contents
    Section 1 Introduction
    1.1 Who Should Read this Guide
    1.2 How this Guide is Organized
    1.3 Overview[...]

  • Page 6

    2.10 Performance and Management Tips
    2.10.1 Remove Unneeded Services
    2.10.2 Disable Powersaving Features[...]

  • Page 7

    3.11 Debugging MPI Programs
    3.11.1 MPI Errors
    3.11.2 Using Debuggers[...]

  • Page 8

    C.4.5 OpenFabrics Load Errors If ib_ipath Driver Load Fails
    C.4.6 InfiniPath ib_ipath Initialization Failure
    C.4.7 MPI Job Failures Due to Initialization Problems
    C.5 OpenFab[...]

  • Page 9

    C.9.11 ipath_pkt_test
    C.9.12 ipathstats
    C.9.13 lsmod[...]

  • Page 10

    Notes[...]

  • Page 11

    Section 1 Introduction
    This chapter describes the objectives, intended audience, and organization of the InfiniPath User Guide. The InfiniPath User Guide is intended to give the end users of an InfiniPath cluster what they need to know to use it. In this case, end users are understood to include both the cluster administrato[...]

  • Page 12

    ■ Appendix E Glossary of technical terms
    ■ Index
    In addition, the InfiniPath Install Guide contains information on InfiniPath hardware and software installation.
    1.3 Overview
    The material in this documentation pertains to an InfiniPath cluster. This is defined as a collection of node[...]

  • Page 13

    NOTE: OpenFabrics was known as OpenIB until March 2006. All relevant references to OpenIB in this documentation have been updated to reflect this change. See the OpenFabrics website at http://www.openfabrics.org for more information on the OpenFabrics Alliance.
    1.6 What’s Ne[...]

  • Page 14

    Support for multiple versions of MPI has been added. You can use a different version of MPI and achieve the high-bandwidth and low-latency performance that is standard with InfiniPath MPI. Also included is expanded operating system support, and support for the latest O[...]

  • Page 15

    1.8 Software Components
    The software provided with the InfiniPath Interconnect product consists of:
    ■ InfiniPath driver (including OpenFabrics)
    ■ InfiniPath ethernet emulation
    ■ InfiniPath libraries
    ■ InfiniPath utilities, configuration, and support tools
    ■ Infin[...]

  • Page 16

    NOTE: 32-bit OpenFabrics programs using the verb interfaces are not supported in this InfiniPath release, but will be supported in a future release.
    1.9 Conventions Used in this Document
    This Guide uses these typographical conventions:
    1.10 Documentation and Technical[...]

  • Page 17

    ■ Readme file
    The Troubleshooting Appendix for installation, InfiniPath and OpenFabrics administration, and MPI issues is located in the InfiniPath User Guide. Visit the QLogic support Web site for documentation and the latest software updates.
    http://www.qlogic.com[...]

  • Page 18

    Notes[...]

  • Page 19

    Section 2 InfiniPath Cluster Administration
    This chapter describes what the cluster administrator needs to know about the InfiniPath software and system administration.
    2.1 Introduction
    The InfiniPath driver ib_ipath, layered Ethernet driver ipath_ether, OpenSM, and other modules and the protocol and MPI support libraries ar[...]

  • Page 20

    MPI include files are in:
    /usr/include
    MPI programming examples and source for several MPI benchmarks are in:
    /usr/share/mpich/examples
    InfiniPath utility programs, as well as MPI utilities and benchmarks, are installed in:
    /usr/bin
    The InfiniPath kernel modules are in[...]

  • Page 21

    on system configuration. OpenFabrics support is under development and has not been fully characterized. This table summarizes the guidelines. Here is an example for a 1024 processor system:
    ■ 1024 cores over 256 nodes (each node has 2 sockets with dual-core processo[...]

  • Page 22

    This breaks down to a memory footprint of 331 MB per node, as follows:
    2.4 Configuration and Startup
    2.4.1 BIOS Settings
    A properly configured BIOS is required. The BIOS settings, which are stored in non-volatile memory, contain certain parameters character[...]

  • Page 23

    You can check and adjust these BIOS settings using the BIOS Setup Utility. For specific instructions on how to do this, follow the hardware documentation that came with your system.
    2.4.2 InfiniPath Driver Startup
    The ib_ipath module provides low level Inf[...]

  • Page 24

    and unmounted when the infinipath script is invoked with the "stop" option (e.g. at system shutdown). The layout of the filesystem is as follows:
    atomic_stats
    00/
    01/
    ...
    The atomic_stats file contains general driver statistics. There is one numbere[...]

  • Page 25

    You must create a network device configuration file for the layered Ethernet device on the InfiniPath adapter. This configuration file will resemble the configuration files for the other Ethernet devices on the nodes. Typically on servers there are two Ethern[...]

  • Page 26

    If you are using DHCP (dynamic host configuration protocol), add the following lines to ifcfg-eth2:
    # QLogic Interconnect Ethernet
    DEVICE=eth2
    ONBOOT=yes
    BOOTPROTO=dhcp
    If you are using static IP addresses, use the following lines instead, substituting your[...]

  • Page 27

    Step 3 is applicable only to SLES 10; it is required because SLES 10 uses a newer version of the udev subsystem.
    NOTE: The MAC address (media access control address) is a unique identifier attached to most forms of networking equipment. Step 2 below determines [...]

  • Page 28

    Check each of the lines starting with SUBSYSTEM=, to find the highest numbered interface. (For standard motherboards, the highest numbered interface will typically be 1.) Add a new line at the end of the file, incrementing the interface number by one. In t[...]

  • Page 29

    6. To verify that the configuration files are correct, you will normally now be able to run the commands:
    # ifup eth2
    # ifconfig eth2
    Note that it may be necessary to reboot the system before the configuration changes will work.
    2.4.7 OpenFabrics Configuration[...]

  • Page 30

    To verify the configuration, type:
    # ifconfig ib0
    The output from this command should be similar to this:
    ib0 Link encap:InfiniBand HWaddr 00:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
    inet addr:10.1.17.3 Bcast:10.1.17.255 Mask:255.255.255.0
    UP [...]
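
The verification step in this preview can also be scripted. The following is a minimal Python sketch, not part of the InfiniPath software, that extracts the address fields from ifconfig-style output like the sample quoted in the preview. The function name parse_inet_fields and the regular expression are assumptions made for illustration; the sample text is copied from the preview.

```python
import re

# Matches the classic net-tools line "inet addr:A Bcast:B Mask:M".
INET_RE = re.compile(
    r"inet addr:(?P<addr>\S+)\s+Bcast:(?P<bcast>\S+)\s+Mask:(?P<mask>\S+)"
)


def parse_inet_fields(ifconfig_output):
    """Return a dict with addr, bcast and mask, or None if no IPv4 line is present."""
    match = INET_RE.search(ifconfig_output)
    return match.groupdict() if match else None


# Sample taken from the page preview (the rest of the output is truncated there).
sample = (
    "ib0 Link encap:InfiniBand HWaddr "
    "00:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00\n"
    "inet addr:10.1.17.3 Bcast:10.1.17.255 Mask:255.255.255.0\n"
)

print(parse_inet_fields(sample))
```

A result of None would suggest that no IPv4 address has been assigned to the interface yet.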

  • Page 31

    and you can stop it again like this:
    # /etc/init.d/opensmd stop
    If you wish to pass any arguments to the OpenSM program, modify the file:
    /etc/init.d/opensmd
    and add the arguments to the "OPTIONS" variable. Here is an example:
    #[...]

  • Page 32

    To disable the driver on the next system boot, use the command (as root):
    # chkconfig infinipath off
    NOTE: This does not stop and unload the driver, if it is already loaded.
    You can start, stop, or restart (as root) the InfiniPath supp[...]

  • Page 33

    If there is output, you should look at the output from this command to determine if it is configured:
    $ /sbin/ifconfig -a
    Finally, if you need to find which InfiniPath and OpenFabrics modules are running, try the following command:
    $ lsmod |[...]

  • Page 34

    This next example assumes the following:
    ■ Both the cluster nodes and the front end system are running the openssh package as distributed in current Linux systems.
    ■ All cluster users have accounts with the same account name on the fron[...]

  • Page 35

    0zwxSL7GP1nEyFk9wAxCrXv3xPKxQaezQKs+KL95FouJvJ4qrSxxHdd1NYNR0D avEBVQgCaspgWvWQ8cL 0aUQmTbggLrtD9zETVU5PCgRlQL6I3Y5sCCHuO7/UvTH9nneCg==
    Change the file to mode 600 when finished editing.
    4. On each node, the system file /etc/ssh/sshd_config must be edi[...]

  • Page 36

    nodes. Since these are presumed to be specialized computing appliances, they do not need many of the service daemons normally running on a general Linux computer. Following are several groups constituting a minimal necessary set of services. These a[...]

  • Page 37

    For SUSE 9.3 and 10.0 run this command as root:
    # /sbin/chkconfig --level 12345 powersaved off
    After running either of these commands, the system will need to be rebooted for these changes to take effect.
    2.10.3 Balanced Processor Power
    Higher processo[...]

  • Page 38

    2.10.6 Hyper-Threading
    If using Intel processors that support Hyper-Threading, it is recommended that Hyper-Threading be turned off in the BIOS. This will provide more consistent performance. You can check and adjust this setting using the BIOS Setup[...]

  • Page 39

    00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407
    Note that ipath_control will report whether the installed adapter is the QHT7040, QHT7140, or the QLE7140. It will also report whether the driver is InfiniPath-specific or not with t[...]

  • Page 40

    $Id: kernel.org InfiniPath Release 2.0 $ $Date: 2006-09-15-04:16 $
    /lib/modules/2.6.16.21-0.8-smp/updates/ipath.ko:
    $Id: kernel.org InfiniPath Release2.0 $ $Date: 2006-09-15-04:20 $
    NOTE: ident is in the optional rcs RPM, and is not always installed.
    string[...]

  • Page 41

    3. Gather and analyze system configuration from nodes.
    4. Gather and analyze RPMs installed on nodes.
    5. Verify InfiniPath hardware and software status and configuration.
    6. Verify ability to mpirun jobs on nodes.
    7. Run bandwidth and latency test on eve[...]

  • Page 42

    Notes[...]

  • Page 43

    Section 3 Using InfiniPath MPI
    This chapter provides information on using InfiniPath MPI. Examples are provided for compiling and running MPI programs.
    3.1 InfiniPath MPI
    QLogic’s implementation of the MPI standard is derived from the MPICH reference implementation Version 1.2.6. The InfiniPath MPI libraries have been high[...]

  • Page 44

    These examples assume that:
    ■ Your cluster administrator has properly installed InfiniPath MPI and the PathScale compilers.
    ■ Your cluster’s policy allows you to use the mpirun script directly, without having to submit the job to a batch queuing system.
    ■ You or[...]

  • Page 45

    Here ./cpi designates the executable of the example program in the working directory. The -np parameter to mpirun defines the number of processes to be used in the parallel computation. Now try it with four processes:
    $ mpirun -np 4 -m mpihosts ./cpi
    Process 3 on hostname1 [...]

  • Page 46

    and run it with:
    $ mpirun -np 2 -m mpihosts ./pi3f90
    The C++ program hello++.cc is a parallel processing version of the traditional “Hello, World” program. Notice that this version makes use of the external C bindings of the MPI functions if the C++ bi[...]

  • Page 47

    You may need to instead pass arguments to configure directly, in a fashion similar to this:
    $ ./configure -cc=mpicc -fc=mpif77 -c++=mpicxx -c++linker=mpicxx
    Sometimes you may need to edit a Makefile to achieve this result, adding lines similar to:
    CC=mpicc
    F77=mpif77
    F90=mpif[...]

  • Page 48

    The process is shown in the following steps:
    1. Create a key pair. Use the default file name, and be sure to enter a passphrase.
    $ ssh-keygen -t rsa
    2. Enter a passphrase for your key pair when prompted. Note that the key agent does not survive X11 logout or system reboot:
    $[...]

  • Page 49

    3.5.2 Compiling and Linking
    These scripts invoke the compiler and linker for programs in each of the respective languages, and take care of referring to the correct include files and libraries in each case.
    mpicc
    mpicxx
    mpif77
    mpif90
    mpif95
    On x86_64, by default these call the[...]

  • Page 50

    line options. See the PathScale compiler documentation and the man pages for pathcc and pathf90 for complete information on its options. See the corresponding documentation for any other compiler/linker you may call for its options.
    3.5.3 To Use Another Compiler
    In addition[...]

  • Page 51

    To use the Intel compiler for Fortran90/Fortran95 programs, use:
    $ mpif90 -f90=ifort .....
    $ mpif95 -f95=ifort .....
    Usage for other compilers will be similar to the examples above, substituting the options following -cc, -CC, -f77, -f90, or -f95. Consult the documentatio[...]

  • Page 52

    The current workaround for this is to compile on a supported and compatible distribution, then run the executable on one of the systems that uses the GNU 4.x compilers and environment.
    ■ To run on FC4 or FC5, install FC3 or RHEL4/CentOS on your build machine. Compile your [...]

  • Page 53

    program-name will generally be the pathname to the executable MPI program. If the MPI program resides in the current directory and the current directory is not in your search path, then program-name must begin with ‘./’, such as:
    ./program-name
    Unless you want to run on[...]

  • Page 54

    programs will be started on that host before using the next entry in the mpihosts file. If the full mpihosts file is processed, and there are still more processes requested, processing starts again at the start of the file. You have several alternative ways of specifying the[...]
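
The wrap-around behaviour described in this preview can be illustrated with a short Python sketch. It is a simplified model written for this page, not the logic of mpirun itself. It assumes each mpihosts entry has the form hostname or hostname:count, and that a missing count means one process; the function names parse_mpihosts and place_processes are hypothetical.

```python
from itertools import cycle


def parse_mpihosts(text):
    """Parse 'hostname[:count]' lines; a missing count is assumed to mean 1."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        host, _, count = line.partition(":")
        slots = int(count) if count else 1
        if slots < 1:
            raise ValueError(f"process count must be at least 1: {line!r}")
        entries.append((host, slots))
    return entries


def place_processes(entries, n):
    """Return the host chosen for each of n processes.

    Each entry is filled before moving to the next; when the list is
    exhausted and processes remain, placement restarts at the first entry.
    """
    if not entries:
        raise ValueError("mpihosts list is empty")
    placement = []
    for host, slots in cycle(entries):
        for _ in range(slots):
            if len(placement) == n:
                return placement
            placement.append(host)


hosts = parse_mpihosts("node1:2\nnode2:1\n")
print(place_processes(hosts, 5))
```

With two slots on node1 and one on node2, five requested processes fill both entries once and then start again at node1.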

  • Page 55

    LD_LIBRARY_PATH, and other environment variables for the node programs through the use of the -rcfile option of mpirun:
    $ mpirun -np n -m mpihosts -rcfile mpirunrc program
    In the absence of this option, mpirun checks to see if a file called $HOME/.mpirunrc exists in the user[...]

  • Page 56

    3.5.9 Multiprocessor Nodes
    Another command line option, -ppn, instructs mpirun to assign a fixed number p of node programs to each node, as it distributes the n instances among the nodes:
    $ mpirun -np n -m mpihosts -ppn p program-name
    This option overrides the :p specific[...]

  • Page 57

    -verbose  Print diagnostic messages from mpirun itself. Can be useful in troubleshooting. Default: Off
    -version, -v  Print MPI version. Default: Off
    -help, -h  Print mpirun help message. Default: Off
    -rcfile node-shell-script  Startup script for setting environment on nodes. Def[...]

  • Page 58

    -nonmpi  Run a non-MPI program. Required if the node program makes no MPI calls. Default: Off
    -quiescence-timeout, seconds  Wait time in seconds for quiescence (absence of MPI communication) on the nodes. Useful for detecting deadlocks. 0 disables quiescence detection. Default[...]

  • Page 59

    -statsfile file-prefix  Specifies alternate file to receive the output from the -print-stats option. Default: stderr
    3.6 Using Other MPI Implementations
    Support for multiple MPI implementations has been added. You can use a different version of MPI and achieve the high-bandwidth and low-lat[...]

  • Page 60

    3.8.1 MPD Description
    The Multi-Purpose Daemon (MPD) was developed by Argonne National Laboratory (ANL), as part of the MPICH-2 system. While the ANL MPD had certain advantages over the use of their mpirun (faster launching, better cleanup after crashes, better tolerance of node f[...]

  • Page 61

    accessed via some network file system, typically NFS. Parallel programs usually need to have some data in files to be shared by all of the processes of an MPI job. Node programs may also use non-shared, node-specific files, such as for scratch stor[...]

  • Page 62

    may be desirable to run multiple MPI processes and multiple OpenMP threads per node. The number of OpenMP threads is typically controlled by the OMP_NUM_THREADS environment variable in the .mpirunrc file. This may be used to adjust the split between MPI processes and OpenMP[...]
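
As a rough illustration of the split this preview mentions, the Python sketch below lists the ways a node's cores could be divided between MPI processes and OpenMP threads. It assumes one thread per core and that every core is used exactly once, which is a simplification chosen for this example and not a rule stated in the guide; the function name mpi_openmp_splits is hypothetical.

```python
def mpi_openmp_splits(cores_per_node):
    """List (mpi_processes, omp_threads) pairs whose product equals cores_per_node."""
    if cores_per_node < 1:
        raise ValueError("cores_per_node must be at least 1")
    return [
        (procs, cores_per_node // procs)
        for procs in range(1, cores_per_node + 1)
        if cores_per_node % procs == 0
    ]


# Each pair gives a per-node process count and the matching OMP_NUM_THREADS value.
for procs, threads in mpi_openmp_splits(4):
    print(f"{procs} MPI processes per node, OMP_NUM_THREADS={threads}")
```

On a four-core node this yields three candidate splits, from one process with four threads to four single-threaded processes.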

  • Page 63

    Symbolic debugging is easier than machine language debugging. To enable symbolic debugging you must have compiled with the -g option to mpicc so that the compiler will have included symbol tables in the compiled object code. To run your MPI program with a debugger use [...]

  • Page 64

    No ports available on /dev/ipath
    NOTE: If port sharing is enabled, this limit is raised to 16 and 8 respectively. To enable port sharing, set PSM_SHAREDPORTS=1 in your environment.
    There are no C++ bindings to MPI -- use the extern C MPI function calls. In MPI-IO file [...]

  • Page 65

    Appendix A Benchmark Programs
    Several MPI performance measurement programs are installed from the mpi-benchmark RPM. This Appendix describes these useful benchmarks and how to run them. These programs are based on code from the group of Dr. Dhabaleswar K. Panda at the Network-Based Computing Laboratory at the Ohio State Univ[...]

  • Page 66

    This benchmark always involves just two node programs. You can run it with the command:
    $ mpirun -np 2 -ppn 1 -m mpihosts osu_latency
    The -ppn 1 option is needed to be certain that the two communicating processes are on different nodes. Otherw[...]

  • Page 67

    MPI_Isend function, while the receiving node consumes them as quickly as it can using the non-blocking MPI_Irecv, and then returns a zero-length acknowledgement when all of the set has been received. You can run this program with:
    $ mpirun -np 2 -ppn 1 -m [...]

  • Page 68

    benchmark (as shown in the example above). It has been enhanced with the following additional functionality:
    ■ Messaging rate reported as well as bandwidth
    ■ N/2 dynamically calculated at end of run
    ■ Allows user to run multiple processes per node an [...]

  • Page 69

    A.4 Benchmark 4: Measuring MPI Latency in Host Rings
    The program mpi_latency can be used to measure latency in a ring of hosts. Its syntax is a bit different from Benchmark 1 in that it takes command line arguments that let you specify the message size [...]

  • Page 70

    Notes[...]

  • Page 71

    Appendix B Integration with a Batch Queuing System
    Most cluster systems use some kind of batch queuing system as an orderly way to provide users with access to the resources they need to meet their job’s performance requirements. One of the tasks of the cluster administrator is to provide means for users to submit MPI jobs[...]

  • Page 72

    require that his node program be the only application running on each node CPU. In a typical batch environment, the MPI user would still specify the number of node programs, but would depend on the batch system to allocate specific nodes when the required [...]

  • Page 73

    by mpirun. Each line consists of a node name, a colon, and the number of processes to start on that node.
    NOTE: This is one of two formats that the file may use. See section 3.5.6 for more information.
    B.1.3 Simple Process Management
    At this point, your scr[...]
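
The file format described in this preview (node name, colon, process count) is simple to generate. The Python sketch below shows one way a batch script might build such lines from a list of allocated nodes. The input format, a list with one entry per process slot, and the function name mpihosts_lines are assumptions made for this example; how a given batch system reports its allocation is not covered by the preview.

```python
def mpihosts_lines(allocated_nodes):
    """Turn a list of node names (one entry per process slot) into 'node:count' lines."""
    counts = {}
    for node in allocated_nodes:
        # Python dicts preserve insertion order, so nodes stay in first-seen order.
        counts[node] = counts.get(node, 0) + 1
    return [f"{node}:{count}" for node, count in counts.items()]


# Hypothetical allocation: two slots on n01, one on n02, two on n03.
print("\n".join(mpihosts_lines(["n01", "n01", "n02", "n03", "n03"])))
```

The resulting lines could then be written to the file passed to mpirun with the -m option.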

  • Page 74

    The following command will terminate all processes using the InfiniPath interconnect:
    # /sbin/fuser -k /dev/ipath
    For more information, see the man pages for fuser(1) and lsof(8).
    NOTE: Run these commands as root to ensure that all proces[...]

  • Page 75

    Appendix C Troubleshooting
    This Appendix describes some of the existing provisions for diagnosing and fixing problems. The sections are organized in the following order:
    ■ C.1 “Troubleshooting InfiniPath adapter installation”
    ■ C.2 “BIOS settings”
    ■ C.3 “Software installation issues”
    ■ C.4 “Kernel and[...]

  • Page 76

    states of the LEDs. The green LED will normally illuminate first. The normal state is Green On, Amber On. If a node repeatedly and spontaneously reboots when attempting to load the InfiniPath driver, it may be a symptom that its InfiniPath interconnect board is not well seated in the HTX[...]

  • Page 77

    C.2.1 MTRR Mapping and Write Combining
    MTRR (Memory Type Range Registers) is used by the InfiniPath driver to enable write combining to the InfiniPath on-chip transmit buffers. This improves write bandwidth to the InfiniPath chip by writing multiple words in a single bus transaction (typ[...]

  • Page 78

    C.2.3 Incorrect MTRR Mapping Causes Unexpected Low Bandwidth
    This same MTRR Mapping setting as described in the previous section can also cause unexpected low bandwidth if it is set incorrectly. The setting should look like this:
    MTRR Mapping [Discrete]
    The MTRR Mapping needs to be set t[...]

  • Page 79

    C.3 Software Installation Issues
    This section covers issues related to software installation.
    C.3.1 OpenFabrics Dependencies
    You need to install sysfsutils for your distribution before installing the OpenFabrics RPMs, as there are dependencies. If sysfsutils has not been[...]

  • Page 80

    In older distributions, such as RHEL4, the 32-bit glibc will be contained in the libgcc RPM. The RPM will be named similarly to:
    libgcc-3.4.3-9.EL4.i386.rpm
    In newer distributions, glibc is an RPM name. The 32-bit glibc will be named similarly to:
    glibc-2.3.4-2.i686.rpm
    or
    gli[...]

  • Page 81

    C – Troubleshooting Kernel and Initialization Issues IB6054601-00 D C-7 8. Reload all modules by using this command (as root): # /etc/init.d/infinipath start An alternate mechanism can be used, if provided as part of your alternate installation. 9. Run an OpenFabrics test program, such as ibstatus, to verify that your InfiniPath card(s) wo[...]

  • Page 82

    C – Troubleshooting Kernel and Initialization Issues C-8 IB6054601-00 D C.4.1 Kernel Needs CONFIG_PCI_MSI=y If the InfiniPath driver is being compiled on a machine without CONFIG_PCI_MSI=y configured, you will get a compilation error similar to this: ib_ipath/ipath_driver.c:46:2: #error "InfiniPath driver can only be used with kernels wi[...]

  • Page 83

    C – Troubleshooting Kernel and Initialization Issues IB6054601-00 D C-9 NOTE: This problem has been fixed in the 2.6.17 kernel.org kernel. C.4.3 Driver Load Fails Due to Unsupported Kernel If you try to load the InfiniPath driver on a kernel that InfiniPath software does not support, the load fails. Error messages similar to this appear: mo[...]

  • Page 84

    C – Troubleshooting Kernel and Initialization Issues C-10 IB6054601-00 D A zero count in all CPU columns means that no interrupts have been delivered to the processor. Possible causes are: ■ Booting the Linux kernel with ACPI (Advanced Configuration and Power Interface) disabled on the boot command line, or in the BIOS configuration ■[...]
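    The zero-count check described on this page can be sketched in Python. This is a minimal illustration, not part of the InfiniPath tools: the function name, the sample text, and the assumption that the driver's interrupt line is labelled ib_ipath are all hypothetical. On a real node the text would be read from /proc/interrupts.

```python
def zero_interrupt_lines(interrupts_text, device="ib_ipath"):
    """Return IRQ labels for `device` whose per-CPU counts are all zero.

    `device` is an assumed label; check how the driver actually appears
    in /proc/interrupts on your system.
    """
    lines = interrupts_text.strip().splitlines()
    if not lines:
        return []
    # The header row lists one column per CPU: CPU0 CPU1 ...
    cpu_count = len(lines[0].split())
    stalled = []
    for line in lines[1:]:
        if device not in line:
            continue
        fields = line.split()
        irq = fields[0].rstrip(":")
        counts = fields[1:1 + cpu_count]
        if counts and all(c.isdigit() and int(c) == 0 for c in counts):
            stalled.append(irq)
    return stalled


# Invented sample in the usual /proc/interrupts layout (values are made up).
SAMPLE = """\
           CPU0       CPU1
 66:          0          0   PCI-MSI  ib_ipath
 82:      12345        678   PCI-MSI  eth0
"""

if __name__ == "__main__":
    print(zero_interrupt_lines(SAMPLE))
```

    An empty result means every matching line has delivered at least one interrupt; a non-empty result points at the causes listed above.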

  • Page 85

    C – Troubleshooting Kernel and Initialization Issues IB6054601-00 D C-11 C.4.6 InfiniPath ib_ipath Initialization Failure There may be cases where ib_ipath was not properly initialized. Symptoms of this may show up in error messages from an MPI job or another program. Here is a sample command and error message: $ mpirun -np 2 -m ~/tmp/mbu13 osu[...]

  • Page 86

    C – Troubleshooting System Administration Troubleshooting C-12 IB6054601-00 D C.5 OpenFabrics Issues This section covers items related to OpenFabrics, including OpenSM. C.5.1 Stop OpenSM Before Stopping/Restarting InfiniPath OpenSM must be stopped before stopping or restarting InfiniPath. If not, error messages such as the following will o[...]

  • Page 87

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-13 C.6.1 Broken Intermediate Link Sometimes message traffic passes through the fabric while other traffic appears to be blocked. In this case, MPI jobs fail to run. In large cluster configurations, switches may be attached to other switches in order to supply the necessary [...]

  • Page 88

    C – Troubleshooting InfiniPath MPI Troubleshooting C-14 IB6054601-00 D $ mpirun -v MPIRUN:Infinipath Release2.0 : Built on Wed Nov 19 17:28:58 PDT 2006 by mee The following is the error that occurs when mpirun from the 2.0 release is being used with the 1.3 libraries: $ mpirun-ipath-ssh -np 2 -ppn 1 -m ~/tmp/idev osu_latency MPIRUN: mpirun fr[...]

  • Page 89

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-15 On a SLES 10 system, you would need: ■ compat-libstdc++ (for FC3) ■ compat-libstdc++5 (for SLES 10) Depending upon the application, you may need to use the -Wl,-Bstatic option to use the static versions of some libraries. C.8.3 Compiler/Linker Mismatch This is a t[...]

  • Page 90

    C – Troubleshooting InfiniPath MPI Troubleshooting C-16 IB6054601-00 D For these examples in Section C.8.5 below, we assume that these new locations are: /path/to/devel (for mpi-devel-*) /path/to/libs (for mpi-libs-*) C.8.5 Compiling on Development Nodes If the mpi-devel-* rpm is installed with the --prefix /path/to/devel option then mpicc,[...]

  • Page 91

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-17 The above compiler command ensures that the program will run using this path on any machine. For the second option, we change the file /etc/ld.so.conf on the compute nodes rather than using the -Wl,-rpath, option when compiling on the development node. We assume that the m[...]

  • Page 92

    C – Troubleshooting InfiniPath MPI Troubleshooting C-18 IB6054601-00 D Examples are given below. In the following command, the HP-MPI version of mpirun is invoked by the full pathname. However, the program mpi_nxnlatbw was compiled with the QLogic version of mpicc. The mismatch will produce errors similar to this: $ /opt/hpmpi/bin/mpirun -ho[...]

  • Page 93

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-19 The following two commands will both work properly: QLogic mpirun and executable used together: $ mpirun -m ~/host-bbb -np 4 /usr/bin/mpi_nxnlatbw HP-MPI mpirun and executable used together: $ /opt/hpmpi/bin/mpirun -hostlist "bbb-01,bbb-02,bbb-03,bbb-04" -np 4[...]

  • Page 94

    C – Troubleshooting InfiniPath MPI Troubleshooting C-20 IB6054601-00 D ^ pathf95-389 pathf90: ERROR BORDERS, File = communicate.F, Line = 407, Column = 18 No specific match can be found for the generic subprogram call "MPI_RECV". If it is necessary to use a non-standard argument list, it is advisable to create your own MPI module fi[...]

  • Page 95

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-21 integer count, datatype, root, comm, ierror ! Call the Fortran 77 style implicit interface to "mpi_bcast" external mpi_bcast call mpi_bcast(buffer, count, datatype, root, comm, ierror) end subroutine additional_mpi_bcast_for_character end module additional_bcas[...]

  • Page 96

    C – Troubleshooting InfiniPath MPI Troubleshooting C-22 IB6054601-00 D If this file is not present or the node has not been rebooted after the infinipath RPM has been installed, a failure message similar to this will be generated: $ mpirun -m ~/tmp/sm -np 2 -mpi_latency 1000 1000000 node-00:1.ipath_update_tid_err: failed: Cannot allocate mem[...]

  • Page 97

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-23 Found unknown timer type type unknown frame type type recv done: available_tids now n, but max is m (freed p) cancel recv available_tids now n, but max is m (freed %p) [n] Src lid error: sender: x, exp send: y Frame receive from unknown sender. exp. sender = x, came from y F[...]

  • Page 98

    C – Troubleshooting InfiniPath MPI Troubleshooting C-24 IB6054601-00 D The following message indicates that a node program may not be processing incoming packets, perhaps due to a very high system load: eager array full after overflow, flushing (head h, tail t) The following indicates an invalid InfiniPath link protocol version: InfiniPath [...]

  • Page 99

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-25 These messages appear in the mpirun output. Most are followed by an abort, and possibly a backtrace. Each is preceded by the name of the function in which the exception occurred. Error sending packet: description Error receiving packet: description A fatal protocol error o[...]

  • Page 100

    C – Troubleshooting InfiniPath MPI Troubleshooting C-26 IB6054601-00 D There is no route to any host: $ mpirun -np 2 -m ~/tmp/q mpi_latency 100 100 ssh: connect to host <nodename> port 22: No route to host ssh: connect to host <nodename> port 22: No route to host MPIRUN: All node programs ended prematurely without connecting to mpi[...]

  • Page 101

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-27 $ mpirun -np 2 -m ~/tmp/q -q 60 mpi_latency 1000000 1000000 MPIRUN: MPI progress Quiescence Detected after 9000 seconds. MPIRUN: 2 out of 2 ranks showed no MPI send or receive progress. MPIRUN: Per-rank details are the following: MPIRUN: Rank 0 (<nodename>) caused MPI [...]

  • Page 102

    C – Troubleshooting InfiniPath MPI Troubleshooting C-28 IB6054601-00 D C.8.13 MPI Stats Using the -print-stats option to mpirun will result in a listing to stderr of various MPI statistics. Here is example output for the -print-stats option when used with an 8-rank run of the HPCC benchmark. Message statistics are available for transmitted [...]

  • Page 103

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-29 C.9 Useful Programs and Files for Debugging The most useful programs and files for debugging are listed in the sections below. Many of these programs and files have been discussed elsewhere in the documentation: this information is summarized and repeated here f[...]

  • Page 104

    C – Troubleshooting Useful Programs and Files for Debugging C-30 IB6054601-00 D C.9.3 Summary of Useful Programs and Files Useful programs and files are summarized in the table below. Descriptions for some of the programs and files follow. Check man pages for more information on the programs. Table C-2. Useful Programs and Files Program or[...]

  • Page 105

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-31 C.9.4 boardversion It may be useful to keep track of the current version of the installed software. You can check the version of the installed InfiniPath software by looking in: /sys/bus/pci/drivers/ib_ipath/00/boardversion Example contents are: Driver 2.0,Infi[...]

  • Page 106

    C – Troubleshooting Useful Programs and Files for Debugging C-32 IB6054601-00 D C.9.5 ibstatus This program displays basic information on the status of InfiniBand devices that are currently in use when the OpenFabrics modules are loaded. C.9.6 ibv_devinfo This program displays information about InfiniBand devices, including various kinds o[...]

  • Page 107

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-33 C.9.8 ipath_checkout ipath_checkout is a bash script used to verify that the installation is correct and that all the nodes of the network are functioning and mutually connected by the InfiniPath fabric. It is to be run on a front end node, and requires specifica[...]

  • Page 108

    C – Troubleshooting Useful Programs and Files for Debugging C-34 IB6054601-00 D --workdir=DIR Use DIR to hold intermediate files created while running tests. DIR must not already exist. -k, --keep Keep intermediate files that were created while performing tests and compiling reports. Results will be saved in a directory created by mktemp and[...]

  • Page 109

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-35 00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407 C.9.10 ipathbug-helper The tool ipathbug-helper is useful for verifying homogeneity. Prior to seeking assistance from QLogic technical support, you should run this script on the head node of y[...]
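    When comparing nodes, the status line at the start of this page excerpt (unit number, LID, MLID, GUID, serial) may be easier to handle as structured data. The following Python sketch parses a line of exactly that shape. The regular expression and field names are assumptions derived from the single sample line shown here, not a documented output format.

```python
import re

# Pattern inferred from the one sample line in the excerpt (an assumption).
STATUS_LINE = re.compile(
    r"^(?P<unit>\d+):\s+LID=(?P<lid>0x[0-9a-fA-F]+)"
    r"\s+MLID=(?P<mlid>0x[0-9a-fA-F]+)"
    r"\s+GUID=(?P<guid>[0-9a-fA-F:]+)"
    r"\s+Serial:\s+(?P<serial>\S+)"
)


def parse_status_line(line):
    """Return the fields of one status line as a dict, or None if it does not match."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "unit": int(match.group("unit")),
        "lid": int(match.group("lid"), 16),    # hexadecimal in the output
        "mlid": int(match.group("mlid"), 16),
        "guid": match.group("guid"),
        "serial": match.group("serial"),
    }


if __name__ == "__main__":
    sample = "00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407"
    print(parse_status_line(sample))
```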

  • Page 110

    C – Troubleshooting Useful Programs and Files for Debugging C-36 IB6054601-00 D C.9.13 lsmod If you need to find which InfiniPath and OpenFabrics modules are running, try the following command: # lsmod | egrep 'ipath_|ib_|rdma_|findex' C.9.14 mpirun mpirun can give information on whether the program is being run against a QLogic or non-QLog[...]
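    The lsmod filter above can be reproduced in a script, for example when collecting module lists from many nodes. This Python sketch applies the same pattern to lsmod text that has already been captured. The function name and the sample text (including module sizes) are invented for illustration.

```python
import re

# Same alternation as the egrep pattern in the command above.
MODULE_PATTERN = re.compile(r"ipath_|ib_|rdma_|findex")


def infinipath_modules(lsmod_text):
    """Return the names of modules on lines that match the pattern.

    Like egrep, this matches anywhere on the line, so a module that is
    only *used by* a matching module is also reported.
    """
    names = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":  # skip blanks and the header row
            continue
        if MODULE_PATTERN.search(line):
            names.append(fields[0])
    return names


# Invented sample in the usual lsmod layout (sizes are made up).
SAMPLE = """\
Module                  Size  Used by
ib_ipath              123456  0
ib_core                54321  1 ib_ipath
ext3                   99999  2
"""

if __name__ == "__main__":
    print(infinipath_modules(SAMPLE))
```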

  • Page 111

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-37 The following table shows the possible contents of the file, with brief explanations of the entries. In this same directory are other files containing information related to status. They are summarized in Table C-4. Table C-3. status_str File File contents D[...]

  • Page 112

    C – Troubleshooting Useful Programs and Files for Debugging C-38 IB6054601-00 D C.9.17 strings The command strings can also be used. Its format is as follows: $ strings /usr/lib/libinfinipath.so.4.0 | grep Date: will produce output like this: $Date: 2006-09-15 04:07 Release2.0 InfiniPath $ NOTE: strings is part of binutils (a development RPM),[...]
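    To compare library versions across nodes, the $Date: line printed by the strings command can be split into fields. This Python sketch assumes the line always has the shape of the sample above (date, time, release tag). That is inferred from one example, and the function name is hypothetical.

```python
import re

# Shape inferred from the sample "$Date: 2006-09-15 04:07 Release2.0 InfiniPath $".
DATE_LINE = re.compile(r"\$Date:\s*(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2})\s+(\S+)")


def parse_release(strings_output):
    """Return date, time and release tag from the first $Date: line, or None."""
    for line in strings_output.splitlines():
        match = DATE_LINE.search(line)
        if match:
            return {
                "date": match.group(1),
                "time": match.group(2),
                "release": match.group(3),
            }
    return None


if __name__ == "__main__":
    print(parse_release("$Date: 2006-09-15 04:07 Release2.0 InfiniPath $"))
```

    Running the same check on every node and comparing the dictionaries is one way to spot a node with a mismatched library.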

  • Page 113

    IB6054601-00 D D-1 Appendix D Recommended Reading Reference material for further reading is provided here. D.1 References for MPI The MPI Standard specification documents. http://www.mpi-forum.org/docs The MPICH implementation of MPI and its documentation. http://www-unix.mcs.anl.gov/mpi/mpich/ The ROMIO distribution and its documentation. ht[...]

  • Page 114

    D – Recommended Reading Rocks D-2 IB6054601-00 D D.6 Clusters Gropp, William, Ewing Lusk, and Thomas Sterling, Beowulf Cluster Computing with Linux, Second Edition, 2003, MIT Press, ISBN 0-262-69292-9. D.7 Rocks Extensive documentation on installing Rocks and custom Rolls. http://www.rocksclusters.org/[...]

  • Page 115

    IB6054601-00 D E-1 Appendix E Glossary A glossary is provided below for technical terms used in the documentation. bandwidth The rate at which data can be transmitted. This represents the capacity of the network connection. Theoretical peak bandwidth is fixed, but the effective bandwidth, the ideal rate is modified by overhead in hardware an[...]

  • Page 116

    E – Glossary E-2 IB6054601-00 D GID For Global Identifier. Used for routing between different InfiniBand subnets. GUID For Globally Unique Identifier for the InfiniPath chip. Equivalent to Ethernet MAC address. head node Same as front end node. HCA For Host Channel Adapter. HCAs are I/O engines located within processing nodes, connecting [...]

  • Page 117

    E – Glossary IB6054601-00 D E-3 LID For Local Identifier. Assigned by the Subnet Manager (SM) to each visible node within a single InfiniBand fabric. It is similar conceptually to an IP address for TCP/IP. Lustre Open source project to develop scalable cluster file systems. MAC Address For Media Access Control Address. It is a unique iden[...]

  • Page 118

    E – Glossary E-4 IB6054601-00 D MTRR For Memory Type Range Registers. Used by the InfiniPath driver to enable write combining to the InfiniPath on-chip transmit buffers. This improves write bandwidth to the InfiniPath chip, by writing multiple words in a single bus transaction (typicall[...]

  • Page 119

    E – Glossary IB6054601-00 D E-5 SDP For Sockets Direct Protocol. An InfiniBand-specific upper layer protocol. It defines a standard wire protocol to support stream sockets networking over InfiniBand. SRP For SCSI RDMA Protocol. The implementation of this protocol is under development for utilizing block storage devices over an InfiniBan [...]

  • Page 120

    E – Glossary E-6 IB6054601-00 D Notes[...]

  • Page 121

    IB6054601-00 D Index-1 Index A ACPI, enabling C-9 B Batch queuing for MPI jobs B-1 – B-4 Benchmarking MPI bandwidth A-2 – A-3 MPI latency measurement A-1 – A-2 MPI latency measurement in host rings A-5 C Compiling MPI programs compiler and linker variables 3-9 scripts for invoking compiler and linker 3-7 specifying compilers and linkers 3-4[...]

  • Page 122

    InfiniPath User Guide Version 2.0 Beta2 Index-2 IB6054601-00 D configuration of on SUSE and SLES 10 2-8 – 2-11 layered Ethernet driver 2-6 ipathbug_helper C-30, C-35 L LEDs, showing state of system with C-1 Limitations of PathScale MPI 3-21 M Management tips maintaining homogeneous nodes 2-20 useful tools for verifying homogeneity 2-20 MPD, a[...]