Q-Logic IB6054601-00 D manual

A good user manual

The rules oblige the seller to provide the buyer with the manual for the Q-Logic IB6054601-00 D together with the product. A missing manual, or incorrect information given to the consumer, is grounds for a complaint that the product does not conform to the contract. By law, the manual may be supplied in a form other than paper; this is common practice, with the manual provided as graphics, as an electronic Q-Logic IB6054601-00 D manual, or as instructional videos for users. The condition is that the form be legible and understandable.

What is an instruction manual?

The word comes from the Latin "instructio", meaning to instruct. The Q-Logic IB6054601-00 D manual therefore describes the steps of a process. The purpose of a manual is to instruct and to make it easier to start up and use the equipment or to carry out particular tasks. A manual is a collection of information about an object or service: a guide.

Unfortunately, few users take the time to read the Q-Logic IB6054601-00 D manual, yet a good manual not only reveals a number of additional features of the device but also helps prevent most failures.

So what should the perfect manual contain?

First, the Q-Logic IB6054601-00 D manual should contain:
- technical data for the Q-Logic IB6054601-00 D device
- the manufacturer's name and the year of manufacture of the Q-Logic IB6054601-00 D device
- instructions for using, adjusting, and maintaining the Q-Logic IB6054601-00 D device
- safety markings and certificates confirming compliance with the relevant standards

Why don't we read manuals?

Usually this is due to lack of time and to confidence about the specific functionality of the purchased device. Unfortunately, simply connecting and starting up the Q-Logic IB6054601-00 D is not enough. The manual contains a range of guidance on specific features, safety, maintenance methods (including which products should be used), possible Q-Logic IB6054601-00 D defects, and ways of solving common problems during use. Finally, the manual gives the contact details for Q-Logic service in case the proposed solutions do not work. Manuals in the form of engaging animations and instructional videos are currently much appreciated, since they speak to the user better than a leaflet does. With this type of manual the user is likely to watch the whole instructional video without skipping the complicated Q-Logic IB6054601-00 D specifications and technical descriptions, as tends to happen with the paper version.

Why read manuals?

Above all, they contain answers about the construction and capabilities of the Q-Logic IB6054601-00 D device, the use of individual accessories, and a range of information for making full use of all its features and conveniences.

After successfully purchasing a piece of equipment or a device, it is worth taking a moment to become familiar with each part of the Q-Logic IB6054601-00 D manual. Manuals are now carefully prepared and translated so that they are not only understandable to users but also fulfil their basic function of informing.

Manual contents

  • Page 1

    IB6054601-00 D Page i Q Simplify InfiniPath User Guide Version 2.0[...]

  • Page 2

    InfiniPath User Guide Version 2.0 Page ii IB6054601-00 D Information furnished in this manual is believed to be accurate and reliable. However, QLogic Corporation assumes no responsibility for its use, nor for any infringements of patents or other rights of third parties which may result from its use. QLogic Corporation reserves the right[...]

  • Page 3

    InfiniPath User Guide Version 2.0 IB6054601-00 D Page iii Added info about using MPI over uDAPL. Need to load modules rdma_cm and rdma_ucm. 3.7 Added section: Error messages generated by mpirun. This explains more about the types of errors found in the sub-sections. Also added error messages related to failed connections between nodes C.8.12[...]

  • Page 4

    InfiniPath User Guide Version 2.0 Page iv IB6054601-00 D © 2006, 2007 QLogic Corporation. All rights reserved worldwide. © PathScale 2004, 2005, 2006. All rights reserved. First Published: August 2005 Printed in U.S.A.[...]

  • Page 5

    IB6054601-00 D Page v Table of Contents
    Section 1 Introduction
    1.1 Who Should Read this Guide ..... 1-1
    1.2 How this Guide is Organized ..... 1-1
    1.3 Overview .....[...]

  • Page 6

    InfiniPath User Guide Version 2.0 Page vi IB6054601-00 D
    2.10 Performance and Management Tips ..... 2-17
    2.10.1 Remove Unneeded Services ..... 2-17
    2.10.2 Disable Powersaving Features .....[...]

  • Page 7

    InfiniPath User Guide Version 2.0 IB6054601-00 D Page vii
    3.11 Debugging MPI Programs ..... 3-20
    3.11.1 MPI Errors ..... 3-20
    3.11.2 Using Debuggers .....[...]

  • Page 8

    InfiniPath User Guide Version 2.0 Page viii IB6054601-00 D
    C.4.5 OpenFabrics Load Errors If ib_ipath Driver Load Fails ..... C-10
    C.4.6 InfiniPath ib_ipath Initialization Failure ..... C-11
    C.4.7 MPI Job Failures Due to Initialization Problems ..... C-11
    C.5 OpenFab[...]

  • Page 9

    InfiniPath User Guide Version 2.0 IB6054601-00 D Page ix
    C.9.11 ipath_pkt_test ..... C-35
    C.9.12 ipathstats ..... C-35
    C.9.13 lsmod .....[...]

  • Page 10

    InfiniPath User Guide Version 2.0 Page x IB6054601-00 D Notes[...]

  • Page 11

    IB6054601-00 D 1-1 Section 1 Introduction This chapter describes the objectives, intended audience, and organization of the InfiniPath User Guide. The InfiniPath User Guide is intended to give the end users of an InfiniPath cluster what they need to know to use it. In this case, end users are understood to include both the cluster administrato[...]

  • Page 12

    1 – Introduction Interoperability 1-2 IB6054601-00 D ■ Appendix E Glossary of technical terms ■ Index In addition, the InfiniPath Install Guide contains information on InfiniPath hardware and software installation. 1.3 Overview The material in this documentation pertains to an InfiniPath cluster. This is defined as a collection of node[...]

  • Page 13

    1 – Introduction What’s New in this Release IB6054601-00 D 1-3 NOTE: OpenFabrics was known as OpenIB until March 2006. All relevant references to OpenIB in this documentation have been updated to reflect this change. See the OpenFabrics website at http://www.openfabrics.org for more information on the OpenFabrics Alliance. 1.6 What’s Ne[...]

  • Page 14

    1 – Introduction Supported Distributions and Kernels 1-4 IB6054601-00 D Support for multiple versions of MPI has been added. You can use a different version of MPI and achieve the high-bandwidth and low-latency performance that is standard with InfiniPath MPI. Also included is expanded operating system support, and support for the latest O[...]

  • Page 15

    1 – Introduction Software Components IB6054601-00 D 1-5 1.8 Software Components The software provided with the InfiniPath Interconnect product consists of: ■ InfiniPath driver (including OpenFabrics) ■ InfiniPath ethernet emulation ■ InfiniPath libraries ■ InfiniPath utilities, configuration, and support tools ■ Infin[...]

  • Page 16

    1 – Introduction Documentation and Technical Support 1-6 IB6054601-00 D NOTE: 32 bit OpenFabrics programs using the verb interfaces are not supported in this InfiniPath release, but will be supported in a future release. 1.9 Conventions Used in this Document This Guide uses these typographical conventions: 1.10 Documentation and Technical[...]

  • Page 17

    1 – Introduction Documentation and Technical Support IB6054601-00 D 1-7 ■ Readme file The Troubleshooting Appendix for installation, InfiniPath and OpenFabrics administration, and MPI issues is located in the InfiniPath User Guide. Visit the QLogic support Web site for documentation and the latest software updates. http://www.qlogic.com[...]

  • Page 18

    1 – Introduction Documentation and Technical Support 1-8 IB6054601-00 D Notes[...]

  • Page 19

    IB6054601-00 D 2-1 Section 2 InfiniPath Cluster Administration This chapter describes what the cluster administrator needs to know about the InfiniPath software and system administration. 2.1 Introduction The InfiniPath driver ib_ipath, layered Ethernet driver ipath_ether, OpenSM, and other modules and the protocol and MPI support libraries ar[...]

  • Page 20

    2 – InfiniPath Cluster Administration Memory Footprint 2-2 IB6054601-00 D MPI include files are in: /usr/include MPI programming examples and source for several MPI benchmarks are in: /usr/share/mpich/examples InfiniPath utility programs, as well as MPI utilities and benchmarks are installed in: /usr/bin The InfiniPath kernel modules are in[...]

  • Page 21

    2 – InfiniPath Cluster Administration Memory Footprint IB6054601-00 D 2-3 on system configuration. OpenFabrics support is under development and has not been fully characterized. This table summarizes the guidelines. Here is an example for a 1024 processor system: ■ 1024 cores over 256 nodes (each node has 2 sockets with dual-core processo[...]

  • Page 22

    2 – InfiniPath Cluster Administration Configuration and Startup 2-4 IB6054601-00 D This breaks down to a memory footprint of 331MB per node, as follows: 2.4 Configuration and Startup 2.4.1 BIOS Settings A properly configured BIOS is required. The BIOS settings, which are stored in non-volatile memory, contain certain parameters character[...]

  • Page 23

    2 – InfiniPath Cluster Administration Configuration and Startup IB6054601-00 D 2-5 You can check and adjust these BIOS settings using the BIOS Setup Utility. For specific instructions on how to do this, follow the hardware documentation that came with your system. 2.4.2 InfiniPath Driver Startup The ib_ipath module provides low level Inf[...]

  • Page 24

    2 – InfiniPath Cluster Administration Configuration and Startup 2-6 IB6054601-00 D and unmounted when the infinipath script is invoked with the "stop" option (e.g. at system shutdown). The layout of the filesystem is as follows: atomic_stats 00/ 01/ ... The atomic_stats file contains general driver statistics. There is one numbere[...]

  • Page 25

    2 – InfiniPath Cluster Administration Configuration and Startup IB6054601-00 D 2-7 You must create a network device configuration file for the layered Ethernet device on the InfiniPath adapter. This configuration file will resemble the configuration files for the other Ethernet devices on the nodes. Typically on servers there are two Ethern[...]

  • Page 26

    2 – InfiniPath Cluster Administration Configuration and Startup 2-8 IB6054601-00 D If you are using DHCP (dynamic host configuration protocol), add the following lines to ifcfg-eth2:
    # QLogic Interconnect Ethernet
    DEVICE=eth2
    ONBOOT=yes
    BOOTPROTO=dhcp
    If you are using static IP addresses, use the following lines instead, substituting your[...]

  • Page 27

    2 – InfiniPath Cluster Administration Configuration and Startup IB6054601-00 D 2-9 Step 3 is applicable only to SLES 10; it is required because SLES 10 uses a newer version of the udev subsystem. NOTE: The MAC address (media access control address) is a unique identifier attached to most forms of networking equipment. Step 2 below determines [...]

  • Page 28

    2 – InfiniPath Cluster Administration Configuration and Startup 2-10 IB6054601-00 D Check each of the lines starting with SUBSYSTEM=, to find the highest numbered interface. (For standard motherboards, the highest numbered interface will typically be 1.) Add a new line at the end of the file, incrementing the interface number by one. In t[...]

  • Page 29

    2 – InfiniPath Cluster Administration Configuration and Startup IB6054601-00 D 2-11 6. To verify that the configuration files are correct, you will normally now be able to run the commands: # ifup eth2 # ifconfig eth2 Note that it may be necessary to reboot the system before the configuration changes will work. 2.4.7 OpenFabrics Configuration[...]

  • Page 30

    2 – InfiniPath Cluster Administration Configuration and Startup 2-12 IB6054601-00 D To verify the configuration, type: # ifconfig ib0 The output from this command should be similar to this: ib0 Link encap:InfiniBand HWaddr 00:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:10.1.17.3 Bcast:10.1.17.255 Mask:255.255.255.0 UP [...]

  • Page 31

    2 – InfiniPath Cluster Administration Starting and Stopping the InfiniPath Software IB6054601-00 D 2-13 and you can stop it again like this: # /etc/init.d/opensmd stop If you wish to pass any arguments to the OpenSM program, modify the file: /etc/init.d/opensmd and add the arguments to the "OPTIONS" variable. Here is an example: #[...]

  • Page 32

    2 – InfiniPath Cluster Administration Starting and Stopping the InfiniPath Software 2-14 IB6054601-00 D To disable the driver on the next system boot, use the command (as root): # chkconfig infinipath off NOTE: This does not stop and unload the driver, if it is already loaded. You can start, stop, or restart (as root) the InfiniPath supp[...]

  • Page 33

    2 – InfiniPath Cluster Administration Configuring ssh and sshd Using shosts.equiv IB6054601-00 D 2-15 If there is output, you should look at the output from this command to determine if it is configured: $ /sbin/ifconfig -a Finally, if you need to find which InfiniPath and OpenFabrics modules are running, try the following command: $ lsmod |[...]

  • Page 34

    2 – InfiniPath Cluster Administration Configuring ssh and sshd Using shosts.equiv 2-16 IB6054601-00 D This next example assumes the following: ■ Both the cluster nodes and the front end system are running the openssh package as distributed in current Linux systems. ■ All cluster users have accounts with the same account name on the fron[...]

  • Page 35

    2 – InfiniPath Cluster Administration Performance and Management Tips IB6054601-00 D 2-17 0zwxSL7GP1nEyFk9wAxCrXv3xPKxQaezQKs+KL95FouJvJ4qrSxxHdd1NYNR0D avEBVQgCaspgWvWQ8cL 0aUQmTbggLrtD9zETVU5PCgRlQL6I3Y5sCCHuO7/UvTH9nneCg== Change the file to mode 600 when finished editing. 4. On each node, the system file /etc/ssh/sshd_config must be edi[...]

  • Page 36

    2 – InfiniPath Cluster Administration Performance and Management Tips 2-18 IB6054601-00 D nodes. Since these are presumed to be specialized computing appliances, they do not need many of the service daemons normally running on a general Linux computer. Following are several groups constituting a minimal necessary set of services. These a[...]

  • Page 37

    2 – InfiniPath Cluster Administration Performance and Management Tips IB6054601-00 D 2-19 For SUSE 9.3 and 10.0 run this command as root: # /sbin/chkconfig --level 12345 powersaved off After running either of these commands, the system will need to be rebooted for these changes to take effect. 2.10.3 Balanced Processor Power Higher processo[...]

  • Page 38

    2 – InfiniPath Cluster Administration Performance and Management Tips 2-20 IB6054601-00 D 2.10.6 Hyper-Threading If using Intel processors that support Hyper-Threading, it is recommended that Hyper-Threading is turned off in the BIOS. This will provide more consistent performance. You can check and adjust this setting using the BIOS Setup[...]

  • Page 39

    2 – InfiniPath Cluster Administration Performance and Management Tips IB6054601-00 D 2-21 00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407 Note that ipath_control will report whether the installed adapter is the QHT7040, QHT7140, or the QLE7140. It will also report whether the driver is InfiniPath-specific or not with t[...]
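The status line quoted in this excerpt is a run of KEY=value fields followed by a "Serial:" field. As a rough sketch only, it could be split into a dictionary as shown here. The function name parse_status_line is hypothetical, and the field layout is assumed from this single sample line rather than from any documented ipath_control output format.

```python
import re

def parse_status_line(line):
    """Split a status line like the sample into a dict of fields.

    Assumption: the line looks like '<unit>: KEY=value ... Serial: value'.
    """
    unit, _, rest = line.partition(": ")
    fields = {"unit": unit}
    # KEY=value pairs (values contain no spaces in the sample line)
    for key, value in re.findall(r"(\w+)=(\S+)", rest):
        fields[key] = value
    # The serial number is written 'Serial: value' rather than 'KEY=value'
    serial = re.search(r"Serial:\s*(\S+)", rest)
    if serial:
        fields["Serial"] = serial.group(1)
    return fields

sample = "00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407"
info = parse_status_line(sample)
print(info["GUID"], int(info["LID"], 16))
```

Keeping the values as strings leaves it to the caller to decide which fields are hexadecimal, as the LID and MLID appear to be in the sample.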

  • Page 40

    2 – InfiniPath Cluster Administration Customer Acceptance Utility 2-22 IB6054601-00 D $Id: kernel.org InfiniPath Release 2.0 $ $Date: 2006-09-15-04:16 $ /lib/modules/2.6.16.21-0.8-smp/updates/ipath.ko: $Id: kernel.org InfiniPath Release2.0 $ $Date: 2006-09-15-04:20 $ NOTE: ident is in the optional rcs RPM, and is not always installed. string[...]

  • Page 41

    2 – InfiniPath Cluster Administration Customer Acceptance Utility IB6054601-00 D 2-23 3. Gather and analyze system configuration from nodes. 4. Gather and analyze RPMs installed on nodes. 5. Verify InfiniPath hardware and software status and configuration. 6. Verify ability to mpirun jobs on nodes. 7. Run bandwidth and latency test on eve[...]

  • Page 42

    2 – InfiniPath Cluster Administration Customer Acceptance Utility 2-24 IB6054601-00 D Notes[...]

  • Page 43

    IB6054601-00 D 3-1 Section 3 Using InfiniPath MPI This chapter provides information on using InfiniPath MPI. Examples are provided for compiling and running MPI programs. 3.1 InfiniPath MPI QLogic’s implementation of the MPI standard is derived from the MPICH reference implementation Version 1.2.6. The InfiniPath MPI libraries have been high[...]

  • Page 44

    3 – Using InfiniPath MPI Getting Started with MPI 3-2 IB6054601-00 D These examples assume that: ■ Your cluster administrator has properly installed InfiniPath MPI and the PathScale compilers. ■ Your cluster’s policy allows you to use the mpirun script directly, without having to submit the job to a batch queuing system. ■ You or[...]

  • Page 45

    3 – Using InfiniPath MPI Getting Started with MPI IB6054601-00 D 3-3 Here ./cpi designates the executable of the example program in the working directory. The -np parameter to mpirun defines the number of processes to be used in the parallel computation. Now try it with four processes: $ mpirun -np 4 -m mpihosts ./cpi Process 3 on hostname1 [...]

  • Page 46

    3 – Using InfiniPath MPI Configuring MPI Programs for InfiniPath MPI 3-4 IB6054601-00 D and run it with: $ mpirun -np 2 -m mpihosts ./pi3f90 The C++ program hello++.cc is a parallel processing version of the traditional “Hello, World” program. Notice that this version makes use of the external C bindings of the MPI functions if the C++ bi[...]

  • Page 47

    3 – Using InfiniPath MPI InfiniPath MPI Details IB6054601-00 D 3-5 You may need to instead pass arguments to configure directly, in a fashion similar to this: $ ./configure -cc=mpicc -fc=mpif77 -c++=mpicxx -c++linker=mpicxx Sometimes you may need to edit a Makefile to achieve this result, adding lines similar to: CC=mpicc F77=mpif77 F90=mpif[...]

  • Page 48

    3 – Using InfiniPath MPI InfiniPath MPI Details 3-6 IB6054601-00 D The process is shown in the following steps: 1. Create a key pair. Use the default file name, and be sure to enter a passphrase. $ ssh-keygen -t rsa 2. Enter a passphrase for your key pair when prompted. Note that the key agent does not survive X11 logout or system reboot: $[...]

  • Page 49

    3 – Using InfiniPath MPI InfiniPath MPI Details IB6054601-00 D 3-7 3.5.2 Compiling and Linking These scripts invoke the compiler and linker for programs in each of the respective languages, and take care of referring to the correct include files and libraries in each case. mpicc mpicxx mpif77 mpif90 mpif95 On x86_64, by default these call the[...]

  • Page 50

    3 – Using InfiniPath MPI InfiniPath MPI Details 3-8 IB6054601-00 D line options. See the PathScale compiler documentation and the man pages for pathcc and pathf90 for complete information on its options. See the corresponding documentation for any other compiler/linker you may call for its options. 3.5.3 To Use Another Compiler In addition[...]

  • Page 51

    3 – Using InfiniPath MPI InfiniPath MPI Details IB6054601-00 D 3-9 To use the Intel compiler for Fortran90/Fortran95 programs, use: $ mpif90 -f90=ifort ..... $ mpif95 -f95=ifort ..... Usage for other compilers will be similar to the examples above, substituting the options following -cc, -CC, -f77, -f90, or -f95. Consult the documentatio[...]

  • Page 52

    3 – Using InfiniPath MPI InfiniPath MPI Details 3-10 IB6054601-00 D The current workaround for this is to compile on a supported and compatible distribution, then run the executable on one of the systems that uses the GNU 4.x compilers and environment. ■ To run on FC4 or FC5, install FC3 or RHEL4/CentOS on your build machine. Compile your [...]

  • Page 53

    3 – Using InfiniPath MPI InfiniPath MPI Details IB6054601-00 D 3-11 program-name will generally be the pathname to the executable MPI program. If the MPI program resides in the current directory and the current directory is not in your search path, then program-name must begin with ‘./’, such as: ./program-name Unless you want to run on[...]

  • Page 54

    3 – Using InfiniPath MPI InfiniPath MPI Details 3-12 IB6054601-00 D programs will be started on that host before using the next entry in the mpihosts file. If the full mpihosts file is processed, and there are still more processes requested, processing starts again at the start of the file. You have several alternative ways of specifying the[...]
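The wrap-around rule in this excerpt (fill each mpihosts entry in turn, then start again at the top if processes remain) can be modelled in a few lines. This is a sketch of the described behaviour, not mpirun's actual code. The function name assign_processes and the host names are hypothetical, and the entries are assumed to be already parsed into (hostname, count) pairs.

```python
from itertools import cycle

def assign_processes(entries, np):
    """Return the host chosen for each of np processes.

    entries: list of (hostname, count) pairs, one per mpihosts line
    (assumed already parsed; count is how many node programs the entry takes).
    When the list is exhausted and processes remain, start again at the top.
    """
    if not entries or any(count < 1 for _, count in entries):
        raise ValueError("need at least one entry, each with count >= 1")
    placement = []
    for host, count in cycle(entries):
        for _ in range(count):
            if len(placement) == np:
                return placement
            placement.append(host)
        if len(placement) == np:
            return placement

hosts = [("node01", 2), ("node02", 1)]   # hypothetical host names
print(assign_processes(hosts, 5))
```

With three slots listed and five processes requested, the fourth and fifth processes go back to the first entry.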

  • Page 55

    3 – Using InfiniPath MPI InfiniPath MPI Details IB6054601-00 D 3-13 LD_LIBRARY_PATH, and other environment variables for the node programs through the use of the -rcfile option of mpirun: $ mpirun -np n -m mpihosts -rcfile mpirunrc program In the absence of this option, mpirun checks to see if a file called $HOME/.mpirunrc exists in the user&a[...]

  • Page 56

    3 – Using InfiniPath MPI InfiniPath MPI Details 3-14 IB6054601-00 D 3.5.9 Multiprocessor Nodes Another command line option, -ppn, instructs mpirun to assign a fixed number p of node programs to each node, as it distributes the n instances among the nodes: $ mpirun -np n -m mpihosts -ppn p program-name This option overrides the :p specific[...]
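To make the effect of -ppn concrete, the following sketch places n node programs on a list of nodes with a fixed p per node. It only illustrates the rule as worded in this excerpt. The function name distribute_with_ppn and the node names are hypothetical, and the sketch simply refuses the case where n exceeds p times the number of nodes, because the excerpt does not say what mpirun does then.

```python
def distribute_with_ppn(nodes, n, p):
    """Place n node programs on the listed nodes, p per node, in list order."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if n > p * len(nodes):
        # Assumption of this sketch: treat oversubscription as an error.
        raise ValueError("not enough nodes for n programs at p per node")
    # Program i lands on node number i // p
    return [nodes[i // p] for i in range(n)]

nodes = ["node01", "node02", "node03"]   # hypothetical node names
print(distribute_with_ppn(nodes, 5, 2))
print(distribute_with_ppn(nodes, 2, 1))
```

With n = 2 and p = 1 the two programs land on different nodes, which is the placement a two-node benchmark run needs.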

  • Page 57

    3 – Using InfiniPath MPI InfiniPath MPI Details IB6054601-00 D 3-15
    -verbose Print diagnostic messages from mpirun itself. Can be useful in troubleshooting. Default: Off
    -version, -v Print MPI version. Default: Off
    -help, -h Print mpirun help message. Default: Off
    -rcfile node-shell-script Startup script for setting environment on nodes. Def[...]

  • Page 58

    3 – Using InfiniPath MPI InfiniPath MPI Details 3-16 IB6054601-00 D
    -nonmpi Run a non-MPI program. Required if the node program makes no MPI calls. Default: Off
    -quiescence-timeout, seconds Wait time in seconds for quiescence (absence of MPI communication) on the nodes. Useful for detecting deadlocks. 0 disables quiescence detection. Default[...]

  • Page 59

    3 – Using InfiniPath MPI MPD IB6054601-00 D 3-17 -statsfile file-prefix Specifies alternate file to receive the output from the -print-stats option. Default: stderr 3.6 Using Other MPI Implementations Support for multiple MPI implementations has been added. You can use a different version of MPI and achieve the high-bandwidth and low-lat[...]

  • Page 60

    3 – Using InfiniPath MPI File I/O in MPI 3-18 IB6054601-00 D 3.8.1 MPD Description The Multi-Purpose Daemon (MPD) was developed by Argonne National Laboratory (ANL), as part of the MPICH-2 system. While the ANL MPD had certain advantages over the use of their mpirun (faster launching, better cleanup after crashes, better tolerance of node f[...]

  • Page 61

    3 – Using InfiniPath MPI InfiniPath MPI and Hybrid MPI/OpenMP Applications IB6054601-00 D 3-19 accessed via some network file system, typically NFS. Parallel programs usually need to have some data in files to be shared by all of the processes of an MPI job. Node programs may also use non-shared, node-specific files, such as for scratch stor[...]

  • Page 62

    3 – Using InfiniPath MPI Debugging MPI Programs 3-20 IB6054601-00 D may be desirable to run multiple MPI processes and multiple OpenMP threads per node. The number of OpenMP threads is typically controlled by the OMP_NUM_THREADS environment variable in the .mpirunrc file. This may be used to adjust the split between MPI processes and OpenMP[...]

  • Page 63

    3 – Using InfiniPath MPI InfiniPath MPI Limitations IB6054601-00 D 3-21 Symbolic debugging is easier than machine language debugging. To enable symbolic debugging you must have compiled with the -g option to mpicc so that the compiler will have included symbol tables in the compiled object code. To run your MPI program with a debugger use [...]

  • Page 64

    3 – Using InfiniPath MPI InfiniPath MPI Limitations 3-22 IB6054601-00 D No ports available on /dev/ipath NOTE: If port sharing is enabled, this limit is raised to 16 and 8 respectively. To enable port sharing, set PSM_SHAREDPORTS=1 in your environment. There are no C++ bindings to MPI -- use the extern C MPI function calls. In MPI-IO file [...]

  • Page 65

    IB6054601-00 D A-1 Appendix A Benchmark Programs Several MPI performance measurement programs are installed from the mpi-benchmark RPM. This Appendix describes these useful benchmarks and how to run them. These programs are based on code from the group of Dr. Dhabaleswar K. Panda at the Network-Based Computing Laboratory at the Ohio State Univ[...]

  • Page 66

    A – Benchmark Programs Benchmark 2: Measuring MPI Bandwidth Between Two Nodes A-2 IB6054601-00 D This benchmark always involves just two node programs. You can run it with the command: $ mpirun -np 2 -ppn 1 -m mpihosts osu_latency The -ppn 1 option is needed to be certain that the two communicating processes are on different nodes. Otherw[...]

  • Page 67

    A – Benchmark Programs Benchmark 3: Messaging Rate Microbenchmarks IB6054601-00 D A-3 MPI_Isend function, while the receiving node consumes them as quickly as it can using the non-blocking MPI_Irecv, and then returns a zero-length acknowledgement when all of the set has been received. You can run this program with: $ mpirun -np 2 -ppn 1 -m [...]

  • Page 68

    A – Benchmark Programs Benchmark 3: Messaging Rate Microbenchmarks A-4 IB6054601-00 D benchmark (as shown in the example above). It has been enhanced with the following additional functionality: ■ Messaging rate reported as well as bandwidth ■ N/2 dynamically calculated at end of run ■ Allows user to run multiple processes per node an [...]

  • Page 69

    A – Benchmark Programs Benchmark 4: Measuring MPI Latency in Host Rings IB6054601-00 D A-5 A.4 Benchmark 4: Measuring MPI Latency in Host Rings The program mpi_latency can be used to measure latency in a ring of hosts. Its syntax is a bit different from Benchmark 1 in that it takes command line arguments that let you specify the message size [...]

  • Page 70

    A – Benchmark Programs Benchmark 4: Measuring MPI Latency in Host Rings A-6 IB6054601-00 D Notes[...]

  • Page 71

    IB6054601-00 D B-1 Appendix B Integration with a Batch Queuing System Most cluster systems use some kind of batch queuing system as an orderly way to provide users with access to the resources they need to meet their job’s performance requirements. One of the tasks of the cluster administrator is to provide means for users to submit MPI jobs[...]

  • Page 72

    B – Integration with a Batch Queuing System A Batch Queuing Script B-2 IB6054601-00 D require that his node program be the only application running on each node CPU. In a typical batch environment, the MPI user would still specify the number of node programs, but would depend on the batch system to allocate specific nodes when the required [...]

  • Page 73

    B – Integration with a Batch Queuing System A Batch Queuing Script IB6054601-00 D B-3 by mpirun. Each line consists of a node name, a colon, and the number of processes to start on that node. NOTE: This is one of two formats that the file may use. See section 3.5.6 for more information. B.1.3 Simple Process Management At this point, your scr[...]
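The line format described in this excerpt (node name, colon, process count) is easy to generate from a batch allocation. As a sketch only: the excerpt does not show how the batch system reports its allocation, so the input here is assumed to be a list with one node name per allocated slot, and the function name mpihosts_lines and the node names are hypothetical.

```python
from collections import Counter

def mpihosts_lines(allocated_slots):
    """Build 'nodename:count' lines from a list with one entry per allocated slot."""
    counts = Counter(allocated_slots)
    # Counter is a dict subclass, so it keeps first-seen order on Python 3.7+
    return [f"{node}:{count}" for node, count in counts.items()]

slots = ["node07", "node07", "node09", "node07", "node09"]   # hypothetical allocation
print("\n".join(mpihosts_lines(slots)))
```

The printed lines could then be written to the file that the script passes to mpirun with the -m option.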

  • Page 74

    B – Integration with a Batch Queuing System Lock Enough Memory on Nodes When Using SLURM B-4 IB6054601-00 D The following command will terminate all processes using the InfiniPath interconnect: # /sbin/fuser -k /dev/ipath For more information, see the man pages for fuser(1) and lsof(8). NOTE: Run these commands as root to ensure that all proces[...]

  • Page 75

    IB6054601-00 D C-1 Appendix C Troubleshooting This Appendix describes some of the existing provisions for diagnosing and fixing problems. The sections are organized in the following order: ■ C.1 “Troubleshooting InfiniPath adapter installation” ■ C.2 “BIOS settings” ■ C.3 “Software installation issues” ■ C.4 “Kernel and[...]

  • Page 76

    C – Troubleshooting BIOS Settings C-2 IB6054601-00 D states of the LEDs. The green LED will normally illuminate first. The normal state is Green On, Amber On. If a node repeatedly and spontaneously reboots when attempting to load the InfiniPath driver, it may be a symptom that its InfiniPath interconnect board is not well seated in the HTX[...]

  • Page 77

    C – Troubleshooting BIOS Settings IB6054601-00 D C-3 C.2.1 MTRR Mapping and Write Combining MTRR (Memory Type Range Registers) is used by the InfiniPath driver to enable write combining to the InfiniPath on-chip transmit buffers. This improves write bandwidth to the InfiniPath chip by writing multiple words in a single bus transaction (typ[...]

  • Page 78

    C – Troubleshooting BIOS Settings C-4 IB6054601-00 D C.2.3 Incorrect MTRR Mapping Causes Unexpected Low Bandwidth This same MTRR Mapping setting as described in the previous section can also cause unexpected low bandwidth if it is set incorrectly. The setting should look like this: MTRR Mapping [Discrete] The MTRR Mapping needs to be set t[...]

  • Page 79

    C – Troubleshooting Software Installation Issues IB6054601-00 D C-5 C.3 Software Installation Issues This section covers issues related to software installation. C.3.1 OpenFabrics Dependencies You need to install sysfsutils for your distribution before installing the OpenFabrics RPMs, as there are dependencies. If sysfsutils has not been[...]

  • Page 80

    C – Troubleshooting Software Installation Issues C-6 IB6054601-00 D In older distributions, such as RHEL4, the 32-bit glibc will be contained in the libgcc RPM. The RPM will be named similarly to: libgcc-3.4.3-9.EL4.i386.rpm In newer distributions, glibc is an RPM name. The 32-bit glibc will be named similarly to: glibc-2.3.4-2.i686.rpm or gli[...]

  • Page 81

    C – Troubleshooting Kernel and Initialization Issues IB6054601-00 D C-7 8. Reload all modules by using this command (as root): # /etc/init.d/infinipath start An alternate mechanism can be used, if provided as part of your alternate installation. 9. Run an OpenFabrics test program, such as ibstatus, to verify that your InfiniPath card(s) wo[...]

  • Page 82

    C – Troubleshooting Kernel and Initialization Issues C-8 IB6054601-00 D C.4.1 Kernel Needs CONFIG_PCI_MSI=y If the InfiniPath driver is being compiled on a machine without CONFIG_PCI_MSI=y configured, you will get a compilation error similar to this: ib_ipath/ipath_driver.c:46:2: #error "InfiniPath driver can only be used with kernels wi[...]

  • Page 83

    C – Troubleshooting Kernel and Initialization Issues IB6054601-00 D C-9 NOTE: This problem has been fixed in the 2.6.17 kernel.org kernel. C.4.3 Driver Load Fails Due to Unsupported Kernel If you try to load the InfiniPath driver on a kernel that InfiniPath software does not support, the load fails. Error messages similar to this appear: mo[...]

  • Page 84

    C – Troubleshooting Kernel and Initialization Issues C-10 IB6054601-00 D A zero count in all CPU columns means that no interrupts have been delivered to the processor. Possible causes are: ■ Booting the Linux kernel with ACPI (Advanced Configuration and Power Interface) disabled on the boot command line, or in the BIOS configuration ■[...]
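
    The zero-count check described above can be automated. The following is a minimal sketch, assuming the counts come from a /proc/interrupts-style listing in which the driver's row contains the string ib_ipath. The sample text, the IRQ numbers, and the function name interrupt_counts are illustrative assumptions, not output taken from the manual.

```python
# Sketch: read per-CPU interrupt counts for one device from text laid out
# like /proc/interrupts. SAMPLE is made-up data for illustration only.

def interrupt_counts(proc_text, name="ib_ipath"):
    """Return the per-CPU counts for the first row mentioning `name`, or None."""
    lines = proc_text.splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        if name in line:
            fields = line.split()         # fields[0] is the IRQ label, e.g. "66:"
            return [int(value) for value in fields[1:1 + ncpu]]
    return None

SAMPLE = """\
           CPU0       CPU1
 66:          0          0   PCI-MSI  ib_ipath
 14:       1024         37   IO-APIC-edge  ide0
"""

counts = interrupt_counts(SAMPLE)
# All zeros means no interrupts have been delivered for this device.
no_interrupts = counts is not None and not any(counts)
```

    On a live node, the same function could be given the contents of /proc/interrupts instead of the sample string.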

  • Page 85

    C – Troubleshooting Kernel and Initialization Issues IB6054601-00 D C-11 C.4.6 InfiniPath ib_ipath Initialization Failure There may be cases where ib_ipath was not properly initialized. Symptoms of this may show up in error messages from an MPI job or another program. Here is a sample command and error message: $ mpirun -np 2 -m ~/tmp/mbu13 osu[...]

  • Page 86

    C – Troubleshooting System Administration Troubleshooting C-12 IB6054601-00 D C.5 OpenFabrics Issues This section covers items related to OpenFabrics, including OpenSM. C.5.1 Stop OpenSM Before Stopping/Restarting InfiniPath OpenSM must be stopped before stopping or restarting InfiniPath. If not, error messages such as the following will o[...]

  • Page 87

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-13 C.6.1 Broken Intermediate Link Sometimes message traffic passes through the fabric while other traffic appears to be blocked. In this case, MPI jobs fail to run. In large cluster configurations, switches may be attached to other switches in order to supply the necessary [...]

  • Page 88

    C – Troubleshooting InfiniPath MPI Troubleshooting C-14 IB6054601-00 D $ mpirun -v MPIRUN:Infinipath Release2.0 : Built on Wed Nov 19 17:28:58 PDT 2006 by mee The following is the error that occurs when mpirun from the 2.0 release is being used with the 1.3 libraries: $ mpirun-ipath-ssh -np 2 -ppn 1 -m ~/tmp/idev osu_latency MPIRUN: mpirun fr[...]

  • Page 89

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-15 On a SLES 10 system, you would need: ■ compat-libstdc++ (for FC3) ■ compat-libstdc++5 (for SLES 10) Depending upon the application, you may need to use the -Wl,-Bstatic option to use the static versions of some libraries. C.8.3 Compiler/Linker Mismatch This is a t[...]

  • Page 90

    C – Troubleshooting InfiniPath MPI Troubleshooting C-16 IB6054601-00 D For these examples in Section C.8.5 below, we assume that these new locations are: /path/to/devel (for mpi-devel-*) /path/to/libs (for mpi-libs-*) C.8.5 Compiling on Development Nodes If the mpi-devel-* rpm is installed with the --prefix /path/to/devel option then mpicc,[...]

  • Page 91

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-17 The above compiler command ensures that the program will run using this path on any machine. For the second option, we change the file /etc/ld.so.conf on the compute nodes rather than using the -Wl,-rpath, option when compiling on the development node. We assume that the m[...]

  • Page 92

    C – Troubleshooting InfiniPath MPI Troubleshooting C-18 IB6054601-00 D Examples are given below. In the following command, the HP-MPI version of mpirun is invoked by the full pathname. However, the program mpi_nxnlatbw was compiled with the QLogic version of mpicc. The mismatch will produce errors similar to this: $ /opt/hpmpi/bin/mpirun -ho[...]

  • Page 93

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-19 The following two commands will both work properly: QLogic mpirun and executable used together: $ mpirun -m ~/host-bbb -np 4 /usr/bin/mpi_nxnlatbw HP-MPI mpirun and executable used together: $ /opt/hpmpi/bin/mpirun -hostlist "bbb-01,bbb-02,bbb-03,bbb-04" -np 4[...]

  • Page 94

    C – Troubleshooting InfiniPath MPI Troubleshooting C-20 IB6054601-00 D ^ pathf95-389 pathf90: ERROR BORDERS, File = communicate.F, Line = 407, Column = 18 No specific match can be found for the generic subprogram call "MPI_RECV". If it is necessary to use a non-standard argument list, it is advisable to create your own MPI module fi[...]

  • Page 95

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-21 integer count, datatype, root, comm, ierror ! Call the Fortran 77 style implicit interface to "mpi_bcast" external mpi_bcast call mpi_bcast(buffer, count, datatype, root, comm, ierror) end subroutine additional_mpi_bcast_for_character end module additional_bcas[...]

  • Page 96

    C – Troubleshooting InfiniPath MPI Troubleshooting C-22 IB6054601-00 D If this file is not present or the node has not been rebooted after the infinipath RPM has been installed, a failure message similar to this will be generated: $ mpirun -m ~/tmp/sm -np 2 -mpi_latency 1000 1000000 node-00:1.ipath_update_tid_err: failed: Cannot allocate mem[...]

  • Page 97

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-23 Found unknown timer type type unknown frame type type recv done: available_tids now n, but max is m (freed p) cancel recv available_tids now n, but max is m (freed %p) [n] Src lid error: sender: x, exp send: y Frame receive from unknown sender. exp. sender = x, came from y F[...]

  • Page 98

    C – Troubleshooting InfiniPath MPI Troubleshooting C-24 IB6054601-00 D The following message indicates that a node program may not be processing incoming packets, perhaps due to a very high system load: eager array full after overflow, flushing (head h, tail t) The following indicates an invalid InfiniPath link protocol version: InfiniPath [...]

  • Page 99

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-25 These messages appear in the mpirun output. Most are followed by an abort, and possibly a backtrace. Each is preceded by the name of the function in which the exception occurred. Error sending packet: description Error receiving packet: description A fatal protocol error o[...]

  • Page 100

    C – Troubleshooting InfiniPath MPI Troubleshooting C-26 IB6054601-00 D There is no route to any host: $ mpirun -np 2 -m ~/tmp/q mpi_latency 100 100 ssh: connect to host <nodename> port 22: No route to host ssh: connect to host <nodename> port 22: No route to host MPIRUN: All node programs ended prematurely without connecting to mpi[...]

  • Page 101

    C – Troubleshooting InfiniPath MPI Troubleshooting IB6054601-00 D C-27 $ mpirun -np 2 -m ~/tmp/q -q 60 mpi_latency 1000000 1000000 MPIRUN: MPI progress Quiescence Detected after 9000 seconds. MPIRUN: 2 out of 2 ranks showed no MPI send or receive progress. MPIRUN: Per-rank details are the following: MPIRUN: Rank 0 (<nodename>) caused MPI [...]

  • Page 102

    C – Troubleshooting InfiniPath MPI Troubleshooting C-28 IB6054601-00 D C.8.13 MPI Stats Using the -print-stats option to mpirun will result in a listing to stderr of various MPI statistics. Here is example output for the -print-stats option when used with an 8-rank run of the HPCC benchmark. Message statistics are available for transmitted [...]

  • Page 103

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-29 C.9 Useful Programs and Files for Debugging The most useful programs and files for debugging are listed in the sections below. Many of these programs and files have been discussed elsewhere in the documentation: this information is summarized and repeated here f[...]

  • Page 104

    C – Troubleshooting Useful Programs and Files for Debugging C-30 IB6054601-00 D C.9.3 Summary of Useful Programs and Files Useful programs and files are summarized in the table below. Descriptions for some of the programs and files follow. Check man pages for more information on the programs. Table C-2. Useful Programs and Files Program or[...]

  • Page 105

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-31 C.9.4 boardversion It may be useful to keep track of the current version of the installed software. You can check the version of the installed InfiniPath software by looking in: /sys/bus/pci/drivers/ib_ipath/00/boardversion Example contents are: Driver 2.0,Infi[...]

  • Page 106

    C – Troubleshooting Useful Programs and Files for Debugging C-32 IB6054601-00 D C.9.5 ibstatus This program displays basic information on the status of InfiniBand devices that are currently in use when the OpenFabrics modules are loaded. C.9.6 ibv_devinfo This program displays information about InfiniBand devices, including various kinds o[...]

  • Page 107

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-33 C.9.8 ipath_checkout ipath_checkout is a bash script used to verify that the installation is correct and that all the nodes of the network are functioning and mutually connected by the InfiniPath fabric. It is to be run on a front end node, and requires specifica[...]

  • Page 108

    C – Troubleshooting Useful Programs and Files for Debugging C-34 IB6054601-00 D --workdir=DIR Use DIR to hold intermediate files created while running tests. DIR must not already exist. -k, --keep Keep intermediate files that were created while performing tests and compiling reports. Results will be saved in a directory created by mktemp and[...]

  • Page 109

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-35 00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407 C.9.10 ipathbug-helper The tool ipathbug-helper is useful for verifying homogeneity. Prior to seeking assistance from QLogic technical support, you should run this script on the head node of y[...]
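
    The status line at the top of this page (unit number, LID, MLID, GUID, Serial) has a regular layout, so it can be split into fields when collecting information from many nodes. The following is a minimal sketch that assumes every such line follows exactly the layout of the one example shown; the name parse_status_line is hypothetical.

```python
import re

# Sketch: parse one status line of the form shown in the text.
# The pattern is inferred from the single example line and is an assumption.
STATUS_PATTERN = re.compile(
    r"^(?P<unit>\d+): "
    r"LID=(?P<lid>0x[0-9a-fA-F]+) "
    r"MLID=(?P<mlid>0x[0-9a-fA-F]+) "
    r"GUID=(?P<guid>[0-9a-fA-F:]+) "
    r"Serial: (?P<serial>\S+)$"
)

def parse_status_line(line):
    """Return the fields as a dict, with LID and MLID as integers, or None."""
    match = STATUS_PATTERN.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["lid"] = int(info["lid"], 16)    # hexadecimal in the output
    info["mlid"] = int(info["mlid"], 16)
    return info

EXAMPLE = "00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407"
fields = parse_status_line(EXAMPLE)
```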

  • Page 110

    C – Troubleshooting Useful Programs and Files for Debugging C-36 IB6054601-00 D C.9.13 lsmod If you need to find which InfiniPath and OpenFabrics modules are running, try the following command: # lsmod | egrep 'ipath_|ib_|rdma_|findex' C.9.14 mpirun mpirun can give information on whether the program is being run against a QLogic or non-QLog[...]
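
    The lsmod filter above can be reproduced in a script, which is convenient when comparing the loaded modules across nodes. The following is a minimal sketch that applies the same regular expression to captured lsmod output. The sample listing and its sizes are made up for illustration, and the name filter_modules is hypothetical.

```python
import re

# Same alternation as the egrep command in the text.
MODULE_PATTERN = re.compile(r"ipath_|ib_|rdma_|findex")

def filter_modules(lsmod_text):
    """Return the module name of every lsmod line that matches the pattern."""
    return [
        line.split()[0]
        for line in lsmod_text.splitlines()
        if MODULE_PATTERN.search(line)   # like egrep, match anywhere in the line
    ]

# Illustrative lsmod-style output, not taken from a real node.
SAMPLE = """\
Module                  Size  Used by
ib_ipath               70000  0
ib_core                45000  1 ib_ipath
ext3                  120000  2
"""

matching = filter_modules(SAMPLE)
```

    As with egrep, a line is kept when the pattern appears anywhere in it, including the "Used by" column.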

  • Page 111

    C – Troubleshooting Useful Programs and Files for Debugging IB6054601-00 D C-37 The following table shows the possible contents of the file, with brief explanations of the entries. In this same directory are other files containing information related to status. They are summarized in Table C-4. Table C-3. status_str File File contents D[...]

  • Page 112

    C – Troubleshooting Useful Programs and Files for Debugging C-38 IB6054601-00 D C.9.17 strings The command strings can also be used. Its format is as follows: $ strings /usr/lib/libinfinipath.so.4.0 | grep Date: will produce output like this: $Date: 2006-09-15 04:07 Release2.0 InfiniPath $ NOTE: strings is part of binutils (a development RPM),[...]
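
    The $Date: ... $ line produced by the strings command can be parsed to compare library builds across nodes. The following is a minimal sketch that assumes the line always has the four fields seen in the example (date, time, release tag, product name); the name parse_date_tag is hypothetical.

```python
import re
from datetime import datetime

# Sketch: parse the "$Date: ... $" tag shown in the text.
# The field layout is inferred from the single example and is an assumption.
DATE_TAG = re.compile(
    r"\$Date: (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}) "
    r"(?P<release>\S+) (?P<product>\S+) \$"
)

def parse_date_tag(text):
    """Return (build time, release tag, product name), or None if absent."""
    match = DATE_TAG.search(text)
    if match is None:
        return None
    built = datetime.strptime(
        match.group("date") + " " + match.group("time"), "%Y-%m-%d %H:%M"
    )
    return built, match.group("release"), match.group("product")

EXAMPLE = "$Date: 2006-09-15 04:07 Release2.0 InfiniPath $"
built, release, product = parse_date_tag(EXAMPLE)
```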

  • Page 113

    IB6054601-00 D D-1 Appendix D Recommended Reading Reference material for further reading is provided here. D.1 References for MPI The MPI Standard specification documents. http://www.mpi-forum.org/docs The MPICH implementation of MPI and its documentation. http://www-unix.mcs.anl.gov/mpi/mpich/ The ROMIO distribution and its documentation. ht[...]

  • Page 114

    D – Recommended Reading Rocks D-2 IB6054601-00 D D.6 Clusters Gropp, William, Ewing Lusk, and Thomas Sterling, Beowulf Cluster Computing with Linux, Second Edition, 2003, MIT Press, ISBN 0-262-69292-9. D.7 Rocks Extensive documentation on installing Rocks and custom Rolls. http://www.rocksclusters.org/[...]

  • Page 115

    IB6054601-00 D E-1 Appendix E Glossary A glossary is provided below for technical terms used in the documentation. bandwidth The rate at which data can be transmitted. This represents the capacity of the network connection. Theoretical peak bandwidth is fixed, but the effective bandwidth, the ideal rate is modified by overhead in hardware an[...]

  • Page 116

    E – Glossary E-2 IB6054601-00 D GID For Global Identifier. Used for routing between different InfiniBand subnets. GUID For Globally Unique Identifier for the InfiniPath chip. Equivalent to Ethernet MAC address. head node Same as front end node. HCA For Host Channel Adapter. HCAs are I/O engines located within processing nodes, connecting [...]

  • Page 117

    E – Glossary IB6054601-00 D E-3 LID For Local Identifier. Assigned by the Subnet Manager (SM) to each visible node within a single InfiniBand fabric. It is similar conceptually to an IP address for TCP/IP. Lustre Open source project to develop scalable cluster file systems. MAC Address For Media Access Control Address. It is a unique iden[...]

  • Page 118

    E – Glossary E-4 IB6054601-00 D MTRR For Memory Type Range Registers. Used by the InfiniPath driver to enable write combining to the InfiniPath on-chip transmit buffers. This improves write bandwidth to the InfiniPath chip, by writing multiple words in a single bus transaction (typicall[...]

  • Page 119

    E – Glossary IB6054601-00 D E-5 SDP For Sockets Direct Protocol. An InfiniBand-specific upper layer protocol. It defines a standard wire protocol to support stream sockets networking over InfiniBand. SRP For SCSI RDMA Protocol. The implementation of this protocol is under development for utilizing block storage devices over an InfiniBan [...]

  • Page 120

    E – Glossary E-6 IB6054601-00 D Notes[...]

  • Page 121

    IB6054601-00 D Index-1 Index A ACPI, enabling C-9 B Batch queuing for MPI jobs B-1 – B-4 Benchmarking MPI bandwidth A-2 – A-3 MPI latency measurement A-1 – A-2 MPI latency measurement in host rings A-5 C Compiling MPI programs compiler and linker variables 3-9 scripts for invoking compiler and linker 3-7 specifying compilers and linkers 3-4[...]

  • Page 122

    InfiniPath User Guide Version 2.0 Beta2 Index-2 IB6054601-00 D configuration of on SUSE and SLES 10 2-8 – 2-11 layered Ethernet driver 2-6 ipathbug_helper C-30, C-35 L LEDs, showing state of system with C-1 Limitations of PathScale MPI 3-21 M Management tips maintaining homogeneous nodes 2-20 useful tools for verifying homogeneity 2-20 MPD, a[...]