Sun Microsystems X4440 manual

A good instruction manual

The law requires the seller to give the buyer the Sun Microsystems X4440 instruction manual together with the product. A missing manual, or incorrect information given to the consumer, is grounds for a complaint that the product does not conform to the contract. The law allows the manual to be supplied in a form other than paper, which has recently become quite common: manufacturers provide a graphical manual, an electronic version of the Sun Microsystems X4440 manual, or instructional videos for users. The condition is that it be legible and understandable.

What is an instruction manual?

The name comes from the Latin word “instructio”, meaning to arrange. A Sun Microsystems X4440 manual therefore describes the steps of a procedure. The purpose of a manual is to teach, and to make it easier to start up or use a device or to carry out specific actions. An instruction manual is also a source of information about an object or a service; it is a guide.

Unfortunately, few users take the time to read Sun Microsystems X4440 manuals. A good manual, however, lets us not only learn about a number of additional features of the purchased device but also avoid most faults.

So, what should the perfect instruction manual contain?

Above all, a Sun Microsystems X4440 instruction manual should contain:
- information about the technical specifications of the Sun Microsystems X4440 device
- the manufacturer's name and the year of manufacture of the Sun Microsystems X4440 device
- conditions of use, configuration, and maintenance of the Sun Microsystems X4440 device
- safety marks and certificates confirming compliance with the relevant standards

Why don't we read instruction manuals?

Usually it is because of a lack of time and a sense of certainty about the features of the purchased devices. Unfortunately, connecting and switching on the Sun Microsystems X4440 is not enough. An instruction manual always contains a number of pointers about specific features, safety rules, maintenance advice (including which products to use), possible faults of the Sun Microsystems X4440, and ways to solve problems that may occur during use. Finally, a manual gives the details of Sun Microsystems technical support in case the proposed solutions have not worked. Instruction manuals in the form of engaging animations or video manuals are currently popular and reach users far better than a booklet. This type of manual helps the user watch the whole video without skipping the specifications and complicated technical descriptions of the Sun Microsystems X4440, as people tend to do with a paper version.

Why is it worth reading instruction manuals?

Above all, they are where we find answers about the construction and capabilities of the Sun Microsystems X4440 device, the use of particular accessories, and a range of information that lets us take full advantage of its functions and conveniences.

After successfully purchasing a piece of equipment or a device, it is worth taking a moment to become familiar with each part of the Sun Microsystems X4440 manual. Manuals are now prepared and translated with care, so that they are not only understandable to users but also fulfil their basic function of informing and helping.

Index of instruction manuals

  • Page 1

    Sun Microsystems, Inc. www.sun.com Submit comments about this document at: http://www.sun.com/hwdocs/feedback Sun Fire™ X4140, X4240, and X4440 Servers Diagnostics Guide Part No. 820-3067-11 August 2008, Revision A[...]

  • Page 2

    Please Recycle Copyright © 2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Unpublished - rights reserved under the Copyright Laws of the United States. THIS PRODUCT CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF SUN MICROSYSTEMS, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBI[...]

  • Page 3

    iii Contents Preface vii 1. Initial Inspection of the Server 1 Service Troubleshooting Flowchart 1 Gathering Service Information 2 System Inspection 3 Troubleshooting Power Problems 3 Externally Inspecting the Server 3 Internally Inspecting the Server 4 2. Using SunVTS Diagnostic Software 7 Running SunVTS Diagnostic Tests 7 SunVTS Documentation 8[...]

  • Page 4

    iv Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Uncorrectable DIMM Errors 12 Correctable DIMM Errors 14 BIOS DIMM Error Messages 15 DIMM Fault LEDs 15 Isolating and Correcting DIMM ECC Errors 18 A. Event Logs and POST Codes 21 Viewing Event Logs 21 Power-On Self-Test (POST) 25 How BIOS POST Memory Testing Works 2[...]

  • Page 5

    Contents v Handling of Uncorrectable Errors 53 Handling of Correctable Errors 56 Handling of Parity Errors (PERR) 59 Handling of System Errors (SERR) 61 Handling Mismatching Processors 63 Hardware Error Handling Summary 64 Index 69[...]

  • Page 6

    vi Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008[...]

  • Page 7

    vii Preface The Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide contains information and procedures for using available tools to diagnose problems with the servers. Before You Read This Document It is important that you review the safety guidelines in the Sun Fire X4140, X4240, and X4440 Safety and Compliance Guide.[...]

  • Page 8

    viii Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Related Documentation The document set for the Sun Fire X4140, X4240, and X4440 Servers is described in the Where To Find Sun Fire X4140, X4240, and X4440 Servers Documentation sheet that is packed with your system. You can also find the documentation at http://docs.[...]

  • Page 9

    Preface ix Typographic Conventions Third-Party Web Sites Sun™ is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be r[...]

  • Page 10

    x Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Sun Welcomes Your Comments Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to: http://www.sun.com/hwdocs/feedback Please include the title and part number of your document with your feed[...]

  • Page 11

    1 CHAPTER 1 Initial Inspection of the Server This chapter includes the following topics: ■ “Service Troubleshooting Flowchart” on page 1 ■ “Gathering Service Information” on page 2 ■ “System Inspection” on page 3 Service Troubleshooting Flowchart Use the following flowchart as a guideline for using the subjects in this book to t[...]

  • Page 12

    2 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Gathering Service Information The first step in determining the cause of a problem with the server is to gather information from the service-call paperwork or the onsite personnel. Use the following general guideline steps when you begin troubleshooting. To gather servic[...]

  • Page 13

    Chapter 1 Initial Inspection of the Server 3 System Inspection Controls that have been improperly set and cables that are loose or improperly connected are common causes of problems with hardware components. Troubleshooting Power Problems ■ If the server will power on, skip this section and go to “Externally Inspecting the Server” on page[...]

  • Page 14

    4 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Internally Inspecting the Server To perform a visual inspection of the internal system: 1. Choose a method for shutting down the server from main power mode to standby power mode. See FIGURE 1-1 and FIGURE 1-2. ■ Graceful shutdown – Use a ballpoint pen or other styl[...]

  • Page 15

    Chapter 1 Initial Inspection of the Server 5 FIGURE 1-2 X4440 Server Front Panel 2. Remove the server cover. For instructions on removing the server cover, refer to your server’s service manual. 3. Inspect the internal status indicator LEDs. These can indicate component malfunction. For the LED locations and descriptions of their behavior, se[...]

  • Page 16

    6 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 10. If the problem with the server is not evident, you can obtain additional information by viewing the power-on self test (POST) messages and BIOS event logs during system startup. Continue with “Viewing Event Logs” on page 21.[...]

  • Page 17

    7 CHAPTER 2 Using SunVTS Diagnostic Software This chapter contains information about the SunVTS™ diagnostic software tool. Running SunVTS Diagnostic Tests The servers are shipped with a Bootable Diagnostics CD that contains the Sun Validation Test Suite (SunVTS) software. SunVTS provides a comprehensive diagnostic tool that tests and validat[...]

  • Page 18

    8 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 ■ QLogic Host Bus Adapter Test (qlctest) ■ RAM Test (ramtest) ■ Serial Port Test (serialtest) ■ System Test (systest) ■ Tape Drive Test (tapetest) ■ Universal Serial Board Test (usbtest) ■ Virtual Memory Test (vmemtest) SunVTS software has a sophisti[...]

  • Page 19

    Chapter 2 Using SunVTS Diagnostic Software 9 Using the Bootable Diagnostics CD To use the diagnostics CD to perform diagnostics: 1. With the server powered on, insert the CD into the DVD-ROM drive. 2. Reboot the server, and press F2 during the start of the reboot so that you can change the BIOS setting for boot-device priority. 3. When the BIOS[...]

  • Page 20

    10 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 ■ Solaris system message log is a log of all the general Solaris events logged by syslogd. The path name of this log file is /var/adm/messages. a. Click the Log button. The Log file window is displayed. b. Specify the log file that you want to view by selecting it fr[...]

  • Page 21

    11 CHAPTER 3 Troubleshooting DIMM Problems This chapter describes how to detect and correct problems with the server’s Dual Inline Memory Modules (DIMMs). It includes the following sections: ■ “DIMM Population Rules” on page 11 ■ “DIMM Replacement Policy” on page 12 ■ “How DIMM Errors Are Handled by the System” on page 12 [...]

  • Page 22

    12 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 DIMM Replacement Policy Replace a DIMM when one of the following events takes place: ■ The DIMM fails memory testing under BIOS due to Uncorrectable Memory Errors (UCEs). ■ UCEs occur and investigation shows that the errors originated from memory. In addition, a DIM[...]

  • Page 23

    Chapter 3 Troubleshooting DIMM Problems 13 3. BIOS reports this event in the service processor’s system event log (SEL) as shown in the sample IPMItool output below: # ipmitool -H 10.6.77.249 -U root -P changeme -I lanplus sel list 8 | 09/25/2007 | 03:22:03 | System Boot Initiated #0x02 | Initiated by warm reset | Asserted 9 | 09/25/2007 | 03:[...]
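
The sample SEL output above is pipe-delimited, so each line can be split into fields mechanically. The following Python sketch is illustrative only and is not part of the Sun guide. The names `SelEntry` and `parse_sel_line` are hypothetical, and the field layout (hex event number, date, time, sensor, description, optional state) is an assumption based on the one complete sample line in the preview.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SelEntry:
    """One parsed line of `ipmitool sel list` output (hypothetical container)."""
    event_id: int
    timestamp: datetime
    sensor: str
    description: str
    state: str


def parse_sel_line(line: str) -> SelEntry:
    """Split one pipe-delimited SEL line into typed fields.

    Assumes the layout of the sample line: id | date | time | sensor |
    description | state, with the event id written in hexadecimal.
    """
    parts = [p.strip() for p in line.split("|")]
    if len(parts) < 5:
        raise ValueError(f"unexpected SEL line: {line!r}")
    timestamp = datetime.strptime(f"{parts[1]} {parts[2]}", "%m/%d/%Y %H:%M:%S")
    # The trailing state column (for example "Asserted") is treated as optional.
    state = parts[5] if len(parts) > 5 else ""
    return SelEntry(int(parts[0], 16), timestamp, parts[3], parts[4], state)


sample = ("8 | 09/25/2007 | 03:22:03 | System Boot Initiated #0x02 "
          "| Initiated by warm reset | Asserted")
entry = parse_sel_line(sample)
print(entry.event_id, entry.timestamp.isoformat(), entry.description)
```

Parsing the timestamp into a `datetime` rather than keeping it as text makes it possible to sort or filter events by time later on.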

  • Page 24

    14 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 The lines in the display start with event numbers (in hex), followed by a description of the event. TABLE 3-1 describes the contents of the display: Correctable DIMM Errors If a DIMM has 24 or more correctable errors in 24 hours, it is considered defective and should [...]
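
The "24 or more correctable errors in 24 hours" rule can be checked against a list of error timestamps with a sliding window. This is a minimal Python sketch and is not taken from the guide. The function name `exceeds_ce_threshold` is hypothetical, and treating the window as half-open (two errors exactly 24 hours apart do not share a window) is an assumption, since the preview does not say how the boundary is counted.

```python
from collections import deque
from datetime import datetime, timedelta


def exceeds_ce_threshold(timestamps, threshold=24, window=timedelta(hours=24)):
    """Return True if any span shorter than `window` holds `threshold` errors.

    `timestamps` is an iterable of datetime objects, one per correctable
    error reported for a single DIMM.
    """
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop errors that have fallen out of the window ending at `ts`.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


start = datetime(2008, 8, 1)
burst = [start + timedelta(minutes=30 * i) for i in range(24)]
print(exceeds_ce_threshold(burst))
```

A deque keeps the check linear in the number of errors after sorting, because each timestamp is appended and removed at most once.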

  • Page 25

    Chapter 3 Troubleshooting DIMM Problems 15 to view ECC errors ■ Linux: The HERD utility can be used to manage DIMM errors in Linux. See the x64 Servers Utilities Reference Manual for details. ■ If HERD is installed, it copies messages from /dev/mcelog to /var/log/messages. ■ If HERD is not installed, a program called mcelog copies messages [...]

  • Page 26

    16 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Note – The DIMM Fault and Motherboard Fault LEDs operate on stored power for up to a minute when the system is powered down, even after the AC power is disconnected, and the motherboard (or mezzanine board) is out of the system. The stored power lasts for about half an [...]

  • Page 27

    Chapter 3 Troubleshooting DIMM Problems 17 FIGURE 3-1 DIMMs and LEDs on Motherboard[...]

  • Page 28

    18 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 FIGURE 3-2 DIMMs and LEDs on Mezzanine Board Isolating and Correcting DIMM ECC Errors If your log files report an ECC error or a problem with a DIMM, complete the steps below until you can isolate the fault. In this example, the log file reports an error with the DIMM[...]

  • Page 29

    Chapter 3 Troubleshooting DIMM Problems 19 3. Press the PRESS TO SEE FAULT button, and inspect the DIMM fault LEDs. See FIGURE 3-1 and FIGURE 3-2. A flashing LED identifies a component with a fault. ■ For CEs, the LEDs correctly identify the DIMM where the errors were detected. ■ For UCEs, both LEDs in the pair flash if there is a problem[...]

  • Page 30

    20 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 11. Power on the server and run the diagnostics test again. 12. Review the log file. If the tests identify the same error, the problem is in the CPU, not the DIMMs.[...]

  • Page 31

    21 APPENDIX A Event Logs and POST Codes This appendix contains information about the BIOS event log, the BMC system event log, the power-on self-test (POST), and console redirection. It contains the following sections: ■ “Viewing Event Logs” on page 21 ■ “Power-On Self-Test (POST)” on page 25 Viewing Event Logs Use this procedure t[...]

  • Page 32

    22 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Main Advanced PCIPnP Boot Security Chipset Exit ****************************************************************************** * Advanced Settings * Configure CPU. * * *************************************************** * * * WARNING: Setting wrong values in below section[...]

  • Page 33

    Appendix A Event Logs and POST Codes 23 b. From the Advanced Settings screen, select Event Log Configuration. The Advanced Menu Event Logging Details screen is displayed. c. From the Event Logging Details screen, select View Event Log. All unread events are displayed. 4. View the BMC system event log: a. From the BIOS Main Menu screen, select A[...]

  • Page 34

    24 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 c. From the IPMI 2.0 Configuration screen, select View BMC System Event Log. The log takes about 60 seconds to generate, then it is displayed on the screen. 5. If the problem with the server is not evident, continue with “Using the ILOM Service Processor GUI to View [...]

  • Page 35

    Appendix A Event Logs and POST Codes 25 Power-On Self-Test (POST) The system BIOS provides a rudimentary power-on self-test. The basic devices required for the server to operate are checked, memory is tested, the LSI 1064 disk controller and attached disks are probed and enumerated, and the two Intel dual Gigabit Ethernet controllers are init[...]

  • Page 36

    26 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Redirecting Console Output Use the following instructions to access the service processor and redirect the console output so that the BIOS POST codes can be read. 1. Initialize the BIOS Setup utility by pressing the F2 key while the system is performing the power-on se[...]

  • Page 37

    Appendix A Event Logs and POST Codes 27 10. Set the color depth for the redirection console at either 6 or 8 bits. 11. Click the Start Redirection button. 12. When you are prompted for a user name and password, type the following: ■ User Name: root ■ Password: changeme The current POST screen is displayed.[...]

  • Page 38

    28 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Changing POST Options These instructions are optional, but you can use them to change the operations that the server performs during POST testing. To change POST options: 1. Initialize the BIOS Setup utility by pressing the F2 key while the system is performing the power[...]

  • Page 39

    Appendix A Event Logs and POST Codes 29 3. Select Boot Settings Configuration. The Boot Settings Configuration screen is displayed. 4. On the Boot Settings Configuration screen, there are several options that you can enable or disable: ■ Quick Boot – This option is disabled by default. If you enable this, the BIOS skips certain tests while b[...]

  • Page 40

    30 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 ■ Boot Num-Lock – This option is On by default (keyboard Num-Lock is turned on during boot). If you set this to off, the keyboard Num-Lock is not turned on during boot. ■ Wait for F1 if Error – This option is disabled by default. If you enable this, the system w[...]

  • Page 41

    Appendix A Event Logs and POST Codes 31 POST Codes TABLE A-1 contains descriptions of each of the POST codes, listed in the same order in which they are generated. These POST codes appear as a four-digit string that is a combination of two-digit output from primary I/O port 80 and two-digit output from secondary I/O port 81. In the POST codes lis[...]
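
A four-digit POST code can be split back into its two port values. The Python sketch below is illustrative and not from the guide. The names `PostCode` and `split_post_code` are hypothetical. The digit order is an assumption: the preview is truncated before it states the order, but the codes listed on the following pages (de00 and 8613) share their descriptions with port 80 checkpoints 00 and 13, which suggests that the last two digits come from port 80 and the first two from port 81.

```python
import string
from typing import NamedTuple


class PostCode(NamedTuple):
    """Port values recovered from a four-digit POST code (hypothetical type)."""
    port80: int
    port81: int


def split_post_code(code: str) -> PostCode:
    """Split a four-hex-digit POST code into port 80 and port 81 bytes.

    Assumes the first two digits are the port 81 output and the last two
    digits are the port 80 output.
    """
    if len(code) != 4 or any(c not in string.hexdigits for c in code):
        raise ValueError(f"expected four hex digits, got {code!r}")
    value = int(code, 16)
    return PostCode(port80=value & 0xFF, port81=value >> 8)


for code in ("de00", "8613"):
    parts = split_post_code(code)
    print(f"{code}: port 80 = {parts.port80:02X}, port 81 = {parts.port81:02X}")
```

The port 80 byte can then be looked up in the two-digit checkpoint table.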

  • Page 42

    32 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 de00 Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state. 8613 Initialize PM regs and PM PCI regs at Early-POST. Initialize multi-host bridge, if system supports it. Set[...]

  • Page 43

    Appendix A Event Logs and POST Codes 33 POST Code Checkpoints The POST code checkpoints are the largest set of checkpoints during the BIOS pre-boot process. TABLE A-2 describes the type of checkpoints that might occur during the POST portion of the BIOS. These two-digit checkpoints are the output from primary I/O port 80. TABLE A-2 POST Code C[...]

  • Page 44

    34 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 0E Testing and initialization of different Input Devices. Also, update the Kernel Variables. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1. Uncompress all available language, BIOS logo, and Silent logo modules. 13 Initialize PM regs an[...]

  • Page 45

    Appendix A Event Logs and POST Codes 35 60 Initializes NUM-LOCK status and programs the KBD typematic rate. 75 Initialize Int-13 and prepare for IPL detection. 78 Initializes IPL devices controlled by BIOS and option ROMs. 7A Initializes remaining option ROMs. 7C Generate and write contents of ESCD in NVRam. 84 Log errors encountered during POST [...]

  • Page 46

    36 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 B1 Save system context for ACPI. 00 Prepares CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state. 61-70 OEM POST Error. This range is reserved for chipset vendors and system manu[...]

  • Page 47

    37 APPENDIX B Status Indicator LEDs This appendix contains information about the locations and behavior of the LEDs on the server. It describes the external LEDs that can be viewed on the outside of the server and the internal LEDs that can be viewed only with the main cover removed. External Status Indicator LEDs See the following figures and ta[...]

  • Page 48

    38 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Front Panel LEDs FIGURE B-1 Front Panel LEDs (X4140 shown) Back Panel LEDs FIGURE B-2 Back Panel LEDs (X4140 shown) Figure Legend 1 Locator LED/Locator button: White 4 Rear PS LED: (Amber) Power supply fault 2 Service Required LED: Amber 5 System Over Temperature LED:[...]

  • Page 49

    Appendix B Status Indicator LEDs 39 Hard Drive LEDs FIGURE B-3 Hard Drive LEDs Internal Status Indicator LEDs The server has internal status indicators on the motherboard, and on the mezzanine board. For motherboard locations, see FIGURE B-4. For mezzanine board locations, see FIGURE B-5. ■ The DIMM Fault LEDs indicate a problem with the corr[...]

  • Page 50

    40 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Note – The mezzanine board, when present, obscures part of the motherboard, including the LEDs. The Motherboard Fault LED indicates that one or more of the LEDs on the motherboard is active. FIGURE B-4 DIMMs and LEDs on Motherboard[...]

  • Page 51

    Appendix B Status Indicator LEDs 41 FIGURE B-5 DIMMs and LEDs on Mezzanine Board[...]

  • Page 52

    42 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008[...]

  • Page 53

    43 APPENDIX C Using the ILOM Service Processor GUI to View System Information This appendix contains information about using the Integrated Lights Out Manager (ILOM) Service processor (SP) GUI to view monitoring and maintenance information for your server. ■ “Making a Serial Connection to the SP” on page 44 ■ “Viewing ILOM SP Event Lo[...]

  • Page 54

    44 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Making a Serial Connection to the SP To make a serial connection to the SP: 1. Connect a serial cable from the RJ-45 Serial Management port on the server to a terminal device. 2. Press ENTER on the terminal device to establish a connection between that terminal device and th[...]

  • Page 55

    Appendix C Using the ILOM Service Processor GUI to View System Information 45 Viewing ILOM SP Event Logs Events are notifications that occur in response to some actions. The IPMI system event log (SEL) provides status information about the server’s hardware and software to the ILOM software, which displays the events in the ILOM web GUI. To[...]

  • Page 56

    46 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 FIGURE C-1 System Event Logs Page 3. Select the category of event that you want to view in the log from the drop-down list box. You can select from the following types of events: ■ Sensor-specific events. These events relate to a specific sensor for a component, for [...]

  • Page 57

    Appendix C Using the ILOM Service Processor GUI to View System Information 47 After you have selected a category of event, the Event Log table is updated with the specified events. The fields in the Event Log are described in TABLE C-1. 4. To clear the event log, click the Clear Event Log button. A confirmation dialog box is displayed. 5. Click[...]

  • Page 58

    48 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 ■ ILOM web GUI operation; for example, from the Maintenance tab, selecting Reset SP ■ An SP firmware upgrade After an SP reboot, the SP clock is changed by the following events: ■ When the host is booted. The host’s BIOS unconditionally sets the SP time to that i[...]

  • Page 59

    Appendix C Using the ILOM Service Processor GUI to View System Information 49 2. From the System Information tab, select Components. The Replaceable Component Information page is displayed. See FIGURE C-2. FIGURE C-2 Replaceable Component Information Page 3. Select a component from the drop-down list. Information about the selected component is di[...]

  • Page 60

    50 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Viewing Sensors This section describes how to view the server temperature, voltage, and fan sensor readings. For a complete list of sensors, see Appendix D. To view sensor readings: 1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI: a. Type th[...]

  • Page 61

    Appendix C Using the ILOM Service Processor GUI to View System Information 51 FIGURE C-3 Sensor Readings Page 3. Click the Refresh button to update the sensor readings to their current status. 4. Click a sensor to display its thresholds. A display of properties and values appears. See the example in FIGURE C-4.[...]

  • Page 62

    52 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 FIGURE C-4 Sensor Details Page 5. If the problem with the server is not evident after viewing sensor readings information, continue with “Running SunVTS Diagnostic Tests” on page 7.[...]

  • Page 63

    53 APPENDIX D Error Handling This appendix contains information about how the servers process and log errors. See the following sections: ■ “Handling of Uncorrectable Errors” on page 53 ■ “Handling of Correctable Errors” on page 56 ■ “Handling of Parity Errors (PERR)” on page 59 ■ “Handling of System Errors (SERR)” on page [...]

  • Page 64

    54 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Note – If the error is on low 1MB, the BIOS freezes after rebooting. Therefore, no DMI log is recorded. ■ An example of the error reported by the SEL through IPMI 2.0 is as follows: ■ When low memory is erroneous, the BIOS is frozen on pre-boot low memory test [...]

  • Page 65

    Appendix D Error Handling 55 FIGURE D-1 DMI Log Screen, Uncorrectable Error[...]

  • Page 66

    56 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Handling of Correctable Errors This section lists facts and considerations about how the server handles correctable errors. ■ During BIOS POST: ■ The BIOS polls the MCK registers. ■ The BIOS logs to DMI. ■ The BIOS logs to the SP SEL through the BMC. ■ The fea[...]

  • Page 67

    Appendix D Error Handling 57 FIGURE D-2 DMI Log Screen, Correctable Error ■ If during any stage of memory testing the BIOS finds itself incapable of reading/writing to the DIMM, it takes the following actions: ■ The BIOS disables the DIMM as indicated by the Memory Decreased message in the example in EXAMPLE D-1. ■ The BIOS logs an SEL reco[...]

  • Page 68

    58 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 EXAMPLE D-1 DMI Log Screen, Correctable Error, Memory Decreased[...]

  • Page 69

    Appendix D Error Handling 59 Handling of Parity Errors (PERR) This section lists facts and considerations about how the server handles parity errors (PERR). ■ The handling of parity errors works through NMIs. ■ During BIOS POST, the NMI is logged in the DMI and the SP SEL. See the following example command and output: ■ FIGURE D-3 shows an[...]

  • Page 70

    60 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 FIGURE D-3 DMI Log Screen, PCI Parity Error ■ The BIOS displays the following messages and freezes (during POST or DOS): ■ NMI EVENT!! ■ System Halted due to Fatal NMI! ■ The Linux NMI trap catches the interrupt and reports the following NMI “confusion report[...]

  • Page 71

    Appendix D Error Handling 61 Note – The Linux system reboots, but does not inform the BIOS of this incident. Handling of System Errors (SERR) This section lists facts and considerations about how the server handles system errors (SERR). ■ System error handling works through the HyperTransport Synch Flood Error mechanism on 8111 and 8131. [...]

  • Page 72

    62 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 ■ FIGURE D-5 shows an example DMI log screen from the BIOS Setup Page with a system error. FIGURE D-5 DMI Log Screen with Error EvM Revision : 04 Sensor Type : Critical Interrupt Sensor Number : 00 Event Type : Sensor-specific Discrete Event Direction : Assertion Eve[...]

  • Page 73

    Appendix D Error Handling 63 Handling Mismatching Processors This section lists facts and considerations about how the server handles mismatching processors. ■ The BIOS performs a complete POST. ■ The BIOS displays a report of any mismatching CPUs, as shown in the following example: ■ No SEL or DMI event is recorded. ■ The system enters [...]

  • Page 74

    64 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 Hardware Error Handling Summary TABLE D-1 summarizes the most common hardware errors that you might encounter with these servers. TABLE D-1 Hardware Error Handling Summary Error Description Handling Logged (DMI Log or SP SEL) Fatal? SP failure The SP fails to boot u[...]

  • Page 75

    Appendix D Error Handling 65 Single-bit DRAM ECC error With ECC enabled in the BIOS Setup, the CPU detects and corrects a single-bit error on the DIMM interface. The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts and is done by the[...]

  • Page 76

    66 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 PCI SERR, PERR System or parity error on a PCI bus. Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset. The BIOS reports, A Hyper Transport sync flood error occurred on last boot, press F1 to continue. DMI[...]

  • Page 77

    Appendix D Error Handling 67 Multiple fan failure Fan failure is detected by reading tach signals. The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit. SP SEL Fatal Single power supply failure When any of the AC/DC PS_VIN_GOOD or PS_PWR_OK signals are deasserted. Service Action Required, and Power Supply Fault LEDs[...]

  • Page 78

    68 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008[...]

  • Page 79

    69 Index B BIOS changing POST options, 28 event logs, 21 POST code checkpoints, 33 POST codes, 31 POST overview, 25 redirecting console output for POST, 26 Bootable Diagnostics CD, 8 C comments and suggestions, x component inventory viewing with ILOM SP GUI, 48 console output, redirecting, 26 correctable errors, handling, 56 D diagnostic softw[...]

  • Page 80

    70 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008 external, 3 internal, 4 Integrated Lights-Out Manager Service Processor, See ILOM SP GUI internal inspection, 4 isolating DIMM ECC errors, 18 L LEDs external, 37 LEDs, ports, and slots illustrated, 38, 39 locations of ports, slots, and LEDs (illustration), 38, 39 M mis[...]