IBM
Senior Technical Staff Member and Master Inventor at IBM
Skills:
SOA Enterprise Architecture WebSphere Enterprise Software Software Development Storage Solution Architecture Distributed Systems System Architecture High Availability Solaris Middleware Architectures Unix IBM AIX Linux Service Oriented Architecture
George Henry Ahrens - Pflugerville TX; George John Dawkins - Austin TX; Michael Youhour Lim - Leander TX; Timothy Lee Toohey - Austin TX
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
H02H 3/05
US Classification:
714/10, 713/1
Abstract:
A method and apparatus for detecting an error condition during initialization of a multiprocessor data processing system is provided. A master processor identification indicator is initialized to an initial value by a service processor in the data processing system. The master processor identification indicator may be a location in nonvolatile RAM to protect data integrity. One of the plurality of processors in the multiprocessor system is selected to be the master processor by being released by the service processor and winning the "race condition" to fetch the first instruction from memory for program execution. This processor then sets the master processor identification indicator to a unique processor identification value. The initial value may be a spoof number indicating whether the master processor has yet written its unique processor identification value. At some later point in time, the service processor detects a freeze or hang condition in the data processing system.
Method And System For Error Isolation During PCI Bus Configuration Cycles
George Henry Ahrens - Pflugerville TX; John C. Kennel - Austin TX; Jeffrey Scott Mayes - Austin TX; Maulin Ishwarbhai Patel - Round Rock TX; David Lee Randall - Leander TX
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
H02H 3/05
US Classification:
714/43, 714/6, 711/148
Abstract:
A method, system, and computer program are described for isolating bus errors detected during system start-up. A shared mailbox associated with a service processor holds the address of an adapter in an I/O drawer. If an error is detected, the service processor is notified. The service processor then retrieves the address from the mailbox and uses it to derive a location code, which is passed along with the error code to an appropriate error analysis routine. The start-up procedure is then shut down.
Method And Apparatus For Locating And Displaying A Defective Component In A Data Processing System During A System Startup Using Location And Progress Codes Associated With The Component
George Henry Ahrens - Pflugerville TX; George John Dawkins - Austin TX; Michael Youhour Lim - Leander TX; Thomas Francis Ploski - Tucson AZ; David Lee Randall - Leander TX
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 1/26
US Classification:
713/2, 713/1, 713/330, 713/310, 713/340, 713/300
Abstract:
A method for locating a defective component in a data processing system during system startup is disclosed. Each component within the data processing system is assigned a location code. Then, a progress code is associated with a location code and a function being loaded to, tested, or executed in a component. After supplying power to the data processing system, the components of the data processing system are initialized and tested to establish a configuration. During the initialization and testing, a location code of a component and a corresponding progress code are displayed on a display panel. In response to a system hang, a defective component can be identified utilizing the location code and the progress code displayed on the display panel.
Identifying Field Replaceable Units Responsible For Faults Detected With Processor Timeouts Utilizing IPL Boot Progress Indicator Status
George Henry Ahrens - Pflugerville TX; David Russell Armstrong - Austin TX
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
H02H 3/05
US Classification:
714/10, 713/2
Abstract:
Described is a method for isolating faults to a correct field replaceable unit (FRU) of a data processing system. When a processor timeout occurs, fault isolation logic is triggered and checks the boot record to determine whether the timeout occurred because of an FRU fault before or after the service processor completed its system initialization. When the timeout occurred because of a fault that occurred while the service processor was loading operating system (OS) (e.g., AIX) instructions from the boot device in the input/output (I/O) subsystem, then the FRU callout indicates a boot fault associated with the I/O planar and the CPU (processor) card. When the FRU fault occurred prior to fetching the OS instructions from the boot device or after the service processor completed its system initialization procedures, then the FRU callout is attributed to the processor card and backplane. Attributing boot error faults to incorrect FRUs is therefore substantially eliminated.
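The callout rule in this abstract can be illustrated with a minimal Python sketch. The `BootPhase` names and the `fru_callout` function are hypothetical, introduced only for illustration; the abstract does not say how the boot record actually encodes progress.

```python
from enum import Enum


class BootPhase(Enum):
    # Hypothetical phase names; the real boot record encoding is not given.
    BEFORE_OS_FETCH = 1    # fault occurred before OS instructions were fetched
    LOADING_OS = 2         # OS instructions being loaded from the boot device
    SP_INIT_COMPLETE = 3   # service processor finished system initialization


def fru_callout(phase: BootPhase) -> list[str]:
    """Map the boot phase at the time of a processor timeout to suspect FRUs."""
    if phase is BootPhase.LOADING_OS:
        # Boot fault: implicate the I/O planar and the processor card.
        return ["I/O planar", "processor card"]
    # Before the OS fetch, or after service processor initialization completed.
    return ["processor card", "backplane"]
```

For example, `fru_callout(BootPhase.LOADING_OS)` names the I/O planar and the processor card, while either of the other two phases names the processor card and backplane.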
System And Method For Reporting Platform Errors In Partitioned Systems
George Henry Ahrens - Pflugerville TX; Douglas Marvin Benignus - Dime Box TX; Arthur James Tysor - Buda TX
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 11/30
US Classification:
714/57, 714/20, 714/48
Abstract:
Hardware errors are stored in an error buffer for processing by one or more system partitions within a computer system. When errors are first placed in the buffer, an Already Reported Flag (ARF) is initialized to indicate that the error has not yet been reported to any of the system partitions. When one of the system partitions receives the corresponding error information by running a diagnostics routine, the ARF is set indicating that the error has been reported to at least one system partition. The system partition, in turn, uses the ARF information to determine how to handle the corresponding error. In an environment using a remote hardware service provider, the ARF determines whether to transmit the error information to the service provider. In environments without remote service providers, the ARF information is used to highlight newly reported errors to the user.
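As a rough illustration of the Already Reported Flag, the Python sketch below models an error buffer whose entries carry the flag. The class names, the `handle` function, and the returned action strings are assumptions made for this example. The abstract does not say what a partition does with an error that was already reported, so the sketch simply returns `"no_action"` in that case.

```python
from dataclasses import dataclass


@dataclass
class ErrorEntry:
    description: str
    already_reported: bool = False  # the ARF, cleared when the error is logged


class ErrorBuffer:
    def __init__(self) -> None:
        self.entries: list[ErrorEntry] = []

    def log(self, description: str) -> int:
        """Store a new hardware error and return its index in the buffer."""
        self.entries.append(ErrorEntry(description))
        return len(self.entries) - 1

    def read(self, index: int) -> tuple[str, bool]:
        """Return (description, was_new) and set the ARF as a side effect."""
        entry = self.entries[index]
        was_new = not entry.already_reported
        entry.already_reported = True
        return entry.description, was_new


def handle(buffer: ErrorBuffer, index: int, has_service_provider: bool) -> str:
    """Decide what a partition's diagnostics routine does with an error."""
    _, was_new = buffer.read(index)
    if not was_new:
        return "no_action"  # assumption: already-reported errors are not re-sent
    return "transmit" if has_service_provider else "highlight"
```

The first partition to read an entry sees it as new and either transmits or highlights it; any later reader of the same entry sees the flag already set.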
Cache Thresholding Method, Apparatus, And Program For Predictive Reporting Of Array Bit Line Or Driver Failures
George Henry Ahrens - Pflugerville TX; Alongkorn Kitamorn - Austin TX; Charles Andrew McLaughlin - Round Rock TX; Michael Thomas Vaden - Austin TX
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 11/00
US Classification:
714/5, 714/23
Abstract:
A mechanism is provided for predicting cache array bit line or driver failures. This mechanism checks for five consecutive errors at different addresses within the same syndrome on invocation of event scan polling to characterize the failure. Once the failure is characterized, it is reported to the system for corrective maintenance including dynamic and/or boot time processor deconfiguration or preventive processor replacement.
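One possible reading of the thresholding check is sketched below in Python. The tracker class and its reset rules are assumptions: the abstract states only the pattern of five consecutive errors at different addresses with the same syndrome. Here a change of syndrome, or a repeat of an address already seen, restarts the run.

```python
THRESHOLD = 5  # number of consecutive errors named in the abstract


class CacheErrorTracker:
    """Tracks a run of errors that share a syndrome but differ in address."""

    def __init__(self) -> None:
        self.syndrome = None
        self.addresses: list[int] = []

    def record(self, syndrome: int, address: int) -> bool:
        """Record one polled error; return True once the pattern is reached."""
        if syndrome != self.syndrome:
            # A different syndrome starts a new run (assumption).
            self.syndrome = syndrome
            self.addresses = [address]
        elif address in self.addresses:
            # A repeated address breaks the "different addresses" run (assumption).
            self.addresses = [address]
        else:
            self.addresses.append(address)
        return len(self.addresses) >= THRESHOLD
```

In this sketch, `record` would be called once per error found on each event scan poll, and a `True` result would correspond to reporting the failure for corrective maintenance.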
Method And Apparatus For Managing Service Indicator Lights In A Logically Partitioned Computer System
A low-level function which enforces logical partitioning establishes a set of virtual indicator lights for certain physical components, the virtual indicator lights being only data in memory, a separate set of virtual indicator lights corresponding to each respective partition. Processes running in a partition can switch and sense the virtual indicator lights corresponding to the partition, but have no direct capability to either switch or to sense the virtual lights of any other partition. The low-level enforcement function alone can switch the state of the physical indicator light, which is generally the logical OR of the virtual indicator lights of the different partitions.
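The relationship between virtual and physical lights can be sketched in a few lines of Python. The `IndicatorManager` class and its method names are hypothetical. A real implementation would enforce partition isolation in the low-level function itself, whereas this sketch relies on callers passing only their own partition identifier.

```python
class IndicatorManager:
    """Holds one virtual light per partition for a single physical component."""

    def __init__(self, partitions: list[str]) -> None:
        # Virtual lights are only data in memory, one per partition.
        self._virtual = {p: False for p in partitions}

    def set_light(self, partition: str, on: bool) -> None:
        """Switch the virtual light belonging to the calling partition."""
        self._virtual[partition] = on

    def sense_light(self, partition: str) -> bool:
        """Sense the virtual light belonging to the calling partition."""
        return self._virtual[partition]

    def physical_state(self) -> bool:
        """The physical light is the logical OR of all virtual lights."""
        return any(self._virtual.values())
```

With two partitions, the physical light stays on until both have switched their virtual lights off.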
Method And System For Log Repair Action Handling On A Logically Partitioned Multiprocessing System
Mark Edwards - Austin TX, US; George Ahrens - Pflugerville TX, US; Douglas Benignus - Dime Box TX, US; Arthur Tysor - Buda TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F011/00
US Classification:
714/005000
Abstract:
A method for handling a log repair action in a logically partitioned (LPAR) multiprocessing system is disclosed. The LPAR multiprocessing system includes a plurality of partitions. The method and system comprise recording the log repair action on one of the plurality of partitions. The method and system further include sending the recording of the log repair action to a single log repair action source, the recording including the log repair action and the partition identifier of the one of the plurality of partitions. The method and system further include sending the log repair action to each of the other of the plurality of partitions from the single service. Accordingly, a system and method in accordance with the present invention solve the problem of having to perform the same action in multiple partitions by using a notification scheme with a single focal point of control. When the focal point determines that the action performed is common to other partitions, that action is broadcast by the focal point to the other partitions, which eliminates the need to visit each partition to repeat the action. Each receiving partition uses the broadcast information to update its log repair action record. Accordingly, shortened repair scenarios and fewer interruptions to actively working partitions are provided, giving the customer increased system availability, which should result in higher customer satisfaction.
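The notification scheme can be illustrated with a small Python sketch. The `Partition` and `FocalPoint` classes and the `common` argument are assumptions for this example; the abstract does not describe how the focal point decides that an action is common to other partitions, so the caller supplies that decision here.

```python
class Partition:
    def __init__(self, pid: str) -> None:
        self.pid = pid
        self.repair_log: list[str] = []

    def record(self, action: str) -> None:
        """Add a log repair action to this partition's record."""
        self.repair_log.append(action)


class FocalPoint:
    """Single point of control that rebroadcasts log repair actions."""

    def __init__(self, partitions: list[Partition]) -> None:
        self.partitions = {p.pid: p for p in partitions}

    def report(self, action: str, source_pid: str, common: bool) -> list[str]:
        """Broadcast an action to every partition except the one that sent it."""
        if not common:
            return []  # action applies only to the reporting partition
        notified = []
        for pid, partition in self.partitions.items():
            if pid != source_pid:
                partition.record(action)
                notified.append(pid)
        return notified
```

A partition first records the repair action locally, then reports it with its own identifier; the other partitions update their records without the repair being repeated on each of them.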