Inventors:
- Fremont CA, US
Cheng Liao - Pleasanton CA, US
Assignee:
Silicon Graphics International Corp. - Fremont CA
International Classification:
G06F 12/00
Abstract:
High performance computing systems perform complex or data-intensive calculations using a large number of computing nodes and a shared memory. Disclosed methods and systems provide nodes having a special-purpose coprocessor to perform these calculations, along with a general-purpose processor to direct the calculations. Computational data transfer from the shared memory to the coprocessor incurs a data copying latency. To reduce this latency as experienced by the coprocessor, a complex computation is divided into work units, and one or more threads executing on the processor copy the work units from the shared memory to a local buffer memory of a computing node. By buffering these data for transfer from the local memory to coprocessor memory, and by ensuring that new data are copied while the coprocessor operates on older data, data copying latency is hidden from the coprocessor.