Thursday, January 5, 2017

AMD VEGA – the details of the architecture and memory organization in the next generation of AMD graphics cards – PCLab.pl

Memory organization is the biggest new feature of AMD VEGA.

In GCN architectures to date, the memory subsystem consisted of two levels of cache, connected in turn to a memory controller that served external GDDR5 or HBM memory. The geometry and compute units had their own pools of L1 cache and were connected to a shared pool of second-level cache. The rasterizer (Pixel Engine in the figure) had only one level of cache and then talked directly to the external memory pool.

This organization has been reworked in Vega:

First, the second-level cache is now also available to the rasterizer; we will explain why in a moment.

Second, the traditional memory controller and local memory pool have been replaced by the HBCC (High-Bandwidth Cache Controller) and a high-bandwidth cache located in the GPU package; HBM2 memory plays this role. The HBCC is connected to any external data store: system memory, hard drives and even network storage. The rectangles at the top of the diagram no longer represent the GPU's local memory, as they did in the diagram for GCN versions one to four, but input/output devices.

512 TB address space

The operation of the HBCC is a key element of the memory system and deserves a closer look, although AMD has not revealed many technical details.

The HBCC partially bridges the gap in bandwidth and capacity between the GPU's local memory (on the graphics card) and data stores that are at least two orders of magnitude slower but two orders of magnitude larger. AMD VEGA uses a 49-bit virtual address space, which lets the HBCC manage 512 TB of memory. Part of that address space is local memory; everything else is external memory. The address space is paged, and the HBCC moves pages between external and local memory. Data is moved dynamically, either reactively or proactively: the HBCC and driver-controlled software algorithms can preload into local memory the data that will soon be needed. As far as we know, this mechanism is controlled in software.
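The 512 TB figure follows directly from the 49-bit address width, as the short Python sketch below shows. The 4 KiB page size is purely an assumption for illustration (AMD has not stated the page size the HBCC uses), and the helper names are hypothetical.

```python
VIRTUAL_ADDRESS_BITS = 49   # from the article
PAGE_SIZE = 4096            # assumption: the real HBCC page size is not public
TIB = 2 ** 40


def address_space_bytes(bits):
    """Number of bytes addressable with the given address width."""
    return 1 << bits


def split_address(addr, page_size=PAGE_SIZE):
    """Split a virtual address into (page number, offset within the page)."""
    return divmod(addr, page_size)


size = address_space_bytes(VIRTUAL_ADDRESS_BITS)
print(size // TIB, "TiB of virtual address space")   # 512
print(size // PAGE_SIZE, "pages of 4 KiB")           # 137438953472
print(split_address(0x12345))                        # (18, 837)
```

A paged controller only has to track which page numbers are currently resident in local memory; the offset within a page is unaffected when the page moves.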

High-bandwidth cache memory: HBM2

The role of local memory controlled by the HBCC will be played by HBM2, the second generation of stacked memory. We described how HBM works in the article “HBM, HMC, Wide I/O – three future technologies of memory”. HBM2 differs from the first generation in bandwidth and capacity. With the interface clock raised to 1000 MHz, the bandwidth of a single stack's 1024-bit interface (eight 128-bit channels) is 256 GB/s, twice that of a first-generation HBM stack.
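The per-stack figure can be checked with simple arithmetic. The sketch below assumes the 1024-bit stack interface (eight 128-bit channels) and double-data-rate signalling defined by the HBM standard, and a 500 MHz clock for first-generation HBM; the function name is hypothetical.

```python
def stack_bandwidth_gbs(clock_mhz, bus_width_bits=1024, transfers_per_clock=2):
    """Peak bandwidth of one HBM stack in GB/s.

    clock_mhz * transfers_per_clock * bus_width_bits gives Mbit/s;
    dividing by 8 gives MB/s and by 1000 gives GB/s.
    """
    return clock_mhz * transfers_per_clock * bus_width_bits / 8 / 1000


print(stack_bandwidth_gbs(500))    # first-generation HBM: 128.0
print(stack_bandwidth_gbs(1000))   # HBM2 at full speed:   256.0
```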

HBM2 memory is currently produced by Samsung and SK Hynix in the form of four-layer stacks of 4 GB. In the near future both companies also plan to offer eight-layer stacks with a capacity of 8 GB, but thermal limitations may make them difficult to use on graphics cards.

AMD does not mention the width of the HBM interface in the VEGA GPU. If it is the same as in Fiji (Radeon Fury X), the Vega GPU could have up to 16 GB of local memory with a total bandwidth of 1 TB/s.

It is worth noting that the only other GPU with HBM2 memory, Nvidia's GP100 (the so-called big Pascal), uses four 4 GB HBM2 stacks, but the interface clock has been reduced to about 700 MHz, probably because of thermal constraints and because the GPU's internal buses could not consume data much faster anyway.

This is only speculation on our part, but the VEGA GPU will probably come in several variants. Only the largest of them, intended among other things for compute accelerators, is likely to be equipped with such a large amount of local memory. Versions destined for “enthusiast” cards may come in configurations with fewer HBM2 stacks, for example two (8 GB and 512 GB/s of bandwidth), or with two-layer HBM2 stacks of 2 GB each (4 GB or 8 GB in total).
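The configurations discussed above can be compared side by side. This is a rough sketch, not a list of announced products: the configuration labels are hypothetical, and the calculation assumes a 1024-bit, double-data-rate interface per HBM2 stack.

```python
from dataclasses import dataclass


@dataclass
class HbmConfig:
    name: str           # hypothetical label, not a product name
    stacks: int
    gb_per_stack: int
    clock_mhz: int

    @property
    def capacity_gb(self):
        return self.stacks * self.gb_per_stack

    @property
    def bandwidth_gbs(self):
        # Per stack: 1024 bits wide, 2 transfers per clock (assumed).
        return self.stacks * self.clock_mhz * 2 * 1024 / 8 / 1000


CONFIGS = [
    HbmConfig("Vega, Fiji-width interface", stacks=4, gb_per_stack=4, clock_mhz=1000),
    HbmConfig("Vega, two 4 GB stacks", stacks=2, gb_per_stack=4, clock_mhz=1000),
    HbmConfig("Vega, two 2 GB stacks", stacks=2, gb_per_stack=2, clock_mhz=1000),
    HbmConfig("Nvidia GP100", stacks=4, gb_per_stack=4, clock_mhz=700),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.capacity_gb} GB, {cfg.bandwidth_gbs:.1f} GB/s")
```

Under these assumptions the four-stack configuration reaches 16 GB and 1024 GB/s, the two-stack one 8 GB and 512 GB/s, and GP100 at roughly 700 MHz about 717 GB/s.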

Why so much virtual memory?

This is not the first time AMD has tried to give a GPU a seemingly absurd amount of memory. For several months, interested developers have been able to buy a development kit with the Radeon Pro SSG card, which besides a Fiji GPU carries a PCI Express switch, a RAID controller and two SSDs in a RAID 0 configuration. AMD provides a software mechanism that allows data to be transferred between PCI-E devices without making a working copy in system memory. So although the pool of local, ultra-fast GPU memory is only 4 GB, suitably written programs can use 1 TB of storage that is much faster than an SSD attached in the usual way. This feature is very useful in compute and video-processing applications.

A 49-bit address space and the ability to make direct transfers between PCI Express devices are also offered by Nvidia's Pascal GP100 GPU.

A large virtual address space lets the GPU easily reference resources located in another GPU's memory, in system memory, or in data storage. Advances in storage technology will soon give us memory types that fill the capacity and bandwidth gap between RAM and flash, such as memristor-type memory (e.g., 3D XPoint from Intel and Micron). Work is also underway on related high-bandwidth interfaces, such as OpenCAPI, Gen-Z and NVLink, which can be used to share memory between compute accelerators.

How will this affect the performance of the VEGA architecture in games?

As AMD notes, a typical 3D application is too liberal with memory. In most of the games AMD examined, the amount of graphics card memory reserved by the program was, as a rule, twice the amount of data actually used for the current calculations.

The yellow line on the chart shows reserved memory; the purple line shows the amount of data the GPU actually accessed. Both tests shown were run on a machine with a Zen CPU and a VEGA graphics card. The benefit of a virtual address space larger than the local memory capacity is obvious. Games could allocate more memory than the GPU has locally, and the HBCC would make sure that the required data is already in the fast local pool ahead of time.

Note that the graphs AMD presented show the amount of data used, but they do not prove that the GPU accesses the same data throughout the test. Games do not reserve the excess memory pointlessly; they do it in case data is needed immediately and there is no time to transfer it over PCI Express. The usefulness of the HBCC in games will depend largely on whether the algorithm that moves data between local and external memory is smart enough and loads data well ahead of time. If that algorithm were perfect, a VEGA GPU could get by with only 4 GB of local memory and still avoid the problems of the Radeon R9 Fury X, which had plenty of computing power but too little local memory for gaming at high resolutions.
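Why the quality of the preloading algorithm matters so much can be seen in a toy simulation. The sketch below is not AMD's algorithm, whose details are not public: it models local memory as a small LRU pool of pages and compares purely reactive paging with an idealized prefetcher that knows the next few accesses in advance. All names and sizes are hypothetical.

```python
from collections import OrderedDict


def count_stalls(trace, local_pages, lookahead=0):
    """Count accesses that hit a page not yet in local memory.

    trace       -- sequence of page numbers the GPU touches
    local_pages -- how many pages fit in local memory (LRU eviction)
    lookahead   -- how many upcoming accesses an ideal prefetcher
                   loads in advance (0 means purely reactive paging)
    """
    resident = OrderedDict()
    stalls = 0

    def load(page):
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
            return
        if len(resident) >= local_pages:
            resident.popitem(last=False)      # evict least recently used
        resident[page] = True

    for i, page in enumerate(trace):
        if page not in resident:
            stalls += 1                       # GPU must wait for the transfer
        load(page)
        for future_page in trace[i + 1:i + 1 + lookahead]:
            load(future_page)                 # idealized preloading
    return stalls


# A game that cycles through 8 pages while local memory holds only 4.
trace = list(range(8)) * 3
print(count_stalls(trace, local_pages=4))               # 24
print(count_stalls(trace, local_pages=4, lookahead=2))  # 1
```

In this deliberately unfavourable access pattern, reactive paging stalls on every access, while the idealized prefetcher stalls only on the very first one. A real prefetcher has no perfect knowledge of the future, so its results would fall somewhere between these two extremes.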
