State of the art at the end of 2017 – Part II


Dear readers, I present to you another article (here is the first part) by our author Nobody. Happy reading, and thanks again to everyone for their input.

This is the second part of an article on the evolution of the GPU market in 2017. The first part can be found here.

Let’s now analyze the new AMD GPU technology, Vega.

Let’s first note that AMD has removed all references to GCN from its documentation. AMD likes to point out that Vega is something new compared to its five-year-old architecture, which never managed to gain an edge over the competition. Although Vega brings many novelties, its foundation remains GCN, and, like GCN, so does its (unsatisfactory) behaviour. But let’s look at the numbers:

The GPU measures 484mm², with 4096 shaders integrated into the die and divided into 64 CUs, 256 TMUs, 32 ROPs and an HBM controller… but… these are exactly the same specs as Fiji! So let’s get to the differences: memory is 8GB of HBM2 (16GB in the prosumer version), so the 4GB limit of Fiji is finally gone; base frequency is 1247MHz and boost 1546MHz (apparently using a system similar to nVidia’s, pushing the clock as high as the power budget allows); and the TDP is… 295 watts!

Theoretical computing power is, as always, very high: 12.7 TFLOPS in FP32 (the 1080 Ti is listed at 11 TFLOPS and the 1080 at 8.2 TFLOPS), plus a novelty in the desktop space: packed FP16 math, i.e. double the TFLOPS when using 16-bit floating-point numbers instead of 32-bit (also useful in certain situations for the shader calculations used in games).
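As a reminder of where these headline figures come from, here is a minimal sketch of the usual theoretical-throughput arithmetic (ALU count × 2 FLOP per clock via fused multiply-add × clock). The Vega figures are the ones quoted above; the nVidia core counts and clocks used here are the publicly listed ones, not figures from this article.

```python
# Theoretical peak throughput: ALUs x FLOP/cycle (2 for FP32 FMA) x clock in GHz.
def peak_tflops(alus: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    return alus * flops_per_cycle * clock_ghz / 1000.0

vega64_fp32 = peak_tflops(4096, 1.546)        # ~12.7 TFLOPS, as quoted above
vega64_fp16 = peak_tflops(4096, 1.546, 4)     # packed FP16 doubles it to ~25.3
gtx1080ti   = peak_tflops(3584, 1.582)        # ~11.3 TFLOPS (public boost clock)
gtx1080     = peak_tflops(2560, 1.607)        # ~8.2 TFLOPS (public base clock)

print(f"Vega 64 FP32: {vega64_fp32:.1f} TFLOPS, packed FP16: {vega64_fp16:.1f} TFLOPS")
print(f"GTX 1080 Ti: {gtx1080ti:.1f} TFLOPS, GTX 1080: {gtx1080:.1f} TFLOPS")
```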

This chip, too, is run far from its point of maximum efficiency. AMD has declared that Vega is designed for high frequencies (GCN evidently is not), but tests show that the frequencies reached are well past the sweet spot: very small increases in frequency cause huge increases in consumption, no different from (and maybe worse than) what happens with the Polaris 10 mounted on the RX 580 (which, going from 1200 to 1400MHz, rose from 160W to over 240W: +16% overclock = +50% consumption). It is what we said in previous installments: in the consumer space, operating beyond the point of maximum efficiency lets you gain a few percentage points of performance and thus position the GPU better in the market, in order to earn a few extra bucks.
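To see why a modest overclock can inflate consumption so much, here is a rough back-of-the-envelope model: dynamic power scales roughly with frequency times voltage squared, and holding a clock past the sweet spot requires a voltage bump. The voltage figure below is an assumption chosen only to show the shape of the curve, not a measured value.

```python
# Rough dynamic-power model: P ~ C * V^2 * f (capacitance folded into a constant).
def relative_power(freq_ratio: float, voltage_ratio: float) -> float:
    """Power relative to baseline for given frequency and voltage ratios."""
    return freq_ratio * voltage_ratio ** 2

# Illustrative RX 580-style scenario: +16% clock needing roughly +14% voltage
# (assumed number, only for illustration, not a real voltage table).
freq_ratio = 1400 / 1200      # ~ +16.7% clock
voltage_ratio = 1.14          # assumed voltage bump to keep the higher clock stable
print(f"Relative power: {relative_power(freq_ratio, voltage_ratio):.2f}x")  # ~1.52x, i.e. ~+50%
```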

In the HPC market the opposite holds, since efficiency matters a great deal and it is preferable to install a few more boards rather than pay proportionally more megawatts to do the same job. The same is true in the mobile world, where it is not possible to “cheat” at the expense of consumption: this is one of the main reasons why AMD has practically disappeared from mobile since the introduction of GCN, and why the very same nVidia GPUs are clocked a hundred or so megahertz lower in mobile, i.e. closer to their point of maximum efficiency, with significantly lower consumption and only a marginal performance loss.

From these numbers it is clear, as it was for Fiji, that the objective is certainly not to compete for second or third place in the market. But while Fiji, despite the costs of HBM and the watts, could at least match the competitor’s best solution (at 4K, at any rate), Vega falls well behind the GTX 1080 Ti and struggles to keep up with the GTX 1080, which is less endowed in specs (and watts).

The conclusion is the same as it has been for all of AMD’s GPUs: price against the lower tier in order to remain competitive. The price list of the new AMD cards therefore mirrors that of the 1080 and 1070, against which they compete on performance, in some cases even coming out ahead in games that favor AMD’s architecture over nVidia’s.

Now, passing off this result as what AMD wanted from the start, after mocking “Poor Volta(ge)”, is simply unacceptable. Once again AMD ended up with a solution that does not live up to the effort put into it, and if you ask me why, my answer is easy (and it is always the same): GCN.

This architecture is heavily skewed toward (theoretical) compute capability and is unable to efficiently carry out the work that game engines demand (which also includes computation, but that is not the only thing that matters for good performance; otherwise Intel’s Larrabee would have been a triumph rather than a bitter disappointment).

In the end, however much you dress it up, GCN is deeply inefficient and only delivers good performance when it is properly squeezed through custom programming techniques meant to work around and compensate for that inefficiency (asynchronous compute is one such technique).
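To make the point about asynchronous compute concrete, here is a deliberately simplified toy model (no real graphics API involved, all numbers are illustrative assumptions): if graphics work leaves a fraction of the ALUs idle, a separate compute queue can fill those gaps, recovering utilization the architecture does not reach on its own.

```python
# Toy model of async compute: filling ALU idle time left by graphics work.
def effective_utilization(gfx_util: float, compute_work: float = 0.0) -> float:
    """gfx_util: fraction of ALU time the graphics workload keeps busy (0..1).
    compute_work: extra compute work available, as a fraction of total ALU time,
    that an async queue can slot into the idle gaps."""
    idle = 1.0 - gfx_util
    return gfx_util + min(idle, compute_work)

# Example: graphics alone keeps only 70% of the ALUs busy (assumed figure).
print(effective_utilization(0.70))        # 0.70 -> graphics queue only
print(effective_utilization(0.70, 0.25))  # 0.95 -> async compute fills most of the gap
```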

As with Terascale, performance is adequate for the resources used when the code is optimized for the architecture, but the weaknesses show when running generic code that does not take the measures (which cost time and money) needed to make it perform as well as possible.

Unlike Terascale, as I mentioned in my first article on the evolution of 28nm GPUs, GCN suffers from deficits in both performance/mm² and performance/W, the two parameters that determine how good (and economically viable) a GPU is compared to the competition. Being slower while also being larger and consuming more leaves no room for the compromises that were possible with Terascale, which was less powerful but also consistently smaller and less power-hungry than its counterpart.
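As a concrete illustration of the two metrics, here is a minimal sketch computing them from the headline figures above. Theoretical TFLOPS stands in for “performance” here, which flatters GCN; the GTX 1080 die size and board power (314 mm², 180 W) are public GP104 figures, not taken from this article.

```python
# Perf/mm^2 and perf/W, the two metrics discussed above.
# "perf" here is theoretical FP32 TFLOPS, which flatters GCN versus real game performance.
cards = {
    # name: (TFLOPS, die area mm^2, board power W)
    "Vega 64":  (12.7, 484, 295),  # figures quoted in the article
    "GTX 1080": (8.2,  314, 180),  # public GP104 figures (assumption, not from the article)
}

for name, (tflops, area, watts) in cards.items():
    print(f"{name}: {tflops / area * 1000:.1f} GFLOPS/mm^2, "
          f"{tflops / watts * 1000:.1f} GFLOPS/W")
```

On these theoretical numbers the two look surprisingly close, which is precisely the point made throughout: GCN’s real (game) performance falls well short of its paper TFLOPS, while the competitor’s does not.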

The only thing that can be done is to keep losses to a minimum: give up the margin needed to cover development costs but sell it anyway, trying to improve it while waiting for a new architecture built on a different foundation that finally allows competing with, or even overtaking, the rival’s performance and leading the market.

When this happens for a single generation, the problem is limited. nVidia had to do it with the first version of Fermi, and fixed it within 9 months precisely to stop the economic drain (even though that troubled GPU still outperformed the competition; imagine what would have happened had it not, which is exactly the situation AMD has been in for several years).

When instead the situation has dragged on for 6 consecutive years (and may well reach 8, i.e. all the way to 7nm), it is not a matter of a botched implementation on a given production process, but of having actually created something that does not work as it should.

Think of a Formula 1 team losing the championship to the same competitor for 6 consecutive years while keeping the same engine and only making changes to the chassis. That is a clue that what is being fielded is not enough to compete (whatever the excuses, justifications and promises made); and entering the Formula 2 championship with the same car and winning (not always) in the lower class certainly does not make the project viable, since the costs and investments are those of the higher class.

In addition to inherently suffering from the shortcomings described above, AMD’s GCN architecture, as used in Fiji and Vega, has clearly shown that it does not scale as it should when the CU count grows large. The uncore (the part of the GPU that sits outside and around the CUs containing the ALUs) has a limit that AMD has not been able to overcome, despite a couple of attempts. Real performance falls far short of the theoretical figures, even though the raw arithmetic behaves as expected when properly programmed.

Something in the architecture is clearly wrong, and only by changing it will it become possible (though not guaranteed) to finally obtain in real use what the numbers on paper promise, as incidentally already happens with the competition’s cards, which scale almost perfectly as the resources on board increase.
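A simple way to picture the uncore bottleneck is an Amdahl-style toy model (illustrative only, with assumed numbers): if a fixed fraction of each frame’s work is handled serially by the front end/uncore and does not scale with CU count, adding CUs yields rapidly diminishing returns.

```python
# Amdahl-style toy model of CU scaling with a fixed uncore/front-end bottleneck.
def speedup(cu_count: int, base_cus: int = 16, serial_fraction: float = 0.10) -> float:
    """Speedup versus a base_cus GPU when serial_fraction of the work
    is handled by the uncore and does not scale with CU count (assumed value)."""
    scale = cu_count / base_cus
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / scale)

for cus in (16, 32, 64):
    print(f"{cus} CUs: {speedup(cus):.2f}x (ideal would be {cus / 16:.0f}x)")
# With the assumed 10% serial share, 64 CUs deliver ~3.1x instead of 4x.
```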




