On the raw shader side of the equation, things haven't really moved that far along from the Ampere architecture. Each SM still uses the same 64 dedicated FP32 units, plus a secondary stream of 64 units that can be split between floating point and integer calculations as necessary, the same arrangement that was introduced with Ampere.

You can see how similar the two architectures are from a rasterisation perspective when looking at the relative performance difference between an RTX 3090 and RTX 4090. If you ignore ray tracing and upscaling, there is a corresponding performance boost that's only a little higher than you might expect from the extra CUDA cores dropped into the AD102 GPU. That 'little higher' than commensurate increase does show there are some differences at that level, though.

Part of that is down to the new 4N production process Nvidia is using for its Ada Lovelace GPUs. Compared with the 8N Samsung process of Ampere, the TSMC-built 4N process is said to offer either twice the performance at the same power, or the same performance at half the power. This has meant Nvidia can be super-aggressive in terms of clock speeds, with the RTX 4090 listed with a boost clock of 2,520MHz. We've actually seen our Founders Edition card averaging 2,716MHz in our testing, which puts it almost a full 1GHz faster than the RTX 3090 of the previous generation.

And, because of that process shrink, Nvidia's engineers working with TSMC have crammed an astonishing 76.3 billion transistors into the AD102 core. Considering the 608.5mm² Ada GPU contains so many more than the 28.3 billion transistors of the GA102 silicon, it's maybe surprising that it's actually smaller than the 628.4mm² Ampere chip. The fact Nvidia can keep on jamming this ever-increasing number of transistors into a monolithic chip, and still keep shrinking its actual die size, is testament to the power of advanced process nodes in this sphere. For reference, the RTX 2080 Ti's TU102 chip was 754mm² and held just 18.6 billion 12nm transistors.

That doesn't mean the monolithic GPU can continue forever, unchecked. GPU rival AMD is promising to shift to graphics compute chiplets for its new RDNA 3 chips launching in November. Given the AD102 GPU's complexity is second only to the 80 billion transistors of the advanced 814mm² Nvidia Hopper silicon, it's sure to be an expensive chip to produce. The smaller compute chiplets, however, ought to bring costs down and drive yields up. For now, though, the brute force monolithic approach is still paying off for Nvidia.

What else do you do when you want more speed and you've already packed in as many advanced transistors as you can? You stick some more cache memory into the package. This is something AMD has done to great effect with its Infinity Cache and, while Nvidia isn't necessarily going with some fancy new branded approach, it is dropping a huge chunk more L2 cache into the Ada core.
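
To put the transistor counts and die sizes quoted in this piece into perspective, here is a minimal Python sketch that turns them into average transistor density. The figures are the ones given in the text. The function and dictionary names are just illustrative choices, and the result is a simple whole-die average that says nothing about how density varies between logic, cache and I/O.

```python
# Figures quoted in the article: (transistors in billions, die area in mm²).
CHIPS = {
    "TU102 (RTX 2080 Ti)": (18.6, 754.0),
    "GA102 (Ampere)": (28.3, 628.4),
    "AD102 (Ada)": (76.3, 608.5),
}


def density_mtr_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    """Return the whole-die average density in millions of transistors per mm²."""
    return transistors_billion * 1000.0 / area_mm2


def density_table() -> dict:
    """Map each chip name to its average transistor density (MTr/mm²)."""
    return {name: density_mtr_per_mm2(t, a) for name, (t, a) in CHIPS.items()}


if __name__ == "__main__":
    table = density_table()
    for name, density in table.items():
        print(f"{name}: {density:.1f} MTr/mm²")
    # Hypothetical comparison of interest: Ada against Ampere.
    ratio = table["AD102 (Ada)"] / table["GA102 (Ampere)"]
    print(f"AD102 vs GA102 density: {ratio:.2f}x")
```

On those numbers AD102 averages roughly 125 million transistors per mm² against about 45 for GA102 and about 25 for TU102, which is why the Ada die can carry so many more transistors while being slightly smaller than the Ampere chip.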