The Coming of General Purpose GPUs
Until the advent of DirectX 10, there was no point in adding undue complexity by enlarging the die area, which increased vertex shader functionality in addition to boosting the floating point precision of pixel shaders from 24-bit to 32-bit to match the requirement for vertex operations. With DX10’s arrival, vertex and pixel shaders maintained a large level of common function, so moving to a unified shader arch eliminated a lot of unnecessary duplication of processing blocks. The first GPU to utilize this architecture was Nvidia’s iconic G80.
Four years in development and $475 million produced a 681 million-transistor, 484mm² behemoth — first as the 8800 GTX flagship and 8800 GTS 640MB on November 8. An overclocked GTX, the 8800 Ultra, represented the G80’s pinnacle and was sandwiched between the launches of two lesser products: the 320MB GTS in February and the limited production GTS 640MB/112 Core on November 19, 2007.
Aided by the new Coverage Sample anti-aliasing (CSAA) algorithm, Nvidia had the satisfaction of seeing its GTX demolish every single and dual-graphics competitor in outright performance. Despite that success, the company dropped three percentage points in discrete graphics market share in the fourth quarter — points AMD picked up on the strength of OEM contracts.
MSI’s version of the GeForce 8800 GTX
The remaining components of Nvidia’s business strategy concerning the G80 became reality in February and June of 2007. The C-language based CUDA platform SDK (Software Development Kit) was released in beta form to enable an ecosystem leveraging the highly parallelized nature of GPUs. Nvidia’s PhysX physics engine as well as its distributed computing projects, professional virtualization and OptiX, Nvidia’s ray tracing engine, are the more high profile applications using CUDA.
Both Nvidia and ATI (now AMD) had been integrating ever-increasing computing functionality into the graphics pipeline. ATI/AMD would choose to rely upon developers and committees for the OpenCL path, while Nvidia had more immediate plans in mind with CUDA and high performance computing.
To this end, Nvidia introduced its Tesla line of math co-processors in June, initially based on the same G80 core that had already powered the GeForce and Quadro FX 4600/5600, and after a prolonged development that included at least two (and possibly three) major debugging exercises, AMD released the R600 in May.
Aided by the new Coverage Sample anti-aliasing (CSAA) algorithm, Nvidia had the satisfaction of seeing its GTX demolish every single and dual-graphics competitor in outright performance.
Media hype made the launch hotly anticipated as AMD’s answer to the 8800 GTX, but what arrived as the HD 2900 XT was largely disappointing. It was an upper-midrange card allied with the power usage of an enthusiast board, consuming more power than any other contemporary solution.
The scale of the R600 misstep had profound implications within ATI, prompting strategy changes to meet future deadlines and maximize launch opportunities. Execution improved with RV770 (Evergreen) as well as the Northern and Southern Islands series.
Along with being the largest ATI/AMD GPU to date at 420mm², R600 incorporated a number of GPU firsts. It was AMD’s first DirectX 10 chip, its first and only GPU with a 512-bit memory bus, first vendor desktop chip with a tessellator unit (which remained largely unused thanks to game developer indifference and a lack of DirectX support), first GPU with integrated audio over HDMI support, as well as its first to use VLIW, an architecture that has remained with AMD until the present 8000 series. It also marked the first time since the Radeon 7500 that ATI/AMD hadn’t fielded a top tier card in relation to the competition’s price and performance.
AMD updated the R600 to the RV670 by shrinking the GPU from TSMC’s 80nm process to a 55nm node in addition to replacing the 512-bit bidirectional memory ring bus with a more standard 256-bit. This halved the R600’s die area while packing nearly as many transistors (666 million versus 700 million in the R600). AMD also updated the GPU for DX10.1 and added PCI Express 2.0 support, all of which was good enough to scrap the HD 2000 series and compete with the mainstream GeForce 8800 GT and other lesser cards.
In the absence of a high-end GPU, AMD launched two dual-GPU cards along with budget RV620/635-based cards in January 2008. The HD 3850 X2 shipped in April and the final All-In-Wonder branded card, the HD 3650, in June. Released with a polished driver package, the dual GPU cards made an immediate impact with reviewers and the buying public. The HD 3870 X2 comfortably became the single fastest card and the HD 3850 X2 wasn’t a great deal slower. Unlike Nvidia’s SLI solution, AMD instituted support for Crossfiring cards with a common ASIC.
Full Story: History of the Modern Graphics Processor, Part 4 – TechSpot.