Monday, 16 March 2020

AMD Details Renoir: The Ryzen Mobile 4000 Series 7nm APU Uncovered - AnandTech

The notebook market has not been kind to AMD over the last decade – for a long, long time the company was only ever seen as the discount option for those on a strict budget. It didn’t help that OEMs saw AMD only in that light: they fitted bulky units with sub-standard displays and storage options, which meant that even retailers presented AMD as something for the budget conscious.

All that seems set to change. Fast forward to 2020, and notebook users are eagerly awaiting the arrival of products based on AMD’s latest Ryzen Mobile 4000 series processors, which combine up to eight Zen 2 cores and upgraded Vega graphics into a single chip for the notebook market. AMD has already made waves with its Zen 2 cores in the desktop and enterprise space, and the company has announced that it plans to put eight of those cores, along with a significantly upgraded graphics design, into a processor with a thermal design point of 15 W. These 15 W parts are designed for ultraportable notebooks, and AMD has a number of design wins lined up to show just how good an AMD system can be.

The same silicon will also go into 45 W-class notebooks, with a higher base frequency. These parts are geared more towards systems with discrete graphics, such as gaming notebooks or more powerful business designs. The gaming market (45 W), the commercial market (15 W to 45 W), and the ultraportable market (15 W) are where AMD is hoping to strike hardest with the new hardware.

Earlier this year, at the annual CES trade show in January, we saw a number of early designs based on the new Ryzen Mobile 4000 family. These included TUF laptops from ASUS, the Lenovo Yoga Slim 7, Lenovo ThinkPads using Ryzen Mobile 4000 Pro, Dell’s G5 15 SE, the Acer Swift 3, and the ASUS Zephyrus G14, to name but a few. All of these are key design wins for different segments of the market, and the two that AMD seems to be pushing most are the Zephyrus and the Yoga Slim 7.


The rear panel of the ASUS Zephyrus G14 with its LED rear

The ASUS Zephyrus G14 is set to be the only 14-inch laptop on the market that combines an H-series processor, a 1080p 120Hz panel, and an RTX 2060 discrete graphics card. The aim here is to have something both portable and high performance, within the right thermal envelope, for gamers and for users who need a bit more oomph while on the road, such as video editors who want up to 32 GB of DDR4 inside. There’s an added rear panel effect with movable LEDs, whether for a DJ to show off or simply to display a logo. The Zephyrus G14 will also be the first design with an HS-series processor, which we’ll cover in a bit.

The second key system AMD is promoting is an ultraportable, the Lenovo Yoga Slim 7. It comes with the highest grade Ryzen Mobile 4000 15 W U-series processor, the Ryzen 7 4800U, and its chassis is designed to let the chip turbo up to 25 W when needed. Paired with Wi-Fi 6, a FreeSync display, and LPDDR4X, this is the system that AMD is using for all of its battery life demonstrations.

AMD is working with Lenovo to source these units for press sampling, with reviews originally planned for today. However, due to current world events, the shipment of these units has been delayed, and reviews should start appearing next month, even though the units might be available in China before then.

The Processor Offerings

As with Intel’s mobile processors, AMD’s latest lines fall into two categories. For the ultraportable and low end gaming market, we have 15 W parts called ‘U-Series’. For the gaming market where discrete GPUs are used, there are 45 W parts called ‘H-Series’. The commercial market will take from both sets, and later in the year we might see mini-PC manufacturers (like Zotac, perhaps) use one or the other to bolster their portfolio.

Announced for the first time today is the AMD Ryzen 9 4900H family, the new halo Ryzen Mobile 4000 hardware. These are AMD’s first mobile processors with the Ryzen 9 designation, and we have a separate news story covering them.

AMD Ryzen Mobile 4000 APUs: H-Series
Model            Cores / Threads  Base Freq  Turbo Freq  L2    L3    GPU CUs / Freq  TDP
Ryzen 9 4900H    8 / 16           3.3 GHz    4.4 GHz     4 MB  8 MB  8 / 1750 MHz    45 W
Ryzen 9 4900HS   8 / 16           3.0 GHz    4.3 GHz     4 MB  8 MB  8 / 1750 MHz    35 W
Ryzen 7 4800H    8 / 16           2.9 GHz    4.2 GHz     4 MB  8 MB  7 / 1600 MHz    45 W
Ryzen 7 4800HS   8 / 16           2.9 GHz    4.2 GHz     4 MB  8 MB  7 / 1600 MHz    35 W
Ryzen 5 4600H    6 / 12           3.0 GHz    4.0 GHz     3 MB  8 MB  6 / 1500 MHz    45 W
Ryzen 5 4600HS   6 / 12           3.0 GHz    4.0 GHz     3 MB  8 MB  6 / 1500 MHz    35 W

The H-series processors are split into H and HS parts. For all except the Ryzen 9, the specifications of the two match apart from the TDP, which is 45 W for the H and 35 W for the HS; both are considered ‘H-Series class’ processors. Technically an H-series part can be de-rated to run at 35 W, but getting the S in the name requires collaboration with AMD, which we’ll get into later.

AMD Ryzen Mobile 4000 APUs: U-Series
Model            Cores / Threads  Base Freq  Turbo Freq  L2    L3    GPU CUs / Freq  TDP
Ryzen 7 4800U    8 / 16           1.8 GHz    4.2 GHz     4 MB  8 MB  8 / 1750 MHz    15 W
Ryzen 7 4700U    8 / 8            2.0 GHz    4.1 GHz     4 MB  8 MB  7 / 1600 MHz    15 W
Ryzen 5 4600U    6 / 12           2.1 GHz    4.0 GHz     3 MB  8 MB  6 / 1500 MHz    15 W
Ryzen 5 4500U    6 / 6            2.3 GHz    4.0 GHz     3 MB  8 MB  6 / 1500 MHz    15 W
Ryzen 3 4300U    4 / 4            2.7 GHz    3.7 GHz     2 MB  4 MB  5 / 1400 MHz    15 W

The U-series parts, by the nature of their lower TDP, have a lower base frequency than the H-series parts. These CPUs also tend to rely more on the integrated graphics, which means that the power budget is often split between the CPU and GPU. AMD is also going for an interesting mix here of parts with and without simultaneous multithreading. The bottom processor, the Ryzen 3 4300U, even has half of its L3 cache disabled.

All of these CPUs support DDR4-3200 (up to 64 GB, 51.2 GB/s) and LPDDR4X-4266 (up to 32 GB, 68.3 GB/s), and it is up to the OEM which one to use: LPDDR4X should offer better idle battery life and peak performance, but DDR4 offers more capacity. It is likely that we’ll see the ultraportable market use LPDDR4X, while gaming and workstation-class systems will use DDR4.
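The bandwidth figures above follow from multiplying the transfer rate by the width of the memory interface. This is a minimal sketch, assuming a 128-bit total memory bus (the width the quoted 51.2 GB/s and 68.3 GB/s figures imply). The function name is illustrative:

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int = 128) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second).

    transfer_rate_mts: data rate in megatransfers per second (e.g. 3200 for DDR4-3200).
    bus_width_bits: total width of the memory interface; 128 bits is an assumption here.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


for name, rate in [("DDR4-3200", 3200), ("LPDDR4X-4266", 4266)]:
    print(f"{name}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

These are theoretical peaks. Sustained bandwidth in real workloads is lower.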

All of the CPUs are PCIe 3.0 only, rather than PCIe 4.0 like the desktop parts. This is primarily due to power – the doubled bandwidth of PCIe 4.0 requires more power, and given that storage and graphics rarely need peak speeds, AMD felt the product portfolio would be better served by prioritizing battery life. Each chip has sixteen PCIe 3.0 lanes, split into an x8 link for a graphics card and two x4 links for storage. There are separate PCIe lanes for other modules such as Wi-Fi 6 or mobile network access (4G/5G).
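A little arithmetic shows what the lane split provides. This sketch assumes PCIe 3.0's standard 8 GT/s per-lane rate and 128b/130b encoding. PCIe 4.0 doubles the per-lane rate to 16 GT/s, and that doubling is the bandwidth (and power) AMD chose to forgo. The names are illustrative:

```python
# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b line encoding.
PCIE3_GT_PER_LANE = 8.0
ENCODING_EFFICIENCY = 128 / 130


def pcie3_link_gbs(lanes: int) -> float:
    """Approximate one-direction PCIe 3.0 bandwidth in GB/s.

    Only line-encoding overhead is removed; packet/protocol overhead is ignored,
    so real-world throughput is somewhat lower.
    """
    return lanes * PCIE3_GT_PER_LANE * ENCODING_EFFICIENCY / 8  # bits -> bytes


# The sixteen general-purpose lanes as described: x8 for a GPU, two x4 for storage.
LANE_SPLIT = {"discrete GPU": 8, "storage 1": 4, "storage 2": 4}
assert sum(LANE_SPLIT.values()) == 16

for device, lanes in LANE_SPLIT.items():
    print(f"{device}: x{lanes} ~ {pcie3_link_gbs(lanes):.2f} GB/s per direction")
```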

Display support for the CPUs allows for two 4K monitors through DisplayPort over Type-C, an additional 4K monitor if Thunderbolt is used, and a fourth monitor if USB 4.0 is used. AMD has designed Renoir so that no additional chips are needed to detect which way round a Type-C cable is connected; that is all handled on die. With the display and USB support, the processor allows concurrent USB 3.2 and DisplayPort use, with DisplayPort 1.4 at its peak HBR3 rate (8.1 Gbps per lane) using Display Stream Compression (DSC).
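A rough calculation shows why DSC matters at the high end. This sketch uses the standard HBR3 rate of 8.1 Gbps per lane with 8b/10b encoding. It counts only active pixels and ignores blanking intervals, so it underestimates the real requirement. The function names are illustrative:

```python
# DisplayPort 1.4 HBR3: 8.1 Gbps per lane, up to 4 lanes, 8b/10b encoding.
HBR3_GBPS_PER_LANE = 8.1
DP_LANES = 4
ENCODING_EFFICIENCY = 8 / 10


def hbr3_payload_gbps(lanes: int = DP_LANES) -> float:
    """Usable HBR3 link bandwidth after 8b/10b encoding."""
    return HBR3_GBPS_PER_LANE * lanes * ENCODING_EFFICIENCY


def raw_video_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int) -> float:
    """Uncompressed active-pixel data rate; blanking is ignored, so this underestimates."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


def needs_dsc(width: int, height: int, refresh_hz: int, bits_per_pixel: int,
              lanes: int = DP_LANES) -> bool:
    """True if even the active pixels alone exceed the link's payload bandwidth."""
    return raw_video_gbps(width, height, refresh_hz, bits_per_pixel) > hbr3_payload_gbps(lanes)


print(f"HBR3 x4 payload: {hbr3_payload_gbps():.2f} Gbps")
print("4K60, 8 bpc:", needs_dsc(3840, 2160, 60, 24))    # fits within the 4-lane budget
print("4K120, 10 bpc:", needs_dsc(3840, 2160, 120, 30))  # exceeds the link, so DSC is required
```

When a Type-C port carries USB 3.x data alongside DisplayPort, typically only two lanes are left for display. That halves the budget, and compression becomes more important.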

Silicon Details

AMD surprised us by offering some details on the silicon here. The APU is manufactured on TSMC’s N7 process (7nm DUV), using a 13-layer metal stack, and the die contains 9.8 billion transistors. In January, we estimated the die size from photographs to be about 150-151 mm². AMD states that it is 156 mm², which, given our previous measurements, probably doesn’t include scribe lines.
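Dividing the transistor count by the die area gives a rough average density. This is simple arithmetic on the figures above, not an AMD-published metric. An average across a die that mixes dense logic, SRAM, and IO also says little about any single block:

```python
def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


TRANSISTORS = 9.8e9  # AMD's stated transistor count for Renoir

for label, area in [("AMD's 156 mm^2 figure", 156.0), ("our ~150 mm^2 photo estimate", 150.0)]:
    print(f"{label}: {density_mtr_per_mm2(TRANSISTORS, area):.1f} MTr/mm^2")
```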

While world events mean we don’t have performance numbers for Renoir today, we do have some deeper details on the platform that have not been disclosed before. These cover CPU and GPU improvements, significant changes to power management, Infinity Fabric, and how AMD is taking better control of thermals, performance, and battery life this time around.

AMD has stated that it expects to see 100+ designs using Renoir this year, with a number of those being key design wins that the company has not had in recent memory. Considering where the company was only four years ago, caught in a vicious negative feedback loop, this is a significant upswing in OEM participation, putting AMD in premium designs. Ultimately it’s the consumer who wins, as we should now see some serious competition in the notebook market.

AMD’s latest Ryzen mobile product is the first design the company has done that combines CPU, GPU, and IO all on a monolithic die in TSMC’s 7nm process.

The CPU part of the design is very similar to what we’ve seen on the desktop: two quad-core groups, each with its own L3 cache shared between the cores. Compared to the desktop design, the mobile design is listed as being ‘optimized for mobile’, primarily through the smaller L3 cache – only 4 MB per quad-core group, rather than the 16 MB per quad-core group we see on the desktop. While the smaller L3 cache might mean more trips out to main memory to get data, AMD sees it as saving both power and die area overall, with this level of cache being the right balance for a power-limited chip.

Compared to the previous generation of Zen mobile processors, this generation brings the 15% per-core iso-frequency improvement on the CPU side, thanks to the improvements at the heart of each core. We’ve covered these in detail in our desktop analysis. However, the mobile platform gets not only a raw performance uplift but also a frequency uplift, moving from 4.0 GHz in the prior generation up to 4.3 GHz here. AMD says that actual workload performance gets a significant uplift due to the new power features we’ll discuss in due course.
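As a back-of-the-envelope check, the two gains can be combined into one peak single-thread figure. This sketch assumes the IPC and frequency gains simply multiply. That is a naive best case: real workloads rarely sit at peak turbo, and memory and power limits intervene. The helper name is illustrative:

```python
def combined_uplift(ipc_gain: float, old_freq_ghz: float, new_freq_ghz: float) -> float:
    """Naive peak single-thread uplift, assuming IPC and frequency gains multiply."""
    return (1 + ipc_gain) * (new_freq_ghz / old_freq_ghz)


uplift = combined_uplift(0.15, 4.0, 4.3)
print(f"Naive peak single-thread uplift: {uplift:.3f}x")  # roughly 1.24x
```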

On the GPU side is where we see bigger changes. AMD does two significant things here – it has reduced the maximum number of graphics compute units from 11 to 8, but also claims a +59% improvement in graphics performance per compute unit despite using the same Vega graphics architecture as in the prior generation. Overall, AMD says, this affords a peak compute throughput of 1.79 TFLOPS (FP32), up from 1.41 TFLOPS (FP32) on the previous generation, or a +27% increase overall.
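The 1.79 TFLOPS figure can be reproduced from the top configuration in the tables (8 CUs at 1750 MHz). This sketch assumes the standard Vega arrangement of 64 stream processors per CU, each retiring one fused multiply-add (2 FLOPs) per clock. The previous-generation 1.41 TFLOPS figure is taken as AMD quoted it, since the comparison part's configuration isn't stated:

```python
def vega_fp32_tflops(compute_units: int, clock_mhz: float,
                     shaders_per_cu: int = 64, flops_per_shader_per_clock: int = 2) -> float:
    """Peak FP32 throughput for a Vega-style GPU.

    Each compute unit has 64 stream processors, and each can retire one fused
    multiply-add (counted as 2 FLOPs) per clock.
    """
    return compute_units * shaders_per_cu * flops_per_shader_per_clock * clock_mhz * 1e6 / 1e12


renoir = vega_fp32_tflops(8, 1750)  # top Renoir configuration (8 CUs at 1750 MHz)
print(f"Renoir peak: {renoir:.2f} TFLOPS")
print(f"Uplift vs 1.41 TFLOPS: {(1.79 / 1.41 - 1) * 100:.0f}%")
```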

AMD manages to improve the raw performance per compute unit through a number of changes to the design of the APU. Some of this is down to using 7nm and some is down to design decisions, but it also required a lot of work on the physical implementation side.

For example, the 25% higher peak graphics frequency (up from 1400 MHz to 1750 MHz) comes down largely to the physical implementation of the compute units. Part of the performance uplift is also due to memory bandwidth: the new Renoir design can support LPDDR4X-4266 at 68.3 GB/s, compared to DDR4-2400 at 38.4 GB/s on the previous generation. GPUs, and APUs in particular, are often limited by memory bandwidth, so this should help considerably on that front.
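Putting the two headline changes side by side shows that best-case memory bandwidth grows considerably faster than the GPU clock. This sketch simply applies a percentage-increase helper (the name is illustrative) to the figures quoted above:

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new / old - 1) * 100


print(f"GPU clock: {percent_increase(1400, 1750):.0f}% higher")
print(f"Memory bandwidth: {percent_increase(38.4, 68.3):.0f}% higher")
```

The bandwidth comparison assumes an OEM actually fits LPDDR4X-4266. A DDR4-3200 system would see a smaller gain.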

There are also improvements in the data fabric. For the GPU, the data fabric is twice as wide, allowing for less overhead when bulk transferring data into the compute units. This technically increases idle power a little compared to the previous design, but the move to 7nm easily absorbs that. With less power overhead for bulk data transfers, more power is available to the GPU cores, which in turn means they can run at a higher frequency.

Coming to the Infinity Fabric, AMD has made significant power improvements here. One of the main ones is decoupling the frequency of the Infinity Fabric from the frequency of the memory. AMD was able to do this because of the monolithic design; in the chiplet design of the desktop processors, the two clocks have to stay locked together, otherwise more die area would be needed to bridge the differing clock rates. This is also the primary reason we’re not seeing chiplet-based APUs at this time. The decoupling means that the IF can idle at a much lower frequency, saving power, or adjust to a suitable frequency to balance power and performance under load.
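The difference between a coupled and a decoupled fabric clock can be pictured with a toy model. This is a hypothetical illustration, not AMD's actual policy: the 400 MHz idle floor and the linear scaling with load are assumptions chosen only to show why decoupling lets the fabric idle low:

```python
def fabric_clock_mhz(memory_clock_mhz: float, load: float, coupled: bool,
                     idle_floor_mhz: float = 400.0) -> float:
    """Hypothetical fabric clock choice.

    coupled=True models a fabric locked 1:1 to the memory clock (desktop-style).
    coupled=False models a decoupled fabric that scales with load (0.0 to 1.0)
    between an assumed idle floor and the memory clock.
    """
    if coupled:
        return memory_clock_mhz
    load = min(max(load, 0.0), 1.0)  # clamp load into [0, 1]
    return idle_floor_mhz + load * (memory_clock_mhz - idle_floor_mhz)


MEMCLK = 1600.0  # DDR4-3200 runs its memory clock at 1600 MHz (double data rate)
for load in (0.0, 0.5, 1.0):
    print(load, fabric_clock_mhz(MEMCLK, load, coupled=True),
          fabric_clock_mhz(MEMCLK, load, coupled=False))
```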

Again the doubled bus width for graphics pops up here, giving a better power-per-bit metric. But one of the key aspects of this graph is that the power consumed by the fabric in the new processors is very even across a wide bandwidth range. On the older processor, by comparison, the voltages likely had to be stepped up as bandwidth increased, introducing additional latency factors for performance. Renoir does away with this, and AMD is claiming 75% better fabric efficiency compared to the previous generation.

Orthogonal to the raw improvements, AMD has also improved the media capabilities, with a new HDR/WCG encode engine for HEVC, which according to AMD should give a 31% encoding speedup when used.

Earlier in the year AMD was keen to promote that in Renoir it has made significant advances as to how power is managed across the APU, leading to increased performance and better battery life. The two key figures here were ‘20% reduced SoC power’ and ‘5x reduction in power gating latency’ (also known as an 80% reduction, because you can’t have a 5x reduction of a time). We now have some details.

First, it should be mentioned that 7nm helps a lot here. A smaller process node with smaller transistors (assuming they’ve been laid out correctly) can run at a lower voltage, and a lower voltage translates directly into lower power. Having seen how well AMD has pushed its 7nm designs on the desktop and in the enterprise space, we know there is a lot of power to save here compared to previous process nodes. That being said, the design choices and features matter too.

AMD’s power management all goes through a system-level management controller. For this generation, AMD has re-written the firmware with speed in mind (they claim 33% faster), but also made other improvements, such as aggressive clock gating of the L3 cache when not needed, and using power optimized circuits for IO features such as for the embedded display controller and PCIe physical layers.

The updated system management controller (SMC) is built around user preference. If the user tells the OS that he or she wants more performance, or more battery life, the SMC can take into consideration everything involved in the system and plan accordingly. If the OS can provide guidance about an upcoming workload, the SMC is built to understand it and adjust voltages and frequencies accordingly, or put unused parts of the chip into idle.

Ultimately there are many sensors around the APU, monitoring the amount and type of activity in each region, even down to the types of instructions being used. The SoC is a lot more dynamic in its clock control. The clock domains in various parts of the SoC can be adjusted depending on the activity of each region, and also on thermal limits, system limits, and other items that might affect performance. This is especially useful for powering down parts of the SoC that are not in use, which underpins AMD’s efficiency claims, and for performance guarantees such as maintaining a specific bandwidth across an interconnect (quality of service). The thresholds for these activity monitors can be set by the OS and by the user. The SMC also takes into account the power source (battery vs power supply) and connected hardware (displays, power over USB).
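A toy version of activity-based gating helps make the idea concrete. In this sketch, each clock domain is gated when its measured activity drops below a threshold, and the threshold rises on battery power or when the user prefers battery life. The domain names, threshold values, and policy are all hypothetical; AMD has not published its controller logic:

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    activity: float  # 0.0 (idle) to 1.0 (saturated), as reported by an activity monitor


def plan_domains(domains, on_battery: bool, prefer_battery_life: bool) -> dict:
    """Hypothetical policy: gate any clock domain whose activity falls below a threshold.

    The threshold rises on battery power and when the user asks for battery life,
    so more of the chip is gated in those cases. All numbers are illustrative.
    """
    threshold = 0.05
    if on_battery:
        threshold += 0.05
    if prefer_battery_life:
        threshold += 0.10
    return {d.name: ("gated" if d.activity < threshold else "active") for d in domains}


soc = [Domain("cpu_core0", 0.50), Domain("gpu", 0.08), Domain("media_engine", 0.0)]
print(plan_domains(soc, on_battery=False, prefer_battery_life=False))
print(plan_domains(soc, on_battery=True, prefer_battery_life=True))
```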

For the power gating latency, AMD has doubled the save and restore bus width from the buffers to the cores, allowing a system to resume faster from a CPUOFF state. Beyond this, AMD is using the ACPI 6.3 specification to offer the OS multiple sets of C-states.

One of the issues with the previous generation of Picasso APUs, on the left, is that there was only a single set of states that the processor could be in. This means that at any time, the CPU could fall from a power state (a P-state) into a lower power state, an idle state, or an off state. If the CPU went too far down this stack, it would save power, but each hop down the rabbit hole meant a longer time to get back out. That hurt both performance and latency, and it also required more power changes at the silicon level. Each hop in its own right requires additional power.

With the new Renoir designs, a system can take advantage of multiple different sets of states. This means that the CPU can’t go down too low when the system is in use: the OS or system controller can’t put parts of the core into the deepest low power states, because those states aren’t available in the active set. So even when the system drops to its lowest available power mode while still in use, there are fewer jumps to get back up to high speed.

As the system becomes less used, known as ‘increased idle duration’, it gains access to sets of states that allow parts of the APU to enter deeper idle states. This means that a part of the core can only enter a deep low-power state once it has been sufficiently idle, or when user action calls for it.
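The effect of restricting the available states can be sketched in a few lines. This model pairs each idle state with a target residency (the minimum idle time for entering it to pay off) and an exit latency. That is similar to how Linux's cpuidle framework describes idle states, but it is not AMD's firmware logic. The state names and microsecond values are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    target_residency_us: int  # minimum idle time for entering the state to pay off
    exit_latency_us: int      # time needed to wake back up


# Hypothetical state tables: the "in use" set omits the deepest state entirely.
IN_USE_STATES = [IdleState("C1", 2, 1), IdleState("C2", 50, 20)]
IDLE_STATES = IN_USE_STATES + [IdleState("C6", 800, 150)]


def pick_idle_state(predicted_idle_us: int, system_in_use: bool) -> IdleState:
    """Pick the deepest allowed state whose target residency fits the predicted idle time."""
    allowed = IN_USE_STATES if system_in_use else IDLE_STATES
    fitting = [s for s in allowed if s.target_residency_us <= predicted_idle_us]
    if not fitting:
        return allowed[0]  # fall back to the shallowest state
    return max(fitting, key=lambda s: s.target_residency_us)


print(pick_idle_state(1000, system_in_use=True).name)   # deepest state is not on offer
print(pick_idle_state(1000, system_in_use=False).name)  # long idle, deep state allowed
```

Even with the same predicted idle time, the in-use set caps how deep the core can go. This bounds the wake-up cost when the user needs performance again.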

This is all part of the ACPI 6.3 standard, and AMD states that this combined with the reduced SoC power gives both better battery life and better immediate performance for the user. To show this in action, AMD pinpointed a common activity that most users might be familiar with: opening applications.

In this case, AMD took the start of the PCMark 10 Application Loading benchmark. In this benchmark a number of applications are loaded, and the requirements are often more IO driven than CPU driven. A good CPU with a fast reaction time will keep its power and frequency low while the IO requests are being done, and speed up one or two threads when the CPU needs to get involved.

In AMD’s benchmark, where frequency is used as a proxy for power, the initial 5 seconds of the test show the new Ryzen 4000 CPU hovering at an idle frequency. The older Ryzen 3000 CPU, by contrast, flutters around, even peaking near 4.0 GHz when it doesn’t need to. This allows parts of the new CPU to be powered down for longer periods of time, even when the system is actually in use.

When I asked AMD’s executives where they stand on battery life, one of them hinted that the difference between AMD and the competition (in similar designs) should be on the order of minutes rather than dozens of minutes. Specifically, AMD sees itself ahead of the competition in productivity/web browsing workloads, graphics workloads, and video playback, and argued that most battery benchmarks don’t reflect a good mix of what ‘the average user’ does. A number of the media responded that our benchmarks are often geared towards different types of users commensurate with our audience, such as gamers or content creators. Ultimately we will see what the results are when we have hardware on hand.

We covered AMD’s SmartShift when it was announced at CES. It allows a system management controller to interact with both the mobile APU and an AMD graphics card in the same system in order to shift power where it is needed. This solution is still based on separate power rails for the two, but the use of the scalable control fabric (SCF) part of Infinity Fabric means that parts of the APU and parts of the GPU can coordinate in this way. AMD believes SmartShift can deliver 10-12% better scores in heavy CPU workloads like Cinebench or gaming workloads like The Division.

The solution is firmware based, but requires interaction between qualified hardware. AMD states that in these sorts of CPU+GPU designs, the chassis often has a total design TDP, yet if one element of the system is idle, the other can’t take advantage of the extra turbo headroom available. SmartShift aims to fix that.
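The difference between a fixed split and a shared budget can be shown with a minimal sketch. The stated assumptions are a fixed total platform budget, separate per-rail maximums, and proportional scaling when both sides together want more than the total. The wattages and the scaling rule are hypothetical; AMD hasn't published SmartShift's control algorithm:

```python
def static_split(total_w: float, cpu_share: float, cpu_demand_w: float, gpu_demand_w: float):
    """Fixed allocation: each side is capped at its share, even if the other side idles."""
    cpu_cap = total_w * cpu_share
    gpu_cap = total_w - cpu_cap
    return min(cpu_demand_w, cpu_cap), min(gpu_demand_w, gpu_cap)


def shared_split(total_w: float, cpu_demand_w: float, gpu_demand_w: float,
                 cpu_rail_max_w: float, gpu_rail_max_w: float):
    """Shared allocation: unused headroom on one side can go to the other.

    Each side stays within its own rail limit; if combined demand exceeds the
    total budget, both are scaled down proportionally.
    """
    cpu = min(cpu_demand_w, cpu_rail_max_w)
    gpu = min(gpu_demand_w, gpu_rail_max_w)
    if cpu + gpu <= total_w:
        return cpu, gpu
    scale = total_w / (cpu + gpu)
    return cpu * scale, gpu * scale


# Hypothetical CPU-heavy moment: the CPU wants 70 W while the GPU is nearly idle.
print(static_split(100, 0.45, 70, 10))    # CPU held to its 45 W share
print(shared_split(100, 70, 10, 65, 80))  # CPU gets 65 W, limited only by its rail
```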

What’s new here is that AMD’s primary focus will be on Ryzen Mobile 4000 + Navi 10 systems. The first one with SmartShift enabled will be the Dell G5 SE. The Dell G5 SE is being labelled as ‘the ultimate mobile gaming experience’, and will feature a new H-series processor, the Radeon RX 5600M, SmartShift, FreeSync, and a 15-inch display. The unit will be coming out in Q2.

SmartShift is also part of a new Skin Temperature Tracking paradigm that AMD is implementing in its new APUs. Even if there is power headroom, a system can’t turbo if there isn’t thermal headroom. Skin Temperature Tracking v2 (or STTv2) is designed to help a system boost for longer by knowing more about the thermal profile of the device.

With additional thermal probes placed inside the system, such as on hot controllers or discrete GPUs, the readings can be passed through the Infinity Fabric to an embedded management controller. By learning how the system thermals interact when different elements are loaded, the controller can determine whether the system still has headroom to stay in turbo for longer than the current methodology (AMD’s Skin Temperature Aware Power Management) would allow. This means that rather than having a small number of sensors produce a single number for the temperature of the system, AMD takes in many more values to evaluate a thermal profile of which areas of the system are affected at what point.

What STTv2 does at the end of the day is potentially extend the boost time for a given system, depending on its thermal capabilities. For example, the Lenovo Yoga Slim 7 which we are expecting for review only has a 15 W processor inside, but the chassis has been built for a 25 W TDP design, which means that STTv2 should kick in and provide the user with peak performance for longer.
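A minimal sketch of the multi-sensor idea: instead of relying on one conservative temperature estimate, the controller checks the margin at every measured point and keeps boosting while all of them still have room. The sensor names, limits, and guard band are hypothetical. By AMD's description, STTv2 also learns how the chassis responds over time, which this sketch omits:

```python
def thermal_headroom_c(readings_c: dict, limits_c: dict) -> float:
    """Smallest margin (limit minus reading) across every monitored point, in degrees C."""
    return min(limits_c[name] - temp for name, temp in readings_c.items())


def may_keep_boosting(readings_c: dict, limits_c: dict, guard_band_c: float = 2.0) -> bool:
    """Stay in turbo while every monitored point is more than guard_band_c below its limit."""
    return thermal_headroom_c(readings_c, limits_c) > guard_band_c


# Hypothetical sensor names and limits.
LIMITS = {"apu_die": 95.0, "dgpu": 87.0, "palm_rest": 43.0}
print(may_keep_boosting({"apu_die": 85.0, "dgpu": 70.0, "palm_rest": 38.0}, LIMITS))  # 5 C margin left
print(may_keep_boosting({"apu_die": 85.0, "dgpu": 70.0, "palm_rest": 42.0}, LIMITS))  # only 1 C margin
```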

As mentioned, at the top end of the new Ryzen Mobile 4000 list are the HS processors, offering almost exactly the same specification as the 45 W H-series processors, but at 35 W. AMD has marked these as special processors, not available to every OEM, because they fall under AMD’s new cooperative design and continuous validation programs.

In order to use an HS processor, an OEM must work with AMD on the design. This is similar to what Intel does: ensure that the OEM partner gets the best from the hardware, and help steer design decisions toward what the hardware was built for. Ultimately the product that comes out should be one that shows off the best of the hardware and gives the best user experience. On top of this, AMD has a list of ‘assured components’, validated to work with the new processors, and has created two continuous validation labs.

These labs, one in Austin and one in Shanghai, take the systems and pre-test all the new drivers and vendor software on them before they are released to the public. This is to ensure that the system is in no way compromised in performance, power, or thermals as a result of an update (so that a company doesn’t completely mess with the power profile after launch through a BIOS update or similar). AMD didn’t state how long its continuous validation program will run after the product ships, though I would expect at least 12-18 months to be plausible.

If the OEM does all this, and AMD agrees, then the product can use one of the HS processors.

(Note, the normal 45 W processors can be run in a 35 W mode, but just doing that doesn’t mean you can call it a HS. No doubt some third-tier OEM might try…)

The first HS-class system on the market will be the ASUS Zephyrus G14, and we learned in January that ASUS’s design has an exclusive for six months from launch. We’re expecting the G14 to hit the market in Q2, even with the current state of production, so we’ll see more HS models later in the year.

Today was the day that AMD was going to lift the embargo on Ryzen Mobile 4000 reviews, with the systems that AMD and its partners have provided. Due to the ongoing issues around the world, those technical reviews will have to wait a few weeks while production is ironed out. But for now we have a good grasp of what AMD is pushing into the new processors coming out later this year.

Regular readers of AnandTech may remember that back in 2016 I wrote a very long piece about AMD’s laptop strategy, where I tested five laptops from OEMs that were using AMD’s latest Carrizo APU at the time. The conclusions of that review were threefold. First, AMD was shooting itself in the foot by providing a platform that allowed its partners to be cheap. Second, the OEM partners were being cheap by giving the hardware 1366x768 screens, poor storage, poor trackpads and such, because that’s all the customers seemed to want. Third, the customers were continually asking for cheaper systems, then getting frustrated with the poor user experience and ultimately blaming AMD rather than the OEM. It was a vicious cycle that required someone to break it.

Normally for these launches, a company will create a reference design for its partners to work with. For years AMD created dull $500-$700 reference designs, which ultimately led to the situation described in the paragraph above. We pleaded for generations for AMD to make a halo reference design, something ultraportable for $1500. For Renoir, given the reasonable performance uplift from the previous generation, the company worked with partners to create a range of high-profile devices. I’ve covered a few of them in this article – the Lenovo Yoga Slim 7, the Dell G5 15 SE, and the ASUS Zephyrus G14 all attack different markets in very different ways, but all are examples of high-end products and design wins that AMD has needed in this market.

One of AMD’s big targets here is commercial. Despite the poor consumer performance of its older generations of laptops, AMD’s commercial laptop business did reasonably well by comparison. AMD has announced its Ryzen Mobile 4000 Pro designs that afford admin control and sustainability over the lifetime of the product, and the key win here is that we’re seeing the processors in Lenovo’s ThinkPads, a key market.

The other big market is gaming. AMD can attack this on two fronts. In the ultraportable market, the improved integrated graphics should deliver good performance. The more power-hungry gaming market gets access to features like SmartShift to help balance power between the APU and the discrete GPU. AMD is also playing in the middle market here, with devices like the ASUS Zephyrus G14 with an HS processor and an RTX 2060 inside a 14-inch chassis, which AMD claims is the first 14-inch device with an H-class CPU and a discrete GPU inside. AMD's gaming team seems to be very happy with this design.

However, announcing systems is one thing. Deploying them is another. AMD has made a lot of claims about its Ryzen Mobile 4000 platform – performance, power consumption, and battery life. We’ve gone into detail into a lot of these, but we’re still missing one piece of the puzzle – the on-hand data. We’re hoping to get a system or two here in due course, and compare it against the competition.



https://news.google.com/__i/rss/rd/articles/CBMiZmh0dHBzOi8vd3d3LmFuYW5kdGVjaC5jb20vc2hvdy8xNTYyNC9hbWQtZGV0YWlscy1yZW5vaXItdGhlLXJ5emVuLW1vYmlsZS00MDAwLXNlcmllcy03bm0tYXB1LXVuY292ZXJlZNIBAA?oc=5

2020-03-16 15:53:51Z
