NVMe SSD Thermal Management: What We Have Learned from Marathons

Part 1: Environmental Assessment

NVMe solid state drives (SSDs) are known to run at blistering speeds. They can be four or more times faster than Serial ATA (SATA) drives! Running at such speeds generates a lot of heat, so NVMe SSDs are prone to overheating, especially when installed in systems with limited airflow. This series of articles explores the considerations and thermal solutions offered by ATP that help NVMe SSDs beat the heat and deliver reliable, sustained performance over extended periods of time.

Thermal Management is Like a Marathon

Thermal management can be likened to a marathon, a long-distance footrace that requires endurance and strategy. The two have five things in common: environmental assessment, physical conditions, ambient simulation (training), gear/equipment, and pacing strategy. Each of these variables can affect the performance of both marathon runners and NVMe SSDs.

 

Marathons and the thermal management of NVMe SSDs have these five things in common, each of which can impact performance.

 

Environmental Assessment

We start by understanding the system environment. The host could be a box PC, a data logger, or an IIoT server, each operating under different temperatures, airflow conditions, and customer criteria.

 

In assessing the system environment, we first check the airflow, since most systems are already equipped with fans. To ensure sufficient ventilation, we also check whether more powerful fans or cooling plates are needed to dissipate heat at high temperatures.

 

The table below shows examples of various test environments as well as customers’ criteria.

 

 

Case          Temp.   Airflow             Customer Criteria
Box PC        ~70°C   No airflow          Needs to stay operational without shutting down
Data Logger   ~85°C   1200 to 1800 LFM*   Sustained read/write performance
IIoT Server   ~55°C   Min. 2000 LFM*      Fan actions triggered by composite temperature need to stay within a certain range

* LFM: Linear Feet per Minute

Knowing the environment and the requirements lets us set up the thermal simulation and optimize performance to meet the customer’s expectations.
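To keep the three environments comparable when setting up a simulation, it can help to record them in one structure and convert airflow to SI units (1 LFM = 0.3048 m / 60 s = 0.00508 m/s). The sketch below is a minimal illustration under that assumption; `EnvironmentProfile` and its fields are hypothetical names, not part of any ATP or Cadence tool, and the values are the approximate ones from the table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# 1 linear foot per minute = 0.3048 m / 60 s = 0.00508 m/s
LFM_TO_M_PER_S = 0.00508


@dataclass(frozen=True)
class EnvironmentProfile:
    """Hypothetical record for one row of the environment table."""
    system: str
    case_temp_c: float                 # approximate case temperature (°C)
    airflow_min_lfm: float             # 0.0 means no forced airflow
    airflow_max_lfm: Optional[float]   # None when only a minimum is specified
    criterion: str

    def airflow_m_per_s(self) -> Tuple[float, Optional[float]]:
        """Return the (min, max) airflow converted to metres per second."""
        low = self.airflow_min_lfm * LFM_TO_M_PER_S
        high = None if self.airflow_max_lfm is None else self.airflow_max_lfm * LFM_TO_M_PER_S
        return low, high


PROFILES = [
    EnvironmentProfile("Box PC", 70.0, 0.0, 0.0,
                       "Stay operational without shutting down"),
    EnvironmentProfile("Data Logger", 85.0, 1200.0, 1800.0,
                       "Sustained read/write performance"),
    EnvironmentProfile("IIoT Server", 55.0, 2000.0, None,
                       "Composite-temperature-triggered fan actions stay within range"),
]

if __name__ == "__main__":
    for p in PROFILES:
        low, high = p.airflow_m_per_s()
        upper = "open-ended" if high is None else f"{high:.2f} m/s"
        print(f"{p.system}: ~{p.case_temp_c:.0f} °C, airflow {low:.2f} m/s to {upper}")
```

Keeping the open-ended "Min. 2000 LFM" case as `None` avoids inventing an upper bound the table does not give.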

 

Part 2: Physical Conditions

Thermoregulation, the body’s ability to maintain its optimal core temperature, is crucial for avoiding potentially dangerous conditions. The same is true for an SSD: if it keeps heating up, the drive will trigger a thermal shutdown. Through thermal simulation, we aim to find the right balance between temperature and performance.

Component-Level Simulation in Design Phase

Cadence Simulation software is used during the product design phase to perform component-level IR drop analysis (power integrity) and thermal simulation.

Hardware engineers enter key component and package information into the software, such as case dissipation, power loss, and printed circuit board (PCB) dissipation in watts, as well as junction, case, and board temperatures and other relevant data.

The Cadence Simulation software then generates results showing the heat distribution in each PCB layer, indicating which areas are at risk of heat accumulation.

Based on these results, hardware engineers may adjust the circuit layout, trace thickness, the number and/or position of through-holes, and other variables.

Sample temperature distribution result of the PCB top layer shows that heat accumulates on the Controller.

NOTE: Pure hardware simulation under the worst-case scenario (full speed, without the firmware-based thermal throttling mechanism)
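The junction and case temperatures entered into a component-level simulation are often sanity-checked with the standard first-order thermal-resistance relation T_J = T_A + P × θ_JA. The sketch below applies it. It is a lumped steady-state approximation, not what the Cadence solver computes (the solver resolves where heat accumulates across each layer), and the 3 W, 20 °C/W, and 125 °C figures are hypothetical placeholders rather than ATP controller data.

```python
def junction_temp_c(ambient_c: float, power_w: float, theta_ja_c_per_w: float) -> float:
    """Steady-state junction temperature: T_J = T_A + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w


def max_power_w(ambient_c: float, tj_max_c: float, theta_ja_c_per_w: float) -> float:
    """Largest power dissipation that keeps T_J at or below tj_max_c."""
    return (tj_max_c - ambient_c) / theta_ja_c_per_w


if __name__ == "__main__":
    # Hypothetical controller: 3 W dissipation, 20 °C/W junction-to-ambient
    print(junction_temp_c(70.0, 3.0, 20.0))   # 130.0 at 70 °C ambient
    print(max_power_w(70.0, 125.0, 20.0))     # 2.75 W budget for a hypothetical 125 °C limit
```

Lowering θ_JA, for example through layout changes or a heatsink, is what raises the power budget at a given ambient temperature.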

Checking the System’s Mechanical Design

Apart from the SSD itself, we check the system’s mechanical design, where an additional heatsink can be added to improve heat dissipation. An 8 mm heatsink dissipates heat better than copper foil, but not every system has enough space for one.

What factors do we consider when assessing heat dissipation for mechanical design within the system?

  • Space. Compact systems are very cramped, and heatsink solutions are not typically considered during their design. How, then, can we make sure that a heatsink fits? We need to consider the area surrounding the NVMe SSD: the clearance above (height) and below the drive, as well as the available width and length.
  • Mechanical Interference. We also carefully evaluate whether any components on the system PCB would physically interfere or overlap with the heatsink.

It is important to verify that the system has enough room for installing a heatsink.
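Both checks can be approximated with axis-aligned bounding boxes: the heatsink must sit inside the free envelope around the slot and must not intersect any nearby component. The sketch below assumes that simplification; `Box` and `heatsink_fits` are hypothetical helpers, the 80 × 24.4 × 12.3 mm heatsink size echoes the 8 mm heatsink dimensions listed later in this article, and the envelope and component positions are made up for illustration. A real mechanical review would use the full CAD geometry.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in mm: origin (x, y, z) and size (dx, dy, dz)."""
    x: float
    y: float
    z: float
    dx: float
    dy: float
    dz: float

    def overlaps(self, other: "Box") -> bool:
        # Boxes intersect only if their extents overlap on every axis;
        # touching faces are not counted as interference.
        return (self.x < other.x + other.dx and other.x < self.x + self.dx
                and self.y < other.y + other.dy and other.y < self.y + self.dy
                and self.z < other.z + other.dz and other.z < self.z + self.dz)

    def contains(self, other: "Box") -> bool:
        """True if `other` lies entirely inside this box."""
        return (self.x <= other.x and other.x + other.dx <= self.x + self.dx
                and self.y <= other.y and other.y + other.dy <= self.y + self.dy
                and self.z <= other.z and other.z + other.dz <= self.z + self.dz)


def heatsink_fits(heatsink: Box, envelope: Box, components: List[Box]) -> bool:
    """Space check plus mechanical-interference check."""
    return envelope.contains(heatsink) and not any(heatsink.overlaps(c) for c in components)


if __name__ == "__main__":
    envelope = Box(0, 0, 0, 100, 40, 15)          # hypothetical free space
    heatsink = Box(10, 5, 2, 80, 24.4, 12.3)      # drive plus heatsink envelope
    capacitor = Box(50, 20, 0, 5, 5, 10)          # hypothetical tall part beside the slot
    print(heatsink_fits(heatsink, envelope, [capacitor]))   # False: the capacitor interferes
```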

ATP Heatsink Solutions: Design Considerations

Once the surrounding area has been assessed, the next step is to determine the optimal heatsink solution. For example, ATP’s specially designed 8 mm fin-type heatsink offers the following advantages for systems with limited space:

  • More surface area for heat dissipation
  • Lightweight aluminum material with good thermal conductivity
  • Thermal pad with good adhesion for efficient heat transfer
  • Clip design for efficient assembly onto SSDs

The figure above shows the top and bottom parts of an ATP 8 mm fin-type heatsink and how it looks when assembled onto an M.2 2280 NVMe SSD.

The clips are made of thin 0.3 mm stainless steel to reliably secure the top and bottom parts of the heatsink. The space-saving design suits systems with limited space, as it does not interfere with other components on the system PCB.

 

Part 3: Ambient Simulation (Training)

“Ambient simulation” can be likened to a runner’s training program in preparation for a marathon. This means subjecting SSDs to different ambient conditions that mimic real scenarios to make sure that the SSDs are fit to perform well under such conditions.

System-Level Simulation Using Cadence Thermal Simulation System

The Cadence Thermal Simulation System runs system- and module-level simulations. The customer inputs thermal parameters such as ambient temperature, airflow, and ATP SSD parameters.

Given different ambient conditions and airflow, the Cadence system can estimate the SSD temperature with or without the heatsink. The following figures demonstrate the effectiveness of ATP’s 8 mm heatsink in pure hardware, full-operation mode (worst-case scenario).

Ambient Condition 2: 70°C, 1200 ft/m

In this scenario, the ambient temperature is higher but there is sufficient airflow, which helped limit the controller area temperature of the bare SSD to 145°C. Combining that airflow with the 8 mm heatsink reduced the overall temperature further, bringing the controller down to 133°C.

NOTE: Pure hardware simulation test, full-operation mode (worst-case scenario)
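Ambient simulation is easiest to track when every combination of conditions is enumerated up front, so that no ambient/airflow/heatsink pairing is skipped. The sketch below builds such a matrix with `itertools.product`; the `Condition` type is hypothetical, and apart from the 70 °C / 1200 ft/m case taken from the example above, the values are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class Condition:
    ambient_c: float
    airflow_lfm: float
    heatsink: str


def build_test_matrix(ambients_c: Iterable[float],
                      airflows_lfm: Iterable[float],
                      heatsinks: Iterable[str]) -> List[Condition]:
    """Every combination of ambient temperature, airflow, and heatsink option."""
    return [Condition(t, a, h) for t, a, h in product(ambients_c, airflows_lfm, heatsinks)]


if __name__ == "__main__":
    # 70 °C / 1200 ft/m comes from the example above; the other values are hypothetical.
    for c in build_test_matrix([25.0, 70.0], [0.0, 1200.0], ["bare", "8 mm heatsink"]):
        print(c)
```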

 

ATP-Built Mini Chamber for SSD Testing

Aside from the Cadence software simulation tool, ATP also performs actual SSD tests using our in-house mini chamber. About the size of a notebook computer, it is more compact, flexible, and easier to use than typical large test chambers.

 

 

The ATP-built mini chamber

 

The mini chamber allows us to perform SSD tests in a controlled environment. We can change the airflow, temperature settings, and test scripts via an external system and save the log files. The chamber is equipped with an alarm that triggers an emergency stop and overheat protection if the temperature exceeds the threshold limits.
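The article does not describe the chamber’s control software, but the overheat protection it mentions follows a common pattern: log every reading and halt at the first one above the limit. The sketch below assumes that pattern; `run_with_overheat_guard`, the `emergency_stop` callback, and the 85 °C threshold are hypothetical, and the readings stand in for a real sensor.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("mini_chamber")


def run_with_overheat_guard(readings_c: Iterable[float],
                            threshold_c: float,
                            emergency_stop: Callable[[], None]) -> Tuple[List[float], bool]:
    """Log each temperature reading; call emergency_stop() and stop at the first
    reading above threshold_c. Returns (logged readings, tripped flag)."""
    logged: List[float] = []
    for temp in readings_c:
        logged.append(temp)
        log.info("chamber temperature %.1f °C", temp)
        if temp > threshold_c:
            log.error("%.1f °C exceeds %.1f °C limit: emergency stop", temp, threshold_c)
            emergency_stop()
            return logged, True
    return logged, False


if __name__ == "__main__":
    def stop() -> None:
        log.warning("heater and test script halted")

    # Hypothetical sensor readings and an 85 °C limit
    history, tripped = run_with_overheat_guard([60.0, 72.5, 86.0, 90.0], 85.0, stop)
    print(history, tripped)   # [60.0, 72.5, 86.0] True
```

Passing the stop action in as a callback keeps the guard independent of whatever hardware interface actually cuts the heater or aborts the test script.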

Part 4: Gear/Equipment

Choosing the right gear or equipment to dissipate heat is very important. For athletes participating in a marathon, choosing suitable clothing can provide protection and enhance breathability, and the right fabric allows easy evaporation of sweat for better cooling and comfort.

Material Selection

There are several factors to consider when choosing the right material for NVMe SSD heatsinks. In this article, we discuss a few of the most important ones.

 

Reliability Test for Copper Foil

To evaluate the reliability of our copper foil solution, we perform high- and low-temperature resistance testing on the adhesive layer to make sure that it does not deform.

The adhesive strength of the copper foil heatsink is tested to ensure reliability and excellent retention of the heatsink to the SSD.

 
Shore Hardness Scale of Thermal Pad

Shore hardness, measured with a Shore durometer, indicates the hardness of materials such as plastics and rubber. Flexible, soft thermal pads should fit closely between the heatsink and the SSD components to transfer heat away from the SSD and keep the operating temperature low. However, if a pad is too soft, its silicone content is high and its heat-conducting content is low, which can lead to poor thermal conductivity.

This illustration shows that if the thermal pad is too soft, it compromises the heatsink’s attachment to the SSD components, resulting in lower heat dissipation.
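The conductivity argument can be made concrete with the one-dimensional conduction formula R = t / (k × A): for the same pad thickness t and contact area A, a lower conductivity k gives a proportionally higher thermal resistance and a larger temperature drop across the pad. The sketch below uses hypothetical pad values, not ATP pad specifications. It models bulk conduction only and ignores the contact resistance at the pad surfaces, which is what poor attachment adds on top.

```python
def pad_resistance_k_per_w(thickness_m: float, conductivity_w_per_mk: float,
                           area_m2: float) -> float:
    """Bulk conduction resistance of a pad: R = t / (k * A), in K/W (same as °C/W)."""
    return thickness_m / (conductivity_w_per_mk * area_m2)


def temperature_drop_c(power_w: float, resistance_k_per_w: float) -> float:
    """Temperature difference across the pad when power_w flows through it."""
    return power_w * resistance_k_per_w


if __name__ == "__main__":
    thickness = 1.0e-3          # 1 mm pad (hypothetical)
    area = 20e-3 * 20e-3        # 20 mm x 20 mm contact patch (hypothetical)
    for k in (3.0, 1.0):        # hypothetical filler-rich vs silicone-rich pads, W/(m*K)
        r = pad_resistance_k_per_w(thickness, k, area)
        print(f"k={k} W/mK: R={r:.2f} K/W, drop at 3 W = {temperature_drop_c(3.0, r):.1f} °C")
```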

 

Cadence Simulation Tool

As mentioned in Part 2, the Cadence Simulation software can be used to run component-level thermal simulation, which lets you compare which material and/or heatsink solution is the best option. Component-level simulation takes several factors into account, including ambient temperature, airflow, and the thermal resistance and power consumption of the main components. Cadence Simulation is a pure hardware simulation based on full-speed operation (worst-case scenario).

The thermography below compares thermal data for a bare SSD and an SSD with ATP’s 8 mm heatsink. In this example, the ambient temperature is higher at 70°C, but with 1200 ft/m of airflow. At full speed, the controller temperature of the bare SSD rises to 145°C. With the 8 mm heatsink plus sufficient airflow, it goes down to 133°C, a 12°C reduction.

NOTE: Pure hardware simulation test under the worst-case scenario (full speed, without the firmware-based thermal throttling mechanism)

 
What You Wear Keeps You Cooler

Special quick-dry outfits worn by athletes provide protection and keep moisture or sweat away from the skin to keep it cool. The same is true for NVMe SSDs. ATP’s special heatsink solutions keep heat away from the SSD and reduce the temperature of the controller, where heat typically accumulates.

The graph below shows that the controller temperature drops from 68.5°C on a bare SSD to 46.9°C with ATP’s 4 mm heatsink, and further down to 30°C with the 8 mm heatsink. These images were taken at room temperature with minimal airflow of 450 linear feet per minute (LFM), after a 30-minute test of 100% sequential writes.
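The figures reported in this part can be compared side by side as temperature differences. A difference in °C is the same in kelvins, whereas a percentage of a Celsius reading depends on the arbitrary zero of the scale, so the sketch below reports deltas only. The scenario labels are descriptive strings added here; the temperatures are the ones stated in the text.

```python
from typing import List, Tuple

# (scenario, bare-SSD controller °C, controller °C with heatsink), as reported in this article
COMPARISONS: List[Tuple[str, float, float]] = [
    ("70 °C ambient, 1200 ft/m, 8 mm heatsink (Cadence simulation)", 145.0, 133.0),
    ("Room temperature, 450 LFM, 4 mm heatsink (30-minute write test)", 68.5, 46.9),
    ("Room temperature, 450 LFM, 8 mm heatsink (30-minute write test)", 68.5, 30.0),
]


def reduction_c(bare_c: float, with_heatsink_c: float) -> float:
    """Temperature reduction in °C (a difference, so it is the same in kelvins)."""
    return bare_c - with_heatsink_c


if __name__ == "__main__":
    for scenario, bare, cooled in COMPARISONS:
        print(f"{scenario}: {bare} -> {cooled} °C, reduction {reduction_c(bare, cooled):.1f} °C")
```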

                                                                                                                  

ATP Heat Dissipation Solutions

Not every system has room for a powerful heatsink. To address different space constraints, ATP offers the heat dissipation solutions described in the table below.

 

Heat Dissipation Solution

Type             Copper Foil       4 mm Heatsink              8 mm Heatsink
Length × Width   80 × 22 mm        80 × 24.4 mm               80 × 24.4 mm
Height           3.9 mm            8.3 mm                     12.3 mm
Material         Copper            Upper: Aluminum alloy      Upper: Aluminum alloy
                                   Bottom: Stainless steel    Bottom: Stainless steel
Suitability      Limited space     Enough space for effective heat dissipation
Fixing method    Adhesive          Clip design                Clip design
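When the free clearance around the M.2 slot is known, the table can be turned into a simple selection rule. The sketch below assumes, as the 4 mm versus 8 mm comparison above suggests, that the tallest option that fits dissipates heat best, and it treats each listed height as the total clearance required. Both are simplifying assumptions, and `pick_solution` is a hypothetical helper, not an ATP tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Solution:
    name: str
    length_mm: float
    width_mm: float
    height_mm: float


# Dimensions as listed in the table above
SOLUTIONS = (
    Solution("Copper Foil", 80.0, 22.0, 3.9),
    Solution("4 mm Heatsink", 80.0, 24.4, 8.3),
    Solution("8 mm Heatsink", 80.0, 24.4, 12.3),
)


def pick_solution(max_length_mm: float, max_width_mm: float,
                  max_height_mm: float) -> Optional[Solution]:
    """Return the tallest listed solution that fits the free space, or None."""
    fitting = [s for s in SOLUTIONS
               if s.length_mm <= max_length_mm
               and s.width_mm <= max_width_mm
               and s.height_mm <= max_height_mm]
    return max(fitting, key=lambda s: s.height_mm, default=None)


if __name__ == "__main__":
    choice = pick_solution(80.0, 25.0, 10.0)
    print(choice.name if choice else "no listed solution fits")   # 4 mm Heatsink
```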

 

Here is another example showing the importance of choosing the right gear. The figures below show that the bare SSD repeatedly slows down to cool off whenever its composite temperature rises. The 8 mm heatsink, supported by airflow, keeps the SSD cool by dissipating heat.

As the heatsink continues to reduce the composite temperature of the NVMe SSD, steady performance is achieved with ATP’s unique firmware (FW) algorithm, resulting in better sustained performance.
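The composite temperature discussed here is the value an NVMe drive reports to the host in its SMART / Health Information log page (Log Identifier 02h): bytes 2:1 hold the Composite Temperature in kelvins (little-endian), and bit 1 of the Critical Warning byte (byte 0) flags a temperature threshold condition. On Linux, `nvme smart-log` from nvme-cli prints these fields. The sketch below decodes them from a raw 512-byte log buffer; how the buffer is read from the device is left out, and `parse_smart_thermal` is a hypothetical name.

```python
from typing import Dict, Union

SMART_LOG_SIZE = 512
KELVIN_OFFSET = 273.15


def parse_smart_thermal(log_page: bytes) -> Dict[str, Union[float, bool]]:
    """Decode thermal fields from an NVMe SMART / Health Information log (02h)."""
    if len(log_page) < SMART_LOG_SIZE:
        raise ValueError(f"expected {SMART_LOG_SIZE} bytes, got {len(log_page)}")
    critical_warning = log_page[0]                               # byte 0
    composite_k = int.from_bytes(log_page[1:3], "little")        # bytes 2:1, kelvins
    return {
        "composite_temp_c": composite_k - KELVIN_OFFSET,
        "temperature_warning": bool(critical_warning & 0x02),    # bit 1
    }


if __name__ == "__main__":
    # Synthetic buffer: composite temperature 358 K (about 85 °C), warning bit set
    page = bytearray(SMART_LOG_SIZE)
    page[0] = 0x02
    page[1:3] = (358).to_bytes(2, "little")
    print(parse_smart_thermal(bytes(page)))
```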

 

                                                                         

 

 

Part 5: Pacing Strategy

This part discusses the importance of having a pacing strategy. For marathon runners, this means managing their energy throughout the race. At ATP, we use the Dynamic Thermal Throttling mechanism to manage the heat of our NVMe SSDs and ensure sustained performance at optimal levels.

Steady Wins the Race!

When running heavy workloads at high temperatures, drives trigger “thermal throttling” to slow down and prevent overheating. The downside is that the SSD cannot sustain the optimum operation required to complete the tasks at hand.

ATP Dynamic Thermal Throttling Mechanism

Unlike standard throttling mechanisms, the ATP Dynamic Thermal Throttling mechanism does not push the SSD to its temperature limit and then sharply drop its speed to cool it down. Instead, it strikes a delicate balance between performance and temperature by continuously monitoring the device temperature and adjusting the pace.
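ATP does not describe its firmware algorithm here, so the sketch below substitutes a generic proportional controller to illustrate the difference in behavior between the two approaches. It uses a hypothetical first-order thermal model with made-up limits: the threshold policy runs at full speed until 80 °C and then cuts to 30%, while the proportional policy starts easing off at 70 °C. This illustrates the general idea only; it is not ATP’s Dynamic Thermal Throttling implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical first-order thermal model (illustrative numbers, not ATP data)
AMBIENT_C = 50.0             # surrounding temperature
RISE_AT_FULL_SPEED_C = 50.0  # steady-state rise above ambient at 100% speed
ALPHA = 0.05                 # fraction of the gap to steady state closed per step

Policy = Callable[[float, float], float]  # (temperature, current speed) -> new speed


def simulate(policy: Policy, steps: int = 400) -> Tuple[List[float], List[float]]:
    temp, speed = AMBIENT_C, 1.0
    temps: List[float] = []
    speeds: List[float] = []
    for _ in range(steps):
        speed = policy(temp, speed)
        steady = AMBIENT_C + RISE_AT_FULL_SPEED_C * speed
        temp += ALPHA * (steady - temp)     # move part of the way toward steady state
        temps.append(temp)
        speeds.append(speed)
    return temps, speeds


def step_throttle(temp: float, speed: float,
                  limit: float = 80.0, resume: float = 70.0, low: float = 0.3) -> float:
    """Threshold style: run flat out until the limit, then drop sharply."""
    if temp >= limit:
        return low
    if temp <= resume:
        return 1.0
    return speed  # between resume and limit, keep the current state (hysteresis)


def proportional_throttle(temp: float, speed: float,
                          start: float = 70.0, span: float = 20.0, low: float = 0.3) -> float:
    """Continuous style: ease off gradually as temperature rises above `start`."""
    return max(low, min(1.0, 1.0 - (temp - start) / span))


def sharp_drops(speeds: List[float], threshold: float = 0.5) -> int:
    """Count steps where speed fell by more than `threshold` at once."""
    return sum(1 for a, b in zip(speeds, speeds[1:]) if a - b > threshold)


if __name__ == "__main__":
    for name, policy in (("step", step_throttle), ("proportional", proportional_throttle)):
        temps, speeds = simulate(policy)
        avg = sum(speeds) / len(speeds)
        print(f"{name}: peak {max(temps):.1f} °C, sharp drops {sharp_drops(speeds)}, "
              f"mean speed {avg:.2f}")
```

In this toy model the threshold policy repeatedly climbs to the limit and then collapses its speed, while the proportional policy settles below the limit without abrupt drops.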

ATP Thermal Management Solutions combine both hardware (heatsink) and firmware (Dynamic Throttling mechanism) to make sure that the SSD delivers optimum sustained performance throughout its operation.

Continuously monitoring the device temperature and adjusting the pace also lowers power consumption, unlike SSDs that always run at full speed and waste a lot of energy. Fewer fan operations mean that less energy is required and less noise is generated.

ATP Thermal Management Solutions combine HW and FW to provide a pacing strategy that results in steady speed and efficient power management.

Simulation and Customization: One Scenario Does Not Fit All

At ATP, we believe that every application scenario presents unique thermal requirements. This is why we carefully consider different factors to come up with a solution that fits the specific needs. One scenario does not fit all, so we offer customization options.

The following table summarizes some of the scenarios presented by customers — different airflow environments, ambient temperatures, workloads, heatsink types according to available space in the system design, and respective test results.

Simulation and Customization Table


With customers’ varied application requirements, ATP welcomes inquiries for customization. For more information on ATP’s customizable thermal management solutions for M.2 and U.2 NVMe drives, visit the ATP website or contact an ATP Representative.

 
