The Challenge
One of the most critical issues for high-performance, high-density NVMe solid state drives (SSDs) is heat. Overheating can result from stacking multiple dies in each integrated circuit (IC) package, from densely packed components in limited printed circuit board (PCB) space, especially in double-sided designs, and from intense workloads. Compact in size yet big in performance, these ultra-fast SSDs are also often installed in enclosures with limited or no airflow.
Excessive heat can cause thermal shutdown, which can damage the SSD and compromise the data stored on it. To prevent this, SSDs are typically equipped with a thermal throttling mechanism, which cools the device by reducing the clock speed when a certain temperature is reached. The challenge, however, is that such a mechanism causes drastic performance drops and thus makes it difficult to sustain performance.
For systems equipped with powerful airflow capability, heat dissipation may not be an issue; however, there could be other concerns such as power management and the noise of fans.
The Solution
ATP recognizes that thermal challenges differ across use cases and scenarios; hence, a “one-size-fits-all” approach may not be the most suitable. To meet a customer’s specific thermal requirements, ATP offered a holistic and customizable solution that combined firmware and hardware technologies.
Joint Validation Service
ATP worked with system developers to overcome the challenges unique to the specific case. By understanding the performance criteria, user application, and system specifications (including, but not limited to, temperature, workload, airflow, and mechanical design), ATP was able to tailor an NVMe solution for the customer.
ATP’s customized thermal management solution consists of the following components:
Dynamic Thermal Throttling: Adaptive thermal control through the ATP Dynamic Thermal Throttling mechanism balances performance against temperature instead of imposing a dramatic performance reduction. Temperature sensors continuously monitor the device temperature. Based on these readings, the firmware (FW) lowers performance gradually so that the temperature is brought back under control.
HW Heatsink Solution: A variety of hardware (HW) heatsink options (materials, dimensions, types) are considered to match the mechanical constraints of each system design.
Garbage Collection FW Tuning: A periodic background refresh offsets the significant performance drop caused by lengthy garbage collection.
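To illustrate the difference between single-threshold throttling and the gradual behavior described for Dynamic Thermal Throttling, here is a minimal sketch in Python. It is not ATP's firmware algorithm, which is not published here. The function names, the temperature thresholds, the linear ramp, and the 25% performance floor are all hypothetical values chosen only for illustration.

```python
def standard_throttle(temp_c: float, trip_c: float = 75.0,
                      throttled: float = 0.25) -> float:
    """Single-threshold throttling (hypothetical values).

    Returns the fraction of full performance allowed: 1.0 below the
    trip point, and a fixed low level at or above it.
    """
    return 1.0 if temp_c < trip_c else throttled


def dynamic_throttle(temp_c: float, start_c: float = 70.0,
                     limit_c: float = 85.0, floor: float = 0.25) -> float:
    """Gradual throttling (hypothetical linear ramp).

    Performance stays at 1.0 up to start_c, then decreases linearly
    until it reaches `floor` at limit_c.
    """
    if temp_c <= start_c:
        return 1.0
    if temp_c >= limit_c:
        return floor
    # Position of the current temperature within the ramp, from 0.0 to 1.0.
    span = (temp_c - start_c) / (limit_c - start_c)
    return 1.0 - span * (1.0 - floor)


if __name__ == "__main__":
    # Compare the two policies over a rising temperature sweep.
    for temp in (65.0, 72.5, 77.5, 82.5, 90.0):
        print(temp, standard_throttle(temp), dynamic_throttle(temp))
```

In this sketch, the single-threshold policy falls from full speed straight to the floor once the trip point is crossed, while the ramped policy gives up performance in proportion to how far the temperature has risen. A real drive would act on live sensor readings and would likely add hysteresis and workload awareness.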
The Result
With ATP and the customer working closely together, an optimized solution combining both HW and FW met the customer’s needs. As the graph below shows, performance dropped sharply when standard thermal throttling was used. ATP NVMe SSDs with the customized thermal management solution, on the other hand, delivered higher sustained write performance at an ambient temperature (Ta) of 80°C.