tldr: I analyzed the accuracy of the HDD/SSD data we collected. Based on my testing, it is about 95% accurate, so I am happy with the result.
The Databricks notebook I used for the analysis: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/121937/command/121950. You probably can't view it because of permissions; feel free to ask me to grant you access.
Total HDD: 296086
Verified HDD: 280212
Total SSD: 483759
Verified SSD: 468225
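For reference, the per-type and overall ratios can be derived from the counts above. This is a minimal sketch, assuming "verified" means the collected disk type matched the reference disk type; the function name `verified_ratio` is only illustrative and is not taken from the notebook.

```python
# Counts copied from the summary above.
counts = {
    "HDD": {"total": 296086, "verified": 280212},
    "SSD": {"total": 483759, "verified": 468225},
}


def verified_ratio(verified: int, total: int) -> float:
    """Return the fraction of records that were verified."""
    return verified / total


for disk_type, c in counts.items():
    ratio = verified_ratio(c["verified"], c["total"])
    print(f"{disk_type}: {ratio:.1%}")

# Overall ratio across both disk types.
all_total = sum(c["total"] for c in counts.values())
all_verified = sum(c["verified"] for c in counts.values())
print(f"Overall: {verified_ratio(all_verified, all_total):.1%}")
```

With these counts the ratios come out to roughly 94.6% for HDD, 96.8% for SSD, and 96.0% overall.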
I'd like to summarize how I verified the data, just for the record.
The challenge was that I couldn't find enough disk information from reliable sources that provide both the model name and the disk type (HDD or SSD), which I needed to compare against the data we collected. I was able to get some model and disk-type data from disk benchmarking websites, but it wasn't enough: a lot of models were missing. So I wrote a script that runs a Google search for a given model and checks whether the search results contain more SSD-related terms than HDD-related terms; if so, the model is treated as an SSD, otherwise as an HDD. I did some smoke tests (i.e., I manually googled the model name and checked the disk type) to make sure this method was acceptable, and it seemed quite accurate to me.
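The term-counting step can be sketched roughly as follows. This is not the actual script: the term lists `SSD_TERMS` and `HDD_TERMS` and the function names are hypothetical, and the search itself is left out, so the function simply takes the text of the search results as input.

```python
import re

# Hypothetical term lists; the terms used by the real script are not listed here.
SSD_TERMS = ("ssd", "solid state", "nvme", "nand", "flash")
HDD_TERMS = ("hdd", "hard disk", "hard drive", "rpm", "platter")


def count_terms(text: str, terms: tuple) -> int:
    """Count whole-word, case-insensitive occurrences of each term in text."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    )


def classify_disk(search_text: str) -> str:
    """Return "SSD" if SSD terms outnumber HDD terms, otherwise "HDD"."""
    ssd_hits = count_terms(search_text, SSD_TERMS)
    hdd_hits = count_terms(search_text, HDD_TERMS)
    # Ties and texts with no matching terms fall back to "HDD",
    # following the "otherwise HDD" rule described above.
    return "SSD" if ssd_hits > hdd_hits else "HDD"


print(classify_disk("Samsung 860 EVO SSD review: solid state drive with NAND flash"))
print(classify_disk("WD Blue 1TB hard drive, 7200 RPM HDD"))
```

One consequence of the "otherwise HDD" rule is that a model with no useful search results is also labeled HDD, which may be worth keeping in mind when reading the HDD numbers.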