
Q: Any way to reset/clear SMART attributes (e.g. 199 UDMA_CRC_Error_Count) #172

Open
stevecs opened this issue Dec 7, 2024 · 3 comments

Comments

@stevecs

stevecs commented Dec 7, 2024

This is more of a question, so if this is the wrong forum, let me know. I have been looking for a way to clear/reset some SMART attributes, or for how to go about it. In particular, I am looking at 199 UDMA_CRC_Error_Count, since a good number of my drives have values there due to misbehaving backplanes or bad cables/HBAs in the past.

Yes, I can track each attribute to see whether it increases, but that gets harder to monitor across hundreds of drives. The ability to reset these counters to zero would be very useful, and likewise for other attributes such as 188 Command Timeout.
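Until the counters can be cleared, one common workaround is to snapshot the raw values once and alert only on increases. A minimal sketch, assuming the raw attribute values have already been collected per drive serial number (the collection step and the nested-dict layout used here are hypothetical):

```python
from typing import Dict

# Attribute IDs to watch: 199 (UDMA CRC errors) and 188 (command timeout).
WATCHED_ATTRIBUTES = (199, 188)


def counter_deltas(baseline: Dict[str, Dict[int, int]],
                   current: Dict[str, Dict[int, int]]) -> Dict[str, Dict[int, int]]:
    """Return, per drive serial, only the watched attributes whose raw value
    has grown since the baseline snapshot, mapped to the size of the increase."""
    increases: Dict[str, Dict[int, int]] = {}
    for serial, attrs in current.items():
        # A drive missing from the baseline is compared against zero, so its
        # first appearance reports its full historical count.
        base = baseline.get(serial, {})
        for attr_id in WATCHED_ATTRIBUTES:
            if attr_id not in attrs:
                continue
            delta = attrs[attr_id] - base.get(attr_id, 0)
            if delta > 0:
                increases.setdefault(serial, {})[attr_id] = delta
    return increases
```

For example, with a baseline of `{"ZA1": {199: 1520, 188: 3}, "ZA2": {199: 0, 188: 0}}` and a new snapshot of `{"ZA1": {199: 1520, 188: 4}, "ZA2": {199: 7, 188: 0}}`, the function returns `{"ZA1": {188: 1}, "ZA2": {199: 7}}`, so the old 1520 CRC errors on ZA1 no longer show up as noise.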

I know these can be cleared by the OEM on refurbished drives, and I've seen some instances where certain firmware updates clear them, but so far I have not found any way for general or advanced users to clear them.

The vast majority of our rotating-rust drives are Seagate (ST4000s through ST20000s), in case this is an OEM-specific command.

@vonericsen
Contributor

Hi @stevecs,

Thanks for the question!
There is no way to do this for SMART attributes on ATA drives.
SAS is a bit different: log pages can be reset, although not every counter is resettable.
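For comparison, on SAS a log page reset is requested with the SCSI LOG SELECT command and its parameter code reset (PCR) bit. The sketch below only builds the 10-byte CDB and sends nothing; the field layout is my reading of SPC-4, and the default page control value (01b, current cumulative values) and the placeholder page code are assumptions to verify against the standard and the drive before issuing it through any pass-through interface:

```python
LOG_SELECT_OPCODE = 0x4C  # LOG SELECT (10)


def log_select_pcr_cdb(page_code: int = 0x00, subpage_code: int = 0x00,
                       page_control: int = 0b01) -> bytes:
    """Build a LOG SELECT CDB with PCR=1 and a zero parameter list length.

    page_control: PC field (bits 7:6 of byte 2); 01b is an assumed default.
    page_code/subpage_code: log page to target; 0x00 is only a placeholder.
    """
    cdb = bytearray(10)
    cdb[0] = LOG_SELECT_OPCODE
    cdb[1] = 0x02  # PCR (bit 1) = 1, SP (bit 0) = 0: do not save parameters
    cdb[2] = ((page_control & 0x3) << 6) | (page_code & 0x3F)
    cdb[3] = subpage_code & 0xFF
    # Bytes 4-6 are reserved, bytes 7-8 (parameter list length) stay 0 because
    # no parameter data is sent, and byte 9 (CONTROL) stays 0.
    return bytes(cdb)
```

Keeping the CDB construction separate from the transport makes it easy to inspect the exact bytes before handing them to whatever SCSI pass-through layer is in use.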

Since SMART attributes are obsolete and are being replaced by Device Statistics, I checked, and there is a feature that can be used to reset some statistics. The SATA Phy Event Counters log has something similar.
openSeaChest does not currently have an option to reset these, but I will look into adding those options.

The Phy Event Counters log includes the CRC counter, and Device Statistics has both the CRC counter and command timeouts (there it is called "Number of Resets Between Command Acceptance and Command Completion").
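To show where the CRC count sits in the Phy Event Counters log, here is a minimal parser sketch over a raw 512-byte buffer. The layout it assumes (entries start at byte 4; each is a little-endian 16-bit identifier whose bits 14:12 give the counter length in 16-bit words, followed by the counter; an identifier of 0 ends the list) and the meaning of counter ID 0001h as the ICRC error count are my reading of the SATA specification, not something stated in this thread. Reading the log off a drive (log address 11h) is left out.

```python
import struct
from typing import Dict

LOG_SIZE = 512
ICRC_COUNTER_ID = 0x0001  # assumed: "command failed with ICRC error" counter


def parse_phy_event_counters(log: bytes) -> Dict[int, int]:
    """Parse a raw Phy Event Counters log page into {counter_id: value}."""
    if len(log) != LOG_SIZE:
        raise ValueError("expected one 512-byte log page")
    counters: Dict[int, int] = {}
    offset = 4  # assumed: the first 4 bytes are reserved
    while offset + 2 <= len(log):
        (ident,) = struct.unpack_from("<H", log, offset)
        if ident == 0:
            break  # identifier 0 terminates the list
        size = ((ident >> 12) & 0x7) * 2  # counter length in bytes
        offset += 2
        if size == 0 or offset + size > len(log):
            break  # malformed entry: stop rather than misread the rest
        value = int.from_bytes(log[offset:offset + size], "little")
        # Drop the size bits but keep bit 15 (vendor specific) so vendor
        # counters cannot collide with the standard IDs.
        counters[ident & 0x8FFF] = value
        offset += size
    return counters
```

With a buffer holding a 32-bit counter 0001h = 7 followed by a 16-bit counter 000Ah = 3, `parse_phy_event_counters(...)[ICRC_COUNTER_ID]` gives 7.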

I will test a few different products for these features as well and update this issue as I find out more.

@stevecs
Author

stevecs commented Dec 9, 2024

@vonericsen Thanks for taking a look; I'll be interested in what you find.

Yes, I've been seeing the 'slow demise' of SMART attributes over the years (not to mention that they were never really standardized or enforced), but they did at least provide a lot of very useful data (and I have always wanted similar details from SAS/SCSI/FC devices over the last ~40 years).

I was not aware of "Device Statistics" for SATA drives (to be fair, I only have a couple hundred SATA drives; most are SAS/FC or NVMe), so that's interesting. I'd be interested if you could point me to any URLs for the specs or standards for some "bedtime reading".

@vonericsen
Contributor

@stevecs,

> Yes, I've been seeing the 'slow demise' of SMART attributes over the years (not to mention that they were never really standardized or enforced), but they did at least provide a lot of very useful data (and I have always wanted similar details from SAS/SCSI/FC devices over the last ~40 years).

Yeah, there are multiple reasons for this, some dating back to when SMART was released in ATA-3.
There was an attempt to create standardized attributes that made it as far as a draft, but it ran into various issues. One was that vendors wanted to report more data than SMART had room for, and this was one of the main factors driving the creation of the Device Statistics log in its place. There is also a technical report called SMART Attribute Descriptions (SAD), which the committee used as a reference to decide which attributes should be standardized, based on what they could find searching around the web and what committee members reported.

Seagate's firmware group has not given any timeline in which SMART attributes will be removed, but the device statistics log has been supported for quite a while now.

> I was not aware of "Device Statistics" for SATA drives (to be fair, I only have a couple hundred SATA drives; most are SAS/FC or NVMe), so that's interesting. I'd be interested if you could point me to any URLs for the specs or standards for some "bedtime reading".

ACS-3 was the first spec to define the majority of the Device Statistics log for SATA, and it is very similar to the standardized output of SAS/FC log pages. The SAT specs even translate many of these statistics into those log pages today.
All the most common attributes now have a corresponding statistic, although some may have a slightly different name (like the command timeout I mentioned).
Statistics have been added to the log over time, including ones for zoned devices and, most recently, CDL (Command Duration Limits).
I have not yet checked how long the option to reinitialize certain statistics has been in the standard, but I will look that up when I start implementing that option.
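For anyone reading the Device Statistics log directly, here is a minimal decoding sketch for the two counters discussed in this issue. It assumes one 512-byte page of the log is already in memory. The page numbers and offsets (page 04h offset 10h for command resets, page 06h offset 18h for interface CRC errors), the header layout, and the flag bit positions are my reading of ACS-3/ACS-4 and should be checked against the standard. In particular, treating bit 58 as the per-statistic "read then initialize supported" flag is how I understand the reinitialize option mentioned above to be advertised; that is an assumption, not something confirmed here.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed (page number, byte offset within the 512-byte page) locations.
CMD_RESETS: Tuple[int, int] = (0x04, 0x10)            # resets between command acceptance and completion
INTERFACE_CRC_ERRORS: Tuple[int, int] = (0x06, 0x18)  # number of interface CRC errors


@dataclass
class Statistic:
    supported: bool
    valid: bool
    read_then_initialize: bool
    value: int


def decode_statistic(page: bytes, offset: int) -> Statistic:
    """Decode one 8-byte little-endian statistic: flags in bits 63:56 (assumed)."""
    (qword,) = struct.unpack_from("<Q", page, offset)
    return Statistic(
        supported=bool((qword >> 63) & 1),
        valid=bool((qword >> 62) & 1),
        read_then_initialize=bool((qword >> 58) & 1),  # assumed bit position
        value=qword & ((1 << 56) - 1),
    )


def read_counter(page: bytes, location: Tuple[int, int]) -> Optional[int]:
    """Return the counter value, or None if it is unsupported or not valid."""
    page_no, offset = location
    if page[2] != page_no:  # assumed: header bits 23:16 (byte 2) hold the page number
        raise ValueError(f"expected page {page_no:#04x}, got {page[2]:#04x}")
    stat = decode_statistic(page, offset)
    return stat.value if stat.supported and stat.valid else None
```

Returning `None` for unsupported or invalid entries keeps "the drive does not report this" distinct from a real count of zero, which matters when comparing counts across a mixed fleet.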

openSeaChest_SMART can display this log with --deviceStatistics, and I also added support for SAS devices, reading the various log pages to produce similar output.

Another part of device statistics added to the standard is Device Statistics Notifications. The idea is that the drive can generate a sense code when one of these notifications triggers, similar to a SMART trip event. A notification can be based on a firmware-monitored condition, and the standard also supports programmable notifications.
I do not think there is much software support for setting notifications yet, which is why it is not yet part of openSeaChest; however, we do have a way to show which statistics allow setting a notification.
