[ST16000NM000J] errors after switching sector size to 4096 #157
Hi @insunaa, it is possible that changing the sector size has triggered a bug within the USB adapter's firmware. It sounds like you were able to complete the command and get the drive running again on a SATA port on another system. One case I have seen (although extremely uncommon) is that all the
Thank you! I'll do the test on S-ATA as soon as I'm able. In the meantime this is the shortGeneric test via the USB enclosure:
The 16TBW comes from one pass of badblocks. The thing that throws me off is how or why would a USB HDD controller care what the sector size is? Shouldn't it just pass along ATA/SCSI commands to the controller? How does the sector size even interact with that in any way?
Based on your output it looks like the drive is working properly and that the fast format completed successfully.
I will try to explain this as best I can. USB adapters are all SCSI-to-ATA translators: the host sends SCSI commands and the adapter converts them into the ATA commands that the drive needs. USB translators are notorious for missing translations and for bugs in the translations they do implement.
Keep in mind that many of these devices were first developed before 4K sectors were implemented on any device. Even once 4K became available, most consumer HDD models didn't change to, or allow changing to, a 4K logical sector size, so the adapters still saw everything as a traditional 512B device.
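To make the 512B versus 4K distinction concrete, here is a minimal Python sketch that reads the sector sizes Linux reports for a block device and labels the format. It assumes the standard sysfs files `queue/logical_block_size` and `queue/physical_block_size`; the function names, the `sysfs_root` parameter, and the device name `sdX` are hypothetical and only for illustration. Behind a USB bridge the kernel shows whatever the bridge reports, which may not match what the drive itself says.

```python
from pathlib import Path
from typing import Tuple


def read_sector_sizes(device: str, sysfs_root: str = "/sys/block") -> Tuple[int, int]:
    """Return (logical, physical) sector sizes in bytes as reported by Linux sysfs."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def classify_format(logical: int, physical: int) -> str:
    """Map a (logical, physical) sector size pair to its common name."""
    if logical == 4096:
        return "4Kn"    # 4096B logical sectors exposed to the host
    if logical == 512 and physical == 4096:
        return "512e"   # 512B logical sectors emulated on 4096B physical sectors
    if logical == 512 and physical == 512:
        return "512n"   # native 512B sectors
    return "unknown"


if __name__ == "__main__":
    # "sdX" is a placeholder; substitute the real device name (for example, one listed by lsblk).
    try:
        sizes = read_sector_sizes("sdX")
        print(sizes, classify_format(*sizes))
    except FileNotFoundError:
        print("device not found; pass a real block device name")
```

Calling `classify_format(512, 4096)` on the values from a drive in its factory 512e state returns `"512e"`; after a successful switch to 4096B logical sectors the same check would be expected to return `"4Kn"`.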
For reads and writes, yes. Where it gets tricky is that the SAT spec describes a command called ATA Passthrough. Inside of an ATA Passthrough CDB you specify all the command inputs to send to the ATA drive (task file registers) as well as some additional information that tells the translator how to do the data transfer (direction, protocol, how to calculate the total transfer length).
When you run a drive with a 512B logical sector size, all transfers to and from the device are multiples of 512B. When you run a drive with a 4096B logical sector size, all read and write commands to LBA space are 4096B multiples, while all other commands are still 512B multiples (Identify, read SMART data, read log data, etc).
What seems to be happening with this USB adapter is that it either has something in its firmware hard-coded to expect all data transfers to be multiples of the logical sector size, or it is not properly handling the byte_block field in the CDB that tells the adapter whether the transfer is in multiples of 512B or multiples of the logical sector size (it may be treating these two options as one and the same). So, while your drive was in 512e mode it worked fine, but now that it has been switched to 4096B the adapter cannot complete ATA passthrough requests for sizes that are not multiples of the logical sector size.
Now we get into the next part of this whole mess: there are multiple ways to construct equivalent SAT passthrough CDBs. Because of how ATA has developed, there are a lot of weird cases to handle, which is why SAT passthrough gets complicated quite quickly and why so many different bit fields are used to tell the translator how much data is being transferred.
What I have found over my years of dealing with the headache that is USB is that some adapters just don't work as expected no matter what you do, others work in most situations without a problem, and some need special workarounds to get them to work as well as possible. There are often still various limitations even after adding workarounds, and it's not always possible to find all the corner cases.
In your case this USB adapter is not already in our list, so it used the default rules, which seem to work fine in 512e mode. If you run These are the two outputs from that tool that tend to help me the most: Both of these are data-safe tests. The second is a bit more aggressive and looks for a few additional cases that commonly cause the USB adapter to hang and stop responding, which is why it is not enabled by default.
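The transfer-length problem described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not openSeaChest code: the field layout follows my understanding of the SAT ATA PASS-THROUGH (16) CDB (opcode 0x85, PROTOCOL in byte 1, T_TYPE / T_DIR / BYT_BLOK / T_LENGTH in byte 2), the function names are made up, and `buggy_adapter_transfer_bytes` is a purely hypothetical model of the firmware behaviour guessed at in this thread.

```python
ATA_PASS_THROUGH_16 = 0x85   # SAT opcode for the 16-byte pass-through CDB
PROTOCOL_PIO_DATA_IN = 0x4   # SAT PROTOCOL value for PIO Data-In
ATA_IDENTIFY_DEVICE = 0xEC   # ATA command opcode for IDENTIFY DEVICE
T_LENGTH_IN_COUNT = 0x2      # transfer length is taken from the COUNT field


def build_identify_cdb(t_type: int = 0) -> bytes:
    """Build an ATA PASS-THROUGH (16) CDB for IDENTIFY DEVICE (EXTEND bit clear)."""
    cdb = bytearray(16)
    cdb[0] = ATA_PASS_THROUGH_16
    cdb[1] = PROTOCOL_PIO_DATA_IN << 1   # PROTOCOL sits in bits 4:1 of byte 1
    t_dir = 1                            # 1 = data flows from device to host
    byt_blok = 1                         # 1 = length counted in blocks, 0 = in bytes
    cdb[2] = (t_type << 4) | (t_dir << 3) | (byt_blok << 2) | T_LENGTH_IN_COUNT
    cdb[6] = 1                           # COUNT (7:0): one block
    cdb[14] = ATA_IDENTIFY_DEVICE
    return bytes(cdb)


def expected_transfer_bytes(cdb: bytes, logical_sector_size: int) -> int:
    """Transfer size a conforming translator should compute (28-bit commands only)."""
    t_length = cdb[2] & 0x3
    byt_blok = (cdb[2] >> 2) & 0x1
    t_type = (cdb[2] >> 4) & 0x1
    if t_length == 0:
        return 0                         # no data transfer
    if t_length == 1:
        count = cdb[4]                   # length held in FEATURES (7:0)
    elif t_length == 2:
        count = cdb[6]                   # length held in COUNT (7:0)
    else:
        raise ValueError("T_LENGTH=3 is not modelled in this sketch")
    if not byt_blok:
        return count                     # length is a plain byte count
    block = logical_sector_size if t_type else 512
    return count * block


def buggy_adapter_transfer_bytes(cdb: bytes, logical_sector_size: int) -> int:
    """Hypothetical faulty firmware: always multiplies by the logical sector size."""
    return cdb[6] * logical_sector_size


if __name__ == "__main__":
    cdb = build_identify_cdb()
    for size in (512, 4096):
        print(size, expected_transfer_bytes(cdb, size), buggy_adapter_transfer_bytes(cdb, size))
```

With a 512B logical sector size both calculations agree on 512 bytes, which would explain why everything worked in 512e mode. With a 4096B logical sector size the hypothetical adapter expects 4096 bytes for a command that only ever returns 512, which is one way the Identify and SMART requests could end up failing.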
@vonericsen You, sir, are a saint. passthroughtest.txt If there are any other tests you'd like me to run, regardless of data safety as this HDD is meant to become a cold spare drive anyway, please do tell me. Thank you very much, again! Edit: Is it feasible to convert the drive back to 512e or will trying to use |
Thanks! I do my best to explain these problems in a way others can understand. Would you mind running this in verbose mode? It did not find a way to get the ATA passthrough commands to work with the known workarounds.
You can convert it back to 512e if you want. I don't expect it will make anything worse; however, changing the sector size on the drive does carry its own risks, as you can see in all the warnings (and in some issues/discussions from other users). Converting it back to 512e will get it reporting SMART data again with this adapter, since that was working for you originally.
@vonericsen Thank you for your reply! The resulting file was too large to attach to GitHub, so I sent an e-mail with a Google Drive link to the e-mail address you list on GitHub. I will convert the device back to 512e later tonight and run another passthrough test. This is probably the best course of action since the other Seagate drives in my array, which are currently actively in use, are also formatted to 512e, so there is no real reason to break the mold after all.
@vonericsen Thank you for your patience. Here are the passthrough tests with 512e: passthroughtest2.txt
Adding user reported USB bridge info to try and enhance support for it further. This is from running a 512e drive, but 4KN it does not seem to work, so more debugging will be necessary to figure out if there is another possible workaround. [Seagate/openSeaChest#157] Signed-off-by: Tyler Erickson <[email protected]>
Thank you for the email! I found the same enclosure on Amazon for the US. I'll see if we can get it or if I can get the same adapter another way. I pushed the info about what worked properly from the 512e output so at least that is in there for now. I have a couple of ideas on things that I can try to see if I can get it working in 4KN mode, but no guarantee they will work. Let me see if I can get this adapter here, as I would like to figure out if there is a possible workaround, since USB adapters are often used to quickly move drives around and recover data from them. Luckily it does at least still read and write, but it would be nice to pull other drive data as well.
I was testing a new X18 16TB S-ATA drive I bought a few days ago via a USB dock, since all of my hot-swap bays are in use and cannot currently be spared. It showed up normally in smartctl under Linux, and after a 24h SMART long test all came back clear.
I then attempted to swap to 4096 sector size, despite the warnings, which failed and left the drive in a temporarily inoperable state. I then plugged the drive into my desktop PC, where I connected it with S-ATA directly, and using the latest release of openSeaChest at the time of writing I ran the command again. This time it apparently completed successfully: smartctl -a properly reported all of the info and openSeaChest seemed happy.
I disconnected the drive again, but when I plugged it back into the USB enclosure to do a badblocks test on my server, smartctl no longer detected the device.
When I run openSeaChest_Basic -i --SATInfo I get the following output:
Is there any way to recover from this state, given that showing all of this information has worked through that USB enclosure in the past?
I've tested the enclosure on an AMD64 system, too. This is not caused by the current machine being armv7.
Edit: The enclosure itself should not be broken. I tested it with an Intel SSD and all SMART values were reported correctly.