I’m not sure why every time I look at this project, it rubs me the wrong way. Anyone found anything wrong with it?
I’m not sure why every time I look at this project, it rubs me the wrong way. Anyone found anything wrong with it?
As far as I can tell it dates back to at least 2010 - https://docs.oracle.com/cd/E19253-01/819-5461/githb/index.html. See the Solaris version. You can try it with small test files in place of disks and see if it works. I haven’t done it expansion yet but that’s my plan for growing beyond the 48T of my current pool. I use ZFS on Linux btw. Works perfectly fine.
I think data checksums allow ZFS to tell which disk has the correct data when there’s a mismatch in a mirror, eliminating the need for 3-way mirror to deal with bit flips and such. A traditional mirror like mdraid would need 3 disks to do this.
Not that I want to push ZFS or anything, mdraid/LVM/XFS is a fine setup, but for informational purposes - ZFS can absolutely expand onto larger disks. I wasn’t aware of this until recently. If all the disks of an existing pool get replaced with larger disks, the pool can expand onto the newly available space. E.g. a RAIDz1 with 4x 4T disks will have usable space of 12T. Replace all disks with 8T disks (one after another so that it can be done on the fly) and your pool will have 24T of space. Replace those with 16T and you get 48T, and so on. In addition you can expand a pool by adding another redundant topology just like you can with LVM and mdraid. E.g. 4x 4T RAIDz1 + 3x 8T RAIDz2 + 2x 16T mirror for a total of 44T. Finally, expanding existing RAIDz with additional disks has recently landed too.
And now for pushing ZFS - I was doing file based replication on a large dataset for many years. Just going over all the hundreds of thousands of dirs and files took over an hour on my setup. That’s then followed by a diff transfer. Think rsync or Syncthing. That’s how I did it on my old mdraid/LVM/Ext4 setup, and that’s how I continued doing on my newer ZFS setup. Recently I tried using ZFS send/receive which operates within the filesystem. It completely eliminated the dataset file walk and stat phase since the filesystem already knows all of the metadata. The replication was reduced to just the diff file transfer time. What used to take over an hour got reduced to seconds or minutes, depending on the size of the changed data. I can now do multiple replications per hour without significant load on the system. Previously it was only feasible overnight because the system would be robbed of IOPS for over an hour.
If you can, move to a RAID-equivalent setup with ZFS (preferred in my opinion) in order to also know about and fix silent data corruption. RAIDz1, RAIDz2 would do the equivalent to RAID5, RAID6. That should eliminate one more variable with cheap drives.
I’d advise against using docker from docker.com’s repo on Ubuntu unless you need to. Ubuntu LTS includes a fairly recent docker package starting with 22.04. By using that you eliminate the chance for breakage due to a defective or incompatible docker update. You also get the security support for it that comes with Ubuntu. The package is docker.io
.
Virtualization isn’t required for docker on Linux generally, unless a container tries to use KVM or something like that. Also docker already exists in Ubuntu’s repos under the docker.io
package so that’s the easiest place to download (apt install docker.io
) from.
Due to risk of failure or risk of data corruption because the mirror can’t tell which drive is right when there’s a difference?
Already done. I’m just trying to exhaust all the hypotheses I have in case I stumble upon a durable workaround that is applicable for others and cheaper. Good USB add-in cards are not cheap.
I’ve been trying. Nothing has worked so far. I’ve got a few more cables/permutations to try.
Get more drives, run higher redundancy 💪
You’re right, the correct term is Gb/s or Gbps. Edited.
The thing is that I’m already at the last couple of leaves in the investigation tree and I’m not willing to change anything upwards of the USB driver level. That’s why there isn’t much point in getting people to spin their wheels for solutions I can’t or won’t apply. If I was completely unable to get the data corruption and disconnects under control, I’d trash the system and replace it with Intel. Fortunately, a PCIe add-in USB controller seems to work well so I avoided the most costly solution. At this point I don’t actually need to get the motherboard ports to work well but I’m curious to follow down the signalling rabbit hole because I’m not the only one who’s having this problem and the problem doesn’t affect just this one use case. If I find a solution like an in-line 5Gb USB hub (reduces data rate), or just using USB-C ports instead of USB-A (reduces noise), or using this kind of cable instead of that kind, I could throw that as a cheaper workaround in this ZFS thread and elsewhere. The PCIe cards work but aren’t cheap.
Unfortunately it won’t because the transfers are happening between ZFS and the hardware storing the data so I can’t control the data rate at the application level (there are many different applications) or even at the ZFS level. This is why in this particular case I’m stuck with a potential hardware-related workaround. I mean I could do something stupid like configuring a suboptimal recordsize in ZFS but there could still be spikes and I’d prefer to get the hardware to stop losing bits and hoping ZFS would catch that. Decreasing data rates is a generally acceptable strategy to deal with signalling issues, if the decreased rate is usable for the application at hand. In my case it is.
I am trying to transfer data via USB at high speed without data corruption, silent resets and occasional device disconnects. Those are things that happen because the USB controllers on my motherboard made by AMD with some help from ASMedia do not function correctly at the speed they advertise. So given the problem the right solution is to get a firmware or hardware fix for these USB controllers, however that’s unlikely to happen. So I’m trying to find a workaround. I already have one (PCIe add-in card) but now I’m also testing running the bad controllers at half-speed which seems successful so far but I was wondering if there’s a way to do it in software. I’m currently bottlenecking the links by using 5Gb hubs between the controllers and the devices.
Yup. A USB host controller. Specifically AMD Bixby.
Great question. In short, garbagy AMD USB controllers. I recently switched to a newer AMD board and have been hit with the same issues faced by these poor sods. I’ve been conducting testing over the last week, different combinations of ports, cables, loads, add-in PCIe USB controllers. The add-in cards seem to behave well, which is one way the folks from that thread solved their problems. The other being changing to Intel-based systems. Yesterday however I was watching an intro about USB redrivers by TI and they were discussing various signalling issues that could occur and how redrivers help. That led me to form the hypothesis that what I’m experiencing might be signalling related. E.g. that the combination of controllers/ports/cables simply can’t handle 10Gbps. That might be noise from some of those devices or surrounding ones that causes signal loss when operating at 10Gbps, speeds this setup can actually achieve. In order to test that I tried placing the DAS boxes behind a 5Gb hub plugged in a port that has previously shown a failure. So far it’s stable. This is why I was wondering whether there’s some magic in the kernel that could allow configuring 10Gb ports to operate at 5Gb.
The extension cable is a great idea. I’m currently trying 5Gb hubs on the path. Seems to work.
E: I think the USB-A connector for 5Gb and 10Gb is the same. The 10Gb cable must simply carry double the rate without losing data due to noise. Similar to Cat 5 vs Cat 6 ethernet cables. If so an extension should keep the controller-advertised speed downstream. Seems like hubs are the only option.
Oh yes, X would be AMD fixing their defective USB controllers but that won’t happen on a system produced years ago. 😂
I use Immich. It does what you described as well.