Since 2016, I’ve had a fileserver mostly just for backups: the system is on one drive, files are on RAID6, and I do a semi-annual cold backup.
I was playing with Photoprism, and their docs say “we recommend placing the storage folder on a local SSD drive for best performance.” In this case, the storage folder holds basically everything except the pictures themselves, such as the database files.
Up until now, if I lost any database files, it was just a matter of rebuilding them by re-indexing my photos or whatever, but I’m looking for something more robust since I’ll have some friends/family using Pixelfed, Matrix, etc.
So my question is: Is it a valid strategy to keep database files on the SSD with some kind of nightly backup to RAID, or should I just store the whole lot on the RAID from the get go? Or does it even matter if all of these databases can fit in RAM anyway?
edit: I’m just now learning of ZFS caching which might be my answer.
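From what I can tell so far, bolting an SSD read cache (L2ARC) onto an existing pool is roughly the sketch below; the pool and device names are placeholders, and it only caches reads, so database writes still land on the pool itself.

```
# Add the NVMe as an L2ARC read cache to an existing pool called "tank"
# (pool and device names are placeholders):
sudo zpool add tank cache /dev/nvme0n1

# Confirm it shows up and check whether it's actually getting hits:
zpool status tank
zpool iostat -v tank 5
```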
You may be confused about the terminology:
RAID = multiple disks arranged so the array survives a drive failure
NVMe = just an SSD
Drives joined into a RAID group protect you from losing data when one drive dies. An NVMe drive on its own has no such protection: if that drive dies, the data dies.
If those docs say anything about SSDs, it’s because their code is slow, and a faster disk just makes that less obvious.
I understand all of that. Sorry I didn’t explain it well.
I have a RAID6 for data and a single HDD for system files. I’d like to move the system drive to an NVMe/SSD. I suppose I could make another RAID with an additional NVMe, but I’ve found it easier to deal with booting off a single traditional drive.
My plan for redundancy on the NVMe is to just back up the bits I need every night, which are usually just a few hundred megabytes of database files (roughly the sketch below). I’m curious if that’s a typical solution.
edit: to clarify, it’s a software raid with mdadm.
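Roughly what I have in mind, purely as a sketch: the paths, container name, and credentials are placeholders, and the dump step only applies if Photoprism is on MariaDB rather than SQLite.

```
#!/bin/sh
# Nightly: dump the Photoprism database and sync the storage folder from the
# NVMe over to the mdadm array. Every path and name here is a placeholder.
set -e

DEST=/mnt/raid/backups/photoprism/$(date +%F)
mkdir -p "$DEST"

# Dump the database from its container (skip if using the SQLite backend)
docker exec photoprism-mariadb \
  mysqldump -u photoprism -p"$DB_PASSWORD" photoprism > "$DEST/index.sql"

# Copy the rest of the storage folder (cache, sidecar files, thumbnails)
rsync -a --delete /ssd/photoprism/storage/ "$DEST/storage/"
```

Wired up with a cron entry along the lines of `0 3 * * * /usr/local/sbin/photoprism-backup.sh`.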
Hah. I see you’re looking into ZFS caching. Highly recommend. I’m running Ubuntu 24.04 root on ZFS RAID10: twelve data drives and one NVMe cache drive. Gotta say it’s performing exceptionally. ZFS is a bit tricky in that it requires an HBA, not a RAID card; you may have to flash the RAID card to get it working, like I did. I’ve put together a GitHub repo for the root-on-ZFS RAID10 install, and you should easily be able to change it to RAIDZ2. Fair warning: it wipes the drives entirely.
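Not the full root-on-ZFS install (there’s more to it, like partitioning and the bootloader), but the pool layout itself boils down to roughly this; device names are placeholders, and you’d want /dev/disk/by-id/ paths in practice.

```
# Twelve data drives as six two-way mirrors (RAID10-style) plus the NVMe cache:
sudo zpool create tank \
  mirror sda sdb  mirror sdc sdd  mirror sde sdf \
  mirror sdg sdh  mirror sdi sdj  mirror sdk sdl \
  cache nvme0n1

# RAIDZ2 version of the same idea:
# sudo zpool create tank raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl cache nvme0n1
```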
Picked up an LSI SAS 9305-16i. I was always planning to do software RAID, so I think it’ll do the trick for ZFS.
Hell yeah, it will. I need one of those bad boys.
Lucked out on eBay and got it for $50.
Don’t make the same mistake I did: get a backup in place before using ZFS. Using ZFS and RAIDing your drives together makes them a single point of failure. If ZFS fucks up, you’re done. The only way to mitigate this is having another copy in a different pool, and preferably on a different machine. I got lucky that my corrupted ZFS pool was still readable and I could copy files off, but others have not been so lucky.
Yeah, I wouldn’t dare.
The fact that I migrated from a 3-drive to a 6-drive mdadm RAID without losing anything is a damn miracle.
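(Not my exact commands, but a 3-to-6-drive mdadm grow looks roughly like this; device names are placeholders, and you want that backup because the reshape runs for many hours.)

```
# Add the new drives as spares, then reshape to six active devices:
sudo mdadm --add /dev/md0 /dev/sdd /dev/sde /dev/sdf
sudo mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/root/md0-grow.bak

watch cat /proc/mdstat        # follow the reshape
sudo resize2fs /dev/md0       # then grow the filesystem (ext4 shown here)
```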
If your device permits it, run the RAID on the disks and use the NVMe as a cache. My Synology does this.
I have run Photoprism straight from an mdadm RAID5 on some ye olde SAS drives with only a reduction in indexing speed (about 30K photos, which took ~2 hours to index with GPU TensorFlow).
That being said, I’m in a similar boat doing an upgrade, and I have some warnings I’ve found helpful:
- Consumer-grade NVMe drives are not designed for tons of write ops, so they should optimally only be used in RAID 0/1/10. RAID 5/6 will literally start with a massive parity build across the drives, and the default timer for periodic RAID checks on Linux is one week (see the sketch after this list for where that schedule lives). The same goes for ZFS and mdadm caching; just proceed with caution (i.e. 3-2-1 backups) if you go that route. Even if you end up doing RAID 5/6, make sure you get quality hardware with a decent TBW rating, as server-grade NVMe drives often have triple the TBW.
- ZFS is a load of pain if you’re running anything Fedora- or Red Hat-related, and after lots and lots of testing the performance implications are still arguably inconclusive for a NAS/homelab setup. Unless you rely on its specific feature set or are building a genuinely hefty storage node, stock mdadm and LVM will probably fulfill your needs.
- Btrfs has all the features you’d need but its performance is trash; I’d highly recommend XFS for its file-integrity features plus data dedup support, with mdadm/LVM for the rest.
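On the check-timer point above: where that schedule lives depends on the distro, and you can always trigger a check by hand (the array name is a placeholder).

```
cat /etc/cron.d/mdadm 2>/dev/null                      # Debian/Ubuntu checkarray schedule
systemctl list-timers | grep -iE 'mdcheck|raid-check'  # systemd timers on other distros
sudo /usr/share/mdadm/checkarray /dev/md0              # manual check on Debian/Ubuntu
```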
I’m personally going with the NVMe scheduled backups to RAID, because the caching just doesn’t seem worth it when I’m gonna be slamming huge media files around all day along with running VMs and other crap. For context, the 2TB NVMe brand I have is only rated for 1200 TBW. That’s probably more than enough for a file server, but for my homelab server it would just be caching constantly with whatever workload I’m throwing at it. It would still probably last a few years no issues, but SSD pricing has just been awful these past few years.
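If you want to sanity-check actual wear against that TBW rating, smartctl reports it (the device name is a placeholder):

```
sudo smartctl -a /dev/nvme0
# In the output, "Data Units Written" (each unit = 512,000 bytes) is the total
# written so far, and "Percentage Used" is the drive's own endurance estimate.
```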
On a related note, Photoprism needs to upgrade to Tensorflow 2 so I don’t have to compile an antiquated binary for CUDA support.
Thanks for the tips. I’ll definitely at least start with mdadm since that’s what I’ve already got running, and I’ve got enough other stuff to worry about.
Are you worried at all about bit rot? I hear that’s one drawback of mdadm or raid vs. zfs.
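From what I’ve read, the difference is that ZFS checksums every block, so a scrub can detect (and, with redundancy, repair) silent corruption, while an mdadm check only verifies that data and parity agree. As a sketch, with placeholder pool/array names:

```
# ZFS: a scrub reads every block and verifies it against its checksum
sudo zpool scrub tank
zpool status -v tank          # CKSUM column shows anything found/repaired

# mdadm: a check only counts data/parity mismatches; it can't tell which copy is bad
echo check | sudo tee /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt
```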
Also, any word on when photoprism will support the Coral TPU? I’ve got one of those and haven’t found much use for it.
Building RAID on top of SSDs is an answer.
My new motherboard actually has a RAID controller for the M.2 slots. I know people frown on hardware RAID, but given it’s the boot drive, it might just be easiest to count on it for daily operation and back up to the software RAID (or something else) every night.
I meant software RAID, of course. Hardware RAID just causes headaches, but the fake RAID built into motherboards is a real nightmare.
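For what it’s worth, the plain-mdadm version of a two-NVMe system mirror is just this (device names and partition layout are placeholders; the EFI partition has to stay outside the array and be copied to both drives):

```
# Mirror the root partitions of two NVMe drives with mdadm:
sudo mdadm --create /dev/md1 --level=1 --raid-devices=2 \
    /dev/nvme0n1p2 /dev/nvme1n1p2
sudo mkfs.ext4 /dev/md1

# Persist the array config (Debian/Ubuntu path shown):
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
```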