From 6d113980f4341533bd1a2aa483476e30d4db328b Mon Sep 17 00:00:00 2001
From: Yanxin Lu
Date: Sat, 28 Mar 2026 21:47:21 -0700
Subject: [PATCH] vault backup: 2026-03-28 21:47:21

---
 notes/Unraid Drive Replacement Plan.md | 260 +++++++++++++++++-------
 1 file changed, 183 insertions(+), 77 deletions(-)

diff --git a/notes/Unraid Drive Replacement Plan.md b/notes/Unraid Drive Replacement Plan.md
index 4afe143..e00d8e8 100644
--- a/notes/Unraid Drive Replacement Plan.md
+++ b/notes/Unraid Drive Replacement Plan.md
@@ -1,119 +1,225 @@
 # Unraid Drive Replacement Plan
-## Risk Tolerance: 20% cumulative failure probability
+## Goals
+- Reduce array from 8 to 5 HDD drives
+- Replace consumer SMR drives with enterprise CMR
+- Stay under 20% cumulative failure risk before each purchase
+- Minimize cost and hassle
 ## Current Array (as of 2026-03-28)
-| Slot | Model | Serial | Age | Est. AFR | Notes |
-|------|-------|--------|-----|----------|-------|
-| Parity 1 | ST8000DM004 8TB SMR | ZCT2K65B | 5.3 yr | ~4.5% | Had 180 GP errors (cleared) |
-| Parity 2 | ST8000DM004 8TB SMR | ZCT2K5PQ | 5.3 yr | ~3.5% | Had 19 GP errors (cleared) |
-| Disk 1 | ST8000DM004 8TB SMR | ZCT2K3FF | 5.3 yr | ~2.75% | Same batch as parity drives |
-| Disk 2 | ST8000DM004 8TB SMR | ZCT2K51K | 5.3 yr | ~2.75% | 1 CRC error |
-| Disk 3 | ST8000DM004 8TB SMR | ZCT3XAEJ | 4.8 yr | ~3.5% | Had 10 GP errors, 2 CRC, 72K load cycles |
-| Disk 4 | WD40EFRX 4TB CMR | WD-WCC7K2YF4TDJ | 6.6 yr | ~2.5% | Pristine SMART, lowest priority |
-| Disk 5 | ST8000DM004 8TB SMR | ZR14W632 | 2.5 yr | ~1.75% | 56C lifetime max |
-| Disk 6 | ST8000DM004 8TB SMR | ZR150CVS | 2.5 yr | ~1.75% | Clean |
-| Cache | Samsung 970 EVO 1TB | — | — | — | 1% wear, excellent |
+| Slot | Model | Serial | Size | Age | Est. AFR | Health |
+|------|-------|--------|------|-----|----------|--------|
+| Parity 1 | ST8000DM004 SMR | ZCT2K65B | 8TB | 5.3yr | ~4.5% | Had 180 GP errors (cleared) |
+| Parity 2 | ST8000DM004 SMR | ZCT2K5PQ | 8TB | 5.3yr | ~3.5% | Had 19 GP errors (cleared) |
+| Disk 1 | ST8000DM004 SMR | ZCT2K3FF | 8TB | 5.3yr | ~2.75% | Same manufacturing batch as parity drives |
+| Disk 2 | ST8000DM004 SMR | ZCT2K51K | 8TB | 5.3yr | ~2.75% | 1 CRC error, otherwise clean |
+| Disk 3 | ST8000DM004 SMR | ZCT3XAEJ | 8TB | 4.8yr | ~3.5% | Had 10 GP errors, 2 CRC, 72K load cycles |
+| Disk 4 | WD40EFRX CMR | WD-WCC7K2YF4TDJ | 4TB | 6.6yr | ~2.5% | Pristine SMART, only 1.7TB used |
+| Disk 5 | ST8000DM004 SMR | ZR14W632 | 8TB | 2.5yr | ~1.75% | 56C lifetime max |
+| Disk 6 | ST8000DM004 SMR | ZR150CVS | 8TB | 2.5yr | ~1.75% | Clean |
+| Cache | Samsung 970 EVO | S5H9NS0N831296A | 1TB | — | — | 1% wear, excellent |
-All 7 Seagate drives are consumer desktop SMR (not NAS-grade). Dual parity protects against up to 2 simultaneous failures.
+- Combined monthly failure rate: ~1.9%
+- Hits 20% cumulative failure at ~12 months (Mar 2027)
+- Total data stored: ~25TB, growing ~1TB/year
+- All 7 Seagate drives are consumer desktop SMR — not NAS-grade
+
+## Shopping List
+
+| # | Drive | Model | Purpose | Est. Price |
+|---|-------|-------|---------|------------|
+| 1 | Seagate Exos X18 16TB | ST16000NM000J | Parity 1 | ~$220 |
+| 2 | Seagate Exos X18 16TB | ST16000NM000J | Parity 2 | ~$220 |
+| 3 | Toshiba MG08 16TB | MG08ACA16TE | Data (Disk 1) | ~$200 |
+| | | | **Total** | **~$640** |
+
+All are enterprise, CMR, 7200rpm, 5-year warranty.
 ## Timeline
-### March 2027 — Parity 1
+### Phase 0 — Now (free)
-| | Details |
-|--|---------|
-| Remove | ST8000DM004 8TB (ZCT2K65B) |
-| Install | **Seagate Exos X24 24TB** (ST24000NM002H) |
-| Cost | ~$380 |
-| Time | Parity sync ~18 hrs |
-| Risk at swap | ~20% |
+**Remove Disk 4 (WD Red 4TB)**
-**Steps:** Stop array > power down > swap drive > power up > assign to Parity 1 slot > start array > parity sync
+Disk 4 only has 1.7TB used. Other drives have 14.5TB of free space. No purchase needed.
-Keep old drive labeled as cold spare.
+1. Open **Unbalance** plugin
+2. Source: Disk 4
+3. Destination: Disk 3, Disk 5, Disk 6 (most free space)
+4. Start transfer (~30 min for 1.7TB)
+5. Verify Disk 4 is empty
+6. **Stop array** (Main → Stop)
+7. Unassign Disk 4 (click on the drive slot → set to "No Device")
+8. **Start array** → confirm new configuration
+9. Parity sync will run (~12-18 hrs, array is usable during sync)
+10. Power down, physically remove the 4TB drive, power up
+
+**Result: 8 → 7 drives.** Keep the WD Red as a cold spare — it has pristine SMART.
 ---
-### May 2028 — Parity 2 + Disk 3
+### Phase 1 — April 2027
-Do these a week apart. Parity first, then data drive.
+**Buy: 1x Seagate Exos X18 16TB (~$220)**
+**Replace: Parity 1 (ZCT2K65B — worst drive, had 180 GP errors)**
+**Remove: Disk 3 (ZCT3XAEJ — next worst, had 10 GP errors + 2 CRC + 72K load cycles)**
-**Parity 2:**
+Cumulative failure risk at this point: ~20%
-| | Details |
-|--|---------|
-| Remove | ST8000DM004 8TB (ZCT2K5PQ) |
-| Install | **Seagate Exos X24 24TB** (ST24000NM002H) |
-| Cost | ~$380 |
-| Time | Parity sync ~18 hrs |
+#### Step 1: Replace Parity 1
+1. **Stop array**
+2. Power down server
+3. Remove old Parity 1 drive (sdb, serial ZCT2K65B)
+4. Install new Exos X18 16TB in the same slot
+5. Power up → Main page → verify new drive shows in Parity 1 slot
+6. **Start array** → Parity sync begins (~12-18 hrs)
+7. Array is fully usable during sync — avoid heavy writes
+8. Wait for sync to complete (check Main page for progress)
-**Disk 3 (one week later):**
+#### Step 2: Remove Disk 3
+9. Open **Unbalance** plugin
+10. Source: Disk 3 (~4.3TB)
+11. Destination: Disk 1, Disk 2, Disk 5, Disk 6
+12. Start transfer (~2-4 hrs)
+13. Verify Disk 3 is empty
+14. **Stop array**
+15. Unassign Disk 3
+16. **Start array** → confirm new configuration → parity sync (~12-18 hrs)
+17. Power down, physically remove old Disk 3, power up
-| | Details |
-|--|---------|
-| Remove | ST8000DM004 8TB (ZCT3XAEJ) |
-| Install | **Toshiba MG10 20TB** (MG10ACA20TE) |
-| Cost | ~$300 |
-| Time | Rebuild ~18-20 hrs |
+#### Disposal
+- Old Parity 1 (ZCT2K65B): **Recycle** — worst error history
+- Old Disk 3 (ZCT3XAEJ): **Recycle** — 2nd worst
-Combined cost: ~$700. Risk at swap: ~20%. After this, monthly failure rate drops to ~0.9%.
+**Result: 7 → 6 drives.**
 ---
-### Mid 2030 — Disks 1 & 2
+### Phase 2 — June 2027
-| Slot | Install | Cost |
-|------|---------|------|
-| Disk 1 | **Toshiba MG10 20TB** | ~$300 |
-| Disk 2 | **WD Ultrastar HC560 20TB** | ~$300 |
+**Buy: 1x Seagate Exos X18 16TB (~$220)**
+**Replace: Parity 2 (ZCT2K5PQ — had 19 GP errors)**
-Same batch as old parity drives, will be ~9 years old. Combined: ~$600.
+#### Steps
+1. **Stop array**
+2. Power down, swap Parity 2 with new Exos X18 16TB
+3. Power up → verify drive assignment → **Start array**
+4. Parity sync (~12-18 hrs)
+
+#### Disposal
+- Old Parity 2 (ZCT2K5PQ): **Recycle**
+
+**Result: Still 6 drives.** Cannot remove another drive yet — remaining 8TB data drives are too full to consolidate without a 16TB data drive.
 ---
-### Whenever — Disks 4, 5, 6
+### Phase 3 — October 2027
-Low risk, replace opportunistically on sales. ~$300 each, ~$900 total.
+**Buy: 1x Toshiba MG08 16TB (~$200)**
+**Replace: Disk 1 (ZCT2K3FF — same batch as old parity, 5.3yr old)**
+**Remove: Disk 5 (ZR14W632 — had temperature issues)**
-## Total Cost
+#### Step 1: Replace Disk 1
+1. **Stop array**
+2. Power down, swap Disk 1 with new Toshiba MG08 16TB
+3. Power up → verify drive assignment → **Start array**
+4. **Rebuild** begins (~18-20 hrs) — reconstructs Disk 1's data from parity
+5. Array is usable during rebuild
-| Phase | When | Cost | Running Total |
-|-------|------|------|--------------|
-| Parity 1 | Mar 2027 | $380 | $380 |
-| Parity 2 + Disk 3 | May 2028 | $700 | $1,080 |
-| Disks 1 & 2 | Mid 2030 | $600 | $1,680 |
-| Disks 4, 5, 6 | Whenever | $900 | $2,580 |
+#### Step 2: Remove Disk 5
+6. Open **Unbalance** plugin
+7. Source: Disk 5 (~5.2TB)
+8. Destination: Disk 1 (new 16TB, ~9.4TB free after rebuild)
+9. Start transfer (~3-5 hrs)
+10. Verify Disk 5 is empty
+11. **Stop array**
+12. Unassign Disk 5
+13. **Start array** → confirm → parity sync (~12-18 hrs)
+14. Power down, physically remove old Disk 5, power up
-## If a Drive Fails Before Its Scheduled Replacement
+#### Disposal
+- Old Disk 1 (ZCT2K3FF): **Recycle** — same batch as the old parity drives
+- Old Disk 5 (ZR14W632): **Keep as emergency spare** — only 2.5yr old, clean SMART
-- **Parity drive fails:** Buy 1x Exos X24 24TB, swap, parity sync
-- **Data drive fails:** Need 2 drives — 1x Exos X24 24TB (new parity, since 20TB data > 8TB parity) + 1x 20TB data drive. Upgrade parity first, then replace data drive.
+**Result: 6 → 5 drives. Migration complete.**
-Array stays online with a failed drive (dual parity). No data loss. Performance reduced until rebuild.
+---
-## Final Array
+## Final Array (October 2027)
-| Slot | Drive | Brand | Size |
-|------|-------|-------|------|
-| Parity 1 | Exos X24 | Seagate | 24TB |
-| Parity 2 | Exos X24 | Seagate | 24TB |
-| Disk 1 | MG10 | Toshiba | 20TB |
-| Disk 2 | HC560 | WD | 20TB |
-| Disk 3 | MG10 | Toshiba | 20TB |
-| Disk 4 | TBD | TBD | 20TB+ |
-| Disk 5 | HC560 | WD | 20TB |
-| Disk 6 | MG10 | Toshiba | 20TB |
+| Slot | Drive | Size | Used | Source | Brand |
+|------|-------|------|------|--------|-------|
+| Parity 1 | Exos X18 | 16TB | — | New (Apr 2027) | Seagate |
+| Parity 2 | Exos X18 | 16TB | — | New (Jun 2027) | Seagate |
+| Disk 1 | MG08 | 16TB | ~12TB | New (Oct 2027) | Toshiba |
+| Disk 2 | ST8000DM004 | 8TB | ~6.3TB | Original | Seagate |
+| Disk 6 | ST8000DM004 | 8TB | ~5.2TB | Original | Seagate |
-Usable capacity: ~120TB (up from ~48TB). All CMR, all enterprise/NAS-grade, 3 brands, 5-year warranties.
+- **Usable capacity: 30.6TB** (16 + 7.3 + 7.3)
+- **Data stored: ~25TB** with ~6TB free
+- **Growth runway: ~6 years** at 1TB/year
+- **Annual failure rate: ~4%** (down from ~19%)
+- **3 empty drive bays** for future expansion
+
+## Optional Future Upgrade
+
+When Disk 2 (5.3yr) or Disk 6 (2.5yr) eventually show wear, or if you want more space:
+
+| Replace | With | Cost | New usable |
+|---------|------|------|------------|
+| Disk 2 → 16TB enterprise | Toshiba or WD | ~$200-230 | 39.3TB |
+| Disk 6 → 16TB enterprise | Toshiba or WD | ~$200-230 | 48TB |
+
+Going all-enterprise (5x 16TB) costs an additional ~$430 and gives 48TB usable (23+ years of growth). No rush — Disk 6 especially is young and healthy.
+
+## If a Drive Fails Before Scheduled Replacement
+
+**Parity drive fails:**
+- Buy the replacement Exos X18 16TB
+- Swap, parity sync, done
+- Other parity still protects the array during sync
+
+**Data drive fails (8TB):**
+- Array keeps running — data is emulated from parity in real-time
+- Buy a 16TB replacement (must upgrade parity first if no parity is ≥16TB yet)
+- Swap, rebuild, done
+
+**Data drive fails (new 16TB):**
+- Same process, buy another 16TB
+- Dual parity protects against up to 2 simultaneous failures
+
+## Drives to Keep After Migration
+
+| Drive | Reason |
+|-------|--------|
+| WD Red 4TB (WD-WCC7K2YF4TDJ) | Cold spare — pristine SMART despite 6.6yr age |
+| Disk 5 8TB (ZR14W632) | Emergency spare — only 2.5yr old |
+
+## Drives to Recycle
+
+| Drive | Reason |
+|-------|--------|
+| Parity 1 (ZCT2K65B) | 180 GP error history |
+| Parity 2 (ZCT2K5PQ) | 19 GP error history |
+| Disk 3 (ZCT3XAEJ) | 10 GP errors + 2 CRC + 72K load cycles |
+| Disk 1 (ZCT2K3FF) | Same batch as parity drives, 5.3yr |
 ## Pending Config Fixes
-- [ ] Remove compose.manager plugin (deprecated, still installed)
-- [ ] Set share security to Private (disk shares still "public")
-- [ ] Move domains share to cache (useCache should be "prefer")
-- [ ] Set up email/push notifications for drive alerts
-- [ ] Configure UPS auto-shutdown (CyberPower CP1500, nut-dw plugin installed)
-- [ ] Enable XMP in BIOS (RAM running 1333 MT/s, rated for 1600 MT/s)
+These can be done anytime from the Unraid WebGUI — no downtime needed:
+
+- [ ] Remove compose.manager plugin (Settings → Plugins → click X)
+- [ ] Set disk share security to Private (Settings → Global Share Settings → Disk Shares → No)
+- [ ] Set user share security to Private (Shares → each share → SMB Security → Private)
+- [ ] Move domains share to cache (Shares → domains → Use Cache → Prefer, Cache Pool → cache)
+- [ ] Set up email notifications (Settings → Notification Settings)
+- [ ] Configure UPS auto-shutdown (Settings → UPS Settings — CyberPower CP1500 via USB, nut-dw plugin installed)
+- [ ] Enable XMP in BIOS at next reboot (RAM: G.Skill DDR3 1600MHz running at 1333MHz)
 - [ ] Verify appdata backups — test a restore
+
+## SMART Test Schedule (already configured)
+
+- Weekly short: Sundays 2:00 AM (`0 2 * * 0`)
+- Monthly extended: 1st of month 3:00 AM (`0 3 1 * *`)
+- Configured via User Scripts plugin
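
The risk figures quoted under "Current Array" (a combined monthly failure rate of ~1.9%, reaching 20% cumulative risk at ~12 months) can be cross-checked with a short calculation. The following is a minimal sketch, not the method the note itself used: it assumes the per-drive AFR estimates from the table are independent and stay constant over time, and the names `AFR`, `combined_monthly_rate`, and `months_until` are hypothetical helpers introduced only for illustration.

```python
import math

# Estimated annual failure rates (AFR) copied from the "Current Array" table.
AFR = {
    "Parity 1": 0.045,
    "Parity 2": 0.035,
    "Disk 1": 0.0275,
    "Disk 2": 0.0275,
    "Disk 3": 0.035,
    "Disk 4": 0.025,
    "Disk 5": 0.0175,
    "Disk 6": 0.0175,
}


def combined_monthly_rate(afrs):
    """Approximate array-wide monthly rate: sum the annual rates, divide by 12."""
    return sum(afrs) / 12


def months_until(threshold, monthly_rate):
    """Months until P(at least one drive failure) reaches `threshold`,
    assuming the same monthly rate applies every month (assumption)."""
    return math.log(1 - threshold) / math.log(1 - monthly_rate)


monthly = combined_monthly_rate(AFR.values())
months = months_until(0.20, monthly)

print(f"combined monthly rate: {monthly:.2%}")
print(f"months to 20% cumulative risk: {months:.1f}")
```

Under these assumptions the monthly rate comes out near 1.9% and the 20% threshold is reached between 11 and 12 months, which agrees with the note's "~12 months (Mar 2027)" estimate. This is the chance that at least one drive fails, not the chance of data loss, since dual parity tolerates up to 2 simultaneous failures.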