From ff80846a2d0c1eac8428498fdf72aed3d952e58e Mon Sep 17 00:00:00 2001
From: Yanxin Lu
Date: Sat, 28 Mar 2026 21:57:55 -0700
Subject: [PATCH] vault backup: 2026-03-28 21:57:55
---
 notes/Unraid Drive Replacement Plan.md | 322 ++++++++++++-------------
 1 file changed, 155 insertions(+), 167 deletions(-)
diff --git a/notes/Unraid Drive Replacement Plan.md b/notes/Unraid Drive Replacement Plan.md
index e00d8e8..72e2472 100644
--- a/notes/Unraid Drive Replacement Plan.md
+++ b/notes/Unraid Drive Replacement Plan.md
@@ -1,225 +1,213 @@
 # Unraid Drive Replacement Plan
 ## Goals
-- Reduce array from 8 to 5 HDD drives
-- Replace consumer SMR drives with enterprise CMR
-- Stay under 20% cumulative failure risk before each purchase
-- Minimize cost and hassle
+
+- Replace the 3 worst drives (both parity + Disk 3) with enterprise CMR
+- Remove the smallest drive (Disk 4, 4TB) — too small to be useful
+- Keep the 4 healthiest data drives running
+- Reduce from 8 to 6 HDD drives
+- Stay under 20% cumulative failure risk
+- Total budget: ~$440
 ## Current Array (as of 2026-03-28)
-| Slot | Model | Serial | Size | Age | Est. AFR | Health |
-|------|-------|--------|------|-----|----------|--------|
-| Parity 1 | ST8000DM004 SMR | ZCT2K65B | 8TB | 5.3yr | ~4.5% | Had 180 GP errors (cleared) |
-| Parity 2 | ST8000DM004 SMR | ZCT2K5PQ | 8TB | 5.3yr | ~3.5% | Had 19 GP errors (cleared) |
-| Disk 1 | ST8000DM004 SMR | ZCT2K3FF | 8TB | 5.3yr | ~2.75% | Same manufacturing batch as parity drives |
-| Disk 2 | ST8000DM004 SMR | ZCT2K51K | 8TB | 5.3yr | ~2.75% | 1 CRC error, otherwise clean |
-| Disk 3 | ST8000DM004 SMR | ZCT3XAEJ | 8TB | 4.8yr | ~3.5% | Had 10 GP errors, 2 CRC, 72K load cycles |
-| Disk 4 | WD40EFRX CMR | WD-WCC7K2YF4TDJ | 4TB | 6.6yr | ~2.5% | Pristine SMART, only 1.7TB used |
-| Disk 5 | ST8000DM004 SMR | ZR14W632 | 8TB | 2.5yr | ~1.75% | 56C lifetime max |
-| Disk 6 | ST8000DM004 SMR | ZR150CVS | 8TB | 2.5yr | ~1.75% | Clean |
-| Cache | Samsung 970 EVO | S5H9NS0N831296A | 1TB | — | — | 1% wear, excellent |
+| Slot | Model | Serial | Size | Age | Est. AFR | Health | Action |
+|------|-------|--------|------|-----|----------|--------|--------|
+| Parity 1 | ST8000DM004 SMR | ZCT2K65B | 8TB | 5.3yr | ~4.5% | Had 180 GP errors (cleared) | **Replace** |
+| Parity 2 | ST8000DM004 SMR | ZCT2K5PQ | 8TB | 5.3yr | ~3.5% | Had 19 GP errors (cleared) | **Replace** |
+| Disk 1 | ST8000DM004 SMR | ZCT2K3FF | 8TB | 5.3yr | ~2.75% | Clean, same batch as parity | Keep |
+| Disk 2 | ST8000DM004 SMR | ZCT2K51K | 8TB | 5.3yr | ~2.75% | 1 CRC error, otherwise clean | Keep |
+| Disk 3 | ST8000DM004 SMR | ZCT3XAEJ | 8TB | 4.8yr | ~3.5% | Had 10 GP errors, 2 CRC, 72K load cycles | **Remove** |
+| Disk 4 | WD40EFRX CMR | WD-WCC7K2YF4TDJ | 4TB | 6.6yr | ~2.5% | Pristine SMART, only 1.7TB used | **Remove** |
+| Disk 5 | ST8000DM004 SMR | ZR14W632 | 8TB | 2.5yr | ~1.75% | 56C lifetime max, otherwise clean | Keep |
+| Disk 6 | ST8000DM004 SMR | ZR150CVS | 8TB | 2.5yr | ~1.75% | Clean | Keep |
+| Cache | Samsung 970 EVO | S5H9NS0N831296A | 1TB | — | — | 1% wear, excellent | Keep |
-- Combined monthly failure rate: ~1.9%
-- Hits 20% cumulative failure at ~12 months (Mar 2027)
-- Total data stored: ~25TB, growing ~1TB/year
-- All 7 Seagate drives are consumer desktop SMR — not NAS-grade
+Total data stored: ~25TB, growing ~1TB/year
 ## Shopping List
-| # | Drive | Model | Purpose | Est. Price |
-|---|-------|-------|---------|------------|
-| 1 | Seagate Exos X18 16TB | ST16000NM000J | Parity 1 | ~$220 |
-| 2 | Seagate Exos X18 16TB | ST16000NM000J | Parity 2 | ~$220 |
-| 3 | Toshiba MG08 16TB | MG08ACA16TE | Data (Disk 1) | ~$200 |
-| | | | **Total** | **~$640** |
+| Drive | Model | Purpose | Est. Price |
+|-------|-------|---------|------------|
+| Seagate Exos X18 16TB | ST16000NM000J | Parity 1 | ~$220 |
+| Seagate Exos X18 16TB | ST16000NM000J | Parity 2 | ~$220 |
+| | | **Total** | **~$440** |
-All are enterprise, CMR, 7200rpm, 5-year warranty.
-
-## Timeline
-
-### Phase 0 — Now (free)
-
-**Remove Disk 4 (WD Red 4TB)**
-
-Disk 4 only has 1.7TB used. Other drives have 14.5TB of free space. No purchase needed.
-
-1. Open **Unbalance** plugin
-2. Source: Disk 4
-3. Destination: Disk 3, Disk 5, Disk 6 (most free space)
-4. Start transfer (~30 min for 1.7TB)
-5. Verify Disk 4 is empty
-6. **Stop array** (Main → Stop)
-7. Unassign Disk 4 (click on the drive slot → set to "No Device")
-8. **Start array** → confirm new configuration
-9. Parity sync will run (~12-18 hrs, array is usable during sync)
-10. Power down, physically remove the 4TB drive, power up
-
-**Result: 8 → 7 drives.** Keep the WD Red as a cold spare — it has pristine SMART.
+Why 16TB parity for 8TB data drives: when a data drive eventually dies or fills up (~2030), you can drop in a 16TB replacement without touching parity. Buying 8TB parity now would mean buying parity drives again later — more expensive overall. The Exos X18 should last 8-12+ years in a home NAS, outlasting multiple generations of data drives.
 ---
-### Phase 1 — April 2027
+## Phase 0 — Now (free)
+
+**Goal: Remove Disk 4 (WD Red 4TB) — too small to be useful**
+
+Disk 4 only has 1.7TB used. Other drives have 14.5TB of free space.
+
+1. Open **Unbalance** plugin in Unraid WebGUI
+2. Source: Disk 4
+3. Destination: Disk 3, Disk 5, Disk 6 (most free space)
+4. Start transfer (~30 min for 1.7TB)
+5. Verify Disk 4 shows 0 bytes used
+6. **Stop array** (Main → Array Operations → Stop)
+7. Click on Disk 4 slot → set to **No Device**
+8. **Start array** → check "Yes I want to do this" → confirm
+9. Parity sync runs automatically (~12-18 hrs, array is usable during sync)
+10. Once sync completes: power down, physically remove the 4TB drive, power up
+
+**Keep the WD Red 4TB as a cold backup drive** — pristine SMART despite its age. Good for periodic offline backups via USB dock.
+
+**Result: 8 → 7 drives.**
+
+---
+
+## Phase 1 — April 2027
+
+**Goal: Replace Parity 1, remove Disk 3**
+
+Cumulative failure risk reaches ~20% around this time. First purchase.
 **Buy: 1x Seagate Exos X18 16TB (~$220)**
-**Replace: Parity 1 (ZCT2K65B — worst drive, had 180 GP errors)**
-**Remove: Disk 3 (ZCT3XAEJ — next worst, had 10 GP errors + 2 CRC + 72K load cycles)**
-Cumulative failure risk at this point: ~20%
+### Step 1: Replace Parity 1
-#### Step 1: Replace Parity 1
 1. **Stop array**
 2. Power down server
-3. Remove old Parity 1 drive (sdb, serial ZCT2K65B)
-4. Install new Exos X18 16TB in the same slot
-5. Power up → Main page → verify new drive shows in Parity 1 slot
-6. **Start array** → Parity sync begins (~12-18 hrs)
-7. Array is fully usable during sync — avoid heavy writes
-8. Wait for sync to complete (check Main page for progress)
+3. Locate Parity 1 drive (sdb, serial ZCT2K65B) and remove it
+4. Install the new Exos X18 16TB in the same physical slot
+5. Power up
+6. Main page → verify new drive appears in the Parity 1 slot
+7. **Start array**
+8. Parity sync begins automatically (~12-18 hrs)
+9. Array is fully usable during sync — avoid heavy writes for best performance
+10. Wait for sync to complete (progress shown on Main page)
-#### Step 2: Remove Disk 3
-9. Open **Unbalance** plugin
-10. Source: Disk 3 (~4.3TB)
-11. Destination: Disk 1, Disk 2, Disk 5, Disk 6
-12. Start transfer (~2-4 hrs)
-13. Verify Disk 3 is empty
-14. **Stop array**
-15. Unassign Disk 3
-16. **Start array** → confirm new configuration → parity sync (~12-18 hrs)
-17. Power down, physically remove old Disk 3, power up
+### Step 2: Remove Disk 3
-#### Disposal
-- Old Parity 1 (ZCT2K65B): **Recycle** — worst error history
-- Old Disk 3 (ZCT3XAEJ): **Recycle** — 2nd worst
+After parity sync completes:
+
+11. Open **Unbalance** plugin
+12. Source: Disk 3 (~4.3TB after absorbing some of Disk 4's data)
+13. Destination: Disk 1, Disk 2, Disk 5, Disk 6
+14. Start transfer (~2-4 hrs)
+15. Verify Disk 3 shows 0 bytes used
+16. **Stop array**
+17. Click on Disk 3 slot → set to **No Device**
+18. **Start array** → confirm → parity sync runs (~12-18 hrs)
+19. Once sync completes: power down, physically remove Disk 3, power up
+
+### Disposal
+
+- Old Parity 1 (ZCT2K65B): **Recycle** — worst error history in the array
+- Old Disk 3 (ZCT3XAEJ): **Recycle** — GP errors + CRC errors + high load cycles
 **Result: 7 → 6 drives.**
 ---
-### Phase 2 — June 2027
+## Phase 2 — June 2027
+
+**Goal: Replace Parity 2**
 **Buy: 1x Seagate Exos X18 16TB (~$220)**
-**Replace: Parity 2 (ZCT2K5PQ — had 19 GP errors)**
-#### Steps
 1. **Stop array**
-2. Power down, swap Parity 2 with new Exos X18 16TB
-3. Power up → verify drive assignment → **Start array**
-4. Parity sync (~12-18 hrs)
+2. Power down server
+3. Locate Parity 2 drive (sdc, serial ZCT2K5PQ) and remove it
+4. Install the new Exos X18 16TB in the same physical slot
+5. Power up
+6. Main page → verify new drive appears in the Parity 2 slot
+7. **Start array**
+8. Parity sync begins (~12-18 hrs)
+9. Wait for completion
-#### Disposal
-- Old Parity 2 (ZCT2K5PQ): **Recycle**
+### Disposal
-**Result: Still 6 drives.** Cannot remove another drive yet — remaining 8TB data drives are too full to consolidate without a 16TB data drive.
+- Old Parity 2 (ZCT2K5PQ): **Recycle** — 19 GP error history
+
+**Result: Still 6 drives. Migration complete.**
 ---
-### Phase 3 — October 2027
+## Final Array (June 2027)
-**Buy: 1x Toshiba MG08 16TB (~$200)**
-**Replace: Disk 1 (ZCT2K3FF — same batch as old parity, 5.3yr old)**
-**Remove: Disk 5 (ZR14W632 — had temperature issues)**
+| Slot | Drive | Size | Used | Age | Health |
+|------|-------|------|------|-----|--------|
+| Parity 1 | Seagate Exos X18 (new) | 16TB | — | New | Enterprise CMR, 5yr warranty |
+| Parity 2 | Seagate Exos X18 (new) | 16TB | — | New | Enterprise CMR, 5yr warranty |
+| Disk 1 | ST8000DM004 (kept) | 8TB | ~6.6TB | 6.3yr | Clean |
+| Disk 2 | ST8000DM004 (kept) | 8TB | ~6.5TB | 6.3yr | 1 CRC, otherwise clean |
+| Disk 5 | ST8000DM004 (kept) | 8TB | ~5.2TB | 3.5yr | Clean |
+| Disk 6 | ST8000DM004 (kept) | 8TB | ~5.2TB | 3.5yr | Clean |
-#### Step 1: Replace Disk 1
-1. **Stop array**
-2. Power down, swap Disk 1 with new Toshiba MG08 16TB
-3. Power up → verify drive assignment → **Start array**
-4. **Rebuild** begins (~18-20 hrs) — reconstructs Disk 1's data from parity
-5. Array is usable during rebuild
+- **Usable capacity: 29.2TB** (4 × 7.3TB)
+- **Data stored: ~25TB**, ~4TB free
+- **Growth runway: ~4 years** at 1TB/yr before needing bigger data drives
+- **Annual failure rate: ~8%** (down from ~19%)
+- **2 empty drive bays** for future expansion
-#### Step 2: Remove Disk 5
-6. Open **Unbalance** plugin
-7. Source: Disk 5 (~5.2TB)
-8. Destination: Disk 1 (new 16TB, ~9.4TB free after rebuild)
-9. Start transfer (~3-5 hrs)
-10. Verify Disk 5 is empty
-11. **Stop array**
-12. Unassign Disk 5
-13. **Start array** → confirm → parity sync (~12-18 hrs)
-14. Power down, physically remove old Disk 5, power up
+## Cost Summary
-#### Disposal
-- Old Disk 1 (ZCT2K3FF): **Recycle** — same batch as failed parity drives
-- Old Disk 5 (ZR14W632): **Keep as emergency spare** — only 2.5yr old, clean SMART
-
-**Result: 6 → 5 drives. Migration complete.**
+| When | What | Cost |
+|------|------|------|
+| Now | Remove Disk 4 | $0 |
+| Apr 2027 | Exos X18 16TB (Parity 1) | $220 |
+| Jun 2027 | Exos X18 16TB (Parity 2) | $220 |
+| **Total** | | **$440** |
 ---
-## Final Array (October 2027)
+## When a Data Drive Eventually Dies or Fills Up (~2030)
-| Slot | Drive | Size | Used | Source | Brand |
-|------|-------|------|------|--------|-------|
-| Parity 1 | Exos X18 | 16TB | — | New (Apr 2027) | Seagate |
-| Parity 2 | Exos X18 | 16TB | — | New (Jun 2027) | Seagate |
-| Disk 1 | MG08 | 16TB | ~12TB | New (Oct 2027) | Toshiba |
-| Disk 2 | ST8000DM004 | 8TB | ~6.3TB | Original | Seagate |
-| Disk 6 | ST8000DM004 | 8TB | ~5.2TB | Original | Seagate |
+Since parity is already 16TB, just buy one replacement data drive up to 16TB. No parity upgrade needed.
-- **Usable capacity: 30.6TB** (16 + 7.3 + 7.3)
-- **Data stored: ~25TB** with ~6TB free
-- **Growth runway: ~6 years** at 1TB/year
-- **Annual failure rate: ~4%** (down from ~19%)
-- **3 empty drive bays** for future expansion
+**Recommended replacements** (for brand diversity):
+- Toshiba MG08/MG10 16TB (~$200) — enterprise CMR
+- WD Ultrastar HC550 16TB (~$230) — enterprise CMR
-## Optional Future Upgrade
+**Procedure:**
+1. Stop array, power down
+2. Swap the failed/full 8TB with the new 16TB
+3. Start array → rebuild (~18-20 hrs)
+4. Done — you just gained 8TB of usable space
-When Disk 2 (5.3yr) or Disk 6 (2.5yr) eventually show wear, or if you want more space:
-
-| Replace | With | Cost | New usable |
-|---------|------|------|------------|
-| Disk 2 → 16TB enterprise | Toshiba or WD | ~$200-230 | 38.6TB |
-| Disk 6 → 16TB enterprise | Toshiba or WD | ~$200-230 | 48TB |
-
-Going all-enterprise (5x 16TB) costs an additional ~$430 and gives 48TB usable (23+ years of growth). No rush — Disk 6 especially is young and healthy.
-
-## If a Drive Fails Before Scheduled Replacement
-
-**Parity drive fails:**
-- Buy the replacement Exos X18 16TB
-- Swap, parity sync, done
-- Other parity still protects the array during sync
-
-**Data drive fails (8TB):**
-- Array keeps running — data is emulated from parity in real-time
-- Buy a 16TB replacement (must upgrade parity first if no parity is ≥16TB yet)
-- Swap, rebuild, done
-
-**Data drive fails (new 16TB):**
-- Same process, buy another 16TB
-- Dual parity protects against up to 2 simultaneous failures
-
-## Drives to Keep After Migration
-
-| Drive | Reason |
-|-------|--------|
-| WD Red 4TB (WD-WCC7K2YF4TDJ) | Cold spare — pristine SMART despite 6.6yr age |
-| Disk 5 8TB (ZR14W632) | Emergency spare — only 2.5yr old |
+Each time you replace an 8TB with a 16TB, your usable capacity grows:
+- 1 replaced: 37.2TB usable
+- 2 replaced: 45.2TB usable
+- 3 replaced: 53.2TB usable
+- All 4 replaced: 61.2TB usable
 ## Drives to Recycle
+Wipe before recycling if concerned about data:
+
 | Drive | Reason |
 |-------|--------|
 | Parity 1 (ZCT2K65B) | 180 GP error history |
 | Parity 2 (ZCT2K5PQ) | 19 GP error history |
-| Disk 3 (ZCT3XAEJ) | 10 GP errors + 2 CRC + 72K load cycles |
-| Disk 1 (ZCT2K3FF) | Same batch as parity drives, 5.3yr |
+| Disk 3 (ZCT3XAEJ) | 10 GP + 2 CRC + 72K load cycles |
+
+## Drives to Keep as Spares
+
+| Drive | Use |
+|-------|-----|
+| Disk 4 — WD Red 4TB (WD-WCC7K2YF4TDJ) | Cold backup via USB dock. Pristine SMART. Monthly rsync of irreplaceable files (photos, documents, configs). |
+
+## SMART Test Schedule (already running)
+
+- Weekly short: Sundays 2:00 AM
+- Monthly extended: 1st of month 3:00 AM
+- Configured via User Scripts plugin
+- Monitor results in Main → click any drive → SMART attributes
 ## Pending Config Fixes
-These can be done anytime from the Unraid WebGUI — no downtime needed:
+Can be done anytime from WebGUI, no downtime needed:
-- [ ] Remove compose.manager plugin (Settings → Plugins → click X)
-- [ ] Set disk share security to Private (Settings → Global Share Settings → Disk Shares → No)
-- [ ] Set user share security to Private (Shares → each share → SMB Security → Private)
-- [ ] Move domains share to cache (Shares → domains → Use Cache → Prefer, Cache Pool → cache)
-- [ ] Set up email notifications (Settings → Notification Settings)
-- [ ] Configure UPS auto-shutdown (Settings → UPS Settings — CyberPower CP1500 via USB, nut-dw plugin installed)
-- [ ] Enable XMP in BIOS at next reboot (RAM: G.Skill DDR3 1600MHz running at 1333MHz)
-- [ ] Verify appdata backups — test a restore
-
-## SMART Test Schedule (already configured)
-
-- Weekly short: Sundays 2:00 AM (`0 2 * * 0`)
-- Monthly extended: 1st of month 3:00 AM (`0 3 1 * *`)
-- Configured via User Scripts plugin
+- [ ] Remove compose.manager plugin (Settings → Plugins → click X next to it)
+- [ ] Disable disk shares (Settings → Global Share Settings → Enable disk shares → No)
+- [ ] Set user share security to Private for exported shares (Shares → isos → SMB Security → Private; repeat for any other exported shares)
+- [ ] Move domains share to cache (Shares → domains → Primary storage → Cache → Cache Pool → cache)
+- [ ] Set up email notifications (Settings → Notification Settings — use Gmail + App Password)
+- [ ] Configure UPS auto-shutdown (Settings → UPS Settings — CyberPower CP1500, USB, battery level 20%)
+- [ ] Enable XMP in BIOS at next reboot (G.Skill DDR3 running 1333 MT/s, rated 1600 MT/s)
+- [ ] Verify appdata backups work — test a restore of one container
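
Reviewer note on the 20% cumulative-risk goal: the plan does not say how the risk figures were derived, so the following is only a rough cross-check, assuming drive failures are independent and each drive's AFR from the Current Array table stays constant over the year. The names `CURRENT_AFR`, `any_failure_probability`, and `months_until_threshold` are hypothetical, chosen for this sketch. Under those assumptions the results land close to the plan's figures (roughly a year to reach 20%, and single-digit annual risk for the four kept data drives), though not exactly on them.

```python
from math import prod

# Estimated annual failure rates (AFR) copied from the plan's
# "Current Array" table. These are the plan's estimates, not measurements.
CURRENT_AFR = {
    "Parity 1": 0.045,
    "Parity 2": 0.035,
    "Disk 1": 0.0275,
    "Disk 2": 0.0275,
    "Disk 3": 0.035,
    "Disk 4": 0.025,
    "Disk 5": 0.0175,
    "Disk 6": 0.0175,
}


def any_failure_probability(afrs, years=1.0):
    """Probability that at least one drive fails within `years`.

    Assumption (not from the plan): failures are independent and each
    drive's AFR is constant, so per-drive survival is (1 - afr) ** years.
    """
    survival = prod((1.0 - afr) ** years for afr in afrs)
    return 1.0 - survival


def months_until_threshold(afrs, threshold=0.20, max_months=120):
    """First whole month at which cumulative risk reaches `threshold`."""
    afrs = list(afrs)  # allow any iterable, since we loop over it repeatedly
    for month in range(1, max_months + 1):
        if any_failure_probability(afrs, month / 12) >= threshold:
            return month
    return None  # threshold not reached within max_months


if __name__ == "__main__":
    all_drives = list(CURRENT_AFR.values())
    kept = [CURRENT_AFR[name] for name in ("Disk 1", "Disk 2", "Disk 5", "Disk 6")]
    print(f"Current array, 1 year: {any_failure_probability(all_drives):.1%}")
    print(f"Months until 20%:      {months_until_threshold(all_drives)}")
    print(f"Kept data drives only: {any_failure_probability(kept):.1%}")
```

The "kept data drives" figure leaves out the two new parity drives, whose AFR the plan does not state, and ignores the extra year of aging on the kept drives, so it is a lower bound on the final array's risk.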