obsidian-yanxin/.claude/commands/convert-pdfs.md
2026-02-11 11:14:35 -08:00

Convert Martial Arts PDF Notes to Markdown

Convert all handwritten martial arts training note PDFs in notes/martial_arts/ (including subdirectories) into structured markdown files for the Obsidian vault.

Pipeline

  1. Discover PDFs: Glob notes/martial_arts/**/*.pdf recursively. Skip any PDF that already has a matching .md file in the same directory; this makes the run safe to restart after a crash.

  2. Convert PDF to PNG: For each unconverted PDF, create a per-PDF temp directory and render its pages at 200 DPI:

    mkdir -p "/tmp/pdf_pages/<basename>"
    pdftoppm -png -r 200 "<input.pdf>" "/tmp/pdf_pages/<basename>/page"

    pdftoppm ships with poppler (brew install poppler). Quote the paths, since filenames can contain spaces (e.g. Muay Thai).

  3. Launch Task subagents: For each unconverted PDF, launch a general-purpose Task subagent in the background. Each subagent:

    • Reads the PNG page images visually
    • Transcribes the handwritten content (mix of English and Chinese)
    • Writes a .md file in the same directory as the source PDF

    Run the subagents in the background so multiple PDFs are processed in parallel (batches of ~5). Each subagent starts with a fresh context, so page images from many PDFs do not accumulate in one conversation and hit the 30MB API request limit.

  4. Verify: After all subagents complete, confirm every PDF has a matching .md file.
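
The discovery, conversion, and batching steps above can be sketched in Python as follows. This is a minimal illustration, not part of the command itself: the helper names (`unconverted_pdfs`, `pdftoppm_command`, `batches`) are hypothetical, and the sketch assumes PDF basenames are unique across subdirectories so that per-PDF page directories do not collide. Running `unconverted_pdfs` again after the subagents finish also serves as the verification step: an empty result means every PDF has a matching .md file.

```python
from pathlib import Path
from typing import List, Sequence

PAGES_ROOT = Path("/tmp/pdf_pages")  # temp location named in step 2


def unconverted_pdfs(root: Path) -> List[Path]:
    """Return PDFs under root (recursively) that have no same-named .md beside them."""
    return sorted(
        pdf for pdf in root.rglob("*.pdf")
        if not pdf.with_suffix(".md").exists()
    )


def pdftoppm_command(pdf: Path, pages_root: Path = PAGES_ROOT) -> List[str]:
    """Build the pdftoppm argument list for one PDF (200 DPI PNG pages).

    Assumption: the output directory pages_root/<basename> is created
    beforehand, e.g. with out_dir.mkdir(parents=True, exist_ok=True).
    """
    out_dir = pages_root / pdf.stem
    return ["pdftoppm", "-png", "-r", "200", str(pdf), str(out_dir / "page")]


def batches(items: Sequence, size: int = 5) -> List[list]:
    """Split items into consecutive batches of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

Passing the argument list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with filenames that contain spaces or Chinese characters.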

Markdown Format

All generated .md files must include YAML frontmatter matching templates/武术笔记.md:

---
类型: 笔记
tags:
  - 笔记
  - 武术
日期: <date from filename, e.g. 2024-08-06>
老师: <instructor name from filename>
武术: <martial art name>
---

# [Title]

**日期**: MM.DD

## 1. [Section Title]

a. [detail]
b. [detail]

Filename pattern

Filenames follow: <art>-<YYYY.MM.DD>-<instructor>.pdf

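  • Extract the date, instructor, and art from the filename. Write the date as YYYY-MM-DD in the frontmatter (e.g. 2024.08.06 becomes 2024-08-06) and as MM.DD in the body.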
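
One way to extract these fields is a regular expression over the filename. The sketch below is an assumption about how the pattern might be parsed, not a prescribed implementation; `parse_note_filename` and `NoteInfo` are hypothetical names, and the instructor name "Alex" in the usage note is made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# <art>-<YYYY.MM.DD>-<instructor>.pdf; the art may contain spaces (e.g. "Muay Thai").
FILENAME_RE = re.compile(
    r"^(?P<art>.+?)-(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})-(?P<instructor>.+)\.pdf$"
)


class NoteInfo(NamedTuple):
    art: str
    date_iso: str    # YYYY-MM-DD, for the frontmatter
    date_short: str  # MM.DD, for the body
    instructor: str


def parse_note_filename(name: str) -> Optional[NoteInfo]:
    """Parse a note filename; return None if it does not follow the pattern."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    return NoteInfo(
        art=m["art"],
        date_iso=f"{m['year']}-{m['month']}-{m['day']}",
        date_short=f"{m['month']}.{m['day']}",
        instructor=m["instructor"],
    )
```

For example, `parse_note_filename("Muay Thai-2024.08.06-Alex.pdf")` yields the art `Muay Thai`, the frontmatter date `2024-08-06`, the body date `08.06`, and the instructor `Alex`. Filenames that do not match return `None` so they can be flagged rather than guessed at.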

Title conventions by art

  • FMA/Silat/SEAMA: # [Art] — [Instructor] 师傅
  • 八极拳 (Bajiquan): # 八极拳 Lesson [NNN] (identify lesson number from the handwritten notes if possible)
  • 劈挂拳 (Piguaquan): # 劈挂拳 — [Instructor] 师傅
  • Other (MMA, Muay Thai, Lethwei, etc.): # [Art] — [Instructor] 师傅
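
The title rules above reduce to one special case (八极拳) and one general form. A minimal sketch follows; `note_title` is a hypothetical helper, and keeping the literal `[NNN]` placeholder when no lesson number is known is an assumption, since the conventions do not say what to write in that case.

```python
from typing import Optional


def note_title(art: str, instructor: str, lesson: Optional[str] = None) -> str:
    """Build the top-level heading for a note from its art and instructor."""
    if art == "八极拳":
        # Assumption: leave the [NNN] placeholder for the subagent to fill in
        # when the lesson number has not been identified yet.
        number = lesson if lesson is not None else "[NNN]"
        return f"# 八极拳 Lesson {number}"
    # FMA/Silat/SEAMA, 劈挂拳, and all other arts share the same form.
    return f"# {art} — {instructor} 师傅"
```

The lesson number is passed as a string so that whatever form appears in the handwritten notes is preserved as written.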

武术 field values

Use these canonical names: FMA, Silat, SEAMA, 八极拳, 劈挂拳, MMA, Muay Thai, Lethwei
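
To pre-fill the frontmatter with a canonical 武术 value, something like the following sketch could be used. The helper names `canonical_art` and `build_frontmatter` are hypothetical, and the case-insensitive matching of filename values against the canonical list is an assumption added for robustness, not something the template requires.

```python
CANONICAL_ARTS = ("FMA", "Silat", "SEAMA", "八极拳", "劈挂拳", "MMA", "Muay Thai", "Lethwei")


def canonical_art(raw: str) -> str:
    """Map an art name from a filename to its canonical spelling.

    Assumption: matching ignores case and surrounding whitespace.
    Raises ValueError for names outside the canonical list.
    """
    key = raw.strip().casefold()
    for art in CANONICAL_ARTS:
        if art.casefold() == key:
            return art
    raise ValueError(f"unknown martial art: {raw!r}")


def build_frontmatter(date_iso: str, instructor: str, art: str) -> str:
    """Render the YAML frontmatter block in the field order shown in the template."""
    lines = [
        "---",
        "类型: 笔记",
        "tags:",
        "  - 笔记",
        "  - 武术",
        f"日期: {date_iso}",
        f"老师: {instructor}",
        f"武术: {canonical_art(art)}",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

Raising on an unknown art, rather than passing it through, surfaces typos in filenames before any note is written.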

Subagent Prompt Template

When launching each Task subagent, provide:

  • The list of PNG file paths to read visually
  • The output .md file path
  • The pre-filled YAML frontmatter
  • The title to use
  • An example of a completed conversion for reference
  • Instructions to transcribe faithfully, preserving both English and Chinese as written
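
The six items above can be assembled into a single prompt string. The sketch below shows one possible layout; `build_subagent_prompt` is a hypothetical helper and the exact wording of each section is an assumption, not the wording the command uses.

```python
from typing import Sequence


def build_subagent_prompt(
    png_paths: Sequence[str],
    output_md: str,
    frontmatter: str,
    title: str,
    example_md: str,
) -> str:
    """Assemble the prompt for one Task subagent from the six required items."""
    page_list = "\n".join(f"- {path}" for path in png_paths)
    sections = [
        "Read these PNG page images visually, in page order:\n" + page_list,
        "Write the transcription to: " + output_md,
        "Start the file with exactly this YAML frontmatter:\n" + frontmatter.rstrip("\n"),
        "Use this title line: " + title,
        "Example of a completed conversion, for reference:\n" + example_md.rstrip("\n"),
        "Transcribe faithfully. Preserve both English and Chinese as written; "
        "do not translate or paraphrase.",
    ]
    return "\n\n".join(sections) + "\n"
```

Building the prompt from explicit arguments keeps each subagent's context limited to one PDF's pages, which is the point of launching a fresh subagent per file.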