vault backup: 2026-02-11 11:14:35

Yanxin Lu
2026-02-11 11:14:35 -08:00
parent f39e84e218
commit dc8a732ceb
39 changed files with 1142 additions and 0 deletions

@@ -0,0 +1,71 @@
# Convert Martial Arts PDF Notes to Markdown
Convert all handwritten martial arts training note PDFs in `notes/martial_arts/` (including subdirectories) into structured markdown files for the Obsidian vault.
## Pipeline
1. **Discover PDFs**: Glob `notes/martial_arts/**/*.pdf` recursively. Skip any PDF that already has a matching `.md` file in the same directory, so the pipeline can be restarted safely after a crash without redoing finished conversions.
2. **Convert PDF to PNG**: For each unconverted PDF, create a unique temp directory and run:
```
mkdir -p /tmp/pdf_pages/<basename>
pdftoppm -png -r 200 <input.pdf> /tmp/pdf_pages/<basename>/page
```
`pdftoppm` is part of `poppler` (install with `brew install poppler`).
3. **Launch Task subagents**: For each PDF, launch a `general-purpose` Task subagent in the background. Each subagent:
- Reads the PNG page images visually
- Transcribes the handwritten content (mix of English and Chinese)
- Writes a `.md` file in the same directory as the source PDF
Use background subagents to process multiple PDFs in parallel (batches of ~5). Each subagent starts with a fresh context, so page images from different PDFs never accumulate in one conversation and push a request past the 30MB API request limit.
4. **Verify**: After all subagents complete, confirm every PDF has a matching `.md` file.
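The discovery, conversion, and batching steps above can be sketched in Python as follows. This is a minimal illustration, not the actual tooling: the function names are hypothetical, `<basename>` is assumed to mean the filename without its `.pdf` extension, and the batch size of 5 follows the "~5" suggestion. The sketch only builds the `pdftoppm` command; running it (for example with `subprocess.run`) and creating the output directory are left to the caller.

```python
from pathlib import Path
from typing import List


def find_unconverted(root: Path) -> List[Path]:
    """Return PDFs under root (recursively) that have no sibling .md file."""
    pdfs = sorted(root.rglob("*.pdf"))
    return [pdf for pdf in pdfs if not pdf.with_suffix(".md").exists()]


def pdftoppm_command(pdf: Path, tmp_root: Path = Path("/tmp/pdf_pages")) -> List[str]:
    """Build the step-2 command; assumes <basename> is the PDF name without extension."""
    out_dir = tmp_root / pdf.stem
    return ["pdftoppm", "-png", "-r", "200", str(pdf), str(out_dir / "page")]


def batches(items: List[Path], size: int = 5) -> List[List[Path]]:
    """Split the work list into groups of `size` for parallel subagents."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

The same `find_unconverted` call can serve as the verification step: once all subagents finish, it should return an empty list.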
## Markdown Format
All generated `.md` files must include YAML frontmatter matching `templates/武术笔记.md`:
```markdown
---
类型: 笔记
tags:
- 笔记
- 武术
日期: <date from filename, e.g. 2024-08-06>
老师: <instructor name from filename>
武术: <martial art name>
---
# [Title]
**日期**: MM.DD
## 1. [Section Title]
a. [detail]
b. [detail]
```
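As a rough sketch, the frontmatter block above could be pre-filled with a small helper like the one below. The function name `build_frontmatter` is hypothetical, and the fixed fields are copied from the example rather than read from `templates/武术笔记.md`, so the template remains the source of truth if the two ever differ.

```python
def build_frontmatter(date: str, instructor: str, art: str) -> str:
    """Return the YAML frontmatter for one note, mirroring the example layout."""
    lines = [
        "---",
        "类型: 笔记",
        "tags:",
        "- 笔记",
        "- 武术",
        f"日期: {date}",        # expected as YYYY-MM-DD, e.g. 2024-08-06
        f"老师: {instructor}",
        f"武术: {art}",         # one of the canonical art names
        "---",
    ]
    return "\n".join(lines) + "\n"
```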
### Filename pattern
Filenames follow: `<art>-<YYYY.MM.DD>-<instructor>.pdf`
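- Extract date, instructor, and art from the filename. The `日期` field uses dashes (e.g. `2024-08-06`), so convert the filename's `YYYY.MM.DD` date to `YYYY-MM-DD`.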
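One way to parse this pattern is a regular expression over the filename stem. The sketch below is an assumption-laden illustration: `parse_note_filename` is a hypothetical name, the instructor "Smith" in the usage note is made up, and rewriting the date with dashes is inferred from the frontmatter example (`2024-08-06`) rather than stated outright.

```python
import re
from pathlib import Path
from typing import Dict

# <art>-<YYYY.MM.DD>-<instructor>, matched against the name without ".pdf"
FILENAME_RE = re.compile(
    r"^(?P<art>.+)-(?P<y>\d{4})\.(?P<m>\d{2})\.(?P<d>\d{2})-(?P<instructor>.+)$"
)


def parse_note_filename(pdf: Path) -> Dict[str, str]:
    """Split a note filename into art, ISO-style date, and instructor."""
    match = FILENAME_RE.match(pdf.stem)
    if match is None:
        raise ValueError(f"filename does not follow the pattern: {pdf.name}")
    return {
        "art": match["art"],
        "date": f"{match['y']}-{match['m']}-{match['d']}",
        "instructor": match["instructor"],
    }
```

For example, `parse_note_filename(Path("FMA-2024.08.06-Smith.pdf"))` yields the art `FMA`, the date `2024-08-06`, and the instructor `Smith`. Filenames that do not match raise `ValueError` so they can be reviewed by hand instead of being silently mislabeled.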
### Title conventions by art
- **FMA/Silat/SEAMA**: `# [Art] — [Instructor] 师傅`
- **八极拳 (Bajiquan)**: `# 八极拳 Lesson [NNN]` (identify lesson number from the handwritten notes if possible)
- **劈挂拳 (Piguaquan)**: `# 劈挂拳 — [Instructor] 师傅`
- **Other** (MMA, Muay Thai, Lethwei, etc.): `# [Art] — [Instructor] 师傅`
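These conventions reduce to two cases, which the following sketch makes explicit. The helper name `note_title` is hypothetical, and keeping the literal `[NNN]` placeholder when no lesson number is known is an assumption: the conventions above do not say what to do when the number cannot be read from the notes.

```python
from typing import Optional


def note_title(art: str, instructor: str, lesson: Optional[str] = None) -> str:
    """Return the H1 title line for a note, following the per-art conventions."""
    if art == "八极拳":
        # Lesson number as read from the handwritten notes; placeholder if unknown (assumption).
        number = lesson if lesson is not None else "[NNN]"
        return f"# 八极拳 Lesson {number}"
    # FMA, Silat, SEAMA, 劈挂拳, and all other arts share one format.
    return f"# {art} — {instructor} 师傅"
```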
### 武术 field values
Use these canonical names: `FMA`, `Silat`, `SEAMA`, `八极拳`, `劈挂拳`, `MMA`, `Muay Thai`, `Lethwei`
## Subagent Prompt Template
When launching each Task subagent, provide:
- The list of PNG file paths to read visually
- The output `.md` file path
- The pre-filled YAML frontmatter
- The title to use
- An example of a completed conversion for reference
- Instructions to transcribe faithfully, preserving both English and Chinese as written
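The items above could be assembled into a single prompt string along these lines. This is only a sketch: `build_subagent_prompt` and its parameter names are hypothetical, the exact wording of the prompt is illustrative, and how the prompt is handed to a Task subagent is outside its scope.

```python
from typing import List


def build_subagent_prompt(
    png_paths: List[str],
    output_md: str,
    frontmatter: str,
    title: str,
    example_md: str,
) -> str:
    """Combine the six required items into one prompt for a single PDF."""
    page_list = "\n".join(f"- {p}" for p in png_paths)
    return (
        "Read these page images visually, in order:\n"
        f"{page_list}\n\n"
        f"Write the transcription to: {output_md}\n\n"
        "Start the file with this frontmatter, unchanged:\n"
        f"{frontmatter}\n"
        f"Use this title: {title}\n\n"
        "Example of a completed conversion:\n"
        f"{example_md}\n\n"
        "Transcribe faithfully, preserving both English and Chinese as written.\n"
    )
```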