# Convert Martial Arts PDF Notes to Markdown

Convert all handwritten martial arts training note PDFs in `notes/martial_arts/` (including subdirectories) into structured markdown files for the Obsidian vault.

## Pipeline

1. **Discover PDFs**: Glob `notes/martial_arts/**/*.pdf` recursively. Skip any PDF that already has a matching `.md` file in the same directory, so the run can be restarted safely after a crash.

2. **Convert PDF to PNG**: For each unconverted PDF, create a unique temp directory and run:
   ```
   mkdir -p /tmp/pdf_pages/<basename>
   pdftoppm -png -r 200 <input.pdf> /tmp/pdf_pages/<basename>/page
   ```
   Requires `poppler`, which provides `pdftoppm` (`brew install poppler`).

3. **Launch Task subagents**: For each PDF, launch a `general-purpose` Task subagent in the background. Each subagent:
   - Reads the PNG page images visually
   - Transcribes the handwritten content (a mix of English and Chinese)
   - Writes a `.md` file in the same directory as the source PDF

   Use background subagents to process multiple PDFs in parallel (batches of ~5). Each subagent starts with a fresh context, so requests stay under the 30 MB API request limit.

4. **Verify**: After all subagents complete, confirm every PDF has a matching `.md` file.
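The discovery, conversion, batching, and verification steps above can be sketched in Python. This is a minimal sketch, not part of the original pipeline: the helper names `find_unconverted`, `pdftoppm_command`, and `batches` are hypothetical, and it assumes that "matching `.md` file" means the same path with the `.pdf` suffix swapped for `.md`.

```python
from pathlib import Path
from typing import Iterator


def find_unconverted(root: Path) -> list[Path]:
    """Return PDFs under `root` (recursively) that lack a sibling .md file."""
    pdfs = sorted(p for p in root.rglob("*.pdf") if p.is_file())
    # Assumption: the matching note is the same path with a .md suffix.
    return [p for p in pdfs if not p.with_suffix(".md").exists()]


def pdftoppm_command(pdf: Path, tmp_root: Path = Path("/tmp/pdf_pages")) -> list[str]:
    """Build (but do not run) the pdftoppm call for one PDF."""
    out_prefix = tmp_root / pdf.stem / "page"
    return ["pdftoppm", "-png", "-r", "200", str(pdf), str(out_prefix)]


def batches(items: list[Path], size: int = 5) -> Iterator[list[Path]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Calling `find_unconverted(Path("notes/martial_arts"))` before the run gives the work list, and calling it again after all subagents finish doubles as the verification step: an empty list means every PDF has its `.md` file. The output directory for each command still has to be created first (the `mkdir -p` step).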

## Markdown Format

All generated `.md` files must include YAML frontmatter matching `templates/武术笔记.md`:

```markdown
---
类型: 笔记
tags:
- 笔记
- 武术
日期: <date from filename, e.g. 2024-08-06>
老师: <instructor name from filename>
武术: <martial art name>
---

# [Title]

**日期**: MM.DD

## 1. [Section Title]

a. [detail]
b. [detail]
```
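Pre-filling the frontmatter before handing it to a subagent could look like the sketch below. The function name `build_frontmatter` is hypothetical, and the sketch assumes the three values need no YAML quoting or escaping.

```python
def build_frontmatter(date: str, instructor: str, art: str) -> str:
    """Render the YAML frontmatter block shown in the template above."""
    lines = [
        "---",
        "类型: 笔记",
        "tags:",
        "- 笔记",
        "- 武术",
        f"日期: {date}",        # e.g. 2024-08-06
        f"老师: {instructor}",  # instructor name from the filename
        f"武术: {art}",         # martial art name
        "---",
    ]
    # Assumption: values are plain strings that are safe to emit unquoted.
    return "\n".join(lines) + "\n"
```

Keeping the fixed keys in one place makes it harder for individual subagents to drift from the template.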

### Filename pattern
Filenames follow: `<art>-<YYYY.MM.DD>-<instructor>.pdf`
- Extract the art, date, and instructor from the filename. The `日期` frontmatter field uses hyphens, so `2024.08.06` in the filename becomes `2024-08-06`.
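One way to parse the filename pattern is a single regular expression. This is a sketch under assumptions: `parse_note_filename` and `FILENAME_RE` are hypothetical names, the date is taken to be the first `-YYYY.MM.DD-` run in the name, and everything after it is treated as the instructor (so instructor names may contain hyphens).

```python
import re
from pathlib import Path

# <art>-<YYYY.MM.DD>-<instructor>, matched against the name without ".pdf".
FILENAME_RE = re.compile(
    r"^(?P<art>.+?)-(?P<y>\d{4})\.(?P<m>\d{2})\.(?P<d>\d{2})-(?P<instructor>.+)$"
)


def parse_note_filename(pdf_path: str) -> dict[str, str]:
    """Split a note filename into art, ISO-style date, and instructor."""
    match = FILENAME_RE.match(Path(pdf_path).stem)
    if match is None:
        raise ValueError(f"filename does not follow the pattern: {pdf_path}")
    return {
        "art": match["art"],
        # Dots in the filename become hyphens for the frontmatter date.
        "date": f"{match['y']}-{match['m']}-{match['d']}",
        "instructor": match["instructor"],
    }
```

For a hypothetical file `FMA-2024.08.06-Lee.pdf`, this returns the art `FMA`, the date `2024-08-06`, and the instructor `Lee`. Filenames that do not fit the pattern raise `ValueError` rather than producing a partially filled note.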

### Title conventions by art
- **FMA/Silat/SEAMA**: `# [Art] — [Instructor] 师傅`
- **八极拳 (Bajiquan)**: `# 八极拳 Lesson [NNN]` (identify lesson number from the handwritten notes if possible)
- **劈挂拳 (Piguaquan)**: `# 劈挂拳 — [Instructor] 师傅`
- **Other** (MMA, Muay Thai, Lethwei, etc.): `# [Art] — [Instructor] 师傅`
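The title conventions above reduce to one special case (八极拳) plus a shared pattern, which can be sketched as follows. The function name `build_title` is hypothetical, and leaving the `[NNN]` placeholder in place when no lesson number can be read from the notes is an assumption, not something the conventions specify.

```python
from typing import Optional


def build_title(art: str, instructor: str, lesson: Optional[str] = None) -> str:
    """Return the top-level heading line for a note."""
    if art == "八极拳":
        # Assumption: keep the placeholder when the lesson number is unknown.
        return f"# 八极拳 Lesson {lesson if lesson is not None else '[NNN]'}"
    # All other arts share the "[Art] — [Instructor] 师傅" pattern.
    return f"# {art} — {instructor} 师傅"
```

The lesson number is passed as a string so that whatever form appears in the handwritten notes is preserved as written.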

### 武术 field values
Use these canonical names: `FMA`, `Silat`, `SEAMA`, `八极拳`, `劈挂拳`, `MMA`, `Muay Thai`, `Lethwei`
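A small lookup can map the art parsed from a filename onto these canonical names. This is a sketch only: `canonical_art` is a hypothetical helper, and matching case-insensitively and returning `None` for anything outside the list are assumptions about how mismatches should be handled.

```python
from typing import Optional

CANONICAL_ARTS = (
    "FMA", "Silat", "SEAMA", "八极拳", "劈挂拳", "MMA", "Muay Thai", "Lethwei",
)


def canonical_art(name: str) -> Optional[str]:
    """Return the canonical spelling of `name`, or None if it is not listed."""
    key = name.strip().casefold()
    for art in CANONICAL_ARTS:
        if art.casefold() == key:
            return art
    # Assumption: unknown arts are left for the caller to decide on.
    return None
```

Returning `None` instead of raising leaves room for arts that are not on the list yet, which can then be reviewed by hand.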

## Subagent Prompt Template

When launching each Task subagent, provide:
- The list of PNG file paths to read visually
- The output `.md` file path
- The pre-filled YAML frontmatter
- The title to use
- An example of a completed conversion for reference
- Instructions to transcribe faithfully, preserving both English and Chinese as written
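The items above can be assembled into a single prompt string. The sketch below is illustrative: `build_subagent_prompt` is a hypothetical name, and the exact wording of each section is an assumption, since the document lists what to provide but not how to phrase it.

```python
def build_subagent_prompt(
    png_paths: list[str],
    output_md: str,
    frontmatter: str,
    title: str,
    example_md: str,
) -> str:
    """Combine the six required items into one prompt, in the listed order."""
    pages = "\n".join(f"- {path}" for path in png_paths)
    sections = [
        "Read these PNG page images visually, in order:\n" + pages,
        f"Write the transcription to: {output_md}",
        "Start the file with exactly this YAML frontmatter:\n" + frontmatter.rstrip("\n"),
        f"Use this title line: {title}",
        "Example of a completed conversion, for reference:\n" + example_md.rstrip("\n"),
        "Transcribe faithfully, preserving both English and Chinese as written.",
    ]
    return "\n\n".join(sections)
```

Passing the frontmatter and title in already filled means the subagent only has to transcribe, which keeps the metadata consistent across notes.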