open-source · Rust · graphs · regex · text-processing · pdf · cli · reference-manager
~1 week · Remote · Open
Citation Extraction Pipeline (Rust, Elaine CLI)
About the project
Extending a Rust CLI for reference management (Elaine)
Deterministic citation extraction from messy PDFs.
GitHub Repo: elaine-cli
The task
Improve an existing citation extraction pipeline. It works, but only on some PDFs.
Goal:
- improve parsing across formats
- handle messy layouts
- refine deterministic heuristics (no AI)
Inspired by citracer, but simpler and local-first.
Scoped task — not a full rewrite.
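As an illustration of the kind of deterministic heuristic this task involves, here is a minimal sketch of text normalization for extracted PDF text. It rejoins words split by a line-break hyphen and collapses whitespace. The function name `normalize` and its rules are assumptions made for this example; they are not taken from the Elaine codebase.

```rust
/// Hypothetical normalization pass (illustrative only, not Elaine's code).
/// - Drops a trailing hyphen when it looks like a soft line-break hyphen
///   (letter before it, lowercase letter starting the next line).
/// - Keeps other trailing hyphens ("10-" / "20", "Smith-" / "Jones") and
///   joins the lines without inserting a space.
/// - Collapses all remaining whitespace runs into single spaces.
fn normalize(raw: &str) -> String {
    let mut joined = String::with_capacity(raw.len());
    let mut lines = raw
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .peekable();

    while let Some(line) = lines.next() {
        if line.ends_with('-') && lines.peek().is_some() {
            let prev_is_letter = line
                .chars()
                .rev()
                .nth(1)
                .map_or(false, |c| c.is_alphabetic());
            let next_is_lower = lines
                .peek()
                .and_then(|n| n.chars().next())
                .map_or(false, |c| c.is_lowercase());

            if prev_is_letter && next_is_lower {
                // Soft hyphen: "deter-" + "ministic" -> "deterministic"
                joined.push_str(&line[..line.len() - 1]);
            } else {
                // Real hyphen or range: keep it, join without a space
                joined.push_str(line);
            }
        } else {
            joined.push_str(line);
            joined.push(' ');
        }
    }

    joined.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

The rule is deliberately conservative: a genuinely hyphenated compound that breaks before a lowercase word (for example "local-" / "first") would lose its hyphen, which is the sort of edge case the heuristics would need to weigh against real PDFs.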
What you're working with
Prototype exists:
- PDF extraction (pdftotext)
- normalization
- reference parsing
- CLI (eln trace)
You’ll get a working branch and build on it.
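To make the stages above concrete, here is a rough sketch of the extraction and reference-splitting steps using only the Rust standard library. It assumes `pdftotext` (from Poppler) is on the `PATH` and that references are numbered with bracketed markers such as `[1]`. The function names `extract_text`, `leading_marker`, and `split_references` are hypothetical and do not reflect how the existing prototype is structured.

```rust
use std::io;
use std::process::Command;

/// Run `pdftotext -layout <pdf> -` and return the text written to stdout.
/// Assumes the `pdftotext` binary is installed; hypothetical helper.
fn extract_text(pdf_path: &str) -> io::Result<String> {
    let output = Command::new("pdftotext")
        .args(["-layout", pdf_path, "-"])
        .output()?;
    if !output.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "pdftotext failed"));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Return the number from a leading "[n]" marker, if the line starts with one.
fn leading_marker(line: &str) -> Option<u32> {
    let rest = line.trim_start().strip_prefix('[')?;
    let (digits, _) = rest.split_once(']')?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// Split a reference section into entries. A new entry starts at each
/// "[n]" marker; other lines are treated as continuations of the current
/// entry. Lines before the first marker (e.g. a heading) are skipped.
fn split_references(section: &str) -> Vec<String> {
    let mut entries: Vec<String> = Vec::new();
    for line in section.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if leading_marker(line).is_some() {
            entries.push(line.to_string());
        } else if let Some(last) = entries.last_mut() {
            last.push(' ');
            last.push_str(line);
        }
    }
    entries
}
```

Bracketed numbering is only one of several reference styles. Author-year lists and "1." style numbering would need their own marker detection, which is where most of the format coverage work is likely to sit.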
Success
- Works on real PDFs
- Fewer parsing failures
- Clean, simple code
What you get
- Real parsing experience
- Work under constraints
- Open-source contribution
About Andrew Garcia
Founder of WorkDog
I like problems that seem complicated until you find the simpler way through them — sometimes after making them unnecessarily complicated first.
Full-stack developer. I build things that work for the user, not just things that make me feel smart.
Working on WorkDog: a platform for taking on projects and building proof of work.
Still building
~🐧
Oh! This is Markdown-compatible! So are project postings 😯
What you get
- A real project in your portfolio with your name on it
- A public review from the business on your WorkDog profile
- Proof of work that you can share anywhere: LinkedIn, GitHub, CV
- Direct experience working with a real client