First 2011 Hack

by Mark Luffel

Mark was at Octane to hack for the first time in many weeks.
Tejus arrived later and they chatted about Vertical Acuity, Pardot, SITA, Georgia Tech, etc.

Mark wrote some Python and installed nltk, pyPdf, xpdf, plasTeX, and probably some other software too. He’s building something that makes visual art out of abstract math.

Things learned:
1) Many PDFs don’t contain word breaks; they just position each letter at the right spot on the page. Gasp!
2) Installing pdftotext via MacPorts installs all sorts of stuff: OpenMotif, libxml2, xorg-libXdmcp, etc, etc, etc.
3) pdftolatex is pretty sweet
4) Apple’s Automator can convert PDFs into text, but by default it outputs UTF-16, which scares tools like diff into thinking the file is binary
5) arXiv.org hosts the LaTeX source for papers, which preserves lots of extra contextual information, which is awesome
6) The world sure is full of things
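
For the UTF-16 problem in item 4, here is a minimal sketch of one workaround: re-encode the file as UTF-8 before handing it to diff. The likely culprit is that ASCII text stored as UTF-16 is full of NUL bytes, which is what makes diff guess "binary". The function name utf16_to_utf8 is made up for illustration, and this is not necessarily what Mark did.

```python
def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 and return the decoded text."""
    # The "utf-16" codec uses the byte-order mark (if present) to pick
    # endianness and drops it from the decoded text.
    # newline="" keeps line endings exactly as they were in the source.
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return text
```

Calling it with a pair of paths (say, a hypothetical paper-utf16.txt and paper-utf8.txt) should leave a file that diff treats as ordinary text.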
