First 2011 Hack

Wednesday, February 16th, 2011

Mark was at Octane to hack for the first time in many weeks.
Tejus arrived later and they chatted about Vertical Acuity, Pardot, SITA, Georgia Tech, etc.

Mark wrote some Python and installed nltk, pyPdf, xpdf, plasTeX, and probably some other software too. He’s building something that makes visual art out of abstract math.

Things learned:
1) Many PDFs don’t contain word breaks; they just position each letter in the right spot on the page, gasp!
2) Installing pdftotext via MacPorts pulls in all sorts of dependencies: OpenMotif, libxml2, xorg-libXdmcp, and so on.
3) pdftolatex is pretty sweet
4) Apple’s Automator can convert PDFs into text, but by default it outputs UTF-16, which scares tools like diff into thinking the file is binary
5) hosts the LaTeX for papers, which maintains lots of extra contextual information, which is awesome
6) The world sure is full of things
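
Two of the lessons above (1 and 4) are easy to poke at in Python, so here is a rough sketch rather than what Mark actually wrote. It makes two assumptions: that the UTF-16 text file starts with a byte-order mark, and that glyphs arrive as hypothetical (char, x, width) tuples for a single line of text. Real PDF libraries expose positions in their own ways, and the function names and the gap_factor threshold are made up for illustration.

```python
import codecs


def to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 so line-based tools treat them as text.

    Assumption: UTF-16 input starts with a byte-order mark (BOM).
    Anything without a UTF-16 BOM is returned unchanged.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return data.decode("utf-16").encode("utf-8")
    return data


def glyphs_to_words(glyphs, gap_factor=0.3):
    """Group positioned glyphs from one text line into words.

    glyphs: iterable of (char, x, width) tuples (a hypothetical layout).
    A new word starts when the horizontal gap after the previous glyph
    is larger than gap_factor times that glyph's width.
    """
    words = []
    current = ""
    prev_end = None
    prev_width = None
    for char, x, width in sorted(glyphs, key=lambda g: g[1]):
        if prev_end is not None and x - prev_end > gap_factor * prev_width:
            words.append(current)
            current = ""
        current += char
        prev_end = x + width
        prev_width = width
    if current:
        words.append(current)
    return words
```

Running to_utf8 over the Automator output before handing it to diff should get a normal line-by-line comparison. For glyphs_to_words, the threshold is a guess and would probably need tuning per font and per PDF.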