Quick Start
Download the latest release:
curl -o textworks.tar.gz -L https://github.com/iesl/watr-works/releases/download/v0.8/textworks-0.8.tar.gz
tar xzvf textworks.tar.gz
and run
path/to/textworks/bin/textworks --help
for option syntax.
Examples
# Single file
textworks --input input.pdf --output-file output.json
# Multiple files specified in list, output will be input file name + ext
textworks --input-list ./files-to-process.txt --output-ext ".wtxt.json"
Corpus Management
# Run on all files in corpus (see Corpus Management for explanation)
textworks --corpus --output-file "extracted-text.json"
Output Formats
Text layout and analysis options
# Single file
textworks --input input.pdf --output-file output.json --options dehyphenate