Document Conversion with SensusAccess
SensusAccess is a self-service document conversion tool that is available to Tufts students, faculty, and staff. SensusAccess can be used to convert:
- Untagged or scanned PDF to tagged PDF
- Untagged or scanned PDF to MP3
- Untagged or scanned PDF to digital Braille
- Accessible Word document to EPUB, EPUB3 with media overlay, or Mobipocket
- TeX and LaTeX documents containing mathematical equations to HTML5 with MathML
- Visit SensusAccess Conversion Options for a complete list
To use SensusAccess, visit the Tufts SensusAccess: Alternate Media Made Easy portal.
Tagged PDF Conversion
SensusAccess can be used to convert untagged or image (scanned) PDF documents to a tagged PDF, which is marked up for screen reader compatibility.
To convert an untagged PDF:
- Go to the Tufts SensusAccess Portal.
- Choose File from the Source options.
- Click Choose Files and select the file you want to convert.
- Click Upload (there may be a delay while your file uploads).
- Choose your Target Format:
  - MP3 Audio will perform OCR (optical character recognition) and convert the recognized text to an MP3 audio file.
  - Braille will perform OCR and transcribe the recognized text so it can be embossed on a Braille embosser, displayed on a Braille display, or loaded onto a Braille notetaker.
  - E-book will perform OCR and convert the recognized text to a simple EPUB or Mobi (Kindle) format.
  - Accessibility Conversion will perform OCR and convert the recognized text to Word, HTML, plain text, or tagged PDF.
- Enter your tufts.edu email address and click Submit.
Tagged PDF Conversion Options
When selecting your target format for tagged PDF, you will see two options:
- Tagged PDF (text over image)
- Tagged PDF (image over text)
Selecting text over image causes untagged/scanned PDF documents to be OCR processed and returned with the recognized text in a layer on top of the original image. In most cases, presenting the recognized text on top of the original image results in much clearer text. However, logos and other graphical elements may appear blurred or even distorted.
Selecting image over text causes untagged/scanned PDF documents to be OCR processed and returned with the original image in a layer on top of the recognized text. Presenting the original image on top of the recognized text retains all original graphical elements, but the visual presentation of the text will not be sharpened.
The quality of the text recognition is identical in the two options.
Caveats
- SensusAccess does not repair inaccessible tagged PDF documents. It is designed for untagged/scanned PDFs and therefore treats every document as untagged. If you already have a tagged PDF that needs minor repair, SensusAccess is not a good option.
- SensusAccess does not guarantee WCAG or PDF/UA compliant results. Most documents converted to tagged PDF with SensusAccess will need further remediation to reach WCAG or PDF/UA compliance.
- SensusAccess cannot convert protected (DRM) documents. Documents must be unlocked before conversion.
- SensusAccess cannot recognize mathematical or scientific notation. The exceptions are notation that has been properly marked up in a Word document and converted to an advanced EPUB3 e-book, and equations in TeX and LaTeX source documents converted to HTML5 with MathML.
- SensusAccess can convert a PowerPoint file to PDF while retaining slide titles, logical reading order, and any alternative text provided by the author. However, if you choose to include slide notes, the slides will be converted as images, so be sure to test your output.
MP3 Audio Conversion
SensusAccess can convert a wide range of document types into speech-synthesized MP3 recordings. For some languages, SensusAccess offers several voice alternatives. When converting to MP3, the selected language of the speech synthesizer must match the natural language of the document.
When converting to MP3, users can select their preferred reading speed:
- slow, slower, slowest
- fast, faster, fastest
Note that the quality of the MP3 file depends entirely on the quality of the text recognition. Poor-quality scanned documents will not yield good results.
E-book Conversion
E-book formats let users modify fonts, font sizes, line spacing, colors, and contrast to meet the needs of users with low vision or reading disabilities such as dyslexia. E-books can be read in e-book readers such as Readium, Adobe Digital Editions, Apple iBooks, VI Reader, or Menestrello for mobile, or on devices such as the Amazon Kindle.
SensusAccess can be used to create simple e-books (EPUB or Mobi) or more advanced e-books (EPUB3 or EPUB3 with Media Overlay). Advanced e-books contain navigation, notes, and references, and support mathematical notation.
- For the best quality, start with an accessible Word (.docx) document (with semantic markup, metadata, language settings, mathematical markup, etc.).
- Converting an untagged/scanned PDF to an e-book results in a poor-quality e-book. Note: If you try to convert an untagged PDF to an advanced e-book format such as EPUB3, you will get an error saying the conversion failed.
TeX and LaTeX Conversion
Source documents in TeX and LaTeX containing mathematical equations can be converted into HTML5, with the mathematical content represented as MathML. Documents may be uploaded as individual .tex files or as self-contained ZIP archives.