Today, I was looking for a tool to extract the text from a PDF document. I found pdftotext, a command-line utility (it ships with Poppler and Xpdf) that is very easy to use:

pdftotext Customer_Verification_form_EN_v2.pdf #=> it creates Customer_Verification_form_EN_v2.txt

Below is a small Go program that runs pdftotext on a PDF document and prints the extracted text to the terminal:

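package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Run the pdftotext command-line tool. The trailing "-" makes it write
	// the extracted text to standard output instead of creating a .txt file.
	cmd := exec.Command("pdftotext", "Customer_Verification_form_EN_v2.pdf", "-")
	output, err := cmd.Output()
	if err != nil {
		// Fails if pdftotext is not installed or the PDF cannot be read.
		panic(err)
	}
	// Print the text as-is; fmt.Print avoids adding an extra trailing newline.
	fmt.Print(string(output))
}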
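
If the extraction is going to be reused later, it may help to wrap it in a function that returns an error instead of panicking. The following is only a sketch under a few assumptions: the function name extractText and the usage string are made up for illustration, and the PDF path is taken from the first command-line argument rather than hard-coded.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// extractText (hypothetical helper name) runs pdftotext on pdfPath and
// returns the extracted text. The trailing "-" tells pdftotext to write
// to standard output.
func extractText(pdfPath string) (string, error) {
	// Fail early with a clear message if pdftotext is not installed.
	if _, err := exec.LookPath("pdftotext"); err != nil {
		return "", fmt.Errorf("pdftotext not found in PATH: %w", err)
	}

	// Capture stderr so pdftotext's own message ends up in the error.
	var stderr bytes.Buffer
	cmd := exec.Command("pdftotext", pdfPath, "-")
	cmd.Stderr = &stderr

	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("pdftotext failed on %q: %w: %s", pdfPath, err, stderr.String())
	}
	return string(out), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pdftext <file.pdf>")
		return
	}
	text, err := extractText(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```

Returning the error lets the caller decide what to do when the tool is missing or the file is unreadable, which seems more convenient than a panic once other steps are chained after the extraction.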

Next, I will integrate it with the text-to-speech service provided by Google Cloud and see how that goes. The goal is to make a simple PDF reader with voice.
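
As a possible preparation for that step, the extracted text could be split into pages before it is sent anywhere. By default pdftotext puts a form feed character between pages, so splitting on it is one simple option. This is just a sketch: splitPages is a hypothetical helper, the sample string stands in for real pdftotext output, and no text-to-speech call is made here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPages (hypothetical helper) splits pdftotext output into pages on
// form feed characters and drops pages that contain only whitespace.
func splitPages(text string) []string {
	var pages []string
	for _, p := range strings.Split(text, "\f") {
		if p = strings.TrimSpace(p); p != "" {
			pages = append(pages, p)
		}
	}
	return pages
}

func main() {
	// Made-up sample standing in for real pdftotext output.
	sample := "Page one text\n\fPage two text\n\f"
	for i, page := range splitPages(sample) {
		fmt.Printf("page %d: %s\n", i+1, page)
	}
}
```

Working page by page would keep each request small and would let the reader start speaking before the whole document has been processed.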