This article is reproduced from stackoverflow.com.
latex tex pdflatex

Is there any way to extract text from a .tex file?

Published on 2020-04-16 12:10:20

I am writing a program to count words in a file. I am facing problems while parsing .tex files.

This code needs to run on a website, where it has to count the words in an uploaded file. I have managed to do it, but I am looking for a better solution:

case "application/x-tex": // Count whitespace-separated tokens that contain no '\'
            // try-with-resources closes the Scanner even if reading fails
            try (Scanner sc1 = new Scanner(new URL(URLPath).openStream())) {
                while (sc1.hasNext()) {
                    String str = sc1.next();
                    if (!str.contains("\\")) {
                        System.out.print(str + " ");
                        wordCount++;
                    }
                }
            } catch (IOException e) {
                System.out.println("There was a problem while reading File on the URL");
                break;
            }
            if (wordCount <= 0) {
                System.out.println("Total count is " + wordCount
                        + ". The uploaded File is either empty or it consists of Images only");
            } else {
                System.out.println("");
                System.out.println("**********");
                System.out.println("Word Count: " + wordCount);
                System.out.println("**********");
                System.out.println("");
            }
            break;

I am expecting a String output that I can then use to count the words.
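
As a rough illustration of the String output asked for here, the sketch below strips comments, \begin/\end markers and command names with regular expressions, then counts the remaining tokens. The class name TexTextExtractor, its method names and the short list of commands whose arguments are dropped are all assumptions made for this example. Regex stripping only approximates LaTeX parsing and will miscount math, user-defined macros and verbatim blocks.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code: a crude LaTeX-to-text filter.
final class TexTextExtractor {
    // An unescaped % starts a comment that runs to the end of the line.
    private static final Pattern COMMENT = Pattern.compile("(?<!\\\\)%.*");
    // Commands whose brace argument is not prose (assumed list, extend as needed).
    private static final Pattern NON_TEXT_ARG = Pattern.compile(
            "\\\\(?:documentclass|usepackage|label|ref|cite|includegraphics)"
            + "\\*?(?:\\[[^\\]]*\\])?\\{[^}]*\\}");
    // \begin{name} and \end{name} markers.
    private static final Pattern ENVIRONMENT = Pattern.compile("\\\\(?:begin|end)\\{[^}]*\\}");
    // Any other command name with an optional [..] argument; {..} content is kept.
    private static final Pattern COMMAND = Pattern.compile("\\\\[a-zA-Z]+\\*?(?:\\[[^\\]]*\\])?");
    // Escaped single characters such as \% or \&.
    private static final Pattern ESCAPED_CHAR = Pattern.compile("\\\\[^a-zA-Z]");
    // Leftover grouping and markup characters.
    private static final Pattern MARKUP = Pattern.compile("[{}$~^_&]");

    static String toPlainText(String tex) {
        String s = COMMENT.matcher(tex).replaceAll("");
        s = NON_TEXT_ARG.matcher(s).replaceAll(" ");
        s = ENVIRONMENT.matcher(s).replaceAll(" ");
        s = COMMAND.matcher(s).replaceAll(" ");
        s = ESCAPED_CHAR.matcher(s).replaceAll(" ");
        s = MARKUP.matcher(s).replaceAll(" ");
        return s.replaceAll("\\s+", " ").trim();
    }

    // Counts whitespace-separated tokens that contain at least one letter or digit.
    static int countWords(String plainText) {
        int count = 0;
        for (String token : plainText.split("\\s+")) {
            if (token.chars().anyMatch(Character::isLetterOrDigit)) {
                count++;
            }
        }
        return count;
    }
}
```

With this sketch, TexTextExtractor.toPlainText("Hello \\textbf{brave} world.") returns "Hello brave world.", and countWords on that result returns 3. Unlike the token filter in the question, it keeps the text inside arguments such as \textbf{...} and \section{...} instead of discarding the whole token.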

Questioner: Keshavram Kuduwa
Viewed: 40
4,629 2020-02-04 17:02

// Trigger perl script

URL website = new URL(URLPath);
Path path = Paths.get("myfile.tex");
bufferFiles.add(new File("myfile.tex")); // same name and case as the path above
try (InputStream in = website.openStream()) {
    Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
}

URL texcount = new URL("https://papertrue.s3.us-west-1.amazonaws.com/draft/77e3c992-b70f-4711-8b9b-eaf390617bb8");
Path path1 = Paths.get("texcount.pl");
bufferFiles.add(new File("texcount.pl"));
try (InputStream in = texcount.openStream()) {
    Files.copy(in, path1, StandardCopyOption.REPLACE_EXISTING);
}

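wordCount = 0;
try {
    // Note: this runs the copy under /etc/papertrue, not the texcount.pl downloaded above
    Process process = Runtime.getRuntime().exec("/etc/papertrue/texcount.pl myfile.tex");
    // try-with-resources closes the reader once the script's output is consumed
    try (BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
        String line;
        while ((line = br.readLine()) != null) {
            // Lines are expected to look like "<label>: <value>"; keep the last field
            String[] tem = line.split(":\\s");
            String last = tem[tem.length - 1].trim();
            log.debug(last);
            try {
                wordCount += Integer.parseInt(last);
            } catch (NumberFormatException e) {
                // The last field is not a number (e.g. a file name); skip this line
            }
        }
    }

    if (process.waitFor() == 0) {
        log.debug("Command Successful");
    } else {
        log.debug("Command Failure");
    }
    log.debug("Word count: " + wordCount);
} catch (IOException e) {
    log.debug("There was a problem while running texcount.pl");
    e.printStackTrace();
    break;
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
    break;
}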
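
The loop above adds up every line that ends in a number, so any numeric line that is not a word count (a count of headers or formulas, say) would inflate the total. Below is a minimal sketch of a stricter parser, assuming the word-count lines start with "Words", as in "Words in text: 120". That wording, the class name TexcountOutput and the method name sumWordLines are assumptions for illustration, so check them against the output of the texcount.pl version actually installed.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: sums only the lines that look like "Words ...: <number>".
final class TexcountOutput {
    // Assumed line shape: starts with "Words", ends with ": <digits>".
    private static final Pattern WORD_LINE =
            Pattern.compile("^Words\\b.*:\\s*(\\d+)\\s*$");

    static int sumWordLines(List<String> lines) {
        int total = 0;
        for (String line : lines) {
            Matcher m = WORD_LINE.matcher(line.trim());
            if (m.matches()) {
                // Group 1 holds the trailing number of a "Words ..." line.
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }
}
```

To use it, collect the lines read from the process into a List<String> and pass them to TexcountOutput.sumWordLines once the reader is exhausted, instead of parsing each line inside the read loop.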