This article is reproduced from serverfault.com.

iText7: com.itextpdf.kernel.PdfException: Dictionary doesn't have supported font data

Posted on 2020-12-03 08:14:52

I am trying to generate a TOC (table of contents) for my PDF. To do that, I want to extract the strings in xxx.pdf that look like chapter titles, using an ITextExtractionStrategy. But when I run a test, I get a com.itextpdf.kernel.PdfException.

Here is my code:

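    import java.io.ByteArrayOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfPage;
    import com.itextpdf.kernel.pdf.PdfReader;
    import com.itextpdf.kernel.pdf.PdfWriter;
    import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
    import com.itextpdf.kernel.pdf.canvas.parser.listener.SimpleTextExtractionStrategy;
    import com.itextpdf.layout.Document;
    import com.itextpdf.layout.element.Paragraph;
    import com.itextpdf.layout.element.Text;
    import com.itextpdf.layout.property.TextAlignment;

    @org.junit.Test
    public void test() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PdfDocument pdfDoc = new PdfDocument(new PdfReader("src/test/resources/template/xxx.pdf"),
                new PdfWriter(baos));

        // Insert a new, empty page at position 1 for the table of contents.
        pdfDoc.addNewPage(1);
        Document document = new Document(pdfDoc);

        // When this paragraph is added, processPageContent() below throws
        // com.itextpdf.kernel.PdfException: Dictionary doesn't have supported font data.
        Paragraph title = new Paragraph(new Text("index"))
                .setTextAlignment(TextAlignment.CENTER);
        document.add(title);

        // Page numbers are 1-based, so iterate up to and including the last page.
        SimpleTextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
        for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
            PdfPage page = pdfDoc.getPage(i);
            PdfCanvasProcessor parser = new PdfCanvasProcessor(extractionStrategy);
            parser.processPageContent(page);
        }
        ...

        document.close(); // closing the Document also closes pdfDoc
        pdfDoc.close();
        try (FileOutputStream fos = new FileOutputStream("./yyy.pdf")) {
            fos.write(baos.toByteArray());
        }
    }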

Here is the output:

    com.itextpdf.kernel.PdfException: Dictionary doesn't have supported font data.
        at com.itextpdf.kernel.font.PdfFontFactory.createFont(PdfFontFactory.java:123)
        at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.getFont(PdfCanvasProcessor.java:490)
        at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$SetTextFontOperator.invoke(PdfCanvasProcessor.java:811)
        at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.invokeOperator(PdfCanvasProcessor.java:454)
        at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:282)
        at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:303)
        at com.example.pdf.util.Test.test(Test.java:138)
Questioner: Bottle

Answer by mkl, 2020-12-03 17:44:58

Whenever you add content to a PdfDocument, as you do here:

    Document document = new Document(pdfDoc);

    Paragraph title = new Paragraph(new Text("index"))
        .setTextAlignment(TextAlignment.CENTER);
    document.add(title);

you have to be aware that this content is not yet stored in its final form; for example, the fonts it uses have not yet been properly subset. The final form is generated only when you close the document.

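Text extraction, on the other hand, requires the content being extracted to be in its final form. That is what your stack trace shows: when PdfCanvasProcessor reaches the font selection operator of the new paragraph, PdfFontFactory.createFont is handed a font dictionary that is not complete yet, so it reports that the dictionary doesn't have supported font data.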

Thus, you should not apply text extraction to a document you're still working on. In particular, don't apply text extraction to a page whose content you have changed.

If you need to extract text from documents you create yourself, close your document first, open a new document from the output, and extract from that new document.
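A minimal sketch of that close-then-reopen approach, assuming iText 7.x (the question's stack trace, with com.itextpdf.kernel.PdfException, points to 7.1.x). To stay self-contained it builds its own one-page PDF in memory instead of reading xxx.pdf; the class and method names (ExtractAfterClose, createPdf, extractPages) are hypothetical. The key point is that extraction runs only on the bytes produced after Document.close(), through a fresh read-only PdfDocument.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.SimpleTextExtractionStrategy;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ExtractAfterClose {

    /** Hypothetical helper: writes a one-page PDF into memory and closes it. */
    static byte[] createPdf() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PdfDocument pdfDoc = new PdfDocument(new PdfWriter(baos));
        Document document = new Document(pdfDoc);
        document.add(new Paragraph("index"));
        // Closing the Document also closes pdfDoc; only now is the
        // content (including font data) written in its final form.
        document.close();
        return baos.toByteArray();
    }

    /** Hypothetical helper: reopens finished PDF bytes read-only and extracts each page's text. */
    static List<String> extractPages(byte[] pdfBytes) throws IOException {
        List<String> pages = new ArrayList<>();
        try (PdfDocument reopened = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdfBytes)))) {
            for (int i = 1; i <= reopened.getNumberOfPages(); i++) {
                // A fresh strategy per page, so each entry holds only that page's text.
                pages.add(PdfTextExtractor.getTextFromPage(
                        reopened.getPage(i), new SimpleTextExtractionStrategy()));
            }
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        List<String> pages = extractPages(createPdf());
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("Page " + (i + 1) + ": " + pages.get(i));
        }
    }
}
```

In the question's test, the equivalent would be to call extractPages(baos.toByteArray()) after document.close(), rather than running PdfCanvasProcessor on pages of the still-open pdfDoc.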