
Difficult workflow writing Latex book full of Python code

Posted on 2020-04-16 11:45:46

I'm writing a book on coding in Python using LaTeX. I plan to have a lot of text with Python code interspersed throughout, along with its output. What's really giving me trouble is that whenever I need to go back and edit my Python code, it's a huge pain to get it back nicely into my latest document.

I've done a whole lot of research and can't seem to find a good solution.

This one includes whole files at once, which doesn't solve my problem: https://tex.stackexchange.com/questions/289385/workflow-for-including-jupyter-aka-ipython-notebooks-as-pages-in-a-latex-docum

Same with this one: http://blog.juliusschulz.de/blog/ultimate-ipython-notebook

Found Solution 1 (awful)

I can copy and paste Python code into LaTeX well enough using the listings package.

Pros:

  1. Easy to update only a small section of code.

Cons:

  1. For the output, I need to run the code in Python, then copy and paste the result separately.
  2. Initial writing is SLOW: I need to repeat this process hundreds of times per chapter.

Found Solution 2 (bad)

Use a Jupyter notebook with markdown, export it to LaTeX, and \include the file into the main LaTeX document.

Pros:

  1. Streamlined
  2. Has output contained within.

Cons:

  • To make small changes, I need to re-import the whole document, and any changes made to the markdown text within the LaTeX editor are lost.
  • Renaming a single variable in Python after the notebook has been exported could take hours.
  • Editing seems like a giant chore.

Ideal solution

  • Write the text in LaTeX.
  • Write the Python in a Jupyter notebook and export it to LaTeX.
  • Somehow include code snippets (small sections of the exported file) in different parts of the main LaTeX book. This is the part I can't figure out.
  • When Python changes are needed, make the changes in Jupyter, then re-export as a LaTeX file with the same name.
  • The LaTeX book is automatically updated through the includes.

The key here is that the exported python notebook is being split up and sent to different parts of the document. In order for that to work it needs to somehow be tagged or marked in the markdown or code of the notebook, so when I re-export it those same parts get sent to the same spots in the book.

Pros:

  1. Python edits are easy and propagate easily back to the book.
  2. The text is written in LaTeX, so I can use the full power of LaTeX.

Any help in coming up with a solution closer to my ideal solution would be much appreciated. It's killing me.

It probably doesn't matter, but I'm writing both the LaTeX and the Jupyter notebooks in VS Code. I'm open to changing tools if it means solving these problems.

Questioner: Adam B
Answer by ymonad (2019-03-09 21:32)

Here's a small script I wrote. It splits a single *.ipynb file and converts it to multiple *.tex files.

Usage is:

  1. Copy the following script and save it as something like main.py.
  2. Run python main.py init. It will create main.tex and style_ipython_custom.tplx.
  3. In your Jupyter notebook, add an extra line such as #latex:tag_a or #latex:tag_b to each cell you want to extract. Cells with the same tag are extracted to the same *.tex file.
  4. Save it as a *.ipynb file. Fortunately, the current VSCode Python plugin supports exporting to *.ipynb; alternatively, use jupytext to convert from *.py to *.ipynb.
  5. Run python main.py path/to/your.ipynb. It will create tag_a.tex and tag_b.tex.
  6. Edit main.tex and add \input{tag_a.tex} or \input{tag_b.tex} wherever you want.
  7. Run pdflatex main.tex. It will produce main.pdf.
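Steps 3 and 5 can be pictured with a small standalone sketch. It uses only the standard library and plain strings in place of real notebook cells, so the cell contents and the plan_outputs helper are made up for illustration; only the #latex: pattern is taken from the script below.

```python
import re
from collections import defaultdict

# Made-up cell sources standing in for the code cells of a notebook.
cell_sources = [
    "#latex:tag_a\nx = 1\nprint(x)",
    "#latex:tag_b\ny = 2",
    "#latex:tag_a\nprint(x + 1)",
    "z = 3  # untagged, so it is skipped",
]

# Same pattern the script uses to find the tag line.
TAG_LINE = re.compile(r'^#latex:(.*?)$(\n?)', re.M)


def plan_outputs(sources):
    """Map each output file name to the cell sources it would contain."""
    plan = defaultdict(list)
    for src in sources:
        m = TAG_LINE.search(src)
        if m is None:
            continue  # no tag line: the cell is not exported
        tag = m.group(1).strip()
        # Drop the tag line itself so it does not show up in the book.
        stripped = src[:m.start(0)] + src[m.end(0):]
        plan[tag + ".tex"].append(stripped)
    return dict(plan)


plan = plan_outputs(cell_sources)
for name, cells in plan.items():
    print(name, "<-", len(cells), "cell(s)")
```

Both tag_a cells end up in tag_a.tex in notebook order, which is what lets one \input pull in several related cells at once.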

The idea behind this script:

Converting a Jupyter notebook to LaTeX with the default nbconvert.LatexExporter produces a complete LaTeX file that includes macro definitions. Converting each cell that way may create large LaTeX files. To avoid this problem, the script first creates main.tex, which contains only the macro definitions, and then converts each tagged group of cells to a LaTeX file that has no macro definitions. This is done with a custom template file that is slightly modified from style_ipython.tplx.

Tagging or marking the cells might be done using cell metadata, but I could not find how to set it in the VSCode Python plugin (Issue), so instead the script scans the source of each cell with the regex pattern ^#latex:(.*) and removes the matching line before converting the cell to a LaTeX file.
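For comparison, here is a rough sketch of the metadata route, assuming the tags were set by some other tool such as the Jupyter UI. The notebook format stores cell tags as a list under metadata.tags; the "latex:" prefix, the inline notebook, and the group_by_metadata_tag helper are assumptions made for this example and are not part of the script.

```python
import json
from collections import defaultdict

TAG_PREFIX = "latex:"  # hypothetical convention, chosen for this sketch

# A tiny nbformat 4 notebook written inline instead of being read from disk.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {"tags": ["latex:tag_a"]},
         "source": ["x = 1\n", "print(x)"]},
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {},
         "source": ["scratch = 0"]},
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {"tags": ["latex:tag_a", "slow"]},
         "source": "print(x + 1)"},
    ],
})


def group_by_metadata_tag(nb_text):
    """Group cell sources by every metadata tag that starts with TAG_PREFIX."""
    nb = json.loads(nb_text)
    groups = defaultdict(list)
    for cell in nb["cells"]:
        source = cell["source"]
        if isinstance(source, list):  # on disk, source may be a list of lines
            source = "".join(source)
        for tag in cell.get("metadata", {}).get("tags", []):
            if tag.startswith(TAG_PREFIX):
                groups[tag[len(TAG_PREFIX):]].append(source)
    return dict(groups)


groups = group_by_metadata_tag(notebook_json)
print(groups)
```

With this approach nothing has to be stripped from the cell source, but it only helps if the editor in use can actually write cell tags.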

Source:

import sys
import re
import os
from collections import defaultdict
import nbformat
from nbconvert import LatexExporter, exporters

OUTPUT_FILES_DIR = './images'
CUSTOM_TEMPLATE = 'style_ipython_custom.tplx'
MAIN_TEX = 'main.tex'


def create_main():
    # create `main.tex`, which contains only the macro definitions
    latex_exporter = LatexExporter()
    book = nbformat.v4.new_notebook()
    book.cells.append(
        nbformat.v4.new_raw_cell(r'\input{__your_input__here.tex}'))
    (body, _) = latex_exporter.from_notebook_node(book)
    with open(MAIN_TEX, 'x') as fout:
        fout.write(body)
    print("created:", MAIN_TEX)


def init():
    create_main()
    latex_exporter = LatexExporter()
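    # copy `style_ipython.tplx` from the `nbconvert.exporters` module to the
    # current directory, and modify it so that it does not contain macro definitions
    # (this assumes an nbconvert release that still ships `.tplx` templates,
    # i.e. the 5.x series; later releases reorganized the template system)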
    tmpl_path = os.path.join(
        os.path.dirname(exporters.__file__),
        latex_exporter.default_template_path)
    src = os.path.join(tmpl_path, 'style_ipython.tplx')
    target = CUSTOM_TEMPLATE
    with open(src) as fsrc:
        with open(target, 'w') as ftarget:
            for line in fsrc:
                # replace the line so that it does not contain macro definitions
                if line == "((*- extends 'base.tplx' -*))\n":
                    line = "((*- extends 'document_contents.tplx' -*))\n"
                ftarget.write(line)
    print("created:", CUSTOM_TEMPLATE)


def group_cells(note):
    # scan each cell's source for a tag matching the regexp `^#latex:(.*)`;
    # cells that share a tag are collected into the same list
    pattern = re.compile(r'^#latex:(.*?)$(\n?)', re.M)
    group = defaultdict(list)
    for num, cell in enumerate(note.cells):
        m = pattern.search(cell.source)
        if m:
            tag = m.group(1).strip()
            # remove the line which contains tag
            cell.source = cell.source[:m.start(0)] + cell.source[m.end(0):]
            group[tag].append(cell)
        else:
            print("tag not found in cell number {}. ignore".format(num + 1))
    return group


def doit():
    # notebooks are UTF-8 JSON, so do not rely on the platform default encoding
    with open(sys.argv[1], encoding='utf-8') as f:
        note = nbformat.read(f, as_version=4)
    group = group_cells(note)
    latex_exporter = LatexExporter()
    # use the template which does not contain LaTex macro definition
    latex_exporter.template_file = CUSTOM_TEMPLATE
    os.makedirs(OUTPUT_FILES_DIR, exist_ok=True)
    for (tag, g) in group.items():
        book = nbformat.v4.new_notebook()
        book.cells.extend(g)
        # unique_key will be prefix of image
        (body, resources) = latex_exporter.from_notebook_node(
            book,
            resources={
                'output_files_dir': OUTPUT_FILES_DIR,
                'unique_key': tag
            })
        ofile = tag + '.tex'
        with open(ofile, 'w', encoding='utf-8') as fout:
            fout.write(body)
            print("created:", ofile)
        # the image data which is embedded as base64 in notebook
        # will be decoded and returned in `resources`, so write it to file
        for filename, data in resources.get('outputs', {}).items():
            with open(filename, 'wb') as fres:
                fres.write(data)
                print("created:", filename)


if len(sys.argv) <= 1:
    print("USAGE: this_script [init|yourfile.ipynb]")
elif sys.argv[1] == "init":
    init()
else:
    doit()
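Once the *.tex files exist, a quick check that every \input in main.tex points at a generated file can catch a renamed or misspelled tag before running pdflatex. The following is only a sketch using the standard library; the missing_inputs helper and the sample strings are hypothetical and are not part of the script above.

```python
import re

# Matches \input{...}; commented-out lines are not filtered in this sketch.
INPUT_RE = re.compile(r'\\input\{([^}]+)\}')


def missing_inputs(tex_source, available_files):
    """Return the \\input targets in tex_source that are not in available_files."""
    missing = []
    for name in INPUT_RE.findall(tex_source):
        name = name.strip()
        # LaTeX allows \input{name} without the extension, so normalise it.
        filename = name if name.endswith(".tex") else name + ".tex"
        if filename not in available_files:
            missing.append(filename)
    return missing


# Made-up stand-ins for the contents of main.tex and the generated files.
main_tex = r"""
\section{Intro}
\input{tag_a.tex}
Some prose.
\input{tag_c.tex}
"""
generated = {"tag_a.tex", "tag_b.tex"}

print(missing_inputs(main_tex, generated))
```

In real use, tex_source would be the text of main.tex and available_files the names of the files the script reported as created. The placeholder \input{__your_input__here.tex} written by create_main would be reported as missing until it is replaced.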