Preserve text format on read/write to shape text python pptx

scanny 2020-02-02 06:05

@usr2564301 is on the right track. Character formatting (a.k.a. "font") is specified at the run level. That is what a run is: a "run" (sequence) of characters that all share the same character formatting.

When you assign to shape.text you replace all the runs that used to be there with a single new run having default formatting. If you want to preserve formatting you need to preserve whatever runs are not directly involved in the text replacement.

This is not a trivial problem because there is no guarantee runs break on word boundaries. Try printing out the runs for a few paragraphs and I think you'll see what I mean.
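To see where the run boundaries actually fall, something like the following might help. It is only a sketch: describe_runs is a name made up for illustration, and it relies on nothing beyond shape.has_text_frame, text_frame.paragraphs, paragraph.runs and run.text.

```python
def describe_runs(shape):
    """Return one (paragraph_idx, run_idx, text) tuple per run in the shape."""
    rows = []
    if not shape.has_text_frame:
        return rows  # pictures, connectors, etc. carry no text
    for p_idx, paragraph in enumerate(shape.text_frame.paragraphs):
        for r_idx, run in enumerate(paragraph.runs):
            rows.append((p_idx, r_idx, run.text))
    return rows


# Possible usage with python-pptx ("deck.pptx" is a placeholder file name):
#     from pptx import Presentation
#     for shape in Presentation("deck.pptx").slides[0].shapes:
#         for row in describe_runs(shape):
#             print(row)
```

Runs often split in the middle of a word, for example after a spelling correction or a partial formatting change made in PowerPoint.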

In rough pseudocode, I think this is the approach you would need to take:

1. Search for the target text in the paragraph to determine the offset of its first character.
2. Traverse all the runs in the paragraph, keeping a running total of how many characters there are before each run, maybe something like (run_idx, prefix_len, length): (0, 0, 8), (1, 8, 4), (2, 12, 9), etc.
3. Identify which runs are the starting, ending, and in-between runs involving your search string.
4. Split the first run at the start of the search term, split the last run at the end of the search term, and delete all but the first of the "middle" runs.
5. Change the text of the middle run to the replacement text and clone the formatting from the prior (original start) run. Maybe this last bit you do at split-start time.

This preserves any runs that do not involve the search string and preserves the formatting of the "matched" word in the "replaced" word.
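A rough sketch of the offset bookkeeping from the steps above, in Python. Note that it swaps in a simpler technique than true run splitting: instead of splitting runs with lxml, it writes the replacement into the first run the match touches and trims the matched characters out of the later ones, so it only needs paragraph.runs and run.text. The function names are made up for illustration, and emptied runs are left in place rather than deleted.

```python
def run_offsets(runs):
    """Return [(run_idx, prefix_len, length), ...] for a sequence of runs."""
    offsets, prefix_len = [], 0
    for idx, run in enumerate(runs):
        length = len(run.text)
        offsets.append((idx, prefix_len, length))
        prefix_len += length
    return offsets


def replace_in_paragraph(paragraph, search, replacement):
    """Replace the first match of `search`, touching only the runs it spans.

    Returns True if a replacement was made. Simplification (not run
    splitting): the replacement text takes the formatting of the first
    run involved in the match.
    """
    if not search:
        return False
    runs = list(paragraph.runs)
    start = "".join(run.text for run in runs).find(search)
    if start == -1:
        return False
    end = start + len(search)

    first = True
    for idx, prefix_len, length in run_offsets(runs):
        run_start, run_end = prefix_len, prefix_len + length
        if run_end <= start or run_start >= end:
            continue  # this run is not involved in the match
        text = runs[idx].text
        head = text[: max(start - run_start, 0)]  # text before the match
        tail = text[end - run_start:]             # text after the match
        runs[idx].text = head + (replacement if first else "") + tail
        first = False
    return True
```

One caveat: line breaks and fields in a paragraph are not runs, so the joined run text can differ from paragraph.text. A search string that spans one of those would not be found by this sketch.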

This requires a few operations that are not directly supported by the current API. For those you'd need to use lower-level lxml calls to directly manipulate the XML, although you could get hold of all the existing elements you need from python-pptx objects without ever having to parse in the XML yourself.

Michael Berk 2020-02-03 05:16:28

Thanks for the answer - it worked! I wrote a less efficient solution, but it was very straightforward to code: 1) Iterate through the runs and store each run's text and format. 2) Perform the text replacement on the stored text. 3) Clear the paragraph and write each run back (with its new text and format). Note: because you can't write to run.font, you have to choose which font properties to store (which isn't ideal).
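One possible reading of those three steps, as a sketch only (this is not the commenter's actual code, which was not posted). The name rewrite_runs and the FONT_ATTRS selection are assumptions; the sketch uses paragraph.clear(), paragraph.add_run() and a handful of run.font properties.

```python
# Which font properties to carry over is a choice; color is left out here.
FONT_ATTRS = ("name", "size", "bold", "italic", "underline")


def rewrite_runs(paragraph, search, replacement):
    """Store each run's text and font, clear the paragraph, write runs back."""
    # 1) Store text and the chosen font properties for every run.
    saved = [
        (run.text, {attr: getattr(run.font, attr) for attr in FONT_ATTRS})
        for run in paragraph.runs
    ]
    # 3) Clear the paragraph, then re-create one run per stored entry.
    paragraph.clear()
    for text, font_props in saved:
        new_run = paragraph.add_run()
        # 2) The replacement happens per run, on the stored text.
        new_run.text = text.replace(search, replacement)
        for attr, value in font_props.items():
            if value is not None:  # None means "inherited"; leave it unset
                setattr(new_run.font, attr, value)
```

Because the replacement is applied run by run, this version only catches a search string that sits entirely inside a single run.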

Guilherme Henrique Mendes 2020-03-30 05:26:59

@MichaelBerk Can you share your code/solution?
