Note: This article is reproduced from stackoverflow.com.
csv pipe python

"_csv.Error: line contains NULL byte" after truncating a csv log file that is being piped to by anot

Posted on 2020-04-11 11:49:49

How do I truncate a CSV log file that another process is writing to via stdout redirection, without generating a _csv.Error: line contains NULL byte error?

I have one process running rtlamr > log/readings.txt, which writes radio signal data to readings.txt. I don't think it matters what is writing to the file; any long-running process redirected to a file will do.

I have a file watcher using watchdog (Python file watcher) on that file, which triggers a function when the file is changed. The function reads the file and updates a database.

Then I try to truncate readings.txt (or back it up) so that it doesn't grow indefinitely.

file = open(dir_path+'/log/readings.txt', "w")
file.truncate()
file.close()

This corrupts readings.txt and generates the error (the start of the file contains garbage characters).

I tried moving the file instead of truncating it, in the hopes that rtlamr will recreate a fresh file, but that only has the effect of stopping the pipe.

EDIT: I noticed that the charset changes from us-ascii to binary, but attempting to truncate the file with file = open(dir_path+'/log/readings.log', "w",encoding="us-ascii") has no effect.

Questioner: metalaureate (viewed 66 times)

Answer by ivan_pozdeev, 2020-02-02 08:31

If you truncate a file [1] while another process has it open in w mode, that process will continue to write at its old offset, making the file sparse. Everything before that offset will thus be read as zeros (NUL bytes), which is what the csv module complains about.

As per x11 - Concurrent writing to a log file from many processes - Unix & Linux Stack Exchange and Can two Unix processes simultaneous write to different positions in a single file?, each process that has a file open has its own offset in it, and an ftruncate() issued by another process doesn't change that.
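The effect described above can be reproduced in a single script. This is a minimal sketch, assuming a POSIX system; the function name truncate_under_writer and the sample rows are made up for illustration. The "producer" uses the same open flags the shell uses for cmd > file, and the "consumer" truncates through a separate handle as in the question.

```python
import os
import tempfile

def truncate_under_writer(path):
    # Producer: same flags the shell uses for `cmd > file` (no O_APPEND).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(fd, b"1,2,3\n" * 3)  # producer offset is now 18

    # Consumer: truncate through a separate handle, as in the question.
    open(path, "w").close()

    # The producer's offset is unchanged, so this write lands at byte 18
    # and the filesystem fills bytes 0..17 with zeros.
    os.write(fd, b"4,5,6\n")
    os.close(fd)

    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    data = truncate_under_writer(os.path.join(tmp, "readings.txt"))

# On Linux: 18 NUL bytes followed by b'4,5,6\n'
print(data)
```

Those leading NUL bytes are the "garbage characters" at the start of the file.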

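If you want the other process to react to truncation, it needs to have the file open in a (append) mode, in which every write goes to the current end of the file. With shell redirection, that means >> instead of >.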
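A minimal sketch of the append-mode behaviour, again assuming a POSIX system (the function name truncate_under_appender and the sample rows are hypothetical). The producer opens the file with O_APPEND, which is what the shell does for cmd >> file, so a write after the truncation goes to the new end of the file instead of the stale offset.

```python
import os
import tempfile

def truncate_under_appender(path):
    # Producer: flags the shell uses for `cmd >> file`.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.write(fd, b"1,2,3\n" * 3)

    # Consumer: truncate through a separate handle.
    open(path, "w").close()

    # With O_APPEND each write is positioned at the current end of file,
    # which is byte 0 right after the truncation.
    os.write(fd, b"4,5,6\n")
    os.close(fd)

    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    data_append = truncate_under_appender(os.path.join(tmp, "readings.txt"))

# On Linux: b'4,5,6\n', with no NUL bytes
print(data_append)
```

This only removes the NUL bytes; it does not make the read-then-truncate sequence safe against losing data.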


Your approach has fundamental flaws, too. For example, it's not atomic: you may (and eventually will) truncate the file after the producer has added data but before you have read it, so that data would be lost.

Consider using dedicated data buffering utilities such as buffer or pv instead, as per Add a big buffer to a pipe between two commands.
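If the producer is connected to the consumer through a pipe rather than a file, there is nothing to truncate. The following is a rough sketch of what the consuming side could look like, not the asker's actual code: consume and handle_row are hypothetical names, and it assumes the producer emits one CSV record per line on stdout. An in-memory stream stands in for the pipe here so that the example runs on its own.

```python
import csv
import io

def consume(stream, handle_row):
    """Read CSV rows from a text stream until EOF, passing each to handle_row."""
    count = 0
    for row in csv.reader(stream):
        if row:  # csv.reader yields an empty list for a blank line; skip it
            handle_row(row)
            count += 1
    return count

# Stand-in for the pipe; in real use, pass sys.stdin instead.
rows = []
n = consume(io.StringIO("1,2,3\n\n4,5,6\n"), rows.append)
print(n, rows)  # 2 [['1', '2', '3'], ['4', '5', '6']]
```

With sys.stdin as the stream, the script would sit at the receiving end of a shell pipeline after the producer, and a buffering utility could be placed between the two if the consumer is sometimes slower than the producer.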


[1] Calling truncate() is superfluous because open(mode='w') already truncates the file. Either truncate or reopen; there is no need to do both.