So I'm trying to crate a tar.gz file file from multiple directories and files. Something with the same usage as:
tar -cvzf sometarfile.tar.gz somedir/ someotherdir/ somefile.json somefile.xml
Assuming the directories have other directories inside of them. I have this as an input:
paths := []string{
"somedir/",
"someotherdir/",
"somefile.json",
"somefile.xml",
}
and using these:
func TarFilesDirs(paths []string, tarFilePath string ) error {
// set up the output file
file, err := os.Create(tarFilePath)
if err != nil {
return err
}
defer file.Close()
// set up the gzip writer
gz := gzip.NewWriter(file)
defer gz.Close()
tw := tar.NewWriter(gz)
defer tw.Close()
// add each file/dir as needed into the current tar archive
for _,i := range paths {
if err := tarit(i, tw); err != nil {
return err
}
}
return nil
}
func tarit(source string, tw *tar.Writer) error {
info, err := os.Stat(source)
if err != nil {
return nil
}
var baseDir string
if info.IsDir() {
baseDir = filepath.Base(source)
}
return filepath.Walk(source,
func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
header, err := tar.FileInfoHeader(info, info.Name())
if err != nil {
return err
}
if baseDir != "" {
header.Name = filepath.Join(baseDir, strings.TrimPrefix(path, source))
}
if err := tw.WriteHeader(header); err != nil {
return err
}
if info.IsDir() {
return nil
}
file, err := os.Open(path)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(tw, file)
if err != nil {
log.Println("failing here")
return err
}
return err
})
}
Problem: if a directory is large I'm getting:
archive/tar: write too long
error, when I remove it everything works.
Ran out of ideas and wasted many hours on this trying to find a solution...
Any ideas?
Thanks
I was having a similar issue until I looked more closely at the tar.FileInfoHeader doc:
FileInfoHeader creates a partially-populated Header from fi. If fi describes a symlink, FileInfoHeader records link as the link target. If fi describes a directory, a slash is appended to the name. Because os.FileInfo's Name method returns only the base name of the file it describes, it may be necessary to modify the Name field of the returned header to provide the full path name of the file.
Essentially, FileInfoHeader isn't guaranteed to fill out all the header fields before you write it with WriteHeader, and if you look at the implementation the Size field is only set on regular files. Your code snippet seems to only handle directories, this means if you come across any other non regular file, you write the header with a size of zero then attempt to copy a potentially non-zero sized special file on disk into the tar. Go returns ErrWriteTooLong to stop you from creating a broken tar.
I came up with this and haven't had the issue since.
if err := filepath.Walk(directory, func(path string, info os.FileInfo, err error) error {
if err != nil {
return check(err)
}
var link string
if info.Mode()&os.ModeSymlink == os.ModeSymlink {
if link, err = os.Readlink(path); err != nil {
return check(err)
}
}
header, err := tar.FileInfoHeader(info, link)
if err != nil {
return check(err)
}
header.Name = filepath.Join(baseDir, strings.TrimPrefix(path, directory))
if err = tw.WriteHeader(header); err != nil {
return check(err)
}
if !info.Mode().IsRegular() { //nothing more to do for non-regular
return nil
}
fh, err := os.Open(path)
if err != nil {
return check(err)
}
defer fh.Close()
if _, err = io.CopyBuffer(tw, fh, buf); err != nil {
return check(err)
}
return nil
})
It is also possible that the file has changed on disk and has became longer.
Heads up on this one - some filesystems don't accurate list file size in attrs, such as /proc. I've seen this happen when a rogue archive tries to include /proc/PID/attr/current which is SELinux context. The file attr will always report size zero, but a read finds data. In GoLang it invariably generates a "write too long" failure, whereas GNU tar/libz will have no problem. Be careful what filesystem you're using to archive and whether it accurately reports lengths.