Python NetCDF: making a copy of all variables and attributes but one

Question

python netcdf

18

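I need to process a single variable in a netcdf file that contains many attributes and variables. I don't think it is possible to update a netcdf file in place (see the question "How to delete a variable in a Scientific.IO.NetCDF.NetCDFFile?").

My approach is the following:

1. get the variable to be processed from the original file
2. process the variable
3. copy all data from the original netcdf file except the processed variable to the final file
4. copy the processed variable to the final file

My problem is coding step 3. I started with the following: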

from Scientific.IO.NetCDF import NetCDFFile

def processing(infile, variable, outfile):
        # open the input file before reading from it
        fileH = NetCDFFile(infile, mode="r")
        data = fileH.variables[variable][:]

        # do processing on data...

        # and now save the result
        outfile = NetCDFFile(outfile, mode='w')
        # build a list of variables without the processed variable
        listOfVariables = [x for x in fileH.variables.keys() if x != variable]
        for ivar in listOfVariables:
             # here I need to write each variable and each attribute
             pass

How can I save all data and attributes with only a few lines of code, without having to rebuild the whole data structure?

- Bruno von Paris

6 Answers

9

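If you just want to copy the file while picking out variables, nccopy is a great tool, as suggested by @rewfuss.

Here is a Python solution using python-netcdf4, which is more flexible. It lets you do processing and other calculations before writing to the file.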

import netCDF4

# file1 is the existing input path, file2 the output path to create
with netCDF4.Dataset(file1) as src, netCDF4.Dataset(file2, "w") as dst:

  # .items() replaces Python 2's .iteritems()
  for name, dimension in src.dimensions.items():
    dst.createDimension(name, len(dimension) if not dimension.isunlimited() else None)

  for name, variable in src.variables.items():

    # take out the variable you don't want
    if name == 'some_variable': 
      continue

    x = dst.createVariable(name, variable.datatype, variable.dimensions)
    # index by variable name, not by the variable object
    dst.variables[name][:] = src.variables[name][:]

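This does not take variable attributes into account, such as fill_values. You can do that easily by following the documentation.

Be aware that netCDF4 files written or created this way cannot be undone. Once you modify a variable, it is written to the file at the end of the with statement, or when you call .close() on the Dataset.

Of course, if you want to process the variables before writing them, you have to be careful about which dimensions you create. In a new file, never write to a variable without creating it first. Also, never create a variable without having defined its dimensions, as noted in the python-netcdf4 documentation.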
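The ordering rules above (dimensions first, then variables, then data) can be sketched as one helper that also covers the asker's steps 1 to 4. This is a minimal sketch, not the answerer's code: `dimension_size`, `copy_with_processing`, and the `process` callback are hypothetical names, and `src` and `dst` are assumed to be open `netCDF4.Dataset` objects, with `dst` opened in "w" mode.

```python
def dimension_size(dimension):
    """Return the size to pass to createDimension (None means unlimited)."""
    return None if dimension.isunlimited() else len(dimension)


def copy_with_processing(src, dst, target, process):
    """Copy src into dst, replacing the data of the variable named `target`.

    src and dst are assumed to be open netCDF4.Dataset objects, with dst
    opened in write mode. `process` is a hypothetical callback that takes
    the original array and returns an array of the same shape.
    """
    # Dimensions must exist before any variable that uses them.
    for name, dimension in src.dimensions.items():
        dst.createDimension(name, dimension_size(dimension))

    # Create each variable first, then write its data.
    for name, variable in src.variables.items():
        out = dst.createVariable(name, variable.datatype, variable.dimensions)
        data = variable[:]
        if name == target:
            # Only the selected variable is transformed.
            data = process(data)
        out[:] = data
```

It could be called as `copy_with_processing(src, dst, "some_variable", lambda a: a * 3)` inside the `with` block. Like the answer's code, this sketch does not copy global or variable attributes.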

- Xavier Ho

4

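Nice solution, but it needed a couple of fixes before I could get it to run. First, '.iteritems()' is no longer available in 3.x and has to be replaced with '.items()'. Second, the use of x needs to be replaced with the variable's name as a string, like this: 'dst.variables[name][:] = src.variables[name][:]'. - captain_M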

6

This answer builds on the one from Xavier Ho (https://dev59.com/32Up5IYBdhLWcg3wo4ud#32002401), but I needed some additional fixes to complete it:

import netCDF4 as nc
toexclude = ["TO_REMOVE"]
with nc.Dataset("orig.nc") as src, nc.Dataset("filtered.nc", "w") as dst:
    # copy global attributes
    for name in src.ncattrs():
        dst.setncattr(name, src.getncattr(name))
    # copy dimensions (isunlimited is a method and must be called)
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.items():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst.variables[name][:] = src.variables[name][:]

- Arne Babenhauserheide

"isunlimited" now seems to be a function (isunlimited()). - Bart
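The parentheses matter because a method object is always truthy: `not dimension.isunlimited` is always `False`, so every dimension would be created with size `None`, that is, as unlimited. A small stand-in class (hypothetical, used here instead of a real netCDF4 dimension) illustrates the difference:

```python
class FakeDimension:
    """Hypothetical stand-in for a fixed-size netCDF4 dimension."""

    def __init__(self, size):
        self._size = size

    def __len__(self):
        return self._size

    def isunlimited(self):
        return False


dim = FakeDimension(10)

# Without the call, the method object itself is tested, and it is truthy.
size_without_call = len(dim) if not dim.isunlimited else None
# With the call, the boolean returned by the method is tested.
size_with_call = len(dim) if not dim.isunlimited() else None
```

Here `size_without_call` ends up as `None` even though the dimension is fixed, while `size_with_call` keeps the real length.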

3

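The nccopy utility in C netCDF version 4.3.0 and later includes an option to list which variables are to be copied (along with their attributes). Unfortunately, it doesn't include an option for which variables to exclude, which is what you need. However, if the (comma-delimited) list of variables to include doesn't push the nccopy command line over system limits, this would work. There are two variants of this option: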

nccopy -v var1,var2,...,varn input.nc output.nc
nccopy -V var1,var2,...,varn input.nc output.nc

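The first (using -v) includes all the variable definitions, but only data for the named variables. The second (using -V) doesn't include definitions or data for unnamed variables.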
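Since nccopy can only list the variables to keep, excluding one variable means passing every other name. A minimal sketch of building that command from Python follows. `nccopy_keep_args` and `run_nccopy` are hypothetical helpers, and the list of variable names is assumed to come from elsewhere (for example from `netCDF4.Dataset(path).variables.keys()`).

```python
import subprocess


def nccopy_keep_args(all_variables, exclude, infile, outfile):
    """Build an nccopy command that keeps every variable except `exclude`."""
    excluded = set(exclude)
    keep = [name for name in all_variables if name not in excluded]
    if not keep:
        raise ValueError("no variables left to copy")
    # -V copies definitions and data only for the listed variables.
    return ["nccopy", "-V", ",".join(keep), infile, outfile]


def run_nccopy(args):
    """Run the command; requires the nccopy binary to be on PATH."""
    subprocess.run(args, check=True)
```

For example, `nccopy_keep_args(["time", "lat", "temp"], ["temp"], "input.nc", "output.nc")` yields the arguments for `nccopy -V time,lat input.nc output.nc`.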

- rewfuss

1

I know this is an old question, but as an alternative, you can use the netcdf library together with shutil:

import shutil
from netcdf import netcdf as nc

def processing(infile, variable, outfile):
    shutil.copyfile(infile, outfile)
    with nc.loader(infile) as in_root, nc.loader(outfile) as out_root:
        data = nc.getvar(in_root, variable)
        # do your processing with data and save them as memory "values"...
        values = data[:] * 3
        new_var = nc.getvar(out_root, variable, source=data)
        new_var[:] = values

- ecolell

1

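Hi @ecolell, I know this is an older reply. In the meantime we work with netCDF4, which does not seem to include the "loader" class. Do you know how to apply this code with netCDF4? - Linda

@Linda I am not very active with this library, but I just uploaded a copy of its latest version. You will find the code at the following link: https://github.com/ecolell/netcdf/blob/0dafde1f72fcb932f5ed38e99019961456f8f1ce/netcdf/netcdf.py#L342 (hope it helps). - ecolell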

0

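All the recipes so far (except for the one from @rewfuss, which works fine but is not exactly Pythonic) produce a plain NetCDF3 file, which can be a killer for highly compressed NetCDF4 datasets. Here is an attempt to cope with the issue.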

import netCDF4

infname = "Inf.nc"
outfname = "outf.nc"

skiplist = "var1 var2".split()

with netCDF4.Dataset(infname) as src:

    with netCDF4.Dataset(outfname, "w", format=src.file_format) as dst:
        # copy global attributes all at once via dictionary
        dst.setncatts(src.__dict__)
        # copy dimensions
        for name, dimension in src.dimensions.items():
            dst.createDimension(
                name, (len(dimension) if not dimension.isunlimited() else None))
        # copy all file data except for the excluded
        for name, variable in src.variables.items():
            if name in skiplist:
                continue
            # reuse the source variable's filter and chunking settings
            createattrs = variable.filters()
            if createattrs is None:
                createattrs = {}
            else:
                chunksizes = variable.chunking()
                if chunksizes == "contiguous":
                    createattrs["contiguous"] = True
                else:
                    createattrs["chunksizes"] = chunksizes
            x = dst.createVariable(name, variable.datatype, variable.dimensions, **createattrs)
            # copy variable attributes all at once via dictionary
            dst[name].setncatts(src[name].__dict__)
            dst[name][:] = src[name][:]

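This seems to work fine and stores the variables the way they are in the original file, except that it does not copy some variable attributes that start with an underscore and are not known to the NetCDF library.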
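One underscore attribute that often matters in practice is `_FillValue`. The netCDF4-python `createVariable` call accepts a `fill_value` keyword, so one hedged way to carry it over is to pull it out of the attribute dictionary before copying the rest. `split_fill_value` and `copy_variable` are hypothetical helper names, not part of the library, and this sketch leaves out the compression and chunking settings handled above.

```python
def split_fill_value(attrs):
    """Separate _FillValue from the other variable attributes."""
    rest = dict(attrs)
    fill_value = rest.pop("_FillValue", None)
    return fill_value, rest


def copy_variable(src, dst, name):
    """Copy one variable, its attributes, and its data from src to dst.

    src and dst are assumed to be open netCDF4.Dataset objects, and the
    variable's dimensions are assumed to exist in dst already.
    """
    variable = src.variables[name]
    attrs = {key: variable.getncattr(key) for key in variable.ncattrs()}
    fill_value, rest = split_fill_value(attrs)
    out = dst.createVariable(
        name, variable.datatype, variable.dimensions, fill_value=fill_value)
    # Set attributes first so scale_factor/add_offset, if present, are in
    # place when the data is written.
    out.setncatts(rest)
    out[:] = variable[:]
    return out
```

Inside the copy loop, `copy_variable(src, dst, name)` would stand in for the separate `createVariable`, `setncatts`, and assignment lines.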

- Roux

- Rich Signell · Accepted Answer

Here is what I just used, and it worked. It is @arne's answer updated for Python 3, and it also copies variable attributes:

import netCDF4 as nc
toexclude = ['ExcludeVar1', 'ExcludeVar2']

with nc.Dataset("in.nc") as src, nc.Dataset("out.nc", "w") as dst:
    # copy global attributes all at once via dictionary
    dst.setncatts(src.__dict__)
    # copy dimensions
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.items():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst[name][:] = src[name][:]
            # copy variable attributes all at once via dictionary
            dst[name].setncatts(src[name].__dict__)