Can I decompress and deserialize a file using streams?

7

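My application serializes an object with Json.Net, compresses the resulting JSON, and saves it to a file. The application can also load an object back from one of these files. The objects can be tens of MB in size, and I'm concerned about memory usage because the existing code creates large strings and byte arrays: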

public void Save(MyClass myObject, string filename)
{
    var json = JsonConvert.SerializeObject(myObject);
    var bytes = Compress(json);
    File.WriteAllBytes(filename, bytes);
}

public MyClass Load(string filename)
{
    var bytes = File.ReadAllBytes(filename);
    var json = Decompress(bytes);
    // Return the deserialized object (the original snippet omitted the return).
    return JsonConvert.DeserializeObject<MyClass>(json);
}

private static byte[] Compress(string s)
{
    var bytes = Encoding.Unicode.GetBytes(s);

    using (var ms = new MemoryStream())
    {
        using (var gs = new GZipStream(ms, CompressionMode.Compress))
        {
            gs.Write(bytes, 0, bytes.Length);
            gs.Close();
            return ms.ToArray();
        }
    }
}

private static string Decompress(byte[] bytes)
{
    using (var msi = new MemoryStream(bytes))
    {
        using (var mso = new MemoryStream())
        {
            using (var gs = new GZipStream(msi, CompressionMode.Decompress))
            {
                gs.CopyTo(mso);
                return Encoding.Unicode.GetString(mso.ToArray());
            }
        }
    } 
}

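I was wondering whether the Save/Load methods could be replaced with stream-based versions? I've found examples of using streams with Json.Net, but I can't work out how to fit the compression step in.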


1
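This may be of interest to you: http://benfoster.io/blog/aspnet-web-api-compression - Paul Zahra
@Roy, I've been seeing OOM exceptions lately and this code looks like the culprit. I'm waiting for the VS memory profiler to finish generating its report (it's so slow...), so I should have a better idea soon, but I thought I'd try refactoring this code while I wait! - Andrew Stephens
@AndrewStephens OK, perhaps mention your OOM in the question. Good luck! - user585968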
2 Answers

10

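JsonSerializer has methods to deserialize from a JsonTextReader and to serialize to a StreamWriter, and both the reader and the writer can be created on top of any kind of stream, including a GZipStream. Using them, you can create the following helper methods: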

public static partial class JsonExtensions
{
    // Buffer sized as recommended by Bradley Grainger, https://faithlife.codes/blog/2012/06/always-wrap-gzipstream-with-bufferedstream/
    // But anything smaller than 85,000 bytes should be OK, since objects larger than that go on the large object heap.  See:
    // https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap
    const int BufferSize = 8192;
    // Disable writing of BOM as per https://datatracker.ietf.org/doc/html/rfc8259#section-8.1
    static readonly Encoding DefaultEncoding = new UTF8Encoding(false);

    public static void SerializeToFileCompressed(object value, string path, JsonSerializerSettings settings = null)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read))
            SerializeCompressed(value, fs, settings);
    }

    public static void SerializeCompressed(object value, Stream stream, JsonSerializerSettings settings = null)
    {
        using (var compressor = new GZipStream(stream, CompressionMode.Compress))
        using (var writer = new StreamWriter(compressor, DefaultEncoding, BufferSize))
        {
            var serializer = JsonSerializer.CreateDefault(settings);
            serializer.Serialize(writer, value);
        }
    }

    public static T DeserializeFromFileCompressed<T>(string path, JsonSerializerSettings settings = null)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            return DeserializeCompressed<T>(fs, settings);
    }

    public static T DeserializeCompressed<T>(Stream stream, JsonSerializerSettings settings = null)
    {
        using (var compressor = new GZipStream(stream, CompressionMode.Decompress))
        using (var reader = new StreamReader(compressor))
        using (var jsonReader = new JsonTextReader(reader))
        {
            var serializer = JsonSerializer.CreateDefault(settings);
            return serializer.Deserialize<T>(jsonReader);
        }
    }
}

See Performance Tips: Optimize Memory Usage in the Json.NET documentation.
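To tie this back to the question, here is a minimal sketch of how the original Save/Load pair might look once rewritten on top of streams. It repeats the same stream nesting as the helpers above so that it compiles on its own. The `MyClass` shape, the `Storage` and `Program` class names, and the optional `encoding` parameter are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using Newtonsoft.Json;

// Hypothetical stand-in for the question's MyClass.
public class MyClass
{
    public string Name { get; set; }
    public int[] Values { get; set; }
}

public static class Storage
{
    // Same choices as the answer: UTF-8 without a BOM and an 8 KB writer buffer.
    static readonly Encoding Utf8NoBom = new UTF8Encoding(false);
    const int BufferSize = 8192;

    public static void Save(MyClass myObject, string filename)
    {
        using (var fs = new FileStream(filename, FileMode.Create, FileAccess.Write, FileShare.Read))
        using (var gz = new GZipStream(fs, CompressionMode.Compress))
        using (var writer = new StreamWriter(gz, Utf8NoBom, BufferSize))
        {
            // JSON is written straight into the compressor; no full string or byte[] is built.
            JsonSerializer.CreateDefault().Serialize(writer, myObject);
        }
    }

    // Assumption: callers pass Encoding.Unicode to read files produced by the question's old code.
    public static MyClass Load(string filename, Encoding encoding = null)
    {
        using (var fs = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var gz = new GZipStream(fs, CompressionMode.Decompress))
        using (var reader = new StreamReader(gz, encoding ?? Utf8NoBom))
        using (var jsonReader = new JsonTextReader(reader))
        {
            return JsonSerializer.CreateDefault().Deserialize<MyClass>(jsonReader);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var original = new MyClass { Name = "demo", Values = new[] { 1, 2, 3 } };
        var path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        var legacyPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        try
        {
            // Round trip through the streaming Save/Load.
            Storage.Save(original, path);
            var loaded = Storage.Load(path);
            Console.WriteLine(loaded.Name + " " + loaded.Values.Length); // prints "demo 3"

            // Write a file the way the question's Compress did: gzip over UTF-16 bytes, no BOM.
            using (var fs = File.Create(legacyPath))
            using (var gz = new GZipStream(fs, CompressionMode.Compress))
            {
                var bytes = Encoding.Unicode.GetBytes(JsonConvert.SerializeObject(original));
                gz.Write(bytes, 0, bytes.Length);
            }
            var legacy = Storage.Load(legacyPath, Encoding.Unicode);
            Console.WriteLine(legacy.Name + " " + legacy.Values.Length); // prints "demo 3"
        }
        finally
        {
            File.Delete(path);
            File.Delete(legacyPath);
        }
    }
}
```

One thing to watch when migrating: the question's Compress encodes the JSON with Encoding.Unicode (UTF-16, no BOM), whereas the helpers above write UTF-8. A StreamReader created without an explicit encoding cannot detect BOM-less UTF-16, so files saved by the old code need Encoding.Unicode passed to the reader, as the second half of Main does.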

1
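Brilliant. I'd been getting confused about how the various readers and streams should be nested. After refactoring to use this code, memory usage has improved significantly. - Andrew Stephens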

3

For anyone wondering how to use @dbc's methods in a UWP app, I modified the code as follows, where StorageFile is a file you have write access to.

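public static async Task SerializeToFileCompressedAsync(object value, StorageFile file, JsonSerializerSettings settings = null)
{
    // Return Task rather than void so callers can await completion and observe exceptions.
    using (var stream = await file.OpenStreamForWriteAsync())
    {
        // Discard any existing content so no stale bytes trail the new, possibly shorter, data.
        stream.SetLength(0);
        SerializeCompressed(value, stream, settings);
    }
}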

public static void SerializeCompressed(object value, Stream stream, JsonSerializerSettings settings = null)
{
    using (var compressor = new GZipStream(stream, CompressionMode.Compress))
    using (var writer = new StreamWriter(compressor))
    {
        var serializer = JsonSerializer.CreateDefault(settings);
        serializer.Serialize(writer, value);
    }
}

public static async Task<T> DeserializeFromFileCompressedAsync<T>(StorageFile file, JsonSerializerSettings settings = null)
{
    using (var stream = await file.OpenStreamForReadAsync())
        return DeserializeCompressed<T>(stream, settings);
}

public static T DeserializeCompressed<T>(Stream stream, JsonSerializerSettings settings = null)
{
    using (var compressor = new GZipStream(stream, CompressionMode.Decompress))
    using (var reader = new StreamReader(compressor))
    using (var jsonReader = new JsonTextReader(reader))
    {
        var serializer = JsonSerializer.CreateDefault(settings);
        return serializer.Deserialize<T>(jsonReader);
    }
}
