Error when using Boost Serialization with a binary archive

When reading into a variable from a boost::archive::binary_iarchive, I get the following error:

test-serialization(9285,0x11c62fdc0) malloc: can't allocate region
*** mach_vm_map(size=18014398509486080) failed (error code=3)
test-serialization(9285,0x11c62fdc0) malloc: *** set a breakpoint in malloc_error_break to debug

My serialization and deserialization code is as follows:

template<class Archive>
void save(Archive & archive, const helib::PubKey & pubkey, const unsigned int version){
  BOOST_TEST_MESSAGE("inside save_construct_data");
  archive << &(pubkey.context);
  archive << pubkey.skBounds;
  archive << pubkey.keySwitching;
  archive << pubkey.keySwitchMap;
  archive << pubkey.KS_strategy;
  archive << pubkey.recryptKeyID;
}

template<class Archive>
void load_construct_data(Archive & archive, helib::PubKey * pubkey, const unsigned int version){
  helib::Context * context = new helib::Context(2,3,1); //random numbers since there is no default constructor
  BOOST_TEST_MESSAGE("deserializing context");
  archive >> context;
  std::vector<double> skBounds;
  std::vector<helib::KeySwitch> keySwitching;
  std::vector<std::vector<long>> keySwitchMap;
  NTL::Vec<long> KS_strategy;
  long recryptKeyID;
  BOOST_TEST_MESSAGE("deserializing skbounds");
  archive >> skBounds;
  BOOST_TEST_MESSAGE("deserializing keyswitching");
  archive >> keySwitching;
  BOOST_TEST_MESSAGE("deserializing keyswitchmap");
  archive >> keySwitchMap;
  BOOST_TEST_MESSAGE("deserializing KS_strategy");
  archive >> KS_strategy;
  BOOST_TEST_MESSAGE("deserializing recryptKeyID");
  archive >> recryptKeyID;
  BOOST_TEST_MESSAGE("new pubkey");
  ::new(pubkey)helib::PubKey(*context);
  //TODO: complete
}

template<class Archive>
void serialize(Archive & archive, helib::PubKey & pubkey, const unsigned int version){
  split_free(archive, pubkey, version);
}

template<class Archive>
void load(Archive & archive, helib::PubKey & pubkey, const unsigned int version){
}


The test that calls this code is as follows:
BOOST_AUTO_TEST_CASE(serialization_pubkey)
{
  auto context = helibTestContext();
  helib::SecKey secret_key(context);
  secret_key.GenSecKey();
  // Compute key-switching matrices that we need
  helib::addSome1DMatrices(secret_key);
  // Set the secret key (upcast: SecKey is a subclass of PubKey)
  const helib::PubKey& original_pubkey = secret_key;

  std::string filename = "pubkey.serialized";

  std::ofstream os(filename, std::ios::binary);
  {
    boost::archive::binary_oarchive oarchive(os);
    oarchive << original_pubkey;
  }

  helib::PubKey * restored_pubkey = new helib::PubKey(helib::Context(2,3,1));
  {
    std::ifstream ifs(filename, std::ios::binary);
    boost::archive::binary_iarchive iarchive(ifs);
    BOOST_TEST_CHECKPOINT("calling deserialization");
    iarchive >> restored_pubkey;
    BOOST_TEST_CHECKPOINT("done with deserialization");

    //tests omitted
  }
}

Notes:

  1. Serialization works fine with both boost::archive::text_oarchive and boost::archive::binary_oarchive. They create files of 46M and 21M respectively (big, I know).

  2. Deserialization with boost::archive::text_iarchive basically stalls at archive >> keySwitching; and the process gets killed. This is actually the largest part of the archive.

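  3. Since the binary file is half the size, I decided to try boost::archive::binary_iarchive, but I get the error shown at the beginning. The error happens on the very first read from the archive: archive >> context;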

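  4. The asymmetry between input and output (save vs. load_construct_data) exists because I couldn't find another way to avoid implementing serialization for a class derived from helib::PubKey. Using a pointer to helib::PubKey gave me compilation errors asking for serialization of the derived class. If there is another way, please let me know.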

Thanks for your help.

UPDATE:

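I am implementing deserialization for some classes of the cryptographic library HElib, because I need to send ciphertexts over the network. One of these classes is helib::PubKey. I am using the Boost Serialization library for the implementation. I have created a gist to provide a reproducible example. It contains 3 files: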

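  1. serialization.hpp, which contains the serialization implementation. Unfortunately, helib::PubKey depends on many other classes, which makes the file rather long. All the other classes have unit tests, and those tests pass. In addition, I had to make a small modification to the class so it could be serialized: I made its private members public.
  2. test-serialization.cpp, which contains the unit test.
  3. Makefile. Running make creates the executable test-serialization.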

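This problem is most likely caused by your own code, and your question doesn't provide a complete minimal example, which makes me even more suspicious. For example, the definition of helib::PubKey is missing. - Superlokkus
@Superlokkus I have updated the question with more context and a reproducible example. - Giancarlo Giuffra
Please look at this question to see whether you have the same problem. - Silver
Regarding non-intrusive serialization of private members: see serialization::access and the approaches here: https://dev59.com/r4vda4cB1Zd3GeqPaYae#30595430 - sehe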
1 Answer

vector<bool> strikes again

In fact, on my test box it tries to allocate 0x1fffffffff20000 bits (that's 144 petabits). This comes directly from IndexSet::resize().
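Here is a quick sanity check of those numbers, written as a small C++ sketch. The constant and function names are made up for illustration. 0x1fffffffff20000 bits is about 1.44 × 10^17 bits. Both that request and the 18014398509486080-byte malloc in the question land within roughly 112 KiB of 2^54 bytes (16 PiB). The two figures come from different machines, so they are not expected to match exactly.

```cpp
#include <cassert>
#include <cstdint>

// Bits requested for the vector<bool> in IndexSet::resize() on the answerer's machine.
constexpr std::uint64_t kRequestedBits = 0x1fffffffff20000ULL;
// Bytes requested in the malloc error quoted in the question.
constexpr std::uint64_t kMallocBytes = 18014398509486080ULL;
// 2^54 bytes, i.e. 16 PiB.
constexpr std::uint64_t kTwoTo54 = std::uint64_t{1} << 54;

// Whole petabits (10^15 bits), rounded down.
constexpr std::uint64_t whole_petabits(std::uint64_t bits) {
    return bits / 1000000000000000ULL;
}

// Whole bytes needed to hold `bits` bits (rounded down).
constexpr std::uint64_t bits_to_bytes(std::uint64_t bits) { return bits / 8; }

static_assert(whole_petabits(kRequestedBits) == 144, "roughly 144 petabits");
```

Both requests are absurd for a public key, which points at a corrupted size rather than genuinely large data.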

Now I have serious doubts about HElib using std::vector<bool> here (it seems they would be better off with something like boost::icl::interval_set<>).
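The sketch below shows why an interval representation (the kind of thing boost::icl::interval_set<> provides) is attractive here. It is a minimal standard-C++ version that assumes an index set can be modelled as a `std::set<long>`, which is a stand-in and not `helib::IndexSet`. With this representation, storage grows with the number of runs of consecutive indices, not with the distance between the first and last index.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Closed interval [lo, hi] of consecutive indices.
using Interval = std::pair<long, long>;

// Collapse a sorted set of indices into maximal runs of consecutive values.
std::vector<Interval> to_intervals(const std::set<long>& indices) {
    std::vector<Interval> runs;
    for (long n : indices) {
        if (!runs.empty() && runs.back().second + 1 == n)
            runs.back().second = n;   // n continues the current run
        else
            runs.emplace_back(n, n);  // gap found: start a new run
    }
    return runs;
}

// Expand the runs back into the original set of indices.
std::set<long> from_intervals(const std::vector<Interval>& runs) {
    std::set<long> indices;
    for (const Interval& run : runs)
        for (long n = run.first; n <= run.second; ++n)
            indices.insert(n);
    return indices;
}
```

For example, the set {0, 1, 2, 10, 11, 1000000000} becomes just three intervals, whereas a bitmap would need about a billion bits.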

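Well. That was a wild goose chase (although IndexSet serialization could be greatly improved). The real problem, however, is that you have Undefined Behavior: you don't deserialize the same type that you serialized.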

You serialize a PubKey, but attempt to deserialize it as a PubKey*. Oops.
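Here is a simplified illustration of why a reader/writer mismatch tends to surface as an absurd allocation. It uses a hypothetical length-prefixed binary format, not Boost's actual archive layout. If the reader expects different data than the writer produced, it interprets unrelated bytes as a length. The sketch also adds a defensive bound on that length. The check is an assumption of this example and not something Boost does for you.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Write a trivially copyable value as raw bytes (host byte order).
template <typename T>
void put_raw(std::ostream& os, const T& value) {
    os.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Read a trivially copyable value back from raw bytes.
template <typename T>
T get_raw(std::istream& is) {
    T value{};
    if (!is.read(reinterpret_cast<char*>(&value), sizeof value))
        throw std::runtime_error("unexpected end of stream");
    return value;
}

// Writer: a 32-bit tag, then a 64-bit length prefix, then the payload bytes.
std::string write_tagged(std::uint32_t tag, const std::vector<char>& payload) {
    std::ostringstream os;
    put_raw(os, tag);
    put_raw(os, static_cast<std::uint64_t>(payload.size()));
    os.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return os.str();
}

// Mismatched reader: ignores the tag and treats the first 8 bytes as the length.
std::uint64_t misread_length(const std::string& bytes) {
    std::istringstream is(bytes);
    return get_raw<std::uint64_t>(is);
}

// Defensive reader: rejects a length prefix larger than `max_bytes`
// (an upper bound on how much input can possibly remain).
std::vector<char> read_payload_checked(std::istream& is, std::uint64_t max_bytes) {
    const auto n = get_raw<std::uint64_t>(is);
    if (n > max_bytes) throw std::length_error("length prefix exceeds input size");
    std::vector<char> out(static_cast<std::size_t>(n));
    is.read(out.data(), static_cast<std::streamsize>(n));
    return out;
}
```

With a tag of 0xDEADBEEF and a 3-byte payload, the mismatched reader sees a length in the billions, while reading the tag first recovers the payload intact.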

Beyond that, there are quite a few problems:

  • You had to modify the library to make private members public. This can easily violate ODR (making the class layout incompatible).

  • You seem to treat the context as a "dynamic" resource, which will engage Object Tracking. This could be a viable approach. BUT. You'll have to think about ownership.

    It seems like you haven't done that yet. For example, the line in load_construct_data for DoubleCRT is a definite memory leak:

    helib::Context * context = new helib::Context(2,3,1);
    

    You never use it, nor do you ever free it. In fact, you simply overwrite the pointer with the deserialized instance, which may or may not be owned. Catch-22.

    Exactly the same happens in load_construct_data for PubKey.

  • Worse, in save_construct_data you gratuitously copy the context object for each DoubleCRT in each SecKey:

     auto context = polynomial->getContext();
     archive << &context;
    

    Because you disguise it as pointer serialization, (obviously useless) object tracking kicks in again. The only effect is that you serialize redundant Context copies, which will all be leaked on deserialization.

  • I'd be tempted to assume that the context instances in both would always be the same. Why not serialize the context(s) separately anyway?

  • In fact, I went and analyzed the HElib source code to check these assumptions. It turns out I was correct. Nothing ever constructs a context outside of:

    std::unique_ptr<Context> buildContextFromBinary(std::istream& str);
    std::unique_ptr<Context> buildContextFromAscii(std::istream& str);
    

    As you can see, they return owned pointers. You should have been using them, perhaps even together with the built-in serialization that I practically stumbled over here.
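The ownership point can be sketched in plain C++. The `Context` class, `writeContext` and `buildContext` below are hypothetical stand-ins, loosely modelled on the factory signatures quoted above. They are not HElib's real types. The idea is that a type without a default constructor is produced by a factory that returns a `std::unique_ptr`, so no throwaway placeholder object (like the `new helib::Context(2,3,1)` above) is ever allocated and leaked.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for a type that has no default constructor.
class Context {
public:
    Context(long m, long p, long r) : m_(m), p_(p), r_(r) {}
    long m() const { return m_; }
    long p() const { return p_; }
    long r() const { return r_; }
private:
    long m_, p_, r_;
};

// Hypothetical writer: stores the constructor arguments as text.
void writeContext(std::ostream& os, const Context& c) {
    os << c.m() << ' ' << c.p() << ' ' << c.r() << '\n';
}

// Hypothetical factory: builds the object from the stream and transfers
// ownership to the caller; returns nullptr if the input is malformed.
std::unique_ptr<Context> buildContext(std::istream& is) {
    long m = 0, p = 0, r = 0;
    if (!(is >> m >> p >> r)) return nullptr;
    return std::make_unique<Context>(m, p, r);
}
```

The caller holds the only owning pointer, so the object is released automatically when that pointer goes out of scope.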

Time to regroup

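I would use the serialization code from HElib (because why reinvent the wheel, and introduce a ton of bugs in the process?). If you insist on integrating with Boost Serialization, you can have your cake and eat it too: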

template <class Archive> void save(Archive& archive, const helib::PubKey& pubkey, unsigned) {
    using V = std::vector<char>;
    using D = iostreams::back_insert_device<V>;
    V data;
    {
        D dev(data);
        iostreams::stream_buffer<D> sbuf(dev);
        std::ostream os(&sbuf); // expose as std::ostream
        helib::writePubKeyBinary(os, pubkey);
    }
    archive << data;
}

template <class Archive> void load(Archive& archive, helib::PubKey& pubkey, unsigned) {
    std::vector<char> data;
    archive >> data;
    using S = iostreams::array_source;
    S source(data.data(), data.size());
    iostreams::stream_buffer<S> sbuf(source);
    {
        std::istream is(&sbuf); // expose as std::istream
        helib::readPubKeyBinary(is, pubkey);
    }
}

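That's it: just 24 lines of code. And it will be tested and maintained by the library authors. You can't beat that (obviously). I also modified the test a bit so that we no longer abuse private details.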
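If Boost.Iostreams is not at hand, the same "capture a stream-based writer into a byte buffer" idea can be sketched with the standard string streams, at the cost of an extra copy. `to_blob`/`from_blob` mirror the helper names used later in this answer. `Point`, `writePoint` and `readPoint` are hypothetical placeholders for a library's stream-based I/O pair, such as `writePubKeyBinary`/`readPubKeyBinary`.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Blob = std::vector<char>;

// Run a stream-based writer and capture everything it writes as a byte blob.
template <typename T, typename Writer>
Blob to_blob(const T& object, Writer writer) {
    std::ostringstream os(std::ios::binary);
    writer(os, object);
    const std::string bytes = os.str();  // extra copy compared to back_insert_device
    return Blob(bytes.begin(), bytes.end());
}

// Feed a byte blob to a stream-based reader that fills in `object`.
template <typename T, typename Reader>
void from_blob(const Blob& data, T& object, Reader reader) {
    std::istringstream is(std::string(data.begin(), data.end()), std::ios::binary);
    reader(is, object);
}

// Hypothetical type and I/O pair, standing in for a library's binary writer/reader.
struct Point { long x = 0, y = 0; };
void writePoint(std::ostream& os, const Point& p) { os << p.x << ' ' << p.y; }
void readPoint(std::istream& is, Point& p) { is >> p.x >> p.y; }
```

The resulting `Blob` is a `std::vector<char>`, which Boost Serialization can archive once `boost/serialization/vector.hpp` is included.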

Cleaning up

By factoring out a helper that handles the blob writing, we can implement the different helib types in a very similar way:

namespace helib { // leverage ADL
    template <class A> void save(A& ar, const Context& o, unsigned) {
        Blob data = to_blob(o, writeContextBinary);
        ar << data;
    }
    template <class A> void load(A& ar, Context& o, unsigned) {
        Blob data;
        ar >> data;
        from_blob(data, o, readContextBinary);
    }
    template <class A> void save(A& ar, const PubKey& o, unsigned) {
        Blob data = to_blob(o, writePubKeyBinary);
        ar << data;
    }
    template <class A> void load(A& ar, PubKey& o, unsigned) {
        Blob data;
        ar >> data;
        from_blob(data, o, readPubKeyBinary);
    }
}

That's elegance, to me.
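The `// leverage ADL` comment refers to argument-dependent lookup. An unqualified call like `save(ar, obj, version)` inside generic code also searches the namespace of `obj`'s type, which is presumably why the overloads above live in `namespace helib`. Here is a minimal sketch with hypothetical names (`lib::Widget`, `framework::TextArchive`, `archive_it`). It is unrelated to Boost's real dispatch machinery.

```cpp
#include <cassert>
#include <string>

namespace lib {
    struct Widget { int id; };

    // Free function in the type's own namespace: found through ADL.
    template <class Archive>
    void save(Archive& ar, const Widget& w, unsigned /*version*/) {
        ar.out += "Widget#" + std::to_string(w.id);
    }
}

namespace framework {
    struct TextArchive { std::string out; };

    // Generic code that knows nothing about lib::Widget.
    template <class T>
    void archive_it(TextArchive& ar, const T& obj) {
        // Unqualified, dependent call: resolved at instantiation, and ADL
        // looks in namespace lib because T is lib::Widget.
        save(ar, obj, 0u);
    }
}
```

Calling `framework::archive_it(ar, lib::Widget{42})` ends up in `lib::save`, even though `framework` never names `lib`.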

Full Listing

I cloned a new gist at https://gist.github.com/sehe/ba82a0329e4ec586363eb82d3f3b9326, which contains the following changesets:

0079c07 Make it compile locally
b3b2cf1 Squelch the warnings
011b589 Endof investigations, regroup time

f4d79a6 Reimplemented using HElib binary IO
a403e97 Bitwise reproducible outputs

Only the last two commits contain changes related to the actual fix.

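For posterity, I'll list the full code here as well. The test code contains many subtle reorganizations, along with comments explaining them. You'd do well to read them carefully, to check that you understand them and that their implications suit your needs. I left comments describing why the test assertions are what they are, to help.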

  • File serialization.hpp

    #ifndef EVOTING_SERIALIZATION_H
    #define EVOTING_SERIALIZATION_H
    
    #define BOOST_TEST_MODULE main
    #include <helib/helib.h>
    #include <boost/serialization/split_free.hpp>
    #include <boost/serialization/vector.hpp>
    #include <boost/iostreams/stream_buffer.hpp>
    #include <boost/iostreams/device/back_inserter.hpp>
    #include <boost/iostreams/device/array.hpp>
    
    namespace /* file-static */ {
        using Blob = std::vector<char>;
    
        template <typename T, typename F>
        Blob to_blob(const T& object, F writer) {
            using D = boost::iostreams::back_insert_device<Blob>;
            Blob data;
            {
                D dev(data);
                boost::iostreams::stream_buffer<D> sbuf(dev);
                std::ostream os(&sbuf); // expose as std::ostream
                writer(os, object);
            }
            return data;
        }
    
        template <typename T, typename F>
        void from_blob(Blob const& data, T& object, F reader) {
            boost::iostreams::stream_buffer<boost::iostreams::array_source>
                sbuf(data.data(), data.size());
            std::istream is(&sbuf); // expose as std::istream
            reader(is, object);
        }
    }
    
    namespace helib { // leverage ADL
        template <class A> void save(A& ar, const Context& o, unsigned) {
            Blob data = to_blob(o, writeContextBinary);
            ar << data;
        }
        template <class A> void load(A& ar, Context& o, unsigned) {
            Blob data;
            ar >> data;
            from_blob(data, o, readContextBinary);
        }
        template <class A> void save(A& ar, const PubKey& o, unsigned) {
            Blob data = to_blob(o, writePubKeyBinary);
            ar << data;
        }
        template <class A> void load(A& ar, PubKey& o, unsigned) {
            Blob data;
            ar >> data;
            from_blob(data, o, readPubKeyBinary);
        }
    }
    
    BOOST_SERIALIZATION_SPLIT_FREE(helib::Context)
    BOOST_SERIALIZATION_SPLIT_FREE(helib::PubKey)
    #endif //EVOTING_SERIALIZATION_H
    
  • File test-serialization.cpp

    #define BOOST_TEST_MODULE main
    #include <boost/test/included/unit_test.hpp>
    #include <helib/helib.h>
    #include <fstream>
    #include "serialization.hpp"
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/archive/text_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/archive/binary_iarchive.hpp>
    
    helib::Context helibTestMinimalContext(){
      // Plaintext prime modulus
      unsigned long p = 4999;
      // Cyclotomic polynomial - defines phi(m)
      unsigned long m = 32109;
      // Hensel lifting (default = 1)
      unsigned long r = 1;
      return helib::Context(m, p, r);
    }
    
    helib::Context helibTestContext(){
      auto context = helibTestMinimalContext();
    
      // Number of bits of the modulus chain
      unsigned long bits = 300;
      // Number of columns of Key-Switching matix (default = 2 or 3)
      unsigned long c = 2;
    
      // Modify the context, adding primes to the modulus chain
      buildModChain(context, bits, c);
      return context;
    }
    
    BOOST_AUTO_TEST_CASE(serialization_pubkey) {
        auto context = helibTestContext();
        helib::SecKey secret_key(context);
        secret_key.GenSecKey();
        // Compute key-switching matrices that we need
        helib::addSome1DMatrices(secret_key);
        // Set the secret key (upcast: SecKey is a subclass of PubKey)
        const helib::PubKey& original_pubkey = secret_key;
    
        std::string const filename = "pubkey.serialized";
    
        {
            std::ofstream os(filename, std::ios::binary);
            boost::archive::binary_oarchive oarchive(os);
            oarchive << context << original_pubkey;
        }
        {
            // just checking reproducible output
            std::ofstream os(filename + ".2", std::ios::binary);
            boost::archive::binary_oarchive oarchive(os);
            oarchive << context << original_pubkey;
        }
    
        // reading back to independent instances of Context/PubKey
        {
            // NOTE: if you start from something rogue, it will fail with PAlgebra mismatch.
            helib::Context surrogate = helibTestMinimalContext();
    
            std::ifstream ifs(filename, std::ios::binary);
            boost::archive::binary_iarchive iarchive(ifs);
            iarchive >> surrogate;
    
            // we CAN test that the contexts end up matching
            BOOST_TEST((context == surrogate));
    
            helib::SecKey independent(surrogate);
            helib::PubKey& indep_pk = independent;
            iarchive >> indep_pk;
            // private again, as it should be, but to understand the relation:
            // BOOST_TEST((&independent.context == &surrogate));
    
            // The library's operator== compares the reference, so it would say "not equal"
            BOOST_TEST((indep_pk != original_pubkey));
            {
                // just checking reproducible output
                std::ofstream os(filename + ".3", std::ios::binary);
                boost::archive::binary_oarchive oarchive(os);
                oarchive << surrogate << indep_pk;
            }
        }
    
        // doing it the other way (sharing the context):
        {
            helib::PubKey restored_pubkey(context);
            {
                std::ifstream ifs(filename, std::ios::binary);
                boost::archive::binary_iarchive iarchive(ifs);
                iarchive >> context >> restored_pubkey;
            }
            // now `operator==` confirms equality
            BOOST_TEST((restored_pubkey == original_pubkey));
    
            {
                // just checking reproducible output
                std::ofstream os(filename + ".4", std::ios::binary);
                boost::archive::binary_oarchive oarchive(os);
                oarchive << context << restored_pubkey;
            }
        }
    }
    

Test output

time ./test-serialization -l all -r detailed
Running 1 test case...
Entering test module "main"
test-serialization.cpp(34): Entering test case "serialization_pubkey"
test-serialization.cpp(61): info: check (context == surrogate) has passed
test-serialization.cpp(70): info: check (indep_pk != original_pubkey) has passed
test-serialization.cpp(82): info: check (restored_pubkey == original_pubkey) has passed
test-serialization.cpp(34): Leaving test case "serialization_pubkey"; testing time: 36385217us
Leaving test module "main"; testing time: 36385273us

Test module "main" has passed with:
  1 test case out of 1 passed
  3 assertions out of 3 passed

  Test case "serialization_pubkey" has passed with:
    3 assertions out of 3 passed

real    0m36,698s
user    0m35,558s
sys     0m0,850s

Bitwise reproducible output

When serializing repeatedly, the output does appear to be bitwise identical, which may be an important property:

sha256sum pubkey.serialized*
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.2
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.3
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.4

Note that the output is not identical across runs (because each run generates different key material).
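The sha256sum comparison above can also be done from inside a test. Here is a minimal sketch; the function name `files_identical` is hypothetical. It compares two files byte for byte, so a check such as "pubkey.serialized equals pubkey.serialized.2" could become a test assertion within a single run.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// True if both files can be opened and have identical contents, byte for byte.
bool files_identical(const std::string& a, const std::string& b) {
    std::ifstream fa(a, std::ios::binary), fb(b, std::ios::binary);
    if (!fa || !fb) return false;  // a missing file never counts as identical
    std::istreambuf_iterator<char> ia(fa), ib(fb), end;
    return std::equal(ia, end, ib, end);  // also fails if the lengths differ
}
```

Comparing contents directly avoids depending on an external hashing tool, at the cost of reading both files in full.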

Side quest (the wild goose chase)

One way to improve the IndexSet serialization code manually is to also use vector<bool>:

template<class Archive>
    void save(Archive & archive, const helib::IndexSet & index_set, const unsigned int version){
        std::vector<bool> elements;
        elements.resize(index_set.last()-index_set.first()+1);
        for (auto n : index_set)
            elements[n-index_set.first()] = true;
        archive << index_set.first() << elements;
    }

template<class Archive>
    void load(Archive & archive, helib::IndexSet & index_set, const unsigned int version){
        long first_ = 0;
        std::vector<bool> elements;
        archive >> first_ >> elements;
        index_set.clear();
        for (size_t n = 0; n < elements.size(); ++n) {
            if (elements[n])
                index_set.insert(n+first_);
        }
    }
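The same encode/decode logic can be tried without HElib. In this sketch, `std::set<long>` stands in for `helib::IndexSet`; that substitution is an assumption, since the real class has its own `first()`, `last()` and `insert()`. The pair (first, bits) is returned instead of being written to an archive. The bit vector's length is `last - first + 1`, so a sparse set spanning a huge range still costs one bit per position.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Encode: the smallest element plus one presence bit per position up to the largest.
std::pair<long, std::vector<bool>> encode(const std::set<long>& s) {
    if (s.empty()) return std::make_pair(0L, std::vector<bool>());
    const long first = *s.begin();
    const long last = *s.rbegin();
    std::vector<bool> bits(static_cast<std::size_t>(last - first + 1));
    for (long n : s) bits[static_cast<std::size_t>(n - first)] = true;
    return std::make_pair(first, bits);
}

// Decode: re-insert every position whose bit is set, shifted back by `first`.
std::set<long> decode(long first, const std::vector<bool>& bits) {
    std::set<long> s;
    for (std::size_t n = 0; n < bits.size(); ++n)
        if (bits[n]) s.insert(static_cast<long>(n) + first);
    return s;
}
```

For {5, 7, 8, 12}, this stores the offset 5 and 8 bits.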

A better idea is to use dynamic_bitset (for which I happen to have contributed the serialization code; see How to serialize boost::dynamic_bitset?):
template<class Archive>
    void save(Archive & archive, const helib::IndexSet & index_set, const unsigned int version){
        boost::dynamic_bitset<> elements;
        elements.resize(index_set.last()-index_set.first()+1);
        for (auto n : index_set)
            elements.set(n-index_set.first());
        archive << index_set.first() << elements;
    }

template<class Archive>
    void load(Archive & archive, helib::IndexSet & index_set, const unsigned int version) {
        long first_ = 0;
        boost::dynamic_bitset<> elements;
        archive >> first_ >> elements;
        index_set.clear();
        for (size_t n = elements.find_first(); n != boost::dynamic_bitset<>::npos; n = elements.find_next(n))
            index_set.insert(n+first_);
    }
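`dynamic_bitset` packs bits into machine words, and its `find_first()`/`find_next()` return `npos` once no set bit remains. The rough standard-C++ sketch below shows that iteration pattern and how empty words can be skipped cheaply. The `PackedBits` type is hypothetical and is not dynamic_bitset's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical minimal packed bitset: 64 bits per word.
struct PackedBits {
    static constexpr std::size_t npos = std::numeric_limits<std::size_t>::max();
    std::vector<std::uint64_t> words;

    explicit PackedBits(std::size_t nbits) : words((nbits + 63) / 64, 0) {}

    void set(std::size_t pos) { words[pos / 64] |= std::uint64_t{1} << (pos % 64); }

    // Smallest set position >= from, or npos if there is none.
    std::size_t find_from(std::size_t from) const {
        for (std::size_t w = from / 64; w < words.size(); ++w) {
            std::uint64_t bits = words[w];
            if (w == from / 64)
                bits &= ~std::uint64_t{0} << (from % 64);  // ignore positions below `from`
            if (bits == 0) continue;                        // skip an all-zero word at once
            std::size_t bit = 0;
            while (!(bits & 1)) { bits >>= 1; ++bit; }      // count trailing zero bits
            return w * 64 + bit;
        }
        return npos;
    }

    std::size_t find_first() const { return find_from(0); }
    std::size_t find_next(std::size_t pos) const {
        return pos + 1 == 0 ? npos : find_from(pos + 1);
    }
};
```

The loop `for (n = b.find_first(); n != PackedBits::npos; n = b.find_next(n))` then visits only the set positions, in increasing order.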

Of course, you may need to do something similar for IndexMap.

Content provided by Stack Overflow (original English post).