Deserializing Open Street Maps data with protobuf-net

7

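I'm the author of protobuf-net; I'm on "work" time at the moment, but I'll try to take a look later today and see what the problem is. - Marc Gravell
I know who you are, Marc; I downloaded your software. Haha, I like the "work" in quotation marks. Thanks for your help (and the framework)! - jonperl
4 Answers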

9

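Right; the problem is that this isn't just protobuf. It is a hybrid file format (defined on the OpenStreetMap wiki page quoted in the code comments below) that wraps several formats, protobuf among them. It also includes compression (although that looks to be optional).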
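To make the outer framing concrete, here is a minimal sketch of reading the 4-byte length prefix that precedes each BlockHeader, which the wiki describes as being in network byte order (big-endian). It uses only System.IO; the names BlockFraming and TryReadBigEndianInt32 are hypothetical and are not part of protobuf-net or any OSM tooling.

```csharp
using System;
using System.IO;

static class BlockFraming
{
    // Reads a 4-byte big-endian (network byte order) integer from the stream.
    // Returns false if the stream is already at its end; throws if the
    // stream ends part-way through the 4 bytes.
    public static bool TryReadBigEndianInt32(Stream source, out int value)
    {
        var buffer = new byte[4];
        int filled = 0;
        while (filled < 4)
        {
            int n = source.Read(buffer, filled, 4 - filled);
            if (n <= 0) break; // end of stream
            filled += n;
        }

        if (filled == 0)
        {
            value = 0;
            return false;
        }
        if (filled < 4)
            throw new EndOfStreamException("Truncated length prefix");

        // Most significant byte comes first in network byte order.
        value = (buffer[0] << 24) | (buffer[1] << 16) | (buffer[2] << 8) | buffer[3];
        return true;
    }
}
```

Reading the prefix this way avoids the byte swap that is needed when a little-endian reader is reused for the length.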

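I've pulled apart what I can from the spec, and I have a C# reader that uses protobuf-net to handle the chunks. It reads happily to the end of the file, and I can tell you there are 4515 blocks (BlockHeader). When it gets to the Blob, I'm a bit confused about how the spec demarcates OSMHeader and OSMData; I'm open to suggestions! I've also used ZLIB.NET to handle the zlib compression that is in use. Without having got my head around that part, I've settled for processing the ZLIB data and validating it against the claimed size, to check that it is at least sane.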

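If you can figure out (or ask the author) how they separate OSMHeader and OSMData, I'll happily add the rest. I hope you don't mind that I've stopped here, but it has been a few hours ;p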

using System;
using System.IO;
using OpenStreetMap; // where my .proto-generated entities are living
using ProtoBuf; // protobuf-net
using zlib; // ZLIB.NET    

class OpenStreetMapParser
{

    static void Main()
    {
        using (var file = File.OpenRead("us-northeast.osm.pbf"))
        {
            // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
            //A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
            //The format is a repeating sequence of:
            //int4: length of the BlockHeader message in network byte order
            //serialized BlockHeader message
            //serialized Blob message (size is given in the header)

            int length, blockCount = 0;
            while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
            {
                // I'm just being lazy and re-using something "close enough" here
                // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                // actually need the other way around (network byte order):
                uint len = (uint)length;
                len = ((len & 0xFF) << 24) | ((len & 0xFF00) << 8) | ((len & 0xFF0000) >> 8) | ((len & 0xFF000000) >> 24);
                length = (int)len;

                BlockHeader header;
                // again, v2 has capped-streams built in, but I'm deliberately
                // limiting myself to v1 features
                using (var tmp = new LimitedStream(file, length))
                {
                    header = Serializer.Deserialize<BlockHeader>(tmp);
                }
                Blob blob;
                using (var tmp = new LimitedStream(file, header.datasize))
                {
                    blob = Serializer.Deserialize<Blob>(tmp);
                }
                if(blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                using(var ms = new MemoryStream(blob.zlib_data))
                using(var zlib = new ZLibStream(ms))
                { // at this point I'm very unclear how the OSMHeader and OSMData are packed - it isn't clear
                    // read this to the end, to check we can parse the zlib
                    int payloadLen = 0;
                    while (zlib.ReadByte() >= 0) payloadLen++;
                    if (payloadLen != blob.raw_size) throw new FormatException("Screwed that up...");
                }
                blockCount++;
                Console.WriteLine("Read block " + blockCount.ToString());


            }
            Console.WriteLine("all done");
            Console.ReadLine();
        }
    }
}
abstract class InputStream : Stream
{
    protected abstract int ReadNextBlock(byte[] buffer, int offset, int count);
    public sealed override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead, totalRead = 0;
        while (count > 0 && (bytesRead = ReadNextBlock(buffer, offset, count)) > 0)
        {
            count -= bytesRead;
            offset += bytesRead;
            totalRead += bytesRead;
            pos += bytesRead;
        }
        return totalRead;
    }
    long pos;
    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotImplementedException();
    }
    public override void SetLength(long value)
    {
        throw new NotImplementedException();
    }
    public override long Position
    {
        get
        {
            return pos;
        }
        set
        {
            if (pos != value) throw new NotImplementedException();
        }
    }
    public override long Length
    {
        get { throw new NotImplementedException(); }
    }
    public override void Flush()
    {
        throw new NotImplementedException();
    }
    public override bool CanWrite
    {
        get { return false; }
    }
    public override bool CanRead
    {
        get { return true; }
    }
    public override bool CanSeek
    {
        get { return false; }
    }
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotImplementedException();
    }
}
class ZLibStream : InputStream
{   // uses ZLIB.NET: http://www.componentace.com/download/download.php?editionid=25
    private ZInputStream reader; // seriously, why isn't this a stream?
    public ZLibStream(Stream stream)
    {
        reader = new ZInputStream(stream);
    }
    public override void Close()
    {
        reader.Close();
        base.Close();
    }
    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        // OMG! reader.Read is the base-stream, reader.read is decompressed! yeuch
        return reader.read(buffer, offset, count);
    }

}
// deliberately doesn't dispose the base-stream    
class LimitedStream : InputStream
{
    private Stream stream;
    private long remaining;
    public LimitedStream(Stream stream, long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (stream == null) throw new ArgumentNullException("stream");
        if (!stream.CanRead) throw new ArgumentException("stream");
        this.stream = stream;
        this.remaining = length;
    }
    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        if(count > remaining) count = (int)remaining;
        int bytesRead = stream.Read(buffer, offset, count);
        if (bytesRead > 0) remaining -= bytesRead;
        return bytesRead;
    }
}
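
As a possible alternative to ZLIB.NET on newer runtimes, the size check done inside the loop above could be sketched with the built-in System.IO.Compression.ZLibStream, which is available from .NET 6 onwards. The class name is fully qualified here because the listing above defines its own ZLibStream. BlobPayload and Inflate are hypothetical names, and the two arguments stand in for blob.zlib_data and blob.raw_size.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class BlobPayload
{
    // Inflates a zlib-wrapped payload and checks it against the size
    // that the Blob message claims for the uncompressed data.
    public static byte[] Inflate(byte[] zlibData, int expectedSize)
    {
        using (var input = new MemoryStream(zlibData))
        using (var inflater = new System.IO.Compression.ZLibStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            if (output.Length != expectedSize)
                throw new InvalidDataException(
                    "Expected " + expectedSize + " bytes but inflated " + output.Length);
            return output.ToArray();
        }
    }
}
```

The returned byte array can be wrapped in a MemoryStream and handed to Serializer.Deserialize in the same way the ZLIB.NET-backed stream is used above.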

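This is awesome. Thanks for your help, I'll see what I can do with it! (You are a good man.) - jonperl
I'm going to try to work backwards from https://github.com/scrosby/OSM-binary/tree/master/src.java/crosby/binary. - jonperl
I don't understand the comment where my .proto-generated entities are living, and I don't know where you get OpenStreetMap from. - René Nyffenegger
@RenéNyffenegger it looks like the external link to the .proto has been removed; most likely, OpenStreetMap is either the namespace declared in the .proto file, or I overrode it on the command line for convenience. - Marc Gravell
@MarcGravell, could you take a look at this question: https://www.stackoverflow.com/questions/59599088? It concerns OSMSharp, whose author Ben Abelshausen used jonperl's answer below. - Youp Bernoulli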

1

Following the outline Marc set out, I worked out the last part by looking at http://git.openstreetmap.nl/index.cgi/pbf2osm.git/tree/src/main.c?h=35116112eb0066c7729a963b292faa608ddc8ad7.

Here is the final code.

using System;
using System.Diagnostics;
using System.IO;
using crosby.binary;
using OSMPBF;
using PerlLLC.Tools;
using ProtoBuf;
using zlib;

namespace OpenStreetMapOperations
{
    class OpenStreetMapParser
    {
        static void Main()
        {
            using (var file = File.OpenRead(StaticTools.AssemblyDirectory + @"\us-pacific.osm.pbf"))
            {
                // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
                //A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
                //The format is a repeating sequence of:
                //int4: length of the BlockHeader message in network byte order
                //serialized BlockHeader message
                //serialized Blob message (size is given in the header)

                int length, blockCount = 0;
                while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
                {
                    // I'm just being lazy and re-using something "close enough" here
                    // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                    // actually need the other way around (network byte order):
                    length = IntLittleEndianToBigEndian((uint)length);

                    BlockHeader header;
                    // again, v2 has capped-streams built in, but I'm deliberately
                    // limiting myself to v1 features
                    using (var tmp = new LimitedStream(file, length))
                    {
                        header = Serializer.Deserialize<BlockHeader>(tmp);
                    }
                    Blob blob;
                    using (var tmp = new LimitedStream(file, header.datasize))
                    {
                        blob = Serializer.Deserialize<Blob>(tmp);
                    }
                    if (blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                    HeaderBlock headerBlock;
                    PrimitiveBlock primitiveBlock;

                    using (var ms = new MemoryStream(blob.zlib_data))
                    using (var zlib = new ZLibStream(ms))
                    {
                        if (header.type == "OSMHeader")
                            headerBlock = Serializer.Deserialize<HeaderBlock>(zlib);

                        if (header.type == "OSMData")
                            primitiveBlock = Serializer.Deserialize<PrimitiveBlock>(zlib);
                    }
                    blockCount++;
                    Trace.WriteLine("Read block " + blockCount.ToString());


                }
                Trace.WriteLine("all done");
            }
        }

        // 4-byte number
        static int IntLittleEndianToBigEndian(uint i)
        {
            return (int)(((i & 0xff) << 24) + ((i & 0xff00) << 8) + ((i & 0xff0000) >> 8) + ((i >> 24) & 0xff));
        }
    }

    abstract class InputStream : Stream
    {
        protected abstract int ReadNextBlock(byte[] buffer, int offset, int count);
        public sealed override int Read(byte[] buffer, int offset, int count)
        {
            int bytesRead, totalRead = 0;
            while (count > 0 && (bytesRead = ReadNextBlock(buffer, offset, count)) > 0)
            {
                count -= bytesRead;
                offset += bytesRead;
                totalRead += bytesRead;
                pos += bytesRead;
            }
            return totalRead;
        }
        long pos;
        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }
        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }
        public override long Position
        {
            get
            {
                return pos;
            }
            set
            {
                if (pos != value) throw new NotImplementedException();
            }
        }
        public override long Length
        {
            get { throw new NotImplementedException(); }
        }
        public override void Flush()
        {
            throw new NotImplementedException();
        }
        public override bool CanWrite
        {
            get { return false; }
        }
        public override bool CanRead
        {
            get { return true; }
        }
        public override bool CanSeek
        {
            get { return false; }
        }
        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }
    }
    class ZLibStream : InputStream
    {   // uses ZLIB.NET: http://www.componentace.com/download/download.php?editionid=25
        private ZInputStream reader; // seriously, why isn't this a stream?
        public ZLibStream(Stream stream)
        {
            reader = new ZInputStream(stream);
        }
        public override void Close()
        {
            reader.Close();
            base.Close();
        }
        protected override int ReadNextBlock(byte[] buffer, int offset, int count)
        {
            // OMG! reader.Read is the base-stream, reader.read is decompressed! yeuch
            return reader.read(buffer, offset, count);
        }

    }
    // deliberately doesn't dispose the base-stream    
    class LimitedStream : InputStream
    {
        private Stream stream;
        private long remaining;
        public LimitedStream(Stream stream, long length)
        {
            if (length < 0) throw new ArgumentOutOfRangeException("length");
            if (stream == null) throw new ArgumentNullException("stream");
            if (!stream.CanRead) throw new ArgumentException("stream");
            this.stream = stream;
            this.remaining = length;
        }
        protected override int ReadNextBlock(byte[] buffer, int offset, int count)
        {
            if (count > remaining) count = (int)remaining;
            int bytesRead = stream.Read(buffer, offset, count);
            if (bytesRead > 0) remaining -= bytesRead;
            return bytesRead;
        }
    }
}

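Did you run into any problems reading the nodes during deserialization? This code runs without errors on my machine, but when I look for data in primitiveBlock there is nothing there. - ninehundreds
Sorry, I never got the notification. Did you solve it? I remember being able to access the data, although we no longer use this code. - jonperl
After looking at another project I eventually got the code working, but we ran into more problems using Open Street Maps and decided to go with a different solution. - ninehundreds
@jonperl, could you please take a look at this question: https://stackoverflow.com/questions/59599088/open-osm-pbf-results-in-protobuf-exception - Youp Bernoulli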

1
Yes, it comes from protogen, in Fileformat.cs (generated from the OSM Fileformat.proto file; the code is below).
package OSM_PROTO;
  message Blob {
    optional bytes raw = 1;
    optional int32 raw_size = 2; 
    optional bytes zlib_data = 3;
    optional bytes lzma_data = 4;
    optional bytes bzip2_data = 5;
  }

  message BlockHeader {
    required string type = 1;
    optional bytes indexdata = 2;
    required int32 datasize = 3;
  }

This is the declaration of BlockHeader in the generated file:

public sealed partial class BlockHeader : pb::GeneratedMessage<BlockHeader, BlockHeader.Builder> {...}

-> using pb = global::Google.ProtocolBuffers;

(ProtocolBuffers.dll) ships with this package:

http://code.google.com/p/protobuf-csharp-port/downloads/detail?name=protobuf-csharp-port-2.4.1.473-full-binaries.zip&can=2&q=


0

Have you tried a smaller region, for example us-pacific.osm.pbf?

It would also be useful to post the error message.


It still returns null. I tried var f = Serializer.Deserialize<OSMPBF.HeaderBlock>(file); - jonperl
