How do I convert a JSON file into a Java 8 stream of objects?

8

I have a very large (> 1 GB) JSON file containing an array. (It is confidential, but this file of sleep-duration data is a stand-in.)

 [
        {
            "date": "August 17, 2015",
            "hours": 7,
            "minutes": 10
        },
        {
            "date": "August 19, 2015",
            "hours": 4,
            "minutes": 46
        },
        {
            "date": "August 19, 2015",
            "hours": 7,
            "minutes": 22
        },
        {
            "date": "August 21, 2015",
            "hours": 4,
            "minutes": 48
        },
        {
            "date": "August 21, 2015",
            "hours": 6,
            "minutes": 1
        }
    ]

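I have used JSON2POJO to generate a "Sleep" object definition.
Now, you could use Jackson's ObjectMapper to convert the file to an array and then call Arrays.stream(ARRAY). Except that this crashes (yes, it is a BIG file).
Obviously we need to use Jackson's Streaming API. But that is very low level; in particular, I still want Sleep objects.
How do I use the Jackson streaming JSON reader and my Sleep.java class to produce a Java 8 Stream of Sleep objects?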
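
For reference, here is a minimal sketch of what the generated Sleep class might look like, assuming one field per key in the sample above. The real JSON2POJO output is not shown in the question and would typically also carry Jackson annotations, so the field types and accessor names below are assumptions that simply follow JavaBean conventions:

```java
// Hypothetical stand-in for the generated Sleep.java; not the actual JSON2POJO output.
class Sleep {
    private String date;  // e.g. "August 17, 2015"
    private int hours;    // whole hours slept
    private int minutes;  // remaining minutes

    public String getDate() { return date; }
    public void setDate(String date) { this.date = date; }

    public int getHours() { return hours; }
    public void setHours(int hours) { this.hours = hours; }

    public int getMinutes() { return minutes; }
    public void setMinutes(int minutes) { this.minutes = minutes; }

    @Override
    public String toString() {
        return "Sleep{date='" + date + "', hours=" + hours + ", minutes=" + minutes + "}";
    }
}
```

A bean of this shape (implicit no-argument constructor plus getters and setters) is the kind of class Jackson's data binding is designed to populate.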

Do you mean that each "Sleep" object is one element of your JSONArray? - IgorGanapolsky
2 Answers

5

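I could not find a good solution to this, and I needed one for a particular case: I had a JSON file of more than 1 GB (a top-level JSON array containing tens of thousands of large objects), and using the regular Jackson mapper crashed when accessing the resulting array of Java objects.

The examples I found that use the Jackson Streaming API lose the object mapping that makes Jackson so attractive, and they certainly do not let you access the objects through the (obviously appropriate) Java 8 Stream API.

The code is now on GitHub.

Here is a quick usage example: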

 //Use the JSON File included as a resource
 ClassLoader classLoader = SleepReader.class.getClassLoader();
 File dataFile = new File(classLoader.getResource("example.json").getFile());

 //Simple example of getting the Sleep objects from that JSON
 new JsonArrayStreamDataSupplier<>(dataFile, Sleep.class)
                .getStream() //Got the Stream
                .forEach(nightsRest -> {
                    System.out.println(nightsRest.toString());
                });

Here is some of the JSON from example.json:

   [
    {
        "date": "August 17, 2015",
        "hours": 7,
        "minutes": 10
    },
    {
        "date": "August 19, 2015",
        "hours": 4,
        "minutes": 46
    },
    {
        "date": "August 19, 2015",
        "hours": 7,
        "minutes": 22
    },
    {
        "date": "August 21, 2015",
        "hours": 4,
        "minutes": 48
    },
    {
        "date": "August 21, 2015",
        "hours": 6,
        "minutes": 1
    }
]

If you don't want to go to GitHub (you should), here is the wrapper class itself:

/**
 * @license APACHE LICENSE, VERSION 2.0 http://www.apache.org/licenses/LICENSE-2.0
 * @author Michael Witbrock
 */
package com.michaelwitbrock.jacksonstream;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class JsonArrayStreamDataSupplier<T> implements Iterator<T> {
    /*
    * This class wraps the Jackson streaming API for arrays (a common kind of 
    * large JSON file) in a Java 8 Stream. The initial motivation was that 
    * use of a default objectmapper to a Java array was crashing for me on
    * a very large JSON file (> 1GB).  And there didn't seem to be good example 
    * code for handling Jackson streams as Java 8 streams, which seems natural.
    */

    static ObjectMapper mapper = new ObjectMapper();
    JsonParser parser;
    boolean maybeHasNext = false;
    int count = 0;
    JsonFactory factory = new JsonFactory();
    private Class<T> type;

    public JsonArrayStreamDataSupplier(File dataFile, Class<T> type) {
        this.type = type;
        try {
            // Setup and get into a state to start iterating
            parser = factory.createParser(dataFile);
            parser.setCodec(mapper);
            JsonToken token = parser.nextToken();
            if (token == null) {
                throw new RuntimeException("Can't get any JSON Token from "
                        + dataFile.getAbsolutePath());
            }

            // the first token is supposed to be the start of array '['
            if (!JsonToken.START_ARRAY.equals(token)) {
                throw new RuntimeException("Expected the start of a JSON array in "
                        + dataFile.getAbsolutePath());
            }
            // Only mark the iterator usable once the parser is positioned inside the array
            maybeHasNext = true;
        } catch (Exception e) {
            // Setup failed, so hasNext() will report that there is nothing to read
            System.out.println("Ex" + e);
            maybeHasNext = false;
        }
    }

    /*
    This method returns the stream, and is the only method other 
    than the constructor that should be used.
    */
    public Stream<T> getStream() {
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(this, 0), false);
    }

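    /* The remaining methods are what enables this to be passed to the spliterator generator,
       since they implement Iterator. Note that hasNext() advances the parser, so it must be
       called exactly once before each call to next().
    */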
    @Override
    public boolean hasNext() {
        if (!maybeHasNext) {
            return false; // didn't get started
        }
        try {
            return (parser.nextToken() == JsonToken.START_OBJECT);
        } catch (Exception e) {
            System.out.println("Ex" + e);
            return false;
        }
    }

    @Override
    public T next() {
        try {
            JsonNode n = parser.readValueAsTree();
            //Because we can't send T as a parameter to the mapper
            T node = mapper.convertValue(n, type);
            return node;
        } catch (IOException | IllegalArgumentException e) {
            System.out.println("Ex" + e);
            return null;
        }

    }


}
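
One caveat with the class above: hasNext() advances the parser every time it is called, so calling it twice in a row would skip a token. The stream returned by getStream() works, but using the object directly as an Iterator is fragile. A minimal sketch of one way around this, using only the JDK, is to buffer a single element. LookaheadIterator and its Supplier-based source are hypothetical names for illustration and are not part of the GitHub project; the source is assumed to return null once the input is exhausted:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/**
 * Hypothetical helper (not part of the original class): buffers one element
 * so that hasNext() can be called repeatedly without consuming input.
 */
class LookaheadIterator<T> implements Iterator<T> {
    private final Supplier<T> source; // assumed to return null when exhausted
    private T buffered;               // element fetched ahead by hasNext(), if any
    private boolean finished;         // true once the source has returned null

    LookaheadIterator(Supplier<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source only when nothing is buffered yet
        if (buffered == null && !finished) {
            buffered = source.get();
            finished = (buffered == null);
        }
        return buffered != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = buffered;
        buffered = null; // hand the element over and clear the buffer
        return result;
    }

    /** Wraps this iterator in a sequential, lazily evaluated Stream. */
    Stream<T> stream() {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(this, Spliterator.ORDERED | Spliterator.NONNULL),
                false);
    }
}
```

In the Jackson case, the source would be a function that reads the next array element from the parser and returns null at the end of the array. Throwing NoSuchElementException from next() also follows the Iterator contract more closely than returning null.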

Why do you need a ClassLoader to read the file? You could simply use File file = new File("path_to.../example.json").getAbsoluteFile(); - IgorGanapolsky
3
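That is only needed when the example file is loaded from the resources directory in the JAR. If you want to load the file from somewhere else, your suggestion is fine (and would probably make for a more transparent example). - Witbrock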

4

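Removing the Iterator implementation

I think you can get rid of the Iterator implementation entirely by using Jackson's own API.

The trick is that readValuesAs can return an iterator. The one thing I have not fully worked out is why I have to consume the JSON array's start token before Jackson will do its job.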

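import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;
import java.util.Spliterators;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class InputStreamJsonArrayStreamDataSupplier<T> implements Supplier<Stream<T>> {

    private final ObjectMapper mapper = new ObjectMapper();
    private final JsonParser jsonParser;
    private final Class<T> type;

    // The original snippet referenced an undeclared variable `data`; it is assumed
    // here to be the InputStream supplying the JSON, as the class name suggests.
    public InputStreamJsonArrayStreamDataSupplier(InputStream data, Class<T> type) throws IOException {
        this.type = type;

        // Setup and get into a state to start iterating
        jsonParser = mapper.getFactory().createParser(data);
        jsonParser.setCodec(mapper);
        JsonToken token = jsonParser.nextToken();
        if (JsonToken.START_ARRAY.equals(token)) {
            // if it is started with START_ARRAY it's ok
            token = jsonParser.nextToken();
        }
        if (!JsonToken.START_OBJECT.equals(token)) {
            throw new RuntimeException("Can't get any JSON object from input " + data);
        }
    }

    @Override
    public Stream<T> get() {
        try {
            // readValuesAs returns an Iterator<T> that binds one array element per step
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(jsonParser.readValuesAs(type), 0), false);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}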
