Decompressing Large Blobs with Java on Google App Engine


I'm building a project on Google App Engine with Java (JDO). I use Deflater to compress a large byte[] in code, then store the compressed byte[] in the Blobstore. This works very well:

    public class Functions {

        public static byte[] compress(byte[] input) throws IOException {

            Deflater df = new Deflater();            // deflate (zlib) compressor
            df.setLevel(Deflater.BEST_COMPRESSION);
            df.setInput(input);
            df.finish();                             // no further input; emit everything

            // Compressed output accumulates here
            ByteArrayOutputStream baos = new ByteArrayOutputStream(input.length);
            byte[] buff = new byte[1024];            // 1 KB staging buffer
            while (!df.finished()) {
                int count = df.deflate(buff);        // number of compressed bytes produced
                baos.write(buff, 0, count);
            }
            baos.close();

            return baos.toByteArray();
        }

        public static byte[] decompress(byte[] input) throws IOException, DataFormatException {

            Inflater decompressor = new Inflater();
            decompressor.setInput(input);

            // Create an expandable byte array to hold the decompressed data
            ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);

            // Decompress the data. Let DataFormatException propagate: the original
            // empty catch block left finished() false forever on corrupt input,
            // turning this loop into a busy spin.
            byte[] buf = new byte[1024];
            while (!decompressor.finished()) {
                int count = decompressor.inflate(buf);
                bos.write(buf, 0, count);
            }
            bos.close();

            // Return the decompressed data
            return bos.toByteArray();
        }

        public static BlobKey putInBlobStore(String contentType, byte[] filebytes) throws IOException {

            // Get a file service and create a new blob file
            FileService fileService = FileServiceFactory.getFileService();
            AppEngineFile file = fileService.createNewBlobFile(contentType);

            // Open a locked channel to write to it
            boolean lock = true;
            FileWriteChannel writeChannel = fileService.openWriteChannel(file, lock);

            // Write to the channel using standard Java I/O, 0.5 MB at a time
            // (or the data's own size, if smaller)
            BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(filebytes));
            int defaultBufferSize = 524288;
            byte[] buffer = new byte[Math.min(filebytes.length, defaultBufferSize)];

            int read;
            while ((read = in.read(buffer)) > 0) {   // -1 means end of stream
                // Wrap only the bytes actually read. The original code replaced the
                // buffer with a fresh (all-zero) array on a short final read, which
                // wrote zeros instead of the tail of the data.
                writeChannel.write(ByteBuffer.wrap(buffer, 0, read));
            }
            writeChannel.closeFinally();

            return fileService.getBlobKey(file);
        }
    }
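The write loop above has a subtle final-chunk bug worth calling out: when the last `in.read(buffer)` returns fewer bytes than the buffer holds, the original code allocated a fresh (all-zero) array and wrapped that, so the tail of the blob was written as zeros. A minimal sketch outside GAE, using `Channels.newChannel` as a stand-in for `FileWriteChannel` (names `ChannelCopy` and `copy` are mine, for illustration), shows the fix of wrapping only the bytes actually read:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class ChannelCopy {
    // Copies all bytes to the channel. Wrapping only [0, read) of the buffer
    // means a short final read cannot pad the output with stale or zero bytes.
    public static void copy(byte[] data, WritableByteChannel out, int bufferSize)
            throws IOException {
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] buffer = new byte[bufferSize];
        int read;
        while ((read = in.read(buffer)) > 0) {
            ByteBuffer bb = ByteBuffer.wrap(buffer, 0, read);
            while (bb.hasRemaining()) {
                out.write(bb); // a channel may accept fewer bytes than offered
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1300]; // deliberately not a multiple of the buffer size
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(data, Channels.newChannel(sink), 512);
        System.out.println(sink.size()); // prints: 1300
    }
}
```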

Using the static compress() and putInBlobStore() methods of my Functions class, I can compress and store a byte[] like this:

    BlobKey dataBlobKey = Functions.putInBlobStore("MULTIPART_FORM_DATA", Functions.compress(originalDataByteArray));

Very nice. I really like GAE.

But now, the problem:

I'm storing compressed HTML that I want to retrieve, decompress, and display in an iframe within a JSP page. Compression is fast, but decompression takes forever! Even when the compressed HTML is only 15 KB, decompression sometimes fails outright.

Here is my decompression approach:

    URL file = new URL("/blobserve?key=" + htmlBlobKey);
    URLConnection conn = file.openConnection();
    conn.setReadTimeout(30000);
    conn.setConnectTimeout(30000);
    InputStream inputStream = conn.getInputStream();
    byte[] data = IOUtils.toByteArray(inputStream);
    return new String(Functions.decompress(data));
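One likely cause of the intermittent failures: the hand-rolled inflate loop in decompress() swallows DataFormatException inside `while (!finished())`, so corrupt or truncated input makes the loop spin forever instead of failing. A hedged alternative sketch (class name `StreamCodec` is mine) using java.util.zip's stream wrappers, which throw an IOException on bad data rather than looping:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamCodec {
    // Deflate in one pass; DeflaterOutputStream drives the Deflater internally.
    public static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DeflaterOutputStream dos =
                new DeflaterOutputStream(baos, new Deflater(Deflater.BEST_COMPRESSION));
        dos.write(input);
        dos.close(); // finishes and flushes the deflate stream
        return baos.toByteArray();
    }

    // InflaterInputStream surfaces corrupt data as an IOException instead of
    // leaving the caller in an endless !finished() loop.
    public static byte[] decompress(byte[] compressed) throws IOException {
        InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        in.close();
        return out.toByteArray();
    }
}
```

The byte format is the same zlib stream Deflater produces, so data compressed one way can be decompressed the other.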

Is there a good way to fetch the compressed HTML from the Blobstore, decompress it, and display it? Even if I have to hand it off to a task queue, show a progress bar, and poll for completion, that's fine. I really don't care, as long as it's efficient and ultimately works. Can you share any guidance here?

Thanks for your help.

Is the delay definitely in decompression? Have you checked whether simply outputting the retrieved compressed data is just as slow? - Sanjay Manohar

Why are you fetching the blob over HTTP from yourself? Why not just use the blob reading API? - Nick Johnson

Also, why are you storing the data in the Blobstore for compression, given the extra latency it introduces? - Nick Johnson

Still working on implementing Sasha's suggestion below, but to answer Nick's question: this is an archive that could get very large (20 TB or more per client), while a client might only access 5-10 legal depositions per month. So I'm willing to trade speed for storage size. Nick, I saw the BlobReader object in the Python docs, but what's the Java equivalent? Thanks. - Bob

It looks like BlobstoreInputStream is the equivalent. I'll take a look at that too. - Bob
2 Answers


You could try a RequestBuilder, which runs asynchronously:

    RequestBuilder requestBuilder = new RequestBuilder(RequestBuilder.GET, "/blobserve?key=" + htmlBlobKey);
    try {
        requestBuilder.sendRequest(null, new RequestCallback() {
            public void onError(Request request, Throwable exception) {
                GWT.log(exception.getMessage());
            }
            public void onResponseReceived(Request request, Response response) {
                doSomething(response.getText()); // here update your iframe and stop the progress indicator
            }
        });
    } catch (RequestException ex) {
        GWT.log(ex.getMessage());
    }

Very cool. I'll give it a try tomorrow and update the post with my findings. Thanks Sasha! - Bob

I went with Nick Johnson's idea of reading directly from the Blobstore rather than serving the blobs. It's lightning fast now! Here's the code:
    try {
        ChainedBlobstoreInputStream inputStream = new ChainedBlobstoreInputStream(this.getHtmlBlobKey());
        //StringWriter writer = new StringWriter();
        byte[] data = IOUtils.toByteArray(inputStream);
        return new String(Functions.decompress(Encrypt.AESDecrypt(data)));
        //return new String(data);
    }
    catch (Exception e) {
        return "No HTML Version";
    }

I got the ChainedBlobstoreInputStream class from here: Reading a BlobstoreInputStream >= 1 MB in size
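ChainedBlobstoreInputStream works around the old 1 MB-per-read Blobstore limit by opening a stream per segment and splicing them into one logical stream. Outside GAE, the same chaining pattern can be sketched with the JDK's SequenceInputStream; here plain byte-array segments stand in for per-segment BlobstoreInputStreams, and the class/method names (`ChainedRead`, `chained`, `readAll`) are mine:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Vector;

public class ChainedRead {
    static final int SEGMENT = 1024 * 1024; // 1 MB per underlying stream, as in Blobstore

    // Splits the data into <= 1 MB segment streams and rejoins them through one
    // logical stream. Vector is used for the Enumeration that
    // SequenceInputStream's constructor requires.
    public static InputStream chained(byte[] data) {
        Vector<InputStream> segments = new Vector<InputStream>();
        for (int off = 0; off < data.length; off += SEGMENT) {
            int len = Math.min(SEGMENT, data.length - off);
            segments.add(new ByteArrayInputStream(data, off, len));
        }
        return new SequenceInputStream(segments.elements());
    }

    // Drains the chained stream; callers never see the segment boundaries.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```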
