Base64 encoding of a file in Java fails

Question


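I have a class that encodes and decodes a file. When I run it with a .txt file, it works. But when I run it with a .jpg or .doc file, I can't open the resulting file, or it is not equal to the original. I don't know why this happens. I adapted the class from http://myjeeva.com/convert-image-to-string-and-string-to-image-in-java.html, but I want to change this line: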

byte imageData[] = new byte[(int) file.length()];

to

byte example[] = new byte[1024];

and read the file as many times as needed. Thanks.

import java.io.*;
import java.util.*;

  public class Encode {

  // input = input file path, output = output file path, imageDataString = encoded string

  String input;
  String output;
  String imageDataString;


  public void setFileInput(String input){
    this.input=input;
  }

  public void setFileOutput(String output){
    this.output=output;
  }

  public String getFileInput(){
    return input;
  }

  public String getFileOutput(){
    return output;
  }

  public String getEncodeString(){
    return  imageDataString;
  }

  public String processCode(){
    StringBuilder sb= new StringBuilder();

    try{
        File fileInput= new File( getFileInput() );
        FileInputStream imageInFile = new FileInputStream(fileInput);

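        // In examples I've seen people create a byte[] with the same length as the file.
        // I don't want to do that, because I don't know how long the file will be.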

        byte buff[] = new byte[1024];

        int r = 0;

        while ( ( r = imageInFile.read( buff)) > 0 ) {

          String imageData = encodeImage(buff);

          sb.append( imageData);

          if ( imageInFile.available() <= 0 ) {
            break;
          }
        }



       } catch (FileNotFoundException e) {
        System.out.println("File not found" + e);
      } catch (IOException ioe) {
        System.out.println("Exception while reading the file " + ioe);

    } 

        imageDataString = sb.toString();

       return imageDataString;
}  


  public  void processDecode(String str) throws IOException{

      byte[] imageByteArray = decodeImage(str);
      File fileOutput= new File( getFileOutput());
      FileOutputStream imageOutFile = new FileOutputStream( fileOutput);

      imageOutFile.write(imageByteArray);
      imageOutFile.close();

}

 public static String encodeImage(byte[] imageByteArray) {

      return  Base64.getEncoder().withoutPadding().encodeToString( imageByteArray);

    }

    public static byte[] decodeImage(String imageDataString) {
      return  Base64.getDecoder().decode(  imageDataString);  

    }


  public static void main(String[] args) throws IOException {

    Encode a = new Encode();

    a.setFileInput( "C://Users//xxx//Desktop//original.doc");
    a.setFileOutput("C://Users//xxx//Desktop//original-copied.doc");

    a.processCode( );

    a.processDecode( a.getEncodeString());

    System.out.println("C O P I E D");
  }
}

I tried changing

String imageData = encodeImage(buff);

to

String imageData = encodeImage(buff,r);

and the encodeImage method to

public static String encodeImage(byte[] imageByteArray, int r) {

     byte[] aux = new byte[r];

     for ( int i = 0; i < aux.length; i++) {
       aux[i] = imageByteArray[i];

       if ( aux[i] <= 0 ) {
         break;
       }
     }
return  Base64.getDecoder().decode(  aux);
}

But I got this error:

Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits

- JGG

Thanks. Your question led me to a better conclusion. I had failed to send a Base64 image when I thought I had, so I got an IllegalArgumentException. - Semo

2 Answers


Look here:

    while ( ( r = imageInFile.read( buff)) > 0 ) {
      String imageData = encodeImage(buff);

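read returns -1 at end of file; otherwise it returns the number of bytes actually read.

So the last buff may not be completely filled, and may even contain stale data left over from an earlier read. That is why you need to use r.

Since this is homework, the rest is up to you.

By the way: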

 byte[] array = new byte[1024]

is more conventional in Java. The syntax:

 byte array[] = ...

was designed for compatibility with C/C++.

- Joop Eggen

Thanks for your answer. I've added extra information. I tried running the class using r, but I got a compile problem and haven't found a solution. - JGG


- RealSkeptic · Accepted Answer

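Your program has two problems.

The first, as @Joop Eggen mentioned, is that you are not handling your input correctly.

In fact, Java does not promise that a read will fill the whole 1024 bytes, even in the middle of the file. It may read just 50 bytes and tell you it read 50 bytes, and then read another 50 bytes the next time.

Suppose you read 1024 bytes in the previous round. Now, in the current round, you read only 50 bytes. Your byte array now contains 50 new bytes, and the rest are old bytes from the previous read!

So you always need to copy exactly the number of bytes that were read into a new array, and pass that array to your encoding function.

So, to fix this particular problem, you need to do something like this: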

 while ( ( r = imageInFile.read( buff)) > 0 ) {

      byte[] realBuff = Arrays.copyOf( buff, r );

      String imageData = encodeImage(realBuff);

      ...
 }
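
To make the stale-data effect concrete, here is a minimal sketch. It assumes a 5-byte in-memory input and a 4-byte buffer standing in for the file and the 1024-byte buffer; the class name StaleBufferDemo and the helper collect are made up for illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class StaleBufferDemo {

    // Reads data through a fixed-size buffer and records what each loop
    // iteration would hand to the encoder. With trim == true only the r bytes
    // actually read are kept; otherwise the whole buffer is used.
    static String collect(byte[] data, int bufferSize, boolean trim) {
        StringBuilder sb = new StringBuilder();
        byte[] buff = new byte[bufferSize];
        try (InputStream in = new ByteArrayInputStream(data)) {
            int r;
            while ((r = in.read(buff)) > 0) {
                byte[] chunk = trim ? Arrays.copyOf(buff, r) : buff;
                sb.append(Arrays.toString(chunk));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5};
        // The second read delivers only one byte, so 2, 3, 4 are leftovers.
        System.out.println("whole buffer: " + collect(data, 4, false)); // [1, 2, 3, 4][5, 2, 3, 4]
        System.out.println("trimmed to r: " + collect(data, 4, true));  // [1, 2, 3, 4][5]
    }
}
```

The untrimmed run passes three leftover bytes to the encoder on the last iteration, which is the garbage the answers above describe.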

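However, this is not the only problem. Your real problem lies in the Base64 encoding itself.

What Base64 does is split your bytes into 6-bit chunks and treat each chunk as a number N between 0 and 63. It then takes the Nth character from its character table to represent that chunk.

But this means it can't encode just a single byte or two bytes, because a byte contains 8 bits, which is one 6-bit chunk plus 2 leftover bits. Two bytes have 16 bits, which is two 6-bit chunks plus 4 leftover bits.

To solve this, Base64 always encodes 3 consecutive bytes. If the input length is not divisible by 3, it adds extra zero bits.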
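
As a quick illustration of the 3-byte grouping, the sketch below encodes the first one, two, and three bytes of a fixed sample, with and without '=' padding. The class name Base64GroupDemo, the helper encode, and the sample bytes are chosen only for this example.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64GroupDemo {

    // Encodes the first n bytes (1 to 3) of a fixed sample, with or without '=' padding.
    static String encode(int n, boolean padding) {
        byte[] sample = {0x55, (byte) 0xF0, (byte) 0xAA};
        byte[] input = Arrays.copyOf(sample, n);
        Base64.Encoder encoder = padding
                ? Base64.getEncoder()
                : Base64.getEncoder().withoutPadding();
        return encoder.encodeToString(input);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " byte(s): padded=" + encode(n, true)
                    + " unpadded=" + encode(n, false));
        }
        // 1 byte(s): padded=VQ== unpadded=VQ
        // 2 byte(s): padded=VfA= unpadded=VfA
        // 3 byte(s): padded=VfCq unpadded=VfCq
    }
}
```

Only the 3-byte input fills a whole 4-character group; the shorter inputs are completed with zero bits, and the padded encoder marks that with '='.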

Here is a small program that demonstrates the problem:

package testing;

import java.util.Base64;

public class SimpleTest {

    public static void main(String[] args) {

        // An array containing six bytes to encode and decode.
        byte[] fullArray = { 0b01010101, (byte) 0b11110000, (byte)0b10101010, 0b00001111, (byte)0b11001100, 0b00110011 };

        // The same array broken into three chunks of two bytes.

        byte[][] threeTwoByteArrays = {
            {       0b01010101, (byte) 0b11110000 },
            { (byte)0b10101010,        0b00001111 },
            { (byte)0b11001100,        0b00110011 }
        };
        Base64.Encoder encoder = Base64.getEncoder().withoutPadding();

        // Encode the full array

        String encodedFullArray = encoder.encodeToString(fullArray);

        // Encode the three chunks consecutively 

        StringBuilder encodedStringBuilder = new StringBuilder();
        for ( byte [] twoByteArray : threeTwoByteArrays ) {
            encodedStringBuilder.append(encoder.encodeToString(twoByteArray));
        }
        String encodedInChunks = encodedStringBuilder.toString();

        System.out.println("Encoded full array: " + encodedFullArray);
        System.out.println("Encoded in chunks of two bytes: " + encodedInChunks);

        // Now  decode the two resulting strings

        Base64.Decoder decoder = Base64.getDecoder();

        byte[] decodedFromFull = decoder.decode(encodedFullArray);   
        System.out.println("Byte array decoded from full: " + byteArrayBinaryString(decodedFromFull));

        byte[] decodedFromChunked = decoder.decode(encodedInChunks);
        System.out.println("Byte array decoded from chunks: " + byteArrayBinaryString(decodedFromChunked));
    }

    /**
     * Convert a byte array to a string representation in binary
     */
    public static String byteArrayBinaryString( byte[] bytes ) {
        StringBuilder sb = new StringBuilder();
        sb.append('[');
        for ( byte b : bytes ) {
            sb.append(Integer.toBinaryString(Byte.toUnsignedInt(b))).append(',');
        }
        if ( sb.length() > 1) {
            sb.setCharAt(sb.length() - 1, ']');
        } else {
            sb.append(']');
        }
        return sb.toString();
    }
}

Imagine that my 6-byte array is your image file, and that your buffer reads 2 bytes at a time instead of 1024. This is the output of the encoding:

Encoded full array: VfCqD8wz
Encoded in chunks of two bytes: VfAqg8zDM

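As you can see, encoding the full array gave us 8 characters. Each group of three bytes is converted into four 6-bit chunks, which in turn are converted into four characters.

But encoding the three two-byte arrays gave you a 9-character string. It is a completely different string! Each group of two bytes was extended to three 6-bit chunks by padding with zeros. And because you asked for no padding, it produces only 3 characters, without the extra = that normally marks a byte count that is not divisible by 3.

The output from the part of the program that decodes the correctly encoded 8-character string is fine: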

Byte array decoded from full: [1010101,11110000,10101010,1111,11001100,110011]

But trying to decode the incorrectly encoded 9-character string gives:

Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
    at java.util.Base64$Decoder.decode0(Base64.java:734)
    at java.util.Base64$Decoder.decode(Base64.java:526)
    at java.util.Base64$Decoder.decode(Base64.java:549)
    at testing.SimpleTest.main(SimpleTest.java:34)

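Not good! A good base64 string should always have a multiple of 4 characters, and we only have 9.

Because you chose a buffer size of 1024, which is not a multiple of 3, this problem will occur. You need to encode a multiple of 3 bytes each time to produce a proper string. So, in practice, you need a buffer of size 3072 or something similar.

But because of the first problem, be very careful about what you pass to the encoder, since it can always happen that you read fewer than 3072 bytes. If that number is not divisible by three, the same problem will occur.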
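
One way to combine both fixes is sketched below, assuming a buffer whose size is a multiple of 3 and a helper that keeps reading until the buffer is full or the stream ends. The names ChunkedBase64, fill, and encode are hypothetical; this is an illustration of the idea, not the original poster's final code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

public class ChunkedBase64 {

    // Keeps reading until buff is full or the stream ends; returns the bytes stored.
    static int fill(InputStream in, byte[] buff) throws IOException {
        int total = 0;
        while (total < buff.length) {
            int r = in.read(buff, total, buff.length - total);
            if (r < 0) {
                break; // end of stream
            }
            total += r;
        }
        return total;
    }

    // Encodes the stream chunk by chunk. Because bufferSize is a multiple of 3
    // and fill() only returns a short count at end of stream, every chunk
    // except possibly the last one maps to whole 4-character groups.
    static String encode(InputStream in, int bufferSize) {
        if (bufferSize <= 0 || bufferSize % 3 != 0) {
            throw new IllegalArgumentException("bufferSize must be a positive multiple of 3");
        }
        Base64.Encoder encoder = Base64.getEncoder(); // padding kept for the last chunk
        StringBuilder sb = new StringBuilder();
        byte[] buff = new byte[bufferSize];
        try {
            int n;
            while ((n = fill(in, buff)) > 0) {
                sb.append(encoder.encodeToString(Arrays.copyOf(buff, n)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {0b01010101, (byte) 0b11110000, (byte) 0b10101010,
                       0b00001111, (byte) 0b11001100, 0b00110011};
        // A 3-byte buffer stands in for the suggested 3072-byte one.
        String chunked = encode(new ByteArrayInputStream(data), 3);
        System.out.println(chunked); // VfCqD8wz
        System.out.println(chunked.equals(Base64.getEncoder().encodeToString(data))); // true
    }
}
```

For a real file, a FileInputStream can be passed in place of the ByteArrayInputStream with a buffer size of 3072. The standard library also offers Base64.Encoder.wrap(OutputStream), which may be a simpler route when the encoded text is written straight to another stream.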