Converting xlsx to csv with the Apache POI API

19

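I am trying to convert a .xlsx file to a .csv file. The conversion runs, but the data is not formatted correctly. Please look at the code below and suggest changes.

Here I read a .xlsx file and write it out as a CSV file, i.e. convert xlsx to csv, but the resulting .csv file is not formatted correctly: all the data ends up on a single line, whereas it should appear in rows as it does in Excel.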

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Iterator;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XlsxtoCSV {

    static void xlsx(File inputFile, File outputFile) {
        // For storing data into CSV files
        StringBuffer data = new StringBuffer();

        try {
            FileOutputStream fos = new FileOutputStream(outputFile);
            // Get the workbook object for XLSX file
            XSSFWorkbook wBook = new XSSFWorkbook(new FileInputStream(inputFile));
            // Get first sheet from the workbook
            XSSFSheet sheet = wBook.getSheetAt(0);
            Row row;
            Cell cell;
            // Iterate through each rows from first sheet
            Iterator<Row> rowIterator = sheet.iterator();

            while (rowIterator.hasNext()) {
                row = rowIterator.next();

                // For each row, iterate through each columns
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {

                    cell = cellIterator.next();

                    switch (cell.getCellType()) {
                        case Cell.CELL_TYPE_BOOLEAN:
                            data.append(cell.getBooleanCellValue() + ",");

                            break;
                        case Cell.CELL_TYPE_NUMERIC:
                            data.append(cell.getNumericCellValue() + ",");

                            break;
                        case Cell.CELL_TYPE_STRING:
                            data.append(cell.getStringCellValue() + ",");
                            break;

                        case Cell.CELL_TYPE_BLANK:
                            data.append("" + ",");
                            break;
                        default:
                            data.append(cell + ",");

                    }
                }
            }

            fos.write(data.toString().getBytes());
            fos.close();

        } catch (Exception ioe) {
            ioe.printStackTrace();
        }
    }
    //testing the application 

    public static void main(String[] args) {
        //reading file from desktop
        File inputFile = new File("C:\\Users\\user69\\Desktop\\test.xlsx");
        //writing excel data to csv 
        File outputFile = new File("C:\\Users\\user69\\Desktop\\test1.csv");
        xlsx(inputFile, outputFile);
    }
}

9
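You are missing the line break. - Swapnil
The rows in Excel must appear the same way in the CSV, i.e. if I have 5 rows in Excel I should get 5 rows in the CSV, but the code above puts all five rows on a single line. I want them added to the CSV as rows as well. - user2335416
Why don't you just use Save As... csv file? Done. - Bhavik Shah
@Swapnil, you should add that as an answer :-) - Jaydeep Patel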
2
If you don't conditionally remove the comma at the end of each row, won't this also create an extra column on every row? - fIwJlxSzApHEZIl
3 Answers

16

Here it is, thanks to @Swapnil!

data.append("\r\n"); // After the columns have been appended.
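
A minimal sketch of the fix, with POI left out so it can run on its own: each row's values are collected in a list, joined with commas, and terminated with a line break. Joining, rather than appending "," after every cell, also avoids the trailing comma raised in the comments. The names `RowJoiner` and `toCsv` are hypothetical, and the values are assumed to need no quoting.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of POI: turns rows of already-stringified
// cell values into CSV text, one line per row.
class RowJoiner {

    static String toCsv(List<List<String>> rows) {
        StringBuilder data = new StringBuilder();
        for (List<String> row : rows) {
            // String.join puts commas only between values, so no trailing comma.
            data.append(String.join(",", row));
            // The missing piece in the question: end every row with a line break.
            data.append("\r\n");
        }
        return data.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("id", "name"),
                Arrays.asList("1", "Alice"));
        System.out.print(toCsv(rows));
    }
}
```

In the question's loop, the equivalent change is to append the line break once per row, after the inner cell loop has finished.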

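The following was edited in (added) by @Abdullah.
My original answer was not very substantial, but Abdullah's edit shows a lot of effort, so I am leaving it here for anyone who comes across this question and answer.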
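import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class App {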

    public void convertExcelToCSV(Sheet sheet, String sheetName) {
        StringBuilder data = new StringBuilder();
        try {
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();

                    // Newer POI releases deprecate getCellTypeEnum() in favor of getCellType().
                    CellType type = cell.getCellTypeEnum();
                    if (type == CellType.BOOLEAN) {
                        data.append(cell.getBooleanCellValue());
                    } else if (type == CellType.NUMERIC) {
                        data.append(cell.getNumericCellValue());
                    } else if (type == CellType.STRING) {
                        String cellValue = cell.getStringCellValue();
                        if(!cellValue.isEmpty()) {
                            cellValue = cellValue.replaceAll("\"", "\"\"");
                            data.append("\"").append(cellValue).append("\"");
                        }
                    } else if (type == CellType.BLANK) {
                    } else {
                        data.append(cell + "");
                    }
                    if(cell.getColumnIndex() != row.getLastCellNum()-1) {
                        data.append(",");
                    }
                }
                data.append('\n');
            }
            Files.write(Paths.get("C:\\Users\\" + sheetName + ".csv"),
                data.toString().getBytes("UTF-8"));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String [] args)
    {
        App app = new App();
        String path =  "C:\\Users\\myFile.xlsx";
        try (InputStream inp = new FileInputStream(path)) {
            Workbook wb = WorkbookFactory.create(inp);

            for (int i = 0; i < wb.getNumberOfSheets(); i++) {
                System.out.println(wb.getSheetAt(i).getSheetName());
                app.convertExcelToCSV(wb.getSheetAt(i), wb.getSheetAt(i).getSheetName());
            }
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        } 
    }
}

This code works well. The only thing missing is a check on NUMERIC cells for whether they are actually dates. I use the following in the CellType.NUMERIC branch: if (DateUtil.isCellDateFormatted(cell)) { SimpleDateFormat simpleDateFormat = new SimpleDateFormat(dateFormat); data.append(simpleDateFormat.format(cell.getDateCellValue())); } else { data.append(cell.getNumericCellValue()); } - michal.jakubeczy
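
Why the date check matters: Excel stores a date as a serial day count, so `getNumericCellValue()` on a date-formatted cell returns something like 44197.0 rather than a date. The sketch below illustrates that mapping with plain `java.time`, assuming the workbook uses the default 1900 date system and ignoring the time-of-day fraction. In real code POI's `DateUtil` performs this conversion for you; `ExcelSerialDate` and `toLocalDate` are hypothetical names used only for illustration.

```java
import java.time.LocalDate;

// Illustration only: maps an Excel serial number (1900 date system) to a date.
class ExcelSerialDate {

    // Day zero of the 1900 system, shifted to 1899-12-30 to absorb Excel's
    // fictitious 1900-02-29; valid for serials from 61 (1900-03-01) onward.
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);

    static LocalDate toLocalDate(double serial) {
        // The integer part counts days; the fraction is the time of day.
        return EPOCH.plusDays((long) Math.floor(serial));
    }

    public static void main(String[] args) {
        System.out.println(toLocalDate(44197.0)); // 2021-01-01
        System.out.println(toLocalDate(25569.0)); // 1970-01-01
    }
}
```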

5
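Use Commons CSV to encode the cell values; it is more robust. Unfortunately, extra code is still needed to iterate over the cells and call Commons CSV on each one (XSSF does not provide this), but at least the cell values actually written are guaranteed to be standard CSV (i.e. you don't have to worry about escaping characters or adding commas yourself).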
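
To make concrete what "escaping" covers, here is a minimal hand-rolled sketch of the usual CSV quoting rules (as described in RFC 4180): a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded double quotes are doubled. This illustrates what the library spares you and is not Commons CSV's actual implementation; `CsvField` and `escape` are hypothetical names.

```java
// Illustration only: quote a single CSV field by hand.
class CsvField {

    static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        // Double any embedded quotes, then wrap the whole field in quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));        // plain
        System.out.println(escape("a,b"));          // "a,b"
        System.out.println(escape("say \"hi\""));   // "say ""hi"""
    }
}
```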
Adding Commons CSV in Maven:
<dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
            <version>1.5</version>
        </dependency>   

Once Commons CSV is available, this is the code to export a workbook to CSV. This example writes to an OutputStream, but writing to a file is just as easy.
// Convert an XSSFWorkbook to CSV and write to provided OutputStream
private void writeWorkbookAsCSVToOutputStream(XSSFWorkbook workbook, OutputStream out) {        
    CSVPrinter csvPrinter = null;       
    try {       
        csvPrinter = new CSVPrinter(new OutputStreamWriter(out), CSVFormat.DEFAULT);                

        if (workbook != null) {
            XSSFSheet sheet = workbook.getSheetAt(0); // Sheet #0 in this example
            Iterator<Row> rowIterator = sheet.rowIterator();
            while (rowIterator.hasNext()) {               
                Row row = rowIterator.next();
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    csvPrinter.print(cell.getStringCellValue());
                }                   
                csvPrinter.println(); // Newline after each row
            }               
        }

    }
    catch (Exception e) {
        log.error("Failed to write CSV file to output stream", e);
    }
    finally {
        try {
            if (csvPrinter != null) {
                csvPrinter.flush(); // Flush and close CSVPrinter
                csvPrinter.close();
            }
        }
        catch (IOException ioe) {
            log.error("Error when closing CSV Printer", ioe);
        }           
    }
}   

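Calling cell.getStringCellValue() on every cell will fail for cells of other types (e.g. numbers, dates). You need to determine the cell type first (using CellType type = cell.getCellTypeEnum()) and then fetch the value through a switch statement. - michal.jakubeczy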

0
// Converts the first sheet of an .xlsx workbook to CSV (the method name notwithstanding).
public static void convertToXlsx(File inputFile, File outputFile) {
    StringBuffer bf = new StringBuffer();
    FileOutputStream fos = null;
    String strGetValue = "";
    try {
        fos = new FileOutputStream(outputFile);
        XSSFWorkbook wb = new XSSFWorkbook(new FileInputStream(inputFile));
        XSSFSheet sheet = wb.getSheetAt(0);
        Row row;
        Cell cell;
        int intRowCounter = 0;
        Iterator<Row> rowIterator = sheet.iterator();
        while (rowIterator.hasNext()) {
            StringBuffer cellDData = new StringBuffer();
            row = rowIterator.next();
            int maxNumOfCells = sheet.getRow(0).getLastCellNum();
            int cellCounter = 0;
            while ((cellCounter) < maxNumOfCells) {
                if (sheet.getRow(row.getRowNum()) != null
                        && sheet.getRow(row.getRowNum()).getCell(cellCounter) != null) {
                    cell = sheet.getRow(row.getRowNum()).getCell(cellCounter);
                    switch (cell.getCellType()) {
                    case Cell.CELL_TYPE_BOOLEAN:
                        strGetValue = cell.getBooleanCellValue() + ",";
                        cellDData.append(removeSpace(strGetValue));
                        break;
                    case Cell.CELL_TYPE_NUMERIC:
                        if (DateUtil.isCellDateFormatted(cell)) {
                            strGetValue = new DataFormatter().formatCellValue(cell);
                        } else {
                            strGetValue = new BigDecimal(cell.getNumericCellValue()).toPlainString();
                        }
                        String tempStrGetValue = removeSpace(strGetValue);
                        if (tempStrGetValue.length() == 0) {
                            strGetValue = " ,";
                            cellDData.append(strGetValue);
                        } else {
                            strGetValue = strGetValue + ",";
                            cellDData.append(removeSpace(strGetValue));
                        }
                        break;
                    case Cell.CELL_TYPE_STRING:
                        strGetValue = cell.getStringCellValue();
                        String tempStrGetValue1 = removeSpace(strGetValue);
                        if (tempStrGetValue1.length() == 0) {
                            strGetValue = " ,";
                            cellDData.append(strGetValue);
                        } else {
                            strGetValue = strGetValue + ",";
                            cellDData.append(removeSpace(strGetValue));
                        }
                        break;
                    case Cell.CELL_TYPE_BLANK:
                        strGetValue = "" + ",";
                        cellDData.append(removeSpace(strGetValue));
                        break;
                    default:
                        strGetValue = cell + ",";
                        cellDData.append(removeSpace(strGetValue));
                    }
                } else {
                    strGetValue = " ,";
                    cellDData.append(strGetValue);
                }
                cellCounter++;
            }
            String temp = cellDData.toString();
            if (temp != null && temp.contains(",,,")) {
                temp = temp.replaceFirst(",,,", ", ,");
            }
            if (temp.endsWith(",")) {
                temp = temp.substring(0, temp.lastIndexOf(","));
                cellDData = null;
                bf.append(temp.trim());
            }
            bf.append("\n");
            intRowCounter++;
        }
        fos.write(bf.toString().getBytes());
        fos.close();
    } catch (Exception ex) {
        ex.printStackTrace();
    } finally {
        try {
            if (fos != null)
                fos.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
private static String removeSpace(String strString) {
    if (strString != null && !strString.equals("")) {
        return strString.trim();
    }
    return strString;
}

  1. The code sample handles commas and space characters in cells
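
One detail of the numeric branch above: `toPlainString()` keeps large values out of scientific notation, but `new BigDecimal(double)` reproduces the exact binary value of the double, so a cell holding 0.1 comes out with a long tail of digits. A possible alternative, sketched here as an assumption rather than as part of the original answer, is `BigDecimal.valueOf(double)`, which starts from the double's canonical string form. `PlainNumber` and `format` are hypothetical names.

```java
import java.math.BigDecimal;

// Illustration only: format a numeric cell value without scientific notation.
class PlainNumber {

    static String format(double value) {
        // valueOf goes through Double.toString, so 0.1 stays "0.1".
        return BigDecimal.valueOf(value).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(format(0.1));     // 0.1
        System.out.println(format(1.0E10));  // 10000000000
        // The constructor exposes the binary representation instead:
        System.out.println(new BigDecimal(0.1).toPlainString());
    }
}
```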

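I am getting a "Premature end of file" error. Do you know how to resolve it? Process terminated, some documents had exceptions, set to stop all documents: error in [sub]_ihm-037_FinancialHierarchy process. Process terminated, some documents had exceptions, set to stop all documents: error executing data process; caused by: Premature end of file. (in groovy2 script); caused by: Premature end of file. - NK7983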
