[Repost] A Brief Introduction to Apache POI

wenhai_zhang

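A recent project required working with Word documents, so I went through the POI API again. There are plenty of POI examples online, but most are fragments. The write-up on ppjava turned out to be fairly systematic, so I am reposting it here with notes for future reference.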

Apache POI

http://ppjava.com/?p=1946

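Apache POI is a set of Java APIs for accessing Microsoft Office format documents (Word, Excel and PowerPoint). HSSF and XSSF handle Excel files, HWPF and XWPF handle Word files, and HSLF and XSLF handle PowerPoint files.

At the time of writing the latest version is 3.9. To use POI, download poi-version-yyyymmdd.jar and poi-scratchpad-version-yyyymmdd.jar and add them to the classpath (poi-examples-version-yyyymmdd.jar only contains sample code). To work with Office 2007 documents you also need poi-ooxml-version-yyyymmdd.jar, poi-ooxml-schemas-version-yyyymmdd.jar, dom4j-version.jar and xmlbeans-version.jar.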

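Main components of POI

POIFS is the oldest and most stable part of the project. It supports both reading and writing, is a pure Java implementation of the OLE 2 Compound Document format, and every other component ultimately depends on it;
HPSF handles document properties. Microsoft applications such as Word, Excel and PowerPoint store properties such as title, category, author and creation date in so-called property set streams; HPSF is POI's pure Java implementation for reading and writing property sets;
HSSF is the pure Java interface for the Excel 97 file format (.xls) and is one of the more mature parts of POI, while XSSF handles the Excel 2007 file format (.xlsx);
HSLF is the pure Java interface for the Microsoft PowerPoint 97 file format (.ppt), and XSLF is the interface for the 2007 format (.pptx);
HWPF is the pure Java interface for the Microsoft Word 97 file format (.doc), and XWPF is the interface for the 2007 format (.docx);
HDGF is the pure Java interface for the Microsoft Visio 97 file format; it currently supports only simple text extraction;
HPBF is POI's pure Java interface for Publisher files; it currently offers only basic text extraction and does not support writing;
HSMF provides low-level reading of Outlook MSG files, such as sender, subject and message body.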
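As a quick way to keep the list above straight, here is a minimal sketch of a lookup from file extension to the component that handles it. The class PoiComponents and its method forFile are hypothetical names made up for this illustration and are not part of POI; the .vsd, .pub and .msg extensions are the usual ones for Visio, Publisher and Outlook messages and are not spelled out in the list.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative lookup (not part of POI): file extension -> POI component name. */
final class PoiComponents {
    private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<String, String>();
    static {
        BY_EXTENSION.put("xls", "HSSF");
        BY_EXTENSION.put("xlsx", "XSSF");
        BY_EXTENSION.put("ppt", "HSLF");
        BY_EXTENSION.put("pptx", "XSLF");
        BY_EXTENSION.put("doc", "HWPF");
        BY_EXTENSION.put("docx", "XWPF");
        BY_EXTENSION.put("vsd", "HDGF");
        BY_EXTENSION.put("pub", "HPBF");
        BY_EXTENSION.put("msg", "HSMF");
    }

    /** Returns the component name for a file name, or null if the extension is not listed. */
    static String forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension at all
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(extension);
    }

    public static void main(String[] args) {
        System.out.println(forFile("report.xlsx")); // XSSF
        System.out.println(forFile("notes.DOC"));   // HWPF
    }
}
```

The lookup is case-insensitive on the extension, which matches the toLowerCase() checks used by the reading code in this post.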

Below is a brief introduction to the interfaces most often used in projects for working with Excel and Word files:

Working with Excel files

HSSF is currently the more mature part of POI and handles MS Excel (97-2003) objects. Unlike a CSV file, which has no formatting and is merely something Excel can convert, HSSF produces a real Excel object in which you can control the properties of cells, sheets and so on. Counting pages (the number of sheets), for example, is easy with HSSF. HSSF also has drawbacks: it does not directly support Excel charts, and the dependencies between its packages are fairly complex.

The objects HSSF exposes are in the org.apache.poi.hssf.usermodel package. They cover Excel 97 objects, styles and formats, plus helper operations. The main ones are:

org.apache.poi.hssf.usermodel.HSSFWorkbook: the Excel document (workbook)
org.apache.poi.hssf.usermodel.HSSFSheet: an Excel sheet
org.apache.poi.hssf.usermodel.HSSFRow: an Excel row
org.apache.poi.hssf.usermodel.HSSFCell: an Excel cell
org.apache.poi.hssf.usermodel.HSSFFont: an Excel font
org.apache.poi.hssf.usermodel.HSSFName: an Excel named range
org.apache.poi.hssf.usermodel.HSSFDataFormat: cell data formats (dates, for example)
org.apache.poi.hssf.usermodel.HSSFHeader: the sheet header
org.apache.poi.hssf.usermodel.HSSFFooter: the sheet footer
org.apache.poi.hssf.usermodel.HSSFCellStyle: a cell style

XSSF is used for Excel 2007 files; its objects are in the org.apache.poi.xssf.usermodel package.
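The examples in this post pick HSSF or XSSF by file extension. On disk the two formats are easy to tell apart: the 97-2003 files are OLE 2 compound documents and start with the bytes D0 CF 11 E0 A1 B1 1A E1, while the 2007 files are ZIP archives and start with "PK" 03 04. The following is a minimal sketch of checking the header with the JDK alone; OfficeFormatSniffer and its members are hypothetical names for illustration, not POI classes, and a ZIP header only says the file is a ZIP container, not that it is a valid Office document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative header check (not part of POI). */
final class OfficeFormatSniffer {
    enum Kind { OLE2, ZIP_CONTAINER, UNKNOWN }

    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies the first 'length' bytes of a file header. */
    static Kind sniff(byte[] header, int length) {
        if (startsWith(header, length, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(header, length, ZIP_MAGIC)) {
            return Kind.ZIP_CONTAINER;
        }
        return Kind.UNKNOWN;
    }

    /** Reads up to 8 bytes from the stream and classifies them. Consumes those bytes. */
    static Kind sniff(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                break; // end of stream before 8 bytes
            }
            read += n;
        }
        return sniff(header, read);
    }

    private static boolean startsWith(byte[] data, int length, int[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        System.out.println(sniff(new ByteArrayInputStream(zipHeader))); // ZIP_CONTAINER
    }
}
```

Because the stream version consumes the bytes it reads, the caller would need to reopen the file (or use mark/reset on a buffered stream) before handing it to a workbook constructor.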

Reading an Excel sheet:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public static void readExcel(String excelFilePath) {
  FileInputStream inputStream = null;
  try {
    Workbook excel = null;
    inputStream = new FileInputStream(excelFilePath);
    if (excelFilePath.toLowerCase().endsWith(".xls"))
      excel = new HSSFWorkbook(inputStream); // read an Excel 97 document
    else
      excel = new XSSFWorkbook(inputStream); // read an Excel 2007 document
    for (int i = 0; i < excel.getNumberOfSheets(); i++) {
      Sheet sheet = excel.getSheetAt(i); // get the i-th sheet
      System.out.println("Sheet " + (i + 1) + " is named " + sheet.getSheetName()
         + " and has " + sheet.getPhysicalNumberOfRows() + " rows");
      // getPhysicalNumberOfRows() skips empty rows, so iterate up to getLastRowNum()
      for (int r = 0; r <= sheet.getLastRowNum(); r++) {
        Row row = sheet.getRow(r); // get row r (null if the row is empty)
        if (row != null) {
          System.out.println("\tRow " + (r + 1) + " has "
             + row.getPhysicalNumberOfCells() + " cells");
          // getLastCellNum() is the index of the last cell plus one
          for (int c = 0; c < row.getLastCellNum(); c++) {
            Cell cell = row.getCell(c); // get cell c (null if undefined)
            if (cell != null) {
              System.out.print("\t\tCell " + (c + 1) + " has type ");
              // POI 3.9 API: cell types are int constants on Cell
              switch (cell.getCellType()) {
              case Cell.CELL_TYPE_NUMERIC:
                System.out.println("numeric, value " + cell.getNumericCellValue());
                break;
              case Cell.CELL_TYPE_STRING:
                System.out.println("text, value " + cell.getStringCellValue());
                break;
              case Cell.CELL_TYPE_BOOLEAN:
                System.out.println("boolean, value " + cell.getBooleanCellValue());
                break;
              case Cell.CELL_TYPE_FORMULA:
                System.out.println("formula, value " + cell.getCellFormula());
                break;
              case Cell.CELL_TYPE_BLANK:
                System.out.println("blank");
                break;
              case Cell.CELL_TYPE_ERROR:
                System.out.println("error");
                break;
              default:
                System.out.println("unknown type");
                break;
              }
            }
          }
        }
      }
    }
  } catch (FileNotFoundException e) {
    e.printStackTrace();
  } catch (IOException e) {
    e.printStackTrace();
  } finally {
    // close(...) is a helper not shown in the original post; it should close the stream if it is non-null
    close(inputStream);
  }
}
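The methods in this post call a close(...) helper that the original article never shows. A minimal sketch of what such a helper might look like follows; the class name IoUtil is an assumption made for this example, and reporting the failure with printStackTrace simply mirrors the error handling used in the rest of the post.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;

/** Illustrative helper (assumed, not from the original post). */
final class IoUtil {
    /** Closes the resource if it is non-null; a failure on close is reported, not rethrown. */
    static void close(Closeable resource) {
        if (resource == null) {
            return; // nothing was opened, for example because the constructor threw
        }
        try {
            resource.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        close(new ByteArrayInputStream(new byte[0])); // closes normally
        close(null);                                  // safe no-op
    }
}
```

FileInputStream and FileOutputStream both implement Closeable, so one helper covers the read and write examples. On Java 7 or later, a try-with-resources statement would make the helper unnecessary.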

Creating an Excel sheet:

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.CreationHelper;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public static void createExcel(String excelFilePath) {
  FileOutputStream outputStream = null;
  try {
    outputStream = new FileOutputStream(excelFilePath);
    // Workbook excel = new HSSFWorkbook(); // create an Excel 97 document object
    Workbook excel = new XSSFWorkbook(); // create an Excel 2007 document object
    CreationHelper helper = excel.getCreationHelper();
    Sheet sheet = excel.createSheet();
    excel.setSheetName(0, "Generated by POI");
    // Style 1: bold red text (0xA is red in the default indexed palette)
    CellStyle cellStyle1 = excel.createCellStyle();
    Font font1 = excel.createFont();
    font1.setFontHeightInPoints((short) 12);
    font1.setColor((short) 0xA);
    font1.setBoldweight(Font.BOLDWEIGHT_BOLD);
    cellStyle1.setFont(font1);
    // Style 2: solid red fill, thin bottom border, bold text in palette color 0xF
    CellStyle cellStyle2 = excel.createCellStyle();
    cellStyle2.setBorderBottom(CellStyle.BORDER_THIN);
    cellStyle2.setFillPattern(CellStyle.SOLID_FOREGROUND);
    cellStyle2.setFillForegroundColor((short) 0xA);
    Font font2 = excel.createFont();
    font2.setFontHeightInPoints((short) 10);
    font2.setColor((short) 0xf);
    font2.setBoldweight(Font.BOLDWEIGHT_BOLD);
    cellStyle2.setFont(font2);
    int rowIndex;
    for (rowIndex = 0; rowIndex < 6; rowIndex++) {
      Row row = sheet.createRow(rowIndex);
      // set the row height (in units of 1/20 of a point) on every other row
      if (rowIndex % 2 == 0)
        row.setHeight((short) 0x249);
      for (int cellIndex = 0; cellIndex < 6; cellIndex += 2) {
        Cell cell = row.createCell(cellIndex);
        cell.setCellValue((rowIndex + 1) * 1000 + cellIndex + 1);
        if (rowIndex % 2 == 0)
          cell.setCellStyle(cellStyle1);
        cell = row.createCell(cellIndex + 1);
        cell.setCellValue(helper.createRichTextString("test content"));
        // set the column width (in units of 1/256 of a character width)
        sheet.setColumnWidth(cellIndex + 1, (int)(20 * 8 / 0.05));
        if ((rowIndex % 2) == 0)
          cell.setCellStyle(cellStyle2);
      }
    }
    // draw a thick line along the bottom
    Row row = sheet.createRow(rowIndex);
    CellStyle cellStyle3 = excel.createCellStyle();
    cellStyle3.setBorderBottom(CellStyle.BORDER_THICK);
    for (int cellIndex = 0; cellIndex < 6; cellIndex++) {
      Cell cell = row.createCell(cellIndex);
      cell.setCellStyle(cellStyle3);
    }
    // merge cells: rows 2-3, columns 0-1 (zero-based)
    sheet.addMergedRegion(new CellRangeAddress(2, 3, 0, 1));
    // create a second sheet and delete it again
    sheet = excel.createSheet();
    excel.setSheetName(1, "DeletedSheet");
    excel.removeSheetAt(1);
    excel.write(outputStream);
  } catch (FileNotFoundException e) {
    e.printStackTrace();
  } catch (IOException e) {
    e.printStackTrace();
  } finally {
    // close(...) is a helper not shown in the original post; it should close the stream if it is non-null
    close(outputStream);
  }
}
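The magic numbers passed to setHeight and setColumnWidth in createExcel are easier to read once the units are known: POI measures row height in twentieths of a point and column width in 1/256ths of a character width. Here is a small sketch of the conversions; SheetUnits and its method names are hypothetical helpers for illustration, not POI API.

```java
/** Illustrative unit conversions (not part of POI). */
final class SheetUnits {
    /** Row height: points -> units of 1/20 of a point, as taken by Row.setHeight. */
    static short pointsToRowHeight(double points) {
        return (short) Math.round(points * 20);
    }

    /** Row height: units of 1/20 of a point -> points. */
    static double rowHeightToPoints(int height) {
        return height / 20.0;
    }

    /** Column width: characters -> units of 1/256 of a character, as taken by Sheet.setColumnWidth. */
    static int charsToColumnWidth(double chars) {
        return (int) Math.round(chars * 256);
    }

    /** Column width: units of 1/256 of a character -> characters. */
    static double columnWidthToChars(int width) {
        return width / 256.0;
    }

    public static void main(String[] args) {
        System.out.println(rowHeightToPoints(0x249)); // 29.25
        System.out.println(columnWidthToChars(3200)); // 12.5
    }
}
```

So the row height 0x249 (585) used above is 29.25 points, and a column width of 3200 is 12.5 characters.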

 

[Figure: the Excel document generated by createExcel; image not preserved]

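Working with Word files

HWPF mainly handles MS Word (97-2003) objects and is one of the less mature parts of POI, but it can perform basic reading and writing of Word documents. The objects HWPF exposes are in the org.apache.poi.hwpf.extractor and org.apache.poi.hwpf.usermodel packages and cover Word objects, tables and so on. The main ones are:

org.apache.poi.hwpf.extractor.WordExtractor: extracts the text from a Word document.
org.apache.poi.hwpf.usermodel.Paragraph: a paragraph in a Word document.
org.apache.poi.hwpf.usermodel.Table: a table in a Word document.
org.apache.poi.hwpf.usermodel.TableCell: a cell of a Word table.
org.apache.poi.hwpf.usermodel.Range: the central class of the HWPF object model. All properties that apply to a range of characters in a Word document extend this class. It can be used to insert text or to set properties on a range.

Word 2007 documents are handled by XWPF, whose objects are in the org.apache.poi.xwpf.extractor and org.apache.poi.xwpf.usermodel packages.

Getting the text content of a Word document: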

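// Imports as of POI 3.9
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public static String readWordText(String wordFilePath) {
  FileInputStream inputStream = null;
  String wordText = null;
  try {
    // read a Word 97 document
    if (wordFilePath.toLowerCase().endsWith(".doc")) {
      inputStream = new FileInputStream(wordFilePath);
      WordExtractor wordExtractor = new WordExtractor(inputStream);
      // wordExtractor.getSummaryInformation().getPageCount(); // page count from the summary metadata
      wordText = wordExtractor.getText();
    } else { // read a Word 2007 document
      OPCPackage oPCPackage = POIXMLDocument.openPackage(wordFilePath);
      XWPFDocument wordDocument = new XWPFDocument(oPCPackage);
      POIXMLTextExtractor textExtractor = new XWPFWordExtractor(wordDocument);
      wordText = textExtractor.getText();
    }
    System.out.println(wordText);
  } catch (FileNotFoundException e) {
    e.printStackTrace();
  } catch (IOException e) {
    e.printStackTrace();
  } finally {
    // close(...) is a helper not shown in the original post; inputStream stays null in the 2007 branch
    close(inputStream);
  }
  return wordText;
}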
 

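[Note] When POI reads a Word file it does not read the images in it. If a 2007-format Word file contains tables, all of the table data appears at the end of the extracted string. For HWPF, the SummaryInformation returned by WordExtractor offers a getPageCount method for the page count, but testing shows that it does not report the correct number of pages, because the value comes from the document's summary metadata rather than from the actual page layout.

Reading the tables in a Word document: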

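// Imports as of POI 3.9
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.*; // Paragraph, Range, Table, TableCell, TableIterator, TableRow
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xwpf.usermodel.*; // XWPFDocument, XWPFParagraph, XWPFTable, XWPFTableCell, XWPFTableRow

public static void readWordTable(String wordFilePath) {
  FileInputStream inputStream = null;
  try {
    if (wordFilePath.toLowerCase().endsWith(".doc")) { // read a Word 97 document
      inputStream = new FileInputStream(wordFilePath);
      HWPFDocument wordDocument = new HWPFDocument(
        new POIFSFileSystem(inputStream));
      Range range = wordDocument.getRange(); // the range covering the whole document
      TableIterator iterator = new TableIterator(range);
      while (iterator.hasNext()) {
        Table table = iterator.next(); // get the next table
        for (int r = 0; r < table.numRows(); r++) {
          TableRow row = table.getRow(r); // get row r
          for (int c = 0; c < row.numCells(); c++) {
            TableCell cell = row.getCell(c); // get cell c
            for (int p = 0; p < cell.numParagraphs(); p++) {
              Paragraph para = cell.getParagraph(p); // get paragraph p
              System.out.println(para.text());
            }
          }
        }
      }
    } else { // read a Word 2007 document
      OPCPackage oPCPackage = POIXMLDocument
        .openPackage(wordFilePath);
      XWPFDocument wordDocument = new XWPFDocument(oPCPackage);
      List<XWPFTable> list = wordDocument.getTables();
      for (XWPFTable table : list)
        for (XWPFTableRow row : table.getRows())
          for (XWPFTableCell cell : row.getTableCells())
            for (XWPFParagraph paragraph : cell.getParagraphs())
              System.out.println(paragraph.getText());
    }
  } catch (FileNotFoundException e) {
    e.printStackTrace();
  } catch (IOException e) {
    e.printStackTrace();
  } finally {
    // close(...) is a helper not shown in the original post; inputStream stays null in the 2007 branch
    close(inputStream);
  }
}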


2014-11-04 23:56