Mobile Study: PDFBox

Friday, November 23, 2012

PDFBox

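PDFBox is an open-source library for working with PDF files. Its main features are:

Extracting text from PDF files
Merging PDF files
Encrypting and decrypting PDF files
Integration with the Lucene search engine
Embedding FDF form data
Converting images to PDF and extracting images from PDF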

License: Apache License, Version 2.0

Resources

Official site

Samples

Reading a PDF file

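 import java.io.FileInputStream;
 import java.io.IOException;
 import org.apache.pdfbox.pdfparser.PDFParser;
 import org.apache.pdfbox.pdmodel.PDDocument;

 String readFile = "xxx.pdf";
 PDDocument pdf = null; // document object (kept open for the samples below; call pdf.close() when done)
 FileInputStream pdfStream = null;
 try {
     pdfStream = new FileInputStream(readFile);
     PDFParser pdfParser = new PDFParser(pdfStream);
     pdfParser.parse(); // parse the file
     pdf = pdfParser.getPDDocument();
 } catch (Exception e) {
     e.printStackTrace();
 } finally {
     if (pdfStream != null) {
         try {
             pdfStream.close(); // close() itself can throw IOException
         } catch (IOException e) {
             e.printStackTrace();
         }
     }
 }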
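Before handing a file to the parser, it can help to reject files that are obviously not PDFs. A PDF file normally begins with the header marker `%PDF-` followed by a version number, such as `%PDF-1.4`. The following is a minimal plain-Java sketch. It is not part of PDFBox, and the class and method names are hypothetical. The check is strict and assumes the marker sits at byte 0. Some readers tolerate a few leading bytes, so this may reject files that PDFBox would still accept.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfHeaderCheck {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the stream starts with the "%PDF-" header marker.
    // Strict: assumes the marker is at byte 0 (no leading bytes allowed).
    static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] actual = new byte[MARKER.length];
        int read = 0;
        while (read < actual.length) {
            int n = in.read(actual, read, actual.length - read);
            if (n < 0) {
                return false; // data is shorter than the marker
            }
            read += n;
        }
        return Arrays.equals(MARKER, actual);
    }

    // Convenience overload for in-memory data.
    static boolean looksLikePdf(byte[] data) {
        try {
            return looksLikePdf(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
    }

    public static void main(String[] args) {
        byte[] pdfLike = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] notPdf = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdfLike)); // true
        System.out.println(looksLikePdf(notPdf));  // false
    }
}
```

The check consumes the first bytes of the stream. After checking a real file, open a fresh `FileInputStream` for `PDFParser` rather than reusing the same one.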

Writing a PDF file

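 import java.io.FileOutputStream;
 import java.io.IOException;
 import org.apache.pdfbox.pdfwriter.COSWriter;

 String writeFile = "xxx.pdf";
 COSWriter writer = null;
 FileOutputStream stream = null;
 try {
     stream = new FileOutputStream(writeFile);
     writer = new COSWriter(stream);
     writer.write(pdf); // write out the document object
 } catch (Exception e) {
     e.printStackTrace();
 } finally {
     try {
         // Close the writer first: it wraps the stream, flushes its output,
         // and closes the underlying stream as well.
         if (writer != null) {
             writer.close();
         } else if (stream != null) {
             stream.close(); // the writer was never created
         }
     } catch (IOException e) {
         e.printStackTrace();
     }
 }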
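In the sample above, the `COSWriter` wraps the `FileOutputStream`, so the writer should be closed before the stream it wraps. Since Java 7, try-with-resources closes resources in the reverse order of their declaration, which enforces this automatically. The sketch below shows that order using a hypothetical `TrackingResource` stand-in, which is not a PDFBox class. Whether `COSWriter` itself can be declared in a try-with-resources header depends on whether it implements `Closeable` in your PDFBox version. Treat that as an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    // Records the order in which resources are closed.
    static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for a stream or writer (not a PDFBox class).
    static class TrackingResource implements AutoCloseable {
        private final String name;

        TrackingResource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed.add(name);
        }
    }

    static List<String> run() {
        closed.clear();
        // The stream is declared first and the writer (which wraps it) second ...
        try (TrackingResource stream = new TrackingResource("stream");
             TrackingResource writer = new TrackingResource("writer")) {
            // ... write the document here ...
        }
        // ... so they are closed in reverse order: writer first, then stream.
        return new ArrayList<>(closed);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [writer, stream]
    }
}
```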

Filling in a form field

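 import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
 import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
 import org.apache.pdfbox.pdmodel.interactive.form.PDField;

 String name = "title"; // field name
 String value = "This is the title"; // string to put into the field
 PDDocumentCatalog docCatalog = pdf.getDocumentCatalog();
 PDAcroForm acroForm = docCatalog.getAcroForm(); // null if the PDF has no form
 PDField field = (acroForm != null) ? acroForm.getField(name) : null; // look up the field
 if (field != null) {
     field.setValue(value); // fill in the field
 } else {
     System.err.println("Field not found: " + name);
 }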

Extracting images from a PDF

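 import java.io.FileInputStream;
 import java.io.IOException;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import org.apache.pdfbox.pdfparser.PDFParser;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.pdmodel.PDPage;
 import org.apache.pdfbox.pdmodel.PDResources;
 import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

 String readFile = "C:\\tmp\\Antenna_Data_Sheet.pdf";
 PDDocument pdf = null; // document object
 FileInputStream pdfStream = null;
 try {
     pdfStream = new FileInputStream(readFile);
     PDFParser pdfParser = new PDFParser(pdfStream);
     pdfParser.parse(); // parse the file
     pdf = pdfParser.getPDDocument();
     int imageCounter = 1; // keeps running across pages so file names stay unique
     List pages = pdf.getDocumentCatalog().getAllPages();
     Iterator iter = pages.iterator();
     while (iter.hasNext()) { // extract images from every page
         PDPage page = (PDPage) iter.next();
         PDResources resources = page.getResources();
         if (resources == null) {
             continue; // this page has no resource dictionary
         }
         Map images = resources.getImages();
         if (images != null) {
             Iterator imageIter = images.keySet().iterator();
             while (imageIter.hasNext()) {
                 String key = (String) imageIter.next();
                 PDXObjectImage image = (PDXObjectImage) images.get(key);
                 String name = key + "-" + imageCounter;
                 imageCounter++;
                 System.out.println("Writing image: " + name);
                 image.write2file(name); // write the image to a file
             }
         }
     }
 } catch (Exception e) {
     e.printStackTrace();
 } finally {
     try {
         if (pdf != null) {
             pdf.close(); // release the document's resources
         }
         if (pdfStream != null) {
             pdfStream.close();
         }
     } catch (IOException e) {
         e.printStackTrace();
     }
 }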
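The output file names combine each image's resource key with a counter that is not reset between pages. This matters because different pages can reuse the same resource key. Without the shared counter, a later page would overwrite an earlier page's file. The naming logic can be isolated as a small pure function. This is a plain-Java sketch, and `ImageNaming`, `fileNames`, and the sample keys `Im1`/`Im2` are hypothetical inputs rather than anything PDFBox returns for a specific file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImageNaming {
    // Mirrors the naming in the extraction loop: key + "-" + a counter that keeps
    // running across pages. pagesOfKeys (hypothetical input) holds the image
    // resource keys found on each page, in page order.
    static List<String> fileNames(List<List<String>> pagesOfKeys) {
        List<String> names = new ArrayList<>();
        int imageCounter = 1;
        for (List<String> keys : pagesOfKeys) {
            for (String key : keys) {
                names.add(key + "-" + imageCounter);
                imageCounter++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Page 1 has images Im1 and Im2; page 2 reuses the key Im1.
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("Im1", "Im2"),
                Arrays.asList("Im1"));
        System.out.println(fileNames(pages)); // [Im1-1, Im2-2, Im1-3]
    }
}
```

Even though `Im1` appears on both pages, the two files get different names (`Im1-1` and `Im1-3`).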
