Hi Friends,
Did you ever think about how to view PDF files on a website? It's not a difficult one: just convert the PDF to images and display them. I wondered how Google set up Google Books, and here you will have the answer. Click on this link. If the link is not working, no issue. What's in it is written in the next lines.
The link which you just clicked is http://books.google.com/books?id=fcW1xl1BejUC&pg=PA308&img=1&zoom=3&hl=en&sig=ACfU3U0viP5hPVQFiFCyoDc4_zeXPmco-Q&w=685 which is nothing but a dynamically generated URL to get the image from Google's servers. So even Google is using images. The answer to the standard question "Why?" is simple: browsers. They can't parse and render PDF (AFAIK), but they can display images.
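To see what such a dynamic URL carries, here is a small sketch in plain Java that splits the query string into its parameters. The class name BookImageUrl and the method queryParams are made up for this post, and reading pg as the page, zoom as the zoom level and w as the width is only my guess from the names.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class BookImageUrl {
    // Splits the query part of a URL into name/value pairs, keeping their order.
    // Values are left as-is (no percent-decoding) to keep the sketch short.
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "http://books.google.com/books?id=fcW1xl1BejUC&pg=PA308&img=1"
                + "&zoom=3&hl=en&sig=ACfU3U0viP5hPVQFiFCyoDc4_zeXPmco-Q&w=685";
        // Prints one "name = value" line per parameter
        queryParams(url).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Change pg or w in the URL and the server presumably draws a different page or size, which is the whole trick behind an image-based viewer.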
We followed the same approach when we started thinking about a PDF viewer. There are some very powerful libraries available which can convert a PDF to various image formats reliably. Unfortunately most of them are not free. To help open source in this area, Apache provides PDFBox and SwingLabs provides PDF Renderer. Both are small and lightweight libraries: PDFBox lets us create, edit and convert PDFs, while PDF Renderer focuses on rendering pages.
The code below shows how we can convert a PDF to images. But first the basics: a PDF is actually a document made of pages. And when I say convert to image, it's not like taking a screenshot; the contents of the page are drawn onto a 2D image (at least that's what PDFBox does). To know more about the API, just download the libraries along with their documentation.
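To make "drawn onto a 2D image" concrete, here is a minimal sketch that uses only the JDK: it creates a blank BufferedImage, paints on it through Graphics2D and writes a PNG. This is not PDFBox code. A PDF library does the same thing in spirit, replaying the page's text and shapes instead of my single bar. The names DrawToImage, render and drawn.png are made up for illustration.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class DrawToImage {
    // Paints a white "page" with one black bar standing in for a line of text.
    static BufferedImage render(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);   // page background
            g.setColor(Color.BLACK);
            g.fillRect(10, 10, width - 20, 5); // the "content"
        } finally {
            g.dispose(); // release the graphics context
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        ImageIO.write(render(200, 100), "png", new File("drawn.png"));
    }
}
```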
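Convert using PDFBox (the code below uses the PDFBox 1.x API):
// Needs java.awt.image.BufferedImage, java.io.File, java.io.IOException, java.util.List,
// java.util.logging.*, javax.imageio.ImageIO, org.apache.pdfbox.pdmodel.PDDocument and PDPage
public GetSpecificPage(int pageNum, String pathFile, String pathPage) {
    // Note: pathPage is not used here; the output path is hard-coded below
    PDDocument document = null;
    try {
        document = PDDocument.load(pathFile);
        // getAllPages() returns a raw List in PDFBox 1.x
        @SuppressWarnings("unchecked")
        List<PDPage> list = document.getDocumentCatalog().getAllPages();
        PDPage page = list.get(pageNum); // zero-based index
        int dpi = 75; // Dots per inch
        BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, dpi);
        File imageFile = new File("D:/tests/pdfbox/" + pageNum + "-dpi-" + dpi + ".png");
        ImageIO.write(image, "png", imageFile);
    } catch (Exception ex) {
        Logger.getLogger(GetSpecificPage.class.getName()).log(Level.SEVERE, null, ex);
    } finally {
        if (document != null) {
            try { document.close(); } catch (IOException ignored) { }
        }
    }
}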
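The dpi value decides how big the image comes out. PDF page sizes are measured in points, 72 to an inch, so the pixel size is roughly points x dpi / 72. Here is a small sketch of that arithmetic. The class and method names are mine, and the exact rounding PDFBox applies internally may differ by a pixel.

```java
class DpiMath {
    static final double POINTS_PER_INCH = 72.0;

    // Converts a length in PDF points to pixels at the given resolution.
    static int pixelsFor(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // A US Letter page is 612 x 792 points
        System.out.println(pixelsFor(612, 75) + " x " + pixelsFor(792, 75));   // 638 x 825
        System.out.println(pixelsFor(612, 150) + " x " + pixelsFor(792, 150)); // 1275 x 1650
    }
}
```

So doubling the dpi doubles each side and makes the image about four times as big to store and serve, which is worth keeping in mind when picking a value.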
Well, even with PDFBox this is not the only way to extract something. The Splitter class breaks a PDF into separate one-page PDF documents:
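// PDFBox 1.x API. Needs java.io.*, java.util.List, org.apache.pdfbox.cos.COSDocument,
// org.apache.pdfbox.pdfparser.PDFParser, org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.util.Splitter
File f = new File("D:/tests/pdfbox/1.pdf");
FileInputStream fis = new FileInputStream(f);
PDDocument pdDoc = null;
try {
    PDFParser parser = new PDFParser(fis);
    parser.parse();
    COSDocument cosDoc = parser.getDocument();
    pdDoc = new PDDocument(cosDoc);
    Splitter splitter = new Splitter(); // by default, one page per output document
    List<PDDocument> pages = splitter.split(pdDoc);
    for (int i = 0; i < pages.size(); i++) {
        PDDocument pageDoc = pages.get(i);
        String fileNameNew = "page_" + i + ".pdf";
        pageDoc.save("D:/tests/pdfs/" + fileNameNew);
        pageDoc.close();
    }
} finally {
    // Closing the PDDocument also closes the underlying COSDocument
    if (pdDoc != null) {
        pdDoc.close();
    }
    fis.close();
}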
And now with PDF Renderer. It's a bit different.
// Needs java.awt.Rectangle, java.awt.image.BufferedImage, java.io.*, java.nio.ByteBuffer,
// java.nio.channels.FileChannel, javax.imageio.ImageIO, com.sun.pdfview.PDFFile, com.sun.pdfview.PDFPage
File file = new File("D:/tests/pdfbox/1.pdf");
RandomAccessFile raf = new RandomAccessFile(file, "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
PDFFile pdfFile = new PDFFile(buf);
int num = pdfFile.getNumPages();
// PDF Renderer numbers pages from 1, so loop over the real page count
// instead of a fixed 100
for (int i = 1; i <= num; i++) {
    PDFPage page = pdfFile.getPage(i);
    // get the width and height for the doc at the default zoom
    int width = (int) page.getBBox().getWidth();
    int height = (int) page.getBBox().getHeight();
    Rectangle rect = new Rectangle(0, 0, width + (width % 60), height + (height % 60));
    int rotation = page.getRotation();
    Rectangle rect1 = rect;
    // for pages rotated by a quarter turn, swap width and height of the clip
    if (rotation == 90 || rotation == 270) {
        rect1 = new Rectangle(0, 0, rect.height, rect.width);
    }
    // generate the image
    BufferedImage img = (BufferedImage) page.getImage(
            rect.width, rect.height, // width & height
            rect1, // clip rect
            null, // null for the ImageObserver
            true, // fill background with white
            true // block until drawing is done
    );
    ImageIO.write(img, "png", new File("D:/tests/pdf1/" + i + ".png"));
}
raf.close();
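The rotation check in the loop is easy to get wrong, so here is the swap on its own as a tiny helper. It is only my illustration of the idea (a page turned by 90 or 270 degrees shows with width and height exchanged); PageSize and displaySize are hypothetical names and not part of PDF Renderer.

```java
import java.awt.Dimension;

class PageSize {
    // Returns the on-screen size of a page after applying its rotation.
    // Rotation is in degrees; negative values and values above 360 are normalised.
    static Dimension displaySize(int width, int height, int rotation) {
        int r = ((rotation % 360) + 360) % 360;
        if (r == 90 || r == 270) {
            return new Dimension(height, width); // quarter turn: sides swap
        }
        return new Dimension(width, height);
    }

    public static void main(String[] args) {
        Dimension d = displaySize(612, 792, 90);
        System.out.println(d.width + " x " + d.height); // 792 x 612
    }
}
```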
That's it for this post. We explored just one small but powerful feature of both libraries. To learn more, just download them and try them out. It's simple.
The next step is to view the images in a browser. That's an easy task if you don't need to worry about security and performance; I believe a lot of free JavaScript libraries are available which can display them. But if we need security and performance, we have to take care of how the images are served to the browser, plus caching and other mechanisms which can help with performance. Anyway, how to view them on the web depends completely on the user's choice. :D
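Just to give a feel for the security and caching part, here is a rough sketch using the small HTTP server that ships with the JDK (com.sun.net.httpserver). Everything specific in it is my assumption: the /pages/<number>.png URL scheme, the one-day cache lifetime, the default port 8080 and the class name PageImageServer. A real viewer would also need login checks and more.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageImageServer {
    // Only URLs like /pages/12.png are accepted. Anything else is rejected,
    // which also blocks "../" tricks that try to read other files.
    private static final Pattern PAGE_URL = Pattern.compile("^/pages/(\\d{1,6})\\.png$");

    // Returns the image file for a request path, or null if the path is not allowed.
    static Path resolve(Path imageDir, String requestPath) {
        Matcher m = PAGE_URL.matcher(requestPath);
        return m.matches() ? imageDir.resolve(m.group(1) + ".png") : null;
    }

    static void handle(HttpExchange exchange, Path imageDir) throws IOException {
        Path file = resolve(imageDir, exchange.getRequestURI().getPath());
        if (file == null || !Files.isRegularFile(file)) {
            exchange.sendResponseHeaders(404, -1); // -1 means no response body
            exchange.close();
            return;
        }
        byte[] body = Files.readAllBytes(file);
        exchange.getResponseHeaders().set("Content-Type", "image/png");
        // Let the browser keep the image for a day instead of asking again
        exchange.getResponseHeaders().set("Cache-Control", "private, max-age=86400");
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java PageImageServer <imageDir> [port]");
            return;
        }
        Path imageDir = Paths.get(args[0]);
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/pages/", exchange -> handle(exchange, imageDir));
        server.start();
        System.out.println("Serving " + imageDir + " on port " + port);
    }
}
```

Pointed at the folder where the PNGs were written, a request for /pages/3.png returns the third image, and the browser should not fetch it again until the cache time runs out.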
Cheers!!!
Ravi Kumar Gupta
Thank you for this nice article...
Denny