21

How do I read one particular page (given a page number) from a PDF document using PDFBox?

2
  • Can you be more specific about what you mean by "read"? Jul 27, 2011 at 5:30
  • 1
    @Adrian: Say, I want page #2 as a PDPage object. Jul 27, 2011 at 5:34

6 Answers

32

This should work:

PDPage firstPage = (PDPage)doc.getAllPages().get( 0 );

as seen in the BookMark section of the docs

Update 2015, Version 2.0.0 SNAPSHOT

Seems this was removed and put back (?). getPage is in the 2.0.0 javadoc. To use it:

PDDocument document = PDDocument.load(new File(filename));
PDPage doc = document.getPage(0);

The getAllPages method has been renamed getPages

PDPage page = (PDPage)doc.getPages().get( 0 );
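
Note that the index passed to getPage, and to get on the page list, is zero-based, while a page number such as "page #2" is normally counted from one. A minimal sketch of that conversion follows, in plain Java with no PDFBox dependency. The names PageIndex and toZeroBased are hypothetical and not part of PDFBox, and pageCount stands in for whatever page count your document reports.

```java
// Hypothetical helper, not part of PDFBox: maps a human page number
// (1-based) to the zero-based index used by calls such as getPage(0).
final class PageIndex {
    private PageIndex() {}

    static int toZeroBased(int pageNumber, int pageCount) {
        // Reject page numbers that do not exist in the document.
        if (pageNumber < 1 || pageNumber > pageCount) {
            throw new IllegalArgumentException(
                "page " + pageNumber + " is outside 1.." + pageCount);
        }
        return pageNumber - 1;
    }
}
```

With this, page #2 of a five-page document would be requested with the index PageIndex.toZeroBased(2, 5), which is 1.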
5
  • 3
    What is the type of doc here? The PDDocument class doesn't seem to have a getAllPages method. Jul 27, 2011 at 5:36
  • 4
    @missingfaktor doc is a PDDocumentCatalog object
    – Jacob
    Jul 27, 2011 at 5:45
  • For those coming along here later: pdfbox.apache.org/cookbook/textextraction.html Basically, use PDFTextStripper, not PDPage, as PDPage seems to be more about displaying a page on screen than getting text blackprincedistillery.com/questions/13563482/… Oct 9, 2014 at 16:06
  • 1
    In pdfbox 2.0 I simply used: pdDoc.getPage(pageNumber); where pdDoc is of type PDDocument.
    – jcomouth
    September 17, 2015 at 8:51
  • 2
    For PDFBox 1.8.10 there seems to be no method getAllPages() for the PDDocument type. The link does not work any more, unfortunately. Oct 9, 2015 at 22:10
20
//Using PDFBox library available from http://pdfbox.apache.org/
//Writes individual pages of a pdf document as a new pdf file

//Reads in pdf document
PDDocument pdDoc = PDDocument.load(file);

//Creates a new pdf document
PDDocument document = null;

//Adds specific page "i" where "i" is the page number and then saves the new pdf document
try {
    document = new PDDocument();
    document.addPage((PDPage) pdDoc.getDocumentCatalog().getAllPages().get(i));
    document.save("file path"+"new document title"+".pdf");
    document.close();
}catch(Exception e){}
4

Thought I would add my answer here as I found the above answers useful but not exactly what I needed.

In my scenario I wanted to scan each page individually, look for a keyword, and if that keyword appeared, then do something with that page (i.e. copy or ignore it).

I've tried to simplify and replace common variables etc. in my answer:

public void extractImages() throws Exception {
        try {
            String destinationDir = "OUTPUT DIR GOES HERE";
            // Load the pdf
            String inputPdf = "INPUT PDF DIR GOES HERE";
            PDDocument document = PDDocument.load(inputPdf);
            List<PDPage> list = document.getDocumentCatalog().getAllPages();
            // Declare output fileName
            String fileName = "output.pdf";
            // Create output file
            PDDocument newDocument = new PDDocument();
            // Create PDFTextStripper - used for searching the page string
            PDFTextStripper textStripper = new PDFTextStripper();
            // Declare "pages" and "found" variables
            String pages = null;
            boolean found = false;
            // Loop through each page and search for "SEARCH STRING". If this doesn't exist,
            // i.e. it is the image page, then copy it into the new output.pdf.
            // PDFTextStripper page numbers are 1-based, so run up to and including list.size()
            for (int i = 0; i <= list.size(); i++) {
                // Set textStripper to search one page at a time
                textStripper.setStartPage(i);
                textStripper.setEndPage(i);
                PDPage returnPage = null;
                // Fetch page text and insert into "pages" string
                pages = textStripper.getText(document);
                found = pages.contains("SEARCH STRING");
                if (i != 0) {
                    // if nothing is found, then copy the page across to the new output pdf file
                    if (found == false) {
                        returnPage = list.get(i - 1);
                        System.out.println("page returned is: " + returnPage);
                        System.out.println("Copy page");
                        newDocument.importPage(returnPage);
                    }
                }
            }
            newDocument.save(destinationDir + fileName);

            System.out.println(fileName + " saved");
        }
        catch (Exception e) {
            e.printStackTrace();
            System.out.println("catch extract image");
        }
    }
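
The selection step in the loop above can be separated from the PDFBox calls, which makes it easier to check on its own. This is only a sketch, assuming each page's text has already been extracted into a list (one string per page, in page order). KeywordFilter and pagesWithout are hypothetical names, not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: given the extracted text of each
// page (in page order), returns the zero-based indexes of the pages that do
// NOT contain the keyword, i.e. the pages that would be copied to the output.
final class KeywordFilter {
    private KeywordFilter() {}

    static List<Integer> pagesWithout(List<String> pageTexts, String keyword) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (!pageTexts.get(i).contains(keyword)) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

The returned indexes are zero-based, so they can be used directly with list.get(index) on the page list.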
1
  • 2
    Personal preference, but I find "if (! found)" to be much more readable than the "if (found == false)" syntax :)
    – user85116
    Apr 21, 2014 at 19:52
1

You can use the getPage method on a PDDocument instance

PDDocument pdDocument=null;
pdDocument = PDDocument.load(inputStream);
PDPage pdPage = pdDocument.getPage(0);
1

Here is the solution. Hope it will solve your issue.

String fileName = "C:\\mypdf.pdf";
PDDocument doc = PDDocument.load(fileName);
PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage(1);
stripper.setEndPage(2);
//above, pages 1 to 2 will be parsed. For parsing only one page set both values the same (ex: setStartPage(1); setEndPage(1);)
String result = stripper.getText(doc);

doc.close();
0

Add this to the command-line call:

ExtractText -startPage 1 -endPage 1 filename.pdf

Change 1 to the page number that you need.

1
  • 1
    I have to do it through a program. Jul 27, 2011 at 5:33
