PDF (Portable Document Format) files store text and image data for offline use. To show the same text and graphics online, a PDF file is usually embedded in the browser with a web viewer. However, the text and images inside an embedded PDF are not part of the web page's own content. Because the PDF content is not rendered on the page itself, SEO is affected. To work around this issue, extract the text from the PDF and publish it on the web page.
PDF Parser is a PHP library that parses PDF files and extracts the text content of every page. With it, text, headers, and metadata can all be extracted from a PDF file using PHP. This tutorial shows how to use PHP to extract text from PDF files.
The sample script below shows how to use the PDF Parser library to extract text from a PDF file. We'll also demonstrate how to upload a PDF file with PHP and extract its text on the fly.
Run the following command to install the PDF Parser library with Composer.
composer require smalot/pdfparser
Note that the downloadable source code already includes all the required files, so you don't need to install the PDF Parser library separately. If you want to set up and run PDF Parser without Composer, download the source code instead.
Include the Composer autoloader in your PHP script to load the PDF Parser library and its helper functions.
include 'vendor/autoload.php';
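The following PHP code extracts all of the text from a PDF file.
<?php
// Load the PDF Parser library through the Composer autoloader
include 'vendor/autoload.php';

// Create the parser (note the namespace separators)
$parser = new \Smalot\PdfParser\Parser();

// Parse the source file and collect the text of every page
$PDFfile = 'test.pdf';
$PDF = $parser->parseFile($PDFfile);
$PDFContent = $PDF->getText();

// Escape the text for HTML and convert line breaks to <br> tags
echo nl2br(htmlspecialchars($PDFContent));
?>
You can explore more features in the PDF Parser library documentation.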
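Since the library works page by page and can also read metadata, here is a minimal sketch of both. It assumes smalot/pdfparser is installed and that a file named test.pdf exists, as in the snippet above. The helper name pdfPageTexts is hypothetical; getPages(), getText(), and getDetails() are the library calls it relies on.

```php
<?php
// Hypothetical helper: return the text of each page, keyed by page number (1-based).
function pdfPageTexts(\Smalot\PdfParser\Document $pdf): array
{
    $texts = [];
    $number = 1;
    foreach ($pdf->getPages() as $page) {
        $texts[$number++] = $page->getText();
    }
    return $texts;
}

// Usage: runs only when the autoloader and the sample file are available.
if (is_file('vendor/autoload.php') && is_file('test.pdf')) {
    include 'vendor/autoload.php';

    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile('test.pdf');

    // Text of every page, one page at a time
    foreach (pdfPageTexts($pdf) as $number => $text) {
        echo 'Page ' . $number . ': ' . strlen($text) . " bytes of text\n";
    }

    // Metadata such as Title, Author, or Pages (which keys exist depends on the file)
    foreach ($pdf->getDetails() as $property => $value) {
        $value = is_array($value) ? implode(', ', $value) : $value;
        echo $property . ' => ' . $value . "\n";
    }
}
```

Splitting the text by page is handy when each PDF page should become its own section on the web page.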
The following example shows how to upload a PDF file with PHP and extract its text. First, define the HTML form elements for the file upload.
<form action="parse.php" method="POST" enctype="multipart/form-data">
<div class="pdf-input">
<label for="pdf">PDF File</label>
<input type="file" id="pdf" name="pdf" placeholder="Select a PDF file" required="">
</div>
<input type="submit" name="submit" class="btn btn-large" value="Submit">
</form>
The chosen file is uploaded to the server script for further processing when the form is submitted.
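PHP records the outcome of each upload as an error code in $_FILES, and checking it first gives the visitor a clearer message when something goes wrong. The sketch below assumes the field name pdf from the form above; the helper name uploadErrorMessage and the message wording are hypothetical.

```php
<?php
// Hypothetical helper: translate a PHP upload error code into a readable message.
// An empty string means the upload succeeded.
function uploadErrorMessage(int $code): string
{
    switch ($code) {
        case UPLOAD_ERR_OK:
            return '';
        case UPLOAD_ERR_INI_SIZE:
        case UPLOAD_ERR_FORM_SIZE:
            return 'The file is too large.';
        case UPLOAD_ERR_PARTIAL:
            return 'The file was only partially uploaded.';
        case UPLOAD_ERR_NO_FILE:
            return 'Please select a file.';
        default:
            return 'The upload failed on the server.';
    }
}

// Usage: report a problem before any parsing is attempted.
if (isset($_FILES['pdf'])) {
    $message = uploadErrorMessage((int) $_FILES['pdf']['error']);
    if ($message !== '') {
        echo '<p>' . $message . '</p>';
    }
}
```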
Server-side script (parse.php) to extract text from PDF File:
You can upload the file and extract the data from the PDF using the code below.
<?php
$PDFContent = '';
if (isset($_POST['submit'])) {
    // Make sure a file was selected
    if (!empty($_FILES["pdf"]["name"])) {
        $PDFfileName = basename($_FILES["pdf"]["name"]);
        // Lowercase the extension so that "FILE.PDF" is accepted too
        $PDFfileType = strtolower(pathinfo($PDFfileName, PATHINFO_EXTENSION));
        $allowTypes = array('pdf');
        if (in_array($PDFfileType, $allowTypes)) {
            // Load the PDF Parser library
            include 'vendor/autoload.php';
            $parser = new \Smalot\PdfParser\Parser();

            // Source file (temporary upload location)
            $PDFfile = $_FILES["pdf"]["tmp_name"];
            $PDF = $parser->parseFile($PDFfile);
            $fileText = $PDF->getText();

            // Escape the text for HTML output and keep the line breaks
            $PDFContent = nl2br(htmlspecialchars($fileText));
        } else {
            $PDFContent = '<p>Only PDF files are allowed.</p>';
        }
    } else {
        $PDFContent = '<p>Please select a file.</p>';
    }
}

// Display content
echo $PDFContent;
That's it! I hope this was helpful. Have a nice day!
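The extension check above trusts the file name, which the visitor controls. As an optional extra check, the sketch below looks at the first bytes of the uploaded file, on the assumption that a valid PDF starts with the %PDF- header. The helper name looksLikePdf is hypothetical and is not part of PDF Parser.

```php
<?php
// Hypothetical helper: report whether a file begins with the PDF header "%PDF-".
function looksLikePdf(string $path): bool
{
    // Suppress the warning for unreadable paths and return false instead
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    // Read only the first five bytes; shorter files simply fail the comparison
    $header = fread($handle, 5);
    fclose($handle);

    return $header === '%PDF-';
}

// Usage inside parse.php, before calling parseFile().
if (isset($_FILES['pdf']) && !looksLikePdf($_FILES['pdf']['tmp_name'])) {
    echo '<p>The uploaded file does not look like a PDF.</p>';
}
```

This is a quick sanity check and not full validation, so the parser can still fail on a damaged file.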