Monday, June 4, 2012

Search for a word in PDF files on Linux using a shell script and the Poppler library

Hi all,
I'm back after a long time.

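Here is a very simple shell script for searching PDF files in a Linux environment. Given a directory and a search string, it walks the directory and all of its subdirectories and prints the path of every PDF file whose text contains the string. The match is case-insensitive.
The script was originally written by Karsten Wade; I have modified it a little to suit our needs. Please feel free to contact me at lkpatel123@gmail.com.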

Note that this script needs the Poppler utilities to be installed, because it uses Poppler's pdftotext command to extract the text from each PDF.
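If pdftotext is missing, the script prints a "command not found" error for every PDF it visits. A small check at the start avoids that. This is only a sketch: the helper name require_pdftotext is hypothetical and is not part of the original script.

```shell
#!/bin/bash
# require_pdftotext (hypothetical helper name): succeeds if Poppler's
# pdftotext command can be found on the PATH; otherwise prints a hint
# to stderr and returns a non-zero status.
require_pdftotext() {
  if command -v pdftotext >/dev/null 2>&1; then
    return 0
  fi
  echo "pdftotext not found; please install the Poppler utilities" >&2
  return 1
}
```

Calling require_pdftotext || exit 1 right after the argument check stops the script early with one clear message. On Debian and Ubuntu the command can be installed with sudo apt-get install poppler-utils, and on Fedora with sudo dnf install poppler-utils.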


#!/bin/bash
#
# Copyright 2009 Karsten Wade
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, version 3 of the License.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>.

# Expect exactly two arguments: the directory and the search string
if [ $# -ne 2 ] ; then
 echo "Usage: $0 <directory> <string to search for>" >&2
 exit 1
fi

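# nullglob: a pattern that matches nothing expands to nothing instead of itself
# nocaseglob: *.pdf also matches files ending in .PDF
shopt -s nullglob nocaseglob

src=$1
STRING=$2

# Split command output on newlines only, so directory names with spaces
# stay intact (names containing newlines are still not supported)
IFS=$'\n'

# Build one "<dir>/*.pdf" pattern for every directory under $src, including $src
FILES=()
for dir in $(find "$src" -type d -print)
do
  FILES+=("$dir/*.pdf")
done

# Expand each pattern and print every PDF whose text contains $STRING
for pattern in "${FILES[@]}"
do
  # $pattern is deliberately unquoted so the glob expands to file names
  for i in $pattern
  do
    # pdftotext writes the text to stdout ("-"); grep -q stops at the first match,
    # and "--" keeps a search string starting with "-" from being read as an option
    if pdftotext "$i" - | grep -qi -- "$STRING"; then
      echo "$i"
    fi
  done
done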
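If you only need the search itself, the same idea fits in a single bash function. This is a minimal sketch, not the original script: the name search_pdfs is hypothetical; find passes file names separated by NUL bytes, so even names containing newlines work; .PDF files are matched as well as .pdf; and grep -F takes the search string literally instead of as a regular expression.

```shell
#!/bin/bash
# search_pdfs DIR STRING (hypothetical name)
# Prints the path of every PDF under DIR whose extracted text contains
# STRING, ignoring case and treating STRING as a fixed string.
search_pdfs() {
  local dir=$1 string=$2 f
  # -iname matches both .pdf and .PDF; -print0 with read -d '' keeps any file name intact
  while IFS= read -r -d '' f; do
    # pdftotext writes the text to stdout when the output file is "-";
    # 2>/dev/null hides warnings from damaged PDFs
    if pdftotext "$f" - 2>/dev/null | grep -qiF -- "$string"; then
      printf '%s\n' "$f"
    fi
  done < <(find "$dir" -type f -iname '*.pdf' -print0)
}
```

For example, after adding the function to a script or to ~/.bashrc, search_pdfs ~/Documents invoice prints every PDF under ~/Documents that mentions "invoice".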
