Indexing PDF for OSINT and Pentesting by Alejandro Nolla.
From the post:
Most of us, when conducting OSINT tasks or gathering information to prepare a pentest, draw on Google hacking techniques like site:company.acme filetype:pdf “for internal use only” or something similar to search for potentially sensitive information uploaded by mistake. At other times, a customer will ask us to find out if through negligence they have leaked this kind of sensitive information and we proceed to do some Google hacking fu.
But what happens if we don’t want to make these queries against Google and, furthermore, don’t want to follow links from search results that could potentially leak referrers? Sure, we could download the documents and review them manually on a local machine, but that’s boring and time-consuming. This is where Apache Solr comes into play: it processes documents and creates an index of them, giving us almost real-time search capabilities.
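As a rough illustration of the indexing step described above (this sketch is not taken from the post), a PDF can be handed to Solr's extracting request handler and then searched for a phrase. Everything specific here is an assumption: a local Solr at `http://localhost:8983/solr`, a hypothetical core named `pdfs`, the `/update/extract` handler being enabled in that core, and extracted body text landing in a field called `text`. The helper names are made up for the example.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr instance
CORE = "pdfs"                             # hypothetical core name


def build_extract_request(doc_id, pdf_bytes, base=SOLR_BASE, core=CORE):
    """Build a POST that sends raw PDF bytes to the extracting handler.

    Assumes /update/extract is enabled for the core; literal.id sets the
    document's unique key and commit=true makes it searchable right away.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{base}/{quote(core)}/update/extract?{params}"
    return Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def build_search_url(phrase, field="text", base=SOLR_BASE, core=CORE):
    """Build a /select URL for an exact-phrase query.

    The field name depends on the schema; "text" is only a guess.
    """
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{escaped}"'
    return f"{base}/{quote(core)}/select?{urlencode({'q': query, 'wt': 'json'})}"


def index_pdf(path, doc_id):
    """Read a local PDF and send it to Solr (needs a running instance)."""
    with open(path, "rb") as fh:
        request = build_extract_request(doc_id, fh.read())
    with urlopen(request) as response:
        return response.status
```

Once documents are indexed, opening `build_search_url("for internal use only")` in a browser or fetching it with `urlopen` would run the same kind of phrase search locally, without sending the query to Google.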
A nice outline of using Solr for internal security testing of PDF files.
At the same time, a nice outline of using Solr for external security testing of PDF files. 😉
You can sweep sites for new PDF files on a periodic basis and retain only those meeting particular criteria.
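A minimal sketch of that sweep-and-retain idea might look like the following. It assumes you already have a page's HTML and, later, the extracted text of each PDF (for example from Solr); the function names, the `seen` set, and the regex patterns are all hypothetical choices for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def new_pdf_links(html, page_url, seen):
    """Return absolute PDF URLs on the page that are not yet in `seen`.

    `seen` is updated in place, so repeated sweeps only report new files.
    """
    collector = _LinkCollector()
    collector.feed(html)
    fresh = []
    for href in collector.hrefs:
        url = urljoin(page_url, href)
        # Check the path only, so query strings do not hide the extension.
        if urlsplit(url).path.lower().endswith(".pdf") and url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


def meets_criteria(text, patterns):
    """True if any regex pattern matches the extracted text (case-insensitive)."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)
```

Run periodically with a persisted `seen` set, this reports only PDFs that appeared since the last sweep; `meets_criteria` then decides which of them are worth keeping.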
It's low-grade ore, but even low-grade ore can yield a small diamond every now and again.