WordPress


A place for nonprofit WordPress developers and content managers of all skill levels.
The WordPress group is an engaged network of WordPress developers and content managers of all skill levels, run by WordPress users for WordPress users, to encourage the use of WordPress and to advocate for it.

Our goal: to support nonprofit organizations using (or interested in using) WordPress. Additionally, this is a safe and friendly place for beginning WordPress developers and users to ask questions and connect to like-minded people.

Short link: http://community.nten.org/wordpress

1.  WordPress searching through PDF documents

Posted Feb 11, 2017 10:22
We have a large library of PDF documents (at least 50 GB) that we are transferring from a Drupal site to a WordPress site, and we want the PDF content to be searchable on the WordPress site. We are considering storing the document library in either Azure Blob storage or an Amazon S3 bucket. We use WPEngine, and my understanding is that they have a LargeFS connection to Amazon S3 built into their system.

From looking at PDF search plugins, my understanding is that the PDF documents have to reside in the main WordPress media library for their content to be searchable.

Does anyone know if there are search plugins that treat the WPEngine LargeFS as the main media library?
Or, does anyone know of search plugins that would search PDFs housed in an Azure blob?

Any suggestions would be helpful!

--
Nancy R. Rose
Chief Operating Officer
N.C. Center for Public Policy Research/EducationNC
P. O. Box 430, Raleigh NC 27602



2.  RE: WordPress searching through PDF documents

Posted Feb 13, 2017 09:46
I have no direct experience with this, but I did a little digging online. This looks like a possibility worth investigating: https://codecanyon.net/item/php-search-engine/89499

I also saw someone in another forum recommend Advanced Google Search for this kind of task.

Does anyone else have ideas for Nancy?  

Cindy

Cindy Leonard, Consulting Team Leader
Bayer Center for Nonprofit Management at Robert Morris University, 
339 Sixth Ave, Ste 750, Pgh, PA 15222
p 412-397-6007 | f 412-397-6016 | leonard@rmu.edu | www.bcnm-rmu.org





3.  RE: WordPress searching through PDF documents

Posted Feb 13, 2017 09:59

Hi Nancy,

We used SearchWP (https://searchwp.com) on one of our projects to index PDFs and to create the search and results interface. In that project, the PDFs were on the same server (host) as WordPress and totaled about 8 GB. SearchWP handled the indexing without a problem. It might be worth checking out the plugin and asking their support about PDFs on an external host.

Amar
--
Amar Trivedi | AmDee LLC | Phone: (240) 342-6271 ext. 700
Twitter: @AmDeeLLC | LinkedIn: linked.in/amartrivedi/





4.  RE: WordPress searching through PDF documents

Posted Feb 13, 2017 09:58
Edited by Jason King Feb 13, 2017 09:58

Apache Solr

I used it once on a project where synonym searching was important. We could build a thesaurus of preferred and alternative terms, which was really useful because the language used by "real people" isn't the same as the legal terminology.
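To make the thesaurus idea concrete, here is a minimal Python sketch of synonym-based matching. It is not Solr's API or configuration format; the word list is hypothetical, and it only shows how a query in everyday language can still find documents written in legal terms.

```python
import re

# Hypothetical thesaurus: each everyday ("real people") word maps to the
# alternative terms that should match the same documents. Illustrative
# only; this is not Solr's synonym file format or its synonym filter.
SYNONYMS = {
    "lawyer": {"attorney", "counsel"},
    "landlord": {"lessor"},
    "tenant": {"lessee"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(term: str) -> set[str]:
    """Return the term itself plus any synonyms listed for it."""
    return {term} | SYNONYMS.get(term, set())


def matches(query: str, document: str) -> bool:
    """True if every query word, or one of its synonyms, appears in the document."""
    doc_words = set(tokenize(document))
    return all(expand(word) & doc_words for word in tokenize(query))


docs = {
    "lease-guide": "Rights and duties of the lessor and the lessee.",
    "court-faq": "How to find an attorney before your hearing.",
}

for name, text in docs.items():
    # Prints: document name, match for "tenant landlord", match for "lawyer"
    print(name, matches("tenant landlord", text), matches("lawyer", text))
# lease-guide True False
# court-faq False True
```

The mapping here is one-directional: a search for "attorney" would not match a document that only says "lawyer". Real synonym features, Solr's included, let you declare groups of equivalent terms so matching works in both directions.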

I remember it looked complex, but really wasn't so difficult to implement.

It can search common document formats, including Word and PDF, so it isn't limited to HTML.

If I remember rightly, you can get a free or paid-for hosted version, and there are a few WordPress plugins that help you integrate it into a website, for example WPSOLR.



------------------------------
Jason King
Freelance WordPress development and Google Ad Grant management
Carcassonne, France

www.kingjason.co.uk

Twitter: @jasoncsking
------------------------------



5.  RE: WordPress searching through PDF documents

Posted Feb 14, 2017 17:17

Like Amar, I wondered about SearchWP, which is my preferred search plugin and one of three I recommend. That said, it looks like it doesn't support searching files stored outside the Media Library:

Can SearchWP index PDFs & Documents stored outside the Media library?

 

NO. SearchWP requires that PDFs & documents be uploaded to your WordPress Media library. In order for SearchWP to index and return results, each entry must have its own canonical, WordPress-provided object ID. This ID is assigned when files are uploaded to the Media library, and is essential for SearchWP.

If you are using a document management plugin that stores uploads outside of the Media library of your WordPress install, SearchWP will NOT be able to work with these files.

Then I went searching for S3 search and came across Amazon CloudSearch. I didn't spend much time with the docs or thinking about how the integration with WordPress would work, but it does seem like a place to start.



------------------------------
Mark Root-Wiley

MRW Web Design / MRWweb.com / @MRWweb
Thoughtful WordPress Website for Nonprofits & Mission-Driven Organizations
Seattle, WA
------------------------------



6.  RE: WordPress searching through PDF documents

Posted Feb 15, 2017 18:29
Thanks, everyone. We have installed SearchWP and are going to see if it works with WPEngine's LargeFS, which sends files over to an S3 bucket after they have "lived" in the media library for 10 days. The support team at WPEngine says it should still work.

One issue I think we will encounter is that our users are accustomed to searching with phrases or multiple words, such as "charter schools" or "community colleges," and SearchWP does not seem to handle multiple words.

Will keep looking around for other options as well, so if you come across anything, let me know.

Again, THANKS!

------------------------------
Nancy Rose
Executive Director
North Carolina Center for Public Policy Research, Inc.
Raleigh, NC
------------------------------



7.  RE: WordPress searching through PDF documents

Posted Feb 15, 2017 19:05

Good luck, Nancy!

> We have installed SearchWP and are going to see if it will work with our WPEngine's LargeFS which sends files over to an S3 bucket after it's "lived" in media for 10 days.  The support team at WPEngine says it should still work.

From what you describe, I bet SearchWP will work. (For document indexing, it extracts the contents, when possible, and saves them separately in the WordPress database, so the document doesn't need to remain on the server once it has been indexed.)
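As a rough illustration of why the file doesn't need to stay on the server, here is a minimal Python sketch of the general "extract once, then search the stored text" approach. It is not SearchWP's code: the in-memory dictionary stands in for the plugin's database storage, a plain text file stands in for a PDF (a real PDF would first need a text extractor), and the document ID 101 is made up.

```python
import os
import tempfile

# Hypothetical stand-in for the plugin's stored index: document ID -> extracted text.
index: dict[int, str] = {}


def index_document(doc_id: int, path: str) -> None:
    """Extract the file's text once and keep a lowercased copy in the index."""
    with open(path, encoding="utf-8") as f:
        index[doc_id] = f.read().lower()


def search(term: str) -> list[int]:
    """Search the stored text only; the original files are never reopened."""
    return sorted(doc_id for doc_id, text in index.items() if term.lower() in text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")  # plain text standing in for a PDF
    with open(path, "w", encoding="utf-8") as f:
        f.write("Enrollment trends at NC community colleges")
    index_document(101, path)
    os.remove(path)  # simulate the file moving off the server (e.g. to S3)

print(search("community colleges"))  # [101]
```

Searching still returns document 101 after the file is gone, because only the extracted text is consulted at query time.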

> One issue I think we will encounter is our users are accustomed to searching with phrases or multiple words like "charter schools" or "community colleges" and SearchWP does not seem to work with multiple words.

It sounds like something isn't working right if you're not seeing results for those. (For starters, make sure that your index is fully built.) You may need to experiment with SearchWP's "keyword stemming" settings or install one of their free "fuzzy matching" add-ons. With large datasets, it takes a bit of experimentation to get your search results just right!
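SearchWP's actual matching logic isn't described in this thread, so the following is only a generic Python illustration of three behaviours that are easy to confuse when testing multi-word searches: any-word, all-words, and exact-phrase matching. A deliberately crude plural-stripping function stands in for real keyword stemming; it shows why "charter schools" can match "charter school" once stemming is applied, and why phrase matching is stricter still.

```python
import re


def stem(word: str) -> str:
    """Deliberately crude stemmer: strip a trailing 's' (illustration only)."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def tokens(text: str) -> list[str]:
    """Lowercase the text, split it into words, and stem each word."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def any_word(query: str, text: str) -> bool:
    """Loosest match: at least one query word appears somewhere."""
    return bool(set(tokens(query)) & set(tokens(text)))


def all_words(query: str, text: str) -> bool:
    """Stricter match: every query word appears, in any order."""
    return set(tokens(query)) <= set(tokens(text))


def phrase(query: str, text: str) -> bool:
    """Strictest match: the query words appear consecutively and in order."""
    q, t = tokens(query), tokens(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


docs = [
    "A new charter school opened near two community colleges.",
    "Schools that want a charter must apply.",
    "Charter renewals for 2017",
]

for doc in docs:
    q = "charter schools"
    # Prints: any-word match, all-words match, phrase match
    print(any_word(q, doc), all_words(q, doc), phrase(q, doc))
# True True True
# True True False
# True False False
```

A real stemmer (for example the Porter algorithm) is far more careful than this; the point is only that stemming lets "schools" and "school" match, while phrase matching additionally requires word order and adjacency.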

Again, good luck!



------------------------------
Mark Root-Wiley

MRW Web Design / MRWweb.com / @MRWweb
Thoughtful WordPress Website for Nonprofits & Mission-Driven Organizations
Seattle, WA
------------------------------



8.  RE: WordPress searching through PDF documents

Posted Feb 16, 2017 09:47
Nancy,

SearchWP requires a little tweaking to make it work with phrases, etc., and there is definitely a learning curve (it's not your typical plug-and-play WP plugin). Their support can be slow (sometimes up to 24 hours to respond), but they genuinely try to help you.

Good luck with the project!

Amar
--
Amar Trivedi | AmDee LLC | Phone: (240) 342-6271 ext. 700
Twitter: @AmDeeLLC | LinkedIn: linked.in/amartrivedi/