My Knowledge Management Process with PDF Files – Part 1

A few times I’ve mentioned how I manage collections of PDF documents, text files and the accreted detritus of a Network Engineer. Since a number of people have asked me to talk in more detail about how I organise this and what methods I use, here is a rough description of how I perform knowledge management.

PDF or Web Pages

When I am hunting through documentation, a web page is, without doubt, the easiest way to review lots of material and find what I am looking for. However, most of the web pages on a vendor site aren’t the core technical information that I’m looking for.

Once I’ve found a useful document, white paper, technical reference, product manual or whatever then I will download it to keep for reference.

Why download it?

Big companies have big websites. Big websites have things changing and moving, and a bookmark isn’t useful two or three years later because the content has moved. It’s frustrating to recall that you read about something but cannot find it again. If it’s a blog page, then blogs often disappear when people’s lives change and they decide to stop paying for the hosting. Or maybe the content changes over time. I’ve learned that the Internet is not a reliable content store.

Downloading it isn’t enough.

So, now that you have downloaded a bunch of PDF files you have to find a way to organise them into meaningful data.

Consider a PDF file from Brocade on VCS Fabric Technical architecture. I might want to classify this into the following:

  • Brocade
  • VCS
  • Ethernet Fabric
  • Data Centre
  • Design
  • Architecture

In the same way a Cisco MediaNet 4.0 QoS Design Guide might be any of the following:

  • Design
  • Cisco
  • QoS
  • shaping
  • design
  • Design Guide
  • Policing
  • video
  • voip

and so on. To solve this problem, you need more than a directory on a hard disk. You need a tool that can organise files according to some other schema. I found myself wasting many hours searching for PDFs that I remembered reading but could not find.
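As a rough illustration of what organising by "some other schema" can look like, here is a minimal Python sketch of a tag index in which one file sits under many labels at once. It is not how any of the apps discussed in this post work internally. The class name `TagIndex` and both file names are hypothetical, and tags are lower-cased on the assumption that "Design" and "design" should count as the same label.

```python
from collections import defaultdict


class TagIndex:
    """Map tags to documents so one file can sit under many labels."""

    def __init__(self):
        # tag -> set of file paths carrying that tag
        self._by_tag = defaultdict(set)

    @staticmethod
    def _normalise(tag):
        # Assumption: tags are case-insensitive and extra spaces are noise.
        return " ".join(tag.lower().split())

    def add(self, path, tags):
        """Record a file under every tag supplied."""
        for tag in tags:
            self._by_tag[self._normalise(tag)].add(path)

    def find(self, *tags):
        """Return the sorted paths that carry every one of the given tags."""
        if not tags:
            return []
        sets = [self._by_tag.get(self._normalise(t), set()) for t in tags]
        return sorted(set.intersection(*sets))


# Hypothetical file names, tagged with the labels from the two lists above.
index = TagIndex()
index.add(
    "brocade-vcs-fabric-architecture.pdf",
    ["Brocade", "VCS", "Ethernet Fabric", "Data Centre", "Design", "Architecture"],
)
index.add(
    "cisco-medianet-qos-design-guide.pdf",
    ["Cisco", "QoS", "shaping", "design", "Design Guide", "Policing", "video", "voip"],
)

print(index.find("design"))         # both documents carry this tag
print(index.find("Design", "QoS"))  # only the Cisco guide carries both
```

A directory tree forces each file into exactly one place, whereas an index like this lets the same PDF turn up under vendor, technology and document type.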

Many Tools

My toolset for handling this is based entirely on Mac OSX-specific features and software that I’ve bought. I could have scripted any of these tasks, but the time taken to write and debug the scripts would cost more than the apps that do the job. Many of these apps also have far more features than a script, and those features are valuable too. As a rule, I choose to pay for my software since I believe that paid products will survive over the longer term. I will walk through the process in the order that I adopted the tools, moving on to more complex ones as I tried new ways of working.

As far as I know, there are no programs that perform equivalent functions on Windows or Linux. Frankly, nor do I care. Go somewhere else if you want those solutions.


The first tool I used was HoudahSpot. Regular Mac users will be familiar with Spotlight, which creates a detailed search index of all files on the disk. HoudahSpot is a front end to that index that helps you search for files.

Figure: HoudahSpot with a sample search.

As you can see, you can search in the text or the name, or in your email. There is almost nothing that cannot be searched from its easy-to-use interface. It supports search templates, which allow me to define regular search types instead of typing up repetitive searches.

So HoudahSpot worked well for a while, and I still use it today for searching because it’s easier and faster than Spotlight, even with regex searches. However, I needed to gather files together in better ways: while I could find things, I couldn’t organise things.

Today I’d consider the Mac App Store version, which is called Tembo, but I already own HoudahSpot. It’s a great application and I highly recommend it.

A small thing. Apple Spotlight does not easily search non-standard locations outside of the Documents folder. HoudahSpot allows you to search anywhere on your disk drives by creating your default search template – useful for apps that allow you to have unnamed files in the iCloud folders.

HoudahSpot integrates with QuickLook

One of the greatest utilities in OSX is QuickLook. Whenever you have a file selected in Finder (or just about any other app) you can press the spacebar to get a preview of the file. Works for almost any file type, even Word files.

Figure: Using Quick Look to preview files in HoudahSpot.

Directory Storage and Leap

Searching is fine, but I really need to peek or preview files to see what they are. Although Preview in HoudahSpot does work, sometimes I need something faster. So the next program I used a lot is Leap from Ironic Software.

Leap supports OpenMeta tags (more on this in Part 2) for search, and can edit them. You can hit the spacebar to QuickLook any file (an OSX feature), and you can change the view to get a display of the first page, which is often useful.

When I went through my Leap phase, I stored files in directories by vendor and then tagged files by technology. The nice thing about Leap is that you find things that you had forgotten about. Serendipity is a great tool for having a wide recall of knowledge. Therefore, I use Leap to cruise around the parts of my PDF library that haven’t been tagged and sorted in DevonThink (more on both in Part 2).

I’ve also looked at the sister product from Ironic called Yep. I couldn’t quite understand how to make that work, but I think it could do some limited knowledge management.

In Part Two, I’ll look more at Tagging and Collecting Text and PDF files.

Other Posts in A Series On The Same Topic

  1. Screencast: Knowledge Management in Technology - Part 3 (9th January 2013)
  2. Screencast: Knowledge Management in Technology - Part 2 (3rd January 2013)
  3. Screencast: Knowledge Management in Technology - Part 1 (30th December 2012)
  4. Important to Get Data Out and Well as In (21st May 2012)
  5. My Knowledge Management Process with PDF Files - Part 2 (13th March 2012)
  6. My Knowledge Management Process with PDF Files - Part 1 (12th March 2012)
  • Christopher Hayre

    Evernote is particularly handy for this as well.  You can tag, search, organize, etc.  Also handy to be able to access it from any device.  Of course you’d want to use caution and best judgement when posting anything sensitive, though.  Better to leave that stuff someplace trusted.

    • Etherealmind

      I talk about Evernote in Pt 2. Personally I found it to be useless, and I recommend using DevonThink Pro. Tune in later this week…

  • Francesco

    Windows 7 has tags in its NTFS

    • Francesco

       Combined with Windows Search is OK, no?

      • Etherealmind

        Don’t know – as I said, I don’t use Windows.

      • Einar Aleksejev

        If it is at the NTFS level, then when you copy the file to another file system you lose all the information.

  • Einar Aleksejev

    The free Adobe PDF iFilter 9 for 64-bit platforms does the same job as the paid Foxit.