How do I get paper documents into SharePoint 2010? 3 easy options

Well, I was perusing the LinkedIn Groups this weekend while laid up from dental surgery, and I found that a few discussions were circling the same topic:

“I have a gazillion paper documents, and I want to put them into SharePoint”

My suggestion: don’t mix up the desktop shredder and the scanner. I only made that mistake once… :) But seriously, this is a great topic, because many folks don’t know SharePoint can do it. Well, it can, with some help. SharePoint is a kick-butt document management (ECM) system, and that is what folks use it for anyway… a “Glorified File Share”.

So here are some ideas on ways to get that process moving.


First, let’s cover tools. I am going to start with some well-documented and well-liked commercial solutions:

Kofax, Websio, and KnowledgeLake: these tools are extremely solid and have been implemented on some of the biggest enterprise farms going today. I have experience with KnowledgeLake and Kofax, and I have to say I was very impressed with the UI and the support for the tools. The installation was tricky, but worth it in the end. Try them out; don’t take my word for it.

With commercial-level tools like this you are going to pay a premium for licensing and support each year, so weigh that in your consideration. Remember, once you implement a pay-for-service solution, you have to consider what the cost of turning it off is (rework, training, solution removal, refactoring, change management, etc.). This should be counted in your Total Cost of Ownership.
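To put a number on that, here is a rough sketch of the Total Cost of Ownership math in Python. The `CommercialOption` class, its field names, and every dollar figure are made up for illustration; plug in your own vendor quotes.

```python
from dataclasses import dataclass


@dataclass
class CommercialOption:
    """Hypothetical cost model; every figure passed in is an assumption."""
    upfront_license: float   # one-time purchase price
    annual_support: float    # yearly licensing and support renewal
    exit_cost: float         # rework, training, solution removal, change management


def total_cost_of_ownership(option: CommercialOption, years: int) -> float:
    """Upfront cost, plus yearly fees for the period, plus the cost of turning it off."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return option.upfront_license + option.annual_support * years + option.exit_cost


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the calculation.
    example = CommercialOption(upfront_license=50_000, annual_support=10_000, exit_cost=15_000)
    print(total_cost_of_ownership(example, years=5))
```

The point of carrying `exit_cost` as its own field is that it is the number folks forget until the day they want out.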

Custom Development – Codeplex

OK so let’s look at options your developers are going to love you for. 

embDocumentInhalator, a CodePlex solution, has a few hundred implementations. I have tried it and have to say I was impressed with the end solution. Now, the base solution you download is boring and looks, well, like garbage, so you will need to style it. Second, you still need a scanner solution to leverage; the one I recommend seems to work with nearly everything I have thrown at it. However, you can come up with a scanner solution specific to your monster Canon or Konica if you wish. In some cases someone already has, so look around on the internets… lol

Option 2 for the custom folks is a Do It Yourselves option. Using some fun .NET and PowerShell commands, you can do nearly anything. Just keep in mind: if you build it, you own it. You own the maintenance, support, and long-term life. Total Cost of Ownership, again.

OOB Creativity

OK, so now let’s consider how in God’s name you can do this OOB for $0.00 beyond your Enterprise License. Sorry folks, if you have Foundation or Standard you need to consider the above two options.

So the trick is five steps:

  • Step 1: Content Type Classification

Why content types? Well, like in comic books, “with great power comes great responsibility”, and I do not want you just scanning a gazillion PDFs into my SharePoint. I know, how is a content type going to solve this? We are going to use auto-classification to assist us. By creating a content type called “Scanned File” you now have a way to apply metadata, workflows, publishing, retention policy, IMP (information management policy), and a slew of other content type goodness.

Create the content type and plan to add it to the libraries I talk about in step 2.

  • Step 2: Scan to file

Most folks have a network copy/scan/fax machine (if you don’t, seriously, save some money and buy one on eBay), and we are going to use it to the max.

I recommend creating a Site Collection for this the first time, just in case you hate the results. Otherwise, any old document library will work. I like to enable email support, enable content types, and adjust some thresholds for this site to support large files, and a lot of them. Just keep in mind the total size of a few hundred scanned PDF files. Apply the fun content type we created before.

Now we could spend a lot more time on settings, but naaa, that is boring. Once you have the libraries in place (with some fun names like Scan Repository A, Scan DropBox) you have two options: add them to your network as mapped network locations, just like a file share, or create an associated email address and configure the email settings to save the attached file and discard the email. I like the email option, as it also lets you send in scanned documents from any location via email, a glorified document FTP.
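If you go the mapped network location route, sweeping the scanner’s output folder into the library can be a tiny script. Here is a minimal sketch in Python, assuming the library is already mapped as an ordinary folder; the function name `push_scans` and both folder paths are hypothetical, and nothing here talks to SharePoint directly.

```python
import shutil
from pathlib import Path


def push_scans(scan_dir: Path, library_dir: Path) -> list[Path]:
    """Move every PDF from the scanner's output folder into the mapped library.

    Files whose name already exists in the library are left where they are,
    so nothing gets silently overwritten.
    """
    if not library_dir.is_dir():
        raise FileNotFoundError(f"mapped library not reachable: {library_dir}")

    moved = []
    for item in sorted(scan_dir.iterdir()):
        # Scanners disagree on .pdf vs .PDF, so compare case-insensitively.
        if not item.is_file() or item.suffix.lower() != ".pdf":
            continue
        target = library_dir / item.name
        if target.exists():
            continue  # leave duplicates for a human to sort out
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

Point `scan_dir` at wherever your copier drops its files and `library_dir` at the mapped drive, then run it on a schedule. Skipping name collisions is a deliberate choice here: a human should decide which copy wins.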

  • Step 3: Metadata

Once the file is in the library, go to your SharePoint Dropbox and file it by filling in the metadata. One recommendation is creating a DataView or a custom view via SP Designer to support easy editing. I have also seen folks implement some JavaScript to allow a hover-over image preview, a cool feature idea.

To make my life easier I also like to customize the EDIT.aspx and VIEW.aspx in InfoPath to give some UI support for the common user.

  • Step 4: Workflow/PowerShell move of the file to its final home.

How can we do this with an OOB workflow, you ask? No, this is the part where you want to break open Visual Studio. I have done this with SP Designer, but I prefer a robust approach here. If you want to distribute the files to alternate Site Collections, or convert the files to another format, you can do this through Visual Studio, and it can be more sexy this way, allowing you to perform these advanced functions. Give your developer something to do. I do recommend this option instead of SP Designer, as it can be a formal Site Feature that you can reuse. You can also write in the code to do your cleanup and delete the original file, or perform this action as a move.

The other nice part about doing this in code is that you can also re-classify to a content type in the destination if you like.

Remember this is like creating a Custom Send To Location, and just running the code to push the button.
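The “where does it go” decision behind that button can be sketched without any SharePoint code at all. Below is a small Python illustration of the routing logic only; it is not the Visual Studio workflow itself. The `department` column, the library URLs, and the destination content type names are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScannedFile:
    name: str
    department: str                      # hypothetical metadata column filled in at Step 3
    content_type: str = "Scanned File"   # the content type from Step 1


@dataclass(frozen=True)
class Destination:
    library_url: str    # final home of the file
    content_type: str   # content type to re-classify to on arrival


# Hypothetical routing table: metadata value -> final home.
ROUTES = {
    "Legal": Destination("/sites/legal/Contracts", "Contract"),
    "HR": Destination("/sites/hr/PersonnelFiles", "Personnel Record"),
}


def route(file: ScannedFile) -> Optional[Destination]:
    """Return the final home for a filed scan, or None to leave it in the drop box."""
    return ROUTES.get(file.department)


if __name__ == "__main__":
    for scan in (ScannedFile("nda.pdf", "Legal"), ScannedFile("misc.pdf", "Facilities")):
        print(scan.name, "->", route(scan))
```

Returning `None` for an unknown department keeps unclassified scans sitting in the drop box where someone will notice them, instead of guessing at a home.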

  • Step 5: Search. Ensure your index is running the PDF iFilter and has the home locations on a regular crawl schedule.

In conclusion

So when someone asks “can you do it?”, YEP, and I can give you ten ways to Sunday on how to. That is the SharePoint way, right? Remember, look at your options and Total Cost. If this is a pilot, or just for the Legal or HR team to get some contracts into the system, I suggest looking at CodePlex first. If you are trying to kill off that warehouse of banker boxes that you lease from Uncle Charlie, go commercial; you will get your money’s worth. I can never say this enough: try before you buy.

Why E-Discovery needs a SharePoint expert on call!

So I was reviewing and doing demos of a bunch of SharePoint e-discovery tools, most of them built leveraging the Microsoft.Office.RecordsManagement.Holds namespace. What I kept running into was something that troubled me.

How is an e-discovery person going to understand SharePoint? It takes some folks years to figure out the intricate challenges and configurations of SharePoint that have a ‘cause and effect’ on the data. SharePoint is a discipline in itself.

Let’s look at how most are dealing with this challenge.


Let’s talk about a SharePoint Expert.

A SharePoint Expert is a person who has years of experience leveraging the solution and deploying all versions and variations of SharePoint (2003, 2007, 2010), and who can speak to the configuration, methodology, best practices, and key challenges of the solution. They can sit on the stand and say that the SharePoint farm was XXX, under oath, putting their career on the line.

Now what I like to point out at this time is that you are not going to find a lot of SharePoint Experts. SharePoint Administrators YES, Experts NO. In Portland, Oregon, you have about 10 SharePoint Architects that really know SharePoint; I’d say about 20 in the State of Oregon. That is not a lot. However, I cannot count the number of folks who over the years have claimed “guru” status but have deployed garbage SharePoint farms. It is a fine line.

Check references, and reach out to Microsoft, they can point you to folks they trust.


First.  ACTUAL APPLICATION – Let’s look at the scenarios I have seen in demos from the tool makers. Here is a common theme.

You (a forensic tech or auditor) arrive onsite, or you work for the company and you are doing an investigation. The IT Admin or SharePoint Admin has built a best-case SharePoint farm(s) and has deployed a full suite of tools. You are able to access them and run your searches and holds. You find your data and leave.

I have to say that seems very nice; however, I have yet to meet an in-house legal team that is going to run its own investigations, given the cost of their time and the availability of tech-savvy staff. The other bump I see in this is that no one does investigations in the field unless the data cannot leave the location due to its sensitive nature. We all like to do acquisitions and bring all the data back to our shops to do full reviews with all our tools in hand. Third, not many organizations have the budgets anymore to afford toolsets or expensive plugins that do not show value.

I am not going to pick on the tools, as I do not know how they were coded, but that does drive to the question… how good are the tools?

Second.  CASE LAW – Since no one wants to tell you how they did it (“magician’s rules”), I see it being a little hard for these SharePoint tool manufacturers to testify on the methodology of the process. One of the reasons the courts trust tools like EnCase and FTK is the years of case review and of law enforcement leveraging them in the field. We will have to see if a case comes to bear that has a supported tool usage.

OK, so let’s put the tools to the side for a moment and say wow, they all work just as advertised.


So I have been doing Information Architecture and Records Management in SharePoint for a while, and have to say I love the SharePoint Records Center in 2010. For context on this topic, the Microsoft Enterprise Content Management (ECM) Team Blog has written a nice intro to the technology. It complements the implementation of taxonomies, metadata management, retention policies/schedules, and all the warm fun that is Records Management in an ECM platform.

Where I have the challenge is that very few organizations have deployed it. Many that have did it wrong: too quickly, without training the staff, or without deploying SharePoint using a Retention Schedule as the Content Type Hierarchy. This is a vortex when it comes to SharePoint past and future; go to the blogs for horror stories of deployments.

The Records Management part of SharePoint is only as good as… once again… the deployment. The few great examples that I have seen of a complete Records Management solution were absolutely gorgeous. Now I have to take credit, I did build 2 of them… LOL. That is the challenge though: SharePoint Records Management and the Records Center need to have a strong infrastructure behind them. Remember this if you plan to deploy a Records Center.

Make sure you have strong governance and a very well documented and deployed taxonomy, ensure all your metadata tagging is correct, ensure your retention schedules are properly configured, and most of all, train your Legal Team. They cannot use it unless they know how it works.

I like to ask the SharePoint Administrator, “Can you guarantee me that the SharePoint farm and all the sites in it are working correctly, based on best practices?” Anyone who says yes… is lying. What best practices? We all have different demented uses for SharePoint, and all our users do things we never know about until we get the support call.


So now I tackle the WHY. You might ask, “Scott, that was a nice ramble, what was the point?” Great question, here you go. A tool can only work as well as the SharePoint farm is built, governed, and administered. We have all seen systems that major companies are running on that look good, till you open them up… and OMG, how do you stay in business? It is the same with SharePoint. It is a monster platform that can somehow weather some very bad deployments and a lack of governance and administration.

Most of the tools today use the SharePoint Index, so the tool only works as well as the SharePoint Search configuration. Also, you can exclude files, lists, libraries, and entire sites from an index, so they would be invisible to it. Oops, did you just miss the entire site?
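One sanity check an investigator could run is to compare what exists in the farm against what the index actually covers. Here is a minimal Python sketch of that comparison. It assumes you have already exported both lists of URLs by some other means; the function name and the sample URLs are made up, and nothing here calls SharePoint.

```python
def unindexed_locations(all_locations: set[str], indexed_locations: set[str]) -> list[str]:
    """Locations that exist in the farm but never show up in the search index."""
    return sorted(all_locations - indexed_locations)


if __name__ == "__main__":
    # Hypothetical exports: one from a farm inventory, one from the crawl scope.
    farm_inventory = {
        "/sites/hr",
        "/sites/legal",
        "/sites/executive",
        "/sites/finance",
    }
    search_index = {
        "/sites/hr",
        "/sites/legal",
        "/sites/finance",
    }
    for url in unindexed_locations(farm_inventory, search_index):
        print("Not covered by the index:", url)
```

Anything this turns up is a place an index-based tool would never look, so it needs a manual review or a very good explanation.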

So let’s look at this from a different angle. Say I was a SharePoint Admin trying to clean up after an executive. I empty all the Recycle Bins, admin level too. I set some locations to “no-index”, tweak the Search configuration, and do a fresh build of all my index files. Then I do a fresh set of backups. Magic, the data is GONE.

Another reason: how many folks know what is really IN SharePoint? I love to use BCS, or data connections to other systems, to allow a seamless view into the whole organization. Then there is using workflows to shift data to alternate systems, using BLOB storage techniques, or using search to reach file shares. Alternately, how about some Page Viewer web parts, or a CQWP that is fetching to the page a complex query from a system not even in the farm? Users in most cases do not know where the data is, so if you do your investigation based on their BELIEF you might be looking in SharePoint for the wrong reasons entirely.


Don’t depend on a tool to know how the system is built, or how data could be lost in the vacuum of the architecture. It is nice to have a SharePoint Expert on speed dial just in case you encounter it in the field. More and more organizations are using it, and you will find that just finding the answers can be herculean if you don’t even know where to start. Be safe, and I hope all your e-discovery dreams are golden. :)