ContentBasedClassificationModule Project

Demonstrates
	This is an introduction to the extensibility model for FSRM Classification. 
	It demonstrates how to develop a custom classifier that will be a part of the FSRM pipeline and 
	enable applying custom rules during the FSRM classification phase.

	In this example, the classifier can be set up to search for strings in the content of any file
	that has an IFilter registered for its extension type.
	When a classification rule is defined, the Additional Classification Parameter property sheet 
	(Rule Properties -> Classification -> Advanced -> Additional Classification Parameters) can be used
	to pass parameters to the classifier. This classifier accepts any number of parameters in the following format:

	<key/value> = <search-word-1/search-word-2>
	
	The classifier searches file content for both the keys and the values.
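
	The parameter handling described above can be sketched in portable C++. This is a minimal
	illustration, not the sample's actual code: the helper names CollectSearchWords and
	ContainsAnyWord are hypothetical, and the sketch assumes each parameter reaches the classifier
	as a single "key=value" string and that matching is a plain case-sensitive substring search.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the sample): flattens "key=value" parameter
// strings into one list of search words, taking both the key and the value.
std::vector<std::wstring> CollectSearchWords(const std::vector<std::wstring>& parameters)
{
    std::vector<std::wstring> words;
    for (const std::wstring& parameter : parameters)
    {
        const std::size_t separator = parameter.find(L'=');
        if (separator == std::wstring::npos)
        {
            // Assumption: a parameter without '=' is treated as a single word.
            if (!parameter.empty()) words.push_back(parameter);
            continue;
        }
        const std::wstring key = parameter.substr(0, separator);
        const std::wstring value = parameter.substr(separator + 1);
        if (!key.empty()) words.push_back(key);
        if (!value.empty()) words.push_back(value);
    }
    return words;
}

// Hypothetical helper: returns true if the text contains at least one of the
// search words (case-sensitive substring match).
bool ContainsAnyWord(const std::wstring& text, const std::vector<std::wstring>& words)
{
    for (const std::wstring& word : words)
    {
        if (text.find(word) != std::wstring::npos) return true;
    }
    return false;
}
```

	With the parameter "foo=bar", CollectSearchWords yields the two words foo and bar, and
	ContainsAnyWord reports a match for any text containing either of them.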

Languages
     This sample is available in the following language implementation:
     C++
     
Files

	ContentBasedClassificationModule.vcproj
		This is the main project file for VC++ projects generated using an Application Wizard. 
		It contains information about the version of Visual C++ that generated the file, and 
		information about the platforms, configurations, and project features selected with the
		Application Wizard.

	ContentBasedClassificationModule.idl
		This file contains the IDL definitions of the type library, the interfaces
		and co-classes defined in your project.
		This file will be processed by the MIDL compiler to generate:
			C++ interface definitions and GUID declarations (ContentBasedClassificationModule.h)
			GUID definitions                                (ContentBasedClassificationModule_i.c)
			A type library                                  (ContentBasedClassificationModule.tlb)

	ContentBasedClassificationModule.h
		This file contains the C++ interface definitions and GUID declarations of the
		items defined in ContentBasedClassificationModule.idl. It will be regenerated by MIDL during compilation.

	ContentBasedClassificationModule.cpp
		This file contains the object map and the implementation of your DLL's exports.

	ContentBasedClassificationModule.rc
		This is a listing of all of the Microsoft Windows resources that the
		program uses.

	ContentBasedClassificationModule.def
		This module-definition file provides the linker with information about the exports
		required by your DLL. It contains exports for:
			DllGetClassObject  
			DllCanUnloadNow    
			GetProxyDllInfo    
			DllRegisterServer	
			DllUnregisterServer

	StdAfx.h, StdAfx.cpp
		These files are used to build a precompiled header (PCH) file
		named ContentBasedClassificationModule.pch and a precompiled types file named StdAfx.obj.

	Resource.h
		This is the standard header file that defines resource IDs.
		
	ContentBasedClassifier.h
		This contains the class that implements the IFsrmClassifierModuleImplementation interface,
		enabling this classifier to become a part of the FSRM classification pipeline. It uses the FsrmTextReader
		to load an IFilter based on a file's extension and parse the file's data streams to obtain text chunks. It
		can parse and search for text in non-text files such as DOCX, XLS, or any file that has a registered IFilter (persistent handler).
		
	install.cmd/register_app.vbs/registerwithfsrm.vbs
		These scripts register the DLLs, register with COM+ (for ease of debugging), and register with FSRM, respectively.
		registerwithfsrm.vbs contains the CLSID of the classifier, as well as the hosting model (external vs. local server).
		For ease of debugging, the script registers the server as external, and the process running the classifier
		can be identified as the service with the name of the classifier, FsrmSampleIFilterClassifier.
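
	As noted for ContentBasedClassifier.h, the classifier obtains a file's text in chunks. One
	consequence is that a search word may be split across two consecutive chunks. The following
	portable C++ sketch shows one way to handle that by carrying a short tail of the previous
	chunk forward. It is an illustration only: the class ChunkedWordMatcher is hypothetical, it
	does not use the FsrmTextReader or IFilter APIs, and it assumes consecutive chunks are
	contiguous text, which may not hold for every IFilter.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical streaming matcher (not part of the sample): feed it text chunks
// in order and it reports whether any search word has been seen so far,
// including words that straddle a chunk boundary.
class ChunkedWordMatcher
{
public:
    explicit ChunkedWordMatcher(std::vector<std::wstring> words)
        : m_words(std::move(words))
    {
        for (const std::wstring& word : m_words)
        {
            m_longestWord = std::max(m_longestWord, word.size());
        }
    }

    // Processes the next chunk; returns true once any word has been found.
    bool Feed(const std::wstring& chunk)
    {
        if (m_found) return true;

        // Search the retained tail of earlier text plus the new chunk.
        const std::wstring window = m_tail + chunk;
        for (const std::wstring& word : m_words)
        {
            if (!word.empty() && window.find(word) != std::wstring::npos)
            {
                m_found = true;
                break;
            }
        }

        // Keep only enough trailing text to complete a word split at the boundary.
        const std::size_t keep = m_longestWord > 0 ? m_longestWord - 1 : 0;
        m_tail = window.size() > keep ? window.substr(window.size() - keep) : window;
        return m_found;
    }

    bool Found() const { return m_found; }

private:
    std::vector<std::wstring> m_words;
    std::wstring m_tail;
    std::size_t m_longestWord = 0;
    bool m_found = false;
};
```

	For example, feeding the chunks "hello f" and "oo world" to a matcher built with the words
	foo and bar reports a match on the second chunk, even though neither chunk contains foo on its own.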
 
Prerequisites

	Windows Server 2008 R2

Building the Sample

To build the sample using the command prompt:
=============================================
     1. Open the Command Prompt window and navigate to the  directory.
     2. Type msbuild ContentBasedClassificationModule.sln


To build the sample using Visual Studio 2005 (preferred method):
================================================================
     1. Open Windows Explorer and navigate to the  directory.
     2. Double-click the icon for the ContentBasedClassificationModule.sln (solution) file to open the file in Visual Studio.
     3. In the Build menu, select Build Solution. The application will be built in the default \Debug or \Release directory.

Installing the Sample

	Run the attached install.cmd script.

	This script will need to be edited if any changes to the classifier or its hosting model are required.
	The script does the following:
		Installs the binaries to %systemdrive%\FsrmSampleIFilterClassifier\
		Registers the DLL (regsvr32 /s ContentBasedClassificationModule.dll)
		Registers with COM+ (register_app.vbs)
		Registers this classifier with the FSRM pipeline (registerwithfsrm.vbs)
		

Running the Sample

To run the sample:
==================
     1. Run File Server Resource Manager (FSRM.msc).
     2. Go to Classification Management -> Classification Properties.
     3. Define a custom property.
     4. Go to Classification Rules.
     5. Create a new rule, provide a rule name and scope, and on the Classification tab select 'Sample IFilter Based Classifier' as
        the classification mechanism.
     6. Click Advanced and go to Additional Classification Parameters.
     7. Specify foo, bar as a <name,value> pair in the data grid.
     8. Click OK.
     9. Create several files (using any application, for example DOC, DOCX, or XLS files) in the scope directory, some containing the word foo and/or bar as content.
     10. Select 'Run Classification Now' from the Actions pane.
     11. The files containing 'foo' and/or 'bar' in their contents are classified.

