TRITON - Data Security Help
Classifying Content > Database fingerprinting > Preparing for database fingerprinting > Creating a validation script
Fingerprinting cells that contain certain kinds of values, such as many short values, can lead to many false-positive incidents. Websense Data Security includes a mechanism that forwards database data to an external script for processing before it is fingerprinted.
Each database fingerprint classifier can use a validation script. The validation script receives an input file containing the raw database data in CSV format, and returns CSV data containing the information that should be fingerprinted.
Validation scripts must accept at least two parameters: an input path name and an output path name. A third parameter, containing a configuration file path name, is optional.
The input file, received from Data Security, is a CSV file with a header row containing the database column names. Each line is delimited by a standard Windows line break (CRLF), and all values are enclosed in double quotes. A sample package containing a sample input file, among other things, is available from Websense Technical Support.
The output file should use the same format as the input file, except that the line delimiter is CRCRLF (two carriage-return characters followed by a line-feed character) instead of CRLF. A sample output file is available in the same package as the sample input file.
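The script contract described above can be sketched as a minimal Python validation script. The contract has two required path parameters, an optional configuration path, a quoted CSV input with CRLF line breaks, and a CRCRLF output. This sketch is an illustration under stated assumptions, not the sample script from Websense Technical Support. The UTF-8 encoding is an assumption, because the encoding is not stated here. The name validate_row refers to a hypothetical hook that passes rows through unchanged. The non-zero exit code on failure follows the return-code rule given later on this page.

```python
import csv
import sys
from typing import List

ENCODING = "utf-8"  # assumption: the input file encoding is not documented on this page


def validate_row(row: List[str]) -> List[str]:
    """Hypothetical hook: return the values to fingerprint (this sketch passes rows through)."""
    return row


def main(argv: List[str]) -> int:
    # argv[1]: input path, argv[2]: output path.
    # argv[3], if present, is the configuration file path; this sketch ignores it.
    if len(argv) not in (3, 4):
        print("usage: <script> <input.csv> <output.csv> [config]", file=sys.stderr)
        return 2
    in_path, out_path = argv[1], argv[2]
    try:
        # newline="" lets the csv module handle the CRLF line breaks itself.
        with open(in_path, newline="", encoding=ENCODING) as src, \
             open(out_path, "w", newline="", encoding=ENCODING) as dst:
            reader = csv.reader(src)
            # Wrap every value in double quotes and end each line with CR CR LF.
            writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\r\r\n")
            header = next(reader, None)
            if header is None:
                raise ValueError("input file has no header row")
            writer.writerow(header)  # adjust if validate_row adds or removes columns
            for row in reader:
                writer.writerow(validate_row(row))
    except (OSError, csv.Error, ValueError) as exc:
        print(f"validation failed: {exc}", file=sys.stderr)
        return 1  # non-zero: Data Security stops fingerprinting and reports an error
    return 0  # zero: success


# When run as a standalone script, finish with: sys.exit(main(sys.argv))
```

To try the sketch outside Data Security, add the final sys.exit(main(sys.argv)) line. Then run it from the command line against the in.csv file from the sample package and inspect the output file it writes.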
1. Optionally, create a copy of the following files in the \ValidationScripts folder where Data Security was installed (typically C:\Program Files\Websense\Data Security\ValidationScripts).
2. Create a script file and name it using the following convention: <classifier-name>_validation.[bat|exe|py], where:
- <classifier-name> is the name of the classifier on which the script will be implemented. Alternatively, use the word "default" for a script that should be implemented on all classifiers that do not have a script named after them.
- bat is the extension for a batch file.
- exe is the extension for an executable.
- py is the extension for a Python script.
If the script requires a configuration file, name the configuration file using the following convention: <classifier-name>_validation.[ini|xml]
Place all files in the \ValidationScripts folder on the server where Data Security is installed (typically C:\Program Files\Websense\Data Security\ValidationScripts).
Every validation script must be an executable or a batch file. If the script depends on an infrastructure element, such as the Python interpreter, the operating system must be able to start that element automatically when the script is called. To make sure the correct file association is configured, Websense recommends running the script from the command line by its own name, without explicitly invoking any other executable.
Do not leave more than one script or configuration file with the same name but a different extension in the \ValidationScripts folder.
3. The script should receive two command-line parameters from Data Security: the full path of a source file that Data Security creates, and the full path where Data Security expects to find the destination file.
- The first line of the source file contains the names of the columns that are available for fingerprinting. The remaining lines contain the data in those columns.
- The destination file should be formatted in the same way as the source file, with the names of the columns that were fingerprinted on the first line. The number of columns may differ from the source file if your script adds or removes columns.
- Your script should exit with a return code of 0 if it succeeds, and a non-zero return code if there is a problem.
4. If you want your script to receive a configuration file, place it in the same folder as the script. Give it the same name as the script file, with an .xml or .ini extension (for example, default_validation.ini for default_validation.bat). If this file is found, Data Security supplies its path as the third parameter to your script.
5. Create and run the fingerprinting classifier as described in Creating a database fingerprint classifier. Name the classifier with the name given in step 2.
During the scan, if the crawler finds a script named <classifier-name>_validation.[bat|exe|py], it runs that script. If it does not, it searches for a script named default_validation.[bat|exe|py] and runs that instead.
If the crawler receives a non-zero return code from the script, the fingerprinting process stops and an appropriate error is returned. In this case, either fix the script or remove it, and then fingerprint again.
When Data Security finds a validation script, the Sample Data screen in the database fingerprinting wizard shows the validated data rather than the raw data extracted from the database or CSV file. (This screen opens from the Field Selection page of the wizard when you click View Sample Data.) You can use it to confirm that the validation script behaves as expected, and to see exactly which information is protected.
You can obtain a sample validation script from Websense Technical Support and modify it to suit your needs. The script covers the basic requirements of most customers, such as excluding NULL or single-character values from fingerprinting.
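The crawler's lookup order can be modeled in a few lines of Python. The crawler first looks for a classifier-specific script, then falls back to the default script. This sketch also reflects the warning against leaving several scripts with the same name and different extensions. It is an illustration of the documented behavior, not Data Security's implementation. The function name resolve_validation_script is hypothetical. Treating several matching extensions as an error is an assumption, because this page does not say which one the crawler would choose.

```python
from typing import Iterable, Optional

SCRIPT_EXTENSIONS = (".bat", ".exe", ".py")


def resolve_validation_script(classifier: str, filenames: Iterable[str]) -> Optional[str]:
    """Return the script the crawler would run for a classifier, per the documented order."""
    # Windows file names are case-insensitive, so index the folder listing in lower case.
    available = {name.lower(): name for name in filenames}
    for prefix in (classifier, "default"):
        candidates = [f"{prefix}_validation{ext}".lower() for ext in SCRIPT_EXTENSIONS]
        matches = [available[c] for c in candidates if c in available]
        if len(matches) > 1:
            # Assumption: the page does not say which extension wins, so report a conflict.
            raise ValueError(f"more than one validation script for {prefix!r}: {matches}")
        if matches:
            return matches[0]
    return None  # neither a classifier-specific nor a default script is present
```

For example, you can pass the result of os.listdir(r"C:\Program Files\Websense\Data Security\ValidationScripts") together with a classifier name. The function shows which script file would apply. Files such as validation_logic.py or default_validation.bat.sample do not match the naming convention and are ignored.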
The sample package contains the following files:
- default_validation.bat - Sample validation script
- validation_logic.py - Used by the sample validation script
- default_validation.ini - Sample configuration file
- default_validation.ini.sample - An additional sample configuration file
- dictionary.txt - Sample dictionary file
- in.csv - Sample input file
- out.csv - Sample output file
The first three files are also included in the Data Security installation package. In that package, the batch and .ini files carry an additional .sample extension.
The sample validation script is production grade and is suitable for many customers. To install it, copy the default_validation.bat, validation_logic.py, and default_validation.ini files into the \ValidationScripts folder in the Data Security installation folder (typically C:\Program Files\Websense\Data Security\ValidationScripts).
Although you can rename default_validation.bat and default_validation.ini according to the conventions described above, do not rename validation_logic.py. The validation_logic.py file must be present, in its original form, in the \ValidationScripts folder (typically C:\Program Files\Websense\Data Security\ValidationScripts).
The following additions and changes can be configured through the default_validation.ini configuration file:
- You can create a dictionary file that contains a list of strings for the validation script to remove. The file should be a line-delimited UTF-16 file. Enter its path name in the IgnoredDictionary configuration option in regular file system format (for example, c:\directory\dictionary.txt). You can create UTF-16 files in Windows Notepad by saving the text with 'Unicode' encoding. The default_validation.ini.sample file in the package is a sample file that contains such a definition, and dictionary.txt is a sample dictionary file.
- To validate the values of a specific column against a regular expression:
  1. Add the column name, in lower case, to the columns parameter. Separate column names with semicolons.
  2. Add a configuration section for the column by appending a [column-name] section header (again, in lower case) to the file.
  3. Under the newly added section header, add a RegExp parameter whose value is a regular expression.
  The default_validation.ini sample file contains this type of validation for email addresses and social security numbers, which you can use as a reference.
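The following Python sketch shows one way the filters described on this page could be applied to individual cell values. The filters are NULL and single-character removal, a UTF-16 dictionary of ignored strings, and a per-column RegExp read from a [column-name] section. It is not the logic in validation_logic.py. The function names are hypothetical, and several behaviors are assumptions not confirmed by this page:
- NULL arrives as the literal text "NULL".
- Dictionary entries must match exactly.
- A value is kept only if it fully matches its column's regular expression.

```python
import configparser
import re
from typing import Dict, Iterable, Set


def load_dictionary(path: str) -> Set[str]:
    """Read a line-delimited UTF-16 dictionary file into a set of strings to ignore."""
    # The "utf-16" codec uses the byte-order mark that Notepad writes for 'Unicode' files.
    with open(path, encoding="utf-16") as f:
        return {line.strip() for line in f if line.strip()}


def patterns_from_config(config: configparser.ConfigParser,
                         columns: Iterable[str]) -> Dict[str, "re.Pattern[str]"]:
    """Compile the RegExp option of each lower-case [column-name] section listed in columns."""
    patterns = {}
    for column in columns:
        section = column.strip().lower()
        if config.has_option(section, "RegExp"):
            patterns[section] = re.compile(config[section]["RegExp"])
    return patterns


def keep_value(column: str, value: str, ignored: Set[str],
               patterns: Dict[str, "re.Pattern[str]"]) -> bool:
    """Return True if the value should still be fingerprinted."""
    text = value.strip()
    if text.upper() == "NULL" or len(text) <= 1:
        return False  # assumption: NULL appears as the text "NULL"; also drops empty/1-char values
    if text in ignored:
        return False  # listed in the dictionary file (exact match assumed)
    pattern = patterns.get(column.lower())
    if pattern is not None and pattern.fullmatch(text) is None:
        return False  # assumption: the value must fully match its column's RegExp
    return True
```

When you load the .ini file, create the ConfigParser with interpolation=None so that "%" characters in a regular expression are read literally. The semicolon-separated columns value can be split with "email;ssn".split(";") before it is passed to patterns_from_config.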
Additional configuration options are available. Contact Websense Technical Support for further assistance.