Adding a dictionary classifier
Administrator Help | Forcepoint DLP | Version 8.7.x
Use the Patterns & Phrases > Dictionary Properties page in the Data Security module of the Forcepoint Security Manager to create a dictionary classifier from scratch or to edit an existing one.
A dictionary is a container for words and expressions belonging to the same language.
Policies can include a combination of classifier types. For example, a policy might include a regex classifier that identifies alphanumerical sequences found in part numbers, as well as a custom dictionary of part names to further identify risk. This helps to reduce false positives.
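The combination described above can be sketched in a few lines. This is only an illustration of the idea, not how Forcepoint DLP evaluates policies: the part-number format, the part names, and the function name is_risky are all hypothetical.

```python
import re

# Hypothetical part-number format: two uppercase letters, a hyphen, five digits.
PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{5}\b")

# Hypothetical custom dictionary of part names (stored in lowercase).
PART_NAMES = {"rotor housing", "valve seat"}


def is_risky(text: str) -> bool:
    """Flag text only when BOTH the regex and the dictionary match."""
    has_part_number = PART_NUMBER.search(text) is not None
    lowered = text.lower()
    has_part_name = any(name in lowered for name in PART_NAMES)
    return has_part_number and has_part_name
```

Requiring both matches is what reduces false positives in this sketch: a string that merely looks like a part number, such as an invoice reference, is not flagged unless a known part name also appears.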
To access the Dictionary Properties page:
* To create a dictionary classifier from scratch, select New > Dictionary in the toolbar at the top of the Patterns & Phrases page.
To define or update the dictionary:
1. Enter a Name for this dictionary, such as Diseases.
2. Enter a Description for this dictionary, such as Disease terminology.
3. Under List of phrases to include, use the Phrase field to enter a word or phrase, then click Add.
Repeat this for each phrase until your list is complete. These phrases, when found in the content, affect whether the content is considered suspicious.
4. For each phrase, select a Weight, from -999 to 999. Combined with a threshold, the weight determines how many instances of a phrase can be present, in relation to other phrases, before a policy is triggered.
For example, if the threshold is 100 and a phrase's weight is 10, an email message, Web post, or other destination can have 9 instances of that phrase (a total weight of 90) before a policy is triggered, provided no other phrases are matched. If phrase A has a weight of 10 and phrase B has a weight of 5, 5 instances of phrase A together with 10 instances of phrase B (a total weight of 100) will trigger the policy.
The system also deducts the weights of excluded terms. Matches that should be excluded, and are therefore not considered breaches, are not counted in the weight total.
By default, if no weight is assigned, each phrase is given a weight of 1.
Thresholds are defined on the policy's Condition tab.
5. The text file must be in UTF-8 format. In the text file:
"confidential",5
"ProjectX",8
"ProjectY",3
6. Indicate whether the phrases in this dictionary are case-sensitive.
7. If you are editing a predefined dictionary, click Exclude to exclude certain values from the classifier, then:
* Define the regex Pattern to exclude. Click the "i" icon for a list of valid values.
* Enter a List of phrases to exclude, separated by commas. Click Add to add them to the list. These phrases, when found in combination with the dictionary, affect whether the content is considered suspicious. Click Remove to remove selected strings from the list.
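The weight arithmetic, the phrase file layout, and the case-sensitivity option described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function names are hypothetical, phrases are matched as plain substrings, exclusions are not modeled, and the rule that a policy triggers once the total reaches the threshold is inferred from the worked examples in the weight step.

```python
import csv
import io


def parse_phrase_lines(text: str) -> dict:
    """Parse lines such as "ProjectX",8 into a phrase-to-weight mapping."""
    weights = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0]:
            continue  # skip blank lines and empty phrases
        # A phrase without a weight gets the documented default of 1.
        has_weight = len(row) > 1 and row[1].strip()
        weights[row[0]] = int(row[1]) if has_weight else 1
    return weights


def total_weight(content: str, weights: dict, case_sensitive: bool = False) -> int:
    """Sum weight * occurrences over all phrases (plain substring counting)."""
    haystack = content if case_sensitive else content.lower()
    total = 0
    for phrase, weight in weights.items():
        needle = phrase if case_sensitive else phrase.lower()
        total += weight * haystack.count(needle)
    return total


def triggers(content: str, weights: dict, threshold: int,
             case_sensitive: bool = False) -> bool:
    # Assumption inferred from the documented examples: the policy
    # triggers once the total weight reaches the threshold.
    return total_weight(content, weights, case_sensitive) >= threshold
```

With a single phrase of weight 10 and a threshold of 100, nine occurrences total 90 and triggers returns False, while the tenth brings the total to 100 and it returns True, which matches the example given for weights.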

Copyright 2020 Forcepoint. All rights reserved.