Patterns & Phrases

Administrator Help | TRITON AP-DATA | Version 8.3.x

Related topics:

Adding or editing a regular expression classifier

Adding a key phrase classifier

Adding a dictionary classifier

File properties

To view or manage a list of content classifiers based on patterns:

Click Main > Policy Management > Content Classifiers.

Select Patterns & Phrases. Both user-defined and built-in patterns are shown. These are distinguished by the icons and the Type column. You can sort the list by this column. Refer to Predefined Classifiers for details about each Pattern & Phrase classifier.

Also shown are the existing dictionary and key phrase classifiers, if any.

Click New to add a new regular expression, key phrase, or dictionary; click Delete to delete the selected classifier; or click Where Used to view where the classifier is used. The Used in a Policy column indicates whether the classifier is used in any policy.

RegEx patterns are special text strings that describe search patterns to be detected within content. (Content includes the body of a transaction as well as any attachments.) You define the patterns to look for in content, and you set the action to take when a pattern is found.

For example, the string "a\d+" matches all strings that start with the letter "a" and are followed by at least one digit, where "\d" represents any digit and "+" represents "at least one." When the extracted text from a transaction is scanned, TRITON AP-DATA uses regular expressions to find strings in the text that match patterns for confidential information. For example, this is a very basic regular expression for catching Visa credit card numbers:

\b(4\d{3}[\-\\]\d{4}[\-\\]\d{4}[\-\\]\d{4})\b
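As an illustration only, the two expressions above can be tried out in Python's standard `re` module. This is a minimal sketch: the helper name `find_matches` and the sample strings are hypothetical, and Python's regular expression engine is not necessarily the one the product uses, so behavior may differ in edge cases.

```python
import re

# The two patterns quoted in the text, written as Python raw strings.
LETTER_A_DIGITS = re.compile(r"a\d+")
VISA_BASIC = re.compile(r"\b(4\d{3}[\-\\]\d{4}[\-\\]\d{4}[\-\\]\d{4})\b")


def find_matches(pattern, text):
    """Return every substring of text that the compiled pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]


# "a1" and "a42" match; "ab" does not, because no digit follows the "a".
print(find_matches(LETTER_A_DIGITS, "a1 ab a42"))

# Only the number starting with 4 and separated by hyphens is caught.
print(find_matches(VISA_BASIC, "Card 4111-1111-1111-1111, ref 5111-1111-1111-1111"))
```

Note that, as written, the separator class in the Visa pattern accepts a hyphen or a backslash, so a number whose groups are separated by spaces is not matched by this sketch.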

Because a regular expression file contains many internal attributes, if it is improperly written it can create many false-positive incidents, slow down the system, and impede analysis.

One way to mitigate false positives in a pattern is to exclude certain values that falsely match it. When defining the classifier, you can define a Pattern to exclude, listing words or phrases that are exceptions to the pattern rule (for example, search for all Social Security numbers except certain numbers that look like Social Security numbers but are not).

You can also add a List of phrases to exclude, listing words or phrases that, when found in combination with the pattern, affect whether or not the content is considered suspicious.

Another way to mitigate false positives is to consider the pattern suspicious only when some other pattern or set of words appears in the analyzed data. To do this, you create another content classifier (a pattern, dictionary, or any other type), and combine the two in the condition of your rule with an AND operator.

When creating a rule for your policy, you can specify how many instances (matches) of the pattern must be found before the content is considered suspicious enough for the action to be taken (for example, 2 Social Security numbers seems reasonable, but 4 is already suspect). You do this on the Condition tab of the Rule Properties sheet.

For each content transmission, the system tallies the number of instances in which the pattern was found in the content.

If the number of pattern matches is less than the number of matches set, the content is not considered suspicious and there is no further analysis.

If the number of pattern matches is equal to or greater than the number of matches set, the content triggers the action specified in the rule that uses this pattern.

Example:

The pattern is Social Security numbers and the number of matches is 4. The body of an email contains 3 Social Security numbers; the subject contains 2 Social Security numbers. Since there were 5 pattern matches, and this is greater than the number of set matches, the message triggers the action specified in the rule that uses this pattern.
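The tally-and-compare logic described above can be sketched as follows. This is not the product's implementation: the simplified Social Security number pattern, the function names, and the sample email text are all assumptions made for illustration.

```python
import re

# Assumed, simplified SSN pattern (NNN-NN-NNNN); the built-in classifier
# in the product is likely more elaborate.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def count_matches(parts):
    """Tally pattern matches across all parts of one transmission."""
    return sum(len(SSN.findall(part)) for part in parts)


def triggers_action(parts, threshold):
    """True when the tally is equal to or greater than the threshold."""
    return count_matches(parts) >= threshold


# Hypothetical email: 2 matches in the subject, 3 in the body.
subject = "SSNs 111-22-3333 and 222-33-4444"
body = "Also 333-44-5555, 444-55-6666, 555-66-7777"

print(count_matches([subject, body]))                 # 5
print(triggers_action([subject, body], threshold=4))  # True
```

With the threshold raised to 6, the same transmission would fall below the number of matches set and would not trigger the action.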

When a pattern to exclude is added

You can define a list of exceptions to the pattern. This is a list of content that matches the pattern but should not be counted in the tally of pattern matches. For each content transmission, the system tallies the number of instances in which the pattern was found in the content, subtracts the number of pattern matches that appear in the Exclude list, and compares the result with the number of matches set.

Example:

The pattern is Social Security numbers, the number of matches is 2, and the list of excluded patterns is: 111-11-1111, 222-22-2222, and 333-33-3333 (a total of three in the excluded list). The email contains 7 Social Security numbers: 111-11-1111, 222-33-4444, 444-55-6666, 555-66-7777, 222-22-2222, 777-88-9999, 333-33-3333. The number of pattern matches is 7, minus the 3 excluded patterns that were found in the email, which equals 4. Since 4 is greater than the number of matches (2), the message triggers the action specified in the rule that uses this pattern.
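The arithmetic in this example can be checked with a short sketch. The simplified pattern and the `effective_count` helper are hypothetical, and every number is written in hyphenated NNN-NN-NNNN form so that the assumed pattern matches it; the product's own matching and exclusion logic may differ.

```python
import re

# Assumed, simplified SSN pattern used only for this illustration.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def effective_count(text, excluded):
    """Tally all pattern matches, then subtract those on the exclude list."""
    matches = SSN.findall(text)
    excluded_hits = sum(1 for m in matches if m in excluded)
    return len(matches) - excluded_hits


excluded = {"111-11-1111", "222-22-2222", "333-33-3333"}
email = (
    "111-11-1111, 222-33-4444, 444-55-6666, 555-66-7777, "
    "222-22-2222, 777-88-9999, 333-33-3333"
)

remaining = effective_count(email, excluded)
print(remaining)       # 7 matches - 3 excluded = 4
print(remaining >= 2)  # True: the number of matches (2) is reached
```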

When a list of phrases to exclude is added

You can add a String List of suspicious words to the pattern. When you do, for each content item transmitted, the action specified in the rule that uses this pattern is triggered only if the total number of pattern matches is equal to or greater than the number of matches set and a word from the specified String List was found. If the number of matches is reached but no words from the String List are present, no further analysis is performed.

Example:

The pattern is Social Security numbers, the number of matches is 2, and the String List contains the phrases "Social Security" and "credit card". The distributed content contains 3 Social Security numbers: 111-22-3333, 222-33-4444, 444-55-6666, but none of the phrases were found. Although the number of pattern matches found (3) is greater than the number of matches (2), there were no String List phrases in the email, so no action is taken.
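The combined condition in this example (enough pattern matches and at least one listed phrase) might be sketched like this. The simplified pattern, the function name, and the case-insensitive phrase comparison are assumptions for illustration, not documented product behavior.

```python
import re

# Assumed, simplified SSN pattern used only for this illustration.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def triggers_action(text, threshold, phrases):
    """Require both enough pattern matches and at least one listed phrase."""
    enough_matches = len(SSN.findall(text)) >= threshold
    # Assumption: phrases are compared case-insensitively.
    lowered = text.lower()
    phrase_found = any(phrase.lower() in lowered for phrase in phrases)
    return enough_matches and phrase_found


phrases = ["Social Security", "credit card"]
numbers_only = "111-22-3333, 222-33-4444, 444-55-6666"

# Three matches but no listed phrase: no action is taken.
print(triggers_action(numbers_only, 2, phrases))

# The same numbers accompanied by a listed phrase trigger the action.
print(triggers_action("Social Security numbers: " + numbers_only, 2, phrases))
```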