How Forcepoint machine learning works

Machine Learning | TRITON AP-DATA | v8.3.x | 15-Dec-2016

Supervised machine learning for data protection generally requires two types of examples: content that needs to be protected and counterexamples. The former are usually referred to as "positive" examples and the latter as "negative" examples. Counterexamples are documents that are thematically related to the positive set but are not meant to be protected, such as public patents (as opposed to drafts of patent applications) or non-proprietary source code (as opposed to proprietary source code).

However, finding enough documents for the negative set can be difficult and labor-intensive, in part because the set must be checked to ensure that it contains no positive examples. Forcepoint has therefore developed methods that allow the system to use a generic ensemble of documents as counterexamples to the positive set. (See Negative examples consisting of "All documents" examples and Positive examples.)

For text-based data, some of the algorithms automatically create an optimal "weighted dictionary" that assigns positive weights to terms and phrases that are more likely to be included in the positive set and negative weights to terms and phrases that are more likely to be included in the negative set. The algorithms also find an optimal threshold. When the weighted sum of the terms that are found in a given document is greater than that threshold, the algorithm decides that the document belongs to the positive set. The assumption is that positive examples are more likely to have common themes.
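The weighted-dictionary decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Forcepoint's implementation: the terms, weights, threshold, and the function names `score` and `is_positive` are all hypothetical, and counting every occurrence of a term is an assumption. In the product, the weights and threshold are described as being learned from the example sets rather than written by hand.

```python
import re

# Hypothetical weights, invented for illustration only.
# Positive values favor the protected ("positive") set;
# negative values favor the counterexample ("negative") set.
WEIGHTED_DICTIONARY = {
    "claim": 1.5,
    "draft": 2.0,
    "confidential": 2.5,
    "granted": -1.5,
    "published": -2.0,
}

# Hypothetical threshold; the text describes this value as learned.
THRESHOLD = 2.0


def score(text, weights=WEIGHTED_DICTIONARY):
    """Return the weighted sum of the dictionary terms found in the text.

    Assumption: each occurrence of a term contributes its weight once.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


def is_positive(text, weights=WEIGHTED_DICTIONARY, threshold=THRESHOLD):
    """Classify the text as positive when its score exceeds the threshold."""
    return score(text, weights) > threshold
```

With these made-up values, `is_positive("Confidential draft of claim 1")` returns `True` because the three matching terms sum to 6.0, while `is_positive("Published patent, granted 2014")` returns `False` because its score is -3.5.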

Most machine learning algorithms are designed to be used with several hundred or several thousand positive and negative examples and require "clean" data, that is, data that is correctly labeled. Forcepoint machine learning, however, uses different algorithms for different data sizes and attempts to match the type of algorithm to the size of the data automatically.

In addition, Forcepoint machine learning algorithms can detect "outliers" among a set of positive examples. These are examples that should probably not be labeled "positive." Forcepoint algorithms also allow learning to take place even when negative examples are not provided.
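This page does not say how outliers are detected. As a rough illustration of the general idea only, the sketch below flags a positive example when its average bag-of-words cosine similarity to the other positive examples is low. The technique, the function names, and the `min_similarity` cutoff are assumptions chosen for the example and should not be read as Forcepoint's method.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic tokens in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def find_outliers(documents, min_similarity=0.2):
    """Return indices of documents that look unlike the rest of the set.

    A document is flagged when its mean similarity to all other documents
    falls below min_similarity (a hypothetical cutoff).
    """
    vectors = [term_counts(doc) for doc in documents]
    outliers = []
    for i, vec in enumerate(vectors):
        others = [cosine(vec, other) for j, other in enumerate(vectors) if j != i]
        if others and sum(others) / len(others) < min_similarity:
            outliers.append(i)
    return outliers


positives = [
    "draft patent claim for sensor array",
    "patent claim draft covering sensor housing",
    "draft claim for patent on sensor calibration",
    "cafeteria lunch menu for friday",
]
suspect_indices = find_outliers(positives)
```

In this toy set, the cafeteria menu shares almost no vocabulary with the three patent drafts, so it is the only document flagged. A reviewer could then decide whether it was labeled "positive" by mistake.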

Copyright 2016 Forcepoint LLC. All rights reserved.