Selecting examples for training
Machine Learning | Forcepoint DLP | v8.4.x, v8.5.x, v8.6.x
Positive examples
For effective machine learning to occur, it is most important to select the best positive examples.
* Positive examples are samples of the type of content that should be protected.
* The examples must share commonalities, such as recurring terms, phrases, or themes.
Without these commonalities, the learning algorithm cannot find a way to categorize the data.
The required number of examples depends on the level of commonality. If the positive examples share many common terms that are otherwise rare, a small number of examples suffices. If the differences between the positive and negative sets are more subtle, more examples are required. A positive set typically consists of 100–200 text documents.
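
As a rough, product-independent way to gauge commonality before training, the following Python sketch counts how many terms appear in at least half of a candidate set of positive examples. It is only an illustration, not part of Forcepoint DLP, and the folder name positive_examples is a hypothetical placeholder.

from pathlib import Path
from collections import Counter

POSITIVE_DIR = Path("positive_examples")  # hypothetical folder of .txt samples

def tokens(path):
    # Lower-cased word tokens of one text file.
    return set(path.read_text(encoding="utf-8", errors="ignore").lower().split())

docs = [tokens(p) for p in POSITIVE_DIR.glob("*.txt")]
doc_count = len(docs)

# Number of documents each term appears in.
term_doc_freq = Counter(term for doc in docs for term in doc)

# Terms shared by at least half of the examples are a rough sign of commonality.
shared = [t for t, n in term_doc_freq.items() if n >= doc_count / 2]
print(f"{doc_count} documents, {len(shared)} terms appear in at least half of them")

If very few terms are shared, the positive set may be too heterogeneous for the learning algorithm to find a reliable way to categorize the data.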
Negative examples
Negative examples are samples of data that are semantically or thematically similar to the set of positive samples, but that should not be protected.
The size of this set of negative examples can be similar to the size of the positive set, although a larger set is preferable.
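
To check whether candidate negative examples are thematically close to the positive set, one possible approach is to compare TF-IDF vectors, as in the sketch below. This assumes scikit-learn is available, uses hypothetical folder names, and is not a Forcepoint DLP API.

from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def read_folder(folder):
    # Return file names and contents of every .txt file in a folder.
    paths = sorted(Path(folder).glob("*.txt"))
    texts = [p.read_text(encoding="utf-8", errors="ignore") for p in paths]
    return [p.name for p in paths], texts

_, positives = read_folder("positive_examples")         # hypothetical paths
neg_names, negatives = read_folder("negative_examples")

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(positives + negatives)
pos_vectors = matrix[: len(positives)]
neg_vectors = matrix[len(positives):]

# Average cosine similarity of each negative example to the positive set.
# Scores near zero suggest the document is off-topic rather than
# "semantically or thematically similar" to the protected content.
scores = cosine_similarity(neg_vectors, pos_vectors).mean(axis=1)
for name, score in zip(neg_names, scores):
    print(f"{name}: {score:.2f}")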
Negative examples consisting of "All documents"
To create a generic ensemble of documents that Forcepoint DLP machine learning can use as negative examples, select the path to a large folder containing a representative sample of documents from the organization. This folder can contain both positive and negative examples, but it should contain substantially more negative examples than positive ones.
The size of this set of counterexamples can be similar to the size of the positive set, although a larger set is recommended.
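
As a quick sanity check before selecting the folder, the following sketch compares the number of files in the generic folder with the number of positive examples. It is not a Forcepoint DLP tool, and both paths are hypothetical placeholders.

from pathlib import Path

POSITIVE_DIR = Path("positive_examples")                   # hypothetical path
ALL_DOCS_DIR = Path(r"\\fileserver\share\all_documents")   # hypothetical UNC path

positive_count = sum(1 for p in POSITIVE_DIR.rglob("*") if p.is_file())
all_docs_count = sum(1 for p in ALL_DOCS_DIR.rglob("*") if p.is_file())

print(f"{positive_count} positive examples, {all_docs_count} documents in the generic folder")
if all_docs_count < 2 * positive_count:
    print("Warning: the generic folder should contain substantially more "
          "negative examples than positive ones.")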

Copyright 2018 Forcepoint. All rights reserved.