Introduction to Machine Learning for TRITON AP-DATA : Selecting examples for training
Selecting examples for training
Machine Learning | TRITON AP-DATA | v8.3.x | 15-Dec-2016
Positive examples
For effective machine learning, the most important step is selecting the best positive examples. These are textual examples of the data that you want to protect. The documents in this set should be related to a common theme or share other commonalities; otherwise, the learning algorithm cannot find a way to categorize the data.
The required number of examples depends on how much the documents have in common. In general, if the positive examples share many terms that are otherwise very rare, a small number of examples suffices. If the differences between the positive and negative sets are more subtle, more examples are required. A positive set typically consists of 100-200 textual documents.
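The learning algorithm itself is not described in this section. As a rough intuition for why shared rare terms reduce the number of examples needed, the following Python sketch uses a simple bag-of-words scoring that is an assumption, not Forcepoint's method: it ranks terms that appear in most positive documents but seldom in negative ones. The function names, the `min_share` threshold, and the add-one smoothing are hypothetical illustration choices.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def document_frequency(docs: list[str]) -> Counter:
    """Count in how many documents each term appears."""
    df = Counter()
    for doc in docs:
        df.update(tokenize(doc))
    return df


def distinctive_terms(positive: list[str], negative: list[str],
                      min_share: float = 0.5, top_n: int = 10) -> list[tuple[str, float]]:
    """Rank terms common to the positive set but rare in the negative set.

    Hypothetical scoring: a term qualifies if it occurs in at least
    `min_share` of the positive documents; its score is the log-ratio of its
    add-one smoothed document frequency in the positive vs. negative set.
    """
    pos_df = document_frequency(positive)
    neg_df = document_frequency(negative)
    scores = []
    for term, count in pos_df.items():
        if count / len(positive) < min_share:
            continue  # not shared widely enough across the positive set
        pos_rate = (count + 1) / (len(positive) + 2)
        neg_rate = (neg_df[term] + 1) / (len(negative) + 2)
        scores.append((term, math.log(pos_rate / neg_rate)))
    # Highest score first; ties broken alphabetically for a stable result.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]


# Illustrative data only: drafts (positive) vs. public or unrelated text (negative).
positive = [
    "Project Falcon draft claim: rotor assembly",
    "Falcon rotor tolerances, internal draft",
    "Draft: Falcon rotor test results",
]
negative = [
    "Public patent: rotor assembly",
    "Quarterly sales report",
    "Public patent: turbine blade",
]
print([term for term, _ in distinctive_terms(positive, negative)])
# prints ['draft', 'falcon', 'rotor']
```

Here "draft" and "falcon" score highest because every positive document contains them and no negative one does, while "rotor" scores lower because it also appears in a public patent. When few such distinguishing terms exist, more examples are needed to separate the sets.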
Negative examples
Negative examples are samples of data that are semantically or thematically similar to the positive set but should not be protected, such as public patents versus drafts of patent applications, or non-proprietary source code versus proprietary source code. The negative set can be similar in size to the positive set, although a larger set is preferable.
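Before starting training, it can help to check the proposed sets against the sizing guidance in this section. The following Python sketch is a hypothetical pre-flight check, not a Forcepoint tool: `check_training_sets` and its thresholds simply encode the rules of thumb stated here (a positive set of roughly 100-200 documents and a negative set at least as large). It also flags files listed in both sets, which rests on the added assumption that an explicitly chosen negative example should never also be a positive one.

```python
from dataclasses import dataclass


@dataclass
class TrainingSetReport:
    positive_count: int
    negative_count: int
    warnings: list[str]


def check_training_sets(positive_paths: list[str], negative_paths: list[str],
                        typical_min: int = 100, typical_max: int = 200) -> TrainingSetReport:
    """Compare training-set sizes against this section's rules of thumb.

    The thresholds are heuristics reported as soft warnings, not hard limits.
    """
    # Deduplicate paths so repeated entries are not double-counted.
    pos_set, neg_set = set(positive_paths), set(negative_paths)
    warnings = []
    if len(pos_set) < typical_min:
        warnings.append(f"only {len(pos_set)} positive examples; "
                        f"{typical_min}-{typical_max} is typical")
    if len(neg_set) < len(pos_set):
        warnings.append(f"{len(neg_set)} negative vs {len(pos_set)} positive examples; "
                        "a larger negative set is preferable")
    overlap = pos_set & neg_set
    if overlap:
        warnings.append(f"{len(overlap)} file(s) appear in both sets")
    return TrainingSetReport(len(pos_set), len(neg_set), warnings)


# Illustrative paths only: 150 positive files, 100 negative files.
report = check_training_sets(
    [f"pos/{i}.txt" for i in range(150)],
    [f"neg/{i}.txt" for i in range(100)],
)
for warning in report.warnings:
    print("warning:", warning)
```

The check deliberately does not warn when the positive set exceeds 200 documents, because the stated range describes a typical set rather than a maximum.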
Using "All documents" as negative examples
To create a generic collection of documents that Forcepoint machine learning can use as negative examples (counterexamples) to the positive set, select the path to a large folder containing a representative sample of your organization's documents. This folder can contain both positive and negative examples; the underlying assumption is that it contains substantially more negative examples than positive ones. This set of counterexamples can be similar in size to the positive set, although a larger set is recommended.
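Because the "All documents" folder may include some positive examples, a rough contamination estimate can be useful before selecting it. The following Python sketch is an illustration under stated assumptions, not part of the product: `collect_counterexamples` and its default file suffixes are hypothetical, and it detects only byte-identical copies of known positive files, so the reported fraction is a lower bound on how many positive documents the folder contains.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def collect_counterexamples(folder: str, positive_files: list[str],
                            suffixes: tuple[str, ...] = (".txt", ".docx", ".pdf"),
                            ) -> tuple[list[Path], float]:
    """Gather candidate files under `folder` and estimate positive contamination.

    `suffixes` is a hypothetical default for which file types count as
    documents. Returns the candidate files and the fraction of them that are
    byte-identical to a known positive example (a lower bound only).
    """
    positive_digests = {file_digest(Path(p)) for p in positive_files}
    # rglob("*") walks subfolders recursively; keep only matching files.
    candidates = [p for p in Path(folder).rglob("*")
                  if p.is_file() and p.suffix.lower() in suffixes]
    if not candidates:
        return [], 0.0
    duplicates = sum(1 for p in candidates if file_digest(p) in positive_digests)
    return candidates, duplicates / len(candidates)


# Illustrative run on a throwaway folder with one copy of a positive file.
with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    shared = base / "shared"
    shared.mkdir()
    secret = base / "secret_draft.txt"
    secret.write_text("Falcon rotor draft")
    (shared / "memo.txt").write_text("Lunch menu")
    (shared / "report.txt").write_text("Quarterly sales")
    (shared / "copy_of_draft.txt").write_text("Falcon rotor draft")
    files, contamination = collect_counterexamples(str(shared), [str(secret)])
    print(f"{len(files)} candidate files, {contamination:.0%} identical to positive examples")
    # prints "3 candidate files, 33% identical to positive examples"
```

A small fraction is consistent with the assumption that negative examples greatly outnumber positive ones; a large fraction may indicate that a different folder would make a better counterexample set.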

Copyright 2016 Forcepoint LLC. All rights reserved.