Accuracy of machine learning

Machine Learning | TRITON AP-DATA | v8.3.x | 15-Dec-2016

The ability of the system to classify data accurately depends largely on the examples that you provide. If the system fails to find enough common elements among them, the results from machine learning may not be accurate. Should this happen, the system performs another stage of validation to assess the level of false positives (unintended matches) and false negatives (undetected matches) on new data that was not used during the training phase, sometimes referred to as "zero-day documents."

If the "recall" level of the classifier (i.e., the number of "true positives" divided by the sum of true positives and false negatives in the new data) is below 70 percent, the system returns a FAIL message that includes the likely reason that classification failed. Examples of these error messages follow:


Error Code	Error Message
DSCV_ERR_-420_CODE	There are not enough examples in your positive examples folder. X were provided and at least Y are required. Please add more examples then restart the machine learning process.
DSCV_ERR_-421_CODE	There are not enough examples in your negative examples folder. X were provided and at least Y are required. Please add more examples then restart the machine learning process.
DSCV_ERR_-422_CODE	The files in your positive examples folder don't contain enough text. Of X files provided, only Y have enough text. At least Z are required. Please update the files or point to another folder, then restart the machine learning process.
DSCV_ERR_-423_CODE	The files in your negative examples folder don't contain enough text. Of X files provided, only Y have enough text. At least Z are required. Please update the files or point to another folder, then restart the machine learning process.
DSCV_ERR_-424_CODE	Your positive and negative examples are too similar. No significant difference in words distribution was found. Please provide new examples.
DSCV_ERR_-425_CODE	Your positive and negative examples are too similar, or your positive examples may not be consistent enough to draw conclusions. There were bad error rates on both training X and validation Y. Use different example folders in the classifier.
DSCV_ERR_-426_CODE	The examples you provided were not sufficient for accurate training. Though the accuracy of the training set is good X, the machine learning process cannot make accurate conclusions on unseen data Y. Your positive examples may not be homogeneous enough. Please provide more consistent examples then restart the machine learning process.
DSCV_ERR_-427_CODE	Your examples don't fit the content type you specified. You provided X positive examples, but only Y of them fit the type.
DSCV_ERR_-428_CODE	The files in your example folders don't contain enough meaningful text (only X words). Please add files with more meaningful content or point to other folders, then restart the machine learning process.
DSCV_ERR_-429_CODE	More than one file in your examples folders doesn't contain enough text (only X words). Please update the files or point to other folders, then restart the machine learning process.
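
The recall check described before the table can be illustrated with a minimal sketch. It uses the standard definition of recall and the 70 percent cut-off stated above; the function names and the constant are hypothetical and are not part of the product, whose internal validation logic is not documented here.

```python
# Illustrative sketch only: names below are hypothetical, not a product API.

RECALL_THRESHOLD = 0.70  # the 70 percent cut-off mentioned in the text


def recall(true_positives: int, false_negatives: int) -> float:
    """Standard recall: TP / (TP + FN)."""
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return true_positives / total_positives


def validation_passes(true_positives: int, false_negatives: int,
                      threshold: float = RECALL_THRESHOLD) -> bool:
    """Return False when recall falls below the threshold (the FAIL case)."""
    return recall(true_positives, false_negatives) >= threshold


if __name__ == "__main__":
    # 69 of 100 held-out positive documents detected: recall is below 70 percent.
    print(validation_passes(true_positives=69, false_negatives=31))
    # 85 of 100 detected: recall is above the cut-off.
    print(validation_passes(true_positives=85, false_negatives=15))
```

Note that false positives do not enter the recall calculation; they affect precision, which is a separate measure.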

By adjusting the sensitivity level of the classifier, you can reduce the number of false negatives (undetected matches) at the cost of more false positives (unintended matches), accept some false negatives to reduce the rate of false positives, or find an acceptable balance between the two. Factors influencing your choice include the level of commonality in your positive set of examples (a low level tends to decrease accuracy), the business implications of false positives, and the resources that you have available to deal with false positives.
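
The trade-off can be illustrated with a generic sketch in which a classifier assigns each document a score and a threshold decides what counts as a match. This is an assumption made for illustration: the product's actual scoring method and sensitivity scale are not described here, and the scores, labels, and function name below are invented. In this model, a higher sensitivity corresponds to a lower threshold.

```python
# Illustrative sketch only: the scores, labels, and threshold model are assumptions.
from typing import Sequence, Tuple


def count_errors(scores: Sequence[float], labels: Sequence[bool],
                 threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for a given match threshold."""
    false_positives = sum(
        1 for score, is_positive in zip(scores, labels)
        if score >= threshold and not is_positive
    )
    false_negatives = sum(
        1 for score, is_positive in zip(scores, labels)
        if score < threshold and is_positive
    )
    return false_positives, false_negatives


# Made-up classifier scores: the first four documents are true positives,
# the last four are true negatives.
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.55, 0.30, 0.10]
labels = [True, True, True, True, False, False, False, False]

if __name__ == "__main__":
    for threshold in (0.35, 0.60, 0.75):
        fp, fn = count_errors(scores, labels, threshold)
        print(f"threshold={threshold:.2f}  false positives={fp}  false negatives={fn}")
```

On this sample data, the lowest threshold misses nothing but flags two unintended matches, the highest threshold flags nothing unintended but misses two documents, and the middle setting produces one error of each kind.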

Copyright 2016 Forcepoint LLC. All rights reserved.