Principles

Traditional machine learning grew out of pattern recognition and the theory that computers can learn from data without being explicitly programmed to perform specific tasks. This theory is also known as Computational Learning Theory (CoLT). Its goal is to aid the design of better automated learning methods and to understand fundamental issues in the learning process itself (Aggrwal and Bhatia, 2015).

The other discipline that guides ML is Statistical Learning Theory (SLT). CoLT studies learnability, that is, which functions or features are necessary to make a given task learnable by an algorithm, whereas SLT focuses on making predictions and on studying and improving the accuracy of the model (Luxburg and Scholkopf, 2008).

According to Schelldorfer (2019), the principles applied to machine learning models are:

Data-Related Principles
  • Choice of appropriate data features: Selecting the features that contribute most to the prediction variable or output. This helps ensure high accuracy of the model.
  • Data quality and governance: Data quality refers to the preprocessing of data, such as data cleansing and checks on data validity, from which several candidate models can be created; governance deals with security and privacy, integrity, usability, integration, compliance, availability, roles and responsibilities, and the overall management of internal and external data flows within an organization.
  • Feature engineering: Feature engineering, the process of creating new input features for machine learning, is one of the most effective ways to improve predictive models. Through feature engineering, one can isolate key information, highlight patterns, and bring in domain expertise.
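To make the data-related principles concrete, the following is a minimal sketch of cleansing and feature engineering in Python with pandas; the column names and derived features (a loss ratio and a claim month) are illustrative assumptions, not taken from the source.

```python
import pandas as pd

# Hypothetical raw data (column names are illustrative assumptions)
df = pd.DataFrame({
    "claim_amount": [1200.0, 300.0, 4500.0],
    "policy_premium": [600.0, 400.0, 900.0],
    "claim_date": pd.to_datetime(["2019-01-15", "2019-06-03", "2019-11-20"]),
})

# Data quality: basic cleansing - drop rows with missing values
df = df.dropna()

# Feature engineering: derive new inputs that expose domain structure
df["loss_ratio"] = df["claim_amount"] / df["policy_premium"]  # severity relative to premium
df["claim_month"] = df["claim_date"].dt.month                 # captures seasonal patterns

print(df[["loss_ratio", "claim_month"]])
```

The derived columns, rather than the raw ones, are what would be fed to the model, which is where isolating key information and encoding domain expertise pays off.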
Model Development Principles
  • Performance metrics: Different performance metrics are used to evaluate different machine learning algorithms. For example, for a classifier used to distinguish between images of different objects, classification metrics such as log-loss, accuracy, and AUC can be used.
  • Model validation: In machine learning, model validation refers to the process of evaluating a trained model on a testing data set. The testing data set is a separate portion of the same data set from which the training set is derived.
  • Model calibration: Calibration is the iterative process of comparing the model with the real system and revising the model as necessary, comparing again, until the model is accepted (validated).
  • Model uncertainty: Applied machine learning requires getting comfortable with uncertainty, that is, working with imperfect or incomplete information. The response to uncertainty is to systematically evaluate different solutions until a good, or good-enough, set of features and/or algorithm is discovered for a specific prediction problem.
  • Robustness: Robustness describes a model's, test's, or system's ability to perform effectively while its variables or assumptions are altered, so that it operates without failure under a variety of conditions. In general, a robust system can handle variability and remain effective.
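The model-development principles of validation and performance metrics can be sketched as follows with scikit-learn; the choice of data set (a built-in sample) and of logistic regression as the classifier are assumptions for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score

# Model validation: hold out a separate testing portion of the same data set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train on the training portion only
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Performance metrics: evaluate the trained model on the held-out test set
proba = model.predict_proba(X_test)[:, 1]
preds = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds):.3f}")
print(f"Log-loss: {log_loss(y_test, proba):.3f}")
print(f"AUC:      {roc_auc_score(y_test, proba):.3f}")
```

Computing the metrics on the held-out test set, never on the training data, is what makes the evaluation a validation of the model rather than a measure of how well it memorized its training examples.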
Usage Principles
  • Fit for purpose
  • Explainability
  • Recalibration
  • Change Management
Governance Principles
  • Reproducibility and Auditability
The principles and goals above define the criteria that should be considered while developing a machine learning algorithm. I will therefore focus on applying these principles while creating the model for this project and on confirming that all the goals have been achieved.