Final Model
Principles and theories that were explained earlier, were considered while developing a model. This section includes final model demonstration along with principles answering why and where they are included into the model.
Data related principle focuses on choice of appropriate data features for the construction of a model. For the sentiment analysis, it is very important to consider opinions or reviews of the people. So, the datasets that has been collected includes reviews of different movies from Imdb.
The next principles are data quality and feature engineering. Data quality is essential because with high quality data, the accuracy, efficiency is increased lowering the risk in the desired outcome. Thus, data is cleansed and anomalies in the data are removed to ensure the quality of the data.
The next principles are data quality and feature engineering. Data quality is essential because with high quality data, the accuracy, efficiency is increased lowering the risk in the desired outcome. Thus, data is cleansed and anomalies in the data are removed to ensure the quality of the data.
Whereas, feature engineering is referred to observe single factor of the phenomenon being observed. This step allows you to represent underlying structure of data and create best model.
Model development principles is applied or considered while developing a model. Splitting of given dataset to train data to make the model learn about the reviews and test data to evaluate the accuracy of the model answers the use of model validation.
The Naive Bayes Classifier is used to train the data and test the accuracy of the model. The classifier also predicts the ratio of words if it is likely to be present in positive or negative review. Moreover, the comparison of Naive Bayes Classifier and Multi-nominal Naive Bayes Classifier is also given to check maximum accuracy of algorithms. The comparison of the classifier's accuracy gives answer to performance metrics principle.
Naive Bayes Classifier
Multi-nominal Naive Bayes Classifier
Now, after training the model and testing the model for accuracy, user input is checked for positive and negative reviews. With the highest accuracy rate of Multi-nominal classifier, multi-nominal classifier is used to produce the results. Usage principle is considered here. As seen in the figure, the model is able to distinguish positive and negative reviews. Thus, it full fills the principles of fit for purpose.
The governance principles states about reproducibility that refers to using existing data and recreating the same results using the described methods. Following the same approach, I believe results will not vary and have same outcome as demonstrated above.