Code

According to the latest TIOBE Programming Community Index, Python is one of the top 10 most popular programming languages of 2017. Python is a general-purpose, high-level programming language used for developing desktop GUI applications, websites, and web applications, as well as for data science and analysis. As a high-level language, Python lets you focus on the core functionality of an application by taking care of common programming tasks. Its simple syntax rules further make it easier to keep the code base readable and the application maintainable.

Functionalities:
  • Readable and maintainable code
  • Multiple programming paradigms
  • Compatibility with major platforms and systems
  • Robust standard library
  • Open-source frameworks and tools
  • Simplified complex software development
  • Support for test-driven development
Evidence:

Python version installed on the computer
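The installed version can also be confirmed from within the interpreter itself; a minimal check:

```python
import sys

# Print the full version string of the running interpreter
print(sys.version)

# Scripts can guard against running on an unsupported major version
assert sys.version_info[0] >= 3, "Python 3 is required"
```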

Anaconda is an open-source distribution of the Python and R programming languages for scientific computing: data science, machine learning applications, large-scale data processing, predictive analytics, etc.



From Anaconda, Jupyter Notebook is used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, and machine learning.




Math operators
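The original screenshot is not reproduced here; a minimal sketch of the basic math operators:

```python
# Basic arithmetic operators in Python
a, b = 7, 3

addition = a + b         # 10
subtraction = a - b      # 4
multiplication = a * b   # 21
division = a / b         # true division, always a float
floor_division = a // b  # 2, discards the fractional part
remainder = a % b        # 1, the modulo operator
power = a ** b           # 343, exponentiation

print(addition, floor_division, remainder, power)
```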

Variable declaration and data types
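A minimal sketch of variable declaration and the built-in data types:

```python
# Python variables are declared by assignment; types are inferred
count = 42          # int
price = 9.99        # float
name = "Python"     # str
is_ready = True     # bool
nothing = None      # NoneType

# type() reveals the runtime type of a value
print(type(count).__name__)   # int
print(type(price).__name__)   # float
print(type(name).__name__)    # str

# A variable can later be rebound to a value of a different type
count = "forty-two"
print(type(count).__name__)   # str
```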





Working with strings
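A minimal sketch of common string operations:

```python
text = "Natural Language Processing"

# Indexing and slicing
first = text[0]     # 'N'
sliced = text[0:7]  # 'Natural'

# Common string methods
upper = text.upper()
words = text.split()      # ['Natural', 'Language', 'Processing']
joined = "-".join(words)  # 'Natural-Language-Processing'

# Formatting with f-strings (Python 3.6+)
message = f"'{text}' has {len(words)} words"
print(message)
```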


Loops
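A minimal sketch of the two loop forms:

```python
# A for loop iterates over any iterable, here produced by range()
squares = []
for n in range(1, 6):
    squares.append(n * n)
print(squares)  # [1, 4, 9, 16, 25]

# A while loop repeats until its condition becomes false
total, n = 0, 1
while n <= 5:
    total += n
    n += 1
print(total)    # 15
```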


Loops and conditions
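A minimal sketch combining a loop with if/elif/else conditions:

```python
# Classify each number inside a loop using conditions
labels = []
for n in range(1, 8):
    if n % 2 == 0:
        labels.append("even")
    elif n == 7:
        labels.append("lucky")
    else:
        labels.append("odd")

print(labels)  # ['odd', 'even', 'odd', 'even', 'odd', 'even', 'lucky']
```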


In Python, arrays are called lists. So, the use of lists is demonstrated.
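A minimal sketch of basic list operations:

```python
# Python lists play the role that arrays play in other languages
fruits = ["apple", "banana", "cherry"]

fruits.append("date")   # add to the end
first = fruits[0]       # indexing starts at 0
last_two = fruits[-2:]  # negative indices count from the end

# List comprehension: build a new list in one expression
lengths = [len(f) for f in fruits]

print(fruits)   # ['apple', 'banana', 'cherry', 'date']
print(lengths)  # [5, 6, 6, 4]
```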

After the basic Python code, processes for Natural Language Processing (NLP) are demonstrated. The Natural Language Toolkit (NLTK) is a suite of libraries and programs for symbolic and statistical natural language processing of English in the Python programming language.


Natural Language Processing (NLP)
Import of the NLTK toolkit for natural language processing:



Tokenization
Tokenization is the process of breaking strings into tokens. Tokenization includes three steps:
  1. Breaking sentences into words
  2. Understanding the importance of each word with respect to the sentence
  3. Producing a structural description of the sentence
A sentence can also be tokenized with the help of N-grams, bigrams, and trigrams. Trigrams are tokens of three consecutive words and bigrams are tokens of two consecutive words, whereas N-grams are tokens of any number of consecutive words.
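NLTK provides `nltk.bigrams` and `nltk.ngrams` for this; a minimal pure-Python sketch of the same idea:

```python
def ngrams(tokens, n):
    """Return the list of tuples of n consecutive tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "the quick brown fox".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)   # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
print(trigrams)  # [('the', 'quick', 'brown'), ('quick', 'brown', 'fox')]
```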


Tokenization using bigrams


Stemming
Stemming refers to normalizing a word into its base or root form. It works by cutting off the end or beginning of the word to reduce the specific word to its base form.
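In practice this is done with NLTK's `PorterStemmer`; a toy suffix-stripping sketch of the idea:

```python
def crude_stem(word):
    """Toy stemmer: strip a few common suffixes. Real stemmers such as
    NLTK's PorterStemmer apply far more careful, staged rules."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["playing", "played", "plays", "cats", "run"]])
# ['play', 'play', 'play', 'cat', 'run']
```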

Lemmatization
Lemmatization takes into consideration the morphological analysis of the word. It groups together the different inflected forms of a word under its lemma. It is similar to stemming in that it maps words to one common root, but unlike stemming, the output of lemmatization is a proper word.
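NLTK's `WordNetLemmatizer` consults the full WordNet dictionary for this; a toy lookup-table sketch of the idea (the table below is illustrative, not WordNet):

```python
# Tiny hand-made lemma table standing in for a morphological dictionary
LEMMAS = {"was": "be", "mice": "mouse", "better": "good", "running": "run"}

def lemmatize(word):
    # Fall back to the word itself when no lemma is known
    return LEMMAS.get(word, word)

print([lemmatize(w) for w in ["mice", "was", "running", "cat"]])
# ['mouse', 'be', 'run', 'cat'] -- a stemmer would mangle 'mice',
# but a lemmatizer maps it to the real word 'mouse'
```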

Stop Words
Stop words are words like "I", "before", "after", etc., which help in forming a sentence but are not of much use in NLP. These types of words are listed as stop words.
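A minimal stop-word filter; NLTK ships a much larger list as `nltk.corpus.stopwords.words('english')`:

```python
# Tiny illustrative stop-word set
STOP_WORDS = {"i", "me", "the", "a", "an", "is", "before", "after"}

def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = "I like the movie a lot".split()
print(remove_stop_words(tokens))  # ['like', 'movie', 'lot']
```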

Part of Speech
Grammatical types of words, such as verbs, nouns, adjectives, and articles, indicate how a word functions. A sentence can include many parts of speech depending on the context in which it is used. POS tagging is a statistical NLP task that distinguishes the sense of a word, which is helpful in text realization.
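In practice `nltk.pos_tag` uses a trained statistical model; a toy lexicon-based sketch of what a tagger produces (the lexicon and tag set here are illustrative):

```python
# Tiny hand-made lexicon standing in for a trained tagging model
LEXICON = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN",
           "sat": "VERB", "ran": "VERB", "quickly": "ADV", "big": "ADJ"}

def tag(tokens):
    # Unknown words default to NOUN, a common baseline heuristic
    return [(t, LEXICON.get(t.lower(), "NOUN")) for t in tokens]

print(tag("The big cat sat".split()))
# [('The', 'DET'), ('big', 'ADJ'), ('cat', 'NOUN'), ('sat', 'VERB')]
```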




Named Entity Recognition
Named entity recognition locates and classifies named-entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
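Real NER (e.g. `nltk.ne_chunk`) uses trained models; a very rough sketch that merely treats runs of capitalized words as candidate entities:

```python
import re

# Crude heuristic: consecutive capitalized words form a candidate entity
ENTITY_PATTERN = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")

text = "I met Barack Obama in New York yesterday"
entities = ENTITY_PATTERN.findall(text)
print(entities)  # ['Barack Obama', 'New York']
```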
Syntax
The set of principles, rules, and processes in a given language is its syntax. A syntax tree is a representation of the syntactic structure of sentences or strings, formed on the basis of parts of speech.

Chunking
Chunking is a process of picking up individual pieces of information and grouping them into bigger pieces.
The above-mentioned processes are demonstrated with code examples.

The development of the sentiment analysis model using the labeled IMDb dataset from Kaggle:
Import of libraries

Loading the dataset

Pre-processing of data
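A minimal sketch of a typical pre-processing step: lowercasing, stripping punctuation, tokenizing, and dropping stop words (the stop-word set here is a small illustrative subset):

```python
import string

STOP_WORDS = {"the", "a", "an", "is", "was", "this", "it", "and"}

def preprocess(review):
    """Lowercase, strip punctuation, tokenize, and drop stop words."""
    review = review.lower()
    review = review.translate(str.maketrans("", "", string.punctuation))
    return [t for t in review.split() if t not in STOP_WORDS]

print(preprocess("This movie was GREAT, and the plot is clever!"))
# ['movie', 'great', 'plot', 'clever']
```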




Plot of frequency of words
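The word frequencies behind such a plot can be computed with `collections.Counter`; the resulting pairs can then be passed to, e.g., matplotlib's `plt.bar`:

```python
from collections import Counter

# Toy token list standing in for the pre-processed reviews
tokens = ["good", "movie", "good", "plot", "great", "good"]
freq = Counter(tokens)

print(freq.most_common(1))  # [('good', 3)]
```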

Splitting the data into training and testing sets
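In practice this is one call to scikit-learn's `train_test_split`; a manual 80/20 sketch of what it does (the data here is synthetic):

```python
import random

# Toy stand-ins for the real reviews and their labels
data = [f"review_{i}" for i in range(10)]
labels = [i % 2 for i in range(10)]

# Shuffle indices with a fixed seed so the split is reproducible
indices = list(range(len(data)))
random.Random(42).shuffle(indices)

cut = int(0.8 * len(data))
train_idx, test_idx = indices[:cut], indices[cut:]

X_train = [data[i] for i in train_idx]
X_test = [data[i] for i in test_idx]
y_train = [labels[i] for i in train_idx]
y_test = [labels[i] for i in test_idx]

print(len(X_train), len(X_test))  # 8 2
```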



Supervised algorithm: Naive Bayes classifier



Multinomial Naive Bayes classifier
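The classifier can be sketched from first principles; this hand-written multinomial Naive Bayes (the class name and toy training sentences are illustrative, not the actual Kaggle data) shows what scikit-learn's `MultinomialNB` computes:

```python
import math
from collections import Counter

class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # log prior of the class
            score = math.log(self.class_counts[c] / n_docs)
            for token in doc.split():
                # Laplace-smoothed log likelihood of each token
                count = self.word_counts[c][token]
                score += math.log((count + 1) / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

# Tiny hand-made training set standing in for the IMDb reviews
docs = ["good movie", "great film", "good great fun",
        "bad movie", "awful film", "bad awful boring"]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

clf = TinyMultinomialNB().fit(docs, labels)
print(clf.predict("good fun film"))     # pos
print(clf.predict("bad boring movie"))  # neg
```

A user-entered sentence can be checked the same way, e.g. `clf.predict(input("Enter a review: "))`.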

For User Input:

Checking the validity of the model by providing user input