---
title: "The AI Advantage"
author: "Vin Patel & Harmit Kamboe"
url: "https://books.vinpatel.com/6/The-AI-Advantage"
---

# Foreword

Machine learning and artificial intelligence have emerged as transformative technologies across industries, from healthcare diagnostics to consumer analytics. However, the reality of these technologies is more nuanced than the widespread enthusiasm suggests.

While machine learning and AI have gained significant attention, they represent early stages in the evolution toward Artificial General Intelligence (AGI). Machine learning occupies the foundational level of this technological spectrum, with deep learning building upon it, and AI encompassing the broader vision of intelligent systems.

  ![1.png](https://books.vinpatel.com/u/1-qbrJ22.png)
 
## The relationship between Deep Learning, Machine Learning, and Artificial Intelligence

The effectiveness of machine learning, deep learning, and AI systems depends heavily on data scale and quality. These algorithms require substantial datasets to identify patterns, train effectively, and continuously improve through iterative testing.

The convergence of declining computing costs, AI-powered development tools, and widespread cloud adoption has created an opportune environment for organizations to leverage these technologies. On-demand availability of computational resources has democratized access to sophisticated machine learning capabilities and models, enabling businesses of all sizes to implement intelligent solutions using the power of natural language.

This guide provides executives with the practical knowledge needed to understand, evaluate, and implement machine learning and AI initiatives within their organizations.

## Who is this book written for?

If you are not a programmer, but a business executive in a leadership position, then this e-book is for you. 

You have both an opportunity and an obligation to lay the foundations for a Machine Learning and AI culture, and to push that culture into the DNA of your operations. Leaving Machine Learning/AI solely in the hands of the Information Technology (IT) department would be a tragic miss for any enterprise, no matter how large or small.

## What is this ebook not intended to be?

If you are looking for a hands-on, tutorial-style Machine Learning resource that demystifies the statistical or programming techniques used in Machine Learning or AI through code, then this is not the e-book for you.

## Authors & Contributors 

**Vin Patel**

Vin is a result-oriented senior enterprise architect with a focus on delivering high-quality code and products in high-traffic environments. He is enthusiastic about building new products and services. He has 24+ years of experience in the internet industry and specializes in Full Stack Engineering, DevOps, Data Ops, Artificial Intelligence & Machine Learning. He has hands-on experience with all aspects of building large-scale, high-availability applications: application development, n-tier architecture, frameworks, data interchange, security, online commerce, database administration, replication, optimization, server administration, open source software, and quality assurance. He stays up-to-date with best practices and always finds himself learning new technologies.

**Harmit S Kamboe**

Harmit S Kamboe is a seasoned digital marketing professional with experience at start-ups, agencies, and enterprise corporations. With deep domain experience in SEO and Paid Media, Harmit has seen first-hand how Machine Learning is benefiting the marketing function.

## Let's get started

This ebook is a gentle introduction to Machine Learning and AI, and a guide to help you be at ease with thinking about what Machine Learning can do for your business or career.

**Note:** The book can be accessed online anytime as a quick reference, and you can put it in fullscreen mode as well. Currently, the book is not available in dark mode.

We look forward to your feedback on this ebook and wish you well on your Machine Learning and AI journey.

# Understanding Machine Learning and AI

### Defining the Current Landscape

Machine learning and Artificial Intelligence have rapidly evolved from academic concepts to mainstream business tools. Google Trends data reveals exponential growth in search interest since 2004, indicating both early adoption by innovators and recent mainstream acceptance.


 ![Screenshot 2025-08-31 at 8.37.25 AM.png](https://books.vinpatel.com/u/screenshot-2025-08-31-at-8-37-25-am-T8yhvB.png) 


## Facets of Machine Learning Surround Us Everywhere

![Untitled-123.png](https://books.vinpatel.com/u/untitled-123-CaUMGb.png)
As a business leader, you likely encounter machine learning applications daily without realizing it. Netflix's content recommendations, Spotify's music suggestions, and even medical diagnostics using retinal imaging for cardiovascular risk assessment are all machine learning in practical application.

## What is Machine Learning? 

So how do all of these companies make intelligent decisions and recommendations? How can they find these patterns in enormous troves of data?

The answer lies in a combination of pattern recognition in data, being able to make predictions based on the patterns, getting better at making these predictions and automating this process. Let’s examine all four of these elements.

- **Pattern Recognition**: Identifying meaningful relationships in data

- **Predictive Capability**: Making informed predictions about future outcomes

- **Adaptive Learning**: Improving accuracy as more data becomes available

- **Automation**: Reducing human intervention in decision-making processes

### Pattern Recognition

There is no magic number for how much data is the right amount for an ML/AI system to be able to recognize patterns. But the general rule is that more data is better than less, and more usage of a product is better than less. The more you interact with an application, and the larger the number of users that interact with it, the smarter it gets.

But if you insist on an answer, then we would refer to an explanation from Prof. Yaser Abu-Mostafa of Caltech. The professor was asked, in his online course, about the amount of data required for a Machine Learning algorithm. He stated that, as a criterion, we need 10 times as many examples as there are [degrees of freedom](https://www.investopedia.com/terms/d/degrees-of-freedom.asp) in our machine learning model. Degrees of freedom refers to the maximum number of logically independent values, i.e., values that have the freedom to vary, in the data sample.

In a straightforward linear model, the degrees of freedom correspond to the dimensionality of the data (the number of columns).
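As a quick illustration of the rule of thumb, here is a short Python sketch (the column count below is a made-up example):

```python
def min_examples_needed(degrees_of_freedom, factor=10):
    """Prof. Abu-Mostafa's rule of thumb: roughly 10 examples
    per degree of freedom in the model."""
    return factor * degrees_of_freedom

# A simple linear model on a table with 12 columns has roughly
# 12 degrees of freedom, so we would want on the order of 120 examples.
print(min_examples_needed(12))  # → 120
```

The multiplier of 10 is a heuristic, not a law; it simply keeps you from fitting a complex model to a handful of rows.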

The ‘more data is better than less data’ rule is not immune to common sense: Machine Learning can also be misled if the input variables are not truly independent.

Generally, a larger data set allows you to test multiple approaches (statistical models) to find patterns in the data.

### Predictive Capability

The predictive capability of an algorithm should not be determined purely based on mathematical accuracy.

Data elements should also pass a sanity check to ensure that the predictive capability is grounded in the reality of the business. And for that we must ensure that all the data elements (attributes) are relevant.

You can think of attributes as different data points or elements of data. Normally, the more of these you have, the richer your data is. 

Imagine if you are an airline. In all likelihood, you have access to arrival and destination cities, credit card type, the website used to book the ticket, date, time, device, day of the week, address of the purchaser, age, number of accompanying passengers, etc. With these, you can create a persona of the different kinds of customers. But an even richer profile can be built by adding external data like key events at destination cities (festivals, sports events, concerts, weather, statutory holidays, school holidays etc.). All of these are “attributes” of a single object which is the airline ticket.

With the advantage of scale of data for pattern recognition and predictive capability grounded in critical attributes, you can explore and look for surprising relationships between independent variables and dependent variables. Finding, testing, and validating these relationships can lead to deep and actionable insights. 

### Adaptive Learning

There was a time when crunching large amounts of data was the prerogative of large corporations with vast resources. With the rapid adoption of on-demand computing resources, almost anyone can now get the computing power required. All the large technology companies, such as Google, Microsoft, and Amazon, have robust cloud computing offerings.

With the benefit of cloud infrastructure, you can now constantly re-train the model as the data refreshes, ensuring that it is always learning and improving its ability to predict.

### Automation

All models, whether they stand alone or depend on others, can easily be run on an automated basis.

## What is not Machine Learning? 

The end goal of Machine Learning is ‘prediction’. Therefore, any number-crunching that does not rely on statistical learning from a pre-existing data set, or does not build towards the goal of ‘prediction’, is not Machine Learning. Thus, you being targeted by an ad as you wander across the web is not Machine Learning. That is probably an example of you being retargeted by a merchant website that you visited without taking the action the merchant wants visitors to take. If you begin to see ads from a physical retail store that you visited, chances are that you are being retargeted because the merchant is serving ads based on some kind of location technology. If you logged into the store wifi, then you are not really being targeted because of Machine Learning (although Machine Learning may have been used in some shape or form along the way).

But if you are re-targeted based on some learnings from a large data set of previous users that were re-targeted, then that is an example of machine learning being applied in the real world.

People often think of Machine Learning as AI (Artificial Intelligence) or use the terms interchangeably. While Machine Learning is a subset of AI, AI is not Machine Learning. AI has a much higher purpose in life. It aims to emulate some of the creative faculties of humans.

Machine Learning is also not ‘Deep Learning’. Much classic Machine Learning data is labelled, and this training data (sample labelled data), often with human-engineered features, is what the algorithm trains (learns) itself on. In Deep Learning, the algorithm learns useful features on its own, working directly with raw and often unlabeled data.

 ![3.png](https://books.vinpatel.com/u/3-D8T3SJ.png) 

**Artificial Intelligence In Increasing Stage of Complexity (from left to right)**

## Some Machine Learning Terminology

**Independent Variable** - The variable that has an impact on the dependent variable. This independent variable (or input variable) is usually plotted on the x-axis.

**Dependent Variable** - The target (or output) variable that the model will try to predict. This is usually plotted on the y-axis. You can have more than one dependent variable.

**Feature** - An input variable that may have an impact on the target/output variable; a model typically uses multiple features.

**Label** - The output variable that the algorithm will try to predict.



# The Four Types of Machine Learning

Before you start working on implementing a Machine Learning model for your business problem, the first task that you need to do is “Define your Objective”. The objective here refers to the purpose of implementing the Machine Learning model and the result that you are expecting. The objective will help you to collect relevant data and decide on the appropriate method to be used.

Implementing a Machine Learning model is not a very difficult task, but we need to decide which model will give the best result. Normally we choose a model based on the data available or provided to us. Selecting a model will be easier if you know about the different types of Machine Learning models.

There are four key types or stages of Machine Learning:

 ![4.png](https://books.vinpatel.com/u/4-VrVAwu.png) 

## The 4 stages of Machine Learning

Machine learning models fall into four main categories as depicted above. Based on the type of data we have and the research problem at hand, we may choose to solve the problem using a specific approach.

### Supervised Learning

Supervised learning is very common and easy to implement if you have labelled data and a (target) variable whose value you want to predict. Labelled data here refers to sample data that is properly defined, and it forms the foundation of how a model will learn to accurately predict the output or target variable.

### Unsupervised Learning

On the other hand, in an unsupervised learning model, the data is unlabeled, and the algorithm has to create a sense of the features that impact the accuracy of prediction and develop a pattern on its own.

### Semi Supervised Learning

Semi-supervised learning lies in the middle of Supervised and Unsupervised learning models. It has a small amount of labelled data and a relatively massive amount of unlabeled data. 

### Reinforcement Learning

The reinforcement learning model is associated with a reward system. The program here, sometimes referred to as an agent, needs to choose the best course of action in a given environment. The model/agent is rewarded over time for finding the best solutions and that creates a loop of reinforcement.

Each of these four methods is developed and designed based on input data and the end task to be accomplished. We are going to discuss these methods in detail.

## Supervised Learning in Detail

In a supervised learning model, we have access to labelled data with examples. Based on your business problem or requirement, we then need to understand which kind of model to select. Supervised learning classifies problems into two types: 1. Regression and 2. Classification.

 ![5.png](https://books.vinpatel.com/u/5-D77upd.png) 

Regression Models are used when we are trying to predict a continuous numeric variable such as sales, quantity, customer footfall, price or any such types of variables.

Classification Models are used for categorical variables such as will a customer buy or not, will a student enroll or not and more.

For example, if we have to predict “How many customers are going to visit our outlet/store or website today” we need to build a regression model.

If we are trying to classify cat images, then we would have sample labelled images of cats, dogs, other animals, and perhaps even specific kinds of cats. By consuming this data, a Supervised Classification algorithm would be able to detect patterns and ‘learn’ how to distinguish a cat image from a dog image or a specific type of cat. The accuracy generally improves with a larger and more diversified data set. Such a model would then be able to answer the binary question of whether the image shown is a cat or not.

### How does it work?

1. The input data is already labelled, and the output criteria are defined.

2. The data will be cleaned for missing values and outlier values, as a part of the data cleansing process. 

3. The data set is further divided into two parts:

A. Training data and 

B. Test Data

‘Training’ data is the data set that the program will learn on. It is usually 70% to 80% of the data set. The ‘Test’ data is the remainder of the data set. This data remains unseen by the program and is what the program will test its accuracy on.

4. Apply the Supervised Learning Model to the training data, to derive our prediction model.

5. After the algorithm is trained to an acceptable level of performance, it is applied to unseen instances of input data to predict the correct outcome. If we do not achieve an acceptable level of performance, we repeat steps 2, 3, and 4 with a new algorithm. This is not a single-step task but an iterative one, conducted until we reach the desired level of accuracy.

Once we derive our prediction model, we apply this to our real scenario data and make decisions accordingly. This model is again tested and fine-tuned on real-world data too.  We need to keep testing it as and when the operating environment changes. 
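The train/test split in step 3 can be sketched in a few lines of Python. This is a minimal illustration; real projects would typically use a library routine such as scikit-learn's `train_test_split`:

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the data, then split it into training and test portions."""
    rows = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]       # (training data, unseen test data)

data = list(range(100))                 # stand-in for 100 labelled examples
train, test = train_test_split(data)
print(len(train), len(test))            # → 70 30
```

Shuffling before splitting matters: if the data arrived sorted (say, by date), a naive split would train on one period and test on another, which distorts the accuracy estimate.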

One of the key things to remember, when it comes to the input variables, is the algorithm assumes that all input variables (pet height, colour of hair, the distance between the eyes, the distance between eyes and nose etc.) are all independent variables. This sanity check is something that we as users of Machine Learning must validate. Hypothetically if two or more of the input variables were interrelated, this may impact how accurate the model is by biasing it. 

As an example, assume we had input variables of the weight of a cat and the calorific diet of a cat. Assuming that larger cats eat more than smaller cats, we may be biasing the algorithm with such information when it comes to detecting images of cats. But such information may be of use if the objective of the program is to detect the species of a cat. 

Another relevant example here could be having one column for the cat's height in centimetres, another in feet, and another in metres. Not only are we overloading the computing resources, we may also be teaching the algorithm to rely more on these columns of data when in effect they mean the same thing.

## Unsupervised Learning in Detail

Unsupervised Learning is the next evolution of Machine Learning Methods after Supervised Learning. In these methods, we do not have any labelled data. The Unsupervised Method is used for Clustering.

 ![6.png](https://books.vinpatel.com/u/6-RaIYul.png) 

Clustering is used to group data based on similar patterns being present in the data. There are multiple methods used to identify the pattern such as “Centroid-based”, “Density-based”, “Distribution-based” and “Hierarchical-based”. Here again, we need to select the method to be used as per our data.

Under this approach, the algorithm has the input data, but the information is not labelled. As an example imagine that a hypothetical grocery store has a lot of customer data but has not labelled or classified its customers. The store in this example has not grouped its customers based on their purchase patterns, frequency or basket size etc.

While the store may not know this information readily, the insights are somewhere in the data and they just have to be extracted. The goal of this learning approach is to model a structure and classify the data accordingly.

### How does it work?

1. The algorithm receives unlabeled data as the input.
2. The algorithm uses pre-defined statistical techniques to analyze the hidden patterns in it.
3. The algorithm forms clusters of data elements based on their similar properties.

Prof. Yaser Abu-Mostafa, in his book ‘Learning From Data’, explains unsupervised learning with the example of relocating to Spain. If you were to move to Spain, you might start taking online Spanish lessons. While initially the words may seem alien and have no real meaning, over time you would begin to ‘learn’. You may not know the exact meaning of all the words, but you become better accustomed to familiar sounds and structures. So when you finally arrive in Spain, you will be in a better position to learn Spanish properly.

Relating this to our store example, over time we may be able to detect shoppers that buy more frequently, buy items associated with convenience or solely items on sale etc.

## Semi-supervised Learning in Detail

The semi-supervised learning approach is a hybrid of supervised and unsupervised learning methods.

### When do we need a semi-supervised learning algorithm?

When we have a mixture of labelled and unlabeled data, we need to employ this learning approach. Many real-world Machine Learning problems fall into this category, because labelling all the data is expensive and time-consuming.

A classic example of this is Amazon's Alexa, a voice assistant. Amazon trains Alexa's algorithm on a small amount of labelled data and then applies it to the large amount of unlabeled data it receives from users. A confidence score is associated with each result. When a result scores high on confidence, the algorithm uses that data for additional learning, and the cycle continues: the algorithm proceeds to self-learn, improving over time and usage as new data with an acceptable confidence level is generated and fed back to it.
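The confidence-driven loop described above can be sketched in miniature. Everything here is hypothetical: a toy one-dimensional "model" stands in for a real classifier, and the confidence rule is invented purely for illustration:

```python
def fit_threshold(labeled):
    """Tiny stand-in model: a cutoff halfway between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_with_confidence(cutoff, x):
    """Label by which side of the cutoff x falls on;
    confidence grows with distance from the cutoff (made-up rule)."""
    label = 1 if x >= cutoff else 0
    confidence = min(1.0, abs(x - cutoff) / 5.0)
    return label, confidence

labeled = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]   # small labelled seed
unlabeled = [0.5, 1.5, 8.5, 9.5, 5.1]                 # larger unlabeled pool

for _ in range(3):                                    # self-training rounds
    cutoff = fit_threshold(labeled)
    scored = [(x, predict_with_confidence(cutoff, x)) for x in unlabeled]
    # keep only predictions the model is confident about
    new = [(x, lab) for x, (lab, conf) in scored if conf >= 0.6]
    labeled += new
    unlabeled = [x for x in unlabeled if x not in [p for p, _ in new]]

print(len(labeled), len(unlabeled))   # → 8 1: the ambiguous 5.1 stays unlabeled
```

Notice that the point sitting near the decision boundary (5.1) never clears the confidence bar, so it is never used for learning; that caution is exactly what keeps self-training from reinforcing its own mistakes.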

## Reinforcement Learning in Detail

Under this approach, the algorithm learns to perform a task through trial and error to maximize reward. The reward, of course, is virtual.

### How does it work?

1. The algorithm takes action in an uncertain environment.
2. It gets either a positive reward or a negative reward, depending upon that action.
3. The algorithm learns to choose the best series of actions by leveraging the power of trial and error.
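The trial-and-error loop above can be illustrated with the simplest possible setting, a two-action "bandit" problem (the reward probabilities below are made up):

```python
import random

random.seed(0)
# Two actions with different hidden average rewards; the agent must
# discover through trial and error that action 1 pays better.
true_reward = {0: 0.2, 1: 0.8}
estimates = {0: 0.0, 1: 0.0}
counts = {0: 0, 1: 0}

for step in range(1000):
    # Explore occasionally, otherwise exploit the best estimate so far
    if random.random() < 0.1:
        action = random.choice([0, 1])
    else:
        action = max(estimates, key=estimates.get)
    reward = 1 if random.random() < true_reward[action] else 0  # noisy reward
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(estimates, key=estimates.get))  # the agent learns action 1 pays better
```

Real reinforcement learning adds states and sequences of actions on top of this loop, but the core idea is the same: act, observe the reward, and shift future behaviour toward what paid off.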

Games frequently use reinforcement learning algorithms. An often-quoted example is how Google trained an algorithm to master the board game Go. There are an astonishing 10^170 possible board configurations in this game, which makes Go far more complicated than Chess.

Google created an algorithm named **AlphaGo Zero** which, unlike its predecessor AlphaGo (which initially learned from games played by human amateurs), acquired its knowledge entirely by playing against itself, and was able to beat human champions at the game in a consistent manner.

It is worth noting, to get a sense of the speed of learning, that in just three days, AlphaGo Zero had defeated the previous versions of AlphaGo, and within 40 days, it had independently found game principles that had taken humans thousands of years to discover.



# Commonly used Algorithms

Machine Learning is used by companies across many industries to solve their business challenges. The main reason for selecting Machine Learning algorithms over traditional algorithms is the accuracy of the predictions they provide. Machine Learning helps in building a highly accurate model that further improves itself as the training data grows.

Each of the four main types of Machine Learning can use a variety of algorithms. A summary of these main algorithms is provided below, along with some business use cases.

 ![7.png](https://books.vinpatel.com/u/7-1PC9g1.png) 
 **Main Machine Learning Algorithms and Some Key Business Use Cases**


## Supervised Learning Models

As we know Supervised learning models are used for labelled data and for regression and classification problems. A wide range of supervised learning algorithms are available, and each one comes with its inherent strengths and weaknesses.

### Classification Algorithm

Classification is a technique where we assign the input data to a class or category. The prediction made by a Classification algorithm usually falls into a binary choice, for example: Is this email spam or not? Is this transaction fraudulent or not? Depending on the data and situation, there are different classification algorithms that could be appropriate:

- Decision Trees
- K-Nearest Neighbor
- Random Forest

#### Decision Trees

Decision Tree algorithms are insightful even with a small amount of data, as they can show where the critical parts of a process lie. However, as decision trees get more complicated, they can become inaccurate, since a small change upstream can lead to significant variances as you move downstream.

A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute. Depending on whether the condition is met or not, the algorithm goes down a different path at each node. Each branch represents the outcome of the test, and each leaf node represents a class label. The paths from the root to the leaf represent classification rules. For multiple rules-based situations, these kinds of algorithms are the best.
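Those classification rules are, in effect, nested if/else tests. Here is a hand-written sketch of the kind of rules a tree might learn for fraud screening; the thresholds and branches are invented for illustration, not learned from data:

```python
def classify_transaction(amount, card_present, home_country):
    """A hand-written stand-in for rules a decision tree could learn.
    Each `if` is an internal node (a 'test'); each return is a leaf
    (a class label)."""
    if amount > 1000:                     # root node: test the amount
        if not card_present:              # card-not-present is riskier
            return "flag for review"
        return "approve"
    if not home_country and not card_present:
        return "flag for review"
    return "approve"

print(classify_transaction(1500, card_present=False, home_country=True))
# → flag for review
```

A real tree-learning algorithm chooses these tests and thresholds automatically, picking at each node the split that best separates the classes in the training data.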

 ![8.png](https://books.vinpatel.com/u/8-XuL53E.png) 
**A Typical Decision Tree**

**Typical Use Cases**

Decision Tree algorithms are especially useful when the goal of the exercise is to understand how critical each decision in the decision-making process is. A typical use case for these algorithms is fraud detection.

These algorithms are used quite often in understanding the types of transactions (e.g., chip or no chip, card present or not present, etc.), dollar thresholds, the geography of the merchant, the geography of the user, the demographics of the user, and so on. Mapping out these variables can help surface commonalities in fraudulent transactions.

#### K Nearest Neighbor Algorithms (KNN)

We can use KNN for both classification and regression predictive problems. KNN is unique in that it is a 'lazy' algorithm: it does no work until a prediction is requested, and it does not build a stored model or improve with time. It is typically run on small data sets.

KNN algorithms find a predefined number of training data points closest in distance to the new (test) data point and predict its label from those neighbours. The number of neighbours can be a user-given constant K, or be based on the local density of points. KNeighborsClassifier and RadiusNeighborsClassifier (as implemented in scikit-learn, for example) are the two variants of Nearest Neighbors classification.
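The idea can be written from scratch in one dimension (toy data; a real system would use a library implementation):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote of its k nearest labelled neighbours."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# 1-D toy data: small values labelled 'a', large values labelled 'b'
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (8.0, "b"), (9.0, "b")]
print(knn_predict(train, 1.7))   # → a
print(knn_predict(train, 8.4))   # → b
```

Because the algorithm does all of its work at prediction time by scanning the training points, it scales poorly to large data sets, which is why it is usually reserved for small ones.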

 ![9.png](https://books.vinpatel.com/u/9-QFTM66.png) 

Simple recommendation systems, image recognition technology, and decision-making models often use KNN. It forms the foundation of companies like Netflix or Amazon when recommending different movies to watch or books to buy.

 ![10.png](https://books.vinpatel.com/u/10-XUXN82.png) 
**A Visual Representation of Cluster Analysis**

**Typical Use Cases**

KNN powers simple recommendation systems to a large degree. 

Based on available customer data and a comparison of other customer behaviour that is close to you, i.e., who have watched similar movies or bought similar books, it will give you a recommendation that will feel quite relevant. 

#### Random Forest

Random Forest is another important algorithm, and it falls under the Ensemble learning category. In Ensemble learning, multiple models are built and their predictions are combined to produce a better result. Random Forest is a Bagging method of Ensemble Learning, and GBM (Gradient Boosting Machine) is a Boosting method.

Random Forest builds multiple decision trees and merges them to get a more accurate and stable prediction. One significant advantage of Random Forest is that it can be used for both classification (grouping) and regression (checking for relationships) problems, which form the majority of current machine learning systems.

 ![11.png](https://books.vinpatel.com/u/11-SzTthf.png) 
**Merging of Decision Trees in the Random Forest Approach**

Instead of searching for the most important feature while splitting a node, the Random Forest searches for the best feature among a random subset of features. This results in a wide diversity that generally results in a better model.
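The bagging-and-voting mechanics can be caricatured in a few lines. The "trees" below are deliberately trivial (a single threshold at the mean of a bootstrap sample, ignoring labels), so this illustrates only the bootstrap-sample-and-majority-vote structure, not real tree learning:

```python
import random

def train_stump(sample):
    """One 'weak tree': a single split point taken from a bootstrap sample."""
    xs = [x for x, _ in sample]
    return sum(xs) / len(xs)          # split point = sample mean

def forest_predict(data, query, n_trees=25, seed=1):
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_trees):
        # Each tree sees a different bootstrap sample of the training data
        sample = [rng.choice(data) for _ in data]
        cutoff = train_stump(sample)
        votes += 1 if query >= cutoff else 0
    return 1 if votes > n_trees / 2 else 0   # majority vote across trees

data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
print(forest_predict(data, 8.5))  # → 1
print(forest_predict(data, 1.2))  # → 0
```

Because each "tree" trains on a different random resample, their individual errors tend to cancel out in the vote; that averaging effect is what makes the forest more stable than any single tree.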

**Typical Use Cases**

An algorithm like Random Forest can help determine the preferences for future purchases of customers identified as loyal. A 2018 study in the Middle East shared its results publicly and called out Random Forest models as one of the two approaches that achieved accuracy of 93–96% for customer retention on a telecom customer dataset.

#### Regression

Regression is a technique that tries to predict a continuous value for the input based on the available information. The simplest model used here is Linear Regression, which has long been the most widely used statistical technique in business and economics.

For example, if a company had a successful sales season repeatedly for the holiday season for a few years, it can use linear regression to predict future sales for the upcoming holiday season. Similarly, the company can use it to forecast the pricing and promotions of a product.

**Assumptions in Linear Regression**

This model has limited capability in some situations. The key assumptions that this model makes are as follows:

- **Linearity**: Linear regression works best when the relationship between the independent and dependent variables is linear. It assumes a straight line can be plotted mapping the different values of the input variable to the output variable.

- **Independence**: Input variables should be independent of each other, meaning there is no relationship between them.

- **Sensitivity to outliers**: Linear regression is sensitive to outliers. In the real world, data is often contaminated, and outliers should be adjusted. This is easier said than done, as it is hard to differentiate between what is noise and what is a real observation.

 ![12.png](https://books.vinpatel.com/u/12-bBKr9c.png) 
**Visual Representation of a Linear Regression**

**Typical Use Cases**

Linear Regression is one of the most useful and frequently used algorithms. We can use linear regression for sales forecasts, which can help businesses in planning and budgeting. Sales can be predicted using marketing expense, customer footfall, and other variables.

Sales = β₀ + β₁ × MarketingExpense + β₂ × CustomerFootfall

There are many use cases where we can use linear regression like sales forecasting, stock price prediction, predicting customer behavior and many more.

We use Linear regression to evaluate trends in business and make better future decisions. The trend line uses a ‘best fit’ approach, where the model tries to achieve the best possible outcome with the least amount of inaccuracy.
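To make this concrete, here is a minimal least-squares fit with one predictor; the spend and sales figures are invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical history: marketing spend (in $k) vs. sales (in $k)
spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]        # perfectly linear: sales = 5 + 2*spend
b0, b1 = fit_line(spend, sales)
print(round(b0, 2), round(b1, 2))    # → 5.0 2.0
forecast = b0 + b1 * 60              # predicted sales at $60k spend
print(round(forecast, 2))            # → 125.0
```

Real sales data would never fit this cleanly; the least-squares machinery is the same, but the coefficients come with uncertainty, which is why the model is re-tested as conditions change.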

## Unsupervised Learning Models

Unsupervised Learning Models are used for unlabeled data to identify similarities and patterns in different data points. The data points which show similar patterns are grouped together and are called a cluster. These clusters can be analyzed by the marketing team to offer different products and different offers to different clusters.

Let us examine the central Unsupervised Learning Models and the standard techniques that are mostly used.

### Clustering

Clustering is used mostly in unsupervised learning models and algorithms, but in some special cases we use clustering in supervised learning as well. Supervised clustering is used for image segmentation, news article clustering, and batch clustering of streaming email. Here the algorithm learns, from training data, a parameterized measure of similarity between pairs of items.

Clustering is the task of grouping sets of similar objects together. Cluster analysis itself is not one specific algorithm. Clustering can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to find them efficiently.

 ![13.png](https://books.vinpatel.com/u/13-E4EQsy.png) 
**A Visual Representation of Clustering**

**Typical Use Cases**

Clustering is widely used when it comes to grouping customers and segmenting buyers by dividing the customers into segments and grouping them by their shopping behavior.  Businesses can use this approach with other analytical tools to manage their data better. 

The two most widely used techniques for clustering are:

- K-mean clustering
- Hierarchical clustering

In K-means clustering, the data is partitioned into clusters around their means (centroids): the closer data points are to a centroid, the more similar they are taken to be.

Hierarchical clustering takes a different route to a similar outcome. It treats each data point as its own cluster and repeatedly merges the closest clusters until a few key clusters remain. The output depends on the minimum distance allowed between clusters and on where in the hierarchy that cut-off is applied.
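To make the K-means idea concrete, here is a deliberately tiny one-dimensional sketch in Python. The spend figures and starting centroids are invented; real implementations work in many dimensions and choose starting points more carefully:

```python
# A toy 1-D K-means sketch (k=2) on hypothetical customer spend values.
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for p in points:
            # assign each point to its nearest centroid
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # move each centroid to the mean of its assigned points
        centers = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centers)

spend = [10, 12, 11, 90, 95, 93]   # two obvious spending segments
print(kmeans_1d(spend, centers=[0, 100]))
```

The two centroids settle near the centers of the low-spend and high-spend groups, which is exactly the segmentation a marketing team would act on.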

**Principal Component Analysis**

Even though computing power has advanced significantly and storage costs have fallen, high-dimensional data remains difficult to process. Dimension-reduction methods therefore squeeze the data down to its relevant features and discard the rest as noise. One technique that helps with this process is PCA (Principal Component Analysis).

PCA finds combinations of the original variables that capture most of the variation in the data. These new variables are called principal components, and they can be fed as inputs to the learning model.

 ![14.png](https://books.vinpatel.com/u/14-B58PrE.png) 
**Determining Principal Components**

**Typical Use Cases**

Let's say we have to analyze 1,500 stocks in a portfolio. Let's further assume that we have a hundred key variables for each stock. This assumption would give us 1,500 X 100 = 150,000 variables or combinations to be examined, sorted, and scored. 

It is going to be a burdensome task even with high computing power. Upon examining the data, we may find that if our goal is to judge 30-day stock price performance, data points like the intra-day high and intra-day low add no value and do not affect model accuracy. By identifying such variables, we can begin removing the data points that do not contribute to the problem statement we formulated.

A key takeaway of the PCA approach is that as a user we can create a hybrid variable too if we think that it will assist in the accuracy of the model prediction.
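For the technically curious, here is a hand-rolled two-variable PCA sketch in Python. The data points are invented and assumed to be correlated (real PCA libraries handle many variables at once); it finds the direction along which the data varies most:

```python
import math

# A 2-D PCA sketch: find the direction of maximum variance for two
# correlated, hypothetical stock variables (assumes covariance != 0).
def first_principal_component(data):
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # entries of the 2x2 covariance matrix
    sxx = sum((x - mx) ** 2 for x, _ in data) / n
    syy = sum((y - my) ** 2 for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    # largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam = tr / 2 + math.sqrt(tr ** 2 / 4 - det)
    # corresponding eigenvector, normalised to unit length
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9)]  # roughly y = x
print(first_principal_component(points))
```

Because the two made-up variables move together, the first principal component points roughly along the diagonal, so one combined variable can stand in for both.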

## Semi-supervised Learning Models

We will examine three main semi-supervised learning models in common use:

- Generative models
- Low-density separation
- Graph-based methods

### Generative Adversarial Networks (GANs)

Semi-supervised learning based on generative adversarial networks (GANs) has been used in autonomous driving. GAN models effectively capture the data distribution and hence build a powerful representation of the data, which gives us the ability to generate realistic datasets for an autonomous-driving framework.

Under the GAN approach, two models compete with each other (hence the word ‘adversarial’), resulting in better accuracy over time. The first model is the ‘generator’. It produces new plausible examples; if the goal is to detect cancer cells, for instance, the generator comes up with new images of what cancerous cells might look like.

The other model, the ‘discriminator’, learns to tell the artificial examples apart from real ones. Over time the discriminator becomes better and better as it works through more data.
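For a feel of this adversarial loop, here is a deliberately oversimplified, deterministic Python caricature. The ‘generator’ is a single number trying to mimic real data centred on 5, and the ‘discriminator’ is just a midpoint threshold; every value here is made up, and real GANs are neural networks trained on gradients:

```python
# A toy caricature of the GAN idea: a "generator" (one number g) tries
# to mimic real data centred on REAL_MEAN, while a "discriminator"
# (a midpoint threshold) tries to tell fake output from real.
REAL_MEAN = 5.0

def train_gan_toy(g=0.0, rounds=50, lr=0.3):
    for _ in range(rounds):
        # discriminator: the best split point between fake and real output
        threshold = (g + REAL_MEAN) / 2
        # generator: move toward the boundary, drifting toward real data
        g += lr * (threshold - g)
    return g

print(train_gan_toy())
```

Each round the discriminator redraws its boundary, and the generator edges closer to it, so the generator's output converges on the real data it is imitating.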

**Typical Use Cases**

GANs have found wide use in facial image recognition, image classification, face ageing, and the creation of personalized emojis from photos. Understanding how GANs work will hopefully help you see why that is the case.

A 2019 study showed that graph-based methods in semi-supervised learning reduced the error rate of an NLU (Natural Language Understanding) model by 5%. This technology is crucial for voice assistants such as Amazon’s Alexa, Google Home, and Apple’s Siri.

### Low-Density Separation

The Low-Density Separation approach is related to cluster analysis. An outcome of cluster analysis is a set of high-density regions: if two data points are close, they likely belong to the same cluster and share the same label. Low-Density Separation places the boundaries between classes in the low-density regions. Samples lying in low-density space are most likely boundary points, or points whose classes differ.

We know the supervised SVM (Support Vector Machine) algorithm, where the decision boundary lies in a low-density region. Low-Density Separation methods attempt to find decision boundaries that best separate one label class from another and then assign labels to unlabelled data points. The Transductive Support Vector Machine (TSVM) is an example of Low-Density Separation.

 ![15.png](https://books.vinpatel.com/u/15-9fllKO.png) 


**Typical Use Cases**

Low-density separation - Handwritten Digit Recognition.

The Low-Density Separation method is used for many real-world problems, such as text classification. It can also be used for handwritten digit recognition. For example, we may want to distinguish the handwritten digit “0” from the digit “1”. A sample point lying exactly on the decision boundary would sit between a “0” and a “1”, and such points help in making the determination.

Low-Density Separation is trained on large datasets that include some labelled data and a large amount of unlabelled data. It learns the patterns of written digits from the labelled data and applies them to the unlabelled data.
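Here is a small pseudo-labelling sketch in that semi-supervised spirit. A nearest-class-mean classifier (a simple stand-in, not an actual TSVM) learns from a few labelled “digit” feature values and then labels a larger unlabelled pool; the one-dimensional feature values are made up:

```python
# Pseudo-labelling sketch: learn class means from a little labelled
# data, then assign each unlabelled point to the nearest class mean.
def class_means(labelled):
    sums, counts = {}, {}
    for value, label in labelled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

def pseudo_label(labelled, unlabelled):
    means = class_means(labelled)
    return [(v, min(means, key=lambda lab: abs(v - means[lab])))
            for v in unlabelled]

labelled = [(0.1, "0"), (0.2, "0"), (0.9, "1"), (1.0, "1")]
unlabelled = [0.15, 0.05, 0.85, 0.95]
print(pseudo_label(labelled, unlabelled))
```

The handful of labelled points is enough to place each unlabelled point on the correct side of the gap between the two classes.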

### Graph Based Methods

A graph, in Machine Learning, is a data type that has nodes and edges. And, both nodes and edges can hold information.

 ![16.png](https://books.vinpatel.com/u/16-6KL8iL.png) 

Edges can be directed, indicating the flow of information, and both edges and nodes can carry weights.

According to Widmann and Verbern, Graph-Based Semi-Supervised Learning (the GSSL algorithm) is very effective for a range of problems, particularly short-text classification. The GSSL algorithm uses the graph either to spread labels from labelled data to unlabelled data or to optimize a loss function. In the graph-based method, we build a graph from the similarities between labelled and unlabelled nodes; nodes with high similarity tend to receive the same label.

The advantage of the GSSL algorithm is that it converges quickly and scales easily to large data. It is also flexible to adapt, but its computation is complex and expensive.
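The label-spreading idea can be sketched in a few lines of Python. This toy graph, its seed labels, and the majority-vote rule are all simplifications chosen for illustration:

```python
from collections import Counter

# Label propagation on a toy graph: labelled nodes pass their label to
# neighbours by majority vote, repeated for a few rounds.
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f")]
labels = {"a": "sports", "d": "politics"}           # seed labels

neighbours = {}
for u, v in edges:
    neighbours.setdefault(u, []).append(v)
    neighbours.setdefault(v, []).append(u)

for _ in range(5):                                  # propagation rounds
    updates = {}
    for node in neighbours:
        if node in labels:                          # seeds stay fixed
            continue
        votes = Counter(labels[n] for n in neighbours[node] if n in labels)
        if votes:
            updates[node] = votes.most_common(1)[0][0]
    labels.update(updates)

print(labels)
```

After a few rounds, every node connected to a seed has inherited its label, which is the essence of spreading labels from labelled to unlabelled data.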

**Typical Use Cases**

The graph-based approach is easy to imagine when we think of social networks like Facebook. The re-prioritization of items in the news feed, first announced in 2018, reflected graph theory that has been continuously built upon.

Prioritizing personal messaging over brand messaging, and within that, prioritizing posts that indicate or lead to a more meaningful conversation or come from a closer circle of friends, is indicative of the graph-based approach at work.

## Reinforcement Learning

Reinforcement learning, due to its generality, is studied in many other disciplines, such as game theory and simulation-based optimization. A key concept in reinforcement learning is rewarding the machine, thereby biasing its behaviour toward completing tasks sooner or more efficiently.

A reinforcement learning agent interacts with its environment in discrete time steps. At each step, the agent chooses an action from the set of available actions and then receives a reward. With reward maximization as its goal, the algorithm can work out how best to complete a given task.

 ![17.png](https://books.vinpatel.com/u/17-mYXylt.png) 
**A Visual Representation of Reinforcement Learning**

Let us examine the main reinforcement learning models.

- Value function
- Brute force
- Direct policy search

### Value Function (Q-learning)

Q-learning is a technique for finding the best action to take in a given environment. The algorithm updates a value function for each state-action pair and seeks to maximize the total reward.
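A minimal tabular Q-learning sketch makes the update concrete. The environment here is an invented five-state corridor where only reaching the last state pays a reward; the learning-rate, discount, and exploration values are toy choices:

```python
import random

# Tabular Q-learning on a tiny corridor: states 0..4, actions
# left (-1) / right (+1), reward 1 only for reaching state 4.
random.seed(0)
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):                     # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly exploit, sometimes explore
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), GOAL)
        r = 1.0 if s2 == GOAL else 0.0
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        # the Q-learning update rule
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy in every state is to move right, toward the rewarded goal, showing how the reward signal alone shapes the behaviour.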

### Brute Force

Brute Force-based learning attempts to take all possible options into account. When the problem is manageable, the required computing resources are acceptable, the options are limited, or the cost of a mistake is high, this may be an option worth considering.

**Typical Use Case**

Consider a Level 5 self-driving car, where there is no plan to have a steering wheel, gas pedal, brake, etc. and the machine is supposed to take care of everything.  The human passenger cannot take any evasive action in case of an emergency. Due to the high-cost litigation in the case of an accident, a Brute Force approach may be the best way to try and solve this problem.

Clearly, this approach should only be used in a select few situations.

### Direct Policy Search

The Direct Policy Search approach works with policies: options among which the agent (program) seeks to maximize its reward. The policy can differ across environments. The agent here takes a completely different approach from the Brute Force case.

The agent can be fed different policies or the agent can learn and devise its own policies over time.

Agents can use a Markov strategy, where they keep a history of a few tried options in memory and are aware of the overall reward at stake. In such situations, the agent calculates its next moves based on the overall objective. It neither brute-forces the search nor follows a pre-programmed Direct Policy, and it can come up with new policies of its own.

**Typical Use Cases**

Typical use cases of the Direct Policy approach can be found in building robot control systems. Policy search methods have made tremendous progress, such as learning trajectory-based policies for complex skills and learning complex policies with thousands of parameters. But building RL for robots is still challenging.

Direct Policy Search is also used for tracking autonomous underwater cables by visual feedback, autonomous helicopter control, and learning obstacle-avoidance parameters from operator behaviour.

Recent advances in technology and research have produced many use cases of reinforcement learning; the image below shows some of them.

 ![18.png](https://books.vinpatel.com/u/18-jJ3Znq.png) 


# Best Practices for Machine Learning

We saw different algorithms in the last chapter and understood their significance. But before implementing Machine Learning for your business objective, we suggest following the best practices used by the industry. These will help you avoid mistakes and ensure an error-free implementation of Machine Learning.

We can identify the best practices for Machine Learning in the following manner:

 ![19.png](https://books.vinpatel.com/u/19-2m52PK.png) 

Most Machine Learning programs follow the above lifecycle. Machine Learning, like any other business capability, is primarily driven by a business problem. As a first activity, we have to convert this business problem into a Problem Statement and identify KPIs (Key Performance Indicators) to measure the outcome.

We need to make a list of relevant data points based on the Problem Statement and collect these data elements. The data points should be collected from trusted and reliable sources only. The data points should be labeled and understood. These are then tested individually and combined to create derived metrics to find the kernel of the input variables that have a significant impact on the outcome.  The data science team also has to test different algorithms to determine their suitability for this exercise.

Once the team has a good grasp of the data elements and the algorithm to be used, they apply it at scale. It is preferably done on the database, especially for large databases so that this does not become a resource-intensive application.

The team further validates the algorithm by running it on various test data and unseen data. Data Scientists then fine-tune the parameters of the algorithm so that it is neither overfitted nor underfitted to the dataset. The last step is to validate the algorithm by using it on new datasets.

## 1.    Identify the business problem and the right success metrics.

The purpose of Machine Learning and AI in most cases is to make a prediction. In business-related use cases, Machine Learning and AI can be used for different purposes such as:

A. Predicting customer buying behaviors or patterns

B. Classifying whether an email is spam, or any other kind of classification, such as image classification

C. Understanding customer sentiment towards our services/products

D. Predicting whether a customer will attrite in the next X days

E. Predicting whether a viewer/visitor will buy a specific service/product

The first step of ML (Machine Learning) implementation is to define a Business Problem statement and its corresponding KPIs. Business Problem statements describe, in plain text, the challenges that the organization is facing. We then need to identify data columns that can measure them; these columns are the KPIs.

It is important to note that the Business Problem and the KPIs are two different but related metrics:

**Identify the Specific Business Problem to be Addressed**

 ![20.png](https://books.vinpatel.com/u/20-jpyqus.png) 

In the above image, we depict the customer retention challenges faced by the organization. As we have discussed, the first step is to identify the Business Problem statement and KPI; here they are:

 ![table 1.png](https://vinpatel.com/u/table-1-n7fyst.png) 

Here the output variable is Low Customer Retention. And the variables that impact this outcome are the input variables. And like any decision that we make as individuals, there can be multiple input variables that impact our outcome decisions.

The order and importance of input variables may differ for different business problems.

We cannot measure the outcome of the algorithm just in terms of the prediction. As a business, the outcome should be measured relative to the current state.

The goal of a Machine Learning process should, therefore, be something along the lines of one of the following:

- Achieve an improvement of X% in our prediction rate
- Achieve an improvement of X% in a specific metric versus control

Such numeric thresholds will help us determine if a Machine Learning experiment is successful or not. These metrics will also make it easier for businesses to figure out what the ROI of a Machine Learning project will be.

 ![21.png](https://books.vinpatel.com/u/21-TKsybi.png) 

## 2.    Gather correct data. Often (not always) more data is better.

Gathering correct data is the core of every machine learning business application. It would not be wrong to say that machine learning and especially deep learning techniques are “data-hungry”. 

When it comes to Machine Learning, it is essential to remember this quote from Michelangelo, the Renaissance sculptor:

 “The sculpture is already complete within the marble block before I start my work. It is already there; I just have to chisel away the superfluous material.”

We can think of Machine Learning in the same way. The independent variables that impact the dependent (outcome) variable are already there, waiting to be revealed. All we have to do is apply the right Machine Learning algorithms to the correct input variables.

The right amount of data can be thought of in two aspects:

A. Depth of data (the quantum of data, in terms of the number of observations and how far back the data goes), and

B. The breadth of data (External Factors)

**The Depth of Data**

We will refer back to the comment from Prof. Yaser Abu-Mostafa of Caltech, who stated that, as a rule of thumb, you need roughly ten times as many examples as there are degrees of freedom in your model. For illustration, if we were trying to identify profitable customers for a grocery chain, our data might look like the table below:
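The rule of thumb reduces to a back-of-the-envelope calculation. The factor of ten and the column count below are illustrative, not prescriptions:

```python
# The "ten examples per degree of freedom" rule of thumb as a
# trivial sizing calculation.
def min_observations(degrees_of_freedom, factor=10):
    return degrees_of_freedom * factor

# e.g. a model with 12 candidate input columns
print(min_observations(12))
```

A model with 12 input columns would thus want at least 120 observations before the rule of thumb is satisfied.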

 ![22.png](https://books.vinpatel.com/u/22-q2eXLu.png) 
**Sample Layout of Data Elements in a Hypothetical Dataset**


The first thing to note is that a machine learning model is interested only in the numerical columns. Thus the two name columns and the gender field, if not converted into numbers, would be of no use. Intuitively, we know that gender could play an important role in the analysis (so it should be converted to a numeric field), while the name fields can probably be dropped safely.

For most of the 12 columns, we should check that we have enough data to meet either of the following statistical rules:

A. For these 12 columns, we would look at the maximum value each column can take. For example, for the Frequency of Purchase column, if we are satisfied with recording at most one transaction a day, then the number of observations should be at least 30, as that is the maximum value the column can hold. This approach would apply to all the columns.

Depending on how many SKUs the store carries, the sample size can grow quite rapidly. A store with 5,000 SKUs would need 5,000 rows of data in the sample.

B. A traditional statistical approach might suggest choosing a sample size on the basis of a specific confidence level. However, that estimate does not account for time duration and the associated seasonality.

Grocery purchases have a short life cycle, with purchasers needing grocery items every week or two. In such cases, we may need to add a qualifier requiring a minimum count of records for each week of the year to get a fuller picture.

**The Breadth of Data**

The following factors also impact the size of the basket and the groceries bought:

- Weather
- School holidays and other statutory holidays
- Special occasions - the Superbowl, Valentine's Day, etc.
- Weekdays versus weekends, and other such factors

We can add most of these variables to the input data from third-party sources to get a better picture of what may be at play.

Machine learning experts often say that “cleaner data” beats big data. Cleaning your data means correcting the errors in it; without correct data, it is much harder to yield insightful results. According to a survey reported in Forbes, data scientists spend 80% of their time on data cleaning. For example, there might be missing values in your data; you can either drop those records or replace the gaps with the average of the known values.
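The mean-imputation option mentioned above is simple enough to sketch directly. The basket values are made up, and real pipelines offer subtler strategies (median, per-segment means, model-based imputation):

```python
# Mean imputation: missing values (None) are replaced by the
# average of the known values in the same column.
def impute_mean(values):
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

baskets = [40.0, None, 60.0, 50.0, None]   # hypothetical basket sizes
print(impute_mean(baskets))
```

The two gaps are filled with 50.0, the average of the three known baskets, so no rows need to be dropped.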

## 3.  Test different approaches for the input and the output with a few different algorithms.

We have seen there are many Machine Learning algorithms available for prediction. Each algorithm has its own advantages and disadvantages. Also, it’s not easy to select an algorithm based on the data available. We need to try different approaches and different algorithms to get the best-fit model.

Following up on the Michelangelo quote above, it is always good practice to check for a direct relationship between the variables and the outcome with two distinct viewpoints:

**A. Single Variable to Output**

 Check for a direct relationship between the independent variables and the desired outcome.

In the case of this hypothetical grocery store, we would examine if each of the variables (apart from the first name and last name) directly impacts the frequency of purchase and the value of the basket size. 

**B. Multiple Variables to the Output**

We would also combine multiple variables into a derived metric (with different weights) to uncover any linear/non-linear relationship.

When looking at the results of both single variables and multiple variables (derived metrics), we must pay attention to the adage of "correlation does not imply causation".

This saying in statistics is crucial in applying common sense to a statistical relationship. One of the best examples of this adage is the skirt length theory.  

This theory was first suggested in 1925 by George Taylor of the Wharton School of Business. In the 1920s- the "Roaring Twenties"- the economic strength of the U.S. led to a period of sustained growth in personal wealth for most of the population. This, in turn, led to new ventures in all areas, including entertainment and fashion. Fashions that would have been socially scandalous a decade before, such as skirts that ended above the knees, were all the rage.

Upon the onset of the depression, the skirt lengths once again fell back to being below the knees. The trends were visible again in the 1980s as the stock market touched higher peaks with the accompaniment of Reaganomics. The stock market crash of 1987 again saw an inverse relationship between the markets and the length of the skirts. 

Clearly, while there may be a statistical relationship between the two variables, there is no scientific relationship at play here.

When we select input variables, we must make sure the variables are relevant. Similar variables cause duplication, and unrelated variables add noise. Get active participation from the larger team, which may bring a different context to the business problem.

Much as picking the right variables is key, so is picking the correct algorithm. Based on the data available and the KPI set as the goal, the right algorithm needs to be matched to the proper KPI. Finding it requires an experienced team and the time for them to test many hypotheses.

## 4.    Move the algorithms instead of your data.

Usually, the dataset resides in a database somewhere. Users typically export the data from the database, run the algorithm on it, and import the results back. This round trip can take hours or even days.

A better approach is to run the algorithm inside the database or object storage. This approach is considerably faster, and the data can be queried with SQL (Structured Query Language). Many machine learning tools can connect to relational databases, and free open-source databases such as MySQL can perform the task.

The larger your dataset, the more time you will save by running your algorithm on the database directly.
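The principle can be illustrated with Python's built-in SQLite driver. The table and column names are invented, and SQLite stands in for whatever production database you actually use; the point is that only the small aggregated result, not every row, leaves the database:

```python
import sqlite3

# Push the work to the database: aggregate per-customer spend with SQL
# instead of exporting every purchase row to the application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)",
                 [("alice", 30.0), ("bob", 10.0), ("alice", 20.0)])

rows = conn.execute(
    "SELECT customer, SUM(amount) FROM purchases "
    "GROUP BY CUSTOMER ORDER BY customer").fetchall()
print(rows)
conn.close()
```

With millions of purchase rows, shipping back two summary rows instead of the whole table is exactly the time saving the section describes.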

 ![23.png](https://books.vinpatel.com/u/23-GzvF2p.png) 

The above image shows that different models give different accuracy, and accuracy generally improves as data size increases. Selecting the right machine learning algorithm for our data is therefore something of a treasure hunt.

## 5.    Test, Train, and then Test & Train some more. 

The standard rule of thumb for training and testing is to keep 70%-80% of the data for training and the remaining 20%-30% for testing the algorithm. Once you begin to see meaningful results, you can also run the algorithm on some unrelated (but similar) open public datasets as further validation.

When running an algorithm on the testing data, we need to ensure that:

   1. The testing set is large enough to yield meaningful results.

   2. The testing data has the same characteristics as the training set; it should be representative of it.
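The 80/20 split described above can be sketched in a few lines of Python. The toy dataset and fixed seed are illustrative; shuffling before splitting helps the test set share the training set's characteristics:

```python
import random

# An 80/20 train/test split on a toy dataset: shuffle, then cut.
def train_test_split(rows, test_ratio=0.2, seed=42):
    rows = rows[:]                       # copy; leave the caller's list intact
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

data = list(range(100))                  # stand-in for 100 data rows
train, test = train_test_split(data)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so an experiment rerun later sees exactly the same train and test rows.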

At this stage, the Data Scientists will also fine-tune the parameters of the algorithm to ensure a ‘best fit’. This is part science and part art. A Machine Learning algorithm that neither overfits nor underfits is desirable.

 ![24.png](https://books.vinpatel.com/u/24-Tk9KOR.png) 
**A Visual Representation of How data can ‘fit’ an Algorithm**

An underfit set of results is referred to as having ‘high bias’. In an overfitted model, where the algorithm passes through every data point, there is a high probability that noise and exceptional events are being fitted as well. Rather than indicating a high accuracy rate, this indicates ‘high variance’.

Bias - the systematic error in the algorithm’s predictions; high bias means low accuracy.

Variance - the size of the fluctuation in the model’s output across different datasets.

 ![25.png](https://books.vinpatel.com/u/25-GqCSWI.png) 

**Finding the right fit when it comes to the data and the Machine Learning Model**

The ultimate goal is to find a model with both low bias and low variance.

In statistics and machine learning, the bias-variance tradeoff refers to the tension between accuracy (bias) and fluctuation (variance). Ideally, an algorithm should have low bias and low variance. The tradeoff is a central problem in supervised learning: one wants a model that accurately captures the regularities in its training data yet also generalizes well to unseen data. Unfortunately, it is typically impossible to do both perfectly at the same time.

## 6.    Avoid data dropping while machine learning algorithms train.

The goal of a machine learning algorithm is to develop a model that can predict future, unseen instances. You may often feel the need to “filter out” some data because the model becomes easier to explore and understand. But this practice has a particular disadvantage: you might be discarding valuable information that is not present in the rest of your data.

In our hypothetical grocery example, data from the COVID-19 lockdown should be marked as such so that any analysis covering this period does not lead us down the wrong path.

But over a long period, you could benefit from comparing such data with other periods when similarly stressful situations might have existed in the past.

Remember to use data masking or data tokenization in case you don’t want to expose any personally identifiable information (PII).

## 7.    Validate, Validate, Validate. 

Cross-validation is the usual way to evaluate machine learning models: while the algorithm is fitted on the training set, in real life it has to deal with independent datasets.

  ![26.png](https://books.vinpatel.com/u/26-Gzu24Q.png) 
**Training Data helps the Algorithm learn and adapt to the Test Data**


A further way of testing is to apply the model to data outside the training set and the 20% test split: for example, data from different years, or data from a smaller grocery chain that the parent company may own. Many training datasets come with corresponding test sets whose instances are labeled by hand, against which we can calculate the model’s performance. If a dataset has no test set, it can often be worthwhile for a team to create one.
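The k-fold flavour of cross-validation can be sketched in pure Python. Here a deliberately trivial mean predictor stands in for a real model, and the numbers are made up; each fold takes a turn as the hold-out set while the rest trains:

```python
# A k-fold cross-validation sketch with a trivial mean "model".
def k_fold_indices(n, k):
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(ys, k=5):
    errors = []
    for fold in k_fold_indices(len(ys), k):
        hold_out = set(fold)
        train = [ys[i] for i in range(len(ys)) if i not in hold_out]
        prediction = sum(train) / len(train)      # "train" the mean model
        errors.extend(abs(ys[i] - prediction) for i in fold)
    return sum(errors) / len(errors)              # average hold-out error

print(cross_validate([10, 12, 11, 13, 9, 10, 12, 11, 14, 8]))
```

Because every data point is held out exactly once, the averaged error is a fairer estimate of performance on unseen data than a single split would give.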



# Make your organization Machine Learning ready.

We know that Machine Learning contributes significantly to organizational growth. The organization, its stakeholders, employees, and management all have to learn and contribute to the implementation of Machine Learning models. We have prepared a checklist that can help you prepare yourself and your team for a Machine Learning implementation.


 ![27.png](https://books.vinpatel.com/u/27-Rv16Jy.png) 
**Visual Checklist to determine if your organization is ready for Machine Learning**

## Do you have a clear problem statement that needs to be solved using AI/ML?

Machine Learning is mostly used for prediction and clustering. So the first step is to frame the problem statement, which may equate to something like “What do you want to predict?” or “How many clusters do you want to create?”.

Organizations need to have a clear problem statement to utilize machine learning.

For example: which customers are responding to promotional offers and which are not; or predicting the right treatment plan for a patient so that his or her health outcomes improve by 20%.

## Do you have the right data? 

The next step is to identify relevant and reliable data for our problem statement. “The right data” means data collected from a trusted system, with all information related to the problem statement accessible.

A time-series dataset (data points spaced at equal units of time) often performs better than a static dataset for machine learning algorithms. Sometimes you need to enrich existing datasets by merging in additional information to make them ready as input to an algorithm.

## Do you have any/enough data?

We have already discussed how machine learning requires a lot of data. We can classify data into three categories depending on where it originates:

A. First-party data sources, or internal/proprietary data

B. Second-party data sources, or external data (data from partners or suppliers)

C. Third-party data sources

### First-Party Data Sources

First-party data is the type of data that you gather from directly engaging with the audience. It can be data collected from website cookies, customer transaction history, form submissions, or your customer service center. This type of data has some specific advantages:

- The first-party data that you collect gives you insightful details about your customer’s behavior.

- It is accurate and relevant since it is gathered directly from your customers.

- You own this data

For example, Netflix has created a direct relationship with its audience. It knows how its audience interacts with its content, and it uses that knowledge to enhance the user experience on a consistent basis.

There is a great case study of Netflix regarding the series ‘House of Cards’. While the general practice of the TV industry was to intrigue the audience by commissioning a few pilot episodes, Netflix commissioned two entire seasons of the show for $100+ million. Why did Netflix gamble on the show? Well, the answer lies in what we discussed above: first-party data. By analyzing the data of 33 million subscribers at that time, they knew that their audience liked David Fincher’s work or streamed movies featuring Kevin Spacey. And they were right. Within three months of introducing the show, Netflix added 2 million subscribers!

In the case of our grocery store example, here is a list of some first-party data sources that the grocery store would be able to collect:

A. Data from the POS system: the products customers purchase, the method of payment used (travel cards versus cash-back cards versus debit cards, etc.), the day and time of the purchase, and so on

B. If the store offers free Wi-Fi, there is an opportunity to collect some additional web data

C. If the store has cameras that can scrub out the faces of the customers, it can generate heat maps of where customers linger longer and use that information for better merchandising

D. Where stores have their own loyalty program, the data can be traced back to an individual for better and richer profiles

E. Where the stores have their own recipe app or shopping-list app, the grocery chain would be able to tap into that data too

### Second-Party Data Sources

Second-party data is data collected by another organization, such as a supplier brand or a supplier’s website, for the mutual benefit of both businesses. This data is simply somebody else’s first-party data. It is helpful only when the other company is relevant to your audience.

Some of the advantages include:

A. A more contextual understanding of its customers

B. Potential competitive market advantage

C. The possibility of predicting behavior with greater accuracy

Let’s say you are running an online cosmetic store targeted at women. Your first-party data comes mostly from women. Now, if you want to introduce a men’s product, you won’t have the first-party data for it to reach your audience. In this case, you can team up with a men’s health website and get the data from them.

In the case of our hypothetical grocery store, they may be a partner of a larger points program. Through this partnership, the grocery store may be able to understand the behavior of its customers in terms of what non-grocery items they like to consume.

Potential Use Cases:

A. If a grocery can determine that their customers like to travel to specific countries for vacations, perhaps they can incorporate those themes into the store flyer and store merchandising

B. If a grocery can determine that their customer base has specific tastes in movies, music, travel, etc., all of that can assist with the creation of a richer persona

C. If a grocery can see other elements of their customers’ lifestyles, they could improve their advertising messaging

### Third-Party Data Sources

Third-party data is the data collected from organizations that do not have a direct relationship with your customers. Sometimes, we have to buy this data. There are market research organizations that collect data from different sources for different industries and markets. You need to buy it to understand market trends and competitors. Some of the advantages of having third-party data are:

- Identifying potential new customers

- Expanding your audience

Data sourced from independent surveys, polls, etc., would constitute third-party data. Our hypothetical grocery store could buy such data (provided all privacy concerns are addressed under the rules and regulations governing its collection). The store can then match the names and addresses of survey respondents with its first-party database and create a more vibrant profile of its customers.

**Matching and Merging the Data**

It is valuable to have a combination of all three types of data. Machine Learning is only as good as the data used to train it. By combining first-, second-, and third-party data, we can get a complete view of our customer’s journey on and beyond our channels. We may not need all of this information for the Machine Learning model itself, but it can be useful for other business strategies. We need to match and merge the data based on our input and output variables.
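A minimal sketch of this match-and-merge step in Python, joining hypothetical first-party loyalty records with third-party survey responses on a shared key (all names and fields are invented for illustration):

```python
# First-party loyalty records and third-party survey responses
# (all names and fields are hypothetical, for illustration only)
first_party = [
    {"name": "Ana Lee", "weekly_spend": 85.0},
    {"name": "Raj Gupta", "weekly_spend": 120.0},
]
third_party = [
    {"name": "Ana Lee", "favorite_cuisine": "Italian"},
]

# Index the third-party rows by the matching key (here, customer name)
by_name = {row["name"]: row for row in third_party}

# Merge: enrich each first-party record where a match exists;
# unmatched customers simply keep their first-party fields
merged = []
for row in first_party:
    extra = by_name.get(row["name"], {})
    merged.append({**row, **{k: v for k, v in extra.items() if k != "name"}})

print(merged)
```

In practice this matching is done inside a database or a data-integration tool rather than by hand, but the principle is the same: pick a reliable shared key and enrich one dataset with the other.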

Netflix also monitors social media for ongoing conversations where there may be room for it to add to the conversation. 

While it may not be able to identify social media users in its database, this is an extension of using clusters of relevant customers/prospects wherever it may find them.

## How organized is your data, and in how many distinct systems does it reside?

A 2019 survey of 200 IT, data science, and data engineering professionals at North American organizations with at least 1,000 employees found that the mean number of data sources per organization is 400. More than 20 percent of companies surveyed were drawing from 1,000 or more data sources to feed BI (Business Intelligence) and analytics systems.

The average Fortune 1000 company has around 48 applications and 14 databases, for a total of 62 potential systems that it can integrate. Chances are that your corporation is no different.

Take our hypothetical grocery as an example:

A. Products purchased by customers are in the POS (Point of Sale) system

B. Customer demographics and shopping frequency etc. are likely in the CRM (Customer Relationship Management) system

C. The suppliers are probably dealt with through some ERP (Enterprise Resource Planning) system

D. In-store inventory, returns, etc. are in some other system with a tie into the supplier systems for re-ordering

E. Historical pricing information may be in some other system

For the Machine Learning algorithm to take into account and test various variables, it is critical to:

A. Provide it with access to all these relevant data sources and elements

B. Ensure that the algorithm will have access to all such data sources on an ongoing basis so that the data never goes stale

## How frequently does your data get updated?

Every organization is different, and each has its own time frame for updating its data. We know that data is crucial for our Machine Learning model to produce highly accurate results. Training your machine learning model and then deploying it is not a one-time thing; the model continuously needs to be updated because the surrounding environment always changes. Weather, holidays, long weekends, and the school year can all impact customer buying behavior, so new data needs to be fed to the model to maintain its predictive accuracy. Imagine powering a modern chatbot’s NLP (Natural Language Processing) with a language model built in the 1980s. Language evolves: pronunciations change and the meanings of words drift, so a modern chatbot’s model needs to be fed current language.

There are two ways to keep your model updated:

- Manual approach

- Continuous approach

### Manual approach

This approach consists of retraining the model with the new data. This process can be time-consuming, of course. With such an approach, a business may find out only too late that the ground underneath its feet has already shifted.

### Continuous approach

This approach incorporates new data streams into the model continuously. For example, Spotify uses collaborative filtering to provide recommendations to its users based on the preferences of users with similar tastes. Listening data is fed back into the model’s algorithm, which continuously refines the user experience. Netflix uses the same strategy for the continuous learning of its systems.
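The flavor of collaborative filtering can be sketched in plain Python: recommend an item that the most similar user has rated but our user has not. This is a toy illustration with made-up ratings, not Spotify’s actual system:

```python
import math

# Hypothetical user ratings: user -> {item: rating}; all values invented
ratings = {
    "alice": {"jazz": 5, "rock": 1, "folk": 4},
    "bob":   {"jazz": 4, "rock": 2, "folk": 5, "metal": 4},
    "carol": {"rock": 5, "metal": 4},
}

def cosine_similarity(a, b):
    # Similarity of two users based on items they have both rated
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user):
    # Find the most similar other user, then suggest something
    # they rated that `user` has not tried yet
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))
    unseen = {i: r for i, r in ratings[nearest].items() if i not in ratings[user]}
    return max(unseen, key=unseen.get) if unseen else None

print(recommend("alice"))
```

In a continuous approach, each new play or purchase updates the ratings table, so the similarities, and therefore the recommendations, shift automatically as behavior changes.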

## In what format(s) is the data?

There are multiple ways of storing data, so the data formats vary greatly. We need to understand file structure and data format because we have to combine all the data and make a single data table for the Machine Learning Model. 

Data format represents how the data is stored in memory or on disk. Most enterprise data sits in some database and is accessed through SQL, but various file formats are used for different storage and optimization needs. The table below gives a brief overview of some of them.

 


 ![table 2.png](https://books.vinpatel.com/u/table-2-nm6Q8G.png) 


## Do you have the expertise and resources required?

The expertise and resources required to deploy a machine learning model generally depend on two factors:

A. People

B. Hardware

**People**

Having people with the right skill sets on your team is just as important as the algorithm you choose. We need not only data scientists and engineers but also business analysts, since we are working on a business problem. Data scientists can take care of data-science-related issues, but they may not have the business domain knowledge required to apply Machine Learning effectively. Our goal is the automation and ongoing maintenance of the business process or system.

 ![28.png](https://books.vinpatel.com/u/28-aSnF0F.png) 

**Data Engineer**

As we discussed in the data-gathering section, the output of a machine learning algorithm depends on the depth and breadth of its data. Data engineers are responsible for integrating this data into the machine learning model. They look after data availability, data hygiene, and data refresh routines to ensure new data keeps flowing in, and they provide a methodology for estimating any missing data so that the algorithm does not treat missing data as an actual outcome.

**Data Scientist**

Data scientists explore data to extract features critical for making business decisions. They often guide engineers on how to structure the business model according to the appropriate metrics.


**DevOps Engineer**

Development and operations engineers, collectively referred to as DevOps engineers, are responsible for the deployment and maintenance of the model. They interact with engineers and data scientists to ensure the smooth working of the model, and all infrastructural needs are addressed by them.

**Business Analyst**

Business analysts look at the data provided by the data scientists and analyze business use cases to know where their model can add value to the organization.

The main obstacle that organizations face in implementing their models is the gap that exists between data engineers and business analysts.

## Do you have the infrastructure?

There are four steps involved in preparing the ML model from an operational perspective.

- Gathering and processing the data

- Training the learning model

- Storing the learning model

- Deployment of the model

Among these, training the model is the most intensive task in terms of computational power. This is where CPUs, GPUs, and TPUs come into play.

 ![table 3.png](https://vinpatel.com/u/table-3-JtCNUJ.png) 

**CPU**

A CPU, or Central Processing Unit, is the brain of any computing device and is generally used for complex calculations. CPUs are well suited to fast parsing and complex sequential logic. So, if our machine learning task is small and only needs to handle complex calculations sequentially, we can safely use CPUs.

A CPU such as the Intel i7-7500U can train an average of ~115 examples/second. But when things get a little more intensive, we might consider using a GPU.

**GPU**

A GPU, or Graphics Processing Unit, works differently from a CPU: it performs calculations in parallel and has excellent throughput. GPUs have long been popular for gaming applications and graphics engines. So, if your machine learning task is intensive, a GPU would be the better choice. A laptop with a high-end graphics card like the Nvidia GTX 1080 (8 GB VRAM) can train an average of ~14k examples/second.

**TPU**

A TPU, or Tensor Processing Unit, is an application-specific integrated circuit (ASIC) designed by Google for its machine learning framework. Google Search, Google Photos, and Google Translate all use TPUs. A TPU can deliver 15-30x better performance than a CPU or a GPU.

If your project needs TPUs for computing power, you will likely need to use an external cloud service.


## Do you have corporate buy-in?

The end goal of a Machine Learning team is to find new insights, accelerate existing processes, or confirm/disprove existing insights. In the case of new insights, a business must be prepared to act on them. Parking the findings, even for a few months, may in some cases make the algorithm and its findings outdated.

Change can be uncomfortable. Understanding the math behind the exact working of an algorithm can be daunting. But if there is buy-in at the top leadership level in a ‘better’ way of doing things, then the organization is ready for change.

## Crawl, Walk & Run

Starting down the path of ML and AI, organizations should keep the mantra of “crawl, walk, and run” in mind. The organizations that are succeeding in ML are following some basic formulas: they are harnessing new sources of data while improving their algorithms’ performance. Your goal may not be to use AI to create something new but rather to accelerate your business’ momentum.

The following quote from Dr. Martin Luther King Jr. is apt for the Machine Learning journey:

“If you can’t fly then run, if you can’t run then walk, if you can’t walk then crawl, but whatever you do you have to keep moving forward.”

**Improving Analysis**

One feasible way is to invest in software usage analytics. Tracking user interaction with the model is an efficient way to collect data about outcomes and to correct issues as they arise. It may also help organizations fine-tune their feature sets.

**The Right Team**

As discussed before, we need a team of not only data scientists and engineers but also business experts. We need all of them to work together. Having a diverse set of skills and providing them with an appropriate learning environment for analytics plays a crucial role.

**A Unified Data System**

When an organization merges all types of data into a single system, it is called a unified data system. It allows us to see a complete picture of the company’s data and provides flexibility to work with data of different formats. Enabling this is a giant step in the journey to make the Machine Learning process and program a success.

**Machine Learning is not a one-and-done exercise**

Take the case of our hypothetical grocery store. On an overall basis, the consumption pattern may not differ too much year over year. Consumption patterns on Super Bowl Sunday in 2023 may be quite similar to Super Bowl Sunday in 2024, and the same may be true for Back to School year over year.

But when it comes to clusters of customers identified by a Machine Learning algorithm, the goal should be to spot folks who are more open to, say, vegan Super Bowl Sunday ingredients, or folks who no longer have school-going kids. The job of the machine learning program is to provide inputs for the best possible outcome at each touchpoint for the customer.

Projects that are AI- and big-data-based have a high failure rate: 87% of data science projects never make it to production, and 77% of businesses report that the adoption of Big Data and AI initiatives represents a challenge. With such high failure rates, crawl, walk, and run is the best way to move forward unless an enterprise is facing an existential threat.

# Deep Learning & Neural Networks

 ![29.png](https://books.vinpatel.com/u/29-Y9THQa.png) 

## Visual Representation of a Neural Network

Deep Learning is a subfield of Machine Learning that produces high-performing prediction models. It uses a Neural Network architecture with multiple hidden layers, inspired by the structure of the human brain. In the human brain, neurons form the fundamental building blocks and transmit electrical pulses throughout our nervous system; in a neural network, perceptrons play an analogous role, receiving a list of input signals and transforming them into output signals.

Deep Learning is based on artificial neural networks built to mimic the working of the human brain. Similar to how humans learn from experience, deep learning algorithms learn iteratively. Every time a network performs a task, it adds what it learned from the approach it took to its memory. Over time, with increased practice, it improves toward a better, more accurate outcome.

 ![30.png](https://books.vinpatel.com/u/30-zz2lzW.png) 


There are two reasons why Deep Learning has taken off in the last few years:

- The availability of large datasets has pushed engineers to look for scalable ways to address the challenges and opportunities being presented. From facial recognition to speech recognition, these vast and unique datasets have meant that traditional manual labelling of a large and representative dataset is no longer required.

- Deep learning requires high-performance GPUs for computation. GPU manufacturers like Nvidia and AMD have made it possible to train deep learning algorithms in a time-efficient manner.

The availability of these high-powered computing resources on a usage-based model (via on-demand cloud services) has made it easier for researchers, startups, and established players to approach the Deep Learning space in a serious manner.

## How do Deep Learning & Neural Networks Work?

Deep Learning models learn by discovering intricate structures in the data. These models are composed of multiple processing layers called hidden layers. We can add multiple hidden layers between the input layer and the output layer to generate high-accuracy prediction models.

**Deep Neural Networks (DNN) as an Artificial Neural Network (ANN)**

Artificial neural networks consist of networks of neurons computationally designed to solve a specific problem using different layers.

A Deep Neural Network (DNN) is an artificial neural network (ANN) made up of multiple layers. Each layer can serve a specific purpose, and for a complex problem this is what necessitates multiple layers.

 ![31.png](https://books.vinpatel.com/u/31-cWTfqv.png) 


## Components of ANN (Artificial Neural Networks)

We will discuss how ANNs work by breaking them down into components and exploring each one.

**Nodes**

Artificial neural networks consist of a collection of nodes called artificial neurons. Nodes are points in these networks where computation takes place. Just like the neurons in our brain receive a stimulus, process the information, and transmit signals to other neurons, these nodes work the same way.

**Weights**

Each node in the artificial neural network is accompanied by weights that assign relative importance to the input. When the inputs are fed into the network, the weights or coefficients are multiplied by that input. This input-weight product is summed with a bias factor. It is then passed to the node’s activation function.

**Bias**

A bias is a constant added to the model to manage error in prediction. Before starting the iterations, we assign some default value such as 1, which changes with each iteration to reduce errors in prediction. You can think of the bias as the output when the input is zero.

 ![32.png](https://books.vinpatel.com/u/32-Qb5W9L.png) 


**Activation Function**

The activation function determines whether this input-weight product summed with the bias holds significant value to progress further in the network or not. If it does not, it is simply not “fired”. There are many types of activation functions that work for particular applications like the step function, linear function, sigmoid function, etc. We will examine their features along with their drawbacks in the comparison table below.

 ![table 4.png](https://books.vinpatel.com/u/table-4-TMQN3a.png) 
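Putting the pieces together, inputs, weights, bias, and an activation function, a single artificial neuron can be sketched in a few lines of Python (all numbers are illustrative, not from a trained model):

```python
import math

def sigmoid(x):
    # A common activation function: squashes any value into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus the bias, passed through
    # the activation function
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Illustrative numbers only: two inputs, two weights, one bias
output = neuron(inputs=[0.5, 0.8], weights=[0.4, -0.2], bias=0.1)
print(round(output, 3))
```

A full network simply repeats this computation for many neurons across many layers; training is the process of finding the weights and biases that make the final outputs useful.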

## Layers in Deep Neural Network

Deep neural networks are composed of layers of nodes where the inputs are transformed through some activation function and passed to the output layer. The first layer is called the input layer. The last layer is called the output layer, and all the layers in between are called the hidden layers. The more hidden layers we have, the ‘deeper’ a neural network is said to be.

The number of hidden layers depends on the complexity of the problem. Adding layers may increase the accuracy of our results, but raising them beyond a sufficient number might result in overfitting of the model.

**Calculating the threshold of acceptance of results**

Building a deep learning model means that our ultimate goal is to make our results useful for whatever problem we are trying to solve. Focusing on the outcome is a direct measure of our deep learning model’s performance. But how do we determine the correctness of our results? We set up a threshold for this purpose. A threshold function acts as a binary classifier in our neural network. The results above the threshold are assigned the value of 1 (accepted). And the results below the threshold are assigned the value of 0 (rejected).

**1. Is there an ideal number?**

Based on the threshold, each sample is assigned to one of the classes. Based on our model’s performance in terms of sensitivity (the occurrence of true positive results, i.e. few false negatives) and specificity (the occurrence of true negative results, i.e. few false positives), we can fine-tune our threshold to accommodate the samples. So the inputs and their weights can be manipulated to reach the desired outcome through thresholding. There is no universally ideal number, like 0.5 between 0 and 1, because the threshold is problem-dependent.

For example, our hypothetical grocery store would like to email a 15% off coupon to users who abandon their shopping carts. We would build a classifier that predicts which users will not proceed to checkout, because we should not plan to display coupons to regular customers who are going to proceed through to checkout anyway. So we may say that users with an abandonment score above 0.8 should get the coupon.
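The thresholding step itself is tiny. A minimal sketch, with invented customer IDs and scores (a real system would get the scores from the trained classifier):

```python
def coupon_decision(abandonment_score, threshold=0.8):
    # Send the 15%-off coupon only when the predicted abandonment
    # score clears the threshold
    return abandonment_score >= threshold

# Hypothetical scores produced by a classifier for three customers
scores = {"cust_a": 0.92, "cust_b": 0.35, "cust_c": 0.81}
recipients = [c for c, s in scores.items() if coupon_decision(s)]
print(recipients)
```

Moving the threshold up or down is the business lever: a lower threshold sends more coupons (including to some customers who would have checked out anyway), while a higher one sends fewer but more targeted ones.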

**2. Determining the threshold**

The nature of the deep learning model is that you need to make tweaks to the nodes and layers and then re-evaluate your model.

You start by initializing all the weights to random values (including the threshold of each neuron). We then train our model by feeding it our inputs and compute the error by taking the difference between the expected result and our network’s output. The measure of this error is called the cost function. Ideally, we want our cost function to be zero.

The challenge is the same: we need to figure out the weights. For this purpose, we use optimization techniques like backpropagation, which adjusts the weights (including the thresholds) a little in each iteration until the network gives the right output for the given inputs.
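The iterative adjust-and-re-measure idea can be sketched with a toy one-weight model trained by gradient descent, a simplified stand-in for full backpropagation (all numbers are illustrative):

```python
# Toy model: predict y from x with a single weight, y = w * x.
# The data follows y = 2x, so training should push w toward 2.0.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0            # start from an arbitrary initial weight
learning_rate = 0.05

for _ in range(200):            # repeat many small iterations
    for x, target in data:
        prediction = w * x
        error = prediction - target      # the cost is based on this error
        w -= learning_rate * error * x   # nudge the weight to shrink the error

print(round(w, 2))
```

Backpropagation does the same thing across thousands or millions of weights at once, computing for each weight how much it contributed to the error and nudging it in the direction that reduces the cost.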

There are two critical components to the Deep Learning Model:

A. Cells and
B. Layers

**Cells**

We can consider a cell in Deep Learning as the equivalent of a neuron in the human brain. Much like a neuron that transmits information to other neurons in the brain, a cell in Deep Learning sends information to other cells to generate the best overall outcome. Cells can transmit information forward or backward, can be connected to cells in separate layers, can hold values, and can carry different weights reflecting how critical the information they supply is.

**A basic neural network cell**

This type of cell is the most fundamental type of neural network cell. It is used in feed-forward architecture, a network in which information moves in the forward direction only. The cell is connected to other neurons in the network through weighted connections, and each connection carries its own weight.

The weights are initialized as random numbers at the start, and they can be positive, negative, big, small, or even zero. The incoming inputs that become the value of these cells are multiplied by the weights and added together. There is one extra value called the bias, which has its own connection weight. This bias value ensures that even if all the inputs are zero, there can still be some activation of the neuron.

**Layers**

A layer is a group of neurons that are connected to neurons in other layers, but not to neurons within the same group. Each layer can focus on a specific part of the overall task.

 ![33.png](https://books.vinpatel.com/u/33-lUZ5kk.png) 

**A visual depiction of layers in a Neural Network**

In basic supervised models, as a user, we can see the input layer comprising the input variables and the output layers which provide the predicted value. But when the algorithm moves into the realm of AI, it is not possible to know how each element may be impacting the end output.

## Potential use cases for an Enterprise

Deep learning finally made speech recognition accurate enough to make Amazon’s Alexa and Apple’s Siri more conversational. Rohit Prasad, vice president and head scientist of Alexa, says that the ultimate goal for Alexa is long-term conversation capabilities.

The challenge in NLP has always been the interpretation of the human language. The deep learning approach in NLP (Natural Language Processing) is being widely used for Predictive Analysis in these speech assistants to provide an appropriate response to humans.

Usually, modelling data in AI requires training the algorithm with labelled data and then exposing it to unseen datasets for prediction. Deep learning takes this one step further by learning directly from the raw audio or speech without manually engineered features, a capability known as automatic feature learning. However, the DL algorithm may take several tries to make this process more accurate.

In our grocery store example, deep learning can be a significant enabler of search and personalization. Based on the contents of the user’s fridge (if provided by the user) and their previous purchase patterns, we can create a personal shopping experience by suggesting a shopping list.

We can use deep learning to have conversational agents that can answer phone calls about online grocery deliveries. These neural agents are no different than chatbots.

Google also provides a virtual agent service to customers built entirely on its own AI infrastructure using speech recognition, speech synthesis, and natural language processing.

Social media platforms are also leveraging the power of deep learning. Take the example of Pinterest: it uses a visual search tool to zoom in on specific objects in an image and recommend Related Pins. Like other industry giants such as Facebook, Google, and IBM, Pinterest also has to deal with a lot of data to train its artificial neural networks and optimize its services.

## Different Types of Neural Networks

Here we will discuss different types of neural networks along with their applications. They each have their unique strengths and work on different principles to determine the outcome.

 ![table 5.png](https://books.vinpatel.com/u/table-5-zOGqb3.png) 

### Feed Forward Neural Network (FF or FFNN)

 ![34.png](https://books.vinpatel.com/u/34-gZ7zBu.png) 

This type of neural network was the first type of artificial network created for use in the machine learning framework. These networks are called “feed-forward” because the information flows in one direction only: from the input layer, through the hidden layers, and finally to the output layer. Its working principle is what we discussed previously for a simple neural network: the input values are multiplied by the weights, and the result is fed to the next layer as output. A classifying activation function, generally a Sigmoid function or ReLU, decides which values to pass through.
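A minimal sketch of a forward pass through such a network in Python, with illustrative (untrained) weights: two inputs flow through a two-neuron hidden layer into a single sigmoid output.

```python
import math

def relu(x):
    # ReLU activation: pass positive values, zero out the rest
    return max(0.0, x)

def layer(inputs, weights, biases, activation):
    # One fully connected layer: each output neuron takes a weighted
    # sum of all inputs plus its bias, then applies the activation
    return [
        activation(sum(i * w for i, w in zip(inputs, neuron_w)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

# Illustrative weights for a 2-input -> 2-hidden -> 1-output network
hidden = layer([1.0, 0.5],
               weights=[[0.3, -0.4], [0.6, 0.1]],
               biases=[0.0, -0.2],
               activation=relu)
output = layer(hidden,
               weights=[[0.8, -0.5]],
               biases=[0.1],
               activation=lambda z: 1 / (1 + math.exp(-z)))  # sigmoid output
print(output)
```

Information only ever moves left to right here, which is exactly what makes the architecture “feed-forward”; learning happens separately, by propagating errors backward to adjust the weights.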

How does this neural network learn? 

It learns through the backpropagation algorithm. The output that the network generates is compared with the actual output, and the weights are then updated to gradually reduce the error. These neural networks are less complicated than other types of neural networks, and faster because of one-way propagation.

#### Potential Use case

Feed-forward neural networks are used in classification tasks such as image recognition and computer vision, where inputs need to be assigned to target classes.

### Recurrent Neural Networks (RNN)

Recurrent neural networks are the class of networks in which the output of a layer is fed back to it as input, which helps predict the outcome of that layer. The neuron cells in this network act as memory cells, storing information from the previous time step for future use. An RNN combines feed-forward and recurrent behavior: the first layer works like a feed-forward architecture, and the subsequent layers have recurrent architecture.

#### Potential Use case

Recurrent neural networks came into existence when we needed to predict the next word in a sequence, so retaining information was key. Natural Language Processing (NLP) and anomaly detection use this type of network. For example, an RNN can be trained to generate a custom, templated report for a customer using that customer’s relevant information.

 
Similarly, with fraudulent activities on the internet, automated algorithms look for a discernible pattern. An RNN can notice suspicious behaviour because it has already explored the data and ‘knows’ the expected turn of events.

### Convolution Neural Network (CNN)

 ![35.png](https://books.vinpatel.com/u/35-pyrwFj.png) 

**A Visual Depiction of a Convolution Neural Network (CNN)**

A Convolutional Neural Network is commonly applied in image processing to analyze visual imagery. The first layer acts as a convolutional layer and applies a convolution operation to the input. This operation multiplies the input data by a set of weights called a filter.

The CNN processes information in patches, like a sliding filter: it examines one part of the image before moving on to the next until it has processed the full image. In convolutional layers, the neuron cells are not fully connected to every other cell in the network. These layers also use pooling cells, which extract the most relevant information from the inputs.

#### Potential Use case

Companies use CNN in simple applications like facial recognition. The CNN algorithm identifies every face in the picture and then identifies the unique features of the face. By comparing this data with the stored data in the database, it matches the face with a name. Social media uses this face recognition feature to tag your friends in photographs.

The healthcare industry also uses CNN for medical image computing. It detects anomalies in the X-ray and MRI images more accurately than the human eye. The algorithm has access to the Public Health Records, where it can see the differences between the images for predictive analysis.

### Radial Basis Function Neural Network (RBFNN)

This type of neural network uses a radial basis function as its activation function. This function measures the distance of a point relative to a center. During training, each neuron stores a value as a prototype. When an input is received, the neuron cells calculate the Euclidean distance between the input and the prototype value and decide the category to which the input belongs.
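A minimal Python sketch of an RBF neuron’s distance-based response and a nearest-prototype decision, with hypothetical class prototypes:

```python
import math

def rbf_activation(x, prototype, width=1.0):
    # Gaussian radial basis: the response falls off smoothly as the
    # Euclidean distance from the stored prototype grows
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-dist_sq / (2 * width ** 2))

def classify(x, prototypes):
    # Pick the class whose prototype neuron responds most strongly
    return max(prototypes, key=lambda c: rbf_activation(x, prototypes[c]))

# Hypothetical prototypes learned for two classes
prototypes = {"class_a": [0.0, 0.0], "class_b": [3.0, 3.0]}
label = classify([0.5, 0.2], prototypes)
print(label)
```

The input [0.5, 0.2] sits much closer to class_a’s prototype than to class_b’s, so the class_a neuron fires far more strongly and wins the vote.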

#### Potential Use case

RBFNNs have found applications in power restoration systems. As power systems have become larger and more sophisticated, power outage issues have increased considerably. This algorithm helps restore power in the shortest possible time frame.


### Modular Neural Network (MNN)

 ![36.png](https://books.vinpatel.com/u/36-f4mjqJ.png) 

**A visual representation of a Modular Neural Network**

A modular neural network consists of many networks that work independently and perform sub-tasks. They do not interact with each other, and each module is responsible for its own action. As a result, modular networks can work considerably faster on complex computational processes.

#### Potential Use case

MNN models are popular for certain applications like stock price prediction systems. Such a system is based on various independent modules that each work on their own input and process the information. A final processing unit gathers the results of all the modules to produce the output.

https://ieeexplore.ieee.org/document/5726498

## Some of the Deep learning use cases examples

**Robotics**

Much of the recent development in robotics is due to deep learning and AI. In the near future, we will see robotics used in driverless cars, in sensing and responding to the environment, in smart homes, and in many more applications. All of this is possible due to deep learning and AI adoption.

**Agriculture**

Deep learning is helping farmers distinguish between crop plants and weeds and spot signs of infestation and plant disease. This information helps the machines decide what to spray and where. This saves the farmer a great deal of time and gets better results. Deep Learning has very broad scope for implementation in agriculture.

**Healthcare**

The Convolutional Neural Network is used for the classification of images, and we know the significance of classification when it comes to medical images. For example, a dermatologist can use deep learning to classify skin cancer, or it can enable earlier detection of breast cancer. Some organizations have received FDA approvals for deep learning algorithms for diagnostic analysis. Faster and more accurate classification of images can greatly improve outcomes versus relying solely on human examination of X-rays and MRI results.




# Myth Buster Section

In this chapter, we are going to discuss seven common Machine Learning myths that need to be cleared up.


## Myth 1: Machine Learning can do anything with massive amounts of data

There is a general perception regarding data for ML: "the more, the better". Although we discussed in previous chapters how much data we may need, we also stressed the importance of having "relevant and clean" data.

We need sufficient data and high processing power systems for our ML model to train, so that it produces a model with a high degree of accuracy when it comes to prediction. Specifically, Deep Learning models and NLP models require millions of records to be trained. But these data elements must be relevant and clean, otherwise, our ML model will be unable to generate the result that we are expecting.

There isn't going to be any significant positive impact on your ML model if the data is messy and irrelevant to the problem at hand. Sometimes, it can even harm the performance of your model. 

Let us discuss a use case in robot reasoning where a model was trained with very little data.

You probably have seen CAPTCHAS on websites to determine whether you are a human or a robot. Researchers have developed models that can break those CAPTCHAS with 67% accuracy using only 5 training examples. 

So, having mountains of data is NOT what matters most; having clean, relevant data is.

## Myth 2: Machine Learning is about computer programs learning to think like a human brain.

We humans have physical and emotional limitations when it comes to our brains. We are sometimes biased and emotional when making decisions, and we make mistakes when analyzing tremendous amounts of data. Computer programs have no such physical or emotional limitations, but our Machine Learning algorithms can still pick up human biases if those biases are baked into the training data.

ML identifies intricate patterns and can analyze data better than humans. Here is an example that illustrates this.

Researchers at the National Institutes of Health and Global Good developed an AI model that can analyze digital images and detect signs of cervical cancer that may go undetected by human experts.

In a project initiated by Google called AutoML, they built machine learning software that can teach other ML software. The system categorized objects in images with 43 percent accuracy, while systems built by human experts scored 39 percent.

## Myth 3: Machine Learning Can Work Independently Without Human Intervention

Humans build the algorithms and the techniques used to develop a machine learning model. Providing cleaner data in its proper format and continuously updating the dataset is something that requires human intervention.

Machine Learning algorithms cannot work independently.  We need Data Scientists for decision-making and to avoid new risks that algorithms introduce. We need human intervention to test and re-train the model with new data and new algorithms.

We also need human intervention to do feature engineering and derive new features that carry more significance in data modelling. These researchers tell the model what to learn, how to learn, and how the model and its findings should be deployed and evaluated. Fine-tuning the weights in a deep learning network to find the best model also requires the supervision of experts.
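As an illustration, here is a minimal sketch of feature engineering in Python; the sales records and derived features are invented for the example:

```python
from datetime import date

# Hypothetical raw sales records: (ISO date string, units sold)
raw = [("2024-01-05", 120), ("2024-01-06", 240), ("2024-01-08", 95)]

def engineer_features(record):
    """Derive new features a human deems predictive: weekday and a weekend flag."""
    day_str, units = record
    d = date.fromisoformat(day_str)
    return {
        "units": units,
        "day_of_week": d.weekday(),      # 0 = Monday ... 6 = Sunday
        "is_weekend": d.weekday() >= 5,  # a human's hypothesis: weekends sell more
    }

rows = [engineer_features(r) for r in raw]
print(rows[1]["is_weekend"])  # 2024-01-06 is a Saturday → True
```

The model never sees "Saturday" in the raw data; a person decided that the day of the week might matter and made it visible to the algorithm.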

## Myth 4: Machine Learning Will Take Over Human Work

Yes, machine learning is going to replace humans for many tasks. For example, robots work in Amazon warehouses, moving things around for picking and packing. Every 10-15 years, old skills and talent get replaced with new skills and talent. The same trend holds in ML and AI. Erik Brynjolfsson, Director of the MIT Center for Digital Business, says:

“Managers who know how to use machine learning will replace managers who don’t”. And we hear the very same thing now with AI.

This is going to be the winning mantra. The modern digital revolution requires humans and machines to work side by side: using machine learning to automate business processes and using the interpersonal skills of humans to take businesses to the next step is what will set the winners apart from the others.

We need to be decision-makers and input providers to our models.

## Myth 5: Machine learning can predict the future

Machine learning models are trained on historical data, based on which they predict future, unseen instances. They can generate insightful business results, or recommend products, services or entertainment according to a person's past behavior and profile.

But right now, AI is not in a position to predict an uncertain future or black swan events. For example, it could not predict the sales of a particular product when the world was hit by the COVID-19 pandemic, because the past data did not look like the present.

Also, it still lacks the creative thinking ability of humans despite the availability of large amounts of data and math-based technology.

In our grocery store example, extensive sales of a men's product like beard wax (which was not popular in the past) would be hard for a Machine Learning algorithm to predict. Yet the growing trend of beard wax use in the US can be seen in the Google Trends image below.

 ![Screenshot 2025-09-01 at 2.02.26 PM.png](https://books.vinpatel.com/u/screenshot-2025-09-01-at-2-02-26-pm-69CPXV.png)  

## Myth 6: Machine Learning is the Objective

ML relies on data to function, which can give the impression that it is objective. However, the data may reflect human biases and preconceptions that produce stereotypical outcomes.

An example of this is Amazon's attempt to use recruitment software for software development and other technical roles. The resumes fed to the software mostly came from the past ten years, and mostly from men. So the model concluded that male candidates were preferable to female candidates for these positions. Amazon later recognized the error and reworked the program toward a bias-aware algorithm.

## Myth 7: Machine Learning is difficult to implement in the Business

Businesses that are using ML have a competitive advantage over those that are not. Even small companies can use AI and ML to provide a better user experience to their customers. 

As we know, AI and ML models need large amounts of relevant data for training. But small companies and startups often do not have enough data, and seldom have high-end systems for training AI and ML models. In this case, they can use open data sources such as the World Bank, Data.gov and open datasets on Kaggle, together with cloud computing resources such as AWS and Azure, to build their AI and ML models.

 ![37.png](https://books.vinpatel.com/u/37-mjm1Qp.png) 


ML can assist you in your existing jobs and increase the productivity and efficiency of your business. In previous chapters we also discussed, in detail, the checklist for making your organization machine-learning ready and what you can do to achieve it.

# Programming Languages Used for Machine Learning

When you are planning to build AI and ML models for your business process, it is very important to select the right programming language after deciding the business objective and relevant data.  

One of the significant factors in deciding the programming language for your machine learning is the end goal of your project. Here we will discuss the main programming languages that can be used for Machine Learning, along with the associated use cases to see the effectiveness of each in different domains.

## Python

Python is one of the most prominent languages used in machine learning. It is an open-source language with a relatively easier syntax than other languages; you can implement classes, modules, and objects with remarkable ease. 57% of data scientists prefer using Python for machine learning.

What could be the reason for its popularity? The answer lies in its tools and libraries for the machine learning framework.

NumPy and Pandas are popular libraries for mathematical work such as linear algebra and for the extraction and manipulation of datasets. They are used for handling structured data, images and sounds.
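As a small taste of the linear algebra NumPy handles, here is a minimal sketch, assuming NumPy is installed, that fits a least-squares line to a tiny invented dataset (hours studied vs. exam score):

```python
import numpy as np

# Invented data: hours studied vs. exam score
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.1, 5.9, 8.2])

# Least-squares fit in one linear-algebra call
X_design = np.hstack([X, np.ones_like(X)])          # add an intercept column
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
slope, intercept = coef
print(round(float(slope), 2))  # → 2.04
```

One library call does the matrix math that would take pages to code by hand, which is exactly why Python dominates this space.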

The scikit-learn library supports many ML algorithms, such as Decision Trees, Ensemble Learning, KNN and SVM, along with clustering and dimensionality-reduction techniques such as PCA.
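Scikit-learn implements KNN for you, but the idea behind it can be sketched in a few lines of plain Python; the points and labels below are invented for illustration:

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points labeled "A" or "B"
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B")]

print(knn_predict(train, (2, 2)))  # → A
print(knn_predict(train, (8, 7)))  # → B
```

In practice you would call scikit-learn's tuned implementation rather than rolling your own, but the underlying logic is this simple.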

TensorFlow is another excellent environment for building and training deep neural networks. It runs on a wide range of platforms, including desktops, phones, and servers, and includes the Keras library for building models such as ANNs, CNNs, RNNs, LSTMs and many more.
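What frameworks like TensorFlow automate across millions of weights can be illustrated by a single neuron trained with gradient descent in plain Python; the toy data below targets the relationship y = 2x:

```python
# One neuron learning y = 2x by gradient descent: the core loop that
# deep learning frameworks like TensorFlow automate at massive scale.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, lr = 0.0, 0.05   # initial weight, learning rate

for _ in range(200):                 # training epochs
    for x, y in data:
        pred = w * x                 # forward pass
        grad = 2 * (pred - y) * x    # gradient of squared error w.r.t. w
        w -= lr * grad               # weight update
print(round(w, 2))  # → 2.0
```

A real network repeats this forward-pass/gradient/update cycle over millions of weights at once, which is why specialized frameworks and hardware matter.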

**Potential Use Case**

Due to its wide variety of specialized libraries and simple-to-understand syntax, Python is used for all types of ML algorithms, and even for Natural Language Processing (NLP) and sentiment analysis. Voice assistants like Amazon's Alexa and Apple's Siri, and face recognition features on social media, have been built using Python.

## R

R is another popular and powerful language, used by data miners and statisticians around the world, and it supports operating systems like Windows, Linux, and OS X. It is an open-source, object-oriented, and dynamic language. It has LaTeX-like documentation (LaTeX is the de facto standard for the communication and publication of scientific documents), which helps the user write programs. R provides different modules for advanced statistical and graphical analysis; the ggplot2 package is used for graphical analysis. Some of the machine learning packages used in R include caret, kernlab, rpart, and an interface to TensorFlow.

RStudio is a GUI for writing and executing R programs, and R Markdown can be used to embed text and program results within a document. Multiple R objects can be saved together in '.RData' (or '.rda') files, which can later be loaded to restore the data; the '.rds' format is used for a single R object.

**Potential Use Case**

R finds its applications in data mining, predictive analysis, and biomedical statistics. Several other languages for artificial intelligence development include Lisp, Ada, Julia, Go, Shell and Prolog, but not many people are well-acquainted with these.

## C++

C++ is the second most widely used programming language here, with 43% of data scientists preferring it over others. It is a lower-level language than Java, sitting closer to machine code, which is also why it has faster run times. Most data scientists prefer Python for its easier syntax, but C++ is preferred for faster execution, even on larger datasets. C++ also provides rich library support through the STL (Standard Template Library), and the TensorFlow C++ API is used in the backend of many ML applications for faster computation.

Machine learning developers have a goal to focus on, and they do not want to get stuck in the intricacies of the code. That is why many prefer Python over C++ for development. C++ also does not provide a collaborative environment the way Python does: Jupyter Notebooks and Google Colab are interactive Python environments where users can learn and share their code easily.

**Potential Use Case**

The task-specific algorithms in AI use C++, for example, in gaming applications, robotics, and computer vision.


## C# (pronounced C Sharp)

C# is also a versatile programming language that is simple, object-oriented, and open-source. Programmers can build applications like Windows clients, mobile apps, web applications, and consoles. It is among the top 5 programming languages used in ML.

C# is used in machine learning via the ML.NET framework, which integrates with packages like TensorFlow. It allows developers to fuse custom-created models and algorithms into applications using .NET, even without strong experience in machine learning.

ML.NET is a machine learning framework for .NET(C#) developers. It can be used for various ML projects such as scenario analysis, product price prediction, product recommendation, sales forecast and many more.

**Potential Use Case**

Using C# for deep learning is not easy, as few toolkits are available for the purpose. You need a deep understanding of the language before implementing your model, as it is a bit more complex than Python.

## Java & JavaScript

Java closely resembles C/C++ in its syntax and also follows object-oriented principles. It is also easily implemented on various platforms. Some of the Java libraries for the machine learning framework include JavaML, ADAMS (Advanced Data Mining and Machine Learning System), and Deeplearning4j.

JavaML provides a simple implementation environment for machine learning algorithms. The ADAMS library helps control data flow in a tree-like structure to manage workflows. Deeplearning4j is another popular library that provides support for deep neural networks. It is commonly used for finding patterns in speech and text and for detecting anomalies in fraud detection. Java is, however, slower than C++.

JavaScript is a web scripting language, and many machine learning libraries support JavaScript.

**Potential Use Case**

Java is a more enterprise-focused ML language. It is used to build applications in financial institutions, mostly for fraud detection and network security.


 ![table 6.png](https://books.vinpatel.com/u/table-6-Artvnu.png) 

# Infrastructure for AI/ML 

The biggest challenge an organization faces while adopting Machine Learning is the infrastructure cost. As we have seen, we need a significant amount of data and high-powered processing systems for Machine Learning and self-hosted AI applications. The ideal solution for this is the cloud environment: investing in dedicated hardware, infrastructure and skilled personnel is not an option most organizations have.

Gordon Moore, co-founder of Intel, made a famous prediction in 1965, now known as Moore's Law. He stated that:

‘The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly, over the short term, this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.’

The essence of Moore's Law is that with each passing year, the cost of computing power should halve. But the actual drop in computing costs has been even steeper. Take a look at the chart below.

 ![38.png](https://books.vinpatel.com/u/38-Q6uKUb.png) 

## Computing costs are falling faster than what even Moore's Law could predict


It is this dramatic decrease in costs at scale that has taken Machine Learning and AI from the statistical and mathematical halls of Universities and the domain of ultra-large corporations and made it available to the average corporation and committed entrepreneurs. 

The decrease in computing costs has also come at a time when growth in data has exploded.

 ![39.png](https://books.vinpatel.com/u/39-YLIQiu.png) 

This alignment of the stars could not have been planned. These two trends are what make us believe that we are living in very interesting times. 

When you hear of digital identities being given in China, tech giants getting into healthcare, and self-driving cars being around the corner, it is the abundance of computing power and the ability to handle the massive amounts of data that are to be thanked for.

 ![40.png](https://books.vinpatel.com/u/40-xzLUFM.png) 

## Growth of storage in the Public Cloud

Machine learning and cloud computing together form the Intelligent Cloud. Instead of maintaining your own IT infrastructure, you can use cloud computing to provide you with storage and processing power. This, combined with the machine learning capabilities, will allow intelligent clouds to learn from the vast amounts of data and analyze the situations better. 

Cloud computing is divided into three essential cloud models:

- SaaS: Software as a Service that runs in the Cloud
- IaaS: Infrastructure as a Service that runs in the Cloud
- PaaS: Platform as a Service that runs in the Cloud

 ![55 table.png](https://books.vinpatel.com/u/55-table-IRQuvE.png) 


### Software as a service (SaaS)

SaaS is a cloud computing model in which applications are delivered as a service managed by a third-party provider. It is the most dominant cloud model, making up about two-thirds of the public cloud back in 2017. Most SaaS applications require no installation and run directly in the web browser, so the end-user does not need to care about the operating system or hardware.

There are many benefits to adopting the SaaS model: it is less expensive and easy to implement, with high adoption rates, easy data recovery mechanisms and minimal upfront commitment. There are a few risks, such as control and security of the application, so it is very important to select a trusted provider.

### Platform as a service (PaaS)

PaaS delivers the software and development tools necessary to build applications. It gives programmers a framework they can use to create customized applications. PaaS services may include database management, operating systems, and middleware.

PaaS offers a provider-managed infrastructure for building and deploying your application at lower cost and in less time. But you have little control over data protection and network bandwidth, which can pose challenges. A few popular service providers are Google App Engine, Windows Azure and OpenShift.

 ![41.png](https://books.vinpatel.com/u/41-rffrsv.png) 


### Infrastructure as a service (IaaS)

IaaS is a cloud computing model that delivers virtual computing resources like storage, networking, and virtual servers to the end-user via the internet. This is ideal for companies that want to build their applications from scratch and keep control over all the elements. You do need the required technical skills, but IaaS users find it easier to innovate and deploy services, and it also cuts maintenance costs. It allows enterprises to use resources on-demand without having to buy hardware outright. Some of the key benefits of adopting IaaS are improved security, high availability, scalability, cost saving and time saving. IaaS providers include AWS EC2, Google Compute Engine (GCE), Microsoft Azure, and VPSie.

As an organization, you should select a cloud computing model based on your requirements, implementation and need for scalability. IaaS is commonly used due to its strong security features.

## The Public & Private Cloud

### Public Cloud

Public cloud is a computing model where users access resources like storage, applications, and computing power via the internet. You do not need your own hardware, and it provides effectively unlimited scalability. It is a multi-tenanted environment, meaning data from various organizations may be stored on the same physical servers, sharing resources. Public clouds generally have a pay-as-you-go pricing model. Examples of public cloud service providers include Microsoft Azure, Amazon Web Services, and Google Cloud.

Public cloud is best suited for:

- Storing and archiving data
- Companies with a lot of customers in IT and business infrastructure
- Providing high scalability environment for heavy workloads
- Situations when organizations need to be cost-efficient, as it reduces the cost of hardware and maintenance

**Limitations of public cloud**

- The data is less secure on a public cloud infrastructure as it is being handled by a third-party
- The cost can rise steeply for large enterprises, as in this model you pay for what you use
- Some companies may have a complex architecture, but the public cloud offers a generic environment for managing business operations

### Private Cloud

The private cloud also provides nearly the same benefits as the public cloud but with greater security. This infrastructure is ideal for large businesses. The data center may be physically located in the company’s premises or operated by a third-party provider. The private cloud gives you more control over the data since the infrastructure is maintained on a secured private network. Examples of private cloud service providers include Microsoft Azure, Amazon Web Services, IBM, and Cisco.

Private cloud is best suited for:

- Companies that require security and advanced privacy over their IT infrastructure, like government agencies
- Organizations that need custom flexibility and scalability via “cloud bursting”, i.e., non-sensitive data on the public cloud and critical information on the private cloud
- Large enterprises that can afford the costs of running advanced data centres

**Limitations of private cloud**

Private cloud is a relatively more expensive solution than the public cloud model because of hardware maintenance. Other than hardware, you also need licenses for software applications. 

**Key Cloud Service Providers**

 ![42.png](https://books.vinpatel.com/u/42-0LFluv.png) 



 ![table 7.png](https://books.vinpatel.com/u/table-7-au5PrI.png)  



# LLMs, Agents and What Comes Next

The rise and adoption of LLMs (Large Language Models), which are in some ways an extension of neural networks, has been mind-boggling:

In just 5 days post launch, ChatGPT crossed 1 million users.

Two months after launch ChatGPT had 100 million users.

Nine months after launch, ChatGPT had reached 180.5 million users.

And of course these numbers do not take into account users who access ChatGPT inside a third-party product that taps into the OpenAI API, or who use one of the many other LLMs.

## What is a LLM?

An LLM is a neural network model, with billions of parameters, that has been trained on a large corpus of data and has achieved general language understanding and generation capabilities. LLMs use statistical models to learn the relationships between words and phrases. That is why, when we interact with them, we get visions of AGI (Artificial General Intelligence, where a machine thinks like a human).

This pattern recognition aspect of a LLM can be best demonstrated by the following screenshot from ChatGPT.


 ![Screenshot 2025-09-01 at 2.36.30 PM.png](https://books.vinpatel.com/u/screenshot-2025-09-01-at-2-36-30-pm-ikbTwQ.png) 


## What is GPT?

It is also critical that we understand the GPT in ChatGPT. GPT stands for "Generative Pre-Trained Transformer". GPTs are a family of models that use the "Transformer" architecture, and this is what has enabled the visible improvement in how humans and machines interact today.

## How does a GPT Work?

 ![43.png](https://books.vinpatel.com/u/43-U0NsIw.png) 

The image above shows input text entering the Encoder as a "prompt". The encoder analyzes each word in the input prompt and refines it if required. The prompt text is then vectorized, i.e., converted to a numerical set of values in which related words and concepts sit closer to each other than unrelated ones.

These values are then processed by hidden layers and passed to a "Decoder", which transforms the vectorized numerical values back into language the human user understands. The hidden layer(s) contain nodes and weights which process the inputs and help generate the output.

While an algorithm may not understand the meaning of the words it vectorizes, it does capture the relationships between words, as shown in the diagram below.

 ![44.png](https://books.vinpatel.com/u/44-5qHnsv.png) 
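The word-relationship idea can be sketched with toy vectors and cosine similarity; the 3-dimensional "embeddings" below are hand-made for illustration (real models use hundreds of dimensions):

```python
from math import sqrt

# Toy, hand-made 3-d "embeddings"; values are invented for illustration
vec = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
}

def cosine(a, b):
    """Similarity of two vectors: 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# The classic analogy: king - man + woman should land nearest to queen
target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
best = max(vec, key=lambda word: cosine(vec[word], target))
print(best)  # → queen
```

The arithmetic works because directions in the vector space encode relationships (here, a contrived "gender" direction), which is exactly what the diagram depicts.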

This vectorized Encoder and Decoder framework is the “Pre-Trained” part of a GPT. The G in the GPT (Generative) allows the program to “generate” new, never seen before output.

## How do the LLM Models do a good job of answering your question comprehensively?

A traditional search engine takes in a keyword and brings back the top results from its index: passages or documents that have resonated with searchers in the past, or that rank well in the engine's model of relevance.

In the case of an LLM, it takes the input phrase and prepares a list of related phrases using a "fan out" technique. This allows the LLM to be more comprehensive and confident in its response. So far, this approach seems to have resonated with users, who prefer getting an AI response over scrolling through links to find the piece of information they are after.
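As a rough illustration of "fan out", the sketch below expands one query into related phrasings; the templates are invented for the example, and real systems typically generate the variants with the LLM itself:

```python
# Toy "fan out": expand one query into related phrasings before gathering
# evidence for each. Templates are hypothetical, for illustration only.
def fan_out(query):
    templates = [
        "{q}",
        "what is {q}",
        "benefits of {q}",
        "{q} examples",
        "{q} vs alternatives",
    ]
    return [t.format(q=query) for t in templates]

for variant in fan_out("machine learning"):
    print(variant)
```

Answering across all the variants, rather than the literal query alone, is what lets the response feel comprehensive.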

 ![56 TABLE.png](https://books.vinpatel.com/u/56-table-AGyfkn.png) 

## Drawbacks of a GPT

The creators of a GPT model face an impossible choice. On the one hand, they need a very, very large body of training data (text, in the case of non-multi-modal models). On the other hand, they need a narrow focus so they can convey context and capture the unique nuances of a specific domain.

The easiest way to source data at scale is by harvesting and digesting the data on the web. This gets the LLMs a lot of foundational data at scale. But with much of the world's data locked in corporate systems or behind paywalls, it is no surprise that Epoch AI, the research team cited in the report, projects "with an 80% confidence interval, that the current stock of training data will be fully utilized between 2026 and 2032".

Some key drawbacks of an LLM trained on publicly available data:

**1. A bias towards the English Language**

The diagram below clearly shows that for non-English use, LLM accuracy for mission-critical applications may not reach a satisfactory level.

 ![chapter-10.jpeg](https://books.vinpatel.com/u/chapter-10-bD2zeI.jpeg) 

The same pattern seen with language bias can be extrapolated to all the other biases present in human-generated content. Biases around race, gender, political views, etc. may likewise influence LLM outputs.

**2. A lack of originality**

If we think of LLMs as one big algorithm that only knows what has already been produced, you realize these models desperately need to ingest all the "new" developments that occur. If that does not happen, they become stale.

New words, phrases and concepts cannot be created out of old data alone. The lack of access to such information is an Achilles heel of LLMs. LLMs cannot (yet) replace original thought, original findings, or creative representations of data.

Please bear in mind that “originality” here refers to an idea or thought and not strings of text that explain the same idea again. This difference is critical to understand. 

**3. Hallucination**

Given that an LLM has been trained on a large amount of data that has in all likelihood been "seen" (is public on the web), and that as a program it is mandated to produce output, we must recognize that the generated output needs to be validated.

'Hallucination' here refers to content (especially text) that may sound linguistically meaningful but is not meaningful from a domain-level perspective. Here is a famous example of a hallucination from ChatGPT that has since been corrected.


 ![Screenshot 2025-09-01 at 2.00.00 PM.png](https://books.vinpatel.com/u/screenshot-2025-09-01-at-2-00-00-pm-8EQV5s.png) 


**4. Operating a LLM in a silo**

In order to maximize employee productivity, a LLM needs to ‘speak’ to many internal systems at a corporation. But given that LLMs need to always improve themselves, can a large enterprise be comfortable letting it into its proprietary data systems and ingesting its proprietary data?

Understanding an LLM's terms of use is critical at corporations whose accumulated shareholder value could be compromised. The accidental leak at Samsung, where an employee uploaded code into ChatGPT, is a key example of how large corporations have to deal with this new risk.

## Maximizing Your Productivity with a LLM 

Despite these risks, LLMs are a great productivity enhancer if they can be incorporated into the work environment safely.

A way out for corporations that do not wish to train a third-party LLM with their proprietary data, and that want greater control, is to host an open source LLM themselves.

This is how you can and should maximize your corporate productivity with a LLM:

A. Use open source Transformer Models

With open source Transformer Models, you get the source code and can make changes as you see fit, based on your own unique business requirements.

B. Train the Transformer Model on Your Data, Your Use Cases and Your Data Sources.

Depending on the tasks you wish to automate for your employees, you can train the transformer models and create a playbook for prompts for your employees.

One of the biggest things many users miss is thinking of the likes of ChatGPT only for content creation.

Imagine how LLMs eliminate the structured language many white-collar workers have to use when dealing with internal corporate systems. That is where LLMs excel when it comes to productivity.

The rise of AI-powered code editors is an example of this new paradigm. When Satya Nadella stated that 30% of Microsoft's code was being written by AI, life seemed to have come full circle: AI appears to be eating away at the very profession that gave birth to it.

C. Self-host the LLM

By self-hosting the entire transformer model along with your proprietary data, you stay in control from a security perspective.

The rise of open source LLMs (from Meta and Mistral, for example) is a counterweight to the early closed models developed by OpenAI, Google and others. By October 2024, Meta's Llama had over 65,000 derivatives in the market, showing the interest in open source models.

A notable statement from Ashok Srivastava, Intuit's chief data officer, in an interview with VentureBeat points to how taking an off-the-shelf model, self-hosting it and fine-tuning it can make the model smaller and more accurate.

For customer-facing applications like transaction categorization in QuickBooks, the company found that its fine-tuned LLM built on Llama 3 demonstrated higher accuracy than closed alternatives. “What we find is that we can take some of these open source models and then actually trim them down and use them for domain-specific needs,” explains Srivastava. They “can be much smaller in size, much lower in latency and equal, if not greater, in accuracy.”

D. Create Agents
 
For many of the repetitive tasks that workers undertake, LLMs can be used to create Agents.

Agents can be thought of as a series of task automations, written in plain language (no code), that chain multiple steps in sequence to achieve a goal stated by a human user. Agents have the potential to automate reports, presentations and other white-collar tasks that consume a lot of time and resources in today's legacy workflows. Agents are becoming project team members that excel in specific roles.

Let's examine the well-known UK financial services firm Schroders and its multi-agent financial analysis and research assistant. Schroders was smart enough to realize that the bulk of its analysts' time was being spent on "data collection" rather than "insight generation".

Given the complexity of its use case, Schroders opted to build a multi-agent system with the following characteristics:

- Specialization: Designing agents which are hyper-focused on specific tasks (e.g., R&D Agent, Working Capital Agent, etc.) with only the necessary tools and knowledge for their respective domains.

- Modularity and scalability: Each agent is a distinct component developed, tested, and updated independently thereby simplifying development and debugging.

- Complex workflow orchestration: Multi-agent systems model their workflows as graphs of interacting agents. For example, a Porter's 5 Forces Agent designed to identify and analyze industry competition, could trigger child agents like a Threat of New Entrants Agent, in parallel or sequence, to better manage dependencies between deterministic (e.g., calculations) and non-deterministic (e.g., summarization) tasks.

- Simplified tool integration: Specialized agents can handle specific toolsets (i.e., an R&D Agent using SQL database query tools) rather than having a single agent manage numerous APIs.
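The parent/child orchestration described above can be sketched in a few lines of Python. The agent names and their logic here are hypothetical stand-ins, not Schroders' actual system; each "agent" is just a function, and the parent fans out to its children and merges their findings.

```python
# Minimal sketch of a multi-agent workflow graph. Each "agent" is a plain
# function; a parent agent triggers child agents and collects their output.
# Agent names and logic are illustrative only.

def threat_of_new_entrants_agent(company: str) -> str:
    # Deterministic placeholder for a narrowly focused child task.
    return f"{company}: barriers to entry assessed"

def supplier_power_agent(company: str) -> str:
    return f"{company}: supplier power assessed"

def porters_five_forces_agent(company: str) -> dict:
    # Parent agent: orchestrates child agents and merges their findings.
    children = {
        "new_entrants": threat_of_new_entrants_agent,
        "supplier_power": supplier_power_agent,
    }
    return {name: child(company) for name, child in children.items()}

report = porters_five_forces_agent("ExampleCo")
for section, finding in report.items():
    print(section, "->", finding)
```

In a production system, each child could be developed, tested, and updated independently, which is the modularity benefit described above.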

## How do Agents connect to each other?

These agents need to be able to take in the output from the preceding agent and pass along their own output in a format the next agent down the line can consume. This is usually achieved via an open standard, the Model Context Protocol (MCP), developed by Anthropic, the company behind Claude. While it may sound technical, the core idea is simple: give AI agents a consistent way to connect with tools, services, and data — no matter where they live or how they're built.

MCP messages are exchanged in JSON (JavaScript Object Notation). The orchestration layer is typically written in Python, since most LLM frameworks (LangChain, CrewAI, etc.) are Python-based.
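As a rough illustration, MCP is built on JSON-RPC 2.0, so a tool call travels as a small JSON message. The exact fields are defined by the MCP specification; the tool name and arguments below are hypothetical.

```python
import json

# Illustrative MCP-style message. MCP is based on JSON-RPC 2.0, so a request
# asking a server to run a tool looks roughly like this. The tool name and
# arguments are made up; consult the MCP specification for exact fields.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_invoice_total",           # hypothetical tool
        "arguments": {"invoice_id": "INV-42"},
    },
}

wire_message = json.dumps(request)  # what actually travels between agent and server
print(wire_message)
```

Because every agent speaks this same message shape, an agent built in one framework can hand work to a tool or agent built in another.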

Not to be left behind, Google launched its own Agent to Agent (A2A) protocol, which enables agents to interoperate even if they were built by different vendors or in different frameworks. Interoperability of this kind will increase autonomy and multiply productivity gains, while lowering long-term costs.



## Why do LLMs Hallucinate & How Can Their Performance Be Improved?

A. Programmed to Provide a Response

As users of LLMs and chatbots, we must realize that LLMs are programmed to provide a response. That is their default behavior. Even when they know little about a topic, they will provide a response, whether or not it is grounded in deep knowledge.

B. Limited Short Term Memory

LLMs have only a limited working memory during a chat. At various points, the LLM chatbot “forgets” the earlier part of the conversation, and this also contributes to inaccuracies in its answers. This is why many LLM vendors choose to differentiate themselves on the basis of a long context window.

A context window is the amount of information a language model can “see” and understand at one time when answering your question.  You can think of the context window as a short term memory. 

As users, we cannot indefinitely keep refining what is in the context window over the course of a chat; we are bound to run into context window limits sooner or later.

C. Lack of Long Term Memory

Unlike a human, LLM chatbots do not “hold” memory across different chat sessions. Clearly this is a problem for enterprise solutions, or any solution where accuracy is required.

To solve the long-term memory problem, the practical solution is to store the key pieces of information in an external storage mechanism.

 ![46.png](https://books.vinpatel.com/u/46-cIhMFq.png) 

LLMs and Long Term Memory

It is not feasible for an LLM to go over all the data stored in a long-term memory storage system every time you ask it a question. The compute time for such an exercise would be too onerous.

In order to ensure that we have an efficient means for the LLMs to operate with Long Term Memory storage mechanisms, we need to use a few different techniques that help balance accuracy and efficiency.

Some of these main techniques are as follows:

A. RAG (Retrieval-Augmented Generation)

In a RAG operation, a user enters a query, which is run against a vector database. A vector database is an external storage mechanism in which all the relevant data has been converted to a numerical format. This conversion is done by an “embedding model”; if you use an off-the-shelf vector database, you may not even be aware of which embedding model is used in the background.

Unlike a traditional database, a vector database stores and searches chunks of documents by their similarity.

The prompt/query retrieves the most relevant passages (the "retriever" part of RAG).

Once several such “chunks” have been found, the LLM (the "generator" part of RAG) constructs the response for the user.

This kind of basic RAG is best suited to processing text content.
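The retriever step above can be sketched in a few lines. In a real system an embedding model and a vector database do this work; the three-number "embeddings" and the document chunks below are invented purely to show the mechanics of similarity search.

```python
import math

# Toy retriever: document chunks and the query are represented as vectors,
# and the closest chunk is found by cosine similarity. Vectors are made up
# for illustration; a real embedding model produces hundreds of dimensions.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

chunks = {
    "Refund policy: refunds within 30 days.": [0.9, 0.1, 0.0],
    "Shipping: orders ship in 2 days.":       [0.1, 0.9, 0.0],
    "Careers: we are hiring engineers.":      [0.0, 0.1, 0.9],
}

query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "how do refunds work?"

best = max(chunks, key=lambda text: cosine(chunks[text], query_vec))
print("Retrieved:", best)  # this chunk would be handed to the LLM (the generator)
```

The retrieved chunk, not the whole database, is what gets placed in the LLM's context window, which is how RAG keeps compute costs manageable.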

B. Extended RAG

Data can come in all formats and styles.

Where the data includes diagrams and images and the text content is sparse, the vector database needs to be augmented with OCR/labelling. Without this, a traditional RAG approach may suffer.

Where the content may include spreadsheets, the data may have to be flattened and pre-processed using Pandas (a Python library used for data analysis). Other data formats may require other types of pre-processing.
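One common way to "flatten" spreadsheet-style data is to turn each row into a plain-language sentence that a text retriever can index. The column names and values below are invented for illustration.

```python
import pandas as pd

# Sketch of flattening tabular data before indexing it for RAG:
# each row becomes a sentence. Columns and values are illustrative only.
df = pd.DataFrame({
    "region": ["North", "South"],
    "quarter": ["Q1", "Q1"],
    "revenue": [120000, 95000],
})

sentences = [
    f"In {row.quarter}, the {row.region} region had revenue of {row.revenue}."
    for row in df.itertuples()
]
for s in sentences:
    print(s)
```

These sentences can then be embedded and stored in the vector database just like ordinary document chunks.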

C. Graph RAG

Sometimes in data sets, we have “entities” that have relationships between them. As an example, think of a dataset that contains elements from the periodic table, where combining specific elements leads to a specific kind of reaction.

In such a case, instead of relying on a traditional RAG approach, it is beneficial to model the relationships between the elements of the data set explicitly. This allows for structured retrieval, deeper reasoning, and reduced hallucination. Tools like LlamaIndex, Haystack, and Neo4j can be used for GraphRAG.
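The graph idea can be illustrated with the periodic-table example above: entities and typed relationships stored explicitly, so retrieval follows relationships rather than text similarity. A real GraphRAG system would use a graph database, but the lookup idea is the same; the reactions below are just two well-known examples.

```python
# Tiny illustration of graph-style retrieval: entities (chemical elements)
# and the relationships between them, stored explicitly. A real GraphRAG
# system would use a graph database; the traversal idea is the same.
reactions = {
    ("Na", "Cl"): "forms sodium chloride (table salt)",
    ("H", "O"): "forms water",
}

def what_happens(a: str, b: str) -> str:
    # Relationship lookup is order-independent.
    return reactions.get((a, b)) or reactions.get((b, a)) or "no known reaction stored"

print(what_happens("Cl", "Na"))
```

Because the relationship is stored as a fact rather than inferred from similar-looking text, the model cannot "hallucinate" a reaction that is not in the graph.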

Here is a high level summary of some of the major RAG variants:


 ![table 8.png](https://books.vinpatel.com/u/table-8-oJs2R1.png) 


## How can a business adapt an LLM for its domain effectively?

All LLMs have been trained on vast amounts of data which has been sourced from the web. This means that the domain knowledge for any specific business is diluted across many other verticals. 

A key approach is to fine-tune an existing LLM with LoRA (Low-Rank Adaptation) — without needing to retrain the entire model.

LoRA is a technique that inserts small, trainable layers into a frozen pre-trained model. Instead of modifying all model weights, it updates only a few low-rank matrices.

Think of it like this:

- The main model stays frozen

- You plug in a lightweight adapter

- You train just that adapter
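The adapter idea can be sketched numerically. The shapes below are tiny for illustration; real models have weight matrices with billions of parameters, which is exactly why training only the small B and A matrices is so much cheaper.

```python
import numpy as np

# Numerical sketch of LoRA: the frozen weight matrix W is never changed;
# a low-rank update B @ A is trained instead and added at inference time.
d, r = 4, 1                        # model dimension 4, adapter rank 1

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))    # frozen pre-trained weights (16 parameters)
B = np.zeros((d, r))               # trainable adapter, initialized to zero
A = rng.standard_normal((r, d))    # trainable adapter (B and A: 8 parameters)

W_effective = W + B @ A            # what the model actually uses

# Because B starts at zero, the model's behavior is initially unchanged:
print(np.allclose(W_effective, W))
```

Here the adapter adds 8 trainable parameters against 16 frozen ones; at realistic model sizes the ratio is far more dramatic, often well under one percent of the full weight count.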

Weights are the learned parameters of a neural network.  They are like knobs that control how much influence different pieces of input have on the output.  In an LLM, weights are the "memory" — they determine how the model understands language.

Weights are found in:

- Attention layers (how words relate to each other)

- Feed-forward layers (basic computations between tokens)

- Embedding layers (mapping words to vectors)

- Output layers (predicting the next word)

## The Agentic Landscape

As the world moves towards an agentic landscape, and perhaps in response to the rise of open LLMs, Google launched its own open-source initiative, the Agent Development Kit (ADK). ADK is a Python-based open-source framework that offers full-stack, end-to-end development of agents and multi-agent systems. Together with the A2A protocol, this marks Google's entry into “open” and “collaborative” AI initiatives.

## Where do we go from here?

Here are a few key developments that have begun to make their presence felt in the AI space.

A. From task automation to workflow automation

We are quickly moving from a task automation stage towards “workflow” automation. This magnifies the impact of the changes.

B. The rise of SMOL agents

Smol Agents (short for Small Agents) are lightweight AI agents designed to perform specific, narrowly focused tasks efficiently — often with minimal compute, memory, or resource requirements.

This means we will probably see more AI processing happen locally on a user's mobile device, with AI models embedded into apps rather than hosted elsewhere.

C. AI powered browsers

New browsers like Comet from Perplexity can be connected by a user to their email, calendar, and document drive. With these connections, the browser can execute personalized automation at scale.

Have you thought about what you will be automating at scale, and how you will ride the next wave of productivity heading our way?





# So What is the AI Advantage?

In a single phrase, the AI Advantage is “workflow automation”.

Musicians, programmers, film makers, drug-discovery researchers, teachers, students: anyone with some kind of workflow in their job will soon be working with aspects of AI-powered workflow automation.

## 1. MCP (Model Context Protocol)

As mentioned before, MCPs are a key standard that will allow non-technical users to use natural language to interact with various systems.

An example that many of us in the business world can easily understand is the Google Analytics MCP in action. Watch this short video to get an understanding of where we are headed: https://www.youtube.com/watch?v=PT4wGPxWiRQ

Imagine MCPs for all the major platforms we interact with on a daily basis. Extending this even further, imagine an environment where you can interact with multiple systems via multiple MCPs via a single prompt or series of prompts. 

## 2. Critical Thinking & Clear Communication Leveraging the LLM with Humans in the Loop

Feeding an average workflow and average inputs into an LLM will not produce an exceptional output. The need for human creativity will be even more acute, and creative work will probably be copied by others even faster. Collectively, this is good for us, as it will spur even more innovation, and at speed.

Critical skills like conceptualizing the end-to-end workflow, thinking of the value added at each step, and deciding on the checks and balances at each step will require clear communication and the ability to troubleshoot and QA what you have built.

An understanding of what is happening under the hood will help.

If you work on the data side of things, we hope you will appreciate the overview of key machine learning algorithms provided earlier, as it will allow you, as the "human", to ask specific questions via the LLM of the data you wish to examine.

Over the course of the next few months, we will try and put some of these learnings into action ourselves and will share the results.

## 3. There are "Agents" and there are "ReAct" agents

A ReAct agent is a virtual agent that combines the power of "Reasoning" and "Acting". The "Reasoning" element takes a task and decomposes it into smaller tasks. The "Acting" element then executes the smaller tasks decided on in the "Reasoning" phase.

The "Acting" phase is enabled by the agent determining the tool it must use: scraping, using a search engine, calling an API, running a calculation, and so on.
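The Reasoning/Acting loop can be sketched as follows. In a real ReAct agent an LLM decides which tool to call next; here that decision is simulated by a precomputed plan, and the two tools are hypothetical stand-ins.

```python
# Minimal ReAct-style loop: alternate a "reasoning" step (pick the next tool)
# with an "acting" step (run it), recording an observable trace. The plan
# here is precomputed; in a real agent an LLM would choose each step.

def search_tool(query: str) -> str:
    return f"search results for '{query}'"

def calculator_tool(expr: str) -> str:
    return str(eval(expr))        # toy only; never eval untrusted input

TOOLS = {"search": search_tool, "calculate": calculator_tool}

def react_agent(task: str, plan: list[tuple[str, str]]) -> list[str]:
    trace = [f"Task: {task}"]
    for tool_name, tool_input in plan:                # "Reasoning": chosen step
        observation = TOOLS[tool_name](tool_input)    # "Acting": tool call
        trace.append(f"Thought: use {tool_name} -> Observation: {observation}")
    return trace

for line in react_agent("total revenue", [("search", "Q1 revenue"),
                                          ("calculate", "120+95")]):
    print(line)
```

The printed trace is exactly what you, as the human in the loop, would review: which tool was chosen at each step and what it returned.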

As the human in the loop, you are able to observe both the "Reasoning" and "Acting" phases and get an understanding of which data sources the agent referred to and how it arrived at the output.

# Legal Disclaimer

This publication is designed to provide general information about machine learning and artificial intelligence for business leaders. It is not intended to provide legal, financial, technical, or professional advice. The information contained herein is current as of the publication date and may become outdated due to changes in technology, regulations, or market conditions.

The authors make no representations or warranties regarding the accuracy, completeness, or suitability of the information contained herein. Readers should consult qualified professionals before making business, technical, or legal decisions based on this information.

The use of company names, product names, and trademarks is for identification and educational purposes only and does not imply endorsement by or affiliation with those companies.

Results may vary based on individual circumstances, and past performance does not guarantee future results. The authors and publishers disclaim any liability for decisions made or actions taken based on the information provided in this publication.

## Technical Disclaimer

The technology landscape for machine learning and artificial intelligence evolves rapidly. Information about specific technologies, platforms, and performance characteristics may become outdated quickly. Readers should verify current capabilities and limitations with appropriate vendors and technical experts before making implementation decisions. 

Performance metrics and comparisons are based on available information at the time of publication and may not reflect current capabilities or optimal configurations for specific use cases.