How can you predict how any algorithm will work?

On one dimension I am seeing this distribution, this kind of overlap; when I use another dimension, on that dimension also I am seeing an overlap. But when I put them together in that mathematical space, the two Gaussians become linearly separable; because of that, linear separability is now possible amongst the classes. Now look at this dimension, this one.

This one is not going to help you separate the orange from the blue, but that doesn't matter, as long as I have the other dimension with me, which separates the oranges from the blues. One dimension will compensate for the other dimension, so overall these weak predictors, put together, become strong predictors, as you will notice as we go down the line.

Let's look at the data types: we don't have any non-numerical columns here, everything is numerical. Keep in mind that all algorithms need numerical columns, and those numerical columns could be categorical codes or real numbers. Coming down, I am separating the X variables, the independent variables; the target is the cultivator. Then I am splitting the data into a training set and a test set. Are all of you with me on this? Okay. You can actually download this data set from UCI if you wish, and run the code. Very good.
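A minimal sketch of this step, assuming the UCI wine data has been saved locally as wine.csv with a target column named Cultivator (both the file name and the column name are assumptions for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the UCI wine data (file name is an assumption)
df = pd.read_csv("wine.csv")

# Separate the independent variables from the target (the cultivator)
X = df.drop("Cultivator", axis=1)
y = df["Cultivator"]

# Split the data into a training set and a test set, e.g. 70/30
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)
```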

I instantiate the Gaussian naive Bayes here. There are different variants of naive Bayes; this one assumes that the distributions are Gaussian on each one of the dimensions, and we meet that requirement very well: on all the dimensions, the classes are distributed in almost Gaussian ways (Gaussian means the distribution looks like a normal distribution).

So it is a perfect case for Gaussian naive Bayes, and I am calling GaussianNB; there are other variants, like multinomial naive Bayes, which you can make use of. I am instantiating the model here, and I am going to do the model fit here; this is where the naive Bayes likelihoods will be calculated. Then look at this: on the training set itself I am doing the testing, and it's giving me 97% accuracy. Why am I doing this? I'll tell you in a minute. I can also run this model on the test data.
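Continuing the sketch above with the same (assumed) variable names, the instantiate-and-fit step would look roughly like this:

```python
from sklearn.naive_bayes import GaussianNB

# Instantiate the Gaussian variant of naive Bayes
model = GaussianNB()

# Fit: this is where the per-class Gaussian likelihoods are estimated
model.fit(X_train, y_train)

# Score on the training set itself (the session reports ~97% here)
print("Training accuracy:", model.score(X_train, y_train))
```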

Here I am running the model on the test data, and in sklearn.metrics there is a function called classification_report which gives me all these metrics, recall, precision, everything in one shot; if you are interested, you can separately print the confusion matrix as well. Now look at this confusion matrix, and look at the class-level recall. (Any problems anywhere, any problems with the sklearn import of GaussianNB? Okay, all right.) There are three classes here, one, two, three; look at the class-level recall. What is class-level recall? It is accuracy at the class level: almost 100%, 95%, and 100% (a value of 1.00 means 100%).
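The test-set evaluation described here would look roughly like this, again continuing the same assumed names:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Run the fitted model on the held-out test data
y_pred = model.predict(X_test)

# Recall, precision, and f1 for every class, all in one shot
print(classification_report(y_test, y_pred))

# The confusion matrix can also be printed separately
print(confusion_matrix(y_test, y_pred))
```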

It's able to accurately classify each one of the classes with a high degree of accuracy; this is what we wanted on Pima diabetes, and it's giving it to us on the wine dataset. I'll talk about the micro average and all these things later, down the line; right now, just look at this recall metric (I have not told you what precision is yet), and look at the confusion matrix. The actual number of records for class A in your test data is 23, and all of them have been classified as class A.

The actual number of records for class B in your test data is 19, of which it has correctly classified 18; for class C the actual number is 12, and it has correctly classified all 12. You can achieve this kind of score with this model, even though the dimensions are overlapping and the classes are overlapping on all the dimensions, because some dimensions come together to make strong predictors. So you should understand the difference between Pima diabetes and this: in both cases the classes are overlapping, but there's one very important, critical difference, and it's that difference which is making this work.

You can run the same naive Bayes algorithm on Pima diabetes and see what score you get there. Your job here is to select the right attributes. You will never get data on a plate when you are doing real-life projects; this is one big classroom, but when you sit down and do real-life projects, your customer says something like: can you use data science to improve my customer CSAT rating from the current 3.5 to 4.5 out of 5? Can you do something using data science to help me improve my customer rating in our technical support? These are the kinds of requirements which will come to you; the data will never come to you.

Now your job is to improve the CSAT rating, the customer rating for technical support. What kind of data do you need? It is not given to you; you have to first guess what kind of data you need. Your domain expertise will help you here; if you don't have it, you need to bring in domain experts. So, for the CSAT rating in customer tech support, what kind of data do you need? Can somebody say? Go back to the past tickets. From the past tickets, what kind of data do you collect? What kind of tickets: P1, P2, P3 tickets?

I need to find the ticket classes, then whether the ticket belongs to hardware, software, or something else; I need those things and various others. You will decide what kind of data you need. The next challenge will be: where will you get the data from? Some data will be available within the organization, some data will be available outside the organization, and some data will be available with the customer. Getting these stakeholders to give the data to us will be such a challenge, so all your soft skills will come into play.

So once the data comes in, you have to first establish the reliability of the data. What if the tech support department has given you only the data where the customer is very happy and has not shared the dirty data with you? Your model will go for a toss. Those are the challenges which you will face as a data scientist. Once the data comes to you, on whichever attributes you have, you will do this analysis using pair plots and other techniques, as sketched below: is this column good, or is that column good? Which column should I use to predict customer satisfaction?
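For that pair-plot style, column-by-column inspection, a typical seaborn sketch looks like this (the hue column name is an assumption carried over from the wine example):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Plot every pair of dimensions, colored by class, to eyeball
# which columns separate the classes and which merely overlap
sns.pairplot(df, hue="Cultivator", diag_kind="kde")
plt.show()
```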

I come from the IT world, just like many of you do. We have done a lot of projects in Java and C and all these things. There, the project starts with customer requirements, which turn into technical requirements, which turn into design requirements, which turn into code specs and coding requirements, which turn into unit testing and integration testing; then finally we go for acceptance testing, and in acceptance testing it usually bombs, right?

The reason why it bombs is that we look at data only in the last stage: when we come to acceptance testing, we ask customers to give us some data. In data science projects, your project will start with data and your project will end with data; data is the core, and you will be revolving around the data sets. Say 80% of your estimated effort in a data science project, as you will see when you do the capstone project, will go into getting the data.

What type of algorithm is used for classification?

Let's start with the session on this model. This algorithm is based on the Bayes theorem and is used for classification; it uses probability values to decide which class a test point belongs to. This is the only algorithm that gets the term "naive" associated with it. The term naive is not a very good thing to be associated with, but this algorithm is the only one which gets stuck with it, which is very unfortunate, because as per this definition all algorithms are naive.

The reason why it's called naive is that this algorithm assumes that all the variables you're going to use to build the model are independent of one another. The independence assumption is often violated, as we already discussed, but then that is true for all the other algorithms too. It's called naive due to the assumption that the features in the data set are mutually independent, that they don't interact with each other; but that's what every algorithm assumes.
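In symbols, that mutual-independence assumption is what lets the class-conditional likelihood factor into a product of per-feature likelihoods:

$$P(y \mid x_1, \ldots, x_n) \;\propto\; P(y)\prod_{i=1}^{n} P(x_i \mid y)$$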

This is the only one which gets the cap, the hat, whatever, but the beauty of this algorithm is in what it says: howsoever weak predictors your features may be, don't drop them; consider them, take into account the information given by each feature, and build your model based on probabilities. It does not advocate dropping any of the features; howsoever insignificant the overlap might make them look, use them. That is this algorithm. To understand this algorithm, we have to understand what probability is, and we have to understand what joint probability is.

We also have to understand what conditional probability is, all three of them. So let's quickly start: what is a probability, how do you define probability? The chances of some event occurring. So in the definition you have an event, and then you have something called chances. What are these chances, how do you find them out? What is involved? An experiment: trials and proportions, right. So if I ask you what is the probability that today it will rain, you might be surprised; just a week or ten days ago there was news on the web that in Kerala there were snowfalls. In Kerala there's a hill station called Munnar; they had snowfall there and the temperature was minus 3. So such events can happen.

If I ask you what is the probability that today it's going to rain at 5 o'clock, who said it is easy? Don't check the weather, just talk. What you've done is you have taken your experience into account: on this particular date, in the month of January, almost the middle of January, how many times in the past did I see rain? That ratio, how many times you saw this event of interest versus the total number of trials in your records, is called the frequency approach to probability.

Probability is a ratio: the ratio of how many times the events of interest to you have occurred, out of the total number of trials in which they could have occurred. That ratio is called the probability value. Did you know that probability values can be calculated in another way? Have you done the Poisson distribution in statistics? The Poisson distribution has a very convoluted formula. There, given the state I am in, what is the probability that I'll be in a different state next? That is a function of how many ways you can reach that state. So that is another way of finding probability, but we are going to use the previous definition of probability, which is based on frequency ratios.
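Written out, the frequency definition used here is simply a ratio of counts:

$$P(A) = \frac{\text{number of trials in which } A \text{ occurred}}{\text{total number of trials}}$$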

Now that you know what probability is, the next thing you need to know is joint probability. Joint probability is the probability of multiple events occurring together. So, if I give you a deck of cards and ask you to pull a card out of it, what is the probability that this card is going to be a red king? There are two colors, red and black, and 52 cards; out of those 52 cards you have two red kings, hearts and diamonds. So the probability of this joint event, the card is red and the card is a king, is 2 by 52. Now, when you're pulling out the card, somehow you come to know that it is a red card. Now what is the probability that this is a red king, given you know it's a red card?

Your scope is now limited to only the red cards: 2 by 26, which is 1 by 13. So you have taken into account the evidence, the information that you captured, that it is a red card; the moment you get that information, you recalibrate your probability of a red king. That concept of recalibrating your probabilities based on the information you're gathering is the Bayes theorem. What Mr. Bayes' insight says is: start with some probability, the default probability values, but keep on recalibrating those probabilities as more and more information comes to you. Howsoever weak the information may be, don't ignore any information; recalibrate your probability. That is the philosophy, the reasoning, behind Bayesian models. Are you okay?

So to understand this, we need to know probability, joint probability, and conditional probability. What is the probability of a king given it's red? We already know it's a red card, so now the only event left to happen is the king. What is the probability that this card will be a king, given it's red? That quantity, which we computed as 2 by 26, is called conditional probability.

We started with the joint probability: what is the probability it is a red king? Because we didn't have any information, it was 2 by 52. The moment we came to know it's red, we switched to the conditional probability: what is the probability of a king given it's red? Conditional probability is invalid if the joint probability is 0: if the probability of the two events occurring together is 0, we don't talk about conditional probability. What is the probability of drawing a green king? There is no green king possible, so don't even talk about its conditional probability. We talk about conditional probability only when the joint probability is greater than 0.
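Putting the card example in symbols, the recalibration looks like this:

$$P(\text{red king}) = \frac{2}{52}, \qquad P(\text{king} \mid \text{red}) = \frac{P(\text{king} \cap \text{red})}{P(\text{red})} = \frac{2/52}{26/52} = \frac{2}{26} = \frac{1}{13}$$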

Let's see how all this is put together to build a model. We already discussed what probability is, what joint probability is, and what conditional probability is, so I'm going to jump over these slides. This is the model; I am sure you have seen this formula, it looks very familiar. You could put it up on some kind of board and it would look as if we're going to launch some rocket, but it's pretty easy. So what I'm going to do is jump past all this and come right back.
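The formula on the slide is presumably the standard statement of the Bayes theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$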

I'm going to play with the animations. This blue box represents all the flight information I have captured from my experience, all hundred percent of the flights which I have taken; I am representing that by the blue box. The dimensions of the box are inconsequential, they don't matter. When I look at the past data about the flights which I have taken, I notice that 20% of the time the flight was delayed and 80% of the time the flight was on schedule. In mathematics in school, we would have done this as Venn diagrams, okay.
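Those 20/80 priors are just the frequency ratio applied to the past flight records; a tiny illustrative sketch (the records below are made up to match the quoted numbers):

```python
# Hypothetical past flight records: 1 = delayed, 0 = on schedule
flights = [1, 0, 0, 0, 0] * 20  # 100 flights, 20 of them delayed

# Prior probabilities estimated by the frequency ratio
p_delay = sum(flights) / len(flights)
print("P(delay) =", p_delay)             # 0.2
print("P(on schedule) =", 1 - p_delay)   # 0.8
```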