Some basics of Data Visualization.

Let’s say there are two people, Sam and Matt, who want to communicate with each other. To do that, they need a common language, and both of them also need to follow a certain set of grammatical rules. For example, if I make a statement such as “I like ice cream”, it makes complete sense because it follows the associated grammatical rules. On the other hand, if I say “like ice cream I”, it makes no sense at all because it does not follow those rules. Now, when we talk about grammar, let’s take the English language as an example.

Every word in the English language is associated with one of the parts of speech, such as nouns, pronouns, adjectives, verbs and adverbs, and each of these parts of speech plays a different role in making up a complete sentence. The grammar of graphics works in a similar way: it helps us divide a plot into different components, such as the aesthetics layer, the geometry layer and so on, and when you stack all of these layers together you get your final plot. So let’s go through each of these layers properly.

Let’s start with the data layer. The data layer is simply the data for which the visualization has to be done; it’s as simple as that. Once you have chosen the data, you head on to the aesthetics layer, which is where you map your columns onto different axes or scales. There are different aesthetics, such as the x-axis, y-axis, shape, size and color, and you can map your columns onto any of them. Once you have decided on your data and mapped your columns onto aesthetics, you can go ahead and decide the type of geometry you want to build. What do we mean by the type of geometry? This is where you actually decide the type of plot.

You can have different plots, such as a histogram, bar plot, scatter plot, box plot and so on. The data layer, the aesthetics layer and the geometry layer form the core components of the grammar of graphics. Once you’re done with these three layers, you can go ahead to the facet layer. Many times your data set will have a lot of numerical columns and a lot of categorical columns, and if you plot all of these columns onto a single plot, it can lead to a lot of chaos. This is where you want to subset your plot, or divide it into different subgroups, and that is exactly what the facet layer does. Finally, we have the theme layer, which is used to make the plot even prettier.
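To make the layer idea concrete, here is a minimal ggplot2 sketch that stacks all five layers in order. It assumes a small, made-up data frame (car_data, with hypothetical price and engine_location columns) rather than any real dataset, and the bin count, facet variable and theme colors are illustrative choices only.

```r
library(ggplot2)

# Hypothetical toy data: made-up prices and a made-up engine_location column
car_data <- data.frame(
  price = c(6000, 8000, 9000, 10000, 11000, 15000, 18000, 23000, 31000, 41000),
  engine_location = c(rep("front", 8), rep("rear", 2))
)

p <- ggplot(data = car_data,              # data layer
            mapping = aes(x = price)) +   # aesthetics layer: price on the x axis
  geom_histogram(bins = 5) +              # geometry layer: the type of plot
  facet_wrap(~ engine_location) +         # facet layer: one panel per subgroup
  theme(                                  # theme layer: purely cosmetic settings
    panel.background = element_rect(fill = "white"),
    plot.background  = element_rect(fill = "grey95")
  )

print(p)
```

Each + adds one layer to the plot object, which mirrors the "stack the layers together" idea; removing the facet_wrap() or theme() line still leaves a valid plot built from the three core layers.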

With the theme layer, you can add a background to the panel and also a background to the plot as a whole. These are the different components that make up the grammar of graphics.

Now we’ll head on to ggplot2. ggplot2 is completely based on the grammar of graphics; in fact, the “gg” in ggplot2 stands for “grammar of graphics”. So let’s get to the main part and do some beautiful visualizations with the ggplot2 package. This is RStudio, and the first task is to install ggplot2. I click on the Packages tab over here and then on Install, type in the name of the package, ggplot2, and click the Install button; the package is then installed automatically. Since I already have ggplot2 installed in RStudio, I do not have to do it again.

Once you have installed the package, you have to load it, and for that you use the library() function: type library(ggplot2) and the package is loaded. We’ll be doing all of these visualizations on this car dataset, which has around 25 columns and 205 rows. It has columns such as the number of doors of the car and whether the engine is located in the front or the back, along with the car length, the car width, the mileage of the car in the city and on the highway, and the price of the car. We’ll be making some really beautiful visualizations on this car dataset.
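The Packages-tab steps can also be done from the R console. The sketch below installs ggplot2 only if it is missing and then loads it. The read.csv() line is left commented out because the location and file name of the car dataset are not given here, so that path is a placeholder.

```r
# Install ggplot2 only if it is not already available
# (the console equivalent of Packages > Install in RStudio)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package into the current session
library(ggplot2)

# Hypothetical: read the car dataset from a CSV file (placeholder path)
# car_data <- read.csv("path/to/car_dataset.csv")
# dim(car_data)  # described as roughly 205 rows and about 25 columns
```

Installing is a one-time step per R library, while library() has to be called in every new session.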

As I’ve told you, ggplot2 is based on the grammar of graphics, so we’ll start with the data layer first. To start with ggplot2, we type in ggplot, and as I’ve told you, we have to decide which data set is mapped onto the data layer. Since I want to do the visualization on top of this car dataset, I map the car dataset onto the data layer; that is my first task. This gives us a blank plot, which is the first layer, and once this is done I will go ahead and add the aesthetics layer on top of it.
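A minimal sketch of this first step, assuming a tiny hypothetical stand-in for the car dataset (car_data with a made-up price column). It shows that supplying only the data layer produces a plot object with no layers, which draws as an empty panel.

```r
library(ggplot2)

# Hypothetical stand-in for the car dataset: a single made-up price column
car_data <- data.frame(
  price = c(5500, 7000, 8500, 9500, 10000, 12500, 16000, 22000, 35000, 41000)
)

# Data layer only: ggplot() knows the data but has no aesthetics or geometry yet,
# so printing it draws an empty panel
p_blank <- ggplot(data = car_data)
print(p_blank)
```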

I’ll put in a comma and use aes, which stands for aesthetics. I want to understand the distribution of the price column, so I will map the price column onto the x aesthetic: let me put in price over here and hit Enter. I have successfully mapped the price column onto the x aesthetic, and what you see is that the price of the cars ranges from around $5,000 to around $40,000. So we are done with the data layer and the aesthetics layer, and now it’s time to decide on the geometry layer. Since this is a univariate distribution, that is, I only want to understand the price column, I’ll go ahead and build a histogram, and to build a histogram I need the geom_histogram function.
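Here is a sketch of the data and aesthetics layers together, again on a small hypothetical car_data frame. Mapping price onto x inside aes() is enough for ggplot2 to set up a continuous x scale spanning the price range, even though no geometry has been added yet, which matches the labelled but otherwise empty plot described above.

```r
library(ggplot2)

# Hypothetical toy data standing in for the car dataset
car_data <- data.frame(
  price = c(5500, 7000, 8500, 9500, 10000, 12500, 16000, 22000, 35000, 41000)
)

# Data layer + aesthetics layer: the price column is mapped onto the x aesthetic.
# No geom is added yet, so only the axis (trained on price) is drawn.
p_aes <- ggplot(data = car_data, aes(x = price))
print(p_aes)
```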

There are different geom functions, one for each type of geometry: to build a bar plot you’ve got geom_bar, to build a box plot you’ve got geom_boxplot, and since I have to build a histogram, I’ll just type in geom_histogram over here. I have successfully built the histogram, so let me zoom in over here. What do we understand from this? There are around 30 or 35 cars whose price is around $10,000, and there are very few expensive cars: if you look at these bins over here, there are just three cars whose price is greater than $40,000. You can draw these inferences easily just by looking at this graph.
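The geometry step can be sketched as follows, assuming the same kind of hypothetical car_data frame. geom_histogram() bins the x values and draws one bar per bin; when neither bins nor binwidth is given, it defaults to 30 bins and prints a message suggesting a better binwidth, so here binwidth is set explicitly as an illustrative choice.

```r
library(ggplot2)

# Hypothetical toy data standing in for the car dataset
car_data <- data.frame(
  price = c(5500, 7000, 8500, 9500, 10000, 10500, 12500, 16000, 22000, 35000, 41000)
)

# Geometry layer: a histogram of price, one bar per $5,000-wide bin
p_hist <- ggplot(car_data, aes(x = price)) +
  geom_histogram(binwidth = 5000)
print(p_hist)

# Other plot types follow the same pattern with a different geom, for example
# geom_bar() for bar plots and geom_boxplot() for box plots.
```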

If you were asked to understand the distribution of car prices just by looking at the data set, that obviously wouldn’t have been possible. Now let’s add some colors to this plot. First I’ll add a fill color, which is used to color the inside of the bins.
I’ll type in fill inside the geom_histogram function and give the color we want, let’s say darkorchid3; the color you see over here is darkorchid3. Now I will also add a boundary color, and for that I’ll use the col attribute; this time the color will be darkorange, so I’ll just type in darkorange over here. With that, I have also added a boundary color.
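A sketch of the colored histogram, again on a hypothetical car_data frame. fill colors the inside of each bar and col (which ggplot2 accepts as an alias for colour) colors the bar outlines; darkorchid3 and darkorange are both built-in R color names. Because these are fixed colors rather than mappings from a column, they go directly into geom_histogram() instead of inside aes().

```r
library(ggplot2)

# Hypothetical toy data standing in for the car dataset
car_data <- data.frame(
  price = c(5500, 7000, 8500, 9500, 10000, 10500, 12500, 16000, 22000, 35000, 41000)
)

# Constant (non-mapped) colors: fill for the bar interiors, col for the outlines
p_colored <- ggplot(car_data, aes(x = price)) +
  geom_histogram(binwidth = 5000, fill = "darkorchid3", col = "darkorange")
print(p_colored)
```

Putting fill = "darkorchid3" inside aes() instead would treat the string as a data value and produce a legend with a default palette color, which is a common mistake.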