A Scatter Diagram shows the relationship between two variables and gives a visual indication of how strongly they are correlated.

Why You Would Use Scatter Analysis and Scatter Plots
A Scatter Analysis is used when you need to compare two data sets against each other to see if there is a relationship. Scatter plots are a way of visualizing that relationship: plotting the data points gives you a scattering of points on a graph. The analysis comes in when trying to discern what kind of pattern, if any, is present, and what that pattern means. It is this kind of analysis we are talking about when we are trying to get at the root cause of an issue. Scatter Diagrams are used to explore a suspected "cause-and-effect" relationship between two kinds of data, and to provide more useful information about a production process. Specific instances of when to utilize scatter diagrams:
What Kind of Data Should You Use on Scatter Analysis?
Scatter analysis generally makes use of continuous data. Discrete data is best suited to pass/fail measurements. Continuous data can take any value within a range, so it can be measured at a fine resolution, which is why it is generally used in scatter analysis. You could use discrete data on one axis of a scatter plot and continuous data on the other axis. For the discrete data, you'd have to put it into some kind of quantified band, such as a 1-10 customer satisfaction score. You *could* also put pass/fail discrete data into one of two bands, but whether you get any useful information out of it would really depend on the data. The best bet is continuous data. If you are looking for a way to do graphical analysis on discrete data, you might try attribute charts.

Scatter Plots and Correlation
Scatter plots only show correlation. They do not prove causation. The example often used is shark attacks and ice cream sales. There may be a correlation between the two, but ice cream does not cause shark attacks; the heat of the day drives both. On hot days more people are in the water, which means more shark attacks, and more people buy ice cream.

How to Make a Scatter Diagram:
NOTE: Do not count points on a line.
A = points in upper left + points in lower right
B = points in upper right + points in lower left
Q = the smaller of A and B
N = A + B
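The counting rule above can be sketched in R. This is a minimal sketch under an assumption the text does not state: the four quadrants are formed by a vertical line at the median of x and a horizontal line at the median of y. Points lying exactly on either dividing line are skipped, as the note requires. The function name quadrant_count() is hypothetical, and mtcars is used only as example data.

```r
# Quadrant count for a scatter diagram (sketch).
# Assumption: quadrants are split at the median of x and the median of y.
quadrant_count <- function(x, y) {
  mx <- median(x)
  my <- median(y)
  # Do not count points that fall on either dividing line
  keep <- x != mx & y != my
  x <- x[keep]
  y <- y[keep]
  upper_left  <- sum(x < mx & y > my)
  lower_right <- sum(x > mx & y < my)
  upper_right <- sum(x > mx & y > my)
  lower_left  <- sum(x < mx & y < my)
  A <- upper_left + lower_right
  B <- upper_right + lower_left
  c(A = A, B = B, Q = min(A, B), N = A + B)
}

# Example on the built-in mtcars data: car weight vs. fuel economy
quadrant_count(mtcars$wt, mtcars$mpg)
```

A large gap between A and B (a small Q relative to N) suggests the points cluster in two opposite quadrants, i.e. a possible relationship.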
kassambara | 17/11/2017 | R Graphics Essentials
Scatter plots are used to display the relationship between two continuous variables x and y. In this article, we'll start by showing how to create beautiful scatter plots in R. We'll use helper functions in the ggpubr R package to automatically display the correlation coefficient and the significance level on the plot. We'll also describe how to color points by groups and add concentration ellipses around each group. Additionally, we'll show how to create bubble charts, as well as how to add marginal plots (histogram, density or box plot) to a scatter plot. We continue by showing some alternatives to the standard scatter plot, including rectangular binning, hexagonal binning and 2d density estimation. These plot types are useful when you have a large data set containing thousands of records. R code for zooming in on a scatter plot is also provided. Finally, you'll learn how to add fitted regression trend lines and equations to a scatter graph.

Dataset: mtcars. The variable cyl is used as a grouping variable.
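The code in this article uses a data frame df and a base plot object b without defining them. A minimal setup consistent with that usage, assuming df is mtcars with cyl converted to a factor and b maps wt to the x-axis and mpg to the y-axis (the same columns the ggscatter() calls use):

```r
library(ggplot2)
library(ggpubr)

df <- mtcars
# Convert cyl to a factor so it can be used as a grouping variable
df$cyl <- as.factor(df$cyl)

# Base plot (assumed mapping): weight on x, miles per gallon on y
b <- ggplot(df, aes(x = wt, y = mpg))

# Basic scatter plot
print(b + geom_point())
```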
Key functions:
To remove the confidence region around the regression line, specify the argument se = FALSE in the function geom_smooth().

Change the point shape by specifying the argument shape, for example: b + geom_point(shape = 18). To see the different point shapes commonly used in R, type this: ggpubr::show_point_shapes().

Create a scatter plot easily using ggscatter() [in ggpubr]. Use stat_cor() [in ggpubr] to add the correlation coefficient and the significance level.
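A sketch of these options, assuming the mtcars setup with cyl as a factor. The first plot draws a linear fit without its confidence region and changes the point shape; the second builds a similar plot with ggscatter() and lets stat_cor() print the Pearson correlation coefficient and p-value. The label position (label.x = 3, label.y = 32) is an arbitrary choice for this data.

```r
library(ggplot2)
library(ggpubr)

df <- mtcars
df$cyl <- as.factor(df$cyl)
b <- ggplot(df, aes(x = wt, y = mpg))

# Diamond-shaped points plus a linear fit; se = FALSE removes the confidence region
p1 <- b + geom_point(shape = 18) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# ggpubr version: regression line with confidence band,
# plus the correlation coefficient and its significance level
p2 <- ggscatter(df, x = "wt", y = "mpg",
                add = "reg.line", conf.int = TRUE,
                add.params = list(color = "blue", fill = "lightgray")) +
  stat_cor(method = "pearson", label.x = 3, label.y = 32)

print(p1)
print(p2)
```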
Instead of drawing the concentration ellipse, you can: i) plot a convex hull of a set of points; ii) add the mean points and the confidence ellipse of each group. Key R functions: stat_chull(), stat_conf_ellipse() and stat_mean() [in ggpubr]:

```r
# Convex hull of groups
b + geom_point(aes(color = cyl, shape = cyl)) +
  stat_chull(aes(color = cyl, fill = cyl), alpha = 0.1, geom = "polygon") +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

# Add mean points and confidence ellipses
b + geom_point(aes(color = cyl, shape = cyl)) +
  stat_conf_ellipse(aes(color = cyl, fill = cyl), alpha = 0.1, geom = "polygon") +
  stat_mean(aes(color = cyl, shape = cyl), size = 2) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

# Add group mean points and stars
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "npg",
          shape = "cyl", ellipse = TRUE,
          mean.point = TRUE, star.plot = TRUE,
          ggtheme = theme_minimal())

# Change the ellipse type to 'convex'
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "npg",
          shape = "cyl",
          ellipse = TRUE, ellipse.type = "convex",
          ggtheme = theme_minimal())
```
Key functions:
First install ggrepel (install.packages("ggrepel")), then type this:

```r
library(ggrepel)
# Add text to the plot
.labs <-
# Draw a rectangle underneath the text, making it easier to read.
b + geom_point(aes(color = cyl)) +
  geom_label_repel(aes(label = .labs, color = cyl), size = 3) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))
```
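A self-contained sketch of repelled point labels, assuming each point is labeled with its car name taken from the row names of mtcars (stored here in a hypothetical column called name). geom_text_repel() draws plain text; geom_label_repel() draws the text on a rectangle.

```r
library(ggplot2)
library(ggrepel)

df <- mtcars
df$cyl <- as.factor(df$cyl)
# Assumption: use the car names (row names of mtcars) as labels
df$name <- rownames(df)
b <- ggplot(df, aes(x = wt, y = mpg))

# Plain text labels, pushed apart so they do not overlap
p_text <- b + geom_point(aes(color = cyl)) +
  geom_text_repel(aes(label = name, color = cyl), size = 3) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

# Labels drawn on a rectangle, which is easier to read
p_label <- b + geom_point(aes(color = cyl)) +
  geom_label_repel(aes(label = name, color = cyl), size = 3) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

print(p_text)
print(p_label)
```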
In a bubble chart, point size is controlled by a continuous variable, here qsec. In the R code below, the argument alpha is used to control color transparency; alpha should be between 0 and 1.

```r
b + geom_point(aes(color = cyl, size = qsec), alpha = 0.5) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  scale_size(range = c(0.5, 12))  # Adjust the range of points size
```
The function ggMarginal() [in the ggExtra package] (Attali 2017) can be used to easily add a marginal histogram, density or box plot to a scatter plot. First, install the ggExtra package as follows: install.packages("ggExtra"); then type the following R code:

One limitation of ggExtra is that it can't cope with multiple groups in the scatter plot and the marginal plots. A solution is provided in the function ggscatterhist() [ggpubr]:

```r
library(ggpubr)
# Grouped scatter plot with marginal density plots
ggscatterhist(
  iris, x = "Sepal.Length", y = "Sepal.Width",
  color = "Species", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  margin.params = list(fill = "Species", color = "black", size = 0.2)
)

# Use box plot as marginal plots
ggscatterhist(
  iris, x = "Sepal.Length", y = "Sepal.Width",
  color = "Species", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  margin.plot = "boxplot",
  ggtheme = theme_bw()
)
```
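For the single-group case, ggMarginal() can be applied directly. A minimal sketch on the iris data: build an ordinary ggplot2 scatter plot, store it in a variable, then pass it to ggMarginal() with type = "histogram" ("density" and "boxplot" are other accepted values).

```r
library(ggplot2)
library(ggExtra)

# Ordinary scatter plot, stored in a variable
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "#00AFBB") +
  theme_bw()

# Add marginal histograms along both axes
pm <- ggMarginal(p, type = "histogram")
print(pm)
```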
In this section, we'll present some alternatives to the standard scatter plot. These include rectangular binning, hexagonal binning and 2d density estimation.
Rectangular binning is a very useful alternative to the standard scatter plot when you have a large data set containing thousands of records. Rectangular binning helps to handle overplotting. Rather than plotting each point, which would appear highly dense, it divides the plane into rectangles, counts the number of cases in each rectangle, and then plots a heatmap of 2d bin counts. In this plot, many small rectangles are drawn with a color intensity corresponding to the number of cases in each bin. Key function: geom_bin2d(): creates a heatmap of 2d bin counts. Key argument: bins, a numeric vector giving the number of bins in both the vertical and horizontal directions (30 by default).
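A sketch of rectangular binning, assuming the diamonds data set shipped with ggplot2 (about 54,000 rows) as an example of a large data set. bins = 15 overrides the default of 30; the fill gradient colors are arbitrary.

```r
library(ggplot2)

# diamonds is large enough for overplotting to be a real problem
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d(bins = 15) +
  # Shade each rectangle by the number of diamonds it contains
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  theme_minimal()
print(p)
```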
Key function: geom_hex()
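A sketch of hexagonal binning on the same diamonds example. geom_hex() relies on the hexbin package, which must be installed (install.packages("hexbin")); bins = 20 is an arbitrary choice.

```r
library(ggplot2)
# Note: geom_hex() needs the hexbin package to be installed

# Same idea as rectangular binning, but the plane is tiled with hexagons
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_hex(bins = 20) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  theme_minimal()
print(p)
```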
Key function: geom_density_2d()
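A sketch of 2d density estimation, using the built-in faithful data (eruption duration vs. waiting time) as an assumed example. geom_density_2d() draws contour lines of the estimated density; stat_density_2d() with geom = "polygon" fills the contour bands, shaded by the computed level.

```r
library(ggplot2)

base <- ggplot(faithful, aes(x = eruptions, y = waiting))

# Points overlaid with density contour lines
p_lines <- base + geom_point() + geom_density_2d()

# Filled contour bands, shaded by density level
p_fill <- base +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
  scale_fill_gradient(low = "white", high = "darkblue")

print(p_lines)
print(p_fill)
```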
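To zoom in on the points where Petal.Length < 2.5, type this (facet_zoom() comes from the ggforce package):

```r
library(ggplot2)
library(ggforce)  # provides facet_zoom()
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point() +
  ggpubr::color_palette("jco") +
  facet_zoom(x = Petal.Length < 2.5) +
  theme_bw()
```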
In this section, we'll describe how to add trend lines to a scatter plot, along with labels (equation, R², BIC, AIC) for a fitted linear model.
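A sketch using stat_poly_eq() from the ggpmisc package (install.packages("ggpmisc")), assuming a recent ggpmisc in which computed labels such as eq.label and rr.label are accessed with after_stat(). The same formula is passed to geom_smooth() and stat_poly_eq() so that the printed equation describes the drawn line.

```r
library(ggplot2)
library(ggpmisc)

my_formula <- y ~ x  # simple linear model

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Fitted linear trend line
  geom_smooth(method = "lm", formula = my_formula, se = FALSE) +
  # Regression equation and R^2, parsed as plotmath expressions
  stat_poly_eq(
    aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
    formula = my_formula, parse = TRUE
  )
print(p)
```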
Note that you can also display the AIC and the BIC values using ..AIC.label.. and ..BIC.label.. in the above equation. Other arguments (label.x, label.y) are available in the function stat_poly_eq() to adjust label positions. For more examples, type this R code: browseVignettes("ggpmisc").
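Along the same lines, a sketch that shows the AIC and BIC labels and moves them with label.x and label.y. It assumes the installed ggpmisc accepts the character positions "right" and "top" (values between 0 and 1, in normalized plot coordinates, are the numeric alternative).

```r
library(ggplot2)
library(ggpmisc)

my_formula <- y ~ x

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = my_formula, se = FALSE) +
  # AIC and BIC of the fitted model, placed in the top-right corner,
  # which is empty here because mpg falls as wt rises
  stat_poly_eq(
    aes(label = paste(after_stat(AIC.label), after_stat(BIC.label), sep = "~~~")),
    formula = my_formula, parse = TRUE,
    label.x = "right", label.y = "top"
  )
print(p)
```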
Possible layers include: