| Title: | Extract and Visualize the Results of Multivariate Data Analyses |
|---|---|
| Description: | Provides easy-to-use functions to extract and visualize the output of multivariate data analyses, including 'PCA' (Principal Component Analysis), 'CA' (Correspondence Analysis), 'MCA' (Multiple Correspondence Analysis), 'FAMD' (Factor Analysis of Mixed Data), 'MFA' (Multiple Factor Analysis), and 'HMFA' (Hierarchical Multiple Factor Analysis) from different R packages. It also includes support for supplementary qualitative variables in 'FactoMineR' 'FAMD' and 'MFA' workflows, hardened validation for clustering and dimension-reduction helper workflows, backward-compatible phylogenic dendrogram layout support for current 'igraph' APIs, and 'ggplot2'-based data visualization. |
| Authors: | Alboukadel Kassambara [aut, cre] (ORCID: <https://orcid.org/0009-0002-9136-0791>), Fabian Mundt [aut], Laszlo Erdey [ctb] (ORCID: <https://orcid.org/0000-0002-6781-4303>, affiliation: Faculty of Economics and Business, University of Debrecen, Hungary, contribution: Modern compatibility fixes, tests, and maintenance updates) |
| Maintainer: | Alboukadel Kassambara <[email protected]> |
| License: | GPL-2 |
| Version: | 2.0.0.999 |
| Built: | 2026-05-25 10:44:26 UTC |
| Source: | https://github.com/kassambara/factoextra |
Athletes' performance during two sporting meetings
    data("decathlon2")
A data frame with 27 observations on the following 13 variables.
X100m: a numeric vector
Long.jump: a numeric vector
Shot.put: a numeric vector
High.jump: a numeric vector
X400m: a numeric vector
X110m.hurdle: a numeric vector
Discus: a numeric vector
Pole.vault: a numeric vector
Javeline: a numeric vector
X1500m: a numeric vector
Rank: a numeric vector corresponding to the rank
Points: a numeric vector specifying the points obtained
Competition: a factor with levels Decastar OlympicG
These data are a subset of the decathlon data in the FactoMineR package.
    data(decathlon2)
    decathlon.active <- decathlon2[1:23, 1:10]
    res.pca <- prcomp(decathlon.active, scale = TRUE)
    fviz_pca_biplot(res.pca)
Deprecated functions. Will be removed in a future version.
get_mfa_quanti_var(). Deprecated. Use get_mfa_var(res.mfa, "quanti.var") instead.
get_mfa_quali_var(). Deprecated. Use get_mfa_var(res.mfa, "quali.var") instead.
get_mfa_group(). Deprecated. Use get_mfa_var(res.mfa, "group") instead.
fviz_mfa_ind_starplot(): Star graph of individuals (draws partial points). Deprecated. Use fviz_mfa_ind(res.mfa, partial = "all") instead.
fviz_mfa_quanti_var(): Graph of quantitative variables. Deprecated. Use fviz_mfa(X, "quanti.var") instead.
fviz_mfa_quali_var(): Graph of qualitative variables. Deprecated. Use fviz_mfa(X, "quali.var") instead.
get_hmfa_quanti_var(). Deprecated. Use get_hmfa_var(res.hmfa, "quanti.var") instead.
get_hmfa_quali_var(). Deprecated. Use get_hmfa_var(res.hmfa, "quali.var") instead.
get_hmfa_group(). Deprecated. Use get_hmfa_var(res.hmfa, "group") instead.
fviz_hmfa_ind_starplot(): Graph of partial individuals. Deprecated. Use fviz_hmfa_ind(X, partial = "all") instead.
fviz_hmfa_quanti_var(): Graph of quantitative variables. Deprecated. Use fviz_hmfa_var(X, "quanti.var") instead.
fviz_hmfa_quali_var(): Graph of qualitative variables. Deprecated. Use fviz_hmfa_var(X, "quali.var") instead.
fviz_hmfa_group(): Graph of the groups representation. Deprecated. Use fviz_hmfa_var(X, "group") instead.
    get_mfa_quanti_var(res.mfa)
    get_mfa_quali_var(res.mfa)
    get_mfa_group(res.mfa)
    fviz_mfa_ind_starplot(X, ...)
    fviz_mfa_group(X, ...)
    fviz_mfa_quanti_var(X, ...)
    fviz_mfa_quali_var(X, ...)
    get_hmfa_quanti_var(res.hmfa)
    get_hmfa_quali_var(res.hmfa)
    get_hmfa_group(res.hmfa)
    fviz_hmfa_quanti_var(X, ...)
    fviz_hmfa_quali_var(X, ...)
    fviz_hmfa_ind_starplot(X, ...)
    fviz_hmfa_group(X, ...)
| res.mfa | an object of class MFA [FactoMineR]. |
|---|---|
| X | an object of class MFA or HMFA [FactoMineR]. |
| ... | Other arguments. |
| res.hmfa | an object of class HMFA [FactoMineR]. |
Alboukadel Kassambara [email protected]
Clustering methods classify data samples into groups of similar objects. This requires a method for measuring the distance or (dis)similarity between observations. Read more: STHDA website - clarifying distance measures.
get_dist(): Computes a distance matrix between the rows of a data matrix. Compared to the standard dist() function, it supports correlation-based distance measures including "pearson", "kendall" and "spearman" methods.
fviz_dist(): Visualizes a distance matrix.
When stand = TRUE, scaling that produces non-finite values is rejected. fviz_dist() also validates that supplied distance objects contain only finite values before plotting.
    get_dist(x, method = "euclidean", stand = FALSE, ...)

    fviz_dist(
      dist.obj,
      order = TRUE,
      show_labels = TRUE,
      lab_size = NULL,
      gradient = list(low = "red", mid = "white", high = "blue")
    )
| x | a numeric matrix or a data frame. |
|---|---|
| method | the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "pearson", "spearman" or "kendall". |
| stand | logical value; default is FALSE. If TRUE, the data are standardized before the distances are computed. |
| ... | other arguments to be passed to the function dist() when using get_dist(). |
| dist.obj | an object of class "dist", such as the one returned by get_dist(). |
| order | logical value. If TRUE, the ordered dissimilarity image (ODI) is shown. |
| show_labels | logical value. If TRUE, the labels are displayed. |
| lab_size | the size of labels. |
| gradient | a list containing three elements specifying the colors for low, mid and high values in the ordered dissimilarity image. The element "mid" can take the value of NULL. |
get_dist(): returns an object of class "dist".
fviz_dist(): returns a ggplot2 object.
Alboukadel Kassambara [email protected]
    data(USArrests)
    res.dist <- get_dist(USArrests, stand = TRUE, method = "pearson")
    fviz_dist(res.dist,
              gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))
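The correlation-based measures can be approximated in base R. The sketch below assumes that a correlation distance is defined as 1 minus the correlation between rows, which is a common convention and an assumption here rather than a statement of how get_dist() is implemented. The helper name cor_dist is hypothetical.

```r
# Hypothetical helper: correlation-based distance between the rows of x.
# Assumption: distance = 1 - correlation (a common convention).
cor_dist <- function(x, method = c("pearson", "spearman", "kendall"),
                     stand = FALSE) {
  method <- match.arg(method)
  x <- as.matrix(x)
  if (stand) {
    x <- scale(x)
    # Mirror the documented behaviour: reject scaling that yields
    # non-finite values (for example, from constant columns).
    if (any(!is.finite(x))) {
      stop("standardization produced non-finite values")
    }
  }
  # cor() works column-wise, so transpose to correlate the rows.
  stats::as.dist(1 - stats::cor(t(x), method = method))
}

data(USArrests)
d <- cor_dist(USArrests, method = "pearson", stand = TRUE)
class(d)
```

Under this convention, perfectly correlated rows are at distance 0 and perfectly anti-correlated rows at distance 2, so the measure compares profiles rather than magnitudes.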
Provides a convenient workflow for clustering analyses and
ggplot2-based data visualization. When k = NULL, the gap statistic
selects the number of clusters. Hierarchical backends may validly return
k = 1; in that case eclust() returns a one-cluster result
without silhouette information. Read more:
Visual enhancement of clustering analysis.
    eclust(
      x,
      FUNcluster = c("kmeans", "pam", "clara", "fanny", "hclust", "agnes", "diana"),
      k = NULL,
      k.max = 10,
      stand = FALSE,
      graph = TRUE,
      hc_metric = "euclidean",
      hc_method = "ward.D2",
      gap_maxSE = list(method = "firstSEmax", SE.factor = 1),
      nboot = 100,
      verbose = interactive(),
      seed = 123,
      ...
    )
| x | numeric vector, data matrix or data frame. |
|---|---|
| FUNcluster | a clustering function including "kmeans", "pam", "clara", "fanny", "hclust", "agnes" and "diana". Abbreviation is allowed. |
| k | the number of clusters to be generated. If NULL, the gap statistic is used to estimate the appropriate number of clusters. For hierarchical clustering, this automatic selection may return k = 1. |
| k.max | the maximum number of clusters to consider; must be at least two. |
| stand | logical value; default is FALSE. If TRUE, the data are standardized before clustering. |
| graph | logical value. If TRUE, the cluster plot is displayed. |
| hc_metric | character string specifying the metric to be used for calculating dissimilarities between observations. Allowed values are those accepted by the function dist() [including "euclidean", "manhattan", "maximum", "canberra", "binary", "minkowski"] and correlation-based distance measures ["pearson", "spearman" or "kendall"]. Used only when FUNcluster is a hierarchical clustering function such as one of "hclust", "agnes" or "diana". |
| hc_method | the agglomeration method to be used (?hclust): "ward.D", "ward.D2", "single", "complete", "average", ... |
| gap_maxSE | a list containing the parameters (method and SE.factor) for determining the location of the maximum of the gap statistic (see ?cluster::maxSE). |
| nboot | integer, number of Monte Carlo ("bootstrap") samples. Used only for determining the number of clusters using the gap statistic. |
| verbose | logical value. If TRUE, progress messages are printed. |
| seed | integer used for seeding the random number generator. |
| ... | other arguments to be passed to FUNcluster. |
Returns an object of class "eclust" containing the result of the standard function used (e.g., kmeans, pam, hclust, agnes, diana, etc.).
It also includes:
cluster: the cluster assignment of observations after cutting the tree
nbclust: the number of clusters
silinfo: the silhouette information of observations, when available for solutions with at least two clusters, including $widths (silhouette width values of each observation), $clus.avg.widths (average silhouette width of each cluster) and $avg.width (average width of all clusters)
size: the size of clusters
data: a matrix containing the original or the standardized data (if stand = TRUE)
The "eclust" class has methods for fviz_silhouette(), fviz_dend() and fviz_cluster().
Alboukadel Kassambara [email protected]
fviz_silhouette, fviz_dend,
fviz_cluster
    # Load and scale data
    data("USArrests")
    df <- scale(USArrests)

    # Enhanced k-means clustering
    # nboot >= 500 is recommended
    res.km <- eclust(df, "kmeans", nboot = 2)
    # Silhouette plot
    fviz_silhouette(res.km)
    # Optimal number of clusters using gap statistics
    res.km$nbclust
    # Print result
    res.km

    ## Not run:
    # Enhanced hierarchical clustering
    res.hc <- eclust(df, "hclust", nboot = 2) # compute hclust
    fviz_dend(res.hc) # dendrogram
    if (res.hc$nbclust > 1) fviz_silhouette(res.hc) # silhouette plot
    ## End(Not run)
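When k = NULL, the number of clusters comes from the gap statistic. The sketch below shows that kind of selection using the cluster package directly, with the rule named in the gap_maxSE default ("firstSEmax", SE.factor = 1). It is an illustration under those assumptions, not a reproduction of eclust() internals, and B is kept small so the code runs quickly.

```r
library(cluster)  # recommended package distributed with R

set.seed(123)
df <- scale(USArrests)

# Gap statistic for k = 1..10 with k-means; B is the number of
# Monte Carlo reference samples. nstart is forwarded to kmeans().
gap <- clusGap(df, FUNcluster = kmeans, K.max = 10, B = 20,
               verbose = FALSE, nstart = 10)

# Locate the number of clusters with the "firstSEmax" rule.
k <- maxSE(f = gap$Tab[, "gap"], SE.f = gap$Tab[, "SE.sim"],
           method = "firstSEmax", SE.factor = 1)

# Fit the final partition with the selected number of clusters.
km <- kmeans(df, centers = k, nstart = 10)
table(km$cluster)
```

Because the rule can select k = 1, any silhouette computation should be guarded in the same way as the nbclust check in the hierarchical example.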
Eigenvalues correspond to the amount of the variation explained by each principal component (PC).
get_eig(): Extract the eigenvalues/variances of the principal dimensions
fviz_eig(): Plot the eigenvalues/variances against the number of dimensions
get_eigenvalue(): an alias of get_eig()
fviz_screeplot(): an alias of fviz_eig()
These functions support the results of Principal Component Analysis (PCA),
Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA), Factor Analysis of Mixed Data (FAMD),
Multiple Factor Analysis (MFA) and Hierarchical Multiple Factor Analysis
(HMFA) functions. fviz_eig() validates ncp,
parallel.iter, and parallel.seed before plotting, accepting
integer-like numeric values while still rejecting fractional inputs.
    get_eig(X)

    get_eigenvalue(X)

    fviz_eig(
      X,
      choice = c("variance", "eigenvalue"),
      geom = c("bar", "line"),
      barfill = "steelblue",
      barcolor = "steelblue",
      linecolor = "black",
      ncp = 10,
      addlabels = FALSE,
      hjust = 0,
      main = NULL,
      xlab = NULL,
      ylab = NULL,
      ggtheme = theme_minimal(),
      parallel = FALSE,
      parallel.color = "red",
      parallel.lty = "dashed",
      parallel.iter = 100,
      parallel.seed = NULL,
      ...
    )

    fviz_screeplot(...)
| X | an object of class PCA, CA, MCA, FAMD, MFA or HMFA [FactoMineR]; prcomp or princomp [stats]; dudi, pca, coa or acm [ade4]; ca or mjca [ca package]. |
|---|---|
| choice | a text specifying the data to be plotted. Allowed values are "variance" or "eigenvalue". |
| geom | a text specifying the geometry to be used for the graph. Allowed values are "bar" for barplot, "line" for lineplot or c("bar", "line") to use both types. |
| barfill | fill color for bar plot. |
| barcolor | outline color for bar plot. |
| linecolor | color for line plot (when geom contains "line"). |
| ncp | a single positive integer specifying the number of dimensions to be shown. Integer-like numeric values are accepted. |
| addlabels | logical value. If TRUE, labels are added at the top of bars or points showing the information retained by each dimension. |
| hjust | horizontal adjustment of the labels. |
| main, xlab, ylab | plot main and axis titles. |
| ggtheme | function, ggplot2 theme name. Default value is theme_minimal(). Allowed values include ggplot2 official themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), .... |
| parallel | logical value. If TRUE, adds a parallel analysis threshold line (Horn's method) to help determine the number of components to retain. Components with eigenvalues above this line are considered significant. Only works when choice = "eigenvalue" and X is a prcomp or princomp object. Default is FALSE. |
| parallel.color | color of the parallel analysis threshold line. Default is "red". |
| parallel.lty | line type for the parallel analysis line. Default is "dashed". |
| parallel.iter | a single positive integer giving the number of iterations for parallel analysis simulation. Integer-like numeric values are accepted. Default is 100. |
| parallel.seed | NULL or a single non-negative integer seed for reproducible parallel analysis simulation. If NULL (default), the current RNG stream is used. Integer-like numeric values are accepted. |
| ... | optional arguments to be passed to the function ggpar. |
get_eig() (or get_eigenvalue()): returns a data.frame containing 3 columns: the eigenvalues, the percentage of variance and the cumulative percentage of variance retained by each dimension.
fviz_eig() (or fviz_screeplot()): returns a ggplot2 object.
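For a prcomp result, the three columns described above can be derived by hand, which helps when checking what a scree plot displays. This is a minimal base R sketch; the column and row names are illustrative and are not guaranteed to match the output of get_eig().

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components.
eig <- res.pca$sdev^2
variance.percent <- 100 * eig / sum(eig)

eig.table <- data.frame(
  eigenvalue = eig,
  variance.percent = variance.percent,
  cumulative.variance.percent = cumsum(variance.percent),
  row.names = paste0("Dim.", seq_along(eig))
)
eig.table
```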
Alboukadel Kassambara [email protected]
https://www.sthda.com/english/
fviz_pca, fviz_ca,
fviz_mca, fviz_mfa, fviz_hmfa
    # Principal Component Analysis
    # ++++++++++++++++++++++++++
    data(iris)
    res.pca <- prcomp(iris[, -5], scale = TRUE)

    # Extract eigenvalues/variances
    get_eig(res.pca)

    # Default plot
    fviz_eig(res.pca, addlabels = TRUE, ylim = c(0, 85))

    # Scree plot - Eigenvalues
    fviz_eig(res.pca, choice = "eigenvalue", addlabels=TRUE)

    # Use only bar or line plot: geom = "bar" or geom = "line"
    fviz_eig(res.pca, geom="line")

    # Parallel analysis (Horn's method) to determine number of components
    # Components with eigenvalues above the red line are significant
    fviz_eig(res.pca, choice = "eigenvalue", parallel = TRUE,
             addlabels = TRUE, parallel.color = "red",
             parallel.iter = 10, parallel.seed = 123)

    ## Not run:
    # Correspondence Analysis
    # +++++++++++++++++++++++++++++++++
    library(FactoMineR)
    data(housetasks)
    res.ca <- CA(housetasks, graph = FALSE)
    get_eig(res.ca)
    fviz_eig(res.ca, linecolor = "#FC4E07",
             barcolor = "#00AFBB", barfill = "#00AFBB")

    # Multiple Correspondence Analysis
    # +++++++++++++++++++++++++++++++++
    library(FactoMineR)
    data(poison)
    res.mca <- MCA(poison, quanti.sup = 1:2,
                   quali.sup = 3:4, graph=FALSE)
    get_eig(res.mca)
    fviz_eig(res.mca, linecolor = "#FC4E07",
             barcolor = "#2E9FDF", barfill = "#2E9FDF")
    ## End(Not run)
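The idea behind the parallel analysis threshold can be sketched in base R: eigenvalues of the observed correlation matrix are compared with eigenvalues obtained from random data of the same dimensions. The helper parallel_threshold is hypothetical, and summarizing the simulations by their 95th percentile is an assumption; the text above does not state which summary fviz_eig() uses.

```r
# Hypothetical helper: per-component thresholds from simulated random data.
# Assumption: the 95th percentile across iterations is used as the cut-off.
parallel_threshold <- function(n, p, iter = 100, seed = NULL, prob = 0.95) {
  if (!is.null(seed)) set.seed(seed)
  sim <- replicate(iter, {
    z <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(z), symmetric = TRUE, only.values = TRUE)$values
  })
  # sim is a p x iter matrix; summarize each component across iterations.
  unname(apply(sim, 1, quantile, probs = prob))
}

data(iris)
x <- iris[, -5]
res.pca <- prcomp(x, scale. = TRUE)
observed <- res.pca$sdev^2
threshold <- parallel_threshold(nrow(x), ncol(x), iter = 100, seed = 123)

# Components whose observed eigenvalue exceeds the threshold are retained.
data.frame(observed, threshold, retain = observed > threshold)
```

Fixing the seed makes the thresholds reproducible, which is the role parallel.seed plays in the documented interface.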
Subset and summarize the results of Principal Component Analysis (PCA), Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA), Factor Analysis of Mixed Data (FAMD), Multiple Factor Analysis (MFA) and Hierarchical Multiple Factor Analysis (HMFA) functions from several packages. Axis indices are validated before extraction, and MCA quantitative supplementary summaries inherit the package-level error raised when that result is unavailable.
    facto_summarize(
      X,
      element,
      node.level = 1,
      group.names,
      result = c("coord", "cos2", "contrib"),
      axes = 1:2,
      select = NULL
    )
| X | an object of class PCA, CA, MCA, FAMD, MFA or HMFA [FactoMineR]; prcomp or princomp [stats]; dudi, pca, coa or acm [ade4]; ca [ca package]; expoOutput [ExPosition]. |
|---|---|
| element | the element to subset from the output. Possible values are "row" or "col" for CA; "var", "ind", "mca.cor" or "quanti.sup" for MCA; "var" or "ind" for PCA; and 'quanti.var', 'quali.var', 'quali.sup', 'group' or 'ind' for FAMD, MFA and HMFA. |
| node.level | a single number indicating the HMFA node level. |
| group.names | a vector containing the names of the groups (by default NULL, in which case the groups are named group.1, group.2 and so on). |
| result | the result to be extracted for the element. Possible values are any combination of c("cos2", "contrib", "coord"). |
| axes | a numeric vector specifying the axes of interest. Values must be positive integer indices within the available dimensions. Default values are 1:2 for axes 1 and 2. |
| select | a selection of variables. Allowed values are NULL or a list containing the arguments name, cos2 or contrib. Default is list(name = NULL, cos2 = NULL, contrib = NULL). As in the examples, name selects elements by name, cos2 = 0.6 keeps elements with cos2 >= 0.6, and contrib = 5 keeps the top 5 contributing elements. |
If length(axes) > 1, then the columns contrib and cos2 correspond to the total contributions and total cos2 of the axes. In this case, the column coord is calculated as x^2 + y^2 + ..., where x, y, ... are the coordinates of the points on the specified axes.
A data frame containing the (total) coord, cos2 and the contribution for the axes.
Alboukadel Kassambara [email protected]
https://www.sthda.com/english/
    # Principal component analysis
    # +++++++++++++++++++++++++++++
    data(decathlon2)
    decathlon2.active <- decathlon2[1:23, 1:10]
    res.pca <- prcomp(decathlon2.active, scale = TRUE)

    # Summarize variables on axes 1:2
    facto_summarize(res.pca, "var", axes = 1:2)[,-1]
    # Select the top 5 contributing variables
    facto_summarize(res.pca, "var", axes = 1:2,
                    select = list(contrib = 5))[,-1]
    # Select variables with cos2 >= 0.6
    facto_summarize(res.pca, "var", axes = 1:2,
                    select = list(cos2 = 0.6))[,-1]
    # Select by names
    facto_summarize(res.pca, "var", axes = 1:2,
                    select = list(name = c("X100m", "Discus", "Javeline")))[,-1]

    # Summarize individuals on axes 1:2
    facto_summarize(res.pca, "ind", axes = 1:2)[,-1]

    # Correspondence Analysis
    # ++++++++++++++++++++++++++
    # Install and load FactoMineR to compute CA
    # install.packages("FactoMineR")
    library("FactoMineR")
    data("housetasks")
    res.ca <- CA(housetasks, graph = FALSE)
    # Summarize row variables on axes 1:2
    facto_summarize(res.ca, "row", axes = 1:2)[,-1]
    # Summarize column variables on axes 1:2
    facto_summarize(res.ca, "col", axes = 1:2)[,-1]

    # Multiple Correspondence Analysis
    # +++++++++++++++++++++++++++++++++
    library(FactoMineR)
    data(poison)
    res.mca <- MCA(poison, quanti.sup = 1:2,
                   quali.sup = 3:4, graph=FALSE)
    # Summarize variables on axes 1:2
    res <- facto_summarize(res.mca, "var", axes = 1:2)
    head(res)
    # Summarize individuals on axes 1:2
    res <- facto_summarize(res.mca, "ind", axes = 1:2)
    head(res)
    # Summarize quantitative supplementary variables on axes 1:2
    res <- facto_summarize(res.mca, "quanti.sup", axes = 1:2)
    head(res)

    # Multiple factor Analysis
    # +++++++++++++++++++++++++++++++++
    library(FactoMineR)
    data(poison)
    res.mfa <- MFA(poison, group=c(2,2,5,6), type=c("s","n","n","n"),
                   name.group=c("desc","desc2","symptom","eat"),
                   num.group.sup=1:2, graph=FALSE)
    # Summarize categorical variables on axes 1:2
    res <- facto_summarize(res.mfa, "quali.var", axes = 1:2)
    head(res)
    # Summarize individuals on axes 1:2
    res <- facto_summarize(res.mfa, "ind", axes = 1:2)
    head(res)
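The multi-axis totals described in the details can be reproduced by hand for the variables of a scaled prcomp fit. The sketch below assumes the usual PCA definitions (variable coordinates are loadings scaled by the component standard deviations, and cos2 is the squared coordinate). It leaves out the contrib column, whose aggregation across axes is not spelled out in the text, and the name summary.var is illustrative.

```r
data(USArrests)
res.pca <- prcomp(USArrests, scale. = TRUE)
axes <- 1:2

# Variable coordinates: loadings scaled by component standard deviations.
coord <- sweep(res.pca$rotation, 2, res.pca$sdev, "*")
cos2 <- coord^2  # quality of representation on each axis

summary.var <- data.frame(
  name = rownames(coord),
  coord[, axes, drop = FALSE],                    # coordinates per axis
  coord = rowSums(coord[, axes, drop = FALSE]^2), # x^2 + y^2
  cos2 = rowSums(cos2[, axes, drop = FALSE])      # total cos2 on the axes
)
summary.var
```

For PCA variables on standardized data the two totals coincide, and the cos2 summed over all components equals 1 for every variable.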
Map FactoMineR category labels to legacy naming patterns
    factominer_category_map(X, element = c("quali.var", "quali.sup", "var"))
| X | a FactoMineR object (MCA, MFA, FAMD, HMFA). |
|---|---|
| element | element to map. Use "var" for MCA categories or "quali.var" for MFA/FAMD/HMFA qualitative categories. "quali.sup" maps supplementary qualitative categories when available. |
A data.frame with current labels, variable names, levels, and legacy naming patterns.
    if (requireNamespace("FactoMineR", quietly = TRUE)) {
      data(poison)
      res.mca <- FactoMineR::MCA(poison, quanti.sup = 1:2,
                                 quali.sup = 3:4, graph = FALSE)
      head(factominer_category_map(res.mca, element = "var"))
    }
Generic function to create a scatter plot of multivariate analysis outputs, including PCA, CA, MCA and MFA.
    fviz(
      X,
      element,
      axes = c(1, 2),
      geom = "auto",
      label = "all",
      invisible = "none",
      labelsize = 4,
      pointsize = 1.5,
      pointshape = 19,
      arrowsize = 0.5,
      habillage = "none",
      addEllipses = FALSE,
      ellipse.level = 0.95,
      ellipse.type = "norm",
      ellipse.alpha = 0.1,
      mean.point = TRUE,
      color = "black",
      fill = "white",
      alpha = 1,
      gradient.cols = NULL,
      col.row.sup = "darkblue",
      col.col.sup = "darkred",
      select = list(name = NULL, cos2 = NULL, contrib = NULL),
      title = NULL,
      axes.linetype = "dashed",
      repel = FALSE,
      col.circle = "grey70",
      circlesize = 0.5,
      ggtheme = theme_minimal(),
      ggp = NULL,
      font.family = "",
      ...
    )
| X | an object of class PCA, CA, MCA, FAMD, MFA or HMFA [FactoMineR]; prcomp or princomp [stats]; dudi, pca, coa or acm [ade4]; ca [ca package]; expoOutput [ExPosition]. |
|---|---|
| element | the element to subset from the output. Possible values are "row" or "col" for CA; "var", "ind", "mca.cor" or "quanti.sup" for MCA; "var" or "ind" for PCA; and 'quanti.var', 'quali.var', 'quali.sup', 'group' or 'ind' for FAMD, MFA and HMFA. |
| axes | a numeric vector specifying the axes of interest. Values must be positive integer indices within the available dimensions. Default values are 1:2 for axes 1 and 2. |
| geom | a text specifying the geometry to be used for the graph. Default value is "auto". Allowed values are any combination of c("point", "arrow", "text"). Use "point" to show only points; "text" to show only labels; c("point", "text") or c("arrow", "text") to show both types. |
| label | a text specifying the elements to be labelled. Default value is "all". Allowed values are "none" or any combination of c("ind", "ind.sup", "quali", "var", "quanti.sup", "group.sup"). "ind" can be used to label only active individuals. "ind.sup" is for supplementary individuals. "quali" is for supplementary qualitative variables. "var" is for active variables. "quanti.sup" is for quantitative supplementary variables. |
| invisible | a text specifying the elements to be hidden on the plot. Default value is "none". Allowed values are any combination of c("ind", "ind.sup", "quali", "var", "quanti.sup", "group.sup"). |
| labelsize | font size for the labels. |
| pointsize | the size of points. |
| pointshape | the shape of points. |
| arrowsize | the size of arrows. Controls the thickness of arrows. |
| habillage | an optional factor variable for coloring the observations by groups. Default value is "none". If X is a PCA object from the FactoMineR package, habillage can also specify the supplementary qualitative variable (by its index or name) to be used for coloring individuals by groups (see ?PCA in FactoMineR). |
| addEllipses | logical value. If TRUE, draws ellipses around the individuals when habillage != "none". |
| ellipse.level | the size of the concentration ellipse in normal probability. |
| ellipse.type | character specifying the frame type. Default is "norm". |
| ellipse.alpha | alpha for the ellipse, specifying the transparency level of the fill color. Use alpha = 0 for no fill color. |
| mean.point | logical value. If TRUE (default), group mean points are added to the plot. |
| color | color to be used for the specified geometries (point, text). Can be a continuous variable or a factor variable. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the colors for individuals/variables are automatically controlled by their qualities of representation ("cos2"), contributions ("contrib"), coordinates (x^2+y^2, "coord"), x values ("x") or y values ("y"). To use automatic coloring (by cos2, contrib, ...), make sure that habillage = "none". |
| fill | fill color. Default is "white". |
| alpha | controls the transparency of individual and variable colors. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the transparency of the individual/variable colors is automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2+y^2, "coord"), x values ("x") or y values ("y"). To use this, make sure that habillage = "none". |
| gradient.cols | vector of colors to use for n-colour gradient. Allowed values include brewer and ggsci color palettes. |
| col.col.sup, col.row.sup | colors for the supplementary column and row points, respectively. |
| select | a selection of individuals/variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib. |
| title | the title of the graph. |
| axes.linetype | linetype of x and y axes. |
| repel | a boolean, whether to use ggrepel to avoid overplotting text labels or not. |
| col.circle | a color for the correlation circle. Used only when X is a PCA output. |
| circlesize | the size of the variable correlation circle. |
| ggtheme | function, ggplot2 theme name. Default value is theme_minimal(). Allowed values include ggplot2 official themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), .... |
| ggp | a ggplot. If not NULL, points are added to an existing plot. |
| font.family | character vector specifying font family. |
| ... | Arguments to be passed to the functions ggpubr::ggscatter() & ggpubr::ggpar(). |
a ggplot
Alboukadel Kassambara [email protected]
    # Principal component analysis
    # +++++++++++++++++++++++++++++
    data(decathlon2)
    decathlon2.active <- decathlon2[1:23, 1:10]
    res.pca <- prcomp(decathlon2.active, scale = TRUE)
    fviz(res.pca, "ind") # Individuals plot
    fviz(res.pca, "var") # Variables plot

    # Correspondence Analysis
    # ++++++++++++++++++++++++++
    # Install and load FactoMineR to compute CA
    # install.packages("FactoMineR")
    library("FactoMineR")
    data("housetasks")
    res.ca <- CA(housetasks, graph = FALSE)
    fviz(res.ca, "row") # Rows plot
    fviz(res.ca, "col") # Columns plot

    # Multiple Correspondence Analysis
    # +++++++++++++++++++++++++++++++++
    library(FactoMineR)
    data(poison)
    res.mca <- MCA(poison, quanti.sup = 1:2,
                   quali.sup = 3:4, graph=FALSE)
    fviz(res.mca, "ind") # Individuals plot
    fviz(res.mca, "var") # Variables plot
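To show what an individuals plot contains, the sketch below builds a comparable figure directly with ggplot2 from a prcomp fit: points on two axes, dashed reference axes, group coloring and normal-theory ellipses. It is an approximation for illustration, not the code path used by fviz(). The variable names (scores, pct) and the axis-label format are assumptions, and the plotting step is skipped when ggplot2 is not installed.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
axes <- c(1, 2)

# Individuals' coordinates on the chosen axes, plus a grouping factor.
scores <- data.frame(
  x = res.pca$x[, axes[1]],
  y = res.pca$x[, axes[2]],
  group = iris$Species
)

# Percentage of variance per component, used in the axis labels.
pct <- round(100 * res.pca$sdev^2 / sum(res.pca$sdev^2), 1)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(scores, aes(x = x, y = y, colour = group)) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    geom_point(size = 1.5, shape = 19) +
    stat_ellipse(type = "norm", level = 0.95) +
    labs(x = sprintf("Dim%d (%.1f%%)", axes[1], pct[axes[1]]),
         y = sprintf("Dim%d (%.1f%%)", axes[2], pct[axes[2]])) +
    theme_minimal()
  print(p)
}
```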
Add supplementary data to a plot
fviz_add(
  ggp,
  df,
  axes = c(1, 2),
  geom = c("point", "arrow"),
  color = "blue",
  addlabel = TRUE,
  labelsize = 4,
  pointsize = 2,
  shape = 19,
  linetype = "dashed",
  repel = FALSE,
  font.family = "",
  ...
)
ggp |
a ggplot2 plot. |
df |
a data frame containing the x and y coordinates |
axes |
a numeric vector of length 2 specifying the components to be plotted. |
geom |
a character specifying the geometry to be used for the graph. Allowed values are "point", "arrow" or "text". |
color |
the color to be used |
addlabel |
a logical value. If TRUE, labels are added |
labelsize |
the size of labels. Default value is 4 |
pointsize |
the size of points |
shape |
point shape when geom ="point" |
linetype |
the linetype to be used when geom ="arrow" |
repel |
a boolean, whether to use ggrepel to avoid overplotting text
labels or not. The old |
font.family |
character vector specifying font family. |
... |
Additional arguments, not used |
a ggplot2 plot
Alboukadel Kassambara [email protected]
https://www.sthda.com/english/
# Principal component analysis
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- prcomp(decathlon2.active, scale = TRUE)

# Visualize variables
p <- fviz_pca_var(res.pca)
print(p)

# Add supplementary variables
coord <- data.frame(PC1 = c(-0.7, 0.9), PC2 = c(0.25, -0.07))
rownames(coord) <- c("Rank", "Points")
print(coord)
fviz_add(p, coord, color = "blue", geom = "arrow")
Correspondence analysis (CA) is an extension of Principal Component Analysis (PCA) suited to analyze frequencies formed by two categorical variables. fviz_ca() provides ggplot2-based elegant visualization of CA outputs from the R functions: CA [in FactoMineR], ca [in ca], coa [in ade4], correspondence [in MASS] and expOutput/epCA [in ExPosition]. Read more: Correspondence Analysis
fviz_ca_row(): Graph of row variables
fviz_ca_col(): Graph of column variables
fviz_ca_biplot(): Biplot of row and column variables
fviz_ca(): An alias of fviz_ca_biplot()
fviz_ca_row(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  geom.row = geom,
  shape.row = 19,
  col.row = "blue",
  alpha.row = 1,
  col.row.sup = "darkblue",
  select.row = list(name = NULL, cos2 = NULL, contrib = NULL),
  map = "symmetric",
  repel = FALSE,
  ...
)

fviz_ca_col(
  X,
  axes = c(1, 2),
  shape.col = 17,
  geom = c("point", "text"),
  geom.col = geom,
  col.col = "red",
  col.col.sup = "darkred",
  alpha.col = 1,
  select.col = list(name = NULL, cos2 = NULL, contrib = NULL),
  map = "symmetric",
  repel = FALSE,
  ...
)

fviz_ca_biplot(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  geom.row = geom,
  geom.col = geom,
  label = "all",
  invisible = "none",
  arrows = c(FALSE, FALSE),
  repel = FALSE,
  title = "CA - Biplot",
  ...
)

fviz_ca(X, ...)
X |
an object of class CA [FactoMineR], ca [ca], coa [ade4]; correspondence [MASS] and expOutput/epCA [ExPosition]. |
axes |
a numeric vector of length 2 specifying the dimensions to be plotted. |
geom |
a character specifying the geometry to be used for the graph. Allowed values are the combination of c("point", "arrow", "text"). Use "point" (to show only points); "text" to show only labels; c("point", "text") or c("arrow", "text") to show both types. |
geom.row, geom.col
|
as |
shape.row, shape.col
|
the point shapes to be used for row/column variables. Default values are 19 for rows and 17 for columns. |
map |
character string specifying the map type. Allowed options include: "symmetric", "rowprincipal", "colprincipal", "symbiplot", "rowgab", "colgab", "rowgreen" and "colgreen". See details |
repel |
a boolean, whether to use ggrepel to avoid overplotting text
labels or not. The old |
... |
Additional arguments.
|
col.col, col.row
|
color for column/row points. The default values are "red" and "blue", respectively. Can be a continuous variable or a factor variable. Other allowed values are "cos2", "contrib", "coord", "x" or "y". In this case, the colors for row/column variables are automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). |
col.col.sup, col.row.sup
|
colors for the supplementary column and row points, respectively. |
alpha.col, alpha.row
|
controls the transparency of colors. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Other allowed values are "cos2", "contrib", "coord", "x" or "y", as for the arguments col.col and col.row. |
select.col, select.row
|
a selection of columns/rows to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:
|
label |
a character vector specifying the elements to be labelled. Default value is "all". Allowed values are "none" or the combination of c("row", "row.sup", "col", "col.sup"). Use "col" to label only active column variables; "col.sup" to label only supplementary columns; etc |
invisible |
a character value specifying the elements to be hidden on the plot. Default value is "none". Allowed values are the combination of c("row", "row.sup","col", "col.sup"). |
arrows |
Vector of two logicals specifying if the plot should contain points (FALSE, default) or arrows (TRUE). First value sets the rows and the second value sets the columns. |
title |
the title of the graph |
The default plot of (M)CA is a "symmetric" plot in which both rows and columns are in principal coordinates. In this situation, it is not possible to interpret the distance between row points and column points. The simplest way to overcome this problem is to make an asymmetric plot, in which the column profiles are presented in the row space or vice versa. The allowed options for the argument map are:
"rowprincipal" or "colprincipal": asymmetric plots with either rows in principal coordinates and columns in standard coordinates, or vice versa. These plots preserve row metric or column metric respectively.
"symbiplot": Both rows and columns are scaled to have variances equal to the singular values (square roots of eigenvalues), which gives a symmetric biplot but does not preserve row or column metrics.
"rowgab" or "colgab": Asymmetric maps, proposed by Gabriel & Odoroff (1990), with rows (respectively, columns) in principal coordinates and columns (respectively, rows) in standard coordinates multiplied by the mass of the corresponding point.
"rowgreen" or "colgreen": The so-called contribution biplots showing visually the most contributing points (Greenacre 2006b). These are similar to "rowgab" and "colgab" except that the points in standard coordinates are multiplied by the square root of the corresponding masses, giving reconstructions of the standardized residuals.
a ggplot
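The scalings behind the map options described above can be sketched in base R. The block below is a minimal illustration on a made-up 3 x 3 table, not the package's internal code; every object name (N, row_std, col_green and so on) is hypothetical and chosen only for this example.

```r
# Toy contingency table (hypothetical counts, for illustration only)
N <- matrix(c(20,  5,  3,
               4, 18,  6,
               2,  7, 25),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

P     <- N / sum(N)     # correspondence matrix
rmass <- rowSums(P)     # row masses
cmass <- colSums(P)     # column masses

# Standardized residuals and their singular value decomposition
S   <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))
dec <- svd(S)
k   <- 1:2              # a 3 x 3 table has two non-trivial dimensions
sv  <- dec$d[k]         # singular values (square roots of eigenvalues)

# Standard coordinates (weighted variance 1 on each dimension)
row_std <- diag(1 / sqrt(rmass)) %*% dec$u[, k]
col_std <- diag(1 / sqrt(cmass)) %*% dec$v[, k]

# Principal coordinates (weighted variance equal to the eigenvalue)
row_pri <- row_std %*% diag(sv)
col_pri <- col_std %*% diag(sv)

# map = "symmetric":    row_pri plotted with col_pri
# map = "rowprincipal": row_pri plotted with col_std
# map = "symbiplot":    both sets scaled by the square root of the singular values
row_sym <- row_std %*% diag(sqrt(sv))
col_sym <- col_std %*% diag(sqrt(sv))
# map = "rowgab":       row_pri plotted with col_std multiplied by the masses
col_gab <- col_std * cmass
# map = "rowgreen":     row_pri plotted with col_std multiplied by sqrt(masses)
col_green <- col_std * sqrt(cmass)
```

In the "rowprincipal" scaling each row point is the weighted average of the column points in standard coordinates, which is why row-to-column distances become interpretable in an asymmetric map.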
Alboukadel Kassambara [email protected]
https://www.sthda.com/english/
# Correspondence Analysis
# ++++++++++++++++++++++++++++++
# Install and load FactoMineR to compute CA
# install.packages("FactoMineR")
library("FactoMineR")
data(housetasks)
head(housetasks)
res.ca <- CA(housetasks, graph = FALSE)

# Biplot of rows and columns
# ++++++++++++++++++++++++++
# Symmetric biplot of rows and columns
fviz_ca_biplot(res.ca)

# Asymmetric biplot, use arrows for columns
fviz_ca_biplot(res.ca, map = "rowprincipal", arrows = c(FALSE, TRUE))

# Keep only the labels for row points
fviz_ca_biplot(res.ca, label = "row")

# Keep only labels for column points
fviz_ca_biplot(res.ca, label = "col")

# Select the top 7 contributing rows
# and the top 3 columns
fviz_ca_biplot(res.ca,
               select.row = list(contrib = 7),
               select.col = list(contrib = 3))

# Graph of row variables
# +++++++++++++++++++++
# Control automatically the color of row points
# using the "cos2" or the contributions "contrib"
# cos2 = the quality of the rows on the factor map
# Change gradient color
# Use repel = TRUE to avoid overplotting (slow if many points)
fviz_ca_row(res.ca, col.row = "cos2",
            gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
            repel = TRUE)

# You can also control the transparency
# of the color by the "cos2" or "contrib"
fviz_ca_row(res.ca, alpha.row = "contrib")

# Select and visualize some rows with select.row argument.
# - Rows with cos2 >= 0.5: select.row = list(cos2 = 0.5)
# - Top 7 rows according to the cos2: select.row = list(cos2 = 7)
# - Top 7 contributing rows: select.row = list(contrib = 7)
# - Select rows by names: select.row = list(name = c("Breakfeast", "Repairs", "Holidays"))

# Example: Select the top 7 contributing rows
fviz_ca_row(res.ca, select.row = list(contrib = 7))

# Graph of column points
# ++++++++++++++++++++++++++++
# Control colors using their contributions
fviz_ca_col(res.ca, col.col = "contrib",
            gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"))

# Select columns with select.col argument
# You can select by contrib, cos2 and name
# as previously described for rows
# Select the top 3 contributing columns
fviz_ca_col(res.ca, select.col = list(contrib = 3))
Provides ggplot2-based elegant visualization of partitioning
methods including kmeans [stats package]; pam, clara and fanny [cluster
package]; dbscan [fpc package]; Mclust [mclust package]; HCPC [FactoMineR];
hkmeans [factoextra]. Observations are represented by points in the plot,
using principal components if ncol(data) > 2. An ellipse is drawn around
each cluster. When stand = TRUE, the plotting data must remain
finite after scaling.
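The two conditions mentioned above (principal components when there are more than two variables, and finite values after scaling) can be illustrated in base R. This is a sketch of the idea using scale() and prcomp(); it is an assumption that this mirrors what the function does internally, and the object names are hypothetical.

```r
x <- as.matrix(iris[, 1:4])      # four numeric variables, so ncol(x) > 2

# Analogue of stand = TRUE: center and scale each column
x_scaled <- scale(x)
all(is.finite(x_scaled))         # TRUE

# With more than two variables, the first two principal
# components provide the plotting coordinates
pca    <- prcomp(x_scaled)
coords <- pca$x[, 1:2]
head(coords)

# A constant column has zero standard deviation, so scaling
# divides 0 by 0 and yields NaN: such data is not plottable
bad        <- cbind(x, const = 1)
bad_scaled <- scale(bad)
all(is.finite(bad_scaled))       # FALSE
```

Checking all(is.finite(scale(data))) before plotting is a cheap way to catch constant or missing columns early.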
fviz_cluster(
  object,
  data = NULL,
  choose.vars = NULL,
  stand = TRUE,
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  show.clust.cent = TRUE,
  ellipse = TRUE,
  ellipse.type = "convex",
  ellipse.level = 0.95,
  ellipse.alpha = 0.2,
  shape = NULL,
  pointsize = 1.5,
  labelsize = 12,
  main = "Cluster plot",
  xlab = NULL,
  ylab = NULL,
  outlier.color = "black",
  outlier.shape = 19,
  outlier.pointsize = pointsize,
  outlier.labelsize = labelsize,
  ggtheme = theme_grey(),
  ...
)
object |
an object of class "partition" created by the functions pam(), clara() or fanny() in the cluster package; "kmeans" [in stats package]; "dbscan" [in fpc package]; "Mclust" [in mclust]; "hkmeans", "eclust" [in factoextra]. Any list object with data and cluster components is also accepted (e.g.: object = list(data = mydata, cluster = myclust)). |
data |
the data that has been used for clustering. Required only when object is of class kmeans or dbscan. |
choose.vars |
a character vector containing variables to be considered for plotting. |
stand |
logical value; if TRUE, data is standardized before principal
component analysis. If scaling produces |
axes |
a numeric vector of length 2 specifying the dimensions to be plotted. |
geom |
a character vector specifying the geometry to be used for the graph. Allowed values are the combination of c("point", "text"). Use "point" (to show only points); "text" to show only labels; c("point", "text") to show both types. |
repel |
a boolean, whether to use ggrepel to avoid overplotting text
labels or not. The old |
show.clust.cent |
logical; if TRUE, shows cluster centers |
ellipse |
logical value; if TRUE, draws outline around points of each cluster |
ellipse.type |
Character specifying frame type. Possible values are
'convex', 'confidence' or types supported by
|
ellipse.level |
the size of the concentration ellipse in normal
probability. Passed for |
ellipse.alpha |
Alpha for frame specifying the transparency level of fill color. Use alpha = 0 for no fill color. |
shape |
the shape of points. |
pointsize |
the size of points |
labelsize |
font size for the labels |
main |
plot main title. |
xlab, ylab
|
character vector specifying x and y axis labels, respectively. Use xlab = FALSE and ylab = FALSE to hide xlab and ylab, respectively. |
outlier.pointsize, outlier.color, outlier.shape, outlier.labelsize
|
arguments for customizing outliers, which can be detected only in DBSCAN clustering. |
ggtheme |
function, ggplot2 theme name. Default value is theme_grey(). Allowed values include ggplot2 official themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), .... |
... |
other arguments to be passed to the functions
|
a ggplot2 object.
Alboukadel Kassambara [email protected]
fviz_silhouette, hcut,
hkmeans, eclust, fviz_dend
set.seed(123)

# Data preparation
# +++++++++++++++
data("iris")
head(iris)
# Remove species column (5) and scale the data
iris.scaled <- scale(iris[, -5])

# K-means clustering
# +++++++++++++++++++++
km.res <- kmeans(iris.scaled, 3, nstart = 10)

# Visualize kmeans clustering
# use repel = TRUE to avoid overplotting
fviz_cluster(km.res, iris[, -5], ellipse.type = "norm")

# Change the color palette and theme
fviz_cluster(km.res, iris[, -5],
             palette = "Set2", ggtheme = theme_minimal())

## Not run:
# Show points only
fviz_cluster(km.res, iris[, -5], geom = "point")
# Show text only
fviz_cluster(km.res, iris[, -5], geom = "text")

# PAM clustering
# ++++++++++++++++++++
# pam() lives in the cluster package, which is not attached here
requireNamespace("cluster", quietly = TRUE)
pam.res <- cluster::pam(iris.scaled, 3)
# Visualize pam clustering
fviz_cluster(pam.res, geom = "point", ellipse.type = "norm")

# Hierarchical clustering
# ++++++++++++++++++++++++
# Use hcut(), which computes hclust and cuts the tree
hc.cut <- hcut(iris.scaled, k = 3, hc_method = "complete")
# Visualize dendrogram
fviz_dend(hc.cut, show_labels = FALSE, rect = TRUE)
# Visualize cluster
fviz_cluster(hc.cut, ellipse.type = "convex")
## End(Not run)
This function can be used to visualize the contribution of rows/columns from the results of Principal Component Analysis (PCA), Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA), Factor Analysis of Mixed Data (FAMD), and Multiple Factor Analysis (MFA) functions.
fviz_contrib(
  X,
  choice = c("row", "col", "var", "ind", "quanti.var", "quali.var", "group",
    "partial.axes"),
  axes = 1,
  fill = "steelblue",
  color = "steelblue",
  sort.val = c("desc", "asc", "none"),
  top = Inf,
  xtickslab.rt = 45,
  ggtheme = theme_minimal(),
  ...
)

fviz_pca_contrib(
  X,
  choice = c("var", "ind"),
  axes = 1,
  fill = "steelblue",
  color = "steelblue",
  sortcontrib = c("desc", "asc", "none"),
  top = Inf,
  ...
)
X |
an object of class PCA, CA, MCA, FAMD, MFA and HMFA [FactoMineR]; prcomp and princomp [stats]; dudi, pca, coa and acm [ade4]; ca [ca package]. |
choice |
allowed values are "row" and "col" for CA; "var" and "ind" for PCA or MCA; "var", "ind", "quanti.var", "quali.var" and "group" for FAMD, MFA and HMFA. |
axes |
a numeric vector specifying the dimension(s) of interest. |
fill |
a fill color for the bar plot. |
color |
an outline color for the bar plot. |
sort.val |
a string specifying whether the value should be sorted. Allowed values are "none" (no sorting), "asc" (for ascending) or "desc" (for descending). |
top |
a numeric value specifying the number of top elements to be shown. |
xtickslab.rt |
rotation angle for x axis tick labels. Default is 45 degrees. |
ggtheme |
function, ggplot2 theme name. Default value is theme_minimal(). Allowed values include ggplot2 official themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), .... |
... |
other arguments to be passed to the function ggpar. |
sortcontrib |
see the argument sort.val |
The function fviz_contrib() creates a barplot of row/column contributions.
A reference dashed line is also shown on the barplot. This reference line
corresponds to the expected value if the contributions were uniform.
For a given dimension, any row/column with a contribution above the reference line can be
considered important in contributing to that dimension.
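The reference line can be reproduced by hand for a single axis. The sketch below uses base R only, with prcomp() on the built-in USArrests data; it assumes the usual definition of a variable's contribution as its share of the squared loadings on that axis, and the object names are hypothetical.

```r
# PCA on standardized variables
res.pca  <- prcomp(USArrests, scale. = TRUE)
loadings <- res.pca$rotation

# Contribution (in percent) of each variable to the first axis.
# Each column of the rotation matrix has unit length, so the
# squared loadings already sum to 1.
contrib <- 100 * loadings[, 1]^2
sum(contrib)                          # 100, up to rounding

# Expected contribution if all variables contributed equally
reference <- 100 / length(contrib)    # 4 variables, so 25

# Variables above the reference line
important <- names(contrib)[contrib > reference]
important
```

With p variables the reference value is 100 / p percent, so the line moves down as more variables enter the analysis.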
a ggplot2 plot
fviz_pca_contrib(): deprecated function. Use fviz_contrib()
Alboukadel Kassambara [email protected]
https://www.sthda.com/english/
# Principal component analysis
# ++++++++++++++++++++++++++
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- prcomp(decathlon2.active, scale = TRUE)

# Variable contributions on axis 1
fviz_contrib(res.pca, choice = "var", axes = 1, top = 10)

# Change theme and color
fviz_contrib(res.pca, choice = "var", axes = 1,
             fill = "lightgray", color = "black") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))

# Variable contributions on axis 2
fviz_contrib(res.pca, choice = "var", axes = 2)
# Variable contributions on axes 1 + 2
fviz_contrib(res.pca, choice = "var", axes = 1:2)

# Contributions of individuals on axis 1
fviz_contrib(res.pca, choice = "ind", axes = 1)

## Not run:
# Correspondence Analysis
# ++++++++++++++++++++++++++
# Install and load FactoMineR to compute CA
# install.packages("FactoMineR")
library("FactoMineR")
data("housetasks")
res.ca <- CA(housetasks, graph = FALSE)

# Visualize row contributions on axis 1
fviz_contrib(res.ca, choice = "row", axes = 1)
# Visualize column contributions on axis 1
fviz_contrib(res.ca, choice = "col", axes = 1)

# Multiple Correspondence Analysis
# +++++++++++++++++++++++++++++++++
library(FactoMineR)
data(poison)
res.mca <- MCA(poison, quanti.sup = 1:2,
               quali.sup = 3:4, graph = FALSE)

# Visualize individual contributions on axis 1
fviz_contrib(res.mca, choice = "ind", axes = 1)
# Visualize variable category contributions on axis 1
fviz_contrib(res.mca, choice = "var", axes = 1)

# Multiple Factor Analysis
# ++++++++++++++++++++++++
library(FactoMineR)
data(poison)
res.mfa <- MFA(poison, group = c(2, 2, 5, 6), type = c("s", "n", "n", "n"),
               name.group = c("desc", "desc2", "symptom", "eat"),
               num.group.sup = 1:2, graph = FALSE)

# Visualize individual contributions on axis 1
fviz_contrib(res.mfa, choice = "ind", axes = 1, top = 20)
# Visualize categorical variable category contributions on axis 1
fviz_contrib(res.mfa, choice = "quali.var", axes = 1)
## End(Not run)
This function can be used to visualize the quality of representation (cos2) of rows/columns from the results of Principal Component Analysis (PCA), Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA), Factor Analysis of Mixed Data (FAMD), Multiple Factor Analysis (MFA) and Hierarchical Multiple Factor Analysis (HMFA) functions.
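For a PCA on standardized variables, the cos2 of a variable on an axis is the squared correlation between the variable and that principal component. The base R sketch below shows the computation for prcomp() on the built-in USArrests data; the object names are hypothetical, and summing cos2 over several axes is shown here as an assumption about how a multi-axis value would be formed.

```r
res.pca <- prcomp(USArrests, scale. = TRUE)

# Variable coordinates: loadings multiplied by the standard
# deviation of each component (equal to the variable/component
# correlations when the variables are standardized)
var_coord <- sweep(res.pca$rotation, 2, res.pca$sdev, `*`)

# Quality of representation on each axis
var_cos2 <- var_coord^2

# Over all components the cos2 of a variable sums to 1
rowSums(var_cos2)

# Quality of representation on the first two axes combined
cos2_12 <- rowSums(var_cos2[, 1:2])
cos2_12
```

A value of cos2_12 close to 1 means the variable is well represented on the plane formed by the first two axes.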
fviz_cos2(
  X,
  choice = c("row", "col", "var", "ind", "quanti.var", "quali.var", "group"),
  axes = 1,
  fill = "steelblue",
  color = "steelblue",
  sort.val = c("desc", "asc", "none"),
  top = Inf,
  xtickslab.rt = 45,
  ggtheme = theme_minimal(),
  ...
)
X |
an object of class PCA, CA, MCA, FAMD, MFA and HMFA [FactoMineR]; prcomp and princomp [stats]; dudi, pca, coa and acm [ade4]; ca [ca package]. |
choice |
allowed values are "row" and "col" for CA; "var" and "ind" for PCA or MCA; "var", "ind", "quanti.var", "quali.var" and "group" for FAMD, MFA and HMFA. |
axes |
a numeric vector specifying the dimension(s) of interest. |
fill |
a fill color for the bar plot. |
color |
an outline color for the bar plot. |
sort.val |
a string specifying whether the value should be sorted. Allowed values are "none" (no sorting), "asc" (for ascending) or "desc" (for descending). |
top |
a numeric value specifying the number of top elements to be shown. |
xtickslab.rt |
rotation angle for x axis tick labels. Default is 45 degrees. |
ggtheme |
function, ggplot2 theme name. Default value is theme_minimal(). Allowed values include ggplot2 official themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), .... |
... |
not used |
a ggplot
Alboukadel Kassambara [email protected]
https://www.sthda.com/english/
# Principal component analysis
# ++++++++++++++++++++++++++
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- prcomp(decathlon2.active, scale = TRUE)

# Variable cos2 on axis 1
fviz_cos2(res.pca, choice = "var", axes = 1, top = 10)

# Change color
fviz_cos2(res.pca, choice = "var", axes = 1,
          fill = "lightgray", color = "black")

# Variable cos2 on axes 1 + 2
fviz_cos2(res.pca, choice = "var", axes = 1:2)

# cos2 of individuals on axis 1
fviz_cos2(res.pca, choice = "ind", axes = 1)

## Not run:
# Correspondence Analysis
# ++++++++++++++++++++++++++
library("FactoMineR")
data("housetasks")
res.ca <- CA(housetasks, graph = FALSE)

# Visualize row cos2 on axis 1
fviz_cos2(res.ca, choice = "row", axes = 1)
# Visualize column cos2 on axis 1
fviz_cos2(res.ca, choice = "col", axes = 1)

# Multiple Correspondence Analysis
# +++++++++++++++++++++++++++++++++
library(FactoMineR)
data(poison)
res.mca <- MCA(poison, quanti.sup = 1:2,
               quali.sup = 3:4, graph = FALSE)

# Visualize individual cos2 on axis 1
fviz_cos2(res.mca, choice = "ind", axes = 1, top = 20)
# Visualize variable category cos2 on axis 1
fviz_cos2(res.mca, choice = "var", axes = 1)

# Multiple Factor Analysis
# ++++++++++++++++++++++++
library(FactoMineR)
data(poison)
res.mfa <- MFA(poison, group = c(2, 2, 5, 6), type = c("s", "n", "n", "n"),
               name.group = c("desc", "desc2", "symptom", "eat"),
               num.group.sup = 1:2, graph = FALSE)

# Visualize individual cos2 on axis 1
# Select the top 20
fviz_cos2(res.mfa, choice = "ind", axes = 1, top = 20)
# Visualize categorical variable category cos2 on axis 1
fviz_cos2(res.mfa, choice = "quali.var", axes = 1)
## End(Not run)
Easily draws beautiful dendrograms using either R base plot or ggplot2. Also provides options for drawing circular dendrograms and phylogenic trees.
fviz_dend(
  x,
  k = NULL,
  h = NULL,
  k_colors = NULL,
  palette = NULL,
  show_labels = TRUE,
  color_labels_by_k = TRUE,
  label_cols = NULL,
  labels_track_height = NULL,
  repel = FALSE,
  lwd = 0.7,
  type = c("rectangle", "circular", "phylogenic"),
  phylo_layout = "layout.auto",
  rect = FALSE,
  rect_border = "gray",
  rect_lty = 2,
  rect_fill = FALSE,
  lower_rect,
  horiz = FALSE,
  cex = 0.8,
  main = "Cluster Dendrogram",
  xlab = "",
  ylab = "Height",
  sub = NULL,
  ggtheme = theme_classic(),
  ...
)
x |
an object of class dendrogram, hclust, agnes, diana, hcut, hkmeans or HCPC (FactoMineR). |
k |
the number of groups for cutting the tree. |
h |
a numeric value. The dendrogram is cut at height h (k overrides h). |
k_colors, palette
|
a vector containing colors to be used for the groups. It should contain k colors. Other allowed values are "grey" for grey color palettes; brewer palettes, e.g. "RdBu", "Blues", ...; and scientific journal palettes from the ggsci R package, e.g.: "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty". |
show_labels |
a logical value. If TRUE, leaf labels are shown. Default value is TRUE. |
color_labels_by_k |
logical value. If TRUE, labels are colored automatically by group when k != NULL. |
label_cols |
a vector containing the colors for labels. |
labels_track_height |
a positive numeric value for adjusting the room for the labels. Used only when type = "rectangle". |
repel |
logical value. Use repel = TRUE to avoid label overplotting when type = "phylogenic". |
lwd |
a numeric value specifying dendrogram branch and rectangle line width. |
type |
type of plot. Allowed values are one of "rectangle", "circular", "phylogenic". |
phylo_layout |
the layout to be used for phylogenic trees. Default value
is "layout.auto", which is kept as a compatibility alias for
|
rect |
logical value specifying whether to add a rectangle around groups. Used only when k != NULL. |
rect_border, rect_lty
|
border color and line type for rectangles. |
rect_fill |
a logical value. If TRUE, fill the rectangle. |
lower_rect |
a value specifying how low the lower edge of the rectangle around clusters should extend. Ignored when rect = FALSE. |
horiz |
a logical value. If TRUE, a horizontal dendrogram is drawn. |
cex |
size of labels |
main, xlab, ylab
|
main and axis titles |
sub |
Plot subtitle. If NULL, the method used for hierarchical clustering is shown. To remove the subtitle use sub = "". |
ggtheme |
function, ggplot2 theme name. Default value is theme_classic(). Allowed values include ggplot2 official themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), .... |
... |
other arguments to be passed to the function plot.dendrogram() |
an object of class fviz_dend, which is a ggplot with the attribute "dendrogram", accessible using attr(x, "dendrogram"), where x is the result of fviz_dend().
# Load and scale the data
data(USArrests)
df <- scale(USArrests)

# Hierarchical clustering
res.hc <- hclust(dist(df))

# Default plot
fviz_dend(res.hc)

# Increase branch and rectangle line widths
fviz_dend(res.hc, lwd = 2)

# Cut the tree
fviz_dend(res.hc, cex = 0.5, k = 4, color_labels_by_k = TRUE)

# Don't color labels, add rectangles
fviz_dend(res.hc, cex = 0.5, k = 4,
  color_labels_by_k = FALSE, rect = TRUE)

# Change the color of tree using black color for all groups
# Change rectangle border colors
fviz_dend(res.hc, rect = TRUE, k_colors = "black",
  rect_border = 2:5, rect_lty = 1)

# Customized color for groups
fviz_dend(res.hc, k = 4,
  k_colors = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A"))

# Color labels using k-means clusters
km.clust <- kmeans(df, 4)$cluster
fviz_dend(res.hc, k = 4,
  k_colors = c("blue", "green3", "red", "black"),
  label_cols = km.clust[res.hc$order], cex = 0.6)

# Phylogenic tree layouts support both compatibility aliases and
# current igraph layout names
if (requireNamespace("igraph", quietly = TRUE)) {
  fviz_dend(res.hc, type = "phylogenic",
    phylo_layout = "layout_nicely", show_labels = FALSE)
}
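The k and h arguments describe two ways of cutting the same tree. As a minimal sketch of that relationship, the code below uses only base R (`cutree()` from the stats package, not the internals of fviz_dend()): it picks a height between the merge that leaves four clusters and the merge that leaves three, and compares the resulting partition with the one obtained from k = 4. The names `n_merges`, `h_cut` and `same_partition` are illustrative, and the final fviz_dend() call is guarded because it assumes factoextra is installed.

```r
# Same data and tree as in the examples above
df <- scale(USArrests)
res.hc <- hclust(dist(df))

# Cut by number of groups
groups_k <- cutree(res.hc, k = 4)

# Cut by height: a height between the merge that leaves 4 clusters
# and the merge that leaves 3 clusters
n_merges <- length(res.hc$height)
h_cut <- mean(res.hc$height[c(n_merges - 3, n_merges - 2)])
groups_h <- cutree(res.hc, h = h_cut)

# Compare the two partitions
same_partition <- identical(groups_k, groups_h)
print(table(groups_k))
print(same_partition)

# Assumes factoextra is installed; the dendrogram is stored as an attribute
if (requireNamespace("factoextra", quietly = TRUE)) {
  p <- factoextra::fviz_dend(res.hc, k = 4, cex = 0.5)
  dend <- attr(p, "dendrogram")
  print(class(dend))
}
```

Choosing k is usually more convenient than choosing h, since a suitable height depends on the distance scale of the data.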
Draw confidence ellipses around the categories
fviz_ellipses(
  X,
  habillage,
  axes = c(1, 2),
  addEllipses = TRUE,
  ellipse.type = "confidence",
  palette = NULL,
  pointsize = 1,
  geom = c("point", "text"),
  ggtheme = theme_bw(),
  ...
)
X |
an object of class MCA, PCA or MFA. |
habillage |
a numeric vector of indexes of variables or a character vector of names of variables. Can also be a data frame containing grouping variables. |
axes |
a numeric vector specifying the axes of interest. Values must be positive integer indices within the available dimensions. Default values are 1:2 for axes 1 and 2. |
addEllipses |
logical value. If TRUE, draws ellipses around the individuals when habillage != "none". |
ellipse.type |
Character specifying frame type. Possible values are
|
palette |
the color palette to be used for coloring or filling by groups. Allowed values include "grey" for grey color palettes; brewer palettes e.g. "RdBu", "Blues", ...; or custom color palette e.g. c("blue", "red"); and scientific journal palettes from ggsci R package, e.g.: "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty". Can also be a numeric vector of length(groups); in this case a basic color palette is created using the function palette. |
pointsize |
the size of points |
geom |
a text specifying the geometry to be used for the graph. Allowed values are the combination of c("point", "text"). Use "point" to show only points, "text" to show only labels, or c("point", "text") to show both. |
ggtheme |
function, ggplot2 theme name. Default value is theme_bw(). Allowed values include ggplot2 official themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), .... |
... |
Arguments to be passed to the functions ggpubr::ggscatter() & ggpubr::ggpar(). |
a ggplot
Alboukadel Kassambara
# Multiple Correspondence Analysis
# +++++++++++++++++++++++++++++++++
library(FactoMineR)
data(poison)
res.mca <- MCA(poison, quanti.sup = 1:2,
  quali.sup = 3:4, graph = FALSE)

fviz_ellipses(res.mca, 3:4, geom = "point",
  palette = "jco")
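The habillage argument is described as accepting variable names as well as indexes. The sketch below, which assumes FactoMineR and factoextra are installed, repeats the example above but looks up the names of columns 3 and 4 of `poison` and passes those instead; `grp_names` is an illustrative name, and equivalence with the index form is an expectation based on the argument description rather than something verified here.

```r
if (requireNamespace("FactoMineR", quietly = TRUE) &&
    requireNamespace("factoextra", quietly = TRUE)) {
  library(FactoMineR)
  library(factoextra)

  data(poison)
  res.mca <- MCA(poison, quanti.sup = 1:2, quali.sup = 3:4, graph = FALSE)

  # Look up the names of the grouping columns instead of hard-coding them
  grp_names <- names(poison)[3:4]

  # habillage given as a character vector of variable names
  p <- fviz_ellipses(res.mca, habillage = grp_names,
                     geom = "point", palette = "jco")
  print(p)
}
```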
Factor analysis of mixed data (FAMD) is a particular case of
MFA, used to analyze a data set containing both quantitative and
qualitative variables. fviz_famd() provides elegant ggplot2-based
visualization of FAMD outputs from the R function FAMD [FactoMineR].
fviz_famd_ind(): Graph of individuals
fviz_famd_var(): Graph of variables
fviz_famd(): An alias of fviz_famd_ind()
fviz_famd_ind(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  habillage = "none",
  palette = NULL,
  addEllipses = FALSE,
  col.ind = "blue",
  col.ind.sup = "darkblue",
  alpha.ind = 1,
  shape.ind = 19,
  col.quali.var = "black",
  select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
  gradient.cols = NULL,
  ...
)

fviz_famd_var(
  X,
  choice = c("var", "quanti.var", "quali.var", "quali.sup"),
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  col.var = "red",
  alpha.var = 1,
  shape.var = 17,
  col.var.sup = "darkgreen",
  select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
  ...
)

fviz_famd(X, ...)
X |
an object of class FAMD [FactoMineR]. |
axes |
a numeric vector of length 2 specifying the dimensions to be plotted. |
geom |
a text specifying the geometry to be used for the graph. Allowed
values are the combination of |
repel |
a boolean, whether to use ggrepel to avoid overplotting text
labels or not. The old |
habillage |
an optional factor variable for coloring the observations by groups. Default value is "none". If X is a FAMD object from the FactoMineR package, habillage can also specify the index of the factor variable in the data. |
palette |
the color palette to be used for coloring or filling by groups. Allowed values include "grey" for grey color palettes; brewer palettes e.g. "RdBu", "Blues", ...; or custom color palette e.g. c("blue", "red"); and scientific journal palettes from ggsci R package, e.g.: "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty". Can also be a numeric vector of length(groups); in this case a basic color palette is created using the function palette. |
addEllipses |
logical value. If TRUE, draws ellipses around the individuals when habillage != "none". |
col.ind, col.var
|
color for individuals and variables, respectively. Can be a continuous variable or a factor variable. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the colors for individuals/variables are automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use automatic coloring (by cos2, contrib, ...), make sure that habillage = "none". |
col.ind.sup |
color for supplementary individuals |
alpha.ind, alpha.var
|
controls the transparency of individuals and variables, respectively. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the transparency of individual/variable colors is automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use this, make sure that habillage = "none". |
shape.ind, shape.var
|
point shapes of individuals and variables, respectively. |
col.quali.var |
color for qualitative variables in fviz_famd_ind(). Default is "black". |
select.ind, select.var
|
a selection of individuals and variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:
|
gradient.cols |
vector of colors to use for n-colour gradient. Allowed values include brewer and ggsci color palettes. |
... |
Arguments to be passed to the function fviz() |
choice |
The graph to plot in fviz_famd_var(). Allowed values include one of c("var", "quanti.var", "quali.var", "quali.sup"). |
col.var.sup |
color for supplementary variables. |
a ggplot
Alboukadel Kassambara
# Compute FAMD
library("FactoMineR")
data(wine)
res.famd <- FAMD(wine[, c(1, 2, 16, 22, 29, 28, 30, 31)], graph = FALSE)
res.famd.sup <- FAMD(wine[, c(1, 2, 16, 22, 29, 28, 30, 31)],
  sup.var = 2, graph = FALSE)

# Eigenvalues/variances of dimensions
fviz_screeplot(res.famd)

# Graph of variables
fviz_famd_var(res.famd)

# Quantitative variables
fviz_famd_var(res.famd, "quanti.var", repel = TRUE, col.var = "black")

# Qualitative variables
fviz_famd_var(res.famd, "quali.var", col.var = "black")

# Supplementary qualitative variable categories
fviz_famd_var(res.famd.sup, "quali.sup", col.var = "darkgreen")

# Graph of individuals colored by cos2
fviz_famd_ind(res.famd, col.ind = "cos2",
  gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
  repel = TRUE)
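The "coord" option for col.ind and alpha.ind is described as x^2 + y^2 on the plotted axes. As a rough sketch of what that quantity looks like, the code below computes it by hand from the individual coordinates stored in the FAMD result (`res.famd$ind$coord`). It assumes FactoMineR is installed; `xy` and `coord_score` are illustrative names, and whether factoextra computes the value in exactly this way internally is an assumption.

```r
if (requireNamespace("FactoMineR", quietly = TRUE)) {
  library(FactoMineR)

  data(wine)
  res.famd <- FAMD(wine[, c(1, 2, 16, 22, 29, 28, 30, 31)], graph = FALSE)

  # Coordinates of the individuals on the two plotted axes
  xy <- res.famd$ind$coord[, 1:2]

  # Squared distance to the origin on that plane: x^2 + y^2
  coord_score <- rowSums(xy^2)
  print(head(sort(coord_score, decreasing = TRUE)))

  # Assumes factoextra is installed: color individuals by the same quantity
  if (requireNamespace("factoextra", quietly = TRUE)) {
    print(factoextra::fviz_famd_ind(res.famd, col.ind = "coord", repel = TRUE))
  }
}
```

Individuals with a large value lie far from the origin of the plane and therefore stand out most on the plot.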
Hierarchical Multiple Factor Analysis (HMFA) is an extension of
MFA, used when the data are organized into a hierarchical
structure. fviz_hmfa() provides elegant ggplot2-based visualization of HMFA
outputs from the R function HMFA [FactoMineR].
fviz_hmfa_ind(): Graph of individuals
fviz_hmfa_var(): Graph of variables
fviz_hmfa_quali_biplot(): Biplot of individuals and qualitative variables
fviz_hmfa(): An alias of fviz_hmfa_ind()
fviz_hmfa_ind(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  habillage = "none",
  addEllipses = FALSE,
  shape.ind = 19,
  col.ind = "blue",
  col.ind.sup = "darkblue",
  alpha.ind = 1,
  select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
  partial = NULL,
  col.partial = "group",
  group.names = NULL,
  node.level = 1,
  ...
)

fviz_hmfa_var(
  X,
  choice = c("quanti.var", "quali.var", "group"),
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  col.var = "red",
  alpha.var = 1,
  shape.var = 17,
  col.var.sup = "darkgreen",
  select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
  ...
)

fviz_hmfa_quali_biplot(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  habillage = "none",
  title = "Biplot of individuals and qualitative variables - HMFA",
  ...
)

fviz_hmfa(X, ...)
X |
an object of class HMFA [FactoMineR]. |
axes |
a numeric vector of length 2 specifying the dimensions to be plotted. |
geom |
a text specifying the geometry to be used for the graph. Allowed
values are the combination of |
repel |
a boolean, whether to use ggrepel to avoid overplotting text
labels or not. The old |
habillage |
an optional factor variable for coloring the observations by groups. Default value is "none". If X is an HMFA object from FactoMineR package, habillage can also specify the index of the factor variable in the data. |
addEllipses |
logical value. If TRUE, draws ellipses around the individuals when habillage != "none". |
shape.ind, shape.var
|
point shapes of individuals and variables, respectively. |
col.ind, col.var
|
color for individuals, partial individuals and variables, respectively. Can be a continuous variable or a factor variable. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the colors for individuals/variables are automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use automatic coloring (by cos2, contrib, ...), make sure that habillage = "none". |
col.ind.sup |
color for supplementary individuals |
alpha.ind, alpha.var
|
controls the transparency of individuals, partial individuals and variables, respectively. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the transparency of individual/variable colors is automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use this, make sure that habillage = "none". |
select.ind, select.var
|
a selection of individuals and variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:
|
partial |
list of the individuals for which the partial points should be drawn. (by default, partial = NULL and no partial points are drawn). Use partial = "all" to visualize partial points for all individuals. |
col.partial |
color for partial individuals. By default, points are colored according to the groups. |
group.names |
a vector containing the names of the groups (by default, NULL, and the groups are named group.1, group.2 and so on). |
node.level |
a single number indicating the HMFA node level to plot. |
... |
Arguments to be passed to the function fviz() and ggpubr::ggpar() |
choice |
the graph to plot. Allowed values include one of c("quanti.var", "quali.var", "group") for plotting quantitative variables, qualitative variables and group of variables, respectively. |
col.var.sup |
color for supplementary variables. |
title |
the title of the graph |
a ggplot
Fabian Mundt
Alboukadel Kassambara
https://www.sthda.com/english/
# Hierarchical Multiple Factor Analysis
# ++++++++++++++++++++++++
# Install and load FactoMineR to compute HMFA
# install.packages("FactoMineR")
library("FactoMineR")
data(wine)
hierar <- list(c(2,5,3,10,9,2), c(4,2))
res.hmfa <- HMFA(wine, H = hierar, type = c("n", rep("s", 5)), graph = FALSE)

# Graph of individuals
# ++++++++++++++++++++
# Color of individuals: col.ind = "#2E9FDF"
# Use repel = TRUE to avoid overplotting (slow if many points)
fviz_hmfa_ind(res.hmfa, repel = TRUE, col.ind = "#2E9FDF")

# Color individuals by groups, add concentration ellipses
# Remove labels: label = "none".
# Change color palette to "jco". See ?ggpubr::ggpar
grp <- as.factor(wine[,1])
p <- fviz_hmfa_ind(res.hmfa, label = "none", habillage = grp,
  addEllipses = TRUE, palette = "jco")
print(p)

# Graph of variables
# ++++++++++++++++++++++++++++++++++++++++
# Quantitative variables
fviz_hmfa_var(res.hmfa, "quanti.var")
# Graph of categorical variable categories
fviz_hmfa_var(res.hmfa, "quali.var")
# Groups of variables (correlation square)
fviz_hmfa_var(res.hmfa, "group")

# Biplot of categorical variable categories and individuals
# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fviz_hmfa_quali_biplot(res.hmfa)

# Graph of partial individuals (starplot)
# +++++++++++++++++++++++++++++++++++++++
fviz_hmfa_ind(res.hmfa, partial = "all", col.partial = "black")
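The `hierar` list in the example above encodes the hierarchy that HMFA works on. As a small base R sketch of how it can be read (following my understanding of the `H` and `type` arguments of FactoMineR's HMFA(), which should be checked against that package's documentation), the first vector gives the sizes of consecutive column groups, the second vector says how many of those groups form each higher-level group, and `type` has one entry per first-level group. The names `n_columns`, `n_groups_level1`, `group_types` and `group_ranges` are illustrative.

```r
# Hierarchy used in the example above
hierar <- list(c(2, 5, 3, 10, 9, 2), c(4, 2))

# Level 1: six groups of consecutive columns; their sizes add up to the
# number of columns passed to HMFA()
n_columns <- sum(hierar[[1]])
n_groups_level1 <- length(hierar[[1]])

# Level 2: two higher-level groups made of 4 and 2 of the level-1 groups
stopifnot(sum(hierar[[2]]) == n_groups_level1)

# One type per level-1 group: "n" for the categorical group,
# "s" for scaled quantitative groups
group_types <- c("n", rep("s", 5))
stopifnot(length(group_types) == n_groups_level1)

# Column ranges covered by each level-1 group
ends <- cumsum(hierar[[1]])
starts <- ends - hierar[[1]] + 1
group_ranges <- data.frame(group = seq_along(ends),
                           first = starts,
                           last = ends,
                           type = group_types)
print(group_ranges)
```

A check of this kind before calling HMFA() can catch a hierarchy whose group sizes do not match the number of columns in the data.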
Multiple Correspondence Analysis (MCA) is an extension of simple CA to analyse a data table containing more than two categorical variables. fviz_mca() provides ggplot2-based elegant visualization of MCA outputs from the R functions: MCA [in FactoMineR], acm [in ade4], and expOutput/epMCA [in ExPosition]. Read more: Multiple Correspondence Analysis Essentials.
fviz_mca_ind(): Graph of individuals
fviz_mca_var(): Graph of variables
fviz_mca_biplot(): Biplot of individuals and variables
fviz_mca(): An alias of fviz_mca_biplot()
fviz_mca_ind(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  geom.ind = geom,
  repel = FALSE,
  habillage = "none",
  palette = NULL,
  addEllipses = FALSE,
  col.ind = "blue",
  col.ind.sup = "darkblue",
  alpha.ind = 1,
  shape.ind = 19,
  map = "symmetric",
  select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
  ...
)

fviz_mca_var(
  X,
  choice = c("var.cat", "mca.cor", "var", "quanti.sup"),
  axes = c(1, 2),
  geom = c("point", "text"),
  geom.var = geom,
  repel = FALSE,
  col.var = "red",
  alpha.var = 1,
  shape.var = 17,
  col.quanti.sup = "blue",
  col.quali.sup = "darkgreen",
  map = "symmetric",
  select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
  ...
)

fviz_mca_biplot(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  geom.ind = geom,
  geom.var = geom,
  repel = FALSE,
  label = "all",
  invisible = "none",
  habillage = "none",
  addEllipses = FALSE,
  palette = NULL,
  arrows = c(FALSE, FALSE),
  map = "symmetric",
  title = "MCA - Biplot",
  ...
)

fviz_mca(X, ...)
X |
an object of class MCA [FactoMineR], acm [ade4] and expOutput/epMCA [ExPosition]. |
axes |
a numeric vector of length 2 specifying the dimensions to be plotted. |
geom |
a text specifying the geometry to be used for the graph. Allowed
values are the combination of |
geom.ind, geom.var
|
as |
repel |
a boolean, whether to use ggrepel to avoid overplotting text
labels or not. The old |
habillage |
an optional factor variable for coloring the observations by groups. Default value is "none". If X is an MCA object from FactoMineR package, habillage can also specify the index of the factor variable in the data. |
palette |
the color palette to be used for coloring or filling by groups. Allowed values include "grey" for grey color palettes; brewer palettes e.g. "RdBu", "Blues", ...; or custom color palette e.g. c("blue", "red"); and scientific journal palettes from ggsci R package, e.g.: "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty". Can also be a numeric vector of length(groups); in this case a basic color palette is created using the function palette. |
addEllipses |
logical value. If TRUE, draws ellipses around the individuals when habillage != "none". |
col.ind, col.var
|
color for individuals and variables, respectively. Can be a continuous variable or a factor variable. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the colors for individuals/variables are automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use automatic coloring (by cos2, contrib, ...), make sure that habillage = "none". |
col.ind.sup |
color for supplementary individuals |
alpha.ind, alpha.var
|
controls the transparency of individual and variable colors, respectively. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the transparency of individual/variable colors is automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use this, make sure that habillage = "none". |
shape.ind, shape.var
|
point shapes of individuals and variables. |
map |
character string specifying the map type. Allowed options include: "symmetric", "rowprincipal", "colprincipal", "symbiplot", "rowgab", "colgab", "rowgreen" and "colgreen". See Details. |
select.ind, select.var
|
a selection of individuals/variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:
|
... |
Additional arguments.
|
choice |
the graph to plot. Allowed values include: i) "var" and "mca.cor" for plotting the correlation between variables and principal dimensions; ii) "var.cat" for variable categories and iii) "quanti.sup" for the supplementary quantitative variables. |
col.quanti.sup, col.quali.sup
|
a color for the quantitative/qualitative supplementary variables. |
label |
a text specifying the elements to be labelled. Default value is "all". Allowed values are "none" or the combination of c("ind", "ind.sup","var", "quali.sup", "quanti.sup"). "ind" can be used to label only active individuals. "ind.sup" is for supplementary individuals. "var" is for active variable categories. "quali.sup" is for supplementary qualitative variable categories. "quanti.sup" is for quantitative supplementary variables. |
invisible |
a text specifying the elements to be hidden on the plot. Default value is "none". Allowed values are the combination of c("ind", "ind.sup","var", "quali.sup", "quanti.sup"). |
arrows |
Vector of two logicals specifying if the plot should contain points (FALSE, default) or arrows (TRUE). First value sets the rows and the second value sets the columns. |
title |
the title of the graph |
The default plot of MCA is a "symmetric" plot in which both rows and columns are in principal coordinates. In this situation, it's not possible to interpret the distance between row points and column points. To overcome this problem, the simplest way is to make an asymmetric plot. The argument "map" can be used to change the plot type. For more explanation, read the details section of fviz_ca documentation.
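As a hedged illustration of the paragraph above, the sketch below draws the default symmetric biplot and then an asymmetric one by setting map = "rowprincipal", with variable categories shown as arrows through the arrows argument. It assumes FactoMineR and factoextra are installed and reuses the `poison` subset from the examples; `p_sym` and `p_asym` are illustrative names, and which asymmetric option suits a given analysis is left to the fviz_ca documentation mentioned above.

```r
if (requireNamespace("FactoMineR", quietly = TRUE) &&
    requireNamespace("factoextra", quietly = TRUE)) {
  library(FactoMineR)
  library(factoextra)

  data(poison)
  res.mca <- MCA(poison[1:55, 5:15], graph = FALSE)

  # Default symmetric map
  p_sym <- fviz_mca_biplot(res.mca, map = "symmetric")

  # Asymmetric map; the second element of `arrows` applies to the columns
  p_asym <- fviz_mca_biplot(res.mca, map = "rowprincipal",
                            arrows = c(FALSE, TRUE))

  print(p_sym)
  print(p_asym)
}
```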
a ggplot
Alboukadel Kassambara
get_mca, fviz_pca, fviz_ca,
fviz_mfa, fviz_hmfa
# Multiple Correspondence Analysis
# ++++++++++++++++++++++++++++++
# Install and load FactoMineR to compute MCA
# install.packages("FactoMineR")
library("FactoMineR")
data(poison)
poison.active <- poison[1:55, 5:15]
head(poison.active)
res.mca <- MCA(poison.active, graph = FALSE)

# Graph of individuals
# +++++++++++++++++++++

# Default Plot
# Color of individuals: col.ind = "steelblue"
fviz_mca_ind(res.mca, col.ind = "steelblue")

# 1. Control automatically the color of individuals
#    using the "cos2" or the contributions "contrib"
#    cos2 = the quality of the individuals on the factor map
# 2. To keep only point or text use geom = "point" or geom = "text".
# 3. Change themes: http://www.sthda.com/english/wiki/ggplot2-themes
fviz_mca_ind(res.mca, col.ind = "cos2", repel = TRUE)

## Not run:
# You can also control the transparency
# of the color by the cos2
fviz_mca_ind(res.mca, alpha.ind = "cos2")
## End(Not run)

# Color individuals by groups, add concentration ellipses
# Remove labels: label = "none".
grp <- as.factor(poison.active[, "Vomiting"])
p <- fviz_mca_ind(res.mca, label = "none", habillage = grp,
  addEllipses = TRUE, ellipse.level = 0.95)
print(p)

# Change group colors using RColorBrewer color palettes
# Read more: http://www.sthda.com/english/wiki/ggplot2-colors
p + scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2")

# Change group colors manually
# Read more: http://www.sthda.com/english/wiki/ggplot2-colors
p + scale_color_manual(values = c("#999999", "#E69F00")) +
  scale_fill_manual(values = c("#999999", "#E69F00"))

# Select and visualize some individuals (ind) with select.ind argument.
# - ind with cos2 >= 0.4: select.ind = list(cos2 = 0.4)
# - Top 20 ind according to the cos2: select.ind = list(cos2 = 20)
# - Top 20 contributing individuals: select.ind = list(contrib = 20)
# - Select ind by names: select.ind = list(name = c("44", "38", "53", "39") )

# Example: Select the top 20 according to the cos2
fviz_mca_ind(res.mca, select.ind = list(cos2 = 20))

# Graph of variable categories
# ++++++++++++++++++++++++++++
# Default plot: use repel = TRUE to avoid overplotting
fviz_mca_var(res.mca, col.var = "#FC4E07")

# Control variable colors using their contributions
# use repel = TRUE to avoid overplotting
fviz_mca_var(res.mca, col.var = "contrib",
  gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"))

# Biplot
# ++++++++++++++++++++++++++
grp <- as.factor(poison.active[, "Vomiting"])
fviz_mca_biplot(res.mca, repel = TRUE, col.var = "#E7B800",
  habillage = grp, addEllipses = TRUE, ellipse.level = 0.95)

## Not run:
# Keep only the labels for variable categories:
fviz_mca_biplot(res.mca, label = "var")

# Keep only labels for individuals
fviz_mca_biplot(res.mca, label = "ind")

# Hide variable categories
fviz_mca_biplot(res.mca, invisible = "var")

# Hide individuals
fviz_mca_biplot(res.mca, invisible = "ind")

# Control automatically the color of individuals using the cos2
fviz_mca_biplot(res.mca, label = "var", col.ind = "cos2")

# Change the color by groups, add ellipses
fviz_mca_biplot(res.mca, label = "var", col.var = "blue",
  habillage = grp, addEllipses = TRUE, ellipse.level = 0.95)

# Select the top 30 contributing individuals
# And the top 10 variables
fviz_mca_biplot(res.mca,
  select.ind = list(contrib = 30),
  select.var = list(contrib = 10))
## End(Not run)
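The comments in the examples describe two readings of a select.ind or select.var value: a number up to 1 acts as a threshold, while a larger number asks for the top n elements. The base R sketch below mimics that rule on a made-up named vector. `select_by_score()` is a hypothetical helper written for this illustration; it is not part of factoextra, and the exact boundary handling inside the package (for example `>` versus `>=`) is an assumption.

```r
# Hypothetical helper, NOT a factoextra function: mimics the selection rule
# described in the example comments above
select_by_score <- function(score, value) {
  if (value <= 1) {
    # Threshold: keep elements whose score reaches `value`
    names(score)[score >= value]
  } else {
    # Top n: keep the `value` elements with the highest score
    n_keep <- min(value, length(score))
    names(sort(score, decreasing = TRUE))[seq_len(n_keep)]
  }
}

# Made-up cos2 values for five individuals
cos2 <- c(a = 0.91, b = 0.12, c = 0.45, d = 0.38, e = 0.77)

print(select_by_score(cos2, 0.4))  # threshold reading
print(select_by_score(cos2, 2))    # top-2 reading
```

Note that the threshold reading keeps the original order of the elements, whereas the top-n reading returns them sorted by decreasing score.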
Plots the classification, the uncertainty and the BIC values returned by the Mclust() function.
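As a rough illustration of what the "classification" and "uncertainty" views display, the sketch below derives both quantities from a matrix of posterior membership probabilities in base R. The matrix z is hypothetical data invented for this example; Mclust() stores the corresponding matrix in the z component of its result.

```r
# Hypothetical posterior membership probabilities for 4 observations and
# 3 mixture components (each row sums to 1).
z <- matrix(c(0.98, 0.01, 0.01,
              0.50, 0.45, 0.05,
              0.10, 0.85, 0.05,
              0.34, 0.33, 0.33),
            ncol = 3, byrow = TRUE)

# Classification: the component with the highest posterior probability.
classification <- max.col(z, ties.method = "first")

# Uncertainty: one minus that highest probability.
uncertainty <- 1 - apply(z, 1, max)

data.frame(classification, uncertainty = round(uncertainty, 2))
```

Observations whose largest membership probability is close to 1 get an uncertainty close to 0, while observations that sit between components get larger values.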
fviz_mclust(
  object,
  what = c("classification", "uncertainty", "BIC"),
  ellipse.type = "norm",
  ellipse.level = 0.4,
  ggtheme = theme_classic(),
  ...
)

fviz_mclust_bic(
  object,
  model.names = NULL,
  shape = 19,
  color = "model",
  palette = NULL,
  legend = NULL,
  main = "Model selection",
  xlab = "Number of components",
  ylab = "BIC",
  ...
)
object |
an object of class Mclust |
what |
choose from one of the following three options: "classification" (default), "uncertainty" and "BIC". |
ellipse.type |
Character specifying frame type. Possible values are
'convex', 'confidence' or types supported by
|
ellipse.level |
the size of the concentration ellipse in normal
probability. Passed for |
ggtheme |
function, ggplot2 theme name. Default value is theme_classic(). Allowed values include ggplot2 official themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), .... |
... |
other arguments to be passed to the functions fviz_cluster and ggpar. |
model.names |
one or more model names corresponding to models fit in object. The default is to plot the BIC for all of the models fit. |
shape |
point shape. To change point shape by model names use shape = "model". |
color |
point and line color. |
palette |
the color palette to be used for coloring or filling by groups. Allowed values include "grey" for grey color palettes; brewer palettes e.g. "RdBu", "Blues", ...; or custom color palette e.g. c("blue", "red"); and scientific journal palettes from ggsci R package, e.g.: "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty". Can be also a numeric vector of length(groups); in this case a basic color palette is created using the function palette. |
legend |
character specifying legend position. Allowed values are one of c("top", "bottom", "left", "right", "none"). To remove the legend use legend = "none". Legend position can be also specified using a numeric vector c(x, y); see details section. |
main |
plot main title. |
xlab |
character vector specifying x axis labels. Use xlab = FALSE to hide xlab. |
ylab |
character vector specifying y axis labels. Use ylab = FALSE to hide ylab. |
A ggplot2 object.
fviz_mclust(): Plots classification and uncertainty.
fviz_mclust_bic(): Plots the BIC values.
if(requireNamespace("mclust", quietly = TRUE)){
  # Compute model-based-clustering
  library("mclust")
  data("diabetes")
  mc <- Mclust(diabetes[, -1])

  # Visualize BIC values
  fviz_mclust_bic(mc)

  # Visualize classification
  fviz_mclust(mc, "classification", geom = "point")
}
Multiple factor analysis (MFA) is used to analyze a data set in
which individuals are described by several sets of variables (quantitative
and/or qualitative) structured into groups. fviz_mfa() provides
ggplot2-based elegant visualization of MFA outputs from the R function: MFA
[FactoMineR].
fviz_mfa_ind(): Graph of individuals
fviz_mfa_var(): Graph of variables
fviz_mfa_axes(): Graph of partial axes
fviz_mfa(): An alias of fviz_mfa_ind(res.mfa, partial = "all")
fviz_mfa_quali_biplot(): Biplot of individuals and qualitative variables
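To make the idea of balancing groups of variables more concrete, here is a minimal base R sketch of the usual MFA weighting for quantitative groups: each group is standardized and divided by the square root of the first eigenvalue of its own PCA, and a global PCA is then run on the concatenated result. The split of iris into a "sepal" and a "petal" group is a hypothetical choice for illustration, and this is a simplification under stated assumptions, not the code used by MFA() in FactoMineR.

```r
# Two hypothetical groups of quantitative variables taken from iris.
groups <- list(sepal = iris[, 1:2], petal = iris[, 3:4])

# First eigenvalue of a standardized PCA on one group.
first_eig <- function(g) prcomp(g, scale. = TRUE)$sdev[1]^2

# Standardize each group, then divide by the square root of its first
# eigenvalue so that no single group dominates the first global axis.
weighted <- lapply(groups, function(g) scale(g) / sqrt(first_eig(g)))

# Global PCA on the concatenated, weighted groups.
global <- prcomp(do.call(cbind, weighted), center = FALSE, scale. = FALSE)

# After weighting, the first eigenvalue of every group equals 1.
group_eig <- sapply(weighted, function(g) prcomp(g, center = FALSE)$sdev[1]^2)
group_eig
```

The coordinates in global$x play the role of the individuals shown by fviz_mfa_ind(); the partial points would be obtained by projecting one group at a time, which this sketch does not attempt.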
fviz_mfa_ind(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  habillage = "none",
  palette = NULL,
  addEllipses = FALSE,
  col.ind = "blue",
  col.ind.sup = "darkblue",
  alpha.ind = 1,
  shape.ind = 19,
  col.quali.var.sup = "black",
  select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
  partial = NULL,
  col.partial = "group",
  ...
)

fviz_mfa_quali_biplot(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  title = "Biplot of individuals and qualitative variables - MFA",
  ...
)

fviz_mfa_var(
  X,
  choice = c("quanti.var", "group", "quali.var", "quali.sup"),
  axes = c(1, 2),
  geom = c("point", "text"),
  repel = FALSE,
  habillage = "none",
  col.var = "red",
  alpha.var = 1,
  shape.var = 17,
  col.var.sup = "darkgreen",
  palette = NULL,
  select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
  ...
)

fviz_mfa_axes(
  X,
  axes = c(1, 2),
  geom = c("arrow", "text"),
  col.axes = NULL,
  alpha.axes = 1,
  col.circle = "grey70",
  select.axes = list(name = NULL, contrib = NULL),
  repel = FALSE,
  ...
)

fviz_mfa(X, partial = "all", ...)
X |
an object of class MFA [FactoMineR]. |
axes |
a numeric vector of length 2 specifying the dimensions to be plotted. |
geom |
a text specifying the geometry to be used for the graph. Allowed
values are the combination of |
repel |
a boolean, whether to use ggrepel to avoid overplotting text
labels or not. The old |
habillage |
an optional factor variable for coloring the observations by groups. Default value is "none". If X is an MFA object from FactoMineR package, habillage can also specify the index of the factor variable in the data. |
palette |
the color palette to be used for coloring or filling by groups. Allowed values include "grey" for grey color palettes; brewer palettes e.g. "RdBu", "Blues", ...; or custom color palette e.g. c("blue", "red"); and scientific journal palettes from ggsci R package, e.g.: "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty". Can be also a numeric vector of length(groups); in this case a basic color palette is created using the function palette. |
addEllipses |
logical value. If TRUE, draws ellipses around the individuals when habillage != "none". |
col.ind, col.var, col.axes
|
color for individuals, variables and axes, respectively. Can be a continuous variable or a factor variable. Possible values also include: "cos2", "contrib", "coord", "x" or "y". In this case, the colors for individuals/variables are automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use automatic coloring (by cos2, contrib, ....), make sure that habillage = "none". |
col.ind.sup |
color for supplementary individuals |
alpha.ind, alpha.var, alpha.axes
|
controls the transparency of individual, variable, group and axes colors, respectively. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Possible values also include: "cos2", "contrib", "coord", "x" or "y". In this case, the transparency of individual/variable colors is automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use this, make sure that habillage = "none". |
shape.ind, shape.var
|
point shapes of individuals, variables, groups and axes |
col.quali.var.sup |
color for supplementary qualitative variables. Default is "black". |
select.ind, select.var, select.axes
|
a selection of individuals/partial individuals/ variables/groups/axes to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:
|
partial |
list of the individuals for which the partial points should be drawn. (by default, partial = NULL and no partial points are drawn). Use partial = "all" to visualize partial points for all individuals. |
col.partial |
color for partial individuals. By default, points are colored according to the groups. |
... |
Arguments to be passed to the function fviz() |
title |
the title of the graph |
choice |
the graph to plot. Allowed values include one of c("quanti.var", "quali.var", "quali.sup", "group") for plotting quantitative variables, qualitative variables, supplementary qualitative variables and group of variables, respectively. |
col.var.sup |
color for supplementary variables. |
col.circle |
a color for the correlation circle. Used only when X is a PCA output. |
a ggplot2 plot
Fabian Mundt [email protected]
Alboukadel Kassambara [email protected]
https://www.sthda.com/english/
# Compute Multiple Factor Analysis
library("FactoMineR")
data(wine)
res.mfa <- MFA(wine, group=c(2,5,3,10,9,2), type=c("n",rep("s",5)),
               ncp=5, name.group=c("orig","olf","vis","olfag","gust","ens"),
               num.group.sup=c(1,6), graph=FALSE)

# Eigenvalues/variances of dimensions
fviz_screeplot(res.mfa)

# Group of variables
fviz_mfa_var(res.mfa, "group")

# Quantitative variables
fviz_mfa_var(res.mfa, "quanti.var", palette = "jco",
  col.var.sup = "violet", repel = TRUE)

# Graph of individuals colored by cos2
fviz_mfa_ind(res.mfa, col.ind = "cos2",
  gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
  repel = TRUE)

# Partial individuals
fviz_mfa_ind(res.mfa, partial = rownames(wine)[1:3], col.partial = "black")

# Partial axes
fviz_mfa_axes(res.mfa)

# Graph of categorical variable categories
# ++++++++++++++++++++++++++++++++++++++++
data(poison)
res.mfa <- MFA(poison, group=c(2,2,5,6), type=c("s","n","n","n"),
               name.group=c("desc","desc2","symptom","eat"),
               num.group.sup=1:2, graph=FALSE)

# Plot of qualitative variables
fviz_mfa_var(res.mfa, "quali.var")

# Plot of supplementary qualitative variables
fviz_mfa_var(res.mfa, "quali.sup")

# Biplot of categorical variable categories and individuals
# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# Use repel = TRUE to avoid overplotting
grp <- as.factor(poison[, "Vomiting"])
fviz_mfa_quali_biplot(res.mfa, repel = FALSE, col.var = "#E7B800",
   habillage = grp, addEllipses = TRUE, ellipse.level = 0.95)
Partitioning methods, such as k-means clustering, require the user to specify the number of clusters to be generated.
fviz_nbclust(): Determines and visualizes the optimal number of
clusters using different methods: within cluster sums of squares,
average silhouette and gap statistics. Silhouette values
are evaluated only for k = 2, ..., k.max because average
silhouette width is undefined for a one-cluster partition. The dashed
guide line marks the best displayed silhouette width among those
k >= 2 candidates.
fviz_gap_stat(): Visualizes the gap statistic generated by the
function clusGap() [in cluster package]. The optimal
number of clusters is located with the method given in maxSE,
"firstSEmax" by default (see ?cluster::clusGap).
For method = "wss", factoextra computes the k = 1
baseline internally so helper functions such as hcut() and
hkmeans() can keep rejecting direct k = 1 inputs.
Read more: Determining the optimal number of clusters
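The two simplest criteria can be sketched by hand to show what is being plotted. The example below is an illustration under stated assumptions (k-means on the scaled iris data, k.max = 6, nstart = 10), not the internal code of fviz_nbclust(): it computes the total within-cluster sum of squares for k = 1, ..., k.max and the average silhouette width for k = 2, ..., k.max.

```r
library(cluster)  # silhouette()

set.seed(123)
x <- scale(iris[, -5])
k.max <- 6
d <- dist(x)

# Total within-cluster sum of squares, including the k = 1 baseline.
wss <- sapply(1:k.max, function(k) {
  kmeans(x, centers = k, nstart = 10)$tot.withinss
})

# Average silhouette width, defined only for k >= 2.
avg_sil <- sapply(2:k.max, function(k) {
  cl <- kmeans(x, centers = k, nstart = 10)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})
names(avg_sil) <- 2:k.max

# Candidate with the largest average silhouette width.
best_k <- as.integer(names(which.max(avg_sil)))
```

For the elbow method one looks for the value of k after which wss stops dropping sharply; for the silhouette method, best_k is the candidate that a dashed guide line would mark.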
fviz_nbclust(
  x,
  FUNcluster = NULL,
  method = c("silhouette", "wss", "gap_stat"),
  diss = NULL,
  k.max = 10,
  nboot = 100,
  verbose = interactive(),
  barfill = "steelblue",
  barcolor = "steelblue",
  linecolor = "steelblue",
  print.summary = TRUE,
  ...
)

fviz_gap_stat(
  gap_stat,
  linecolor = "steelblue",
  maxSE = list(method = "firstSEmax", SE.factor = 1)
)
x |
numeric matrix or data frame. In the function fviz_nbclust(), x can be the results of the function NbClust(). |
FUNcluster |
a partitioning function which accepts as first argument a
(data) matrix like |
method |
the method to be used for estimating the optimal number of clusters. Possible values are "silhouette" (for average silhouette width), "wss" (for total within sum of square) and "gap_stat" (for gap statistics). |
diss |
dist object as produced by dist(), i.e.: diss = dist(x, method = "euclidean"). Used to compute the average silhouette width of clusters, the within sum of square and hierarchical clustering. If NULL, dist(x) is computed with the default method = "euclidean" |
k.max |
the maximum number of clusters to consider, must be at least two. |
nboot |
integer, number of Monte Carlo ("bootstrap") samples. Used only for determining the number of clusters using gap statistic. |
verbose |
logical value. If TRUE, progress messages are printed. |
barfill, barcolor
|
fill color and outline color for bars |
linecolor |
color for lines |
print.summary |
logical value. If TRUE, the optimal number of clusters is printed in fviz_nbclust(). |
... |
optionally further arguments:
arguments for FUNcluster() in "wss"/"silhouette" modes; arguments for
|
gap_stat |
an object of class "clusGap" returned by the function clusGap() [in cluster package] |
maxSE |
a list containing the parameters (method and SE.factor) for determining the location of the maximum of the gap statistic (Read the documentation ?cluster::maxSE). Allowed values for maxSE$method include:
|
fviz_nbclust, fviz_gap_stat: return a ggplot2 object.
The default method "firstSEmax" (developed by Martin Maechler, 2012) is recommended as a robust alternative to "Tibs2001SEmax". The original Tibshirani method can be overly conservative and often returns k=1 when standard deviations are large relative to gap differences. The "firstSEmax" method finds the smallest k within one standard error of the first local maximum, providing more stable results in practice.
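The difference between the two rules can be seen on a small made-up example. The gap values and standard errors below are hypothetical numbers chosen so that the early standard errors are large relative to the gap differences; maxSE() is the helper from the cluster package that clusGap() output is evaluated with.

```r
library(cluster)  # maxSE()

# Hypothetical gap values and standard errors for k = 1, ..., 5.
gap    <- c(0.20, 0.25, 0.60, 0.58, 0.50)
se_gap <- c(0.10, 0.10, 0.03, 0.03, 0.03)

# Tibshirani et al. (2001): smallest k with gap(k) >= gap(k + 1) - SE(k + 1).
k_tibs <- maxSE(gap, se_gap, method = "Tibs2001SEmax")

# Maechler (2012): smallest k within one SE of the first local maximum.
k_first <- maxSE(gap, se_gap, method = "firstSEmax")

c(Tibs2001SEmax = k_tibs, firstSEmax = k_first)
```

With these numbers the Tibshirani rule stops at k = 1, because 0.20 is already above 0.25 - 0.10, whereas "firstSEmax" returns k = 3, the first local maximum of the gap curve.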
Alboukadel Kassambara [email protected]
set.seed(123)

# Data preparation
# +++++++++++++++
data("iris")
head(iris)
# Remove species column (5) and scale the data
iris.scaled <- scale(iris[, -5])

# Optimal number of clusters in the data
# ++++++++++++++++++++++++++++++++++++++
# Examples are provided only for kmeans, but
# you can also use cluster::pam (for pam) or
# hcut (for hierarchical clustering)

### Elbow method (look at the knee)
# Elbow method for kmeans
fviz_nbclust(iris.scaled, kmeans, method = "wss") +
  geom_vline(xintercept = 3, linetype = 2)

# WSS with hierarchical clustering keeps the internal k = 1 baseline
fviz_nbclust(iris.scaled, hcut, method = "wss", hc_method = "complete")

# Average silhouette for kmeans
fviz_nbclust(iris.scaled, kmeans, method = "silhouette")

### Gap statistic
library(cluster)
set.seed(123)
# Compute gap statistic for kmeans
# we used B = 10 for demo. Recommended value is ~500
gap_stat <- clusGap(iris.scaled, FUN = kmeans, nstart = 25,
                    K.max = 10, B = 10)
print(gap_stat, method = "firstmax")
fviz_gap_stat(gap_stat)

# Gap statistic for hierarchical clustering
gap_stat <- clusGap(iris.scaled, FUN = hcut, K.max = 10, B = 10)
fviz_gap_stat(gap_stat)
Principal component analysis (PCA) reduces the dimensionality of multivariate data to two or three dimensions that can be visualized graphically with minimal loss of information. fviz_pca() provides ggplot2-based elegant visualization of PCA outputs from: i) prcomp and princomp [in built-in R stats], ii) PCA [in FactoMineR], iii) dudi.pca [in ade4] and iv) epPCA [in ExPosition]. Read more: Principal Component Analysis
fviz_pca_ind(): Graph of individuals
fviz_pca_var(): Graph of variables
fviz_pca_biplot(): Biplot of individuals and variables
fviz_pca(): An alias of fviz_pca_biplot()
Note that fviz_pca_xxx() functions are wrappers around the core
function fviz(), which is itself a wrapper around the
function ggscatter() [in ggpubr]. Therefore, further arguments to be
passed to fviz() and ggscatter() can be specified in
fviz_pca_ind() and fviz_pca_var().
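The "cos2" and "contrib" quantities used for automatic coloring can be sketched from a prcomp fit in base R. This is a minimal illustration of the textbook definitions, assuming all components are kept; the normalization used internally by factoextra may differ in detail.

```r
res.pca <- prcomp(iris[, -5], scale. = TRUE)
coord <- res.pca$x   # coordinates of individuals on the components

# cos2: share of each individual's squared distance to the origin that is
# captured by each component (rows sum to 1 when all components are kept).
cos2 <- coord^2 / rowSums(coord^2)

# contrib: percentage of each component's inertia due to each individual
# (columns sum to 100).
contrib <- 100 * sweep(coord^2, 2, colSums(coord^2), "/")

head(round(cos2[, 1:2], 3))
head(round(contrib[, 1:2], 3))
```

An individual with a high cos2 on the first two components is well represented on the factor map, while a high contrib value means the individual weighs heavily in the definition of that component.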
fviz_pca(X, ...)

fviz_pca_ind(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  geom.ind = geom,
  repel = FALSE,
  habillage = "none",
  palette = NULL,
  addEllipses = FALSE,
  col.ind = "black",
  fill.ind = "white",
  col.ind.sup = "blue",
  alpha.ind = 1,
  select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
  ...
)

fviz_pca_var(
  X,
  axes = c(1, 2),
  geom = c("arrow", "text"),
  geom.var = geom,
  repel = FALSE,
  col.var = "black",
  fill.var = "white",
  alpha.var = 1,
  col.quanti.sup = "blue",
  col.circle = "grey70",
  select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
  ...
)

fviz_pca_biplot(
  X,
  axes = c(1, 2),
  geom = c("point", "text"),
  geom.ind = geom,
  geom.var = c("arrow", "text"),
  col.ind = "black",
  fill.ind = "white",
  col.var = "steelblue",
  fill.var = "white",
  gradient.cols = NULL,
  label = "all",
  invisible = "none",
  repel = FALSE,
  habillage = "none",
  palette = NULL,
  addEllipses = FALSE,
  title = "PCA - Biplot",
  biplot.type = c("auto", "form", "covariance"),
  ...
)
X |
an object of class PCA [FactoMineR]; prcomp and princomp [stats]; dudi and pca [ade4]; expOutput/epPCA [ExPosition]. |
... |
Additional arguments.
|
axes |
a numeric vector of length 2 specifying the dimensions to be plotted. |
geom |
a text specifying the geometry to be used for the graph. Allowed
values are the combination of |
geom.ind, geom.var
|
as |
repel |
a boolean, whether to use ggrepel to avoid overplotting text
labels or not. The old |
habillage |
an optional factor variable for coloring the observations by groups. Default value is "none". If X is a PCA object from FactoMineR package, habillage can also specify the supplementary qualitative variable (by its index or name) to be used for coloring individuals by groups (see ?PCA in FactoMineR). |
palette |
the color palette to be used for coloring or filling by groups. Allowed values include "grey" for grey color palettes; brewer palettes e.g. "RdBu", "Blues", ...; or custom color palette e.g. c("blue", "red"); and scientific journal palettes from ggsci R package, e.g.: "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty". Can be also a numeric vector of length(groups); in this case a basic color palette is created using the function palette. |
addEllipses |
logical value. If TRUE, draws ellipses around the individuals when habillage != "none". |
col.ind, col.var
|
color for individuals and variables, respectively. Can be a continuous variable or a factor variable. Possible values include also : "cos2", "contrib", "coord", "x" or "y". In this case, the colors for individuals/variables are automatically controlled by their qualities of representation ("cos2"), contributions ("contrib"), coordinates (x^2+y^2, "coord"), x values ("x") or y values ("y"). To use automatic coloring (by cos2, contrib, ....), make sure that habillage ="none". |
fill.ind, fill.var
|
same as col.ind and col.var but for the fill color. |
col.ind.sup |
color for supplementary individuals |
alpha.ind, alpha.var
|
controls the transparency of individual and variable colors, respectively. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Possible values also include: "cos2", "contrib", "coord", "x" or "y". In this case, the transparency of the individual/variable colors is automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2+y^2, "coord"), x values ("x") or y values ("y"). To use this, make sure that habillage = "none". |
select.ind, select.var
|
a selection of individuals/variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:
|
col.quanti.sup |
a color for the quantitative supplementary variables. |
col.circle |
a color for the correlation circle. Used only when X is a PCA output. |
gradient.cols |
vector of colors to use for n-colour gradient. Allowed values include brewer and ggsci color palettes. |
label |
a text specifying the elements to be labelled. Default value is "all". Allowed values are "none" or the combination of c("ind", "ind.sup", "quali", "var", "quanti.sup"). "ind" can be used to label only active individuals. "ind.sup" is for supplementary individuals. "quali" is for supplementary qualitative variables. "var" is for active variables. "quanti.sup" is for quantitative supplementary variables. |
invisible |
a text specifying the elements to be hidden on the plot. Default value is "none". Allowed values are the combination of c("ind", "ind.sup", "quali", "var", "quanti.sup"). |
title |
the title of the graph |
biplot.type |
type of biplot scaling for fviz_pca_biplot(). Options are:
Note: "form" and "covariance" scaling requires prcomp or princomp objects. |
a ggplot
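The two classical biplot scalings can be sketched with a singular value decomposition in base R. This is an illustration of the standard "form" and "covariance" conventions only; how fviz_pca_biplot() rescales the arrows for display is not shown here and should be treated as unknown.

```r
X  <- scale(iris[, -5])   # centered and scaled data
sv <- svd(X)              # X = U D V'
U <- sv$u
D <- diag(sv$d)
V <- sv$v

# Form biplot: individuals in principal coordinates, variables as unit axes.
# Distances between individuals are preserved.
ind_form <- U %*% D
var_form <- V

# Covariance biplot: individuals standardized, variables carry the scale.
# Inner products between variables reproduce their correlations.
ind_cov <- U
var_cov <- V %*% D

# Both choices reproduce the same data matrix.
err_form <- max(abs(ind_form %*% t(var_form) - X))
err_cov  <- max(abs(ind_cov %*% t(var_cov) - X))
c(err_form, err_cov)
```

The form biplot is the natural choice when distances between individuals matter, and the covariance biplot when the angles between variable arrows are to be read as correlations.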
Alboukadel Kassambara [email protected]
# Principal component analysis
# ++++++++++++++++++++++++++++++
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Graph of individuals
# +++++++++++++++++++++
# Default plot
# Use repel = TRUE to avoid overplotting (slow if many points)
fviz_pca_ind(res.pca, col.ind = "#00AFBB", repel = TRUE)

# 1. Control automatically the color of individuals
# using the "cos2" or the contributions "contrib"
# cos2 = the quality of the individuals on the factor map
# 2. To keep only point or text use geom = "point" or geom = "text".
# 3. Change themes using ggtheme: http://www.sthda.com/english/wiki/ggplot2-themes
fviz_pca_ind(res.pca, col.ind="cos2", geom = "point",
   gradient.cols = c("white", "#2E9FDF", "#FC4E07" ))

# Color individuals by groups, add concentration ellipses
# Change group colors using RColorBrewer color palettes
# Read more: http://www.sthda.com/english/wiki/ggplot2-colors
# Remove labels: label = "none".
fviz_pca_ind(res.pca, label="none", habillage=iris$Species,
   addEllipses=TRUE, ellipse.level=0.95, palette = "Dark2")

# Change group colors manually
# Read more: http://www.sthda.com/english/wiki/ggplot2-colors
fviz_pca_ind(res.pca, label="none", habillage=iris$Species,
   addEllipses=TRUE, ellipse.level=0.95,
   palette = c("#999999", "#E69F00", "#56B4E9"))

# Select and visualize some individuals (ind) with select.ind argument.
# - ind with cos2 >= 0.96: select.ind = list(cos2 = 0.96)
# - Top 20 ind according to the cos2: select.ind = list(cos2 = 20)
# - Top 20 contributing individuals: select.ind = list(contrib = 20)
# - Select ind by names: select.ind = list(name = c("23", "42", "119") )

# Example: Select the top 40 according to the cos2
fviz_pca_ind(res.pca, select.ind = list(cos2 = 40))

# Graph of variables
# ++++++++++++++++++++++++++++
# Default plot
fviz_pca_var(res.pca, col.var = "steelblue")

# Control variable colors using their contributions
fviz_pca_var(res.pca, col.var = "contrib",
   gradient.cols = c("white", "blue", "red"),
   ggtheme = theme_minimal())

# Biplot of individuals and variables
# ++++++++++++++++++++++++++
# Keep only the labels for variables
# Change the color by groups, add ellipses
fviz_pca_biplot(res.pca, label = "var", habillage=iris$Species,
   addEllipses=TRUE, ellipse.level=0.95,
   ggtheme = theme_minimal())

# Biplot scaling modes:
# Form biplot - focus on individual distances
fviz_pca_biplot(res.pca, biplot.type = "form",
   label = "var", habillage = iris$Species)

# Covariance biplot - focus on variable correlations
fviz_pca_biplot(res.pca, biplot.type = "covariance",
   label = "var", habillage = iris$Species)
Silhouette (Si) analysis is a cluster validation approach that
measures how well an observation is clustered and it estimates the average
distance between clusters. fviz_silhouette() provides ggplot2-based elegant
visualization of silhouette information from i) the result of
silhouette(), pam(),
clara() and fanny() [in
cluster package]; ii) eclust() and hcut() [in
factoextra]. Results without silhouette information, such as one-cluster
eclust/hcut objects, are rejected with a package-level error.
Read more: Clustering Validation Statistics.
fviz_silhouette(sil.obj, label = FALSE, print.summary = TRUE, ...)
sil.obj |
an object of class silhouette: pam, clara, fanny [in cluster
package]; eclust and hcut [in factoextra]. For |
label |
logical value. If true, x axis tick labels are shown |
print.summary |
logical value. If TRUE, a summary of cluster silhouettes is printed in fviz_silhouette(). |
... |
other arguments to be passed to the function ggpubr::ggpar(). |
- Observations with a large silhouette Si (almost 1) are very well clustered.
- A small Si (around 0) means that the observation lies between two clusters.
- Observations with a negative Si are probably placed in the wrong cluster.
- Silhouette plots require at least two clusters and available silhouette widths.
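The rules above follow from the definition Si = (b - a) / max(a, b), where a is the mean distance from an observation to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below applies this definition to a tiny made-up data set in base R; the values of x and cl are hypothetical, and every cluster is assumed to contain at least two observations.

```r
x  <- c(0, 1, 2, 10, 11)   # hypothetical one-dimensional observations
cl <- c(1, 1, 2, 2, 2)     # cluster labels; the third point is misplaced
d  <- as.matrix(dist(x))

sil_width <- sapply(seq_along(x), function(i) {
  own <- cl == cl[i]
  own[i] <- FALSE
  # a: mean distance to the rest of its own cluster
  a <- mean(d[i, own])
  # b: smallest mean distance to the members of any other cluster
  other <- cl != cl[i]
  b <- min(tapply(d[i, other], cl[other], mean))
  (b - a) / max(a, b)
})
round(sil_width, 3)
```

The third observation lies much closer to cluster 1 than to its assigned cluster 2, so its silhouette width is negative, while the four other observations have positive widths.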
a ggplot2 object.
Alboukadel Kassambara [email protected]
fviz_cluster, hcut,
hkmeans, eclust, fviz_dend
set.seed(123)

# Data preparation
# +++++++++++++++
data("iris")
head(iris)
# Remove species column (5) and scale the data
iris.scaled <- scale(iris[, -5])

# K-means clustering
# +++++++++++++++++++++
km.res <- kmeans(iris.scaled, 3, nstart = 2)

# Visualize kmeans clustering
fviz_cluster(km.res, iris[, -5], ellipse.type = "norm")+
  theme_minimal()

# Visualize silhouette information
requireNamespace("cluster", quietly = TRUE)
sil <- cluster::silhouette(km.res$cluster, dist(iris.scaled))
fviz_silhouette(sil)

# Identify observation with negative silhouette
neg_sil_index <- which(sil[, "sil_width"] < 0)
sil[neg_sil_index, , drop = FALSE]

## Not run:
# PAM clustering
# ++++++++++++++++++++
requireNamespace("cluster", quietly = TRUE)
pam.res <- cluster::pam(iris.scaled, 3)
# Visualize pam clustering
fviz_cluster(pam.res, ellipse.type = "norm")+
  theme_minimal()
# Visualize silhouette information
fviz_silhouette(pam.res)

# Hierarchical clustering
# ++++++++++++++++++++++++
# Use hcut() which compute hclust and cut the tree
hc.cut <- hcut(iris.scaled, k = 3, hc_method = "complete")
# Visualize dendrogram
fviz_dend(hc.cut, show_labels = FALSE, rect = TRUE)
# Visualize silhouette information
if (hc.cut$nbclust > 1) fviz_silhouette(hc.cut)
## End(Not run)
Extract all the results (coordinates, squared cosine, contributions and inertia)
for the active row/column variables from Correspondence Analysis (CA) outputs.
get_ca(): Extract the results for rows and columns
get_ca_row(): Extract the results for rows only
get_ca_col(): Extract the results for columns only
    get_ca(res.ca, element = c("row", "col"))
    get_ca_col(res.ca)
    get_ca_row(res.ca)
| res.ca | an object of class CA [FactoMineR], ca [ca], coa [ade4]; correspondence [MASS]. |
|---|---|
| element | the element to subset from the output. Possible values are "row" or "col". |
a list of matrices containing the results for the active rows/columns, including:
| coord | coordinates for the rows/columns |
|---|---|
| cos2 | cos2 for the rows/columns |
| contrib | contributions of the rows/columns |
| inertia | inertia of the rows/columns |
Alboukadel Kassambara
https://www.sthda.com/english/
    # Install and load FactoMineR to compute CA
    # install.packages("FactoMineR")
    library("FactoMineR")
    data("housetasks")
    res.ca <- CA(housetasks, graph = FALSE)
    # Result for column variables
    col <- get_ca_col(res.ca)
    col # print
    head(col$coord) # column coordinates
    head(col$cos2) # column cos2
    head(col$contrib) # column contributions
    # Result for row variables
    row <- get_ca_row(res.ca)
    row # print
    head(row$coord) # row coordinates
    head(row$cos2) # row cos2
    head(row$contrib) # row contributions
    # You can also use the function get_ca()
    get_ca(res.ca, "row") # Results for rows
    get_ca(res.ca, "col") # Results for columns
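To make the returned quantities concrete, the following base R sketch computes row coordinates, cos2, contributions and inertia from the textbook CA formulas (an SVD of the standardized residuals). The contingency table `N` and every variable name are illustrative assumptions; this is not the internal code of factoextra or FactoMineR, and contributions are expressed here in percent.

```r
# Illustrative contingency table (made-up counts, not the housetasks data)
N <- matrix(c(20,  5,  3,
               4, 18,  6,
               2,  7, 25,
              10, 10, 10),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("row", 1:4), paste0("col", 1:3)))

P      <- N / sum(N)   # correspondence matrix
r_mass <- rowSums(P)   # row masses
c_mass <- colSums(P)   # column masses

# Standardized residuals and their singular value decomposition
S   <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
dec <- svd(S)
K   <- min(dim(N)) - 1            # number of non-trivial dimensions
sv  <- dec$d[seq_len(K)]          # singular values; sv^2 are the eigenvalues

# Principal coordinates of the rows
row_coord <- diag(1 / sqrt(r_mass)) %*% dec$u[, seq_len(K)] %*% diag(sv, nrow = K)
dimnames(row_coord) <- list(rownames(N), paste0("Dim.", seq_len(K)))

# Quality of representation: share of each row's squared distance per dimension
row_cos2 <- row_coord^2 / rowSums(row_coord^2)
# Contribution (in %) of each row to the inertia of each dimension
row_contrib <- 100 * sweep(r_mass * row_coord^2, 2, sv^2, "/")
# Inertia carried by each row (mass times squared distance to the centroid)
row_inertia <- r_mass * rowSums(row_coord^2)

round(row_coord, 3)
round(row_contrib, 1)
```

Two sanity checks follow from the definitions: the contributions in each dimension sum to 100, and the row inertias sum to the total inertia, which equals the chi-squared statistic of the table divided by its grand total.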
Before applying clustering methods, the first step is to assess whether the data are clusterable, a process known as assessing clustering tendency. get_clust_tendency() assesses clustering tendency using the Hopkins statistic and a visual approach: an ordered dissimilarity image (ODI), in which objects belonging to the same cluster are displayed in consecutive order using hierarchical clustering. For more details and interpretation, see the STHDA website: Assessing clustering tendency.
    get_clust_tendency(
      data,
      n,
      graph = TRUE,
      gradient = list(low = "red", mid = "white", high = "blue"),
      seed = NULL
    )
| data | a numeric data frame or matrix. Columns are variables and rows are samples. Computations are done on rows (samples) by default. To calculate the Hopkins statistic on variables, transpose the data first. |
|---|---|
| n | a positive integer specifying the number of points selected from the sample space and from the observed data. Must be smaller than the number of complete observations. |
| graph | logical value; if TRUE, the ordered dissimilarity image (ODI) is shown. |
| gradient | a list containing three elements specifying the colors for low, mid and high values in the ordered dissimilarity image. The element "mid" can take the value NULL. |
| seed | an integer seed for reproducibility, or NULL to use the current RNG stream. When non-NULL, the function restores the caller RNG state on exit. |
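The seed behaviour described above (use a fixed seed, then restore the caller's RNG state on exit) is a common base R pattern. The sketch below shows one way it can be done; `with_temporary_seed()` is a hypothetical helper written for illustration and is not a factoextra function.

```r
# Hypothetical helper (not part of factoextra): run `fun` under a temporary
# seed and put the caller's RNG state back afterwards.
with_temporary_seed <- function(seed, fun) {
  if (!is.null(seed)) {
    had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
    old_seed <- if (had_seed) get(".Random.seed", envir = globalenv()) else NULL
    on.exit({
      if (had_seed) {
        # Restore the state that was active before the call
        assign(".Random.seed", old_seed, envir = globalenv())
      } else {
        # No RNG state existed before: remove the one created by set.seed()
        rm(".Random.seed", envir = globalenv())
      }
    }, add = TRUE)
    set.seed(seed)
  }
  fun()
}

set.seed(2024)
state_before <- .Random.seed
draw1 <- with_temporary_seed(1, function() runif(3))
draw2 <- with_temporary_seed(1, function() runif(3))
state_after <- .Random.seed

identical(draw1, draw2)               # the same seed gives the same draws
identical(state_before, state_after)  # the caller's RNG stream is unchanged
```

Restoring the state matters when the function is called inside a larger simulation: a fixed internal seed would otherwise silently reset the random stream of everything that runs after it.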
Hopkins statistic: If the value of the Hopkins statistic is close to 1 (far above 0.5), we can conclude that the dataset is significantly clusterable. The statistic is calculated using the correct formula from Cross and Jain (1982), with exponent d = D, where D is the dimensionality (number of columns) of the data. Under the null hypothesis of spatial randomness, the Hopkins statistic follows a Beta(n, n) distribution.
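As a rough illustration of the statistic described above, the base R sketch below assumes the form H = sum(u^D) / (sum(u^D) + sum(w^D)), where u are nearest-neighbour distances from n uniform random points to the data and w are nearest-neighbour distances from n sampled observations to the remaining data. The function `hopkins_sketch()` and its brute-force distance search are assumptions made for this example; it is not the implementation used by get_clust_tendency() and will not reproduce its exact values.

```r
# Hypothetical sketch of the Hopkins statistic (not factoextra's code)
hopkins_sketch <- function(x, n) {
  x <- as.matrix(x)
  D <- ncol(x)  # dimensionality, used as the exponent
  # Euclidean distance from point p to its nearest neighbour among rows of pts
  nn_dist <- function(p, pts) sqrt(min(colSums((t(pts) - p)^2)))
  # u: distances from n uniform points (drawn in the bounding box) to the data
  unif <- sapply(seq_len(D), function(j) runif(n, min(x[, j]), max(x[, j])))
  u <- apply(unif, 1, nn_dist, pts = x)
  # w: distances from n sampled observations to their nearest other observation
  idx <- sample(nrow(x), n)
  w <- vapply(idx, function(i) nn_dist(x[i, ], x[-i, , drop = FALSE]), numeric(1))
  sum(u^D) / (sum(u^D) + sum(w^D))
}

set.seed(123)
n <- 50
h_iris <- hopkins_sketch(iris[, -5], n)

# Uniform data with the same ranges, i.e. no cluster structure by construction
unif_df <- apply(iris[, -5], 2, function(v) runif(length(v), min(v), max(v)))
h_unif <- hopkins_sketch(unif_df, n)

# One-sided p-value under the Beta(n, n) null distribution
p_iris <- pbeta(h_iris, n, n, lower.tail = FALSE)

c(iris = h_iris, uniform = h_unif, p_value_iris = p_iris)
```

With clustered data the uniform points tend to fall far from any observation, so the u terms dominate and H moves towards 1; for structureless data both sums are of similar size and H stays near 0.5.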
Note on interpretation: This function returns the Hopkins statistic H
where values close to 1 indicate clusterable data. Some other packages (e.g.,
performance::check_clusterstructure) return 1-H, where values close to
0 indicate clusterability. Always check the documentation of the specific
implementation you are using.
Breaking change: factoextra uses the corrected Hopkins statistic
formula (Wright 2022). Results differ from legacy factoextra and a one-time
warning is emitted. Set options(factoextra.warn_hopkins = FALSE) to
silence the warning.
For large datasets, nearest-neighbor distances are computed with a low-memory
fallback when the full pairwise matrix would exceed
getOption("factoextra.hopkins.max_matrix_cells", 2e7) cells.
VAT (Visual Assessment of cluster Tendency): The VAT detects the clustering tendency in a visual form by counting the number of square shaped dark (or colored) blocks along the diagonal in a VAT image.
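The ordering step behind such an image can be sketched in base R: compute a dissimilarity matrix, order the observations with hierarchical clustering, and permute rows and columns accordingly. The linkage method ("ward.D2") and the colour palette are assumptions chosen for this example, not a statement of what fviz_dist() uses internally.

```r
x <- scale(iris[, -5])   # standardized data
d <- dist(x)             # Euclidean dissimilarities

# Order observations so that similar ones end up next to each other
ord <- hclust(d, method = "ward.D2")$order
odi <- as.matrix(d)[ord, ord]

# Draw the ordered dissimilarity image (only in an interactive session)
if (interactive()) {
  image(odi, col = heat.colors(50), axes = FALSE,
        main = "Ordered dissimilarity image")
}

# Neighbours in the new order are, on average, much closer than random pairs
adjacent <- odi[cbind(seq_len(nrow(odi) - 1), seq_len(nrow(odi))[-1])]
c(mean_adjacent = mean(adjacent), mean_overall = mean(d))
```

After reordering, low dissimilarities concentrate in square blocks along the diagonal, which is what the VAT reading counts.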
A list containing the elements:
- hopkins_stat for Hopkins statistic value
- plot for ordered dissimilarity image. This is generated using the
function fviz_dist(dist.obj).
Alboukadel Kassambara
    data(iris)
    # Silence the one-time compatibility warning in examples
    old_hopkins_warn <- getOption("factoextra.warn_hopkins")
    options(factoextra.warn_hopkins = FALSE)
    # Clustering tendency
    gradient_col <- list(low = "steelblue", high = "white")
    get_clust_tendency(iris[, -5], n = 50, gradient = gradient_col)
    # Random uniformly distributed dataset
    # (without any inherent clusters)
    set.seed(123)
    random_df <- apply(iris[, -5], 2,
                       function(x) runif(length(x), min(x), max(x)))
    get_clust_tendency(random_df, n = 50, gradient = gradient_col)
    # Restore the previous option value
    options(factoextra.warn_hopkins = old_hopkins_warn)
Extract all the results (coordinates, squared cosine and contributions)
for the active individuals and variables from Factor Analysis of Mixed Data (FAMD) outputs.
get_famd(): Extract the results for variables and individuals
get_famd_ind(): Extract the results for individuals only
get_famd_var(): Extract the results for quantitative and qualitative variables only
    get_famd(
      res.famd,
      element = c("ind", "var", "quanti.var", "quali.var", "quali.sup")
    )
    get_famd_ind(res.famd)
    get_famd_var(
      res.famd,
      element = c("var", "quanti.var", "quali.var", "quali.sup")
    )
| res.famd | an object of class FAMD [FactoMineR]. |
|---|---|
| element | the element to subset from the output. Possible values are "ind", "var", "quanti.var", "quali.var" or "quali.sup". |
a list of matrices containing the results for the active individuals and variables, including:
| coord | coordinates of individuals/variables. |
|---|---|
| cos2 | cos2 values representing the quality of representation on the factor map. |
| contrib | contributions of individuals/variables to the principal components. |
Alboukadel Kassambara
    # Compute FAMD
    library("FactoMineR")
    data(wine)
    res.famd <- FAMD(wine[, c(1, 2, 16, 22, 29, 28, 30, 31)], graph = FALSE)
    res.famd.sup <- FAMD(wine[, c(1, 2, 16, 22, 29, 28, 30, 31)],
                         sup.var = 2, graph = FALSE)
    # Extract the results for qualitative variable categories
    quali.var <- get_famd_var(res.famd, "quali.var")
    print(quali.var)
    head(quali.var$coord) # coordinates of qualitative variables
    # Extract the results for supplementary qualitative variable categories
    quali.sup <- get_famd_var(res.famd.sup, "quali.sup")
    print(quali.sup)
    head(quali.sup$coord) # coordinates of supplementary qualitative variables
    # Extract the results for quantitative variables
    quanti.var <- get_famd_var(res.famd, "quanti.var")
    print(quanti.var)
    head(quanti.var$coord) # coordinates
    # Extract the results for individuals
    ind <- get_famd_ind(res.famd)
    print(ind)
    head(ind$coord) # coordinates of individuals
Extract all the results (coordinates, squared cosine and
contributions) for the active individuals/quantitative variables/qualitative
variable categories/groups/partial axes from Hierarchical Multiple Factor
Analysis (HMFA) outputs.
get_hmfa(): Extract the results for variables and individuals
get_hmfa_ind(): Extract the results for individuals only
get_hmfa_var(): Extract the results for variables (quantitative, qualitative and groups)
get_hmfa_partial(): Extract the results for partial.node.
    get_hmfa(
      res.hmfa,
      element = c("ind", "quanti.var", "quali.var", "group", "partial.node")
    )
    get_hmfa_ind(res.hmfa)
    get_hmfa_var(res.hmfa, element = c("quanti.var", "quali.var", "group"))
    get_hmfa_partial(res.hmfa)
| res.hmfa | an object of class HMFA [FactoMineR]. |
|---|---|
| element | the element to subset from the output. Possible values are "ind", "quanti.var", "quali.var", "group" or "partial.node". |
a list of matrices containing the results for the active individuals, variables, groups and partial nodes, including:
| coord | coordinates |
|---|---|
| cos2 | cos2 |
| contrib | contributions |
Alboukadel Kassambara
Fabian Mundt
    # Hierarchical Multiple Factor Analysis
    # +++++++++++++++++++++++++++++++++++++
    # Install and load FactoMineR to compute HMFA
    # install.packages("FactoMineR")
    library("FactoMineR")
    data(wine)
    hierar <- list(c(2, 5, 3, 10, 9, 2), c(4, 2))
    res.hmfa <- HMFA(wine, H = hierar, type = c("n", rep("s", 5)), graph = FALSE)
    # Extract the results for qualitative variable categories
    var <- get_hmfa_var(res.hmfa, "quali.var")
    print(var)
    head(var$coord) # coordinates of qualitative variables
    head(var$cos2) # cos2 of qualitative variables
    head(var$contrib) # contributions of qualitative variables
    # Extract the results for individuals
    ind <- get_hmfa_ind(res.hmfa)
    print(ind)
    head(ind$coord) # coordinates of individuals
    head(ind$cos2) # cos2 of individuals
    head(ind$contrib) # contributions of individuals
    # You can also use the function get_hmfa()
    get_hmfa(res.hmfa, "ind") # Results for individuals
    get_hmfa(res.hmfa, "quali.var") # Results for qualitative variable categories
Extract all the results (coordinates, squared cosine and
contributions) for the active individuals/variable categories from
Multiple Correspondence Analysis (MCA) outputs.
get_mca(): Extract the results for variables and individuals
get_mca_ind(): Extract the results for individuals only
get_mca_var(): Extract the results for variables only
For FactoMineR MCA results, get_mca() and get_mca_var() also
support element = "quanti.sup" for quantitative supplementary
variables and report a clean package-level error when that result is
absent.
    get_mca(res.mca, element = c("var", "ind", "mca.cor", "quanti.sup"))
    get_mca_var(res.mca, element = c("var", "mca.cor", "quanti.sup"))
    get_mca_ind(res.mca)
| res.mca | an object of class MCA [FactoMineR], acm [ade4], expoOutput/epMCA [ExPosition]. |
|---|---|
| element | the element to subset from the output. Possible values are "var" for variables, "ind" for individuals, "mca.cor" for correlation between variables and principal dimensions, and "quanti.sup" for quantitative supplementary variables in FactoMineR MCA results. |
a list of matrices containing the results for the active individuals/variable categories, including:
| coord | coordinates for the individuals/variable categories |
|---|---|
| cos2 | cos2 for the individuals/variable categories |
| contrib | contributions of the individuals/variable categories |
| inertia | inertia of the individuals/variable categories |
Alboukadel Kassambara
https://www.sthda.com/english/
    # Multiple Correspondence Analysis
    # ++++++++++++++++++++++++++++++
    # Install and load FactoMineR to compute MCA
    # install.packages("FactoMineR")
    library("FactoMineR")
    data(poison)
    res.mca <- MCA(poison, quanti.sup = 1:2, graph = FALSE)
    # Extract the results for variable categories
    var <- get_mca_var(res.mca)
    print(var)
    head(var$coord) # coordinates of variables
    head(var$cos2) # cos2 of variables
    head(var$contrib) # contributions of variables
    # Extract the results for individuals
    ind <- get_mca_ind(res.mca)
    print(ind)
    head(ind$coord) # coordinates of individuals
    head(ind$cos2) # cos2 of individuals
    head(ind$contrib) # contributions of individuals
    # You can also use the function get_mca()
    get_mca(res.mca, "ind") # Results for individuals
    get_mca(res.mca, "var") # Results for variable categories
    quanti.sup <- get_mca(res.mca, "quanti.sup")
    head(quanti.sup$coord) # coordinates of quantitative supplementary variables
Extract all the results (coordinates, squared cosine and contributions)
for the active individuals/quantitative variables/qualitative variable categories/groups/partial axes from Multiple Factor Analysis (MFA) outputs.
get_mfa(): Extract the results for variables and individuals
get_mfa_ind(): Extract the results for individuals only
get_mfa_var(): Extract the results for variables (quantitatives, qualitatives and groups)
get_mfa_partial_axes(): Extract the results for partial axes only
    get_mfa(
      res.mfa,
      element = c("ind", "quanti.var", "quali.var", "quali.sup", "group", "partial.axes")
    )
    get_mfa_ind(res.mfa)
    get_mfa_var(
      res.mfa,
      element = c("quanti.var", "quali.var", "quali.sup", "group")
    )
    get_mfa_partial_axes(res.mfa)
| res.mfa | an object of class MFA [FactoMineR]. |
|---|---|
| element | the element to subset from the output. Possible values are "ind", "quanti.var", "quali.var", "quali.sup", "group" or "partial.axes". |
a list of matrices containing the results for the active individuals/quantitative variables/qualitative variable categories/groups/partial axes, including:
| coord | coordinates for the individuals/quantitative variables/qualitative variable categories/groups/partial axes |
|---|---|
| cos2 | cos2 for the individuals/quantitative variables/qualitative variable categories/groups/partial axes |
| contrib | contributions of the individuals/quantitative variables/qualitative variable categories/groups/partial axes |
| inertia | inertia of the individuals/quantitative variables/qualitative variable categories/groups/partial axes |
Alboukadel Kassambara
Fabian Mundt
    # Multiple Factor Analysis
    # ++++++++++++++++++++++++
    # Install and load FactoMineR to compute MFA
    # install.packages("FactoMineR")
    library("FactoMineR")
    data(poison)
    res.mfa <- MFA(poison, group = c(2, 2, 5, 6), type = c("s", "n", "n", "n"),
                   name.group = c("desc", "desc2", "symptom", "eat"),
                   num.group.sup = 1:2, graph = FALSE)
    # Extract the results for qualitative variable categories
    var <- get_mfa_var(res.mfa, "quali.var")
    print(var)
    head(var$coord) # coordinates of qualitative variables
    head(var$cos2) # cos2 of qualitative variables
    head(var$contrib) # contributions of qualitative variables
    # Extract the results for individuals
    ind <- get_mfa_ind(res.mfa)
    print(ind)
    head(ind$coord) # coordinates of individuals
    head(ind$cos2) # cos2 of individuals
    head(ind$contrib) # contributions of individuals
    # You can also use the function get_mfa()
    get_mfa(res.mfa, "ind") # Results for individuals
    get_mfa(res.mfa, "quali.var") # Results for qualitative variable categories
    get_mfa(res.mfa, "quali.sup") # Results for supplementary qualitative variable categories
Extract all the results (coordinates, squared cosine, contributions) for
the active individuals/variables from Principal Component Analysis (PCA) outputs.
get_pca(): Extract the results for variables and individuals
get_pca_ind(): Extract the results for individuals only
get_pca_var(): Extract the results for variables only
    get_pca(res.pca, element = c("var", "ind"))
    get_pca_ind(res.pca, ...)
    get_pca_var(res.pca)
| res.pca | an object of class PCA [FactoMineR]; prcomp and princomp [stats]; pca, dudi [ade4]; epPCA [ExPosition]. |
|---|---|
| element | the element to subset from the output. Allowed values are "var" (for active variables) or "ind" (for active individuals). |
| ... | not used |
a list of matrices containing all the results for the active individuals/variables, including:
| coord | coordinates for the individuals/variables |
|---|---|
| cos2 | cos2 for the individuals/variables |
| contrib | contributions of the individuals/variables |
Alboukadel Kassambara
https://www.sthda.com/english/
    # Principal Component Analysis
    # +++++++++++++++++++++++++++++
    data(iris)
    res.pca <- prcomp(iris[, -5], scale = TRUE)
    # Extract the results for individuals
    ind <- get_pca_ind(res.pca)
    print(ind)
    head(ind$coord) # coordinates of individuals
    head(ind$cos2) # cos2 of individuals
    head(ind$contrib) # contributions of individuals
    # Extract the results for variables
    var <- get_pca_var(res.pca)
    print(var)
    head(var$coord) # coordinates of variables
    head(var$cos2) # cos2 of variables
    head(var$contrib) # contributions of variables
    # You can also use the function get_pca()
    get_pca(res.pca, "ind") # Results for individuals
    get_pca(res.pca, "var") # Results for variables
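For readers who want to see where the variable results come from, the sketch below rebuilds coordinates, cos2 and contributions from a prcomp() fit using the usual definitions for a standardized PCA: coordinates are loadings multiplied by the component standard deviations, cos2 is the squared coordinate, and contributions are cos2 expressed as a percentage of each component. The variable names are illustrative, and this is a sketch under those assumed definitions rather than a copy of the package's internal code.

```r
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Variable coordinates: loadings scaled by the component standard deviations
var_coord <- sweep(res.pca$rotation, 2, res.pca$sdev, "*")

# Quality of representation of each variable on each component
var_cos2 <- var_coord^2

# Contribution (in %) of each variable to each component
var_contrib <- 100 * sweep(var_cos2, 2, colSums(var_cos2), "/")

# For standardized data the coordinates are the correlations between
# the original variables and the component scores
var_cor <- cor(iris[, -5], res.pca$x)

round(var_coord, 3)
round(var_contrib, 1)
```

Because all four components are kept, the cos2 values of each variable sum to 1 across components, and the contributions within each component sum to 100.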
Computes hierarchical clustering (hclust, agnes, diana) and cuts the tree
into k clusters. It also accepts correlation-based distance measures
such as "pearson", "spearman" and "kendall". Direct calls require
k >= 2; helper-level one-cluster handling is implemented in callers
such as eclust() and fviz_nbclust().
    hcut(
      x,
      k = 2,
      isdiss = inherits(x, "dist"),
      hc_func = c("hclust", "agnes", "diana"),
      hc_method = "ward.D2",
      hc_metric = "euclidean",
      stand = FALSE,
      graph = FALSE,
      ...
    )
| x | a numeric matrix, numeric data frame or a dissimilarity matrix. |
|---|---|
| k | a single integer specifying the number of clusters to be generated. Must be at least 2 and smaller than the number of observations. |
| isdiss | logical value specifying whether |
| hc_func | the hierarchical clustering function to be used. Default value is "hclust". Possible values are "hclust", "agnes" and "diana". Abbreviation is allowed. |
| hc_method | the agglomeration method to be used (?hclust) for hclust() and agnes(): "ward.D", "ward.D2", "single", "complete", "average", ... |
| hc_metric | character string specifying the metric to be used for calculating dissimilarities between observations. Allowed values are those accepted by the function dist() [including "euclidean", "manhattan", "maximum", "canberra", "binary", "minkowski"] and correlation-based distance measures ["pearson", "spearman" or "kendall"]. |
| stand | logical value; default is FALSE. If TRUE, then the data will be standardized using the function scale(). Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's standard deviation. If scaling produces |
| graph | logical value. If TRUE, the dendrogram is displayed. |
| ... | not used. |
an object of class "hcut" containing the result of the standard function used (read the documentation of hclust, agnes, diana).
It also includes:
- cluster: the cluster assignment of observations after cutting the tree
- nbclust: the number of clusters
- silinfo: the silhouette information of observations (available when k > 1)
- size: the size of clusters
- data: a matrix containing the original or the standardized data (if stand = TRUE)
    data(USArrests)
    # Compute hierarchical clustering and cut into 4 clusters
    res <- hcut(USArrests, k = 4, stand = TRUE)
    # Cluster assignments of observations
    res$cluster
    # Size of clusters
    res$size
    # Visualize the dendrogram
    fviz_dend(res, rect = TRUE)
    # Visualize the silhouette
    fviz_silhouette(res)
    # Visualize clusters as scatter plots
    fviz_cluster(res)
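The correlation-based metrics mentioned in the description can be pictured with base R alone. The sketch below assumes the common definition of a correlation distance, d = 1 - r, computed between observations (rows), and then builds and cuts the tree by hand; the choice of "average" linkage and the variable names are assumptions for this example, not a description of what hcut() does internally.

```r
x <- scale(USArrests)   # standardize the variables

# Pearson correlation between observations (rows), turned into a distance
r <- cor(t(x), method = "pearson")
d <- as.dist(1 - r)     # 0 = perfectly correlated, 2 = perfectly anti-correlated

# Hierarchical clustering on the correlation distance, cut into 4 clusters
hc <- hclust(d, method = "average")
cl <- cutree(hc, k = 4)

table(cl)               # cluster sizes
```

A correlation distance groups observations with similar profiles across variables regardless of their overall magnitude, which is why it can give a different partition from the Euclidean default.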
The final k-means clustering solution is very sensitive to the initial random selection of cluster centers. This function provides a solution using a hybrid approach that combines hierarchical clustering and the k-means method. The procedure is explained in the "Details" section. Read more: Hybrid hierarchical k-means clustering for optimizing clustering outputs.
hkmeans(): compute hierarchical k-means clustering
print.hkmeans(): prints the result of hkmeans
hkmeans_tree(): plots the initial dendrogram
    hkmeans(
      x,
      k,
      hc.metric = "euclidean",
      hc.method = "ward.D2",
      iter.max = 10,
      km.algorithm = "Hartigan-Wong"
    )
    ## S3 method for class 'hkmeans'
    print(x, ...)
    hkmeans_tree(hkmeans, rect.col = NULL, ...)
| x | a numeric matrix, data frame or vector |
|---|---|
| k | a single integer specifying the number of clusters to be generated. Must be at least 2 and smaller than |
| hc.metric | the distance measure to be used. Possible values are "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski" (see ?dist). |
| hc.method | the agglomeration method to be used. Possible values include "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid" (see ?hclust). |
| iter.max | the maximum number of iterations allowed for k-means. |
| km.algorithm | the algorithm to be used for kmeans (see ?kmeans). |
| ... | other arguments to be passed to the function plot.hclust() (see ?plot.hclust) |
| hkmeans | an object of class hkmeans (returned by the function hkmeans()) |
| rect.col | vector with border colors for the rectangles around clusters in the dendrogram |
The procedure is as follows:
1. Compute hierarchical clustering
2. Cut the tree into k clusters
3. Compute the center (i.e., the mean) of each cluster
4. Run k-means using the set of cluster centers (defined in step 3) as the initial cluster centers. Optimize the clustering.
This means that the final optimized partitioning obtained at step 4 might be different from the initial partitioning obtained at step 2.
Consider mainly the result displayed by fviz_cluster().
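The four steps above can be sketched with base R functions only. This is a minimal illustration under the defaults listed in the usage (Euclidean distance, "ward.D2" linkage, Hartigan-Wong k-means); the variable names are illustrative and the code is not taken from the hkmeans() source.

```r
df <- scale(USArrests)
k <- 4

# Step 1: hierarchical clustering
hc <- hclust(dist(df, method = "euclidean"), method = "ward.D2")

# Step 2: cut the tree into k clusters
init <- cutree(hc, k = k)

# Step 3: center (mean of each variable) of every initial cluster;
# the result is a k x p matrix with one row per cluster
centers <- apply(df, 2, function(col) tapply(col, init, mean))

# Step 4: k-means started from those centers
km <- kmeans(df, centers = centers, iter.max = 10, algorithm = "Hartigan-Wong")

# Compare the initial (tree-based) and final (k-means) partitions
table(initial = init, final = km$cluster)
```

Off-diagonal counts in the final table, if any, are the observations that k-means reassigned after starting from the hierarchical solution. Because the starting centers are deterministic, repeated runs give the same partition without setting a seed.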
hkmeans returns an object of class "hkmeans" containing the following components:
The elements returned by the standard function kmeans() (see ?kmeans)
data: the data used for the analysis
hclust: an object of class "hclust" generated by the function hclust()
    # Load data
    data(USArrests)
    # Scale the data
    df <- scale(USArrests)
    # Compute hierarchical k-means clustering
    res.hk <- hkmeans(df, 4)
    # Elements returned by hkmeans()
    names(res.hk)
    # Print the results
    res.hk
    # Visualize the tree
    hkmeans_tree(res.hk, cex = 0.6)
    # or use this
    fviz_dend(res.hk, cex = 0.6)
    # Visualize the hkmeans final clusters
    fviz_cluster(res.hk, ellipse.type = "norm", ellipse.level = 0.68)
A data frame containing the frequency with which 13 house tasks are carried out within couples. This table is also available in the ade4 package.
    data("housetasks")
A data frame with 13 observations (house tasks) on the following 4 columns.
- Wife: a numeric vector
- Alternating: a numeric vector
- Husband: a numeric vector
- Jointly: a numeric vector
This data is from the FactoMineR package.
    library(FactoMineR)
    data(housetasks)
    res.ca <- CA(housetasks, graph = FALSE)
    fviz_ca_biplot(res.ca, repel = TRUE) +
      theme_minimal()
Map legacy FactoMineR category names to current labels
    map_factominer_legacy_names(
      X,
      names,
      element = c("quali.var", "quali.sup", "var"),
      quiet = FALSE
    )
| X | a FactoMineR object (MCA, MFA, FAMD, HMFA). |
|---|---|
| names | character vector of category labels. |
| element | element to map. Use "var" for MCA categories or "quali.var" for MFA/FAMD/HMFA qualitative categories. "quali.sup" maps supplementary qualitative categories when available. |
| quiet | if TRUE, suppress warnings. |
Character vector of mapped labels.
    if (requireNamespace("FactoMineR", quietly = TRUE)) {
      data(poison)
      res.mca <- FactoMineR::MCA(poison, quanti.sup = 1:2, quali.sup = 3:4,
                                 graph = FALSE)
      map <- factominer_category_map(res.mca, element = "var")
      map_factominer_legacy_names(res.mca, map$legacy_underscore[1:3],
                                  element = "var", quiet = TRUE)
    }
Data containing clusters of arbitrary shapes. Useful for comparing density-based clustering (DBSCAN) and standard partitioning methods such as k-means clustering.
    data("multishapes")
A data frame with 1100 observations on the following 3 variables.
- x: a numeric vector containing the x coordinates of observations
- y: a numeric vector containing the y coordinates of observations
- shape: a numeric vector corresponding to the cluster number of each observation.
The dataset contains 5 clusters and some outliers/noise.
    data(multishapes)
    plot(multishapes[, 1], multishapes[, 2],
         col = multishapes[, 3], pch = 19, cex = 0.8)
This data comes from a survey of primary school children who suffered from food poisoning. They were asked about their symptoms and about what they ate.
    data("poison")
A data frame with 55 rows and 15 columns.
This data is from the FactoMineR package.
    library(FactoMineR)
    data(poison)
    res.mca <- MCA(poison, quanti.sup = 1:2, quali.sup = c(3, 4), graph = FALSE)
    fviz_mca_biplot(res.mca, repel = TRUE) +
      theme_minimal()
Print method for an object of class factoextra
    ## S3 method for class 'factoextra'
    print(x, ...)
| x | an object of class factoextra |
|---|---|
| ... | further arguments to be passed to print method |
Alboukadel Kassambara
    data(iris)
    res.pca <- prcomp(iris[, -5], scale = TRUE)
    ind <- get_pca_ind(res.pca, data = iris[, -5])
    print(ind)