4 Centrality

When we first read a graph into R, it is generally a good idea to explore it a bit to get a sense of its overall properties. The best place to start is with the graph summary. We can use Padgett’s data on Florentine marriages to explore the analysis of graphs using igraph.

require(igraph)
flo <- read.table("data/flo.txt", 
                  header=TRUE, row.names=1)
gflo <- graph_from_adjacency_matrix(as.matrix(flo), mode="undirected")
gflo

IGRAPH dba1057 UN-- 16 20 -- 
+ attr: name (v/c)
+ edges from dba1057 (vertex names):
 [1] Acciaiuoli--Medici       Albizzi   --Ginori       Albizzi   --Guadagni    
 [4] Albizzi   --Medici       Barbadori --Castellani   Barbadori --Medici      
 [7] Bischeri  --Guadagni     Bischeri  --Peruzzi      Bischeri  --Strozzi     
[10] Castellani--Peruzzi      Castellani--Strozzi      Guadagni  --Lamberteschi
[13] Guadagni  --Tornabuoni   Medici    --Ridolfi      Medici    --Salviati    
[16] Medici    --Tornabuoni   Pazzi     --Salviati     Peruzzi   --Strozzi     
[19] Ridolfi   --Strozzi      Ridolfi   --Tornabuoni

The summary of our graph provides a great deal of information, even if it is a bit terse. The first piece of information is that it is, in fact, an igraph object. Following the IGRAPH indicator (and a short identifier for the graph, here dba1057), we get a four-character code indicating what type of graph it is, with a dash standing in for each property that does not apply. For the Florentine marriage graph, the code is UN--, which means that it is undirected (U) and named (N). If our graph were weighted, the third character would be W and, if it were bipartite, the fourth would be B. As gflo is neither weighted nor bipartite, only the first two letters, UN, appear. Following this code, we get the number of vertices and edges in the graph, 16 20. There are 16 vertices and 20 edges in gflo. Finally, we see that there is one attribute contained in the graph, + attr: name (v/c). This means that there is a vertex attribute (v) called “name” which is of type character (c).

Given that there are a reasonably small number of vertices and (especially) edges, another good idea is to visualize the graph. In general, it is always a good idea to do this. However, if the graph is really big and has lots of edges, the visualization may take some time, so you shouldn’t do it casually. For the Florentine marriages, we have only 16 vertices and 20 edges, so plotting is simple.

plot(gflo, vertex.color="skyblue2", vertex.label.family="Helvetica")

One of the first things that we discover when we visualize the Florentine marriage network is that one family, Pucci, is an isolate. There were no marriage ties between the Puccis and any of the other 15 families in Padgett’s sample.

Because the graph is so small, it’s easy to see that Pucci is an isolate. In other cases, it may not be so obvious. There are a number of things we can do to study the overall connectedness of a graph. The first is to ask whether the graph is, in fact, connected. For a graph \(\mathcal{G}\) to be connected, there must be a path between every pair of vertices in \(\mathcal{G}\). If the graph is not connected, we might want to know how many distinct components there are. A component is a maximal connected subgraph of a graph (i.e., a path exists between all vertices in the subgraph, and no further vertex can be added without breaking this property). We can find the components of our graph using the igraph function components(). Older versions of igraph called this function clusters(), a somewhat unfortunate name because “cluster” has many meanings in both network analysis and beyond.

is_connected(gflo)

[1] FALSE

components(gflo)

$membership
  Acciaiuoli      Albizzi    Barbadori     Bischeri   Castellani       Ginori 
           1            1            1            1            1            1 
    Guadagni Lamberteschi       Medici        Pazzi      Peruzzi        Pucci 
           1            1            1            1            1            2 
     Ridolfi     Salviati      Strozzi   Tornabuoni 
           1            1            1            1 

$csize
[1] 15  1

$no
[1] 2

We can see that gflo is not connected, as we already knew from visual inspection, since Pucci is an isolate. There are two components in gflo: the main component that includes 15 of the 16 families and then a component of one (Pucci), also known as an isolate.

Knowing that the graph is not connected, we can use the command decompose() to extract its components. In this case, we really only care about the first component, since the second of the two components is simply an isolate.

gflo1 <- decompose(gflo)[[1]]
## could also do this: gflo1 <- induced_subgraph(gflo, subcomponent(gflo,1))
is_connected(gflo1)

[1] TRUE

plot(gflo1, vertex.color="skyblue2",vertex.label.family="Helvetica")

The number of unordered pairs of vertices in a graph of size \(n\) is \(n(n-1)/2\). This means that the density of edges in a graph is simply given by the ratio of the number of observed edges to the number of possible edges, \(2e/[n(n-1)]\), where \(e\) is the number of edges in the graph.

## note we're working with the matrix, not the graph object here
n <- dim(flo)[1]
2*sum(apply(flo,1,sum)/2)/(n*(n-1))

[1] 0.1666667

Here, we summed along the rows of the matrix and then summed the resulting vector to get the number of nonzero entries in the sociomatrix. We then divided by two since each marriage is represented twice in the sociomatrix (since it is an undirected relation). Taking out the Pucci isolate, we get a slightly different density. We can use the igraph function edge_density() (called graph.density() in older versions of igraph) to reassure ourselves that this is, in fact, the right value.

n <- n-1
2*sum(apply(flo,1,sum)/2)/(n*(n-1))

[1] 0.1904762

edge_density(gflo1)

[1] 0.1904762

4.1 The Different Flavors of Centrality

In his classic essay, Freeman (1978) lays out the different notions of centrality:

degree centrality: captures the idea that individuals with many contacts are central to a structure. This measure is calculated simply as the degree of the individual actor.
closeness centrality: captures the idea that a central actor will be close to many other actors in the network. Closeness is measured by the lengths of the geodesics between actor \(i\) and all others.
betweenness centrality: captures the idea that high-centrality individuals should be on the shortest paths between other pairs of actors in a network. The betweenness of actor \(i\) is the sum, over all pairs of other actors, of the fraction of geodesics between the pair on which \(i\) falls.

4.1.1 Degree

For non-directed graphs, degree centrality of vertex \(i\) is simply the number of edges incident to \(v_i\):

\[ C_D(v_i) = d(v_i) = \sum_j x_{ij} = \sum_j x_{ji}. \]

Note that while we call this a “centrality” measure, it is simply the degree of node \(i\). This observation gets at the fundamental confounding of degree-based and centrality-based measures of social structure discussed in Salathé and Jones (2010). Sometimes this measure will be standardized by the size of the graph \(n\): \(C'_D(v_i) = d(v_i)/(n-1)\), where we subtract one from the size of the graph to account for the actor itself.
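As a minimal sketch of the degree calculation, the base R code below builds a small undirected sociomatrix and computes both the raw and the standardized degree centrality. The five-vertex toy graph and the object names (X, CD, CD_std) are illustrative assumptions and are not part of the Florentine data.

```r
## Toy undirected graph (hypothetical): edges a-b, a-c, b-d, c-d, d-e
v <- c("a", "b", "c", "d", "e")
X <- matrix(0, 5, 5, dimnames = list(v, v))
edges <- rbind(c("a","b"), c("a","c"), c("b","d"), c("c","d"), c("d","e"))
X[edges] <- 1            # fill x_ij for each edge
X[edges[, 2:1]] <- 1     # and x_ji, since the relation is undirected

## degree centrality: row sums (equal to column sums for a symmetric matrix)
CD <- rowSums(X)
stopifnot(all(CD == colSums(X)))

## standardized degree: divide by the n-1 possible alters
CD_std <- CD / (nrow(X) - 1)

CD
CD_std
```

Vertex d, with three alters out of a possible four, has the highest degree in this toy graph.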

4.1.2 Closeness

For non-directed graphs, closeness centrality of vertex \(i\) is the inverse of the sum of the geodesic distances between \(v_i\) and \(v_j,~~j\neq i\):

\[ C_C(v_i) = \left[ \sum_{j=1}^n d(v_i,v_j)\right]^{-1}, \]

where \(d(v_i, v_j)\) is the distance (measured as the minimum distance or geodesic) between vertices \(i\) and \(j\). Note that if any \(j\) is not reachable from \(i\), \(C_C(v_i)=0\) since the distance between \(i\) and \(j\) is infinite! This means that we often want to restrict our measurements of centrality to connected components of a graph. To standardize, we multiply by \(n-1\), the number of vertices not including \(i\): \(C'_C(v_i) = (n-1) C_C(v_i)\).
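The closeness calculation can be sketched in base R as follows. The helper geodesic_dist() is a hypothetical function written for this illustration (it is not an igraph function): it finds geodesic distances by taking successive powers of the sociomatrix, which is fine for small graphs but not an efficient general-purpose method. The toy graph is again an assumption.

```r
## Toy undirected graph (hypothetical): edges a-b, a-c, b-d, c-d, d-e
v <- c("a", "b", "c", "d", "e")
X <- matrix(0, 5, 5, dimnames = list(v, v))
edges <- rbind(c("a","b"), c("a","c"), c("b","d"), c("c","d"), c("d","e"))
X[edges] <- 1
X[edges[, 2:1]] <- 1

## hypothetical helper: geodesic distance is the smallest p for which
## (X^p)[i,j] > 0, i.e., the length of the shortest walk from i to j
geodesic_dist <- function(X) {
  n <- nrow(X)
  D <- matrix(Inf, n, n, dimnames = dimnames(X))
  diag(D) <- 0
  P <- diag(n)                       # P holds X^p
  for (p in seq_len(n - 1)) {
    P <- P %*% X
    newly <- (P > 0) & is.infinite(D)
    D[newly] <- p                    # first time the pair is reached
  }
  D
}

D <- geodesic_dist(X)
CC <- 1 / rowSums(D)                 # closeness
CC_std <- (nrow(X) - 1) * CC         # standardized closeness

## adding an isolate makes every distance sum infinite, so closeness is 0
X_iso <- rbind(cbind(X, 0), 0)
CC_iso <- 1 / rowSums(geodesic_dist(X_iso))
```

The last two lines illustrate why closeness is usually computed on a connected component: once a single vertex is unreachable, every score collapses to zero.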

4.1.3 Betweenness

For non-directed graphs, betweenness centrality of vertex \(i\) is the sum, over all pairs of other vertices, of the fraction of geodesics between the pair on which \(i\) lies:

\[ C_B(v_i) = \sum_{j<k} \frac{g_{jk}(v_i)}{g_{jk}}, \]

where \(g_{jk}\) is the number of geodesics linking actors \(j\) and \(k\) and \(g_{jk}(v_i)\) is the number of geodesics linking actors \(j\) and \(k\) that contain actor \(i\). It can be standardized by dividing by the number of pairs of actors not including \(i\), \((n-1)(n-2)/2\): \(C'_B(v_i) = C_B(v_i)/[(n-1)(n-2)/2]\).
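The betweenness formula can be sketched directly in base R. The helper betweenness_sketch() is a hypothetical function for illustration only: it counts geodesics through powers of the sociomatrix and then loops over all pairs, so it is far slower than a purpose-built routine and is meant for small graphs. It relies on the fact that \(i\) lies on a geodesic between \(j\) and \(k\) exactly when \(d(v_j,v_i)+d(v_i,v_k)=d(v_j,v_k)\), in which case \(g_{jk}(v_i)=g_{ji}\,g_{ik}\).

```r
## Toy undirected graph (hypothetical): edges a-b, a-c, b-d, c-d, d-e
v <- c("a", "b", "c", "d", "e")
X <- matrix(0, 5, 5, dimnames = list(v, v))
edges <- rbind(c("a","b"), c("a","c"), c("b","d"), c("c","d"), c("d","e"))
X[edges] <- 1
X[edges[, 2:1]] <- 1

betweenness_sketch <- function(X) {
  n <- nrow(X)
  D <- matrix(Inf, n, n); diag(D) <- 0   # geodesic distances
  S <- diag(n)                            # number of geodesics g_jk
  P <- diag(n)                            # P holds X^p
  for (p in seq_len(n - 1)) {
    P <- P %*% X
    newly <- (P > 0) & is.infinite(D)
    D[newly] <- p
    S[newly] <- P[newly]   # walks of geodesic length are the geodesics
  }
  CB <- numeric(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n - 1)) {
      for (k in (j + 1):n) {
        if (i == j || i == k || is.infinite(D[j, k])) next
        ## i is on a j-k geodesic iff the distances add up
        if (D[j, i] + D[i, k] == D[j, k])
          CB[i] <- CB[i] + S[j, i] * S[i, k] / S[j, k]
      }
    }
  }
  names(CB) <- rownames(X)
  CB
}

CB <- betweenness_sketch(X)
n <- nrow(X)
CB_std <- CB / ((n - 1) * (n - 2) / 2)    # standardized betweenness
CB
```

In this toy graph there are two geodesics between a and d (through b and through c), so b and c each receive half a point for that pair.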

Another notion of centrality is that a central person is someone who knows people who know a lot of people. This idea can be captured using a measure known as information centrality. The calculation of information centrality is a bit more complicated than for the other three measures and requires some linear algebra. Start with the sociomatrix \(\mathbf{X}\). From this, we calculate an intermediate matrix \(\mathbf{A}\). For a binary relation, \(a_{ij}=0\) if \(x_{ij}=1\) and \(a_{ij}=1\) if \(x_{ij}=0\) for \(i \neq j\) (that is, the non-diagonal elements of \(\mathbf{A}\) are the complements of their values in \(\mathbf{X}\)). The diagonal elements of \(\mathbf{A}\) (\(a_{ii}\)) are simply the degree of vertex \(i\) plus one, \(a_{ii} = d(v_i)+1\). Once we have \(\mathbf{A}\), we invert it yielding a new matrix \(\mathbf{C}=\mathbf{A}^{-1}\). We then calculate \(T\), the trace of \(\mathbf{C}\), which is simply the sum of its diagonal elements and \(R\) which is one of the row sums of \(\mathbf{C}\) (they are all the same). Information centrality is then simply

\[ C_I(v_i) = \frac{1}{c_{ii} + (T - 2R)/n}, \]

where \(c_{ii}\) is the \(i\)th diagonal element of \(\mathbf{C}\) and \(n\) is the size of the graph.

There is no implementation of information centrality in igraph. This can be calculated either in the package sna or in my function infocentral.R. This function takes a sociomatrix as its only argument. It assumes that the matrix is binary.

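infocentral <- function(X){
  ## information centrality; assumes a binary, symmetric relation
  X <- as.matrix(X)
  k <- nrow(X)
  ## off-diagonal elements of A are the complements of those in X
  A <- matrix(as.numeric(!X), nrow=k, ncol=k)
  ## diagonal elements of A are the degrees plus one
  diag(A) <- rowSums(X) + 1
  C <- solve(A)
  ## trace of C (named tr to avoid masking T, the shorthand for TRUE)
  tr <- sum(diag(C))
  ## any row sum of C will do, since they are all the same
  R <- sum(C[1,])
  ic <- 1/(diag(C) + (tr - 2*R)/k)
  return(ic)
}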

4.1.4 Eigenvalue Centrality

Yet another approach to centrality was suggested by Bonacich (1972). He suggests that the eigenvectors of the sociomatrix are a fruitful way of thinking about centrality. As with information centrality, the eigenvector approach captures the idea that central people will have well-connected alters but that the relative importance of these alters falls off with distance from ego. While the eigenvector (preferably the dominant one) of the sociomatrix is an excellent measure of centrality, the metric Bonacich (1987) suggests is actually a bit more complex:

\[ C(\alpha,\beta) = \alpha(\mathbf{I} - \beta \mathbf{X})^{-1} \mathbf{X}\, \mathbf{1}, \]

where \(\alpha\) is a scaling parameter, \(\beta\) measures the extent to which an actor’s status is a function of the statuses of its alters (that is, the decay of influence with distance from the focal actor), \(\mathbf{I}\) is an identity matrix of the same dimensions as the sociomatrix \(\mathbf{X}\), and \(\mathbf{1}\) is a column vector of ones.

The size of \(\beta\) determines the degree to which Bonacich centrality is a measure of local or global centrality. When \(\beta=0\), only an actor’s direct ties are taken into account – Bonacich centrality becomes proportional to degree centrality. However, when \(\beta>0\), an actor’s alters’ ties are also taken into account. The larger the value of \(\beta\), the more distant ties will matter. It is also possible for \(\beta\) to be less than zero. In this case, being connected to powerful alters who themselves have many alters negatively affects an actor’s status. This seemingly odd situation captures the effect observed in bargaining experiments performed on networks by Cook et al. (1983), where an individual’s ability to negotiate a favorable outcome is lessened when he or she must bargain with powerful, well-connected alters.
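A minimal base R sketch of both ideas follows, using a toy graph that is an assumption of this example. The function bonacich() is a hypothetical helper that evaluates the formula above literally. It does not reproduce any rescaling that a packaged implementation might apply, and the value \(\beta=0.2\) is chosen only because it is smaller than the reciprocal of the dominant eigenvalue of this particular matrix.

```r
## Toy undirected graph (hypothetical): edges a-b, a-c, b-d, c-d, d-e
v <- c("a", "b", "c", "d", "e")
X <- matrix(0, 5, 5, dimnames = list(v, v))
edges <- rbind(c("a","b"), c("a","c"), c("b","d"), c("c","d"), c("d","e"))
X[edges] <- 1
X[edges[, 2:1]] <- 1

## eigenvector centrality: dominant eigenvector, scaled so the maximum is 1
e <- eigen(X, symmetric = TRUE)      # eigenvalues in decreasing order
CE <- abs(e$vectors[, 1])            # sign of an eigenvector is arbitrary
CE <- CE / max(CE)
names(CE) <- v

## Bonacich (1987): C(alpha, beta) = alpha (I - beta X)^{-1} X 1
bonacich <- function(X, alpha = 1, beta = 0) {
  n <- nrow(X)
  cab <- alpha * solve(diag(n) - beta * X) %*% X %*% rep(1, n)
  setNames(drop(cab), rownames(X))
}

C0   <- bonacich(X, beta = 0)        # reduces to alpha times degree
Cpos <- bonacich(X, beta = 0.2)      # alters' ties add to ego's score
Cneg <- bonacich(X, beta = -0.2)     # well-connected alters reduce it

rbind(degree = rowSums(X), C0, Cpos, Cneg, CE = round(CE, 2))
```

With \(\beta=0\) the scores equal the degrees, and moving \(\beta\) away from zero in either direction shows how the status of an actor’s alters raises or lowers the actor’s own score.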

4.1.5 Comparing Centralities

Consider the centrality measures on Padgett’s Florentine marriage data. For these analyses, we will take out the Pucci family since they are an isolate. For the standard centrality measures discussed by Freeman, Pucci will have a score of zero. Information and eigenvalue centrality only work on connected graphs.

# need a matrix
# remove Pucci
flo1 <- flo[-12,-12]
ic <- infocentral(flo1)
CP <- abs(power_centrality(gflo1))
CE <- eigen_centrality(gflo1)$vector

flo_measures <- cbind(CD=degree(gflo1),
                CB=round(betweenness(gflo1),1),
                CC=round(closeness(gflo1),2),
                CI=round(ic,2),
                CP=round(CP,2),
                CE=round(CE,2))
              
dimnames(flo_measures)[[1]] <- dimnames(flo1)[[1]]
flo_measures

             CD   CB   CC   CI   CP   CE
Acciaiuoli    1  0.0 0.03 0.55 0.37 0.31
Albizzi       3 19.3 0.03 0.83 2.02 0.57
Barbadori     2  8.5 0.03 0.76 1.47 0.49
Bischeri      3  9.5 0.03 0.83 0.00 0.66
Castellani    3  5.0 0.03 0.79 1.29 0.60
Ginori        1  0.0 0.02 0.48 1.84 0.17
Guadagni      4 23.2 0.03 0.92 0.18 0.67
Lamberteschi  1  0.0 0.02 0.51 0.00 0.21
Medici        6 47.5 0.04 1.06 0.55 1.00
Pazzi         1  0.0 0.02 0.39 0.00 0.10
Peruzzi       3  2.0 0.03 0.78 0.55 0.64
Ridolfi       3 10.3 0.04 0.90 1.29 0.79
Salviati      2 13.0 0.03 0.60 0.18 0.34
Strozzi       4  9.3 0.03 0.88 0.18 0.83
Tornabuoni    3  8.3 0.03 0.90 1.10 0.76

Plot the graph again, this time with the vertices sized according to betweenness centrality.

plot(gflo1, vertex.color="plum", vertex.size=flo_measures[,"CB"]+1, vertex.label.family="Helvetica", vertex.label.cex=0.5)

While centrality measures capture different notions of centrality, power, prestige, etc., they are generally fairly highly correlated. This said, some measures can be quite divergent. We can see this with the Bonacich power centralities (CP) of the Florentine families. The Medici are clearly central to this marriage network: they have the highest degree, betweenness, and eigenvalue centrality. However, because they are so dominant, their alters do not have as many connections as they do. As a result, the Medici have a fairly low power centrality (0.55), while the Albizzi have the highest (2.02).

## correlation matrix of the centralities
cor(flo_measures)

            CD         CB        CC        CI          CP         CE
CD  1.00000000 0.84392220 0.7463380 0.9207909 -0.02813901 0.92473505
CB  0.84392220 1.00000000 0.6592939 0.7143692  0.01003666 0.66445276
CC  0.74633802 0.65929385 1.0000000 0.8238569  0.12584573 0.83955267
CI  0.92079092 0.71436925 0.8238569 1.0000000  0.14762599 0.97762806
CP -0.02813901 0.01003666 0.1258457 0.1476260  1.00000000 0.06943349
CE  0.92473505 0.66445276 0.8395527 0.9776281  0.06943349 1.00000000

## vertex size proportional to eigenvalue centrality
plot(gflo1, vertex.size=flo_measures[,"CE"]*10, vertex.color="plum",
     vertex.label.family="Helvetica")

We can redo our comparative analysis of centralities with Kapferer’s tailor shop.

A <- as.matrix(read.table("data/kapferer-tailorshop1.txt", 
                          header=TRUE, row.names=1))
G <- graph_from_adjacency_matrix(A, mode="undirected", diag=FALSE)
plot(G,vertex.shape="none", vertex.label.cex=0.75, 
     vertex.label.family="Helvetica", edge.color=grey(0.85))

Calculate the centralities.

## use ic rather than c so as not to mask base R's c()
ic <- infocentral(A)
CP <- abs(power_centrality(G))
CE <- eigen_centrality(G)$vector

kap_measures <- cbind(CD=degree(G),
                CB=round(betweenness(G),1),
                CC=round(closeness(G),2),
                CI=round(ic,2),
                CP=round(CP,2),
                CE=round(CE,2))
dimnames(kap_measures)[[1]] <- dimnames(A)[[1]]
kap_measures

          CD    CB   CC   CI   CP   CE
KAMWEFU    4   0.2 0.01 1.94 0.66 0.20
NKUMBULA   5   0.0 0.01 2.19 1.95 0.32
ABRAHAM   13  25.3 0.01 3.02 1.49 0.59
SEAMS      9   8.7 0.01 2.73 0.91 0.48
CHIPATA    5   1.4 0.01 2.16 0.28 0.24
DONALD     6   0.2 0.01 2.39 1.66 0.40
NKOLOYA    6   3.7 0.01 2.36 0.90 0.33
MATEO      3   1.2 0.01 1.70 1.97 0.18
CHILWA     9  17.8 0.01 2.73 0.08 0.40
CHIPALO    1   0.0 0.01 0.79 0.94 0.06
LYASHI    15  43.1 0.02 3.13 0.60 0.68
ZULU      14  68.4 0.02 3.10 0.80 0.64
HASTINGS  10  16.6 0.01 2.85 1.56 0.51
LWANGA     8  11.3 0.01 2.67 0.46 0.37
NYIRENDA   5   7.2 0.01 2.22 1.39 0.28
CHISOKONE 24 155.9 0.02 3.48 0.44 1.00
ENOCH      2   0.0 0.01 1.33 0.70 0.14
PAULOS     7  20.8 0.01 2.52 1.33 0.30
MUKUBWA   17  67.1 0.02 3.29 1.28 0.81
SIGN       1   0.0 0.01 0.69 1.18 0.01
KALAMBA    8  15.5 0.01 2.69 1.02 0.44
ZAKEYO     1   0.0 0.01 0.73 0.88 0.02
BEN        7  44.6 0.01 2.39 0.74 0.24
IBRAHIM   11  25.5 0.01 2.93 0.01 0.51
MESHAK     4   1.2 0.01 1.99 1.04 0.24
ADRIAN     2   0.0 0.01 1.33 0.36 0.15
KALUNDWE   5  40.0 0.01 2.00 1.05 0.14
MPUNDU     9  23.9 0.01 2.74 0.49 0.31
JOHN       9  12.0 0.01 2.75 0.20 0.43
JOSEPH    10  11.4 0.01 2.83 0.43 0.51
WILLIAM   10  14.3 0.01 2.86 0.05 0.50
HENRY     14  37.0 0.02 3.10 1.08 0.64
CHOBE     10  19.8 0.01 2.83 0.76 0.42
MUBANGA   14  54.6 0.02 3.11 0.24 0.67
CHRISTIAN  8   5.7 0.01 2.64 1.41 0.37
KALONGA   10   9.8 0.01 2.82 0.63 0.47
ANGEL      6   0.4 0.01 2.36 1.18 0.35
CHILUFYA   9   7.1 0.01 2.73 0.54 0.45
MABANGE    5   1.2 0.01 2.18 1.12 0.25

cor(kap_measures)

           CD         CB         CC         CI         CP         CE
CD  1.0000000  0.8318384  0.7356737  0.8838023 -0.2407085  0.9732954
CB  0.8318384  1.0000000  0.7572213  0.5650055 -0.2006116  0.7385235
CC  0.7356737  0.7572213  1.0000000  0.4892835 -0.1084082  0.7165777
CI  0.8838023  0.5650055  0.4892835  1.0000000 -0.2047442  0.8989130
CP -0.2407085 -0.2006116 -0.1084082 -0.2047442  1.0000000 -0.1795482
CE  0.9732954  0.7385235  0.7165777  0.8989130 -0.1795482  1.0000000

Next, consider Sade’s (1972) grooming matrix from the Cayo Santiago rhesus macaques. We will illustrate automatic coloring of vertices here.

rhesus <- read.table("data/sade1.txt", skip=1, header=FALSE)
rhesus <- as.matrix(rhesus)
rhesus

      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
 [1,]  0  0  0  0  0  0  0  3  4   0   0   0   0   0   0   0
 [2,]  0  0  0  0  0  0  1  0  0   0   0   0   0   0   0   0
 [3,]  0  0  0 15  8  0  2  0  2   1   9   3   1   0   4   2
 [4,] 17  0  5  0  0  0  8  0  1   0   4   0   3   1   0   0
 [5,]  0  0  5  0  0  0  2  1  0   2   0   0   0   0   0  11
 [6,]  0  0  0  0  0  0  0  2  2   1   0   3   1   3   0   0
 [7,]  0  0  0 11  1  0  0  0  0   0   0   0   0   0   0   0
 [8,] 49  3  0  0  0  1  0  0 41   3   2   1   6   9   0   0
 [9,] 25  0  0  0  0  0  0  8  0   5   1   9   2  21   2   0
[10,]  0  0  0  0  0  0  0  4  6   0   0   4   0   1   2   6
[11,]  1  0  5  3  0  0  0  4  4   2   0   8   5  11  16   0
[12,]  5  0  1  0  0  5  0  9  7   4   1   0   0  10   1   0
[13,]  0  0  2  1  0  0  3  3 24   0   4   5   0  25   2   1
[14,]  0  1  0  0  0  4  0  6 23   0   4  13   2   0   0   1
[15,]  1  0  2  0  0  0  0  0  9   2  21   3   4   5   0   0
[16,]  0  3  0  0  1  0  1  1  1   8   1   0   5   2   1   0

nms <- c("066", "R006", "CN", "ER", "CY", "EC", "EZ", "004", "065", "022", "076", "AC", "EK", "DL", "KD", "KE")
sex <- c(rep("M",7), rep("F",9))
dimnames(rhesus)[[1]] <- nms
dimnames(rhesus)[[2]] <- nms
grhesus <- graph_from_adjacency_matrix(rhesus, weighted=TRUE)
V(grhesus)$sex <- sex

rhesus.layout <- layout_with_kk(grhesus)
plot(grhesus, 
     edge.width=log10(E(grhesus)$weight)+1, 
     edge.arrow.width=0.5,
     edge.color=grey(0.85),
     vertex.shape="none",
     vertex.label=V(grhesus)$name, 
     vertex.label.family="Helvetica",
     vertex.label.color=as.numeric(V(grhesus)$sex=="F")+5, 
     layout=rhesus.layout)

The layout shows nicely that this is a classic female-philopatric species. Note that males are all laid out on the periphery of the graph.

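Now we can look at centrality. Note that infocentral() assumes a binary relation, whereas the grooming matrix is valued and directed, so the CI column should be read with caution.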

## use ic rather than c so as not to mask base R's c()
ic <- infocentral(rhesus)
CP <- abs(power_centrality(grhesus))
CE <- eigen_centrality(grhesus)$vector

rh_measures <- cbind(CD=degree(grhesus),
               CB=round(betweenness(grhesus),1),
               CC=round(closeness(grhesus),2),
               CI=round(ic,2),
               CP=round(CP,2),
               CE=round(CE,2))
dimnames(rh_measures)[[1]] <- nms
rh_measures

     CD   CB   CC    CI   CP   CE
066   8  3.5 0.01  4.95 0.31 0.66
R006  4  2.7 0.01  1.52 0.28 0.03
CN   16 17.8 0.03 11.33 0.93 0.13
ER   11  4.0 0.02 10.80 2.17 0.17
CY    8 27.5 0.02  8.78 2.07 0.03
EC    9 25.5 0.03  6.76 0.54 0.11
EZ    8 14.9 0.02  6.89 0.22 0.04
004  19 60.6 0.03 13.19 0.18 0.88
065  20  6.5 0.02 12.37 0.01 1.00
022  15 24.5 0.02  9.10 1.33 0.20
076  19 11.2 0.02 11.90 0.69 0.37
AC   18 17.6 0.02 11.08 1.23 0.45
EK   19 37.7 0.03 12.28 1.06 0.51
DL   18 32.8 0.03 11.71 0.41 0.73
KD   15  9.3 0.02 11.33 0.55 0.29
KE   15 48.5 0.03  9.31 0.50 0.09

cor(rh_measures)

           CD          CB          CC        CI          CP         CE
CD  1.0000000  0.39329155  0.54424110 0.9008141 -0.11175161  0.6298551
CB  0.3932916  1.00000000  0.75733693 0.3954524 -0.07372445  0.1758193
CC  0.5442411  0.75733693  1.00000000 0.6097230 -0.01358132  0.1037271
CI  0.9008141  0.39545240  0.60972296 1.0000000  0.15770410  0.4977592
CP -0.1117516 -0.07372445 -0.01358132 0.1577041  1.00000000 -0.4277270
CE  0.6298551  0.17581933  0.10372712 0.4977592 -0.42772703  1.0000000

Plot vertex size according to betweenness centrality.

plot(grhesus, edge.arrow.size=0.5, 
     edge.color=grey(0.85),
     vertex.size=rh_measures[,"CB"]/1.5, 
     vertex.color="plum", 
     vertex.frame.color="plum",
     vertex.label=NA, 
     layout=rhesus.layout)

It is somewhat surprising that the individual with the highest betweenness is 004, who is rather peripheral to the core.