# Social Network Analysis (week two)

**Reasons for creating network models:**

- they can represent a complex network(s) in a simple way
- they allow you to derive properties mathematically
- they allow you to predict properties and outcomes

**Erdös-Renyi (ER) model**

Assumptions

- nodes connect at random
- network is undirected

Key parameters (besides number of nodes, N):

- p = probability that any two nodes share an edge
- m = total number of edges in the graph

Degree distribution: (N,p)-model

- with probability p we add the edge
- with probability (1-p) we don’t
- probabilities must sum to 1
- the average degree, being (N-1)*p will increase as you increase N
- the average degress, z, equals (n-1) * p
- the binomial distribution gives us the probability that a node has degree k
- the maximum degree of a node in a simple (no multiple edges between the same two nodes) N node graph is (N-1); a node can at most connect to each of the (N-1) other nodes once
- the binomial coefficient is calculated as: the number of ways of choosing k items out of (n-1) = (n-1)! / k! ((n-1) - k)!
- variance in degree is calculated as (n-1) * p * (1-p)
- when p is very small, you switch to the Poisson approximation
- when n is vey large, you switch to the Normal approximation

ER graphs do not generate hubs – there are no “popular” nodes. This happens because the network is undirected, allowing for paths between nodes to be mutual.

The average degree z at which the giant component in an ER random graph starts to emerge is 1.

The percolation threshold of a 2D lattice – the fraction of sites that need to be occupied in order for a giant connected component to emerge – is roughly ½ (occupation probability).

If you have 2 large-components each occupying roughly ½ of the graph, each edge will join the two components with probably ½. Therefore, it typically takes 1-4 edge additions for the 2 large-components to join.

Average shortest path:

- If average degree of each node is z, then the number of nodes at distance l is z^l
- The average shortest path is calculated as N (with distance l) = z^l
- The average shortest path scales at log(N)

Relative to ER, the introduction mode has:

- More closed triads
- Longer average shortest path
- More uneven degree
- Smaller giant component at low p

In the introduction model (included in the Random Graphs Netlogo simulation), a lot of triads are formed as your friend introduces you to their friend. The average shortest path is a bit longer, because each link doesn’t have a chance to bridge completely random parts of the network, instead many are tied up locally in a triad. Finally, because people with more connections get more introductions, the degree distribution is a bit more uneven.

Relative to ER, the static geographical model has:

- A longer average shortest path

The model creates geographically localized ties, making the average shortest path longer, because there are no shortcuts from one end of the network to the other. The giant component also ends up being smaller, while the degree distribution is narrower because each node is aiming for the same number of neighbors.

Relative to ER, the random encounters mode has:

- More closed triads
- Longer average shortest path

The geographic locality of the links adds more triads (a neighbor of a neighbor is likely to be a neighbor) and lengthens the average shortest path while shrinking the giant component.

Relative to ER, the growth model has:

- More nodes with degree 1

The growth model has more younger nodes with degree 1, while the older nodes are more likely to sit in the smaller giant component (links are unevenly distributed and are allocated to nodes that are there to receive them).

**Power Law Degree Distribution**

- Graphed on a log-log scale so that all values are seen. When a power law distribution is graphed on a linear scale, the larger values are unnoticeable, and look as if they had zero nodes.
- Used in networks with hubs / highly-connected nodes (large degree)
- Power law distribution is represented by ln(p(k))=c−αln(k)
- c is the normalization constant, and α is power law exponent. Exponentiate both sides to get that p(k), the probability of observing an node of degree k is given by p(k)=Ck−α
- As the exponent α (alpha) increases, the downward slope of the line on a log-log plot becomes steeper.
- The slope of the line is -α . Therefore the slope grows steeper. Once it exceeds ~3, it’s no longer a truly heavy-tailed distribution.
- There are two ingredients in generating power-law networks
- Nodes appear over time (growth)
- Growth of random networks
- New nodes join overtime and have no preference; the only restriction is that when a new node joins, it must link with another node. Older nodes will on average have more connections and so a higher degree.
- As a node is born (with time t), that node selects m other nodes at random to connect to. At time t there are t nodes. The degree of a node born at time I growths as: dk
*i*(t) / dt= m / t - To get the number of nodes a node accumulates since its birth at time i until time t we integrate the above expression to get k
*i*(t) = m+mlog(t / i). - On average k
*i*(t) > k*j*(t), if i<j. The outcome will show that older nodes on average have more edges. - Preferential attachment / Cumulative attachment: nodes prefer to attach to nodes with many connections.
- Price’s preferential attachment model for citation networks
- First attempt to show preferential attachment
- Probably of attachment with degree K is proportional to K + 1 (the +1 prevents a code start problem, should a node have 0 degree)
- Power law with exponent α = 2 + 1/m
- Barabasi-Albert model
- The BA model was developed in an effort to explain the skewed degree distribution of the World Wide Web. Given that newer websites link to older websites (older content), making it difficult for the newer website to catch up (aka acquire lots of links from other websites). Having links determine the hubs is not entirely realistic and so there are ways to adjust for other differences (better web design, upkeep, reputation, etc).
- Each node connects to other nodes with probability proportional to their degree
- The process starts with some initial subgraph of a connected graph
- Each new node comes in with m edges
- The probability of connecting to node i is proportional to the degree of i (higher degree nodes will be chosen by new nodes)
- Result in power-law with exponent α = 3, P(k) = 2m^2 / k^3

*Above are notes I’ve taken to keep track of the material covered in the Social Network Analysis course I am taking on Coursera. Notes are derived from week-based video lectures and supplemental information I’ve found online to help clarify certain concepts. If you are taking this course, most of the information below will be found in Week 2 associated material. If you are not taking the course, I hope you find the information below helpful in your search*