Introduction to networks and network data collection
- Background to the study of social structure
- Contextualizing SNA and relational data
- Network datasets
- Network data
- Ethics and sensitivity
- Cleaning network data (is miserable)
- Data flexibility
Background to the study of social structure
Social-network analysis (SNA) has a complicated genealogy, but has very clear roots in the tradition of British social anthropology. This can be seen in Radcliffe-Brown’s structural-functionalism and especially the work of the Manchester School ethnographers like Elizabeth Bott and John Barnes.
- Make a scientific study of sociality by generalizing the particular: focus on systems, rather than culture
- Social function: how a social behavior is related to the overall structure
  - The particular tie A <–> B is not important in itself
  - What matters is how it forms a component of the whole structure
- Thus, behaviors may be particular, but structure is somewhat predictable.
- Study of social structure is not a deviation from natural science: physics (atomic); chemistry (molecular); biology (cellular); anthropology/ethology (social)
Contextualizing SNA and relational data
- Derived from structural-functionalist anthropology, but utilized in many fields
- Can measure network structure and/or simulate activity within a network
- The SNA endeavor can be highly computational AND highly ethnographic
Methodological characteristics | Data categories | Description level | Data-collection approach |
---|---|---|---|
Quantitative | Variable | Individual | Survey |
Network | Relational | Relationship | Survey / Ethnographic techniques / Observation |
Qualitative | Ideational | Group | Open interviews / Observation |
Network datasets
- Typically (but not obligatorily), a field-collected “network” dataset is:
  - ego-centric (as opposed to a complete sociomatrix)
  - sampled
  - cross-sectional
- Thus, a sociomatrix (below) is rarely useful.
| | ego | alter1 | alter2 | alter3 | alter4 | alter5 |
|---|---|---|---|---|---|---|
| ego | - | 1 | 1 | 1 | 1 | 1 |
| alter1 | 1 | - | 1 | 1 | 0 | 1 |
| alter2 | 1 | 1 | - | 0 | 0 | 0 |
| alter3 | 1 | 1 | 0 | - | 1 | 0 |
| alter4 | 1 | 0 | 0 | 1 | - | 1 |
| alter5 | 1 | 1 | 0 | 0 | 1 | - |
- These characteristics reflect the tradeoffs of real-world data collection.
- Their effect on data visualization and analysis must be planned for at the outset of data collection.
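The sociomatrix above can be read directly as an adjacency matrix. The sketch below stores it as a symmetric 0/1 matrix (the "-" diagonal becomes 0) and computes each node's degree and the tie density, once for the whole ego network and once for the alters alone. Alter-alter density is a common summary for ego-centric data; the function and variable names are illustrative and not taken from any particular package.

```python
# Ego network from the sociomatrix above; "-" on the diagonal is stored as 0.
nodes = ["ego", "alter1", "alter2", "alter3", "alter4", "alter5"]
adj = [
    [0, 1, 1, 1, 1, 1],  # ego
    [1, 0, 1, 1, 0, 1],  # alter1
    [1, 1, 0, 0, 0, 0],  # alter2
    [1, 1, 0, 0, 1, 0],  # alter3
    [1, 0, 0, 1, 0, 1],  # alter4
    [1, 1, 0, 0, 1, 0],  # alter5
]

def is_symmetric(m):
    """Undirected ties: cell (i, j) must equal cell (j, i)."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))

def degrees(m):
    """Number of ties each node has (row sums)."""
    return [sum(row) for row in m]

def density(m, keep):
    """Observed ties divided by possible ties among the node indices in `keep`."""
    k = len(keep)
    ties = sum(m[i][j] for a, i in enumerate(keep) for j in keep[a + 1:])
    return ties / (k * (k - 1) / 2)

assert is_symmetric(adj)
deg = dict(zip(nodes, degrees(adj)))
whole = density(adj, list(range(len(nodes))))
alters_only = density(adj, list(range(1, len(nodes))))  # drop ego (index 0)
print(deg)  # {'ego': 5, 'alter1': 4, 'alter2': 2, 'alter3': 3, 'alter4': 3, 'alter5': 3}
print(round(whole, 3), alters_only)  # 0.667 0.5
```

Ego is tied to everyone by construction, so the whole-network density is inflated; dropping ego gives the more informative alter-alter figure (5 of 10 possible ties).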
Network data
Let’s assume you are going to collect “typical” network data in an ethnographic context: ego-centric networks.
You’ll need to:
- Construct an appropriate set of name-generator questions
- Select necessary/elicitable information about the contacts (i.e., name-interpreter questions)
- Elicit alter-alter relationships for ego’s alters if you are collecting more than a minimal ego-network
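One way to keep these three pieces together is to treat each interview as a single record: the alters named in response to the name generators, the name-interpreter answers for each alter, and the alter pairs ego reports as connected. The sketch below is a hypothetical structure (the class names, fields, and example names are all invented for illustration) that flattens one record into ego-alter and alter-alter tie rows.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Alter:
    label: str        # how ego referred to the contact (may be a nickname)
    attributes: dict  # name-interpreter answers, e.g. {"sex": 2}

@dataclass
class EgoInterview:
    ego_id: str
    alters: list                                  # Alter objects from the name generators
    alter_ties: set = field(default_factory=set)  # frozensets of alter labels ego says are tied

    def pairs_to_ask(self):
        """Every unordered alter-alter pair ego is asked about."""
        return list(combinations([a.label for a in self.alters], 2))

    def edgelist(self):
        """Flatten the interview into (from, to, tie_type) rows."""
        rows = [(self.ego_id, a.label, "ego-alter") for a in self.alters]
        for a, b in self.pairs_to_ask():
            if frozenset((a, b)) in self.alter_ties:
                rows.append((a, b, "alter-alter"))
        return rows

# Hypothetical interview with three named alters
interview = EgoInterview(
    ego_id="EGO-01",
    alters=[Alter("Anna", {"sex": 2}), Alter("Ben", {"sex": 1}), Alter("Cleo", {"sex": 2})],
    alter_ties={frozenset(("Anna", "Ben"))},
)
for row in interview.edgelist():
    print(row)
```

Alter labels here are only meaningful within one interview; linking "Anna" in this record to the same person named by another ego is a separate entity-resolution step.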
Problems to consider
What if:
- You cannot get full names of contacts?
- Participants don’t know much about their contacts?
- You don’t know what kinds of contacts are meaningful?
Setting up surveys
- Network data collection can actually be a fairly messy endeavor
- Eliciting many alters, alter-alter contacts, and name-interpreter data can be very burdensome for subjects (and researchers!)
- Setting up your surveys to streamline the data prep step will save your sanity
- Different research objectives call for very different set-ups.
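Much of the burden comes from the alter-alter step: with k alters, ego has to consider k(k-1)/2 unordered pairs, so those prompts grow quadratically while name-interpreter prompts grow only linearly. A back-of-the-envelope sketch, assuming one yes/no prompt per pair and a hypothetical six name-interpreter items per alter:

```python
def interview_load(n_alters, n_interpreters):
    """Rough prompt counts for one ego, assuming one prompt per
    name-interpreter item per alter and one yes/no prompt per
    unordered alter-alter pair."""
    interpreter_prompts = n_alters * n_interpreters
    tie_prompts = n_alters * (n_alters - 1) // 2
    return interpreter_prompts, tie_prompts

for k in (5, 10, 20, 30):
    interp, ties = interview_load(k, n_interpreters=6)  # 6 items is hypothetical
    print(f"{k:>2} alters: {interp:>3} interpreter prompts, {ties:>3} alter-alter prompts")
```

Under these assumptions, going from 10 to 30 alters triples the interpreter prompts (60 to 180) but multiplies the alter-alter prompts almost tenfold (45 to 435), a practical argument for capping the number of alters or collecting alter-alter ties for only a subset.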
Let’s look at a couple of examples…
Highly structured
Less structured
Ethics and sensitivity
- Relational data present a special case for field and data-care ethics:
  - You might be asking about sensitive behaviors (sex, drugs, places they go, power dynamics)
  - You are asking participants to talk about other people’s behaviors
  - You have to store identifiable information
  - As with spatial data, there is a substantial risk of deductive disclosure with relational data
- Thinking like an ethnographer can be useful for improving the reliability of responses:
  - Trust participants’ ability to understand your research goals
  - Show and tell them what you’re doing & why
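Because identifiable information has to be stored somewhere, one common practice is to keep a separate, access-restricted key that maps names to study IDs and to use only the IDs in analysis files. The sketch below is a hypothetical illustration of that idea (the `IdKey` class, the `SITE` prefix, the ID format, and the names are invented). It does not remove the risk of deductive disclosure: network position and attributes can still identify someone.

```python
import csv
import io

class IdKey:
    """Hypothetical name-to-study-ID crosswalk. Analysis files carry only
    the IDs; the key itself is kept apart under restricted access."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._by_name = {}

    def id_for(self, name):
        # Case- and whitespace-insensitive lookup. Different people who share
        # a name would collide here; that is an entity-resolution problem.
        key = " ".join(name.lower().split())
        if key not in self._by_name:
            self._by_name[key] = f"{self.prefix}-{len(self._by_name) + 1:03d}"
        return self._by_name[key]

    def export_key(self):
        """Serialize the crosswalk as CSV text for separate, secured storage."""
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["name", "study_id"])
        writer.writerows(sorted(self._by_name.items(), key=lambda kv: kv[1]))
        return buf.getvalue()

key = IdKey("SITE")
raw_ties = [("Jane Roe", "John Doe"), ("jane  roe", "Ann Other")]
coded_ties = [(key.id_for(a), key.id_for(b)) for a, b in raw_ties]
print(coded_ties)  # [('SITE-001', 'SITE-002'), ('SITE-001', 'SITE-003')]
```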
Cleaning network data (is miserable)
- From your surveys and fieldnotes, you will need to create:
  - An edgelist: a dataset of ties and relationship characteristics
  - A vertexlist: a dataset of vertex characteristics
```
      ego.id   alter.id relation
1 ETG-001-01 ETG-020-02        2
2 ETG-001-01 OKA-013-02        3
3 ETG-002-01  SP-005-02        2
4 ETG-002-01 ETG-015-02        2
5 ETG-002-01  SP-109-02        3
6 ETG-002-01  SP-047-02        3
```
```
           id sex village region age.range tribe current
1 BANG-001-01   1    BANG      2         2     1       3
2 BANG-002-01   1    BANG      2         4     9       1
3 BANG-003-01   1     NDW      2         1     2       1
4 BANG-004-01   1    BANG      2         2     2     777
5 BANG-005-01   1    BANG      2         3     2       1
6 BANG-006-01   1    BANG      2         2     2       1
```
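Before any analysis, it helps to confirm that the edgelist and vertexlist agree with each other. The sketch below assumes CSV files with the column names shown above (`ego.id`, `alter.id`, `relation`, `id`, ...) and reads a tiny hypothetical example from strings; the IDs are made up. It flags duplicate vertex IDs, duplicate edges, edge endpoints with no vertex record, and vertices with no ties.

```python
import csv
import io

# Tiny hypothetical files using the column names shown above.
EDGES = """ego.id,alter.id,relation
P01,P02,2
P01,P99,3
P01,P02,2
"""
VERTICES = """id,sex,village
P01,1,V1
P02,2,V1
P03,1,V2
"""

def read_csv_text(text):
    return list(csv.DictReader(io.StringIO(text)))

def check_consistency(edges, vertices):
    """Flag common edgelist/vertexlist mismatches for manual review."""
    ids = [v["id"] for v in vertices]
    id_set = set(ids)
    seen, duplicate_edges, missing = set(), [], set()
    for e in edges:
        pair = (e["ego.id"], e["alter.id"])  # treated as an ordered pair
        if pair in seen:
            duplicate_edges.append(pair)
        seen.add(pair)
        missing.update(x for x in pair if x not in id_set)
    tied = {x for pair in seen for x in pair}
    return {
        "duplicate_vertex_ids": len(ids) != len(id_set),
        "duplicate_edges": duplicate_edges,
        "ids_missing_from_vertexlist": sorted(missing),
        "vertices_without_ties": sorted(id_set - tied),
    }

report = check_consistency(read_csv_text(EDGES), read_csv_text(VERTICES))
for name, value in report.items():
    print(name, value)
```

A vertex without ties is not necessarily an error (an ego may have named no one), so the output is a list to review rather than a set of fixes. For undirected data, (A, B) and (B, A) would also need to be treated as the same tie.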
- Common time-consuming data-cleaning steps:
  - Entity resolution
  - Organizing name-generator questions
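Entity resolution means deciding when different name strings (from different egos, or spelled differently by the same ego) refer to the same person. A minimal sketch using only Python's standard library: normalize names, score each pair with `difflib.SequenceMatcher.ratio()`, and list pairs above a threshold for manual review. The normalization rules, the 0.8 threshold, and the example names are assumptions for illustration; in practice, name-interpreter attributes such as village or sex are often compared as well.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name):
    """Lowercase, drop accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return " ".join(text.split())

def candidate_matches(names, threshold=0.8):
    """Return (name_a, name_b, score) for pairs worth checking by hand."""
    out = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            out.append((a, b, round(score, 2)))
    return sorted(out, key=lambda t: t[2], reverse=True)

# Hypothetical name strings as they might appear across surveys
names = ["José Mwangi", "Jose Mwangi", "J. Mwangi", "Grace Achieng", "Grace Atieno"]
for match in candidate_matches(names):
    print(match)
```

With these names, the accented and unaccented spellings score 1.0 after normalization and "J. Mwangi" scores about 0.84 against both, but "Grace Achieng" and "Grace Atieno" also reach the 0.8 threshold. That is why the output is a review list rather than an automatic merge.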
Data flexibility
- Network dataframes can be converted into matrices for different types of analyses.
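For example, an edgelist can be turned into an adjacency matrix. The sketch below assumes rows of (from, to, value) like the edgelist above; the function name and IDs are hypothetical. Passing the vertexlist IDs as `nodes` keeps people with no ties as empty rows, and `directed`/`valued` choose between the common variants. If `relation` is a category code rather than a tie strength, building one binary matrix per relation type is usually more sensible than storing the code as a weight.

```python
def to_matrix(edges, nodes=None, directed=False, valued=False):
    """Convert (from, to, value) rows into a square matrix (list of lists).
    Node order follows `nodes` if given, else first appearance in `edges`.
    An endpoint missing from a supplied `nodes` list raises KeyError."""
    if nodes is None:
        nodes = []
        for a, b, _ in edges:
            for x in (a, b):
                if x not in nodes:
                    nodes.append(x)
    index = {n: i for i, n in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for a, b, w in edges:
        cell = w if valued else 1
        m[index[a]][index[b]] = cell
        if not directed:
            m[index[b]][index[a]] = cell  # mirror the tie for undirected data
    return nodes, m

edges = [("P01", "P02", 2), ("P01", "P03", 3), ("P02", "P03", 2)]
nodes, binary = to_matrix(edges)                         # undirected, 0/1
_, valued = to_matrix(edges, directed=True, valued=True)  # directed, keeps values
_, padded = to_matrix(edges, nodes=["P01", "P02", "P03", "P04"])  # P04 has no ties
print(binary)  # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(valued)  # [[0, 2, 3], [0, 0, 2], [0, 0, 0]]
```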
Log-linear models of homophily