We used the available data that identified racially motivated activities in a particular place (city) or county. This ruled out a lot of data that we might otherwise have used. The figure below identifies the data that we do have. Note that the slave data exist for only two periods (1850 and 1860); we replicated the 1860 data for all subsequent periods and allowed its effect to vary over time.
For each of these data sets, we have identified either a count of events or an average level of racially motivated actions in each county-decade. We refer to these as \(y_{ctj}\), where \(c\) indexes the county, \(t\) the decade, and \(j\) the indicator. For every variable except the slave data, we model:
\[E(y_{ctj}|z_{ct}, \alpha_{tj}, \beta_j) = \alpha_{tj} + \beta_jz_{ct}\] where \(z_{ct}\) is the county-decade latent measure of racially motivated human rights violations. The \(\alpha_{tj}\) term is an intercept for each of the \(j=\{1,\ldots,J\}\) indicators in each of the \(t=\{1, \ldots, T\}\) decades. This operationalizes Fariss’s idea of changing standards over time by allowing the baseline level of racism to change over time. At base, these are just regression models of each indicator on the latent racism variable. The only trick is that the independent variable is not observed and is estimated as part of the model.
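To make the measurement equation concrete, here is a minimal simulation sketch in Python. It is an illustration only, not our estimation code: the dimensions, loadings, and unit noise standard deviation are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: C counties, T decades, J indicators.
C, T, J = 100, 12, 5

z = rng.normal(size=(C, T))            # stand-in latent racism measure z_ct
alpha = rng.normal(size=(T, J))        # decade-specific intercepts alpha_tj
beta = rng.uniform(0.5, 1.5, size=J)   # indicator loadings beta_j

# E(y_ctj | z_ct) = alpha_tj + beta_j * z_ct, with unit-variance noise added
mu = alpha[None, :, :] + beta[None, None, :] * z[:, :, None]
y = rng.normal(loc=mu, scale=1.0)
```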
Since everything on the right-hand side of the model is estimated (even the independent variable), the model’s parameters are not jointly identified. Without some restrictions, we could not arrive at a unique solution: for example, increasing the size of \(\beta\) and decreasing the variance in \(z\) could produce an equally good fit. The methods for identifying these models tend to take one of two forms. We can deterministically set one of the \(\beta_j\) values and allow the variance of \(z_{ct}\) to be estimated. Alternatively, we can fix the variance of \(z_{ct}\), which sets the scale of the latent variable, and put a sign restriction on one or a few \(\beta\) parameters to set its direction. We restrict the coefficients because it is easier, and the two solutions are equally good on both theoretical and applied grounds. We set the coefficient on the slave population in 1850 to 1. This has no substantively meaningful effect on the model aside from determining the scale of the latent variable.
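The scale indeterminacy is easy to see numerically. The snippet below, with arbitrary stand-in values, shows that doubling the loading while halving the latent variable leaves the conditional mean of the indicator unchanged, so the data alone cannot choose between the two parameterizations.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=10)    # stand-in latent values
alpha, beta = 0.3, 1.2     # stand-in intercept and loading

# Doubling the loading while halving the latent variable leaves the
# conditional mean of the indicator unchanged, so a restriction is needed.
mu_original = alpha + beta * z
mu_rescaled = alpha + (2 * beta) * (z / 2)
assert np.allclose(mu_original, mu_rescaled)
```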
We allow the coefficients on the slave population to change over time. We assume that the effect is equally strong in 1850 and 1860, \(\gamma_{1850} = \gamma_{1860} = 1\). In subsequent decades, we assume the effect dampens by a constant factor: \(\gamma_t = \gamma_{t-1}\delta\), where \(\delta\sim N(.5, 10)\) truncated to the unit interval.
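A sketch of the implied dampening path is below. It treats the second argument of the normal prior as a variance and uses illustrative decades; both are assumptions for the purpose of the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# delta ~ N(0.5, 10) truncated to (0, 1); treating 10 as a variance here
# (whether it is a variance or a standard deviation is an assumption).
loc, scale = 0.5, np.sqrt(10)
a, b = (0 - loc) / scale, (1 - loc) / scale   # truncation bounds in sd units
delta = stats.truncnorm.rvs(a, b, loc=loc, scale=scale, random_state=rng)

# gamma_1850 = gamma_1860 = 1; thereafter gamma_t = gamma_{t-1} * delta.
decades = np.arange(1850, 2001, 10)           # illustrative decades
gamma = np.ones(len(decades))
for i, year in enumerate(decades):
    if year > 1860:
        gamma[i] = gamma[i - 1] * delta
```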
We use MCMC to estimate the model and use priors that are generally in line with those suggested by Fariss. Most importantly, we use a random walk prior on the latent variable such that \(z_{c1} \sim N(0, 1)\) and \(z_{ct}\sim N(z_{c,t-1}, 1)\) for \(t = \{2, \ldots, T\}\). This smooths the latent variable over time and allows the more sparsely populated time periods to borrow strength from the more densely populated ones. The smoothing dampens big changes over time, but it also reduces the posterior variability of the latent variable estimates. That said, we still see plenty of movement over time in the estimates, so the smoothing is not making the temporal changes arbitrarily small.
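For reference, drawing from this random walk prior is straightforward; the sketch below is a prior simulation with hypothetical dimensions, not the posterior we estimate via MCMC.

```python
import numpy as np

rng = np.random.default_rng(4)
C, T = 100, 12   # hypothetical numbers of counties and decades

# z_c1 ~ N(0, 1); z_ct ~ N(z_{c,t-1}, 1) for t = 2, ..., T
z = np.empty((C, T))
z[:, 0] = rng.normal(0.0, 1.0, size=C)
for t in range(1, T):
    z[:, t] = rng.normal(z[:, t - 1], 1.0)
```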
After the model is estimated, we can re-scale the latent variable to whatever range we want. In each iteration of the Markov chain, we rescale the latent variable to lie in the range \([0,100]\) with \(z_{ct}^{*} = z_{ct} - \text{min}(z_{ct})\), then \(z_{ct}^{*} = \frac{z_{ct}^{*}}{\text{max}(z_{ct}^{*})}\times 100\). We can then summarise these re-scaled draws and use them in our analysis.
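A minimal sketch of this per-draw rescaling, assuming the posterior draws are stored as a (draws, counties, decades) array; the layout and dimensions are placeholders.

```python
import numpy as np

def rescale_draw(z_draw):
    """Map one posterior draw of the latent variable onto [0, 100]."""
    z_star = z_draw - np.nanmin(z_draw)
    return z_star / np.nanmax(z_star) * 100

# Stand-in posterior draws with an assumed (draws, counties, decades) layout.
rng = np.random.default_rng(5)
z_draws = rng.normal(size=(1000, 100, 12))
rescaled = np.stack([rescale_draw(d) for d in z_draws])
```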
One of the ways we can understand how the model fits, or alternatively what the latent variable means, is to consider the \(R^{2}\) of the relationship between the latent variable and each indicator. Below is the table of \(R^{2}\) values; a sketch of how such values can be computed follows the table. As you can see, the Black over-incarceration rate and percent slaves are most closely related to the latent variable. Several other variables have interesting relationships with the latent variable as well. The way to interpret the latent variable is as something in the world that is closely related to these variables. Our suggestion is that it is measuring anti-Black violence.
| Indicator | \(R^{2}\) |
|---|---|
| Black Over-incarceration (Rate) | 0.5913311 |
| Percent Slaves | 0.5144157 |
| Black Over-incarceration (Difference) | 0.4465821 |
| Radical Right Organizations | 0.4113418 |
| Urban Disturbances (Myers) | 0.3417413 |
| Urban Disturbances (Spilerman) | 0.2940047 |
| Intimidations, Reprisals and Violence 1 | 0.2376608 |
| Ku Klux Klan Chapters | 0.2242323 |
| We Charge Genocide (Police) | 0.1533274 |
| Dynamics of Contention Events | 0.1414759 |
| Dynamics of Contention Violence | 0.1063727 |
| Black Church Arsons | 0.0961252 |
| We Charge Genocide (Citizens) | 0.0775289 |
| Lynchings | 0.0656690 |
| Intimidations & Reprisals (Citizens) | 0.0446378 |
| Intimidations & Reprisals (Police) | 0.0443358 |
| Redlining | 0.0404868 |
| Black Executions | 0.0134936 |
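The following is a rough sketch of how these \(R^{2}\) values could be computed from posterior-mean latent scores. The data here are stand-ins, and the simple bivariate regression is a simplification (e.g., it ignores whether decade intercepts are included).

```python
import numpy as np

def r_squared(y, z):
    """R^2 from an OLS regression of an indicator y on the latent score z."""
    keep = ~np.isnan(y) & ~np.isnan(z)
    y, z = y[keep], z[keep]
    X = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

# Stand-in data: posterior-mean latent scores and county-decade indicators.
rng = np.random.default_rng(6)
z_hat = rng.normal(size=500)
y_indicators = rng.normal(size=(500, 18))
r2 = [r_squared(y_indicators[:, j], z_hat) for j in range(y_indicators.shape[1])]
```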
We can think of each county in the data as having a trajectory through time. To the extent that these trajectories look similar, they suggest a similar temporal pattern of improving (or worsening) conditions for Black Americans. Our goal, then, is to place the trajectories into a relatively small number of groups. Since it is difficult to do this for counties with only a few data points, we consider only the counties with at least seven time points of valid data. We use a hierarchical clustering algorithm to do the grouping. The algorithm operates on inter-county distances: we identify how different each county’s trajectory is from every other county’s trajectory. To do this effectively, we need to make sure that we are highlighting important differences. Differences in baseline level are removed by centering each trajectory at zero; differences in scale are removed by scaling all trajectories to have a variance of 1. Doing this ensures that the distances we calculate increase as patterns change and not as a function of baseline or scale differences. We can then look at what these trajectories say about shared experiences. The figure below shows the individual trajectories and the overall patterns.
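A minimal sketch of this standardize-then-cluster step in Python follows. The trajectory matrix, the linkage method (Ward), and the number of groups are assumptions for illustration, not necessarily the exact choices we made.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)

# traj: counties-by-decades matrix of latent scores for counties with at
# least seven valid time points (stand-in values here).
traj = rng.normal(size=(200, 12))

# Remove baseline and scale differences: center each trajectory at zero and
# scale it to unit variance so distances reflect shape, not level or spread.
std_traj = (traj - traj.mean(axis=1, keepdims=True)) / traj.std(axis=1, keepdims=True)

# Pairwise inter-county distances, then hierarchical clustering on them.
dist = pdist(std_traj, metric="euclidean")
tree = linkage(dist, method="ward")
groups = fcluster(tree, t=5, criterion="maxclust")   # five groups (assumed)
```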
How closely do the overall smooth trajectories fit each county’s data? The figure below shows the distribution of \(R^{2}\) values from regressing the county values on the fitted values from the group-wide smooth regression (i.e., the red line in the figure above). The graph shows that the groups indicating consistently worsening conditions (“Getting Worse from 1850”) and consistently improving conditions (“Getting Better”) correspond quite closely to the county trajectories; the \(R^2\) values are mostly quite high. The groups identified as “Better then Worse” (those whose values tend to get better through around 1920 and then get worse again) and “Getting Worse after 1920” (those whose values tend to be flat until around 1920 and then get worse) have somewhat more mixed results. The mode is still above .9, but there is considerable variation down through the long tail toward zero. Those identified as “Worse then Better” (whose values tend to get worse through around 1950 and then get better) show considerable variation across the range of values, indicating a poorer fit to the data. This suggests a group that still contains considerable heterogeneity.
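A compact sketch of that fit check, mirroring the \(R^{2}\) helper above and summarizing by cluster; the arrays and the group labels here are stand-ins.

```python
import numpy as np

def fit_r2(y_county, group_fit):
    """R^2 from regressing a county's values on its group's smooth fit."""
    X = np.column_stack([np.ones_like(group_fit), group_fit])
    coef, *_ = np.linalg.lstsq(X, y_county, rcond=None)
    resid = y_county - X @ coef
    return 1 - resid.var() / y_county.var()

# Stand-in data: county trajectories, each group's fitted smooth evaluated at
# the same decades, and the cluster label for each county.
rng = np.random.default_rng(8)
county_values = rng.normal(size=(200, 12))
group_fit = rng.normal(size=(200, 12))
groups = rng.integers(1, 6, size=200)

r2 = np.array([fit_r2(county_values[i], group_fit[i]) for i in range(200)])
for g in np.unique(groups):
    print(g, np.median(r2[groups == g]))   # median R^2 within each cluster
```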