<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>free range statistics - R</title>
<description>Posts categorised as 'R'</description>
<link>https://freerangestats.info</link>
<atom:link href="https://freerangestats.info/feed.R.xml" rel="self" type="application/rss+xml" />
<item>
<title>Revisiting depression incidence by county and vote for Trump by @ellis2013nz</title>
<description><p>Just before Christmas <a href="/blog/2024/12/23/depression-and-vote">I blogged about the positive correlation between depression incidence in US counties and their vote for Trump in the 2024 presidential election</a>. In addition to my casual interest in the topic, I used it as a case study in multilevel modelling while adjusting for spatial correlation. I explicitly said that I didn’t think it likely that the depression-vote relationship was a causal link; I suspected that most likely, some underlying variable that caused depression was also related to voting behaviour.</p>
<p>I’m coming back to the issue because on reflection, I have three bits of unfinished business:</p>
<ol>
<li>I had a nagging thought that with 50+ counties per state (and hence some degrees of freedom to spare), I perhaps should have allowed random slopes for depression incidence in each state, rather than just random intercepts</li>
<li>I thought my spatial statistics workaround, of just adding a rubbery mat to space to suck up the spatial autocorrelation between observations, was perhaps a bit slap-dash, and that I should as a matter of course model the spatial autocorrelation explicitly</li>
<li>An alert reader, Jonathan Spring, pointed out that in the USA there are <a href="https://disq.us/url?url=https%3A%2F%2Fpmc.ncbi.nlm.nih.gov%2Farticles%2FPMC6390869%2F%23%3A%7E%3Atext%3DStudies%2520that%2520have%2520explored%2520the%2Copposed%2520to%2520African%2520Americans%252C%2520whose%3AIlHBLjpwdFZl-Dn9J9CLQ5OH2gc&amp;cuid=3714645">very marked racial differences between depression diagnoses</a> and suggested that perhaps in my model the depression incidence was standing in as a proxy for “whiteness”.</li>
</ol>
<p>Of these, the first two felt like methodological details, probably immaterial to the question, that I should polish up, whereas number 3 seemed likely to be the explanation of the whole phenomenon. In other words, race is likely a <a href="/blog/2023/06/04/causality-sims">confounder</a> of the depression-voting relationship, as per this directed acyclic graph:</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286c-dag.svg" width="100%"><img src="https://freerangestats.info/img/0286c-dag.png" width="100%" /></object>
<p>Which means that if we want to actually understand the causal effect of depression on voting we would need to control for race in the regression. Now, there are other things we’d need to do too; in particular to identify and control for the various “other factors”. I’m not up for that right now - this is someone else’s job - but I’m interested enough to go part-way into things to at least check out the degree to which race makes the depression relationship go away.</p>
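<p>To make the confounding argument concrete, here is a minimal simulated sketch. It is not based on the real county data: the variable names (<code class="language-plaintext highlighter-rouge">white_share</code>, <code class="language-plaintext highlighter-rouge">depression</code>, <code class="language-plaintext highlighter-rouge">vote</code>) and every number are assumptions made up for illustration, and depression is given no causal effect on voting at all.</p>

```r
# Simulated illustration of confounding; all numbers here are invented
set.seed(42)
n <- 5000

# 'white_share' is a hypothetical stand-in for race in each simulated county
white_share <- runif(n, 0.2, 1)

# Depression diagnoses depend on race only
depression <- 0.10 + 0.15 * white_share + rnorm(n, sd = 0.02)

# Vote also depends on race only: the true effect of depression is zero
vote <- 0.20 + 0.50 * white_share + rnorm(n, sd = 0.05)

naive    <- lm(vote ~ depression)
adjusted <- lm(vote ~ depression + white_share)

coef(naive)["depression"]     # clearly positive, purely via the shared cause
coef(adjusted)["depression"]  # close to zero once race is in the regression
```

<p>In this toy setting the whole depression coefficient is spurious. The real question for the county data is how much of the coefficient survives the adjustment, which could be anywhere between all and none of it.</p>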
<p>This is a long post and parts of it are likely to be of interest only to people concerned with the minutiae of multilevel modelling with spatial autocorrelation. If you just want to see what including race in the model does to the depression effect on voting for Trump, you can skip down to the final section.</p>
<p>All the code for this post assumes that the code from the previous post has already been run. If you want a version of the whole script that just runs, it’s <a href="https://github.com/ellisp/blog-source/blob/master/_working/0286c-depression-and-race.R">here in the source repository</a> of the whole blog.</p>
<p>But here’s the code for just that DAG diagram:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">GGally</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggdag</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">patchwork</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">kableExtra</span><span class="p">)</span><span class="w">
</span><span class="c1"># you need to have run the code from the previous blog first, line below will </span><span class="w">
</span><span class="c1"># work only for those who are running this from a clone of my whole repository:</span><span class="w">
</span><span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">exists</span><span class="p">(</span><span class="s2">"combined2"</span><span class="p">)){</span><span class="w">
</span><span class="n">source</span><span class="p">(</span><span class="s2">"0286-voting-and-depression.R"</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#-----------diagram--------------</span><span class="w">
</span><span class="n">dag</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">dagify</span><span class="p">(</span><span class="w">
</span><span class="n">Vote</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">Race</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Depression</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'Other factors'</span><span class="p">,</span><span class="w">
</span><span class="n">Depression</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">Race</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'Other factors'</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">set.seed</span><span class="p">(</span><span class="m">125</span><span class="p">)</span><span class="w">
</span><span class="n">ggdag</span><span class="p">(</span><span class="n">dag</span><span class="p">,</span><span class="w"> </span><span class="n">edge_type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"link"</span><span class="p">,</span><span class="w"> </span><span class="n">node</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_dag_blank</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_dag_node</span><span class="p">(</span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lightgreen"</span><span class="p">,</span><span class="w"> </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">19</span><span class="p">,</span><span class="w"> </span><span class="m">19</span><span class="p">,</span><span class="w"> </span><span class="m">19</span><span class="p">,</span><span class="w"> </span><span class="m">17</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_dag_edges</span><span class="p">(</span><span class="n">edge_colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"lightblue"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lightblue"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grey"</span><span class="p">,</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w"> </span><span class="s2">"black"</span><span class="p">),</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">),</span><span class="w">
</span><span class="n">edge_width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">1.5</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">0.8</span><span class="p">,</span><span class="w"> </span><span class="m">1.5</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">),</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">),</span><span class="w">
</span><span class="n">edge_linetype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_dag_text</span><span class="p">(</span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"steelblue"</span><span class="p">)</span></code></pre></figure>
<p>OK, on to fixing up the loose ends with my modelling approach.</p>
<h2 id="move-from-gam-to-gamm">Move from <code class="language-plaintext highlighter-rouge">gam</code> to <code class="language-plaintext highlighter-rouge">gamm</code></h2>
<p>I decided that the simplest material improvement to my approach to the spatial autocorrelation would be to explicitly model the spatial covariance of the residuals. One way of doing this, with minimal change from my approach so far, is to move the model-fitting from <code class="language-plaintext highlighter-rouge">mgcv::gam()</code> to <code class="language-plaintext highlighter-rouge">mgcv::gamm()</code>. <code class="language-plaintext highlighter-rouge">gamm()</code> fits models by iterating between calls to <code class="language-plaintext highlighter-rouge">nlme::lme()</code> and a generalized additive model until convergence. Basically this gives us the ability to use the correlation structures for residuals and mixed random and fixed effects of <code class="language-plaintext highlighter-rouge">lme()</code> while still using the splines and response distributions of <code class="language-plaintext highlighter-rouge">gam()</code>. This is what I need as I want to continue modelling my response with a quasi-binomial family, and I am probably going to want to keep my “rubber sheet” nuisance effect over the US space modelled with a two-dimensional spline (<code class="language-plaintext highlighter-rouge">s(x, y)</code>), even while using the correlation features of <code class="language-plaintext highlighter-rouge">lme()</code>.</p>
<p>So the first thing I do is create a <code class="language-plaintext highlighter-rouge">model6b</code>, as similar as possible to the <code class="language-plaintext highlighter-rouge">model6</code> that was the best and final model from the last blog post, changing only the estimation method. So here it is, a straight transfer from <code class="language-plaintext highlighter-rouge">gam()</code> to <code class="language-plaintext highlighter-rouge">gamm()</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#---------------moving from gam to gamm--------------</span><span class="w">
</span><span class="c1"># start with the same model as our final one in the last post, but estimated differently:</span><span class="w">
</span><span class="n">model6b</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">state_name</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"re"</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="c1"># we can't compare the AIC of models created with gam and gamm, see</span><span class="w">
</span><span class="c1"># https://stats.stackexchange.com/questions/70512/huge-%CE%94aic-between-gam-and-gamm-models</span><span class="w">
</span><span class="c1"># some differences eg effective degrees of freedom less in the GAMM. But the</span><span class="w">
</span><span class="c1"># main conclusions (significance of cpe) the same</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model6</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model6b</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span></code></pre></figure>
<p>… which gives us these two different results:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; summary(model6)
Family: quasibinomial
Link function: logit
Formula:
per_gop ~ cpe + s(x, y) + s(state_name, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) -2.7972 20.7488 -0.135 0.893
cpe 15.5908 0.6778 23.001 &lt;2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x,y) 28.39 28.94 7.995 &lt;2e-16 ***
s(state_name) 49.00 49.00 7.949 &lt;2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.386 Deviance explained = 44.8%
GCV = 3138.5 Scale est. = 3170.7 n = 3107
&gt; summary(model6b$gam)
Family: quasibinomial
Link function: logit
Formula:
per_gop ~ cpe + s(x, y) + s(state_name, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) -2.8878 0.1481 -19.50 &lt;2e-16 ***
cpe 15.0166 0.6271 23.95 &lt;2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x,y) 25.41 25.41 6.438 &lt;2e-16 ***
s(state_name) 39.91 49.00 7.154 &lt;2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.382
Scale est. = 3005.3 n = 3107
</code></pre></div></div>
<p>There are some differences coming from the different estimation methods. The <code class="language-plaintext highlighter-rouge">gamm()</code> model uses fewer effective degrees of freedom for both the <code class="language-plaintext highlighter-rouge">s(x,y)</code> rubber mat and the <code class="language-plaintext highlighter-rouge">s(state_name, bs="re")</code> state level random intercept. The fixed effect coefficient for <code class="language-plaintext highlighter-rouge">cpe</code> (‘crude prevalence estimate’ of county-level depression) is a little different - 15.0 versus 15.6. But we can see we’re fitting the same model and getting substantively the same results.</p>
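<p>Because the model uses a logit link, a coefficient of 15 is not directly a change in vote share. The following sketch converts the <code class="language-plaintext highlighter-rouge">model6b</code> estimates printed above into more readable quantities. It assumes <code class="language-plaintext highlighter-rouge">cpe</code> is stored as a proportion (so one percentage point is 0.01), and the two prevalence values are picked purely for illustration. The expected shares are for a hypothetical county where the spatial smooth and the state effect both contribute zero.</p>

```r
# Coefficients copied from the summary(model6b$gam) output
b0    <- -2.8878   # intercept, on the logit scale
b_cpe <- 15.0166   # slope for cpe, on the logit scale

# Multiplicative change in the odds of a Trump vote for a one percentage
# point (0.01) increase in depression prevalence
odds_ratio_1pp <- exp(b_cpe * 0.01)
odds_ratio_1pp

# Expected vote share at two illustrative prevalence values (assumed, not
# taken from the data), with the smooth and random effects set to zero
cpe_values <- c(0.18, 0.25)
expected_share <- plogis(b0 + b_cpe * cpe_values)
data.frame(cpe = cpe_values, expected_share = expected_share)
```

<p>The odds ratio comes out at roughly 1.16, that is, each extra percentage point of depression prevalence goes with about 16% higher odds of voting for Trump under this model.</p>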
<p>The next small change I make is to move the state level random intercept from the <code class="language-plaintext highlighter-rouge">gam</code> specification into <code class="language-plaintext highlighter-rouge">lme</code>. Again, this is an identical model, just changing how the fitting is done:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># move the state random effect into the things to be estimated by nlme:</span><span class="w">
</span><span class="n">model6c</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="c1"># fixed coefficients are identical to 6b, but fit was much faster</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model6c</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="w"> </span><span class="c1"># not shown</span></code></pre></figure>
<p>This gives a big speed-up (which we’re going to need as the models start getting more complex) for materially the same results.</p>
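<p>The equivalence of the two ways of writing a random intercept can be checked on a small simulated data set. This is only a sketch under simplifying assumptions: the data and the names <code class="language-plaintext highlighter-rouge">grp</code>, <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">z</code> are invented, and the response is Gaussian rather than quasibinomial, so <code class="language-plaintext highlighter-rouge">gamm()</code> can go straight to <code class="language-plaintext highlighter-rouge">lme()</code> without the iterative step.</p>

```r
library(mgcv)   # provides gamm(); loads nlme as a dependency

set.seed(1)
n_groups <- 20
n_per    <- 30
n        <- n_groups * n_per

# All variable names are hypothetical; 'grp' plays the role of state_name
d <- data.frame(grp = factor(rep(seq_len(n_groups), each = n_per)),
                x   = runif(n),
                z   = runif(n))
grp_effect <- rnorm(n_groups, sd = 1)
d$y <- 1 + 2 * d$x + sin(2 * pi * d$z) +
  grp_effect[as.integer(d$grp)] + rnorm(n, sd = 0.5)

# Random intercept written as a "re" smooth, in the style of model6b
fit_smooth <- gamm(y ~ x + s(z) + s(grp, bs = "re"), data = d)

# The same random intercept handed to lme() via 'random', in the style of model6c
fit_random <- gamm(y ~ x + s(z), random = list(grp = ~1), data = d)

# Side-by-side fixed-effect estimates from the two specifications
rbind(re_smooth  = coef(fit_smooth$gam)[c("(Intercept)", "x")],
      lme_random = coef(fit_random$gam)[c("(Intercept)", "x")])
```

<p>The two rows should agree closely, since both specifications describe the same model. The grouping factor is deliberately not called <code class="language-plaintext highlighter-rouge">g</code>, because the <code class="language-plaintext highlighter-rouge">lme</code> output from <code class="language-plaintext highlighter-rouge">gamm()</code> shows it using a dummy grouping variable of that name.</p>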
<h2 id="random-slopes">Random slopes</h2>
<p>Now I’m ready to take advantage of the move to the <code class="language-plaintext highlighter-rouge">gamm</code> syntax and I let the slopes, not just the intercepts, vary for each state:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#----------------random slopes--------------------</span><span class="w">
</span><span class="n">model7</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="c1"># AIC is about 200 lower, so worth having the random slopes</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model7</span><span class="o">$</span><span class="n">lme</span><span class="p">,</span><span class="w"> </span><span class="n">model6c</span><span class="o">$</span><span class="n">lme</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model7</span><span class="o">$</span><span class="n">lme</span><span class="p">)</span></code></pre></figure>
<p>…which gives us these results. Note that the Akaike Information Criterion has decreased by about 200 from adding the random slopes, suggesting the random slopes are worth the increased complexity. The cpe (depression) coefficient has decreased in size, from about 15.0 to 10.5, but is still quite large and definitely significant.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; AIC(model7$lme, model6c$lme)
df AIC
model7$lme 9 8605.692
model6c$lme 7 8806.902
&gt; summary(model7$lme)
Linear mixed-effects model fit by maximum likelihood
Data: data
AIC BIC logLik
8605.692 8660.064 -4293.846
Random effects:
Formula: ~Xr - 1 | g
Structure: pdIdnot
Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7
StdDev: 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986
Xr8 Xr9 Xr10 Xr11 Xr12 Xr13 Xr14
StdDev: 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986
Xr15 Xr16 Xr17 Xr18 Xr19 Xr20 Xr21
StdDev: 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986
Xr22 Xr23 Xr24 Xr25 Xr26 Xr27
StdDev: 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986
Formula: ~1 + cpe | state_name %in% g
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 3.294624 (Intr)
cpe 15.833056 -0.985
Residual 50.963709
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: list(fixed)
Value Std.Error DF t-value p-value
X(Intercept) -2.099504 0.5204783 3054 -4.033796 0.0001
Xcpe 10.482779 2.5041023 3054 4.186243 0.0000
Xs(x,y)Fx1 0.221830 0.1339667 3054 1.655859 0.0979
Xs(x,y)Fx2 0.134028 0.1735957 3054 0.772070 0.4401
Correlation:
X(Int) Xcpe X(,)F1
Xcpe -0.985
Xs(x,y)Fx1 -0.083 0.084
Xs(x,y)Fx2 -0.121 0.126 0.568
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-9.65917765 0.05092882 0.32541129 0.62277480 6.27537901
Number of Observations: 3107
Number of Groups:
g state_name %in% g
1 50
</code></pre></div></div>
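<p>The AIC table above shows the random slopes model using two more degrees of freedom (9 versus 7). A stripped-down sketch with <code class="language-plaintext highlighter-rouge">nlme::lme()</code> on simulated data shows where those two parameters come from. Everything here is an assumption for illustration: the data are invented, the response is Gaussian, and there is no spatial smooth.</p>

```r
library(nlme)

set.seed(7)
n_groups <- 40
n_per    <- 25

# Hypothetical data in which both intercepts and slopes vary by group
d <- data.frame(grp = factor(rep(seq_len(n_groups), each = n_per)),
                x   = runif(n_groups * n_per))
intercepts <- rnorm(n_groups, mean = 1, sd = 0.8)
slopes     <- rnorm(n_groups, mean = 2, sd = 1.5)
i <- as.integer(d$grp)
d$y <- intercepts[i] + slopes[i] * d$x + rnorm(nrow(d), sd = 0.5)

# method = "ML" so that both models are fit by maximum likelihood, like the
# lme component reported above
m_int   <- lme(y ~ x, random = ~ 1 | grp,     data = d, method = "ML")
m_slope <- lme(y ~ x, random = ~ 1 + x | grp, data = d, method = "ML")

# Random slopes add two parameters: the standard deviation of the slopes and
# their correlation with the intercepts
aic_table <- AIC(m_slope, m_int)
aic_table
```

<p>Those two extra parameters correspond to the <code class="language-plaintext highlighter-rouge">cpe</code> standard deviation (15.83) and the intercept-slope correlation (-0.985) in the <code class="language-plaintext highlighter-rouge">model7</code> random effects block.</p>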
<p>We can use this model to make a new version of the final chart from the previous blog post:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">preds7</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">predict</span><span class="p">(</span><span class="n">model7</span><span class="o">$</span><span class="n">gam</span><span class="p">,</span><span class="w"> </span><span class="n">se.fit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"response"</span><span class="p">)</span><span class="w">
</span><span class="n">combined2</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">fit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">preds7</span><span class="o">$</span><span class="n">fit</span><span class="p">,</span><span class="w">
</span><span class="n">se</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">preds7</span><span class="o">$</span><span class="n">se.fit</span><span class="p">,</span><span class="w">
</span><span class="n">lower</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fit</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">1.96</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">se</span><span class="p">,</span><span class="w">
</span><span class="n">upper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fit</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1.96</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">se</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cpe</span><span class="p">,</span><span class="w"> </span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state_name</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">per_gop</span><span class="p">,</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state_name</span><span class="p">),</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_ribbon</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">ymin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lower</span><span class="p">,</span><span class="w"> </span><span class="n">ymax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">upper</span><span class="p">),</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">limits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Crude prevalence estimate of depression"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Percentage vote for Trump in 2024 election"</span><span class="p">,</span><span class="w">
</span><span class="n">subtitle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Grey ribbons are 95% confidence intervals from quasibinomial generalized additive model with spatial effect and state-level random intercept effect"</span><span class="p">,</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Counties with more depression voted more for Trump"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">the_caption</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">state_name</span><span class="p">)</span></code></pre></figure>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286c-random-slopes.svg" width="100%"><img src="https://freerangestats.info/img/0286c-random-slopes.png" width="100%" /></object>
<p>It doesn’t look much different. So my hunch on this aspect was right; there’s enough data to justify letting the depression-vote slope vary in each state (ie we get a better model) but it doesn’t change the substantive conclusion.</p>
<h2 id="better-spatial-autocorrelation">Better spatial autocorrelation</h2>
<p>Now I’m ready to improve the way I’m handling spatial autocorrelation. As I mentioned last time, my approach is to model the centroid of each county with an <code class="language-plaintext highlighter-rouge">s(x, y)</code> two-dimensional spline, a sort of rubber mat laid over the USA to soak up the things counties have in common with their neighbouring counties. I still think this is much better than nothing, but I admit there is a remaining problem: even after doing this, the residuals of counties that are close together will still be correlated. That means each observation is not worth as much as it would be if it were truly independent, which in turn means my inferences are over-confident in their precision.</p>
<p>First I wanted to check out if this hunch was correct. The standard way to look for spatial autocorrelation is via a variogram. I couldn’t get the <code class="language-plaintext highlighter-rouge">Variogram()</code> function in <code class="language-plaintext highlighter-rouge">nlme</code> to return sensible results from the models that had been fit with <code class="language-plaintext highlighter-rouge">gamm()</code> so I had to make one more explicitly with the <code class="language-plaintext highlighter-rouge">variogram()</code> function in <code class="language-plaintext highlighter-rouge">gstat</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">sp</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">gstat</span><span class="p">)</span><span class="w">
</span><span class="n">sp_data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">combined2</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">residuals</span><span class="p">(</span><span class="n">model7</span><span class="o">$</span><span class="n">lme</span><span class="p">))</span><span class="w">
</span><span class="n">coordinates</span><span class="p">(</span><span class="n">sp_data</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="o">+</span><span class="n">y</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">variogram</span><span class="p">(</span><span class="n">res</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sp_data</span><span class="p">),</span><span class="w">
</span><span class="n">main</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Variogram for pairs of counties' residuals"</span><span class="p">,</span><span class="w">
</span><span class="n">xlab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Distance between pairs of counties"</span><span class="p">)</span></code></pre></figure>
<p>… which gets me this:</p>
<p><img src="https://freerangestats.info/img/0286c-variogram.png" width="100%" /></p>
<p>A variogram works by calculating the distance between each pair of observations, binning these distances, and then assessing how related the pairs of observations (in this case, residuals after the model fitting) are in each bin. It doesn’t do this with a correlation coefficient but with another measure that I haven’t got my head around, described as <a href="https://www.kgs.ku.edu/Tis/surf3/s3krig2.html">‘half the variance of the differences between all possible points spaced a constant distance apart’</a>. The important thing is that a score of zero means perfect correlation, and the higher the number, the more independent the pairs of observations in that bin are. So to interpret the chart above we can say that counties that are about 20 degrees (of latitude/longitude, ignoring curvature of the earth) apart or closer have some degree of correlation with each other; beyond that distance the measure more or less stabilises.</p>
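<p>To make that definition concrete, here is a minimal sketch in base R that computes an empirical semivariance by hand, as half the mean squared difference within each distance bin (which matches the quoted description when the differences average zero). The data are simulated toy points, and the names <code class="language-plaintext highlighter-rouge">pts</code> and <code class="language-plaintext highlighter-rouge">z</code> and the bin width of 1 are made up for illustration; these are not the county residuals, and this is only a rough illustration of the idea rather than a description of what <code class="language-plaintext highlighter-rouge">gstat</code> does internally:</p>

```r
# Toy data: 200 made-up points with a smooth spatial signal plus noise
set.seed(123)
n = 200
pts = data.frame(x = runif(n, 0, 10), y = runif(n, 0, 10))
pts$z = sin(pts$x / 2) + cos(pts$y / 2) + rnorm(n, sd = 0.3)

# All unique pairs of points, with their distance and squared difference
idx = t(combn(n, 2))
d = sqrt((pts$x[idx[, 1]] - pts$x[idx[, 2]]) ^ 2 +
         (pts$y[idx[, 1]] - pts$y[idx[, 2]]) ^ 2)
sq_diff = (pts$z[idx[, 1]] - pts$z[idx[, 2]]) ^ 2

# Bin the distances; semivariance is half the mean squared difference per bin
# (bins with no pairs in them come back as NA)
bins = cut(d, breaks = 0:15)
semivariance = tapply(sq_diff, bins, mean) / 2
round(semivariance, 2)
```

<p>With toy data like these, the semivariance should be lowest for the closest pairs, because nearby points share most of the smooth signal and differ mainly by noise.</p>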
<p>I don’t know why spatial statistics doesn’t just use a good-ole correlation coefficient for this job, but presume there are interesting historical reasons. To check that I wasn’t mangling things, I decided to “roll my own” spatial correlation measure, with:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># let's roll our own on a similar concept to see what's happening</span><span class="w">
</span><span class="n">counties</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">combined2</span><span class="p">,</span><span class="w"> </span><span class="n">county_fips</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w">
</span><span class="c1"># find each county's distance from every other county</span><span class="w">
</span><span class="n">county_pairs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">expand_grid</span><span class="p">(</span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">counties</span><span class="o">$</span><span class="n">county_fips</span><span class="p">,</span><span class="w">
</span><span class="n">to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">counties</span><span class="o">$</span><span class="n">county_fips</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="n">from</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="n">to</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">left_join</span><span class="p">(</span><span class="n">counties</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"from"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"county_fips"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">rename</span><span class="p">(</span><span class="n">fx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">fy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">left_join</span><span class="p">(</span><span class="n">counties</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"to"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"county_fips"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">rename</span><span class="p">(</span><span class="n">tx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">ty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">distance</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sqrt</span><span class="p">((</span><span class="n">fx</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">tx</span><span class="p">)</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">fy</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">ty</span><span class="p">)</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="n">res7</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">combined2</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">residuals</span><span class="p">(</span><span class="n">model7</span><span class="o">$</span><span class="n">lme</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"response"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="n">county_fips</span><span class="p">,</span><span class="w"> </span><span class="n">res</span><span class="p">)</span><span class="w">
</span><span class="n">county_pairs</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">left_join</span><span class="p">(</span><span class="n">rename</span><span class="p">(</span><span class="n">res7</span><span class="p">,</span><span class="w"> </span><span class="n">from_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">),</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"from"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"county_fips"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">left_join</span><span class="p">(</span><span class="n">rename</span><span class="p">(</span><span class="n">res7</span><span class="p">,</span><span class="w"> </span><span class="n">to_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">),</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"to"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"county_fips"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">distance</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cut</span><span class="p">(</span><span class="n">distance</span><span class="p">,</span><span class="w"> </span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="m">6</span><span class="p">,</span><span class="w"> </span><span class="m">8</span><span class="p">,</span><span class="m">12</span><span class="p">,</span><span class="m">24</span><span class="p">,</span><span class="m">48</span><span class="p">,</span><span class="m">108</span><span class="p">)))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">distance</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">summarise</span><span class="p">(</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cor</span><span class="p">(</span><span class="n">from_res</span><span class="p">,</span><span class="w"> </span><span class="n">to_res</span><span class="p">),</span><span class="w">
</span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">())</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">distance</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">correlation</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"red"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_size_area</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">comma</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Correlation between pairs of counties' residuals"</span><span class="p">,</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Distance between two counties"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Correlation in residuals from model7"</span><span class="p">,</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Number of county-pairs"</span><span class="p">)</span></code></pre></figure>
<p>That gives me this chart, which is more like the sort of thing we use in time series analysis.</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286c-spatial-correlation.svg" width="100%"><img src="https://freerangestats.info/img/0286c-spatial-correlation.png" width="100%" /></object>
<p>I’m pleased it tells a similar story to the variogram (which is more of a black box to me): the correlation between pairs of counties’ residuals is above zero for counties that are within 6 degrees/units of each other, and stabilises at a small negative number from about 20 units apart onwards.</p>
<p>My conclusion from this is that yes, there is still some spatial autocorrelation that should be taken into account, particularly for those counties very close together (around 2 degrees/units apart or less).</p>
<p>Now, there is a tricky problem with the <em>shape</em> of spatial autocorrelation. Even if you’re prepared to assume (as I am in this case) that it is symmetrical north-south and east-west (not always the case when it depends on eg wind), there are differing shapes of decay in the level of correlation, as a function of distance. Experts can apparently/allegedly judge which method is best by looking at the shape of the variogram, but I feel safest by trying all five correlation structures available to me and choosing the one with the lowest AIC.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># to help us decide the shape of the spatial autocorrelation</span><span class="w">
</span><span class="n">model8</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">list</span><span class="p">()</span><span class="w">
</span><span class="n">model8</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corExp</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="n">model8</span><span class="p">[[</span><span class="m">2</span><span class="p">]]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corGaus</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="n">model8</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corLin</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="n">model8</span><span class="p">[[</span><span class="m">4</span><span class="p">]]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corRatio</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="n">model8</span><span class="p">[[</span><span class="m">5</span><span class="p">]]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corSpher</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="n">sapply</span><span class="p">(</span><span class="n">model8</span><span class="p">,</span><span class="w"> </span><span class="err">\</span><span class="p">(</span><span class="n">m</span><span class="p">){</span><span class="nf">round</span><span class="p">(</span><span class="n">AIC</span><span class="p">(</span><span class="n">m</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w"> </span><span class="m">-1</span><span class="p">)})</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model7</span><span class="o">$</span><span class="n">lme</span><span class="p">)</span></code></pre></figure>
<p>That gets me these results:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; sapply(model8, \(m){round(AIC(m$lme), -1)})
[1] 8230 8600 8590 8330 8580
&gt; AIC(model7$lme)
[1] 8605.692
</code></pre></div></div>
<p>We see that all the models that adjust for spatial correlation are better than model7 (which doesn’t), but the models using <code class="language-plaintext highlighter-rouge">corExp</code> and <code class="language-plaintext highlighter-rouge">corRatio</code> are much better than the other three. The <code class="language-plaintext highlighter-rouge">corExp</code> model (<code class="language-plaintext highlighter-rouge">model8[[1]]</code>) is our new best model so far.</p>
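<p>To put rough numbers on those gaps, here is a minimal sketch that only redoes the arithmetic on the AIC values printed above. The vector names are made up for illustration, and because the <code class="language-plaintext highlighter-rouge">model8</code> values were rounded to the nearest 10, the small differences are only approximate.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># AIC values as printed above (model8 values are rounded to the nearest 10)
aic_model8 &lt;- c(8230, 8600, 8590, 8330, 8580)
aic_model7 &lt;- 8605.692

# improvement over model7: positive means the spatial model has the lower AIC
delta_vs_model7 &lt;- aic_model7 - aic_model8
delta_vs_model7

# gap between each of the five models and the best of them
aic_model8 - min(aic_model8)</code></pre></figure>
<p>On these figures the first and fourth models improve on model7 by roughly 380 and 280 AIC units, while the other three improve by only about 6 to 26.</p>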
<p>If we look at the t statistics for the coefficients of these five models, we see that for the first time our best model has a “non-significant” slope for cpe (ie depression), 1.695. Now, I’m not going to remove a variable from a complex mixed effects model like this because of a non-significant t statistic, but it is definitely noteworthy that the depression effect has become less important once we let it vary by state and do the best correction for spatial autocorrelation:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; sapply(model8, \(m){summary(m$gam)$p.t})
[,1] [,2] [,3] [,4] [,5]
(Intercept) -2.031622 -4.036009 -3.987122 -3.494438 -3.952872
cpe 1.695218 4.206850 4.193251 3.746362 4.167077
</code></pre></div></div>
<p>Also noteworthy though is that the other spatial correlation shapes still leave depression with a significant t statistic.</p>
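<p>For readers who prefer p-values to t statistics, the following is a rough sketch of the conversion. It assumes 3054 degrees of freedom for all five models, a figure borrowed from the <code class="language-plaintext highlighter-rouge">lme</code> summary of the first model; the other four may differ slightly, and the variable names are purely illustrative.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># t statistics for cpe, copied from the output above (one per correlation structure)
t_cpe &lt;- c(1.695218, 4.206850, 4.193251, 3.746362, 4.167077)

# ASSUMPTION: the same 3054 degrees of freedom for every model, taken from
# the lme summary of the first model
df_assumed &lt;- 3054

# two-sided p-values from the t distribution
p_cpe &lt;- 2 * pt(-abs(t_cpe), df = df_assumed)
signif(p_cpe, 2)

# which models clear the conventional 5% threshold?
p_cpe &lt; 0.05</code></pre></figure>
<p>Only the first model fails to clear the 5% threshold under this assumption, which matches the reading of the t statistics above.</p>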
<p>Here’s the full report from the <code class="language-plaintext highlighter-rouge">lme</code> part of the best fit so far:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; summary(model8[[1]]$lme)
Linear mixed-effects model fit by maximum likelihood
Data: data
AIC BIC logLik
8231.487 8291.901 -4105.744
Random effects:
Formula: ~Xr - 1 | g
Structure: pdIdnot
Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7 Xr8
StdDev: 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243
Xr9 Xr10 Xr11 Xr12 Xr13 Xr14 Xr15 Xr16
StdDev: 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243
Xr17 Xr18 Xr19 Xr20 Xr21 Xr22 Xr23 Xr24
StdDev: 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243
Xr25 Xr26 Xr27
StdDev: 6.68243 6.68243 6.68243
Formula: ~1 + cpe | state_name %in% g
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 2.702224 (Intr)
cpe 12.092468 -0.99
Residual 66.289425
Correlation Structure: Exponential spatial correlation
Formula: ~x + y | g/state_name
Parameter estimate(s):
range
0.7289188
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: list(fixed)
Value Std.Error DF t-value p-value
X(Intercept) -0.889436 0.4390918 3054 -2.0256268 0.0429
Xcpe 3.370112 1.9919865 3054 1.6918346 0.0908
Xs(x,y)Fx1 0.165268 0.1447102 3054 1.1420604 0.2535
Xs(x,y)Fx2 0.000906 0.1909150 3054 0.0047431 0.9962
Correlation:
X(Int) Xcpe X(,)F1
Xcpe -0.988
Xs(x,y)Fx1 -0.065 0.066
Xs(x,y)Fx2 -0.131 0.141 0.510
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-5.2880800 0.2592565 0.5963373 1.0331947 4.6052452
Number of Observations: 3107
Number of Groups:
g state_name %in% g
1 50
</code></pre></div></div>
<p>I believe that <code class="language-plaintext highlighter-rouge">Formula: ~Xr - 1 | g</code> bit refers to the <code class="language-plaintext highlighter-rouge">s(x,y)</code> term from the GAM when brought into the linearised LME fit. The range of 0.73 refers to the exponential spatial correlation.</p>
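<p>To get a feel for what that range implies, here is a small sketch of the exponential correlation function, which <code class="language-plaintext highlighter-rouge">nlme</code> documents for <code class="language-plaintext highlighter-rouge">corExp</code> as exp(-distance / range) when there is no nugget. The example distances are arbitrary, and they are in whatever units the <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> coordinates use, which I am not assuming anything about here.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># range parameter from the summary above
range_est &lt;- 0.7289188

# ASSUMPTION: arbitrary example distances, in the same units as x and y
d &lt;- c(0.1, 0.5, 1, 2, 5)

# exponential correlation with no nugget: exp(-distance / range)
rho &lt;- exp(-d / range_est)
data.frame(distance = d, correlation = round(rho, 3))

# distance at which the implied correlation has fallen to 0.05
-range_est * log(0.05)</code></pre></figure>
<p>The last line works out to about three times the range, so on this fit the residual correlation between two counties has largely died away once they are a bit more than 2 coordinate units apart.</p>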
<h2 id="race">Race</h2>
<p>Finally I’m ready to look at the question that’s of most substantive interest - does including race in the model make the “depression effect” go away, suggesting that depression is acting as a proxy for whiteness? I sourced data on county characteristics of various sorts from the US Census Bureau. I considered four candidate variables:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">white_alone</code> which is the proportion of the county that describe themselves as white and no other race</li>
<li><code class="language-plaintext highlighter-rouge">white_all</code> which is the proportion of the county that describe themselves as white, whether or not they also identify with another race</li>
<li><code class="language-plaintext highlighter-rouge">hispanic</code> which is the proportion of the county that describe themselves as Hispanic</li>
<li><code class="language-plaintext highlighter-rouge">hispanic_multi</code> which is the proportion of the county that describe themselves as Hispanic and of more than one race</li>
</ul>
<p>I deliberately left out the proportion of African-Americans in each county, assuming it would be very collinear with some combination of the others. If I were seriously interested in how race worked in this election I would probably have included it anyway.</p>
<p>Looking at these four variables (without peeking at their relationship to vote) gives me this pairs plot:</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286c-pairs.svg" width="100%"><img src="https://freerangestats.info/img/0286c-pairs.png" width="100%" /></object>
<p>which convinces me I should save a degree of freedom and drop the <code class="language-plaintext highlighter-rouge">white_all</code> variable from my modelling, as it contains virtually no information beyond what is already in the <code class="language-plaintext highlighter-rouge">white_alone</code> variable. Here’s the code to import that data and draw the pairs plot:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#---------------------data on 'race'-------------</span><span class="w">
</span><span class="c1"># county characteristics from US Census Bureau</span><span class="w">
</span><span class="c1"># see https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2020-2023/CC-EST2023-ALLDATA.pdf</span><span class="w">
</span><span class="c1"># for metadata</span><span class="w">
</span><span class="n">df</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"cc-est2023-alldata.csv"</span><span class="w">
</span><span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">file.exists</span><span class="p">(</span><span class="n">df</span><span class="p">)){</span><span class="w">
</span><span class="n">download.file</span><span class="p">(</span><span class="s2">"https://www2.census.gov/programs-surveys/popest/datasets/2020-2023/counties/asrh/cc-est2023-alldata.csv"</span><span class="p">,</span><span class="w">
</span><span class="n">destfile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1"># key columns:</span><span class="w">
</span><span class="c1"># TOT_POP total population</span><span class="w">
</span><span class="c1"># WA_MALE "White alone" male</span><span class="w">
</span><span class="c1"># WAC_MALE "white alone or in combination" male</span><span class="w">
</span><span class="c1"># H_MALE Hispanic male</span><span class="w">
</span><span class="c1"># HTOM_MALE Hispanic AND more races male</span><span class="w">
</span><span class="n">race</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_csv</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="c1"># just 2023 and TOTAL age group:</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="n">YEAR</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">AGEGRP</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">white_alone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">WA_MALE</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">WA_FEMALE</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">TOT_POP</span><span class="p">,</span><span class="w">
</span><span class="n">white_all</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">WAC_MALE</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">WAC_FEMALE</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">TOT_POP</span><span class="p">,</span><span class="w">
</span><span class="n">hispanic</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">H_MALE</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">H_FEMALE</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">TOT_POP</span><span class="p">,</span><span class="w">
</span><span class="n">hispanic_multi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">HTOM_MALE</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">HTOM_FEMALE</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">TOT_POP</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">county_fips</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="n">STATE</span><span class="p">,</span><span class="w"> </span><span class="n">COUNTY</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="n">white_alone</span><span class="o">:</span><span class="n">county_fips</span><span class="p">,</span><span class="w"> </span><span class="n">CTYNAME</span><span class="p">)</span><span class="w">
</span><span class="n">race</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">county_fips</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">CTYNAME</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ggpairs</span><span class="p">()</span></code></pre></figure>
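<p>A numeric companion to the pairs plot is a plain correlation matrix. The sketch below uses simulated toy data, not the Census figures, with a <code class="language-plaintext highlighter-rouge">white_all</code> column deliberately built as <code class="language-plaintext highlighter-rouge">white_alone</code> plus a small extra share, just to show what a near-duplicate variable looks like in <code class="language-plaintext highlighter-rouge">cor()</code> output.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># HYPOTHETICAL toy data standing in for the real `race` data frame;
# every number here is simulated, none are Census figures
set.seed(123)
n &lt;- 200
white_alone &lt;- runif(n, 0.3, 0.95)
toy_race &lt;- data.frame(
  white_alone = white_alone,
  # built as white_alone plus a small multiracial share, capped at 1
  white_all = pmin(white_alone + runif(n, 0, 0.04), 1),
  hispanic = runif(n, 0, 0.4),
  hispanic_multi = runif(n, 0, 0.02)
)

# pairwise correlations: values near 1 flag near-duplicate variables
round(cor(toy_race), 2)</code></pre></figure>
<p>With the real data the equivalent call would be something like <code class="language-plaintext highlighter-rouge">cor(select(race, -county_fips, -CTYNAME))</code>.</p>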
<p>Next I wanted to look at the relationship of these race variables to the logit of vote for Trump, to check if a linearity assumption was going to be reasonable. That got me this chart, which seemed linear enough for me (for my purposes):</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286c-exp-v-response.svg" width="100%"><img src="https://freerangestats.info/img/0286c-exp-v-response.png" width="100%" /></object>
<p>…produced with this code:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">combined4</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">combined2</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">left_join</span><span class="p">(</span><span class="n">race</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"county_fips"</span><span class="p">)</span><span class="w">
</span><span class="c1"># visual check that the counties joined correctly:</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="n">combined4</span><span class="p">,</span><span class="w"> </span><span class="n">county_name</span><span class="p">,</span><span class="w"> </span><span class="n">CTYNAME</span><span class="p">)</span><span class="w">
</span><span class="c1"># check for linearity of relationships to the response variable</span><span class="w">
</span><span class="n">logit</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">p</span><span class="p">){</span><span class="w">
</span><span class="nf">log</span><span class="p">(</span><span class="n">p</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="m">1</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">combined4</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">logit_gop</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">logit</span><span class="p">(</span><span class="n">per_gop</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="n">logit_gop</span><span class="p">,</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w">
</span><span class="n">`Depression incidence`</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cpe</span><span class="p">,</span><span class="w">
</span><span class="n">`Proportion only white`</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">white_alone</span><span class="p">,</span><span class="w">
</span><span class="n">`Proportion only hispanic`</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hispanic</span><span class="p">,</span><span class="w">
</span><span class="n">`Proportion hispanic plus another`</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hispanic_multi</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">gather</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">logit_gop</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">total_votes</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">logit_gop</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="n">value</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">),</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_smooth</span><span class="p">(</span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lm"</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">weight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">variable</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free_x"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Explanatory variable value"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"logit of vote for Trump, 2024"</span><span class="p">)</span></code></pre></figure>
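<p>As an aside on the hand-rolled <code class="language-plaintext highlighter-rouge">logit()</code> function used above, base R already ships the same transformation as <code class="language-plaintext highlighter-rouge">qlogis()</code>, with <code class="language-plaintext highlighter-rouge">plogis()</code> as its inverse. A minimal sketch, using made-up vote shares:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># hand-rolled logit, as in the code above
logit &lt;- function(p) {
  log(p / (1 - p))
}

# example vote shares (made up for illustration)
p &lt;- c(0.2, 0.5, 0.8)

# base R's qlogis() is the same transformation
all.equal(logit(p), qlogis(p))

# plogis() is the inverse, taking logits back to proportions
all.equal(plogis(logit(p)), p)

# shares of exactly 0 or 1 map to -Inf and Inf
logit(c(0, 1))</code></pre></figure>
<p>The last line is a reminder that any county with a vote share of exactly 0 or 1 would drop out of a chart drawn on the logit scale.</p>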
<p>OK so now I fit a whole bunch of different models, to be confident that no individual modelling decision of mine was driving my final conclusions. I fit models with many of the combinations of: using a rubber mat to deal with spatial autocorrelation; explicitly modelling the spatial autocorrelation; random or fixed slopes (ie varying by state or not) for <code class="language-plaintext highlighter-rouge">cpe</code> (depression); and random or fixed slopes for the race variables. Then I calculated the AIC of all these models and the t statistics for <code class="language-plaintext highlighter-rouge">cpe</code> (depression), which gets me this table, sorted with the best models (lowest AIC) at the top:</p>
<table class="table" style="margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:right;"> model </th>
<th style="text-align:right;"> AIC </th>
<th style="text-align:right;"> rubber_mat </th>
<th style="text-align:right;"> random_cpe_slope </th>
<th style="text-align:right;"> race </th>
<th style="text-align:right;"> random_race_slope </th>
<th style="text-align:right;"> SAC_fix </th>
<th style="text-align:right;"> cpe_t_stat </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right;"> 15 </td>
<td style="text-align:right;"> 5676.498 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> -1.612598 </td>
</tr>
<tr>
<td style="text-align:right;"> 13 </td>
<td style="text-align:right;"> 5904.910 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> -1.900880 </td>
</tr>
<tr>
<td style="text-align:right;"> 10 </td>
<td style="text-align:right;"> 6063.537 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> -1.581859 </td>
</tr>
<tr>
<td style="text-align:right;"> 14 </td>
<td style="text-align:right;"> 6241.311 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> -4.612077 </td>
</tr>
<tr>
<td style="text-align:right;"> 11 </td>
<td style="text-align:right;"> 6416.801 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> -2.996010 </td>
</tr>
<tr>
<td style="text-align:right;"> 12 </td>
<td style="text-align:right;"> 7136.013 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 5.347793 </td>
</tr>
<tr>
<td style="text-align:right;"> 8 </td>
<td style="text-align:right;"> 8231.487 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1.695218 </td>
</tr>
<tr>
<td style="text-align:right;"> 9 </td>
<td style="text-align:right;"> 8246.613 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 2.162823 </td>
</tr>
</tbody>
</table>
<p>Lots of interesting things here, but most important:</p>
<ul>
<li>The best models are definitely those that include race</li>
<li>The best model of all is also the most complex - random slopes for both race and depression, rubber mat, plus addressing the spatial autocorrelation (SAC) explicitly with an exponential correlation structure</li>
<li>The best models all have a <em>negative</em> sign for <code class="language-plaintext highlighter-rouge">cpe</code>, indicating that after controlling for race, it is the counties with <em>less</em> depression that were more likely to vote for Trump - a compelling change in narrative from looking at the data without controlling for race. But in the best model of all, depression isn’t very important.</li>
<li>One model that I describe in the code below as “definitely illegitimate because it makes no attempt at all to correct for spatial autocorrelation” is the one that gives the biggest positive effect (as in t statistics) for the depression impact on voting for Trump</li>
</ul>
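<p>Those bullet points can be checked mechanically against the table. The sketch below simply retypes a few of the table's columns into a data frame (the name <code class="language-plaintext highlighter-rouge">results</code> is mine, for illustration) and tests each claim in turn.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># values copied from the comparison table above
results &lt;- data.frame(
  model      = c(15, 13, 10, 14, 11, 12, 8, 9),
  AIC        = c(5676.498, 5904.910, 6063.537, 6241.311,
                 6416.801, 7136.013, 8231.487, 8246.613),
  race       = c(1, 1, 1, 1, 1, 1, 0, 0),
  SAC_fix    = c(1, 1, 1, 1, 1, 0, 1, 1),
  cpe_t_stat = c(-1.612598, -1.900880, -1.581859, -4.612077,
                 -2.996010, 5.347793, 1.695218, 2.162823)
)

# does every model that includes race beat every model that does not?
max(results$AIC[results$race == 1]) &lt; min(results$AIC[results$race == 0])

# is the cpe t statistic negative in all models with race and an explicit SAC fix?
with(subset(results, race == 1 &amp; SAC_fix == 1), all(cpe_t_stat &lt; 0))

# which model has the largest positive cpe t statistic?
results$model[which.max(results$cpe_t_stat)]</code></pre></figure>
<p>The first two checks come back <code class="language-plaintext highlighter-rouge">TRUE</code>, and the last picks out model 12, the one with no correction for spatial autocorrelation.</p>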
<p>I regard this as good evidence that in fact incidence of depression by county wasn’t important in driving the vote for Trump, but that race composition of counties probably was (or at least ‘more likely’ was). Of course, how that mechanism works, and whether ‘whiteness’ here is in fact standing in for something else, is beyond the scope of this blog post (already far too long) to explain.</p>
<p>Here’s the code that fits all those candidate models described above and generates the table:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># no rubber mat, no race, but does have spatial autocorrelation</span><span class="w">
</span><span class="n">model9</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="p">,</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corExp</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined2</span><span class="p">)</span><span class="w">
</span><span class="c1"># no rubber mat, but race:</span><span class="w">
</span><span class="n">model10</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic_multi</span><span class="p">,</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corExp</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined4</span><span class="p">)</span><span class="w">
</span><span class="c1"># no random slopes or rubber mat, but race:</span><span class="w">
</span><span class="n">model11</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic_multi</span><span class="p">,</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corExp</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined4</span><span class="p">)</span><span class="w">
</span><span class="c1"># no random slope, rubber mat or spatial autocorrelation fix at all. this model</span><span class="w">
</span><span class="c1"># is definitely illegitimate in that it makes no effort to fix for spatial issues.</span><span class="w">
</span><span class="n">model12</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic_multi</span><span class="p">,</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined4</span><span class="p">)</span><span class="w">
</span><span class="c1"># rubber mat, race, random slope for cpe</span><span class="w">
</span><span class="n">model13</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic_multi</span><span class="p">,</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corExp</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined4</span><span class="p">)</span><span class="w">
</span><span class="c1"># rubber mat, race, only random intercept</span><span class="w">
</span><span class="n">model14</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic_multi</span><span class="p">,</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corExp</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined4</span><span class="p">)</span><span class="w">
</span><span class="c1"># fullest model so far PLUS giving random slopes to the nuisance variables</span><span class="w">
</span><span class="c1"># rubber mat, race, random slope for cpe &amp; random slope for the two main</span><span class="w">
</span><span class="c1"># race variables:</span><span class="w">
</span><span class="n">model15</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic_multi</span><span class="p">,</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corExp</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined4</span><span class="p">)</span><span class="w">
</span><span class="c1"># compare the no-rubber mat models (expect #15 to be best, with random slopes</span><span class="w">
</span><span class="c1"># for CPE and race - the most flexibility)</span><span class="w">
</span><span class="n">tibble</span><span class="p">(</span><span class="n">model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">8</span><span class="o">:</span><span class="m">15</span><span class="p">,</span><span class="w">
</span><span class="n">AIC</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">AIC</span><span class="p">(</span><span class="n">model8</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model9</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model10</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model11</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model12</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model13</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model14</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w">
</span><span class="n">AIC</span><span class="p">(</span><span class="n">model15</span><span class="o">$</span><span class="n">lme</span><span class="p">)),</span><span class="w">
</span><span class="n">rubber_mat</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">random_cpe_slope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">race</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">random_race_slope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">SAC_fix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">cpe_p_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model8</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.pv</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model9</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.pv</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model10</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.pv</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model11</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.pv</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model12</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.pv</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model13</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.pv</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model14</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.pv</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model15</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.pv</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">]),</span><span class="w">
</span><span class="n">cpe_t_stat</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model8</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.t</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model9</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.t</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model10</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.t</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model11</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.t</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model12</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.t</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model13</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.t</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model14</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.t</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">],</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model15</span><span class="o">$</span><span class="n">gam</span><span class="p">)</span><span class="o">$</span><span class="n">p.t</span><span class="p">[</span><span class="s1">'cpe'</span><span class="p">])</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">cpe_p_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">round</span><span class="p">(</span><span class="n">cpe_p_value</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">arrange</span><span class="p">(</span><span class="n">AIC</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="c1"># for space decided not to show this column, can just use t stat</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">cpe_p_value</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">knitr</span><span class="o">::</span><span class="n">kable</span><span class="p">(</span><span class="n">format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"html"</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">kable_styling</span><span class="p">()</span><span class="o">|&gt;</span><span class="w">
</span><span class="n">writeClipboard</span><span class="p">()</span></code></pre></figure>
<p>Here are the key results from the overall winning model:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; summary(model15$lme)
Linear mixed-effects model fit by maximum likelihood
Data: data
AIC BIC logLik
5676.498 5797.326 -2818.249
Random effects:
Formula: ~Xr - 1 | g
Structure: pdIdnot
Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7
StdDev: 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787
Xr8 Xr9 Xr10 Xr11 Xr12 Xr13 Xr14
StdDev: 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787
Xr15 Xr16 Xr17 Xr18 Xr19 Xr20 Xr21
StdDev: 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787
Xr22 Xr23 Xr24 Xr25 Xr26 Xr27
StdDev: 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787
Formula: ~1 + cpe + white_alone + hispanic | state_name %in% g
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 1.503864 (Intr) cpe wht_ln
cpe 6.544726 -0.761
white_alone 1.055536 -0.437 -0.188
hispanic 1.719388 -0.134 -0.152 0.180
Residual 43.651012
Correlation Structure: Exponential spatial correlation
Formula: ~x + y | g/state_name
Parameter estimate(s):
range
0.8315562
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: list(fixed)
Value Std.Error DF t-value p-value
X(Intercept) -2.395289 0.271082 3051 -8.836033 0.0000
Xcpe -1.862160 1.155263 3051 -1.611892 0.1071
Xwhite_alone 3.678207 0.191161 3051 19.241375 0.0000
Xhispanic -0.544371 0.327396 3051 -1.662729 0.0965
Xhispanic_multi 1.751862 4.307237 3051 0.406725 0.6842
Xs(x,y)Fx1 0.202288 0.093536 3051 2.162664 0.0306
Xs(x,y)Fx2 0.429865 0.131173 3051 3.277087 0.0011
Correlation:
X(Int) Xcpe Xwht_l Xhspnc Xhspn_ X(,)F1
Xcpe -0.761
Xwhite_alone -0.461 -0.164
Xhispanic -0.178 -0.053 0.161
Xhispanic_multi -0.002 -0.114 0.101 -0.278
Xs(x,y)Fx1 -0.051 0.030 0.028 -0.032 0.157
Xs(x,y)Fx2 -0.121 0.101 0.083 0.000 -0.081 0.350
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-6.1786290 0.2312745 0.4830700 0.8228585 5.5847101
Number of Observations: 3107
Number of Groups:
g state_name %in% g
1 50
</code></pre></div></div>
<p>Looking at that table of fixed effects, we see the <code class="language-plaintext highlighter-rouge">white_alone</code> variable is definitely significant (t = 19.2), as are the fixed terms of the rubber mat <code class="language-plaintext highlighter-rouge">s(x,y)</code> spatial correlation absorber, while <code class="language-plaintext highlighter-rouge">cpe</code> is not (p = 0.107). So we could tentatively conclude that whiteness strongly contributed to the vote for Trump; that depression incidence didn’t contribute anywhere near as much; and that there was a strong spatial correlation not explained by either whiteness or depression.</p>
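<p>Because the model uses the <code class="language-plaintext highlighter-rouge">quasibinomial</code> family, whose default link is the logit, the fixed effects in the table above are on the log-odds scale. A rough sketch of turning them into odds ratios follows. The estimates and standard errors are copied from the output above; the ten percentage point step, and the assumption that both predictors are stored as proportions between 0 and 1, are mine, and the intervals are simple Wald approximations rather than anything the model reports.</p>

```r
# Fixed-effect estimates and standard errors copied from the
# summary(model15$lme) output above (logit scale)
beta <- c(white_alone = 3.678207, cpe = -1.862160)
se   <- c(white_alone = 0.191161, cpe = 1.155263)

# Assumption: both predictors are proportions in [0, 1],
# so a step of 0.10 means ten percentage points
step <- 0.10

# Odds ratio for a ten-point increase, with a rough Wald 95% interval
odds_ratio <- exp(beta * step)
lower      <- exp((beta - 1.96 * se) * step)
upper      <- exp((beta + 1.96 * se) * step)

print(round(cbind(odds_ratio, lower, upper), 2))
```

<p>Under those assumptions the interval for <code class="language-plaintext highlighter-rouge">white_alone</code> sits well above 1, while the interval for <code class="language-plaintext highlighter-rouge">cpe</code> straddles 1, which matches the t statistics in the table. These are average effects across states, holding the other terms fixed.</p>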
<p>Looking for a way to summarise all this, I came up with this chart of the partial relationship of whiteness and of depression incidence to the vote for Trump, <em>after</em> controlling for the other things in the final model:</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286c-partials.svg" width="100%"><img src="https://freerangestats.info/img/0286c-partials.png" width="100%" /></object>
<p>That was produced with this code, which involved fitting a new model to produce the residuals after controlling for race but not for depression (one combination that hadn’t yet been done in the frenzy of model-fitting above):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#--------------------partial charts-------------------</span><span class="w">
</span><span class="c1"># model 8[[1]] is the full model except for race. we also need a full model except</span><span class="w">
</span><span class="c1"># for depression (cpe). then we will use the residuals from each for some charts</span><span class="w">
</span><span class="n">model17</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gamm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic_multi</span><span class="p">,</span><span class="w">
</span><span class="n">random</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">state_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">white_alone</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">hispanic</span><span class="p">),</span><span class="w">
</span><span class="n">correlation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">corExp</span><span class="p">(</span><span class="n">form</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">~</span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined4</span><span class="p">)</span><span class="w">
</span><span class="c1"># Difference between this and model 15 is no depression variable</span><span class="w">
</span><span class="nf">c</span><span class="p">(</span><span class="n">AIC</span><span class="p">(</span><span class="n">model17</span><span class="o">$</span><span class="n">lme</span><span class="p">),</span><span class="w"> </span><span class="n">AIC</span><span class="p">(</span><span class="n">model15</span><span class="o">$</span><span class="n">lme</span><span class="p">))</span><span class="w">
</span><span class="n">p5</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">combined4</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">after_cpe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">residuals</span><span class="p">(</span><span class="n">model8</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="o">$</span><span class="n">gam</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"response"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">white_alone</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">after_cpe</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_smooth</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">weight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">),</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lm"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">,</span><span class="w"> </span><span class="n">limits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">-0.4</span><span class="p">,</span><span class="w"> </span><span class="m">0.6</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_size_area</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">comma_format</span><span class="p">(</span><span class="n">suffix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"m"</span><span class="p">,</span><span class="w"> </span><span class="n">scale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1e-6</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Percentage of county that is 'white' as its only race"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Residual vote for Trump"</span><span class="p">,</span><span class="w">
</span><span class="n">subtitle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"After controlling for counties' depression incidence"</span><span class="p">,</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Partial relationship of 'whiteness' and Trump vote"</span><span class="p">,</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Total votes, 2024:"</span><span class="p">)</span><span class="w">
</span><span class="n">p6</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">combined4</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">after_race</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">residuals</span><span class="p">(</span><span class="n">model17</span><span class="o">$</span><span class="n">gam</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"response"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cpe</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">after_race</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_smooth</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">weight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">),</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lm"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">,</span><span class="w"> </span><span class="n">limits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">-0.4</span><span class="p">,</span><span class="w"> </span><span class="m">0.6</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_size_area</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">comma_format</span><span class="p">(</span><span class="n">suffix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"m"</span><span class="p">,</span><span class="w"> </span><span class="n">scale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1e-6</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Incidence of diagnosed depression in each county"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Residual vote for Trump"</span><span class="p">,</span><span class="w">
</span><span class="n">subtitle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"After controlling for counties' racial composition"</span><span class="p">,</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Partial relationship of depression incidence and Trump vote"</span><span class="p">,</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Total votes, 2024:"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Source: voting from tonmcg, depression from CDC, race from US Census Bureau; analysis by freerangestats.info"</span><span class="p">)</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">p5</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">p6</span><span class="p">)</span></code></pre></figure>
<p>That’s it for today. I hope that’s of interest to someone, either as a messy but realistic modelling strategy case study, or for those interested in the specific issue of depression and the 2024 election.</p>
</description>
<pubDate>Fri, 03 Jan 2025 00:00:00 +0800</pubDate>
<link>https://freerangestats.info/blog/2025/01/03/depression-and-vote-again</link>
<guid isPermaLink="true">https://freerangestats.info/blog/2025/01/03/depression-and-vote-again</guid>
</item>
<item>
<title>Depression incidence by county and vote for Trump by @ellis2013nz</title>
<description><p>A <a href="https://bsky.app/profile/mchinn.bsky.social/post/3ldwkfv7uz22v">skeet floated across my Bluesky feed</a> that looked at the cross-sectional relationship between incidence of depression in 2020 and voting for Trump in the 2024 Presidential election. The data in the skeet and immediate blog post was at state level, but the hypothesis of interest in an article that sparked all this was an individual one (are depressed people voting for Trump). I don’t have the individual level microdata that might help explore the actual hypothesis, but I was surprised to see that the state-level regression showed significant evidence of an effect, and wondered if this would hold at the county level, for which data is still relatively accessible.</p>
<p>This also led me down an interesting but familiar rabbit hole of multilevel modelling in the presence of a spatial correlation nuisance.</p>
<h2 id="ordinary-least-squares">Ordinary Least Squares</h2>
<p>Well, it turns out depression at the county level does correlate with voting for Trump, as we can see with this first, simple chart which shows the expected vote based on a model fit with ordinary least squares (OLS):</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286-ols.svg" width="100%"><img src="https://freerangestats.info/img/0286-ols.png" width="100%" /></object>
<p>I’ll be fitting some more fancy models and getting better charts further down, but the basic message is the same as in this one - counties with higher incidence of depression had a higher proportion of their vote going to Trump than was the case with counties with lower levels of depression.</p>
<p>Before I say anything else or show any code, let’s get straight that this is very possibly not a causal link. In fact there are at least three things that are plausibly happening here:</p>
<ol>
<li>People who are more depressed were more likely to vote for Trump (or less likely to turn up to vote against him, which given voluntary voting, has a similar result although for importantly different reasons)</li>
<li>People (who may themselves be not depressed) who are in areas with lots of depressed people around them were more likely to vote for Trump (eg because voters think “Trump will be able to do something about all the depressed people around here”)</li>
<li>Some underlying factor (eg economic, social or cultural conditions) that leads to some areas having higher rates of depression also leads to higher votes for Trump, through some other mechanism</li>
</ol>
<p>My expectation is that #3 is the most likely explanation, but I personally don’t actually have evidence to choose between them. Nor are these hypotheses mutually exclusive; two or all of them might be true at once.</p>
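<p>To make explanation #3 concrete, here is a minimal simulation sketch. Everything in it is invented for illustration: the <code class="language-plaintext highlighter-rouge">hardship</code> variable, the effect sizes and the noise levels are all assumptions, not estimates from real data. Depression has no direct effect on the vote in this toy world, yet a naive regression still finds a strong positive slope.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">set.seed(123)
n &lt;- 2000

# Hypothetical unobserved county-level factor (eg economic conditions)
hardship &lt;- rnorm(n)

# Both variables depend on hardship, but not on each other
depression &lt;- 0.20 + 0.03 * hardship + rnorm(n, sd = 0.01)
trump_vote &lt;- plogis(0.3 + 0.8 * hardship + rnorm(n, sd = 0.3))

# Naive model: depression looks strongly related to the vote
naive &lt;- coef(lm(trump_vote ~ depression))["depression"]

# Adjusting for the confounder shrinks the depression slope towards zero
adjusted &lt;- coef(lm(trump_vote ~ depression + hardship))["depression"]

c(naive = unname(naive), adjusted = unname(adjusted))</code></pre></figure>
<p>In real data the underlying factor is of course not observed so neatly, which is why the county-level correlation on its own cannot distinguish between the three explanations.</p>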
<p>OK here’s the code that gets that data and produces the first chart and a simple model with a statistically significant effect:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readxl</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">mgcv</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">lme4</span><span class="p">)</span><span class="w">
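</span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">scales</span><span class="p">)</span><span class="w"> </span><span class="c1"># for percent and comma_format in the chart labels</span><span class="w">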
</span><span class="c1"># county level prevalence of depression at (have to hit the 'download' button)</span><span class="w">
</span><span class="c1"># https://stacks.cdc.gov/view/cdc/129404</span><span class="w">
</span><span class="n">dep</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_excel</span><span class="p">(</span><span class="s2">"cdc_129404_DS1.xlsx"</span><span class="p">,</span><span class="w"> </span><span class="n">skip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">fn</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"2024_US_County_Level_Presidential_Results.csv"</span><span class="w">
</span><span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">file.exists</span><span class="p">(</span><span class="n">fn</span><span class="p">)){</span><span class="w">
</span><span class="n">download.file</span><span class="p">(</span><span class="s2">"https://raw.githubusercontent.com/tonmcg/US_County_Level_Election_Results_08-24/refs/heads/master/2024_US_County_Level_Presidential_Results.csv"</span><span class="p">,</span><span class="w">
</span><span class="n">destfile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">votes</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_csv</span><span class="p">(</span><span class="s2">"2024_US_County_Level_Presidential_Results.csv"</span><span class="p">)</span><span class="w">
</span><span class="n">combined</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">votes</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">inner_join</span><span class="p">(</span><span class="n">dep</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"county_fips"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"CountyFIPS code"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">cpe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Crude Prevalence Estimate`</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w">
</span><span class="n">aape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Age-adjusted Prevalence Estimate`</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">100</span><span class="p">)</span><span class="w">
</span><span class="c1"># what was missed in this join?</span><span class="w">
</span><span class="n">votes</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">anti_join</span><span class="p">(</span><span class="n">dep</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"county_fips"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"CountyFIPS code"</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">count</span><span class="p">(</span><span class="n">state_name</span><span class="p">)</span><span class="w">
</span><span class="c1"># 37 counties in Alaska, 9 in Connecticut and 7 in DC. Will ignore these</span><span class="w">
</span><span class="c1"># for my purposes.</span><span class="w">
</span><span class="c1">#========================modelling==================</span><span class="w">
</span><span class="c1">#----------Ordinary Least Squares------------------</span><span class="w">
</span><span class="n">model</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model</span><span class="p">)</span><span class="w">
</span><span class="c1"># note several things could be happening here:</span><span class="w">
</span><span class="c1"># - being depressed makes you vote for Trump</span><span class="w">
</span><span class="c1"># - being around depressed people makes you vote for Trump</span><span class="w">
</span><span class="c1"># - some underlying condition (eg economic) both leads to higher depression</span><span class="w">
</span><span class="c1"># and more likely to vote for Trump. This seems the most likely.</span><span class="w">
</span><span class="n">the_caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Source: data from tonmcg and CDC; analysis by freerangestats.info"</span><span class="w">
</span><span class="c1"># draw chart:</span><span class="w">
</span><span class="n">combined</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="w"> </span><span class="n">cpe</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">per_gop</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"steelblue"</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_smooth</span><span class="p">(</span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lm"</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.8</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Crude prevalence estimate of depression"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Percentage vote for Trump in 2024 election"</span><span class="p">,</span><span class="w">
</span><span class="n">subtitle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Line is ordinary least squares fit to all county data together"</span><span class="p">,</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Counties with more depression voted more for Trump"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">the_caption</span><span class="p">)</span></code></pre></figure>
<p>Excerpt from the results:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Coefficients:
Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) 0.40002 0.01861 21.49 &lt;2e-16 ***
cpe 1.27547 0.08712 14.64 &lt;2e-16 ***
</code></pre></div></div>
<h2 id="generalized-linear-model-with-a-quasibinomial-response">Generalized linear model with a quasibinomial response</h2>
<p>Now, I wanted to improve this for all sorts of reasons, although I suspected it was actually good enough for pragmatic purposes - case proven really, that counties with more depressed people voted more for Trump. Proven enough to justify the further work with additional data needed to explore why. But I had some statistical loose ends to tidy up. First of which is that vote is a proportion, and it feels icky to use OLS (which can send values above 1 and below 0) to model a proportion when we have generalized linear models (GLMs) designed for the job and easily available.</p>
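<p>As a rough illustration of that concern, the rounded OLS coefficients from the excerpt above can be plugged in by hand. This is only a back-of-envelope sketch; the 20% prevalence value is an arbitrary example of mine.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Rounded OLS estimates copied from the summary excerpt above
intercept &lt;- 0.40002
slope &lt;- 1.27547

# Predicted Trump vote share at an example crude prevalence of 20%
pred_20 &lt;- intercept + slope * 0.20

# Prevalence at which the straight line would predict 100% of the vote
cpe_at_100 &lt;- (1 - intercept) / slope

round(c(pred_20 = pred_20, cpe_at_100 = cpe_at_100), 3)</code></pre></figure>
<p>So the fitted line predicts about 65.5% for Trump at 20% prevalence, and would predict more than 100% of the vote for any county with crude prevalence above roughly 47%. That is probably outside the observed range, but it shows why a straight line is an awkward model for a proportion.</p>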
<p>I didn’t want to use a straight logistic regression because the county-level data is far more dispersed than would be expected if it really were individuals making their own voting decisions. But a GLM with a quasi-binomial response keeps the link function used in logistic regression and the relationship of the mean and variance, while allowing the variance to be larger (or smaller) by some consistent ratio. Here’s what I get with that GLM:</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286-glm.svg" width="100%"><img src="https://freerangestats.info/img/0286-glm.png" width="100%" /></object>
<p>…created with this code. Note that we have now started to weight counties by their total number of votes, which makes particular sense for a binomial or similar family response. I suspect this weighting is one of the key reasons the line sits lower than in the OLS version. The other key difference of course is that the line is now slightly curved, as the ‘linear’ in a GLM is on the link scale, not the original scale of the response that is used for this chart.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#----------------Quasibinomial GLM----------------</span><span class="w">
</span><span class="n">model2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">glm</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="p">,</span><span class="w">
</span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">quasibinomial</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined</span><span class="p">,</span><span class="w"> </span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model2</span><span class="p">)</span><span class="w">
</span><span class="n">preds2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">predict</span><span class="p">(</span><span class="n">model2</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"response"</span><span class="p">,</span><span class="w"> </span><span class="n">se.fit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1"># draw chart:</span><span class="w">
</span><span class="n">combined</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">fit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">preds2</span><span class="o">$</span><span class="n">fit</span><span class="p">,</span><span class="w">
</span><span class="n">se</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">preds2</span><span class="o">$</span><span class="n">se.fit</span><span class="p">,</span><span class="w">
</span><span class="n">lower</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fit</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">1.96</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">se</span><span class="p">,</span><span class="w">
</span><span class="n">upper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fit</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1.96</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">se</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cpe</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">per_gop</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"steelblue"</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_ribbon</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">ymin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lower</span><span class="p">,</span><span class="w"> </span><span class="n">ymax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">upper</span><span class="p">),</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_line</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fit</span><span class="p">),</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">limits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">percent</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Crude prevalence estimate of depression"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Percentage vote for Trump in 2024 election"</span><span class="p">,</span><span class="w">
</span><span class="n">subtitle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Line is generalized linear model with quasibinomial response, fit to all county data together"</span><span class="p">,</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Counties with more depression voted more for Trump"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">the_caption</span><span class="p">)</span></code></pre></figure>
<p>Here’s an excerpt from that summary of model2:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Coefficients:
Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) -1.82178 0.06823 -26.70 &lt;2e-16 ***
cpe 9.25748 0.34153 27.11 &lt;2e-16 ***
</code></pre></div></div>
<p>We see strong (“significant”) evidence that the coefficient for <code class="language-plaintext highlighter-rouge">cpe</code> (crude prevalence estimate, ie not the age-standardised one) isn’t zero, with a point estimate of 9.3 on the logit scale and a standard error of only 0.3.</p>
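<p>Because those coefficients are on the logit (link) scale, they are hard to read directly. A small sketch of converting them back to vote proportions follows, using the rounded estimates above; the three prevalence values are arbitrary examples, not taken from the data.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Rounded quasibinomial GLM estimates copied from the summary excerpt above
intercept &lt;- -1.82178
slope &lt;- 9.25748

# Three example crude prevalence values (chosen only for illustration)
cpe &lt;- c(0.15, 0.20, 0.25)

# plogis() is the inverse of the logit link used by the (quasi)binomial family
p &lt;- plogis(intercept + slope * cpe)
round(p, 3)</code></pre></figure>
<p>This gives expected Trump vote shares of roughly 39%, 51% and 62% at crude prevalence of 15%, 20% and 25% respectively, consistent with the gently curved line in the chart.</p>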
<h2 id="introducing-a-state-effect">Introducing a state effect</h2>
<p>One thing that all the world knows is how spatially-based are US politics. Everything is thought of in terms of states, in particular, and smaller areas sometimes. It follows naturally from the ways that US politics is discussed that we should use a multi-level ie mixed-effects model with state as a random intercept, when looking at something like the overall relationship between depression and voting. In other words, we have to let the vote for Trump in any state vary for all the state-specific things that aren’t in our model, and see if after doing that we still get an overall (constant nation-wide) relationship between depression and voting in the counties within each state.</p>
<p>I often reach for the <code class="language-plaintext highlighter-rouge">lme4</code> library by Bates, Bolker and Walker as my starting point for mixed effects models, and these are the results in this case:</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286-glmer.svg" width="100%"><img src="https://freerangestats.info/img/0286-glmer.png" width="100%" /></object>
<p>Note we have a bunch of parallel (on link scale) lines, one per state, with their height varying by the state random effect. I love the effect of this chart, but unfortunately <code class="language-plaintext highlighter-rouge">lme4::glmer</code> which is used in this case doesn’t let us use a quasibinomial family response; we have to use a binomial response which forces the variance to equal <code class="language-plaintext highlighter-rouge">p(1-p)/n</code>, not just be proportional to it. The net result is that the confidence intervals are much narrower than is justified.</p>
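<p>The cost of forcing a binomial variance can be seen in a small simulation. This is a sketch on made-up data, not the county data: the county sizes, the coefficients and the amount of extra county-level noise are all assumptions. It uses plain <code class="language-plaintext highlighter-rouge">glm()</code> without random effects, only to show how the two families treat overdispersion.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">set.seed(42)
n_counties &lt;- 500
voters &lt;- sample(500:50000, n_counties, replace = TRUE)  # made-up county sizes
x &lt;- runif(n_counties, 0.1, 0.3)                         # made-up prevalence

# County-level noise on the logit scale creates overdispersion
p &lt;- plogis(-1.8 + 9 * x + rnorm(n_counties, sd = 0.5))
share &lt;- rbinom(n_counties, voters, p) / voters

# Same mean model, two different variance assumptions
fit_binom &lt;- glm(share ~ x, family = binomial, weights = voters)
fit_quasi &lt;- glm(share ~ x, family = quasibinomial, weights = voters)

se_binom &lt;- summary(fit_binom)$coefficients[, "Std. Error"]
se_quasi &lt;- summary(fit_quasi)$coefficients[, "Std. Error"]
dispersion &lt;- summary(fit_quasi)$dispersion

# Quasibinomial standard errors are the binomial ones times sqrt(dispersion)
se_quasi / se_binom
sqrt(dispersion)</code></pre></figure>
<p>The point estimates are identical in the two fits; only the standard errors differ, by a factor of the square root of the estimated dispersion. With data as overdispersed as county vote shares, that factor is large, which is why the binomial intervals look implausibly narrow.</p>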
<p>An alternative way to fit a similar model is the <code class="language-plaintext highlighter-rouge">gam()</code> function from the amazing <code class="language-plaintext highlighter-rouge">mgcv</code> library by Simon Wood. There’s <a href="https://fromthebottomoftheheap.net/2021/02/02/random-effects-in-gams/">a great discussion of this</a> on Gavin Simpson’s blog. By specifying a spline around a categorical factor like <code class="language-plaintext highlighter-rouge">s(state_factor, bs = 're')</code> (‘re’ stands for random effect) we can use <code class="language-plaintext highlighter-rouge">gam()</code> to add random intercepts while using the full range of families available to <code class="language-plaintext highlighter-rouge">gam()</code>, including quasibinomial. That gives us this alternative version of the last model, this time with much fatter (and more realistic) confidence intervals:</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0286-gam.svg" width="100%"><img src="https://freerangestats.info/img/0286-gam.png" width="100%" /></object>
<p>The confidence intervals are now very fat. But there’s still no doubt about the significance of the relationship between the crude prevalence of depression and voting behaviour, even after allowing a random state-level intercept.</p>
<p>Here’s the code for both the <code class="language-plaintext highlighter-rouge">glmer</code> and <code class="language-plaintext highlighter-rouge">gam</code> versions of this random state effect model:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#---------------------With state random effect with lme4::glmer--------------------</span><span class="w">
</span><span class="n">model4</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">lme4</span><span class="o">::</span><span class="n">glmer</span><span class="p">(</span><span class="n">per_gop</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cpe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="m">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">state_name</span><span class="p">),</span><span class="w">
</span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"binomial"</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">combined</span><span class="p">,</span><span class="w">
</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">total_votes</span><span class="p">)</span><span class="w">
</span><span class="c1"># note can't use quasibinomial family with glmer so we aren't really dealing</span><span class="w">
</span><span class="c1"># properly with the overdispersion. What to do about that? Confidence intervals</span><span class="w">
</span><span class="c1"># will be too narrow. Various alternatives possible.</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">model4</span><span class="p">)</span><span class="w">
</span><span class="n">preds4</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">predict</span><span class="p">(</span><span class="n">model4</span><span class="p">,</span><span class="w"> </span><span class="n">se.fit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"response"</span><span class="p">)</span><span class="w">
</span><span class="n">combined</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">fit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">preds4</span><span class="o">$</span><span class="n">fit</span><span class="p">,</span><span class="w">
</span><span class="n">se</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">preds4</span><span class="o">$</span><span class="n">se.fit</span><span class="p">,</span><span class="w">
</span><span class="n">lower</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fit</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">1.96</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">se</span><span class="p">,</span><span class="w">