All of the groups are using data from SoCal. We could grab a dataset from SCAG. This data is from 2019 though, so boundaries may not be up-to-date.
However, instead we’re going to learn a different approach and import the data using the tigris package. Hat tip and extra credit to Anna for using this in her assignment.
To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.
As always, there’s a bunch of cool functions. The one we will use is called places(). It pulls cities and census designated places geospatial information.
In this example, I
#read the dataCA_places<-places(state ='CA', cb =T, #pull for California, lower detail level year =2022, progress_bar =FALSE)|>st_transform(crs =4326)head(CA_places)
Simple feature collection with 6 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -120.5913 ymin: 33.05765 xmax: -116.9939 ymax: 35.22005
Geodetic CRS: WGS 84
STATEFP PLACEFP PLACENS AFFGEOID GEOID NAME
1 06 55156 02411359 1600000US0655156 0655156 Palmdale
2 06 22804 02410455 1600000US0622804 0622804 Escondido
3 06 02924 02409738 1600000US0602924 0602924 Arvin
4 06 16350 02410232 1600000US0616350 0616350 Corona
5 06 43224 02410875 1600000US0643224 0643224 Los Alamitos
6 06 31414 02410672 1600000US0631414 0631414 Guadalupe
NAMELSAD STUSPS STATE_NAME LSAD ALAND AWATER
1 Palmdale city CA California 25 274705334 629569
2 Escondido city CA California 25 96726218 276445
3 Arvin city CA California 25 12482061 0
4 Corona city CA California 25 103320711 53926
5 Los Alamitos city CA California 25 10380819 161830
6 Guadalupe city CA California 25 3394103 10948
geometry
1 MULTIPOLYGON (((-118.2877 3...
2 MULTIPOLYGON (((-117.0237 3...
3 MULTIPOLYGON (((-118.8511 3...
4 MULTIPOLYGON (((-117.673 33...
5 MULTIPOLYGON (((-118.093 33...
6 MULTIPOLYGON (((-120.5912 3...
That’s just too much information. For the Fontana and Bloomington projects, we probably want to just show those two communities. Let’s use the %in% operator to filter() those two places from the list.
Figure 19.2 shows a leaflet map of the cities, zoomed into the main valley.
Spatial joins use geospatial information to identify overlapping geometries in space. Thus, if one has two geospatial datasets, one can quickly identify intersections of interest.
19.3.1 Example 1 - Spatial Joins
Let’s pick all the cities and census designated places in Los Angeles County.
Step 1 - pull Los Angeles County spatial boundaries using the county
#read the dataLA_county<-counties(state ='CA', cb =T, progress_bar =FALSE)|>#pull for California, lower detail level, year = 2022 ) st_transform(crs =4326)|>filter(NAME=='Los Angeles')# Just pick LA County
Retrieving data for the year 2022
Check if it worked by making a basic leaflet map. Figure 19.3
Now let’s spatially filter the Los Angeles places using st_filter() from the sf package.
The syntax is to provide the basic dataset to filter first, then the geospatial polygon to filter on second.
LA_places<-CA_places|>st_filter(LA_county, # .predicate =st_covered_by)# This adjective includes fully 'covered' places - not ones that merely intersect partially, # which is the defaulthead(LA_places)
Simple feature collection with 6 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -118.6682 ymin: 33.70515 xmax: -117.9162 ymax: 34.76915
Geodetic CRS: WGS 84
STATEFP PLACEFP PLACENS AFFGEOID GEOID NAME
1 06 55156 02411359 1600000US0655156 0655156 Palmdale
2 06 71806 02411897 1600000US0671806 0671806 Sierra Madre
3 06 19990 02410361 1600000US0619990 0619990 Duarte
4 06 44000 02410877 1600000US0644000 0644000 Los Angeles
5 06 45400 02411020 1600000US0645400 0645400 Manhattan Beach
6 06 40130 02411620 1600000US0640130 0640130 Lancaster
NAMELSAD STUSPS STATE_NAME LSAD ALAND AWATER
1 Palmdale city CA California 25 274705334 629569
2 Sierra Madre city CA California 25 7643252 10258
3 Duarte city CA California 25 17376217 0
4 Los Angeles city CA California 25 1218634998 80417875
5 Manhattan Beach city CA California 25 10189493 16389932
6 Lancaster city CA California 25 244189279 651713
geometry
1 MULTIPOLYGON (((-118.2877 3...
2 MULTIPOLYGON (((-118.0682 3...
3 MULTIPOLYGON (((-117.9902 3...
4 MULTIPOLYGON (((-118.6682 3...
5 MULTIPOLYGON (((-118.4228 3...
6 MULTIPOLYGON (((-118.3252 3...
Ok, let’s make a map to see if these are just the correct cities.
Figure 19.4 shows the place and city boundaries in blue, and LA County in red.
leaflet()|>addTiles()|>addPolylines(data =LA_county, color ='red')|>addPolygons(data =LA_places, color ='blue', fillOpacity =0.2, label =~NAME, weight =2)
19.4 Database Joins
Dataset CA_places has the geospatial information on city and census designated place boundaries. Some of the other datasets you are using have the information you would like to compare. How can we merge or join these disparate data layers for visualization and analysis?
There are two types of joins that can help to merge disparate datasets.
The first type of join is based on unique records in both datasets. If you have a column in dataset A and it has some matching records in dataset B, there are ways to join the two datasets.
Here’s all the permutations. - inner_join() - keep only records present in both dataset A and B. - left_join() - keep all records from the dataset A and only matching variables from B. Fill missing with NA. - right_join() - keep all records from the dataset B and only matching variables from A. Fill missing with NA. - full_join() - keep all records from A and B - fill NA in any records where dataset is missing from one of the datasets. - anti_join() - keep only records from A that don’t occur in B.
The example below shows Venn diagrams of all the permutations.
19.4.1 Example #2 - population data
The California Department of Finance provides city and place information on households and populations. I’d like to map population of cities and places using the latest and greatest data.
First, let me scrape the data using a new package for reading MS Excel spreadsheets - readxl.
We need to install the new package which reads Excel files.
Scrape the data using the read.xlsx() function from openxlsx
Tbl_E1<-read.xlsx(xlsxFile ='https://dof.ca.gov/wp-content/uploads/sites/352/Forecasting/Demographics/Documents/E-1_2024_InternetVersion.xlsx', sheet ='E-1 CityCounty2024', #multiple sheets, we need to specify the sheetname rows =5:599)|># this is filtering for rows with decent datarename(place ='State/County/City', pop2023 ='1/1/2023', pop2024 ='1/1/2024')head(Tbl_E1)
Alright, we have population data and place names. Let’s see if we can show city populations using a join function.
LA_places_pop<-LA_places|>left_join(Tbl_E1, by =c('NAME'='place'))|># this tells it what columns to join the two tables byfilter(!is.na(pop2024))|># remove census places with no population data in this tablefilter(NAME!='Los Angeles')# remove Los Angeles City because it is too bigpalPop<-colorNumeric(domain =LA_places_pop$pop2024, palette ='Spectral', reverse =T)
Time for a map. Figure 19.5 shows population of cities in Los Angeles County other than the city of Los Angeles.
leaflet()|>addProviderTiles(provider =providers$CartoDB.Positron)|>addPolygons(data =LA_places_pop, color =~palPop(pop2024), fillOpacity =0.8, label =~str_c(NAME, ', ', pop2024), # add concatenated string label with city name and population weight =2)|>addLegend(data =LA_places_pop, title ='2024 population', pal =palPop, values =~pop2024)