Update (July 2020): There are now other R options that are faster, for example exactextractr, velox, and fasterize, among many others. I no longer resort to Python for these tasks.
Taking the average value of rasters by polygon is slow in R. Really slow. Using extract with a raster of more than 800k cells (PRISM weather data) and a shapefile of more than 3000 polygons (counties) takes a little over 500 seconds.
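For context, that timing refers to a raster::extract call along these lines (a sketch of the approach, not the exact benchmark script; it assumes the shapefile is read with raster::shapefile):

library(raster)

rast <- raster('E:/prism/data/zipped/PRISM_tmin_stable_4kmD1_20150531_bil.bil')
counties <- shapefile('E:/prism/data/census/gz_2010_us_050_00_20m/gz_2010_us_050_00_20m.shp')

# Mean raster value within each county polygon -- this is the slow step
county_means <- extract(rast, counties, fun = mean, na.rm = TRUE)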
The rasterstats package in Python offers a faster solution, but can be tricky to install, particularly on Windows:
Install Anaconda Python (trying to upgrade pip broke it, so I just reinstalled Anaconda)
Install gdal 1.11.4 (not 2.0), fiona, rasterio, and Shapely
- May need to install some from binaries if using Windows
Install rasterstats
Sample code to run zonal statistics and save the result to CSV.
import csv
from rasterstats import zonal_stats
# GLOBALS
RAST = 'E:/prism/data/zipped/PRISM_tmin_stable_4kmD1_20150531_bil.bil'
POLY = 'E:/prism/data/census/gz_2010_us_050_00_20m/gz_2010_us_050_00_20m.shp'

# Run zonal_stats
stats = zonal_stats(POLY, RAST, stats=['count', 'mean', 'min', 'max'], all_touched=True, geojson_out=True)

# Save to CSV
keys = stats[0]['properties'].keys()
with open('E:/prism/out.csv', 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    for row in stats:
        dict_writer.writerow(
            dict((k, v.encode('utf-8')) if isinstance(v, basestring) else (k, v) for k, v in row['properties'].iteritems()))