Abstract

Aim The purposes of this study were to develop a geographic information system (GIS) and spatial analytical methodology to reconstruct and represent presettlement vegetation in a spatially continuous manner over large areas, and to investigate vegetation–site relationships before widespread changes to the vegetation had taken place.

Location The study area was the Holland Land Company Purchase in western New York, a 14,400 km² area extending across the physiographic provinces of the Erie–Ontario Lowlands and the Appalachian Uplands.

Methods Bearing-tree records from the Holland Land Company township surveys of western New York, made c. 1800, were collected and analysed. The geostatistical method of indicator kriging was used to map spatially continuous representations of individual tree species. Rule-based and statistically clustered approaches were used to analyse and classify the reconstructed tree species distributions in order to obtain the distribution of vegetation associations. Contingency table analysis was conducted to quantify species relationships with soil conditions.

Results The presettlement vegetation, at both the tree species and the vegetation association levels, was easier to interpret and visually more effective as a spatially continuous representation than as a discontinuous distribution of symbols. The results for tree species were probabilities of species occurrence, showing spatial patterns that were not apparent in discrete maps of points or in summary tables of species frequencies. Analysis of the 8792 bearing trees suggested the dominance of American beech (Fagus grandifolia) and sugar maple (Acer saccharum) in the forest composition of c. 1800. Both soil drainage and texture were important site determinants of the vegetation in western New York.
The rule-based and statistically clustered approaches had the advantage of summarizing vegetation compositional patterns in a single image, thus avoiding the need to manually and subjectively delineate boundaries between adjacent vegetation associations.

Main conclusions The study offers more insights into the spatial pattern of presettlement forests in western New York than do prior studies. The spatially continuous representation could also enable the comparison of vegetation distributions from data sources that have different sampling schemes, for example comparing presettlement vegetation from land survey records with current vegetation from modern forest inventories. The results provide a benchmark against which to examine vegetation change and the impacts of human land use.
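
As an illustration of the contingency table analysis named under Methods, the following minimal sketch computes a Pearson chi-square statistic and standardized residuals for a species-by-soil-drainage table. It rests on assumptions: the abstract does not state which test statistic was used, so the Pearson chi-square is assumed here, and the counts, class labels, and function name are hypothetical and are not data from the study.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square test quantities for a contingency table.

    `table` is a list of rows of observed counts. Returns the chi-square
    statistic, the degrees of freedom, the expected counts under
    independence, and the standardized (Pearson) residuals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count for each cell if species and soil class were independent.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Pearson residual: positive means the species is over-represented
    # on that soil class relative to independence.
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]

    chi2 = sum(res ** 2 for row in residuals for res in row)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected, residuals


# HYPOTHETICAL counts, invented for illustration only.
# Rows: beech, all other species. Columns: well drained, poorly drained.
counts = [
    [30, 10],
    [10, 30],
]

chi2, dof, expected, residuals = chi_square_independence(counts)
print(f"chi-square = {chi2:.2f} with {dof} degree(s) of freedom")
for label, row in zip(["beech", "other"], residuals):
    print(label, [round(value, 2) for value in row])
```

The sign of each residual indicates whether a species occurs more or less often on a soil class than independence would predict, which is one way such species relationships with soil conditions can be read from the table.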