San Francisco fire service analysis
Introduction
Fire services are standard public institutions that provide fire prevention and fire suppression services. Beyond this, fire services also act as first responders for medical emergencies and rescue operations. Their efficiency and effectiveness are crucial to public safety.
The city of San Francisco has public datasets for fire service calls, non-medical incidents and safety complaints. The records consist of categorical types, time of occurrence and response location. Combined with other public data, we explore and analyze aspects of San Francisco's fire service purely for hobbyist purposes. Our analysis is twofold. First, we collect descriptive statistics of service calls, incidents and complaints. Second, we explore answers to the following questions:
- What effect do fire safety complaints have on fire-related incidents?
- Do fire-related incidents occur more often during particular seasons (e.g. summer, winter)?
- Are fires and false alarms independent of the location's zoning type?
Addressing these questions can offer insight into where and when fire-related incidents occur. In turn, a more rigorous investigation could be used to inform policy and the allocation of fire service resources.
Data methodology
Our data extraction, cleaning and analysis are carried out in Python using pandas, geopandas, scipy.stats and seaborn.
All of the data used is obtained from the San Francisco open data portal and can be found by searching for the following datasets:
- fire department calls for service
- fire incidents
- fire safety complaints
- zoning map - zoning districts
- bay area counties
The code is available on GitHub.
For those interested, a discussion of data processing and the limitations of our analysis is in the README.md file therein.
Analysis - Part 1: descriptive statistics
For calls, incidents and complaints, we extract all data points over exactly 3 years from 2020 to 2022. After extracting and cleaning the data, we have 439,626 fire service calls, 94,647 nonmedical incidents, and 9,295 fire safety complaints. In each case, these make up at least 98% of the original records before erroneous entries were removed.
Figure 1.1 shows counts of the various fire service call types, covering a total of 105,734 calls (24.1%). The remaining 333,892 calls (75.9%), omitted from the figure, are responses to medical incidents. The call types show the fire service responding to (claims of) incidents of varying severity. To our understanding, the type is assigned to the call itself, which leaves open the possibility that the incident is a false alarm. This is clarified by the nonmedical incidents data.
Figures 1.2 and 1.3 show counts of incident types and response types, respectively, for nonmedical incidents. Contrary to the name, emergency medical incidents are included in the count. Each incident and response category is defined by National Fire Incident Reporting System (NFIRS) codes provided in the data.
The top three incident types are false alarms (38,466, or 40.6%), service calls (16,551, or 17.5%) and fires (14,895, or 15.7%). The top three response types are regulatory practices (50,272, or 53.1%), providing assistance (10,825, or 11.4%) and fire extinguishment (9,632, or 10.2%).
Figure 1.4 shows counts of fire safety complaints, which typically involve issues with alarms, extinguishers, exits or other regulatory violations. The categories shown make up 79% of documented complaints to the fire service.
Finally, Figure 1.5 geographically visualizes the density of fire-related incidents and safety complaints. The San Francisco city boundary is shown in black and the coordinates are given in the EPSG:2227 projection. The region of highest density (darker area) coincides with the downtown area, which has a higher population density and larger buildings; this could explain the concentration.
Analysis - Part 2: inferential statistics
Recall that we initially posed some open-ended questions about the fire service data:
- What effect do fire safety complaints have on fire-related incidents?
- Do fire-related incidents occur more often during particular seasons (e.g. summer, winter)?
- Are fires and false alarms independent of the location's zoning type?
We can formalize each question into something that we can answer with inferential statistics via hypothesis testing. Note that there may be several test formulations for a single question, which suggests further investigation is needed beyond this report. Moreover, it is not always possible to meet the test assumptions. Nonetheless, the goal here is to learn something about the fire service data.
1. What effect do fire safety complaints have on fire-related incidents?
Analysis. To simplify, we investigate a more specific question: do complaints reduce (or increase) the number of fire-related incidents locally, in a spatiotemporal sense? The data is generated as follows. First, for each complaint we count all fire-related incidents within 500 meters of the complaint location that occurred within 30 days BEFORE the complaint submission date. Additionally, we do the same for AFTER the complaint submission date. Now we draw random samples from the BEFORE and AFTER counts independently.
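The counting step described above can be sketched as follows. This is a minimal illustration on toy data, not the code from the repository: the function name `count_nearby_incidents`, the array layout, the integer day numbers and the exclusion of same-day incidents are all assumptions made for the example. Coordinates are assumed to be planar and in meters; data stored in a foot-based projection such as EPSG:2227 would need the radius converted first.

```python
import numpy as np

def count_nearby_incidents(complaint_xy, complaint_day, incident_xy, incident_day,
                           radius=500.0, window=30):
    """Count incidents near each complaint in the `window` days before and after it.

    Coordinates are assumed planar (meters); days are integer day numbers.
    Incidents on the same day as the complaint are excluded (an assumption).
    """
    before = np.zeros(len(complaint_xy), dtype=int)
    after = np.zeros(len(complaint_xy), dtype=int)
    for i, (xy, day) in enumerate(zip(complaint_xy, complaint_day)):
        # Euclidean distance from this complaint to every incident.
        dist = np.hypot(incident_xy[:, 0] - xy[0], incident_xy[:, 1] - xy[1])
        near = dist <= radius
        lag = incident_day - day  # negative values are before the complaint
        before[i] = np.sum(near & (lag < 0) & (lag >= -window))
        after[i] = np.sum(near & (lag > 0) & (lag <= window))
    return before, after

# Toy data: one complaint at the origin on day 100 and four incidents.
complaint_xy = np.array([[0.0, 0.0]])
complaint_day = np.array([100])
incident_xy = np.array([[100.0, 0.0], [0.0, 400.0], [600.0, 0.0], [0.0, -300.0]])
incident_day = np.array([90, 110, 95, 50])

before, after = count_nearby_incidents(
    complaint_xy, complaint_day, incident_xy, incident_day
)
print(before, after)
```

In the toy data, one incident falls inside both the radius and the BEFORE window, one falls inside the radius and the AFTER window, one is too far away and one is too early. For a dataset of the size used here, a spatial index would likely be preferable to this brute-force loop.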
Result. A Mann-Whitney U test is performed on the BEFORE and AFTER samples. The null hypothesis states that the probability of a BEFORE count exceeding an AFTER count is equal to 0.5. The test statistic is U = 39233381.5 with p = 0.2047, for sample sizes n_BEFORE = n_AFTER = 2000. A histogram of the samples is shown in Figure 2.1.
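As a rough sketch of how such a test might be run with scipy.stats, the example below draws independent samples from two arrays of counts and applies a two-sided Mann-Whitney U test. The Poisson-distributed counts are synthetic stand-ins, and the variable names and the choice of a two-sided alternative are assumptions for illustration; the numbers it produces are unrelated to the result reported above.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic stand-ins for the per-complaint BEFORE and AFTER incident counts.
before_counts = rng.poisson(lam=5.0, size=9000)
after_counts = rng.poisson(lam=5.0, size=9000)

# Draw the two samples independently of each other.
n = 2000
before_sample = rng.choice(before_counts, size=n, replace=False)
after_sample = rng.choice(after_counts, size=n, replace=False)

# Two-sided test: is either group stochastically larger than the other?
result = mannwhitneyu(before_sample, after_sample, alternative="two-sided")
u_stat, p_value = result.statistic, result.pvalue
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```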
Interpretation. There is not enough evidence here to conclude that fire safety complaints locally impact the occurrence of fire-related incidents. This may be because complaints often seek to improve survival in a fire, as opposed to preventing fires in general. Examining how changing the time window and the distance threshold affects the results may be of future interest.
2. Do fire-related incidents occur more often during particular seasons (e.g. summer, winter)?
Analysis. Here we analyze counts of fire-related incidents per 3-day unit of time. First, a subset of incidents is randomly drawn. Next, each year is partitioned into 3-day intervals and each sampled incident is binned into its interval. Each interval is then assigned a season. The 3-day interval length is chosen to balance sample size against approximate normality via the Central Limit Theorem.
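One way the binning could look in pandas is sketched below, using synthetic timestamps. The helper `season_of`, the meteorological season boundaries (December to February as winter, and so on) and the rule that an interval takes the season of its first day are assumptions for this example; the source does not state how seasons were assigned.

```python
import numpy as np
import pandas as pd

def season_of(month):
    """Map a month number to a season (meteorological convention, assumed)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"

rng = np.random.default_rng(0)

# Synthetic incident dates: 2000 draws from the days of 2020 to 2022.
days = pd.date_range("2020-01-01", "2022-12-31", freq="D")
timestamps = days[rng.integers(0, len(days), size=2000)]

# Index of the 3-day interval within each year (0..121).
df = pd.DataFrame({
    "year": np.asarray(timestamps.year, dtype="int64"),
    "interval": (np.asarray(timestamps.dayofyear, dtype="int64") - 1) // 3,
})

# Count incidents per interval, keeping intervals that received no incidents.
full_index = pd.MultiIndex.from_product(
    [[2020, 2021, 2022], range(122)], names=["year", "interval"]
)
counts = df.groupby(["year", "interval"]).size().reindex(full_index, fill_value=0)
binned = counts.rename("count").reset_index()

# Assign each interval the season of its first day.
start = pd.to_datetime(binned["year"].astype(str) + "-01-01") + pd.to_timedelta(
    3 * binned["interval"], unit="D"
)
binned["season"] = start.dt.month.map(season_of)
print(binned.groupby("season")["count"].describe())
```

With 122 intervals per year over three years, this yields 366 binned samples; the last interval of a non-leap year covers only two days.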
Comparison | Statistic | p-value | Lower CI | Upper CI |
---|---|---|---|---|
winter - spring | 0.727 | 0.235 | -0.481 | 1.934 |
winter - summer | -0.034 | 1.000 | -1.242 | 1.173 |
winter - autumn | 0.879 | 0.105 | -0.332 | 2.090 |
spring - summer | -0.761 | 0.197 | -1.965 | 0.443 |
spring - autumn | 0.152 | 0.979 | -1.055 | 1.360 |
summer - autumn | 0.913 | 0.084 | -0.294 | 2.121 |
Result. We perform ANOVA with 2000 incidents drawn at random, yielding n = 366 total samples after binning. A histogram and box plot of the samples are shown in Figure 2.2. We first run Levene's test (median-centered) to check for homoscedasticity. The test statistic is L = 2.3302 with p = 0.074, which is not significant, so we proceed. ANOVA yields the test statistic A = 3.0746 with p < 0.05. Post hoc analysis with Tukey's HSD is given in Table 2.1, in which the means of the seasons are compared pairwise.
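The sequence of tests described here could be run along the following lines with scipy.stats. The per-season counts are synthetic Poisson draws and the group sizes are arbitrary, so the output is unrelated to Table 2.1; `tukey_hsd` is assumed to be available, which requires a reasonably recent SciPy release.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 3-day incident counts in each season.
seasons = ("winter", "spring", "summer", "autumn")
samples = [rng.poisson(lam=5.5, size=90) for _ in seasons]

# Levene's test centered on the median checks for equal variances.
levene = stats.levene(*samples, center="median")

# One-way ANOVA compares the group means.
anova = stats.f_oneway(*samples)

# Tukey's HSD gives pairwise mean differences, p-values and confidence intervals.
tukey = stats.tukey_hsd(*samples)
ci = tukey.confidence_interval(confidence_level=0.95)

print(f"Levene: L = {levene.statistic:.4f}, p = {levene.pvalue:.3f}")
print(f"ANOVA:  F = {anova.statistic:.4f}, p = {anova.pvalue:.3f}")
for i in range(len(seasons)):
    for j in range(i + 1, len(seasons)):
        print(f"{seasons[i]} - {seasons[j]}: diff = {tukey.statistic[i, j]:.3f}, "
              f"p = {tukey.pvalue[i, j]:.3f}, "
              f"CI = [{ci.low[i, j]:.3f}, {ci.high[i, j]:.3f}]")
```

The pairwise results are returned as square matrices indexed by group order, so entry (i, j) compares the i-th and j-th seasons in the order they were passed.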
Interpretation. Although the ANOVA is significant at the 0.05 level, none of the pairwise comparisons in Table 2.1 reaches significance, so the evidence that fire incidents differ by season is weak. An explanation for the lack of a clear difference is not obvious. Since the fire category is broad, it would be interesting in future work to look at seasonal differences for specific types of fire incidents (e.g. outdoor fire, kitchen fire, etc.).
3. Are fires and false alarms independent of the location's zoning type?
Analysis. To assemble the data, we filter out incidents to only consider false alarms and fires. Since each incident has a location, we can identify its zoning type by performing a spatial join with San Francisco's zoning districts dataset.
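A spatial join of this kind might be written with geopandas roughly as follows. The two rectangular zones, the three incident points and the column names `zone_type` and `incident` are made up for the example and do not reflect the schema of the actual datasets.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical zoning polygons: two adjacent rectangles.
zones = gpd.GeoDataFrame(
    {"zone_type": ["Residential", "Commercial"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:2227",
)

# Hypothetical incident locations; the last one lies outside every zone.
incidents = gpd.GeoDataFrame(
    {"incident": ["Fire", "False alarm", "Fire"]},
    geometry=[Point(2, 3), Point(15, 5), Point(30, 30)],
    crs="EPSG:2227",
)

# Left join: every incident is kept and receives the zone that contains it.
joined = gpd.sjoin(incidents, zones, how="left", predicate="within")
print(joined[["incident", "zone_type"]])
```

Incidents that fall outside every zoning polygon come back with a missing `zone_type` and would need to be dropped or handled before building a contingency table. Both layers must share the same coordinate reference system for the join to be meaningful.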
Incident \ Zone type | Commercial | Industrial | Mixed use | Public | Residential |
---|---|---|---|---|---|
False alarm | 508 | 94 | 1174 | 377 | 1467 |
Fire | 137 | 132 | 533 | 225 | 353 |
Result. 5000 incidents are sampled at random from the list of false alarms and fires. This gives the contingency table in Table 2.2. A chi-squared test for independence is performed on the data with k = 4 degrees of freedom. The result is significant, with test statistic X = 221.7132 and p < 0.001.
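The test can be reproduced directly from the counts in Table 2.2. The sketch below assumes `scipy.stats.chi2_contingency` as the test routine, which is a natural choice given the libraries listed earlier but is not confirmed by the source.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from Table 2.2; columns are Commercial, Industrial, Mixed use,
# Public, Residential.
observed = np.array([
    [508, 94, 1174, 377, 1467],  # False alarm
    [137, 132, 533, 225, 353],   # Fire
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"X = {chi2:.4f}, dof = {dof}, p = {p_value:.3g}")
```

The degrees of freedom follow from the table shape: (2 - 1) x (5 - 1) = 4. The `expected` array holds the counts that would be expected under independence, which is useful for seeing which zone types contribute most to the statistic.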
Interpretation. We can conclude that the zoning type variable and the incident variable (restricted to false alarms and fires) are very likely to be dependent. This is somewhat expected, since fire hazards and safety practices likely differ by zone, and some zones cover a greater area than others. For future work, it would be interesting to study this dependence in detail.