San Francisco fire service analysis


Introduction

Fire services are standard public institutions that provide fire prevention and fire suppression services. Beyond this, fire services also act as first responders for medical emergencies and rescue operations. Their efficiency and effectiveness are crucial to enhance public safety.

The city of San Francisco has public datasets for fire service calls, non-medical incidents and safety complaints. The records consist of categorical types, time of occurrence and response location. Combined with other public data, we explore and analyze aspects of San Francisco's fire service purely for hobbyist purposes. Our analysis is twofold. First, we collect descriptive statistics of service calls, incidents and complaints. Second, we explore answers to the following questions:

  1. What effect do fire safety complaints have on fire-related incidents?
  2. Do fire-related incidents occur more often during particular seasons (e.g. summer, winter)?
  3. Are fires and false alarms independent of the location's zoning type?

Addressing these questions can offer insight on where and when fire-related incidents occur. In turn, a more rigorous investigation can be used to inform allocation of fire service resources and policy.

Data methodology

Our data extraction, cleaning and analysis are carried out in Python using pandas, geopandas, scipy.stats and seaborn. All of the data used is obtained from the San Francisco open data portal, and can be found by searching

The code is available on GitHub. For those interested, a discussion on data processing and limitations of our analysis is in the README.md file therein.

Analysis - Part 1: descriptive statistics

For calls, incidents and complaints, we extract all data points over exactly 3 years from 2020 to 2022. After extracting and cleaning the data, we have 439,626 fire service calls, 94,647 nonmedical incidents, and 9,295 fire safety complaints. In each case, these make up at least 98% of the original records before erroneous entries were removed.

Figure 1.1: Count of fire service call types. Medical incidents have been omitted.

Figure 1.1 shows counts of the fire service call types other than medical incidents, which total 105,734 calls (24.1%). The remaining 333,892 calls (75.9%), omitted from the figure, are responses to medical incidents. The call types show the fire service responding to (claims of) incidents of varying severity. To our understanding, the type is assigned to the call itself, which leaves open the possibility that the incident is a false alarm. The nonmedical incidents data clarifies this.
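The shares quoted above follow directly from the reported totals. As a quick arithmetic check (the variable names are ours, purely for illustration):

```python
# Counts reported in the text for 2020 to 2022, after cleaning.
total_calls = 439_626
medical_calls = 333_892

# Calls that are not responses to medical incidents (shown in Figure 1.1).
non_medical_calls = total_calls - medical_calls

medical_share = 100 * medical_calls / total_calls
non_medical_share = 100 * non_medical_calls / total_calls

print(non_medical_calls)            # 105734
print(round(medical_share, 1))      # 75.9
print(round(non_medical_share, 1))  # 24.1
```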

Figure 1.2: Count of nonmedical incidents. Emergency medical incidents are included.
Figure 1.3: Count of response types to nonmedical incidents. Emergency medical incidents are included.

Figures 1.2 and 1.3 show counts of incident types and response types, respectively, for nonmedical incidents. Contrary to the name, emergency medical incidents are included in the count. Each incident and response category is defined by National Fire Incident Reporting System (NFIRS) codes provided in the data.

The top three incident types are false alarms (38,466, or 40.6%), service calls (16,551, or 17.5%) and fires (14,895, or 15.7%). The top three response types are regulatory practices (50,272, or 53.1%), providing assistance (10,825, or 11.4%) and fire extinguishment (9,632, or 10.2%).

Figure 1.4: Count of fire safety complaints.

Figure 1.4 shows counts of fire safety complaints. These typically involve issues with alarms, extinguishers, exits or other regulatory violations. These make up 79% of documented complaints to the fire service.

Figure 1.5: Kernel density estimation (KDE) plots of fire-related incidents and safety complaints in San Francisco. Darker colour means larger count.

Finally, Figure 1.5 geographically visualizes the density of fire-related incidents and safety complaints. The San Francisco city boundary is shown in black and the coordinates are associated with the EPSG:2227 projection. The region of highest density (darkest area) coincides with the downtown area, which has a higher population density and larger buildings; this may explain the concentration.

Analysis - Part 2: inferential statistics

Recall that we initially posed some open-ended questions about the fire service data:

  1. What effect do fire safety complaints have on fire-related incidents?
  2. Do fire-related incidents occur more often during particular seasons (e.g. summer, winter)?
  3. Are fires and false alarms independent of the location's zoning type?

We can formalize each question into something that we can answer with inferential statistics via hypothesis testing. Note that there may be several test formulations for a single question, which suggests further investigation is needed beyond this report. Moreover, it is not always possible to meet the test assumptions. Nonetheless, the goal here is to learn something about the fire service data.

Figure 2.1: Histogram of fire-related incidents within 500 meters and 30 days before or after a complaint.

1. What effect do fire safety complaints have on fire-related incidents?

Analysis. To simplify, we investigate a more specific question: do complaints reduce (or increase) the number of fire-related incidents locally, in a spatiotemporal sense? The data is generated as follows. First, for each complaint we count all fire-related incidents within 500 meters of the complaint location that occurred within 30 days BEFORE the complaint submission date. Second, we do the same for the 30 days AFTER the complaint submission date. Finally, we draw random samples from the BEFORE and AFTER counts independently.
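The counting step can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code from the repository: the function name, the tuple layout and the sample points are hypothetical, coordinates are assumed to be in EPSG:2227 (whose unit is the US survey foot, so the 500 meter radius is converted to feet), and incidents on the complaint date itself are assumed to count as neither BEFORE nor AFTER.

```python
from datetime import date, timedelta
from math import hypot

# Assumption: coordinates are in EPSG:2227, measured in US survey feet.
FEET_PER_METER = 3937 / 1200          # definition of the US survey foot
RADIUS_FT = 500 * FEET_PER_METER      # 500 m expressed in feet
WINDOW = timedelta(days=30)


def count_before_after(complaint, incidents):
    """Return (before, after) incident counts around one complaint.

    complaint is (x, y, date); incidents is an iterable of (x, y, date).
    """
    cx, cy, cdate = complaint
    before = after = 0
    for x, y, d in incidents:
        if hypot(x - cx, y - cy) > RADIUS_FT:
            continue  # outside the 500 m radius
        if timedelta(0) < cdate - d <= WINDOW:
            before += 1
        elif timedelta(0) < d - cdate <= WINDOW:
            after += 1
    return before, after


# Hypothetical data for illustration only.
complaint = (6_000_000.0, 2_100_000.0, date(2021, 6, 15))
incidents = [
    (6_000_500.0, 2_100_000.0, date(2021, 6, 1)),   # near, 14 days before
    (6_000_000.0, 2_101_000.0, date(2021, 7, 10)),  # near, 25 days after
    (6_005_000.0, 2_100_000.0, date(2021, 6, 10)),  # too far away
    (6_000_100.0, 2_100_100.0, date(2021, 3, 1)),   # near, but too early
]
print(count_before_after(complaint, incidents))  # (1, 1)
```

For the full dataset a loop like this would be slow; a spatial index or a buffered spatial join would be the more practical choice.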

Result. A Mann-Whitney U test is performed on the BEFORE and AFTER samples. The null hypothesis states that a randomly drawn BEFORE count is equally likely to be larger or smaller than a randomly drawn AFTER count, i.e. P(BEFORE > AFTER) = 0.5. The test statistic is U = 39233381.5 with p = 0.2047, for sample sizes n_BEFORE = n_AFTER = 2000. A histogram of the samples is shown in Figure 2.1.
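For readers who want to try the test themselves, a minimal sketch with scipy.stats.mannwhitneyu is given here. The Poisson-distributed counts are synthetic stand-ins rather than the fire service data, and the two-sided alternative is our assumption about how the test was configured.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic stand-ins for the sampled BEFORE and AFTER counts.
before = rng.poisson(lam=5.0, size=2000)
after = rng.poisson(lam=5.0, size=2000)

# Two-sided test (an assumption): are the two samples stochastically equal?
result = mannwhitneyu(before, after, alternative="two-sided")

# U divided by n1 * n2 estimates P(BEFORE > AFTER) + 0.5 * P(BEFORE = AFTER),
# which is 0.5 under the null hypothesis.
prob_estimate = result.statistic / (len(before) * len(after))

print(result.statistic, result.pvalue, prob_estimate)
```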

Interpretation. There is not enough evidence here to conclude that fire safety complaints locally impact the occurrence of fire-related incidents. This may be explained by complaints often seeking to improve survival from a fire, as opposed to preventing fires in general. Examining how changing time units and distance impacts the results may be of future interest.

Figure 2.2: Histogram (left) and box plot (right) of fire-related incidents per 3-day time units, by season.

2. Do fire-related incidents occur more often during particular seasons (e.g. summer, winter)?

Analysis. Here we analyze counts of fire-related incidents per unit time of 3 days. First, a subset of incidents is drawn at random. Next, each year is partitioned into 3-day intervals and each sampled incident is binned into its interval. Each interval is then assigned a season. The 3-day interval is chosen to balance sample size against approximate normality via the Central Limit Theorem.
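The binning can be sketched with the standard library alone. The helper names below are hypothetical, and two choices are assumptions on our part: seasons are assigned by calendar month of the interval's first day (December to February as winter, and so on), and a short final interval at the end of a year is kept as its own bin.

```python
from collections import Counter
from datetime import date, timedelta

# Assumed season definition: by calendar month.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def interval_key(d, width=3):
    """Identify the interval containing date d as (year, index within year)."""
    day_of_year = d.timetuple().tm_yday  # 1-based day of the year
    return d.year, (day_of_year - 1) // width


def interval_season(year, index, width=3):
    """Season of an interval, taken from its first day (an assumption)."""
    start = date(year, 1, 1) + timedelta(days=index * width)
    return SEASON_BY_MONTH[start.month]


def bins_in_year(year, width=3):
    """Number of intervals in a year; a short final interval is kept."""
    days = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    return -(-days // width)  # ceiling division


# Hypothetical incident dates for illustration only.
incident_dates = [date(2020, 1, 1), date(2020, 1, 3),
                  date(2020, 1, 4), date(2021, 7, 15)]
counts = Counter(interval_key(d) for d in incident_dates)

total_bins = sum(bins_in_year(y) for y in (2020, 2021, 2022))
print(counts[(2020, 0)], counts[(2020, 1)], total_bins)  # 2 1 366
```

Under these assumptions, each of the three years contributes 122 intervals, for 366 in total.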

Comparison         Statistic   p-value   Lower CI   Upper CI
winter - spring        0.727     0.235     -0.481      1.934
winter - summer       -0.034     1.000     -1.242      1.173
winter - autumn        0.879     0.105     -0.332      2.090
spring - summer       -0.761     0.197     -1.965      0.443
spring - autumn        0.152     0.979     -1.055      1.360
summer - autumn        0.913     0.084     -0.294      2.121
Table 2.1: Tukey's HSD pairwise comparisons after significance with ANOVA.

Result. We perform ANOVA with 2000 incidents drawn at random, yielding n = 366 total samples after binning. A histogram and box plot of the samples are shown in Figure 2.2. We first run Levene's test (median) to check for homoscedasticity. For Levene, the test statistic is L = 2.3302 with p = 0.074. The result is not significant at the 0.05 level, so we proceed with ANOVA, which yields the test statistic A = 3.0746 with p < 0.05. Post hoc analysis with Tukey's HSD is given in Table 2.1, in which the means between seasons are compared.
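The sequence of tests can be reproduced in outline with scipy.stats. The sketch below uses synthetic Poisson counts in place of the real per-interval counts, so its numbers will not match those reported here. It also assumes a SciPy version that provides tukey_hsd (1.8 or later, to our knowledge).

```python
import numpy as np
from scipy.stats import f_oneway, levene, tukey_hsd

rng = np.random.default_rng(1)
seasons = ["winter", "spring", "summer", "autumn"]

# Synthetic stand-ins: incident counts per 3-day interval, one array per season.
groups = [rng.poisson(lam=5.5, size=91) for _ in seasons]

# Levene's test centred on the median checks for equal variances.
lev = levene(*groups, center="median")

# One-way ANOVA compares the seasonal means.
anova = f_oneway(*groups)

# Tukey's HSD gives pairwise mean differences, p-values and intervals.
tukey = tukey_hsd(*groups)
ci = tukey.confidence_interval(confidence_level=0.95)

print("Levene:", lev.statistic, lev.pvalue)
print("ANOVA:", anova.statistic, anova.pvalue)
for i in range(len(seasons)):
    for j in range(i + 1, len(seasons)):
        # statistic[i, j] is mean(group i) - mean(group j)
        print(seasons[i], "-", seasons[j],
              tukey.statistic[i, j], tukey.pvalue[i, j],
              ci.low[i, j], ci.high[i, j])
```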

Interpretation. Although ANOVA is significant at the 0.05 level, none of the pairwise comparisons in Table 2.1 is significant (every confidence interval contains zero), so we find a lack of evidence that fire incidents differ between any particular pair of seasons. An explanation for the lack of difference is not clear. Since the fire category is broad, it would be interesting in future work to look at seasonal differences for specific types of fire incidents (e.g. outdoor fire, kitchen fire, etc.).

3. Are fires and false alarms independent of the location's zoning type?

Analysis. To assemble the data, we filter out incidents to only consider false alarms and fires. Since each incident has a location, we can identify its zoning type by performing a spatial join with San Francisco's zoning districts dataset.

Incident \ Zone type   Commercial   Industrial   Mixed use   Public   Residential
False alarm                   508           94        1174      377          1467
Fire                          137          132         533      225           353
Table 2.2: Contingency table of false alarms and fires for zoning districts.

Result. We sample 5000 incidents at random from the list of false alarms and fires, which gives the contingency table in Table 2.2. A chi-squared test for independence is performed on the data with k = 4 degrees of freedom. The result is significant, with test statistic X = 221.7132 and p < 0.001.
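The test can be rerun directly from the counts in Table 2.2. The sketch below uses scipy.stats.chi2_contingency, which we assume is close to what the analysis used; the array layout and variable names are ours.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: false alarm, fire.
# Columns: commercial, industrial, mixed use, public, residential.
table = np.array([
    [508, 94, 1174, 377, 1467],
    [137, 132, 533, 225, 353],
])

# Returns the statistic, p-value, degrees of freedom and expected counts.
chi2, p, dof, expected = chi2_contingency(table)

print(table.sum(), dof, round(chi2, 2))  # 5000 4 221.71
print(p < 0.001)                         # True
```

The degrees of freedom follow from the table shape: (2 - 1) x (5 - 1) = 4.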

Interpretation. We can conclude that the zoning type variable and the incident variable (restricted to false alarms and fires) are very likely to be dependent. This is somewhat expected, since fire hazards and safety practices likely differ by zone, and the area (i.e. presence) of some zones is greater than that of others. For future work, it would be interesting to study this dependence in detail.