Title: | Signal Detection Analysis |
---|---|
Description: | Exploring time series for signal detection. It is specifically designed to detect possible outbreaks using infectious disease surveillance data at the European Union / European Economic Area or country level. The automatic detection tools used are described in the paper "Monitoring count time series in R: aberration detection in public health surveillance" by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. The package includes: - the Signal Detection tool, an interactive 'shiny' application in which the user can import external data and perform basic signal detection analyses; - an automated report in HTML format, presenting the results of the time series analysis in tables and graphs. This report can also be stratified by population characteristics (see 'Population' variable). This project was funded by the European Centre for Disease Prevention and Control. |
Authors: | Lore Merdrignac [aut, ctr] (Author of the package and original code), Joana Gomes Dias [aut, fnd, cre] (Project manager and package maintainer), Esther Kissling [aut, ctr], Tommi Karki [aut, fnd], Margot Einoder-Moreno [ctb, fnd] |
Maintainer: | Joana Gomes Dias <[email protected]> |
License: | EUPL |
Version: | 0.1.1 |
Built: | 2024-11-03 04:05:45 UTC |
Source: | https://github.com/eu-ecdc/episignaldetection |
Aggregate filtered final Atlas export
aggAtlasExport(x, input)
x |
dataframe |
input |
list of parameters as defined in the Signal Detection Application (see (i.e. |
dataframe aggregated by geographical level and time unit
filterAtlasExport
SignalData
stsSD
#-- Setting the parameters for the report
input <- list(
  disease = "Salmonellosis",
  country = "EU-EEA - complete series",
  indicator = "Reported cases",
  stratification = "Confirmed cases",
  unit = "Month",
  daterange = c("2010-01-01", "2016-12-31"),
  algo = "FarringtonFlexible",
  testingperiod = 5
)

#-- Example dataset
dataset <- EpiSignalDetection::SignalData

#-- Filtering on declared input parameters
dataset <- filterAtlasExport(dataset, input)

#-- Aggregating the data by geographical level and time point
dataset <- aggAtlasExport(dataset, input)
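To make the aggregation step concrete, here is a minimal base-R sketch; it is not the package's implementation. It assumes a long-format dataframe with the columns RegionName, Time and NumValue (names taken from the variables kept by cleanAtlasExport) and sums the counts of all regions at each time point. The toy values and the helper name agg_by_time are hypothetical, and counting missing reports as zero cases is an assumption of the sketch.

```r
# Hypothetical toy data: monthly counts for two countries
toy <- data.frame(
  RegionName = c("Austria", "Belgium", "Austria", "Belgium"),
  Time       = c("2016-01", "2016-01", "2016-02", "2016-02"),
  NumValue   = c(3, 5, NA, 2),
  stringsAsFactors = FALSE
)

# Sum the counts over all regions for each time point.
# na.action = na.pass keeps rows with missing counts, and na.rm = TRUE
# (passed on to sum) makes them contribute zero cases (an assumption).
agg_by_time <- function(df) {
  stats::aggregate(NumValue ~ Time, data = df, FUN = sum,
                   na.rm = TRUE, na.action = stats::na.pass)
}

agg <- agg_by_time(toy)
print(agg)
```

Aggregating to a single count per time point is what a surveillance time series needs as input; the package may treat geographical levels differently.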
A list of two dataframes containing the default parameters used for FarringtonFlexible and for GLRNB, for each time unit available in the Signal Detection tool
AlgoParam
A list of 2 dataframes: one for FarringtonFlexible with 2 rows and 9 variables, and one for GLRNB with 2 rows and 8 variables
Default parameters for FarringtonFlexible algorithm
Time units available in the signal detection tool i.e. week, month
Window's half-size, i.e. the number of time units (weeks or months) to include before and after the current time unit in each year (w=2 for weeks, w=1 for months)
Logical specifying whether to reweight past outbreaks or not (TRUE for both weeks and months, past outbreaks are always reweighted)
Logical specifying whether a trend should be included and kept in case the conditions in the Farrington et al. paper are met (TRUE for both weeks and months, a trend is always fit)
Numeric defining the threshold for reweighting past outbreaks using the Anscombe residuals (2.85 for both weeks and months, as advised in the improved method)
Logical specifying whether to print warnings from the call to glm (TRUE for both weeks and months)
Numeric defining the threshold for deciding whether to keep trend in the model (0.05 for both weeks and months)
Integer, the number of cases defining the minimum-alarm threshold: no alarm is sounded if fewer than 'limit54_1' cases were reported in the past 'limit54_2' weeks/months
Integer, the number of periods defining the minimum-alarm threshold: no alarm is sounded if fewer than 'limit54_1' cases were reported in the past 'limit54_2' weeks/months
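The minimum-alarm rule described by 'limit54_1' and 'limit54_2' can be sketched as follows. This is a hypothetical helper, not code from the package or from the surveillance package; the values 5 and 4 are illustrative only, and counting the current time point as part of the window is an assumption.

```r
# Hypothetical helper illustrating the minimum-alarm ('limit54') rule:
# returns TRUE if at least limit54_1 cases were reported over the last
# limit54_2 periods ending at time point t (t included, an assumption).
passes_limit54 <- function(counts, t, limit54_1 = 5, limit54_2 = 4) {
  window <- seq.int(max(1, t - limit54_2 + 1), t)
  sum(counts[window], na.rm = TRUE) >= limit54_1
}

counts <- c(0, 1, 0, 1, 2, 1, 3)
passes_limit54(counts, t = 4)  # FALSE: points 1 to 4 hold only 2 cases
passes_limit54(counts, t = 7)  # TRUE: points 4 to 7 hold 7 cases
```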
Default parameters for GLRNB algorithm
Time units available in the signal detection tool i.e. week, month
A vector of in-control values of the mean of the Poisson / negative binomial distribution with the same length as range - NULL for both weeks and months
Numeric, the pre-specified value of k (or lambda) used in a recursive LR scheme - log(1.2) for both weeks and months, corresponding to a 20 percent increase in the mean
Numeric, the dispersion parameter of the negative binomial distribution. If alpha=NULL the parameter is calculated as part of the in-control estimation - alpha=NULL for both weeks and months
Numeric, the threshold in the GLR test, i.e. c_gamma - cARL=0.25 for both weeks and months
Integer, the number of observations needed before we have a full rank - Mtilde=1 for both weeks and months
Integer defining the number of time instances back in time in the window-limited approach. To always look back until the first observation use M=-1. M=1 for both weeks and months
Character string specifying the type of the alternative. Currently the two choices are intercept and epi - Change=intercept for both weeks and months
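With a fixed value such as log(1.2), the out-of-control mean is mu0 * exp(log(1.2)) = 1.2 * mu0, hence the 20 percent increase. The sketch below illustrates a recursive likelihood-ratio (CUSUM-type) scheme for such a fixed change, simplified to a Poisson model without the negative binomial dispersion parameter. It is a hypothetical illustration, not the glrnb implementation, and the threshold h is arbitrary rather than derived from cARL.

```r
# Hypothetical sketch: Poisson CUSUM for a fixed multiplicative change
# exp(kappa) in the in-control mean mu0 (a simplification of glrnb, which
# uses a negative binomial model). h is an arbitrary alarm threshold.
poisson_cusum <- function(x, mu0, kappa = log(1.2), h = 4) {
  mu0 <- rep_len(mu0, length(x))
  # log-likelihood ratio of Poisson(mu0 * exp(kappa)) against Poisson(mu0)
  llr <- x * kappa - mu0 * (exp(kappa) - 1)
  z <- numeric(length(x))
  prev <- 0
  for (i in seq_along(x)) {
    prev <- max(0, prev + llr[i])  # reset to zero when evidence is negative
    z[i] <- prev
  }
  list(statistic = z, alarm = z > h)
}

res <- poisson_cusum(x = c(10, 10, 10, 20, 20, 20), mu0 = 10)
res$alarm  # FALSE FALSE FALSE FALSE FALSE TRUE
```

The statistic stays at zero while counts match the in-control mean and accumulates once they double, so the alarm is raised only after sustained evidence of an increase.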
surveillance::farringtonFlexible
surveillance::glrnb
Build an algo object from an sts class object using either the FarringtonFlexible or the GLRNB surveillance algorithm
algoSD(x.sts, algo = "FarringtonFlexible", timeUnit = "Month", testingPeriod = 5)
x.sts |
sts class object (see |
algo |
character string containing the name of the algorithm to use. Options are "FarringtonFlexible" (default) or "GLRNB". |
timeUnit |
character string for the time unit of the time series. Options are "Week" or "Month". |
testingPeriod |
numeric, number of time units (months or weeks) back in time on which to test the algorithm, i.e. in which to detect outbreaks |
sts
#-- Setting the parameters for the report
input <- list(
  disease = "Salmonellosis",
  country = "EU-EEA - complete series",
  indicator = "Reported cases",
  stratification = "Confirmed cases",
  unit = "Month",
  daterange = c("2010-01-01", "2016-12-31"),
  algo = "FarringtonFlexible",
  testingperiod = 5
)

#-- Example dataset
dataset <- EpiSignalDetection::SignalData

#-- Filtering on declared input parameters
dataset <- filterAtlasExport(dataset, input)

#-- Aggregating the data by geographical level and time point
dataset <- aggAtlasExport(dataset, input)

#-- Building the corresponding sts object
dataset.sts <- stsSD(observedCases = dataset$NumValue,
                     studyPeriod = dataset$StudyPeriod,
                     timeUnit = input$unit,
                     startYM = c(as.numeric(format(as.Date(input$daterange[1], "%Y-%m-%d"), "%Y")),
                                 as.numeric(format(as.Date(input$daterange[1], "%Y-%m-%d"), "%m"))))

#-- Building the corresponding algo object
dataset.algo <- algoSD(dataset.sts,
                       algo = input$algo,
                       timeUnit = input$unit,
                       testingPeriod = input$testingperiod)
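As a rough illustration of what testingPeriod selects, the hypothetical helper below returns the indices of the last testingPeriod time points of a series of length n, assuming (as "back in time" suggests) that the tested points are the most recent ones. This is not the package's code. For the monthly example above (January 2010 to December 2016, i.e. 84 months) and testingperiod = 5, these would be points 80 to 84.

```r
# Hypothetical helper: indices of the last testingPeriod time points of a
# series of length n, assumed to be the points on which alarms are computed
testing_range <- function(n, testingPeriod = 5) {
  stopifnot(testingPeriod >= 1, testingPeriod <= n)
  seq.int(n - testingPeriod + 1, n)
}

n_months <- length(seq(as.Date("2010-01-01"), as.Date("2016-12-31"), by = "month"))
testing_range(n_months, testingPeriod = 5)  # 80 81 82 83 84
```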
Clean the Atlas data export dataframe before signal detection analysis (see importAtlasExport and the online ECDC Atlas: http://atlas.ecdc.europa.eu/public/index.aspx)
cleanAtlasExport(x)
x |
dataframe, usually the output of the import function |
The function will:
Filter only on case-based indicators, i.e. 'Reported cases'
Create four additional time variables to ease the analysis:
TimeUnit ('Year', 'Month', 'Week'),
TimeYear (xxxx),
TimeMonth (xx),
TimeWeek (xx)
Keep only the variables of interest, i.e. "HealthTopic", "Population", "Time", "RegionName", "NumValue"
dataframe
importAtlasExport
filterAtlasExport
dataset <- cleanAtlasExport( importAtlasExport(x = 'ECDC_surveillance_data_Anthrax.csv') )
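A minimal base-R sketch of how the four time variables could be derived from the 'Time' column. The input formats assumed here ("2016" for years, "2016-03" for months and "2016-W05" for weeks) and the helper name derive_time_vars are hypothetical; check them against an actual Atlas export before relying on them.

```r
# Hypothetical sketch deriving TimeUnit, TimeYear, TimeMonth and TimeWeek
# from the 'Time' column; the three input formats are assumptions.
derive_time_vars <- function(time) {
  unit <- ifelse(grepl("^[0-9]{4}$", time), "Year",
          ifelse(grepl("^[0-9]{4}-[0-9]{2}$", time), "Month",
          ifelse(grepl("^[0-9]{4}-W[0-9]{2}$", time), "Week", NA_character_)))
  month <- rep(NA_real_, length(time))
  week <- rep(NA_real_, length(time))
  is_month <- unit %in% "Month"
  is_week <- unit %in% "Week"
  month[is_month] <- as.numeric(substr(time[is_month], 6, 7))  # "2016-03" -> 3
  week[is_week] <- as.numeric(substr(time[is_week], 7, 8))     # "2016-W05" -> 5
  data.frame(Time = time, TimeUnit = unit,
             TimeYear = as.numeric(substr(time, 1, 4)),
             TimeMonth = month, TimeWeek = week,
             stringsAsFactors = FALSE)
}

derive_time_vars(c("2016", "2016-03", "2016-W05"))
```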
Filter clean Atlas export according to input parameters
filterAtlasExport(x, input, stratified)
x |
dataframe, clean Atlas export (see |
input |
list of parameters as defined in the Signal Detection Application (see (i.e. |
stratified |
a logical value indicating whether the report should be stratified by |
dataframe filtered on the selected parameters (input list)
cleanAtlasExport
aggAtlasExport
#-- Setting the parameters for the report
input <- list(
  disease = "Salmonellosis",
  country = "EU-EEA - complete series",
  indicator = "Reported cases",
  stratification = "Confirmed cases",
  unit = "Month",
  daterange = c("2010-01-01", "2016-12-31"),
  algo = "FarringtonFlexible",
  testingperiod = 5
)

#-- Example dataset
dataset <- EpiSignalDetection::SignalData

#-- Filtering on declared input parameters
dataset <- filterAtlasExport(dataset, input, stratified = FALSE)
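The filtering step can be approximated in base R as follows. This is a hypothetical sketch, not the package's code: it assumes the columns HealthTopic, Population, TimeUnit and StudyPeriod (names taken from cleanAtlasExport and studyPeriod) and filters on disease, stratification, time unit and date range only. How the package handles the country parameter, in particular the 'EU-EEA - complete series' option, is not reproduced.

```r
# Hypothetical sketch: keep rows matching the disease, population group,
# time unit and date range of the input list (country filtering omitted)
filter_sketch <- function(df, input) {
  from <- as.Date(input$daterange[1])
  to   <- as.Date(input$daterange[2])
  keep <- df$HealthTopic == input$disease &
    df$Population == input$stratification &
    df$TimeUnit == input$unit &
    df$StudyPeriod >= from & df$StudyPeriod <= to
  df[keep %in% TRUE, , drop = FALSE]  # treat NA comparisons as FALSE
}

toy <- data.frame(
  HealthTopic = c("Salmonellosis", "Salmonellosis", "Measles", "Salmonellosis"),
  Population  = c("Confirmed cases", "Confirmed cases", "Confirmed cases", "All cases"),
  TimeUnit    = "Month",
  StudyPeriod = as.Date(c("2009-12-01", "2010-01-01", "2010-01-01", "2010-01-01")),
  NumValue    = c(4, 7, 2, 9),
  stringsAsFactors = FALSE
)
input <- list(disease = "Salmonellosis", stratification = "Confirmed cases",
              unit = "Month", daterange = c("2010-01-01", "2016-12-31"))
filter_sketch(toy, input)  # keeps only the 2010-01-01 Salmonellosis row
```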
Import ECDC Atlas csv export file (exported from the online ECDC Atlas: http://atlas.ecdc.europa.eu/public/index.aspx), e.g. "ECDC_surveillance_data_Anthrax.csv"
importAtlasExport(x)
x |
file name of a csv file, export from the ECDC Atlas (e.g. |
The function interprets missing reports, coded as '-', as NA values
dataframe
dataset <- importAtlasExport(x = 'ECDC_surveillance_data_Anthrax.csv')
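The treatment of '-' can be reproduced with utils::read.csv and its na.strings argument. The sketch below uses a small inline csv with hypothetical columns and values rather than a real Atlas export, whose layout may differ; keeping "NA" in na.strings preserves R's default handling as well.

```r
# Small inline stand-in for an Atlas csv export (hypothetical columns/values)
csv_text <- "HealthTopic,Time,RegionName,NumValue
Anthrax,2015,Austria,0
Anthrax,2016,Austria,-"

# '-' is declared as a missing-value code, alongside R's default "NA"
dataset <- utils::read.csv(text = csv_text, na.strings = c("NA", "-"),
                           stringsAsFactors = FALSE)
str(dataset)
```

Without na.strings, the '-' would force NumValue to be read as character, so declaring it as missing keeps the counts numeric.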
Plot the Signal Detection time series including historical data, alarm detection period and alarms
plotSD(x, input, subRegionName, x.sts, x.algo)
x |
dataframe (default |
input |
list of parameters as defined in the Signal Detection Application (see (i.e. |
subRegionName |
character string, region label to use in the plot, if different from |
x.sts |
sts object (optional), see |
x.algo |
algo object (optional), see |
plot
#-- Setting the parameters for the report
input <- list(
  disease = "Salmonellosis",
  country = "EU-EEA - complete series",
  indicator = "Reported cases",
  stratification = "Confirmed cases",
  unit = "Month",
  daterange = c("2010-01-01", "2016-12-31"),
  algo = "FarringtonFlexible",
  testingperiod = 5
)

#-- Plotting the signal detection output
plotSD(input = input)
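The plot layout (historical data, alarm detection period and alarms) can be mimicked with base graphics. This is a hypothetical sketch of the idea, not how plotSD draws its output; it assumes the alarms are available as a logical vector covering the testing period only.

```r
# Hypothetical sketch of the plot layout: counts as vertical bars, a dashed
# line marking the start of the testing period, alarms as red triangles.
# 'alarm' is a logical vector of length testingPeriod.
plot_sd_sketch <- function(counts, testingPeriod, alarm, main = "") {
  n <- length(counts)
  stopifnot(length(alarm) == testingPeriod, testingPeriod <= n)
  graphics::plot(seq_len(n), counts, type = "h", lwd = 2,
                 xlab = "Time index", ylab = "Number of cases", main = main)
  graphics::abline(v = n - testingPeriod + 0.5, lty = 2)
  idx <- (n - testingPeriod) + which(alarm)  # positions of alarms in the series
  graphics::points(idx, counts[idx], pch = 17, col = "red")
  invisible(idx)
}

# Usage (draws on the current graphics device):
# plot_sd_sketch(c(5, 6, 5, 7, 6, 15, 6), testingPeriod = 3,
#                alarm = c(FALSE, TRUE, FALSE))
```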
Run the 'shiny' interactive application for signal detection analysis using ECDC Atlas export data.
runEpiSDApp()
Datasets to use in the tool:
Default dataset included in the application (Salmonellosis 2007-2016 or Measles 1999-2018 data);
External dataset using the "Browse" button in the application:
–> An export (csv format) from the ECDC Surveillance Atlas of Infectious Diseases: http://atlas.ecdc.europa.eu/public/index.aspx.
On the ECDC "Surveillance Atlas of Infectious Diseases" web site:
1- Choose the disease/health topic to analyse
2- Export the data (csv) using the default settings
3- Import the csv in the application
4- You can now explore the disease time series for signal detection...
–> Any dataset specified as described in the package vignette.
# --- Run the 'shiny' app
# --- (NB: please open the app in an external browser
# --- in order to facilitate its use)
runEpiSDApp()
Render the R Markdown report of alarms in HTML format (ECDC Signal Detection Report)
runEpiSDReport(input, stratified, outputfile)
input |
list of parameters as defined in the Signal Detection Application (see (i.e. (see also default parameters in
|
stratified |
a logical value indicating whether the report should be stratified by |
outputfile |
output file name (e.g. (default value is a temporary folder - |
Datasets to use in the report:
Default dataset included in the package (Salmonellosis 2007-2016 or Measles 1999-2018 data), i.e. input$file = NULL;
External dataset:
–> An export (csv format) from the ECDC Surveillance Atlas of Infectious Diseases: http://atlas.ecdc.europa.eu/public/index.aspx.
On the ECDC "Surveillance Atlas of Infectious Diseases" web site:
1- Choose the disease/health topic to analyse
2- Export the data (csv) using the default settings
3- Specify the location of this external dataset in the input argument of the runEpiSDReport() function
(e.g. input <- list(file = list(datapath = "C:/Users/Downloads/ECDC_surveillance_data_Pertussis.csv"),
disease = "Pertussis", country = "Greece", indicator = "Reported cases",
stratification = "All cases", unit = "Month", daterange = c("2011-12-01", "2016-12-01"),
algo = "FarringtonFlexible", testingperiod = 3))
4- You can now render the R Markdown report...
(e.g. runEpiSDReport(input = input))
–> Any dataset specified as described in the package vignette.
An HTML report
Default dataset used in the report SignalData
Signal Detection Application runEpiSDApp
#-- Running the report as a standalone function:
#-- each input parameter is then defined one by one through the R console
runEpiSDReport()

#---> OR

#-- First, set the parameters for the report
input <- list(
  disease = "Salmonellosis",
  country = "Portugal",
  indicator = "Reported cases",
  stratification = "Confirmed cases",
  unit = "Month",
  daterange = c("2011-01-01", "2016-12-31"),
  algo = "FarringtonFlexible",
  testingperiod = 6
)

#-- Second, run the report based on the EpiSignalDetection::SignalData dataset
#-- and store it in a temporary folder
runEpiSDReport(input = input)

#-- Run the report based on the EpiSignalDetection::SignalData dataset
#-- and store the HTML output 'test.html' in the folder 'C:/R/'
runEpiSDReport(input = input, outputfile = "C:/R/test.html")

#-- Run the report based on external data
input <- list(
  file = list(datapath = "C:/Users/Downloads/ECDC_surveillance_data_Pertussis.csv"),
  disease = "Pertussis",
  country = "Greece",
  indicator = "Reported cases",
  stratification = "All cases",
  unit = "Month",
  daterange = c("2011-12-01", "2016-12-01"),
  algo = "FarringtonFlexible",
  testingperiod = 3
)
runEpiSDReport(input = input, stratified = TRUE)
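The choice between the default dataset and an external file depends only on whether input$file is NULL. The hypothetical helper below makes that rule explicit and also checks that the input list carries the fields used in the examples; the list of required fields is an assumption drawn from those examples, not a documented requirement of runEpiSDReport().

```r
# Fields used in the examples above (assumed to be required)
required_fields <- c("disease", "country", "indicator", "stratification",
                     "unit", "daterange", "algo", "testingperiod")

# Hypothetical helper: report which data source the input list points to
report_data_source <- function(input) {
  missing_fields <- setdiff(required_fields, names(input))
  if (length(missing_fields) > 0) {
    stop("missing input parameter(s): ", paste(missing_fields, collapse = ", "))
  }
  # input$file = NULL means the default dataset shipped with the package
  if (is.null(input$file)) "EpiSignalDetection::SignalData" else input$file$datapath
}

input1 <- list(disease = "Salmonellosis", country = "Portugal",
               indicator = "Reported cases", stratification = "Confirmed cases",
               unit = "Month", daterange = c("2011-01-01", "2016-12-31"),
               algo = "FarringtonFlexible", testingperiod = 6)
report_data_source(input1)  # "EpiSignalDetection::SignalData"
```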
A dataset containing an export from the ECDC Atlas for salmonellosis and measles data. This export is cleaned and ready for Signal Detection Analysis (see cleanAtlasExport())
SignalData
A data frame with 80,834 rows and 11 variables:
Disease name e.g. Salmonellosis or Measles
Population characteristics e.g. All cases, Confirmed cases, Serotype AGONA, Serotype BAREILLY etc.
Indicator e.g. Hospitalised cases, Reported cases, Number of deaths, etc.
Time variable including both yearly data from 1999 to 2017, and monthly data from 1999-01 to 2018-02
Geographical level including country names e.g. Austria, Belgium, Bulgaria, etc.
Number of cases
Time unit corresponding to the format of the date in the 'Time' variable e.g. Year or Month
Year of the date available in the 'Time' variable, regardless of the date format i.e. 1999 to 2018
Month of the date available in the 'Time' variable, regardless of the date format i.e. 1 to 12
Week of the date available in the 'Time' variable, regardless of the date format i.e. NA since this dataset does not include any weekly data
Approximated date corresponding to the date available in the 'Time' variable (daily format)
http://atlas.ecdc.europa.eu/public/index.aspx
Build sts surveillance object
stsSD(observedCases, studyPeriod, timeUnit = "Month", startYM = c(2000, 1) )
observedCases |
numeric vector of the number of cases by time unit (y axis of the time series) |
studyPeriod |
vector of dates of length(observedCases) (x axis of the time series) |
timeUnit |
character string for the time unit of the time series. Options are Week or Month. |
startYM |
numeric vector of length 2 giving the year and month at which the historical data start |
sts
#-- Setting the parameters for the report
input <- list(
  disease = "Salmonellosis",
  country = "EU-EEA - complete series",
  indicator = "Reported cases",
  stratification = "Confirmed cases",
  unit = "Month",
  daterange = c("2010-01-01", "2016-12-31"),
  algo = "FarringtonFlexible",
  testingperiod = 5
)

#-- Example dataset
dataset <- EpiSignalDetection::SignalData

#-- Filtering on declared input parameters
dataset <- filterAtlasExport(dataset, input)

#-- Aggregating the data by geographical level and time point
dataset <- aggAtlasExport(dataset, input)

#-- Building the corresponding sts object
dataset.sts <- stsSD(observedCases = dataset$NumValue,
                     studyPeriod = dataset$StudyPeriod,
                     timeUnit = input$unit,
                     startYM = c(as.numeric(format(as.Date(input$daterange[1], "%Y-%m-%d"), "%Y")),
                                 as.numeric(format(as.Date(input$daterange[1], "%Y-%m-%d"), "%m"))))
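The startYM argument follows the same convention as the start of a base-R ts object: a year plus a sub-period index. The sketch below uses stats::ts as a stand-in for an sts object (it is not stsSD) to show how a monthly series starting on input$daterange[1] is positioned in time; the counts are made up and the helper name start_ym is hypothetical. For weekly data a frequency of 52 would typically be used instead of 12.

```r
# Hypothetical helper mirroring the startYM computation in the stsSD example
start_ym <- function(date_string) {
  d <- as.Date(date_string, "%Y-%m-%d")
  c(as.numeric(format(d, "%Y")), as.numeric(format(d, "%m")))
}

observed <- c(12, 15, 9, 20, 14, 11)  # made-up monthly counts
x <- stats::ts(observed, start = start_ym("2010-01-01"), frequency = 12)
print(x)  # January to June 2010
```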
Compute a dataframe including two types of dates corresponding to the study period defined in the list of parameters input (i.e. StudyPeriod = approximated daily date; Time = exact date in the format according to the time unit parameter)
studyPeriod(input)
input |
list of parameters as defined in the Signal Detection Application (see (i.e. |
Dataframe including the complete time series with no gaps:
StudyPeriod |
approximated daily date e.g. |
Time |
exact date in the format according to the time unit parameter e.g. |
#-- Setting the parameters for the report
input <- list(
  disease = "Salmonellosis",
  country = "EU-EEA - complete series",
  indicator = "Reported cases",
  stratification = "Confirmed cases",
  unit = "Month",
  daterange = c("2010-01-01", "2016-12-31"),
  algo = "FarringtonFlexible",
  testingperiod = 5
)

StudyPeriod <- studyPeriod(input)
head(StudyPeriod)
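For monthly data, a gap-free study period can be built with seq() on dates. The sketch below is a hypothetical approximation of studyPeriod(), restricted to months; using the first day of each month as the "approximated daily date" is an assumption, and the package may approximate dates differently.

```r
# Hypothetical sketch: one row per month between the two dates, no gaps
study_period_sketch <- function(daterange, unit = "Month") {
  stopifnot(unit == "Month")  # weekly handling is not sketched here
  from <- as.Date(daterange[1])
  to   <- as.Date(daterange[2])
  # first day of each month, starting from the month of 'from'
  days <- seq(as.Date(format(from, "%Y-%m-01")), to, by = "month")
  data.frame(StudyPeriod = days, Time = format(days, "%Y-%m"),
             stringsAsFactors = FALSE)
}

sp <- study_period_sketch(c("2010-01-01", "2016-12-31"))
head(sp)  # 84 rows in total, from "2010-01" to "2016-12"
```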