Description
The Leakage Diagnosis Benchmark (LeakDB) is a dataset comprising a large number of artificially generated but realistic leakage scenarios on different water distribution networks under varying conditions. In addition, a scoring algorithm was developed to evaluate the results of different algorithms using various metrics. The dataset is stored in an open research data repository.
How to Use
The benchmark contains 1000 scenarios based on the Net1 network and the same number of scenarios based on the Hanoi network. For each scenario, the pressure, demand, and flow rate at every node/link in the network are provided as .csv files. The leakage locations and types (abrupt, incipient) are provided as well (see the Leaks folder); note that not every scenario contains a leakage. Furthermore, the .inp file used for the simulation is available. Labels for each time step (30-minute steps) indicating the presence of a leak are given in Labels.csv.
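The CSV files can be inspected directly, for instance with pandas. The directory layout and file names used below (other than Labels.csv and the Leaks folder mentioned above) are assumptions; adapt the paths to the actual structure of the downloaded data:
# Minimal sketch for inspecting a single scenario with pandas
# (the scenario path and the pressure file name are assumptions)
import pandas as pd

scenario_dir = "LeakDB/Net1/Scenario-1"  # hypothetical path to one scenario

# Per-time-step leak labels (1: leak present, 0: no leak)
labels = pd.read_csv(f"{scenario_dir}/Labels.csv")

# Example: pressure readings for one node (file name is an assumption)
pressure = pd.read_csv(f"{scenario_dir}/Pressures/Node_1.csv")

print(labels.head())
print(pressure.head())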
Usage in Python
This benchmark is also available in Python under the key “KIOS-LeakDB”:
# The load() function is assumed to come from the WaterBenchmarkHub package
from water_benchmark_hub import load

leakdb = load("KIOS-LeakDB")
Detailed information about the provided functionality can be found in the documentation.
Loading the original data set
The original data set can be loaded as NumPy arrays by calling the load_data() function:
# Load the original data set for Net1 as a labeled NumPy array
# Labels:
# 1: Leak is present
# 0: No leak is present
X, y_leak = leakdb.load_data(use_net1=True, return_X_y=True)
Besides loading the entire data set of 1000 scenarios, it is also possible to load only a subset of scenarios:
# Load the first 10 Net1 scenarios as a labeled NumPy array
X, y_leak = leakdb.load_data(scenarios_id=range(10), use_net1=True, return_X_y=True)
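Loading subsets like this can also be used to process the benchmark in chunks when memory is limited; the chunking loop below is merely a sketch, not part of the benchmark API:
# Sketch: process the Net1 scenarios in chunks of 10 to limit memory usage
chunk_size = 10
for start in range(0, 1000, chunk_size):
    X, y_leak = leakdb.load_data(scenarios_id=range(start, start + chunk_size),
                                 use_net1=True, return_X_y=True)
    # ... process this chunk of scenarios ...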
Besides the data set, this benchmark also provides the original evaluation function, implemented in compute_evaluation_score(). In the context of the previous example of the first 10 Net1 scenarios, this could look as follows:
# Predict the presence of a leakage for each time step in each scenario
y_pred_labels_per_scenario = ...
# Evaluate prediction
score = compute_evaluation_score(scenarios_id=range(10), use_net1=True,
                                 y_pred_labels_per_scenario=y_pred_labels_per_scenario)
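For illustration, a naive baseline that always predicts "no leak" could be scored as follows; the per-scenario list structure and the number of time steps are assumptions and should be checked against the documentation:
import numpy as np

n_scenarios = 10
n_time_steps = 365 * 48  # assumed: one year of 30-minute steps per scenario

# Naive baseline: predict "no leak" (label 0) at every time step of every scenario
y_pred_labels_per_scenario = [np.zeros(n_time_steps, dtype=int)
                              for _ in range(n_scenarios)]

score = compute_evaluation_score(scenarios_id=range(n_scenarios), use_net1=True,
                                 y_pred_labels_per_scenario=y_pred_labels_per_scenario)
print(score)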
Loading the scenario configurations
Besides loading the original (already simulated) data sets, it is also possible to load the scenario configurations in EPyT-Flow by calling the load_scenarios() function, i.e., users can modify the scenarios and run the simulation themselves:
# Load the first Net1 scenario as an EPyT-Flow scenario configuration
scenario, = leakdb.load_scenarios(scenarios_id=[0], use_net1=True)

# Modify the scenario and run the simulation
from epyt_flow.simulation import ScenarioSimulator
with ScenarioSimulator(scenario_config=scenario) as sim:
    # ....
Note that, due to uncertainties and other factors in the original simulation, re-running the simulation will produce results that differ from the original data set.
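For example, running the (possibly modified) scenario loaded above and retrieving the simulated pressures might look like the following sketch, assuming EPyT-Flow's run_simulation() and get_data_pressures() methods; refer to the EPyT-Flow documentation for details:
# Sketch: run the loaded scenario and extract the pressure readings
from epyt_flow.simulation import ScenarioSimulator

with ScenarioSimulator(scenario_config=scenario) as sim:
    scada_data = sim.run_simulation()
    pressures = scada_data.get_data_pressures()
    print(pressures.shape)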
Reference
Vrachimis, S. G., Kyriakou, M. S., Eliades, D. G., & Polycarpou, M. M. (2018). LeakDB: A benchmark dataset for leakage diagnosis in water distribution networks. In Proceedings of the WDSA/CCWI Joint Conference (Vol. 1).