Task 1: Classification

Description

The input to the system is a music audio file. The output is the set of instruments that are played anywhere in the input file. Such systems can be used to create instrument-related tags, which can serve as features for higher-level systems such as recommenders.

Format

[TBD]

Evaluation

[TBD]

In the following, let $X=\{x_1,\dots,x_n\}$ be the set of instruments annotated for the input audio file, and let $Y=\{y_1,\dots,y_m\}$ be the set of instruments predicted by the system for that same audio file. In addition, let $1[e]$ be an indicator function that evaluates to $1$ if $e$ is true and $0$ otherwise.

Hierarchy-unaware Measures

The following measures are defined ignoring the hierarchy of classes:

  • Precision evaluates the ability of the system to identify instruments without making mistakes (avoiding false positives). It computes the fraction of predictions that are correct: $P=\frac{1}{m}\sum_{y\in Y}{1[y\in X]}$
  • Recall evaluates the ability of the system to identify all instruments present in the input file (avoiding false negatives). It computes the fraction of annotations that are correctly predicted: $R=\frac{1}{n}\sum_{x\in X}{1[x\in Y]}$
  • F-measure integrates both Precision and Recall in a single score, their harmonic mean: $F=\frac{2PR}{P+R}$
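As an illustration, the three measures above could be computed per audio file along the following lines. This is only a sketch: the function name `precision_recall_f` and the instrument labels are hypothetical, and returning 0 when a set is empty is an assumption, since the text does not say how undefined ratios should be handled.

```python
def precision_recall_f(annotated, predicted):
    """Flat (hierarchy-unaware) Precision, Recall and F-measure for one file."""
    X, Y = set(annotated), set(predicted)
    correct = len(X & Y)  # instruments that are both annotated and predicted

    # Assumption: an undefined ratio (empty set in the denominator) counts as 0.
    p = correct / len(Y) if Y else 0.0
    r = correct / len(X) if X else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


# Hypothetical example: three annotated instruments, four predicted ones.
p, r, f = precision_recall_f(
    {"piano", "violin", "flute"},
    {"piano", "violin", "cello", "drums"},
)
print(p, r, f)  # p = 0.5, r = 2/3, f = 4/7
```

Two of the four predictions are correct, so $P=2/4$, and two of the three annotations are found, so $R=2/3$.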

Hierarchy-aware Measures

We will use the hierarchical counterparts of $P$, $R$ and $F$ defined by Kiritchenko et al. Let $anc(x)$ be a function that returns the set of ancestors of class $x$, excluding the root of the hierarchy. The extended set of reference annotations is thus $X^*=\cup_{x\in X}{anc(x)}$, while the extended set of predictions is similarly $Y^*=\cup_{y\in Y}{anc(y)}$. The above measures can now be reformulated as follows to account for the hierarchy:

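  • hPrecision: $hP=\frac{1}{|Y^*|}\sum_{y\in Y^*}{1[y\in X^*]}$
  • hRecall: $hR=\frac{1}{|X^*|}\sum_{x\in X^*}{1[x\in Y^*]}$
  • hF-measure: $hF=\frac{2\cdot hP\cdot hR}{hP+hR}$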
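The hierarchical measures could be sketched as follows. The instrument hierarchy `PARENT` and all function names are hypothetical and serve only as an example. The sketch also assumes that $anc(x)$ contains the class $x$ itself along with its ancestors, which is the usual reading of Kiritchenko et al.; without that assumption, leaf classes would never contribute to the score. As before, undefined ratios are assumed to evaluate to 0.

```python
# Hypothetical hierarchy: each class maps to its parent; the root maps to None.
PARENT = {
    "instrument": None,
    "strings": "instrument",
    "keyboards": "instrument",
    "violin": "strings",
    "cello": "strings",
    "piano": "keyboards",
}


def anc(label, parent=PARENT):
    """Return the class and its ancestors, excluding the root.

    Assumption: the class itself is part of the returned set.
    """
    out = set()
    node = label
    while parent[node] is not None:  # stop when the root is reached
        out.add(node)
        node = parent[node]
    return out


def extend(labels, parent=PARENT):
    """Union of anc(label) over all labels, i.e. X* or Y*."""
    extended = set()
    for label in labels:
        extended |= anc(label, parent)
    return extended


def hierarchical_prf(annotated, predicted, parent=PARENT):
    """hPrecision, hRecall and hF-measure for one file."""
    Xs, Ys = extend(annotated, parent), extend(predicted, parent)
    correct = len(Xs & Ys)
    hp = correct / len(Ys) if Ys else 0.0
    hr = correct / len(Xs) if Xs else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf


# The system confuses violin with cello, but both are strings.
print(hierarchical_prf({"violin", "piano"}, {"cello", "piano"}))
# (0.75, 0.75, 0.75)
```

In this example the flat measures would give $P=R=1/2$, whereas the hierarchical ones give partial credit for predicting the correct instrument family.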