Document Actions

MPTK Users FAQ

by broy — last modified 2007-02-09 12:07
Up one level
MPTK Users general FAQ

FAQ overview

What is matching pursuit ?

Matching Pursuit  is an iterative algorithm which can be used to provide adaptive sparse decompositions of high-dimensional vectors such as signals or images using a redundant dictionary of elementary building blocks called atoms. The Matching Pursuit algorithm was first introduced by Zhifeng Zhang and Stéphane Mallat in the following paper:

What's about MPTK ?

The Matching Pursuit Tool Kit provides a fast implementation of the Matching Pursuit algorithm for the sparse decomposition of audio signals. It comprises a library, some standalone utilities and some scripts to plot the results under Matlab.

Where can I find other implementations of Matching Pursuit ?

The Matching Pursuit algorithm, which was proposed by Zhifeng Zhang and Stéphane Mallat, has been implemented in several software packages.

  • The original software was distributed under the name mpp (matching pursuit program) but seems to be no longer available.
  • An open source implementation was developped as a package of the LastWave program by Rémi Gribonval, Emmanuel Bacry and Javier Abadia at Ecole Polytechnique. Some documentation can be found here.
  • Fabien Bracher, from the Observatoire Midi-Pyrénées, developped a graphical interface for Matching Pursuit called Guimauve (archived on wayback machine here) which is based on LastWave's matching pursuit package.
  • A matlab implementation of matching pursuit is available in Wavelab, a collection of Matlab routines to implement various wavelet related algorithms.
  • If you know of other Matching Pursuit implementations that we should quote here, please help us update this page by contacting us.

Which functionnalities MPTK actualy provides ?

        • mpd: performs the signal decompostion into atoms stored in the resulting books.

        • mpf: the atoms stored in the resulting books can be ltered (i.e.sorted and selected) with the command mpf.

        • mpcat: several books can then be concatenated with the command mpcat.

        • mpr: once a desired book is obtained, a corresponding approximant signal can be rebuilt using mpr, with the optional addition of a residual signal (or in addition to any other signal).

        • mpd_demix: he mpd demix utility provides blind source separation using Matching Pursuit when the mixing matrix is known.

        • Each of the cited utilities have a -h switch to get command-line help.

        • Some Matlab utilities are bundled with the distribution to help loading and visualizing books under Matlab. They can be found in the ./extras/matlab/ directory of the source tree. They are not automatically installed by the congure system: you should copy them manually to a location accessible to your MATLABPATH:

          • bookread.m: to load a binary book into Matlab.

          • bookplot.m: to plot a book in a spectrogram-like fashion.

          • bookover.m: to overlay a book plot on a STFT spectrogram.

          • dictread.m: under development. Don't use.

What is defined as an atom ?

An elementary piece of signal. An atom is characterized by its class (e.g. Gabor atoms, harmonic atoms etc.) and a set of parameters (e.g., for a Gabor atom, its time location and length, its frequency, its amplitude, a chirp factor and a given shaping window). In our code, the corresponding object knows how to subtract itself from a signal, and how to generate its own waveform.

What is defined as a block ?

A block computes the correlations (inner products) between a signal to analyze and a set of atoms for which the computation of the correlations can be factorized in an efficient manner, along the whole support of a signal.

For instance, the inner products can stem from a time-frequency transform, such as the

Fourier Transform or a wavelet transform, provided the transform can be interpreted as a set of correlations between the signal and a set of atoms which cover the time-frequency plane.

Each block object can search for the location of the maximum correlation between the

atoms and the signal, and can thus deduce which atom contributes the most energy to the analyzed signal. Several blocks corresponding to various scales or various transforms can be concurrently applied to the same signal, thus providing for multi-scale or multiple-basis analysis.

What is defined as a dictionary ?

A dictionary contains a collection of blocks plus the signal on which they operate. It can search across all the blocks (i.e., all the scales and all the bases) for the atom which brings the most energy to the analyzed signal.

What is defined as a book?

A book is a collection of atoms. Summing all the atoms in a book gives a signal.

Which type of block are currently implemented ?

  • Gabor blocks, based on Short-Time Fourier Transforms, which compute the correlations with windowed sinusoids having a “flat" frequency (chirp rate == 0). In this case, one block is conceptually equivalent to one STFT with a given scale.

  • Harmonic blocks, based on the harmonic grouping of Gabor atoms, and inheriting from the Gabor block.

  • Chirp blocks, based on a post-processing of the Gabor atoms and aimed at optimizing the chirp rate.

  • AnyWave blocks, based on a fast convolution algorithm to compute some correlations with arbitrary waveforms.

  • Constant blocks, which are an extension of the Dirac blocks. Constant atoms are rectangular windows, defined by a length and a shift between atoms. A constant atom catches the mean of a signal on the specified frame.

  • Nyquist blocks. They are defined by a length and a shift between atoms, and the waveform is a normalized succession of 1 and -1. A Nyquist atom catches the greatest frequency that can be distinguished (called the Nyquist frequency if the support is even) of the specified frame.

  • MDCT/MDST/MCLT blocks. They are transforms based on local cosine functions.

  • Dirac blocks,which provide no particular time-frequency transform, but just find the signal samples which have a high energy

What kind of Inputs/Outputs are actually working with MPTK ?

MPTK has been originally designed for sound processing, but it is applicable to any kind of multichannel 1D signal. For input, we have mostly been using the WAV format, but any format recognized by the libsndfile library (i.e., most audio formats, see http://www.mega-nerd.com/libsndfile/) should work with the provided MPTK utilities.

The signal output is more specific: the MPTK binaries currently output signals in the WAV format only. (WARNING: in the libsndfile implementation, this format is not protected against clipping, which may happen for some books and is not an artifact of the MPTK analysis.)

Nevertheless, the MPTK API also offers Matlab, raw double and raw float signal output formats: those could be quickly enabled by simple hacks of the relevant utilities. To enable other output formats, please contribute some code to the mp_signal.{h,cpp} API.

What is the release licence of MPTK ?

MPTK is released both for binaries and source code format under the GNU General Public License.

What are the dependencies of MPTK in term of library ?

The MPTK package depends on a few external libraries: FFTW3, libsndfile, pthreads and (if you want to compile the GUI) wxWidgets. It is mandatory to have these libraries installed on your system before you can compile MPTK. The versions which worked for us when compiling the latest release of the MPTK package are mirrored in the MPTK_externals section of the Released Files page.

What are the supported Operating System for using MPTK ?

At the moment, each new release is being tested on i386 Linux (using GCC), Mac OSX (using Darwin) and Win32 (using mingw32-make). There are two kinds of release, the binaries one including executable files and the source files one if you want to build MPTK by yourself. More information in the platform part both for users or developers.

Some articles exposing scientific results related to MPTK are available in PDF format through the Released Files page in MPTK_related_articles section.

How to improve the quality of reconstructed signal

The quality of a signal reconstructed from a book with mpr command line increases with the number of atoms contained in this book.

At each iteration of decomposition of the analyzed signal, the best atom is extracted from the dictionary and put into the book in binary format. The dictionary from which atom are extracted is a mathematical object used in MPTK, it is defined by a XML file but not contained in this file.

The number of extracted atoms contained in the book is the number of iterations of the decomposition. The more iterations, the better the quality of the reconstructed signal. This number of iterations defines the Signal Noise Ratio, if you want to reach a particular SNR (Signal Noise Ratio), you should specify it in the command line by using -s <SNR> or --snr=<SNR> option instead of using a number of iterations with option -n <N>, --num-iter=<N>|--num-atoms=<N> to stop the decomposition.

The XML file called dictionary merely describes the “type” of atoms that will be contained in the dictionary. The size of dictionary (that is to say the number of atom available for the decomposition of the signal) is determined when the mpd command is run. It depends on the structure of the dictionary, described in the XML file, and on the number of samples of the analyzed signal.

You can know the number of atoms available in the dictionary and the number of atoms extracted in the book by looking at the console output of the mpd command. For example, this two decomposition based on the same dictionary containing only Gabor atoms:

-With a short signal (16kHz, about 320 Ko):

mptk INFO -- mpd - -------------------------

mptk INFO -- mpd - Starting Matching Pursuit on signal [Z:\MPTK\used_file\s1-short.wav] with dictionary [Z:\MPTK\used_file\Gabor_dict_structure.xml].

mptk INFO -- mpd - -------------------------

mptk INFO -- Conditions - This run will iterate until the SNR goes above [80], using [998397] atoms.

mptk INFO -- Conditions - The resulting book will be written to book file [Z:\MPTK\used_file\booktest.bin].

mptk INFO -- Conditions - The residual will not be saved.

mptk INFO -- Conditions - The energy decay will not be saved.

mptk INFO -- mpd - The initial signal energy is : 226.272

mptk INFO -- mpd - STARTING TO ITERATE

mptk INFO -- Current state - The current SNR [80.0002] reached or passed the target SNR [80] in [213391] iterations.

mptk INFO -- mpd - ------------------------

mptk INFO -- mpd - MATCHING PURSUIT RESULTS:

mptk INFO -- mpd - ------------------------

mptk INFO -- mpd - Duration of initialisation [0.624988] s:

mptk INFO -- mpd - Duration of iterations [24.905771] s:

mptk INFO -- Result - [213391] iterations have been performed.

mptk INFO -- Result - ([213391] atoms have been selected out of the [998397] atoms of the dictionary.)

mptk INFO -- Result - The initial signal energy was [226.272].

mptk INFO -- Result - The residual energy is now [2.26263e-006].

mptk INFO -- Result - The SNR is now [80].



-With a longer signal (16kHz, about 1.83 Mo):

mptk INFO -- mpd - -------------------------

mptk INFO -- mpd - Starting Matching Pursuit on signal [Z:\MPTK\used_file\s1-long.wav] with dictionary [Z:\MPTK\used_file\Gabor_dict_structure.xml].

mptk INFO -- mpd - -------------------------

mptk INFO -- Conditions - This run will iterate until the SNR goes above [80], using [5811147] atoms.

mptk INFO -- Conditions - The resulting book will be written to book file [Z:\MPTK\used_file\booktest.bin].

mptk INFO -- Conditions - The residual will not be saved.

mptk INFO -- Conditions - The energy decay will not be saved.

mptk INFO -- mpd - The initial signal energy is : 2664.06

mptk INFO -- mpd - STARTING TO ITERATE

mptk INFO -- Current state - The current SNR [80] reached or passed the target SNR [80] in [2578834] iterations.

mptk INFO -- mpd - ------------------------

mptk INFO -- mpd - MATCHING PURSUIT RESULTS:

mptk INFO -- mpd - ------------------------

mptk INFO -- mpd - Duration of initialisation [1.593719] s:

mptk INFO -- mpd - Duration of iterations [199.371172] s:

mptk INFO -- Result - [2578834] iterations have been performed.

mptk INFO -- Result - ([2578834] atoms have been selected out of the [5811147] atoms of the dictionary.)

mptk INFO -- Result - The initial signal energy was [2664.06].

mptk INFO -- Result - The residual energy is now [2.66406e-005].

mptk INFO -- Result - The SNR is now [80].


The used dictionary:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<dict>
 <libVersion>0.5.3</libVersion>
<block>
   <param name="type" value="gabor" />
   <param name="blockOffset" value="0" />
   <param name="fftSize" value="128" />
   <param name="windowLen" value="128" />
   <param name="windowShift" value="32" />
   <param name="windowtype" value="gauss" />
   <param name="windowopt" value="0" />
</block>
<block>
   <param name="type" value="gabor" />
   <param name="blockOffset" value="0" />
   <param name="fftSize" value="256" />
   <param name="windowLen" value="256" />
   <param name="windowShift" value="64" />
   <param name="windowtype" value="gauss" />
   <param name="windowopt" value="0" />
</block>
<block>
   <param name="type" value="gabor" />
   <param name="blockOffset" value="0" />
   <param name="fftSize" value="512" />
   <param name="windowLen" value="512" />
   <param name="windowShift" value="128" />
   <param name="windowtype" value="gauss" />
   <param name="windowopt" value="0" />
</block>
<block>
   <param name="type" value="gabor" />
   <param name="blockOffset" value="0" />
   <param name="fftSize" value="1024" />
   <param name="windowLen" value="1024" />
   <param name="windowShift" value="256" />
   <param name="windowtype" value="gauss" />
   <param name="windowopt" value="0" />
</block>
</dict>

How to choose the number of atoms for a decomposition with mpd?


There is no universal answer to this good question. The more atoms you select with matching pursuit, the better the accuracy (SNR) of the signal reconstruction, but the higher the computational cost of the decomposition.


MPTK provides a way of choosing the SNR you want to reach instead of the number of atoms, and will iterate MP until the desired SNR is reached, this is done using the --SNR option of the command line utility mpd.


Using the --energy-decay option of mpd also allows you to export the curve with the decay of the residual energy (which corresponds to the increasing SNR) with the number of selected atoms. You may find in the shape of this curve some information to help you determine how many atoms you actually want to select, since "meaningful" atoms often correspond to the fast decay of the residual energy at the first iterations while "irrelevant" ones yield a much slower decay when the number of iterations becomes large.

Workgroup Members