MAGO cropmaps

MAGO cropmaps is an open source Python library that allows users to perform crop type classification using multispectral Sentinel-2 data.

End Users

This solution is intended for water managers, irrigation communities, and scientists who need crop type classification and mapping based on Copernicus Sentinel-2 L2A multispectral satellite data and machine learning. To use the software, a user must be familiar with Python and know how to install a Python library.

Solution Overview

The MAGO cropmaps package is an open source (GPLv3) Python toolbox for crop type mapping from Sentinel-2 L2A multispectral satellite data using two well-known machine learning algorithms: Support Vector Machines and Random Forests. The package can be used both in server environments (e.g. CreoDIAS) and locally. Two types of data are required: (1) Sentinel-2 images and (2) ground truth vector data (e.g. a shapefile with parcels as labeled polygons) carrying the crop types to be classified. Users can install the software by following the detailed instructions in the project's GitLab repository and run it in Jupyter notebooks or in simple custom Python scripts. Alongside the instructions, a number of examples (as notebooks), wikis, and a Read the Docs documentation page are provided to help users.

Key innovation

MAGO cropmaps is free and open source and enables users to perform land cover and crop type classification and mapping. It combines the Support Vector Machines and Random Forests algorithms with multitemporal Sentinel-2 data, which capture vegetation phenology, so the machine learning models can learn temporal features and seasonality.

Key Features

The MAGO cropmaps Python package is built to use data from Copernicus Sentinel-2. As an entry point, the user must define the following in Python:

  • A path to a file with the area of interest (AOI), so that the software can search for data in a specific region.
  • ESA-Scihub credentials, which are required to send search requests to ESA. If credentials are missing, alternative search methods that do not need them are available.
  • A search time slot.
  • Ground truth (GT) data for training and testing the model. The raw GT data must be an ESRI Shapefile, and any valid Coordinate Reference System (CRS) can be used. Users must provide the path to the shapefile and the name of the class column.
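
The inputs listed above could be gathered in a small container and checked before any search is started. The sketch below is only illustrative: `SearchInputs`, its field names, and the file paths are hypothetical and are not part of the MAGO cropmaps API. The project documentation describes the actual entry points.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Optional, Tuple


@dataclass(frozen=True)
class SearchInputs:
    """Hypothetical container for the user-defined inputs (not the library's API)."""

    aoi_path: Path                 # file describing the area of interest
    start: date                    # beginning of the search time slot
    end: date                      # end of the search time slot
    gt_shapefile: Path             # ground truth polygons (ESRI Shapefile)
    gt_class_column: str           # attribute column holding the crop type label
    credentials: Optional[Tuple[str, str]] = None  # (user, password), optional

    def __post_init__(self) -> None:
        # Basic sanity checks on the values described in the list above.
        if self.start > self.end:
            raise ValueError("search time slot starts after it ends")
        if self.gt_shapefile.suffix.lower() != ".shp":
            raise ValueError("ground truth data must be an ESRI Shapefile (.shp)")
        if not self.gt_class_column:
            raise ValueError("a class column name is required")


inputs = SearchInputs(
    aoi_path=Path("data/aoi.geojson"),      # placeholder path
    start=date(2022, 1, 1),
    end=date(2022, 12, 31),
    gt_shapefile=Path("data/parcels.shp"),  # placeholder path
    gt_class_column="crop_type",            # placeholder column name
)
```

Leaving `credentials` as `None` mirrors the case where a search method without credentials is used.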

Flowchart of MAGO cropmaps package

The first step is to use the variables above to find all the available Sentinel-2 data in the catalog. Note that a user can provide more Sentinel-2 product variables, such as satellite orbit or product tile, for a more specific request. The output of the data search is then fed to a Python class (find more here) responsible for collecting, processing and analyzing Sentinel-2 timeseries data. Specifically, this object is responsible for:

  • Collecting the Sentinel-2 images and letting the user sort them by acquisition date and time, cloud coverage or name, as well as remove any image that is not needed based on a new time range or date.
  • Previewing Sentinel-2 metadata.
  • Masking the raw data with the AOI selected by the user, in order to save disk space and computation time.
  • Resampling the lower resolution Sentinel-2 bands (20 m and 60 m) to the highest resolution of 10 m.
  • Applying cloud masks to the raw L2A Sentinel-2 data using the Scene Classification (SCL) band.
  • Calculating vegetation indices such as NDVI, NDWI and NDBI, which are provided as extra layers for training the model.
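
The resampling, cloud masking and index steps above can be illustrated on toy arrays with NumPy alone. This is a minimal sketch under stated assumptions and not the package's implementation: the helper names are hypothetical, nearest-neighbour resampling is chosen for simplicity, the set of SCL classes treated as cloudy (3, 8, 9, 10) is an assumption, and NDWI is taken in the green/NIR (McFeeters) formulation.

```python
import numpy as np

# SCL classes treated as cloudy in this sketch: 3 = cloud shadow,
# 8 = cloud medium probability, 9 = cloud high probability, 10 = thin cirrus.
# The selection is an assumption; the package may use a different set.
CLOUDY_SCL_CLASSES = (3, 8, 9, 10)


def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling, e.g. factor=2 takes a 20 m grid to 10 m."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def normalized_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """(a - b) / (a + b); NaN where the denominator is zero or an input is NaN."""
    a = a.astype("float64")
    b = b.astype("float64")
    total = a + b
    out = np.full(total.shape, np.nan)
    np.divide(a - b, total, out=out, where=total != 0)
    return out


# Toy reflectances: 10 m bands on a 4x4 grid, 20 m layers on a 2x2 grid.
red = np.full((4, 4), 0.10)             # B04, 10 m
green = np.full((4, 4), 0.08)           # B03, 10 m
nir = np.full((4, 4), 0.40)             # B08, 10 m
swir_20m = np.full((2, 2), 0.20)        # B11, 20 m
scl_20m = np.array([[4, 9], [4, 4]])    # 4 = vegetation, 9 = cloud high probability

# Bring the 20 m layers onto the 10 m grid.
swir = upsample_nearest(swir_20m, 2)
scl = upsample_nearest(scl_20m, 2)

# Replace cloudy pixels with NaN in every band.
cloudy = np.isin(scl, CLOUDY_SCL_CLASSES)
red, green, nir, swir = (np.where(cloudy, np.nan, b) for b in (red, green, nir, swir))

ndvi = normalized_difference(nir, red)    # vegetation
ndwi = normalized_difference(green, nir)  # water (McFeeters formulation)
ndbi = normalized_difference(swir, nir)   # built-up areas
```

Masked pixels stay NaN in all three indices, so later steps can decide how to treat them.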

Once the timeseries is processed, the next step is the extraction of the hypercube. A hypercube is the stack of multiple images, covering different spectral bands and dates, into one multiband image. By default, all the available dates are used. A user who does not want a specific image for any reason can exclude it with the functionalities of the class object, for example by removing images whose cloud coverage exceeds a user-selected percentage or by removing a specific date. The user must then select which Sentinel-2 spectral bands and vegetation indices to include in the hypercube for training, after which the hypercube is generated.
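
The idea of filtering dates and stacking the selected layers can be sketched as follows. The `Scene` class and `build_hypercube` function are hypothetical stand-ins, not the package's objects, and the date-major ordering of the output layers is an assumption of this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

import numpy as np


@dataclass
class Scene:
    """Hypothetical stand-in for one processed Sentinel-2 acquisition."""

    acquired: date
    cloud_cover: float             # percent of the scene flagged as cloudy
    layers: Dict[str, np.ndarray]  # band or index name -> 2-D array on one grid


def build_hypercube(scenes: List[Scene], layer_names: List[str],
                    max_cloud_cover: float = 100.0) -> np.ndarray:
    """Stack the chosen layers of every kept scene into (dates * layers, rows, cols)."""
    kept = sorted(
        (s for s in scenes if s.cloud_cover <= max_cloud_cover),
        key=lambda s: s.acquired,
    )
    if not kept:
        raise ValueError("no scene passes the cloud coverage filter")
    # Date-major order: all layers of the first date, then those of the next date.
    return np.stack([s.layers[name] for s in kept for name in layer_names])


rng = np.random.default_rng(0)
names = ["B04", "B08", "NDVI"]


def toy_scene(day: int, cloud_cover: float) -> Scene:
    """Random 3x4 layers standing in for real imagery."""
    return Scene(date(2022, 5, day), cloud_cover, {n: rng.random((3, 4)) for n in names})


scenes = [toy_scene(21, 5.0), toy_scene(1, 80.0), toy_scene(11, 12.0)]
cube = build_hypercube(scenes, names, max_cloud_cover=20.0)
print(cube.shape)  # (6, 3, 4): 2 kept dates x 3 layers
```

The scene from 1 May is dropped because its cloud coverage exceeds the threshold, and the remaining two dates are stacked in chronological order.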

The last step before the modeling is the rasterization of the vector training data. For this, specific functionalities have been implemented and can be found in the documentation.
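
To show what rasterization means here, the sketch below burns labeled polygons into an integer raster by testing pixel centres against each polygon. It is a simplified, NumPy-only illustration with hypothetical function names, and it ignores holes, multipolygons and CRS handling. It is not the package's implementation, and in practice a library routine such as `rasterio.features.rasterize` would normally do this job.

```python
from typing import List, Sequence, Tuple

import numpy as np

Ring = Sequence[Tuple[float, float]]  # polygon outline as (x, y) vertices


def point_in_polygon(x: np.ndarray, y: np.ndarray, ring: Ring) -> np.ndarray:
    """Even-odd ray casting, vectorised over arrays of x/y coordinates."""
    inside = np.zeros(x.shape, dtype=bool)
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y1 > y) != (y2 > y)
        x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (x < x_at_y)
    return inside


def rasterize_labels(parcels: List[Tuple[Ring, int]], shape: Tuple[int, int],
                     x_min: float, y_max: float, pixel_size: float,
                     fill: int = 0) -> np.ndarray:
    """Assign each parcel's class value to the pixels whose centre it contains."""
    rows, cols = np.indices(shape)
    x = x_min + (cols + 0.5) * pixel_size  # pixel-centre coordinates
    y = y_max - (rows + 0.5) * pixel_size  # row 0 is the northern edge
    labels = np.full(shape, fill, dtype=np.int32)
    for ring, class_value in parcels:
        labels[point_in_polygon(x, y, ring)] = class_value
    return labels


# Nomenclature: class name -> integer value (0 is kept for "no label").
class_names = ["Cereals", "Olive Groves"]
nomenclature = {name: i for i, name in enumerate(sorted(class_names), start=1)}

parcels = [
    ([(0, 20), (20, 20), (20, 40), (0, 40)], nomenclature["Cereals"]),
    ([(20, 0), (40, 0), (40, 20), (20, 20)], nomenclature["Olive Groves"]),
]
labels = rasterize_labels(parcels, shape=(4, 4), x_min=0.0, y_max=40.0, pixel_size=10.0)
```

The resulting array has the same grid as the imagery, so each labeled pixel can be paired with its feature vector from the hypercube.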

Two training methods have been implemented, one for Random Forests (RF) models and one for Support Vector Machines (SVM) models. Both support full parametrization of the model as well as grid search for hyperparameter tuning (for example, the number of estimators for RF and the regularization parameter for SVM). The RF training method also supports multiprocessing. By default, the training procedure produces: (1) *.csv text files that include the confusion matrix computed on the test dataset, the nomenclature with the integer value assigned to each class, the band importance, and the date importance for the trained model, and (2) the model itself.
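
A comparable workflow can be sketched directly with scikit-learn on synthetic data. This is not the package's training code: the data, grid values and variable names are made up for illustration, and the per-band and per-date importances assume that features are ordered date by date (all bands of the first date, then all bands of the next).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for labelled pixels: 300 samples, 2 dates x 3 bands = 6 features.
rng = np.random.default_rng(42)
n_dates, n_bands = 2, 3
y = rng.integers(1, 4, size=300)                            # class values 1..3
X = rng.normal(size=(300, n_dates * n_bands)) + y[:, None]  # class-dependent shift

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Grid search over the number of trees; n_jobs lets the forest fit trees in parallel.
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0, n_jobs=2),
    param_grid={"n_estimators": [50, 100]},
    cv=3,
)
rf_search.fit(X_train, y_train)

# Grid search over the SVM regularization parameter C.
svm_search = GridSearchCV(SVC(kernel="rbf"), param_grid={"C": [0.1, 1.0, 10.0]}, cv=3)
svm_search.fit(X_train, y_train)

# Confusion matrix of the best RF model on the held-out test set.
rf = rf_search.best_estimator_
cm = confusion_matrix(y_test, rf.predict(X_test), labels=[1, 2, 3])

# Reshape the per-feature importances to (dates, bands) and sum along each axis.
importance = rf.feature_importances_.reshape(n_dates, n_bands)
band_importance = importance.sum(axis=0)
date_importance = importance.sum(axis=1)
```

Summing the importances per band and per date gives one possible reading of the "band importance" and "date importance" outputs mentioned above.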

Finally, multiple functions are implemented for the extraction of the classification map, including patching support for less powerful systems. Patching splits the AOI into smaller regions that are classified one at a time to build the final map, which reduces RAM usage.
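
The patching idea can be illustrated as below. The function names and the stand-in `predict` callable are hypothetical, and the hypercube is held in memory only to keep the example short. With real data each patch would typically be read from disk window by window, which is where the memory saving comes from.

```python
from typing import Callable, Iterator, Tuple

import numpy as np


def iter_patches(rows: int, cols: int, patch_size: int) -> Iterator[Tuple[slice, slice]]:
    """Yield row/column slices that tile a rows x cols grid."""
    for r in range(0, rows, patch_size):
        for c in range(0, cols, patch_size):
            yield slice(r, min(r + patch_size, rows)), slice(c, min(c + patch_size, cols))


def classify_in_patches(cube: np.ndarray,
                        predict: Callable[[np.ndarray], np.ndarray],
                        patch_size: int) -> np.ndarray:
    """Classify a (features, rows, cols) cube one patch at a time."""
    n_features, rows, cols = cube.shape
    class_map = np.zeros((rows, cols), dtype=np.int32)
    for row_slice, col_slice in iter_patches(rows, cols, patch_size):
        patch = cube[:, row_slice, col_slice]
        # (features, h, w) -> (h * w, features): one row per pixel
        pixels = patch.reshape(n_features, -1).T
        class_map[row_slice, col_slice] = predict(pixels).reshape(patch.shape[1:])
    return class_map


def toy_predict(pixels: np.ndarray) -> np.ndarray:
    """Stand-in for model.predict: the class is the index of the largest feature, plus 1."""
    return pixels.argmax(axis=1) + 1


rng = np.random.default_rng(1)
cube = rng.random((3, 5, 7))
patched = classify_in_patches(cube, toy_predict, patch_size=4)

# The patched result matches classifying the whole image at once.
assert np.array_equal(patched, cube.argmax(axis=0) + 1)
```

Any object with a scikit-learn style `predict` method could be passed in place of `toy_predict`.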

Case Study

As part of the MAGO project, MAGO cropmaps has been tested in Tunisia and Spain.

Technology Stack and Methodology

The software is implemented for optimal performance within the CreoDIAS environment, but it can be deployed on other systems as well. Note that the modules dedicated to retrieving Copernicus Sentinel-2 data are specifically configured for CreoDIAS, whereas the other features operate on any system.

Collaboration and Partnerships

The solution has been developed by UTH-NTUA and tested in Spain and Tunisia in collaboration with partners such as CETAQUA and INRGREF.

Visuals and Demonstrations

The image below presents the crop type map of Cap Bon in Tunisia for 2022. This map was produced during the MAGO project and contains the following classes: Urban Fabric (URF), Forest (FOR), Other Trees (OTR), Citrus Trees (CTR), Olive Groves (OLG), Fruit Trees (FRT), Vineyards (VNY), Cereals (CRL), Vegetables (VEG), Natural Grasslands (NGR), Sparsely Vegetated Areas (SVA), Bare Soil (BRS), Sand (SAN), Coastal Water (CWT) and Water Bodies (WBD). The overall accuracy of the model was 99.48%, and the average F1 score was 98.25%. All classes achieved an accuracy score of more than 71%, with the majority exceeding 98%.
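
For readers who want to derive similar figures from a confusion matrix, the sketch below shows one common way to do it. The matrix is a made-up toy example and is unrelated to the Cap Bon results. Two assumptions are made that the text above does not confirm: the "average F1 score" is taken as the unweighted (macro) mean over classes, and the per-class accuracy is taken as the per-class recall.

```python
import numpy as np


def summarize(confusion):
    """Return overall accuracy, per-class recall and macro F1 (rows = reference classes)."""
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    actual = cm.sum(axis=1)     # number of reference pixels per class
    predicted = cm.sum(axis=0)  # number of predicted pixels per class
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return tp.sum() / cm.sum(), recall, f1.mean()


# Toy 3-class confusion matrix (not project data).
toy_cm = [[8, 2, 0],
          [1, 9, 0],
          [0, 0, 10]]
overall_accuracy, per_class_recall, macro_f1 = summarize(toy_cm)
print(round(overall_accuracy, 2))  # 0.9
```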

Contact: Alekos Falagas, Geospatial Software Developer | Remote Sensing Specialist @NTUA
Research Publications: ongoing work

Open Code, Access and Licensing
MAGO cropmaps is free and open source and is available here under the GNU General Public License Version 3.