FAIR Data in Action: PA-X data informs new dataset on peace agreement actors

The PA-X Peace Agreement Database and Dataset contains data on peace agreements and peace processes, and these data adhere to the FAIR principles: they are findable, accessible, interoperable, and reusable. The application of these principles is becoming a staple of the PeaceRep approach to data management, research, and analysis, which triangulates several methodologies to better understand how peace processes unfold. PA-X releases have consistently demonstrated the FAIR principles: the data are easily accessible online, come with documentation and various download options, and include crosswalks to related datasets on armed conflict and peace processes. The latest PA-X releases are now being used in upcoming PeaceRep projects, with a greater emphasis on interoperability and on reusing existing data.

This year, in parallel with the most recent, seventh release of PA-X data, the PeaceRep team will release a new dataset of peace agreement actors and their commitments, developed as an extension of PA-X and an example of successful reuse of existing data to answer new and different research questions.

Reuse of PA-X data has been a key concern for the PA-X team at the University of Edinburgh since the dataset’s initial release. We knew that, alongside the team's own research plans, there was potential for the data to be used more widely, and for purposes beyond our own. The PA-X data primarily provide information on peace agreements, their context and their content. However, they can also be used to understand the peace processes from which these agreements stem, as well as the actors that take part in their creation.

Understanding peacemaking actors

Understanding the actors involved in peacemaking has been at the centre of a data problem. The PA-X database shows all agreement signatories as listed in the original documents, so it is easy to see which actors have signed a given agreement. However, because the data are entered manually into a single database field, with all signatories listed together, there is no easy way to see which agreements a particular party has signed, nor what kinds of commitments some sides are more likely to make when signing. For example, if we wanted to know which agreements the representatives of the African Union, the Government of Colombia, or the Catholic Church have signed, we would not be able to search for that directly on the PA-X website.

It is possible to use simple word searches to find mentions of particular groups and organisations in the data, but that does not resolve the problem. For instance, an organisation such as the European Union can be identified in agreements by multiple names, and in multiple languages: it could appear as the European Community Monitoring Mission, as it did in the 1990s during the post-Yugoslav wars, or as EULEX in Kosovo since 2008, or as Union européenne, as it does in some documents from the Democratic Republic of Congo. The same problem appears for governments, and even for some armed groups. An agreement in the Philippines may be signed by a representative of the Government who is listed not as ‘Government of the Philippines’ but simply as ‘Government representative’. An armed group may go by its full name, an abbreviation, or an alternate name, and, to make matters more complex, in translated documents the translated name is unlikely to be identical to the original-language name. So the data conundrums multiply, making identification of agreement actors exceptionally difficult.

The question of how particular actors participate in peace processes is important: we want to know whether an agreement is signed by all parties relevant to the conflict, but we also want to see when some of them join or exit the peace process. In complex conflicts, we may also want to know which sides co-sign, and which ones never do. Further, there is much research on the importance of third parties in armed conflict and in peace processes, and on how third parties influence the form and content of signed agreements. We may wish to know whether agreements in which the EU, to continue with that example, is a third party focus more on economic issues than others do, or whether the support of an illiberal or authoritarian neighbouring country makes agreement on democratic processes less likely.

Creating a new dataset using natural language processing

Creating the new actor-level dataset using traditional methods would have required a lot of time and researcher effort, including re-reading the signatories, manually classifying each of them, and matching them to alternate names. Instead, we opted for a more modern approach, using natural language processing (NLP) tools. Named entity recognition (NER) is a subset of NLP methods that, as its name suggests, focuses on identifying named entities in written text, be they geographic locales, human names, or names of countries or organisations. In our case, NER was conducted using the spaCy library for Python. The starting point was a set of dictionaries of names: some we developed ourselves for the more prominent organisations, and some were adopted from other data sources, such as the Uppsala Conflict Data Program (UCDP) and the Integrated Crisis Early Warning System (ICEWS), which provide lists of relevant states, international organisations, armed groups, and others.

The NER protocols matched the names of countries and organisations to those in the dictionaries. For most of them, the various dictionaries provided the names, translations, alternate names, and abbreviations. In some cases, we needed to identify organisations ourselves, either because they appeared with misspellings or because they were small and not captured by these large data collection projects. The biggest problem by far, though still less labour-intensive than manual data creation, was identifying organisations from the individuals listed as agreement signatories. For instance, in the conflict in Bosnia and Herzegovina, there were several cases of UNPROFOR officers signing with just their name and rank, and they needed to be identified manually as UN and UNPROFOR representatives. The process also allowed us to improve our data collection, and provided an additional resource for the automation of future PA-X releases.
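The dictionary-matching idea described above can be illustrated with a short sketch. This is not the PeaceRep pipeline, which used spaCy; it swaps in plain normalised string lookup from the Python standard library to show how alternate names, translations, and mission names can resolve to one canonical actor. The alias table is a small illustrative example built from the names mentioned in this article, and the function names (normalise, build_lookup, match_actors) are hypothetical.

```python
import re
import unicodedata

# Illustrative alias dictionary (an assumption for this sketch,
# not the dictionaries used for the actual dataset).
ALIASES = {
    "European Union": [
        "European Union",
        "Union européenne",
        "European Community Monitoring Mission",
        "EULEX",
    ],
    "African Union": ["African Union"],
}


def normalise(text):
    """Lowercase, strip accents, and reduce punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def build_lookup(aliases):
    """Map every normalised alias to its canonical actor name."""
    return {
        normalise(alias): canonical
        for canonical, names in aliases.items()
        for alias in names
    }


def match_actors(signatory_field, lookup):
    """Return canonical actors whose aliases occur as whole phrases."""
    padded = f" {normalise(signatory_field)} "
    found = {
        canonical
        for alias, canonical in lookup.items()
        if f" {alias} " in padded
    }
    return sorted(found)


lookup = build_lookup(ALIASES)
field = "Pour l'Union européenne: [name]; For the Government: [name]"
matched = match_actors(field, lookup)
```

Here the French-language entry resolves to the canonical "European Union", while the bare "Government" signatory stays unmatched. That unmatched remainder is the kind of case the text describes as needing manual identification.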

Figure: Network map showing peace agreements and their signatories for the Myanmar ceasefires process, 2012–2022.

This new dataset of peace agreement signatories now contains more than 6000 instances of agreement-signing actors. It allows for the construction of actor networks to identify co-signing, and for the analysis of trends in agreement signing over time and in particular contexts. While the data are not yet publicly available, our first data reports are expected to be published soon, looking at the trends in peace agreement participation by third-party signatories.
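As a rough illustration of how co-signing networks can be derived once each agreement is linked to canonical actors, the sketch below counts how often each pair of actors signs the same agreement, and lists pairs that never do. The records are toy placeholders, not values from the dataset, and the function names are hypothetical; the actual dataset structure may differ.

```python
from collections import Counter
from itertools import combinations

# Toy records (hypothetical): agreement id -> canonical signatories.
agreements = {
    "agt-1": ["Actor A", "Actor B", "Actor C"],
    "agt-2": ["Actor A", "Actor B"],
    "agt-3": ["Actor C", "Actor D"],
}


def cosigning_edges(agreements):
    """Count, for each pair of actors, the agreements they both signed."""
    edges = Counter()
    for signatories in agreements.values():
        # De-duplicate and sort so each pair has one consistent key.
        for pair in combinations(sorted(set(signatories)), 2):
            edges[pair] += 1
    return edges


def never_cosigned(agreements):
    """List pairs of actors that appear in the data but never co-sign."""
    actors = sorted({a for s in agreements.values() for a in s})
    edges = cosigning_edges(agreements)
    return [pair for pair in combinations(actors, 2) if pair not in edges]


edges = cosigning_edges(agreements)
missing = never_cosigned(agreements)
```

The weighted pairs in edges can be read as the edge list of an actor network, which a graph library could then draw or analyse further.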

The publication of the new peace agreement actors dataset is expected at the end of 2023, and this will be the point at which the FAIR data cycle begins anew. From the already findable, accessible, interoperable and reusable PA-X data, we will be providing another stand-alone data resource that will hopefully find use in research beyond PeaceRep, as the original PA-X dataset has.


About PA-X

The new, seventh release of the PA-X Peace Agreement Database and Dataset is available at www.peaceagreements.org. The Database comprises a full dataset of peace agreements since 1990, current up to January 2023, a searchable database and two sub-databases, PA-X Local and PA-X Gender. All entries in PA-X can also be downloaded as a corpus of texts, as all included agreements have been translated into English and digitised. With this release, the PA-X Database now includes 2003 peace agreements, 44 of which are new additions.

PA-X forms the cornerstone of our PeaceTech work, building innovation focused on better data for supporting adaptive management of peace and transition processes. PA-X data underpin a range of digital tools to support policy and practice, including visualisations, trackers, interactive timelines, infographics, and a mobile app.

The PA-X Team at the University of Edinburgh are Christine Bell, Sanja Badanjak, Laura Wise, Robert Wilson, Juline Beaujouan, Adam Farquhar, and Jennifer Hodge.

The Peace Agreement Actors Dataset was developed by Sanja Badanjak and Niamh Henry, with assistance from the PA-X Team.