Easier Inputs, Harder Questions: AI and the PA-X Database

A Different Kind of Difficulty

When I started working on the Peace Agreements Database (PA-X) as a data and research officer almost five years ago, a surprising share of the job was data pipeline work. PA-X itself is built on years of legal analysis, conflict expertise, careful coding decisions and the data-processing infrastructure that supports it. A lot of my role, especially at the start, was taken up by the latter. Agreements arrived as scanned images of printed pages, sometimes folded, sometimes photographed at an angle in a room with bad lighting. Getting the text out of the image was its own project. Once we had the text, we had to wrestle it into a format we could actually work with. Then we needed to translate it, if it was not originally in English, and extract the key metadata. The structure of the document also needed to be evident to the user, because where a provision sits in the document provides key context.

Five years on, a lot of this process is in a much better state, in large part because of improvements in Artificial Intelligence (AI). Text extraction is dramatically easier. Translation, at least during the discovery phase where we are just trying to work out whether a document belongs in the database, is fast and usually good enough, though for the canonical version we add to PA-X we still rely on a hybrid approach with a human translator. The way we code agreements is shifting too, from copy-and-paste into category fields towards AI-assisted tagging of segments, which has opened the database up to new kinds of analysis, such as how provisions sit in relation to each other within an agreement’s structure. Generative AI is genuinely useful, particularly for helping people interact at scale with complex, often non-AI systems through natural language queries.

So, the data pipeline got easier. What I did not expect is that the conceptual work would get harder.

A Structure Under Strain

The ontology we apply to code peace agreements (the structured knowledge that informs how we understand peace agreements) is straining. This is clear when agreements arrive with content that our coding categories struggle to accommodate. Sometimes this is because nothing quite fits, but more often it is because what counts as central under one way of organising the field is peripheral under another. Some provisions belong at the intersection of two or three categories. Some do not quite belong anywhere. And a more basic question has started to surface more often than it used to: is the document in front of me a peace agreement at all?

The instinct, faced with content that doesn’t fit, might be to update the coding schema by adding new categories or modifying old ones. But such patching assumes the problem is coverage, when increasingly it is one of perspective. What has become more interesting to me is the relationship to the ontology itself, treating ontology less as a fixed object and more as one lens among several.

Part of what is driving this is the fragmentation of contemporary peace processes themselves. Agreements increasingly emerge from processes with multiple tracks, overlapping mediators, and parties whose understandings of what the agreement is for do not converge. A scheme built on the assumption of a coherent process struggles when the process itself is plural. Holding multiple ontologies open could be a response to a world in which the agreements themselves no longer speak from a single vantage point either.

Ontology as Lens

Here is what I mean. Take climate change. An ontology rooted in the international peace and security law of a decade ago would file climate provisions, where they appear at all, under ‘Environment’. An ontology designed around the security landscape of 2026 cannot treat them that way. Climate-driven displacement, resource conflict and adaptation financing become matters that a peace agreement either addresses or conspicuously fails to address. The same provision in the same agreement is peripheral under one ontology and central under another. Run both across the same corpus and you do not get two labels on the same text: you get two different maps of where the action is. And neither ontology stays frozen. The categories within each lens can themselves evolve as new agreements surface provisions that don’t fit.

What AI and AI-adjacent techniques now make tractable is running more than one lens over the same corpus, and treating the comparison between lenses as its own form of evidence about which patterns in peace agreements span ontologies and which are specific to one way of organising the field. This connects to a broader conversation in digital peacebuilding scholarship about whether the field’s infrastructure needs to accommodate multiple ways of knowing rather than converging on one. Hirblinger and Perera’s recent work on the Pluriverse frames this as moving beyond ‘digital monocultures’ toward approaches that let different ontologies coexist. What I have been calling ontology-as-lens is one concrete version of what that could look like in practice. Findings that show up no matter which ontology you use carry a different kind of weight. Even adversarial ontologies, those that disagree about what the relevant categories are, can be compared on the same documents. The ontology stops being a cage and starts being one viewing instrument among several.
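
To make the comparison between lenses a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three segments, the category names, the keyword lists, and the keyword matching that stands in for AI-assisted tagging are all hypothetical and do not reflect the actual PA-X coding scheme or pipeline. It only shows the shape of the idea: tag the same segments under two lenses, then separate the segments on which the lenses agree from those on which they diverge.

```python
# Hypothetical agreement segments, invented for illustration only.
SEGMENTS = [
    {"id": "s1", "text": "The parties shall establish a fund for communities displaced by drought."},
    {"id": "s2", "text": "A joint commission will oversee the ceasefire and troop withdrawal."},
    {"id": "s3", "text": "Grazing land and water access shall be shared between the communities."},
]

# Each hypothetical lens maps a category to (keywords, is_central).
# Keyword matching is a crude stand-in for whatever tagging method is used.
LENS_LEGACY = {
    "Environment": ({"drought", "water", "grazing"}, False),
    "Ceasefire": ({"ceasefire", "withdrawal"}, True),
}
LENS_CURRENT = {
    "Climate displacement": ({"drought", "displaced"}, True),
    "Resource conflict": ({"water", "grazing"}, True),
    "Ceasefire": ({"ceasefire", "withdrawal"}, True),
}


def apply_lens(lens, segments):
    """Return {segment id: {category: is_central}} for one lens."""
    tagged = {}
    for seg in segments:
        words = set(seg["text"].lower().replace(".", "").split())
        tagged[seg["id"]] = {
            category: central
            for category, (keywords, central) in lens.items()
            if words & keywords
        }
    return tagged


def is_central(tags):
    """A segment counts as central if any matched category is central."""
    return any(tags.values())


def compare(map_a, map_b):
    """Split segment ids by whether the two lenses agree on centrality."""
    agree, disagree = [], []
    for seg_id in map_a:
        same = is_central(map_a[seg_id]) == is_central(map_b[seg_id])
        (agree if same else disagree).append(seg_id)
    return agree, disagree


legacy_map = apply_lens(LENS_LEGACY, SEGMENTS)
current_map = apply_lens(LENS_CURRENT, SEGMENTS)
agree, disagree = compare(legacy_map, current_map)

print("Lenses agree on:", agree)
print("Lenses diverge on:", disagree)
```

In this toy case the ceasefire segment is central under both lenses, while the drought and grazing segments are peripheral under the first and central under the second. The second list is the interesting one: it is where the choice of lens, rather than the text, is doing the work.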

The Bridge Role

None of this works without people who can move between the technical side of AI and the substantive side of peace research. Someone has to notice that the classification scheme is under strain in the first place. Someone has to decide which alternative lenses are worth building, translate between how a peace researcher thinks about a provision and how a model represents it, and know when the model’s output is doing interpretive work that should really sit with a human. In a recent report for PeaceRep I called this a bridge role and argued that the ability to connect the technical side of AI implementation with domain knowledge (not just holding both, but understanding how each reshapes the other) should be treated as structural rather than incidental. The ontology-as-lens approach only makes that case sharper. The more perspectives you hold at once, the more judgement you need about which ones matter and why.

The data pipeline will keep getting better. That was never going to be the interesting problem for long. The interesting problem is figuring out what we are looking at, and being honest that the way we look at it is itself a choice. For PA-X, this means the database’s value increasingly lies not just in what it codes, but in its capacity to support multiple ways of understanding the same material, and to make the choices behind those ways of understanding visible.


Adam Farquhar is a Research Associate and Data Officer with PeaceRep at Edinburgh Law School. He supports the management, development, and coding of the PA-X Peace Agreements Database and its sub-databases.