An Insight Into Our Dataset

Describing our data to further our understanding of terrorism.

Data Visualization
Mar 21, 2021
Photo by Mika Baumeister on Unsplash

The Source

In our quest to better understand terrorism, we extracted our dataset from the Global Terrorism Database (GTD) of the National Consortium for the Study of Terrorism and Responses to Terrorism (START). START is a Department of Homeland Security Emeritus Center of Excellence led by the University of Maryland. Its mission is to collect and analyze a wide range of data on terrorism, and it is active in both research and education.

“START is committed to the widespread dissemination of its research findings not only to homeland security professionals through tailored research, education and training efforts, but also to students of all levels and to the general public.”

An Overview of The Global Terrorism Database

The Global Terrorism Database (GTD) is an open-source database covering both domestic and international terrorist attacks from 1970 through 2019, and it now contains more than 200,000 cases.

The GTD is available online in an effort to increase understanding of terrorist violence.

Characteristics of the GTD

  • Contains information on over 200,000 terrorist attacks
  • Includes information on more than 95,000 bombings, 20,000 assassinations, and 15,000 kidnappings and hostage events since 1970
  • Includes information on at least 45 variables for each case, with more recent incidents including information on more than 120 variables
  • More than 4,000,000 news articles and 25,000 news sources were reviewed to collect incident data from 1998 to 2019 alone

The Global Terrorism Database (GTD)™ is the most comprehensive unclassified database of terrorist attacks in the world.

GTD's Definition of Terrorism

The GTD defines a terrorist attack as “the threatened or actual use of illegal force and violence by a non-state actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation.”

In practice, for an incident to be considered for inclusion in the GTD, all three of the following attributes must be present:

  1. The incident must be intentional: the result of a conscious calculation on the part of a perpetrator.
  2. The incident must entail some level of violence or immediate threat of violence, including violence against property as well as violence against people.
  3. The perpetrators of the incidents must be sub-national actors. The database does not include acts of state terrorism.

In addition, at least two of the following three criteria must be present for an incident to be included in the GTD: 

  • Criterion 1: The act must be aimed at attaining a political, economic, religious, or social goal. In terms of economic goals, the exclusive pursuit of profit does not satisfy this criterion.
  • Criterion 2: There must be evidence of an intention to coerce, intimidate, or convey some other message to a larger audience (or audiences) than the immediate victims.
  • Criterion 3: The action must be outside the context of legitimate warfare activities. That is, the act must be outside the parameters permitted by international humanitarian law.
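
The inclusion rule above combines an "all of" check on the three attributes with an "at least two of three" check on the criteria. A minimal Python sketch, assuming each attribute and criterion has already been judged as a simple yes/no value; the class, field, and function names are hypothetical and are not the GTD's own variable names:

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Required attributes: all three must hold.
    intentional: bool
    violent_or_threatened: bool
    subnational_perpetrator: bool
    # Additional criteria: at least two of the three must hold.
    crit1_goal: bool             # political, economic, religious, or social goal
    crit2_message: bool          # intent to coerce, intimidate, or message a wider audience
    crit3_outside_warfare: bool  # outside legitimate warfare under international humanitarian law


def qualifies_for_gtd(incident: Incident) -> bool:
    """Return True if the incident meets the inclusion rule as summarized above."""
    required = (
        incident.intentional
        and incident.violent_or_threatened
        and incident.subnational_perpetrator
    )
    # Booleans sum as 0/1, so this counts how many criteria are satisfied.
    criteria_met = sum(
        [incident.crit1_goal, incident.crit2_message, incident.crit3_outside_warfare]
    )
    return required and criteria_met >= 2
```

For example, an intentional, violent attack by a sub-national group that pursues a political goal and targets a wider audience, but occurs in a warfare context, still qualifies because two of the three criteria hold.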

Exploring Variables

Every observation (identified by a unique ID) has both a temporal dimension (the date) and a spatial dimension (numerous location variables), along with many other variables, grouped as follows:

  • Temporal variables: year, month, day, approximate date, extended incident (whether the incident lasted more than 24 hours), …
  • Incident information variables: a summary, which of the three inclusion criteria the incident satisfies, whether there is doubt that it was terrorism, related incidents, whether it was part of multiple incidents, …
  • Location variables: country, region, province/state, vicinity, latitude, longitude, …
  • Attack variables: attack type, whether the attack was successful, whether it was a suicide attack, …
  • Weapon variables: weapon type and subtype, variables on whether multiple weapons were used, …
  • Target/Victim variables: target type and subtype, nationality, specific target name, variables accounting for multiple targets, …
  • Perpetrator variables: group name and subgroup name, number of perpetrators, presence of unaffiliated individuals, number of captured perpetrators, claim of responsibility, mode of claim, and variables accounting for multiple perpetrator groups and whether their involvement is confirmed, …
  • Casualty variables: number of fatalities, number of perpetrator fatalities, number of injured (nationality, perpetrators, …), variables on property damage, and a series of variables on kidnappings (number, duration, country, ransom, outcome, rescued)
  • Additional and source variables: additional notes, whether the attack was logistically international or domestic, whether it was ideologically international or domestic, source citations (first to third source), …

Not all variables will be used: we selected variables according to the questions we seek to answer. Observations that fall outside the general trend will be examined further, either as potential outliers or as crucial observations worth exploring later on.
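
As an illustration of variable selection, here is a minimal sketch using Python's standard `csv` module that keeps only a chosen subset of columns from a CSV export of the GTD. The column names in `SELECTED_COLUMNS` are our assumption of how the GTD names these fields and should be checked against the GTD codebook; the selection is only an example, not the one used in our analysis.

```python
import csv
from typing import Iterable, Iterator

# Hypothetical selection; verify these column names against the GTD codebook.
SELECTED_COLUMNS = ["eventid", "iyear", "country_txt", "attacktype1_txt", "nkill"]


def select_variables(
    lines: Iterable[str], columns: list[str] = SELECTED_COLUMNS
) -> Iterator[dict[str, str]]:
    """Yield each row of a GTD-style CSV reduced to the chosen columns."""
    reader = csv.DictReader(lines)
    # Fail early if a requested column is absent from the header row.
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Columns not found in file: {missing}")
    for row in reader:
        yield {c: row[c] for c in columns}
```

Because `csv.DictReader` accepts any iterable of text lines, `select_variables` can be given an open file object directly. All values come back as strings, so numeric fields such as `nkill` still need converting before analysis.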

Limits of the GTD

A few limits stem from the way the dataset is constructed.

First, the GTD does not include plots or conspiracies that were never enacted or at least attempted, so we lack insight into the “real number of attempts” and into how successful terrorists were in carrying out attacks. Second, the dataset only goes back to 1970, and its early records lack much information. Only from 1997 onward were new efforts made to complete the dataset with additional variables. These variables have evolved progressively since then, which leaves earlier records retrospectively incomplete. The year 1997 itself has no particular significance for the data other than marking this change in collection methodology.

The GTD has made efforts toward transparency and inclusiveness: its inclusion criteria and coding system were designed to be fully transparent and available to all future users of the database, and users can filter the dataset according to the definition of terrorism that meets their needs.
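
Filtering the dataset by definition can be sketched as follows. This assumes the inclusion criteria and the doubt flag are stored in columns named `crit1`, `crit2`, `crit3` and `doubtterr`, coded `"1"` for yes, as we understand the GTD codebook to do; the function name is hypothetical. The stricter definition here keeps only incidents meeting all three criteria and, optionally, drops incidents flagged as doubtful. Rows with any other `doubtterr` value (for instance an "unknown" code) are kept.

```python
from typing import Iterable, Iterator, Mapping


def strict_subset(
    rows: Iterable[Mapping[str, str]], drop_doubtful: bool = True
) -> Iterator[Mapping[str, str]]:
    """Keep incidents satisfying all three criteria, optionally dropping doubtful ones."""
    for row in rows:
        # Assumed coding: "1" means the criterion is satisfied.
        all_criteria = all(row[c] == "1" for c in ("crit1", "crit2", "crit3"))
        # Assumed coding: doubtterr == "1" means there is doubt it was terrorism.
        doubtful = row.get("doubtterr") == "1"
        if all_criteria and not (drop_doubtful and doubtful):
            yield row
```

The rows can come straight from `csv.DictReader`, so the filter works on the raw string values without any type conversion.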

However, the data was not always collected by START. Data for 1970 to 1997 was collected by the Pinkerton Global Intelligence Service (PGIS), a private security agency. START then collaborated with the Center for Terrorism and Intelligence Studies (CETIS), which collected GTD data on attacks that occurred from January 1998 to March 2008. Data collection then transitioned to the Institute for the Study of Violent Groups (ISVG), which covered attacks from April 2008 to October 2011. Only since November 2011 has GTD data collection been conducted by START staff at the University of Maryland. These organizations did not use uniform data collection methods. Questions might also be raised about whether these agencies were privately or publicly owned, and about how independent they were.
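
Because collection methods differed between organizations, it can be useful to tag each incident with the collection period it falls in, for example to compare trends within a single period. A minimal sketch using only the date ranges given above; the function name and short labels are our own choices. Records with an unknown month would need separate handling in the boundary years 2008 and 2011.

```python
def collecting_organization(year: int, month: int) -> str:
    """Return which organization collected GTD data for an attack in this year/month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    ym = (year, month)  # tuples compare lexicographically: year first, then month
    if ym < (1970, 1):
        raise ValueError("the GTD starts in 1970")
    if ym <= (1997, 12):
        return "PGIS"   # 1970 to 1997
    if ym <= (2008, 3):
        return "CETIS"  # January 1998 to March 2008
    if ym <= (2011, 10):
        return "ISVG"   # April 2008 to October 2011
    return "START"      # November 2011 onward
```

Comparing `(year, month)` tuples keeps each boundary to a single line, which makes the periods easy to check against the text.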

Another limit worth mentioning is that the data is drawn only from unclassified sources. The sources themselves are also worth examining, to understand where the information originates and whether it may be distorted between sources, for instance through a language barrier or the NLP algorithms used by START. Bias may also be present, for example in the claims that perpetrators make.

Lastly, individual variables have their own limits, which we will comment on where relevant as our analysis progresses.

For further details on variables, data collection methods, and sources on methodology:

--

--


This blog was created for the class on Data Visualization at KU Leuven