Artificial Intelligence and Big Data Analytics for Biodefence: Implications for Threat Assessment and Biosurveillance

By Roman Kernchen


Global migration and the movement of commodities have increased the risk of biosecurity threats and their potential to incur large economic, social, and environmental costs. Infectious disease outbreaks, whether natural or deliberate, respect neither geographic nor political borders. As a result of rising numbers of military conflicts and humanitarian operations, large numbers of civilian aid workers and military personnel are deployed in areas of uncertainty and high complexity, where encounters with endemic infectious agents can impair their readiness. Furthermore, advances in biotechnology and the biosciences open up the possibility of new toxins, living substances and bioregulators that would demand new detection methods, preventive measures and interventions. All these trends increase the risk of unexpected threats. Anticipating such risks through intelligence efforts is further hampered by the dual-use nature of biological technologies and infrastructures: the same scientific and technological basis that is necessary for legitimate scientific, economic and health applications can also be used to cause harm. In view of the increasing complexity of defence against biological threats in times of fast-developing and converging technologies, in particular genome editing and information technologies, it is a significant challenge for the intelligence community to keep pace with these rapid advances in its risk assessments. This contribution discusses the application of artificial intelligence (AI) and big data analytics to biodefence problems in the specific domains of threat assessment and biosurveillance. The provision of intelligence is a key element of technological superiority and an effective instrument in the effort to prevent the proliferation of biological threat agents.
The utilisation of advanced information technologies to better identify and assess emerging biosecurity threats could lead to improved response and preparedness measures to prevent or at least mitigate the use of biological agents as weapons.

The use of AI and big data analytics for intelligence purposes is a new discipline of information extraction that changes every stage of intelligence work, from acquisition and processing to formulating the intelligence picture and translating the information into operational steps. Big data is a term for massive data sets with a large, varied and complex structure that are difficult to store, analyse and visualise for further processing or results (1). Big data analytics is the process of examining large amounts of data to uncover hidden patterns and previously unknown correlations. Artificial intelligence technologies aim to reproduce or surpass, in computational systems, abilities that would require intelligence if humans were to perform them, including: learning and adaptation; sensory understanding and interaction; reasoning and planning; optimisation of procedures and parameters; autonomy; creativity; and extracting knowledge and predictions from large, diverse digital data. For the purpose of making predictions about the presence and extent of biological threats, the most relevant type of AI is the one that extracts knowledge and predictions from large, diverse digital data. A variety of AI methods for model development is currently available to support decision making and prediction. For deployment, user understanding and trust, however, two things are needed that are currently not well addressed: verification and validation (V&V), and operations and monitoring (O&M) (2). Both steps help to address questions and concerns from practitioners and users about the trust, explainability, robustness, and effectiveness of AI approaches. To support national security decision makers, analysis aided by AI will need to engender trust, which requires transparency and plausibility at each stage of AI deployment.

AI and Big Data Implications for Threat Assessment

Threat assessment is an information fusion task that consists of assessing projected future situations to determine whether detrimental events are likely to occur (3). It contributes to the highest level of situation awareness, which is essential for decision-making in the complex and dynamic environments that decision-makers and intelligence analysts face in monitoring and surveillance applications. With regard to biological hazards, threat assessment involves obtaining timely, accurate, and relevant intelligence related to the malevolent use of biological agents, and recognising existing and future trends and patterns in the evolving threat of biological weapons. These activities encompass the collection and analysis of a multitude of information about potential adversaries and their interests and capabilities in biology and biotechnology, in order to enable intelligence analysts to anticipate the challenges and support the development of medical and non-medical countermeasures (4). Artificial intelligence and big data analytics are well suited to support these efforts (5, 6). Such methods now offer effective means to gather, compile, organise and record data relevant to biodefence, including many types of biologically relevant materials and related resources used in the development of biological weapons, and a range of information on the identities, composition, location, resources and capabilities of potential adversaries. Biologists around the world routinely hire companies to synthesize DNA fragments for experimental or clinical use in the laboratory. But intelligence experts and scientists alike have worried for years that bioterrorists could misuse such services to develop dangerous viruses and toxins, perhaps by making small changes in a genetic sequence to bypass security controls without compromising the function of the DNA (7-9).
The characterization of biological threats involves laboratory research conducted for the purpose of biological defence. The data necessary for the development of such risk assessments are currently largely inadequate, and there are still major gaps in our knowledge and understanding of biological weapons. We are often left with only limited data on the biology of many potential biological hazards (e.g. their dose-response profile, their behaviour under various conditions and their environmental persistence) and a rather limited understanding of the intentions of opponents who possess or attempt to acquire biological weapons (10). This uncertainty about the biological characteristics of a threat agent on the one hand, and about its probability of use on the other, makes it difficult to decide effectively how to counter the risk. Meanwhile, government institutions in the US and Europe have begun funding research and development that uses machine learning to identify whether a DNA sequence encodes part of a virulent pathogen, and researchers are beginning to make progress in developing AI-based screening tools (11). Among them is the US Intelligence Advanced Research Projects Activity (IARPA), which has launched the Functional Genomic and Computational Assessment of Threats (Fun GCAT) program to design better algorithms for spotting potentially threatening sequences. The Fun GCAT program intends to develop new approaches and tools for the screening of nucleic acid sequences, and for the functional annotation and characterization of genes of concern, with the goal of preventing the accidental or intentional creation of a biological threat (12). At present, biological threats are organised on the basis of genetic relatedness, leading to static threat-based lists that do not highlight biological functions or assess the risk of unknown sequences.
To better address biosecurity considerations, the Fun GCAT program aims to develop next-generation computational and bioinformatics tools to improve DNA sequence screening, enhance biological defence capabilities by characterizing threats based on function, and improve our understanding of the relative risks associated with unknown nucleic acid sequences. As part of the Fun GCAT program, Battelle has developed a database of threatening genetic sequences organised by protein function, combining machine learning with human subject matter expertise. The AI-based technology (ThreatSEQ), which is already in use, can analyse microbial genomes for pathogen severity, antibiotic resistance and infectiousness by looking at genetic sequences and the proteins they encode. In this way, it can predict whether or not a novel species represents a threat to a variety of hosts, including humans. A similar approach, using advanced computational and machine learning methods such as neural networks, is followed by a team led by investigators from the Biocomplexity Institute of Virginia Tech (13). These systems, currently under development, will advance the capacity for computer-assisted functional analysis of nucleic acid sequences, identify the threat potential of known and unknown genes by comparison with the functions of known threats, and improve the ability to analyse and identify critical sequences, in particular genes that are responsible for the pathogenesis and virulence of viral threats, bacterial threats and toxins.

Beyond the scientific assessment, described above, of the future risks that modern biotechnology poses for the development and production of biological weapons, artificial intelligence and big data are also becoming increasingly important in assessing bioterrorist threats and the threat of proliferation of biological weapons. Understanding the threat environment is very difficult because many potential threat scenarios involve a number of factors (technology, psychosocial factors, biosecurity and compliance issues, and policy issues), and it remains difficult to determine how these factors enable certain types of threats over time (14). The contribution of predictive artificial intelligence to countering bioterrorism has already been recognised, and such methods are being used to a limited extent. Automated data analysis is used to support the operations of intelligence and security services, in particular through data visualization (15). Machine learning methods allow the interpretation and analysis of otherwise inaccessible patterns in large-scale data sets. As far as the collection of information is concerned, those involved use extensive databases, some openly accessible and others not, to collect information about a particular site (a province, a region or a smaller area), a specific population, a certain activity or a particular organisation (16). Different algorithms are then used to process these data in order to answer the questions asked at the start of the procedure. The questions can relate to warnings of attacks, changes in the activities of an organization, statements on social media, and more.

Deploying artificial intelligence applications in a national security environment, however, is often challenging, as the opacity of the systems makes it difficult for a human being to understand how the results came about. Relying on black boxes to generate forecasts and make informed decisions is potentially devastating. The operator wants a trustworthy and comprehensible result with high reliability, confidence and credibility, and thus a low level of uncertainty. The problem with many artificial intelligence and machine learning techniques that seek a classification boundary is that they do not perform uncertainty quantification: they return a label without indicating how far that label can be trusted. One way of improving the trustworthiness of the results would be to use human-machine teams during data selection and decision processes (2).
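To illustrate what uncertainty quantification adds to a bare classification boundary, the following minimal sketch trains many copies of a very simple classifier on bootstrap resamples of the same data and reports how strongly the copies disagree. Everything in it is an assumption made for illustration: the one-dimensional feature, the toy training values, the nearest-centroid classifier and the function names are hypothetical and are not taken from any of the cited systems.

```python
import math
import random

# Hypothetical toy data: (feature value, label), where 1 = "of concern", 0 = "benign".
TRAINING = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (2.5, 0),
            (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1), (7.5, 1)]

def fit_centroids(samples):
    """Return the mean feature value of each class present in the samples."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in samples:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in (0, 1) if counts[y] > 0}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (a hard decision boundary)."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def bootstrap_vote(samples, x, n_models=200, seed=0):
    """Fraction of bootstrap-trained models that label x as class 1."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        resample = [rng.choice(samples) for _ in samples]
        votes += predict(fit_centroids(resample), x)
    return votes / n_models

def binary_entropy(p):
    """Uncertainty in bits: 0 when all models agree, 1 when they split evenly."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for query in (9.0, 5.0):
    p = bootstrap_vote(TRAINING, query)
    print(f"x={query}: share voting 'of concern'={p:.2f}, "
          f"uncertainty={binary_entropy(p):.2f} bits")
```

A single classifier would return a label for both queries. The ensemble additionally shows that the query far from the boundary is labelled almost unanimously, whereas the query near the boundary splits the votes. An operator, or a human-machine team, could use this signal to route a result for closer review.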

AI and Big Data Implications for Biosurveillance

Automated biosurveillance is very promising in terms of both improving the public health response to natural disease outbreaks and minimising potential casualties from the use of biological weapons. Biosurveillance systems collect and analyse vast quantities of diverse real-time data from many sources to provide governments with advance warning of a disease outbreak or bioweapons attack. The capacity to anticipate and monitor when and where an outbreak may occur and how a pathogen may be transmitted has the potential to substantially improve response strategies at local, national and international levels. The day-to-day volume of monitored information from hospital admissions, emergency services, drug purchases and social media scanning is immense and growing fast as new sources become available. Such information is required to generate accurate predictions and an understanding of transmission patterns that take into account the various biological, environmental, behavioural, and socio-cultural factors that can dynamically change disease outcomes (5).

The detection of disease outbreak signals and indications of bioweapon use in these large, noisy data streams is a challenge for the traditional statistically based algorithms that are currently in use. Current large-scale biosurveillance systems are affected by two principal deficiencies: the timely detection of disease-indicating signals in noisy data, and anomaly detection over multiple channels. In order to manage unstructured and multimodal health surveillance data, deep learning approaches have been developed and applied (17). Deep learning, also known as deep structured learning, is a class of machine learning algorithms based on artificial neural networks, which has led to satisfactory results when used to perform tasks that are difficult for conventional analysis methods. Within the limits imposed by data availability, deep learning methods have been successfully tested to improve anomaly detection and data fusion performance for particularly demanding data subsets (18).
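The two deficiencies can be made concrete with a minimal sketch of a conventional statistical detector. It scores each day's count against a moving baseline of the preceding days (a rolling z-score) and then fuses two channels with Stouffer's method, which sums the z-scores and divides by the square root of the number of channels. The daily counts, the seven-day window, the alert threshold of 3 and the channel names are all invented for illustration; this is not the algorithm of any system cited here.

```python
import math
import statistics

# Invented daily counts for two surveillance channels; the last day is the one under test.
ER_VISITS = [20, 22, 19, 21, 20, 23, 21, 24]
PHARMACY_SALES = [50, 53, 49, 52, 51, 48, 54, 57]

def rolling_z(counts, window=7):
    """z-score of each day's count relative to the preceding `window` days."""
    scores = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        mean = statistics.fmean(baseline)
        sd = statistics.stdev(baseline)
        # Floor the standard deviation so a flat baseline cannot cause division by zero.
        scores.append((counts[t] - mean) / max(sd, 1.0))
    return scores

def stouffer(z_values):
    """Combine per-channel z-scores, assuming the channels are independent."""
    return sum(z_values) / math.sqrt(len(z_values))

THRESHOLD = 3.0  # assumed alert level, in standard deviations

z_er = rolling_z(ER_VISITS)[-1]
z_rx = rolling_z(PHARMACY_SALES)[-1]
combined = stouffer([z_er, z_rx])

print(f"ER visits z={z_er:.2f}, alert={z_er > THRESHOLD}")
print(f"Pharmacy sales z={z_rx:.2f}, alert={z_rx > THRESHOLD}")
print(f"Combined z={combined:.2f}, alert={combined > THRESHOLD}")
```

In this toy example, neither channel crosses the threshold on its own, yet the fused score does, which is the kind of weak multi-channel signal that single-stream detectors miss. The independence assumption behind Stouffer's method rarely holds for real surveillance streams, which is one reason learned fusion methods are of interest.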

Artificial intelligence has also been shown to improve the monitoring of influenza outbreaks via the social media platform Twitter (19). Machine learning techniques have been used to improve the filtering process in order to better distinguish tweets that appear to describe real-world cases of influenza from those that do not. The study indicates that greater use of machine learning methods can provide a new perspective on the role of Twitter and similar social networks in the study of disease outbreaks.
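How such a filter might work can be sketched with a small naive Bayes text classifier. The sketch below is an assumption for illustration only: the example messages are invented, the labels "case" and "other" are hypothetical, and the cited study is not claimed to have used this particular model.

```python
import math
from collections import Counter

# Invented training messages: "case" = describes an actual illness, "other" = does not.
EXAMPLES = [
    ("i have the flu and feel terrible", "case"),
    ("home sick with flu fever and chills", "case"),
    ("my son caught the flu he has a fever", "case"),
    ("get your flu shot at the clinic today", "other"),
    ("flu season news report from the cdc", "other"),
    ("new study on flu vaccine released", "other"),
]

def tokenize(text):
    """Lower-case and split on whitespace; real pipelines would do far more."""
    return text.lower().split()

def train(examples):
    """Count words per class and documents per class, and collect the vocabulary."""
    word_counts = {}
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return word_counts, class_counts, vocab

def classify(model, text):
    """Return the class with the highest log-probability under naive Bayes."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(EXAMPLES)
for message in ("sick with a fever think i have the flu", "flu vaccine clinic news"):
    print(message, "->", classify(model, message))
```

Both test messages contain the word "flu", so a keyword filter would keep both. The classifier separates them by the surrounding words, which is the distinction between reported illness and general discussion that the filtering step aims for.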

Conclusions

Big data and artificial intelligence techniques offer enormous possibilities for technical assessments of the risk and likelihood of different biothreats, and an unparalleled opportunity for mapping disease outbreaks. Deploying artificial intelligence applications in a national security environment, however, is often challenging, as the opacity of the systems makes it difficult for a human being to understand how the results came about. Establishing guidelines for the use of artificial intelligence and big data analytics to counter bioterrorist attacks and to support intelligence work is therefore an ongoing and complex process that will require joint work by computer scientists, security and terrorism experts, strategists, legal experts, and philosophers.

References

1. Sagiroglu S, Sinanc D, editors. Big data: A review. 2013 International Conference on Collaboration Technologies and Systems (CTS); 20-24 May 2013.
2. Blasch E, Sung J, Nguyen T, Daniel CP, Mason AP. Artificial Intelligence Strategies for National Security and Safety Standards. arXiv preprint arXiv:1911.05727. 2019.
3. Le Guillarme N, Mouaddib A-I, Gatepaille S, Bellenger A, editors. Adversarial intention recognition as inverse game-theoretic planning for threat assessment. 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI); 2016: IEEE.
4. The White House. Homeland Security Presidential Directive 10 and National Security Presidential Directive 33: Biodefense for the 21st Century. Washington, DC; April 2004 [Available from: https://georgewbush-whitehouse.archives.gov/news/releases/2007/10/20071018-10.html].
5. Vogel KM. Big Data and Biodefense: Prospects and Pitfalls. Defense Against Biological Attacks: Springer; 2019. p. 297-315.
6. Valdivia-Granda WA. Big Data and Artificial Intelligence for Biodefense: A Genomic-Based Approach for Averting Technological Surprise. In: Singh SK, Kuhn JH, editors. Defense Against Biological Attacks: Volume I. Cham: Springer International Publishing; 2019. p. 317-27.
7. Kernchen R. Interactions between vulnerability to bioterrorism and S&T change. Working Paper 21. Scientific and Technological Advances Relevant to Bioterrorism, and their Possible Impact on Vulnerabilities in EU Society: A Prospective Study, for the European Science and Technology Observatory in response to paragraph 4.2 of EU Commission Communication 2001/707. Seville: JRC-IPTS; 2002, http://publica.fraunhofer.de/dokumente/N-564986.html.
8. Kernchen R. Survey: Impact of potential BW control measures on scientific and technological activities in Germany and EU candidate countries. Working Paper 20. Scientific and Technological Advances Relevant to Bioterrorism, and their Possible Impact on Vulnerabilities in EU Society: A Prospective Study, for the European Science and Technology Observatory in response to paragraph 4.2 of EU Commission Communication 2001/707. Seville: JRC-IPTS; 2002, http://publica.fraunhofer.de/dokumente/N-564985.html.
9. Kernchen R. STI Implications for Biosecurity Governance. Schriftenreihe des Eyvor Instituts (ISSN 2698-5403). 2019;BS-1 1-5.
10. Watson CR, Watson MC, Ackerman G, Gronvall GK. Expert Views on Biological Threat Characterization for the U.S. Government: A Delphi Study. Risk Analysis. 2017;37(12):2389-404.
11. Reardon S. How machine learning could keep dangerous DNA out of terrorists’ hands. Nature. 2019;566(7742):19-.
12. IARPA. Functional Genomic and Computational Assessment of Threats (Fun GCAT) Program. 2019 [Available from: https://www.iarpa.gov/index.php/research-programs/fun-gcat].
13. Trent T. Research team creates powerful system to identify biological threats. 2017 [Available from: https://vtnews.vt.edu/articles/2017/11/bi-fungcatiarpa.html].
14. Walsh PF. The Biosecurity Threat Environment. Intelligence, Biosecurity and Bioterrorism: Springer; 2018. p. 21-57.
15. McKendrick K. Artificial Intelligence Prediction and Counterterrorism. London: The Royal Institute of International Affairs – Chatham House; 9 August 2019.
16. Ganor B. Artificial or Human: A New Era of Counterterrorism Intelligence? Studies in Conflict & Terrorism. 2019:1-20.
17. Chae S, Kwon S, Lee D. Predicting Infectious Disease Using Deep Learning and Big Data. International Journal of Environmental Research and Public Health. 2018;15(8):1596.
18. Finley PD, Levin D, Flanagan TP, Beyeler WE, Mitchell MD, Ray J, et al. Biologically inspired approaches for biosurveillance anomaly detection and data fusion. Albuquerque, NM and Livermore, CA: Sandia National Laboratories; 2018. Report No.: SAND2018-14334. doi:10.2172/1489542.
19. Allen C, Tsou M-H, Aslam A, Nagel A, Gawron J-M. Applying GIS and Machine Learning Methods to Twitter Data for Multiscale Surveillance of Influenza. PLoS One. 2016;11(7):e0157734.


Please cite as: Kernchen, Roman, Artificial Intelligence and Big Data Analytics for Biodefence: Implications for Threat Assessment and Biosurveillance. In: Schriftenreihe des Eyvor Instituts (ISSN 2698-5403), BS 2019 (3): p. 1-6

Roman Kernchen
Scientific Director

Roman Kernchen is scientific director at Eyvor Institute. He received his Ph.D. (Dr.rer.nat.) from Rheinische Friedrich-Wilhelms University, Bonn and then held permanent academic positions at the German Aerospace Centre (DLR) and the Fraunhofer-Institute for Technological Trend Analysis. His current research interests centre around data-driven approaches for risk forecasting and analysis. He is especially interested in assessing climate change risks and vulnerabilities to businesses and industries, regional security, and sustainable development. Topics of interest include innovation research, risk studies, technology options assessment, data mining, topic analysis, security studies, and governance of emerging technologies.