The Growing Importance of Open-Source Intelligence to National Security


Intelligence agencies around the world, including the Communications Security Establishment (CSE), are increasingly acquiring, analyzing, using and sharing publicly available information (see Figure 1). Given the scale of data involved, artificial intelligence (AI) – particularly machine learning (ML) – and cloud computing will be vital, as will broader definitions of “publicly available” in privacy law. This HillNote discusses how technological advancements and explosive growth in publicly available information have made CSE and its domestic partners interested stakeholders in anticipated changes to Canada’s privacy law.

Open-Source Intelligence: Past and Present

Until recently, open-source intelligence (OSINT) – publicly available information that has been processed into intelligence – was typically derived from foreign broadcasts and publications.

Today, OSINT has been transformed. Social media platforms, smartphones and Internet of Things devices have exponentially increased the volume and variety of available information. Commercial cloud solutions have driven down the cost of data storage and created a market for “machine learning as a service.” Finally, advancements in microchip design have increased data processing power, and quantum computers promise further gains. Consequently, publicly available information is now collected in bulk so that big data analytics – including ML – can be applied to generate insight.

CSE is not the only government institution collecting and analyzing publicly available information. However, its legal mandate provides the only explicit definition of “publicly available information” apart from existing privacy law.

The Communications Security Establishment Act and Publicly Available Information

Section 23 of the Communications Security Establishment Act (CSE Act) authorizes CSE to acquire, use, analyze, retain and disclose publicly available information in furtherance of its mandate. CSE’s mandate is to provide foreign intelligence, cybersecurity and information assurance services, and – acting under their respective authorities – technical and operational assistance to federal law enforcement and security agencies, the Canadian Forces and the Department of National Defence.

Section 2 of the CSE Act defines publicly available information as “information that has been published or broadcast for public consumption, is accessible to the public on the global information infrastructure … or is available to the public on request, by subscription or by purchase.”

Constraining CSE’s acquisition of information, the CSE Act excludes from that definition “information in respect of which a Canadian or a person in Canada has a reasonable expectation of privacy.” As digitalization advances, however, the question of what constitutes a “reasonable expectation of privacy” remains open.
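The logical structure of the definition can be written as a simple predicate. Any one of three inclusion criteria is enough, and the privacy exclusion overrides all of them. The sketch below is illustrative only. The `InfoItem` fields are hypothetical names and are not drawn from the Act. In practice, the final field is a contested legal judgment rather than a boolean flag.

```python
from dataclasses import dataclass


@dataclass
class InfoItem:
    """Hypothetical attributes of a piece of information (illustrative names only)."""
    published_or_broadcast: bool        # published or broadcast for public consumption
    accessible_on_gii: bool             # accessible to the public on the global information infrastructure
    available_on_request_or_sale: bool  # available on request, by subscription or by purchase
    reasonable_expectation_of_privacy: bool  # held by a Canadian or person in Canada (a legal judgment, not a flag)


def is_publicly_available(item: InfoItem) -> bool:
    """Any one inclusion criterion suffices; the privacy exclusion overrides them all."""
    meets_inclusion = (
        item.published_or_broadcast
        or item.accessible_on_gii
        or item.available_on_request_or_sale
    )
    return meets_inclusion and not item.reasonable_expectation_of_privacy


# A purchased dataset with no privacy expectation attached qualifies ...
print(is_publicly_available(InfoItem(False, False, True, False)))  # True
# ... but the same dataset is excluded once a reasonable expectation of privacy exists.
print(is_publicly_available(InfoItem(False, False, True, True)))   # False
```

The hard part is the last field. No code can settle what counts as a reasonable expectation of privacy, and that is the question digitalization keeps reopening.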

The Communications Security Establishment’s Privacy Protection Measures for Publicly Available Datasets

Section 24 of the CSE Act directs CSE to “ensure that measures are in place to protect the privacy of Canadians and of persons in Canada in the use, analysis, retention and disclosure” of its publicly available information datasets. CSE implements such measures through a combination of policy and technical solutions. Two potential technical solutions are de-identification (through anonymization or pseudonymization) and encryption.

Anonymization of personal information entails the removal of personally identifying data elements. Pseudonymization, by contrast, masks those elements by substituting other values for them. Anyone holding the substitution key or table can reverse it. Extrapolating from the National Security and Intelligence Review Agency’s 2020 review of CSE’s disclosures of Canadian identifying information to domestic and foreign agencies, it appears that CSE masks rather than removes identifying information.
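The difference between removing identifiers and masking them by substitution can be sketched in a few lines. This is a minimal illustration built on several assumptions. The field names in `IDENTIFYING_FIELDS` are hypothetical, and the keyed HMAC-SHA256 token is one common substitution method. Neither is a description of CSE’s actual technique.

```python
import hashlib
import hmac
import secrets

# Hypothetical choice of which fields count as directly identifying.
IDENTIFYING_FIELDS = ("name", "email")


def anonymize(record: dict) -> dict:
    """Remove identifying fields outright; nothing in the output allows them to be restored."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


class Pseudonymizer:
    """Substitute identifying values with keyed tokens and keep a lookup table,
    so whoever holds the table can reverse the substitution."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._table: dict = {}  # token -> original value

    def _token(self, value: str) -> str:
        # HMAC-SHA256 yields the same token for the same value under the same key.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        return "pid_" + digest[:16]

    def pseudonymize(self, record: dict) -> dict:
        masked = dict(record)
        for field in IDENTIFYING_FIELDS:
            if field in masked:
                token = self._token(str(masked[field]))
                self._table[token] = masked[field]
                masked[field] = token
        return masked

    def reidentify(self, record: dict) -> dict:
        # Look each token up in the table; non-identifying fields pass through unchanged.
        return {
            k: self._table.get(v, v) if k in IDENTIFYING_FIELDS else v
            for k, v in record.items()
        }


record = {"name": "Jane Doe", "email": "jane@example.com", "city": "Ottawa"}
p = Pseudonymizer(secrets.token_bytes(32))
masked = p.pseudonymize(record)

print(anonymize(record))               # {'city': 'Ottawa'}
print(p.reidentify(masked) == record)  # True
```

Because the token is deterministic, the same person receives the same pseudonym across records. This preserves analytic value, but it also means pseudonymized records can still be linked to one another.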

Either way, most, if not all, de-identification techniques are vulnerable to re-identification attacks, in which de-identified records are matched against other data sources. The pattern-recognition capabilities that AI technologies, particularly ML, can bring to such attacks compound the re-identification risk.
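A classic linkage attack shows why removing names is not enough. Records stripped of direct identifiers can still be joined to auxiliary data on quasi-identifiers such as postal prefix, birth year and gender. In the sketch below, all records are invented, and the exact-match join is the simplest form of the attack. ML-based methods can plausibly extend the same idea to approximate or probabilistic matches.

```python
# All records are invented for illustration.
# A "de-identified" dataset: names removed, quasi-identifiers retained.
deidentified = [
    {"postal_prefix": "K1A", "birth_year": 1980, "gender": "F", "attribute": "asthma"},
    {"postal_prefix": "K1A", "birth_year": 1975, "gender": "M", "attribute": "diabetes"},
    {"postal_prefix": "M5V", "birth_year": 1980, "gender": "F", "attribute": "migraine"},
]

# Auxiliary information that still carries names (e.g., public profiles).
auxiliary = [
    {"name": "Person A", "postal_prefix": "K1A", "birth_year": 1980, "gender": "F"},
    {"name": "Person B", "postal_prefix": "M5V", "birth_year": 1991, "gender": "M"},
]

QUASI_IDENTIFIERS = ("postal_prefix", "birth_year", "gender")


def link(deid, aux, keys=QUASI_IDENTIFIERS):
    """Exact-match linkage: a unique match on the quasi-identifiers re-identifies a record."""
    index = {}
    for row in deid:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for person in aux:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append((person["name"], candidates[0]["attribute"]))
    return matches


print(link(deidentified, auxiliary))  # [('Person A', 'asthma')]
```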

Encrypting personal information is a strong privacy protection measure, but this protection vanishes when the data is decrypted for analysis. Homomorphic encryption (HE), which enables computations to be performed on data that remains encrypted, offers a potential solution for CSE.
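To make the idea concrete, the sketch below is a toy version of the Paillier cryptosystem. Paillier is one well-known, partially homomorphic scheme. It supports adding encrypted values and multiplying them by plaintext constants. This is a textbook illustration with deliberately tiny primes. It is neither secure nor a description of any system used by CSE or a cloud provider. Fully homomorphic schemes, which allow arbitrary computation on ciphertexts, are considerably more complex and costly.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Toy Paillier key generation from two distinct primes (far too small to be secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1              # common simplified generator choice
    mu = pow(lam, -1, n)   # modular inverse of lambda; valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    n, g = pub
    n2 = n * n
    while True:            # pick random r in [1, n) coprime with n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n   # L(x) = (x - 1) / n, then multiply by mu


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def scale_encrypted(pub, c: int, k: int) -> int:
    """Raising a ciphertext to a plaintext power k multiplies the plaintext by k (mod n)."""
    n, _ = pub
    return pow(c, k, n * n)


pub, priv = keygen(293, 433)

# A party holding only the public key and ciphertexts can compute an encrypted total ...
values = [20, 15, 7]
total = encrypt(pub, 0)
for v in values:
    total = add_encrypted(pub, total, encrypt(pub, v))

# ... which only the private-key holder can decrypt.
print(decrypt(pub, priv, total))                                     # 42
print(decrypt(pub, priv, scale_encrypted(pub, encrypt(pub, 7), 6)))  # 42
```

In an off-premises setting, the data owner would keep `priv`. The remote party would see only `pub` and ciphertexts while performing the additions.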

A key advantage of HE is that it might offer a secure way to analyze sensitive datasets stored off-premises, such as in a commercial cloud. Some cloud service providers already offer HE to clients or plan to do so. It is therefore noteworthy that Amazon Web Services (AWS) provides cloud services to CSE’s counterparts in the United States and United Kingdom: the National Security Agency and Government Communications Headquarters, respectively. In 2019, AWS entered into a framework agreement with the Government of Canada to provide cloud services for information marked as “Protected B.”

Publicly Available Information and Canada’s Existing Privacy Law

The CSE Act’s definition of publicly available information is broader than that of the Personal Information Protection and Electronic Documents Act (PIPEDA), which governs the commercial sector’s collection and use of personal information. PIPEDA’s Regulations Specifying Publicly Available Information essentially limit “publicly available” personal information to that appearing in telephone and business directories, public registries and court records, and to personal information that an individual has voluntarily provided to a publication.

By contrast, the CSE Act definition accommodates all of the above. It also covers de-identified network metadata and, through subscription or purchase, data acquired, analyzed and de-identified by third-party entities. Such data can include text, imagery, video (including voiceprints) and geolocation data, as well as lifestyle and relationship data drawn from social media websites.

Communications service providers, data brokers and commercial OSINT providers are all examples of third-party entities. So, too, are non-profit investigative organizations like Bellingcat, the International Consortium of Investigative Journalists and Distributed Denial of Secrets. Of note, the personal information such non-profits acquire may not always have been provided voluntarily.

Finally, CSE can receive publicly available information from foreign allies and from other federal departments and agencies. Under the Privacy Act, which governs federal institutions, an institution that collects publicly available information relating directly to one of its operating programs or activities does not need to seek consent before using or disclosing that information. However, the Privacy Act does not define “publicly available information.” The discrepancy between the CSE Act’s broad definition of publicly available information and PIPEDA’s narrower one also has implications for CSE, since the private sector generates much of the publicly available information CSE collects.

An Unreasonable Expectation of Privacy?

Pervasive digitalization raises difficult legal and ethical questions about what can be reasonably expected to remain private and the extent to which an individual can control what happens to their personal information while continuing to participate in society.

Informed and meaningful consent, a key tenet of privacy law, is increasingly difficult to obtain. Clicking through website cookie consent requests has become a time-consuming and – according to some observers – meaningless task. Anxious to gain access to website content or an online service, many users will simply click “accept all” without reading the fine print. Moreover, as ML grows more capable of drawing accurate (and monetizable) inferences from data, data collectors are incentivized to maximize collection.

The concept of informed consent becomes further attenuated as Internet-connected smart devices drive “dumb” devices from the consumer marketplace. Every computer-embedded, network-connected device – from toothbrushes to thermostats – is a data collection opportunity. A similar transformation is underway in public spaces as infrastructure becomes smart, prompting some to describe a mismatch between Canada’s data protection law and smart environments.

“Data protection law is premised on ideas of individual control over personal information and centered on relationships between individuals and (private or public) organizations,” argue Canadian privacy scholars Lisa Austin and David Lie in their paper on data trusts and the governance of smart environments, but data collection in smart environments “is not easily modelled on the intentional sharing of personal information with an organization providing you with a product or services.”

Legislative Amendments Anticipated

Emphasizing the need for Canada to keep pace with technological change, a September 2021 Department of Justice discussion paper entitled Privacy principles and modernized rules for a digital age indicated that the federal government seeks to define or redefine a range of key terms, including personal information, consent and publicly available information.

This discussion paper and the federal Minister of Justice’s 2021 mandate letter suggest that the government intends to amend the Privacy Act. Similarly, proposed PIPEDA amendments, referred to in the Minister of Innovation, Science and Industry’s 2021 mandate letter, would make it easier for commercial entities to collect, use and disclose personal information without prior consent.

CSE and its domestic partners have a stake in these proposed amendments as each relies on big data analytics. A key question for parliamentarians is how the security and privacy risks of maintaining sensitive datasets – either on-premises or in the commercial cloud – will be addressed.

Figure 1 – Examples of Nations Whose Intelligence Agencies Have Artificial Intelligence (AI)–Enabled Open-Source Intelligence (OSINT) Capabilities

Each country in the Five Eyes intelligence alliance – Canada, the United States, the United Kingdom, Australia and New Zealand – has artificial intelligence (AI)–enabled open-source intelligence (OSINT) capabilities. Examples of countries that have AI-enabled OSINT capabilities and cooperative relations with the Five Eyes intelligence alliance include France, Israel, Germany, the Netherlands, Denmark, Sweden, Norway and Japan. Russia and China have longstanding OSINT programs. Their implication in recent large-scale database breaches in the United States and elsewhere suggests these two countries’ intelligence services have AI capabilities.

Sources: Figure prepared by the Library of Parliament using information obtained from Eliot A. Jardines, “Open Source Intelligence,” in Mark Lowenthal and Robert M. Clark, eds., The Five Disciplines of Intelligence Collection, 2016; and Institute for Human-Centered AI, “Chapter 7: AI Policy and National Strategies,” Artificial Intelligence Index Report 2021, Stanford University, 3 March 2021.


Author: Holly Porteous, Library of Parliament
