Cambridge Analytica sought to use Facebook data to predict partisanship for voter targeting, UK investigation confirms
The UK’s data watchdog has sent a letter to parliament in lieu of a final report on a wide-ranging investigation into online political advertising. That investigation saw it raid the offices of Cambridge Analytica in 2018, after it emerged that the disgraced (and now defunct) data company had improperly acquired data on millions of Facebook users.
In the letter the regulator says the material that it reviewed included:
42 laptops and computers;
700 TB of data;
31 servers;
over 300,000 documents; and
a wide range of material in paper form and from cloud storage devices
“The sheer volume of material seized meant that we were presented with a digital ‘haystack’ of information in various states and locations and this has prolonged the work involved in reviewing and assessing the material to help us understand what happened. However, by piecing together the timeline of events we were able to get a thorough evidential insight into what was likely to have taken place,” it writes before going on to sketch its understanding of how Cambridge Analytica/SCL was operating at the time it paid a Cambridge University academic, Dr Aleksandr Kogan, to improperly procure and process millions of Facebook users’ data with the intention of targeting US voters with ads.
“The conclusion of this work demonstrated that SCL were aggregating datasets from several commercial sources to make predictions on personal data for political alliance purposes,” the ICO writes. “For example, we recovered data which included Voter files (the US version of the Electoral Register), Consumer Data Sets, Social Media and Intelligence Data Sets that appeared to come from the following companies: Labels & Lists, InfoGroup, Aristotle, Magellan, Acxiom and Experian. Some data has the appearance of similar US voter data that has been subject to known cyber breaches and has been available on-line.”
The former CEO of Cambridge Analytica, Alexander Nix -- who was last month banned from running a company for seven years, after he signed a disqualification undertaking with the UK insolvency service -- previously told the UK parliament that CA/SCL had acquired the bulk of the data it was using to build psychographic profiles of voters from major commercial data brokers such as Acxiom, Experian and Infogroup.
Per the ICO’s assessment, CA/SCL had been over-egging the depth of its people profiling: the regulator says it did not find evidence to back up claims in the company's marketing material that it had “5,000+ data points per individual on 230 million adult Americans”.
“Based on what we found it appears that this may have been an exaggeration,” it writes.
The ICO was satisfied that the Facebook data transferred to CA/SCL by Dr Kogan’s company was incorporated into a pre-existing larger database it already held -- containing “voter file, demographic and consumer data for US individuals”.
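Folding one dataset into another in this way generally depends on a record-matching step, and the ICO's account says matching was done by reference to name, date of birth and location. The following is a minimal sketch of that idea only, not a reconstruction of SCL's actual process: the field names, the normalisation rules and every record shown are invented for illustration.

```python
from datetime import date

def match_key(record):
    """Build a normalised (name, date of birth, location) key for matching."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"], record["location"].strip().lower())

def enrich(existing, incoming):
    """Return copies of `existing` records, supplemented with fields from
    any `incoming` record that shares the same match key."""
    incoming_by_key = {match_key(r): r for r in incoming}
    enriched = []
    for rec in existing:
        extra = incoming_by_key.get(match_key(rec), {})
        merged = dict(rec)
        # Only add fields the existing record lacks; never overwrite.
        for field, value in extra.items():
            merged.setdefault(field, value)
        enriched.append(merged)
    return enriched

# Entirely fictional records (hypothetical field names), for illustration only.
voter_file = [
    {"voter_id": "V1", "name": "Jane  Doe", "dob": date(1980, 5, 17), "location": "Springfield"},
    {"voter_id": "V2", "name": "John Roe", "dob": date(1975, 1, 2), "location": "Shelbyville"},
]
survey_data = [
    {"name": "jane doe", "dob": date(1980, 5, 17), "location": "springfield ", "openness": 0.72},
]

result = enrich(voter_file, survey_data)
```

Exact-key matching like this is the simplest possible approach. It silently misses people whose name or location is spelled differently across sources, and can wrongly merge two people who share all three fields, which is one reason such enrichment can be less reliable than marketing claims suggest.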
"The data points collected by GSR [Dr Kogan's company] with respect to [Facebook app] survey users and their Facebook ‘friends’ was specifically selected to enable a ‘matching’ process against pre-existing SCL databases," it writes, explaining its understanding of how CA/SCL used the improperly obtained Facebook data. "Matching took place using file sharing platforms and by reference to name, date of birth and location – with SCL’s existing datafiles being ‘enriched’ and supplemented by GSR’s data about those same individuals – and this matched information being passed back into SCL systems.
"This resulted for example information including scores for voting frequency, whether likely republican or democrat, voting consistency, and a profile which predicted personality traits matched to information such as voter ID, name, address, age, and other commercial data."
The investigation also confirmed CA/SCL applied AI techniques to the data to try to predict partisanship or other significant attributes of voters for the purpose of more effectively targeting them with political messaging, although it says it was unable to confirm whether such techniques were used in specific campaigns.
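To make concrete what predicting an attribute and then targeting on it can look like, here is a minimal sketch under stated assumptions. It trains a tiny logistic regression on a few labelled people, applies it to everyone else, and groups people into score bands. The page names, labels, thresholds and training settings are all hypothetical, and the hand-written model is a stand-in rather than anything the ICO attributes to SCL.

```python
import math

LIKES = ["page_a", "page_b", "page_c"]  # hypothetical page names

def featurise(likes):
    """Turn a set of liked pages into a 0/1 feature vector."""
    return [1.0 if page in likes else 0.0 for page in LIKES]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(LIKES)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias

def score(model, likes):
    """Predicted probability of the positive label for one person."""
    weights, bias = model
    x = featurise(likes)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Invented training data: people whose label (1 or 0) is assumed known.
labelled = [
    ({"page_a"}, 1), ({"page_a", "page_c"}, 1),
    ({"page_b"}, 0), ({"page_b", "page_c"}, 0),
]
model = train([featurise(l) for l, _ in labelled], [y for _, y in labelled])

# Apply the predicted score to everyone, labelled or not.
everyone = {"p1": {"page_a"}, "p2": {"page_b"}, "p3": {"page_c"}}
scores = {pid: score(model, likes) for pid, likes in everyone.items()}

# Group people into coarse score bands (arbitrary illustrative thresholds).
segments = {}
for pid, s in scores.items():
    band = "high" if s >= 0.67 else "low" if s <= 0.33 else "middle"
    segments.setdefault(band, []).append(pid)
```

In this toy data `page_c` appears equally often under both labels, so a person who liked only that page would be expected to get a score near 0.5: the model has little to say about them. A score of this kind is a statistical guess, not an observed fact about any individual.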
"Through such processes the relevant US voter GSR data (about approx. 30 million individuals) was then further analysed using machine learning algorithms to create additional 'predicted' scores relating to partisanship and other criteria which were then applied to all the individuals in the database. Some of these focussed on likes as wide ranging as “gay rights”, “Obama the worst president in US history”, “Re-elect President Obama in 2012”, “the Bible” and “National Rifle Association”," it writes.
"These scores were used to identify clusters of similar individuals who could be potentially targeted with advertising relating to political campaigns. This targeted advertising was ultimately likely the final purpose of the data gathering but whether or which specific data from GSR was then used in any specific part of campaign has not been possible to determine from the digital evidence reviewed. There is however evidence recovered that suggests that similar approaches and models based on the predicted personality traits and other measures were used with Republican National Committee (RNC) data."
On CA/SCL's data modelling methods, the ICO concludes that the company was mainly using “well recognised processes using commonly available technology”.
“For example, open source data science libraries such as ‘scikit’ were downloaded by SCL – containing well established, widely used algorithms for data visualisation, analysis and predictive modelling. It was these third-party libraries which formed the majority of SCL’s data science activities which were observed by the ICO,” it writes. “Using these libraries, SCL tested multiple different machine learning model architectures, activation functions and optimisers (all of which come pre-developed within the third-party libraries) to determine which combinations produced the most accurate predictions on any given dataset. We understand this procedure is well established within the wider data science community, and in our view does not show any proprietary technology, or processes, within SCL’s work.”
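The procedure the ICO describes, trying several configurations and keeping whichever predicts best on a given dataset, is ordinary model selection. The sketch below illustrates the general pattern with a deliberately simple stand-in: a hand-written k-nearest-neighbours classifier compared across three values of `k` on synthetic data. It does not use scikit-learn or the neural-network options the letter mentions, and the data, candidate values and split sizes are all assumptions.

```python
import random

def knn_predict(train, k, x):
    """Majority vote among the k training points nearest to x (1-D distance)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, k, validation):
    """Fraction of held-out points that the model labels correctly."""
    hits = sum(knn_predict(train, k, x) == y for x, y in validation)
    return hits / len(validation)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: label is 1 when x > 0.5, with about 10% of labels flipped.
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

# Hold out part of the data; candidates are judged only on points they never saw.
train, validation = data[:150], data[150:]

candidates = [1, 5, 15]  # odd values of k avoid tied votes
results = {k: accuracy(train, k, validation) for k in candidates}
best_k = max(results, key=results.get)
```

The key design choice is scoring each candidate on held-out data rather than on the data it was fitted to. Even so, picking the best of many candidates on one validation set tends to flatter the winner, so the chosen model's real-world accuracy may be lower than the figure that selected it.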
The regulator further notes there are ongoing questions over the efficacy of such modelling for predicting individuals’ attributes -- highlighting signs of internal scepticism over the approach.
“Through the ICO’s analysis of internal company communications, the investigation identified there was a degree of scepticism within SCL as to the accuracy or reliability of the processing being undertaken. There appeared to be concern internally about the external messaging when set against the reality of their processing,” it notes.
The ICO’s investigation also did not find evidence that the Facebook data that Kogan sold to Cambridge Analytica was used for political campaigning associated with the UK’s Brexit Referendum. “Our view on review of the evidence is that the data from GSR could not have been used in the Brexit Referendum as the data shared with SCL/Cambridge Analytica by Dr Kogan related to US registered voters,” it writes.
When it challenged the ICO’s £500k penalty over the Cambridge Analytica scandal, Facebook had contended that there was no evidence UK Facebook users’ data had been used for the political targeting.
The regulator eventually settled with Facebook last year -- although the company did not admit liability.
The ICO’s letter also discusses the Canada-based data company AIQ, which was linked to CA/SCL, and did play a key role in the UK’s Brexit referendum — as it was used by several 'Leave' campaigns to target ads at UK voters via Facebook.
“There was a range of evidence that demonstrated a very close relationship between AIQ and SCL (such as evidence that described AIQ as the Canadian branch of SCL and evidence that Facebook invoices to AIQ for advertising were paid directly by SCL). However, AIQ has consistently denied having a closer relationship beyond that between a software developer and their client. Mr Silvester (a director/owner of AIQ) has stated that in 2014 SCL 'asked us to create SCL Canada but we declined',” the ICO writes.
The regulator says it investigated whether AIQ had used the same datasets to target adverts at UK voters on behalf of four different 'Leave' campaigns: Vote Leave, BeLeave, the DUP and Veterans for Britain -- but it did not find evidence that this occurred.
"Initial information provided by Facebook had suggested that there were three audiences that were used for targeting by both Vote Leave and BeLeave. However, AIQ subsequently clarified that this was an admin error made by a junior member of staff while creating the BeLeave account. The error was corrected the following day and no information from those campaigns was disseminated through Facebook in the form of targeted ads," it writes.
While the ICO's letter-to-parliament in lieu of a more formal final report may appear to be something of an anticlimax to a long-running data misuse scandal, the regulator reiterates concerns over what the letter couches as "systemic vulnerabilities in our democratic systems".
However, the information commissioner, Elizabeth Denham, does not further flesh out her earlier publicly stated concern that democracy is being disrupted by big data.
Instead, the letter notes the ICO has provided "advice and guidance" to several unnamed organisations on both the Remain and Leave sides of the UK’s referendum, with the aim of achieving better future compliance with the rules.
"My audit teams have also concluded audits of data protection compliance at 14 organisations associated with the original investigation, including: the main political parties, the main credit reference agencies and major data brokers, as well as Cambridge University’s Psychometrics Centre. We have made significant recommendations for changes to comply with data protection legislation," she adds.
The details of those "significant" recommendations will come in the ICO's pending reports on its audits of the main political parties; the main credit reference agencies and major data brokers; and Cambridge University's Psychometrics Centre -- which the ICO notes will be published "shortly".
One more interesting detail from the ICO's CA/SCL investigation is that the company appears to have been planning to relocate its data offshore to avoid regulatory scrutiny -- presumably as the media furore around the Facebook data scandal cast a spotlight on its processes.
"We also identified evidence that in its latter stages SCL /CA was drawing up plans to relocate its data offshore to avoid regulatory scrutiny by ICO. We have followed up their complex company structure with overseas counterparts and have concluded that while plans were drawn up, the company was unable to put them into effect before it ceased trading," is the regulator's conclusion on that.
On the Facebook data-set itself, the ICO says its investigation found data "in a variety of locations, with little thought for effective security measures". "We found that individuals of interest to the investigation held data on various Gmail accounts," it notes. "Data was also found in servers and appeared to have been shared with a range of parties, for example there was evidence that data had been shared with staff at SCL/CA, Eunoia Technologies Inc [CA whistleblower Chris Wylie's company], the University of Cambridge and the University of Toronto."
The letter also reveals that a number of unnamed "senior figures" associated with the scandal have continued to refuse to cooperate with the ICO's investigation. "Several senior figures have continued to maintain their silence and have declined to be interviewed," it notes.