The staggering amount of data that Amazon records about its users

In front of me was a table with 12,048 rows and 49 columns containing every click related to my activity on Amazon.

This article originally appeared on Motherboard Italia.

Credits: NASA/Ames/JPL-Caltech

In an age when our inner selves are digitally fragmented, scattered across a thousand platforms that absorb our personal data and online activity, the e-commerce giant Amazon still seems to attract too little attention. So I decided to scratch the surface and look into the abyss: I sent a request for access to my personal data, and it opened up a world of tracking.

While Amazon’s monopoly is already beginning to draw the attention of both American and European antitrust experts, the digital surveillance carried out by Jeff Bezos’ platform has so far received little scrutiny.

So far, 2018 has been the year of the Facebook disaster, in which the pervasiveness of the social network’s tracking was laid bare: every single action and piece of personal data becomes a vector for targeted advertising. Amazon, however, seems to have emerged unscathed.

This is why last May, before the new European General Data Protection Regulation (GDPR) came into force, I sent Amazon a request for access to my personal data. The answer came after ten days and contained a slim list of data that is already available through the personal account panel.

The email contained four attachments, password protected: Miscellaneous, Correspondence with sellers, Correspondence with Customer service, and Order History. The most interesting part, the Miscellaneous document, contained the list of credit cards connected to my account, my devices connected to Amazon, and information on any active promotions.

In essence, however, nothing extraordinary. That was strange, because I would have expected at least the full list of searches I have made and the emails sent to me about products I might be interested in. But there was none of this.

However, on April 28th, activist and politician Katharina Nocun published in Spiegel Online an account of her epic journey to get her personal data from Amazon. In the article she described the so-called clickstream data, which she was able to obtain after months of requests.

The clickstream data is the recording of every click and action that we make on the Amazon website, complete with information about the date, the type of device we are using, the IP address and our position.

Given this precedent, I immediately sent a further request to Amazon for this data too. After 90 days of waiting, I finally got what I wanted.

Among the various files attached, there is one related to the search history, but it contains only the search terms, with no chronological information. The data does not even seem to include all the searches I’ve done since the first time I used Amazon.

Likewise, it is interesting to note the location data file, which includes very few values, most likely because I have disabled access to GPS information for my Amazon app.

The most interesting file, however, is the clickstream data: a table with 12,048 rows and 49 columns that contains every recorded click from my activity on Amazon.

Each row relates to a session and, to explain the meaning of each recorded value, Amazon also attached a file that describes every attribute. You can find the document at the end of this article.

The values cover the date and time when a specific page was visited, the IP address and the device used, the geolocation (where possible) based on the IP address, and the name of the telecommunications company providing the internet service.

In addition, all our movements within the Amazon website are recorded, gathering information on the previous and the next page visited, and also the URL that led us to the page we are viewing. It even records whether Amazon was the last page we opened in our browser that day.

Many of the table values are missing, though, and some contain unreachable links.
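For readers who obtain a similar export, a first look at such a table could be sketched as follows. This is only an illustration under stated assumptions: the column names (`timestamp`, `ip_address`, `device`, `referrer_url`), the timestamp format, and the sample rows are all invented here, since the attribute names Amazon actually uses are not reproduced in this article. The sketch counts rows, columns, and empty cells, and tallies clicks by hour of the day.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical column name and timestamp format; a real export may differ.
TIMESTAMP_COLUMN = "timestamp"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize_clickstream(csv_text):
    """Return row count, column count, empty-cell count, and clicks per hour."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []

    # Count cells that are empty or contain only whitespace.
    missing = sum(
        1 for row in rows for name in columns if not (row.get(name) or "").strip()
    )

    # Tally clicks by hour of day, skipping rows whose timestamp cannot be parsed.
    clicks_per_hour = Counter()
    for row in rows:
        raw = (row.get(TIMESTAMP_COLUMN) or "").strip()
        try:
            clicks_per_hour[datetime.strptime(raw, TIMESTAMP_FORMAT).hour] += 1
        except ValueError:
            continue

    return {
        "rows": len(rows),
        "columns": len(columns),
        "missing_cells": missing,
        "clicks_per_hour": dict(clicks_per_hour),
    }


# Invented sample data, only to show the expected shape of the input.
SAMPLE = """timestamp,ip_address,device,referrer_url
2017-03-01 02:15:00,192.0.2.10,mobile,
2017-03-01 02:47:10,192.0.2.10,mobile,https://example.com/
2017-03-02 14:05:33,192.0.2.11,desktop,
"""

summary = summarize_clickstream(SAMPLE)
print(summary)
```

Run against a real file, the same function would take the file's contents as `csv_text`; the hour tally is one simple way to see when browsing actually happens, including late at night.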

We asked Amazon directly how long this data is stored, when the collection began (my account dates back to 2013, while the clickstream values only cover 2017) and, above all, whether it will be provided by default in response to data access requests under the GDPR.

At the time of publication, we have not yet received an answer.

In the privacy policy on the Amazon website, “they do list relatively detailed information about the types of data they collect,” explains Michael Veale, a researcher at University College London and a privacy expert, by email, but “What they don’t seem to do is link these types of data to the GDPR lawful bases used to collect them, such as consent, contract or legitimate interests. They need to do this (see Art 13, GDPR), and it’s important for several reasons, as it affects your rights to object or to withdraw consent.”

“It’s very common for firms to capture detailed user data,” explains Veale, “also for the purposes of user experimentation, which can be highly invasive.” It is difficult, however, to confirm that every e-commerce website collects this data, Veale clarifies, but we do know that some firms are reluctant to release it to users.

What is striking, however, is seeing this data aggregated in a single table that mirrors our digital self, prey to the capitalist impulse to buy, and reveals our most intimate passions, interests and obsessions, including searches for products at the most unlikely hours of the night.

While this data seems necessary for analyzing site metrics, it raises considerable concerns about the granular detail these platforms have access to: it is as if we were constantly followed by someone peeking over our shoulder, hungry for our clicks and meticulously noting every value in a notebook. And this happens every single moment of our lives.

“There’s a lot that can be inferred from this type of collection, both implicitly and explicitly,” underlines Veale, “It’s important that there is transparency and oversight around how this data is used for prediction or personalisation. Data should not be kept for longer than necessary under the GDPR, and so Amazon should not sit on these records, but decide whether they are useful for a purpose, and if not, they should be erased.”

“We shouldn’t just demand it, it should be provided upon request. It’s personal data, and when users request all their personal data from Amazon or any other company, web tracking and usage data should be provided, ideally in an intelligible form,” concludes Veale.

Moreover, compared to the responses received by Nocun, in my case a lot of data is still missing, for example the data related to the advertisements I have clicked on and other details about the emails I received based on my preferences.

So I have no choice but to switch weapons: with the introduction of the GDPR, which has clogged our inboxes but also given European citizens a very powerful tool for controlling their online data, I will send an additional request to Amazon to get, once and for all, all the data on my account.

Italian Hacker Camp — Video

On August 3rd I was a guest at the first edition of Italian Hacker Camp (IHC) near Padua, where I presented my work Come to Italy: we’ve got pizza, pasta, and surveillance technologies!

The talk returns to Italian surveillance technologies, focusing on the purchase of IMSI-catcher systems by the Police and the Guardia di Finanza. This technology simulates a fake cell tower and is used to collect information on the devices (and consequently the people) present in a given area, thereby threatening the privacy of every citizen who happens to be there.

In addition, for the first time, I presented the database of Italian companies that take part in public tenders for surveillance technologies. It is available here.

The video is below, and the slides can be downloaded here.

International Journalism Festival 2018 — Video

Here are the videos of the two panels I attended at the International Journalism Festival in Perugia.

In the first one we discussed peculiar cases regarding surveillance in Italy.

The second one is a panel on the challenges that cyber journalists have to deal with when talking about online anonymity, government malware, ethical hacking and whistleblowing.

Italy’s Surveillance Toolbox @ 34C3

I’ll be speaking at the 34C3 in Leipzig on 30th December 2017.

For the past 6 months I’ve been researching how to monitor the Italian government’s surveillance capabilities using transparency tools.

This project aims to take advantage of the availability of public procurement data sets, required by anticorruption transparency laws, to discover government surveillance capabilities in Italy.

At the same time it will be possible to update a database of companies selling surveillance tech, discover official resellers of other foreign surveillance companies, and detail governmental expenditures for surveillance technologies.

I hope to see you at the 34C3; otherwise, contact me on the internet.
