African experts gathered for two days (19-20 February 2018) in Addis Ababa, Ethiopia, to contribute to the development of the African Privacy and Personal Data Protection Guidelines. The meeting, facilitated by the African Union Commission (AUC) and supported by the Internet Society, explored the future of privacy and data protection and provided some practical suggestions that African states can consider in implementing the Malabo Convention provisions related to online privacy. The guidelines are aimed at empowering citizens, as well as establishing legal certainty for stakeholders through clear and uniform personal data protection rules for the region.
The expert meeting comes amid growing concern across the world about the need to prepare for the EU General Data Protection Regulation (GDPR), which will be enforced from 25 May 2018. The meeting itself, however, is focused on creating general principles to help African member states develop good practices now and in the future. The project, a partnership of the AUC and the Internet Society, comes as a follow-up to the recommendations of the Africa Infrastructure Security Guidelines, developed in 2017, to help speed up the adoption and subsequent ratification of the Malabo Convention.
Both the Heads of State Summit in January 2018 and the Specialized Technical Committee ministerial meeting endorsed the development of these guidelines as a way to strengthen the capacity of African states to deal with emerging issues in the digital space.
The African privacy and data protection landscape is still nascent with only 16 of the 55 countries having adopted comprehensive privacy laws regulating the collection and use of personal information (C Fichet, 2015). The African Union Convention on Cyber Security and Personal Data Protection is considered an important first step aimed at creating a uniform system of data processing and determining a common set of rules to govern cross-border transfer of personal data at the continental (African) level to avoid divergent regulatory approaches between the Member States of the African Union. Now that a continental framework is in place, there is a need for more detailed best practice guidelines on personal data protection to assist countries in the process of domesticating the Malabo Convention into the national laws.
With every advancement in technology we get a new database to get excited about. With the cloud, we started caring about scale, and NoSQL databases rose to the fore. With social networks, graph databases became the hot new thing. And now, with the internet of things, time series databases are getting their day in the sun.
That’s why InfluxData just raised $35 million in a round led by Sapphire Ventures. The goal is to expand the company’s database sales beyond its current customers, which include Tesla, IBM, and Nordstrom. At the end of January, another time series database called Timescale raised $12.4 million in funding. So the space is hot.
Time series databases aren’t new. Traditionally, they simply store a measurement of the state of some sensor along with the time it was taken. But now that there are connected sensors that can take in data hundreds of times a day or more, these databases are seeing more action. Plus, in many situations it’s not enough to collect the data and then ship it somewhere as a log. Now people want to take action on that data. And they want to take that action as soon as possible.
This means that time series databases aren’t just handling a greater velocity and volume of data; they also have to analyze it as it streams by. Think of it as the more active version of logging data as performed by companies such as Splunk. There are many time series databases out there, including giants such as GE’s Predix, as well as smaller projects like Riak or Graphite. Many projects started as ways to monitor IT systems and websites, not thermostat readings or automotive data.
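To make the "analyze it as it streams by" idea concrete, here is a minimal sketch in Python. It is not how InfluxData or any other product mentioned here works internally. The `Point` and `RollingMonitor` names, the window size, and the alert threshold are all assumptions made up for illustration. Each sample is a sensor reading plus a timestamp, and every new reading is checked against a rolling average the moment it arrives, rather than being shipped off as a log first.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Point:
    """One time series sample: which sensor, when, and what it read."""
    sensor: str
    timestamp: float  # seconds since the epoch
    value: float


class RollingMonitor:
    """Hypothetical stream check: keeps the last `window` values in memory
    and flags a reading that is far above their average."""

    def __init__(self, window: int = 5, threshold: float = 1.5):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def ingest(self, point: Point) -> bool:
        """Store the point; return True if it exceeds threshold * rolling mean."""
        alert = False
        # Only judge a reading once a full window of history is available.
        if len(self.values) == self.values.maxlen:
            mean = sum(self.values) / len(self.values)
            alert = point.value > self.threshold * mean
        self.values.append(point.value)
        return alert


def find_alerts(points, window: int = 5, threshold: float = 1.5):
    """Run a stream of points through the monitor and collect the alerts."""
    monitor = RollingMonitor(window, threshold)
    return [p for p in points if monitor.ingest(p)]
```

A real time series database would also persist and compress the points and expire old ones. The sketch covers only the act-as-it-arrives part.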
In InfluxData’s case, CEO Evan Kaplan touts the speed of the database plus the available suite of tools it works with, which allows developers to monitor IoT assets and query data even as more data is coming in. It also stores data in a compressed format and quickly ditches the dregs it doesn’t need.
Kaplan is selling the database together with tools called Telegraf, Chronograf, and Kapacitor as a concept called the TICK stack. It is designed to rapidly ingest and handle data while also giving users the tools to query it. As a lover of many IT stacks, from the historical LAMP (Linux, Apache, MySQL, PHP) stack for web development to the more recent SMACK (Spark, Mesos, Akka, Cassandra, and Kafka) stack for big data, I like the idea of one for the IoT.
However, note that in this case Influx is promoting the tools it is developing as opposed to developers promoting a collection of independent technologies that they have found to work well together. That doesn’t mean it will fail; it’s just a different genesis.
As for revenue, InfluxData has a slightly different model than the traditional open-source efforts. It offers one server for free, and as the database expands, customers will pay for an enterprise license so they can build a larger cluster capable of handling more. Given how much time series data machines throw off, it’s a model that should net it plenty of revenue over time.
Did you know your television is watching you? Specifically, most smart TVs are sending data off to their makers and, in certain cases, to marketers. Consumer Reports showcased the security flaws and the lack of privacy inherent in connected TV in a report last week, while over at Gizmodo, Kashmir Hill has a new article out about privacy in the smart home that puts a big focus on televisions.
It’s no secret that internet-connected TVs share data with others, nor is it remarkable that most TVs available today are smart. That’s what allows you to watch Netflix, YouTube, or Amazon Prime shows. But the rest of our appliances are also going the way of TV. Samsung and Kenmore both say that, going forward, all of their appliances will have some kind of connectivity built into them.
And for many, the features enabled by connected devices will mostly outweigh the fears of data surveillance. I’m not talking about connected light bulbs and home automation here, but about adding truly innovative and helpful features to once-dumb appliances, letting them become truly smart.
An example of this is a washing machine that can tell how dirty your clothes are and select the proper cycle. Or a fridge that can offer you a remote camera feed to the inside so you can see what’s on the shelf. Maybe the fridge could reorder your water filter when it’s getting old. Even better, maybe that same filter could report back on the purity of the water to environmental agencies and consumers as a way to ensure public health.
Smarter products will have to be connected in order to create information exchanges that benefit the consumer, the manufacturer, and maybe even society. However, the industry so far is screwing this up with an ineptitude driven by greed, short-term thinking, and a desire to act first and beg forgiveness later.
This is emblematic of the culture built up over the last two decades in technology, where we took the internet and used it to turn users into the product. The current backlash against Silicon Valley companies is a reaction to this exchange of personal data for services. Especially as the services became more about keeping the person engaged to the exclusion of their well-being or the well-being of society.
This may sound like hippie dippie stuff, but there is a direct link from Google and Facebook’s behavior to the privacy concerns that people have with regard to connected devices. That those concerns are completely justified only makes it worse.
I’ve spent years trying to tell the industry and the government that privacy matters. Not just because it’s a basic right, but because if you respect people’s privacy and offer them agency over controlling their data, they are more likely to buy the product. And if you offer them a compelling reason to share their data while still offering them some control, you actually build a model where the data you collect has to benefit the user or the larger society.
We are starting to see some momentum on this front, and I am hopeful that 2018 will be a turning point in the U.S. The General Data Protection Regulation in the EU has already set out a framework for establishing data privacy as a human right. What’s even more promising is that many of the regulations in the GDPR are impossible or difficult to implement today, and the EU realizes that.
The hope is that the EU will guide technologists in developing tools that match the regulatory framework, while the regulatory stick will give companies an incentive to create a market for the tools required to meet the law. Meanwhile, here in the U.S., technologists are increasingly asking themselves how to get and use data responsibly.
While this entire essay is focused on the importance of managing user privacy and the intentional gathering and sharing of consumer data, security is also related to the topic. Specifically, what happens to consumer data when security is breached. As it stands, consumers are worried about losing their privacy not only to companies but also to hackers through all-too-frequent security breaches.
Until the tech companies get their priorities in order and the government steps up with rules that give consumers some control over their information, I believe the promise of the smart home will never take off, because consumers won’t trust it.
After years of hope and promise, 2018 may be the year when artificial intelligence (AI) gains meaningful traction within Fortune 1000 corporations. This is a key finding of NewVantage Partners’ annual executive survey, first published in 2012. The 2018 survey, published on January 8, represented nearly 60 Fortune 1000 or industry-leading companies, with 93.1% of survey respondents identifying themselves as C-level executive decision-makers. Among the 2018 survey participants were corporate bellwether companies, including American Express, Capital One, Ford Motor, Goldman Sachs, MetLife, Morgan Stanley, and Verizon.
The main finding of the 2018 survey is that an overwhelming 97.2% of executives report that their companies are investing in building or launching big data and AI initiatives. Among surveyed executives, a growing consensus is emerging that AI and big data initiatives are becoming closely intertwined, with 76.5% of executives indicating that the proliferation and greater availability of data is empowering AI and cognitive initiatives within their organizations.
The survey results make clear that executives now see a direct correlation between big data capabilities and AI initiatives. For the first time, large corporations report that they have direct access to meaningful volumes and sources of data that can feed AI algorithms to detect patterns and understand behaviors. No longer dependent on subsets of data to conduct analyses, these companies combine big data, AI algorithms, and computing power to produce a range of business benefits from real-time consumer credit approval to new product offers. Companies such as American Express and Morgan Stanley have publicly shared stories of their successes within the past year.
Staving Off Disruption
Survey participants comprised executives representing data-intensive industries, notably financial services companies, which constituted 77.2% of the survey respondents. Financial services companies have long been at the forefront of industry due to the large volumes of transactional and customer data that they maintain, and they have developed robust data management and data governance processes over a period of decades. These organizations have been at the forefront in the use of analytics to manage risk, assess customer profitability, and identify target market segments. Industries such as life sciences, while newer to data management, possess vast repositories of scientific and patient data that have gone largely untapped relative to the potential for insight.
Now, many of these mainstream companies are facing threats from data-driven competitors that have no legacy processes and have built highly agile data cultures. Companies like Amazon, Google, Facebook, and Apple are among the most prominent disruptive threats to these traditional industry leaders. As mainstream companies increase their investment in big data and AI initiatives, they face a range of issues and challenges as they seek to organize to compete against data-driven competitors. This concern is highlighted in the 2018 survey results.
A clear majority (79.4%) of executives report that they fear the threat of disruption and potential displacement from these advancing competitors. In response to the threat of disruption, companies are increasing their investment in big data and AI initiatives. In the 2018 survey, 71.8% of executives indicate that investments in AI will have the greatest impact on their ability to stave off disruption (in the next decade). Although overall investments in AI and big data initiatives continue to be relatively modest for most large corporations, 12.7% of executives report that they have invested half a billion dollars in these initiatives to date. If the fear of disruption is any indication, this number can be expected to increase.
Driving Innovation Through AI
Executives indicate that investments in big data and AI are beginning to yield meaningful results. Nearly three-fourths of executives surveyed (73.2%) report that their organizations are now achieving measurable results from their big data and AI investments. In particular, executives report notable successes in initiatives to improve decision-making through advanced analytics — with a 69% success rate — and through expense reduction, with a 60.9% success rate. Businesses are also using big data and AI investments to accelerate time-to-market for new products and services (54.1% success rate) and to improve customer service (53.4% success rate). Yet, just over one-fourth (27.3%) of executives report success thus far in monetizing their big data and AI investments. This remains an elusive goal for most organizations.
Nearly one-fourth (23.9%) of respondents report that their investments in big data and AI are highly transformational and innovative for their organization, and potentially disruptive for their industry. But 43.8% of executives report that innovation and disruption initiatives involving big data and AI yield successful results for their organizations.
As mainstream companies look to the future, there is a growing consensus that AI holds the key. With 93% of executives identifying artificial intelligence as the disruptive technology their company is investing in for the future, there appears to be common agreement that companies must leverage cognitive technologies to compete in an increasingly disruptive period. Investment in AI can be expected to increase as organizations position themselves to compete in the future. Those companies that prove themselves to be adept at developing and executing initiatives using big data and AI capabilities will likely be the companies that are best positioned to deflect the threats of agile, data-driven competitors in the decade ahead.
This week’s big news had to do with a heat map published back in November by a fitness tracking application called Strava. A 20-year-old Australian noticed that the running data from U.S. military personnel indicated where clandestine bases were in Syria. His insights percolated through security analysts on Twitter, and then to the U.S. Department of Defense.
Now the DOD is re-evaluating its policies around wearables and mobile phones, and will likely look at the social media habits of its soldiers as well. What happened with Strava is nothing new, exactly. On a smaller scale, hackers and spies have used public social media profiles to get all kinds of information on targets.
But there are two things that are different about the Strava case—and worth noting. The first is the scale of it. The second is how two types of data were combined to create new insights. Strava helpfully showed data from more than a billion activities which, when combined with the map, created a clear picture for those who knew what they were looking for, and disclosed more than Strava intended.
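A rough sketch shows why aggregated, anonymous activity can still give a location away. This is not Strava's method. The grid size, the `min_count` cutoff, and the function names are assumptions for illustration, and any coordinates you feed it should be synthetic. The idea is to bin GPS points into grid cells, then look for busy cells with no active neighbors. In a city every cell is busy, so nothing stands out. In an otherwise empty region, a single hot cell is conspicuous.

```python
import math
from collections import Counter


def heat_map(points, cell_size: float = 0.01):
    """Count activity points per grid cell.

    `points` is an iterable of (latitude, longitude) pairs; a cell is
    identified by its integer (row, column) index on the grid.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


def isolated_hot_cells(counts, min_count: int = 3):
    """Return busy cells whose eight surrounding cells show no activity."""
    hot = []
    for (row, col), n in counts.items():
        if n < min_count:
            continue
        neighbours = [
            (row + dr, col + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        ]
        if not any(nb in counts for nb in neighbours):
            hot.append((row, col))
    return sorted(hot)
```

No individual is identified anywhere in this code. The disclosure comes entirely from combining the aggregate with a map.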
Inadvertently disclosing new information will be the new challenge of our age as we connect ourselves and our things to the internet. Each of us will leave ever-larger digital footprints, which can be combined in various ways to provide new information, all of which will be searchable to anyone with an internet connection and an interest.
Short of hiding in a bunker, wrapping your phone in foil, and ditching social media, what is a person — or a concerned employer — to do? The short answer is we don’t know. Even fully grasping the problem is tough. There are several aspects to it.
Most importantly, there’s an increasing amount of data about individuals online that’s fairly easy to get. Then there’s an increasing amount of data about that data, so-called metadata, that’s also easy to find (or subpoena). For example, if your tweets are data, then the location data attached to them are metadata. And this data can now be combined in new ways. In this week’s podcast, privacy analyst Chiara Rustici called this a “toxic combination.”
Finally, once data is out there, it can be reused, repurposed, and reformulated to help draw new conclusions and meanings that were never intended. Imagine if that permanent record your teachers threatened you with back in school were real. In this new era it effectively is.
That’s just the data challenge. There’s also an economic challenge. Data is incredibly cheap. Which means getting data and metadata and creating these toxic combinations is also incredibly cheap. It’s also seen as incredibly valuable to corporations, which is why everything from your toothbrush maker to your coffeepot is trying to snag as much information as it can.
Data may be cheap to get and hold economic value, but it’s also expensive and difficult to secure, which means bad actors can get a hold of your social security number and credit cards with what feels like relative ease. And yet, when data breaches happen the individual is left to pay the inevitable costs as they try to restore their credit, deal with financial fallout, or recover embarrassing secrets.
There’s a link from Strava’s disclosure of military secrets to revenge porn, and it runs through the internet and its ability to make getting information easier than ever. And it relies on our increasing ability to digitize anything from our running routes to our photos.
We’re intellectually aware of all this, but whenever it comes time to do something about it, we throw up our collective hands and keep snapping our naked pics. There are few existing weapons to solve this problem, so let’s take a look at what they are and where they fall short.
Opt-ins and transparency: Many of our apps and devices come with a variety of privacy settings that can range from simple — share or do not share — to byzantine. Strava’s were apparently byzantine, which didn’t help folks that wanted to stay off the heat map. But good privacy settings can only go so far. They don’t stop hackers from accessing data and they also don’t stop toxic combinations of data.
Differential Privacy: Apple made this privacy concept famous. Essentially all data collected gets anonymized and injected with random noise to make it hard to recombine it and determine to whom the data refers. This is good for individuals, but it requires technical overhead and that the company do it correctly. Apple’s talked a good game, but researchers looking at its implementation say it left a lot to be desired. The other challenge is that you can still glean a lot of information from anonymized data. Note that none of the Strava folks were identified.
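As a hedged illustration of the "random noise" idea, here is randomized response, one of the simplest mechanisms in the differential privacy family. It is not Apple's implementation, which is considerably more elaborate. The function names and the `p_truth` parameter are assumptions chosen for this sketch. Each person reports the truth only some of the time and a coin flip otherwise, so any single report is deniable, yet the aggregate rate can still be estimated.

```python
import random


def randomized_response(truth: bool, rng: random.Random, p_truth: float = 0.75) -> bool:
    """Report the true answer with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports, p_truth: float = 0.75) -> float:
    """Undo the noise in aggregate.

    The expected observed rate is p_truth * true_rate + (1 - p_truth) * 0.5,
    so solving for true_rate gives the estimate below.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


def simulate(n_users: int = 20000, true_rate: float = 0.3, seed: int = 0) -> float:
    """Generate noisy reports for a synthetic population and estimate the rate."""
    rng = random.Random(seed)
    truths = [i < n_users * true_rate for i in range(n_users)]
    reports = [randomized_response(t, rng) for t in truths]
    return estimate_true_rate(reports)
```

The trade-off mentioned above shows up directly. Lowering `p_truth` protects individuals more, but the estimate gets noisier and needs more users to be useful. And the aggregate itself can still reveal plenty.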
Collect only what you need: This idea is simple. If you are making a device or app, don’t collect more data than you need. For example, the Skybell doorbell doesn’t keep a user’s Wi-Fi credentials after getting set up on the network because it’s not information the company needs. Most other connected devices don’t share that view, however, which led to LIFX bulbs leaking a bunch of Wi-Fi credentials a few years back. Whoops.
This is a tough issue because in many cases companies collect all this extra data in case they might need it someday. And thanks to improvements in machine learning, they may not be wrong. Applying machine learning to random data sets can yield new insights that could improve the service.
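In code, "collect only what you need" can be as simple as an allowlist applied before anything is stored. The field names below are hypothetical and not taken from Skybell, LIFX, or any real device. The sketch assumes a setup payload arrives as a dictionary.

```python
# Hypothetical allowlist: the only fields this imaginary service needs to keep.
ALLOWED_FIELDS = frozenset({"device_id", "firmware_version", "last_seen"})


def minimize(record: dict, allowed=ALLOWED_FIELDS) -> dict:
    """Return a copy of the record containing only allowlisted fields.

    Anything not explicitly allowed (for example Wi-Fi credentials used
    once during setup) is dropped instead of being stored "just in case".
    """
    return {key: value for key, value in record.items() if key in allowed}


def dropped_fields(record: dict, allowed=ALLOWED_FIELDS) -> list:
    """List what was discarded, which is handy for auditing the allowlist."""
    return sorted(key for key in record if key not in allowed)
```

The design choice is to allow by default nothing and add fields deliberately. A blocklist would quietly let through every new field a firmware update starts sending.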
Regulations: All of the above are voluntary things that companies can do as a step toward protecting user privacy, or letting users have more say in how their data is used. But the strongest tools to protect privacy will come from regulatory pressure. This year, the world is about to get a massive amount of regulatory pressure in the form of the General Data Protection Regulation. This regulation was passed by the EU in 2016 and goes into effect in May. It acts as a safeguard for data. It enshrines some of the above items, such as needing a reason to collect a piece of data and providing transparency, but it also goes a lot further.
For example, it allows an individual to ask what a company knows about them, forces the company to correct wrong information, and requires the company to dump the user’s data upon request. It also prohibits profiling on the basis of data. These are only some of the regulation’s provisions, but in my conversation with Rustici, it became clear that the GDPR is so forward-looking that from a technical standpoint, we don’t have ways to actually implement some of these provisions yet.
For example, the ability to retract your permission to use data sounds good, but once that data is sold to a third party or combined to create new insights, how can that data be controlled? How can the new knowledge go away?
So while privacy is a huge challenge and one that we’re still wrapping our arms around, we also need to build tools to track each piece of data about us. Maybe even each piece of metadata. Then we need ways to claw that data back. All of this has to be scalable, which leads me to look to something like the blockchain as a way to track data.
We also need to develop a far more sophisticated understanding of what is known about us and how that knowledge can be applied. Which means that companies creating fun blog posts or heat maps based on a wide array of anonymized data should carefully consider how that information could be used.
We keep saying that data is the new oil, but oil is not a wholly harmless substance. We need to accept that data isn’t, either.