Residuals for everyone

—selling our data to teach AI

190813

As more companies and organizations come to rely on AI, more and more data will be needed to feed (and train) these powerful programs, but not all data is created equal. Some of it might be valuable, yet we might not be ready to share it. But if there were a way to secure our data and earn a fee every time it was used, would we be more willing to part with it?

Medical researchers start dabbling in AI, but hit a wall

Medical professionals are starting to tap into machine learning as a means of furthering their work, especially to find patterns that can help interpret their patients’ test results. Stanford ophthalmologist Robert Chang hopes to use eye scans to track conditions like glaucoma as part of this ongoing tech rush.


The problem, however, is that doctors and researchers have trouble gathering enough data, whether from their own patients or from others, because of the way that data is handled. A great deal of medical data is siloed under differing policies on sharing patient information, which makes it difficult to exchange patient metrics between institutions and, in turn, to reach a critical mass of data.

Kara and Differential Privacy

Oasis Labs, founded by UC Berkeley professor Dawn Song, securely stores patient data using blockchain technology that encrypts and anonymizes it while preventing it from being reverse engineered. It also offers monetary incentives to encourage participation: contributors could be compensated each time their data is used to train artificial intelligence.
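Oasis’ actual mechanism isn’t detailed here, but the per-use compensation idea can be pictured as a simple ledger that credits a contributor every time their record feeds a training run. The sketch below is purely illustrative: the DataUsageLedger class, the rate, and the IDs are invented, and nothing blockchain- or encryption-specific is modeled.

```python
from dataclasses import dataclass, field

@dataclass
class DataUsageLedger:
    """Toy ledger that credits a contributor each time their record is used.

    Only an illustration of the per-use compensation idea; the real platform
    relies on smart contracts and encrypted storage, neither modeled here.
    """
    rate_per_use: float = 0.05                    # hypothetical payout per training use
    balances: dict = field(default_factory=dict)  # contributor_id -> accrued payout
    uses: list = field(default_factory=list)      # audit trail of (contributor, model)

    def record_use(self, contributor_id: str, model_id: str) -> None:
        self.uses.append((contributor_id, model_id))
        self.balances[contributor_id] = self.balances.get(contributor_id, 0.0) + self.rate_per_use

# A training run that touches two contributors' eye scans:
ledger = DataUsageLedger()
for contributor in ["patient-001", "patient-002", "patient-001"]:
    ledger.record_use(contributor, model_id="glaucoma-model-v1")
print(ledger.balances)  # {'patient-001': 0.1, 'patient-002': 0.05}
```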


It’s not just the promise of money that could make patients more willing to submit their data. Song and Chang are trialling Kara, a system that uses differential privacy so that the AI is trained on data stored on Oasis’ platform while that data remains invisible to the researchers themselves.
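Kara’s internals aren’t described in this piece, but as a rough illustration of what differential privacy means in practice, here is a minimal sketch of the classic Laplace mechanism applied to a single summary statistic. Everything in it is hypothetical: the cohort readings, the clipping range, and the epsilon value are made up, and a real system like Kara would involve far more machinery than this.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of a single statistic.

    Adds Laplace noise scaled to sensitivity / epsilon, so the released
    value reveals little about any one patient's record.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Hypothetical example: share the average intraocular pressure of a cohort
# without exposing any individual patient's reading.
pressures = np.array([14.2, 16.8, 21.5, 18.1, 15.9])  # mmHg, made-up values
true_mean = pressures.mean()

# Sensitivity of the mean: one patient can shift it by at most (max - min) / n
# when readings are clipped to a known range, here assumed to be 5-40 mmHg.
sensitivity = (40 - 5) / len(pressures)

private_mean = laplace_mechanism(true_mean, sensitivity, epsilon=0.5)
print(f"true mean: {true_mean:.2f}, private mean: {private_mean:.2f}")
```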

Quality Matters

For the medical industry, access to quality data will become increasingly important as reliance on AI grows. Quality here is less about individual data points (though a grainy eye scan could throw off the model’s training) than about the data set as a whole.


AI systems are prone to bias depending on the data sets they are fed. To prevent this, a system will need particular segments of the population to contribute data to round out its “training,” and the incentives for doing so will need to be carefully weighed and valued. Training a medical AI intended for the general population, for instance, would require samples from a diverse group of individuals, including those with less common profiles; to encourage participation, compensation might be higher for that group.
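How such incentives would actually be set isn’t specified anywhere here; as one hypothetical way to “carefully weigh” them, the sketch below pays more per sample to groups that are under-represented relative to a target mix. The group labels, target shares, and the 5x cap are all invented for illustration.

```python
from collections import Counter

def incentive_multipliers(contributions, target_shares, base_rate=1.0):
    """Scale per-sample compensation so under-represented groups earn more.

    contributions: list of group labels, one per sample already collected
    target_shares: desired share of each group in the final training set
    Returns a per-group payment multiplier: target share / current share.
    """
    counts = Counter(contributions)
    total = sum(counts.values())
    multipliers = {}
    for group, target in target_shares.items():
        current = counts.get(group, 0) / total if total else 0.0
        # Groups with no samples yet get the largest boost (capped at 5x here).
        multipliers[group] = base_rate * min(target / current if current else 5.0, 5.0)
    return multipliers

# Hypothetical eye-scan collection skewed toward one group.
collected = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
targets = {"A": 0.5, "B": 0.3, "C": 0.2}
print(incentive_multipliers(collected, targets))
# -> roughly {'A': 0.71, 'B': 1.2, 'C': 4.0}: scarcer profiles earn more per scan
```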


Without such incentives, the designers of the AI could simply choose not to include certain groups, as has happened in the past, creating a discriminatory AI. In that case, the fault lies less with the machine doing the learning than with the people doing the teaching. Even so, the resulting discriminatory AI has very real power to change the course of people’s lives, for example by filtering out their job applications.

Data Ownership, Dividends and Industries

Despite these drawbacks, combining monetization with secure storage of personal data could signal the beginning of a new market in which individuals earn a fee for sharing data that otherwise would never have been shared at all; in essence, royalties for being ourselves, assuming we’re “valuable,” that is.


For the creative industry, the consensus is that for all its strides, AI has yet to evolve beyond being a very powerful assistant in the creative process. At present it can create derivative work that resembles the art it has been fed, but it still lacks anything we would recognize as inspiration. IBM, for example, used its Watson AI to create a trailer for a horror movie after feeding it 100 trailers from films of that genre, with each scene segmented and analyzed for visual and audio traits.


For now, the emerging data market doesn’t look lucrative enough to birth a new class of workers (we’re unlikely to all quit today to become walking data mines), but suppose the incentives were enticing and a company like Oasis could genuinely guarantee data privacy: would we see more creators willing to give up some of their work? Perhaps even unpublished work that would otherwise never be seen? Would quick file uploads, coupled with a hassle-free “for machine learning only” license, mean an influx of would-be creators hoping to earn data dividends from work they could also license elsewhere?


On one hand, it would give creatives a way to earn residuals from their work, given that AI needs thousands if not millions of samples and other outlets (such as websites for stock creative assets) might not pay as well. On the other, just as different data sets are needed for different purposes, we might see the emergence of a metrics-based classification system that tries to objectively grade subjective work and assign a value to it.


And if those works can be graded, so too can their creators, with all the opportunities that follow a “quality data” distinction. Maybe one day, when a program like Watson reaches celebrity-artist status, we’ll be able to brag to our peers, “yeah, I taught it that.”