Doctors Swear to 'Do No Harm.' Why Don’t Data Scientists?

By Tom Cassauwers


Because data science wields great power, but not always equal responsibility.

Two researchers from Stanford University recently claimed to have constructed a machine learning algorithm that predicted sexual orientation from facial pictures. The researchers recognized the dangers in this themselves, observing that “in some cases, losing the privacy of one’s sexual orientation can be life-threatening.” The original study, a good example of the risks involved in data science research, is now under ethical review.

We live in an age where data is described as the new oil. Large amounts of information are collected and stored, and increasingly complicated techniques like machine learning are used to derive insights from it. But with such great insight comes great risk, not to mention tremendous power to influence people’s lives. Toon Borré, who heads the data department of consultancy firm TriFinance, remembers refusing a request to determine when it would be financially cheaper for a hospital to let someone die on the operating table. “We hear all the time about how data is the new oil,” says data scientist Charles Givre, “but I would argue that if mishandled, data can also be the new TNT.”

One way to deal with these types of unsavory, and potentially explosive, outcomes is a voluntary code of conduct for data scientists, similar to the Hippocratic oath in medicine. Professionals in many fields, including law, journalism and medicine, already follow such codes of conduct, and data scientists should be next.

Data scientists are the people building the algorithms and deriving insights from data. They often hail from disparate backgrounds, the subject generally can't be studied in school, and the field spans diverse areas such as database management, programming and statistics, as well as machine learning and data visualization. Data science also carries ethical risks, but, surprisingly, ethical considerations are not always top of mind. “Right now questions about ethics, when they appear at all, are often only asked at the end, when the algorithm is already developed,” says Sabina Leonelli, a professor and philosopher of science at the University of Exeter.


What rules and procedures should be in such an oath is open for debate. “I do not think a code of conduct should stop a technology whenever there is a potential bad use of it,” argues Leonelli. “Every technology will have problematic implications depending on the context in which it is used.” Jacob Metcalf, a data ethics researcher at Data & Society, agrees that context matters as there is often no clear right or wrong with such data technologies. “What we should invest in,” he says, “is the adoption of rigorous structures that force researchers to think about these questions.”

A code of conduct could involve communicating about risks with experts and stakeholders, making alterations to algorithms to prevent misuse or making algorithms open source. Such a code could also interact with law and state regulation, which could help establish parameters for what should be done, particularly with personal data. Yet state regulation of data science has its challenges. “The problem is that technology is moving so fast that there is no way any legal framework can regulate future technologies,” says Leonelli. “The law will always fall behind.” A code of conduct, however, would be able to keep up with technology.

It is hard to find anyone in the data science world who explicitly opposes a code of conduct, but even its proponents are skeptical about its practical implementation. For one thing, few professional associations for data scientists exist, and the field lacks clear boundaries. Educational institutions and informal meet-ups, though, might provide a staging ground for a code to develop.

Still, the lack of a professional body would make a code of conduct hard to enforce; such codes generally rely on professional licensing, as in the medical and legal professions. “Unless there is an enforcement mechanism or a body that can pass judgment,” says Metcalf, “a code of conduct is mostly useful for peer pressure.”

Leonelli says that ultimately the development of a code of conduct or oath for data science would need to be “a bottom-up initiative by researchers.” Nevertheless, a code of conduct for data scientists makes sense. They work with risky technologies that are hard to officially regulate, and a professional code could act as a first line of defense against misuse, avoiding problems downstream. In the end, some sort of oath just might be hard to avoid for data science.