Academia-driven development

In this post I present an opinionated (and mostly-wrong) account on programming in academia. It's based in part on my experience as a developer working in academia and in part on conversations I had with fellow developers working in the private sector.

In academia, you rarely program even if you are in computer science. Instead, you read papers, write papers, write deliverables for ongoing projects, or write project proposals. You aren't paid to program, you are paid to publish. You do only as much programming as is needed to have something to write a paper about. “Scientists use programming the way software engineers use public transport – just as a means to get to what they have to do,” observes Bozhidar Bozhanov. While programming purists may be dissatisfied with that, Jason Baldridge is content with this state of affairs and writes: “For academics, there is basically little to no incentive to produce high quality software, and that is how it should be.”

Albert Einstein allegedly said this: “If we knew what it was we were doing, it would not be called research, would it?” While the attribution of this quote is dubious at best, there's a grain of truth in what the quote says. It's natural in research that you often don't know what you work on. I think this is the reason why test-driven development (TDD) is not truly applicable in research. Programming in research is used to explore new ideas. TDD, on the contrary, requires upfront specification of what you are building. German has the verb ‘basteln’ that stands for DIY fiddling. The word was adopted into the Czech spoken language with a negative connotation of not knowing what you're doing, which I think captures nicely what often happens in academic programming.

The low quality of academic software hinders its maintenance and extensibility in long-term development. For one-off experiments these concerns aren't an issue, but most experiments needs to be reproducible. Academic software must allow to reproduce and verify the results that are reported in the publication associated with it. Anyone must be able to re-run the software. It must be open-source, allowing others to scrutinize its inner workings. Unfortunately, it's often the case that academic software isn't released or, when it's made available, it's nigh impossible to run it without asking its creators for assistance.

What's more is that usability of software is hardly ever a concern in academia, in spite of the fact that usable software may attract more citations, thereby increasing the academic prestige of its author. An often-mentioned example of this effect in practice is Word2vec, the paper of which boasts with 1305 citations according to Google Scholar. Indeed, it would be a felicitous turn if we reconsidered the usability of academic software as a valuable proxy that increases citation numbers.

A great benefit that comes with reproducible and usable software is extensibility. Ted Pedersen argues that there's “a very happy side-effect that comes from creating releasable code—you will be more efficient in producing new work of your own since you can easily reproduce and extend your own results.” Nonetheless, even though software may be both reproducible and usable, extending a code base without tests may be like building on quicksand. This is usually an opportunity for refactoring. For example, the feature to be extended can be first covered with tests that document its expected behaviour, as Nell Shamrell-Harrington suggests in surgical refactoring. The subsequent feature extension must not break these tests, unless the expected behaviour should change. I think adopting this approach can do a great good to the continuity of academic development.

Finally, there's also an economic argument to make for ‘poor-quality’ academic software. If software developed in the academia achieved production quality, it would constitute a competition to software produced in the private sector. Since academia is a part of the public sector, academic endeavours are financed mostly from the public funds. Hence such competition with commercial software can be considered unfair. Dennis Polhill argues that “unfair competition exists when a government or quasi-government entity takes advantage of its tax exemption and other privileges to supply private goods to the market in competition with private suppliers.” Following this line of thought, the public sector should not subsidize the development of software that is commercially viable and can be built by private companies. Instead of developing working solutions, academia can try and test new prototypes. If released openly, this proof-of-concept work can be then adopted in the private sector and grown into commercial products.

Eventually, when exploring my thoughts on academia-driven development, I realized that I'm torn between settling for the current status quo and pushing for emancipating software with publications. While I'm stuck figuring this out, there are laudable initiatives, such as Semantic Web Developers, which organizes regular conference workshops that showcase semantic web software and incite conversations about the status of software in academia. Let's see how these conversations pan out.

No comments :

Post a Comment