The digital revolution comes to marine science
Keywords: digitalization, data processing, technology
In The Structure of Scientific Revolutions, Thomas Kuhn introduces the concept of paradigm shifts. Science, he claims, does not evolve by gradually accumulating new discoveries and knowledge that converge towards completeness and accuracy. Instead, at intervals scientific discoveries are fundamental enough to cause a paradigm shift, changing the very foundation of how we organize and interpret knowledge. Kuhn meant to describe scientific revolutions, but it is tempting to apply the concept to other fields and identify similar or analogous paradigm shifts. One such analogy, closely linked to science itself, concerns the machinery used to produce knowledge. Modern knowledge production often involves instruments, technology, and engineering. New digital tools are introduced at an accelerating pace, and the sheer volume of collected data is difficult to process with traditional methods. In recent years, machine learning and artificial intelligence have increasingly been used to automate the analysis of the collected data. Here we argue that it will be necessary to digitalize knowledge production to take full advantage of new technologies, and that this transformation does not occur as a series of gradual changes. Instead, it represents a paradigm shift which will have profound consequences for the way we do science in the future.
A brief history of fisheries and fisheries science
Norway has a long coastline and is surrounded on three sides by rich oceans. Unsurprisingly, fisheries have always been an important source of food and wealth for the people living here, and for centuries fish has been traded nationally and across borders (5). Egil's saga tells of stockfish (i.e. dried cod) exported to Britain as early as 875 AD, and fish has remained among Norway's most valuable commodities ever since.
In the early twentieth century, Norway decided to develop its fisheries using scientific methods (2, 6). The primary task was for the scientists to improve the efficiency of fisheries, to help fishermen find and catch more fish with less effort. Scientists would study the geophysics of the ocean, the biology of fish, and the dynamics of the ecosystem.
This proved fruitful, and a few decades later, advances in knowledge and technology had made fisheries so effective that many fish stocks were threatened by depletion and even by extinction. The original task had changed, and in the last decades of the century, science changed its aim from efficiency to sustainability. Fishing methods and equipment were restricted and catch volumes limited by quotas to ensure that stocks could maintain a productive biomass. We can see this as the first paradigm shift in fisheries science. As the march of technology has progressed, a second paradigm shift has recently arrived on our doorstep, in the form of digitalization (4, 7).
Over decades of fisheries science, technology improved, but in many ways the scientific methods remained essentially the same. The marine scientist (we can imagine a bearded and bespectacled man with a grave air and a framed university diploma on his wall) would go on surveys with a research vessel to collect his specimens and measurements, meticulously setting up his tables and diagrams and drawing his conclusions. Marine science was a craft, and the marine scientist was the master craftsman, whether working alone or surrounded by his journeymen and apprentices.
Although scientists, like many other craftsmen, have adopted and sometimes invented new technology to support and improve their craft, digitalization goes beyond just the use of computerized tools. Digitalization, at least for our purposes here, is a systemic change to how information is collected and processed, so that the structure of operations and flows is automated (1).
Digitalization, then, is to traditional science what industrialization was to craftsmanship. The qualities of the product no longer depend on the individual skill and intuition of the craftsman, but are instead inherent in a defined process. Uniqueness and the personal touch give way to scalability and measurable defect rates. This is a profound change, and when it comes to digitalization of science, we are nowhere near its completion.
What does industrialization mean for information?
One defining difference between a craft and an industry is that the craft is about the product, whereas industry is about the process. The craftsman works on one product at a time, using his tools and his skill to create the end product. Using deep knowledge of his tools and processes, he turns carefully selected materials into the best they can be, combining practicality and aesthetics into a unique object.
In contrast, an industrial process applies the same techniques over and over, aiming for a product that is reproducibly uniform. To aid this process, outputs and inputs are standardized as much as possible. Subjectivity and skill are replaced by quantified performance indicators and minimum error margins.
An important distinction between an industrial process and a craft is where the knowledge and skill are located. In crafts, the skill resides with the master craftsman. He may be surrounded by acolytes and journeymen, who develop their own skills by listening to and observing the master. The secrets of his trade, often in the form of tacit knowledge, are either passed on to his disciples or die with him.
In an industrial process, the knowledge is embedded in the process itself. Many people can work to improve the process, and the improvements remain, in the form of written standards, tools, machinery, and documented routines. In most cases, there is no single person who understands the complete process in detail.
Where we usually think of industrial processes as something that concerns physical objects, science is mainly about information. Scientists collect data from observations, they process and refine data, they combine data from different sources, and they extract knowledge from the analyses. In the end, the results are published in scientific journals.
Similar to physical processes, an information process can be broken down into modules that perform specific operations, and mechanisms that connect modules so that the output of one module can be used as input to another. Where a factory consists of physical machinery, the information process consists of a series of computer programs.
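The modular structure described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the record fields (`species`, `length_cm`) and the two modules are hypothetical and stand in for real processing steps. The point is that every module has the same interface (records in, records out), so that the output of one can be fed directly to the next without manual handling in between.

```python
from functools import reduce
from typing import Callable

# A module is any function that takes a list of records and returns a
# list of records. Field names below are hypothetical examples.
Record = dict
Module = Callable[[list[Record]], list[Record]]


def drop_missing(records: list[Record]) -> list[Record]:
    """Remove records that lack a length measurement."""
    return [r for r in records if r.get("length_cm") is not None]


def add_length_mm(records: list[Record]) -> list[Record]:
    """Add a derived field without modifying the input records."""
    return [{**r, "length_mm": r["length_cm"] * 10} for r in records]


def build_pipeline(*modules: Module) -> Module:
    """Connect modules so that each output becomes the next input."""
    def run(records: list[Record]) -> list[Record]:
        return reduce(lambda data, module: module(data), modules, records)
    return run


pipeline = build_pipeline(drop_missing, add_length_mm)

raw = [
    {"species": "cod", "length_cm": 62.0},
    {"species": "cod", "length_cm": None},
    {"species": "haddock", "length_cm": 41.5},
]
result = pipeline(raw)
```

Because the connection between modules is itself code, adding, removing, or reordering steps requires no manual transfer or reformatting of data.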
The use of technology does not necessarily mean industrialization, and it is possible to use improved tools without substantially changing the overall process. Sometimes the improved technology just replaces one tool with a better one that is still used in the hands of an operator.
Even with advanced instruments and platforms, the data collection process can be manual and require real-time decisions and other human intervention. Examples of this are images captured by machines that need manual scrutiny by human experts, observation platforms and remote sensing systems that need monitoring by operators, and unmanned vessels piloted by remote control.
Similarly, merely using computational analysis tools is not sufficient. While data processing and analysis steps are performed with a computer, there are usually manual steps of naming, transferring, reformatting, converting, and editing data. Or, to put it another way, technological advances can improve individual steps, but digitalization is about what happens between steps.
The advantages of industrialized science and why we need them
At first glance, industrialization was a matter of scale. From smiths and weavers producing goods for their village, to factories that could supply entire countries.
A manual process can be scaled up by adding more manpower, but even then, individuals vary in efficiency and skill, and coordination, training, and management introduce costs that increase superlinearly.
In marine science (and almost everywhere else) new technologies have changed how we collect and generate data. In the lab, mechanical tools like automatic pipetting robots have been available for decades, and more recently, PCR machines, flow cells, DNA sequencing, microscopes, and many other machines automate the extraction of data from samples. In the field, advanced sensors are similarly collecting data with ever increasing volumes and complexities, increasingly mounted on unmanned platforms or autonomous vessels.
These advances have increased the volumes of data enormously, and any remaining manual steps in the processing have rapidly become bottlenecks that limit the usefulness, and consequently the value, of the collected data. But when the whole process is automated end to end, scalability becomes a matter of adding more machinery.
The best violins come from the workshop of Stradivarius, not from the factories of Yamaha. A hand-built Rolls-Royce is a luxurious and coveted vehicle, whereas a Toyota is a tool that gets you from one place to another. On the surface, it might seem that the difference between craft and industry is a trade-off between quality and quantity. Quality here refers to a set of desirable properties in the end product, like durability, aesthetics, or performance, but which go beyond the mere minimum needed to function.
A central goal (and benefit) of industrialization is that each unit produced is interchangeable. All else held equal, increased product quality is a plus. At the same time, production at scale makes avoiding extra unit cost important. Usually, reducing the number of defective units produced is more important than increasing the average quality, and a few units of better than expected quality rarely make up for a number of defective units. The industrial process is therefore aimed at sufficiency rather than excellence, and consequently at reducing variance to keep results within the acceptable range. We might say that low variance is a quality of the industrial process.
We may think of science as driven by creativity and curiosity, but the scientific process relies on reproducibility and rigor. Reproducibility requires predictability, and low variance becomes a quality of the scientific process as well, something the industrialization of science can contribute to. As with industrial production of goods and services, reducing the variance of the collected data and of subsequent analyses contributes to the reuse value of the results, and to the generality of the process.
Interchangeability is not just for end products. When each operation in a process is clearly defined and delineated, it can be replaced by another that conforms to the same constraints. By measuring the changes to the output as components are modified or replaced, an evolutionary process of improvement is possible, increasing efficiency and reducing waste. Automation means that the whole process can be subject to experimentation to determine how quality and variance of the output depend on each operation and input.
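As a small, hedged illustration of this kind of experiment, consider a summarizing step that can be implemented either as a mean or as a median. Both conform to the same constraint (a list of numbers in, one number out), so one can be swapped for the other and the effect on output variance measured. The batches below are invented for the example, with one deliberately mistyped value.

```python
from statistics import mean, median, pvariance

# Invented length measurements (cm) from three batches of the same
# population. The second batch contains one mistyped value (500.0).
batches = [
    [50.0, 52.0, 51.0, 49.0, 50.0],
    [50.0, 51.0, 500.0, 49.0, 52.0],
    [51.0, 50.0, 49.0, 52.0, 50.0],
]


def output_variance(summarize, batches):
    """Run one interchangeable summarizing step on every batch and
    return the population variance of its outputs across batches."""
    return pvariance([summarize(batch) for batch in batches])


# Swap the component, keep everything else fixed, compare the outputs.
var_with_mean = output_variance(mean, batches)
var_with_median = output_variance(median, batches)

print(f"variance using mean:   {var_with_mean:.3f}")
print(f"variance using median: {var_with_median:.3f}")
```

In this constructed case the median gives far less variable output than the mean. Which component is preferable in practice depends on the data and the purpose; the sketch only shows that the comparison itself can be automated once the components are interchangeable.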
Craftsmanship can produce unique objects that maximize the potential of the raw materials and which are tailored to a particular use and a particular user. Swiss watches and Fabergé Easter eggs are examples that represent the pinnacle of excellent craftsmanship.
But crafts fail us in building larger and more complex structures. To build a railway, an airplane, or a high-rise building requires an industrial process. No single mind, no matter how talented, experienced, or educated, is able to encompass the complete picture, to understand or attend to the intricate details of all subsystems and components. It is only through the level of abstraction provided by standard gauges, calibers, volumes, and rates that complex systems can be designed and assembled.
For instance, a vast number of factors including geophysical variables, predator and prey relationships, migration, and anthropogenic influences interact in complex ways to affect ecosystems. Models that take these factors into account can in theory yield better understanding of causal relationships, and thus more accurate predictions than simpler models. Yet integrating a wide array of ecological, physical, and chemical mechanisms and estimating parameters from a large variety of data sources is an ambitious task. Like assembling an airplane or an oil refinery, the number of moving parts is too high for any one individual or even a team to fully understand the system from the ground up.
A digitalized marine science needs to be built from standardized data collection and processing components that provide useful abstractions that can be fitted together to form larger, modular designs. Each module as well as the overall design can be measured against quantitative performance metrics and incrementally improved. This would allow aggregate designs of a higher complexity.
Obstacles to digitalization
To digitalize a process, the aim needs to go beyond mere application of technological tools. The goal should be the elimination of any and all human intervention. This implies a transformation not only of the technical process, but also of how the people implementing the process need to work and think.
Whenever a step in the data processing chain is performed manually, it opens the door for human errors and inconsistencies that obstruct automation of subsequent steps. Where another person might immediately recognize and mentally correct a misspelling, and to some extent be able to work with a you-know-what-I-mean mentality, machines and computer programs are much less forgiving. It is often the case that errors introduced manually must be corrected manually, and technicians who work with machine learning often complain that they spend most of their time organizing and correcting data and much less actually running analyses.
Even when each step in the process is completely deterministic and records its parameters, input and output are often managed manually. Data to be processed must be pulled from its source and arranged appropriately for the analysis. Results must be annotated with metadata and organized in a form suitable for long term storage, retrieval, and future use. When these steps are performed manually, the same opportunities for errors and deviation from standards arise.
Scientists recognize the value of technology and they appreciate better tools as much as the next person. However, they will tend to think of incremental improvements, technological solutions that streamline their existing ways of working. They envision advanced interactive tools with high flexibility and user choice.
This view is often at odds with effective automation, where it is paramount to reduce and not increase complexity. A high degree of user flexibility means more opportunities to introduce variance in the output. Analytic methods and software are generally deterministic and their output reproducible, given the same data and configuration parameters. But if this configuration is interactive, it opens the door to error. And in general, any human intervention means opportunities for human errors that contaminate the data.
The industrial process may be necessary, but it is usually not fun, interesting, or charming. The prestige of the master craftsman is rarely afforded the process engineer, much less the assembly line worker. When all produced units are interchangeable, as is the contribution from each worker who may not even get to see the final product, there is little to be proud of for any individual worker.
The desire for low variance often leads to a constant regime of checks, measurements, and performance indicators. These intrusions in the process seem unnecessary and cumbersome to the craftsman, who has subjective expertise to evaluate product excellence directly. Such measurements are therefore rarely popular.
Fear of job loss is one reason for resistance, but perhaps more important is fear of losing prestige. The craftsman is admired for his best products. They define him to others and to himself as a master of his craft, an expert whom others envy, admire, or seek to emulate. When his craft is replaced by factories, it is a devaluation of the craft itself, and by extension, of the master craftsman.
Towards an information industry for science
Digitalization, as we have defined it here, applies to a process. It is not about improving individual processing steps, but about how multiple automated steps can be combined into a complete information factory. Such a design has advantages of scalability, flexibility, predictability, and reliability. However, it represents a profound change in mindset, and for both psychological and technical reasons, many attempts at digitalization fail or see limited success. For a successful digitalization effort, the following points should be kept in mind:
Simplify and standardize everything
Unlike machines, humans excel at dealing with the unforeseen. We can quickly see when something is wrong, and take steps to remedy or work around errors. For an automated step to work effectively, on the other hand, its inputs must be predictable. This means having simple, standardized, and clearly defined data types and interfaces. Complexity of each component should be kept as low as possible. It is important to maintain a clear separation between the abstract data model and implementation details, and to keep the latter out of data exchange or archival formats.
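One possible way to keep the data model separate from implementation details is sketched below. The `CatchSample` type, its fields, and the `"COD"` code are hypothetical and chosen only for illustration. The in-memory type is a Python dataclass, but the exchange form is plain JSON with an explicit unit in the field name and an ISO 8601 timestamp in UTC, so nothing specific to Python leaks into files that are exchanged or archived.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CatchSample:
    """Hypothetical data model for one measured fish."""
    species_code: str
    length_cm: float
    sampled_at: datetime  # expected to be timezone-aware


def to_exchange(sample: CatchSample) -> str:
    """Serialize to JSON, with the timestamp as ISO 8601 in UTC."""
    fields = asdict(sample)
    fields["sampled_at"] = sample.sampled_at.astimezone(timezone.utc).isoformat()
    return json.dumps(fields, sort_keys=True)


def from_exchange(text: str) -> CatchSample:
    """Rebuild the in-memory type from the exchange form."""
    fields = json.loads(text)
    return CatchSample(
        species_code=fields["species_code"],
        length_cm=float(fields["length_cm"]),
        sampled_at=datetime.fromisoformat(fields["sampled_at"]),
    )


sample = CatchSample("COD", 62.0, datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc))
text = to_exchange(sample)
restored = from_exchange(text)
```

Any program in any language that can read JSON can consume the exchange form, and the round trip through `from_exchange` returns a value equal to the original.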
Aim to eliminate all manual intervention
Any manual intervention or decision introduces the opportunity for human error, and the resulting unpredictability can wreak havoc on any later automated processing. Automation should therefore start from the beginning of the processing chain and push human intervention to as late a stage as possible. Whenever human intervention is unavoidable, inputs and outputs should be subjected to automated validation to identify and eliminate possible mistakes.
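Automated validation of manually entered data might look something like the following sketch. The controlled vocabulary, the field names, and the plausible length range are all assumptions made for the example, not established standards. Each record is checked against explicit rules, and records that fail are set aside together with the reasons, instead of silently passing on to later steps.

```python
# Hypothetical controlled vocabulary and plausibility limit.
KNOWN_SPECIES = {"cod", "haddock", "saithe"}
MAX_LENGTH_CM = 200


def validate(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if valid)."""
    errors = []
    species = record.get("species")
    if species not in KNOWN_SPECIES:
        errors.append(f"unknown species: {species!r}")
    length = record.get("length_cm")
    if isinstance(length, bool) or not isinstance(length, (int, float)):
        errors.append("length_cm is missing or not a number")
    elif not 0 < length <= MAX_LENGTH_CM:
        errors.append(f"length_cm out of range: {length}")
    return errors


def partition(records: list[dict]):
    """Split records into accepted ones and (record, errors) rejects."""
    accepted, rejected = [], []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected


entered = [
    {"species": "cod", "length_cm": 62.0},
    {"species": "Cod ", "length_cm": 62.0},     # stray capital and space
    {"species": "haddock", "length_cm": 4150},  # likely a unit or typing slip
]
accepted, rejected = partition(entered)
```

A human reader would probably accept "Cod " without a second thought. The validator does not, which is exactly the property needed to protect the automated steps that follow.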
Involve users, but prudently
When building any kind of software, it is important to involve users in the design. Working closely with existing practitioners is invaluable, and craftsmen usually know best what a successful product should look like. In terms of process, however, they will tend to think along traditional lines, and may not be prepared to make necessary trade-offs. In the words of Kuhn (3):
What man sees depends both upon what he looks at and also upon what his previous visual-conceptual experience has taught him to see.
Successfully constructing an information factory requires a change in mindset, and resistance is to be expected, especially from people who have a deep personal investment in the existing process.
Know when not to
Digitalizing a process is an ambitious undertaking, even for relatively simple processes. To build complex systems and reap the potentially large benefits has substantial costs and a high risk of failure. It is tempting to focus only on the possible benefits, but a thorough analysis taking all factors into account should be performed before any decision is made. Ambitious IT projects often fail, and in many cases, the correct decision would have been to keep things as they are.
References
[1] Brennen, J Scott and Kreiss, Daniel, Digitalization, Wiley Online Library, 2016.
[2] Jentoft, Svein and Finstad, Bjørn-Petter, Building fisheries institutions through collective action in Norway, Springer, 2018.
[3] Kuhn, Thomas S, The structure of scientific revolutions, University of Chicago Press, Chicago, 1997.
[4] Mnatsakanyan, AG and Kharin, AG, Digitalization in the context of solving ecosystem problems in the fishing industry, 2021.
[5] Nielssen, Alf Ragnar, Indigenous and early fisheries in North-Norway, Citeseer, 2001.
[6] Solhaug, T and Saetersdal, G, The development of fishery research in Norway in the nineteenth and twentieth centuries in the light of the history of the fisheries, Royal Society of Edinburgh Scotland Foundation, 1972.
[7] Uriondo, Zigor and Fernandes-Salvador, Jose A and Reite, Karl-Johan and Quincoces, Iñaki and Pazouki, Kayvan, Toward Digitalization of Fishing Vessels to Achieve Higher Environmental and Economic Sustainability, ACS Publications, 2024.