Virtual virtuosity: Czech device lets musicians play together across vast distances  

A device developed by Czech researchers enabling musicians thousands of kilometres apart to play together virtually – with no perceptible delay – proved invaluable during the pandemic. So much so, in fact, that the unique device won a European Heritage ‘Europa Nostra’ Award, for helping bring together students and teachers of classical music, as well as entire ensembles. The idea to develop such a device actually came to Dr Sven Ubik of the Czech Technical University years ago, while attending a concert also broadcast on television.

Dr Sven Ubik and fellow researchers within the Czech Education and Scientific NETwork (CESNET) developed the device over a number of years, in cooperation with the Music and Dance Faculty of the Academy of Performing Arts in Prague (HAMU). The ground-breaking technology allows musicians to play together, via an audio-visual hook-up, over vast distances with minimal latency – a delay of just a few milliseconds.

I visited Dr Ubik in his laboratory, packed with humming PCs alongside the whisper-silent device, and began by asking him its official name.

“The official name is MVTP – that’s an acronym for Modular Video Transmission Platform, and it’s a device for low-latency video and audio transmissions over a computer network.

“The added latency of the transmitter and receiver together is about 1 millisecond. So, even when network propagation delay is added, the end-to-end latency is still very low.

“It allows special applications like connecting musicians together in different cities so that they can play together – they can see and hear each other with very low latency.”

How long was this in development and how did the idea begin? I know that in 2020 you won the Europa Nostra award, and it was very timely because of the Covid pandemic…

“It started about eight years ago – obviously, there were some previous versions in the development stages, so this particular device was developed over about two years, but the work started much earlier.”

I understand that you were actually at a concert when the idea came to you. You recorded it with a phone and saw that the delay was quite extreme – is that right?

“Once I was at a concert that was broadcast on television, and I was curious about the delay between the live concert and the broadcast.

“So, using my mobile phone, I recorded the broadcast and the actual sound, and the gap between them was almost a minute.

“We were curious if we could make it fast enough that musicians could comfortably play together in different cities.

“In the end, it happened, so we organised several concerts in different countries connecting two organists – Prof. Jaroslav Tůma was playing in Brno together with another organist in Trondheim, in Norway, some 2,000 kilometres away.”

Sven Ubik | Photo: Brian Kenety, Radio Prague International

And what year was that concert?

“I think that concert was in 2016. There were other interesting concerts after that.

“For instance, in 2018 – because it was 100 years since the founding of Czechoslovakia, we organised a concert between Prague and Bratislava.

“And some concerts over large distances, like between somewhere in the Czech Republic and the New World Symphony in Miami.

“With distances so large only some kinds of music can be played together. Generally, with classical music, you need a latency of no more than 20 milliseconds.”

So, there’s a difference in requirements for different kinds of music – is classical music less demanding than say jazz?

“Classical music is generally the most demanding – it needs the lowest delay. If music includes some strong beat, then usually a larger delay can be tolerated.

“For instance, over the distances between cities in Europe, the network propagation delay alone – which is limited by the speed of light in optical cables – is between 10 and 20 milliseconds.

“Which implies that we have just a few milliseconds left that we can use for our equipment to connect musicians comfortably.”
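
As a rough illustration of the latency budget described above, the following Python sketch estimates one-way propagation delay in optical fibre and what remains of a 20-millisecond tolerance. The refractive index, the `route_factor` parameter and the function names are assumptions made for illustration, not figures from CESNET; the 1-millisecond equipment delay and the 2,000-kilometre distance are taken from the interview.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum, km/s
FIBRE_REFRACTIVE_INDEX = 1.468      # assumption: typical value for silica fibre


def propagation_delay_ms(distance_km, route_factor=1.0):
    """One-way propagation delay in fibre, in milliseconds.

    route_factor is a hypothetical multiplier for how much longer the
    cable route is than the straight-line distance (1.0 = straight line).
    """
    speed_in_fibre = SPEED_OF_LIGHT_KM_S / FIBRE_REFRACTIVE_INDEX
    return distance_km * route_factor / speed_in_fibre * 1000.0


def remaining_budget_ms(distance_km, tolerance_ms=20.0,
                        equipment_ms=1.0, route_factor=1.0):
    """Latency left over after propagation and equipment delay.

    tolerance_ms: the roughly 20 ms limit quoted for classical music.
    equipment_ms: the roughly 1 ms added by transmitter plus receiver.
    A negative result means the tolerance is already exceeded.
    """
    used = propagation_delay_ms(distance_km, route_factor) + equipment_ms
    return tolerance_ms - used


if __name__ == "__main__":
    for km in (300, 2000, 9000):
        print(f"{km:>5} km: propagation {propagation_delay_ms(km):6.2f} ms, "
              f"remaining {remaining_budget_ms(km):6.2f} ms")
```

Under these assumptions a straight 2,000-kilometre fibre run costs roughly 10 milliseconds one way. Real cable routes are longer than the straight-line distance, which `route_factor` can approximate, and that is why so little of the budget is left for the equipment itself.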

And the device that you’ve developed here at CESNET doesn’t require a PC or anything else, just one group of musicians has one unit, the other has another unit, and then you program in the IP address?

“Exactly. There is no PC. The major part of achieving low latency is that all functionality is programmed in an FPGA – that’s a Field-Programmable Gate Array – so it’s basically implemented in programmable hardware.

“There’s no operating system involved, and therefore the latency is very small and stable.”

If you were to do it through a PC, that would add precious milliseconds.

“Yes, that’s exactly the problem.”

I’d like to go back a bit and ask what sparked your interest in the technology.

“I think it started on both sides – on one side, musicians and music teachers in academies were interested in whether they could have distance-learning master classes.

“And on our technical side, we were curious if we could make transmissions with such small latency and other parameters sufficient for this application.”

What did the onset of the Covid pandemic mean for the project? Was there suddenly not only more interest but also greater funding?

“I think, yes, it sparked more interest. But as I said, we started the project long before the pandemic because there was already a need to connect musical academies.

“We tried to help with some events during the pandemic, but it was not a new experience for us.”

And the award – the Europa Nostra award – did that change things for the project?

“Of course, we very much appreciated that we received this award. Some people maybe now believe more that it works, really.

“There is also some kind of a [sense of] duty for us to continue in developing and improving this thing and helping institutions collaborate together.”

Right now, we’re looking at images of ourselves on two different screens. Was there something you wanted to demonstrate?

“Yes – you can see how low the latency is for the video. We see two monitors: one is connected directly to the camera, the other monitor is connected after video compression, transmission and decompression.

Jaroslav Tůma | Photo: Prague Spring Festival

“You can hardly see which monitor is the first one and which is the second – because the picture looks the same.”

Right, with the naked eye we cannot perceive the difference. But of course, these monitors are very, very close together.

“Yes, there’s no network propagation delay now. But there is a delay inside the boxes, inside the transmitter and the receiver.

“But if you arranged these using two PCs with some software, you would definitely see some delay just from that hardware and software.”

What has changed with this model since that first concert, in 2018?

“Well, we now support compression that reduces the bit rate in the network more than previously. And there are improvements such as having no fan inside – it just uses passive cooling, so it’s completely quiet.

“It makes no noise, like a PC would, and can be placed in a music environment.”

No noise – like we have now in this lab.

“Yes, yes.”

Are there many competitors, so to speak, different research institutions that have developed similar technologies? And has it been commercialised – do you sell or rent them?

“Yes, there are other technologies for low-latency audio transmission, like Dante Audio, for instance. But I think this is unique in that it combines audio and video and both have very low added latency.

“You can buy it now – we have a reseller and can provide it on a commercial basis. But we also collaborate with musical academies and try to help connect them as part of our research programme.”

What’s the limitation in terms of distance – what’s the farthest apart these machines can be that the musicians won’t notice the latency?

“There is no strict limit. We did, for instance, a concert between Prague and Taiwan or South Korea. But, of course, with such large distances, there is network propagation delay that limits the kind of collaboration that can be arranged.”

So, in that case it wouldn’t be a concert but perhaps a master class.

“It definitely can be a master class which connects a student and a teacher together – it could also be a concert if artists react one to the other in sequence or in some arrangement, not really playing together over such a long distance.”

Is there anything I haven’t asked you about that you’d like listeners to understand?

“Maybe that we achieved very low latencies in part by implementing everything in hardware, as I said, but also through a specialised codec for video that adds only a few lines of latency.

“Also, there’s a smart algorithm that keeps the transmission stable even with a very small receiver buffer – that’s the usual problem, that most video transmission equipment has a large receiver buffer.”
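
To give a sense of what “a few lines of latency” might mean in time, here is a small Python sketch. It assumes a 1080p signal at 60 frames per second with 1,125 total lines per frame (the usual raster for that format), and it picks eight lines as a hypothetical codec delay; the interview does not state the video format or the exact number of lines.

```python
def line_time_us(frame_rate_hz=60.0, total_lines=1125):
    """Duration of one video line in microseconds.

    Assumption: 1080p at 60 frames per second, whose raster has
    1125 total lines per frame including blanking.
    """
    return 1e6 / (frame_rate_hz * total_lines)


def codec_latency_ms(lines, frame_rate_hz=60.0, total_lines=1125):
    """Latency in milliseconds of a codec that delays the signal by `lines` lines."""
    return lines * line_time_us(frame_rate_hz, total_lines) / 1000.0


def frame_time_ms(frame_rate_hz=60.0):
    """Duration of one full frame in milliseconds, for comparison."""
    return 1000.0 / frame_rate_hz


if __name__ == "__main__":
    hypothetical_lines = 8  # illustrative only, not a figure from the interview
    print(f"one line:    {line_time_us():.2f} microseconds")
    print(f"{hypothetical_lines} lines:     {codec_latency_ms(hypothetical_lines):.3f} ms")
    print(f"full frame:  {frame_time_ms():.2f} ms")
```

On these assumptions a handful of lines amounts to a small fraction of a millisecond, whereas a codec that has to buffer a whole frame before sending would add more than 16 milliseconds at 60 frames per second, most of the budget a classical ensemble can tolerate.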

What else are you working on? I saw that in 2017, I think, you gave a presentation on cultural heritage and technology at museums…

“Yes, that’s another area of our research – we also help museums and similar institutions to bring their cultural heritage onto the internet in modern ways.

“We create interactive, 3D models of selected collection items such that visitors can virtually manipulate these items – look inside, open them, or see how the item looked before and after restoration.

“So, that’s another area of our research.”

So, to make museum exhibitions more interactive and more tangible to the layperson.

“Yes, it’s useful both for physical exhibitions – like, you may have a statue and in front of it there’s a touch panel and you must reconstruct it.

“And it’s also useful for the online work of museums, to present their masterpieces on their websites in a more attractive way.”