There is a strong and growing demand for digitally sovereign video conferencing solutions. We at Kopano provide a tool for this with Kopano Meet; other successful players on the market include Big Blue Button and Jitsi. However, we all share one problem when competing with Zoom, Google Meet or Microsoft Teams: the underlying infrastructure.
Video conferences require comparatively small audio streams in addition to video streams and data streams carrying shared screens or presentations. The challenge of bandwidth and latency grows with the number of active participants in a single conference. I briefly touched on this topic in a previous blog post.
Hyperscalers like Google or Microsoft, or well-funded providers like Zoom, operate an infrastructure of MCUs and/or SFUs that are close to the users.
An SFU (Selective Forwarding Unit, ours is named Kopano Meet Boost) primarily takes the load off the participating clients when sending video and audio streams. These streams must always be available in the format that each other participant needs, and above all there must be one such stream for every other participant. The SFU takes over this distribution from the sending client.
An MCU (Multipoint Control Unit) additionally ensures that each participant receives only a single pre-mixed video and audio stream combining all the other participants.
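To make the difference concrete, the streams each client has to handle can be counted per topology. The following is a minimal sketch, not code from Kopano Meet: the function name and the topology labels are illustrative assumptions, and it assumes that every participant sends exactly one combined audio/video stream and that no simulcast or screen sharing is involved.

```python
from typing import Tuple


def streams_per_client(participants: int, topology: str) -> Tuple[int, int]:
    """Return (uploads, downloads) that a single client handles.

    Illustrative model only: one combined audio/video stream per
    participant, no simulcast layers, no screen sharing.
    """
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    others = participants - 1
    if topology == "mesh":
        # No server: each client sends to and receives from every other client.
        return (others, others)
    if topology == "sfu":
        # The SFU fans the single upload out to all other participants.
        return (1, others)
    if topology == "mcu":
        # The MCU mixes all other participants into one stream per client.
        return (1, 1)
    raise ValueError(f"unknown topology: {topology}")


for topology in ("mesh", "sfu", "mcu"):
    print(topology, streams_per_client(10, topology))
```

Under these assumptions, a ten-person conference without any server means nine uploads per client, an SFU reduces this to one upload, and an MCU additionally reduces the nine downloads to one. The price is paid on the server side, which is where the fluctuating demand for computing power comes from.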
Apart from the economic challenges (both units are subject to very high fluctuations in the demand for computing power) and the problem of end-to-end encryption, which is being solved (too) slowly (see https://www.chromestatus.com/feature/6321945865879552, as of 2020-11-20), it is precisely these modules that create the lock-in scenario: an SFU or MCU can only be used by the provider of the service it belongs to. That is why Zoom can do things that only Zoom can do.
One solution would be to separate infrastructure and application, as is the case with the rail network and the providers of train connections. One could think of an Airbus model here, in which Europe creates a single infrastructure provider. However, this contradicts the basic ideas of digital sovereignty. Open, well-defined interfaces and an infrastructure approach comparable to the Sovereign Cloud Stack, on the other hand, can create a flexible, highly scalable and digitally sovereign solution.
The SFUs/MCUs form the basis of the infrastructure alongside a TURN service. Since all modern video conferencing solutions communicate via WebRTC, it should be possible to define universally usable interfaces. An infrastructure definition could include the free choice between an MCU and an SFU. Providers can then offer this package as their product, with their own computing power and pricing. Customers are thus free to choose an infrastructure provider and to switch between providers.
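What such a defined interface might look like can be sketched in a few lines. Everything here is hypothetical: no such standard exists yet, and the class names, the `allocate` method, the fields and the example domains are assumptions made purely for illustration. The point is only that a solution talks to an abstract provider, so that the concrete provider can be exchanged.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MediaEndpoint:
    """Connection details for one conference (hypothetical shape)."""
    mode: str             # "sfu" or "mcu", chosen by the solution
    signaling_url: str    # where clients negotiate their WebRTC session
    turn_urls: List[str]  # TURN relays offered by the same provider


class InfrastructureProvider(ABC):
    """Hypothetical common interface that every provider would implement."""

    @abstractmethod
    def allocate(self, conference_id: str, mode: str) -> MediaEndpoint:
        """Reserve SFU or MCU capacity for a conference."""


class ExampleProvider(InfrastructureProvider):
    """Stand-in provider that only builds URLs for a given domain."""

    def __init__(self, domain: str) -> None:
        self.domain = domain

    def allocate(self, conference_id: str, mode: str) -> MediaEndpoint:
        if mode not in ("sfu", "mcu"):
            raise ValueError(f"unsupported mode: {mode}")
        return MediaEndpoint(
            mode=mode,
            signaling_url=f"wss://{mode}.{self.domain}/{conference_id}",
            turn_urls=[f"turn:turn.{self.domain}:3478"],
        )


def start_conference(provider: InfrastructureProvider,
                     conference_id: str, mode: str = "sfu") -> MediaEndpoint:
    # The solution depends only on the interface, never on a concrete provider.
    return provider.allocate(conference_id, mode)


# Switching providers means passing a different object, nothing else changes.
print(start_conference(ExampleProvider("provider-a.example"), "standup"))
print(start_conference(ExampleProvider("provider-b.example"), "standup", "mcu"))
```

A self-hosted installation would simply be one more implementation of the same interface.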
Solution provider market
Solution providers like us (or Jitsi or others) then connect to these infrastructures. Those who already have an infrastructure of their own, such as Big Blue Button, can offer the sovereign infrastructure alongside their proprietary one. Differences at the solution level can then be found in use cases (e.g. conference, classroom, …), integrations, look & feel or convenience. Since all data flows necessarily end up with the infrastructure provider, trust also plays a major role. With defined interfaces, however, self-hosting becomes possible, which could settle the question of trust for everyday school life or critical corporate communication, for example.