How network audio players work
Nov 13, 2020 14:37 · 5662 words · 27 minute read
Although you can say that network audio players, aka streamers, simply get audio files from a storage medium and send them to a digital to analog converter, there are many ways to do that. And almost none of those systems are compatible with each other. This 40 minute video gives a comprehensive overview of most systems. Over the past 20 years playing media over networks has developed from extremely rudimentary to rather sophisticated: from a situation where 640 x 480 interlaced video and MP3 at 128 kb/s were just doable over a local network, to a current reality where 4K video and 192 kHz 24 bit FLAC files are streamed over the internet. Even voice control is slowly getting into the market.
In this video I will describe the most popular network audio players - so no video. I will also demo Google Home, so if you have a Google Home speaker nearby, you might want to switch off its microphone during this presentation. Every digital audio player can be divided into a number of function blocks. Let’s begin with Storage. This is where the audio files are stored. These audio files will have some kind of labelling. In a cd-player a cd has very rudimentary labelling, while in an advanced digital audio player a file might include over 100 metadata fields.
The second building block I named Control and the third Database. We’ll see their functions further on. The fourth function I named Renderer, the fifth Remote control. Please do realise that throughout this presentation these blocks represent ‘functions’, not necessarily physical devices. Storage can be a cd-drive holding a cd, a hard disk, thumb drive or any other storage medium. Remote Control can be an infrared remote like that of your TV but it can also be a smartphone, tablet or computer.
Renderer in essence includes the digital to analog conversion, amplification and loudspeakers or headphones, although in many cases amplification and loudspeakers and sometimes even digital to analog conversion will happen outside of the player. For this presentation, however, it is irrelevant whether these are integrated or done by separate devices. When the digital player is switched on, it usually will query the storage for its content. The result will be stored in a database. When the user makes a request, Control checks the Database, gets the corresponding file from Storage and sends it to Renderer.
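To make that request flow a bit more tangible, here is a minimal sketch in Python. It only illustrates the function blocks and is not how any particular player is built; the class names, the in-memory dictionary that stands in for a disk and the file names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Holds the audio files; here a dict of path -> raw bytes (hypothetical)."""
    files: dict

    def read(self, path: str) -> bytes:
        return self.files[path]


@dataclass
class Renderer:
    """Stands in for DAC, amplifier and speakers; it only records what it played."""
    played: list = field(default_factory=list)

    def play(self, audio: bytes) -> None:
        self.played.append(audio)


class Control:
    """Queries Storage at switch-on, keeps the result in a Database, serves requests."""

    def __init__(self, storage: Storage, renderer: Renderer):
        self.storage = storage
        self.renderer = renderer
        # Database: file name -> path, built once when the player is switched on.
        self.database = {path.rsplit("/", 1)[-1]: path for path in storage.files}

    def request(self, title: str) -> bool:
        """Look the title up in the Database, fetch from Storage, send to Renderer."""
        path = self.database.get(title)
        if path is None:
            return False
        self.renderer.play(self.storage.read(path))
        return True


storage = Storage({
    "/music/blues/track01.flac": b"\x00\x01",
    "/music/jazz/track02.flac": b"\x02\x03",
})
player = Control(storage, Renderer())
print(player.request("track01.flac"))
```

A remote control would simply call `request` from another device; the blocks themselves stay the same.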
The user request can be done on the player or over a remote control. Simple infrared remote controls are unidirectional. More advanced remote controls, like apps on a smartphone or tablet, are bidirectional. If we remove the bidirectionality of the remote control, we have the functionality of the first digital consumer player, the CD-player. Now, let’s go to the present. Hey Google, play blues music over Home Speaker. [wait] Hey Google stop. As you can see, it’s not perfect yet. I am told that a subscription to Google Music would give more control. But this is what early adopters are using now. And it’s not only music files. You can also have it play radio stations: hey Google play BBC Radio 1. [wait] Hey Google stop.
I have shown you how the Google Home speaker can play music, but it can even route music to other players like the Chromecast Audio, Chromecast Video and Sonos products. Again, subscriptions are needed for many functions but it can be done. Hey Google, play blues music over chromecast audio. [wait] Hey Google, stop. Google is not the only voice controlled service. Apple and Amazon also have these kinds of services, although Apple has a rather closed system that works less perfectly than Google’s. I have no hands-on experience with Amazon Echo yet. Now, what’s the topology of such a system? There are two environments: the Cloud and Home. The cloud does the control, the storage and the database and connects to the Home environment over the Network interface using the internet. At home there again is a Network interface talking to Control. Control listens to the internal microphones all the time to pick up the attention call ‘Hey Google’ and then listens to the request.
It sends that audio over the network to Control in the cloud. There the instruction is interpreted and a response is returned to Home. That can be a remark, a question or a result like playing music. I know of no quality home audio equipment that listens to any voice control system, not even if a service like If-This-Then-That is used for interfacing. IFTTT.com lets you specify what should be done on given requests.
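Such a mapping from a request to a set of actions can be pictured as a small lookup table. The sketch below is not the IFTTT API, which I am not showing here; the rule table, the device names and the helper function are hypothetical and only illustrate the idea with a ‘watch movie’ scene.

```python
# Hypothetical rule table: request phrase -> ordered list of (device, action).
# This is not IFTTT's API, only an illustration of request-to-action mapping.
RULES = {
    "watch movie": [
        ("lights", "low"),
        ("blinds", "down"),
        ("beamer", "on"),
        ("av receiver", "input blu-ray player"),
    ],
}


def actions_for(request: str) -> list:
    """Return the actions for a spoken request, or an empty list if no rule matches."""
    return RULES.get(request.strip().lower(), [])


for device, action in actions_for("Watch movie"):
    print(f"{device}: {action}")
```

The logical thinking sits in the table: someone has to decide which devices belong to which request and in which order.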
You can, for instance, define the request “watch movie” as lights low, blinds down, beamer on and AV receiver set to the Blu-ray player. But it requires some logical thinking and my guess is that it will still take some time before we see that widely used, especially in the quality home audio market. Using file based audio players took over 15 years before it became more or less mainstream. Let’s go back to that point in time for - as often - things are clearer when you know how they developed over time. For at least two decades CD was the only way of playing digitised music.
The music industry pushed the market to consume more and more music and found easy prospects in youngsters. The rate at which new hits were born and the financial limitations of the youth had already led to a lot of mix tapes on compact cassette. But the sound quality of affordable gear was very low while - even more important - the ruggedness of cassettes was poor and cassettes also laid a burden on the pocket money. By the time Dad bought a new computer and donated the old one to the kids, a very comfortable way of proletarian sharing of music was discovered in the shape of MP3’s. Just hook two PC speakers to the computer and set up a sharing sneaker network with friends and now everyone could play the music that they shared.
Later on the sneaker network was replaced by the internet. MP3’s were ideal since they needed only about 10% of the data of the original CD audio. Hard disk capacity was expensive at that time and sharing over the internet was slow. When we look at the block diagram of such a setup, we see Control, Storage, Database and Renderer: Control being the sum of the operating system and music player software, Storage being the hard disk and Renderer being the software mixer of the operating system and the internal digital to analog conversion.
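The ‘about 10%’ for MP3 can be checked with a little arithmetic. The sketch below assumes the 128 kb/s MP3 rate mentioned at the start of this presentation and standard CD audio (44.1 kHz, 16 bit, stereo); the function name is just for illustration.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kb/s."""
    return sample_rate_hz * bits * channels / 1000


cd_kbps = pcm_bitrate_kbps(44_100, 16, 2)  # CD audio: 44.1 kHz, 16 bit, stereo
mp3_kbps = 128                             # MP3 rate assumed for this comparison
ratio = mp3_kbps / cd_kbps

print(f"CD audio: {cd_kbps:.1f} kb/s, MP3 share: {ratio:.1%}")
# prints "CD audio: 1411.2 kb/s, MP3 share: 9.1%"
```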
The software mixer might have changed the level and even the sample rate, depending on the pc’s settings. But no one cared. Sound quality was less of an issue for those youngsters since low quality audio was all they could afford anyway. A number of dads found this way of playing music a good idea, so the laptop’s headphone output was connected to the stereo in the living room. Of course the headphone amp in the laptop was limited in audio quality and over time digital audio outputs started appearing on computers and laptops. Now a quality digital to analog converter could be used, with the goal to improve the sound quality, although the software mixer still processed the sound.
Furthermore a computer has many internal clocks and other ‘dirty signals’, like cheap switch mode power supplies, that all interfere with each other. As a result the SPDIF signal suffered from digital signal processing and from the biggest enemy of digital audio: jitter. Although I expect jitter to be known by most of you, I will do a short recap. Let’s imagine a short piece of analog music signal that by chance is a straight line like this. On digitising, amplitude samples are taken at the given sampling frequency, like 44.1 kHz for CD.
These measurements are stored in a table and sent to a transport or storage medium. On playback the table is read and then rendered in discrete voltages. This would have resulted in a staircase pattern if not for the reconstruction filter that ‘slows down’ the signal so - at least in theory - the original straight line is regained. If the interval at which the voltages are plotted is not very constant, there will be amplitude errors, as can be clearly seen when a white line is plotted behind the resulting red line. This creates side bands at the modulation frequency of the clock and can lead to several sound quality problems.
For instance degraded lows, a distorted stereo image, poor focus within the stereo image and harshness in the mid range. It might be clear that connecting a digital to analog converter to a PC is not the way to go if sound quality is important to you, unless you prepare that PC in such a way that it produces a clean digital signal. This kind of PC is available commercially too and is called a music server. Due to the special measures, relatively small production series and different distribution model these are clearly more expensive than regular PC’s.
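The jitter recap with the straight line can be imitated numerically. The following is a rough sketch under stated assumptions, not a measurement: the steepness of the line, the 1 ns peak timing error and the 1 kHz modulation frequency are hypothetical values chosen only to show that a timing error turns into an amplitude error that follows the clock modulation.

```python
import math

FS = 44_100            # CD sampling frequency in Hz
SLOPE = 1000.0         # hypothetical steepness of the straight line, units per second
JITTER_PEAK = 1e-9     # hypothetical peak timing error of 1 ns
JITTER_FREQ = 1000.0   # hypothetical modulation frequency of the clock in Hz


def line(t: float) -> float:
    """The analog signal: a straight line."""
    return SLOPE * t


def amplitude_errors(n_samples: int) -> list:
    """Error per sample when a value taken at the ideal time is output at a jittered time."""
    errors = []
    for n in range(n_samples):
        ideal_t = n / FS
        actual_t = ideal_t + JITTER_PEAK * math.sin(2 * math.pi * JITTER_FREQ * ideal_t)
        # The stored value belongs to ideal_t but the voltage appears at actual_t.
        errors.append(line(ideal_t) - line(actual_t))
    return errors


errors = amplitude_errors(FS)  # one second of samples
peak = max(abs(e) for e in errors)
print(f"peak amplitude error: {peak:.3e} (slope x peak jitter = {SLOPE * JITTER_PEAK:.3e})")
```

In this simple model the error is the slope of the signal times the timing error, so steeper (louder, higher frequency) signals suffer more from the same amount of jitter.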
In the meantime digital to analog converters with USB inputs came to the market. Initially they used what is now known as USB Audio Class 1, an isochronous data stream like SPDIF and thus with the same problem: the computer audio clock is master. And it’s not the best master. USB Audio Class 1 is limited to 2-channel 96 kHz. Later on USB Audio Class 2 was developed. This is an asynchronous data transport where the digital to analog converter is master. It resulted in improved sound quality and offers higher sampling rates - even up to 768 kHz 32 bits.
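The difference in sampling rates also shows up in the raw data rates. The arithmetic below is a sketch under assumptions that are not stated in this presentation: 24 bit samples for the 96 kHz case, Class 1 running over a full speed USB link (12 Mb/s signalling rate) and Class 2 over a high speed link (480 Mb/s). Protocol overhead is ignored.

```python
FULL_SPEED_MBPS = 12    # USB full speed signalling rate (assumed for Class 1)
HIGH_SPEED_MBPS = 480   # USB high speed signalling rate (assumed for Class 2)


def stream_mbps(sample_rate_hz: int, bits: int, channels: int = 2) -> float:
    """Raw PCM data rate in Mb/s, without any USB protocol overhead."""
    return sample_rate_hz * bits * channels / 1_000_000


class1_stream = stream_mbps(96_000, 24)    # 2-channel 96 kHz, 24 bit assumed
class2_stream = stream_mbps(768_000, 32)   # 2-channel 768 kHz 32 bit

print(f"96 kHz / 24 bit stereo:  {class1_stream:.3f} Mb/s")
print(f"768 kHz / 32 bit stereo: {class2_stream:.3f} Mb/s")
print("fits a full speed link:", class2_stream < FULL_SPEED_MBPS)
```

So under these assumptions the highest rates simply cannot pass a full speed link, whatever the clocking scheme.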
Whereas USB Audio Class 1 is supported by all modern operating systems, USB Audio Class 2 was not supported by Microsoft for a long time. A driver had to be installed to solve that. All other modern operating systems - macOS, Linux, Android, iOS and iPadOS - have supported USB Audio Class 2 for years now and recently Microsoft has added it to Windows 10. Large consumer electronics brands realised that videos, photos and music stored on the computer should be played in the living room too. This led to two standards that enable playback of media from the computer in - for instance - the study over the network to a network player in the living room. DLNA was set up by brands like Sony, Philips, Microsoft, Hewlett-Packard and others and initially had measures incorporated to prevent pirated material from being played.
It did not recognise MP3’s as audio files, for instance. As far as I can see, DLNA is just a layer over the Universal Plug and Play AV standard - UPnP AV for short - and in practice they appear to be interchangeable now that the DLNA anti-piracy measures appear to have been abandoned. So I will treat them as one standard. The system is supported by all major consumer electronics brands and many smaller high end brands like Linn, Naim, Marantz, Denon, Cambridge Audio and others. Some of these brands, like Linn and Yamaha, have made extensions to the DLNA standard for improved user experience. It has the advantage that all media on your computer can be played back on DLNA/UPnP AV renderers like smart TV’s, networked video players and networked audio players.
Using clients like the video player VLC you can also play this content on iPhones and iPads. It is clear, however, that video had the most attention. The usually large number of audio and photo files were, at least for a long time, handled slowly. Furthermore only a limited number of metadata types for music were supported by the server programs. For the average pop and rock fan, who only searches music on artist name or track name, this posed no problem.
Lovers of classical music like to search on conductor or composer and these fields were not supported. Jazz lovers want to look for combinations of artists to find their favourite albums. Today there are server programs specially aimed at music that handle all the 100 plus data fields modern metadata taggers offer. A very popular UPnP AV server program for audio only is MinimServer.
Another shortcoming of the UPnP AV and DLNA protocol is the lack of standard support for gapless playback of albums where the music of one track continues without interruption into the following track. Sgt. Pepper’s Lonely Hearts Club Band by The Beatles is a good example, as are live albums. Today most network players that use DLNA or UPnP AV have solutions for this in their hardware. To recap: DLNA and UPnP AV have broad industry acceptance; audio, video and photo are supported; a server program has to be running on a computer or NAS in the network; and the system was designed primarily for video, so many server programs have limited metadata support. Playback of gapless albums is not supported and has to be solved in the player. When we look at the function diagram, we find two environments: first the computer that has Storage in the shape of a hard disk, Control by the OS and the DLNA/UPnP AV server with Database, and Network.
And second the network player, which also has a network function that can be either ethernet or Wifi, then Control, which is relatively simple firmware, and Renderer. The UPnP AV/DLNA profile also knows a two way remote control standard. Some years ago there were dedicated remote controls, nowadays smartphones and tablets are used. The remote comprises Network, Control and Display. Its Control receives info from Control in the computer over the network, sends it to the display and returns touch info, which the local Control sends to Control in the computer over Network.
Control in the computer then sends the audio file to the network player. This setup can be controlled from all three environments: Computer, Remote Control and Network player. The most recent instruction is leading. You can control all network players from one remote but multi-room playback, where players in different rooms play in perfect sync, is not supported. Several manufacturers have added their proprietary extension for this. The list of brands that use DLNA is endless. Here are some popular brands that are licensees; for a complete list see the DLNA.org site.
UPnP AV has no certification program, as far as I know, but is part of the UPnP standard and I currently know of few cross-brand problems. DLNA initially was not picked up by music lovers, probably because the network players combined video and audio playback in one device. Especially in affordable equipment the video electronics have a profound effect on the jitter behaviour of the analog electronics. Furthermore - as mentioned - it was poor in browsing speed and had limited metadata support. When Slim Devices introduced the SliMP3 audio player, it wasn’t a success either since it also had very poor jitter behaviour.
But they took the problem seriously and introduced the Squeezebox only two years later. That was rapidly picked up by hifi enthusiasts and although the jitter is not up to current standards, it was a lot better than its predecessor and the DLNA players. It also added internet services like Pandora, Napster, last.fm and Sirius, using an online service called mysqueezebox.com. After three years they were bought by Logitech, which ended production after only six years but promised to keep maintaining mysqueezebox.com and the local server program Logitech Media Server - LMS for short. Which they did and still do.
Today many people use Raspberry Pi single board computers with a sound card and shareware to emulate Squeezebox functionality. On my YouTube channel there is a playlist of videos on ‘Raspberry Pi for audio’. When we look at the function diagram we see in essence the same setup as with UPnP AV: there is a computer with Storage, Control, Database and Network but now the DLNA/UPnP AV server is replaced by the Logitech Media Server program. And also on the Network player side things look identical.
Most Squeezebox models had a display, except for the Duet, which had a dedicated handheld remote with iPod like controls. A Squeezebox system can be controlled from the player, from a remote control and from the computer. On the computer you use an HTML interface while on tablets and smartphones apps from third parties are available too. The system is not difficult to set up if you are familiar with computer technology but digiphobes will have a hard time getting it to play. So if we sum it up we see that the software is still maintained but not further developed.
It is partly open source and lots of plug-ins are available, for instance for Spotify, Tidal and other streaming services. And the Logitech Media Server also supports photo and video over UPnP AV. While combining video and serious audio in one player often leads to disappointing sound quality, combining both in a server program is no problem at all. When Sonos was introduced I wrote in my review of the Sonos Connect network player that even my mother could install it. You don’t know my late mother but believe me, that’s a statement. However, that’s not all that made Sonos popular.
The installation was easy since on the computer you only had to share the folder that holds the music files. The Sonos player then starts reading the metadata from that share and builds a database in the player itself, making browsing and searching music a lot faster. Initially it had to be used with a proprietary remote control, again with an iPod like user interface. Later on an iPod Touch like remote was introduced but it cost more than the real iPod Touch and worked less well than the app on the Touch. Yet another clever technique was added: it uses a mesh network for connection with other Sonos players, which was, with the Wifi standards of that time, clearly a bonus.
One player is master and is connected to your home network, and if you have more than one player the other players connect to each other over a proprietary Wifi-like mesh network that connects each player with every other player within reach. If you don’t have a network connection close to one of the players, there even is a bridge that connects your network to the Sonos mesh network. The system is enormously robust and their product range not only had the network streamer shown here but also one with power amplifiers integrated plus several active speakers that only needed a power outlet to play. Recently Sonos introduced a new operating system that, although it is backwards compatible, does not offer the new features when used with older hardware. Despite rumours, Sonos equipment will only play music files at cd-resolution; high-res music is not supported.
When we look at the function diagram we see that the computer or NAS only has to share the volume that holds the music to the network. Control in the Sonos network player - be it a player that has to be connected to your stereo or active speakers - will access the share, index it and store the index in the internal database. This index is then shared with other Sonos players in the mesh network. Sonos players have next to no controls; only some models have volume controls. You will need to use a smartphone or tablet as controller.
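The idea of a player that indexes a shared folder into its own database can be sketched as follows. This is not Sonos code and says nothing about how their index is actually stored. As a loud simplification, the sketch takes the album from the folder name and the title from the file name, where a real player would read the metadata tags inside the files; the table layout, the extension list and the demo files are hypothetical.

```python
import os
import sqlite3
import tempfile

AUDIO_EXTENSIONS = {".flac", ".mp3", ".m4a", ".wav"}  # assumed list, for illustration


def build_index(share_root: str) -> sqlite3.Connection:
    """Walk the shared folder once and store one row per audio file in a local database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tracks (path TEXT PRIMARY KEY, album TEXT, title TEXT)")
    for folder, _subfolders, names in os.walk(share_root):
        for name in names:
            title, extension = os.path.splitext(name)
            if extension.lower() in AUDIO_EXTENSIONS:
                # Simplification: folder name stands in for album, file name for title.
                db.execute(
                    "INSERT INTO tracks VALUES (?, ?, ?)",
                    (os.path.join(folder, name), os.path.basename(folder), title),
                )
    db.commit()
    return db


def search(db: sqlite3.Connection, text: str) -> list:
    """Search titles in the local index, so the share itself is not touched."""
    rows = db.execute(
        "SELECT title FROM tracks WHERE title LIKE ? ORDER BY title", (f"%{text}%",)
    )
    return [row[0] for row in rows]


# Demo with a temporary folder that plays the role of the network share.
with tempfile.TemporaryDirectory() as share:
    album_folder = os.path.join(share, "Album A")
    os.makedirs(album_folder)
    for name in ("Track One.flac", "Track Two.flac", "cover.jpg"):
        open(os.path.join(album_folder, name), "wb").close()
    index = build_index(share)
    found = search(index, "two")

print(found)
```

Because searching only consults the local index, browsing stays quick even when the share sits on a slow NAS.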
To recap: installation is easy, it offers faster browsing and quick searching, and it uses a mesh network. Since there is a very large install base, just about every internet music streaming service loves to work with Sonos. There are limitations too: the internal memory limits - at least in theory - the number of tracks that can be indexed. The exact number depends on the amount of metadata and the size of the cover art. Sonos mentions a maximum of 64,000 tracks, which will be around 6,500 albums.
Furthermore it’s a closed system, with the advantage of operational reliability and the disadvantage of having to deal with the choices of Sonos. Last but not least the sound quality is aimed at the ‘average consumer’, not at audiophiles. Apple had the same idea about sharing audio, video and photos and set up their own environment, including an on-line music shop. Although Steve Jobs reportedly owned a very high end audio system, Apple started off selling 128 kb/s AAC coded music that included copyright protection. After a few years that changed into 256 kb/s AAC and the copyright protection was abandoned.
iTunes initially was only available for Apple computers and was used to load music onto the iPod portable music player. Their Airport Wifi access points got optical digital outputs that iTunes could stream music to in lossless PCM up to 48 kHz sampling. The output signal was quite jittery. Later on Apple TV’s came to market, effectively video and photo streamers that also could stream music from iTunes. Apple made a clever move by licensing Airplay to manufacturers, a streaming protocol that uses encryption and lossless compression for the transport over the network. Many brands offer Airplay streaming, often next to other systems like DLNA.
To do a round-up: it is easy to set up for Apple computer users, while the Windows version is known to be less robust. iTunes has good ripping facilities with metadata completion integrated and uses a robust streaming protocol that is licensed to many manufacturers of playback hardware. It is a closed system and thus operates reliably but allows no streaming from other streaming companies and it is limited to a maximum of 48 kHz sampling. Only a few years later a couple of computer buffs, dissatisfied with music software players on computers, developed a very sophisticated music player that physically looked like the ordering computer system found in restaurants. It had one of the best user interfaces and it automatically completed metadata, artist bios, album reviews and album art. A complete system was offered including an automatic backup system for the music files and database.
A basic system could easily set you back $12,000 or more. The founders had less experience building good sounding hardware, which was a problem for a system of this price. After four years they were bought by high-end audio specialist Meridian, which brought the audio quality up to standard. The Meridian Sooloos Control is a fully integrated system that not only holds a hard disk but also a cd-drive to automatically rip the music from cd to the hard disk. It uses a touch screen to access the music through the extensive metadata it collected itself from metadata services on the internet.
The user interface is brilliant and - after the Meridian upgrade - it sounds fantastic. The initially rather high price had come down drastically but in 2019 Meridian announced that it would freeze development of the system. The metadata service remains active, however. When we look at the function diagram we see the usual functions, Control, Storage and Database but this time there is a touch screen as user interface and a CD drive for ripping. Then of course Renderer and Network complete the setup. After startup Control checks Storage for content.
Further content can be added using the CD-drive. The system collects metadata from internet services and the result is stored in the database. After receiving instructions from the touch screen Control checks the database, gets the associated file or files from storage and sends them to Renderer. A remote control in the shape of a smartphone or tablet can be used too. Music stored on a shared volume on either a computer or NAS can be indexed and played as well.
It can even have the metadata updated without changing anything in the audio files since all metadata is stored in the database, not in the audio files. Lenbrook, the mother company of audio brand NAD, must have seen the gap between Sonos and Sooloos-like products and introduced the Bluesound brand in 2014, 12 years after the introduction of Sonos. It basically uses the same approach: easy installation, self indexing and smartphone or tablet as remote. But there are also clear differences. Bluesound products do have primary controls, like volume, play/pause, forward and back. Most current products also have five preselect buttons that let you start your favourite radio stations, playlists or albums.
Furthermore it can also play from a USB drive connected to the player directly. This way no computer or NAS has to be switched on to play music. A mesh network is not used. Most homes nowadays have fast wifi throughout the house. There even is a product with a hard disk and cd-drive that will rip your cd-collection and add metadata to the audio files. After the MQA audio format was introduced, all Bluesound players were updated to be an MQA decoder and renderer.
Which brings us to the sound quality that is clearly higher than that of Sonos products. The function diagram is much like that of Sonos. The computer or NAS only has to share a volume containing music to the network and if you use a USB drive connected directly, the computer isn’t even needed. The network player indexes the music content on the share or USB drive and stores that in the database on the player. Bluesound states that up to 200,000 tracks can be indexed, depending on the amount of metadata and size of the cover art. That is about 15,000 albums.
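The track and album figures for Bluesound, and the 64,000 tracks and 6,500 albums quoted earlier for Sonos, imply a certain number of tracks per album. A quick bit of arithmetic, using only those quoted figures, shows that the two album estimates rest on different averages.

```python
# (maximum tracks, approximate albums) as quoted in this presentation
limits = {
    "Sonos": (64_000, 6_500),
    "Bluesound": (200_000, 15_000),
}

tracks_per_album = {brand: tracks / albums for brand, (tracks, albums) in limits.items()}

for brand, average in tracks_per_album.items():
    print(f"{brand}: about {average:.1f} tracks per album")
```

So the album counts are rough conversions; what actually limits the index is the number of tracks and the size of their metadata.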
The renderer is MQA enabled and a smartphone, tablet or computer is used as remote control. Most popular streaming services are supported too. After a few years the NAD brand introduced products with Bluesound streaming integrated. There are amplifiers, AV-receivers and network players that work with the Bluesound protocol and do include MQA decoding and rendering. This way the Bluesound system got access to a market for even higher sound quality. As we have seen earlier you can have the computer take care of music playback. But without special care the computer will route the music bits over internal DSP functions like sample rate converter and volume control. We have also seen that using the internal DAC isn’t going to offer serious sound quality due to the very jittery clock signal. Connecting an external DAC over USB Audio Class 2 makes the sound independent of the computer’s clock and can yield better results if the DSP function in the PC is bypassed as well. When you use programs that provide so called ‘bit perfect’ signals, that is taken care of automatically on Mac’s and Linux and might take special care on Windows computers, depending on the version you use.
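What ‘bit perfect’ means can be shown with a handful of samples. The sketch below is a simplified model, not the mixer of any real operating system: the volume function, the rounding and the sample values are assumptions. It shows that a software volume control changes the bits and that turning the level back up afterwards does not restore them.

```python
import array


def apply_volume(samples: array.array, gain: float) -> array.array:
    """Simplified software volume control on 16 bit samples, with rounding and clipping."""
    return array.array(
        "h", (max(-32768, min(32767, round(sample * gain))) for sample in samples)
    )


def is_bit_perfect(source: array.array, output: array.array) -> bool:
    """Bit perfect means the output bytes are identical to the source bytes."""
    return source.tobytes() == output.tobytes()


source = array.array("h", [0, 1, -1, 12345, -20000, 32767])  # hypothetical samples
untouched = array.array("h", source)       # DSP bypassed
attenuated = apply_volume(source, 0.5)     # volume at half
restored = apply_volume(attenuated, 2.0)   # level turned back up afterwards

print("bypassed:  ", is_bit_perfect(source, untouched))
print("attenuated:", is_bit_perfect(source, attenuated))
print("restored:  ", is_bit_perfect(source, restored))
```

The smallest sample values are rounded away at half volume, which is why the restored signal no longer matches the source.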
The software will have clear instructions on how to handle that. Still the galvanic connection - the USB cable - will cause deterioration of the sound quality for reasons I will describe later. A very good solution then is the network bridge. Basically it is a simple device, optimised for music reproduction, that programs can send music to. It is the same as a basic DLNA/UPnP AV player or Squeezebox but controlled from an audiophile music player on a computer. Often network bridges can function as DLNA/UPnP AV renderer or Squeezebox.
And music player programs like Audirvana 2+ and JRiver Media Center can send bit perfect music to a network bridge using the DLNA/UPnP AV protocol. To recap: physical installation is easy but it requires special software for bit perfect playback. Multiple clocks in the PC interfere with the audio clock, resulting in jitter. Furthermore the fan noise and hard disk noise are not wanted in the listening room while electronic noise will pollute the DAC. A good solution for this is a network bridge that effectively functions as a USB Audio Class 2 connection at a distance, connected to the PC over the network.
This means that the computer can be placed elsewhere in the house. The network bridge produces far less electronic noise than a normal PC and therefore will degrade the music far less. I’ll get back to this later on. There is another software player that works this way and that has become enormously popular. When Sooloos was bought by Meridian, the Sooloos founders developed the system further as employees of Meridian. But over time it appeared that the vision they had could not be realised within the Meridian organisation. In 2016 they left Meridian and founded Roon Labs.
Their vision was to make the best music player program that would deliver bit perfect music to devices of as many brands as possible. The user interface was based on Sooloos but further perfected. Metadata, artist bio and album reviews are still added. Instead of running on dedicated hardware as with Sooloos, it runs on a normal computer under Windows, macOS or Linux as long as it complies with some hardware specifications, like a Core i3 level processor or higher and a separate SSD system drive. There also is a special server version of Roon that runs on a headless Intel NUC computer. Once installed it needs no further user maintenance and updates can be taken care of automatically. Playback can be done as normal with computers over the internal digital to analog conversion or using an external DAC. They also developed a protocol to send audio over the network to what they call an ‘Endpoint’. In its basic form it’s just a network bridge that is connected to the computer over a network connection using a special protocol called RAAT.
In the almost five years of their existence they have managed to get many brands to support this protocol, which is free of user maintenance. Over 800 devices by over 220 manufacturers are now available. Those include brands like Bluesound, NAD, Auralic, Volumio and others that support RAAT next to their own protocol. It supports Multiroom and allows for plug-ins, called extensions. Next to the RAAT protocol Roon can also stream to Sonos, Chromecast and Airplay devices, of course with the hardware limitation of 48 kHz.
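A 48 kHz hardware limit means that high-res files have to be brought down to a rate the device accepts. How Roon actually chooses that rate is not described here, so the following is only a sketch of one plausible approach, namely halving the rate until it fits so that the conversion ratio stays a whole number; the function name and the fallback are assumptions.

```python
def capped_rate(source_rate_hz: int, device_max_hz: int = 48_000) -> int:
    """Halve the sampling rate until it fits the device (hypothetical strategy)."""
    rate = source_rate_hz
    while rate > device_max_hz and rate % 2 == 0:
        rate //= 2
    # Fallback for rates that cannot be halved down far enough.
    return min(rate, device_max_hz)


for source in (44_100, 96_000, 176_400, 192_000):
    print(f"{source} Hz -> {capped_rate(source)} Hz")
```

With this strategy files from the 44.1 kHz family end up at 44.1 kHz and files from the 48 kHz family at 48 kHz.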
Despite the price that is higher than that of other music player software it is very popular amongst music lovers and high-end audio fans. One reason might be that it works flawlessly with equipment like a Raspberry Pi with sound card running RAAT software, with streamers costing tens of thousands of Euros and anything in between. A Bluesound speaker in the study, a Sonos waterproof ceiling speaker in the bathroom, the old midi stereo with Chromecast streamer in the kitchen, the PC speakers in the study and the Auralic Aries G2 network player as RAAT endpoint in the high-end audio system in the living room. All controllable from your smartphone, tablet or computer. It integrates your music collection completely with your pick of music in Tidal and Qobuz streaming services.
I often don’t know whether I am streaming from Tidal or Qobuz or from my own music collection. But this all comes at a price: a yearly subscription costs $120, a lifetime subscription $700. But if you have a subscription on Tidal or Qobuz, they see one Roon Server as one client while it can serve all your endpoints with different music in different rooms at the same time. So there are closed systems, linked to a hardware manufacturer. There are generic systems supported by many brands but there are also systems that are often added as a second system, like Apple Airplay on a DLNA streamer or Roon Endpoint on a Bluesound streamer.
Recently I have seen Chromecast Audio added to streamers. Bluetooth is also supported on many devices but this is a lossy system using only about 20% of the bits and thus throwing away information. You could also use a normal household computer for music reproduction but if you are only slightly critical on sound quality, you might be tempted to add all kinds of quality improving measures like a linear power supply for the computer, a USB signal cleaner or an I²S PCI card with separate audio grade power supply. Been there, done it. And they all help a bit but in the end you have paid more for those accessories than a better sounding network bridge would have cost. But that’s less adventurous and the money has to be spent in one go, whereas with those gadgets you get a few more happy buying moments.
Believe it or not, I’m not even being cynical. I know how it is to go that route. That’s where I got a part of my experience from. And with that bombshell we come to the end of this video. I hope you enjoyed it. If so, it’s good to know there will be another video next Friday, as always at 5 PM central European time. If you don’t want to miss that, subscribe to this channel or follow me on the social media so you’ll be warned when new videos are out. If you liked this video, give it a thumbs up.
Many thanks to all that support this channel financially, it keeps me independent and thus trustworthy. If you also feel like supporting my work, the links are in the comments below this video on YouTube. I am Hans Beekhuyzen, thank you for watching and see you in the next show or on theHBproject.com. And whatever you do, enjoy the music.