+++
title = "Supporting ALSA compressed offload in PipeWire"
date = 2024-03-18
+++

## Background

**Editor's note**: this work was completed in late 2022 but this post was unfortunately delayed.

Modern audio hardware comes with Digital Signal Processors (DSPs) integrated
in SoCs and audio codecs. Processing compressed or encoded data on such DSPs
saves power compared to carrying out the same processing on the CPU.

```
     +---------+      +---------+       +---------+
     |   CPU   | ---> |   DSP   | --->  |  Codec  |
     |         | <--- |         | <---  |         |
     +---------+      +---------+       +---------+
```

This post takes a look at how all this works.

## Audio processing

A traditional audio pipeline might look like the one below. An application reads
encoded audio and then uses a media framework like GStreamer or a library like
ffmpeg to decode it to PCM. The decoded audio stream is then handed off to an
audio server like PulseAudio or PipeWire, which eventually hands it off to ALSA.

```
                    +----------------+
                    |   Application  |
                    +----------------+
                            |           mp3
                    +----------------+
                    |    GStreamer   |
                    +----------------+
                            |          pcm
                    +----------------+
                    |    PipeWire    |
                    +----------------+
                            |          pcm
                    +----------------+
                    |      ALSA      |
                    +----------------+
```

With ALSA compressed offload, the same audio pipeline would look like this. The
encoded audio stream is passed through to ALSA unchanged. ALSA then sends the
encoded data to the DSP via its compressed offload API, and the DSP does the
decoding and rendering.

```
                    +----------------+
                    |   Application  |
                    +----------------+
                            |           mp3
                    +----------------+
                    |    GStreamer   |
                    +----------------+
                            |          mp3
                    +----------------+
                    |    PipeWire    |
                    +----------------+
                            |          mp3
                    +----------------+
                    |      ALSA      |
                    +----------------+
```

Since the processing of the compressed data is handed to specialised hardware,
namely the DSP, this results in a dramatic reduction in power consumption
compared to CPU-based processing.

## Challenges

- The ALSA Compressed Offload API, which is separate from the ALSA PCM
  interface, provides the control and data streaming interface for audio DSPs.
  This API is provided by the [tinycompress](https://github.com/alsa-project/tinycompress)
  library.

- With PCM there is the notion of `bytes ~ time`. For example, 1920 bytes of
  S16LE, 2-channel, 48 kHz audio correspond to 10 ms. This breaks down for
  compressed streams: for most compressed data it is impossible to reliably
  estimate the duration of an audio buffer from its size.

- While sampling rate, number of channels and bits per sample are enough to
  completely specify PCM, each compressed format may need additional
  codec-specific parameters before the DSP can handle it.

- For some codecs, additional firmware has to be loaded by the DSP. This has to
  be handled outside the context of the audio server.
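
The `bytes ~ time` relationship for PCM can be checked with a few lines of C. This is a minimal sketch: `pcm_bytes_to_usec` is a hypothetical helper written for this post, not part of ALSA or PipeWire, and it assumes interleaved S16LE samples.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per sample for the one format covered by this sketch (S16LE). */
#define S16LE_BYTES_PER_SAMPLE 2u

/*
 * Duration in microseconds of `bytes` bytes of interleaved S16LE PCM.
 * Hypothetical helper for illustration only.
 */
uint64_t pcm_bytes_to_usec(uint64_t bytes, uint32_t channels, uint32_t rate)
{
	uint64_t bytes_per_frame;
	uint64_t frames;

	assert(channels > 0 && rate > 0);
	bytes_per_frame = (uint64_t)channels * S16LE_BYTES_PER_SAMPLE;
	frames = bytes / bytes_per_frame;
	return frames * 1000000u / rate;
}
```

Calling `pcm_bytes_to_usec(1920, 2, 48000)` gives 10000 microseconds, matching the 10 ms figure above. No equivalent function can be written for a compressed stream in general, because the number of bytes per frame is not fixed.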

## Requirements

- Expose all possible compressed formats.

- Allow a client to negotiate the format.

- Stream encoded audio frames and not PCM.

## PipeWire

PipeWire has become the default sound server on many Linux distributions,
handling multimedia routing and audio pipeline processing. It offers capture and
playback for both audio and video with minimal latency, and supports PulseAudio,
JACK, ALSA, and GStreamer-based applications.

## SPA

PipeWire is built on top of SPA (Simple Plugin API), a header-only API for
building plugins. SPA provides a set of low-level primitives.

SPA plugins are shared libraries (.so files) that can be loaded at runtime.
Each library provides one or more `factories`, each of which may implement
several `interfaces`.

The most interesting interface is the `node`.

- A node consumes or produces buffers through ports.

- In addition to ports and other well defined interface methods, a node can have
  events and callbacks.

Ports are also first class objects within the node.

- There is a set of port-related interface methods on the node.

- Ports may be statically allocated at instance initialization.

- There can be dynamic ports managed with `add_port` and `remove_port` methods.

- Ports have `params` which can be queried using the `port_enum_params` method
  to determine the list of supported formats (`EnumFormat`), the currently
  configured format (`Format`), buffer configuration, latency information,
  `I/O areas` for data structures shared by the port, and other such information.

- Some params, such as the selected format, can be set using the `port_set_param`
  method.

## Implementing compressed sink SPA node

This section covers the main implementation details of a PipeWire SPA node that
accepts an encoded audio stream and writes it out using the ALSA compressed
offload API.

```c
static const struct spa_node_methods impl_node = {
	SPA_VERSION_NODE_METHODS,
	.add_listener = impl_node_add_listener,
	.set_callbacks = impl_node_set_callbacks,
	.enum_params = impl_node_enum_params,
	.set_io = impl_node_set_io,
	.send_command = impl_node_send_command,
	.add_port = impl_node_add_port,
	.remove_port = impl_node_remove_port,
	.port_enum_params = impl_node_port_enum_params,
	.port_set_param = impl_node_port_set_param,
	.port_use_buffers = impl_node_port_use_buffers,
	.port_set_io = impl_node_port_set_io,
	.process = impl_node_process,
};
```

Some key node methods defining the actual implementation are as follows.

### **`port_enum_params`**

`params` for ports are queried using this method. This is akin to finding out
the capabilities of a port on the node.

For the compressed sink SPA node, the following are present.

- `EnumFormat`

  This builds the list of encoded formats handled by the node and returns it
  as the result.

- `Format`

  Returns the currently set format on the port.

- `Buffers`

  Provides information on the size of the buffers, and on the minimum and maximum
  number of buffers, to be used when streaming data to this node.

- `IO`

  The node exchanges information via `IO` areas. There are various types of `IO`
  areas, such as buffers, clock, and position. The compressed sink SPA node only
  advertises `buffer` areas at the moment.

The results are returned in an [SPA POD](https://docs.pipewire.org/page_spa_pod.html).
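
The enumeration behind `EnumFormat` is index based: the caller asks for entry 0, 1, 2 and so on until the node has nothing more to report. The sketch below models only that pattern with plain C types. The names `codec_id`, `supported_codecs`, and `enum_format` are hypothetical, and a real node would build an SPA POD for each entry instead of returning an enum value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the encoded formats a node might advertise. */
enum codec_id { CODEC_MP3, CODEC_FLAC };

static const enum codec_id supported_codecs[] = { CODEC_MP3, CODEC_FLAC };

/*
 * Store the format at position `index` in `out`.
 * Returns 1 when an entry was stored, 0 when the list is exhausted.
 */
int enum_format(uint32_t index, enum codec_id *out)
{
	size_t count = sizeof(supported_codecs) / sizeof(supported_codecs[0]);

	assert(out != NULL);
	if (index >= count)
		return 0;
	*out = supported_codecs[index];
	return 1;
}
```

A caller would loop with an increasing index and stop at the first 0 return, which keeps the node free of any per-caller iteration state.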

### **`port_use_buffers`**

Tells the port to use the given buffers via the `IO` area.

### **`port_set_param`**

The various `params` on the port are set via this method.

The `Format` param request sets the actual encoded format that a PipeWire client,
such as `pw-cat` or an application, is going to stream to this SPA node for
sending on to the DSP.
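
Setting the format boils down to checking the request against what the node can handle and remembering the result. The following is a simplified sketch with hypothetical names (`port_state`, `sketch_set_format`, `codec_id`); it assumes that a `NULL` request clears the format and that unsupported formats are rejected with `-EINVAL`, and it leaves out the SPA POD parsing a real node would do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical codec identifiers used only by this sketch. */
enum codec_id { CODEC_MP3, CODEC_FLAC, CODEC_OPUS };

/* Formats this sketch pretends the DSP can decode. */
static const enum codec_id supported[] = { CODEC_MP3, CODEC_FLAC };

struct port_state {
	bool have_format;
	enum codec_id codec;
};

/*
 * Apply a requested format to the port.
 * requested == NULL clears the format; an unsupported codec is rejected
 * with -EINVAL and leaves the previous state untouched.
 */
int sketch_set_format(struct port_state *port, const enum codec_id *requested)
{
	size_t i;

	assert(port != NULL);
	if (requested == NULL) {
		port->have_format = false;
		return 0;
	}
	for (i = 0; i < sizeof(supported) / sizeof(supported[0]); i++) {
		if (supported[i] == *requested) {
			port->codec = *requested;
			port->have_format = true;
			return 0;
		}
	}
	return -EINVAL;
}
```

Leaving the old state in place on failure means a rejected request does not disturb a stream that is already configured.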

### **`process`**

Buffers containing the encoded media are handled here. The media stream is
written to the IO buffer area, using the buffers that were provided in
`port_use_buffers`. The encoded media stream is then written to the DSP by
calling `compress_write`.
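
A device may accept fewer bytes than it is offered, so the write step has to cope with partial writes. The sketch below shows one way to do that. It is not the node's actual `process` implementation: `write_fn` is a hypothetical callback standing in for the real write call (assumed to return the number of bytes accepted, or a negative error), and `fake_dsp` is a test double so the example runs without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writer: returns bytes accepted (>= 0) or a negative error. */
typedef int (*write_fn)(void *ctx, const void *buf, unsigned int size);

/*
 * Offer `size` bytes to the writer, retrying after partial writes.
 * Returns the number of bytes consumed, or a negative error code.
 * Stops early when the writer accepts nothing (device buffer full).
 */
int write_encoded(write_fn write_cb, void *ctx, const uint8_t *data,
		unsigned int size)
{
	unsigned int done = 0;

	assert(write_cb != NULL);
	while (done < size) {
		int n = write_cb(ctx, data + done, size - done);
		if (n < 0)
			return n;
		if (n == 0)
			break;
		done += (unsigned int)n;
	}
	return (int)done;
}

/* Test double: takes at most `chunk` bytes per call, `capacity` in total. */
struct fake_dsp {
	unsigned int chunk;
	unsigned int capacity;
	unsigned int received;
};

int fake_dsp_write(void *ctx, const void *buf, unsigned int size)
{
	struct fake_dsp *dsp = ctx;
	unsigned int room = dsp->capacity - dsp->received;
	unsigned int n = size < dsp->chunk ? size : dsp->chunk;

	(void)buf;
	if (n > room)
		n = room;
	dsp->received += n;
	return (int)n;
}
```

Returning the consumed byte count lets the caller keep the remainder for the next cycle rather than blocking, which matters in a real-time processing path.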

### **`add_port`** and **`remove_port`**

Since dynamic ports aren't supported, these methods return an `ENOTSUP` error.

## `pw-cat`

`pw-cat` was modified to support negotiation of encoded formats and passing the
encoded stream as is when linked to the compressed sink node.

## Deploying on hardware

Based on discussions with upstream compress offload maintainers, we chose a
Dragonboard 845c with the Qualcomm SDM845 SoC as our test platform.

For deploying Linux on embedded devices, the tool of choice is Yocto.
Yocto is a build automation framework and cross-compile environment used to
create custom Linux distributions and board support packages for embedded devices.

The primary dependencies are:

- tinycompress

- ffmpeg

- PipeWire

- WirePlumber

The `tinycompress` library is what provides the compressed offload API. It makes
`ioctl()` calls to the underlying kernel driver/sound subsystem.

`ffmpeg` is a dependency for the example `fcplay` utility provided by
`tinycompress`. It's also used in `pw-cat` to read basic metadata of the encoded
media. This is then used to determine and negotiate the format with the compressed
sink node.

`PipeWire` is where the compressed sink node resides, with `WirePlumber` acting
as the session manager for `PipeWire`.

Going into how Yocto works is beyond the scope of what can be covered in a blog
post. Basic Yocto project concepts can be found [here](https://docs.yoctoproject.org/overview-manual/concepts.html?highlight=meta+layer#yocto-project-concepts).

In Yocto speak, a custom
[meta layer](https://github.com/asymptotic-io/meta-asymptotic) was written.

Yocto makes it quite easy to build `autoconf` based projects. A new `tinycompress`
[bitbake recipe](https://github.com/asymptotic-io/meta-asymptotic/blob/master/recipes-multimedia/tinycompress/tinycompress.bb)
was written to build the latest sources from upstream and also include the
`fcplay` and `cplay` utilities for initial testing.

The existing PipeWire and WirePlumber recipes were modified to point to custom git
sources with minor changes to default settings included as part of the build.

## Updates since the original work

Since we completed the original patches, a number of changes have happened thanks
to the community (primarily Carlos Giani). These include:

- A device plugin for autodetecting compress nodes on the system

- Replacing `tinycompress` with an internal library to make all the requisite
  `ioctl()`s

- Compressed format detection (which was previously waiting on
  [an upstream API addition we implemented in `tinycompress`](https://github.com/alsa-project/tinycompress/pull/16))

## Future work

- Make the compressed sink node provide clocking information. While the
  API provides a method to retrieve timestamp information, the relevant
  timestamp fields do not seem to be populated by the `q6asm-dai` driver.

- Validate other encoded formats. So far only MP3 and FLAC have been validated.

- Test on other hardware, ideally with help from the wider community.

- Add the capability to the GStreamer plugin to work with the compressed sink
  node. This would also help in validating pause and resume.