Building the Perfect Audio Editor

In the past, I have gone into quite some detail about why I believe that the current crop of Linux multi-trackers is not up to the job of either recording my own music or recording LUGRadio. To remedy this, Aq and I sat down for a few evenings and fleshed out a bunch of ideas about how an audio editor should work. Although much of the interface can be inspired by contemporary editors such as Cubase, there really is no point in just re-implementing the same interface for the sake of it. It makes sense to step back and re-think the interface in ways that might make the application more usable and more learnable.

The aim of this short document is to give a clear understanding of what I would love to see in a general purpose audio editor, and to share some of the subtle ways in which we have thought through some functional and interface concepts. This document is by no means complete and really is just the result of a first round of thinking – building an audio editor is a complex job, and so is designing one. However, this guide should hopefully provide a solid grounding from which to move forward.

Of course, the plan is to hopefully see these ideas implemented in something such as JonoEdit (their name, not mine :P), which has been discussed on the forums in some detail and also has a wiki page. Thanks to everyone who is keen to get involved in it.

Functionality

The editor really needs to have this core set of features to be useful:

  • The ability to record from any ALSA sound card, including multi-input cards such as the M-Audio Delta 44.
  • Non-destructive editing.
  • Undo/Redo (at least to a reasonable limit, but preferably unlimited).
  • Be able to edit the volume curve in different parts of the track.
  • Resizable track views to easily zoom in and out of a waveform.
  • Support for effects plug-ins, most notably LADSPA.
  • Be able to apply effects to an entire track or a selected portion of a track. When applying effects, there should be the ability to preview the sound with the effect before it is applied.
  • It should be able to master to OGG, MP3 and WAV.
  • It should be able to import OGG, MP3 and WAV.
  • Most important, it should be easy and intuitive to use.

There are, of course, some non-essential features that would be nice:

  • Support for VST plug-ins or good modules with (and this is important) sensible defaults. Modules I use are Compression, Limiting, DoubleDelay, EQ, Reverb etc.
  • Sensible wave editing. I only use simple stuff, but I should be able to cut and merge waves and insert fades and silence easily.
  • A well documented Python API so users can write their own scripts to extend JonoEdit.

Interface thoughts

The main area in which Linux audio software seems to have a problem is with the interface. Editors of today seem rather complex, unwieldy beasts that rely on a lot of technical knowledge and a keen understanding of the parlance of multi-track recording. There is no reason why the software needs to be this complex. Aside from rethinking some of these interface elements, something as simple as just changing the jargon is useful. Why call them Tracks? They are really Instruments. Why refer to collections of tracks as Subgroups? They are really Combinations. Simple changes such as these, together with some simple and effective design, could create a ground shaking audio editor.

The point here is not in satisfying every possible music recording edge case. The point is to create a sensible editor that satisfies the general requirements of people who want to record some music and audio on their computer. As such, JonoEdit leaves out lesser used or overly complex functionality and instead focuses on key features and functionality. Naturally, I have modelled this around my own needs, which I feel are fairly typical. I record music (Recreant View) and a radio show (LUGRadio) and these two projects use a good subset of features in an audio editor.

The interface should follow some simple golden rules, mostly nicked from good design theory:

  1. The interface should be as simple and intuitive as possible. The requirement to read the manual should be kept to a bare minimum, and the interface should be as discoverable and intuitive as possible.
  2. The user should never be asked for information the computer can figure out itself. Never query the user for details that can be detected, discovered or reasonably assumed.
  3. The language of the application should be the language that musicians understand.
  4. The interface should work in a visual way that maps to the context of music recording. As an example, if you need to connect something (such as an Instrument to a Combination Socket), you drag a line as if you were making a connection with a cable.
  5. The user should need to remember as few commands and screens as possible.

Suggested design

To solve these problems, we have come up with a suggested design. Again, this interface is by no means complete, but it provides a simple and consistent design with plenty of room for expansion. The current design caters for the major features discussed above.

The application is split into three major areas (called Workspaces):

  • Recording – in this view, the different instruments are displayed. This view is typically used when the music is recorded.
  • Compact Mix – this view combines a smaller Recording view and the mixing sliders. This is ideal for setting rough volume settings.
  • Detailed Mix – this view is where the detailed mixing happens. In this view more detailed sliders are displayed as well as Combinations (the new name for Subgroups) and other bits and pieces.

Each of these views should be mapped to keyboard shortcuts such as F5, F6 and F7, as well as large toolbar buttons. With only three main views to remember, the application is easier to learn – each of those three views can service most typical needs. These three views are also quicker to access – once the user learns the shortcuts it is quick to navigate around the application. Another key decision has been to avoid using MDI and floating windows. Floating windows are a pain in Cubase and require a lot of mousing around to stop one window obscuring another. This design avoids all of this.
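To make the three-view idea a little more concrete, here is a minimal sketch of how the shortcuts could switch views inside a single window. It is only an illustration: the class and function names are made up for this example, and a real implementation would hook into the toolkit's key events rather than take a key name as a string.

```python
from enum import Enum


class Workspace(Enum):
    RECORDING = "Recording"
    COMPACT_MIX = "Compact Mix"
    DETAILED_MIX = "Detailed Mix"


# Key names follow the suggestion in the text (F5, F6 and F7).
SHORTCUTS = {
    "F5": Workspace.RECORDING,
    "F6": Workspace.COMPACT_MIX,
    "F7": Workspace.DETAILED_MIX,
}


class MainWindow:
    """Hypothetical single window that swaps its central view in place."""

    def __init__(self):
        self.current = Workspace.RECORDING

    def on_key(self, key_name):
        # Unknown keys leave the current workspace untouched.
        workspace = SHORTCUTS.get(key_name)
        if workspace is not None:
            self.current = workspace
        return self.current
```

Because there is only ever one current workspace, nothing can float on top of anything else, which is the whole point of avoiding MDI.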

Getting started

When the application starts[1], it will look something like this:

[1] Well, the application will display this interface, but it will no doubt also throw up a getting started wizard.

This is the interface with no instruments loaded into it. I have numbered the different parts of the application to describe how it fits together:

  1. The toolbar. The toolbar contains a limited selection of large, essential icons. Unlike other multi-trackers, this one should not bombard the user with a huge number of buttons. When recording music, you are often stood up, holding a guitar, and need large buttons that don’t require a great degree of mousing accuracy. These buttons include the Record, Play and Stop buttons as well as buttons to add a New Instrument and the click track.
  2. The menu bar.
  3. The timing view. Sat on the toolbar is the timing view which displays the time of the track. This can be flexible. It would be good to not only show the current time but also have different modes such as a bar count and possibly even show the total length of the recording.
  4. The workspace buttons. These are toggle buttons that toggle between the different workspaces discussed above. Note: each of the icons shown on the buttons would be different, and not the swirls shown here.
  5. This area contains a few elements. On the left side is the click track time. If the user clicks this box they can change the regularity of the clicks. To the right of this are two long lines. The top line is where markers can be placed to set the start and end point of sections (as an example, if you record a song and want to master it to a single audio file, you would click the line to add a starting marker and also add an ending marker – the song would then be mastered between those two markers). Beneath that line is a grid which notes the times in seconds and minutes or bars. The lines that divide this area mark the points at which the click track clicks.
  6. This is where the workspaces display their views.
  7. This portion of screen is left clear. This would be good to use for status labels and any essential editing buttons.
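The timing view modes and the click track grid described in the list above boil down to some simple arithmetic. The sketch below is one way it could work, assuming the click track is defined by a tempo in beats per minute and a fixed number of beats per bar. The function names and parameters are invented for this example.

```python
def format_clock(seconds):
    """Format a play head position as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:04.1f}"


def format_bars(seconds, bpm, beats_per_bar=4):
    """Format the same position as 'bar.beat', counting both from 1."""
    beat_index = int(seconds * bpm / 60)
    bar, beat = divmod(beat_index, beats_per_bar)
    return f"{bar + 1}.{beat + 1}"


def click_times(bpm, length_seconds):
    """Times in seconds at which the click track would click."""
    interval = 60.0 / bpm
    count = int(length_seconds / interval) + 1
    return [i * interval for i in range(count)]


print(format_clock(75.5))       # 01:15.5
print(format_bars(2.0, 120))    # 2.1
print(click_times(120, 2.0))    # [0.0, 0.5, 1.0, 1.5, 2.0]
```

The same list of click times could be used to draw the dividing lines in the grid, so the display and the audible click never disagree.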

When the user clicks on the New Instrument button, they see something such as this:

The user selects the instrument(s) that they want to add. The term Tracks has been thrown out and replaced with Instruments. When you record something in a multi-tracker you are always recording from an instrument. This could be a guitar, woodwind instrument, bass, drum, vocal microphone or even a sound effect. Using the term Instrument not only relates to the language of audio production, but also brings some other key benefits:

  • A small image of the instrument in question can be placed on the screen and used as a visual cue to see which instrument is which.
  • By specifying a particular instrument, certain effects could be set up with some sane defaults. As an example, if you are going to add a vocal instrument, you are likely going to want a specific type of compression (audio compression, not data compression) and some subtle reverb. Setting these sane defaults provides a great starting point for users to customise these effects. Again, remember, this may not be suitable for everyone, but I suspect it will be suitable for the majority of cases.
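As a rough sketch of the sane defaults idea, each kind of instrument could simply carry a preset list of effects that is copied when the instrument is added. Everything here is hypothetical: the effect names, the settings and the numbers are placeholders, not recommendations.

```python
from dataclasses import dataclass, field

# Hypothetical presets: effect names and settings are illustrative only.
DEFAULT_EFFECTS = {
    "vocals": [("compressor", {"ratio": 3.0}), ("reverb", {"mix": 0.15})],
    "bass guitar": [("compressor", {"ratio": 4.0})],
}


@dataclass
class Instrument:
    kind: str
    name: str = ""
    effects: list = field(default_factory=list)


def new_instrument(kind, name=None):
    """Create an instrument with a private copy of its default effects."""
    # Copy the settings so tweaking one instrument never alters the presets.
    effects = [(fx, dict(settings)) for fx, settings in DEFAULT_EFFECTS.get(kind, [])]
    return Instrument(kind=kind, name=name or kind.title(), effects=effects)
```

An instrument with no preset just starts with an empty effects list, so the user is never blocked by a missing default.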

When the instrument(s) have been selected, the screen looks like this:

Each instrument that is added to the main part of the screen has some key features:

  • The name and image of the instrument are on the left side of the box. This makes the instrument easy to identify. The name is clickable so the user can change the name to something specific to them such as the person who plays the instrument.
  • The record toggle button enables that instrument for recording.
  • The M button mutes the instrument and the S button turns on Solo, in which only that instrument is heard. Both of these buttons are toggle buttons.
  • The black wavy line is the volume curve for the track. In most multi-track editors a waveform is displayed, but in reality, a waveform is not really required and a simple volume curve is more attractive and more useful. Note: the area below the volume curve would be shaded a colour, but I could not figure out how to do this in Inkscape. :P
  • The watermarked instruments that appear behind the volume curve help identify the instrument easily.
  • The red line is the play head in the project. The user can click the timing line to move the play head.
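This document does not pin down how the volume curve would be derived from the recorded audio. One plausible approach, shown here purely as an assumption, is to reduce the samples to one RMS level per fixed-size window and draw a line through those values.

```python
import math


def volume_curve(samples, window):
    """Reduce samples (floats between -1.0 and 1.0) to one RMS level per window."""
    curve = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        curve.append(rms)
    return curve


# A loud passage followed by silence gives a high point and then a zero.
print(volume_curve([0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0], window=4))  # [0.5, 0.0]
```

A larger window gives a smoother, cheaper curve when zoomed out, and a smaller one shows more detail when zoomed in.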

To record an instrument the user would do this:

  1. Tick the small record button on the relevant instrument(s).
  2. Move the play head to the start of the track and hit the main Record button on the toolbar. As the user plays their piece, the volume curve is updated as it records.

When the piece is recorded, the user can perform a number of adjustments to the instrument track:

  • The volume curve can be grabbed with the mouse cursor and dragged to adjust the volume visually.
  • Tools can be used to cut the waveform. To do this, a tool is selected (I have not added any of these tools to the interface yet, but they could appear in the space at the bottom of the screen) and the user can click at any position on the instrument. At that point, a cut is made and then the user can select the left or right part of the cut and delete it (by pressing the Delete key) or move it by dragging it with the mouse.
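Since non-destructive editing is on the essential feature list, a cut should never touch the recorded file. A minimal sketch of that idea, using an invented Clip structure, is to treat each piece on the instrument as a window onto the original recording, so cutting just produces two smaller windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    source: str      # path of the recorded audio file, which is never modified
    start: float     # position on the timeline, in seconds
    offset: float    # where playback begins inside the source file
    duration: float  # how much of the source is played


def split(clip, position):
    """Cut a clip at a timeline position and return the left and right parts."""
    if not clip.start < position < clip.start + clip.duration:
        raise ValueError("cut position must fall inside the clip")
    left_length = position - clip.start
    left = Clip(clip.source, clip.start, clip.offset, left_length)
    right = Clip(clip.source, position, clip.offset + left_length,
                 clip.duration - left_length)
    return left, right
```

Deleting one half is then just a matter of dropping that Clip from the instrument, and moving it means changing its start value. Undo becomes cheap too, because the audio on disk is unchanged.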

With the piece recorded, the user is likely to want to adjust the volume of the recording. To do this they hit the Compact Mix or Detailed Mix buttons.

Compact Mixing

The idea behind the compact mix workspace is to see both the instrument view and some main fader controls. This view is useful when you need to see the different parts in the different instruments, but also be able to mix them. When you have a complex song with lots of parts that start in different sections of the song, a view like this is essential. This view improves on MDI based tools as there is no obscuring of the instrument or faders – both are clearly in view.

It looks like this:

On the left side, each fader is displayed with three buttons:

  • The mute button (M) – this mutes the instrument.
  • The solo button (S) – this mutes everything else but this instrument.
  • The effect button (E) – this opens up another dialog to configure effects that are applied to the instrument.
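The mute and solo buttons interact, so it is worth spelling out the rule. The sketch below assumes that several instruments can be soloed at once and that solo takes precedence over mute. Neither assumption is settled in this design, and the dictionary layout is made up for the example.

```python
def audible(instruments):
    """Return the names of the instruments that should be heard.

    Each instrument is a dict with 'name', 'mute' and 'solo' keys.
    """
    any_solo = any(inst["solo"] for inst in instruments)
    heard = []
    for inst in instruments:
        if any_solo:
            # With any solo active, only soloed instruments are heard.
            if inst["solo"]:
                heard.append(inst["name"])
        elif not inst["mute"]:
            heard.append(inst["name"])
    return heard


band = [
    {"name": "Vocals", "mute": False, "solo": False},
    {"name": "Guitar", "mute": True, "solo": False},
    {"name": "Drums", "mute": False, "solo": False},
]
print(audible(band))  # ['Vocals', 'Drums']
```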

The effects button is used to set up effects for that particular instrument. The use of effects is not covered in this document and will be developed later. From a UI perspective all we need to know for now is that when you hit the E button, another dialog box pops up to apply effects. That dialog box will be designed and developed later.

The rectangular region to the right of each fader is a VU meter that displays the current volume when the project is playing. This meter shows in real time the current volume of each instrument and the fader can then be used to adjust the volume. This VU meter should also be coloured like most VU meters with green at the bottom, amber in the middle and red at the top. This colour scheme is an established system in the majority of recording equipment.
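The colouring of the meter can be sketched as a simple mapping from level to colour band. This is a simplified peak meter rather than a true VU meter with proper ballistics, and the thresholds of -12 dB and -3 dB are assumptions picked for illustration.

```python
import math


def level_db(peak):
    """Convert a linear peak level (0.0 to 1.0) to decibels relative to full scale."""
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def meter_colour(db, amber_from=-12.0, red_from=-3.0):
    """Pick the colour band for a level, with green at the bottom and red at the top."""
    if db >= red_from:
        return "red"
    if db >= amber_from:
        return "amber"
    return "green"


print(meter_colour(level_db(0.1)))  # green
print(meter_colour(level_db(0.5)))  # amber
print(meter_colour(level_db(1.0)))  # red
```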

To the right of the screen are the (L)eft and (R)ight volume faders and their VU meter. These faders control the general volume for the project.

Detailed Mixing

Detailed mixing occurs when the instruments have been recorded and the parts no longer need to be edited and moved in the instrument view. All that is left is to apply effects and mix the project.

This is what the view looks like:

Inside this view, the instrument faders are visible, and each fader has similar buttons to the Compact Mix view. There are a few differences though:

  • The Shape button is used to adjust the EQ of the track. There must be a better word than Equalisation to describe what it does, and Shape was my first reasonable idea. I am sure there is a better word though. Clicking this button opens up another window to set the EQ on the instrument.
  • With the sliders taking up the full size of the window, finer grained and more accurate control of the sliders is possible.

Another feature in the Detailed Mix workspace is Combinations. In the multi-tracking world, there is a concept called Subgroups. The idea behind a subgroup is that you can take a bunch of tracks and assign them to a single fader. As an example, imagine you are recording a drum kit. You have five instruments added, one for each part of the kit. When you have the general volume of the sliders set in relation to each other (such as making sure the snare is louder than the hi-hat), you can then assign those sliders to a subgroup slider. This single subgroup slider then controls the five drum sliders.

In other software, setting up these subgroups is complex, so I have re-thought this. Firstly, let’s throw Subgroup out of the window and instead call them Combinations. To create combinations, you click on a menu item and get this dialog box:

On the left side are the current instruments, and the right side are the Combination Sockets. To make the connection, just drag from the instrument to the correct socket, and a line will be displayed to indicate the connection has been made. Simple as that!
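Behind the cable-dragging dialog, the data model can stay very small. Here is a sketch under the assumption that each instrument connects to at most one Combination and that the Combination fader simply scales the instrument faders beneath it. The Mixer class and its method names are invented for this example.

```python
class Mixer:
    """Hypothetical model of instrument faders routed through Combinations."""

    def __init__(self):
        self.instrument_gain = {}   # instrument name -> fader value (0.0 to 1.0)
        self.combination_gain = {}  # combination name -> fader value (0.0 to 1.0)
        self.connections = {}       # instrument name -> combination name

    def add_instrument(self, name, gain=1.0):
        self.instrument_gain[name] = gain

    def add_combination(self, name, gain=1.0):
        self.combination_gain[name] = gain

    def connect(self, instrument, combination):
        """Record the 'cable' dragged from an instrument to a Combination Socket."""
        if instrument not in self.instrument_gain:
            raise KeyError(instrument)
        if combination not in self.combination_gain:
            raise KeyError(combination)
        self.connections[instrument] = combination

    def effective_gain(self, instrument):
        """The instrument's own fader scaled by its Combination fader, if any."""
        gain = self.instrument_gain[instrument]
        combination = self.connections.get(instrument)
        if combination is not None:
            gain *= self.combination_gain[combination]
        return gain


mixer = Mixer()
mixer.add_instrument("Snare", 0.8)
mixer.add_instrument("Hi-hat", 0.4)
mixer.add_combination("Drums", 0.5)
mixer.connect("Snare", "Drums")
mixer.connect("Hi-hat", "Drums")
print(mixer.effective_gain("Snare"))   # 0.4
print(mixer.effective_gain("Hi-hat"))  # 0.2
```

The snare stays twice as loud as the hi-hat whatever the Drums fader is set to, which is exactly what the drum kit example needs.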

Back in the Detailed Mix window, you can see the Combinations sliders on the right hand side. Currently I have not visually indicated in this view which instruments are assigned to which Combination slider. This is something I am currently thinking about.

Conclusion

So there we have it – that is the proposed design for the bulk of what I would consider the perfect recording tool; a tool that combines power, ease of use and good design. I really, really hope that some of the discussions about implementing these ideas go ahead, and I hope that the implementation uses Python, GStreamer, PyGTK and Glade, so I can contribute where I can.

Naturally, if there are any questions or thoughts, do use the comment box to post them. Thanks.

  • Grilo

    I’d like to help you developing your application IF you’re really using Python + PyGTK.

    Have my e-mail: joao DOT grilo AT gmail DOT com

  • Sparkes

    I’ve been looking forward to seeing these designs, as things like Cubase have subtly different interfaces to each other but are similar enough to be understandable. This looks good. Not cloning an existing interface but not pretending to be ‘the real world’, just enough of both to make it intuitive to people familiar with both ways of working.

    Best of luck with coding this. The interface is certainly something that can be done in Python, as can most of the code, but a few parts are jobs for C. Hopefully these low level parts can be pulled out of the main code and run separately, making everything nice and easy.

    A word of caution with the testing: run it on both your Mac and Intel boxes, as artists are endian agnostic and the low level bits you might get dragged into coding will require extra testing.

  • http://www.jonobacon.org/?p=722 jonobacon@home » Announcing Jokosher 0.1

    [...] The Jokosher team are proud to announce our very first 0.1 release of Jokosher; a simple, usability focused Open Source multi-track studio. Since the original design and conception of the project in February, a team of developers, documentation writers, artists, testers and packagers have worked together to create a compelling first release. I am so proud of every single person involved. [...]

  • http://linux.wordpress.com/2006/10/22/jokosher-powerful-multi-track-studio/ Jokosher: Powerful Multi-Track Studio « Linux and Open Source Blog

    [...] It is not cross-platform like Audacity, but it aims to be the perfect audio editor. [...]

  • http://linuxzacatecas.com/2006/11/17/reporte-del-ubuntu-developer-summit-2006/ Linux Zacatecas » Blog Archive » Reporte del Ubuntu Developer Summit 2006

    [...] Regarding audio editing on Linux, Jono Bacon demonstrated a new application called Jokosher, which will be included with Ubuntu 7.04. In fact, version 0.1 of Jokosher is included in Ubuntu 6.10, but it is an application without all the features that are planned. Bacon started this project because on Linux there is simply no support for a simple audio editing tool. He has put ideas on his blog so that it becomes the ideal application. The idea is not to work with tracks but with instruments, much like Apple’s GarageBand. Although Bacon denies the influence of the Mac editor, according to him Jokosher is a modal editor, meaning that the interface changes depending on whether you want to record or mix a project. Once an instrument has been recorded, editing the audio is simple. If things go well, and in time for the Feisty release, Jokosher may become a “killer app” for those who want to use Linux to produce audio in a simple way. [...]

  • Neilen

    All the images linked in the article seem to have disappeared, perhaps you could try to reinstate them?