Archive for the ‘Usability’ Category
Posted on March 28, 2007 - by jono
Organic interface design for GNOME
Interface design is a complex business. There are a great many schools of thought about how to build an effective interface, and ultimately no-one is 100% correct. Lots of theory, lots of academia, lots of opinion, but little hard evidence about what design constructs actually work best for general human-computer interaction.
Recently I kicked off a segment on everyone’s-favorite-un-PC-ramblefest, LUGRadio, in which I expressed concerns that the GNOME project is not deciding on a direction for a next-gen incarnation of the environment, and KDE4 is primed to swoop in and eat its lunch. I am pleased to see the segment kicked off some discussion, and the issue has been raised in the minds of some core GNOME contributors.
While at GUADEC 2006 I sat on the patio of our wooden shack with Mirco Muller at about 3am and we spent quite some time discussing concepts about what a next-gen GNOME could look like. For a while I had been mulling over different concepts and ideas about how GNOME should work, and trying to distill them into core interactions for a desktop. In my mind, before you even think about mocking up a user interface design, you need to define the modes of interaction; they are like deciding which tools and ingredients you are going to need to bake a cake. If you don’t decide on the tools and ingredients, you cannot effectively move on to the design stage and then the implementation.
The problem with current desktops is that they are largely artificial. We have created modes of interaction that the user has to learn to understand the computer, instead of the computer trying to understand the user. We have to learn where things live, how to move things around, which things can be clicked on and which can’t, how sensitivity and insensitivity work and other false economies. Fundamentally we the users have to fit in with what the computer wants us to do.
The next-gen GNOME needs to change this. It really, really does. What I want to see is an organic environment; one that is designed around human interactions, tasks and concepts that we find natural, intuitive and repeatable. Do you ever have those experiences where you think “it would make sense if it worked this way, I wonder if it does” and to your surprise it does? We need to fill our desktop with these experiences. To do this, we need to understand what interactions and concepts are natural to us as humans, and work on these concepts in GNOME.
So, with time not my friend right now, here is a rough list of some organic concepts that I think we need to bear in mind in our thinking:
- Pile Theory – nope, nothing to do with a nasty dose of the bum grapes, but the idea that we all naturally collect and stack things together into piles. I think this is a fundamental concept in a desktop – collections of things. Think of archives, directories, photo sets, collections of songs, related videos – they are groups of things that we need to access both as a group and as the individual items in that group. You can see this theory in action: look at many people’s desktops and the groups of icons of related bits and pieces. We need to make it easy to create these piles. Imagine a 3D interface to these piles where a bunch of items pile on top of each other and you can explode the pile or fit it together and re-organise it in different ways.
- A Physical Environment – I want to pick up documents that I am editing, spin them round and scribble notes on them, I want them to look like they are shredded when I delete them, I want to stick related things together like lego – I want a physicality to the things that happen on my desktop. A great first step with this was when Compiz put virtual desktops on a cube – it made the concept of multiple desktops more tangible. We need to apply this kind of physicality to all aspects of the desktop.
- Contextual Tools – something I have banged on about with Jokosher. You should only ever see tool options appear when it makes sense and when you can actually use those tools – insensitive greyed out tool options are nothing more than a distraction and a waste of space. In Jokosher, when you make a selection, the tools that can be used on that selection appear, and we need to apply this concept to the entire desktop. This makes the desktop itself feel more organic, as tools only ever appear when they apply to your context. It also makes the desktop far less cluttered and gets away from the nightmare of modal tools. We particularly want to get away from the hundreds of toolbar options that clutter our applications. For all that people have heralded the Ribbon in Microsoft Office as a great idea, I am pretty convinced that it may over-egg the pudding and confuse people with so many functional options available. We fundamentally need our desktop to be contextual – more on this later.
- Two Handed Interaction – some of the work with multiple mouse pointers makes this possible. For some applications this makes perfect sense. Think of a 3D modeller such as Blender – the most natural modelling process is sculpting using your hands, and this requires two hands. Think about putting things in other things – it makes sense to hold the container open to put the things in it (like when you put items in a carrier bag). Naturally there is a hardware implication for this which will delay its adoption.
- Real Contextual Working – a while back I wrote up my thoughts for a project desktop. We need our applications to be aware of what the user wants to do and ensure they are organic enough to evolve into a form that is conducive to that task.
With the growth in 3D technology, we have the opportunity to make all of this happen. This is just a collection of rough notes, but at GUADEC I hope to flesh some of these ideas out with other people. We need to break down the barriers to interaction, but also be brave enough to stand up and set a direction, which was the point of the segment. If I had my own way, I would love to take a week off and spend it designing a bunch of mock-ups. I have a fairly clear idea in my head of how this kind of stuff would work, inspired by various interfaces and concepts, but I just don’t have the time to mock it up.
Posted on January 29, 2007 - by jono
Interviewed on Linux Action Show
The other day I did an interview with the fellas on the Linux Action Show. They are nice guys, good fun, and the interview covered my work with Ubuntu, Jokosher, my new book and lots of discussion.
Go and download it.
Posted on January 27, 2007 - by jono
Usability and GNOME video from LCA
On the Monday at LCA, there was a GNOME Love session as part of gnome.conf.au. In this session a bunch of us shouted out things to discuss, and I shouted out ‘application design and usability’. What I didn’t realise was that shouting out a suggestion meant volunteering to talk about it. Oof!
So, I got up and discussed my views on usability, good design and making better applications. There was some interesting banter with the other folks in the room, and I think the (completely unprepared, unexpected) session raised some interesting issues that would be of use for most application authors.
As with the rest of LCA, this was videoed for your online viewing kicks, so go and download the video – my bit kicks in at around 12 minutes.
Posted on October 22, 2006 - by jono
Making applications look schaweeeeet
When we started the Jokosher project, we wanted it to kick arse and take names when it came to usability, but also attractiveness. This is why my-friend-and-yours J5 is displayed in the New Project dialog, and why we have spent a lot of time on making Jokosher look attractive, yet neat.
Dialog design is essential here. God gave you eyes and Carl Worth gave you Cairo for a reason, so let’s set them on fire and make our applications look the bomb. As Laszlo blogged about, I added some header images to the effects dialog boxes when I was hacking on the code; this was to make the dialogs look consistent (by using the orange Jokosher theme) and attractive. One of the problems with this kind of approach was translating the text I put in the images – the text was part of a bitmap. This is a big problem now that Jokosher is feeling the i18n love. To fix this, Laszlo recently replaced the images with a Cairo equivalent, so the text is drawn at runtime and can be translated. As such, we now have good-looking dialog boxes that translate well.
Traditionally, form and function have divided people into two approximate camps, one accused of being orange-sunglasses-wearing-hippy-web-two-point-zero-idolising-feckless-morons and the other described as over-technical-geeky-binary-lovers-with-no-mates. Why do we even need to make a choice? Why can’t we feel the love of powerful software…with rounded edges and shiny dialog boxes?
As our desktop moves into a new era, one driven by windows that wobble, software that gets ever more advanced and users that demand attractiveness and ability, we have so much opportunity. Hey, it’s not as if we don’t have an incredible platform to do this. Using Cairo as one such example, we have such an awesome ability to use this important component in our desktop to re-shape how we look at software. When I was hacking on the effects dialogs, sure, I could have made my life easier if I just used a plain old push-button for each effect, but I really wanted the dialog to have some life and have some character. There are of course usability concerns here too – in a future version of Jokosher I would like to replace the effects listed in the dialog box with images that look like a physical effects unit such as a stomp box. Cairo gives us the ability to break away from the Gtk mold and explore better ways of representing concepts on the desktop, and better ways to deliver attractive interfaces. Feel the power!
Of course, with power also comes responsibility, and like many of you folks, I never want to see our desktop turn into the invent-your-own-interface-and-toolkit bonanza that is going on in the Windows world. Here we want consistency of toolkit and HIG, but scope to develop new constructs and ideas where it makes sense. When I look at the GNOME desktop, I always feel like there is an opportunity for someone to wave a paintbrush over it to spruce it up. Let’s see some of that action going on.
Posted on October 13, 2006 - by jono
KDE 10th Anniversary Event
Here I am at the 10th anniversary event for KDE and it’s really cool. Last night I had two, count them, two hours of sleep before my alarm went off at 3am. I get back into England at around 11pm. It’s a long day, but it’s worth it. The energy here behind the KDE team is great, and it’s been fantastic to meet many old and new friends alike.
My talk seemed to go pretty well, and caused some interesting questions and discussion afterwards. Shortly after the talk I did an interview for a podcast which was fun, and then there was an interesting talk by danimo about KDE 4. Danimo’s talk was informative, but the post talk Q+A was even more interesting, and there was some really good discussion.
It’s nice to feel the sense of community here, and I have had some great feedback about Canonical, Kubuntu, community management, concerns, expectations and other issues. It has also been nice to hook up with Matthias Ettrich again and also to meet Knut, Torsten Rahn (coolest name evaaar), Klaus Knopper and Martin Konold.
So all in all, a pretty good event so far, and it has filled my already full head with yet more ideas and thoughts.
Oh, and I entered a raffle for a Trolltech Qtopia Greenphone – I really hope I win. I would love one of those beauties.
Posted on August 3, 2006 - by jono
Making ALSA suck less in GNOME
I read Chris’s blog about ALSA, and it does concern me a little. I think ALSA is most definitely the right direction for us to head in, but if users are experiencing a distinct difference in audio quality, then this needs fixing, and fixing quick. Personally, I have not noticed the difference, but then again, I rarely use OSS, so I could do with comparing them in more detail.
Although Chris’s post was interesting, one of the comments brought up what I consider the main issue:
The biggest problem with ALSA is that configuring it is beyond any mere mortal. If your stuff doesn’t work out-of-the-box with ALSA, you have zero chance of getting it working yourself. Just try to get the optical output on that nforce4 to work and you’ll see.
Exactly. But this is not just a problem with ALSA, it is a problem with the presentation of audio in our desktop. My question is – how much of this can we fix at the desktop level? Is there a way we can develop some sane defaults for most users, and at least make a simple GUI for sound configuration? If you take a card such as the Delta 44 or M-Audio 1010, the configuration gets devilishly difficult due to the number of available inputs/outputs, different types of mixing etc. Even users of these high-end cards should not need to care about this kind of stuff.
I think we have a really strong multimedia stack, and GStreamer is advancing in leaps and bounds in features and stability, but we need to nail that all-important middle ground between application-level multimedia playback and the physical sound card, and this seems to be an area that needs fixing in GNOME. This is incredibly important for Jokosher – we have gone out of our way to make audio production in it as usability- and ease-of-use-focussed as possible, but the whole user experience falls down if sound card configuration requires a degree in rocket science.
So, I ask you all – how much of this is fixable in the desktop, and if not, how much needs to be fixable at the ALSA or kernel layer? Also, how can HAL and FDI files help solve this problem?
Posted on July 19, 2006 - by jono
Recognising potential through history
As an advocate and consultant, I tend to get exposed to a rich tapestry of reasons why Open Source is great and why it is not so great. I am not going to bother preaching to the choir about the great aspects; instead I want to discuss one of the oft-heard criticisms – Linux isn’t as feature rich as Windows and will never be good enough. To approach this problem, you need to divide it into two distinct areas – feature development and marketing. Both have their own driving forces and issues. So, let’s take a look at them both.
Feature development
Human beings are great at making comparisons, that’s what we do. Utterances of “my car is better than yours”, “so, why is this microwave £30 more expensive than that one”, and “well, she/he is nice, but not as nice as xyz” can be heard across the land. Of course, we are no different when evaluating software. As such, comparisons have been made from day one about Linux vs. Windows, the GIMP vs. Photoshop, Ardour vs. Cubase, Firefox vs. IE, Evolution vs. Outlook etc. In the short term, we can indeed draw such conclusions. If application A does something that application B does not, it is fair to judge application A as being the superior product, right? Well, it is not that simple.
First of all, the IT industry is ridden with ‘feature insanity’. I alluded to this, particularly in an educational context in my Unwrapping Learning Potential With Open Source post. This phenomenon manifests in the innate process of comparing two things on a blow by blow feature chart, as opposed to actually identifying if the tool can do what you need it to do. This is particularly prevalent with OpenOffice.org – many, many people can be productive and do what they need to do in OpenOffice.org, but they instead demand that it matches every feature in the latest version of Microsoft Office. So, the first rule is to actually identify which features you need, and in many cases Open Source software is fine and dandy.
The second issue is to analyse the problem with a longer term approach and look at the history of Open Source. Let me outline this with my mad Inkscape skillz:
The diagram looks at the development of the Windows and Linux desktops. Back in the early days, a usable desktop in the Windows world was something such as Windows 3.0 or 3.1. Below the Windows box you can see that the Linux box is further to the right. The equivalent functionality to Windows 3.0 or 3.1 didn’t come to the Linux desktop until some years later. As such, the Linux desktop was immediately playing catch-up due to its later start. Microsoft had already got in there and started building their product, with plenty of developers and money behind it. How could the Linux desktop compete?
If you now look at the right side of the diagram, you can see that the Windows and Linux desktops are fairly level. If you compare Windows XP to Ubuntu Dapper, SuSE Linux Enterprise Desktop or Fedora Core 5, they are fairly comparable. Sure, there are certain chunks missing, particularly in vertical markets, but people are today making real-world comparisons and considering the desktop in their organisations and homes.
From this we can identify that the Linux desktop is developing a lot quicker than the competition. This proves that the Open Source development process is working, and is working at a higher rate. This does not necessarily mean we are better optimised (100,000 monkeys picking tea are quicker than 10 high-speed tea-picking humans), but it does mean we are developing quicker, and piling in the features and unique selling points.
Marketing
In this month’s PC Pro, Tim Danton stated that “open-source will never pose a threat to Microsoft”. His argument basically boils down to the fact that Open Source software websites are not as good as the competition – he says that many Open Source project sites consist of just plain text and links. This is clearly an issue about marketing.
To be honest, sure, there are some pretty dog-awful websites that push Open Source projects out there, and many do indeed consist of boring black text on a white background and a few blue links. But, I would say this is rare for major Open Source projects, and is mainly the case for niche applications, and the same limitations in fascia can be applied to niche software on any platform.
Since my entry to the Open Source community, I have seen developers evolve. Back in the early days, developers were largely code heads who cared for nothing but code. Many of these developers wrote awesome code, but produced terrible websites, ugly interfaces and terse documentation. As Open Source developed and became a serious and credible platform, developers have evolved into code heads with an experience and respect for project management, release schedules, usability, documentation and marketing. Take a look at any of the major Open Source projects and you can see well organised development teams with contributors who help in many different areas such as documentation, art, websites, coding and marketing. This is evolution doing its thing, and the average Open Source developer maintains a remarkably diverse range of skills and appreciation of these different areas. Open Source development process and practice basically grew up as the platform grew.
We have a strong, diverse and fast developing platform, and the really exciting time will come as we surpass the competition in many different areas. This is not a one-horse race – we may steam ahead in web browser technology, but we may lag in CAD software. But, as Open Source grows and the platform grows, each of these areas looks more and more promising every day. We just need to look at our history to understand our future.
Posted on July 7, 2006 - by jono
Thinking about GNOME 3.0
As many of you will know, I am quite the usability pervert. Understanding how people use computers and creating better and more intuitive interfaces fires me up, and the mere idea of GNOME 3.0 is interesting to me. The reason why I find GNOME 3.0 exciting is that it presents a dream for us; we are not entirely sure where we are going, but we know it needs to be different, intuitive and better for our users.
While at GUADEC I sat down for a while with MacSlow, and he gave me a demo of Lowfat. For quite some time I have had some vague ideas for the approximate direction of GNOME 3.0, and some of Mirco’s work triggered some of these latent concepts and mental scribblings. I am still not 100% sure where I would like to see GNOME 3.0 go, but some of the fundamental concepts are solidifying, and with my recent addition to the mighty Planet GNOME, I figured I should share some ideas and hopefully cause some useful discussion. I am going to wander through a few different concepts and discuss how we should make use of them in GNOME 3.0, culminating in some ideas and food for thought.
A more organic desktop
One of the problems with the current GNOME is that it largely ignores true spatial interaction. Sure, we have spatial Nautilus (which is switched off in many distros), but spatial interaction goes much further. If you look at many desktop users across all platforms, the actual desktop itself serves as a ground for immediate thinking and ad-hoc planning in many different ways:
- Immediate Document Handling – the desktop acts as an initial ground for dealing with documents. Items are downloaded to the desktop and poked at before entering more important parts of the computer such as the all-important organised filesystem.
- Grouping – users use the desktop to group related things together. This is pure and simple pile theory. The idea is that people naturally group related things together into piles. Look at your desk right now – I bet you have things grouped together in related piles and collections. We don’t maximise this natural human construct on our desktop. More on this later.
- Deferred Choices – the desktop serves as a means to defer choices. This is when the user does not want to immediately tend to a task or needs to attend to the task later. An example of this is if you need to remember to take a DVD to work the next day – you typically leave it next to the front door or with your car keys. The analogy with the current desktop is that you would set an alarm in a special reminder tool. More people set ad-hoc reminders than alarms.
- Midstream Changes – it is common for users to begin doing a task, and then get distracted or start doing something else. An example of this is if you start making a website. You may make the initial design, and then need to create graphics and get distracted playing with different things in Inkscape. The desktop often acts as a midstream dumping ground for these things. Work-in-progress documents are often placed on the desktop, and this acts as a reminder to pick them up later (see deferred choices earlier).
It is evident that the desktop is an area that provides a lot of utility, and this utility maps to organic human interaction – collecting things together, making piles, creating collections, setting informal reminders, grouping related things. These are all operations on things, and are the same kind of operations we do in our normal lives.
Part of the problem with our current desktop is that there is a dividing line between things on the desktop and things elsewhere. It is a maze of metadata and interconnected entities that should be part of the desktop itself. As an example, when I was writing my book, I created word processor documents, kept notes about the book in Tomboy, saved bookmarks in Firefox and kept communications in Gmail and Gaim. The single effort of writing a book involved each of these disparate, unconnected resources storing different elements of my project. I would instead like to see these things much more integrated into (a) contextual projects, and (b) made manageable at a desktop level. More on contextual projects later.
Blurring the line between files, functions and applications
The problem we have right now is that the desktop is just not as integrated as it could be. If you want to manage files, you do it in a file manager, if you want to do something with those files you do it in an application, if you want to collect together files into a unit, you use an archive manager. Much of this can be done on the desktop itself, but we need to identify use cases and approach the problem from a document-type level.
Let me give you an example. A common type of media is pictures, photographs and other images. The different things you may want to do with those images include:
- Open them
- Edit them
- Compare them
- Collect them together by some form of relevance (such as photos from a trip, or pictures of mum and dad)
- Search for them
These tasks involve a combination of file management, photo editing applications, photo viewing applications and desktop search. Imagine this use case instead:
I want to look through my photos. To do this I jump to the ‘photo collection’ part of my desktop (no directories) and my collection has different piles of photos. I can then double-click on a pile and it opens up in front of me. Each photo can be picked up and moved around in much the same way as on a real desk (this is inspired by MacSlow’s LowFat). I can also spin photos over and write notes and details on the back of them. Using my photos I can put two side by side and increase the size to compare them, or select a number of photos and make them into a pile. This pile can then be transported around as a unit, and maybe flicked through like a photo album. All of this functionality is occurring at the desktop level – I never double-click a photo to load it into a photo viewer, I just make the photo bigger to look at it. All of the manipulation and addition of metadata (by writing on the back of the photo) is within the realm of real world object manipulation, and obeys pile theory and spatial orientation.
The point here is that the objects on the desktop (which are currently thought of as icons in today’s desktop) are actual real world objects that can be interacted with in a way that is relevant to their type. In the above use case you can make the items bigger to view them, compare them side by side, and scribble notes on them. These are unique to certain documents and not others. You would not zoom, compare and scribble notes on audio for example, but you would certainly use pile theory on audio to collect related audio together (such as albums).
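Pile theory, as used in the photo use case above, boils down to a small set of operations: turn a selection into a pile, move it around as one unit, flick through it, and explode it back into loose items. The Python sketch below models just those operations. `Item`, `Pile` and `make_pile` are hypothetical names, and the `notes` list is an assumed stand-in for writing on the back of a photo; this illustrates the concept and is not code from GNOME or LowFat.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Item:
    """A desktop object such as a photo (hypothetical model)."""
    name: str
    kind: str
    # Stand-in for notes scribbled on the back of the object.
    notes: list[str] = field(default_factory=list)


@dataclass
class Pile:
    """A stack of items that is handled as a single unit."""
    label: str
    items: list[Item] = field(default_factory=list)

    def add(self, item: Item) -> None:
        # New items land on top of the pile.
        self.items.append(item)

    def flick_through(self) -> Iterator[Item]:
        # Walk the pile from the top down, like leafing through an album.
        yield from reversed(self.items)

    def explode(self) -> list[Item]:
        # Scatter the pile back into loose items and leave it empty.
        loose, self.items = self.items, []
        return loose


def make_pile(label: str, selection: list[Item]) -> Pile:
    """Turn a selection of loose items into one pile."""
    return Pile(label, list(selection))
```

Because a `Pile` is one object, moving or relabelling it moves every photo in it, which is the "transported around as a unit" behaviour described above.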
Applications
So, if we are trying to keep interaction with objects at the desktop level, how exactly do we edit them and create new content? How do today’s applications fit into this picture? Well, let me explain…
The problem with many applications is that they provide an unorganised collection of modal tools that are not related to context in any way. I have been thinking about this a lot recently with regards to Jokosher, and this was discussed in my talk at GUADEC. Take, for example, Steinberg’s Cubase:
In Cubase, if you want to perform an operation, you need to enable a tool, perform the operation, disable the tool and then do something else. There is a lot of tool switching going on, and toolbar icons are always displayed, often in situations when that tool either cannot be used or just would not make sense to be used. The problem is that it follows the philosophy of always showing lots of tools onscreen because it makes the app look more professional. Sure, it may look professional, but it has a detrimental impact on usability.
I believe that tools should only ever be displayed when pertinent to the context. As an example, in Jokosher we have a bunch of waveforms:
The first point here is that we don’t display the typical waveforms you see in other applications. Waveforms are usually used to indicate the level in a piece of audio, and as such, we figured that musicians just want to see essentially a line graph, instead of the spiky waveform in most applications. This immediately cuts down the amount of irrelevant information on screen. Now, if you select a portion of the wave in Jokosher, a tray drops down:
(well, at the moment, it drops down, but does not visually look like a tray, so run with me on this for a bit!)
Inside this tray are buttons for tools that are relevant to that specific selection. Here we are only ever displaying the tools that are pertinent to the context, and this has a few different benefits:
- We don’t bombard our users with endless toolbars
- Tools are always contextual, which makes the interface more discoverable and intuitive
- We restrict the potential for error by restricting the number of tools available for a given context
- There are fewer buttons to accidentally click on, and this lowers the hamfistability of our desktop
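On the level display mentioned above: the post says musicians want "essentially a line graph" of level rather than a spiky waveform, but not how Jokosher computes it. One common approach, assumed here, is to reduce each block of samples to a single root-mean-square (RMS) value and draw a line through those values. `level_envelope` and its tiny default block size are hypothetical choices for illustration; real audio would use blocks of hundreds or thousands of samples.

```python
import math


def level_envelope(samples: list[float], block_size: int = 4) -> list[float]:
    """Reduce raw samples to one RMS level per block.

    Plotting the returned values as a line gives a smooth level graph
    instead of a spiky waveform. The final block may be shorter than
    block_size and is averaged over its actual length.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    levels = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        # Root mean square: square, average, then take the square root.
        levels.append(math.sqrt(sum(s * s for s in block) / len(block)))
    return levels
```

For example, a loud alternating block followed by silence, `[1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]`, collapses to the two-point line `[1.0, 0.0]`.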
Now, take this theory of contextual tools, and apply it at the desktop level. Using our example from earlier with photos, I would like to see contextual tools appear when you view a photo at a particular size. So, as an example, if I have my collection of photos and I increase the size of a photo to look at it, I would like to see some context toolbars float up when I hover my mouse over the photo to allow me to make selections. When I have made a selection I should then see more tools appear. There are two important points here:
- Firstly, you don’t load the photo into an application. As you view the photo at the desktop level, the functionality associated with an editing application seamlessly appears as contextual tools. This banishes the concept of applications. Instead, you deal with things, and interact with those things immediately.
- Secondly, tools are always contextual, and relevant to the media type. For a photo or a document, selections make sense; for an audio file, you should be able to apply effects and adjust the volume; for a text document, you should be able to change font properties. Everything is relevant to the context.
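Contextual tools can be modelled as a registry in which every tool carries a predicate over the current context, and the interface shows only the tools whose predicate holds, so nothing is ever greyed out. The Python sketch below assumes a context made of just the media type and whether something is selected; the `Context`, `Tool` and `visible_tools` names, the tool names and their rules are all hypothetical, loosely following the photo, audio and text examples above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Context:
    """What the user is looking at (a deliberately tiny, assumed model)."""
    media_type: str      # e.g. "photo", "audio", "document"
    has_selection: bool  # has the user selected part of it?


@dataclass(frozen=True)
class Tool:
    name: str
    # Predicate: does this tool make sense in the given context?
    applies: Callable[[Context], bool]


# Hypothetical tool registry; order here is the order tools are shown.
TOOLS = [
    Tool("Select region", lambda c: c.media_type == "photo" and not c.has_selection),
    Tool("Crop", lambda c: c.media_type == "photo" and c.has_selection),
    Tool("Copy selection", lambda c: c.media_type in ("photo", "document") and c.has_selection),
    Tool("Apply effect", lambda c: c.media_type == "audio"),
    Tool("Adjust volume", lambda c: c.media_type == "audio"),
    Tool("Change font", lambda c: c.media_type == "document"),
]


def visible_tools(context: Context) -> list[str]:
    """Return only the tools that apply; inapplicable ones are hidden, not greyed out."""
    return [tool.name for tool in TOOLS if tool.applies(context)]
```

For a photo with nothing selected this yields only `Select region`; once a selection exists, `Crop` and `Copy selection` appear instead, mirroring the "make a selection, then see more tools" flow. Keeping the rules as data also means a new media type only needs new registry entries.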
The contextual desktop at a project level
To really make the desktop feel contextual, we need to be able to make it sensitive to projects and tasks. At the moment, tasks and sub-tasks are dealt with in isolation as opposed to being part of a bigger, grander picture. Let me give you a use case with today’s desktop:
I am working on a project with a client to build a website for him. I decide I want to send some emails to him, so I fire up my email client and dig through my inbox to find the mail he sent me yesterday. I then reply to him. As I work, I realise that I need to speak to him urgently, so I log on to IM. Within my buddy list I see that he is there, so I have a chat with him. While chatting, my friend pops up to discuss the most recent Overkill album. As I am working, I don’t really want to talk about the album right now, so I either ignore him or make my excuses. After finishing the discussion with the client, I load up Firefox and hunt through my bookmarks to find a relevant page and start merging the content into the customer’s site. To do this I load up Bluefish and look through the filesystem to find my work files and begin the job.
The problem here is that the relevant work is buried deep in other irrelevant items. To make matters worse, some resources such as IM can just prove too distracting, and may never get used (remember midstream changes earlier). As such, the really valuable medium of IM is never used for fear of distraction. Now imagine the use case as this:
I am working on a project with a client to build a website for him. I find my collection of projects and enable it, and my entire desktop switches context to that specific project. Irrelevant applications such as games are hidden, and relevant resources are prioritised. When I communicate with the client, only emails and buddies relevant to that project are displayed. When I want to find resources (such as documents) to work on, only those documents that are part of the project are displayed. The entire desktop switches to become aligned to my current working project. This makes me less distracted, more focused and there is less clutter to trip over.
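The project switch described above amounts to a filter: every resource records the projects it belongs to, and enabling a project hides whatever is not tagged with it. The Python sketch below assumes resources are already tagged by project name (the post doesn't say how membership would be determined, whether by hand or by some indexer); `Resource` and `in_context` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """Anything the desktop can show: an email, a buddy, a document, a game."""
    name: str
    kind: str
    # Projects this resource belongs to (assumed to be tagged already).
    projects: set[str] = field(default_factory=set)


def in_context(resources: list[Resource], project: str | None) -> list[Resource]:
    """Return what the desktop should show for the active project.

    With no project active, everything is visible. With a project active,
    only resources tagged with it are shown; everything else is hidden,
    not deleted, and reappears when the project is switched off.
    """
    if project is None:
        return list(resources)
    return [r for r in resources if project in r.projects]
```

Filtering rather than moving things means the same email or document can belong to several projects at once, and leaving a project context restores the full desktop unchanged.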
I actually had this idea a while back, and wrote an article describing it in more detail. Feel free to have a read of it.
Conclusion
The point of this blog post is not to sell you these concepts, but to identify some better ways of working which are more intuitive and more discoverable. Importantly, we need to make our desktop feel familiar. This was a point Jef Raskin made as part of his work, and I agree. Some people have been proposing some pretty wacky ideas for GNOME 3.0, but grandiose UI statements mean nothing unless they feel familiar and intuitive. What I am proposing is an implementation of real world context, relevance and physics into our desktop. This will make it more intuitive, less cluttered, less distracting and a better user experience.
I really want to encourage some genuine discussion around this, so feel free to add comments to this blog post, or reply via Planet GNOME. Have fun!
Posted on January 13, 2006 - by jono
Building the Perfect Audio Editor
In the past, I have gone into quite some detail about why I believe that the current crop of Linux multi-trackers are not up to the job of recording both my own music and LUGRadio. To remedy this, Aq and I sat down for a few evenings and fleshed out a bunch of ideas about how an audio editor should work. Although much of the interface can be inspired by contemporary editors such as Cubase, there is no point in just re-implementing the same interface for the sake of it. It makes sense to step back and re-think the interface in ways that make the application more usable and easier to learn.
The aim of this short document is to give a clear understanding of what I would love to see in a general purpose audio editor, and to share some of the subtle ways in which we have thought through some functional and interface concepts. This document is by no means complete and really is just the result of a first round of thinking: building an audio editor is a complex job, and so is designing one. However, this guide should hopefully provide a solid grounding from which to move forward.
Of course, the plan is to see these ideas implemented in something such as JonoEdit (their name, not mine), which has been discussed on the forums in some detail and also has a wiki page. Thanks to everyone who is keen to get involved in it.
Functionality
The editor really needs to have this core set of features to be useful:
- The ability to record from any ALSA sound card, including multi-input cards such as the M-Audio Delta 44.
- Non-destructive editing.
- Undo/Redo (at least to a reasonable limit, but preferably unlimited).
- The ability to edit the volume curve in different parts of the track.
- Resizable track views to easily zoom in and out of a waveform.
- Support for effects plug-ins, most notably LADSPA.
- The ability to apply effects to an entire track or a selected portion of a track. When applying effects, the user should be able to preview the sound with the effect before it is applied.
- Mastering to OGG, MP3 and WAV.
- Importing OGG, MP3 and WAV.
- Most importantly, it should be easy and intuitive to use.
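Non-destructive editing and unlimited undo/redo fit together naturally: the recorded audio is never modified, and each edit only changes a list of regions that point into it. The following is a minimal Python sketch under that assumption; `Region` and `EditHistory` are hypothetical names, and a real editor would store far more per region (fades, gain, effects).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A window onto an untouched source recording, measured in samples."""
    source: str
    start: int  # first sample used from the source
    end: int    # one past the last sample used


class EditHistory:
    """Keeps every state of the region list so edits can be undone and redone."""

    def __init__(self, regions: list[Region]):
        self._undo: list[tuple[Region, ...]] = []
        self._redo: list[tuple[Region, ...]] = []
        self.regions: tuple[Region, ...] = tuple(regions)

    def apply(self, new_regions: list[Region]) -> None:
        # Remember the current state, then replace it; a fresh edit clears redo.
        self._undo.append(self.regions)
        self._redo.clear()
        self.regions = tuple(new_regions)

    def undo(self) -> None:
        if self._undo:
            self._redo.append(self.regions)
            self.regions = self._undo.pop()

    def redo(self) -> None:
        if self._redo:
            self._undo.append(self.regions)
            self.regions = self._redo.pop()


history = EditHistory([Region("guitar.wav", 0, 44100)])
history.apply([Region("guitar.wav", 0, 22050)])  # trim off the second half
history.undo()
print(history.regions)  # the full region is back; guitar.wav itself was never changed
```

Because the states are immutable tuples, the undo stack is only limited by memory, which makes "preferably unlimited" undo cheap to offer.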
There are, of course, some non-essential features that would be nice:
- Support for VST plug-ins or good modules with (and this is important) sensible defaults. Modules I use are Compression, Limiting, DoubleDelay, EQ, Reverb etc.
- Sensible wave editing. I only use simple stuff, but I should be able to cut and merge waves and insert fades and silence easily.
- A well-documented Python API so that users can write their own scripts to extend JonoEdit.
Interface thoughts
The main area in which Linux audio software seems to have a problem is the interface. Editors of today seem rather complex, unwieldy beasts that rely on a lot of technical knowledge and a keen understanding of the parlance of multi-track recording. There is no reason why the software needs to be this complex. Aside from rethinking some of these interface elements, something as simple as changing the jargon is useful. Why call them Tracks? They are really Instruments. Why refer to collections of tracks as Subgroups? They are really Combinations. Simple changes such as these, along with some simple and effective design, could create a ground-shaking audio editor.
The point here is not to satisfy every possible music recording edge case. The point is to create a sensible editor that satisfies the general requirements of people who want to record some music and audio on their computer. As such, JonoEdit leaves out lesser-used or overly complex functionality and focuses instead on key features. Naturally, I have modelled this around my own needs, which I feel are fairly typical. I record music (Recreant View) and a radio show (LUGRadio), and these two projects use a good subset of the features in an audio editor.
The interface should follow some simple golden rules, mostly nicked from good design theory:
- The interface should be as simple and intuitive as possible. The need to read the manual should be kept to a bare minimum, and features should be as discoverable as possible.
- The user should never be asked for information the computer can figure out itself. Never query the user for details that can be detected, discovered or reasonably assumed.
- The language of the application should be the language that musicians understand.
- The interface should work in a visual way that maps to the context of music recording. As an example, if you need to connect something (such as an Instrument to a Combination Socket), drag a line as if you were making a connection with a cable.
- The user should need to remember as few commands and screens as possible.
Suggested design
To solve these problems, we have come up with a suggested design. Again, this interface is by no means complete, but it provides a simple and consistent design with plenty of room for expansion. The current design caters for the major features discussed above.
The application is split into three major areas (called Workspaces):
- Recording – in this view, the different instruments are displayed. This view is typically used when the music is recorded.
- Compact Mix – this view combines a smaller Recording view and the mixing sliders. This is ideal for setting rough volume settings.
- Detailed Mix – this view is where the detailed mixing happens. In this view more detailed sliders are displayed as well as Combinations (the new name for Subgroups) and other bits and pieces.
Each of these views should be mapped to keyboard shortcuts such as F5, F6 and F7, as well as to large toolbar buttons. With only three main views to remember, the application is easier to learn, and each of those three views can service most typical needs. These three views are also quick to access: once the user learns the shortcuts, it is quick to navigate around the application. Another key decision has been to avoid MDI and floating windows. Floating windows are a pain in Cubase and require a lot of mousing around to stop windows obscuring each other. This design avoids all of this.
Getting started
When the application starts[1], it will look something like this:
[1] Well, the application will display this interface, but it will no doubt also throw up a getting-started wizard.
This is the interface with no instruments loaded into it. I have numbered the different parts of the application to describe how it fits together:
- The toolbar. The toolbar contains a limited selection of large, essential icons. Unlike other multi-trackers, this one should not bombard the user with a huge number of buttons. When recording music, you are often standing up holding a guitar, and need large buttons that don’t require a great degree of mousing accuracy. These buttons include Record, Play and Stop, as well as buttons to add a New Instrument and for the click track.
- The menu bar.
- The timing view. Sat on the toolbar is the timing view which displays the time of the track. This can be flexible. It would be good to not only show the current time but also have different modes such as a bar count and possibly even show the total length of the recording.
- The workspace buttons. These are toggle buttons that switch between the different workspaces discussed above. Note: each of the icons shown on the buttons would be different, and not the swirls shown here.
- This area contains a few elements. On the left side is the click track time. If the user clicks this box, they can change the regularity of the clicks. To the right of this are two long lines. The top line is where markers can be placed to set the start and end points of sections (as an example, if you record a song and want to master it to a single audio file, you would click the line to add a starting marker and also add an ending marker; the song would then be mastered between those two markers). Beneath that line is a grid which marks the time in minutes and seconds, or in bars. The lines that divide this area mark each click of the click track.
- This is where the workspaces display their views.
- This portion of screen is left clear. This would be good to use for status labels and any essential editing buttons.
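The bar-count mode of the timing view, the click lines on the grid and the mastering markers all come down to simple arithmetic. The following is a minimal Python sketch, assuming a constant tempo in beats per minute, a fixed number of beats per bar and a 44.1 kHz sample rate; these parameters and the function names are illustrative assumptions, since the design does not yet say how tempo is set.

```python
def seconds_to_bar_beat(seconds: float, bpm: float, beats_per_bar: int = 4) -> tuple[int, int]:
    """Convert a playhead time into a 1-based (bar, beat) pair for the bar-count display."""
    beats_elapsed = int(seconds * bpm / 60)  # whole beats since the start of the project
    return beats_elapsed // beats_per_bar + 1, beats_elapsed % beats_per_bar + 1


def click_times(length_seconds: float, bpm: float) -> list[float]:
    """Times (in seconds) at which the click track clicks, i.e. where the grid lines go."""
    interval = 60 / bpm
    count = int(length_seconds / interval) + 1
    return [i * interval for i in range(count)]


def marker_range(start_seconds: float, end_seconds: float, sample_rate: int = 44100) -> range:
    """Sample indices to master between the start marker and the end marker."""
    return range(round(start_seconds * sample_rate), round(end_seconds * sample_rate))


print(seconds_to_bar_beat(5.0, bpm=120))  # → (3, 3): 10 beats in, third beat of bar 3
print(click_times(2.0, bpm=120))          # → [0.0, 0.5, 1.0, 1.5, 2.0]
```

Changing the click track time in the box on the left would then just mean changing `bpm` and redrawing the grid from `click_times`.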
When the user clicks on the New Instrument button, they see something such as this:
The user selects the instrument(s) that they want to add. The term Tracks has been thrown out and replaced with Instruments. When you record something in a multi-tracker you are always recording from an instrument. This could be a guitar, woodwind instrument, bass, drum, vocal microphone or even a sound effect. Using the term Instrument not only relates to the language of audio production, but also brings some other key benefits:
- A small image of the instrument in question can be placed on the screen and used as a visual cue to see which instrument is which.
- By specifying a particular instrument, certain effects could be set up with some sane defaults. As an example, if you are going to add a vocal instrument, you are likely going to want a specific type of compression (audio compression, not data compression) and some subtle reverb. Setting these sane defaults provides a great starting point for users to customise these effects. Again, this may not be suitable for everyone, but I suspect it will be suitable for the majority of cases.
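Per-instrument defaults could be as simple as a table of starting effect settings keyed by instrument type. In this minimal Python sketch the effect names and values are illustrative assumptions only, not recommended settings and not specific LADSPA plug-ins; `INSTRUMENT_PRESETS` and `default_effects` are hypothetical names.

```python
from copy import deepcopy

# Hypothetical starting points; every name and value here is an illustrative assumption.
INSTRUMENT_PRESETS: dict[str, list[dict]] = {
    "vocal": [
        {"effect": "compressor", "ratio": 3.0, "threshold_db": -18.0},
        {"effect": "reverb", "wet": 0.15},
    ],
    "bass": [
        {"effect": "compressor", "ratio": 4.0, "threshold_db": -20.0},
    ],
    "guitar": [
        {"effect": "reverb", "wet": 0.10},
    ],
}


def default_effects(instrument: str) -> list[dict]:
    """Return a fresh, user-editable copy of the preset chain (empty if unknown)."""
    return deepcopy(INSTRUMENT_PRESETS.get(instrument, []))


chain = default_effects("vocal")
chain[1]["wet"] = 0.25  # the user tweaks the reverb; the shared preset is untouched
```

Returning a copy means a user’s tweaks to one vocal track never leak into the defaults offered for the next one.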
When the instrument(s) have been selected, the screen looks like this:
Each instrument that is added to the main part of the screen has some key features:
- The name and image of the instrument are on the left side of the box. This makes the instrument easy to identify. The name is clickable, so the user can change it to something specific to them, such as the name of the person who plays the instrument.
- The record toggle button enables that instrument for recording.
- The M button mutes the instrument and the S button turns on Solo, in which only that instrument is heard. Both of these buttons are toggle buttons.
- The black wavy line is the volume curve for the track. In most multi-track editors a waveform is displayed, but in reality a waveform is not really required, and a simple volume curve is more attractive and more useful. Note: the area below the volume curve would be shaded a colour, but I could not figure out how to do this in Inkscape.
- The watermarked instruments that appear behind the volume curve help identify the instrument easily.
- The red line is the playhead in the project. The user can click the timing line to move the playhead.
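One way to draw a volume curve instead of a full waveform is to reduce the samples to one loudness value per block, for example the RMS (root mean square) level of each block. The following is a minimal Python sketch, assuming mono floating-point samples in the range -1.0 to 1.0 and a hypothetical block size; RMS is one common choice here, not necessarily what the final editor would use.

```python
import math


def volume_curve(samples: list[float], block_size: int = 1024) -> list[float]:
    """Return the RMS level of each block of samples, one curve point per block.

    Plotting these points left to right gives a smooth volume curve that is
    cheap to update while recording, since new blocks are simply appended.
    """
    curve = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        curve.append(math.sqrt(sum(s * s for s in block) / len(block)))
    return curve


# A quiet passage followed by a loud one, in blocks of 4 samples: roughly [0.1, 0.8].
points = volume_curve([0.1, -0.1, 0.1, -0.1, 0.8, -0.8, 0.8, -0.8], block_size=4)
print(points)
```

Because each point depends only on its own block, the curve can grow in real time as the instrument is being recorded.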
To record an instrument the user would do this:
- Tick the small record button on the relevant instrument(s).
- Move the playhead to the start of the track and hit the main Record button on the toolbar. As the user plays their piece, the volume curve is updated as it records.
When the piece is recorded, the user can perform a number of adjustments to the instrument track:
- The volume curve can be grabbed with the mouse cursor and dragged to adjust the volume visually.
- Tools can be used to cut the waveform. To do this, a tool is selected (I have not added any of these tools to the interface yet, but they could appear in the space at the bottom of the screen) and the user can click at any position on the instrument. At that point, a cut is made and then the user can select the left or right part of the cut and delete it (by pressing the Delete key) or move it by dragging it with the mouse.
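Because editing is non-destructive, the cut tool only needs to split one clip into two clips that point at the same recording; deleting a part removes one of them, and moving a part shifts its timeline position. The following is a minimal Python sketch, assuming positions are measured in samples; the `Clip` type and the `cut` and `move` functions are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    """A piece of a recording placed on an instrument's timeline (all in samples)."""
    source_start: int    # where this clip starts within the recording
    length: int          # how many samples it plays
    timeline_start: int  # where it sits on the instrument's timeline


def cut(clip: Clip, at: int) -> tuple[Clip, Clip]:
    """Split a clip at timeline position `at` into left and right parts."""
    offset = at - clip.timeline_start
    if not 0 < offset < clip.length:
        raise ValueError("cut position must fall inside the clip")
    left = replace(clip, length=offset)
    right = Clip(clip.source_start + offset, clip.length - offset, at)
    return left, right


def move(clip: Clip, new_start: int) -> Clip:
    """Drag a clip to a new timeline position; the audio it points to is unchanged."""
    return replace(clip, timeline_start=new_start)


left, right = cut(Clip(source_start=0, length=1000, timeline_start=500), at=800)
# left covers timeline 500-800; right covers 800-1500 and starts 300 samples into the recording
```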
With the piece recorded, the user is likely to want to adjust the volume of the recording. To do this they hit the Compact Mix or Detailed Mix buttons.
Compact Mixing
The idea behind the compact mix workspace is to see both the instrument view and some main fader controls. This view is useful when you need to see the different parts in the different instruments, but also be able to mix them. When you have a complex song with lots of parts that start in different sections of the song, a view like this is essential. This view improves on MDI-based tools, as there is no obscuring of the instrument or faders: both are clearly in view.
It looks like this:
On the left side, each fader is displayed with three buttons:
- The mute button (M) – this mutes the instrument.
- The solo button (S) – this mutes everything else but this instrument.
- The effect button (E) – this opens up another dialog to configure effects that are applied to the instrument.
The effects button is used to set up effects for that particular instrument. The use of effects is not covered in this document and will be developed later. From a UI perspective, all we need to know for now is that when you hit the E button, another dialog box pops up to apply effects; that dialog box will be designed later.
The rectangular region to the right of each fader is a VU meter that displays the current volume when the project is playing. This meter shows in real time the current volume of each instrument and the fader can then be used to adjust the volume. This VU meter should also be coloured like most VU meters with green at the bottom, amber in the middle and red at the top. This colour scheme is an established system in the majority of recording equipment.
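The green/amber/red colouring is usually driven by the level in decibels relative to full scale (dBFS). The following is a minimal Python sketch, assuming floating-point samples where 1.0 is full scale; the -18 dBFS and -6 dBFS zone boundaries are illustrative assumptions, and a peak reading is used for simplicity, whereas a true VU meter responds more slowly by averaging the level.

```python
import math


def peak_dbfs(samples: list[float]) -> float:
    """Peak level of a block of samples in dBFS (0.0 means full scale)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def meter_colour(level_db: float, amber_from: float = -18.0, red_from: float = -6.0) -> str:
    """Pick the meter zone for a level: green at the bottom, then amber, then red."""
    if level_db >= red_from:
        return "red"
    if level_db >= amber_from:
        return "amber"
    return "green"


print(meter_colour(peak_dbfs([0.05, -0.02])))  # → green (about -26 dBFS)
```

Each fader’s meter would run this on the block currently being played and light its segments up to the resulting level.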
To the right of the screen are the (L)eft and (R)ight volume faders and their VU meter. These faders control the general volume for the project.
Detailed Mixing
Detailed mixing occurs when the instruments have been recorded and the parts no longer need to be edited and moved in the instrument view. All that is left is to apply effects and mix the project.
This is what the view looks like:
Inside this view, the instrument faders are visible, and each fader has similar buttons to the Compact Mix view. There are a few differences though:
- The Shape button is used to adjust the EQ of the track. There must be a better word than Equalisation to describe what it does, and Shape was my first reasonable idea. I am sure there is a better word though. Clicking this button opens up another window to set the EQ on the instrument.
- With the sliders taking up the full size of the window, they allow finer-grained, more accurate control.
Another feature of the Detailed Mix workspace is Combinations. In the multi-tracking world, there is a concept called Subgroups. The idea behind a subgroup is that you can take a bunch of tracks and assign them to a single fader. As an example, imagine you are recording a drum kit. You have five instruments added, one for each part of the kit. When you have the general volume of the sliders set in relation to each other (such as making sure the snare is louder than the hi-hat), you can then assign those sliders to a subgroup slider. This single subgroup slider then controls the five drum sliders.
In other software, setting up these subgroups is complex, so I have re-thought this. Firstly, let’s throw Subgroup out of the window and call them Combinations instead. To create combinations, you click on a menu item and get this dialog box:
On the left side are the current instruments, and on the right side are the Combination Sockets. To make the connection, just drag from the instrument to the correct socket, and a line will be displayed to indicate the connection has been made. Simple as that!
Back in the Detailed Mix window, you can see the Combinations sliders on the right hand side. Currently I have not visually indicated in this view which instruments are assigned to which Combination slider. This is something I am currently thinking about.
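In gain terms, a Combination slider simply adds its setting to each connected instrument’s own fader, so the balance set between the drum sliders is preserved while one slider moves them all. The following is a minimal Python sketch, assuming faders are expressed in decibels; `combined_gain` and the example kit settings are hypothetical.

```python
def db_to_linear(db: float) -> float:
    """Convert a fader setting in decibels to a linear gain multiplier."""
    return 10 ** (db / 20)


def combined_gain(instrument_db: dict[str, float], combination_db: float) -> dict[str, float]:
    """Linear gain for each instrument connected to a Combination slider.

    Decibel settings add, so every connected instrument moves by the same amount
    and the relative balance between them is unchanged.
    """
    return {name: db_to_linear(db + combination_db) for name, db in instrument_db.items()}


# Five drum instruments, balanced against each other, then pulled down 6 dB together.
drums = {"kick": 0.0, "snare": -2.0, "hi-hat": -8.0, "tom": -4.0, "overhead": -6.0}
quieter = combined_gain(drums, combination_db=-6.0)
```

Working in decibels is what keeps this simple: a single addition per instrument, with the snare staying 6 dB above the hi-hat wherever the Combination slider sits.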
Conclusion
So there we have it: the proposed design for the bulk of what I would consider the perfect recording tool, a tool that combines power, ease of use and good design. I really, really hope that the discussions about implementing these ideas go ahead, and I hope that the implementation uses Python, GStreamer, PyGTK and Glade, so that I can contribute where I can.
Naturally, if there are any questions or thoughts, do use the comment box to post them. Thanks.