Download Files Async With Gio And Python

Recently I asked for some help on how to download a file without blocking the GUI. Thanks to everyone who contributed their expertise in the post comments: I now have my program working great.

I now want to share my conclusions so that others can benefit from them too. First I will explain how this works, and I have also created a Python Snippet and added it to the Python Snippets library, so there is a working example you folks can play with. You can use Acire to load the snippet and play with it. This is the first gio snippet, and I hope there will be many more. :-)

The goal I set out with was to download a file without freezing the GUI. This was somewhat inspired by a recent Shot Of Jaq shot that we did on async programming, and I used this app as a good example to play with. Typically I had downloaded files the manual way and this had blocked my GUI hard, but I was aware that this is exactly the problem that gio, part of the GNOME platform, is here to solve.

The way async basically works is that you kick off an operation and then you wait for confirmation of the result before you proceed. It is the opposite of the usual blocking style of programming: you don’t kick off an operation and process its result on the very next line. When you do things the async way, you start an operation and tell it which callback should be called when it is complete. It feels very event-driven: much like connecting a handler to a signal on a widget so that when that signal is emitted, the handler is called.
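
As a rough illustration of that flow, here is a minimal sketch in plain Python with no gio or GTK involved. ToyLoop and double_async are made-up names for this example, and the toy loop only stands in for a real GUI main loop.

```python
from collections import deque


class ToyLoop(object):
    """Hypothetical stand-in for a GUI main loop: runs queued functions in order."""

    def __init__(self):
        self._pending = deque()

    def call_soon(self, func):
        # Queue the function; nothing runs until run() is called.
        self._pending.append(func)

    def run(self):
        # Keep dispatching until there is nothing left to do.
        while self._pending:
            func = self._pending.popleft()
            func()


def double_async(loop, value, callback):
    """Kick off an operation; the result arrives later through the callback."""
    loop.call_soon(lambda: callback(value * 2))


log = []


def on_complete(result):
    log.append("result: %d" % result)


loop = ToyLoop()
double_async(loop, 21, on_complete)  # returns straight away, no result yet
log.append("kicked off")             # this line runs before the callback does
loop.run()                           # the loop now delivers the result
```

After loop.run() returns, log holds "kicked off" first and "result: 42" second: the line after the kick-off ran before the result was available, which is the whole point.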

When I started playing with this the docs suggested that read_async() and read_finish() were what I needed to use. I started off with code that looked a little like this:

def download_latest_shot(self):
    audiourl = "http://....the url to the Ogg file...."

    self.shot_stream = gio.File(audiourl)
    self.shot_stream.read_async(self.download_latest_shot_complete)

It then calls this callback:

def download_latest_shot_complete(self, gdaemonfile, result):
    f = self.shot_stream.read_finish(result).read()

    outputfile = open("/home/jono/Desktop/shot.ogg","w")
    outputfile.writelines(f)

After some helpful notes from the GNOME community, it turned out that what I really needed was load_contents_async(), which downloads the full content of the file (read_async() merely opens a stream to read from). You pass it a callback, and when the download is complete that callback calls load_contents_finish() to get hold of the content. This worked great for me.
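
The same start-then-finish shape shows up outside gio too. As a loose analogy only (this is the Python 3 standard library, not gio, and the temporary file and helper names are made up for the example), concurrent.futures hands the callback a handle and the callback asks that handle for the data, much as the gio callback calls load_contents_finish() on the result it is given:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

received = []


def load_contents(path):
    """Blocking helper (hypothetical): read the whole file into memory."""
    with open(path, "rb") as handle:
        return handle.read()


def on_loaded(future):
    # Comparable to calling load_contents_finish(result) in the gio callback:
    # the callback receives a handle and asks it for the actual content.
    received.append(future.result())


# Create a small throwaway file to load.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello gio")
    path = tmp.name

# Leaving the with block waits for the worker thread, so the callback has run.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(load_contents, path)
    future.add_done_callback(on_loaded)

os.remove(path)
```

One difference worth keeping in mind: this sketch runs the callback on a worker thread, whereas gio delivers its callbacks through the main loop.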

As such, here is the snippet which I have added to the Python Snippets library which downloads the Ubuntu HTML index page, shows it in a GUI without blocking it and writes it to the disk:

#!/usr/bin/env python
#
# [SNIPPET_NAME: Download a file asynchronously]
# [SNIPPET_CATEGORIES: GIO]
# [SNIPPET_DESCRIPTION: Download a file async (useful for not blocking the GUI)]
# [SNIPPET_AUTHOR: Jono Bacon <jono@ubuntu.com>]
# [SNIPPET_LICENSE: GPL]

import gio, gtk, os

# Downloading a file in an async way is a great way of not blocking a GUI. This snippet will show a simple GUI and
# download the main HTML file from ubuntu.com without blocking the GUI. You will see the dialog appear with no content
# and when the content has downloaded, the GUI will be refreshed. This snippet also writes the content to the home
# directory as pythonsnippetexample-ubuntuwebsite.html.

# To download in an async way you kick off the download and when it is complete, another callback is called to process
# it (namely, display it in the window and write it to the disk). This separation means you can download large files and
# not block the GUI if needed. 

class Example(object):
    def download_file(self, data, url):
        """Download the file using gio"""

        # create a gio stream and download the URL passed to the method
        self.stream = gio.File(url)

        # there are two methods of downloading content: load_contents_async and read_async. Here we use load_contents_async as it
        # downloads the full contents of the file, which is what we want. We pass it a method to be called when the download has
        # completed: in this case, self.download_file_complete
        self.stream.load_contents_async(self.download_file_complete)

    def download_file_complete(self, gdaemonfile, result):
        """Method called after the file has downloaded"""

        # load_contents_finish returns a tuple with three elements (contents, length, etag). The first element is the
        # actual content, so let's grab that
        content = self.stream.load_contents_finish(result)[0]

        # update the label with the content
        label.set_text(content)

        # let's now save the content to the user's home directory; the with block closes the file when we are done
        path = os.path.join(os.path.expanduser('~'), "pythonsnippetexample-ubuntuwebsite.html")
        with open(path, "w") as outputfile:
            outputfile.write(content)

ex = Example()

dial = gtk.Dialog()
label = gtk.Label()
dial.action_area.pack_start(label)
label.show_all()
label.connect('realize', ex.download_file, "http://www.ubuntu.com")
dial.run()

I am still pretty new to this, and I am sure there is plenty that can be improved in the snippet, so feel free to submit a merge request if you would like to improve it. Hope this helps!

  • http://blogs.gnome.org/jessevdk Jesse van den Kieboom

    Of course you do realize that right now your writing to disk is done sync, and not async :) You could try the ‘splice’ API to write an input stream to an output stream.

  • jono

    Well, yeah…but I figured the write is pretty quick. I still need to port that to be async. :-)

  • http://launchpad.net/~dobey Rodney Dawes

    GIO is nice, but it’s not always what you want to use, as it’s not designed to be an HTTP library, but a filesystem abstraction. In this sense, it’s usually ok to use it to copy from one location to another.

    However, if downloading a file means doing more complicated things with HTTP headers and auth, you will probably want to use libsoup (or httplib(2?) with GIOChannels).

    Also, I know people are oft afraid to use threads, because they’re “difficult” to get right and all, but using threads in Python is pretty trivial (even with glib/gtk+), and will make it easier to do the write to disk async as well.

  • http://twitter.com/weberdc Derek

    I like the example: very straightforward. I agree that asynchronously writing it to disk might be nice, but that could go in a different snippet.

    One thing that I wonder is how many characters you have per line? Is it 80? The code is a little truncated on your website (though it’d be fine in Acire, of course).

    Glad you figured it out. :o)

  • http://joeshaw.org Joe Shaw

    GIO has always felt a little awkward to me, and it seems especially so when translated into Python.

    I’m rather fond of the Node.js way, but maybe it’s because I love Javascript’s nestable function literals:

    var fs = require("fs"); var http = require("http");

    var client = http.createClient(80, "google.com"); var request = client.request("GET", "/", {"host": "google.com"}); var file = fs.createWriteStream("google.html", { flags: "w", mode: 0644 });

    request.addListener("response", function(response) { response.addListener("data", function(chunk) { file.write(chunk); }); response.addListener("end", function() { file.close(); }); }); request.close();

    The beauty part about this is that it’s entirely async — both reads via HTTP and writes to the file system. (Although this one ignores any errors, never mind about that. :)

    And, well, Node doesn’t integrate with GTK.

  • http://joeshaw.org Joe Shaw

    Oh well, that code formatting got butchered…

    http://gist.github.com/333511

  • http://joeshaw.org Joe Shaw

    One final comment: if you are using load_contents(), you are storing the entire contents of the file in memory. This is not a big deal if you are downloading a 1 or even a 100 kilobyte file, but if you’re talking about several megabytes, that is a considerable waste.

  • hb

    Is there a way to monitor % of download ?
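
On the last two points in the comments (memory use and progress), here is a minimal sketch in plain Python rather than gio. It copies data in fixed-size chunks, so only one chunk is held in memory at a time, and each chunk gives a natural place to report progress. copy_in_chunks and the in-memory streams are made up for this example, and the total size is assumed to be known up front; the same idea should carry over to reading a gio stream piece by piece, but that is not shown here.

```python
import io


def copy_in_chunks(source, destination, chunk_size=8192, on_progress=None):
    """Copy source to destination one chunk at a time; return the bytes copied."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # an empty read means the end of the stream
        destination.write(chunk)
        copied += len(chunk)
        if on_progress is not None:
            on_progress(copied)
    return copied


total_size = 20000  # assumed to be known in advance for this example
source = io.BytesIO(b"x" * total_size)
destination = io.BytesIO()
percentages = []


def report(copied):
    # Integer percentage of the known total that has been copied so far.
    percentages.append(100 * copied // total_size)


total = copy_in_chunks(source, destination, chunk_size=8192, on_progress=report)
```

With these numbers the callback fires three times and percentages ends up as [40, 81, 100]. In a GUI you would update a progress bar from the callback instead of appending to a list.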