Archive for the ‘digital libraries’ Category

first academic conference and presentation: jcdl 2010

June 30th, 2010 Kyle Williams

I recently got back from Australia, where I presented my first published paper at an academic conference. The conference was the ACM/IEEE Joint Conference on Digital Libraries (JCDL), which ran concurrently with the International Conference on Asian Digital Libraries (ICADL). I had one paper accepted at JCDL and another at ICADL. The conferences took place in Surfers Paradise, Gold Coast, Australia.

The JCDL paper was called “Translating Handwritten Bushman Texts” and is available via ACM here or via my institutional repository here. The ICADL paper was called “A Visual Dictionary for an Extinct Language” and is available via Springer here or via my institutional repository here.

It was a great experience! Not only did I get to see and hear about the cutting-edge research being done by leading researchers in the field of digital libraries, but I also got to meet and interact with many of them and form some new connections at various universities around the world. I also got to put faces to many of the papers I have been reading for most of this year :)

The main things that I took away from the conference were:

  • Computer science does not always need to be technical, but can also be philosophical and have social implications.
  • Researchers are generally interested in what others have to say.
  • Researchers contextualise the research of others and fit it in with their research.
  • Computer scientists are not simply nerds, but also like to have fun (but everyone already knows that)!

Unfortunately, I didn’t really have time to explore the Gold Coast since I was only there for 4 days. I did get to go to the Outback Spectacular show (sort of Australia’s Wild West), and I got to see family who immigrated to Australia a few years ago, as well as a cousin who lives in the United States, whom I hadn’t seen in 7 years and who just happened to be in Australia at the same time as me.

I would like to thank my supervisor A/Prof. Hussein Suleman for assisting me in writing the articles, as well as my co-authors Sanvir Manilal and Lebogang Molwantoa. I would also like to thank the Department of Computer Science at the University of Cape Town for funding my trip, and lastly, the JCDL and ICADL reviewers who liked my papers and got them accepted into the conferences.

boldproject: bold translator overview

October 2nd, 2009 Kyle Williams

A few weeks ago I wrote a post which introduced the BOLD Project. Well, a lot has happened since then and this post gives an overview of the translation system which I am building.

BOLD Translator Overview

The translator is split into three parts:

  1. The preprocessor
  2. The user input
  3. The matcher

The Preprocessor

The preprocessor is called as soon as an image is inserted into the repository. It first segments the Bushman words on the dictionary page, exploiting the fact that every Bushman word on a page is underlined by a solid black line. Once the Bushman words have been segmented, specific features are extracted from them and stored in inverted files. The original image, the segmented words and the inverted files are then all stored in the repository.
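To make the idea concrete, here is a minimal Python sketch of underline-based segmentation followed by inverted-file indexing. It is not the BOLD code: it assumes a page is a small grid of 0/1 pixels, that an underline is a single row of black pixels, and that a word occupies a fixed number of rows directly above its underline. The names find_underlines, crop_word, preprocess and ink_count are all made up for illustration, and ink_count is only a placeholder for whatever features the real preprocessor extracts.

```python
from collections import defaultdict


def find_underlines(page, min_length=5):
    """Return (row, first_col, last_col) for every horizontal run of black
    pixels that is at least min_length long. page is a list of rows of 0/1."""
    underlines = []
    for r, row in enumerate(page):
        start = None
        # The trailing 0 is a sentinel so a run touching the right edge is closed.
        for c, pixel in enumerate(list(row) + [0]):
            if pixel and start is None:
                start = c
            elif not pixel and start is not None:
                if c - start >= min_length:
                    underlines.append((r, start, c - 1))
                start = None
    return underlines


def crop_word(page, underline, height=3):
    """Cut out the region of up to `height` rows sitting directly above an underline."""
    r, first, last = underline
    top = max(0, r - height)
    return [list(row[first:last + 1]) for row in page[top:r]]


def preprocess(page, extract_features, min_length=5, height=3):
    """Segment the underlined words and index their features in an inverted file."""
    words = {}
    inverted_file = defaultdict(set)  # feature -> ids of the words that have it
    for word_id, underline in enumerate(find_underlines(page, min_length)):
        word = crop_word(page, underline, height)
        words[word_id] = word
        for feature in extract_features(word):
            inverted_file[feature].add(word_id)
    return words, inverted_file


# Toy 5x8 page: a two-row "word" above an underline spanning columns 1 to 6.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def ink_count(word):
    """Placeholder feature: the total number of black pixels in the word."""
    return {("ink", sum(map(sum, word)))}


words, inverted_file = preprocess(page, ink_count, height=2)
```

On real scans the underline would be several pixels thick and slightly skewed, so a practical version would need to merge neighbouring runs and ignore long horizontal strokes inside the handwriting itself.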

The User Input

The user who is accessing the Bleek & Lloyd notebooks uses a tool to select a specific word on a page which then becomes known as the key. The same features which were extracted from each word in the preprocessor are extracted from the key. These features along with the key image will be used later for matching.
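A small Python sketch of this step, under stated assumptions: the page is a grid of 0/1 pixels, the selection is a rectangle given as (top, left, bottom, right), and the feature extractor is a made-up column_profile function standing in for whichever features the preprocessor really uses. The only point it illustrates is that the key must go through the same extractor as the dictionary words.

```python
def crop_region(page, top, left, bottom, right):
    """Cut a rectangle out of a page; bottom and right are exclusive."""
    return [list(row[left:right]) for row in page[top:bottom]]


def column_profile(image):
    """Hypothetical feature set: one (column, black pixel count) pair per column."""
    return {(c, sum(column)) for c, column in enumerate(zip(*image))}


def make_key(page, selection, extract_features=column_profile):
    """Turn a user selection into a key image plus its features."""
    key_image = crop_region(page, *selection)
    return key_image, extract_features(key_image)


# Toy notebook page; the user selects rows 1 to 2 and columns 1 to 6.
notebook_page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

key_image, key_features = make_key(notebook_page, (1, 1, 3, 7))
```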

The Matcher

The matcher starts by taking the features belonging to the key and finding images with the same features in the inverted files. For each feature match, the score of the image which matched increases. At the end of all the feature comparisons, the images with the highest scores are returned. At this stage there may be some images with the same or similar scores, so to resolve this clash the matcher performs a more intensive comparison between the key and the images with the highest score. Based on the result of this comparison, the most likely match is returned.
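The two-stage matching described above could be sketched in Python roughly as follows. This is an illustration rather than the actual BOLD matcher: the inverted file is assumed to be a plain dictionary mapping each feature to the ids of the words that have it, and the "more intensive comparison" is stood in for by a simple count of differing pixels (pixel_difference), which is an assumption. All function names are hypothetical.

```python
from collections import Counter


def score_candidates(key_features, inverted_file):
    """Add one point to a dictionary word for every feature it shares with the key."""
    scores = Counter()
    for feature in key_features:
        for word_id in inverted_file.get(feature, ()):
            scores[word_id] += 1
    return scores


def pixel_difference(a, b):
    """Count the differing pixels between two equally sized binary images."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def best_match(key_image, key_features, inverted_file, word_images, shortlist=3):
    """Shortlist the top-scoring words, then settle ties with a closer comparison."""
    scores = score_candidates(key_features, inverted_file)
    if not scores:
        return None  # no dictionary word shares any feature with the key
    candidates = [word_id for word_id, _ in scores.most_common(shortlist)]
    return min(candidates, key=lambda w: pixel_difference(key_image, word_images[w]))


# Toy data: w1 and w2 tie on feature score, so the pixel comparison decides.
inverted_file = {"a": {"w1", "w2"}, "b": {"w1", "w2"}, "c": {"w3"}}
word_images = {
    "w1": [[1, 0], [0, 1]],
    "w2": [[1, 1], [0, 1]],
    "w3": [[0, 0], [0, 0]],
}
key_image = [[1, 1], [0, 1]]
key_features = {"a", "b"}

scores = score_candidates(key_features, inverted_file)
match = best_match(key_image, key_features, inverted_file, word_images)
```

The cheap feature lookup keeps the expensive comparison limited to a handful of candidates, which is the reason for splitting the matcher into two stages in the first place.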

So that’s how the BOLD Translator works. Ultimately it is a framework, which means it will be designed so that anyone can adapt it by plugging their own algorithms into each of the three parts. In the next day or two I will blog about the actual work that has been done on the system up to now, as well as show some of the results that the translator returns at this point.

boldproject: introducing the bold project

September 9th, 2009 Kyle Williams

The BOLD (Bushman On-Line Dictionary) Project is an honours project that I am working on with two colleagues, Sanvir Manilal and Lebogang Molwantoa, under the supervision of Dr. Suleman. Together we are creating an online visual dictionary based on about 40 000 scanned images of dictionary pages which form part of the Bleek & Lloyd Collection. Each dictionary page contains an English word and its Bushman translation(s). The goal of the project is to create a usable on-line visual dictionary which researchers around the world can use to find out more about Bushman culture and language.

The project has been split into three separate parts:

Part 1: Archive Management – Lebogang Molwantoa

This part involves the setting up and building of the archive, including developing administrative tools for managing the repository.

Part 2: Searching and Browsing – Sanvir Manilal

This part involves the way in which end users interact with and make use of the dictionary.

Part 3: Image Based Translation – Kyle Williams

This part involves using the Bushman dictionary to translate Bushman words in the existing Bleek & Lloyd Collection on the fly.

I’ll make use of this blog to provide updates on how development is going, as well as to document techniques I develop – with the idea being that:

  1. I document them for my own use
  2. They’re out there for use by other people who may be working on similar projects.

Wish us luck!

A page from the bushman dictionary*

* This image is not available under the same CC license as the rest of this blog. For more information about this image please visit http://lloydbleekcollection.cs.uct.ac.za