boldproject: bold translator overview

October 2nd, 2009 Kyle Williams

A few weeks ago I wrote a post introducing the BOLD Project. A lot has happened since then, and this post gives an overview of the translation system I am building.

BOLD Translator Overview

The translator is split into three parts:

  1. The preprocessor
  2. The user input
  3. The matcher

The Preprocessor

The preprocessor is called as soon as an image is inserted into the repository. It first segments the Bushman words on the dictionary page, exploiting the fact that every Bushman word on a page is underlined by a solid black line. Once the words have been segmented, specific features are extracted from them and stored in inverted files. The original image, the segmented words and the inverted files are then all stored in the repository.
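To make the underline idea concrete, here is a minimal sketch of how segmentation could work. It is not the code used in the BOLD Translator. It assumes the page has already been binarised into a list of rows (1 = black, 0 = white) and that each underline is a single row of pixels. The names find_underlines and segment_words and the parameters min_run and word_height are made up for illustration.

```python
def find_underlines(image, min_run=5):
    """Return (row, start_col, end_col) for every horizontal run of black
    pixels that is at least `min_run` pixels wide.

    Assumption: `image` is a list of lists, 1 = black and 0 = white, and
    each underline is one pixel thick.
    """
    runs = []
    for y, row in enumerate(image):
        start = None
        # The trailing 0 is a sentinel so a run touching the right edge closes.
        for x, pixel in enumerate(list(row) + [0]):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_run:
                    runs.append((y, start, x - 1))
                start = None
    return runs


def segment_words(image, min_run=5, word_height=3):
    """Crop the `word_height` rows directly above each detected underline."""
    words = []
    for y, x0, x1 in find_underlines(image, min_run):
        top = max(0, y - word_height)
        words.append([list(row[x0:x1 + 1]) for row in image[top:y]])
    return words
```

A real scan would need thresholding, tolerance for broken or slanted lines, and merging of thick underlines, none of which is handled here.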

The User Input

The user who is accessing the Bleek & Lloyd notebooks uses a tool to select a specific word on a page; the selected word is known as the key. The same features which were extracted from each word in the preprocessor are extracted from the key. These features, along with the key image, are used later for matching.

The Matcher

The matcher starts by taking the features belonging to the key and finding images with the same features in the inverted files. For each feature match, the score of the matching image increases. Once all the features have been compared, the images with the highest scores are returned. At this stage several images may have the same or similar scores, so to break such ties the matcher performs a more intensive comparison between the key and the highest-scoring images. Based on the result of this comparison, the most likely match is returned.
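The two-stage matching described above can be sketched roughly as follows. This is only an illustration under assumptions: features are treated as hashable labels (strings here), the inverted file is an in-memory dictionary, and detailed_distance is a hypothetical callable standing in for the more intensive comparison, which this post does not specify.

```python
from collections import defaultdict


def build_inverted_index(features_by_image):
    """Map each feature to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, features in features_by_image.items():
        for feature in features:
            index[feature].add(image_id)
    return index


def shortlist(index, key_features, top_n=3):
    """Score images by how many key features they share; return the top few."""
    scores = defaultdict(int)
    for feature in set(key_features):
        for image_id in index.get(feature, ()):
            scores[image_id] += 1
    # Highest score first; ties are ordered by image id to keep results stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def best_match(index, key_features, detailed_distance, top_n=3):
    """Run the costly comparison only on the shortlisted candidates.

    `detailed_distance(image_id)` is a hypothetical function returning how far
    a candidate is from the key (smaller is better).
    """
    candidates = shortlist(index, key_features, top_n)
    if not candidates:
        return None
    return min((image_id for image_id, _ in candidates), key=detailed_distance)
```

The point of the design is that the cheap inverted-file lookup narrows thousands of words down to a handful, so the expensive comparison only has to run a few times.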

So that’s how the BOLD Translator works. Ultimately it is a framework: it will be designed so that anyone can adapt it by plugging their own algorithms into each of the three parts. In the next day or two I will blog about the actual work that has been done on the system so far, and show some of the results the translator currently returns.
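As a rough illustration of the plug-in idea, the three parts could be passed in as ordinary functions. This is a hypothetical sketch, not the actual BOLD Translator design: the Translator class, its field names and the toy functions below are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Translator:
    """Holds one interchangeable algorithm per stage (all names hypothetical)."""
    segment: Callable[[list], List[list]]                      # page -> word images
    extract: Callable[[list], Set[str]]                        # word image -> features
    match: Callable[[Set[str], Dict[str, Set[str]]], Optional[str]]

    def preprocess(self, page):
        """Segment a page and store the features of each word under an id."""
        return {
            f"word-{i}": set(self.extract(word))
            for i, word in enumerate(self.segment(page))
        }

    def lookup(self, key_image, store):
        """Extract the same features from the key and hand them to the matcher."""
        return self.match(set(self.extract(key_image)), store)


# Toy stand-ins: every row is a "word", its only feature is its black-pixel count.
def rows_as_words(page):
    return [[row] for row in page]


def pixel_count_feature(word):
    return {str(sum(sum(row) for row in word))}


def first_overlap(features, store):
    for word_id in sorted(store):
        if store[word_id] & features:
            return word_id
    return None


translator = Translator(rows_as_words, pixel_count_feature, first_overlap)
store = translator.preprocess([[1, 0, 0], [1, 1, 0]])
result = translator.lookup([[0, 1, 1]], store)  # matches the two-pixel word
```

Swapping in a different segmentation or matching algorithm then only means passing a different function when the Translator is created.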

boldproject: introducing the bold project

September 9th, 2009 Kyle Williams

The BOLD (Bushman On-Line Dictionary) Project is an honours project being worked on by me and two colleagues, Sanvir Manilal and Lebogang Molwantoa, and is supervised by Dr. Suleman. Together we are creating an online visual dictionary based on about 40 000 scanned images of dictionary pages which form part of the Bleek & Lloyd Collection. Each dictionary page contains an English word and the Bushman translation(s) of the word. The goal of the project is to create a usable on-line visual dictionary which researchers around the world can use to find out more about Bushman culture and the Bushman language.

The project has been split into three separate parts:

Part 1: Archive Management – Lebogang Molwantoa

This part involves the setting up and building of the archive, including developing administrative tools for managing the repository.

Part 2: Searching and Browsing – Sanvir Manilal

This part involves the way in which end users interact with and make use of the dictionary.

Part 3: Image Based Translation – Kyle Williams

This part involves using the Bushman dictionary to translate Bushman words in the existing Bleek & Lloyd Collection on the fly.

I’ll make use of this blog to provide updates on how development is going, as well as to document techniques I develop – with the idea being that:

  1. I document them for my own use
  2. They’re out there for use by other people who may be working on similar projects.

Wish us luck!

A page from the Bushman dictionary*

* This image is not available under the same CC license as the rest of this blog. For more information about this image please visit http://lloydbleekcollection.cs.uct.ac.za