History of SOCR
SOCR was started in 1994. I’ve always been interested in developing a free OCR system, and I made my first attempt during a six-month visit to the University of Calgary that year. This first system used bitmap subsampling, and that’s all! It was an isolated character recogniser that used C4.5 to generate rules. The accuracy was very good for isolated Courier fonts, but terrible on anything else.
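To make "bitmap subsampling" concrete, here is a minimal sketch in Python. It is not the original 1994 code: the function name `subsample`, the grid size, and the choice of "fraction of ink pixels per cell" as the feature are all assumptions made for illustration. The idea is to reduce a character bitmap to a small, fixed-length feature vector that a rule learner such as C4.5 could then be trained on.

```python
def subsample(bitmap, grid_rows=4, grid_cols=4):
    """Reduce a binary character bitmap to grid_rows * grid_cols features.

    bitmap is a list of rows, each a list of 0/1 pixels (1 = ink).
    Each feature is the fraction of ink pixels in one grid cell.
    Hypothetical helper for illustration only, not SOCR's actual code.
    """
    height = len(bitmap)
    width = len(bitmap[0])
    features = []
    for gr in range(grid_rows):
        # Integer cell boundaries chosen so the cells tile the bitmap exactly.
        r0 = gr * height // grid_rows
        r1 = (gr + 1) * height // grid_rows
        for gc in range(grid_cols):
            c0 = gc * width // grid_cols
            c1 = (gc + 1) * width // grid_cols
            cell = [bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            # An empty cell can occur if the grid is finer than the bitmap.
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


# A tiny 4x4 "glyph" reduced to a 2x2 grid of ink densities.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(subsample(glyph, grid_rows=2, grid_cols=2))  # [1.0, 0.0, 0.0, 0.75]
```

Features like these depend heavily on the exact shape and position of each glyph, which is consistent with a system that does well on one fixed font and poorly on everything else.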
Over the next few years at the University of Waikato (in New Zealand) I developed the WEKA machine learning workbench and associated tools. The WEKA team introduced the ARFF file format, and an internal representation that is common to lots of different machine learning methods.
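For readers who have not seen ARFF, the following Python sketch writes character features in the general ARFF layout: a header with `@relation` and `@attribute` declarations, then an `@data` section with one comma-separated instance per line. The function `to_arff`, the relation name, and the attribute names are hypothetical and chosen for illustration; this is not code from SOCR or WEKA.

```python
def to_arff(relation, feature_names, class_labels, rows):
    """Return ARFF text for numeric features plus a nominal class attribute.

    rows is a list of (features, label) pairs.
    Hypothetical helper for illustration only.
    """
    lines = ["@relation " + relation, ""]
    for name in feature_names:
        lines.append("@attribute " + name + " numeric")
    # The class is a nominal attribute: its legal values are listed in braces.
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines.append("")
    lines.append("@data")
    for features, label in rows:
        lines.append(",".join(str(v) for v in features) + "," + label)
    return "\n".join(lines) + "\n"


# Two example instances with two made-up features each.
text = to_arff(
    "characters",
    ["cell0", "cell1"],
    ["a", "b"],
    [([0.5, 1.0], "a"), ([0.0, 0.25], "b")],
)
print(text)
```

Because every learning scheme reads the same representation, a file like this can be handed to different schemes without changing the feature extraction step.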
Since March 1997, I have been paid as a research programmer (while I finish my Ph.D.) and I’ve been allowed to work on SOCR for about a day a week. As of 1st July 1998, the Department of Computer Science at the University of Waikato, New Zealand, is paying me to develop SOCR full time.
SOCR System
SOCR is composed of the following components:
Image library
ARFF library
Machine learning schemes
Image to line routines
Line to word/character routines
Feature extraction routines (from characters)
Language model
User interfaces (KDE/GTK/command line)