User Interface Design

This page collects a few notes that may help people write new user interfaces for SOCR. The information is gleaned from messages I sent to Reggie.

To talk to libSOCR you need to allocate a SOCR_doc document structure that keeps track of each page and the zones that have been drawn on each page. The SOCR_zone type allows creation/deletion/movement of the zones.

As the user selects a series of files, the SOCR_doc::add_page() function is called, and as the user draws zones on each page, the add_zone() function is called. The bitmap is deliberately not stored in the SOCR_doc structure, which makes it possible to process thousands of pages.

The KDE interface allocates memory for an iconic preview of each page. When a page is about to be processed, its image is loaded into memory and copied into a SOCR_pixmap structure.
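
As an illustration of the preview idea, here is a minimal sketch of building a small iconic preview from a full-size 8 bpp greyscale buffer by nearest-neighbour subsampling. GreyImage and make_preview() are hypothetical names invented for this example; they are not part of libSOCR, and the KDE interface may well build its previews differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical UI-side image buffer (not a libSOCR type).
struct GreyImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // row-major, 0 == black, 255 == white
};

// Keep every factor-th pixel in each direction (nearest-neighbour subsampling).
// Returns an empty image if the input or the factor is invalid.
GreyImage make_preview(const GreyImage& src, int factor)
{
    GreyImage out;
    if (factor < 1 || src.width <= 0 || src.height <= 0)
        return out;

    // Round up so that a partial last row/column still yields a pixel.
    out.width = (src.width + factor - 1) / factor;
    out.height = (src.height + factor - 1) / factor;
    out.pixels.reserve(static_cast<std::size_t>(out.width) * out.height);

    for (int y = 0; y < src.height; y += factor)
        for (int x = 0; x < src.width; x += factor)
            out.pixels.push_back(
                src.pixels[static_cast<std::size_t>(y) * src.width + x]);

    return out;
}
```

With a factor of 8, a preview needs roughly 1/64 of the memory of the full page, so the UI can keep previews for many pages and load the full image only for the page being processed.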

To do the OCR, a SOCR_ocr object is created and the SOCR_doc and SOCR_pixmap are passed to it. The resulting OCR’d text is returned as a string.

    /* allocate the document structure */
    SOCR_doc doc;

    doc.add_page();
    /* add a zone */
    doc.page(0).add_zone(10,11,200,201);
    doc.page(0).add_zone(50,81,100,151);

    doc.add_page();
    doc.page(1).add_zone(1,2,240,101);

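    /* allocate the image and copy in the pixels;
       w, h and _getpixel() come from the UI's own image loader */
    SOCR_pixmap im;

    for (int j = 0; j < h; j++)
      for (int i = 0; i < w; i++)
        im.put(i, j, _getpixel(i, j));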

    /* do the OCR one page at a time */
    SOCR_ocr ocr;
    string output;

    ocr.read(im,doc, output);

    cout << output;

The pixmap structure

The SOCR_pixmap class is a simple wrapper around an 8 bpp greyscale image. If you load a black-and-white image, copy the values into SOCR_pixmap as 0 == black and 255 == white. For greyscale pixels, use values within this range. For colour images, the SOCR_pixmap::put(x,y, r,g,b) method converts each pixel to greyscale using the YIQ conversion.
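
To make the pixel conventions concrete, here is a small sketch of the two conversions a UI might perform before calling put(). It uses the standard NTSC weights for the Y (luma) component of YIQ; whether libSOCR uses exactly these weights and this rounding is an assumption. The function names rgb_to_grey() and bilevel_to_grey() are made up for this example.

```cpp
#include <cstdint>

// Y (luma) component of YIQ: Y = 0.299 R + 0.587 G + 0.114 B.
// Integer arithmetic scaled by 1000; adding 500 rounds to nearest.
// Assumption: libSOCR's own conversion may differ in weights or rounding.
inline std::uint8_t rgb_to_grey(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    const unsigned y = (299u * r + 587u * g + 114u * b + 500u) / 1000u;
    return static_cast<std::uint8_t>(y);
}

// Map a 1-bit (black/white) pixel to the 8-bit convention described above.
inline std::uint8_t bilevel_to_grey(bool is_black)
{
    return is_black ? 0 : 255;
}
```

A UI that already holds greyscale data can skip both helpers and pass its values straight through, provided 0 means black and 255 means white.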

Good things a UI will have

Here are a few thoughts about what a simple UI may have. If you are developing a plugin to another application, not all of these may make sense. Write to Stuart with any ideas.

  • Input of lots of different image types: .tiff, .png, .gif, .pnm. SOCR_pixmap doesn’t support reading or writing image files, so the UI must deal with that.
  • Input from scanners (probably linking with SANE)
  • Has “preview of pages”, “actual page to process” and “output text” regions in the UI. See the SOCR/KDE screenshot.
  • Low memory footprint for multiple pages
  • Allows the user to draw zones to process (and move, delete, etc.)
  • Different zooming options (page width/page height/400% etc)
  • The “hand” scrolling option for moving around a zoomed-in page
  • Allow loading of multiple pages (think 200) easily
  • Doesn’t have to wait until the user clicks on the “OCR” button to copy to SOCR_pixmap. It could be doing this during idle time. Then clicking the OCR button would be snappier.
  • The output text area should be able to support different colours, binding mouse actions (e.g. clicking on a word moves the bitmap to that region) and, hopefully, spell checking.
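
The zooming options in the list above come down to a single scale factor. The following is a rough sketch of how a UI might compute it; ZoomMode and zoom_factor() are hypothetical names for this example and are not part of libSOCR or the KDE interface.

```cpp
// Hypothetical zoom modes a UI might offer.
enum class ZoomMode { FitWidth, FitHeight, Percent };

// Returns the factor by which page pixels are scaled on screen.
// page_w/page_h: page size in pixels; view_w/view_h: visible area in pixels;
// percent: only used for ZoomMode::Percent (400.0 means 400%).
// Falls back to 1.0 (no scaling) if the page size is invalid.
double zoom_factor(ZoomMode mode, int page_w, int page_h,
                   int view_w, int view_h, double percent = 100.0)
{
    if (page_w <= 0 || page_h <= 0)
        return 1.0;

    switch (mode) {
    case ZoomMode::FitWidth:
        return static_cast<double>(view_w) / page_w;
    case ZoomMode::FitHeight:
        return static_cast<double>(view_h) / page_h;
    case ZoomMode::Percent:
        return percent / 100.0;
    }
    return 1.0;
}
```

Zone coordinates drawn on a zoomed page would need to be divided by the same factor before being passed to add_zone(), so that they refer to pixels of the original image.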