SUMMARY: looking for scanner with OCR capability

From: Timothy Jones (tim@boxhill.com)
Date: Thu Jul 09 1992 - 02:49:06 CDT


On June 30th I posted the following request for information:

> I'm looking for information on scanners and accompanying software with OCR
> capabilities that can connect to a Sun, most likely through the SCSI bus. Any
> pointers to products would be greatly appreciated. Please send via email to
> tim@boxhill.com and I'll post a summary if interest is sufficient.

The overwhelming winner in terms of general quality was the Xerox ScanWorx.
Unfortunately, it's also the overwhelmingly most expensive (in the $20K
ballpark).

Other recommendations included PerfectScan software (although it wasn't clear
that they supported OCR) with a Microtek scanner, Calera Recognition Systems
and Apunix.

Thanks to the following people for taking the time to respond.

(Kevin G. Currans) <kcurrans@CORDLEY.ORST.EDU>
Eugene H. Simpson III <Eugene.H.Simpson.III@acenet.auburn.edu>
HP48SX Archive Maintainer <hp48sx@wuarchive.wustl.edu>
John R. Kilheffer <amp19263@garfield.amp.com>
Jonathan C. Davis <Jonathan.C.Davis@acenet.auburn.edu>
Kerien Fitzpatrick <fitz@frc2.frc.ri.cmu.edu>
MIESCH@acsd6.dnet.ge.com
Michael O'Shaughnessy <mikeo@grafnetix.qc.ca>
Mike Raffety <miker@sbcoc.com>
Randy Born <randy@ai.iit.nrc.ca>
Yubert.Fang@west.sun.com (Yubert Fang SE Irvine 714-251-2529)
epacyna@auratek.com (Edward Pacyna)
francis@monod.biol.mcgill.ca (Francis Ouellette)
kalli!kevin@fourx.aus.sun.com (Kevin Sheehan {Consulting Poster Child})
martin@centaur.saic.com (Martin Hobson)
mdl@cypress.com (J. Matt Landrum)
michael@daedalus.ts.go.dlr.de (Michael Klein)
mist@source.nl (Michiel Steltman)
ohnielse@ltf.dth.dk (Ole Holm Nielsen)
par@sirius.aus.sun.com (Paul Riethmuller SE Brisbane)
rick%pgt1@Princeton.EDU (Rick Mott)
toro.MTS.ML.COM!mta%beethoven@uunet.UU.NET (Mike Askew)
ytsuji@cfi.waseda.ac.jp ("Y.Tsuji")

I got many helpful replies, which are included below in no particular order:

*****

I spent more than $1700 and bought Innovatic's OpenReadBeta, one of the very
few OCR packages available on the Sun platform. As you may have read in the
July issue of MacWorld, Innovatic's OCR (ReadRight) is pathetically poor
compared with anything that usually comes free of charge with the scanner. The
alternative is clearly the ScanWorx by Xerox Imaging Systems (whose AccuText
comes second or third after OmniPage Pro), but it costs more than $23,000!!
Does anyone know of a better OCR on the SPARC platform? BTW, the
OpenReadBeta (or Plus 1.1) cannot be used on a monochrome display, which is
typical of its bugs.

*****

I guess you should look at the latest UnixWorld, they have a review about this
subject.

*****

Xerox Imaging Systems' ScanWorx is the ONLY workable equipment so far. Other
products (Apunix's, Pectronics') are still at the beta test stage and cannot
be used for serious purposes. I personally own Apunix's, whose recognition
rate is somewhere between twenty and thirty per cent, and Pectronics' is
reported to be worse. ScanWorx can be as good as seventy per cent when
ordinary material is fed; the advertising hype is based on glossy magazine
material, whose printing quality is the highest.

*****

Scanners for the Apple Macintosh come with a standard SCSI interface. The OCR
software is usually bought from 3rd parties. I am not sure if you can find any
for your Sun, but as you can afford such a machine it will probably be a hell
of a lot more expensive than it is for a Mac. I think good OCR programs for
the Mac are around $400 list.

*****

We've been using PerfectScan software with a Microtek scanner, both from
Perfect Byte, Inc. We've used this setup for well over a year now with only
minor quirks now and then, and the folks at Perfect Byte are always ready to
assist. The scanner is 400 DPI grayscale and connects via the SCSI bus, and
the software uses the OpenLook interface. It offers scan previewing as well
as various options for adjusting brightness, contrast, etc. We've been very
pleased with the quality and ease of use.

*****

We had exactly the same problems as you have now. We connected a scanner
(Microtek 600ZS, 600dpi color) to a Sun4 via the SCSI interface. It is
possible, but a bad solution. Handling the scanner over the SCSI interface
seems to be a great problem.

        1. The Scanner seems to be ok, all depends on the software.

        2. We bought the EasyScan and EasyRead Software...
           DON'T MAKE THE SAME MISTAKE, FORGET THE EASY-SOFTWARE,
           it is distributed by Pectronics-Corporation.

*****

Take a look at ScanWorX from Xerox Imaging Systems. Very impressive, a little
on the expensive side (can't quote US dollars), but has exactly the
capabilities you're after - Sun SCSI, OCR, and has the ability to output the
scanned text and graphics in a number of popular formats - ASCII, WordPerfect,
FrameMaker, Interleaf.

*****

We bought a Ricoh scanner from Apunix in San Diego, which hangs off the
SCSI bus. It worked very well until the host was upgraded to a SPARC-2
(and the scanner ceased working), but Apunix has promised us a new
device driver Real Soon Now.

We also evaluated an OCR package called OpenRead-Plus resold by
Apunix. It did sort of OK on high-quality text, and very poorly
on, for example, faxes. It didn't allow input of 8-bit ASCII during the
correction phase, so we didn't buy that software!

*****

Call Xerox Imaging Systems. Product: ScanWorX. It is a high-end volume ICR
(Intelligent OCR) system for the Sun, running under OpenWindows / Motif /
SunView. Uses SCSI; takes 4.3 secs. to scan a page, a few more to convert.

Delivers text in Interleaf, Frame, WordPerfect, ASCII. Language support: uses
dictionary. Nice feature: interactive verifier.

Great product!

*****

We have been using a SCSI based scanner from Xerox called the ScanWorx
scanner. We installed it in July of 1991 and have had no problems at all with
it since then. We typically scan text, graphics, and photos directly into
Interleaf and many different users use the scanner, all successfully. If I
needed another one of these I would definitely purchase another ScanWorx
scanner. It has been one of the few brand new products we have had no trouble
at all with.

*****

Contact Apunix, in California. I don't have their address right now,
but they advertise in SunExpert and other Sun/Unix rags.

*****

I know Xerox has one that is jam up. It's fairly expensive though. You can
call Don Purkey at Highland Digital if you're interested.

415-493-8550

*****

Tim, your Sun rep can get you a copy of an internal publication called "Sun
Microsystems, Inc. Input/Output Devices Portfolio". Among other things it
lists image and OCR scanners. I suggest you get the most recent copy of that
document (mine is dated March 1990). Through that document and a helluva lot
of follow-up we selected a Microtek MSF400GS image scanner. It's a 400 dpi
SCSI box with OpenWindows software. It does an excellent job of bringing in
photos and artwork. The phone number I have is (213) 321-2121. We seamlessly
move scanned images into FrameMaker and Island docs, and it has performed
without problems. My techs have had fairly good support from Microtek, also.

For text, we already had an HP ScanJet Plus (300 dpi) running on a PC. We
simply connected the PC to our local net and can now move scanned text
anywhere and into any application. I feel sure that the Microtek will handle
text, but we've had no reason to worry about it, since all of our clerical
staff were familiar with the HP and its DOS Windows interface. Good luck.
Hope this helps.

*****

I think that there is a review of scanner technology in Unix World this month.

*****

       Aurora Technologies, Inc. -- 176 Second Ave. -- Waltham MA 02154
       (617)290-4800 voice (617)290-4844 fax

                        FirstScan Image Capture Product
                          D a t a S h e e t

The FirstScan(TM) product allows SPARC(R) workstation users to capture high
quality images from various sources, such as photographs, line art, and other
graphics. FirstScan includes all the software and hardware required to
interface the Hewlett Packard ScanJet(TM) scanner to a SPARC workstation. The
user controls the scanning operation through a highly visual, easy-to-use,
OpenWindows(TM) push-button interface. For scanning multiple pages in batches,
a command-line interface is also included. Now inexpensive ASCII terminals can
be used to scan pages, and set up shell scripts for unattended scanner
operation. Scanned images can be saved in TIFF, PostScript, and Sun raster
formats. These image files can then be imported into a variety of software
products for image editing, OCR text image-to-ASCII conversion, and inclusion
in documents.

The FirstScan product includes a one year limited warranty and comes complete
with:

o FirstScan Application - provides the operator with the tools needed to
  achieve the best results with the HP ScanJet scanner.
o Aurora SBus Scanner Interface Card - provides the bi-directional, parallel
  interface needed by the scanner. The associated SunOS(TM) device driver
  software automatically installs when the application is loaded.
o Complete Documentation - the User's Manual includes a step-by-step
  introduction to the scanning process, and helpful scanning hints. Full
  hardware and software installation instructions are also included.
  All software is provided on one 3.5" floppy.

FirstScan Application Scanning Control and Operational Features

- `Pre-scan' function - for quickly viewing and making adjustments to scanned
   artwork prior to final scan.

- Match scanning resolution to your output device - selectable standard (75,
  150, 300 dots-per-inch) as well as custom vertical and horizontal options
  (up to 1500 dpi) are provided for high quality output.

- Capture only the image area you want - interactive rubber-banding rectangle
  for selection of scan area.

- Control the image appearance - adjustable image contrast (soften/sharpen) and
  intensity (darken/lighten).

- Unattended operation - now you can automatically scan-in many images using
  batch-mode control of the scanning process and the fully supported ScanJet
  Automatic Document Feeder.

- Full scanning control using terminals - using the command line interface, all
  of the control FirstScan offers under OpenWindows is available from standard,
  low-cost ASCII terminals.

- Save scanned images in popular file formats - TIFF, PostScript or Sun Raster
  file formats are supported.

- Choose the proper tone - selectable resolution sensitivity; from coarse
  dither to 8-bit grayscale (256 grays).

- Scale scanned images to the size you need - images can be scaled from 4% to
  200% in 1% increments.

- Help when you need it - built-in on-line help facility to quickly answer your
  questions.

- Test your scanner - run the ScanJet diagnostics from within the FirstScan
  application.

- Calera OCR Engine compatibility - document images saved in the TIFF file
  format are compatible with products that include Calera's OCR Engine.
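
As a rough aid for choosing among the resolutions listed above, the raw size
of one scanned page can be estimated with a few lines of Python. This is only
a sketch: the US Letter page size, the absence of compression, and the
function name raw_scan_bytes are assumptions for illustration, not anything
taken from the FirstScan data sheet.

```python
def raw_scan_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=8):
    """Estimate the uncompressed size in bytes of one scanned page.

    Assumes a US Letter page by default and no file-format overhead.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each scan line is padded up to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # Standard resolutions from the feature list, at 8-bit grayscale.
    for dpi in (75, 150, 300):
        print(f"{dpi:4d} dpi: {raw_scan_bytes(dpi) / 1e6:.1f} MB")
```

Passing bits_per_pixel=1 gives the figure for plain black-and-white line art,
which is usually what OCR input needs.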

System Requirements

- SPARC SBus system running SunOS 4.1, or later, and OpenWindows 2.0, or
  later. All FirstScan software is distributed on a single 3.5" floppy disk.

- Hewlett Packard ScanJet or ScanJet Plus.

- Optional - ScanJet Plus Automatic Document Feeder for unattended, bulk image,
  document image capture.

- Optional - OCR software from third party vendors.

Also Included

Developer's Toolkit - Now developers/OEMs can incorporate full FirstScan
software functionality in their own applications; resume processing,
image-based document archiving and retrieval, image databases, and others. The
FirstScan Developer's Toolkit allows you to easily incorporate a complete
imaging solution that is fully compatible with the popular ScanJet scanner and
Calera OCR Engine. The Developer's Kit includes everything needed to quickly
integrate image capture functionality into your new or existing product. Call
for more information regarding the FirstScan Developer's Kit.

*****

Contact Xerox Imaging Systems for information on their ScanWorx products.
The last phone number I had on file for them was 617-864-4700. Be
advised that their product is quite expensive - figure on thousands of
dollars.

Another source is Calera Recognition Systems, who manufacture both
software and hardware for OCR conversion. They can be reached at 408-720-8300.

*****

The best one I've found, if you need a GOOD scanner, is the
Xerox/Kurzweil scanner; there's a great X11 GUI, and it does a very
good job of recognizing the print. It'll even output FrameMaker MIF
files with all the formatting information intact! It's 200/300/400
dpi, and has an automatic document feeder. Runs around $17,000.

*****

We use an HP ScanJet Plus with a software package from a company called
Mentalix. The software is called Pixel!FX (or it might be PixelView;
we've been using it for around 2 years, initially as a beta). The
software supports other scanners besides HP. A loadable device driver
is provided for connecting the HP via the SCSI bus.

*****

We have a scanner with OCR software from Xerox Imaging Systems. The
software is called ScanWorX. It does ICR (Intelligent Char Recognition)
using Artificial Intelligence. Very good results! The scanner is 400dpi
with a 50 page automatic document feeder and an A3 sized flatbed. It scans
at a rate of 4 seconds/page. The scanner connects to a Sun via SCSI cable.

The software and scanner are usually bundled together for somewhere in the
$20-25K range (ballpark). The software is very powerful and versatile, and
can easily be configured for virtually unassisted ICR of numerous documents.
Among its features are: Preview (allows you to do some pre-processing such
as defining zones to be ICR'd, images, zoom in/out to check raster image
quality, load/save templates); Feedback (shows you the ICR in progress);
Verify (allows you to confirm/correct what the program thought a group of
characters really was); portrait/landscape input; multi-column documents,
single/double sided documents. The ICR'd document can be stored in a
variety of word processing formats such as FrameMaker, Interleaf, Word
Perfect, plain ascii, etc. It will handle images on the same page and
store them as TIFF, Sun Rasterfile, etc.

The licensing for the software is on a floating license basis, so you could
buy one scanner and, say four licenses of the software. Then four users
on your network could be processing documents with the scanner or TIFF images
as the input simultaneously. Something I do often is use an HP ScanJet on a
PC to scan in documents, save them as TIFF, then transfer them to the Sun to
do the ICR with ScanWorX.
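
One note on the PC-to-Sun TIFF route described above: a TIFF file records its
own byte order in its first two bytes ("II" for little-endian, "MM" for
big-endian), followed by the number 42 stored in that byte order, so a file
written on a PC can be read on a SPARC. A quick check that a transferred file
still starts with a valid TIFF header might look like the Python sketch
below. The helper name tiff_byte_order is hypothetical, and this says nothing
about how ScanWorX itself reads files.

```python
import struct


def tiff_byte_order(header):
    """Return 'little' or 'big' for a valid TIFF header, else None.

    `header` is the first four (or more) bytes of the file.
    """
    if len(header) < 4:
        return None
    if header[:2] == b"II":
        fmt, name = "<H", "little"
    elif header[:2] == b"MM":
        fmt, name = ">H", "big"
    else:
        return None
    # Bytes 2-3 hold the magic number 42 in the declared byte order.
    (magic,) = struct.unpack(fmt, header[2:4])
    return name if magic == 42 else None


def check_file(path):
    """Read the start of a file and report its TIFF byte order, if any."""
    with open(path, "rb") as f:
        return tiff_byte_order(f.read(4))
```

If check_file returns None after a transfer, one likely cause is an FTP
transfer done in ASCII rather than binary mode.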

*****

We presently have a Microtek scanner connected to a Sun 3/60, but it has NO
OCR capability and this model (MSF-400G) is not supported under any SunOS
above 4.0.3. It's only three years old, with fewer than 1000 scans, and now
it's obsolete!

Anyway, it has a few software bugs (ScanVu) but otherwise worked fine on
our network for scanning, and the 400 dpi was good. They have newer and
supposedly better scanners now, so perhaps give them a call.

We looked at a scanner that fits your requirements (ScanWorX) from Xerox
Imaging Systems, but I believe the price tag for the system was > $20K.

Anyway, here are the addresses for the above two scanners:

        Microtek
        680 Knox Street
        Torrance, CA 90502
        (213) 321-2121

        XIS
        1-800-248-6550 ( for spec sheet or to arrange a demo )

        185 Albany Street
        Cambridge, MA 02139
        (617) 864-4700

*****

In case no one has mentioned it yet, you should check out the ScanWorx
product from Xerox Imaging Systems. The ScanWorx software has one of
the highest accuracy rates in the OCR industry and is also quite fast.

The scanner and software used to be known as the Kurzweil OCR scanner
until Xerox decided to resell it. This scanner can scan directly into
Interleaf (formatted) as well as Frame (formatted), and ASCII.

*****

We have been quite happy with Apunix's scanners and their other software,
although we have not tried out the OCR capability they claim to have. The
SCSI driver works and works well. 800-8AP-UNIX, or mail to Debbie:
deb@apunix.com.

*****

Any of the current crop of hi-res scanners will create bits just fine,
like the Sharp and Canon scanners. What you need is something like
the software from Calera to go over the bits for you.

---
Timothy Jones                                           e-mail: tim@boxhill.com
Box Hill Systems Corporation                                voice: 212-989-4455
161 Avenue of the Americas, New York, NY, 10013               fax: 212-989-6817


