ixtract version 3.6 was released today, with many new features, enhancements, and fixes.
Among the new features are:
Enhanced recognition quality
Better plugin support for special requirements
Better image pre-processing and noise removal
Better reporting of productivity
Support for new EDMS XML export format
Multi-level verification capability added
ixCheckerCL now supports single-field or all-field verification views
Fixes to improve speed and reliability for high-volume processing
Internal review fixes to various modules
Existing customers under support agreements who would benefit from the new version, and whose customizations and custom enhancements are compatible with the new features, will receive free upgrades delivered electronically. Customers without ongoing support agreements should contact their reseller or salesperson to receive an upgrade.
ICT04: Auto-categorization of Unstructured Data
Track: Information Classification, Taxonomies & Metadata Management
Wednesday, March 05, 2008
Speaker: Milind Joshi, CTO, IDEA TECHNOSOFT INC.
Update 10 March 2008: A packed, standing-room-only audience attended the conference session, even though at least 4 other sessions were available to choose from at the same time. Feedback received indicates that participants benefited from the technology overview and the non-partisan information presented at the session.
Classifying content is a very difficult problem, especially if your content is unstructured and you have high volumes of data. Most electronic document management systems force users to "tag" and "index" documents as they go in – users rarely have enough time to do that, so they take shortcuts or make mistakes. Automatic classification, keyword extraction, and summarization tools make it easier to put content away. Capturing incoming content correctly makes it easier to find later.
Current practices for content classification and retrieval, and problems associated with them.
The concepts and technology behind extracting, tagging, classifying, indexing and finding content easily.
Using existing technologies and avoiding vendor dependence when implementing your own content classification and information retrieval system, and avoiding common pitfalls.
You have your RDBMS and your content search engine. Your team or consultant even built a cool taxonomy. Now what? How do you get all that content into a form that gets your users what they are looking for, and fast? How do you get beyond the "X million search matches found"?
Problems associated with database queries & full-text search, and the current script-kiddie approaches that pass for "state-of-the-art".
Machine learning & computational linguistics techniques for getting content into a hybrid "findable" content database.
Practical tools and techniques to apply that technology without investing too much money or becoming dependent on one vendor.
Seeing beyond the jargon of taxonomies and ontologies.
Focus on the core value of information to the searcher, and getting that information to your user, fast.
ixNamer v3.2 released (Jun 15, 2007)
IDEA releases the latest version of its ixNamer software.
ixNamer is post-scan software that automatically names scanned images according to sophisticated rules based on barcode or character recognition results, including handprinted characters, and other image heuristics.
For companies that scan hundreds or thousands of images of documents that are similar to each other, like invoices, work orders, work tickets, delivery challans, waybills, contracts, etc., finding them easily afterwards is a big problem. Most document management systems force users to manually type in the index fields that are later used to retrieve these documents. This is a time-consuming task. Some DMS software products offer some OCR, but the quality of the OCR results still leaves a lot of manual effort for operators.
ixNamer eliminates this drudgery by letting users specify very detailed rules and by applying the latest noise removal, layout detection, and recognition technologies. This gives high-quality results and enables a high degree of automation. In addition, the product is independent of the scanner used and of the EDMS deployed.
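The idea of rule-based naming can be illustrated with a minimal sketch. This is not ixNamer's rule format or API, which are not described here; the field names, patterns, and file name template below are hypothetical and chosen only for illustration. Recognized fields are validated against simple rules, and an image whose fields fail validation is left for an operator.

```python
import re
from typing import Mapping, Optional

# Hypothetical rules: a field may be used in a file name only if it
# matches its pattern. Real rules would be far more detailed.
RULES = {
    "invoice_no": re.compile(r"[A-Z]{2}\d{6}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

# Hypothetical naming template built from the validated fields.
TEMPLATE = "{invoice_no}_{date}.tif"


def propose_name(fields: Mapping[str, str]) -> Optional[str]:
    """Return a file name, or None when the image needs manual review."""
    cleaned = {}
    for key, pattern in RULES.items():
        value = fields.get(key, "").strip()
        if not pattern.fullmatch(value):
            return None  # missing or unreadable field: leave it to an operator
        cleaned[key] = value
    return TEMPLATE.format(**cleaned)
```

An image whose barcode or text was read cleanly gets a name automatically; anything else falls into the small manual-review remainder.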
In a recent customer deployment, ixNamer succeeded in naming over 95% of all scanned images automatically, needing user input for less than 5% of images, mainly because:
Some images were not of the right type - junk images
Some images had extensive scribbling on the relevant portion of the form
Even human operators had trouble reading the text on the originals
The relevant portion of the paper was clipped out by punch holes
The enhancements in this version are:
Support for a hands-off Windows NT Service-type interface, and a Windows UI
Easy installation, less than 10 minutes to install once software has been customized
Support for Windows and Linux operating systems
Ability to process machine print, hand print, and barcode
Support for import into many major EDMS software products after naming
Support for across-the-network scanning, and more than one scanner
Our CTO, Milind Joshi, delivered an educational presentation to attendees who wanted to learn more about categorizing unstructured data and extracting useful information from it.
The presentation explored current practices and the scientific research and technology around unstructured content, and covered some techniques for working with unstructured documents, with a practical focus that attendees could use in deploying similar systems in their own organizations.
No specific products, services, or technologies were endorsed in the presentation, as it was intended to be educational in nature.
When: Wednesday April 18, 2007, 1:30 - 2:30 PM
Conference: AIIM Conference
Track: Managing Unstructured Data
Where: Boston Convention Centre, 415 Summer St, Boston, MA, USA
IDEA Commences 4th Year of Operation in Canada (Oct 15, 2006)
Settling in for the long haul
IDEA recently commenced the 4th year of its operations in Canada. This
achievement is all thanks to our Valued Customers, Canada-wide channel
partners, employees and associates. The core IDEA team has been in
business under various names and organizational structures for over 8
years internationally.
This is a landmark for everyone working with IDEA because, according
to the Innovation Synergy Centre in Markham, Ontario, the rate of
survival of micro-enterprises (0-5 employees), which IDEA started as,
is as low as 50% in Ontario, as low as 37% in other parts of Canada,
and as low as 69% for other companies with 5-99 employees.
What does this mean for our customers and partners?
It means that IDEA TECHNOSOFT has set down a strong basic
infrastructure and a sound business strategy for sustainable
long-term growth. It means that IDEA's products and services will
continue to be supported in the longer term, with continuous
innovation and new development.
In the following year, customers and partners can look forward to
several new and exciting product launches, upgrades to existing
products and a whole lot of exciting technologies brought to the market
in solutions that are simple to deploy and operate.
IDEA announced the release of version 3.1, the latest version of its
software for network-wide text recognition of scanned images.
ixTexter NTS, a component of the ixtract family of products, can run
either as a scriptable command-line program or as a Windows NT Service.
In service mode, it works as a Recognition Server that polls a set of
folders periodically. When images in pre-configured image formats are
found, they are recognized with OCR engines and exported to various
formats such as Searchable PDF, XML, RTF, CSV, XLS, TXT, and other
special-purpose formats. The recognition results are written either to
the same path or to a separately configured path. The original image
files can then be archived to a separate folder, renamed, or deleted
when no longer required.
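The polling workflow described above can be sketched in a few lines. This is an illustration only, not ixTexter NTS code: the function name, the set of image suffixes, and the `recognize` callable (a stand-in for whatever OCR engine is deployed) are all assumptions, and only plain-text output and the archive option are shown.

```python
import shutil
from pathlib import Path
from typing import Callable, List

# Assumed set of image formats to pick up; a real server would make this configurable.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg"}


def poll_once(watch: Path, out: Path, archive: Path,
              recognize: Callable[[Path], str]) -> List[Path]:
    """Process every image currently in `watch` and return the result files.

    `recognize` is a placeholder for an OCR engine: it takes an image
    path and returns the recognized text.
    """
    out.mkdir(parents=True, exist_ok=True)
    archive.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(watch.iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore sub-folders and non-image files
        result = out / (image.stem + ".txt")
        result.write_text(recognize(image), encoding="utf-8")
        # Archive the original so that it is not processed again.
        shutil.move(str(image), str(archive / image.name))
        written.append(result)
    return written
```

A service would call `poll_once` in a loop with a sleep between passes, once for each configured folder.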
Designed as an OCR-engine-independent framework, it can deploy one or
more OCR/ICR/Barcode engines and perform a complex series of steps on
every incoming image. The advantage is that this product can ship with
best-in-class engines for that particular application or requirement.
ixTexter NTS can interface with MFU scanners as well as dedicated
Document Scanners, and can process either color, grayscale, or
black-and-white images.
One of the major advantages of this product is that systems
administrators and IT staff no longer have to install OCR software on
every workstation or desktop in the organization. Users simply scan
their images in as TIFF, JPEG, or over 30 different formats, and the
software automatically converts those images to Searchable Text-based
formats, ready for digital indexing, search & retrieval.
Centralized control means that logging and troubleshooting are
centralized, reducing the total cost of deployment and ownership. Most
parameters can be controlled by IT staff from a simple config file.
Pricing depends on the total number of images to be processed per
month, the number of different engines required for document
processing, the total number of installed servers, and any special
customization required. The product can save as much as 65% over
individually installed OCR software in initial deployment and training
costs alone, and over 75% in administration and maintenance costs.
IDEA announced the Certification of the following models of Panasonic
Document Scanners for use with our Data Capture Software.
After thorough testing and analysis, we are happy to recommend them to
our quality-conscious customers.
Panasonic KV-S3065CL
Panasonic KV-S2046C
Panasonic KV-2026
As we develop custom High-Speed, High-Accuracy Data capture Solutions,
we were on the lookout for scanners that have all the features that our
applications could use to deliver high-quality output. The scanners
were tested for speed and image quality under the following different
combinations of criteria:
Thick and thin paper, varying paper sizes
USB 2.0 as well as USB 1.1 used with older PCs
Severe Skew and Paper Jams
Color as well as B/W scanning
With and without Panasonic's RTIV and Image Capture Software
Documents that were most likely to get jammed
Overstuffing the Automatic Document Feeder
With and without auto-crop, deskew, noise reduction, color
conversion, various file formats and compression algorithms
We tested all the above scanners over several weeks, and put them
through loads that were beyond the official rated loads of each
machine, and found that the machines handled increased workloads very
well.
We found that these scanners are extremely ruggedly built, and easy to
maintain. Installation was a fairly simple process, and the scanners
worked well with other scanning programs as well.
Each of the above scanners is good for its own set of applications, and
we would recommend the appropriate model for different needs. The 2026
is really good for small workgroup or personal scanning, and the 3065
is excellent for production scanning. However, different models and
variations are possible for different needs.
IDEA TECHNOSOFT is an independent Software Developer, Consultant,
and Systems Integrator not affiliated with any Panasonic group company.
The above is a general review, and not a specific recommendation for
any purpose or use. For informational use only.
IDEA TECHNOSOFT announced the release of the latest version of their
Forms Processing Software ixtract,
featuring several enhancements, bugfixes, and general improvements.
The improvements were a result of feedback and requests from our valued
customers.
Up to 20% improvement in OCR/ICR quality
Up to 15% reduction in manual verification time
Memory errors in .NET to C-runtime marshalling fixed
UI bugs fixed
Added special command to enable all agents at once
ixtract is a product under active research & development, with new
features and enhancements being added on
a regular basis.
IDEA TECHNOSOFT today announced the release of the first version of
their Linux Software for Forms Recognition.
Also available under Windows, ixtractCL enables clients to integrate
powerful Forms Recognition functionality into their existing user
interfaces, without investing effort in creating new interfaces,
software development, integration, and testing.
ixtractCL allows application developers to add forms recognition
functionality simply and easily, running it as a
shell command, producing recognition results written to an
easy-to-parse text format (XML output will be supported shortly).
ixtractCL currently supports machine-print (OCR), hand-print (ICR), and
1-D Barcode recognition.
A simple INI file interface allows developers to specify form
properties.
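As an illustration of how form properties might be described in an INI file, the sketch below reads a made-up form definition with Python's standard configparser module. The section names, keys, and field layout are hypothetical; the actual ixtractCL INI format is not documented here.

```python
import configparser

# Hypothetical form definition: one section per field, each giving the
# recognition type and the field's position on the page in pixels.
SAMPLE_INI = """
[form]
name = work_order
dpi = 300

[field:order_no]
type = ocr
left = 120
top = 80
width = 400
height = 60

[field:signature_date]
type = icr
left = 120
top = 900
width = 300
height = 60
"""


def load_fields(ini_text):
    """Return {field name: {"type": ..., "box": (left, top, width, height)}}."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    fields = {}
    for section in parser.sections():
        if not section.startswith("field:"):
            continue  # skip general sections such as [form]
        name = section.split(":", 1)[1]
        box = tuple(parser.getint(section, key)
                    for key in ("left", "top", "width", "height"))
        fields[name] = {"type": parser.get(section, "type"), "box": box}
    return fields
```

Calling `load_fields(SAMPLE_INI)` gives one entry per field, which a developer could use to check a form definition before running recognition.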
Whatever the development environment (PHP, JavaScript, Java, Gambas,
C++, Visual Basic, .NET, or any other language not mentioned here), if
it can run a command-line program, it can run ixtractCL.
This is especially useful for environments where a significant amount
of UI work has been done, or where users have been trained in the use
of a certain tool.
Imagine a UI where users enter data from a paper sheet. Now the paper
can be scanned in, and the UI elements automatically populated with
OCR, ICR, and Barcode recognition results, with a small change in the
UI code.
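The integration pattern described above, running a command-line recognizer and feeding its output into UI fields, might look like the following sketch. The command line and the output format are assumptions: the caller supplies the full command, since ixtractCL's arguments are not documented here, and the output is assumed to be one name=value pair per line.

```python
import subprocess
from typing import Dict, List


def run_recognizer(command: List[str]) -> Dict[str, str]:
    """Run a command-line recognizer and parse its output.

    Assumes the program prints one `name=value` pair per line; lines
    without an `=` are ignored. Raises CalledProcessError if the
    program exits with a non-zero status.
    """
    completed = subprocess.run(command, capture_output=True,
                               text=True, check=True)
    results = {}
    for line in completed.stdout.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            results[name.strip()] = value.strip()
    return results


def populate(form_values: Dict[str, str], results: Dict[str, str]) -> None:
    """Copy recognized values into the fields the UI already knows about."""
    for name in form_values:
        if name in results:
            form_values[name] = results[name]
```

The UI code only needs to call `run_recognizer` after a scan and pass the result to `populate`. Fields that were not recognized keep their current value and can be typed in as before.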
Contact us today. Let us study your needs in detail and partner with
you to achieve increased productivity and reduced costs.