In this article I will show you how to read text from an image by using OCR components in C#.NET.
What is OCR?
OCR (Optical Character Recognition) is the recognition of printed or written text characters by a computer. This involves photoscanning of the text character-by-character, analysis of the scanned-in image, and then translation of the character image into character codes, such as ASCII, commonly used in data processing.
OCR translates images of text, such as scanned documents, into actual text characters. Also known as text recognition, OCR makes it possible to edit and reuse the text that is normally locked inside scanned images. OCR works using a form of artificial intelligence known as pattern recognition, to identify individual text characters on a page, including punctuation marks, spaces, and ends of lines.
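To make the idea of pattern recognition concrete, here is a toy sketch that matches a tiny 3x3 glyph against a few stored templates and picks the closest one. The `ToyOcr` class, its templates, and the `Recognize` method are hypothetical names invented for this illustration; real OCR engines such as MODI use far more elaborate models, so treat this only as an intuition aid.

```csharp
using System;
using System.Collections.Generic;

static class ToyOcr
{
    // Hypothetical 3x3 glyph templates: '#' is ink, '.' is background,
    // rows are concatenated top to bottom.
    static readonly Dictionary<char, string> Templates = new Dictionary<char, string>
    {
        { 'I', ".#." + ".#." + ".#." },
        { 'L', "#.." + "#.." + "###" },
        { 'T', "###" + ".#." + ".#." },
    };

    // Returns the template character whose pixels differ least from the glyph.
    public static char Recognize(string glyph)
    {
        if (glyph == null || glyph.Length != 9)
        {
            throw new ArgumentException("Expected a 3x3 glyph (9 characters).", "glyph");
        }

        char best = '?';
        int bestDistance = int.MaxValue;
        foreach (KeyValuePair<char, string> entry in Templates)
        {
            // Count the pixels that disagree with this template.
            int distance = 0;
            for (int i = 0; i < glyph.Length; i++)
            {
                if (glyph[i] != entry.Value[i]) distance++;
            }
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = entry.Key;
            }
        }
        return best;
    }
}
```

Calling `ToyOcr.Recognize("#..#..##.")` passes in an "L" with one missing pixel; the closest template is still 'L', which is the essence of recognizing characters despite scanning noise.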
First, you need MS Office 2007 installed. This is a hard dependency if you deploy an application that uses the OCR capabilities in the field: it won’t work without Office installed. Furthermore, the OCR capability isn’t installed by default when you install Office; you need to add a component called ‘Microsoft Office Document Imaging’ (MODI). Note that MODI was removed from the suite starting with Office 2010, so the steps below apply to Office 2007.
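Because the application fails on machines without MODI, it can help to check for the component at startup. The sketch below assumes the component registers the COM ProgID `MODI.Document` and that the code runs on the .NET Framework on Windows; the `ModiCheck` class and `IsModiInstalled` method are hypothetical names used only for this example.

```csharp
using System;

static class ModiCheck
{
    // Returns true when the "MODI.Document" COM class is registered on this machine.
    // Assumption: MODI registers itself under the ProgID "MODI.Document".
    public static bool IsModiInstalled()
    {
        // GetTypeFromProgID returns null (instead of throwing) when the ProgID is unknown.
        return Type.GetTypeFromProgID("MODI.Document") != null;
    }
}
```

A caller could test `ModiCheck.IsModiInstalled()` before running OCR and show an installation hint to the user when it returns false.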
Instructions on how to add the required MODI component.
Step 1
Click Start, click Run, type appwiz.cpl in the Open box, and then click OK.
Step 2
Click to select the Office 2007 version that you have installed.
Step 3
Click Change.
Step 4
Click Add or Remove Features, and then click Continue.
Step 5
Expand Office Tools.
Step 6
Click Microsoft Office Document Imaging, and then click Run all from My Computer.
Step 7
Click Continue.
The MODI components are now installed on your machine. Let's create the OCR application in Visual Studio.
Step 8
Create a Console Application and name the solution SolReadTextFromImage.
Step 9
Copy a sample image file into the application's base directory (./bin/debug/SampleImage.jpg).
Step 10
Add a MODI reference to the project so the application can use it to read text from an image. In Solution Explorer, right-click References, choose Add Reference, select the COM tab, and then select Microsoft Office Document Imaging 12.0 Type Library.
Step 11
The code below reads text from an image and either displays it in the console or stores it in a text file. It looks like this:
    #region Methods

    /// <summary>
    /// Reads the text from an image and displays it in the console.
    /// </summary>
    /// <param name="ImagePath">Path of the image file</param>
    private static void ReadTextFromImage(String ImagePath)
    {
        MODI.Document ModiObj = new MODI.Document();
        try
        {
            // Load the image and run OCR (English, auto-orient, auto-straighten)
            ModiObj.Create(ImagePath);
            ModiObj.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            // Retrieve the text recognized on the first page of the image
            MODI.Image ModiImageObj = (MODI.Image)ModiObj.Images[0];
            System.Console.WriteLine(ModiImageObj.Layout.Text);
        }
        finally
        {
            // Release the document even if OCR fails; the original exception
            // propagates with its type and stack trace intact
            ModiObj.Close();
        }
    }

    /// <summary>
    /// Reads the text from an image and stores it in a text file.
    /// </summary>
    /// <param name="ImagePath">Path of the image file</param>
    /// <param name="StoreTextFilePath">Path of the text file to create</param>
    private static void ReadTextFromImage(String ImagePath, String StoreTextFilePath)
    {
        MODI.Document ModiObj = new MODI.Document();
        try
        {
            // Load the image and run OCR (English, auto-orient, auto-straighten)
            ModiObj.Create(ImagePath);
            ModiObj.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            // Retrieve the text recognized on the first page of the image
            MODI.Image ModiImageObj = (MODI.Image)ModiObj.Images[0];

            // Create (or overwrite) the text file and save the image text in it
            using (StreamWriter WriteFileObj = new StreamWriter(StoreTextFilePath))
            {
                WriteFileObj.Write(ModiImageObj.Layout.Text);
            }
        }
        finally
        {
            // Release the document even if OCR or the file write fails
            ModiObj.Close();
        }
    }

    #endregion
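The methods above read only `Images[0]`, the first page. For a multi-page file such as a multi-page TIFF, a minimal sketch along the following lines could collect the text of every page. It assumes the same MODI 12.0 reference added in Step 10; the `ModiMultiPage` class and `ReadAllPages` method are hypothetical names introduced for this example.

```csharp
using System;
using System.Text;

static class ModiMultiPage
{
    // Runs OCR on the whole document and returns the text of all pages,
    // one page after another.
    public static string ReadAllPages(string imagePath)
    {
        MODI.Document document = new MODI.Document();
        try
        {
            document.Create(imagePath);
            document.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            StringBuilder text = new StringBuilder();
            // Images holds one entry per page of the loaded file.
            for (int page = 0; page < document.Images.Count; page++)
            {
                MODI.Image image = (MODI.Image)document.Images[page];
                text.AppendLine(image.Layout.Text);
            }
            return text.ToString();
        }
        finally
        {
            // Release the document even if OCR fails.
            document.Close();
        }
    }
}
```

For a single-page JPG this returns the same text as the first method, followed by a line break.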
Step 12
Call both methods in the Main function. It looks like this:
    static void Main(string[] args)
    {
        // Path of the sample image in the application's base directory
        String ImagePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "SampleImage.jpg");

        // Display the image text in the console
        ReadTextFromImage(ImagePath);

        // Path of the text file that will receive the image text
        String StoreTextFilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "SampleText.txt");

        // Store the image text in the text file
        ReadTextFromImage(ImagePath, StoreTextFilePath);
    }
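The Main function above assumes that SampleImage.jpg really sits in the base directory. As a hedged sketch, a small helper like the one below could build the paths and verify that the image exists before MODI is called. The `SamplePaths` class and both of its methods are hypothetical names, not part of the original code.

```csharp
using System;
using System.IO;

static class SamplePaths
{
    // Builds the full path of a file that sits next to the executable.
    public static string InBaseDirectory(string fileName)
    {
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);
    }

    // Returns true when the image exists; otherwise prints a hint and returns false.
    public static bool ImageIsAvailable(string imagePath)
    {
        if (File.Exists(imagePath))
        {
            return true;
        }
        Console.Error.WriteLine("Image not found: " + imagePath);
        return false;
    }
}
```

In Main, the OCR calls could then be wrapped in `if (SamplePaths.ImageIsAvailable(ImagePath)) { ... }` so that a missing file produces a readable message instead of a COM error.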
Full Code
    using System;
    using System.IO;

    namespace SolReadTextFromImage
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Path of the sample image in the application's base directory
                String ImagePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "SampleImage.jpg");

                // Display the image text in the console
                ReadTextFromImage(ImagePath);

                // Path of the text file that will receive the image text
                String StoreTextFilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "SampleText.txt");

                // Store the image text in the text file
                ReadTextFromImage(ImagePath, StoreTextFilePath);
            }

            #region Methods

            /// <summary>
            /// Reads the text from an image and displays it in the console.
            /// </summary>
            /// <param name="ImagePath">Path of the image file</param>
            private static void ReadTextFromImage(String ImagePath)
            {
                MODI.Document ModiObj = new MODI.Document();
                try
                {
                    // Load the image and run OCR (English, auto-orient, auto-straighten)
                    ModiObj.Create(ImagePath);
                    ModiObj.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

                    // Retrieve the text recognized on the first page of the image
                    MODI.Image ModiImageObj = (MODI.Image)ModiObj.Images[0];
                    System.Console.WriteLine(ModiImageObj.Layout.Text);
                }
                finally
                {
                    // Release the document even if OCR fails
                    ModiObj.Close();
                }
            }

            /// <summary>
            /// Reads the text from an image and stores it in a text file.
            /// </summary>
            /// <param name="ImagePath">Path of the image file</param>
            /// <param name="StoreTextFilePath">Path of the text file to create</param>
            private static void ReadTextFromImage(String ImagePath, String StoreTextFilePath)
            {
                MODI.Document ModiObj = new MODI.Document();
                try
                {
                    // Load the image and run OCR (English, auto-orient, auto-straighten)
                    ModiObj.Create(ImagePath);
                    ModiObj.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

                    // Retrieve the text recognized on the first page of the image
                    MODI.Image ModiImageObj = (MODI.Image)ModiObj.Images[0];

                    // Create (or overwrite) the text file and save the image text in it
                    using (StreamWriter WriteFileObj = new StreamWriter(StoreTextFilePath))
                    {
                        WriteFileObj.Write(ModiImageObj.Layout.Text);
                    }
                }
                finally
                {
                    // Release the document even if OCR or the file write fails
                    ModiObj.Close();
                }
            }

            #endregion
        }
    }
Output
Reference: http://kishor-naik-dotnet.blogspot.in/2012/05/cnet-read-text-from-image-in-cnet.html