Word segmentation in handwritten documents is a difficult task because the intra-word spacing (i.e. the space between parts of the same word) is sometimes wider than the inter-word spacing (i.e. the space between two consecutive words). Many different approaches to segmenting words have been proposed so far. However, these approaches usually rely on manually tuned parameters, meaning that they do not use the properties of the document to calibrate the parameters automatically.
In this project, we wish to explore the use of genetic programming to find relations between the characteristics of the text and the parameters of the word segmentation algorithm. A good starting point is the algorithm proposed by Manmatha and Rothfeder [1], which is a state-of-the-art word segmentation algorithm. This algorithm is based on scale-space theory, a framework for representing image structures at different scales. The scale-space is obtained by Gaussian filtering. Roughly speaking, if we convolve the image with Gaussian kernels of different sizes (i.e. standard deviations), we obtain the image structures at different scales. For a text line, Gaussian kernels of a suitable size yield blobs that correspond to words. In the original paper, Manmatha and Rothfeder use an experimental formula to tune the size of the Gaussian kernels. However, their formula is independent of the characteristics of the text line, such as how densely or how sparsely the characters are written. As a result, the algorithm sometimes produces under-segmentation errors (several words merged into one blob) and over-segmentation errors (one word split into several blobs). To mitigate this problem, we wish to use genetic programming to estimate the optimal size of the Gaussian kernels based on the properties of the text.
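To make the role of the kernel size concrete, here is a minimal sketch in Python. It is not the algorithm of Manmatha and Rothfeder [1]: it is a simplified one-dimensional illustration that smooths a column-wise ink profile of a text line with a Gaussian and treats each run of columns above a threshold as one blob. The function names (`gaussian_kernel`, `smooth`, `find_blobs`, `toy_line`), the threshold of 0.25, and the toy profile are all assumptions made for this example.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel, truncated at 3 standard deviations."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, sigma):
    """Convolve a 1-D profile with a Gaussian; columns outside count as blank."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(profile)
    out = []
    for x in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = x + k - radius
            if 0 <= j < n:
                acc += w * profile[j]
        out.append(acc)
    return out


def find_blobs(profile, sigma, threshold=0.25):
    """Return (start, end) column ranges whose smoothed value >= threshold.

    The threshold is an arbitrary choice for this sketch (assumption).
    """
    smoothed = smooth(profile, sigma)
    spans = []
    start = None
    for x, value in enumerate(smoothed):
        if value >= threshold and start is None:
            start = x
        elif value < threshold and start is not None:
            spans.append((start, x - 1))
            start = None
    if start is not None:
        spans.append((start, len(smoothed) - 1))
    return spans


def toy_line():
    """Hypothetical text line: 1 marks a column containing ink, 0 a blank one.

    The first word has an internal gap of 3 columns; the two words are
    separated by a gap of 10 columns.
    """
    word1 = [1] * 6 + [0] * 3 + [1] * 6
    word2 = [1] * 12
    return [0] * 5 + word1 + [0] * 10 + word2 + [0] * 5


if __name__ == "__main__":
    line = toy_line()
    for sigma in (0.5, 2.0, 6.0):
        print(sigma, find_blobs(line, sigma))
```

On this toy profile, the smallest kernel leaves the first word split at its internal gap (over-segmentation), the largest kernel merges both words into a single blob (under-segmentation), and only the intermediate size recovers the two words. A formula that ignores how tightly the line is written cannot choose that intermediate size reliably, which is the gap the genetic programming approach is meant to close.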
The student will be provided with C/C++ source code and a benchmark database with which to train the algorithms and evaluate their performance.
Dr. Tien D. Bui
Kalyan Sahoo
Computer science
Information and communications technologies
Concordia University
Globalink Research Internship