a. 3-4 years of relevant experience in Core Java.
b. Should be strong in data structures and algorithms.
c. Should have worked natively with various file formats (PDF, Open XML, etc.).
d. Should have worked independently and be research-oriented (quickly trying out different small ideas and using them in production applications).
a. The position requires the candidate to implement algorithms suited to various extraction tasks.
b. Extraction can be as simple as cleanly converting native file formats into text, or as difficult as applying methods and techniques to improve the quality of OCR output.
c. The implementations should account for common OCR errors and similar issues, using domain (problem)-specific data to achieve near-perfect information extraction accuracy.
3. Good to have
a. The experience requirement can be relaxed somewhat for candidates from a reputed institute or with a master's degree.
b. Should have worked on at least one of the following libraries:
c. Some exposure to image processing using Java (even if only conceptual) would be great!