Enterprise Search and Text Analytics for a Publication Giant
About the Client
The client is a scientific publication company.
The client asked ALTEN Calsoft Labs to help them:
- Store a huge amount of unstructured publication data
- Search for relevant documents based on search terms
ALTEN Calsoft Labs’ expert team did the following to address the client’s requirements.
- Developed search grammars that include AND, OR, NAND, one-, two- and three-character search, phrase search and NEAR, to name a few
- Ensured enhanced search performance
  - Search time depends on the query string
  - Searches take longer for query strings that occur in a larger number of documents
  - Search results are returned in roughly 1.5 to 10 seconds
  - Filters are generally fast, usually taking about 2 seconds
- Conducted analysis (Semantics and Stats pages) based on preset KPIs or definitions that determine how the data is analyzed
- Easy discovery of relevant documents
- Boolean Search Support
- Platform: Hadoop
- File System: Hadoop Distributed File System
- Paradigm: MapReduce
- Machine Learning Tools: Mahout
- Language: R
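
To make the search grammar described above more concrete, here is a minimal sketch of how AND, OR, NAND, phrase and NEAR queries could be evaluated over a positional inverted index. It is an illustration only and not the client solution: the function names, the sample documents and the in-memory index are all hypothetical, and NAND is assumed to mean "documents that do not contain both terms". A production system on Hadoop would build and query the index in a distributed way.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Build a positional inverted index: term -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(re.findall(r"\w+", text.lower())):
            index[term][doc_id].append(pos)
    return index


def docs_with(index, term):
    """Return the set of document ids that contain the term."""
    return set(index.get(term.lower(), {}))


def search_and(index, a, b):
    return docs_with(index, a) & docs_with(index, b)


def search_or(index, a, b):
    return docs_with(index, a) | docs_with(index, b)


def search_nand(index, a, b, all_ids):
    # Assumption: NAND = NOT (a AND b), evaluated against all known documents.
    return set(all_ids) - search_and(index, a, b)


def search_near(index, a, b, max_gap):
    """Documents where a and b occur within max_gap positions, in any order."""
    a, b = a.lower(), b.lower()
    hits = set()
    for doc_id in search_and(index, a, b):
        positions_a = index[a][doc_id]
        positions_b = index[b][doc_id]
        if any(0 < abs(pb - pa) <= max_gap
               for pa in positions_a for pb in positions_b):
            hits.add(doc_id)
    return hits


def search_phrase(index, phrase):
    """Documents where the phrase terms occur at consecutive positions."""
    terms = re.findall(r"\w+", phrase.lower())
    if not terms:
        return set()
    candidates = set.intersection(*(docs_with(index, t) for t in terms))
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample data, for illustration only.
DOCS = {
    1: "Protein folding in yeast cells",
    2: "Yeast genetics and protein expression",
    3: "Folding of paper in origami",
}
INDEX = build_index(DOCS)
```

With this sample data, `search_phrase(INDEX, "protein folding")` matches only document 1, while `search_near(INDEX, "yeast", "protein", 3)` matches documents 1 and 2. Storing positions in the index costs extra space, but it is what lets phrase and NEAR queries be answered without rescanning the document text.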