A team of information retrieval experts from RMIT is developing new techniques to improve online text searching.
The team is collaborating with abbrevi8, a Sydney-based software company that is also funding the research, to investigate ways to prioritise references by how relevant they are to a search term, a property known as ‘entity centrality’.
Dr Damiano Spina is a member of the research group and explains that the project will provide important refinements to the ways text is searched and sorted.
“Within a news story or report, many names of people or organisations may appear, but only a few are actually of central importance in the text,” he said.
“Pinpointing these central entities can make search and document summarisation more accurate and effective.”
Spina said that identifying the relative value or pertinence of a particular search term to the overall document is crucial for various entity-based document search applications, such as online reputation monitoring and entity-focused search and summarisation.
“For example, in reputation monitoring, a focused news story about a particular CEO would have higher priority than a business review that simply mentions the individual's name in a list of other CEOs,” he said.
“The same applies to entity-focused summarisation, where information extracted from high-centrality documents would be considered more credible and important to include in a summary.”
“Ultimately, the new techniques will enable users to find the most important information about people, organisations, and other entities of interest more quickly.”
These new search prioritisation techniques will be incorporated into abbrevi8’s Hugo app, which sifts news and other information about people you’re going to meet and briefs you on the most essential details about them.
“Hugo aims to provide background information on the person the user is meeting by providing a summarised profile of the person and its company,” Spina said.
“The idea is to improve the retrieval of news related to the person of interest; finding information that is worth including in the profile by considering entity centrality, which is where our research will come in.”
To create a system for determining entity centrality, the team is treating the problem in terms of ranking.
“Given a fragment of text such as a document or a sentence, entities will be identified and scored according to their centrality to the text based on semantic features and co-reference resolution,” Spina said.
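To make the ranking idea concrete, here is a minimal sketch in Python. It is not the team's method: the article describes scoring based on semantic features and co-reference resolution, whereas this toy version assumes the entities have already been identified and swaps in two simple surface signals, how often an entity is mentioned and how early it first appears. The function name `rank_entities`, the 0.7 and 0.3 weights, and the sample text are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List, Tuple


def rank_entities(text: str, entities: List[str]) -> List[Tuple[str, float]]:
    """Rank candidate entities by a toy centrality score.

    Assumption (not from the research project): the score combines an
    entity's share of all entity mentions with a bonus for appearing
    early in the text. Entities that never appear score 0.0.
    """
    lowered = text.lower()
    length = max(len(lowered), 1)
    counts: Dict[str, int] = {}
    first_pos: Dict[str, int] = {}

    for name in entities:
        # Match the exact name on word boundaries, ignoring case.
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        matches = list(re.finditer(pattern, lowered))
        counts[name] = len(matches)
        first_pos[name] = matches[0].start() if matches else length

    total_mentions = sum(counts.values()) or 1
    scored: List[Tuple[str, float]] = []
    for name in entities:
        if counts[name] == 0:
            scored.append((name, 0.0))
            continue
        frequency = counts[name] / total_mentions
        earliness = 1.0 - first_pos[name] / length
        # Hypothetical weights: frequency matters more than position.
        scored.append((name, 0.7 * frequency + 0.3 * earliness))

    # Highest score first; ties keep the input order (sorted is stable).
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


story = (
    "Jane Doe took over as chief executive of Acme on Monday. "
    "Jane Doe said Acme would expand into new markets. "
    "Other executives at the event included John Roe."
)
ranked = rank_entities(story, ["John Roe", "Acme", "Jane Doe"])
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this made-up story the person who is the focus of the report ranks above someone mentioned only in passing, which is the behaviour Spina describes for reputation monitoring. Exact string matching like this would miss references such as "she" or "the CEO", which is where co-reference resolution would come in.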
According to Professor Mark Sanderson, RMIT’s new Enabling Capability Platform Director, Information and Systems, and the Head of RMIT’s Information, Storage, Analysis and Retrieval Group, this project is another example of RMIT’s concentration of expertise in the area of information retrieval (IR).
“Melbourne is one of the world’s information retrieval research hubs and RMIT is actually a big part of that,” he said.
“It’s because of the research excellence in search engines and IR, which we’ve demonstrated over a number of years, that we are now attracting the sort of talented young researchers that make up the team working with abbrevi8 on this project.”
Spina is also delighted to receive the grant from abbrevi8, which gives him and his team great encouragement to take on other research challenges in the field of IR.
“We are very excited to get this support and it’s very nice to have the opportunity to transfer our research outcomes to innovative products like this,” he said.
“As a team of early career researchers, these projects give us the impulse to spread our wings and tackle new, applied research challenges in our field.”
Story: Daniel Walder