Big data
Research strengths within the RMIT Centre for Information Discovery and Data Analytics include data integration, data quality (lineage), data visualization, and data exploration, with expertise in spatial data, high dimensional data, and multimedia data. The increasing volume of data, such as from GPS and sensor devices, enables us to answer many real-life queries for different applications. Examples include facility location selection (finding a suitable location for a facility or store for business and marketing purposes, that is, identifying an optimal location such that the facility is close to the maximum number of its customers) and trajectory travel pattern analysis (multi-range queries that find trajectories passing through a set of given spatio-temporal ranges). In high dimensional data analysis, real-world objects can be represented by features capturing different aspects, so it is essential to identify the similarities, differences and relevance between objects in terms of these features, using techniques such as clustering, multi-objective optimisation, and influence set identification. Multimedia data analysis aims to identify the similarity or relevance of media data effectively and efficiently for different applications. It covers research on several typical topics: digital copy detection, anomaly detection, media recommendation, and media compression and summarization.
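To make the facility location selection query described above concrete, the following is a minimal sketch rather than the Centre's own method. It assumes a small set of candidate sites, customers given as planar (x, y) points, and a hypothetical `radius` that defines what counts as "close"; the function name `best_facility_location` and the sample coordinates are illustrative only. It simply scans every candidate and keeps the one with the most customers within the radius.

```python
from math import hypot


def best_facility_location(candidates, customers, radius):
    """Return the candidate site that has the most customers within `radius`.

    Brute-force sketch: every candidate is compared against every customer.
    `radius` is an assumed definition of "close"; distances are Euclidean.
    """
    def coverage(site):
        # Count the customers whose distance to this site is at most `radius`.
        return sum(
            1
            for customer in customers
            if hypot(site[0] - customer[0], site[1] - customer[1]) <= radius
        )

    # max() returns the first candidate with the highest coverage count.
    return max(candidates, key=coverage)


# Illustrative data only (not taken from any real dataset).
candidates = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
customers = [(4.0, 4.0), (5.0, 6.0), (6.0, 5.0), (0.0, 1.0), (10.0, 1.0)]

best = best_facility_location(candidates, customers, radius=2.0)
print(best)
```

A brute-force scan like this costs one distance computation per candidate and customer pair, which is why large spatial datasets usually call for spatial indexing or pruning instead.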
Machine learning
Data mining, or knowledge discovery in data, aims to infer hidden knowledge, patterns and insights from data for various applications. Data is usually in high volumes and can be in formats such as transactions, natural language texts, and networks of linked data items. The knowledge mined from data can be patterns describing human behaviours and activities, or predictive models that classify a credit card application as low risk or high risk, or classify a review as fraudulent or genuine. Data mining technologies can include efficient algorithms that search for knowledge patterns, or machine learning models for predictive analytics. The RMIT Centre for Information Discovery and Data Analytics has expertise in several areas: (1) Designing efficient data mining algorithms that extract knowledge patterns from large volumes of data on big data processing platforms. (2) Sentiment analysis and opinion mining, including opinion spam detection and opinion summarization. (3) Fraud detection and anomaly detection, especially for financial applications. (4) Social media and social network data mining, including user profiling, sentiment analysis and information credibility analysis. (5) Recommender systems, including complex applications such as recommending itineraries for travel or amusement parks, contextual recommendation for personal assistants, and dynamic and personalised scenarios such as recommending publications to read and study. (6) Biomedical text mining.
Information retrieval
Information Retrieval (IR) systems retrieve information relevant to a user’s information need. While this sounds like a simple process (just find documents containing query words), the volume of documents matching virtually any query is often so large that an IR system must attempt to infer what the user is seeking in order to locate relevant documents. Outstanding IR systems need to be fast, intuitive to use, and accurate. The technology underpinning IR systems can be found in search systems (e.g. web search engines such as Google and Bing), recommender systems (Netflix, Amazon), and many other tools that retrieve information. The RMIT Centre for Information Discovery and Data Analytics has expertise in designing, evaluating, and improving complex, multi-stage retrieval in a wide variety of application areas, such as web, legal, medical, genomic, product, and job-based search systems. Particular areas of focus include improving the efficiency and scalability of search engines, improving the effectiveness of retrieval systems through learning-to-rank and other state-of-the-art ranking models, modelling and understanding user behaviour, and evaluating the quality of the search results returned.
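The gap between "find documents containing query words" and ranking them by likely relevance can be illustrated with a small sketch. It uses a textbook TF-IDF style score, chosen here only for illustration; it is not a description of the Centre's systems or of the learning-to-rank models mentioned above. The function name `rank`, the whitespace tokenisation, the particular smoothed weighting, and the sample documents are all assumptions.

```python
import math
from collections import Counter


def rank(query, documents):
    """Return indices of matching documents, best match first.

    Each query term contributes (term frequency in the document) times a
    smoothed inverse document frequency, so rare terms count for more.
    Documents containing none of the query terms are left out.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = sum(
            term_freq[term] * math.log(1 + n_docs / doc_freq[term])
            for term in query.lower().split()
            if term in term_freq
        )
        scored.append((score, index))

    # Sort by descending score; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored if score > 0]


# Illustrative documents only.
docs = [
    "legal search for court documents",
    "medical search engine for clinical records",
    "job listings and job search",
]

print(rank("job search", docs))
```

All three sample documents contain "search", so simple matching cannot separate them; weighting the rarer term "job" is what moves the third document to the top.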