Computational Statistics and Visualisation
An intensive course in data analysis, primarily for graduates without a mathematics or statistics degree. Covers the fundamentals of descriptive statistics, probability and their applications.
Visualisation is a central theme and is incorporated in all three content sections.
- Data Visualisation [20%] - Methods of sampling. Data representation: pie and bar charts; scatterplots; histograms; cumulative (relative) frequency curves; dot plots; box-and-whisker plots; stem-and-leaf displays. Measures of central tendency and variability for sample and grouped data. Psychological aspects of visualisation.
- Probability [30%] - Definitions and fundamental laws; counting techniques; conditional probability; Bayes' theorem; the concept of a discrete probability distribution; expectation and variance; some standard discrete distributions: Geometric, Binomial and Poisson. The concept of a continuous probability distribution and its properties; expectation and variance. Some standard continuous distributions: the Normal distribution and its properties, use of Normal tables, and related distributions.
- Statistical Applications [50%] - The concept of a sampling distribution; point and interval estimation; hypothesis testing; Type I and Type II errors; p-values; determination of sample size; confidence intervals and significance tests for means and for proportions; single, paired and unpaired samples; Normal and t tests; the F-test. The Normal probability plot. Introduction to one-way Analysis of Variance; Hartley's and Bartlett's tests for homogeneity of variance. Confidence intervals for treatment means and for differences between treatment means. Introduction to simple linear regression: the ANOVA table; confidence intervals and prediction intervals. Correlation and rank correlation. Chi-square as a test of association and as a test of model fit. Non-parametric tests (Wilcoxon signed-rank test, Mann-Whitney-Wilcoxon test, Kruskal-Wallis test and Friedman test).
Data Management and Machine Learning
The aim of this unit is to develop students' knowledge of data management, including online analytical processing; data architectures such as data warehousing; and the process and application of machine learning algorithms to data.
- Data Management Overview [15%] - Example content includes database modelling and querying (relational and NoSQL), graph data modelling, and applications.
- Online Analytical Processing (OLAP) [15%] - Including the representation of multidimensional views of data; technologies and architectures; categories of OLAP tools; business intelligence tools.
- Data Warehousing [10%] - Methodologies, architectures and modelling techniques; data warehousing project management; the extraction, transformation and loading (ETL) process.
- Machine Learning Overview [10%] - The machine learning process; applications of machine learning.
- Machine Learning Algorithms [50%] - For example, artificial neural networks, naïve Bayes, decision trees, clustering, association rules, text mining and fuzzy systems; their application, analysis and validation.
Introduction to Data Science
Introduces data science concepts, techniques and algorithms for processing and visualising datasets so as to infer useful, actionable knowledge in a given domain, using the Python and R ecosystems.
- Introduction to the data science ecosystem [10%] - Basic functionality, such as fundamental data structures and operations, for specifying and running a data science pipeline; key concepts of Python (with IPython Notebook) and R (with RStudio).
- Data visualisation [10%] - Fundamental plots, such as line and bar charts, histograms and scatterplots.
- Data manipulation [20%] - Common dataset formats: tabular, text, graph and markup (e.g., XML); obtaining datasets via API requests and web scraping. Brief introduction to graph concepts: nodes, edges and paths; directed and undirected graphs; degree. Fundamental transformation operations: extracting and adding features, obtaining subsets of data, and grouping and combining datasets.
- Data analysis [20%] - Summarising data with measures of totality (e.g., count, sum), central tendency (e.g., mean and mode) and spread (e.g., standard deviation). Frequency and probability distributions; correlation; introductory aspects of graph analytics (e.g., centrality).
- Data preparation [20%] - Choosing, configuring and applying basic data reduction techniques and algorithms (e.g., dimensionality reduction, sampling); data cleaning and normalisation.
- Data mining [20%] - Selecting, configuring and applying common data mining algorithms for classification, clustering and regression.
High-Performance Computing and Big Data
The aim of this unit is to develop students' knowledge in the areas of parallel and distributed processing, machine learning approaches for handling big data, and current parallel programming models for high-performance computing and big data processing, such as MPI and MapReduce.
- Current and Emerging Trends [25%] - Evaluation of current and emerging trends underpinning parallel and distributed systems for high-performance computing and big data: paradigms and platforms; cloud computing.
- Features of Big Data [10%] - Feature extraction and dimensionality reduction approaches.
- Artificial Intelligence [10%] - Machine learning and AI approaches, and their algorithms, for handling big data (e.g. images, graphs, text).
- Models and Applications [50%] - Programming models and applications for big data and high-performance computing, including MPI, OpenMP, Hadoop/MapReduce and NoSQL, with case studies.
- Professional Context [5%] - Professional, legal, ethical, social and cultural issues in high performance computing of big data.
Data Science Project
Each individual project will investigate a challenging but constrained Data Science problem.
The project will involve carrying out an end-to-end data science pipeline, including data collection; formulation of one or more questions to be asked about the data; typical preprocessing steps (e.g. cleaning, transforming and exploring); analysis; application of appropriate machine learning methods; modelling; visualisation; interpretation; and assessment of whether the models are meaningful and relevant to the field. Students will be required to demonstrate an understanding of experimental design, including validation and evaluation of models using appropriate statistical methods.
The project will involve practical experimentation on live data and may also involve practical implementation. It will give each student the opportunity to develop independent practical and analytical skills using proven methods and techniques.
Students will be able to produce well-substantiated and validated results within the time available. They will be able to demonstrate their investigative ability but will not necessarily produce a complete piece of research or make a significant contribution to knowledge. They will, however, be expected to examine their work critically and to place it in context.
Each student will be allocated a Project Supervisor from the academic staff. The main function of the Project Supervisor is to offer general advice and guidance to the student. Students will submit a proposal to their Project Supervisor, which will be scrutinised by at least one other academic member of staff.
Supporting seminars (5%), commencing before the start of the project, will be used to reinforce students' knowledge of research methods and to discuss personal organisation and time management. Students need support to develop the communication and other generic skills they require to become effective researchers, to enhance their employability and to assist their career progress after completing their degree. These skills may be present at the outset or developed during the project. The need for dissertations to address, as appropriate, legal, ethical, professional and social issues will be emphasised.
Students on the MSc will also attend a seminar which will be dedicated to examining current professional, legal, ethical, social and cultural issues in data science.
As the project is the most distinctive part of postgraduate study, there will be a strong element of personal development planning, both in the support seminars and in the supervision sessions with individual Project Supervisors, as students are invited to reflect on their progress during the project's execution and write-up.
At the end of the project, the student will be required to submit a project dissertation and undertake a viva examination, presenting the project work to the Project Supervisor and a designated Second Reader allocated by the Project Tutor.
Where appropriate, the project may be undertaken with an industry partner (e.g. an existing employer or an internship), with system creation or experimentation being work-based.