- 1 Identify eye movement – single wave file
- 2 Extracting signals from multiple files
- 3 Streaming classifier
- 4 Data visualization
- 5 Preparation for Week 2 lecture
DATA3888_2022 Lab: Week 2
Brain box data analytics – Sample Code
Preparation and assumed knowledge
- Have a look at the Lab and make sure you have all packages installed on your computer.
- Be able to write basic cross-validation R code.
- Work through Sections 1.1 and 1.2 of this lab, see Lab 1 and Week 1 lecture for suggestions.
Aims
- To explore and process the .wav data and produce graphical summaries.
- Generate training data for left and right eye movement.
- Get familiar with the while loop and use it to generate statistics for streaming data.
- Please submit 3.2 for formative feedback.
- Pre-reading and Homework: become familiar with visualizing data on the world map.
Note: This is a very long lab, and some of the material here serves as hints for the interdisciplinary project.
R packages you need for this lab:
## Sections 1 - 3
library(tidyverse)
library(tuneR)
library(devtools)
library(ggplot2)
library(tsfeatures)
## Section 4
library(maps)
## Section 5
library(BiocManager) ## install.packages("BiocManager")
library(GEOquery) ## BiocManager::install("GEOquery")
library(Biobase)
1 Identify eye movement – single wave file
Two physics tutors have generated a series of recordings from the Spiker box. These signals are captured and saved in WAV format. We will begin our exploration by looking at the data contained in the file “LRL_L3.wav”.
- 1.1 Reading the data
- 1.2 Simple visualization
- 1.3 Generating training data
- 1.4 Classify eye movement
- 1.5 Function
With the code below, we’ll read the .wav file using the tuneR package’s readWave function. We use two features specific to S4 classes to retrieve information:
- The function slotNames returns the names of the components (termed slots) of a given object.
- To extract a specific slot of an S4 object, use @, as in waveSeq@left.
If you’re curious, you can learn all about the S4 object framework at http://adv-r.had.co.nz/S4.html; however, it’s not needed for the lab.
- [a] What is the dimension of this .wav file?
- [b] What kind of information do we have?
- [c] How many measurements did we observe per second?
Suggestions
##
## Wave Object
## Number of Samples: 72688
## Duration (seconds): 7.27
## Samplingrate (Hertz): 10000
## Channels (Mono/Stereo): Mono
## PCM (integer format): TRUE
## Bit (8/16/24/32/64): 16
## [1] "left" "right" "stereo" "samp.rate" "bit" "pcm"
There are 72688 values and we have 10000 observations per second.
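Both answers follow from the printed summary. As a small illustrative sketch (the variable names are hypothetical, and a simulated vector stands in for waveSeq@left), the duration and a time axis can be derived from the number of samples and the sampling rate:

```r
## Values taken from the Wave object summary printed above
n_samples <- 72688   # length(waveSeq@left) for the real file
samp_rate <- 10000   # waveSeq@samp.rate, in Hertz

## Duration in seconds = number of samples / samples per second
duration <- n_samples / samp_rate
print(round(duration, 2))

## Time stamp (in seconds) of every sample, useful as the x-axis of a plot
time_sec <- seq_len(n_samples) / samp_rate

## Simulated stand-in for the recorded signal (not real Spiker box data)
set.seed(1)
signal <- rnorm(n_samples)
df <- data.frame(time = time_sec, Y = signal)
print(dim(df))
```

With the real file, the signal column would come from waveSeq@left rather than from rnorm.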
2 Extracting signals from multiple files
In Section 1.5, we wrote a function to identify an eye-movement event. Using this function, we are in a position to create training data for left vs right eye movement. We will do this in 4 steps:
- Step 1: Reading in multiple files.
- Step 2: Identify event and extract signals for all files.
- Step 3: Filter low quality data.
- Step 4: Extract signals for all 24 files.
- 2.1 Data input
- 2.2 Extract signals (extension)
- 2.3 Simple classifier
- 2.4 Feature extraction
- 2.5 Basic classifier – KNN
- 2.6 Classifier – Extension (for interdisciplinary project)
- [a] Read all the short sequence data generated by the Physics tutor Louis and store it as a list object. The function lapply applies a function over a list and returns a list of the same length.
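dir_short <- "data/Spiker_box_Louis/Short" ## assumed location; replace with your own data directory
all_files_short <- list.files(dir_short)
wave_file_short <- list()
## read every file in the directory into a list named by file name
for (i in all_files_short) {
  wave_file_short[[i]] <- readWave(file.path(dir_short, i))
}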
Suggestions
We assume the data are in the folder data/Spiker_box_Louis/Short.
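The same kind of list can be built with lapply instead of a for loop. The sketch below is only an illustration under stated assumptions: to stay runnable without the lab data, it writes three small placeholder text files to a temporary directory and reads them with readLines. In the lab you would point the directory at your own data folder and swap the reader for tuneR's readWave. The names dir_demo, files_demo, read_fun and wave_list_demo are hypothetical.

```r
## Create a temporary directory with three small placeholder files
dir_demo <- file.path(tempdir(), "short_demo")
dir.create(dir_demo, showWarnings = FALSE)
for (nm in c("a.txt", "b.txt", "c.txt")) {
  writeLines(nm, file.path(dir_demo, nm))
}

## Stand-in reader; in the lab this would be tuneR::readWave
read_fun <- readLines

files_demo <- list.files(dir_demo)
## lapply returns a list with one element per file
wave_list_demo <- lapply(files_demo, function(f) read_fun(file.path(dir_demo, f)))
## Name the elements by file so they can be looked up later
names(wave_list_demo) <- files_demo

print(length(wave_list_demo))
print(names(wave_list_demo))
```

Naming the list elements mirrors what the for loop does implicitly when it indexes by file name.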
3 Streaming classifier
A while loop repeats a block of code (an expression) as long as a given condition holds. In R, the syntax of a while loop is
while (condition) {
  expression
}
As long as the condition is TRUE, the expression is evaluated. Once the condition is FALSE, control is transferred outside the loop.
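For streaming data, the condition is typically "a full window of data is still available". The following is a minimal sketch on a simulated signal, not the lab's classifier: the window length, the step size and the choice of the standard deviation as the summary statistic are all illustrative assumptions.

```r
set.seed(3888)
## Simulated signal: 1000 samples of noise with a burst in samples 401-500
signal <- rnorm(1000, sd = 1)
signal[401:500] <- signal[401:500] * 10

window_size <- 100  # samples per window (assumed)
step <- 100         # how far the window moves each iteration (assumed)

start <- 1
window_sd <- c()
## Keep going while a complete window still fits in the signal
while (start + window_size - 1 <= length(signal)) {
  window <- signal[start:(start + window_size - 1)]
  window_sd <- c(window_sd, sd(window))
  start <- start + step
}

print(round(window_sd, 2))
## Index of the window with the largest variability
print(which.max(window_sd))
```

A statistic that jumps in one window, as the standard deviation does for the burst here, is the kind of quantity a streaming rule could threshold on.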
- [a] Write a procedure to print the first 10 even numbers using while loop.
Suggestions
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
## [1] 12
## [1] 14
## [1] 16
## [1] 18
## [1] 20
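One possible solution that reproduces the output above is sketched here; other loop structures work equally well, and the variable names i and count are arbitrary.

```r
i <- 1      # current candidate number
count <- 0  # how many even numbers have been printed so far
while (count < 10) {
  if (i %% 2 == 0) {
    print(i)
    count <- count + 1
  }
  i <- i + 1
}
```

The loop stops on the count of printed numbers rather than on the value of i, so the same structure works for any "first n numbers with property X" task.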
- [b] Create a while loop that prints random standard normal numbers until one exceeds 1.
Suggestions
## [1] -0.05817514
## [1] -0.3886656
## [1] 0.04657794
## [1] 0.8024861
## [1] -0.01972808
## [1] 0.701923
## [1] 0.1444695
## [1] -0.7800232
## [1] -0.1068324
## [1] -1.44374
## [1] 0.6526983
## [1] 0.2598768
## [1] -0.808634
## [1] -1.857157
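A possible solution is sketched below. The output above contains no value greater than 1, which suggests that the draw ending the loop is not printed; this sketch follows that reading. The seed is an arbitrary assumption added for repeatability, so the printed numbers will differ from those above.

```r
set.seed(2022)  # arbitrary seed, only so that the run is repeatable
x <- rnorm(1)   # first draw from the standard normal distribution
n_printed <- 0  # number of values printed before the loop stops
while (x <= 1) {
  print(x)
  n_printed <- n_printed + 1
  x <- rnorm(1) # draw the next value; the loop ends once it exceeds 1
}
```

Because the stopping value is random, the number of iterations changes from run to run when no seed is set.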
4 Data visualization
This week, we introduce the graphical display of maps. The maps package includes a variety of different maps. In this part, the function map_data("world2") (from ggplot2, which draws on the maps package) is used to generate a Pacific-centred world map, which places Australia near the middle of the map. We can then extract the longitude and latitude coordinates and save them in an R object called world_map. The world_map object is a six-column data frame whose first two columns hold the coordinates of each country’s boundary.
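A minimal sketch of this step, assuming the ggplot2 and maps packages are installed. The fill and outline colours and the use of theme_void are illustrative choices, not the lab's settings.

```r
library(ggplot2)
library(maps)   # supplies the "world2" map database used by map_data()

## Pacific-centred world map as a data frame of boundary points
world_map <- map_data("world2")
str(world_map)

## Draw one polygon per group of boundary points
p <- ggplot(world_map, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "lightgray", colour = "white") +
  theme_void()
print(p)
```

The group aesthetic matters: without it, ggplot2 would connect the boundary points of different countries into a single polygon.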

R-tip: Rename the two countries, UK and USA, so that the region in world_map matches the location in the covid data. Merging two data sets frequently requires quite a lot of data wrangling to match information between them. Here we use the function setdiff to identify potential mismatches between the two data sets.
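The idea can be sketched on toy vectors. The two vectors below are small hypothetical stand-ins for the region column of world_map and the location column of the covid data, not the real data.

```r
## Toy stand-ins for the region and location columns (not the real data)
map_regions <- c("Australia", "UK", "USA", "France")
covid_locations <- c("Australia", "United Kingdom", "United States", "France")

## Names present in the covid data but missing from the map data
print(setdiff(covid_locations, map_regions))
## Names present in the map data but missing from the covid data
print(setdiff(map_regions, covid_locations))

## Rename the map regions so that they match the covid names
map_regions[map_regions == "UK"] <- "United Kingdom"
map_regions[map_regions == "USA"] <- "United States"

## After renaming there should be no mismatches left
print(setdiff(covid_locations, map_regions))
```

Note that setdiff is not symmetric, so it is worth checking both directions before merging.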
Visualise new_cases_smoothed_per_million on the day 2021-12-31 on the world map. The code below merges the world_map data with the covid_full data.
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's
##    0.000    8.383  152.199  424.079  800.108 4180.632     8448
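The merge, and the origin of the NA's in a summary like the one above, can be mimicked on toy data. Everything in this sketch is made up for illustration: the data frames world_map_toy and covid_toy, their values, and the region "Atlantis".

```r
## Toy map data: several boundary points per region (not the real world_map)
world_map_toy <- data.frame(
  long = c(113, 153, 153, 2, 3, 100),
  lat = c(-22, -25, -38, 47, 49, 15),
  region = c("Australia", "Australia", "Australia", "France", "France", "Atlantis")
)
## Toy covid data for a single day: one row per location (made-up values)
covid_toy <- data.frame(
  location = c("Australia", "France"),
  new_cases_smoothed_per_million = c(800.1, 152.2)
)

## Left join: keep every map point, attach the covid value where names match
merged_toy <- merge(world_map_toy, covid_toy,
                    by.x = "region", by.y = "location", all.x = TRUE)

## Regions without a match (here the made-up "Atlantis") show up as NA
print(summary(merged_toy$new_cases_smoothed_per_million))
```

By default merge sorts the result by the key column, so with real map data the rows would need to be put back in their original drawing order (the order column) before plotting polygons.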

5 Preparation for Week 2 lecture
In the Week 2 lecture, we will examine a biomedical data case study. To prepare as well as possible for the lecture and the course, make sure you have downloaded the data from the Canvas site and loaded it into R.

The second case study looks at data aiming to predict kidney transplant outcome. Kidney transplant remains the only treatment for patients with end-stage kidney disease. However, despite medical advances, the proportion of patients who develop graft rejection after a kidney transplant remains high. Understanding the characteristics that distinguish stable patients from patients who experienced rejection is one way to learn more about kidney transplant rejection. In this exercise we will visualise public datasets on kidney transplant patients.

We’ll start our investigation by looking at one dataset downloaded from GEO. The NCBI Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo/ is a large public repository of microarray and sequencing-based datasets. You can download a dataset by looking up its GSE ID in the database. Information about each dataset, for example the paper in which the data were published and the experimental protocol, can also be found in the database.

We will first focus on how to analyse the data from GSE46474. This dataset contains a group of “discovery” patients and a group of “validation” patients. You can think of them as the “training” and “testing” groups. To illustrate visualisation, we will work with the “discovery” patients. The file GSE46474_series_matrix.txt.gz can also be accessed via Canvas.
Depending on your internet connection, the large file download from GEO may affect the generation of your HTML file from R Markdown. We’ve therefore provided the data as an RData file called GSE46474.RData. However, we encourage you to experiment with the following code to see how to import data directly using the GEOquery package.
library(GEOquery)
library(Biobase)
## import directly from the net
gse <- getGEO("GSE46474")
gse <- gse$GSE46474_series_matrix.txt.gz
## import directly from a locally saved file
Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 2) # in case of weird error about VROOM_CONNECTION_SIZE
gse <- getGEO(filename="data/GSE46474_series_matrix.txt.gz")
## [1] "ExpressionSet"
## attr(,"package")
## [1] "Biobase"
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54613 features, 40 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: GSM1130812 GSM1130813 ... GSM1130851 (40 total)
## varLabels: title geo_accession ... tissue:ch1 (46 total)
## varMetadata: labelDescription
## featureData
## featureNames: 1007_s_at 1053_at ... NA.25224 (54613 total)
## fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
## fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## pubMedIds: 25387159
## Annotation: GPL570
© University of Sydney 2022