1 Identify eye movement – single wave file
2 Extracting signals from multiple files
3 Streaming classifier
4 Data visualization
5 Preparation for Week 2 lecture

DATA3888_2022 Lab: Week 2

Brain box data analytics – Sample Code

Preparation and assumed knowledge

Have a look at the lab and make sure you have all packages installed on your computer.
Be able to write basic cross-validation code in R.
Work through Sections 1.1 and 1.2 of this lab; see Lab 1 and the Week 1 lecture for suggestions.

Aims

To explore and process the .wav data and produce graphical summaries.
Generate training data for left and right eye movement.
Become familiar with the while loop and use it to generate statistics for streaming data.
Please submit 3.2 for formative feedback.
Pre-reading and Homework: become familiar with visualizing data on the world map.

Note: This is a very long lab, and some of the material here provides hints for the interdisciplinary project.

R packages you need for this lab:

## Sections 1 - 3
library(tidyverse)
library(tuneR)
library(devtools)
library(ggplot2)
library(tsfeatures)

## Section 4
library(maps)

## Section 5
library(BiocManager) ## install.packages("BiocManager")
library(GEOquery) ## BiocManager::install("GEOquery")
library(Biobase)

1 Identify eye movement – single wave file

Two physics tutors have generated a series of data from the Spiker box. These signals are captured and saved in WAV format. We will begin our exploration by looking at the data contained in the file LRL_L3.wav.

1.1 Reading the data
1.2 Simple visualization
1.3 Generating training data
1.4 Classify eye movement
1.5 Function

With the code below, we read the .wav file using the readWave function from the tuneR package. We use two features specific to S4 objects to retrieve information:
– The function slotNames returns the names of the components (termed slots) of a given object.
– To extract a specific slot of an S4 object, use @, as in waveSeq@left.
If you’re curious, you can learn about the S4 object framework at http://adv-r.had.co.nz/S4.html; however, it is not needed for the lab.
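The two S4 tools mentioned above can be tried without any data file. The sketch below defines a toy class, ToyWave, that imitates a few slots of a mono Wave object; the class name and the values in it are assumptions made purely for illustration and are not part of tuneR.

```r
library(methods)  # S4 machinery; attached by default in interactive sessions

# A toy S4 class imitating three slots of a mono Wave object (illustrative only)
setClass("ToyWave",
         representation(left = "numeric", samp.rate = "numeric", bit = "numeric"))

toy <- new("ToyWave", left = c(0, 12, -7, 3), samp.rate = 10000, bit = 16)

slotNames(toy)   # names of the slots
toy@left         # extract one slot with @
toy@samp.rate    # samples recorded per second
```

The same two calls, slotNames and @, apply to the object returned by readWave.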

[a] What is the dimension of this .wav file?
[b] What kind of information do we have?
[c] How many measurements did we observe per second?

Suggestions

## 
## Wave Object
##  Number of Samples:      72688
##  Duration (seconds):     7.27
##  Samplingrate (Hertz):   10000
##  Channels (Mono/Stereo): Mono
##  PCM (integer format):   TRUE
##  Bit (8/16/24/32/64):    16

## [1] "left"      "right"     "stereo"    "samp.rate" "bit"       "pcm"

There are 72688 values and we have 10000 observations per second.
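As a quick check, the duration in the summary follows from these two numbers. The variable names below are illustrative; the values are copied from the output above.

```r
n_samples <- 72688   # Number of Samples, from the Wave summary
samp_rate <- 10000   # Samplingrate (Hertz), from the Wave summary

# Duration in seconds = number of samples / samples per second
duration_sec <- n_samples / samp_rate
round(duration_sec, 2)

# A time axis in seconds, one value per sample, useful for plotting the signal
time_sec <- seq_len(n_samples) / samp_rate
range(time_sec)
```

The rounded duration agrees with the 7.27 seconds reported in the summary.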

2 Extracting signals from multiple files

In Section 1.5, we wrote a function to identify an eye-movement event. Using this function, we can create training data for left versus right eye movements. We will do this in four steps:

Step 1: Reading in multiple files.
Step 2: Identify events and extract signals for all files.
Step 3: Filter low quality data.
Step 4: Extract signals for all 24 files.

2.1 Data input
2.2 Extract signals (extension)
2.3 Simple classifier
2.4 Feature extraction
2.5 Basic classifier – KNN
2.6 Classifier – Extension (for interdisciplinary project)

[a] Read all the short sequence data generated by the Physics tutor Louis and store it as a list object. The function lapply applies a function to each element of a list and returns a list of the same length.

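library(tuneR)  # provides readWave()

## Folder holding the short sequences; replace with your own data directory
dir_short <- "data/Spiker_box_Louis/Short"
all_files_short <- list.files(dir_short)

## Read each file into a list named by file name
wave_file_short <- list()
for (i in all_files_short) {
  wave_file_short[[i]] <- readWave(file.path(dir_short, i))
}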

Suggestions

We assume the data are in the folder data/Spiker_box_Louis/Short.
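The question mentions lapply, so here is a minimal sketch of the same read-everything pattern written with it. The helper read_all is a hypothetical name, and the demonstration uses small text files with readLines as a stand-in reader so that it runs without the Spiker box data.

```r
# Read every file in `dir` with `reader`; return a list named by file name.
# `read_all` is an illustrative helper, not part of tuneR.
read_all <- function(dir, reader) {
  files <- list.files(dir)
  out <- lapply(file.path(dir, files), reader)
  names(out) <- files
  out
}

# Demonstration on a temporary folder holding two small text files
demo_dir <- file.path(tempdir(), "lapply_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1", "2", "3"), file.path(demo_dir, "a.txt"))
writeLines(c("4", "5"), file.path(demo_dir, "b.txt"))

demo_list <- read_all(demo_dir, readLines)
names(demo_list)
sapply(demo_list, length)
```

With tuneR loaded, passing readWave as the reader and the Spiker box folder as dir should give a list of Wave objects, one per file.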

3 Streaming classifier

3.1 While loop practice
3.2 Moving window
3.3 Streaming classifier (Q1 in Assignment 1)

A while loop repeats a block of code (an expression) for as long as a condition holds. In R, the syntax of a while loop is

while (condition) {
  expression
}

As long as the condition is TRUE, the expression is evaluated. Once the condition is FALSE, control is transferred outside the loop.

[a] Write a procedure to print the first 10 even numbers using a while loop.

Suggestions

## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
## [1] 12
## [1] 14
## [1] 16
## [1] 18
## [1] 20
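One possible solution that produces the output above is sketched here. The function name print_first_even is illustrative; wrapping the loop in a function is a choice made so the printed values can also be returned and checked.

```r
# Print the first n even numbers with a while loop; return them invisibly.
print_first_even <- function(n = 10) {
  printed <- numeric(0)
  i <- 1
  while (i <= n) {        # stop once n numbers have been printed
    print(2 * i)
    printed <- c(printed, 2 * i)
    i <- i + 1            # without this update the loop would never end
  }
  invisible(printed)
}

evens <- print_first_even(10)
```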

[b] Create a while loop that prints out random standard normal numbers until one exceeds 1.

Suggestions

## [1] -0.05817514
## [1] -0.3886656
## [1] 0.04657794
## [1] 0.8024861
## [1] -0.01972808
## [1] 0.701923
## [1] 0.1444695
## [1] -0.7800232
## [1] -0.1068324
## [1] -1.44374
## [1] 0.6526983
## [1] 0.2598768
## [1] -0.808634
## [1] -1.857157
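A minimal sketch of one way to write this loop follows. Judging from the output above, the draw that exceeds 1 ends the loop without being printed, and the sketch behaves the same way. The function name and the seed are assumptions, so the printed numbers will differ from those shown.

```r
# Draw standard normal numbers, printing each one, until a draw exceeds
# `threshold`. The draw that exceeds the threshold is not printed.
draw_until_exceeds <- function(threshold = 1) {
  printed <- numeric(0)
  x <- rnorm(1)
  while (x <= threshold) {
    print(x)
    printed <- c(printed, x)
    x <- rnorm(1)
  }
  invisible(printed)
}

set.seed(3888)  # arbitrary seed, only to make the run reproducible
draws <- draw_until_exceeds(1)
```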

4 Data visualization

This week, we introduce displaying data on a map. The maps package includes a variety of different maps. In this part we use the R function map_data("world2") to generate a Pacific-centred world map, which places Australia near the centre of the map. We can then extract longitude and latitude coordinates and save them in an R object called world_map. The world_map object is a six-column data frame whose first two columns hold the boundary coordinates of each country.
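A minimal sketch of this step, assuming the ggplot2 and maps packages are installed; the fill and outline colours and the object name p_world are arbitrary choices.

```r
library(ggplot2)  # provides map_data() and the plotting functions
library(maps)     # provides the "world2" map database

# Boundary coordinates of the Pacific-centred world map as a data frame
world_map <- map_data("world2")
names(world_map)

# Draw each country outline; `group` keeps separate polygons apart
p_world <- ggplot(world_map, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey85", colour = "white") +
  theme_bw()
p_world
```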

R-tip: Rename the two countries, UK and USA, so that the region names in world_map match the location names in the covid data. Merging two datasets frequently requires quite a lot of data wrangling to match information between them. Here we use the function setdiff to identify names that appear in one dataset but not the other.
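The setdiff check can be sketched on toy vectors. The two vectors below are small stand-ins for the unique region names in world_map and the unique location names in the covid data; the spellings "United Kingdom" and "United States" are assumptions about how the covid data labels these countries.

```r
# Toy stand-ins for the country names in the two data sources
map_regions     <- c("Australia", "France", "UK", "USA")
covid_locations <- c("Australia", "France", "United Kingdom", "United States")

# Names present in the covid data but missing from the map
setdiff(covid_locations, map_regions)

# Recode the two map names so that both sources agree
lookup <- c(UK = "United Kingdom", USA = "United States")
needs_fix <- map_regions %in% names(lookup)
map_regions_fixed <- map_regions
map_regions_fixed[needs_fix] <- unname(lookup[map_regions[needs_fix]])

# After recoding, nothing is left unmatched
unmatched <- setdiff(covid_locations, map_regions_fixed)
unmatched
```

Running setdiff in both directions is worthwhile in practice, since each direction reveals a different set of mismatches.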

Visualise new_cases_smoothed_per_million on 2021-12-31 on the world map. The code below merges the world_map data with the covid_full data.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##    0.000    8.383  152.199  424.079  800.108 4180.632     8448
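The merge can be illustrated on toy data with base R. The coordinates and case numbers below are made up, and the column names location and date are assumptions about covid_full. Regions without a matching record receive NA, which is why a summary like the one above reports NA's.

```r
# Toy map: two regions, three boundary points each (made-up coordinates)
toy_map <- data.frame(
  long   = c(113, 153, 153, 166, 178, 178),
  lat    = c(-22, -25, -38, -46, -37, -41),
  group  = c(1, 1, 1, 2, 2, 2),
  order  = 1:6,
  region = rep(c("Australia", "New Zealand"), each = 3)
)

# Toy covid data: one location, two days (made-up values)
toy_covid <- data.frame(
  location = c("Australia", "Australia"),
  date     = c("2021-12-30", "2021-12-31"),
  new_cases_smoothed_per_million = c(400.5, 500.25)
)

# Keep one day, then attach it to every boundary point of the matching region
covid_day <- toy_covid[toy_covid$date == "2021-12-31", ]
merged <- merge(toy_map, covid_day,
                by.x = "region", by.y = "location", all.x = TRUE)

# merge() may reorder rows; restore the drawing order of the polygons
merged <- merged[order(merged$order), ]
summary(merged$new_cases_smoothed_per_million)
```

Setting all.x = TRUE keeps every map row, so countries without data can still be drawn, typically in a neutral colour.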

5 Preparation for Week 2 lecture

In the Week 2 lecture, we will examine a biomedical data case study. To prepare for the lecture and the course, make sure you have downloaded the data from the Canvas site and loaded it into R.

The second case study looks at data aiming to predict kidney transplant outcome. Kidney transplant remains the only treatment for patients with end-stage kidney disease. However, despite advances in medical fields, the proportion of patients who develop graft rejection after kidney transplant remains high. Understanding the characteristics that distinguish stable patients from patients who experienced rejection is one way to learn more about kidney transplant rejection. In this exercise we will visualise public datasets on kidney transplant patients.

We’ll start our investigation by looking at one dataset downloaded from GEO. The NCBI Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo/ is a large public repository of microarray and sequencing-based datasets. You can download a dataset by looking up its GSE ID in the database. Information about each dataset, for example the paper in which the data were published and the experimental protocol, can also be found in the database.

We will first focus on how to analyse the data from GSE46474. This dataset contains a group of “discovery” patients and a group of “validation” patients. You can think of them as the “training” and “testing” groups. To illustrate visualisation, we will work with the “discovery” patients. The file GSE46474_series_matrix.txt.gz can also be accessed via Canvas.

5.1 Import data
5.2 ExpressionSet
5.3 Outcome
5.4 Quick visualisation

Depending on your internet connection, downloading the large file from GEO may slow down or interrupt the generation of your HTML file from R Markdown. We’ve therefore provided the data as an RData file called GSE46474.RData. However, we encourage you to experiment with the following code to see how to import data directly using the GEOquery package.

library(GEOquery)
library(Biobase)
## import directly from the net 
gse <- getGEO("GSE46474")
gse <- gse$GSE46474_series_matrix.txt.gz

## import directly from a locally saved file
Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 2) # in case of weird error about VROOM_CONNECTION_SIZE
gse <- getGEO(filename="data/GSE46474_series_matrix.txt.gz")

## [1] "ExpressionSet"
## attr(,"package")
## [1] "Biobase"

## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54613 features, 40 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM1130812 GSM1130813 ... GSM1130851 (40 total)
##   varLabels: title geo_accession ... tissue:ch1 (46 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1007_s_at 1053_at ... NA.25224 (54613 total)
##   fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
##   pubMedIds: 25387159 
## Annotation: GPL570
