PSY6422 Mini Project
1.0 Overview
This project produces a visualisation showing how the use of different neuroimaging modalities in human studies has changed over recent years. This page documents the steps taken to produce it. The required packages are listed in section 1.1 Packages. More information about the data is given in section 2.0 Data Origins. The research question is stated in section 3.0 Research Question, and the steps taken to prepare the data for plotting are described in section 4.0 Data Preparation. Section 5.0 Visualisation presents the final visualisation. Finally, conclusions from the visualisation, self-reflections and the next steps are discussed in section 6.0 Summary.
1.1 Packages
# Load packages
library(tidyverse)
library(here)
library(plotly)
1.2 Version Information
| Package | Version |
|---|---|
| R | R version 4.5.1 |
| Tidyverse | tidyverse_2.0.0 |
| Here | here_1.0.2 |
| Plotly | plotly_4.11.0 |
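A table like the one above can be regenerated rather than typed by hand. The following is a minimal sketch using base R's R.version.string and packageVersion(). It only reports packages that are actually installed, and the name `versions` is an illustrative choice rather than part of the original project.

```r
# Packages used in this project
pkgs <- c("tidyverse", "here", "plotly")

# Keep only the packages that are installed on this machine
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Build a two-column table of names and version strings
versions <- data.frame(
  Package = c("R", installed),
  Version = c(
    R.version.string,
    vapply(installed, function(p) as.character(packageVersion(p)),
           character(1), USE.NAMES = FALSE)
  )
)

print(versions, row.names = FALSE)
```

Calling sessionInfo() gives a fuller record, including the operating system and all attached packages.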
2.0 Data Origins
The data used in this project was found in a crowdsourced list of ‘Open Psychological Datasets’, which includes the OpenNeuro Dataset of Metadata. OpenNeuro is a data archive that shares public datasets for others to use in their research. The meta-dataset used in this project lists neuroimaging studies that used a range of imaging modalities. The entries come from researchers uploading their data to the archive. The dataset was downloaded on the 10th of October 2025 and therefore may not contain everything that is in the live metadata online.
2.1 Load Data
# Load Open Neuro Metadata dataset
neuroscience_raw <- read_csv(here("datasets", "neuroscience_metadata.csv"))
2.2 Raw Data Summary and Variables
# Show the first ten rows of the raw data
head(neuroscience_raw, 10)
# A tibble: 10 × 48
accession_number dataset_url dataset_name made_public most_recent_snapshot
<chr> <chr> <chr> <chr> <chr>
1 ds000001 https://openn… ds001 10/12/2016 5/14/2020
2 ds000002 https://openn… Classificat… 10/12/2016 7/14/2018
3 ds000003 https://openn… Rhyme judgm… 10/13/2016 5/14/2020
4 ds000005 https://openn… ds000005 10/13/2016 7/14/2018
5 ds000006 https://openn… ds000006 10/13/2016 7/14/2018
6 ds000007 https://openn… ds000007 10/13/2016 7/14/2018
7 ds000008 https://openn… ds000008 10/13/2016 7/14/2018
8 ds000009 https://openn… ds000009 12/15/2016 7/14/2018
9 ds000011 https://openn… Classificat… 10/13/2016 12/14/2022
10 ds000017 https://openn… ds000017 11/7/2016 7/14/2018
# ℹ 43 more variables: num_subjects <dbl>, modalities <chr>, dx_status <chr>,
# ages <chr>, tasks <chr>, num_trials <dbl>, study_design <chr>,
# domain_studied <chr>, longitudinal <chr>, processed_data <chr>,
# species <chr>, nondefaced_consent <chr>, affirmed_defaced <chr>,
# doi_of_papers_from_source_data_lab <chr>,
# doi_of_paper_published_using_openneuro_dataset <chr>, senior_author <chr>,
# size_gb <dbl>, ...23 <lgl>, ...24 <lgl>, ...25 <lgl>, ...26 <lgl>, …
# Provide a summary of the dataset
summary(neuroscience_raw)
accession_number dataset_url dataset_name made_public
Length:2502 Length:2502 Length:2502 Length:2502
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
most_recent_snapshot num_subjects modalities dx_status
Length:2502 Min. : 0.00 Length:2502 Length:2502
Class :character 1st Qu.: 12.00 Class :character Class :character
Mode :character Median : 25.00 Mode :character Mode :character
Mean : 45.27
3rd Qu.: 47.00
Max. :2951.00
NA's :1014
ages tasks num_trials study_design
Length:2502 Length:2502 Min. : -1.0 Length:2502
Class :character Class :character 1st Qu.: 9.0 Class :character
Mode :character Mode :character Median : 72.0 Mode :character
Mean : 807.5
3rd Qu.: 300.0
Max. :26760.0
NA's :2297
domain_studied longitudinal processed_data species
Length:2502 Length:2502 Length:2502 Length:2502
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
nondefaced_consent affirmed_defaced doi_of_papers_from_source_data_lab
Length:2502 Length:2502 Length:2502
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
doi_of_paper_published_using_openneuro_dataset senior_author
Length:2502 Length:2502
Class :character Class :character
Mode :character Mode :character
size_gb ...23 ...24 ...25 ...26
Min. : 0.000 Mode:logical Mode:logical Mode:logical Mode:logical
1st Qu.: 3.188 NA's:2502 NA's:2502 NA's:2502 NA's:2502
Median : 12.180
Mean : 56.532
3rd Qu.: 45.005
Max. :8925.160
NA's :1024
...27 ...28 ...29 ...30 ...31
Mode:logical Mode:logical Mode:logical Mode:logical Mode:logical
NA's:2502 NA's:2502 NA's:2502 NA's:2502 NA's:2502
...32 ...33 ...34 ...35 ...36
Mode:logical Mode:logical Mode:logical Mode:logical Mode:logical
NA's:2502 NA's:2502 NA's:2502 NA's:2502 NA's:2502
...37 ...38 ...39 ...40 ...41
Mode:logical Mode:logical Mode:logical Mode:logical Mode:logical
NA's:2502 NA's:2502 NA's:2502 NA's:2502 NA's:2502
...42 ...43 ...44 ...45 ...46
Mode:logical Mode:logical Mode:logical Mode:logical Mode:logical
NA's:2502 NA's:2502 NA's:2502 NA's:2502 NA's:2502
...47 ...48
Mode:logical Mode:logical
NA's:2502 NA's:2502
As the first ten rows of the raw data show, there are 48 columns. Because of this, not all variables are displayed by head(), so I used summary() to gain information about the hidden variables. The summary shows that the first 22 of the 48 columns are named variables, while the remaining 26 columns (...23 to ...48) contain only NAs. Of these 22 variables, 3 are retained for the analysis conducted in this project. These variables are as follows:
| Variable Name | Meaning |
|---|---|
| made_public | the date the dataset was made public |
| modalities | the imaging modalities used in the study |
| species | the species of the subjects used in the study |
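The empty columns identified above can also be found programmatically instead of by reading the summary() output. The sketch below is a minimal illustration on a toy tibble, not the project data; `toy_raw`, `empty_1` and `empty_2` are hypothetical names standing in for the raw dataset and its unnamed columns.

```r
library(dplyr)

# Toy stand-in for the raw data: two real variables and two empty columns
toy_raw <- tibble(
  made_public = c("10/12/2016", "5/14/2020"),
  modalities  = c("eeg", "mri_functional, mri"),
  empty_1     = c(NA, NA),
  empty_2     = c(NA, NA)
)

# Names of columns that contain no non-missing values
empty_cols <- names(toy_raw)[colSums(!is.na(toy_raw)) == 0]

# Drop every column that is entirely NA
toy_clean <- toy_raw %>%
  select(where(~ !all(is.na(.x))))

print(empty_cols)
print(names(toy_clean))
```

In this project the same result is reached by selecting the three required variables directly, so the sketch is mainly useful as a check on the column counts.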
3.0 Research Question
The aim of this project is to visualise “How has the use of neuroimaging techniques in human studies changed over time?”. The visualisation produced to answer this research question shows the count of individual (e.g., EEG, fMRI and NIRS) and multimodal imaging techniques per year from 2018 to 2025.
4.0 Data Preparation
4.1 Data Cleaning
First, the unneeded variables and empty columns were removed from the dataset so that only the variables required for the visualisation remained. In addition, two new columns were created: ‘ID’ and ‘date_published’. The ‘ID’ column makes the data easier to read, while the ‘date_published’ column holds the publication year, obtained by converting the ‘made_public’ character values to dates and keeping only the year. The ‘made_public’ column and the intermediate ‘date’ column were then removed to avoid confusion with the new ‘date_published’ column.
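The date conversion described above can be checked on a few values before it is applied to the whole dataset. This is a minimal base R sketch using three dates taken from the ‘made_public’ column shown in section 2.2; the variable names are only for illustration.

```r
# Example values in month/day/year format, as stored in 'made_public'
made_public <- c("10/12/2016", "5/14/2020", "11/7/2016")

# Parse the character strings as dates
date <- as.Date(made_public, format = "%m/%d/%Y")

# Keep only the year; note that format() returns a character vector
date_published <- format(date, "%Y")

# Strings that do not match the format are parsed as NA, so this check
# flags any value that failed to convert
stopifnot(!anyNA(date))

print(date_published)
```

Because the year is stored as text, it is treated as a discrete value when plotted.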
## Clean data -----------------------------------------------
### Make dataset with just the variables needed for the -
### project
neuroscience_data <- neuroscience_raw %>%
  select(made_public, modalities, species) %>% # Only retain these columns
  drop_na() %>% # Remove all rows containing NAs
  mutate(ID = row_number(), .before = 1, # Create an ID column to the left of the dataset
         date = as.Date(made_public, "%m/%d/%Y"), # Create a date column
         date_published = format(date, "%Y")) %>% # Keep only the year
  select(ID, date_published, modalities, species) # Remove the made_public and date columns
Next, the ‘species’ column was filtered to include only human studies, as the research question concerns neuroimaging techniques in humans.
### Remove subjects that aren't human
human_data <- neuroscience_data %>%
  filter(species == "Human")
### Sanity Check: check that there aren't any non-human species remaining
for(i in human_data$species){
  if(i != "Human"){
    print("Test Failed") # If test is passed nothing should be printed
  }
}
4.2 Renaming ‘modalities’ values
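The values in the modalities column were not suitable for this project because they were inconsistent strings. To prepare the data for visualisation, the values were recoded into 6 categories: ‘multimodal’, ‘eeg’, ‘meg’, ‘dMRI’, ‘fMRI’, and ‘sMRI’. Any value not listed in the recoding is left unchanged by the .default argument, which appears to be how the ‘nirs’ category used in section 4.3 is retained.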
### View the number of modalities in the dataset and save as a dataset
modality_count <- human_data %>%
group_by(modalities) %>%
count()
### Remove unspecified groups in modalities variable and rename with imaging categories
modality_data <- human_data %>%
filter(modalities != "beh") %>%
mutate(modalities = case_match(modalities,
c("bold, events, t1w",
"eeg, nirs",
"mri_diffusion, mri_functional, mri",
"mri_diffusion, mri_functional, mri_structural, eeg, mri",
"mri_diffusion, mri_functional, mri_structural, mri",
"mri_diffusion, mri_structural, eeg, mri",
"mri_diffusion, mri_structural, mri",
"mri_diffusion, mri_structural, mri_functional, mri",
"mri_diffusion, mri_structural, mri_functional, mri, pet",
"mri_diffusion, mri_structural, mri_functional, mri_perfusion, mri",
"mri_functional, mri, eeg",
"mri_functional, mri_diffusion, mri_structural, eeg, mri, beh",
"mri_functional, mri_diffusion, mri_structural, meg, mri",
"mri_functional, mri_diffusion, mri_structural, mri",
"mri_functional, mri_perfusion, mri_structural, mri",
"mri_functional, mri_structural, eeg, mri",
"mri_functional, mri_structural, mri",
"mri_functional, mri_structural, mri, beh",
"mri_functional, mri_structural, mri, eeg",
"mri_functional, mri_structural, mri, eeg, beh",
"mri_functional, mri_structural, mri, ieeg",
"mri_functional, mri_structural, mri_diffusion, mri",
"mri_functional, mri_structural, pet_dynamic, mri, pet",
"mri_functional, pet_static, mri_structural, pet_dynamic, mri, pet",
"mri_structural, eeg, mri",
"mri_structural, ieeg, mri",
"mri_structural, meg, mri",
"mri_structural, meg, mri, beh",
"mri_structural, mri, pet",
"mri_structural, mri_diffusion, mri",
"mri_structural, mri_diffusion, mri_functional, mri",
"mri_structural, mri_diffusion, mri_functional, mri, eeg",
"mri_structural, mri_functional, ieeg, mri",
"mri_structural, mri_functional, mri",
"mri_structural, mri_functional, mri, eeg",
"mri_structural, pet, mri",
"pet_dynamic, mri_functional, mri_structural, mri, pet",
"t1w, bold, events",
"t1w, bold, events, fieldmap",
"t1w, channels, eeg, events, bold") ~ "multimodal",
c("channels, eeg, electrodes, events",
"channels, eeg, events",
"eeg, beh",
"ieeg",
"ieeg, eeg") ~ "eeg",
"meg, beh" ~ "meg",
"mri_diffusion, mri" ~ "dMRI",
"mri_functional, mri" ~ "fMRI",
c("mri_structural, mri",
"t1w") ~ "sMRI",
.default = modalities))
### Sanity check: Ensure the total number of observations is 846
### and that the only categories are "multimodal", "eeg", "meg", "dMRI", "fMRI", and "sMRI"
modality_count <- modality_data %>%
group_by(modalities) %>%
count()
sum(modality_count$n)
[1] 846
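The second half of the sanity check above, that only the expected categories remain, is described in the comment but not tested in code. The following is a minimal sketch of how such a check might look, using a toy vector rather than the project data. The names `raw_values` and `expected` are hypothetical, the value "beh, physio" is invented to show a failure, and ‘nirs’ is included in the expected set on the assumption that it passes through .default unchanged.

```r
library(dplyr)

# Toy modality strings; "beh, physio" is a made-up value that no rule handles
raw_values <- c("mri_functional, mri", "eeg, nirs", "ieeg", "nirs", "beh, physio")

# Recode a few values; anything not listed is kept as-is via .default
recoded <- case_match(
  raw_values,
  "eeg, nirs"           ~ "multimodal",
  "ieeg"                ~ "eeg",
  "mri_functional, mri" ~ "fMRI",
  .default = raw_values
)

# Categories that are allowed to appear after recoding (assumed set)
expected <- c("multimodal", "eeg", "meg", "nirs", "dMRI", "fMRI", "sMRI")

# Any category outside the allowed set indicates a value that was missed
unexpected <- setdiff(unique(recoded), expected)
print(unexpected)
```

An empty result from setdiff() would indicate that every value has been mapped to a known category.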
4.3 Reshaping the data
Using the clean ‘modality_data’, the dataset was reshaped to group the data by modality and calculate the count of each modality per year.
## Reshaping the data -------------------------------------
### Calculate the count for each modality per year
neuroimaging_overtime <- modality_data %>%
group_by(modalities) %>%
count(date_published)
### Sanity check: Ensure the total of column 'n' is 846
sum(neuroimaging_overtime$n)
[1] 846
### View the first 10 rows of the data
head(neuroimaging_overtime, 10)
# A tibble: 10 × 3
# Groups: modalities [3]
modalities date_published n
<chr> <chr> <int>
1 dMRI 2024 2
2 dMRI 2025 1
3 eeg 2019 2
4 eeg 2020 12
5 eeg 2021 22
6 eeg 2022 35
7 eeg 2023 55
8 eeg 2024 56
9 eeg 2025 46
10 fMRI 2020 1
Viewing the neuroimaging_overtime dataset showed that some of the imaging modalities did not have any data for certain years. This would make the whole dataset difficult to visualise, therefore it was decided to assign a value of 0 to the missing years for each modality. A custom function was created to do this.
# Create data frame for the missing data
missing_data <- function(modality, date, count){
  data.frame(
    modalities = modality,
    date_published = date,
    n = count
  )
}
### Some modalities don't have data for every year
### To ensure this is reflected in the plot, add rows for these cases where n = 0
### First make a new dataframe with the missing data:
### dMRI is missing 2018, 2019, 2020, 2021, 2022, and 2023
dMRI_rows <- missing_data("dMRI", c("2018", "2019", "2020", "2021", "2022", "2023"), 0)
### eeg is missing 2018
eeg_rows <- missing_data("eeg", "2018", 0)
### fMRI is missing 2018, 2019, and 2022
fMRI_rows <- missing_data("fMRI", c("2018", "2019", "2022"), 0)
### meg is missing 2018 and 2019
meg_rows <- missing_data("meg", c("2018", "2019"), 0)
### nirs is missing 2018, 2019, 2020, 2021, and 2022
nirs_rows <- missing_data("nirs", c("2018", "2019", "2020", "2021", "2022"), 0)
### sMRI is missing 2018
sMRI_rows <- missing_data("sMRI", "2018", 0)
### Join the missing data dataframes
missing_modality_years <- rbind(dMRI_rows, eeg_rows, fMRI_rows, meg_rows, nirs_rows, sMRI_rows)
### Join missing_modality_years and neuroimaging_overtime
neuroimaging_final <- rbind(neuroimaging_overtime, missing_modality_years)
### Order the dataframe by modality and date_published
neuroimaging_final <- neuroimaging_final[with(neuroimaging_final, order(modalities, date_published)), ]
At this point the data was ready for visualisation. Below is a preview of the processed data.
### Preview the final dataset
head(neuroimaging_final, 10)
# A tibble: 10 × 3
# Groups: modalities [2]
modalities date_published n
<chr> <chr> <dbl>
1 dMRI 2018 0
2 dMRI 2019 0
3 dMRI 2020 0
4 dMRI 2021 0
5 dMRI 2022 0
6 dMRI 2023 0
7 dMRI 2024 2
8 dMRI 2025 1
9 eeg 2018 0
10 eeg 2019 2
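As an alternative to listing the missing years by hand, the zero rows could be generated automatically. The sketch below uses tidyr's complete() on a small toy table, not the project data; it is one possible approach rather than the method used in this project, and `toy_counts` and `toy_filled` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Toy counts: dMRI has no row for 2023
toy_counts <- tibble(
  modalities     = c("dMRI", "dMRI", "eeg", "eeg", "eeg"),
  date_published = c("2024", "2025", "2023", "2024", "2025"),
  n              = c(2L, 1L, 55L, 56L, 46L)
)

# Add a row for every modality-year combination, filling missing counts with 0
# (a grouped data frame would need ungroup() first, since complete()
# works within groups)
toy_filled <- toy_counts %>%
  complete(modalities,
           date_published = as.character(2023:2025),
           fill = list(n = 0L)) %>%
  arrange(modalities, date_published)

print(toy_filled)
```

This avoids having to update the list of missing years if the dataset is downloaded again at a later date.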
5.0 Visualisation
5.1 Draft Plot
An initial basic line graph was plotted to get an idea of what the data looked like as a visualisation.
# Create basic plot to understand what the data looks like plotted
draft_plot <- ggplot(neuroimaging_final, aes(x = date_published, y = n, group = modalities)) +
geom_line()
print(draft_plot)
Layers were added to this plot to customise the colour of the lines to represent each modality, change the size of the font for the labels and change the colours of the grid lines. To do this a custom theme was created and a colour palette was used.
# Make a custom theme
theme_neuroimaging = theme(
  plot.title = element_text(size = 12, hjust = 0.5),
  axis.title = element_text(size = 10),
  legend.position = "right",
  legend.title = element_text(size = 10),
  panel.background = element_rect("gray100"),
  panel.grid.major = element_line(colour = "gray87"),
  axis.line = element_line(colour = "gray10")
)
## Adding layers -----------------------------------------
### Add geom_point so that each plot point stands out
### Label the axis and legend and provide a title
### Set the colour palette and add a custom theme
draft_plot <- ggplot(neuroimaging_final, mapping = aes(x = date_published, y = n)) +
  geom_line(aes(group = modalities, text = paste("Modality:", modalities), colour = as.factor(modalities))) +
  geom_point(aes(group = modalities, text = paste("Modality:", modalities), colour = as.factor(modalities))) +
  labs(x = "Year",
       y = "Frequency of modality use",
       title = "Frequency of different neuroimaging techniques used over time",
       colour = "Modality") +
  scale_colour_brewer(palette = "Set2") +
  theme_neuroimaging
Warning in geom_line(aes(group = modalities, text = paste("Modality:",
modalities), : Ignoring unknown aesthetics: text
Warning in geom_point(aes(group = modalities, text = paste("Modality:", :
Ignoring unknown aesthetics: text
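The two warnings above are expected rather than a sign of a problem: text is not an aesthetic that geom_line() or geom_point() understands, but plotly's ggplotly() reads it when building tooltips. The minimal sketch below shows this on toy data and wraps the plot construction in suppressWarnings() to keep the output clean. `toy`, `toy_plot` and `toy_widget` are hypothetical names, and silencing the warning is an optional choice, not something done in this project.

```r
library(ggplot2)
library(plotly)

# Toy counts for two modalities over two years
toy <- data.frame(
  date_published = c("2023", "2024", "2023", "2024"),
  n              = c(55, 56, 0, 2),
  modalities     = c("eeg", "eeg", "dMRI", "dMRI")
)

# The 'text' aesthetic is ignored by ggplot2 itself; suppressWarnings()
# hides the resulting "Ignoring unknown aesthetics: text" message
toy_plot <- suppressWarnings(
  ggplot(toy, aes(x = date_published, y = n)) +
    geom_point(aes(colour = modalities,
                   text = paste("Modality:", modalities)))
)

# ggplotly() picks up the 'text' aesthetic and shows it in the tooltip
toy_widget <- ggplotly(toy_plot, tooltip = c("x", "y", "text"))
```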
### View the draft plot
print(draft_plot)
5.2 Final Interactive Visualisation
To make the final visualisation more intuitive, the ‘plotly’ package was used to introduce an interactive element to the plot. To make sure the interactive labels were reflective of the data, a tooltip was added.
## Make plot interactive -------------------------------
interactive_draft <- ggplotly(draft_plot, tooltip = c("x", "y", "text"))
## Save to environment as final visualisation ----------
neuroimaging_visualisation <- interactive_draft
The final visualisation is rendered below:
### View the final visualisation
neuroimaging_visualisation
6.0 Summary
6.1 Conclusions from the Final Visualisation
The aim of the visualisation in this project was to answer the research question “How has the use of neuroimaging techniques in human studies changed over time?”. From the final visualisation it can be concluded that multimodal imaging is the most frequently used imaging technique across all years in the analysis. EEG has increased in popularity since 2018, whereas the other imaging techniques were used much less often. This could be due to the costs associated with these imaging modalities. Furthermore, the data used in this project was open-access data uploaded by researchers, so it may not reflect the neuroimaging research field as a whole.
6.2 Self Reflection
This project has taught me how to create an interactive visualisation using the tidyverse and plotly packages. Additionally, I have learnt how to use GitHub as a version control system. This is useful for me to keep track of changes I have made in my projects, as well as for others who may wish to reproduce my projects in the future. For me, the most difficult part of this project was wrangling the data into the correct format for the visualisation, a process that involved a lot of trial and error.
6.3 Suggestions for the Future
To expand on the current visualisation, I would like to learn how to use gganimate to make the lines of the graph move across the plot from 2018 to 2025. This would make the graph more complex and would highlight the change in frequency of different modalities over time to a greater extent than the current visualisation. Additionally, it would be insightful to produce the same visualisation over a longer period of time. This would allow for a better view of how approaches to neuroimaging in research studies have changed over time.
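As a starting point for the animation idea, the following is a minimal, untested-on-the-project-data sketch using gganimate's transition_reveal(), which draws the lines progressively along a numeric variable. It assumes the gganimate package is installed and uses toy counts; `toy` and `animated_plot` are hypothetical names, and the year is converted to an integer because transition_reveal() needs a numeric or date variable.

```r
library(ggplot2)
library(gganimate)

# Toy counts for two modalities over three years
toy <- data.frame(
  modalities     = rep(c("eeg", "dMRI"), each = 3),
  date_published = rep(c("2023", "2024", "2025"), times = 2),
  n              = c(55, 56, 46, 0, 2, 1)
)

# Convert the character year to an integer so it can drive the animation
toy$year <- as.integer(toy$date_published)

# Lines and points are revealed gradually from the first year to the last
animated_plot <- ggplot(toy, aes(x = year, y = n,
                                 colour = modalities, group = modalities)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 2023:2025) +
  transition_reveal(year)

# Rendering is done with animate(animated_plot), which needs a renderer
# package such as gifski to be installed
```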
7.0 References
OpenNeuro. (2019). OpenNeuro Dataset Metadata. Google Docs. https://docs.google.com/spreadsheets/d/1rsVlKg0vBzkx7XUGK4joky9cM8umtkQRpJ2Y-5d6x7c/edit?gid=762232233#gid=762232233
Psychological Stimulus Sets and Datasets. (2019). Google Docs. https://docs.google.com/spreadsheets/d/1ejOJTNTL5ApCuGTUciV0REEEAqvhI2Rd2FCoj7afops/edit?gid=0#gid=0