Data & Software Resources
Table of Contents
- ESIP Resources
- Managing data
- FAIR Data
- Data & Software Citation
- Documentation & MetaData
- Data Sharing & Reuse
- Find Research Datasets
- Data Management
- Data Management Plans
- Project open data
- Council of Data Facilities (CDF)
- COPDESS
- DOE (Department of Energy) resources
- NASA (National Aeronautics and Space Administration) resources
- NOAA (National Oceanic and Atmospheric Administration) resources
- Smithsonian Institution resources
- USGS (United States Geological Survey) resources
- Education-oriented resources
- Software/model-oriented resources
Websites
ESIP Resources
Cluster mission statement discussion, drafts
ESIP summer 2013 panel discussion
Project Open Data Implementation Guide
ESIP Collaboration Areas: Links to multiple committees that list relevant resources.
Discovery Cluster: Linking Datasets to the Applications that Use Them
Managing data
Easy Resources for Managing Your Ocean Data
Data Management
Figshare: topic of Data management and data science
Data Management Best Practices
Data Management Short Course for Scientists
Ten Simple Rules for Creating a Good Data Management Plan
Fundamentals in Data Management for Qualitative and Quantitative Arctic Research
ESIP Data Management Training Clearinghouse (DMTC)
Graduate-level data management training: a series of Google Docs
figshare search on “Data management”
figshare search on keyword: “Data management”
Data Management Plans
Ten Simple Rules for Creating a Good Data Management Plan
Video titled “10 Simple Rules for Creating a Data Management Plan”
General Information and Resources:
- Creating a Data Management Plan is a critical component of effective data management and a valuable research tool.
- The ESIP Data Management Training Clearinghouse may help researchers find training resources to supplement learning on topics like Data Management Plans.
Data help desks assist researchers with creating and understanding Data Management Plans by pointing them to relevant tools, funder guidance, and educational resources.
FAIR Data
FAIR: Findable, Accessible, Interoperable, and Reusable
Data & Software Citation
Data Citation Guidelines for Earth Science Data, Version 2
5 Tips to Citing Your Research Software and Improving Discovery: A resource from theAGU, libcce, and ShelleyStall.
Connect your research to your data, software, & institution: A Digital Presence Checklist from theAGU, ShelleyStall, and libcce.
Documentation & MetaData
What to document with data
https://managing-qualitative-data.org/modules/2/a/
https://dmeg.cessda.eu/Data-Management-Expert-Guide/2.-Organise-Document/Documentation-and-metadata
Data documentation: what to include
About metadata https://www.ncei.noaa.gov/resources/metadata#Intro
Metadata Best Practices and Data Publishing
Sites regarding adding metadata to data
DataCite - Offers guidelines for assigning metadata to research datasets, focusing on citation and discoverability. datacite.org
Dublin Core Metadata Initiative - Provides a standard set of vocabulary terms to describe web resources, including datasets. dublincore.org
The International Organization for Standardization (ISO) - Publishes standards like ISO 19115 for geographic information metadata. iso.org
The National Oceanic and Atmospheric Administration (NOAA) - Offers a comprehensive metadata guidance document for oceanographic data. noaa.gov
DataONE - Provides resources and best practices for managing and sharing scientific data, including metadata standards. dataone.org
The Ecological Metadata Language (EML) - A standard for metadata in ecological and environmental datasets, with clear documentation. eml.ecoinformatics.org
The Biological Data Consortium - Offers guidelines for metadata standards in biological data. biologicaldataconsortium.org
The UK Data Archive - Provides detailed guidance on preparing and documenting research data, including metadata. data-archive.ac.uk
The World Data System (WDS) - Offers metadata guidelines to enhance the discoverability of scientific data. worlddatasystem.org
Schema.org - A collaborative, community-driven initiative to create, maintain, and promote schemas for structured data on the internet. schema.org
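To illustrate the Schema.org entry above, the following Python sketch builds a minimal `Dataset` description as JSON-LD. The property names (`name`, `description`, `identifier`, `license`, `keywords`, `creator`) are Schema.org terms; the function name, dataset title, DOI, and author are hypothetical placeholders, and a given repository may well expect more fields than shown here.

```python
import json


def build_dataset_jsonld(name, description, doi, license_url, keywords, creator_name):
    """Return a minimal Schema.org Dataset description as a Python dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        "keywords": list(keywords),
        "creator": {"@type": "Person", "name": creator_name},
    }


# Hypothetical example values, for illustration only.
record = build_dataset_jsonld(
    name="Example stream temperature observations",
    description="Hourly water temperature from a hypothetical monitoring site.",
    doi="10.1234/example",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["hydrology", "temperature"],
    creator_name="A. Researcher",
)

# Serialize to JSON-LD text that could be embedded in a dataset landing page.
print(json.dumps(record, indent=2))
```

The printed JSON could be placed in a `<script type="application/ld+json">` element on a landing page so that search tools can read the description.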
Web apps and open-source tools designed to streamline the process of filling in metadata for datasets.
Metabase: An open-source business intelligence tool for querying and visualizing data. It is not a dedicated metadata editor, but administrators can annotate tables and fields with descriptions. metabase.com
DataONE: Offers an online tool called the DataONE Metadata Editor, which provides a user-friendly interface for entering and managing metadata compliant with various standards. dataone.org
EML Editor: A web-based tool specifically for creating metadata in the Ecological Metadata Language (EML) format. It guides users through filling out necessary fields. eml.cci.ku.edu
CKAN: An open-source data management system that includes features for metadata management, allowing users to create and edit metadata records through a web interface. ckan.org
OpenRefine: While primarily a tool for cleaning and transforming data, it can also assist in adding and managing metadata for datasets in a structured way. openrefine.org
Dublin Core Metadata Generator: A simple web app that helps create Dublin Core metadata records through a guided interface. dublincore.org
Geonetwork: An open-source catalog application for spatially referenced resources that provides metadata editing capabilities. geonetwork-opensource.org
RDMKit: The ELIXIR Research Data Management toolkit, an online guide to research data management practices, including metadata, organized by discipline and role. rdmkit.elixir-europe.org
Biodiversity Information Standards (TDWG): Offers metadata standards and tools that facilitate the documentation of biodiversity datasets, including easy-to-use templates. tdwg.org
Free web apps and tools that can help extract metadata
Sites with information and tools to extract metadata from data files and accompanying documents. Not all of these tools extract metadata automatically, but each provides a framework or interface that can make the process considerably easier.
Meta-Analysis Data Extraction Tool (MADET): This tool helps extract metadata from documents and datasets, focusing on specific attributes relevant to meta-analysis. made.ariadne-infrastructure.eu
Pandas and Python Scripts: While not a web app, using Python with libraries like Pandas can help automate the extraction of metadata from structured data files (like CSVs) and text documents. You can find many open-source scripts on GitHub for this purpose.
DataCite Metadata Schema: DataCite provides tools and examples to help automate metadata creation, especially when submitting datasets for DOI registration. While it’s more of a guideline, it offers templates that can facilitate extraction. datacite.org
Easy Data Upload (EDU): A web app that allows you to upload datasets and includes features for metadata extraction. It’s user-friendly and designed for various types of scientific data. edudata.org
RDA Metadata Standards Directory: The Research Data Alliance offers links to various tools and resources for metadata extraction. Some tools might include automated extraction features. rd-alliance.org
Frictionless Data: Provides tools for data packaging and can help extract metadata from structured datasets. While it’s more focused on data standards, it can assist in generating relevant metadata. frictionlessdata.io
CKAN: Although primarily a data management system, CKAN allows users to upload datasets and provides features for extracting and managing metadata. Some installations may offer automation through plugins. ckan.org
Geonetwork: This tool allows for metadata management and can extract geographic metadata from uploaded datasets, especially if they contain spatial references. geonetwork-opensource.org
DataHub: A platform that allows users to upload datasets and provides features for managing and extracting metadata. datahub.io
Biodiversity Information Standards (TDWG): Offers tools for extracting biodiversity-related metadata from datasets, which might be useful depending on your data type. tdwg.org
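As a rough illustration of the "Pandas and Python Scripts" entry above, the sketch below derives basic metadata (row count, column names, a guessed type, and a missing-value count) from CSV text. It uses Python's standard `csv` module rather than pandas so that it runs without extra dependencies; with pandas, `read_csv` and the `dtypes` attribute would give similar information. The function names, the type labels, and the sample data are assumptions made for this example.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    if not values:
        return "string"
    for caster, label in ((int, "integer"), (float, "number")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "string"


def summarize_csv(text):
    """Return basic metadata (row count and per-column details) for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = []
    for name in reader.fieldnames or []:
        # Treat empty strings and absent trailing cells as missing values.
        present = [row[name] for row in rows if row[name] not in ("", None)]
        columns.append({
            "name": name,
            "type": infer_type(present),
            "missing": len(rows) - len(present),
        })
    return {"row_count": len(rows), "columns": columns}


# Hypothetical sample data, for illustration only.
SAMPLE = "site,depth_m,temp_c\nA,1,12.5\nB,2,\nC,5,11.0\n"
summary = summarize_csv(SAMPLE)
for column in summary["columns"]:
    print(column)
```

A summary like this is only a starting point: units, collection methods, and variable definitions still have to be written by someone who knows the data.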
Data Sharing & Reuse
Reasons to share and archive data
What is a data sharing archive?
Data management: archiving and preservation
Find Research Datasets
Finding and reusing research datasets
Data Management
Easy Resources for Managing Marine Data
Article on data management, an overview
Project open data
Data management & governance: Playbooks, guidance, templates, and other resources to support the implementation of policy, the creation of data governance structures, and the day-to-day work of data management in the federal government. https://resources.data.gov/categories/data-management-governance/
Data tools: Software tools and complementary resources to support the hands-on work of data practitioners. https://resources.data.gov/categories/data-tools/
Data incubator: Crowdsourcing, prize competitions, public-private partnerships, and other resources to support the incubation of data projects. https://resources.data.gov/categories/data-incubator/
Skills development: Guides, tutorials, and personnel-related resources to build and develop a data-savvy workforce. https://resources.data.gov/categories/skills-development/
Guidance: Guidance from the Office of Management and Budget on data issues, and other memoranda applicable to federal agencies. https://resources.data.gov/categories/guidance/
Case studies & examples: Articles, use cases, and proof points describing projects undertaken by data managers and data practitioners across the federal government. https://resources.data.gov/categories/case-studies-examples/
Council of Data Facilities (CDF)
CDF is a federation of existing and emerging geoscience data facilities that serves as an effective cyberinfrastructure foundation for domain and interdisciplinary earth system science.
What we do: Connect data facilities, especially those that include Earth science datasets, and provide a collective voice for CDF members.
Council of Data Facilities (CDF)
Council of Data Facilities (CDF) Wiki
COPDESS
ESIP Coalition for Publishing Data in the Earth and Space Sciences (COPDESS)
DOE (Department of Energy) resources
ESS-DIVE (Environmental System Science Data Infrastructure for a Virtual Ecosystem)
KBase (Systems Biology Knowledgebase)
NASA (National Aeronautics and Space Administration) resources
Global Change Master Directory (GCMD)
NASA Earth Science Data Systems (ESDS)
NASA Transform to Open Science
NSIDC (National Snow and Ice Data Center)
NOAA (National Oceanic and Atmospheric Administration) resources
National Environmental Satellite, Data and Information Service (NESDIS)
NCEI (National Centers for Environmental Information) data submission and archival
NOAA World Data System for Paleoclimatology
Smithsonian Institution resources
National Museum of Natural History (NMNH) Biorepository
Smithsonian Environmental Research Center (SERC)
USGS (United States Geological Survey) resources
Acceptable Digital Repositories for USGS Scientific Publications and Data
Guidance, best practices, and tools for data management
Education-oriented resources
SERC (Science Education Resource Center)
Software/model-oriented resources
CIG (Computational Infrastructure for Geodynamics)
CSDMS (Community Surface Dynamics Modeling System)
Water-Organic-Rock-Microbe (WORM) portal
Videos
Data and Software Guidance
Talk regarding data & software guidance
Videos from Previous Data Help Desks
Megan Carter at 2020 ESIP Summer Meeting: Recording of Megan Carter giving the above presentation at the 2020 ESIP Summer Meeting Session on Connecting Informatics to Science Communities
NASA Giovanni system: This short video demonstrates that the NASA Giovanni system can now provide averages over several years using daily data variables and short time intervals. It highlights examining spatial and temporal variability of recurring Earth system phenomena with the Giovanni Recurring Averages option for daily data variables. This video is related to the #DataHelpDesk at #AGU23.
YouTube channel for IRIS_EPO: This is the YouTube channel for IRIS_EPO, which advances discovery, research, and education in seismology through a fantastic suite of videos. This is associated with #DataHelpDesk #AGU23.
Demo introducing QGreenland: This demo introduces #QGreenland, a free and open-source data package for #QGIS, as an all-in-one mapping and data analysis tool for #Greenland. This was promoted at #DataHelpDesk #AGU23.
Video demonstrating Geoweaver: This video demonstrates Geoweaver, an open-source #ML / #AI workflow solution that helps with tracking research code and remembering history. A one-pager is also paired with this demo. This was featured at #DataHelpDesk #AGU23.
AGU webinar with libcce and ShelleyStall: This webinar from theAGU, with libcce and ShelleyStall, provides a checklist, tools, and practices that make it easier to share data and software at the time of publication, and addresses what should be cited in a paper.
The principles of tidy data: This video from #DataHelpDesk expert sjeanetteclark (ArcticDataCtr, DataONEorg) teaches the principles of tidy data, how to recognize untidy data, and why tidy data sets you up for success.
How ontologies help you understand the wealth of data available: In this video, #DataHelpDesk expert sjeanetteclark (ArcticDataCtr, DataONEorg) explains how ontologies help you make sense of the large amount of data available.
Getting Started with NASA Worldview: This demonstration will introduce you to the NASA Earth Observing System Data and Information System (EOSDIS) Worldview imagery mapping and visualization application.
How to use the NASAWorldview imagery mapping & visualization app: This tutorial shows how to use the #NASAWorldview imagery mapping & visualization app to explore global Earth science data imagery, with many layers available within 3 hours of observation. It also explains how to download the underlying data. #NASA experts were available at the #AGU23 #DataHelpDesk.
New Earth science data resources available at earthdata.nasa.gov: This video introduces new Earth science data resources available at earthdata.nasa.gov to help users find, access, use, and visualize NASA Earth science data. It is useful for those just getting started, novice users, or expert data users. This was shared at #AGU23 #DataHelpDesk and #GSA2021 #DataHelpDesk.
What About Model Data?: This video, titled “What About Model Data?”, addresses challenges in knowing what data to keep from a simulation, relevant for the many #simulation #modeling talks. This was shared at #DataHelpDesk #AGU23.
A roadmap to help grad students identify data management practices they should be considering: This short video presents a roadmap to help grad students identify data management practices they should be considering.
https://youtu.be/bftzPnFdtHk / https://doi.org/10.5281/zenodo.4706146: This helpful video from theAGU ShelleyStall & libcce discusses your Digital Presence – Increasing Your Impact with Citations and Collaborations. The slides are also available at the provided Zenodo link.
Plan, curate, and connect your software: This checklist helps you to plan, curate, and connect your software. It is from theAGU ShelleyStall libcce.
Intro video for ESSDIVE: This is an intro video for ESSDIVE, a data repository for US Dept. of Energy Earth and environmental science data.
How to use the ESS-DIVE repository search feature to locate datasets: This video demonstrates how to use the ESS-DIVE repository search feature to locate datasets for your next #datascience project.
ESSDIVE recommended reporting format: This GitBook page (not a video) outlines the ESSDIVE recommended reporting format for multidisciplinary sample identifiers and associated metadata in ecosystem sciences.
The NEON Data Portal: This video introduces the NEON Data Portal and provides an introduction to working with NEON data in R.
Figshare for Institutions: This video highlights that Figshare for Institutions is a next-generation, all-in-one, hosted #repository that can help with #FAIRdata and #OpenScience. It is designed for both administrators and end users.
NEON Data Portal: This video offers a guided tour of the NEON Data Portal and an introduction to working with NEON data in R.
Introduces DataDiscStudio: This video introduces DataDiscStudio, a resource that helps researchers find the data they need from over 1.6 million datasets from 40+ geoscience data repositories and community contributions.
Basic introduction to ERDDAP: This is a basic introduction to ERDDAP.
NOAA Open Data and Jupyter Notebooks: This video from NOAA shows how to build your own weather app using NOAA Open Data and Jupyter Notebooks.
A brief overview of ezEML: This video provides a brief overview of ezEML, a form-based, do-it-yourself online application for creating metadata in the Ecological Metadata Language (EML). It is aimed at scientists who want to prepare their dataset for submission to a data repository but are not proficient in EML editing. A more in-depth tutorial is available.
ERDDAP is a data server: This #DataHelpDesk spotlight highlights that ERDDAP is a data server that provides a simple, consistent way to download subsets of gridded & tabular scientific datasets in common file formats and make graphs & maps.
Video from NEON: This is another video from NEON.
Introduces OpenAltimetry: This video introduces OpenAltimetry, an exploration and visualization tool for #NASA #ICESat and #ICESat2 data.
10 Simple Rules for Creating a Data Management Plan: This video, addressing a common #DataHelpDesk question, presents 10 Simple Rules for Creating a Data Management Plan, from Amber Budden (nceas) and Bill Michener (UNMLibraries).
Data citation best practices: This video discusses data citation best practices and how the makedatacount project is supporting data citations for increased credit and attribution.
DataONEorg supports easy access and discovery of data: This #DataHelpDesk Spotlight shows how DataONEorg supports easy access and discovery of data across a network of Earth & environmental science data repositories.
Full overview of Data Management Planning: This video provides a full overview of Data Management Planning, including guidance for creating a DMP and a demo of the DMPTool.
Environmental Data Initiative repository and curation services: This #DataHelpDesk Repository Spotlight offers a brief overview of the repository and curation services offered by the Environmental Data Initiative EDIgotdata.
Ag Data Commons: This #DataHelpDesk spotlight is on the Ag Data Commons, a research data catalog and repository available to help the agricultural research community share and discover research data funded by the USDA.
https://ftp.osuosl.org/pub/fosdem/2021/D.geospatial/osgeolive.mp4: This video, from the FOSDEM 2021 geospatial track, showcases OSGeoLive, an open-source geospatial toolkit.
Playlist of videos from Argovis: This is a playlist of videos from Argovis.
A new way to visualize Argo’s biogeochemical data: This Argovis video presents a new way to visualize Argo’s biogeochemical data.
Introduces Science gateways: This video introduces Science gateways, which allow science & engineering communities to access shared resources. sciencegateways serves the science gateway community by sharing experiences and providing services.
StraboSpot: This #DataHelpDesk spotlight showcases StraboSpot, a geologic data system that allows researchers to digitally collect, store, and share both field and laboratory data based on how geologists actually work.
JupyterHub and HydroShare: This video highlights another CUAHSI resource, JupyterHub and HydroShare, which provide cloud-based data tools for research, collaboration, and workflow documentation in the aquatic sciences.
CUAHSI HydroShare: This #DataHelpDesk spotlight is on CUAHSI HydroShare: a repository for sharing, collaborating around, and formally publishing scientific data.
IRIS has a YouTube channel: The IRIS YouTube channel offers extensive instructional content on seismology and related geosciences.
https://www.dataone.org/webinars/tidy-ing-your-data-simple-steps-reproducible-research / https://vimeo.com/378621271: This video, “Tidy-ing Your Data: Simple Steps for Reproducible Research”, is from #DataHelpDesk expert sjeanetteclark ArcticDataCtr DataONEorg.
NASA’s GES DISC: This video introduces NASA’s Goddard Earth Sciences Data and Information Services Center (GES DISC).
https://www.youtube.com/watch?v=2RkrvISQQsA / https://youtu.be/2RkrvISQQsA?si=gu4hCFRLKZjQYuum: This is a GES DISC L34RS Demonstration Video.
NASA GES DISC Level 2: This video is a NASA GES DISC Level 2 Subsetting & Data Validation Demo.
NASA GES DISC Giovanni: This is a NASA GES DISC Giovanni 20th Anniversary Demo Video.
EOSDIS resources: This video tours new Earth Observing System Data and Information System (EOSDIS) Earth science data resources available at earthdata.nasa.gov to help users find, access, use, and visualize NASA Earth science data. It is for users of all levels.
Animal Telemetry Network Data Portal: This video introduces the Animal Telemetry Network Data Portal.
Journal Practices for Data (and Software): Module 1: This is Module 1: Introduction of “The Paper and The Data: Authors, Reviewers, and Editors Webinar on Updated Journal Practices for Data (and Software)” from AGU/ESIP. It covers challenges with accessing data, the AGU Data Position Statement, recommendations from the NAS, updated journal guidelines, and benefits for sharing data/software.
Journal Practices for Data (and Software): Module 2: This is Module 2: Data of the same webinar, discussing what data to share, what repository to use, availability statements, citation, and examples.
Journal Practices for Data (and Software): Module 3: This is Module 3: Software, covering what software to share, availability statements, citation, GitHub considerations, and examples.
Journal Practices for Data (and Software): Module 4: This is Module 4: Peer Review, including recommendations from AGU and examples.
Journal Practices for Data (and Software): Module 5: This is Module 5: Persistent Identifiers (PIDs), discussing ORCID, DOI, and the PID Graph.
How to find training resources on research data skills: This video offers quick tips & suggested tools from data support specialist Nancy Hoebelheinrich to find training resources on research data skills targeted to wherever you are on your research path.
Introduces Rockd: This video introduces Rockd, a mobile app that draws on community data resources to help you explore, capture, and share your geological surroundings.
How to explore and visualize oceanobserv data: #DataHelpDesk expert Stace Beaulieu demonstrates how to explore and visualize oceanobserv data, including Quality Assurance of Real-Time Oceanographic Data (QARTOD) flags.
GitHub Repositories
Earth Science GitHub Repos (many software projects relating to earth science)