Datasets

All articles tagged with #datasets

Diversity-based biosignature could separate life from non-life in space data
space · 3 hours ago

A May 2026 Nature Astronomy study from UC Riverside reports that living systems produce amino acids in a more diverse and evenly distributed pattern than non-living chemistry does. The researchers applied an ecology-inspired diversity measure (richness and evenness) to about 100 published datasets. The method reads how evenly molecules are distributed rather than looking for specific molecules, and it also finds that fatty acids show the reverse trend, so the signature is not universal across molecule classes. Because the metric can run on standard abundance tables, it could be tested with measurements from current space missions, offering a low-cost biosignature tool. It remains preliminary: the results are based on terrestrial data, not live missions, and real mission data will be needed to validate its usefulness and to account for contamination and preservation effects.
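
As a rough illustration of what a richness-and-evenness measure computes on an abundance table, the sketch below uses the Shannon index and Pielou's evenness, two standard ecology measures. This is an assumption: the summary does not say which formulas the study used, and the function names and sample abundances here are hypothetical.

```python
import math


def richness(abundances):
    """Count how many molecule types are present (abundance > 0)."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over the types present."""
    present = [a for a in abundances if a > 0]
    total = sum(present)
    if total == 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in present)


def pielou_evenness(abundances):
    """Pielou's evenness J = H / ln(S); defined here as 0.0 when S < 2."""
    s = richness(abundances)
    if s < 2:
        return 0.0
    return shannon_index(abundances) / math.log(s)


# Hypothetical abundance rows (made-up numbers, not data from the study).
evenly_spread = [10, 10, 10, 10]
one_dominant = [97, 1, 1, 1]

for label, row in [("evenly spread", evenly_spread), ("one dominant", one_dominant)]:
    print(label, richness(row), round(pielou_evenness(row), 3))
```

Both rows have the same richness, but the evenly spread row scores an evenness close to 1 while the row dominated by one molecule scores much lower. That difference in distribution, rather than the presence of any particular molecule, is the kind of signal the summary describes.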

Your Music in AI Training Data: The Hidden Cost of GenAI Sound
technology · 1 day ago

The Atlantic's AI Watchdog reveals that many AI music systems train on vast public datasets that often provide links to tracks rather than the actual audio, raising licensing, privacy, and authorship concerns. The piece highlights how this transparency gap, combined with inconsistent licensing and terms of service, could undermine musicians’ control over their work and invite lawsuits, and it argues that current models are predictive rather than truly creative. It points to examples like Hainbach’s large dataset and Google/YouTube-related training questions, and urges stronger disclosure and guardrails to address inequities in who benefits from AI-generated music.

SZA and Kenny Beats Call Out AI Training of Their Songs
technology · 1 day ago

After The Atlantic revealed an AI-detection tool showing millions of songs, including works by SZA and Kenny Beats, in datasets used to train AI music generators, the artists publicly condemned the practice as exploitative. The story also covers mixed industry reactions, ongoing lawsuits against Suno and Udio, and the fact that some datasets link to YouTube/Spotify, highlighting that the legal and ethical framework for AI training data remains unresolved.

Advancements in Deep Learning for Predicting Organic Molecular Spectra and Diagnosing Active Pulmonary Tuberculosis
science-and-technology · 2 years ago

Researchers have developed a deep learning model called DetaNet for predicting selected organic molecular spectra. The model utilizes a combination of atomic and electronic features, as well as spherical harmonic functions, to generate accurate predictions. The researchers used publicly available datasets, including optimized structures and various properties of molecules, to train and validate the model. The DetaNet model and trained parameters are available for reference, along with the program used for spectrum prediction. This research contributes to the growing field of deep learning in spectroscopy and has potential applications in various scientific disciplines.