AI and the Molecular Fossil Record of Petroleum

Every oil sample is a molecular fossil record. We are training artificial intelligence on large petroleum geochemistry databases to find patterns that traditional biomarker ratios miss.

An oil sample contains hundreds of measurable organic compounds. Their relative abundances reflect which organisms lived in the source environment, what conditions were like when the organic matter was deposited, and what happened to it in the subsurface afterward. Databases holding hundreds of thousands of these samples exist, but the data are high-dimensional, and most analysis still relies on simple ratios between compounds.

Another approach is to learn embeddings from these databases. The idea is to learn a representation of oil compositions that incorporates the chemistry, the geological context, the paleogeography, and any other information we have about each sample, so that samples with similar geological histories end up near each other in the embedding space. Such a representation can capture relationships between source biology, depositional environment, and thermal maturity that you can’t see by looking at one ratio at a time.
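As a rough illustration of what "nearby in embedding space" means, here is a minimal sketch that embeds a handful of compositions with a centered log-ratio transform followed by PCA. This is a simple linear stand-in chosen for illustration, not the learned representation described above, and it uses chemistry only (no geological context or paleogeography). The abundance values, the function names, and the choice of four compounds are all hypothetical.

```python
import numpy as np

def clr(abundances):
    """Centered log-ratio transform; each row is one sample's composition."""
    logs = np.log(abundances)
    return logs - logs.mean(axis=1, keepdims=True)

def pca_embed(features, n_components=2):
    """Project samples onto the top principal components using SVD."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def nearest_neighbor(embedding, i):
    """Index of the sample closest to sample i in the embedding space."""
    distances = np.linalg.norm(embedding - embedding[i], axis=1)
    distances[i] = np.inf  # exclude the sample itself
    return int(np.argmin(distances))

# Hypothetical relative abundances of four compounds in six oil samples.
# Made-up numbers for illustration only, not real measurements.
abundances = np.array([
    [0.40, 0.30, 0.20, 0.10],
    [0.42, 0.28, 0.19, 0.11],
    [0.38, 0.31, 0.21, 0.10],
    [0.10, 0.20, 0.30, 0.40],
    [0.11, 0.19, 0.31, 0.39],
    [0.09, 0.21, 0.29, 0.41],
])

embedding = pca_embed(clr(abundances), n_components=2)
print(embedding.shape)
print(nearest_neighbor(embedding, 0), nearest_neighbor(embedding, 3))
```

In this toy data the first three samples and the last three form two compositional groups, so each sample's nearest neighbor falls within its own group. A real system would replace the PCA step with a learned model and add the non-chemical inputs.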

Working at database scale

With enough samples, you can start to pick out signals tied to major events in Earth history: evolutionary innovations, mass extinctions, changes in ocean redox. These patterns are invisible in any single study but emerge when you analyze tens of thousands of samples together.

Beyond two-compound ratios

Traditional petroleum geochemistry works with ratios between pairs of compounds, one at a time. We use the full molecular fingerprint simultaneously. Some of the most interesting signals only show up in the multivariate structure.
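A small sketch of why a single ratio can hide differences: the two hypothetical samples below share an identical ratio between their first two compounds, yet their full compositions are far apart. The distance used here is the Aitchison distance (Euclidean distance after a centered log-ratio transform), a standard choice for compositional data and an assumption of this example rather than a method taken from the text. The sample values are made up.

```python
import math

def clr(composition):
    """Centered log-ratio transform of one composition."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(a, b):
    """Euclidean distance between the CLR-transformed compositions."""
    return math.dist(clr(a), clr(b))

# Hypothetical relative abundances of four compounds (illustrative only).
sample_a = [0.30, 0.15, 0.35, 0.20]
sample_b = [0.30, 0.15, 0.10, 0.45]

# A two-compound ratio sees no difference between the samples...
ratio_a = sample_a[0] / sample_a[1]
ratio_b = sample_b[0] / sample_b[1]

# ...while the full fingerprint separates them clearly.
distance = aitchison_distance(sample_a, sample_b)

print(ratio_a == ratio_b)
print(distance)
```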