Understanding the Earth requires geoscientists to pull together a variety of data and look for common patterns that represent an event in the rock record. One data set commonly used for hydrocarbon exploration in sedimentary rocks is the interpretation of log facies from wells. Log facies are generated by individual geoscientists as they manually interpret patterns in a gamma ray curve, which measures the vertical changes in radioactivity in a well. The gamma log patterns result from grain size changes in the rocks (i.e. mudstones are more radioactive than sandstones), and specific patterns in the gamma log can be interpreted as different events. For example, a bell-shaped pattern can be interpreted as a meandering river channel. After interpreting tens to hundreds of wells, the geoscientist reviews all these patterns on a map to understand how the rocks were deposited (e.g. as beach deposits). This map is used to predict the location of ideal reservoirs (i.e. porous rocks) that can hold hydrocarbons, which exploration companies can then target.
The objective of this contest is as follows: Given an array of GR (Gamma-Ray) values, accurately predict the log facies type corresponding to each value.
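Put differently, the task is per-sample sequence labelling: a well with n GR readings needs n predicted labels. Below is a minimal sketch of that contract, using a majority-class baseline and a per-sample accuracy score. The function names, the toy values, the integer label codes, and the use of accuracy as the score are all assumptions for illustration; the contest's actual metric is not stated here.

```python
from collections import Counter

def majority_baseline(train_labels, gr_values):
    """Predict the most common training label for every GR sample."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * len(gr_values)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("need exactly one prediction per GR value")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy well (hypothetical values): five GR readings and their labels.
gr = [45.0, 52.3, 80.1, 95.7, 60.2]
labels = [0, 0, 1, 1, 0]

pred = majority_baseline(labels, gr)
print(pred)                    # [0, 0, 0, 0, 0]
print(accuracy(labels, pred))  # 0.6
```

Any real model should beat this floor; the point here is only that the output has the same length as the GR input.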
The data set consists of the following files:
- Train:
  - well_id & row_id: columns uniquely identifying each row
  - GR: Gamma-Ray log values
  - label: Variable to be predicted
- Test: test samples for prediction
- Submission Format: The submission file format
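
As a rough illustration of the Train layout described above, the sketch below parses rows with the four listed columns and groups them into one ordered GR sequence per well. It assumes a comma-separated file with integer ids and integer labels, and it treats the pair (well_id, row_id) as the unique key; the in-memory CSV and the helper name `load_wells` are hypothetical stand-ins, since the real file names and values are not given here.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the Train file (hypothetical values, assumed CSV layout).
TOY_TRAIN_CSV = """well_id,row_id,GR,label
0,0,45.0,0
0,2,80.1,1
0,1,52.3,0
1,0,60.2,0
1,1,95.7,1
"""

def load_wells(fileobj):
    """Group rows by well_id and order each well's samples by row_id."""
    wells = defaultdict(list)
    seen = set()
    for row in csv.DictReader(fileobj):
        key = (int(row["well_id"]), int(row["row_id"]))
        if key in seen:
            raise ValueError(f"duplicate (well_id, row_id): {key}")
        seen.add(key)
        wells[key[0]].append((key[1], float(row["GR"]), int(row["label"])))
    for samples in wells.values():
        samples.sort()  # tuples compare by row_id first
    return dict(wells)

wells = load_wells(io.StringIO(TOY_TRAIN_CSV))
for well_id, samples in wells.items():
    gr = [g for _, g, _ in samples]
    labels = [lab for _, _, lab in samples]
    print(well_id, gr, labels)
```

With a real file, `io.StringIO(TOY_TRAIN_CSV)` would be replaced by an opened file handle. Keeping each well as its own sequence matters because the facies patterns are defined along a single well, not across wells.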

The workflow consists of the following steps:
- Download raw data
- Data cleansing and preprocessing --> Preprocessing pipeline
- EDA
- Feature Engineering (optional)
- Build ML model (baseline)
- Train and evaluate model
- Visualize and assess predictions
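
The feature-engineering, baseline-model, and evaluation steps listed above can be sketched end to end on synthetic data. Everything in this sketch is an assumption made for illustration: the two-class synthetic wells (low-GR vs. high-GR beds with heavy noise), the windowed-mean feature, the nearest-centroid classifier, and accuracy as the score. It is not the contest's data or a recommended model, only a small baseline that shows why neighbouring samples help when labelling each GR value.

```python
import random
from statistics import mean

def make_well(rng, n_segments=6, seg_len=50):
    """Synthetic well: alternating low-GR (label 0) and high-GR (label 1) beds."""
    gr, labels = [], []
    first = rng.randint(0, 1)
    for seg in range(n_segments):
        label = (first + seg) % 2
        level = 100.0 if label else 40.0
        for _ in range(seg_len):
            gr.append(level + rng.uniform(-40.0, 40.0))  # heavy noise on purpose
            labels.append(label)
    return gr, labels

def window_mean(gr, half_width=3):
    """Feature engineering: mean GR in a window centred on each sample."""
    n = len(gr)
    return [mean(gr[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]

def fit_centroids(features, labels):
    """Baseline model: one mean feature value (centroid) per class."""
    return {c: mean(f for f, lab in zip(features, labels) if lab == c)
            for c in set(labels)}

def predict(centroids, features):
    """Assign each sample to the class with the nearest centroid."""
    return [min(centroids, key=lambda c: abs(f - centroids[c])) for f in features]

def accuracy(y_true, y_pred):
    """Fraction of samples labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

rng = random.Random(0)
train_wells = [make_well(rng) for _ in range(3)]
test_gr, test_labels = make_well(rng)  # held-out well for evaluation

def run(featurize):
    """Train on the training wells and score on the held-out well."""
    feats = [f for gr, _ in train_wells for f in featurize(gr)]
    labs = [lab for _, labels in train_wells for lab in labels]
    model = fit_centroids(feats, labs)
    return accuracy(test_labels, predict(model, featurize(test_gr)))

acc_raw = run(lambda gr: list(gr))  # no feature engineering
acc_window = run(window_mean)       # windowed feature
print(f"raw GR accuracy:      {acc_raw:.3f}")
print(f"windowed GR accuracy: {acc_window:.3f}")
```

Holding out a whole well, rather than random rows, keeps neighbouring samples from the same well out of both the training and evaluation sets, which would otherwise inflate the score.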
The Society for Sedimentary Geology (SEPM) has a more detailed description of the log facies interpretation workflow that this challenge represents and how geoscientists use this information to interpret the geologic past.