Hi everyone! We hope you’re having a great holiday. We wanted to talk a little about the use case we picked for our book on machine learning pipelines. For the book, we decided to train a model that forecasts Bitcoin (BTC) prices and then use a big data streaming architecture to make trades based on our forecasts. Pretty cool, right? We think so! So here’s a bit on how it works.
Making a prediction
We use historical BTC price data available from Kaggle. This dataset contains years of minute-by-minute pricing. We then use a state-of-the-art time series forecasting tool called Prophet, which was recently open sourced by Facebook. Prophet uses an additive regression model to separate the general price trend from hourly and daily cycles (seasonality). This lets us estimate, minute by minute, how we expect the price to move in the future, irrespective of the general trend. In this way, the model could theoretically yield profitable trades regardless of whether the price is generally going up or down.
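To make the additive idea concrete, here is a minimal sketch in plain Python. It is not Prophet and not the code from the book: it simply fits a straight-line trend and then averages what is left over at each position in a repeating cycle. The function names and the `period` parameter are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_additive(prices, period):
    """Split a price series into a linear trend plus a repeating cycle.

    prices: list of floats sampled at a fixed interval (e.g. one per minute).
    period: number of samples in one cycle (e.g. 1440 for a daily cycle).
    """
    xs = range(len(prices))
    x_bar, y_bar = mean(xs), mean(prices)
    # Ordinary least-squares line: the "general price trend".
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, prices)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar

    # Average the detrended residual at each position in the cycle.
    buckets = defaultdict(list)
    for x, y in zip(xs, prices):
        buckets[x % period].append(y - (intercept + slope * x))
    seasonal = [mean(buckets[k]) for k in range(period)]
    return slope, intercept, seasonal


def forecast(slope, intercept, seasonal, t):
    """Additive forecast at step t: trend plus cyclical component."""
    return intercept + slope * t + seasonal[t % len(seasonal)]
```

Because the forecast is a sum, the cyclical part can be read on its own: `seasonal[t % period]` says whether a given minute tends to sit above or below the trend line, whichever way the trend is heading.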
Streaming order data
Once we walk our readers through how to train this kind of model using a large historical data set, we set up a streaming pipeline using the Gemini exchange’s WebSocket API. This gives us the entire stream of bid and ask orders placed on the Gemini exchange. We feed this to Kafka, a distributed streaming platform originally developed at LinkedIn. Kafka acts as the glue between the analytic and trading components of the pipeline.
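As a rough sketch of the hand-off from the WebSocket feed to Kafka, the snippet below flattens one market-data message into key/value byte pairs ready to publish. The message shape, the field names, the `gemini-orders` topic, and both function names are assumptions made for illustration, so check them against the exchange's documentation. The Kafka client is passed in as a plain `send` callable so the example runs without a broker.

```python
import json


def to_kafka_records(raw_message, symbol="btcusd"):
    """Turn one WebSocket market-data message into (key, value) byte pairs.

    Assumed message shape (illustrative only):
    {"type": "update", "timestampms": 1, "events": [
        {"type": "change", "side": "bid", "price": "1.0", "remaining": "2.0"}]}
    """
    message = json.loads(raw_message)
    if message.get("type") != "update":
        return []  # e.g. heartbeats carry no order data
    records = []
    for event in message.get("events", []):
        if event.get("type") != "change":
            continue  # skip anything that is not an order book change
        order = {
            "symbol": symbol,
            "side": event["side"],
            "price": float(event["price"]),
            "remaining": float(event["remaining"]),
            "timestampms": message.get("timestampms"),
        }
        # Keying by symbol keeps one market's orders in a single partition.
        records.append((symbol.encode(), json.dumps(order).encode()))
    return records


def publish(raw_message, send, topic="gemini-orders"):
    """Forward every order in a message through send(topic, key, value)."""
    records = to_kafka_records(raw_message)
    for key, value in records:
        send(topic, key, value)
    return len(records)
```

With a client such as kafka-python, `send` could be a small wrapper around `KafkaProducer.send(topic, key=key, value=value)`; that wiring is left out here.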
We then ingest this massive feed using Spark Streaming. We use Spark Streaming to analyze buy and sell side liquidity to ensure that our model’s predictive power won’t be impacted in real time by the volatility of the market. We pass this analysis back to Kafka.
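To give a feel for what a liquidity check might compute, here is a small plain-Python stand-in for the logic a Spark Streaming job could run on each batch of orders. It is a sketch under our own assumptions, not the book's Spark code: the order fields, the imbalance measure, and the `max_abs_imbalance` threshold are all hypothetical.

```python
def liquidity_imbalance(orders):
    """Return (bid_depth, ask_depth, imbalance) for one batch of orders.

    orders: iterable of dicts with "side" ("bid" or "ask"), "price", and
    "remaining" (quantity still open at that price).
    imbalance lies in [-1, 1]; positive means more buy-side depth.
    """
    orders = list(orders)  # allow generators, since we scan the batch twice
    bid = sum(o["price"] * o["remaining"] for o in orders if o["side"] == "bid")
    ask = sum(o["price"] * o["remaining"] for o in orders if o["side"] == "ask")
    total = bid + ask
    imbalance = (bid - ask) / total if total else 0.0
    return bid, ask, imbalance


def is_tradeable(imbalance, max_abs_imbalance=0.5):
    """Treat a heavily one-sided book as too unstable to act on."""
    return abs(imbalance) <= max_abs_imbalance
```

The result of a check like this is the kind of per-batch summary that can be written back to a Kafka topic for the trading component to read.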
Our trading bot takes the Spark-processed data and, along with the forecasts we generated at the outset, executes trades. It then stores data on how the bot is performing in Elasticsearch so that we can monitor its performance in real time.
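The decision step can be pictured with a short sketch like the one below. This is a hypothetical rule, not the bot's actual strategy: the `min_edge` and `max_abs_imbalance` thresholds, the function names, and the fields of the monitoring record are all assumptions for illustration.

```python
def decide(forecast_price, last_price, imbalance,
           min_edge=0.001, max_abs_imbalance=0.5):
    """Return "buy", "sell", or "hold" for one forecast.

    forecast_price: the model's predicted price for the next interval.
    last_price: the most recent traded price.
    imbalance: liquidity signal in [-1, 1] from the streaming analysis.
    """
    if abs(imbalance) > max_abs_imbalance:
        return "hold"  # book is too one-sided to trust the forecast
    edge = (forecast_price - last_price) / last_price
    if edge > min_edge:
        return "buy"
    if edge < -min_edge:
        return "sell"
    return "hold"


def performance_doc(action, last_price, forecast_price, timestamp_ms):
    """Build a JSON-serializable record of one decision for monitoring."""
    return {
        "timestamp_ms": timestamp_ms,
        "action": action,
        "last_price": last_price,
        "forecast_price": forecast_price,
        "expected_return": (forecast_price - last_price) / last_price,
    }
```

A dictionary like the one returned by `performance_doc` is the sort of document that could be indexed in Elasticsearch and charted on a dashboard.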
We hope this sounds interesting! We use this example in our book to teach you how big data architectures work and how to build machine learning pipelines. Stay tuned for more updates on our progress!