A study of how efficiently the public odds reflect the actual strike rate¶
(The dataset covers 2006-2020.) We will first look at the correlation between public odds and strike rate:
In [1]:
# Import library and data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df_runs = pd.read_csv('results.csv')
df_runs.head()
Out[1]:
In [2]:
# Strike rate grouped by (rounded) win odds
# 1 if the horse finished first, else 0
df_runs['won'] = (df_runs['place'] == "1").astype(int)
# Cap the odds at 99, then round to whole odds
df_runs['win_odds'] = df_runs['win_odds'].clip(upper=99).round()
# Remove scratched horses (win_odds <= 0)
df_runs = df_runs.drop(df_runs[df_runs.win_odds <= 0].index)
odds_vs_wins = df_runs.groupby('win_odds')['won'].mean()
ax = odds_vs_wins.plot();
ax.set(xlabel='win odds', ylabel='strike rate')
plt.show()
In [3]:
# Check the correlation between odds and strike rate
odds_vs_wins_df = pd.DataFrame({'odds':odds_vs_wins.index, 'strike_rate':odds_vs_wins.values})
odds_vs_wins_df['odds'].corr(odds_vs_wins_df['strike_rate'])
Out[3]:
We get a correlation of -0.523 between public odds and strike rate, which indicates a moderate negative linear relationship between them. If you are an experienced horse racing model builder, you will recognize how hard it is to create a factor that reaches a correlation close to -0.523.
Applying domain knowledge¶
To compare public odds and strike rate on a percentage-to-percentage basis, we convert the public odds to an estimated win probability. The conversion also has to account for the HKJC's 17.5% take: estimated probability = (1 - 0.175) / win odds.
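As a rough illustration of this conversion, the sketch below turns win odds into an estimated probability net of the take. The helper name `implied_probability` and the three-horse field are hypothetical, and the example assumes idealized pari-mutuel odds with no rounding, which is a simplification.

```python
HKJC_TAKE = 0.175  # take on the WIN pool, as stated in the text

def implied_probability(win_odds, take=HKJC_TAKE):
    """Estimated win probability implied by pari-mutuel win odds (hypothetical helper)."""
    if win_odds <= 0:
        raise ValueError("win_odds must be positive")
    return (1 - take) / win_odds

# Hypothetical three-horse field: each horse's share of the WIN pool
pool_shares = [0.5, 0.3, 0.2]
# Idealized odds: the pool net of the take is paid out in proportion to each share
field_odds = [(1 - HKJC_TAKE) / share for share in pool_shares]
# Converting back recovers the pool shares, so the probabilities sum to about 1
field_probs = [implied_probability(o) for o in field_odds]

print([round(o, 3) for o in field_odds])
print([round(p, 3) for p in field_probs])
print(round(sum(field_probs), 3))
```

Without the (1 - take) factor, the naive 1 / odds values for the same field would sum to more than 1, which is why the take has to be included before comparing against strike rates.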
In [4]:
# Strike rate grouped by the probability implied by the odds (net of the 17.5% take)
df_runs['odds_prob'] = ((1-0.175)/df_runs['win_odds']).round(2)
odds_prob_vs_wins = df_runs.groupby('odds_prob')['won'].mean()
ax = odds_prob_vs_wins.plot()
ax.set(xlabel='odds prob', ylabel='strike rate')
plt.show()
In [5]:
# Check the correlation between odds_prob and strike rate
odds_prob_vs_wins_df = pd.DataFrame({'odds_prob':odds_prob_vs_wins.index, 'strike_rate':odds_prob_vs_wins.values})
odds_prob_vs_wins_df['odds_prob'].corr(odds_prob_vs_wins_df['strike_rate'])
Out[5]:
After accounting for the HKJC take and converting the odds to an estimated probability, we get a much higher correlation of 0.992 between odds_prob and strike_rate.
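One way to see why the conversion raises the correlation so much: Pearson correlation only measures linear association, while the strike rate is expected to vary roughly with the reciprocal of the odds. The sketch below uses synthetic data (not the race results) in which the strike rate is assumed to equal (1 - 0.175) / odds exactly, purely to show the effect.

```python
import numpy as np

TAKE = 0.175
# Synthetic, idealized data: whole-number odds from 1 to 99 (the capped range used earlier)
odds = np.arange(1, 100, dtype=float)
# Assumption for illustration only: strike rate equals the implied probability exactly
strike_rate = (1 - TAKE) / odds

# Pearson correlation against the raw odds (a reciprocal, hence non-linear, relationship)
corr_raw = np.corrcoef(odds, strike_rate)[0, 1]
# Pearson correlation against the implied probability (a perfectly linear relationship)
corr_prob = np.corrcoef((1 - TAKE) / odds, strike_rate)[0, 1]

print(round(corr_raw, 3), round(corr_prob, 3))
```

Even in this perfectly efficient synthetic market, the correlation with the raw odds stays well short of -1, whereas the correlation with the implied probability is 1 up to floating-point error. A modest correlation with raw odds therefore says little about efficiency on its own.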
The betting market of horse racing in Hong Kong looks quite efficient, really?
Is there a trend that the betting market is getting more efficient over time?¶
Let's have a look at the correlation between odds and strike_rate by year.
In [6]:
# Group strike rate by year and odds_prob
df_runs['year'] = pd.DatetimeIndex(df_runs['race_date']).year
# Count runs and compute the strike rate for each (year, odds_prob) bucket
odds_prob_vs_wins_in_year = df_runs.groupby(['year','odds_prob']).agg(count=('won','size'), strike_rate=('won','mean')).reset_index()
# Drop buckets with fewer than 10 runs
odds_prob_vs_wins_in_year = odds_prob_vs_wins_in_year.drop(odds_prob_vs_wins_in_year[odds_prob_vs_wins_in_year["count"] < 10].index)
# Correlation between odds_prob and strike rate, computed separately for each year
yearly_rows = []
for yr in range(2006, 2021):
    corr_by_year = odds_prob_vs_wins_in_year[odds_prob_vs_wins_in_year['year']==yr]
    corr = corr_by_year['odds_prob'].corr(corr_by_year['strike_rate'])
    yearly_rows.append({"year":yr, "corr":corr})
    print("Year {0} {1}".format(yr, round(corr, 3)))
# Build the frame once; DataFrame.append was removed in pandas 2.0
yearly_corr_df = pd.DataFrame(yearly_rows, columns=['year','corr'])
In [7]:
# plot the corr between odds_prob and strike rate by year
yearly_corr_df.plot(x='year',y="corr")
Out[7]:
As the chart above shows, the correlation between odds_prob and strike rate is very high most of the time.
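The per-year correlations could also be computed without hard-coding the range of years. The sketch below is one possible approach that iterates over a `groupby`. The small `demo` frame is made-up data standing in for `odds_prob_vs_wins_in_year`, and the helper name `corr_by_group` is hypothetical.

```python
import pandas as pd

# Made-up stand-in for odds_prob_vs_wins_in_year (same column names, invented values)
demo = pd.DataFrame({
    'year':        [2006, 2006, 2006, 2007, 2007, 2007],
    'odds_prob':   [0.10, 0.20, 0.30, 0.10, 0.20, 0.30],
    'strike_rate': [0.10, 0.20, 0.30, 0.12, 0.18, 0.33],
})

def corr_by_group(frame, by='year', x='odds_prob', y='strike_rate'):
    """Pearson correlation of x and y within each group (hypothetical helper)."""
    rows = [{by: key, 'corr': grp[x].corr(grp[y])} for key, grp in frame.groupby(by)]
    return pd.DataFrame(rows, columns=[by, 'corr'])

yearly = corr_by_group(demo)
print(yearly.round(3))
```

Calling `yearly.plot(x='year', y='corr')` on the result would give the same kind of chart as the one above, with whichever years are present in the data.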
There seems to be little room for value-betting punters in the WIN pool of Hong Kong horse racing. Beating the public odds should be an extremely challenging task for them.
From this study, I would conclude that the betting market for horse racing in Hong Kong is not becoming more efficient over time; it has been quite efficient all along, at least for the 15 years covered by the data.
For further study: if the public odds in the WIN pool are so predictive, can we take advantage of their accuracy to find value bets in the various exotic pools?