A study of how efficiently the public odds reflect the actual strike rate

(The dataset covers the years 2006 to 2020.)
First, let's look at the correlation between the public odds and the strike rate:

# Import libraries and data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


df_runs = pd.read_csv('results.csv')
df_runs.head()

# Check strike rate grouped by odds

# Flag winners; to_numeric handles 'place' read as text and turns non-numeric codes into NaN
df_runs['won'] = (pd.to_numeric(df_runs['place'], errors='coerce') == 1).astype(int)
# Cap odds at 99 so long shots share one bucket, then round to whole numbers for grouping
df_runs['win_odds'] = df_runs['win_odds'].clip(upper=99).round()

# remove scratched horses (recorded with non-positive odds); copy to avoid chained-assignment warnings
df_runs = df_runs[df_runs['win_odds'] > 0].copy()

odds_vs_wins = df_runs.groupby('win_odds')['won'].mean()
ax = odds_vs_wins.plot();
ax.set(xlabel='win odds', ylabel='strike rate')
plt.show()

# Check the correlation between odds and strike rate

odds_vs_wins_df = pd.DataFrame({'odds':odds_vs_wins.index, 'strike_rate':odds_vs_wins.values})
odds_vs_wins_df['odds'].corr(odds_vs_wins_df['strike_rate'])

-0.523455712383361

The correlation of -0.523 between the public odds and the strike rate indicates a moderate negative linear relationship: the longer the odds, the lower the strike rate. Experienced horse racing model builders will recognize that it is hard to build a single factor whose correlation with the strike rate comes anywhere close to -0.523.
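Part of the reason the linear correlation is only moderate may be that the relationship is not a straight line: in a perfectly calibrated market, strike rate falls off in proportion to 1 / odds. The minimal sketch below uses idealized, made-up numbers rather than the HKJC data. It assumes every whole-number odds level from 1 to 99 wins at exactly (1 - 0.175) / odds, and computes Pearson correlation by hand against both the raw odds and the implied probability.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

TAKE = 0.175  # HKJC WIN-pool take used in this study

# Idealized (hypothetical) market: each odds level wins exactly at (1 - TAKE) / odds
odds = list(range(1, 100))
strike_rate = [(1 - TAKE) / o for o in odds]

# Linear correlation against the raw odds is only moderate...
corr_raw = pearson(odds, strike_rate)
# ...but against the take-adjusted implied probability it is exactly 1
implied = [(1 - TAKE) / o for o in odds]
corr_prob = pearson(implied, strike_rate)

print(round(corr_raw, 2), round(corr_prob, 2))
```

Even with zero noise, a straight line in odds cannot follow a 1 / odds curve, so the raw-odds correlation stays well short of -1 while the probability-scale correlation is perfect.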

Applying domain knowledge

To compare the public odds and the strike rate on the same scale (probability against probability), we convert the public odds into an estimated probability. That estimate must also account for the HKJC's 17.5% take, so we use (1 - 0.175) / win_odds rather than simply 1 / win_odds.
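Why (1 - take) / odds? In a pari-mutuel WIN pool the take is deducted from the whole pool and the remainder is shared among the winning tickets, so a horse's decimal odds are roughly (1 - take) * pool / (amount bet on that horse). Inverting that recovers the horse's share of the pool, which is the public's implied win probability. The sketch below uses a hypothetical five-horse pool with made-up bet amounts and ignores the rounding applied to real dividends.

```python
TAKE = 0.175  # HKJC WIN-pool take used in this study

# Hypothetical amounts bet on each horse in one race (made-up numbers)
bets = {"A": 40_000, "B": 25_000, "C": 20_000, "D": 10_000, "E": 5_000}
pool = sum(bets.values())

# Pari-mutuel decimal odds: net pool after the take, divided by money on the horse
odds = {h: (1 - TAKE) * pool / amount for h, amount in bets.items()}

# Converting back, as df_runs['odds_prob'] does: (1 - take) / odds
implied = {h: (1 - TAKE) / o for h, o in odds.items()}

# Naive 1 / odds over-counts: it sums to 1 / (1 - TAKE) instead of 1
naive_total = sum(1 / o for o in odds.values())

for h in bets:
    print(h, round(odds[h], 2), round(implied[h], 3))
print("sum of implied:", round(sum(implied.values()), 6))
print("sum of 1/odds:", round(naive_total, 4))
```

The take-adjusted probabilities add up to 1 across the field, while plain 1 / odds adds up to about 1.21. Note also that the notebook rounds win_odds to whole numbers before this conversion, which coarsens the estimate most for short-priced horses (odds of 1.6 and 2.4 both become 2).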

# Check strike rate grouped by the probability implied by the odds
# Implied probability = (1 - take) / decimal odds, rounded to 2 d.p. to form bins
df_runs['odds_prob'] = ((1 - 0.175) / df_runs['win_odds']).round(2)

odds_prob_vs_wins = df_runs.groupby('odds_prob')['won'].mean()
ax = odds_prob_vs_wins.plot()
ax.set(xlabel='odds prob', ylabel='strike rate')
plt.show()

# Check the correlation between odds_prob and strike rate

odds_prob_vs_wins_df = pd.DataFrame({'odds_prob':odds_prob_vs_wins.index, 'strike_rate':odds_prob_vs_wins.values})
odds_prob_vs_wins_df['odds_prob'].corr(odds_prob_vs_wins_df['strike_rate'])

0.992256764581831

After accounting for the HKJC take and converting the odds into an estimated probability, the correlation between odds_prob and strike_rate rises to 0.992.
The Hong Kong horse racing betting market looks quite efficient, doesn't it?
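To put 0.992 in context, one option is to simulate what a perfectly calibrated market would produce under the same test: every run wins with exactly its implied probability, and the results are binned and correlated the same way. This is an illustration under loud assumptions, not part of the original study: odds are drawn uniformly from the whole numbers 1 to 99, outcomes come from Python's random module with a fixed seed, and the data are entirely hypothetical.

```python
import math
import random
from collections import defaultdict

TAKE = 0.175  # HKJC WIN-pool take used in this study

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(2020)  # fixed seed so the run is repeatable

# Simulated runs in a perfectly calibrated market (hypothetical data)
runs = defaultdict(lambda: [0, 0])  # odds_prob bin -> [wins, runs]
for _ in range(100_000):
    odds = rng.randint(1, 99)                # whole-number odds, as in df_runs
    odds_prob = round((1 - TAKE) / odds, 2)  # same 2-d.p. binning as odds_prob
    won = rng.random() < (1 - TAKE) / odds   # wins exactly at the implied rate
    runs[odds_prob][0] += won
    runs[odds_prob][1] += 1

# Strike rate per bin, keeping bins with at least 10 runs
bins = sorted(b for b, (w, n) in runs.items() if n >= 10)
strike = [runs[b][0] / runs[b][1] for b in bins]

corr = pearson(bins, strike)
print(round(corr, 3))
```

Under these assumptions the binned correlation comes out very close to 1. Keep in mind that an unweighted correlation over bins treats a bin of 10 runs the same as one of 10,000.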

Is the betting market getting more efficient over time?

Let's look at the correlation between odds_prob and strike_rate year by year.

# Group strike rate by year and odds_prob, keeping the number of runs in each bin
df_runs['year'] = pd.to_datetime(df_runs['race_date']).dt.year
odds_prob_vs_wins_in_year = (
    df_runs.groupby(['year', 'odds_prob'])
    .agg(count=('won', 'size'), strike_rate=('won', 'mean'))
    .reset_index()
)
# Drop sparse bins (fewer than 10 runs), whose strike rates are too noisy
odds_prob_vs_wins_in_year = odds_prob_vs_wins_in_year[odds_prob_vs_wins_in_year['count'] >= 10]

# Correlation between odds_prob and strike rate for each year
# (DataFrame.append was removed in pandas 2.0, so collect rows in a list first)
rows = []
for yr in range(2006, 2021):
    corr_by_year = odds_prob_vs_wins_in_year[odds_prob_vs_wins_in_year['year'] == yr]
    corr = corr_by_year['odds_prob'].corr(corr_by_year['strike_rate'])
    rows.append({'year': yr, 'corr': corr})
    print('Year {0}  {1}'.format(yr, round(corr, 3)))
yearly_corr_df = pd.DataFrame(rows)

Year 2006  0.988
Year 2007  0.995
Year 2008  0.994
Year 2009  0.99
Year 2010  0.996
Year 2011  0.986
Year 2012  0.996
Year 2013  0.995
Year 2014  0.945
Year 2015  0.989
Year 2016  0.975
Year 2017  0.995
Year 2018  0.999
Year 2019  0.997
Year 2020  0.989
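The yearly figures come only from bins with at least 10 runs. Each bin's strike rate is a binomial proportion, so its standard error is roughly sqrt(p(1 - p) / n). The small sketch below uses illustrative values of p and n, not figures from the dataset, to show how quickly that noise shrinks as a bin grows, and why a single year's correlation can wobble.

```python
import math

def strike_rate_se(p, n):
    """Approximate standard error of a strike rate p estimated from n runs."""
    return math.sqrt(p * (1 - p) / n)

# Illustrative values only: a 10% chance measured on bins of different sizes
for n in (10, 100, 1000):
    print(n, round(strike_rate_se(0.10, n), 3))
```

A bin of 10 runs at a true 10% chance has a standard error of about 9.5 percentage points, so dropping such bins, as the grouping code above does, trades coverage for stability.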

# plot the corr between odds_prob and strike rate by year

yearly_corr_df.plot(x='year',y="corr")


As the chart shows, the correlation between odds_prob and strike rate stays very high in every year, from a low of 0.945 in 2014 to a high of 0.999 in 2018.
That leaves little room for value-betting punters in the Hong Kong WIN pool; beating the public odds should be an extremely challenging task for them.
From this study, I would conclude that the Hong Kong horse racing betting market is not becoming more efficient over time. It has consistently been quite efficient, at least over the 15 years covered here (2006 to 2020).
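One way to put a number on "not getting more efficient" is to fit a least-squares trend line through the yearly correlations printed earlier and look at its slope. The sketch simply re-types those printed values; using an ordinary least-squares slope is an assumption for illustration rather than part of the original analysis, and with only 15 points it is a rough check, not a significance test.

```python
# Yearly correlations between odds_prob and strike rate, as printed earlier
yearly_corr = {
    2006: 0.988, 2007: 0.995, 2008: 0.994, 2009: 0.990, 2010: 0.996,
    2011: 0.986, 2012: 0.996, 2013: 0.995, 2014: 0.945, 2015: 0.989,
    2016: 0.975, 2017: 0.995, 2018: 0.999, 2019: 0.997, 2020: 0.989,
}

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

years = list(yearly_corr)
slope = ols_slope(years, [yearly_corr[y] for y in years])
print(f"trend: {slope:+.5f} per year")
```

On these values the slope is about -0.00016 per year, essentially flat, which agrees with the conclusion that efficiency has been stable rather than improving.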
For further study: if the public odds in the WIN pool are this predictive, can we exploit their accuracy to find value bets in the various exotic pools?
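One common starting point for this question, not something this study tests, is the Harville formula. It turns win probabilities into finishing-order probabilities by assuming that, once the winner is removed, the remaining horses keep their relative chances: P(i first, j second) = p_i * p_j / (1 - p_i). A QUINELLA (first two in either order) then adds both orderings. The sketch applies it to hypothetical win probabilities; comparing these estimates with the exotic pools' own implied probabilities would be the next step.

```python
from itertools import combinations, permutations

def harville_exacta(p):
    """P(i wins, j second) for every ordered pair, via the Harville formula."""
    return {
        (i, j): p[i] * p[j] / (1 - p[i])
        for i, j in permutations(p, 2)
    }

def harville_quinella(p):
    """P(i and j fill the first two places in either order)."""
    ex = harville_exacta(p)
    return {frozenset((i, j)): ex[(i, j)] + ex[(j, i)] for i, j in combinations(p, 2)}

# Hypothetical win probabilities for one race, e.g. take-adjusted odds_prob values
win_prob = {"A": 0.40, "B": 0.25, "C": 0.20, "D": 0.10, "E": 0.05}

quinella = harville_quinella(win_prob)
for pair, prob in sorted(quinella.items(), key=lambda kv: -kv[1])[:3]:
    print(sorted(pair), round(prob, 3))
```

The Harville assumption is only an approximation of how races actually unfold, so it is best treated as a baseline to test against the exotic pools rather than a ready-made value-finding rule.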

Sample rows from results.csv (output of df_runs.head()):

	race_num	place	horse_race_num	horse_name	horse_id	jockey	trainer	weight_carried	horse_weight	draw	lbw	finish_time	win_odds	race_date
0	8	5	8	LUCKY LAD	A148	M Rodd	J Moore	121	1136	9	3-1/4	1.23.60	5.0	2005-09-04
1	4	1	6	MASTER MARAUDER	A184	M Rodd	J Moore	127	1098	7	-	1.36.70	8.5	2005-09-04
2	1	13	6	COSMO STAR	A260	H K Yim	K L Man	126	1051	4	9-1/4	1.25.50	35.0	2005-09-04
3	9	1	3	CHEEKY	A266	C Brown	C Fownes	129	1016	13	-	1.22.60	15.0	2005-09-04
4	7	5	1	GENERAL KINGY	B039	O Doleuze	K W Lui	129	1212	3	2-3/4	0.57.90	5.0	2005-09-04

Sunday, 19 July 2020

Betting market in horse racing - Hong Kong
