Sell in May and Go Away?
Dan Bikle -- http://bikle.com -- 2009-07-13
Many of us will remember October 2008.
Another month which sticks in my memory is May 2008.
During that month I heard the phrase, "Sell in May and Go Away."
The message being that summer is not a good time to hold stocks.
Should we "Sell in May and Go Away?"
And if yes, when should we come back?
We start by building a shell script to obtain pricing data from yahoo:
#! /bin/sh
# wget.sh
# Use wget to download csv files full of pricing data from finance.yahoo.com
. /oracle/.orcl
set -x
# cd to the right place
cd /oracle/vlsql/md/sim/
# wget em
wget --output-document=DIA.csv http://ichart.finance.yahoo.com/table.csv?s=DIA
wget --output-document=MDY.csv http://ichart.finance.yahoo.com/table.csv?s=MDY
wget --output-document=SPY.csv http://ichart.finance.yahoo.com/table.csv?s=SPY
wget --output-document=XLB.csv http://ichart.finance.yahoo.com/table.csv?s=XLB
wget --output-document=XLE.csv http://ichart.finance.yahoo.com/table.csv?s=XLE
wget --output-document=XLF.csv http://ichart.finance.yahoo.com/table.csv?s=XLF
wget --output-document=XLI.csv http://ichart.finance.yahoo.com/table.csv?s=XLI
wget --output-document=XLK.csv http://ichart.finance.yahoo.com/table.csv?s=XLK
wget --output-document=XLV.csv http://ichart.finance.yahoo.com/table.csv?s=XLV
Results of Above Script
The above script gives us a set of .csv files with appropriate names.
oracle@zareason:~/vlsql/md/sim$
oracle@zareason:~/vlsql/md/sim$
oracle@zareason:~/vlsql/md/sim$
oracle@zareason:~/vlsql/md/sim$
oracle@zareason:~/vlsql/md/sim$ ll *csv
-rw-r--r-- 1 oracle dba 148962 2009-06-15 02:20 DIA.csv
-rw-r--r-- 1 oracle dba 174759 2009-06-15 02:20 MDY.csv
-rw-r--r-- 1 oracle dba 215309 2009-06-15 02:20 SPY.csv
-rw-r--r-- 1 oracle dba 127152 2009-06-15 02:20 XLB.csv
-rw-r--r-- 1 oracle dba 128525 2009-06-15 02:20 XLE.csv
-rw-r--r-- 1 oracle dba 128869 2009-06-15 02:20 XLF.csv
-rw-r--r-- 1 oracle dba 126368 2009-06-15 02:20 XLI.csv
-rw-r--r-- 1 oracle dba 127814 2009-06-15 02:20 XLK.csv
-rw-r--r-- 1 oracle dba 125970 2009-06-15 02:20 XLV.csv
oracle@zareason:~/vlsql/md/sim$
oracle@zareason:~/vlsql/md/sim$
oracle@zareason:~/vlsql/md/sim$
Load the .csv Files Into an Oracle Table
The next step of extracting the Yahoo data is accomplished with the
Oracle utility named SQL*Loader.
A shell script which calls SQL*Loader is listed below:
#! /bin/sh
# etfmas.sh
. /oracle/.orcl
set -x
cd /oracle/vlsql/md/sim/
cat DIA.csv | awk -F, '{print "DIA,"$1",DIA"$1","$6","$7}' | grep 0 > etfmas.csv
cat MDY.csv | awk -F, '{print "MDY,"$1",MDY"$1","$6","$7}' | grep 0 >> etfmas.csv
cat SPY.csv | awk -F, '{print "SPY,"$1",SPY"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLB.csv | awk -F, '{print "XLB,"$1",XLB"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLE.csv | awk -F, '{print "XLE,"$1",XLE"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLF.csv | awk -F, '{print "XLF,"$1",XLF"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLI.csv | awk -F, '{print "XLI,"$1",XLI"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLK.csv | awk -F, '{print "XLK,"$1",XLK"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLV.csv | awk -F, '{print "XLV,"$1",XLV"$1","$6","$7}' | grep 0 >> etfmas.csv
sqlplus trade/t <<EOF
DROP TABLE etfmas;
CREATE TABLE etfmas(tkr VARCHAR2(4),ydate DATE,tkr_date VARCHAR2(15),yvol NUMBER,yprice NUMBER)COMPRESS;
EOF
sqlldr trade/t bindsize=20971520 readsize=20971520 rows=123456 control=etfmas.ctl
grep loaded etfmas.log
sqlplus trade/t <<EOF
@etfmas.sql
EOF
# end of script
The above script depends on a SQL*Loader control file which is displayed below:
LOAD DATA
INFILE 'etfmas.csv'
replace INTO TABLE etfmas
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
tkr CHAR NULLIF (tkr=BLANKS)
,ydate DATE "YYYY-MM-DD" NULLIF (ydate=BLANKS)
,tkr_date CHAR NULLIF (tkr_date=BLANKS)
,yvol DECIMAL EXTERNAL NULLIF (yvol=BLANKS)
,yprice DECIMAL EXTERNAL NULLIF (yprice=BLANKS)
)
The Table etfmas
The above shell script runs in about 5 seconds.
The data lands in a table named etfmas.
13:55:54 SQL> @etfmas.sql
13:55:58 SQL> --
13:55:58 SQL> -- etfmas.sql
13:55:58 SQL> --
13:55:58 SQL> COLUMN years FORMAT 999.9
13:55:58 SQL> SELECT
13:55:58 2 tkr,COUNT(*)
13:55:58 3 ,MIN(ydate)
13:55:58 4 ,MAX(ydate)
13:55:58 5 ,(MAX(ydate)-MIN(ydate))/365 years
13:55:58 6 FROM etfmas
13:55:58 7 GROUP BY tkr ORDER BY tkr
13:55:58 8 /
TKR COUNT(*) MIN(YDATE) MAX(YDATE) YEARS
---- ---------- ---------- ---------- ------
DIA 2889 1998-01-20 2009-07-10 11.5
MDY 3498 1995-08-18 2009-07-10 13.9
SPY 4143 1993-01-29 2009-07-10 16.5
XLB 2653 1998-12-22 2009-07-10 10.6
XLE 2653 1998-12-22 2009-07-10 10.6
XLF 2653 1998-12-22 2009-07-10 10.6
XLI 2653 1998-12-22 2009-07-10 10.6
XLK 2653 1998-12-22 2009-07-10 10.6
XLV 2653 1998-12-22 2009-07-10 10.6
9 rows selected.
Elapsed: 00:00:00.06
13:55:58 SQL>
Notice the time spans. For the SPY ETF we have data going back to
1993. If these ETFs display either seasonal or monthly price
patterns, we should be able to see them.
Match Prices With Future Prices
Next, we build a view which matches each price of an ETF to a price
which is 40 trading days in the future.
Also we get a corresponding percentage gain.
--
-- cr_sim.sql
--
-- In a subquery, for each tkr_date we obtain the leading price
-- (yprice40) which is 40 trading days ahead in the future.
-- With yprice40, we get the percentage gain for that 40 day period.
-- Also we get the month to see if it is predictive.
CREATE OR REPLACE VIEW sim40 AS SELECT tkr_date,tkr,ydate,yprice,yprice40
,100*(yprice40 - yprice)/yprice pg
,0 + TO_CHAR ( ydate,'MM' ) mnth
FROM
(
SELECT
tkr_date,tkr,ydate,yprice
,LEAD(yprice,40,NULL)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice40
FROM etfmas
ORDER BY tkr_date
)
WHERE yprice40 IS NOT NULL AND yprice > 0
/
SET PAGESIZE 123
-- Look at tkr
SELECT tkr,ROUND(AVG(pg),2)FROM sim40
GROUP BY tkr ORDER BY tkr;
-- Look at mnth
SELECT mnth,ROUND(AVG(pg),2)FROM sim40
GROUP BY mnth ORDER BY mnth;
-- Look at tkr and mnth
SELECT tkr,mnth,ROUND(AVG(pg),2)FROM sim40
GROUP BY tkr,mnth ORDER BY tkr,mnth;
-- Look at tkr and mnth, but order by pg
SELECT tkr,mnth,ROUND(AVG(pg),2)FROM sim40
GROUP BY tkr,mnth ORDER BY ROUND(AVG(pg),2);
Here is a screen dump from the above sql script:
13:59:42 SQL>
13:59:44 SQL> @cr_sim.sql
13:59:51 SQL> --
13:59:51 SQL> -- cr_sim.sql
13:59:51 SQL> --
13:59:51 SQL>
13:59:51 SQL> -- In a subquery, for each tkr_date we obtain the leading price
13:59:51 SQL> -- (yprice40) which is 40 trading days ahead in the future.
13:59:51 SQL>
13:59:51 SQL> -- With yprice40, we get the percentage gain for that 40 day period.
13:59:51 SQL>
13:59:51 SQL> -- Also we get the month to see if it is predictive.
13:59:51 SQL>
13:59:51 SQL> CREATE OR REPLACE VIEW sim40 AS SELECT tkr_date,tkr,ydate,yprice,yprice40
13:59:51 2 ,100*(yprice40 - yprice)/yprice pg
13:59:51 3 ,0 + TO_CHAR ( ydate,'MM' ) mnth
13:59:51 4 FROM
13:59:51 5 (
13:59:51 6 SELECT
13:59:51 7 tkr_date,tkr,ydate,yprice
13:59:51 8 ,LEAD(yprice,40,NULL)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice40
13:59:51 9 FROM etfmas
13:59:51 10 ORDER BY tkr_date
13:59:51 11 )
13:59:51 12 WHERE yprice40 IS NOT NULL AND yprice > 0
13:59:51 13 /
View created.
Elapsed: 00:00:00.03
13:59:51 SQL>
13:59:51 SQL> SET PAGESIZE 123
13:59:51 SQL>
13:59:51 SQL> -- Look at tkr
13:59:51 SQL> SELECT tkr,ROUND(AVG(pg),2)FROM sim40
13:59:51 2 GROUP BY tkr ORDER BY tkr;
TKR ROUND(AVG(PG),2)
---- ----------------
DIA .58
MDY 1.58
SPY 1.2
XLB 1.06
XLE 1.86
XLF -.09
XLI .47
XLK -.3
XLV .32
9 rows selected.
Elapsed: 00:00:00.28
The above data was copied to a spreadsheet and charted:
The two ETFs, MDY and XLE performed the best.
The two ETFs, XLF and XLK performed the worst
Instead of averaging over tkr, we average over month:
13:59:51 SQL>
13:59:51 SQL> -- Look at mnth
13:59:51 SQL> SELECT mnth,ROUND(AVG(pg),2)FROM sim40
13:59:51 2 GROUP BY mnth ORDER BY mnth;
MNTH ROUND(AVG(PG),2)
---------- ----------------
1 -.83
2 2.77
3 5.53
4 2.82
5 -1.09
6 -1.28
7 -1.08
8 -2.73
9 .09
10 3.59
11 1.96
12 -.72
12 rows selected.
Elapsed: 00:00:00.32
The above data was copied to a spreadsheet and charted:
The chart tells me that I should sell in May and go away.
I should then return to the market in October.
Let's drill down into each of the ETF tkrs for each month.
13:59:51 SQL>
13:59:51 SQL> -- Look at tkr and mnth
13:59:51 SQL> SELECT tkr,mnth,ROUND(AVG(pg),2)FROM sim40
13:59:51 2 GROUP BY tkr,mnth ORDER BY tkr,mnth;
TKR MNTH ROUND(AVG(PG),2)
---- ---------- ----------------
DIA 1 -1.39
DIA 2 2.64
DIA 3 5.04
DIA 4 1.71
DIA 5 -1.62
DIA 6 -.94
DIA 7 -1.63
DIA 8 -3.26
DIA 9 1.21
DIA 10 4.28
DIA 11 1.75
DIA 12 -1.33
MDY 1 1.08
MDY 2 2.33
MDY 3 5.45
MDY 4 4.3
MDY 5 .07
MDY 6 -1.51
MDY 7 -.81
MDY 8 -1.57
MDY 9 .67
MDY 10 4.32
MDY 11 2.56
MDY 12 1.46
SPY 1 .15
SPY 2 1.5
SPY 3 3.43
SPY 4 2.92
SPY 5 .42
SPY 6 -.32
SPY 7 -.4
SPY 8 -.92
SPY 9 1.28
SPY 10 3.41
SPY 11 1.96
SPY 12 .74
XLB 1 .19
XLB 2 4.98
XLB 3 8.01
XLB 4 2.41
XLB 5 -2.85
XLB 6 -1.92
XLB 7 -2.72
XLB 8 -4.89
XLB 9 .4
XLB 10 6.18
XLB 11 3.46
XLB 12 -1.14
XLE 1 3.26
XLE 2 6.26
XLE 3 7.54
XLE 4 5.15
XLE 5 -.23
XLE 6 -1.52
XLE 7 -.11
XLE 8 -1.78
XLE 9 -2.74
XLE 10 1.83
XLE 11 2.48
XLE 12 1.15
XLF 1 -3.36
XLF 2 4.13
XLF 3 7.92
XLF 4 2.1
XLF 5 -2.49
XLF 6 -1.39
XLF 7 -.61
XLF 8 -2.87
XLF 9 -.49
XLF 10 .6
XLF 11 -1.1
XLF 12 -4.3
XLI 1 -2.4
XLI 2 3.46
XLI 3 7.83
XLI 4 3.27
XLI 5 -2.03
XLI 6 -1.2
XLI 7 -.81
XLI 8 -3.52
XLI 9 -1.1
XLI 10 3.29
XLI 11 1.25
XLI 12 -3.25
XLK 1 -3.98
XLK 2 0
XLK 3 2.28
XLK 4 1.77
XLK 5 -.77
XLK 6 -2.32
XLK 7 -2.22
XLK 8 -3.53
XLK 9 1.61
XLK 10 5.15
XLK 11 1.18
XLK 12 -2.86
XLV 1 -2.05
XLV 2 .43
XLV 3 3.46
XLV 4 1.41
XLV 5 -1.41
XLV 6 -.98
XLV 7 -.82
XLV 8 -3.66
XLV 9 -1.12
XLV 10 2.95
XLV 11 3.92
XLV 12 1.45
108 rows selected.
Elapsed: 00:00:00.13
Some of the above data was copied to a spreadsheet and charted.
The two best:
The two worst:
Run the above query again but order by avg_pg rather than by tkr.
Notice that the two "bad" tkrs, XLF and XLK congregate near the top of the output.
Notice that the two "good" tkrs, MDY and XLE congregate near the top of the output.
Also we see the "bad" months near the top and the "good" months near the bottom.
13:59:52 SQL>
13:59:52 SQL> -- Look at tkr and mnth, but order by pg
13:59:52 SQL> SELECT tkr,mnth,ROUND(AVG(pg),2)FROM sim40
13:59:52 2 GROUP BY tkr,mnth ORDER BY ROUND(AVG(pg),2);
TKR MNTH ROUND(AVG(PG),2)
---- ---------- ----------------
XLB 8 -4.89
XLF 12 -4.3
XLK 1 -3.98
XLV 8 -3.66
XLK 8 -3.53
XLI 8 -3.52
XLF 1 -3.36
DIA 8 -3.26
XLI 12 -3.25
XLF 8 -2.87
XLK 12 -2.86
XLB 5 -2.85
XLE 9 -2.74
XLB 7 -2.72
XLF 5 -2.49
XLI 1 -2.4
XLK 6 -2.32
XLK 7 -2.22
XLV 1 -2.05
XLI 5 -2.03
XLB 6 -1.92
XLE 8 -1.78
DIA 7 -1.63
DIA 5 -1.62
MDY 8 -1.57
XLE 6 -1.52
MDY 6 -1.51
XLV 5 -1.41
XLF 6 -1.39
DIA 1 -1.39
DIA 12 -1.33
XLI 6 -1.2
XLB 12 -1.14
XLV 9 -1.12
XLF 11 -1.1
XLI 9 -1.1
XLV 6 -.98
DIA 6 -.94
SPY 8 -.92
XLV 7 -.82
XLI 7 -.81
MDY 7 -.81
XLK 5 -.77
XLF 7 -.61
XLF 9 -.49
SPY 7 -.4
SPY 6 -.32
XLE 5 -.23
XLE 7 -.11
XLK 2 0
MDY 5 .07
SPY 1 .15
XLB 1 .19
XLB 9 .4
SPY 5 .42
XLV 2 .43
XLF 10 .6
MDY 9 .67
SPY 12 .74
MDY 1 1.08
XLE 12 1.15
XLK 11 1.18
DIA 9 1.21
XLI 11 1.25
SPY 9 1.28
XLV 4 1.41
XLV 12 1.45
MDY 12 1.46
SPY 2 1.5
XLK 9 1.61
DIA 4 1.71
DIA 11 1.75
XLK 4 1.77
XLE 10 1.83
SPY 11 1.96
XLF 4 2.1
XLK 3 2.28
MDY 2 2.33
XLB 4 2.41
XLE 11 2.48
MDY 11 2.56
DIA 2 2.64
SPY 4 2.92
XLV 10 2.95
XLE 1 3.26
XLI 4 3.27
XLI 10 3.29
SPY 10 3.41
SPY 3 3.43
XLV 3 3.46
XLB 11 3.46
XLI 2 3.46
XLV 11 3.92
XLF 2 4.13
DIA 10 4.28
MDY 4 4.3
MDY 10 4.32
XLB 2 4.98
DIA 3 5.04
XLE 4 5.15
XLK 10 5.15
MDY 3 5.45
XLB 10 6.18
XLE 2 6.26
XLE 3 7.54
XLI 3 7.83
XLF 3 7.92
XLB 3 8.01
108 rows selected.
Elapsed: 00:00:00.13
13:59:52 SQL>
13:59:54 SQL>
Analysis
Over all the months, the ETFs which gave good average returns where
MDY and XLE.
Over all the ETFs, the months which gave good average returns where
2, 3, 4, 10, 11.
Over all the ETFs, the months which gave bad average returns where
1, 5, 6, 7, 8, 12
So the statement, "Sell in May and Go-Away" has some validity.
And when should we return?
Answer: October.
Matching the "good" ETFs with the "good" months does give good results:
15:34:44 SQL> --
15:34:44 SQL> -- good2.sql
15:34:44 SQL> --
15:34:44 SQL>
15:34:44 SQL> SELECT ROUND(AVG(pg),2)FROM sim40
15:34:44 2 WHERE tkr IN('MDY','XLE')
15:34:44 3 AND mnth IN(2,3,4,10,11)
15:34:44 4 /
ROUND(AVG(PG),2)
----------------
4.22
Elapsed: 00:00:00.01
15:34:44 SQL>
15:34:44 SQL> SELECT tkr,mnth,ROUND(AVG(pg),2)FROM sim40
15:34:44 2 WHERE tkr IN('MDY','XLE')
15:34:44 3 AND mnth IN(2,3,4,10,11)
15:34:44 4 GROUP BY tkr,mnth ORDER BY tkr,mnth
15:34:44 5 /
TKR MNTH ROUND(AVG(PG),2)
---- ---------- ----------------
MDY 2 2.33
MDY 3 5.45
MDY 4 4.3
MDY 10 4.32
MDY 11 2.56
XLE 2 6.26
XLE 3 7.54
XLE 4 5.15
XLE 10 1.83
XLE 11 2.48
10 rows selected.
Elapsed: 00:00:00.00
The above data was copied to a spreadsheet and charted.
And matching bad ETFs with bad rounded months gives bad results
(which might be good for One brave enough to short the market):
16:13:00 SQL> --
16:13:00 SQL> -- bad2.sql
16:13:00 SQL> --
16:13:00 SQL>
16:13:00 SQL> SELECT ROUND(AVG(pg),2)FROM sim40
16:13:00 2 WHERE tkr IN('XLF','XLK')
16:13:00 3 AND mnth IN(1, 5, 6, 7, 8, 12)
16:13:00 4 /
ROUND(AVG(PG),2)
----------------
-2.57
Elapsed: 00:00:00.01
16:13:00 SQL>
16:13:00 SQL> SELECT tkr,mnth,ROUND(AVG(pg),2)FROM sim40
16:13:00 2 WHERE tkr IN('XLF','XLK')
16:13:00 3 AND mnth IN(1, 5, 6, 7, 8, 12)
16:13:00 4 GROUP BY tkr,mnth ORDER BY tkr,mnth
16:13:00 5 /
TKR MNTH ROUND(AVG(PG),2)
---- ---------- ----------------
XLF 1 -3.36
XLF 5 -2.49
XLF 6 -1.39
XLF 7 -.61
XLF 8 -2.87
XLF 12 -4.3
XLK 1 -3.98
XLK 5 -.77
XLK 6 -2.32
XLK 7 -2.22
XLK 8 -3.53
XLK 12 -2.86
12 rows selected.
Elapsed: 00:00:00.12
The above data was copied to a spreadsheet and charted.
Conclusion
Over the past 16 years, monthly price patterns are apparent.
An investment strategy which recognizes both strong ETFs and monthly
price patterns would be a wise way to participate in the stock market.