Mixing Oracle and Ruby With ETF Price Data
Dan Bikle -- http://bikle.com -- 2009-06-15Topics
Programming Techniques/Syntax
Motivation
A stock market investor might ask simple questions: - Should I react to the slope of the moving average of the price of my stock holdings? - Or, when I buy a stock, should I just hold it until I need the money? A database programmer may use this paper as a collection of techniques: - Acquisition of trading data - Authoring SQL-Analytic functions - Implementing both Star and Snow-Flake schemas - Mapping fact-tables, Star and Snow-Flake schemas to OLAP cubes and dimensions - Operating Oracle Analytic Workspace Manager (AWM) - Operating the Measure Data Viewer in AWM - Using Ruby to both execute and generate SQL - Operating the SVM algorithm within Oracle - Interpreting results from the SVM algorithm
Trading Strategies
Specifically, we look at three trading strategies and how they might be combined with the slopes of moving averages of ETF prices. - Buy and Hold (bh) - Long or Cash (loc) - Long or Short (los) Buy and Hold means that we buy and hold a long position in a specific Exchange Traded Fund (ETF) for a period of time. Long or Cash means that we buy and hold a long position in a specific ETF for a period of time, or we hold cash for that period of time. Long or Short means that we buy and hold a long position in a specific ETF for a period of time, or we hold a short position in that ETF for that period of time.
Pivot-Point and Holding-Period
In each of the strategies we explore 2 degrees of freedom:
- The pivot-point, or the amount of time embedded within the moving
average slope (5, 10, or 20 days)
- The holding-period, or amount of time an ETF position is held
before the position is evaluated (5, 10, or 20 days)
We make note that pivot-point does not apply to the bh-strategy.
los Example
If for example we want to model the los strategy we pick a pivot-point of say 20 days. Next, we pick a specific ETF, DIA for example. Then, we pick a holding period of say, 10 days. Next, we pick a starting date, 2006-01-02 maybe. Then for that date we inspect the slope of the 20-day moving average of DIA on 2006-01-02. If the slope is positive we simulate a buy of 1 share of DIA at the closing price os 2006-01-02 otherwise we simulate a short of 1 share of DIA. Next, we look at the closing price of DIA 10-trading-days after 2006-01-02 which is 2006-01-16. Then, we compute the percentage gain or loss for that simulated scenario and store it in a database table. The above example suggests that the columns of the table has these names: -strategy (this holds the value 'los') -pivot_point -holding_period -tkr (this holds the value 'DIA') -ydate (date of the simulated buy) -pct_gain If we want to model 3 pivot-points (5, 10, and 20 days) with 3 holding-periods (5, 10, 20 days), then it is easy to see that the DIA ETF simulation generates 6 rows in this table for the buy-date of 2006-01-02. If we want to model all of 2006, then we multiply the 6 rows by the number of trading days in 2006 which is 6 x 250 = 1500 rows.
List of ETFs
Arbitrarily we select this list of ETFs to study:
23:23:10 SQL> select tkr,min(ydate)from etfmas group by tkr order by min(ydate);
TKR MIN(YDATE)
---- ----------
SPY 1993-01-29
MDY 1995-08-18
DIA 1998-01-20
XLE 1998-12-22
XLI 1998-12-22
XLV 1998-12-22
XLF 1998-12-22
XLK 1998-12-22
XLB 1998-12-22
OIH 2001-02-26
EFA 2001-08-27
SHY 2002-07-31
Based on the query above we set the starting date of our analysis to be Tuesday, 2002-09-03 which is 1 month past the earliest data point we have for the SHY ETF (2002-07-31).
Our Approach
We have expressed some information about our general goal along with a description of the data we intend to study. How will we answer the question posed in the motivation section at the beginning? We write SQL queries, bolstered by Oracle Analytic Functions, to answer the above question. Additionally we Extract, Transform, and then Load the data into an OLAP datastore. Then we run simple OLAP-dimensional reports against the OLAP datastore. These reports give us an idea of the general effectiveness for the strategies: bh, loc, and los. Next, we setup a backtesting infrastructure for the SVM algorithm in Oracle Data Miner. The backtesting infrastructure constrains the SVM algorithm so that it has a choice of following the above three strategies or doing nothing. After running a series of backtests we have data about SVM algorithm reaction to historical price movements. Ordinary SQL is then used to report on the effectiveness of SVM. We then answer the question: "Can the SVM algorithm predict opportune times to implement the strategies bh, loc, los?"
ETL of Yahoo ETF Data
Yahoo provides pricing data for the above list of ETFs. The script listed below was used to obtain this data
#! /bin/sh
# wget.sh
# Use wget to download csv files full of pricing data from finance.yahoo.com
. /oracle/.orcl
set -x
# cd to the right place
cd /oracle/vlsql/mkt_data/etfmas/
# wget em
wget --output-document=DIA.csv http://ichart.finance.yahoo.com/table.csv?s=DIA
wget --output-document=EFA.csv http://ichart.finance.yahoo.com/table.csv?s=EFA
wget --output-document=MDY.csv http://ichart.finance.yahoo.com/table.csv?s=MDY
wget --output-document=OIH.csv http://ichart.finance.yahoo.com/table.csv?s=OIH
wget --output-document=SHY.csv http://ichart.finance.yahoo.com/table.csv?s=SHY
wget --output-document=SPY.csv http://ichart.finance.yahoo.com/table.csv?s=SPY
wget --output-document=XLB.csv http://ichart.finance.yahoo.com/table.csv?s=XLB
wget --output-document=XLE.csv http://ichart.finance.yahoo.com/table.csv?s=XLE
wget --output-document=XLF.csv http://ichart.finance.yahoo.com/table.csv?s=XLF
wget --output-document=XLI.csv http://ichart.finance.yahoo.com/table.csv?s=XLI
wget --output-document=XLK.csv http://ichart.finance.yahoo.com/table.csv?s=XLK
wget --output-document=XLV.csv http://ichart.finance.yahoo.com/table.csv?s=XLV
The above script runs very quickly, less than 10 seconds on an ordinary Linux host connected to an ordinary DSL modem.
wget Screendump
oracle@zareason:~/vlsql/md/etfmas$ ./wget.sh
+ cd /oracle/vlsql/mkt_data/etfmas/
+ wget --output-document=DIA.csv 'http://ichart.finance.yahoo.com/table.csv?s=DIA'
--10:41:41-- http://ichart.finance.yahoo.com/table.csv?s=DIA
=> `DIA.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 146,470 278.38K/s
10:41:42 (277.87 KB/s) - `DIA.csv' saved [146470]
+ wget --output-document=EFA.csv 'http://ichart.finance.yahoo.com/table.csv?s=EFA'
--10:41:42-- http://ichart.finance.yahoo.com/table.csv?s=EFA
=> `EFA.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 96,792 250.84K/s
10:41:43 (250.11 KB/s) - `EFA.csv' saved [96792]
+ wget --output-document=MDY.csv 'http://ichart.finance.yahoo.com/table.csv?s=MDY'
--10:41:43-- http://ichart.finance.yahoo.com/table.csv?s=MDY
=> `MDY.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 172,154 293.94K/s
10:41:43 (293.29 KB/s) - `MDY.csv' saved [172154]
+ wget --output-document=OIH.csv 'http://ichart.finance.yahoo.com/table.csv?s=OIH'
--10:41:43-- http://ichart.finance.yahoo.com/table.csv?s=OIH
=> `OIH.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 104,036 295.36K/s
10:41:44 (294.96 KB/s) - `OIH.csv' saved [104036]
+ wget --output-document=SHY.csv 'http://ichart.finance.yahoo.com/table.csv?s=SHY'
--10:41:44-- http://ichart.finance.yahoo.com/table.csv?s=SHY
=> `SHY.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 80,700 297.73K/s
10:41:44 (297.48 KB/s) - `SHY.csv' saved [80700]
+ wget --output-document=SPY.csv 'http://ichart.finance.yahoo.com/table.csv?s=SPY'
--10:41:44-- http://ichart.finance.yahoo.com/table.csv?s=SPY
=> `SPY.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 212,759 249.86K/s
10:41:46 (249.10 KB/s) - `SPY.csv' saved [212759]
+ wget --output-document=XLB.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLB'
--10:41:46-- http://ichart.finance.yahoo.com/table.csv?s=XLB
=> `XLB.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 124,685 293.89K/s
10:41:46 (293.59 KB/s) - `XLB.csv' saved [124685]
+ wget --output-document=XLE.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLE'
--10:41:46-- http://ichart.finance.yahoo.com/table.csv?s=XLE
=> `XLE.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 126,026 251.35K/s
10:41:47 (250.50 KB/s) - `XLE.csv' saved [126026]
+ wget --output-document=XLF.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLF'
--10:41:47-- http://ichart.finance.yahoo.com/table.csv?s=XLF
=> `XLF.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 126,354 304.52K/s
10:41:47 (304.35 KB/s) - `XLF.csv' saved [126354]
+ wget --output-document=XLI.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLI'
--10:41:47-- http://ichart.finance.yahoo.com/table.csv?s=XLI
=> `XLI.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 123,884 268.63K/s
10:41:48 (268.06 KB/s) - `XLI.csv' saved [123884]
+ wget --output-document=XLK.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLK'
--10:41:48-- http://ichart.finance.yahoo.com/table.csv?s=XLK
=> `XLK.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 125,357 293.65K/s
10:41:48 (293.29 KB/s) - `XLK.csv' saved [125357]
+ wget --output-document=XLV.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLV'
--10:41:48-- http://ichart.finance.yahoo.com/table.csv?s=XLV
=> `XLV.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
[ <=> ] 123,517 276.35K/s
10:41:49 (275.77 KB/s) - `XLV.csv' saved [123517]
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ ll *csv
-rw-r--r-- 1 oracle dba 146470 2009-04-02 10:41 DIA.csv
-rw-r--r-- 1 oracle dba 96792 2009-04-02 10:41 EFA.csv
-rw-r--r-- 1 oracle dba 172154 2009-04-02 10:41 MDY.csv
-rw-r--r-- 1 oracle dba 104036 2009-04-02 10:41 OIH.csv
-rw-r--r-- 1 oracle dba 80700 2009-04-02 10:41 SHY.csv
-rw-r--r-- 1 oracle dba 212759 2009-04-02 10:41 SPY.csv
-rw-r--r-- 1 oracle dba 124685 2009-04-02 10:41 XLB.csv
-rw-r--r-- 1 oracle dba 126026 2009-04-02 10:41 XLE.csv
-rw-r--r-- 1 oracle dba 126354 2009-04-02 10:41 XLF.csv
-rw-r--r-- 1 oracle dba 123884 2009-04-02 10:41 XLI.csv
-rw-r--r-- 1 oracle dba 125357 2009-04-02 10:41 XLK.csv
-rw-r--r-- 1 oracle dba 123517 2009-04-02 10:41 XLV.csv
oracle@zareason:~/vlsql/md/etfmas$
The next step of extracting the Yahoo data was accomplished with the Oracle utility named SQL*Loader.
Use SQL*Loader To "Extract" (the E in ETL)
A shell script which calls SQL*Loader is listed below:
#!/bin/sh
# etfmas.sh
. /oracle/.orcl
set -x
cd /oracle/vlsql/mkt_data/etfmas/
cat DIA.csv | awk -F, '{print "DIA,"$1",DIA"$1","$6","$7}' | grep 0 > etfmas.csv
cat EFA.csv | awk -F, '{print "EFA,"$1",EFA"$1","$6","$7}' | grep 0 >> etfmas.csv
cat MDY.csv | awk -F, '{print "MDY,"$1",MDY"$1","$6","$7}' | grep 0 >> etfmas.csv
cat OIH.csv | awk -F, '{print "OIH,"$1",OIH"$1","$6","$7}' | grep 0 >> etfmas.csv
cat SHY.csv | awk -F, '{print "SHY,"$1",SHY"$1","$6","$7}' | grep 0 >> etfmas.csv
cat SPY.csv | awk -F, '{print "SPY,"$1",SPY"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLB.csv | awk -F, '{print "XLB,"$1",XLB"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLE.csv | awk -F, '{print "XLE,"$1",XLE"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLF.csv | awk -F, '{print "XLF,"$1",XLF"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLI.csv | awk -F, '{print "XLI,"$1",XLI"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLK.csv | awk -F, '{print "XLK,"$1",XLK"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLV.csv | awk -F, '{print "XLV,"$1",XLV"$1","$6","$7}' | grep 0 >> etfmas.csv
sqlplus trade/t <<EOF
DROP TABLE etfmas;
CREATE TABLE etfmas(tkr VARCHAR2(4),ydate DATE,tkr_date VARCHAR2(15),yvol NUMBER,yprice NUMBER)COMPRESS;
EOF
sqlldr trade/t bindsize=20971520 readsize=20971520 rows=123456 control=etfmas.ctl
grep loaded etfmas.log
sqlplus trade/t <<EOF
@etfmas.sql
EOF
SQL*Loader Screendump
A screen dump from the above script is displayed below. Notice that the script ran in about 1 second:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ date; ./etfmas.sh ; date
Fri Apr 3 09:29:31 PDT 2009
+ cd /oracle/vlsql/mkt_data/etfmas/
+ cat DIA.csv
+ awk -F, '{print "DIA,"$1",DIA"$1","$6","$7}'
+ grep 0
+ cat EFA.csv
+ awk -F, '{print "EFA,"$1",EFA"$1","$6","$7}'
+ grep 0
+ cat MDY.csv
+ awk -F, '{print "MDY,"$1",MDY"$1","$6","$7}'
+ grep 0
+ cat OIH.csv
+ awk -F, '{print "OIH,"$1",OIH"$1","$6","$7}'
+ grep 0
+ cat SHY.csv
+ awk -F, '{print "SHY,"$1",SHY"$1","$6","$7}'
+ grep 0
+ cat SPY.csv
+ awk -F, '{print "SPY,"$1",SPY"$1","$6","$7}'
+ grep 0
+ cat XLB.csv
+ awk -F, '{print "XLB,"$1",XLB"$1","$6","$7}'
+ grep 0
+ cat XLE.csv
+ awk -F, '{print "XLE,"$1",XLE"$1","$6","$7}'
+ grep 0
+ cat XLF.csv
+ awk -F, '{print "XLF,"$1",XLF"$1","$6","$7}'
+ grep 0
+ cat XLI.csv
+ awk -F, '{print "XLI,"$1",XLI"$1","$6","$7}'
+ grep 0
+ cat XLK.csv
+ awk -F, '{print "XLK,"$1",XLK"$1","$6","$7}'
+ grep 0
+ cat XLV.csv
+ awk -F, '{print "XLV,"$1",XLV"$1","$6","$7}'
+ grep 0
+ sqlplus trade/t
SQL*Plus: Release 11.1.0.6.0 - Production on Fri Apr 3 09:29:31 2009
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Session altered.
09:29:31 SQL> SET ECHO ON TIME ON TIMING ON LINES 155 PAGES 99
09:29:31 SQL>
Table dropped.
Elapsed: 00:00:00.10
09:29:32 SQL>
Table created.
Elapsed: 00:00:00.03
09:29:32 SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
+ sqlldr trade/t bindsize=20971520 readsize=20971520 rows=123456 control=etfmas.ctl
SQL*Loader: Release 11.1.0.6.0 - Production on Fri Apr 3 09:29:32 2009
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Commit point reached - logical record count 16256
Commit point reached - logical record count 31464
+ grep loaded etfmas.log
Table ETFMAS, loaded from every logical record.
31464 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
+ sqlplus trade/t
SQL*Plus: Release 11.1.0.6.0 - Production on Fri Apr 3 09:29:32 2009
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Session altered.
09:29:32 SQL> SET ECHO ON TIME ON TIMING ON LINES 155 PAGES 99
09:29:32 SQL> 09:29:32 SQL> --
09:29:32 SQL> -- etfmas.sql
09:29:32 SQL> --
09:29:32 SQL>
09:29:32 SQL> SELECT tkr,COUNT(*),MIN(ydate),MAX(ydate)FROM etfmas GROUP BY tkr ORDER BY tkr ;
TKR COUNT(*) MIN(YDATE) MAX(YDATE)
---- ---------- ---------- ----------
DIA 2821 1998-01-20 2009-04-02
EFA 1910 2001-08-27 2009-04-02
MDY 3430 1995-08-18 2009-04-02
OIH 2037 2001-02-26 2009-04-02
SHY 1681 2002-07-31 2009-04-02
SPY 4075 1993-01-29 2009-04-02
XLB 2585 1998-12-22 2009-04-02
XLE 2585 1998-12-22 2009-04-02
XLF 2585 1998-12-22 2009-04-02
XLI 2585 1998-12-22 2009-04-02
XLK 2585 1998-12-22 2009-04-02
XLV 2585 1998-12-22 2009-04-02
12 rows selected.
Elapsed: 00:00:00.06
09:29:32 SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Fri Apr 3 09:29:32 PDT 2009
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
Evident from the screendump above is that Extraction (the E in ETL), of stock prices from yahoo.com flows data into a table named etfmas. Once etfmas is filled with data, we can consider Extraction for this effort to be finished.
Transform (the T in ETL)
The script displayed below does the first step of the Transform:
--
-- mas.sql
--
CONNECT trade/t
-- Creates a view full of much of the data needed to drive my Moving-Avg-Study (MAS).
-- Some of the thinking behind this script is written down here:
-- vlsql/notes/notes2008_1209xp.txt
-- MAS depends on Moving Average so I use an inline view to get 3 types of Moving Averages.
CREATE OR REPLACE VIEW etfmai AS SELECT tkr_date,tkr,ydate,yprice,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
-- Get MAS from past 5days, 10days, 20days. Use LAG() to get yesterday's Moving Average
,yprice_avg5-LAG(yprice_avg5)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg5slp
,yprice_avg10-LAG(yprice_avg10)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg10slp
,yprice_avg20-LAG(yprice_avg20)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg20slp
-- Get PG for future 5days, 10days, 20days
,100 * (yprice5 - yprice)/yprice pct_gain5
,100 * (yprice10 - yprice)/yprice pct_gain10
,100 * (yprice20 - yprice)/yprice pct_gain20
FROM
(
SELECT tkr_date,tkr,ydate,yprice,yvol
-- To get MAS, I 1st need Moving Avg. I like 3 types: 5day, 10day, 20day
,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) yprice_avg5
,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) yprice_avg10
,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)yprice_avg20
-- To get future PG I use LEAD()
,LEAD(yprice,5,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date) yprice5
,LEAD(yprice,10,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice10
,LEAD(yprice,20,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice20
FROM etfmas WHERE yprice>0
ORDER BY tkr_date
)
ORDER BY tkr_date
/
-- Create a view which will help create the mas table.
CREATE OR REPLACE VIEW mas10 AS SELECT tkr_date,sysdate - ydate dayspast,yprice,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
-- The columns below should not propagate to the SVM-build-score tables.
-- They need to be dropped or filtered out later. For now, I need them.
,pct_gain5
,pct_gain10
,pct_gain20
,tkr
,ydate
--
FROM etfmai ORDER BY tkr_date
/
-- The mas table contains a super-set of the columns which will go into the SVM-build-score tables.
-- Add slope columns from SHY, and XLE under assumption that interest rates and energy prices affect ETF prices.
DROP TABLE mas;
CREATE TABLE mas AS SELECT m.tkr_date,sysdate - m.ydate dayspast,m.yprice,m.yvol
,m.yprice_avg5
,m.yprice_avg10
,m.yprice_avg20
,m.yprice_avg5slp
,m.yprice_avg10slp
,m.yprice_avg20slp
,'nup' aa_targ_att
--
,shy.yprice_avg5slp shy_yprice_avg5slp
,xle.yprice_avg5slp xle_yprice_avg5slp
--
,shy.yprice_avg10slp shy_yprice_avg10slp
,xle.yprice_avg10slp xle_yprice_avg10slp
--
,shy.yprice_avg20slp shy_yprice_avg20slp
,xle.yprice_avg20slp xle_yprice_avg20slp
--
-- The columns below should not propagate to the SVM-build-score tables.
-- They need to be dropped or filtered out later. For now, I need them.
,m.pct_gain5
,m.pct_gain10
,m.pct_gain20
,m.tkr
,m.ydate
FROM mas10 m, mas10 shy, mas10 xle
WHERE m.ydate=shy.ydate AND m.ydate=xle.ydate
AND shy.tkr='SHY'
AND xle.tkr='XLE'
ORDER BY m.tkr_date
/
-- The model-builder chokes on wide target attributes. Slim down to 3.
ALTER TABLE mas MODIFY aa_targ_att VARCHAR2(3);
mas.sql Screendump
21:56:19 SQL>
21:56:20 SQL> @mas.sql
21:56:25 SQL> --
21:56:25 SQL> -- mas.sql
21:56:25 SQL> --
21:56:25 SQL>
21:56:25 SQL> CONNECT trade/t
Connected.
Session altered.
Elapsed: 00:00:00.00
21:56:25 SQL> SET ECHO ON TIME ON TIMING ON LINES 155 PAGES 99
21:56:25 SQL>
21:56:25 SQL> -- Creates a view full of much of the data needed to drive my Moving-Avg-Study (MAS).
21:56:25 SQL> -- Some of the thinking behind this script is written down here:
21:56:25 SQL> -- vlsql/notes/notes2008_1209xp.txt
21:56:25 SQL>
21:56:25 SQL> -- MAS depends on Moving Average so I use an inline view to get 3 types of Moving Averages.
21:56:25 SQL> CREATE OR REPLACE VIEW etfmai AS SELECT tkr_date,tkr,ydate,yprice,yvol
21:56:25 2 ,yprice_avg5
21:56:25 3 ,yprice_avg10
21:56:25 4 ,yprice_avg20
21:56:25 5 -- Get MAS from past 5days, 10days, 20days. Use LAG() to get yesterday's Moving Average
21:56:25 6 ,yprice_avg5-LAG(yprice_avg5)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg5slp
21:56:25 7 ,yprice_avg10-LAG(yprice_avg10)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg10slp
21:56:25 8 ,yprice_avg20-LAG(yprice_avg20)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg20slp
21:56:25 9 -- Get PG for future 5days, 10days, 20days
21:56:25 10 ,100 * (yprice5 - yprice)/yprice pct_gain5
21:56:25 11 ,100 * (yprice10 - yprice)/yprice pct_gain10
21:56:25 12 ,100 * (yprice20 - yprice)/yprice pct_gain20
21:56:25 13 FROM
21:56:25 14 (
21:56:25 15 SELECT tkr_date,tkr,ydate,yprice,yvol
21:56:25 16 -- To get MAS, I 1st need Moving Avg. I like 3 types: 5day, 10day, 20day
21:56:25 17 ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) yprice_avg5
21:56:25 18 ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) yprice_avg10
21:56:25 19 ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)yprice_avg20
21:56:25 20 -- To get future PG I use LEAD()
21:56:25 21 ,LEAD(yprice,5,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date) yprice5
21:56:25 22 ,LEAD(yprice,10,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice10
21:56:25 23 ,LEAD(yprice,20,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice20
21:56:25 24 FROM etfmas WHERE yprice>0
21:56:25 25 ORDER BY tkr_date
21:56:25 26 )
21:56:25 27 ORDER BY tkr_date
21:56:25 28 /
View created.
Elapsed: 00:00:00.79
21:56:25 SQL>
21:56:25 SQL> -- Create a view which will help create the mas table.
21:56:25 SQL> CREATE OR REPLACE VIEW mas10 AS SELECT tkr_date,sysdate - ydate dayspast,yprice,yvol
21:56:25 2 ,yprice_avg5
21:56:25 3 ,yprice_avg10
21:56:25 4 ,yprice_avg20
21:56:25 5 ,yprice_avg5slp
21:56:25 6 ,yprice_avg10slp
21:56:25 7 ,yprice_avg20slp
21:56:25 8 -- The columns below should not propagate to the SVM-build-score tables.
21:56:25 9 -- They need to be dropped or filtered out later. For now, I need them.
21:56:25 10 ,pct_gain5
21:56:25 11 ,pct_gain10
21:56:25 12 ,pct_gain20
21:56:25 13 ,tkr
21:56:25 14 ,ydate
21:56:25 15 --
21:56:25 16 FROM etfmai ORDER BY tkr_date
21:56:25 17 /
View created.
Elapsed: 00:00:00.04
21:56:25 SQL>
21:56:25 SQL> -- The mas table contains a super-set of the columns which will go into the SVM-build-score tables.
21:56:25 SQL> -- Add slope columns from SHY, and XLE under assumption that interest rates and energy prices affect ETF prices.
21:56:25 SQL> DROP TABLE mas;
Table dropped.
Elapsed: 00:00:00.13
21:56:26 SQL> CREATE TABLE mas AS SELECT m.tkr_date,sysdate - m.ydate dayspast,m.yprice,m.yvol
21:56:26 2 ,m.yprice_avg5
21:56:26 3 ,m.yprice_avg10
21:56:26 4 ,m.yprice_avg20
21:56:26 5 ,m.yprice_avg5slp
21:56:26 6 ,m.yprice_avg10slp
21:56:26 7 ,m.yprice_avg20slp
21:56:26 8 ,'nup' aa_targ_att
21:56:26 9 --
21:56:26 10 ,shy.yprice_avg5slp shy_yprice_avg5slp
21:56:26 11 ,xle.yprice_avg5slp xle_yprice_avg5slp
21:56:26 12 --
21:56:26 13 ,shy.yprice_avg10slp shy_yprice_avg10slp
21:56:26 14 ,xle.yprice_avg10slp xle_yprice_avg10slp
21:56:26 15 --
21:56:26 16 ,shy.yprice_avg20slp shy_yprice_avg20slp
21:56:26 17 ,xle.yprice_avg20slp xle_yprice_avg20slp
21:56:26 18 --
21:56:26 19 -- The columns below should not propagate to the SVM-build-score tables.
21:56:26 20 -- They need to be dropped or filtered out later. For now, I need them.
21:56:26 21 ,m.pct_gain5
21:56:26 22 ,m.pct_gain10
21:56:26 23 ,m.pct_gain20
21:56:26 24 ,m.tkr
21:56:26 25 ,m.ydate
21:56:26 26 FROM mas10 m, mas10 shy, mas10 xle
21:56:26 27 WHERE m.ydate=shy.ydate AND m.ydate=xle.ydate
21:56:26 28 AND shy.tkr='SHY'
21:56:26 29 AND xle.tkr='XLE'
21:56:26 30 ORDER BY m.tkr_date
21:56:26 31 /
Table created.
Elapsed: 00:00:01.30
21:56:27 SQL>
21:56:27 SQL> -- The model-builder chokes on wide target attributes. Slim down to 3.
21:56:27 SQL> ALTER TABLE mas MODIFY aa_targ_att VARCHAR2(3);
Table altered.
Elapsed: 00:00:00.01
21:56:27 SQL>
21:56:32 SQL>
As we can see, the script only needed 12 seconds to run.
List of Columns in the mas Table
22:04:11 SQL> DESCRIBE mas
Name Null? Type
-------------------------- -------- ------------------
TKR_DATE VARCHAR2(15)
DAYSPAST NUMBER
YPRICE NUMBER
YVOL NUMBER
YPRICE_AVG5 NUMBER
YPRICE_AVG10 NUMBER
YPRICE_AVG20 NUMBER
YPRICE_AVG5SLP NUMBER
YPRICE_AVG10SLP NUMBER
YPRICE_AVG20SLP NUMBER
AA_TARG_ATT VARCHAR2(3)
SHY_YPRICE_AVG5SLP NUMBER
XLE_YPRICE_AVG5SLP NUMBER
SHY_YPRICE_AVG10SLP NUMBER
XLE_YPRICE_AVG10SLP NUMBER
SHY_YPRICE_AVG20SLP NUMBER
XLE_YPRICE_AVG20SLP NUMBER
PCT_GAIN5 NUMBER
PCT_GAIN10 NUMBER
PCT_GAIN20 NUMBER
TKR VARCHAR2(4)
YDATE DATE
22:04:16 SQL>
Transform Even More(the T in ETL)
The next part of our transformation step involves copying data from the mas table into dimension tables and a fact table. A comprehensive discussion about dimension tables and a fact tables can be found at this URL: http://philip.greenspun.com/sql/data-warehousing.html We focus first on the most important thing in our data. Obviously the most important columns in the mas table are PCT_GAIN5, PCT_GAIN10, and PCT_GAIN20. If we can find a cause and effect relationship between any of the columns like TKR or SHY_YPRICE_AVG5SLP and percentage gain, then our work is done and we have the knowledge we need to get rich. We will use data warehousing jargon to label percent_gain as a "measure". Typically in data warehousing efforts, the measure corresponds to some attribute in some system which you want to predict. From an implementaion perspective, the measure only appears in a fact table. The dimensions appear in dimension tables. Both the fact table and its dimension tables need to be constructed so they can be joined. The other columns in the mas table correspond to "dimensions". If the values of dimensions (alone or working in concert) affect the values of a measure we want to see this relationship. OLAP reports are sometimes well suited for showing the relationships between dimensions and a measure. Arbitrarily, we have chosen to connect the following dimensions to our first OLAP data store: - time - tkr - strategy - holding_period - pivot_point The time dimension is the most complicated due to the fact that it is hierarchical. A script to create the tables for the time dimension is displayed below:
--
-- cr_time_dimension.sql
--
-- Start by creating a table full of dates.
DROP TABLE dropme_soon;
CREATE TABLE dropme_soon(ydate DATE);
BEGIN
FOR i IN 1..21234 LOOP
INSERT INTO dropme_soon SELECT TRUNC(TO_DATE('2020-01-01','YYYY-MM-DD')) - i FROM dual;
END LOOP;
END;
/
SELECT MIN(ydate),MAX(ydate),COUNT(*)FROM dropme_soon;
-- Next create a table full of days which exist between the endpoints of the data-dates in table: mas.
DROP TABLE day_dimension ;
CREATE TABLE day_dimension AS
SELECT
0+TO_CHAR(ydate,'YYYYMMDD')day_id
,0+TO_CHAR(ydate,'YYYYMM')month_id
,TO_CHAR(ydate,'YYYY_MM_DD')day_dsc
,ydate day_end_date
,1 day_time_span
FROM dropme_soon
WHERE ydate BETWEEN (SELECT MIN(ydate)FROM mas)AND(SELECT MAX(ydate)FROM mas)
/
-- Months next.
DROP TABLE month_dimension ;
CREATE TABLE month_dimension AS
SELECT
0+TO_CHAR(day_end_date,'YYYYMM')month_id
,0+TO_CHAR(day_end_date,'YYYY')year_id
,TO_CHAR(day_end_date,'YYYY_MM')month_dsc
,MAX(day_end_date) month_end_date
,1 + MAX(day_end_date) - MIN(day_end_date) month_time_span
FROM day_dimension
GROUP BY
0+TO_CHAR(day_end_date,'YYYYMM')
,0+TO_CHAR(day_end_date,'YYYY')
,TO_CHAR(day_end_date,'YYYY_MM')
/
SELECT COUNT(*)FROM month_dimension ORDER BY month_id;
-- SELECT * FROM month_dimension ORDER BY month_id;
-- Years next.
DROP TABLE year_dimension ;
CREATE TABLE year_dimension AS
SELECT
0+TO_CHAR(day_end_date,'YYYY')year_id
,1 all_time_id
,TO_CHAR(day_end_date,'YYYY')year_dsc
,MAX(day_end_date) year_end_date
,1 + MAX(day_end_date) - MIN(day_end_date) year_time_span
FROM day_dimension
GROUP BY
0+TO_CHAR(day_end_date,'YYYY')
,1
,TO_CHAR(day_end_date,'YYYY')
/
-- All time next
DROP TABLE all_time_dimension ;
CREATE TABLE all_time_dimension AS
SELECT
1 all_time_id
,'All_Time'all_time_dsc
,MAX(day_end_date) all_time_end_date
,1 + MAX(day_end_date) - MIN(day_end_date) all_time_time_span
FROM day_dimension
/
SELECT * FROM year_dimension ORDER BY year_id;
SELECT * FROM all_time_dimension ;
-- See how well they join:
SELECT COUNT(*)FROM day_dimension;
SELECT COUNT(*)FROM day_dimension d, month_dimension m, year_dimension y, all_time_dimension a
WHERE d.month_id = m.month_id AND m.year_id = y.year_id AND y.all_time_id = a.all_time_id
/
The script to create the other dimensions is simple:
--
-- cr_dimensions.sql
--
DROP TABLE tkr_dimension ;
CREATE TABLE tkr_dimension(tkr VARCHAR2(5));
INSERT INTO tkr_dimension VALUES('DIA');
INSERT INTO tkr_dimension VALUES('EFA');
INSERT INTO tkr_dimension VALUES('MDY');
INSERT INTO tkr_dimension VALUES('OIH');
INSERT INTO tkr_dimension VALUES('SHY');
INSERT INTO tkr_dimension VALUES('SPY');
INSERT INTO tkr_dimension VALUES('XLB');
INSERT INTO tkr_dimension VALUES('XLE');
INSERT INTO tkr_dimension VALUES('XLF');
INSERT INTO tkr_dimension VALUES('XLI');
INSERT INTO tkr_dimension VALUES('XLK');
INSERT INTO tkr_dimension VALUES('XLV');
DROP TABLE strategy_dimension;
CREATE TABLE strategy_dimension(strategy VARCHAR2(9));
INSERT INTO strategy_dimension VALUES('bh');
INSERT INTO strategy_dimension VALUES('loc');
INSERT INTO strategy_dimension VALUES('los');
DROP TABLE pivot_point_dimension;
CREATE TABLE pivot_point_dimension(pivot_point NUMBER);
INSERT INTO pivot_point_dimension VALUES(5);
INSERT INTO pivot_point_dimension VALUES(10);
INSERT INTO pivot_point_dimension VALUES(20);
DROP TABLE holding_period_dimension;
CREATE TABLE holding_period_dimension(holding_period NUMBER);
INSERT INTO holding_period_dimension VALUES(5);
INSERT INTO holding_period_dimension VALUES(10);
INSERT INTO holding_period_dimension VALUES(20);
The script to create the fact table is a bit more involved:
--
-- cr_yfact.sql
--
-- This script creates yfact from the mas table.
-- We call it yfact because it contains data from yahoo.
-- We use a series of SELECTs and INSERTs.
-- Each item in the series corresponds to a unique combination of dimensions for the yfact table.
-- The dimensions and their values are listed below:
-- strategy: bh, loc, los
-- pivot_point: 5,10,20
-- holding_period: 5,10,20
-- tkr:
-- - DIA
-- - EFA
-- - MDY
-- - OIH
-- - SHY
-- - SPY
-- - XLB
-- - XLE
-- - XLF
-- - XLI
-- - XLK
-- - XLV
-- DROP and CREATE yfact first
-- pg (percentage gain, is the measure)
DROP TABLE yfact;
CREATE TABLE yfact
(
pg NUMBER
,day_id NUMBER -- fk to day_dimension
,tkr VARCHAR2(11) -- DIA, EFA, MDY, ...
,strategy VARCHAR2(11) -- bh, loc, los
,pivot_point NUMBER -- 5,10,20
,holding_period NUMBER -- 5,10,20
)
/
-- SELECT, INSERT rows where strategy == bh
-- SELECT, INSERT rows where pivot_point == 5,10,20
-- SELECT, INSERT rows where holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain5,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh', 5, 5 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain5,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',10, 5 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain5,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',20, 5 FROM mas;
-- SELECT, INSERT rows where strategy == bh
-- SELECT, INSERT rows where pivot_point == 5,10,20
-- SELECT, INSERT rows where holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain10,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh', 5,10 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain10,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',10,10 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain10,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',20,10 FROM mas;
-- SELECT, INSERT rows where strategy == bh
-- SELECT, INSERT rows where pivot_point == 5,10,20
-- SELECT, INSERT rows where holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain20,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh', 5,20 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain20,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',10,20 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain20,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',20,20 FROM mas;
--
-- loc below
--
-- SELECT, INSERT rows where strategy == loc
-- SELECT, INSERT rows where pivot_point == 5
-- SELECT, INSERT rows where holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
-- loc logic: If mvg-avg slope < 0 hold cash (which gives a pg of 0) ELSE accept pg from long position
CASE WHEN yprice_avg5slp<0 THEN 0 ELSE pct_gain5 END -- loc logic implemented here
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 5, 5 FROM mas;
-- strategy == loc, pivot_point == 10, holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN 0 ELSE pct_gain5 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 10, 5 FROM mas;
-- strategy == loc, pivot_point == 20, holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN 0 ELSE pct_gain5 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 20, 5 FROM mas;
-- bump holding_period to 10
-- strategy == loc, pivot_point == 5, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg5slp<0 THEN 0 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 5, 10 FROM mas;
-- strategy == loc, pivot_point == 10, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN 0 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 10, 10 FROM mas;
-- strategy == loc, pivot_point == 20, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN 0 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 20, 10 FROM mas;
-- bump holding_period to 20
-- strategy == loc, pivot_point == 5, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg5slp<0 THEN 0 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 5, 20 FROM mas;
-- strategy == loc, pivot_point == 10, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN 0 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 10, 20 FROM mas;
-- strategy == loc, pivot_point == 20, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN 0 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 20, 20 FROM mas;
--
-- los below
--
-- SELECT, INSERT rows where strategy == los
-- SELECT, INSERT rows where pivot_point == 5
-- SELECT, INSERT rows where holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
-- los logic: If mvg-avg slope < 0 accept -pg from short position ELSE accept pg from long position
CASE WHEN yprice_avg5slp<0 THEN -pct_gain5 ELSE pct_gain5 END -- los logic implemented here
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 5, 5 FROM mas;
-- strategy == los, pivot_point == 10, holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN -pct_gain5 ELSE pct_gain5 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 10, 5 FROM mas;
-- strategy == los, pivot_point == 20, holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN -pct_gain5 ELSE pct_gain5 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 20, 5 FROM mas;
-- bump holding_period to 10
-- strategy == los, pivot_point == 5, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg5slp<0 THEN -pct_gain10 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 5, 10 FROM mas;
-- strategy == los, pivot_point == 10, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN -pct_gain10 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 10, 10 FROM mas;
-- strategy == los, pivot_point == 20, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN -pct_gain10 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 20, 10 FROM mas;
-- bump holding_period to 20
-- strategy == los, pivot_point == 5, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg5slp<0 THEN -pct_gain20 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 5, 20 FROM mas;
-- strategy == los, pivot_point == 10, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN -pct_gain20 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 10, 20 FROM mas;
-- strategy == los, pivot_point == 20, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN -pct_gain20 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 20, 20 FROM mas;
--
-- Verify that the dimension tables can be joined
--
SELECT COUNT(*)FROM yfact;
SELECT COUNT(*)
FROM yfact y
,day_dimension d
,tkr_dimension t
,strategy_dimension s
,pivot_point_dimension p
,holding_period_dimension h
WHERE y.day_id =d.day_id
AND y.tkr =t.tkr
AND y.strategy =s.strategy
AND y.pivot_point =p.pivot_point
AND y.holding_period=h.holding_period
/
Once the yfact table is created, we can build more fact tables from it:
--
-- cr_yfact_ru.sql
--
-- Creates a set of fact tables which have dimensions "rolled-up" using a simple average.
-- Average-Roll-Up tkr
DROP TABLE yfact_rut;
CREATE TABLE yfact_rut AS SELECT AVG(pg)pg,day_id,strategy,pivot_point,holding_period FROM yfact GROUP BY day_id,strategy,pivot_point,holding_period;
-- Average-Roll-Up strategy
DROP TABLE yfact_rus ;
CREATE TABLE yfact_rus AS SELECT AVG(pg)pg,day_id,tkr,pivot_point,holding_period FROM yfact GROUP BY day_id,tkr,pivot_point,holding_period;
-- Average-Roll-Up pivot_point
DROP TABLE yfact_rup ;
CREATE TABLE yfact_rup AS SELECT AVG(pg)pg,day_id,tkr,strategy,holding_period FROM yfact GROUP BY day_id,tkr,strategy,holding_period;
-- Average-Roll-Up holding_period
DROP TABLE yfact_ruh;
CREATE TABLE yfact_ruh AS SELECT AVG(pg)pg,day_id,tkr,strategy,pivot_point FROM yfact GROUP BY day_id,tkr,strategy,pivot_point;
-- Average-Roll-Up tkr and strategy
DROP TABLE yfact_ruts;
CREATE TABLE yfact_ruts AS SELECT AVG(pg)pg,day_id,pivot_point,holding_period FROM yfact GROUP BY day_id,pivot_point,holding_period;
-- Average-Roll-Up tkr and pivot_point
DROP TABLE yfact_rutp;
CREATE TABLE yfact_rutp AS SELECT AVG(pg)pg,day_id,strategy,holding_period FROM yfact GROUP BY day_id,strategy,holding_period;
-- Average-Roll-Up tkr and holding_period
DROP TABLE yfact_ruth;
CREATE TABLE yfact_ruth AS SELECT AVG(pg)pg,day_id,strategy,pivot_point FROM yfact GROUP BY day_id,strategy,pivot_point;
Load (the L in ETL) and Analytic Workspace Manager (AWM)
Once the fact tables are built we can begin working with the Oracle tool named Analytic Workspace Manager (AWM). A general tutorial for working with Analytic Workspace Manager should be available from one or both of the URLs listed below: - http://www.oracle.com/technology/obe/10gr2_db_single/bidw/awm/awm_otn.htm - http://www.google.com/search?q=awm+tutorial We start with the dimensions.
The trickiest dimension to setup is the time dimension so we start with that:
- Start AWM. We use a command line like this:
- ${JAVA_HOME}/bin/java -mx512m -jar awm11.1.0.7.0B.jar &
- Point the tool at a database and a schema.
- Once that is done we see something like this:
Next create an "Analytic Workspace" via right-click and then see something like this:
Next, start creation of a new dimension via right-click on "Dimensions":
Via right-clicks, create levels: all, year, month, and day:
Via right-clicks, create hierarchy: time_hierarchy: Put all at top, year below that, month below that, day below that:
Next, do "mappings": Left click mappings. We see 3 panes:
Switch from star-schema to snowflake-schema:
In the middle pane find the tables created by the cr_time_dimension.sql script: all_time_dimension, year_dimension, month_dimension, and day_dimension. Drag them to the right-hand-pane:
Connect them to the TIME dimension:
The other dimensions are easier to build than the time dimension.
Screenshots for the tkr dimension are displayed below:
The other dimensions like the tkr dimension are easy to build:
Again, notice that the time dimension is different than the others: - It has levels - It has a hierarchy - It is of type "Time" not "User" - It was built from a Snowflake Schema not Star Schema Now that the dimensions are built we create the YCUBE from the yfact table. Right-click on Cubes-icon to bring up the "Create Cube" dialog box. The images describe the steps:
Next, create a measure named pg. It's very easy; right-click to bring up the dialog box. Give it a name; and press "Create":
Next, we map the yfact table to YCUBE. Right-click on "Mappings" icon and then drag the yfact table to the right-hand-pane:
Connect yfact columns to YCUBE attributes:
That completes construction of YCUBE. Next, we fill YCUBE and the dimensions with data from the dimension tables and the yfact table. This is a simple multi-step process: - Right-click YCUBE - Select "Maintain cube YCUBE..." to bring up the maintain dialog box:
Click "Finish" and wait about 10 or 20 minutes (if your machine is slow):
At this point, YCUBE and its dimensions are full of data. Right-click the "PG" measure-icon inside of the "Measures" folder icon:
Click "View Data PG..." to bring up the "Measure Data Viewer":
The data viewer is not intuitive to many people. We usually start by clicking the question-mark-pencil in the upper-left-hand corner. This brings up a box called the "Query Builder":
Now, our intention is to enable the dimensions so they are viewable. Note that "Query Builder" is designed to disable most of the dimension values by default. Perhaps this will confuse first-time users. Maybe a better design dictates that all dimension values are displayed by default. We click on the "Dimensions" tab and we see this:
We use the chevrons to enable the Time dimension values:
Then we enable other dimension values:
The appearence of the Measure Data Viewer changes dramtically:
We expand it to fill the screen. We see that the bh strategy did poorly in 2008 and the first quarter of 2009:
Some interesting results from the loc strategy are listed below:
It is obvious that the Measure Data Viewer is useful. A bit of interaction with it reveals that the Buy and Hold strategy worked best. But, anyone who followed that strategy in 2008 suffered significant loss. Let us now turn our attention to the question, "Can the SVM algorithim resident inside Orace Data Mining enhance our ability to use the strategies Buy and Hold, Long or Cash, and Long or Short?"
One Dimensional SVM Description
The SVM algorithm may be a useful tool for predicting future percentage gain of an ETF. How does the SVM algorithm work? Imagine that at 4:01 pm every trading day we collect the 20-day moving average slope of the DIA price. What do we see? Well, this is easily simulated via query against the mas table.
12:03:36 SQL> desc mas
Name Null? Type
-------------------------------- -------- -----------------------
TKR_DATE VARCHAR2(15)
DAYSPAST NUMBER
YPRICE NUMBER
YVOL NUMBER
YPRICE_AVG5 NUMBER
YPRICE_AVG10 NUMBER
YPRICE_AVG20 NUMBER
YPRICE_AVG5SLP NUMBER
YPRICE_AVG10SLP NUMBER
YPRICE_AVG20SLP NUMBER
AA_TARG_ATT VARCHAR2(3)
SHY_YPRICE_AVG5SLP NUMBER
XLE_YPRICE_AVG5SLP NUMBER
SHY_YPRICE_AVG10SLP NUMBER
XLE_YPRICE_AVG10SLP NUMBER
SHY_YPRICE_AVG20SLP NUMBER
XLE_YPRICE_AVG20SLP NUMBER
PCT_GAIN5 NUMBER
PCT_GAIN10 NUMBER
PCT_GAIN20 NUMBER
TKR VARCHAR2(4)
YDATE DATE
12:06:05 SQL> SELECT ydate,tkr,yprice_avg20slp FROM mas WHERE tkr='DIA'AND ydate BETWEEN'2008-01-02'AND'2008-01-31'ORDER BY ydate;
YDATE TKR YPRICE_AVG20SLP
---------- ---- ---------------
2008-01-02 DIA -.1145
2008-01-03 DIA -.067
2008-01-04 DIA -.29
2008-01-07 DIA -.374
2008-01-08 DIA -.497
2008-01-09 DIA -.4915
2008-01-10 DIA -.3045
2008-01-11 DIA -.4125
2008-01-14 DIA -.3485
2008-01-15 DIA -.381
2008-01-16 DIA -.3385
2008-01-17 DIA -.5095
2008-01-18 DIA -.547
2008-01-22 DIA -.634
2008-01-23 DIA -.5595
2008-01-24 DIA -.571
2008-01-25 DIA -.637
2008-01-28 DIA -.487
2008-01-29 DIA -.4245
2008-01-30 DIA -.42
2008-01-31 DIA -.2185
21 rows selected.
Elapsed: 00:00:00.01
12:08:03 SQL>
Next, for each of these days we wait 10 trading days and then collect resulting percentage gain. Again, this is easily simulated via query against the mas table.
12:08:03 SQL> SELECT ydate,tkr,yprice_avg20slp,pct_gain10 FROM mas WHERE tkr='DIA'AND ydate BETWEEN'2008-01-02'AND'2008-01-31'ORDER BY ydate;
YDATE TKR YPRICE_AVG20SLP PCT_GAIN10
---------- ---- --------------- ----------
2008-01-02 DIA -.1145 -4.5772709
2008-01-03 DIA -.067 -6.8605108
2008-01-04 DIA -.29 -5.8196393
2008-01-07 DIA -.374 -6.7950261
2008-01-08 DIA -.497 -2.3838681
2008-01-09 DIA -.4915 -2.670335
2008-01-10 DIA -.3045 -4.2922595
2008-01-11 DIA -.4125 -1.9946267
2008-01-14 DIA -.3485 -2.2157763
2008-01-15 DIA -.381 -1.0987209
2008-01-16 DIA -.3385 1.19508778
2008-01-17 DIA -.5095 4.77556531
2008-01-18 DIA -.547 4.52804494
2008-01-22 DIA -.634 3.13306938
2008-01-23 DIA -.5595 -.35962198
2008-01-24 DIA -.571 -.56534752
2008-01-25 DIA -.637 -.36113211
2008-01-28 DIA -.487 -.98022927
2008-01-29 DIA -.4245 -.61799604
2008-01-30 DIA -.42 1.19383187
2008-01-31 DIA -.2185 -1.5800619
21 rows selected.
Elapsed: 00:00:00.02
12:14:57 SQL>
We ask a simple question, is pct_gain10 dependent on yprice_avg20slp? If there is a dependency, it is not obvious from the above report of 21 rows. Perhaps we do have a dependency but it is very weak. In order to express, or find, this dependency we try an idea from Data Mining called Classification. It's a simple idea; we classify the data in pct_gain10 into two groups: up-group contains rows where pct_gain10 > 3.0 nup-group contains rows where pct_gain10 <= 3.0 SQL is well suited for showing us the two groups:
12:32:24 SQL> l
1 SELECT ydate,tkr,yprice_avg20slp,pct_gain10,
2 CASE WHEN pct_gain10 > 3.0 THEN 'up' ELSE 'nup' END the_class
3* FROM mas WHERE tkr='DIA'AND ydate BETWEEN'2008-01-02'AND'2008-01-31'ORDER BY ydate
12:32:25 SQL> /
YDATE TKR YPRICE_AVG20SLP PCT_GAIN10 THE
---------- ---- --------------- ---------- ---
2008-01-02 DIA -.1145 -4.5772709 nup
2008-01-03 DIA -.067 -6.8605108 nup
2008-01-04 DIA -.29 -5.8196393 nup
2008-01-07 DIA -.374 -6.7950261 nup
2008-01-08 DIA -.497 -2.3838681 nup
2008-01-09 DIA -.4915 -2.670335 nup
2008-01-10 DIA -.3045 -4.2922595 nup
2008-01-11 DIA -.4125 -1.9946267 nup
2008-01-14 DIA -.3485 -2.2157763 nup
2008-01-15 DIA -.381 -1.0987209 nup
2008-01-16 DIA -.3385 1.19508778 nup
2008-01-17 DIA -.5095 4.77556531 up
2008-01-18 DIA -.547 4.52804494 up
2008-01-22 DIA -.634 3.13306938 up
2008-01-23 DIA -.5595 -.35962198 nup
2008-01-24 DIA -.571 -.56534752 nup
2008-01-25 DIA -.637 -.36113211 nup
2008-01-28 DIA -.487 -.98022927 nup
2008-01-29 DIA -.4245 -.61799604 nup
2008-01-30 DIA -.42 1.19383187 nup
2008-01-31 DIA -.2185 -1.5800619 nup
21 rows selected.
Next, we turn to wikipedia.org and look up 'SVM Algorithm': http://www.google.com/search?q=SVM+algorithm+site:wikipedia.org We see the following sentence there: "Viewing input data as two sets of vectors in an n-dimensional space, an SVM will construct a separating hyperplane in that space, one which maximizes the margin between the two data sets." Let's scrutinze the above sentence. In our example, "Viewing input data as two sets of vectors" means that we have separated our rows into two groups, the up-group and the nup-group. In our example, "Viewing input data as two sets of vectors in an n-dimensional space" means that we view our 2 groups by plotting them on a 1-dimensional line. We have only 1-dimension because a 1-dimensional line is easy to visualize. How exactly do we view our 1-dimensional line? Step 1, draw a horizontal line. Step 2, write "yprice_avg20slp" to the right of it. Step 3, write "0" in the middle of it. Step 4, write a tiny, minscule red "n" on the line for each nup-row: - The first one is to the left of the 0 at location -.1145 - The second one is to the left of the 0 at location -.067 Step 5, write a tiny, minscule green "u" on the line for each up-row: - The first one is to the left of the 0 at location -.5095 - The second one is to the left of the 0 at location -.547 So that 5-step process helps us see the 1-dimensional space. We return to the wikipedia-sentence and read the next clause: - "an SVM will construct a separating hyperplane" In our example this means that SVM will draw a curvy line such that the green-ups are separated from the red-nups. We return to the wikipedia-sentence and read the next clause: - "one which maximizes the margin between the two data sets." In our example this means that SVM will draw the curvy line as much in the middle between the green-ups and the red-nups as possible. Finally, we may obtain a row in the mas table where the value for yprice_avg20slp is known but "up" or "nup" are not yet known. The idea then is to plot the value of yprice_avg20slp on the 1-dimensional line as a blue-dot. Next we see which side of the curvy line that the blue-dot resides. Based on that, we predict whether this blue-dot should be a red-nup or a green-up. Perhaps it is obvious that a one-dimensional-SVM lacks predictive power. But we do believe that a discussion of a one-dimensional-SVM helps us introduce the concept of an n-dimensional-SVM.
Two Dimensional SVM Description
When we look at a description of the mas table, it is obvious that we can add a second dimension to our discussion of SVM. We pick xle_yprice_avg20slp which is the moving average slope of the XLE ETF. The thinking is that if xle_yprice_avg20slp is high then energy prices are increasing and this drags down the price of DIA. So, now we have two dimensions: - yprice_avg20slp - xle_yprice_avg20slp We plot them on a two-dimensional graph just like we ploted tables of x and y back in Middle School. Each point is a tiny black-circle if it is in the up-class or a tiny white-circle if it is in the nup-class. Once this plot is done and SVM is implemented, we might visualize the graph as that displayed here: http://en.wikipedia.org/wiki/Support_vector_machine#Motivation The above image depicts the separating hyperplane constructed by SVM as a straight-red-line. Keep in mind that we may visualize the separating hyperplane as a curvy line.
Classifying ETF percentage gain via SVM inside of Oracle Dataminer
We start our classification effort by creating a set of shell scripts. The names of the scripts are listed below:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ ll mas*score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_DIA_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_EFA_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_MDY_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_OIH_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_SHY_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_SPY_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_XLB_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_XLE_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_XLF_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_XLI_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_XLK_score.sh
-rwxr-xr-x 1 oracle dba 80 2009-02-16 22:51 mas_bh_XLV_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_DIA_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_EFA_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_MDY_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_OIH_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_SHY_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_SPY_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_XLB_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_XLE_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_XLF_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_XLI_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_XLK_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_loc_XLV_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_DIA_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_EFA_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_MDY_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_OIH_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_SHY_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_SPY_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_XLB_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_XLE_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_XLF_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_XLI_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_XLK_score.sh
-rwxr-xr-x 1 oracle dba 82 2009-02-04 15:10 mas_los_XLV_score.sh
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
It is obvious from the names of the scripts that each script "scores" a set of rows for a specific strategy and a specific ETF ticker symbol. For example the last script, mas_los_XLV_score.sh, scores a set of rows for the Long-Or-Short strategy applied to the XLV ETF. The verb "score" means classify but with the added nuance that SVM attach a confidence-level (expressed as a probability) to the classification attempt. Creating the above 36 scripts by hand is too tedious. Instead, a ruby script writes the scripts:
#!/usr/bin/env ruby
# Builds shell scripts for a set of sql scripts.
Dir["mas_lo?_*_score.sql","mas_bh_*_score.sql"].sort.each do |s|
s =~ /(mas.*).(sql)/
script_name = "#{$1}.sh"
shell_syntax = <<ENDshell_syntax
#!/bin/sh
# #{script_name}
sqlplus trade/t <<EOF
@#{s}
EOF
ENDshell_syntax
(fhw = File.open(script_name,"w")).write(shell_syntax); fhw.close
end
Notice that the above ruby script depends on a set of SQL scripts. Those scripts are listed below:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ ll mas*score.sql
-rw-r--r-- 1 oracle dba 4468341 2009-04-05 01:39 mas_bh_DIA_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_EFA_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_MDY_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_OIH_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_SHY_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_SPY_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_XLB_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_XLE_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_XLF_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_XLI_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_XLK_score.sql
-rw-r--r-- 1 oracle dba 4462951 2009-04-05 01:39 mas_bh_XLV_score.sql
-rw-r--r-- 1 oracle dba 13962050 2009-04-11 03:00 mas_loc_DIA_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_EFA_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_MDY_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_OIH_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_SHY_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_SPY_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLB_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLE_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLF_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLI_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLK_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLV_score.sql
-rw-r--r-- 1 oracle dba 15528860 2009-04-11 03:00 mas_los_DIA_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_EFA_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_MDY_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_OIH_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_SHY_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_SPY_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLB_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLE_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLF_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLI_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLK_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLV_score.sql
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
Notice that these are large SQL scripts. Two simple shell commands reveal that the very last script required 12 hours to run:
oracle@zareason:~/vlsql/log$
oracle@zareason:~/vlsql/log$
oracle@zareason:~/vlsql/log$ head mas_los_XLV_score.sh.out.txt | grep 2009
SQL*Plus: Release 11.1.0.6.0 - Production on Thu May 21 04:24:07 2009
oracle@zareason:~/vlsql/log$
oracle@zareason:~/vlsql/log$ ll mas_los_XLV_score.sh.out.txt
-rw-r--r-- 1 oracle dba 244307039 2009-05-21 18:20 mas_los_XLV_score.sh.out.txt
oracle@zareason:~/vlsql/log$
oracle@zareason:~/vlsql/log$
The "loc" scripts each ran in about 11.5 hours.
The "bh" scripts each ran in about 4 hours.
The above list of scripts required this amount of time to run:
(11.5hr/script + 4hr/script + 12hr/script) x 12 scripts = 330 hours
330 hr x (1 day / 24 hr) = 13.7 days
The three types of scripts ("bh","loc", and "los") where written with
the help of two ruby scripts.
The first ruby script is listed below:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ cat mas_bh.rb
#!/usr/bin/env ruby
# mas_bh.rb
# Demo: ruby -r mas_bh.rb -e 'MasBh.new.write_score_sql_scripts'
# Ruby script which uses nested loops to generate several large SQL scripts.
# Each SQL script is a wrapper for the SVM algorithm resident within Oracle Data Miner.
class MasBh
require 'rubygems'
require 'oci8'
def write_score_sql_scripts
# tkrs_a will hold a list of unique tkr names resident within the stock-price-time-series data.
tkrs_a = []
(oci8conn = OCI8.new('trade','t')).exec("SELECT DISTINCT tkr FROM etfmas ORDER BY tkr"){ |r| tkrs_a << r.to_s }
# See em.
p(tkrs_a)
# holding_periods_a holds integers representing days a tkr is held before its percentage gain is measured.
holding_periods_a = ["5","10","20"]
# pp is acronym for Pivot Point. It is not used by the BH strategy but we set it here so it will propagate into the mas_predictions table.
# Having it in that table makes it easier to compare BH strategy to LOC and LOS strategies.
pp="20"
# For bh strategy, build-model, and then score.
# The score will give us the probability that the pct-gain will be >= (the average + 1 std. deviation).
strategy="bh"
tkrs_a.each do |tkr|
sqlscript_name = "mas_#{strategy}_#{tkr}_score.sql"
p(sqlscript_name)
(fhw = File.open(sqlscript_name,"w")).write("--\n-- #{sqlscript_name}\n--\n\n"); fhw.close
holding_periods_a.each do |hp|
some_dates_a = []
oci8conn.exec("SELECT TO_CHAR(ydate,'YYYY-MM-DD')FROM etfmas WHERE tkr='#{tkr}'AND ydate>TO_DATE('2002-09-01','YYYY-MM-DD')AND ydate<TO_DATE('2009-04-01','YYYY-MM-DD')ORDER BY ydate"){ |r| some_dates_a << r.to_s }
some_dates_a.each do |adate|
sqlplus_syntax = <<END
--
CREATE OR REPLACE view sme AS SELECT
tkr_date
,NULL aa_targ_att
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='#{tkr}'AND ydate='#{adate}'
/
CREATE OR REPLACE view bme AS SELECT
tkr_date
,CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas WHERE tkr='#{tkr}')THEN'up'ELSE aa_targ_att END aa_targ_att
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='#{tkr}'AND 5+#{hp}+ydate<'#{adate}'
/
@cr_bme.sql #{strategy} #{hp} #{pp}
PURGE RECYCLEBIN;
--
END
(fha = File.open(sqlscript_name,"a")).write(sqlplus_syntax); fha.close
end # tkrs_a.each
end # some_dates_a.each
end # holding_periods_a.each
end # def write_score_sql_scripts
end # class MasBh
MasBh.new.write_score_sql_scripts
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
The comments in the above script adequately describe it. Sample syntax which the script had generated is displayed below:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ head -50 mas_bh_DIA_score.sql
--
-- mas_bh_DIA_score.sql
--
--
CREATE OR REPLACE view sme AS SELECT
tkr_date
,NULL aa_targ_att
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='DIA'AND ydate='2002-09-03'
/
CREATE OR REPLACE view bme AS SELECT
tkr_date
,CASE WHEN pct_gain5>(SELECT AVG(pct_gain5)+STDDEV(pct_gain5)FROM mas WHERE tkr='DIA')THEN'up'ELSE aa_targ_att END aa_targ_att
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='DIA'AND 5+5+ydate<'2002-09-03'
/
@cr_bme.sql bh 5 20
PURGE RECYCLEBIN;
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
The first view, sme, contains a column named aa_targ_att. In SVM jargon that column is the target attribute. We intend to predict its value. Notice the predicate of the sme view. It constrains the data such that we have WHERE tkr='DIA'AND ydate='2002-09-03'. This helps us simulate the idea that we are watching the market for DIA close at 4pm on 2002-09-03. As the clock strikes 4, we make note of the closing price of DIA. With that price we can compute other column values such as yprice_avg5slp. Since we want to predict the target attribute of 1 ETF for just one day it makes sense that the sme view contains only 1 row. The second view, bme, contains many rows and thus many values in the column named aa_targ_att. These rows are fed to SVM as "training data" so that it may predict the 1 value of aa_targ_att in the 1 row of sme. The actual implementation of SVM is done by the call of the cr_bme.sql script. Notice in the above example that we feed 3 command line parameters to the cr_bme.sql script: @cr_bme.sql bh 5 20 The first parameter is bh which corresponds to the Buy and Hold strategy. The second parameter is 5 which corresponds to a 5 trading-day holding period. The third parameter is 20 which corresponds to a 20 trading-day pivot-point. Keep in mind that the Buy and Hold strategy does not depend on pivot-point. We set it here, however, because the cr_bme.sql script wants to see it. The cr_bme.sql script does need the pivot-point value when we use it to score "loc" and "los" data. The contents of the cr_bme.sql script are displayed later in this paper. As mentioned above we have two ruby scripts. The first script, just described, is used to generate SQL statements for just the Buy and Hold strategy. The second script is used to generate SQL statements for both the "loc" and "los" strategies. The second ruby script is displayed below:
#!/usr/bin/env ruby
# mas.rb
# Demo: ruby -r mas.rb -e 'Mas.new.write_score_sql_scripts'
# Ruby script which uses nested loops to generate several SQL scripts.
# Each script is a wrapper for the SVM algorithm resident within Oracle Data Miner.
# This ruby script is a tool to study possible strategies for trading individual ETFs.
# Also it aims to answer the question,
# "Can SVM effectively tell us to abandon all strategies and move to cash during certain conditions?"
# This Ruby script helps me deal with a cartesian product of these attributes:
# Strategy (Long Or Cash, Long Or Short)
# Strategy Pivot Points (5day-Moving-Avg-Slope, 10day-Moving-Avg-Slope, 20day-Moving-Avg-Slope)
# Holding Period (days a tkr is held before its percentage gain is measured)
# Dates in a time-series (ydate)
# Stock Ticker (tkr). Currently some ETF symbols: SPY, DIA, EFA, MDY, XLE, ...
class Mas
require 'rubygems'
require 'oci8'
def write_score_sql_scripts
# tkrs_a will hold a list of unique tkr names resident within the stock-price-time-series data.
tkrs_a = []
(oci8conn = OCI8.new('trade','t')).exec("SELECT DISTINCT tkr FROM etfmas ORDER BY tkr"){ |r| tkrs_a << r.to_s }
# See em.
p(tkrs_a)
# pivot_points_a holds integers representing a type of moving average slope.
# Some strategies (loc, los) depend upon a pivot point.
# For example, The loc-Trader will watch the moving average slope. Once it moves above zero, he will go Long.
# Once it moves below zero, he will sell so that he holds cash.
pivot_points_a = ["5","10","20"]
# holding_periods_a holds integers representing days a tkr is held before its percentage gain is measured.
holding_periods_a = ["5","10","20"]
# strategies_a holds a list of acronyms for strategies.
# Some of them are:
# bh = Buy and Hold aka Long
# loc = Long Or Cash
# los = Long Or Short
strategies_a = ['loc','los']
# For each strategy, build-model, and then score.
# The score will give us the probability that the pct-gain will be above a threshold dependent on std. deviation.
strategies_a.each do |strategy|
tkrs_a.each do |tkr|
sqlscript_name = "mas_#{strategy}_#{tkr}_score.sql"
(fhw = File.open(sqlscript_name,"w")).write("--\n-- #{sqlscript_name}\n--\n\n"); fhw.close
pivot_points_a.each do |pp|
holding_periods_a.each do |hp|
p("strategy: #{strategy} tkr: #{tkr} pivot_point: #{pp} holding_period: #{hp}")
some_dates_a = []
oci8conn.exec("SELECT TO_CHAR(ydate,'YYYY-MM-DD')FROM etfmas WHERE tkr='#{tkr}'AND ydate>TO_DATE('2002-09-01','YYYY-MM-DD')AND ydate<TO_DATE('2009-04-01','YYYY-MM-DD')ORDER BY ydate"){ |r| some_dates_a << r.to_s }
some_dates_a.each do |adate|
sqlplus_syntax1 = <<ENDsqlplus_syntax1
CREATE OR REPLACE view sme AS SELECT
NULL aa_targ_att
,tkr_date
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='#{tkr}'AND ydate='#{adate}'
/
CREATE OR REPLACE view bme AS SELECT
ENDsqlplus_syntax1
# Here the aa_targ_att column depends on my choice of strategy
case strategy
when "loc"
aa_targ_att_syntax = <<ENDloc_syntax
CASE WHEN yprice_avg#{pp}slp<0 THEN'nup'
ELSE
CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas WHERE tkr='#{tkr}')THEN'up'ELSE'nup'END
END aa_targ_att
ENDloc_syntax
when "los"
aa_targ_att_syntax = <<ENDlos_syntax
CASE WHEN yprice_avg#{pp}slp<0 THEN
CASE WHEN pct_gain#{hp}<(SELECT AVG(pct_gain#{hp})-STDDEV(pct_gain#{hp})FROM mas WHERE tkr='#{tkr}')THEN'up'ELSE'nup'END
ELSE
CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas WHERE tkr='#{tkr}')THEN'up'ELSE'nup'END
END aa_targ_att
ENDlos_syntax
end
# Mix together the syntax
sqlplus_syntax = <<ENDsqlplus_syntax
#{sqlplus_syntax1}#{aa_targ_att_syntax},tkr_date
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='#{tkr}'AND 5+#{hp}+ydate<'#{adate}'
/
@cr_bme.sql #{strategy} #{hp} #{pp}
PURGE RECYCLEBIN;
--
ENDsqlplus_syntax
#Use cr_bme.sql to build a model from bme table and then score the sme table.
#Since we are scoring only 1 tkr for 1 day, we expect only 1 prediction.
(fha = File.open(sqlscript_name,"a")).write(sqlplus_syntax); fha.close
end # tkrs_a.each
end # some_dates_a.each
end # holding_periods_a.each
end # pivot_points_a.each
end # strategies_a.each
end # def write_score_sql_scripts
end # class Mas
Mas.new.write_score_sql_scripts
# end of ruby script
The comments in the above script adequately describe it. Sample syntax which the Ruby script had generated is displayed below:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ head -50 mas_loc_DIA_score.sql
--
-- mas_loc_DIA_score.sql
--
CREATE OR REPLACE view sme AS SELECT
NULL aa_targ_att
,tkr_date
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='DIA'AND ydate='2002-09-03'
/
CREATE OR REPLACE view bme AS SELECT
CASE WHEN yprice_avg5slp<0 THEN'nup'
ELSE
CASE WHEN pct_gain5>(SELECT AVG(pct_gain5)+STDDEV(pct_gain5)FROM mas WHERE tkr='DIA')THEN'up'ELSE'nup'END
END aa_targ_att
,tkr_date
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='DIA'AND 5+5+ydate<'2002-09-03'
/
@cr_bme.sql loc 5 5
PURGE RECYCLEBIN;
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
Notice that both of the ruby scripts generate SQL syntax which calls the cr_bme.sql script. The contents of the cr_bme.sql script are displayed below:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ cat cr_bme.sql
--
-- cr_bme.sql
--
-- usage: cr_bme.sql strategy holding_period pivot_point
-- A template script which accepts a pattern on the command line which is used at various places in the script to name related things.
-- Demo: @cr_bme.sql los 20 10
DEFINE strategy = '&&1'
DEFINE holding_period = '&&2'
DEFINE pivot_point = '&&3'
DEFINE target = 'aa_targ_att'
DEFINE model_name = 'mas_model'
DEFINE bldtable = 'bme'
DEFINE scoretable = 'sme'
DEFINE case_id = 'tkr_date'
-- Define a variable to help me exclude some columns from some data mining calls
DEFINE exclude1 = "'&target','&case_id'"
-- Builds an SVM model using pl/sql.
-----------------------------------------------------------------------
-- BUILD THE MODEL
-----------------------------------------------------------------------
-- Cleanup old build data preparation objects for repeat runs
DELETE svmc_miss_num;
DELETE svmc_miss_cat;
DELETE svmc_clip;
DELETE svmc_norm;
DELETE svmc_settings;
BEGIN EXECUTE IMMEDIATE 'DROP VIEW svmc_winsor';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW svmc_build_prep';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW xformed_build_miss_num';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW xformed_build_miss_cat';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
--------------------------------
-- PREPARE BUILD (TRAINING) DATA
--
-- 1. Missing Value treatment for all Predictors and
-- 2. Outlier Treatment and
-- 3. Normalization are performed below.
-- NOTE: that unlike SVM regression, the classification target is NOT
-- normalized here.
BEGIN
-- Perform missing value treatment for all predictors
-- create miss tables
-- DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_NUM (miss_table_name => 'svmc_miss_num');
-- DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_CAT (miss_table_name => 'svmc_miss_cat');
-- populate miss tables
DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_NUM_MEAN (
miss_table_name => 'svmc_miss_num',
data_table_name => '&bldtable',
exclude_list => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (&exclude1));
DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_CAT_MODE (
miss_table_name => 'svmc_miss_cat',
data_table_name => '&bldtable',
exclude_list => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (&exclude1));
-- xform input data to replace missing values
DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_NUM(
miss_table_name => 'svmc_miss_num',
data_table_name => '&bldtable',
xform_view_name => 'xformed_build_miss_num');
DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_CAT(
miss_table_name => 'svmc_miss_cat',
data_table_name => '&bldtable',
xform_view_name => 'xformed_build_miss_cat');
-- Perform outlier treatment.
-- create clip table
-- DBMS_DATA_MINING_TRANSFORM.CREATE_CLIP (clip_table_name => 'svmc_clip');
-- populate clip table
DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_WINSOR_TAIL (
clip_table_name => 'svmc_clip',
data_table_name => '&bldtable',
tail_frac => 0.025,
exclude_list => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (&exclude1));
-- xform input data to winsorized data
DBMS_DATA_MINING_TRANSFORM.XFORM_CLIP(
clip_table_name => 'svmc_clip',
data_table_name => '&bldtable',
xform_view_name => 'svmc_winsor');
-- create normalization table
-- DBMS_DATA_MINING_TRANSFORM.CREATE_NORM_LIN (norm_table_name => 'svmc_norm');
-- populate normalization table based on winsorized data
DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_MINMAX (
norm_table_name => 'svmc_norm',
data_table_name => 'svmc_winsor',
exclude_list => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (&exclude1));
-- normalize the original data
DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN (
norm_table_name => 'svmc_norm',
data_table_name => '&bldtable',
xform_view_name => 'svmc_build_prep');
END;
/
-- CREATE AND POPULATE A SETTINGS TABLE
--
-- DROP TABLE svmc_settings ;
-- CREATE TABLE svmc_settings (
-- setting_name VARCHAR2(30),
-- setting_value VARCHAR2(30));
-- The default classification algorithm is Naive Bayes. So override
-- this choice to SVM using a settings table.
-- SVM chooses a kernel type automatically. This choice can be overriden
-- by the user. Linear kernel is preferred high dimensional data, and
-- Gaussian kernel for low dimensional data. Here we use linear kernel
-- to demonstrate the get_model_details_svm() API, which applies only for
-- models.
--
BEGIN
-- Populate settings table
INSERT INTO svmc_settings (setting_name, setting_value) VALUES
(dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);
INSERT INTO svmc_settings (setting_name, setting_value) VALUES
(dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_gaussian);
-- 99123123 is 99,123,123 is 99mb
INSERT INTO svmc_settings (setting_name, setting_value) VALUES
(dbms_data_mining.svms_kernel_cache_size,222123123);
-- (dbms_data_mining.svms_kernel_cache_size,99123123);
-- (dbms_data_mining.svms_kernel_cache_size,333123123);
-- Examples of other possible overrides are:
-- (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_linear);
--(dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_complexity_factor);
COMMIT;
END;
/
---------------------
-- CREATE A NEW MODEL
--
-- Cleanup old model with the same name for repeat runs
BEGIN DBMS_DATA_MINING.DROP_MODEL('&model_name');
EXCEPTION WHEN OTHERS THEN NULL; END;
/
-- Build a new SVM Model
BEGIN
DBMS_DATA_MINING.CREATE_MODEL(
model_name => '&model_name',
mining_function => dbms_data_mining.classification,
data_table_name => 'svmc_build_prep',
case_id_column_name => '&case_id',
target_column_name => '&target',
settings_table_name => 'svmc_settings');
END;
/
-----------------------------------------------------------------------
-- APPLY/score THE MODEL
-----------------------------------------------------------------------
-- Cleanup old scoring data preparation objects for repeat runs
BEGIN EXECUTE IMMEDIATE 'DROP VIEW xformed_apply_miss_num';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW xformed_apply_miss_cat';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW svmc_apply_prep';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
-----------------------
-- PREPARE SCORING DATA
--
-- If the data for model creation has been prepared, then the data
-- to be scored using the model must be prepared in the same manner
-- in order to obtain meaningful results.
--
-- 1. Missing Value treatment for all Predictors and
-- 2. Normalization
-- No outlier treatment will be performed during test and apply. The
-- normalization step is sufficient, since the normalization parameters
-- already capture the effects of outlier treatment done with build data.
--
BEGIN
-- Xform Test data to replace missing values
DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_NUM(
miss_table_name => 'svmc_miss_num',
data_table_name => '&scoretable',
xform_view_name => 'xformed_apply_miss_num');
DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_CAT(
miss_table_name => 'svmc_miss_cat',
data_table_name => '&scoretable',
xform_view_name => 'xformed_apply_miss_cat');
-- Normalize the data to be scored
DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN (
norm_table_name => 'svmc_norm',
data_table_name => '&scoretable',
xform_view_name => 'svmc_apply_prep');
END;
/
-- Get prob and the prediction:
INSERT INTO mas_predictions (tkr_date,strategy,holding_period,pivot_point,prediction,prob_up,prob_nup,run_date)
SELECT
tkr_date,'&strategy','&holding_period','&pivot_point'
,PREDICTION( &model_name USING *)prediction
,PREDICTION_PROBABILITY(&model_name,'up' USING *)prob_up
,PREDICTION_PROBABILITY(&model_name,'nup' USING *)prob_nup
,sysdate
FROM svmc_apply_prep
/
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
We see above, that the last SELECT statement is where SVM is actually implemented. All of the syntax above that statement completes mildly complex data manipulations. It transforms the data into a form which SVM can digest. It is obvious from the SELECT statement that each SVM prediction is 3 data values: - An "up" prediction - Or, a "nup" prediction (meaning "not-up") - Probability of "up" - Probability of "nup" We should note that prob_up + prob_nup is always equal to 1. So, we really only need one of the probabilities. Both the PREDICTION() function and the PREDICTION_PROBABILITY() function are documented in the Oracle SQL Reference and the Oracle Data Mining Guide: - http://www.google.com/search?q=prediction_probability+site:oracle.com Notice that the INSERT statement is loading the predictions and probabilities into a table named mas_predictions. Also note that we generated the predictions twice with the intention of using an average of each prediction. During the second gathering of predictions we removed 1 day of data from the data set sent to SVM in order to prevent the generation of an identical set of predictions. We display some data from mas_predictions below:
13:22:27 SQL>
13:22:28 SQL>
13:22:28 SQL>
13:22:29 SQL>
13:22:33 SQL>
13:22:33 SQL> @qry_mas_predictions.sql
13:22:39 SQL> SET ECHO ON
13:22:39 SQL> --
13:22:39 SQL> -- qry_mas_predictions.sql
13:22:39 SQL> --
13:22:39 SQL>
13:22:39 SQL> -- This script displays interesting data from mas_predictions table.
13:22:39 SQL> -- The mas_predictions table contains predictions of classifications of the mas table.
13:22:39 SQL> -- The two classifications are 'up' and 'nup'.
13:22:39 SQL> -- The original classification values reside in the column: bme.aa_targ_att
13:22:39 SQL> -- The predicted classification values reside in the column: mas_predictions.prediction
13:22:39 SQL> -- Each prediction comes with a confidence level which resides in two columns:
13:22:39 SQL> -- -- prob_up and prob_nup
13:22:39 SQL> -- Be aware that prob_up + prob_nup is equal to 1.
13:22:39 SQL>
13:22:39 SQL> SET LINES 55
13:22:39 SQL> DESCRIBE mas_predictions
Name Null? Type
-------------------------- -------- ------------------
TKR_DATE VARCHAR2(19)
STRATEGY VARCHAR2(9)
HOLDING_PERIOD NUMBER
PIVOT_POINT NUMBER
PREDICTION VARCHAR2(11)
PROB_UP NUMBER
PROB_NUP NUMBER
RUN_DATE DATE
13:22:39 SQL>
13:22:39 SQL> -- Count the tkrs
13:22:39 SQL> SELECT tkr,COUNT(tkr)FROM
13:22:39 2 (
13:22:39 3 SELECT
13:22:39 4 REGEXP_SUBSTR(tkr_date,'...')tkr
13:22:39 5 ,strategy
13:22:39 6 ,holding_period
13:22:39 7 ,pivot_point
13:22:39 8 ,prediction
13:22:39 9 ,prob_up
13:22:39 10 ,prob_nup
13:22:39 11 ,run_date
13:22:39 12 FROM mas_predictions
13:22:39 13 )
13:22:39 14 GROUP BY tkr ORDER BY tkr
13:22:39 15 /
TKR COUNT(TKR)
--- ----------
DIA 69108
EFA 68611
MDY 69078
OIH 68739
SHY 69140
SPY 69254
XLB 68648
XLE 67646
XLF 68918
XLI 69056
XLK 68985
XLV 69552
12 rows selected.
Elapsed: 00:00:00.00
13:22:39 SQL>
13:22:39 SQL> -- Drill down into counts of strategy, holding_period, and pivot_point
13:22:39 SQL> SET LINES 77 PAGES 55
13:22:39 SQL> COLUMN tkr FORMAT A3
13:22:39 SQL> SELECT tkr, strategy, holding_period, pivot_point, COUNT(tkr)FROM
13:22:39 2 (
13:22:39 3 SELECT
13:22:39 4 REGEXP_SUBSTR(tkr_date,'...')tkr
13:22:39 5 ,strategy
13:22:39 6 ,holding_period
13:22:39 7 ,pivot_point
13:22:39 8 ,prediction
13:22:39 9 ,prob_up
13:22:39 10 ,prob_nup
13:22:39 11 ,run_date
13:22:39 12 FROM mas_predictions
13:22:39 13 )
13:22:39 14 GROUP BY tkr,strategy,holding_period,pivot_point
13:22:39 15 ORDER BY tkr,strategy,holding_period,pivot_point
13:22:39 16 /
TKR STRATEGY HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
DIA bh 5 20 3312
DIA bh 10 20 3312
DIA bh 20 20 3244
DIA loc 5 5 3312
DIA loc 5 10 3312
DIA loc 5 20 3312
DIA loc 10 5 3312
DIA loc 10 10 3312
DIA loc 10 20 3312
DIA loc 20 5 3234
DIA loc 20 10 3220
DIA loc 20 20 3202
DIA los 5 5 3312
DIA los 5 10 3312
DIA los 5 20 3312
DIA los 10 5 3312
DIA los 10 10 3312
DIA los 10 20 3312
DIA los 20 5 3284
DIA los 20 10 3284
DIA los 20 20 3272
EFA bh 5 20 3312
EFA bh 10 20 3312
EFA bh 20 20 3244
EFA loc 5 5 3312
EFA loc 5 10 3244
EFA loc 5 20 3214
EFA loc 10 5 3312
EFA loc 10 10 3312
EFA loc 10 20 3218
EFA loc 20 5 3214
EFA loc 20 10 3224
EFA loc 20 20 2996
EFA los 5 5 3312
EFA los 5 10 3308
EFA los 5 20 3299
EFA los 10 5 3312
EFA los 10 10 3312
EFA los 10 20 3284
EFA los 20 5 3284
EFA los 20 10 3284
EFA los 20 20 3302
MDY bh 5 20 3312
MDY bh 10 20 3312
MDY bh 20 20 3312
MDY loc 5 5 3312
MDY loc 5 10 3312
MDY loc 5 20 3312
MDY loc 10 5 3312
MDY loc 10 10 3312
MDY loc 10 20 3194
MDY loc 20 5 3192
TKR STRATEGY HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
MDY loc 20 10 3312
MDY loc 20 20 3192
MDY los 5 5 3312
MDY los 5 10 3312
MDY los 5 20 3312
MDY los 10 5 3312
MDY los 10 10 3312
MDY los 10 20 3274
MDY los 20 5 3284
MDY los 20 10 3312
MDY los 20 20 3262
OIH bh 5 20 3312
OIH bh 10 20 3312
OIH bh 20 20 3312
OIH loc 5 5 3312
OIH loc 5 10 3312
OIH loc 5 20 3294
OIH loc 10 5 3312
OIH loc 10 10 3312
OIH loc 10 20 3214
OIH loc 20 5 2966
OIH loc 20 10 3230
OIH loc 20 20 3200
OIH los 5 5 3312
OIH los 5 10 3312
OIH los 5 20 3303
OIH los 10 5 3312
OIH los 10 10 3312
OIH los 10 20 3280
OIH los 20 5 3292
OIH los 20 10 3264
OIH los 20 20 3264
SHY bh 5 20 3309
SHY bh 10 20 3310
SHY bh 20 20 3280
SHY loc 5 5 3309
SHY loc 5 10 3308
SHY loc 5 20 3308
SHY loc 10 5 3306
SHY loc 10 10 3274
SHY loc 10 20 3274
SHY loc 20 5 3280
SHY loc 20 10 3280
SHY loc 20 20 3280
SHY los 5 5 3312
SHY los 5 10 3308
SHY los 5 20 3308
SHY los 10 5 3306
SHY los 10 10 3274
SHY los 10 20 3274
SHY los 20 5 3280
SHY los 20 10 3280
TKR STRATEGY HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
SHY los 20 20 3280
SPY bh 5 20 3312
SPY bh 10 20 3312
SPY bh 20 20 3312
SPY loc 5 5 3312
SPY loc 5 10 3312
SPY loc 5 20 3312
SPY loc 10 5 3312
SPY loc 10 10 3312
SPY loc 10 20 3194
SPY loc 20 5 3312
SPY loc 20 10 3312
SPY loc 20 20 3202
SPY los 5 5 3312
SPY los 5 10 3312
SPY los 5 20 3312
SPY los 10 5 3312
SPY los 10 10 3312
SPY los 10 20 3284
SPY los 20 5 3312
SPY los 20 10 3312
SPY los 20 20 3270
XLB bh 5 20 3312
XLB bh 10 20 3312
XLB bh 20 20 3244
XLB loc 5 5 3242
XLB loc 5 10 3312
XLB loc 5 20 3214
XLB loc 10 5 3234
XLB loc 10 10 3312
XLB loc 10 20 3190
XLB loc 20 5 3192
XLB loc 20 10 3202
XLB loc 20 20 3194
XLB los 5 5 3304
XLB los 5 10 3312
XLB los 5 20 3306
XLB los 10 5 3306
XLB los 10 10 3312
XLB los 10 20 3280
XLB los 20 5 3294
XLB los 20 10 3290
XLB los 20 20 3284
XLE bh 5 20 3312
XLE bh 10 20 3312
XLE bh 20 20 3312
XLE loc 5 5 3312
XLE loc 5 10 3312
XLE loc 5 20 2940
XLE loc 10 5 3312
XLE loc 10 10 3312
XLE loc 10 20 2944
TKR STRATEGY HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
XLE loc 20 5 2946
XLE loc 20 10 3312
XLE loc 20 20 2944
XLE los 5 5 3312
XLE los 5 10 3312
XLE los 5 20 3298
XLE los 10 5 3312
XLE los 10 10 3312
XLE los 10 20 3280
XLE los 20 5 3294
XLE los 20 10 3312
XLE los 20 20 2944
XLF bh 5 20 3312
XLF bh 10 20 3312
XLF bh 20 20 3312
XLF loc 5 5 3312
XLF loc 5 10 3312
XLF loc 5 20 3194
XLF loc 10 5 3312
XLF loc 10 10 3312
XLF loc 10 20 3194
XLF loc 20 5 3312
XLF loc 20 10 3312
XLF loc 20 20 3012
XLF los 5 5 3312
XLF los 5 10 3312
XLF los 5 20 3284
XLF los 10 5 3312
XLF los 10 10 3312
XLF los 10 20 3284
XLF los 20 5 3312
XLF los 20 10 3312
XLF los 20 20 3270
XLI bh 5 20 3312
XLI bh 10 20 3312
XLI bh 20 20 3230
XLI loc 5 5 3312
XLI loc 5 10 3312
XLI loc 5 20 3312
XLI loc 10 5 3312
XLI loc 10 10 3312
XLI loc 10 20 3312
XLI loc 20 5 3220
XLI loc 20 10 3194
XLI loc 20 20 3194
XLI los 5 5 3312
XLI los 5 10 3312
XLI los 5 20 3312
XLI los 10 5 3312
XLI los 10 10 3312
XLI los 10 20 3312
XLI los 20 5 3284
TKR STRATEGY HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
XLI los 20 10 3282
XLI los 20 20 3284
XLK bh 5 20 3312
XLK bh 10 20 3312
XLK bh 20 20 3252
XLK loc 5 5 3312
XLK loc 5 10 3312
XLK loc 5 20 3294
XLK loc 10 5 3312
XLK loc 10 10 3312
XLK loc 10 20 3224
XLK loc 20 5 3234
XLK loc 20 10 3220
XLK loc 20 20 3214
XLK los 5 5 3312
XLK los 5 10 3312
XLK los 5 20 3303
XLK los 10 5 3312
XLK los 10 10 3312
XLK los 10 20 3284
XLK los 20 5 3284
XLK los 20 10 3284
XLK los 20 20 3272
XLV bh 5 20 3312
XLV bh 10 20 3312
XLV bh 20 20 3312
XLV loc 5 5 3312
XLV loc 5 10 3312
XLV loc 5 20 3312
XLV loc 10 5 3312
XLV loc 10 10 3312
XLV loc 10 20 3312
XLV loc 20 5 3312
XLV loc 20 10 3312
XLV loc 20 20 3312
XLV los 5 5 3312
XLV los 5 10 3312
XLV los 5 20 3312
XLV los 10 5 3312
XLV los 10 10 3312
XLV los 10 20 3312
XLV los 20 5 3312
XLV los 20 10 3312
XLV los 20 20 3312
252 rows selected.
Elapsed: 00:00:00.01
13:22:39 SQL>
13:22:39 SQL> -- Take a quick look at the predictions
13:22:39 SQL>
13:22:39 SQL> -- Verify that prob_up + prob_nup is 1
13:22:39 SQL> SELECT AVG(prob_up + prob_nup)FROM mas_predictions;
AVG(PROB_UP+PROB_NUP)
---------------------
1
Elapsed: 00:00:00.00
13:22:39 SQL> -- Verify with a ROUND of 4 significant digits
13:22:39 SQL> SELECT COUNT(*)FROM mas_predictions WHERE ROUND((prob_up+prob_nup),4) !=1 ;
COUNT(*)
----------
0
Elapsed: 00:00:00.00
13:22:39 SQL>
13:22:39 SQL> -- Count the predictions with confidence level rounded to 1 significant digit
13:22:39 SQL> SELECT
13:22:39 2 ROUND(prob_up,1)prob_up
13:22:39 3 ,ROUND(prob_nup,1)prob_nup
13:22:39 4 ,prediction
13:22:39 5 ,COUNT(prediction)
13:22:39 6 FROM
13:22:39 7 (
13:22:39 8 SELECT
13:22:39 9 REGEXP_SUBSTR(tkr_date,'...')tkr
13:22:39 10 ,strategy
13:22:39 11 ,holding_period
13:22:39 12 ,pivot_point
13:22:39 13 ,prediction
13:22:39 14 ,prob_up
13:22:39 15 ,prob_nup
13:22:39 16 ,run_date
13:22:39 17 FROM mas_predictions
13:22:39 18 )
13:22:39 19 GROUP BY prediction,ROUND(prob_up,1),ROUND(prob_nup,1)
13:22:39 20 ORDER BY prediction,ROUND(prob_up,1),ROUND(prob_nup,1)
13:22:39 21 /
PROB_UP PROB_NUP PREDICTION COUNT(PREDICTION)
---------- ---------- ----------- -----------------
0 1 nup 1800
.1 .9 nup 154948
.2 .8 nup 614280
.3 .7 nup 35103
.4 .6 nup 10727
.5 .5 nup 2873
.5 .5 up 2033
.6 .4 up 2530
.7 .3 up 1417
.8 .2 up 775
.9 .1 up 247
1 0 up 2
12 rows selected.
Elapsed: 00:00:00.00
13:22:39 SQL>
13:22:40 SQL>
13:22:41 SQL>
13:22:41 SQL>
We look at the above output and ask an obvious question, "How accurate are the 'up' predictions?" Percentage gain information for a specific combination of tkr, closing date, strategy, holding_period, and pivot_point is contained within a table described earlier in this paper: yfact. We join mas_predictions with yfact and then query the result:
15:06:37 SQL> @cr_mp_j_yfact.sql
15:07:02 SQL> --
15:07:02 SQL> -- cr_mp_j_yfact.sql
15:07:02 SQL> --
15:07:02 SQL>
15:07:02 SQL> -- Creates a view, mp_j_yfact, which joins mas_predictions with
15:07:02 SQL> -- yfact. This view helps us analyze the accuracy of SVM predictions
15:07:02 SQL> -- within the mas_predictions table.
15:07:02 SQL>
15:07:02 SQL> CREATE OR REPLACE VIEW mp_j_yfact AS SELECT
15:07:02 2 m.prediction
15:07:02 3 ,m.prob_up
15:07:02 4 ,y.pg -- pct_gain
15:07:02 5 ,y.day_id -- fk to day_dimension
15:07:02 6 ,y.tkr -- DIA, EFA, MDY, ...
15:07:02 7 ,y.strategy -- bh, loc, los
15:07:02 8 ,y.pivot_point -- 5,10,20
15:07:02 9 ,y.holding_period -- 5,10,20
15:07:02 10 FROM yfact y, mas_predictions m
15:07:02 11 WHERE REPLACE(m.tkr_date,'-') = y.tkr||y.day_id
15:07:02 12 AND y.strategy = m.strategy
15:07:02 13 AND y.pivot_point = m.pivot_point
15:07:02 14 AND y.holding_period = m.holding_period
15:07:02 15 /
View created.
Elapsed: 00:00:00.22
15:07:02 SQL>
15:07:02 SQL> -- Now query
15:07:02 SQL> SELECT COUNT(*)FROM mas_predictions;
COUNT(*)
----------
826735
Elapsed: 00:00:00.00
15:07:02 SQL> SELECT COUNT(*)FROM yfact;
COUNT(*)
----------
544644
Elapsed: 00:00:00.00
15:07:02 SQL> SELECT COUNT(*)FROM mp_j_yfact;
COUNT(*)
----------
826735
Elapsed: 00:00:03.78
15:07:06 SQL>
15:07:06 SQL> -- Look at data rolled up to highest level
15:07:06 SQL> SELECT holding_period,AVG(pg)avg_pct_gain,COUNT(*)FROM mp_j_yfact
15:07:06 2 GROUP BY holding_period ORDER BY holding_period
15:07:06 3 /
HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
-------------- ------------ ----------
5 -.01880193 277235
10 .0529347 276588
20 .153422813 272912
Elapsed: 00:00:03.24
15:07:09 SQL>
15:07:09 SQL> -- Drill down into prediction
15:07:09 SQL> SELECT prediction, holding_period,AVG(pg)avg_pct_gain,COUNT(*)FROM mp_j_yfact
15:07:09 2 GROUP BY prediction,holding_period ORDER BY prediction,holding_period
15:07:09 3 /
PREDICTION HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- -------------- ------------ ----------
nup 5 -.01533182 276021
nup 10 .062276532 274374
nup 20 .167687269 269336
up 5 -.80778346 1214
up 10 -1.104769 2214
up 20 -.92094285 3576
6 rows selected.
Elapsed: 00:00:03.25
15:07:12 SQL>
15:09:16 SQL>
15:09:17 SQL>
It's obvious from the last query that SVM cannot consistently predict if an ETF resides in the 'up' class. Compare the "up" class to the "nup" class. Look at the values for AVG_PCT_GAIN when PREDICTION is "up". They are all negative! The ETFs which were predicted by SVM to be "not-up" actually had higher values for AVG_PCT_GAIN. Perhaps it would do better if we gave SVM something more useful than slopes of moving averages but that is the topic of another paper. Next we display the results of a query which drills down into other dimensions of yfact.
15:21:05 SQL>
15:21:05 SQL> @qry_mp_j_yfact.sql
15:21:52 SQL> --
15:21:52 SQL> -- qry_mp_j_yfact.sql
15:21:52 SQL> --
15:21:52 SQL>
15:21:52 SQL> -- Drills down into yfact dimensions of view: mp_j_yfact
15:21:52 SQL>
15:21:52 SQL> SELECT
15:21:52 2 tkr,prediction,strategy,pivot_point,holding_period
15:21:52 3 ,ROUND(AVG(pg),3)avg_pct_gain
15:21:52 4 ,COUNT(*)FROM mp_j_yfact
15:21:52 5 GROUP BY tkr,prediction,strategy,pivot_point,holding_period
15:21:52 6 ORDER BY tkr,prediction,strategy,pivot_point,holding_period
15:21:52 7 /
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
DIA nup bh 20 5 .046 3305
DIA nup bh 20 10 .089 3287
DIA nup bh 20 20 .119 3127
DIA nup loc 5 5 -.037 3312
DIA nup loc 5 10 -.02 3308
DIA nup loc 5 20 -.002 3192
DIA nup loc 10 5 -.016 3312
DIA nup loc 10 10 .028 3304
DIA nup loc 10 20 .102 3187
DIA nup loc 20 5 .02 3312
DIA nup loc 20 10 .043 3312
DIA nup loc 20 20 .106 3180
DIA nup los 5 5 -.115 3294
DIA nup los 5 10 -.12 3305
DIA nup los 5 20 -.082 3207
DIA nup los 10 5 -.106 3257
DIA nup los 10 10 -.003 3258
DIA nup los 10 20 .158 3206
DIA nup los 20 5 -.012 3307
DIA nup los 20 10 .042 3292
DIA nup los 20 20 .152 3192
DIA up bh 20 5 1.952 7
DIA up bh 20 10 -.039 25
DIA up bh 20 20 -.4 117
DIA up loc 5 10 1.309 4
DIA up loc 5 20 .23 42
DIA up loc 10 10 1.071 8
DIA up loc 10 20 .145 33
DIA up loc 20 20 -.291 22
DIA up los 5 5 -1.754 18
DIA up los 5 10 -2.364 7
DIA up los 5 20 -2.724 77
DIA up los 10 5 1.32 55
DIA up los 10 10 -1.546 54
DIA up los 10 20 -3.03 78
DIA up los 20 5 .925 5
DIA up los 20 10 -7.344 20
DIA up los 20 20 -2.12 80
EFA nup bh 20 5 .14 3289
EFA nup bh 20 10 .24 3254
EFA nup bh 20 20 .46 3156
EFA nup loc 5 5 -.018 3311
EFA nup loc 5 10 .108 3310
EFA nup loc 5 20 .332 3187
EFA nup loc 10 5 .107 3244
EFA nup loc 10 10 .231 3305
EFA nup loc 10 20 .503 3178
EFA nup loc 20 5 .195 3210
EFA nup loc 20 10 .389 3196
EFA nup loc 20 20 .741 2938
EFA nup los 5 5 -.155 3299
EFA nup los 5 10 -.014 3302
EFA nup los 5 20 .296 3190
EFA nup los 10 5 .093 3262
EFA nup los 10 10 .249 3272
EFA nup los 10 20 .676 3170
EFA nup los 20 5 .29 3240
EFA nup los 20 10 .619 3167
EFA nup los 20 20 1.072 3177
EFA up bh 20 5 -2.645 23
EFA up bh 20 10 .052 58
EFA up bh 20 20 -.368 88
EFA up loc 5 5 -1.006 1
EFA up loc 5 10 -.168 2
EFA up loc 5 20 -.387 27
EFA up loc 10 10 -1.181 7
EFA up loc 10 20 -.191 46
EFA up loc 20 5 .019 4
EFA up loc 20 10 -.044 22
EFA up loc 20 20 .378 58
EFA up los 5 5 -.714 13
EFA up los 5 10 -2.401 10
EFA up los 5 20 -3.033 94
EFA up los 10 5 .318 46
EFA up los 10 10 -2.229 40
EFA up los 10 20 -3.074 114
EFA up los 20 5 -1.955 59
EFA up los 20 10 -2.766 117
EFA up los 20 20 -2.353 125
MDY nup bh 20 5 .104 3296
MDY nup bh 20 10 .189 3259
MDY nup bh 20 20 .372 3293
MDY nup loc 5 5 -.062 3312
MDY nup loc 5 10 .036 3312
MDY nup loc 5 20 .175 3190
MDY nup loc 10 5 .012 3312
MDY nup loc 10 10 .1 3312
MDY nup loc 10 20 .309 3303
MDY nup loc 20 5 .1 3312
MDY nup loc 20 10 .198 3187
MDY nup loc 20 20 .341 3192
MDY nup los 5 5 -.229 3312
MDY nup los 5 10 -.118 3307
MDY nup los 5 20 -.018 3265
MDY nup los 10 5 -.058 3299
MDY nup los 10 10 0 3300
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
MDY nup los 10 20 .297 3277
MDY nup los 20 5 .111 3278
MDY nup los 20 10 .22 3240
MDY nup los 20 20 .413 3225
MDY up bh 20 5 .367 16
MDY up bh 20 10 .646 53
MDY up bh 20 20 -1.939 19
MDY up loc 5 20 -3.671 2
MDY up loc 10 20 2.428 9
MDY up loc 20 10 .515 7
MDY up los 5 10 -4.499 5
MDY up los 5 20 -4.595 19
MDY up los 10 5 -5.891 13
MDY up los 10 10 1.156 12
MDY up los 10 20 -2.158 35
MDY up los 20 5 -1.426 34
MDY up los 20 10 -4.365 34
MDY up los 20 20 -5.573 37
OIH nup bh 20 5 .293 3299
OIH nup bh 20 10 .558 3310
OIH nup bh 20 20 1.165 3263
OIH nup loc 5 5 -.092 3312
OIH nup loc 5 10 .11 3305
OIH nup loc 5 20 .487 2954
OIH nup loc 10 5 .043 3312
OIH nup loc 10 10 .253 3312
OIH nup loc 10 20 .549 3226
OIH nup loc 20 5 .163 3294
OIH nup loc 20 10 .482 3214
OIH nup loc 20 20 .718 3200
OIH nup los 5 5 -.475 3312
OIH nup los 5 10 -.333 3303
OIH nup los 5 20 -.151 3264
OIH nup los 10 5 -.209 3310
OIH nup los 10 10 -.055 3309
OIH nup los 10 20 .096 3243
OIH nup los 20 5 .025 3298
OIH nup los 20 10 .414 3277
OIH nup los 20 20 .445 3258
OIH up bh 20 5 -.057 13
OIH up bh 20 10 3.035 2
OIH up bh 20 20 -3.981 49
OIH up loc 5 10 4.303 7
OIH up loc 5 20 4.917 12
OIH up loc 10 20 4.357 4
OIH up los 5 10 3.767 9
OIH up los 5 20 6.88 28
OIH up los 10 5 4.845 2
OIH up los 10 10 .569 3
OIH up los 10 20 6.468 21
OIH up los 20 5 9.603 5
OIH up los 20 10 6.899 3
OIH up los 20 20 33.461 6
SHY nup bh 20 5 .067 3222
SHY nup bh 20 10 .132 3145
SHY nup bh 20 20 .262 3112
SHY nup loc 5 5 .041 3281
SHY nup loc 5 10 .086 3269
SHY nup loc 5 20 .178 3222
SHY nup loc 10 5 .048 3276
SHY nup loc 10 10 .095 3230
SHY nup loc 10 20 .188 3221
SHY nup loc 20 5 .056 3238
SHY nup loc 20 10 .111 3201
SHY nup loc 20 20 .211 3203
SHY nup los 5 5 .017 3257
SHY nup los 5 10 .037 3225
SHY nup los 5 20 .096 3109
SHY nup los 10 5 .031 3265
SHY nup los 10 10 .057 3141
SHY nup los 10 20 .113 3026
SHY nup los 20 5 .045 3229
SHY nup los 20 10 .094 3093
SHY nup los 20 20 .167 3002
SHY up bh 20 5 .036 87
SHY up bh 20 10 .129 165
SHY up bh 20 20 .227 168
SHY up loc 5 5 .1 28
SHY up loc 5 10 .151 37
SHY up loc 5 20 .305 58
SHY up loc 10 5 .133 32
SHY up loc 10 10 .114 44
SHY up loc 10 20 .31 59
SHY up loc 20 5 .031 70
SHY up loc 20 10 .107 73
SHY up loc 20 20 .347 77
SHY up los 5 5 -.033 55
SHY up los 5 10 .208 81
SHY up los 5 20 .166 171
SHY up los 10 5 .027 43
SHY up los 10 10 .143 133
SHY up los 10 20 .222 254
SHY up los 20 5 -.008 79
SHY up los 20 10 .058 181
SHY up los 20 20 .176 278
SPY nup bh 20 5 .051 3280
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
SPY nup bh 20 10 .08 3292
SPY nup bh 20 20 .121 3276
SPY nup loc 5 5 -.093 3312
SPY nup loc 5 10 -.036 3312
SPY nup loc 5 20 0 3304
SPY nup loc 10 5 -.042 3312
SPY nup loc 10 10 .01 3307
SPY nup loc 10 20 .128 3302
SPY nup loc 20 5 .031 3312
SPY nup loc 20 10 .042 3188
SPY nup loc 20 20 .107 3196
SPY nup los 5 5 -.235 3308
SPY nup los 5 10 -.145 3295
SPY nup los 5 20 -.1 3276
SPY nup los 10 5 -.131 3281
SPY nup los 10 10 -.035 3260
SPY nup los 10 20 .179 3254
SPY nup los 20 5 .021 3309
SPY nup los 20 10 .061 3258
SPY nup los 20 20 .175 3222
SPY up bh 20 5 -.732 32
SPY up bh 20 10 -.514 20
SPY up bh 20 20 .333 36
SPY up loc 5 20 1.118 8
SPY up loc 10 10 0 5
SPY up loc 10 20 1.826 10
SPY up loc 20 10 .809 6
SPY up loc 20 20 2.149 6
SPY up los 5 5 4.898 4
SPY up los 5 10 -.701 17
SPY up los 5 20 -1.797 36
SPY up los 10 5 .314 31
SPY up los 10 10 -1.308 52
SPY up los 10 20 -1.832 58
SPY up los 20 5 -3.927 3
SPY up los 20 10 -6.439 26
SPY up los 20 20 -2.27 48
XLB nup bh 20 5 .147 3300
XLB nup bh 20 10 .265 3305
XLB nup bh 20 20 .508 3193
XLB nup loc 5 5 -.035 3241
XLB nup loc 5 10 -.035 3233
XLB nup loc 5 20 .029 3183
XLB nup loc 10 5 -.012 3309
XLB nup loc 10 10 -.007 3311
XLB nup loc 10 20 .139 3194
XLB nup loc 20 5 .071 3214
XLB nup loc 20 10 .145 3190
XLB nup loc 20 20 .292 3192
XLB nup los 5 5 -.232 3295
XLB nup los 5 10 -.381 3300
XLB nup los 5 20 -.418 3262
XLB nup los 10 5 -.172 3300
XLB nup los 10 10 -.285 3310
XLB nup los 10 20 -.185 3274
XLB nup los 20 5 -.002 3281
XLB nup los 20 10 .028 3259
XLB nup los 20 20 .137 3222
XLB up bh 20 5 -.926 12
XLB up bh 20 10 3.869 7
XLB up bh 20 20 .167 51
XLB up loc 5 5 .417 1
XLB up loc 5 10 2.879 1
XLB up loc 5 20 .116 9
XLB up loc 10 5 -1.311 3
XLB up loc 10 10 -1.514 1
XLB up loc 10 20 -3.931 8
XLB up loc 20 20 -4.59 2
XLB up los 5 5 .513 9
XLB up los 5 10 .075 6
XLB up los 5 20 .009 32
XLB up los 10 5 .417 12
XLB up los 10 10 -5.423 2
XLB up los 10 20 -5.155 16
XLB up los 20 5 -1.638 25
XLB up los 20 10 -3.95 21
XLB up los 20 20 -1.296 62
XLE nup bh 20 5 .315 3301
XLE nup bh 20 10 .606 3255
XLE nup bh 20 20 1.111 3255
XLE nup loc 5 5 .048 3312
XLE nup loc 5 10 .307 3304
XLE nup loc 5 20 .519 2939
XLE nup loc 10 5 .113 3312
XLE nup loc 10 10 .314 3302
XLE nup loc 10 20 .492 3295
XLE nup loc 20 5 .132 2939
XLE nup loc 20 10 .33 2938
XLE nup loc 20 20 .337 2930
XLE nup los 5 5 -.212 3311
XLE nup los 5 10 -.011 3284
XLE nup los 5 20 -.12 3280
XLE nup los 10 5 -.061 3292
XLE nup los 10 10 -.003 3255
XLE nup los 10 20 -.18 3292
XLE nup los 20 5 -.047 3269
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
XLE nup los 20 10 -.003 3250
XLE nup los 20 20 -.442 2927
XLE up bh 20 5 -1.435 11
XLE up bh 20 10 .712 57
XLE up bh 20 20 5.482 57
XLE up loc 5 10 -1.892 8
XLE up loc 5 20 4.135 7
XLE up loc 10 10 -1.82 10
XLE up loc 10 20 4.272 17
XLE up loc 20 5 3.619 1
XLE up loc 20 10 -1.908 6
XLE up loc 20 20 6.04 14
XLE up los 5 5 -2.513 1
XLE up los 5 10 .709 28
XLE up los 5 20 4.471 14
XLE up los 10 5 -3.915 20
XLE up los 10 10 .533 57
XLE up los 10 20 2.625 20
XLE up los 20 5 -.982 29
XLE up los 20 10 1.45 30
XLE up los 20 20 5.629 17
XLF nup bh 20 5 -.106 3304
XLF nup bh 20 10 -.254 3294
XLF nup bh 20 20 -.669 3289
XLF nup loc 5 5 -.286 3312
XLF nup loc 5 10 -.389 3312
XLF nup loc 5 20 -.518 3312
XLF nup loc 10 5 -.166 3312
XLF nup loc 10 10 -.299 3312
XLF nup loc 10 20 -.315 3312
XLF nup loc 20 5 -.054 3194
XLF nup loc 20 10 -.147 3194
XLF nup loc 20 20 -.228 3012
XLF nup los 5 5 -.455 3312
XLF nup los 5 10 -.51 3312
XLF nup los 5 20 -.29 3273
XLF nup los 10 5 -.204 3302
XLF nup los 10 10 -.244 3283
XLF nup los 10 20 .16 3261
XLF nup los 20 5 .019 3280
XLF nup los 20 10 .064 3238
XLF nup los 20 20 .354 3207
XLF up bh 20 5 -4.541 8
XLF up bh 20 10 -2.898 18
XLF up bh 20 20 1.511 23
XLF up los 5 20 -8.051 39
XLF up los 10 5 -3.605 10
XLF up los 10 10 -10.092 29
XLF up los 10 20 -8.629 51
XLF up los 20 5 -9.54 4
XLF up los 20 10 -7.469 46
XLF up los 20 20 -4.214 63
XLI nup bh 20 5 .044 3293
XLI nup bh 20 10 .075 3280
XLI nup bh 20 20 .108 3180
XLI nup loc 5 5 -.032 3310
XLI nup loc 5 10 .07 3312
XLI nup loc 5 20 .098 3219
XLI nup loc 10 5 .01 3312
XLI nup loc 10 10 .089 3311
XLI nup loc 10 20 .23 3188
XLI nup loc 20 5 .056 3312
XLI nup loc 20 10 .132 3311
XLI nup loc 20 20 .215 3190
XLI nup los 5 5 -.082 3280
XLI nup los 5 10 .051 3295
XLI nup los 5 20 .122 3238
XLI nup los 10 5 -.002 3292
XLI nup los 10 10 .154 3275
XLI nup los 10 20 .405 3231
XLI nup los 20 5 .107 3280
XLI nup los 20 10 .271 3253
XLI nup los 20 20 .378 3234
XLI up bh 20 5 -1.306 19
XLI up bh 20 10 -1.004 32
XLI up bh 20 20 -.588 50
XLI up loc 5 5 -1.531 2
XLI up loc 5 20 1.111 1
XLI up loc 10 10 1.671 1
XLI up loc 10 20 .277 6
XLI up loc 20 10 1.671 1
XLI up loc 20 20 2.307 4
XLI up los 5 5 -2.206 32
XLI up los 5 10 4.694 17
XLI up los 5 20 -5.369 46
XLI up los 10 5 -2.224 20
XLI up los 10 10 -3.346 37
XLI up los 10 20 -5.929 51
XLI up los 20 5 -3.248 32
XLI up los 20 10 -3.738 59
XLI up los 20 20 -5.085 50
XLK nup bh 20 5 .167 3226
XLK nup bh 20 10 .302 3225
XLK nup bh 20 20 .31 3216
XLK nup loc 5 5 -.082 3312
XLK nup loc 5 10 .022 3312
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
XLK nup loc 5 20 .058 3234
XLK nup loc 10 5 .009 3312
XLK nup loc 10 10 .095 3304
XLK nup loc 10 20 .121 3220
XLK nup loc 20 5 .038 3289
XLK nup loc 20 10 .065 3216
XLK nup loc 20 20 .044 3214
XLK nup los 5 5 -.237 3268
XLK nup los 5 10 -.17 3280
XLK nup los 5 20 -.074 3250
XLK nup los 10 5 -.096 3309
XLK nup los 10 10 -.019 3283
XLK nup los 10 20 .06 3243
XLK nup los 20 5 -.046 3296
XLK nup los 20 10 -.011 3233
XLK nup los 20 20 -.089 3240
XLK up bh 20 5 -1.901 86
XLK up bh 20 10 -3.131 87
XLK up bh 20 20 -2.014 36
XLK up loc 10 10 -4.783 8
XLK up loc 20 5 .907 5
XLK up loc 20 10 -5.372 8
XLK up los 5 5 -3.279 44
XLK up los 5 10 .113 32
XLK up los 5 20 -6.663 34
XLK up los 10 5 1.272 3
XLK up los 10 10 -3.075 29
XLK up los 10 20 -1.512 41
XLK up los 20 5 .576 7
XLK up los 20 10 -4.031 51
XLK up los 20 20 -1.561 32
XLV nup bh 20 5 .021 3297
XLV nup bh 20 10 .041 3257
XLV nup bh 20 20 .075 3279
XLV nup loc 5 5 -.024 3312
XLV nup loc 5 10 .063 3306
XLV nup loc 5 20 .03 3308
XLV nup loc 10 5 .056 3310
XLV nup loc 10 10 .056 3308
XLV nup loc 10 20 .061 3309
XLV nup loc 20 5 -.001 3312
XLV nup loc 20 10 -.03 3283
XLV nup loc 20 20 -.078 3298
XLV nup los 5 5 -.072 3309
XLV nup los 5 10 .087 3305
XLV nup los 5 20 -.024 3284
XLV nup los 10 5 .089 3294
XLV nup los 10 10 .072 3277
XLV nup los 10 20 .041 3235
XLV nup los 20 5 -.027 3310
XLV nup los 20 10 -.099 3272
XLV nup los 20 20 -.215 3257
XLV up bh 20 5 .736 15
XLV up bh 20 10 .246 55
XLV up bh 20 20 -.165 33
XLV up loc 5 10 1.076 6
XLV up loc 5 20 .528 4
XLV up loc 10 5 0 2
XLV up loc 10 10 0 4
XLV up loc 10 20 1.022 3
XLV up loc 20 10 .436 29
XLV up loc 20 20 -.641 14
XLV up los 5 5 -.769 3
XLV up los 5 10 -.259 7
XLV up los 5 20 1.471 28
XLV up los 10 5 .013 18
XLV up los 10 10 -.343 35
XLV up los 10 20 .484 77
XLV up los 20 5 .032 2
XLV up los 20 10 .195 40
XLV up los 20 20 -1.343 55
455 rows selected.
Elapsed: 00:00:05.16
15:21:57 SQL>
15:21:57 SQL>
Inspection of the above output suggests that SVM works well with the tkrs: OIH, and XLE when they are held for 20 days. We display that information below. Notice the values for AVG_PCT_GAIN when PREDICTION is "up":
15:33:48 SQL>
15:33:48 SQL>
15:33:48 SQL> SELECT
15:33:48 2 tkr,prediction,strategy,pivot_point,holding_period
15:33:48 3 ,ROUND(AVG(pg),3)avg_pct_gain
15:33:48 4 ,COUNT(*)FROM mp_j_yfact
15:33:48 5 WHERE tkr IN('OIH','XLE')AND holding_period=20
15:33:48 6 GROUP BY tkr,prediction,strategy,pivot_point,holding_period
15:33:48 7 ORDER BY tkr,prediction,strategy,pivot_point,holding_period
15:33:48 8 /
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
OIH nup bh 20 20 1.165 3263
OIH nup loc 5 20 .487 2954
OIH nup loc 10 20 .549 3226
OIH nup loc 20 20 .718 3200
OIH nup los 5 20 -.151 3264
OIH nup los 10 20 .096 3243
OIH nup los 20 20 .445 3258
OIH up bh 20 20 -3.981 49
OIH up loc 5 20 4.917 12
OIH up loc 10 20 4.357 4
OIH up los 5 20 6.88 28
OIH up los 10 20 6.468 21
OIH up los 20 20 33.461 6
XLE nup bh 20 20 1.111 3255
XLE nup loc 5 20 .519 2939
XLE nup loc 10 20 .492 3295
XLE nup loc 20 20 .337 2930
XLE nup los 5 20 -.12 3280
XLE nup los 10 20 -.18 3292
XLE nup los 20 20 -.442 2927
XLE up bh 20 20 5.482 57
XLE up loc 5 20 4.135 7
XLE up loc 10 20 4.272 17
XLE up loc 20 20 6.04 14
XLE up los 5 20 4.471 14
XLE up los 10 20 2.625 20
XLE up los 20 20 5.629 17
27 rows selected.
Elapsed: 00:00:00.92
15:33:49 SQL>
15:33:49 SQL>
15:33:49 SQL>
15:35:12 SQL>
15:35:12 SQL>
15:37:32 SQL>
15:37:32 SQL> -- Rollup
15:37:32 SQL> SELECT
15:37:32 2 prediction,strategy,pivot_point,holding_period
15:37:32 3 ,ROUND(AVG(pg),3)avg_pct_gain
15:37:32 4 ,COUNT(*)FROM mp_j_yfact
15:37:32 5 WHERE tkr IN('OIH','XLE')AND holding_period=20
15:37:32 6 GROUP BY prediction,strategy,pivot_point,holding_period
15:37:32 7 ORDER BY prediction,strategy,pivot_point,holding_period
15:37:32 8 /
PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- -------------- ------------ ----------
nup bh 20 20 1.138 6518
nup loc 5 20 .503 5893
nup loc 10 20 .52 6521
nup loc 20 20 .536 6130
nup los 5 20 -.135 6544
nup los 10 20 -.043 6535
nup los 20 20 .025 6185
up bh 20 20 1.108 106
up loc 5 20 4.629 19
up loc 10 20 4.288 21
up loc 20 20 6.04 14
up los 5 20 6.077 42
up los 10 20 4.593 41
up los 20 20 12.889 23
14 rows selected.
Elapsed: 00:00:00.91
15:37:33 SQL>
15:37:33 SQL>
15:37:33 SQL>
We offer ideas for improvement. First, the idea that some ETFs are more predictable is certainly worth study. This is easily implemented by assigning a unique integer to each tkr and then attaching those integers to the attribute list fed to SVM. Some syntax to move us in that direction is listed below:
12:09:21 SQL> @cr_tkr_int.sql
12:09:40 SQL> --
12:09:40 SQL> -- cr_tkr_int.sql
12:09:40 SQL> --
12:09:40 SQL>
12:09:40 SQL> DROP TABLE tkr_int;
Table dropped.
Elapsed: 00:00:00.71
12:09:40 SQL> CREATE TABLE tkr_int AS SELECT tkr_int, tkr FROM
12:09:40 2 (SELECT rownum tkr_int, tkr FROM(SELECT distinct tkr FROM mas ORDER BY tkr))
12:09:40 3 /
Table created.
Elapsed: 00:00:00.50
12:09:41 SQL> SELECT * FROM tkr_int;
TKR_INT TKR
---------- ----
1 DIA
2 EFA
3 MDY
4 OIH
5 SHY
6 SPY
7 XLB
8 XLE
9 XLF
10 XLI
11 XLK
12 XLV
12 rows selected.
Elapsed: 00:00:00.22
12:09:41 SQL>
12:09:44 SQL>
12:09:44 SQL>
Secondly, it is obvious that many SVM 'up' predictions actaully correspond with a negative percentage gain. And we notice that for some tkrs it is consistent behavior. If we have a correlation there, then that event is actually predictive. Perhaps we harvest this predictive power by running SVM a third time. What are the steps we need to follow to feed prob_up from runs 1 and 2 of SVM into run 3 as an attribute? The source of of prob_up is obvious, it is the mas_predictions table. Actually we have a small wrinkle there; what we really want is the average of prob_up from runs 1 and 2. This syntax gives a mas_predictions_avg table:
12:09:44 SQL>
12:09:44 SQL>
12:12:40 SQL> @cr_mas_predictions_avg.sql
12:12:52 SQL> --
12:12:52 SQL> -- cr_mas_predictions_avg.sql
12:12:52 SQL> --
12:12:52 SQL>
12:12:52 SQL> DROP TABLE mas_predictions_avg;
Table dropped.
Elapsed: 00:00:00.12
12:12:52 SQL> CREATE TABLE mas_predictions_avg AS SELECT
12:12:52 2 tkr_date
12:12:52 3 ,strategy
12:12:52 4 ,holding_period
12:12:52 5 ,pivot_point
12:12:52 6 ,prediction
12:12:52 7 ,AVG(prob_up)prob_up12
12:12:52 8 FROM mas_predictions
12:12:52 9 GROUP BY
12:12:52 10 tkr_date
12:12:52 11 ,strategy
12:12:52 12 ,holding_period
12:12:52 13 ,pivot_point
12:12:52 14 ,prediction
12:12:52 15 /
Table created.
Elapsed: 00:00:06.44
12:12:58 SQL>
We study the steps we followed to do runs 1 and 2 to construct run 3. Our current dependency chain is listed below: - mas_predictions_avg depends on - mas_predictions table depends on - cr_bme.sql script depends on - both sme, and bme views which depend on - mas table So we join tables mas_predictions_avg, tkr_int, and mas to feed both tkr_int and prob_up into a a third run of SVM. Some template syntax for every sme view is listed below:
15:36:40 SQL>
15:36:41 SQL>
15:36:41 SQL>
15:36:41 SQL> l
1 CREATE OR REPLACE view sme AS SELECT
2 NULL aa_targ_att
3 ,m.tkr_date
4 ,m.dayspast
5 ,m.yprice
6 ,m.yvol
7 ,m.yprice_avg5
8 ,m.yprice_avg10
9 ,m.yprice_avg20
10 ,m.yprice_avg5slp
11 ,m.yprice_avg10slp
12 ,m.yprice_avg20slp
13 ,m.shy_yprice_avg5slp
14 ,m.xle_yprice_avg5slp
15 ,m.shy_yprice_avg10slp
16 ,m.xle_yprice_avg10slp
17 ,m.shy_yprice_avg20slp
18 ,m.xle_yprice_avg20slp
19 ,p.prob_up12
20 ,t.tkr_int
21 FROM mas m, mas_predictions_avg p, tkr_int t
22 -- WHERE m.tkr='XLV'AND ydate='2009-03-31'
23 WHERE ydate='2009-03-31'
24 AND m.tkr_date = p.tkr_date
25 AND p.strategy = 'los'
26 AND p.holding_period = 20
27 AND p.pivot_point = 20
28* AND m.tkr = t.tkr
15:36:42 SQL> /
View created.
Elapsed: 00:00:00.04
15:36:43 SQL> select count(*)from sme;
COUNT(*)
----------
12
Elapsed: 00:00:00.15
15:36:53 SQL>
15:36:54 SQL>
15:36:54 SQL>
Notice the structure of the above template syntax is different than the structure for runs 1 and 2:
15:38:23 SQL>
15:38:23 SQL>
15:38:23 SQL> l
1 CREATE OR REPLACE view sme AS SELECT
2 NULL aa_targ_att
3 ,tkr_date
4 ,dayspast
5 ,yprice
6 ,yvol
7 ,yprice_avg5
8 ,yprice_avg10
9 ,yprice_avg20
10 ,yprice_avg5slp
11 ,yprice_avg10slp
12 ,yprice_avg20slp
13 ,shy_yprice_avg5slp
14 ,xle_yprice_avg5slp
15 ,shy_yprice_avg10slp
16 ,xle_yprice_avg10slp
17 ,shy_yprice_avg20slp
18 ,xle_yprice_avg20slp
19* FROM mas WHERE tkr='XLV'AND ydate='2009-03-31'
15:38:24 SQL> /
View created.
Elapsed: 00:00:00.04
15:38:25 SQL>
15:38:26 SQL>
15:38:26 SQL> select count ( * ) from sme ;
COUNT(*)
----------
1
Elapsed: 00:00:00.02
15:38:31 SQL>
15:38:32 SQL>
The differences are listed below: - run3-sme depends on a join of mas, mas_predictions_avg, and tkr_int - run3-sme returns 1 row for each tkr, 12 rows in all - run12-sme returns 1 row Since run3-sme returns 12 rows, we are giving SVM the ability to focus on the relative behaviors of the tkrs. In runs 1 and 2 we asked SVM to evaluate each tkr independently. So if OIH behavior is more predicitve than DIA, based on the tkr-attributes which SVM can access, SVM now has the ability to discover that relationship. For run3, the template syntax for every bme view is listed below:
16:01:49 SQL>
16:01:50 SQL>
16:01:50 SQL>
16:01:50 SQL> l
1 CREATE OR REPLACE view bme AS SELECT
2 CASE WHEN yprice_avg20slp<0 THEN
3 CASE WHEN pct_gain20<(SELECT AVG(pct_gain20)-STDDEV(pct_gain20)FROM mas)THEN'up'ELSE'nup'END
4 ELSE
5 CASE WHEN pct_gain20>(SELECT AVG(pct_gain20)+STDDEV(pct_gain20)FROM mas)THEN'up'ELSE'nup'END
6 END aa_targ_att
7 ,m.tkr_date
8 ,m.dayspast
9 ,m.yprice
10 ,m.yvol
11 ,m.yprice_avg5
12 ,m.yprice_avg10
13 ,m.yprice_avg20
14 ,m.yprice_avg5slp
15 ,m.yprice_avg10slp
16 ,m.yprice_avg20slp
17 ,m.shy_yprice_avg5slp
18 ,m.xle_yprice_avg5slp
19 ,m.shy_yprice_avg10slp
20 ,m.xle_yprice_avg10slp
21 ,m.shy_yprice_avg20slp
22 ,m.xle_yprice_avg20slp
23 ,p.prob_up12
24 ,t.tkr_int
25 FROM mas m, mas_predictions_avg p, tkr_int t
26 -- WHERE m.tkr='XLV'AND 5+20+ydate<'2009-03-31'
27 WHERE 5+20+ydate<'2009-03-31'
28 AND m.tkr_date = p.tkr_date
29 AND p.strategy = 'los'
30 AND p.holding_period = 20
31 AND p.pivot_point = 20
32* AND m.tkr = t.tkr
16:01:51 SQL>
16:01:52 SQL>
16:01:52 SQL> /
View created.
Elapsed: 00:00:00.06
16:01:58 SQL>
16:02:00 SQL>
16:02:01 SQL> select tkr_int,count(*)from bme group by tkr_int order by tkr_int ;
TKR_INT COUNT(*)
---------- ----------
1 1620
2 1653
3 1614
4 1614
5 1647
6 1617
7 1638
8 1469
9 1622
10 1624
11 1618
12 1641
12 rows selected.
Elapsed: 00:00:00.46
16:02:17 SQL>
16:02:18 SQL>
Next we jump up a level and display the ruby script used to generate the large amount of SQL syntax required to complete run3:
#!/usr/bin/env ruby
# mas_run3.rb
# Demo: ruby -r mas_run3.rb -e 'MasRun3.new.write_score_sql_scripts'
# Ruby script which uses nested loops to generate several SQL scripts.
# Each script is a wrapper for the SVM algorithm resident within Oracle Data Miner.
class MasRun3
require 'rubygems'
require 'oci8'
def write_score_sql_scripts
# pivot_points_a holds integers representing a type of moving average slope.
# Some strategies (loc, los) depend upon a pivot point.
# For example, The loc-Trader will watch the moving average slope. Once it moves above zero, he will go Long.
# Once it moves below zero, he will sell so that he holds cash.
pivot_points_a = ["5","10","20"]
# holding_periods_a holds integers representing days a tkr is held before its percentage gain is measured.
# holding_periods_a = ["5","10","20"]
# Now, We only like 20 day holding period.
holding_periods_a = ["20"]
# strategies_a holds a list of acronyms for strategies.
# Some of them are:
# bh = Buy and Hold aka Long
# loc = Long Or Cash
# los = Long Or Short
strategies_a = ['loc','los']
# For each strategy, build-model, and then score.
# The score will give us the probability that the pct-gain will be above a threshold dependent on std. deviation.
strategies_a.each do |strategy|
pivot_points_a.each do |pp|
holding_periods_a.each do |hp|
p("strategy: #{strategy} pivot_point: #{pp} holding_period: #{hp}")
sqlscript_name = "mas_run3_#{strategy}_#{pp}_#{hp}_score.sql"
(fhw = File.open(sqlscript_name,"w")).write("--\n-- #{sqlscript_name}\n--\n\n"); fhw.close
some_dates_a = []
dates_sql = "SELECT DISTINCT TO_CHAR(ydate,'YYYY-MM-DD')FROM etfmas WHERE ydate>TO_DATE('2002-09-01','YYYY-MM-DD')AND ydate<TO_DATE('2009-04-01','YYYY-MM-DD')"
OCI8.new('trade','t').exec(dates_sql){ |r| some_dates_a << r.to_s }
some_dates_a.sort.each do |adate|
sqlplus_syntax1 = <<ENDsqlplus_syntax1
CREATE OR REPLACE view sme AS SELECT
NULL aa_targ_att
,m.tkr_date
,m.dayspast
,m.yprice
,m.yvol
,m.yprice_avg5
,m.yprice_avg10
,m.yprice_avg20
,m.yprice_avg5slp
,m.yprice_avg10slp
,m.yprice_avg20slp
,m.shy_yprice_avg5slp
,m.xle_yprice_avg5slp
,m.shy_yprice_avg10slp
,m.xle_yprice_avg10slp
,m.shy_yprice_avg20slp
,m.xle_yprice_avg20slp
,p.prob_up12
,t.tkr_int
FROM mas m, mas_predictions_avg p, tkr_int t
WHERE ydate='#{adate}'
AND m.tkr_date = p.tkr_date
AND p.strategy = '#{strategy}'
AND p.holding_period = #{hp}
AND p.pivot_point = #{pp}
AND m.tkr = t.tkr
/
CREATE OR REPLACE view bme AS SELECT
ENDsqlplus_syntax1
# Here the aa_targ_att column depends on my choice of strategy
case strategy
when "loc"
aa_targ_att_syntax = <<ENDloc_syntax
CASE WHEN yprice_avg#{pp}slp<0 THEN'nup'
ELSE
CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas)THEN'up'ELSE'nup'END
END aa_targ_att
ENDloc_syntax
when "los"
aa_targ_att_syntax = <<ENDlos_syntax
CASE WHEN yprice_avg#{pp}slp<0 THEN
CASE WHEN pct_gain#{hp}<(SELECT AVG(pct_gain#{hp})-STDDEV(pct_gain#{hp})FROM mas)THEN'up'ELSE'nup'END
ELSE
CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas)THEN'up'ELSE'nup'END
END aa_targ_att
ENDlos_syntax
end # case strategy
# Mix together the syntax
sqlplus_syntax = <<ENDsqlplus_syntax
#{sqlplus_syntax1}#{aa_targ_att_syntax},m.tkr_date
,m.dayspast
,m.yprice
,m.yvol
,m.yprice_avg5
,m.yprice_avg10
,m.yprice_avg20
,m.yprice_avg5slp
,m.yprice_avg10slp
,m.yprice_avg20slp
,m.shy_yprice_avg5slp
,m.xle_yprice_avg5slp
,m.shy_yprice_avg10slp
,m.xle_yprice_avg10slp
,m.shy_yprice_avg20slp
,m.xle_yprice_avg20slp
,p.prob_up12
,t.tkr_int
FROM mas m, mas_predictions_avg p, tkr_int t
WHERE 5+#{hp}+ydate<'#{adate}'
AND m.tkr_date = p.tkr_date
AND p.strategy = '#{strategy}'
AND p.holding_period = #{hp}
AND p.pivot_point = #{pp}
AND m.tkr = t.tkr
/
ALTER SYSTEM FLUSH SHARED_POOL;
@cr_bme_run3.sql #{strategy} #{hp} #{pp}
PURGE RECYCLEBIN;
--
ENDsqlplus_syntax
#Use cr_bme_run3.sql to build a model from bme table and then score the sme table.
(fha = File.open(sqlscript_name,"a")).write(sqlplus_syntax); fha.close
end # some_dates_a.each
end # holding_periods_a.each
end # pivot_points_a.each
end # strategies_a.each
end # def write_score_sql_scripts
end # class Mas
MasRun3.new.write_score_sql_scripts
# end of ruby script
The generated SQL scripts are displayed below:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ ll mas_run3*l
-rw-r--r-- 1 oracle dba 2190267 2009-05-29 21:31 mas_run3_loc_10_20_score.sql
-rw-r--r-- 1 oracle dba 2190262 2009-05-29 21:31 mas_run3_loc_20_20_score.sql
-rw-r--r-- 1 oracle dba 2183629 2009-05-29 21:31 mas_run3_loc_5_20_score.sql
-rw-r--r-- 1 oracle dba 2339482 2009-05-29 21:31 mas_run3_los_10_20_score.sql
-rw-r--r-- 1 oracle dba 2339482 2009-05-29 21:31 mas_run3_los_20_20_score.sql
-rw-r--r-- 1 oracle dba 2332849 2009-05-29 21:31 mas_run3_los_5_20_score.sql
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
The above six SQL scripts are brought under the control of one shell script:
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ cat mas_run3.sh
#!/bin/sh
# mas_run3.sh
set -x
cd /oracle/vlsql/md/etfmas/
. /oracle/.orcl
echo exit >> mas_run3_loc_10_20_score.sql
echo exit >> mas_run3_loc_20_20_score.sql
echo exit >> mas_run3_loc_5_20_score.sql
echo exit >> mas_run3_los_10_20_score.sql
echo exit >> mas_run3_los_20_20_score.sql
echo exit >> mas_run3_los_5_20_score.sql
sqlplus trade/t @mas_run3_loc_5_20_score.sql
sqlplus trade/t @mas_run3_loc_10_20_score.sql
sqlplus trade/t @mas_run3_loc_20_20_score.sql
sqlplus trade/t @mas_run3_los_5_20_score.sql
sqlplus trade/t @mas_run3_los_10_20_score.sql
sqlplus trade/t @mas_run3_los_20_20_score.sql
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
After the shell script ran, we inspected its duration:
19:03:22 SQL>
19:03:43 SQL>
19:03:43 SQL> select min( run_date ), max( run_date ), 24 * (( max( run_date )-min( run_date ))) from mas_predictions_run3 ;
MIN(RUN_DA MAX(RUN_DA 24*((MAX(RUN_DATE)-MIN(RUN_DATE)))
---------- ---------- ----------------------------------
2009-05-29 2009-05-30 17.3544444
Elapsed: 00:00:00.00
19:03:49 SQL>
19:03:49 SQL>
19:03:50 SQL>
The duration of 17.35 hours is significantly less than the durations of run1 and run2. We have 2 reasons for this: 1. We only call SVM for a holding period of 20 days instead of 5, 10, 20 days. 2. Each call to SVM in run3 processes 12 rows instead of just 1 row so run1 and run2 called SVM 72 times more than run3 did. We show the predictions and probabilities which SVM loaded into mas_predictions_run3 for us:
19:18:49 SQL>
19:18:49 SQL>
19:18:49 SQL>
19:18:49 SQL> -- Count the predictions with confidence level rounded to 1 significant digit
19:18:49 SQL> SELECT
19:18:49 2 ROUND(prob_up,1)prob_up
19:18:49 3 ,ROUND(prob_nup,1)prob_nup
19:18:49 4 ,prediction
19:18:49 5 ,COUNT(prediction)
19:18:49 6 FROM
19:18:49 7 (
19:18:49 8 SELECT
19:18:49 9 REGEXP_SUBSTR(tkr_date,'...')tkr
19:18:49 10 ,strategy
19:18:49 11 ,holding_period
19:18:49 12 ,pivot_point
19:18:49 13 ,prediction
19:18:49 14 ,prob_up
19:18:49 15 ,prob_nup
19:18:49 16 ,run_date
19:18:49 17 FROM mas_predictions_run3
19:18:49 18 )
19:18:49 19 GROUP BY prediction,ROUND(prob_up,1),ROUND(prob_nup,1)
19:18:49 20 ORDER BY prediction,ROUND(prob_up,1),ROUND(prob_nup,1)
19:18:49 21 /
PROB_UP PROB_NUP PREDICTION COUNT(PREDICTION)
---------- ---------- ----------- -----------------
0 1 nup 3216
.1 .9 nup 61831
.2 .8 nup 40043
.3 .7 nup 5534
.4 .6 nup 2016
.5 .5 nup 640
.5 .5 up 505
.6 .4 up 739
.7 .3 up 427
.8 .2 up 251
.9 .1 up 77
1 0 up 1
12 rows selected.
Elapsed: 00:00:00.07
19:18:49 SQL>
19:18:49 SQL>
19:18:49 SQL>
Next, we show how well the run3 predictions did.
19:18:49 SQL>
19:26:53 SQL>
19:26:53 SQL> @cr_mp3_j_yfact.sql
19:27:04 SQL> --
19:27:04 SQL> -- cr_mp3_j_yfact.sql
19:27:04 SQL> --
19:27:04 SQL>
19:27:04 SQL> -- Creates a view, mp3_j_yfact, which joins mas_predictions_run3 with
19:27:04 SQL> -- yfact. This view helps us analyze the accuracy of SVM predictions
19:27:04 SQL> -- within the mas_predictions_run3 table.
19:27:04 SQL>
19:27:04 SQL> CREATE OR REPLACE VIEW mp3_j_yfact AS SELECT
19:27:04 2 m.prediction
19:27:04 3 ,m.prob_up
19:27:04 4 ,y.pg -- pct_gain
19:27:04 5 ,y.day_id -- fk to day_dimension
19:27:04 6 ,y.tkr -- DIA, EFA, MDY, ...
19:27:04 7 ,y.strategy -- bh, loc, los
19:27:04 8 ,y.pivot_point -- 5,10,20
19:27:04 9 ,y.holding_period -- 5,10,20
19:27:04 10 FROM yfact y, mas_predictions_run3 m
19:27:04 11 WHERE REPLACE(m.tkr_date,'-') = y.tkr||y.day_id
19:27:04 12 AND y.strategy = m.strategy
19:27:04 13 AND y.pivot_point = m.pivot_point
19:27:04 14 AND y.holding_period = m.holding_period
19:27:04 15 /
View created.
Elapsed: 00:00:00.08
19:27:04 SQL>
19:27:04 SQL> -- Now query
19:27:04 SQL> SELECT COUNT(*)FROM mas_predictions_run3;
COUNT(*)
----------
115280
Elapsed: 00:00:00.02
19:27:04 SQL> SELECT COUNT(*)FROM yfact;
COUNT(*)
----------
544644
Elapsed: 00:00:00.45
19:27:04 SQL> SELECT COUNT(*)FROM mp3_j_yfact;
COUNT(*)
----------
115280
Elapsed: 00:00:02.79
19:27:07 SQL>
19:27:07 SQL> -- Look at data rolled up to highest level
19:27:07 SQL> SELECT holding_period,AVG(pg)avg_pct_gain,COUNT(*)FROM mp3_j_yfact
19:27:07 2 GROUP BY holding_period ORDER BY holding_period
19:27:07 3 /
HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
-------------- ------------ ----------
20 .133084439 115280
Elapsed: 00:00:03.22
19:27:10 SQL>
19:27:10 SQL> -- Drill down into prediction
19:27:10 SQL> SELECT prediction, holding_period,AVG(pg)avg_pct_gain,COUNT(*)FROM mp3_j_yfact
19:27:10 2 GROUP BY prediction,holding_period ORDER BY prediction,holding_period
19:27:10 3 /
PREDICTION HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- -------------- ------------ ----------
nup 20 .153713598 113280
up 20 -1.0353511 2000
Elapsed: 00:00:03.11
19:27:13 SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
The results of the last query above tell us all that we need to know about run3. Notice we get a negative value for AVG_PCT_GAIN when PREDICTION is "up". The idea of "chaining" predictions from run1 and run2 into run3 does not work here. To drive the point home we display more results below:
20:40:27 SQL> @qry_mp3_j_yfact.sql
20:40:37 SQL> --
20:40:37 SQL> -- qry_mp3_j_yfact.sql
20:40:37 SQL> --
20:40:37 SQL>
20:40:37 SQL> -- Drills down into yfact dimensions of view: mp3_j_yfact
20:40:37 SQL>
20:40:37 SQL> SELECT
20:40:37 2 tkr,prediction,strategy,pivot_point,holding_period
20:40:37 3 ,ROUND(AVG(pg),3)avg_pct_gain
20:40:37 4 ,COUNT(*)FROM mp3_j_yfact
20:40:37 5 GROUP BY tkr,prediction,strategy,pivot_point,holding_period
20:40:37 6 ORDER BY tkr,prediction,strategy,pivot_point,holding_period
20:40:37 7 /
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
DIA nup loc 5 20 .006 1616
DIA nup loc 10 20 .103 1611
DIA nup loc 20 20 .132 1488
DIA nup los 5 20 -.131 1622
DIA nup los 10 20 .084 1598
DIA nup los 20 20 .097 1592
DIA up loc 5 20 -7.506 1
DIA up loc 20 20 .808 8
DIA up los 5 20 .545 16
DIA up los 10 20 .953 41
DIA up los 20 20 .886 42
EFA nup loc 5 20 .331 1597
EFA nup loc 10 20 .493 1616
EFA nup loc 20 20 .718 1494
EFA nup los 5 20 .194 1625
EFA nup los 10 20 .588 1585
EFA nup los 20 20 1.011 1583
EFA up loc 5 20 -.672 13
EFA up loc 10 20 -.91 2
EFA up loc 20 20 .896 10
EFA up los 5 20 .501 28
EFA up los 10 20 -1.027 68
EFA up los 20 20 -.824 70
MDY nup loc 5 20 .177 1596
MDY nup loc 10 20 .3 1610
MDY nup loc 20 20 .31 1488
MDY nup los 5 20 -.012 1623
MDY nup los 10 20 .255 1586
MDY nup los 20 20 .358 1593
MDY up loc 5 20 -6.925 2
MDY up loc 10 20 .36 3
MDY up loc 20 20 2.158 8
MDY up los 5 20 -1.881 19
MDY up los 10 20 -.601 53
MDY up los 20 20 -.135 39
OIH nup loc 5 20 .509 1463
OIH nup loc 10 20 .549 1578
OIH nup loc 20 20 .826 1457
OIH nup los 5 20 .047 1587
OIH nup los 10 20 .241 1519
OIH nup los 20 20 .56 1527
OIH up loc 5 20 .223 20
OIH up loc 10 20 .418 34
OIH up loc 20 20 1.918 39
OIH up los 5 20 -2.423 52
OIH up los 10 20 -1.336 114
OIH up los 20 20 -.293 105
SHY nup loc 5 20 .183 1617
SHY nup loc 10 20 .195 1613
SHY nup loc 20 20 .219 1497
SHY nup los 5 20 .1 1644
SHY nup los 10 20 .124 1640
SHY nup los 20 20 .169 1645
SHY up los 5 20 -.158 4
SHY up los 10 20 .032 13
SHY up los 20 20 .095 15
SPY nup loc 5 20 0 1617
SPY nup loc 10 20 .104 1612
SPY nup loc 20 20 .101 1489
SPY nup los 5 20 -.083 1622
SPY nup los 10 20 .128 1605
SPY nup los 20 20 .148 1573
SPY up loc 20 20 1.231 7
SPY up los 5 20 -2.386 15
SPY up los 10 20 -.076 34
SPY up los 20 20 .375 59
XLB nup loc 5 20 .028 1598
XLB nup loc 10 20 .114 1604
XLB nup loc 20 20 .266 1483
XLB nup los 5 20 -.334 1605
XLB nup los 10 20 -.077 1570
XLB nup los 20 20 .224 1581
XLB up loc 5 20 1.148 3
XLB up loc 10 20 7.558 1
XLB up loc 20 20 1.6 15
XLB up los 5 20 -3.297 40
XLB up los 10 20 -3.584 73
XLB up los 20 20 -1.717 65
XLE nup loc 5 20 .525 1463
XLE nup loc 10 20 .554 1594
XLE nup loc 20 20 .387 1453
XLE nup los 5 20 -.015 1591
XLE nup los 10 20 -.117 1573
XLE nup los 20 20 -.264 1432
XLE up loc 5 20 .483 11
XLE up loc 10 20 -1.312 19
XLE up loc 20 20 1.702 31
XLE up los 5 20 -1.631 53
XLE up los 10 20 -.785 76
XLE up los 20 20 -2.345 55
XLF nup loc 5 20 -.529 1617
XLF nup loc 10 20 -.347 1611
XLF nup loc 20 20 -.277 1490
XLF nup los 5 20 -.315 1601
XLF nup los 10 20 .076 1566
XLF nup los 20 20 .405 1578
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
XLF up loc 10 20 .867 1
XLF up loc 20 20 2.489 6
XLF up los 5 20 -2.931 39
XLF up los 10 20 -1.537 78
XLF up los 20 20 -1.205 59
XLI nup loc 5 20 .099 1611
XLI nup loc 10 20 .23 1597
XLI nup loc 20 20 .206 1487
XLI nup los 5 20 -.055 1615
XLI nup los 10 20 .305 1572
XLI nup los 20 20 .286 1572
XLI up loc 20 20 .386 9
XLI up los 5 20 8.292 22
XLI up los 10 20 .507 66
XLI up los 20 20 1.25 60
XLK nup loc 5 20 .058 1617
XLK nup loc 10 20 .121 1610
XLK nup loc 20 20 .101 1491
XLK nup los 5 20 .009 1606
XLK nup los 10 20 .353 1562
XLK nup los 20 20 .06 1592
XLK up loc 20 20 1.018 5
XLK up los 5 20 -5.807 35
XLK up los 10 20 -5.534 76
XLK up los 20 20 -4.506 40
XLV nup loc 5 20 .026 1617
XLV nup loc 10 20 .044 1612
XLV nup loc 20 20 -.072 1496
XLV nup los 5 20 .067 1611
XLV nup los 10 20 .136 1583
XLV nup los 20 20 -.159 1591
XLV up los 5 20 -2.745 26
XLV up los 10 20 -1.936 58
XLV up los 20 20 -1.348 44
130 rows selected.
Elapsed: 00:00:03.20
20:40:40 SQL>
20:40:40 SQL>
20:40:40 SQL> SELECT
20:40:40 2 tkr,prediction,strategy,pivot_point,holding_period
20:40:40 3 ,ROUND(AVG(pg),3)avg_pct_gain
20:40:40 4 ,COUNT(*)FROM mp3_j_yfact
20:40:40 5 WHERE tkr IN('OIH','XLE')AND holding_period=20
20:40:40 6 GROUP BY tkr,prediction,strategy,pivot_point,holding_period
20:40:40 7 ORDER BY tkr,prediction,strategy,pivot_point,holding_period
20:40:40 8 /
TKR PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
OIH nup loc 5 20 .509 1463
OIH nup loc 10 20 .549 1578
OIH nup loc 20 20 .826 1457
OIH nup los 5 20 .047 1587
OIH nup los 10 20 .241 1519
OIH nup los 20 20 .56 1527
OIH up loc 5 20 .223 20
OIH up loc 10 20 .418 34
OIH up loc 20 20 1.918 39
OIH up los 5 20 -2.423 52
OIH up los 10 20 -1.336 114
OIH up los 20 20 -.293 105
XLE nup loc 5 20 .525 1463
XLE nup loc 10 20 .554 1594
XLE nup loc 20 20 .387 1453
XLE nup los 5 20 -.015 1591
XLE nup los 10 20 -.117 1573
XLE nup los 20 20 -.264 1432
XLE up loc 5 20 .483 11
XLE up loc 10 20 -1.312 19
XLE up loc 20 20 1.702 31
XLE up los 5 20 -1.631 53
XLE up los 10 20 -.785 76
XLE up los 20 20 -2.345 55
24 rows selected.
Elapsed: 00:00:00.34
20:40:41 SQL>
20:40:41 SQL> -- Rollup
20:40:41 SQL> SELECT
20:40:41 2 prediction,strategy,pivot_point,holding_period
20:40:41 3 ,ROUND(AVG(pg),3)avg_pct_gain
20:40:41 4 ,COUNT(*)FROM mp3_j_yfact
20:40:41 5 WHERE tkr IN('OIH','XLE')AND holding_period=20
20:40:41 6 GROUP BY prediction,strategy,pivot_point,holding_period
20:40:41 7 ORDER BY prediction,strategy,pivot_point,holding_period
20:40:41 8 /
PREDICTION STRATEGY PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN COUNT(*)
----------- ----------- ----------- -------------- ------------ ----------
nup loc 5 20 .517 2926
nup loc 10 20 .552 3172
nup loc 20 20 .607 2910
nup los 5 20 .016 3178
nup los 10 20 .059 3092
nup los 20 20 .161 2959
up loc 5 20 .315 31
up loc 10 20 -.202 53
up loc 20 20 1.822 70
up los 5 20 -2.023 105
up los 10 20 -1.116 190
up los 20 20 -.998 160
12 rows selected.
Elapsed: 00:00:00.14
20:40:41 SQL>
20:40:41 SQL>
20:40:41 SQL>
Notice that we usually get negative values for AVG_PCT_GAIN when PREDICTION is "up". So, the results of the last query above tell us all that we need to know about run3. The idea of feeding predictions from run1 and run2 into run3 does not work.
Conclusion
The Analytic Workspace Manager provided by Oracle is a useful tool for visualizing the effectiveness of some simple trading strategies for ETFs. The SVM algorithm as implemented within the the Oracle RDBMS is a powerful tool for classification of data. But the algorithm needs a proper set of predictor attributes in order to classify future price movements of ETFs. A set of predictor attributes may be easily obtained from the slopes of moving averages of ETFs. When these attributes are fed to SVM, it does have the ability to flag positive price movements over a 20 day span for the ETFs, OIH and XLE. For other ETFs such as SPY and DIA, the correlation between the slopes of moving averages and future price movements is very weak. For these other ETFs, SVM lacks the ability to accurately predict their future price movements from the slopes of moving averages. For the database programmer, this paper could be used as a source of Programming Techniques/Syntax: Demo of wget Command Line Demo of 11g Advanced Compression Feature http://www.google.com/search?q=11g+Advanced+Compression+site:oracle.com Shell Script Demo of Inline SQL Demo of SQL*Loader Command Line Simple Demo of AWK to Prepare Flat Files for SQL*Loader Moving Avg Slope via Inline View and Analytic Functions Alter Size of a Table Column Subquery to Feed 2 Values to BETWEEN Predicate Simple FOR LOOP in SQL Demo of CASE WHEN ELSE END in SQL Demo of Starting AWM From jar File Demo of Using Ruby to Generate Several Shell Scripts mas_bh.rb, Ruby Script to Generate SQL Scripts for "bh" Strategy Demo of the OCI8 Ruby-Oracle Interface bh SQL Syntax Generated From Ruby Script mas.rb, Ruby Script to Generate Many Large SQL Scripts los and loc SQL Syntax Generated From Ruby Script Passing Command Line Values to SQL*Plus Variables Various PL/SQL Examples Demo of SQL PREDICTION() Function Demo of SQL PREDICTION_PROBABILITY() Function SQL PREDICTION() Function Documentation REGEXP_SUBSTR SQL Function to Match First 3 Characters Ruby Script to Generate Run3 Syntax