Mixing Oracle and Ruby With ETF Price Data
Dan Bikle -- http://bikle.com -- 2009-06-15

Topics

Programming Techniques/Syntax

Motivation

A stock market investor might ask simple questions:
  - Should I react to the slope of the moving average of the price of my stock holdings?
  - Or, when I buy a stock, should I just hold it until I need the money?

A database programmer may use this paper as a collection of techniques:
  - Acquisition of trading data
  - Authoring SQL-Analytic functions
  - Implementing both Star and Snow-Flake schemas
  - Mapping fact-tables, Star and Snow-Flake schemas to OLAP cubes and dimensions
  - Operating Oracle Analytic Workspace Manager (AWM)
  - Operating the Measure Data Viewer in AWM
  - Using Ruby to both execute and generate SQL
  - Operating the SVM algorithm within Oracle
  - Interpreting results from the SVM algorithm

Trading Strategies

Specifically, we look at three trading strategies and how they might
be combined with the slopes of moving averages of ETF prices.

  - Buy and Hold (bh)
  - Long or Cash (loc)
  - Long or Short (los)

Buy and Hold means that we buy and hold a long position in a specific
Exchange Traded Fund (ETF) for a period of time.

Long or Cash means that we buy and hold a long position in a specific
ETF for a period of time, or we hold cash for that period of time.

Long or Short means that we buy and hold a long position in a specific
ETF for a period of time, or we hold a short position in that ETF for
that period of time.

Pivot-Point and Holding-Period

In each of the strategies we explore 2 degrees of freedom:

  - The pivot-point, or the amount of time embedded within the moving
    average slope (5, 10, or 20 days)

  - The holding-period, or amount of time an ETF position is held
    before the position is evaluated (5, 10, or 20 days)

We make note that pivot-point does not apply to the bh-strategy.

los Example


If for example we want to model the los strategy we pick a pivot-point
of say 20 days.  Next, we pick a specific ETF, DIA for example.

Then, we pick a holding period of say, 10 days.  Next, we
pick a starting date, 2006-01-02 maybe.  Then for that
date we inspect the slope of the 20-day moving average of DIA on
2006-01-02.  If the slope is positive we simulate a buy of 1
share of DIA at the closing price os 2006-01-02 otherwise we 
simulate a short of 1 share of DIA.

Next, we look at the closing price of DIA 10-trading-days after
2006-01-02 which is 2006-01-16.  Then, we compute the percentage
gain or loss for that simulated scenario and store it in a database table.

The above example suggests that the columns of the table has these names:
  -strategy (this holds the value 'los')
  -pivot_point
  -holding_period
  -tkr (this holds the value 'DIA')
  -ydate (date of the simulated buy)
  -pct_gain

If we want to model 3 pivot-points (5, 10, and 20 days) with 3
holding-periods (5, 10, 20 days), then it is easy to see that the DIA
ETF simulation generates 6 rows in this table for the buy-date of
2006-01-02.

If we want to model all of 2006, then we multiply the 6 rows by the
number of trading days in 2006 which is 6 x 250 = 1500 rows.

List of ETFs

Arbitrarily we select this list of ETFs to study:

23:23:10 SQL> select tkr,min(ydate)from etfmas group by tkr order by min(ydate);

TKR  MIN(YDATE)
---- ----------
SPY  1993-01-29
MDY  1995-08-18
DIA  1998-01-20
XLE  1998-12-22
XLI  1998-12-22
XLV  1998-12-22
XLF  1998-12-22
XLK  1998-12-22
XLB  1998-12-22
OIH  2001-02-26
EFA  2001-08-27
SHY  2002-07-31

Based on the query above we set the starting date of our analysis to be Tuesday, 2002-09-03 which is 1 month past the earliest data point we have for the SHY ETF (2002-07-31).

Our Approach


We have expressed some information about our general goal along with a
description of the data we intend to study.

How will we answer the question posed in the motivation section at the
beginning?

We write SQL queries, bolstered by Oracle Analytic Functions, to
answer the above question.

Additionally we Extract, Transform, and then Load the data into an
OLAP datastore.

Then we run simple OLAP-dimensional reports against the OLAP datastore.

These reports give us an idea of the general effectiveness for the
strategies: bh, loc, and los.

Next, we setup a backtesting infrastructure for the SVM algorithm in Oracle Data Miner.

The backtesting infrastructure constrains the SVM algorithm so that it
has a choice of following the above three strategies or doing nothing.

After running a series of backtests we have data about SVM algorithm
reaction to historical price movements.

Ordinary SQL is then used to report on the effectiveness of SVM.

We then answer the question:
"Can the SVM algorithm predict opportune times to implement the strategies bh, loc, los?"

ETL of Yahoo ETF Data

Yahoo provides pricing data for the above list of ETFs. The script listed below was used to obtain this data

#! /bin/sh

# wget.sh

# Use wget to download csv files full of pricing data from finance.yahoo.com

. /oracle/.orcl

set -x

# cd to the right place
cd /oracle/vlsql/mkt_data/etfmas/

# wget em
wget --output-document=DIA.csv http://ichart.finance.yahoo.com/table.csv?s=DIA
wget --output-document=EFA.csv http://ichart.finance.yahoo.com/table.csv?s=EFA
wget --output-document=MDY.csv http://ichart.finance.yahoo.com/table.csv?s=MDY
wget --output-document=OIH.csv http://ichart.finance.yahoo.com/table.csv?s=OIH
wget --output-document=SHY.csv http://ichart.finance.yahoo.com/table.csv?s=SHY
wget --output-document=SPY.csv http://ichart.finance.yahoo.com/table.csv?s=SPY
wget --output-document=XLB.csv http://ichart.finance.yahoo.com/table.csv?s=XLB
wget --output-document=XLE.csv http://ichart.finance.yahoo.com/table.csv?s=XLE
wget --output-document=XLF.csv http://ichart.finance.yahoo.com/table.csv?s=XLF
wget --output-document=XLI.csv http://ichart.finance.yahoo.com/table.csv?s=XLI
wget --output-document=XLK.csv http://ichart.finance.yahoo.com/table.csv?s=XLK
wget --output-document=XLV.csv http://ichart.finance.yahoo.com/table.csv?s=XLV


The above script runs very quickly, less than 10 seconds on an ordinary Linux host connected to an ordinary DSL modem.

wget Screendump



oracle@zareason:~/vlsql/md/etfmas$ ./wget.sh
+ cd /oracle/vlsql/mkt_data/etfmas/
+ wget --output-document=DIA.csv 'http://ichart.finance.yahoo.com/table.csv?s=DIA'
--10:41:41--  http://ichart.finance.yahoo.com/table.csv?s=DIA
           => `DIA.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [   <=>                               ] 146,470      278.38K/s

10:41:42 (277.87 KB/s) - `DIA.csv' saved [146470]

+ wget --output-document=EFA.csv 'http://ichart.finance.yahoo.com/table.csv?s=EFA'
--10:41:42--  http://ichart.finance.yahoo.com/table.csv?s=EFA
           => `EFA.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [  <=>                                ] 96,792       250.84K/s

10:41:43 (250.11 KB/s) - `EFA.csv' saved [96792]

+ wget --output-document=MDY.csv 'http://ichart.finance.yahoo.com/table.csv?s=MDY'
--10:41:43--  http://ichart.finance.yahoo.com/table.csv?s=MDY
           => `MDY.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [   <=>                               ] 172,154      293.94K/s

10:41:43 (293.29 KB/s) - `MDY.csv' saved [172154]

+ wget --output-document=OIH.csv 'http://ichart.finance.yahoo.com/table.csv?s=OIH'
--10:41:43--  http://ichart.finance.yahoo.com/table.csv?s=OIH
           => `OIH.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [  <=>                                ] 104,036      295.36K/s

10:41:44 (294.96 KB/s) - `OIH.csv' saved [104036]

+ wget --output-document=SHY.csv 'http://ichart.finance.yahoo.com/table.csv?s=SHY'
--10:41:44--  http://ichart.finance.yahoo.com/table.csv?s=SHY
           => `SHY.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [  <=>                                ] 80,700       297.73K/s

10:41:44 (297.48 KB/s) - `SHY.csv' saved [80700]

+ wget --output-document=SPY.csv 'http://ichart.finance.yahoo.com/table.csv?s=SPY'
--10:41:44--  http://ichart.finance.yahoo.com/table.csv?s=SPY
           => `SPY.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [     <=>                             ] 212,759      249.86K/s

10:41:46 (249.10 KB/s) - `SPY.csv' saved [212759]

+ wget --output-document=XLB.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLB'
--10:41:46--  http://ichart.finance.yahoo.com/table.csv?s=XLB
           => `XLB.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [   <=>                               ] 124,685      293.89K/s

10:41:46 (293.59 KB/s) - `XLB.csv' saved [124685]

+ wget --output-document=XLE.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLE'
--10:41:46--  http://ichart.finance.yahoo.com/table.csv?s=XLE
           => `XLE.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [   <=>                               ] 126,026      251.35K/s

10:41:47 (250.50 KB/s) - `XLE.csv' saved [126026]

+ wget --output-document=XLF.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLF'
--10:41:47--  http://ichart.finance.yahoo.com/table.csv?s=XLF
           => `XLF.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [   <=>                               ] 126,354      304.52K/s

10:41:47 (304.35 KB/s) - `XLF.csv' saved [126354]

+ wget --output-document=XLI.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLI'
--10:41:47--  http://ichart.finance.yahoo.com/table.csv?s=XLI
           => `XLI.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [   <=>                               ] 123,884      268.63K/s

10:41:48 (268.06 KB/s) - `XLI.csv' saved [123884]

+ wget --output-document=XLK.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLK'
--10:41:48--  http://ichart.finance.yahoo.com/table.csv?s=XLK
           => `XLK.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [   <=>                               ] 125,357      293.65K/s

10:41:48 (293.29 KB/s) - `XLK.csv' saved [125357]

+ wget --output-document=XLV.csv 'http://ichart.finance.yahoo.com/table.csv?s=XLV'
--10:41:48--  http://ichart.finance.yahoo.com/table.csv?s=XLV
           => `XLV.csv'
Resolving ichart.finance.yahoo.com... 69.147.86.173
Connecting to ichart.finance.yahoo.com|69.147.86.173|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]

    [   <=>                               ] 123,517      276.35K/s

10:41:49 (275.77 KB/s) - `XLV.csv' saved [123517]

oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$

oracle@zareason:~/vlsql/md/etfmas$ ll *csv
-rw-r--r-- 1 oracle dba 146470 2009-04-02 10:41 DIA.csv
-rw-r--r-- 1 oracle dba  96792 2009-04-02 10:41 EFA.csv
-rw-r--r-- 1 oracle dba 172154 2009-04-02 10:41 MDY.csv
-rw-r--r-- 1 oracle dba 104036 2009-04-02 10:41 OIH.csv
-rw-r--r-- 1 oracle dba  80700 2009-04-02 10:41 SHY.csv
-rw-r--r-- 1 oracle dba 212759 2009-04-02 10:41 SPY.csv
-rw-r--r-- 1 oracle dba 124685 2009-04-02 10:41 XLB.csv
-rw-r--r-- 1 oracle dba 126026 2009-04-02 10:41 XLE.csv
-rw-r--r-- 1 oracle dba 126354 2009-04-02 10:41 XLF.csv
-rw-r--r-- 1 oracle dba 123884 2009-04-02 10:41 XLI.csv
-rw-r--r-- 1 oracle dba 125357 2009-04-02 10:41 XLK.csv
-rw-r--r-- 1 oracle dba 123517 2009-04-02 10:41 XLV.csv
oracle@zareason:~/vlsql/md/etfmas$

The next step of extracting the Yahoo data was accomplished with the Oracle utility named SQL*Loader.

Use SQL*Loader To "Extract" (the E in ETL)

A shell script which calls SQL*Loader is listed below:

#!/bin/sh

# etfmas.sh

. /oracle/.orcl

set -x

cd /oracle/vlsql/mkt_data/etfmas/

cat DIA.csv | awk -F, '{print "DIA,"$1",DIA"$1","$6","$7}' | grep 0 > etfmas.csv
cat EFA.csv | awk -F, '{print "EFA,"$1",EFA"$1","$6","$7}' | grep 0 >> etfmas.csv
cat MDY.csv | awk -F, '{print "MDY,"$1",MDY"$1","$6","$7}' | grep 0 >> etfmas.csv
cat OIH.csv | awk -F, '{print "OIH,"$1",OIH"$1","$6","$7}' | grep 0 >> etfmas.csv
cat SHY.csv | awk -F, '{print "SHY,"$1",SHY"$1","$6","$7}' | grep 0 >> etfmas.csv
cat SPY.csv | awk -F, '{print "SPY,"$1",SPY"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLB.csv | awk -F, '{print "XLB,"$1",XLB"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLE.csv | awk -F, '{print "XLE,"$1",XLE"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLF.csv | awk -F, '{print "XLF,"$1",XLF"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLI.csv | awk -F, '{print "XLI,"$1",XLI"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLK.csv | awk -F, '{print "XLK,"$1",XLK"$1","$6","$7}' | grep 0 >> etfmas.csv
cat XLV.csv | awk -F, '{print "XLV,"$1",XLV"$1","$6","$7}' | grep 0 >> etfmas.csv


sqlplus trade/t <<EOF
DROP   TABLE etfmas;
CREATE TABLE etfmas(tkr VARCHAR2(4),ydate DATE,tkr_date VARCHAR2(15),yvol NUMBER,yprice NUMBER)COMPRESS;
EOF

sqlldr trade/t bindsize=20971520 readsize=20971520 rows=123456 control=etfmas.ctl
grep loaded etfmas.log


sqlplus trade/t <<EOF
@etfmas.sql
EOF

SQL*Loader Screendump

A screen dump from the above script is displayed below. Notice that the script ran in about 1 second:

oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$ date; ./etfmas.sh ; date
Fri Apr  3 09:29:31 PDT 2009
+ cd /oracle/vlsql/mkt_data/etfmas/
+ cat DIA.csv
+ awk -F, '{print "DIA,"$1",DIA"$1","$6","$7}'
+ grep 0
+ cat EFA.csv
+ awk -F, '{print "EFA,"$1",EFA"$1","$6","$7}'
+ grep 0
+ cat MDY.csv
+ awk -F, '{print "MDY,"$1",MDY"$1","$6","$7}'
+ grep 0
+ cat OIH.csv
+ awk -F, '{print "OIH,"$1",OIH"$1","$6","$7}'
+ grep 0
+ cat SHY.csv
+ awk -F, '{print "SHY,"$1",SHY"$1","$6","$7}'
+ grep 0
+ cat SPY.csv
+ awk -F, '{print "SPY,"$1",SPY"$1","$6","$7}'
+ grep 0
+ cat XLB.csv
+ awk -F, '{print "XLB,"$1",XLB"$1","$6","$7}'
+ grep 0
+ cat XLE.csv
+ awk -F, '{print "XLE,"$1",XLE"$1","$6","$7}'
+ grep 0
+ cat XLF.csv
+ awk -F, '{print "XLF,"$1",XLF"$1","$6","$7}'
+ grep 0
+ cat XLI.csv
+ awk -F, '{print "XLI,"$1",XLI"$1","$6","$7}'
+ grep 0
+ cat XLK.csv
+ awk -F, '{print "XLK,"$1",XLK"$1","$6","$7}'
+ grep 0
+ cat XLV.csv
+ awk -F, '{print "XLV,"$1",XLV"$1","$6","$7}'
+ grep 0
+ sqlplus trade/t

SQL*Plus: Release 11.1.0.6.0 - Production on Fri Apr 3 09:29:31 2009

Copyright (c) 1982, 2007, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


Session altered.

09:29:31 SQL> SET ECHO ON TIME ON TIMING ON LINES 155 PAGES 99
09:29:31 SQL>
Table dropped.

Elapsed: 00:00:00.10
09:29:32 SQL>
Table created.

Elapsed: 00:00:00.03
09:29:32 SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
+ sqlldr trade/t bindsize=20971520 readsize=20971520 rows=123456 control=etfmas.ctl

SQL*Loader: Release 11.1.0.6.0 - Production on Fri Apr 3 09:29:32 2009

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Commit point reached - logical record count 16256
Commit point reached - logical record count 31464
+ grep loaded etfmas.log
Table ETFMAS, loaded from every logical record.
  31464 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.
+ sqlplus trade/t

SQL*Plus: Release 11.1.0.6.0 - Production on Fri Apr 3 09:29:32 2009

Copyright (c) 1982, 2007, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


Session altered.

09:29:32 SQL> SET ECHO ON TIME ON TIMING ON LINES 155 PAGES 99
09:29:32 SQL> 09:29:32 SQL> --
09:29:32 SQL> -- etfmas.sql
09:29:32 SQL> --
09:29:32 SQL>
09:29:32 SQL> SELECT tkr,COUNT(*),MIN(ydate),MAX(ydate)FROM etfmas GROUP BY tkr ORDER BY tkr ;

TKR    COUNT(*) MIN(YDATE) MAX(YDATE)
---- ---------- ---------- ----------
DIA        2821 1998-01-20 2009-04-02
EFA        1910 2001-08-27 2009-04-02
MDY        3430 1995-08-18 2009-04-02
OIH        2037 2001-02-26 2009-04-02
SHY        1681 2002-07-31 2009-04-02
SPY        4075 1993-01-29 2009-04-02
XLB        2585 1998-12-22 2009-04-02
XLE        2585 1998-12-22 2009-04-02
XLF        2585 1998-12-22 2009-04-02
XLI        2585 1998-12-22 2009-04-02
XLK        2585 1998-12-22 2009-04-02
XLV        2585 1998-12-22 2009-04-02

12 rows selected.

Elapsed: 00:00:00.06
09:29:32 SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Fri Apr  3 09:29:32 PDT 2009
oracle@zareason:~/vlsql/md/etfmas$
oracle@zareason:~/vlsql/md/etfmas$


Evident from the screendump above is that Extraction (the E in ETL), 
of stock prices from yahoo.com flows data into a table named etfmas.

Once etfmas is filled with data, we can consider Extraction for this
effort to be finished.

Transform (the T in ETL)

The script displayed below does the first step of the Transform:

--
-- mas.sql
--

CONNECT trade/t

-- Creates a view full of much of the data needed to drive my Moving-Avg-Study (MAS).
-- Some of the thinking behind this script is written down here:
-- vlsql/notes/notes2008_1209xp.txt

-- MAS depends on Moving Average so I use an inline view to get 3 types of Moving Averages.
CREATE OR REPLACE VIEW etfmai AS SELECT tkr_date,tkr,ydate,yprice,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
-- Get MAS from past 5days, 10days, 20days.  Use LAG() to get yesterday's Moving Average
,yprice_avg5-LAG(yprice_avg5)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg5slp
,yprice_avg10-LAG(yprice_avg10)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg10slp
,yprice_avg20-LAG(yprice_avg20)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg20slp
-- Get PG for future 5days, 10days, 20days
,100 * (yprice5 - yprice)/yprice pct_gain5
,100 * (yprice10 - yprice)/yprice pct_gain10
,100 * (yprice20 - yprice)/yprice pct_gain20
FROM
(
  SELECT tkr_date,tkr,ydate,yprice,yvol
  -- To get MAS, I 1st need Moving Avg.  I like 3 types: 5day, 10day, 20day
  ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) yprice_avg5
  ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) yprice_avg10
  ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)yprice_avg20
  -- To get future PG I use LEAD()
  ,LEAD(yprice,5,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date) yprice5
  ,LEAD(yprice,10,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice10
  ,LEAD(yprice,20,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice20
  FROM etfmas WHERE yprice>0
  ORDER BY tkr_date
)
ORDER BY tkr_date
/

-- Create a view which will help create the mas table.
CREATE OR REPLACE VIEW mas10 AS SELECT tkr_date,sysdate - ydate dayspast,yprice,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
-- The columns below should not propagate to the SVM-build-score tables.
-- They need to be dropped or filtered out later.  For now, I need them.
,pct_gain5
,pct_gain10
,pct_gain20
,tkr
,ydate
--
FROM etfmai ORDER BY tkr_date
/

-- The mas table contains a super-set of the columns which will go into the SVM-build-score tables.
-- Add slope columns from SHY, and XLE under assumption that interest rates and energy prices affect ETF prices.
DROP TABLE mas;
CREATE TABLE mas AS SELECT m.tkr_date,sysdate - m.ydate dayspast,m.yprice,m.yvol
,m.yprice_avg5
,m.yprice_avg10
,m.yprice_avg20
,m.yprice_avg5slp
,m.yprice_avg10slp
,m.yprice_avg20slp
,'nup' aa_targ_att
--
,shy.yprice_avg5slp shy_yprice_avg5slp
,xle.yprice_avg5slp xle_yprice_avg5slp
--
,shy.yprice_avg10slp shy_yprice_avg10slp
,xle.yprice_avg10slp xle_yprice_avg10slp
--
,shy.yprice_avg20slp shy_yprice_avg20slp
,xle.yprice_avg20slp xle_yprice_avg20slp
--
-- The columns below should not propagate to the SVM-build-score tables.
-- They need to be dropped or filtered out later.  For now, I need them.
,m.pct_gain5
,m.pct_gain10
,m.pct_gain20
,m.tkr
,m.ydate
FROM mas10 m, mas10 shy, mas10 xle
WHERE m.ydate=shy.ydate AND m.ydate=xle.ydate
AND shy.tkr='SHY'
AND xle.tkr='XLE'
ORDER BY m.tkr_date
/

-- The model-builder chokes on wide target attributes. Slim down to 3.
ALTER TABLE mas MODIFY aa_targ_att VARCHAR2(3);



mas.sql Screendump



21:56:19 SQL>
21:56:20 SQL> @mas.sql
21:56:25 SQL> --
21:56:25 SQL> -- mas.sql
21:56:25 SQL> --
21:56:25 SQL>
21:56:25 SQL> CONNECT trade/t
Connected.

Session altered.

Elapsed: 00:00:00.00
21:56:25 SQL> SET ECHO ON TIME ON TIMING ON LINES 155 PAGES 99
21:56:25 SQL>
21:56:25 SQL> -- Creates a view full of much of the data needed to drive my Moving-Avg-Study (MAS).
21:56:25 SQL> -- Some of the thinking behind this script is written down here:
21:56:25 SQL> -- vlsql/notes/notes2008_1209xp.txt
21:56:25 SQL>
21:56:25 SQL> -- MAS depends on Moving Average so I use an inline view to get 3 types of Moving Averages.
21:56:25 SQL> CREATE OR REPLACE VIEW etfmai AS SELECT tkr_date,tkr,ydate,yprice,yvol
21:56:25   2  ,yprice_avg5
21:56:25   3  ,yprice_avg10
21:56:25   4  ,yprice_avg20
21:56:25   5  -- Get MAS from past 5days, 10days, 20days.  Use LAG() to get yesterday's Moving Average
21:56:25   6  ,yprice_avg5-LAG(yprice_avg5)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg5slp
21:56:25   7  ,yprice_avg10-LAG(yprice_avg10)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg10slp
21:56:25   8  ,yprice_avg20-LAG(yprice_avg20)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice_avg20slp
21:56:25   9  -- Get PG for future 5days, 10days, 20days
21:56:25  10  ,100 * (yprice5 - yprice)/yprice pct_gain5
21:56:25  11  ,100 * (yprice10 - yprice)/yprice pct_gain10
21:56:25  12  ,100 * (yprice20 - yprice)/yprice pct_gain20
21:56:25  13  FROM
21:56:25  14  (
21:56:25  15    SELECT tkr_date,tkr,ydate,yprice,yvol
21:56:25  16    -- To get MAS, I 1st need Moving Avg.  I like 3 types: 5day, 10day, 20day
21:56:25  17    ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) yprice_avg5
21:56:25  18    ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) yprice_avg10
21:56:25  19    ,AVG(yprice)OVER(PARTITION BY tkr ORDER BY tkr_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW)yprice_avg20
21:56:25  20    -- To get future PG I use LEAD()
21:56:25  21    ,LEAD(yprice,5,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date) yprice5
21:56:25  22    ,LEAD(yprice,10,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice10
21:56:25  23    ,LEAD(yprice,20,yprice)OVER(PARTITION BY tkr ORDER BY tkr_date)yprice20
21:56:25  24    FROM etfmas WHERE yprice>0
21:56:25  25    ORDER BY tkr_date
21:56:25  26  )
21:56:25  27  ORDER BY tkr_date
21:56:25  28  /

View created.

Elapsed: 00:00:00.79
21:56:25 SQL>
21:56:25 SQL> -- Create a view which will help create the mas table.
21:56:25 SQL> CREATE OR REPLACE VIEW mas10 AS SELECT tkr_date,sysdate - ydate dayspast,yprice,yvol
21:56:25   2  ,yprice_avg5
21:56:25   3  ,yprice_avg10
21:56:25   4  ,yprice_avg20
21:56:25   5  ,yprice_avg5slp
21:56:25   6  ,yprice_avg10slp
21:56:25   7  ,yprice_avg20slp
21:56:25   8  -- The columns below should not propagate to the SVM-build-score tables.
21:56:25   9  -- They need to be dropped or filtered out later.  For now, I need them.
21:56:25  10  ,pct_gain5
21:56:25  11  ,pct_gain10
21:56:25  12  ,pct_gain20
21:56:25  13  ,tkr
21:56:25  14  ,ydate
21:56:25  15  --
21:56:25  16  FROM etfmai ORDER BY tkr_date
21:56:25  17  /

View created.

Elapsed: 00:00:00.04
21:56:25 SQL>
21:56:25 SQL> -- The mas table contains a super-set of the columns which will go into the SVM-build-score tables.
21:56:25 SQL> -- Add slope columns from SHY, and XLE under assumption that interest rates and energy prices affect ETF prices.
21:56:25 SQL> DROP TABLE mas;

Table dropped.

Elapsed: 00:00:00.13
21:56:26 SQL> CREATE TABLE mas AS SELECT m.tkr_date,sysdate - m.ydate dayspast,m.yprice,m.yvol
21:56:26   2  ,m.yprice_avg5
21:56:26   3  ,m.yprice_avg10
21:56:26   4  ,m.yprice_avg20
21:56:26   5  ,m.yprice_avg5slp
21:56:26   6  ,m.yprice_avg10slp
21:56:26   7  ,m.yprice_avg20slp
21:56:26   8  ,'nup' aa_targ_att
21:56:26   9  --
21:56:26  10  ,shy.yprice_avg5slp shy_yprice_avg5slp
21:56:26  11  ,xle.yprice_avg5slp xle_yprice_avg5slp
21:56:26  12  --
21:56:26  13  ,shy.yprice_avg10slp shy_yprice_avg10slp
21:56:26  14  ,xle.yprice_avg10slp xle_yprice_avg10slp
21:56:26  15  --
21:56:26  16  ,shy.yprice_avg20slp shy_yprice_avg20slp
21:56:26  17  ,xle.yprice_avg20slp xle_yprice_avg20slp
21:56:26  18  --
21:56:26  19  -- The columns below should not propagate to the SVM-build-score tables.
21:56:26  20  -- They need to be dropped or filtered out later.  For now, I need them.
21:56:26  21  ,m.pct_gain5
21:56:26  22  ,m.pct_gain10
21:56:26  23  ,m.pct_gain20
21:56:26  24  ,m.tkr
21:56:26  25  ,m.ydate
21:56:26  26  FROM mas10 m, mas10 shy, mas10 xle
21:56:26  27  WHERE m.ydate=shy.ydate AND m.ydate=xle.ydate
21:56:26  28  AND shy.tkr='SHY'
21:56:26  29  AND xle.tkr='XLE'
21:56:26  30  ORDER BY m.tkr_date
21:56:26  31  /

Table created.

Elapsed: 00:00:01.30
21:56:27 SQL>
21:56:27 SQL> -- The model-builder chokes on wide target attributes. Slim down to 3.
21:56:27 SQL> ALTER TABLE mas MODIFY aa_targ_att VARCHAR2(3);

Table altered.

Elapsed: 00:00:00.01

21:56:27 SQL>
21:56:32 SQL>

As we can see, the script only needed 12 seconds to run.

List of Columns in the mas Table



22:04:11 SQL> DESCRIBE mas
 Name                       Null?    Type
 -------------------------- -------- ------------------
 TKR_DATE                            VARCHAR2(15)
 DAYSPAST                            NUMBER
 YPRICE                              NUMBER
 YVOL                                NUMBER
 YPRICE_AVG5                         NUMBER
 YPRICE_AVG10                        NUMBER
 YPRICE_AVG20                        NUMBER
 YPRICE_AVG5SLP                      NUMBER
 YPRICE_AVG10SLP                     NUMBER
 YPRICE_AVG20SLP                     NUMBER
 AA_TARG_ATT                         VARCHAR2(3)
 SHY_YPRICE_AVG5SLP                  NUMBER
 XLE_YPRICE_AVG5SLP                  NUMBER
 SHY_YPRICE_AVG10SLP                 NUMBER
 XLE_YPRICE_AVG10SLP                 NUMBER
 SHY_YPRICE_AVG20SLP                 NUMBER
 XLE_YPRICE_AVG20SLP                 NUMBER
 PCT_GAIN5                           NUMBER
 PCT_GAIN10                          NUMBER
 PCT_GAIN20                          NUMBER
 TKR                                 VARCHAR2(4)
 YDATE                               DATE

22:04:16 SQL>


Transform Even More(the T in ETL)

The next part of our transformation step involves copying data from
the mas table into dimension tables and a fact table.

A comprehensive discussion about dimension tables and a fact tables
can be found at this URL:

http://philip.greenspun.com/sql/data-warehousing.html

We focus first on the most important thing in our data.  Obviously the
most important columns in the mas table are PCT_GAIN5, PCT_GAIN10, and PCT_GAIN20.

If we can find a cause and effect relationship between any of the
columns like TKR or SHY_YPRICE_AVG5SLP and percentage gain, then our
work is done and we have the knowledge we need to get rich.

We will use data warehousing jargon to label percent_gain as a "measure".

Typically in data warehousing efforts, the measure corresponds to some
attribute in some system which you want to predict.

From an implementaion perspective, the measure only appears in a fact
table.  The dimensions appear in dimension tables.  Both the fact
table and its dimension tables need to be constructed so they can be joined.

The other columns in the mas table correspond to "dimensions".  If the
values of dimensions (alone or working in concert) affect the values
of a measure we want to see this relationship.

OLAP reports are sometimes well suited for showing the relationships
between dimensions and a measure.


Arbitrarily, we have chosen to connect the following dimensions to our
first OLAP data store:
  - time
  - tkr
  - strategy
  - holding_period
  - pivot_point


The time dimension is the most complicated due to the fact that it is hierarchical.
A script to create the tables for the time dimension is displayed below:



--
-- cr_time_dimension.sql
--

-- Start by creating a table full of dates.
DROP   TABLE dropme_soon;
CREATE TABLE dropme_soon(ydate DATE);
BEGIN
  FOR i IN 1..21234 LOOP
    INSERT INTO dropme_soon SELECT TRUNC(TO_DATE('2020-01-01','YYYY-MM-DD')) - i FROM dual;
  END LOOP;
END;
/
SELECT MIN(ydate),MAX(ydate),COUNT(*)FROM dropme_soon;

-- Next create a table full of days which exist between the endpoints of the data-dates in table: mas.
DROP   TABLE day_dimension ;
CREATE TABLE day_dimension AS
SELECT
0+TO_CHAR(ydate,'YYYYMMDD')day_id
,0+TO_CHAR(ydate,'YYYYMM')month_id
,TO_CHAR(ydate,'YYYY_MM_DD')day_dsc
,ydate day_end_date
,1 day_time_span
FROM dropme_soon
WHERE ydate BETWEEN (SELECT MIN(ydate)FROM mas)AND(SELECT MAX(ydate)FROM mas)
/

-- Months next.
DROP   TABLE month_dimension ;
CREATE TABLE month_dimension AS
SELECT
0+TO_CHAR(day_end_date,'YYYYMM')month_id
,0+TO_CHAR(day_end_date,'YYYY')year_id
,TO_CHAR(day_end_date,'YYYY_MM')month_dsc
,MAX(day_end_date) month_end_date
,1 + MAX(day_end_date) - MIN(day_end_date) month_time_span
FROM day_dimension
GROUP BY
0+TO_CHAR(day_end_date,'YYYYMM')
,0+TO_CHAR(day_end_date,'YYYY')
,TO_CHAR(day_end_date,'YYYY_MM')
/

SELECT COUNT(*)FROM month_dimension ORDER BY month_id;
-- SELECT * FROM month_dimension ORDER BY month_id;

-- Years next.
DROP   TABLE year_dimension ;
CREATE TABLE year_dimension AS
SELECT
0+TO_CHAR(day_end_date,'YYYY')year_id
,1 all_time_id
,TO_CHAR(day_end_date,'YYYY')year_dsc
,MAX(day_end_date) year_end_date
,1 + MAX(day_end_date) - MIN(day_end_date) year_time_span
FROM day_dimension
GROUP BY
0+TO_CHAR(day_end_date,'YYYY')
,1
,TO_CHAR(day_end_date,'YYYY')
/

-- All time next
DROP   TABLE all_time_dimension ;
CREATE TABLE all_time_dimension AS
SELECT
1 all_time_id
,'All_Time'all_time_dsc
,MAX(day_end_date) all_time_end_date
,1 + MAX(day_end_date) - MIN(day_end_date) all_time_time_span
FROM day_dimension
/

SELECT * FROM year_dimension ORDER BY year_id;
SELECT * FROM all_time_dimension ;

-- See how well they join:
SELECT COUNT(*)FROM day_dimension;

SELECT COUNT(*)FROM day_dimension d, month_dimension m, year_dimension y, all_time_dimension a
WHERE d.month_id = m.month_id AND m.year_id = y.year_id AND y.all_time_id = a.all_time_id
/


The script to create the other dimensions is simple:

--
-- cr_dimensions.sql
--

DROP   TABLE tkr_dimension ;
CREATE TABLE tkr_dimension(tkr VARCHAR2(5));
INSERT INTO tkr_dimension VALUES('DIA');
INSERT INTO tkr_dimension VALUES('EFA');
INSERT INTO tkr_dimension VALUES('MDY');
INSERT INTO tkr_dimension VALUES('OIH');
INSERT INTO tkr_dimension VALUES('SHY');
INSERT INTO tkr_dimension VALUES('SPY');
INSERT INTO tkr_dimension VALUES('XLB');
INSERT INTO tkr_dimension VALUES('XLE');
INSERT INTO tkr_dimension VALUES('XLF');
INSERT INTO tkr_dimension VALUES('XLI');
INSERT INTO tkr_dimension VALUES('XLK');
INSERT INTO tkr_dimension VALUES('XLV');

DROP   TABLE strategy_dimension;
CREATE TABLE strategy_dimension(strategy VARCHAR2(9));
INSERT INTO strategy_dimension VALUES('bh');
INSERT INTO strategy_dimension VALUES('loc');
INSERT INTO strategy_dimension VALUES('los');

DROP   TABLE pivot_point_dimension;
CREATE TABLE pivot_point_dimension(pivot_point NUMBER);
INSERT INTO pivot_point_dimension VALUES(5);
INSERT INTO pivot_point_dimension VALUES(10);
INSERT INTO pivot_point_dimension VALUES(20);

DROP   TABLE holding_period_dimension;
CREATE TABLE holding_period_dimension(holding_period NUMBER);
INSERT INTO holding_period_dimension VALUES(5);
INSERT INTO holding_period_dimension VALUES(10);
INSERT INTO holding_period_dimension VALUES(20);

The script to create the fact table is a bit more involved:

--
-- cr_yfact.sql
--

-- This script creates yfact from the mas table.
-- We call it yfact because it contains data from yahoo.

-- We use a series of SELECTs and INSERTs.
-- Each item in the series corresponds to a unique combination of dimensions for the yfact table.

-- The dimensions and their values are listed below:

-- strategy:       bh, loc, los
-- pivot_point:    5,10,20
-- holding_period: 5,10,20

-- tkr:
--  - DIA
--  - EFA
--  - MDY
--  - OIH
--  - SHY
--  - SPY
--  - XLB
--  - XLE
--  - XLF
--  - XLI
--  - XLK
--  - XLV

-- DROP and CREATE yfact first
-- pg (percentage gain, is the measure)
DROP   TABLE yfact;
CREATE TABLE yfact
(
pg              NUMBER
,day_id         NUMBER -- fk to day_dimension
,tkr            VARCHAR2(11) -- DIA, EFA, MDY, ...
,strategy       VARCHAR2(11) -- bh, loc, los
,pivot_point    NUMBER -- 5,10,20
,holding_period NUMBER -- 5,10,20
)
/

-- SELECT, INSERT rows where strategy == bh
-- SELECT, INSERT rows where pivot_point == 5,10,20
-- SELECT, INSERT rows where holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain5,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh', 5, 5 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain5,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',10, 5 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain5,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',20, 5 FROM mas;

-- SELECT, INSERT rows where strategy == bh
-- SELECT, INSERT rows where pivot_point == 5,10,20
-- SELECT, INSERT rows where holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain10,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh', 5,10 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain10,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',10,10 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain10,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',20,10 FROM mas;

-- SELECT, INSERT rows where strategy == bh
-- SELECT, INSERT rows where pivot_point == 5,10,20
-- SELECT, INSERT rows where holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain20,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh', 5,20 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain20,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',10,20 FROM mas;
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)SELECT pct_gain20,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'bh',20,20 FROM mas;

--
-- loc below
--

-- SELECT, INSERT rows where strategy == loc
-- SELECT, INSERT rows where pivot_point == 5
-- SELECT, INSERT rows where holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
-- loc logic: If mvg-avg slope < 0 hold cash (which gives a pg of 0) ELSE accept pg from long position
CASE WHEN yprice_avg5slp<0 THEN 0 ELSE pct_gain5 END -- loc logic implemented here
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 5, 5 FROM mas;

-- strategy == loc, pivot_point == 10, holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN 0 ELSE pct_gain5 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 10, 5 FROM mas;

-- strategy == loc, pivot_point == 20, holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN 0 ELSE pct_gain5 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 20, 5 FROM mas;

-- bump holding_period to 10

-- strategy == loc, pivot_point == 5, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg5slp<0 THEN 0 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 5, 10 FROM mas;

-- strategy == loc, pivot_point == 10, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN 0 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 10, 10 FROM mas;

-- strategy == loc, pivot_point == 20, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN 0 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 20, 10 FROM mas;

-- bump holding_period to 20

-- strategy == loc, pivot_point == 5, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg5slp<0 THEN 0 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 5, 20 FROM mas;

-- strategy == loc, pivot_point == 10, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN 0 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 10, 20 FROM mas;

-- strategy == loc, pivot_point == 20, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN 0 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'loc', 20, 20 FROM mas;


--
-- los below
--

-- SELECT, INSERT rows where strategy == los
-- SELECT, INSERT rows where pivot_point == 5
-- SELECT, INSERT rows where holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
-- los logic: If mvg-avg slope < 0 accept -pg from short position ELSE accept pg from long position
CASE WHEN yprice_avg5slp<0 THEN -pct_gain5 ELSE pct_gain5 END -- los logic implemented here
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 5, 5 FROM mas;

-- strategy == los, pivot_point == 10, holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN -pct_gain5 ELSE pct_gain5 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 10, 5 FROM mas;

-- strategy == los, pivot_point == 20, holding_period == 5
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN -pct_gain5 ELSE pct_gain5 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 20, 5 FROM mas;

-- bump holding_period to 10

-- strategy == los, pivot_point == 5, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg5slp<0 THEN -pct_gain10 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 5, 10 FROM mas;

-- strategy == los, pivot_point == 10, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN -pct_gain10 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 10, 10 FROM mas;

-- strategy == los, pivot_point == 20, holding_period == 10
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN -pct_gain10 ELSE pct_gain10 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 20, 10 FROM mas;

-- bump holding_period to 20

-- strategy == los, pivot_point == 5, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg5slp<0 THEN -pct_gain20 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 5, 20 FROM mas;

-- strategy == los, pivot_point == 10, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg10slp<0 THEN -pct_gain20 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 10, 20 FROM mas;

-- strategy == los, pivot_point == 20, holding_period == 20
INSERT INTO yfact(pg,day_id,tkr,strategy,pivot_point,holding_period)
SELECT
CASE WHEN yprice_avg20slp<0 THEN -pct_gain20 ELSE pct_gain20 END
,0+TO_CHAR(ydate,'YYYYMMDD'),tkr,'los', 20, 20 FROM mas;

--
-- Verify that the dimension tables can be joined
--

SELECT COUNT(*)FROM yfact;

SELECT COUNT(*)
FROM yfact y
,day_dimension d
,tkr_dimension t
,strategy_dimension s
,pivot_point_dimension p
,holding_period_dimension h
WHERE y.day_id        =d.day_id
AND   y.tkr           =t.tkr
AND   y.strategy      =s.strategy
AND   y.pivot_point   =p.pivot_point
AND   y.holding_period=h.holding_period
/


Once the yfact table is created, we can build more fact tables from it:

--
-- cr_yfact_ru.sql
--

-- Creates a set of fact tables which have dimensions "rolled-up" using a simple average.

-- Average-Roll-Up tkr
DROP   TABLE yfact_rut;
CREATE TABLE yfact_rut AS SELECT AVG(pg)pg,day_id,strategy,pivot_point,holding_period FROM yfact GROUP BY day_id,strategy,pivot_point,holding_period;

-- Average-Roll-Up strategy
DROP   TABLE yfact_rus ;
CREATE TABLE yfact_rus AS SELECT AVG(pg)pg,day_id,tkr,pivot_point,holding_period FROM yfact GROUP BY day_id,tkr,pivot_point,holding_period;

-- Average-Roll-Up pivot_point
DROP   TABLE yfact_rup ;
CREATE TABLE yfact_rup AS SELECT AVG(pg)pg,day_id,tkr,strategy,holding_period FROM yfact GROUP BY day_id,tkr,strategy,holding_period;

-- Average-Roll-Up holding_period
DROP   TABLE yfact_ruh;
CREATE TABLE yfact_ruh AS SELECT AVG(pg)pg,day_id,tkr,strategy,pivot_point FROM yfact GROUP BY day_id,tkr,strategy,pivot_point;

-- Average-Roll-Up tkr and strategy
DROP   TABLE yfact_ruts;
CREATE TABLE yfact_ruts AS SELECT AVG(pg)pg,day_id,pivot_point,holding_period FROM yfact GROUP BY day_id,pivot_point,holding_period;

-- Average-Roll-Up tkr and pivot_point
DROP   TABLE yfact_rutp;
CREATE TABLE yfact_rutp AS SELECT AVG(pg)pg,day_id,strategy,holding_period FROM yfact GROUP BY day_id,strategy,holding_period;

-- Average-Roll-Up tkr and holding_period
DROP   TABLE yfact_ruth;
CREATE TABLE yfact_ruth AS SELECT AVG(pg)pg,day_id,strategy,pivot_point FROM yfact GROUP BY day_id,strategy,pivot_point;


Load (the L in ETL) and Analytic Workspace Manager (AWM)

Once the fact tables are built we can begin working with the Oracle tool named Analytic Workspace Manager (AWM).

A general tutorial for working with Analytic Workspace Manager should be available from one or both of the URLs listed below:

- http://www.oracle.com/technology/obe/10gr2_db_single/bidw/awm/awm_otn.htm

- http://www.google.com/search?q=awm+tutorial

We start with the dimensions.

The trickiest dimension to setup is the time dimension so we start with that:
  - Start AWM.  We use a command line like this:
    - ${JAVA_HOME}/bin/java  -mx512m -jar awm11.1.0.7.0B.jar &
  - Point the tool at a database and a schema.
  - Once that is done we see something like this:


Next create an "Analytic Workspace" via right-click and then see something like this:


Next, start creation of a new dimension via right-click on "Dimensions":



Via right-clicks, create levels: all, year, month, and day:


Via right-clicks, create hierarchy: time_hierarchy: Put all at top, year below that, month below that, day below that:



Next, do "mappings": Left click mappings. We see 3 panes:


Switch from star-schema to snowflake-schema:


In the middle pane find the tables created by the cr_time_dimension.sql script: all_time_dimension, year_dimension, month_dimension, and day_dimension. Drag them to the right-hand-pane:


Connect them to the TIME dimension:


The other dimensions are easier to build than the time dimension.

Screenshots for the tkr dimension are displayed below:





The other dimensions like the tkr dimension are easy to build:





Again, notice that the time dimension is different than the others:
  - It has levels
  - It has a hierarchy
  - It is of type "Time" not "User"
  - It was built from a Snowflake Schema not Star Schema

Now that the dimensions are built we create the YCUBE from the yfact table.

Right-click on Cubes-icon to bring up the "Create Cube" dialog box.
The images describe the steps:








Next, create a measure named pg. It's very easy; right-click to bring up the dialog box. Give it a name; and press "Create":



Next, we map the yfact table to YCUBE. Right-click on "Mappings" icon and then drag the yfact table to the right-hand-pane:



Connect yfact columns to YCUBE attributes:



That completes construction of YCUBE.

Next, we fill YCUBE and the dimensions with data from the dimension
tables and the yfact table.

This is a simple multi-step process:
  - Right-click YCUBE
  - Select "Maintain cube YCUBE..." to bring up the maintain dialog box:


Click "Finish" and wait about 10 or 20 minutes (if your machine is slow):



At this point, YCUBE and its dimensions are full of data. Right-click the "PG" measure-icon inside of the "Measures" folder icon:



Click "View Data PG..." to bring up the "Measure Data Viewer":



The data viewer is not intuitive to many people.

We usually start by clicking the question-mark-pencil in the upper-left-hand corner.

This brings up a box called the "Query Builder":


Now, our intention is to enable the dimensions so they are viewable.

Note that "Query Builder" is designed to disable most of the dimension values by default.

Perhaps this will confuse first-time users.

Maybe a better design dictates that all dimension values are displayed by default.

We click on the "Dimensions" tab and we see this:


We use the chevrons to enable the Time dimension values:



Then we enable other dimension values:






The appearence of the Measure Data Viewer changes dramtically:



We expand it to fill the screen. We see that the bh strategy did poorly in 2008 and the first quarter of 2009:



Some interesting results from the loc strategy are listed below:




It is obvious that the Measure Data Viewer is useful.

A bit of interaction with it reveals that the Buy and Hold strategy worked best.
But, anyone who followed that strategy in 2008 suffered significant loss.

Let us now turn our attention to the question, "Can the SVM algorithim
resident inside Orace Data Mining enhance our ability to use the
strategies Buy and Hold, Long or Cash, and Long or Short?"

One Dimensional SVM Description

The SVM algorithm may be a useful tool for predicting future percentage gain of an ETF. How does the SVM algorithm work? Imagine that at 4:01 pm every trading day we collect the 20-day moving average slope of the DIA price. What do we see? Well, this is easily simulated via query against the mas table.

12:03:36 SQL> desc mas

 Name                             Null?    Type
 -------------------------------- -------- -----------------------
 TKR_DATE                                  VARCHAR2(15)
 DAYSPAST                                  NUMBER
 YPRICE                                    NUMBER
 YVOL                                      NUMBER
 YPRICE_AVG5                               NUMBER
 YPRICE_AVG10                              NUMBER
 YPRICE_AVG20                              NUMBER
 YPRICE_AVG5SLP                            NUMBER
 YPRICE_AVG10SLP                           NUMBER
 YPRICE_AVG20SLP                           NUMBER
 AA_TARG_ATT                               VARCHAR2(3)
 SHY_YPRICE_AVG5SLP                        NUMBER
 XLE_YPRICE_AVG5SLP                        NUMBER
 SHY_YPRICE_AVG10SLP                       NUMBER
 XLE_YPRICE_AVG10SLP                       NUMBER
 SHY_YPRICE_AVG20SLP                       NUMBER
 XLE_YPRICE_AVG20SLP                       NUMBER
 PCT_GAIN5                                 NUMBER
 PCT_GAIN10                                NUMBER
 PCT_GAIN20                                NUMBER
 TKR                                       VARCHAR2(4)
 YDATE                                     DATE

12:06:05 SQL> SELECT ydate,tkr,yprice_avg20slp FROM mas WHERE tkr='DIA'AND ydate BETWEEN'2008-01-02'AND'2008-01-31'ORDER BY ydate;

YDATE      TKR  YPRICE_AVG20SLP
---------- ---- ---------------
2008-01-02 DIA           -.1145
2008-01-03 DIA            -.067
2008-01-04 DIA             -.29
2008-01-07 DIA            -.374
2008-01-08 DIA            -.497
2008-01-09 DIA           -.4915
2008-01-10 DIA           -.3045
2008-01-11 DIA           -.4125
2008-01-14 DIA           -.3485
2008-01-15 DIA            -.381
2008-01-16 DIA           -.3385
2008-01-17 DIA           -.5095
2008-01-18 DIA            -.547
2008-01-22 DIA            -.634
2008-01-23 DIA           -.5595
2008-01-24 DIA            -.571
2008-01-25 DIA            -.637
2008-01-28 DIA            -.487
2008-01-29 DIA           -.4245
2008-01-30 DIA             -.42
2008-01-31 DIA           -.2185

21 rows selected.

Elapsed: 00:00:00.01
12:08:03 SQL>

Next, for each of these days we wait 10 trading days and then collect resulting percentage gain. Again, this is easily simulated via query against the mas table.


12:08:03 SQL> SELECT ydate,tkr,yprice_avg20slp,pct_gain10 FROM mas WHERE tkr='DIA'AND ydate BETWEEN'2008-01-02'AND'2008-01-31'ORDER BY ydate;

YDATE      TKR  YPRICE_AVG20SLP PCT_GAIN10
---------- ---- --------------- ----------
2008-01-02 DIA           -.1145 -4.5772709
2008-01-03 DIA            -.067 -6.8605108
2008-01-04 DIA             -.29 -5.8196393
2008-01-07 DIA            -.374 -6.7950261
2008-01-08 DIA            -.497 -2.3838681
2008-01-09 DIA           -.4915  -2.670335
2008-01-10 DIA           -.3045 -4.2922595
2008-01-11 DIA           -.4125 -1.9946267
2008-01-14 DIA           -.3485 -2.2157763
2008-01-15 DIA            -.381 -1.0987209
2008-01-16 DIA           -.3385 1.19508778
2008-01-17 DIA           -.5095 4.77556531
2008-01-18 DIA            -.547 4.52804494
2008-01-22 DIA            -.634 3.13306938
2008-01-23 DIA           -.5595 -.35962198
2008-01-24 DIA            -.571 -.56534752
2008-01-25 DIA            -.637 -.36113211
2008-01-28 DIA            -.487 -.98022927
2008-01-29 DIA           -.4245 -.61799604
2008-01-30 DIA             -.42 1.19383187
2008-01-31 DIA           -.2185 -1.5800619

21 rows selected.

Elapsed: 00:00:00.02
12:14:57 SQL>
We ask a simple question, is pct_gain10 dependent on yprice_avg20slp?

If there is a dependency, it is not obvious from the above report of 21 rows.

Perhaps we do have a dependency but it is very weak.

In order to express, or find, this dependency we try an idea from Data
Mining called Classification.

It's a simple idea; we classify the data in pct_gain10 into two groups:
  up-group  contains rows where pct_gain10 >  3.0
  nup-group contains rows where pct_gain10 <= 3.0

SQL is well suited for showing us the two groups:



12:32:24 SQL> l
  1  SELECT ydate,tkr,yprice_avg20slp,pct_gain10,
  2  CASE WHEN pct_gain10 >  3.0 THEN 'up' ELSE 'nup' END the_class
  3* FROM mas WHERE tkr='DIA'AND ydate BETWEEN'2008-01-02'AND'2008-01-31'ORDER BY ydate
12:32:25 SQL> /

YDATE      TKR  YPRICE_AVG20SLP PCT_GAIN10 THE
---------- ---- --------------- ---------- ---
2008-01-02 DIA           -.1145 -4.5772709 nup
2008-01-03 DIA            -.067 -6.8605108 nup
2008-01-04 DIA             -.29 -5.8196393 nup
2008-01-07 DIA            -.374 -6.7950261 nup
2008-01-08 DIA            -.497 -2.3838681 nup
2008-01-09 DIA           -.4915  -2.670335 nup
2008-01-10 DIA           -.3045 -4.2922595 nup
2008-01-11 DIA           -.4125 -1.9946267 nup
2008-01-14 DIA           -.3485 -2.2157763 nup
2008-01-15 DIA            -.381 -1.0987209 nup
2008-01-16 DIA           -.3385 1.19508778 nup
2008-01-17 DIA           -.5095 4.77556531 up
2008-01-18 DIA            -.547 4.52804494 up
2008-01-22 DIA            -.634 3.13306938 up
2008-01-23 DIA           -.5595 -.35962198 nup
2008-01-24 DIA            -.571 -.56534752 nup
2008-01-25 DIA            -.637 -.36113211 nup
2008-01-28 DIA            -.487 -.98022927 nup
2008-01-29 DIA           -.4245 -.61799604 nup
2008-01-30 DIA             -.42 1.19383187 nup
2008-01-31 DIA           -.2185 -1.5800619 nup

21 rows selected.


Next, we turn to wikipedia.org and look up 'SVM Algorithm':
http://www.google.com/search?q=SVM+algorithm+site:wikipedia.org

We see the following sentence there:

"Viewing input data as two sets of vectors in an n-dimensional space,
an SVM will construct a separating hyperplane in that space, one which
maximizes the margin between the two data sets."

Let's scrutinze the above sentence.

In our example, "Viewing input data as two sets of vectors" means that
we have separated our rows into two groups, the up-group and the
nup-group.

In our example, "Viewing input data as two sets of vectors in an
n-dimensional space" means that we view our 2 groups by plotting them
on a 1-dimensional line.

We have only 1-dimension because a 1-dimensional line is easy to visualize.

How exactly do we view our 1-dimensional line?

Step 1, draw a horizontal line.

Step 2, write "yprice_avg20slp" to the right of it.

Step 3, write "0" in the middle of it.

Step 4, write a tiny, minscule red "n" on the line for each nup-row:
  - The first  one is to the left of the 0 at location -.1145
  - The second one is to the left of the 0 at location -.067

Step 5, write a tiny, minscule green "u" on the line for each up-row:
  - The first  one is to the left of the 0 at location -.5095
  - The second one is to the left of the 0 at location  -.547

So that 5-step process helps us see the 1-dimensional space.

We return to the wikipedia-sentence and read the next clause:
  - "an SVM will construct a separating hyperplane"

In our example this means that SVM will draw a curvy line such that
the green-ups are separated from the red-nups.

We return to the wikipedia-sentence and read the next clause:
  - "one which maximizes the margin between the two data sets."

In our example this means that SVM will draw the curvy line as much in
the middle between the green-ups and the red-nups as possible.

Finally, we may obtain a row in the mas table where the value for
yprice_avg20slp is known but "up" or "nup" are not yet known.

The idea then is to plot the value of yprice_avg20slp on the
1-dimensional line as a blue-dot.  Next we see which side of the curvy
line that the blue-dot resides.

Based on that, we predict whether this blue-dot should be a red-nup or a green-up.

Perhaps it is obvious that a one-dimensional-SVM lacks predictive
power.  But we do believe that a discussion of a one-dimensional-SVM
helps us introduce the concept of an n-dimensional-SVM.

Two Dimensional SVM Description

When we look at a description of the mas table, it is obvious that we
can add a second dimension to our discussion of SVM.

We pick xle_yprice_avg20slp which is the moving average slope of the XLE ETF.

The thinking is that if xle_yprice_avg20slp is high then energy prices
are increasing and this drags down the price of DIA.

So, now we have two dimensions:
  - yprice_avg20slp
  - xle_yprice_avg20slp

We plot them on a two-dimensional graph just like we ploted tables of
x and y back in Middle School.

Each point is a tiny black-circle if it is in the up-class or a tiny
white-circle if it is in the nup-class.

Once this plot is done and SVM is implemented, we might visualize the
graph as that displayed here:

http://en.wikipedia.org/wiki/Support_vector_machine#Motivation

The above image depicts the separating hyperplane constructed by SVM
as a straight-red-line.  Keep in mind that we may visualize the
separating hyperplane as a curvy line.

Classifying ETF percentage gain via SVM inside of Oracle Dataminer

We start our classification effort by creating a set of shell scripts. The names of the scripts are listed below:

oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ ll mas*score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_DIA_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_EFA_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_MDY_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_OIH_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_SHY_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_SPY_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_XLB_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_XLE_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_XLF_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_XLI_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_XLK_score.sh
-rwxr-xr-x 1 oracle dba   80 2009-02-16 22:51 mas_bh_XLV_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_DIA_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_EFA_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_MDY_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_OIH_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_SHY_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_SPY_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_XLB_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_XLE_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_XLF_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_XLI_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_XLK_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_loc_XLV_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_DIA_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_EFA_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_MDY_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_OIH_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_SHY_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_SPY_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_XLB_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_XLE_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_XLF_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_XLI_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_XLK_score.sh
-rwxr-xr-x 1 oracle dba   82 2009-02-04 15:10 mas_los_XLV_score.sh
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

It is obvious from the names of the scripts that each script "scores"
a set of rows for a specific strategy and a specific ETF ticker
symbol.

For example the last script, mas_los_XLV_score.sh, scores a set of
rows for the Long-Or-Short strategy applied to the XLV ETF.

The verb "score" means classify but with the added nuance that SVM
attach a confidence-level (expressed as a probability) to the
classification attempt.

Creating the above 36 scripts by hand is too tedious.  Instead, a ruby
script writes the scripts:


#!/usr/bin/env ruby

# Builds shell scripts for a set of sql scripts.

Dir["mas_lo?_*_score.sql","mas_bh_*_score.sql"].sort.each do |s|
  s =~ /(mas.*).(sql)/
  script_name = "#{$1}.sh"
shell_syntax = <<ENDshell_syntax
#!/bin/sh
# #{script_name}
sqlplus trade/t <<EOF
@#{s}
EOF
ENDshell_syntax
  (fhw = File.open(script_name,"w")).write(shell_syntax); fhw.close
end

Notice that the above ruby script depends on a set of SQL scripts. Those scripts are listed below:

oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ ll mas*score.sql
-rw-r--r-- 1 oracle dba  4468341 2009-04-05 01:39 mas_bh_DIA_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_EFA_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_MDY_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_OIH_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_SHY_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_SPY_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_XLB_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_XLE_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_XLF_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_XLI_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_XLK_score.sql
-rw-r--r-- 1 oracle dba  4462951 2009-04-05 01:39 mas_bh_XLV_score.sql
-rw-r--r-- 1 oracle dba 13962050 2009-04-11 03:00 mas_loc_DIA_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_EFA_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_MDY_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_OIH_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_SHY_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_SPY_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLB_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLE_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLF_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLI_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLK_score.sql
-rw-r--r-- 1 oracle dba 13945208 2009-04-11 03:00 mas_loc_XLV_score.sql
-rw-r--r-- 1 oracle dba 15528860 2009-04-11 03:00 mas_los_DIA_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_EFA_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_MDY_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_OIH_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_SHY_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:00 mas_los_SPY_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLB_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLE_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLF_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLI_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLK_score.sql
-rw-r--r-- 1 oracle dba 15510128 2009-04-11 03:01 mas_los_XLV_score.sql
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

Notice that these are large SQL scripts. Two simple shell commands reveal that the very last script required 12 hours to run:

oracle@zareason:~/vlsql/log$ 
oracle@zareason:~/vlsql/log$ 
oracle@zareason:~/vlsql/log$ head mas_los_XLV_score.sh.out.txt | grep 2009
SQL*Plus: Release 11.1.0.6.0 - Production on Thu May 21 04:24:07 2009
oracle@zareason:~/vlsql/log$ 
oracle@zareason:~/vlsql/log$ ll mas_los_XLV_score.sh.out.txt
-rw-r--r-- 1 oracle dba 244307039 2009-05-21 18:20 mas_los_XLV_score.sh.out.txt
oracle@zareason:~/vlsql/log$ 
oracle@zareason:~/vlsql/log$ 

The "loc" scripts each ran in about 11.5 hours.
The "bh" scripts each ran in about 4 hours.

The above list of scripts required this amount of time to run:

(11.5hr/script + 4hr/script + 12hr/script) x 12 scripts = 330 hours

330 hr x (1 day / 24 hr) = 13.7 days

The three types of scripts ("bh","loc", and "los") where written with
the help of two ruby scripts.

The first ruby script is listed below:


oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ cat mas_bh.rb
#!/usr/bin/env ruby

# mas_bh.rb

# Demo: ruby -r mas_bh.rb -e 'MasBh.new.write_score_sql_scripts'

# Ruby script which uses nested loops to generate several large SQL scripts.
# Each SQL script is a wrapper for the SVM algorithm resident within Oracle Data Miner.

class MasBh
require 'rubygems'
require 'oci8'
  def write_score_sql_scripts
    # tkrs_a will hold a list of unique tkr names resident within the stock-price-time-series data.
    tkrs_a = []
    (oci8conn = OCI8.new('trade','t')).exec("SELECT DISTINCT tkr FROM etfmas ORDER BY tkr"){ |r| tkrs_a << r.to_s }
    # See em.
    p(tkrs_a)

    # holding_periods_a holds integers representing days a tkr is held before its percentage gain is measured.
    holding_periods_a = ["5","10","20"]

    # pp is acronym for Pivot Point.  It is not used by the BH strategy but we set it here so it will propagate into the mas_predictions table.
    # Having it in that table makes it easier to compare BH strategy to LOC and LOS strategies.
    pp="20"

    # For bh strategy, build-model, and then score.
    # The score will give us the probability that the pct-gain will be >= (the average + 1 std. deviation).
    strategy="bh"
    tkrs_a.each do |tkr|
      sqlscript_name = "mas_#{strategy}_#{tkr}_score.sql"
      p(sqlscript_name)
      (fhw = File.open(sqlscript_name,"w")).write("--\n-- #{sqlscript_name}\n--\n\n"); fhw.close
      holding_periods_a.each do |hp|
        some_dates_a = []
        oci8conn.exec("SELECT TO_CHAR(ydate,'YYYY-MM-DD')FROM etfmas WHERE tkr='#{tkr}'AND ydate>TO_DATE('2002-09-01','YYYY-MM-DD')AND ydate<TO_DATE('2009-04-01','YYYY-MM-DD')ORDER BY ydate"){ |r| some_dates_a << r.to_s }
        some_dates_a.each do |adate|

sqlplus_syntax = <<END
--
CREATE OR REPLACE view sme AS SELECT
tkr_date
,NULL aa_targ_att
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='#{tkr}'AND ydate='#{adate}'
/

CREATE OR REPLACE view bme AS SELECT
tkr_date
,CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas WHERE tkr='#{tkr}')THEN'up'ELSE aa_targ_att END aa_targ_att
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='#{tkr}'AND 5+#{hp}+ydate<'#{adate}'
/


@cr_bme.sql #{strategy} #{hp} #{pp}
PURGE RECYCLEBIN;
--
END
          (fha = File.open(sqlscript_name,"a")).write(sqlplus_syntax); fha.close
        end # tkrs_a.each
      end # some_dates_a.each
    end # holding_periods_a.each
  end # def write_score_sql_scripts
end # class MasBh

MasBh.new.write_score_sql_scripts
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

The comments in the above script adequately describe it.

Sample syntax which the script had generated is displayed below:



oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ head -50 mas_bh_DIA_score.sql
--
-- mas_bh_DIA_score.sql
--

--
CREATE OR REPLACE view sme AS SELECT
tkr_date
,NULL aa_targ_att
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='DIA'AND ydate='2002-09-03'
/

CREATE OR REPLACE view bme AS SELECT
tkr_date
,CASE WHEN pct_gain5>(SELECT AVG(pct_gain5)+STDDEV(pct_gain5)FROM mas WHERE tkr='DIA')THEN'up'ELSE aa_targ_att END aa_targ_att
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='DIA'AND 5+5+ydate<'2002-09-03'
/


@cr_bme.sql bh 5 20
PURGE RECYCLEBIN;
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

The first view, sme, contains a column named aa_targ_att.  In SVM
jargon that column is the target attribute.  We intend to predict its
value.  Notice the predicate of the sme view.  It constrains the data
such that we have WHERE tkr='DIA'AND ydate='2002-09-03'.  This helps
us simulate the idea that we are watching the market for DIA close at
4pm on 2002-09-03.  As the clock strikes 4, we make note of the
closing price of DIA.  With that price we can compute other column
values such as yprice_avg5slp.  Since we want to predict the target
attribute of 1 ETF for just one day it makes sense that the sme view
contains only 1 row.

The second view, bme, contains many rows and thus many values in the
column named aa_targ_att.  These rows are fed to SVM as "training data" 
so that it may predict the 1 value of aa_targ_att in the 1 row of sme.

The actual implementation of SVM is done by the call of the cr_bme.sql script.

Notice in the above example that we feed 3 command line parameters to
the cr_bme.sql script:

@cr_bme.sql bh 5 20

The first parameter is bh which corresponds to the Buy and Hold strategy.

The second parameter is 5 which corresponds to a 5 trading-day holding period.

The third parameter is 20 which corresponds to a 20 trading-day pivot-point.

Keep in mind that the Buy and Hold strategy does not depend on pivot-point.

We set it here, however, because the cr_bme.sql script wants to see it.

The cr_bme.sql script does need the pivot-point value when we use it
to score "loc" and "los" data.

The contents of the cr_bme.sql script are displayed later in this paper.

As mentioned above we have two ruby scripts.  The first script, just
described, is used to generate SQL statements for just the Buy and
Hold strategy.

The second script is used to generate SQL statements for both the
"loc" and "los" strategies.

The second ruby script is displayed below:


#!/usr/bin/env ruby

# mas.rb

# Demo: ruby -r mas.rb -e 'Mas.new.write_score_sql_scripts'

# Ruby script which uses nested loops to generate several SQL scripts.
# Each script is a wrapper for the SVM algorithm resident within Oracle Data Miner.

# This ruby script is a tool to study possible strategies for trading individual ETFs.
# Also it aims to answer the question,
# "Can SVM effectively tell us to abandon all strategies and move to cash during certain conditions?"

# This Ruby script helps me deal with a cartesian product of these attributes:
# Strategy (Long Or Cash, Long Or Short)
# Strategy Pivot Points (5day-Moving-Avg-Slope, 10day-Moving-Avg-Slope, 20day-Moving-Avg-Slope)
# Holding Period (days a tkr is held before its percentage gain is measured)
# Dates in a time-series (ydate)
# Stock Ticker (tkr).  Currently some ETF symbols: SPY, DIA, EFA, MDY, XLE, ...

class Mas
require 'rubygems'
require 'oci8'
  def write_score_sql_scripts
    # tkrs_a will hold a list of unique tkr names resident within the stock-price-time-series data.
    tkrs_a = []
    (oci8conn = OCI8.new('trade','t')).exec("SELECT DISTINCT tkr FROM etfmas ORDER BY tkr"){ |r| tkrs_a << r.to_s }
    # See em.
    p(tkrs_a)

    # pivot_points_a holds integers representing a type of moving average slope.
    # Some strategies (loc, los) depend upon a pivot point.
    # For example, The loc-Trader will watch the moving average slope.  Once it moves above zero, he will go Long.
    # Once it moves below zero, he will sell so that he holds cash.
    pivot_points_a = ["5","10","20"]

    # holding_periods_a holds integers representing days a tkr is held before its percentage gain is measured.
    holding_periods_a = ["5","10","20"]

    # strategies_a holds a list of acronyms for strategies.
    # Some of them are:
    # bh = Buy and Hold aka Long
    # loc = Long Or Cash
    # los = Long Or Short
    strategies_a = ['loc','los']

    # For each strategy, build-model, and then score.
    # The score will give us the probability that the pct-gain will be above a threshold dependent on std. deviation.
    strategies_a.each do |strategy|
      tkrs_a.each do |tkr|
        sqlscript_name = "mas_#{strategy}_#{tkr}_score.sql"
        (fhw = File.open(sqlscript_name,"w")).write("--\n-- #{sqlscript_name}\n--\n\n"); fhw.close
        pivot_points_a.each do |pp|
          holding_periods_a.each do |hp|
            p("strategy: #{strategy}  tkr: #{tkr} pivot_point: #{pp} holding_period: #{hp}")
            some_dates_a = []
            oci8conn.exec("SELECT TO_CHAR(ydate,'YYYY-MM-DD')FROM etfmas WHERE tkr='#{tkr}'AND ydate>TO_DATE('2002-09-01','YYYY-MM-DD')AND ydate<TO_DATE('2009-04-01','YYYY-MM-DD')ORDER BY ydate"){ |r| some_dates_a << r.to_s }
            some_dates_a.each do |adate|


sqlplus_syntax1 = <<ENDsqlplus_syntax1
CREATE OR REPLACE view sme AS SELECT
NULL aa_targ_att
,tkr_date
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='#{tkr}'AND ydate='#{adate}'
/
CREATE OR REPLACE view bme AS SELECT
ENDsqlplus_syntax1

              # Here the aa_targ_att column depends on my choice of strategy
              case strategy
                when "loc"
aa_targ_att_syntax = <<ENDloc_syntax
CASE WHEN yprice_avg#{pp}slp<0 THEN'nup'
ELSE
  CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas WHERE tkr='#{tkr}')THEN'up'ELSE'nup'END
END aa_targ_att
ENDloc_syntax
                when "los"
aa_targ_att_syntax = <<ENDlos_syntax
CASE WHEN yprice_avg#{pp}slp<0 THEN
  CASE WHEN pct_gain#{hp}<(SELECT AVG(pct_gain#{hp})-STDDEV(pct_gain#{hp})FROM mas WHERE tkr='#{tkr}')THEN'up'ELSE'nup'END
ELSE
  CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas WHERE tkr='#{tkr}')THEN'up'ELSE'nup'END
END aa_targ_att
ENDlos_syntax
              end

              # Mix together the syntax
sqlplus_syntax = <<ENDsqlplus_syntax
#{sqlplus_syntax1}#{aa_targ_att_syntax},tkr_date
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='#{tkr}'AND 5+#{hp}+ydate<'#{adate}'
/

@cr_bme.sql #{strategy} #{hp} #{pp}
PURGE RECYCLEBIN;

--
ENDsqlplus_syntax
              #Use cr_bme.sql to build a model from bme table and then score the sme table.
              #Since we are scoring only 1 tkr for 1 day, we expect only 1 prediction.
              (fha = File.open(sqlscript_name,"a")).write(sqlplus_syntax); fha.close
            end # tkrs_a.each
          end # some_dates_a.each
        end # holding_periods_a.each
      end # pivot_points_a.each
    end # strategies_a.each
  end # def write_score_sql_scripts
end # class Mas

Mas.new.write_score_sql_scripts
# end of ruby script

The comments in the above script adequately describe it. Sample syntax which the Ruby script had generated is displayed below:

oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ head -50 mas_loc_DIA_score.sql
--
-- mas_loc_DIA_score.sql
--

CREATE OR REPLACE view sme AS SELECT
NULL aa_targ_att
,tkr_date
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='DIA'AND ydate='2002-09-03'
/
CREATE OR REPLACE view bme AS SELECT
CASE WHEN yprice_avg5slp<0 THEN'nup'
ELSE
  CASE WHEN pct_gain5>(SELECT AVG(pct_gain5)+STDDEV(pct_gain5)FROM mas WHERE tkr='DIA')THEN'up'ELSE'nup'END
END aa_targ_att
,tkr_date
,dayspast
,yprice
,yvol
,yprice_avg5
,yprice_avg10
,yprice_avg20
,yprice_avg5slp
,yprice_avg10slp
,yprice_avg20slp
,shy_yprice_avg5slp
,xle_yprice_avg5slp
,shy_yprice_avg10slp
,xle_yprice_avg10slp
,shy_yprice_avg20slp
,xle_yprice_avg20slp
FROM mas WHERE tkr='DIA'AND 5+5+ydate<'2002-09-03'
/

@cr_bme.sql loc 5 5
PURGE RECYCLEBIN;
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

Notice that both of the ruby scripts generate SQL syntax which calls the cr_bme.sql script. The contents of the cr_bme.sql script are displayed below:

oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ cat cr_bme.sql
--
-- cr_bme.sql
--

-- usage: cr_bme.sql strategy holding_period pivot_point

-- A template script which accepts a pattern on the command line which is used at various places in the script to name related things.
-- Demo: @cr_bme.sql los 20 10

DEFINE strategy       = '&&1'
DEFINE holding_period = '&&2'
DEFINE pivot_point    = '&&3'

DEFINE target     = 'aa_targ_att'
DEFINE model_name = 'mas_model'
DEFINE bldtable   = 'bme'
DEFINE scoretable = 'sme'
DEFINE case_id    = 'tkr_date'

-- Define a variable to help me exclude some columns from some data mining calls
DEFINE exclude1 = "'&target','&case_id'"

-- Builds an SVM model using pl/sql.

-----------------------------------------------------------------------
--                            BUILD THE MODEL
-----------------------------------------------------------------------

-- Cleanup old build data preparation objects for repeat runs
DELETE svmc_miss_num;
DELETE svmc_miss_cat;
DELETE svmc_clip;
DELETE svmc_norm;
DELETE svmc_settings;

BEGIN EXECUTE IMMEDIATE 'DROP VIEW svmc_winsor';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW svmc_build_prep';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW xformed_build_miss_num';
EXCEPTION WHEN OTHERS THEN NULL; END;
/

BEGIN EXECUTE IMMEDIATE 'DROP VIEW xformed_build_miss_cat';
EXCEPTION WHEN OTHERS THEN NULL; END;
/


--------------------------------
-- PREPARE BUILD (TRAINING) DATA
--

-- 1. Missing Value treatment for all Predictors and
-- 2. Outlier Treatment and
-- 3. Normalization are performed below.
--    NOTE: that unlike SVM regression, the classification target is NOT
--    normalized here.

BEGIN
  -- Perform missing value treatment for all predictors
  -- create miss tables
--  DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_NUM (miss_table_name => 'svmc_miss_num');
--  DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_CAT (miss_table_name => 'svmc_miss_cat');

  -- populate miss tables
  DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_NUM_MEAN (
    miss_table_name => 'svmc_miss_num',
    data_table_name => '&bldtable',
    exclude_list    => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (&exclude1));

  DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_CAT_MODE (
    miss_table_name => 'svmc_miss_cat',
    data_table_name => '&bldtable',
    exclude_list    => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (&exclude1));

  -- xform input data to replace missing values
  DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_NUM(
    miss_table_name => 'svmc_miss_num',
    data_table_name => '&bldtable',
    xform_view_name => 'xformed_build_miss_num');
  DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_CAT(
    miss_table_name => 'svmc_miss_cat',
    data_table_name => '&bldtable',
    xform_view_name => 'xformed_build_miss_cat');

  -- Perform outlier treatment.
  -- create clip table
  --  DBMS_DATA_MINING_TRANSFORM.CREATE_CLIP (clip_table_name => 'svmc_clip');

  -- populate clip table
  DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_WINSOR_TAIL (
    clip_table_name => 'svmc_clip',
    data_table_name => '&bldtable',
    tail_frac       => 0.025,
    exclude_list    => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (&exclude1));

  -- xform input data to winsorized data
  DBMS_DATA_MINING_TRANSFORM.XFORM_CLIP(
    clip_table_name => 'svmc_clip',
    data_table_name => '&bldtable',
    xform_view_name => 'svmc_winsor');

  -- create normalization table
  -- DBMS_DATA_MINING_TRANSFORM.CREATE_NORM_LIN (norm_table_name => 'svmc_norm');

  -- populate normalization table based on winsorized data
  DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_MINMAX (
    norm_table_name => 'svmc_norm',
    data_table_name => 'svmc_winsor',
    exclude_list    => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (&exclude1));

  -- normalize the original data
  DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN (
    norm_table_name => 'svmc_norm',
    data_table_name => '&bldtable',
    xform_view_name => 'svmc_build_prep');
END;
/

-- CREATE AND POPULATE A SETTINGS TABLE
--
-- DROP   TABLE svmc_settings ;
-- CREATE TABLE svmc_settings (
--   setting_name  VARCHAR2(30),
--   setting_value VARCHAR2(30));

-- The default classification algorithm is Naive Bayes. So override
-- this choice to SVM using a settings table.
-- SVM chooses a kernel type automatically. This choice can be overriden
-- by the user. Linear kernel is preferred high dimensional data, and
-- Gaussian kernel for low dimensional data. Here we use linear kernel
-- to demonstrate the get_model_details_svm() API, which applies only for
-- models.
--
BEGIN
-- Populate settings table
  INSERT INTO svmc_settings (setting_name, setting_value) VALUES
  (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);
  INSERT INTO svmc_settings (setting_name, setting_value) VALUES
  (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_gaussian);
  -- 99123123 is 99,123,123 is 99mb
  INSERT INTO svmc_settings (setting_name, setting_value) VALUES
  (dbms_data_mining.svms_kernel_cache_size,222123123);
--  (dbms_data_mining.svms_kernel_cache_size,99123123);
--  (dbms_data_mining.svms_kernel_cache_size,333123123);
  -- Examples of other possible overrides are:
  -- (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_linear);
  --(dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_complexity_factor);
  COMMIT;
END;
/

---------------------
-- CREATE A NEW MODEL
--
-- Cleanup old model with the same name for repeat runs
BEGIN DBMS_DATA_MINING.DROP_MODEL('&model_name');
EXCEPTION WHEN OTHERS THEN NULL; END;
/

-- Build a new SVM Model
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => '&model_name',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'svmc_build_prep',
    case_id_column_name => '&case_id',
    target_column_name  => '&target',
    settings_table_name => 'svmc_settings');
END;
/

-----------------------------------------------------------------------
--                               APPLY/score THE MODEL
-----------------------------------------------------------------------

-- Cleanup old scoring data preparation objects for repeat runs
BEGIN EXECUTE IMMEDIATE 'DROP VIEW xformed_apply_miss_num';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW xformed_apply_miss_cat';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
BEGIN EXECUTE IMMEDIATE 'DROP VIEW svmc_apply_prep';
EXCEPTION WHEN OTHERS THEN NULL; END;
/
-----------------------
-- PREPARE SCORING DATA
--
-- If the data for model creation has been prepared, then the data
-- to be scored using the model must be prepared in the same manner
-- in order to obtain meaningful results.
--
-- 1. Missing Value treatment for all Predictors and
-- 2. Normalization
-- No outlier treatment will be performed during test and apply. The
-- normalization step is sufficient, since the normalization parameters
-- already capture the effects of outlier treatment done with build data.
--
BEGIN
  -- Xform Test data to replace missing values
  DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_NUM(
    miss_table_name => 'svmc_miss_num',
    data_table_name => '&scoretable',
    xform_view_name => 'xformed_apply_miss_num');

  DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_CAT(
    miss_table_name => 'svmc_miss_cat',
    data_table_name => '&scoretable',
    xform_view_name => 'xformed_apply_miss_cat');

  -- Normalize the data to be scored
  DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN (
    norm_table_name => 'svmc_norm',
    data_table_name => '&scoretable',
    xform_view_name => 'svmc_apply_prep');
END;
/

-- Get prob and the prediction:
INSERT INTO mas_predictions (tkr_date,strategy,holding_period,pivot_point,prediction,prob_up,prob_nup,run_date)
SELECT
tkr_date,'&strategy','&holding_period','&pivot_point'
,PREDICTION(            &model_name   USING *)prediction
,PREDICTION_PROBABILITY(&model_name,'up' USING *)prob_up
,PREDICTION_PROBABILITY(&model_name,'nup' USING *)prob_nup
,sysdate
FROM svmc_apply_prep
/
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

We see above, that the last SELECT statement is where SVM is actually
implemented.  All of the syntax above that statement completes mildly
complex data manipulations.  It transforms the data into a form which
SVM can digest.

It is obvious from the SELECT statement that each SVM prediction is 3 data values:
  - An "up" prediction
  - Or, a "nup" prediction (meaning "not-up")
  - Probability of "up"
  - Probability of "nup"

We should note that prob_up + prob_nup is always equal to 1.
So, we really only need one of the probabilities.

Both the PREDICTION() function and the PREDICTION_PROBABILITY()
function are documented in the Oracle SQL Reference and the Oracle
Data Mining Guide:
  - http://www.google.com/search?q=prediction_probability+site:oracle.com

Notice that the INSERT statement is loading the predictions and
probabilities into a table named mas_predictions.

Also note that we generated the predictions twice with the intention
of using an average of each prediction.

During the second gathering of predictions we removed 1 day of data
from the data set sent to SVM in order to prevent the generation of an
identical set of predictions.

We display some data from mas_predictions below:


13:22:27 SQL> 
13:22:28 SQL> 
13:22:28 SQL> 
13:22:29 SQL> 
13:22:33 SQL> 
13:22:33 SQL> @qry_mas_predictions.sql
13:22:39 SQL> SET ECHO ON
13:22:39 SQL> --
13:22:39 SQL> -- qry_mas_predictions.sql
13:22:39 SQL> --
13:22:39 SQL> 
13:22:39 SQL> -- This script displays interesting data from mas_predictions table.
13:22:39 SQL> -- The mas_predictions table contains predictions of classifications of the mas table.
13:22:39 SQL> -- The two classifications are 'up' and 'nup'.
13:22:39 SQL> -- The original classification values reside in the column: bme.aa_targ_att
13:22:39 SQL> -- The predicted classification values reside in the column: mas_predictions.prediction
13:22:39 SQL> -- Each prediction comes with a confidence level which resides in two columns:
13:22:39 SQL> -- -- prob_up and prob_nup
13:22:39 SQL> -- Be aware that prob_up + prob_nup is equal to 1.
13:22:39 SQL> 
13:22:39 SQL> SET LINES 55
13:22:39 SQL> DESCRIBE mas_predictions
 Name                       Null?    Type
 -------------------------- -------- ------------------
 TKR_DATE                            VARCHAR2(19)
 STRATEGY                            VARCHAR2(9)
 HOLDING_PERIOD                      NUMBER
 PIVOT_POINT                         NUMBER
 PREDICTION                          VARCHAR2(11)
 PROB_UP                             NUMBER
 PROB_NUP                            NUMBER
 RUN_DATE                            DATE

13:22:39 SQL> 
13:22:39 SQL> -- Count the tkrs
13:22:39 SQL> SELECT tkr,COUNT(tkr)FROM
13:22:39   2  (
13:22:39   3  SELECT
13:22:39   4  REGEXP_SUBSTR(tkr_date,'...')tkr
13:22:39   5  ,strategy
13:22:39   6  ,holding_period
13:22:39   7  ,pivot_point
13:22:39   8  ,prediction
13:22:39   9  ,prob_up
13:22:39  10  ,prob_nup
13:22:39  11  ,run_date
13:22:39  12  FROM mas_predictions
13:22:39  13  )
13:22:39  14  GROUP BY tkr ORDER BY tkr
13:22:39  15  /

TKR COUNT(TKR)
--- ----------
DIA      69108
EFA      68611
MDY      69078
OIH      68739
SHY      69140
SPY      69254
XLB      68648
XLE      67646
XLF      68918
XLI      69056
XLK      68985
XLV      69552

12 rows selected.

Elapsed: 00:00:00.00
13:22:39 SQL> 
13:22:39 SQL> -- Drill down into counts of strategy, holding_period, and pivot_point
13:22:39 SQL> SET LINES 77 PAGES 55
13:22:39 SQL> COLUMN tkr FORMAT A3
13:22:39 SQL> SELECT tkr, strategy, holding_period, pivot_point, COUNT(tkr)FROM
13:22:39   2  (
13:22:39   3  SELECT
13:22:39   4  REGEXP_SUBSTR(tkr_date,'...')tkr
13:22:39   5  ,strategy
13:22:39   6  ,holding_period
13:22:39   7  ,pivot_point
13:22:39   8  ,prediction
13:22:39   9  ,prob_up
13:22:39  10  ,prob_nup
13:22:39  11  ,run_date
13:22:39  12  FROM mas_predictions
13:22:39  13  )
13:22:39  14  GROUP BY tkr,strategy,holding_period,pivot_point
13:22:39  15  ORDER BY tkr,strategy,holding_period,pivot_point
13:22:39  16  /

TKR STRATEGY  HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
DIA bh                     5          20       3312
DIA bh                    10          20       3312
DIA bh                    20          20       3244
DIA loc                    5           5       3312
DIA loc                    5          10       3312
DIA loc                    5          20       3312
DIA loc                   10           5       3312
DIA loc                   10          10       3312
DIA loc                   10          20       3312
DIA loc                   20           5       3234
DIA loc                   20          10       3220
DIA loc                   20          20       3202
DIA los                    5           5       3312
DIA los                    5          10       3312
DIA los                    5          20       3312
DIA los                   10           5       3312
DIA los                   10          10       3312
DIA los                   10          20       3312
DIA los                   20           5       3284
DIA los                   20          10       3284
DIA los                   20          20       3272
EFA bh                     5          20       3312
EFA bh                    10          20       3312
EFA bh                    20          20       3244
EFA loc                    5           5       3312
EFA loc                    5          10       3244
EFA loc                    5          20       3214
EFA loc                   10           5       3312
EFA loc                   10          10       3312
EFA loc                   10          20       3218
EFA loc                   20           5       3214
EFA loc                   20          10       3224
EFA loc                   20          20       2996
EFA los                    5           5       3312
EFA los                    5          10       3308
EFA los                    5          20       3299
EFA los                   10           5       3312
EFA los                   10          10       3312
EFA los                   10          20       3284
EFA los                   20           5       3284
EFA los                   20          10       3284
EFA los                   20          20       3302
MDY bh                     5          20       3312
MDY bh                    10          20       3312
MDY bh                    20          20       3312
MDY loc                    5           5       3312
MDY loc                    5          10       3312
MDY loc                    5          20       3312
MDY loc                   10           5       3312
MDY loc                   10          10       3312
MDY loc                   10          20       3194
MDY loc                   20           5       3192

TKR STRATEGY  HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
MDY loc                   20          10       3312
MDY loc                   20          20       3192
MDY los                    5           5       3312
MDY los                    5          10       3312
MDY los                    5          20       3312
MDY los                   10           5       3312
MDY los                   10          10       3312
MDY los                   10          20       3274
MDY los                   20           5       3284
MDY los                   20          10       3312
MDY los                   20          20       3262
OIH bh                     5          20       3312
OIH bh                    10          20       3312
OIH bh                    20          20       3312
OIH loc                    5           5       3312
OIH loc                    5          10       3312
OIH loc                    5          20       3294
OIH loc                   10           5       3312
OIH loc                   10          10       3312
OIH loc                   10          20       3214
OIH loc                   20           5       2966
OIH loc                   20          10       3230
OIH loc                   20          20       3200
OIH los                    5           5       3312
OIH los                    5          10       3312
OIH los                    5          20       3303
OIH los                   10           5       3312
OIH los                   10          10       3312
OIH los                   10          20       3280
OIH los                   20           5       3292
OIH los                   20          10       3264
OIH los                   20          20       3264
SHY bh                     5          20       3309
SHY bh                    10          20       3310
SHY bh                    20          20       3280
SHY loc                    5           5       3309
SHY loc                    5          10       3308
SHY loc                    5          20       3308
SHY loc                   10           5       3306
SHY loc                   10          10       3274
SHY loc                   10          20       3274
SHY loc                   20           5       3280
SHY loc                   20          10       3280
SHY loc                   20          20       3280
SHY los                    5           5       3312
SHY los                    5          10       3308
SHY los                    5          20       3308
SHY los                   10           5       3306
SHY los                   10          10       3274
SHY los                   10          20       3274
SHY los                   20           5       3280
SHY los                   20          10       3280

TKR STRATEGY  HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
SHY los                   20          20       3280
SPY bh                     5          20       3312
SPY bh                    10          20       3312
SPY bh                    20          20       3312
SPY loc                    5           5       3312
SPY loc                    5          10       3312
SPY loc                    5          20       3312
SPY loc                   10           5       3312
SPY loc                   10          10       3312
SPY loc                   10          20       3194
SPY loc                   20           5       3312
SPY loc                   20          10       3312
SPY loc                   20          20       3202
SPY los                    5           5       3312
SPY los                    5          10       3312
SPY los                    5          20       3312
SPY los                   10           5       3312
SPY los                   10          10       3312
SPY los                   10          20       3284
SPY los                   20           5       3312
SPY los                   20          10       3312
SPY los                   20          20       3270
XLB bh                     5          20       3312
XLB bh                    10          20       3312
XLB bh                    20          20       3244
XLB loc                    5           5       3242
XLB loc                    5          10       3312
XLB loc                    5          20       3214
XLB loc                   10           5       3234
XLB loc                   10          10       3312
XLB loc                   10          20       3190
XLB loc                   20           5       3192
XLB loc                   20          10       3202
XLB loc                   20          20       3194
XLB los                    5           5       3304
XLB los                    5          10       3312
XLB los                    5          20       3306
XLB los                   10           5       3306
XLB los                   10          10       3312
XLB los                   10          20       3280
XLB los                   20           5       3294
XLB los                   20          10       3290
XLB los                   20          20       3284
XLE bh                     5          20       3312
XLE bh                    10          20       3312
XLE bh                    20          20       3312
XLE loc                    5           5       3312
XLE loc                    5          10       3312
XLE loc                    5          20       2940
XLE loc                   10           5       3312
XLE loc                   10          10       3312
XLE loc                   10          20       2944

TKR STRATEGY  HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
XLE loc                   20           5       2946
XLE loc                   20          10       3312
XLE loc                   20          20       2944
XLE los                    5           5       3312
XLE los                    5          10       3312
XLE los                    5          20       3298
XLE los                   10           5       3312
XLE los                   10          10       3312
XLE los                   10          20       3280
XLE los                   20           5       3294
XLE los                   20          10       3312
XLE los                   20          20       2944
XLF bh                     5          20       3312
XLF bh                    10          20       3312
XLF bh                    20          20       3312
XLF loc                    5           5       3312
XLF loc                    5          10       3312
XLF loc                    5          20       3194
XLF loc                   10           5       3312
XLF loc                   10          10       3312
XLF loc                   10          20       3194
XLF loc                   20           5       3312
XLF loc                   20          10       3312
XLF loc                   20          20       3012
XLF los                    5           5       3312
XLF los                    5          10       3312
XLF los                    5          20       3284
XLF los                   10           5       3312
XLF los                   10          10       3312
XLF los                   10          20       3284
XLF los                   20           5       3312
XLF los                   20          10       3312
XLF los                   20          20       3270
XLI bh                     5          20       3312
XLI bh                    10          20       3312
XLI bh                    20          20       3230
XLI loc                    5           5       3312
XLI loc                    5          10       3312
XLI loc                    5          20       3312
XLI loc                   10           5       3312
XLI loc                   10          10       3312
XLI loc                   10          20       3312
XLI loc                   20           5       3220
XLI loc                   20          10       3194
XLI loc                   20          20       3194
XLI los                    5           5       3312
XLI los                    5          10       3312
XLI los                    5          20       3312
XLI los                   10           5       3312
XLI los                   10          10       3312
XLI los                   10          20       3312
XLI los                   20           5       3284

TKR STRATEGY  HOLDING_PERIOD PIVOT_POINT COUNT(TKR)
--- --------- -------------- ----------- ----------
XLI los                   20          10       3282
XLI los                   20          20       3284
XLK bh                     5          20       3312
XLK bh                    10          20       3312
XLK bh                    20          20       3252
XLK loc                    5           5       3312
XLK loc                    5          10       3312
XLK loc                    5          20       3294
XLK loc                   10           5       3312
XLK loc                   10          10       3312
XLK loc                   10          20       3224
XLK loc                   20           5       3234
XLK loc                   20          10       3220
XLK loc                   20          20       3214
XLK los                    5           5       3312
XLK los                    5          10       3312
XLK los                    5          20       3303
XLK los                   10           5       3312
XLK los                   10          10       3312
XLK los                   10          20       3284
XLK los                   20           5       3284
XLK los                   20          10       3284
XLK los                   20          20       3272
XLV bh                     5          20       3312
XLV bh                    10          20       3312
XLV bh                    20          20       3312
XLV loc                    5           5       3312
XLV loc                    5          10       3312
XLV loc                    5          20       3312
XLV loc                   10           5       3312
XLV loc                   10          10       3312
XLV loc                   10          20       3312
XLV loc                   20           5       3312
XLV loc                   20          10       3312
XLV loc                   20          20       3312
XLV los                    5           5       3312
XLV los                    5          10       3312
XLV los                    5          20       3312
XLV los                   10           5       3312
XLV los                   10          10       3312
XLV los                   10          20       3312
XLV los                   20           5       3312
XLV los                   20          10       3312
XLV los                   20          20       3312

252 rows selected.

Elapsed: 00:00:00.01

13:22:39 SQL> 
13:22:39 SQL> -- Take a quick look at the predictions
13:22:39 SQL> 
13:22:39 SQL> -- Verify that prob_up + prob_nup is 1
13:22:39 SQL> SELECT AVG(prob_up + prob_nup)FROM mas_predictions;

AVG(PROB_UP+PROB_NUP)
---------------------
                    1

Elapsed: 00:00:00.00
13:22:39 SQL> -- Verify with a ROUND of 4 significant digits
13:22:39 SQL> SELECT COUNT(*)FROM mas_predictions WHERE ROUND((prob_up+prob_nup),4) !=1 ;

  COUNT(*)
----------
         0

Elapsed: 00:00:00.00
13:22:39 SQL> 
13:22:39 SQL> -- Count the predictions with confidence level rounded to 1 significant digit
13:22:39 SQL> SELECT
13:22:39   2  ROUND(prob_up,1)prob_up
13:22:39   3  ,ROUND(prob_nup,1)prob_nup
13:22:39   4  ,prediction
13:22:39   5  ,COUNT(prediction)
13:22:39   6  FROM
13:22:39   7  (
13:22:39   8    SELECT
13:22:39   9    REGEXP_SUBSTR(tkr_date,'...')tkr
13:22:39  10    ,strategy
13:22:39  11    ,holding_period
13:22:39  12    ,pivot_point
13:22:39  13    ,prediction
13:22:39  14    ,prob_up
13:22:39  15    ,prob_nup
13:22:39  16    ,run_date
13:22:39  17    FROM mas_predictions
13:22:39  18  )
13:22:39  19  GROUP BY prediction,ROUND(prob_up,1),ROUND(prob_nup,1)
13:22:39  20  ORDER BY prediction,ROUND(prob_up,1),ROUND(prob_nup,1)
13:22:39  21  /

   PROB_UP   PROB_NUP PREDICTION  COUNT(PREDICTION)
---------- ---------- ----------- -----------------
         0          1 nup                      1800
        .1         .9 nup                    154948
        .2         .8 nup                    614280
        .3         .7 nup                     35103
        .4         .6 nup                     10727
        .5         .5 nup                      2873
        .5         .5 up                       2033
        .6         .4 up                       2530
        .7         .3 up                       1417
        .8         .2 up                        775
        .9         .1 up                        247
         1          0 up                          2

12 rows selected.

Elapsed: 00:00:00.00
13:22:39 SQL> 
13:22:40 SQL> 
13:22:41 SQL> 
13:22:41 SQL> 

We look at the above output and ask an obvious question,
"How accurate are the 'up' predictions?"

Percentage gain information for a specific combination of tkr, closing
date, strategy, holding_period, and pivot_point is contained within a
table described earlier in this paper: yfact.

We join mas_predictions with yfact and then query the result:


15:06:37 SQL> @cr_mp_j_yfact.sql
15:07:02 SQL> --
15:07:02 SQL> -- cr_mp_j_yfact.sql
15:07:02 SQL> --
15:07:02 SQL> 
15:07:02 SQL> -- Creates a view, mp_j_yfact, which joins mas_predictions with
15:07:02 SQL> -- yfact.  This view helps us analyze the accuracy of SVM predictions
15:07:02 SQL> -- within the mas_predictions table.
15:07:02 SQL> 
15:07:02 SQL> CREATE OR REPLACE VIEW mp_j_yfact AS SELECT
15:07:02   2  m.prediction
15:07:02   3  ,m.prob_up
15:07:02   4  ,y.pg             -- pct_gain
15:07:02   5  ,y.day_id          -- fk to day_dimension
15:07:02   6  ,y.tkr             -- DIA, EFA, MDY, ...
15:07:02   7  ,y.strategy        -- bh, loc, los
15:07:02   8  ,y.pivot_point     -- 5,10,20
15:07:02   9  ,y.holding_period  -- 5,10,20
15:07:02  10  FROM yfact y, mas_predictions m
15:07:02  11  WHERE REPLACE(m.tkr_date,'-') = y.tkr||y.day_id
15:07:02  12  AND y.strategy = m.strategy
15:07:02  13  AND y.pivot_point = m.pivot_point
15:07:02  14  AND y.holding_period = m.holding_period
15:07:02  15  /

View created.

Elapsed: 00:00:00.22
15:07:02 SQL> 
15:07:02 SQL> -- Now query
15:07:02 SQL> SELECT COUNT(*)FROM mas_predictions;

  COUNT(*)
----------
    826735

Elapsed: 00:00:00.00
15:07:02 SQL> SELECT COUNT(*)FROM yfact;

  COUNT(*)
----------
    544644

Elapsed: 00:00:00.00
15:07:02 SQL> SELECT COUNT(*)FROM mp_j_yfact;

  COUNT(*)
----------
    826735

Elapsed: 00:00:03.78
15:07:06 SQL> 
15:07:06 SQL> -- Look at data rolled up to highest level
15:07:06 SQL> SELECT holding_period,AVG(pg)avg_pct_gain,COUNT(*)FROM mp_j_yfact
15:07:06   2  GROUP BY holding_period ORDER BY holding_period
15:07:06   3  /

HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
-------------- ------------ ----------
             5   -.01880193     277235
            10     .0529347     276588
            20   .153422813     272912

Elapsed: 00:00:03.24
15:07:09 SQL> 
15:07:09 SQL> -- Drill down into prediction
15:07:09 SQL> SELECT    prediction, holding_period,AVG(pg)avg_pct_gain,COUNT(*)FROM mp_j_yfact
15:07:09   2  GROUP BY prediction,holding_period ORDER BY prediction,holding_period
15:07:09   3  /

PREDICTION  HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- -------------- ------------ ----------
nup                      5   -.01533182     276021
nup                     10   .062276532     274374
nup                     20   .167687269     269336
up                       5   -.80778346       1214
up                      10    -1.104769       2214
up                      20   -.92094285       3576

6 rows selected.

Elapsed: 00:00:03.25
15:07:12 SQL> 
15:09:16 SQL> 
15:09:17 SQL> 

It's obvious from the last query that SVM cannot consistently predict
if an ETF resides in the 'up' class.

Compare the "up" class to the "nup" class.

Look at the values for AVG_PCT_GAIN when PREDICTION is "up".

They are all negative!

The ETFs which were predicted by SVM to be "not-up" actually had higher
values for AVG_PCT_GAIN.

Perhaps it would do better if we gave SVM something more useful than
slopes of moving averages but that is the topic of another paper.

Next we display the results of a query which drills down into other dimensions of yfact.


15:21:05 SQL> 
15:21:05 SQL> @qry_mp_j_yfact.sql
15:21:52 SQL> --
15:21:52 SQL> -- qry_mp_j_yfact.sql
15:21:52 SQL> --
15:21:52 SQL> 
15:21:52 SQL> -- Drills down into yfact dimensions of view: mp_j_yfact
15:21:52 SQL> 
15:21:52 SQL> SELECT
15:21:52   2  tkr,prediction,strategy,pivot_point,holding_period
15:21:52   3  ,ROUND(AVG(pg),3)avg_pct_gain
15:21:52   4  ,COUNT(*)FROM mp_j_yfact
15:21:52   5  GROUP BY tkr,prediction,strategy,pivot_point,holding_period
15:21:52   6  ORDER BY tkr,prediction,strategy,pivot_point,holding_period
15:21:52   7  /

TKR         PREDICTION  STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
DIA         nup         bh                   20              5         .046       3305
DIA         nup         bh                   20             10         .089       3287
DIA         nup         bh                   20             20         .119       3127
DIA         nup         loc                   5              5        -.037       3312
DIA         nup         loc                   5             10         -.02       3308
DIA         nup         loc                   5             20        -.002       3192
DIA         nup         loc                  10              5        -.016       3312
DIA         nup         loc                  10             10         .028       3304
DIA         nup         loc                  10             20         .102       3187
DIA         nup         loc                  20              5          .02       3312
DIA         nup         loc                  20             10         .043       3312
DIA         nup         loc                  20             20         .106       3180
DIA         nup         los                   5              5        -.115       3294
DIA         nup         los                   5             10         -.12       3305
DIA         nup         los                   5             20        -.082       3207
DIA         nup         los                  10              5        -.106       3257
DIA         nup         los                  10             10        -.003       3258
DIA         nup         los                  10             20         .158       3206
DIA         nup         los                  20              5        -.012       3307
DIA         nup         los                  20             10         .042       3292
DIA         nup         los                  20             20         .152       3192
DIA         up          bh                   20              5        1.952          7
DIA         up          bh                   20             10        -.039         25
DIA         up          bh                   20             20          -.4        117
DIA         up          loc                   5             10        1.309          4
DIA         up          loc                   5             20          .23         42
DIA         up          loc                  10             10        1.071          8
DIA         up          loc                  10             20         .145         33
DIA         up          loc                  20             20        -.291         22
DIA         up          los                   5              5       -1.754         18
DIA         up          los                   5             10       -2.364          7
DIA         up          los                   5             20       -2.724         77
DIA         up          los                  10              5         1.32         55
DIA         up          los                  10             10       -1.546         54
DIA         up          los                  10             20        -3.03         78
DIA         up          los                  20              5         .925          5
DIA         up          los                  20             10       -7.344         20
DIA         up          los                  20             20        -2.12         80
EFA         nup         bh                   20              5          .14       3289
EFA         nup         bh                   20             10          .24       3254
EFA         nup         bh                   20             20          .46       3156
EFA         nup         loc                   5              5        -.018       3311
EFA         nup         loc                   5             10         .108       3310
EFA         nup         loc                   5             20         .332       3187
EFA         nup         loc                  10              5         .107       3244
EFA         nup         loc                  10             10         .231       3305
EFA         nup         loc                  10             20         .503       3178
EFA         nup         loc                  20              5         .195       3210
EFA         nup         loc                  20             10         .389       3196
EFA         nup         loc                  20             20         .741       2938
EFA         nup         los                   5              5        -.155       3299
EFA         nup         los                   5             10        -.014       3302
EFA         nup         los                   5             20         .296       3190
EFA         nup         los                  10              5         .093       3262
EFA         nup         los                  10             10         .249       3272
EFA         nup         los                  10             20         .676       3170
EFA         nup         los                  20              5          .29       3240
EFA         nup         los                  20             10         .619       3167
EFA         nup         los                  20             20        1.072       3177
EFA         up          bh                   20              5       -2.645         23
EFA         up          bh                   20             10         .052         58
EFA         up          bh                   20             20        -.368         88
EFA         up          loc                   5              5       -1.006          1
EFA         up          loc                   5             10        -.168          2
EFA         up          loc                   5             20        -.387         27
EFA         up          loc                  10             10       -1.181          7
EFA         up          loc                  10             20        -.191         46
EFA         up          loc                  20              5         .019          4
EFA         up          loc                  20             10        -.044         22
EFA         up          loc                  20             20         .378         58
EFA         up          los                   5              5        -.714         13
EFA         up          los                   5             10       -2.401         10
EFA         up          los                   5             20       -3.033         94
EFA         up          los                  10              5         .318         46
EFA         up          los                  10             10       -2.229         40
EFA         up          los                  10             20       -3.074        114
EFA         up          los                  20              5       -1.955         59
EFA         up          los                  20             10       -2.766        117
EFA         up          los                  20             20       -2.353        125
MDY         nup         bh                   20              5         .104       3296
MDY         nup         bh                   20             10         .189       3259
MDY         nup         bh                   20             20         .372       3293
MDY         nup         loc                   5              5        -.062       3312
MDY         nup         loc                   5             10         .036       3312
MDY         nup         loc                   5             20         .175       3190
MDY         nup         loc                  10              5         .012       3312
MDY         nup         loc                  10             10           .1       3312
MDY         nup         loc                  10             20         .309       3303
MDY         nup         loc                  20              5           .1       3312
MDY         nup         loc                  20             10         .198       3187
MDY         nup         loc                  20             20         .341       3192
MDY         nup         los                   5              5        -.229       3312
MDY         nup         los                   5             10        -.118       3307
MDY         nup         los                   5             20        -.018       3265
MDY         nup         los                  10              5        -.058       3299
MDY         nup         los                  10             10            0       3300

TKR         PREDICTION  STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
MDY         nup         los                  10             20         .297       3277
MDY         nup         los                  20              5         .111       3278
MDY         nup         los                  20             10          .22       3240
MDY         nup         los                  20             20         .413       3225
MDY         up          bh                   20              5         .367         16
MDY         up          bh                   20             10         .646         53
MDY         up          bh                   20             20       -1.939         19
MDY         up          loc                   5             20       -3.671          2
MDY         up          loc                  10             20        2.428          9
MDY         up          loc                  20             10         .515          7
MDY         up          los                   5             10       -4.499          5
MDY         up          los                   5             20       -4.595         19
MDY         up          los                  10              5       -5.891         13
MDY         up          los                  10             10        1.156         12
MDY         up          los                  10             20       -2.158         35
MDY         up          los                  20              5       -1.426         34
MDY         up          los                  20             10       -4.365         34
MDY         up          los                  20             20       -5.573         37
OIH         nup         bh                   20              5         .293       3299
OIH         nup         bh                   20             10         .558       3310
OIH         nup         bh                   20             20        1.165       3263
OIH         nup         loc                   5              5        -.092       3312
OIH         nup         loc                   5             10          .11       3305
OIH         nup         loc                   5             20         .487       2954
OIH         nup         loc                  10              5         .043       3312
OIH         nup         loc                  10             10         .253       3312
OIH         nup         loc                  10             20         .549       3226
OIH         nup         loc                  20              5         .163       3294
OIH         nup         loc                  20             10         .482       3214
OIH         nup         loc                  20             20         .718       3200
OIH         nup         los                   5              5        -.475       3312
OIH         nup         los                   5             10        -.333       3303
OIH         nup         los                   5             20        -.151       3264
OIH         nup         los                  10              5        -.209       3310
OIH         nup         los                  10             10        -.055       3309
OIH         nup         los                  10             20         .096       3243
OIH         nup         los                  20              5         .025       3298
OIH         nup         los                  20             10         .414       3277
OIH         nup         los                  20             20         .445       3258
OIH         up          bh                   20              5        -.057         13
OIH         up          bh                   20             10        3.035          2
OIH         up          bh                   20             20       -3.981         49
OIH         up          loc                   5             10        4.303          7
OIH         up          loc                   5             20        4.917         12
OIH         up          loc                  10             20        4.357          4
OIH         up          los                   5             10        3.767          9
OIH         up          los                   5             20         6.88         28
OIH         up          los                  10              5        4.845          2
OIH         up          los                  10             10         .569          3
OIH         up          los                  10             20        6.468         21
OIH         up          los                  20              5        9.603          5
OIH         up          los                  20             10        6.899          3
OIH         up          los                  20             20       33.461          6
SHY         nup         bh                   20              5         .067       3222
SHY         nup         bh                   20             10         .132       3145
SHY         nup         bh                   20             20         .262       3112
SHY         nup         loc                   5              5         .041       3281
SHY         nup         loc                   5             10         .086       3269
SHY         nup         loc                   5             20         .178       3222
SHY         nup         loc                  10              5         .048       3276
SHY         nup         loc                  10             10         .095       3230
SHY         nup         loc                  10             20         .188       3221
SHY         nup         loc                  20              5         .056       3238
SHY         nup         loc                  20             10         .111       3201
SHY         nup         loc                  20             20         .211       3203
SHY         nup         los                   5              5         .017       3257
SHY         nup         los                   5             10         .037       3225
SHY         nup         los                   5             20         .096       3109
SHY         nup         los                  10              5         .031       3265
SHY         nup         los                  10             10         .057       3141
SHY         nup         los                  10             20         .113       3026
SHY         nup         los                  20              5         .045       3229
SHY         nup         los                  20             10         .094       3093
SHY         nup         los                  20             20         .167       3002
SHY         up          bh                   20              5         .036         87
SHY         up          bh                   20             10         .129        165
SHY         up          bh                   20             20         .227        168
SHY         up          loc                   5              5           .1         28
SHY         up          loc                   5             10         .151         37
SHY         up          loc                   5             20         .305         58
SHY         up          loc                  10              5         .133         32
SHY         up          loc                  10             10         .114         44
SHY         up          loc                  10             20          .31         59
SHY         up          loc                  20              5         .031         70
SHY         up          loc                  20             10         .107         73
SHY         up          loc                  20             20         .347         77
SHY         up          los                   5              5        -.033         55
SHY         up          los                   5             10         .208         81
SHY         up          los                   5             20         .166        171
SHY         up          los                  10              5         .027         43
SHY         up          los                  10             10         .143        133
SHY         up          los                  10             20         .222        254
SHY         up          los                  20              5        -.008         79
SHY         up          los                  20             10         .058        181
SHY         up          los                  20             20         .176        278
SPY         nup         bh                   20              5         .051       3280

TKR         PREDICTION  STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
SPY         nup         bh                   20             10          .08       3292
SPY         nup         bh                   20             20         .121       3276
SPY         nup         loc                   5              5        -.093       3312
SPY         nup         loc                   5             10        -.036       3312
SPY         nup         loc                   5             20            0       3304
SPY         nup         loc                  10              5        -.042       3312
SPY         nup         loc                  10             10          .01       3307
SPY         nup         loc                  10             20         .128       3302
SPY         nup         loc                  20              5         .031       3312
SPY         nup         loc                  20             10         .042       3188
SPY         nup         loc                  20             20         .107       3196
SPY         nup         los                   5              5        -.235       3308
SPY         nup         los                   5             10        -.145       3295
SPY         nup         los                   5             20          -.1       3276
SPY         nup         los                  10              5        -.131       3281
SPY         nup         los                  10             10        -.035       3260
SPY         nup         los                  10             20         .179       3254
SPY         nup         los                  20              5         .021       3309
SPY         nup         los                  20             10         .061       3258
SPY         nup         los                  20             20         .175       3222
SPY         up          bh                   20              5        -.732         32
SPY         up          bh                   20             10        -.514         20
SPY         up          bh                   20             20         .333         36
SPY         up          loc                   5             20        1.118          8
SPY         up          loc                  10             10            0          5
SPY         up          loc                  10             20        1.826         10
SPY         up          loc                  20             10         .809          6
SPY         up          loc                  20             20        2.149          6
SPY         up          los                   5              5        4.898          4
SPY         up          los                   5             10        -.701         17
SPY         up          los                   5             20       -1.797         36
SPY         up          los                  10              5         .314         31
SPY         up          los                  10             10       -1.308         52
SPY         up          los                  10             20       -1.832         58
SPY         up          los                  20              5       -3.927          3
SPY         up          los                  20             10       -6.439         26
SPY         up          los                  20             20        -2.27         48
XLB         nup         bh                   20              5         .147       3300
XLB         nup         bh                   20             10         .265       3305
XLB         nup         bh                   20             20         .508       3193
XLB         nup         loc                   5              5        -.035       3241
XLB         nup         loc                   5             10        -.035       3233
XLB         nup         loc                   5             20         .029       3183
XLB         nup         loc                  10              5        -.012       3309
XLB         nup         loc                  10             10        -.007       3311
XLB         nup         loc                  10             20         .139       3194
XLB         nup         loc                  20              5         .071       3214
XLB         nup         loc                  20             10         .145       3190
XLB         nup         loc                  20             20         .292       3192
XLB         nup         los                   5              5        -.232       3295
XLB         nup         los                   5             10        -.381       3300
XLB         nup         los                   5             20        -.418       3262
XLB         nup         los                  10              5        -.172       3300
XLB         nup         los                  10             10        -.285       3310
XLB         nup         los                  10             20        -.185       3274
XLB         nup         los                  20              5        -.002       3281
XLB         nup         los                  20             10         .028       3259
XLB         nup         los                  20             20         .137       3222
XLB         up          bh                   20              5        -.926         12
XLB         up          bh                   20             10        3.869          7
XLB         up          bh                   20             20         .167         51
XLB         up          loc                   5              5         .417          1
XLB         up          loc                   5             10        2.879          1
XLB         up          loc                   5             20         .116          9
XLB         up          loc                  10              5       -1.311          3
XLB         up          loc                  10             10       -1.514          1
XLB         up          loc                  10             20       -3.931          8
XLB         up          loc                  20             20        -4.59          2
XLB         up          los                   5              5         .513          9
XLB         up          los                   5             10         .075          6
XLB         up          los                   5             20         .009         32
XLB         up          los                  10              5         .417         12
XLB         up          los                  10             10       -5.423          2
XLB         up          los                  10             20       -5.155         16
XLB         up          los                  20              5       -1.638         25
XLB         up          los                  20             10        -3.95         21
XLB         up          los                  20             20       -1.296         62
XLE         nup         bh                   20              5         .315       3301
XLE         nup         bh                   20             10         .606       3255
XLE         nup         bh                   20             20        1.111       3255
XLE         nup         loc                   5              5         .048       3312
XLE         nup         loc                   5             10         .307       3304
XLE         nup         loc                   5             20         .519       2939
XLE         nup         loc                  10              5         .113       3312
XLE         nup         loc                  10             10         .314       3302
XLE         nup         loc                  10             20         .492       3295
XLE         nup         loc                  20              5         .132       2939
XLE         nup         loc                  20             10          .33       2938
XLE         nup         loc                  20             20         .337       2930
XLE         nup         los                   5              5        -.212       3311
XLE         nup         los                   5             10        -.011       3284
XLE         nup         los                   5             20         -.12       3280
XLE         nup         los                  10              5        -.061       3292
XLE         nup         los                  10             10        -.003       3255
XLE         nup         los                  10             20         -.18       3292
XLE         nup         los                  20              5        -.047       3269

TKR         PREDICTION  STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
XLE         nup         los                  20             10        -.003       3250
XLE         nup         los                  20             20        -.442       2927
XLE         up          bh                   20              5       -1.435         11
XLE         up          bh                   20             10         .712         57
XLE         up          bh                   20             20        5.482         57
XLE         up          loc                   5             10       -1.892          8
XLE         up          loc                   5             20        4.135          7
XLE         up          loc                  10             10        -1.82         10
XLE         up          loc                  10             20        4.272         17
XLE         up          loc                  20              5        3.619          1
XLE         up          loc                  20             10       -1.908          6
XLE         up          loc                  20             20         6.04         14
XLE         up          los                   5              5       -2.513          1
XLE         up          los                   5             10         .709         28
XLE         up          los                   5             20        4.471         14
XLE         up          los                  10              5       -3.915         20
XLE         up          los                  10             10         .533         57
XLE         up          los                  10             20        2.625         20
XLE         up          los                  20              5        -.982         29
XLE         up          los                  20             10         1.45         30
XLE         up          los                  20             20        5.629         17
XLF         nup         bh                   20              5        -.106       3304
XLF         nup         bh                   20             10        -.254       3294
XLF         nup         bh                   20             20        -.669       3289
XLF         nup         loc                   5              5        -.286       3312
XLF         nup         loc                   5             10        -.389       3312
XLF         nup         loc                   5             20        -.518       3312
XLF         nup         loc                  10              5        -.166       3312
XLF         nup         loc                  10             10        -.299       3312
XLF         nup         loc                  10             20        -.315       3312
XLF         nup         loc                  20              5        -.054       3194
XLF         nup         loc                  20             10        -.147       3194
XLF         nup         loc                  20             20        -.228       3012
XLF         nup         los                   5              5        -.455       3312
XLF         nup         los                   5             10         -.51       3312
XLF         nup         los                   5             20         -.29       3273
XLF         nup         los                  10              5        -.204       3302
XLF         nup         los                  10             10        -.244       3283
XLF         nup         los                  10             20          .16       3261
XLF         nup         los                  20              5         .019       3280
XLF         nup         los                  20             10         .064       3238
XLF         nup         los                  20             20         .354       3207
XLF         up          bh                   20              5       -4.541          8
XLF         up          bh                   20             10       -2.898         18
XLF         up          bh                   20             20        1.511         23
XLF         up          los                   5             20       -8.051         39
XLF         up          los                  10              5       -3.605         10
XLF         up          los                  10             10      -10.092         29
XLF         up          los                  10             20       -8.629         51
XLF         up          los                  20              5        -9.54          4
XLF         up          los                  20             10       -7.469         46
XLF         up          los                  20             20       -4.214         63
XLI         nup         bh                   20              5         .044       3293
XLI         nup         bh                   20             10         .075       3280
XLI         nup         bh                   20             20         .108       3180
XLI         nup         loc                   5              5        -.032       3310
XLI         nup         loc                   5             10          .07       3312
XLI         nup         loc                   5             20         .098       3219
XLI         nup         loc                  10              5          .01       3312
XLI         nup         loc                  10             10         .089       3311
XLI         nup         loc                  10             20          .23       3188
XLI         nup         loc                  20              5         .056       3312
XLI         nup         loc                  20             10         .132       3311
XLI         nup         loc                  20             20         .215       3190
XLI         nup         los                   5              5        -.082       3280
XLI         nup         los                   5             10         .051       3295
XLI         nup         los                   5             20         .122       3238
XLI         nup         los                  10              5        -.002       3292
XLI         nup         los                  10             10         .154       3275
XLI         nup         los                  10             20         .405       3231
XLI         nup         los                  20              5         .107       3280
XLI         nup         los                  20             10         .271       3253
XLI         nup         los                  20             20         .378       3234
XLI         up          bh                   20              5       -1.306         19
XLI         up          bh                   20             10       -1.004         32
XLI         up          bh                   20             20        -.588         50
XLI         up          loc                   5              5       -1.531          2
XLI         up          loc                   5             20        1.111          1
XLI         up          loc                  10             10        1.671          1
XLI         up          loc                  10             20         .277          6
XLI         up          loc                  20             10        1.671          1
XLI         up          loc                  20             20        2.307          4
XLI         up          los                   5              5       -2.206         32
XLI         up          los                   5             10        4.694         17
XLI         up          los                   5             20       -5.369         46
XLI         up          los                  10              5       -2.224         20
XLI         up          los                  10             10       -3.346         37
XLI         up          los                  10             20       -5.929         51
XLI         up          los                  20              5       -3.248         32
XLI         up          los                  20             10       -3.738         59
XLI         up          los                  20             20       -5.085         50
XLK         nup         bh                   20              5         .167       3226
XLK         nup         bh                   20             10         .302       3225
XLK         nup         bh                   20             20          .31       3216
XLK         nup         loc                   5              5        -.082       3312
XLK         nup         loc                   5             10         .022       3312

TKR         PREDICTION  STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
XLK         nup         loc                   5             20         .058       3234
XLK         nup         loc                  10              5         .009       3312
XLK         nup         loc                  10             10         .095       3304
XLK         nup         loc                  10             20         .121       3220
XLK         nup         loc                  20              5         .038       3289
XLK         nup         loc                  20             10         .065       3216
XLK         nup         loc                  20             20         .044       3214
XLK         nup         los                   5              5        -.237       3268
XLK         nup         los                   5             10         -.17       3280
XLK         nup         los                   5             20        -.074       3250
XLK         nup         los                  10              5        -.096       3309
XLK         nup         los                  10             10        -.019       3283
XLK         nup         los                  10             20          .06       3243
XLK         nup         los                  20              5        -.046       3296
XLK         nup         los                  20             10        -.011       3233
XLK         nup         los                  20             20        -.089       3240
XLK         up          bh                   20              5       -1.901         86
XLK         up          bh                   20             10       -3.131         87
XLK         up          bh                   20             20       -2.014         36
XLK         up          loc                  10             10       -4.783          8
XLK         up          loc                  20              5         .907          5
XLK         up          loc                  20             10       -5.372          8
XLK         up          los                   5              5       -3.279         44
XLK         up          los                   5             10         .113         32
XLK         up          los                   5             20       -6.663         34
XLK         up          los                  10              5        1.272          3
XLK         up          los                  10             10       -3.075         29
XLK         up          los                  10             20       -1.512         41
XLK         up          los                  20              5         .576          7
XLK         up          los                  20             10       -4.031         51
XLK         up          los                  20             20       -1.561         32
XLV         nup         bh                   20              5         .021       3297
XLV         nup         bh                   20             10         .041       3257
XLV         nup         bh                   20             20         .075       3279
XLV         nup         loc                   5              5        -.024       3312
XLV         nup         loc                   5             10         .063       3306
XLV         nup         loc                   5             20          .03       3308
XLV         nup         loc                  10              5         .056       3310
XLV         nup         loc                  10             10         .056       3308
XLV         nup         loc                  10             20         .061       3309
XLV         nup         loc                  20              5        -.001       3312
XLV         nup         loc                  20             10         -.03       3283
XLV         nup         loc                  20             20        -.078       3298
XLV         nup         los                   5              5        -.072       3309
XLV         nup         los                   5             10         .087       3305
XLV         nup         los                   5             20        -.024       3284
XLV         nup         los                  10              5         .089       3294
XLV         nup         los                  10             10         .072       3277
XLV         nup         los                  10             20         .041       3235
XLV         nup         los                  20              5        -.027       3310
XLV         nup         los                  20             10        -.099       3272
XLV         nup         los                  20             20        -.215       3257
XLV         up          bh                   20              5         .736         15
XLV         up          bh                   20             10         .246         55
XLV         up          bh                   20             20        -.165         33
XLV         up          loc                   5             10        1.076          6
XLV         up          loc                   5             20         .528          4
XLV         up          loc                  10              5            0          2
XLV         up          loc                  10             10            0          4
XLV         up          loc                  10             20        1.022          3
XLV         up          loc                  20             10         .436         29
XLV         up          loc                  20             20        -.641         14
XLV         up          los                   5              5        -.769          3
XLV         up          los                   5             10        -.259          7
XLV         up          los                   5             20        1.471         28
XLV         up          los                  10              5         .013         18
XLV         up          los                  10             10        -.343         35
XLV         up          los                  10             20         .484         77
XLV         up          los                  20              5         .032          2
XLV         up          los                  20             10         .195         40
XLV         up          los                  20             20       -1.343         55

455 rows selected.

Elapsed: 00:00:05.16
15:21:57 SQL> 
15:21:57 SQL> 

Inspection of the above output suggests that SVM works well with the
tkrs: OIH, and XLE when they are held for 20 days.

We display that information below.

Notice the values for AVG_PCT_GAIN when PREDICTION is "up":


15:33:48 SQL> 
15:33:48 SQL> 
15:33:48 SQL> SELECT
15:33:48   2  tkr,prediction,strategy,pivot_point,holding_period
15:33:48   3  ,ROUND(AVG(pg),3)avg_pct_gain
15:33:48   4  ,COUNT(*)FROM mp_j_yfact
15:33:48   5  WHERE tkr IN('OIH','XLE')AND holding_period=20
15:33:48   6  GROUP BY tkr,prediction,strategy,pivot_point,holding_period
15:33:48   7  ORDER BY tkr,prediction,strategy,pivot_point,holding_period
15:33:48   8  /

TKR         PREDICTION  STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
OIH         nup         bh                   20             20        1.165       3263
OIH         nup         loc                   5             20         .487       2954
OIH         nup         loc                  10             20         .549       3226
OIH         nup         loc                  20             20         .718       3200
OIH         nup         los                   5             20        -.151       3264
OIH         nup         los                  10             20         .096       3243
OIH         nup         los                  20             20         .445       3258
OIH         up          bh                   20             20       -3.981         49
OIH         up          loc                   5             20        4.917         12
OIH         up          loc                  10             20        4.357          4
OIH         up          los                   5             20         6.88         28
OIH         up          los                  10             20        6.468         21
OIH         up          los                  20             20       33.461          6
XLE         nup         bh                   20             20        1.111       3255
XLE         nup         loc                   5             20         .519       2939
XLE         nup         loc                  10             20         .492       3295
XLE         nup         loc                  20             20         .337       2930
XLE         nup         los                   5             20         -.12       3280
XLE         nup         los                  10             20         -.18       3292
XLE         nup         los                  20             20        -.442       2927
XLE         up          bh                   20             20        5.482         57
XLE         up          loc                   5             20        4.135          7
XLE         up          loc                  10             20        4.272         17
XLE         up          loc                  20             20         6.04         14
XLE         up          los                   5             20        4.471         14
XLE         up          los                  10             20        2.625         20
XLE         up          los                  20             20        5.629         17

27 rows selected.

Elapsed: 00:00:00.92
15:33:49 SQL> 
15:33:49 SQL> 
15:33:49 SQL> 
15:35:12 SQL> 
15:35:12 SQL> 


15:37:32 SQL> 
15:37:32 SQL> -- Rollup
15:37:32 SQL> SELECT
15:37:32   2  prediction,strategy,pivot_point,holding_period
15:37:32   3  ,ROUND(AVG(pg),3)avg_pct_gain
15:37:32   4  ,COUNT(*)FROM mp_j_yfact
15:37:32   5  WHERE tkr IN('OIH','XLE')AND holding_period=20
15:37:32   6  GROUP BY prediction,strategy,pivot_point,holding_period
15:37:32   7  ORDER BY prediction,strategy,pivot_point,holding_period
15:37:32   8  /

PREDICTION  STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- -------------- ------------ ----------
nup         bh                   20             20        1.138       6518
nup         loc                   5             20         .503       5893
nup         loc                  10             20          .52       6521
nup         loc                  20             20         .536       6130
nup         los                   5             20        -.135       6544
nup         los                  10             20        -.043       6535
nup         los                  20             20         .025       6185
up          bh                   20             20        1.108        106
up          loc                   5             20        4.629         19
up          loc                  10             20        4.288         21
up          loc                  20             20         6.04         14
up          los                   5             20        6.077         42
up          los                  10             20        4.593         41
up          los                  20             20       12.889         23

14 rows selected.

Elapsed: 00:00:00.91
15:37:33 SQL> 
15:37:33 SQL> 
15:37:33 SQL> 

We offer ideas for improvement.

First, the idea that some ETFs are more predictable is certainly worth study.

This is easily implemented by assigning a unique integer to each tkr
and then attaching those integers to the attribute list fed to SVM.


Some syntax to move us in that direction is listed below:


12:09:21 SQL> @cr_tkr_int.sql
12:09:40 SQL> --
12:09:40 SQL> -- cr_tkr_int.sql
12:09:40 SQL> --
12:09:40 SQL> 
12:09:40 SQL> DROP   TABLE tkr_int;

Table dropped.

Elapsed: 00:00:00.71
12:09:40 SQL> CREATE TABLE tkr_int AS SELECT tkr_int, tkr FROM
12:09:40   2  (SELECT rownum tkr_int, tkr FROM(SELECT distinct tkr FROM mas ORDER BY tkr))
12:09:40   3  /

Table created.

Elapsed: 00:00:00.50
12:09:41 SQL> SELECT * FROM tkr_int;

   TKR_INT TKR
---------- ----
         1 DIA
         2 EFA
         3 MDY
         4 OIH
         5 SHY
         6 SPY
         7 XLB
         8 XLE
         9 XLF
        10 XLI
        11 XLK
        12 XLV

12 rows selected.

Elapsed: 00:00:00.22
12:09:41 SQL> 
12:09:44 SQL> 
12:09:44 SQL> 

Secondly, it is obvious that many SVM 'up' predictions actaully
correspond with a negative percentage gain.

And we notice that for some tkrs it is consistent behavior.

If we have a correlation there, then that event is actually predictive.

Perhaps we harvest this predictive power by running SVM a third time.

What are the steps we need to follow to feed prob_up from runs 1 and 2
of SVM into run 3 as an attribute?

The source of of prob_up is obvious, it is the mas_predictions table.

Actually we have a small wrinkle there; what we really want is the
average of prob_up from runs 1 and 2.

This syntax gives a mas_predictions_avg table:


12:09:44 SQL> 
12:09:44 SQL> 
12:12:40 SQL> @cr_mas_predictions_avg.sql
12:12:52 SQL> --
12:12:52 SQL> -- cr_mas_predictions_avg.sql
12:12:52 SQL> --
12:12:52 SQL> 
12:12:52 SQL> DROP   TABLE mas_predictions_avg;

Table dropped.

Elapsed: 00:00:00.12
12:12:52 SQL> CREATE TABLE mas_predictions_avg AS SELECT
12:12:52   2  tkr_date
12:12:52   3  ,strategy
12:12:52   4  ,holding_period
12:12:52   5  ,pivot_point
12:12:52   6  ,prediction
12:12:52   7  ,AVG(prob_up)prob_up12
12:12:52   8  FROM mas_predictions
12:12:52   9  GROUP BY
12:12:52  10  tkr_date
12:12:52  11  ,strategy
12:12:52  12  ,holding_period
12:12:52  13  ,pivot_point
12:12:52  14  ,prediction
12:12:52  15  /

Table created.

Elapsed: 00:00:06.44
12:12:58 SQL> 

We study the steps we followed to do runs 1 and 2 to construct run 3.

Our current dependency chain is listed below:
  - mas_predictions_avg   depends on
  - mas_predictions table depends on 
  - cr_bme.sql script     depends on
  - both sme, and bme views which depend on 
  - mas table

So we join tables mas_predictions_avg, tkr_int, and mas to feed both
tkr_int and prob_up into a a third run of SVM.

Some template syntax for every sme view is listed below:


15:36:40 SQL> 
15:36:41 SQL> 
15:36:41 SQL> 
15:36:41 SQL> l
  1  CREATE OR REPLACE view sme AS SELECT
  2  NULL aa_targ_att
  3  ,m.tkr_date
  4  ,m.dayspast
  5  ,m.yprice
  6  ,m.yvol
  7  ,m.yprice_avg5
  8  ,m.yprice_avg10
  9  ,m.yprice_avg20
 10  ,m.yprice_avg5slp
 11  ,m.yprice_avg10slp
 12  ,m.yprice_avg20slp
 13  ,m.shy_yprice_avg5slp
 14  ,m.xle_yprice_avg5slp
 15  ,m.shy_yprice_avg10slp
 16  ,m.xle_yprice_avg10slp
 17  ,m.shy_yprice_avg20slp
 18  ,m.xle_yprice_avg20slp
 19  ,p.prob_up12
 20  ,t.tkr_int
 21  FROM mas m, mas_predictions_avg p, tkr_int t
 22  -- WHERE m.tkr='XLV'AND ydate='2009-03-31'
 23  WHERE		     ydate='2009-03-31'
 24  AND m.tkr_date = p.tkr_date
 25  AND p.strategy = 'los'
 26  AND p.holding_period = 20
 27  AND p.pivot_point = 20
 28* AND m.tkr = t.tkr
15:36:42 SQL> /

View created.

Elapsed: 00:00:00.04
15:36:43 SQL> select count(*)from sme;

  COUNT(*)
----------
	12

Elapsed: 00:00:00.15
15:36:53 SQL> 
15:36:54 SQL> 
15:36:54 SQL> 

Notice the structure of the above template syntax is different than the structure for runs 1 and 2:

15:38:23 SQL> 
15:38:23 SQL> 
15:38:23 SQL> l
  1  CREATE OR REPLACE view sme AS SELECT
  2  NULL aa_targ_att
  3  ,tkr_date
  4  ,dayspast
  5  ,yprice
  6  ,yvol
  7  ,yprice_avg5
  8  ,yprice_avg10
  9  ,yprice_avg20
 10  ,yprice_avg5slp
 11  ,yprice_avg10slp
 12  ,yprice_avg20slp
 13  ,shy_yprice_avg5slp
 14  ,xle_yprice_avg5slp
 15  ,shy_yprice_avg10slp
 16  ,xle_yprice_avg10slp
 17  ,shy_yprice_avg20slp
 18  ,xle_yprice_avg20slp
 19* FROM mas WHERE tkr='XLV'AND ydate='2009-03-31'
15:38:24 SQL> /

View created.

Elapsed: 00:00:00.04
15:38:25 SQL> 
15:38:26 SQL> 
15:38:26 SQL> select count ( * ) from sme ;

  COUNT(*)
----------
	 1

Elapsed: 00:00:00.02
15:38:31 SQL> 
15:38:32 SQL> 

The differences are listed below:
  - run3-sme depends on a join of mas, mas_predictions_avg, and tkr_int
  - run3-sme returns 1 row for each tkr, 12 rows in all
  - run12-sme returns 1 row 

Since run3-sme returns 12 rows, we are giving SVM the ability to focus
on the relative behaviors of the tkrs.

In runs 1 and 2 we asked SVM to evaluate each tkr independently.

So if OIH behavior is more predicitve than DIA, based on the
tkr-attributes which SVM can access, SVM now has the ability to
discover that relationship.

For run3, the template syntax for every bme view is listed below:


16:01:49 SQL> 
16:01:50 SQL> 
16:01:50 SQL> 
16:01:50 SQL> l
  1  CREATE OR REPLACE view bme AS SELECT
  2  CASE WHEN yprice_avg20slp<0 THEN
  3    CASE WHEN pct_gain20<(SELECT AVG(pct_gain20)-STDDEV(pct_gain20)FROM mas)THEN'up'ELSE'nup'END
  4  ELSE
  5    CASE WHEN pct_gain20>(SELECT AVG(pct_gain20)+STDDEV(pct_gain20)FROM mas)THEN'up'ELSE'nup'END
  6  END aa_targ_att
  7  ,m.tkr_date
  8  ,m.dayspast
  9  ,m.yprice
 10  ,m.yvol
 11  ,m.yprice_avg5
 12  ,m.yprice_avg10
 13  ,m.yprice_avg20
 14  ,m.yprice_avg5slp
 15  ,m.yprice_avg10slp
 16  ,m.yprice_avg20slp
 17  ,m.shy_yprice_avg5slp
 18  ,m.xle_yprice_avg5slp
 19  ,m.shy_yprice_avg10slp
 20  ,m.xle_yprice_avg10slp
 21  ,m.shy_yprice_avg20slp
 22  ,m.xle_yprice_avg20slp
 23  ,p.prob_up12
 24  ,t.tkr_int
 25  FROM mas m, mas_predictions_avg p, tkr_int t
 26  -- WHERE m.tkr='XLV'AND 5+20+ydate<'2009-03-31'
 27  WHERE		     5+20+ydate<'2009-03-31'
 28  AND m.tkr_date = p.tkr_date
 29  AND p.strategy = 'los'
 30  AND p.holding_period = 20
 31  AND p.pivot_point = 20
 32* AND m.tkr = t.tkr
16:01:51 SQL> 
16:01:52 SQL> 
16:01:52 SQL> /

View created.

Elapsed: 00:00:00.06
16:01:58 SQL> 
16:02:00 SQL> 
16:02:01 SQL> select tkr_int,count(*)from bme group by tkr_int order by tkr_int ;

   TKR_INT   COUNT(*)
---------- ----------
	 1	 1620
	 2	 1653
	 3	 1614
	 4	 1614
	 5	 1647
	 6	 1617
	 7	 1638
	 8	 1469
	 9	 1622
	10	 1624
	11	 1618
	12	 1641

12 rows selected.

Elapsed: 00:00:00.46
16:02:17 SQL> 
16:02:18 SQL> 

Next we jump up a level and display the ruby script used to generate the large amount of SQL syntax required to complete run3:

#!/usr/bin/env ruby

# mas_run3.rb

# Demo: ruby -r mas_run3.rb -e 'MasRun3.new.write_score_sql_scripts'

# Ruby script which uses nested loops to generate several SQL scripts.
# Each script is a wrapper for the SVM algorithm resident within Oracle Data Miner.

class MasRun3
require 'rubygems'
require 'oci8'
  def write_score_sql_scripts
    # pivot_points_a holds integers representing a type of moving average slope.
    # Some strategies (loc, los) depend upon a pivot point.
    # For example, The loc-Trader will watch the moving average slope.  Once it moves above zero, he will go Long.
    # Once it moves below zero, he will sell so that he holds cash.
    pivot_points_a = ["5","10","20"]

    # holding_periods_a holds integers representing days a tkr is held before its percentage gain is measured.
    # holding_periods_a = ["5","10","20"]
    # Now, We only like 20 day holding period.
    holding_periods_a = ["20"]

    # strategies_a holds a list of acronyms for strategies.
    # Some of them are:
    # bh = Buy and Hold aka Long
    # loc = Long Or Cash
    # los = Long Or Short
    strategies_a = ['loc','los']

    # For each strategy, build-model, and then score.
    # The score will give us the probability that the pct-gain will be above a threshold dependent on std. deviation.
    strategies_a.each do |strategy|
      pivot_points_a.each do |pp|
        holding_periods_a.each do |hp|
          p("strategy: #{strategy} pivot_point: #{pp} holding_period: #{hp}")
          sqlscript_name = "mas_run3_#{strategy}_#{pp}_#{hp}_score.sql"
          (fhw = File.open(sqlscript_name,"w")).write("--\n-- #{sqlscript_name}\n--\n\n"); fhw.close
          some_dates_a = []
          dates_sql = "SELECT DISTINCT TO_CHAR(ydate,'YYYY-MM-DD')FROM etfmas WHERE ydate>TO_DATE('2002-09-01','YYYY-MM-DD')AND ydate<TO_DATE('2009-04-01','YYYY-MM-DD')"
	  OCI8.new('trade','t').exec(dates_sql){ |r| some_dates_a << r.to_s }
          some_dates_a.sort.each do |adate|

sqlplus_syntax1 = <<ENDsqlplus_syntax1
CREATE OR REPLACE view sme AS SELECT
NULL aa_targ_att
,m.tkr_date
,m.dayspast
,m.yprice
,m.yvol
,m.yprice_avg5
,m.yprice_avg10
,m.yprice_avg20
,m.yprice_avg5slp
,m.yprice_avg10slp
,m.yprice_avg20slp
,m.shy_yprice_avg5slp
,m.xle_yprice_avg5slp
,m.shy_yprice_avg10slp
,m.xle_yprice_avg10slp
,m.shy_yprice_avg20slp
,m.xle_yprice_avg20slp
,p.prob_up12
,t.tkr_int
FROM mas m, mas_predictions_avg p, tkr_int t
WHERE ydate='#{adate}'
AND m.tkr_date = p.tkr_date
AND p.strategy = '#{strategy}'
AND p.holding_period = #{hp}
AND p.pivot_point = #{pp}
AND m.tkr = t.tkr
/
CREATE OR REPLACE view bme AS SELECT
ENDsqlplus_syntax1

            # Here the aa_targ_att column depends on my choice of strategy
            case strategy
              when "loc"
aa_targ_att_syntax = <<ENDloc_syntax
CASE WHEN yprice_avg#{pp}slp<0 THEN'nup'
ELSE
  CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas)THEN'up'ELSE'nup'END
END aa_targ_att
ENDloc_syntax
              when "los"
aa_targ_att_syntax = <<ENDlos_syntax
CASE WHEN yprice_avg#{pp}slp<0 THEN
  CASE WHEN pct_gain#{hp}<(SELECT AVG(pct_gain#{hp})-STDDEV(pct_gain#{hp})FROM mas)THEN'up'ELSE'nup'END
ELSE
  CASE WHEN pct_gain#{hp}>(SELECT AVG(pct_gain#{hp})+STDDEV(pct_gain#{hp})FROM mas)THEN'up'ELSE'nup'END
END aa_targ_att
ENDlos_syntax
            end # case strategy
            # Mix together the syntax
sqlplus_syntax = <<ENDsqlplus_syntax
#{sqlplus_syntax1}#{aa_targ_att_syntax},m.tkr_date
,m.dayspast
,m.yprice
,m.yvol
,m.yprice_avg5
,m.yprice_avg10
,m.yprice_avg20
,m.yprice_avg5slp
,m.yprice_avg10slp
,m.yprice_avg20slp
,m.shy_yprice_avg5slp
,m.xle_yprice_avg5slp
,m.shy_yprice_avg10slp
,m.xle_yprice_avg10slp
,m.shy_yprice_avg20slp
,m.xle_yprice_avg20slp
,p.prob_up12
,t.tkr_int
FROM mas m, mas_predictions_avg p, tkr_int t
WHERE 5+#{hp}+ydate<'#{adate}'
AND m.tkr_date = p.tkr_date
AND p.strategy = '#{strategy}'
AND p.holding_period = #{hp}
AND p.pivot_point = #{pp}
AND m.tkr = t.tkr
/
ALTER SYSTEM FLUSH SHARED_POOL;
@cr_bme_run3.sql #{strategy} #{hp} #{pp}
PURGE RECYCLEBIN;

--
ENDsqlplus_syntax
            #Use cr_bme_run3.sql to build a model from bme table and then score the sme table.
            (fha = File.open(sqlscript_name,"a")).write(sqlplus_syntax); fha.close
          end # some_dates_a.each
        end # holding_periods_a.each
      end # pivot_points_a.each
    end # strategies_a.each
  end # def write_score_sql_scripts
end # class Mas

MasRun3.new.write_score_sql_scripts
# end of ruby script

The generated SQL scripts are displayed below:

oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ ll mas_run3*l
-rw-r--r-- 1 oracle dba 2190267 2009-05-29 21:31 mas_run3_loc_10_20_score.sql
-rw-r--r-- 1 oracle dba 2190262 2009-05-29 21:31 mas_run3_loc_20_20_score.sql
-rw-r--r-- 1 oracle dba 2183629 2009-05-29 21:31 mas_run3_loc_5_20_score.sql
-rw-r--r-- 1 oracle dba 2339482 2009-05-29 21:31 mas_run3_los_10_20_score.sql
-rw-r--r-- 1 oracle dba 2339482 2009-05-29 21:31 mas_run3_los_20_20_score.sql
-rw-r--r-- 1 oracle dba 2332849 2009-05-29 21:31 mas_run3_los_5_20_score.sql
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

The above six SQL scripts are brought under the control of one shell script:

oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ cat mas_run3.sh
#!/bin/sh

# mas_run3.sh

set -x

cd /oracle/vlsql/md/etfmas/
. /oracle/.orcl

echo exit >> mas_run3_loc_10_20_score.sql
echo exit >> mas_run3_loc_20_20_score.sql
echo exit >> mas_run3_loc_5_20_score.sql
echo exit >> mas_run3_los_10_20_score.sql
echo exit >> mas_run3_los_20_20_score.sql
echo exit >> mas_run3_los_5_20_score.sql

sqlplus trade/t @mas_run3_loc_5_20_score.sql
sqlplus trade/t @mas_run3_loc_10_20_score.sql
sqlplus trade/t @mas_run3_loc_20_20_score.sql
sqlplus trade/t @mas_run3_los_5_20_score.sql
sqlplus trade/t @mas_run3_los_10_20_score.sql
sqlplus trade/t @mas_run3_los_20_20_score.sql
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

After the shell script ran, we inspected its duration:

19:03:22 SQL> 
19:03:43 SQL> 
19:03:43 SQL> select min( run_date ), max( run_date ), 24 * (( max( run_date )-min( run_date ))) from mas_predictions_run3 ;

MIN(RUN_DA MAX(RUN_DA 24*((MAX(RUN_DATE)-MIN(RUN_DATE)))
---------- ---------- ----------------------------------
2009-05-29 2009-05-30			      17.3544444

Elapsed: 00:00:00.00
19:03:49 SQL> 
19:03:49 SQL> 
19:03:50 SQL> 

The duration of 17.35 hours is significantly less than the durations
of run1 and run2.

We have 2 reasons for this:
1. We only call SVM for a holding period of 20 days instead of 5, 10, 20 days.

2. Each call to SVM in run3 processes 12 rows instead of just 1 row so
   run1 and run2 called SVM 72 times more than run3 did.

We show the predictions and probabilities which SVM loaded into
mas_predictions_run3 for us:



19:18:49 SQL> 
19:18:49 SQL> 
19:18:49 SQL> 
19:18:49 SQL> -- Count the predictions with confidence level rounded to 1 significant digit
19:18:49 SQL> SELECT
19:18:49   2  ROUND(prob_up,1)prob_up
19:18:49   3  ,ROUND(prob_nup,1)prob_nup
19:18:49   4  ,prediction
19:18:49   5  ,COUNT(prediction)
19:18:49   6  FROM
19:18:49   7  (
19:18:49   8  	SELECT
19:18:49   9  	REGEXP_SUBSTR(tkr_date,'...')tkr
19:18:49  10  	,strategy
19:18:49  11  	,holding_period
19:18:49  12  	,pivot_point
19:18:49  13  	,prediction
19:18:49  14  	,prob_up
19:18:49  15  	,prob_nup
19:18:49  16  	,run_date
19:18:49  17  	FROM mas_predictions_run3
19:18:49  18  )
19:18:49  19  GROUP BY prediction,ROUND(prob_up,1),ROUND(prob_nup,1)
19:18:49  20  ORDER BY prediction,ROUND(prob_up,1),ROUND(prob_nup,1)
19:18:49  21  /

   PROB_UP   PROB_NUP PREDICTION  COUNT(PREDICTION)
---------- ---------- ----------- -----------------
	 0	    1 nup		       3216
	.1	   .9 nup		      61831
	.2	   .8 nup		      40043
	.3	   .7 nup		       5534
	.4	   .6 nup		       2016
	.5	   .5 nup			640
	.5	   .5 up			505
	.6	   .4 up			739
	.7	   .3 up			427
	.8	   .2 up			251
	.9	   .1 up			 77
	 1	    0 up			  1

12 rows selected.

Elapsed: 00:00:00.07
19:18:49 SQL> 
19:18:49 SQL> 
19:18:49 SQL> 

Next, we show how well the run3 predictions did.

19:18:49 SQL> 
19:26:53 SQL> 
19:26:53 SQL> @cr_mp3_j_yfact.sql 
19:27:04 SQL> --
19:27:04 SQL> -- cr_mp3_j_yfact.sql
19:27:04 SQL> --
19:27:04 SQL> 
19:27:04 SQL> -- Creates a view, mp3_j_yfact, which joins mas_predictions_run3 with
19:27:04 SQL> -- yfact.  This view helps us analyze the accuracy of SVM predictions
19:27:04 SQL> -- within the mas_predictions_run3 table.
19:27:04 SQL> 
19:27:04 SQL> CREATE OR REPLACE VIEW mp3_j_yfact AS SELECT
19:27:04   2  m.prediction
19:27:04   3  ,m.prob_up
19:27:04   4  ,y.pg		-- pct_gain
19:27:04   5  ,y.day_id 	 -- fk to day_dimension
19:27:04   6  ,y.tkr		 -- DIA, EFA, MDY, ...
19:27:04   7  ,y.strategy	 -- bh, loc, los
19:27:04   8  ,y.pivot_point	 -- 5,10,20
19:27:04   9  ,y.holding_period  -- 5,10,20
19:27:04  10  FROM yfact y, mas_predictions_run3 m
19:27:04  11  WHERE REPLACE(m.tkr_date,'-') = y.tkr||y.day_id
19:27:04  12  AND y.strategy = m.strategy
19:27:04  13  AND y.pivot_point = m.pivot_point
19:27:04  14  AND y.holding_period = m.holding_period
19:27:04  15  /

View created.

Elapsed: 00:00:00.08
19:27:04 SQL> 
19:27:04 SQL> -- Now query
19:27:04 SQL> SELECT COUNT(*)FROM mas_predictions_run3;

  COUNT(*)
----------
    115280

Elapsed: 00:00:00.02
19:27:04 SQL> SELECT COUNT(*)FROM yfact;

  COUNT(*)
----------
    544644

Elapsed: 00:00:00.45
19:27:04 SQL> SELECT COUNT(*)FROM mp3_j_yfact;

  COUNT(*)
----------
    115280

Elapsed: 00:00:02.79
19:27:07 SQL> 
19:27:07 SQL> -- Look at data rolled up to highest level
19:27:07 SQL> SELECT holding_period,AVG(pg)avg_pct_gain,COUNT(*)FROM mp3_j_yfact
19:27:07   2  GROUP BY holding_period ORDER BY holding_period
19:27:07   3  /

HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
-------------- ------------ ----------
	    20	 .133084439	115280

Elapsed: 00:00:03.22
19:27:10 SQL> 
19:27:10 SQL> -- Drill down into prediction
19:27:10 SQL> SELECT	prediction, holding_period,AVG(pg)avg_pct_gain,COUNT(*)FROM mp3_j_yfact
19:27:10   2  GROUP BY prediction,holding_period ORDER BY prediction,holding_period
19:27:10   3  /

PREDICTION  HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- -------------- ------------ ----------
nup			20   .153713598     113280
up			20   -1.0353511       2000

Elapsed: 00:00:03.11
19:27:13 SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
oracle@zareason:~/vlsql/md/etfmas$ 
oracle@zareason:~/vlsql/md/etfmas$ 

The results of the last query above tell us all that we need to know about run3.

Notice we get a negative value for AVG_PCT_GAIN when PREDICTION is "up".

The idea of "chaining" predictions from run1 and run2 into run3 does not work here.

To drive the point home we display more results below:


20:40:27 SQL> @qry_mp3_j_yfact.sql
20:40:37 SQL> --
20:40:37 SQL> -- qry_mp3_j_yfact.sql
20:40:37 SQL> --
20:40:37 SQL> 
20:40:37 SQL> -- Drills down into yfact dimensions of view: mp3_j_yfact
20:40:37 SQL> 
20:40:37 SQL> SELECT
20:40:37   2  tkr,prediction,strategy,pivot_point,holding_period
20:40:37   3  ,ROUND(AVG(pg),3)avg_pct_gain
20:40:37   4  ,COUNT(*)FROM mp3_j_yfact
20:40:37   5  GROUP BY tkr,prediction,strategy,pivot_point,holding_period
20:40:37   6  ORDER BY tkr,prediction,strategy,pivot_point,holding_period
20:40:37   7  /

TKR	    PREDICTION	STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
DIA	    nup 	loc		      5 	    20	       .006	  1616
DIA	    nup 	loc		     10 	    20	       .103	  1611
DIA	    nup 	loc		     20 	    20	       .132	  1488
DIA	    nup 	los		      5 	    20	      -.131	  1622
DIA	    nup 	los		     10 	    20	       .084	  1598
DIA	    nup 	los		     20 	    20	       .097	  1592
DIA	    up		loc		      5 	    20	     -7.506	     1
DIA	    up		loc		     20 	    20	       .808	     8
DIA	    up		los		      5 	    20	       .545	    16
DIA	    up		los		     10 	    20	       .953	    41
DIA	    up		los		     20 	    20	       .886	    42
EFA	    nup 	loc		      5 	    20	       .331	  1597
EFA	    nup 	loc		     10 	    20	       .493	  1616
EFA	    nup 	loc		     20 	    20	       .718	  1494
EFA	    nup 	los		      5 	    20	       .194	  1625
EFA	    nup 	los		     10 	    20	       .588	  1585
EFA	    nup 	los		     20 	    20	      1.011	  1583
EFA	    up		loc		      5 	    20	      -.672	    13
EFA	    up		loc		     10 	    20	       -.91	     2
EFA	    up		loc		     20 	    20	       .896	    10
EFA	    up		los		      5 	    20	       .501	    28
EFA	    up		los		     10 	    20	     -1.027	    68
EFA	    up		los		     20 	    20	      -.824	    70
MDY	    nup 	loc		      5 	    20	       .177	  1596
MDY	    nup 	loc		     10 	    20		 .3	  1610
MDY	    nup 	loc		     20 	    20		.31	  1488
MDY	    nup 	los		      5 	    20	      -.012	  1623
MDY	    nup 	los		     10 	    20	       .255	  1586
MDY	    nup 	los		     20 	    20	       .358	  1593
MDY	    up		loc		      5 	    20	     -6.925	     2
MDY	    up		loc		     10 	    20		.36	     3
MDY	    up		loc		     20 	    20	      2.158	     8
MDY	    up		los		      5 	    20	     -1.881	    19
MDY	    up		los		     10 	    20	      -.601	    53
MDY	    up		los		     20 	    20	      -.135	    39
OIH	    nup 	loc		      5 	    20	       .509	  1463
OIH	    nup 	loc		     10 	    20	       .549	  1578
OIH	    nup 	loc		     20 	    20	       .826	  1457
OIH	    nup 	los		      5 	    20	       .047	  1587
OIH	    nup 	los		     10 	    20	       .241	  1519
OIH	    nup 	los		     20 	    20		.56	  1527
OIH	    up		loc		      5 	    20	       .223	    20
OIH	    up		loc		     10 	    20	       .418	    34
OIH	    up		loc		     20 	    20	      1.918	    39
OIH	    up		los		      5 	    20	     -2.423	    52
OIH	    up		los		     10 	    20	     -1.336	   114
OIH	    up		los		     20 	    20	      -.293	   105
SHY	    nup 	loc		      5 	    20	       .183	  1617
SHY	    nup 	loc		     10 	    20	       .195	  1613
SHY	    nup 	loc		     20 	    20	       .219	  1497
SHY	    nup 	los		      5 	    20		 .1	  1644
SHY	    nup 	los		     10 	    20	       .124	  1640
SHY	    nup 	los		     20 	    20	       .169	  1645
SHY	    up		los		      5 	    20	      -.158	     4
SHY	    up		los		     10 	    20	       .032	    13
SHY	    up		los		     20 	    20	       .095	    15
SPY	    nup 	loc		      5 	    20		  0	  1617
SPY	    nup 	loc		     10 	    20	       .104	  1612
SPY	    nup 	loc		     20 	    20	       .101	  1489
SPY	    nup 	los		      5 	    20	      -.083	  1622
SPY	    nup 	los		     10 	    20	       .128	  1605
SPY	    nup 	los		     20 	    20	       .148	  1573
SPY	    up		loc		     20 	    20	      1.231	     7
SPY	    up		los		      5 	    20	     -2.386	    15
SPY	    up		los		     10 	    20	      -.076	    34
SPY	    up		los		     20 	    20	       .375	    59
XLB	    nup 	loc		      5 	    20	       .028	  1598
XLB	    nup 	loc		     10 	    20	       .114	  1604
XLB	    nup 	loc		     20 	    20	       .266	  1483
XLB	    nup 	los		      5 	    20	      -.334	  1605
XLB	    nup 	los		     10 	    20	      -.077	  1570
XLB	    nup 	los		     20 	    20	       .224	  1581
XLB	    up		loc		      5 	    20	      1.148	     3
XLB	    up		loc		     10 	    20	      7.558	     1
XLB	    up		loc		     20 	    20		1.6	    15
XLB	    up		los		      5 	    20	     -3.297	    40
XLB	    up		los		     10 	    20	     -3.584	    73
XLB	    up		los		     20 	    20	     -1.717	    65
XLE	    nup 	loc		      5 	    20	       .525	  1463
XLE	    nup 	loc		     10 	    20	       .554	  1594
XLE	    nup 	loc		     20 	    20	       .387	  1453
XLE	    nup 	los		      5 	    20	      -.015	  1591
XLE	    nup 	los		     10 	    20	      -.117	  1573
XLE	    nup 	los		     20 	    20	      -.264	  1432
XLE	    up		loc		      5 	    20	       .483	    11
XLE	    up		loc		     10 	    20	     -1.312	    19
XLE	    up		loc		     20 	    20	      1.702	    31
XLE	    up		los		      5 	    20	     -1.631	    53
XLE	    up		los		     10 	    20	      -.785	    76
XLE	    up		los		     20 	    20	     -2.345	    55
XLF	    nup 	loc		      5 	    20	      -.529	  1617
XLF	    nup 	loc		     10 	    20	      -.347	  1611
XLF	    nup 	loc		     20 	    20	      -.277	  1490
XLF	    nup 	los		      5 	    20	      -.315	  1601
XLF	    nup 	los		     10 	    20	       .076	  1566
XLF	    nup 	los		     20 	    20	       .405	  1578

TKR	    PREDICTION	STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
XLF	    up		loc		     10 	    20	       .867	     1
XLF	    up		loc		     20 	    20	      2.489	     6
XLF	    up		los		      5 	    20	     -2.931	    39
XLF	    up		los		     10 	    20	     -1.537	    78
XLF	    up		los		     20 	    20	     -1.205	    59
XLI	    nup 	loc		      5 	    20	       .099	  1611
XLI	    nup 	loc		     10 	    20		.23	  1597
XLI	    nup 	loc		     20 	    20	       .206	  1487
XLI	    nup 	los		      5 	    20	      -.055	  1615
XLI	    nup 	los		     10 	    20	       .305	  1572
XLI	    nup 	los		     20 	    20	       .286	  1572
XLI	    up		loc		     20 	    20	       .386	     9
XLI	    up		los		      5 	    20	      8.292	    22
XLI	    up		los		     10 	    20	       .507	    66
XLI	    up		los		     20 	    20	       1.25	    60
XLK	    nup 	loc		      5 	    20	       .058	  1617
XLK	    nup 	loc		     10 	    20	       .121	  1610
XLK	    nup 	loc		     20 	    20	       .101	  1491
XLK	    nup 	los		      5 	    20	       .009	  1606
XLK	    nup 	los		     10 	    20	       .353	  1562
XLK	    nup 	los		     20 	    20		.06	  1592
XLK	    up		loc		     20 	    20	      1.018	     5
XLK	    up		los		      5 	    20	     -5.807	    35
XLK	    up		los		     10 	    20	     -5.534	    76
XLK	    up		los		     20 	    20	     -4.506	    40
XLV	    nup 	loc		      5 	    20	       .026	  1617
XLV	    nup 	loc		     10 	    20	       .044	  1612
XLV	    nup 	loc		     20 	    20	      -.072	  1496
XLV	    nup 	los		      5 	    20	       .067	  1611
XLV	    nup 	los		     10 	    20	       .136	  1583
XLV	    nup 	los		     20 	    20	      -.159	  1591
XLV	    up		los		      5 	    20	     -2.745	    26
XLV	    up		los		     10 	    20	     -1.936	    58
XLV	    up		los		     20 	    20	     -1.348	    44

130 rows selected.

Elapsed: 00:00:03.20
20:40:40 SQL> 
20:40:40 SQL> 
20:40:40 SQL> SELECT
20:40:40   2  tkr,prediction,strategy,pivot_point,holding_period
20:40:40   3  ,ROUND(AVG(pg),3)avg_pct_gain
20:40:40   4  ,COUNT(*)FROM mp3_j_yfact
20:40:40   5  WHERE tkr IN('OIH','XLE')AND holding_period=20
20:40:40   6  GROUP BY tkr,prediction,strategy,pivot_point,holding_period
20:40:40   7  ORDER BY tkr,prediction,strategy,pivot_point,holding_period
20:40:40   8  /

TKR	    PREDICTION	STRATEGY    PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- ----------- -------------- ------------ ----------
OIH	    nup 	loc		      5 	    20	       .509	  1463
OIH	    nup 	loc		     10 	    20	       .549	  1578
OIH	    nup 	loc		     20 	    20	       .826	  1457
OIH	    nup 	los		      5 	    20	       .047	  1587
OIH	    nup 	los		     10 	    20	       .241	  1519
OIH	    nup 	los		     20 	    20		.56	  1527
OIH	    up		loc		      5 	    20	       .223	    20
OIH	    up		loc		     10 	    20	       .418	    34
OIH	    up		loc		     20 	    20	      1.918	    39
OIH	    up		los		      5 	    20	     -2.423	    52
OIH	    up		los		     10 	    20	     -1.336	   114
OIH	    up		los		     20 	    20	      -.293	   105
XLE	    nup 	loc		      5 	    20	       .525	  1463
XLE	    nup 	loc		     10 	    20	       .554	  1594
XLE	    nup 	loc		     20 	    20	       .387	  1453
XLE	    nup 	los		      5 	    20	      -.015	  1591
XLE	    nup 	los		     10 	    20	      -.117	  1573
XLE	    nup 	los		     20 	    20	      -.264	  1432
XLE	    up		loc		      5 	    20	       .483	    11
XLE	    up		loc		     10 	    20	     -1.312	    19
XLE	    up		loc		     20 	    20	      1.702	    31
XLE	    up		los		      5 	    20	     -1.631	    53
XLE	    up		los		     10 	    20	      -.785	    76
XLE	    up		los		     20 	    20	     -2.345	    55

24 rows selected.

Elapsed: 00:00:00.34
20:40:41 SQL> 
20:40:41 SQL> -- Rollup
20:40:41 SQL> SELECT
20:40:41   2  prediction,strategy,pivot_point,holding_period
20:40:41   3  ,ROUND(AVG(pg),3)avg_pct_gain
20:40:41   4  ,COUNT(*)FROM mp3_j_yfact
20:40:41   5  WHERE tkr IN('OIH','XLE')AND holding_period=20
20:40:41   6  GROUP BY prediction,strategy,pivot_point,holding_period
20:40:41   7  ORDER BY prediction,strategy,pivot_point,holding_period
20:40:41   8  /

PREDICTION  STRATEGY	PIVOT_POINT HOLDING_PERIOD AVG_PCT_GAIN   COUNT(*)
----------- ----------- ----------- -------------- ------------ ----------
nup	    loc 		  5		20	   .517       2926
nup	    loc 		 10		20	   .552       3172
nup	    loc 		 20		20	   .607       2910
nup	    los 		  5		20	   .016       3178
nup	    los 		 10		20	   .059       3092
nup	    los 		 20		20	   .161       2959
up	    loc 		  5		20	   .315 	31
up	    loc 		 10		20	  -.202 	53
up	    loc 		 20		20	  1.822 	70
up	    los 		  5		20	 -2.023        105
up	    los 		 10		20	 -1.116        190
up	    los 		 20		20	  -.998        160

12 rows selected.

Elapsed: 00:00:00.14
20:40:41 SQL> 
20:40:41 SQL> 
20:40:41 SQL> 

Notice that we usually get negative values for AVG_PCT_GAIN when PREDICTION is "up".

So, the results of the last query above tell us all that we need to know about run3.

The idea of feeding predictions from run1 and run2 into run3 does not work.

Conclusion

The Analytic Workspace Manager provided by Oracle is a useful tool for
visualizing the effectiveness of some simple trading strategies for ETFs.

The SVM algorithm as implemented within the the Oracle RDBMS is a
powerful tool for classification of data.

But the algorithm needs a proper set of predictor attributes in order
to classify future price movements of ETFs.

A set of predictor attributes may be easily obtained from the slopes
of moving averages of ETFs.

When these attributes are fed to SVM, it does have the ability to flag
positive price movements over a 20 day span for the ETFs, OIH and XLE.

For other ETFs such as SPY and DIA, the correlation between the slopes
of moving averages and future price movements is very weak.

For these other ETFs, SVM lacks the ability to accurately predict
their future price movements from the slopes of moving averages.

For the database programmer, this paper could be used as a source of
Programming Techniques/Syntax:

Demo of wget Command Line
Demo of 11g Advanced Compression Feature
http://www.google.com/search?q=11g+Advanced+Compression+site:oracle.com
Shell Script Demo of Inline SQL
Demo of SQL*Loader Command Line
Simple Demo of AWK to Prepare Flat Files for SQL*Loader
Moving Avg Slope via Inline View and Analytic Functions
Alter Size of a Table Column
Subquery to Feed 2 Values to BETWEEN Predicate
Simple FOR LOOP in SQL
Demo of CASE WHEN ELSE END in SQL
Demo of Starting AWM From jar File
Demo of Using Ruby to Generate Several Shell Scripts
mas_bh.rb, Ruby Script to Generate SQL Scripts for "bh" Strategy
Demo of the OCI8 Ruby-Oracle Interface
bh SQL Syntax Generated From Ruby Script
mas.rb, Ruby Script to Generate Many Large SQL Scripts
los and loc SQL Syntax Generated From Ruby Script
Passing Command Line Values to SQL*Plus Variables
Various PL/SQL Examples
Demo of SQL PREDICTION() Function
Demo of SQL PREDICTION_PROBABILITY() Function
SQL PREDICTION() Function Documentation
REGEXP_SUBSTR SQL Function to Match First 3 Characters
Ruby Script to Generate Run3 Syntax