Simple linear regression
In this article, we discuss simple linear regression. We go over the concept and then show an example using Apple's stock price history. We use the linear regression function in datalib.js to calculate the parameters and d3.js to plot the final visualization.
Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables [1].
- One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
- The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
One example is the height and weight of individuals: you would expect weight to increase with height. The following image shows a simple linear regression line for weight against height.
Simple linear regression analysis produces an equation of the form y = slope * x + intercept. In the example above, the equation is y = 0.63x - 43.82. Given a new value of x, we can use the equation to predict the corresponding value of y.
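As a quick worked example, the equation can be written as a small JavaScript function. The slope and intercept are the values quoted for the height and weight example; the input 170 is an arbitrary illustrative value, and the units depend on the original dataset, which the example does not state. The names `predict` and `heightWeightModel` are hypothetical.

```javascript
// Predict y from x with a fitted simple linear regression model.
// slope = 0.63 and intercept = -43.82 are the values quoted for the
// height/weight example; the input value below is purely illustrative.
function predict(model, x) {
  return model.slope * x + model.intercept;
}

var heightWeightModel = { slope: 0.63, intercept: -43.82 };

// 0.63 * 170 - 43.82 = 107.1 - 43.82 = 63.28 (up to floating-point rounding)
console.log(predict(heightWeightModel, 170).toFixed(2)); // "63.28"
```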
Next, we use simple linear regression to examine how Apple's stock price changed over time from October 2008 to May 2012. The link below points to the live example on dtab.io [2], which includes the data, analysis code, and visualization.
Linear regression with datalib.js and d3.js
The Apple stock price dataset we use consists of two columns: date and close. Each data point shows the closing stock price on a particular date. There are 916 data points.
We want to model the stock price as a function of date, so x is the date and y is the closing price. Simple linear regression requires two numerical variables, so we first convert each date to a number. Here we use its timestamp: the number of milliseconds since 1 January 1970 UTC, which is the value a JavaScript Date produces when converted to a number (a Unix timestamp in seconds would be this value divided by 1000). The following formula converts the date in cell A2:
=+(new Date(A2))
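The unary `+` in the formula coerces the `Date` to its numeric value. A plain JavaScript sketch of the same conversion follows. The helper name `toTimestamp` is hypothetical, and the ISO 8601 input is an assumption chosen because parsing of other date string formats by `new Date()` can vary between JavaScript engines.

```javascript
// Hypothetical helper: convert a date string to milliseconds since the epoch,
// mirroring the spreadsheet formula =+(new Date(A2)).
function toTimestamp(dateString) {
  var ms = +(new Date(dateString)); // unary + converts the Date via valueOf()
  if (Number.isNaN(ms)) {
    throw new Error("Unparseable date: " + dateString);
  }
  return ms;
}

// An ISO 8601 string with an explicit "Z" is parsed as UTC.
var ms = toTimestamp("2008-10-01T00:00:00Z");
console.log(ms);        // 1222819200000 (milliseconds)
console.log(ms / 1000); // 1222819200 (Unix time in seconds)
```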
And here's the result:
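To calculate the points on the regression line, we write a small utility function around datalib's dl.linearRegression. It takes an array of [x, y] pairs and returns the fitted [x, y] pair for each x value: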
var dlLinearRegression = function(data) {
  // perform linear regression analysis on an array of [x, y] pairs;
  // variable lin contains the model (lin.slope and lin.intercept)
  var lin = dl.linearRegression(data,
    function(d) { return d[0]; },  // x accessor: timestamp
    function(d) { return d[1]; }   // y accessor: closing price
  );
  var result = [];
  // use the equation y = slope * x + intercept
  // to calculate the fitted data point for each x value
  data.forEach(function(d) {
    result.push([d[0], lin.slope * d[0] + lin.intercept]);
  });
  return result;
};
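datalib documents `dl.linearRegression` as an ordinary least squares fit. To make the arithmetic visible, here is a dependency-free sketch that stands in for `dlLinearRegression`, assuming the same input shape (an array of [x, y] pairs). It uses the closed-form estimates slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope * x̄. The name `olsLinearRegression` is hypothetical; this is not datalib's own code.

```javascript
// Hypothetical dependency-free stand-in for dlLinearRegression.
// Input: array of [x, y] pairs. Output: array of [x, fittedY] pairs
// lying on the ordinary least squares regression line.
function olsLinearRegression(data) {
  var n = data.length;
  if (n < 2) throw new Error("Need at least two points");

  // Means of x and y.
  var meanX = 0, meanY = 0;
  data.forEach(function (d) { meanX += d[0]; meanY += d[1]; });
  meanX /= n;
  meanY /= n;

  // Sum of cross-deviations and sum of squared x-deviations.
  var sxy = 0, sxx = 0;
  data.forEach(function (d) {
    var dx = d[0] - meanX;
    sxy += dx * (d[1] - meanY);
    sxx += dx * dx;
  });
  if (sxx === 0) throw new Error("All x values are identical");

  var slope = sxy / sxx;
  var intercept = meanY - slope * meanX;

  // Same final step as dlLinearRegression: y = slope * x + intercept.
  return data.map(function (d) {
    return [d[0], slope * d[0] + intercept];
  });
}

// These points already lie on y = 2x + 1, so the fitted values equal the inputs.
console.log(olsLinearRegression([[0, 1], [1, 3], [2, 5], [3, 7]]));
```

Working with deviations from the mean, rather than raw sums of x² and xy, is a deliberate choice here: x values on the scale of millisecond timestamps (around 10¹²) would otherwise produce very large intermediate sums and lose floating-point precision.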
Applying the function to the timestamp and price columns gives the fitted data points:
=dlLinearRegression(C2:D916)
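Next, we plot the stock price data points together with the regression line. We create the plot with d3.js (the code uses the version 3 API, such as d3.time.scale and d3.svg.axis) and the following CSS and JavaScript. The script reads the price data from C2:D916 and the fitted regression points from F2:G916: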
.regression-chart {
  font-size: 12px;
}
.axis path,
.axis line {
  fill: none;
  stroke: #000;
  shape-rendering: crispEdges;
}
.x.axis path {
  display: none;
}
.line {
  fill: none;
  stroke: steelblue;
  stroke-width: 1.5px;
}
.reg {
  fill: none;
  stroke: orange;
  stroke-width: 1.5px;
}
var width = 500, height = 280;
var margin = {top: 20, right: 30, bottom: 30, left: 50};
var el = new Dtab.GridView({
  id: 'dl-regression-chart',
  className: 'regression-chart',
  pos: {x: 20, y: 200},
  css: {width: width + margin.left + margin.right, height: height + margin.top + margin.bottom},
  draggable: true,
  clear: true
});
var x = d3.time.scale()
    .range([0, width]);
var y = d3.scale.linear()
    .range([height, 0]);
var xAxis = d3.svg.axis()
    .scale(x)
    .orient("bottom")
    .tickFormat(d3.time.format("%m/%y"));
var yAxis = d3.svg.axis()
    .scale(y)
    .orient("left");
var line = d3.svg.line()
    .x(function(d) { return x(d[0]); })
    .y(function(d) { return y(d[1]); });
var svg = d3.select("#dl-regression-chart").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");
var linedata = dtab.getRange('C2:D916')[0];
var rdata = dtab.getRange('F2:G916')[0];
x.domain(d3.extent(linedata, function(d) { return d[0]; }));
y.domain(d3.extent(linedata, function(d) { return d[1]; }));
svg.append("g")
    .attr("class", "x axis")
    .attr("transform", "translate(0," + height + ")")
    .call(xAxis);
svg.append("g")
    .attr("class", "y axis")
    .call(yAxis)
  .append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 6)
    .attr("dy", ".71em")
    .style("text-anchor", "end")
    .text("Price ($)");
svg.append("path")
    .datum(linedata)
    .attr("class", "line")
    .attr("d", line);
svg.append("path")
    .datum(rdata)
    .attr("class", "reg")
    .attr("d", line);
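Simple linear regression is easy to get started with, but stock prices are more complex to model. The chart shows the limitation of this analysis: a single straight line summarizes the overall trend but cannot follow the rises and falls in the price.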
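One way to put a number on how well a straight line fits is the coefficient of determination, R², which compares the residual sum of squares with the total variation in y. The sketch below assumes two arrays of [x, y] pairs in the same order, shaped like the observed and fitted ranges used above. The name `rSquared` is hypothetical, the data are toy values, and no R² value for the Apple data is claimed here.

```javascript
// Hypothetical helper: coefficient of determination for a fitted line.
// observed: [x, y] pairs; fitted: [x, yHat] pairs in the same order.
function rSquared(observed, fitted) {
  if (observed.length === 0 || observed.length !== fitted.length) {
    throw new Error("Inputs must be non-empty and the same length");
  }
  var meanY = observed.reduce(function (s, d) { return s + d[1]; }, 0) / observed.length;
  var ssRes = 0; // residual sum of squares: sum of (y - yHat)^2
  var ssTot = 0; // total sum of squares: sum of (y - meanY)^2
  observed.forEach(function (d, i) {
    ssRes += Math.pow(d[1] - fitted[i][1], 2);
    ssTot += Math.pow(d[1] - meanY, 2);
  });
  if (ssTot === 0) throw new Error("y has no variation");
  return 1 - ssRes / ssTot;
}

// Toy data with its least squares fit (slope 2, intercept -1/3).
var obs = [[0, 0], [1, 1], [2, 4]];
var fit = [[0, -1 / 3], [1, 5 / 3], [2, 11 / 3]];
console.log(rSquared(obs, fit).toFixed(3)); // "0.923" (= 12/13)
```

A value close to 1 means the line explains most of the variation in y; lower values indicate that much of the variation is left unexplained. In the dtab sheet, the same function could presumably be applied to the two ranges the chart reads, though that has not been run here.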
References
[1] - Stat 501 - Pennsylvania State University - https://onlinecourses.science.psu.edu/stat501/node/251
[2] - Linear regression with datalib.js and d3.js - https://dtab.io/sheets/56d8fbe44438a12f62e25bc5