About this lesson
In this video, we'll learn how to make predictions and analyze results based on the data analysis we've performed so far.
Quick reference
Make Predictions and Analyze Results
Next, it's time to make predictions and analyze the results with a scatter plot and a distribution plot (distplot).
When to use
Do this at the end of every Linear Regression.
Instructions
Create a variable named 'predictions' and set it equal to our model predictions.
predictions = lin.predict(X_test)
Run a scatter plot of the y_test data vs. our predictions.
plt.scatter(y_test, predictions)
Run a Seaborn distplot of the residuals (y_test - predictions) to check whether they are approximately normally distributed.
sns.distplot((y_test - predictions))
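The three steps above can be sketched end to end. This is a minimal illustration, not the lesson's notebook. It assumes scikit-learn, NumPy and Matplotlib are installed. It uses synthetic data in place of the housing data. It also draws a plain Matplotlib histogram of the residuals where the lesson uses sns.distplot, which recent Seaborn releases mark as deprecated. The red y = x reference line is an addition for readability.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 3 features, a linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 5.0 + rng.normal(scale=0.5, size=200)

# Hold out 30% of the rows for testing, as in the earlier lessons.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

lin = LinearRegression()
lin.fit(X_train, y_train)

# Step 1: predict on the test features, not the training features.
predictions = lin.predict(X_test)

# Residuals: the gap between the actual and the predicted values.
residuals = y_test - predictions

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Step 2: actual vs. predicted. A good fit hugs the y = x line.
ax_scatter.scatter(y_test, predictions)
low, high = y_test.min(), y_test.max()
ax_scatter.plot([low, high], [low, high], color="red")
ax_scatter.set_xlabel("y_test")
ax_scatter.set_ylabel("predictions")

# Step 3: distribution of the residuals (histogram instead of distplot).
ax_hist.hist(residuals, bins=20)
ax_hist.set_xlabel("y_test - predictions")
```

In a script, call plt.show() afterwards to display the figure. A roughly bell-shaped histogram centered near zero suggests the residuals are close to normal; a long tail on one side points to outliers the model does not capture.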
Hints & tips
- predictions = lin.predict(X_test)
- plt.scatter(y_test, predictions)
- sns.distplot((y_test - predictions))
- 00:05 Okay, so let's make some predictions and analyze some results.
- 00:09 So let's create a variable called predictions.
- 00:12 And let's set that equal to, now we can call lin.predict.
- 00:17 And we can just pass in what do we want to predict?
- 00:20 Well, we want to predict on X_test,
- 00:23 not our training data but our test data.
- 00:29 And we can run it and just see what this is.
- 00:34 And we get an array with a bunch of price data, right?
- 00:38 So it doesn't mean a whole lot to us, but
- 00:40 we can make a scatterplot to kind of look at this.
- 00:42 So let's go plt.scatter, and we want to pass in y_test,
- 00:50 and then our predictions, so we can just add in predictions.
- 00:56 When we do that, we get this scatterplot, and
- 00:59 you'll notice it sort of seems linear, right?
- 01:02 Not the most linear, though. When we looked at the linear regression example
- 01:06 on Wikipedia, the points were really kind of tightly bunched around the line.
- 01:11 Whereas these, if we pictured a line going through here,
- 01:15 these aren't really tightly bunched around it,
- 01:17 there's a fit line going through here, but there's a bunch of outliers.
- 01:22 So linear regression may or may not be the best model to use here.
- 01:27 And we can sort of see, in order for a linear regression to really work well,
- 01:33 the residuals need to be roughly normally distributed.
- 01:36 So we can sort of use seaborn real quick to just check whether or
- 01:41 not they are normally distributed by calling a distplot, and
- 01:46 we can pass in our y_test - predictions.
- 01:52 And when we do that, okay, it looks sort of normally distributed,
- 01:56 but we've got this tail over here that's definitely not normal.
- 02:00 That may account for this stuff over here that's sort of not in line with everything
- 02:04 else here, so there may be some outliers in our data. Super expensive houses
- 02:10 maybe react differently to market data, who knows?
- 02:14 It's really interesting to look at.
- 02:18 If we run this for our training data, which we really wouldn't want to do,
- 02:25 this looks a little more normally distributed.
- 02:33 But not really when we look at the chart.
- 02:37 So that's our linear regression model.
- 02:40 Like I said, very, very quick overview for this, just to kind of whet
- 02:45 your appetite and show you sort of things that are available for future learning.
- 02:49 Now that you've learned an introduction to data analysis with Python,
- 02:53 machine learning is the next thing you're going to want to focus on.
- 02:56 And this is just a taste of machine learning.
- 02:59 So just a quick recap, we loaded our Boston data,
- 03:02 we slapped it into a data frame, we set our x and
- 03:05 our y as the columns of that data frame, the y is the actual price data.
- 03:11 And we split our data into training and testing groups.
- 03:14 We ran our linear regression, we fit our model to our X_train and y_train.
- 03:21 We found our intercept and our coefficients.
- 03:23 Each coefficient tells us that a one-unit increase in
- 03:28 x corresponds to a change of that amount in price, which may or
- 03:33 may not make sense if you go through here.
- 03:35 But remember, this is very, very old data that may not reflect reality very well,
- 03:40 and sort of keep that in mind.
- 03:42 We can run predictions, we can create predictions, and then analyze our results.
- 03:48 So that's all for this video.
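The recap above can be sketched as a short script. This is a hedged illustration under stated assumptions rather than the lesson's own notebook. Recent scikit-learn releases no longer ship the Boston dataset, so the data here is synthetic. The column names rooms, age and price are hypothetical, and the random_state values are arbitrary. The last part checks the coefficient interpretation from the recap: raising one feature by one unit, with the others held fixed, changes the prediction by exactly that feature's coefficient.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption): hypothetical columns.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "rooms": rng.uniform(3, 9, size=300),
    "age": rng.uniform(0, 100, size=300),
})
noise = rng.normal(scale=1.0, size=300)
df["price"] = 4.0 * df["rooms"] - 0.05 * df["age"] + 10.0 + noise

# X holds the feature columns, y holds the actual price data.
X = df[["rooms", "age"]]
y = df["price"]

# Split into training and testing groups.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# Fit the model to the training data only.
lin = LinearRegression()
lin.fit(X_train, y_train)

# Intercept and per-feature coefficients, labeled by column name.
coefficients = pd.Series(lin.coef_, index=X.columns, name="coefficient")
print("intercept:", lin.intercept_)
print(coefficients)

# Interpretation check: add one unit to "rooms" on a single test row.
base = X_test.iloc[[0]]
bumped = base.copy()
bumped["rooms"] += 1.0
change = lin.predict(bumped)[0] - lin.predict(base)[0]
print("change in prediction for +1 rooms:", change)

# Predictions on the held-out test data, ready for plotting.
predictions = lin.predict(X_test)
```

Because the synthetic prices are generated with a slope of 4.0 for rooms, the fitted rooms coefficient should land near that value, and the printed change should match the coefficient.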