Cannot Understand Sklearn's PolynomialFeatures


Answer:

If you have features [a, b, c], the polynomial features in sklearn (the default degree is 2) are [1, a, b, c, a^2, ab, ac, b^2, bc, c^2].
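A quick way to confirm this ordering is to ask the transformer for its feature names (a minimal sketch; get_feature_names_out assumes scikit-learn >= 1.0, older versions use get_feature_names instead):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[230.1, 37.8, 69.2]])  # one sample with features a, b, c
poly = PolynomialFeatures(degree=2)
poly.fit(X)
print(poly.get_feature_names_out())
# ['1' 'x0' 'x1' 'x2' 'x0^2' 'x0 x1' 'x0 x2' 'x1^2' 'x1 x2' 'x2^2']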

2.61576000e+03 is 37.8 × 69.2 = 2615.76 (2615.76 = 2.61576 × 10^3).

Simply put, with PolynomialFeatures you can create new features. There is a good reference here. Of course, there are also disadvantages of using PolynomialFeatures, such as overfitting (see here).

Edit:
We have to be careful when using polynomial features. The formula for the number of polynomial features is N(n, d) = C(n + d, d), where n is the number of features, d is the degree of the polynomial, and C is the binomial coefficient (combination). In our case the number is C(3 + 2, 2) = 5! / (3! 2!) = 10, but when the number of features or the degree is high, the number of polynomial features grows very quickly. For example:

N(100, 2) = 5151
N(100, 5) = 96560646
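You can verify these counts yourself (a minimal sketch; math.comb assumes Python >= 3.8):

from math import comb

def n_poly_features(n, d):
    # N(n, d) = C(n + d, d): number of polynomial features of degree <= d
    # (including the bias term) for n input features
    return comb(n + d, d)

print(n_poly_features(3, 2))    # 10
print(n_poly_features(100, 2))  # 5151
print(n_poly_features(100, 5))  # 96560646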

So in such cases you may need to apply regularization to penalize some of the weights. It is quite possible that the algorithm will start to suffer from the curse of dimensionality (here is also a very nice discussion).
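One common way to do that is to combine PolynomialFeatures with an L2-penalized model such as Ridge in a pipeline. This is only a hedged sketch with made-up data, not a recipe from the answer above:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(50, 3)                          # 50 samples, 3 features
y = X[:, 0] * X[:, 1] + 0.1 * rng.randn(50)  # noisy interaction target

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # Ridge fits its own intercept
    StandardScaler(),
    Ridge(alpha=1.0),  # larger alpha penalizes the polynomial weights more strongly
)
model.fit(X, y)
print(model.score(X, y))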


PolynomialFeatures generates a new matrix with all polynomial combinations of the features up to the given degree.

For example, [a] will be converted into [1, a, a^2] for degree 2.

You can visualize the input being transformed into the matrix generated by PolynomialFeatures.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

a = np.array([1, 2, 3, 4, 5])
a = a[:, np.newaxis]  # reshape to a column vector of shape (5, 1)
poly = PolynomialFeatures(degree=2)
a_poly = poly.fit_transform(a)
print(a_poly)

Output:

[[ 1.  1.  1.]
 [ 1.  2.  4.]
 [ 1.  3.  9.]
 [ 1.  4. 16.]
 [ 1.  5. 25.]]

You can see that the matrix is generated in the form [1, a, a^2].

To observe the polynomial features on a scatter plot, let's use the numbers 1 to 99.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures

# Making the numbers 1-99
a = np.arange(1, 100, 1)
a = a[:, np.newaxis]

# Scaling data to 0 mean and 1 standard deviation, so it can be observed easily
scaler = StandardScaler()
a = scaler.fit_transform(a)

# Applying PolynomialFeatures
poly = PolynomialFeatures(degree=2)
a_poly = poly.fit_transform(a)

# Flattening the polynomial feature matrix (creating a 1D array), so it can be plotted
a_poly = a_poly.flatten()
# Creating an array of a_poly's size with a number series (for plotting)
xarr = np.arange(1, a_poly.size + 1, 1)

# Plotting
plt.scatter(xarr, a_poly)
plt.title("Degree 2 Polynomial")
plt.show()

Output:

[Scatter plot: Degree 2 Polynomial]

Changing to degree=3, we get:

[Scatter plot: Degree 3 Polynomial]


You have 3-dimensional data, and the following code generates all polynomial features of degree 2:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[230.1, 37.8, 69.2]])
poly = PolynomialFeatures()  # degree=2 by default
X_poly = poly.fit_transform(X)
print(X_poly)
# array([[  1.00000000e+00,   2.30100000e+02,   3.78000000e+01,
#          6.92000000e+01,   5.29460100e+04,   8.69778000e+03,
#          1.59229200e+04,   1.42884000e+03,   2.61576000e+03,
#          4.78864000e+03]])

This can also be generated with the following code:

a, b, c = 230.1, 37.8, 69.2  # 3-dimensional data
# all possible degree-2 polynomial features, in sklearn's output order
np.array([[1, a, b, c, a**2, a*b, c*a, b**2, b*c, c**2]])
# array([[  1.00000000e+00,   2.30100000e+02,   3.78000000e+01,
#          6.92000000e+01,   5.29460100e+04,   8.69778000e+03,
#          1.59229200e+04,   1.42884000e+03,   2.61576000e+03,
#          4.78864000e+03]])
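If you only need a subset of these features, PolynomialFeatures also takes the interaction_only and include_bias parameters; a small sketch reusing X from above:

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
print(poly.fit_transform(X))
# columns: a, b, c, a*b, a*c, b*c (no bias column, no squared terms)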
