
[Time Series Analysis] Time Series Feature Extraction Practice (Python) (1) - Time Series Decomposition (bike-sharing-demand dataset)

YSY^ 2021. 3. 2. 20:38

ysyblog.tistory.com/179?category=1186605

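[Time Series Analysis] Time Series Features (frequency / trend / seasonality / cycle / time series decomposition / dummy variables / lagged values)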


This post continues from the post linked above.

Import Package and Dataset

# Ignore the warnings
import warnings
# warnings.filterwarnings('always') # always show warnings
warnings.filterwarnings('ignore')

# System related and data input controls
import os # system-related handling

# Data manipulation and visualization
import pandas as pd
pd.options.display.float_format = '{:,.2f}'.format
pd.options.display.max_rows = 10  # number of rows displayed
pd.options.display.max_columns = 20 # number of columns displayed
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling algorithms
# General
import statsmodels.api as sm
from scipy import stats

# Model selection
from sklearn.model_selection import train_test_split

# Evaluation metrics
# for regression
from sklearn.metrics import mean_squared_log_error, mean_squared_error,  r2_score, mean_absolute_error

Dataset

location = "https://raw.githubusercontent.com/cheonbi/TimeSeriesAnalysis/master/Data/BikeSharingDemand/Bike_Sharing_Demand_Full.csv"
raw_all = pd.read_csv(location) # the train data and test data combined
raw_all

Feature Engineering: Extracting Time Series Patterns from the Data

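# string to datetime; keep the parsed values in a separate 'DateTime' column as well
raw_all['datetime'] = pd.to_datetime(raw_all['datetime'])
raw_all['DateTime'] = raw_all['datetime']

# set the 'DateTime' column as the index
raw_all.set_index('DateTime', inplace=True)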


Converting the Data to Hourly Frequency

raw_all.asfreq('H').index # the index now has an hourly frequency
# raw_all.asfreq('D') # would convert to daily data
# raw_all.asfreq('W')
raw_all.asfreq('H').isnull().sum() # but 165 hourly rows turn out to be missing


raw_all.asfreq('H')[raw_all.asfreq('H').isnull().sum(axis=1) > 0] # the timestamps listed here are the missing ones


# setting frequency of time series data
raw_all = raw_all.asfreq('H', method='ffill') # forward fill: a missing row takes the preceding row's values, e.g. an empty 03:00 is filled with the 02:00 data
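
To see what asfreq and method='ffill' do in isolation, here is a minimal sketch on a made-up hourly frame with the 02:00 row missing. The toy values and variable names are assumptions for illustration and are not part of the bike-sharing dataset. The sketch uses the lowercase alias 'h', which newer pandas releases prefer over 'H'.

```python
import pandas as pd

# Toy hourly data with the 02:00 observation missing (hypothetical values)
idx = pd.to_datetime(['2012-01-01 00:00', '2012-01-01 01:00',
                      '2012-01-01 03:00', '2012-01-01 04:00'])
toy = pd.DataFrame({'count': [10, 20, 40, 50]}, index=idx)

# Reindexing onto a regular hourly grid exposes the gap as NaN
with_gaps = toy.asfreq('h')
n_missing = int(with_gaps['count'].isnull().sum())

# method='ffill' fills the gap with the last observed row (01:00 -> 02:00)
filled = toy.asfreq('h', method='ffill')
filled_value = filled.loc[pd.Timestamp('2012-01-01 02:00'), 'count']
print(n_missing, filled_value)
```

Forward fill keeps the series on a regular grid, which the decomposition later relies on, at the cost of repeating a value for every missing hour.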

Target Variable Visualization and Time Series Decomposition

Comparing the target variable count with the related variables 'registered' and 'casual' gives the following plot.

raw_all[['count','registered','casual']].plot(kind='line', figsize=(20,6), linewidth=3, fontsize=20,
                                              xlim=('2012-01-01', '2012-06-01'), ylim=(0,1000))
plt.title('Time Series of Target', fontsize=20)
plt.xlabel('Index', fontsize=15)
plt.ylabel('Demand', fontsize=15)
plt.show()


    # line plot of Y
    raw_all[['count']].plot(kind='line', figsize=(20,6), linewidth=3, fontsize=20,
                                                xlim=('2012-01-01', '2012-03-01'), ylim=(0,1000))
    plt.title('Time Series of Target', fontsize=20)
    plt.xlabel('Index', fontsize=15)
    plt.ylabel('Demand', fontsize=15)
    plt.show()

Time Series Decomposition

# split data as trend + seasonal + residual
plt.rcParams['figure.figsize'] = (14, 9)
sm.tsa.seasonal_decompose(raw_all['count'], model='additive').plot() 
plt.show()

sm.tsa.seasonal_decompose splits a series into trend, seasonal, and residual components. model='additive' assumes that the series is the sum of those components: observed = trend + seasonal + residual.


    # the components can also be inspected as numbers
    result = sm.tsa.seasonal_decompose(raw_all['count'], model='additive')
    result.observed


    result.trend


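The trend contains NaN values, but this does not mean the data itself is missing. The trend is estimated from a window of neighboring observations, and at the very start and end of the series there are not enough neighbors to fill that window.
For this data, 12 values at each end are left as NaN, because the centered window (24 hours for hourly data) cannot be filled there.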

# split data as trend * seasonal * residual
sm.tsa.seasonal_decompose(raw_all['count'], model='multiplicative').plot() 
plt.show()


model='multiplicative' assumes that the series is the product of the components: observed = trend * seasonal * residual.
Choose between additive and multiplicative according to the characteristics of the domain.

# fill the NaN values of the decomposed components
result = sm.tsa.seasonal_decompose(raw_all['count'], model='additive')
Y_trend = pd.DataFrame(result.trend)
Y_trend.ffill(inplace=True) # fill the trailing NaNs of the trend with the preceding value
Y_trend.bfill(inplace=True) # fill the leading NaNs of the trend with the following value
Y_trend.columns = ['count_trend']

Y_seasonal = pd.DataFrame(result.seasonal)
Y_seasonal.ffill(inplace=True) # same treatment for the seasonal component
Y_seasonal.bfill(inplace=True)
Y_seasonal.columns = ['count_seasonal']

# merging several columns
raw_all = pd.concat([raw_all, Y_trend, Y_seasonal], axis=1)
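
Both fills are needed because each one covers only one edge. A minimal sketch on a made-up series (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical trend-like series with NaN at both edges
s = pd.Series([np.nan, np.nan, 3.0, 4.0, 5.0, np.nan])

after_ffill = s.ffill()            # trailing NaN takes 5.0; leading NaNs remain
after_both = after_ffill.bfill()   # leading NaNs take 3.0
remaining_after_ffill = int(after_ffill.isnull().sum())
print(remaining_after_ffill, after_both.tolist())
```

Forward fill cannot reach the leading NaNs because nothing precedes them, so the backward fill handles those.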


This post summarizes what I learned from the Fast Campus (패스트캠퍼스) course <파이썬을 활용한 시계열 데이터 분석 A-Z 올인원 패키지>.


