2018-01-17

Covariance and correlation coefficient

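Covariance is the mean of the products of the x deviations and the y deviations.
A positive correlation means that when one variable increases, the other also increases.
A negative correlation means that when one variable increases, the other decreases.
The closer the value is to 0, the weaker the correlation.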

x=[3, 8, 9, 7, 4, 5, 8, 10, 9, 7]
y=[4, 6, 8, 4, 5, 4, 7,   9, 7, 6]
mean of x = 7
mean of y = 6
deviations of x = [-4, 1, 2, 0, -3, -2, 1, 3, 2, 0]
deviations of y = [-2, 0, 2, -2, -1, -2, 1, 3, 1, 0]

squared deviations of x = [16, 1, 4, 0, 9, 4, 1, 9, 4, 0]
standard deviation of x = 2.19   square root of 48/10 (sum of the squared deviations of x / 10)

squared deviations of y = [4, 0, 4, 4, 1, 4, 1, 9, 1, 0]
standard deviation of y = 1.67   square root of 28/10 (sum of the squared deviations of y / 10)

Covariance
x deviation * y deviation = [8, 0, 4, 0, 3, 4, 1, 9, 2, 0]
sum of (x deviation * y deviation) = 31
31/10 = 3.1

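Correlation coefficient
The covariance standardized by the two standard deviations
covariance / (standard deviation of x * standard deviation of y) = 3.1 / (2.19 * 1.67) = 0.85
The calculation can be simplified (the divisions by 10 cancel out):
31 / (square root of 48 * square root of 28) = 0.85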
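
The hand calculation above can be reproduced with a minimal sketch in plain Python. The helper names `mean`, `covariance`, and `correlation` are made up here for illustration and do not come from any library; every quantity divides by n (population form), as in the notes.

```python
import math

x = [3, 8, 9, 7, 4, 5, 8, 10, 9, 7]
y = [4, 6, 8, 4, 5, 4, 7, 9, 7, 6]

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Population covariance: mean of the products of the deviations."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def correlation(a, b):
    """Covariance divided by the product of the population standard deviations."""
    sd_a = math.sqrt(covariance(a, a))  # variance is the covariance of a with itself
    sd_b = math.sqrt(covariance(b, b))
    return covariance(a, b) / (sd_a * sd_b)

cov_xy = covariance(x, y)
r_xy = correlation(x, y)
print(cov_xy)          # 3.1
print(round(r_xy, 4))  # 0.8456
```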

In Python
>>> import numpy as np
>>> x = np.array([3, 8, 9, 7, 4, 5, 8, 10, 9, 7])
>>> y = np.array([4, 6, 8, 4, 5, 4, 7, 9, 7, 6])

Mean
>>> x.mean()
7.0
>>> y.mean()
6.0

Deviations
>>> x - x.mean()
array([-4.,  1.,  2.,  0., -3., -2.,  1.,  3.,  2.,  0.])
>>> y - y.mean()
array([-2.,  0.,  2., -2., -1., -2.,  1.,  3.,  1.,  0.])

Variance
>>> np.var(x)
4.7999999999999998
>>> np.var(y)
2.7999999999999998

Standard deviation
>>> x.std()
2.1908902300206643
>>> y.std()
1.6733200530681511

Covariance
>>> np.cov(x,y, bias=True)
array([[ 4.8,  3.1],
       [ 3.1,  2.8]])
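
Without `bias=True`, `np.cov` divides by n - 1 (the unbiased sample estimate) instead of n, so its result no longer matches the 31/10 computed by hand. A small sketch of the difference on the same data; the variable names are illustrative only.

```python
import numpy as np

x = np.array([3, 8, 9, 7, 4, 5, 8, 10, 9, 7])
y = np.array([4, 6, 8, 4, 5, 4, 7, 9, 7, 6])

# Sum of the products of the deviations (31 for this data).
s_xy = np.sum((x - x.mean()) * (y - y.mean()))

cov_population = np.cov(x, y, bias=True)[0, 1]  # divides by n
cov_sample = np.cov(x, y)[0, 1]                 # default: divides by n - 1

print(np.isclose(cov_population, s_xy / len(x)))    # True
print(np.isclose(cov_sample, s_xy / (len(x) - 1)))  # True
```

`np.var` and `ndarray.std` default to dividing by n, which is why they agree with the hand calculation without any extra argument.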

Correlation coefficient
>>> np.corrcoef(x,y)
array([[ 1.        ,  0.84559432],
       [ 0.84559432,  1.        ]])

2018-01-10

NumPy notes

# Arrays
arr = np.array([1,2,3,4])  # named arr, because "list" would shadow the built-in
my_list1 = [1,2,3,4]
my_array1 = np.array(my_list1)

# start, stop (exclusive), step
>>> np.arange(0,10,2)
array([0, 2, 4, 6, 8])
# 15 evenly spaced points from 0 to 10, both ends included (14 intervals)
>>> np.linspace(0,10,15)
array([ 0.        ,  0.71428571,  1.42857143,  2.14285714,  2.85714286,
        3.57142857,  4.28571429,  5.        ,  5.71428571,  6.42857143,
        7.14285714,  7.85714286,  8.57142857,  9.28571429, 10.        ])
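
As a quick check of the spacing, a sketch assuming only NumPy: `np.linspace(0, 10, 15)` returns 15 points including both endpoints, so neighbouring points are 10/14 apart, whereas `np.arange` never includes the stop value.

```python
import numpy as np

points = np.linspace(0, 10, 15)
steps = np.diff(points)  # differences between neighbouring points

print(len(points))                  # 15
print(np.allclose(steps, 10 / 14))  # True
print(points[-1])                   # 10.0

# retstep=True additionally returns the spacing that linspace used.
_, step = np.linspace(0, 10, 15, retstep=True)
```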

# Check the shape of the array
my_array1.shape

# Check the data type of the array
my_array1.dtype

# Array with all elements 0
np.zeros(5)
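
A short sketch of what these attributes return, assuming the same `my_array1` as above. The exact integer dtype depends on the platform, so the example only checks that it is an integer type.

```python
import numpy as np

my_array1 = np.array([1, 2, 3, 4])
print(my_array1.shape)                             # (4,)
print(np.issubdtype(my_array1.dtype, np.integer))  # True

zeros = np.zeros(5)
print(zeros.dtype)             # float64
print(np.zeros((2, 3)).shape)  # (2, 3)
```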

# arange function
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(1,10,2)
array([1, 3, 5, 7, 9])

# 10 samples from a normal distribution with mean 10 and standard deviation 5
>>> np.random.normal(10.0,5.0, 10)
array([ 10.93299706,  14.88926584,  10.31387511,   9.31244588,
         5.82275012,   7.61138349,   2.87642971,  18.23249893,
        12.77071065,   3.04294258])

# 10 uniformly distributed random numbers, lower bound -10, upper bound 10 (upper bound excluded)
>>> np.random.uniform(-10.0, 10.0, 10)
array([ 7.63868191, -5.60987394,  8.72772981, -9.06028545,  8.22631925,
       -3.93651342, -7.83732211, -2.10760123,  2.39558808,  9.62909359])

# The standard normal distribution is the normal (Gaussian) distribution with mean 0 and standard deviation 1
>>> np.random.randn(4)
array([ 0.85246024,  0.21607126,  3.11295252,  0.79788004])
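
The parameters of these distributions can be sanity-checked by drawing many samples. The sketch below uses the `Generator` API (`np.random.default_rng`), which newer NumPy releases recommend over the legacy `np.random.*` functions used in these notes; the seed and the sample size are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

normal = rng.normal(10.0, 5.0, 100_000)      # mean 10, standard deviation 5
uniform = rng.uniform(-10.0, 10.0, 100_000)  # values in [-10, 10)
standard = rng.standard_normal(100_000)      # mean 0, standard deviation 1

# The sample statistics should land close to the requested parameters.
print(normal.mean(), normal.std())
print(uniform.min(), uniform.max())
print(standard.mean(), standard.std())
```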

2017-12-28

Installing PostgreSQL 10 on CentOS 7

# yum install https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm
# yum install postgresql10-devel postgresql10-contrib
# /usr/pgsql-10/bin/postgresql-10-setup initdb

postgres=# create role username with createdb login password '***';
postgres=# create database dbname OWNER username;

2017-11-23

Multicore CPU programming in Elixir

I rewrote in Elixir the pmap code from the multicore CPU programming chapter of "Programming Erlang".
https://github.com/iyoo14/map_util

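Results
Computing the Fibonacci number of each value in a list of 100 numbers:
with the inefficient (naive) recursive version, the parallel version was faster;
with the efficient tail-recursive version, the parallel version was slower.

Building 100 lists of 1000 numbers each and sorting every list:
the parallel version was slower.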
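
For readers unfamiliar with pmap, here is a rough Python analogue of its shape. It is not the Elixir code from the repository: `pmap`, `fib_naive`, and `fib_iter` are names invented for this sketch, and a thread pool is used only for simplicity. In standard CPython, threads do not speed up CPU-bound work, so this shows the structure (dispatch every item, collect the results in order) rather than the timings. One plausible reading of the results above is that the per-item dispatch cost outweighs the gain when each task is cheap, as with the efficient Fibonacci version.

```python
from concurrent.futures import ThreadPoolExecutor

def fib_naive(n):
    """Inefficient doubly recursive Fibonacci: exponential work per call."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

def fib_iter(n):
    """Linear-time Fibonacci, the loop equivalent of a tail-recursive version."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def pmap(func, items, workers=4):
    """Apply func to every item concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))

numbers = list(range(15))
print(pmap(fib_naive, numbers) == [fib_iter(n) for n in numbers])  # True
```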

2017-11-12

Installing Elixir

# cd /usr/local/src/
# wget http://erlang.org/download/otp_src_20.0.tar.gz
# tar xfzv otp_src_20.0.tar.gz
# cd otp_src_20.0
# ./configure
# make
# make install
# cd /usr/local/lib
# git clone https://github.com/elixir-lang/elixir.git
# cd elixir
# make clean test

2016-04-22

Printing specified lines with sed

# Print lines 6 through 15
sed -n -e '6,15p'
# Print from a line matching the first regular expression through a line matching the second
sed -n -e '/xxx/,/yyy/p'
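
The two address forms can be mimicked in Python to make their semantics explicit. This is only a sketch: `line_range` and `regex_range` are hypothetical helper names, and the regex version follows the usual sed behaviour in which the end pattern is first tested on the line after the one that opened the range, and a new range can open again later.

```python
import re

def line_range(lines, start, end):
    """Lines start..end, 1-indexed and inclusive, like sed -n 'start,endp'."""
    return lines[start - 1:end]

def regex_range(lines, start_pat, end_pat):
    """Lines from a start_pat match through the next end_pat match."""
    out = []
    inside = False
    for line in lines:
        if not inside:
            if re.search(start_pat, line):
                inside = True
                out.append(line)
        else:
            out.append(line)
            if re.search(end_pat, line):
                inside = False
    return out

sample = ["a", "xxx", "b", "yyy", "c"]
print(regex_range(sample, "xxx", "yyy"))  # ['xxx', 'b', 'yyy']
```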

2016-03-15

Vim binary editing

$ vim -b

:%!xxd

Convert back to binary
:%!xxd -r
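
The idea behind the round trip (binary to editable hex text and back) can be sketched in Python with the built-in `bytes.hex` and `bytes.fromhex`. This does not reproduce the output layout of xxd; the sample bytes are arbitrary.

```python
data = bytes([0x7F, 0x45, 0x4C, 0x46, 0x00, 0xFF])  # arbitrary sample bytes

hex_text = data.hex()               # binary -> editable hex text
restored = bytes.fromhex(hex_text)  # hex text -> binary

print(hex_text)          # 7f454c4600ff
print(restored == data)  # True
```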