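The best-known plotting libraries for Python are matplotlib and seaborn.
In my view, seaborn is designed mainly around features that complement matplotlib, and its interface feels inconsistent (at least once you are used to something like ggplot), so you end up spending a lot of time poring over the API manual.
It may be no big deal once you get used to it, but if you want to draw plots the same way in both Python and R, I recommend plotnine, a Python port of ggplot.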
In [1]:
from plotnine import *
import matplotlib as mpl
import pandas as pd  # used below for pd.crosstab and pd.cut
Diamonds 4C
Good <- -> Poor
Color : DEFGHIJ
Clarity: FL, IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1, I2, I3
Cut: Ideal, Excellent(Premium), Very good, good, fair, poor
Carat: weight
Sample data on the 4Cs and prices of diamonds. It ships with plotnine and is a standard dataset in R as well.
In [2]:
# Only color has its factor (category) order reversed, so flip it here
from plotnine.data import diamonds as diamonds_
diamonds = diamonds_.assign(color = diamonds_.color.cat.reorder_categories(diamonds_.color.cat.categories[::-1]))
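The reordering above can be checked on a small example. This is a minimal sketch with a hypothetical toy Series standing in for the `color` column (the values and categories are made up for illustration); it shows that `reorder_categories` changes only the category order, not the data.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds `color` column.
color = pd.Series(
    pd.Categorical(["E", "D", "J", "E"], categories=["D", "E", "J"], ordered=True)
)

# Reverse the category order; the values themselves are untouched.
reversed_color = color.cat.reorder_categories(color.cat.categories[::-1])

print(list(color.cat.categories))           # original category order
print(list(reversed_color.cat.categories))  # reversed category order
print(list(reversed_color))                 # same values as before
```

Since plotnine lays out a categorical axis in category order, flipping the categories is enough to flip the axis.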
categorical data
In [3]:
ggplot(diamonds, aes(x='color',y='stat(count)',fill='cut')) + geom_bar(position=position_dodge()) \
+ xlab('Color of diamonds') + ylab('Number count') + ggtitle('Cut/Color of diamonds')
# equivalent, with stat='count' given explicitly:
# ggplot(diamonds, aes(x='color',fill='cut')) + geom_bar(stat='count', position=position_dodge())
In [4]:
# size
display(ggplot(diamonds, aes(x='color',y='clarity')) + geom_count(aes(size='stat(n)'))
+ ggtitle('n of diamonds')
)
display(ggplot(diamonds, aes(x='color',y='clarity')) + geom_count(aes(size='stat(prop)',group='clarity'))
+ ggtitle('proportion of color/clarity'))
display(ggplot(diamonds, aes(x='color',y='clarity')) + geom_count(aes(size='stat(prop)',group='color'))
+ ggtitle('proportion of clarity/color'))
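To make the role of `group` in the `prop` plots concrete, here is a rough pandas sketch of the numbers involved, assuming `prop` is the count divided by the total count within each group. The toy data and the column names `prop_by_clarity` and `prop_by_color` are made up for illustration and are not part of plotnine.

```python
import pandas as pd

# Hypothetical toy data standing in for diamonds.
toy = pd.DataFrame({
    "color":   ["D", "D", "E", "E", "E", "D"],
    "clarity": ["VS1", "VS2", "VS1", "VS1", "VS2", "VS2"],
})

# Number of rows per (color, clarity) cell, like stat(n).
counts = toy.groupby(["color", "clarity"]).size().rename("n").reset_index()

# Share of each cell within its clarity (cf. group='clarity').
counts["prop_by_clarity"] = counts["n"] / counts.groupby("clarity")["n"].transform("sum")

# Share of each cell within its color (cf. group='color').
counts["prop_by_color"] = counts["n"] / counts.groupby("color")["n"].transform("sum")

print(counts)
```

In each case the proportions sum to 1 within the grouping variable, which is why the two `prop` plots look different even though the underlying counts are the same.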
In [5]:
ct = pd.crosstab(diamonds.cut, diamonds.color)
# ct.reset_index() fails.
# https://github.com/pandas-dev/pandas/issues/19136
ct = ct.rename(columns=str).reset_index()
display(ct)
ggplot(ct, aes(x='cut',y='D'))+geom_bar(stat='identity')
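The crosstab workaround can be tried on a small frame. The sketch below uses hypothetical toy data; the final `melt` step is my own addition (not in the original notebook) and is one possible way to get a long table so that every color can be drawn from a single data frame with `stat='identity'`.

```python
import pandas as pd

# Hypothetical toy data with categorical columns, like diamonds.
toy = pd.DataFrame({
    "cut": pd.Categorical(["Ideal", "Fair", "Ideal", "Fair", "Ideal"],
                          categories=["Fair", "Ideal"]),
    "color": pd.Categorical(["D", "D", "E", "E", "E"], categories=["D", "E"]),
})

ct = pd.crosstab(toy["cut"], toy["color"])

# Convert the column labels to plain strings before reset_index(),
# following the workaround used above.
flat = ct.rename(columns=str).reset_index()

# Assumption: a long layout (one row per cut/color pair) for plotting.
long = flat.melt(id_vars="cut", var_name="color", value_name="n")

print(flat)
print(long)
```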
metric variable
In [6]:
(ggplot(diamonds, aes(x='carat',y='price',color='color')) + geom_point(size=0.1)
+ ggtitle('scatter plot'))
In [7]:
# plotnine does not have geom_contour, use geom_density_2d instead
# https://github.com/has2k1/plotnine/issues/110
display(ggplot(diamonds.sample(1000), aes(x='carat',y='price')) + geom_point(size=0.2) + geom_density_2d(color='red')
+ ggtitle('contour') )
In [8]:
display(ggplot(diamonds, aes(x='carat',y='price')) + geom_bin2d(binwidth=(.05,400))
+ ggtitle('heatmap-like') )
display(ggplot(diamonds[diamonds.carat<=.5], aes(x='carat',y='price')) + geom_bin2d(binwidth=(.01,100))
+ ggtitle('heatmap-like: carat<=0.5'))
In [9]:
max(diamonds.carat)//0.25
In [10]:
#(ggplot(diamonds, aes('carat', 'price')) + geom_boxplot(aes(group = 'cut_width(carat, 0.25)')))
bins = [0.25*i for i in range(2+int(max(diamonds.carat)//0.25))]
dd = diamonds.assign(caratbin=pd.cut(diamonds.carat, bins))
display(dd.caratbin.value_counts(dropna=False))  # wrap in display(); a bare mid-cell expression prints nothing
display(ggplot(dd, aes(x='caratbin',y='price')) + geom_boxplot() )
#display(ggplot(dd, aes(x='caratbin',y='price')) + geom_violin() )
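The binning used in place of `cut_width` can be sketched on its own with a few hypothetical carat values. The names `width` and `n_edges` are introduced here for illustration only. Note that the bin edges start at 0 and are closed on the right, which may not match the defaults of ggplot2's `cut_width`.

```python
import pandas as pd

# Hypothetical carat values.
carat = pd.Series([0.2, 0.25, 0.3, 0.74, 1.0])
width = 0.25

# Enough edges so that the last bin extends past the maximum value.
n_edges = 2 + int(carat.max() // width)
bins = [width * i for i in range(n_edges)]

# Right-closed intervals: (0.0, 0.25], (0.25, 0.5], ...
binned = pd.cut(carat, bins)

print(bins)
print(binned)
```

Because the intervals are right-closed, a value that sits exactly on an edge (such as 0.25) falls into the lower bin.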
In [11]:
# geom_hex is not implemented.
display(ggplot(diamonds, aes(x='carat',y='price',color='color')) + geom_point(size=.1)
+ facet_grid(('cut','color'))
+ geom_hline(yintercept=10000)
+ ggtitle('facet_grid')
+ theme(figure_size=(20,20)))
categorical vs metric
In [12]:
my_candidates = diamonds[(diamonds.carat<1.1)&(diamonds.carat>0.9)]
display(ggplot(my_candidates, aes(x='cut',y='price',color='color')) + geom_boxplot() )
In [13]:
display(ggplot(my_candidates, aes(x='cut',y='price',color='color')) + geom_violin() )