Using the pandas module in Python

Pandas data structures – Series: pandas.Series(data, index, dtype, name, copy)

1. A simple pandas example

import pandas as pd
a = [1, 2, 3]
myvar = pd.Series(a)
print(myvar)

# Output
0    1
1    2
2    3
dtype: int64

2. We can specify the index

import pandas as pd
a = ["Google", "Baidu", "Huawei"]
data = pd.Series(a, index=["x", "y", "z"])
print(data)
# print(data['y'])  # read a value by its index label

# Output
x    Google
y     Baidu
z    Huawei
dtype: object
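
The commented-out line in the example above hints at reading a value by its label. A minimal sketch of that, assuming the same three-element Series; the variable names by_label and by_position are only illustrative:

```python
import pandas as pd

data = pd.Series(["Google", "Baidu", "Huawei"], index=["x", "y", "z"])

# Label-based lookup: returns the value stored under the label "y".
by_label = data["y"]

# Position-based lookup with .iloc: position 1 is the second element.
by_position = data.iloc[1]

print(by_label)     # Baidu
print(by_position)  # Baidu
```

Both lookups reach the same element here because "y" happens to be the second label.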

3. A Series can also be created from key/value pairs such as a dict; the dict keys become the index

import pandas as pd
sites = {1: "Google", 2: "Baidu", 3: "Wiki"}
myvar = pd.Series(sites)
print(myvar)

# Output
1    Google
2     Baidu
3      Wiki
dtype: object

4. When creating a Series from a dict, index can be used to choose which keys to include

import pandas as pd
sites = {1: "Google", 2: "Wiki"}
myvar = pd.Series(sites, index=[1, 2])
print(myvar)

# Output
1    Google
2      Wiki
dtype: object
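
Since every key is listed in the example above, the effect of index is hard to see. The sketch below, which assumes a three-entry dict and a hypothetical key 4 that is not in it, may make the selection behaviour clearer:

```python
import pandas as pd

sites = {1: "Google", 2: "Baidu", 3: "Wiki"}

# Only the keys listed in index are kept, in the order given.
# Key 4 is not in the dict, so its value is missing (NaN).
subset = pd.Series(sites, index=[3, 1, 4])
print(subset)

print(list(subset.index))    # [3, 1, 4]
print(subset.loc[3])         # Wiki
print(pd.isna(subset.loc[4]))  # True
```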

5. Setting the name parameter of a Series

import pandas as pd
sites = {1: "Google", 2: "Baidu", 3: "Wiki"}
myvar = pd.Series(sites, index=[1, 2], name="Hello World")
print(myvar)

# Output
1    Google
2     Baidu
Name: Hello World, dtype: object
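
One place where the name shows up later is when a Series is turned into a DataFrame. A small sketch, assuming the same name as above; frame is just an illustrative variable:

```python
import pandas as pd

myvar = pd.Series({1: "Google", 2: "Baidu"}, name="Hello World")

# The name is available as an attribute.
print(myvar.name)  # Hello World

# to_frame() builds a one-column DataFrame;
# the Series name becomes the column label.
frame = myvar.to_frame()
print(list(frame.columns))  # ['Hello World']
```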

Pandas data structures – DataFrame: pandas.DataFrame(data, index, columns, dtype, copy)

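1. Creating from a list

import pandas as pd
data = [['Google', 10], ['Baidu', 12], ['Wiki', 13]]
df = pd.DataFrame(data, columns=['Site', 'Age'])
# dtype=float in the constructor applies to every column, and recent pandas
# versions may raise an error on the string column, so only Age is converted.
df['Age'] = df['Age'].astype(float)
print(df)

# Output
     Site   Age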
0  Google  10.0
1   Baidu  12.0
2    Wiki  13.0
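
To check which type each column ended up with, the dtypes attribute can be inspected. A sketch under the assumption that only the Age column should be float; age_is_float is an illustrative name:

```python
import pandas as pd

data = [["Google", 10], ["Baidu", 12], ["Wiki", 13]]
df = pd.DataFrame(data, columns=["Site", "Age"])

# Convert only the numeric column; Site keeps its string type.
df = df.astype({"Age": float})

# dtypes lists the type of every column.
print(df.dtypes)

age_is_float = pd.api.types.is_float_dtype(df["Age"])
print(age_is_float)  # True
```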

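The following example creates a DataFrame from a dict of equal-length lists (ndarrays can be used the same way)

import pandas as pd
data = {'site': ['Google', 'Baidu', 'Wiki'], 'Age': [10, 12, 13]}
df = pd.DataFrame(data)
print(df)

# Output
     site  Age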
0  Google   10
1   Baidu   12
2    Wiki   13
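
For completeness, here is roughly what the same data looks like when the values really are NumPy ndarrays. This is a sketch that assumes NumPy is importable (it is installed alongside pandas):

```python
import numpy as np
import pandas as pd

# Each ndarray becomes one column; all arrays must have the same length.
data = {
    "site": np.array(["Google", "Baidu", "Wiki"]),
    "Age": np.array([10, 12, 13]),
}
df = pd.DataFrame(data)

print(df.shape)            # (3, 2)
print(df["Age"].tolist())  # [10, 12, 13]
```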

Creating from a list of dicts

import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': '10', 'c': '20'}]
df = pd.DataFrame(data)
print(df)

# Output (the first dict has no 'c' key, so that cell is NaN)
   a   b    c
0  1   2  NaN
1  5  10   20
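
Missing keys are the main thing to watch for with a list of dicts. The sketch below shows one way to detect and fill them; it assumes plain integer values (not the strings used above) and uses 0 as an arbitrary placeholder:

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
df = pd.DataFrame(data)

# The first dict has no "c" key, so that cell is NaN.
missing = df["c"].isna().tolist()
print(missing)  # [True, False]

# fillna returns a copy with missing cells replaced.
filled = df.fillna(0)
print(filled["c"].tolist())  # [0.0, 20.0]
```

The column stays float after filling because NaN forced it to a floating-point type when the DataFrame was built.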

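Pandas can return the data of a given row with the loc attribute. If no index has been set, the first row has index 0, the second row has index 1, and so on.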

import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df.loc[0])

# Output
calories    420
duration     50
Name: 0, dtype: int64

With a single label, loc returns a Series. With a list of labels, it returns a DataFrame.

import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df.loc[[0, 1]])

# Output
   calories  duration
0       420        50
1       380        40
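
The difference between the two return types can be checked directly. A minimal sketch using the same data; row and rows are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

row = df.loc[0]      # single label -> Series
rows = df.loc[[0]]   # list of labels -> DataFrame, even with one label

print(type(row).__name__)   # Series
print(type(rows).__name__)  # DataFrame
```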

Specifying an index for a DataFrame

import pandas as pd
data = {
    "calories": [420, 300, 500],
    "duration": [50, 30, 40]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)

# Output
      calories  duration
day1       420        50
day2       300        30
day3       500        40

Selecting a row by its index label with loc

import pandas as pd
data = {
    "alpha": [1, 2, 3],
    "Xma": [10, 30, 40]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df.loc['day1'])

# Output
alpha     1
Xma      10
Name: day1, dtype: int64
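
Beyond a single row, loc also accepts a column label and label slices, and iloc offers position-based access. The following is a sketch on the same data, not an exhaustive tour:

```python
import pandas as pd

df = pd.DataFrame(
    {"alpha": [1, 2, 3], "Xma": [10, 30, 40]},
    index=["day1", "day2", "day3"],
)

# A row label plus a column label selects a single cell.
cell = df.loc["day2", "Xma"]
print(cell)  # 30

# Label slices with loc include both endpoints.
first_two = df.loc["day1":"day2"]
print(list(first_two.index))  # ['day1', 'day2']

# iloc selects by integer position instead of by label.
print(df.iloc[0, 0])  # 1
```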

Published by 王健 on September 26, 2022
