Simple Data Analysis and Visualization with Pandas and Matplotlib

Pandas is a Python library for data analysis.

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

This article processes and plots the data in data.csv, which looks like this:

| Idx | Title of Book | Description | Authors | Rating | Price | Availability | Book Category |
|---|---|---|---|---|---|---|---|
| 0 | It’s Only the Himalayas | Wherever you go, whatever you do, just . . . don’t do anything stupid. | S. Bedford | 2 | 45.17 | 19 | Travel |

Analysis

Read File

Read the file with pd.read_csv. The index_col argument selects the column used as the index for the rest of the analysis.

import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
dfcand=pd.read_csv("data.csv", sep=',',index_col=0)
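To see what index_col=0 does without the real file, here is a minimal sketch that reads a one-row CSV shaped like the sample row above from an in-memory string. It assumes, as the sample suggests, that the first column of data.csv is Idx. The Description field contains commas, so it has to be wrapped in double quotes in the CSV.

```python
import io

import pandas as pd

# One-row CSV in the same shape as data.csv (header taken from the sample above)
csv_text = """Idx,Title of Book,Description,Authors,Rating,Price,Availability,Book Category
0,It’s Only the Himalayas,"Wherever you go, whatever you do, just . . . don’t do anything stupid.",S. Bedford,2,45.17,19,Travel
"""

# index_col=0 turns the first column (Idx) into the row index
df = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=0)
print(df.index.name)
print(df.loc[0, "Authors"])
```

Because Idx became the index, rows are addressed by their Idx value with .loc, and the remaining seven columns stay as data.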

Group by

Calling .groupby() on a DataFrame groups its rows by the values of the given column(s). The result is a pandas.core.groupby.generic.DataFrameGroupBy object.

.size()

The .size() method returns the name and size (number of rows) of each group as a pandas.core.series.Series object.

dfcand.groupby('Book Category').size()

#Book Category
#Academic 1
#Add a comment 67
#Adult Fiction 1
#Art 8
#Autobiography 9
#Biography 5
#...
#Young Adult 54
#dtype: int64

Series objects

.sort_values(ascending=False)

On a Series, .sort_values() controls the sort order. With ascending=False, values are sorted from largest to smallest.

piedata = dfcand.groupby('Book Category').size().sort_values(ascending=False)

#Book Category
#Default 152
#Nonfiction 110
#Sequential Art 75
#Add a comment 67
#Fiction 65
#Young Adult 54
#...
#Academic 1
#dtype: int64

In general, a Series can be filtered with a comparison operator. For example, to keep only the entries above that are greater than 7:

abovepiedata = piedata[piedata>7]

#Book Category
#Default 152
#Nonfiction 110
#...
#Autobiography 9
#dtype: int64
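The three steps above (group and count, sort, filter with a boolean mask) can be tried end to end on a small toy DataFrame. The frame below is made up and only stands in for dfcand. The threshold is 1 instead of 7 because the toy data is small.

```python
import pandas as pd

# Toy stand-in for dfcand; only the 'Book Category' column matters here
books = pd.DataFrame({
    "Book Category": ["Travel", "Poetry", "Travel", "Art", "Poetry", "Travel"],
    "Rating": [2, 4, 3, 5, 1, 3],
})

grouped = books.groupby("Book Category")        # DataFrameGroupBy object
counts = grouped.size()                          # Series: category -> number of rows
ranked = counts.sort_values(ascending=False)     # largest groups first
frequent = ranked[ranked > 1]                    # the boolean mask keeps categories with more than 1 book

print(ranked)
print(frequent)
```

For a single column, books["Book Category"].value_counts() gives the same counts, already sorted in descending order.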

pd.concat()

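This function combines several Series objects into one DataFrame.

With axis=1, the Series are aligned on their index labels (an outer join by default), and each Series becomes a new column of the resulting DataFrame.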

pd3d = pd.concat([ratingseries,availabilityseries,sizeseries],axis= 1,ignore_index= False)

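A column can end up without a name after the merge, for example when its Series had no name. Such columns can be selected by position with .iloc, for example:
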
pd3d.iloc[:,2]
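The article does not show how ratingseries, availabilityseries and sizeseries were built. The sketch below assumes they are per-category aggregates (mean rating, mean availability, number of books) and builds them from a made-up toy DataFrame. The two means keep their column names, while groupby().size() returns an unnamed Series. That unnamed column is the one reached with .iloc[:, 2].

```python
import pandas as pd

# Made-up toy data; assumption: the three series are per-category aggregates
books = pd.DataFrame({
    "Book Category": ["Travel", "Poetry", "Travel", "Art", "Poetry", "Travel"],
    "Rating": [2, 4, 3, 5, 1, 3],
    "Availability": [19, 7, 12, 3, 9, 5],
})

grouped = books.groupby("Book Category")
ratingseries = grouped["Rating"].mean()              # Series named 'Rating'
availabilityseries = grouped["Availability"].mean()  # Series named 'Availability'
sizeseries = grouped.size()                          # unnamed Series

# axis=1: align on the category index and place each Series in its own column
pd3d = pd.concat([ratingseries, availabilityseries, sizeseries], axis=1)
print(pd3d)
print(pd3d.iloc[:, 2])   # the unnamed size column, selected by position
```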

Benford

In many real-world data sets, the leading digit 1 occurs more frequently than 2, 2 more frequently than 3, and so on. This observation is a simplified version of Benford’s law. More precisely, the law uses base-10 logarithms to predict the frequency of each leading digit, and the predicted frequencies decrease as the digit increases from 1 to 9.
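The base-10 prediction is P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The sketch below computes these expected shares. Because the terms telescope to log10(10), they sum to 1.

```python
import math


def benford_expected() -> dict:
    """Expected share of each leading digit d = 1..9: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


expected = benford_expected()
for digit, share in expected.items():
    print(f"{digit}: {share:.1%}")
```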

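frequency = {'1':0,'2':0,'3':0,'4':0,'5':0,'6':0,'7':0,'8':0,'9':0}
def benford(num):
    # First significant digit: drop the sign, leading zeros and the decimal point
    firstdigit = str(abs(num)).lstrip('0.')[:1]
    if firstdigit in frequency:  # skips 0 and values such as NaN that have no digit 1-9
        frequency[firstdigit] = frequency[firstdigit] + 1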
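A vectorized alternative is to count leading digits for a whole column at once instead of calling a function per value. The sketch below does this. The name leading_digit_share is hypothetical, and the prices are made up apart from 45.17 and 19, which come from the sample row. It reads the first significant digit from scientific notation and returns the share of each digit 1 to 9, so the result could serve as benser in the bar chart below.

```python
import pandas as pd


def leading_digit_share(values: pd.Series) -> pd.Series:
    """Share of each leading digit 1-9 among the positive values of a Series."""
    positive = values[values > 0]  # zero, negatives and NaN are left out
    # Scientific notation starts with the first significant digit: 0.031 -> '3.100000e-02'
    first = positive.map(lambda v: f"{v:e}"[0])
    counts = first.value_counts().reindex([str(d) for d in range(1, 10)], fill_value=0)
    return counts / counts.sum()


prices = pd.Series([45.17, 19.0, 12.5, 0.031, 150.0, 1.99, 23.4, 0.0])
print(leading_digit_share(prices))
```

On a real DataFrame this would be called as leading_digit_share(dfcand['Price']), assuming the column holds numbers.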

Draw diagrams

General

plt.rcParams.update({'font.family': 'Times New Roman'})  # font family
plt.rcParams.update({'font.weight': 'normal'})  # font weight
plt.rcParams.update({'font.size': 15})  # font size
plt.figure(figsize=(10, 10))  # canvas size in inches

#Draw Diagram

plt.title("This is a figure")  # figure title
plt.show()  # display the figure

Pie

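For a Series, .values returns its values (and .index its labels).

plt.pie() draws a pie chart:

the first argument supplies the data (the wedge sizes);

labels= gives the label of each wedge;

startangle= sets the angle, in degrees counterclockwise from the x-axis, at which the first wedge starts;

labeldistance= sets how far the labels sit from the center, as a fraction of the radius (default 1.1).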

plt.pie(abovepiedata.values,labels=abovepiedata.index,startangle= 32,labeldistance= 1.12)
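A self-contained version of the same pie chart follows. It uses a few category counts copied from the sorted output shown earlier, so it runs without data.csv. It uses the object-oriented Axes API and the Agg backend so it also works without a display. Both choices are assumptions of this sketch, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the groupby output shown above
sizes = pd.Series({"Default": 152, "Nonfiction": 110, "Sequential Art": 75,
                   "Add a comment": 67, "Fiction": 65, "Young Adult": 54,
                   "Autobiography": 9, "Art": 8, "Biography": 5, "Academic": 1})
sizes = sizes.sort_values(ascending=False)
above = sizes[sizes > 7]  # drop small categories so the labels stay readable

fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(above.values, labels=above.index, startangle=32, labeldistance=1.12)
ax.set_title("Books per category (more than 7 books)")
# fig.savefig("category_pie.png") to write the chart to a file
```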

3D Scatter

ax = plt.axes(projection='3d')  # create 3D axes for the scatter plot
ax.scatter3D(pd3d['Rating'], pd3d.iloc[:, 2], pd3d['Availability'])  # x, y, z data
plt.xticks(range(5))  # x-axis tick positions

plt.xlabel('Avg Rating')  # x-axis label
plt.ylabel('Size of Category', rotation=39)  # y-axis label
ax.set_zlabel('Avg stock')  # z-axis label
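The same plot can be written entirely with the Axes API, which avoids mixing plt.xlabel and ax.set_zlabel. The per-category values below are made up and only mimic the shape of pd3d, including its unnamed third column. The 1 to 5 tick range assumes a five-star rating scale.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up aggregates with the same layout as pd3d (third column unnamed)
rating = pd.Series({"Travel": 2.9, "Poetry": 3.1, "Art": 2.7}, name="Rating")
availability = pd.Series({"Travel": 9.5, "Poetry": 7.0, "Art": 8.2}, name="Availability")
size = pd.Series({"Travel": 11, "Poetry": 19, "Art": 8})
pd3d = pd.concat([rating, availability, size], axis=1)

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection="3d")  # '3d' projection is registered by matplotlib
ax.scatter(pd3d["Rating"], pd3d.iloc[:, 2], pd3d["Availability"])
ax.set_xticks(range(1, 6))  # assumes ratings on a 1-5 scale
ax.set_xlabel("Avg Rating")
ax.set_ylabel("Size of Category")
ax.set_zlabel("Avg stock")
```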


Bar & Plot

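In plt.bar(), x= gives the bar positions (one per data item), height= the bar heights, and tick_label= the label shown under each bar.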

# benser: leading-digit data indexed by digit; y: Benford's expected share per digit (both built earlier, not shown)
plt.bar(x=range(len(benser)), height=benser, label='Data', tick_label=benser.index)

plt.xlabel("First Digit")
plt.ylabel("Data")

# second y-axis that shares the x-axis with the bar chart
ax2 = plt.twinx()

# y-axis range for the Benford curve
ax2.set_ylim([0.04, 0.33])
plt.plot(range(len(benser)), y, marker='.', color='goldenrod', linewidth=1, label="Benford's Law")

# legend, plus a percentage label above each point of the line
plt.legend(loc="upper right")
for a, b in zip(range(len(benser)), y):
    plt.text(a, b, '{:.2f}%'.format(b * 100), ha='center', va='bottom', fontsize=8)
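A self-contained sketch of the same bar-plus-line chart follows. The leading-digit counts are hypothetical and stand in for benser, and the line is Benford's expected share computed as log10(1 + 1/d). One difference from the code above: plt.legend() after twinx() only sees the line on the second axis, so this version collects handles from both axes into one legend.

```python
import math

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

digits = [str(d) for d in range(1, 10)]
counts = pd.Series([31, 17, 13, 10, 8, 7, 6, 5, 3], index=digits)  # hypothetical data
expected = [math.log10(1 + 1 / d) for d in range(1, 10)]          # Benford's law

fig, ax = plt.subplots(figsize=(10, 6))
positions = range(len(counts))
ax.bar(positions, counts.values, tick_label=counts.index, label="Data")
ax.set_xlabel("First Digit")
ax.set_ylabel("Count")

ax2 = ax.twinx()              # second y-axis sharing the same x-axis
ax2.set_ylim(0.04, 0.33)      # keeps the whole curve (about 4.6% to 30.1%) in view
ax2.plot(positions, expected, marker=".", color="goldenrod", linewidth=1, label="Benford's Law")
for x, p in zip(positions, expected):
    ax2.text(x, p, f"{p * 100:.2f}%", ha="center", va="bottom", fontsize=8)

# Gather the bar and line handles so both appear in a single legend
handles = ax.get_legend_handles_labels()[0] + ax2.get_legend_handles_labels()[0]
ax2.legend(handles=handles, loc="upper right")
```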


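How to get a book's author from its title

The idea is simple: find an API that can do it.

The API should take the book title as a parameter, and the data it returns must include an author field.

After some exploration, the Google Books APIs turned out to be a workable solution, with the search box on Goodreads.com as a fallback.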

Google Books APIs

Refer to this webpage to find more information: Google Books APIs Getting Started

Google Books has a vision to digitize the world’s books. You can use the Google Books API to search content, organize an authenticated user’s personal library and modify it as well.

Books concepts

To handle the returned JSON data correctly, you should understand the following four basic concepts:

  • Volume: A volume represents the data that Google Books hosts about a book or magazine. It is the primary resource in the Books API. All other resources in this API either contain or annotate a volume.

  • Bookshelf: A bookshelf is a collection of volumes. Google Books provides a set of predefined bookshelves for each user, some of which are completely managed by the user, some of which are automatically filled in based on user’s activity, and some of which are mixed. Users can create, modify or delete other bookshelves, which are always filled with volumes manually. Bookshelves can be made private or public by the user.

    Note: Creating and deleting bookshelves as well as modifying privacy settings on bookshelves can currently only be done through the Google Books site.

  • Review: A review of a volume is a combination of a star rating and/or text. A user can submit one review per volume. Reviews are also available from outside sources and are attributed appropriately.

  • Reading Position: A reading position indicates the last read position in a volume for a user. A user can only have one reading position per volume. If the user has not opened that volume before, then the reading position does not exist. The reading position can store detailed position information down to the resolution of a word. This information is always private to the user.

Working with volumes

You can perform a volumes search by sending an HTTP GET request to the following URI:

https://www.googleapis.com/books/v1/volumes?q=search+terms

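For example, searching for the book It’s Only the Himalayas returns, if everything is configured correctly, a JSON document containing all search results. In general, the first result can be taken as the book being searched for.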
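Here is a minimal sketch of that request using only the standard library for URL building and JSON parsing. The helper names volumes_search_url and first_authors are hypothetical. The sample response is hand-written to mimic the items → volumeInfo → authors layout of the volumes endpoint and is not real API output. Passing the title through urlencode escapes spaces, apostrophes and '&', which would otherwise break the query string.

```python
import json
from urllib.parse import urlencode


def volumes_search_url(title: str) -> str:
    """Build a volumes search URL; q carries the URL-encoded search terms."""
    return "https://www.googleapis.com/books/v1/volumes?" + urlencode({"q": title})


def first_authors(response_text: str):
    """Return the author list of the first search result, or None if there is none."""
    data = json.loads(response_text)
    items = data.get("items", [])
    if not items:
        return None
    return items[0].get("volumeInfo", {}).get("authors")


# Hand-written response in the shape of the volumes endpoint (not real output)
sample = '{"items": [{"volumeInfo": {"title": "It\'s Only the Himalayas", "authors": ["S. Bedford"]}}]}'
print(volumes_search_url("It's Only the Himalayas"))
print(first_authors(sample))
```

To send the request for real, pass the URL to requests.get(...) and feed response.text to first_authors.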

Goodreads.com

Refer to this webpage to find more information: Goodreads.com

Discover and share books you love on Goodreads, the world’s largest site for readers and book recommendations!

Example request: searching for "Test".

A search request has the form https://www.goodreads.com/search?q=`keyword`&qid=

You only need to put the (URL-encoded) keyword after q=.

Scrape

Goodreads.com may have anti-scraping measures. Sending browser-like request headers can mitigate some of them:

send_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
    "Connection": "keep-alive",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8"
}

# pass the headers by keyword: the second positional argument of requests.get is params
result = requests.get(main_url, headers=send_headers, proxies=proxies)

In some cases Goodreads also fails to return the correct author, so the lookup has to be wrapped in try/except to handle these cases.

Python Code

Notes

When using these two approaches from Python, keep the following in mind:

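  1. A proxy can make requests fail with the SSL error "EOF occurred in violation of protocol (_ssl.c:997)".

    This usually happens when the proxy tool runs in global mode but the proxy is not configured correctly in the Python script. Try configuring the proxy explicitly as follows. The 'https' entry also uses an http:// proxy URL, because local proxy tools typically accept plain HTTP connections and tunnel HTTPS traffic through them: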

    import requests

    proxies = {
        'http': 'http://your_server:your_port',
        'https': 'http://your_server:your_port',
    }

    # pass proxies=proxies only to the requests that should go through the proxy
    result = requests.get(Full_API_Link, proxies=proxies)
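  2. The title field in the JSON returned by the Google Books APIs does not always exactly match the local title, so the titles are compared with fuzzy matching from Python's built-in difflib library.

  3. Sending many requests in a row may cause some of them to fail, so the code wraps requests in try/except to keep the program running.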
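Instead of giving up after one failure, a request can be retried a few times with a growing pause. The sketch below shows this retry-with-exponential-backoff pattern. The helper with_retries is hypothetical and not part of requests, and the flaky function only simulates a temporary network failure.

```python
import time


def with_retries(func, *args, attempts=3, base_delay=1.0, exceptions=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying on `exceptions` with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle the error
            time.sleep(base_delay * 2 ** attempt)  # base_delay, 2x, 4x, ...


calls = {"n": 0}


def flaky():
    """Simulated request that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(with_retries(flaky, attempts=5, base_delay=0.01, exceptions=(ConnectionError,)))
```

With requests, a call could look like with_retries(requests.get, url, timeout=10, exceptions=(requests.RequestException,)), where the timeout keyword is forwarded to requests.get.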

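Simple implementation

Assumptions:

  1. The proxy tool on this computer is CFW (Clash for Windows), with global mode off and the default port.

  2. The request for the book succeeds, and the script walks through the returned JSON to find the best-matching title.

  3. If the Google Books APIs cannot provide the correct author, the script falls back to Goodreads.

Code: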

import json
import time
import difflib
from urllib.parse import quote_plus

import requests
from bs4 import BeautifulSoup

# CFW's default local port
proxies = {
    'http': 'http://localhost:7890',
    'https': 'http://localhost:7890',
}

# Browser-like headers for Goodreads
send_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
    "Connection": "keep-alive",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8"
}


def UrlToSoupAdvanced(Url: str):
    """Download a page and parse it; return None if the request fails."""
    print(Url)
    try:
        # headers must be passed by keyword (the second positional argument is params)
        result = requests.get(Url, headers=send_headers, proxies=proxies, timeout=10)
    except requests.RequestException:
        time.sleep(5)  # back off before the next request
        return None
    return BeautifulSoup(result.text, 'html.parser')


def TwoStrMatch(string1: str, string2: str):
    """Similarity of two titles in [0, 1]; quick_ratio() is a fast upper bound of ratio()."""
    result = difflib.SequenceMatcher(None, string1, string2).quick_ratio()
    print(string1 + "====" + string2 + "====" + str(result))
    return result


def GetAuthorByTitleUsingGoogleBooksAPI(Title: str):
    Google_Book_APIs_Head = "https://www.googleapis.com/books/v1/volumes?q="
    # No API key is sent; append "&key=YOUR_API_KEY" if you use one.
    # Other query forms are possible, e.g.
    # https://www.googleapis.com/books/v1/volumes?q=a+summer+in+europe&printType=books

    # URL-encode the title so spaces, quotes and '&' survive
    Full_API_Link = Google_Book_APIs_Head + quote_plus(Title)
    print("The request link is " + Full_API_Link)

    try:
        result = requests.get(Full_API_Link, proxies=proxies, timeout=10)
        return_json_dict = json.loads(result.text)
    except (requests.RequestException, ValueError):
        time.sleep(5)
        return GetAuthorByTitleUsingGoodreadsSearch(Title)

    if 'items' not in return_json_dict:
        return GetAuthorByTitleUsingGoodreadsSearch(Title)

    # Check the returned search results one by one
    for book in return_json_dict['items']:
        current_volume = book.get('volumeInfo', {})
        current_title = current_volume.get('title', '')
        current_authors = current_volume.get('authors')
        if not current_authors:
            continue  # skip results that have no author list

        # The title matches directly: return its authors
        if TwoStrMatch(current_title, Title) > 0.85:
            return ','.join(current_authors)

        # The title matches once the subtitle is appended: return its authors
        if 'subtitle' in current_volume:
            if TwoStrMatch(current_title + current_volume['subtitle'], Title) > 0.85:
                return ','.join(current_authors)

    return GetAuthorByTitleUsingGoodreadsSearch(Title)


def GetAuthorByTitleUsingGoodreadsSearch(Title: str):
    Full_Search_Link = "https://www.goodreads.com/search?q=" + quote_plus(Title) + "&qid="

    soup = UrlToSoupAdvanced(Full_Search_Link)
    if soup is None:
        return "[Unknown][Goodreads]Fail to request the link"

    try:
        # Depends on the current Goodreads page markup
        author = soup.find('span', itemprop="author").div.a.span.string
    except AttributeError:
        return "[Unknown][Goodreads]No correct Author"

    if author is None:
        return "[Unknown][Goodreads]No correct Author"
    return author
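One property of TwoStrMatch is worth knowing. quick_ratio() only compares how many of each character the two strings share and ignores their order, so it is an upper bound of ratio(). The sketch below compares the two scores on an anagram pair and on a title that differs only by a straight versus curly apostrophe. The helper names quick and full are hypothetical.

```python
import difflib


def quick(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a, b).quick_ratio()


def full(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a, b).ratio()


# Same letters in a different order: quick_ratio cannot tell them apart
print(quick("listen", "silent"), full("listen", "silent"))

# A straight vs. curly apostrophe barely changes either score
title_a, title_b = "It's Only the Himalayas", "It’s Only the Himalayas"
print(quick(title_a, title_b), full(title_a, title_b))
```

Using ratio() instead of quick_ratio() in TwoStrMatch would make the 0.85 threshold stricter at some extra cost per comparison, which may matter if many search results come back.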

Camera

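Reference video: 04. Camera System | GAMES204 Computational Imaging

Nautilus eye: the pinhole-imaging stage

Tapetum lucidum: a reflective layer that improves night vision

Compound eye: analogous to a light field camera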

Human Eye

Visual acuity: 20/20 vision is ~1 arc min

Field of view: ~190° monocular, ~120° binocular, ~135° vertical

Temporal resolution: ~60 Hz (depends on contrast and luminance)

Dynamic range: instantaneous 6.5 f-stops, adapts to 46.5 f-stops

Color: everything in the CIE xy diagram; distances are linear in CIE Lab

Depth cues in 3D displays: vergence, focus, and the conflicts between them

Accommodation range: ~8 cm to ∞; the ability to focus on near objects declines with age

Camera Optics: Lens


Aberrations

Distortion


Aperture


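The aperture is expressed as the f-number N = f/D,

where f is the focal length and D is the aperture diameter.

The smaller N is (that is, the larger D is relative to f), the more light enters and the shallower the depth of field.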

Camera Depth of Field

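Depth of field depends on the object distance, the aperture size, and the magnification.

The larger the magnification M, the shallower the depth of field.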


Hyperfocal distance


A small aperture requires a longer exposure time.
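These trade-offs can be quantified with the usual thin-lens approximations. The hyperfocal distance is H = f²/(N·c) + f. When focused at distance s, the near limit of acceptable sharpness is s(H − f)/(H + s − 2f), and the far limit is s(H − f)/(H − s), which becomes infinite once s ≥ H. The circle of confusion c = 0.03 mm is an assumed value, a common full-frame convention. The last function keeps exposure constant while stopping down, using the 1/N² light rule.

```python
import math


def hyperfocal_mm(f_mm: float, n: float, coc_mm: float = 0.03) -> float:
    """Hyperfocal distance H = f^2 / (N c) + f (thin-lens approximation)."""
    return f_mm ** 2 / (n * coc_mm) + f_mm


def dof_limits_mm(f_mm: float, n: float, s_mm: float, coc_mm: float = 0.03):
    """Near and far limits of acceptable sharpness when focused at distance s."""
    h = hyperfocal_mm(f_mm, n, coc_mm)
    near = s_mm * (h - f_mm) / (h + s_mm - 2 * f_mm)
    far = math.inf if s_mm >= h else s_mm * (h - f_mm) / (h - s_mm)
    return near, far


def equivalent_exposure_s(t1_s: float, n1: float, n2: float) -> float:
    """Shutter time at f/n2 giving the same exposure as t1 at f/n1 (light ~ 1/N^2)."""
    return t1_s * (n2 / n1) ** 2


print(round(hyperfocal_mm(50, 8)))        # 10467 mm, about 10.5 m
print(dof_limits_mm(50, 8, 3000))         # 50 mm at f/8, focused at 3 m
print(dof_limits_mm(50, 2, 3000))         # wider aperture: shallower depth of field
print(equivalent_exposure_s(1 / 100, 4, 8))  # f/4 -> f/8 needs 4x the time
```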

Field of View

Ultra-wide angle vs. telephoto


Two cases are compared: a fixed focal length and a fixed sensor.
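Both cases follow from one formula. For a rectilinear lens focused at infinity, the angle of view across a sensor dimension d is 2·atan(d / 2f). With the sensor fixed, a shorter focal length gives a wider view (ultra-wide angle), and a longer one gives a narrower view (telephoto). With the focal length fixed, a smaller sensor crops the view. The 36 mm full-frame width and the example focal lengths are illustrative choices.

```python
import math


def field_of_view_deg(sensor_size_mm: float, focal_length_mm: float) -> float:
    """Angle of view across one sensor dimension: 2 * atan(d / (2 f)), in degrees."""
    return math.degrees(2 * math.atan(sensor_size_mm / (2 * focal_length_mm)))


# Fixed sensor (36 mm wide), varying focal length
for f in (16, 50, 200):
    print(f"{f} mm lens: {field_of_view_deg(36, f):.1f} degrees")

# Fixed 50 mm focal length, smaller sensor (about 23.6 mm wide, APS-C)
print(f"APS-C at 50 mm: {field_of_view_deg(23.6, 50):.1f} degrees")
```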


Diffraction limit
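A common way to estimate the diffraction limit uses the Airy disk, the blur spot of a point source. Its diameter up to the first dark ring is about 2.44·λ·N. It grows linearly with the f-number, so stopping down far enough eventually blurs detail across several pixels. The 4 µm pixel pitch below is a hypothetical value for comparison.

```python
def airy_disk_diameter_um(wavelength_nm: float, n: float) -> float:
    """Airy disk diameter (to the first dark ring), about 2.44 * lambda * N, in micrometres."""
    return 2.44 * wavelength_nm * 1e-3 * n  # nm -> um


pixel_pitch_um = 4.0  # hypothetical sensor pixel size
for n in (2.8, 8, 16):
    d = airy_disk_diameter_um(550, n)  # green light
    print(f"f/{n}: Airy disk {d:.1f} um, about {d / pixel_pitch_um:.1f} pixels wide")
```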


Dolly zoom

Sensor

Film responds nonlinearly: its characteristic curve plots density against log exposure.


HDR (high dynamic range) imaging
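One basic HDR technique merges several exposures of the same scene into one radiance map. The sketch below assumes a linear sensor response, a static scene, and pixel values scaled to [0, 1]. Each exposure gives the estimate pixel / exposure_time. The estimates are averaged with a "hat" weight that ignores black and clipped pixels. The bracket is simulated, not real data.

```python
import numpy as np


def merge_exposures(images, exposure_times):
    """Weighted average of per-exposure radiance estimates (linear response assumed)."""
    numerator = np.zeros(np.shape(images[0]), dtype=float)
    denominator = np.zeros(np.shape(images[0]), dtype=float)
    for image, t in zip(images, exposure_times):
        z = np.asarray(image, dtype=float)
        # Hat weight: 1 for mid-grey, 0 for pixels that are black (0) or clipped (1)
        w = 1.0 - np.abs(2.0 * z - 1.0)
        numerator += w * z / t   # radiance estimate from this exposure is z / t
        denominator += w
    # Pixels clipped in every exposure have zero total weight and come out as 0
    return numerator / np.maximum(denominator, 1e-12)


# Simulated bracket: a dim and a bright pixel shot at 1, 1/4 and 1/16 s
radiance = np.array([0.4, 3.2])
times = [1.0, 0.25, 0.0625]
bracket = [np.clip(radiance * t, 0.0, 1.0) for t in times]
print(merge_exposures(bracket, times))
```

The bright pixel is clipped in the longest exposure but is still recovered from the shorter ones, which is the point of bracketing.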


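ISO: the response of a digital sensor is relatively linear.

An electronic rolling shutter, which reads the sensor out row by row, stretches or skews fast-moving objects.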

Intro to Android

Application Components

Services

A service is a component that runs in the background.

E.g. playing music in the background.

A service is not a thread: by default it runs on the main thread of its app's process.

Broadcast Receiver

Receives broadcasts from the Android system and other apps and responds with pre-coded actions.

Apps must register a receiver to get a broadcast.

Example: when the battery level changes, the system broadcasts a message.

Activity

Often corresponds to one screen of an app.

The primary class for interacting with the user.

For example, WeChat:

  • When you tap the app icon in the launcher, its greeting page
    is shown to you.

    The Android system invokes the main activity of WeChat.

  • Apps that request online payment (such as railway ticket
    payment) can directly reach the payment page.

    Another app invokes the payment activity of WeChat.

  • Apps that request “share on moments” can directly invoke the
    moments sharing page of WeChat.

    Another app invokes the “share on moments” activity of WeChat.

Back Stack

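When a new activity is launched, it is pushed onto the top of the back stack and gains focus. The previous activity is stopped and stays in the stack beneath it. Pressing Back pops the current activity off the stack and resumes the previous one.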
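The back stack is a last-in, first-out stack, and a plain Python list can model it. The sketch below is a toy model for illustration only. The class BackStack and the activity names are hypothetical and not Android APIs.

```python
class BackStack:
    """Toy model of an Android task's back stack (not an Android API)."""

    def __init__(self):
        self._stack = []

    def start_activity(self, name: str) -> None:
        # The new activity goes on top and takes focus; the one below is stopped
        self._stack.append(name)

    def press_back(self):
        # Back pops (destroys) the top activity and resumes the one beneath it
        if self._stack:
            self._stack.pop()
        return self.current()

    def current(self):
        return self._stack[-1] if self._stack else None


stack = BackStack()
stack.start_activity("MainActivity")
stack.start_activity("PaymentActivity")
print(stack.current())     # PaymentActivity
print(stack.press_back())  # MainActivity
```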


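Lifecycle

  • Running: The activity is in the foreground (on top of the screen) and has focus. A running activity is killed only as a last resort.

  • Paused: The activity is still partly visible but has lost focus, for example because a dialog or a non-full-screen activity is on top of it.

  • Stopped: The activity is completely covered by another running activity.

  • Destroyed: The activity has been finished or removed from memory by the system.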

When running short of memory, a stopped activity is more likely to get killed than a paused/running activity.


Good implementation of callback functions can make your app more robust and performant.

Possible issues with a bad implementation:

  • Crashing if the user receives a phone call or switches to another app while using your app.

  • Consuming valuable system resources when the user is not actively using it.

  • Losing the user’s progress if they leave your app and return to it at a later time.

  • Crashing or losing the user’s progress when the screen rotates between landscape and portrait orientation.

Callback Functions: Typical Uses
  • onCreate(): initial setup, load persistent state.

  • onRestart(): read cached state.

  • onStart(): reset the application.

  • onResume(): start foreground-only behaviors.

  • onPause(): shut down foreground-only behaviors.

      • For example: temporarily stop UI animations.

  • onStop(): cache state.

  • onDestroy(): release remaining resources. onDestroy() is not guaranteed to be called, so persistent state should be saved earlier, in onPause() or onStop().

The above points are very general. Carefully design your app and keep the life cycle graph in mind.
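The order in which these callbacks fire when one activity starts another is easy to mix up. The sketch below is a toy state machine, not the Android framework. It moves two hypothetical activities, A and B, between lifecycle levels and logs each callback. The scenario is A launching B, then the user pressing Back. The logged order matches the documented sequence: A's onPause() runs before B is created, and A's onStop() runs only after B is resumed.

```python
# Levels: 0 = not created / destroyed, 1 = created (stopped),
# 2 = started (visible), 3 = resumed (foreground, has focus)
UP = {0: "onCreate", 1: "onStart", 2: "onResume"}   # callback for moving up from a level
DOWN = {3: "onPause", 2: "onStop", 1: "onDestroy"}  # callback for moving down from a level


class ToyActivity:
    """Toy lifecycle model that logs callbacks (not the Android framework)."""

    def __init__(self, name: str, log: list):
        self.name = name
        self.log = log
        self.level = 0
        self.was_stopped = False

    def move_to(self, target: int) -> None:
        while self.level < target:
            if self.level == 1 and self.was_stopped:
                self.log.append(f"{self.name}.onRestart")  # coming back from stopped
                self.was_stopped = False
            self.log.append(f"{self.name}.{UP[self.level]}")
            self.level += 1
        while self.level > target:
            self.log.append(f"{self.name}.{DOWN[self.level]}")
            self.was_stopped = self.level == 2  # True right after onStop()
            self.level -= 1


log = []
a = ToyActivity("A", log)
b = ToyActivity("B", log)
a.move_to(3)   # A is launched
a.move_to(2)   # A starts B: A pauses first,
b.move_to(3)   # then B comes to the foreground,
a.move_to(1)   # then A, no longer visible, is stopped
b.move_to(2)   # user presses Back: B pauses,
a.move_to(3)   # A restarts and resumes,
b.move_to(0)   # and B is stopped and destroyed
print("\n".join(log))
```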

Create Activities


  1. Create a new Activity class that extends android.app.Activity or one of its subclasses.

  2. Override Activity.onCreate().

  3. Create a layout XML file in res/layout and use setContentView() to load this layout.

  4. Register the new activity in AndroidManifest.xml.

    If it is the main activity, add an intent-filter to its activity element in the manifest, containing the action android.intent.action.MAIN and the category android.intent.category.LAUNCHER.

Activities can be started by calling the function:

startActivity(Intent intent)

To start Activity2 from inside an activity:

Intent intent = new Intent(this, Activity2.class);
startActivity(intent);

The name of the target activity is not always explicitly specified. For instance, to let the Android system choose a suitable activity for sending email (in Lecture 9):

Intent intent = new Intent(Intent.ACTION_SEND);
intent.putExtra(Intent.EXTRA_EMAIL, recipientArray);
startActivity(intent);

Obtaining values from another activity

Sometimes we want to obtain a result from another activity. To do so, start the activity with:

startActivityForResult()

You must also implement the function below to receive the returned result:

onActivityResult()

In recent AndroidX versions, startActivityForResult() is deprecated in favor of registerForActivityResult().

Closing Activities

Android will automatically manage the life cycles of your activities.

You can destroy the current activity manually by calling finish().

To finish an activity that you previously started with startActivityForResult(Intent, int), use finishActivity(int requestCode).

This can be handy when you want to make sure that the user won’t return to that activity in the future.

Passing Data between Activities

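Bundle and Intent are closely related but not the same thing. An Intent stores its extras in a Bundle, so adding values one by one with intent.putExtra(...) and attaching a prepared Bundle with intent.putExtras(bundle) have the same effect. The receiving activity reads them back through getIntent().getExtras().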

Console output

Your console output (System.out) can be seen from the “run” window in Android Studio.

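You should normally use the Log class instead, though. Log.v(), Log.d(), Log.i(), Log.w() and Log.e() write verbose, debug, info, warning and error messages to Logcat (see "Android Logcat | Log.v(), Log.d(), Log.i(), Log.w(), Log.e()" on EyeHunts).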