How to get the author of the book by title
Introduction
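The idea is simple: find a suitable API.
The API should accept a book title (Title) as a parameter and return data that includes an author (Author) field.
After some exploration, the Google Books APIs turn out to be an adequate solution, and the search box on Goodreads.com can serve as a fallback.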
Google Books APIs
Refer to this webpage to find more information: Google Books APIs Getting Started
Google Books has a vision to digitize the world’s books. You can use the Google Books API to search content, organize an authenticated user’s personal library and modify it as well.
Books concepts
To handle the JSON data returned later correctly, you should understand the following four basic concepts:
Volume: A volume represents the data that Google Books hosts about a book or magazine. It is the primary resource in the Books API. All other resources in this API either contain or annotate a volume.
Bookshelf: A bookshelf is a collection of volumes. Google Books provides a set of predefined bookshelves for each user, some of which are completely managed by the user, some of which are automatically filled in based on user’s activity, and some of which are mixed. Users can create, modify or delete other bookshelves, which are always filled with volumes manually. Bookshelves can be made private or public by the user.
Note: Creating and deleting bookshelves as well as modifying privacy settings on bookshelves can currently only be done through the Google Books site.
Review: A review of a volume is a combination of a star rating and/or text. A user can submit one review per volume. Reviews are also available from outside sources and are attributed appropriately.
Reading Position: A reading position indicates the last read position in a volume for a user. A user can only have one reading position per volume. If the user has not opened that volume before, then the reading position does not exist. The reading position can store detailed position information down to the resolution of a word. This information is always private to the user.
Working with volumes
Performing a search
You can perform a volumes search by sending an HTTP GET request to the following URI:
https://www.googleapis.com/books/v1/volumes?q=search+terms
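For example, search for the book "It’s Only the Himalayas". If everything is configured correctly, the response is a JSON document containing all the search results. In general, the first result is taken to be the book we searched for.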
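As a minimal sketch of that request, the snippet below builds the search URI and reads the first result out of a response. The function names are hypothetical, and the sample payload is hand-written to mirror the `items` / `volumeInfo` / `authors` layout of a volumes response; it is not real API output.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(title):
    """Return the volumes search URL for a book title."""
    # urlencode escapes spaces and punctuation so the title is safe in a query string.
    return API_ENDPOINT + "?" + urlencode({"q": title})


def first_result_authors(payload):
    """Return (title, authors) of the first search result, or None if there is none."""
    items = payload.get("items") or []
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    # "authors" is a list and can be missing for some volumes.
    return info.get("title"), info.get("authors", [])


# Hand-written sample shaped like a search response (placeholder author).
sample = {
    "totalItems": 1,
    "items": [
        {"volumeInfo": {"title": "It's Only the Himalayas", "authors": ["Example Author"]}}
    ],
}

print(build_search_url("It's Only the Himalayas"))
print(first_result_authors(sample))
```

Because `authors` may be absent, the sketch returns an empty list in that case instead of raising, which leaves room to fall back to another source.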
Goodreads.com
Refer to this webpage to find more information: Goodreads.com
Discover and share books you love on Goodreads, the world’s largest site for readers and book recommendations!
Search Bar
Example request: search for "Test".
For the search request https://www.goodreads.com/search?q=`keyword`&qid= you only need to fill in the keyword after q=.
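A small sketch of filling in the keyword, assuming titles should be URL-encoded before being placed after q= (the helper name `goodreads_search_url` is hypothetical):

```python
from urllib.parse import quote_plus


def goodreads_search_url(keyword):
    """Build the Goodreads search URL for a keyword."""
    # quote_plus turns spaces into "+" and escapes other reserved characters.
    # The qid parameter is left empty, as in the URL pattern above.
    return "https://www.goodreads.com/search?q=" + quote_plus(keyword) + "&qid="


print(goodreads_search_url("Test"))
```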
Scrape
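Goodreads.com may have anti-scraping measures. Sending browser-like request headers can mitigate this in some cases:
send_headers = {
In some cases Goodreads also fails to return the correct author, so the try and except keywords should be used to catch the exception.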
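The two points above can be sketched together as follows. This is an illustration under stated assumptions: the header values are made up for the example and are not the original `send_headers`, the standard-library `urllib` is swapped in for `requests` so the sketch has no third-party dependency, and the `opener` parameter is a hypothetical hook that lets the demo run without touching the network.

```python
import io
import urllib.error
import urllib.request

# Assumed header values for illustration; not the original send_headers.
SEND_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_page(url, opener=urllib.request.urlopen, timeout=10):
    """Return the page body as text, or None if the request fails."""
    request = urllib.request.Request(url, headers=SEND_HEADERS)
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print(f"request failed: {exc}")
        return None


# Offline stand-ins for urlopen, used only to demonstrate both code paths.
def failing_opener(request, timeout):
    raise urllib.error.URLError("simulated failure")


def ok_opener(request, timeout):
    return io.BytesIO(b"<html>ok</html>")


print(fetch_page("https://www.goodreads.com/search?q=Test&qid=", opener=failing_opener))
print(fetch_page("https://www.goodreads.com/search?q=Test&qid=", opener=ok_opener))
```

Returning None on failure lets the caller skip a title and carry on with the next one rather than aborting the whole run.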
Python Code
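Notes
A few things need attention when using these two approaches in Python.
A proxy may cause the error request eof occurred in violation of protocol (_ssl.c:997)
This mostly happens when the proxy tool is running in global proxy mode and the proxy is not configured correctly in the Python script. Try the following fix: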
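import requests

proxies = {
    'http': 'http://your_server:your_port',
    'https': 'http://your_server:your_port',
}
# Pass proxies=proxies only for the requests that need the proxy
result = requests.get(Full_API_Link, proxies=proxies)

In the JSON returned by the Google Books APIs, the title field of a result does not necessarily match the local title field exactly, so Python's built-in difflib library is used to do the matching. Because sending requests continuously may cause some of them to fail, the try: and except: keywords are used to catch exceptions and keep the program running.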
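One possible way to do the matching with difflib is to score every returned title against the local one and keep the highest score. The helper name `best_matching_item` and the sample items are assumptions made for this sketch.

```python
import difflib


def best_matching_item(local_title, items):
    """Return (item, score) for the item whose title is closest to local_title."""
    best_item, best_score = None, 0.0
    for item in items:
        candidate = item.get("volumeInfo", {}).get("title", "")
        # ratio() is in [0, 1]; 1.0 means the two strings are identical.
        score = difflib.SequenceMatcher(
            None, local_title.lower(), candidate.lower()
        ).ratio()
        if score > best_score:
            best_item, best_score = item, score
    return best_item, best_score


# Hand-written sample items with placeholder authors.
items = [
    {"volumeInfo": {"title": "The Himalayas: A Guide", "authors": ["Author A"]}},
    {"volumeInfo": {"title": "It's Only the Himalayas: And Other Tales", "authors": ["Author B"]}},
]
item, score = best_matching_item("It's Only the Himalayas", items)
print(item["volumeInfo"]["authors"], round(score, 2))
```

A caller may also want to reject matches below some threshold; what value is appropriate depends on how noisy the local titles are.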
简单实现
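Assumptions:
The proxy tool on this computer is CFW, with global proxy disabled and the default port.
The book can be requested normally, and the JSON is traversed to find the best-matching title.
When the Google Books APIs cannot return the author correctly, Goodreads is tried instead.
Code: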
import requests
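The fallback order described in the assumptions above can be sketched as a loop over lookup functions. This is only an outline: `find_author` and the two stand-in lookups are hypothetical, and the stand-ins return canned values instead of making network requests.

```python
def find_author(title, lookups):
    """Try each (name, lookup) pair in order; return (name, author) for the first hit."""
    for name, lookup in lookups:
        try:
            author = lookup(title)
        except Exception as exc:
            # One failing source should not stop the whole run.
            print(f"{name} failed for {title!r}: {exc}")
            continue
        if author:
            return name, author
    return None, None


# Hypothetical stand-ins for the real network lookups.
def google_books_lookup(title):
    raise KeyError("authors")  # simulates a result without an authors field


def goodreads_lookup(title):
    return "Example Author"


sources = [("Google Books", google_books_lookup), ("Goodreads", goodreads_lookup)]
print(find_author("Test", sources))
```

Keeping the sources in an ordered list makes the priority explicit: Google Books is asked first, and Goodreads is only consulted when that lookup fails or returns nothing.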
How to get the author of the book by title
http://www.tsingloo.com/2022/10/17/1f9ff2fe8cc9453e8c58ab93813cc9e3/