Downloading E-books from the Publishing House of Electronics Industry

Background

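"悦读" (Yuedu) is a product that draws on the Publishing House of Electronics Industry's own high-quality book content to provide online e-book reading and learning services to institutional users such as university libraries, research institutes, and enterprises.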

Official site: http://yd.51zhy.cn/

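In early 2020, when the pandemic had just begun, Yuedu was opened up for free for a while. That was when I first came across the site, and at the time I downloaded nearly all of its new computer-related books.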

https://zhuanlan.zhihu.com/p/106234902

image-20220420175415198

Download

The trial-reading page of an arbitrary book:

image-20220420165945149

image-20220420170110681

/transfer//content/authorize is the authorization endpoint; it decides whether the user is allowed to read the complete book.

The response contains many URLs: Url is the PDF of the complete book (encrypted), and SplitFileUrls are the PDF files split page by page (encrypted).

The Key in the response is RSA-encrypted. Decrypting it with RSA yields a string, which is the key used to decrypt the PDF files.

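For newer books, the complete PDF can be downloaded directly from Url and then decrypted with AES. For older books, Url is still returned, but the download responds with 404, so the complete PDF cannot be obtained that way.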

A trial normally allows reading only 1% of the complete book.

image-20220420170820902

After removing DeviceToken from the request, 50% of the book becomes readable.

image-20220420171014584

After modifying the request parameters many times, I found that changing AppId to 27 changes the URLs in the response body.

image-20220420171110609

The file names in the returned URLs are incrementing numbers, and the fn parameter has no actual effect.

image-20220420171311661

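Only 50% of the URLs are returned even so, but because the file names increment, the URLs for the remaining 50% can be generated, which makes it possible to download the whole book.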

Once all pages are downloaded, decrypt the PDFs and merge them into a single PDF with a tool.

PDF decryption

After logging in, an RSA key pair is created through the /transfer/rsa/create endpoint.

image-20220420174302046

Content returned by /transfer//content/authorize:

{
"Success":true,
"Code":200,
"Description":"成功",
"Data":{
"Key":"fsd07X9VMuqfH44sjsXqg5BOHFNQm4xPSfiRZeXNsl1sK+NeHLwRPSuZrdpAaahWMarsAq5sxzW8HhZO3ynBWY3Owk5aerCRJz63iFt841RNueqzR0zjVTmre38SbFrII3NAYeFpwQfSvLWap/21+ys6/pGKnFBmhM9hnzvYG6CQ2BOfILfPOev/BxFSe7OtFs34sOtQ69k+JenFyXQMU7RdAL+bPEC63bJ8YoYX5YgprPN3v1Ja3zfd6UoSY131wEKH3h3cWtZETTAwCwFFvHyOd7pF3XcPTrwtJTn0xWIttJ6+kgOW5usl0giXZRqIGgv4xhHWSswEziUV2JmEPQ==",
"Url":"https://file.51zhy.cn/m8Jmwf-kj179E75eGldkJqRGYGtbQ6c_eKPl5DfMvzivu45htz7q94zT3jcVIF6J7ZYv5kmMi2nz6X9evLr7rfkxSH8d1y4xDLCgUUXp0I0=.pdf?filetoken=6oI4guQbGYJl-4Qnfd5gBW-0Fd73zOlMAHhQHhuxp0bpPL3TWWhk5ZzmoOUMrsWD1hIQfF4oZDGD0YNP0HCMKXyWjNfETWDP6TsLYPYKtCneHoKNHIE7m3scjcuCOLBhOcx_g8-K0ZatpRp-7fzS2IQliWP0dG62a75TNI9elaHIdCrF2-N9NErM4sRTCEpqfKHk2QFnxjZWD-v7zWbhCJDogstrSchKy8Z5OqMZ_CuD11PJ5y8kj39ouY2pkW6C",
"SplitFileUrls":[
"https://file.51zhy.cn/m8Jmwf-kj179E75eGldkJqRGYGtbQ6c_eKPl5DfMvzivu45htz7q94zT3jcVIF6J7ZYv5kmMi2nz6X9evLr7reksOJZXL-0SSPNFtLiTmrs=.pdf?filetoken=6oI4guQbGYJl-4Qnfd5gBW-0Fd73zOlMAHhQHhuxp0bpPL3TWWhk5ZzmoOUMrsWD1hIQfF4oZDGD0YNP0HCMKXyWjNfETWDP6TsLYPYKtCneHoKNHIE7m3scjcuCOLBhOcx_g8-K0ZatpRp-7fzS2IQliWP0dG62a75TNI9elaHIdCrF2-N9NErM4sRTCEpqfKHk2QFnxjZWD-v7zWbhCHN8JSNII5Ia3B9BRKgk3Ob7UWNh_NJKgHH_6xT5D8p3",
"https://file.51zhy.cn/m8Jmwf-kj179E75eGldkJqRGYGtbQ6c_eKPl5DfMvzivu45htz7q94zT3jcVIF6J7ZYv5kmMi2nz6X9evLr7rb6goEeVp66oOVLU7B_lJ_w=.pdf?filetoken=6oI4guQbGYJl-4Qnfd5gBW-0Fd73zOlMAHhQHhuxp0bpPL3TWWhk5ZzmoOUMrsWD1hIQfF4oZDGD0YNP0HCMKXyWjNfETWDP6TsLYPYKtCneHoKNHIE7m3scjcuCOLBhOcx_g8-K0ZatpRp-7fzS2IQliWP0dG62a75TNI9elaHIdCrF2-N9NErM4sRTCEpqfKHk2QFnxjZWD-v7zWbhCHN8JSNII5Ia3B9BRKgk3ObkL1T0LPdOMeH6bOm-ARUC",
"......"
],
"FileId":63877238,
"PackageBaseUrl":null,
"WordCountOrDuration":null,
"AllowReadPercentage":0.1,
"SplitFiles":[

],
"IsOnline":null,
"FileFormat":null,
"AuthorizeStrategy":null,
"NumberOfPages":78
}
}

Decryption:

import base64
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_v1_5 as Cipher_pkcsl_v1_5

private_key = "-----BEGIN RSA PRIVATE KEY-----\r\nMIIEowIBAAKCAQEA1XCu+FRkMWDemy1vZn4ZXKJAnuh8GSKunJS4+GfozHs5SPxt\r\n8G1gWbrpX/lHi5lifDsCYk+TnB4bwAXLBrY4NmiYwMPZC+bFkbfxi6y6yrB5mEF1\r\nFzTuGTlRhwq2A+yKVQCj7RkPNpQL3x0qq03Hpd6z+mM3FS8Gi/MJyzVQeShFtjC6\r\nOMSyCRHAaz454NjUmCIGup4sWVKi6qzl7nNhB+7IC6/DkvXoWSxvSE7oKlg4ny5p\r\nzP5rtxA/Lp4XpVPB55ATK4QZdqJa9YcbNqiRxAgdo4VwWcKinh4IDwkoDermD+Bi\r\ndThC48yOjX4MVQdQTzZ5fA5sj5KMrtbkt3RmMQIBAwKCAQEAjksfUDhCy5XpvMj0\r\n7v67kxbVv0WoEMHJvbh7UEVF3ad7hf2eoEjq5nyblVDaXRDsUtIBlt+3vWln1Vky\r\nBHl6zvBl1dfmB+8uYSVLsnMnMcr7utZOD3ieu3uLr1x5V/MG41XCnhC0zw1dP2jH\r\nHN6FGT8ipuzPY3SvB/db3M41phmhcXFTH2fVNTwT7S2gqZOnk1GVaJoxh7347CbC\r\nlT6opuXI8wfJn7RWXGbxlWMKP9m/hkIYCwgRegWUjHYMbLNHZ3Sr4T3HMz2QCyOt\r\nnMsPwJR6F3N4o2874s3pX2oMVjTLJlzLzT7HNPfKP9ZxHNH5/gyXY4l06eigdyCb\r\nfMMIwwKBgQDs1T/pGrKLE0Q7SxYOKrKQ+B+R2Vi/W64Ycmc1wmXMyHeWUDB3+v5n\r\n0LrsFuqhxvqrQTF8NQZcyy7kd/OwqjT2PYlhZoYlo1jyRqP5LZYIDOZg+O6grOOx\r\nL1s84rhFGnhpNmP2kduKrelqrny42FGdZmzbDKsgWnyx0+tCF7a6VwKBgQDmtsbU\r\nbvZnJfNnPGO6t8jIQwgUxF4isgeVFguMTC+XRR6ETvOdKGj+/dcY0U+3A5buFJnJ\r\nh2v0tNj74/lUWBHgjtev8yFJBm0Qni2dneHyFkMFh2mat7gXmo3tHTHQciNMH/E6\r\nL36NZm90fz+p0Xq767a7WhUc1jjqKDq5ZJketwKBgQCd43/wvHcHYi183LlexyG1\r\n+r+2kOXU58lloZoj1u6IhaUO4CBP/KmaiydID0cWhKcc1iD9eK7oh3SYT/fLHCNO\r\n07Drma7DwjtMLxf7c7las0RApfRrHe0gyjzTQdAuEaWbeZf5tpJcc/Dxyah7OuET\r\nmZ3nXcdq5v3L4pzWunnRjwKBgQCZzy84SfmaGUzvfZfRz9swLLAN2D7BzAUODrJd\r\niB+6LhRYNKJoxZtUqToQi4p6AmSeuGaGWkf4eJCn7VDi5WFAXzp1TMDbWZ4LFB5p\r\nE+v2uYIDr5u8enq6ZwlIviE1oWzdaqDRdP8I7vT4VNUb4Px9R88nkWNojtCcGtHQ\r\n7btpzwKBgEht3KU2AskblYaCCl0vQXyil5yN7r1N3SR0TjGL7NwA2IirDQFqZ+GW\r\nwHE1//TId+mkhhsCQynIUHmWnrSTBQB2UbrC79jq0mjI8Xdrz03lpWne+kCLHac2\r\nMNZE2pERlctsoOowh4sbP0cmu5qh2P/3ci1t8SnmPQllEA8Ut3W+\r\n-----END RSA PRIVATE KEY-----\r\n"

rsakey = RSA.importKey(private_key)
cipher = Cipher_pkcsl_v1_5.new(rsakey)

Key = 'fsd07X9VMuqfH44sjsXqg5BOHFNQm4xPSfiRZeXNsl1sK+NeHLwRPSuZrdpAaahWMarsAq5sxzW8HhZO3ynBWY3Owk5aerCRJz63iFt841RNueqzR0zjVTmre38SbFrII3NAYeFpwQfSvLWap/21+ys6/pGKnFBmhM9hnzvYG6CQ2BOfILfPOev/BxFSe7OtFs34sOtQ69k+JenFyXQMU7RdAL+bPEC63bJ8YoYX5YgprPN3v1Ja3zfd6UoSY131wEKH3h3cWtZETTAwCwFFvHyOd7pF3XcPTrwtJTn0xWIttJ6+kgOW5usl0giXZRqIGgv4xhHWSswEziUV2JmEPQ=='

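# decrypt() returns bytes; the sentinel (None here) is returned instead if the padding check fails
aes_key = cipher.decrypt(base64.b64decode(Key), None)
print(aes_key.decode())
# X0RlG3RIvYYmJxz1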

Decrypt the PDF file with aes_key:

from Crypto.Cipher import AES

# The 16-byte AES key recovered in the previous step (AES.new expects bytes)
aes_key = b'X0RlG3RIvYYmJxz1'
cryptos = AES.new(aes_key, AES.MODE_ECB)
with open('enc.pdf', 'rb') as fi:
    pdf_data = fi.read()
decrypted_pdf_data = cryptos.decrypt(pdf_data)
with open('dec.pdf', 'wb') as fo:
    fo.write(decrypted_pdf_data)
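
Before and after the decryption step it can help to sanity-check the data. The following is a minimal sketch using only the standard library; the helper names check_ciphertext_length and looks_like_pdf are hypothetical and not part of the original script. It relies on two general facts: AES works on 16-byte blocks, so ECB input must be a whole number of blocks, and a PDF file normally begins with the %PDF- marker.

```python
AES_BLOCK_SIZE = 16  # AES always operates on 16-byte blocks


def check_ciphertext_length(data: bytes) -> bool:
    """ECB decryption needs a non-empty input that is a whole number of blocks."""
    return len(data) > 0 and len(data) % AES_BLOCK_SIZE == 0


def looks_like_pdf(data: bytes) -> bool:
    """A PDF file normally begins with the '%PDF-' marker."""
    return data.startswith(b'%PDF-')


# Illustrative inputs only, not real book data
print(check_ciphertext_length(b'\x00' * 32))
print(looks_like_pdf(b'%PDF-1.7\n'))
```

If the second check fails on a decrypted file, a wrong key or a truncated download is the likely cause.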

Merging the PDFs

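After trying several tools, I found that Adobe Acrobat 2020 gives the best result because it also optimizes the size of the merged PDF. (The first file below was merged with Wondershare PDFelement (万兴PDF专家); its size equals the sum of all the per-page files.)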

image-20220421113345297

image-20220421113024753
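
Whichever merge tool is used, the per-page files have to be fed to it in page order, and a plain alphabetical sort puts 10.pdf before 2.pdf. Below is a minimal sketch of a numeric sort, assuming the pages were saved locally under hypothetical names such as 1.pdf, 2.pdf, and so on; the original post does not say how the files were named.

```python
import re
from pathlib import Path


def page_number(path: Path) -> int:
    """Extract the first run of digits in the file name as the page number."""
    match = re.search(r'\d+', path.stem)
    if match is None:
        raise ValueError(f'no page number in {path.name!r}')
    return int(match.group())


def sorted_pages(paths):
    """Return the paths ordered by numeric page number, not alphabetically."""
    return sorted(paths, key=page_number)


# Hypothetical file names, used only to show the ordering
pages = [Path('10.pdf'), Path('2.pdf'), Path('1.pdf')]
print([p.name for p in sorted_pages(pages)])
# ['1.pdf', '2.pdf', '10.pdf']
```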

Adding bookmarks to the PDF

The bookmarks shown during online trial reading:

image-20220420172044405

/transfer/tableofcontent/list returns the bookmark information.

image-20220420171735838

Once the titles and page numbers have been parsed out with Python, PyPDF2 can be used to add the bookmarks.

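import PyPDF2  # uses the legacy camelCase API (PyPDF2 before 3.0)

filename = '../63876872.pdf'
pdf_reader = PyPDF2.PdfFileReader(filename)
pdf_writer = PyPDF2.PdfFileWriter()
pdf_writer.cloneDocumentFromReader(pdf_reader)
# Title and Page come from the parsed table of contents (call this once per entry);
# Page is 1-based, while addBookmark expects a 0-based page index
pdf_writer.addBookmark(Title, Page-1)
with open(filename.replace('.pdf', '_bookmark.pdf'), 'wb') as out_pdf:
    pdf_writer.write(out_pdf)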

In practice this raised an error, which was resolved by following https://www.codetd.com/en/article/11823498#ValueError_Type_Outlines_Count_0_is_not_in_list_120.

image-20220420173750377
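
The parsing of titles and page numbers is not shown in the post. As a rough sketch of what it could look like, the code below flattens a nested table of contents into (level, title, page) tuples. The field names Title, Page and Children and the sample data are assumptions made for illustration; the real layout of the /transfer/tableofcontent/list response may well differ.

```python
import json


def flatten_toc(entries, level=0):
    """Yield (level, title, page) for every entry, depth first."""
    for entry in entries:
        yield level, entry['Title'], entry['Page']
        # Children may be missing or null; treat both as "no children"
        yield from flatten_toc(entry.get('Children') or [], level + 1)


# Hypothetical sample; field names are assumed, not taken from the real API
sample = json.loads('''
[
  {"Title": "Chapter 1", "Page": 1, "Children": [
    {"Title": "1.1 Overview", "Page": 2, "Children": []}
  ]},
  {"Title": "Chapter 2", "Page": 10, "Children": null}
]
''')

for level, title, page in flatten_toc(sample):
    print('  ' * level + f'{title} -> page {page}')
```

Each tuple's title and page can then be passed as Title and Page to the addBookmark call shown earlier.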