FileManager #2013

Closed
wants to merge 15 commits into from
Conversation

@gongweibao (Contributor) commented May 4, 2017

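This summarizes the discussion in #1902.
The commands and the detailed parameter sections are still being added.
It may be easier to review here.

Maybe I should write the document in English; otherwise colleagues abroad will not be able to read it.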

## Objective
This document describes the modules and workflows involved when users upload, download, and manage their own files on PaddlePaddle Cloud.

<image src=./src/filemanager.png width=8900>
Contributor

In this diagram, cloud.paddlepaddle.org also sits behind the ingress.

As I understand it, the "Module" section below explains this diagram. If so, it does not need a separate section; placing it right under the diagram would make it easier for readers to follow.

Contributor Author

Done

@@ -0,0 +1,79 @@
# Design doc: FileManager
## Objective
Contributor

If the document is entirely in Chinese, the headings should consistently be in Chinese as well.

Contributor Author

Done

<image src=./src/filemanager.png width=8900>

## Module
### PFS Client
Contributor

PFS needs to be defined first. The terms fileserver, FileManager, and PFS should be used consistently, and PFS should match the names used in the diagram.

Contributor Author

Done

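- Performs HTTP forwarding and load balancing
- Configure session persistence so that requests from one user are routed to a fixed machine, reducing the chance of conflicting writes.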


Contributor

CephFS in the diagram above should also be explained here.

Contributor Author

Done

- [RESTAPI](./RESTAPI.md) interface

## Explanation
### Why there is a chunk abstraction:
Contributor

"Why there is a chunk abstraction:" => "Chunked file upload"

Contributor Author

Done

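### About authentication
> Every party in a communication needs its own identity certificate. A company can self-sign a CA certificate and use it to sign certificates for each employee and each program. Then, as long as the company's own CA certificate is preinstalled on every computer, it can be used to verify the identity of every employee and program. This is common practice at many companies today.

Authentication depends on whether a user or program holds a crt that identifies it, and whether a trusted CA confirms that the crt is valid. The crts described here fall into two parts: the crt that the Client program uses to access FileServer, which we call the Client crt, and the crt that FileServer uses to access CephFS, which we call the CephFS crt.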
Contributor

Each user only needs one cert.

Contributor Author

Done

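- How Client and FileServer authenticate each other
`cloud.paddlepaddle.org` needs its own CA, and a private key (key) and crt must also be generated for FileServer and for each registered user. After a user downloads the CA, their own key, and their crt to the local machine, the Client program can use them to perform mutual authentication with FileServer.

- Two ways for CephFS to verify the identity of FileServer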
Contributor

Describe only the one method that is planned for implementation.

Contributor Author

Done

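- Option 2: there is only one CephFS crt, the admin crt, which has read and write permission on all volumes.
FileServer extracts the Client's identity (username) from the Client crt and restricts which volumes it can operate on. We choose this option.

### About file transfer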
Contributor

This part could be placed above the introduction of chunks.

Contributor Author

Done

ls - list directory contents or a file's attributes

# Synopsis
` ls [OPTION]... <PFSPath>`
Contributor

Are these commands run directly on the client? If so, it should be paddle pfs ls ....

The parameters of these commands should all map to the REST API.

Contributor Author (@gongweibao, May 8, 2017)

I looked at the subcommands of aws.s3, such as ls; writing it this way should also be fine.

## Objective
This document describes the modules and workflows involved when users upload, download, and manage their own files on PaddlePaddle Cloud.

<image src=./src/filemanager.png width=8900>
Contributor

I am not very familiar with ingress. Why is the ingress split into multiple pods in the diagram?

Contributor (@Yancey1989, May 7, 2017)

An Ingress actually consists of multiple Pods, each running one Nginx instance.

## Objective
This document describes the modules and workflows involved when users upload, download, and manage their own files on PaddlePaddle Cloud.

<image src=./src/filemanager.png width=8900>
Contributor

I think the pods can be removed; they are somewhat confusing. This section is about modules and workflows, not about how each program is run.

Contributor Author

Done

<image src=./src/filemanager.png width=8900>

## Module
### PFS Client
Contributor

The full name should appear before an abbreviation is used: PFS Client -> Paddle Filesystem (PFS) Client

Contributor Author

Done


### FileServer
Functionality:
- An HttpServer written with gorpc
Contributor

Abbreviations should be capitalized, for example HTTP and URL.
This should be changed to: an HTTPServer written with goRPC

Contributor Author

Done

Functionality:
- An HttpServer written with gorpc
- Responds to external REST API requests
- Runs in kubernets
Contributor

-> kubernetes

Contributor Author

My mistake. Done.

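- Two ways for CephFS to verify the identity of FileServer
- Option 1: every user has their own separate CephFS crt.
When a user accesses their space, FileServer must read that crt before it can complete operations on CephFS.
- Option 2: there is only one CephFS crt, the admin crt, which has read and write permission on all volumes.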
Contributor

One question, and I may be wrong: why does fileserver need to authenticate when accessing ceph? We start fileserver ourselves on the internal network; can it not be trusted?

Contributor Author

Indeed, it is not needed. Done.

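FileServer extracts the Client's identity (username) from the Client crt and restricts which volumes it can operate on. We choose this option.

### About file transfer
The key to file transfer is that the Client compares the checkSums of the src and dst file chunks; chunks that do not match are transferred by the Client with a Get or Post of the chunk. This is how resumable data transfer is achieved. When uploading, a file can be writable by multiple FileServers, so conflicts are possible; the Client must check, when it Posts the last chunk, that the MD5 of the dest file matches the local file.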
Contributor

Are Get and Post the HTTP GET and POST? goRPC is being used, which does not seem to have much to do with HTTP GET/POST.

Contributor Author

Done

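FileServer extracts the Client's identity (username) from the Client crt and restricts which volumes it can operate on. We choose this option.

### About file transfer
The key to file transfer is that the Client compares the checkSums of the src and dst file chunks; chunks that do not match are transferred by the Client with a Get or Post of the chunk. This is how resumable data transfer is achieved. When uploading, a file can be writable by multiple FileServers, so conflicts are possible; the Client must check, when it Posts the last chunk, that the MD5 of the dest file matches the local file.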
Contributor

“The Client must check, when it Posts the last chunk, that the MD5 of the dest file matches the local file.”
Is this check done by the client or by the server?


- Optimizations:

- When the dst file does not exist, the Get step can be skipped, leaving only Post.
Contributor

I do not quite understand what Get and Post refer to here.

Contributor Author (@gongweibao, May 10, 2017)

Done. This concept should not be exposed here.

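- Optimizations:

- When the dst file does not exist, the Get step can be skipped, leaving only Post.
- The chunk information of a file can be cached so that it does not have to be read and computed every time a transfer starts. Because this is fairly complex, it will not be done in the first phase.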
Contributor

I think this can be left out of the design doc. It is only a possible optimization, and it is not certain that it is effective or necessary.

Contributor Author

Done

## Objective
This document describes the modules and workflows involved when users upload, download, and manage their own files on PaddlePaddle Cloud.

<image src=./src/filemanager.png width=8900>
Contributor

Some words in the diagram should be capitalized, for example Api=>API, ingress=>Ingress, cephfs=>CephFS

Contributor Author

Done

## Objective
This document describes the modules and workflows involved when users upload, download, and manage their own files on PaddlePaddle Cloud.

<image src=./src/filemanager.png width=8900>
Contributor (@Yancey1989, May 7, 2017)

An Ingress actually consists of multiple Pods, each running one Nginx instance.

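- Option 1: every user has their own separate CephFS crt.
When a user accesses their space, FileServer must read that crt before it can complete operations on CephFS.
- Option 2: there is only one CephFS crt, the admin crt, which has read and write permission on all volumes.
FileServer extracts the Client's identity (username) from the Client crt and restricts which volumes it can operate on. We choose this option.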
Contributor

Could you briefly state the reasons for choosing the second option?
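
For illustration, the restriction described in the second option could look roughly like the sketch below. It is not taken from the PR: it assumes the username has already been extracted from the Client crt by the TLS layer, and it assumes a hypothetical layout in which each user's volume is the directory `/pfs/<username>` (the `/pfs` prefix is borrowed from the example path `/pfs/mydir/text1.txt`). The names `PFS_ROOT`, `user_volume`, and `is_allowed` are made up for this example.

```python
import posixpath

# Assumption: PFS paths live under /pfs, as in the example /pfs/mydir/text1.txt.
PFS_ROOT = "/pfs"


def user_volume(username: str) -> str:
    """Hypothetical layout: one volume directory per user under PFS_ROOT."""
    return posixpath.join(PFS_ROOT, username)


def is_allowed(username: str, path: str) -> bool:
    """Return True only if `path` stays inside the volume of `username`."""
    # Reject usernames that could themselves escape the volume directory.
    if not username or "/" in username or username in (".", ".."):
        return False
    # Collapse "." and ".." segments before comparing prefixes.
    normalized = posixpath.normpath(path)
    volume = user_volume(username)
    return normalized == volume or normalized.startswith(volume + "/")
```

With this layout, `is_allowed("alice", "/pfs/alice/../bob/x")` is rejected because the path is normalized before the prefix check, so a single admin credential on the CephFS side does not by itself let one user reach another user's volume.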

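FileServer extracts the Client's identity (username) from the Client crt and restricts which volumes it can operate on. We choose this option.

### About file transfer
The key to file transfer is that the Client compares the checkSums of the src and dst file chunks; chunks that do not match are transferred by the Client with a Get or Post of the chunk. This is how resumable data transfer is achieved. When uploading, a file can be writable by multiple FileServers, so conflicts are possible; the Client must check, when it Posts the last chunk, that the MD5 of the dest file matches the local file.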
Contributor

"chunks that do not match are transferred by the Client with a Get or Post of the chunk"

When is Get used, and when is Post?

Contributor Author

Done
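
The checksum comparison quoted in this thread can be sketched as follows. This is only an illustration in Python, not the PR's implementation: the per-chunk checksum algorithm is assumed to be MD5 (the text only names MD5 for the whole-file check), and the function names `chunk_checksums` and `chunks_to_transfer` are hypothetical. The sketch says nothing about which RPC carries the chunk, which is the open question above.

```python
import hashlib
from typing import List, Sequence


def chunk_checksums(content: bytes, chunk_size: int) -> List[str]:
    """Checksum of each fixed-size chunk, in file order (MD5 is an assumption)."""
    return [
        hashlib.md5(content[i:i + chunk_size]).hexdigest()
        for i in range(0, len(content), chunk_size)
    ]


def chunks_to_transfer(src_sums: Sequence[str], dst_sums: Sequence[str]) -> List[int]:
    """Indices of source chunks that are missing or different at the destination."""
    return [
        index
        for index, checksum in enumerate(src_sums)
        if index >= len(dst_sums) or dst_sums[index] != checksum
    ]
```

After an interrupted transfer only the returned indices need to be sent again, which is what makes the transfer resumable; an empty destination simply yields every index.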

## References
- [Do you see tls?](https://github.com/k8sp/tls/blob/master/README.md)
- [s3](http://docs.aws.amazon.com/cli/latest/reference/s3/)
- linux man document
Contributor

I suggest adding a link here as well, for linux man document.

Contributor Author

Done

## Objective
This document describes the modules and workflows involved when users upload, download, and manage their own files on PaddlePaddle Cloud.

<image src=./src/filemanager.png width=8900>
Contributor (@Yancey1989, May 7, 2017)

I think we can avoid exposing Kubernetes concepts. Ingress=>Layer 7 Load Balancer is a more general description; otherwise we would also need to explain what an Ingress is and what a Pod is, which is not the focus of this Design Doc.

Contributor Author

Done

@@ -0,0 +1,83 @@
# FileManager Design Doc
## Glossary
- PFS: short for Paddle cloud File System, as opposed to the Local File System.
Contributor

Paddle cloud=>PaddlePaddle Cloud

Contributor Author

Done


### Ingress
- Runs in kubernetes
- Performs HTTP forwarding and load balancing
Contributor

"Performs HTTP forwarding and load balancing" => "Provides Layer 7 reverse proxying and load balancing"

Contributor Author

Done

- Runs in kubernetes
- [RESTAPI](./RESTAPI.md) interface

## File transfer
Contributor

Should this be FileManager?

Contributor Author

No. The explanation was not well put, though, and it has been corrected. Done

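## Glossary
- PFS: short for Paddle cloud File System, as opposed to the Local File System.
- FileServer: the server side that receives users' file management commands
- FileManager: the system with which users manage their own files on PFS is called FileManager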
Contributor

I do not see where FileManager is introduced.

Contributor Author

Done

Main features include:

- Provides common command-line file management commands for managing files
- Supported commands are listed [Here] (./pfs/pfs.md)
Contributor

The markdown syntax of the hyperlink is wrong.

Contributor Author

My mistake. Done

- The following command copies a single file to pfs

```
paddle pfs cp ./text1.txt /pfs/mydir/text1.txt
```
Contributor

Contributor Author

Done

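- Written in Golang, so it runs across platforms

- Mutual verification
PFSClient needs to perform mutual verification with the Ingress<sup>[tls](#tls)</sup>. Every user must first register on `cloud.paddlepaddle.org`, apply for user space, and download the system-generated Key, CRT, and CA to the local machine before using PFSClient.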
Contributor

Have the implementation details of ingress sticky connections and mutual verification been worked out? :) Please confirm with Wu Yi and Yan Xu that this is feasible :)

Contributor Author

Oh, the explanation is here

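## File transfer optimization

### Chunked file transfer
User files may be fairly large, so uploading them to the Cloud or downloading them to the local machine can take a long time, and the network may become unstable during the transfer. To address these problems we introduce the concept of a Chunk, which consists of its offset in the file, the data, the data length, and a checksum. Uploading and downloading file content are both implemented through operations on Chunks. Because a Chunk is small (256K by default), a single transfer finishes quickly and is less likely to fail. When PFSClient finishes transferring the last Chunk, it checks whether the MD5 of the desttination file matches that of the source file.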
Contributor

desttination -> destination

Contributor Author

Done
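
To make the Chunk definition in the hunk above concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the PR's code: the four fields come from the text (offset, data, data length, checksum), the 256K default is read as 256 KiB, the per-chunk checksum is assumed to be MD5, and the names `Chunk`, `split_into_chunks`, `reassemble`, and `md5_matches` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator

# "256K by default" in the design text, read here as 256 KiB.
DEFAULT_CHUNK_SIZE = 256 * 1024


@dataclass(frozen=True)
class Chunk:
    offset: int    # position of the chunk within the file
    data: bytes    # chunk payload
    length: int    # number of payload bytes
    checksum: str  # assumption: hex MD5 digest of the payload


def split_into_chunks(content: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield consecutive Chunks that together cover `content`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(content), chunk_size):
        data = content[offset:offset + chunk_size]
        yield Chunk(offset, data, len(data), hashlib.md5(data).hexdigest())


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild the file content from chunks, ordered by offset."""
    return b"".join(c.data for c in sorted(chunks, key=lambda c: c.offset))


def md5_matches(source: bytes, destination: bytes) -> bool:
    """Whole-file check made after the last Chunk has been transferred."""
    return hashlib.md5(source).hexdigest() == hashlib.md5(destination).hexdigest()
```

A 640 KiB file split with the default size gives two full chunks and one half-size chunk; the final `md5_matches` call corresponds to the MD5 comparison between the source and destination files described in the text.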


```
GET /file: Get attribute of files
POST /file: Touch a file
```
Contributor

What is touching a file for, and why should it be supported?

Contributor Author

The semantics of touch are: create the file if it does not exist, and do nothing if it does... I will write this more clearly.

Contributor

I do not follow. Which paddle pfs command corresponds to touching a file?

Contributor Author

Done


```
GET /dir: List all files in a directory
POST /dir: Touch a directory
```
Contributor

Is the purpose of touch directory the same as mkdir? Would writing it directly as create a directory be easier to understand?

Contributor Author

Done

```
paddle pfs ls /pfs/mydir/text1.txt
```

Output
Contributor

Output tends to change as the program changes (for example, a progress bar might be added), so let's not put it in the design doc.
There are many more Output sections below.

Contributor Author

Done

If destination already exists, please [rm](rm.md) it first.

```
mv [OPTION]...
```
Contributor (@helinwang, May 10, 2017)

The line `<LocalPath> <PFSPath> or <PFSPath> <LocalPath> or <PFSPath> <PFSPath>` is a bit confusing for readers. Could you see how to write it more clearly?

Contributor Author

Done
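
One way to read the three argument forms discussed in this thread is as three transfer directions. The Python sketch below is only an illustration: it assumes that a PFS path is recognized by the `/pfs/` prefix seen in the example `paddle pfs cp ./text1.txt /pfs/mydir/text1.txt`, and the names `PFS_PREFIX`, `is_pfs_path`, and `transfer_kind` are hypothetical, not part of the proposed CLI.

```python
# Assumption: PFS paths start with /pfs/, as in /pfs/mydir/text1.txt.
PFS_PREFIX = "/pfs/"


def is_pfs_path(path: str) -> bool:
    """Treat any path under the assumed /pfs/ prefix as a PFS path."""
    return path.startswith(PFS_PREFIX)


def transfer_kind(src: str, dst: str) -> str:
    """Classify a <src> <dst> pair into one of the three supported forms."""
    src_remote, dst_remote = is_pfs_path(src), is_pfs_path(dst)
    if src_remote and dst_remote:
        return "pfs-to-pfs"   # <PFSPath> <PFSPath>
    if dst_remote:
        return "upload"       # <LocalPath> <PFSPath>
    if src_remote:
        return "download"     # <PFSPath> <LocalPath>
    raise ValueError("at least one of the two paths must be a PFS path")
```

Spelling the synopsis out as one line per direction, as the comments in `transfer_kind` do, may also address the readability concern raised above.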

# Description

```
-r, -R, --recursive
```
Contributor

Let's have only one way to do each thing.

Contributor Author

Done

-r, -R, --recursive
remove directories and their contents recursively

--page-size (integer)
Contributor

Why would rm have a page-size?

Contributor Author

Done


# Synopsis
` sync [OPTIONS]...
<LocalPath> <PFSPath> or <PFSPath> <LocalPath> or <PFSPath> <PFSPath>`
Contributor (@helinwang, May 10, 2017)

`<LocalPath> <PFSPath> or <PFSPath> <LocalPath> or <PFSPath> <PFSPath>` is a bit hard to understand :)

Contributor Author

Done

@gongweibao (Contributor Author)

There are too many comments. I am closing this PR and will open a new one.

@gongweibao gongweibao closed this May 12, 2017
4 participants