For the sake of a complete scenario, let's imagine a Golang HTTP server handling a million requests per minute across different endpoints and request types (GET, POST, etc.).
How should I scale this concept? Should I create a separate worker pool and job type for each endpoint, or can I define different jobs and feed them all into a single queue processed by the same pool?
I'd like to keep things simple: if I add a new API endpoint, I shouldn't have to create a new worker pool, so I can focus on the API itself. But performance is also very important.
The code I'm trying to build on comes from the example linked earlier; here is someone else's GitHub gist containing that code:
package main

import (
	"encoding/json"
	"io/ioutil"
	"net/http"
)

var largePool chan func()
var smallPool chan func()

func main() {
	// Start two different sized worker pools (e.g., for different workloads).
	// Cancellation and graceful shutdown omitted for brevity.
	largePool = make(chan func(), 100)
	smallPool = make(chan func(), 10)

	for i := 0; i < 100; i++ {
		go func() {
			for f := range largePool {
				f()
			}
		}()
	}

	for i := 0; i < 10; i++ {
		go func() {
			for f := range smallPool {
				f()
			}
		}()
	}

	http.HandleFunc("/endpoint-1", handler1)
	http.HandleFunc("/endpoint-2", handler2) // naming things is hard, okay?

	http.ListenAndServe(":8080", nil)
}

func handler1(w http.ResponseWriter, r *http.Request) {
	// Imagine a JSON body containing a URL that we are expected to fetch.
	// Light work that doesn't consume many of *our* resources and can be done
	// in bulk, so we put it in the large pool.
	var job struct{ URL string }
	if err := json.NewDecoder(r.Body).Decode(&job); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	go func() {
		largePool <- func() {
			resp, err := http.Get(job.URL)
			if err != nil {
				return
			}
			defer resp.Body.Close()
			// Do something with the response
		}
	}()

	w.WriteHeader(http.StatusAccepted)
}

func handler2(w http.ResponseWriter, r *http.Request) {
	// The request body is an image that we want to do some fancy processing
	// on. That's hard work; we don't want to do too many of them at once, so
	// we put those jobs in the small pool.
	b, err := ioutil.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	go func() {
		smallPool <- func() {
			processImage(b)
		}
	}()

	w.WriteHeader(http.StatusAccepted)
}

func processImage(b []byte) {}
This is just a quick example to get the idea across. How you set up the worker pools isn't really important; all you need is a clever job definition. In the example above I used closures, but you could also define a Job interface:
type Job interface {
	Do()
}

var largePool chan Job
var smallPool chan Job
Now, I wouldn't call the whole worker-pool approach "simple". You said your goal is to limit the number of goroutines doing work at once. That doesn't require workers at all, only a limiter. Here is the same example as above, but using channels as semaphores to limit concurrency:
package main

import (
	"encoding/json"
	"io/ioutil"
	"net/http"
)

var largePool chan struct{}
var smallPool chan struct{}

func main() {
	largePool = make(chan struct{}, 100)
	smallPool = make(chan struct{}, 10)

	http.HandleFunc("/endpoint-1", handler1)
	http.HandleFunc("/endpoint-2", handler2)

	http.ListenAndServe(":8080", nil)
}

func handler1(w http.ResponseWriter, r *http.Request) {
	var job struct{ URL string }
	if err := json.NewDecoder(r.Body).Decode(&job); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	go func() {
		// Block until there are fewer than cap(largePool) light-work
		// goroutines running.
		largePool <- struct{}{}
		defer func() { <-largePool }() // Let everyone know that we are done

		resp, err := http.Get(job.URL)
		if err != nil {
			return
		}
		defer resp.Body.Close()
	}()

	w.WriteHeader(http.StatusAccepted)
}

func handler2(w http.ResponseWriter, r *http.Request) {
	b, err := ioutil.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	go func() {
		// Block until there are fewer than cap(smallPool) hard-work
		// goroutines running.
		smallPool <- struct{}{}
		defer func() { <-smallPool }() // Let everyone know that we are done

		processImage(b)
	}()

	w.WriteHeader(http.StatusAccepted)
}

func processImage(b []byte) {}
It's not clear why you need a worker pool at all. Wouldn't plain goroutines suffice?
If your resources are constrained, consider implementing rate limiting. If they aren't, why not just spawn goroutines as needed?
The best way to learn is to study how others have done things well.
Take a look at https://github.com/valyala/fasthttp:
Fast HTTP package for Go. Tuned for high performance. Zero memory allocations in hot paths. Up to 10x faster than net/http.
They claim:
Serving up to 200K rps from more than 1.5M concurrent keep-alive connections per physical server
That is quite impressive, and I doubt you can do any better with a pool / jobqueue.
As noted before, every request handler in your server already runs in at least one goroutine.
However, you can still use a worker pool for parallel back-end tasks. For example, suppose some of your HTTP handler functions trigger calls to other external APIs and "aggregate" their results, and the order of those calls doesn't matter. In that scenario you can leverage a worker pool: distribute the work so that each task is dispatched to a worker goroutine and runs in parallel.
Sample snippet:
	// build empty response
	capacity := config.GetIntProperty("defaultListCapacity")
	list := model.NewResponseList(make([]model.Response, 0, capacity), 1, 1, 0)

	// search providers
	providers := getProvidersByCountry(country)

	// create a slice of jobResult outputs
	jobOutputs := make([]<-chan job.JobResult, 0)

	// distribute work
	for i := 0; i < len(providers); i++ {
		job := search(providers[i], m)
		if job != nil {
			jobOutputs = append(jobOutputs, job.ReturnChannel)
			// Push each job onto the queue.
			GetInstance().JobQueue <- *job
		}
	}

	// Consume the merged output from all jobs
	out := job.Merge(jobOutputs...)
	for r := range out {
		if r.Error == nil {
			mergeSearchResponse(list, r.Value.(*model.ResponseList))
		}
	}
	return list
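The `job.Merge` fan-in used above lives in the linked repository; as a rough sketch of what such a function typically does, here is a minimal version with a hypothetical `Result` type standing in for `job.JobResult`:

```go
package main

import (
	"fmt"
	"sync"
)

// Result stands in for the job.JobResult type in the snippet above; the
// real type lives in the linked repository.
type Result struct {
	Value int
	Error error
}

// Merge fans in any number of result channels into one channel, closing
// the output once every input has been drained.
func Merge(chans ...<-chan Result) <-chan Result {
	out := make(chan Result)
	var wg sync.WaitGroup
	wg.Add(len(chans))

	for _, c := range chans {
		go func(c <-chan Result) {
			defer wg.Done()
			for r := range c {
				out <- r
			}
		}(c)
	}

	// Close out only after all forwarding goroutines finish, so the
	// consumer's range loop terminates cleanly.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	a := make(chan Result, 1)
	b := make(chan Result, 1)
	a <- Result{Value: 1}
	b <- Result{Value: 2}
	close(a)
	close(b)

	sum := 0
	for r := range Merge(a, b) {
		sum += r.Value
	}
	fmt.Println(sum) // prints 3
}
```

This is the standard fan-in pattern: the aggregating handler can simply range over the merged channel, as the snippet above does, without caring which provider answered first.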
A complete example of a worker pool running "generic" tasks asynchronously: https://github.com/guilhebl/go-offer/blob/master/offer/repo.go