I recently got bitten by a blocking channel send in Go that caused an entire program to hang. I had a goroutine running on a timer to check something periodically and send results to a channel. Another goroutine was consuming from the channel. Notably, the channel was intentionally unbuffered, since I assumed the consumer would always be picking things up.
Here’s the problem: the consumer sometimes took longer than the producer’s timer interval, and the producer would get stuck in a blocked state trying to send another message. Worse: sometimes the result from the producer goroutine would cause the rest of the app to shut down, blocking the producer indefinitely. The parent WaitGroup would then hang and the program would not respond to signals, because the producer never got back to the select block that checks ctx.Done().
The solution is to wrap the send in a select block that either respects the context’s Done() channel or has a default case.
Here’s an example program.
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// runSchedule is the producer: it sends a result on every tick.
func runSchedule(ctx context.Context, results chan<- string) {
	ticker := time.NewTicker(time.Second * 1)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Blocks until the consumer receives; ctx.Done() is not checked here.
			results <- "bye"
		}
	}
}

// runConsumer is the consumer: it handles one result at a time.
func runConsumer(ctx context.Context, results <-chan string) {
	for {
		select {
		case <-ctx.Done():
			return
		case result := <-results:
			// pretend doing things with this result takes some time, specifically
			// time longer than the interval above.
			<-time.After(time.Second * 5)
			fmt.Println(result)
		}
	}
}

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()

	var wg sync.WaitGroup
	results := make(chan string)

	wg.Add(1)
	go func() {
		runSchedule(ctx, results)
		wg.Done()
	}()

	wg.Add(1)
	go func() {
		runConsumer(ctx, results)
		wg.Done()
	}()

	wg.Wait()
}
The above can hang: a Ctrl+C may cancel it, or it may just hang indefinitely. It depends on whether the select block in the producer went with ctx.Done() or the timer.
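The outcome varies because, when more than one case in a select is ready, Go picks one of them pseudo-randomly. Here is a minimal sketch of that behavior. The names countSelectChoices, doneCh and tickCh are hypothetical stand-ins for ctx.Done() and ticker.C, not part of the program above.

```go
package main

import "fmt"

// countSelectChoices runs a select n times with two channels that are both
// ready and reports how often each case was chosen.
func countSelectChoices(n int) (done, tick int) {
	doneCh := make(chan struct{})
	close(doneCh) // a closed channel is always ready to receive, like a canceled ctx.Done()

	tickCh := make(chan struct{}, 1)
	for i := 0; i < n; i++ {
		tickCh <- struct{}{} // make the "ticker" case ready as well
		select {
		case <-doneCh:
			done++
			<-tickCh // drain so the next iteration starts clean
		case <-tickCh:
			tick++
		}
	}
	return done, tick
}

func main() {
	done, tick := countSelectChoices(1000)
	fmt.Printf("done chosen %d times, tick chosen %d times\n", done, tick)
}
```

Both counters should come out well above zero, which is why the producer sometimes takes the ticker case after cancellation and walks straight back into the blocking send.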
One possible solution is to select around the send and respect ctx.Done():
func runSchedule(ctx context.Context, results chan<- string) {
	ticker := time.NewTicker(time.Second * 1)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Wait for the consumer, but give up if the context is canceled.
			select {
			case results <- "bye":
				// noop
			case <-ctx.Done():
				return
			}
		}
	}
}
Or, depending on the program, just drop the message and let the outer loop continue. I went with this solution in my own app, since the consumer goroutine processing a message meant that the app was shutting down.
// Needs "log" added to the imports.
func runSchedule(ctx context.Context, results chan<- string) {
	ticker := time.NewTicker(time.Second * 1)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			select {
			case results <- "bye":
				// noop
			default:
				// No receiver is ready (or the buffer is full): drop and move on.
				log.Println("results channel is full, dropping message")
			}
		}
	}
}
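To see the non-blocking send in isolation, here is a small sketch. The trySend helper is hypothetical (it is not in my app); it only wraps the select-with-default pattern and reports whether the value was handed off.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the value was
// handed off. Hypothetical helper for illustration only.
func trySend(results chan<- string, msg string) bool {
	select {
	case results <- msg:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan string)
	fmt.Println(trySend(unbuffered, "bye")) // false: no receiver is waiting

	buffered := make(chan string, 1)
	fmt.Println(trySend(buffered, "bye")) // true: fits in the buffer
	fmt.Println(trySend(buffered, "bye")) // false: buffer is already full
}
```

Note that on an unbuffered channel the send only succeeds if a receiver is already waiting at that exact moment, so this approach drops messages whenever the consumer is busy.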
Another option: make a child context with a timeout so the send doesn’t block indefinitely (and the timer can continue to do its thing), while still respecting the parent context’s cancellation.
// Needs "log" added to the imports.
func sendResult(ctx context.Context, results chan<- string) {
	// The timeout is shorter than the ticker interval, so a send never
	// overlaps the next tick.
	childCtx, cancel := context.WithTimeout(ctx, time.Millisecond*500)
	defer cancel()
	select {
	case <-childCtx.Done():
		// Also fires when the parent ctx is canceled, not only on timeout.
		log.Println("timeout sending result, dropping")
	case results <- "result":
		// noop
	}
}

func runSchedule(ctx context.Context, results chan<- string) {
	ticker := time.NewTicker(time.Second * 1)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sendResult(ctx, results)
		}
	}
}
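One wrinkle with the child context: its Done() channel closes both when the timeout elapses and when the parent is canceled. If the caller cares about the difference, a variant along these lines could report it. This is only a sketch; sendWithTimeout, errSendTimeout, errCanceled and demo are hypothetical names, not something from my app.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinel errors for the two ways a send can be abandoned.
var (
	errSendTimeout = errors.New("timed out sending result")
	errCanceled    = errors.New("parent context canceled")
)

// sendWithTimeout tries to send msg for at most timeout and reports why it
// gave up, if it did.
func sendWithTimeout(ctx context.Context, results chan<- string, msg string, timeout time.Duration) error {
	childCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	select {
	case results <- msg:
		return nil
	case <-childCtx.Done():
		// childCtx is done either because the timeout elapsed or because
		// the parent is done; the parent's Err tells the two apart.
		if ctx.Err() != nil {
			return errCanceled
		}
		return errSendTimeout
	}
}

// demo exercises the three outcomes: timeout, cancellation, and delivery.
func demo() (timedOut, canceled, delivered error) {
	nobodyListening := make(chan string)
	timedOut = sendWithTimeout(context.Background(), nobodyListening, "result", 10*time.Millisecond)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // parent is already canceled before the send starts
	canceled = sendWithTimeout(ctx, nobodyListening, "result", time.Second)

	buffered := make(chan string, 1)
	delivered = sendWithTimeout(context.Background(), buffered, "result", time.Second)
	return timedOut, canceled, delivered
}

func main() {
	timedOut, canceled, delivered := demo()
	fmt.Println(timedOut)
	fmt.Println(canceled)
	fmt.Println(delivered)
}
```

With something like this, the ticker loop could return on errCanceled and just log and continue on errSendTimeout.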