Beware Blocking Channel Sends in Go

I recently got bitten by a blocking channel send in Go that caused an entire program to hang. I had a goroutine running on a timer to check something periodically and send the results to a channel, and another goroutine consuming from that channel. Notably, the channel was intentionally unbuffered, as I assumed the consumer would always be picking things up.

Here’s the problem: the consumer sometimes took longer than the producer’s timer interval, so the producer would get stuck in a blocked state trying to send another message. Worse, sometimes a result from the producer goroutine would cause the rest of the app to shut down. With the consumer gone, the producer stayed blocked on its send indefinitely, so it never got back to the select block that checks ctx.Done(). The parent WaitGroup then hung and the program stopped responding to signals.

The solution is to wrap the send in a select block that either respects the context’s Done() channel or has a default case.

Here’s an example program.

package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

func runSchedule(ctx context.Context, results chan<- string) {
	ticker := time.NewTicker(time.Second * 1)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			results <- "bye"
		}
	}
}

func runConsumer(ctx context.Context, results <-chan string) {
	for {
		select {
		case <-ctx.Done():
			return
		case result := <-results:
			// pretend doing things with this result takes some time, specifically
			// time longer than the interval above.
			<-time.After(time.Second * 5)
			fmt.Println(result)
		}
	}
}

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()

	var wg sync.WaitGroup

	results := make(chan string)

	wg.Add(1)
	go func() {
		runSchedule(ctx, results)
		wg.Done()
	}()

	wg.Add(1)
	go func() {
		runConsumer(ctx, results)
		wg.Done()
	}()

	wg.Wait()
}

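The above can hang: a Ctrl+C may shut it down cleanly, or the program may just hang indefinitely. It depends on which ready case each select block happens to pick. If the consumer picks ctx.Done() and returns while the producer is blocked on the send, nothing will ever receive that message, and the producer never gets back to its own ctx.Done() check.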

One possible solution is to wrap the send in a select that also respects ctx.Done():

func runSchedule(ctx context.Context, results chan<- string) {
	ticker := time.NewTicker(time.Second * 1)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			select {
			case results <- "bye":
				// noop
			case <-ctx.Done():
				return
			}
		}
	}
}

Or, depending on the program, just drop the message and let the outer loop continue. I went with this solution in my own app, since the consumer goroutine processing a message meant that the app was shutting down.

func runSchedule(ctx context.Context, results chan<- string) {
	ticker := time.NewTicker(time.Second * 1)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			select {
			case results <- "bye":
				// noop
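			default:
				// No receiver is ready right now (or, for a buffered channel,
				// the buffer is full), so drop the message. Needs the "log" import.
				log.Println("results channel is full, dropping message")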
			}
		}
	}
}
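
To see the drop behavior in isolation, here is a minimal sketch. `trySend` is a hypothetical helper name, and the buffered channel with a capacity of 1 is an assumption added purely for illustration (my example program uses an unbuffered channel). With an unbuffered channel, "full" really means "no receiver is ready at this instant".

```go
package main

import "fmt"

// trySend is a hypothetical helper: it attempts a non-blocking send and
// reports whether the value was accepted.
func trySend(ch chan<- string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	// Unbuffered channel with no receiver: the send cannot proceed, so the
	// default case runs and the message is dropped.
	unbuffered := make(chan string)
	fmt.Println(trySend(unbuffered, "bye"))

	// Assumption for illustration: a buffer of 1. The first send fits in
	// the buffer; the second finds it full and is dropped.
	buffered := make(chan string, 1)
	fmt.Println(trySend(buffered, "bye"))
	fmt.Println(trySend(buffered, "bye"))
}
```

The trade-off is that a slow consumer silently loses messages, so this only fits cases where a missed message is acceptable.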

Another option: make a child context with a timeout so the send doesn’t block indefinitely. The ticker can keep doing its thing, and the send still respects the parent context’s cancellation.

func sendResult(ctx context.Context, results chan<- string) {
	childCtx, cancel := context.WithTimeout(ctx, time.Millisecond*500)
	defer cancel()

	select {
	case <-childCtx.Done():
		// Fires on the 500ms timeout, or earlier if the parent ctx is cancelled.
		log.Println("timeout sending result, dropping")
	case results <- "result":
		// noop
	}
}

func runSchedule(ctx context.Context, results chan<- string) {
	ticker := time.NewTicker(time.Second * 1)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sendResult(ctx, results)
		}
	}
}