I have built TQMemory as a high-performance, in-memory cache that can be used as a drop-in replacement for Memcached. It uses the same CLI flags, speaks the same protocol, and under some conditions it exceeds Memcached's performance. When used as a Go package, it bypasses the network entirely and achieves over 15 million GET requests per second (about 48x faster than Memcached over sockets).

See: https://github.com/mevdschee/tqmemory

What is TQMemory?

TQMemory is implemented in Go and can be used both as an embedded library and as a standalone server. It speaks the Memcached protocol (both text and binary), which means that in server mode it works out of the box with existing clients.
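
For example, any existing Memcached client should be able to talk to a TQMemory server. Here is a minimal sketch using the popular gomemcache client; it assumes TQMemory is running locally on Memcached's default port 11211:

package main

import (
    "fmt"
    "log"

    "github.com/bradfitz/gomemcache/memcache"
)

func main() {
    // Connect to a TQMemory server as if it were Memcached.
    // Assumption: TQMemory listens on the default port 11211.
    mc := memcache.New("127.0.0.1:11211")

    // SET and GET over the standard Memcached protocol.
    if err := mc.Set(&memcache.Item{Key: "greeting", Value: []byte("hello")}); err != nil {
        log.Fatal(err)
    }
    item, err := mc.Get("greeting")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(item.Value))
}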

Performance

TQMemory is optimized for write-heavy workloads with larger values, such as SQL query results. Benchmarks were run with Unix sockets, 10 clients, and 10KB values:

[Benchmark chart: SET/GET requests per second, TQMemory vs. Memcached]

With 4 threads, TQMemory achieves 233K SET and 261K GET requests per second, compared to Memcached’s 161K SET and 329K GET. This means SET operations are about 45% faster than Memcached, while GET performance is roughly 21% slower.

I measured that 99% of TQMemory's overhead was due to network I/O. That is why embedding TQMemory in your Go application makes it about 48x faster than Memcached over sockets: with 4 threads, the embedded package achieves 439K SET and 15.9M GET requests per second. In certain niche use cases, such high performance can be a game-changer.
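
To put a number like 15.9M GETs per second in context, embedded throughput can be measured with a standard Go benchmark. This is a minimal sketch, assuming the NewShardedCache/Get/Set signatures shown in the usage example later in this article:

package tqmemory_test

import (
    "testing"

    "github.com/mevdschee/tqmemory/pkg/tqmemory"
)

// BenchmarkGet measures the embedded GET path across all CPUs,
// mirroring the article's setup: 4 shards and 10KB values.
func BenchmarkGet(b *testing.B) {
    cache := tqmemory.NewShardedCache(4, 512*1024*1024)
    cache.Set("key", make([]byte, 10*1024), 0, 300)
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            if _, _, err := cache.Get("key"); err != nil {
                b.Error(err)
            }
        }
    })
}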

How it Works

TQMemory uses a sharded, worker-based architecture:

  • Sharded Cache: Keys are distributed across workers via FNV-1a hash
  • Per-Shard Workers: Each shard has a dedicated goroutine for writes
  • Direct GET Path: Reads use RWMutex for concurrent access (no channel overhead)
  • LRU Eviction: When memory limit is reached, least recently used items are evicted
  • Batched LRU: LRU updates are processed every 100ms to reduce contention

Each worker maintains its own map for O(1) lookups, a min-heap for TTL expiration, and a linked list for LRU ordering. This gives predictable latency and makes the concurrency easy to reason about.
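
To make the design concrete, here is a minimal sketch of a sharded cache along these lines. The names are invented and this is not TQMemory's actual source; the TTL min-heap and the 100ms LRU batching are omitted for brevity:

package sketch

import (
    "container/list"
    "hash/fnv"
    "sync"
)

// shard holds per-worker state: a map for O(1) lookups, an LRU
// list for eviction order, and a channel of writes served by a
// dedicated goroutine.
type shard struct {
    mu     sync.RWMutex      // guards items for the direct GET path
    items  map[string]*entry // O(1) key lookup
    lru    *list.List        // most recently used at the front
    writes chan writeOp      // writes are serialized per shard
}

type entry struct {
    value   []byte
    lruElem *list.Element
}

type writeOp struct {
    key   string
    value []byte
}

type cache struct {
    shards []*shard
}

// shardFor distributes keys across shards with an FNV-1a hash.
func (c *cache) shardFor(key string) *shard {
    h := fnv.New32a()
    h.Write([]byte(key))
    return c.shards[h.Sum32()%uint32(len(c.shards))]
}

// Get reads directly under a read lock: no channel hop.
func (c *cache) Get(key string) ([]byte, bool) {
    s := c.shardFor(key)
    s.mu.RLock()
    defer s.mu.RUnlock()
    if e, ok := s.items[key]; ok {
        return e.value, true
    }
    return nil, false
}

// Set hands the write to the shard's worker goroutine.
func (c *cache) Set(key string, value []byte) {
    c.shardFor(key).writes <- writeOp{key: key, value: value}
}

// worker is the single goroutine that mutates a shard.
func (s *shard) worker() {
    for op := range s.writes {
        s.mu.Lock()
        if e, ok := s.items[op.key]; ok {
            e.value = op.value
            s.lru.MoveToFront(e.lruElem)
        } else {
            e := &entry{value: op.value}
            e.lruElem = s.lru.PushFront(e)
            s.items[op.key] = e
        }
        s.mu.Unlock()
    }
}

Because only the worker goroutine mutates a shard, writes never contend with each other, while the RWMutex lets many readers proceed concurrently.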

Use Case: SQL Query Result Caching

The primary use case I have built TQMemory for is caching expensive database queries. When embedded as a Go package, you get near-instant cache hits:

import "github.com/mevdschee/tqmemory/pkg/tqmemory"

// Initialize: 4 shards, 512MB memory limit
cache := tqmemory.NewShardedCache(4, 512*1024*1024)

func GetProducts(db *sql.DB, categoryID int) ([]Product, error) {
    key := fmt.Sprintf("products:cat:%d", categoryID)
    
    // Cache hit: ~15M RPS capable
    if data, _, err := cache.Get(key); err == nil {
        var products []Product
        json.Unmarshal(data, &products)
        return products, nil
    }
    
    // Cache miss: query database
    products, err := queryProductsFromDB(db, categoryID)
    if err != nil {
        return nil, err
    }
    
    // Cache for 5 minutes
    data, _ := json.Marshal(products)
    cache.Set(key, data, 0, 300)
    
    return products, nil
}

Conclusion

TQMemory is a specialized tool for Go developers who want blazing-fast, Memcached-like in-process caching. I cannot recommend it as a replacement for Memcached as a network service: Memcached's read performance is better (GETs are roughly 20% faster), and TQMemory is not battle-tested, while Memcached is rock solid.

Disclaimer: I built TQMemory as a learning project in high-performance caching. While the benchmarks are promising, test thoroughly before using it in production.

See: https://github.com/mevdschee/tqmemory