Software Engineering Professional

Bengaluru, India
Platform Engineer, Staff Engineer, Operations Engineer, Go Developer, Full Stack Go Developer
Actively hiring

hackajob is partnering with BT Group to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

 

Job Req ID: 58237

Posting Date: 6-May-2026

Function:  Software Engineering 

Location: Bengaluru

About the role

You will build the fastest, most security-critical services in the Cognium platform. Every agent invocation flows through the code you write: the LLM Router, Tool Gateway, and Guardrail Service are on the hot path of every single request. The performance requirements are unforgiving: p99 latency under 8 seconds with 100 concurrent agents, sub-1ms Cedar policy evaluation, sub-millisecond NATS event publishing, and a 10MB pod footprint.
This is not a CRUD API role. You will write concurrent, high-throughput Go code that targets zero allocations on critical paths. You will reason about goroutine lifecycle, channel backpressure, context propagation, circuit breakers, and gVisor sandbox isolation. You will be the engineer who owns the path between a user's message and the LLM's response.

What you’ll be doing

Core Service Development
•    Design and implement the LLM Router with three routing profiles: bedrock_api (SigV4 auth, 60s timeout, HTTP/1.1 chunked streaming), cloud_api (Bearer token, 30s timeout, SSE), local_inference (vLLM, 120s timeout, no cloud fallback for Restricted data)
•    Build ThrottlingException retry logic with jitter backoff for Bedrock API - distinguish retryable vs non-retryable error codes, implement fallback chain activation
•    Implement Redis prompt cache with cache key hash(system_prompt + tools + model) - TTL management, cache invalidation on agent version change or KB update, cache hit rate metrics
•    Build Tool Gateway execution engine: concurrent tool call dispatch, per-tool timeout enforcement, idempotency key pre-call lookup (dedup on non-idempotent APIs), circuit breaker per tool_id
•    Implement gVisor (runsc) sidecar injection for custom SDK tools - configure sandbox network egress to allowed_egress domains only, enforce memory limits via K8s ResourceLimits
•    Build Vault dynamic secret client in Go - AppRole authentication, lease lifecycle management, just-in-time credential injection for tool API calls, automatic lease renewal goroutine
•    Implement Guardrail Service input pipeline: DLP scan (Presidio via Python sidecar gRPC call), DistilBERT injection classifier (threshold 0.85 block, 0.50–0.85 flag), content filtering regex engine
•    Build Guardrail Service output pipeline: faithfulness LLM-as-Judge call (Claude Haiku via LLM Router, threshold 0.70), content grounding check (threshold 0.65), Detoxify toxicity (threshold 0.90), PII redaction
•    Implement OTel span instrumentation on every service: root_span propagation, span attribute enrichment (agent_id, model, tenant_id, tokens_in, tokens_out, cost_microcents), context propagation across gRPC and HTTP calls
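
As an illustration of the prompt cache key mentioned above, hash(system_prompt + tools + model), here is a minimal sketch. It assumes SHA-256 as the hash and length-prefixes each field so that adjacent fields cannot run together; the function name promptCacheKey, the "prompt:" key prefix, and the sample inputs are hypothetical and not taken from the platform itself.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// promptCacheKey derives a cache key from the system prompt, the tool
// definitions, and the model name. Each field is written with an 8-byte
// length prefix so that ("ab", "c") and ("a", "bc") hash differently.
// The "prompt:" prefix is a hypothetical Redis key namespace.
func promptCacheKey(systemPrompt, tools, model string) string {
	h := sha256.New()
	for _, field := range []string{systemPrompt, tools, model} {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(field)))
		h.Write(n[:])
		h.Write([]byte(field))
	}
	return "prompt:" + hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Sample inputs are placeholders for illustration only.
	key := promptCacheKey("You are a helpful agent.", `[{"name":"search"}]`, "example-model")
	fmt.Println(key)
}
```

Plain concatenation of the three inputs would let different (prompt, tools, model) triples produce the same byte stream, which is why the sketch prefixes each field with its length before hashing.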

Concurrency and Performance
•    Write concurrent Go using goroutines, channels, sync primitives (WaitGroup, Mutex, RWMutex, Once, Pool) - zero data races, verified with the Go race detector in CI
•    Implement backpressure mechanisms using Redis token bucket counters - reject invocations when concurrency ceiling is reached with configurable shed vs queue behaviour
•    Profile and optimise heap allocations on hot paths - use sync.Pool for frequently allocated structs, benchmark with pprof, target zero heap allocations in critical loops
•    Implement streaming response proxy in LLM Router - buffer-free pass-through of SSE and HTTP/1.1 chunked responses to the client with context cancellation propagation
•    Write benchmarks for all critical path functions - BenchmarkXxx with b.ReportAllocs(), track allocation counts per operation, set regression thresholds in CI
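
The sync.Pool and b.ReportAllocs() items above can be sketched together as follows. This is an illustrative example only: writeUsage, sliceWriter, and the "model tokens" line format are hypothetical, and the benchmark is driven through testing.Benchmark so that the file runs as an ordinary program.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strconv"
	"sync"
	"testing"
)

// bufPool hands out reusable scratch buffers so the hot path does not
// allocate a fresh bytes.Buffer on every call.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// writeUsage formats "<model> <tokens>\n" into a pooled buffer and writes
// it to w. The buffer goes back to the pool when the call returns.
func writeUsage(w io.Writer, model string, tokens int64) error {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset()
	buf.WriteString(model)
	buf.WriteByte(' ')
	var scratch [20]byte // large enough for any int64 in base 10
	buf.Write(strconv.AppendInt(scratch[:0], tokens, 10))
	buf.WriteByte('\n')
	_, err := w.Write(buf.Bytes())
	return err
}

// sliceWriter is a tiny io.Writer used only to show the output in main.
type sliceWriter struct{ b []byte }

func (s *sliceWriter) Write(p []byte) (int, error) {
	s.b = append(s.b, p...)
	return len(p), nil
}

// BenchmarkWriteUsage reports allocations per operation for the hot path.
func BenchmarkWriteUsage(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = writeUsage(io.Discard, "example-model", int64(i))
	}
}

func main() {
	w := &sliceWriter{}
	if err := writeUsage(w, "example-model", 128); err != nil {
		panic(err)
	}
	fmt.Print(string(w.b))

	res := testing.Benchmark(BenchmarkWriteUsage)
	fmt.Printf("%d allocs/op, %d ns/op\n", res.AllocsPerOp(), res.NsPerOp())
}
```

In a real repository the benchmark would normally live in a _test.go file and run with go test -bench, and a CI step could compare the reported allocs/op against a stored threshold.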

Integration
•    Implement NATS JetStream publisher in Tool Gateway and Guardrail Service - publish tool execution events and guardrail trigger events to Kafka via NATS with at-least-once delivery guarantee
•    Implement Kafka producer for audit events - exactly-once semantics using idempotent producer + transactions, SHA-256 hash chain continuation per event
•    Write gRPC clients for Vault API, Cedar evaluation service, and Python-based ML sidecars (DistilBERT, Detoxify) - deadline propagation, retry policy, circuit breaker
•    Build Cedar policy evaluation call in the API Gateway filter - <1ms budget for Cedar gRPC call, use connection pooling, implement local decision cache with NATS-driven invalidation
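
The "SHA-256 hash chain continuation per event" item above can be illustrated with a short sketch. It assumes each event's hash covers the previous event's hash followed by the payload, with an all-zero hash for the first event; the auditEvent type and the helper names are hypothetical, and the Kafka producer side is omitted.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// auditEvent is a hypothetical audit record that links to its predecessor.
type auditEvent struct {
	Payload  []byte
	PrevHash [32]byte
	Hash     [32]byte
}

// chainHash computes SHA-256(prev || payload).
func chainHash(prev [32]byte, payload []byte) [32]byte {
	h := sha256.New()
	h.Write(prev[:])
	h.Write(payload)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// appendEvent continues the chain. The first event links to the zero hash.
func appendEvent(chain []auditEvent, payload []byte) []auditEvent {
	var prev [32]byte
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	return append(chain, auditEvent{
		Payload:  payload,
		PrevHash: prev,
		Hash:     chainHash(prev, payload),
	})
}

// verifyChain returns the index of the first event whose link or hash does
// not check out, or -1 if the whole chain is intact.
func verifyChain(chain []auditEvent) int {
	var prev [32]byte
	for i, e := range chain {
		if e.PrevHash != prev || e.Hash != chainHash(e.PrevHash, e.Payload) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []auditEvent
	for _, p := range []string{"tool.called", "guardrail.flagged", "tool.completed"} {
		chain = appendEvent(chain, []byte(p))
	}
	fmt.Println("first bad index:", verifyChain(chain))

	chain[1].Payload = []byte("tampered")
	fmt.Println("first bad index after tampering:", verifyChain(chain))
}
```

Because every hash depends on the one before it, altering or dropping an event invalidates the chain from that point on, which is what makes the audit trail tamper-evident.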

Essential Skills / Experience

Go Language Expertise
•    Go 1.21+ - generics, iterators (range-over-func, available from Go 1.23), structured logging (slog), slices/maps packages, error wrapping (fmt.Errorf %w, errors.Is/As)
•    Concurrency - goroutines, channels (buffered/unbuffered), select, context.Context cancellation and deadline propagation, sync package, atomic operations
•    Memory management - escape analysis, sync.Pool, pprof profiling (heap, CPU, goroutine, block, mutex), go tool trace, GOMAXPROCS tuning
•    Testing - table-driven tests, testify, gomock for interface mocking, httptest for HTTP handler testing, go race detector, benchmarks with b.ReportAllocs()
•    Build tooling - Go modules, build tags, ldflags for version injection, CGO considerations, multi-arch builds (GOARCH=amd64 + arm64) for Cosign-signed images
•    gRPC in Go - google.golang.org/grpc, interceptors (unary + streaming), reflection, health check protocol, graceful shutdown

System and Network Programming
•    HTTP server internals - net/http server tuning, custom transports, connection pooling, keep-alive configuration, TLS 1.3, mTLS client certificates
•    Streaming protocols - HTTP/1.1 chunked transfer encoding, Server-Sent Events (SSE), WebSocket upgrade handling, io.Reader/Writer pipeline composition
•    Service mesh integration - Istio sidecar awareness, mTLS passthrough, x-forwarded-for header handling, envoy filter interaction
•    Container-native design - graceful shutdown (SIGTERM handling, connection drain), liveness/readiness probe endpoints, structured startup sequencing

Messaging and Streaming
•    Apache Kafka - producer/consumer patterns, exactly-once semantics, topic design and partitioning strategy, consumer group management and rebalancing, Kafka Streams for real-time aggregations (including cost attribution), MirrorMaker2 for replication
•    NATS JetStream - subject hierarchy design, push/pull consumers, durable subscriptions, key-value store for Cedar cache state, acknowledgement strategies, flow control
•    Redis - Cluster mode (hash slots, resharding), Lua scripting for atomic operations, pub/sub, keyspace notifications, TTL patterns, Redlock distributed locking

Desirable Skills / Experience

Identity and Security
•    Azure AD integration - SCIM 2.0 provisioning protocol, OIDC token validation, SAML 2.0 assertion parsing, conditional access claim inspection, group-to-role mapping, regulatory access gate
•    HashiCorp Vault (OpenBao) - dynamic secrets engine, AppRole and Kubernetes auth methods, PKI certificate issuance, lease renewal, transit encryption for CMEK
•    JWT / OAuth 2.0 - token validation (RS256/ES256), scope enforcement, token introspection, refresh token rotation, service-to-service credential management

Observability (Basics)
•    Dynatrace - distributed trace instrumentation via OTLP, custom metrics with DQL query verification, SLO definition and burn rate alerting, Davis AI problem configuration, OneAgent Kubernetes monitoring
•    OpenTelemetry - SDK span creation and attribute enrichment (agent_id, tenant_id, model, cost_microcents), context propagation across HTTP and gRPC boundaries, OTLP exporter configuration
•    Structured logging - JSON format (trace_id, span_id, service, level), PII scrubbing before emission, log correlation with distributed traces in Dynatrace

Infrastructure and Platform (Basics)
•    Git - trunk-based development, conventional commits, semantic versioning tags, branch protection rules, merge request workflows in GitLab
•    GitLab CI/CD - pipeline authoring (.gitlab-ci.yml), multi-stage pipelines, environment promotion gates, parallel job matrix, GitLab Container Registry, merge request pipelines with test coverage enforcement
•    Kubernetes - pod anti-affinity rules for HA, resource requests vs limits, HPA configuration (CPU + custom KEDA metrics), liveness/readiness/startup probes, namespace NetworkPolicy authoring, ServiceAccount RBAC
•    Docker - multi-stage Dockerfile optimisation, distroless base images, Cosign image signing post-build, SBOM generation with Syft, Trivy vulnerability scan (block on CRITICAL) in CI

•    Temporal.io - workflow and activity implementation, HITL Signals, per-tenant namespace isolation, Worker scaling, workflow versioning
•    Cedar policy language - ABAC/RBAC policy authoring, condition expression design, policy unit testing, namespace and entity type modelling
•    Istio - VirtualService traffic splitting for canary deployments, DestinationRule connection pool tuning, PeerAuthentication for mTLS, Envoy filter chain extension
•    Argo Rollouts - Rollout resource authoring, AnalysisTemplate definition with Dynatrace metric queries, progressive delivery step configuration, automatic rollback triggers
•    Helm - chart authoring, values.yaml parameterisation, chart testing, Helm hooks for schema migrations

Our Package

BT Group is the UK’s leading communications group and the holding company behind some of the country’s most recognised brands – including BT, EE, Openreach and Plusnet. Our purpose is as simple as it is ambitious: we connect for good. Our customers include consumers, small, medium and large businesses, public sector organisations and other communications providers.

BT Group’s role is about setting direction, unlocking value and creating the conditions for our brands and businesses to thrive.

Having come through the most capital-intensive phase of our fibre investment, our focus now is on what comes next – simplifying how we operate, using technology and AI to work smarter, and organising ourselves to serve customers better and grow sustainably. Group teams shape strategy, policy, brand, capital allocation and transformation, helping the whole organisation perform at its best.

We have a singular culture that unites all our people: we are customer-first challengers, who are committed, clear and connected. These behaviours unite us as one team to deliver for our colleagues, our customers, our stakeholders and the country. Joining BT Group means working at the heart of a business that matters to the UK, with the opportunity to shape decisions, influence outcomes and help set the future course of one of the country’s most important companies.
