RTSP APIs

The rtsp component provides a flexible, multi-codec RTSP streaming framework for ESP32 devices. It supports MJPEG, H.264, and generic audio codecs through an extensible packetizer/depacketizer architecture. The component handles RTP packet splitting and reassembly; encoding and decoding of media data is handled externally by the application.

How RTSP Works

The component uses a split control-plane / media-plane design:

  • RTSP over TCP handles session control such as OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, and TEARDOWN.

  • SDP returned from DESCRIBE tells the client what tracks exist, how they are encoded, and which per-track control URLs must be used for SETUP.

  • RTP/UDP carries encoded media packets after playback starts.

  • RTCP/UDP sockets are created alongside RTP sockets, but the current ESPP implementation keeps RTCP support lightweight and does not yet implement a full control/feedback plane.

        sequenceDiagram
  participant App as Application
  participant Server as RtspServer / RtspSession
  participant Client as RtspClient
  App->>Server: add_track() / send_frame()
  Client->>Server: OPTIONS
  Server-->>Client: 200 OK
  Client->>Server: DESCRIBE
  Server-->>Client: SDP with session + track control paths
  Client->>Server: SETUP(trackID=n, client_port=RTP-RTCP)
  Server-->>Client: Session + Transport headers
  Client->>Server: PLAY
  Server-->>Client: 200 OK
  Server-->>Client: RTP/UDP packets for each active track
  Client-->>App: on_jpeg_frame() or on_frame(track_id, data)
  Client->>Server: TEARDOWN
  Server-->>Client: 200 OK
    

In ESPP, the server generates one SDP description per session, with one m=... section and one a=control:.../trackID=N entry per registered track. The client parses those lines during describe() and then issues SETUP once per discovered track before calling PLAY.
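The per-track discovery step described above can be sketched in standalone C++. This is only an illustration of reading m= and a=control: lines, not the component's actual parser; the SdpTrack struct and parse_sdp_tracks() function are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified track record; the real TrackInfo carries more fields.
struct SdpTrack {
  std::string media;     // "video" or "audio", taken from the m= line
  int payload_type = -1; // first payload type listed on the m= line
  std::string control;   // value of the track-level a=control: attribute
};

// Collect one entry per m= section, attaching the a=control: line that follows it.
std::vector<SdpTrack> parse_sdp_tracks(const std::string &sdp) {
  std::vector<SdpTrack> tracks;
  std::istringstream lines(sdp);
  std::string line;
  while (std::getline(lines, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back(); // SDP lines end in CRLF
    }
    if (line.rfind("m=", 0) == 0) {
      // m=<media> <port> <proto> <fmt> ...
      std::istringstream fields(line.substr(2));
      SdpTrack track;
      std::string port, proto;
      fields >> track.media >> port >> proto >> track.payload_type;
      tracks.push_back(track);
    } else if (line.rfind("a=control:", 0) == 0 && !tracks.empty()) {
      // Attributes after an m= line belong to that media section;
      // a session-level a=control: (before any m= line) is skipped here.
      tracks.back().control = line.substr(10);
    }
  }
  return tracks;
}
```

A client following the flow above would then issue one SETUP per returned entry, using its control value as the request URL.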

Packetization Pipeline

The codec-specific logic is intentionally separated from the RTSP core:

        flowchart LR
  Frame["Encoded frame bytes"] --> Packetizer["Codec packetizer"]
  Packetizer --> Chunks["RTP payload chunks"]
  Chunks --> Header["RtspServer adds RTP headers"]
  Header --> Session["RtspSession sends UDP packets"]
  Session --> ClientRtp["RtspClient RTP socket"]
  ClientRtp --> Depacketizer["Codec depacketizer"]
  Depacketizer --> Callback["Application callback"]
    

RtspServer::send_frame(track_id, data) asks the selected packetizer to split the encoded frame into MTU-sized chunks, adds RTP headers with track-specific SSRC and sequence numbers, and leaves the resulting packets queued for active sessions to transmit. On the client side, RtspClient::handle_rtp_packet() parses the RTP header, uses the payload type to find the matching depacketizer, and emits a completed frame through either on_jpeg_frame or the generic on_frame(track_id, data) callback.
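To make the "adds RTP headers" step concrete, here is a minimal sketch of serializing the fixed 12-byte RTP header from RFC 3550. The function name make_rtp_header() is an assumption for illustration; the component's own RtpPacket type is not shown here and may be organized differently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Serialize the fixed RTP header (RFC 3550, section 5.1).
// This sketch emits no padding, header extension, or CSRC entries.
std::array<uint8_t, 12> make_rtp_header(uint8_t payload_type, bool marker, uint16_t sequence,
                                        uint32_t timestamp, uint32_t ssrc) {
  assert(payload_type < 128); // the payload type field is 7 bits wide
  std::array<uint8_t, 12> h{};
  h[0] = 0x80; // version 2, P=0, X=0, CC=0
  h[1] = static_cast<uint8_t>((marker ? 0x80 : 0x00) | payload_type);
  h[2] = static_cast<uint8_t>(sequence >> 8); // multi-byte fields use network byte order
  h[3] = static_cast<uint8_t>(sequence & 0xFF);
  for (int i = 0; i < 4; ++i) {
    h[4 + i] = static_cast<uint8_t>(timestamp >> (24 - 8 * i));
    h[8 + i] = static_cast<uint8_t>(ssrc >> (24 - 8 * i));
  }
  return h;
}
```

Each payload chunk of a frame would share one timestamp and SSRC while the sequence number increases by one per packet.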

Legacy MJPEG Compatibility

For backward compatibility, the component still preserves the older MJPEG-only behavior:

  • RtspServer::send_frame(std::span<const uint8_t>) lazily creates a default track 0 and uses the legacy RFC 2435-compatible MJPEG wire format.

  • RtspClient automatically creates an MjpegDepacketizer when a JPEG callback is registered and payload type 26 is discovered in SDP.

This means older single-track MJPEG integrations can keep working while newer multi-track applications use add_track() plus codec-specific packetizers.

RTSP Client

The RtspClient class connects to an RTSP server and receives media streams over RTP/UDP. It dispatches incoming RTP packets to codec-specific depacketizers based on payload type.

For backward compatibility, setting the on_jpeg_frame callback automatically creates an MjpegDepacketizer for MJPEG streams (payload type 26). For generic multi-track use, applications can use the on_frame callback and inspect parsed SDP metadata through tracks().

The client now supports:

  • generic on_frame(track_id, data) callbacks for multi-track sessions

  • parsed SDP track metadata including media type, payload type, codec name, sample rate, channel count, and resolved control path

  • automatic depacketizer selection for MJPEG, H.264, and generic payloads discovered during DESCRIBE

  • an on_connection_lost callback for reconnect / rediscovery workflows when the RTSP control socket or RTP stream disappears after playback starts

RTSP Server

The RtspServer class accepts RTSP connections and streams media over RTP/UDP. It supports multiple tracks, each with its own codec-specific packetizer, SSRC, and sequence numbering.

For backward compatibility, calling send_frame(const JpegFrame&) lazily creates a default MJPEG track. For other codecs, register tracks via add_track() and send frames with send_frame(track_id, data).

The server also exposes helpers that are useful for embedded capture loops:

  • configurable accept, session-dispatch, and per-session control task stack sizes

  • has_active_sessions() to avoid capturing when no client is actively playing

  • get_capture_cooldown() and get_recommended_capture_period() so an application can slow capture when RTP backpressure is observed

  • a legacy MJPEG send_frame(std::span<const uint8_t>) path that preserves the older wire format for existing MJPEG-only users
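One way these helpers might fit together in a capture loop is sketched below. The function capture_step() and its idle_poll parameter are hypothetical; Server is a template parameter so the sketch compiles without the component, and it only assumes the four member names listed above. The frame is passed as a std::vector here purely to keep the example portable.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Run one iteration of a capture loop and return how long to sleep afterwards.
// capture is any callable returning the encoded frame bytes.
template <typename Server, typename CaptureFn>
std::chrono::milliseconds capture_step(Server &server, CaptureFn &&capture,
                                       std::chrono::milliseconds idle_poll) {
  using std::chrono::duration_cast;
  using std::chrono::milliseconds;
  assert(idle_poll.count() > 0); // avoid a busy loop while nobody is watching
  if (!server.has_active_sessions()) {
    return idle_poll; // no client is playing, so skip the capture entirely
  }
  const auto cooldown = duration_cast<milliseconds>(server.get_capture_cooldown());
  if (cooldown.count() > 0) {
    return cooldown; // RTP backpressure was observed: wait before queueing more
  }
  std::vector<uint8_t> frame = capture();
  server.send_frame(frame);
  return duration_cast<milliseconds>(server.get_recommended_capture_period());
}
```

The caller would sleep for the returned duration and call capture_step() again, so the camera is only exercised while a session is playing and the network is keeping up.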

RTP Packetizers & Depacketizers

The packetizer/depacketizer abstraction allows the server and client to support multiple media codecs without changing the RTSP core. Concrete implementations are provided for:

  • MJPEG (MjpegPacketizer / MjpegDepacketizer) — RFC 2435 JPEG over RTP

  • H.264 (H264Packetizer / H264Depacketizer) — RFC 6184 with FU-A fragmentation

  • Generic (GenericPacketizer / GenericDepacketizer) — MTU chunking for audio or other pre-encoded payloads, with frame reconstruction based on RTP marker / timestamp boundaries

Custom packetizers can be created by subclassing RtpPacketizer or RtpDepacketizer.
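As an illustration of what a codec-specific packetizer has to do, the sketch below splits one H.264 NAL unit into FU-A payloads following RFC 6184, section 5.8. It is a standalone example, not the H264Packetizer source; fragment_fu_a() is a hypothetical name, and the input is assumed to be a single NAL unit without its Annex B start code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Each returned payload is laid out as [FU indicator][FU header][fragment bytes].
std::vector<std::vector<uint8_t>> fragment_fu_a(const std::vector<uint8_t> &nal,
                                                size_t max_payload) {
  assert(max_payload > 2);          // room for the two FU bytes plus data
  assert(nal.size() > max_payload); // smaller NAL units are sent unfragmented
  const uint8_t nal_header = nal[0];
  // Keep the F and NRI bits of the original header and set the type to 28 (FU-A).
  const uint8_t indicator = static_cast<uint8_t>((nal_header & 0xE0) | 28);
  const uint8_t nal_type = nal_header & 0x1F;
  const size_t chunk = max_payload - 2;
  std::vector<std::vector<uint8_t>> out;
  // Start at 1: the original NAL header byte is not sent, the receiver rebuilds it.
  for (size_t pos = 1; pos < nal.size(); pos += chunk) {
    const size_t len = std::min(chunk, nal.size() - pos);
    uint8_t fu_header = nal_type;
    if (pos == 1) fu_header |= 0x80;                // S bit on the first fragment
    if (pos + len == nal.size()) fu_header |= 0x40; // E bit on the last fragment
    std::vector<uint8_t> payload{indicator, fu_header};
    payload.insert(payload.end(), nal.begin() + pos, nal.begin() + pos + len);
    out.push_back(std::move(payload));
  }
  return out;
}
```

A depacketizer reverses this by combining the top three bits of the FU indicator with the low five bits of the FU header to restore the NAL header, then appending fragments until the E bit is seen.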

Relevant Specifications

These are the main standards to keep beside the code when working on this component:

| Specification | Why it matters here |
|---|---|
| RFC 2326: Real Time Streaming Protocol (RTSP) | Primary control-plane reference for the RTSP/1.0 request and response flow implemented by RtspClient, RtspServer, and RtspSession. |
| RFC 7826: RTSP 2.0 | Useful background for newer RTSP deployments; informative here because the current component speaks RTSP/1.0 on the wire. |
| RFC 3550: RTP / RTCP | Defines RTP headers, timestamps, sequence numbers, SSRC handling, and the RTCP control protocol model used by the transport layer. |
| RFC 4566: Session Description Protocol (SDP) | Describes the SDP m=, a=control:, and a=rtpmap: lines that the server generates and the client parses during DESCRIBE. |
| RFC 3551: RTP A/V Profile | Defines common RTP payload-type and clock-rate conventions used alongside dynamic payloads. |
| RFC 2435: RTP Payload Format for JPEG | Reference for the MJPEG packetization and depacketization path. |
| RFC 6184: RTP Payload Format for H.264 Video | Reference for the H.264 FU-A fragmentation and reassembly path. |

Testing and Utilities

There are several ways to exercise the RTSP stack:

  • ESPP Python library: espp/lib contains the build scripts and bindings used to expose the RTSP client / server classes to Python.

  • Python harness scripts: espp/python contains wrapper and multitrack scripts for exercising legacy MJPEG flows, generic multi-track flows, live microphone audio, and end-to-end host validation.

  • Embedded examples and downstream apps: the component example plus repositories such as camera-streamer and camera-display cover practical server/client integrations.

See python/README.md in the repository root for more information on the host-side scripts.

Example

The RTSP Example page demonstrates several RTSP usage patterns selected via menuconfig, including:

  • legacy MJPEG server + client behavior

  • server-only MJPEG streaming

  • client-only MJPEG reception

  • API-level packetizer / depacketizer exercises

  • multi-track streaming with MJPEG video plus generic audio

For more complete integrations, see the camera-streamer and camera-display repositories.

API Reference

Header File

Classes

class RtspClient : public espp::BaseComponent

A class for interacting with an RTSP server using RTP and RTCP over UDP

This class is used to connect to an RTSP server and receive JPEG frames over RTP. It uses the TCP socket to send RTSP requests and receive RTSP responses. It uses the UDP socket to receive RTP and RTCP packets.

The RTSP client is designed to be used with the RTSP server in the camera-streamer (https://github.com/esp-cpp/camera-streamer) project, but it should work with any RTSP server that sends JPEG frames over RTP.

RtspClient Example

  espp::RtspClient rtsp_client({
      .server_address = ip_address,
      .rtsp_port = CONFIG_RTSP_SERVER_PORT,
      .path = "/mjpeg/1",
      .on_jpeg_frame =
          [](std::shared_ptr<espp::JpegFrame> jpeg_frame) {
            fmt::print("Got JPEG frame of size {}x{}\n", jpeg_frame->get_width(),
                       jpeg_frame->get_height());
          },
      .log_level = espp::Logger::Verbosity::ERROR,
  });

  std::error_code ec;

  do {
    ec.clear();
    rtsp_client.connect(ec);
    if (ec) {
      logger.error("Error connecting to server: {}", ec.message());
      logger.info("Retrying in 1s...");
      std::this_thread::sleep_for(1s);
    }
  } while (ec);

  rtsp_client.describe(ec);
  if (ec) {
    logger.error("Error describing server: {}", ec.message());
  }

  rtsp_client.setup(ec);
  if (ec) {
    logger.error("Error setting up server: {}", ec.message());
  }

  rtsp_client.play(ec);
  if (ec) {
    logger.error("Error playing server: {}", ec.message());
  }

Public Types

typedef std::function<void(std::shared_ptr<espp::JpegFrame> jpeg_frame)> jpeg_frame_callback_t

Function type for the callback to call when a JPEG frame is received.

using frame_callback_t = std::function<void(int track_id, std::vector<uint8_t> &&data)>

Generic frame callback — called for any track/codec with raw frame data.

using disconnect_callback_t = std::function<void(void)>

Callback invoked when the RTSP server disappears after playback starts.

Public Functions

explicit RtspClient(const espp::RtspClient::Config &config)

Constructor

Parameters:

config – The configuration for the RTSP client

~RtspClient()

Destructor. Disconnects from the RTSP server.

std::string send_request(const std::string &method, const std::string &path, const std::unordered_map<std::string, std::string> &extra_headers, std::error_code &ec)

Send an RTSP request to the server

Note

This is a blocking call

Note

This will parse the response and set the session ID if it is present in the response. If the response is not a 200 OK, then an error code will be set and the response will be returned. If the response is a 200 OK, then the response will be returned and the error code will be set to success.

Parameters:
  • method – The RTSP method to send, for example “OPTIONS”, “DESCRIBE”, “SETUP”, “PLAY”, “PAUSE”, or “TEARDOWN”

  • path – The path to the RTSP stream on the server.

  • extra_headers – Any extra headers to send with the request, added after the CSeq and Session headers. Each key is a header name and each value is the header value; for example, {“Accept”: “application/sdp”} adds “Accept: application/sdp” to the request. The “User-Agent”, “CSeq”, “Session”, and “Accept” headers are added automatically, as is the “Transport” header for the “SETUP” method. Defaults to an empty map.

  • ec – The error code to set if an error occurs

Returns:

The response from the server
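For readers unfamiliar with the wire format, the sketch below shows how an RTSP/1.0 request with CSeq, Session, and extra headers can be assembled as text. It is an assumption-laden illustration rather than the body of send_request(): format_rtsp_request() is a hypothetical helper, it omits the User-Agent header, and it uses std::map instead of std::unordered_map only so the header order is deterministic.

```cpp
#include <cassert>
#include <map>
#include <string>

// Format an RTSP/1.0 request (RFC 2326). An empty session_id means the server
// has not assigned a session yet, so no Session header is emitted.
std::string format_rtsp_request(const std::string &method, const std::string &uri, int cseq,
                                const std::string &session_id,
                                const std::map<std::string, std::string> &extra_headers) {
  assert(!method.empty() && cseq > 0);
  std::string request = method + " " + uri + " RTSP/1.0\r\n";
  request += "CSeq: " + std::to_string(cseq) + "\r\n";
  if (!session_id.empty()) {
    request += "Session: " + session_id + "\r\n";
  }
  for (const auto &[name, value] : extra_headers) {
    request += name + ": " + value + "\r\n";
  }
  request += "\r\n"; // a blank line terminates the header block
  return request;
}
```

The CSeq value is incremented for every request so that each response can be matched to the request that caused it.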

void connect(std::error_code &ec)

Connect to the RTSP server and send the OPTIONS request.

Parameters:

ec – The error code to set if an error occurs

void disconnect(std::error_code &ec)

Disconnect from the RTSP server and send the TEARDOWN request.

Parameters:

ec – The error code to set if an error occurs

void describe(std::error_code &ec)

Describe the RTSP stream. Sends the DESCRIBE request to the RTSP server and parses the response.

Parameters:

ec – The error code to set if an error occurs

void setup(std::error_code &ec)

Setup the RTSP stream

Note

Starts the RTP and RTCP threads. Sends the SETUP request to the RTSP server and parses the response.

Note

The default ports are 5000 and 5001 for RTP and RTCP respectively.

Note

The default receive timeout is 5 seconds.

Parameters:

ec – The error code to set if an error occurs

void setup(size_t rtp_port, size_t rtcp_port, const std::chrono::duration<float> &receive_timeout, std::error_code &ec)

Set up the RTSP stream. Sends the SETUP request to the RTSP server and parses the response.

Note

Starts the RTP and RTCP threads.

Parameters:
  • rtp_port – The RTP client port

  • rtcp_port – The RTCP client port

  • receive_timeout – The timeout for receiving RTP and RTCP packets

  • ec – The error code to set if an error occurs
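The RTP and RTCP client ports reach the server through the Transport header of the SETUP request. The sketch below shows one plausible encoding and the matching parse on the receiving side; both helper names are hypothetical, and the exact header string the component sends is an assumption based on the common RTP/AVP unicast form.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build a client-side Transport header value for RTP over UDP unicast.
std::string make_transport_header(int rtp_port, int rtcp_port) {
  assert(rtp_port > 0 && rtp_port < 65536 && rtcp_port > 0 && rtcp_port < 65536);
  return "RTP/AVP;unicast;client_port=" + std::to_string(rtp_port) + "-" +
         std::to_string(rtcp_port);
}

// Extract the client_port pair from a Transport header value, as a server would.
// Returns false if the parameter is missing or malformed.
bool parse_client_ports(const std::string &transport, int &rtp_port, int &rtcp_port) {
  const std::string key = "client_port=";
  const auto pos = transport.find(key);
  if (pos == std::string::npos) {
    return false;
  }
  return std::sscanf(transport.c_str() + pos + key.size(), "%d-%d", &rtp_port, &rtcp_port) == 2;
}
```

With the default ports noted for the parameterless overload, the value would read RTP/AVP;unicast;client_port=5000-5001.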

void add_depacketizer(int payload_type, std::shared_ptr<RtpDepacketizer> depacketizer)

Register a depacketizer for a specific RTP payload type. When RTP packets with this payload type are received, they are dispatched to the registered depacketizer.

Parameters:
  • payload_type – The RTP payload type (e.g., 26 for MJPEG, 96 for H264)

  • depacketizer – The depacketizer to handle packets of this type
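The dispatch idea behind add_depacketizer() can be sketched without the component: read the 7-bit payload type from the second RTP header byte and look up a handler. PayloadHandler and dispatch_rtp() are hypothetical names, and a plain callable stands in for the RtpDepacketizer interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Handler invoked with the RTP payload (the bytes after the fixed 12-byte header).
using PayloadHandler = std::function<void(const uint8_t *payload, size_t size)>;

// Route one RTP packet to the handler registered for its payload type.
// Returns false if the packet is too short, is not RTP version 2, or is unhandled.
// Simplification: CSRC entries and header extensions are not skipped.
bool dispatch_rtp(const std::unordered_map<int, PayloadHandler> &handlers,
                  const uint8_t *packet, size_t size) {
  constexpr size_t kHeaderSize = 12;
  if (packet == nullptr || size < kHeaderSize) {
    return false;
  }
  if ((packet[0] >> 6) != 2) {
    return false; // the version field must be 2
  }
  const int payload_type = packet[1] & 0x7F; // low 7 bits; the top bit is the marker
  const auto it = handlers.find(payload_type);
  if (it == handlers.end()) {
    return false;
  }
  assert(it->second != nullptr); // registered handlers must be callable
  it->second(packet + kHeaderSize, size - kHeaderSize);
  return true;
}
```

Registering a handler under key 26 or 96 in the map mirrors the payload_type values given as examples above.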

void play(std::error_code &ec)

Play the RTSP stream. Sends the PLAY request to the RTSP server and parses the response.

Parameters:

ec – The error code to set if an error occurs

void pause(std::error_code &ec)

Pause the RTSP stream. Sends the PAUSE request to the RTSP server and parses the response.

Parameters:

ec – The error code to set if an error occurs

void teardown(std::error_code &ec)

Tear down the RTSP stream. Sends the TEARDOWN request to the RTSP server and parses the response.

Parameters:

ec – The error code to set if an error occurs

inline const std::vector<TrackInfo> &tracks() const

Get the parsed SDP track descriptions from the most recent DESCRIBE call.

Returns:

The ordered set of discovered media tracks.

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only calls to the logger that have _rate_limit suffix will be rate limited

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for the RTSP client.

Public Members

std::string server_address

The server IP Address to connect to.

int rtsp_port = {8554}

The port of the RTSP server.

std::string path = {"/mjpeg/1"}

The path to the RTSP stream on the server. Will be appended to the server address and port to form the full path of the form “rtsp://<server_address>:<rtsp_port><path>”

frame_callback_t on_frame = {nullptr}

Generic frame callback for any codec (track_id, raw frame data)

jpeg_frame_callback_t on_jpeg_frame = {nullptr}

JPEG-specific frame callback (backward compatible). If set and no depacketizer is registered for PT 26, an MjpegDepacketizer is automatically created.

disconnect_callback_t on_connection_lost = {nullptr}

Called once if the client loses the server after playback starts. This callback is intended for applications that want to stop playback and re-enter service discovery or reconnect logic automatically.

espp::Logger::Verbosity log_level = espp::Logger::Verbosity::INFO

The verbosity of the logger.

struct TrackInfo

Header File

Classes

class RtspServer : public espp::BaseComponent

Class for streaming MJPEG data from a camera using RTSP + RTP. Starts a TCP socket to listen for RTSP connections, and then spawns off a new RTSP session for each connection.

See also

RtspSession

RtspServer example

  const std::string server_uri =
      fmt::format("rtsp://{}:{}/mjpeg/1", ip_address, CONFIG_RTSP_SERVER_PORT);
  logger.info("Starting RTSP Server on port {}", CONFIG_RTSP_SERVER_PORT);
  logger.info("RTSP URI: {}", server_uri);

  espp::RtspServer rtsp_server({
      .server_address = ip_address,
      .port = CONFIG_RTSP_SERVER_PORT,
      .path = "/mjpeg/1",
      .log_level = espp::Logger::Verbosity::INFO,
  });
  rtsp_server.start();

  std::span<const uint8_t> frame_data(reinterpret_cast<const uint8_t *>(jpeg_data),
                                      sizeof(jpeg_data));
  espp::JpegFrame jpeg_frame(frame_data);

  logger.info("Parsed JPEG image, num bytes: {}", jpeg_frame.get_data().size());
  logger.info("Created frame of size {}x{}", jpeg_frame.get_width(), jpeg_frame.get_height());
  rtsp_server.send_frame(jpeg_frame);

Note

This class does not currently send RTCP packets

Public Functions

explicit RtspServer(const espp::RtspServer::Config &config)

Construct an RTSP server.

Parameters:

config – The configuration for the RTSP server

~RtspServer()

Destroy the RTSP server.

void set_session_log_level(espp::Logger::Verbosity log_level)

Sets the log level for the RTSP sessions created by this server.

Note

This does not affect the log level of the RTSP server itself

Note

This does not change the log level of any sessions that have already been created

Parameters:

log_level – The log level to set

bool start(const std::chrono::duration<float> &accept_timeout = std::chrono::seconds(5))

Start the RTSP server. Starts the accept task, session task, and binds the RTSP socket.

Parameters:

accept_timeout – The timeout for accepting new connections

Returns:

True if the server was started successfully, false otherwise

void stop()

Stop the RTSP server. Stops the accept task, session task, and closes the RTSP socket.

void add_track(const TrackConfig &config)

Register a media track with the server. Each track has its own packetizer, SSRC, and sequence number.

Parameters:

config – Track configuration including the packetizer.

bool has_active_sessions()

Returns true when at least one session is actively playing.

Returns:

True if an active RTSP session is ready to receive RTP packets.

std::chrono::milliseconds get_capture_cooldown()

Returns how long capture should wait before queueing another frame.

Returns:

Remaining RTP backpressure cooldown, or zero if sending may resume.

get_recommended_capture_period()

Returns the minimum recommended period between captured frames.

Returns:

Recommended capture period based on recent RTP backpressure history.

void send_frame(int track_id, std::span<const uint8_t> frame_data)

Send a frame on a specific track. The track’s packetizer splits the frame into RTP payload chunks, which are then wrapped with RTP headers and queued for delivery.

Note

Overwrites any existing pending packets for this track.

Parameters:
  • track_id – The track to send on.

  • frame_data – Raw encoded frame data.

void send_frame(const espp::JpegFrame &frame)

Send a JPEG frame over the RTSP connection (backward compatible). If no tracks have been added, lazily creates a default MJPEG track on track 0. Uses the legacy RtpJpegPacket packetization to preserve the exact wire format for existing MJPEG users.

Note

Overwrites any existing frame that has not been sent.

Parameters:

frame – The frame to send.

void send_frame(std::span<const uint8_t> frame_data)

Send raw JPEG bytes over the default MJPEG track. Uses the legacy MJPEG RTP packetization path without copying the frame into an intermediate JpegFrame object.

Note

Overwrites any existing frame that has not been sent.

Parameters:

frame_data – Complete JPEG bytes, including header and EOI marker.

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only calls to the logger that have _rate_limit suffix will be rate limited

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for the RTSP server.

Public Members

std::string server_address

The ip address of the server.

int port

The port to listen on.

std::string path

The path to the RTSP stream.

size_t max_data_size = 1000

The maximum size of RTP packet data for the MJPEG stream. Frames will be broken up into multiple packets if they are larger than this. It seems that 1500 works well for sending, but is too large for the esp32 (camera-display) to receive properly.

espp::Logger::Verbosity log_level = espp::Logger::Verbosity::WARN

The log level for the RTSP server.

size_t accept_task_stack_size_bytes = default_accept_task_stack_size_bytes

RTSP accept-task stack size, in bytes.

size_t session_task_stack_size_bytes = default_session_task_stack_size_bytes

RTSP session-dispatch task stack size, in bytes.

size_t control_task_stack_size_bytes = RtspSession::Config::default_control_task_stack_size_bytes

Per-session RTSP control-task stack size, in bytes

struct TrackConfig

Configuration for a media track to be registered with the server.

Public Members

int track_id = {0}

Track identifier.

std::shared_ptr<espp::RtpPacketizer> packetizer

Codec-specific packetizer.

Header File

Classes

class RtspSession : public espp::BaseComponent

Class that represents an RTSP session, which is uniquely identified by a session id and sends frame data over RTP and RTCP to the client.

Public Functions

explicit RtspSession(std::shared_ptr<espp::TcpSocket> control_socket, const espp::RtspSession::Config &config)

Construct a new RtspSession object.

Parameters:
  • control_socket – The control socket of the session

  • config – The configuration of the session

~RtspSession()

Destroy the RtspSession object. Stops the session task.

uint32_t get_session_id() const

Get the session id.

Returns:

The session id

bool is_closed() const

Check if the session is closed.

Returns:

True if the session is closed, false otherwise

bool is_connected() const

Get whether the session is connected

Returns:

True if the session is connected, false otherwise

bool is_active() const

Get whether the session is active

Returns:

True if the session is active, false otherwise

void play()

Mark the session as active. This will cause the server to start sending frames to the client.

void pause()

Pause the session. This will cause the server to stop sending frames to the client.

Note

This does not stop the session, it just pauses it

Note

This is useful for when the client is buffering

void teardown()

Tear down the session. This will cause the server to stop sending frames to the client and close the connection.

bool send_rtp_packet(int track_id, const espp::RtpPacket &packet)

Send an RTP packet on a specific track

Parameters:
  • track_id – The track to send on

  • packet – The RTP packet to send

Returns:

True if the packet was sent successfully, false otherwise

bool send_rtp_packet(int track_id, std::span<const uint8_t> packet_data)

Send a serialized RTP packet on a specific track.

Parameters:
  • track_id – The track to send on

  • packet_data – Serialized RTP packet bytes

Returns:

True if the packet was sent successfully, false otherwise

bool send_rtp_packet(const espp::RtpPacket &packet)

Send an RTP packet to the client (backward compat — sends on default track 0)

Parameters:

packet – The RTP packet to send

Returns:

True if the packet was sent successfully, false otherwise

bool send_rtp_packet(std::span<const uint8_t> packet_data)

Send a serialized RTP packet to the client (default track 0).

Parameters:

packet_data – Serialized RTP packet bytes

Returns:

True if the packet was sent successfully, false otherwise

bool send_rtcp_packet(int track_id, const espp::RtcpPacket &packet)

Send an RTCP packet on a specific track

Parameters:
  • track_id – The track to send on

  • packet – The RTCP packet to send

Returns:

True if the packet was sent successfully, false otherwise

bool send_rtcp_packet(const espp::RtcpPacket &packet)

Send an RTCP packet to the client (backward compat — sends on default track 0)

Parameters:

packet – The RTCP packet to send

Returns:

True if the packet was sent successfully, false otherwise

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only calls to the logger that have _rate_limit suffix will be rate limited

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for the RTSP session.

Public Members

std::string server_address

The address of the server.

std::string rtsp_path

The RTSP path of the session.

std::chrono::duration<float> receive_timeout = std::chrono::seconds(5)

The timeout for receiving data. Should be > 0.

size_t control_task_stack_size_bytes = default_control_task_stack_size_bytes

RTSP control-task stack size, in bytes

std::function<std::string(const std::string &session_path, uint32_t session_id, const std::string &server_address)> sdp_generator

SDP generator callback. If set, called during DESCRIBE to produce the SDP body. If not set, a default MJPEG SDP is generated for backward compatibility.

Param session_path:

Full RTSP path (e.g., “rtsp://ip:port/path”)

Param session_id:

The session ID

Param server_address:

The server address with port
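A function with the documented sdp_generator signature might look like the sketch below. The SDP content is an assumption for a single MJPEG track using static payload type 26; make_mjpeg_sdp() is a hypothetical name, and the default SDP the session actually produces may differ in its fields.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Produce a one-track MJPEG session description (RFC 4566 line order).
std::string make_mjpeg_sdp(const std::string &session_path, uint32_t session_id,
                           const std::string &server_address) {
  assert(!session_path.empty());
  // server_address is documented as including the port; keep only the host part
  // for the o= line (assumes an IPv4 literal such as "192.168.1.10:8554").
  const std::string host = server_address.substr(0, server_address.find(':'));
  std::string sdp;
  sdp += "v=0\r\n";
  sdp += "o=- " + std::to_string(session_id) + " 1 IN IP4 " + host + "\r\n";
  sdp += "s=MJPEG Stream\r\n";
  sdp += "c=IN IP4 0.0.0.0\r\n";
  sdp += "t=0 0\r\n";
  sdp += "a=control:" + session_path + "\r\n";
  sdp += "m=video 0 RTP/AVP 26\r\n"; // 26 is the static payload type for JPEG
  sdp += "a=rtpmap:26 JPEG/90000\r\n";
  sdp += "a=control:" + session_path + "/trackID=0\r\n";
  return sdp;
}
```

Because the parameter list matches, such a function could be assigned to the sdp_generator member directly; a multi-track generator would append one m= section and one a=control: line per track.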

espp::Logger::Verbosity log_level = espp::Logger::Verbosity::WARN

The log level of the session.

struct Track

Represents one media track within an RTSP session.

Public Members

int track_id = {0}

Track identifier (matches trackID=N in SDP)

std::string control_path

Control path suffix (e.g., “trackID=0”)

espp::UdpSocket rtp_socket

RTP socket for this track.

espp::UdpSocket rtcp_socket

RTCP socket for this track.

int client_rtp_port = {0}

Client’s RTP port.

int client_rtcp_port = {0}

Client’s RTCP port.

bool setup_complete = {false}

Whether SETUP has been completed for this track.

Header File

Classes

class RtpPacketizer : public espp::BaseComponent

Abstract base class for splitting media frames into RTP payload chunks. Concrete packetizers (e.g. MJPEG, H.264) override the pure-virtual methods to produce codec-specific payloads. The RTSP server wraps each returned RtpPayloadChunk with an RTP header before sending.

Subclassed by espp::GenericPacketizer, espp::H264Packetizer, espp::MjpegPacketizer

Public Functions

inline explicit RtpPacketizer(const Config &config, const std::string &name)

Construct an RtpPacketizer.

Parameters:
  • config – The configuration for this packetizer.

  • name – A human-readable name used for logging.

virtual ~RtpPacketizer() = default

Destructor.

virtual std::vector<RtpPayloadChunk> packetize(std::span<const uint8_t> frame_data) = 0

Packetize a complete media frame into RTP payload chunks.

Parameters:

frame_data – The raw frame bytes to packetize.

Returns:

A vector of RtpPayloadChunk ready to be wrapped in RTP packets.
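The simplest packetize() strategy is plain MTU chunking, which is roughly what a generic packetizer needs. The sketch below is a standalone illustration: PayloadChunk is a hypothetical stand-in for RtpPayloadChunk (whose real fields are not shown on this page), and chunk_frame() is a free function rather than an override.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for RtpPayloadChunk; the real type may differ.
struct PayloadChunk {
  std::vector<uint8_t> data;
  bool marker = false; // set only on the final chunk of a frame
};

// Split a frame into chunks of at most max_payload_size bytes.
std::vector<PayloadChunk> chunk_frame(const std::vector<uint8_t> &frame,
                                      size_t max_payload_size) {
  assert(max_payload_size > 0);
  std::vector<PayloadChunk> chunks;
  for (size_t pos = 0; pos < frame.size(); pos += max_payload_size) {
    const size_t len = std::min(max_payload_size, frame.size() - pos);
    PayloadChunk chunk;
    chunk.data.assign(frame.begin() + pos, frame.begin() + pos + len);
    chunk.marker = (pos + len == frame.size());
    chunks.push_back(std::move(chunk));
  }
  return chunks;
}
```

With the default max_payload_size of 1400, a 3000-byte frame yields chunks of 1400, 1400, and 200 bytes, and only the last one carries the marker.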

virtual int get_payload_type() const = 0

Get the RTP payload type number for this codec.

Returns:

The RTP payload type (e.g. 26 for MJPEG, 96 for dynamic).

virtual uint32_t get_clock_rate() const = 0

Get the RTP clock rate for timestamp calculation.

Returns:

The clock rate in Hz (e.g. 90000 for video, 8000 for audio).
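The role of the clock rate is easiest to see in the timestamp arithmetic: an RTP timestamp counts ticks of the codec clock, so elapsed media time is multiplied by the rate. The helper below is a hypothetical sketch of that conversion, not a function exported by the component.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Convert elapsed media time to RTP timestamp ticks at the given clock rate.
// RTP timestamps are 32 bits wide and wrap around, hence the final cast.
uint32_t rtp_timestamp(std::chrono::microseconds elapsed, uint32_t clock_rate_hz) {
  assert(elapsed.count() >= 0 && clock_rate_hz > 0);
  const uint64_t ticks = static_cast<uint64_t>(elapsed.count()) * clock_rate_hz / 1000000ULL;
  return static_cast<uint32_t>(ticks);
}
```

At 90000 Hz one second of video advances the timestamp by 90000 ticks, while a 20 ms audio frame at 8000 Hz advances it by 160. RFC 3550 also recommends a random initial timestamp, which a real sender would add as an offset.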

virtual std::string get_sdp_media_attributes() const = 0

Generate the SDP media-level attributes for this codec.

Returns:

A string containing SDP a= lines (without trailing CRLF).

virtual std::string get_sdp_media_line() const = 0

Generate the SDP m= line for this codec.

Returns:

A string containing the SDP m= line (without trailing CRLF).

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only calls to the logger that have _rate_limit suffix will be rate limited

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for RtpPacketizer.

Public Members

size_t max_payload_size = {1400}

Maximum payload bytes per RTP packet.

espp::Logger::Verbosity log_level = {espp::Logger::Verbosity::WARN}

Log verbosity level.

Header File

Classes

class RtpDepacketizer : public espp::BaseComponent

Abstract base class for reassembling media frames from incoming RTP packets. Concrete depacketizers (e.g. MJPEG, H.264) override process_packet() to accumulate payload data and invoke the frame callback when a complete frame has been assembled.

Subclassed by espp::GenericDepacketizer, espp::H264Depacketizer, espp::MjpegDepacketizer

Public Types

using frame_callback_t = std::function<void(std::vector<uint8_t>&&)>

Callback type invoked when a complete frame has been reassembled. The frame data is moved into the callback to avoid copies.

Public Functions

inline explicit RtpDepacketizer(const Config &config, const std::string &name)

Construct an RtpDepacketizer.

Parameters:
  • config – The configuration for this depacketizer.

  • name – A human-readable name used for logging.

virtual ~RtpDepacketizer() = default

Destructor.

virtual void process_packet(const RtpPacket &packet) = 0

Process an incoming RTP packet, accumulating payload data. When a complete frame is assembled the frame callback is invoked.

Parameters:

packet – The RTP packet to process.

inline void set_frame_callback(frame_callback_t cb)

Set the callback for completed frames.

Parameters:

cb – The callback to invoke when a full frame is ready.

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only logger calls with the _rate_limit suffix are rate limited.

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for RtpDepacketizer.

Public Members

espp::Logger::Verbosity log_level = {espp::Logger::Verbosity::WARN}

Log verbosity level.

Header File

Header File

Classes

class MjpegPacketizer : public espp::RtpPacketizer

MJPEG packetizer that fragments JPEG frames into RFC 2435 RTP payloads.

This class takes complete JPEG frames and produces RTP payload chunks suitable for MJPEG streaming. Each chunk contains an RFC 2435 MJPEG header, and the first chunk additionally includes quantization tables.
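For reference, the 8-byte main JPEG header that RFC 2435 places at the start of every payload can be sketched as follows. This is an illustration of the wire layout only, not ESPP's implementation; `make_jpeg_main_header` is a hypothetical helper, and the optional quantization table header that follows it in the first packet is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of ESPP): builds the 8-byte RFC 2435 main
// JPEG header.
//   byte 0    : type-specific
//   bytes 1-3 : fragment offset (24-bit, big-endian) of this chunk in the scan data
//   byte 4    : type
//   byte 5    : Q
//   byte 6    : width in units of 8 pixels
//   byte 7    : height in units of 8 pixels
inline std::array<uint8_t, 8> make_jpeg_main_header(uint8_t type_specific,
                                                    uint32_t fragment_offset, uint8_t type,
                                                    uint8_t q, int width_px, int height_px) {
  assert(fragment_offset <= 0xFFFFFFu);
  assert(width_px > 0 && width_px % 8 == 0 && width_px / 8 <= 255);
  assert(height_px > 0 && height_px % 8 == 0 && height_px / 8 <= 255);
  std::array<uint8_t, 8> h{};
  h[0] = type_specific;
  h[1] = static_cast<uint8_t>((fragment_offset >> 16) & 0xFF);
  h[2] = static_cast<uint8_t>((fragment_offset >> 8) & 0xFF);
  h[3] = static_cast<uint8_t>(fragment_offset & 0xFF);
  h[4] = type;
  h[5] = q;
  h[6] = static_cast<uint8_t>(width_px / 8);
  h[7] = static_cast<uint8_t>(height_px / 8);
  return h;
}
```

Because width and height are stored in 8-pixel units, this header can describe frames up to 2040 pixels in either dimension.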

Public Functions

inline explicit MjpegPacketizer(const Config &config)

Construct an MJPEG packetizer.

Parameters:

config – Configuration for the packetizer.

virtual std::vector<RtpPayloadChunk> packetize(std::span<const uint8_t> frame_data) override

Packetize a complete JPEG frame into RFC 2435 RTP payload chunks.

Parameters:

frame_data – Raw JPEG data including the JPEG header.

Returns:

Vector of payload chunks ready to be wrapped in RTP packets.

virtual int get_payload_type() const override

Get the RTP payload type for MJPEG.

Returns:

26 (static JPEG payload type).

virtual uint32_t get_clock_rate() const override

Get the RTP clock rate for MJPEG.

Returns:

90000 Hz.

virtual std::string get_sdp_media_attributes() const override

Get the SDP media attributes for MJPEG.

Returns:

SDP rtpmap attribute string.

virtual std::string get_sdp_media_line() const override

Get the SDP media line for MJPEG.

Returns:

SDP media description line.

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only logger calls with the _rate_limit suffix are rate limited.

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for the MJPEG packetizer.

Public Members

size_t max_payload_size = {1400}

Maximum payload bytes per RTP packet.

espp::Logger::Verbosity log_level = {espp::Logger::Verbosity::WARN}

Log verbosity level.

Header File

Classes

class MjpegDepacketizer : public espp::RtpDepacketizer

MJPEG depacketizer that reassembles JPEG frames from RTP packets.

This class receives individual RTP packets containing RFC 2435 MJPEG payloads, reassembles the scan data fragments, reconstructs the JPEG header from the MJPEG header fields, and delivers complete JPEG frames through callbacks.

Public Types

using jpeg_frame_callback_t = std::function<void(std::shared_ptr<JpegFrame>)>

Callback type for receiving complete JPEG frames as JpegFrame objects.

using frame_callback_t = std::function<void(std::vector<uint8_t>&&)>

Callback type invoked when a complete frame has been reassembled. The frame data is moved into the callback to avoid copies.

Public Functions

inline explicit MjpegDepacketizer(const Config &config)

Construct an MJPEG depacketizer.

Parameters:

config – Configuration for the depacketizer.

virtual void process_packet(const RtpPacket &packet) override

Process an incoming RTP packet containing MJPEG data.

Note

Packets are parsed as RtpJpegPacket. When a complete frame is assembled (marker bit set and no missing sequence numbers), both the generic frame callback and the JPEG frame callback are invoked.

Parameters:

packet – The RTP packet to process.
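The source does not describe how missing sequence numbers are detected. One common approach, shown here as a sketch under that assumption, is to check that consecutive packets differ by exactly one modulo 2^16. `sequence_is_contiguous` is a hypothetical helper and assumes the packets are listed in arrival order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of ESPP): returns true when every sequence
// number follows its predecessor by exactly one, allowing for the wrap from
// 65535 back to 0.
inline bool sequence_is_contiguous(const std::vector<uint16_t> &seqs) {
  assert(!seqs.empty()); // a frame consists of at least one packet
  for (size_t i = 1; i < seqs.size(); ++i) {
    // The cast performs the subtraction modulo 2^16.
    const uint16_t step = static_cast<uint16_t>(seqs[i] - seqs[i - 1]);
    if (step != 1) {
      return false;
    }
  }
  return true;
}
```

A check like this treats reordered packets the same way as lost ones, which is usually acceptable for live video where a late fragment is of little use.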

void set_jpeg_frame_callback(jpeg_frame_callback_t cb)

Set callback for receiving complete JPEG frames.

Parameters:

cb – Callback receiving a shared pointer to the completed JpegFrame.

inline void set_frame_callback(frame_callback_t cb)

Set the callback for completed frames.

Parameters:

cb – The callback to invoke when a full frame is ready.

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only logger calls with the _rate_limit suffix are rate limited.

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for the MJPEG depacketizer.

Public Members

espp::Logger::Verbosity log_level = {espp::Logger::Verbosity::WARN}

Log verbosity level.

Header File

Classes

class H264Packetizer : public espp::RtpPacketizer

RTP packetizer for H.264 video per RFC 6184.

Accepts H.264 access units in Annex B byte-stream format (NAL units separated by 0x00000001 or 0x000001 start codes) and produces a sequence of RTP payload chunks suitable for transmission.

Supports two NAL-unit packetization strategies:

  • **Single NAL unit mode** — NAL fits within max_payload_size.

  • **FU-A fragmentation** — NAL exceeds max_payload_size (packetization_mode >= 1).

Example

    // Synthetic SPS and PPS (minimal valid-ish NAL units)
    std::vector<uint8_t> sps = {0x67, 0x42, 0xC0, 0x1E, 0xD9, 0x00, 0xA0, 0x47, 0xFE, 0xC8};
    std::vector<uint8_t> pps = {0x68, 0xCE, 0x38, 0x80};

    espp::H264Packetizer h264_packer({
        .max_payload_size = 1400,
        .payload_type = 96,
        .profile_level_id = "42C01E",
        .packetization_mode = 1,
        .sps = sps,
        .pps = pps,
    });

    check(h264_packer.get_payload_type() == 96, "H264Packetizer: payload type is 96");
    check(h264_packer.get_clock_rate() == 90000, "H264Packetizer: clock rate is 90000");

    auto sdp_attrs = h264_packer.get_sdp_media_attributes();
    check(sdp_attrs.find("H264/90000") != std::string::npos,
          "H264Packetizer: SDP contains H264/90000");
    check(sdp_attrs.find("profile-level-id=42C01E") != std::string::npos,
          "H264Packetizer: SDP contains profile-level-id");
    check(sdp_attrs.find("sprop-parameter-sets=") != std::string::npos,
          "H264Packetizer: SDP contains SPS/PPS base64");

    // Create a synthetic H.264 access unit in Annex B format:
    // Start code + SPS + Start code + PPS + Start code + small IDR slice
    std::vector<uint8_t> annex_b_frame;
    // SPS NAL
    annex_b_frame.insert(annex_b_frame.end(), {0x00, 0x00, 0x00, 0x01});
    annex_b_frame.insert(annex_b_frame.end(), sps.begin(), sps.end());
    // PPS NAL
    annex_b_frame.insert(annex_b_frame.end(), {0x00, 0x00, 0x00, 0x01});
    annex_b_frame.insert(annex_b_frame.end(), pps.begin(), pps.end());
    // IDR slice NAL (type 5) — fill with dummy data
    annex_b_frame.insert(annex_b_frame.end(), {0x00, 0x00, 0x00, 0x01});
    annex_b_frame.push_back(0x65); // NAL header: type=5 (IDR)
    for (int i = 0; i < 100; i++) {
      annex_b_frame.push_back(static_cast<uint8_t>(i & 0xFF));
    }

    auto chunks =
        h264_packer.packetize(std::span<const uint8_t>(annex_b_frame.data(), annex_b_frame.size()));
    check(!chunks.empty(), "H264Packetizer: produced chunks from Annex B data");
    check(chunks.back().marker, "H264Packetizer: last chunk has marker bit");

    // With small NALs, all should be single NAL mode (no FU-A needed)
    check(chunks.size() == 3, "H264Packetizer: 3 chunks for SPS+PPS+IDR");

Note

This class does not manage RTP headers (sequence numbers, timestamps, SSRC). The caller wraps each returned chunk into an RtpPacket.

Public Functions

explicit H264Packetizer(const Config &config)

Construct an H264Packetizer.

Parameters:

config – The configuration for the packetizer.

~H264Packetizer() override = default

Destructor.

virtual std::vector<RtpPayloadChunk> packetize(std::span<const uint8_t> frame_data) override

Packetize a complete H.264 access unit (Annex B format).

The input may contain multiple NAL units separated by 3-byte or 4-byte start codes. Each NAL is individually packetized (single NAL or FU-A). The marker bit is set on the last chunk of the last NAL unit in the access unit.

Parameters:

frame_data – Raw Annex B byte-stream of one access unit.

Returns:

Vector of RTP payload chunks ready for transmission.
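The start-code scan described above can be sketched as follows. This is a standalone illustration rather than ESPP's code; `split_annex_b` is a hypothetical helper that uses `std::vector` in place of `std::span` so that it also builds with pre-C++20 compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of ESPP): splits an Annex B byte stream into
// NAL units with the start codes removed. Both 00 00 01 and 00 00 00 01 are
// accepted as start codes.
inline std::vector<std::vector<uint8_t>> split_annex_b(const std::vector<uint8_t> &data) {
  std::vector<std::vector<uint8_t>> nals;
  const size_t n = data.size();
  size_t nal_start = n; // n means "no NAL unit opened yet"
  size_t i = 0;
  while (i + 3 <= n) {
    if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
      if (nal_start != n) {
        size_t end = i;
        // Zero bytes directly before 00 00 01 belong to the next start code
        // (a NAL unit never ends in 0x00), so drop them.
        while (end > nal_start && data[end - 1] == 0) {
          --end;
        }
        if (end > nal_start) {
          nals.emplace_back(data.begin() + nal_start, data.begin() + end);
        }
      }
      nal_start = i + 3;
      i += 3;
    } else {
      ++i;
    }
  }
  if (nal_start < n) {
    nals.emplace_back(data.begin() + nal_start, data.end());
  }
  return nals;
}
```

Each returned vector begins with the NAL header byte, which is the form that a per-NAL packetization step expects.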

std::vector<RtpPayloadChunk> packetize_nal(std::span<const uint8_t> nal_data, bool is_last_nal = true)

Packetize a single pre-parsed NAL unit (no start code prefix).

Parameters:
  • nal_data – The raw NAL unit bytes (including NAL header byte).

  • is_last_nal – If true, the marker bit is set on the last chunk.

Returns:

Vector of RTP payload chunks for this NAL.
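To make the single-NAL versus FU-A decision concrete, here is a minimal sketch of RFC 6184 FU-A fragmentation. It is not ESPP's implementation: `Chunk` is a hypothetical stand-in for `RtpPayloadChunk`, `fu_a_packetize` is a hypothetical name, and the behaviour for oversized NAL units when `packetization_mode` is 0 is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for RtpPayloadChunk: payload bytes plus marker flag.
struct Chunk {
  std::vector<uint8_t> data;
  bool marker;
};

// Hypothetical helper (not part of ESPP): packetizes one NAL unit (header
// byte included, no start code) as a single chunk or as FU-A fragments.
inline std::vector<Chunk> fu_a_packetize(const std::vector<uint8_t> &nal,
                                         size_t max_payload_size, bool is_last_nal) {
  assert(!nal.empty() && max_payload_size > 2);
  std::vector<Chunk> out;
  if (nal.size() <= max_payload_size) {
    Chunk single;
    single.data = nal; // single NAL unit mode: payload is the NAL itself
    single.marker = is_last_nal;
    out.push_back(std::move(single));
    return out;
  }
  // FU indicator keeps F and NRI from the NAL header and uses type 28 (FU-A).
  const uint8_t indicator = static_cast<uint8_t>((nal[0] & 0xE0) | 28);
  const uint8_t nal_type = static_cast<uint8_t>(nal[0] & 0x1F);
  const size_t step = max_payload_size - 2; // room left after the two FU bytes
  // The original NAL header byte is not copied; it is rebuilt by the receiver.
  for (size_t off = 1; off < nal.size(); off += step) {
    const size_t len = std::min(step, nal.size() - off);
    const bool first = (off == 1);
    const bool last = (off + len == nal.size());
    uint8_t fu_header = nal_type;
    if (first) fu_header |= 0x80; // S (start) bit
    if (last) fu_header |= 0x40;  // E (end) bit
    Chunk c;
    c.data.reserve(len + 2);
    c.data.push_back(indicator);
    c.data.push_back(fu_header);
    c.data.insert(c.data.end(), nal.begin() + off, nal.begin() + off + len);
    c.marker = last && is_last_nal;
    out.push_back(std::move(c));
  }
  return out;
}
```

For an 11-byte IDR NAL (header 0x65) and a 6-byte limit, this yields three fragments with FU indicator 0x7C and FU headers 0x85, 0x05, and 0x45.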

void set_sps_pps(std::span<const uint8_t> sps, std::span<const uint8_t> pps)

Update the SPS and PPS used for SDP generation.

Parameters:
  • sps – Sequence Parameter Set raw bytes.

  • pps – Picture Parameter Set raw bytes.

virtual int get_payload_type() const override

Get the RTP payload type.

Returns:

The dynamic payload type configured for H.264.

virtual uint32_t get_clock_rate() const override

Get the RTP clock rate for H.264 video.

Returns:

90000 (fixed for H.264).

virtual std::string get_sdp_media_attributes() const override

Get the SDP attribute lines for H.264.

Returns:

SDP a= lines (rtpmap and fmtp) without trailing CRLF.
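As an illustration of what such attribute lines can look like, the following sketch builds an `a=rtpmap` and an `a=fmtp` line in the style of RFC 6184, with the SPS and PPS base64-encoded into `sprop-parameter-sets`. The helper names, the parameter order, and the CRLF between the two lines are assumptions; the exact string ESPP emits may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of ESPP): standard base64 with '=' padding.
inline std::string base64_encode(const std::vector<uint8_t> &in) {
  static const char kTable[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  for (size_t i = 0; i < in.size(); i += 3) {
    const size_t n = std::min<size_t>(3, in.size() - i); // bytes in this group
    uint32_t v = static_cast<uint32_t>(in[i]) << 16;
    if (n > 1) v |= static_cast<uint32_t>(in[i + 1]) << 8;
    if (n > 2) v |= in[i + 2];
    out += kTable[(v >> 18) & 63];
    out += kTable[(v >> 12) & 63];
    out += (n > 1) ? kTable[(v >> 6) & 63] : '=';
    out += (n > 2) ? kTable[v & 63] : '=';
  }
  return out;
}

// Hypothetical helper: rtpmap + fmtp lines, joined by CRLF, no trailing CRLF.
inline std::string h264_sdp_attributes(int payload_type, int packetization_mode,
                                       const std::string &profile_level_id,
                                       const std::vector<uint8_t> &sps,
                                       const std::vector<uint8_t> &pps) {
  assert(payload_type >= 0 && payload_type <= 127);
  const std::string pt = std::to_string(payload_type);
  return "a=rtpmap:" + pt + " H264/90000\r\n" +
         "a=fmtp:" + pt + " packetization-mode=" + std::to_string(packetization_mode) +
         ";profile-level-id=" + profile_level_id +
         ";sprop-parameter-sets=" + base64_encode(sps) + "," + base64_encode(pps);
}
```

Publishing the parameter sets in the SDP lets a client configure its decoder before the first in-band SPS and PPS arrive.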

virtual std::string get_sdp_media_line() const override

Get the SDP m= media line for H.264.

Returns:

SDP m= line without trailing CRLF.

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only logger calls with the _rate_limit suffix are rate limited.

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for the H264Packetizer.

Public Members

size_t max_payload_size = {1400}

Maximum payload bytes per RTP packet.

int payload_type = {96}

Dynamic RTP payload type (typically 96–127).

std::string profile_level_id

H.264 profile-level-id hex string, e.g. “42C01E”.

int packetization_mode = {1}

0 = single NAL only, 1 = non-interleaved (FU-A allowed).

std::vector<uint8_t> sps

Sequence Parameter Set raw bytes (without start code).

std::vector<uint8_t> pps

Picture Parameter Set raw bytes (without start code).

espp::Logger::Verbosity log_level = {espp::Logger::Verbosity::WARN}

Log verbosity level.

Header File

Classes

class H264Depacketizer : public espp::RtpDepacketizer

RTP depacketizer for H.264 video per RFC 6184.

Reassembles H.264 access units from incoming RTP packets. Supports:

  • **Single NAL unit** packets (NAL type 1–23)

  • **STAP-A** aggregation packets (NAL type 24)

  • **FU-A** fragmentation packets (NAL type 28)

When the RTP marker bit is set, the accumulated NAL units are delivered as one Annex B byte-stream (each NAL prefixed with 0x00 0x00 0x00 0x01) via the frame callback set with set_frame_callback().

Example

    espp::H264Depacketizer h264_depacker(espp::H264Depacketizer::Config{});

    bool frame_received = false;
    size_t frame_size = 0;

    h264_depacker.set_frame_callback([&](std::vector<uint8_t> &&data) {
      frame_received = true;
      frame_size = data.size();
      logger.info("H264Depacketizer: got frame of {} bytes", data.size());
      // Verify Annex B start codes are present
      if (data.size() >= 4) {
        bool has_start_code =
            (data[0] == 0x00 && data[1] == 0x00 && data[2] == 0x00 && data[3] == 0x01);
        logger.info("H264Depacketizer: Annex B start code present: {}", has_start_code);
      }
    });

    // Create synthetic single NAL packets and feed them
    std::vector<uint8_t> sps = {0x67, 0x42, 0xC0, 0x1E, 0xD9};
    std::vector<uint8_t> pps = {0x68, 0xCE, 0x38, 0x80};
    std::vector<uint8_t> idr = {0x65, 0x01, 0x02, 0x03, 0x04};

    auto make_rtp = [](const std::vector<uint8_t> &payload, int pt, uint16_t seq, bool marker) {
      espp::RtpPacket pkt(payload.size());
      pkt.set_version(2);
      pkt.set_payload_type(pt);
      pkt.set_sequence_number(seq);
      pkt.set_timestamp(0);
      pkt.set_ssrc(54321);
      pkt.set_marker(marker);
      pkt.set_payload(std::span<const uint8_t>(payload));
      pkt.serialize();
      return pkt;
    };

    h264_depacker.process_packet(make_rtp(sps, 96, 0, false));
    h264_depacker.process_packet(make_rtp(pps, 96, 1, false));
    h264_depacker.process_packet(make_rtp(idr, 96, 2, true)); // marker = end of AU

    check(frame_received, "H264Depacketizer: frame callback invoked");
    // Expected: 3 NALs with start codes = 3*(4) + 5+4+5 = 26 bytes
    check(frame_size > 0, "H264Depacketizer: frame has data");

Public Types

using frame_callback_t = std::function<void(std::vector<uint8_t>&&)>

Callback type invoked when a complete frame has been reassembled. The frame data is moved into the callback to avoid copies.

Public Functions

explicit H264Depacketizer(const Config &config)

Construct an H264Depacketizer.

Parameters:

config – The configuration for the depacketizer.

~H264Depacketizer() override = default

Destructor.

virtual void process_packet(const RtpPacket &packet) override

Process an incoming RTP packet containing H.264 payload.

Handles single NAL, STAP-A, and FU-A packet types. NAL units are buffered until the RTP marker bit indicates the end of an access unit, at which point the complete Annex B frame is delivered via the callback.

Parameters:

packet – The RTP packet to process.
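Of the three packet types, STAP-A is the least obvious on the wire: one header byte with type 24, followed by repeated pairs of a 16-bit big-endian size and a NAL unit. The sketch below unpacks such a payload into Annex B form. It is a standalone illustration of the RFC 6184 layout, and `stap_a_to_annex_b` is a hypothetical helper, not an ESPP function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of ESPP): converts a STAP-A RTP payload into
// an Annex B byte stream, prefixing every aggregated NAL unit with
// 00 00 00 01. Returns an empty vector if the payload is not STAP-A.
inline std::vector<uint8_t> stap_a_to_annex_b(const std::vector<uint8_t> &payload) {
  static const uint8_t kStartCode[4] = {0x00, 0x00, 0x00, 0x01};
  std::vector<uint8_t> out;
  if (payload.empty() || (payload[0] & 0x1F) != 24) {
    return out;
  }
  size_t pos = 1; // skip the STAP-A header byte
  while (pos + 2 <= payload.size()) {
    const size_t nal_size = (static_cast<size_t>(payload[pos]) << 8) | payload[pos + 1];
    pos += 2;
    if (nal_size == 0 || pos + nal_size > payload.size()) {
      break; // malformed or truncated: keep what was parsed so far
    }
    out.insert(out.end(), kStartCode, kStartCode + 4);
    out.insert(out.end(), payload.begin() + pos, payload.begin() + pos + nal_size);
    pos += nal_size;
  }
  return out;
}
```

Aggregation is typically used for small NAL units such as SPS and PPS, so that they share one RTP packet instead of occupying one each.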

inline void set_frame_callback(frame_callback_t cb)

Set the callback for completed frames.

Parameters:

cb – The callback to invoke when a full frame is ready.

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only logger calls with the _rate_limit suffix are rate limited.

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for the H264Depacketizer.

Public Members

espp::Logger::Verbosity log_level = {espp::Logger::Verbosity::WARN}

Log verbosity level.

Header File

Classes

class GenericPacketizer : public espp::RtpPacketizer

A generic RTP packetizer suitable for audio codecs (PCM, G.711, Opus, etc.) or any pre-formatted data that simply needs MTU-based chunking. It splits frame data into chunks of at most max_payload_size bytes and marks the last chunk with the RTP marker bit.

Example

    espp::GenericPacketizer generic_packer({
        .max_payload_size = 500,
        .payload_type = 97,
        .clock_rate = 48000,
        .encoding_name = "opus",
        .channels = 2,
        .fmtp = {},
        .media_type = espp::MediaType::AUDIO,
    });

    check(generic_packer.get_payload_type() == 97, "GenericPacketizer: payload type is 97");
    check(generic_packer.get_clock_rate() == 48000, "GenericPacketizer: clock rate is 48000");

    auto sdp_line = generic_packer.get_sdp_media_line();
    check(sdp_line.find("m=audio") != std::string::npos,
          "GenericPacketizer: SDP media line is audio");

    auto sdp_attrs = generic_packer.get_sdp_media_attributes();
    check(sdp_attrs.find("opus/48000/2") != std::string::npos,
          "GenericPacketizer: SDP has encoding/rate/channels");

    // Packetize 1200 bytes of synthetic audio
    std::vector<uint8_t> audio_data(1200, 0xAB);
    auto chunks =
        generic_packer.packetize(std::span<const uint8_t>(audio_data.data(), audio_data.size()));
    check(chunks.size() == 3, "GenericPacketizer: 1200 bytes @ 500 MTU = 3 chunks");
    check(chunks.back().marker, "GenericPacketizer: last chunk has marker");
    check(!chunks.front().marker, "GenericPacketizer: first chunk has no marker");

Public Functions

explicit GenericPacketizer(const Config &config)

Construct a GenericPacketizer.

Parameters:

config – The configuration for this packetizer.

~GenericPacketizer() override = default

Destructor.

virtual std::vector<RtpPayloadChunk> packetize(std::span<const uint8_t> frame_data) override

Split frame data into RTP payload chunks of at most max_payload_size. The last (or only) chunk has its marker flag set.

Parameters:

frame_data – The raw frame bytes to packetize.

Returns:

A vector of RtpPayloadChunk ready to be wrapped in RTP packets.
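The chunking rule is simple enough to sketch in a few lines. The version below is a standalone illustration, not ESPP's code: `Chunk` is a hypothetical stand-in for `RtpPayloadChunk`, and returning no chunks for an empty frame is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for RtpPayloadChunk: payload bytes plus marker flag.
struct Chunk {
  std::vector<uint8_t> data;
  bool marker;
};

// Hypothetical helper (not part of ESPP): splits a frame into chunks of at
// most max_payload_size bytes and sets the marker on the final chunk only.
inline std::vector<Chunk> chunk_frame(const std::vector<uint8_t> &frame,
                                      size_t max_payload_size) {
  assert(max_payload_size > 0);
  std::vector<Chunk> chunks;
  for (size_t off = 0; off < frame.size(); off += max_payload_size) {
    const size_t len = std::min(max_payload_size, frame.size() - off);
    Chunk c;
    c.data.assign(frame.begin() + off, frame.begin() + off + len);
    c.marker = (off + len == frame.size());
    chunks.push_back(std::move(c));
  }
  return chunks;
}
```

A 1200-byte frame with a 500-byte limit produces chunks of 500, 500, and 200 bytes, with the marker set only on the last one.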

virtual int get_payload_type() const override

Get the RTP payload type number.

Returns:

The configured RTP payload type.

virtual uint32_t get_clock_rate() const override

Get the RTP clock rate.

Returns:

The configured clock rate in Hz.

virtual std::string get_sdp_media_attributes() const override

Generate the SDP media-level attribute lines for this codec. Produces an a=rtpmap line and, if fmtp is non-empty, an a=fmtp line.

Returns:

A string containing the SDP a= lines.

virtual std::string get_sdp_media_line() const override

Generate the SDP m= line for this codec.

Returns:

A string such as “m=audio 0 RTP/AVP 96”.
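A rough sketch of how such SDP lines can be assembled from the configuration values is shown below. The helper names are hypothetical, and two details are assumptions rather than documented ESPP behaviour: the channel count is only appended when it is greater than one (SDP allows omitting it for mono), and the rtpmap and fmtp lines are joined with CRLF.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ESPP): builds the m= line. Port 0 is used
// because the actual ports are negotiated later during RTSP SETUP.
inline std::string sdp_media_line(const std::string &media, int payload_type) {
  assert(payload_type >= 0 && payload_type <= 127);
  return "m=" + media + " 0 RTP/AVP " + std::to_string(payload_type);
}

// Hypothetical helper: builds the a=rtpmap line and, when fmtp is non-empty,
// an a=fmtp line. No trailing CRLF is added.
inline std::string sdp_media_attributes(int payload_type, const std::string &encoding_name,
                                        uint32_t clock_rate, int channels,
                                        const std::string &fmtp) {
  assert(payload_type >= 0 && payload_type <= 127);
  const std::string pt = std::to_string(payload_type);
  std::string s = "a=rtpmap:" + pt + " " + encoding_name + "/" + std::to_string(clock_rate);
  if (channels > 1) {
    s += "/" + std::to_string(channels); // assumption: omitted for mono
  }
  if (!fmtp.empty()) {
    s += "\r\na=fmtp:" + pt + " " + fmtp;
  }
  return s;
}
```

For a stereo Opus track on payload type 97 this yields `a=rtpmap:97 opus/48000/2`, which is the substring the example above checks for.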

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only logger calls with the _rate_limit suffix are rate limited.

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for GenericPacketizer.

Public Members

size_t max_payload_size = {1400}

Maximum payload bytes per RTP packet.

int payload_type = {96}

RTP payload type number.

uint32_t clock_rate = {48000}

Clock rate in Hz for RTP timestamps.

std::string encoding_name = {"L16"}

Encoding name for SDP rtpmap line.

int channels = {1}

Number of audio channels.

std::string fmtp

Optional format parameters for SDP fmtp line.

espp::MediaType media_type = {espp::MediaType::AUDIO}

Media type for the SDP m= line.

espp::Logger::Verbosity log_level = {espp::Logger::Verbosity::WARN}

Log verbosity level.

Header File

Classes

class GenericDepacketizer : public espp::RtpDepacketizer

A generic RTP depacketizer that reassembles media frames from incoming RTP packets. It accumulates payload data until a packet with the marker bit set is received, then delivers the complete frame via the frame callback. If a packet arrives with a different RTP timestamp than the current accumulation buffer, the old buffer is discarded and a new one is started.

This is suitable for audio codecs (PCM, G.711, Opus, etc.) or any payload format that uses simple marker-based framing.

Example

    espp::GenericDepacketizer generic_depacker(espp::GenericDepacketizer::Config{});

    bool audio_frame_received = false;
    size_t audio_frame_size = 0;

    generic_depacker.set_frame_callback([&](std::vector<uint8_t> &&data) {
      audio_frame_received = true;
      audio_frame_size = data.size();
    });

    // Packetize and depacketize audio data
    espp::GenericPacketizer generic_packer({.max_payload_size = 500,
                                            .payload_type = 97,
                                            .clock_rate = 48000,
                                            .encoding_name = "L16",
                                            .channels = 1,
                                            .fmtp = {},
                                            .media_type = espp::MediaType::AUDIO});
    std::vector<uint8_t> audio_data(1200, 0xCD);
    auto chunks =
        generic_packer.packetize(std::span<const uint8_t>(audio_data.data(), audio_data.size()));

    uint16_t seq = 0;
    for (auto &chunk : chunks) {
      espp::RtpPacket pkt(chunk.data.size());
      pkt.set_version(2);
      pkt.set_payload_type(97);
      pkt.set_sequence_number(seq++);
      pkt.set_timestamp(1000);
      pkt.set_ssrc(77777);
      pkt.set_marker(chunk.marker);
      pkt.set_payload(std::span<const uint8_t>(chunk.data));
      pkt.serialize();
      generic_depacker.process_packet(pkt);
    }

    check(audio_frame_received, "GenericDepacketizer: frame callback invoked");
    check(audio_frame_size == 1200, "GenericDepacketizer: round-trip frame size matches");

Public Types

using frame_callback_t = std::function<void(std::vector<uint8_t>&&)>

Callback type invoked when a complete frame has been reassembled. The frame data is moved into the callback to avoid copies.

Public Functions

explicit GenericDepacketizer(const Config &config)

Construct a GenericDepacketizer.

Parameters:

config – The configuration for this depacketizer.

~GenericDepacketizer() override = default

Destructor.

virtual void process_packet(const RtpPacket &packet) override

Process an incoming RTP packet. Payload data is accumulated until a packet with the marker bit set is received. At that point the assembled frame is delivered via the frame callback and the buffer is reset.

Parameters:

packet – The RTP packet to process.
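The accumulate-until-marker behaviour, including the discard on a timestamp change, can be sketched as a small standalone class. `MarkerReassembler` is a hypothetical name and takes the relevant packet fields directly instead of an `RtpPacket`; it illustrates the technique and is not ESPP's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical class (not part of ESPP): marker-based frame reassembly.
class MarkerReassembler {
public:
  using frame_callback_t = std::function<void(std::vector<uint8_t> &&)>;

  void set_frame_callback(frame_callback_t cb) { callback_ = std::move(cb); }

  // Feed the timestamp, marker flag, and payload of one RTP packet.
  void process(uint32_t timestamp, bool marker, const std::vector<uint8_t> &payload) {
    // A new timestamp while data is pending means the previous frame never
    // completed, so the stale bytes are dropped.
    if (!buffer_.empty() && timestamp != current_timestamp_) {
      buffer_.clear();
    }
    current_timestamp_ = timestamp;
    buffer_.insert(buffer_.end(), payload.begin(), payload.end());
    if (marker) {
      std::vector<uint8_t> frame;
      frame.swap(buffer_); // leaves buffer_ empty for the next frame
      if (callback_) {
        callback_(std::move(frame));
      }
    }
  }

private:
  frame_callback_t callback_;
  std::vector<uint8_t> buffer_;
  uint32_t current_timestamp_ = 0;
};
```

Swapping the buffer out before invoking the callback keeps the reassembler in a clean state even if the callback feeds further packets back into it.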

inline void set_frame_callback(frame_callback_t cb)

Set the callback for completed frames.

Parameters:

cb – The callback to invoke when a full frame is ready.

inline const std::string &get_name() const

Get the name of the component

Note

This is the tag of the logger

Returns:

A const reference to the name of the component

inline void set_log_tag(const std::string_view &tag)

Set the tag for the logger

Parameters:

tag – The tag to use for the logger

inline espp::Logger::Verbosity get_log_level() const

Get the log level for the logger

Returns:

The verbosity level of the logger

inline void set_log_level(espp::Logger::Verbosity level)

Set the log level for the logger

Parameters:

level – The verbosity level to use for the logger

inline void set_log_verbosity(espp::Logger::Verbosity level)

Set the log verbosity for the logger

See also

set_log_level

Note

This is a convenience method that calls set_log_level

Parameters:

level – The verbosity level to use for the logger

inline espp::Logger::Verbosity get_log_verbosity() const

Get the log verbosity for the logger

See also

get_log_level

Note

This is a convenience method that calls get_log_level

Returns:

The verbosity level of the logger

inline void set_log_rate_limit(std::chrono::duration<float> rate_limit)

Set the rate limit for the logger

Note

Only logger calls with the _rate_limit suffix are rate limited.

Parameters:

rate_limit – The rate limit to use for the logger

struct Config

Configuration for GenericDepacketizer.

Public Members

espp::Logger::Verbosity log_level = {espp::Logger::Verbosity::WARN}

Log verbosity level.

Header File

Classes

class RtpPacket

RtpPacket parses and serializes RTP packets. The RTP header fields are stored in the class and can be modified. The payload is stored in the packet_ vector and can also be modified.

Subclassed by espp::RtpJpegPacket

Public Functions

RtpPacket()

Construct an empty RtpPacket. The packet_ vector is empty and the header fields are set to 0.

explicit RtpPacket(size_t payload_size)

Construct an RtpPacket with a payload of size payload_size. The packet_ vector is resized to RTP_HEADER_SIZE + payload_size.

explicit RtpPacket(std::span<const uint8_t> data)

Construct an RtpPacket from a span of bytes. Stores the bytes in the packet_ vector and parses the header.

Parameters:

data – The span of bytes to parse.

~RtpPacket()

Destructor.

int get_version() const

Get the RTP version.

Returns:

The RTP version.

bool get_padding() const

Get the padding flag.

Returns:

The padding flag.

bool get_extension() const

Get the extension flag.

Returns:

The extension flag.

int get_csrc_count() const

Get the CSRC count.

Returns:

The CSRC count.

bool get_marker() const

Get the marker flag.

Returns:

The marker flag.

int get_payload_type() const

Get the payload type.

Returns:

The payload type.

int get_sequence_number() const

Get the sequence number.

Returns:

The sequence number.

int get_timestamp() const

Get the timestamp.

Returns:

The timestamp.

int get_ssrc() const

Get the SSRC.

Returns:

The SSRC.

void set_version(int version)

Set the RTP version.

Parameters:

version – The RTP version to set.

void set_padding(bool padding)

Set the padding flag.

Parameters:

padding – The padding flag to set.

void set_extension(bool extension)

Set the extension flag.

Parameters:

extension – The extension flag to set.

void set_csrc_count(int csrc_count)

Set the CSRC count.

Parameters:

csrc_count – The CSRC count to set.

void set_marker(bool marker)

Set the marker flag.

Parameters:

marker – The marker flag to set.

void set_payload_type(int payload_type)

Set the payload type.

Parameters:

payload_type – The payload type to set.

void set_sequence_number(int sequence_number)

Set the sequence number.

Parameters:

sequence_number – The sequence number to set.

void set_timestamp(int timestamp)

Set the timestamp.

Parameters:

timestamp – The timestamp to set.

void set_ssrc(int ssrc)

Set the SSRC.

Parameters:

ssrc – The SSRC to set.

void serialize()

Serialize the RTP header.

Note

This method should be called after modifying the RTP header fields.

Note

This method does not serialize the payload. To set the payload, use set_payload(). To get the payload, use get_payload().
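As an illustration of what serializing the header involves, the following sketch writes the fixed RTP header fields into the first 12 bytes of a packet buffer and leaves the payload untouched. The function `write_rtp_header` is a hypothetical stand-in, not the espp method, and it assumes version 2 with no padding, no extension, and no CSRC entries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Write the fixed 12-byte RTP header (RFC 3550) at the front of `packet`.
// Bytes after the header (the payload) are not modified.
void write_rtp_header(std::vector<uint8_t> &packet, bool marker, int payload_type,
                      uint16_t sequence_number, uint32_t timestamp, uint32_t ssrc) {
  assert(payload_type >= 0 && payload_type <= 127); // 7-bit field
  if (packet.size() < 12) {
    packet.resize(12);
  }
  packet[0] = 0x80; // version 2, padding 0, extension 0, CSRC count 0
  packet[1] = static_cast<uint8_t>((marker ? 0x80 : 0x00) | (payload_type & 0x7F));
  packet[2] = static_cast<uint8_t>(sequence_number >> 8);
  packet[3] = static_cast<uint8_t>(sequence_number & 0xFF);
  packet[4] = static_cast<uint8_t>(timestamp >> 24);
  packet[5] = static_cast<uint8_t>((timestamp >> 16) & 0xFF);
  packet[6] = static_cast<uint8_t>((timestamp >> 8) & 0xFF);
  packet[7] = static_cast<uint8_t>(timestamp & 0xFF);
  packet[8] = static_cast<uint8_t>(ssrc >> 24);
  packet[9] = static_cast<uint8_t>((ssrc >> 16) & 0xFF);
  packet[10] = static_cast<uint8_t>((ssrc >> 8) & 0xFF);
  packet[11] = static_cast<uint8_t>(ssrc & 0xFF);
}
```

Keeping header serialization separate from the payload is what allows a field such as the sequence number to be changed and re-serialized without copying the media bytes again.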

std::span<const uint8_t> get_data() const

Get a span view of the whole packet.

Note

The span is valid as long as the packet_ vector is not modified.

Note

If you manually build the packet_ vector, you should make sure that you call serialize() before calling this method.

Returns:

A span of the whole packet.

size_t get_rtp_header_size() const

Get the size of the RTP header.

Returns:

The size of the RTP header.

std::span<const uint8_t> get_rtp_header() const

Get a span of bytes of the RTP header.

Returns:

A span of bytes of the RTP header.

std::vector<uint8_t> &get_packet()

Get a reference to the packet_ vector.

Returns:

A reference to the packet_ vector.

std::span<const uint8_t> get_payload() const

Get a span of bytes of the payload.

Returns:

A span of bytes of the payload.

void set_payload(std::span<const uint8_t> payload)

Set the payload.

Parameters:

payload – The payload to set.

Header File

Classes

class RtpJpegPacket : public espp::RtpPacket

RTP packet for JPEG video. The RTP payload for JPEG is defined in RFC 2435.
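For orientation, RFC 2435 places an 8-byte main JPEG header at the start of the RTP payload. The sketch below decodes that header from raw payload bytes. The names `JpegMainHeader` and `parse_jpeg_main_header` are hypothetical, and converting width and height to pixels is a choice made here for readability (the wire format stores them in units of 8 pixels); it is not a statement about what the espp getters return.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct mirroring the RFC 2435 main JPEG header fields.
struct JpegMainHeader {
  int type_specific = 0;
  uint32_t fragment_offset = 0; // 24-bit byte offset into the frame's scan data
  int type = 0;
  int q = 0;
  int width_px = 0;
  int height_px = 0;
};

// Decode the 8-byte main JPEG header from the start of an RTP payload
// (the bytes that follow the RTP header). Returns false if too short.
bool parse_jpeg_main_header(const std::vector<uint8_t> &payload, JpegMainHeader &out) {
  if (payload.size() < 8) {
    return false;
  }
  out.type_specific = payload[0];
  out.fragment_offset = (uint32_t(payload[1]) << 16) | (uint32_t(payload[2]) << 8) |
                        uint32_t(payload[3]);
  out.type = payload[4];
  out.q = payload[5];
  out.width_px = payload[6] * 8;  // stored on the wire as width / 8
  out.height_px = payload[7] * 8; // stored on the wire as height / 8
  return true;
}
```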

Public Functions

inline explicit RtpJpegPacket(std::span<const uint8_t> data)

Construct an RTP packet from a buffer.

Parameters:

data – The buffer containing the RTP packet.

inline explicit RtpJpegPacket(const int type_specific, const int frag_type, const int q, const int width, const int height, std::span<const uint8_t> q0, std::span<const uint8_t> q1, std::span<const uint8_t> scan_data)

Construct an RTP packet from fields.

This will construct a packet with quantization tables, so it can only be used for the first packet in a frame.

Parameters:
  • type_specific – The type-specific field.

  • frag_type – The fragment type field.

  • q – The q field.

  • width – The width field.

  • height – The height field.

  • q0 – The first quantization table.

  • q1 – The second quantization table.

  • scan_data – The scan data.

inline explicit RtpJpegPacket(const int type_specific, const int offset, const int frag_type, const int q, const int width, const int height, std::span<const uint8_t> scan_data)

Construct an RTP packet from fields.

This will construct a packet without quantization tables, so it cannot be used for the first packet in a frame.

Parameters:
  • type_specific – The type-specific field.

  • offset – The offset field.

  • frag_type – The fragment type field.

  • q – The q field.

  • width – The width field.

  • height – The height field.

  • scan_data – The scan data.

inline int get_type_specific() const

Get the type-specific field.

Returns:

The type-specific field.

inline int get_offset() const

Get the offset field.

Returns:

The offset field.

inline int get_q() const

Get the q field.

Returns:

The q field.

inline int get_width() const

Get the width field.

Returns:

The width field.

inline int get_height() const

Get the height field.

Returns:

The height field.

inline std::span<const uint8_t> get_mjpeg_header() const

Get the MJPEG header.

Returns:

The MJPEG header.

inline bool has_q_tables() const

Get whether the packet contains quantization tables.

Note

The quantization tables are optional. If they are present, the number of quantization tables is always 2.

Note

This check is based on the value of the q field. If the q field is in the range 128-255, the packet contains quantization tables.

Returns:

Whether the packet contains quantization tables.

inline int get_num_q_tables() const

Get the number of quantization tables.

Note

The quantization tables are optional. If they are present, the number of quantization tables is always 2.

Note

Only the first packet in a frame contains quantization tables.

Returns:

The number of quantization tables.

inline std::span<const uint8_t> get_q_table(int index) const

Get the quantization table at the specified index.

Parameters:

index – The index of the quantization table.

Returns:

The quantization table at the specified index.

inline void set_q_table(int index, std::span<const uint8_t> q_table)

Set the quantization table at the specified index.

Note

This will not change the size of the packet. If the index is out of bounds, the quantization table will not be set.

Parameters:
  • index – The index of the quantization table.

  • q_table – The quantization table to set.

inline std::span<const uint8_t> get_jpeg_data() const

Get the JPEG data, which is the payload minus the MJPEG header and any quantization tables.

Returns:

The JPEG data.
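The relationship between the q field, the quantization tables, and the JPEG data can be sketched as an offset calculation over the RTP payload. Per RFC 2435, when Q is 128 or higher and the fragment offset is zero, a 4-byte quantization table header (MBZ, precision, 16-bit length) and the table data follow the 8-byte main header. The function `find_scan_offset` is a hypothetical helper, not an espp API, and it deliberately ignores the restart marker header used by types 64-127.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kJpegMainHeaderSize = 8; // RFC 2435 main JPEG header
constexpr size_t kQTableHeaderSize = 4;   // MBZ, precision, 16-bit length

// Compute where the JPEG scan data starts inside an RTP/JPEG payload.
// Returns false if the payload is shorter than the headers it announces.
bool find_scan_offset(const std::vector<uint8_t> &payload, size_t &scan_offset) {
  if (payload.size() < kJpegMainHeaderSize) {
    return false;
  }
  const uint32_t fragment_offset = (uint32_t(payload[1]) << 16) |
                                   (uint32_t(payload[2]) << 8) | uint32_t(payload[3]);
  const int q = payload[5];
  size_t offset = kJpegMainHeaderSize;
  // In-band tables appear only in the first fragment of a frame.
  if (q >= 128 && fragment_offset == 0) {
    if (payload.size() < offset + kQTableHeaderSize) {
      return false;
    }
    const size_t table_bytes = (size_t(payload[offset + 2]) << 8) | payload[offset + 3];
    offset += kQTableHeaderSize + table_bytes;
    if (payload.size() < offset) {
      return false;
    }
  }
  scan_offset = offset;
  return true;
}
```

With two 64-byte tables, the scan data in a first fragment starts at byte 140 of the payload (8 + 4 + 128); in later fragments it starts at byte 8.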

int get_version() const

Get the RTP version.

Returns:

The RTP version.

bool get_padding() const

Get the padding flag.

Returns:

The padding flag.

bool get_extension() const

Get the extension flag.

Returns:

The extension flag.

int get_csrc_count() const

Get the CSRC count.

Returns:

The CSRC count.

bool get_marker() const

Get the marker flag.

Returns:

The marker flag.

int get_payload_type() const

Get the payload type.

Returns:

The payload type.

int get_sequence_number() const

Get the sequence number.

Returns:

The sequence number.

int get_timestamp() const

Get the timestamp.

Returns:

The timestamp.

int get_ssrc() const

Get the SSRC.

Returns:

The SSRC.

void set_version(int version)

Set the RTP version.

Parameters:

version – The RTP version to set.

void set_padding(bool padding)

Set the padding flag.

Parameters:

padding – The padding flag to set.

void set_extension(bool extension)

Set the extension flag.

Parameters:

extension – The extension flag to set.

void set_csrc_count(int csrc_count)

Set the CSRC count.

Parameters:

csrc_count – The CSRC count to set.

void set_marker(bool marker)

Set the marker flag.

Parameters:

marker – The marker flag to set.

void set_payload_type(int payload_type)

Set the payload type.

Parameters:

payload_type – The payload type to set.

void set_sequence_number(int sequence_number)

Set the sequence number.

Parameters:

sequence_number – The sequence number to set.

void set_timestamp(int timestamp)

Set the timestamp.

Parameters:

timestamp – The timestamp to set.

void set_ssrc(int ssrc)

Set the SSRC.

Parameters:

ssrc – The SSRC to set.

void serialize()

Serialize the RTP header.

Note

This method should be called after modifying the RTP header fields.

Note

This method does not serialize the payload. To set the payload, use set_payload(). To get the payload, use get_payload().

std::span<const uint8_t> get_data() const

Get a span view of the whole packet.

Note

The span is valid as long as the packet_ vector is not modified.

Note

If you manually build the packet_ vector, you should make sure that you call serialize() before calling this method.

Returns:

A span of the whole packet.

size_t get_rtp_header_size() const

Get the size of the RTP header.

Returns:

The size of the RTP header.

std::span<const uint8_t> get_rtp_header() const

Get a span of bytes of the RTP header.

Returns:

A span of bytes of the RTP header.

std::vector<uint8_t> &get_packet()

Get a reference to the packet_ vector.

Returns:

A reference to the packet_ vector.

std::span<const uint8_t> get_payload() const

Get a span of bytes of the payload.

Returns:

A span of bytes of the payload.

void set_payload(std::span<const uint8_t> payload)

Set the payload.

Parameters:

payload – The payload to set.

Header File

Classes

class RtcpPacket

Represents an RTCP packet.

This class serves as the base class for all RTCP packet types.

Note

At the moment, this class is not used.

Public Functions

RtcpPacket() = default

Constructor, default.

virtual ~RtcpPacket() = default

Destructor, default.

std::string_view get_data() const

Get the buffer of the packet.

Returns:

The buffer of the packet

Header File

Classes

class JpegHeader

A class to generate a JPEG header for a given image size and quantization tables. The header is generated once and then cached for future use. The header is generated according to the JPEG standard and is compatible with the ESP32 camera driver.

Public Functions

inline explicit JpegHeader(int width, int height, std::span<const uint8_t> q0_table, std::span<const uint8_t> q1_table)

Create a JPEG header for a given image size and quantization tables.

Parameters:
  • width – The image width in pixels.

  • height – The image height in pixels.

  • q0_table – The quantization table for the Y channel.

  • q1_table – The quantization table for the Cb and Cr channels.
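To show how a quantization table ends up inside a JPEG header, the sketch below builds one DQT (Define Quantization Table) marker segment as specified by the JPEG standard: the marker 0xFFDB, a 16-bit length of 67, one byte holding precision and table id, and 64 table entries. The function `build_dqt_segment` is a hypothetical helper and only covers 8-bit tables; a full header also needs SOI, SOF0, Huffman tables, and SOS segments, which are omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build a DQT marker segment for one 8-bit quantization table.
// `table` must hold 64 entries; `table_id` is 0 for luma, 1 for chroma
// in the common two-table layout.
std::vector<uint8_t> build_dqt_segment(const std::vector<uint8_t> &table, int table_id) {
  assert(table.size() == 64);
  assert(table_id >= 0 && table_id <= 3);
  std::vector<uint8_t> segment;
  segment.reserve(5 + table.size());
  segment.push_back(0xFF); // marker prefix
  segment.push_back(0xDB); // DQT marker
  segment.push_back(0x00); // length high byte
  segment.push_back(0x43); // length low byte: 2 + 1 + 64 = 67
  segment.push_back(static_cast<uint8_t>(table_id & 0x0F)); // precision 0 (8-bit), table id
  segment.insert(segment.end(), table.begin(), table.end());
  return segment;
}
```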

inline explicit JpegHeader(std::span<const uint8_t> data)

Create a JPEG header from existing JPEG header data.
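Parsing existing header data generally means walking the JPEG marker segments. As a rough sketch of that idea, the function below scans from SOI to the baseline SOF0 segment and reads the image dimensions. `find_jpeg_dimensions` is a hypothetical name, and the sketch assumes a baseline JPEG whose segments before SOS all carry a length field; it is not the espp parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Walk marker segments until SOF0 (0xFFC0), then read height and width.
// Returns false if the data is not a JPEG or no SOF0 precedes the scan.
bool find_jpeg_dimensions(const std::vector<uint8_t> &jpeg, int &width, int &height) {
  if (jpeg.size() < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8) {
    return false; // missing SOI marker
  }
  size_t pos = 2;
  while (pos + 4 <= jpeg.size()) {
    if (jpeg[pos] != 0xFF) {
      return false;
    }
    const uint8_t marker = jpeg[pos + 1];
    // The segment length counts its own two bytes but not the marker.
    const size_t seg_len = (size_t(jpeg[pos + 2]) << 8) | jpeg[pos + 3];
    if (seg_len < 2 || pos + 2 + seg_len > jpeg.size()) {
      return false;
    }
    if (marker == 0xC0) {
      if (seg_len < 7) {
        return false;
      }
      // Layout after the length: precision(1), height(2), width(2), ...
      height = (jpeg[pos + 5] << 8) | jpeg[pos + 6];
      width = (jpeg[pos + 7] << 8) | jpeg[pos + 8];
      return true;
    }
    if (marker == 0xDA) {
      return false; // start of scan reached without finding SOF0
    }
    pos += 2 + seg_len;
  }
  return false;
}
```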

inline ~JpegHeader()

Destructor.

inline int get_width() const

Get the image width.

Returns:

The image width in pixels.

inline int get_height() const

Get the image height.

Returns:

The image height in pixels.

inline size_t size() const

Get the size of the JPEG header data.

Note

This is the size of the serialized JPEG header, not the image size.

Returns:

The size of the JPEG header data in bytes.

inline bool is_valid() const

Returns whether this header parsed or serialized successfully.

inline std::span<const uint8_t> get_data() const

Get the JPEG header data.

Returns:

The JPEG header data.

inline std::span<const uint8_t> get_quantization_table(int index) const

Get the quantization table at the specified index.

Parameters:

index – The index of the quantization table.

Returns:

The quantization table.

Header File

Classes

class JpegFrame

A class that represents a complete JPEG frame.

This class is used to collect the JPEG scans that are received in RTP packets and to serialize them into a complete JPEG frame.

Public Functions

inline explicit JpegFrame(const espp::RtpJpegPacket &packet)

Construct a JpegFrame from a RtpJpegPacket.

This constructor will parse the header of the packet and add the JPEG data to the frame.

Parameters:

packet – The packet to parse.

inline explicit JpegFrame(const std::vector<uint8_t> &data)

Construct a JpegFrame from a vector of jpeg data.

Note

The vector must contain the complete JPEG data, including the JPEG header and EOI marker.

Parameters:

data – The vector containing the jpeg data.

inline explicit JpegFrame(std::span<const uint8_t> data)

Construct a JpegFrame from a span of jpeg data.

Note

The span must contain the complete JPEG data, including the JPEG header and EOI marker.

Parameters:

data – The span containing the jpeg data.
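The completeness requirement in the notes above can be checked cheaply before constructing a frame. A minimal sketch, assuming the buffer holds exactly one JPEG image with no trailing bytes: a complete JPEG starts with the SOI marker (0xFFD8) and ends with the EOI marker (0xFFD9). The function `looks_like_complete_jpeg` is a hypothetical helper, not part of espp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Cheap sanity check: the data starts with SOI and ends with EOI.
// This does not validate the segments in between.
bool looks_like_complete_jpeg(const std::vector<uint8_t> &data) {
  const size_t n = data.size();
  return n >= 4 && data[0] == 0xFF && data[1] == 0xD8 &&
         data[n - 2] == 0xFF && data[n - 1] == 0xD9;
}
```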

inline explicit JpegFrame(const uint8_t *data, size_t size)

Construct a JpegFrame from a buffer of JPEG data.

Parameters:
  • data – The buffer containing the JPEG data.

  • size – The size of the buffer.

inline const espp::JpegHeader &get_header() const

Get a reference to the header.

Returns:

A reference to the header.

inline int get_width() const

Get the width of the frame.

Returns:

The width of the frame.

inline int get_height() const

Get the height of the frame.

Returns:

The height of the frame.

inline bool is_complete() const

Check if the frame is complete.

Returns:

True if the frame is complete, false otherwise.

inline void append(const espp::RtpJpegPacket &packet)

Append a RtpJpegPacket to the frame. This will add the JPEG data to the frame.

Parameters:

packet – The packet containing the scan to append.

inline void add_scan(const espp::RtpJpegPacket &packet)

Append a JPEG scan to the frame. This will add the JPEG data to the frame.

Note

If the packet contains the EOI marker, the frame will be finalized, and no further scans can be added.

Parameters:

packet – The packet containing the scan to append.
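The reassembly idea behind collecting scans can be sketched independently of the espp types. In RFC 2435, each packet carries a fragment offset into the frame's scan data, and the RTP marker bit is set on the last packet of a frame. The class `ScanAssembler` below is a hypothetical illustration that accepts in-order fragments only and treats the marker bit as the completion signal; the actual JpegFrame may detect completion differently (for example via the EOI marker, as the note above describes).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of espp: collects RTP/JPEG scan fragments
// that arrive in order and marks the frame complete on the last fragment.
class ScanAssembler {
public:
  // Returns false if the fragment does not continue exactly where the
  // previous one ended, or if the frame has already been completed.
  bool add_fragment(size_t fragment_offset, const std::vector<uint8_t> &scan, bool marker) {
    if (complete_ || fragment_offset != scan_.size()) {
      return false;
    }
    scan_.insert(scan_.end(), scan.begin(), scan.end());
    complete_ = marker; // the marker bit flags the final packet of a frame
    return true;
  }

  bool is_complete() const { return complete_; }
  const std::vector<uint8_t> &scan_data() const { return scan_; }

private:
  std::vector<uint8_t> scan_;
  bool complete_ = false;
};
```

Rejecting out-of-order fragments keeps the sketch short; since RTP runs over UDP here, a production receiver would typically drop the whole frame when a fragment is lost and wait for the next frame start.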

inline std::span<const uint8_t> get_data() const

Get the serialized data.

Returns:

The serialized data.

inline std::span<const uint8_t> get_scan_data() const

Get the scan data.

Returns:

The scan data.