SSL/TLS internals

This is a technical description of how SSL/TLS is handled in MINA. You don’t need to read it to get things working; the SslFilter user guide page is a better place to start. However, if you want a deeper understanding of how it is built, this is the place!

Note: We will assume the MINA API user has already created the SSLContext to use, so its creation is not covered here.

Components

The SslFilter is the filter that controls everything related to SSL/TLS. It works hand in hand with the SslHandler class, which handles the dialog with the SSLEngine.

With NIO, Java provides a special class that handles the SSL/TLS protocol: the SSLEngine class. It’s quite a complex piece of code, and it takes some time to tame it… We will have a look into it during this tutorial.

The biggest advantage of the SSLEngine is that it totally abstracts the underlying network layer, so it can be used as an encryption component regardless of the channel in use. You can even decide to use it without transmitting anything :-) On the other hand, it’s pretty complex to use…

Creation

In order to inject the filter in the chain, it must first be created. This filter takes one or two parameters:

  • SSLContext: the Java class that contains all the information related to the SSL/TLS establishment
  • autoStart: tells if the handshake should be started immediately. This is an optional parameter, defaulting to true. In some cases, like when using StartTLS, it’s critical not to start the handshake immediately, as it will be established on demand.

Initialization

When injecting the SslFilter into your chain, either before starting your service or while it is running, it has to be initialized. This is actually a two-phase process:

  • pre-initialization, during the filter injection
  • post-initialization, by starting the Handshake dialog

Most of the time, both phases will be merged, as one can tell the first phase to immediately start the second.

In any case, it’s handled by the onPreAdd and onPostAdd events, which means it’s automatically processed when you push the SslFilter into the chain.

onPreAdd

The SslHandler class is created and initialized, and the instance is stored in the session attributes. That means each session has its own instance of SslHandler. This initialization will create an SSLEngine instance based on the provided SSLContext instance. The initialization will differ based on the ‘side’ you are on: server or client. Basically, the server side will wait for the client to initiate the handshake, while the client side will initiate it.

It’s also responsible for setting the enabled ciphers and protocols, if one wants to use a restricted set or an extended set (newer versions of Java have disabled old protocols and insecure ciphers).
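In plain JSSE terms, this step boils down to creating an SSLEngine from the SSLContext, choosing its role, and restricting its protocols. The following is a minimal sketch under that assumption; EngineSetup and createEngine are hypothetical names, not MINA's API, and only standard javax.net.ssl calls are used.

```java
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical helper, not MINA's code: roughly what creating the per-session
// SSLEngine amounts to in plain JSSE terms.
final class EngineSetup {
    private EngineSetup() {}

    static SSLEngine createEngine(SSLContext context, boolean clientMode,
                                  String peerHost, int peerPort,
                                  List<String> wantedProtocols) {
        // Host and port are advisory hints (used, for instance, for session reuse).
        SSLEngine engine = context.createSSLEngine(peerHost, peerPort);

        // The server side waits for the ClientHello; the client side sends it.
        engine.setUseClientMode(clientMode);

        // Keep only the requested protocols that this JVM actually supports.
        List<String> supported = Arrays.asList(engine.getSupportedProtocols());
        String[] enabled = wantedProtocols.stream()
                .filter(supported::contains)
                .toArray(String[]::new);
        if (enabled.length == 0) {
            throw new IllegalArgumentException("None of " + wantedProtocols + " is supported");
        }
        engine.setEnabledProtocols(enabled);
        return engine;
    }
}
```

Cipher suites can be restricted the same way, using getSupportedCipherSuites() and setEnabledCipherSuites().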

Last but not least, it sets a list of status flags:

  • writingEncryptedData: false. This flag is used during the handshake
  • handshakeStatus: the handshake status, which is initially set to NOT_HANDSHAKING
  • firstSSLNegotiation: true. This flag tells the SslHandler whether or not to send an event to the application (MINA 2.1+ only)
  • handshakeComplete: false. It will be set to true when the handshake has been completed.

Side note: some of those flags are probably spurious. Some cleanup might be done to get rid of the useless ones.

onPostAdd

This event will initiate an immediate handshake if required. Depending on the peer side, the action will be different.

  • if we are on the server peer, it will create the SSLEngine and initialize it so that it is ready to process incoming data.
  • if we are on the client peer, it will create the SSLEngine, initialize it and send the first handshake message.

In any case, the session will only be able to process handshake messages starting from this point, until the handshake is completed.

The consequence is that we can no longer send messages to the remote peer until the handshake has been completed. However, we need to keep track of those messages so that they can be written properly once the session has been secured.
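The idea of holding writes back until the session is secured can be illustrated with a small queue. This is a hypothetical sketch, not MINA's implementation: PendingWrites is an invented name, and encryptAndSend stands for whatever actually encrypts and writes a message.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch, not MINA's code: writes issued before the handshake
// completes are parked, then flushed in their original order.
final class PendingWrites<T> {
    private final Queue<T> pending = new ArrayDeque<>();
    private boolean secured;

    /** Sends immediately once secured, otherwise queues the message. */
    synchronized void write(T message, Consumer<T> encryptAndSend) {
        if (secured) {
            encryptAndSend.accept(message);
        } else {
            pending.add(message);
        }
    }

    /** Called when the handshake has completed: flush everything, oldest first. */
    synchronized void onSecured(Consumer<T> encryptAndSend) {
        secured = true;
        T message;
        while ((message = pending.poll()) != null) {
            encryptAndSend.accept(message);
        }
    }

    synchronized int size() {
        return pending.size();
    }
}
```

Flushing under the same lock as write() guarantees that a message written during the flush cannot overtake the queued ones.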

At this point, the session has an instance of SslHandler in its attributes if the autoStart flag is set to true; otherwise it will be created when the sessionOpened event is propagated.

Note: The autoStart flag is set to false if we want to do something while processing the sessionCreated event, before the SslHandler instance is created. This is typically the case when you build a filter chain dynamically: if you inject a new filter between the HeadFilter and the SslFilter, you don’t want messages to start flowing until the chain is fully built. Most of the time, however, this flag should be set to true.

Packets handling

The underlying SSLEngine will only consume complete handshake packets, while we may receive incomplete ones.

We need to keep whatever comes in until it is complete (i.e. until SSLEngine.unwrap() no longer returns a BUFFER_UNDERFLOW result). As the underlying protocol is TCP, we may have to gather incoming messages until they contain at least one complete handshake packet.

Assuming we have to deal with handshake packets, there are a few technical use cases:

  • We have received less than one packet: in this case, we append the newly received data to the pending data. The size of each received message is limited by the SO_RCVBUF value configured when initializing the server, and it may be smaller than a TLS packet. In any case, we append whatever we just received to the pending buffer, growing it on the fly. Once we have a complete TLS packet, we can process it, remove it from the buffer, and continue with the remaining bytes.

  • We have received one or more complete packets: in this case, we process them packet by packet, until each one of them has been processed.

Note: The pending buffer can be pre-allocated to the size set by the SO_RCVBUF parameter, in order to avoid allocating it each time we process a new packet.

Note: Using a circular ByteBuffer would spare the system the need to move the data to the beginning of the buffer when it has consumed a TLS packet and there are remaining bytes; sadly, the SSLEngine class expects plain ByteBuffer instances :-/.

All the received data flow through the SslFilter.messageReceived() method, which delegates the processing to the associated SslHandler instance:

    public void messageReceived(NextFilter next, IoSession session, Object message) throws Exception {
        SslHandler sslHandler = getSslHandler(session);
        sslHandler.receive(next, IoBuffer.class.cast(message));
    }

The SslHandler.receive() method will gather the incoming data, then process them. When done, it may have to complete the following steps:

  • Write back to the remote peer the constructed messages (either handshake packets or encrypted data packets)
  • Send to the application the received messages (only data packets)
  • Forward the produced events (SslEvent.SECURED or SslEvent.UNSECURED)

Buffer management

There are several use cases to consider. The first thing to know is that a TLS record has a limited size, which is the sum of:

  • the header (5 bytes) plus some IV (Initialisation Vector, 16 bytes)
  • the data: 16 384 bytes (2^14), or 32 768 bytes (2^15) for Windows
  • some padding (block cipher padding), 256 bytes
  • the MAC maximum size (48 bytes)
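Adding these components up gives the worst-case record size. The snippet below is only illustrative arithmetic over the numbers listed above; TlsRecordSizes is a hypothetical name.

```java
// Illustrative arithmetic only: worst-case size of one TLS record,
// summing the components listed above.
final class TlsRecordSizes {
    static final int HEADER_PLUS_MAX_IV = 5 + 16;  // record header + IV
    static final int MAX_DATA = 1 << 14;           // 16 384 bytes of data
    static final int MAX_LARGE_DATA = 1 << 15;     // 32 768 bytes (large fragments)
    static final int MAX_PADDING = 256;            // block cipher padding
    static final int MAX_MAC = 48;                 // e.g. an HMAC-SHA384 tag

    static final int MAX_RECORD_SIZE =
            HEADER_PLUS_MAX_IV + MAX_DATA + MAX_PADDING + MAX_MAC;         // 16 709
    static final int MAX_LARGE_RECORD_SIZE =
            HEADER_PLUS_MAX_IV + MAX_LARGE_DATA + MAX_PADDING + MAX_MAC;   // 33 093

    private TlsRecordSizes() {}
}
```

Rather than hard-coding such values, it is usually safer to size buffers from SSLSession.getPacketBufferSize() (encrypted side) and SSLSession.getApplicationBufferSize() (decrypted side), which the JDK computes for the current session.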

Most of the time, records will be smaller (and the packet size is stored in the TLS header).

Now, considering the very nature of TCP, we may receive a TLS record in small chunks, or more than one TLS record in a packet of data. We have to deal with both cases.

We will use an internal pending buffer that contains the incoming data until they can be fully processed. The important point is that the SSLEngine class works on ByteBuffer instances, through a set of methods:

  • wrap(ByteBuffer src, ByteBuffer dst): encrypts the data from the source buffer into the destination buffer (variants take an array of source buffers, optionally with an offset and a length)
  • unwrap(ByteBuffer src, ByteBuffer dst): decrypts the data from the source buffer into the destination buffer (variants take an array of destination buffers, optionally with an offset and a length)

So, from a performance perspective, it’s important to allocate a buffer large enough to hold at least 2 TLS packets.

The following schema shows that we are switching from an incoming IoBuffer to a local pendingBuffer then a resulting unencrypted IoBuffer that is propagated to the application:

                          +----------------------------------------------------------------+
                          |                                                                |
                          v                                                                |
+------------------+     +----------------+------------------+        +----------------+   |
| incoming message | --> | pending buffer | incoming message | --+--> | remaining data | --+
+------------------+     +----------------+------------------+   |    +----------------+
                                                                 |
                                                                 |    +----------------+
                                                                 +--> | decrypted data | --> Application
                                                                      +----------------+

Receiving small chunks

When a TLS record is split across several packets, we won’t be able to decrypt it until we have received the full TLS record.

We need to collect at least one full TLS record before calling the SSLEngine.unwrap() method, otherwise it will return a BUFFER_UNDERFLOW status.

NOTE: the check is also done inside SSLEngine, but it’s a good idea to do it beforehand, to avoid a good chunk of useless work.

We use an inner buffer for that purpose, which accumulates incoming data. This buffer should be large enough to hold at least one maximum-size TLS record, and even more. We pre-allocate such a buffer, which may be expanded (we may receive more than one TLS packet) and shrunk (if we keep growing it, it will use too much memory).

Receiving multiple records

We may also receive more than one TLS record. In this case, we will need to loop until we have consumed all of them.
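Both cases can be handled by one accumulator that reads the record length from bytes 3 and 4 of the header and only hands out complete records. This is a hypothetical sketch, not MINA's code: RecordAccumulator and nextRecord() are invented names, and the buffer grows but never shrinks, for brevity.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not MINA's code: accumulates TCP chunks and returns
// complete TLS records one at a time.
final class RecordAccumulator {
    private static final int HEADER_SIZE = 5;
    private ByteBuffer pending; // kept in write mode between calls

    RecordAccumulator(int initialCapacity) {
        pending = ByteBuffer.allocate(initialCapacity);
    }

    /** Appends a received chunk, growing the buffer if needed. */
    void append(ByteBuffer chunk) {
        if (pending.remaining() < chunk.remaining()) {
            int needed = pending.position() + chunk.remaining();
            ByteBuffer bigger = ByteBuffer.allocate(Math.max(needed, pending.capacity() * 2));
            pending.flip();
            bigger.put(pending);
            pending = bigger;
        }
        pending.put(chunk);
    }

    /** Returns the next complete record (header included), or null if only part of one is buffered. */
    ByteBuffer nextRecord() {
        pending.flip(); // switch to read mode
        try {
            if (pending.remaining() < HEADER_SIZE) {
                return null;
            }
            int start = pending.position();
            // Bytes 3 and 4 of the header hold the record length, big-endian.
            int length = ((pending.get(start + 3) & 0xFF) << 8) | (pending.get(start + 4) & 0xFF);
            int total = HEADER_SIZE + length;
            if (pending.remaining() < total) {
                return null;
            }
            byte[] record = new byte[total];
            pending.get(record);
            return ByteBuffer.wrap(record);
        } finally {
            pending.compact(); // move leftover bytes to the front, back to write mode
        }
    }
}
```

Calling nextRecord() in a loop until it returns null covers the multiple-record case, while a null on the first call is the small-chunk case.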

Handshake handling

Now that the SslFilter is set, the server is ready to process the handshake protocol.

Once the client has sent the first Handshake message (ClientHello), and the server has received it, the dialogue is all about completing the handshake exchange, up to the point the connection is secured or cancelled.

There are multiple phases, but all in all, it’s about reading messages and sending back responses.

The very first message is sent by the client. It also means we won’t be able to process any message that is not part of the handshake: if we have some messages to send to the client, or if the client has some messages to send, they will be stored in a queue until the handshake is completed.

Handshake packets are encoded using a record header which always starts with the 0x16 byte. The next two bytes are the TLS version, and the following two bytes are the packet size:

+------+------+------+-----------+
| 0x16 | 0xMM | 0xmm | 0xab 0xcd |
+------+------+------+-----------+
    ^      ^      ^        ^
    |      |      |        |
    |      |      |        +-- packet size, from 0 to 2^14 (or 2^15 on Windows)
    |      |      |        
    |      |      +----------- minor version
    |      |
    |      +------------------ major version
    |
    +------------------------- Handshake byte

The major and minor numbers encode:

  • 0x03 0x00: SSL 3.0 (deprecated)
  • 0x03 0x01: TLS 1.0 (deprecated)
  • 0x03 0x02: TLS 1.1 (deprecated)
  • 0x03 0x03: TLS 1.2 and TLS 1.3
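Decoding this header is a matter of reading five bytes. The sketch below is illustrative only; RecordHeader, parse() and versionName() are hypothetical names.

```java
// Hypothetical helper: decodes the 5-byte record header described above.
final class RecordHeader {
    static final int HANDSHAKE = 0x16;

    final int contentType;
    final int major;
    final int minor;
    final int length;

    private RecordHeader(int contentType, int major, int minor, int length) {
        this.contentType = contentType;
        this.major = major;
        this.minor = minor;
        this.length = length;
    }

    static RecordHeader parse(byte[] header) {
        if (header.length < 5) {
            throw new IllegalArgumentException("A record header is 5 bytes long");
        }
        int length = ((header[3] & 0xFF) << 8) | (header[4] & 0xFF); // big-endian
        return new RecordHeader(header[0] & 0xFF, header[1] & 0xFF, header[2] & 0xFF, length);
    }

    /** Maps the version bytes to a name, following the list above. */
    String versionName() {
        if (major != 0x03) {
            return "unknown";
        }
        switch (minor) {
            case 0x00: return "SSL 3.0";
            case 0x01: return "TLS 1.0";
            case 0x02: return "TLS 1.1";
            case 0x03: return "TLS 1.2 or TLS 1.3";
            default:   return "unknown";
        }
    }
}
```

The record-layer version is only indicative: TLS 1.3 keeps it at 0x03 0x03 (the initial ClientHello may even carry 0x03 0x01) and negotiates the real version through the supported_versions extension.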

Besides this record header, each handshake message starts with its own header, made of the handshake type and a 3-byte message length:

+------+----------------+-----------+
| 0xNN | 0xab 0xcd 0xef | data...   |
+------+----------------+-----------+
    ^          ^              ^
    |          |              |
    |          |              +-- message data
    |          |
    |          +----------------- message length (3 bytes)
    |
    +---------------------------- Handshake type

The handshake type is one of:

  • 0x00: hello_request, used by the server to request a new handshake (TLS 1.2 and earlier), hello_request_RESERVED (TLS 1.3)
  • 0x01: client_hello
  • 0x02: server_hello
  • 0x03: hello_verify_request (DTLS), hello_verify_request_RESERVED (TLS 1.3)
  • 0x04: new_session_ticket (session tickets in TLS 1.2, and TLS 1.3)
  • 0x05: end_of_early_data (TLS 1.3)
  • 0x06: hello_retry_request_RESERVED (TLS 1.3)
  • 0x08: encrypted_extensions (TLS 1.3)
  • 0x0B: certificate
  • 0x0C: server_key_exchange (TLS 1.2 and earlier), server_key_exchange_RESERVED (TLS 1.3)
  • 0x0D: certificate_request
  • 0x0E: server_hello_done (TLS 1.2 and earlier), server_hello_done_RESERVED (TLS 1.3)
  • 0x0F: certificate_verify
  • 0x10: client_key_exchange (TLS 1.2 and earlier), client_key_exchange_RESERVED (TLS 1.3)
  • 0x14: finished
  • 0x15: certificate_url (TLS 1.2 and earlier), certificate_url_RESERVED (TLS 1.3)
  • 0x16: certificate_status (TLS 1.2 and earlier), certificate_status_RESERVED (TLS 1.3)
  • 0x17: supplemental_data (TLS 1.2 and earlier), supplemental_data_RESERVED (TLS 1.3)
  • 0x18: key_update (TLS 1.3)
  • 0xFE: message_hash (TLS 1.3)
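Reading the handshake header means taking one type byte and a 24-bit big-endian length. This sketch is illustrative; HandshakeHeader, peek() and typeName() are hypothetical names, and the name table covers only part of the list above.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: decodes the 4-byte handshake message header
// (1-byte type + 3-byte big-endian length) without consuming it.
final class HandshakeHeader {
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        NAMES.put(0x00, "hello_request");
        NAMES.put(0x01, "client_hello");
        NAMES.put(0x02, "server_hello");
        NAMES.put(0x04, "new_session_ticket");
        NAMES.put(0x05, "end_of_early_data");
        NAMES.put(0x08, "encrypted_extensions");
        NAMES.put(0x0B, "certificate");
        NAMES.put(0x0C, "server_key_exchange");
        NAMES.put(0x0D, "certificate_request");
        NAMES.put(0x0E, "server_hello_done");
        NAMES.put(0x0F, "certificate_verify");
        NAMES.put(0x10, "client_key_exchange");
        NAMES.put(0x14, "finished");
        NAMES.put(0x18, "key_update");
        NAMES.put(0xFE, "message_hash");
    }

    final int type;
    final int length;

    private HandshakeHeader(int type, int length) {
        this.type = type;
        this.length = length;
    }

    /** Reads the header at the buffer's current position, leaving the position unchanged. */
    static HandshakeHeader peek(ByteBuffer buf) {
        if (buf.remaining() < 4) {
            throw new IllegalArgumentException("A handshake header is 4 bytes long");
        }
        int p = buf.position();
        int type = buf.get(p) & 0xFF;
        int length = ((buf.get(p + 1) & 0xFF) << 16)
                | ((buf.get(p + 2) & 0xFF) << 8)
                | (buf.get(p + 3) & 0xFF);
        return new HandshakeHeader(type, length);
    }

    String typeName() {
        return NAMES.getOrDefault(type, "unknown(" + type + ")");
    }
}
```

Such peeking only works on cleartext handshake messages: in TLS 1.3 everything after the ServerHello is encrypted, and a single handshake message may span several records (or several messages may share one).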

The idea is to push any received message to the SSLEngine instance, asking it for a response message to send back to the remote peer, and so on until we reach the state where the Handshake is completed or aborted.

SSLEngine states

We have two kinds of status to consider:

  • The SSLEngine status (SSLEngineResult.Status)
  • The handshake status (SSLEngineResult.HandshakeStatus)

The first one deals with the outcome of an SSLEngine call, either during the handshake or during data transfer, and there are 4 values:

  • BUFFER_UNDERFLOW: The received TLS packet does not contain enough information to be correctly handled. In this case, we have to wait for more data from the remote peer, and retry.
  • BUFFER_OVERFLOW: The buffer that was pre-allocated to receive the result of the SSLEngine.wrap()/unwrap() call is not big enough. It needs to be resized, and the call done once more. Ideally this should never happen, which means we should always allocate a buffer big enough to receive the processed data.
  • CLOSED: The SSLEngine has been closed for one reason or another. We need to send any remaining encrypted data (such as the close_notify alert) to the remote peer, and shut down the session.
  • OK: The data processing was fine, we can go on with the generated buffer.

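The second one deals with the handshake processing. We have 5 different states:

  • NEED_UNWRAP: The SSLEngine is expecting to receive some TLS packet to process, ie a TLS protocol message from the remote peer.
  • NEED_WRAP: The SSLEngine is expecting to generate some TLS response packet, ie a TLS protocol message to send to the remote peer.
  • NEED_TASK: Some expensive tasks have to be executed; they can be run in a separate thread to avoid blocking the caller during this processing.
  • NOT_HANDSHAKING: The SSLEngine is not currently processing a handshake, for instance because it is exchanging application data.
  • FINISHED: The wrap() or unwrap() call that just returned has completed the handshake. This value is only reported in an SSLEngineResult, never by SSLEngine.getHandshakeStatus().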

The trick when processing the handshake protocol is to juggle those two statuses: after each operation, we first check the SSLEngine status, and when we get OK, we check the handshake status to determine the next step.
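One handshake step driven by those two statuses could look like the following sketch. It is not MINA's code: HandshakeStep and step() are hypothetical names, and only standard SSLEngine and SSLEngineResult methods are used. Delegated tasks are run synchronously here.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;
import javax.net.ssl.SSLEngineResult.Status;
import javax.net.ssl.SSLException;

// Hypothetical helper, not MINA's code: performs a single handshake step
// chosen from the engine's current HandshakeStatus.
final class HandshakeStep {
    private static final ByteBuffer EMPTY = ByteBuffer.allocate(0);

    private HandshakeStep() {}

    /**
     * netIn holds bytes received from the peer (read mode); netOut and appIn
     * are output buffers (write mode). The caller checks getStatus() first,
     * then getHandshakeStatus() to pick the next step.
     */
    static SSLEngineResult step(SSLEngine engine, ByteBuffer netIn,
                                ByteBuffer netOut, ByteBuffer appIn) throws SSLException {
        HandshakeStatus hs = engine.getHandshakeStatus();
        switch (hs) {
            case NEED_WRAP:
                // Produce our next handshake message. BUFFER_OVERFLOW means
                // netOut must be flushed or enlarged before retrying.
                return engine.wrap(EMPTY, netOut);
            case NEED_UNWRAP:
                // Consume the peer's message. BUFFER_UNDERFLOW means more
                // bytes must be read from the network before retrying.
                return engine.unwrap(netIn, appIn);
            case NEED_TASK: {
                // Synchronous strategy: run every delegated task right here.
                Runnable task;
                while ((task = engine.getDelegatedTask()) != null) {
                    task.run();
                }
                return new SSLEngineResult(Status.OK, engine.getHandshakeStatus(), 0, 0);
            }
            default:
                // NOT_HANDSHAKING (or any other value): nothing to do here.
                return new SSLEngineResult(Status.OK, hs, 0, 0);
        }
    }
}
```

On a fresh client engine, beginHandshake() followed by one step() produces the ClientHello, whose first byte is the 0x16 handshake content type.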

Synchronous vs asynchronous tasks

Besides the synchronous wrap() and unwrap() methods, the SSLEngine instance may provide a list of tasks to execute, which can be run asynchronously. The idea is to avoid blocking the instance during long operations. However, doing so still requires some kind of synchronisation: when the tasks are completed, you need to proceed with the next steps (likely a wrap()).

We may decide to execute each task synchronously in the current thread, or to delegate the execution to a separate thread. In the first case, the handshake may take quite a while to be processed, as some long operations may have to be executed (like the validation of a certificate from a remote peer). Since we may have more than one task to process, the total time will be the sum of all tasks. In the second case, we have a different problem: if we delegate the tasks to separate threads, then we have to implement a mechanism to know when all the delegated tasks are completed.

The biggest advantage of the first approach is that it’s easy to implement: we iterate on all the tasks, one after the other, and when done, we can keep going with the handshake processing.

The biggest advantage of the second approach is that we run the tasks in parallel, optimizing the CPU usage, likely speeding up the Handshake completion.

The biggest drawback of the first approach is the increased time taken to handle the tasks, slowing down the handshake processing.

The biggest drawback of the second approach is the potential use of many threads, causing costly context switches when many sessions are processing a handshake. Using a thread pool also risks starvation and added delays, compared to the first approach.

One more thing to consider: even with Virtual Threads, the starvation problem remains, as they still run on top of underlying platform threads…

Note: all the SSLEngine methods (wrap(), unwrap(), etc.) are synchronized, so once you start a delegated task, any other handshake operation will be suspended, waiting for the task to be completed. That means tasks can be executed concurrently, but no other operation can.
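The two strategies can be contrasted in a few lines. This is a hypothetical sketch, not MINA's code: DelegatedTasks is an invented name, and the task source is a Supplier so that engine::getDelegatedTask can be passed directly. In the asynchronous variant, CompletableFuture.allOf() provides the "all tasks are completed" mechanism discussed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical sketch of the two strategies; not MINA's implementation.
final class DelegatedTasks {
    private DelegatedTasks() {}

    /** First approach: run each task in the calling thread, one after the other. */
    static void runSynchronously(Supplier<Runnable> source) {
        Runnable task;
        while ((task = source.get()) != null) {
            task.run();
        }
    }

    /**
     * Second approach: hand every task to an executor and return a future that
     * completes once all of them are done; the caller then resumes the handshake
     * (typically with a wrap()).
     */
    static CompletableFuture<Void> runAsynchronously(Supplier<Runnable> source, Executor executor) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        Runnable task;
        while ((task = source.get()) != null) {
            futures.add(CompletableFuture.runAsync(task, executor));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }
}
```

With a real engine, the asynchronous call would look like runAsynchronously(engine::getDelegatedTask, pool).thenRun(...), where the continuation re-reads engine.getHandshakeStatus() before going on.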

(SSLConsumer)
      ^
      |
      +-- [AlertConsumer]
      |
      +-- [CertificateStatusConsumer]
      |
      +-- [ClientHelloConsumer]
      |
      +-- [ClientKeyExchangeConsumer]
      |
      +-- [DHClientKeyExchangeConsumer]
      |
      +-- [DHServerKeyExchangeConsumer]
      |
      +-- [ECDHClientKeyExchangeConsumer]
      |
      +-- [ECDHEClientKeyExchangeConsumer]
      |
      +-- [ECDHServerKeyExchangeConsumer]
      |
      +-- [EncryptedExtensionsConsumer]
      |
      +-- [HelloRequestConsumer]
      |
      +-- [KeyUpdateConsumer]
      |
      +-- [KrbClientKeyExchangeConsumer]
      |
      +-- [NewSessionTicketConsumer]
      |
      +-- [RSAClientKeyExchangeConsumer]
      |
      +-- [RSAServerKeyExchangeConsumer]
      |
      +-- [S30CertificateVerifyConsumer]
      |
      +-- [ServerHelloConsumer]
      |
      +-- [ServerHelloDoneConsumer]
      |
      +-- [ServerKeyExchangeConsumer]
      |
      +-- [SSLHandshake]
      |
      +-- [T10CertificateRequestConsumer]
      |
      +-- [T10CertificateVerifyConsumer]
      |
      +-- [T10ChangeCipherSpecConsumer]
      |
      +-- [T12CertificateConsumer]
      |
      +-- [T12CertificateRequestConsumer]
      |
      +-- [T12CertificateVerifyConsumer]
      |
      +-- [T12FinishedConsumer]
      |
      +-- [T13CertificateConsumer]
      |
      +-- [T13CertificateRequestConsumer]
      |
      +-- [T13CertificateVerifyConsumer]
      |
      +-- [T13ChangeCipherSpecConsumer]
      |
      +-- [T13FinishedConsumer]

Here is an example for the CLIENT_HELLO message on the server side:

Application Data handling

Once the handshake has succeeded, we can start exchanging data with the remote peer. The data is encrypted, then transmitted in TLS packets; on the remote peer, the data is decrypted, then transmitted to the application handler:

    Sender peer                               TLS packets              Network
+------------------+                      +-----------------+      
| Application data | --> {encryption} --> | i5*du7!!yyho_&y | ------------+
+------------------+                      +-----------------+             |
                                                                          v 
                                                                    _ ___________ 
                                                                   (_)___________)
                                                                          |
+------------------+                      +-----------------+             |
| Application data | <-- {decryption} <-- | i5*du7!!yyho_&y | <-----------+
+------------------+                      +-----------------+
   Receiving peer                             TLS packets

The encrypted data is stored in TLS packets (and we may need more than one TLS packet to store the whole data).

The encryption and decryption are done by the SSLEngine instance of the session.

Note: The TLS packets contain opaque data (bytes), so there must be some added logic to make sense of those data for the application to process them.

In any case, we have two use cases:

  • Writing data
  • Reading data

Reading data

Let’s start with the reading side. As explained above, we receive bytes that are parts of TLS packets, or complete TLS packets. We will decrypt them only when we have enough bytes to form a complete TLS packet. At that point, we just have to tell the SSLEngine instance to decrypt it, which produces a decrypted buffer containing the original data. Last but not least, we have to send those data to the application handler.

The process is described by the following schema:

[[encrypted data]]           .--------.
        |                    |        |
        +--------------> [unwrap]     | --> [decrypted data] --+ 
                             |        |                        |
                             .________.                        |
                                                               v
                              SslEngine        nextFilter.received([decrypted data])
                              instance

This is a bit simplified; we have some additional steps to follow:

  • Check that we have received enough data to be able to pass it to the SSLEngine instance (i.e. we have received at least a complete TLS data packet)
  • Allocate a new buffer that will receive the decrypted data
  • Check the status of the SSLEngine instance before and after the decryption, and act accordingly
  • Loop on the received data

All in all, we end up with a call to the SSLEngine.unwrap() method, which is responsible for decoding what has been received.

Otherwise, it’s pretty straightforward. Here is the call sequence:

SslFilter.messageReceived()
 |
 +-- SslHandler.receive()
      | 
      +-- SslHandler.receiveStart(next, message);       // (1)
      |    |
      |    +-- SslHandler.resumeDecodeBuffer(message)   // (2)
      |    |
      |    +-- SslHandler.receiveLoop(next, source)     // (3)
      |    |    |
      |    |    +-- check if the inbound channel has been closed        // (4)
      |    |    |
      |    |    +-- SslHandler.allocateAppBuffer(source.remaining())    // (5)
      |    |    |
      |    |    +-- SSLEngine.unwrap(source.buf(), dest.buf())          // (6)
      |    |
      |    +-- SslHandler.suspendDecodeBuffer(source)   // (7)
      |
      +-- SslHandler.throwPendingError(next)           // (8)
      |
      +-- SslHandler.forwardWrites(next)                // (9)
      |
      +-- SslHandler.forwardReceived(next)              // (10)
      |    |
      |    +-- next.messageReceived(mSession, x)        // (11) Up to the Application handler
      |         
      +-- SslHandler.forwardEvents(next)                // (12)

  • (1) Process the incoming message
  • (2) Accumulate the received message in a session buffer
  • (3) Loop over the session buffer, decrypting each complete TLS packet, until no complete packet remains
  • (4) Check if the remote peer has closed the connection. If so, shut down the TLS layer
  • (5) Allocate a buffer to store the decrypted message
  • (6) Decrypt the message
  • (7) Clean up the decoding buffer
  • (8) Handle any pending error
  • (9) Process the pending writes
  • (10) Process the decrypted messages
  • (11) Call the next filter in the chain with the decrypted message, if there is one
  • (12) Forward the produced events; this is only needed when processing the handshake, and does nothing here

Note: In step (4), we need to verify that the remote peer hasn’t closed its side of the connection. If it has, we have to write everything that is pending.
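Steps (3) to (6) can be condensed into a simplified decryption loop. This is a hypothetical sketch, not MINA's code: ReceiveLoop and decryptAll() are invented names, and the Unwrapper interface stands in for SSLEngine.unwrap(ByteBuffer, ByteBuffer) so that sslEngine::unwrap can be passed directly (and so the loop can be exercised without a live TLS session).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLException;

// Hypothetical, simplified version of steps (3) to (6); not MINA's code.
final class ReceiveLoop {
    @FunctionalInterface
    interface Unwrapper {
        SSLEngineResult unwrap(ByteBuffer src, ByteBuffer dst) throws SSLException;
    }

    private ReceiveLoop() {}

    /**
     * Decrypts as many complete records as possible from source (read mode).
     * Bytes of a trailing partial record are left in source for the next call.
     */
    static List<ByteBuffer> decryptAll(Unwrapper unwrapper, ByteBuffer source, int appBufferSize)
            throws SSLException {
        List<ByteBuffer> decrypted = new ArrayList<>();
        int capacity = appBufferSize;
        ByteBuffer dest = ByteBuffer.allocate(capacity);

        while (source.hasRemaining()) {
            SSLEngineResult result = unwrapper.unwrap(source, dest);
            switch (result.getStatus()) {
                case OK:
                    if (result.bytesProduced() > 0) {
                        dest.flip(); // ready to be read by the application
                        decrypted.add(dest);
                        dest = ByteBuffer.allocate(capacity);
                    }
                    if (result.bytesConsumed() == 0) {
                        return decrypted; // no progress: let the caller decide
                    }
                    break;
                case BUFFER_OVERFLOW: {
                    // Destination too small: grow it, keeping its content, and retry.
                    capacity *= 2;
                    ByteBuffer bigger = ByteBuffer.allocate(capacity);
                    dest.flip();
                    bigger.put(dest);
                    dest = bigger;
                    break;
                }
                case BUFFER_UNDERFLOW:
                    // Only part of a record is buffered: wait for more bytes.
                    return decrypted;
                case CLOSED:
                    // The peer closed its side: the caller should flush pending
                    // writes and shut the TLS layer down (step 4).
                    return decrypted;
            }
        }
        return decrypted;
    }
}
```

Returning on BUFFER_UNDERFLOW leaves the partial record in the source buffer, which is what the suspendDecodeBuffer() step (7) then keeps for the next messageReceived() call.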

Writing data