An Overview of TCPCopy for Rookies – DZone – Uplaza

With the rapid development of Internet technology, server-side architectures have become increasingly complex. It is now difficult to rely solely on the personal experience of developers or testers to cover all possible business scenarios. Real online traffic is therefore crucial for server-side testing. TCPCopy [1] is an open-source traffic replay tool that has been widely adopted by large enterprises. While many people use TCPCopy for testing in their projects, they may not fully understand its underlying principles. This article gives a brief introduction to how TCPCopy works, in the hope of helping readers.

Architecture

The architecture of TCPCopy has gone through several upgrades, and this article introduces the latest version, 1.0. As shown in the diagram below, TCPCopy consists of two components: tcpcopy and intercept. tcpcopy runs on the online server. It captures live TCP request packets, modifies their TCP/IP header information, and sends them to the test server, effectively “tricking” the test server. intercept runs on an auxiliary server and handles tasks such as relaying response information back to tcpcopy.

Figure 1: Overview of the TCPCopy Architecture

The simplified interaction process is as follows:

  1. tcpcopy captures packets on the online server.
  2. tcpcopy modifies the IP and TCP headers, spoofing the source IP and port, and sends the packet to the test server. The spoofed IP address is determined by the -x and -c parameters set at startup.
  3. The test server receives the request and returns a response packet whose destination IP and port are the spoofed IP and port set by tcpcopy.
  4. The response packet is routed to the intercept server. There, intercept captures the packet, parses its IP and TCP headers, and usually returns only empty response data to tcpcopy.
  5. tcpcopy receives and processes the returned data.
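Here is a rough illustration of steps 2 and 3, with the header rewrite reduced to the connection 4-tuple. The `conn_tuple` and `rewrite_cfg` types and both functions are hypothetical. Real tcpcopy edits the raw IP/TCP headers in place, and the exact meaning of `-x` and `-c` is defined in its documentation. This sketch simply assumes that `-x` yields the test server's address and `-c` yields the spoofed client IP. For simplicity it also leaves the client port unchanged, even though tcpcopy may rewrite the port as well.

```c
#include <stdint.h>

/* Hypothetical 4-tuple; addresses and ports in host byte order for clarity. */
typedef struct {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
} conn_tuple;

/* Hypothetical settings standing in for the -x (target) and -c (client IP) options. */
typedef struct {
    uint32_t test_server_ip;
    uint16_t test_server_port;
    uint32_t spoofed_client_ip;
} rewrite_cfg;

/* Step 2: spoof the source IP and redirect the request to the test server.
 * The client port is kept unchanged in this sketch. */
static conn_tuple rewrite_request(conn_tuple in, const rewrite_cfg *cfg)
{
    conn_tuple out = in;
    out.src_ip   = cfg->spoofed_client_ip;
    out.dst_ip   = cfg->test_server_ip;
    out.dst_port = cfg->test_server_port;
    return out;
}

/* Step 3: the test server answers the address it saw, so the response's
 * destination is exactly the rewritten request's source. */
static conn_tuple expected_response(conn_tuple rewritten)
{
    conn_tuple resp = {
        rewritten.dst_ip, rewritten.dst_port,
        rewritten.src_ip, rewritten.src_port
    };
    return resp;
}
```

Because the response is addressed to the spoofed IP rather than to any real client, the routing described later in this article is what steers it to intercept.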

Technical Principles

TCPCopy operates in two modes: online and offline. The online mode is mainly used to capture live request packets in real time, whereas the offline mode reads request packets from pcap-format files. Although the two modes work differently, the core principles are the same. This section explains TCPCopy’s core principles in detail from several perspectives.
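The offline mode reads the classic libpcap file layout. Each file begins with a 24-byte global header whose magic number also reveals the writer's byte order. After that come records, each starting with a 16-byte header that holds the timestamp, the captured length, and the original length. The reader below is a minimal sketch of walking such a file held in memory, without using libpcap. `pcap_walk` and `packet_cb` are hypothetical names, and only the microsecond-timestamp variant is handled. A real replayer would also check the link type stored at the end of the global header to find where the IP header starts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAP_MAGIC_USEC    0xa1b2c3d4u  /* microsecond timestamps, writer's byte order */
#define PCAP_MAGIC_SWAPPED 0xd4c3b2a1u  /* same magic written in the opposite byte order */

/* Hypothetical per-packet callback: data points at caplen captured bytes. */
typedef void (*packet_cb)(const uint8_t *data, uint32_t caplen, void *arg);

/* Read a 32-bit field, byte-swapping if the file came from the other endianness. */
static uint32_t rd32(const uint8_t *p, int swap)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    if (swap)
        v = (v >> 24) | ((v >> 8) & 0x0000ff00u) |
            ((v << 8) & 0x00ff0000u) | (v << 24);
    return v;
}

/* Walk a classic pcap file held in memory. Trailing bytes shorter than a
 * record header are ignored. Returns the number of records visited, or -1
 * on an unknown magic number or a truncated record. */
static long pcap_walk(const uint8_t *buf, size_t len, packet_cb cb, void *arg)
{
    uint32_t magic;
    int swap;
    size_t off = 24;     /* skip the global header */
    long count = 0;

    if (len < 24)
        return -1;
    memcpy(&magic, buf, sizeof(magic));
    if (magic == PCAP_MAGIC_USEC)
        swap = 0;
    else if (magic == PCAP_MAGIC_SWAPPED)
        swap = 1;
    else
        return -1;

    while (off + 16 <= len) {
        uint32_t caplen = rd32(buf + off + 8, swap);   /* incl_len field */
        off += 16;                                     /* skip the record header */
        if (caplen > len - off)
            return -1;
        if (cb != NULL)
            cb(buf + off, caplen, arg);
        off += caplen;
        count++;
    }
    return count;
}
```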

1. Packet Capturing and Sending

The core functions of tcpcopy can be summarized as “capturing” and “sending” packets. Let’s start with packet capturing. How do you capture real traffic from the server? Many people are confused when they first meet this question. In fact, the Linux operating system already provides the necessary functionality, and a solid understanding of advanced Linux network programming is all that is needed. tcpcopy initializes packet capturing and sending in the tcpcopy/src/communication/tc_socket.c file. Next, we’ll introduce the two methods tcpcopy uses for capturing and sending packets.

Raw Socket

A raw socket can receive packets from the network interface card on the local machine, which makes it particularly useful for monitoring and analyzing network traffic. The code that initializes raw socket packet capture in tcpcopy is shown below. This method supports capturing packets at both the data link layer and the IP layer.

int
tc_raw_socket_in_init(int type)
{
    int        fd, recv_buf_opt, ret;
    socklen_t  opt_len;

    if (type == COPY_FROM_LINK_LAYER) {
        /* Copy ip datagram from link layer */
        fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
    } else {
        /* Copy ip datagram from IP layer */
#if (TC_UDP)
        fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
#else
        fd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
#endif
    }

    if (fd == -1) {
        tc_log_info(LOG_ERR, errno, "Create raw socket to input failed");   
        fprintf(stderr, "Create raw socket to input failed:%s\n", strerror(errno));
        return TC_INVALID_SOCK;
    }

    recv_buf_opt = 67108864;
    opt_len = sizeof(int);

    ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &recv_buf_opt, opt_len);
    if (ret == -1) {
        tc_log_info(LOG_ERR, errno, "Set raw socket(%d)'s recv buffer failed", fd);
        tc_socket_close(fd);
        return TC_INVALID_SOCK;
    }

    return fd;
}
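Both capture paths return buffers that begin at the IPv4 header. `AF_PACKET` with `SOCK_DGRAM` strips the link-level header, and on Linux an `AF_INET` raw socket delivers received datagrams with their IP header included. The following is a hedged sketch of extracting, from such a buffer, the fields a replay tool cares about. `tcp_view` and `parse_ipv4_tcp` are hypothetical and not part of tcpcopy. The parser takes lengths from the IP Total Length field rather than from the buffer size, because frames may carry padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one captured TCP segment. */
typedef struct {
    uint32_t src_ip, dst_ip;        /* copied as-is: network byte order */
    uint16_t src_port, dst_port;    /* converted to host byte order */
    uint8_t  flags;                 /* TCP flags byte: FIN=0x01 SYN=0x02 RST=0x04 ACK=0x10 */
    size_t   payload_len;
} tcp_view;

/* Parse a buffer that starts at an IPv4 header; returns 0 on success, -1 otherwise. */
static int parse_ipv4_tcp(const uint8_t *pkt, size_t len, tcp_view *v)
{
    if (len < 20 || (pkt[0] >> 4) != 4 || pkt[9] != 6 /* IPPROTO_TCP */)
        return -1;

    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;            /* IP header length */
    size_t tot_len = ((size_t)pkt[2] << 8) | pkt[3];     /* IP total length */
    if (ihl < 20 || tot_len > len || tot_len < ihl + 20)
        return -1;

    const uint8_t *tcp = pkt + ihl;
    size_t doff = (size_t)(tcp[12] >> 4) * 4;            /* TCP header length */
    if (doff < 20 || ihl + doff > tot_len)
        return -1;

    memcpy(&v->src_ip, pkt + 12, 4);
    memcpy(&v->dst_ip, pkt + 16, 4);
    v->src_port = (uint16_t)((tcp[0] << 8) | tcp[1]);
    v->dst_port = (uint16_t)((tcp[2] << 8) | tcp[3]);
    v->flags = tcp[13];
    v->payload_len = tot_len - ihl - doff;
    return 0;
}
```

The `flags` and `payload_len` fields are what a session layer needs to tell a SYN from a pure ACK or from an ACK that carries a request.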

The code that initializes the raw socket used for sending packets is shown below. It first creates a raw socket at the IP layer and then tells the protocol stack not to prepend its own IP header.

int
tc_raw_socket_out_init(void)
{
    int fd, n;

    n = 1;

    /*
     * On Linux when setting the protocol as IPPROTO_RAW,
     * then by default the kernel sets the IP_HDRINCL option and 
     * thus does not prepend its own IP header. 
     */
    fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

    if (fd == -1) {
        tc_log_info(LOG_ERR, errno, "Create raw socket to output failed");
        fprintf(stderr, "Create raw socket to output failed: %s\n", strerror(errno));
        return TC_INVALID_SOCK;
    } 

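    /*
     * Tell the IP layer not to prepend its own header.
     * Linux does not need this, but *BSD does.
     */
    if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &n, sizeof(n)) < 0) {
        tc_log_info(LOG_ERR, errno, "Set raw socket(%d) option IP_HDRINCL failed", fd);
        tc_socket_close(fd);
        return TC_INVALID_SOCK;
    }

    return fd;
}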

To send a packet, tcpcopy then assembles the complete packet and sends it to the target server:

  • dst_addr is filled in with the target IP address.
  • The IP header is populated with the source and destination IP addresses.
  • The TCP header is filled in with the source port, destination port, and other relevant information.
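The assembly steps above can be sketched as follows, assuming a 20-byte IP header and a 20-byte TCP header with no options. `build_tcp_packet`, `send_raw`, and the checksum helpers are hypothetical names, not tcpcopy's own functions. The TCP checksum is computed, as TCP requires, over a pseudo-header (source address, destination address, protocol, and TCP length) followed by the segment. The raw(7) man page notes that Linux fills in the IP checksum and total length even when IP_HDRINCL is set, so computing them here matters mainly for portability.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Accumulate 16-bit big-endian words into a ones'-complement sum (RFC 1071). */
static uint32_t csum_add(const uint8_t *buf, size_t len, uint32_t sum)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;   /* pad the odd byte with zero */
    return sum;
}

/* Fold the carries and complement to get the final checksum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: write IPv4 + TCP headers (no options) plus payload.
 * saddr/daddr are in network byte order; ports, seq and ack in host order.
 * Returns the total length, or 0 if the buffer is too small. */
static size_t build_tcp_packet(uint8_t *out, size_t cap,
                               uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport,
                               uint32_t seq, uint32_t ack, uint8_t flags,
                               const uint8_t *payload, size_t plen)
{
    size_t total = 40 + plen;
    if (total > cap || total > 0xffff)
        return 0;

    uint8_t *ip = out, *tcp = out + 20, pseudo[12];
    memset(out, 0, 40);                      /* also zeroes both checksum fields */

    ip[0] = 0x45;                            /* IPv4, 20-byte header */
    ip[2] = (uint8_t)(total >> 8);
    ip[3] = (uint8_t)total;
    ip[8] = 64;                              /* TTL */
    ip[9] = IPPROTO_TCP;
    memcpy(ip + 12, &saddr, 4);              /* spoofed source IP */
    memcpy(ip + 16, &daddr, 4);              /* test server IP */
    uint16_t ipc = csum_fold(csum_add(ip, 20, 0));
    ip[10] = (uint8_t)(ipc >> 8);
    ip[11] = (uint8_t)ipc;

    tcp[0] = (uint8_t)(sport >> 8); tcp[1] = (uint8_t)sport;
    tcp[2] = (uint8_t)(dport >> 8); tcp[3] = (uint8_t)dport;
    put32(tcp + 4, seq);
    put32(tcp + 8, ack);
    tcp[12] = 5 << 4;                        /* data offset: 5 words */
    tcp[13] = flags;
    tcp[14] = 0xff; tcp[15] = 0xff;          /* window: 65535 */
    if (plen > 0)
        memcpy(tcp + 20, payload, plen);

    /* Pseudo-header: src, dst, zero, protocol, TCP segment length. */
    memcpy(pseudo, ip + 12, 8);
    pseudo[8] = 0;
    pseudo[9] = IPPROTO_TCP;
    pseudo[10] = (uint8_t)((20 + plen) >> 8);
    pseudo[11] = (uint8_t)(20 + plen);
    uint16_t tc = csum_fold(csum_add(tcp, 20 + plen, csum_add(pseudo, 12, 0)));
    tcp[16] = (uint8_t)(tc >> 8);
    tcp[17] = (uint8_t)tc;
    return total;
}

/* Send a finished packet through an IPPROTO_RAW socket (requires root). */
static ssize_t send_raw(int fd, const uint8_t *pkt, size_t len, uint32_t daddr)
{
    struct sockaddr_in dst_addr;
    memset(&dst_addr, 0, sizeof(dst_addr));
    dst_addr.sin_family = AF_INET;
    dst_addr.sin_addr.s_addr = daddr;        /* target IP, network byte order */
    return sendto(fd, pkt, len, 0, (struct sockaddr *)&dst_addr, sizeof(dst_addr));
}
```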

Pcap

Pcap is an application programming interface (API) for capturing network traffic, and its name comes from “packet capture.” On Linux systems, pcap is implemented by libpcap, and most packet capture tools, such as tcpdump, use libpcap to capture traffic.

Below is the code that initializes packet capture with pcap.

int
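tc_pcap_socket_in_init(pcap_t **pd, char *device, 
        int snap_len, int buf_size, char *pcap_filter)
{
    int         fd, ret;
    char        ebuf[PCAP_ERRBUF_SIZE]; 
    struct      bpf_program fp;
    bpf_u_int32 net, netmask;      

    if (device == NULL) {
        return TC_INVALID_SOCK;
    }

    tc_log_info(LOG_NOTICE, 0, "pcap open,device:%s", device);

    *ebuf = '\0';

    if (tc_pcap_open(pd, device, snap_len, buf_size) == TC_ERR) {
        return TC_INVALID_SOCK;
    }

    if (pcap_lookupnet(device, &net, &netmask, ebuf) == -1) {
        tc_log_info(LOG_NOTICE, 0, "lookupnet:%s", ebuf);
        netmask = 0;              /* compile the filter without a netmask */
    }

    /* Compile the BPF filter, attach it, then free the compiled copy */
    if (pcap_compile(*pd, &fp, pcap_filter, 0, netmask) == -1) {
        tc_log_info(LOG_ERR, 0, "bad filter %s:%s", pcap_filter, pcap_geterr(*pd));
        return TC_INVALID_SOCK;
    }
    ret = pcap_setfilter(*pd, &fp);
    pcap_freecode(&fp);
    if (ret == -1) {
        tc_log_info(LOG_ERR, 0, "set filter failed:%s", pcap_geterr(*pd));
        return TC_INVALID_SOCK;
    }
    if (pcap_setnonblock(*pd, 1, ebuf) == -1) {
        tc_log_info(LOG_ERR, 0, "pcap_setnonblock failed:%s", ebuf);
        return TC_INVALID_SOCK;
    }
    fd = pcap_get_selectable_fd(*pd);   /* fd watched by the event loop */
    return fd == -1 ? TC_INVALID_SOCK : fd;
}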

The code for initializing packet sending with pcap is as follows:

int
tc_pcap_snd_init(char *if_name, int mtu)
{
    char  pcap_errbuf[PCAP_ERRBUF_SIZE];

    pcap_errbuf[0] = '\0';
    /* pcap: file-scope pcap_t * (declared elsewhere) used for pcap_inject() */
    pcap = pcap_open_live(if_name, mtu + sizeof(struct ethernet_hdr), 
            0, 0, pcap_errbuf);
    if (pcap_errbuf[0] != '\0') {
        tc_log_info(LOG_ERR, errno, "pcap open %s, failed:%s", 
                if_name, pcap_errbuf);
        fprintf(stderr, "pcap open %s, failed: %s, err:%s\n", 
                if_name, pcap_errbuf, strerror(errno));
        return TC_ERR;
    }

    return TC_OK;
}

Raw Socket vs. Pcap

Since tcpcopy offers both methods, which one is better?

When capturing packets, we care mainly about the specific packets we need. If the capture is not configured correctly, the kernel may capture too many irrelevant packets, which leads to packet loss, especially under heavy traffic. Extensive testing has shown that in live environments, capturing request packets through the pcap interface usually loses more packets than capturing through raw sockets. tcpcopy therefore uses raw sockets for packet capture by default. The pcap interface can also be enabled with the --enable-pcap option, and it is mainly suited to high-end PF_RING capture and to capturing traffic after switch port mirroring.

For packet sending, tcpcopy uses the raw socket output interface by default, but it can also send packets via pcap_inject (using the --enable-dlinject option). Choose between them based on performance testing in your actual environment.
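Sending through pcap_inject works one layer below the raw IP socket, because the buffer must be a complete link-layer frame. This is also why tc_pcap_snd_init sizes its handle as `mtu + sizeof(struct ethernet_hdr)`. Below is a minimal sketch of wrapping an IPv4 packet in an Ethernet II header. `build_eth_frame` is a hypothetical helper, and it assumes the source and next-hop MAC addresses are already known. Resolving them is outside the scope of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14   /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Prepend an Ethernet II header to an IPv4 packet.
 * Returns the frame length, or 0 if the output buffer is too small. */
static size_t build_eth_frame(uint8_t *frame, size_t cap,
                              const uint8_t dst_mac[6], const uint8_t src_mac[6],
                              const uint8_t *ip_pkt, size_t ip_len)
{
    if (ip_len > cap || cap - ip_len < ETH_HDR_LEN)
        return 0;
    memcpy(frame, dst_mac, 6);
    memcpy(frame + 6, src_mac, 6);
    frame[12] = 0x08;    /* EtherType 0x0800 = IPv4, big-endian */
    frame[13] = 0x00;
    memcpy(frame + ETH_HDR_LEN, ip_pkt, ip_len);
    return ETH_HDR_LEN + ip_len;
}
```

The resulting buffer would then be passed as `pcap_inject(pcap, frame, n)`, which returns the number of bytes written, or -1 on error.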

2. TCP Protocol Stack

As we know, TCP is a stateful protocol. The packet-sending mechanism was explained above, but without an actual TCP connection, the test service cannot truly receive the packets sent. In everyday network programming, we typically use the TCP socket interfaces provided by the operating system, which hide most of the complexity of TCP states. In tcpcopy, however, we need to modify the source and destination IPs of packets to deceive the test service, so the APIs provided by the operating system are no longer sufficient.

For this reason, tcpcopy implements a simulated TCP state machine, which is the most complex and challenging part of its codebase. The relevant code, located in tcpcopy/src/tcpcopy/tc_session.c, handles essential tasks such as simulating TCP interactions, coping with network latency, and emulating upper-layer interactions.

Figure 2: Classic TCP state machine overview

tcpcopy defines a session to maintain the information for each connection. Captured packets are processed according to their type:

  • SYN packet: Represents a new connection request. tcpcopy assigns a source IP, modifies the destination IP and port, and sends the packet to the test server. At the same time, it creates a new session to store all the state of this connection.
  • ACK packet:
    • Pure ACK packet: To reduce the number of packets sent, tcpcopy generally does not send pure ACKs.
    • ACK packet with payload (indicating a specific request): tcpcopy finds the corresponding session and sends the packet to the test server. If the session is still waiting for the response to the previous request, sending is delayed.
  • RST packet: If the current session is waiting for the test server’s response, the RST packet is not sent. Otherwise, it is sent.
  • FIN packet: If the current session is waiting for the test server’s response, tcpcopy waits. Otherwise, the FIN packet is sent.
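The per-packet rules above amount to a small decision function. This simplified sketch assumes that a session only needs to know whether a response is outstanding. `session_t`, `on_captured`, `on_response`, and the `ACT_*` values are hypothetical. tcpcopy's tc_session.c tracks much more, such as sequence numbers, retransmissions, and timers. The flag values are the standard TCP header bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_RST 0x04
#define TH_ACK 0x10

typedef enum { ACT_SEND, ACT_DROP, ACT_DELAY } action_t;

/* Hypothetical per-connection state. */
typedef struct {
    bool active;
    bool waiting_response;   /* a request was forwarded and its response is pending */
} session_t;

/* Decide what to do with one captured client packet. */
static action_t on_captured(session_t *s, uint8_t flags, size_t payload_len)
{
    if (flags & TH_SYN) {                        /* new connection: create session, forward */
        s->active = true;
        s->waiting_response = false;
        return ACT_SEND;
    }
    if (flags & TH_RST)                          /* suppress RST while a response is pending */
        return s->waiting_response ? ACT_DROP : ACT_SEND;
    if (flags & TH_FIN)                          /* hold FIN until the response arrives */
        return s->waiting_response ? ACT_DELAY : ACT_SEND;
    if ((flags & TH_ACK) && payload_len == 0)    /* pure ACK: usually not sent */
        return ACT_DROP;
    if (s->waiting_response)                     /* previous request unanswered: delay */
        return ACT_DELAY;
    s->waiting_response = true;                  /* forward the request, await its reply */
    return ACT_SEND;
}

/* Called when intercept reports the test server's response for this session. */
static void on_response(session_t *s)
{
    s->waiting_response = false;
}
```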

3. Routing

After tcpcopy sends the request packets, their journey may not be entirely smooth:

  • The source IP of the request packet is forged and is not the real IP of the machine running tcpcopy. Some machines have rp_filter (reverse path filtering) enabled, which checks whether the source address is legitimate. If it is not, the packet is discarded at the IP layer.
  • When the test server receives the request packet, it sends the response to the forged IP address. Correct routing configuration is essential to keep these responses from going back to the real client that owns the forged IP. If routing is not set up correctly, intercept cannot capture the response packets, and the data exchange is left incomplete.
  • After intercept captures a response packet, it extracts the relevant information and discards the actual payload, returning only the response headers and other necessary information to tcpcopy. When necessary, it also merges the returned information to reduce the network impact on the machine running tcpcopy.
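The payload-stripping step can be pictured as copying only the IP and TCP headers of each response. This sketch assumes IPv4 input that starts at the IP header, and `strip_payload` is a hypothetical name rather than intercept's code. In this sketch the IP header is copied unchanged, so its Total Length field still records how large the original response was, even though the payload itself never leaves the intercept host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy only the IPv4 and TCP headers of a response into out, dropping the payload.
 * Returns the number of header bytes kept, or 0 on malformed input or small buffer. */
static size_t strip_payload(const uint8_t *pkt, size_t len, uint8_t *out, size_t cap)
{
    if (len < 20)
        return 0;
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;          /* IP header length */
    size_t tot = ((size_t)pkt[2] << 8) | pkt[3];       /* IP total length */
    if (ihl < 20 || tot > len || tot < ihl + 20)
        return 0;
    size_t doff = (size_t)(pkt[ihl + 12] >> 4) * 4;    /* TCP header length */
    size_t hdrs = ihl + doff;
    if (doff < 20 || hdrs > tot || hdrs > cap)
        return 0;
    memcpy(out, pkt, hdrs);                            /* Total Length left as-is */
    return hdrs;
}
```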

4. Intercept

Newcomers to tcpcopy often wonder why intercept is needed at all if we already have tcpcopy. Although intercept may seem redundant, it plays a crucial role. You can think of intercept as the server-side counterpart of tcpcopy, and its name describes its function: an “interceptor.” But what exactly does intercept need to intercept? The answer is the response packets from the test service.

Without intercept, the test server would send its response packets directly to tcpcopy. Since tcpcopy is deployed in a live environment, the responses would then go straight to the production server, significantly increasing its network load and possibly disrupting the live service. With intercept, spoofing the source IP leads the test service to “believe” that clients with these spoofed IPs are accessing it. Intercept also aggregates and optimizes the response packet information, which further ensures that the test environment does not affect the live environment at the network level.

intercept is an independent process that, by default, captures packets using pcap. At startup, the -F parameter must be passed with a filter in libpcap’s syntax, for example “tcp and src port 8080”. This means that intercept does not connect directly to the test service. Instead, it captures the response packets that the test service sends from the specified port and exchanges the relevant information with tcpcopy.
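To make the filter expression concrete, here is a hand-written C equivalent of what `tcp and src port 8080` selects for IPv4. It is purely illustrative: intercept passes the -F string to libpcap, which compiles it into a BPF program, and `is_tcp_from_port` is a hypothetical function. Like libpcap's port primitive, it rejects non-first fragments, since they carry no TCP header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Roughly "tcp and src port <port>" for a buffer starting at an IPv4 header.
 * IPv6 is not handled. */
static bool is_tcp_from_port(const uint8_t *ip, size_t len, uint16_t port)
{
    if (len < 20 || (ip[0] >> 4) != 4 || ip[9] != 6 /* TCP */)
        return false;
    /* Non-zero fragment offset: no TCP header in this fragment. */
    if ((((ip[6] & 0x1f) << 8) | ip[7]) != 0)
        return false;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < 20 || len < ihl + 2)
        return false;
    uint16_t sport = (uint16_t)((ip[ihl] << 8) | ip[ihl + 1]);   /* TCP source port */
    return sport == port;
}
```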

5. Performance

tcpcopy uses a single-process, single-threaded architecture built on an epoll/select event-driven model, and the related code lives in the tcpcopy/src/event directory. epoll is used by default at compile time, although you can switch to select with the --select option. The choice can be guided by the performance differences observed during testing. In theory, epoll performs better when handling a large number of connections.

In practice, tcpcopy’s performance is directly tied to the volume of traffic and the number of connections established with intercept. The single-threaded architecture itself is usually not a performance bottleneck. For instance, Nginx and Redis are both built on single-threaded epoll event loops and can handle high concurrency. Because tcpcopy connects directly only to intercept and does not need to connect to the test machines or occupy port numbers, it consumes few resources. Its main cost is network bandwidth.

static tc_event_actions_t tc_event_actions = {
#ifdef TC_HAVE_EPOLL
    tc_epoll_create,
    tc_epoll_destroy,
    tc_epoll_add_event,
    tc_epoll_del_event,
    tc_epoll_polling
#else
    tc_select_create,
    tc_select_destroy,
    tc_select_add_event,
    tc_select_del_event,
    tc_select_polling
#endif
};
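The table above is a set of function pointers selected at compile time. As a hedged sketch of what the select branch has to do, here is a tiny select()-based backend that exposes the same five operations (create, destroy, add, delete, poll). `sel_loop`, `event_actions`, and the `sel_*` functions are hypothetical and much simpler than tcpcopy's tc_select_* code.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical select() state: the watched read set and the highest fd in it. */
typedef struct {
    fd_set read_set;
    int    max_fd;
} sel_loop;

static void sel_create(sel_loop *l)  { FD_ZERO(&l->read_set); l->max_fd = -1; }
static void sel_destroy(sel_loop *l) { FD_ZERO(&l->read_set); l->max_fd = -1; }

static int sel_add(sel_loop *l, int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE)          /* select() cannot watch this fd */
        return -1;
    FD_SET(fd, &l->read_set);
    if (fd > l->max_fd)
        l->max_fd = fd;
    return 0;
}

static void sel_del(sel_loop *l, int fd)
{
    FD_CLR(fd, &l->read_set);
    while (l->max_fd >= 0 && !FD_ISSET(l->max_fd, &l->read_set))
        l->max_fd--;                         /* shrink to the new highest fd */
}

/* Wait up to timeout_ms and call on_ready for each readable fd.
 * Returns the select() result: ready count, 0 on timeout, -1 on error. */
static int sel_polling(sel_loop *l, int timeout_ms,
                       void (*on_ready)(int fd, void *arg), void *arg)
{
    fd_set ready = l->read_set;              /* select() overwrites its argument */
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    int n = select(l->max_fd + 1, &ready, NULL, NULL, &tv);
    if (n <= 0)
        return n;
    for (int fd = 0; fd <= l->max_fd; fd++)
        if (FD_ISSET(fd, &ready) && on_ready != NULL)
            on_ready(fd, arg);
    return n;
}

/* Same shape as tc_event_actions_t: one pointer per operation. */
typedef struct {
    void (*create)(sel_loop *);
    void (*destroy)(sel_loop *);
    int  (*add)(sel_loop *, int);
    void (*del)(sel_loop *, int);
    int  (*polling)(sel_loop *, int, void (*)(int, void *), void *);
} event_actions;

static const event_actions select_actions = {
    sel_create, sel_destroy, sel_add, sel_del, sel_polling
};
```

This backend makes the epoll/select trade-off discussed earlier concrete. Each select() call rescans every descriptor up to max_fd and is limited to FD_SETSIZE descriptors. epoll returns only the ready descriptors, which is why it usually scales better with many connections.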

Conclusion

TCPCopy is an excellent open-source project. Because of the author’s limitations, however, this article covers only the core technical principles of TCPCopy and leaves many details untouched [2]. Still, I hope this introduction offers some inspiration to anyone interested in TCPCopy and traffic replay technologies!

References

[1] GitHub: session-replay-tools/tcpcopy

[2] Mobile Test Development: A brief analysis of the principles of TCPCopy, a real-time traffic replay tool
