In net/core/rtnetlink.c there is a function, rtnetlink_init(), that is of interest to us:
__initfunc(void rtnetlink_init(void))
{
#ifdef RTNL_DEBUG
    printk("Initializing RT netlink socket\n");
#endif
    rtnl = netlink_kernel_create(NETLINK_ROUTE, rtnetlink_rcv);
    if (rtnl == NULL)
        panic("rtnetlink_init: cannot initialize rtnetlink\n");
    register_netdevice_notifier(&rtnetlink_dev_notifier);
    rtnetlink_links[PF_UNSPEC] = link_rtnetlink_table;
    rtnetlink_links[PF_PACKET] = link_rtnetlink_table;
}
This function is called as part of sock_init() in net/socket.c. It creates a netlink socket inside the kernel that handles user requests. The relevant part of netlink_kernel_create() is:
struct sock *
netlink_kernel_create(int unit, void (*input)(struct sock *sk, int len))
{
    /* ... */
    if (netlink_create(sock, unit) < 0) {
        sock_release(sock);
        return NULL;
    }
    sk = sock->sk;
    if (input)
        sk->data_ready = input;
    netlink_insert(sk);
    /* ... */
}
The function creates a netlink socket and then makes an entry for it in
nl_table. In fact, since this socket is created when the system comes up, it
will be the first entry in that table. The socket created here has pid 0,
which is why every user netlink socket that wants to perform NETLINK_ROUTE
operations has to address this socket by setting the destination pid to 0.
Note also that the function is called with the function pointer rtnetlink_rcv,
and the socket's data_ready pointer is set to this value. rtnetlink_rcv() is
significant because it is the entry point into the kernel.
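On the user side, "setting the pid to 0" means filling the destination struct sockaddr_nl with nl_pid = 0. The following is a minimal sketch using the standard Linux user-space headers. The helper toy_build_qdisc_dump() and struct toy_qdisc_req are hypothetical, and the RTM_GETQDISC dump request is chosen only for illustration. The sketch prepares everything sendmsg() needs but sends nothing.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>       /* struct msghdr, AF_NETLINK, AF_UNSPEC */
#include <sys/uio.h>          /* struct iovec */
#include <linux/netlink.h>    /* struct sockaddr_nl, struct nlmsghdr, NLM_F_* */
#include <linux/rtnetlink.h>  /* struct tcmsg, RTM_GETQDISC */

/* Hypothetical request layout: netlink header followed by a tcmsg body. */
struct toy_qdisc_req {
    struct nlmsghdr nlh;
    struct tcmsg    tcm;
};

/* Prepare an RTM_GETQDISC dump request addressed to the kernel socket
 * (nl_pid = 0) and the msghdr that sendmsg() would take. Sends nothing. */
static void toy_build_qdisc_dump(struct toy_qdisc_req *req,
                                 struct sockaddr_nl *dst,
                                 struct iovec *iov, struct msghdr *msg)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct tcmsg));
    req->nlh.nlmsg_type  = RTM_GETQDISC;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq   = 1;           /* arbitrary sequence number */
    req->tcm.tcm_family  = AF_UNSPEC;   /* queue/class requests use AF_UNSPEC */

    memset(dst, 0, sizeof(*dst));
    dst->nl_family = AF_NETLINK;
    dst->nl_pid    = 0;                 /* the kernel's NETLINK_ROUTE socket */
    dst->nl_groups = 0;                 /* unicast, no multicast groups */

    iov->iov_base = req;
    iov->iov_len  = req->nlh.nlmsg_len;

    memset(msg, 0, sizeof(*msg));
    msg->msg_name    = dst;
    msg->msg_namelen = sizeof(*dst);
    msg->msg_iov     = iov;
    msg->msg_iovlen  = 1;
}
```

To actually send the request, one would open fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) and call sendmsg(fd, &msg, 0). The kernel's replies are then read from the same socket with recvmsg().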
The link_rtnetlink_table is an array of structures of the following type:
struct rtnetlink_link
{
    int (*doit)(struct sk_buff *, struct nlmsghdr *, void *attr);
    int (*dumpit)(struct sk_buff *, struct netlink_callback *cb);
};
Each entry holds a doit and a dumpit function pointer. The table is indexed
by the message type, i.e. the action to be performed (RTM_NEWQDISC,
RTM_DELQDISC, etc.), offset by RTM_BASE, and the corresponding function is
called.
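The slot arithmetic can be illustrated with a small user-space model. In this sketch, struct toy_rtnl_link is a simplified, hypothetical stand-in for struct rtnetlink_link: its handlers take only the message type instead of an sk_buff and a nlmsghdr. The RTM_* constants come from the Linux user-space header <linux/rtnetlink.h>. The range check is a choice of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/rtnetlink.h>  /* RTM_BASE, RTM_MAX, RTM_* message types */

/* Simplified stand-in for struct rtnetlink_link: the kernel handlers take
 * an sk_buff and a nlmsghdr, these take only the message type. */
struct toy_rtnl_link {
    int (*doit)(int type);
    int (*dumpit)(int type);
};

/* One slot per message type from RTM_BASE to RTM_MAX inclusive. */
#define TOY_NSLOTS (RTM_MAX - RTM_BASE + 1)

/* Map a message type to its slot; NULL if the type is outside the table. */
static struct toy_rtnl_link *toy_slot(struct toy_rtnl_link *table, int type)
{
    if (type < RTM_BASE || type > RTM_MAX)
        return NULL;
    return &table[type - RTM_BASE];
}

/* Hypothetical handler used to populate one slot. */
static int toy_newqdisc(int type) { return type; }
```

With the values in the Linux headers (RTM_BASE is 16 and RTM_NEWQDISC is 36), RTM_NEWQDISC lands in slot 20.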
This table is further filled in by sched/sch_api.c:
link_p[RTM_NEWQDISC-RTM_BASE].doit = tc_modify_qdisc;
link_p[RTM_DELQDISC-RTM_BASE].doit = tc_get_qdisc;
link_p[RTM_GETQDISC-RTM_BASE].doit = tc_get_qdisc;
link_p[RTM_GETQDISC-RTM_BASE].dumpit = tc_dump_qdisc;
link_p[RTM_NEWTCLASS-RTM_BASE].doit = tc_ctl_tclass;
link_p[RTM_DELTCLASS-RTM_BASE].doit = tc_ctl_tclass;
link_p[RTM_GETTCLASS-RTM_BASE].doit = tc_ctl_tclass;
link_p[RTM_GETTCLASS-RTM_BASE].dumpit = tc_dump_tclass;
The route-related function pointers, on the other hand, are stored in net/ipv4/devinet.c:
static struct rtnetlink_link inet_rtnetlink_table[RTM_MAX-RTM_BASE+1] =
{
    /* ... */
    { inet_rtm_newroute, NULL, },
    { inet_rtm_delroute, NULL, },
    { inet_rtm_getroute, inet_dump_fib, },
    /* ... */
};
rtnetlink_links[PF_INET] = inet_rtnetlink_table;
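Taken together, rtnetlink_init(), sched/sch_api.c and net/ipv4/devinet.c build a two-level lookup: a per-family array of tables, each indexed by type - RTM_BASE. The sketch below models that registration in user space. Every toy_* name is hypothetical, the handlers are empty stubs named after the kernel functions, and TOY_NPROTO is an arbitrary bound chosen for the sketch. Only the RTM_* and PF_* constants come from the real headers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>       /* PF_UNSPEC, PF_INET, PF_PACKET */
#include <linux/rtnetlink.h>  /* RTM_BASE, RTM_MAX, RTM_* message types */

/* Simplified stand-in for struct rtnetlink_link (handlers take only the type). */
struct toy_rtnl_link {
    int (*doit)(int type);
    int (*dumpit)(int type);
};

#define TOY_NSLOTS (RTM_MAX - RTM_BASE + 1)
#define TOY_NPROTO 64   /* arbitrary bound on the family index */

/* Hypothetical stubs standing in for the kernel handlers. */
static int toy_tc_modify_qdisc(int t)   { return t; }
static int toy_tc_get_qdisc(int t)      { return t; }
static int toy_tc_dump_qdisc(int t)     { return t; }
static int toy_tc_ctl_tclass(int t)     { return t; }
static int toy_tc_dump_tclass(int t)    { return t; }
static int toy_inet_rtm_newroute(int t) { return t; }
static int toy_inet_rtm_delroute(int t) { return t; }
static int toy_inet_rtm_getroute(int t) { return t; }
static int toy_inet_dump_fib(int t)     { return t; }

static struct toy_rtnl_link toy_link_table[TOY_NSLOTS];
static struct toy_rtnl_link toy_inet_table[TOY_NSLOTS];
static struct toy_rtnl_link *toy_rtnl_links[TOY_NPROTO];

static void toy_register_all(void)
{
    struct toy_rtnl_link *link_p = toy_link_table;

    /* As in rtnetlink_init(): PF_UNSPEC and PF_PACKET share one table. */
    toy_rtnl_links[PF_UNSPEC] = toy_link_table;
    toy_rtnl_links[PF_PACKET] = toy_link_table;

    /* As in sched/sch_api.c: queue and class handlers. */
    link_p[RTM_NEWQDISC - RTM_BASE].doit    = toy_tc_modify_qdisc;
    link_p[RTM_DELQDISC - RTM_BASE].doit    = toy_tc_get_qdisc;
    link_p[RTM_GETQDISC - RTM_BASE].doit    = toy_tc_get_qdisc;
    link_p[RTM_GETQDISC - RTM_BASE].dumpit  = toy_tc_dump_qdisc;
    link_p[RTM_NEWTCLASS - RTM_BASE].doit   = toy_tc_ctl_tclass;
    link_p[RTM_DELTCLASS - RTM_BASE].doit   = toy_tc_ctl_tclass;
    link_p[RTM_GETTCLASS - RTM_BASE].doit   = toy_tc_ctl_tclass;
    link_p[RTM_GETTCLASS - RTM_BASE].dumpit = toy_tc_dump_tclass;

    /* As in net/ipv4/devinet.c: route handlers get a table of their own. */
    toy_inet_table[RTM_NEWROUTE - RTM_BASE].doit   = toy_inet_rtm_newroute;
    toy_inet_table[RTM_DELROUTE - RTM_BASE].doit   = toy_inet_rtm_delroute;
    toy_inet_table[RTM_GETROUTE - RTM_BASE].doit   = toy_inet_rtm_getroute;
    toy_inet_table[RTM_GETROUTE - RTM_BASE].dumpit = toy_inet_dump_fib;
    toy_rtnl_links[PF_INET] = toy_inet_table;
}

/* Two-level lookup: the family selects the table, the type selects the slot. */
static struct toy_rtnl_link *toy_lookup(int family, int type)
{
    if (family < 0 || family >= TOY_NPROTO || toy_rtnl_links[family] == NULL)
        return NULL;
    if (type < RTM_BASE || type > RTM_MAX)
        return NULL;
    return &toy_rtnl_links[family][type - RTM_BASE];
}
```

Note that RTM_DELQDISC and RTM_GETQDISC share one doit, and only the GET slots carry a dumpit, exactly as in the kernel excerpts.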
Now let us trace how a netlink packet from user space finds its way into the kernel. The sendmsg() call is handled by sys_sendmsg(), which in turn calls netlink_sendmsg() in our case; that function calls netlink_unicast() or netlink_broadcast() as appropriate. netlink_unicast() identifies the netlink socket to which the message has to be passed by comparing the destination pid against the pids of the netlink sockets in nl_table for that protocol, and then calls the data_ready function of the matching socket, which is rtnetlink_rcv() in the NETLINK_ROUTE case. The relevant section of the code is:
int netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 pid, int nonblock)
{
    /* ... */
    for (sk = nl_table[protocol]; sk; sk = sk->next) {
        if (sk->protinfo.af_netlink.pid != pid)
            continue;
        /* ... */
        sk->data_ready(sk, len);
    }
    /* ... */
}
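Pid-based delivery, together with the kernel socket set up by netlink_kernel_create(), can be modelled as a linked list per protocol. This is a toy sketch, not kernel code. struct toy_sock and the toy_* functions are hypothetical, and queueing of the skb is omitted. Pushing new sockets onto the head of the list is an assumption of the sketch, as is returning -1 when no pid matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LINKS     32  /* hypothetical number of netlink protocols */
#define TOY_NETLINK_ROUTE 0   /* hypothetical index for NETLINK_ROUTE */

/* Stand-in for struct sock: only the fields this path touches. */
struct toy_sock {
    uint32_t pid;                                     /* 0 for the kernel socket */
    void (*data_ready)(struct toy_sock *sk, int len);
    struct toy_sock *next;                            /* same-protocol list */
};

static struct toy_sock *toy_nl_table[TOY_MAX_LINKS];

/* Like netlink_insert(): link the socket into its protocol's list
 * (head insertion is this sketch's choice). */
static void toy_insert(int protocol, struct toy_sock *sk)
{
    sk->next = toy_nl_table[protocol];
    toy_nl_table[protocol] = sk;
}

/* Like netlink_kernel_create(): pid 0, and input becomes data_ready. */
static void toy_kernel_create(int protocol, struct toy_sock *sk,
                              void (*input)(struct toy_sock *, int))
{
    sk->pid = 0;
    sk->data_ready = input;
    toy_insert(protocol, sk);
}

/* Like netlink_unicast(): find the socket with the destination pid and
 * call its data_ready hook. Returns len on delivery, -1 if none matches. */
static int toy_unicast(int protocol, uint32_t pid, int len)
{
    for (struct toy_sock *sk = toy_nl_table[protocol]; sk; sk = sk->next) {
        if (sk->pid != pid)
            continue;
        if (sk->data_ready)
            sk->data_ready(sk, len);
        return len;
    }
    return -1;
}

/* Records what the "kernel" received, in place of rtnetlink_rcv(). */
static int toy_rcv_len = -1;
static void toy_rtnetlink_rcv(struct toy_sock *sk, int len)
{
    (void)sk;
    toy_rcv_len = len;
}
```

Even with a user socket inserted after the kernel socket, a message sent to pid 0 still reaches toy_rtnetlink_rcv(). Delivery depends only on the pid match, not on list position.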
From rtnetlink_rcv() the flow is as follows: the skb is dequeued and passed to rtnetlink_rcv_skb(), which in turn calls rtnetlink_rcv_msg(). This function extracts the operation to be performed from the netlink message and calls the corresponding doit function by indexing into the rtnetlink_links array according to the family. For example, for queue and class operations the family is AF_UNSPEC, so the lookup goes into link_rtnetlink_table, whereas for route modifications the family is AF_INET, so it goes into inet_rtnetlink_table. In this way the appropriate function is reached, the necessary action is taken, and success or failure is reported back to the user.
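The receive path can be sketched in user space with the NLMSG_OK, NLMSG_NEXT and NLMSG_DATA macros from <linux/netlink.h>. toy_rcv_buf() loosely follows rtnetlink_rcv_skb() by walking every message in the buffer. toy_rcv_msg() follows rtnetlink_rcv_msg() as described above. It reads the family from the first payload byte (struct rtgenmsg), picks that family's table, indexes it by type - RTM_BASE, and calls doit. All toy_* names, the simplified handler signature, the -1 error convention and TOY_NPROTO are assumptions of the sketch. It does no dumpit or permission handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>       /* AF_UNSPEC, AF_INET */
#include <linux/netlink.h>    /* struct nlmsghdr, NLMSG_* macros */
#include <linux/rtnetlink.h>  /* struct rtgenmsg, RTM_* */

#define TOY_NSLOTS (RTM_MAX - RTM_BASE + 1)
#define TOY_NPROTO 64   /* arbitrary bound on the family index */

/* Simplified stand-in for struct rtnetlink_link. */
struct toy_rtnl_link {
    int (*doit)(struct nlmsghdr *nlh);
};

static struct toy_rtnl_link toy_link_table[TOY_NSLOTS];   /* AF_UNSPEC */
static struct toy_rtnl_link toy_inet_table[TOY_NSLOTS];   /* AF_INET */
static struct toy_rtnl_link *toy_rtnl_links[TOY_NPROTO];

static int toy_qdisc_calls, toy_route_calls;
static int toy_qdisc_doit(struct nlmsghdr *nlh) { (void)nlh; return ++toy_qdisc_calls; }
static int toy_route_doit(struct nlmsghdr *nlh) { (void)nlh; return ++toy_route_calls; }

static void toy_setup(void)
{
    toy_rtnl_links[AF_UNSPEC] = toy_link_table;
    toy_rtnl_links[AF_INET]   = toy_inet_table;
    toy_link_table[RTM_NEWQDISC - RTM_BASE].doit = toy_qdisc_doit;
    toy_inet_table[RTM_NEWROUTE - RTM_BASE].doit = toy_route_doit;
}

/* One message: family from the payload, slot from the type, then doit. */
static int toy_rcv_msg(struct nlmsghdr *nlh)
{
    int type = nlh->nlmsg_type;
    if (type < RTM_BASE || type > RTM_MAX)
        return -1;
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct rtgenmsg)))
        return -1;                       /* no room for the family byte */
    int family = ((struct rtgenmsg *)NLMSG_DATA(nlh))->rtgen_family;
    if (family >= TOY_NPROTO || toy_rtnl_links[family] == NULL)
        return -1;
    struct toy_rtnl_link *link = &toy_rtnl_links[family][type - RTM_BASE];
    return link->doit ? link->doit(nlh) : -1;
}

/* Walk every message in the buffer; returns how many were handled. */
static int toy_rcv_buf(void *buf, int len)
{
    int handled = 0;
    for (struct nlmsghdr *nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len))
        if (toy_rcv_msg(nlh) >= 0)
            handled++;
    return handled;
}

/* Helper: append a header plus family byte at offset off, return new offset. */
static int toy_put_msg(char *buf, int off, int type, unsigned char family)
{
    struct nlmsghdr *nlh = (struct nlmsghdr *)(buf + off);
    memset(nlh, 0, NLMSG_SPACE(sizeof(struct rtgenmsg)));
    nlh->nlmsg_len  = NLMSG_LENGTH(sizeof(struct rtgenmsg));
    nlh->nlmsg_type = type;
    ((struct rtgenmsg *)NLMSG_DATA(nlh))->rtgen_family = family;
    return off + NLMSG_SPACE(sizeof(struct rtgenmsg));
}
```

In a buffer holding an AF_UNSPEC RTM_NEWQDISC message, an AF_INET RTM_NEWROUTE message and an AF_INET RTM_DELROUTE message, the first two reach their doit handlers. The third is rejected because its slot has no doit in this toy table.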