Discussion: Proposing some changes to the 'node_info' table
I started working on an implementation of the network mapper/crawler in Python because I thought asyncio
would be an easy way to speed up the polling of the network (and I'm much more familiar with Python): pyMeshMap
Coming it at from the mapping side there are a few changes to the database that I think would make this more efficient (and easier to write). Granted, I've only dug into the mapping side so I don't know the repercussions to the rest of the application. It basically comes down to a few changes:
- Use the wlan_ip as the primary key.
- Replace the removed_nodes table with a node_status column in node_info that is used to track a node's status (for example: "active", "inactive", and "gone").
- Move manually set locations to another table.
Mapping Process
For context, here's how I intend the node mapping process:
- Expire the hosts_ignore table based on age of entries.
- Get a collection of IPs to ignore (preferably a hashed collection for fast lookups, I'm using Python's
set
collection). - Run the poller, providing it a "starting" node for OLSR and the IPs to ignore and getting back data for all the nodes it could poll and any errors.
- Loop through the nodes, adding/updating the database.
- Add ignore entries for any errors
- Run a query to update node statuses based on time since last seen.
Rationale
I suggest the primary key change because the IP address needs to be unique on the network so it would be a natural primary key. Furthermore, it's how the nodes are referenced when looking up the location information to build the topology information, so having it as a key value should be a performance increase in larger networks (although that depends on specific database implementations). Another benefit is that would eliminate the need for additional/separate queries to handle the majority of node renames (the exception being if someone changed the IP address as well, so the question is how important that edge case is).
Consolidating the node tables would mean that there was no longer a need to check if a node had been deleted and then deleting the row from the one table and adding it to the other one. A single update with the current timestamp and an "active" status would return a node to active status. It would require updating queries that select the nodes to display so that non-active nodes are excluded. I haven't looked into how many places that would be.
Similarly, moving manually set locations to a separate table would eliminate another "check" in the import process, although it would complicate the location lookup, both for updating topology and placing on the map. Thus the more I think about this one I'm less sold on it. It would have the advantage of always having the node's location in node_info so that if someone did enter the location we'd be able to see that value.
While the goal of several of these is to reduce database calls, I'm not sure the actual performance impact of all of them, so I was also considering the simplicity of the code. And some of that is impacted by the tools I'm using in Python. For instance, the SQLAlchemy ORM has a convenient method to handle the INSERT/UPDATE by just calling dbsession.merge(new_model)
.