Those are the “sexy” projects — the ones where you’re truly challenged, and where your knowledge and know-how are pushed to the limit and then some to get you to that deliverable. You know you’re working on cutting edge stuff, pushing the limits of what’s presently out there, and it’s undeniably exciting.
When I joined TextNow about eighteen months ago, I met with two members of the calling and messaging team during the interview process. They introduced me to the seriously cool “elastic calling” feature they had recently deployed, which is an ingenious solution to help deal with situations where a customer’s IP link is poor and unable to continue carrying the VOIP stream. That’s a perfect example of the kind of outside-the-box lateral-thinking move that I really enjoy — a problem that pushes my knowledge of the problem space to its absolute limits. It showed me that TextNow is a company that isn’t afraid to listen to some potentially wild ideas that push the envelope and truly innovate to solve a real-world problem. (As a side note, many companies say they welcome these types of thinkers and are willing to take risks with projects, but very few companies actually are.) I was genuinely excited to start working here.
My first month on the calling and messaging team was spent doing bug fixes and helping track down customer-reported issues, which really helped me get familiar with the code base and learn how the back-end worked with the various supported clients. I first tackled how to implement support for MMS — a formidable challenge, to say the least — but then moved into working with termination providers to reduce our costs on the backend. These projects were not visible to our customers, but they impacted the business just the same. But I discovered that, in true TextNow fashion, everyone pulled together to solve the challenges and found ways to crush the hurdles that stood in our way of success, and we delivered solid implementations across the board.
At this point in TextNow’s growth, there was a reorganization happening, and I was asked by my then-manager where I would like to be in the re-org. Calling and Messaging had been split up into separate teams. Which one did I want to join? Was there another team I wanted to join instead? I was looking at the giant chart that had all the various roles in Engineering with names next to most of them.
I noticed a team labeled ‘COGS’.
“What’s COGS?”, I asked him, expecting it to be some code-name for an unannounced skunk works project.
His answer was not what I expected. “‘Cost of goods sold’. Basically, we need an engineering resource to build a framework to pull together all our data and reconcile it against our bills.”
“That sounds sexy…”, I replied sarcastically.
“Yeah, it’s not a very visible or ‘sexy’ project,” he said, using air quotes, “but it is essential to get it built to help the business continue to grow. Today it’s all being done manually.”
I thought to myself: this sounds like a relatively simple project, at most a quarter or two worth of work, and it does need a senior software developer, so I can tackle it as I doubt anyone else will want to volunteer. As a bonus, it’ll be an area I haven’t had much prior experience in, so I can use it as an opportunity to learn about the financial side as well.
“Alright, put me in for the COGS role,” I said, and with that my name was added to the giant chart.
That night when I went home and discussed my day with my wife, she was genuinely concerned with my decision to choose the COGS role.
“Won’t you get bored?” she asked. “You thrive on doing outside-the-box stuff, pushing your boundaries. This sounds like importing data into Excel and running a formula.”
A part of me agreed with her perception, and while I didn’t truly believe it would be as simple as importing data into Excel, I also didn’t think it would extend as far as it has.
A sense of purpose and accomplishment is what gets most developers engaged and excited to go to work in the mornings, and I am no exception. Looking around the TextNow office, there isn’t a single person in Engineering that doesn’t take pride in what they’re working on, regardless of how visible or invisible it appears to be. Everything we do is in service of solving a problem that needs to be solved, and COGS had a problem that needed to be solved.
My initial task was to reconcile our calling costs against our monthly bills. It seemed pretty straightforward: our backend systems collect logs for every call, we have a rate deck that gives a cost for each call, and we also get a list of calls in the form of CDRs (call detail records) that line up with the bill we receive. All I had to do was pull the info for the calls from our backend, compare it to the CDRs, and identify anything that didn’t line up. There were just a few minor problems to overcome first.
The first issue was an easy one. The phone numbers in our backend system were normalized to E.164 format (for example, +18885551212), while our CDRs used a variety of formats, usually 10-digit (e.g. 8885551212). That just required normalizing all the CDR numbers to E.164.
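To make that step concrete, here is a minimal sketch of that kind of normalization. It is an illustration only, not the production code: the function name `to_e164` and the assumption that bare 10-digit numbers are North American (country code 1) are mine.

```python
import re


def to_e164(raw: str, default_country_code: str = "1") -> str:
    """Normalize a phone number string to E.164.

    Assumption for this sketch: numbers without a leading "+" are
    North American, so a bare 10-digit number gets country code 1.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, "+"
    if raw.strip().startswith("+"):
        # Already carries a country code; just strip the formatting.
        return "+" + digits
    if len(digits) == 10:
        return "+" + default_country_code + digits
    if len(digits) == 11 and digits.startswith(default_country_code):
        return "+" + digits
    raise ValueError(f"cannot normalize {raw!r}")


print(to_e164("8885551212"))
print(to_e164("(888) 555-1212"))
```

Anything the function cannot confidently normalize raises an error rather than guessing, which is usually what you want when the result feeds a billing comparison.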
Then came the matching logic. I tried importing everything into a database and letting it do the matching. Not only did it fail to match correctly, it was also far too slow. Not being a database administrator myself, I decided to tackle it in code. From my past experience working on network appliances, I figured that adapting the quintuple approach was the best way to match entries: hash the identifying fields of each record and compare the hashes. (In this case there were only four fields, the from, to, duration, and timestamp, so really more of a quadruple.) I implemented a quick proof of concept, ran some data through it, and… had a success rate of about 20%. Huh? Clearly something was wrong, but this was basic hash matching! I had done this dozens of times before!
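For readers who have not seen the technique, exact matching on a tuple of fields looks roughly like the sketch below. The `Call` record and `exact_match` function are hypothetical names for illustration, and Python's built-in tuple hashing stands in for whatever hash the real proof of concept used.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    src: str        # calling number, E.164
    dst: str        # called number, E.164
    duration: int   # seconds
    timestamp: int  # Unix seconds


def exact_match(ours, theirs):
    """Pair records whose four fields are all identical."""
    index = defaultdict(list)
    for cdr in theirs:
        index[cdr].append(cdr)  # a NamedTuple hashes on all of its fields
    matched, unmatched = [], []
    for call in ours:
        bucket = index.get(call)
        if bucket:
            matched.append((call, bucket.pop()))
        else:
            unmatched.append(call)
    return matched, unmatched


ours = [Call("+18885551212", "+14165550100", 60, 1000),
        Call("+18885551212", "+14165550101", 30, 2000)]
theirs = [Call("+18885551212", "+14165550100", 60, 1000),
          Call("+18885551212", "+14165550101", 30, 2003)]
matched, unmatched = exact_match(ours, theirs)
print(len(matched), len(unmatched))
```

The second call differs from its counterpart only by a three-second timestamp offset, and that is enough for an exact match to miss it entirely.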
It turns out that our logs will record the time that the call was initiated as the timestamp, while the CDR records the time the call was connected as the timestamp. Those timestamps do not line up. There is no predictable delta between them either, as network latency is a factor, so attempting to adjust those timestamps and hashing them wouldn’t work.
What I needed was a fuzzy-matching algorithm. But while fuzzy matching is great for words in strings, the usual techniques weren’t much help here, because what I needed to fuzzy-match was timestamps. I could hash the from, to, and duration, use those hashes to narrow down the possible matches, then examine the timestamps and pick the closest one. Sounded reasonable, so I did another proof of concept. Even with the recursive fuzzy-matching-like logic, it was still running reasonably well, but my memory usage was starting to creep up. Why do we have so many calls in a day?
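A simplified sketch of that idea follows: exact-match on three fields, then choose the candidate with the nearest timestamp. The names `fuzzy_match` and `max_skew`, and the 30-second tolerance, are assumptions made for the example rather than details of the real system.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    src: str
    dst: str
    duration: int   # seconds
    timestamp: int  # Unix seconds


def fuzzy_match(ours, theirs, max_skew=30):
    """Exact key on (src, dst, duration); nearest timestamp within max_skew."""
    buckets = defaultdict(list)
    for cdr in theirs:
        buckets[(cdr.src, cdr.dst, cdr.duration)].append(cdr)
    matched, unmatched = [], []
    for call in ours:
        candidates = buckets.get((call.src, call.dst, call.duration), [])
        best = min(candidates,
                   key=lambda c: abs(c.timestamp - call.timestamp),
                   default=None)
        if best is not None and abs(best.timestamp - call.timestamp) <= max_skew:
            candidates.remove(best)  # each CDR row may be used only once
            matched.append((call, best))
        else:
            unmatched.append(call)
    return matched, unmatched


a, b, c = "+18885551212", "+14165550100", "+14165550101"
ours = [Call(a, b, 60, 1000), Call(a, b, 60, 5000), Call(a, c, 10, 100)]
theirs = [Call(a, b, 60, 5004), Call(a, b, 60, 1003), Call(a, c, 10, 500)]
matched, unmatched = fuzzy_match(ours, theirs)
print(len(matched), len(unmatched))
```

Keeping only three fields in the bucket key is what lets the timestamp comparison stay cheap: each call is compared against a handful of candidates rather than the whole day's records.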
We have many millions of calls per day, and as it happens, some of those are redials to the same number in a similar time frame. While the proof of concept was good at identifying the correct calls to reconcile when they were typical calls with conversations, the unanswered, multiple-redial scenarios were matching randomly. I needed a way to tell the order in which they occurred when doing the fuzzy match. And just to make it a little more difficult, call durations are rounded up to the nearest billable increment, so any call shorter than six seconds would have a duration of six seconds in one system, while the other system recorded the actual duration. So I had even less certainty to work with.
I ended up with a multi-pass approach. First, match all the long calls using the fuzzy match on timestamp and hash match on the rest. Then take the remaining short calls and do a modified fuzzy match that takes into account the order in which the calls show up in the data sources. That seemed to work pretty well.
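The two passes could be sketched along these lines. This is a simplified illustration under stated assumptions, not the optimized algorithm itself: all names are hypothetical, the 30-second `max_skew` is invented, and rounding is assumed to affect only calls shorter than six seconds.

```python
from collections import defaultdict
from typing import NamedTuple

MIN_BILLABLE = 6  # seconds; shorter calls show up as this in the CDR


class Call(NamedTuple):
    src: str
    dst: str
    duration: int   # seconds
    timestamp: int  # Unix seconds


def match_long(ours, theirs, max_skew):
    """Pass 1: exact key on (src, dst, duration), nearest timestamp wins."""
    buckets = defaultdict(list)
    for cdr in theirs:
        buckets[(cdr.src, cdr.dst, cdr.duration)].append(cdr)
    matched, left_ours = [], []
    for call in ours:
        cands = buckets.get((call.src, call.dst, call.duration), [])
        best = min(cands, key=lambda c: abs(c.timestamp - call.timestamp),
                   default=None)
        if best is not None and abs(best.timestamp - call.timestamp) <= max_skew:
            cands.remove(best)
            matched.append((call, best))
        else:
            left_ours.append(call)
    left_theirs = [c for bucket in buckets.values() for c in bucket]
    return matched, left_ours, left_theirs


def match_short(ours, theirs, max_skew):
    """Pass 2: pair redials in chronological order per (src, dst)."""
    ours_by_pair, theirs_by_pair = defaultdict(list), defaultdict(list)
    for c in ours:
        ours_by_pair[(c.src, c.dst)].append(c)
    for c in theirs:
        theirs_by_pair[(c.src, c.dst)].append(c)
    matched, unmatched = [], []
    for key, group in ours_by_pair.items():
        a = sorted(group, key=lambda c: c.timestamp)
        b = sorted(theirs_by_pair.get(key, []), key=lambda c: c.timestamp)
        i = j = 0
        while i < len(a) and j < len(b):
            delta = b[j].timestamp - a[i].timestamp
            if abs(delta) <= max_skew:
                matched.append((a[i], b[j]))
                i, j = i + 1, j + 1
            elif delta > 0:
                unmatched.append(a[i])  # no CDR row close enough to this call
                i += 1
            else:
                j += 1  # CDR row is too early for any remaining call
        unmatched.extend(a[i:])
    return matched, unmatched


def reconcile(ours, theirs, max_skew=30):
    long_ours = [c for c in ours if c.duration >= MIN_BILLABLE]
    short_ours = [c for c in ours if c.duration < MIN_BILLABLE]
    matched, unmatched, left_theirs = match_long(long_ours, theirs, max_skew)
    short_theirs = [c for c in left_theirs if c.duration == MIN_BILLABLE]
    m2, u2 = match_short(short_ours, short_theirs, max_skew)
    return matched + m2, unmatched + u2


a, b = "+18885551212", "+14165550100"
ours = [Call(a, b, 120, 1000), Call(a, b, 0, 2000),
        Call(a, b, 3, 2010), Call(a, b, 0, 2020)]
theirs = [Call(a, b, 6, 2022), Call(a, b, 120, 1004),
          Call(a, b, 6, 2002), Call(a, b, 6, 2012)]
matched, unmatched = reconcile(ours, theirs)
print(len(matched), len(unmatched))
```

Sorting both sides by timestamp and walking them together means three quick redials are paired first with first, second with second, and so on, even though every one of them carries the same six-second billed duration.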
One bill down, about half a dozen to go. So much for the theory that all I had to do was import the data into Excel, or that I wouldn’t be challenged technically. I ended up creating a specialized (and heavily optimized) matching algorithm to take our data and our termination provider’s data and reconcile them, without running out of memory or taking more than twenty-four hours to process one day’s worth of data.
Having been in the role for six months now, I have seen my work feed directly and indirectly into multiple departments: it has helped catch billing mistakes from our vendors in excess of six figures, identify gaps in our inventory tracking, feed fraud detection, and improve response times for the abuse department. Not bad considering I thought all I would deliver was a way to automatically reconcile some bills.
All this is because at TextNow, even if you’re on a team of one (at least on paper), you’re always interacting with other teams in all sorts of areas of the business. A casual conversation in the lunchroom with a member of the abuse department about their struggles with a backlogged system led to us realizing that, with just a few tweaks to my existing data import logic, we could improve the speed at which we service abuse requests by several orders of magnitude while maintaining the integrity of the chain of evidence. Two completely separate systems with very different reasons for existing could be combined to gain some serious efficiencies. A few more conversations with members of our internal tools team, and we now have a new tool deployed that has helped alleviate the abuse department’s backlog.
That is the essence of what working at TextNow is like, regardless of what team you end up on or what role you take. The opportunities to collaborate and solve issues in new and unique ways are ever present. We’re all cogs in a big machine, working together to help deliver one of the best services out there, if you’ll excuse the pun.
But remember, I don’t work on a ‘sexy’ project. I’m just the reconciler.