Founded in 2011, Scalyr was built by a team of ex-Google engineers with years of DevOps experience. We know what it's like to be on call, get an alert, and not have enough information to track down the problem. We know what it's like to juggle half a dozen balky monitoring systems in an effort to get a complete picture. We know what it's like to scramble to respond to a crisis, casting about to find out what's wrong, and wait agonizing seconds for each new graph to load. We know what it's like to know that the information you need is in the logs somewhere, but not be able to get to it without taking time you don't have to write code you don't want to maintain.
We knew there was a better way to handle server monitoring and production analytics. It's the year 2013, and amazing things are all around us. It had to be possible to get quick, easy, flexible access to monitoring data using a simple interface. But no one seemed to be building it. So... we did.
Scalyr Logs is built around a few core principles:
- Unification. All aspects of production data, including graphs, logs, dashboards, and alerts, should be managed in a single system. One thing to install, one system to learn, one place to go for production data.
- Flexibility. Work with any type of data, from raw timeseries or logs to complex structured data. All analysis and display tools apply to all data.
- Practicality. Keep things simple and straightforward, like the best engineering tools. Avoid glossy setup wizards and buzzword-compliant "enterprise" features, and focus on the day-to-day needs of engineers.
- Depth. "Simplicity" shouldn't be an excuse for limited functionality. Scalyr Logs is built around a full-fledged query language, supporting regular expressions, numeric comparisons, arithmetic operations, boolean combinations, and more. This query language is used everywhere, from graphs to log search to dashboards to alert rules.
- Ease of use. The best engineering tools manage to combine power with simplicity and ease of use. This is the standard we use to guide product decisions. For instance, we work hard to make the power of the query and visualization tools available through simple exploration, with drill-down links to refine queries and pivot to new visualizations. We count the number of clicks needed to get up and running, we've added context-sensitive usage tips so that you can dive in without having to stop and read a manual, and we track our page load times to the millisecond in a constant effort to push the envelope on responsiveness.
It all starts with the Scalyr Agent, a lightweight daemon that you install on your servers. The agent is responsible for collecting data and forwarding it to our servers. It records basic system metrics, monitors log files, and accepts data from Graphite- and OpenTSDB-compatible tools. All of this is bundled into a single data stream, buffered, and streamed over SSL to the Scalyr Logs backend. (For platforms like Heroku where installing the agent is not possible, we also support syslog.)
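The buffer-and-forward step can be pictured with a minimal Python sketch. It is an illustration only, not the Scalyr Agent's code: `BatchBuffer`, `max_batch`, and the `send` callback are hypothetical names, and a real agent would also need retries, time-based flushing, and the encrypted transport.

```python
from typing import Callable, List


class BatchBuffer:
    """Collects records from several sources and forwards them in batches.

    Hypothetical illustration; not the actual Scalyr Agent implementation.
    """

    def __init__(self, send: Callable[[List[dict]], None], max_batch: int = 3):
        self._send = send            # assumed transport callback
        self._max_batch = max_batch  # assumed batch-size threshold
        self._pending: List[dict] = []

    def add(self, source: str, payload: str) -> None:
        # Tag each record with its source so metrics and log lines
        # can share a single stream.
        self._pending.append({"source": source, "payload": payload})
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        # Forward whatever is buffered, then start a fresh batch.
        if self._pending:
            self._send(self._pending)
            self._pending = []


sent_batches: List[List[dict]] = []
buffer = BatchBuffer(send=sent_batches.append, max_batch=3)
buffer.add("metrics", "cpu.load 0.42")
buffer.add("log:/var/log/app.log", "GET /home 200")
buffer.add("graphite", "requests.count 17")  # third record triggers a flush
buffer.add("log:/var/log/app.log", "GET /cart 500")
buffer.flush()                                # forward the remainder
```

Here the in-memory list `sent_batches` stands in for the network connection, which makes it easy to see how records from different sources end up in the same batches.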
At the backend, parsing rules are applied to extract structured information from log messages. At the end of this process, each data item is represented as an "event" — an arbitrary list of named fields, with a timestamp. System metric data points, log messages, and custom data are all represented as events.
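To make the idea of an event concrete, here is a small Python sketch of one parsing rule applied to one log line. The log format, the field names, and the `parse_event` helper are assumptions chosen for illustration; they are not Scalyr's actual rule syntax or event schema.

```python
import re
from datetime import datetime, timezone

# Hypothetical parsing rule for a simple access-log line; the log format
# and field names are assumptions made for illustration.
ACCESS_LOG_RULE = re.compile(
    r"(?P<timestamp>\S+) (?P<method>[A-Z]+) (?P<url>\S+) "
    r"(?P<status>\d{3}) (?P<bytes>\d+)"
)


def parse_event(line: str) -> dict:
    """Turn one raw log line into an event: named fields plus a timestamp."""
    text = line.strip()
    match = ACCESS_LOG_RULE.fullmatch(text)
    if match is None:
        # Lines the rule does not match still become events,
        # with the raw text as the only field.
        return {"timestamp": None, "message": text}
    fields = match.groupdict()
    return {
        "timestamp": datetime.strptime(
            fields["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc),
        "method": fields["method"],
        "url": fields["url"],
        "status": int(fields["status"]),  # stored as a number for numeric comparisons
        "bytes": int(fields["bytes"]),
    }


event = parse_event("2013-04-02T10:15:30Z GET /home 200 5120")
```

Because the fields are extracted once, a later query such as "status is 500 or higher" can work on the `status` field directly instead of re-reading the message text.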
Events are then stored in a custom-built event database, where they can be accessed through a powerful query engine. The query engine is the heart of Scalyr Logs. Graphs, log search, facet analysis, dashboards, and alert evaluation are all built around the query engine. This unified design allows us to focus our energy on scaling and optimizing the query engine, delivering the flexibility, depth, and speed that are central to the product vision. Some techniques that we use to accelerate queries:
- Aggressive parallelism. When you load a page, our entire cluster drops whatever background work it was doing and devotes its full attention to your query. A second later, it will probably do the same for someone else, but the bursty nature of monitoring workloads is such that you get the benefit of far more computing power than you'd have in a self-hosted solution.
- Column-oriented storage. For each field (e.g. URL, status, and user-agent), the values for all events are stored together, allowing the query engine to rapidly scan through just those columns that are relevant to a particular query.
- Home-grown database implementation. We've built our own columnar data store, optimized for monitoring workloads.
- Solid-state (flash) storage for recent logs.
- Precomputation. Dashboard graphs are precomputed on a continuous basis. Whenever you load a dashboard, the data is already there without having to be queried.
- Pre-parsing. Log messages are parsed on arrival, rather than on the fly during query execution.
- Intelligent subsampling. With enormous result sets, we build results based on a random subsample — a few hundred thousand data points is plenty to generate a graph.
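Two of the techniques above, column-oriented storage and subsampling, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Scalyr database: the helper names (`to_columns`, `count_where`, `subsample`) and the sample events are made up for illustration.

```python
import random
from typing import Dict, List


def to_columns(events: List[dict], fields: List[str]) -> Dict[str, list]:
    """Store each field's values together, one list (column) per field."""
    return {name: [event.get(name) for event in events] for name in fields}


def count_where(columns: Dict[str, list], field: str, value) -> int:
    """Answer a query by scanning only the one column it needs."""
    return sum(1 for item in columns[field] if item == value)


def subsample(values: list, limit: int, seed: int = 0) -> list:
    """Return at most `limit` values chosen uniformly at random."""
    if len(values) <= limit:
        return list(values)
    return random.Random(seed).sample(values, limit)


events = [
    {"url": "/home", "status": 200, "latency_ms": 12},
    {"url": "/cart", "status": 500, "latency_ms": 340},
    {"url": "/home", "status": 200, "latency_ms": 15},
    {"url": "/search", "status": 200, "latency_ms": 48},
]
columns = to_columns(events, ["url", "status", "latency_ms"])
errors = count_where(columns, "status", 500)        # touches only "status"
points = subsample(columns["latency_ms"], limit=2)  # a bounded set of points to plot
```

The query for errors never reads the `url` or `latency_ms` columns, and the graph is drawn from a bounded number of points no matter how large the underlying column grows.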
Jobs at Scalyr
Scalyr is looking for a few no-nonsense engineers. We hire for talent, track record, and passion, not the ability to tweak your resume to match a job description. If you live and breathe code, enjoy challenges, need to feel proud of your work, and want to get in on the ground floor of something big — contact firstname.lastname@example.org.
What we're up to
At Scalyr, we mean to change the face of cloud computing, one service at a time. To that end, we're building a complete new software stack, breaking new ground in data management, scaling, reliability, manageability, performance, and user interface.
What to tell us
Resumes are fine, but what we're especially looking for is something you've done in the past that you're proud of. That can be a pointer to an open-source project, a technical blog, anything. Often your best work will be something you can't share directly; in that case, just tell us about it. Regardless, please be clear about what role you played in the project, and what made your part interesting and pride-worthy. To put it another way: take the best paragraph in your resume, expand on that, and don't worry so much about the rest.
Also please say something about your career goals and the type of role you're interested in.
Scalyr is located on the San Francisco Peninsula. You'll be working with a small team of world-class, passionate engineers. Our founder, Steve Newman, was the co-founder and lead engineer on Writely (which became Google Docs) and led the original development of the consistent cross-data-center replication in Google's Megastore (http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf). Hours and work-from-home are flexible up to a point; we don't believe in rigid rules, but do believe that face-to-face interaction is critical for a team.