Tag: machine learning

Finding Exoplanets with Splunk

Splunk is a software platform designed to search, analyze and visualize machine-generated data, making sense of what, to most of us, looks like chaos.

Ordinarily, the machine data used by Splunk is gathered from websites, applications, servers, network equipment, sensors, IoT (internet-of-things) devices, etc, but there’s no limit to the complexity of data Splunk can consume.

Splunk specializes in Big Data, so why not use it to search the biggest data of all and find exoplanets?

What is an exoplanet?

An exoplanet is a planet in orbit around another star.

The first confirmed exoplanet was discovered in 1995 orbiting the star 51 Pegasi, which makes this an exciting new, emerging field of astronomy. Since then, Earth-based and space-based telescopes such as Kepler have been used to detect thousands of planets around other stars.

At first, the only planets we found were super-hot Jupiters, enormous gas giants orbiting close to their host stars. As techniques have been refined, thousands of exoplanets have been discovered at all sizes and out to distances comparable with planets in our own solar system. We have even discovered exomoons!


How do you find an exoplanet?

Imagine standing on stage at a rock concert, peering toward the back of the auditorium, staring straight at one of the spotlights. Now, try to figure out when a mosquito flies past that blinding light. In essence, that’s what telescopes like NASA’s TESS (Transiting Exoplanet Survey Satellite) are doing.

The dip in starlight intensity can be just a fraction of a percent, but it’s enough to signal that a planet is transiting the star.

Transits have been observed for hundreds of years in one form or another, but only recently has this idea been applied outside our solar system.

Australia has a long history of human exploration, starting some 60,000 years ago. In 1769 after (the then) Lieutenant James Cook sailed to Tahiti to observe the transit of Venus across the face of the our closest star, the Sun, he was ordered to begin a new search for the Great Southern Land which we know as Australia. Cook’s observation of the transit of Venus used largely the same technique as NASA’s Hubble, Kepler and TESS space telescopes but on a much simpler scale.

Our ability to monitor planetary transits has improved considerably since the 1700s.

NASA’s TESS orbiting telescope can cover an area 400 times as broad as NASA’s Kepler space telescope and is capable of monitoring a wider range of star types than Kepler, so we are on the verge of finding tens of thousands of exoplanets, some of which may contain life!

How can we use Splunk to find an exoplanet?

 Science thrives on open data.

All the raw information captured by both Earth-based and space-based telescopes like TESS are publicly available, but there’s a mountain of data to sift through and it’s difficult to spot needles in this celestial haystack, making this an ideal problem for Splunk to solve.

While playing with this over Christmas, I used the NASA Exoplanet Archive, and specifically the PhotoMetric data containing 642 light curves to look for exoplanets. I used wget in Linux to retrieve the raw data as text files, but it is possible to retrieve this data via web services.

MAST, the Mikulski Archive for Space Telescopes, has made available a web API that allows up to 500,000 records to be retrieved at a time using JSON format, making the data even more accessible to Splunk.

Some examples of API queries that can be run against the MAST are:

The raw data for a given observation appears as:

Information from the various telescopes does differ in format and structure, but it’s all stored in text files that can be interrogated by Splunk.

Values like the name of the star (in this case, Gliese 436) are identified in the header, while dates are stored either using HJD (Heliocentric Julian Dates) or BJD (Barycentric Julian Dates) centering on the Sun (with a difference of only 4 seconds between them).

Some observatories will use MJD which is the Modified Julian Date (being the Julian Date minus 2,400,000.5 which equates to November 17, 1858). Sounds complicated, but MJD is an attempt to simplify date calculations.

Think of HJD, BJD and MJD like UTC but for the entire solar system.

One of the challenges faced in gathering this data is that the column metadata is split over three lines, with the title, the data type and the measurement unit all appearing on separate lines.

The actual data captured by the telescope doesn’t start being displayed until line 138 (and this changes from file to file as various telescopes and observation sets have different amounts of associated metadata).

In this example, our columns are…

  • HJD - which is expressed as days, with the values beyond the decimal point being the fraction of that day when the observation occurred
  • Normalized Flux - which is the apparent brightness of the star
  • Normalized Flux Uncertainty - capturing any potential anomalies detected during the collection process that might cast doubt on the result (so long as this is insignificant it can be ignored).

Heliocentric Julian Dates (HJD) are measured from noon (instead of midnight) on 1 January 4713 BC and are represented by numbers into the millions, like 2,455,059.6261813 where the integer is the days elapsed since then, while the decimal fraction is the portion of the day. With a ratio of 0.00001 to 0.864 seconds, multiplying the fraction by 86400 will give us the seconds elapsed since noon on any given Julian Day. Confused? Well, your computer won’t be as it loves working in decimals and fractions, so although this system may seem counterintuitive, it makes date calculations simple math.

We can reverse engineer Epoch dates and regular dates from HJD/BJD, giving Splunk something to work with other than obscure heliocentric dates.

  • As Julian Dates start at noon rather than midnight, all our calculations are shifted by half a day to align with Epoch (Unix time)
  • The Julian date for the start of Epoch on CE 1970 January 1st 00:00:00.0 UT is 2440587.500000
  • Any-Julian-Date-minus-Epoch = 2455059.6261813 - 2440587.5 = 14472.12618
  • Epoch-Day = floor(Any-Julian-Date-minus-Epoch) * milliseconds-in-a-day = 14472 * 86400000 = 1250380800000
  • Epoch-Time = floor((Any-Julian-Date-minus-Epoch – floor(Any-Julian-Date-minus-Epoch)) * milliseconds-in-a-day = floor(0. 6261813 * 86400000) = 10902064
  • Observation-Epoch-Day-Time = Epoch-Day + Epoch-Time = 1250380800000 + 10902064 = 1250391702064

That might seem a little convoluted, but we now have a way of translating astronomical date/times into something Splunk can understand.

I added a bunch of date calculations like this to my props.conf file so dates would appear more naturally within Splunk.



LINE_BREAKER = ([\r\n]+)

EVAL-exo_observation_epoch = ((FLOOR(exo_HJD - 2440587.5) * 86400000) + FLOOR(((exo_HJD - 2440587.5) - FLOOR(exo_HJD - 2440587.5))  *  86400000))

EVAL-exo_observation_date = (strftime(((FLOOR(exo_HJD - 2440587.5) * 86400000) + FLOOR(((exo_HJD - 2440587.5) - FLOOR(exo_HJD - 2440587.5))  *  86400000)) / 1000,"%d/%m/%Y %H:%M:%S.%3N"))

EVAL-_time = strptime((strftime(((FLOOR(exo_HJD - 2440587.5) * 86400000) + FLOOR(((exo_HJD - 2440587.5) - FLOOR(exo_HJD - 2440587.5))  *  86400000)) / 1000,"%d/%m/%Y %H:%M:%S.%3N")),"%d/%m/%Y %H:%M:%S.%3N")

Once date conversions are in place, we can start crafting queries that map the relative flux of a star and allow us to observe exoplanets in another solar system.

Let’s look at a star with the unassuming ID 0300059.

sourcetype=exoplanets host="0300059"

| rex field=_raw "\s+(?P<exo_HJD>24\d+.\d+)\s+(?P<exo_flux>[-]?\d+.\d+)\s+(?P<exo_flux_uncertainty>[-]?\d+.\d+)" | timechart span=1s avg(exo_flux)

And there it is… an exoplanet blotting out a small fraction of starlight as it passes between us and its host star!

What about us?

While curating the Twitter account @RealScientists, Dr. Jessie Christiansen made the point that we only see planets transit stars like this if they’re orbiting on the same plane we’re observing. She also pointed out that “if you were an alien civilization looking at our solar system, and you were lined up just right, every 365 days you would see a (very tiny! 0.01%!!) dip in the brightness that would last for 10 hours or so. That would be Earth!”

There have even been direct observations of planets in orbit around stars, looking down from above (or up from beneath depending on your vantage point). With the next generation of space telescopes, like the James Webb, we’ll be able to see these in greater detail.


Image credit: NASA exoplanet exploration

Next steps

From here, the sky’s the limit—quite literally.

Now we’ve brought data into Splunk we can begin to examine trends over time.

Astronomy is BIG DATA in all caps. The Square Kilometer Array (SKA), which comes on line in 2020, will create more data each day than is produced on the Internet in a year!

Astronomical data is the biggest of the Big Data sets and that poses a problem for scientists. There’s so much data it is impossible to mine it all thoroughly. This has led to the emergence of citizen science, where regular people can contribute to scientific discoveries using tools like Splunk.

Most stars have multiple planets, so some complex math is required to distinguish between them, looking at the frequency, magnitude and duration of their transits to identify them individually. Over the course of billions of years, the motion of planets around a star fall into a pattern known as orbital resonance, which is something that can be predicted and tested by Splunk to distinguish between planets and even be used to predict undetected planets!

Then there’s the tantalizing possibility of exomoons orbiting exoplanets. These moons would appear as a slight dip in the transit line (similar to what’s seen above at the end of the exoplanet’s transit). But confirming the existence of an exomoon relies on repeated observations, clearly distinguished from the motion of other planets around that star. Once isolated, the transit lines should show a dip in different locations for different transits (revealing how the exomoon is swinging out to the side of the planet and increasing the amount of light being blocked at that point).

Given its strength with modelling data, predictive analytics and machine learning, Splunk is an ideal platform to support the search for exoplanets.

Find out more

If you’d like to learn more about how Splunk can help your organization reach for the stars, contact one of our account managers.

Our team on the case

Our Splunk stories

5 quick tips for customising your SAP data in Splunk

Understanding how your SAP system is performing can be a time-consuming process. With multiple environments, servers, APIs, interfaces and applications, there are numerous pieces to the puzzle, and stitching together the end-to-end story requires a lot of work.

That’s where SAP PowerConnect can assist. This SAP-certified tool simplifies the process by seamlessly collating your SAP metrics into a single command console: Splunk. PowerConnect compiles and stores data from each component across your SAP landscape and presents the information in familiar, easily accessible, and customisable Splunk dashboards. When coupled with Splunk’s ability to also gather machine data from non-SAP systems, this solution provides a powerful insight mechanism to understand your end user’s experience.

Given the magnitude of information PowerConnect can collate and analyse, you may think that setting it up would take days—if not weeks—of effort for your technical team. But one of PowerConnect’s key features is its incredibly fast time to value. Whatever the size of your environment, PowerConnect can be rapidly deployed and have searchable data available in less than ten minutes. Furthermore, it is highly customisable in providing the ability to collect data from custom SAP modules and display these in meaningful context sensitive dashboards, or integrate with Splunk IT Service Intelligence.

Here are some quick tips for customising your SAP data with PowerConnect.

1. Use the out-of-the-box dashboards

SAP software runs on top of the SAP NetWeaver platform, which forms the foundation for the majority of applications developed by SAP. The PowerConnect add-on is compatible with NetWeaver versions 7.0 through to 7.5 including S/4 HANA. It runs inside SAP and extracts machine data, security events, and logs from SAP—and ingests the information into Splunk in real time.

PowerConnect can access all data and objects exposed via the SAP NetWeaver layer, including:

  • API
  • Function Modules
  • IDoc
  • Report Writer Reports
  • Change Documents
  • CCMS
  • Tables
PowerConnect has access to all the data and objects exposed via the SAP NetWeaver layer

If your SAP system uses S/4 HANA, Fiori, ECC, BW components or all the above, you can gain insight into performance and configuration with PowerConnect.

To help organise and understand the collated data, PowerConnect comes with preconfigured SAP ABAP and Splunk dashboards out of the box based on best practices and customer experiences:

Sample PowerConnect for SAP: SAP ABAP Dashboard

Sample PowerConnect for SAP: Splunk Dashboard

PowerConnect centralises all your operational data in one place, giving you a single view in real time that will help you make decisions, determine strategy, understand the end-user experience, spot trends, and report on SLAs. You can view global trends in your SAP system or drill down to concentrate on metrics from a specific server or user.

2. Set your data retention specifications

PowerConnect also gives you the ability to configure how, and how long, you store and visualise data. Using Splunk, you can generate reports from across your entire SAP landscape or focus on specific segment(s). You may have long data retention requirements or be more interested in day-to-day performance—either way, PowerConnect can take care of the unique reporting needs for your business when it comes to SAP data.

You have complete control over your data, allowing you to manage data coverage, retention, and access:

  • All data sets that are collected by PowerConnect can be turned off, so you only need to ingest data that interests you.
  • Fine grain control over ingested data is possible by disabling individual fields inside any data sets.
  • You can customise the collection interval for each data set to help manage the flow of data across your network.
  • Data sets can be directed to different indexes, allowing you to manage different data retention and archiving rates for different use cases.

3. Make dashboards for the data that matters to you

You have the full power of Splunk at your disposal to customise the default dashboards, or you can use them as a base to create your own. This means you can use custom visualisations, or pick from those available on Splunkbase to interpret and display the data you’re interested in.

Even better, you’re not limited to SAP data in your searches and on your dashboards; Splunk data from outside your SAP system can be correlated and referenced with your SAP data. For example, you may want to view firewall or load balancer metrics against user volumes, or track sales data with BW usage.

4. Compare your SAP data with other organisational data

It is also possible to ingest PowerConnect data with another Splunk app, such as IT Service Intelligence (ITSI) to create, configure, and measure Service Levels and Key Performance Indicators. SAP system metrics can feed into a centralised view of the health and key performance indicators of your IT services. ITSI can then help proactively identify issues and prioritise resolution of those affecting business-critical services. This out-of-the-box monitoring will give you a comprehensive view of how your SAP system is working.

The PowerConnect framework is extensible and can be adapted to collect metrics from custom developed function modules. Sample custom extractor code templates are provided to allow your developers to quickly extend the framework to capture your custom-developed modules. These are the modules you develop to address specific business needs that SAP doesn’t address natively. As with all custom code, the level of testing will vary, and gaining access to key metrics within these modules can help both analyse usage as well as expose any issues within the code.

5. Learn more at our PowerConnect event

If you are interested in taking advantage of PowerConnect for Splunk to understand how your SAP system is performing, come along to our PowerConnect information night in Sydney or Melbourne in May. Register below to ensure your place.

PowerConnect Explainer Video

Find out more

To find out more about how JDS can help you implement PowerConnect, contact our team today on 1300 780 432, or email [email protected].

Our team on the case

Our Splunk stories