Federated Search: Incorporating the Right Data at the Right Time

In 2023, approximately 120 zettabytes* of data are expected to be generated worldwide. A zettabyte is 10^21 bytes: a 1 followed by 21 zeros, or a trillion gigabytes. That is a lot of data, and the volume grows every year. At these sizes, it is safe to say that not every data set can be ingested into a single analytical or decision intelligence platform. So how can an organization manage access to all this content and still derive effective analytics? The solution is federated search.

A federated search is a framework for searching multiple external sources at the same time. It uses APIs to interact with third-party systems (open-source data, crypto ledgers, published content), subscription services (public records, court documents, business profiles), data sets (real property, arrest records, visitor logs), and even third-party libraries (image classification, address standardization, natural-language processing).
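To make "searched at the same time" concrete, here is a minimal Python sketch of the fan-out pattern, assuming each external source is wrapped in an async connector function. The connectors below (`court_records`, `business_profiles`, `slow_service`) are hypothetical stubs, not real APIs; a real deployment would call each provider's own client. Each source gets its own timeout so that one slow or failing service does not hold up the others.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical connector type: takes a query string, returns result records.
Connector = Callable[[str], Awaitable[List[dict]]]


async def federated_search(
    query: str,
    connectors: Dict[str, Connector],
    timeout_s: float = 5.0,
) -> Dict[str, dict]:
    """Query every connector concurrently and collect a per-source outcome."""

    async def run_one(connector: Connector) -> dict:
        try:
            hits = await asyncio.wait_for(connector(query), timeout=timeout_s)
            return {"status": "ok", "results": hits}
        except asyncio.TimeoutError:
            # wait_for cancels the slow call; report it instead of blocking.
            return {"status": "timeout", "results": []}
        except Exception as exc:
            # Surface the failure for this source without failing the whole search.
            return {"status": f"error: {exc}", "results": []}

    names = list(connectors)
    outcomes = await asyncio.gather(*(run_one(connectors[n]) for n in names))
    return dict(zip(names, outcomes))


# --- Hypothetical stub connectors standing in for third-party services ---
async def court_records(q: str) -> List[dict]:
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"source": "court_records", "match": q}]


async def business_profiles(q: str) -> List[dict]:
    await asyncio.sleep(0.01)
    return [{"source": "business_profiles", "match": q}]


async def slow_service(q: str) -> List[dict]:
    await asyncio.sleep(10)  # deliberately exceeds the timeout below
    return []


if __name__ == "__main__":
    results = asyncio.run(
        federated_search(
            "Acme Holdings",
            {"court": court_records, "business": business_profiles, "slow": slow_service},
            timeout_s=0.5,
        )
    )
    for source, outcome in results.items():
        print(source, outcome["status"], len(outcome["results"]))
```

Returning a status per source, rather than raising, lets an interface display partial results while showing which sources timed out or failed.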

Many systems today use federated search to improve usability. For example, if you search for “pizza” on your mobile phone, it returns any installed apps related to pizza, plus matching contacts, text messages, pictures, and notes stored on the phone; these are analogous to your local databases. You also see suggested web pages and options for on-demand searches in other services such as your calendar, the app store, maps, and email; these are the federated searches.

The Role of Federated Search in Agency Investigations 

An investigator usually runs federated searches on demand, once they have narrowed their focus to a specific type of entity or value. In a money laundering investigation, for example, a suspicious person may be linked to multiple accounts through several online financial transactions. So that the investigator can check other sources quickly and easily, the analytical platform should provide a clickable interface showing which federated searches are available for the data at hand. The investigator can then run a federated search on any Bitcoin wallet address in the financial data to see recent blockchain transactions. For email addresses in the suspect’s profile, federated searches sent to designated verification services can reveal connections to physical addresses, phone numbers, and additional target entities. For the IP addresses used to conduct the transactions, a federated search returns the host name, city/state, and ISP name.
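One way a platform might decide which federated searches to offer is to scan a record for recognizable entity types and map each type to the connectors configured for it. The sketch below assumes simplified regular expressions and hypothetical connector names (`blockchain_transactions`, `email_verification`, `ip_whois_geolocation`); it only builds the list of suggestions an interface could render as clickable actions and does not call any external service.

```python
import re
from typing import Dict, List

# Hypothetical mapping from entity type to the federated searches an analyst
# could launch for it; connector names are illustrative only.
AVAILABLE_SEARCHES: Dict[str, List[str]] = {
    "bitcoin_address": ["blockchain_transactions"],
    "email": ["email_verification"],
    "ipv4": ["ip_whois_geolocation"],
}

# Simplified patterns for illustration; a production system would validate
# more strictly (e.g. Base58Check / Bech32 checksums for Bitcoin addresses).
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "bitcoin_address": re.compile(
        r"\b(?:bc1[a-z0-9]{25,62}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b"
    ),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}


def suggest_federated_searches(text: str) -> List[dict]:
    """Find known entity types in a record and list the federated searches
    that could be offered for each distinct value."""
    suggestions = []
    for entity_type, pattern in PATTERNS.items():
        # dict.fromkeys de-duplicates values while preserving their order.
        for value in dict.fromkeys(pattern.findall(text)):
            for search in AVAILABLE_SEARCHES[entity_type]:
                suggestions.append(
                    {"entity_type": entity_type, "value": value, "search": search}
                )
    return suggestions


if __name__ == "__main__":
    record = (
        "Wallet 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa paid; "
        "contact j.doe@example.com via 203.0.113.7."
    )
    for s in suggest_federated_searches(record):
        print(s)
```

Keeping the type-to-connector mapping as data means a new verification service can be offered by adding one entry rather than changing the detection logic.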

Federated searches can also support ETL/ELT (Extract, Transform, Load / Extract, Load, Transform) functions that enrich selected entities with specific values. For example, a suspect’s home address can be geocoded so that it can be displayed on a map and its distance to a particular bank location calculated. To do this, the investigator selects the address (or set of addresses) and invokes a federated search to a remote service that standardizes the address and encodes it as precise latitude/longitude coordinates. This avoids the time, expense, and processing needed to geocode the entire data set, and spends resources only where and when they are needed.
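The selective enrichment described above can be sketched as follows. The `geocode` argument stands in for whatever remote address-standardization and geocoding service the platform uses (an assumption; no specific provider is implied), and distance is computed with the haversine great-circle formula. Only the addresses the investigator selects are sent to the service, and results are cached so that repeated requests for the same address do not trigger another remote call.

```python
import math
from typing import Callable, Dict, List, Tuple

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def enrich_selected(
    selected_addresses: List[str],
    geocode: Callable[[str], LatLon],  # hypothetical remote geocoding call
    bank: LatLon,
    cache: Dict[str, LatLon],
) -> Dict[str, dict]:
    """Geocode only the selected addresses, reusing cached coordinates,
    and compute each address's distance to a bank location."""
    enriched = {}
    for addr in selected_addresses:
        if addr not in cache:  # call the remote service only when needed
            cache[addr] = geocode(addr)
        coords = cache[addr]
        enriched[addr] = {
            "lat_lon": coords,
            "km_to_bank": round(haversine_km(coords, bank), 1),
        }
    return enriched


if __name__ == "__main__":
    # Placeholder geocoder returning fixed coordinates; not real geocoding output.
    def placeholder_geocode(address: str) -> LatLon:
        return (0.0, 0.0)

    print(enrich_selected(["1 Example St"], placeholder_geocode, bank=(0.0, 1.0), cache={}))
```

With the placeholder coordinates, the two points are one degree of longitude apart on the equator, which the haversine formula puts at roughly 111 km.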

Additional Benefits of Federated Search 

When implemented properly within an analytical or decision intelligence platform, and connected to the right APIs, federated search can deliver more accurate results and a more complete investigative profile. Additional benefits include:

  • Gathering information from multiple sources at the same time
  • Scheduling searches for regular updates
  • Increasing analytical context by staying within the same platform
  • Refreshing results on demand with minimal processing impact
  • Extending easily to new sources and systems
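The scheduling, on-demand refresh, and extensibility benefits listed above can be illustrated with a small registry-plus-cache sketch. It assumes synchronous connector functions and a simple time-to-live (TTL) policy; `FederatedSearchHub` and its methods are hypothetical names, not part of any product API. Registering a new source is a single call, `search` re-queries a source only when its cached result is missing, expired, or explicitly refreshed, and `refresh_stale` is the kind of method a scheduler (for example, a cron job) could call periodically.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical connector type: takes a query, returns result records.
Connector = Callable[[str], List[dict]]


class FederatedSearchHub:
    """Pluggable connectors plus a per-(source, query) result cache,
    so refreshes only re-query sources whose results are stale."""

    def __init__(self, ttl_s: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self._connectors: Dict[str, Connector] = {}
        # (source name, query) -> (fetch time, results)
        self._cache: Dict[Tuple[str, str], Tuple[float, List[dict]]] = {}
        self._ttl = ttl_s
        self._clock = clock  # injectable so behaviour can be tested without waiting

    def register(self, name: str, connector: Connector) -> None:
        """Add a new source; existing sources are untouched."""
        self._connectors[name] = connector

    def search(self, query: str, force_refresh: bool = False) -> Dict[str, List[dict]]:
        """Return results from every source, re-querying only where needed."""
        now = self._clock()
        results = {}
        for name, connector in self._connectors.items():
            key = (name, query)
            cached = self._cache.get(key)
            if force_refresh or cached is None or now - cached[0] > self._ttl:
                self._cache[key] = (now, connector(query))
            results[name] = self._cache[key][1]
        return results

    def refresh_stale(self) -> int:
        """Re-run every cached query whose results have expired.
        Returns the number of source queries that were refreshed."""
        now = self._clock()
        refreshed = 0
        for (name, query), (fetched_at, _) in list(self._cache.items()):
            if now - fetched_at > self._ttl and name in self._connectors:
                self._cache[(name, query)] = (now, self._connectors[name](query))
                refreshed += 1
        return refreshed
```

Passing the clock in as a parameter keeps the TTL logic deterministic under test; in production the default monotonic clock avoids surprises from wall-clock adjustments.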

Federated searches and ETL/ELT methods are common ways to add value to the underlying data. They often integrate with other transformation systems, libraries, and services to deliver incremental value. Our next blog in this series, Adding Value to Data: The Power of Data Transformation for Improved Analytical Outcomes, will focus on transformation types, including algorithms, heuristics, and machine-learning models. Having a full understanding of how to exploit data is key to deploying successful analytical platforms.  

Visit NEXYTE.AI to learn more about NEXYTE, the data fusion and machine learning platform revolutionizing decision intelligence. 

How Much Is a Zetta?

* Zetta = 10^21 (1,000,000,000,000,000,000,000)

  • All the grains of sand on all the beaches in the world 
  • A zettagallon of water would fill all the world’s oceans about three times
  • Earth weighs almost 6 zettatonnes
  • Light would take about 170 million years to travel a zettamile
  • A spaceship covering a zettamile could travel to the sun and back over 5 trillion times


Inbar Goldstein, Product Manager

Inbar is a Product Manager for NEXYTE, Cognyte’s decision intelligence platform. She has over two decades of experience in the Defense and Intelligence sectors. Inbar holds a BA in Computer Science from the Academic College of Tel-Aviv and is currently pursuing a Master’s degree in Cyber, Politics, and Government at Tel-Aviv University.