Business Analysis Consulting
Business analysis by Kinzz

Data Flow Diagram (DFD)


A Data Flow Diagram (DFD) tracks processes and their data paths within the business or system boundary under investigation. A DFD defines each domain boundary and illustrates the logical movement and transformation of data within the defined boundary. The diagram shows 'what' input data enters the domain, 'what' logical processes the domain applies to that data, and 'what' output data leaves the domain. Essentially, a DFD is one of the oldest tools for process modelling.


A Data Flow Diagram is useful for establishing the boundary of the business or system domain (the sphere of analysis activity) under investigation. It identifies any external entities along with their data interfaces that interact with the processes of interest. A DFD can be a useful tool (particularly when used as a top DFD - refer to Context diagram) for helping secure stakeholder agreement (sign-off) on the project scope. It is also a useful tool for breaking down a process into its sub-processes for closer analysis.


A Data Flow Diagram can be modelled early in the requirements elicitation process of the Analysis phase within the System Development Life Cycle (SDLC) to define the project scope. A DFD can also be created throughout the SDLC to investigate an aspect of the system. If necessary, each process under study within a DFD can be broken down into its sub-processes on a new DFD to show more details. A sub-process in turn can be broken down further to reveal its sub-processes on a new DFD, and so on until sufficient analysis is reached. The activity of drilling down the DFD levels is called 'functional decomposition' with the resulting new DFD referred to as a 'levelled DFD'. For example, the top level DFD (also known as a Context diagram) is a level 0 DFD, level 1 DFD refers to the initial decomposition, level 2 DFD to a second level decomposition, and so on.


The primary audience for a DFD is stakeholders such as project sponsors, managers and subject matter experts, who provide the information for a DFD and should also be the people who approve it. Project managers and requirements teams are also involved so that they can plan the project work.


A DFD can be assembled from the following four components:

Process: A process is a logical activity that transforms or manipulates incoming data within the domain under investigation. A process can be regarded as a 'black box'; it receives input, processes it, and produces output. A rounded rectangle (or circle) represents a process under study. Each process is labelled inside its rectangle to describe its function or purpose. It is common to use a verb-noun phrase for naming a process, e.g. ‘Check stock’ and ‘Reserve product’. If several Data Flow Diagrams reference each other (e.g. as in a decomposed levelled DFD structure) each process should be tagged with a numbering scheme or identifier to show the hierarchical relationships between them. For example, 'process 1.0' in level 0 DFD can decompose into 'process 1.1', 'process 1.2', 'process 1.3', etc, in level 1 DFD. Similarly, 'process 1.1' in level 1 DFD can decompose further into 'process 1.1.1', 'process 1.1.2' etc, in level 2 DFD, and so on. It is usual to place the level identifier above the process name with a horizontal line separating them.

External entity: An external entity sits outside the domain of interest and supplies data to or receives data from the domain. An external entity is referred to as an external source or sink (destination) for data flowing in and out of the domain. A rectangle defines an external entity and is labelled with a noun phrase inside its rectangle to describe an organisation, process, machine or person (i.e. a thing) that is outside the domain under analysis. Examples of naming an external entity are ‘Payment company’, ‘Store locator’, 'Mainframe server' and ‘Customer’. Note an external entity in a DFD is not permitted to transform data; only a process can.

Data flow: A data flow represents the path of data moving through the domain under analysis. A data flow shows the movement of data between a process and an external entity, a process and another process, or a process and a data store (data repository). An arrow is the symbol used to connect a process with other DFD components. Each arrow should be labelled to describe the data being passed, e.g. ‘Customer details’, ‘Rejected order’ and ‘Stock level lookup’. A data flow can move the same type of data in both directions, in which case the arrow should have an arrowhead at both ends. Data flows are also useful for identifying interfaces that will need closer data analysis (e.g. ER data modelling). Note that a data flow is considered to be within the domain of the process under study whereas an external entity is not.

Data store: A data store represents a logical data repository accessible within the domain under study. A data store can be a place where data is created, read, changed and stored temporarily or permanently by a process. A thin rectangle with the right side open (or two horizontal parallel lines) shows a data store and is labelled with a noun phrase inside its rectangle to describe the data stored, e.g. 'Order records' and 'Online catalogue'. Physically, a data store can represent a file or a database system. Note a data store is not permitted to process data.
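The hierarchical numbering described under 'Process' can be sketched in a few lines of Python. This is only an illustration of the convention used in this article ('1.0' on level 0, '1.1' on level 1, '1.1.1' on level 2); the function names dfd_level and parent_id are hypothetical and not part of any DFD standard or tool.

```python
from typing import Optional


def dfd_level(process_id: str) -> int:
    """Return the DFD level implied by a process identifier.

    Assumes the article's convention: '1.0' is a level 0 process,
    '1.1' a level 1 process, '1.1.1' a level 2 process, and so on.
    """
    parts = process_id.split(".")
    if len(parts) == 2 and parts[1] == "0":
        return 0
    return len(parts) - 1


def parent_id(process_id: str) -> Optional[str]:
    """Return the parent process identifier, or None for a level 0 process."""
    level = dfd_level(process_id)
    if level == 0:
        return None
    parts = process_id.split(".")
    if level == 1:
        # Children of '1.0' are numbered '1.1', '1.2', ...
        return parts[0] + ".0"
    return ".".join(parts[:-1])


if __name__ == "__main__":
    for pid in ["1.0", "1.2", "1.1.2"]:
        print(pid, "-> level", dfd_level(pid), ", parent", parent_id(pid))
```

A helper like this can be used to check that every process on a child DFD traces back to a process on its parent DFD.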


There are two main styles of diagrammatic notation for Data Flow Diagrams: the Gane & Sarson notation (e.g. a rounded square for a process and a thin rectangle open on the right side for a data store) and the Yourdon notation (e.g. a circle for a process and two parallel horizontal lines for a data store).

While there are guiding principles and rules for using Data Flow Diagrams, in practice they are not always followed. Some DFD practitioners add new symbols or adapt the rules to suit their needs. This can be useful, but the important point is that whatever principles or rules are adopted, they should be applied consistently throughout the project. The following are some of the principles and rules for using Data Flow Diagrams:


Within a DFD, the external environment (i.e. external entities) sends data into the domain under analysis, where it is transformed as it moves from one process to another inside the domain. The processed data finally returns to the external environment as output data.

A DFD is designed to be broken down or decomposed into a hierarchical tree structure of Data Flow Diagrams (DFD levels) with each child DFD revealing more details than its parent.


A process and a data store must each have incoming and outgoing data flows.

A process and a data store can each have one or more input data flows and one or more output data flows.

A process can connect to any other component, including to another process.

An external entity must send its outgoing data flow only to a process.

An external entity must receive an input data flow only from a process.

A data store receives an incoming data flow only from a process.

A data store sends an output data flow only to a process.

Only a process is permitted to transform or change data.
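The connection rules above lend themselves to a mechanical check. The following is a minimal sketch, assuming a DFD is held as a dictionary of named components plus a list of (source, target, label) data flows; the Dfd class, check_rules function and the sample labels 'Order' and 'New order' are hypothetical names chosen for illustration. It relies on the observation that the external entity and data store rules together mean every data flow must start or end at a process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

PROCESS, EXTERNAL, STORE = "process", "external entity", "data store"


@dataclass
class Dfd:
    components: Dict[str, str] = field(default_factory=dict)  # name -> kind
    flows: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, target, label)


def check_rules(dfd: Dfd) -> List[str]:
    """Return a list of rule violations (an empty list means none were found)."""
    problems = []
    # External entities and data stores may only exchange data with a process,
    # so at least one end of every data flow has to be a process.
    for source, target, label in dfd.flows:
        kinds = (dfd.components[source], dfd.components[target])
        if PROCESS not in kinds:
            problems.append(
                f"flow '{label}' ({source} -> {target}) must start or end at a process"
            )
    # A process and a data store must each have incoming and outgoing flows.
    for name, kind in dfd.components.items():
        if kind == EXTERNAL:
            continue
        if not any(target == name for _, target, _ in dfd.flows):
            problems.append(f"{kind} '{name}' has no incoming data flow")
        if not any(source == name for source, _, _ in dfd.flows):
            problems.append(f"{kind} '{name}' has no outgoing data flow")
    return problems


def example_dfd() -> Dfd:
    """Build a small DFD that deliberately breaks two of the rules."""
    dfd = Dfd()
    dfd.components = {
        "Customer": EXTERNAL,
        "Check stock": PROCESS,
        "Order records": STORE,
    }
    dfd.flows = [
        ("Customer", "Check stock", "Order"),
        ("Check stock", "Customer", "Rejected order"),
        ("Check stock", "Order records", "New order"),
        # Violation: an external entity writes directly to a data store.
        ("Customer", "Order records", "Customer details"),
    ]
    return dfd


if __name__ == "__main__":
    for problem in check_rules(example_dfd()):
        print(problem)
```

In the example, the check reports the direct flow from 'Customer' to 'Order records' and the fact that 'Order records' is never read by a process. A project that adapts the rules could adjust the checks, as long as they are then applied consistently.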


The following steps present guidelines for developing Data Flow Diagrams:

Define the process: Start a DFD by identifying each process under study and naming it appropriately. If the process needs to be broken down into sub-processes for further analysis, add a numbering scheme to each process to show the hierarchical relationships between the parent and child processes (see 'Process' under Component section). Also, label each DFD with a descriptive name for referencing purposes, appending its levelled DFD sequence identifier appropriately.

Identify external entities: Identify each external entity that interacts with or impacts the process of interest and label each one with a descriptive name.

Connect data flows: Connect each external entity to the process under investigation with an arrow that shows the direction of its data flow. Name each data flow clearly and place the name as close to its arrow as possible, so that it is not confused with the names of nearby data flows.

Identify data stores: Add any identified data store on the diagram, label it and connect its data flow to its associated process.

Repeat the steps for each levelled DFD: Repeat the above steps for each levelled DFD that is discovered or needed for further analysis. Finally, name each DFD consistently for referencing back to it throughout the project.
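The steps above can be followed in code as well as on paper. The sketch below walks through them for one diagram and emits a Graphviz DOT description that a drawing tool could render. It is an illustration under stated assumptions: the to_dot function, the process name '1.0 Process order' and the flow labels 'Payment request', 'New order' and 'Order history' are hypothetical, component names are assumed to contain no double quotes, and the node shapes are stand-ins rather than the Gane & Sarson or Yourdon symbols.

```python
from typing import Dict, List, Tuple

# Stand-in Graphviz node shapes (an assumption, not an official notation mapping).
SHAPES = {"process": "ellipse", "external": "box", "store": "cylinder"}


def to_dot(title: str,
           components: Dict[str, str],
           flows: List[Tuple[str, str, str]]) -> str:
    """Render components and labelled data flows as Graphviz DOT text."""
    lines = [f'digraph "{title}" {{', "  rankdir=LR;"]
    for name, kind in components.items():
        lines.append(f'  "{name}" [shape={SHAPES[kind]}];')
    for source, target, label in flows:
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Step 1: define and number the process under study.
    components = {"1.0 Process order": "process"}
    # Step 2: identify the external entities.
    components["Customer"] = "external"
    components["Payment company"] = "external"
    # Step 3: connect and name the data flows.
    flows = [
        ("Customer", "1.0 Process order", "Customer details"),
        ("1.0 Process order", "Customer", "Rejected order"),
        ("1.0 Process order", "Payment company", "Payment request"),
    ]
    # Step 4: add the data store and connect it to its process.
    components["Order records"] = "store"
    flows.append(("1.0 Process order", "Order records", "New order"))
    flows.append(("Order records", "1.0 Process order", "Order history"))
    # The diagram title carries the level identifier for later reference.
    print(to_dot("Level 0 - Order processing", components, flows))
```

Keeping the components and flows as plain data makes it straightforward to repeat the same steps for each levelled DFD and to name every diagram consistently.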


Keep in mind the following pointers when developing Data Flow Diagrams:

Decide which style of notation (Gane & Sarson or Yourdon) to use for the DFD components throughout the project.

Begin with the Context diagram to show the target domain's top process (think of it as the top 'black box') with its major external entities and data flows. This will illustrate the overall business or system under investigation, including its domain boundary from the outside environment. Sometimes it may be useful to leave out any data store details in a Context diagram in order to focus on the big picture; there is an assumption that the data stores are included within the top process.

If you wish to avoid data flow lines crossing over each other, try repositioning the components to see if this helps. If it does not, it is acceptable to duplicate components on the diagram for this purpose.

If possible, limit the number of decomposition DFD levels to three; it will be easier for stakeholders to follow. If further DFD levels are necessary, provide a separate diagram to show the hierarchical tree structure of all the levels involved.

It is common to attach a Context diagram in a high-level business requirements document to define the project scope. More detailed Data Flow Diagrams can be developed later on during the analysis phase and attached to the requirements specification or functional documentation as needed.

Engage project stakeholders when developing Data Flow Diagrams and secure their approval (sign-off) as soon as possible.
