Privacy Policy The best selling drugs in Italy are the ones that could be advertised - Andrea Ciufo

The best selling drugs in Italy are the ones that could be advertised

The best selling drugs in Italy are the ones that could be advertised

On the Ministry of Health Website, there is a open data section where you can find the information in a *.csv format on the top-50best selling drugs in Italy.

I decided to investigate this dataset, grouping some information (Python Code Attached Below, I will also upload on Git Hub) and visualize this data better.

Often plots are better than data in a tabular format.

Drugs are divided into two categories:

  • SOP, “Senza Obbligo di Prescrizione” that means without a prescription
  • Over The Counter or Self-Care/Medication drugs that could be advertised

If you clean a bit the Data from the Ministry you could plot an interesting graph.

I realized a scatter plot where SOPs are the blue dots and OTCs are the red dots.

Strictly speaking, this drugs are not the best selling, but the most “distributed” this is the definition on the dataset metadata, but it is a good proxy sales.

 

If we group the two categories (SOP and OTC) and we sum “Boxes quantity provided to pharmacy and drugstore” we obtain that in the first semester 2016*:

30.3 Millions of OCT boxes where distributed against 12.6 Millions of SOP boxes

I am sorry if I did not format in a readable way Y-axis, it is expressed ad 10^7 power ( 3*10^7= 30Milions)

 

Based on this results I can say:

  • Equivalent drugs are little used, even though Italian pharmacists are obliged during the purchase to inform the customer about this option
  • There might be an excessive drugs consumption with a negative impact on health, this could be related on advertising and not a use from the General Practitioner, but I am not a General Practitioner so this consideration is beyond my knowledge
  • It’s fundamental for a rigorous analysis that Ministry of Helth publishes :
    • 2017 and 2018 time series on 50 best selling drugs
    • Data on General Drugs, in order to evaluate correlation and run a data comparison

For an intellectual integrity, I wrote to the Ministry Offices asking an updated dataset and to get data on equivalent drugs, if they will reply me I will update you

 

At the end of the article, you can find all the Python work for data selection, analysis, and visualization from the starting dataset.

After the aggregation through a groupby method, I added a column with a new column with the label drugs “Prescription Modality” This column was added through a union (Merge) with a DataFrame containing the drugs set and the “Prescription Modality” extracted from the original DataFrame.

Thanks for reading the article!

If you like it and you find it useful share it! If you think a fix is needed or any improvement, text me : )

Andrea

 

In [1]:
# Import pandas
import pandas as pd

import numpy as np
# Import plotting module
import matplotlib.pyplot as plt

#Import regex module
import re
file_name='C_17_dataset_15_download_itemDownload_0_upFile.csv'
csv=pd.read_csv(file_name,sep=';',encoding="ISO-8859-1",skiprows=1)
csv.info()

#We explore drugs name 
#A necessary step for 
#the next phase of cleaning and aggregation 
csv_mod=csv.dropna().copy()
csv_mod['Farmaco']=None
#A drug "Rinazina" started with ** because the Ministry have to confirm the data
#for semplicity i rempoved ** 
csv_mod['Denominazione della confezione']=csv_mod['Denominazione della confezione'].str.replace('*','')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 6 columns):
N°                                                                        51 non-null object
Codice Autorizzazione all'Immissione in Commercio
(AIC)                   50 non-null float64
Denominazione della confezione                                            50 non-null object
Fascia PTN                                                                50 non-null object
Modalità
Prescrizione                                                     50 non-null object
Quantità confezioni fornite alle farmacie ed agli esercizi commerciali    50 non-null object
dtypes: float64(1), object(5)
memory usage: 2.5+ KB
In [2]:
#We add a Column "Farmaco" to our Dataframe 
#Why? This column help us for the following group operations

csv_mod['Farmaco']=None
def primaparola(colonna_di_testo):
    
    pattern=r'\W*(\w[^,. !?"]*)'
    return re.match(pattern,colonna_di_testo).group()

print(type(csv_mod['Denominazione della confezione']))

estratto=csv_mod['Denominazione della confezione'].apply(primaparola)
csv_mod['Farmaco']=estratto
csv_mod['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali']=pd.to_numeric(csv_mod['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali'].str.replace('.',''))
<class 'pandas.core.series.Series'>
In [3]:
#In this section we create a DataFrame 
#Contaiining the unique "Farmaco" type 
#Es. TACHIPIRINA label could be in multiple rows inside 
#farmaco column because
#there are at least two version
#containg or 500mg or 1000 mg paracetamolo 

#For doing so 
#We drop duplicates
selezione= csv_mod.drop_duplicates(subset=['Farmaco']).copy()
#We select only te relevant columns 
#for building our DataFrame
selezione=selezione[['Modalità\nPrescrizione','Farmaco']]
In [4]:
#Here we start our aggragation job
#here we also visualize in descending order
#the boxes distributed
grouped=csv_mod.groupby(by='Farmaco').sum()
grouped=grouped.sort_values(by='Quantità confezioni fornite alle farmacie ed agli esercizi commerciali',ascending=False)
print("Grouped Prima del reset dell'index")
print(grouped.head(2))
#we reset index in order to get
#drugs name as a column and not as an index
grouped=grouped.reset_index()
print("Grouped Dopo il reset dell' index")
print(grouped.head(2))

#We merge the two DataFrame 
#on the "Farmaco" column
df_farma=grouped.merge(selezione,on='Farmaco')
print(df_farma.head(3))
print(grouped.info())
Grouped Prima del reset dell'index
               Codice Autorizzazione all'Immissione in Commercio\n(AIC)  \
Farmaco                                                                   
TACHIPIRINA                                           89215558.0          
ENTEROGERMINA                                         39138155.0          

               Quantità confezioni fornite alle farmacie ed agli esercizi commerciali  
Farmaco                                                                                
TACHIPIRINA                                              8309360                       
ENTEROGERMINA                                            3834716                       
Grouped Dopo il reset dell' index
         Farmaco  Codice Autorizzazione all'Immissione in Commercio\n(AIC)  \
0    TACHIPIRINA                                         89215558.0          
1  ENTEROGERMINA                                         39138155.0          

   Quantità confezioni fornite alle farmacie ed agli esercizi commerciali  
0                                            8309360                       
1                                            3834716                       
         Farmaco  Codice Autorizzazione all'Immissione in Commercio\n(AIC)  \
0    TACHIPIRINA                                         89215558.0          
1  ENTEROGERMINA                                         39138155.0          
2       VOLTAREN                                        138192273.0          

   Quantità confezioni fornite alle farmacie ed agli esercizi commerciali  \
0                                            8309360                        
1                                            3834716                        
2                                            2940672                        

  Modalità\nPrescrizione  
0                    SOP  
1                    OTC  
2                    OTC  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 3 columns):
Farmaco                                                                   32 non-null object
Codice Autorizzazione all'Immissione in Commercio
(AIC)                   32 non-null float64
Quantità confezioni fornite alle farmacie ed agli esercizi commerciali    32 non-null int64
dtypes: float64(1), int64(1), object(1)
memory usage: 848.0+ bytes
None
In [7]:
##### Now we added a column to know
#wich drugs are SOP and wich are OTC
#Now we can create our scatter plot with different colour

df_farma=df_farma.sort_values(by='Quantità confezioni fornite alle farmacie ed agli esercizi commerciali',ascending=False)
label=df_farma['Farmaco']
ln=np.arange(0,len(df_farma))
#We create two DataFrame one for the OTC and one for the SOP

df_otc=df_farma[df_farma['Modalità\nPrescrizione']=='OTC']
df_sop=df_farma[df_farma['Modalità\nPrescrizione']=='SOP']
#DataFrame index is necessary for 
#displaying correctly on x-axis 
#the x values and 
#the corresponding y-values
index_otc=df_otc.index

index_sop=df_sop.index

otc_plot=plt.scatter(index_otc,df_otc['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali'], color='red')
sop_plot=plt.scatter(index_sop,df_sop['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali'], color='blue')
plt.ylabel('# of distributed boxes')
plt.xticks(ln,(label) ,rotation=90)
plt.title('Best selling Drugs without prescription in the first semester of 2016')

plt.text(10, 4, 'Data Elaboration Ing. Andrea Ciufo. Data Source: Ministry of Health',
         fontsize=15, color='gray',
         ha='center', va='top', alpha=0.5)
plt.legend((sop_plot,otc_plot),('SOP','Over The Counter that could be advertised'),loc='upper right')
plt.rcParams["figure.figsize"] = (15,4)
plt.show()
In [8]:
#In this sectio we aggregate
#All the SOP and OTC drugs
#In order to visualize 
#through a Bar Plot
#For time reason I did not well formatted Y axes ticks
#The are expressed in power of 10


sum_otc=df_otc.groupby(by='Modalità\nPrescrizione').sum()
sum_sop=df_sop.groupby(by='Modalità\nPrescrizione').sum()
print(sum_otc['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali'])
print(sum_sop['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali'])
sum_df_y=[sum_otc['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali'].values, sum_sop['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali'].values]
sum_df_x=['OTC','SOP']

sum_df = pd.DataFrame({'Modalità Prescrizione':sum_df_x, 'Quantità confezioni fornite alle farmacie ed agli esercizi commerciali':sum_df_y})

ind=np.arange(len(sum_df_y))

plt.bar(ind,sum_df['Quantità confezioni fornite alle farmacie ed agli esercizi commerciali'], width=0.2)
plt.rcParams["figure.figsize"] = (7,3)
plt.title('Best selling Drugs without prescription in the first semester of 2016')

plt.ylabel('# of boxes distributed')
plt.xticks(ind,(sum_df_x) ,rotation=90)


plt.text(0.5, 0.5, 'Data Elaboration Ing. Andrea Ciufo. Data Source: Ministry of Health',
         fontsize=15, color='gray',
         ha='center', va='top', alpha=0.5)

plt.show()
Modalità\nPrescrizione
OTC    30361800
Name: Quantità confezioni fornite alle farmacie ed agli esercizi commerciali, dtype: int64
Modalità\nPrescrizione
SOP    12599616
Name: Quantità confezioni fornite alle farmacie ed agli esercizi commerciali, dtype: int64

 

 

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

shares