Multibox Single Shot Detector (SSD)

Introduction

The SSD detector differs from other single-shot detectors in that it predicts from multiple layers, which gives finer accuracy on objects of different scales (each deeper layer sees bigger objects).

SSD normally starts from a pre-trained VGG or ResNet model that is converted into a fully convolutional network. We then attach some extra convolutional layers, which help handle bigger objects. In principle, the SSD architecture can use any deep network as its base model.

One important point is that, after the image passes through the VGG network, the extra convolutional layers produce feature maps of sizes 19x19, 10x10, 5x5, 3x3 and 1x1. Together with the 38x38 feature map produced by VGG's conv4_3, these are the feature maps used to predict bounding boxes.
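To get a feeling for how many predictions these six feature maps produce, here is a small sketch in plain Python. The feature map sizes come from the text above; the number of default boxes per cell (4, 6, 6, 6, 4, 4) is an assumption taken from the usual SSD300 configuration and is not stated in this post.

```python
# (feature map side, default boxes per cell).
# The per-cell box counts are an assumption (common SSD300 setup).
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(feature_maps):
    """Total number of default boxes over all prediction layers."""
    return sum(side * side * boxes for side, boxes in feature_maps)


total_boxes = count_default_boxes(FEATURE_MAPS)
print(total_boxes)  # 8732
```

Under these assumptions most of the boxes (5776 of them) sit on the 38x38 map, which is the one responsible for small objects.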

conv4_3 is responsible for detecting the smallest objects, while conv11_2 is responsible for the biggest ones.

The architecture figure shows that some activations are "grabbed" from the network and passed to a specialized sub-network that acts as a classifier and a localizer. At prediction time we use a non-maximum suppression algorithm to filter the multiple boxes predicted for each object.
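The filtering step can be sketched as greedy non-maximum suppression in plain Python. This is only an illustration: boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, and the function names and the 0.45 threshold are assumptions made for this example, not values from the post.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Return the indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first one heavily, so only the higher-scoring one survives. In practice this is run once per class.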

The anchor (prior or default box) concept

Anchors are a collection of boxes overlaid on the image at different spatial locations, scales and aspect ratios. They act as reference points for the ground-truth boxes. This is similar to the YOLO idea, where each cell of the activation map has multiple boxes.
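As a rough sketch of how such a collection of boxes can be laid out, the snippet below places one box per aspect ratio at the centre of every cell of a square feature map. The function name, the (cx, cy, w, h) format and the chosen scale and ratios are assumptions for illustration only.

```python
import math


def default_boxes(fmap_size, scale, aspect_ratios):
    """Boxes as (cx, cy, w, h), all relative to the image size (0..1)."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Centre of the current cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Same area for every ratio, different shape.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


anchors = default_boxes(fmap_size=4, scale=0.5, aspect_ratios=[1.0, 2.0, 0.5])
print(len(anchors))  # 48
print(anchors[0])    # (0.125, 0.125, 0.5, 0.5)
```

A 4x4 map with three ratios gives 4 * 4 * 3 = 48 boxes; a finer map with a smaller scale would give many more, smaller boxes.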

A model is then trained to make two predictions for each anchor:

A discrete class prediction for the anchor
A continuous prediction of the offset by which the anchor must be shifted to fit the ground-truth bounding box

During training, SSD matches ground-truth objects with default boxes. Each element (cell) of a feature map has a number of default boxes associated with it. Any default box with an IoU (Jaccard index) greater than 0.5 with a ground-truth box is considered a match.
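The matching rule can be sketched in a few lines of plain Python. This is a simplified illustration: boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, the function names are made up for this example, and the extra step many implementations use (forcing every ground-truth box to keep its single best default box) is left out.

```python
def jaccard(a, b):
    """Jaccard index (IoU) of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(default_boxes, gt_boxes, threshold=0.5):
    """For each default box: index of the matched ground truth, or -1."""
    matches = []
    for d in default_boxes:
        overlaps = [jaccard(d, g) for g in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda j: overlaps[j])
        matches.append(best if overlaps[best] > threshold else -1)
    return matches


defaults = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
ground_truth = [(1, 1, 11, 11)]
matches = match_boxes(defaults, ground_truth)
print(matches)  # [0, -1, -1]
```

Only the first default box overlaps the object by more than 0.5, so it becomes a positive example; the other two are treated as background.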

In the example figure, notice that the cat has 2 boxes that match on the 8x8 feature map, while the dog has none. On the 4x4 feature map, there is one box that matches the dog.

The boxes on the 8x8 feature map are smaller than those on the 4x4 feature map. SSD grabs several feature maps, each responsible for a different scale of objects, which allows it to detect objects across a wide range of scales.
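One way to assign a box scale to each feature map is the linear rule described in the SSD paper, where the scale grows evenly from s_min on the first map to s_max on the last. The sketch below assumes the paper's values s_min = 0.2 and s_max = 0.9; the function name is made up, and real implementations often deviate from this rule for individual layers.

```python
def layer_scales(num_layers, s_min=0.2, s_max=0.9):
    """Box scale (fraction of the image side) for each prediction layer."""
    step = (s_max - s_min) / (num_layers - 1)
    return [round(s_min + step * k, 2) for k in range(num_layers)]


scales = layer_scales(6)
print(scales)  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

The first (finest) feature map gets the smallest boxes and the last (coarsest) one gets boxes covering most of the image.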

For each default box on each cell, the network outputs the following:

A probability vector of length c, where c is the number of classes plus one background class that indicates no object.
A vector with 4 elements (x, y, width, height) representing the offset needed to move the default box onto the real object.
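To make the 4-element offset vector concrete, here is a minimal sketch of one common encoding: the centre shift is expressed relative to the default box size and the size change as a log ratio. It assumes boxes in (cx, cy, w, h) form and leaves out the variance scaling that the PyTorch code later in this post passes to its match helper; the function names are illustrative.

```python
import math


def encode(gt, prior):
    """Offsets that move a prior (cx, cy, w, h) onto a ground-truth box."""
    dx = (gt[0] - prior[0]) / prior[2]
    dy = (gt[1] - prior[1]) / prior[3]
    dw = math.log(gt[2] / prior[2])
    dh = math.log(gt[3] / prior[3])
    return (dx, dy, dw, dh)


def decode(offsets, prior):
    """Inverse of encode: apply predicted offsets to a prior box."""
    cx = prior[0] + offsets[0] * prior[2]
    cy = prior[1] + offsets[1] * prior[3]
    w = prior[2] * math.exp(offsets[2])
    h = prior[3] * math.exp(offsets[3])
    return (cx, cy, w, h)


prior = (0.5, 0.5, 0.2, 0.2)
gt = (0.55, 0.45, 0.3, 0.1)
offsets = encode(gt, prior)
restored = decode(offsets, prior)
```

During training the network regresses towards the encoded offsets; at prediction time decode turns its outputs back into boxes.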

Multibox loss function

During training we minimize a combined classification and regression loss.

As in YOLO, the SSD loss balances the classification objective and the localization objective.
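Written out, the combined objective is the one quoted in the comment near the end of the PyTorch code in this post. Here N is taken to be the number of matched (positive) default boxes and alpha the weight that balances the two terms; the symbol names follow that comment.

```latex
L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha \, L_{loc}(x, l, g) \right)
```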

Localization loss
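The PyTorch code in this post uses a Smooth L1 loss for localization, summed over the positive boxes. As a rough plain-Python sketch of what that computes (the function names are illustrative, and offsets are assumed to be 4-tuples):

```python
def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def localization_loss(pred_offsets, target_offsets):
    """Sum of Smooth L1 over all coordinates of all positive boxes."""
    return sum(
        smooth_l1(p - t)
        for pred, target in zip(pred_offsets, target_offsets)
        for p, t in zip(pred, target)
    )


loss = localization_loss([(0.5, 0.0, 0.0, 2.0)], [(0.0, 0.0, 0.0, 0.0)])
print(loss)  # 1.625
```

The small error (0.5) is penalized quadratically and the large one (2.0) only linearly, which keeps outliers from dominating the gradient.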

Classification loss
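The classification term is a softmax cross-entropy over the class scores of each box. The PyTorch code in this post computes it per box as log_sum_exp(scores) minus the score of the target class; the sketch below does the same in plain Python, with a numerically stable log-sum-exp. It is an illustration, not the helper actually used by that code.

```python
import math


def log_sum_exp(logits):
    """Stable log(sum(exp(v))): subtract the max before exponentiating."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def cross_entropy(logits, target):
    """Softmax cross-entropy of one box for the target class index."""
    return log_sum_exp(logits) - logits[target]


# Two equally likely classes: loss is log(2), about 0.693.
uniform_loss = cross_entropy([0.0, 0.0], 0)
# A very confident, correct prediction: loss is close to 0.
confident_loss = cross_entropy([1000.0, 0.0], 0)
```

Class index 0 is assumed to be the background class, as in the code below where positives are the boxes with a label greater than 0.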

The forward pass of this loss, implemented in PyTorch:

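import torch
import torch.nn.functional as F

# match() and log_sum_exp() are helper functions assumed to be defined elsewhere.
def forward(self, predictions, targets):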
        """Multibox Loss
        Args:
            predictions (tuple): A tuple containing loc preds, conf preds,
            and prior boxes from SSD net.
                conf shape: torch.size(batch_size,num_priors,num_classes)
                loc shape: torch.size(batch_size,num_priors,4)
                priors shape: torch.size(num_priors,4)
            targets (tensor): Ground truth boxes and labels for a batch,
                shape: [batch_size,num_objs,5] (last idx is the label).
        """

        loc_data, conf_data, priors = predictions
        num = loc_data.size(0)
        num_priors = (priors.size(0))
        num_classes = self.num_classes

        # match priors (default boxes) and ground truth boxes
        loc_t = torch.Tensor(num, num_priors, 4)
        conf_t = torch.LongTensor(num, num_priors)
        for idx in range(num):
            truths = targets[idx][:,:-1].data
            labels = targets[idx][:,-1].data
            defaults = priors.data
            match(self.threshold,truths,defaults,self.variance,labels,loc_t,conf_t,idx)

        # Move the targets to the same device as the predictions
        loc_t = loc_t.to(loc_data.device)
        conf_t = conf_t.to(loc_data.device)

        # Targets are constants, so no gradient is tracked for them
        # (plain tensors have requires_grad=False; the legacy Variable wrapper is not needed)

        pos = conf_t > 0
        num_pos = pos.sum()

        # Localization Loss (Smooth L1)
        # Shape: [batch,num_priors,4]
        pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
        loc_p = loc_data[pos_idx].view(-1,4)
        loc_t = loc_t[pos_idx].view(-1,4)
        loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum')

        # Compute max conf across batch for hard negative mining
        batch_conf = conf_data.view(-1,self.num_classes)
        loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1,1))

        # Hard Negative Mining
        loss_c = loss_c.view(num, -1) # reshape first so the [batch, num_priors] mask lines up
        loss_c[pos] = 0 # filter out pos boxes for now
        _,loss_idx = loss_c.sort(1, descending=True)
        _,idx_rank = loss_idx.sort(1)
        num_pos = pos.long().sum(1, keepdim=True) # keep shape [batch, 1] so it can be expanded below
        num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1)
        neg = idx_rank < num_neg.expand_as(idx_rank)

        # Confidence Loss Including Positive and Negative Examples
        pos_idx = pos.unsqueeze(2).expand_as(conf_data)
        neg_idx = neg.unsqueeze(2).expand_as(conf_data)
        conf_p = conf_data[pos_idx | neg_idx].view(-1,self.num_classes)
        targets_weighted = conf_t[pos | neg]
        loss_c = F.cross_entropy(conf_p, targets_weighted, reduction='sum')

        # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N

        N = num_pos.data.sum()
        loss_l/=N
        loss_c/=N
        return loss_l,loss_c
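The hard negative mining step above relies on a double sort: sorting the losses gives an ordering, and sorting that ordering again gives each box its rank. The plain-Python sketch below mirrors the idea for a single image. The function name and the toy numbers are illustrative; the 3:1 ratio is the default used by the TensorFlow version in this post (negative_ratio=3).

```python
def hard_negative_mask(losses, is_positive, neg_pos_ratio=3):
    """Pick the highest-loss negatives, at most neg_pos_ratio per positive."""
    n = len(losses)
    # Zero out positives so they sort to the end.
    masked = [0.0 if p else l for l, p in zip(losses, is_positive)]
    # order[r] is the index of the box with the r-th largest loss ...
    order = sorted(range(n), key=lambda i: masked[i], reverse=True)
    # ... and rank[i] is the position of box i in that ordering.
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    num_pos = sum(is_positive)
    num_neg = min(neg_pos_ratio * num_pos, n - 1)
    return [rank[i] < num_neg for i in range(n)]


losses = [0.1, 2.0, 0.5, 3.0, 0.2, 0.05]
is_positive = [False, True, False, False, False, False]
neg_mask = hard_negative_mask(losses, is_positive)
print(neg_mask)  # [False, False, True, True, True, False]
```

With one positive box, the three negatives with the largest classification loss are kept and the easy negatives are ignored, which keeps the background class from swamping the loss.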

The same loss implemented in TensorFlow (note that the code is considerably more complex):

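# Assumes TensorFlow 1.x with `import tensorflow as tf` and `slim = tf.contrib.slim`;
# `tfe` and `custom_layers` are project-specific helper modules defined elsewhere.
def ssd_losses(logits, localisations,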
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               device='/cpu:0',
               scope=None):
    with tf.name_scope(scope, 'ssd_losses'):
        lshape = tfe.get_shape(logits[0], 5)
        num_classes = lshape[-1]
        batch_size = lshape[0]

        # Flatten out all vectors!
        flogits = []
        fgclasses = []
        fgscores = []
        flocalisations = []
        fglocalisations = []
        for i in range(len(logits)):
            flogits.append(tf.reshape(logits[i], [-1, num_classes]))
            fgclasses.append(tf.reshape(gclasses[i], [-1]))
            fgscores.append(tf.reshape(gscores[i], [-1]))
            flocalisations.append(tf.reshape(localisations[i], [-1, 4]))
            fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4]))
        # Concatenate the flattened tensors from all layers.
        logits = tf.concat(flogits, axis=0)
        gclasses = tf.concat(fgclasses, axis=0)
        gscores = tf.concat(fgscores, axis=0)
        localisations = tf.concat(flocalisations, axis=0)
        glocalisations = tf.concat(fglocalisations, axis=0)
        dtype = logits.dtype

        # Compute positive matching mask...
        pmask = gscores > match_threshold
        fpmask = tf.cast(pmask, dtype)
        n_positives = tf.reduce_sum(fpmask)

        # Hard negative mining...
        no_classes = tf.cast(pmask, tf.int32)
        predictions = slim.softmax(logits)
        nmask = tf.logical_and(tf.logical_not(pmask),
                               gscores > -0.5)
        fnmask = tf.cast(nmask, dtype)
        nvalues = tf.where(nmask,
                           predictions[:, 0],
                           1. - fnmask)
        nvalues_flat = tf.reshape(nvalues, [-1])
        # Number of negative entries to select.
        max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
        n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size
        n_neg = tf.minimum(n_neg, max_neg_entries)

        val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
        max_hard_pred = -val[-1]
        # Final negative mask.
        nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
        fnmask = tf.cast(nmask, dtype)

        # Add cross-entropy loss.
        with tf.name_scope('cross_entropy_pos'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=gclasses)
            loss = tf.div(tf.reduce_sum(loss * fpmask), batch_size, name='value')
            tf.losses.add_loss(loss)

        with tf.name_scope('cross_entropy_neg'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=no_classes)
            loss = tf.div(tf.reduce_sum(loss * fnmask), batch_size, name='value')
            tf.losses.add_loss(loss)

        # Add localization loss: smooth L1, L2, ...
        with tf.name_scope('localization'):
            # Weights Tensor: positive mask + random negative.
            weights = tf.expand_dims(alpha * fpmask, axis=-1)
            loss = custom_layers.abs_smooth(localisations - glocalisations)
            loss = tf.div(tf.reduce_sum(loss * weights), batch_size, name='value')
            tf.losses.add_loss(loss)

Mohammad Mostofa Zaman

